





















|                  | Pentium III       | Pentium IV        |
|------------------|-------------------|-------------------|
| Technology       | 180nm             | 180nm             |
| Die Size         | 106mm<sup>2</sup> | 217mm<sup>2</sup> |
| Transistor Count | 24 million        | 42 million        |
| # Grids          | 10<sup>8</sup>    | 2x10<sup>8</sup>  |
| Pipeline Stages  | 10                | 20                |
| Clock Rate       | 1GHz (15 FO4)     | 1.5GHz (10.4 FO4) |
| L1 D\$ Capacity  | 16KBytes          | 12KBytes          |
| SpecInt2000      | 454               | 524               |
| SpecInt/MHz      | 0.45              | 0.35              |
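
The SpecInt/MHz row is derived from the two rows above it, and the comparison is easier to read once the ratios between the two chips are written out. The following is a minimal sketch using only the numbers in the table; the dictionary keys and the function name are illustrative choices, not part of the source.

```python
# Values copied from the table above; key names are illustrative.
chips = {
    "Pentium III": {"specint2000": 454, "clock_mhz": 1000, "die_mm2": 106},
    "Pentium IV": {"specint2000": 524, "clock_mhz": 1500, "die_mm2": 217},
}


def specint_per_mhz(chip):
    """SpecInt2000 score divided by the clock rate in MHz."""
    return chip["specint2000"] / chip["clock_mhz"]


for name, chip in chips.items():
    print(f"{name}: {specint_per_mhz(chip):.2f} SpecInt/MHz")

p3, p4 = chips["Pentium III"], chips["Pentium IV"]

# How much performance the larger die buys in the same 180nm technology.
perf_gain = p4["specint2000"] / p3["specint2000"]
area_growth = p4["die_mm2"] / p3["die_mm2"]
print(f"performance gain: {perf_gain:.2f}x for {area_growth:.2f}x the die area")
```

With these inputs the performance ratio comes out well below the area ratio, which is the point the table makes: roughly twice the silicon in the same process yields a much smaller gain in SpecInt2000.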







| Operation                         | Energy (0.13um) | Energy (0.05um) |
|-----------------------------------|-----------------|-----------------|
| 32b ALU Operation                 | 5pJ             | 0.3pJ           |
| 32b Register Read                 | 10pJ            | 0.6pJ           |
| Read 32b from 8KB RAM             | 50pJ            | 3pJ             |
| Transfer 32b across chip (10mm)   | 100pJ           | 17pJ            |
| Execute a uP instruction (SB-1)   | 1.1nJ           | 130pJ           |
| Transfer 32b off chip (2.5G CML)  | 1.3nJ           | 400pJ           |
| Transfer 32b off chip (200M HSTL) | 1.9nJ           | 1.9nJ           |
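
One way to read the energy table is to express every row as a multiple of a single 32b ALU operation in the same technology. The sketch below does that with the table's values converted to picojoules; the names `ENERGY_PJ` and `relative_to_alu` are hypothetical helpers introduced here for illustration.

```python
# Energy per operation in picojoules, copied from the table above.
# Tuple order: (0.13um, 0.05um). Names are illustrative.
ENERGY_PJ = {
    "32b ALU operation": (5.0, 0.3),
    "32b register read": (10.0, 0.6),
    "Read 32b from 8KB RAM": (50.0, 3.0),
    "Transfer 32b across chip (10mm)": (100.0, 17.0),
    "Execute a uP instruction (SB-1)": (1100.0, 130.0),
    "Transfer 32b off chip (2.5G CML)": (1300.0, 400.0),
    "Transfer 32b off chip (200M HSTL)": (1900.0, 1900.0),
}


def relative_to_alu(node_index):
    """Return each operation's energy as a multiple of one 32b ALU operation."""
    alu = ENERGY_PJ["32b ALU operation"][node_index]
    return {op: e[node_index] / alu for op, e in ENERGY_PJ.items()}


for label, idx in (("0.13um", 0), ("0.05um", 1)):
    print(label)
    for op, ratio in relative_to_alu(idx).items():
        print(f"  {op}: {ratio:.0f}x")
```

At 0.13um, executing one SB-1 instruction costs 220 times the energy of the ALU operation itself, and the gap between arithmetic and off-chip transfers widens further in the 0.05um column.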

| Operation                       | Delay (0.13um) | Delay (0.05um) |
|---------------------------------|----------------|----------------|
| 32b ALU Operation               | 650ps          | 250ps          |
| 32b Register Read               | 325ps          | 125ps          |
| Read 32b from 8KB RAM           | 780ps          | 300ps          |
| Transfer 32b across chip (10mm) | 1400ps         | 2300ps         |
| Transfer 32b across chip (20mm) | 2800ps         | 4600ps         |

2:1 global on-chip comm to operation delay, 9:1 in 2010
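
The 2:1 and 9:1 figures follow from dividing the 10mm cross-chip transfer delay by the ALU operation delay in each column. The following sketch reproduces that arithmetic from the table; the dictionary layout and the function name are assumptions made for this example.

```python
# Delays in picoseconds, copied from the table above. Key names are illustrative.
DELAY_PS = {
    "0.13um": {"alu": 650, "wire_10mm": 1400, "wire_20mm": 2800},
    "0.05um": {"alu": 250, "wire_10mm": 2300, "wire_20mm": 4600},
}


def comm_to_op_ratio(node, wire="wire_10mm"):
    """Ratio of cross-chip transfer delay to 32b ALU operation delay."""
    d = DELAY_PS[node]
    return d[wire] / d["alu"]


for node in DELAY_PS:
    print(f"{node}: {comm_to_op_ratio(node):.1f}:1 over 10mm, "
          f"{comm_to_op_ratio(node, 'wire_20mm'):.1f}:1 over 20mm")
```

The ALU gets faster with scaling while the wire gets slower, so the ratio grows from both ends: the numerator rises from 1400ps to 2300ps while the denominator falls from 650ps to 250ps.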

















































| Item                 | Cost (\$) | Per Node (\$) |
|----------------------|-------|----------|
| Processor chip       | 200   | 200      |
| Router chip          | 200   | 50       |
| Memory chip          | 20    | 320      |
| Board/Backplane      | 3000  | 188      |
| Cabinet              | 50000 | 49       |
| Power                | 1     | 50       |
| Per-Node Cost        |       | 976      |
| \$/GFLOPS (64/node)  |       | 15       |
| \$/M-GUPS (250/node) |       | 4        |
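
The last two rows divide the per-node cost by the per-node performance given in parentheses. As a rough sketch of that arithmetic, the code below takes the table's stated per-node total of 976 as given rather than re-adding the rows, and also backs out how many of each part the per-node figures imply. All variable and function names are hypothetical.

```python
# Figures copied from the table above; names are illustrative.
PER_NODE_COST = 976          # stated per-node total, taken as given
GFLOPS_PER_NODE = 64
MGUPS_PER_NODE = 250

# (unit cost, per-node cost) for each itemized row.
ITEMS = {
    "Processor chip": (200, 200),
    "Router chip": (200, 50),
    "Memory chip": (20, 320),
    "Board/Backplane": (3000, 188),
    "Cabinet": (50000, 49),
}


def units_per_node(unit_cost, per_node_cost):
    """Implied quantity per node; values below 1 mean the item is shared."""
    return per_node_cost / unit_cost


cost_per_gflops = PER_NODE_COST / GFLOPS_PER_NODE
cost_per_mgups = PER_NODE_COST / MGUPS_PER_NODE
print(f"cost per GFLOPS: {cost_per_gflops:.2f}")
print(f"cost per M-GUPS: {cost_per_mgups:.2f}")

for item, (unit, per_node) in ITEMS.items():
    print(f"{item}: {units_per_node(unit, per_node):.4f} per node")
```

Read this way, the per-node column implies 16 memory chips per node and one router chip shared among 4 nodes, with a board shared by about 16 nodes and a cabinet by roughly a thousand.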



