Table 10.12 Baseline processor area.
Unit                         Area     Unit                     Area
Integer ALU                  1.00     Added branch adder       1.00
Integer register             1.00     FP Regfile               1.
Shifter                      0.50     FP adder                 13.5
Incrementor                  0.40     FP multiplier            20.3
PC unit + Bypass             1.00     Divide Support           3.
2 TLBs                       6.00     F-P Unit                 37.8
Decode + Control             1.00
Cache controller             1.00     Latches overhead         5.49
Bus logic                    2.00     Bus overhead             21.96
Stored buffer + Bypass       1.00     Total baseline           82.35
Load/store                   0.20
Clock generator              1.00     Die area                 230.
Integer Unit                 16.10    Area available           184.
                                      Area in "A"              335.11
                                      Option 1
                                      Remaining area           252.76
                                      -10% aspect ratio        25.28
                                      Cache area               227.48
                                      Cache bits               561,116.96
                                      Cache KB                 68.50
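The subtotals in Table 10.12 can be cross-checked with a short calculation. The following Python sketch is illustrative only: the variable names are hypothetical, the truncated entries for the FP Regfile and Divide Support are read as 1.0 and 3.0 (which agrees with the listed F-P Unit subtotal), and the 10% latch and 40% bus overhead fractions are assumptions inferred from the listed values (5.49 and 21.96 against a combined unit area of 54.9), not figures stated on this page.

```python
# Cross-check of the arithmetic in Table 10.12 (illustrative names throughout).
integer_parts = {
    "Integer ALU": 1.00, "Integer register": 1.00, "Shifter": 0.50,
    "Incrementor": 0.40, "PC unit + Bypass": 1.00, "2 TLBs": 6.00,
    "Decode + Control": 1.00, "Cache controller": 1.00, "Bus logic": 2.00,
    "Stored buffer + Bypass": 1.00, "Load/store": 0.20, "Clock generator": 1.00,
}
# Truncated table entries "1." and "3." are read as 1.0 and 3.0 (assumption).
fp_parts = {"FP Regfile": 1.0, "FP adder": 13.5,
            "FP multiplier": 20.3, "Divide Support": 3.0}
branch_adder = 1.00

integer_unit = sum(integer_parts.values())   # table lists 16.10
fp_unit = sum(fp_parts.values())             # table lists 37.8
core = integer_unit + fp_unit + branch_adder

# Overhead fractions are inferred from the table, not stated in the text.
LATCH_FRACTION = 0.10
BUS_FRACTION = 0.40
latches = LATCH_FRACTION * core              # table lists 5.49
bus = BUS_FRACTION * core                    # table lists 21.96
total_baseline = core + latches + bus        # table lists 82.35

area_in_a = 335.11                           # taken directly from the table
remaining = area_in_a - total_baseline       # table lists 252.76
aspect_loss = 0.10 * remaining               # table lists 25.28
cache_area = remaining - aspect_loss         # table lists 227.48

cache_bits = 561_116.96                      # taken directly from the table
cache_kb = cache_bits / 8 / 1024             # table lists 68.50

for name, value in [("Integer Unit", integer_unit), ("F-P Unit", fp_unit),
                    ("Total baseline", total_baseline),
                    ("Remaining area", remaining),
                    ("Cache area", cache_area), ("Cache KB", cache_kb)]:
    print(f"{name:16s}{value:10.2f}")
```

The conversion from cache area to cache bits is not reproduced here, since the bits-per-unit-area factor does not appear on this page; the bit count is taken from the table as given.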

spite of differences in transient performance or latency (e.g., IB runout). Buffers are useless if the average bandwidths of the entities are very different; in that case, the performance of the entities degrades to that of the lower-bandwidth process.
Instruction Buffer Design
Before exploring cache design, we need to account for the effect of the instruction buffer, given the available area just computed. A buffer may be designed either for the mean request rate or for the maximum request rate. Designing a buffer to accommodate only the mean request rate allows us to trade off buffer size against the probability of overflow; we must then evaluate the consequences of overflow and decide whether they are tolerable in terms of overall processor performance. For the instruction reference traffic, we assumed a mean of 1 instruction reference per cycle (i.e., 1 IF/I) for the baseline processor. Since the maximum throughput of the pipeline is one instruction per cycle, the maximum instruction traffic is at least one instruction word per cycle, not including the branch reference traffic.
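The trade-off between buffer size and overflow probability for a mean-rate design can be illustrated with a simple queueing model. The sketch below uses the standard blocking probability of an M/M/1/K queue; this model (Poisson requests, exponential service times, and the names `overflow_probability`, `rho`, and `slots`) is an assumption chosen for illustration and is not taken from the text.

```python
def overflow_probability(rho: float, slots: int) -> float:
    """Blocking probability of an M/M/1/K queue holding at most `slots` entries.

    rho is the offered load: mean request rate divided by mean service rate.
    The Poisson/exponential assumptions are illustrative, not from the source.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    if rho <= 0:
        raise ValueError("rho must be positive")
    if abs(rho - 1.0) < 1e-12:
        # Limiting case: all slots + 1 occupancy states are equally likely.
        return 1.0 / (slots + 1)
    return (1.0 - rho) * rho**slots / (1.0 - rho**(slots + 1))


if __name__ == "__main__":
    # Overflow probability shrinks as the buffer grows at a fixed load.
    for slots in (1, 2, 4, 8, 16):
        print(f"slots={slots:2d}  P(overflow)={overflow_probability(0.8, slots):.4f}")
```

Under these assumptions, each added entry buys a smaller reduction in overflow probability, which is why the cost of an overflow has to be weighed against the area that a larger buffer consumes.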
In-line instruction references dominate performance, and the buffer should be designed for their maximum request rate, since the I-buffer fetches whenever the cache is free and the buffer is not full. The designer needs to ensure the following:

 