
Table 10.18 CPI summary without memory traffic scaling.

Size    Baseline    CPI     Superscalar  CPI     Multiprocessor    CPI
0.75µ   64K-U       1.957   48K-U        1.918   8I/8I/16D         2.37
        32I/32D     1.851   16I/32D      1.693   16I/16I/8D        1.977
                            32I/16D      1.533
1.0µ    24K-U       2.158   16K-U        2.514   2I/2I/2D          5.08
        16I/8D      2.12    8I/8D        2.33
        8I/16D      2.17
0.6µ    64I/48D     1.786   64I/32D      1.324   32I/32I/32I       1.403
        48I/64D     1.795   32I/64D      1.382   48D/16I/16I       1.634
0.5µ    96I/64D     1.763   128I/32D     1.306   48D/48I/48I       1.326
        128I/32D    1.781   96I/64D      1.239   64D/32I/32I       1.348
0.4µ    256-U       1.854   256-U        1.547   128D/64I/64I      1.182
        128I/128D   1.737   128I/128D    1.176
0.3µ    512-U       1.84    512-U        1.51    256D/128I/128I    1.124
        256I/256D   1.72    256I/256D    1.145

Cache Design Options
After calculating the area available for cache, we select the cache configuration that yields the lowest CPI. One problem is that the available area may not fall squarely on 2^n KB boundaries. Industry implementations use different associativities to accommodate this, but doing so greatly increases the number of configurations to consider.
The effect of nonbinary associativity is better utilization of the available area, but it should have only a second-order effect on CPI. The cache configurations and their respective CPIs are shown in Table 10.18.
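To illustrate how associativity can absorb a cache size that is not a power of two, the following minimal Python sketch lists the organizations whose number of sets is still a power of two. The 32-byte line size, the limit of eight ways, and the function names are assumptions made for illustration; they do not come from the text.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def organizations(size_kb: int, line_bytes: int = 32, max_ways: int = 8):
    """Return (ways, sets) pairs for which the set count is a power of two.

    line_bytes and max_ways are illustrative assumptions, not values
    taken from the text.
    """
    total_lines = size_kb * 1024 // line_bytes
    result = []
    for ways in range(1, max_ways + 1):
        if total_lines % ways == 0 and is_power_of_two(total_lines // ways):
            result.append((ways, total_lines // ways))
    return result


if __name__ == "__main__":
    print(organizations(64))  # [(1, 2048), (2, 1024), (4, 512), (8, 256)]
    print(organizations(48))  # [(3, 512), (6, 256)]
```

Under these assumptions, a 48 KB cache can only be organized with 3 or 6 ways, whereas a 64 KB cache admits the usual 1, 2, 4, or 8 ways. Allowing such nonbinary associativities is what multiplies the number of candidate configurations.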
Some general observations can be made from Table 10.18:
1. A unified cache does not perform as well as a split cache when contention is taken into account.
2. Since the processor core size shrinks with feature size, all implementations tend to have similar cache sizes at smaller feature sizes.
3. The multiprocessor requires at least three separate caches because its instruction caches must be split. This results in a higher I-cache miss rate than in the superscalar case.
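The first observation can be checked directly against the table entries. The sketch below is a hypothetical helper, not the procedure used to build the table: it copies the baseline and superscalar entries for the 0.4 and 0.3 feature sizes from Table 10.18, picks the lowest-CPI configuration for each design, and checks whether that configuration is a split cache.

```python
# (feature size, design, cache configuration, CPI), copied from Table 10.18.
# Only the 0.4 and 0.3 feature-size entries are used in this illustration.
TABLE = [
    ("0.4", "baseline", "256-U", 1.854),
    ("0.4", "baseline", "128I/128D", 1.737),
    ("0.4", "superscalar", "256-U", 1.547),
    ("0.4", "superscalar", "128I/128D", 1.176),
    ("0.3", "baseline", "512-U", 1.84),
    ("0.3", "baseline", "256I/256D", 1.72),
    ("0.3", "superscalar", "512-U", 1.51),
    ("0.3", "superscalar", "256I/256D", 1.145),
]


def is_unified(config: str) -> bool:
    """Unified caches carry the '-U' suffix in the table."""
    return config.endswith("-U")


def best_configs(rows):
    """Map (feature size, design) to its lowest-CPI (configuration, CPI)."""
    best = {}
    for size, design, config, cpi in rows:
        key = (size, design)
        if key not in best or cpi < best[key][1]:
            best[key] = (config, cpi)
    return best


if __name__ == "__main__":
    for (size, design), (config, cpi) in sorted(best_configs(TABLE).items()):
        kind = "unified" if is_unified(config) else "split"
        print(f"{size} {design}: {config} (CPI {cpi}, {kind})")
```

For every design in this subset the split configuration has the lower CPI, which is consistent with observation 1.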
Write Buffer Management
Recall that the CPI section discussed several write buffer management schemes: three for CBWA and two for WTNWA, with the first scheme in each case being the simplest. In order to evaluate

 