page_705

Page 705

Table 10.18 CPI summary without memory traffic scaling.

Size     Baseline     CPI      Superscalar   CPI      Multiprocessor    CPI

0.75µ    64K-U        1.957    48K-U         1.918    8I/8I/16D         2.37
         32I/32D      1.851    16I/32D       1.693    16I/16I/8D        1.977
                               32I/16D       1.533

1.0µ     24K-U        2.158    16K-U         2.514    2I/2I/2D          5.08
         16I/8D       2.12     8I/8D         2.33
         8I/16D       2.17

0.6µ     64I/48D      1.786    64I/32D       1.324    32I/32I/32I       1.403
         48I/64D      1.795    32I/64D       1.382    48D/16I/16I       1.634

0.5µ     96I/64D      1.763    128I/32D      1.306    48D/48I/48I       1.326
         128I/32D     1.781    96I/64D       1.239    64D/32I/32I       1.348

0.4µ     256-U        1.854    256-U         1.547    128D/64I/64I      1.182
         128I/128D    1.737    128I/128D     1.176

0.3µ     512-U        1.84     512-U         1.51     256D/128I/128I    1.124
         256I/256D    1.72     256I/256D     1.145



		Cache Design Options



		Having calculated the area available for cache, we select the cache configuration that yields the lowest CPI. One problem is that the available area may not fall exactly on a 2ⁿ KB boundary. Industry implementations use different associativities to accommodate this, but doing so greatly increases the number of configurations to consider.



		Nonbinary associativity gives better utilization of the available area, but it should have only a second-order effect on CPI. The cache configurations and their respective CPIs are shown in Table 10.18.
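		To illustrate how a capacity that is not a power of two can still be built, the following minimal sketch lists the ways a cache size could be split into equal ways. It assumes, for illustration only, that each way must be a power-of-two number of KB so that indexing stays simple, and that associativity is capped at 8; the function name `decompositions` and the `max_ways` parameter are hypothetical and do not come from the text.

```python
def decompositions(size_kb, max_ways=8):
    """Return (ways, way_size_kb) pairs whose product is size_kb.

    Assumption: each way must hold a power-of-two number of KB,
    while the number of ways may be any integer up to max_ways.
    """
    results = []
    for ways in range(1, max_ways + 1):
        if size_kb % ways != 0:
            continue  # capacity does not divide evenly into this many ways
        way_size = size_kb // ways
        if (way_size & (way_size - 1)) == 0:  # power-of-two test
            results.append((ways, way_size))
    return results


# Sizes taken from Table 10.18 (24K, 48K, 96K) plus one binary size (64K).
for size in (24, 48, 64, 96):
    print(size, decompositions(size))
```

		Under these assumptions a 48K cache is reachable only with a nonbinary associativity (for example, 3 ways of 16K), whereas a 64K cache admits the usual 1, 2, 4, or 8 ways. Allowing such options for every size is what multiplies the number of configurations.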



		Some general observations can be made from Table 10.18:



		1. The unified cache does not perform as well as a split cache when contention is taken into account.



		2. Since the processor core size shrinks with feature size, all implementations tend to have similar cache sizes at smaller feature sizes.



		3. The multiprocessor requires at least three separate caches because its instruction caches must be split. This results in a higher I-cache miss rate than in the superscalar case.
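		The first observation can be checked against the table entries. The sketch below transcribes the baseline and superscalar rows of Table 10.18 that include a unified ("-U") cache and compares each unified CPI with the best split-cache CPI in the same cell. Which column each unpaired entry belongs to is an assumption, inferred from matching total cache sizes; the dictionary layout and the function name `unified_vs_best_split` are hypothetical.

```python
# CPI values transcribed from Table 10.18.
# Keys are (feature size, processor); column assignment is an assumption.
table = {
    ("0.75", "baseline"): {"64K-U": 1.957, "32I/32D": 1.851},
    ("0.75", "superscalar"): {"48K-U": 1.918, "16I/32D": 1.693, "32I/16D": 1.533},
    ("1.0", "baseline"): {"24K-U": 2.158, "16I/8D": 2.12, "8I/16D": 2.17},
    ("1.0", "superscalar"): {"16K-U": 2.514, "8I/8D": 2.33},
    ("0.4", "baseline"): {"256-U": 1.854, "128I/128D": 1.737},
    ("0.4", "superscalar"): {"256-U": 1.547, "128I/128D": 1.176},
    ("0.3", "baseline"): {"512-U": 1.84, "256I/256D": 1.72},
    ("0.3", "superscalar"): {"512-U": 1.51, "256I/256D": 1.145},
}


def unified_vs_best_split(configs):
    """Return (unified CPI, lowest split-cache CPI) for one table cell."""
    unified = [cpi for name, cpi in configs.items() if name.endswith("-U")]
    split = [cpi for name, cpi in configs.items() if not name.endswith("-U")]
    return unified[0], min(split)


for key, configs in table.items():
    u, s = unified_vs_best_split(configs)
    print(key, "unified:", u, "best split:", s, "gap:", round(u - s, 3))
```

		In every cell the best split configuration has a lower CPI than the unified one. Under the column assignment assumed here, a single split configuration (8I/16D at 1.0µ, CPI 2.17) is slightly worse than its unified counterpart (2.158), so the observation holds for the best split choice rather than for every split choice.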



		Write Buffer Management



		Recall the CPI section where we discussed different write buffer management schemes. For CBWA and WTNWA, there are three schemes and two schemes, respectively, with the first being the simplest. In order to evaluate
