Integrated Cache Evaluation
Now we can complete the analysis of the integrated 64KB cache (WTNWA). To make the comparison with the split cache as fair as possible, assume a four-way set-associative cache with 32B lines and random (RAND) replacement of lines. We determine a miss rate of 1.8 × 1.05 × 1.7 = 3.2% misses/reference (DTMR adjusted for 4-way associativity and system effects). From our earlier split cache analysis, we determined the read traffic per instruction to be:
0.73 I-refr/I + 0.34 D-reads/I = 1.07 refr/I.
Then the actual miss rate is:
3.2% × 1.12 (for RAND) = 3.6%,
and the delay due to cache misses per instruction is:
0.036 misses/refr × 1.07 refr/I × 6 cycles/miss = 0.23 CPI.
This compares to the split cache delay of 0.29 CPI.
To the integrated cache delay, we add 0.10 CPI caused by cache access contention, or:
0.23 + 0.10 = 0.33 CPI.
Thus, the split cache is more effective in this example. Note that the 0.10 CPI is still a conservative estimate and, in all likelihood, both caches would perform about the same.
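The arithmetic of the integrated cache evaluation can be checked with a short Python sketch. All numbers are the ones quoted in the text; the constant and function names are illustrative only, and the two adjustment factors are kept as an unnamed pair because the text does not spell out which factor belongs to which adjustment. The sketch carries full precision rather than rounding at each step, which gives the same two-digit results.

```python
# Figures quoted in the text for the integrated 64KB cache.
BASE_MISS_RATE = 0.018            # 1.8% misses per reference
ADJUSTMENT_FACTORS = (1.05, 1.7)  # 4-way and system adjustments, as quoted
RAND_FACTOR = 1.12                # adjustment for random replacement
REFS_PER_INSTR = 0.73 + 0.34      # I-references plus D-reads per instruction
MISS_PENALTY_CYCLES = 6           # cycles per miss
CONTENTION_CPI = 0.10             # cache access contention
SPLIT_CACHE_CPI = 0.29            # split cache delay, for comparison


def integrated_cache_cpi():
    """Return (miss-delay CPI, total CPI including contention)."""
    miss_rate = BASE_MISS_RATE
    for factor in ADJUSTMENT_FACTORS:
        miss_rate *= factor
    miss_rate *= RAND_FACTOR
    miss_cpi = miss_rate * REFS_PER_INSTR * MISS_PENALTY_CYCLES
    return miss_cpi, miss_cpi + CONTENTION_CPI


if __name__ == "__main__":
    miss_cpi, total_cpi = integrated_cache_cpi()
    print(f"miss delay:      {miss_cpi:.2f} CPI")
    print(f"with contention: {total_cpi:.2f} CPI")
    print(f"split cache:     {SPLIT_CACHE_CPI:.2f} CPI")
```

Changing CONTENTION_CPI shows how sensitive the comparison is to the contention estimate: the two organizations break even when contention costs about 0.06 CPI.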
Study 5.5 The Cache for the Baseline Processor
From study 2.2, we know that we have 28,633 bytes (cache bytes) available for cache data arrays and directories.
Based on an I-traffic to D-traffic ratio of about 2:1, let us split the cache and assign 16KB (16,384 bytes) to the I-cache array and 8KB (8,192 bytes) to the D-cache. This leaves 4,057 bytes available for directories.
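As a quick check of the byte budget, the following Python sketch subtracts the two data arrays from the total quoted from study 2.2. The names are illustrative and not taken from the text.

```python
TOTAL_CACHE_BYTES = 28_633   # available for data arrays and directories
I_CACHE_BYTES = 16 * 1024    # 16KB instruction cache data array
D_CACHE_BYTES = 8 * 1024     # 8KB data cache data array


def directory_budget(total_bytes, *array_bytes):
    """Bytes left for directories after the data arrays are allocated."""
    remaining = total_bytes - sum(array_bytes)
    if remaining < 0:
        raise ValueError("data arrays exceed the available cache bytes")
    return remaining


if __name__ == "__main__":
    print(directory_budget(TOTAL_CACHE_BYTES, I_CACHE_BYTES, D_CACHE_BYTES))
```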
Suppose we have fully blocking caches with a 5-cycle memory access, a 4B physical word and bus, and a transfer time of one cycle per (4B) word. In order to keep the cache miss penalty down, let us select a 16B line (4 words).
Assuming a memory that supports fast sequential page mode (at least one word per processor cycle), this gives us a cache miss delay of:
Tc.miss = Tm.miss = 5 + (L - 1) cycles = 8 cycles, where L = 4 is the line size in words.
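The miss delay follows directly from the line size: the first word arrives after the memory access time, and each remaining word takes one more cycle. A minimal Python sketch, using the parameters stated above as defaults (the function and parameter names are illustrative), makes it easy to see why a short line keeps the penalty down.

```python
def miss_delay_cycles(line_bytes, word_bytes=4, access_cycles=5, cycles_per_word=1):
    """Cycles to fetch one line from a memory with fast sequential access.

    The first word costs access_cycles; each of the remaining words
    costs cycles_per_word.
    """
    if line_bytes % word_bytes != 0:
        raise ValueError("line size must be a whole number of words")
    words_per_line = line_bytes // word_bytes
    return access_cycles + (words_per_line - 1) * cycles_per_word


if __name__ == "__main__":
    for line_bytes in (8, 16, 32):
        print(f"{line_bytes}B line: {miss_delay_cycles(line_bytes)} cycles")
```

Under the same assumptions, doubling the line to 32B would raise the delay from 8 to 12 cycles.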

Now we can compute the size of the directories. Assume direct-mapped caches. Each directory entry consists of a tag plus some control information, so we begin with the tag size. The real address size is 24 bits (study 2.3).
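For a direct-mapped cache, the index and line-offset fields together address every byte in the data array, so the tag is whatever remains of the real address. The Python sketch below applies that standard rule to the two caches chosen above, assuming the 16B line and 24-bit real address given in the text. It counts tag bits only: the control information in each entry is not specified at this point and is deliberately left out. The function names are illustrative.

```python
def direct_mapped_tag_bits(cache_bytes, address_bits=24):
    """Tag width for a direct-mapped cache: address bits minus log2(cache size)."""
    if cache_bytes <= 0 or cache_bytes & (cache_bytes - 1):
        raise ValueError("cache size must be a power of two")
    index_plus_offset_bits = cache_bytes.bit_length() - 1  # exact log2
    return address_bits - index_plus_offset_bits


def tag_storage_bytes(cache_bytes, line_bytes=16, address_bits=24):
    """Bytes needed for tags alone (control bits NOT included)."""
    lines = cache_bytes // line_bytes
    total_bits = lines * direct_mapped_tag_bits(cache_bytes, address_bits)
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    for name, size in (("I-cache", 16 * 1024), ("D-cache", 8 * 1024)):
        print(name, direct_mapped_tag_bits(size), "tag bits,",
              tag_storage_bytes(size), "bytes of tags")
```

Whatever control bits are added per entry must fit, together with these tags, in the 4,057 bytes left for directories.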

 