Integrated Cache Evaluation
Now we can complete the analysis of the integrated 64KB cache (WTNWA). To make the comparison with the split cache as fair as possible, assume a four-way set-associative cache with a 32B line and random (RAND) replacement of lines. We determine a miss rate of 1.8 × 1.05 × 1.7 = 3.2% misses/reference (the DTMR adjusted for four-way associativity and for system effects). From our earlier split cache analysis, we have determined the read traffic per instruction to be:
0.73 I-refr/I + 0.34 D-reads/I = 1.07 refr/I. |
Then the actual miss rate is: |
3.2% × 1.12 (for RAND) = 3.6%,
and the delay due to cache misses per instruction is: |
0.036 misses/refr × 1.07 refr/I × 6 cycles/miss = 0.23 CPI.
This compares to the split cache delay of 0.29 CPI. |
To the integrated cache delay, we add 0.10 CPI caused by cache access contention, or: |
0.23 + 0.10 = 0.33 CPI. |
Thus, the split cache is more effective in this example. Note that the 0.10 CPI contention figure is still a conservative estimate and, in all likelihood, both caches would perform about the same.
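The arithmetic of this evaluation can be reproduced with a short script. This is a minimal sketch rather than part of the original study: the function and variable names are illustrative, and the intermediate results are rounded at each step on the assumption that the text rounds the same way (3.2%, 3.6%, 0.23 CPI).

```python
SPLIT_CACHE_CPI = 0.29          # split cache delay quoted in the text
CONTENTION_CPI = 0.10           # cache access contention quoted in the text
MISS_PENALTY_CYCLES = 6         # cycles per miss used in the text


def integrated_cache_delay():
    """Recompute the integrated-cache figures step by step."""
    # Base miss rate in percent: 1.8 x 1.05 x 1.7, as given in the text.
    base_miss_pct = round(1.8 * 1.05 * 1.7, 1)

    # Adjustment for RAND replacement (factor 1.12 from the text).
    rand_miss_pct = round(base_miss_pct * 1.12, 1)

    # Read traffic per instruction: instruction references plus data reads.
    refs_per_instr = 0.73 + 0.34

    # Delay due to misses, in cycles per instruction.
    miss_delay_cpi = round(
        (rand_miss_pct / 100) * refs_per_instr * MISS_PENALTY_CYCLES, 2
    )

    # Add the contention penalty of the integrated cache.
    total_cpi = round(miss_delay_cpi + CONTENTION_CPI, 2)

    return {
        "miss_rate_pct": rand_miss_pct,
        "refs_per_instr": refs_per_instr,
        "miss_delay_cpi": miss_delay_cpi,
        "total_cpi": total_cpi,
    }


if __name__ == "__main__":
    result = integrated_cache_delay()
    print(result)
    print("split cache wins:", SPLIT_CACHE_CPI < result["total_cpi"])
```

The gap between the two organizations comes entirely from the contention term, so the conclusion is sensitive to how that term is estimated.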
Study 5.5 The Cache for the Baseline Processor |
From study 2.2, we know that we have 28,633 bytes (cache bytes) available for cache data arrays and directories. |
Based on an I-traffic to D-traffic ratio of about 2:1, let us split the cache and assign 16KB (16,384 bytes) to the I-cache array and 8KB (8,192 bytes) to the D-cache. This leaves 4,057 bytes available for directories. |
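As a quick check of the byte budget, the following sketch subtracts the two data arrays from the total. The constant and function names are illustrative only; the figures are the ones quoted above.

```python
TOTAL_CACHE_BYTES = 28_633      # available bytes, from study 2.2
I_CACHE_BYTES = 16 * 1024       # 16KB instruction-cache data array
D_CACHE_BYTES = 8 * 1024        # 8KB data-cache data array


def directory_budget(total=TOTAL_CACHE_BYTES,
                     i_cache=I_CACHE_BYTES,
                     d_cache=D_CACHE_BYTES):
    """Bytes left for the directories after both data arrays are placed."""
    return total - i_cache - d_cache


if __name__ == "__main__":
    print(directory_budget())
```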
Suppose we have fully blocking caches with a 5-cycle memory access, a 4B physical word and bus, and a transfer time of one cycle per (4B) word. To keep the cache miss penalty down, let us select a 16B line (4 words).
Assuming a memory that supports fast sequential page mode (at least one word per processor cycle), this gives us a cache miss delay of: |
Tm.miss = 5 + (L - 1) cycles |
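The sketch below evaluates this expression for the chosen line size. It assumes that L is the line length in physical words (4 for a 16B line with 4B words) and that the first word arrives at the end of the 5-cycle access, with each remaining word taking one further cycle. The function and parameter names are illustrative.

```python
def miss_delay_cycles(line_bytes=16, word_bytes=4,
                      access_cycles=5, cycles_per_word=1):
    """Cache miss delay: initial access plus the remaining words of the line."""
    # Assumption: L counts physical words per line.
    words_per_line = line_bytes // word_bytes
    return access_cycles + (words_per_line - 1) * cycles_per_word


if __name__ == "__main__":
    # 16B line, 4B word: 5 + (4 - 1) cycles
    print(miss_delay_cycles())
```

Under the same assumptions a 32B line would cost 12 cycles, which illustrates why the shorter line keeps the miss penalty down.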
Now we can compute the size of the directories. Assume direct-mapped caches. The directory entry consists of a tag plus some control information. We can now compute the tag size. The real address size is 24 bits (study 2.3). |
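One way to work out the tag width is the usual direct-mapped address split into tag, index, and byte offset. The sketch below applies that split to the 16KB and 8KB arrays with 16B lines and a 24-bit real address. It assumes power-of-two sizes, the helper name is hypothetical, and the control bits are left out because their number is not given here.

```python
def tag_bits(cache_bytes, line_bytes=16, address_bits=24):
    """Tag width for a direct-mapped cache with power-of-two sizes."""
    num_lines = cache_bytes // line_bytes
    # For a power of two n, (n - 1).bit_length() equals log2(n).
    offset_bits = (line_bytes - 1).bit_length()
    index_bits = (num_lines - 1).bit_length()
    return address_bits - index_bits - offset_bits


if __name__ == "__main__":
    print("I-cache tag bits:", tag_bits(16 * 1024))
    print("D-cache tag bits:", tag_bits(8 * 1024))
```

The smaller D-cache has fewer lines and therefore fewer index bits, so each of its directory entries needs a tag one bit wider.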