For the I-cache, we have 16K/16 = 1,024 lines, and for the D-cache we have 512 lines. This means that the I-cache index is 10 bits and the D-cache index is 9 bits. Therefore (allowing 4 bits for the byte offset at 16 B/line), the I-cache tag is

    24 - 10 - 4 = 10 bits

and the D-cache tag is

    24 - 9 - 4 = 11 bits.
Even if we allow 2 bytes for tag + control for each directory entry, the two directories together need (1,024 + 512) × 2 = 3,072 bytes, so we would use only 3,072 bytes of the 4,057 available.
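The directory arithmetic above can be checked with a short Python sketch. The function name `directory_geometry` and the constant names are hypothetical, chosen only for illustration; the 8 KB D-cache size is an assumption inferred from the 512 lines of 16 bytes quoted in the text.

```python
ADDRESS_BITS = 24        # address width used in the text's tag arithmetic
LINE_BYTES = 16          # 16 B/line, which gives 4 offset bits
BYTES_PER_ENTRY = 2      # tag + control allowance per directory entry


def directory_geometry(cache_bytes, line_bytes=LINE_BYTES,
                       address_bits=ADDRESS_BITS):
    """Return (lines, index_bits, offset_bits, tag_bits) for a
    direct-mapped cache whose sizes are powers of two."""
    lines = cache_bytes // line_bytes
    index_bits = lines.bit_length() - 1        # log2 of a power of two
    offset_bits = line_bytes.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return lines, index_bits, offset_bits, tag_bits


icache = directory_geometry(16 * 1024)
dcache = directory_geometry(8 * 1024)   # assumed size: 512 lines x 16 B

# Total directory storage at 2 bytes per entry, for both caches.
directory_bytes = (icache[0] + dcache[0]) * BYTES_PER_ENTRY

print("I-cache (lines, index, offset, tag):", icache)
print("D-cache (lines, index, offset, tag):", dcache)
print("directory bytes:", directory_bytes)
```

The tag widths of 10 and 11 bits both fit comfortably in the 2-byte entry, leaving the remaining bits for control information.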
Now consider the effect of cache misses on performance. Suppose we use a CBWA policy; no LRU considerations are necessary, as the cache is direct mapped. We have 1 IF/I, 0.31 DF/I, and 0.20 DS/I (from section 5.14).
Now the miss rate (Appendix A) for the I-cache is 0.05 × 1.35 = 0.068 and for the D-cache is 0.08 × 1.32 = 0.106. We can find the CPI loss due to cache misses by summing, over both caches,

    refs/I × misses/ref × delay/miss.

That is,

    CPI loss = I-cache loss + D-cache loss
             = 1 × 0.068 × 8 + 0.50 × 0.106 × 8(1 + 0.5)
             = 1.18.
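As a check on the arithmetic, the sum of refs/I × misses/ref × delay/miss can be evaluated in a few lines of Python. This is only a sketch: the names are hypothetical, the miss rates are the rounded values quoted in the text, and the reading of the (1 + 0.5) factor as extra D-cache delay under the copy-back policy is an assumption, since the text does not explain it here.

```python
def cpi_loss(refs_per_instr, misses_per_ref, delay_per_miss):
    """CPI lost to one cache: refs/I x misses/ref x delay/miss."""
    return refs_per_instr * misses_per_ref * delay_per_miss


MISS_DELAY = 8                # cycles per miss, as used in the text
D_DELAY_FACTOR = 1 + 0.5      # factor from the text; interpretation assumed

I_MISS_RATE = 0.068           # text's rounded value of 0.05 x 1.35
D_MISS_RATE = 0.106           # text's rounded value of 0.08 x 1.32

i_loss = cpi_loss(1.0, I_MISS_RATE, MISS_DELAY)
d_loss = cpi_loss(0.50, D_MISS_RATE, MISS_DELAY * D_DELAY_FACTOR)
cache_loss = i_loss + d_loss

print(f"I-cache loss = {i_loss:.3f}")
print(f"D-cache loss = {d_loss:.3f}")
print(f"total cache CPI loss = {cache_loss:.2f}")
```

The D-cache reference rate of 0.50 per instruction is the text's rounding of 0.31 DF/I + 0.20 DS/I.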
Now we must account for TLB misses. Suppose the "not in TLB" delay is 20 cycles. From Figure 5.46, we estimate that a TLB (4KB pages) with 128 entries (64 × 2) will have a miss rate of 0.006. There is one access to the I-TLB and 0.5 references to the D-TLB per instruction. The resulting delay estimate is:

    CPI delay = 0.006 × 20 × 1.5 = 0.18.
From study 4.10, we have a CPI (without cache misses) of 1.60. So we now have in total:

    CPI = 1.60 + 1.18 + 0.18 = 2.96.
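The TLB delay and the overall total can be reproduced the same way. The sketch below uses hypothetical constant names and takes the 1.18 cache loss and the 1.60 base CPI directly from the text rather than recomputing them.

```python
TLB_MISS_RATE = 0.006          # estimated from Figure 5.46 in the text
NOT_IN_TLB_DELAY = 20          # cycles per "not in TLB" event
TLB_REFS_PER_INSTR = 1 + 0.5   # one I-TLB access plus 0.5 D-TLB references

BASE_CPI = 1.60                # CPI without cache misses (study 4.10)
CACHE_CPI_LOSS = 1.18          # I-cache + D-cache loss from the text

# Expected TLB penalty per instruction.
tlb_cpi_delay = TLB_MISS_RATE * NOT_IN_TLB_DELAY * TLB_REFS_PER_INSTR

# Overall CPI is the base CPI plus both memory-system penalties.
total_cpi = BASE_CPI + CACHE_CPI_LOSS + tlb_cpi_delay

print(f"TLB CPI delay = {tlb_cpi_delay:.2f}")
print(f"total CPI     = {total_cpi:.2f}")
print(f"memory-system share of CPI = {(total_cpi - BASE_CPI) / total_cpi:.0%}")
```

The last line is included only to make the relative weight of the cache and TLB penalties visible.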
How can we improve this? Clearly, we need larger caches, more aggressive overlapping of misses with processor execution, and (perhaps) increased associativity. All of this requires area. We see more alternatives in chapter 7.
5.19.1 Cache Evaluation Design Rules |
1. Find DTMR based on cache size and line size. |
2. Adjust for set associativity and line replacement (if other than fully associative and LRU). |
3. Select the most representative system environment and adjust miss rate. |