|
|
|
|
|
|
|
Write-through caches have an advantage over copyback caches, since memory is always consistent with the cache. However, memory traffic can be exceedingly high. Increasing the physical word size to memory increases memory bandwidth, but does not help with write-through traffic, since that traffic consists of single-word writes.
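A back-of-the-envelope calculation shows how quickly single-word write traffic can saturate memory; the store frequency and write timing below are illustrative assumptions, not figures from the text:

```python
# Illustrative only: the instruction mix and memory timing are assumptions.
store_freq = 0.15      # stores per instruction (assumed)
t_word_write = 4       # cycles memory is busy per single-word write (assumed)
cpi_base = 1.0         # base cycles per instruction (assumed)

# Memory must absorb one single-word write per store, regardless of the
# physical word size, since write-through traffic consists of single words.
write_busy_per_instr = store_freq * t_word_write
utilization = write_busy_per_instr / cpi_base
print(f"memory busy with write traffic {utilization:.0%} of the time")
```

Even with these modest assumed numbers, writes alone keep memory busy a large fraction of the time, leaving little bandwidth for read-miss traffic.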
|
|
|
|
|
|
|
|
From the previous analysis, main memory cannot support the write-through traffic, so another solution needs to be considered. One such alternative is to use a Write Assembly Cache (WAC). Data show that a Write Assembly Cache of 16 8-byte lines can reduce the relative write traffic to 30%. Adding a write assembly cache serves two purposes: |
|
|
|
|
|
|
|
|
1. It filters out repeated writes to the same line.

2. It converts single-word transfers into more efficient line transfers.
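Both purposes can be sketched in a small model; the FIFO replacement policy, the word-addressed interface, and all identifier names here are illustrative assumptions, not the book's exact design:

```python
class WriteAssemblyCache:
    """Minimal sketch of a write assembly cache (illustrative; FIFO
    replacement and word-addressed writes are assumptions)."""

    def __init__(self, num_lines=16, words_per_line=2):
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.lines = {}            # line address -> set of valid word offsets
        self.line_writebacks = 0   # line transfers sent on to memory
        self.word_writes = 0       # single-word writes received from the CPU

    def write(self, word_addr):
        """Record a single-word write; combine it with any pending line."""
        self.word_writes += 1
        line_addr, offset = divmod(word_addr, self.words_per_line)
        if line_addr in self.lines:
            # Repeated write to a pending line: combined, no extra traffic.
            self.lines[line_addr].add(offset)
            return
        if len(self.lines) == self.num_lines:
            # Evict the oldest pending line as a single line transfer
            # (dicts preserve insertion order, so this is FIFO).
            victim = next(iter(self.lines))
            del self.lines[victim]
            self.line_writebacks += 1
        self.lines[line_addr] = {offset}
```

Sequential word writes illustrate both effects: writes to a line already pending generate no memory traffic, and each eviction moves a whole line rather than individual words.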
|
|
|
|
|
|
|
|
Since a WAC is unique to the write-through cache, we must evaluate its cost to compare the merits of the different write policies. Based on data given in section 5.13, 16 8-byte lines can filter about 70% of the write traffic. The area of this Write Assembly Cache is calculated as follows: |
|
|
|
|
|
|
|
|
WAC area = (log2(memory size / write assembly cache size) + 2 control bits + 2-bit assoc.) * 16 lines

         = (19 + 2 + 2) * 16 = 368 bits
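The same arithmetic can be checked directly; the 64 MB memory size is an assumption inferred from the 19-bit tag in the worked example:

```python
import math

memory_size = 64 * 2**20   # bytes (assumed: this choice yields a 19-bit tag)
wac_size = 16 * 8          # 16 lines of 8 bytes = 128 bytes

tag_bits = int(math.log2(memory_size // wac_size))  # log2(2**26 / 2**7) = 19
bits_per_line = tag_bits + 2 + 2   # + 2 control bits + 2-bit assoc. field
area = bits_per_line * 16          # 16 lines
print(tag_bits, area)              # 19 368
```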
|
|
|
|
|
|
|
|
|
This is small compared to the cache size. The write assembly cache must grow as write traffic increases in the superscalar and multiprocessor implementations.
|
|
|
|
|
|
|
|
With the addition of a secondary cache, traffic on the memory bus is reduced, since the secondary cache acts as a filter between the L1 caches and the memory bus. We verify this later.
|
|
|
|
|
|
|
|
Different combinations of copyback with write allocate (CBWA) and write-through with no write allocate (WTNWA) and different buffer management schemes have different implications for the overall CPI. For the analysis below, a "perfect" write buffer is assumed, since the effect of write traffic is analyzed separately.
|
|
|
|
|
|
|
|
With the WTNWA configuration, we consider two buffer management schemes. |
|
|
|
 |
|
|
|
|
Scheme 1: On a read miss, the entire line must be fetched before processing resumes.
|
|
|
 |
|
|
|
|
Scheme 2: On a read miss, wraparound fetch is used so the requested word is returned first, and processing resumes while memory finishes loading the rest of the line into the cache.
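The two schemes can be contrasted with a small timing sketch; the parameters t_access and t_word (and the function names) are assumed for illustration, not values from the text:

```python
def cpu_stall_scheme1(line_words, t_access, t_word):
    """Scheme 1: the CPU waits until the whole line has been fetched."""
    return t_access + line_words * t_word

def cpu_stall_scheme2(t_access, t_word):
    """Scheme 2 (wraparound): the CPU resumes once the requested word
    arrives; memory keeps filling the rest of the line in the background."""
    return t_access + t_word

def memory_busy(line_words, t_access, t_word):
    """Under either scheme, memory is busy for the full line transfer."""
    return t_access + line_words * t_word
```

For example, with an 8-word line, t_access = 4 cycles, and t_word = 1 cycle, scheme 1 stalls the processor for 12 cycles while scheme 2 stalls it for only 5; the time memory is busy servicing the read, however, is 12 cycles in both schemes.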
|
|
|
|
[Table: time memory is busy due to a read request]
|
|
|
|
|