Figure 5.40
Write assembly cache, relative write traffic (line size of 4 and 8 bytes, transfer unit of 4 and 8 bytes, four-way set associative) (data from Bray [43]).
is also measured in bytes. Assume for the moment that the write assembly cache (WAC) is fully associative and fully bypassed. If a reference r(x) occurs, and x is in the WAC, then x is returned to the processor without otherwise affecting the memory system or the WAC.
The goal of the write assembly cache is to assemble writes so that they can be transmitted to the memory system in an orderly way, minimizing the use of the bus for memory traffic. If a synchronizing event occurs, as in the case of multiple shared-memory processors, the entire WAC should be transferred to memory to ensure memory consistency.
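
The behavior described so far can be illustrated with a small model. The following is only a sketch under stated assumptions: the class name `WriteAssemblyCache`, the LRU replacement policy, the byte-granular writes, and the `bus_writes` counter are all hypothetical and are not taken from the text or from Bray [43]. It shows writes being assembled into lines, a read hit being served without touching memory or the WAC state, and a synchronizing event flushing the whole WAC.

```python
from collections import OrderedDict


class WriteAssemblyCache:
    """Toy fully associative WAC. LRU eviction is an assumption."""

    def __init__(self, num_lines=4, line_size=8):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()   # line tag -> {byte offset: value}
        self.memory = {}             # byte address -> value (backing store)
        self.bus_writes = 0          # line transfers sent to memory

    def _write_back(self, tag, data):
        # One bus transfer carries the assembled bytes of one line.
        base = tag * self.line_size
        for offset, value in data.items():
            self.memory[base + offset] = value
        self.bus_writes += 1

    def write(self, addr, value):
        tag, offset = divmod(addr, self.line_size)
        if tag not in self.lines and len(self.lines) == self.num_lines:
            old_tag, old_data = self.lines.popitem(last=False)  # evict LRU line
            self._write_back(old_tag, old_data)
        self.lines.setdefault(tag, {})[offset] = value
        self.lines.move_to_end(tag)  # mark line as most recently written

    def read(self, addr):
        tag, offset = divmod(addr, self.line_size)
        line = self.lines.get(tag)
        if line is not None and offset in line:
            return line[offset]      # hit: WAC and memory are left unchanged
        return self.memory.get(addr, 0)

    def sync(self):
        # Synchronizing event: transfer the entire WAC to memory.
        while self.lines:
            tag, data = self.lines.popitem(last=False)
            self._write_back(tag, data)


wac = WriteAssemblyCache()
for i in range(8):
    wac.write(0x100 + i, i)          # eight byte writes fall in one 8-byte line
wac.sync()
print(wac.bus_writes)                # 1
```

In this model the eight writes cost a single bus transfer, whereas without a buffer each write would go to memory separately.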
Suppose we organize the WAC such that l = t (line size equal to the transfer unit), creating a WAC of multiple small entries. Figure 5.40 shows the temporal locality of writes and the efficiency of such a buffer. Write traffic can then be reduced to about one-quarter of the null buffer case, a significant performance improvement in those cases where write traffic dominates the memory traffic.
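
To make "write traffic relative to a null buffer" concrete, here is a minimal sketch. The function name `relative_write_traffic`, the LRU replacement, the final flush, and the rule that every unbuffered write costs one transfer are assumptions made for illustration. The trace is contrived and does not reproduce the measurements behind Figure 5.40.

```python
from collections import OrderedDict


def relative_write_traffic(write_addrs, num_lines=4, line_size=8):
    """Ratio of WAC-to-memory transfers to unbuffered write transfers.

    Assumptions (not from the text): l = t, LRU replacement, one transfer
    per unbuffered write, and resident lines flushed at the end of the trace.
    """
    if not write_addrs:
        return 0.0
    lines = OrderedDict()                 # resident line tags in LRU order
    transfers = 0
    for addr in write_addrs:
        tag = addr // line_size
        if tag in lines:
            lines.move_to_end(tag)        # write assembled into a resident line
            continue
        if len(lines) == num_lines:
            lines.popitem(last=False)     # evict the LRU line: one transfer
            transfers += 1
        lines[tag] = True
    transfers += len(lines)               # final flush of resident lines
    return transfers / len(write_addrs)


trace = [0, 4, 64, 0, 4, 64, 128, 0]      # hypothetical byte addresses of writes
print(relative_write_traffic(trace, num_lines=4))   # 0.375
print(relative_write_traffic(trace, num_lines=1))   # 0.75
```

With four entries the three repeatedly written lines all stay resident, so only the final flush reaches memory. With a single entry most writes force an eviction.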
Figure 5.40 shows the resultant write traffic (from the write assembly cache to memory) relative to a null buffer (i.e., the write traffic to memory without a WAC). The data presented are generally independent of set associativity. For most programs (such as the ones included in the benchmarks used for Figure 5.40), temporal locality seems to play a more important role in write traffic than spatial locality. Thus, to reduce the resultant write traffic, it is advantageous to have more, smaller lines rather than fewer, larger lines. An exception occurs in the case of large scientific programs accessing data arrays, where spatial locality dominates and fewer, larger lines give the greater reduction in write traffic. A suggested general solution for implementing a WAC [43] would be the following strategy:
Number of lines: minimum of 4.
Line size: 8 bytes (equal to the transfer size).
Associativity: direct mapped.
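
The suggested configuration can be sketched as follows. This is a toy model: the function name `direct_mapped_transfers`, the slot-selection rule (line number modulo the number of lines), and the final flush are assumptions for illustration, not details given in the text or in [43].

```python
LINE_SIZE = 8    # bytes, equal to the transfer size
NUM_LINES = 4    # the suggested minimum


def direct_mapped_transfers(write_addrs, num_lines=NUM_LINES, line_size=LINE_SIZE):
    """Count line transfers to memory for a direct-mapped WAC (toy model)."""
    resident = [None] * num_lines     # line number held in each slot, or None
    transfers = 0
    for addr in write_addrs:
        line = addr // line_size
        slot = line % num_lines       # direct mapping: one possible slot per line
        if resident[slot] != line:
            if resident[slot] is not None:
                transfers += 1        # the conflicting line is written back
            resident[slot] = line
    transfers += sum(r is not None for r in resident)   # final flush
    return transfers


# Four lines that map to four different slots: repeated writes are absorbed.
print(direct_mapped_transfers([0, 8, 16, 24, 0, 8, 16, 24]))   # 4
# Two lines that map to the same slot: every write forces a write-back.
print(direct_mapped_transfers([0, 32, 0, 32]))                  # 4
```

The second trace shows the cost of direct mapping in this model: lines that collide in one slot evict each other. The text notes that the measured data are generally independent of set associativity, which suggests such conflicts were not a dominant effect in the benchmarks used.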