Data reference traffic behaves differently from instruction traffic, which tends to exhibit more spatial locality because instructions are fetched sequentially. In a load/store (L/S) machine, data is explicitly loaded into a register before use; thus, each data reference should be the size of a data word.

| Type | Baseline | Superscalar | MP |

Perfect versus Real Buffers

In evaluating performance, the effect of data reference traffic must be taken into account. Assuming a perfect buffer removes most of the performance degradation due to data stores. A perfect buffer is one large enough that the probability of overflow tends to zero. If memory occupancy is low, this assumption may be valid. Ultimately, the bandwidth of the off-chip memory must exceed that of the reference traffic; otherwise a closed-queue situation occurs, in which the processor's request rate must be slowed to the memory's service rate.
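
The perfect-buffer assumption can be illustrated with a small simulation. The following is a minimal sketch, not a model taken from the text: it assumes that in each cycle the processor issues a store with a fixed probability and the memory retires one buffered store with a fixed probability. The function name `overflow_fraction` and its parameters are hypothetical.

```python
import random


def overflow_fraction(buffer_size, store_prob, drain_prob, cycles=50_000, seed=0):
    """Fraction of stores that find the write buffer full.

    Assumed per-cycle model (an illustration only): memory retires one
    buffered store with probability drain_prob, then the processor issues
    a store with probability store_prob.
    """
    rng = random.Random(seed)
    occupancy = 0
    stores = 0
    overflows = 0
    for _ in range(cycles):
        # Draw both random numbers every cycle so that runs with different
        # buffer sizes see exactly the same sequence of events.
        memory_ready = rng.random() < drain_prob
        store_issued = rng.random() < store_prob
        if memory_ready and occupancy > 0:
            occupancy -= 1
        if store_issued:
            stores += 1
            if occupancy < buffer_size:
                occupancy += 1
            else:
                overflows += 1  # the processor would have to stall here
    return overflows / stores if stores else 0.0


if __name__ == "__main__":
    # Service rate above request rate: overflow shrinks as the buffer grows.
    for size in (1, 2, 4, 8, 16):
        print(size, overflow_fraction(size, store_prob=0.20, drain_prob=0.50))
    # Request rate above service rate: no buffer size removes the stalls.
    print(64, overflow_fraction(64, store_prob=0.60, drain_prob=0.30))
```

Under these assumptions, the overflow fraction falls toward zero as the buffer grows only while the memory can retire stores faster than the processor issues them. When the request rate exceeds the service rate, the buffer eventually fills regardless of its size, which corresponds to the closed-queue situation.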

Since data store traffic is usually lower than read traffic (instruction reads plus data reads) and is less critical to performance, the write buffer can be designed for either maximum or average traffic. The actual amount of write traffic depends on the cache management strategy and is discussed in the next section.

There are three ways to view the data store rate. The figure of 0.20 data stores per instruction is necessarily an average, since no single instruction can issue 0.20 of a reference. For maximum traffic, a sequence of back-to-back data stores could feasibly occur, giving one data write per cycle. At the other end of the spectrum, we can compute the average number of data stores per cycle: for the baseline CPI of 1.47, the rate is 0.20/1.47 ≈ 0.14 stores/cycle. Thus, assuming 0.20 data stores/cycle seems to be a reasonable midpoint, as long as the write buffer is large enough to absorb a short period of peak traffic.
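
The three rates can be compared with a few lines of arithmetic. This is a minimal sketch that treats 0.20 as stores per instruction (which the division by CPI implies) and uses the baseline CPI of 1.47 from the text; the function name `stores_per_cycle` is hypothetical.

```python
def stores_per_cycle(stores_per_instruction: float, cpi: float) -> float:
    """Average store rate per cycle: stores per instruction divided by CPI."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return stores_per_instruction / cpi


# Figures from the text: 0.20 stores per instruction, baseline CPI of 1.47.
average_rate = stores_per_cycle(0.20, 1.47)
design_rate = 0.20  # midpoint assumed in the text, in stores per cycle
peak_rate = 1.0     # back-to-back stores: one write every cycle

print(f"average: {average_rate:.3f} stores/cycle")
print(f"design:  {design_rate:.3f} stores/cycle")
print(f"peak:    {peak_rate:.3f} stores/cycle")
```

The quotient 0.20/1.47 evaluates to about 0.136, so the assumed design rate of 0.20 stores/cycle lies between the average and the peak.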

In addition to the size of the write buffer, several design decisions concern the management of buffers and memory accesses. These decisions determine the number of cycles the processor stalls during memory references.

There are three classifications:

Simple: all data in the write buffer must be written to memory before the read request is processed. In other words, reads cannot bypass writes. For copy-back, write-allocate (CBWA) caches, the replaced line, if dirty, must be written out before the read can proceed.
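
The cost of the simple policy can be sketched as a latency calculation. The example below is an illustration under stated assumptions rather than a formula from the text: writes are assumed to drain one at a time with a fixed cost each, and the function name `simple_read_latency` and all of its parameters are hypothetical.

```python
def simple_read_latency(
    buffered_writes: int,
    write_cycles: int,
    read_cycles: int,
    dirty_line_write_cycles: int = 0,
) -> int:
    """Cycles until a read completes when reads cannot bypass writes.

    Assumes (for illustration) that buffered writes drain serially at a
    fixed cost each. dirty_line_write_cycles is the cost of writing out a
    dirty replaced line in a CBWA cache; use 0 when the line is clean.
    """
    values = (buffered_writes, write_cycles, read_cycles, dirty_line_write_cycles)
    if min(values) < 0:
        raise ValueError("all arguments must be non-negative")
    drain_cycles = buffered_writes * write_cycles
    return drain_cycles + dirty_line_write_cycles + read_cycles


# Hypothetical numbers: 3 buffered writes of 4 cycles each, a 10-cycle read.
clean = simple_read_latency(3, 4, 10)
dirty = simple_read_latency(3, 4, 10, dirty_line_write_cycles=8)
print(clean, dirty)
```

With an empty buffer and a clean replaced line, the latency reduces to the read time alone; every buffered write or dirty line write-back adds directly to the read's stall time.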