|
|
|
|
|
|
6.3.1 Memory Systems Design |
|
|
|
|
|
|
|
|
The design of a high-performance memory system is an iterative process, where the bandwidth and partitioning of the system are determined by evaluation of cost, access time, and queueing requirements. As a general rule, more modules provide more low-order interleaving and more bandwidth, thus reducing queueing delays and improving access time. However, as interleaving increases, system costs are raised and the interconnection network becomes more complex, expensive, and potentially slower. |
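As an aside on low-order interleaving, the short Python sketch below shows how consecutive word addresses map onto the memory modules; the function name and module count are illustrative choices, not taken from the text.

    # Sketch of low-order interleaving: consecutive word addresses are
    # spread across num_modules memory modules (illustrative only).
    def module_and_offset(word_address: int, num_modules: int) -> tuple[int, int]:
        """Module selected by the low-order address bits; offset by the rest."""
        return word_address % num_modules, word_address // num_modules

    # With 8 modules, eight consecutive references fall in eight different
    # modules and can be overlapped in time.
    for addr in range(12):
        module, offset = module_and_offset(addr, 8)
        print(f"address {addr:2d} -> module {module}, offset {offset}")

Doubling the number of modules doubles the number of sequential requests that can be in progress at once, which is the bandwidth gain described above, at the cost of a wider and more expensive interconnection.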
|
|
|
|
|
|
|
|
The basic steps in the design of the memory system consist of the following: |
|
|
|
|
|
|
|
|
1. Determination of the number of memory modules and the partitioning of the memory system. The initial memory partition is determined by the relative cost of modules of various sizes, as well as an initial assessment of the bandwidth required. Associated with this tradeoff is the choice of physical word size. Longer words provide enhanced sequential access, but they force a large minimum module size when large-scale (2ⁿ × 1-bit) memory chips are used, since each chip then supplies only one bit of every word (a sizing sketch follows this item).
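The word-size tradeoff in step 1 can be made concrete with the rough sizing sketch below; the chip size, word width, and capacity figures are assumed for illustration only.

    # Hypothetical sizing sketch: a w-bit word built from (2**n x 1-bit)
    # chips needs w chips, so the minimum module capacity is w * 2**n bits.
    def min_module_bytes(word_bits: int, chip_bits: int) -> int:
        """Smallest module buildable from 1-bit-wide chips of the given capacity."""
        return word_bits * chip_bits // 8

    def modules_needed(total_bytes: int, word_bits: int, chip_bits: int) -> int:
        """Minimum number of modules for a given total capacity (ceiling division)."""
        module_bytes = min_module_bytes(word_bits, chip_bits)
        return -(-total_bytes // module_bytes)

    # Example: 64-bit words from 1M x 1-bit chips give an 8 MB minimum module,
    # so a 32 MB memory can be partitioned into at most 4 such modules.
    print(min_module_bytes(64, 2**20))            # 8388608 bytes (8 MB)
    print(modules_needed(32 * 2**20, 64, 2**20))  # 4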
|
|
|
|
|
|
|
|
2. Determination of the offered bandwidth. As was mentioned before, the offered bandwidth is determined by the number of requests made on the memory system per memory cycle by one or more processors. This rate is the product of the peak instruction processing rate for which the processor is designed, the expected number of memory references per instruction, and the number of processors in the processing ensemble (see the chapter 5 case study); a sketch of this calculation follows this item. The memory system should be designed to accommodate the peak instruction execution rate, not the average rate, since the processor typically executes in bursts of high-performance execution interrupted by dead time for contention resolution.
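A back-of-the-envelope form of this calculation is sketched below; the processor count, peak rate, references per instruction, and memory cycle time are assumed values, not figures from the text or the chapter 5 case study.

    # Offered bandwidth estimate: peak instruction rate x references per
    # instruction x number of processors, expressed per memory cycle.
    # All numeric values below are illustrative assumptions.
    def offered_requests_per_memory_cycle(peak_mips: float,
                                          refs_per_instruction: float,
                                          num_processors: int,
                                          memory_cycle_ns: float) -> float:
        requests_per_second = peak_mips * 1e6 * refs_per_instruction * num_processors
        return requests_per_second * memory_cycle_ns * 1e-9

    # Example: four processors with a 50-MIPS peak rate, 1.5 references per
    # instruction, and an 80-ns memory cycle offer about 24 requests per
    # memory cycle, so the design must supply comparable interleaved bandwidth.
    print(offered_requests_per_memory_cycle(50.0, 1.5, 4, 80.0))  # ~24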
|
|
|
|
|
|
|
|
3. The interconnection network. The interconnection network may provide an additional source of contention and bandwidth limitation, especially where n processors access m modules, n ≥ 2. For simple designs with a small number of processors, a high-performance time-multiplexed bus or a small crossbar switch is commonly used, providing access from the processors to the memory without contention. For these cases, it is merely necessary to assess the physical delay through the network and adjust the overall access time accordingly. For complex memory systems with large numbers of processors, a crossbar switch becomes exceedingly expensive, and a network that itself introduces contention is commonly used instead. Such networks reduce overall interconnection cost at the expense of somewhat reduced bandwidth and somewhat increased access time due to network contention (a cost-comparison sketch follows this item).
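The cost argument in step 3 can be illustrated with the rough comparison below, which contrasts an n × m crossbar with a multistage network of 2 × 2 switching elements, one common example of a network that trades added contention for lower cost; the cost model is an assumption for illustration, not a figure from the text.

    import math

    # Assumed cost model: an n x m crossbar needs n*m crosspoints, while an
    # n-port multistage network of 2x2 elements needs (n/2)*log2(n) switches.
    def crossbar_crosspoints(n_processors: int, m_modules: int) -> int:
        return n_processors * m_modules

    def multistage_switches(n_ports: int) -> int:
        return (n_ports // 2) * int(math.log2(n_ports))

    # Crossbar cost grows quadratically for n = m, the multistage cost only
    # as n log n; the multistage network, however, reintroduces contention.
    for n in (4, 16, 64, 256):
        print(f"{n:4d} ports: crossbar {crossbar_crosspoints(n, n):6d}, "
              f"multistage {multistage_switches(n):5d}")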
|
|
|
|
|
|
|
|
4. Referencing behavior. An important part of the evaluation process is an assessment of the probable behavior of programs in their sequence of requests to memory. Three cases are of particular interest:
|
|
|
 |
|
|
|
|
(a) Purely sequential; i.e., each request follows its predecessor.
|
|
|
 |
|
|
|
|
(b) Random; the addresses are uniformly distributed, at least across the low-order interleave partition of memory.
|
|
|
|
|