Table 10.10 Bandwidth requirements.

| | Baseline | Superscalar | Multiprocessor |
|---|---|---|---|
| Peak issue rate | 1 | 2 | 2 |
| Peak instruction bandwidth (MB/s) | 500 | 1000 | 1000 |
| Average instruction bandwidth (MB/s) | 340 | 600 | 629 |
| Average data reads/cycle | 0.224 | 0.396 | 0.42 |
| Average data read bandwidth (MB/s) | 112 | 198 | 210 |
| Average data writes/cycle | 0.16 | 0.282 | 0.296 |
| Average data write bandwidth (MB/s) | 80 | 141 | 148 |

Execution time = (Instructions/Program) × CPI × Cycle time
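
As a rough illustration of how this relation is used, the sketch below evaluates it for two designs that differ only in CPI. The instruction count and both CPI values are hypothetical placeholders chosen for the arithmetic, not figures from this study; only the 8 ns cycle time comes from the text.

```python
def execution_time_s(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time = instructions/program * CPI * cycle time, in seconds."""
    return instruction_count * cpi * cycle_time_ns * 1e-9


CYCLE_TIME_NS = 8.0          # cycle time stated in the text
INSTRUCTIONS = 1_000_000     # hypothetical program size (assumption)

# Hypothetical CPI values, for illustration only (assumption).
base = execution_time_s(INSTRUCTIONS, cpi=1.5, cycle_time_ns=CYCLE_TIME_NS)
improved = execution_time_s(INSTRUCTIONS, cpi=1.2, cycle_time_ns=CYCLE_TIME_NS)

# With instruction count and cycle time held constant, the speedup
# reduces to the ratio of the two CPI values.
speedup = base / improved
print(f"base={base:.6f} s, improved={improved:.6f} s, speedup={speedup:.3f}")
```

Because the first and third factors cancel in the ratio, any comparison between the implementations under these assumptions depends on CPI alone.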

Since all three implementations use the same optimizing compiler, the first parameter, instructions per program, remains constant. This implies that the compiler does not resolve pipeline dependencies by inserting NOP instructions to delay instructions; all conflicts are resolved in hardware. The third parameter, cycle time, is also assumed to be constant at 8 ns. We have to be wary of this assumption, since it presumes that none of the modules we modify lies on the critical path or will come to lie on it later. Cache associativity is an example: we assume that selecting a cache with higher associativity does not penalize us in cycle time. This leaves us with the task of optimizing the CPI. For this study, the freedom of design lies in the memory hierarchy. Since the core processor has been preselected, the CPI components of the pipeline are fixed:

The remaining CPI components are mostly related to the memory hierarchy design. To support that design, the area available for caches needs to be evaluated for each implementation.

It is useful to do some approximate calculations before proceeding (Table 10.10). Since we are running at 8 ns per cycle, or 125 MHz, the processor's bandwidth requirements follow directly from its per-cycle instruction and data access rates. The memory hierarchy design needs to take these requirements into account, or the memory system becomes the limiting factor. As we go from the baseline to the superscalar to the multiprocessor implementation, a key design goal is to supply sufficient bandwidth. Of course, because of area and cost limitations, we may not be able to satisfy some target requirements, and this determines the best implementation.
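
The data bandwidth rows of Table 10.10 can be reproduced from the per-cycle access rates and the clock frequency. The sketch below assumes 4 bytes per access; the text does not state this width, but it is what the 500 MB/s peak instruction bandwidth at one instruction per cycle and 125 MHz implies. The function and variable names are illustrative only.

```python
CYCLE_TIME_NS = 8.0      # cycle time stated in the text
BYTES_PER_ACCESS = 4     # assumption: inferred from 500 MB/s at 1 instr/cycle, 125 MHz


def clock_mhz(cycle_time_ns: float) -> float:
    """Convert a cycle time in nanoseconds to a clock frequency in MHz."""
    return 1000.0 / cycle_time_ns


def bandwidth_mb_s(accesses_per_cycle: float,
                   cycle_time_ns: float = CYCLE_TIME_NS,
                   bytes_per_access: int = BYTES_PER_ACCESS) -> float:
    """Bandwidth in MB/s = accesses/cycle * cycles/us * bytes/access."""
    return accesses_per_cycle * clock_mhz(cycle_time_ns) * bytes_per_access


# Per-cycle rates copied from Table 10.10.
rates = {
    "Baseline":       {"peak_issue": 1, "reads": 0.224, "writes": 0.16},
    "Superscalar":    {"peak_issue": 2, "reads": 0.396, "writes": 0.282},
    "Multiprocessor": {"peak_issue": 2, "reads": 0.42,  "writes": 0.296},
}

for name, r in rates.items():
    peak = bandwidth_mb_s(r["peak_issue"])
    read = bandwidth_mb_s(r["reads"])
    write = bandwidth_mb_s(r["writes"])
    print(f"{name}: peak instr {peak:.0f} MB/s, "
          f"data read {read:.0f} MB/s, data write {write:.0f} MB/s")
```

Under the 4-byte assumption, the computed values agree with the peak instruction, data read, and data write bandwidth rows of the table, which suggests those rows were derived in the same way.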