Table 10.10 Bandwidth requirements.
Parameter                     Baseline  Superscalar  Multiprocessor
Peak issue rate                      1            2               2
Peak instruction bw (MB/s)         500         1000            1000
Average instr. bw (MB/s)           340          600             629
Average data reads/cycle         0.224        0.396            0.42
Average data read bw (MB/s)        112          198             210
Average data writes/cycle         0.16        0.282           0.296
Average data write bw (MB/s)        80          141             148

Basics of Performance
Execution time = (Instructions/Program) * CPI * Cycle time.
Since all three implementations use the same optimizing compiler, the first parameter, instructions per program, remains constant. This implies that the compiler does not resolve pipeline dependencies by inserting NOP instructions to delay execution; all conflicts are resolved in hardware. The third parameter, cycle time, is also assumed constant at 8 ns. We have to be wary of this assumption, since it requires that none of the modules we modify lies on the critical path, or comes to lie on it later. Cache associativity is an example: under this assumption, selecting a cache with higher associativity does not penalize us in cycle time. This leaves us with the task of optimizing the CPI. For this study, the design freedom lies in the memory hierarchy. Since the core processor has been preselected, the following pipeline components of the CPI are fixed:
Procedural dependencies.
Data conflicts.
Resource conflicts.
The remaining CPI components are mostly related to the memory hierarchy design. To guide that design, the area available for the caches needs to be evaluated for each implementation.
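The decomposition above can be sketched in a few lines of Python. This is only an illustration: the additive split of CPI into a fixed pipeline part and a memory-hierarchy part is the usual textbook simplification, and the instruction count and CPI values below are hypothetical, not figures from this study. Only the 8 ns cycle time comes from the text.

```python
CYCLE_TIME_NS = 8.0  # from the text: assumed constant for all implementations


def total_cpi(pipeline_cpi, memory_cpi):
    """Combine the fixed pipeline CPI with the memory-hierarchy CPI.

    Assumes the two contributions simply add, which is a simplification.
    """
    return pipeline_cpi + memory_cpi


def execution_time_ns(instruction_count, cpi, cycle_time_ns=CYCLE_TIME_NS):
    """Execution time = (instructions/program) * CPI * cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical values, chosen only to show the effect of the memory term.
instructions = 1_000_000   # fixed by the compiler
pipeline_cpi = 1.25        # fixed by the preselected core

for memory_cpi in (0.5, 0.25):
    cpi = total_cpi(pipeline_cpi, memory_cpi)
    time_ms = execution_time_ns(instructions, cpi) / 1e6
    print(f"memory CPI {memory_cpi}: total CPI {cpi}, time {time_ms} ms")
```

With the instruction count and cycle time held fixed, any change in execution time comes entirely from the memory term, which is why the memory hierarchy is the focus of the design effort here.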
Bandwidth Calculations
It is useful to do some approximate calculations before proceeding (Table 10.10). Since we are running at 8 ns per cycle, or 125 MHz, the processor's bandwidth requirements follow directly from its per-cycle instruction and data reference rates. The memory hierarchy design needs to take these requirements into account, or the memory system becomes the limiting factor. As we go from the baseline to the superscalar to the multiprocessor implementation, a key design goal is to supply sufficient bandwidth. Of course, because of area and cost limitations, we may not be able to satisfy some target requirements, and this determines the best implementation.
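The entries in Table 10.10 can be reproduced with a short calculation. The sketch below assumes 4-byte instructions and data words and takes 1 MB as 10^6 bytes. Neither is stated on this page, but both are consistent with the tabulated values (for example, one instruction per cycle at 125 MHz gives 500 MB/s). The function names are illustrative only.

```python
CYCLE_TIME_NS = 8.0  # from the text
WORD_BYTES = 4       # assumption: 4-byte instructions and data words


def clock_mhz(cycle_time_ns=CYCLE_TIME_NS):
    """Clock frequency in MHz for a given cycle time in ns."""
    return 1000.0 / cycle_time_ns


def bandwidth_mb_per_s(words_per_cycle, word_bytes=WORD_BYTES,
                       cycle_time_ns=CYCLE_TIME_NS):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for a per-cycle reference rate."""
    return words_per_cycle * word_bytes * clock_mhz(cycle_time_ns)


# Per-cycle rates taken from Table 10.10.
rates = {
    "Baseline":       {"issue": 1, "reads": 0.224, "writes": 0.16},
    "Superscalar":    {"issue": 2, "reads": 0.396, "writes": 0.282},
    "Multiprocessor": {"issue": 2, "reads": 0.42,  "writes": 0.296},
}

for name, r in rates.items():
    peak_instr = bandwidth_mb_per_s(r["issue"])
    read_bw = bandwidth_mb_per_s(r["reads"])
    write_bw = bandwidth_mb_per_s(r["writes"])
    print(f"{name}: peak instr {peak_instr:.0f} MB/s, "
          f"read {read_bw:.0f} MB/s, write {write_bw:.0f} MB/s")
```

Under these assumptions the computed peak instruction, data read, and data write bandwidths match the table rows for all three implementations.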

 