Table 10.10 Bandwidth requirements.

                                  Baseline   Superscalar   Multiprocessor
Peak issue rate                   1          2             2
Peak instruction bw (MB/s)        500        1000          1000
Average instruction bw (MB/s)     340        600           629
Average data reads/cycle          0.224      0.396         0.42
Average data read bw (MB/s)       112        198           210
Average data writes/cycle         0.16       0.282         0.296
Average data write bw (MB/s)      80         141           148


		Basics of Performance



		Execution time = (Instructions/Program) * CPI * cycle time.
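		As a minimal sketch of how this relation is used, the following Python fragment evaluates it for a fixed instruction count and cycle time while the CPI varies. The 8 ns cycle time comes from the text; the instruction count and the CPI values are hypothetical and chosen only for illustration, as is the function name.

```python
def execution_time_s(instruction_count, cpi, cycle_time_ns):
    """Execution time in seconds: instructions * CPI * cycle time."""
    return instruction_count * cpi * cycle_time_ns * 1e-9


CYCLE_TIME_NS = 8.0       # cycle time stated in the text
INSTRUCTIONS = 1_000_000  # hypothetical program size, not from the text

if __name__ == "__main__":
    # Hypothetical CPI values: with the other two factors held constant,
    # execution time scales linearly with CPI.
    for cpi in (1.0, 1.5, 2.0):
        t = execution_time_s(INSTRUCTIONS, cpi, CYCLE_TIME_NS)
        print(f"CPI = {cpi:.1f}  execution time = {t * 1e3:.3f} ms")
```

		With the instruction count and the cycle time held constant, any reduction in CPI translates directly into a proportional reduction in execution time.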



		Since all three implementations use the same optimizing compiler, the first parameter, instructions per program, remains constant. This implies that the compiler does not resolve pipeline dependencies by inserting NOP instructions to delay instructions; all conflicts are resolved in hardware. The third parameter, cycle time, is also assumed to be constant at 8 ns. We have to be wary of this assumption: it requires that none of the modules we modify is on the critical path, or becomes the critical path at any time in the future. Cache associativity is an example: we assume that selecting a cache with higher associativity does not penalize us in cycle time. This leaves us with the task of optimizing the CPI. For this study, the freedom of design lies in the memory hierarchy. Since the core processor has been preselected, the following CPI components of the pipeline are fixed:



		Procedural dependencies.



		Data conflicts.



		Resource conflicts.



		The remaining CPI components are mostly related to the memory hierarchy design. To facilitate this, the area available for cache design needs to be evaluated for each implementation.



		Bandwidth Calculations



		It is useful to do some approximate calculations before proceeding (Table 10.10). Since we are running at 8 ns per cycle, or 125 MHz, the processor's bandwidth requirements follow directly from the issue rate and the average number of memory references per cycle. The memory hierarchy design needs to supply this bandwidth, or it becomes the limiting factor. As we go from the baseline to the superscalar to the multiprocessor implementation, a key design goal is to supply sufficient bandwidth. Of course, because of area and cost limitations, we may not be able to satisfy some target requirements, and this determines the best implementation.
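		The entries of Table 10.10 can be approximately reproduced with a short calculation. The sketch below assumes 4 bytes per instruction fetch and per data access; that width is not stated in the text and is inferred from the 500 MB/s peak instruction bandwidth at one instruction per cycle and 125 MHz. The function and variable names are illustrative only.

```python
CYCLE_TIME_NS = 8.0   # cycle time stated in the text
BYTES_PER_ACCESS = 4  # assumption: inferred from 500 MB/s at 1 instr/cycle


def clock_mhz(cycle_time_ns):
    """Clock frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_time_ns


def bandwidth_mb_s(accesses_per_cycle,
                   cycle_time_ns=CYCLE_TIME_NS,
                   bytes_per_access=BYTES_PER_ACCESS):
    """Bandwidth in MB/s (10^6 bytes/s) for a given access rate per cycle."""
    return accesses_per_cycle * clock_mhz(cycle_time_ns) * bytes_per_access


# Per-cycle rates copied from Table 10.10.
RATES = {
    "Baseline":       {"peak_issue": 1, "reads": 0.224, "writes": 0.16},
    "Superscalar":    {"peak_issue": 2, "reads": 0.396, "writes": 0.282},
    "Multiprocessor": {"peak_issue": 2, "reads": 0.42,  "writes": 0.296},
}

if __name__ == "__main__":
    for name, r in RATES.items():
        print(f"{name}: "
              f"peak instr {bandwidth_mb_s(r['peak_issue']):.0f} MB/s, "
              f"data read {bandwidth_mb_s(r['reads']):.0f} MB/s, "
              f"data write {bandwidth_mb_s(r['writes']):.0f} MB/s")
```

		Under the 4-byte assumption, each bandwidth row of the table is simply the corresponding per-cycle rate multiplied by 500 MB/s, which makes it easy to see how a higher issue rate raises the demand placed on the memory hierarchy.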
