Consider the Intel Pentium™ processor. This processor has an 8-way interleaved data cache and can make two references per processor cycle. The caches have the same cycle time as the processor. For the Intel instruction set, Prob (data references per instruction) = 0.6.
Note that the instruction issue mechanism is the source here (not some pipeline stage, as discussed earlier). This mechanism is modeled as having two independent requestors, one for each of the instructions that can be issued, each with Prob (data reference per instruction) = δ, where δ = 0.6.
Since the Pentium tries to execute two instructions each cycle, we have n = 2δ = 1.2 offered data references per cycle and m = 8 cache modules. Using the δ-binomial model, we get:

B(m, n, δ) = B(8, 1.2, 0.6).
The relative performance is B(m, n, δ)/n ≈ 0.96, i.e., the processor slows down by about 4% due to contention.
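The arithmetic behind the 4% figure can be checked with a short sketch. It assumes the δ-binomial bandwidth has the closed form B(m, n, δ) = m + n − δ/2 − sqrt((m + n − δ/2)² − 2mn), which is taken here as the model introduced earlier in the text rather than quoted from this example. The function and variable names are hypothetical, chosen only for illustration.

```python
import math


def delta_binomial_bandwidth(m: int, n: float, delta: float) -> float:
    """Achieved bandwidth (requests per cycle) under the ASSUMED closed form
    B(m, n, delta) = a - sqrt(a^2 - 2*m*n), with a = m + n - delta/2."""
    a = m + n - delta / 2.0
    return a - math.sqrt(a * a - 2.0 * m * n)


m = 8                     # interleaved data cache modules
delta = 0.6               # Prob(data reference per instruction)
issue_width = 2           # instructions the processor tries to issue per cycle
n = issue_width * delta   # offered data references per cycle (1.2)

achieved = delta_binomial_bandwidth(m, n, delta)
relative = achieved / n   # achieved bandwidth relative to offered bandwidth

print(f"B({m}, {n:.1f}, {delta}) = {achieved:.3f}")
print(f"relative performance = {relative:.3f}")
```

Under that assumed formula the achieved bandwidth comes out near 1.15 references per cycle against 1.2 offered, a ratio of roughly 0.96, which agrees with the slowdown of about 4% stated above.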
The primary objective of modern memory systems design is capacity (or size) at a low per-bit cost, but large memory capacity necessarily implies slow access time. Even if chip access is fast, the system's overhead, including bus signal transmission, error checking, and address distribution, adds significant delay; for most available technology, these overhead delays are likely to increase relative to decreasing machine cycle times.
Faced with multiple-cycle memory access time, the designer can at least provide adequate memory bandwidth to match or exceed the offered bandwidth (request rate) from the processor. There are two ways of accomplishing this: