
Page 676
for the magnitude of the problem while not overcomplicating the overall solution.
To determine the overall penalty of the two pipelines, we need to weight the penalties determined above with the engineering (scientific) environment instruction distribution values from Chapter 3. We use these values to determine the actual pipeline penalty for this instruction mix, using:
Ppipeline = Σd (wd × Pd)
where Ppipeline is the total penalty when considering the distribution weight (wd) in the instruction mix and the penalty (Pd) as calculated in Table 10.3. The results of these calculations are displayed in Table 10.6.
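The weighted-sum calculation above can be sketched in a few lines of Python. The instruction classes, weights, and per-class penalties below are hypothetical placeholders, not the actual values from Chapter 3 and Table 10.3:

```python
# Weighted pipeline penalty: Ppipeline = sum over d of (wd * Pd).
# The weights and penalties here are illustrative placeholders only,
# not the Chapter 3 distribution or the Table 10.3 penalties.
weights = {"alu": 0.50, "load_store": 0.30, "branch": 0.20}
penalties = {"alu": 0.0, "load_store": 1.0, "branch": 3.0}  # cycles

# Sum the per-class penalties, each weighted by its frequency in the mix.
p_pipeline = sum(weights[d] * penalties[d] for d in weights)
print(p_pipeline)
```

With these placeholder numbers the result is 0.5(0) + 0.3(1) + 0.2(3) = 0.9 cycles of penalty per instruction; the text's Table 10.6 values are obtained the same way with the real distribution.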
Note that although the reduced-scale version of the processor has a lower delay than the super-pipelined version, its cycles per instruction are significantly greater, since the base cycle time is 1 cycle for the reduced-scale version and ½ cycle for the super-pipelined version. Thus, we get an effective CPI of 1.811 cycles for the reduced-scale version and 1.448 cycles for the super-pipelined version, a reduction of 20%!
If this were all the analysis necessary to make a decision, the result would be complete and conclusive: choose the super-pipelined processor. Unfortunately, there are two additional factors to consider: the cost of memory accesses and the cost of fabrication for both processors. For the reduced-scale processor, the cache is enlarged to fill the available space (keeping the same die size); for the super-pipelined processor, the cache remains the same size as in the original version (with a slightly larger die size to accommodate the additional latch overhead and cache access rate).
10.1.5 Cache and Memory Analysis
The original Baseline Mark I processor used a split cache as its on-chip cache. The DTMRs for the instruction cache and data cache are 0.05 (Figure 5.30) and 0.08 (Figure 5.27), respectively. The miss rate for each cache is then calculated by using the equation:
MR = DTMR × fmapping × fsystem
In the case of the instruction cache, the values for fmapping and fsystem are 1.04 (Figure 5.10) and 1.75 (Figure 5.14), respectively. This yields a miss rate of 9.1%. For the data cache, the values for fmapping and fsystem are 1.04 (Figure 5.10) and 1.65 (Figure 5.14), giving a miss rate of 13.7%. From Table 5.6, the number of data reads per instruction is 0.33, and the number of data writes per instruction is 0.235. As a result, the effective miss rate for the I- and D-caches is 16.8% ((1)(0.091) + (0.33 + 0.235)(0.137)). The effective not-in-TLB rate for both TLBs is 1.02% ((1)(0.65%) + (0.33 + 0.235)(0.65%)).
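The cache arithmetic above, including the miss penalty computed next, can be laid out as a short sketch using only the figures quoted in the text:

```python
# Miss rate per cache: MR = DTMR * f_mapping * f_system.
# DTMRs from Figures 5.30 and 5.27; factors from Figures 5.10 and 5.14.
i_mr = 0.05 * 1.04 * 1.75   # instruction cache: ~9.1%
d_mr = 0.08 * 1.04 * 1.65   # data cache: ~13.7%

# References per instruction (Table 5.6): 1 instruction fetch,
# plus 0.33 data reads and 0.235 data writes.
eff_mr = 1.0 * i_mr + (0.33 + 0.235) * d_mr   # ~16.8%

# Miss penalty per instruction, with a 0-cycle hit and 5-cycle miss.
miss_penalty = 5 * eff_mr                     # ~0.84 cycles
print(i_mr, d_mr, eff_mr, miss_penalty)
```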
The miss penalty is then calculated using the hit and miss penalties, which are 0 and 5 cycles, respectively. Thus, we get a miss penalty of 5 cycles × 16.8% = 0.84 cycles. The not-in-TLB rate for a 2 × 64 = 128-entry TLB is

 