|
|
|
Table 10.6 Weighted pipeline delay summary.

| Sequence | EX/EX | EX/AG | LD/EX | EX/ST | LD/ST | Branch | Run-on | Total |
|---|---|---|---|---|---|---|---|---|
| (a) Reduced-scale processor | | | | | | | | |
| (b) Super-pipelined processor | | | | | | | | |
|
|
|
|
|
|
Continuing to execute the in-line path of a branch is a simple and common form of speculative execution in processors.
|
|
|
|
|
|
|
|
In both of these cases, there is clearly no advantage to assuming the target path when a branch is encountered. In many cases the choice is not so clear, and a full analysis, as in study 4.7, must be performed.
|
|
|
|
|
|
|
|
Now, from Table 3.10, we can calculate the effective penalty for both processors using the delays for branches going either in-line or to target, assuming in-line prediction. For the reduced-scale processor, we have a 2-cycle penalty for unconditional branches, which are 20% of the branch distribution; a 0-cycle penalty for conditionals that go in-line, which are 36.8% of the distribution; and a 2-cycle penalty for conditionals that go to target, which are 43.2% of the distribution. This gives a weighted penalty of 1.264 cycles per branch, and since branches comprise 13% of the instruction mix (Table 3.4), an aggregate branch penalty of 0.164 cycles per instruction. Similarly, for the super-pipelined processor we get an aggregate penalty of 0.185 cycles per instruction.
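The weighted-penalty arithmetic above can be checked with a short script; the class fractions and per-class penalties are taken directly from the text, while the dictionary and variable names are purely illustrative:

```python
# Weighted branch penalty for the reduced-scale processor,
# assuming in-line (not-taken) prediction.
# Each entry: (fraction of all branches, penalty in cycles).
branch_classes = {
    "unconditional":          (0.20,  2),  # always redirects: 2-cycle penalty
    "conditional, in-line":   (0.368, 0),  # correctly assumed in-line: no penalty
    "conditional, to target": (0.432, 2),  # goes to target: 2-cycle penalty
}

# Average penalty per branch instruction.
penalty_per_branch = sum(f * p for f, p in branch_classes.values())

# Branches are 13% of the instruction mix (Table 3.4), so the
# contribution to overall delay per instruction is:
branch_fraction = 0.13
penalty_per_instruction = penalty_per_branch * branch_fraction

print(f"{penalty_per_branch:.3f} cycles per branch")
print(f"{penalty_per_instruction:.3f} cycles per instruction")
```

Running the same calculation with the super-pipelined processor's delays yields the 0.185-cycle figure quoted above.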
|
|
|
|
|
|
|
|
Finally, for run-on delays, we use the same simplifying assumption of 0.6 cycles used in study 4.3; the actual value is difficult to determine analytically, since there are many possible code sequences that must be considered. For our purposes, the assumption provides a feel
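For a sense of how the entries of Table 10.6 combine, the per-class contributions add into a total delay per instruction. A minimal sketch follows; only the branch (0.164) and run-on (0.6) figures come from the text, and every other entry is a zero placeholder, not a value from the table:

```python
# Total weighted pipeline delay as the sum of per-class contributions
# (cycles of delay per instruction). Non-branch, non-run-on entries
# are placeholders for illustration only.
delays = {
    "EX/EX": 0.0,     # placeholder
    "EX/AG": 0.0,     # placeholder
    "LD/EX": 0.0,     # placeholder
    "EX/ST": 0.0,     # placeholder
    "LD/ST": 0.0,     # placeholder
    "branch": 0.164,  # aggregate branch penalty computed in the text
    "run-on": 0.6,    # simplifying assumption from study 4.3
}

total_delay = sum(delays.values())

# Assuming an ideal base of one cycle per instruction, the effective
# CPI is 1 plus the total delay per instruction.
cpi = 1.0 + total_delay
print(f"total delay = {total_delay:.3f}, CPI = {cpi:.3f}")
```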
|
|
|
|
|