|
|
|
Table 10.6 Weighted pipeline delay summary.

| Sequence | EX/EX | EX/AG | LD/EX | EX/ST | LD/ST | Branch | Run-on | Total |
|---|---|---|---|---|---|---|---|---|
| (a) Reduced-scale processor | | | | | | | | |
| (b) Super-pipelined processor | | | | | | | | |
|
|
|
|
|
|
Continuing to execute the in-line path of a branch is a simple and common form of speculative execution in processors.
|
|
|
|
|
|
|
|
In both of these cases, there is clearly no advantage to assuming the target path when a branch is encountered. In many cases the choice is not so clear, and a full analysis, as in study 4.7, must be performed.
|
|
|
|
|
|
|
|
Now, from Table 3.10, we can calculate the effective penalty for both processors using the delays for branches going either in-line or to target, assuming in-line prediction. For the reduced-scale processor, we have a 2-cycle penalty for unconditional branches, which are 20% of the branch distribution; a 0-cycle penalty for conditionals that go in-line, which are 36.8% of the distribution; and a 2-cycle penalty for conditionals that go to target, which are 43.2% of the distribution. This gives a weighted penalty of 1.264 cycles per branch, and since branches comprise 13% of the instruction mix (Table 3.4), an aggregate branch penalty of 0.164 cycles per instruction. Similarly, for the super-pipelined processor we get an aggregate penalty of 0.185 cycles per instruction.
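The weighted-penalty arithmetic above can be checked with a short script; the class fractions and per-class penalties are taken directly from the text, while the dictionary and variable names are purely illustrative:

```python
# Weighted branch penalty for the reduced-scale processor,
# assuming in-line (not-taken) prediction.
# Each entry: (fraction of all branches, penalty in cycles).
branch_classes = {
    "unconditional":          (0.20,  2),  # always redirects: 2-cycle penalty
    "conditional, in-line":   (0.368, 0),  # correctly assumed in-line: no penalty
    "conditional, to target": (0.432, 2),  # goes to target: 2-cycle penalty
}

# Average penalty per branch instruction.
penalty_per_branch = sum(f * p for f, p in branch_classes.values())

# Branches are 13% of the instruction mix (Table 3.4), so the
# contribution to overall delay per instruction is:
branch_fraction = 0.13
penalty_per_instruction = penalty_per_branch * branch_fraction

print(f"{penalty_per_branch:.3f} cycles per branch")
print(f"{penalty_per_instruction:.3f} cycles per instruction")
```

Running the same calculation with the super-pipelined processor's delays yields the 0.185-cycle figure quoted above.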
|
|
|
|
|
|
|
|
Finally, for run-on delays, we use the same simplifying assumption of 0.6 cycles used in study 4.3; the actual value is difficult to determine analytically, since there are many possible code sequences that must be considered. For our purposes, the assumption provides a feel
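For a sense of how the entries of Table 10.6 combine, the per-class contributions add into a total delay per instruction. A minimal sketch follows; only the branch (0.164) and run-on (0.6) figures come from the text, and every other entry is a zero placeholder, not a value from the table:

```python
# Total weighted pipeline delay as the sum of per-class contributions
# (cycles of delay per instruction). Non-branch, non-run-on entries
# are placeholders for illustration only.
delays = {
    "EX/EX": 0.0,     # placeholder
    "EX/AG": 0.0,     # placeholder
    "LD/EX": 0.0,     # placeholder
    "EX/ST": 0.0,     # placeholder
    "LD/ST": 0.0,     # placeholder
    "branch": 0.164,  # aggregate branch penalty computed in the text
    "run-on": 0.6,    # simplifying assumption from study 4.3
}

total_delay = sum(delays.values())

# Assuming an ideal base of one cycle per instruction, the effective
# CPI is 1 plus the total delay per instruction.
cpi = 1.0 + total_delay
print(f"total delay = {total_delay:.3f}, CPI = {cpi:.3f}")
```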
|
|
|
|
|