
Page 206
In this case, the decode of TI + 1 is delayed by 6 cycles. If the branch is equally likely to go in-line as to take the target (TI), the effective penalty is 5.5 cycles.
(ii) The effects of the unconditional branch (BR) can be similarly evaluated:
[Figure: 0206-01.gif]
For this case, the penalty is 4 cycles.
Now, assuming that conditional branches make up 15% of instructions and unconditional branches 5%, we can compute the effect of branches on processor performance.
CPI = 1 (decode) + 0.15(5.5) + 0.05(4)
    = 1 + 0.825 + 0.20
    = 2.025.
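
The branch calculation above is a base rate plus a frequency-weighted sum of penalties. A minimal sketch of that arithmetic follows; the function name `effective_cpi` and the dictionary layout are illustrative assumptions, not notation from the text.

```python
# Sketch: CPI as a base rate plus frequency-weighted stall penalties.
# effective_cpi and branch_stalls are hypothetical names chosen for illustration.

def effective_cpi(base_cpi, stalls):
    """Return base_cpi plus the sum of frequency * penalty over all stall sources."""
    return base_cpi + sum(freq * penalty for freq, penalty in stalls.values())

# (frequency per instruction, penalty in cycles), values taken from the text
branch_stalls = {
    "conditional branch": (0.15, 5.5),    # 15% of instructions, 5.5-cycle average penalty
    "unconditional branch": (0.05, 4.0),  # 5% of instructions, 4-cycle penalty
}

cpi_with_branches = effective_cpi(1.0, branch_stalls)
print(f"CPI with branches: {cpi_with_branches:.3f}")
```

Keeping the stall sources in one table makes it easy to add further penalty classes without rewriting the sum.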

(iii) Consider the effect of address dependencies:
Cases such as:
LD R5, ALPHA[R6,R7]
LD R6, BETA[R5,R7]
are evaluated as:
[Figure: 0206-02.gif]
This results in a 3-cycle penalty. Similarly,
ADD R5, ALPHA[R6,R7]
LD  R6, BETA[R5,R7]
is evaluated as:
[Figure: 0206-03.gif]
This results in a 5-cycle penalty.
Assume the 3-cycle penalty occurs in 4% of instruction executions and the 5-cycle penalty in 1.5%. The performance is now:
CPI = 2.025 + 0.04(3) + 0.015(5)
    = 2.025 + 0.12 + 0.075
    = 2.22.
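
To see how much each stall source contributes to the final figure, the terms can be tabulated separately. This is a sketch only: the labels and the dictionary `contributions` are illustrative assumptions, while the frequencies and penalties are those given in the text.

```python
# Sketch: per-source contribution to the final CPI.
# Labels are hypothetical; each value is frequency * penalty from the text.
contributions = {
    "decode (base)": 1.0,
    "conditional branch": 0.15 * 5.5,
    "unconditional branch": 0.05 * 4,
    "address dependency, 3-cycle": 0.04 * 3,
    "address dependency, 5-cycle": 0.015 * 5,
}

total_cpi = sum(contributions.values())

# Print each source's cycles per instruction and its share of the total.
for source, cycles in contributions.items():
    print(f"{source:30s} {cycles:6.3f}  {cycles / total_cpi:6.1%}")
print(f"{'total':30s} {total_cpi:6.3f}")
```

In this breakdown the conditional-branch term is the largest penalty, several times the combined cost of the two address-dependency cases.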

 