page_206




		In this case, the decode of TI + 1 is delayed 6 cycles. If the branch is as likely to go in-line as to take the target (TI), the effective penalty is 5.5 cycles.



		(ii) The effects of the unconditional branch (BR) can be similarly evaluated:



		For this case, the penalty is 4 cycles.



		Now, assuming that conditional branches make up 15% of executed instructions and unconditional branches 5%, we can compute the effect of branches on processor performance.



		CPI = 1 (decode) + 0.15(5.5) + 0.05(4)
		    = 1 + 0.825 + 0.20
		    = 2.025.
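
		The branch arithmetic can be checked with a short Python sketch. The function and variable names are illustrative and do not come from the text. The 5-cycle in-line penalty is an assumption, inferred from the stated 5.5-cycle average of two equally likely outcomes, one of which costs 6 cycles.

```python
def expected_penalty(p_taken, taken_cycles, inline_cycles):
    """Average branch penalty, weighted by the probability of taking the target."""
    return p_taken * taken_cycles + (1.0 - p_taken) * inline_cycles


# Conditional branch: the 6-cycle target penalty is from the text; the 5-cycle
# in-line penalty is an ASSUMPTION inferred from the 5.5-cycle average.
bc_penalty = expected_penalty(p_taken=0.5, taken_cycles=6, inline_cycles=5)

# Unconditional branch penalty, as stated in the text.
br_penalty = 4

# Base CPI of 1 (decode) plus frequency-weighted branch penalties.
cpi_branches = 1 + 0.15 * bc_penalty + 0.05 * br_penalty

print(bc_penalty)              # 5.5
print(round(cpi_branches, 3))  # 2.025
```

		Weighting each penalty by how often it occurs is what lets the two branch types be added directly onto the base CPI.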



		(iii) Consider the effect of address dependencies:



		Cases such as:



		LD R5, ALPHA[R6,R7]
		LD R6, BETA[R5,R7]



		are evaluated as:



		This results in a 3-cycle penalty. Similarly,



		ADD R5, ALPHA[R6,R7]
		LD R6, BETA[R5,R7]



		is evaluated as:



		or a 5-cycle penalty.



		Assume the 3-cycle penalty occurs in 4% of instruction executions and the 5-cycle penalty in 1.5%. The performance is now:



		CPI = 2.025 + 0.04(3) + 0.015(5)
		    = 2.025 + 0.12 + 0.075
		    = 2.22.
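
		The same accumulation pattern applies to any stall source: each contributes its frequency times its penalty to the CPI. A minimal Python sketch follows, with a hypothetical helper name `cpi_with_stalls`, using the address-dependency figures given above.

```python
def cpi_with_stalls(base_cpi, stalls):
    """Add frequency-weighted stall penalties to a base CPI.

    stalls is an iterable of (frequency, penalty_cycles) pairs, where
    frequency is the fraction of instruction executions that incur the stall.
    """
    return base_cpi + sum(freq * cycles for freq, cycles in stalls)


# Address-dependency stalls from the text: a 3-cycle penalty in 4% of
# executions and a 5-cycle penalty in 1.5% of executions.
address_stalls = [(0.04, 3), (0.015, 5)]

# 2.025 is the CPI after accounting for branches.
cpi_total = cpi_with_stalls(2.025, address_stalls)

print(round(cpi_total, 2))  # 2.22
```

		Keeping the stall sources in a list makes it easy to add further penalty classes, provided their frequencies are measured per instruction execution as they are here.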
