Page 256

Table 4.19 Instruction profiles.

Instruction    Profile
BR               2.6%
BC              10.4%
IALU^a          25.8%
IMult            1.2%
IDiv             0.4%
FPAdd            6.5%
FPMult           5.2%
FPDiv            0.4%
Load            30.0%
Store           17.5%
Total          100.0%

^aThis category includes add, shift, compare, etc., and those moves that do not reference memory (i.e., register to register).



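\[
\begin{aligned}
\text{BC penalty} &= (0.54\ \text{BCs that go to target}) \times (\text{BC\%}) \times (\text{BC delay}) \\
                  &= 0.54 \times 0.104 \times 2.0 = 0.11 \\[4pt]
\text{BR penalty} &= (\text{BR\%}) \times (\text{BR delay}) \\
                  &= 0.026 \times 2 = 0.052 \\[4pt]
\text{Total branch delay} &= 0.11 + 0.05 = 0.16\ \text{CPI delay}
\end{aligned}
\]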
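The branch-delay arithmetic above can be reproduced with a short sketch. The constant and function names here are illustrative only and do not come from the text; the numeric values are the ones used in the equations (BC frequency 0.104, BR frequency 0.026, 0.54 of BCs going to the target, and a 2-cycle delay for each).

```python
# Values taken from the worked equations; names are hypothetical.
BC_FREQ = 0.104            # fraction of instructions that are conditional branches (BC)
BR_FREQ = 0.026            # fraction of instructions that are unconditional branches (BR)
BC_TAKEN_FRACTION = 0.54   # fraction of BCs that go to the target
BC_DELAY = 2.0             # cycles lost per BC that goes to the target
BR_DELAY = 2.0             # cycles lost per BR


def branch_cpi_penalty(bc_freq=BC_FREQ, br_freq=BR_FREQ,
                       taken=BC_TAKEN_FRACTION,
                       bc_delay=BC_DELAY, br_delay=BR_DELAY):
    """Return (BC penalty, BR penalty, total) in cycles per instruction."""
    bc_penalty = taken * bc_freq * bc_delay
    br_penalty = br_freq * br_delay
    return bc_penalty, br_penalty, bc_penalty + br_penalty


bc, br, total = branch_cpi_penalty()
print(f"BC penalty = {bc:.2f}, BR penalty = {br:.3f}, total = {total:.2f} CPI")
# prints "BC penalty = 0.11, BR penalty = 0.052, total = 0.16 CPI"
```

Keeping the frequencies as parameters makes it easy to see how the total responds to a different instruction profile or a different branch delay.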



		We can now compute the data dependency effects (using Tables 3.19 and 3.20). Because of bypassing, there is no ALU dependency for our baseline processor.



		However, there is a load-ALU dependency required to keep the PA in order:



		There are actually two effects:



		1. Since the PA must be done in order (the register store bandwidth is usually limited to one PA per cycle), the ALU PA is delayed one cycle.



		2. When the ALU instruction uses the result of the LD, the EX of the ALU is delayed one cycle.
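The two effects can be told apart with a small counting sketch. It assumes a hypothetical instruction representation (opcode label, destination register, source registers) and assumes that only an ALU instruction immediately following a LD is affected; neither assumption is spelled out in the text, and this is not a model of the full pipeline.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # "LD" or "ALU" (hypothetical labels)
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def load_alu_effects(program):
    """Count ALU instructions hit by each effect when they directly follow a LD.

    Effect 1: the ALU's PA waits one cycle so that PAs stay in order.
    Effect 2: the ALU's EX waits one cycle because it reads the LD result.
    """
    pa_delays = 0
    ex_delays = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LD" and cur.op == "ALU":
            pa_delays += 1                  # effect 1: in-order PA
            if prev.dest in cur.srcs:
                ex_delays += 1              # effect 2: ALU uses the loaded value
    return pa_delays, ex_delays


program = [
    Instr("LD", "r1", ("r9",)),
    Instr("ALU", "r2", ("r1", "r3")),   # follows a LD and uses its result
    Instr("LD", "r4", ("r9",)),
    Instr("ALU", "r5", ("r2", "r3")),   # follows a LD but does not use its result
]
print(load_alu_effects(program))  # prints "(2, 1)"
```

Under the simplification the text adopts, both cases would be charged the same one-cycle delay, so only the first count would matter for the CPI estimate.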



		For simplicity, we combine these two effects and assume that both EX (and PA) are delayed one cycle. This is shown above and is manifested as a delay in the ALU instruction that sets the CC immediately before a BC. If the BC goes to the target no additional delay is encountered, but if the BC goes
