
Table 4.19 Instruction profiles.
Instruction    Profile
-----------    -------
BR               2.6%
BC              10.4%
IALU(a)         25.8%
IMult            1.2%
IDiv             0.4%
FPAdd            6.5%
FPMult           5.2%
FPDiv            0.4%
Load            30.0%
Store           17.5%
-----------    -------
Total          100.0%

(a) This category includes add, shift, compare, etc., and those moves that do not reference memory (i.e., register to register).

BC penalty         = (fraction of BCs that go to target) × (BC%) × (BC delay)
                   = 0.54 × 0.104 × 2.0 = 0.11
BR penalty         = (BR%) × (BR delay)
                   = 0.026 × 2 = 0.052
Total branch delay = 0.11 + 0.05 = 0.16 CPI delay
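
The branch-delay arithmetic above can be checked with a short script. This is only a sketch: the frequencies come from Table 4.19 and the 0.54 taken fraction and two-cycle delays come from the calculation itself, while the function and variable names are our own and purely illustrative.

```python
# Instruction frequencies from Table 4.19, as fractions of all instructions.
BC_FREQ = 0.104  # conditional branches (BC)
BR_FREQ = 0.026  # unconditional branches (BR)

# Values used in the text's calculation.
BC_TAKEN_FRACTION = 0.54  # fraction of BCs that go to the target
BC_DELAY = 2.0            # cycles lost per BC that goes to the target
BR_DELAY = 2.0            # cycles lost per BR


def branch_cpi_penalty(bc_freq, bc_taken, bc_delay, br_freq, br_delay):
    """Return (BC penalty, BR penalty, total), each in cycles per instruction.

    Hypothetical helper name; it simply multiplies out the terms of the
    penalty formulas given in the text.
    """
    bc_penalty = bc_taken * bc_freq * bc_delay
    br_penalty = br_freq * br_delay
    return bc_penalty, br_penalty, bc_penalty + br_penalty


bc, br, total = branch_cpi_penalty(
    BC_FREQ, BC_TAKEN_FRACTION, BC_DELAY, BR_FREQ, BR_DELAY
)
print(f"BC penalty:         {bc:.2f}")     # 0.11
print(f"BR penalty:         {br:.3f}")     # 0.052
print(f"Total branch delay: {total:.2f}")  # 0.16
```

Keeping the mix and the delays as separate parameters makes it easy to see how the total responds if, say, a different fraction of BCs goes to the target.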

We can now compute the data dependency effects (using Tables 3.19 and 3.20). Because of bypassing, there is no ALU dependency for our baseline processor.
[Figure: 0256-01.gif (image not reproduced)]
However, there is a load-ALU dependency required to keep the PA in order:
[Figure: 0256-02.gif (image not reproduced)]
There are actually two effects:
1. Since the PA must be done in order (the register store bandwidth is usually limited to one PA per cycle), the ALU PA is delayed one cycle.
2. When the ALU instruction uses the result of the LD, the EX of the ALU is delayed one cycle.
For simplicity, we combine these two effects and assume that both the EX and the PA are delayed one cycle. This is shown above and is manifested as a delay in the ALU instruction that sets the CC immediately before a BC. If the BC goes to the target, no additional delay is encountered, but if the BC goes

 