Table 4.19 Instruction profiles.

| Instruction Profiles | |
|---|---|
| BR | |
| BC | |
| IALUᵃ | |
| IMult | |
| IDiv | |
| FPAdd | |
| FPMult | |
| FPDiv | |
| Load | |
| Store | |
| Total | |

ᵃ This category includes ladd, shift, compare, etc., and those moves that do not reference memory (i.e., register to register).

(0.54 BC's that go to target) × (BC%) × (BC delay)

0.54 × 0.104 × 2.0 = 0.11

0.11 + 0.05 = 0.16 CPI delay
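The arithmetic above can be checked with a short sketch. The function and variable names are illustrative and do not come from the text. The 0.05 term is taken as given from the sum above, since its derivation is not shown in this passage.

```python
def bc_delay_cpi(frac_to_target: float, bc_fraction: float, bc_delay_cycles: float) -> float:
    """CPI penalty contributed by conditional branches (BC) that go to the target."""
    return frac_to_target * bc_fraction * bc_delay_cycles


# Values quoted in the text: 54% of BCs go to target, BC is 10.4% of
# instructions, and each such BC costs 2.0 cycles.
bc_penalty = bc_delay_cpi(0.54, 0.104, 2.0)

# Assumption: 0.05 is the other delay term from the sum in the text,
# used here without re-deriving it.
other_penalty = 0.05

# The text rounds the BC term to two places before adding.
total_penalty = round(bc_penalty, 2) + other_penalty

print(f"BC penalty:    {bc_penalty:.2f} CPI")
print(f"Total penalty: {total_penalty:.2f} CPI")
```

Keeping the three factors as separate arguments makes it easy to see how the penalty scales if the branch frequency or the branch delay changes.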

We can now compute the data dependency effects (using Tables 3.19 and 3.20). Because of bypassing, there is no ALU dependency for our baseline processor.

However, there is a load-ALU dependency required to keep the PA in order. There are actually two effects:

1. Since the PA must be done in order (the register store bandwidth is usually limited to one PA per cycle), the ALU PA is delayed one cycle.
2. When the ALU instruction uses the result of the LD, the EX of the ALU is delayed one cycle.
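The two effects listed above can be illustrated with a minimal sketch. It assumes a hypothetical instruction representation (`Instr`, with `op`, `dest`, and `srcs` fields) that is not part of the text, and it looks only at an ALU instruction that immediately follows a LD. It labels each such ALU instruction with the effect that applies; it does not simulate the pipeline.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record, used for illustration only."""
    op: str                 # e.g. "LD" or "ALU"
    dest: Optional[str]     # destination register, if any
    srcs: Tuple[str, ...]   # source registers


def load_alu_effects(program: List[Instr]) -> List[Tuple[int, str]]:
    """Return (index, effect) for each ALU instruction directly after a LD.

    "EX": the ALU uses the LD result, so its EX is delayed one cycle.
    "PA": no register dependency, but the ALU PA is delayed one cycle
          to keep the PA in order.
    """
    effects = []
    for i in range(1, len(program)):
        prev, cur = program[i - 1], program[i]
        if prev.op == "LD" and cur.op == "ALU":
            kind = "EX" if prev.dest in cur.srcs else "PA"
            effects.append((i, kind))
    return effects


program = [
    Instr("LD", "r1", ("r9",)),
    Instr("ALU", "r2", ("r1", "r3")),   # uses the LD result
    Instr("LD", "r4", ("r9",)),
    Instr("ALU", "r5", ("r2", "r3")),   # independent of the LD
    Instr("ALU", "r6", ("r5", "r2")),   # follows an ALU, not a LD
]

effects = load_alu_effects(program)
print(effects)
```

In both labeled cases the ALU instruction completes one cycle later than it otherwise would; the labels differ only in which stage is the cause.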

For simplicity, we combine these two effects and assume that both EX and PA are delayed one cycle. This is shown above and is manifested as a delay in the ALU instruction that sets the CC immediately before a BC. If the BC goes to the target, no additional delay is encountered, but if the BC goes