|
|
|
|
|
|
(1) Early CC (condition code) setting: |
|
|
|
|
|
|
|
|
Here, the compiler rearranges code to place useful instructions that do not set the CC (e.g., load, store) between the instruction that sets the CC (condition code) and the branch that tests it. Let n be the number of instructions between the CC-setting instruction and the branch. If the CC is set at * - 1 (the instruction immediately preceding the branch at *), then n = 0. We evaluate the effect of n = 1, 2, and 3 intervening instructions. |
|
|
|
|
|
|
|
|
For n = 0, the BC penalty is 5.5 cycles, as determined in study 4.3. The effect of n intervening instructions is simply to delay the scheduled decode time (D′). When the branch is taken, this reduces the BC penalty to the greater of 6.0 - n and the unconditional branch penalty. When the branch is untaken, the BC penalty is 5.0 - n. There is no effect on the unconditional branch. Thus, |
|
|
|
|
| | | n = 0 | n = 1 | n = 2 | n = 3 | n = 5* |
|---|---|---|---|---|---|---|
| BC penalty | - taken | 6.0 | 5.0 | 4.0 | 4.0 | 4.0 |
| | - untaken | 5.0 | 4.0 | 3.0 | 2.0 | 0 |
| BC delay | (cycles) | 2.03 | 1.88 | 1.73 | 1.65 | 1.5 |

\* n is the number of instructions between the CC-setting instruction and the branch.
|
|
|
|
|
|
The performance figures include only the effects of conditional and unconditional branches. |
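The penalty rule stated above can be checked against the table with a short sketch. The function name `bc_penalty` and the constant names are illustrative, not from the study. The unconditional penalty of 4.0 cycles is read from the table's floor for the taken case. Clamping the untaken penalty at zero is an assumption for values of n beyond those tabulated.

```python
# Sketch of the early-CC-setting penalty rule; names are illustrative.
UNCONDITIONAL_PENALTY = 4.0   # cycles; floor of the taken case in the table
TAKEN_BASE = 6.0              # taken BC penalty at n = 0
UNTAKEN_BASE = 5.0            # untaken BC penalty at n = 0


def bc_penalty(n: int, taken: bool) -> float:
    """Return the conditional-branch penalty (cycles) with n intervening instructions."""
    if taken:
        # A taken BC can never do better than an unconditional branch.
        return max(TAKEN_BASE - n, UNCONDITIONAL_PENALTY)
    # Assumption: the untaken penalty cannot go below zero.
    return max(UNTAKEN_BASE - n, 0.0)


if __name__ == "__main__":
    for n in (0, 1, 2, 3, 5):
        print(f"n = {n}: taken {bc_penalty(n, True):.1f}, "
              f"untaken {bc_penalty(n, False):.1f}")
```

The BC delay row is not reproduced here because it depends on branch frequencies that this section does not state.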
|
|
|
|
|
|
|
|
The delayed branch (DB) has an effect similar to that of setting the CC early. The delayed branch (at *) may immediately follow the instruction that sets the CC, but the resulting instruction (target or in-line) is not decoded until * + n + 1. Other useful instructions (if available) are placed in-line after the branch, and the branch takes effect n instructions later (n is usually fixed at 1 or 2 but may be a parameter of the DB). |
|
|
|
|
|
|
|
|
The following illustrates the delayed branch (n = 1). Assume the delayed branch (DB) is unconditional: |
|
|
|
 |
|
|
|
|
DB ALPHA
LD
ALPHA INSTR |
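The execution order implied by this example can be sketched with a toy interpreter. Everything here is hypothetical scaffolding: the `run` function, the tuple encoding of instructions, and the extra in-line `ADD` instruction (added only so that the jump is visible) are not part of the original example. The sketch assumes an unconditional DB and ignores a DB placed inside another DB's delay slots.

```python
# Toy model of an unconditional delayed branch (DB) with n delay slots.
# Each instruction is (label, opcode, operand); all names are illustrative.

def run(program, delay=1, max_steps=20):
    """Return the opcodes in the order they are executed."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    trace = []
    pc = 0
    pending = None  # (delay slots still to execute, target index)
    while pc < len(program) and len(trace) < max_steps:
        _, op, arg = program[pc]
        trace.append(op)
        pc += 1
        if pending is not None:
            remaining, target = pending
            remaining -= 1
            if remaining == 0:
                pc = target          # branch finally takes effect
                pending = None
            else:
                pending = (remaining, target)
        elif op == "DB":
            if delay == 0:
                pc = labels[arg]     # ordinary (non-delayed) branch
            else:
                pending = (delay, labels[arg])
    return trace


program = [
    (None, "DB", "ALPHA"),
    (None, "LD", None),      # fills the delay slot when n = 1
    (None, "ADD", None),     # hypothetical in-line instruction
    ("ALPHA", "INSTR", None),
]

if __name__ == "__main__":
    print(run(program, delay=1))
```

With `delay=1`, the LD after the DB executes before control reaches ALPHA, and the hypothetical ADD is skipped.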
|
|
|
|
|
|
|
|
With proper implementation support, the DB can reduce penalties for all branch types. Again assume that we prefetch in-line, and fetch only one target instruction: |
|
|
|
|
| | | n = 0 | | | | |
|---|---|---|---|---|---|---|
| DB unconditional | | 4.0 | | | | |
| DB conditional | - taken | 6.0 | | | | |
| | - untaken | 5.0 | | | | |
| DB delay | (cycles) | 2.03 | | | | |
|
|
|