since there are 4 LD/ST units, each capable of issuing an LD and a buffered ST. So,
The relative performance is
(b) If we had no write buffer, we would have
B(4, 2, 0, 0.25) = 1.57,
and the relative performance would be
7.6.9 Branches and Speculative Execution
So far in our discussion, we have assumed that the instructions being tested for independence are non-branch instructions; the scope of the independence test has been a branch subset, or basic block. If we restrict the search for independent instructions to those lying between conditional branches, we naturally limit the possible speedup. If the branch frequency is 20%, a branch subset averages only five instructions (one in five is a branch), so when detection is limited to branch subsets the maximum possible speedup cannot exceed 5, and in practice it is less once dependencies within the block are taken into account.
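As a rough check on this bound, the sketch below assumes branches are spread uniformly through the instruction stream, so a branch subset averages 1/f instructions at branch frequency f. Ignoring dependencies and issue-width limits, that block length caps the achievable speedup. The function names are hypothetical and chosen only for illustration.

```python
def mean_block_length(branch_freq: float) -> float:
    """Average instructions per branch subset (basic block), counting the
    terminating branch, assuming one branch every 1/f instructions."""
    if not 0.0 < branch_freq <= 1.0:
        raise ValueError("branch frequency must be in (0, 1]")
    return 1.0 / branch_freq


def max_speedup_within_block(branch_freq: float) -> float:
    """Upper bound on speedup when independence is detected only inside a
    branch subset: even with unlimited issue width and no dependencies,
    at most one block's worth of instructions can be overlapped."""
    return mean_block_length(branch_freq)


for f in (0.10, 0.20, 0.25):
    print(f"branch frequency {f:.0%}: speedup bound {max_speedup_within_block(f):.1f}")
```

At 20% branches the bound comes out to 5.0; real dependencies inside the block push the achievable figure below it.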
To improve processor speedup, we may:
1. Minimize the frequency of branches, or
2. Predict the outcome of a branch and speculatively execute the predicted path.
Ideally, we would do both. To reduce branch frequency, we can use loop unrolling, which replicates several iterations of the same loop in line, eliminating the intervening branches. Because this increases code size, however, it may have undesirable side effects, such as a higher instruction-cache miss rate. Trace scheduling, mentioned earlier in this chapter, is another technique aimed at reducing the occurrence of branches.
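To make the branch-count argument concrete, the sketch below models a simple summation loop and counts one loop-closing conditional branch per iteration (a bottom-tested loop is assumed). The 4-way unrolled version replicates the body in line and handles leftover elements in a short cleanup loop. The counting is a modeling assumption, since a compiler's actual branch count depends on the target and its code generation, and the function names are hypothetical.

```python
from typing import List, Tuple


def sum_rolled(a: List[int]) -> Tuple[int, int]:
    """Sum a list, counting one loop-closing branch per iteration."""
    total, branches = 0, 0
    i = 0
    while i < len(a):
        total += a[i]
        i += 1
        branches += 1  # backward conditional branch at the loop bottom
    return total, branches


def sum_unrolled4(a: List[int]) -> Tuple[int, int]:
    """Same sum with the loop body replicated four times per trip."""
    total, branches = 0, 0
    n = len(a)
    i = 0
    # Main unrolled loop: four elements per trip, one branch per trip.
    while i + 4 <= n:
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        i += 4
        branches += 1
    # Cleanup loop for the remaining n mod 4 elements.
    while i < n:
        total += a[i]
        i += 1
        branches += 1
    return total, branches


data = list(range(102))
print(sum_rolled(data))     # (5151, 102)
print(sum_unrolled4(data))  # (5151, 27)
```

Unrolling by four cuts the branch count from 102 to 27 here, but the loop body is now four times larger and a cleanup loop is added: the code-size cost noted in the text.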
Suppose that, using the techniques of Chapter 4 (history bits, branch target buffers, etc.), we can be assured of a relatively high prediction hit rate, say, greater than 90%. We might then choose to predict the outcome of a branch and conditionally, or speculatively, execute the predicted path. Of course, speculative execution must be done so that the results of speculatively executed instructions cannot affect the final register state until the outcome of the conditional branch has been determined. Speculative execution necessarily increases the required instruction bandwidth by at least