|
|
|
|
|
|
Bypassing a result from DF to AG results in only a one-cycle loss, i.e.:

    delay = 1-cycle loss × 0.099 = 0.099 cycles/instruction,

while a case that is fully bypassed incurs a 0-cycle loss.
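The delay computation above is a simple product of penalty and occurrence frequency; a minimal sketch (variable names are illustrative, the figures are those given in the text):

```python
# Expected pipeline delay from the address-generation interlock,
# assuming a 1-cycle penalty with the DF-to-AG bypass in place and
# an occurrence frequency of 0.099 per instruction.
penalty_cycles = 1          # loss per occurrence, with the bypass
frequency = 0.099           # interlock occurrences per instruction
delay = penalty_cycles * frequency
print(delay)                # 0.099 cycles/instruction
```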
|
|
|
|
|
|
4.6.4 Execution Interlocks and Interlock Tables |
|
|
|
|
|
|
|
|
An execution interlock condition exists when the result register of one instruction is an operand register of a following instruction. This type of interlock, too, can be dealt with by incorporating bypasses at appropriate points in the pipeline.
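The detection condition can be sketched as a register comparison between adjacent instructions; the three-address instruction format and register names below are illustrative assumptions, not from the text:

```python
# Sketch: detecting an execution interlock between adjacent instructions.
# An instruction is modeled as (dest_register, source_registers).
def needs_bypass(producer, consumer):
    """True when the consumer reads the register the producer writes."""
    dest, _ = producer
    _, sources = consumer
    return dest in sources

add = ("r3", ["r1", "r2"])     # r3 <- r1 + r2
sub = ("r5", ["r3", "r4"])     # r5 <- r3 - r4 : reads r3 just produced
print(needs_bypass(add, sub))  # True: forward r3 via a bypass path
```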
|
|
|
|
|
|
|
|
For very simple instruction sets and a single execution unit, it is easy to organize the execution facilities of the processor in the form of a bypassed pipeline. Then the execution facilities can accept a new instruction on every cycle. |
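The throughput benefit of accepting a new instruction every cycle can be sketched with elementary cycle counts; the stage count and instruction count below are illustrative:

```python
# Sketch: cycle counts for a fully pipelined vs. unpipelined execution path.
# With s stages and one new instruction accepted per cycle, n instructions
# finish in n + s - 1 cycles rather than n * s.
def pipelined_cycles(n_instructions, n_stages):
    return n_instructions + n_stages - 1

def unpipelined_cycles(n_instructions, n_stages):
    return n_instructions * n_stages

print(pipelined_cycles(100, 4))    # 103 cycles
print(unpipelined_cycles(100, 4))  # 400 cycles
```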
|
|
|
|
|
|
|
|
For most instruction sets, there are complex instructions whose execution requires a substantial process of iteration. (Examples might include floating-point multiplication and division instructions.) Such instructions may not be easily pipelined to accept new operands every cycle without an increase in cost, since pipelining amounts to unraveling the iterations onto a hardware assembly line. Instead, most implementations allow the execution facilities to become busy and hold up the pipeline while execution proceeds iteratively.
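The cost of such run-ons can be estimated by weighting each operation's extra busy cycles by its frequency; the latencies and instruction mix below are assumed for illustration:

```python
# Sketch: pipeline stalls from execution "run-ons". An instruction that
# occupies an unpipelined execution unit for n cycles blocks the pipeline
# for n - 1 extra cycles. Latencies are illustrative assumptions.
EXEC_CYCLES = {"add": 1, "fmul": 5, "fdiv": 20}

def run_on_stalls(instruction_mix):
    """Extra cycles per instruction due to execution run-ons."""
    return sum(freq * (EXEC_CYCLES[op] - 1)
               for op, freq in instruction_mix.items())

mix = {"add": 0.85, "fmul": 0.10, "fdiv": 0.05}  # assumed frequencies
print(run_on_stalls(mix))   # 0.85*0 + 0.10*4 + 0.05*19 = 1.35 cycles/instr
```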
|
|
|
|
|
|
|
|
To minimize the performance loss due to execution run-ons, multiple execution resources can be provided; i.e., one or more add/logical units, a multiply unit, a decimal or character-string manipulation unit, etc. These units can now be independently scheduled so that execution run-ons are reduced. However, since the units are likely to have different latencies, it is possible that instructions will complete out of order; i.e., an instruction with a relatively short execution time, such as an add, may complete before a multiply instruction that preceded it. This means that a new stage in the pipeline must be defined, one that holds the results of completed instructions until their effects can be propagated in the proper order.
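The result-holding stage described above can be sketched as a buffer that retires results only in program order; this is an illustrative simplification, with sequence numbering assumed:

```python
# Sketch: a minimal result-holding stage. Execution units may finish out
# of order; results are released (retired) strictly in program order.
def retire_in_order(completed, next_seq=0):
    """completed: dict mapping sequence number -> result value.
    Returns results whose effects may now be propagated, in order."""
    retired = []
    while next_seq in completed:
        retired.append(completed.pop(next_seq))
        next_seq += 1
    return retired

# A multiply (seq 0) is still executing, so the finished add (seq 1) waits:
buffer = {1: "add result"}
print(retire_in_order(buffer))   # [] : cannot retire past seq 0
buffer[0] = "mul result"
print(retire_in_order(buffer))   # ['mul result', 'add result']
```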
|
|
|
|
|
|
|
|
Analysis of the performance impact of such interlocks is based on data such as that shown in Table 3.20. |
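An interlock-table analysis of this kind combines a per-type penalty with a measured occurrence frequency; the sketch below follows that pattern, with all numbers assumed for illustration (they are not taken from Table 3.20):

```python
# Sketch: estimating total CPI loss from an interlock table
# (penalty per interlock type) and measured occurrence frequencies.
interlock_table = {             # type -> penalty (cycles)
    "address generation": 1,
    "execution":          2,
}
frequencies = {                 # type -> occurrences per instruction
    "address generation": 0.099,
    "execution":          0.05,
}
cpi_loss = sum(interlock_table[t] * frequencies[t] for t in interlock_table)
print(round(cpi_loss, 3))       # 1*0.099 + 2*0.05 = 0.199
```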
|
|
|
|
|
|
|
|
Cache Access and Priority Interlock Delays |
|
|
|
|
|
|
|
|
Cache access delays are associated with priority interlocks that result from limited bandwidth at various levels in the storage hierarchy. For example, an instruction fetch can be delayed because an operand fetch or store initiated by a preceding instruction takes priority. Similarly, an operand fetch can be delayed when a preceding instruction is storing an operand: when the store buffer is full, the store is given priority and the fetch is delayed. Analysis of such delay is discussed in the next chapter.
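The priority scheme described above can be sketched as fixed-priority arbitration for a single cache port; the request names and the rule that a store from a full store buffer wins are illustrative assumptions consistent with the text:

```python
# Sketch: fixed-priority arbitration at a cache port. A pending store
# (store buffer full) beats an operand fetch, which beats an instruction
# fetch; losing requests are delayed a cycle.
PRIORITY = {"store": 0, "operand_fetch": 1, "instruction_fetch": 2}

def grant(requests):
    """Pick the highest-priority request; the rest wait."""
    return min(requests, key=PRIORITY.get) if requests else None

print(grant(["instruction_fetch", "store"]))          # 'store'
print(grant(["instruction_fetch", "operand_fetch"]))  # 'operand_fetch'
```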
|
|
|
|
|