The interlock logic is identical except for including E.LAST. This means a register dependency caused by an instruction in the E stage ceases to cause the interlock in its last EX cycle. |
|
The bypass controls can be very simple, since they only have to work if the D cycle is not interlocked. In particular, the bypass controls need not consider E.LAST. If the causing instruction is in the EX stage but not yet in its last cycle, the instruction in D is interlocked, so it makes no difference which way the MUX controls are set. |
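The relationship between the interlock and the bypass controls can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the circuit from the text: the names `EStage`, `interlock`, `bypass_select`, and `operand` are hypothetical, the exact signal conditions are guessed from the description above, and only one source operand is modeled.

```python
from dataclasses import dataclass


@dataclass
class EStage:
    """Hypothetical state of the instruction currently in the EX stage."""
    valid: bool  # an instruction occupies EX and will write a register
    dest: int    # its destination register number
    last: bool   # E.LAST: the instruction is in its final EX cycle


def interlock(d_src, e):
    """Hold the instruction in D while its operand cannot yet be supplied.

    Assumed condition: the dependency stops interlocking once E.LAST is set.
    """
    return e.valid and d_src == e.dest and not e.last


def bypass_select(d_src, e):
    """Choose the bypassed EX result over the register file value.

    E.LAST is deliberately ignored: the selection only has to be
    right when the D cycle is not interlocked.
    """
    return e.valid and d_src == e.dest


def operand(d_src, e, regfile, ex_result):
    """Return the operand delivered to D, or None while D is interlocked."""
    if interlock(d_src, e):
        return None  # D repeats its cycle, so the MUX setting is irrelevant
    return ex_result if bypass_select(d_src, e) else regfile[d_src]
```

In this sketch, `bypass_select` is already true while the producing instruction is still short of its last cycle, but `operand` returns `None` in that case because the interlock holds D. That is the sense in which the MUX setting makes no difference until E.LAST.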
|
It is interesting how little extra control logic is needed to allow this bypass. The logic for the interlock signal still takes the same number of gates, but one of the two-input AND gates changes to a three-input AND. Each of the two bypass signals requires only a single three-input AND. Clearly, the control logic for such a bypass is trivial. Most of the logic needed is in the multiplexors in the data paths, where the bypass requires 64 2:1 multiplexors. In actual designs, whether to implement such a bypass or not depends on details of the data path implementation and timing. |
|
The preceding example may give the impression that interlocks are not overly significant in the design of a processor. That would certainly be a wrong assumption. Even if, in a simple, static pipelined processor, they do not add greatly to the hardware gate count, the design effort to fully ensure correct operation can be considerable. Issues such as the integrity of state and buffer information on interrupt, the restoration of such data after interrupt, and instruction retry after an error has been detected must all be managed (in part) through the interlock control circuits. Dynamic pipelines with timing templates of unequal length further increase design complexity. |
|
In an effort to present the essential elements of processor control and optimization, we skip over many of the details of the interlock/control design. As any pipelined processor designer knows, a great deal of engineering effort is required to efficiently realize a fully functional set of interlocks! |
|
As we saw in study 4.3, we can compute the effect of long EX instructions by finding the difference between the total execution latency and the number of EX cycles that an instruction may use before causing a delay in the scheduled time for PA. We refer to the number of EX cycles the instruction may use before affecting PA as E0. This is usually one or two cycles, but can be larger, as in the case of L/S processors with relatively fast ALU instructions and slower LD/ST instructions. The MIPS R4000 described earlier has E0 = 4 (half) cycles. |
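The calculation described above can be written out as a short Python sketch. The function names `ex_delay` and `mean_ex_delay` are hypothetical, the clamp at zero assumes that instructions finishing within E0 cycles cause no delay, and weighting by instruction frequency is an assumption about how the per-instruction delays would be combined.

```python
def ex_delay(latency, e0):
    """Cycles by which one instruction delays the scheduled PA.

    latency: total EX cycles the instruction needs.
    e0: EX cycles it may use before PA is affected (E0 in the text).
    """
    return max(0, latency - e0)


def mean_ex_delay(mix, e0):
    """Frequency-weighted delay per instruction.

    mix: iterable of (frequency, latency) pairs supplied by the caller.
    """
    return sum(freq * ex_delay(latency, e0) for freq, latency in mix)
```

For example, with E0 = 1, an instruction needing five EX cycles gives `ex_delay(5, 1) == 4`. With purely illustrative (not measured) frequencies, a mix of 90% one-cycle and 10% five-cycle instructions gives `mean_ex_delay([(0.9, 1), (0.1, 5)], 1)` of about 0.4 cycle per instruction.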
|
Table 4.18 illustrates the execution delays for some recent microprocessors. |