Now we can formulate the logic that determines whether the instruction currently in the D stage must interlock (that is, remain in D) because of an address generation dependency. To do so, we need to determine whether any of the stages AG, T, DF, or EX (abbreviated A, T, F, and E in the equations) contains an instruction that writes a result into a register needed for address generation by the instruction in D. We need not consider an instruction in the PA stage, since the register file logic automatically provides the correct value. In this part of the study, we do not attempt to use the bypass path. We use the following logic:
A_G_Interlock = A.W [D_use_B2 (A.WR = D.B2) + D_use_X2 (A.WR = D.X2)]
+ T.W [D_use_B2 (T.WR = D.B2) + D_use_X2 (T.WR = D.X2)]
+ F.W [D_use_B2 (F.WR = D.B2) + D_use_X2 (F.WR = D.X2)]
+ E.W [D_use_B2 (E.WR = D.B2) + D_use_X2 (E.WR = D.X2)]
The interpretation is straightforward. Juxtaposition denotes AND, + denotes OR, and a parenthesized term such as (A.WR = D.B2) is true when the two 4-bit register numbers are equal. If the instruction being decoded uses a base or index register (D_use_B2 or D_use_X2), that register must be checked against the destination registers of instructions that have not yet written their results. For example, if the instruction in the AG stage were ADD R1, D[R2,R3], then A.W would be valid and A.WR = 0001. If the instruction being decoded uses a base register but no index, then D_use_B2 = 1 (valid) and D_use_X2 = 0, so the comparison of A.WR against D.B2 alone determines whether the AG stage places an interlock on the instruction being decoded. The same check must be made against every stage from AG through EX. If an interlock is found, the instruction remains in the decoder until the interlock is removed, that is, until the instruction causing the interlock completes execution.
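To make the equation concrete, here is a minimal Python sketch that evaluates A_G_Interlock term by term. It assumes each of the four in-flight stages is summarized by a hypothetical StageWrite record (its X.W flag and 4-bit X.WR field) and the decoding instruction by a hypothetical DecodeUse record (D_use_B2, D.B2, D_use_X2, D.X2); these names are illustrative, not part of the original design.

```python
from dataclasses import dataclass

@dataclass
class StageWrite:
    """Register-write info for one pipeline stage (mirrors X.W and X.WR)."""
    w: bool   # X.W: the instruction in this stage will write a register
    wr: int   # X.WR: 4-bit destination register number

@dataclass
class DecodeUse:
    """Address-generation register use by the instruction in D (hypothetical)."""
    use_b2: bool  # D_use_B2
    b2: int       # D.B2: base register number
    use_x2: bool  # D_use_X2
    x2: int       # D.X2: index register number

def stage_conflict(s: StageWrite, d: DecodeUse) -> bool:
    # One product term: X.W [D_use_B2 (X.WR = D.B2) + D_use_X2 (X.WR = D.X2)]
    return s.w and ((d.use_b2 and s.wr == d.b2) or (d.use_x2 and s.wr == d.x2))

def ag_interlock(ag: StageWrite, t: StageWrite, df: StageWrite,
                 ex: StageWrite, d: DecodeUse) -> bool:
    # OR of the four stage terms; PA is excluded because the register
    # file already supplies the value being written there.
    return any(stage_conflict(s, d) for s in (ag, t, df, ex))

# ADD R1, D[R2,R3] sits in AG; the instruction in D uses R1 as its base.
idle = StageWrite(w=False, wr=0)
add_in_ag = StageWrite(w=True, wr=0b0001)
d = DecodeUse(use_b2=True, b2=0b0001, use_x2=False, x2=0)
print(ag_interlock(add_in_ag, idle, idle, idle, d))  # True
```

Moving the same ADD into T, DF, or EX still yields True; only when it leaves EX does the D-stage instruction proceed.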
Number of Gates Required
If we assume that all the fields and signals used in the preceding logic equation would have to be available even without interlock detection, we can determine the number of additional gates needed to implement the interlock logic. A four-bit comparator can be built with five gates (four XORs and one four-input NOR). Eight comparisons are needed (a B2 and an X2 comparison for each of the four stages), requiring 40 gates. The rest of the logic requires 12 two-input AND gates (three per stage: two to qualify the comparisons with D_use_B2 and D_use_X2, and one to gate the result with the stage's W bit), 4 two-input OR gates (one per stage), and 1 four-input OR gate to combine the stages. The grand total is 57 gates.
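The gate count can be checked with a small Python model. This sketch builds the four-bit comparator from XOR and NOR functions (hypothetical helpers standing in for gates), verifies it against ordinary equality, and reproduces the 57-gate tally from the per-stage breakdown given above.

```python
def xor_gate(a: int, b: int) -> int:
    return a ^ b

def nor_gate(*inputs: int) -> int:
    # Output is 1 only when every input is 0.
    return 0 if any(inputs) else 1

def eq4(a: int, b: int) -> int:
    """4-bit equality: four XOR gates feeding one four-input NOR."""
    diffs = [xor_gate((a >> i) & 1, (b >> i) & 1) for i in range(4)]
    return nor_gate(*diffs)

STAGES = 4                 # AG, T, DF, EX
COMPARES_PER_STAGE = 2     # X.WR vs D.B2 and X.WR vs D.X2
GATES_PER_COMPARATOR = 4 + 1
AND2_PER_STAGE = 3         # two D_use qualifiers plus one X.W gate
OR2_PER_STAGE = 1          # combines the B2 and X2 terms
OR4_FINAL = 1              # combines the four stages

comparator_gates = STAGES * COMPARES_PER_STAGE * GATES_PER_COMPARATOR
logic_gates = STAGES * (AND2_PER_STAGE + OR2_PER_STAGE) + OR4_FINAL
total_gates = comparator_gates + logic_gates
print(comparator_gates, logic_gates, total_gates)  # 40 17 57
```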
Using the Bypass
Figure 4.29 shows a 32-bit data path from the output of the execution logic to a pair of 2:1 multiplexers in the D stage. With appropriate control logic, the result of an operation in the E stage can be selected in place of the DF outputs and used for address generation a cycle earlier than before. To take advantage of this bypass capability, we need slightly different logic for the A_G_Interlock signal (to avoid the interlock once the bypassed value becomes available), and we also need two new bypass control signals to control the multiplexers. The following logic equations do the job:
A_G_Interlock = A.W [D_use_B2 (A.WR = D.B2) + D_use_X2 (A.WR = D.X2)]
+ T.W [D_use_B2 (T.WR = D.B2) + D_use_X2 (T.WR = D.X2)]
+ F.W [D_use_B2 (F.WR = D.B2) + D_use_X2 (F.WR = D.X2)]
+ E.W [D_use_B2 (E.WR = D.B2) + D_use_X2 (E.WR = D.X2)] NOT(E.LAST)
Bypass_B2 = E.W D_use_B2 (E.WR = D.B2)
Bypass_X2 = E.W D_use_X2 (E.WR = D.X2)
Here NOT(E.LAST) is true while the instruction in E has not yet reached its last execution cycle. In that last cycle its result is on the bypass path, so the E-stage term no longer forces an interlock, and Bypass_B2 or Bypass_X2 selects the bypassed value instead.
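The bypass version can be sketched the same way. This Python model reads the E.LAST factor as its complement, so the E-stage term interlocks only while the E instruction is not yet in its final execution cycle; the `last` field, the `mux2` helper, and the record names are hypothetical illustrations, not the original design's signal names.

```python
from dataclasses import dataclass

@dataclass
class StageWrite:
    """Register-write info for one stage (mirrors X.W, X.WR; last = E.LAST)."""
    w: bool            # X.W: instruction in this stage will write a register
    wr: int            # X.WR: 4-bit destination register number
    last: bool = True  # E.LAST: in final execution cycle (used for E only)

@dataclass
class DecodeUse:
    use_b2: bool  # D_use_B2
    b2: int       # D.B2
    use_x2: bool  # D_use_X2
    x2: int       # D.X2

def conflict(s: StageWrite, d: DecodeUse) -> bool:
    # X.W [D_use_B2 (X.WR = D.B2) + D_use_X2 (X.WR = D.X2)]
    return s.w and ((d.use_b2 and s.wr == d.b2) or (d.use_x2 and s.wr == d.x2))

def ag_interlock_bypass(ag, t, df, ex, d) -> bool:
    # AG, T and DF terms are unchanged; the E term is gated by NOT E.LAST,
    # because in its last cycle the E result is already on the bypass path.
    return (conflict(ag, d) or conflict(t, d) or conflict(df, d)
            or (conflict(ex, d) and not ex.last))

def bypass_controls(ex: StageWrite, d: DecodeUse) -> tuple:
    # Bypass_B2 = E.W D_use_B2 (E.WR = D.B2); Bypass_X2 = E.W D_use_X2 (E.WR = D.X2)
    bypass_b2 = ex.w and d.use_b2 and ex.wr == d.b2
    bypass_x2 = ex.w and d.use_x2 and ex.wr == d.x2
    return bypass_b2, bypass_x2

def mux2(reg_value: int, ex_result: int, select_bypass: bool) -> int:
    # One of the two 2:1 multiplexers in D: pick the E result when bypassing.
    return ex_result if select_bypass else reg_value

# The E-stage instruction writes R1 and is in its last cycle;
# the instruction in D uses R1 as its base register.
idle = StageWrite(w=False, wr=0)
ex = StageWrite(w=True, wr=1, last=True)
d = DecodeUse(use_b2=True, b2=1, use_x2=False, x2=0)
print(ag_interlock_bypass(idle, idle, idle, ex, d))  # False
print(bypass_controls(ex, d))                         # (True, False)
```

Compared with the first sketch, only the E-stage term changes: a match against E now either waits (multicycle operation still running) or is satisfied through the multiplexer, while matches in AG, T, or DF still interlock.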

 
< previous page page_249 next page >