(iv) Evaluate run-on instructions.
These usually consist of a myriad of instruction types that are infrequently executed and expensive to implement within minimum cycle constraints. Divide, character operations, and load multiple (registers) are examples. When strict order of execution is to be maintained, the penalty assessment is straightforward. In our example, all instructions are allocated two cycles for execution. Load multiple (LDM) and store multiple (STM) are typical of a class of run-on instructions whose execution time depends on operand size. Typically, in the case of LDM, a cycle is required for each register that is loaded. A single load instruction (LD) would complete after the second DF and use no EX cycles (it completes two cycles earlier than the expected "average" instruction), but LDM continues using data fetch cycles, moving one word into a register each cycle. Thus, if the LDM calls for a movement of seven integers from memory into registers, eight cycles are required: a combination of two DF cycles plus an additional six EX cycles that are now actually DF cycles. Suppose a run-on (e.g., LDM, load multiple registers) takes 8 cycles for combined DF/EX. Then the LDM appears to have a penalty of 4 cycles, as shown:
[Figure: timing diagram of the 8-cycle LDM run-on and its apparent 4-cycle penalty]
Actually, this is somewhat deceptive. Six cycles of DF have been lost to subsequent instructions. If these instructions require a DF, the actual penalty is six cycles. If they do not (as in the case of an RR instruction), then we may approach the four-cycle penalty mentioned earlier. The type of linear modeling we follow here represents a conservative estimate of performance. Normally, the designer bases performance estimates on the more conservative estimate (e.g., the six-cycle penalty), which is simply the additional time (over that allocated in the timing template) required to complete an action.
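The two penalty figures discussed above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the two-cycle DF and two-cycle EX allocation of the running example; the function name `run_on_penalties` and its parameters are hypothetical and introduced only for illustration.

```python
def run_on_penalties(run_on_cycles, df_allocated=2, ex_allocated=2):
    """Return (apparent, conservative) penalties, in cycles, for a run-on.

    run_on_cycles: cycles the instruction spends in its combined DF/EX phase.
    df_allocated, ex_allocated: cycles the timing template allots to DF and EX
    (assumed here to be two each, as in the running example).
    """
    # Apparent penalty: time beyond the whole DF + EX allocation.
    apparent = max(0, run_on_cycles - (df_allocated + ex_allocated))
    # Conservative penalty: DF cycles beyond the DF allocation, all of which
    # are lost to following instructions that need a data fetch.
    conservative = max(0, run_on_cycles - df_allocated)
    return apparent, conservative


apparent, conservative = run_on_penalties(8)
print(apparent, conservative)  # 4 6
```

For the eight-cycle LDM this gives the four-cycle apparent penalty and the six-cycle conservative penalty; an instruction that fits within the template yields zero for both.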
Instruction * + 1 must not alter a register state until the cycle after * completes execution. This assumes that * + 1 did not depend on a result from *'s execution; the delay is caused solely by the need to preserve the order of execution, so that this overlapped machine behaves (on interruptions, etc.) exactly the same as a well-mapped machine. While order of execution might be preserved if both * and * + 1 completed execution simultaneously, implementation considerations usually prevent this. There are simply insufficient paths to a common register set.
The effects of run-on instructions on performance can be assessed simply by summing the weighted occurrence of those instructions whose execute (or data fetch, etc.) phase exceeds E0. E0 is the number of EX (or other outcome) cycles allocated in the template (in this case, two cycles):
$$\text{run-on delay} = \sum_i (\text{frequency of instruction } i)\,(E_i - E_0)$$
When Ei - E0 is negative, the ith delay is zero.
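The weighted sum described above, with negative terms clamped to zero, can be sketched as follows. This is only an illustration: the function name `run_on_delay`, the `(frequency, cycles)` pair representation, and the instruction mix are all assumptions made for the example, not measured data.

```python
def run_on_delay(instruction_mix, e0=2):
    """Expected run-on delay in cycles per instruction.

    instruction_mix: iterable of (frequency, cycles) pairs, where frequency is
    the relative occurrence of an instruction type and cycles is its Ei.
    e0: cycles allocated in the timing template (two in the running example).
    """
    # A negative (Ei - E0) contributes no delay, so clamp each term at zero.
    return sum(freq * max(0, cycles - e0) for freq, cycles in instruction_mix)


# Hypothetical mix, chosen only to illustrate the calculation.
mix = [
    (0.97, 2),  # instructions that fit the template exactly
    (0.02, 8),  # an LDM-like run-on taking eight cycles
    (0.01, 1),  # shorter than E0, so it contributes zero
]
print(run_on_delay(mix))
```

With this made-up mix, only the eight-cycle run-on contributes, giving roughly 0.12 cycles of delay per instruction.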
While run-on delays are relatively infrequent for many instruction sets, the delay-weighted effect of run-on instructions (even without data dependencies) may be a principal delay contributor because of the associated large delay and the requirement to preserve order during execution.