
on *. Out-of-order execution significantly complicates performance evaluation and is treated in chapter 7.
3. As a result of (2), "time lost is lost forever." Time lost to a delay cannot be regained later in program execution, so delays add linearly to make up the overall program execution time. In practice, delays are occasionally overlapped; for simplicity, we ignore this. Dubey [76] gives a more complete model that includes the effects of overlapped delays.
4. We assume sufficient processing resources to execute a single sequence of instructions. Thus, "going both ways on a branch" is not allowed.
5. For our evaluation purposes, we assume that most (by frequency) functional-type instructions set a condition code (CC). We assume that this is always available at the end of the last EX cycle, even if other cycles such as a PA cycle follow, so that
0194-01.gif
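Assumption 3 amounts to simple arithmetic: with no overlap, every delay is charged in full against the ideal execution time. The following is a minimal sketch of that accounting in Python. The function name `total_cycles` and the sample numbers are illustrative assumptions, not taken from the text.

```python
from typing import Iterable


def total_cycles(ideal_cycles: int, delays: Iterable[int]) -> int:
    """Execution time under the 'time lost is lost forever' assumption.

    ideal_cycles: cycles needed if no delay ever occurred (assumed input).
    delays: one entry per delay event, in cycles. No overlap is modeled,
    so each delay adds linearly to the total.
    """
    return ideal_cycles + sum(delays)


# Hypothetical program: 100 ideal cycles and three delay events.
print(total_cycles(100, [2, 1, 2]))
```

A model that accounts for overlapped delays, such as the one attributed to Dubey [76], would charge less than the plain sum whenever two delays coincide.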
Thus, the process of timing evaluation consists of the following steps.
Pipelined Processor Design Evaluation Rules
1. Form the timing template for the basic instructions. Suppose each instruction consists of five actions (IF, D, AG, DF, EX) and each action consists of a single cycle. We would have:
    IF   D    AG   DF   EX
as the timing template.
2. Create the relative templates for two successive (non-dependent) instructions; e.g., for a fully pipelined processor, we might have (for two successive instructions located at * and * + 1):
    *        IF   D    AG   DF   EX
    * + 1         IF   D    AG   DF   EX
3. Assess the time for instruction execution (at the maximum decode rate). In the preceding example, the ideal execution rate is one instruction each cycle.
4. For each delay type, find the scheduled occurrence of an action (D′, AG′, etc.) in * + 1, and the actual time that * + 1 was able to perform the action (D, AG, etc.) due to the dependency on *. The delay is the number of cycles by which effective execution was deferred. For most functional instructions, this will be the time by which the completion of execution (EX) is delayed. For example, suppose an unconditional branch instruction (BR) located at * is decoded. The next instruction (the target instruction, TI) is then delayed by two cycles:
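The four evaluation rules can also be sketched in code. The Python fragment below is one possible rendering, not the book's method: it builds the five-action timing template, schedules * + 1 one cycle behind *, applies a dependency, and reports the delay as the slip in EX. The helper names (`template`, `apply_dependency`, `delay`), the numbering of cycles from 1, and the sample dependency (AG of * + 1 waiting for the EX result of *) are all assumptions made for illustration; this is not the branch case described above.

```python
STAGES = ("IF", "D", "AG", "DF", "EX")


def template(start, stages=STAGES):
    """Rule 1: map each action to its cycle, one cycle per action."""
    return {stage: start + i for i, stage in enumerate(stages)}


def apply_dependency(scheduled, action, earliest, stages=STAGES):
    """Actual timing when `action` cannot begin before cycle `earliest`.

    The named action and every later action slip by the same amount;
    earlier actions keep their scheduled cycles.
    """
    slip = max(0, earliest - scheduled[action])
    first_slipped = stages.index(action)
    return {
        stage: cycle + (slip if stages.index(stage) >= first_slipped else 0)
        for stage, cycle in scheduled.items()
    }


def delay(scheduled, actual, action="EX"):
    """Rule 4: cycles by which `action` was deferred."""
    return actual[action] - scheduled[action]


first = template(1)        # instruction at *
scheduled = template(2)    # rule 2: * + 1 as scheduled (D', AG', ...)

# Hypothetical dependency: AG of * + 1 needs the result produced by
# EX of *, so it can start no earlier than the cycle after that EX.
actual = apply_dependency(scheduled, "AG", first["EX"] + 1)

print(delay(scheduled, actual))
```

With these assumed numbers, * completes EX in cycle 5, AG of * + 1 moves from cycle 4 to cycle 6, and EX of * + 1 moves from cycle 6 to cycle 8.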

 