and their cumulative effect on the execution of these instructions. Study 4.3 uses the same machine but analyzes performance on a statistical basis, computing the ideal performance plus the effects of the presumed dependency types. The effects of conditional branches on performance are quite noticeable in both Studies 4.2 and 4.3. Study 4.4 looks at ways of lessening the effect of branches on performance by studying two possibilities: early condition code setting and delayed branches.
Study 4.2 Dependencies in Sequences of Code
Assumptions:
This study assumes a single instruction template for all instructions and a single instruction size, 32 bits, for each instruction. This timing template represents a simple dynamic pipeline with in-order execution of instructions. Register-to-register instructions (RR format) require a 32-bit instruction but do not use AG or DF cycles. Still, their execution must be delayed to preserve in-order execution.
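The effect of sharing one timing template can be illustrated with a minimal sketch. It assumes a five-stage template (IF, D, AG, DF, EX) of one cycle per stage and one instruction entering per cycle; the stage list, the function name `schedule`, and the instruction kinds `"RM"` and `"RR"` are illustrative assumptions rather than part of the study. An RR instruction leaves its AG and DF slots idle, yet its EX cycle stays where the template places it, so execution order is preserved.

```python
# Assumed template: one cycle per stage, one instruction entering per cycle.
STAGES = ["IF", "D", "AG", "DF", "EX"]


def schedule(kinds):
    """Return one dict per instruction mapping each used stage to its cycle.

    kinds: list of "RM" (register/memory) or "RR" (register-to-register).
    Cycles are 1-based; no dependencies or branches are modeled.
    """
    rows = []
    for i, kind in enumerate(kinds):
        row = {}
        for offset, stage in enumerate(STAGES):
            if kind == "RR" and stage in ("AG", "DF"):
                # RR instructions do not use these slots, but the slots
                # still elapse, which delays EX and keeps execution in order.
                continue
            row[stage] = 1 + i + offset
        rows.append(row)
    return rows


if __name__ == "__main__":
    for kind, row in zip(["RM", "RR", "RM"], schedule(["RM", "RR", "RM"])):
        print(kind, row)
```

In this sketch the RR instruction in the middle executes in cycle 6, between the EX cycles (5 and 7) of its neighbors, even though it could have executed right after decode.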
A new R/M 32-bit machine has been designed (Figure 4.10). All instructions are 32 bits, but otherwise it is similar to our R/M prototype processor described in Chapter 2. Each R/M instruction has one data reference, except for branches, which take a target instruction fetch (TIF). The execution pattern is:
with one instruction decode per cycle (pipelined). The problem is to assess all code dependencies and their cycle penalties, assuming all references are in the cache. Assess the penalty both with and without considering branches taken.
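One way to tabulate such penalties is sketched below for address-generation dependencies only. The sketch assumes a template of IF, D, AG, DF, EX with one cycle per stage, assumes that a result becomes usable in the cycle after its producer completes, and ignores branches and cache misses. The `Instr` record, the `ag_stalls` function, and the three-instruction program are hypothetical and are not the code sequence analyzed in this study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    dest: Optional[str]          # register written, or None
    addr_regs: Tuple[str, ...]   # registers needed for address generation
    ex_cycles: int = 1           # EX cycles after DF (0 models a load)


def ag_stalls(program):
    """Return the AG stall cycles per instruction for an in-order pipeline."""
    ready = {}     # register -> cycle in which its producer completes
    prev_ag = 2    # chosen so the first instruction's AG falls in cycle 3
    stalls = []
    for ins in program:
        base = prev_ag + 1  # AG cycle if no dependency holds it back
        ag = base
        for reg in ins.addr_regs:
            if reg in ready:
                # AG cannot begin until the producing instruction completes.
                ag = max(ag, ready[reg] + 1)
        stalls.append(ag - base)
        if ins.dest is not None:
            ready[ins.dest] = ag + 1 + ins.ex_cycles  # DF, then EX cycles
        prev_ag = ag  # later instructions wait behind a stalled one
    return stalls


if __name__ == "__main__":
    program = [
        Instr("R7", ("R1",), ex_cycles=0),  # load into R7
        Instr("R8", ("R7",)),               # address needs R7
        Instr(None, ("R8",)),               # address needs R8
    ]
    print(ag_stalls(program))  # [0, 1, 2]
```

Summing the returned list gives the total dependency penalty for the sequence under these assumptions; a branch penalty would have to be added separately.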
Discussion: In instruction 1, the Load takes no EX cycles. This is generally true for most timing template configurations. Templates that include putaway cycles (PA) usually include these cycles for loads.
In instruction 2, address generation (AG) cannot begin until instruction 1 completes, since R4 is used and its value is determined by instruction 1. Similarly, instruction 3 depends on 2 for an operand (R3). Once a dependency is detected (e.g., instruction 2), that instruction remains in the stage where the dependency was detected and a new instruction does not enter