|
|
|
|
|
|
|
Figure 7.32
Instruction window. |
|
|
|
|
|
|
|
|
Detection of concurrent instructions may be done at compile time, at run time (by the hardware), or at both times. It is clearly advantageous to use both the compiler and the run-time hardware to support concurrent instruction execution. The compiler, for example, may be able to unroll loops and generally create larger basic blocks, thus mitigating the effect of procedural dependencies. On the other hand, it is only at run time that the complete machine state (i.e., the state of the resources) is known. For example, an apparent resource dependency created by a sequence of divide, load, divide instructions may in fact not exist if, say, the intervening load instruction causes a cache miss. The effect of this miss may be to insert a sufficient delay for the first divide to complete before the second divide is activated. |
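The divide, load, divide case can be made concrete with a small timing sketch. It is only an illustration under stated assumptions that do not come from the text: a single non-pipelined divider, in-order issue of one instruction per cycle, a load that blocks the following instruction until it completes, and made-up latencies. The function name `second_divide_stall` is hypothetical.

```python
def second_divide_stall(divide_latency, load_latency):
    """Cycles the second divide waits for the (single, non-pipelined) divider.

    Assumed timing model (not from the text):
      cycle 0: first divide issues and occupies the divider
      cycle 1: load issues and blocks later instructions until it completes
    """
    first_divide_done = divide_latency        # divider becomes free at this cycle
    second_divide_ready = 1 + load_latency    # earliest cycle the second divide can issue
    return max(0, first_divide_done - second_divide_ready)


# Assumed latencies: a 20-cycle divide, a 1-cycle cache hit, a 30-cycle miss.
stall_on_hit = second_divide_stall(divide_latency=20, load_latency=1)
stall_on_miss = second_divide_stall(divide_latency=20, load_latency=30)
```

Under these assumptions the cache hit leaves the second divide waiting on the divider, whereas the miss delays it long enough that the resource conflict never materializes. A compiler cannot know which case will occur; the hardware can.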
|
|
|
|
|
|
|
|
Wedig [302] treats compile-time concurrency detection, while Wedig [302], Uht [292], and Kuck [174] provide an analysis of the complementary nature of compile-time and run-time instruction concurrency detection. |
|
|
|
|
|
|
|
|
During decode, a number of instructions are examined to determine their independence. If an instruction is found to be independent of other, earlier instructions, and if there are available resources (reservation stations), the instruction is issued to the functional (execution) unit. The total number of instructions inspected (i.e., candidates for issue) determines the size of the instruction window (Figure 7.32). The instruction window has a size of N instructions, and in any given cycle M instructions are issued. In the next cycle, the next M instructions are brought into the buffer, and again N instructions are evaluated for their dependencies. Up to M instructions may be issued in a single cycle (although resource limitations may make this unobtainable). Figure 7.33 illustrates the overall layout of an M-pipeline processor inspecting N instructions and issuing M instructions. |
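The window mechanism described above can be sketched as a small simulation. This is a minimal illustration, not the book's design: it assumes register-only instructions, a one-cycle latency for every operation, and unlimited reservation stations, so that only the window size N and the issue width M constrain issue. The names `Instr`, `independent`, and `schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written
    srcs: tuple    # registers read


def independent(instr, earlier):
    """True if instr has no dependency on any earlier instruction in the window."""
    for e in earlier:
        if e.dest in instr.srcs:                           # essential (read after write)
            return False
        if instr.dest == e.dest or instr.dest in e.srcs:   # ordering (WAW, WAR)
            return False
    return True


def schedule(program, n, m):
    """Return the instruction names issued in each cycle.

    Each cycle the window holds the n oldest unissued instructions, and up to
    m of them issue. Assumes n >= 1 and m >= 1 and one-cycle latency, so every
    result is available in the cycle after issue.
    """
    pending = list(range(len(program)))    # indices of unissued instructions
    cycles = []
    while pending:
        window = pending[:n]
        issued = []
        for pos, idx in enumerate(window):
            if len(issued) == m:
                break
            earlier = [program[j] for j in window[:pos]]
            if independent(program[idx], earlier):
                issued.append(idx)
        cycles.append([program[i].name for i in issued])
        pending = [i for i in pending if i not in issued]
    return cycles


program = [
    Instr("I1", "r1", ("r2", "r3")),
    Instr("I2", "r4", ("r1", "r5")),   # needs the result of I1
    Instr("I3", "r6", ("r7", "r8")),   # independent of I1 and I2
    Instr("I4", "r9", ("r6", "r4")),   # needs the results of I2 and I3
]
print(schedule(program, n=4, m=2))   # → [['I1', 'I3'], ['I2'], ['I4']]
print(schedule(program, n=1, m=1))   # → [['I1'], ['I2'], ['I3'], ['I4']]
```

With N = 4 the independent instruction I3 is found and issued alongside I1, ahead of I2. With N = 1 the sketch degenerates to issuing one instruction per cycle in program order.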
|
|
|
|
|
|
|
|
When the window size is 1 (N = 1), we have the trivial case in which only one instruction is issued per decode cycle. However, if out-of-order execution is allowed, the dependencies mentioned before apply. Both essential and ordering dependencies must then be accounted for to ensure correct execution of out-of-order code. Table 7.5 illustrates some of the processor possibilities. |
|
|
|
|
|
|
|
|
In the window model, any of up to N instructions are candidates for being issued, depending solely on whether they satisfy the independence proper- |
|
|
|
|
|