ties and whether sufficient execution resources are available. As a practical matter, the second restriction effectively implies the use of an LIW (long instruction word) technique.

Suppose, for example, that the processor can accommodate only two load/store instructions, one floating-point instruction, and one fixed-point instruction per cycle. The decoder must then select exactly these classes of instructions from the instruction window for issue, while also determining their independence. Three load/store instructions could not be issued together even if each were independent of every other instruction in the instruction window.
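The slot-constrained selection described above can be sketched as follows. This is a minimal illustration with a hypothetical instruction representation (class name plus source and destination register sets), not the decode logic of any particular machine: an instruction issues only if a slot of its class remains and it is independent of every other instruction in the window.

```python
# Hypothetical issue-slot limits per decode cycle: two load/store,
# one floating-point, one fixed-point.
SLOT_LIMITS = {"load/store": 2, "float": 1, "fixed": 1}

def independent(instr, window):
    """True if instr has no RAW, WAR, or WAW dependence on any
    other instruction currently in the window."""
    for other in window:
        if other is instr:
            continue
        if (instr["srcs"] & other["dests"]                      # RAW
                or instr["dests"] & (other["srcs"] | other["dests"])):  # WAR/WAW
            return False
    return True

def issue(window):
    """Select, in program order, instructions that fit the class
    slot limits and are independent of the rest of the window."""
    slots = dict(SLOT_LIMITS)
    issued = []
    for instr in window:
        cls = instr["cls"]
        if slots.get(cls, 0) > 0 and independent(instr, window):
            issued.append(instr)
            slots[cls] -= 1
    return issued

# Three mutually independent load/stores: only two can issue,
# because only two load/store slots exist.
window = [
    {"cls": "load/store", "srcs": {"r1"}, "dests": {"r2"}},
    {"cls": "load/store", "srcs": {"r3"}, "dests": {"r4"}},
    {"cls": "load/store", "srcs": {"r5"}, "dests": {"r6"}},
]
selected = issue(window)
```

Note that the slot check and the independence check are separate constraints: relaxing one without the other does not help, which is why the third load/store above stalls despite being fully independent.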

Scheduling is the process of assigning specific instructions and their operand values to designated resources at designated times. Scheduling can be done either centrally at decode time, or in a distributed manner by the functional units themselves at execute time. The former approach is called control flow scheduling; the latter is called data flow scheduling. In control flow scheduling, data and resource dependencies are resolved during the decode cycle, and instructions are held (not issued) until those dependencies have been resolved. In a data flow scheduling system, instructions leave the decode stage as soon as they are decoded and are held in buffers at the functional units until their operands and the functional unit are available. In data flow scheduling, instructions are, in a sense, self-scheduled.
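The data flow ("self-scheduled") style can be illustrated with a small simulation, in the spirit of buffering instructions at functional units. This is a sketch under simplifying assumptions (one buffered station per instruction, register values either present or not yet produced); the names are illustrative, not from any particular machine:

```python
class Station:
    """Buffer entry at a functional unit: an operation waiting
    for its source registers to hold produced values."""
    def __init__(self, op, srcs, dest):
        self.op, self.srcs, self.dest = op, srcs, dest

def run(instrs, regs):
    """Data flow execution: each buffered instruction fires as soon
    as all of its operands are available, regardless of the order
    in which the instructions left decode.
    regs maps register name -> value, or None if not yet produced."""
    waiting = [Station(*i) for i in instrs]
    while waiting:
        ready = [s for s in waiting
                 if all(regs[r] is not None for r in s.srcs)]
        if not ready:
            raise RuntimeError("no instruction has all of its operands")
        for s in ready:
            regs[s.dest] = s.op(*(regs[r] for r in s.srcs))
            waiting.remove(s)
    return regs

# The first instruction listed depends on r3, which the second
# produces, so the second executes first: out-of-order execution
# falls out of operand availability alone.
regs = {"r1": 2, "r2": 3, "r3": None, "r4": None}
instrs = [
    (lambda a: a + 1,    ["r3"],       "r4"),  # waits for r3
    (lambda a, b: a * b, ["r1", "r2"], "r3"),  # operands ready now
]
run(instrs, regs)
```

A control flow scheduler would instead resolve these dependencies centrally at decode and hold the first instruction there; the buffered, operand-driven firing above is what makes the data flow approach distributed.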

Early machines used either control flow or data flow to ensure correct execution of out-of-order instructions. The IBM 7030 [46] and the CDC 6600 [281] used a control flow approach; in the CDC 6600, this was called the scoreboard. The IBM 360 Model 91 [288] was the first system to use data flow scheduling.

The ability to issue multiple instructions in a single cycle is sometimes referred to as a superscalar implementation. Not all implementations are the same, however. They differ not only in the number of instructions issued per decode cycle, but also in the constraints placed on those instructions. A processor with a concurrently operating floating-point coprocessor can be referred to as superscalar, yet the performance advantage it offers over a simple dynamic pipeline may be marginal, because programs may not contain the required mix of integer and floating-point instructions. In such superscalar implementations it is especially important for the user to understand the capabilities, restrictions, and limitations of the hardware when assessing any advantage it might offer in the execution of programs.

7.6.5 Two Scheduling Implementations

In this section, we look at two simple prototypical scheduling implementations. Both have N = 1 and M = 1, but allow out-of-order execution (PA).