
Figure 7.27
A partial VLIW format. Each fragment concurrently accesses a
single centralized register set.
7.5 Multiple-Issue Machines
The alternative to vector processors is multiple-issue machines, which fall into two broad classes: statically scheduled and dynamically scheduled. In principle, the two classes are quite similar: dependencies among groups of instructions are evaluated, and groups found to be independent are dispatched simultaneously to multiple execution units. In a statically scheduled processor, this detection is done by the compiler, which assembles instructions into packets that are decoded and executed at run time. In a dynamically scheduled processor, independent instructions may also be identified at compile time and the code arranged to optimize execution patterns, but the ultimate selection of instructions to be executed or dispatched is made by the hardware decoder at run time. As a result, a dynamically scheduled processor may use an instruction representation indistinguishable from that of a slower pipelined processor, whereas a statically scheduled processor must carry some additional information, implicit or explicit, indicating instruction packet boundaries.
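The packet-formation step described above can be sketched in a few lines. This is a minimal illustration, not the algorithm of any particular compiler: instructions are represented as hypothetical (dest, src1, src2) register triples, and consecutive instructions are grouped into one packet as long as no member reads or writes a register written by an earlier member of the same packet.

```python
def form_packets(instructions, max_issue=4):
    """Group instructions into packets of mutually independent operations.

    Each instruction is a (dest, src1, src2) tuple of register names
    (a simplified, hypothetical encoding).
    """
    packets = []
    current, written = [], set()
    for dest, src1, src2 in instructions:
        # A true (RAW) or output (WAW) dependence on an earlier member
        # of the packet, or a full packet, forces a packet break.
        depends = dest in written or src1 in written or src2 in written
        if depends or len(current) == max_issue:
            packets.append(current)
            current, written = [], set()
        current.append((dest, src1, src2))
        written.add(dest)
    if current:
        packets.append(current)
    return packets

code = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r5", "r6"),   # independent of the first instruction
    ("r7", "r1", "r4"),   # reads r1 and r4, so it starts a new packet
]
print(form_packets(code))  # two packets: [first two], [third]
```

Note that a write-after-read (WAR) conflict within a packet is harmless here, since all members of a packet read their operands before any writes take effect.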
Early static multiple-issue machines include the so-called VLIW (very long instruction word) machines [89], typified by processors from Multiflow and Cydrome. These machines use an instruction word consisting of 8 to 10 instruction fragments, each controlling a designated execution unit; the register set is therefore extensively multiported to support simultaneous access by the multiple execution units. To accommodate the multiple instruction fragments, the instruction word is typically over 200 bits long. (See Figure 7.27.) To avoid the obvious performance limitation imposed by branches, a novel compiler technology called trace scheduling was developed, which greatly reduces the dynamic frequency of branching. Branches are predicted where possible, and, on the basis of the probable success rate, the predicted path is incorporated into a larger basic block. This process continues until a block large enough to be scheduled efficiently has been formed. If an unanticipated (or unpredicted) branch occurs during execution of the code, fixup code at the end of the basic block produces the proper result for use by the target basic block.
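The trace-selection step at the heart of trace scheduling can be sketched as follows. This is a simplified illustration under assumed data structures (a control-flow graph as a successor map plus edge probabilities); it shows only how the predicted path is followed to form an enlarged block, omitting the scheduling and fixup-code generation that a real trace-scheduling compiler performs.

```python
def select_trace(cfg, probs, entry):
    """Follow the most probable successor at each branch to build a trace.

    cfg:   block name -> list of successor block names
    probs: (block, successor) -> predicted branch probability
    """
    trace, seen = [], set()
    block = entry
    while block is not None and block not in seen:
        trace.append(block)
        seen.add(block)            # never revisit a block (avoids loop cycling)
        succs = cfg.get(block, [])
        if not succs:
            break
        b = block                  # bind current block for the key function
        block = max(succs, key=lambda s: probs.get((b, s), 0.0))
    return trace

# A hypothetical four-block graph: the branch at B0 is predicted taken to B1.
cfg = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
probs = {("B0", "B1"): 0.9, ("B0", "B2"): 0.1, ("B1", "B3"): 1.0}
print(select_trace(cfg, probs, "B0"))  # predicted path: ['B0', 'B1', 'B3']
```

Instructions along the selected trace are then scheduled as if it were one basic block; the off-trace exit (here, B0 to B2) is where the compiler must insert compensation code so that a mispredicted branch still sees correct results.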
More recent attempts at multiple-issue processor design have been directed at rather lower degrees of concurrency. The potential speedup available from the Multiflow compiler using trace scheduling, under ideal processor conditions, is generally less than 3 [78]. This suggests that, in the absence of significant compiler breakthroughs, the available speedup is limited, and recent designs have generally pursued more modest objectives (Table 7.4). Johnson [149] refers to this new generation of multiple-issue machines, whether concurrency is determined statically or dynamically, as superscalar machines.