
Page 506
1. Out-of-order execution of instructions.
2. Out-of-order dispatching or issuance of instructions and out-of-order execution.
3. Issuance of multiple instructions, regardless of their order in the sequence of code.
Multiple-issue processors also fall into two classes: VLIW (very long instruction word) and superscalar. In the VLIW approach, multiple instructions are partitioned statically at compile time for concurrent execution. In the superscalar approach, a more conventional stream of instructions is partitioned dynamically, by the action of an instruction window, at execution time. Multiple-issue processors are also limited by memory; however, it is usually the data cache size rather than the main memory bandwidth that forms the basic limitation. Data cache size can be a significant limitation, especially if the code being run references vector data structures. These structures may have a significant stride, which, for small or intermediate-sized data caches and large vector sizes, results in little or no locality of reference. Effective execution of scientific code on multiple-instruction-issue processors requires software that blocks large vector data structures into smaller pieces that fit in the available data cache.
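The blocking (strip-mining) transformation described above can be sketched as follows. This is an illustrative example, not code from the text: the block size BLOCK is an assumption standing in for whatever fits the target data cache, and the chained operation (an elementwise multiply followed by an add) is chosen only to show why blocking helps; in the unblocked form the intermediate vector t is swept twice and, for large n, is evicted from the cache between the two sweeps.

```c
#include <stddef.h>

#define BLOCK 1024  /* assumed to fit comfortably in the data cache */

/* Unblocked: t is written in full, then read in full; for large n
   each element of t has left the cache before it is reused. */
void chained_unblocked(const double *a, const double *b, const double *c,
                       double *t, double *d, size_t n) {
    for (size_t i = 0; i < n; i++) t[i] = a[i] * b[i];
    for (size_t i = 0; i < n; i++) d[i] = t[i] + c[i];
}

/* Blocked (strip-mined): each BLOCK-sized strip of t is reused
   while it is still cache-resident. */
void chained_blocked(const double *a, const double *b, const double *c,
                     double *t, double *d, size_t n) {
    for (size_t lo = 0; lo < n; lo += BLOCK) {
        size_t hi = (lo + BLOCK < n) ? lo + BLOCK : n;
        for (size_t i = lo; i < hi; i++) t[i] = a[i] * b[i];
        for (size_t i = lo; i < hi; i++) d[i] = t[i] + c[i];
    }
}
```

Both routines compute the same result; only the order of memory references changes, which is exactly the kind of transformation a blocking compiler would have to perform automatically.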
7.9 Some Areas for Further Research
While vector processors represent a relatively mature organizational approach, a large number of issues remain outstanding. The vector pipeline seems ideally suited to wave pipelining, mentioned in chapter 2; such techniques require a relatively static pipeline. The possibility of extremely fast vector cycle times using wave pipelining may offer a significant advantage and compensate for the implementation difficulties.
Probably the most controversial (and hence most interesting) area is the efficiency of the multiple-issue, or superscalar, pipelined processor. Since the performance of these processors is known to be relatively limited, can one justify the cost (area) that the implementations must occupy to deliver this performance? The issue returns to one mentioned in chapter 2: measuring the efficiency of an implementation as a function of the area it occupies. These questions cannot be answered in the isolation of a processor design; they must include memory hierarchy considerations: cache and bandwidth requirements. In future implementations, it may be a more efficient use of area to include multiple processors on a single chip rather than to design extremely complex, but only marginally faster, single processors.
Both vector processors and multiple-issue machines require significant compiler support. "Vectorizing" compilers are generally available, but compilers that block vectors to fit the data caches of either vector or multiple-issue processors are not yet available. Compiler optimization techniques will certainly play a pivotal role in determining the ultimate evolution of concurrent processor architecture.

 