
Page 426
rency. In vector processors, a single vector instruction replaces multiple scalar instructions. As we shall see, vector processors depend heavily on the ability of the compiler to vectorize the code, that is, to transform loops into sequences of vector operations. Similarly, multiple-issue machines require the compiler to detect sequences of instructions whose effects are independent of one another. (See study 7.2.) Without such code structuring, there is no concurrent instruction execution; the additional resources designed into the concurrent processor go unused, and it achieves no speedup over a pipelined processor.
If a processor is to execute multiple instructions concurrently, it requires additional execution resources, such as adders and multipliers, so that independent instructions can complete simultaneously. Associated with these resources is additional control to manage the dispatching and scheduling of concurrent operations.
Finally, the concurrent processor depends heavily on the memory system to supply the operand and instruction bandwidth required to execute programs at the desired rate. In many ways, the memory system design is the key to achieving effective concurrent processor hardware [249, 301].
In this chapter, we look at two general approaches to realizing concurrent processors: the vector processor and the multiple-issue processor. Multiple-issue processors are further subdivided into VLIW (very long instruction word) and superscalar processors. The premise behind the concurrency in the vector processor differs somewhat from that of the multiple-issue processor. The premise underlying the vector processor is that the original program either explicitly declares many of the data operands to be vectors or arrays, or implicitly uses loops whose data references can be expressed as references to a vector of operands.
The premise behind the multiple-issue machine is simply that instructions can be found whose effects are independent of one another. The search for independent instructions may be done at run time by the hardware or at compile time by the compiler. The scope of the search differs depending on which approach is chosen: at run time, the search for concurrent instructions is restricted to the locality of the currently executing instruction, while at compile time the compiler may search the entire program for sets of concurrent (hence independent) instructions. Whether or not the hardware supports run-time concurrent instruction detection, it is always better to support the multiple-issue processor with the best possible compiler technology, so as to provide the maximum program speedup.
Program speedup resulting from concurrent processing is central to understanding the effectiveness of the concurrent processor. We introduce the notion of speedup here.
Speedup (Sp) is:

Sp = T1 / Tp
where T1 is the time it takes a nonconcurrent, single-pipelined processor to execute a task, and Tp is the time it takes the concurrent processor to execute the same task. The subscript p is the maximum degree of instruction-level concurrency available in the concurrent processor. Since the minimum time it takes the concurrent processor to execute the program is Tp = T1/p, the maximum speedup is p.
