
Page 505
7.7.3 Alternative Organizations
There are two obvious alternatives for overcoming the limitations of either processor type. If the memory implementation allows reasonably efficient, cost-effective direct access to memory, then the multiple-issue machine can profit significantly from the ability to force certain data structures (e.g., vectors and arrays) to remain in memory, keeping them out of the data cache. Of course, even then, vector references directed at memory must still be checked against the cache directory to ensure that a data item has not been accessed under a scalar alias. Another approach is to include a very large cache for all processors, whether vector or scalar. According to a study by Gee and Smith [99], a cache of about 4 MB would be sufficient to capture most structured data. If this cache were designed to match the required vector access bandwidth, it could provide a significant overall improvement in performance. It would have a double advantage: it would:
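The benefit of keeping vector streams out of the data cache can be illustrated with a toy simulation (a hypothetical sketch, not from the text; the cache size and reference streams are illustrative): a small LRU cache holds a scalar working set that fits entirely within it, while a long vector is streamed past; allocating cache lines for the vector evicts the scalar lines on every pass, whereas bypassing the vector preserves them. Note that even bypassed vector references still probe the directory, as the alias check requires.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully associative LRU cache holding `capacity` line addresses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = self.misses = 0

    def access(self, addr, allocate=True):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)   # mark most recently used
        else:
            self.misses += 1
            if allocate:
                self.lines[addr] = True
                if len(self.lines) > self.capacity:
                    self.lines.popitem(last=False)  # evict LRU line

def run(bypass_vector):
    cache = LRUCache(capacity=64)
    scalars = range(0, 64)               # scalar working set: fits in cache
    vector = range(1000, 1000 + 4096)    # long vector stream
    for rep in range(4):
        for a in scalars:
            cache.access(a)              # scalar references always cached
        for a in vector:
            # Vector references still probe the directory (alias check)
            # but may be forbidden from allocating a line.
            cache.access(a, allocate=not bypass_vector)
    return cache.hits, cache.misses

print("vector cached:   hits=%d misses=%d" % run(bypass_vector=False))
print("vector bypassed: hits=%d misses=%d" % run(bypass_vector=True))
```

In this sketch, caching the vector stream destroys all scalar reuse, while bypassing it lets the scalar working set hit in the cache on every pass after the first.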
1. Reduce the n½, or effective pipeline startup time, by providing quicker access to the data structures, and
2. Provide a uniform access pattern between cache and main memory, so that main memory need not be accessed by both scalar cache and single-word vector references.
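The first advantage can be made concrete with the standard vector timing model (Hockney's notation, in which n½ is the half-performance vector length): T(n) = t_start + n * t_elem, giving asymptotic rate r∞ = 1/t_elem and n½ = t_start/t_elem, so shrinking the startup path to the data directly shrinks n½. A brief numeric sketch, with illustrative parameter values not taken from the text:

```python
def vector_time(n, t_start, t_elem=1.0):
    """Cycles to process an n-element vector: startup plus per-element streaming."""
    return t_start + n * t_elem

def achieved_rate(n, t_start, t_elem=1.0):
    """Elements per cycle actually delivered for vector length n."""
    return n / vector_time(n, t_start, t_elem)

# With t_elem = 1, the peak rate r_inf is 1 element/cycle and n_1/2 = t_start:
# achieved_rate(t_start, t_start) is exactly half of peak.
for t_start in (100, 25):  # illustrative: slow memory path vs. large fast cache
    print(f"t_start={t_start:3d}: n_1/2={t_start}, "
          f"rate at n=64 is {achieved_rate(64, t_start):.2f} elem/cycle")
```

With the shorter startup time, a 64-element vector already runs well above half of peak; with the longer one, it does not, which is precisely the n½ reduction claimed above.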
7.8 Conclusions
Vector processors and multiple-instruction-issue processors are alternative approaches to realizing performance in excess of one instruction per cycle (less than 1 CPI), the usual limit for a pipelined processor. Historically, the evolution of microprocessors has closely followed that of mainframe computers, with microprocessors increasingly adopting techniques pioneered in mainframes. It is in the area of highly concurrent processors targeted at under 1 CPI that we see a divergence between the mainframe and microprocessor approaches. Mainframes (and supercomputers) have adopted a vector processor approach to concurrent processing. This approach relies heavily on the availability of multiple large (vector) register sets, which are loaded directly from main store. In turn, these vector registers make rapid pipelined use of the functional units. These processors depend on the ability of the compiler to detect large numeric data structures organized as vectors, and they are generally limited by available memory bandwidth. As memory chips have increased in size, it has become increasingly difficult and/or expensive to design memory systems that can supply the very high data rates required by the vector processor.
Pipelined processors based on out-of-order execution and concurrent execution of multiple instructions are a natural extension of the pipelined processor. If instructions are allowed to complete their execution out of order, processor performance improves. Performance can be enhanced at three different levels of implementation, each requiring greater implementation complexity but affording a higher level of performance:

 