Page 428
Figure 7.1 For an array in memory, different accessing patterns use different strides in accessing memory.
3. They organize the data argument into regular sequences that can be efficiently handled by the hardware.
4. They can represent a simple loop construct, thus removing the control overhead for loop execution.
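As a minimal sketch of point 4, the Python fragment below contrasts an element-by-element loop with a single whole-vector operation. The names `scalar_add` and `vector_add` and the count of loop-control steps are illustrative assumptions, not part of the original text; the list comprehension merely stands in for one vector instruction.

```python
def scalar_add(a, b):
    """Add two equal-length lists one element at a time, counting loop-control steps."""
    result = []
    control_steps = 0
    i = 0
    while i < len(a):
        result.append(a[i] + b[i])
        i += 1                # index update performed on every iteration
        control_steps += 1    # one loop-control step per element
    return result, control_steps


def vector_add(a, b):
    """Stand-in for one vector instruction: the whole loop is a single operation."""
    return [x + y for x, y in zip(a, b)], 1


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
scalar_result, scalar_control = scalar_add(a, b)
vector_result, vector_control = vector_add(a, b)
```

Both versions produce the same sums; the difference in this toy model is only in how many loop-control steps are counted.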
Vector processing hardware does not come for free. It requires some obvious extensions to the instruction set, together with (for best performance) extensions to the functional units, the register sets, and particularly the memory of the system. Early vector processors primarily used memory-to-memory instruction formats. Vectors, as they are usually derived from large arrays, are the one data structure that is not always well managed by a data cache. Accessing array elements separated by an addressing distance (called the stride) may completely fill a small to intermediate-sized data cache with data of little or no temporal locality; the items must then be replaced before any of them is reused (Figure 7.1).
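The effect of stride on a cache can be sketched as follows. This is an illustration under stated assumptions, not a model of any particular machine: a hypothetical 4 x 4 array stored in row-major order, one word per element, and a cache line of 4 words. The helper names `strided_addresses` and `lines_touched` are invented for the example.

```python
N = 4            # assumed: 4 x 4 array, row-major, one word per element
LINE_WORDS = 4   # assumed: cache line size in words


def strided_addresses(base, stride, count):
    """Word addresses touched when reading `count` elements at a given stride."""
    return [base + i * stride for i in range(count)]


def lines_touched(addresses, line_words=LINE_WORDS):
    """Number of distinct cache lines that the given addresses fall into."""
    return len({addr // line_words for addr in addresses})


row = strided_addresses(base=0, stride=1, count=N)           # along a row
column = strided_addresses(base=0, stride=N, count=N)        # down a column
diagonal = strided_addresses(base=0, stride=N + 1, count=N)  # main diagonal

row_lines = lines_touched(row)
column_lines = lines_touched(column)
diagonal_lines = lines_touched(diagonal)
```

Under these assumptions the row access stays within one cache line, while the column and diagonal accesses each bring in four lines to use a single word from each, which is the poor-locality behavior the paragraph describes.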
To decouple arithmetic processing from memory, almost all modern vector processors include vector register hardware. The vector register set is the source and destination for all vector operands. Transfers between these registers and memory usually bypass the cache. The cache then contains only scalar data objects, that is, objects not used in the vector registers (Figure 7.2).
7.2.1 Vector Functional Units
The vector registers typically consist of eight or more register sets, each consisting of 16 to 64 vector elements. Each vector element is a floating-point word (Figure 7.3).
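A toy model may help fix the idea of a vector register file that serves as source and destination for vector operands. The sketch below assumes eight registers of 64 elements each; the functions `vector_load`, `vector_add`, and `vector_store`, and their parameters, are hypothetical and do not correspond to any real instruction set.

```python
NUM_VECTOR_REGISTERS = 8   # assumed register count
VECTOR_LENGTH = 64         # assumed elements per register


def make_register_file():
    """Create the register file: a list of registers, each a list of floats."""
    return [[0.0] * VECTOR_LENGTH for _ in range(NUM_VECTOR_REGISTERS)]


def vector_load(regs, reg, memory, base, stride, count):
    """Hypothetical strided load of `count` words from memory into register `reg`."""
    if count > VECTOR_LENGTH:
        raise ValueError("count exceeds vector register length")
    for i in range(count):
        regs[reg][i] = memory[base + i * stride]


def vector_add(regs, dst, src1, src2, count):
    """Hypothetical register-to-register add over the first `count` elements."""
    for i in range(count):
        regs[dst][i] = regs[src1][i] + regs[src2][i]


def vector_store(regs, reg, memory, base, stride, count):
    """Hypothetical strided store of `count` elements from register `reg` to memory."""
    for i in range(count):
        memory[base + i * stride] = regs[reg][i]


memory = [float(i) for i in range(32)]
regs = make_register_file()
vector_load(regs, 0, memory, base=0, stride=1, count=4)   # V0 <- words 0, 1, 2, 3
vector_load(regs, 1, memory, base=0, stride=4, count=4)   # V1 <- words 0, 4, 8, 12
vector_add(regs, 2, 0, 1, count=4)                        # V2 <- V0 + V1
vector_store(regs, 2, memory, base=16, stride=1, count=4)
```

In this model the arithmetic touches only registers; memory is involved only in the explicit load and store steps.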
The vector registers access memory with special load and store instructions that will be described later. The vector execution units are usually arranged as an independent functional unit for each instruction class. Thus, as a

 