|
|
|
| Table 7.9 Estimated additional relative costs of vector processor and multiple-issue processor (assumptions: same base processor and same execution units; changes (additions) measured in A units; A = 1,481 rbe). |

| | Vector Processor | Multiple-Issue Processor |
|---|---|---|
| Vector registers | | |
| Vector memory management and buffers (m = 8) | | |
| I-cache and I-fetch | | |
| Decoder | | |
| D-cache^a | | |

| ^a Assuming the vector processor has 16 KB (scalars only); the multiple-issue processor has 32–64 KB (i.e., 16–48 KB additional). |
|
|
|
|
|
|
| larger data cache to ensure high performance. The vector processor must also have support hardware for managing access to the memory system, and the memory system itself is significantly complicated by the requirements for direct vector access. Even for processors with 64–256 MB of main storage, it is difficult to accommodate the high degrees of interleaving required to support the processor's bandwidth. Figure 7.54 illustrates memory size vs. chip size at various levels of interleaving. |
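The conflict between interleaving and memory size can be sketched numerically. The model below is a simplifying assumption, not taken from the text: every one of the m banks must deliver a full word per access and is built from whole DRAM chips, so the minimum memory is m × (chips per bank) × (chip capacity). The chip parameters (64-bit words, 64-Mbit chips organized ×16) are hypothetical and chosen only for illustration.

```python
import math


def min_memory_mib(m: int, word_bits: int, chip_mbit: int, chip_width: int) -> float:
    """Smallest memory (MiB) that supports m-way interleaving.

    Hypothetical model: each bank must supply a full word per access,
    so it needs ceil(word_bits / chip_width) chips side by side.
    chip_mbit is the chip capacity in Mbit (2**20 bits).
    """
    chips_per_bank = math.ceil(word_bits / chip_width)
    bank_mib = chips_per_bank * chip_mbit / 8  # Mbit -> MiB
    return m * bank_mib


# Illustrative (assumed) parts: 64-bit words, 64-Mbit DRAM chips organized x16.
for m in (8, 16, 32):
    print(f"m = {m:2d}: minimum memory = {min_memory_mib(m, 64, 64, 16):.0f} MiB")
```

Under these assumptions each bank holds 32 MiB, so 16-way interleaving already forces at least 512 MiB, beyond the 64–256 MB range mentioned above, and every doubling of m doubles the minimum configuration.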
|
|
|
|
|
|
|
|
| Suppose we build a vector processor with a cycle time of 10 ns, using DRAM memory modules with a memory cycle time of 80 ns. If we allow two memory accesses per processor cycle, then n, the total number of accesses per memory cycle, is 2 × (80/10) = 16. Assuming that g = 1 (recall that g is the bypass factor discussed earlier in this chapter) is achievable with a sufficiently large buffer, and reducing interleaving to the lowest practical limit (m = n), we have B(16, 16, g = 1). This gives a slowdown factor of 0.78 due to memory contention: instead of the speedup Sp during vector processing, the processor achieves only 0.78Sp. Of course, we could raise this factor by increasing the interleaving, say to m = 2n, but this doubles the minimum memory configuration. |
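The arithmetic for n, and the effect of raising m relative to n, can be sketched as follows. The chapter's B(m, n, g) formula is not reproduced here; as a plainly swapped-in stand-in, the sketch uses Strecker's well-known unbuffered approximation B(m, n) = m[1 − (1 − 1/m)^n]. Because that model does not capture the bypass buffering that g describes, it yields lower efficiencies than the 0.78 quoted above; it is meant only to show the trend as interleaving grows.

```python
def accesses_per_memory_cycle(proc_cycle_ns: float, mem_cycle_ns: float,
                              accesses_per_proc_cycle: int) -> int:
    """n: memory requests issued during one memory cycle."""
    return round(accesses_per_proc_cycle * mem_cycle_ns / proc_cycle_ns)


def strecker_bandwidth(m: int, n: int) -> float:
    """Strecker's approximation (no request buffering): expected number of
    distinct modules busy when n requests go uniformly to m modules."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


n = accesses_per_memory_cycle(10, 80, 2)   # 2 * 80 / 10 = 16
for m in (n, 2 * n):                        # minimum interleaving, then doubled
    efficiency = strecker_bandwidth(m, n) / n   # fraction of requests served
    print(f"m = {m:2d}, n = {n}: efficiency = {efficiency:.2f}")
```

In this simpler model, m = n serves about 64% of requests per memory cycle and m = 2n about 80%, which mirrors the text's point that extra interleaving buys back performance at the price of a larger minimum memory.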
|
|
|
|
|
|
|
|
| Multiple-issue machines cannot achieve their performance merely by adding execution resources. The basic hardware register set (presumably 32 registers of 64 bits each) must be arranged to support 4–6 reads and 2–3 writes per cycle (two reads and one write for each of two or three instructions issued per cycle). This also increases the area required for buses between the arithmetic units and the registers. The multiple-issue system must also be able to fetch and hold several instructions from the I-cache each cycle, which generally widens the I-fetch path between the I-cache and the instruction decoder/instruction register significantly. Finally, the instruction decoder must decode multiple instructions simultaneously and check them for independence (instruction interlocks). Table 7.9 summarizes the relative costs of the two approaches. The difference depends heavily on the size of the data cache that the multiple-issue processor needs to realize its performance potential and on the relative cost of the memory interleaving that the vector processor requires. We return to this trade-off in the next section. |
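The register-port and decoder costs described above can be sketched as follows. As a simplifying assumption, each instruction reads two source registers and writes one destination, so a w-issue machine needs 2w read ports and w write ports, and the decoder must compare every later instruction in the issue group against every earlier one. The `Instr` type and register numbers are hypothetical, and the check covers only register dependences (RAW and WAW) within one issue group, not memory or structural hazards.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: int               # destination register number (hypothetical encoding)
    srcs: tuple[int, int]   # two source register numbers


def register_ports(issue_width: int) -> tuple[int, int]:
    """(read ports, write ports) for 2-source, 1-destination instructions."""
    return 2 * issue_width, issue_width


def independent(group: list[Instr]) -> bool:
    """True if no instruction reads (RAW) or overwrites (WAW) a register
    written by an earlier instruction in the same issue group."""
    for j, later in enumerate(group):
        for earlier in group[:j]:
            if earlier.dest in later.srcs or earlier.dest == later.dest:
                return False
    return True


def raw_comparators(issue_width: int) -> int:
    """Source-vs-destination comparisons per cycle: each of the 2 sources
    of instruction j is compared with the j earlier destinations."""
    return sum(2 * j for j in range(issue_width))   # equals w * (w - 1)


for w in (2, 3):
    reads, writes = register_ports(w)
    print(f"{w}-issue: {reads} reads, {writes} writes, "
          f"{raw_comparators(w)} RAW comparators")

print(independent([Instr(1, (2, 3)), Instr(4, (5, 6))]))  # no shared registers
print(independent([Instr(1, (2, 3)), Instr(4, (1, 6))]))  # RAW on register 1
```

For w = 2 and w = 3 the port counts match the 4–6 reads and 2–3 writes cited above, while the comparator count grows as w(w − 1), one reason decoder cost rises quickly with issue width.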
|
|
|
|
|