In the definition of Sp, it is assumed that the program representations for the simple pipelined processor (the T1 time) and the concurrent processor (the Tp time) are optimized for their respective organizations. Several things are important to notice about the concept of speedup: |
|
1. Speedup is referenced to the best pipelined processor algorithm (for T1) and the best concurrent processor algorithm (for Tp). These are generally not the same algorithm. |
|
2. Speedup is effectively determined by the harmonic mean of T1/Tp, not the arithmetic mean. Suppose a concurrent processor executes half of its workload at the same rate as a uniprocessor (Sp = 1), but executes the other half in only 1/3 of the time the uniprocessor requires (Sp = 3). The overall speedup is: |
|
$$S_p = \frac{T_1}{T_p} = \frac{1}{0.5/1 + 0.5/3} = 1.5$$ |
|
Note that this is not the arithmetic mean of the two speedups, which would be 0.5(1) + 0.5(3) = 0.5 + 1.5 = 2. |
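The difference between the two means can be checked numerically. The following is a minimal sketch, assuming the workload fractions are measured as shares of the uniprocessor time T1; the function names `overall_speedup` and `arithmetic_mean_speedup` are illustrative and do not come from the text.

```python
def overall_speedup(fractions, speedups):
    """Weighted harmonic mean of per-portion speedups.

    fractions: share of the uniprocessor time T1 spent in each portion
               (assumed to sum to 1).
    speedups:  speedup achieved by the concurrent processor on each portion.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # With T1 normalized to 1, Tp is the sum of each portion's reduced time.
    t_p = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / t_p


def arithmetic_mean_speedup(fractions, speedups):
    """Weighted arithmetic mean, shown only for contrast."""
    return sum(f * s for f, s in zip(fractions, speedups))


# Half the workload at Sp = 1, the other half at Sp = 3.
harmonic = overall_speedup([0.5, 0.5], [1.0, 3.0])
arithmetic = arithmetic_mean_speedup([0.5, 0.5], [1.0, 3.0])
print(f"harmonic: {harmonic:.2f}, arithmetic: {arithmetic:.2f}")
```

The harmonic form weights each portion by the time it actually takes, so the slower half of the workload dominates the result.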
|
A classic way to improve performance in high-speed machines is to extend the instruction set and the architecture to support commonly used vector operations directly in hardware. Doing so generally reduces or eliminates the loop-control overhead that would otherwise be needed to express the vector operation as a loop construct. Thus, arithmetic operations of the form: |
|
|
|
|
|
|
|
|
are represented by a vector instruction of the type |
|
|
|
|
|
|
|
|
where VOP represents a vector operation, and V1, V2, V3 indicate specific vector registers. The operation performed is: |
|
|
|
|
|
|
|
|
for all register values within each vector register. |
|
An individual element of a vector register is designated by an index suffix, as in V1.X. Thus, the elements of a 64-element vector register V1 are V1.1 through V1.64. |
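The contrast between a loop construct and a single vector operation can be sketched in software. This is an illustrative model only: the helpers `scalar_loop`, `vop`, and `element` are hypothetical names, and the choice of two source registers producing a separate result register is an assumption, since the operand roles are not visible in this excerpt.

```python
import operator

VECTOR_LENGTH = 64  # register length taken from the 64-element example in the text


def scalar_loop(op, b, c):
    """Element-by-element loop: loop control is executed once per element."""
    a = [0] * len(b)
    i = 0
    while i < len(b):      # compare and branch on every iteration (loop overhead)
        a[i] = op(b[i], c[i])
        i += 1             # index update on every iteration (loop overhead)
    return a


def vop(op, v1, v2):
    """Hypothetical model of one vector instruction applied to whole registers.

    Assumption: the operation is applied element-wise to two source registers
    and the results form a third register.
    """
    if len(v1) != VECTOR_LENGTH or len(v2) != VECTOR_LENGTH:
        raise ValueError("vector registers must hold VECTOR_LENGTH elements")
    return [op(x, y) for x, y in zip(v1, v2)]


def element(register, x):
    """Return element x of a register using the 1-based V1.X designation."""
    if not 1 <= x <= len(register):
        raise IndexError("element index out of range")
    return register[x - 1]


v1 = list(range(1, VECTOR_LENGTH + 1))   # V1.1 .. V1.64 hold 1 .. 64
v2 = [10] * VECTOR_LENGTH
v3 = vop(operator.add, v1, v2)

# One vector operation gives the same values as the 64-iteration loop.
print(v3 == scalar_loop(operator.add, v1, v2))
print(element(v3, 1), element(v3, 64))
```

In hardware, the point is that the per-iteration compare, branch, and index update in `scalar_loop` disappear from the instruction stream; the Python list comprehension merely stands in for that single instruction.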
|
Vector instructions are effective in several ways. |
|
1. They significantly improve code density. |
|
2. They reduce the number of instructions required to execute a program (that is, they reduce the required instruction bandwidth, or I-bandwidth). |