Figure 8.12
Ep and s for various processors vs. number of processors (p). Data from Karp and Flatt [155].
program is serial, then the maximum speedup is surely less than 10, since even if the remaining 90% of the program executed in no time at all, we would still be limited to a speedup of 1/s. As Karp and Flatt observed, we should expect s to be an increasing function of p: as p increases, there will be additional memory contention, resource contention, and scheduling difficulties. Karp and Flatt provide data on measured efficiency and the serial fraction s for a series of machines running the Linpack problem (Figure 8.12). Linpack has far more logical parallelism available than any of the tested multiprocessor configurations can exploit; hence, partitioning imposes no limit in this example. For rather large machines employing relatively small numbers of processors (such as the Cray), there is little change in the serial fraction as p increases. The Alliant has a higher serial fraction, apparently because of larger task-switching overhead; hence it has lower efficiency, and its serial fraction increases with p. If, as might be inferred from Figure 8.11, synchronization overhead increases with the number of processors and efficiency falls as s increases, this places a serious limitation on large multiprocessor configurations, since, as we have seen earlier, speedup is generally limited to 1/s.
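The experimentally determined serial fraction discussed above can be computed directly from measured speedups. The sketch below implements the Karp-Flatt formula; the sample measurements are hypothetical, not the Figure 8.12 data.

```python
def serial_fraction(speedup, p):
    """Karp-Flatt experimentally determined serial fraction:
    s = (1/speedup - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

def efficiency(speedup, p):
    """Efficiency Ep = speedup / p."""
    return speedup / p

# Hypothetical (speedup, p) measurements for illustration only.
# If s stays roughly constant as p grows, the loss of efficiency is
# explained by limited parallelism; if s grows with p, overheads such
# as memory contention and scheduling costs are increasing as well.
for p, sp in [(2, 1.9), (4, 3.5), (8, 6.0)]:
    print(p, round(serial_fraction(sp, p), 4), round(efficiency(sp, p), 4))
```

A roughly constant s across the three rows would mirror the Cray behavior in Figure 8.12; an s that climbs with p would mirror the Alliant.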
Even if synchronization overhead is low, partitioning (especially control-structure limitations) serves as another bound on multiprocessor speedup. Because of limitations in the control structure, a program that is nominally amenable to execution by p processors may be able to use only a much smaller number of processors at various points during its execution. The result is generally similar to the Amdahl's Law effect that we saw in the previous discussion.
Suppose we designate by Ti,p the amount of time that a p-processor system spends executing a program while using exactly i processors. Thus, T1,p = Ts, the serial time that we introduced in our earlier discussion. R. B. Lee [182]