< previous page page_529 next page >

Page 529
Figure 8.12
Ep and s for various processors vs. number of processors (p). Data
from Karp and Flatt [155].
program is serial, then the maximum speedup is surely less than 10: even if the remaining 90% of the program executed in no time at all, we would still be limited to a speedup of 1/s. As Karp and Flatt observed, we would expect s to be an increasing function of p: as p increases, there will surely be additional memory contention, resource contention, and scheduling difficulties. Karp and Flatt provide data on measured efficiency and the serial fraction s for a series of machines on the Linpack problem (Figure 8.12). Linpack has significantly more logical parallelism available than any of the tested multiprocessor configurations can exploit; hence, this example has no partitioning limitations. For rather large machines employing relatively small numbers of processors (such as the Cray), there is little change in the serial fraction as p is increased. The Alliant has a higher serial fraction, apparently because of larger task-switching overhead; hence, it has a lower efficiency, and its serial fraction increases with p. If, as might be inferred from Figure 8.11, synchronization overhead increases with the number of processors while efficiency falls as s increases, then large multiprocessor configurations face a serious limitation, since, as we have seen earlier, speedup is generally limited to 1/s.
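The serial fraction discussed above can be recovered from measurements: Karp and Flatt's experimentally determined serial fraction follows from inverting Amdahl's Law, s = (1/speedup − 1/p)/(1 − 1/p). A minimal sketch (the function names and sample numbers are illustrative, not from the text):

```python
def karp_flatt_serial_fraction(speedup, p):
    """Experimentally determined serial fraction s, computed by inverting
    Amdahl's Law given a measured speedup on p processors:
        s = (1/speedup - 1/p) / (1 - 1/p)
    """
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

def amdahl_speedup_bound(s):
    """Upper bound on speedup implied by a serial fraction s: 1/s."""
    return 1.0 / s

# Example: a measured speedup of 6 on 8 processors implies s = 1/21,
# so speedup can never exceed 21 no matter how many processors are added.
s = karp_flatt_serial_fraction(6.0, 8)
bound = amdahl_speedup_bound(s)
```

Computing s at several values of p, as in Figure 8.12, reveals whether the serial fraction stays flat (little contention overhead) or grows with p (increasing synchronization and scheduling costs).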
Even if synchronization overhead is low, partitioning (especially control structure limitations) serves as another bound on multiprocessor speedup. Because of limitations in the control structure, a program that is amenable to execution by p processors may be able to use only a much smaller number of processors at various points during its execution. The result is generally similar to the Amdahl's Law effect that we saw in the previous discussion.
Suppose we designate as Ti,p the amount of time that a p-processor system spends executing a program using i processors. Thus, T1,p = Ts, the serial time that we introduced in our earlier discussion. R. B. Lee [182]
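The profile quantities Ti,p just defined can be used to estimate the speedup a partitioning-limited program actually achieves. A minimal sketch, under the illustrative assumption that work is conserved (one processor would need sum of i*Ti,p time to do the same work); the profile values are hypothetical, not from the text:

```python
def speedup_from_profile(T):
    """T maps i -> Ti,p, the time a p-processor system spends executing
    with exactly i processors active.  The parallel execution time is the
    sum of the Ti,p; assuming work conservation, one processor would need
    sum(i * Ti,p) to do the same work, so speedup is their ratio.
    """
    parallel_time = sum(T.values())          # total wall-clock time on p processors
    serial_time = sum(i * t for i, t in T.items())  # equivalent one-processor time
    return serial_time / parallel_time

# Example profile: 4 time units fully serial, 2 units on 2 processors,
# 1 unit on 4 processors.  Serial phases dominate the achievable speedup.
profile = {1: 4.0, 2: 2.0, 4: 1.0}
s_up = speedup_from_profile(profile)   # 12/7, well below the 4 processors available
```

Note how the serial phase (i = 1) caps the result in the Amdahl's Law manner described above: even with 4 processors available, the speedup here is under 2.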

 