|
|
|
|
|
|
|
fault-tolerant computation. Generally, such systems provide no program speedup over a single processor. Systems that duplicate computations, or that triplicate computations and vote on the results (called triple modular redundant, or TMR, processors), are examples of designing for fault tolerance.
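The TMR idea can be illustrated with a short sketch. The function below is a hypothetical illustration, not any particular system's implementation: it runs the same computation on three replicas and accepts the majority result, so a single faulty replica is outvoted.

```python
from collections import Counter

def tmr_execute(replicas, x):
    """Run the same computation on three replicas and majority-vote the results."""
    results = [f(x) for f in replicas]
    winner, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        # All three replicas disagree: the fault cannot be masked by voting.
        raise RuntimeError("no majority among replicas")
    return winner

# A single faulty replica (modeling a transient fault) is outvoted:
square = lambda x: x * x
faulty = lambda x: x * x + 1
print(tmr_execute([square, square, faulty], 5))  # -> 25
```

Note that all three replicas execute, so the system spends three processors to get the throughput of one; the redundancy buys reliability, not speed.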
|
|
|
|
|
|
|
|
The objective of this chapter is to cover processor design for program speedup. We take as a premise that no reasonable design for speedup ought to come at the expense of fault tolerance: it is generally not acceptable for the whole multiprocessor system to fail if any one of its processors fails. Thus, the designer must be aware of the general need for fault tolerance in multiprocessor systems and do nothing to jeopardize it, while directing primary design attention to program execution speedup.
|
|
|
|
|
|
|
|
Since multiprocessors simply consist of multiple computing elements (multiple identical processors of the type described in Chapters 4 and 7), each computing element is subject to the same basic design issues discussed in those chapters: it is slowed by branch delays, cache misses, and so on. The multiprocessor configuration, however, introduces speedup potential as well as additional sources of delay and performance degradation. Performance bottlenecks in multiprocessors generally arise from the way the program is decomposed for concurrent execution on multiple processors.
|
|
|
|
|
|
|
|
These basic issues (see Figure 8.1) may be summarized as follows: |
|
|
|
|
|
|
|
|
1. Partitioning. This is the process whereby the original program is decomposed into basic sub-program units, or tasks, each of which can be assigned to a separate processor. Partitioning is performed either through programmer directives in the original source program or automatically by the compiler.
|
|
|
|
|
|
|
|
2. Scheduling of tasks. Associated with each program is a flow of control among its sub-program units or tasks. Certain tasks must complete before others can be initiated (i.e., one depends on the other); other tasks can be executed independently of the main program execution. The scheduler's run-time function is to arrange the order of task execution so as to minimize overall program execution time.
|
|
|
|
|
|
|
|
3. Communication and synchronization. It does the system no good merely to schedule the initiation of tasks in the proper order unless the data the tasks require is made available efficiently. Thus communication time must be minimized, and the receiving task must be aware of the synchronization protocol being used. A related issue is memory coherency: this property ensures that the transmitting and receiving elements have the same, or coherent, picture of the contents of memory, at least for the data communicated between the two tasks.
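The three issues above can be seen working together in a small sketch. The task graph, names, and helper below are hypothetical, assumed only for illustration: partitioning yields a set of tasks, the scheduler submits them in dependency order to a pool of workers, and futures serve as the communication and synchronization mechanism (a task blocks until each predecessor's result is available).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

def run_graph(tasks, deps, workers=4):
    """Execute a partitioned program on a pool of workers.

    tasks: name -> callable taking its predecessors' results as arguments
    deps:  name -> list of predecessor task names (the task graph)
    """
    futures = {}

    def run(name):
        # Communication + synchronization: block until each predecessor's
        # result is available, then pass those results to this task.
        args = [futures[p].result() for p in deps[name]]
        return tasks[name](*args)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Scheduling: submit tasks in an order consistent with the
        # dependence constraints (predecessors first).
        for name in TopologicalSorter(deps).static_order():
            futures[name] = pool.submit(run, name)
    return {name: f.result() for name, f in futures.items()}

# Hypothetical four-task partitioning of  d = (a + b) * 2.
# Independent tasks "a" and "b" may run concurrently; "c" and "d" must wait.
tasks = {
    "a": lambda: 3,
    "b": lambda: 4,
    "c": lambda x, y: x + y,
    "d": lambda x: x * 2,
}
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(run_graph(tasks, deps)["d"])  # -> 14
```

Because tasks are submitted in topological order and the pool dequeues them first-in first-out, a task is never dequeued before its predecessors, so blocking on a predecessor's future cannot deadlock the pool.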
|
|
|
|
|
|
|
|
Sarkar [252] provides a rather complete treatment of these three problems in the context of a single assignment language, SISAL. |
|
|
|
|
|