|
|
Dependency   T1   T2   T3
T1            0    -    -
T2            1    0    -
T3            0    1    0
|
|
|
|
|
Figure 8.4 Dependency matrix. A 'one' entry indicates a dependency; e.g., in this figure T2 depends on T1 and T3 depends on T2.
|
|
|
|
|
|
|
|
3. Implicit parallelism, which can be detected by compilers such as Parafrase [235]. These compilers use sophisticated techniques, such as data dependency tests, to find the parallelism in ordinary serial code and transform it into a form that executes efficiently on multiprocessors.
|
|
|
|
|
|
|
|
The techniques for partitioning programs into independent tasks suitable for concurrent execution use the same type of control graph to define task precedence as we used with instruction-level concurrency. This control graph can be represented as a dependency matrix (Figure 8.4). The same types of dependencies apply, and hence the same theory of task dependencies expresses the relationships among the concurrent tasks. The basic difference between instruction-level concurrency and task-level concurrency through compilation is the scope of the detection process. While instruction-level concurrency detection can be compiler-assisted, it is usually complemented by a run-time examination of the instructions in the instruction window. The compiler sees the entire program and, at least in principle, has a great deal more flexibility in identifying and assigning independent events; a run-time concurrency detector sees the actual state of the machine, but can optimize instruction issue only over the small scope of the instruction window.
|
|
|
|
|
|
|
|
EXAMPLE 8.1 FINDING PARALLELISM IN A PROGRAM |
|
|
|
|
|
|
|
|
Section (a) of Figure 8.5 shows a small piece of code that replaces an array a[] of size 3n with a subsampled version of size n. Sections (b), (c), and (d) are successive attempts to improve the achievable execution concurrency, and section (e) shows the final subarray alongside the original array.
|
|
|
|
|
|
|
|
The loop in (a) cannot be split to run on multiple processors because of the induction variable m and the forward dependency on the array a[]. On each iteration of the loop, the value of m is incremented by 3 from its previous value. We might write the code this way to replace an expensive multiplication with an addition on machines where a multiplication takes more cycles than an addition.
|
|
|
|
|
|
|
|
The code can be rewritten as shown in (b) to eliminate m's dependence on its value in previous iterations of the loop. We hope that the increase in parallelization will offset any additional cycles lost by replacing the addition with a multiplication. |
|
|
|
|
|
|
|
|
The code in (b) still has an essential or forward dependency with respect to array a[]. If all of the iterations were scheduled at once, it would be |
|
|
|
|
|