Figure 2.9 Pipelined processors.
2.2.4 Pipelined Processors
Optimizing the partitioning of instructions into cycles is only one way to speed up program execution. Another approach exploits concurrency in instruction execution. One could, for example, begin fetching the next instruction as soon as the current instruction has been decoded. An extension of this is the pipelined machine, in which, as soon as one instruction has begun (i.e., has been decoded), the next instruction is decoded. In each cycle, one instruction is fetched, another is decoded, and another is executed.
Pipelined machine instruction execution is shown in Figure 2.9. Suppose we have a very simple instruction execution process consisting of four cycles:
Instruction fetch from cache into the IR |
Data fetch from either memory or register set (ignoring address generation, etc.) |
Pipelined machines attempt to keep each pipeline segment busy all the time. As soon as the first instruction completes the IF cycle, the next instruction is fetched, and so on, much as on an assembly line. Pipelined instruction execution can significantly reduce program running time. If the pipeline has four segments, or stages, the maximum speedup is a factor of four over the well-mapped machine.
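
The assembly-line behavior described above can be sketched as a small simulation. This is only an illustration under idealized assumptions: every stage takes exactly one cycle and the pipeline never stalls. The stage label IF comes from the text; the labels D, DF, and EX for the remaining stages, and the function names, are assumed here for illustration.

```python
# "IF" appears in the text; "D", "DF", and "EX" are assumed labels
# for the decode, data fetch, and execute stages.
STAGES = ["IF", "D", "DF", "EX"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return, for each cycle, the instruction index held by each stage.

    Assumes an ideal pipeline: one cycle per stage and no stalls.
    A stage that is idle in a given cycle holds None.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage_index in range(len(stages)):
            # Instruction i occupies stage s during cycle i + s.
            instr = cycle - stage_index
            row.append(instr if 0 <= instr < num_instructions else None)
        schedule.append(row)
    return schedule


def ideal_speedup(num_instructions, num_stages=len(STAGES)):
    """Speedup over running each instruction's stages back to back."""
    unpipelined_cycles = num_instructions * num_stages
    pipelined_cycles = num_instructions + num_stages - 1
    return unpipelined_cycles / pipelined_cycles


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(5)):
        cells = [
            f"{stage}:{'-' if instr is None else 'I' + str(instr)}"
            for stage, instr in zip(STAGES, row)
        ]
        print(f"cycle {cycle}: " + "  ".join(cells))
    for n in (1, 4, 100, 10_000):
        print(n, round(ideal_speedup(n), 3))
```

In this idealized model, n instructions finish in n + 3 cycles instead of 4n, so the speedup approaches four only for long instruction streams and stays just under four for any finite one.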
Of course, speed has a price: pipelined processors are complex and costly, and one never really achieves the full four-times speedup. Still, such processors are cost-effective overall as long as the chip area can accommodate the added complexity. Almost all recently introduced microprocessors are pipelined.
A basic optimization for the designer of a pipelined processor is the partitioning of the pipeline into concurrently operating segments. The more segments, the higher the maximum speedup. However, each new segment carries clocking overhead with it, which adversely affects performance.
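
The trade-off between segment count and clocking overhead can be illustrated with a simple model. The model is an assumption made here for illustration, not taken from the text: the total instruction time T is divided evenly among S segments, and each segment adds a fixed clocking overhead C to the cycle, so the cycle time is T/S + C. The numeric values are hypothetical.

```python
def cycle_time(total_time, segments, overhead):
    """Cycle time under the assumed model: T / S + C."""
    return total_time / segments + overhead


def speedup(total_time, segments, overhead):
    """Throughput gain relative to a single-segment (S = 1) machine.

    Assumes the pipeline completes one instruction per cycle (no stalls).
    """
    baseline = cycle_time(total_time, 1, overhead)
    return baseline / cycle_time(total_time, segments, overhead)


if __name__ == "__main__":
    T = 40.0  # hypothetical total instruction time, arbitrary units
    C = 2.0   # hypothetical clocking overhead per segment, same units
    for s in (1, 2, 4, 8, 16, 40):
        print(f"S={s:2d}  cycle={cycle_time(T, s, C):6.2f}  "
              f"speedup={speedup(T, s, C):5.2f}")
```

With these assumed values, four segments give a speedup of 3.5 rather than 4, and no number of segments can push the speedup past (T + C) / C = 21, because the overhead term does not shrink as segments are added.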
If we ignore quantization effects, we can determine an optimal cycle time, Δt, and hence the degree of functional segmentation for a simple pipelined