
Page 260
Dependency detection comes at the cost of interlocks, however. The interlocks consist of logic, associated with the decoder, that detects dependencies and ensures correct logical operation of the machine as it executes code sequences.
Complete analysis requires multiple models, multiple simulators, and a well-developed, representative workload, but a good estimate of performance for such a workload can be obtained with the linear additive delay model outlined in this chapter.
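As a sketch of the linear additive delay model (the event frequencies and penalties below are illustrative assumptions, not figures from this chapter), the estimated cycles per instruction are the base CPI plus the sum of each delay source's frequency times its per-event penalty:

```python
# Linear additive delay model: total CPI is the base (decode-limited) CPI
# plus the sum of each delay source's frequency times its per-event penalty.
# All frequencies and penalties below are illustrative assumptions.

def linear_additive_cpi(base_cpi, delay_sources):
    """delay_sources: list of (events_per_instruction, penalty_cycles) pairs."""
    return base_cpi + sum(freq * penalty for freq, penalty in delay_sources)

# Hypothetical workload mix: branch delay, execution run-on, dependency stalls.
sources = [
    (0.20, 1.5),   # branches: 0.20 per instruction, 1.5-cycle average penalty
    (0.05, 4.0),   # run-on operations (e.g., multiply/divide): 4 extra cycles
    (0.10, 1.0),   # dependency interlocks: 1-cycle stall
]
cpi = linear_additive_cpi(1.0, sources)
print(cpi)  # 1.0 + 0.30 + 0.20 + 0.10 = 1.6
```

Because each term is simply frequency times penalty, the model makes it easy to see which delay source dominates and where a design change pays off most.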
We will study more complex processors in Chapter 7, after we better understand the role of cache (Chapter 5) and memory (Chapter 6).
4.10 Some Areas for Further Research
Analysis of pipelined processor designs, new algorithms for improving performance, and the organization and analysis of multiple-instruction-issue pipelined machines have become hot topics in computer hardware research. Even a cursory review of any of the recent computer architecture conference proceedings will illustrate the scope of activity and the depth of interest in these areas.
Despite being well studied, pipelined processor design remains a fruitful research area, since many tradeoffs and optimizations are possible. The basic tradeoff between cycle time and the amount of work completed within the cycle (hence, the number of cycles for instruction execution) remains a key issue. Techniques that improve branch performance, execution run-on, or dependency delay can be important secondary factors in the cycle-time issue. Design techniques that maintain good pipeline performance despite a large instruction-execution latency (number of cycles) allow the use of shorter cycles and can potentially realize higher overall performance.
4.11 Data Notes
Data Note 1: The Linear Performance Model.
The model of pipeline performance presented here is based on a linear accumulation of various delays. The implicit assumption is that these delays occur independently and that their effects on performance accumulate linearly; that is, they are not overlapped. While this is generally true, there are exceptions. For example, a divide operation (run-on) may continue despite the fact that a subsequent instruction causes a cache miss. The net effect is that the linear accumulation of delays ought to be a conservative estimate of performance. (Dubey [76] presents a more accurate model.)
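The overlap effect can be illustrated with a small sketch (the cycle counts are assumptions chosen for illustration): when a divide run-on proceeds in parallel with a cache miss, the actual stall is closer to the larger of the two delays, while the linear model charges their sum.

```python
# Two concurrent delay events: a divide run-on and a cache miss on a
# subsequent instruction. Cycle counts are illustrative assumptions.
divide_run_on = 10   # cycles the divide continues past issue
cache_miss = 15      # cycles to service the miss

linear_estimate = divide_run_on + cache_miss       # what the additive model charges
overlapped_delay = max(divide_run_on, cache_miss)  # if the two fully overlap

print(linear_estimate, overlapped_delay)  # 25 15
# The linear model's 25 cycles over-counts the true 15-cycle stall,
# which is why it yields a conservative (pessimistic) estimate.
```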
The difficulty for the designer lies in accurately anticipating all of the events that cause delay and correctly attributing a frequency of occurrence to each. The biggest flaw in modeling pipelined processor designs is overlooking a generally infrequent event that carries a large performance penalty. If crucial applications incur a disproportionately high rate of the overlooked dependency, the performance objectives will not be met.
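To illustrate this sensitivity to an overlooked rare event (the numbers are hypothetical), a delay that occurs only once per thousand instructions but costs hundreds of cycles can noticeably inflate the actual CPI relative to the model's prediction:

```python
# Effect of overlooking a rare but expensive event. All numbers hypothetical.
modeled_cpi = 1.6      # CPI predicted by the linear model
rare_freq = 0.001      # e.g., one occurrence per 1000 instructions
rare_penalty = 200.0   # cycles per occurrence

actual_cpi = modeled_cpi + rare_freq * rare_penalty
error = (actual_cpi - modeled_cpi) / modeled_cpi
print(actual_cpi, error)  # actual_cpi is about 1.8, a 12.5% shortfall
```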
