Table 10.3 Pipeline delay summary.

            (a) Reduced-Scale Processor    (b) Super-Pipelined Processor
Sequence    Delay                          Delay
EX/EX       0 cycles                       ½ cycle
EX/AG       0 cycles                       ½ cycle
LD/EX       1 cycle                        1½ cycles
EX/ST       0 cycles                       0 cycles
LD/ST       0 cycles                       ½ cycle

Table 10.4 Branch delay summary (in cycles).

(a) Reduced-Scale Processor
                               Penalty when
Branch type (assumed path)     Taken        Not taken
Unconditional                  2 cycles     -
Conditional (in-line)          2 cycles     0 cycles
Conditional (target)           2 cycles     0 cycles

(b) Super-Pipelined Processor
                               Penalty when
Branch type (assumed path)     Taken        Not taken
Unconditional                  2½ cycles    -
Conditional (in-line)          2½ cycles    0 cycles
Conditional (target)           2½ cycles    1 cycle
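
As a rough illustration of how the entries in Table 10.4(b) might be combined, the sketch below weights the taken and not-taken penalties of a conditional branch by the probability that the branch is taken. The penalties are transcribed from the table; the taken probability `p_taken` and the names `SUPER_PIPELINED` and `expected_branch_penalty` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# (taken, not-taken) penalties in cycles, transcribed from Table 10.4(b).
SUPER_PIPELINED = {
    "conditional_inline": (Fraction(5, 2), Fraction(0)),
    "conditional_target": (Fraction(5, 2), Fraction(1)),
}

def expected_branch_penalty(penalties, p_taken):
    """Average penalty per branch: weight each outcome by its probability."""
    taken, not_taken = penalties
    return p_taken * taken + (1 - p_taken) * not_taken

# Hypothetical taken probability, not a figure from the text.
p_taken = Fraction(3, 5)

for name, penalties in SUPER_PIPELINED.items():
    print(name, expected_branch_penalty(penalties, p_taken))
```

Under this assumed probability, the assumed-target strategy costs more on average than the in-line one, because it adds a not-taken penalty without reducing the taken penalty.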

instructions. However, the number of cycles that are actually penalty cycles is not constant; it is a linear function of the distance between the two dependent instructions. The greater the distance between the two instructions, the smaller the actual penalty; the worst case occurs when the two instructions are sequential. Table 3.20 shows the distribution of distances between two dependent ALU operation instructions, and Table 3.19 shows the distribution of distances between an ALU operation and the address generate phase of an instruction. For simplicity, we assume that load and store instructions follow the same distribution as the two ALU operations.
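
The dependence of the penalty on distance can be sketched numerically. The example below is a minimal illustration and not the book's equation: it assumes that each instruction placed between the dependent pair hides one cycle of the delay, so the penalty falls linearly with distance until it reaches zero. The distance distribution `dist` is hypothetical (the values of Tables 3.19 and 3.20 are not reproduced here), and the function names are invented for the example.

```python
from fractions import Fraction

def effective_penalty(max_delay, distance):
    """Penalty cycles incurred at a given dependency distance.

    Assumption: distance 1 means the instructions are sequential and
    pay the full delay; each extra instruction in between hides one cycle.
    """
    return max(Fraction(0), Fraction(max_delay) - (distance - 1))

def expected_penalty(max_delay, distance_dist):
    """Average penalty over a distribution {distance: probability}."""
    return sum(p * effective_penalty(max_delay, d)
               for d, p in distance_dist.items())

# Hypothetical distance distribution, not taken from Table 3.19 or 3.20.
dist = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# LD/EX delay for the super-pipelined processor, from Table 10.3(b).
ld_ex_delay = Fraction(3, 2)

print(expected_penalty(ld_ex_delay, dist))
```

With these assumed inputs, the average penalty is noticeably below the 1½-cycle worst case, since only sequential pairs pay the full delay.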
From Chapter 4, we know that the actual penalty from a given instruction is found by applying the equation:
0672-01.gif
where P1,2 is the total penalty between the instructions over all possible dependency distances. The pipeline delay for any given instruction pair is
