page_672


Page 672

Table 10.3 Pipeline delay summary.

(a) Reduced-Scale Processor      (b) Super-Pipelined Processor

Sequence    Delay                Sequence    Delay
EX/EX       0 cycles             EX/EX       ½ cycle
EX/AG       0 cycles             EX/AG       ½ cycle
LD/EX       1 cycle              LD/EX       1½ cycles
EX/ST       0 cycles             EX/ST       0 cycles
LD/ST       0 cycles             LD/ST       ½ cycle

Table 10.4 Branch delay summary (in cycles).

(a) Reduced-Scale Processor

                               Penalty when
Branch type (assumed path)     Taken        Not taken
Unconditional                  2 cycles     -
Conditional (in-line)          2 cycles     0 cycles
Conditional (target)           2 cycles     0 cycles

(b) Super-Pipelined Processor

                               Penalty when
Branch type (assumed path)     Taken        Not taken
Unconditional                  2½ cycles    -
Conditional (in-line)          2½ cycles    0 cycles
Conditional (target)           2½ cycles    1 cycle



instructions. However, the number of cycles that are actually penalty cycles is not constant; it is a linear function of the distance between the two dependent instructions. The greater the distance between the two instructions, the smaller the actual penalty, with the worst penalty occurring when the two instructions are sequential. Table 3.20 shows the distribution of distances between two dependent ALU operation instructions, and Table 3.19 shows the distribution of distances between an ALU operation and the address generate phase of an instruction. For simplicity, we assume that load and store instructions follow the same distribution as the two ALU operations.
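The relationship described above can be illustrated with a minimal sketch. It is not the equation from Chapter 4; it only assumes that the full delay from Table 10.3 applies when the two instructions are adjacent (distance 1) and that each intervening instruction hides a fixed number of cycles (the hypothetical parameter cycles_per_slot, assumed here to be 1). The function names and the distance distribution in the usage lines are invented for illustration and are not the values from Table 3.19 or Table 3.20.

```python
from fractions import Fraction


def penalty_at_distance(max_delay, distance, cycles_per_slot=1):
    """Penalty cycles seen when the dependent instruction issues
    `distance` instructions after the producer (1 = adjacent).

    Assumption: every intervening instruction hides `cycles_per_slot`
    cycles of the delay, so the penalty falls linearly to zero.
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    hidden = (distance - 1) * Fraction(cycles_per_slot)
    return max(Fraction(0), Fraction(max_delay) - hidden)


def expected_penalty(max_delay, distance_probs, cycles_per_slot=1):
    """Average penalty over a distribution of dependency distances.

    `distance_probs` maps distance -> probability of that distance.
    """
    return sum(
        Fraction(p) * penalty_at_distance(max_delay, d, cycles_per_slot)
        for d, p in distance_probs.items()
    )


# LD/EX delay on the super-pipelined processor (Table 10.3): 1 1/2 cycles.
ld_ex_delay = Fraction(3, 2)

# Hypothetical distance distribution, for illustration only.
example_probs = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

print(penalty_at_distance(ld_ex_delay, 1))           # 3/2 (adjacent: worst case)
print(penalty_at_distance(ld_ex_delay, 2))           # 1/2
print(expected_penalty(ld_ex_delay, example_probs))  # 1/2
```

Exact fractions are used so that the half-cycle delays of the super-pipelined processor are not rounded.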



		From Chapter 4, we know that the actual penalty from a given instruction is found by applying the equation:



where P_{1,2} is the total penalty between the instructions over all possible dependency distances. The pipeline delay for any given instruction pair is
