
Study 7.1 Control and Dataflow Timing
For the code sequence:
I1 DIV.F R3, R1, R2
I2 MPY.F R5, R3, R4
I3 ADD.F R4, R6, R7,
assume three separate floating-point units with execution times:
Divide     8 cycles
Multiply   4 cycles
Add        3 cycles,

and show the timing for both control flow (scoreboard) and data flow.
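Before reading the timing, it can help to list the register dependencies in the three-instruction sequence, since these decide which units must wait. The following is a minimal sketch, not part of the original study: the tuple layout and the function name `find_hazards` are hypothetical, and the labels RAW, WAR, and WAW are the standard read-after-write, write-after-read, and write-after-write hazard names.

```python
# Each entry: (label, opcode, destination register, source registers).
program = [
    ("I1", "DIV.F", "R3", ("R1", "R2")),
    ("I2", "MPY.F", "R5", ("R3", "R4")),
    ("I3", "ADD.F", "R4", ("R6", "R7")),
]

def find_hazards(program):
    """Compare every instruction with each earlier one and report register hazards."""
    hazards = []
    for j, (late, _, late_dest, late_srcs) in enumerate(program):
        for early, _, early_dest, early_srcs in program[:j]:
            if early_dest in late_srcs:
                # Later instruction reads a register the earlier one writes.
                hazards.append(("RAW", early, late, early_dest))
            if late_dest in early_srcs:
                # Later instruction writes a register the earlier one reads.
                hazards.append(("WAR", early, late, late_dest))
            if late_dest == early_dest:
                # Both instructions write the same register.
                hazards.append(("WAW", early, late, late_dest))
    return hazards

hazards = find_hazards(program)
for kind, early, late, reg in hazards:
    print(f"{kind}: {late} depends on {early} through {reg}")
```

For this sequence the sketch reports two dependencies: I2 needs the R3 produced by I1, and I3 overwrites R4, which I2 still has to read.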
For the scoreboard approach, we might have the following:
Cycle 1
Decoder issues
I1 → DIV unit
R1 → DIV unit
R2 → DIV unit
TAG_DIV → R3 write scoreboard
Cycle 2
Divide begins (DIV.F)
Decoder issues
I2 → MPY unit
TAG_DIV (data not ready) → MPY unit
TAG_R4 (data ready) → MPY unit
TAG [MPY] → R4 read scoreboard
TAG_MPY → R5 write scoreboard
Cycle 3
Multiplier waits
Decoder issues
I3 → ADD unit
TAG_R6 → ADD unit
TAG_R7 → ADD unit
TAG [ADD] → R6 read scoreboard
TAG [ADD] → R7 read scoreboard
TAG_ADD → R4 write scoreboard

Cycle 9
Data is ready, but scoreboard "holds" adder from execution.
Divide completes in this cycle.
Divide requests permission to broadcast result in next cycle (granted).
Cycle 10
Divide result → R3
Divide result → MPY unit
R4 → MPY unit
Cycle 11
Begin MPY.F
Hold on adder removed (R4 is freed).
R6 → ADD unit
R7 → ADD unit
Cycle 12
Begin ADD.F
Cycle 14
MPY unit completes and requests data broadcast (granted).
ADD unit completes and requests data broadcast (denied).
Cycle 15
MPY unit result → R5
ADD unit requests data broadcast (granted).
Cycle 16
ADD unit result → R4.
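
The scoreboard timeline above can be reproduced with a small model. The following is a minimal sketch under stated assumptions rather than a description of any particular machine: one instruction is issued per cycle, a unit begins execution the cycle after its operands arrive, a consumer receives a result in the cycle it is broadcast, a later instruction that overwrites a register is held until the cycle after the earlier reader has taken that register, and a single result bus grants one broadcast per cycle with the older instruction served first. The names `Instr` and `scoreboard_timing` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int  # execution time in cycles

def scoreboard_timing(program):
    """Return per-instruction cycle numbers under the assumptions stated above."""
    rows = []
    bus_taken = set()  # cycles in which the single result bus is already granted
    for idx, ins in enumerate(program):
        issue = idx + 1      # assumed: one instruction issued per cycle
        operands = issue     # cycle in which the operands reach the unit
        for prev, t in zip(program, rows):
            if prev.dest in ins.srcs:
                # Source is produced earlier: wait for that result's broadcast.
                operands = max(operands, t["broadcast"])
            if ins.dest in prev.srcs:
                # Destination is still to be read earlier: held until the
                # cycle after the earlier unit has received its operands.
                operands = max(operands, t["operands"] + 1)
        begin = operands + 1
        complete = begin + ins.latency - 1
        broadcast = complete + 1
        while broadcast in bus_taken:  # request denied, retry next cycle
            broadcast += 1
        bus_taken.add(broadcast)
        rows.append({"name": ins.name, "issue": issue, "operands": operands,
                     "begin": begin, "complete": complete,
                     "broadcast": broadcast})
    return rows

program = [
    Instr("DIV.F", "R3", ("R1", "R2"), 8),
    Instr("MPY.F", "R5", ("R3", "R4"), 4),
    Instr("ADD.F", "R4", ("R6", "R7"), 3),
]

timing = scoreboard_timing(program)
for row in timing:
    print(row)
```

Under these assumptions the sketch places the start of DIV.F, MPY.F, and ADD.F in cycles 2, 11, and 12, and their result broadcasts in cycles 10, 15, and 16, which agrees with the listing.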

 