
Page 179
9. Assume all instructions execute in a single cycle. By adding a new feature, the number of Shift, Compare, and Logical (word) instructions can be reduced by 50%. This feature increases the cycle time by 10%. What performance impact will this feature have on the R+M architecture for (a) scientific applications, and (b) commercial applications?
10. The fixed-point multiplies for the L/S architecture are to be implemented with sequences of integer shifts and adds. Assume that each fixed-point multiply requires, on average, an additional 13 shifts, 5 adds, and 22 branches. Recompute the expected instructions per HLL in Table 3.4 for the L/S architecture.
11. A superscalar architecture is capable of concurrently executing one branch instruction and nine non-branch instructions. For the workload characterized by Figure 3.4 and assuming single-cycle execution of all instructions, what is the expected maximum utilization of the processor? Ignore data dependencies.
12. What does the difference between the branch target reference capture rates of a small forward branch table and a small backward branch table tell us about the nature of conditional branches? What high-level language instructions would produce these distributions?
13. A branch table of 512 bytes can be configured asymmetrically. For the following combinations of backward bytes/forward bytes, compute the branch target capture percentages using linear interpolation of the data given in Table 3.11: (a) 256/256, (b) 384/128, (c) 448/64, (d) 480/32, (e) 496/16.
14. Assume for the R/M architecture that a conditional branch takes one cycle if the condition code was set prior to the preceding instruction, and two cycles otherwise. Assume all other instructions take a single cycle. What is the range of cycles per 100 HLL instructions for the scientific workload? What is the expected number of cycles?
15. Assume a delay of four cycles for an address interlock, and a delay of two cycles on an execution interlock. For the following sequence of code, identify all dependencies and compute the total delay:
ADD.W  R7, R7, 4
LD.W   R1, 0(R7)
MUL.W  R2, R1, R1
ADD.W  R3, R3, R2
LD.W   R4, 2(R7)
SUB.W  R5, R2, R3
ADD.W  R5, R5, R4
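As a starting point for problem 15, the read-after-write dependencies in the sequence can be listed mechanically. The following is a minimal sketch, not a solution: it assumes the first operand of each instruction is the destination, labels a dependency "address" when the register appears as the base of a `disp(Rn)` operand and "execution" when it appears as a data operand, and does not model pipeline timing. Whether a given dependency actually causes a delay depends on how far apart the two instructions are and on the timing rules in the text. The names `PROGRAM`, `parse`, and `find_dependencies` are hypothetical.

```python
import re

# The instruction sequence from problem 15.
PROGRAM = """\
ADD.W  R7, R7, 4
LD.W   R1, 0(R7)
MUL.W  R2, R1, R1
ADD.W  R3, R3, R2
LD.W   R4, 2(R7)
SUB.W  R5, R2, R3
ADD.W  R5, R5, R4
"""

def parse(line):
    """Split one instruction into (opcode, dest, data_sources, address_sources).

    Assumption: the first operand is the destination register.
    """
    opcode, rest = line.split(None, 1)
    operands = [tok.strip() for tok in rest.split(",")]
    dest = operands[0]
    data_srcs, addr_srcs = [], []
    for op in operands[1:]:
        mem = re.fullmatch(r"-?\d+\((R\d+)\)", op)
        if mem:
            addr_srcs.append(mem.group(1))   # base register of a memory operand
        elif re.fullmatch(r"R\d+", op):
            data_srcs.append(op)             # plain register operand
        # anything else (e.g. the immediate 4) is not a register and is ignored
    return opcode, dest, data_srcs, addr_srcs

def find_dependencies(program):
    """Return (producer, consumer, register, kind) tuples, numbered from 1."""
    last_writer = {}   # register -> number of the most recent instruction writing it
    deps = []
    for i, line in enumerate(program.strip().splitlines(), start=1):
        _, dest, data_srcs, addr_srcs = parse(line)
        for kind, regs in (("address", addr_srcs), ("execution", data_srcs)):
            for reg in dict.fromkeys(regs):  # drop duplicates such as R1, R1
                if reg in last_writer:
                    deps.append((last_writer[reg], i, reg, kind))
        last_writer[dest] = i
    return deps

if __name__ == "__main__":
    for producer, consumer, reg, kind in find_dependencies(PROGRAM):
        print(f"instruction {consumer} needs {reg} from instruction {producer} ({kind})")
```

The listing only identifies candidate interlocks; applying the four-cycle and two-cycle delays still requires deciding, per the text's pipeline model, which of them stall and whether their delays overlap.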
16. Suppose two character move instructions are implemented (MOVD.C and MOVO.C), which move disjoint and overlapped character strings, respectively. MOVD.C moves up to 8 characters in at most three cycles. MOVO.C moves up to 128 characters between overlapping source and destination addresses at the rate of 4 characters per cycle, with a fixed initial delay of 2 cycles. Does this solve the dilemma of character moves? Why, or why not?
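When reasoning about problem 16, it may help to tabulate the cycle counts implied by the two instruction definitions. The sketch below rests on assumptions that the problem statement does not spell out: the 2-cycle initial delay of MOVO.C is added to the transfer cycles, a final partial group of fewer than 4 characters still costs a full cycle, and a string longer than 8 characters is moved by repeated MOVD.C instructions, each costing the 3-cycle worst case. The function names are hypothetical.

```python
import math

def movo_cycles(n_chars: int) -> int:
    """Cycles for one MOVO.C: 2-cycle initial delay plus 4 characters per cycle."""
    if not 1 <= n_chars <= 128:
        raise ValueError("MOVO.C moves between 1 and 128 characters")
    return 2 + math.ceil(n_chars / 4)

def movd_sequence_upper_bound(n_chars: int) -> int:
    """Worst-case cycles when a string is moved by repeated MOVD.C instructions.

    Each MOVD.C moves up to 8 characters in at most 3 cycles.
    """
    if n_chars < 1:
        raise ValueError("string length must be positive")
    return 3 * math.ceil(n_chars / 8)

if __name__ == "__main__":
    for n in (1, 4, 8, 16, 64, 128):
        print(f"{n:>3} chars: MOVO.C = {movo_cycles(n):>2} cycles, "
              f"MOVD.C sequence <= {movd_sequence_upper_bound(n):>2} cycles")
```

Under these assumptions, an 8-character move costs 4 cycles with MOVO.C against at most 3 with MOVD.C, while a 128-character move costs 34 cycles with MOVO.C against at most 48 with sixteen MOVD.C instructions. The numbers say nothing about how the compiler or programmer knows whether two strings overlap, which is the other half of the question.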

 