
Page 179



		9. Assume all instructions execute in a single cycle. By adding a new feature, the number of Shift, Compare, and Logical (word) instructions can be reduced by 50%. This feature increases the cycle time by 10%. What performance impact will this feature have on the R+M architecture for (a) scientific applications, and (b) commercial applications?
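A minimal sketch of the trade-off in Problem 9, assuming execution time is simply instruction count times cycle time (all instructions take one cycle). The parameter `slc_fraction`, the share of Shift, Compare, and Logical instructions in the workload, is a hypothetical input to be read from the chapter's instruction-mix tables; no table values are reproduced here.

```python
def relative_time(slc_fraction, count_reduction=0.5, cycle_penalty=0.10):
    """Execution time with the feature, relative to a baseline of 1.0.

    slc_fraction: share of Shift/Compare/Logical instructions (hypothetical
    input; take it from the workload's instruction mix).
    """
    instructions = 1.0 - count_reduction * slc_fraction  # fewer instructions
    cycle_time = 1.0 + cycle_penalty                      # longer cycle
    return instructions * cycle_time


def break_even_fraction(count_reduction=0.5, cycle_penalty=0.10):
    """Share of affected instructions at which the feature neither helps nor hurts."""
    return (1.0 - 1.0 / (1.0 + cycle_penalty)) / count_reduction


if __name__ == "__main__":
    print(f"break-even share: {break_even_fraction():.4f}")
```

Under this model the feature only pays off when the affected instructions make up more than the break-even share of the workload; below it, the longer cycle dominates.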



		10. The fixed-point multiplies for the L/S architecture are to be implemented with sequences of integer shifts and adds. Assume that each fixed-point multiply requires, on average, an additional 13 shifts, 5 adds, and 22 branches. Recompute the expected instructions per HLL in Table 3.4 for the L/S architecture.



		11. A superscalar architecture is capable of concurrently executing one branch instruction and nine non-branch instructions. For the workload characterized by Figure 3.4 and assuming single-cycle execution of all instructions, what is the expected maximum utilization of the processor? Ignore data dependencies.
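One way to frame Problem 11 is as a throughput bound, sketched below. It assumes the machine has ten issue slots per cycle (one branch, nine non-branch), that instructions can be freely reordered since data dependencies are ignored, and that `branch_fraction` is the share of branches read from Figure 3.4. That share is a hypothetical input here; the figure's data is not reproduced.

```python
def max_utilization(branch_fraction, branch_slots=1, other_slots=9):
    """Upper bound on slot utilization for a given branch share.

    Per instruction executed, the branch unit needs branch_fraction /
    branch_slots cycles and the other units need (1 - branch_fraction) /
    other_slots cycles. The slower of the two limits throughput.
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must lie in [0, 1]")
    cycles_per_instr = max(branch_fraction / branch_slots,
                           (1.0 - branch_fraction) / other_slots)
    total_slots = branch_slots + other_slots
    return 1.0 / (total_slots * cycles_per_instr)
```

With this model, utilization reaches 100% only when branches are exactly one tenth of the mix; any other share leaves either the branch slot or some non-branch slots idle.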



		12. What does the difference between the branch target reference capture rates of a small forward branch table and a small backward branch table tell us about the nature of conditional branches? What high-level language instructions would produce these distributions?



		13. A branch table of 512 bytes can be configured asymmetrically. For the following combinations of backward bytes/forward bytes, compute the branch target capture percentages using linear interpolation of the data given in Table 3.11: (a) 256/256, (b) 384/128, (c) 448/64, (d) 480/32, (e) 496/16.
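Sizes such as 384 or 496 bytes may fall between tabulated entries, so Problem 13 needs a piecewise-linear lookup. The helper below is a generic sketch; `interpolate` and its `(size, percent)` point format are illustrative names, and the demo points are made up purely to show the call, not taken from Table 3.11.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation over (x, y) pairs with distinct x values."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the tabulated range")


if __name__ == "__main__":
    # Made-up demo points, NOT values from Table 3.11.
    demo_points = [(0, 0.0), (100, 50.0)]
    print(interpolate(demo_points, 25))
```

For each configuration, the backward and forward sizes would each be looked up in the corresponding column of the table and then combined as the chapter prescribes.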



		14. Assume for the R/M architecture that a conditional branch takes one cycle if the condition code was set prior to the preceding instruction, and two cycles otherwise. Assume all other instructions take a single cycle. What is the range of cycles per 100 HLL instructions for the scientific workload? What is the expected number of cycles?



		15. Assume a delay of four cycles for an address interlock, and a delay of two cycles on an execution interlock. For the following sequence of code, identify all dependencies and compute the total delay:



		ADD.W R7, R7, 4
		LD.W R1, 0(R7)
		MUL.W R2, R1, R1
		ADD.W R3, R3, R2
		LD.W R4, 2(R7)
		SUB.W R5, R2, R3
		ADD.W R5, R5, R4
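The dependency-finding step of Problem 15 can be sketched mechanically. The code below assumes the first operand of each instruction is the destination (the problem does not state the operand order) and classifies a read-after-write dependency as "address" when the register is used to form a load address, and as "execution" otherwise. It only lists candidate dependencies; whether each one actually stalls, and for how long, depends on the pipeline timing described in the chapter.

```python
# Each entry: (text, destination, execution-source registers, address-source registers).
# ASSUMPTION: the first operand is the destination register.
PROGRAM = [
    ("ADD.W R7, R7, 4",  "R7", ["R7"],       []),
    ("LD.W R1, 0(R7)",   "R1", [],           ["R7"]),
    ("MUL.W R2, R1, R1", "R2", ["R1"],       []),
    ("ADD.W R3, R3, R2", "R3", ["R3", "R2"], []),
    ("LD.W R4, 2(R7)",   "R4", [],           ["R7"]),
    ("SUB.W R5, R2, R3", "R5", ["R2", "R3"], []),
    ("ADD.W R5, R5, R4", "R5", ["R5", "R4"], []),
]


def find_raw_dependencies(program):
    """Return (producer, consumer, register, kind) tuples, numbered from 1."""
    last_writer = {}  # register -> number of the most recent instruction writing it
    deps = []
    for i, (_text, dest, exec_srcs, addr_srcs) in enumerate(program, start=1):
        for kind, regs in (("address", addr_srcs), ("execution", exec_srcs)):
            for reg in regs:
                if reg in last_writer:
                    deps.append((last_writer[reg], i, reg, kind))
        last_writer[dest] = i  # record the write after the reads are checked
    return deps


DEPENDENCIES = find_raw_dependencies(PROGRAM)

if __name__ == "__main__":
    for producer, consumer, reg, kind in DEPENDENCIES:
        print(f"I{consumer} depends on I{producer} via {reg} ({kind})")
```

Registers read before any instruction in the sequence writes them (R7 in the first instruction, R3 in the fourth) are treated as live-in values and produce no dependency.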



		16. Suppose two character move instructions are implemented (MOVD.C and MOVO.C), which move disjoint and overlapped character strings, respectively. MOVD.C moves up to 8 characters in at most three cycles. MOVO.C moves up to 128 characters between overlapping source and destination addresses at the rate of 4 characters per cycle, with a fixed initial delay of 2 cycles. Does this solve the dilemma of character moves? Why, or why not?
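To reason about Problem 16 it can help to tabulate cycle counts for different string lengths. The sketch below makes two assumptions the problem does not state: a partial final group of fewer than 4 characters still costs MOVO.C a full cycle, and strings longer than 8 characters are handled by repeating MOVD.C, each instance charged its 3-cycle worst case. The function names are illustrative.

```python
import math


def movo_cycles(n_chars):
    """Cycles for one MOVO.C: 2-cycle startup plus 4 characters per cycle."""
    if not 0 < n_chars <= 128:
        raise ValueError("MOVO.C moves between 1 and 128 characters")
    return 2 + math.ceil(n_chars / 4)  # ASSUMPTION: partial group costs a cycle


def movd_max_cycles(n_chars):
    """Worst-case cycles using repeated MOVD.C (8 chars, at most 3 cycles each)."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    return 3 * math.ceil(n_chars / 8)  # ASSUMPTION: repeat for longer strings


if __name__ == "__main__":
    for n in (1, 8, 16, 64, 128):
        print(n, movd_max_cycles(n), movo_cycles(n))
```

Comparing the two columns across short and long strings gives a starting point for judging whether the pair of instructions covers the cases the question asks about.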
