
4. In an L/S architecture with a branch target table, suppose that forward branches are four times "more expensive" (due to the cost of prefetching, etc.) than backward branches. For both a 256-byte and a 512-byte buffer, create a cost-effective branch table. The cost can be approximated as the number of entries for backward-directed branches plus four times the number of prefetched entries. Use Table 3.11 and interpolate data for intermediate (non-power-of-2) buffer sizes, as needed. Compare the percentage of found references and the cost with those of the centered branch table of the same size.
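The cost approximation in Problem 4 can be sketched as follows. This is only an illustration of the stated formula, not a solution: the function names are hypothetical, the entry counts and hit fractions in the usage lines are placeholders rather than values from Table 3.11, and linear interpolation between tabulated sizes is an assumption.

```python
def branch_table_cost(backward_entries, forward_entries, forward_weight=4):
    """Cost = backward entries + forward_weight * prefetched (forward) entries."""
    return backward_entries + forward_weight * forward_entries


def interpolate(size, size_lo, value_lo, size_hi, value_hi):
    """Linearly interpolate a tabulated value for an intermediate buffer size.

    Assumption: the tabulated quantity varies linearly between the two
    neighboring power-of-2 sizes.
    """
    fraction = (size - size_lo) / (size_hi - size_lo)
    return value_lo + fraction * (value_hi - value_lo)


# Placeholder numbers only; substitute the data from Table 3.11.
centered_cost = branch_table_cost(backward_entries=128, forward_entries=128)
skewed_cost = branch_table_cost(backward_entries=192, forward_entries=64)
midpoint_hit = interpolate(192, 128, 0.25, 256, 0.75)
```

With these placeholder counts, shifting entries from the forward side to the backward side lowers the cost; whether it also keeps the percentage of found references acceptable depends on the tabulated data.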
5. What is the effect of floating-point operations that add the following extra cycles to execution:
Operation       Extra cycles
Add/subtract    2
Multiply        6
Divide          12

Assume a scientific environment and an L/S architecture in which all non-floating-point instructions execute in one cycle and the floating-point extra cycles cannot be overlapped. Compute the effect of the preceding floating-point execution times on per-instruction execution time (in cycles per instruction).
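With no overlap, the per-instruction time in Problem 5 is one base cycle plus the frequency-weighted sum of the extra cycles. A minimal sketch of that arithmetic follows. The extra-cycle counts come from the problem statement, but the instruction frequencies are hypothetical placeholders, to be replaced with the scientific-environment instruction mix from the text.

```python
# Extra execution cycles per floating-point operation (from the problem).
EXTRA_CYCLES = {"add_sub": 2, "multiply": 6, "divide": 12}


def cycles_per_instruction(frequencies, extra_cycles=EXTRA_CYCLES, base=1.0):
    """CPI = base + sum over operations of (frequency * extra cycles).

    `frequencies` maps an operation name to its fraction of all
    instructions executed; operations not listed contribute nothing.
    """
    penalty = sum(frequencies.get(op, 0.0) * cycles
                  for op, cycles in extra_cycles.items())
    return base + penalty


# Hypothetical frequencies for illustration only (not from the text).
example_mix = {"add_sub": 0.10, "multiply": 0.05, "divide": 0.01}
example_cpi = cycles_per_instruction(example_mix)
```

The same weighted-sum form applies to any set of non-overlapped execution penalties once the instruction frequencies are known.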
6. Suppose now that the floating-point extra cycles in the previous problem can be overlapped, except for the dependency effects noted in Table 3.20. Compute the effects of dependency [i.e., for a given floating-point operation, compute the effect of a preceding (a) add, (b) multiply, and (c) divide].
7. The execution time of variable-operand-length instructions is usually determined by the operand length. For a commercial environment and an R/M machine, suppose that all instructions except variable-operand-length instructions execute in one cycle. Suppose also that each variable-operand-length instruction takes as many execution cycles as the number of bytes (or digits, for decimal) in the length of its source operands. Assuming these cycles cannot be overlapped, what is the resultant performance in cycles per instruction? Use Table 3.22.
8. For an R/M machine, suppose that integer multiply and divide each take 32 cycles, floating-point multiply and divide each take 48 cycles, character and decimal instructions take an execution time (in cycles) equal to the length of the longest operand (in bytes or digits), and multiple register moves take one execution cycle per register moved. Compute the run-on delay for:
(a) Scientific environment.
(b) Commercial environment.
(c) Systems environment.
Assume that execution occurs in order and that all other instructions are allocated one execution cycle.

 