10.1.1 Design Assumptions

To design the new processor for the Baseline Mark II, we need guidelines on which to base our design decisions. The following assumptions provide the basis for applying the data presented in earlier sections of this book:

Cache data should be adjusted for systems effects ("warm" cache).

Use 4KB pages with 1MB segments.

Out-of-order execution is not allowed.

Run-on instructions stall the pipeline in all cases, not just those with dependencies between instructions.1

The condition codes are set after the last execution cycle.

Arithmetic operations (integer and floating-point) are the only run-on instructions, and they are not pipelined (except integer add/subtract).

Wafer yield is based on the Poisson model developed in Chapter 2 and assumes a defect density (ρ) of 1 defect/cm².

The cache miss penalty is 5 cycles.

The TLB miss penalty is 20 cycles.

The memory system can support a single-word access with a latency of eight cycles, and three additional words with an additional latency of three cycles (one word per cycle thereafter).

General overhead is modeled as 50% of the data path area (latches 10%, buses and wiring 40%).

Pads, pad drivers, power supply, and guard area are modeled as a 20% full-chip area overhead.

Cache overhead (address tags and information bits) is modeled as a 15% area overhead (5% for tag and status bit information, 10% for the area mismatch penalty).

A direct-mapped cache can support an access rate of one word per cycle, or it can be pipelined to support one word per half-cycle (for the super-pipelined processor version) at an additional 10% area penalty for pipelining the cache. A set-associative cache can support an access rate of one word per two cycles (the extra cycle is needed because the request must first pass through the directory), or it can be pipelined to support one word per cycle at an additional 10% area penalty.

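The quantitative assumptions above can be combined into a short worked example. The sketch below is illustrative only: the die area, reference rate, and miss rates are hypothetical placeholders (not values from this section), and the 20% pad/power/guard overhead is interpreted here as a fraction of the full chip. Only the Poisson yield model, the 5- and 20-cycle miss penalties, the 8 + 3 cycle memory timing, and the area overhead percentages come from the assumptions themselves.

```python
import math

# --- Wafer yield: Poisson model with defect density rho = 1 defect/cm^2 ---
rho = 1.0             # defects per cm^2 (stated assumption)
die_area_cm2 = 1.5    # hypothetical die area, for illustration only
yield_fraction = math.exp(-rho * die_area_cm2)

# --- Memory timing: first word in 8 cycles, 3 more words in 3 more cycles ---
# so a 4-word cache line fills in 11 cycles.
line_fill_cycles = 8 + 3

# --- Miss penalties folded into an effective CPI ---
base_cpi = 1.0
refs_per_instr = 1.3      # hypothetical memory references per instruction
cache_miss_rate = 0.05    # hypothetical
tlb_miss_rate = 0.002     # hypothetical
cpi = (base_cpi
       + refs_per_instr * cache_miss_rate * 5     # 5-cycle cache miss penalty
       + refs_per_instr * tlb_miss_rate * 20)     # 20-cycle TLB miss penalty

# --- Area overheads ---
datapath_area = 20.0      # hypothetical raw data path area (mm^2)
cache_ram_area = 30.0     # hypothetical raw cache RAM area (mm^2)
datapath_total = datapath_area * 1.50    # +50% latch/bus/wiring overhead
cache_total = cache_ram_area * 1.15      # +15% tag/status and mismatch overhead
core_area = datapath_total + cache_total
chip_area = core_area / (1 - 0.20)       # pads etc. occupy 20% of the full chip

print(f"yield = {yield_fraction:.3f}")
print(f"line fill = {line_fill_cycles} cycles")
print(f"effective CPI = {cpi:.3f}")
print(f"chip area = {chip_area:.1f} mm^2")
```

Note that the effective-CPI line reflects the upper-bound stall assumption above: every miss penalty is charged in full, with no overlap.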
1This provides an upper-bound estimate. A lower bound can be generated by assuming that run-on instructions stall the pipeline only when there is a data dependency, using the execution dependency distances from Table 3.20 to compute an effective penalty. However, since multiplication units are only occasionally pipelined and division units are rarely pipelined (and often share hardware with the multiplication unit), this lower bound is not a very useful number.