page_537




		Study 8.1 SRMP vs. Pipelined Processor



		In this study, we contrast a conventional pipelined processor (similar to our baseline) with a four-processor SRMP occupying roughly the same chip area.



		Suppose an L/S (load/store) pipelined processor has a 16KB I-cache and an 8KB D-cache, both set associative with CBWA (copy-back, write-allocate) and LRU replacement. The caches have 16-byte lines and a miss delay of eight cycles. The processor makes one instruction reference per instruction (I-refr/I) and 0.5 data references per instruction (D-refr/I). Without cache misses, the processor itself achieves 1.5 CPI (i.e., one CPI for decode and 0.5 CPI for branch, run-on, and other effects).



		We contrast the pipelined processor with a four-processor SRMP. Each processor has its own register set and I-cache (4KB direct mapped). The processors share the D-cache, decoder, floating-point ALU, etc. Whenever the active processor stalls (on a cache miss, etc.), the SRMP switches to the next available processor on the very next cycle. The SRMP D-cache is designed to be nonblocking on a miss; i.e., the miss is processed concurrently with accesses for another processor (unless, of course, that access is to the missed line).
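The SRMP's switch-on-stall policy can be sketched as a small cycle-level simulation. This is a minimal illustration under stated assumptions, not the model used in this study: every instruction issues in one cycle, each processor misses deterministically on every k-th instruction (`miss_every`, a hypothetical parameter), a miss parks only the processor that caused it for `miss_penalty` cycles (the missed-line conflict case is ignored), and the shared pipeline picks the next ready processor in round-robin order on the following cycle. The names `ProcessorContext` and `run_switch_on_stall` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessorContext:
    """One SRMP processor in a hypothetical model: its own registers and PC."""
    miss_every: int    # assumption: every k-th instruction causes a miss
    remaining: int     # instructions this processor still has to issue
    executed: int = 0
    ready_at: int = 0  # first cycle at which this processor may issue again


def run_switch_on_stall(procs, miss_penalty):
    """Simulate a shared pipeline that issues one instruction per cycle from
    the current processor and, when that processor misses, switches to the
    next ready processor (round-robin) on the following cycle.

    Returns (cycles, busy_cycles). The run ends when the last instruction
    issues; a miss on that final instruction is not waited for.
    """
    cycle = 0
    busy = 0
    current = 0
    n = len(procs)
    while any(p.remaining > 0 for p in procs):
        # Keep the current processor if it can issue; otherwise take the
        # next ready one in round-robin order.
        for offset in range(n):
            idx = (current + offset) % n
            p = procs[idx]
            if p.remaining > 0 and p.ready_at <= cycle:
                current = idx
                break
        else:
            cycle += 1  # every processor is stalled or finished: idle cycle
            continue

        p = procs[current]
        p.remaining -= 1
        p.executed += 1
        busy += 1
        if p.executed % p.miss_every == 0:
            # Miss: only this processor waits; the others keep issuing
            # while its line is fetched.
            p.ready_at = cycle + 1 + miss_penalty
            current = (current + 1) % n
        cycle += 1
    return cycle, busy


for count in (1, 4):
    procs = [ProcessorContext(miss_every=4, remaining=8) for _ in range(count)]
    cycles, busy = run_switch_on_stall(procs, miss_penalty=8)
    print(f"{count} processor(s): {busy} instructions in {cycles} cycles")
```

With the study's eight-cycle miss delay and a miss every fourth instruction, one processor alone issues 8 instructions in 16 cycles, while four such processors issue 32 instructions in 32 cycles, because each stalled processor's miss is serviced while the other three issue. The regular miss pattern is chosen only to make the overlap easy to trace; real miss streams are irregular.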



		Pipelined Processor Analysis



		The base CPI = 1.5.



		The additional CPI lost due to cache misses (using chapter 4 data) is computed as follows:



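$$
\begin{aligned}
\text{I-cache CPI loss} &= \text{I-cache miss rate} \times \text{I-refr/I} \times \text{miss penalty} \\
&= [0.05 \times 1.04] \times 1 \times 8\ \text{cycles} \\
&= 0.42\ \text{CPI}.
\end{aligned}
$$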



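$$
\begin{aligned}
\text{D-cache CPI loss} &= \text{D-cache miss rate} \times \text{D-refr/I} \times \text{miss penalty} \\
&= [0.08 \times 1.04] \times 0.5 \times 8\ \text{cycles} \\
&= 0.33\ \text{CPI}.
\end{aligned}
$$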



		Pipelined processor total CPI = 1.5 + 0.42 + 0.33 = 2.25.
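The pipelined-processor arithmetic can be checked with a few lines of Python. `cpi_loss` is a hypothetical helper that multiplies the bracketed (adjusted) miss rate by references per instruction and by the miss penalty, following the I-cache and D-cache formulas of this analysis; the miss rates and bracketed multipliers are the chapter 4 values quoted in the text.

```python
def cpi_loss(miss_rate, adjustment, refs_per_instr, miss_penalty):
    """CPI lost to misses: [miss rate x adjustment] x refs/I x miss penalty."""
    return miss_rate * adjustment * refs_per_instr * miss_penalty


BASE_CPI = 1.5     # decode (1.0) plus branch, run-on, etc. (0.5)
MISS_PENALTY = 8   # cycles per miss

i_loss = cpi_loss(0.05, 1.04, 1.0, MISS_PENALTY)  # 16KB set-associative I-cache
d_loss = cpi_loss(0.08, 1.04, 0.5, MISS_PENALTY)  # 8KB set-associative D-cache
total = BASE_CPI + i_loss + d_loss

print(f"I-cache: {i_loss:.2f}  D-cache: {d_loss:.2f}  total: {total:.2f} CPI")
```

Unrounded, the total is 2.2488 CPI, which agrees with the 2.25 obtained by summing the rounded terms.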



		SRMP (Figure 8.15)



		Now each processor has its own 4KB direct-mapped I-cache, and all four share the D-cache. This arrangement ensures cache consistency (all data references go through the one shared D-cache) and simplifies the I-cache design.



$$
\begin{aligned}
\text{I-cache CPI loss} &= [0.095 \times 1.29] \times 1 \times 8\ \text{cycles} \\
&= 0.98\ \text{CPI}.
\end{aligned}
$$



		The shared D-cache holds data for all four processors at once. We approximate this situation by using MP = 3 (warm start) and Q = 100.



$$
\begin{aligned}
\text{D-cache CPI loss} &= [0.26 \times 1.04] \times 0.5 \times 8\ \text{cycles} \\
&= 1.08\ \text{CPI}.
\end{aligned}
$$



		Total CPI for a single SRMP processor = 1.5 + 0.98 + 1.08 = 3.56 CPI.
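The same arithmetic for one SRMP processor, with the private 4KB direct-mapped I-cache and the shared D-cache figures (MP = 3, Q = 100) substituted, reproduces the 3.56 CPI total. `cpi_loss` is a hypothetical helper implementing the bracketed miss-rate formula used in this study, and the table of cache parameters is simply the numbers from the text.

```python
def cpi_loss(miss_rate, adjustment, refs_per_instr, miss_penalty):
    """CPI lost to misses: [miss rate x adjustment] x refs/I x miss penalty."""
    return miss_rate * adjustment * refs_per_instr * miss_penalty


BASE_CPI = 1.5
MISS_PENALTY = 8  # cycles

# (miss rate, bracketed multiplier, references per instruction)
srmp_caches = {
    "I-cache (private, 4KB direct mapped)": (0.095, 1.29, 1.0),
    "D-cache (shared, MP = 3, Q = 100)": (0.26, 1.04, 0.5),
}

losses = {name: cpi_loss(rate, adj, refs, MISS_PENALTY)
          for name, (rate, adj, refs) in srmp_caches.items()}
total = BASE_CPI + sum(losses.values())

for name, loss in losses.items():
    print(f"{name}: {loss:.2f} CPI")
print(f"Total per SRMP processor: {total:.2f} CPI")
```

This is the CPI seen by each individual processor; it does not capture how switching among the four processors overlaps their misses.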
