|
|
|
|
|
|
|
Write-through caches have an advantage over copyback caches, since memory is always consistent with the cache. However, memory traffic can be exceedingly high. Increasing the physical word size to memory increases memory bandwidth, but does not help with write-through traffic, since that traffic consists of single-word writes.
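A back-of-the-envelope calculation shows how quickly single-word write traffic can saturate memory; the store frequency and write timing below are illustrative assumptions, not figures from the text:

```python
# Illustrative only: the instruction mix and memory timing are assumptions.
store_freq = 0.15      # stores per instruction (assumed)
t_word_write = 4       # cycles memory is busy per single-word write (assumed)
cpi_base = 1.0         # base cycles per instruction (assumed)

# Memory must absorb one single-word write per store, regardless of the
# physical word size, since write-through traffic consists of single words.
write_busy_per_instr = store_freq * t_word_write
utilization = write_busy_per_instr / cpi_base
print(f"memory busy with write traffic {utilization:.0%} of the time")
```

Even with these modest assumed numbers, writes alone keep memory busy a large fraction of the time, leaving little bandwidth for read-miss traffic.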
|
|
|
|
|
|
|
|
From the previous analysis, main memory cannot support the write-through traffic, so another solution needs to be considered. One such alternative is to use a Write Assembly Cache (WAC). Data show that a Write Assembly Cache of 16 8-byte lines can reduce the relative write traffic to 30%. Adding a write assembly cache serves two purposes: |
|
|
|
|
|
|
|
|
1. It filters out repeated writes to the same line.

2. It converts single-word transfers into more efficient line transfers.
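Both purposes can be sketched in a small model; the FIFO replacement policy, the word-addressed interface, and all identifier names here are illustrative assumptions, not the book's exact design:

```python
class WriteAssemblyCache:
    """Minimal sketch of a write assembly cache (illustrative; FIFO
    replacement and word-addressed writes are assumptions)."""

    def __init__(self, num_lines=16, words_per_line=2):
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.lines = {}            # line address -> set of valid word offsets
        self.line_writebacks = 0   # line transfers sent on to memory
        self.word_writes = 0       # single-word writes received from the CPU

    def write(self, word_addr):
        """Record a single-word write; combine it with any pending line."""
        self.word_writes += 1
        line_addr, offset = divmod(word_addr, self.words_per_line)
        if line_addr in self.lines:
            # Repeated write to a pending line: combined, no extra traffic.
            self.lines[line_addr].add(offset)
            return
        if len(self.lines) == self.num_lines:
            # Evict the oldest pending line as a single line transfer
            # (dicts preserve insertion order, so this is FIFO).
            victim = next(iter(self.lines))
            del self.lines[victim]
            self.line_writebacks += 1
        self.lines[line_addr] = {offset}
```

Sequential word writes illustrate both effects: writes to a line already pending generate no memory traffic, and each eviction moves a whole line rather than individual words.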
|
|
|
|
|
|
|
|
Since a WAC is unique to the write-through cache, we must evaluate its cost to compare the merits of the different write policies. Based on data given in section 5.13, 16 8-byte lines can filter about 70% of the write traffic. The area of this Write Assembly Cache is calculated as follows: |
|
|
|
|
|
|
|
|
WAC area = (log2(memory size / write assembly cache size) + 2 control bits + 2-bit assoc.) * 16 lines

         = (19 + 2 + 2) * 16 = 368 bits
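The same arithmetic can be checked directly; the 64 MB memory size is an assumption inferred from the 19-bit tag in the worked example:

```python
import math

memory_size = 64 * 2**20   # bytes (assumed: this choice yields a 19-bit tag)
wac_size = 16 * 8          # 16 lines of 8 bytes = 128 bytes

tag_bits = int(math.log2(memory_size // wac_size))  # log2(2**26 / 2**7) = 19
bits_per_line = tag_bits + 2 + 2   # + 2 control bits + 2-bit assoc. field
area = bits_per_line * 16          # 16 lines
print(tag_bits, area)              # 19 368
```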
|
|
|
|
|
|
|
|
|
This is small compared to the cache size. The write assembly cache must grow as write traffic increases in the superscalar and multiprocessor implementations.
|
|
|
|
|
|
|
|
With the addition of a secondary cache, traffic on the memory bus is reduced, since the secondary cache acts as a filter between the L1 caches and the memory bus. We verify this later.
|
|
|
|
|
|
|
|
Different combinations of copyback with write allocate (CBWA) and write-through with no write allocate (WTNWA) and different buffer management schemes have different implications for the overall CPI. For the analysis below, a "perfect" write buffer is assumed, since the effect of write traffic is analyzed separately.
|
|
|
|
|
|
|
|
With the WTNWA configuration, we consider two buffer management schemes. |
|
|
|
 |
|
|
|
|
Scheme 1: On a read miss, the entire line must be fetched before processing resumes.
|
|
|
 |
|
|
|
|
Scheme 2: On a read miss, wraparound fetch is used so the requested word is returned first, and processing resumes while memory finishes loading the rest of the line into the cache.
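The two schemes can be contrasted with a small timing sketch; the parameters t_access and t_word (and the function names) are assumed for illustration, not values from the text:

```python
def cpu_stall_scheme1(line_words, t_access, t_word):
    """Scheme 1: the CPU waits until the whole line has been fetched."""
    return t_access + line_words * t_word

def cpu_stall_scheme2(t_access, t_word):
    """Scheme 2 (wraparound): the CPU resumes once the requested word
    arrives; memory keeps filling the rest of the line in the background."""
    return t_access + t_word

def memory_busy(line_words, t_access, t_word):
    """Under either scheme, memory is busy for the full line transfer."""
    return t_access + line_words * t_word
```

For example, with an 8-word line, t_access = 4 cycles, and t_word = 1 cycle, scheme 1 stalls the processor for 12 cycles while scheme 2 stalls it for only 5; the time memory is busy servicing the read, however, is 12 cycles in both schemes.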
|
|
|
|
[Table: time memory is busy due to a read request]
|
|
|
|
|