As a design strategy, one could make the buffer two times the expected mQ and then recognize that the effective $\gamma$ is significantly less than $\gamma_{opt}$ because of buffer overflows.
Conservatively assume that with TBF = 32 we achieve $\gamma = 0.5$; now:
$$B(m, n, \gamma = 0.5) = 10.13$$
and
$$mQ_c = n + n\gamma - B = 7.9$$
and
Our buffer size now must be large enough to support $\gamma = 0.5$. We have assumed that this could be accomplished with a total buffer (TBF) of, say:
$$2 \times n\gamma_{opt} = 2 \times 12 \times 1.375 = 33 \text{ entries},$$
or about 32 entries, of which $n\gamma + n - B$ (that is, $12 \times 0.5 + 12 - 10.13 \approx 8$) are occupied on the average. The achieved bandwidth and the expected number of requests in a buffer are plotted as a function of $\gamma$ in Figures 7.20 and 7.21.
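The arithmetic above can be checked with a short Python sketch. Two things in it are assumptions rather than statements from this passage: the closed form used for $B(m, n, \gamma)$, and the module count $m = 16$. Both were chosen because together they reproduce the quoted value of 10.13 for $n = 12$ and $\gamma = 0.5$. The function and constant names are illustrative only.

```python
import math


def achieved_bandwidth(m, n, gamma):
    """Assumed closed form for B(m, n, gamma); not given in this passage.

    B = x - sqrt(x^2 - 2nm(1 + gamma)), with x = m + n(1 + gamma) - 1/2.
    """
    x = m + n * (1 + gamma) - 0.5
    return x - math.sqrt(x * x - 2 * n * m * (1 + gamma))


def mean_occupancy(m, n, gamma):
    """Average number of occupied buffer entries: n + n*gamma - B."""
    return n + n * gamma - achieved_bandwidth(m, n, gamma)


M = 16             # assumption: module count that reproduces B = 10.13
N = 12             # requests, from the text
GAMMA = 0.5        # conservatively assumed effective gamma, from the text
GAMMA_OPT = 1.375  # gamma_opt, from the text

b = achieved_bandwidth(M, N, GAMMA)
occupied = mean_occupancy(M, N, GAMMA)
tbf = 2 * N * GAMMA_OPT  # total buffer size estimate, 2 * n * gamma_opt

print(f"B = {b:.2f}")
print(f"occupied entries = {occupied:.2f}")
print(f"TBF estimate = {tbf:.0f} entries")
```

Under these assumptions the sketch gives a bandwidth of about 10.13, an average occupancy of a little under 8 entries, and a buffer estimate of 33 entries, in line with the figures in the text. This leaves roughly three quarters of a 32-entry buffer free to absorb bursts.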
7.3.4 Bypassing between Vector Instructions
Consider now the effect of all this in terms of a timeline. In order to load a vector register (64 entries), the following time steps are required: