A Comparative Analysis of Branch Prediction Schemes
Conclusion
In this report, we have analyzed several of the most
well-known and effective branch prediction schemes with respect
to their prediction accuracy and cost-effectiveness.
The schemes that we have looked at are static,
bimodal, common correlation, gshare,
local, gselect, and selective.
Thanks to the speed of the Shade analyzers, we
were able to trace hundreds of times more instructions than
previous research, and could therefore obtain more accurate
results.
The following findings may be of interest for future research
in branch prediction and new architecture design.
- Selective predictors perform better than the other schemes when
the branch prediction tables are the same size. This kind of predictor
can adapt to the branch behavior of the running program to achieve
high prediction accuracy. Since we used a large set of benchmark
programs and traced most of them to the end, we believe this
conclusion is more convincing than the contradicting results of some
other research, which was based on either fewer test programs or a
much smaller traced portion of each test program. We anticipate that
this kind of predictor will be popular in future branch predictor designs.
- Gselect predictors perform especially well on highly
correlated programs such as integer programs, which contain
many if-then-else statements. Local predictors perform
especially well on floating point programs. This good performance
is due to the fact that floating point programs have many
looping structures. Local predictors keep a history
register for each branch address, and thus reduce the interference
between different branches. However, for the same reason,
they do not perform as well as gselect predictors
on integer programs.
- Gshare predictors are designed to reduce aliasing. Many of the
dynamic schemes suffer from aliasing, which can leave the branch
prediction table sparsely used. Gshare XORs the branch address with
the global history register to distribute the 2-bit counters more
evenly. Although reduced aliasing should improve prediction accuracy,
we did not observe any net performance win from this approach.
- With respect to cost-effectiveness of different approaches,
gselect and selective seem to be the clear winners.
Gselect, local, and selective have about
the same performance. However, local takes about 20%
more space than the other two schemes and has higher
implementation complexity than gselect.
- We also looked at the effects of buffer size and context
switches on branch prediction. For simple schemes, increasing the buffer
size has a less evident effect on prediction than it does for
more complex schemes. For example, while the bimodal scheme
shows no further improvement once the
buffer size exceeds 8K bits, the gselect scheme still
shows clear improvement even when the buffer is over 2M bits.
- As to the effect of context switches, we observe that the effect
of context switches on prediction rate decreases as the number of
instructions between context switches gets larger. We see very little
effect after the number of instructions is over 1 million. This is
exactly what we expected since the larger the number of instructions
between context switches, the less frequently the prediction tables
need to be flushed. We have also noticed that for less complicated
schemes such as bimodal, the effect of context switches is not
as evident as for some more complicated schemes such as gselect.
This also indicates complicated schemes need longer warm-up
time than simple schemes.
- The branch behavior of a program is a very important factor, if not the
most important, in determining the performance of any branch prediction
scheme. The behavior of go, an integer program from SPECint95-beta,
strongly supports this. This program is not biased at all, with about the
same number of branches in each 5% interval of takenness. All of the
seven schemes perform poorly on this program if only an 8K-bit branch
buffer is used. With increased buffer size, some of the schemes
show a large improvement in prediction accuracy.
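
The difference between the gshare and gselect indexing discussed in the findings above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator used for this report: the function names, the `simulate` helper, the initial counter value, and the gselect bit layout (address bits placed above history bits) are all hypothetical choices made for the example.

```python
def gshare_index(pc, history, index_bits):
    """XOR the branch address with the global history, keep the low bits."""
    mask = (1 << index_bits) - 1
    return (pc ^ history) & mask


def gselect_index(pc, history, index_bits, history_bits):
    """Concatenate low address bits with low global-history bits.

    Assumed layout: address bits in the high part, history in the low part.
    """
    addr_bits = index_bits - history_bits
    addr = pc & ((1 << addr_bits) - 1)
    hist = history & ((1 << history_bits) - 1)
    return (addr << history_bits) | hist


class CounterTable:
    """Table of 2-bit saturating counters (0..3), predicting taken when >= 2."""

    def __init__(self, index_bits):
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [1] * (1 << index_bits)

    def predict(self, index):
        return self.counters[index] >= 2

    def update(self, index, taken):
        c = self.counters[index]
        self.counters[index] = min(3, c + 1) if taken else max(0, c - 1)


def simulate(trace, index_fn, index_bits):
    """Return prediction accuracy over a trace of (pc, taken) pairs."""
    table = CounterTable(index_bits)
    history = 0
    correct = 0
    for pc, taken in trace:
        idx = index_fn(pc, history)
        correct += table.predict(idx) == taken
        table.update(idx, taken)
        # Shift the outcome into the global history register.
        history = ((history << 1) | int(taken)) & ((1 << index_bits) - 1)
    return correct / len(trace)
```

For example, `simulate(trace, lambda pc, h: gshare_index(pc, h, 4), 4)` runs a 16-entry gshare table over `trace`. Passing `lambda pc, h: pc & 15` instead ignores the history and gives a bimodal-style table for comparison.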