Summary

A Comparative Analysis of Branch Prediction Schemes

Zhendong Su and Min Zhou

Computer Science Division
University of California at Berkeley
Berkeley, CA 94720

Conclusion

In this report, we have analyzed several of the best-known and most effective branch prediction schemes with respect to their prediction accuracy and cost-effectiveness. The schemes we examined are static, bimodal, common correlation, gshare, local, gselect, and selective. Thanks to the speed of the Shade analyzers, we were able to trace hundreds of times more instructions than previous research, which allowed us to obtain more accurate results.

The following findings may be of interest to future research in branch prediction and to new architecture designs.

Selective predictors perform better than the other schemes when using branch prediction tables of the same size. This kind of predictor can adapt to the branch behavior of the running program to achieve high prediction accuracy. Because we used a large set of benchmark programs and traced most of them to completion, we believe this conclusion is more convincing than the contradicting results of some other research, which was based on either fewer test programs or a much smaller traced portion of each test program. We anticipate that this kind of predictor will be popular in future branch predictor designs.
Gselect predictors perform especially well on highly correlated programs such as integer programs, which contain many if-then-else statements. Local predictors perform especially well on floating point programs, because these programs have many looping structures. Local predictors keep a history register for each branch address and thus reduce the interference between different branches. However, for the same reason, they do not perform as well as gselect predictors on integer programs.
Gshare predictors are designed to reduce aliasing. Many of the dynamic schemes suffer from aliasing, which can leave the branch prediction table sparsely used. Gshare XORs the branch address with the global history register to distribute the 2-bit counters more evenly. Although reduced aliasing should improve prediction accuracy, we did not observe any net performance gain from this approach.
With respect to the cost-effectiveness of the different approaches, gselect and selective seem to be the clear winners. Gselect, local, and selective have about the same performance. However, local takes about 20% more space than the other two schemes and has higher implementation complexity than gselect.
We also looked at the effects of buffer size and context switches on branch prediction. Increasing the buffer size has a less evident effect on prediction accuracy for simple schemes than for more complex ones. For example, the bimodal scheme shows no performance improvement once the buffer size exceeds 8K bits, whereas the gselect scheme still shows clear improvement even when the buffer exceeds 2M bits.
As to context switches, we observe that their effect on the prediction rate decreases as the number of instructions between context switches grows. We see very little effect once that number exceeds 1 million. This is exactly what we expected, since the larger the number of instructions between context switches, the less frequently the prediction tables need to be flushed. We also noticed that for less complicated schemes such as bimodal, the effect of context switches is less evident than for more complicated schemes such as gselect. This indicates that complicated schemes need a longer warm-up time than simple schemes.
The branch behavior of a program is a very important factor, if not the most important, in determining the performance of any branch prediction scheme. The behavior of the integer program go from SPECint95-beta strongly supports this. This program is not biased at all, with about the same number of branches in each 5% interval of takenness. All seven schemes perform poorly on this program when only an 8K-bit branch buffer is used. With increased buffer size, some of the schemes show a large improvement in prediction accuracy.
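
The table indexing and flushing discussed in the findings above can be illustrated with a small sketch. The Python code below is not the simulator used in this report. It is a minimal illustration that assumes word-aligned branch addresses, a table of 2-bit saturating counters initialized to weakly not-taken, and a gselect index formed by concatenating low address bits with global history bits; the gshare index XORs the address with the history as described above. All names (TwoBitTable, run, flush_every, alternating_trace, and the index functions) are hypothetical, and for simplicity the flush interval is counted in branches rather than instructions.

```python
class TwoBitTable:
    """Table of 2-bit saturating counters (0..3); values >= 2 predict taken."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts at 1 (weakly not-taken).
        self.counters = [1] * (1 << index_bits)

    def predict(self, index):
        return self.counters[index & self.mask] >= 2

    def update(self, index, taken):
        i = index & self.mask
        step = 1 if taken else -1
        self.counters[i] = max(0, min(3, self.counters[i] + step))


def bimodal_index(pc, history, index_bits, history_bits):
    # Address bits only; the low 2 bits are dropped (word alignment assumed).
    return pc >> 2


def gselect_index(pc, history, index_bits, history_bits):
    # Assumption: concatenate low address bits with the global history bits.
    addr_bits = index_bits - history_bits
    addr = (pc >> 2) & ((1 << addr_bits) - 1)
    return (addr << history_bits) | (history & ((1 << history_bits) - 1))


def gshare_index(pc, history, index_bits, history_bits):
    # XOR the branch address with the global history register.
    return (pc >> 2) ^ (history & ((1 << history_bits) - 1))


def run(trace, index_fn, index_bits=10, history_bits=4, flush_every=None):
    """Return the fraction of correctly predicted branches in `trace`.

    `trace` is a list of (pc, taken) pairs. If `flush_every` is set, the
    table and history are reset every `flush_every` branches, a crude
    stand-in for a context switch.
    """
    table = TwoBitTable(index_bits)
    history = 0
    correct = 0
    for n, (pc, taken) in enumerate(trace):
        if flush_every and n > 0 and n % flush_every == 0:
            table = TwoBitTable(index_bits)
            history = 0
        idx = index_fn(pc, history, index_bits, history_bits)
        if table.predict(idx) == taken:
            correct += 1
        table.update(idx, taken)
        # Shift the outcome into the global history register.
        history = ((history << 1) | int(taken)) & ((1 << history_bits) - 1)
    return correct / len(trace) if trace else 0.0


def alternating_trace(n, pc=0x400):
    """Synthetic trace: one branch that strictly alternates taken/not-taken."""
    return [(pc, i % 2 == 0) for i in range(n)]


if __name__ == "__main__":
    trace = alternating_trace(100)
    for name, fn in [("bimodal", bimodal_index),
                     ("gselect", gselect_index),
                     ("gshare", gshare_index)]:
        print(name, run(trace, fn), run(trace, fn, flush_every=10))
```

On this synthetic trace the history-based index functions learn the alternating pattern after a short warm-up, the address-only index does not, and periodic flushing lowers accuracy because each flush forces a new warm-up. This toy trace only illustrates the mechanisms and says nothing about the benchmark results reported here.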
