A Comparative Analysis of Branch Prediction Schemes

Zhendong Su and Min Zhou

Computer Science Division
University of California at Berkeley
Berkeley, CA 94720


Introduction

Today's fast CPUs allow very deep pipelines and wide issue rates, two of the most effective ways of improving processor performance. Branches impede performance because a conditional branch cannot be resolved until its condition is evaluated and its target address is calculated, and an unconditional branch cannot be resolved until its target address is calculated. As pipelines get deeper and issue rates get higher, the penalty imposed by branches grows. One way to reduce this penalty is to predict the direction of a conditional branch and then fetch, decode, and execute instructions along the predicted path, starting at the branch target when the branch is predicted taken. After a branch misprediction, a large amount of speculative work has to be thrown away, and this misprediction penalty grows as the memory hierarchy becomes more complex. Extremely accurate branch prediction is thus the key to reducing this penalty, and many schemes have been proposed to reduce the misprediction rate.

Branch prediction schemes can be classified as static or dynamic according to how the prediction is made. Static prediction schemes can be simple. The most straightforward one predicts every branch as taken, based on the observation that the majority of branches are taken. As reported by Lee and Smith [LS92], this simple strategy predicts correctly 68% of the time. In our study, 65% of the dynamically executed conditional branches in our traces are taken. Our traces also indicate that this simple approach may yield a prediction accuracy below 50% for some integer programs. Static schemes can also be based on branch opcodes. Another simple method uses the direction of the branch to make a prediction: if the branch is backward, i.e., the target address is smaller than the PC of the branch instruction, it is predicted taken; otherwise, if the branch is forward, it is predicted not taken. This strategy exploits loops, whose closing branches are backward and usually taken, so it works well for programs with many looping structures. However, it does not work well when there are many irregular branches. Profiling is another static strategy: previous runs of a program are used to record each branch's tendency to be taken or not taken, and a static prediction bit is preset in the branch's opcode. Later runs of the program use this bit to make predictions. This strategy suffers from the fact that runs of a program on different input data sets usually exhibit different branch behavior. Recently, C. Young and M. Smith proposed static correlated branch prediction (SCBP), which trades increased code size for increased prediction accuracy. At this time, we do not know whether this approach will yield any performance improvement; for more information, see [YS94]. In this project, we studied the limits of static approaches without code expansion.
Our results indicate that static schemes without code expansion are not competitive with dynamic approaches.
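The two simplest static rules above (always taken, and backward taken / forward not taken) can be made concrete with a small trace-driven sketch. The `Branch` record, the function names, and the toy trace are hypothetical illustrations, not the traces or tools used in this study; profiling and opcode-based rules are not modeled.

```python
from typing import Callable, Iterable, NamedTuple


class Branch(NamedTuple):
    """One dynamic conditional branch from a trace (hypothetical record format)."""
    pc: int       # address of the branch instruction
    target: int   # branch target address
    taken: bool   # actual outcome


def predict_always_taken(b: Branch) -> bool:
    # Static rule 1: predict every conditional branch as taken.
    return True


def predict_btfn(b: Branch) -> bool:
    # Static rule 2: backward (target < PC) -> taken, forward -> not taken.
    return b.target < b.pc


def accuracy(trace: Iterable[Branch], predict: Callable[[Branch], bool]) -> float:
    """Fraction of dynamic branches whose outcome the rule predicts correctly."""
    total = correct = 0
    for b in trace:
        total += 1
        correct += predict(b) == b.taken
    return correct / total if total else 0.0


# Toy trace: a backward loop branch taken 9 times and then falling through,
# plus a forward if-branch taken once in four executions.
trace = [Branch(0x140, 0x100, True)] * 9 + [Branch(0x140, 0x100, False)]
trace += [Branch(0x200, 0x240, t) for t in (False, False, True, False)]

print(accuracy(trace, predict_always_taken))  # 10 of 14 correct
print(accuracy(trace, predict_btfn))          # 12 of 14 correct
```

On this loop-heavy toy trace the direction-based rule wins; a trace dominated by taken forward branches would reverse the ranking, which is the weakness with irregular branches noted above.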

Dynamic schemes differ from static ones in that they use the run-time behavior of branches to make predictions. J. Smith [S81] gave a survey of early simple static and dynamic schemes. The best scheme in his paper uses 2-bit saturating up-down counters to record branch history, which is then used to make predictions. This is perhaps the best-known technique; McFarling [M93] referred to it as bimodal branch prediction. There are several variations in the design of the 2-bit counter, which Yeh and Patt [YP91] discussed. In many programs with intensive control flow, the direction of a branch is often correlated with the behavior of other branches. Building on this observation, Pan, So, and Rahmeh [PSR92] and Yeh and Patt [YP91] independently proposed correlated, or two-level adaptive, branch prediction schemes, which substantially improved prediction accuracy. Yeh and Patt [YP93] classified the variations of dynamic schemes that use two levels of branch history, and McFarling [M93] explored combining branch predictors to achieve even higher prediction accuracy.
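A minimal Python sketch of the two dynamic ideas above: a bimodal table of 2-bit saturating counters, and a two-level predictor whose first level is a global history register of recent outcomes. The class names, the initial counter state (weakly not taken), the `pc >> 2` indexing for 4-byte instructions, and the concatenated history/PC index are assumptions chosen for illustration; [YP93] classifies many other two-level variants, and this is not the simulator used in this study.

```python
from typing import List, Tuple


def bump(counter: int, taken: bool) -> int:
    """2-bit saturating up-down counter: states 0-1 predict not taken, 2-3 taken."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


class Bimodal:
    """Table of 2-bit counters indexed by low-order PC bits (no branch history)."""

    def __init__(self, index_bits: int) -> None:
        self.mask = (1 << index_bits) - 1
        self.table: List[int] = [1] * (1 << index_bits)  # assumed start: weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the 2 low bits of 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)


class GlobalTwoLevel:
    """Two-level predictor: a global history register of recent outcomes,
    concatenated with PC bits, selects a 2-bit counter."""

    def __init__(self, history_bits: int, pc_bits: int) -> None:
        self.history_mask = (1 << history_bits) - 1
        self.pc_bits = pc_bits
        self.pc_mask = (1 << pc_bits) - 1
        self.history = 0
        self.table: List[int] = [1] * (1 << (history_bits + pc_bits))

    def _index(self, pc: int) -> int:
        return (self.history << self.pc_bits) | ((pc >> 2) & self.pc_mask)

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def run(predictor, trace: List[Tuple[int, bool]]) -> int:
    """Count correct predictions over a trace of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


# One branch that alternates taken / not taken, 100 times.
trace = [(0x1000, i % 2 == 0) for i in range(100)]
print(run(Bimodal(10), trace))           # 0: the counter flips between states 1 and 2
print(run(GlobalTwoLevel(2, 4), trace))  # 98: only 2 mispredictions while warming up
```

On a strictly alternating branch the bimodal counter is always one step behind, whereas the history register lets the two-level predictor map each recent-outcome pattern to its own counter.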

Computer technology is advancing rapidly. Advanced VLSI technology makes it possible to build larger branch prediction tables and more complex schemes. Advances in programming languages also make larger and more complex programs possible, and more complex procedure-call structures create more interaction between branches. Multiprocessing and multithreading have become important with the rise of Multiple Instruction stream, Multiple Data stream (MIMD) machines. It is therefore important to examine how these advances affect branch prediction.

In this project, we examine several of these issues. Much of the branch prediction research in the literature is based on the MIPS, Alpha, HP-PA, and Power architectures. We were interested in the impact of a different architecture, SPARC, on branch prediction. Our results show little, if any, impact of the architecture on branch prediction; they are similar to those of previous research. Taking advantage of the fast simulation speed of Shade, we are able to trace much larger programs, try many different schemes, and vary the parameters of each scheme in a reasonable amount of time. We notice that the number of instructions traced clearly affects the observed branch behavior and prediction accuracy, and that the choice and size of the benchmark set also affect how the schemes compare. Since conditional branches behave differently across applications and programming languages, it is important to have a set of benchmark programs that faithfully represents the typical workload and complexity of the programs people run. We use a partially new collection of programs, which includes 8 SPECint95 beta benchmarks and 13 SPECfp92 benchmarks, to see how well the well-known schemes work on these new programs. We observe that one new SPEC95 program, go, has branch behavior quite different from that of earlier SPEC programs. In this paper, we first show the performance of several well-known dynamic branch prediction schemes. From the results, we conclude that the selective predictor achieves the highest prediction accuracy for a given branch prediction buffer size. We also observe that conventional static predictors cannot compete with dynamic predictors, and that context switching has little impact on branch prediction on today's fast CPUs. In addition, complex schemes take longer to warm up than simple schemes do.
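The selective predictor follows McFarling's idea of combining predictors [M93]: two component predictors run in parallel, and a table of 2-bit counters indexed by the branch address chooses which one to trust. The sketch below pairs a bimodal table with a gshare-style component (global history XORed with PC bits); the component choice, table sizes, initial states, and all names are assumptions for illustration, not the configuration simulated in this study.

```python
from typing import List, Tuple


def bump(counter: int, up: bool) -> int:
    """Step a 2-bit saturating counter up or down."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class Bimodal:
    """2-bit counters indexed by PC bits only."""

    def __init__(self, bits: int) -> None:
        self.mask = (1 << bits) - 1
        self.table: List[int] = [1] * (1 << bits)

    def predict(self, pc: int) -> bool:
        return self.table[(pc >> 2) & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = (pc >> 2) & self.mask
        self.table[i] = bump(self.table[i], taken)


class Gshare:
    """2-bit counters indexed by global history XOR PC bits."""

    def __init__(self, bits: int) -> None:
        self.mask = (1 << bits) - 1
        self.history = 0
        self.table: List[int] = [1] * (1 << bits)

    def _index(self, pc: int) -> int:
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Selective:
    """Chooser counter >= 2 selects component `second`, otherwise `first`."""

    def __init__(self, first, second, bits: int) -> None:
        self.first, self.second = first, second
        self.mask = (1 << bits) - 1
        self.chooser: List[int] = [1] * (1 << bits)

    def predict(self, pc: int) -> bool:
        use_second = self.chooser[(pc >> 2) & self.mask] >= 2
        return (self.second if use_second else self.first).predict(pc)

    def update(self, pc: int, taken: bool) -> None:
        i = (pc >> 2) & self.mask
        first_ok = self.first.predict(pc) == taken
        second_ok = self.second.predict(pc) == taken
        # Train the chooser only when exactly one component was correct.
        if first_ok != second_ok:
            self.chooser[i] = bump(self.chooser[i], second_ok)
        self.first.update(pc, taken)
        self.second.update(pc, taken)


def run(predictor, trace: List[Tuple[int, bool]]) -> int:
    """Count correct predictions over a trace of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct


trace = [(0x1000, i % 2 == 0) for i in range(100)]  # one alternating branch
print(run(Bimodal(10), trace))                           # 0
print(run(Gshare(4), trace))                             # 97
print(run(Selective(Bimodal(10), Gshare(4), 4), trace))  # 96
```

In this toy run the chooser must first learn that the gshare component suits this branch, so the combined predictor trails gshare by one correct prediction; this loosely echoes the longer warm-up of more complex schemes, while on mixed workloads the chooser lets each branch use whichever component predicts it better.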

The rest of the report is organized as follows. The related work section gives references to previous work on branch prediction. The design methodology section discusses the methodology used in this study: how the simulated prediction schemes and benchmark programs are selected, and how the simulated schemes are designed and implemented. The result analysis section discusses our findings from traces of the benchmark programs. The future work section presents directions that may be interesting to explore, and the last section summarizes the report.

