脸肿了怎么回事
Y-Branches:When You Come to a Fork in the Road,Take It
Nicholas Wang Michael Fertig Sanjay Patel
Center for Reliable and High-Performance Computing
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
nwang,fertig,sjp@crhc.uiuc.edu
Abstract
In this paper,we study the effects of manipulating the
architected direction of conditional branches.Through the
u of statistical sampling,wefind that about40%of all dy-
namic branches and about50%of mispredicted branches
do not affect correct program behavior when forced down
the incorrect path.We call such branches Y-branches.
To further examine this unexpected phenomenon,we
巴金随想录provide a characterization of the coding constructs that
give ri to such branches.Examples of such coding con-
structs include short-circuits and ineffectual loop itera-
tions.We provide a statistical breakdown of the frequency
of the branches and their constructs.Finally,we suggest
some techniques for exploiting this behavior,particularly
when it results from short-circuit constructs.
1.Introduction
Control speculation enables high-performance pro-
cessing for control intensive applications.Ultimately,
though,predictability becomes a limiting factor in improv-
ing processor performance:at some level,improving the
performance of a single thread of execution can be tied to
improving the processor’s ability to accurately predict and
rapidly evaluate branches.
In this paper,we investigate a surprising property of
certain dynamic branches.Using fault injection,wefind
that about40%of all dynamic conditional branches are
outcome-tolerant.That is,the behavior of the application
is unaffected by whether the particular branch instance is
taken or not.Dynamic branches that are mispredicted ex-
hibit a larger degree of outcome-tolerance—about half of
all mispredicted branches can proceed down an incorrect
architectural path and result in correct execution.
2.Methodology
In order to characterize Y-branches,a method for dis-covering them is necessary.Tofind all Y-branches for our input ts would require testing all instances of every static branch executed in each benchmark.Since it is computa-tionally expensive to perform this complete analysis,we u statistical sampling.This ction describes our experi-mental tup as well as the applications and infrastructure ud to perform the identification process.
2.1.Performing Injection
For each experiment,we randomly sample1000con-ditional branches1.To determine whether a branch is outcome-tolerant,wefirst choo a single conditional branch instance.Then,when the chon branch is encoun-tered at runtime,the execution of the program is forced down the alternate path.That is,if the conditional branch was resolved to be taken,then the execution of the pro-gram is forced down the fall-through path and vice versa. Exactly one injection is made per benchmark execution,so the effects of each of the injections are isolated from one another.Each of the benchmark executions constitute a single trial,and each experiment is compod of1000 trials for each benchmark.Only conditional branches are targeted in our study—all other control instructions are ex-ecuted as defined by architectural state.
After each injection is performed,the execution of the program is monitored.We call the execution of the manipulated program the injected simulation.To determine the effects of the injection,we compare the injected simu-lation with a reference simulation of the program that is generated without injection.All of architectural state(PC, registers,and memory)is frequently compared against the reference simulation to identify if a reconvergence point exists.The discovery of this point guarantees identical ex-ecution thereafter as well as confirming the chon branch is outcome-tolerant.
Most applications do not execute in isolation,so it is important that all external communication is equi
可爱的孩子valent with that of the reference simulation.The executed applica-tions perform all external communication via system calls. While system calls contain afinite t of inputs,we treat them as barriers for verification.That is,the injected pro-gram must reconverge to the original program’s architected state prior to the execution of any system call.This re-striction reduces reconvergence likelihood,however,it also reduces simulation time and complexity.
Benchmark Dynamic Static Footprint
Count System Calls(instructions) bzip228.4M8543
625M1082318
eon10.4M49268
501M25252397
gcc33.6M164167
869M32828
mcf53.9M8198
513M10702390 perlbmk11.8M34472
595M4032896 vortex24.0M82037
534M1022116
Table1.Benchmark statistics.
现在什么生意好做ject out of the t of all dynamic branches.The random
lection is done before the trial begins by generating a
pudo-random number between one and the number of dy-
namic conditional branches in the particular execution of a
benchmark.
Outcome-tolerant mispredicted branches are deter-
mined by restricting the t of dynamic branches to tho
that are mispredicted by an18-bit gshare conditional
branch predictor[9].Again,the random determination is
done before the trial begins.
Figure1plots the outcome-tolerance rates of the three
experiments.Averaged across all benchmarks,approxi-
mately30%of the instances of an individual static branch
are outcome-tolerant2.Approximately40%of all dynamic电热水器怎么选
branches are outcome-tolerant,indicating that dynamically
frequent branches tend to exhibit more tolerance than infre-
quent branches.Finally,50%of dynamically mispredicted
branches exhibit outcome-tolerance,indicating that half of
all mispredicted conditional branches might be resilient to
being mispredicted.
The significant ri in Y-branches in the ca of
mispredicted branches is particularly pronounced for the
benchmarks bzip2and gzip.The benchmark bzip2contains
an unrolled loop that accounts for a small portion of the
dynamically executed instructions.However,the loop end-
ing branches of the unrolled iterations also account for a
large portion of mispredictions.The branches can safely
perform early exits since the clean-up loop will perform
the remaining loop operations.Similarly,the increa in
outcome-tolerance in the benchmark gzip is due to three
branches that also account for a small portion of the pro-
gram execution,but reprent approximately half of all
mispredictions and are highly outcome-tolerant.The loop
structions.Eventually,if the injected branch is a Y-branch,the two streams converge and execute the same instructions for the remainder of the program—we term this control re-convergence .The span in instructions between the point of injection and the control reconvergence point in the in-jected execution is what we call the control reconvergence latency,which is labeled in the figure.This same pa-rameter with respect to the reference execution is labeled .Zero or more instructions later,if the injected branch is a Y-branch,both executions arrive at the same architectural state,or full state reconvergence .The full state reconver-gence latency is labeled in the figure.
There is actually another interesting point:live state reconvergence ,that is some of the full state of the running process is actually dead and therefore not required to be equivalent between the two executions.Live state recon-vergence will always occur at or before the two executions reach full stat
e reconvergence.In addition,live state recon-vergence is possible in the abnce of full state reconver-gence.Thus,the t of branches that are outcome-tolerant with respect to live state is a supert of the t of branches that are outcome-tolerant with respect to complete state.Unfortunately,live state reconvergence is difficult and ex-pensive to measure precily,so we do not investigate it here.
Injected Execution
Reference Execution Figure 2.Comparison of reference versus in-jected executions.
We provide data on the span (in instructions)between the injection of a Y-branch and the subquent control re-convergence (with respect to the injected simulation),in Figure 3.Becau the distributions of the data are very wide,we have only plotted the middle 80%of the ,we exclude the upper and lower 10th percentiles.For example,80%of Y-branches in bzip2tend to control converge b
etween approximately 10to 100instructions.The median Y-branch in bzip2reconverges on control in 41instructions,as indicated by the diamond along the vertical line.In our graphs with medians,the average bar repre-nts an averaging across all other percentile bars.With this in mind,the average benchmark reaches control recon-vergence in approximately 18instructions.
Eon differs slightly from the other benchmarks in that many of its Y-branches reconverge on control immediately
after the injection.The reason for this is that about a third of the Y-branches are from a single loop-controlling branch.Injecting this branch usually results in an early loop exit which then in turn often results in immediate control reconvergence.Despite the short control reconver-gence latency,eon ’s full state reconvergence latency is rel-atively slow.We believe this occurs becau the early loop exits skip a number of dead stores,where the dead values they would have written have a relatively long
lifetime.银行风险
b z i p
2
c r a f t y
e o
n
唐狡g a
p
g c c
g z i p
m
c f
p a
r s e r
p e
r l b m k
t w o l
f
v o r t e
x v p r a v e r a g
e
Figure 3.Control reconvergence latency in instructions.
Figure 4plots the full state reconvergence spans in instructions for the middle 80%of Y-branches for every benchmark.Notice that full state reconvergence spans are much wider ranges than for control reconvergence,ranging from 1instruction for a Y-branch in perlbmk to over 3M instructions on the benchmark vpr .Across all benchmarks,the median Y-branch reaches full state reconvergence after approximately 1300
instructions.
110100100010000
100000100000010000000b z i p 2
c r a f t y
e o n
g a p
g c c
g z i p
m
c f
p a r s e r p e r l b m k
t w o l
f
v o r t e x
v p r
a v e r a g e
关于对联的故事
Figure 4.Full state reconvergence latency in instructions.
Figure 5prents the difference in the number of in-structions executed between the two ,the value of in Figure 2.Becau of the nature of the branch being injected,sometimes the injected execution runs for fewer instructions than the reference execution,
sometimes longer (we explain this in Section 3.3).The ranges are provided in this figure (positive numbers indi-cate that the injected execution ran for fewer instructions).In the median ca for the average benchmark,the injected execution ran for nearly the same number of instructions as the reference trace.Somewhat interesting is the fact that in roughly half the cas,the injected execution required fewer instructions than the reference execution to arrive at the
of -200
-160-120-80-40040801201602003.3.Sources of Outcome Tolerance
Outcome-tolerant branches are ultimately due to re-dundancies introduced by the programmer or compiler.The redundancies occur in various forms,such as super-fluous loop iterations,performance-related branches,short-circuits,and the occasional irrelevant decision.Most of the Y-branches are a result of partially dead control,meaning that the branches are outcome-tolerant subject to some but not necessarily all initial states.In this ction,we perform a careful disction of a sampling of outcome-tolerant branches in order to find the underlying reasons behind this phenomena.
What we desire is an ability to examine the control flow graph and possibly the source code of a detected Y-branch in order to characterize the coding constructs that give ri to its outcome-tolerance.To do this automatically,we examine the differences in execution between the ref-erence execution and the injected execution.That is,re-ferring to Figure 2,we examine the differences in PCs ex-ecuted between the shaded regions of each execution,up until the point of control reconvergence.
Two basic situations ari when we examine the differ-ence between the injected and reference exe
cutions,as rep-rented in Figure 6.The leftmost subfigure reprents the
Disjoint ca.Here,the injected execution and the refer-ence execution execute completely different static instruc-tions.This
situation is primarily due to
the Y-branch being an if-el construct,where the decision made by the particular dynamic instance of the branch is irrelevant (we provide a specific example of this at the end of this ction).Sometimes the reference execution executes more instruc-tions;sometimes the injected execution does.
Execution A
Execution B
Execution A
Execution B
Disjoint Ca Skip Ca
Figure 6.General structure for difference analysis.
The other basic situation is reprented in the right subfigure.Here,one of the executions skips over instruc-tions executed by the other starting immediately after the injection point.That is,Execution A might execute blocks X,Y ,Z
before hitting the control reconvergence point whereas Execution B flows directly to the reconvergence point immediately after injection.For example,if the injec-tion targets a Y-branch that is a loop-ending branch causing the loop to either terminate early or execute additional iter-ations,then the Y-branch is of the Skip category.
Y;
X;}Figure 7.The branch at the end of block A is potentially a Y-branch of the Skip variety.Skip-type Y-branches can be further classified bad on whether loops are involved.If looping is not involved (i.e.,none of the branches in the shaded region are back-wards branches),the code construct responsible for the outcome-tolerance is most probably of the short-circuit va-riety.For example,Figure 7demonstrates the control flow that results from a simple short-circuit expression in C.The branch at the end of block A is an early jump to block X
0%
10%20%30%40%
50%
60%
70%
80%b z i p 2
c r a f t y
e o n
g a p
g c c
g z i p
m
c f
p a r s e r p e r l b m k
t w o l f
v o r t e x
v p r
a v e r a g e
[Other] Unclassified
[Skip] Complex
Figure 8.Categorization of Y-branches for mispredicted branches.
if it is determined that A is true.This branch is potentially a Y-branch of the Skip variety when conditio
n B is true and B has no side-effects during evaluation.We call Y-branches of this category Skip/Simple branches.Note that some nested if constructs that are not short-circuits also fall into this category.
If the Skip-type Y-branch involves a backwards branch,then we classify the branch as either a Skip/Loop or Skip/Complex branch.Skip/Loop branches are cas where we are confident that the Y-branch is a loop-terminal conditional branch,whereas the Skip/Complex situation is indeterminate:a backwards branch is encountered,so a loop potentially exists in the extra instructions executed.Unfortunately,it is not clear from our analysis whether the Y-branch itlf was the loop-terminal branch.
But not all cas are so clean as the categorizations of Skip/Simple,Skip/Loop,Skip/Complex,and Disjoint.Occasionally,the region between the injection and full state reconvergence is too large for our tools to analyze.For cas where this region is over 1M instructions,we categorize it as Trace Too Big .Furthermore,cas ari where the Y-branch injection has lingering effects on pro-gram state,causing control to randomly waver before fi-nally converging.When this occurs,we ver the traces at the first common instruction and attempt to classify it as described above.The cas are called Skip/Loop +Slop ,Skip/Simple +Slop ,etc.In certain cas it is difficult to find a point at which t
o truncate so we give up (Unclassified ).
Figure 8provides a detailed classification of Y-branches that ari from mispredicted branches.From this figure,it is possible to identify the approximate con-tribution to Y-branches of each coding construct.On average,the contributions to the various coding con-structs is about equal bad on our categorization.Ap-proximately 10%of mispredicted branches are outcome-tolerant and due to short-circuit like code constructs
(Skip/Simple and Skip/Simple+Slop).Approximately 20%of mispredictions are Y-branches that involve loops (Skip/Loop,Skip/Complex,...).Approximately 15%of them involve if-el -type constructs (Skip/Disjoint,Skip/Disjoint+Slop).
3.4.Specific examples
To further illustrate the nature of the outcome-tolerance phenomenon,we examine specific examples,taken from SPEC,of three coding constructs that lead to instances of outcome-tolerance.
3.4.1.Short-circuit example.An example of a short-circuit code construct from gzip is shown in Figure 9.The if statement is within a do-while loop,and it tests four conditions.If any of them evaluate to true,
the rest of the current iteration of the loop is skipped by the continue statement.In the binary,the compiler implements this if statement as four conditional branches,each of which tests one of the conditions and,if true,branches to the bottom of the loop.We obrve that the first branch accounts for about 15%of all mispredictions in gzip and about 90%of the are outcome-tolerant.In general,the full state and control reconvergence latencies tend to be short for short-circuit like code constructs,and the difference in numbers of instructions executed tends to be small.
3.4.2.Early loop exit examples.Bzip2is a file compres-sion program that employs move-to-front coding after ap-plying a transformation on the data to be compresd [4].When an element is ud during encoding or decoding,it is moved to the front of a list,shifting all previous ele-ments down by one entry.The shifting of elements is im-plemented in a tight loop,which is unrolled four times in the source code.The source code is shown in Figure 10.