Improved Static Resolution of Dynamic Class Loading in Java
Jason Sawin Atanas Rountev
Ohio State University
{sawin,rountev}@cse.ohio-state.edu
Abstract
Modern applications are becoming increasingly dynamic and flexible. In Java software, one important flexibility mechanism is dynamic class loading. Unfortunately, the vast majority of static analyses for Java handle this feature either unsoundly or overly conservatively. We present a set of techniques for static resolution of dynamic-class-loading sites in Java software. Previous work has used static string analysis to achieve this goal. However, a large number of such sites are impossible to resolve with purely static techniques. We present a novel semi-static approach which combines static string analysis with dynamically gathered information about the execution environment. The key insight behind this approach is the observation that dynamic class loading often depends on characteristics of the execution environment that are encoded in various environment variables. In addition, we propose generalizations of string analysis to increase the number of sites that can be resolved purely statically, and to track the names of environment variables. We present an experimental evaluation on 10,238 classes from the standard Java libraries. Our results show that a state-of-the-art purely static approach resolves only 28% of non-trivial sites, while our approach resolves more than twice as many sites. This work is a step towards making static analysis tools better equipped to handle the dynamic features of Java.
1 Introduction
Modern software applications need to be highly adaptable and flexible to stay competitive. Applications are expected to perform similarly on multiple operating systems, under various execution environments. Software users are demanding the ability to customize their applications to a degree that has never been seen before. To meet this demand, more and more applications such as Eclipse and Tomcat support third-party extensions. These extensions allow the frameworks to stay current and relevant without requiring them to absorb the resulting massive development costs. To gauge the demand for and success of such extensions, one need only note the number of third-party extensions available for Eclipse.
This increased application flexibility does limit what can be determined statically about a program. One very significant limitation is the lack of access to code for some program components (e.g., third-party extensions that are not available at analysis time, or modules that have yet to be developed). However, even if all code entities are available, most static analyses would not be able to accurately analyze modern software systems. This is because the language constructs that make this unprecedented level of flexibility possible are largely viewed as a nuisance by the static analysis community. Prime examples of this situation are Java constructs that allow for dynamic class loading. These powerful language features allow Java applications to load classes into the JVM at run time, requiring only a string representation of the class's fully-qualified name. Dynamic class loading is used extensively in applications such as Eclipse, Tomcat, EJB application servers, etc. In the most general case, there is no way to determine which entities will be loaded until run time. As a result, many static analyses either choose to ignore dynamic class loading constructs, thus producing an unsound result, or handle them in such a conservative fashion as to render the end result useless.
Some recent work has employed static string analysis to resolve instances of these dynamic features. Such an approach attempts to determine statically the value of the string which specifies the target class that is to be loaded. For example, a call Class.forName(s) dynamically loads the class with the name represented by the string expression s. If, through static string analysis, the precise run-time value of s could be determined, then the statement could be treated as a static initialization of the class specified by s. Current string analysis approaches have two potential points of failure when trying to determine the value of s: (1) when the value of s is not a compile-time constant, and truly depends on the run-time execution, and (2) when the analysis is not powerful enough to model the flow of the string value through the application. Unfortunately, the use of such truly dynamic values and complex string manipulations is fairly common when designing a flexible application. For example, many applications will inspect environment variables, configuration files or particular directories to determine which extensions are available to be loaded. In such cases any purely static analysis will fail to produce a precise result. Similarly, many applications use data structures and perform string operations that are currently beyond the modeling capabilities of string analyses.
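To make the two failure points concrete, the following sketch contrasts a target string that a string analysis can compute exactly with two that it cannot: one that comes from program input, and one that passes through a HashMap, a data structure that a string analysis may not model. The class and method names are hypothetical and are not taken from the paper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dynamic-class-loading sites; not code from the paper.
class TargetStrings {
    // Resolvable: a concatenation of string literals, so a string analysis
    // can compute the exact value "java.util.ArrayList".
    static String constantTarget() {
        String pkg = "java.util.";
        return pkg + "ArrayList";
    }

    // Failure point (1): the value depends on run-time input (main's args).
    static String inputTarget(String[] args) {
        return args.length > 0 ? args[0] : "java.lang.Object";
    }

    // Failure point (2): the value flows through a Map, which the analysis
    // may be unable to model precisely.
    static String mapTarget(String key) {
        Map<String, String> registry = new HashMap<>();
        registry.put("list", "java.util.LinkedList");
        registry.put("set", "java.util.HashSet");
        return registry.getOrDefault(key, "java.lang.Object");
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Each Class.forName call below is a dynamic-class-loading site (a hotspot).
        System.out.println(Class.forName(constantTarget()).getName());
        System.out.println(Class.forName(inputTarget(args)).getName());
        System.out.println(Class.forName(mapTarget("set")).getName());
    }
}
```

All three calls succeed at run time; the difference lies only in what a static analysis can know about the argument before the program runs.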
In this paper we present a novel semi-static approach which combines static string analysis with dynamically gathered information about the execution environment. The key insight behind this approach is the observation that dynamic class loading often depends on characteristics of the execution environment that are encoded in various environment variables. Our investigation of the Java libraries revealed that over 40% of the fully-contained instances of dynamic class loading (i.e., ones that could not be affected directly by client code) depend upon environment variables. Though such variables are not static elements of an application, they are different from other forms of dynamic input data in that their run-time values typically remain the same across multiple executions of the application. Our approach identifies dynamic-class-loading sites that depend only on such variables, and resolves them based on the current variable values. As part of this approach, we also propose several generalizations of static string analysis that improve the tracking of the names of environment variables, as well as increase the number of sites that can be resolved purely statically.
Our approach produces results that are sound with respect to the current execution environment, but do not apply to all possible environments. For many clients of static analyses this is both reasonable and desirable. For example, consider program understanding tools such as SHriMP [21] or Rigi [18]. Such tools have the potential to overwhelm their users with too much information [22]. If such tools tried to account for every class that could potentially be loaded at dynamic-class-loading sites for all possible combinations of environment variable values, their usefulness may be compromised. Instead, using our approach, the user can obtain information that is sound for her own local environment (i.e., for the specific environment variable values that capture component configurations, operating system parameters, etc.).
This work makes the following contributions:
• We propose a fully automated semi-static approach that utilizes the system's current configuration information to aid in the resolution of dynamic class loading in Java applications. This approach defines a useful and practical relaxation of purely static approaches for handling dynamic class loading.
• We present several generalizations of string analysis that not only enable our approach to resolve more environment-dependent instances of dynamic class loading, but also allow a greater number of purely static instances to be resolved.
• We describe an experimental study in which our approach was applied to the entire Java 1.4 standard libraries. The results of this experiment indicate that the approach is able to resolve more than twice the number of client-independent sites currently resolvable by the state-of-the-art static string analysis. Through comprehensive manual investigation we also determined that our approach identifies 91% of all sites that are in fact truly static or environment-variable-dependent, which implies very high analysis precision.
The proposed approach and the experimental results define a significant improvement for the handling of dynamic class loading in static analysis, compared to current techniques. Such improvement could be valuable for a range of software tools that employ static analyses to support software understanding, transformation, verification, and optimization.
2 Background
This section provides a brief overview of the dynamic class loading feature in Java, as well as a high-level description of the state of the art in Java string analysis.
2.1 Dynamic Class Loading in Java
The Java Virtual Machine (JVM) is one of the defining components of the Java platform [14]. It interprets Java bytecode, allowing Java applications to be platform independent. It also supports dynamic class loading, which is the ability to load classes at run time [13]. This is a very powerful mechanism that allows classes to interface with software components that are specified at run time, and in fact do not even need to exist at compile time. This feature is a key mechanism that allows modern applications to achieve the desired level of flexibility.
1  private static final String handlerPropName = "ption.handler";
2  private static String handlerClassName = null;
3
4  private boolean handleException(Throwable thrown) {
5    .....
6    /* Get the class name stored in environment
7     * variable ption.handler */
8    handlerClassName = (String) AccessController.doPrivileged(
9        new GetPropertyAction(handlerPropName));
10   .....
11   /* Load the class and instantiate it */
12   Object h;
13   Class c = Class.forName(handlerClassName, ...);
14   h = c.newInstance();
15   .....
16 }
Figure 1. Sample code from library class java.awt.EventDispatchThread.
Loading classes into the JVM is the responsibility of class loaders. At its simplest, a class loader takes a string representation of the fully-qualified name of the class that is to be loaded and then performs a hierarchical search for the corresponding class file. Upon finding the class file, the loader loads the bytecode into the JVM and returns a Class object. This is a metadata object through which the program can access the class (e.g., to create class instances).
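The sequence just described can be sketched with standard java.lang APIs: a class loader is asked for a class by its fully-qualified name, and the returned Class object is then used to create an instance. The target java.util.ArrayList is an arbitrary stand-in, and getDeclaredConstructor().newInstance() is used as the current replacement for the older Class.newInstance call.

```java
import java.util.List;

// Sketch: load a class by name through a class loader, then use the
// resulting Class object (the metadata object) to instantiate it.
class LoadByName {
    static Object loadAndInstantiate(String fullyQualifiedName) throws ReflectiveOperationException {
        ClassLoader loader = LoadByName.class.getClassLoader();
        // The loader delegates to its parent loaders before searching itself.
        Class<?> c = loader.loadClass(fullyQualifiedName);
        // Same effect as calling the class's no-argument constructor.
        return c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object o = loadAndInstantiate("java.util.ArrayList");
        System.out.println(o.getClass().getName());
        System.out.println(o instanceof List);
    }
}
```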
Example. Figure 1 illustrates the flexibility an application can gain from the use of dynamic class loading. We revisit this example several times throughout the rest of the paper. The code is from library class java.awt.EventDispatchThread and allows custom-defined event handlers to be loaded in a running application. If a client wishes to use a custom event handler, all she needs to do is create the appropriate class and set the environment variable ption.handler to the string value representing the fully-qualified name of this class. Method handleException in EventDispatchThread queries this environment variable to retrieve the specified class name (lines 8 and 9) and stores it in field handlerClassName. The custom handler is then loaded at line 13; method forName is one of several methods in the Java libraries that can be used to dynamically load classes at run time. A call to newInstance is used to create a new object of the class; this call has the same effect as calling the no-arguments constructor of the class.
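The configuration-driven pattern of Figure 1 can be reproduced in a few lines. In this hedged sketch the property key demo.handler is a hypothetical stand-in (not the JDK's key), and System.getProperty is called directly rather than through AccessController.doPrivileged.

```java
// Sketch of the Figure 1 pattern: the class to load is named by a system property.
// The key "demo.handler" is hypothetical.
class ConfigurableHandler {
    static final String HANDLER_PROP_NAME = "demo.handler";

    static Object loadHandler() throws ReflectiveOperationException {
        // Read the class name from the environment; null if the property is unset.
        String handlerClassName = System.getProperty(HANDLER_PROP_NAME);
        if (handlerClassName == null) {
            return null; // no custom handler configured
        }
        // Dynamic-class-loading site: the target string comes from the environment.
        Class<?> c = Class.forName(handlerClassName);
        return c.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Normally supplied on the command line, e.g. -Ddemo.handler=java.lang.StringBuilder
        System.setProperty(HANDLER_PROP_NAME, "java.lang.StringBuilder");
        System.out.println(loadHandler().getClass().getName());
    }
}
```

Nothing in the source code fixes which class is loaded; only the property value does, which is why a purely static analysis cannot resolve this site.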
Similar examples can be found throughout the entire JDK code. Frameworks such as Eclipse heavily use dynamic class loading features to implement their component models; the same is true for EJB application servers. The uses of these mechanisms will only become more prevalent as the complexity of Java applications grows. It is critical that the static analysis community begin to aggressively attack the problem of handling such features.
2.2 Java String Analyzer
Most static analyses have taken two approaches for the handling of these dynamic features in Java: ignore them or treat them very conservatively. Ignoring these features produces a result that is unsound and may miss vital program entity interactions. Such an approach would render an analysis impractical for use on modern Java applications; for example, there is evidence [15] that significant portions of the program call graph can be omitted by this approach. Conversely, the conservative approach assumes that any class can be loaded and instantiated. However, the relevant information can easily be obscured by the number of infeasible interactions inferred by this technique. Some analyses such as [15] and [24] require that the user manually specify the interactions which occur due to dynamic class loading. However, this technique can be time consuming and error prone. Yet others [15] utilize casting information to narrow the field of what needs to be considered. However, such an approach would fail for the code presented in Figure 1, since no cast of the dynamically loaded class is performed.
Since strings specify the classes that are to be loaded at instances of dynamic class loading, a precise string analysis has the greatest potential to precisely resolve such instances without requiring input from the user.
The work in [2, 15, 25] employs various forms of string analysis in an attempt to determine the possible run-time values of the target strings. The most powerful string analysis currently available for Java is the one in the Java String Analyzer (JSA) library described in [2].
The input to JSA is a set of Java classes and a set of expressions (hotspots). JSA conservatively computes the possible run-time string values at all instances of those hotspots in the input classes. The analysis utilizes the Soot analysis framework to generate and parse the Jimple intermediate representation [25]. From this representation, JSA builds a flow graph that models the flow of string values and the operations which manipulate them. The nodes of the graph represent variables and expressions; the edges are directed def-use edges that represent the possible flow of data. The graph contains five types of nodes: Init nodes represent the initial construction of string values, Join nodes model assignments and control join points, Concat nodes represent string concatenation, UnaryOp nodes represent unary string operations such as reverse, and BinaryOp nodes model binary string operations such as insert. In essence, this graph is a static single assignment form where the Join nodes are analogous to φ functions.
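The five node kinds can be made concrete with a deliberately simplified, hypothetical model. It is not JSA's API: instead of building a grammar and an automaton, it evaluates an acyclic graph directly to a finite set of strings, which is enough to show how Join nodes union values and how Concat, UnaryOp (instantiated as reverse) and BinaryOp (instantiated as insert) combine them.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified string flow graph (acyclic graphs only; not JSA's API).
// Each node evaluates to the finite set of strings that may flow to it.
class StringFlowGraph {
    interface Node { Set<String> values(); }

    // Init: initial construction of a string value (a literal).
    static final class Init implements Node {
        private final String literal;
        Init(String literal) { this.literal = literal; }
        public Set<String> values() { return new LinkedHashSet<>(Arrays.asList(literal)); }
    }

    // Join: assignment or control-flow merge; unions its predecessors, like a phi function.
    static final class Join implements Node {
        private final List<Node> preds;
        Join(Node... preds) { this.preds = Arrays.asList(preds); }
        public Set<String> values() {
            Set<String> out = new LinkedHashSet<>();
            for (Node p : preds) out.addAll(p.values());
            return out;
        }
    }

    // Concat: every left value followed by every right value.
    static final class Concat implements Node {
        private final Node left, right;
        Concat(Node left, Node right) { this.left = left; this.right = right; }
        public Set<String> values() {
            Set<String> out = new LinkedHashSet<>();
            for (String l : left.values())
                for (String r : right.values()) out.add(l + r);
            return out;
        }
    }

    // UnaryOp, instantiated here as reverse.
    static final class Reverse implements Node {
        private final Node arg;
        Reverse(Node arg) { this.arg = arg; }
        public Set<String> values() {
            Set<String> out = new LinkedHashSet<>();
            for (String s : arg.values()) out.add(new StringBuilder(s).reverse().toString());
            return out;
        }
    }

    // BinaryOp, instantiated here as insert at a fixed offset (clamped to the length).
    static final class Insert implements Node {
        private final Node target, inserted;
        private final int offset;
        Insert(Node target, int offset, Node inserted) {
            this.target = target; this.offset = offset; this.inserted = inserted;
        }
        public Set<String> values() {
            Set<String> out = new LinkedHashSet<>();
            for (String t : target.values())
                for (String i : inserted.values())
                    out.add(new StringBuilder(t).insert(Math.min(offset, t.length()), i).toString());
            return out;
        }
    }

    public static void main(String[] args) {
        // s = (cond ? "java.util." : "java.lang.") + "List": a Join feeding a Concat.
        Node pkg = new Join(new Init("java.util."), new Init("java.lang."));
        Node target = new Concat(pkg, new Init("List"));
        System.out.println(target.values()); // prints [java.util.List, java.lang.List]
    }
}
```

Direct evaluation like this breaks down as soon as the graph has cycles (e.g., a string appended to in a loop), which is one reason JSA goes through a grammar and a regular approximation instead.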
From the flow graph JSA constructs a context-free grammar. For each node n in the graph, a nonterminal $A_n$ is added to the grammar along with a set of productions corresponding to the incoming edges of n. The productions are determined by the type of n. For example, if n were a Concat node and nodes x and y were predecessors of n, the following rule would be added to the grammar: $A_n \to A_x A_y$. The production for an Init node n is $A_n \to \mathit{reg}$, where reg corresponds to a regular language. JSA then utilizes the Mohri-Nederhof algorithm [17] to transform the grammar into a strongly-regular context-free grammar. The result can be accurately modeled by a finite state automaton. Such an automaton is created for each node in the graph that represents a hotspot. The language of this automaton is a superset of the possible string values that can occur at that hotspot.
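The grammar construction can be summarized as one production schema per node kind. The Init and Concat rules are the ones given above; the Join, UnaryOp, and BinaryOp rules are written in the style of the original JSA description and should be read as an assumption rather than a quotation. Here x and y are predecessors of n, and $\mathit{op}_1$, $\mathit{op}_2$ stand for the unary and binary string operations. A small worked instance follows: a Join j of two Init nodes feeding a Concat c.

```latex
\begin{aligned}
\textbf{Init:}\quad     & A_n \to \mathit{reg} \\
\textbf{Join:}\quad     & A_n \to A_x \quad \text{for each predecessor } x \text{ of } n \\
\textbf{Concat:}\quad   & A_n \to A_x\, A_y \\
\textbf{UnaryOp:}\quad  & A_n \to \mathit{op}_1(A_x) \\
\textbf{BinaryOp:}\quad & A_n \to \mathit{op}_2(A_x, A_y) \\[6pt]
\text{Example:}\quad    & A_{i_1} \to \texttt{java.util.},\quad A_{i_2} \to \texttt{java.lang.},\quad A_{i_3} \to \texttt{List} \\
                        & A_j \to A_{i_1} \mid A_{i_2},\qquad A_c \to A_j\, A_{i_3} \\
                        & L(A_c) = \{\texttt{java.util.List},\ \texttt{java.lang.List}\}
\end{aligned}
```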
3 Generalizing JSA
String analyses such as the one presented in Section 2.2 have two points of possible failure when attempting to precisely determine the run-time values a string-typed expression can assume:
1. The value of the expression depends upon values that the analysis does not have access to (e.g., the args[] array passed to a main method).
2. The analysis is not powerful enough to model the flow and manipulation of the string values.
In this section we present several generalizations to JSA. These generalizations increase both the number of program entities the analysis can access and its overall modeling capabilities. The enhancements greatly improve the analysis' ability to resolve instances of dynamic class loading.
3.1 Semi-Static Analysis
Consider the example code shown in Figure 1. If some JSA client specifies the invocation statement forName(str, ...) as a hotspot, JSA will attempt to resolve the possible run-time values of parameter str. However, in this example JSA will return the value anystring for handlerClassName. This value indicates that under JSA's model, the parameter could potentially be any Unicode string. This occurs, in part, because JSA views environment variables as run-time inputs to the program and thus assumes that it has no access to the values stored in them.
Unfortunately, most applications that utilize dynamic class loading rely on string values that are not statically contained in their own code. It is rare, however, that a needed string value flows from direct user input (e.g., from stdin). A much more common case is that such values flow from system environment variables, as in the example above. Environment variables are key/value pairs that are stored in the execution environment and can be accessed by all programs. Under most common programming paradigms, these variables provide the program with information about the type of environment it is operating in. Hypothetically, the user could manipulate these values between consecutive runs of an application. This, however, is not the intent of many of these variables. Consider the Java system property marked by the key os.name; clearly, this property is not meant to be modified by the user. Moreover, many of these variables will be consistent across a large number of the host environments that the application will be executed on, and certainly across multiple runs on the same host. For example, library class java.awt.print.PrinterJob queries an environment variable to determine which class to load in order to create a job that the current system's printer will recognize. Such a variable will be consistent across systems that have the same type of printer. It is rare that a system frequently changes its printer, and therefore for a given system the value will essentially be static.
We propose a generalization to JSA that will allow it to make use of the values stored in environment variables. Our approach requires only alterations to the graph model that JSA builds to represent the flow of string values. We present only the end alterations to the graph; for brevity, the details of the intermediate stages are not discussed.
java.Property(<string>)
java.Property(<string>,<string>)
java.Property(<string>)
sun.security.action.GetPropertyAction(<string>)
Figure 2. Some entry points for environment variables.
Our approach is based on the set of Java library methods that serve as entry points for the values of environment variables; a subset of these methods is shown in Figure 2. All of these methods take a key string parameter which specifies the environment variable that is to be accessed. In the example presented in Figure 1, the constant field handlerPropName contains the key "ption.handler". Several of these methods take a second default string parameter. These methods return the value stored in default if the value of key does not specify an environment variable with a set value. Since these parameters are strings, we can add a special env-hotspot node to the JSA graph for each encountered call to a method that is an environment variable entry point. By leveraging the existing techniques in JSA, it is often possible to resolve the potential run-time values that both the key and default parameters can assume.
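The key/default behavior of such entry points can be observed directly with java.lang.System.getProperty, which has one-argument and two-argument forms. The property keys in this sketch are hypothetical, and demo.unset.key is assumed not to be set.

```java
// Demonstrates the key/default semantics of an environment-variable entry point.
// The keys "demo.codec" and "demo.unset.key" are hypothetical.
class EntryPointSemantics {
    static String lookUp(String key, String def) {
        // Returns the property's value if it is set, otherwise def.
        return System.getProperty(key, def);
    }

    public static void main(String[] args) {
        System.setProperty("demo.codec", "java.util.ArrayList");
        System.out.println(lookUp("demo.codec", "java.util.LinkedList"));     // key set: its value
        System.out.println(lookUp("demo.unset.key", "java.util.LinkedList")); // key unset: the default
        System.out.println(System.getProperty("demo.unset.key"));             // one-argument form: null
    }
}
```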
If JSA is able to resolve the key and default parameters, our approach performs an analysis-time look-up of the key/value pair in the environment. This look-up is achieved by executing the method call represented by the env-hotspot node. We term this step semi-static since, strictly speaking, it is a dynamic execution of a slice from the application under analysis, but in essence it is a look-up of a “static” entity. The outcome of a look-up will result in one of three possible modifications to the graph, as presented below.
Single value return. The most straightforward case occurs when both the key and default (if it exists) parameters for an env-hotspot resolve to a single value. In such situations it is guaranteed that the look-up step will return a single string value: if the key/value pair exists it will return the value, and if the pair does not exist it will return the value specified in default or null.¹ In such cases our approach replaces the env-hotspot node with an Init node. The value associated with this Init node is the result of the environment variable look-up. Due to this change of the flow graph, all strings that were dependent upon the original method call are now dependent upon the looked-up value.
¹ JSA does provide treatment of null string values.
Multiple value return. Of course, more than one string value may flow to key, to default, or to both. In such situations the look-up executes the env-hotspot method for every key value and a special default value, if required. Since JSA is context-insensitive, if the special default value is ever returned from a look-up, our approach assumes that all default values may be possible. Every value that the look-up step discovers, including all defaults when applicable, is assigned to a new artificial Init node. The env-hotspot node is then replaced by a Join node and an edge is added from every new Init node to this new Join. Since Join nodes are analogous to φ functions (see Section 2.2), this has the effect of unioning all the returned look-up values. Thus, all entities that were originally dependent upon the method invocation are now dependent upon the set of possible values that could be returned at run time.
Variable corruption. It is entirely possible that for some env-hotspot JSA will not be able to resolve the key parameter, the default parameter, or both. If the key value is unresolvable there is no precise way to determine the appropriate environment variable to look up. Thus, our approach replaces the env-hotspot node with an Init node assigned the anystring value. This is also the action taken if the default parameter is unresolvable and one of the key parameter values is an environment variable which is not set (i.e., does not have a key/value pair in the environment). This has the effect of “corrupting” all other strings that are dependent upon the original method call.
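The three graph modifications can be summarized as a small decision procedure. The following is a hypothetical sketch, not the authors' implementation: the nested StringSet type stands in for JSA's abstract string values (a finite set, or anystring), the look-up actually executes System.getProperty as the env-hotspot would, and replacementKind names the node that would replace the env-hotspot.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of resolving an env-hotspot getProperty(key[, default]).
class EnvHotspotResolver {
    // Abstract string value: a finite set of strings, or ANY (JSA's anystring).
    static final class StringSet {
        static final StringSet ANY = new StringSet(null);
        final Set<String> values; // null encodes anystring; a null element models a null result

        private StringSet(Set<String> values) { this.values = values; }
        static StringSet of(String... vs) { return new StringSet(new LinkedHashSet<>(Arrays.asList(vs))); }
        boolean isAny() { return values == null; }
    }

    // keys: resolved key values; defaults: resolved defaults, or null if the entry point has none.
    static StringSet resolve(StringSet keys, StringSet defaults) {
        if (keys.isAny()) return StringSet.ANY; // unresolvable key: corrupt
        Set<String> result = new LinkedHashSet<>();
        boolean someKeyUnset = false;
        for (String key : keys.values) {
            String v = System.getProperty(key); // the semi-static look-up
            if (v != null) result.add(v); else someKeyUnset = true;
        }
        if (someKeyUnset) {
            if (defaults == null) {
                result.add(null);               // one-argument form returns null
            } else if (defaults.isAny()) {
                return StringSet.ANY;           // unresolvable default may flow out: corrupt
            } else {
                result.addAll(defaults.values); // context-insensitive: all defaults possible
            }
        }
        return new StringSet(result);
    }

    // Which node replaces the env-hotspot (one distinct value needs no Join).
    static String replacementKind(StringSet r) {
        if (r.isAny()) return "Init(anystring)";
        return r.values.size() == 1 ? "Init" : "Join over Init nodes";
    }

    public static void main(String[] args) {
        System.setProperty("demo.a", "java.util.ArrayList"); // hypothetical keys
        StringSet single = resolve(StringSet.of("demo.a"), null);
        StringSet multiple = resolve(StringSet.of("demo.a", "demo.unset"), StringSet.of("java.util.LinkedList"));
        StringSet corrupted = resolve(StringSet.ANY, null);
        System.out.println(replacementKind(single) + " " + single.values);
        System.out.println(replacementKind(multiple) + " " + multiple.values);
        System.out.println(replacementKind(corrupted));
    }
}
```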
The result of this generalization is a solution that is sound with respect to all possible run-time executions during which the configuration values are the same as the values that were observed during the analysis. This semi-static approach differs from both a completely static analysis (which produces a solution describing all possible run-time executions) and a completely dynamic analysis (which produces a solution describing the specific observed run-time execution). While this paper employs this technique to resolve dynamic class loading, other static analyses may benefit from the same idea (e.g., by performing partial redundancy elimination based on looked-up values).
3.2 Modeling Generalizations
Even with the addition of the semi-static technique described above, the current publicly available version of JSA would still not be able to determine the possible run-time values of handlerClassName at line 13 in the running example (Figure 1). This is due to JSA's inability to accurately model all possible flows of string values. For example, JSA currently does not precisely track the flow of string values to and from fields. All string values that flow from fields are corrupted (i.e., assigned the anystring value).
We propose a more precise handling of fields. Our technique models fields similarly to the manner in which JSA handles method invocations, in that both are treated in a context-insensitive manner. Currently, we are only considering fields of type String and, in some special cases, arrays with a base type of String. The approach first identifies all accesses to a given field x in the input classes. It then unions all values that flow to instances of x. In the final flow graph this is modeled by adding edges from every Join node that represents an assignment to x to a newly synthesized Join node. An edge from this synthesized node is then added to the node representing the field. Consequently, all sites that read the value of x will be modeled as potentially receiving all possible values that could be assumed by every instance of x. This approach of modeling fields is similar to that of [3] and [23].
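A hypothetical sketch of this context-insensitive field treatment: every value assigned to a field at any site is collected into one set (the role of the synthesized Join node), and every read of the field sees that whole set. FieldModel and its methods are illustrative names only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: context-insensitive modeling of String fields.
// All values written to a field, at any site, flow to every read of it.
class FieldModel {
    // field name -> union of all values that may be assigned to it
    private final Map<String, Set<String>> writes = new HashMap<>();

    // Records one assignment site "x = <values>" (the values of that site's Join node).
    void recordWrite(String field, Set<String> values) {
        writes.computeIfAbsent(field, f -> new LinkedHashSet<>()).addAll(values);
    }

    // A read receives the union of all writes (the synthesized Join node).
    // With no recorded write the set is empty; a real analysis must also
    // account for the field's initial value.
    Set<String> read(String field) {
        return Collections.unmodifiableSet(writes.getOrDefault(field, Collections.emptySet()));
    }

    public static void main(String[] args) {
        FieldModel m = new FieldModel();
        // Two assignment sites for the same field, e.g. in different methods.
        m.recordWrite("handlerClassName", Collections.singleton("demo.HandlerA"));
        m.recordWrite("handlerClassName", Collections.singleton("demo.HandlerB"));
        System.out.println(m.read("handlerClassName"));
    }
}
```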
During our manual investigation of the Java libraries, described in the next section, we discovered several instances of dynamic class loading that depended on string values defined in static final array fields, as illustrated by the following example:
private static final String[] codecClassNames = {
    "dia.sound.UlawCodec",
    "dia.sound.AlawCode"
};
This structure encapsulates the strings specifying the two possible SunCodec classes that could be loaded at run time by class dia.sound.SunCodec. For such cases, our approach treats the array as a single String field. Synthesized Init nodes are created for each statically defined array entry. These values are unioned together in the fashion described above.
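The array treatment amounts to collapsing all statically defined entries into one set of possible target strings. In this hedged sketch, java.util.ArrayList and java.util.LinkedList are stand-ins for the codec class names; possibleTargets mimics the analysis view, and loadAll is a hypothetical loading loop of the kind such a field might feed.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: a static final String[] field whose entries name loadable classes.
class ArrayFieldTargets {
    // Stand-ins for the codec class names in the example above.
    private static final String[] CANDIDATE_CLASS_NAMES = {
        "java.util.ArrayList",
        "java.util.LinkedList"
    };

    // Analysis view: the array is treated as one String field whose possible
    // values are the union of its statically defined entries.
    static Set<String> possibleTargets() {
        return new LinkedHashSet<>(Arrays.asList(CANDIDATE_CLASS_NAMES));
    }

    // Run-time view: a dynamic-class-loading site fed by array elements.
    static int loadAll() {
        int loaded = 0;
        for (String name : CANDIDATE_CLASS_NAMES) {
            try {
                Class.forName(name);
                loaded++;
            } catch (ClassNotFoundException e) {
                // an unavailable optional class is skipped
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(possibleTargets());
        System.out.println(loadAll());
    }
}
```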
Even after increasing JSA's ability to model fields, it would still not be able to resolve the possible run-time values of handlerClassName from the running example. This is due to the limited number of variable types modeled by JSA. In its original form JSA only models variables of type String, StringBuffer, StringBuilder and arrays with a base type of String. However, in the code displayed in Figure 1, the look-up of environment variable ption.handler is accomplished by creating an instance of library class sun.security.action.GetPropertyAction (line 9). This is a convenience class that implements interface java.security.PrivilegedAction. Instances of PrivilegedAction are typically passed to invocations of AccessController.doPrivileged. This results in the execution of PrivilegedAction.run with privileges enabled. In the case of class GetPropertyAction, the run method simply wraps an invocation of Property. The problem is that the return type of PrivilegedAction.run is java.lang.Object. Even though String is a subclass of Object, JSA is not powerful enough to model objects with a compile-time type of Object that are actually of type String.
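The following sketch shows why the compile-time type matters. GetPropertyActionSketch is a hypothetical stand-in for the JDK-internal GetPropertyAction, which application code cannot use on modern JDKs. Its run method returns Object, mirroring the pre-generics Java 1.4 PrivilegedAction signature, so the String becomes visible only after the cast. The sketch calls run() directly; the library code instead passes the action to AccessController.doPrivileged, which invokes run with privileges enabled (and is deprecated in recent JDKs).

```java
import java.security.PrivilegedAction;

// Hypothetical stand-in for a GetPropertyAction-style wrapper.
// Typed as PrivilegedAction<Object> so that run() has compile-time type Object.
class GetPropertyActionSketch implements PrivilegedAction<Object> {
    private final String key;

    GetPropertyActionSketch(String key) { this.key = key; }

    @Override
    public Object run() {
        // Wraps a system property look-up; the run-time result is a String (or null).
        return System.getProperty(key);
    }

    // Mirrors lines 8-9 of Figure 1 without AccessController: the value
    // reaches a String variable only through a cast from Object.
    static String lookUpHandlerName(String key) {
        Object raw = new GetPropertyActionSketch(key).run();
        return (String) raw;
    }

    public static void main(String[] args) {
        System.setProperty("demo.handler", "java.lang.StringBuilder"); // hypothetical key
        System.out.println(lookUpHandlerName("demo.handler"));
    }
}
```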
It is a very common practice to wrap access to environment variables in a PrivilegedAction. Thus, it is paramount for the success of our semi-static approach that JSA be able to properly model such occurrences. We propose a generalization through which JSA can conservatively determine variables with compile-time types of Object that are actually of type String. To achieve this, we augment JSA to also consider variables of type Object. Suppose that the only actions performed on such a variable are (1) assignment to another variable of type Object, (2) assignment from a variable with a compile-time type of Object that is actually of type String, (3) cast to a String variable, and (4) assignment from a String variable or a string literal. If this is the case, we direct JSA to treat the variable as a String. If any action outside of those specified above occurs, the variable is conservatively corrupted, and transitively so are all string values dependent upon it. This approach is quite conservative, and more powerful type inference techniques could reveal more instances of Object variables which are really of type String. Still, our experimental results show that this approach is sufficient to model the flow of most string values which are utilized at dynamic class loading sites in the Java libraries.
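The four permitted actions can be read as a check over the recorded uses of each Object-typed variable. In this hypothetical sketch (names are illustrative), rule (2) depends on whether the source variable was itself accepted, so acceptance is computed by growing the accepted set to a fixed point. The iteration starts from nothing, so a cycle of Object variables that never receives a String is left corrupted, which errs on the conservative side.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: which Object-typed variables may be treated as Strings?
class ObjectAsString {
    enum Kind { ASSIGN_TO_OBJECT, ASSIGN_FROM_OBJECT, CAST_TO_STRING, ASSIGN_FROM_STRING, OTHER }

    // One action on a variable; source is used only for ASSIGN_FROM_OBJECT.
    static final class Action {
        final Kind kind;
        final String source;
        Action(Kind kind, String source) { this.kind = kind; this.source = source; }
    }

    // Returns the variables accepted as String; all others are corrupted.
    static Set<String> stringTyped(Map<String, List<Action>> actions) {
        Set<String> accepted = new HashSet<>();
        boolean changed = true;
        while (changed) { // grow the accepted set until it is stable
            changed = false;
            for (Map.Entry<String, List<Action>> e : actions.entrySet()) {
                if (!accepted.contains(e.getKey()) && allAllowed(e.getValue(), accepted)) {
                    accepted.add(e.getKey());
                    changed = true;
                }
            }
        }
        return accepted;
    }

    private static boolean allAllowed(List<Action> uses, Set<String> accepted) {
        for (Action a : uses) {
            switch (a.kind) {
                case ASSIGN_TO_OBJECT:   // rule (1)
                case CAST_TO_STRING:     // rule (3)
                case ASSIGN_FROM_STRING: // rule (4): String variable or literal
                    break;
                case ASSIGN_FROM_OBJECT: // rule (2): source must already be accepted
                    if (!accepted.contains(a.source)) return false;
                    break;
                default:                 // any other action corrupts the variable
                    return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<Action>> m = new LinkedHashMap<>();
        // run()'s return value is assigned from a String (the getProperty result).
        m.put("run.ret", Arrays.asList(new Action(Kind.ASSIGN_FROM_STRING, null)));
        // raw = run(); then (String) raw
        m.put("raw", Arrays.asList(new Action(Kind.ASSIGN_FROM_OBJECT, "run.ret"),
                                   new Action(Kind.CAST_TO_STRING, null)));
        m.put("other", Arrays.asList(new Action(Kind.OTHER, null)));
        System.out.println(new TreeSet<>(stringTyped(m))); // prints [raw, run.ret]
    }
}
```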
4 Experimental Evaluation
We implemented our proposed generalizations of JSA and evaluated the enhanced version's ability to resolve instances of dynamic class loading in the 10,238 classes from the Java 1.4 standard libraries. We identified 13 library methods that are used to dynamically load classes into the JVM; some examples are shown in Figure 3. These methods were used as the hotspots input to JSA. A site was considered resolved if JSA returned a finite number of possible string values for the <string> parameter representing the fully-qualified name of the class to be loaded; we will refer to this parameter as the target string.
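Because JSA's answer for a hotspot is a finite-state automaton, "resolved" means that the automaton's language is finite. One standard way to check this, assumed here since the paper does not say how the check is performed, is that the language is infinite exactly when some cycle lies on a path from the start state to an accepting state. The sketch uses a hypothetical adjacency-list automaton with edge labels omitted and assumes no ε-transitions, so every edge consumes a character.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: is the language of an automaton (no epsilon edges) finite?
// States are 0..n-1; edges.get(s) lists the successors of s (labels omitted).
class FiniteLanguageCheck {
    static boolean isFinite(List<List<Integer>> edges, int start, boolean[] accepting) {
        int n = edges.size();
        boolean[] startSeed = new boolean[n];
        startSeed[start] = true;
        boolean[] reach = bfs(edges, startSeed);          // reachable from start
        List<List<Integer>> rev = new ArrayList<>();
        for (int i = 0; i < n; i++) rev.add(new ArrayList<>());
        for (int s = 0; s < n; s++) for (int t : edges.get(s)) rev.get(t).add(s);
        boolean[] coreach = bfs(rev, accepting);          // can reach an accepting state
        boolean[] useful = new boolean[n];
        for (int i = 0; i < n; i++) useful[i] = reach[i] && coreach[i];
        int[] color = new int[n];                         // 0 new, 1 on stack, 2 done
        for (int i = 0; i < n; i++)
            if (useful[i] && color[i] == 0 && hasCycle(edges, useful, i, color)) return false;
        return true;                                      // no useful cycle: finitely many strings
    }

    private static boolean[] bfs(List<List<Integer>> adj, boolean[] seeds) {
        boolean[] seen = seeds.clone();
        Deque<Integer> work = new ArrayDeque<>();
        for (int i = 0; i < seen.length; i++) if (seen[i]) work.add(i);
        while (!work.isEmpty()) {
            int s = work.poll();
            for (int t : adj.get(s)) if (!seen[t]) { seen[t] = true; work.add(t); }
        }
        return seen;
    }

    // Depth-first search restricted to useful states; a back edge means a cycle.
    private static boolean hasCycle(List<List<Integer>> adj, boolean[] useful, int s, int[] color) {
        color[s] = 1;
        for (int t : adj.get(s)) {
            if (!useful[t]) continue;
            if (color[t] == 1) return true;
            if (color[t] == 0 && hasCycle(adj, useful, t, color)) return true;
        }
        color[s] = 2;
        return false;
    }

    static List<List<Integer>> graph(int n, int[][] pairs) {
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < n; i++) g.add(new ArrayList<>());
        for (int[] p : pairs) g.get(p[0]).add(p[1]);
        return g;
    }

    public static void main(String[] args) {
        boolean[] acc = {false, false, true};
        // 0 -> 1 -> 2: a single fixed-length string, so the site is resolved.
        System.out.println(isFinite(graph(3, new int[][] {{0, 1}, {1, 2}}), 0, acc));
        // A loop on state 1 yields unboundedly long strings, so the site is unresolved.
        System.out.println(isFinite(graph(3, new int[][] {{0, 1}, {1, 1}, {1, 2}}), 0, acc));
    }
}
```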
Manual investigation. To establish a “perfect baseline” for our results, we performed a manual investigation of the input classes. During the investigation we examined all potential hotspots as defined above. We did not consider occurrences where the target string was a constant string literal. For exam-