Finding Security Vulnerabilities in Java Applications
with Static Analysis
V. Benjamin Livshits and Monica S. Lam
Computer Science Department
Stanford University
{livshits,lam}@cs.stanford.edu
Abstract
This paper proposes a static analysis technique for detecting many recently discovered application vulnerabilities such as SQL injections, cross-site scripting, and HTTP splitting attacks. These vulnerabilities stem from unchecked input, which is widely recognized as the most common source of security vulnerabilities in Web applications. We propose a static analysis approach based on a scalable and precise points-to analysis. In our system, user-provided specifications of vulnerabilities are automatically translated into static analyzers. Our approach finds all vulnerabilities matching a specification in the statically analyzed code. Results of our static analysis are presented to the user for assessment in an auditing interface integrated within Eclipse, a popular Java development environment.
Our static analysis found 29 security vulnerabilities in nine large, popular open-source applications, with two of the vulnerabilities residing in widely-used Java libraries. In fact, all but one application in our benchmark suite had at least one vulnerability. Context sensitivity, combined with improved object naming, proved instrumental in keeping the number of false positives low. Our approach yielded very few false positives in our experiments: in fact, only one of our benchmarks suffered from false alarms.
1 Introduction
The security of Web applications has become increasingly important in the last decade. More and more Web-based enterprise applications deal with sensitive financial and medical data, which, if compromised, can mean millions of dollars in damages in addition to downtime. It is crucial to protect these applications from hacker attacks. However, the current state of application security leaves much to be desired. The 2002 Computer Crime and Security Survey conducted by the Computer Security Institute and the FBI revealed that, on a yearly basis, over half of all databases experience at least one security breach and an average episode results in close to $4 million in losses [10]. A recent penetration testing study performed by the Imperva Application Defense Center included more than 250 Web applications from e-commerce, online banking, enterprise collaboration, and supply chain management sites [54]. Their vulnerability assessment concluded that at least 92% of Web applications are vulnerable to some form of hacker attack. Security compliance of application vendors is especially important in light of recent U.S. industry regulations such as the Sarbanes-Oxley Act pertaining to information security [4, 19].
A great deal of attention has been given to network-level attacks such as port scanning, even though about 75% of all attacks against Web servers target Web-based applications, according to a recent survey [24]. Traditional defense strategies such as firewalls do not protect against Web application attacks, as these attacks rely solely on HTTP traffic, which is usually allowed to pass through firewalls unhindered. Thus, attackers typically have a direct line to Web applications.
Many projects in the past focused on guarding against problems caused by the unsafe nature of C, such as buffer overruns and format string vulnerabilities [12, 45, 51]. However, in recent years, Java has emerged as the language of choice for building large complex Web-based systems, in part because of language safety features that disallow direct memory access and eliminate problems such as buffer overruns. Platforms such as J2EE (Java 2 Enterprise Edition) also promoted the adoption of Java as a language for implementing e-commerce applications such as Web stores, banking sites, etc.
A typical Web application accepts input from the user's browser and interacts with a back-end database to serve user requests; J2EE libraries make these common tasks easy to code. However, despite the Java language's safety, it is possible to make logical programming errors that lead to vulnerabilities such as SQL injections [1, 2, 14] and cross-site scripting attacks [7, 22, 46]. A simple programming mistake can leave a Web application vulnerable to unauthorized data access, unauthorized updates or deletion of data, and application crashes leading to denial-of-service attacks.
1.1 Causes of Vulnerabilities
Of all vulnerabilities identified in Web applications, problems caused by unchecked input are recognized as being the most common [41]. To exploit unchecked input, an attacker needs to achieve two goals:
Inject malicious data into Web applications. Common methods used include:
• Parameter tampering: pass specially crafted malicious values in fields of HTML forms.
• URL manipulation: use specially crafted parameters to be submitted to the Web application as part of the URL.
• Hidden field manipulation: set hidden fields of HTML forms in Web pages to malicious values.
• HTTP header tampering: manipulate parts of HTTP requests sent to the application.
• Cookie poisoning: place malicious data in cookies, small files sent to Web-based applications.
Manipulate applications using malicious data. Common methods used include:
• SQL injection: pass input containing SQL commands to a database server for execution.
• Cross-site scripting: exploit applications that output unchecked input verbatim to trick the user into executing malicious scripts.
• HTTP response splitting: exploit applications that output input verbatim to perform Web page defacements or Web cache poisoning attacks.
• Path traversal: exploit unchecked user input to control which files are accessed on the server.
• Command injection: exploit user input to execute shell commands.
These kinds of vulnerabilities are widespread in today's Web applications. A recent empirical study of vulnerabilities found that parameter tampering, SQL injection, and cross-site scripting attacks account for more than a third of all reported Web application vulnerabilities [49]. While different on the surface, all types of attacks listed above are made possible by user input that has not been (properly) validated. This set of problems is similar to those handled dynamically by the taint mode in Perl [52], even though our approach is considerably more extensible. We refer to this class of vulnerabilities as the tainted object propagation problem.
1.2 Code Auditing for Security
Many attacks described in the previous section can be detected with code auditing. Code reviews pinpoint potential vulnerabilities before an application is run. In fact, most Web application development methodologies recommend a security assessment or review step as a separate development phase after testing and before application deployment [40, 41].
Code reviews, while recognized as one of the most effective defense strategies [21], are time-consuming and costly, and are therefore performed infrequently. Security auditing requires security expertise that most developers do not possess, so security reviews are often carried out by external security consultants, thus adding to the cost. In addition, because new security errors are often introduced as old ones are corrected, double audits (auditing the code twice) are highly recommended. The current situation calls for better tools that help developers avoid introducing vulnerabilities during the development cycle.
1.3 Static Analysis
This paper proposes a tool based on a static analysis for finding vulnerabilities caused by unchecked input. Users of the tool can describe vulnerability patterns of interest succinctly in PQL [35], which is an easy-to-use program query language with a Java-like syntax. Our tool, as shown in Figure 1, applies user-specified queries to Java bytecode and finds all potential matches statically. The results of the analysis are integrated into Eclipse, a popular open-source Java development environment [13], making the potential vulnerabilities easy to examine and fix as part of the development process.
The advantage of static analysis is that it can find all potential security violations without executing the application. The use of bytecode-level analysis obviates the need for the source code to be accessible. This is especially important since libraries whose source is unavailable are used extensively in Java applications. Our approach can be applied to other forms of bytecode such as MSIL, thereby enabling the analysis of C# code [37].
Our tool is distinctive in that it is based on a precise context-sensitive pointer analysis that has been shown to scale to large applications [55]. This combination of scalability and precision enables our analysis to find all vulnerabilities matching a specification within the portion of the code that is analyzed statically. In contrast, previous practical tools are typically unsound [6, 20]. Without a precise analysis, these tools would find too many potential errors, so they only report a subset of errors that are likely to be real problems. As a result, they can miss important vulnerabilities in programs.
Figure 1: Architecture of our static analysis framework.
1.4 Contributions
A unified analysis framework. We unify multiple, seemingly diverse, recently discovered categories of security vulnerabilities in Web applications and propose an extensible tool for detecting these vulnerabilities using a sound yet practical static analysis for Java.
A powerful static analysis. Our tool is the first practical static security analysis that utilizes fully context-sensitive pointer analysis results. We improve the state of the art in pointer analysis by improving the object-naming scheme. The precision of the analysis is effective in reducing the number of false positives issued by our tool.
A simple user interface. Users of our tool can find a variety of vulnerabilities involving tainted objects by specifying them using PQL [35]. Our system provides a GUI auditing interface implemented on top of Eclipse, thus allowing users to perform security audits quickly during program development.
Experimental validation. We present a detailed experimental evaluation of our system and the static analysis approach on a set of large, widely-used open-source Java applications. We found a total of 29 security errors, including two important vulnerabilities in widely-used libraries. Eight out of nine of our benchmark applications had at least one vulnerability, and our analysis produced only 12 false positives.
1.5 Paper Organization
The rest of the paper is organized as follows. Section 2 presents a detailed overview of the application-level security vulnerabilities we address. Section 3 describes our static analysis approach. Section 4 describes improvements that increase analysis precision and coverage. Section 5 describes the auditing environment our system provides. Section 6 summarizes our experimental findings. Section 7 describes related work, and Section 8 concludes.
2 Overview of Vulnerabilities
In this section we focus on a variety of security vulnerabilities in Web applications that are caused by unchecked input. According to an influential survey performed by the Open Web Application Security Project [41], unvalidated input is the number one security problem in Web applications. Many such security vulnerabilities have recently been appearing on specialized vulnerability tracking sites such as SecurityFocus and were widely publicized in the technical press [39, 41]. Recent reports include SQL injections in Oracle products [31] and cross-site scripting vulnerabilities in Mozilla Firefox [30].
2.1 SQL Injection Example
Let us start with a discussion of SQL injections, one of the most well-known kinds of security vulnerabilities found in Web applications. SQL injections are caused by unchecked user input being passed to a back-end database for execution [1, 2, 14, 29, 32, 47]. The hacker may embed SQL commands into the data he sends to the application, leading to unintended actions performed on the back-end database. When exploited, a SQL injection may cause unauthorized access to sensitive data, updates or deletions from the database, and even shell command execution.
Example 1. A simple example of a SQL injection is shown below:
HttpServletRequest request = ...;
String userName = request.getParameter("name");
Connection con = ...;
String query = "SELECT * FROM Users " +
               "WHERE name = '" + userName + "'";
con.createStatement().execute(query);
This code snippet obtains a user name (userName) by invoking request.getParameter("name") and uses it to construct a query to be passed to a database for execution (con.createStatement().execute(query)). This seemingly innocent piece of code may allow an attacker to gain access to unauthorized information: if an attacker has full control of string userName obtained from an HTTP request, he can, for example, set it to ' OR 1 = 1; --. Two dashes are used to indicate comments in the Oracle dialect of SQL, so the WHERE clause of the query effectively becomes the tautology name = '' OR 1 = 1. This allows the attacker to circumvent the name check and get access to all user records in the database. □
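To make the tautology concrete, the minimal sketch below reproduces only the string construction from Example 1, with no servlet container or database involved. The class QueryDemo and the method buildQuery are hypothetical names introduced for illustration.

```java
// Hypothetical class: reproduces only the query construction of Example 1.
final class QueryDemo {
    // Vulnerable: the user-controlled name is pasted into the SQL text verbatim.
    static String buildQuery(String userName) {
        return "SELECT * FROM Users " +
               "WHERE name = '" + userName + "'";
    }

    public static void main(String[] args) {
        // Benign input yields the intended lookup.
        System.out.println(buildQuery("alice"));
        // Malicious input closes the quote; "--" comments out the trailing quote.
        System.out.println(buildQuery("' OR 1 = 1; --"));
    }
}
```

The second line printed is SELECT * FROM Users WHERE name = '' OR 1 = 1; --', in which everything after the two dashes is a comment, so the condition that remains is always true.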
SQL injection is but one of the vulnerabilities that can be formulated as tainted object propagation problems. In this case, the input variable userName is considered tainted. If a tainted object (the source or any other object derived from it) is passed as a parameter to the method that executes the query (the sink), the program is vulnerable.
2.2 Injecting Malicious Data
Protecting Web applications against unchecked input vulnerabilities is difficult because applications can obtain information from the user in a variety of different ways. One must systematically check all sources of user-controlled data, such as form parameters, HTTP headers, and cookie values. While commonly used, client-side filtering of malicious values is not an effective defense strategy. For example, a banking application may present the user with a form containing a choice of only two account numbers; however, this restriction can be easily circumvented by saving the HTML page, editing the values in the list, and resubmitting the form. Therefore, inputs must be filtered by the Web application on the server. Note that many attacks are relatively easy to mount: in most cases an attacker needs little more than a standard Web browser to attack Web applications.
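The server-side counterpart of the two-account form can be sketched as a membership check. This is a minimal illustration, assuming the server already knows which account numbers it offered to this user (for example from the session or the database, never from the request); AccountCheck and isOfferedAccount are hypothetical names.

```java
import java.util.Set;

// Hypothetical helper: re-validates a form choice on the server.
final class AccountCheck {
    // `offered` is assumed to be the set of account numbers the server itself
    // placed in the form for this user; it must not be read from the request.
    static boolean isOfferedAccount(Set<String> offered, String submitted) {
        // The HTML list only constrains honest browsers, so check again here.
        return submitted != null && offered.contains(submitted);
    }

    public static void main(String[] args) {
        Set<String> offered = Set.of("341948", "341949");
        System.out.println(isOfferedAccount(offered, "341948")); // true
        System.out.println(isOfferedAccount(offered, "999999")); // false: edited page
    }
}
```

The null test comes first because Set.of collections throw NullPointerException from contains(null), and a missing parameter should simply be rejected.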
2.2.1 Parameter Tampering
The most common way for a Web application to accept parameters is through HTML forms. When a form is submitted, parameters are sent as part of an HTTP request. An attacker can easily tamper with parameters passed to a Web application by entering maliciously crafted values into text fields of HTML forms.
2.2.2 URL Tampering
For HTML forms that are submitted using the HTTP GET method, form parameters as well as their values appear as part of the URL that is accessed after the form is submitted. An attacker may directly edit the URL string, embed malicious data in it, and then access this new URL to submit malicious data to the application.
Example 2. Consider a Web page at a bank site that allows an authenticated user to select one of her accounts from a list and debit $100 from the account. When the submit button is pressed in the Web browser, the following URL is requested:
accountnumber=341948&debit_amount=100
However, if no additional precautions are taken by the Web application receiving this request, accessing
accountnumber=341948&debit_amount=-5000
may in fact increase the account balance. □
2.2.3 Hidden Field Manipulation
Because HTTP is stateless, many Web applications use hidden fields to emulate persistence. Hidden fields are just form fields made invisible to the end user. For example, consider an order form that includes a hidden field to store the price of items in the shopping cart:
<input type="hidden" name="total_price" value="25.00">
A typical Web site using multiple forms, such as an online store, will likely rely on hidden fields to transfer state information between pages. Unlike regular fields, hidden fields cannot be modified directly by typing values into an HTML form. However, since the hidden field is part of the page source, saving the HTML page, editing the hidden field value, and reloading the page will cause the Web application to receive the newly updated value of the hidden field.
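A common remedy, sketched here under stated assumptions rather than taken from the paper, is for the server to ignore total_price entirely and recompute the total from its own price list. The class PriceCheck, the CATALOG map, and the item names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical helper: the server trusts its own prices, not the hidden field.
final class PriceCheck {
    // Assumed server-side catalog: item id -> unit price.
    static final Map<String, BigDecimal> CATALOG = Map.of(
        "book", new BigDecimal("20.00"),
        "pen", new BigDecimal("5.00"));

    // Recomputes the order total; quantities are user input too, so check them.
    static BigDecimal total(Map<String, Integer> cart) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Map.Entry<String, Integer> e : cart.entrySet()) {
            BigDecimal price = CATALOG.get(e.getKey());
            if (price == null || e.getValue() <= 0) {
                throw new IllegalArgumentException("bad cart entry: " + e.getKey());
            }
            sum = sum.add(price.multiply(BigDecimal.valueOf(e.getValue())));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(Map.of("book", 1, "pen", 1))); // 25.00
    }
}
```

BigDecimal avoids binary floating-point rounding on currency amounts; whatever value the browser sends in total_price is simply never read.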
2.2.4 HTTP Header Manipulation
HTTP headers typically remain invisible to the user and are used only by the browser and the Web server. However, some Web applications do process the headers, and attackers can inject malicious data into applications through them. While a normal Web browser will not allow forging the outgoing headers, multiple freely available tools allow a hacker to craft an HTTP request leading to an exploit [9]. Consider, for example, the Referer field, which contains the URL indicating where the request comes from. This field is commonly trusted by the Web application, but can be easily forged by an attacker. It is possible to manipulate the Referer field's value used in an error page or for redirection to mount cross-site scripting or HTTP response splitting attacks.
2.2.5 Cookie Poisoning
Cookie poisoning attacks consist of modifying a cookie, which is a small file accessible to Web applications stored on the user's computer [27]. Many Web applications use cookies to store information such as user login/password pairs and user identifiers. This information is often created and stored on the user's computer after the initial interaction with the Web application, such as visiting the application login page. Cookie poisoning is a variation of header manipulation: malicious input can be passed into applications through values stored within cookies. Because cookies are supposedly invisible to the user, cookie poisoning is often more dangerous in practice than other forms of parameter or header manipulation attacks.
2.2.6 Non-Web Input Sources
Malicious data can also be passed in as command-line parameters. This problem is not as important because typically only administrators are allowed to execute components of Web-based applications directly from the command line. However, by examining our benchmarks, we discovered that command-line utilities are often used to perform critical tasks such as initializing, cleaning, or validating a back-end database or migrating the data. Therefore, attacks against these important utilities can still be dangerous.
2.3 Exploiting Unchecked Input
Once malicious data is injected into an application, an attacker may use one of many techniques to take advantage of this data, as described below.
2.3.1 SQL Injections
SQL injections, first described in Section 2.1, are caused by unchecked user input being passed to a back-end database for execution. When exploited, a SQL injection may cause a variety of consequences, from leaking the structure of the back-end database to adding new users, mailing passwords to the hacker, or even executing arbitrary shell commands.
Many SQL injections can be avoided relatively easily with the use of better APIs. J2EE provides the PreparedStatement class, which allows specifying a SQL statement template with ?'s indicating statement parameters. Prepared SQL statements are precompiled, and expanded parameters never become part of executable SQL. However, not using or improperly using prepared statements still leaves plenty of room for errors.
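For reference, here is a minimal sketch of the PreparedStatement pattern applied to the query of Example 1, using the standard java.sql API. The class UserLookup and the method userExists are hypothetical, and obtaining the Connection is outside the sketch's scope.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: the Example 1 lookup rewritten with a prepared statement.
final class UserLookup {
    // The SQL text is fixed; '?' marks where a value (never SQL) is bound.
    static final String QUERY_TEMPLATE = "SELECT * FROM Users WHERE name = ?";

    static boolean userExists(Connection con, String userName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(QUERY_TEMPLATE)) {
            // Bound as a string value, even if it is "' OR 1 = 1; --".
            ps.setString(1, userName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next(); // true if at least one matching row exists
            }
        }
    }
}
```

The try-with-resources blocks close the result set and statement even if executeQuery throws. The "improper use" mentioned above would include building QUERY_TEMPLATE itself by concatenating user input, which reintroduces the injection.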
2.3.2 Cross-site Scripting Vulnerabilities
Cross-site scripting occurs when dynamically generated Web pages display input that has not been properly validated [7, 11, 22, 46]. An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites. When executed on the machine of a user who views the page, these scripts may hijack the user's account credentials, change user settings, steal cookies, or insert unwanted content (such as ads) into the page. At the application level, echoing the application input back to the browser verbatim enables cross-site scripting.
2.3.3 HTTP Response Splitting
HTTP response splitting is a general technique that enables various new attacks including Web cache poisoning, cross-user defacement, sensitive page hijacking, as well as cross-site scripting [28]. By supplying unexpected line break CR and LF characters, an attacker can cause two HTTP responses to be generated for one maliciously constructed HTTP request. The second HTTP response may be erroneously matched with the next HTTP request. By controlling the second response, an attacker can cause a variety of problems, such as forging or poisoning Web pages on a caching proxy server. Because the proxy cache is typically shared by many users, this makes the effects of defacing a page or constructing a spoofed page to collect user data even more devastating. For HTTP splitting to be possible, the application must include unchecked input as part of the response headers sent back to the client. For example, applications that embed unchecked data in HTTP Location headers returned to users are often vulnerable.
2.3.4 Path Traversal
Path-traversal vulnerabilities allow a hacker to access or control files outside of the intended file access path. Path-traversal attacks are normally carried out via unchecked URL input parameters, cookies, and HTTP request headers. Many Java Web applications use files to maintain an ad-hoc database and store application resources such as visual themes, images, and so on.
If an attacker has control over the specification of these file locations, then he may be able to read or remove files with sensitive data or mount a denial-of-service attack by trying to write to read-only files.
Using Java security policies allows the developer to restrict access to the file system (similar to using a chroot jail in Unix). However, missing or incorrect policy configuration still leaves room for errors. When used carelessly, IO operations in Java may lead to path-traversal attacks.
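Alongside security policies, a common lexical guard is to resolve the user-supplied name against a fixed base directory with java.nio.file.Path and refuse anything that normalizes to a location outside it. This hypothetical sketch (SafePaths, resolveInside) assumes POSIX-style paths and does not account for symbolic links, which would additionally require resolving real paths with Path.toRealPath.

```java
import java.nio.file.Path;

// Hypothetical helper: confines user-chosen file names to one directory.
final class SafePaths {
    // Returns the resolved path, or null if it would escape baseDir.
    // This is a lexical check only; symbolic links are not followed.
    static Path resolveInside(Path baseDir, String userPath) {
        Path base = baseDir.toAbsolutePath().normalize();
        // resolve() returns userPath itself if it is absolute, e.g. "/etc/passwd".
        Path target = base.resolve(userPath).normalize();
        return target.startsWith(base) ? target : null;
    }

    public static void main(String[] args) {
        Path base = Path.of("/srv/app/themes");
        System.out.println(resolveInside(base, "dark/style.css"));
        System.out.println(resolveInside(base, "../../../etc/passwd"));
    }
}
```

Path.startsWith compares whole name elements, so /srv/app/themes2 is not mistaken for a child of /srv/app/themes, which a plain String.startsWith check would get wrong.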
2.3.5 Command Injection
Command injection involves passing shell commands into the application for execution. This attack technique enables a hacker to attack the server using the access rights of the application. While relatively uncommon in Web applications, especially those written in Java, this attack technique is still possible when applications carelessly use functions that execute shell commands or load dynamic libraries.
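The difference between handing a command string to a shell and passing an argument vector can be illustrated with java.lang.ProcessBuilder. In this hypothetical sketch (SafeCommand, viaShell, direct) neither process is started; it only shows which argument list each style would pass to the operating system.

```java
import java.util.List;

// Hypothetical helper contrasting shell and argument-vector invocation.
final class SafeCommand {
    // Risky: the whole string is interpreted by sh, so input such as
    // "a.txt; rm -rf ~" would run a second command.
    static ProcessBuilder viaShell(String userFile) {
        return new ProcessBuilder("sh", "-c", "ls -l " + userFile);
    }

    // Safer: no shell is involved and the input stays one literal argument;
    // "--" tells ls that what follows is a file name, not an option.
    static ProcessBuilder direct(String userFile) {
        return new ProcessBuilder(List.of("ls", "-l", "--", userFile));
    }

    public static void main(String[] args) {
        String input = "a.txt; rm -rf ~";
        System.out.println(viaShell(input).command());
        System.out.println(direct(input).command());
    }
}
```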
3 Static Analysis
In this section we present a static analysis that addresses the tainted object propagation problem described in Section 2.
3.1 Tainted Object Propagation
We start by defining the terminology that was informally introduced in Example 1. We define an access path as a sequence of field accesses, array index operations, or method calls separated by dots. For instance, the result of applying access path f.g to variable v is v.f.g. We denote the empty access path by ε; array indexing operations are indicated by [].
A tainted object propagation problem consists of a set of source descriptors, sink descriptors, and derivation descriptors:
• Source descriptors of the form $\langle m, n, p \rangle$ specify ways in which user-provided data can enter the program. They consist of a source method m, parameter number n, and an access path p to be applied to argument n to obtain the user-provided input. We use argument number −1 to denote the return result of a method call.
• Sink descriptors of the form $\langle m, n, p \rangle$ specify unsafe ways in which data may be used in the program. They consist of a sink method m, argument number n, and an access path p applied to that argument.
• Derivation descriptors of the form $\langle m, n_s, p_s, n_d, p_d \rangle$ specify how data propagates between objects in the program. They consist of a derivation method m, a source object given by argument number $n_s$ and access path $p_s$, and a destination object given by argument number $n_d$ and access path $p_d$. This derivation descriptor specifies that at a call to method m, the object obtained by applying $p_d$ to argument $n_d$ is derived from the object obtained by applying $p_s$ to argument $n_s$.
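One way to picture the three kinds of descriptors is as plain data. The records below are a hypothetical Java encoding, not the representation used by the tool, which specifies descriptors in PQL. Only the convention that −1 denotes a return value comes from the text; numbering the receiver as argument 0 and the first explicit parameter as 1 is an assumption made for the sample entries, whose method names are also illustrative.

```java
import java.util.List;

// Hypothetical data model for source, sink, and derivation descriptors.
final class Descriptors {
    // Access path as a list of steps, e.g. ["f", "g"] or ["[]"]; empty list = ε.
    record AccessPath(List<String> steps) {
        static final AccessPath EMPTY = new AccessPath(List.of());
    }

    // <m, n, p>: apply path p to argument n of method m (-1 = return value).
    record SourceDescriptor(String method, int arg, AccessPath path) {}
    record SinkDescriptor(String method, int arg, AccessPath path) {}

    // <m, n_s, p_s, n_d, p_d>: at a call to m, the object at (n_d, p_d)
    // is derived from the object at (n_s, p_s).
    record DerivationDescriptor(String method, int srcArg, AccessPath srcPath,
                                int dstArg, AccessPath dstPath) {}

    public static void main(String[] args) {
        AccessPath eps = AccessPath.EMPTY;
        // Illustrative entries; argument numbers other than -1 are assumptions.
        var src = new SourceDescriptor("HttpServletRequest.getParameter(String)", -1, eps);
        var sink = new SinkDescriptor("Statement.execute(String)", 1, eps);
        // concat derives its result from both the parameter and the receiver.
        var concatArg = new DerivationDescriptor("String.concat(String)", 1, eps, -1, eps);
        var concatRecv = new DerivationDescriptor("String.concat(String)", 0, eps, -1, eps);
        List.of(src, sink, concatArg, concatRecv).forEach(System.out::println);
    }
}
```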
In the absence of derived objects, to detect potential vulnerabilities we only need to know whether a source object is used at a sink. Derivation descriptors are introduced to handle the semantics of strings in Java. Because Strings are immutable Java objects, string manipulation routines such as concatenation create brand new String objects, whose contents are based on the original String objects. Derivation descriptors are used to specify the behavior of string manipulation routines, so that taint can be explicitly passed among the String objects.
Most Java programs use built-in String libraries and can share the same set of derivation descriptors as a result. However, some Web applications use multiple String encodings such as Unicode, UTF-8, and URL encoding. If encoding and decoding routines propagate taint and are implemented using native method calls or character-level string manipulation, they also need to be specified as derivation descriptors. Sanitization routines that validate input are often implemented using character-level string manipulation. Since taint does not propagate through such routines, they should not be included in the list of derivation descriptors.
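As a hypothetical illustration of a character-level sanitizer, the routine below rebuilds its result one char at a time. Because data reaches the output only through primitive char values, an analysis that tracks String objects sees no derivation from input to output, which is consistent with leaving such routines out of the derivation descriptors. The class Sanitizer, the method digitsOnly, and the digits-only policy are assumptions chosen for the example.

```java
// Hypothetical sanitizer: keeps only ASCII digits, e.g. for an account number.
final class Sanitizer {
    static String digitsOnly(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);   // data leaves the String as a primitive char
            if (c >= '0' && c <= '9') {
                out.append(c);          // only whitelisted characters are copied
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("341948; DROP TABLE Users")); // 341948
    }
}
```

Conversely, an encoding routine written in the same character-by-character style does carry attacker data into its result, which is why the text requires such routines to be listed explicitly as derivation descriptors.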
It is possible to obviate the need for manual specification with a static analysis that determines the relationship between strings passed into and returned by low-level string manipulation routines. However, such an analysis must be performed not just on the Java bytecode but on all the relevant native methods as well.
Example 3. We can formulate the problem of detecting parameter tampering attacks that result in a SQL injection as follows: the source descriptor for obtaining parameters from an HTTP request is:
Due to space limitations, we show only a few descriptors here; more information about the descriptors in our experiments is available in our technical report [34]. □
Below we formally define a security violation:
Definition 3.1 A source object for a source descriptor $\langle m, n, p \rangle$ is an object obtained by applying access path p to argument n of a call to m.
Definition 3.2 A sink object for a sink descriptor $\langle m, n, p \rangle$ is an object obtained by applying access path p to argument n of a call to method m.
Definition 3.3 Object $o_2$ is derived from object $o_1$, written $\mathit{derived}(o_1, o_2)$, based on a derivation descriptor $\langle m, n_s, p_s, n_d, p_d \rangle$, if $o_1$ is obtained by applying $p_s$ to argument $n_s$ and $o_2$ is obtained by applying $p_d$ to argument $n_d$ at a call to method m.
Definition 3.4 An object is tainted if it is obtained by applying relation derived to a source object zero or more times.
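Definition 3.4 describes a reachability closure. Assuming the derived relation is already available as a finite map between abstract objects, the tainted set can be computed with a worklist, as in this hypothetical sketch (TaintClosure, tainted). Computing derived itself precisely is the job of the pointer analysis and is not shown; objects are represented here simply by names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: `derived` maps o1 to every o2 with derived(o1, o2).
final class TaintClosure {
    static Set<String> tainted(Set<String> sources, Map<String, Set<String>> derived) {
        Set<String> result = new HashSet<>(sources);       // zero applications
        Deque<String> worklist = new ArrayDeque<>(sources);
        while (!worklist.isEmpty()) {
            String o = worklist.pop();
            for (String d : derived.getOrDefault(o, Set.of())) {
                if (result.add(d)) {                       // first time d is reached
                    worklist.push(d);
                }
            }
        }
        return result; // terminates on cycles because each object is added once
    }

    public static void main(String[] args) {
        // Loosely modeled on Example 1: parameter -> string buffer -> query.
        Map<String, Set<String>> derived = Map.of(
            "userName", Set.of("buffer"),
            "buffer", Set.of("query"));
        System.out.println(tainted(Set.of("userName"), derived));
    }
}
```

A sink object that appears in the returned set is exactly what the next definition calls a security violation.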
Definition 3.5 A security violation occurs if a sink object is tainted. A security violation consists of a sequence of objects $o_1, \ldots, o_k$ such that $o_1$ is a source object, $o_k$ is a sink object, and each object is derived from the previous one:
$$\forall i,\; 1 \le i < k:\; \mathit{derived}(o_i, o_{i+1}).$$
We refer to object pair $\langle o_1, o_k \rangle$ as a source-sink pair.
3.2 Specifications Completeness
The problem of obtaining a complete specification for a tainted object propagation problem is an important one. If a specification is incomplete, important errors will be missed even if we use a sound analysis that finds all vulnerabilities matching a specification. To come up with a list of source and sink descriptors for vulnerabilities in our experiments, we used the documentation of the relevant J2EE APIs.
Since it is relatively easy to miss relevant descriptors in the specification, we used several techniques to make our problem specification more complete. For example, to find some of the missing source methods, we instrumented the applications to find places where application code is called by the application server.
We also used a static analysis to identify tainted objects that have no other objects derived from them, and examined the methods into which these objects are passed. In our experience, some of these methods turned out to be obscure derivation and sink methods missing from our initial specification, which we subsequently added.
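The first step of that check, finding tainted objects from which nothing else is derived, can be sketched on top of a precomputed tainted set and derived relation. This is a hypothetical illustration (SpecGaps, taintedLeaves); inspecting the methods that receive each reported object remains the manual review step described above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for the specification-completeness check.
final class SpecGaps {
    // Tainted objects with no derived successors. Each is a candidate for
    // review: the methods it is passed to may be unlisted derivation or sink methods.
    static Set<String> taintedLeaves(Set<String> tainted, Map<String, Set<String>> derived) {
        Set<String> leaves = new TreeSet<>(); // sorted for stable review output
        for (String o : tainted) {
            if (derived.getOrDefault(o, Set.of()).isEmpty()) {
                leaves.add(o);
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        Set<String> tainted = Set.of("userName", "buffer", "query", "header");
        Map<String, Set<String>> derived = Map.of(
            "userName", Set.of("buffer"),
            "buffer", Set.of("query"));
        System.out.println(taintedLeaves(tainted, derived)); // [header, query]
    }
}
```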