Preci Identification of Composition Relationships for UML Class Diagrams
Ana Milanova
Department of Computer Science
Renslaer Polytechnic Institute
gvmt
milanova@cs.rpi.edu
Abstract
Knowing which associations are compositions is impor-tant in a tool for the rever engineering of UML class dia-grams.Firstly,recovery of composition relationship bridges the gap between design and code.Secondly,since composi-tion relationships explicitly state a requirement that certain reprentation cannot be expod,it is important to deter-mine if this requirement is met by component code.Verify-ing that compositions are implemented properly may pre-vent rious programflaws due to reprentation exposure.
We propo an implementation-level composition model bad on ownership and a novel approach f
or identifying compositions in Java software.Our approach us static ownership inference bad on points-to analysis and is de-signed to work on incomplete programs.We prent empir-ical results on veral components.In our experiments,on average40%of the examinedfields account for relation-ships that are identified as compositions.We also prent a precision evaluation which shows that our analysis achieves almost perfect precision—that is,it almost never miss composition relationships.The results indicate that preci identification of interclass relationships can be done with a simple and inexpensive analysis,and thus can be easily in-corporated in rever engineering tools that support itera-tive model-driven development.
1.Introduction
In modern software development design recovery through rever engineering is performed often;in a typ-ical iterative development process rever engineering is performed at the beginning of every iteration to re-cover the design from the previous iteration[11].
UML class diagrams describe the architecture of the pro-gram in terms of class and interclass relationships;they are scalable,informative and widely-ud design models. While the UML concepts of class and inheritance have cor-respondingfirst-class concepts in object-oriented program-ming lan
guages,the UML concepts of association,aggre-gation and composition do not have corresponding language concepts.Thus,while the rever engineering of class and inheritance hierarchies is straightforward,the rever engi-neering of associations prents various challenges.
UML associations model relatively permanent interclass relationships;conventionally,they are implemented using instancefields of reference type[11](e.g.,an association from class A to class B is implemented using a reference field of type B in class A).Thus,rever engineering tools infer associations by examining instancefields of reference type;however,the inference is often non-trivial.One chal-lenge is the recovery of one-to-many associations imple-mented using pudo-generic ,Vector). Another challenge is the recovery of compositions.Mod-ern rever engineering tools such as Rational R OSE and Ar-goUML do not address the challenges and produce incon-sistent and even incorrect class diagrams(e Gu´e h´e neuc and Albin-Amiot[9]for detailed examples).Clearly,this leads to a gap between design class diagrams and rever engineered class diagrams which hinders understanding, round-trip engineering and identification of design patterns.
Towards the goal of bridging this gap,this paper pro-pos a methodology for inference of binary associations for UML class diagrams.Its major emphasis and contribu-tion is the inference of compo
sition relationships,which we believe is challenging and inadequately addresd in previous work.While the UML concept of aggregation is ”largely meaningless”[6],the UML concept of composition has a well-defined mantics that emphasizes the notion of ownership:a”composition is a strong form of[whole-part] association with strong ownership of parts by the composite and coincident lifetime of parts with the composite.A part may belong to only one component at a time”[18,Chapter 14].Therefore,a composition relationship at design level states the requirement for ownership and no reprenta-tion exposure at implementation ,the owned com-ponent object cannot be expod outside of its compos-ite owner object);if composition is implemented properly ownership should be prerved.
It is important to investigate techniques for recovery of
composition relationships.Firstly,it helps bridge the gap between the design class diagram and the rever engi-neered diagram.Secondly,since composition relationships explicitly state a requirement that certain reprentation cannot be expod,it is important to determine if this re-quirement is met by component code.Verifying that compo-sitions are implemented properly may prevent rious pro-gramflaws due to reprentation exposure such as the well-known signers bug in Java1.1.1
Thus,the goals of this work are(i)to define an implementation-level ownership model that captures the notion of composition in design and(ii)to de-sign an analysis algorithm that infers ownership and composition in accordance with this model.Our defi-nition of implementation-level composition is bad on the owners-as-dominators ownership model[4,13];in this model the owner object(the composite)should dom-inate an owned object(a component)—that is,all ac-cess paths to the owned object should pass through its owner.The owners-as-dominators model defines an own-ership boundary for each owner;intuitively,an owned object may be accesd by its owner as well as other ob-jects within the boundary of the owner.For example,an owned object stored in an instancefield may be pasd to an owned container.As pointed out by Clarke et.al[4,13] and also obrved by us during the empirical investiga-tion,the owners-as-dominators model captures well the notion of composition in modeling.
We propo a novel static analysis for ownership infer-ence.If the ownership inference determines that all objects stored in afield are owned by their enclosing object,the analysis identifies a composition through thatfield.Our ap-proach works on incomplete programs.This is an important feature becau in the context of rever engineering tools it is esntial to be able to perform parate analysis of soft-ware components.For example,it is typical to have to an-alyze a component
without having access to the clients of that component.Our ownership inference analysis is bad on points-to analysis,which determines the t of objects a reference variable or a reference objectfield may point to. We u the points-to analysis solution to approximate the possible access between run-time objects.
We prent empirical results on veral components.In our experiments,on average40%of the examinedfields ac-count for relationships that are identified as compositions. We also prent a precision evaluation which shows that our analysis achieves almost perfect precision—that is,it almost never miss composition relationships identified in our model.The results indicate that preci identification of interclass relationships can be done with a simple and inex-1In Java1.1the curity system Signers re-turned a pointer to an internal array allowing clients to modify the ar-ray and compromising the curity of the system.pensive analysis,and thus can be easily incorporated in re-ver engineering tools that support iterative model-driven development.
This work has the following contributions:
•We propo an implementation-level ownership and composition model that captures well the notion of composition in modeling.
•We propo a static analysis for identifying composi-tion relationships in accordance with our model;the analysis works on incomplete programs.
•We prent an empirical study that evaluates our anal-ysis on veral Java components.
2.Problem Statement
Rever engineering tools typically infer associations by examining instancefields of reference type in the code.In our model,an association relationship through afield f is refined as composition if it can be proven that all objects referred by f are owned by their enclosing object.Thus, given a suitable definition of implementation-level owner-ship and composition,our goal is to design a static analy-sis that answers the question:given a t of Java class(i.e, a component to be analyzed)for what instancefields we ob-rve implementation-level composition throughout all pos-sible executions of arbitrary client code built on top of the class?The analysis output is a t offields for which the relationship is guaranteed to be a composition for arbitrary client code.
Clearly,there are various us of this information when integrated in a tool for the rever engineering of UML class diagrams.Firstly,recovery of compositions bridges the gap between design and implementation and eas the understanding of underlying design.Secondly,verifying that comp
ositions are implemented properly and refactor-ing early if needed may help prevent rious programflaws such as the signers bug in Java1.1.Ownership of rep-rentation is a desirable property and information about it can be included automatically in the component documen-tation.
The input to the analysis contains a t Cls of interact-ing Java class.We will u”class”to denote both Java class and interfaces becau for the purpos of this work the difference is not relevant.A subt of Cls is designated as the t of accessible class;the are class that may be accesd by unknown client code from outside of Cls. Such client code can only accessfields and methods from Cls that are declared in some accessible class;the acces-siblefields and methods are referred to as boundaryfields and boundary methods.
Section2.1describes the ownership model,and Sec-tion2.2describes the notion of implementation-level com-
public class Vector {
protected Object[]data;public Vector(int size){1data =new Object[size];}
public void addElement(Object e,int at){2data[at]=e;}
en wikipedia org
public Object elementAt(int at){3return data[at];}
public Enumeration elements(){4return new VIterator(this);}}
final class VIterator implements Enumeration {Vector vector;int count;
VIterator(Vector v){5this.vector =unt =0;}Object nextElement(){
7Object[]data =vector.data;8int i =unt++;
10return data[i];}}
main(){
11Vector v =new Vector(100);12X x =new X();
13v.addElement(x,0);
merry是什么意思中文
14Enumeration e =v.elements();15x =(X) e.nextElement();16x.m();}
Figure 1.Simplified vector and its iterator.
(a) Original Object Graph
Figure 2.Object graphs for Figure 1.
position bad on it.Section 2.3discuss certain con-straints to the model that allow more preci identification of ownership and composition.
2.1.Ownership Model
The ownership model is bad on the notion of owners-as-dominators [4,3,13].It is esntially the model pro-pod by Potter et al.[13]with veral modifications that allow more preci handling of popular object-oriented pat-terns and idioms such as iterators,composites and facto-ries [7].In this model,each execution is reprented by an
object graph which describes access relationships between run-time objects:
stomach•Let f be a reference instance field in a run-time object o .There is an edge o f
→o in the object graph iff field f in o refers to o at some point of program execution.2•There is an edge o []→o iff some element of array o refers to o at some point of program execution.•There is an edge o →o iff an instance method or con-structor invoked on receiver o has local variable r that refers to o ,or a static method called from an instance method or constructor invoked on o ,has a local vari-able r that refers to o .There is an edge of this kind only if there is no edge of the first kind from o to o .A run-time object o is accesd in the context of o iff there is an edge from o to o in the object graph.The start of program execution is expresd with a special node root .Context root reprents the context for main and for ob-jects referenced by static fields.For example,executing main in Figure 1results in the object graph in Figure 2(a).Node Vector corresponds to the object created at the new site at line 11,node Object[]corresponds to the array created at the site at line 1,node VIterator corresponds to the iterator created at the site at line 4,and node X corre-sponds to the object created at the site at line 12.randy
The owners-as-dominators model states that the owner of an object o is the immediate dominator of
o in the ob-ject graph [13].3Thus,according to this model Object[]is not owned by its enclosing Vector object for this ex-ecution due to the access relationship (although only tem-porary)between VIterator and Object[].To make the model less restrictive,we introduce the relaxed ob-ject graph which omits edges due to certain temporary access relationships.We consider two kinds of temporary access relationships.The first kind aris when an object is created in one context and immediately pasd to an-other context;the relationship between the creating object and the new object is only temporary but if shown on the graph it is likely to restrict ownership.This notion cap-tures the situations when an object is created and immedi-ately returned (e.g.,as in return new VIterator (this );in method elements in Figure 1)and when an object is cre-ated and immediately pasd to another context (e.g.,as in new BufferedReader (new FileReader (fileName ))).This situation occurs in popular object-oriented design patterns
2
We require that all newly created objects appear in the object graph ex-plicitly.Analogously to [4],this is done by requiring that at the point of creation a new object is stored in a new local variable;clearly,this transformation does not change program mantics.
3
Node m dominates node n if every path from the root of the graph that reaches node n has to pass through node m .The root dominates all nodes.Node m immediately dominates node n if m dominates n and there is no node p such that m dominates p and p dominates n .
such as factories,decorators and composites;in the cas the temporary relationship between the creating object and the newly created one is a matter of safety andflexibility of the implementation rather than an intention of the design. The cond kind of temporary access relationships aris fromfield read statements r=l.f,where r is not assigned, pasd as an implicit or explicit argument,or returned.This notion captures the situation that aris in iterators(consider statement data=vector.data in nextElement in Fig-ure1)—iterator objects have temporary references to the reprentation of their collections,which allows efficient access of collection elements;however,the collection ob-ject is always in scope.Therefore,if all access of o in the context of o are due to such temporary access relation-ships,edge o→o is not shown in the relaxed object graph.
The relaxed object graph for the execution of the code in Figure1is shown in Figure2(b).Note that edge Vector→VIterator is omitted becau it is due to a temporary access relationship of thefirst kind; edge VIterator→Object[]is omitted as well be-cau it is due to a temporary access relationship of the cond kind.The owner of o is the immediate dom-inator of o in the relaxed object graph.Thus,
root owns X,Vector and Viterator and Vector owns Object[].
2.2.Implementation-level Composition
Let A be a class in Cls,and f be afield of type B declared in A where B is a reference type(class,inter-face or array type[8]).The ownership property holds for f if throughout all possible executions of arbitrary clients of Cls,every instance of A owns the instances of B that its ffield refers to.Consider the ca when f is a collec-tionfield—that is,all objects stored in thefield are arrays or instances of one of the standard java.util collection ,java.util.Vector).If every instance of A owns all corresponding instances stored in the collection, there is a one-to-many composition relationship between A and C,where C is the lowest common supertype of the in-stances stored in the collection;otherwi,there is a one-to-many regular association(it may also be referred as aggre-gation).For collectionfields for which the ownership prop-erty holds,there is an attribute of the association{owned collection}which indicates that the collection is owned by its enclosing object.Consider the ca when f is not a col-lectionfield.If the ownership property holds for f,the asso-ciation between A and B is a one-to-one composition;oth-erwi it is a regular one-to-one association.
Note that in Java,due to interfaces,it is not always possi-ble tofind unique ,not Object)least com-mon supertype of two types.Thus,we u a variant of the Java type system in which any two types have a unique lowest-common supertype,ud elwhere as well[20,5]. In this system types include a powert of the Java types in the class being analyzed.The mapping from a Java type, T is the smallest t that includes T and all its supertypes in Java.Since the elements in this system are ts,the low-est common supertype is the interction of two ts.铆钉英文
Consider the following example taken from[20].Sup-po that the least common supertype of two class C1 and C2,both of which implement interfaces I1and I2is needed.Since there are two common immediate supertypes of C1and C2,there is no unique supertype.The mapping of C1in the new type system is{C1,I1,I2,Object};simi-larly,the mapping of C2is{C2,I1,I2,Object}.The least common supertype in the types is simply the interction of the two ts:{I1,I2,Object}.When mapping the new types back to Java types it is not guaranteed that a Java type will exist.Thus,an analysis can either choo an alternative real type such as Object,show one-to-many associations with all types in the t,or ask the ur for help.Our anal-ysis,described in Section4,would display Object in this ca;assuming that the analysis is relatively preci,hav-ing Object as the least common supertype would indicate a potentialflaw and trigger a careful review of code and de-sign.
Example.Consider the package in Figure3.This ex-ample is bad on class from the standard Java li-brary package java.util.zip,with some modifi-cations made to simplify the prentation and better il-lustrate the problem and our approach.Cls contains the class from Figure3plus class ZipEntry.The accessi-ble class are ZipInputStream,ZipOutputStream and ZipEntry and the boundary methods are all pub-lic methods declared in tho ,the component can be accesd from client code through the public meth-ods declared in the class).
Clearly,the CRC32objects are always owned by their enclosing streams.Therefore,their is one-to-one composition relationships throughfields crc in both ZipInputStream and ZipOutputStream.There is a regular one-to-one association throughfield entry in ZipInputStream;it is easy to construct client code on top of the class such that the ZipEntry in-stances created in ZipInputStream objects are leaked to client code from getNextEntry.Similarly,there is a regular one-to-one association through entry in ZipOutputStream becau the ZipEntry objects are pasd from client code to putNextEntry.The associa-tions throughfields names and entries are both one-to-many regular associations between ZipOutputStream and ZipEntry;both have attribute{owned collec-tion}.Clearly,the ZipOutputStream instance trivially owns the Hashtable instance.It owns the Vector in-stance as well,although the Vector instance is referred
绿色的英语
package zip;
public class InflaterInputStream{
protected Inflater inf;
protected byte[]buf;
public InflaterInputStream(Inflater inf,
int size){
this.inf=inf;
buf=new byte[size];}
public InflaterInputStream(Inflater inf){ this(inf,512);}
//methods read andfill contain instance calls on inf}
public class ZipInputStream extends InflaterInputStream{
private ZipEntry entry;
private CRC32crc=new CRC32();
public ZipInputStream(){
一带一路英文
super(new Inflater(true),512);} public ZipEntry getNextEntry(){
<();
foreskin
<();
if((entry=readLOC())==null)return null;
return entry;}
private ZipEntry readLOC(){
ZipEntry e=new ZipEntry();
//code reads and writesfields of e
return e;}}
public class ZipOutputStream extends DeflaterOutputStream{
private ZipEntry entry;
private Vector entries=new Vector();
private Hashtable names=new Hashtable();
private CRC32crc=new CRC32();
public ZipOutputStream(){
super(new Deflater(...));}
public void putNextEntry(ZipEntry e){ //code reads and writesfields of e
boltedif(names.put(e.name,e)!=null){...}
entries.addElement(e);
entry=e;}
public void cloEntry(){
ZipEntry e=entry;
//code reads and writesfields of e
<();
entry=null;}
public void finish(){
Enumeration enum=entries.elements();
while(enum.hasMoreElements()){...}}} Figure3.Sample package zip.
to in the context of its iterator(recall the example in Fig-ure1);however,the iterator is a local object owned by the enclosing ZipOutputStream object which en-sures that the Vector instance is dominated by the en-closing ZipOutputStream and may be accesd only within its ownership bound
ary.
2.3.Discussion
In order to allow more preci identification of implementation-level composition,we employ the follow-ing constraint,standard for other problem definitions that require analysis of incomplete programs[17,15].We only consider executions in which the invocation of a bound-ary method does not leave Cls—that is,all of its transitive callees are also in Cls.In particular,if we consider the pos-sibility of unknown subclass,all instance calls from Cls could potentially be”redirected”to unknown exter-nal code that may affect the composition inference.For ex-ample,afield may be identified as composition in the current t of class but an unknown subclass may over-ride some method and the overriding method may leak the fi,by assigning it to a staticfield).
Thus,Cls is augmented to include the class that pro-vide component functionality as well as all other class transitively referenced.In the experiments prented in Sec-tion5we included all class that were transitively ref-erenced by Cls.This approach restricts analysis informa-tion to the currently”known world”—that is,the informa-tion may be invalidated in the future when new subclass are added to Cls.Another approach is to change the analy-sis to make worst ca assum
ptions for calls that may enter some unknown overriding methods.However,in this ca, the analysis will be overly conrvative and likely report fewer compositions.Thus,we believe that it is more u-ful to restrict the analysis to the known world;of cour, the analysis ur must be aware that the information is valid for the given t of known class.
3.Points-to Analysis
Points-to analysis determines the t of objects that a given reference variable or a referencefield may point to. This information has a wide variety of us in software tools and optimizing compilers.In this paper,points-to informa-tion is ud for ownership inference.It is needed to con-struct a graph that approximates all possible object graphs that can happen when arbitrary client code is built on top of Cls.There is a large body of work on points-to analysis with different trade-offs between cost and precision.In this paper we consider ownership inference bad on the well-known Andern-styleflow-and context-innsitive points-to anal-ysis for Java from[16].4
3.1.Points-to Analysis for Java
The points-to analysis is defined in terms of three ts. Set R is the t of locals,formals and staticfields of ref-erence type.Set O is the t of object names;the objects created at an allocation sit
e s i are reprented by object name o i∈O.Set F contains all instancefields in program 4Flow-innsitive analys do not take into account theflow of con-trol between program points and are less preci and less expensive thanflow-nsitive analys.Context-nsitive analys distinguish between different calling contexts of a method and are more preci and more expensive than context-innsitive ones.
class.The analysis solution is a points-to graph where the edges reprent the following”may-refer-to”relationships:•Let r∈R and o∈O.An edge r→o in the points-to graph means that at run time r may refer to some object that is reprented by o.
•Let f∈F be a reference instancefield in objects rep-rented by some o∈O.An edge o f→o2means that at run timefield f of some object reprented by o may refer to some object reprented by o2.
•If o reprents array objects,o[]→o2shows that some element of some array reprented by o may refer at run time to an object reprented by o2.
The Andern-style points-to analysis for Java from[16] is a relatively preciflow-and context-innsitive inclusion-bad analysis.It propagates may-refer-to rela-tionships by analyzing program statements.For example, when it analyzes statement”p=q”it infers that p may re-fer to any object that
q may refer to.
3.2.Fragment Points-to Analysis
Points-to analys and Andern’s analysis in particular are typically designed as whole-program analys;they take as input a complete program and produce points-to graphs that reflect relationships in the entire program.However,the problem considered in this paper requires points-to analy-sis of a partial program.The input is a t of class Cls and the analysis needs to construct an approximate object graph that is valid across all possible executions of arbi-trary client code built on top of Cls.To address this prob-lem we make u of a general technique called fragment analysis due to Nasko Rountev[14,17,15].Fragment anal-ysis works on a program fragment rather than on a complete program;in our ca the fragment is the t of class Cls.
Initially,the fragment analysis produces an arti-ficial main method that rves as a placeholder for client code written on top of Cls.Intuitively,the artifi-cial main simulates the possibleflow of objects between Cls and the client code.Subquently,the fragment anal-ysis attaches main to Cls and us some whole-program analysis engine to compute a points-to graph which summa-rizes the possible effects of arbitrary client code.The frag-ment analysis approach can be ud with a wide variety of p
oints-to and class analys;for the purpos of this pa-per we only consider fragment analysis ud with the Andern-style points-to analysis from[16].
The placeholder main method for the class from Fig-ure3is shown in Figure4.The method contains variables for types from Cls that can be accesd by client code.The statements reprent different possible interactions involv-ing Cls;their order is not relevant becau the subquent void main(){
ZipEntry ph ZE;
ZipInputStream ph ZIS;
ZipOutputStream ph ZOS;
ph ZE=new ZipEntry();
ph ZIS=new ZipInputStream();
ph ZOS=new ZipOutputStream();
ph ZE.tCRC(0);
ph ZE=NextEntry();
ph ZOS.putNextEntry(ph ZE);
ph ZOS.cloEntry();
ph ZOS.finish();}
Figure4.Placeholder main method for zip.
whole-program analysis isflow-innsitive.Method main invokes all public methods from the class in Cls desig-nated as accessible.
The details of the fragment analysis will not be discusd here;they can be found in[17].For the purpos of our anal-ysis we discuss the object reachability[15]property of the results computed by the fragment analysis;this property is relevant for the analysis described in Section4.Consider some client program built on top of Cls and an execution of this program(the program must satisfy the constraints dis-cusd in Section2.3).Let r∈R be a variable declared in Cls and at some point during execution r is the start of a chain of object references that leads to some heap ob-ject.In the fragment analysis solution,there will be a chain of points-to edges that starts at r and leads to some
object name o that reprents the run-time object.A similar prop-erty holds if r is declared outside of Cls.In this ca,in the fragment analysis solution,the starting point of the chain is the variable from main that has the same type as r.
We illustrate this property for our points-to analy-sis.Consider the example from Figures3and4.There are three allocation sites in the main method;they are denoted by names ZE1,ZIS1and ZOS1.Name byte[]corresponds to the allocation site in class InflaterInputStream.There are three alloca-tion sites in class ZipInputStream;they are denoted by names CRC1,Inflater1and ZE2.There are four al-location sites in class ZipOutputStream;they are de-noted by Vector1,Hashtable1,Deflater1and CRC2.In addition,we consider the allocation sites in Vector(recall Figure1),which are transitively reach-able;they are denoted by Object[]and VIter1. The points-to graph computed by Andern’s analy-sis from the code in Figures4,3and1is shown in Figure5. Heap object names are underlined and reference vari-able names are prefixed by the name of their declaring method.For simplicity,implicit parameters this and ob-ject names Inflater1,byte[],Hashtable1and Deflater1are not shown.