dloungeSkyline with Presorting:Theory and Optimizations
Jan Chomicki1,Parke Godfrey2,Jarek Gryz2,and Dongming Liang2
1University at Buffalo,USA
2York University,Canada
Abstract.There has been interest recently in skyline queries,also called Pareto queries,on relational databas.Relational query languages do not support arch for“best”tuples,beyond the order by statement.The propod skyline operator allows one to query for best tuples with respect to any number of attributes as preferences.In this work,we explore what the skyline means,and why skyline queries are uful,particularly for expressing preference.We describe the theoretical aspects and possible optimizations of an efficiant algorithm for computing skyline queries prented in[6].
1Introduction and Motivation
Often one would like to query a relational databa in arch of a“best”match,or tuples that best match one’s preferences.Relational query lan-guages provide only limited support for this:the min and max aggregation operators,which act over a single column;and the ability to order tuples with re
exciting和excited>粉扑怎么用spect to their attribute values.In SQL,this is done with the order by clau.This is sufficient when one’s preference is synonymous with the val-ues of one of the attributes,but is far from sufficient when one’s preferences are more complex,involving more of the attributes.
Consider a table of restaurant guide information,as in Figure1.Column S stands for rvice,F for food,and D for decor.Each is scored from1to 30,with30as the best.We are interested in choosing a restaurant from the guide,and we are looking for a best choice,or best choices from which to choo.Ideally,we would like the choice to be the best for rvice,food,and decor,and be the lowest priced.However,there is no restaurant that is better than all others on every criterion individually,as is usually the ca in real life,and in real data.No one restaurant“trumps”all others.For instance, Summer Moon is best on food,but Zakopane is best on rvice.
While there is no one best restaurant with respect to our criteria,we want at least to eliminate from consideration tho restaurants which are wor on all criteria than some other.Thus,the Briar Patch BBQ should be eliminated becau the Fenton&Pickle is better on all our criteria and is thus a better choice.The Brearton Grill is in turn eliminated becau Zakopane is better than it on all criteria.Meanwhile the Fenton&Pickle is wor on every criterion than every other(remaining)restaurant,except on price,where it
restaurant S F D price
Summer Moon21251947.50
Zakopane24202156.00
Yamanote22221751.50
Fenton&Pickle16141017.50
Fig.2.Restaurants in the skyline.
In[3],a new relational operator is propod which they name the sky-line operator.They propo an extension to SQL with a skyline of clau as counterpart to this operator that would allow the easy expression of the restaurant query we imagined above.In[9]and elwhere,this is called the Pareto operator.Indeed,the notion of Pareto optimality with respect to mul-tiple parameters is equivalent to that of choosing the non-dominated tuples, designated as the skyline.
<
[min|max|diff]
skyline of a[1][min|max|diff],...,a[n]
收到英文Fig.3.A propod skyline operator for SQL.
The skyline of clau is shown in Figure3.Syntactically,it is similar to an order by clau.The columns a1,...,a n are the attributes that our preferences range over.They must be of domains that have a natural total ordering,as integers,floats,and dates.The directives min and max specify whether we prefer low or high values,respectively.The directive diff says that we are interested in retaining best choices with respect to every distinct value of that attribute.Let max be the default directive if none is stated.The
skyline query in Figure4over the table GoodEats in Figure1express what we had in mind above for choosing“best”restaurants,and would result in the answer t in Figure2.
lect*from GoodEats
skyline of S max,F max,D max,price min
Fig.4.Skyline query to choo restaurants.
Skyline queries are not outside the expressive power of current SQL.The query in Figure5shows how we can write an arbitrary skyline query in prent SQL.The c i’s are attributes of OurTable that we are interested to retain in our query,but are not skyline criteria.The s i are the attributes that are our skyline criteria to be maximized,and would appear in skyline of as s i max.(Without loss of generality,let us only consider max and not min.) The d i are the attributes that are the skyline criteria to differ,and would appear in skyline of as d i diff.
lect c1,...,c k,s1,...,s m,d1,...,d n
from OurTable
except
lect D.c1,...,D.c k,D.s1,...,D.s m,D.d1,...,D.d n
from OurTable T,OurTable D
where D.s1≤D.s m≤T.s m and
限制对朝石油运输(D.s1<D.s m<T.s m)and
D.d1=D.d n=T.d n
Fig.5.SQL for generating the skyline t.
Certainly it would be cumbersome to need to write skyline-like queries in this way.The skyline clau is a uful syntactic addition.More important than ea of expression,however,is the expen of evaluation.The query in Figure5can be quite expensive.It involves a lf-join over a table,and this join is aθ-join,not an equality-join.The lf-join effectively computes the tuples that are trumped—or dominated—by other tuples.The tuples that remain,that were never trumped,are then the skyline tuples.It is known that the size of the skyline tends to be small,with certain provisos,with respect to the size of the table[7].Thus,the intermediate result-t before the except can be enormous.
No current query optimizer would be able to do much with the query in Figure5to improve performance.If we want to support skyline queries, it is necessary to develop an efficient algorithm for computing skyline.And if we want the skyline operator as part of SQL,this algorithm must be
easy to integrate in relational query plans,be well-behaved in a relational context,work in all cas(without special provisions in place),and be easily accommodated by the query optimizer.
2015考研政治
Recent years have brought new interest in expressing preference queries in the context of relational databas and the World Wide Web.Two competing approaches have emerged so far.In thefirst approach[1,8],preferences are expresd by means of preference(utility)functions.The cond approach us logical formulas[4,9]and,in particular,the skyline operator[3,12], described in the previous ction.Skyline computation is similar to the max-imal vector problem studied in[2,10,11].The consider algorithmic solutions to the problem and address the issue of skyline size.None of the works address the problem in a databa context,however.In[7],we address the question of skyline query cardinality more concretely.
In this paper,we explore what the skyline means,and why skyline queries are uful,particularly for expressing preference.We describe a well-behaved, efficient algorithm for computing skyline queries.Our algorithm improves on exisiting approaches in efficiency,pipelinabilty of output(of the skyline tu-ples),stability of run-time performance,and being applicable in any context. 2Skyline versus Ranking
The skyline of a relation in esnce reprents the best tuples of the relation, the Pareto optimal“solutions”,with respect to the skyline criteria.Another way tofind“best”tuples is to score each tuple with respect to one’s prefer-ences,and then choo tho tuples with the best score(ranking).T
he latter could be done efficiently in a relational tting.In one table scan,one can score the tuples and collect the best scoring tuples.
How is skyline related to ranking then?It is known that the skyline repre-nts the closure over the maximum scoring tuples of the relation with respect to all monotone scoring functions.For example,in choosing a restaurant as in the example in Section1,say that one values rvice quality twice as much as food quality,and food quality twice as much as decor,tho restaurants that are best with respect to this“weighting”will appear in the skyline.Fur-thermore,the skyline is the least-upper-bound closure over the maximums of the monotone scoring functions[3].
This means that the skyline can be ud instead of ranking,or it can be ud in conjunction with ranking.First,since the best tuples with respect to any(monotone)scoring are in the skyline,one only needs effectively to query the skyline with one’s preference queries,and not the original table itlf. The skyline is(usually)significantly smaller than the table itlf[7],so this would be much more efficient if one had many preference queries to try over the same datat.Second,as defining one’s preferences in a preference query can be quite difficult,while expressing a skyline query is relatively easy,urs mayfind skyline queries beneficial.The skyline over-answers with respect to
the urs’intent in a way,since it includes the best tuples with respect to any preferences.So there will be some choices (tuples)among the skyline that are not of interest to the ur.However,every best choice with respect to the ur’s implicit preferences shows up too.
While in [3],they obrve this relation of skyline with monotone scoring functions,they did not offer proof nor did they discuss linear scoring func-tions ,to which much work restricts focus.Let us investigate this more cloly,and more formally,then,for the following reasons:
•to relate skyline to preference queries,and to illustrate that expressing preferences by scoring is more difficult than one might initially expect;•to rectify some common misconceptions regarding scoring for the pur-pos of preference queries,and regarding the claim for skyline;and •to demonstrate a uful property of monotone scoring that we can exploit for an efficient algorithm to compute the skyline.
stallmanLet attributes a 1,...,a k of schema R be the skyline criteria,without loss of generality,with respect to “max”.Let the domains of the a i ’s be real,without loss of generality.Let R be a relation of schema R,and so reprents a given instance.
Definition 1.Define a monotone scoring function S with respect to R as a function that takes as its inp
ut domain tuples of R,and maps them onto the range of reals.S is compod of k monotone increasing functions,f 1,...,f k .For any tuple t ∈R ,S (t )= k i =1f i (t [a i ]).
Lemma 1.Any tuple that has the best score over R with respect to any monotone scoring function S with respect to R must be in the skyline.1
It is more difficult to show that every tuple of the skyline is the best score of some monotone scoring.Most restrict attention to linear weightings when considering scoring,though,so let us consider this first.
Definition 2.Define a positive,linear scoring function ,W ,as any function over a table R ’s tuples of the form W (t )= k i =1w i t [a i ],in which the w i ’s are positive,real constants.
As we insist that the w i ’s are positive,the class of the positive,linear scoring functions is a proper sub-class of the monotone scoring functions.Commonly in preference query work,as in [8],the focus is restricted to linear scoring.It is not true,however,that every skyline tuple is the best with respect to some positive,linear scoring.
Theorem 1.It is possible for a skyline tuple to exist on R such that,for every positive,linear scoring function,the tuple does not have the maximum score with respect to the function over table R .
Consider R=((4,1),(2,2),(1,4)).All three tuples are in the skyline (skyline of a1,a2).Linear scorings that choo(4,1)and(1,4)are obvious, but there is no positive,linear scoring that scores(2,2)best.Note that(2,2) is an interesting choice.Tuples(4,1)and(1,4)reprent in a way outliers. They make the skyline becau each has an attribute with an extrema value. Whereas(2,2)reprents a balance between the attributes(and hence,pref-erences).For example,if we are conducting a hou hunt,a1may reprent the number of bathrooms,and a2,the number of bedrooms.Neither a hou with four bathrooms and one bedroom,nor one with one bathroom and four bedrooms,em very appealing,whereas a2bth/2bdrm hou might. Theorem2.The skyline contains all,and only,tuples yielding maximum values of monotone scoring functions.
While there exists a monotone scoring function that choos—assigns the highest score to—any given skyline tuple,it does not mean anyone would everfind this function.In particular,this is becau,in many cas,any such function is a contrivance bad upon that skyline’s values.The ur is arching for“best”tuples and has not en them yet.Thus,it is unlikely anyone would discover a tuple like(2,2)above with any preference query. Yet,the2bth/2bdrm hou might be exactly what we wanted.
For the algorithm for skyline computation we are to develop,we can exploit our obrvations on the
monotone scoring functions.Let us define the dominance relation,“ ”,as follows:for tuples any r,t∈R,r t iffr[a i]≤t[a i],for all i∈1,...,k.Further define that r≺t iffr t and r[a i]<t[a i],for some i∈1,...,k.
Theorem3.Any total order of the tuples of R with respect to any monotone scoring function(ordered from highest to lowest score)is a topological sort with respect to the skyline dominance partial relation(“ ”).
Consider the total ordering on R provided by the basic SQL order by as in the query in Figure6.This total order is a topological sort with respect to dominance.
留学中介比较好
lect*from R
speechless什么意思
order by a1desc,...,a k desc;
Fig.6.An order by query that produces a total monotone order.
The following proposition is fairly obvious and it is ud to build a better skyline algorithm.
Theorem4.Any nested sort of R over the skyline attributes(sorting in descending order on each),as in the query in Figure6,is a topological sort with respect to the dominance partial order.taylor swift好听的歌