I.J. Modern Education and Computer Science, 2018, 3, 38-46
Published Online March 2018 in MECS (/)
DOI: 10.5815/ijmecs.2018.03.05establishment
Specific Queries Optimization Using Jaya
Approach
Sahil Saharan
Department of Computer Applications, National Institute of Technology, Kurukshetra, India
Email: sahil.saharan_1376@nitkkr.ac.in
J.S. Lather共和党候选人
Department of Electrical Engineering, National Institute of Technology, Kurukshetra, Indiahanes
Email: jslather@nitkkr.ac.in
R. Radhakrishnan
Department of Computer Science and Engineering, ABES Ghaziabad (Uttar Pradesh), India
Email:
Received: 24 November 2017; Accepted: 15 December 2017; Published: 08 March 2018
Abstract—The Fast query engine is a requirement as a supporting tool for the mantic web technology application such as Electronic Commerce environ. As the large data is reprented using the effective data reprentation called RDF. The focus of this paper is to optimize the specific type of the query called Cyclic query and star query on main-memory RDF data model using ARQ query engine of Jena. For the considered problem, we ruminate a Jaya algorithm for rearrangement of the order of triple pattern and also compare the results with an already propod approach in the literature. The evaluation result shows that Jaya performs better in terms of execution time in comparison to Ant Colony Optimization.
英国首相布朗
Index Terms—Resource Description Framework (RDF), Query Optimization, Jaya, SPARQL, Reordering triple patterns, Semantic Web
I.I NTRODUCTION
With the increasing popularity of Semantic Web [1], overwhelmed data stored over the many heterogeneous, yet interconnected resources which are generally reprented using RDF reprentation. RDF(Resource Description Framework) [2] is a framework to describe and interchange meta-data, which empower machine-interpretable by providing contextual information of the meta-data of data. So this interconnected data instead of pages, fulfill the complex information required in an efficient manner than that of the current web. The technologies of Semantic Web explore different RDF sources to meet very specific need of information. To execute queries W3C’s provides a protocol called SPARQL [3, 4]. The fast query engine is vital requirement for SPARQL queries to fetch the results of the ur query from the widely distributed RDF data in real-time environments. Execute the query with lower execution time requires optimization of triple patterns in SPARQL query. An efficient optimization algorithm for triple patterns can, therefore, contribute to efficient execution time. A list of optimization techniques has been propod already as a solution to the current problem such as 2-Pha Optimization (2PO) algorithm [5], genetic algorith
m (GA) [6], Ant Colony Optimization (ACO) algorithm [7] etc.
The proposal of the current study has been encouraged by the optimization of query paths in traditional databas. For dynamic mantic web, which support the change in the data time to time and needs a better optimization strategy; Jaya [8] is an alternative to the current approaches and have already been applied to the different field like: constrained mechanical design optimization problems [9], optimization of machining performance characteristics during the turning of CFRP composites [10], optimum power flow (OPF) problems [11], optimization of traditional machining process named surface grinding [12], dimensional optimization of a micro-channel heat sink [13], optimization of combined economic emission dispatch solution [14] etc.
合肥雅思培训The behavior of the propod algorithm is such that it works in a continuous manner and allows accommodating to changes in the data in real time. Through this paper, we are introducing Jaya applicability to the RDF query optimization in specific query forms over a single data source.
The rest of paper is organized as follows. Section 2 introduced a literature review. Section 3 introduces the concepts for RDF and SPARQL queries. Additionally, Jena API and the ARQ engine are also introduced. The BGP construction from where clau predicates of SPARQL queries are illu
strated. Section 4 introduced Jaya optimization algorithm. Section 5 describes the problem. Section 6 describes the obrvations using experimental study of the propod algorithm and its comparison in terms of execution time, fitness function
value and solution quality. Finally, in later ctions the discussion which is followed by a conclusion.
II.R ELATED W ORK
Regarding RDF databas context, literature study of rearrangement of the order of triple patterns and other key factors for optimization of join ordering in RDF databas is discusd here.
Stuckenschmidt et al. [15] prented the first solution for the reordering triple pattern (by optimizing the order of path expression) using hybrid algorithm two-pha optimization (2PO) (a combination iterative improvement and simulated annealing) over RDF Cyclic query by extending the Sesame system [16]. They considered the issue of integrated access to distributed RDF repositories from a practical point and provide flexibility, freshness, and independence of data as advantages over the centralized approach. Shironoshita et al. [17] introduce an algorithm for cardinality estimation of queries over ontology models of data. The introduced algorithm is an important ction for building an efficient query engine for distributed and heterogeneous data sources. Maduko et al. [18] prent
ed a framework using pattern-bad summarization to estimate cardinality. The strategy propod follows 1) pattern and their sub-pattern can have almost equal frequencies and 2) previous knowledge of pattern’s importance. For results calculation, they u Dynamic programming with two greedy solutions over real world and synthetic datats. Stocker et al. [19] prented a static optimization approach using reordering triple patterns of the BGP. For joined triple patterns, they prented a t of heuristics for the lectivity estimation and summary statistics of RDF data. Ruckhaus et al. [20] prented a hybrid cost model and dynamic programming bad optimization approach for queries optimization and named it as deductive ontology ba (DOB). Additionally, they ud an adaptive sampling technique to estimate the cardinality of intermediate inferred facts.
Next, Hogenboom et al. [6] introduced a GA bad optimization approach and experimental results over datats prented the effectiveness of propod approach over 2PO in ca of large Chain queries and for small queries having 10 predicates, 2PO prents better results. Neumann and Weikum [21] prented an RDF-3X engine and dynamic programming for optimal plan generation and lectivity histograms to estimate the cost of joins between triple patterns. To meet the scalability when querying a large number of triple patterns, Neumann and Weikum [22] introduced a very light-weight strategy for sideways information passing between parate joins and ud aggreg
ated statistics to accurately estimate the lectivity of join-ordering. Kaoudi et al. [23] studied the query optimization problem on top of distributed hash tables and using three greedy algorithms. The 3 algorithms are ud to optimize the intermediate relations generated using the lectivity-bad heuristics named: naive static algorithm, mi–naive static algorithm ,and dynamic algorithm. Neumann and Moerkotte [24] propod new accurate cardinality estimation for RDF databas bad on characteristic ts. They show experimentally that new method is superior to the estimation methods of commercial DBMSs and RDF-3X[21].
For multi-join ordering problem, Ouyang et al. [25] introduce a strategy using a genetic algorithm (GA) to optimize the execution plan and for generating the best plan, they u a bushy tree. Hogenboom et al. [7] introduce a new solution to the current problem of query optimization using ACO algorithm over Chain query and shows best results when compared with [21] and 2PO [15]. The heuristic ud by this experiment is introduced by [14]. In the solutions for query optimization, Gomathi et al.[26] also contributes to an efficient algorithm named adaptive Cuckoo arch (ACS) to an optimal query plan for large RDF data. Next, Kalayci et al. [27] propod a new optimization strategy using Ant Colony Optimization (ACO) algorithm for reordering triple patterns and they ud statistics for lectivity estimation propod by [19] with some modification. Meimaris and Papastefa
natos [35] prent a new approach of join reordering that converts a query into a multidimensional vector space and performs distance-bad optimisation.
They compared the results with existing approaches and proved that the propod approach is better than existing.
The prior work studies suggest that there is still need for better heuristic and so we are investigating a new algorithm called: Jaya. The efficiency of the propod approach can be en through the experimental results.
III.RDF AND SPARQL
RDF is data model which has a flexibility of schema-free. RDF develops major momentum in different areas like knowledge-management communities by collecting facts about entities and their relations using RDF reprentation. RDF data can be shown by entity-relationship graph and each triple of RDF can be reprented as a node-arc-node link [28].
SPARQL query depends on matching of graph patterns with triple patterns of the query and SPARQL queries generally made of triple patterns known as BGPs (Basic Graph Patterns) [4]. The structure of triple pattern is such that it contains <s, p, o>, that could be concrete or variable [19].
Jena[29] is a framework for Semantic Web applications that is capable to store and manipulate in-memory RDF data. ARQ [30] as query engine ud in Jena for querying RDF data. We are using ARQ in the propod approach.
The study traditional databa joins ordering operation of SQL queries enforces to study the rearrangement of the order of triple pattern of SPARQL query and this type of approach also make a greater impact on the execution time of the query.
We are given an example of Cyclic query to make clarity and importance about arrangement of order of
BGP (i.e. where clau of the SPARQL query containing the different pattern of triples) in Table 1 and we have taken this Cyclic query with six triple patterns from the LUBM [31] datat provided by the Lehigh University. We have assigned quence number from 0 to 5 to each triple pattern.
In the query example, we provide a numbering from 0 to 5 to each triple pattern. For the given order [0, 1, 2, 3, 4, 5] in Table 1, Jaya algorithm execute this 6 cyclic query with an execution time of 231714ms, whereas the execution time for [3, 1, 2, 0, 4, 5] order of listing 2 is 71128ms and for the order [0, 5, 2, 3, 4, 1] of listing 3, the execution time was 65ms. Here, [0, 5, 2, 3, 4, 1] is the optimal o
rder that have optimal execution time.
Table 1. BGP of an example Cyclic query[31]
0 ?X rdf:type ub:GraduateStudent. 1 ?Y rdf:type ub:University. 2 ?Z rdf:type ub:Department. 3 ?X ub:memberOf ?Z. 4 ?Z ub:subOrganizationOf ?Y.
5 ?X ub:undergraduateDegreeFrom ?Y.
Through the example given above, we obrved that the simple query of SPARQL in ARQ provides longer execution, [34] suggested a solution: adding more lective part of the query first, makes an impact on the execution time of the query.
IV. O PTIMIZATION A LGORITHM : J AYA
The first pha of the introduced strategy consider three major parts; first is to evaluate the number of concrete and variable element matching, cond is to estimate the join values using lectivity ba
d on estimated cardinality and third is to the construction of cost matrix using the estimated join values and estimated cardinality values. During the cond pha of the propod strategy, apply the Jaya algorithm over the constructed cost matrix for the rearrangement of the order of triple patterns.
Jaya [8] algorithm procedure is such that: using the upper and lower bound of the process variables, initially generate the random solution. Then, update the variable of every solution by Equation (1). The best candidate indicates the best value of the objective function from the whole candidate solutions and worst candidate indicate the worst value of the objective function from the whole candidate solutions. If i,j,k S indicate the value of the j-th variable for the k-th candidate at i-th iteration. The equation is:
i+1,j,k i,j,k i,j,1i,j,best i,j,k i,j,2i,j,worst i,j,k S =S +
r (S -abs(S ))-r (S -abs(S )) (1)
i,j,1r and i,j,2r reprents random number chon from the range [0,1] for the jth variable at the ith iteration. If the
列举英文modified value is better function value than the previous value then u the modified value otherwi avoid it. The equation (1) has an expression i,j,1i,j,best i,j,k r (S abs(S ))- which helps to move the solution to the best solution and i,j,2i,j,worst i,j,k r (S abs(S ))-helps in avoiding the worst solution. Random number and absolute of the variable ensures good exploration. Algorithm: Jaya algorithm[8]
Initialize the population size, number of variable and stoping criteria
Until stoping criteria met
Evaluate the fitness for each population
Evaluate best and worst solution in the population
Update the solution using:
j,k,i j,k,i 1,j,i j,best,i j,k,i 2,j,i j,worst,i j,k,i Y'Y rand (Y Y )rand (Y Y )
legging
=+--- Update the solution
No update in the solution
Repeat
Report optimum result
Discrete conversion of Jaya
The originally given for the continuous problem, but for our u we have to u a discrete version of the algorithm so that we can u Jaya algorithm in the query optimization problem of query optimization. So, to convert it in to a discrete problem, we convert the equation into such a way:
j,k,i j,k,i j,best,i j,k,i j,worst,i j,k,i Y'Y (Y Y )(Y Y )= (2)
Operator is the crossover operator we ud here the PMX crossover [33] operator and finally to eliminate the duplicate value of the final result, we u here the mutation operation which will generate the final result. The intermediate result will u mutation j,best,i j,k,i Y (Y Y )=∙and the mutation operation is the swapping of the one randomly generated triple pattern of the BGP with the current generated triple pattern if the random generated value is less than the mutation probability.
Algorithm: DJaya algorithm
Initialize the population size, number of variable and stoping criteria
Until stoping criteria met
Evaluate the fitness for each population
Evaluate best and worst solution in the population Update the solution using:
j,k,i j,k,i j,best,i j,k,i j,worst,i j,k,i Y'Y (Y Y )(Y Y )= j,best,i j,k,i Y (Y Y )=∙ % mutation in above intermediate result Update the solution
No update in the solution Repeat
Report optimum result
V.P ROBLEM D ESCRIPTION
We are considering the problem of optimization of the triple patterns of SPARQL queries using rearrangement of the order of the triple patterns. For rearrangement, we are considering Jaya algorithm.
如何缓解考试压力1. Ud Cost Model
From the different triple patterns of BGP, we constructed a complete digraph [27] of the given Ɓ where nodes of digraph which reprent the triple patterns. And the arc between nodes is resulted as a join between two triple patterns whenever join occur between two triple patterns otherwi it is resulted as a Cartesian product between triple patterns and resulted as 1(which is the upper limit of lectivity(range from 0 to1)). The arc is computed using addition of lectivity of most lective triple pattern 1st pattern and join estimation of two patterns whenever some join occur between two triple patterns. The digraph construction is important due to the fact that lectivity of one triple pattern is different than that of other. Rearrangement of order of triple pattern doesn’t affec t join between two triple patterns.
For the calculation of values of arc values, firstly consider the two triple patterns. The different triple patterns have different number of bound and unbound elements. According to the triple patterns, calculate bound and unbound elements of each triple patterns and bad on the bound and unbound elements, calculate the different joins between the triple patterns. Each bound and unbound component have its matching triple patterns and different bound and unbound elements join value depends on the cardinality and the estimated lectivity values. To evaluate the values the steps to be followed are:
1.Evaluate the number of concrete and variable
element matching.
2.Evaluate the estimated lectivity.
Next, the explanation of the above two steps are as follow:
1. Evaluate the number of concrete and variable element matching.
To evaluate the number of concrete and variable element matching. To make it understandable we are using an example:
?GraduateStudent rdf:type ub:GraduateStudent Above triple pattern has two concrete elements rdf:type ub:GraduateStudent and one variable and due to the two concrete elements the u the procedure [27] and evaluate cardinality for each concrete element using GSH[32]. In other words, we can say that find the number of triple pattern of each concrete element at the different position(s, p, and o) of the triple pattern using GSH and prerve the smallest number of triple pattern as the new cardinality. And finally, u the lowest value as the resulted cardinality. Note that, GSH only provide cardinality for one concrete element. For understanding, u the following example:
打电话英文? GraduateStudent ub:memberOf ?Depatment
The example shows that at only predicate position one concrete element is prent, so for this only cardinality can be provided by GSH.
2. Evaluate the estimated lectivity and join[27].
To evaluate the join between the concrete and the variable elements of the given two triple patterns different steps are ud:
Step1: U the two intermediate triple patterns of the given query.
Step 2: Now find the different that match between the two intermediate triple patterns.
Step 3: Now find the ranks of each matching and add them. Different ranks are a modified version of [19]. Steps 4: Assign the cost as 32 [19] and then subtract the different added rank value. And factor using resulted cost value/original cost.
Step 5: Finally, calculate the join value between two intermediate triple patterns. (Fig 1)
3. Jaya bad Query Optimization problem
We have SPARQL query with different triple patterns, we have to generate an execution plan that provides smaller execution time through rearrangement of the order of triple patterns. The solution is to construct the complete digraph according to BGP of given SPARQL query. Then u Jaya algorithm for rearrangement of the order of triple patterns.
Fig.1. Flow Chart for the evaluation of join
VI.E XPERIMENTAL R ESULTS
1. Setup, Datat detail and Ud Queries
The experimental results were taken over the machine with 32-bit Oracle JDK virtual machine running on Intel(R) Core(TM) i7 -3770M CPU @ 3.40 GHz, 32-bit with 4 GB RAM and Window 10 OS.
We are using 162923 triples RDF datat which was taken from LUBM. LUBM provides OWL ontology for university domain. We are considering only Cyclic and star queries for our experiment with veral triple 4, 6, 8 and 10 which are constructed from LUBM. The lection of the triple patterns is due to the fact that if the number of the triple patterns are small then sometimes they will address no optimization at all, that’s why we considered the large triple patterns.
According to our choice for this experiment, we u synthetically constructed queries [27] over LUBM having 4, 6, 8 and 10 triple patterns with Cyclic and star as shape. Queries are not bound to Cyclic and star shapes, and diver types of the queries with different triple patterns and complex gr
aph queries are planned. Here, just one query of every different type has been taken into account to comparative evaluation of the propod method. The lected Cyclic, Star as are sufficient enough to benchmark the results.
Jaya algorithm is independent of the algorithm-specific parameters and only us some common parameters like population size and a number of generations. For the test purpo, we apply a ries of 10 test over the single query of every different type and different algorithm.
2. Ud Abbreviations and their Detail
1. Executions Without optimization.
2. Ant System using Modified Method.
3. Elitist Ant System using Modified
清华少儿英语加盟Method.
4. MAX–MIN Ant System
algorithm using Modified Method.
5. Jaya using Modified Method. Here, Modified Method is name given to MVC[27] becau it is a modified method. For Jaya algorithm, we are using different calibration like starting node as random with population size and iteration size as 100 and 100 respectively. For the result evaluation, consider the execution time of the query as a sum of the time taken by optimization, execution and population time (time for the iteration of the result t in millicond) for all the algorithms except EWO.
For the comparison, we have taken different algorithms: (1) EWO (2) ACO different versions (AS, EAS, MMAS) [27] (3) Our propod approach.
3. The results are compared on the basis of:
1.Triple patterns 4, 6, 8, 10 for Cyclic Type Query.
2.Triple patterns 4, 6, 8, 10 for Star Type Query. 1). Cyclic queries
Table 2 shows results of Cyclic queries consist of various triple patterns wherein query execution time is considered in milliconds. Table results show that Jaya-