An efficient query matching algorithm for relational data semantic cache

An Efficient Query Matching Algorithm forRelational Data Semantic Cache

Munir AhmadCentre for Distributed and Semantic Computing

Mohammad Ali Jinnah UniversityIslamabad, Pakistan

[email protected]

Muhammad SanaullahCentre for Distributed and Semantic Computing


[email protected]

Abstract- Data access latency can be reduced for databases byusing caching. Semantic caching enhances the performance ofnormal caching by locally answering the partially overlappedqueries. Query processing (generation of probe and remainderquery from the incoming queries) and cache management need tobe addressed in its totality to really enjoy these benefits. That is,there is a need of correct, complete and efficient algorithms toprocess incoming queries and to manage semantic and datacache. In this paper, we address this issue in the context of queryprocessing. We have observed that the algorithm proposed by Q.Ren and his colleagues has some inefficiencies and redundancies.To overcome these inefficiencies and redundancies, we haveproposed an algorithm for query matching with hierarchal storedquery semantics. Proposed algorithm performs matching ofstored semantics in cache with semantics of incoming query. Italso has capability to generate amending query efficiently andrejects incorrect queries at initial level. Comparison of proposedalgorithm is made with existing algorithm. Complexity ofproposed query matching algorithm is O(n) which is smaller thenthe existing which have 0(2n-l), n is number of attributes in arelation. Also, the proposed algorithm is capable to stop theuseless processing as was done in the previous algorithms.

Keywords: Query matching, Query processing, SemanticCaching, Relational Databases, Algorithms.

I. INTRODUCTION

Importance of heterogeneous and distributed databasesystems is increasing day by day. Data access time increaseswhen we have to access data from remote location frequently.Concept of cache is introduced to retrieve data efficiently byreusing the already retrieved data. Cache stores the alreadyretrieved data and reused it against similar queries. Queryprocessing for semantic cache is extensively studied for bothSQL and XML queries [4, 18]. We have included only thosetechniques that belong to query processing over SQL querieswith semantic cache.

Muhammad Abdul QadirCentre for Distributed and Semantic Computing


[email protected]

Muhammad Farhan BashirCentre for Distributed and Semantic Computing


[email protected]

Cache can be categorized into page, tupple, and semantic[5, 9]. All of these are used to improve the response time up todifferent extent. Pages are retrieved in the presence of pagecaching. Tupples are retrieved when a system is using tupplecaching technique. Portion of page or portion of tupple cannotbe reused in the presences of page or tupple cachingrespectively [9]. A scheme with the name of Intelligent cachemanagement is introduced [1, 3] to reduce the overhead ofpageand tupple cache. To retrieve the portion of page or portion oftupple efficiently semantic cache is used due to having abilityto answer the overlapped or partial queries locally [10].Semantic cache provides better performance than page andtupple cache [5, 9] and the overall workload can be reducedsignificantly by using semantic caching [10]. With semanticcaching user-posed query is divided into probe (overlappedportion) and remainder (non-overlapped) queries [5]. Divisionprocess of query into probe and remainder is referred as queryprocessing. Efficient query processing is a must for efficientsemantic caching and a major activity of query processing isquery matching. So efficient query matching ensures efficientquery processing [6]. We have tried to make an efficient querymatching algorithm that increases the efficiency of semanticcaching. We have evaluated previous algorithms and discussedtheir deficiencies in our previous research activity [17]. In thispaper we have described the algorithm that overcomes thelimitations of existing algorithms. To describe query processingalgorithms, some notation have already been proposed as givenin the following table 1 [5, 6, 11]. These notations will be usedin this paper, too.

The remainder of this paper is organized as follows. Insection II, related work is discussed. Section III presentsproposed algorithms with cache architecture, case study isdiscussed in section IV; and comparison results are discussedin section V. Section VI concludes the paper and presentsfuture direction.

Notation DescriptionS Segment on Cache

Qu User posed queryQs SELECT clause ofuser queryQF FROM clause of user queryQw WHERE clause of user queryaq Amending queryrq Remainder querypq Probe queryPA Predicate attributeSA Attributes of segmentSp Predicate of segment

SR Relation of segmentSc Contents of segmentKA Key attribute of segmentCA Common attributesRs Result from serverDA Difference attributesFR Final ResultCR Server Result~ Assignment Operator

II. RELATED WORK

When semantics of queries are store in a flat structure, thequery matching process is very expensive (time consuming)[10, 5]. To reduce the cost [10], cache is divided into segments[5]. Runtime complexity and caching efficiency is improved bydivision of cache into segments. Idea of amending query is alsointroduced in [5] which proved helpful to making the querymatching efficient. Query matching is performed on the base of112 rules in our earlier work [11]. That work is applicable forsimple queries only. Predicate matching of query is presentedby B.T. Jonsson [16]. It is not able to handle the SELECT CLAUSE

of a query. In [14] query matching algorithm is presented foraggregate. In this paper we present the query matching for Select

and Project queries. Predicates are matched on the base ofpredicate value in [15, 12]. Matching procedure defined by Q.Ren and colleagues [5] is based on implication andsatisfiability [7, 8, 13] to match the predicate of query. Due toimplication and satisfiability, the algorithm has followingproblems:

A. In the reference [7, 8, 13]; it is clearly mention that thesealgorithms work only for conjunctive formulae, but Q. Ren etal. [5] claims that query processing algorithm will work fordisjunction of conjunctive predicates. If referenced algorithmsare not able to handle the disjunctive expressions then howquery matching algorithm by Q. Ren et al. [5] will work fordisjunctive queries.

B. Referenced algorithms for implication and satisfiablityassume that variables of segment predicate must be included inthe user's predicate. i.e PAS =PAU where PAS is set of variables ofsegment predicate and PAU is the set of variables of user'spredicate. In other words we can say that if predicate attributes

TABLE I. NOTATIONS of both user and cached segment are same then satisfiablity willreturn yes otherwise it will returns no as is clear from thefollowing example.

Example to highlight the problemHere we assume that user and semantic predicates are given

as below:

PAS = age>40 /\ Sal>50kand

PAU = age<50 /\ Sal<60k

In this example algorithm will return yes. Meanwhile ifusers pose queries with different attributes like:

PAU = eName = "Komal"

Then algorithm will return no because PAS '# P AU. Even someof the data may exist in cache with following predicate:

(age>40 /\ Sal>50k) /\ (eName = Komal)

Hence, there should be a mechanism to generate probequery in these types of cases.

Query matching algorithm proposed by Q. Ren et al. [5] hasa chance of useless processing specially when there is norelated data available in the cache. It can be explained clearlywith the following example.

Example to highlight the problemWe assume that there is a segment'S' in cache with the

following query.

S = SELECT eName, Sal FROM employee WHERE age>30SA = { eName, Sal}, Sp= age>30, SR = employee

Now, user poses following query 'Q'

Q = SELECT age FROM employee WHERE age>30QA = {age}, Qp = age>30, QR = employee

According to case 3 of query matching algorithm proposedby Q. Ren et al. and basing on above example QA is a not asubset of SA, so true result will be evaluated on the base of thiscondition. Then implication will be checked. Result ofimplication will also be true because conditions (predicate) ofboth (Qp and Sp) are same.

Hence, query will be divided into the probe and remainderquery. At the end there will be no result across the probe query.So there is a useless processing and wastage of resources. Evena single attribute does not exist then why processing proceeds?If there will be 'n' segments in cache, then same process willbe executed 'n' times, which increases the run time complexity.

In these situations query should not proceed after checkingthe first condition that declare that there is no commonattributes of posted query and cached segment. In the givenalgorithm a strong concept of generation of amending query isproposed with the help of key-contained segments, but this idea

is not handled efficiently in the query matching algorithm. Anexample of inefficient generation of amending query is givenbelow.

Example to highlight the problemSuppose there is a segment in the cache as given below.

S = SELECT eName, Sal FROM employee WHERE age>30SA = f eName, Sal}, Sp= a~e>30, SR = employee

Now, user poses following query 'Q'

Q = SELECT eName, post FROM employee WHERE age>3QA = { eName }, Qp = age>30, QR = employee

Condition attribute (age) is not present in the segment thenhow the result will be retrieved against probe query as given incase 3 of query matching algorithm. An amending queryshould be generated in these cases.

Query matching algorithm proposed by Q. Ren et al. [5] isnot capable to handle the SELECT '*' and incorrect queries asdiscussed in [6]. Our proposed scheme will provide the abilityto reject incorrect queries as well as handle the SELECT '*'cases.

III. PROPOSED SCHEME

In this section we have discuss our proposed algorithmwhich is the enhancement ofQ. Ren and colleague's algorithm[5] and based on hierarchal content matching [2]. Working ofour proposed algorithm is based on semantic cachingarchitecture as we have proposed in one of the previousresearch paper [6] given in Figure 1.

Query Processor

Rs

Figure 1. Semantic cache architecture [6]

Enhanced query matching algorithm which has an ability toovercome the limitations as discussed in the section 2 is givenin Figure 2.

Algorithm1: QUERy_MATCHING()INPUT: Qu (User query)OUTPUT: pq and rqPROCEDURE:1. PORTIONS := SPLIT_QUERY (Qu)2. Reject: = CHECK_REJECTION (Qs, QF, PA)3. if (Reject =false)

CA, DA:=MATCH_SELECT_CLAUSE (QsJIf(DA!= empty)

rq1 := 1tDAGQp(QR)Else

rql :=Nullif (CA!=empty)

if (!(QpA ~ Sp))aq :=GEN_AMEND_QUERy()

Elseaq :=Null

if (Qp ==> Sp)pq := 1tCAGSP(SC)rq2 := NULL

Else if ((Qp A Sp) is Satisfiable)pq := 1tCAGSPAQP(SC)rq2 := 1tCAGSPA--Qp(QR)

Else if ((Qp A Sp) is unsatisfiabIe)pq:= Nullrq2 := 1tcAGQP(QR)

Elsepq := Nullrq2 := Null

FR:= pq+rq1+rq24. Else

Rej ect Incorrect Query.

Figure 2. Query Matching Algorithm

Efficient process of generation of amending query is aproblem as discussed previously. We have presented analgorithm to generate amending query efficiently [6] as givenin figure 3.

Algorithm2: GEN_AMEND_QUERY()Input: pq, Sp, and Qw. PA•

Output: aqProcedure:

if(Sp ~ Qw II Sp=Qwllpq=Null)aq~Null

else Generate aq with KA on Sp.

Figure 3. Algorithm for Generation of Amending Query

In the above example there are 22 possible queries thatmake separate segments. So the formula to calculate possiblesegments across a single relation over 'n' attributes is "2n-1 ".Then add segments across each relation. As in example 15+7=22: (24-1= 15 &23-1=7). Hence, 22 segments are to be visitedto check availability of data on cache in worst case whichincreases the response time drastically.

Hierarchal scheme reduces the number of comparisons tofind out whether the data is available or not [2]. Only 'n'comparisons are required to check availability of data in thecache.

Now we have taken two queries as given in figure 7 to dryrun our proposed algorithms. One will be rejected at zero leveldue to invalidity and the other one will pass through all theprocess.

SR S SASI e IDS2 eNameS3 SalS4 AgeS5 e ID, eNameS6 e ID, Sal

(1)(1) S7 e ID, Age~0 S8 e ID, eName Sal Age} S9 e ID Sal Age(1)

S10 e ID, eName AgeSII eName SalS12 eName AgeS13 Sal Age,eNameS14 e ID, eName, SalS15 Sal, AgeS16 sNameS17 Grade

tJ'.)

S18 Gender......~(1) S19 Gender, Grade"'C.a S20 Gender, sNametJ'.)

S21 sName, GradeS22 sName, Grade, Gender

Splitter module of the scheme will work according to thealgorithm given in figure 4.

Algorithm 3: SPLIT_QUERY()INPUT: Qu (User query)OUTPUT: Qs, Qw, QFPROCEDURE:

Qs := SELECT CLAUSEQw:= WHERE CLAUSEQF:= FROM CLAUSEReturn Qs. QJV, QF.

Figure 4. Figure 4: Algorithm to Split Query

An algorithm is given in figure 5 to validate the queries.According to this algorithm incorrect queries will be rejected atinitial level and there will be no useless query processingrequired.

Algorithm 4: CHECK_REJECTION (Qs, QF, PAJINPUT: Qu (User query)OUTPUT: Qs, QF, PAPROCEDURE:1. If all attributes of Qs present in scheme

Ifrelation ofQFpresent in schemaIfPAis present in schema

return falseElse return true

Else return true2. Else return true

Figure 5. Figure 5: Algorithm to reject incorrect queries.

IV. CASE STUDYS

In this section we have elaborated the proposed algorithmwith the help of case study.

Let us consider there is a database of university with tworelations employee and students as given in the followingfigure 6:

TABLE!!. POSSIBLE SEGMENTS.

University

Figure 6. University Schema

For the above given schema of university; there are 15 and7 segments are possible for employee and students relationaccording to Q. Ren et al. approach [5]. In simple words wecan say that there are 15 queries possible against employee andsimilarly 7 for students as given in table 2.

QuI0=8ELECT eNarne, Sal FROM students WHERE Age>40QuII = SELECT e_ID, eName, Sal FROM employee WHERE Age>25

Figure 7. User Queries

QuiD will be rejected at zero level by rejecter because Qswill be not matched with attributes for student.

Now we execute the Quii over proposed architecture aswith respect to cache segment of table 2. Splitter splits theQuii into three portions as follow:

Qs+- {E_Id, eName, Sal}QF~ {employee}Qw~Age>25.

across number of attributes. Response time is simply calculatedon the bases of complexity expressions (n vs 2n_l).Rejecter receive these three (Qs, QF, Qw) and pass it to

Decider after checking the validity. Then Decider checks theavailability of required attributes by applying hierarchalapproach [2].

Here for simplicity we assume that there exists only onesegment "SII" in the cache with condition (age>30) foremployee. As eNarne and Sal are available at cache so CA andDA will be computed as follows:

';;' 1200 ,.........,.,~~~~---.,

e- 1000~e 800

E= 600~

~ 400

=Q. 200

~1 2 3 4 5 6 7 8 9 10

-+- Proposed

-Q. Ren etal.

CA+-- { eName, Sal}DA+-- {E_Id}

After calculation of common and difference attributes,decider will generate and send following remainder query toserver.,...------------------------.

rq+- SELECT E_Id FROM employee WHERE Age>25.

List of common attributes with QF and Qw will be sent toLQG. We assume that this query is Qu11a then this will be as:

Qu11a+-- SELECT eName, Sal FROM employee WHERE

Age>25.

Probe and remainder queries will be generated by LocalQuery Generator on the base of matching with segment oncache. Note that here Qs will be equal to CA. Firstly, Qu11awill be matched with S11 (assumption here is that there is onlyon segment 811 with condition age>30) and probe andremainder query as,

pq+-- SELECT eName, Sal FROM (Sc)rq +-- SELECT eName, Sal FROM employee WHERE

25<Age<==30

Here, amending query (aq) is null because Sp~Qs.

As, we have seen that PA (age) is not part of Sp. Soaccording to [5] an amending query will be generated but wehave not generated amending query according as per theproposed scheme.

This process of extracting probe and remainder queries willbe continued until all of the segments are visited or remainderqueries become null. LQG sends all of the probe queries tocache contents and final remainder query to the server toretrieve data.

Finally, rebuilder receives CR from cache and SR fromserver. CRand SR combined to build F R and the semantics in thecache will be updated accordingly.

V. COMPARISONS

In this section, comparison between proposed querymatching scheme and Q Ren's algorithm [5] is presented. Wehave used workload parameters same as taken in [5]. Oneadditional parameter we have taken is cache hit ratio. Cache hitratio varies from 0-100%. Figure 8 presents the response time

No. of Attributes

Figure 8. Complexity Comparison

VI. CONCLUSION AND FUTURE WORK

Query processing is a major activity in semantic caching. Itplays a major role to reuse the stored contents in cache. Anefficient query matching algorithm is presented with hierarchalapproach for semantic matching. Paper have concluded thatquery matching runtime complexity is significantly reduced to'n' from" 2n-l".

We are working on query processing with hierarchalsemantic matching scheme for aggregate queries and otherissues related with the query processing and management ofsemantic cache.

REFERENCES

[1] M.U Ahmed, R.A Zaheer, and M.A. Qadir, "Intelligent cachemanagement for data grid";,!n Proceedings of the AustralasianWorkshop on Grid Computing and E-Research, New South Wales,Australia, 2005 pp.

[2] M.F. Bashir and M.A. Qadir, "HiSIS: 4-Level Hierarchical SemanticIndexing for Efficient Content Matching over Semantic Cache". INMIC,IEEE, Islamabad, Pakistan, 2006, pp. 211-214.

[3] M.F. Bashir, , R.A. Zaheer, Z.M. Shams and M.A. Qadir. "SCAM:Semantic Caching Architecture for Efficient Content Matching overData Grid". AWIC, Springer Heidelberg, Berlin, 2007. pp. 41-46.

[4] L.Chen, E.A. Rundensteiner and S. Wang. "XCache -A SemanticCaching System for XML Queries". In Proceedings of the 2002 ACMSIGMOD international Conference on Management of Data, ACMPress, New York, 2002, pp. 618-618.

[5] Q. Ren, M.H. Dunham, and Vijay Kumar, "Semantic Caching and QueryProcessing". Knowledge and Data Engineering, IEEE ComputerSociety, 2003, pp. 192-210.

[6] M. Ahmad, M. A. Qadir, A. Razaque, and M. Sanaullah. "EfficientQuery Processing over Semantic Cache". Intelligent Systems and Agents,ISA 2008. indexed by IADIS digital library (www.iadis.net/dl). Heldwithin IADIS Multi Conference on Computer Science and InformationSystems (MCCSIS 2008), Amsterdam, Netherland. 22-27 July 2008.

[7] X. Sun. N.N. Kamel, and L.M. Ni, "Processing Implication on Queries",Software Engineering, IEEE, Piscataway, USA 1989, pp. 1168-1175.

[8] C.M. Chen and N. Roussopoulos, "The Implementation andPerformance Evaluation of the ADMS Query Optimizer: IntegratingQuery Result Caching and Matching," Proc. Int'l Conf. ExtendingDatabase Technology, 1994, pp. 323-336.

[9] S. Dar, MJ. Franklin, B.T. Jonsson, D. Srivatava, and M. Tan,"Semantic Data Caching and Replacement," Proceeding of VLDBConference, VLDB, 1996, pp. 330-341.

[10] P. Godfrey and J. Gryz, "Semantic Query Caching for HeterogeneousDatabases," In Proc. 4th KRDB Workshop "Intelligent Access toHeterogeneous In/ormation", Athens, Greece, 1997, pp.61-66.

[11] M.F. Bashir and M.A. Qadir, "ProQ - Query Processing Over SemanticCache For Data Grid", Center for Distributed and Semantic Computing,Mohammad Ali Jinnah University, Islamabad, Pakistan 2007.

[12] M.R. Sumalatha, V. Vaidehi, A. Kannan, M. Rajasekar and M.Karthigaiselvan, "Dynamic Rule Set Mapping Strategy for the Designof Effective Semantic Cache", ICACT, IEEE, Gangwon-Do, Korea,2007,pp.1952-1957.

[13] S. Guo, W. Sun, and M.A. Weiss, "Solving Satisfiability andImplication Problems in Database Systems," Database Systems, ACM,1996, pp. 270-293.

[14] J. Cai, Y. Jia, S. Yang, and P. Zou, "A Method of Aggregate QueryMatching in Semantic Cache for Massive Database Applications".Springer- Verlag, Berlin Heidelberg 2005, pp. 435-442.

[15] M.R. Sumalatha, V. Vaidehi, A. Kannan, M. Rajasekar and M.Karthigaiselvan, "Hash Mapping Strategy for Improving RetrievalEffectiveness in Semantic Cache System", ICSCN, IEEE, Chennai,India, 2007, pp. 233-237.

116] B. T. Jonsson, M. Arinbjarnar, B. Thorsson, M. Franklin, and D.Srivastava, "Performance and overhead of semantic cache management",Internet Technology, ACM, New York, USA, 2006, pp. 302-331.

[17] M. Ahmad, M. A. Qadir and M. Sanaullah. "Query Processing overRelational Databases with Semantic Cache: A Survey". 12th IEEEInternational Multitopic Conference, INMIC 2008, IEEE, Karachi,Pakistan, December 2008, In Press.

[18] Muhammad Sanaullah, Muhammad Abdul Qadir, Munir Ahmad,"SCAD-XML: Semantic Cache Architecture for XML Data Files usingXPath with Cases and Rules", 12th IEEE International MultitopicConference, INMIC 2008, IEEE, Karachi, Pakistan, December 2008, Inthe Press.

An efficient query matching algorithm for relational data semantic cache

Documents