Web Service Query Service
Manivasakan Sabesan and Tore Risch
Uppsala DataBase Laboratory
Dept. of Information Technology
Uppsala University
Sweden
Outline
WSMED
Research Area
Adaptive Query Parallelization
Conclusion & Future work
WSMED (Web Service MEDiator) System
• WSMED provides general query capabilities over data-providing web services.
• Users only need to provide the WSDL URLs of the web services.
• WSMED automatically creates an SQL view for each web service operation.
• This makes every web service operation queryable without any programming.
• Users can pose any SQL query over the automatically created SQL views.
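One way to picture a view created from a web service operation is as a relation whose columns are the operation's input parameters followed by its output fields. The sketch below is only an illustration under that assumption: the `Operation` class and `view_signature` function are hypothetical, and the column names are the ones used by the example query in this talk, not the full WSMED view schema.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """Hypothetical description of one web service operation."""
    name: str
    inputs: list   # names of input parameters
    outputs: list  # names of output fields


def view_signature(op):
    """Return a view signature: input columns first, then output columns."""
    columns = op.inputs + [c for c in op.outputs if c not in op.inputs]
    return f"{op.name}({', '.join(columns)})"


# Column names taken from the example query; the real operation may expose more.
op = Operation(
    name="GetPlacesWithin",
    inputs=["place", "state", "distance", "placeTypeToFind"],
    outputs=["ToPlace", "ToState"],
)
print(view_signature(op))
```

With input parameters exposed as columns, a query binds them through ordinary equality predicates such as gp.distance=15.0.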
Service Oriented Architecture of WSMED
[Figure: the WSMED server imports WSDL metadata 1..n from the web services WS1..WSn (each offering operations such as WS Operation 1..p and WS Operation 1..q) and creates SQL View1..Viewm. Clients reach the server through SOAP calls to the WSMED Web Service Interface, which offers the operations IMPORTWSDL, AUTHENTICATION, QUERY, TABLEINFO and EXIT_SINIT.]
WSMED Demo
• WSMED provides a web service query service.
• The WSMED demo is accessible from a web browser.
• JavaScript is used to invoke the WSMED web service directly.
Research Problems
• Queries calling data-providing web services share a common pattern: dependent calls, where the output of one call is the input of the next.
• Web service calls incur high latency and high message setup cost.
• A naïve implementation of an application making these calls sequentially is time consuming.
• The challenge is to develop methods to speed up such queries with dependent web service calls.
[Figure: chain of dependent web service calls WS1 → WS2 → WS3 → ... → WSn]
Example Query
    select gl.City, gl.TypeId
    from GetAllStates gs, GetPlacesWithin gp, GetPlaceList gl
    where gs.state=gp.state
      and gp.distance=15.0
      and gp.placeTypeToFind='City'
      and gp.place='Atlanta'
      and gl.placeName=gp.ToPlace+' ,'+gp.ToState
      and gl.MaxItems=100
      and gl.imagePresence='true'
• Finds information about places located within 15 km of each city named 'Atlanta' in all US states.
• Invokes 300 web service calls and returns a stream of 360 tuples.
[Figure: data flow GetAllStates → <state> → GetPlacesWithin (constants <15, 'City', 'Atlanta'>) → <ToPlace, ToState> → GetPlaceList (constants <100, 'true'>) → <City, TypeId>]
Query Processing in WSMED
[Figure: query processing pipeline. Components shown: calculus generator, non-parallel plan optimizer, plan splitter, plan function generator, parallel pipeliner. Labels shown: SQL query, Phase 1, non-parallel plan, Phase 2, parallel query plan.]
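Non-Parallel Plan
[Figure: non-parallel plan as a chain of operators:
  γGetAllStates() → <state>
  γGetPlacesWithin('Atlanta', state, 15.0, 'City') → <city, state2>
  γconcat(city, ', ', state2) → <str>
  γGetPlaceList(str, 100, 'true') → <City, TypeId>
Split point 1 and split point 2 mark where the plan is divided into the plan functions PF1 and PF2.]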
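The non-parallel plan can be read as nested loops in which every call waits for the previous one. The sketch below mirrors the operator chain with stub functions: the function names are Python spellings of the operations in the plan, and the returned values ("S1", "Place-in-S1", "T0") are made-up placeholders, not real service output.

```python
calls = {"n": 0}  # counts simulated web service calls


def get_all_states():
    calls["n"] += 1
    return ["S1", "S2"]  # placeholder states


def get_places_within(place, state, distance, place_type):
    calls["n"] += 1
    return [("Place-in-" + state, state)]  # placeholder <city, state2> tuples


def get_place_list(place_name, max_items, image_presence):
    calls["n"] += 1
    return [(place_name, "T0")]  # placeholder <City, TypeId> tuples


def non_parallel_plan():
    """Evaluate the dependent calls one after another."""
    for state in get_all_states():
        for city, state2 in get_places_within("Atlanta", state, 15.0, "City"):
            s = city + ", " + state2  # corresponds to the concat operator
            for row in get_place_list(s, 100, "true"):
                yield row


rows = list(non_parallel_plan())
print(rows, "calls:", calls["n"])
```

Each inner call can start only after the enclosing call has delivered a tuple, so the total latency grows with the number of calls.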
Adaptive Parallel Plan
[Figure: γGetAllStates() → <state> → AFF_APPLYP(PF1, state) → <str> → AFF_APPLYP(PF2, str) → <City, TypeId>]
Parallel Process Tree
qi – query process (i = 0, 1, ..., n)
PFj – plan function (j = 1, ..., m)
[Figure: process tree. The coordinator q0 runs the query and calls GetAllStates; the level 1 processes (q1, q2) run PF1 (GetPlacesWithin); the level 2 processes (q3 to q8) run PF2 (GetPlaceList).]
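Adaptive First Finished Apply in Parallel (AFF_APPLYP)
AFF_APPLYP(Function PF, Stream pstream) → Stream result
• PF – plan function
• pstream – stream of parameter values pi
• result – stream of results ri
• Asynchronous operator
[Figure: AFF_APPLYP sends the parameter values p1, p2, p3 from pstream to child processes (q3, q4, q5), each running PF. Results r1, r2, r3 are emitted in the order in which the children finish, and the pending parameters p4, p5, p6 are handed out as children become free.]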
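The first-finished behaviour of the operator can be approximated in a few lines. This is only a sketch with a fixed fanout: it uses Python threads instead of query processes, does not adapt the process tree, and reads the whole parameter stream up front. The names `ff_applyp` and `slow_pf` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def ff_applyp(pf, pstream, fanout=2):
    """Apply pf to each parameter with `fanout` workers and yield the
    result tuples of whichever call finishes first."""
    def run(p):
        return list(pf(p))  # materialize the result stream of one call

    with ThreadPoolExecutor(max_workers=fanout) as pool:
        futures = [pool.submit(run, p) for p in pstream]
        for done in as_completed(futures):
            yield from done.result()


def slow_pf(p):
    """Stand-in for a plan function with varying latency."""
    time.sleep(0.01 * (p % 3))
    yield ("r", p)


results = list(ff_applyp(slow_pf, range(1, 7), fanout=3))
print(results)
```

Results are not guaranteed to come back in parameter order, which is the point of a first-finished operator: a slow call does not hold back the results of faster ones.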
Functionalities of AFF_APPLYP
1. Init stage: AFF_APPLYP initially forms a binary process tree by always setting the fanout to 2.
[Figure: initial binary process tree. Coordinator q0; level 1: q1, q2; level 2: q3, q4, q5, q6.]
2. A monitoring cycle for a non-leaf query process is complete when the number of received end-of-call messages equals the number of its children.
2.1 Add stage: after the first monitoring cycle, AFF_APPLYP adds p new child processes.
3. When an added node has several levels of children, the init stages of the AFF_APPLYP operators in the children produce a binary subtree.
[Figure: process tree after an add stage. Coordinator q0; level 1: q1, q2, q7; level 2: q3, q4, q5, q6, q8, q9, q10, q11.]
4. In each monitoring cycle i, AFF_APPLYP records the average time ti to produce an incoming tuple from the children.
4.1 If ti decreases by more than a threshold (25%), the add stage is rerun.
4.2 If ti increases, either no more children are added or a drop stage is run, which drops one child and its children.
[Figure: process tree after a drop stage. Coordinator q0; level 1: q1, q2; level 2: q3, q4, q5, q6, q10, q11, q12.]
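The adaptation rules above can be condensed into a small decision function. This is a sketch under assumptions: ti is compared with the value from the previous monitoring cycle, and a decrease smaller than the threshold simply keeps the current tree (the slides do not state what happens in that case). The function name `next_stage` and its return labels are hypothetical.

```python
def next_stage(prev_t, cur_t, threshold=0.25, use_drop_stage=False):
    """Decide what a non-leaf process does after a monitoring cycle.

    prev_t: average time per tuple in the previous cycle (None after the first cycle)
    cur_t:  average time per tuple in the current cycle
    """
    if prev_t is None:
        return "add"  # rule 2.1: add stage after the first monitoring cycle
    if cur_t < prev_t * (1.0 - threshold):
        return "add"  # rule 4.1: improvement above the threshold
    if cur_t > prev_t:
        # rule 4.2: stop adding, or drop one child and its children
        return "drop" if use_drop_stage else "stop"
    return "keep"  # assumption: small improvement, leave the tree unchanged


# Made-up timings, for illustration only.
times = [1.0, 0.6, 0.5, 0.7]
decisions = [next_stage(p, c) for p, c in zip([None] + times[:-1], times)]
print(decisions)
```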
Adaptive Results - Example Query
[Figure: bar chart of execution time (sec), y-axis from 0 to 300, one bar per configuration listed below; fo1 and fo2 as given in the chart legend.]
Configuration           fo1   fo2
Non-parallel plan       -     -
p=1, no drop stage      3     3
p=1, drop stage         2     3
p=2, no drop stage      4     5
p=2, drop stage         3     3
p=3, no drop stage      5     3.4
p=3, drop stage         4     3.25
p=4, no drop stage      6     8.7
p=4, drop stage         5     4.2
p=5, no drop stage      7     7.5
p=5, drop stage         6     7.8
AFF_APPLYP observations
• For the example query:
  – The execution time with p=4 and no drop stage is the best.
  – It is more than 4 times faster than sequential (non-parallel) execution.
• The execution time with p=2 and no drop stage is reasonably close to the best execution time (80%).
• The drop stage makes an insignificant difference to the execution time.
• The fanout of each level in a process tree depends on the execution time of the web service invoked on that level.
  – AFF_APPLYP finds the optimized fanout for each level.
Related work
• Like WSMS (U. Srivastava, J. Widom, K. Munagala, and R. Motwani, Query Optimization over Web Services, VLDB 2006), WSMED invokes web service calls in parallel. In contrast to WSMS, WSMED supports automated adaptive parallelization.
• In contrast to WSQ/DSQ (R. Goldman and J. Widom, WSQ/DSQ: a practical approach for combined querying of databases and the Web, SIGMOD 2000), WSMED produces non-materialized adaptive parallel plans based on parameter streams.
• Runtime optimization techniques (A. Gounaris, et al., Robust runtime optimization of data transfer in queries over Web Services, ICDE 2008) investigate the adaptation of buffer sizes in web service calls; they do not deal with adaptive parallelism of web service calls.
Conclusion
• WSMED can be accessed:
  – through the URL http://udbl2.it.uu.se/WSMED/wsmed.html
  – without installing any software.
• Queries are expressed in SQL to dynamically compose data-providing web services without any programming.
  – This makes any web service queryable with SQL.
• AFF_APPLYP:
  – automatically parallelizes web service calls.
  – adapts the process tree at runtime, based on the flow of the result stream, without any static cost model.
• An adaptive parallel plan with AFF_APPLYP makes it possible to run expensive queries.
Future work
• Generalize the strategy to queries that mix dependent and independent web service calls, as well as to bushy trees (ongoing work).
• Investigate different process arrangement strategies with the algebra operators.
• Set up a benchmark to simulate the parallel invocation of web services.
Thank you for your attention
?
“The un-queried life is not worth living”