Ph d defense_Department of Information Technology, Uppsala University, Sweden

May 25, 2015

Querying Data Providing Web Services

Manivasakan Sabesan
Department of Information Technology
Uppsala University
Sweden.

Abstract
Web services are often used for search computing where data is retrieved from servers providing information of different kinds. Such data providing web services return a set of objects for a given set of parameters without any side effects. There is a need to enable general and scalable search capabilities of data from data providing web services, which is the topic of this thesis.
The Web Service MEDiator (WSMED) system automatically provides relational views of any data providing web service operations by reading the WSDL documents describing them. These views can be queried with SQL. Without any knowledge of the costs of executing specific web service operations, the WSMED query processor automatically and adaptively finds an optimized parallel execution plan calling queried data providing web services.
For scalable execution of queries to data providing web services, an algebra operator PAP adaptively parallelizes calls in execution plans to web service operations until no significant performance improvement is measured, based on monitoring the flow from web service operations without any cost knowledge or extensive memory usage.
To comply with the Everything as a Service (XaaS) paradigm, WSMED itself is implemented as a web service that provides web service operations to query and combine data from data providing web services. A web based demonstration of the WSMED web service provides general SQL queries to any data providing web service operations from a browser.
WSMED assumes that all queried data sources are available as web services. To make any data providing system into a data providing web service, WSMED includes a subsystem, the web service generator, which generates and deploys the web service operations to access a data source. The WSMED web service itself is generated by the web service generator.

Transcript
Page 1: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Querying Data Providing Web Services

Manivasakan Sabesan

Uppsala DataBase Laboratory

Dept. of Information Technology

Uppsala University

Sweden

Page 2: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Outline

WSMED Architecture

Semantic Enrichments

Adaptive Parallelization

Web Service Query Service

Related Work & Future Directions

Page 3: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Our problem area

It is difficult to retrieve data provided by web services:

• Web service applications must be developed using a regular programming language such as Java or C#

WSMED:

• Simplifies searching web service data by using database queries

• Automatically generates collections of parallel programs to perform the search

• Automatically optimizes the generated programs

Page 4: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Search information through web services

[Figure: a user searches information through WSMED, which automatically generates parallel programs that call web service operations providing US states information, place details, and weather forecast information.]

Page 5: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Our approach

WSMED, a web service based mediator prototype:

[Figure: an SQL query enters the WSMED mediator and a result is returned. The mediator has one wrapper per data providing web service operation (DPWSO1, DPWSO2, ..., DPWSOn). The operations are described by WSDL and called via SOAP.]

Page 6: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Relational WSMED view

View food is based on the web service operation SearchFoodByDescription, described by a WSDL document:

ndb     keyword   descry                     gpcode
19080   Sweet     Candies, Sweet chocolate   1900
...     ...       ...                        ...

SQL query:

select descry from food where gpcode = '1900' and keyword = 'Sweet';
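
As a rough illustration of the idea (not WSMED's actual implementation), a view over a web service operation can be thought of as a function that is called with the values bound in the query, and whose results are filtered and projected like rows of a table. In this sketch the Python function names and the stubbed row are hypothetical; a real wrapper would issue a SOAP call as described by the WSDL document.

```python
# Hypothetical stand-in for the SearchFoodByDescription operation.
# A real wrapper would send a SOAP request; here the rows are stubbed.
def search_food_by_description(keyword, gpcode):
    rows = [
        {"ndb": "19080", "keyword": "Sweet",
         "descry": "Candies, Sweet chocolate", "gpcode": "1900"},
    ]
    return [r for r in rows
            if r["keyword"] == keyword and r["gpcode"] == gpcode]


def select_descry(keyword, gpcode):
    """Mimics: select descry from food where gpcode = ... and keyword = ..."""
    return [row["descry"]
            for row in search_food_by_description(keyword, gpcode)]


print(select_descry("Sweet", "1900"))
```

The bound columns (keyword, gpcode) act as input parameters of the operation, and the remaining columns are its output.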

Page 7: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Research questions

1. How can standards, such as WSDL and SOAP, be automatically utilized by a mediator?

2. How can database views of web service operations be automatically generated?

3. How can modern query optimization be used to provide efficient and scalable search from different web services?

4. How can the query optimizer speed up queries calling web service operations without any cost estimate?

5. How can data sources that are not accessible via web services be simply transformed into data providing web service operations?

6. How can the Everything as a Service paradigm be used for querying web services?

Page 8: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Web Service MEDiator (WSMED) system architecture

WSDL importer: extracts metadata from a WSDL document using the web service schema and stores it in the web service meta-database

Web Service Manager: invokes the web service operation to retrieve the data

WSMED enrichments: contains the semantic enrichments

[Figure: WSMED architecture. Components: WSDL importer, web service manager, query processor, WSMED enrichments, web service schema, and web service meta-database. Inputs: an SQL query and a web service WSDL document. Output: results.]

Page 9: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Outline

WSMED Architecture

Semantic Enrichments

Adaptive Parallelization

Web Service Query Service

Related Work & Future Directions

Page 10: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Semantic enrichments

Manually define SQL views over web service operations defined by imported WSDL

Manually add semantic enrichments to help WSMED improve the query performance

Page 11: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Multi-level views

create view food(ndb, keyword, descry, gpcode) as <wrapper definition>;

create view foodclasses(ndb, keyword, gpcode) as select ndb, keyword, gpcode from food;

create view fooddescriptions(ndb, descry) as select ndb, descry from food;

SQL query accessing the above views:

select fd.descry from foodclasses fc, fooddescriptions fd where fc.ndb=fd.ndb and fc.gpcode='1900';
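
The view definitions above can be tried out with an ordinary SQL engine. The following is a minimal sketch using Python's built-in sqlite3 module, under the assumption that a plain table stands in for the wrapper-defined view food (in WSMED that view is backed by a web service operation, which SQLite cannot express). The inserted row is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: an ordinary table replaces the <wrapper definition> of "food".
conn.execute("create table food(ndb, keyword, descry, gpcode)")
conn.execute(
    "insert into food values "
    "('19080', 'Sweet', 'Candies, Sweet chocolate', '1900')")

# The two second-level views, defined over "food".
conn.execute(
    "create view foodclasses(ndb, keyword, gpcode) as "
    "select ndb, keyword, gpcode from food")
conn.execute(
    "create view fooddescriptions(ndb, descry) as "
    "select ndb, descry from food")

# The query joins the two views on ndb.
rows = conn.execute(
    "select fd.descry from foodclasses fc, fooddescriptions fd "
    "where fc.ndb = fd.ndb and fc.gpcode = '1900'").fetchall()
print(rows)
```

Because both views are defined over food, a join on ndb that is evaluated naively accesses the underlying source once per view.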

Page 12: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Query execution strategies

No query optimization

Heuristic cost model: very simple manual heuristic cost model of web service operation cost and naïve join strategy

Hash join strategy: heuristic cost model + hash join

Semantic enrichment: key of the view is also specified
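
The difference between the naïve join strategy and the hash join strategy can be sketched generically as follows. This is a textbook illustration, not WSMED's implementation; the row format, the function names, and the sample rows are assumptions.

```python
def nested_loop_join(left, right, key):
    # Naive strategy: compare every left row with every right row.
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    # Build phase: materialize one input in memory, indexed by the join key.
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    # Probe phase: look up matching rows for each left row.
    return [(l, r) for l in left for r in index.get(l[key], [])]


classes = [{"ndb": 1, "gpcode": "1900"}, {"ndb": 2, "gpcode": "1900"}]
descriptions = [{"ndb": 1, "descry": "a"}, {"ndb": 3, "descry": "c"}]

naive = nested_loop_join(classes, descriptions, "ndb")
hashed = hash_join(classes, descriptions, "ndb")
print(naive == hashed)
```

Both functions return the same pairs; the hash join trades the repeated comparisons for an in-memory index over one input.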

Page 13: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Comparison of query execution strategies

[Figure: execution time (sec), 0 to 6000, versus number of food items, 0 to 1000, for four strategies: no optimization, heuristic cost model, hash join, and semantic enrichment.]

Page 14: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Full semantic enrichment vs. hash join

[Figure: response time (sec), 0 to 7, versus number of food items, 0 to 900, for hash join and semantic enrichment.]

Hash join requires memory to materialize results of the web service calls

Page 15: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Outline

WSMED Architecture

Semantic Enrichments

Adaptive Parallelization

Web Service Query Service

Related Work & Future Directions

Page 16: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Adaptive parallelization

SQL Views are fully automatically generated

No semantic enrichments

Costs of web service operations are not known:

=> Need for adaptive query processing, which changes the query plans while running the query

Page 17: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Parallelization of queries calling dependent web service operations

Queries calling data providing web services often have dependent calls:

[Figure: chain of dependent calls WS1, WS2, WS3, ..., WSn.]

Web service calls incur high latency and high message setup cost.

A naïve implementation of an application making these calls sequentially is time consuming.

WSMED:

• automatically generates parallel plans

• experimented with three operators for adaptive parallelization

Page 18: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Example query

select gl.City, gl.TypeId
from GetAllStates gs, GetPlacesWithin gp, GetPlaceList gl
where gs.state=gp.state and gp.distance=15.0 and gp.placeTypeToFind='City' and gp.place='Atlanta' and gl.placeName=gp.ToPlace+', '+gp.ToState and gl.MaxItems=100 and gl.imagePresence='true'

Finds information about places located within 15 km from each City named 'Atlanta' in all US states

Invokes 300 web service calls and returns a stream of 360 tuples

[Figure: dataflow of the query. GetAllStates produces <state>; GetPlacesWithin takes <state> together with the constants <15, 'City', 'Atlanta'> and produces <ToPlace, ToState>; GetPlaceList takes these together with the constants <100, 'true'> and produces <City, TypeId>.]

Page 19: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Parallel plans in WSMED

[Figure: query processing pipeline. An SQL query passes through the calculus generator. Phase 1: the central plan creator produces a central plan. Phase 2: the plan splitter, plan function generator, and parallel pipeliner produce the parallel query plan.]

Page 20: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Manually parallelized execution plan (FF_APPLYP)

Parallel pipeline of calls to plan functions PF1 and PF2

Manually specified fanout:

• fixed number of children in a level (e.g. fanout of level 2 is 3)

Query processes qi: processes executing plan functions

[Figure: parallel plan and process tree. Parallel plan, bottom up: γGetAllStates() produces <State>; FF_APPLYP(PF1, 2, State) produces <ToPlace, ToState>; FF_APPLYP(PF2, 3, ToPlace, ToState) produces <City, TypeID>. Process tree: coordinator q0, labeled GetAllStates; level 1 processes q1 and q2, labeled GetPlacesWithin; level 2 processes q3 to q8, labeled GetPlaceList.]

Page 21: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Define process tree by manually specifying fanouts per level:

FF_APPLYP(Function PF, Integer fo, Stream pstream) → Stream result

PF – plan function

fo – fanout, values are manually set

pstream – stream of argument tuples for PF: ai

result – stream of results ri from PF

Asynchronous operator

[Figure: an FF_APPLYP operator distributes the argument tuples p1 to p6 from pstream over child processes that each execute PF, and collects their results r1, r2, r3 as they arrive.]
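
The signature above can be illustrated with a simplified, single-machine sketch. It is not the WSMED implementation: it uses a thread pool instead of a tree of query processes, it consumes the whole argument stream before emitting results, and the plan functions pf1 and pf2 are hypothetical stubs. The fanout fo is modelled as the number of worker threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ff_applyp(pf, fo, pstream):
    """Apply plan function pf to each argument tuple in pstream,
    using fo parallel workers, and yield the result tuples."""
    with ThreadPoolExecutor(max_workers=fo) as pool:
        futures = [pool.submit(pf, *args) for args in pstream]
        for future in as_completed(futures):
            # Asynchronous: results arrive in completion order,
            # not in the order of the argument tuples.
            yield from future.result()


# Hypothetical plan functions; each returns a list of result tuples.
def pf1(state):
    return [(state + "-place1", state), (state + "-place2", state)]


def pf2(to_place, to_state):
    return [(to_place + ", " + to_state,)]


states = [("GA",), ("TX",)]
# Nested calls mirror the two-level plan: fanout 2, then fanout 3.
result = sorted(ff_applyp(pf2, 3, ff_applyp(pf1, 2, states)))
print(result)
```

Sorting the output is only needed here to make the result order deterministic.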

Page 22: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Observations

• The fastest execution time, 56.4 sec, outperformed the central plan (244.8 sec) with a speed-up of 4.3

• Limitation: manual specification of fanout

[Figure: execution times, with the non-parallel plan and the best execution time marked.]
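
The speed-up quoted above is the ratio of the two measured execution times:

```latex
\[
\text{speed-up}
  = \frac{t_{\text{central}}}{t_{\text{parallel}}}
  = \frac{244.8\ \text{sec}}{56.4\ \text{sec}}
  \approx 4.3
\]
```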

Page 23: Ph d defense_Department of Information Technology, Uppsala University, Sweden

AFF_APPLYP

Automatically adapts process tree at run time:

AFF_APPLYP(Function PF, Stream pstream) → Stream result

1. AFF_APPLYP initially forms a binary process tree by always setting fanout to 2 (init stage).

[Figure: binary process tree. Coordinator q0; level 1: q1, q2; level 2: q3, q4, q5, q6.]

Page 24: Ph d defense_Department of Information Technology, Uppsala University, Sweden

AFF_APPLYP (cont.)

2. Executes a monitoring cycle for each invocation of PF for argument tuple ai in a non-leaf node

2.1 After the first monitoring cycle, AFF_APPLYP adds p new child processes (an add stage) to compare the performance change

3. When an added node has several levels of children, recursive init stages of the AFF_APPLYPs will produce a binary sub-tree

[Figure: process tree after an add stage, with coordinator q0 and processes q1 to q11 on levels 1 and 2.]

Page 25: Ph d defense_Department of Information Technology, Uppsala University, Sweden

AFF_APPLYP (cont.)

4. AFF_APPLYP records, per monitoring cycle i, the average time ti to produce an incoming tuple from the children

4.1 If ti decreases more than a threshold, the add stage is rerun

4.2 If ti increases, we either add no more children or run a drop stage that drops one child and its children

[Figure: adapted process tree with coordinator q0 and processes q1 to q6 and q10 to q12 on levels 1 and 2.]
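
The init, add, and drop stages described in steps 1 to 4 can be sketched as a small control loop for a single non-leaf node. This is a simplified model under stated assumptions, not the WSMED code: the function names, the threshold value, the max_children bound, and the synthetic timing model are all hypothetical, and the measurement of ti is replaced by a function of the number of children.

```python
def adapt_fanout(time_per_tuple, p=1, threshold=0.1,
                 use_drop=True, max_children=64):
    """Return the number of children chosen for one non-leaf node.

    time_per_tuple(n) stands in for the measured average time to
    produce an incoming tuple when the node has n children.
    """
    children = 2                          # init stage: binary tree
    t_prev = time_per_tuple(children)     # first monitoring cycle
    while children + p <= max_children:
        children += p                     # add stage: p new children
        t_new = time_per_tuple(children)  # next monitoring cycle
        if t_new < t_prev * (1 - threshold):
            t_prev = t_new                # clear improvement: rerun add stage
            continue
        if t_new > t_prev and use_drop:
            children -= 1                 # drop stage: remove one child
        break                             # otherwise add no more children
    return children


# Synthetic model (assumption): parallelism helps until the service
# saturates, and every extra child adds a small overhead.
def model(n):
    return max(1.0 / n, 0.2) + 0.01 * n


print(adapt_fanout(model), adapt_fanout(model, use_drop=False))
```

With this model the loop stops adding children once the per-tuple time starts to rise, and the drop stage removes one of the children added last.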

Page 26: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Adaptive results

p: number of children added after each monitoring cycle

[Figure: execution time (sec), 0 to 300, for methods with different p values, compared with the non-parallel plan and the best FF_APPLYP. The non-parallel plan, the best FF_APPLYP, and the best AFF_APPLYP are marked.]

Legend (fanouts fo1 and fo2 per method):

p   no drop stage      drop stage
1   fo1=3, fo2=3       fo1=2, fo2=3
2   fo1=4, fo2=5       fo1=3, fo2=3
3   fo1=5, fo2=3.4     fo1=4, fo2=3.25
4   fo1=6, fo2=8.7     fo1=5, fo2=4.2
5   fo1=7, fo2=7.5     fo1=6, fo2=7.8

Page 27: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Parameterized adaptive parallelization

Queries calling data providing web services often have both dependent & independent calls:

[Figure: call graph over WS1 to WS5 containing both dependent and independent calls.]

AFF_APPLYP can also handle independent calls, but will treat them as a sequence (suboptimal):

[Figure: WS1, WS2, WS3, WS4, WS5 called as one sequence.]

The PAP operator adaptively parallelizes independent and dependent calls

Page 28: Ph d defense_Department of Information Technology, Uppsala University, Sweden

PAP operator (Parameterized Adaptive Parallelization)

PAP(Vector of Function VPF, Stream pstream, Vector argorder, Vector resorder) → Stream result

VPF – set of plan functions

pstream – stream of argument values pi

argorder – argument order

resorder – result order

result – stream of results rj

Different plan functions use different argument values from an argument tuple in pstream

• argorder specifies for each plan function how to form its arguments

• Similarly, resorder specifies how the result of PAP is constructed from the results of its children

Asynchronous operator
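
The roles of argorder and resorder can be illustrated with a small sketch. The slides do not give the exact encoding, so the following is an assumption: argorder holds, per plan function, the positions of the argument tuple it consumes, and resorder is a list of (plan function index, result position) pairs. For simplicity each plan function returns one result tuple per call, the sketch is not adaptive, and the plan functions weather and place are hypothetical stubs.

```python
from concurrent.futures import ThreadPoolExecutor


def pap(vpf, pstream, argorder, resorder):
    """Call the independent plan functions in vpf in parallel for each
    argument tuple and assemble one result tuple per argument tuple."""
    with ThreadPoolExecutor(max_workers=len(vpf)) as pool:
        for args in pstream:
            # argorder picks the arguments of each plan function.
            futures = [pool.submit(pf, *(args[i] for i in order))
                       for pf, order in zip(vpf, argorder)]
            results = [f.result() for f in futures]
            # resorder picks which result values form the output tuple.
            yield tuple(results[f][pos] for f, pos in resorder)


# Hypothetical plan functions, each returning a single result tuple.
def weather(zipcode):
    return (zipcode, "sunny")


def place(state, city):
    return (city + ", " + state,)


out = list(pap([weather, place],
               [("30301", "GA", "Atlanta")],
               argorder=[[0], [1, 2]],
               resorder=[(1, 0), (0, 1)]))
print(out)
```

Here weather receives position 0 of the argument tuple, place receives positions 1 and 2, and the output tuple combines the place name with the forecast.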

Page 29: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Experimental study

Dependent (D): All web service operations are called using AFF_APPLYP

[Figure: WS1, WS2, WS3, WS4, WS5 called as a sequence.]

Cached dependent (CD): Modifies D by caching the results of web service operation calls using AFF_APPLYP

[Figure: WS1, WS2, WS3, WS4, WS5 called as a sequence, with a cache.]

Independent (I): Parallel independent calls using PAP

[Figure: call graph over WS1 to WS5 with independent calls in parallel.]
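
The caching used in the CD strategy can be illustrated generically with memoization: repeated calls with the same argument values are answered from memory instead of calling the web service again. This sketch uses Python's functools.lru_cache; the function get_zip_info and its return value are hypothetical stand-ins for a web service operation, and the slides do not describe how WSMED implements its cache.

```python
from functools import lru_cache

calls = 0  # counts how often the (stubbed) web service is really called


@lru_cache(maxsize=None)
def get_zip_info(zipcode):
    global calls
    calls += 1
    # A real implementation would call the web service operation here.
    return "info for " + zipcode


results = [get_zip_info(z) for z in ["30301", "30302", "30301"]]
print(calls)  # the repeated zip code is served from the cache
```

Such a cache only pays off when the same argument values recur in the stream of calls.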

Page 30: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Experimental results

[Figure, panel "Experiments with adaptive strategies": execution time (sec), 0 to 700, versus number of zip codes, 0 to 2000, for D, CD, and I.]

[Figure, panel "Relative scalability": execution time difference (sec), 0 to 160, versus number of zip codes, 0 to 2000, for D-I, D-CD, and CD-I.]

Page 31: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Outline

WSMED Architecture

Semantic Enrichments

Adaptive Parallelization

Web Service Query Service

Related Work & Future Directions

Page 32: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Web service query service

WSMED assumes data sources are web service operations

• How to handle a data providing system not available as a web service?

The conventional way:

• Develop software, define WSDL, deploy the interface code

Our approach: WSMED Web Service Generator

• Once a data source is defined as an Amos II mediator system, the generator automatically generates web service interfaces, generates WSDL, and dynamically deploys the web service

The WSMED query service is automatically generated by the WSMED Web Service Generator

Everything as a Service paradigm (XaaS)

• URL to use WSMED web service: http://udbl2.it.uu.se/WSMED/wsmed.html

Page 33: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Outline

WSMED Architecture

Semantic Enrichments

Adaptive Parallelization

Web Service Query Service

Related Work & Future Directions

Page 34: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Contributions of papers

Research questions Paper I Paper II Paper III Paper IV

1. How can web service standards be automatically utilized?

A A A

2. How can views of web service operations be automatically generated?

PA A A

3. How can query optimization be used to provide efficient and scalable search from web services?

PA PA A

4. How can the query optimizer speed up queries without any cost estimate?

PA PA A

5. How can data sources that are not accessible via web services be transformed into web services?

A

6. How can Everything as a Service paradigm be used for querying web services?

A

A - Answered, PA - Partially answered

1. Paper I - Semantic enrichments

2. Paper II - Adaptive parallelization with dependent calls: AFF_APPLYP

3. Paper III - Adaptive parallelization with dependent & independent calls: PAP

4. Paper IV - Web service query service

Page 35: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Related work

WSMS (U. Srivastava, J. Widom, K. Munagala, and R. Motwani, Query Optimization over Web Services, VLDB 2006)

• WSMED also invokes parallel web service calls

• WSMS has a static cost model

• WSMED supports adaptive parallelization without any static cost model

Eddies (R. Avnur, et al., Eddies: Continuously adaptive query processing, SIGMOD, 2000)

• Adaptive operator

• Eddies dynamically adapt the algebra expression

• PAP speeds up the calls to individual plan functions for a given algebra expression

Two-phase query optimization strategies in distributed databases (W. Hasan, Optimization of SQL queries for Parallel Machines, 1997)

• Two-phase optimization

• Two-phase query optimization used a static cost model to statically distribute execution plans

• WSMED supports adaptive parallelization without any static cost model

Page 36: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Future directions

The WSMED approach relies on calling side-effect-free data providing web service operations

• The WSDL language does not provide metadata describing side effects

• When such a standard is available WSMED can utilize it to guarantee query correctness by managing the updatable views.

All performance measurements were made with publicly available web service operations

• Development of a benchmark to simulate the parallel web service calls for controlled experiments.

Page 37: Ph d defense_Department of Information Technology, Uppsala University, Sweden

Thank you for your attention

?

“The un-queried life is not worth living”