
Source: user.it.uu.se/~torer/wim/GSDM_WIM03.pdf

Sep 24, 2018

Page 1:

GRID Stream Database Management for Scientific Applications

Milena Ivanova (Koparanova) and Tore Risch

IT Department, Uppsala University, Sweden

Page 2:

Outline
• Motivation
• Stream Data Management
• Computational GRIDs
• GSDM Distributed Architecture
• Project status
• Next steps
• GSDM Demo

Page 3:

Motivation: LOFAR/LOIS Application

[Figure: LOFAR/LOIS system overview. Labels: data fabric with 13000 nodes; on-line hybrid data processing cluster; user post processing; data processing plant; control center(s).]

Page 4:

Motivation: LOFAR/LOIS Application
• Geographically distributed set of electromagnetic sensors and emitters
• Data products: raw data streams (beams)
• Very high data volume and rate
• Complex numerical data
• Continuous queries: filtering, reduction, combining of data streams
• User-defined computational functions

Problem: High-performance processing of distributed continuous queries over many data streams, utilizing GRID infrastructure to achieve high performance.

Page 5:

How are stream data different?
Stream peculiarities:
• Infinite
• Representation of substreams of limited size: windows
• Continuous Queries (CQs)
• Immediate processing of elements, followed by archiving or deletion
• Order-preserving and non-blocking processing
• Stream-specific operations, e.g., window join, moving average
• DBMS techniques for streams: approximate query answering, adaptability, data reduction, multi-query optimization
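The window and moving-average ideas above can be illustrated with a minimal Python sketch. It is not GSDM code: the function name `moving_average` and the window size are assumptions chosen for illustration. It shows how a bounded window lets a query run over an infinite stream while emitting each result immediately and in arrival order.

```python
from collections import deque
from itertools import count, islice
from typing import Iterable, Iterator


def moving_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Yield the mean of the last `size` elements once the window is full."""
    window: deque = deque(maxlen=size)  # bounded window over an unbounded stream
    total = 0.0
    for x in stream:
        if len(window) == size:
            total -= window[0]  # oldest element is about to leave the window
        window.append(x)
        total += x
        if len(window) == size:
            # Emitted as soon as the element arrives: non-blocking, order preserving.
            yield total / size


# count() is an infinite stream 0, 1, 2, ...; only four results are consumed.
first_results = list(islice(moving_average(count(), 3), 4))
print(first_results)  # [1.0, 2.0, 3.0, 4.0]
```

Because the generator never waits for the end of its input, it can be composed with other stream operators without blocking them.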

Page 6:

GRID Computing
• Heterogeneous sets of clusters of computers
• For applications that need many CPU cycles
• Toolkits (e.g. GLOBUS, CONDOR-G) provide transparent access to GRID clusters and parallelization of jobs
• Batch processing (upload – compute – download)
• High data volume bottleneck

Page 7:

GSDM as a Computational GRID Application

High data flow rate and large data volumes require high performance → Parallelism and distribution

Varying computer resource needs because of varying number of queries and streams, varying stream rates → Dynamic resource allocation

→ Computational GRID infrastructure

Page 8:

GSDM: GRID Stream Database Manager for Scientific Data

• Approach
– Distributed and parallel
– Stream-oriented
– Main-memory Object-Relational DBMS
– Utilization of GRID infrastructure for achieving high performance

Page 9:

GSDM Scenario

[Figure: GSDM scenario on the GRID. Components shown: two Applications, Query Coordinator 1 and 2, Metadata Manager, Working Nodes 1 to 5, Data Beams 1 to 3, and continuous queries CQ1 and CQ2. Legend: arrows denote data flow.]

Page 10:

GSDM Distributed Architecture

Four types of nodes
• Metadata manager
• Coordinator server
• Working node
• Application

Page 11:

Metadata and Coordinator Server

Functions:
• store metadata about the system
• decompose, install and activate CQs
• assign queries to working nodes
• start and kill working nodes
• coordinate processing at working nodes

Page 12:

System Metadata
• Type GSDM
– name(GSDM) -> Charstring
– working_node(GSDM) -> Boolean
– coordinator(GSDM) -> Boolean
– cur_load(GSDM) -> Real
• Type Query
– qid(Query) -> Charstring
– query_string(Query) -> Charstring
– producers(Query) -> Bag of GSDM
– consumers(Query) -> Bag of GSDM
– installed(Query, GSDM) -> Boolean
– active(Query, GSDM) -> Boolean
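To make the schema concrete, here is a rough Python model of the two metadata types together with one possible way a coordinator could use `cur_load` when assigning a query. The slides do not state the assignment policy; the least-loaded rule, the `assign` function, and the `cost` parameter are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GSDMNode:
    """Mirrors Type GSDM: name, working_node, coordinator, cur_load."""
    name: str
    working_node: bool = False
    coordinator: bool = False
    cur_load: float = 0.0


@dataclass
class Query:
    """Mirrors Type Query; installed/active are keyed by node name."""
    qid: str
    query_string: str
    producers: List[GSDMNode] = field(default_factory=list)
    consumers: List[GSDMNode] = field(default_factory=list)
    installed: Dict[str, bool] = field(default_factory=dict)
    active: Dict[str, bool] = field(default_factory=dict)


def assign(query: Query, nodes: List[GSDMNode], cost: float = 1.0) -> GSDMNode:
    """Hypothetical policy: install the query on the least-loaded working node."""
    workers = [n for n in nodes if n.working_node]
    if not workers:
        raise ValueError("no working node registered")
    target = min(workers, key=lambda n: n.cur_load)
    target.cur_load += cost              # assumed load accounting
    query.installed[target.name] = True
    query.active[target.name] = False    # activation is a separate step
    return target


nodes = [
    GSDMNode("coord", coordinator=True),
    GSDMNode("wn1", working_node=True, cur_load=2.0),
    GSDMNode("wn2", working_node=True, cur_load=0.5),
]
q = Query("q1", "select ...")
chosen = assign(q, nodes)
print(chosen.name)  # wn2
```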

Page 13:

Working Node
• Function: to process continuous queries over streams
• List of active CQs
• CQ execution loop
– Stream Consumer receives pushed data and incoming commands
– Scheduler picks the next CQ ready for execution
– Executes the CQ over the current stream windows
– Continuous Query Producer sends the streamed result to the consumer (application or working node)
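The execution loop above could be sketched as follows. This is a simplified single-process illustration, not the GSDM implementation: the class names, the `fresh` flag, and the first-ready scheduling rule are assumptions, and handling of incoming commands is omitted.

```python
from collections import deque
from typing import Callable, List


class ContinuousQuery:
    def __init__(self, qid: str, fn: Callable[[list], object], window_size: int):
        self.qid = qid
        self.fn = fn
        self.window_size = window_size
        self.window: deque = deque(maxlen=window_size)  # current stream window
        self.fresh = False  # True when data arrived since the last execution

    def ready(self) -> bool:
        return self.fresh and len(self.window) == self.window_size


class WorkingNode:
    def __init__(self, send: Callable[[str, object], None]):
        self.active: List[ContinuousQuery] = []  # list of active CQs
        self.send = send                          # stands in for the CQ producer

    def consume(self, element) -> None:
        """Stream consumer: store a pushed element in every CQ window."""
        for cq in self.active:
            cq.window.append(element)
            cq.fresh = True

    def step(self) -> bool:
        """One scheduler iteration: execute the first ready CQ, if any."""
        for cq in self.active:
            if cq.ready():
                result = cq.fn(list(cq.window))
                cq.fresh = False
                self.send(cq.qid, result)  # stream the result to the consumer
                return True
        return False


results = []
node = WorkingNode(send=lambda qid, r: results.append((qid, r)))
node.active.append(ContinuousQuery("max3", max, 3))
for x in [5, 1, 4, 2]:
    node.consume(x)
    node.step()
print(results)  # [('max3', 5), ('max3', 4)]
```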

Page 14:

GSDM Working Node Architecture

[Figure: two GSDM working nodes connected by a GSDM stream. Components shown: data sources (Beam 1, Beam 2), Stream Consumers, Continuous Query Manager, Query Executor, Plug-ins, Continuous Query Producers, and the Application.]

Page 15:

GSDM Command Line User Interface

Commands are sent from the application and executed at the coordinator.

• monitor(Charstring qstring) -> Charstring qid;
• run(Charstring qid) -> Boolean;
• stop(Charstring qid) -> Boolean;
• status(Charstring qid) -> <Charstring qstring, Charstring prodname>;
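As a rough illustration of these four signatures, the sketch below models them as methods of an in-memory Python class. Only the signatures come from the slide; the qid numbering scheme, the reading of `prodname` as the name of the node producing the result, and the `False` return for unknown or already running queries are assumptions.

```python
from itertools import count
from typing import Dict, Tuple


class Coordinator:
    def __init__(self, default_producer: str = "wn1"):
        self._ids = count(1)
        self._queries: Dict[str, dict] = {}
        self._default_producer = default_producer  # hypothetical placeholder

    def monitor(self, qstring: str) -> str:
        """Register a continuous query and return its qid."""
        qid = f"q{next(self._ids)}"  # assumed id scheme
        self._queries[qid] = {
            "qstring": qstring,
            "prodname": self._default_producer,
            "running": False,
        }
        return qid

    def run(self, qid: str) -> bool:
        q = self._queries.get(qid)
        if q is None or q["running"]:
            return False
        q["running"] = True
        return True

    def stop(self, qid: str) -> bool:
        q = self._queries.get(qid)
        if q is None or not q["running"]:
            return False
        q["running"] = False
        return True

    def status(self, qid: str) -> Tuple[str, str]:
        q = self._queries[qid]
        return (q["qstring"], q["prodname"])


coordinator = Coordinator()
qid = coordinator.monitor("select visualized_fft();")
print(coordinator.run(qid), coordinator.status(qid), coordinator.stop(qid))
```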

Page 16:

GSDM Communication Primitives

[Figure: message sequence between the User Node, the Metadata and Coordinator server, and a Working Node. Messages shown: Monitor Query, Qid; Start node, Register WN; Install query qid, qstring; Installed qid; Activate qid; Activated qid; Deactivate qid; Deactivated qid; Uninstall qid; Uninstalled qid; Status, Qid; Run, Qid; Kill node.]
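The install, activate, deactivate, and uninstall messages suggest a small per-query lifecycle at each working node. The sketch below is one possible reading, written as a state machine in Python. The state names, the allowed transitions, and the `error` reply are assumptions; the slide only lists the message names.

```python
from typing import Dict, Tuple

# (current state, command) -> (new state, reply keyword); assumed transitions.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("absent", "install"): ("installed", "installed"),
    ("installed", "activate"): ("active", "activated"),
    ("active", "deactivate"): ("installed", "deactivated"),
    ("installed", "uninstall"): ("absent", "uninstalled"),
}


class WorkingNodeProtocol:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}  # qid -> lifecycle state

    def handle(self, command: str, qid: str) -> str:
        """Apply a coordinator command and return the acknowledgement text."""
        current = self.state.get(qid, "absent")
        step = TRANSITIONS.get((current, command))
        if step is None:
            return f"error {qid}"  # hypothetical reply for an invalid command
        new_state, reply = step
        if new_state == "absent":
            self.state.pop(qid, None)
        else:
            self.state[qid] = new_state
        return f"{reply} {qid}"


wn = WorkingNodeProtocol()
replies = [wn.handle(cmd, "q1")
           for cmd in ["install", "activate", "deactivate", "uninstall"]]
print(replies)
```

Keeping the transitions in a table makes it easy to see which commands are accepted in which state, for example that a query must be deactivated before it can be uninstalled under these assumptions.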

Page 17:

Current status

– GSDM prototype architecture
– Initial GSDM prototype implementation
– Formulation of continuous queries with user-defined computational functions from space physics
– UDP receiver for radio data on the Internet
– Data beam simulator

Page 18:

Related Work in Stream DBMS
• Relatively low stream rates, centralized architectures
• Relational representations and relational operations modified for stream processing: selection, join, aggregations

Our system
• High rates of LOIS/LOFAR streams
• Distributed architecture
• User-defined functions over non-relational data representations

Page 19:

GSDM requirements for GRID

• Interactive job: users can install and stop CQs interactively
• Accessibility & security issues
– Data delivery directly to the working nodes, e.g. inside of clusters
– High-performance communication between GSDM servers running on different GRID resources

Page 20:

Next steps

• Evaluation of initial prototype implementation and development

• Optimization of distributed stream database queries

• Utilize GRID infrastructure functionality for resource brokering and management

• New GRID services for streaming data

Page 21:

GSDM Demo Scenario

[Figure: demo setup. Components shown: Application (UI), Name Server, Metadata and Coordinator Server, Working Node 1, and Application GSDM. Labels on the connections: CQ, CQ Results.]

Page 22:

Demo scenario
• Database schema
– start_udp_consumer(char s) -> integer;
– stop_udp_consumer() -> integer;
– change_udp_scale(real i) -> integer;
– get_udp_packet() -> <vector of complex, vector of complex, vector of complex>;
• Example CQ

create function visualized_fft() -> boolean
  as select repaint(v, 6.0)
  from vector of real v, vector of complex x,
       vector of complex y, vector of complex z
  where v = vect_log_magnitude(fft(windowed_fft(x)))
    and get_udp_packet() = <x,y,z>;
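The numerical part of this query can be approximated in plain Python to show what a log-magnitude spectrum of one packet component looks like. This is an interpretation, not the GSDM functions: the Hann window, the naive DFT, the base-10 logarithm, and the single transform applied after windowing are all assumptions, and the helper names are hypothetical.

```python
import cmath
import math
from typing import List


def hann(samples: List[complex]) -> List[complex]:
    """Apply a periodic Hann window (assumed; the slide does not name the window)."""
    n = len(samples)
    return [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
            for i, s in enumerate(samples)]


def dft(samples: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples))
            for k in range(n)]


def log_magnitude(spectrum: List[complex], floor: float = 1e-12) -> List[float]:
    """Base-10 log of |c| per bin; `floor` avoids log(0)."""
    return [math.log10(max(abs(c), floor)) for c in spectrum]


# Synthetic complex tone at frequency bin 5 stands in for one packet component x.
n = 32
tone = [cmath.exp(2j * math.pi * 5 * i / n) for i in range(n)]
v = log_magnitude(dft(hann(tone)))
peak = max(range(n), key=lambda k: v[k])
print(peak)  # 5
```

In the demo, a vector like `v` is what `repaint` would receive for display; here it is only inspected for its peak bin.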