SciQL: A Query Language for Unified Scientific Data Processing and Management
Javad Chamanara, University of Jena, Germany
CIKM 2012, Maui, HI, USA, Nov. 2, 2012
SciQL: A Query Language for Unified Scientific Data Processing and Management, 5th Ph.D. Workshop (PIKM) at CIKM 2012, Maui, HI, USA
What is scientific data?
What is available?
What is proposed here?
What does it provide?
A Sample
Define Perspective p1 As
{
    Attribute Temp_Fahrenheit MapTo Function(1.8 * Temp_Celsius + 32)
    Attribute SN_mg MapTo Function(SN_g * 1000)
    Attribute Year MapTo Function(Year(Timestamp)) DataType=Integer
}
Connection d Adapter=Spreadsheet Source_URI="c:\data\data1.xls"
Bind Perspective=p1 Connection=d Version=Latest As pdLatest
Var pdAll = Select From pdLatest
Draw Data=pdLatest GraphType=Scatter V-Axis=SN_mg H-Axis=Temp_Fahrenheit
Var pdGroupped = Select Average(Temp_Fahrenheit) As Avg From pdLatest Group By Year
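To make the idea of a perspective more concrete, here is a minimal Python sketch of what binding p1 to a data source might amount to: each attribute is computed from the raw fields of a record. The sample records, the `bind` helper, and the dictionary layout are assumptions made for illustration; they are not part of SciQL.

```python
from datetime import datetime

# Hypothetical raw records, using the field names the sample perspective reads.
raw_rows = [
    {"Temp_Celsius": 20.0, "SN_g": 0.5, "Timestamp": datetime(2003, 6, 1)},
    {"Temp_Celsius": 0.0, "SN_g": 1.25, "Timestamp": datetime(2004, 1, 15)},
]

# Each perspective attribute is a function over one raw record,
# mirroring the three MapTo clauses of perspective p1.
p1 = {
    "Temp_Fahrenheit": lambda r: 1.8 * r["Temp_Celsius"] + 32,
    "SN_mg": lambda r: r["SN_g"] * 1000,
    "Year": lambda r: int(r["Timestamp"].year),
}


def bind(perspective, rows):
    """Apply every attribute mapping to every raw record (illustrative helper)."""
    return [{name: fn(row) for name, fn in perspective.items()} for row in rows]


pd_latest = bind(p1, raw_rows)
```

Queries and plots would then refer only to the perspective's attributes (Temp_Fahrenheit, SN_mg, Year), never to the raw field names.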
How does it work?
Var x = Select Average(Temp_Fahrenheit) As Avg From pdLatest Where Year > 2001 Group By Year
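The query above can be read as a pipeline of four operations: fetch, filter, aggregate, project. The following Python sketch illustrates that reading on a few made-up records. The function names, the sample data, and the choice to order results by group key are assumptions for illustration, not SciQL's actual implementation.

```python
from collections import defaultdict

# Hypothetical records as they might appear through pdLatest.
pd_latest = [
    {"Year": 2001, "Temp_Fahrenheit": 50.0},
    {"Year": 2002, "Temp_Fahrenheit": 60.0},
    {"Year": 2002, "Temp_Fahrenheit": 70.0},
    {"Year": 2003, "Temp_Fahrenheit": 64.0},
]


def fetch(source):
    """Fetch: produce the records of the bound data source."""
    return iter(source)


def filter_rows(rows, predicate):
    """Filter: keep only the records that satisfy the Where clause."""
    return (r for r in rows if predicate(r))


def aggregate(rows, key, value):
    """Aggregate: collect the values of one attribute per group key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    return groups


def project(groups, alias):
    """Project: emit one averaged row per group, ordered by group key."""
    return [{alias: sum(v) / len(v)} for _, v in sorted(groups.items())]


# Var x = Select Average(Temp_Fahrenheit) As Avg From pdLatest
#         Where Year > 2001 Group By Year
x = project(
    aggregate(
        filter_rows(fetch(pd_latest), lambda r: r["Year"] > 2001),
        key="Year",
        value="Temp_Fahrenheit",
    ),
    alias="Avg",
)
```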
How does it work? (AST)
[Figure: abstract syntax tree of the query. Node labels: VAR DEF (=), Var x, Select, Avg, pdLatest, > (Year, 2001), Group (Year). Operation legend: Fetch, Filter, Aggregate, Project.]
How does it work? (E-AST, CSV Adapter)
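[Figure: E-AST of the query for the CSV adapter. Node labels: VAR DEF (=), Var x, Select, Avg, pdLatest, > (Year, 2001), Group (Year). Operation legend: Fetch, Filter, Aggregate, Project. Node annotation: CSV.]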
How does it work? (E-AST, Excel Adapter)
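[Figure: E-AST of the query for the Excel adapter. Node labels: VAR DEF (=), Var x, Select, Avg, pdLatest, > (Year, 2001), Group (Year). Operation legend: Fetch, Filter, Aggregate, Project. Node annotations: Default (three nodes), Excel (three nodes).]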
How does it work? (E-AST, Database Adapter)
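[Figure: E-AST of the query for the database adapter. Node labels: VAR DEF (=), Var x, Select, Avg, pdLatest, > (Year, 2001), Group (Year). Operation legend: Fetch, Filter, Aggregate, Project. Node annotations: Default (one node), DB (five nodes).]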
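One way to read the three E-AST slides is that each node of the tree is tagged with the component that will execute it: the adapter when the data source can do the work natively, the default engine otherwise. The Python sketch below illustrates that reading. The `Node` class, the `annotate` and `plan` helpers, and the capability sets are assumptions made for illustration; the slides do not specify which operations each adapter supports.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)
    executor: str = ""  # filled in by annotate()


def annotate(node, adapter_name, capabilities):
    """Tag each node with the adapter if it supports the op, else "Default"."""
    node.executor = adapter_name if node.op in capabilities else "Default"
    for child in node.children:
        annotate(child, adapter_name, capabilities)
    return node


def plan(node):
    """Flatten the annotated tree to (op, executor) pairs in pre-order."""
    pairs = [(node.op, node.executor)]
    for child in node.children:
        pairs.extend(plan(child))
    return pairs


def build_query():
    # Simplified tree: a Select node with one child per operation.
    return Node(
        "Select",
        [Node("Project"), Node("Fetch"), Node("Filter"), Node("Aggregate")],
    )


# Assumed capability sets: a CSV file can only be read, a DBMS can do everything.
csv_plan = plan(annotate(build_query(), "CSV", {"Fetch"}))
db_plan = plan(
    annotate(
        build_query(),
        "DB",
        {"Select", "Project", "Fetch", "Filter", "Aggregate"},
    )
)
```

Under these assumptions the same query tree yields different execution plans per data source, while the query text written by the scientist stays unchanged.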
Design
• Grammar
• Architecture
• Execution Engine
SciQL Language Constructs
The Grammar
General Architecture
[Figure: component diagram. Clients: Custom Application, Matlab, R Console, Declarative Console. Core: SciQL. Adapters: Spreadsheet Adapter, RDBMS Adapter, Vendor Specific Adapter. Data sources: CSV, Spreadsheet, RDBMS, Other.]
Query Execution Engine
[Figure: query execution flow. Components: Query Engine, Query Execution Engine, Adapter, Data Source. Exchanged artifacts: E-AST, Result set.]
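The division of labour between the execution engine and an adapter can be sketched as follows. This is a minimal Python illustration, assuming an adapter contract with a single `fetch` method and an engine that filters the fetched records itself; the class names, the method signatures, and the in-memory CSV text are hypothetical and not taken from SciQL.

```python
import csv
import io
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical adapter contract: turn a data source into records."""

    @abstractmethod
    def fetch(self):
        ...


class CsvAdapter(Adapter):
    def __init__(self, text):
        self.text = text

    def fetch(self):
        # csv.DictReader yields one mapping per row, keyed by the header line.
        return list(csv.DictReader(io.StringIO(self.text)))


class QueryExecutionEngine:
    def __init__(self, adapter):
        self.adapter = adapter

    def execute(self, predicate):
        """Fetch through the adapter, then filter in the engine itself."""
        return [row for row in self.adapter.fetch() if predicate(row)]


# Illustrative data source; CSV values arrive as strings.
source = "Year,Temp_Celsius\n2001,10\n2002,15\n2003,20\n"
engine = QueryExecutionEngine(CsvAdapter(source))
result_set = engine.execute(lambda row: int(row["Year"]) > 2001)
```

Swapping `CsvAdapter` for another `Adapter` subclass would leave the engine and the calling code untouched, which is the point of the adapter layer in the architecture.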
Mapping
[Figure: component diagram "Perspective". Perspective 1 relates Data Fields 1-4 to Attributes 1-3; Perspective 2 relates Data Fields 1-3 to Attributes A-C. Other labels: Data, Port1.]
What would be the benefits?
• Scientists deal with just one language
• It has a data-source-independent instruction set
• It is easier to learn and share
• Integration with other tools is easy
• It reduces the need for computer knowledge
The Evaluation Plan
• To be used in the context of BExIS
  – Big and diverse user community
  – Various data
• Open source and free
  – Early feedback
  – Contribution
The Work Plan
• Define the grammar of the language – 6-9 months
• Compare to related works and revise – 3-6 months
• Compile the formal specification of the language – 3-6 months
• Develop the proof of concept implementation – 9-12 months
• Evaluation – 6 months
Thanks