Review of the Claremont Report on Database Research
Jiaheng Lu
Renmin University of China
Outline
  Five challenges in database research
    Database engine revisiting
    Declarative programming
    Structured and unstructured data
    Cloud data management
    Mobile applications
  Our research to meet those challenges
Challenges for databases: senior database researcher meetings
  Senior database researchers have gathered every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus.
    Laguna Beach, Calif. in 1989
    Palo Alto, Calif. (“Lagunita”) in 1990 and 1995
    Cambridge, Mass. in 1996
    Asilomar, Calif. in 1998
    Lowell, Mass. in 2003
Revisiting database engines (1)
  Traditional database engines do NOT work well for:
    OLTP systems: data provenance, schema evolution and versioning
    Text indexing
    Media delivery
    ……
Revisiting database engines (2)
  Research topics
    Remote RAM and flash as persistent media
    Treat query optimization and physical data layout as a unified, adaptive, self-tuning task
    Compressing and encrypting data with query optimization
    Designing systems that embrace non-relational data models
Declarative programming for emerging platforms (1)
  Data-centric approach for emerging platforms
    Manycore chips
    Distributed services
    Cloud computing platforms
    …..
Declarative programming for emerging platforms (2)
  Good examples
    MapReduce: data parallelism
    Ruby on Rails: query-like logic
    XQuery
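The data parallelism behind the MapReduce example can be sketched in a few lines. This is only an illustrative, single-machine word count, not the API of any real MapReduce framework; the names `map_phase`, `reduce_phase`, and `map_reduce` and the sample records are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    """Combine all values emitted for one key."""
    return word, sum(counts)


def map_reduce(records, workers=4):
    # Map: records are independent, so they can be processed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, records))

    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce: each key is handled independently of the others.
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical input records, chosen only for illustration.
word_counts = map_reduce(["cloud data management", "cloud data privacy", "data"])
```

The programmer states only what happens per record and per key; how the work is split across workers is left to the runtime, which is the declarative, data-centric point of the slide.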
The interplay of structured and unstructured data (1)
  Witnessing a growing amount of structured data
    Millions of hidden databases (Deep Web)
    Millions of HTML tables and mashups
    Web 2.0 services: photo and video websites
The interplay of structured and unstructured data (2)
  Research challenge:
    Extract structured meaning from unstructured data (IR, ML)
    Querying and deriving insight from heterogeneous data
      Keyword queries
      Pay-as-you-go fashion
Cloud data management (1)
  Cloud service: shared commodity hardware for computing and storage
    Application service (salesforce.com)
    Storage service (Amazon Web Services)
    Computing service (Google App Engine)
    Data service (Microsoft SQL Server data center)
Cloud data management (2)
  Research challenge
    Self-managing databases: limited human intervention, various workloads
    Large-scale query processing and optimization
    Data security and privacy with sharing
Our research to meet challenges
  XML search
  Approximate string search
  Cloud data management
  Mobile data privacy
  DataSpace, ……
XML search (1)
  XML twig query processing (SIGMOD’05, VLDB’05)
  Problem statement
    Given an XML twig pattern Q and an XML database D, find ALL the matches of Q on D.
  [Figure: an XML tree with nodes s1, s2, t1, t2, p1, f1, and a twig pattern with root Section and branches Title and Figure]
  Query answers: (s1, t1, f1) (s2, t2, f1) (s1, t2, f1)
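The problem statement can be made concrete with a naive matcher. This is a brute-force sketch, not the algorithm from the cited papers. Since the figure did not survive, the tree shape in `CHILDREN`, the `Paragraph` tag for p1, and the reading of both twig edges as ancestor-descendant are assumptions, chosen so that the result agrees with the three answers listed on the slide.

```python
from itertools import product

# Assumed tree shape (parent -> children); the slide only gives the node names.
CHILDREN = {
    "s1": ["t1", "s2"],
    "s2": ["t2", "p1"],
    "p1": ["f1"],
}
# Assumed tags; "Paragraph" for p1 is a guess and is not used by the query.
TAG = {"s1": "Section", "s2": "Section", "t1": "Title", "t2": "Title",
       "p1": "Paragraph", "f1": "Figure"}


def descendants(node):
    """Yield every node strictly below `node`, in document order."""
    for child in CHILDREN.get(node, []):
        yield child
        yield from descendants(child)


def match_twig(root_tag, branch_tags):
    """Return all (root, branch_1, ..., branch_n) tuples matching the twig."""
    answers = []
    for node, tag in TAG.items():
        if tag != root_tag:
            continue
        below = list(descendants(node))
        # Candidates for each branch are the descendants carrying its tag.
        per_branch = [[d for d in below if TAG[d] == b] for b in branch_tags]
        answers.extend((node, *combo) for combo in product(*per_branch))
    return answers


# Twig pattern: Section with a Title descendant and a Figure descendant.
answers = match_twig("Section", ["Title", "Figure"])
```

Enumerating every combination like this produces large intermediate results on real documents, which is what makes efficient twig query processing a research problem.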
XML search (2)
  XML keyword search (ICDE’09)
  Problem statement
    How to efficiently rank the results of an XML keyword query
  Contribution
    Extend TF/IDF by incorporating the structure of XML data
Approximate string search
  Approximate string queries (ICDE’08, ’09)
  Problem statement
    Given a collection of string data, how to efficiently perform approximate search
  [Figure: a "Star" column containing …, Schwarzenger, Samuel Jackson, Keanu Reeves; the query Schwarrzenger is submitted to Search]
  Output: strings s that satisfy Sim(q,s)≤δ
Main example
  Data
    id  string
    0   rich
    1   stick
    2   stich
    3   stuck
    4   static
  Grams (inverted lists of string ids)
    ck  1,3
    ic  0,1,2,4
    st  1,2,3,4
    ta  4
    ti  1,2,4
    …
  Query: stick (st, ti, ic, ck), ed(s,q)≤1
    st  1,2,3,4
    ti  1,2,4
    ic  0,1,2,4
    ck  1,3
  Merge (count >= 2): performance bottleneck!
  Candidate string ids: {1,2,3,4}
  Double check for the real edit distance
  Final answers: {1,2,3}
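The example can be reproduced with a short sketch of gram-based filtering followed by verification. This is an illustrative version, not the implementation from the cited papers. The function names are hypothetical, and the threshold formula (number of query grams minus k times q) is the standard count filter, assumed here because it yields the "count >= 2" shown on the slide.

```python
from collections import Counter, defaultdict

STRINGS = ["rich", "stick", "stich", "stuck", "static"]  # ids 0..4, as on the slide


def grams(s, q=2):
    """Return the overlapping q-grams of s, e.g. stick -> st, ti, ic, ck."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Map each gram to the sorted list of ids of the strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in grams(s, q):
            index[g].add(sid)
    return {g: sorted(ids) for g, ids in index.items()}


def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def search(query, strings, k=1, q=2):
    """Return (candidate ids, final answer ids) for ed(s, query) <= k."""
    index = build_index(strings, q)
    query_grams = grams(query, q)
    # Count filter: one edit operation destroys at most q grams.
    threshold = len(query_grams) - k * q
    if threshold <= 0:
        # The filter has no pruning power here; every string is a candidate.
        candidates = list(range(len(strings)))
    else:
        counts = Counter()
        for g in query_grams:  # merge the inverted lists of the query grams
            counts.update(index.get(g, []))
        candidates = sorted(sid for sid, c in counts.items() if c >= threshold)
    # Verification: compute the real edit distance for the survivors only.
    answers = [sid for sid in candidates if edit_distance(query, strings[sid]) <= k]
    return candidates, answers


candidates, answers = search("stick", STRINGS)
```

For the query stick, `candidates` is [1, 2, 3, 4] and `answers` is [1, 2, 3]: static shares three grams with the query but is more than one edit away, so it is dropped at verification. Merging the lists is done here with a plain counter; on long inverted lists this step dominates the cost.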
Cloud data management
  Distributed storage system experimental platform of the WAMDM Lab
  [Figure: two-layer architecture. HBase layer: nodes Web-desktop1, Web-desktop2, Web-desktop3 running a Master and two HRegion (Tablet) Servers. HDFS layer: nodes Web-desktop1, Web-desktop2, Web-desktop3 running a Master (NameNode) and two Slaves (DataNode).]
Research topics about cloud data
  Self management and self tuning
  Query optimization on thousands of nodes