G. Koutrika, A. Simitsis, Y. Ioannidis, ICDE’10 – Long Beach, CA, USA
Role of Natural Language
[Figure: Data Management System. Query labels: NL, SQL, XQuery, SPARQL, VisUI. Data labels: NL, Tables, XML, RDF.]
From Natural Language to Databases
NL → SQL
NL → Tables
From Databases to Natural Language?
NL ← SQL: …this paper
NL ← Tables: mainly EDBT’08, CIDR’09; related: ICDE’06/’07, VLDB J. 17(1) ’08
Motivation
• Interesting applications
  − education and training on query languages
  − query debugging
    • e.g., (sub)queries responsible for empty results, (sub)queries responsible for too many results
  − query explanation and automatic commenting
  − do-it-yourself applications
  − query-by-form applications
Motivation
• Gartner analysis on technologies that will have a broad impact on all aspects of people’s lives [2008]
  − seven most important IT challenges for next 25 years
  − one of them: Automating computer-to-human speech translation
Example Query
What courses has Andreas taught?
SELECT title
FROM Instructors I, CourseSched S, Courses C
WHERE I.name = 'Andreas'
  and I.instrID = S.instrID
  and S.courseID = C.courseID
Example Query #2
WHAT????
SELECT S.name, count(distinct CO.CourseID)
FROM STUDENTS S, STUDENTHISTORY SH, COURSES CO, COURSESCHED R, INSTRUCTOR I
WHERE S.name = I.name
  and S.class = SH.year
  and S.SuID = SH.SuID
  and SH.CourseID = CO.CourseID
  and CO.CourseID = R.CourseID
  and R.InstrID = I.InstrID
  and R.year > all (
        SELECT CO.year
        FROM COURSES CO, COURSESCHED R, INSTRUCTORS I
        WHERE CO.CourseID = R.CourseID
          and R.InstrID = I.InstrID
          and I.name = 'Baeza'
          and CourseID NOT IN (
                SELECT C.CourseID
                FROM COURSES CO, COURSESCHED R, INSTRUCTORS I
                WHERE CO.CourseID = R.CourseID
                  and R.InstrID = I.InstrID
                  and R.Year > 2000
                GROUP BY C.CourseID
                HAVING COUNT(distinct R.Year) > 3) )
  and not exists (
        SELECT *
        FROM COURSES C1, STUDENTHISTORY SH1
        WHERE SH1.SuID = S.SuID
          and SH1.CourseID = C1.CourseID
          and C1.DepID = D.DepID
          and D.name = 'EE')
GROUP BY S.name
HAVING count(distinct CO.CourseID) =< ALL (
        SELECT count(C.CourseID)
        FROM STUDENTS S, STUDENTHISTORY SH, COURSES CO
        WHERE S.SuID = SH.SuID
          and SH.CourseID = CO.CourseID
          and S.CLASS > 2008
        GROUP BY S.SuID)
Query Translation Challenges
• Equivalent query expressions that should be translated into the same natural-language expression
  − commutativity, associativity, and other algebraic properties
• Equivalent natural-language expressions among which one should be chosen
• Choice between declarative and procedural translations
• Natural translations that don’t follow the query form but are based on mathematical semantics

SELECT A.id, A.name
FROM MOVIES M, CAST C, ACTOR A
WHERE M.id = C.mid and C.aid = A.id
GROUP BY A.id, A.name
HAVING count(distinct M.year) = 1

Sometimes… “count = 1” means “all”:
“Find actors whose movies are all in the same year” [CIDR’09]
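The reading of “count = 1” as “all” can be checked on a toy database. The sketch below is illustrative only: it uses Python's built-in sqlite3 module, the sample rows are invented, and the CAST relation is renamed to a hypothetical `casting` table to stay clear of the SQL keyword.

```python
import sqlite3

# Invented sample data (not from the slides).
MOVIES = [(1, 1999), (2, 1999), (3, 2005)]        # (id, year)
ACTORS = [(10, "Ann"), (20, "Bob")]               # (id, name)
CASTING = [(1, 10), (2, 10), (1, 20), (3, 20)]    # (mid, aid)


def actors_with_single_year(movies, casting, actors):
    """Run the slide's HAVING count(distinct M.year) = 1 query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE movies (id INTEGER, year INTEGER);
        CREATE TABLE casting (mid INTEGER, aid INTEGER);  -- stands in for CAST
        CREATE TABLE actor (id INTEGER, name TEXT);
        """
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?)", movies)
    conn.executemany("INSERT INTO casting VALUES (?, ?)", casting)
    conn.executemany("INSERT INTO actor VALUES (?, ?)", actors)
    rows = conn.execute(
        """
        SELECT A.id, A.name
        FROM movies M, casting C, actor A
        WHERE M.id = C.mid AND C.aid = A.id
        GROUP BY A.id, A.name
        HAVING count(DISTINCT M.year) = 1
        ORDER BY A.id
        """
    ).fetchall()
    conn.close()
    return rows


# Ann's movies are both from 1999; Bob's span 1999 and 2005.
print(actors_with_single_year(MOVIES, CASTING, ACTORS))
```

Only the actor whose movies all share one year survives the HAVING clause, which is why a translator may prefer the “all in the same year” phrasing over a literal “the number of distinct years equals 1”.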
select s.name
from students s
where NOT EXISTS (
        select *
        from students s2
        where s2.GPA > s.GPA )
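The slide gives this query without a translation. One plausible reading, stated here as an assumption, is “find the students with the greatest GPA”. The sketch below uses Python's sqlite3 module and invented rows to compare the NOT EXISTS form with a MAX-based form on the same data.

```python
import sqlite3

# Invented sample data (not from the slides): (name, GPA).
STUDENTS = [("Ann", 3.9), ("Bob", 3.5), ("Cy", 3.9)]


def _run(rows, query):
    """Load rows into an in-memory students table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, GPA REAL)")
    conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
    names = [r[0] for r in conn.execute(query)]
    conn.close()
    return names


def not_exists_form(rows):
    """The query as written on the slide (plus ORDER BY for stable output)."""
    return _run(
        rows,
        """
        SELECT s.name FROM students s
        WHERE NOT EXISTS (SELECT * FROM students s2 WHERE s2.GPA > s.GPA)
        ORDER BY s.name
        """,
    )


def max_form(rows):
    """Assumed equivalent phrasing: students whose GPA is the greatest."""
    return _run(
        rows,
        """
        SELECT s.name FROM students s
        WHERE s.GPA = (SELECT MAX(GPA) FROM students)
        ORDER BY s.name
        """,
    )


print(not_exists_form(STUDENTS), max_form(STUDENTS))
```

On this data both forms return the same names, which illustrates why a natural translation may follow the query's meaning rather than its syntactic form. The two forms can differ when GPA contains NULLs, so the equivalence is a sketch, not a general guarantee.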
Capturing Semantics
• Labels
  − each node v has a conceptual meaning l(v)
    • relation STUDENTS: “students”
    • attribute NAME: “name”
    • function MAX: “the greatest”
    • operators: = “is”, ≤ “does not exceed”, like “looks like”
  − each edge can be annotated by a label
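A minimal sketch of the label function l(v), assuming labels live in a plain lookup table. The label values come from the slide; the `describe_condition` helper and its “whose” connective are hypothetical, modelled on phrases such as “courses whose term is spring” that appear later in the talk.

```python
# Node labels from the slide, keyed by (node kind, node name).
LABELS = {
    ("relation", "STUDENTS"): "students",
    ("attribute", "NAME"): "name",
    ("function", "MAX"): "the greatest",
    ("operator", "="): "is",
    ("operator", "≤"): "does not exceed",
    ("operator", "like"): "looks like",
}


def l(kind, name):
    """Return the conceptual meaning l(v) of a query-graph node."""
    return LABELS[(kind, name)]


def describe_condition(relation, attribute, operator, value):
    """Hypothetical helper: phrase a selection using node labels.

    The word "whose" plays the role of an edge label here; it is an
    assumption, not something the slide specifies.
    """
    return "{} whose {} {} {}".format(
        l("relation", relation),
        l("attribute", attribute),
        l("operator", operator),
        value,
    )


print(describe_condition("STUDENTS", "NAME", "like", "'A%'"))
```

Keeping labels in data rather than in code means the same traversal can produce different wording by swapping the table.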
Capturing Semantics
• Templates
  − template label:
  − language with variables, loops, functions, and macros
    • e.g.,
  − generic templates
    • e.g., for
  − specific templates
Example Query Graph
select s.name, s.GPA, c.title, i.name, co.text
from students s, comments co, studenthistory h, courses c, departments d, coursesched cs, instructors i
where s.suid = co.suid
  and s.suid = h.suid
  and h.courseid = c.courseid
  and c.depid = d.depid
  and c.courseid = cs.courseid
  and cs.instrid = i.instrid
  and s.class = 2011
  and co.rating > 3
  and cs.term = 'spring'
  and d.name = 'CS'
Query Subject (QS)
[Figure: query graph marking primary / secondary relations]
• QS is the starting point of the translation
  − a “central” primary relation w/ attributes projected in the query
  − …or the closest primary relation
Traversal Strategies
• BST Algorithm
  − BST composes separate clauses for each query part, in the following order
    • pStr: translate the edges connecting relations to their attributes
    • fStr: connect all query relations to the subject through the query joins
    • wStr: translate the paths connecting relations to value nodes

‘Find ’ + pStr + ‘ for ’ + fStr + ‘.’ + ‘ Return results only for ’ + wStr + ‘.’
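The composition formula above can be sketched directly. In the snippet below only `bst_sentence` mirrors the slide; `join_phrases` is a hypothetical helper for comma-joining clauses, and the phrase fragments are copied from the BST example sentence in this talk rather than derived from a query graph.

```python
def join_phrases(parts):
    """Hypothetical helper: join clauses as 'a, b, and c'."""
    parts = list(parts)
    if len(parts) <= 2:
        return " and ".join(parts)
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


def bst_sentence(p_str, f_str, w_str):
    """Compose the final text exactly as in the slide's formula."""
    return ("Find " + p_str + " for " + f_str + "."
            + " Return results only for " + w_str + ".")


# Fragments taken from the talk's example translation.
p_str = join_phrases([
    "the title of courses",
    "the name of instructors",
    "the gpa and name of students",
    "the description of comments",
])
f_str = "courses that " + join_phrases([
    "are taught by instructors",
    "are taken by students that gave comments",
    "are offered by departments",
])
w_str = join_phrases([
    "courses whose term is spring",
    "students whose class is 2011",
    "comments whose rating is greater than 3",
    "departments whose name is CS",
])

print(bst_sentence(p_str, f_str, w_str))
```

Because the three parts are built independently, the projections, the joins, and the selections each end up in their own clause, which is what makes BST output long for larger queries.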
Traversal Strategies: BST
“Find the title of courses, the name of instructors, the gpa and name of students, and the description of comments for courses that are taught by instructors, are taken by students that gave comments, and are offered by departments. Return results only for courses whose term is spring, students whose class is 2011, comments whose rating is greater than 3, and departments whose name is CS.”
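(The three parts of the sentence are, in order: pStr after “Find”, fStr after “for”, and wStr after “Return results only for”.)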
Traversal Strategies
• Multi-Reference Point Graph Traversal (MRP) Algorithm
  − MRP avoids the creation of complex and lengthy phrases
    • the translation is semantically split at multiple points, called reference points (RPs)
    • a reference point (RP) is
      − a relation with projections or
      − a branching point or
      − a leaf

“Find the names of students and the titles of the courses taken by these students and the names of the instructors that taught courses taken by these students”
“Find the names of students and the titles of the courses taken by these students and the names of the instructors that taught these courses”
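The three RP conditions can be sketched over the join graph of the example query. This is an illustration under stated assumptions: the graph is treated as a tree rooted at the query subject, a “branching point” is taken to mean a node with two or more children, a “leaf” a non-root node with none, and `courses` is assumed to be the query subject. The function name and data layout are hypothetical.

```python
from collections import deque

# Join edges and projected relations of the example query graph.
JOINS = [
    ("students", "comments"),
    ("students", "studenthistory"),
    ("studenthistory", "courses"),
    ("courses", "departments"),
    ("courses", "coursesched"),
    ("coursesched", "instructors"),
]
PROJECTED = {"students", "courses", "instructors", "comments"}


def reference_points(joins, projected, subject):
    """Return the relations that qualify as reference points."""
    adjacency = {}
    for a, b in joins:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Breadth-first traversal from the subject outwards, recording children.
    children = {subject: []}
    queue = deque([subject])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in children:
                children[neighbour] = []
                children[node].append(neighbour)
                queue.append(neighbour)

    rps = set()
    for node, kids in children.items():
        is_branching = len(kids) >= 2               # assumed definition
        is_leaf = not kids and node != subject      # assumed definition
        if node in projected or is_branching or is_leaf:
            rps.add(node)
    return sorted(rps)


print(reference_points(JOINS, PROJECTED, "courses"))
```

Under these assumptions the pure connector relations (`studenthistory`, `coursesched`) are not reference points, so the translation is split only at relations a reader would want named.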
Traversal Strategies: MRP
[Figure: query graph with nodes numbered 1 to 8 and five reference points (RP) marked]
“Find the title of courses for courses that are offered by departments whose name is CS, and also, the gpa and name of students for students whose class is 2011 and that have taken these courses, and also, the description of comments for comments whose rating is greater than 3 and that are given by these students, and also, the name of instructors that teach courses whose term is spring.”
• traversing from QS outwards
• translating from RPs inwards
  − This strategy makes use of templates
    • find the minimum number of composable templates
    • two templates are composable if they share reference points
    • combine templates over the query graph in the right order
  − Example templates:
    • ga: nodes S, H, C, T, I; label: l(S) + “ have been in classes of ” + l(I)
    • gb: nodes C, D, name, <val>; label: <val> + l(C)
    • gc: nodes C, T, I, term, <val>; label: l(I) + “ ’s lectures on ” + l(C) + “ in ” + <val>

S: Students
H: StudentHistory
T: CourseSched
C: Courses
D: Department
I: Instructors
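The example templates and the composability rule can be sketched as follows. This is a hedged illustration: the `Template` class, the `composable` function, and the set of reference points are hypothetical; only the three label formulas come from the slide, with spacing adjusted so the rendered text reads as words.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Relation labels l(v); the plural nouns are assumed from the other slides.
LABELS = {"S": "students", "C": "courses", "I": "instructors"}


def l(node):
    """Return the label of a relation node."""
    return LABELS[node]


@dataclass(frozen=True)
class Template:
    name: str
    relations: Tuple[str, ...]   # relations the template covers
    render: Callable[..., str]   # produces the template's text


# The three example templates (spacing around <val> is an assumption).
GA = Template("ga", ("S", "H", "C", "T", "I"),
              lambda: l("S") + " have been in classes of " + l("I"))
GB = Template("gb", ("C", "D"),
              lambda val: val + " " + l("C"))
GC = Template("gc", ("C", "T", "I"),
              lambda val: l("I") + "'s lectures on " + l("C") + " in " + val)


def composable(t1, t2, reference_points):
    """Two templates are composable if they share a reference point."""
    shared = set(t1.relations) & set(t2.relations)
    return bool(shared & set(reference_points))


# Assumed reference points among the abbreviated relations.
RPS = {"S", "C", "I", "D"}

print(GA.render())
print(GB.render("CS"))
print(composable(GA, GB, RPS))
```

Here `ga` and `gb` can be combined because both cover the reference point C, whereas a template covering only H and T would share no reference point with `gb`.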
Traversal Strategies: TMT
“Find the gpa and name of students whose class is 2011 and have been in classes of instructors and find the name of these instructors, whose lectures on courses are in spring and find the title of these CS courses and the description of comments whose rating is greater than 3 given by these students.”
Templates used in this translation:
• l(I) + “ ’s lectures on ” + l(C) + “ in ” + <val>
• l(S) + “ have been in classes of ” + l(I)
• <val> + l(C)