Data Science Research @
Introduction | Ontological Pathfinding | Experiments
Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases
Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri
{yang,sean,daisyw}@cise.ufl.edu, soumitra.johri@ufl.edu
Computer and Information Science and Engineering, University of Florida
SIGMOD'16, San Francisco, CA, Jun 29, 2016
Ontological Pathfinding Jun 29, 2016 1/25
Knowledge Bases
A knowledge base organizes human information in a structured format.

Predicate     Subject            Object
isLocatedIn   Washington, D.C.   United States
hasCapital    Canada             Ottawa
wasBornIn     Donald Knuth       Milwaukee, Wisconsin
isCitizenOf   Donald Knuth       United States
dealsWith     United States      Canada
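As a minimal sketch of the triple format above, the same facts can be held as (predicate, subject, object) records and indexed by predicate. The `Fact` type and `index_by_predicate` helper are illustrative names, not part of the original slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One knowledge-base fact: predicate(subject, object)."""
    predicate: str
    subject: str
    obj: str


# The five example facts from the table on this slide.
FACTS = [
    Fact("isLocatedIn", "Washington, D.C.", "United States"),
    Fact("hasCapital", "Canada", "Ottawa"),
    Fact("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    Fact("isCitizenOf", "Donald Knuth", "United States"),
    Fact("dealsWith", "United States", "Canada"),
]


def index_by_predicate(facts):
    """Group (subject, object) pairs under their predicate."""
    index = defaultdict(list)
    for fact in facts:
        index[fact.predicate].append((fact.subject, fact.obj))
    return dict(index)


by_predicate = index_by_predicate(FACTS)
```

Looking up `by_predicate["hasCapital"]` then returns the (subject, object) pairs stored for that predicate.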
Freebase: 112M entities, 388M facts.
Is it possible to mine first-order rules from Freebase?
Contributions
Goal: Mine first-order knowledge from web-scale knowledge bases.
Result: Design the Ontological Pathfinding algorithm to mine 36,625 inference rules from Freebase (388M facts) in 34 hours; publish the first Freebase rule set.
Contributions:
- Partition the KB into independent subsets to reduce join sizes. (Improves runtime from 2.55 days to 5.06 hours for a single task.)
- Design a parallel rule mining algorithm for each partition. (Achieves a 3-6 times speedup.)
- Prune inefficient and erroneous candidate rules. (Makes joins possible.)
Predicate            Subject               Object
exports              United States         Computer
exports              Canada                Aluminum
imports              United States         Aluminum
imports              United States         Clothing
dealsWith            Canada                United States
isLocatedIn          Washington, D.C.      United States
isLocatedIn          Ottawa                Canada
isLocatedIn          Stanford University   Stanford, California
hasCapital           Canada                Ottawa
hasCapital           United States         Washington, D.C.
wasBornIn            Donald Knuth          Milwaukee, Wisconsin
isCitizenOf          Donald Knuth          United States
worksAt              Donald Knuth          Stanford University
hasAcademicAdvisor   Donald Knuth          Marshall Hall, Jr.

[Figure: (a) facts Γ and (b) rules M, divided into Partition 1 and Partition 2 (Γ1, Γ2, M1, M2).]
Partitioning
Joining partitioned RDDs requires:
$O(t^{l-1}\,|S|\,|M|) \rightarrow O(t^{l-1}\,s_m\,|M|)$,
bounded above by the largest partition size $s_m$.
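The effect of the bound can be sketched on the 14 example facts from the earlier slide. This is an illustration under stated assumptions, not the authors' partitioning algorithm: the two rule groups `M1` and `M2` are hypothetical, each partition is assumed to keep only facts whose predicate appears in its rules, and |S| is assumed to denote the size of the unpartitioned fact set.

```python
# (predicate, subject, object) facts from the example knowledge base.
FACTS = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
    ("isLocatedIn", "Washington, D.C.", "United States"),
    ("isLocatedIn", "Ottawa", "Canada"),
    ("isLocatedIn", "Stanford University", "Stanford, California"),
    ("hasCapital", "Canada", "Ottawa"),
    ("hasCapital", "United States", "Washington, D.C."),
    ("wasBornIn", "Donald Knuth", "Milwaukee, Wisconsin"),
    ("isCitizenOf", "Donald Knuth", "United States"),
    ("worksAt", "Donald Knuth", "Stanford University"),
    ("hasAcademicAdvisor", "Donald Knuth", "Marshall Hall, Jr."),
]

# Hypothetical rule groups; each rule is (head predicate, body predicates).
RULE_PARTITIONS = {
    "M1": [("dealsWith", ["exports", "imports"])],
    "M2": [("isCitizenOf", ["wasBornIn", "isLocatedIn"])],
}


def partition_facts(facts, rule_partitions):
    """Keep, per rule group, only the facts whose predicate the group uses."""
    parts = {}
    for name, rules in rule_partitions.items():
        predicates = {p for head, body in rules for p in [head, *body]}
        parts[name] = [f for f in facts if f[0] in predicates]
    return parts


parts = partition_facts(FACTS, RULE_PARTITIONS)
total_size = len(FACTS)                      # stands in for |S|
s_m = max(len(p) for p in parts.values())    # largest partition size
```

In this toy setting each partition holds 5 facts, so a join inside a partition scans at most `s_m = 5` facts rather than all 14.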
Parallel Rule Mining
p(x, y) ← q(x, z), r(y, z).
Group facts by the join variable z.
Rules = {R1, R2, R3}
For each group, apply the inference rules with an in-memory hash join, tagging each fact with the rule that inferred it, or with "0" for base facts.
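The grouping and per-group join can be sketched in plain Python for rules of the form p(x, y) ← q(x, z), r(y, z). This is a single-machine illustration rather than the Spark implementation; the rule `R1` (dealsWith from exports and imports) and the helper names are assumptions chosen to match the example facts.

```python
from collections import defaultdict

# (predicate, subject, object) base facts.
FACTS = [
    ("exports", "United States", "Computer"),
    ("exports", "Canada", "Aluminum"),
    ("imports", "United States", "Aluminum"),
    ("imports", "United States", "Clothing"),
    ("dealsWith", "Canada", "United States"),
]

# Hypothetical rule set: rule id -> (p, q, r) for p(x, y) <- q(x, z), r(y, z).
RULES = {"R1": ("dealsWith", "exports", "imports")}


def group_by_join_variable(facts):
    """Group facts by their object z; inside a group, hash subjects by predicate."""
    groups = defaultdict(lambda: defaultdict(list))
    for predicate, subject, z in facts:
        groups[z][predicate].append(subject)
    return groups


def apply_rules(groups, rules):
    """Join q and r facts that share z and tag each result with its rule id."""
    inferred = []
    for by_predicate in groups.values():
        for rule_id, (p, q, r) in rules.items():
            for x in by_predicate.get(q, []):
                for y in by_predicate.get(r, []):
                    inferred.append(((p, x, y), rule_id))
    return inferred


base = [(fact, "0") for fact in FACTS]  # base facts carry the tag "0"
inferred = apply_rules(group_by_join_variable(FACTS), RULES)
```

Only the group for z = Aluminum contains both an exports and an imports fact, so `R1` infers dealsWith(Canada, United States) there and nothing elsewhere.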
Map each fact to a (fact, {r}) pair, where {r} is the list of rules that infer it.
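A minimal sketch of this step, assuming the input is a list of (fact, tag) pairs in which the tag is a rule id or "0" for a base fact. The rule `R2` and the fact it infers are hypothetical, added only to show an inferred fact that has no base counterpart.

```python
from collections import defaultdict

# (fact, tag) pairs: tag is the inferring rule id, or "0" for a base fact.
TAGGED = [
    (("exports", "Canada", "Aluminum"), "0"),
    (("imports", "United States", "Aluminum"), "0"),
    (("dealsWith", "Canada", "United States"), "0"),
    (("dealsWith", "Canada", "United States"), "R1"),
    # Hypothetical inference by a second rule, absent from the base facts.
    (("isCitizenOf", "Donald Knuth", "Canada"), "R2"),
]


def group_by_fact(tagged):
    """Collect, for every distinct fact, the set of tags attached to it."""
    grouped = defaultdict(set)
    for fact, tag in tagged:
        grouped[fact].add(tag)
    return dict(grouped)


grouped = group_by_fact(TAGGED)

# Facts carrying both "0" and a rule id were inferred and are also base facts.
confirmed = {f for f, tags in grouped.items() if "0" in tags and len(tags) > 1}
```

Grouping by fact puts a rule's inferences next to the matching base fact, which makes it straightforward to see which inferences the knowledge base already contains.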
[Figure: three panels, "Group by join variables", "Group joins", and "Group by facts", showing facts F1 to F5 and rules R1, R2, R3.]