Extracting, Aligning, and Linking Data to Build Knowledge Graphs Craig Knoblock University of Southern California Thanks to my collaborators: Pedro Szekely, Linhong Zhu, Majid Ghasemi-Gol, Mohsen Taheriyan, Minh Pham, and Steve Minton

Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Jan 22, 2018

Transcript
Page 1: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Craig Knoblock University of Southern California

Thanks to my collaborators: Pedro Szekely, Linhong Zhu, Majid Ghasemi-Gol, Mohsen Taheriyan, Minh Pham, and Steve Minton

Page 2: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Goal

USC Information Sciences Institute CC-By 2.0 2

raw, messy, disconnected data: hard to query, analyze & visualize

clean, organized, linked data: easy to query, analyze & visualize

Page 3: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Use Case: Human Trafficking

raw, messy, disconnected data: hard to query, analyze & visualize

clean, organized, linked data: easy to query, analyze & visualize

Page 4: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Use Case: Human Trafficking


100 million pages

~ 100 Web sites

help victims

prosecute traffickers

Page 5: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Example: Investigating a Reported Victim

San Diego, where else?


Page 6: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

DIG Interface: Find the locations where a potential victim was advertised

Page 7: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Steps To Build a KG

[Figure: pipeline for building a knowledge graph. Stages: Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface. Activity labels: crawling, extraction, mapping to ontology, entity linking & similarity, knowledge graph deployment, query & visualization. Resources shown: schema.org, geonames, ElasticSearch, graph DB.]

Page 8: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Data Acquisition

downloading relevant data

batch, real-time

Web pages, Web services, databases

CSV, Excel, XML, JSON

Page 9: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Traditional Web Crawler (e.g., Nutch, Scrapy)

Page 10: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Web Crawling

24/7

5,000 pages/hour

~100,000,000 pages total

Page 11: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Steps To Build a KG

[Figure: the KG construction pipeline (Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface).]

Page 12: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Feature Extraction


from raw sources to structured data

• extraction from text

• extraction from structured Web pages

• extraction of image features

Page 13: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Extraction


Page 14: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Structured Extraction


Page 15: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Automated Extraction [Minton et al., Inferlink]

• Title
• Description
• Seller
• Post Date
• Expiry Date
• Price
• Location
• Category
• Member Since
• Num Views
• Post ID

Page 16: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Automated Extraction

Input: A Pile of Pages


Page 17: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Automated Extraction

input: a pile of pages

Classify by Templates

pages clustered by template

Page 18: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Automated Extraction

input: a pile of pages

Classify by Templates

pages clustered by template

Infer Extractor (repeated for each cluster)

extractor
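The slides do not say how pages are grouped by template, so the following is only a minimal sketch under stated assumptions: each page is summarized by the set of its HTML tag paths, and pages are greedily grouped when the Jaccard similarity of those sets reaches a threshold. The names `TagPathCollector`, `cluster_by_template`, and the 0.6 threshold are hypothetical and are not taken from the Inferlink tool.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths in an HTML page."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = TagPathCollector()
    collector.feed(html)
    return collector.paths


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cluster_by_template(pages, threshold=0.6):
    """Greedy clustering: a page joins the first cluster whose
    representative signature is similar enough, else starts a new one."""
    clusters = []  # list of (signature, [page indices])
    for i, html in enumerate(pages):
        sig = tag_paths(html)
        for rep, members in clusters:
            if jaccard(sig, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((sig, [i]))
    return [members for _, members in clusters]


# Made-up pages: two share an "ad" layout, one is a table-based page.
pages = [
    "<html><body><h1>Ad A</h1><div><span>$10</span></div></body></html>",
    "<html><body><table><tr><td>Forum post</td></tr></table></body></html>",
    "<html><body><h1>Ad B</h1><div><span>$25</span></div></body></html>",
]
clusters = cluster_by_template(pages)
print(clusters)
```

Each resulting cluster would then be handed to a separate extractor-inference step, as the figure suggests; that step is not sketched here.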


Page 19: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Unsupervised Extraction Tool


Page 20: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Pretty Good Extractions

           Want             Extracted
Extra      Jan. 23, 2015    Jan. 23, 2015 expires Feb
Partial    Jan. 23, 2015    Jan. 23

Page 21: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Extraction Evaluation

Field          Perfect        Pretty Good
Title          1.0 (50/50)    1.0 (50/50)
Desc           .76 (37/49)    .98 (48/49)
Seller         .95 (40/42)    .95 (40/42)
Date           .83 (40/48)    .83 (40/48)
Price          .87 (39/45)    .98 (44/45)
Loc            .51 (23/45)    .84 (38/45)
Cat            .68 (34/50)    .88 (44/50)
Member Since   1.0 (35/35)    1.0 (35/35)
Expires        .52 (15/29)    .55 (16/29)
Views          .76 (19/25)    1.0 (25/25)
ID             .97 (35/36)    1.0 (36/36)

10 websites, 5 pages each

Page 22: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Steps To Build a KG

[Figure: the KG construction pipeline (Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface).]

Page 23: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Feature Alignment


from multiple schemas to a common domain schema

- CSV, Excel

- Database tables

- Web services

- Extractors

- Nomenclature

- Spelling

Multiple Schemas

Page 24: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Karma: Mapping Data to Ontologies

[Figure: Karma takes services, relational sources, and hierarchical sources as input, uses Schema.org as the ontology, and outputs { JSON-LD }.]

Page 25: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Semantic Labeling [Pham et al., ISWC’16]

[Figure labels: classes Offer, Place, Person; properties name, price, id, name]

Column-1                                 Column-2   Column-3           Column-4
British Lee-Enfield No 4 MK 2 still …    1,000                         68155c13de2f2532
Cabelas Millenium Revolver in .45 colt   700        1711 Anderson Rd   12155a1a2938bc1e

Page 26: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Learning Semantic Types

Requirements:

Learn from a small number of examples

Distinguish both string and numeric values

Can be learned quickly and is highly scalable to large numbers of semantic types

Domain Ontology classes: Person, Organization, City, State

Semantic types assigned to the columns: Person name, Person birthdate, City name, State name, Organization name

    name            date       city       state   workplace
1   Fred Collins    Oct 1959   Seattle    WA      Microsoft
2   Tina Peterson   May 1980   New York   NY      Google

Page 27: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Learning Semantic Types: Textual Data

Treat each column of data as a document

Apply TF-IDF Cosine Similarity
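As an illustration of the idea on this slide, here is a minimal sketch that treats each labeled column as one document, builds TF-IDF vectors, and ranks candidate semantic types for a new column by cosine similarity. The training columns, the label strings, and the smoothed IDF formula are assumptions made for the example; the slides do not specify the weighting used in the actual system.

```python
import math
from collections import Counter


def tokenize(values):
    return [tok for v in values for tok in v.lower().split()]


def idf(n_docs, doc_freq):
    # Smoothed IDF (an assumption; other variants would also work).
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_labels(column_values, training):
    """training: dict label -> list of cell values already labeled."""
    docs = {label: Counter(tokenize(vals)) for label, vals in training.items()}
    n = len(docs)
    df = Counter(tok for tf in docs.values() for tok in tf)
    vectors = {
        label: {t: c * idf(n, df[t]) for t, c in tf.items()}
        for label, tf in docs.items()
    }
    query_tf = Counter(tokenize(column_values))
    query = {t: c * idf(n, df[t]) for t, c in query_tf.items()}
    scored = [(cosine(query, vec), label) for label, vec in vectors.items()]
    return sorted(scored, reverse=True)


# Hypothetical training columns, one "document" per semantic type.
training = {
    "Person.name": ["Fred Collins", "Tina Peterson", "Fred Smith"],
    "City.name": ["Seattle", "New York", "Los Angeles"],
}
ranked = rank_labels(["Fred Peterson", "Tina Smith"], training)
print(ranked)
```

The ranked list, rather than a single best label, matches how the results slides report both top-1 accuracy and top-4 hits.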

Page 28: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Learning Semantic Types: Numeric Data

Apply statistical hypothesis testing to determine which distribution fits best

Apply Kolmogorov-Smirnov Test
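The following sketch shows one way the Kolmogorov-Smirnov idea could be applied to numeric columns: compute the two-sample KS statistic between a new column and the values already known for each semantic type, then prefer the type with the smallest statistic. The label names and sample values are made up, and the sketch uses only the statistic, not a p-value or the feature combination described on later slides.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_numeric_labels(values, training):
    """Rank candidate semantic types; a smaller statistic means the
    value distributions look more alike."""
    return sorted((ks_statistic(values, known), label)
                  for label, known in training.items())


# Hypothetical training values for two numeric semantic types.
training = {
    "Offer.price": [700, 1000, 450, 1200, 300],
    "Person.birthYear": [1959, 1980, 1975, 1990, 1964],
}
ranked = rank_numeric_labels([650, 900, 1100, 500], training)
print(ranked)
```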

Page 29: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Features for Semantic Labeling

• Features

– KS = Kolmogorov-Smirnov

– MW = Mann-Whitney

Page 30: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Combining the Features for Semantic Labeling

Page 31: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Automatically Assigned Semantic Labels

Assigned labels, by column: Offer name | CreativeWork fragment | Offer description | Offer identifier | Offer datePosted | CreativeWork Fragment

Row 1: 35 Whelen Handi-Rifle | No Tags | 35 Whelen Handi-rifle. Black synthetic stock/forearm, blued barrel. Text 601-813-7280 …. | 245625390711756 | October 19, 2015 12:43 pm |

Row 2: Cabelas Millenium Revolver in .45 colt | No Tags | This single action is built to shoot and is a great way for any level of shooter to get involved with a single action. … | 12155a1a2938bc1e | July 11, 2015 5:17 pm | 1711 Anderson Rd

Row 3: swap stocks | No Tags | want to trade butler creek folding stock for black stock ruger mini stock folder by butler creek will swap even for full rifle stock …. | 5815600fd181fe3b | September 22, 2015 1:05 am | white

streetAddress does not appear in training data -> more similar to noisy data

Page 32: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Results on www.msguntrader.com

number of attributes 19

Correct prediction 16

Correct label is in the top 4 predictions 18

Accuracy 84%

MRR 89%
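To make the metrics concrete, here is a small sketch that computes accuracy, top-k hit rate, and mean reciprocal rank (MRR) from the rank at which the correct label appears for each attribute. The slide reports only counts (19 attributes, 16 correct, 18 in the top 4), so the individual ranks below are a hypothetical assignment chosen to be compatible with those counts, not data from the evaluation.

```python
def accuracy(ranks):
    """Fraction of attributes whose correct label is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)


def top_k_rate(ranks, k):
    """Fraction of attributes whose correct label is within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank; an attribute whose label is never returned
    (rank None) contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks: 16 attributes at rank 1, 2 at rank 2, 1 not returned.
ranks = [1] * 16 + [2] * 2 + [None]

print(f"accuracy = {accuracy(ranks):.2f}")
print(f"top-4    = {top_k_rate(ranks, 4):.2f}")
print(f"MRR      = {mean_reciprocal_rank(ranks):.2f}")
```

Under this assumed rank assignment the accuracy is 16/19 and the MRR is 17/19, which round to the 84% and 89% shown on the slide; other rank assignments with the same counts would give a slightly different MRR.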

Page 33: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Results on Gun Sites

Evaluation Dataset

Average number of attributes 18

Total number of attributes 176

Correct prediction (Accuracy) 56%

Correct label is in the top 4 predictions 89%

MRR 70%

Page 34: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Steps To Build a KG

[Figure: the KG construction pipeline (Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface).]

Page 35: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Entity Resolution

merging records that refer to the same entity

techniques to address:

- missing data

- incorrect data

- scale (~100 million records)

Page 36: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Unsupervised Collective Entity Resolution


Page 37: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

same victim

same Trafficker

Unsupervised Collective Entity Resolution


Page 38: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Collective Entity Resolution [Zhu et al., ISWC’16]

Identifying and linking instances of the same real-world entity

[Figure: Multi-Type Graph; legend: product, description, manufacturer, price. Product nodes: Product 1 to Product 5. Description nodes: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones. Manufacturer nodes: Bose Electronic, Sony, Bosch, Bose. Price nodes: 292, 599, 229, 299.]

Page 39: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Collective Entity Resolution [Zhu et al., ISWC’16]

Identifying and linking instances of the same real-world entity

[Figure: the Multi-Type Graph of Products 1 to 5 with their descriptions, manufacturers, and prices.]

Page 40: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Common Approach: Pairwise Comparisons

Product     Manufacturer      Title                                         Price
Product 1   Sony              Noise Cancelling Headphones
Product 2   Sony              Premium Noise Cancelling Headphones           292
Product 3   Bosch             Dish Washer                                   599
Product 4   Bose              Bose Noise Cancelling Headphones              299, 229
Product 5   Bose Electronic   Quiet Comfort 25 Noise Cancelling Headphone   299

Jaro 0.5, distance 0.2, Jaccard 0.3

Acceptance Threshold: 0.8
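A pairwise matcher of the kind this slide describes can be sketched as a weighted sum of per-attribute similarities compared against the acceptance threshold. Several things here are assumptions: that 0.5, 0.2, and 0.3 act as weights, which measure goes with which attribute, and the use of `difflib.SequenceMatcher` as a stand-in for Jaro similarity (it is a different string measure). The records come from the product table on the slide.

```python
from difflib import SequenceMatcher

# Assumed pairing of weights to attributes; the slide lists only the numbers.
WEIGHTS = {"manufacturer": 0.5, "price": 0.2, "title": 0.3}
THRESHOLD = 0.8


def string_sim(a, b):
    # Stand-in for Jaro similarity (difflib ratio, not Jaro itself).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def price_sim(a, b):
    # 1 minus relative difference; a missing price contributes 0.
    if a is None or b is None:
        return 0.0
    return 1.0 - abs(a - b) / max(a, b)


def match_score(r1, r2):
    return (WEIGHTS["manufacturer"] * string_sim(r1["manufacturer"], r2["manufacturer"])
            + WEIGHTS["price"] * price_sim(r1["price"], r2["price"])
            + WEIGHTS["title"] * jaccard(r1["title"], r2["title"]))


p3 = {"manufacturer": "Bosch", "title": "Dish Washer", "price": 599}
p4 = {"manufacturer": "Bose", "title": "Bose Noise Cancelling Headphones", "price": 299}
p5 = {"manufacturer": "Bose Electronic",
      "title": "Quiet Comfort 25 Noise Cancelling Headphone", "price": 299}

for left, right in [(p4, p5), (p4, p3)]:
    score = match_score(left, right)
    print(round(score, 3), "match" if score >= THRESHOLD else "no match")
```

In this sketch neither pair reaches the threshold, even though Products 4 and 5 plausibly describe the same item; that is the kind of weakness the following slides (missing values, multiple values, weights) point at.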

Page 41: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Missing Values

[Table repeated from the Pairwise Comparisons slide: Products 1 to 5 with Manufacturer, Title, and Price; Jaro 0.5, distance 0.2, Jaccard 0.3.]

Page 42: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Multiple Values

[Table repeated from the Pairwise Comparisons slide: Products 1 to 5 with Manufacturer, Title, and Price; Jaro 0.5, distance 0.2, Jaccard 0.3.]

Page 43: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Weights

[Table repeated from the Pairwise Comparisons slide: Products 1 to 5 with Manufacturer, Title, and Price; Jaro 0.5, distance 0.2, Jaccard 0.3; additional values shown: 0.5, 0.2, 0.3.]

Page 44: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Unidirectional

[Table repeated from the Pairwise Comparisons slide: Products 1 to 5 with Manufacturer, Title, and Price; Jaro 0.5, distance 0.2, Jaccard 0.3; additional values shown: 0.5, 0.2, 0.3.]

Page 45: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Graph Summarization: Original Graph

[Figure: the multi-type graph of Products 1 to 5 with their descriptions, manufacturers, and prices; legend: price, description, manufacturer, product.]

Page 46: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

[Figure: the multi-type graph of Products 1 to 5 with their descriptions, manufacturers, and prices.]

Similar Nodes sim_t(x, y)

Page 47: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Graph Summarization: Super-Nodes

[Figure: the multi-type graph of Products 1 to 5 with their descriptions, manufacturers, and prices.]

Page 48: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Super-nodes C_t(x)

Nodes: Quiet Comfort 25 Noise Cancelling Headphone; Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Bose Noise Cancelling Headphones

C_t =
0.7 0.2 0.1
0.7 0.2 0.1
0.2 0.7 0.1
0.2 0.7 0.1
0.1 0.1 0.8

probability that a node x belongs to each super-node

one matrix for each type
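The membership matrix can be read as follows: each row is a node, each column a super-node, and each entry the probability that the node belongs to that super-node. The sketch below checks that rows sum to 1 and derives a hard grouping by taking the most probable super-node per row. Taking the argmax is an assumption made for illustration; the slides do not say how CoSum turns probabilities into final clusters, and the row-to-node correspondence is not recoverable from the transcript, so rows are referred to by index only.

```python
# Membership matrix from the slide: rows are nodes, columns are super-nodes.
C_t = [
    [0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]


def is_row_stochastic(matrix, tol=1e-9):
    """Each row should be a probability distribution over super-nodes."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


def assign_super_nodes(matrix):
    """Hard assignment: index of the most probable super-node per node."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


def group_by_super_node(matrix):
    groups = {}
    for node, super_node in enumerate(assign_super_nodes(matrix)):
        groups.setdefault(super_node, []).append(node)
    return groups


print(is_row_stochastic(C_t))
print(group_by_super_node(C_t))
```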


Page 49: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Similar Nodes Should Be In The Same Super-Node

[Figure: description nodes Noise Cancelling Headphones; Premium Noise Cancelling Headphones; Dish Washer; Quiet Comfort 25 Noise Cancelling Headphone; Bose Noise Cancelling Headphones.]

Page 50: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Super-Links

[Figure: the multi-type graph of Products 1 to 5 with their descriptions, manufacturers, and prices.]

Page 51: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Super-Links

[Figure: the multi-type graph of Products 1 to 5 with their descriptions, manufacturers, and prices.]

Page 52: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Predict Links In Original Graph

[Figure: subgraph with manufacturer nodes Bose Electronic, Bosch, and Bose and product nodes Product 3, Product 4, and Product 5.]

Page 53: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Predict Links In Original Graph

[Figure: two copies of the subgraph with manufacturer nodes Bose Electronic, Bosch, and Bose and product nodes Product 3, Product 4, and Product 5.]

Page 54: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Predict Links In Original Graph

[Figure: subgraph with manufacturer nodes Bose Electronic, Bosch, and Bose and product nodes Product 3, Product 4, and Product 5.]

Page 55: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Re-Clustering Improves Reconstruction Quality

[Figure: two copies of the subgraph with manufacturer nodes Bose Electronic, Bosch, and Bose and product nodes Product 3, Product 4, and Product 5.]

Page 56: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Comparable Approaches

Pairwise Clustering Unsupervised Supervised

Limes, Ngomo’11 ✔ ✔

SILK, Isele’10 ✔ ✔ ✔

Serf, Benjelloun’10 ✔ ✔

*Commercial, Köpcke’10 ✔ ✔

GraphSum, Riondato’14 ✔ ✔

*AuthorLDA, Bhattacharya’07 ✔ ✔

CoSum (proposed) ✔ ✔

Page 57: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Quality Comparison

           Precision                 Recall                    F-measure
           Author  Paper  Product    Author  Paper  Product    Author  Paper  Product
Limes-F    0.958   0.827  0.446      0.864   0.761  0.16       0.909   0.792  0.236
Silk-F     0.846   0.877  0.459      0.986   0.756  0.348      0.91    0.812  0.395
Gsum       0.727   0.668  0.01       0.569   0.624  0.587      0.638   0.645  0.02
CoSum-B    0.993   0.871  0.58       0.94    0.611  0.477      0.966   0.718  0.524
Limes-MO   0.912   0.827  0.446      0.944   0.761  0.16       0.928   0.792  0.236
Silk-MO    0.932   0.877  0.459      0.958   0.756  0.348      0.945   0.812  0.395
Serf       0.985   0.837  0.436      0.687   0.808  0.186      0.809   0.822  0.261
CoSum-P    0.999   0.771  0.639      0.997   0.997  0.695      0.998   0.87   0.666

Commercial 0.615 0.63 0.622

AuthorLDA 0.995
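The F-measure columns can be cross-checked against the precision and recall columns. The sketch below assumes the F-measure is the usual harmonic mean of precision and recall (F1); the slide does not state the formula, but the tabulated values are consistent with it for the rows tried here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (system, dataset, precision, recall, reported F-measure) from the table.
rows = [
    ("Limes-F", "Author", 0.958, 0.864, 0.909),
    ("CoSum-P", "Author", 0.999, 0.997, 0.998),
    ("CoSum-P", "Paper", 0.771, 0.997, 0.87),
    ("Gsum", "Product", 0.01, 0.587, 0.02),
]

for system, dataset, p, r, reported in rows:
    print(system, dataset, round(f_measure(p, r), 3), "reported:", reported)
```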

Page 58: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Steps To Build a KG

[Figure: the KG construction pipeline (Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface).]

Page 59: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Graph Construction


assembling the data for efficient query & analysis

- ElasticSearch: scalable, efficient query

- graph databases: network analytics

- NoSQL: scalable analytics

- bulk loading: massive data imports

- real-time updates: live, changing data

Page 60: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

elasticsearch

• Cloud-based search engine

• Based on Apache Lucene

• Horizontal scaling, replication, load balancing

• Blazingly fast!

• Everything is a document

– Documents are JSON objects

– Index what you want to find

– Fields can contain strings, numbers, booleans, etc.
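To illustrate "documents are JSON objects", here is a minimal sketch of what one indexed document might look like, built and serialized with Python's standard `json` module. The field layout is an assumption that loosely follows schema.org naming; the values are taken from the gun-listing example on the earlier semantic-labeling slides, with the date rewritten in ISO format. No Elasticsearch client call is shown, since the slides do not describe how documents are loaded.

```python
import json

# Hypothetical document layout: one Offer with a nested Place.
offer_doc = {
    "@type": "Offer",
    "name": "Cabelas Millenium Revolver in .45 colt",
    "identifier": "12155a1a2938bc1e",
    "price": 700,
    "datePosted": "2015-07-11T17:17:00",
    "availableAtOrFrom": {
        "@type": "Place",
        "streetAddress": "1711 Anderson Rd",
    },
}

# A document is plain JSON: strings, numbers, and nested objects.
serialized = json.dumps(offer_doc, indent=2)
restored = json.loads(serialized)
print(serialized)
```

Nesting related entities inside one document is one way to let a single query reach them without joins, at the cost of duplicating data across documents.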


Page 61: Extracting, Aligning, and Linking Data to Build Knowledge Graphs
Page 62: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

ElasticSearch Data Model

Efficient indexing and query

[Figure: data model with the classes WebPage, Offer, AdultService, Person, and Phone.]

Page 63: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Offers As Roots

Page 64: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Products (AdultService) As Roots

Page 65: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Indexing for High Performance Knowledge Graph Queries

Avg. query times in milliseconds; single-user query load; 1.2 billion triples

[Chart: state-of-the-art graph database (RDF) vs. DIG indexing deployed in ElasticSearch.]

Page 66: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Steps To Build a KG

[Figure: the KG construction pipeline (Data Acquisition, Feature Extraction, Feature Alignment, Entity Resolution, Graph Construction, User Interface).]

Page 67: Extracting, Aligning, and Linking Data to Build Knowledge Graphs
Page 68: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

DIG Deployment for Human Trafficking


- 100 million Web pages

- Live updates (~5,000 pages/hour)

- ElasticSearch database (7 nodes)

- Hadoop workflows (20 nodes)

- District Attorney

- Law Enforcement

- NGOs

Page 69: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

DIG Applications

Human Trafficking: large, real users

Material Science Research: 70,000 paper abstracts (built in 1 week)

Arms Trafficking: identify illegal sales

Patent Trolls: identifies patent trolls

Predicting Cyber Attacks: combines diverse sources about vulnerabilities, exploits, etc.

Page 70: Extracting, Aligning, and Linking Data to Build Knowledge Graphs

Conclusions

• Presented the end-to-end tool-chain to build domain-specific knowledge graphs

• Integrates heterogeneous data: web pages, databases, CSV, web APIs, images, etc.

• Approach scales to millions of pages and billions of facts

• Has been used to build real-world deployed applications
