BIG DATA Conceptual Modeling to the Rescue David W. Embley with special thanks to Stephen W. Liddle and the Data Extraction Research Group at Brigham Young University


Mar 28, 2015


Jazmin Highley
Transcript
Page 1: BIG DATA Conceptual Modeling to the Rescue David W. Embley with special thanks to Stephen W. Liddle and the Data Extraction Research Group at Brigham Young.

BIG DATA
Conceptual Modeling to the Rescue

David W. Embley
with special thanks to Stephen W. Liddle
and the Data Extraction Research Group at Brigham Young University

Page 2

ER 2013 Keynote 2

Roadmap
• What is BIG DATA?
• Why should Conceptual Modeling apply?
• Examples to show how Conceptual Modeling can “come to the rescue”
• Summary (and take-home message):
  – Principles that guide the use of Conceptual Modeling in BIG DATA applications
  – Challenges and Research Opportunities

Page 3

Roadmap
• What is BIG DATA?
• Why should Conceptual Modeling apply?
• Examples to show how Conceptual Modeling can “come to the rescue”
• Summary (and take-home message):
  – Principles that guide the use of Conceptual Modeling in BIG DATA applications
  – Challenges and Research Opportunities

Page 4

BIG DATA

• Volume: typically exceeding terabytes

• Variety: heterogeneous sources; diverse needs

• Velocity: phenomenal rate of acquisition

• Veracity: trustworthiness & uncertainty

Page 5

Volume: Kilobyte (10^3)
A paragraph of text

Page 6

Volume: Megabyte (10^6)
A small novel

Page 7

Volume: Gigabyte (10^9)
Sound wave of Beethoven’s Fifth Symphony

Page 8

Volume: Terabyte (10^12)
All the X-ray images in a large hospital

Page 9

Volume: Petabyte (10^15)
10 billion Facebook photos

Page 10

Volume: Exabyte (10^18)
1/5 of the words ever spoken

Page 11

Volume: Zettabyte (10^21)
Grains of sand on all the world’s beaches

Page 12

Volume: Yottabyte (10^24)
Atoms in 7,000 human bodies

NSA data site – purportedly designed to store yottabytes of data.

Page 13

Variety: Heterogeneous Sources & Diverse Needs

Radiology Report

(John Doe, July 19, 12:14 pm)

Page 14

Velocity
Astronomers expect to be processing 10 petabytes of data every hour from the SKA telescope.

Square Kilometer Array Telescope

Page 15

Velocity
One minute on the Internet: 640TB data transferred, 100k tweets, 204 million e-mails sent

Page 16

Veracity: Uncertainty

• An age-old question: “What is truth?”

• Einstein: “The pursuit of truth and beauty is a sphere of activity in which we are permitted to remain children all our lives.”

• Of one thing we can be certain:

Page 17

Roadmap
• What is BIG DATA?
• Why should Conceptual Modeling apply?
• Examples to show how Conceptual Modeling can “come to the rescue”
• Summary (and take-home message):
  – Principles that guide the use of Conceptual Modeling in BIG DATA applications
  – Challenges and Research Opportunities

Page 18

Conceptual Modeling & BIG DATA

• Main thrust: organizing data [Chen, TODS’76]
• And, that’s one of the challenges of BIG DATA … but
  – Volume: too big
  – Variety: too much
  – Velocity: too fast
  – Veracity: too uncertain

Page 19

Looking Backward

select PART-NO, QUANTITY-ON-HAND
where …

Page 20

Looking Forward
• Conceptualization of the Web
  – Semantic search as well as keyword search
  – World-wide knowledge sharing
• Examples:
  – DBpedia
  – Conceptual Graphs
    • Google’s Knowledge Graph
    • Yahoo!’s Web of Objects
    • Facebook’s Graph Search
    • Microsoft’s/Bing’s Satori Knowledge Base
  – Metaweb
  – FamilySearch
• Conceptual Modeling should apply!

Page 21

SELECT ?name ?description_en ?description_de ?musician
WHERE {
  ?musician <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:German_musicians> .
  ?musician foaf:name ?name .
  OPTIONAL {
    ?musician rdfs:comment ?description_en .
    FILTER (LANG(?description_en) = 'en') .
  }
  OPTIONAL {
    ?musician rdfs:comment ?description_de .
    …

Page 22

Google’s Knowledge Graph

Page 23

Yahoo!’s Web of Objects
Yahoo!’s image answer to: What is a food?

Page 24

Facebook’s Graph Search

Page 25

Satori Knowledge Base

Page 26

Metaweb

Boston ?

Page 27

Metaweb

Don’t forget to take Wendy to Boston’s birthday party at 2:00.

Page 28

Metaweb

Don’t forget to take Wendy to Boston’s birthday party at 2:00.

Page 29

Roadmap
• What is BIG DATA?
• Why should Conceptual Modeling apply?
• Examples to show how Conceptual Modeling can “come to the rescue”
• Summary (and take-home message):
  – Principles that guide the use of Conceptual Modeling in BIG DATA applications
  – Challenges and Research Opportunities

Page 30

Visitors per day: 85,000+
Pages viewed per day: 5M+

A service provided by The Church of Jesus Christ of Latter-day Saints. © 2013 by Intellectual Reserve, Inc. All rights reserved.

A free family history web site

Page 31

FamilySearch & the WoK-HD Project
• FamilySearch
  – Volume:
    • 1.8PB+ online (1.2B records along with 900M 2MB jpeg images)
    • 42PB+ offline (1.2B 30–40MB tiff images)
  – Velocity:
    • 500M+ images in 2013
    • 200K+ volunteer indexers
• WoK-HD scanned-book project (within FamilySearch)
  – Volume: 100,000 books (3.5TB expected)
  – Velocity: 25,000 books / year
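
As a rough sanity check, the volume figures above can be reproduced with a few lines of arithmetic. This is only a sketch: decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) are an assumption, and the variable names are illustrative rather than taken from the talk.

```python
MB = 10**6   # decimal megabyte (assumed convention)
TB = 10**12
PB = 10**15

# Online: 900M JPEG images at about 2 MB each.
online_bytes = 900_000_000 * 2 * MB

# Offline: 1.2B TIFF images at 30 to 40 MB each.
offline_low = 1_200_000_000 * 30 * MB
offline_high = 1_200_000_000 * 40 * MB

# WoK-HD: 3.5 TB expected for 100,000 books.
bytes_per_book = 3.5 * TB / 100_000

print(online_bytes / PB)                    # 1.8
print(offline_low / PB, offline_high / PB)  # 36.0 48.0
print(bytes_per_book / MB)                  # 35.0
```

Under these assumptions the 1.8PB+ online figure follows from the JPEG images alone, the 42PB+ offline figure is the midpoint of the 36 to 48 PB range, and a scanned book averages about 35 MB.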

Page 32

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

Page 33

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

Page 34

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

Page 35

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

Page 36

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

Page 37

WoK-HD Construction

• Mitigating Velocity, Variety, & Volume
  – CM-based information extraction
    • PatternReader (for semi-structured text)
    • OntoSoar (for unstructured text)
  – Automated information harvesting & organization
• Assuring Veracity
  – CM-based query processing (with links and reasoning chains for extracted information)
  – Automated analysis with evidence-based CMs

Page 38

PatternReader

THE ELY ANCESTRY. 419
SEVENTH GENERATION.
241213. Mary Eliza Warner, b. 1826, dau. of Samuel Selden Warner
and Azubah Tully; m. 1850, Joel M. Gloyd (who was connected with
Chief Justice Waite's family),
24331 1. Abigail Huntington Lathrop (widow), Boonton, N. J., b.
1810, dau. of Mary Ely and Gerard Lathrop ; m. 1835, Donald McKenzie.
West Indies, who was b. 1812, d. 1839.
(The widow is unable to give the names of her husband's parents.)
Their children
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
243312. William Gerard Lathrop, Boonton, N. J., b. 1812, d. 1882,
son of Mary Ely and Gerard Lathrop; m. 1837, Charlotte Brackett
Jennings, New York City, who was b. 1818, dau. of Nathan Tilestone
Jennings and Maria Miller. Their children:
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
243314. Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865,
son of Mary Ely and Gerard Lathrop ; m. 1856, Mary Augusta Andruss,
992 Broad St., Newark, N. J., who was b. 1825, dau. of Judge Caleb
Halstead Andruss and Emma Sutherland Goble. Mrs. Lathrop died
at her home, 992 Broad St., Newark, N. J., Friday morning, Nov. 4,
1898. The funeral services were held at her residence on Monday, Nov.
7, 1898, at half-past two o'clock P. M. Their children:
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.
Miss Emma Goble Lathrop, official historian of the New York Chapter of the
Daughters of the American Revolution, is one of the youngest members to hold
office, but one whose intelligence and capability qualify her for such distinction.
Miss Lathrop is not without experience; in her present home and native city, Newark,
N. J., she has filled the positions of secretary and treasurer to the Girls'
Friendly Society for nine years, secretary and president of the Woman's Auxiliary
of Trinity Church Parish, treasurer of the St. Catherine's Guild of St. Barnabas
Hospital, and manager of several of Newark's charitable institutions which her
grandparents were instrumental in founding. Miss Lathrop traces her lineage
back through many generations of famous progenitors on both sides. Her maternal
ancestors were among the early settlers of New Jersey, among them John Ogden,
who received patent in 1664 for the purchase of Elizabethtown, and who in 1673 was

Page 39

PatternReader

…
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
…
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
…
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.

Page 40

PatternReader

…
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
…
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
…
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.

Page 41

PatternReader

…
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
…
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
…
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.

OCR Error

“Twins” (lost in OCR)

Page 42

PatternReader

…
#. Aaaa Aaaa, b, 18##, d. 18##.
#. Aaaa Aaaa, b. 18##.
…
#. Aaaa Aaaa, b. 18##, d. 18##.
#. Aaaa Aaaa, b. 18##. ) .
#. Aaaa AaAa, b. 18##, d. 18##. ]
#. Aaaa Aaaa, b. 18##.
#. Aaaa Aaaa, b. 18##.
…
#. Aaaa Aaaa, b. 18##, d. 18##.
#. Aaaa Aaaa, b. 18##, d. 18##.
#. Aaaa Aaaa, b. i8##.
#. Aaaa Aaaa, b. 18##.

^(\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)\.$

^(\d)\.\s(([A-Z][a-z][A-Z][a-z]{5})|([A-Z][a-z]{3,7}))\s([A-Z][a-z]{4,9}),\sb[.,]\s(18\d\d),\sd\.\s(18\d\d)\.$

Conflate symbols and induce grammar
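
The conflation step can be sketched in Python. This is a minimal illustration, not PatternReader's actual algorithm: the rule set (keep the century of a year, mask other digits with #, collapse each capitalized word to Aaaa and mixed-case words to shapes such as AaAa) is inferred from the patterns on this slide, and the function names are hypothetical.

```python
import re
from collections import Counter

# One pass over years, other digit runs, and words of two or more letters.
TOKEN = re.compile(r"[1i]8\d\d|\d+|[A-Za-z]{2,}")

def conflate_token(match):
    tok = match.group(0)
    if re.fullmatch(r"[1i]8\d\d", tok):
        return tok[:2] + "##"          # keep the century, mask the rest
    if tok.isdigit():
        return "#" * len(tok)          # mask list numbers
    # Collapse runs of upper- and lower-case letters: McKenzie -> AaAa
    shape = re.sub(r"[a-z]+", "a", re.sub(r"[A-Z]+", "A", tok))
    return "Aaaa" if shape == "Aa" else shape

def conflate(line):
    return TOKEN.sub(conflate_token, line)

lines = [
    "1. Mary Ely, b, 1836, d. 1859.",
    "2. Gerard Lathrop, b. 1838.",
    "3. Donald McKenzie, b. 1840, d. 1843. ]",
    "3. Theodore Andruss, b. i860.",
    "4. Emma Goble, b. 1862.",
]

# Lines that share a conflated pattern are candidates for one induced rule.
patterns = Counter(conflate(line) for line in lines)
for pattern, count in patterns.items():
    print(count, pattern)
```

Grouping lines by their conflated pattern shows which record shapes recur often enough to justify inducing an expression for them.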

Page 43

PatternReader

…
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
…
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
…
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.

Page 44

PatternReader

…
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
…
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
…
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.

Page 45

Conceptual Modeling—the Backbone

Page 46

Conceptual Modeling—the Backbone

(\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)

Page 47

Conceptual Modeling—the Backbone

(\d)\.\s([A-Z][a-z]{3,7})\s([A-Z][a-z]{4,9}),\sb\.\s([i1]8\d\d)

Page 48

Conceptual Modeling—the Backbone

Page 49

Extraction Ontologies
Linguistically Grounded Conceptual Models

Page 50

Lexical Object-Set Recognizers

BirthDate
  external representation: \b[1][6-9]\d\d\b
  left context: b\.\s
  right context: [.,]
  …
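
One way to read the BirthDate recognizer is as a value pattern guarded by its contexts. The sketch below assumes the left and right contexts behave like lookbehind and lookahead assertions, so only the year itself is returned; that interpretation, and the function name, are assumptions rather than the extraction-ontology implementation.

```python
import re

# Fields copied from the BirthDate recognizer on the slide.
EXTERNAL = r"\b[1][6-9]\d\d\b"
LEFT_CONTEXT = r"b\.\s"
RIGHT_CONTEXT = r"[.,]"

# Assumption: contexts act as lookbehind/lookahead, so only the value is captured.
BIRTH_DATE = re.compile(f"(?<={LEFT_CONTEXT}){EXTERNAL}(?={RIGHT_CONTEXT})")

def birth_dates(text):
    """Return every year that appears in a birth-date context."""
    return BIRTH_DATE.findall(text)

print(birth_dates("2. Gerard Lathrop, b. 1838."))                                # ['1838']
print(birth_dates("William Gerard Lathrop, Boonton, N. J., b. 1812, d. 1882,"))  # ['1812']
```

The death year 1882 is skipped because its left context is "d. " rather than "b. ". An OCR-damaged year such as i860 is also skipped, since the external representation requires a leading 1.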

Page 51

Non-lexical Object-Set Recognizers

Person
  object existence rule: {Name}
  …
Name
  external representation: \b{FirstName}\s{LastName}\b
  …

Page 52

Relationship-Set Recognizers

Person-BirthDate
  external representation: ^\d{1,3}\.\s{Person},\sb\.\s{BirthDate}[.,]
  …
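
The {Person} and {BirthDate} references suggest that a relationship-set recognizer is a template over object-set recognizers. A minimal sketch of that expansion follows. The FirstName and LastName patterns and the `expand` helper are hypothetical, since the slides elide the real definitions; Person is expanded through Name to mirror the object existence rule.

```python
import re

# Hypothetical recognizer table; the slides elide the real definitions ("…").
RECOGNIZERS = {
    "FirstName": r"[A-Z][a-z]+",
    "LastName": r"[A-Z][a-z]+",
    "Name": r"\b{FirstName}\s{LastName}\b",
    "Person": r"{Name}",
    "BirthDate": r"[1][6-9]\d\d",
}

# Relationship-set template copied from the slide.
PERSON_BIRTHDATE = r"^\d{1,3}\.\s{Person},\sb\.\s{BirthDate}[.,]"

def expand(template):
    """Replace each {ObjectSet} reference with a named group, recursively."""
    def substitute(match):
        name = match.group(1)
        return f"(?P<{name}>{expand(RECOGNIZERS[name])})"
    # Only alphabetic names are references, so quantifiers like {1,3} are untouched.
    return re.sub(r"\{([A-Za-z]+)\}", substitute, template)

pattern = re.compile(expand(PERSON_BIRTHDATE))
match = pattern.match("2. Gerard Lathrop, b. 1838.")
print(match.group("Person"), match.group("BirthDate"))  # Gerard Lathrop 1838
```

Named groups keep the link between each matched substring and the object set it populates, which is what lets a match be stored as a Person-BirthDate relationship.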

Page 53

Ontology-Snippet Recognizers

ChildRecord
  external representation: ^(\d{1,3})\.\s+([A-Z]\w+\s[A-Z]\w+)(,\sb\.\s([1][6-9]\d\d))?(,\sd\.\s([1][6-9]\d\d))?\.

Page 54

HMM Recognizers

Page 55

OntoSoar Recognizers

Page 56

OntoSoar Recognizers

+---------------------------------Xp------------------------------+
|         +--------Ost--------+       +-----Js-----+              |
+-Wd-+-Ss-+       +-----A-----+--Mp---+    +---DG--+              |
|    |    |       |           |       |    |       |              |
^  Emma was.v official.a historian.n of   the   NYCDAR            .

“of”(x1,x2)
“NYCDAR”(x2)
“Emma”(x1)
“historian”(x1)
“official”(x1)

Name(“Emma”)
Officer(“historian”)
Organization(“NYCDAR”)
Person–Name(y1,“Emma”)

OntoES
Soar

Person-Officer-Organization(y1,“official historian”,“NYCDAR”)

Page 57

Beyond Extraction
• Canonicalization
• Reasoning
  – Extraction of implied assertions
  – Generation of implied assertions
  – Object identity resolution
• Free-form query processing
• Form-based advanced query processing

All based on Conceptual Modeling

Page 58

Canonicalization for Lexical Object Sets

• “Easter 1832” → JulianDate(1832113)
• JulianDate(1832113) → 22 Apr 1832
• “Sam’l” and “Geo.” → “Samuel” and “George”
• “Boonton, N.J.” → “Boonton, NJ, USA”

• Operations:
  – before(Date1, Date2): Boolean
  – probabilityMale(Name): 0.0..1.0
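
Canonicalization of this kind can be sketched with lookup tables and a date conversion. The tables below hold only the slide's own examples, and the function names are hypothetical. The sketch also assumes that JulianDate(1832113) encodes the year followed by the day of the year, which is consistent with day 113 of 1832 being 22 April.

```python
from datetime import date, datetime

# Hypothetical lookup tables; a real system would need far larger ones.
NAME_ABBREVIATIONS = {"Sam’l": "Samuel", "Geo.": "George"}
PLACES = {"Boonton, N.J.": "Boonton, NJ, USA"}

def canonical_name(name):
    return NAME_ABBREVIATIONS.get(name, name)

def canonical_place(place):
    return PLACES.get(place, place)

def from_julian(julian):
    """Convert a year-plus-day-of-year value such as 1832113 to a calendar date."""
    return datetime.strptime(str(julian), "%Y%j").date()

def before(date1, date2):
    """The before(Date1, Date2) operation over canonical year-plus-day values."""
    return date1 < date2

print(from_julian(1832113))       # 1832-04-22
print(before(1832113, 1838001))   # True
print(canonical_name("Sam’l"))    # Samuel
```

Because the canonical form puts the year before the day number, comparing two values as integers orders them chronologically, which is why `before` needs no date parsing.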


Page 59

Implied Assertions

Author’s View:
Maria Jennings … daughter of … William Gerard Lathrop

Desired View:
Gender: Female
Name:
  GivenName: Maria Jennings
  Surname: Lathrop

Page 60

Implied Assertions

Maria Jennings Lathrop … child of … William Gerard Lathrop … son of … Mary Ely … Female

Mary Ely … grandmother of … Maria Jennings Lathrop
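
The grandmother assertion can be generated by joining child-of facts with a gender fact. The sketch below is a hand-written rule over plain Python sets, not the reasoner used in the project. It treats "son of" as a child-of fact, and the data structures are illustrative.

```python
# Extracted assertions (names taken from the slide).
child_of = {
    ("Maria Jennings Lathrop", "William Gerard Lathrop"),
    ("William Gerard Lathrop", "Mary Ely"),
}
gender = {"Mary Ely": "Female"}

def grandmothers(child_of, gender):
    """Derive (grandmother, grandchild) pairs from child-of and gender facts."""
    derived = set()
    for grandchild, parent in child_of:
        for child, grandparent in child_of:
            # Chain two child-of facts through the shared middle person.
            if child == parent and gender.get(grandparent) == "Female":
                derived.add((grandparent, grandchild))
    return derived

print(grandmothers(child_of, gender))
# {('Mary Ely', 'Maria Jennings Lathrop')}
```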

Page 61

Object Identity Resolution

0.032081

0.032081

0.995030

Page 62

Free-Form Query Processing
Persons born in 1838

Page 63

Free-Form Query Processing
Persons born in 1838

born

Person(s)?

Page 64

Free-Form Query Processing
Persons born in 1838

= 1838

Person    Name                     BirthDate
Person11  Gerard Lathrop McKenzie  1838
Person18  Maria Jennings Lathrop   1838

born

Person(s)?

Page 65

Free-Form Query Processing
Persons born in 1838

Person    Name                     BirthDate
Person11  Gerard Lathrop McKenzie  1838
Person18  Maria Jennings Lathrop   1838

“Gerard Lathrop McKenzie” because:
  Person(Person11) has GivenName(“Gerard Lathrop”)
  and Child(Person11) of Person(Person9)
  and Person(Person9) has Gender(“Male”)
  and Person(Person9) has Surname(“McKenzie”)
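
A reasoning chain like the one above can be produced by recording each fact a rule uses as it fires. The sketch below hard-codes the four facts from the slide. The rule itself (a child takes the surname of a male parent) is an assumption inferred from the chain, and `infer_full_name` is a hypothetical helper.

```python
# Facts from the reasoning chain on the slide.
given_name = {"Person11": "Gerard Lathrop"}
surname = {"Person9": "McKenzie"}
gender = {"Person9": "Male"}
child_of = {"Person11": ["Person9"]}

def infer_full_name(person):
    """Return (full name, justification) or None when the rule does not apply."""
    for parent in child_of.get(person, []):
        # Assumed rule: a child takes the surname of a male parent.
        if gender.get(parent) == "Male" and parent in surname:
            reasons = [
                f'Person({person}) has GivenName("{given_name[person]}")',
                f"Child({person}) of Person({parent})",
                f'Person({parent}) has Gender("Male")',
                f'Person({parent}) has Surname("{surname[parent]}")',
            ]
            return f"{given_name[person]} {surname[parent]}", reasons
    return None

name, reasons = infer_full_name("Person11")
print(f'"{name}" because:')
print("\n and ".join(reasons))
```

Returning the justification alongside the derived value is what lets a query result link back to the evidence for it.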

Page 66

Form-Based Advanced Query Processing
Cousins of Donald Lathrop who died before he was born or were born after he died.

Cousin

Page 67

Form-Based Advanced Query Processing
Cousins of Donald Lathrop who died before he was born or were born after he died.

…
1. Mary Ely, b, 1836, d. 1859.
2. Gerard Lathrop, b. 1838.
…
1. Maria Jennings, b. 1838, d. 1840.
2. William Gerard, b. 1840. ) .
3. Donald McKenzie, b. 1840, d. 1843. ]
4. Anna Margaretta, b. 1843.
5. Anna Catherine, b. 1845.
…
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.
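
Once the child lists are extracted and canonicalized, the cousin query reduces to a filter over birth and death years. The sketch below enters the three lists by hand. It assumes that the three parents are siblings (each is recorded as a child of Mary Ely and Gerard Lathrop), that comparison at year granularity is acceptable, and that i860 has been corrected to 1860. The function name is hypothetical.

```python
# Child lists from the slide, keyed by parent; years canonicalized (i860 -> 1860).
# None marks a death year that the text does not give.
children = {
    "Abigail Huntington Lathrop": [
        ("Mary Ely", 1836, 1859), ("Gerard Lathrop", 1838, None)],
    "William Gerard Lathrop": [
        ("Maria Jennings", 1838, 1840), ("William Gerard", 1840, None),
        ("Donald McKenzie", 1840, 1843), ("Anna Margaretta", 1843, None),
        ("Anna Catherine", 1845, None)],
    "Charles Christopher Lathrop": [
        ("Charles Halstead", 1857, 1861), ("William Gerard", 1858, 1861),
        ("Theodore Andruss", 1860, None), ("Emma Goble", 1862, None)],
}

def cousins_outside_lifetime(parent, name):
    """Cousins who died before the person was born or were born after the person died."""
    _, born, died = next(c for c in children[parent] if c[0] == name)
    result = []
    for other_parent, kids in children.items():
        if other_parent == parent:
            continue  # children of the same parent are siblings, not cousins
        for cousin, c_born, c_died in kids:
            died_before = c_died is not None and c_died < born
            born_after = died is not None and c_born > died
            if died_before or born_after:
                result.append(cousin)
    return result

print(cousins_outside_lifetime("William Gerard Lathrop", "Donald McKenzie"))
# ['Charles Halstead', 'William Gerard', 'Theodore Andruss', 'Emma Goble']
```

With Donald born in 1840 and dead by 1843, no cousin died before his birth, and the four children of Charles Christopher Lathrop were all born after his death.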

Page 68

Veracity

• Knowledge
  – Populated conceptual model
  – Plato: “justified true belief”
• FamilySearch
  – Conceptual model of reality
  – Constraint violation (discovery)
  – Assertion verification (evidence)
• Conceptual modeling for veracity

Mitigating Uncertainty with Conceptual Modeling

Page 69

Veracity: “Justified True Belief”
Persons born in 1838

Person    Name                     BirthDate
Person11  Gerard Lathrop McKenzie  1838
Person18  Maria Jennings Lathrop   1838

“Gerard Lathrop McKenzie” because:
  Person(Person11) has GivenName(“Gerard Lathrop”)
  and Child(Person11) of Person(Person9)
  and Person(Person9) has Gender(“Male”)
  and Person(Person9) has Surname(“McKenzie”)

Page 70

FamilySearch: Wiki-like Updates + Uncertain Information

Sources of error:
1. Incorrect person merges
2. Incorrect parent-child relationship assertions

Cyclic Pedigree:
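
A cyclic pedigree violates the constraint that nobody is their own ancestor, so it can be discovered with ordinary cycle detection on the parent-child graph. The depth-first search below is a generic sketch. The function name and the three-person pedigree are hypothetical, and nothing here is taken from FamilySearch's own tooling.

```python
def find_cycle(parents):
    """Return a list of persons forming an ancestry cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(person, path):
        color[person] = GREY
        path.append(person)
        for parent in parents.get(person, []):
            state = color.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        color[person] = BLACK
        return None

    for person in list(parents):
        if color.get(person, WHITE) == WHITE:
            cycle = visit(person, [])
            if cycle:
                return cycle
    return None

# Hypothetical data: a bad merge makes C both an ancestor and a descendant of A.
pedigree = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_cycle(pedigree))                   # ['A', 'B', 'C', 'A']
print(find_cycle({"A": ["B"], "B": ["C"]}))   # None
```

The returned path names every person involved, which gives a reviewer the specific merge or parent-child assertion to re-examine.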

Page 71

FamilySearch: Useful More Expressive Conceptual Model Specifications

1:*1:2.1:*

x2 Nov 1846

1 Nov 1845

p = 0.79

p = 0.35

Page 72

Evidence-Based Conceptual Modeling
(1) Model Reality, (2) Allow/Discover Discrepancies, (3) Add Evidence

1:*1:2.1:*

x2 Nov 1846

1 Nov 1845

p = 0.79

p = 0.35

Page 73

Roadmap
• What is BIG DATA?
• Why should Conceptual Modeling apply?
• Examples to show how Conceptual Modeling can “come to the rescue”
• Summary (and take-home message):
  – Principles that guide the use of Conceptual Modeling in BIG DATA applications
  – Challenges and Research Opportunities

Page 74

Principles that guide the use of Conceptual Modeling in BIG DATA

• Harvest wrt a conceptual model
  – Extraction ontologies
  – And …
• Organize wrt a conceptual model
  – Rich conceptualizations
  – And …
• Analyze wrt a conceptual model
  – Evidence-based reasoning
  – And …

Page 75

More Examples of Conceptual Modeling in BIG DATA Applications

• Knowledge Bundle Building for Research Studies (KBB)
• Multi-Lingual Query Processing (ML-OntoES)
• Table Understanding (TISP, Table Ontology)
• Automating Ontology Creation (TANGO)
• Automated Reading (OntoSoar)
• Homeland Security
• Twitter Suicide Study
• Human Genome Project

Dream! Think Big! Contribute!

Knowledge Bundle Building (i.e., Construct and Populate CMs)

Example: Bio-Medical Research

• Objective: Study the association of:
  – TP53 polymorphism and
  – Lung cancer
• Task: locate, gather, organize data from:
  – Single Nucleotide Polymorphism database
  – Medical journal articles
  – Medical-record database
  – Radiology images and reports

Form-Based Extraction Ontologies Gather SNP Information from the NCBI dbSNP Repository

Linguistically Grounded Conceptual Models Search PubMed Literature

Reverse-Engineer Human Subject Information from INDIVO into a Conceptual Model

Add Annotated Images into the Conceptual Knowledge Bundle

Radiology Report (John Doe, July 19, 12:14 pm)

Query and Analyze Data in the Knowledge Bundle

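Multi-Lingual Query Processing

[Figure: multi-lingual query interface with language selectors 한국어 (Korean) and français (French). French query: "Honda moins de 8000 en «excellent état»" (Honda under 8000 in "excellent condition"). French result labels: marque (make) Honda, prix (price) 7826€, mots de clé (keywords) Honda (2). Constraint label: 8000€. Korean ontology labels: 자동차 (car), 색상 (color), 주행거리 (mileage), 제조사 (manufacturer), 모델 (model), 등급 (trim level), 액세서리 (accessory), 변속기 (transmission), 차종 (vehicle type), 모델등급 (model trim level), 엔진 (engine), 특징 (feature), 연식 (model year), 가격 (price).]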
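One way to picture the idea on this slide is that a query in one language is first mapped to language-neutral concepts and constraints, which can then be shown or matched in another language. The sketch below is only an illustration under that assumption and is not the ML-OntoES implementation. The recognizers, the list of makes, and the pairing of "marque" with "제조사" are hypothetical. The labels themselves come from the slide.

```python
import re

# Language-neutral concepts with per-language labels (labels from the slide;
# pairing them up this way is an assumption).
LABELS = {
    "Make": {"fr": "marque", "ko": "제조사"},
    "Price": {"fr": "prix", "ko": "가격"},
}

# Hypothetical French recognizers; a real extraction ontology would be richer.
KNOWN_MAKES = {"honda", "toyota", "renault"}
LESS_THAN = re.compile(r"moins de\s+(\d+)", re.IGNORECASE)


def interpret_french_query(query):
    """Map a free-form French query to {concept: (operator, value)}."""
    constraints = {}
    for token in re.findall(r"\w+", query.lower()):
        if token in KNOWN_MAKES:
            constraints["Make"] = ("=", token.capitalize())
    match = LESS_THAN.search(query)
    if match:
        constraints["Price"] = ("<", int(match.group(1)))
    return constraints


def relabel(constraints, lang):
    """Show the same constraints under the labels of another language."""
    return {LABELS[concept][lang]: test for concept, test in constraints.items()}


constraints = interpret_french_query("Honda moins de 8000 en «excellent état»")
korean_view = relabel(constraints, "ko")
```

The quoted phrase «excellent état» is ignored in this sketch. Handling it would need a keyword concept, which is left out to keep the example short.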

Table Understanding

• Tables on the web
  – 14.1 billion HTML tables [Cafarella et al. 08]
    • Most are tables for layout
    • 154 million high-quality relational tables
  – 50 million spreadsheet tables [Adelfio & Samet 13]
• Web table complexity (sampling statistics) [ibid]
  – Simple relational table: 25% (spreadsheet), 68% (HTML)
  – Multiple header rows: 15% (spreadsheet), 7% (HTML)
  – More complex: 60% (spreadsheet), 25% (HTML)

Table Understanding

     A        B         C              D          E
                        Less than 100  100-299 p  300 pupils or more
1    Schools  2003/04   36.2           39         24.8
2             2004/05   35.2           39         25.8
3             2005/06   35.2           39         25.8
4             2006/07   34.3           40         25.7
5             2007/08   34             39.6       26.4
6             2008/09   33.3           40         26.7
7             2009/10¹  32             40.7       27.3
8    Pupils   2003/04   8.7            39.3       52
9             2004/05   8.7            38.3       53
10            2005/06   8.8            38.3       52.9
11            2006/07   8.4            39         52.6
12            2007/08   8.3            38.2       53.5
12            2008/09   8.1            38.2       53.7
12            2009/10¹  7.7            38.2       54.1
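Understanding a table like this one means recovering, for every data cell, the full set of labels that index it. Here the Schools/Pupils label appears only on the first row of its group, so it has to be carried down. The sketch below shows that single step on the first two rows of each group. It is a simplified illustration and not the TISP or Table Ontology algorithm. The function name and the tuple layout are assumptions.

```python
# Column headers and a few rows copied from the slide's table.
# An empty first cell means "same category as the row above".
HEADER = ["Less than 100", "100-299 p", "300 pupils or more"]
ROWS = [
    ("Schools", "2003/04", 36.2, 39, 24.8),
    ("", "2004/05", 35.2, 39, 25.8),
    ("Pupils", "2003/04", 8.7, 39.3, 52),
    ("", "2004/05", 8.7, 38.3, 53),
]


def table_to_facts(header, rows):
    """Turn each data cell into a (category, year, size_band, value) fact."""
    facts = []
    category = None
    for label, year, *values in rows:
        if label:
            category = label  # a new group starts here
        for size_band, value in zip(header, values):
            facts.append((category, year, size_band, value))
    return facts


facts = table_to_facts(HEADER, ROWS)
```

Each resulting fact is self-describing, so it can populate a conceptual model without reference to the original layout.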

Automating Ontology Creation with TANGO

Agglomeration          Population  Continent          Country
Tokyo                  31,139,900  Asia               Japan
New York-Philadelphia  30,286,900  The Americas       United States of America
Mexico                 21,233,900  The Americas       Mexico
Seoul                  19,969,100  Asia               Korea (South)
Sao Paulo              18,847,400  The Americas       Brazil
Jakarta                17,891,000  Asia               Indonesia
Osaka-Kobe-Kyoto       17,621,500  Asia               Japan
…                      …           …                  …
Niigata                503,500     Asia               Japan
Raurkela               503,300     Asia               India
Homjel                 502,200     Europe             Belarus
Zunyi                  501,900     Asia               China
Santiago               501,800     The Americas       Dominican Republic
Pingdingshan           501,500     Asia               China
Fargona                501,000     Asia               Uzbekistan
Kirov                  500,200     Europe             Russia
Newcastle              500,000     Australia/Oceania  Australia

[Diagram: mini-ontology derived from the table, with object sets Agglomeration, Population, Country, and Continent.]
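One clue that could help turn such a table into an ontology fragment is which columns functionally determine which others: every agglomeration has one population, and every country lies in one continent. The sketch below checks such dependencies on a few rows copied from the table. It stands in for whatever TANGO actually does, which the slide does not describe, and the function name is hypothetical.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if column lhs functionally determines column rhs in rows."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows copied from the slide's table.
ROWS = [
    {"Agglomeration": "Tokyo", "Population": 31139900,
     "Continent": "Asia", "Country": "Japan"},
    {"Agglomeration": "Seoul", "Population": 19969100,
     "Continent": "Asia", "Country": "Korea (South)"},
    {"Agglomeration": "Sao Paulo", "Population": 18847400,
     "Continent": "The Americas", "Country": "Brazil"},
    {"Agglomeration": "Osaka-Kobe-Kyoto", "Population": 17621500,
     "Continent": "Asia", "Country": "Japan"},
]

country_determines_continent = holds_fd(ROWS, "Country", "Continent")
continent_determines_country = holds_fd(ROWS, "Continent", "Country")
agglomeration_determines_population = holds_fd(ROWS, "Agglomeration", "Population")
```

On a sample this small some dependencies hold by accident, so a check like this only suggests candidate relationships and does not confirm them.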

Automating Ontology Creation with TANGO

[Figure: Merge and Results. One mini-ontology has the object sets Agglomeration, Population, Country, and Continent. Another has Time, Location, Longitude, Latitude, Country, City, Name, and Geopolitical Entity, with the relationship labels "has names", "Has GMT", and "Latitude and longitude designates location". The merged result shows Continent, Location, Longitude, Latitude, Name, Geopolitical Entity, Population, City, Agglomeration, Country, and Time, with the labels "Has GMT" and "Latitude and longitude designates location".]

Automated Reading: NELL

http://rtw.ml.cmu.edu

Automated Reading: OntoSoar

• Populate conceptual model from text
  – Directly
  – By inference
• Augment conceptual model and populate

Homeland Security: Terrorist Example

[Figure: diagram with the labels "Abu Aziz", "?", and "White House", and the callout "What If!"]

Twitter Suicide Study

Tweets could warn of a suicide risk, BYU study says

Oct. 10 2013

… Over three months, the computer scientists screened millions of tweets and identified 37,717 that were "genuinely troubling" from 28,088 unique users …

Conceptual Modeling for Studying the Human Genome

Principles that guide the use of Conceptual Modeling in BIG DATA

• Harvest wrt a conceptual model
  – Extraction ontologies
  – And: table understanding, automated reading, …
• Organize wrt a conceptual model
  – Rich conceptualizations
  – And: KBs for research studies, multilingual web, …
• Analyze wrt a conceptual model
  – Evidence-based reasoning
  – And: “what-if”, warning signs search, DNA, …

Summary & Challenge

• Conceptual Modeling Applies to BIG DATA (perhaps more than you might have thought)

• Challenge: find ways to use conceptual modeling to “rescue”—resolve BIG DATA issues

BYU Data Extraction Research Group
www.deg.byu.edu
