Linguistic Linked Open Data (LLOD): Challenges, Approaches, Future Work
Sebastian Hellmann, TKE 2016

Apr 16, 2017

Page 1: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Linguistic Linked Open Data (LLOD)

Challenges, Approaches, Future Work

Sebastian Hellmann
TKE 2016

Page 2: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Sebastian Hellmann - AKSW/KILT Copenhagen TKE 2016

AKSW / KILT in Leipzig

Leipzig has become one of the largest Semantic Web centers.

AKSW has 4 subgroups and 45 PhD students: http://aksw.org/Team.html

Current position:

- Head of the AKSW / KILT research group (8 PhD students)
  - Knowledge Integration and Language Technology (KILT): http://aksw.org/Groups/KILT.html
- Project manager for 2 H2020 projects and 1 German research project (BMWi)
  - http://freme-project.eu/ , http://aligned-project.eu/ , http://smartdataweb.de/
- Executive Director of the DBpedia Association: http://dbpedia.org

Page 3: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Outline

● The vision behind Linked Data - a technological introduction
● Linguistic Linked Open Data
● Knowledge Modelling vs. Data Encoding
● LIDER
● Challenges and Approaches

Page 4: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Linked Data


Page 5: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Web of Data

WWW vs. GGG - https://en.wikipedia.org/wiki/Giant_Global_Graph

Data on the Web vs. the Web of Data vs. the Semantic Web

RDF - Entity Attribute Value - http://dbpedia.org/resource/Copenhagen

Three ways to publish RDF:

1. Linked Data: resource-level access via HTTP request (next slide)
2. SPARQL: query access via triplestore database
3. Dump: dataset-level access via bulk download
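The entity-attribute-value idea and the three access paths can be sketched in Python. This is only an illustration and not part of the original slides: the endpoint URL https://dbpedia.org/sparql, the dump URL on example.org and the helper names sparql_request_url and access_paths are assumptions, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# An RDF statement is an entity-attribute-value triple (subject, predicate, object).
COPENHAGEN = "http://dbpedia.org/resource/Copenhagen"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
triple = (COPENHAGEN, RDFS_LABEL, '"Copenhagen"@en')


def sparql_request_url(endpoint: str, query: str) -> str:
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def access_paths(resource: str, endpoint: str, dump_url: str) -> dict:
    """Return one URL per publication style for the same resource."""
    query = f"SELECT ?p ?o WHERE {{ <{resource}> ?p ?o }} LIMIT 10"
    return {
        "linked_data": resource,                        # 1. dereference the URI itself
        "sparql": sparql_request_url(endpoint, query),  # 2. query a triplestore
        "dump": dump_url,                               # 3. download the whole dataset
    }


paths = access_paths(
    COPENHAGEN,
    "https://dbpedia.org/sparql",               # assumed endpoint
    "http://example.org/dumps/dataset.ttl.gz",  # hypothetical dump location
)
for name, url in paths.items():
    print(name, url)
```

The same triple is reachable in all three ways; they differ only in granularity (one resource, one query result, the whole dataset).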

Page 6: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Linked Data

Four rules of https://www.w3.org/DesignIssues/LinkedData

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.

https://en.wikipedia.org/wiki/Copenhagen vs. http://dbpedia.org/resource/Copenhagen

Source: https://www.w3.org/DesignIssues/LinkedData.html
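Rules 2 and 3 amount to an HTTP lookup that asks for RDF rather than an HTML page. A minimal sketch, assuming the server honours an Accept header of text/turtle; the helper name linked_data_request is hypothetical, and the request is only prepared, not sent:

```python
from urllib.request import Request


def linked_data_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = linked_data_request("http://dbpedia.org/resource/Copenhagen")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(req).read()
```

The Wikipedia URL names a web page about the city, while the DBpedia URI names the city itself and can return data about it.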

Page 7: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Open Data != Open Data

Open Access vs. Open License

Open Access means accessible like a web page (often unclear license)

http://opendefinition.org by OKFN:

“Knowledge is open if anyone is free to access, use, modify, and share it — subject, at most, to measures that preserve provenance and openness.”


Page 8: Linguistic Linked Open Data, Challenges, Approaches, Future Work


http://lod-cloud.net/

Page 9: Linguistic Linked Open Data, Challenges, Approaches, Future Work

How is the Linked Data Cloud built?

- Open Access as the basis
- 50 links between things required to receive a dataset link
- http://lov.okfn.org
- http://datahub.io
- Assessing Quantity and Quality of Links Between Linked Data Datasets by Ciro Baron Neto, Dimitris Kontokostas, Sebastian Hellmann, Kay Müller, and Martin Brümmer in LDOW 2016 http://events.linkeddata.org/ldow2016/papers/LDOW2016_paper_09.pdf
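The 50-link rule can be illustrated with a small counting sketch. It is not the method of the cited paper: datasets are identified here simply by a namespace prefix, and the names dataset_of, count_links and dataset_links as well as the example.org namespaces are assumptions.

```python
from collections import Counter

MIN_LINKS = 50  # threshold named on the slide


def dataset_of(uri, namespaces):
    """Return the dataset whose namespace prefixes the URI, or None."""
    for name, prefix in namespaces.items():
        if uri.startswith(prefix):
            return name
    return None


def count_links(triples, namespaces):
    """Count triples whose subject and object belong to different datasets."""
    links = Counter()
    for s, _p, o in triples:
        src, dst = dataset_of(s, namespaces), dataset_of(o, namespaces)
        if src and dst and src != dst:
            links[(src, dst)] += 1
    return links


def dataset_links(triples, namespaces, min_links=MIN_LINKS):
    """Keep only dataset pairs that reach the link threshold."""
    counts = count_links(triples, namespaces)
    return {pair for pair, n in counts.items() if n >= min_links}


namespaces = {"a": "http://a.example.org/", "b": "http://b.example.org/"}
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
triples = [(f"http://a.example.org/r{i}", SAME_AS, f"http://b.example.org/r{i}")
           for i in range(60)]
triples += [(f"http://b.example.org/r{i}", SAME_AS, f"http://a.example.org/r{i}")
            for i in range(3)]
print(dataset_links(triples, namespaces))
```

In this toy input only the direction from dataset a to dataset b reaches the threshold, so only that edge would be drawn.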

Page 10: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Linguistic Linked Open Data


Page 11: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Linguistic Linked Open Data

● Movement originated in the context of the Working Group for Open Data in Linguistics (OWLG) at the Open Knowledge Foundation (OKFN)
● Open is supposed to mean Open license
● Join the community mailing list at http://linguistics.okfn.org/
● Current information at http://linguistic-lod.org/ maintained by John McCrae -> Instructions on how to join the LLOD cloud

Page 12: Linguistic Linked Open Data, Challenges, Approaches, Future Work

January 2011


Page 13: Linguistic Linked Open Data, Challenges, Approaches, Future Work


February 2012

Linked Data in Linguistics. Representing Language Data and Metadata (http://www.springer.com/computer/ai/book/978-3-642-28248-5 ) Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg, (2012)

Page 14: Linguistic Linked Open Data, Challenges, Approaches, Future Work

August 2012


Page 15: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Sept 2012

MLODE

Special Issue on Multilingual Linked Open Data (MLOD). Editors: Sebastian Hellmann, Steven Moran, Martin Brümmer, and John McCrae. Semantic Web, vol. 6, no. 4, pp. 315-317, 2015

Page 16: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Jan 2013


Page 17: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Sep 2013

LIDER FP7 EU Project. Start: Nov 2013. Duration: 2 years. http://lider-project.eu/

Page 18: Linguistic Linked Open Data, Challenges, Approaches, Future Work

May 2014


Page 19: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Nov 2014


Page 20: Linguistic Linked Open Data, Challenges, Approaches, Future Work

May 2015


Page 21: Linguistic Linked Open Data, Challenges, Approaches, Future Work

May 2016


Page 22: Linguistic Linked Open Data, Challenges, Approaches, Future Work


Page 23: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Should we all use Linked Data?


Page 24: Linguistic Linked Open Data, Challenges, Approaches, Future Work


Should we all use Linked Data?

When should we use linked data?

How should we use linked data?

When should we not use it?


Page 25: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Knowledge Modeling vs. Data Encoding


Page 26: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Entity Relationship Diagrams and UML

The Metadata Ecosystem of the DataId Ontology, Markus Freudenberg, submitted to MTSR Conf 2016

http://dataid.dbpedia.org

Page 27: Linguistic Linked Open Data, Challenges, Approaches, Future Work

XML encoding variants

Page 28: Linguistic Linked Open Data, Challenges, Approaches, Future Work

XML encoding variants

Page 29: Linguistic Linked Open Data, Challenges, Approaches, Future Work

XML encoding variants

<same> should be symmetric, reflexive and transitive: https://en.wikipedia.org/wiki/Equivalence_relation

Apples and oranges
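Whether a "same" relation really behaves like an equivalence relation can be checked mechanically. A minimal sketch over a finite set of pairs; the function name is_equivalence and the fruit data are illustrative assumptions, not taken from the slide's XML:

```python
from itertools import product


def is_equivalence(pairs, elements):
    """Check reflexivity, symmetry and transitivity of a finite relation."""
    rel = set(pairs)
    reflexive = all((x, x) in rel for x in elements)
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all(
        (x, z) in rel
        for (x, y), (w, z) in product(rel, rel)
        if y == w
    )
    return reflexive and symmetric and transitive


elements = {"apple", "orange", "pear"}
identity = {(x, x) for x in elements}

# Declared in one direction only: symmetry is violated.
one_way = identity | {("apple", "orange")}
# Declared in both directions: all three properties hold.
both_ways = one_way | {("orange", "apple")}

print(is_equivalence(one_way, elements), is_equivalence(both_ways, elements))
```

An encoding that records "same" in one direction only leaves it to every consumer to decide whether the reverse direction is implied.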

Page 30: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Who can you ask what XML tags and structure mean and what they are used for?

Page 31: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Who can you ask what XML tags and structure mean and what they are used for?

Page 32: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Internationalization Tag Set (ITS) 2.0

http://www.w3.org/TR/its20/

● W3C Recommendation since 29 October 2013
● Defines how to embed Machine Translation and Localisation annotations, so-called Data Categories, in (X)HTML and XML
● In addition to the human-readable document, two ontologies are referenced that capture the semantics of the standard
● ITS Ontology as companion
● NLP Interchange Format (NIF) is the recommended format for RDF conversion of ITS 2.0: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core

Page 33: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Internationalization Tag Set (ITS) 2.0

One of the most efficient and robust ways to annotate HTML in a standardized manner

Page 34: Linguistic Linked Open Data, Challenges, Approaches, Future Work

NLP Interchange Format 2.0 (old example)

Page 35: Linguistic Linked Open Data, Challenges, Approaches, Future Work

NLP Interchange Format 2.0 (old example)

Page 36: Linguistic Linked Open Data, Challenges, Approaches, Future Work

NIF 2.1 release pending

Join the W3C Community Group: https://www.w3.org/community/ld4lt/

NIF is useful for:

● Adding semantics to NLP tool output and corpora
● Providing and publishing identifiers for text and annotations

NIF is compact and scalable (cf. http://wiki-link.nlp2rdf.org/ ):

● Google Wikilinks Corpus with 10.6 million webpages and 31.5 million Wikipedia links (about 3 per page) with a zipped size of 180 GB
● 533 million triples (other formats 7-27% more)
● 79 GB (12 GB gzipped dumps) in Turtle format (original size 180 GB containing HTML markup)
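How NIF provides identifiers for text and annotations can be sketched as follows. This is an illustration under assumptions, not the example from the slides: it uses offset-based fragment identifiers of the form #char=begin,end and a handful of NIF core property names written as prefixed strings; the document URI on example.org and the helper names char_uri and annotate are hypothetical.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def char_uri(doc_uri, begin, end):
    """Offset-based identifier for a substring of the document text."""
    return f"{doc_uri}#char={begin},{end}"


def annotate(doc_uri, text, surface):
    """Return triples identifying the whole text and one mention inside it."""
    begin = text.index(surface)  # first occurrence; raises ValueError if absent
    end = begin + len(surface)
    context = char_uri(doc_uri, 0, len(text))
    mention = char_uri(doc_uri, begin, end)
    return [
        (context, "rdf:type", "nif:Context"),
        (context, "nif:isString", text),
        (mention, "nif:referenceContext", context),
        (mention, "nif:beginIndex", begin),
        (mention, "nif:endIndex", end),
        (mention, "nif:anchorOf", surface),
    ]


text = "Copenhagen is the capital of Denmark."
triples = annotate("http://example.org/doc1", text, "Denmark")
for t in triples:
    print(t)
```

Because the mention has its own URI, any tool can attach further annotations (for example an entity link) to it without touching the text.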

Page 37: Linguistic Linked Open Data, Challenges, Approaches, Future Work

LIDER: Towards a linguistic linked data ecosystem

Website: http://lider-project.eu
Guidelines: http://lider-project.eu/?q=guidelines

Page 38: Linguistic Linked Open Data, Challenges, Approaches, Future Work

NIF

Page 39: Linguistic Linked Open Data, Challenges, Approaches, Future Work

LIDER - Deliverable 2.1.2

http://www.lider-project.eu/sites/default/files/D2.1.2-Phase-II.pdf

Page 40: Linguistic Linked Open Data, Challenges, Approaches, Future Work

LIDER Reference Architecture Deliverable 3.1.2

General:

lemon - developed by

http://www.lider-project.eu/sites/default/files/D3.1.2-v2.0.pdf

Page 41: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Challenges and Work in Progress


Page 42: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Identifier management

- Ideal identifiers are stable, i.e. the meaning behind the URI does not change
- Unrealistic for most use cases
- Easier for individuals, e.g. persons, organisations
- Non-trivial for terminology

Proposals:

1. Apply software development practices, e.g. versioning, update scripts: http://vocol.org , http://github.org , http://aligned-project.eu
2. ??

Page 43: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Knowledge Fusion

- Linking is mostly done manually
- Linking 200 datasets pairwise requires maintenance of on the order of 40,000 mappings (200 × 200)
- Adding datasets one after the other makes the result depend on the merge order
- Ideally we would be able to structure all datasets into clusters before linking

Proposals:

1. Under discussion with Erhard Rahm: The Case for Holistic Data Integration, ADBIS 2016 Keynote: http://adbis2016.vsb.cz/keynote/ (to appear)
2. Apply software development processes: https://github.com/dbpedia/links
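The size of the pairwise linking problem is simple arithmetic. The sketch below reproduces the figure of 40,000 as 200 × 200 and, for comparison, shows stricter ways of counting; the hub-and-spoke count is an added assumption that illustrates why grouping datasets before linking helps, and the function name mapping_counts is hypothetical.

```python
from math import comb


def mapping_counts(n):
    """Number of mappings to maintain for n datasets under different schemes."""
    return {
        "n_squared": n * n,             # upper bound, every ordered pair incl. self
        "directed_pairs": n * (n - 1),  # every ordered pair of distinct datasets
        "unordered_pairs": comb(n, 2),  # one mapping per pair of datasets
        "hub_and_spoke": n - 1,         # every dataset mapped to one hub (assumption)
    }


for scheme, count in mapping_counts(200).items():
    print(f"{scheme}: {count}")
```

Whichever way the pairs are counted, the pairwise effort grows quadratically with the number of datasets, while linking through a shared hub grows linearly.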

Page 44: Linguistic Linked Open Data, Challenges, Approaches, Future Work

The Metadata Challenge

Where to publish metadata for your data?

- Barrier between data and dataset description
- Stale metadata
- Single point of truth missing
- Metadata too heterogeneous
- Download link missing
- No (sufficiently) complete view over the web of data possible, discovery failure

Proposals:

1. Build an index: http://linghub.lider-project.eu/ (Clarin, LRE Map, Metashare, Datahub)
2. Create a better schema: http://dataid.dbpedia.org and provide benefits for complying
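Two of the problems listed above, stale metadata and missing download links, can be detected automatically once metadata is machine-readable. A minimal sketch over a plain dictionary; the field names download_url, license and modified are hypothetical and do not follow DataID or any other schema named on the slide.

```python
from datetime import date


def metadata_problems(record, today, max_age_days=365):
    """Return a list of human-readable problems found in one dataset record."""
    problems = []
    if not record.get("download_url"):
        problems.append("download link missing")
    if not record.get("license"):
        problems.append("license missing")
    modified = record.get("modified")
    if modified is None:
        problems.append("modification date missing")
    elif (today - modified).days > max_age_days:
        problems.append("stale metadata")
    return problems


record = {
    "title": "Example lexicon",      # hypothetical dataset
    "license": "CC-BY-4.0",
    "download_url": "",              # empty: nothing to download
    "modified": date(2013, 5, 1),
}
print(metadata_problems(record, today=date(2016, 6, 1)))
```

An index that aggregates several repositories could run such checks on every harvested record.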

Page 45: Linguistic Linked Open Data, Challenges, Approaches, Future Work

MMoOn

- LIDER
  - Lemon
  - ODRL
  - Olia
  - NIF
- Morphology quite complex
- Specific to language and to the linguist
- http://mmoon.org

Page 46: Linguistic Linked Open Data, Challenges, Approaches, Future Work

The Metadata Challenge 2

● RDF structure is too simple to keep additional metadata
  ○ Scope
  ○ Validity
  ○ Confidence
  ○ Technical metadata, e.g. collection time

Contextualisation is probably already better researched in lexicography than in the Semantic Web.
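The limitation can be made concrete: a bare triple has three slots and no place for scope, validity, confidence or collection time. One possible workaround, sketched here as a plain Python record and not as an RDF mechanism, is to wrap each triple in a statement object that carries this context; the class name Statement, its fields and the ex: identifiers are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A triple plus the contextual metadata a bare triple cannot hold."""
    subject: str
    predicate: str
    obj: str
    scope: Optional[str] = None         # e.g. domain or register
    valid_from: Optional[date] = None   # validity interval (open if None)
    valid_until: Optional[date] = None
    confidence: float = 1.0
    collected: Optional[date] = None    # technical metadata

    def triple(self):
        """Drop the context and return the plain triple."""
        return (self.subject, self.predicate, self.obj)

    def valid_on(self, day):
        """True if the statement holds on the given day."""
        after_start = self.valid_from is None or self.valid_from <= day
        before_end = self.valid_until is None or day <= self.valid_until
        return after_start and before_end


stmt = Statement(
    "ex:term1", "ex:preferredLabel", "linked data",  # hypothetical identifiers
    scope="computer science",
    valid_from=date(2006, 1, 1),
    confidence=0.9,
    collected=date(2016, 6, 1),
)
print(stmt.triple(), stmt.valid_on(date(2016, 6, 1)))
```

Calling triple() shows exactly what is lost when only the three-part statement is published.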

Page 47: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Future work and take home messages


Page 48: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Data quality and verification

● Data quality can be defined and measured with tools.
● Test-driven Evaluation of Linked Data Quality by Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali J. Zaveri in Proceedings of the 23rd International Conference on World Wide Web: http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
● Current standard:
  ○ https://www.w3.org/TR/shacl/
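The idea of test-driven data quality can be shown in a few lines: each test is a function that returns the resources violating a constraint, and an empty result means the test passes. This is a plain-Python sketch of the idea only; it is neither the tool from the cited paper nor SHACL, and the names missing_property and run_tests and the ex: data are assumptions.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def missing_property(triples, required):
    """Return subjects that have no value for the required property."""
    subjects = {s for s, _p, _o in triples}
    having = {s for s, p, _o in triples if p == required}
    return sorted(subjects - having)


def run_tests(triples, tests):
    """Run every named test and collect the violating resources."""
    return {name: check(triples) for name, check in tests.items()}


triples = [
    ("ex:term1", RDFS_LABEL, "linked data"),
    ("ex:term1", "ex:language", "en"),
    ("ex:term2", "ex:language", "en"),
]
tests = {
    "every resource has a label": lambda t: missing_property(t, RDFS_LABEL),
    "every resource has a language": lambda t: missing_property(t, "ex:language"),
}
report = run_tests(triples, tests)
for name, violations in report.items():
    print("PASS" if not violations else "FAIL", name, violations)
```

Constraint languages express such tests declaratively, so they can be shared and versioned together with the data.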

Page 49: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Open licenses in research

Are you willing to publish your data under an open license?

Can you make a product out of your data?

No

Yes

Start

Congratulations, your paper has been accepted

Yes

Good luck, we wish you all the best and a high profit

No

Page 50: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Entity Linking Verification - new translator job profile

● http://www.freme-project.eu/
● Business Case: Integrating semantic enrichment into multilingual content in translation and localisation
● In the future, translators and lexicographers might be asked to judge entity linking and verify data

Page 51: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Should I invest in publishing linked data?

Long-term data strategy, if you:

● Have many expected inbound links
● Persistent ids
● Long term hosting and curation

are no problem for you

-> yes (data value increases)

One-time thing:

● Interest of externals only in the yellow zone

-> Publish under open license (let someone else do it)

Page 52: Linguistic Linked Open Data, Challenges, Approaches, Future Work

DBpedia Association

DBpedia+

● Maintain identifier space
● Add open and member data to DBpedia+
● Add data following the LIDER guidelines
● Ability to add your backlinks

DBpedia Community meeting on the 15th of September in Leipzig

Page 53: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Events in 2016

● KEKI 2016 Workshop - Uses of Linguistic Linked Open Data http://keki2016.linguistic-lod.org/ Deadline is 1st of July, but might be extended
● http://2016.semantics.cc

Page 54: Linguistic Linked Open Data, Challenges, Approaches, Future Work

Thank you

[email protected]