Bringing Math to LOD: A Semantic Publishing Platform Prototype for Scientific Collections in Mathematics
Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov, Evgeniy Birialtsev
Kazan Federal University, Russia
October 23, 2013

Bringing Math to LOD

Nov 29, 2014



Nikita Zhiltsov

The presentation slides at ISWC 2013
Transcript
Page 1: Bringing Math to LOD

Bringing Math to LOD: A Semantic Publishing Platform Prototype for Scientific Collections in Mathematics

Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov, Evgeniy Birialtsev

Kazan Federal University, Russia

October 23, 2013

1 / 29

Page 2: Bringing Math to LOD

Outline

1 Introduction

2 Approach

3 Use Cases


Page 3: Bringing Math to LOD

Our Contribution

Our prototype builds a semantic graph of mathematical knowledge objects that

- is extracted from a collection of mathematical scholarly papers, and
- is integrated into the LOD «cloud»

Page 4: Bringing Math to LOD

Research Output
IVM Data Set

- LOD representation of 1 330 scholarly publications of the «Izvestiya Vuzov. Matematika» (IVM) journal
- Covers the semantics of:
  - article metadata
  - elements of the logical structure
  - terminology
  - formulas
- Aligned with DBpedia, CORDIS
- More than 850 000 RDF triples
- SPARQL endpoint: http://cll.niimm.ksu.ru:8890/sparql-auth∗

∗ The SPARQL endpoint is secured. Please email the authors for credentials.

Page 5: Bringing Math to LOD

Related Work

- Domain-specific languages: OMDoc, MathLang
- Domain models: Cambridge Mathematical Thesaurus, DBpedia (math-related part), ScienceWISE Ontology
- Math-related NLP: mArachna; linguistic modules of arXMLiv

Page 6: Bringing Math to LOD

Outline

1 Introduction

2 Approach

3 Use Cases


Page 7: Bringing Math to LOD

Key Research Contributions

- a thorough ontological model of the mathematical domain
- an ontology-based, language-independent method for extracting logical structure elements from papers
- an ontology-based method for extracting mathematical named entities from texts in Russian
- a method that connects mathematical named entities to symbolic expressions

Page 8: Bringing Math to LOD

Prototype’s Design


Page 9: Bringing Math to LOD

Domain Model


Page 10: Bringing Math to LOD

Ontology of Structural Elements (1)
http://cll.niimm.ksu.ru/ontologies/mocassin

- Covers 15 common structural elements:

- Defines 9 object properties and 4 datatype properties:

Page 11: Bringing Math to LOD

Ontology of Structural Elements (2)
http://cll.niimm.ksu.ru/ontologies/mocassin

- 3 cardinality axioms, e.g. Proof ∧ (= 1 proves ProvableStatement†)
- 2 transitivity axioms for the hasPart and dependsOn properties
- DL expressivity: SRIN(D)

† i.e., Claim ∨ Corollary ∨ Lemma ∨ Proposition ∨ Theorem
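
To illustrate what a transitivity axiom on dependsOn (or hasPart) buys, here is a minimal sketch that materializes the implied pairs in plain Python. The segment identifiers are hypothetical and not taken from the IVM data set; a real deployment would more likely rely on an OWL reasoner or the triple store's own inference.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (a, c) pair implied by treating the relation as transitive."""
    successors = defaultdict(set)
    for a, b in edges:
        successors[a].add(b)

    closure = set()
    for start in list(successors):
        # Depth-first walk: every node reachable from `start` is related to it.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure


# Hypothetical segments: a proof depends on a lemma, which depends on a definition.
depends_on = {("proof1", "lemma1"), ("lemma1", "definition1")}

# Pairs that are not asserted but follow from transitivity.
inferred = transitive_closure(depends_on) - depends_on
print(inferred)
```

With the two asserted dependencies above, the only inferred pair links the proof directly to the definition.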

Page 12: Bringing Math to LOD

Ontology of Mathematical Concepts (1)
http://cll.niimm.ksu.ru/ontologies/mathematics

- Covers 3 450 mathematical concepts
- Defines commonly used terms as well as terms from the emerging professional vocabulary (e.g. Bitsadze-Samarsky problem)
- Supports Russian/English labels

Page 13: Bringing Math to LOD

Ontology of Mathematical Concepts (2)
http://cll.niimm.ksu.ru/ontologies/mathematics

- Includes two taxonomies:
  - taxonomy of mathematical theories‡:
    - number theory, set theory, algebra, analysis, geometry, mathematical logic, discrete mathematics, theory of computation, differential equations, numerical analysis, probability theory and statistics
  - taxonomy of mathematical objects
- Covers common scientific concepts, such as Problem, Method, Statement, Formula etc.
- DL expressivity: ALCHI

‡ covers just a part of the mathematical knowledge

Page 14: Bringing Math to LOD

Ontology of Mathematical Concepts (3)
Object properties

- belongsTo/contains, e.g. Barycentric Coordinates belongsTo Metric Geometry
- defines/isDefinedBy, e.g. Christoffel Symbol isDefinedBy Connectedness
- seeAlso, e.g. Chebyshev Iterative Method seeAlso Numerical Solution of Linear Equation Systems
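
The slash-separated names above read like inverse property pairs. Assuming that reading is correct (the slide does not state it explicitly), the following sketch shows how one asserted triple yields its inverse. The triples reuse the slide's own examples; the tuple representation is purely illustrative.

```python
# Assumption: belongsTo/contains and defines/isDefinedBy are inverse pairs.
INVERSE_OF = {
    "belongsTo": "contains",
    "contains": "belongsTo",
    "defines": "isDefinedBy",
    "isDefinedBy": "defines",
}


def with_inverses(triples):
    """Return the triples plus the inverse statement for each paired property."""
    result = set(triples)
    for subject, prop, obj in triples:
        inverse = INVERSE_OF.get(prop)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result


asserted = {
    ("Barycentric Coordinates", "belongsTo", "Metric Geometry"),
    ("Christoffel Symbol", "isDefinedBy", "Connectedness"),
    ("Chebyshev Iterative Method", "seeAlso",
     "Numerical Solution of Linear Equation Systems"),
}

expanded = with_inverses(asserted)
```

seeAlso has no listed counterpart, so the sketch leaves it untouched.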


Page 15: Bringing Math to LOD

Ontology of Mathematical Concepts (4)
Stats

- 3 450 classes
- 27% of classes are mapped onto DBpedia
- 3 630 subclass-of property instances
- 1 140 other object property instances
- Common facts about the development:
  - lasted for 4 months
  - 7 professional mathematicians participated as domain experts guided by the authors
  - WebProtege was used as a collaborative tool

Page 16: Bringing Math to LOD

Semantic Annotation


Page 17: Bringing Math to LOD

NLP Annotation

- Relies on the OntoIntegrator facilities
- Solves some of the conventional linguistic tasks, such as:
  - tokenization
  - sentence splitting (∼ 98% F-measure§)
  - morphological analysis
  - NP extraction (88% precision)
- Special handling of math symbols, abbreviations, and math expressions as parts of NPs
- Currently supports only the Russian language

§ the metrics were evaluated on real math texts with the help of domain experts
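
For readers less familiar with the metric quoted here and on later slides, this is a small sketch of the standard F-measure. The slides do not say which weighting was used; the balanced variant (beta = 1) is assumed as the default.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only, not results from the evaluation.
print(round(f_measure(0.5, 1.0), 3))
```

Because the harmonic mean is pulled toward the lower of the two values, a high F-measure requires both precision and recall to be high.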


Page 18: Bringing Math to LOD

Mining the Logical Structure

- Supports our ontology of structural elements: elements in real texts are instances of the ontology classes
- Recognizing types of structural elements:
  - A string similarity based method gives 89%-100% F-measure depending on the class
- Recognizing semantic relations between them:
  - A decision tree learner gives 61%-95% F-measure depending on the relation
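
The slide does not name the string similarity measure, so the following is only a sketch of the general idea: compare the first word of a segment title with the ontology class labels and accept the best match above a threshold. Python's difflib ratio stands in for the unspecified measure, the threshold is an arbitrary assumption, and only the class names mentioned elsewhere in the slides are listed.

```python
from difflib import SequenceMatcher

# Subset of class labels named in the slides; the full ontology has 15.
CLASS_LABELS = ["Claim", "Corollary", "Lemma", "Proposition", "Theorem", "Proof"]


def classify_title(title, threshold=0.75):
    """Map a segment title to the most similar class label, or None."""
    words = title.strip().split()
    if not words:
        return None
    head = words[0].strip(".:,").lower()

    best_label, best_score = None, 0.0
    for label in CLASS_LABELS:
        score = SequenceMatcher(None, head, label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


print(classify_title("Theorem 2.1."))
print(classify_title("Lemm 3"))  # tolerates a truncated or misspelled heading
```

A fuzzy comparison of this kind is what makes the approach tolerant of abbreviations and typesetting noise in real papers, at the cost of needing a tuned threshold.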


Page 19: Bringing Math to LOD

Mathematical Named Entity Extraction

- Supports our ontology of mathematical concepts: assigned NPs are instances of the ontology classes
- Our method employs annotations of the NP structure and Jaccard similarity
- The method gives 86% F-measure with parameters focusing on precision/recall trade-off
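
As a rough illustration of the Jaccard component only, the sketch below scores a noun phrase against ontology labels by token overlap. The class identifiers and the threshold are hypothetical, English labels are used for readability although the pipeline targets Russian, and the real method additionally uses NP structure annotations that are not modelled here.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical class IDs; the labels are terms that appear in the slides.
LABELS = {
    "C1": "Christoffel Symbol",
    "C2": "Barycentric Coordinates",
    "C3": "Chebyshev Iterative Method",
}


def link_noun_phrase(np_tokens, labels=LABELS, threshold=0.5):
    """Return (class_id, score) for the best-matching label, or None."""
    best = None
    for class_id, label in labels.items():
        score = jaccard(np_tokens, label.lower().split())
        if score >= threshold and (best is None or score > best[1]):
            best = (class_id, score)
    return best


print(link_noun_phrase(["chebyshev", "method"]))
```

Raising the threshold favours precision and lowering it favours recall, which is one plausible reading of the "parameters focusing on precision/recall trade-off" mentioned above.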


Page 20: Bringing Math to LOD

Connecting Named Entities to Formulas


Page 21: Bringing Math to LOD

Connecting Named Entities to Formulas

- Parsing mathematical expressions
- Detection of variables
- Proximity-based matching of mathematical variables with noun phrases at 68% accuracy
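
The matching step can be sketched as a nearest-neighbour search over token positions. This is an assumption about what "proximity-based" means here; the distance definition, the window size, and the English example sentence are all illustrative rather than taken from the prototype.

```python
def match_variables(variables, noun_phrases, max_distance=5):
    """Assign each variable the closest noun phrase within max_distance tokens.

    variables:    list of (symbol, token_index)
    noun_phrases: list of (text, start_index, end_index), indices inclusive
    """
    matches = {}
    for symbol, pos in variables:
        best_text, best_dist = None, None
        for text, start, end in noun_phrases:
            if start <= pos <= end:
                dist = 0
            elif pos < start:
                dist = start - pos
            else:
                dist = pos - end
            if dist <= max_distance and (best_dist is None or dist < best_dist):
                best_text, best_dist = text, dist
        matches[symbol] = best_text
    return matches


# Tokens: The(0) function(1) f(2) maps(3) the(4) set(5) X(6)
variables = [("f", 2), ("X", 6)]
noun_phrases = [("function", 1, 1), ("set", 5, 5)]
print(match_variables(variables, noun_phrases))
```

A purely positional heuristic like this cannot resolve ties or long-range references, which is consistent with the moderate accuracy reported on the slide.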


Page 22: Bringing Math to LOD

Other supported features


Page 23: Bringing Math to LOD

Other supported features

- Article metadata extraction (title, author names, publication year etc.) according to the AKT Portal schema
- Semi-manual interlinking¶ with existing LOD data sets: DBpedia, CORDIS
- Publishing the extracted data as an LOD-compliant RDF data set

¶ by leveraging the Silk app

Page 24: Bringing Math to LOD

Outline

1 Introduction

2 Approach

3 Use Cases


Page 25: Bringing Math to LOD

Finding DBpedia Entities in Mathematical Formulas
http://cll.niimm.ksu.ru/iswc-demo

Page 26: Bringing Math to LOD

Semantic Search of Theoretical Findings
Finding articles with theorems about finite groups

PREFIX moc: <http://cll.niimm.ksu.ru/ontologies/mocassin#>
PREFIX math: <http://cll.niimm.ksu.ru/ontologies/mathematics#>
SELECT ?article WHERE {
  ?article moc:hasSegment ?theorem .
  ?theorem moc:mentions ?entity ;
           a moc:Theorem .
  ?entity a math:E2183
}
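
The same query can be parameterized by concept class. The sketch below only builds the query text and a GET request URL in the style of the SPARQL protocol; it does not contact the endpoint, which is secured and requires credentials from the authors. The assumption that class identifiers consist of "E" followed by digits is inferred from the single example math:E2183 and may not hold for the whole ontology.

```python
import re
from urllib.parse import urlencode

# Endpoint URL as given on the data set slide; access requires credentials.
ENDPOINT = "http://cll.niimm.ksu.ru:8890/sparql-auth"

QUERY_TEMPLATE = """\
PREFIX moc: <http://cll.niimm.ksu.ru/ontologies/mocassin#>
PREFIX math: <http://cll.niimm.ksu.ru/ontologies/mathematics#>
SELECT ?article WHERE {
  ?article moc:hasSegment ?theorem .
  ?theorem moc:mentions ?entity ;
           a moc:Theorem .
  ?entity a math:%s
}
"""


def build_query(concept_id):
    """Fill the concept class into the query, rejecting unexpected input."""
    # Assumed ID shape ("E" + digits); also guards against query injection.
    if not re.fullmatch(r"E\d+", concept_id):
        raise ValueError("unexpected concept id: %r" % concept_id)
    return QUERY_TEMPLATE % concept_id


def build_request_url(concept_id):
    """Encode the query as the standard `query` parameter of a GET request."""
    return ENDPOINT + "?" + urlencode({"query": build_query(concept_id)})


print(build_query("E2183"))
```

Sending the request is left out on purpose: authentication details for the endpoint are not described in the slides.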


Page 27: Bringing Math to LOD

Conclusion

- We have developed a holistic approach for mining an LOD representation of scholarly papers in mathematics
- We applied the prototype to a collection of over 1 300 real math papers
- We conducted a thorough evaluation of the proposed methods with the help of domain experts
- We provided several use cases to illustrate the utility of the published data

Page 28: Bringing Math to LOD

Future Work

- Integrating all the modules into a full-fledged toolkit
- Adding support for English to the NLP module
- Extending our approach to texts in other natural science domains

Page 29: Bringing Math to LOD

Thanks for your attention!
Questions?