Andreas Blumauer CEO & Managing Partner Semantic Web Company / PoolParty Semantic Suite Taxonomy Boot Camp 2017 Washington, DC Leveraging Taxonomy Management With Machine Learning

Leveraging Taxonomy Management with Machine Learning

Jan 21, 2018


Data & Analytics

Transcript
Page 1: Leveraging Taxonomy Management with Machine Learning

Andreas Blumauer, CEO & Managing Partner

Semantic Web Company / PoolParty Semantic Suite

Taxonomy Boot Camp 2017, Washington, DC

Leveraging Taxonomy Management With Machine Learning

Page 2: Leveraging Taxonomy Management with Machine Learning

INTRODUCTION

[Diagram: a small knowledge graph introducing the speaker and his company.]

Node labels: Andreas Blumauer, Semantic Web Company, Vienna, 2004 (founded), 6.0 (current version), >200 (customers served), Enterprise Knowledge Graphs, Taxonomies, Ontologies, Text Mining

Edge labels: founder & CEO of, developer and vendor of, founded, current version, active at, based on, located, part of, manages, standard for, enriches, serves customers, editor of, is about, graduates, used for

Page 3: Leveraging Taxonomy Management with Machine Learning

Agenda

▸ Cognitive Computing: Semantic Technologies & Machine Learning

▸ Terms, Concepts, Shadow Concepts

▸ Corpus Analysis & (Shadow) Concept Extraction with PoolParty

▸ A comparison with LSA and Word2Vec

▸ Use Cases

▹ Document Annotation & Indexing

▹ Text Classification (incl. Benchmarks)

▹ Recommender Systems (incl. Use Case)

Page 4: Leveraging Taxonomy Management with Machine Learning

Cognitive Computing

Combining Semantic Technologies With Machine Learning


Page 5: Leveraging Taxonomy Management with Machine Learning

A key assumption of this talk

People do not search for documents only, they seek facts about things and smaller chunks of information.

Machines shall help to create links across data silos to give answers to questions.


Converging A.I. Technologies

Page 6: Leveraging Taxonomy Management with Machine Learning

A quick question at the beginning

Will Artificial Intelligence make Subject Matter Experts obsolete?

Imagine you want to build an application that helps to identify patient-treatment pairings.

Which would you prefer?

Applications based solely on machine learning, applications based only on doctors' knowledge, or a combination of both?

Page 7: Leveraging Taxonomy Management with Machine Learning

How Semantic Computing and Machine Learning complement each other

Structured Data

Machine Learning

Cognitive Applications

Page 8: Leveraging Taxonomy Management with Machine Learning

How Semantic Computing and Machine Learning complement each other

Unstructured Data

Structured Data

Machine Learning

Cognitive Applications

Page 9: Leveraging Taxonomy Management with Machine Learning

How Semantic Computing and Machine Learning complement each other

Unstructured Data

Structured Data

Knowledge Graphs

Machine Learning

Cognitive Applications

Page 10: Leveraging Taxonomy Management with Machine Learning

Towards a Digital Twin

Proposal for a Cognitive Computing Platform Architecture

Unstructured Data

Structured Data

Knowledge Graphs

Machine Learning

Semantic Layer

IoT & Cognitive Applications

Page 11: Leveraging Taxonomy Management with Machine Learning

Terms, Concepts, Shadow Concepts

How to make sense of text and data

Page 12: Leveraging Taxonomy Management with Machine Learning

Terms and co-occurrence models

Document Corpus

- Websites
- PDF, Word, …
- Abstracts from DBpedia
- RSS Feeds

[Diagram: a network of co-occurring terms, Term 1 to Term 10, derived from the document corpus.]

- Relevant terms and phrases
- Relevancy of terms
- Co-occurrence between terms and terms
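A minimal sketch of such a term co-occurrence model, assuming a tiny made-up corpus, a hand-picked stopword list, and "co-occurrence" meaning that two terms appear in the same document. It is not PoolParty's corpus analysis, only an illustration of the idea.

```python
from collections import Counter
from itertools import combinations

# Toy corpus; the three "documents" are invented for illustration.
corpus = [
    "the retina is examined with an ophthalmoscope",
    "an ophthalmoscope is diagnostic equipment",
    "the retina is part of the eye",
]
STOPWORDS = {"the", "is", "an", "with", "of", "a"}


def terms_of(document):
    """Return the set of non-stopword terms in one document."""
    return {w for w in document.lower().split() if w not in STOPWORDS}


def build_model(documents):
    """Count document frequency per term and co-occurrence per term pair."""
    term_freq = Counter()
    cooc = Counter()
    for doc in documents:
        terms = terms_of(doc)
        term_freq.update(terms)
        # Each unordered pair of terms in the same document co-occurs once.
        cooc.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_freq, cooc


term_freq, cooc = build_model(corpus)
print(term_freq.most_common(3))
print(cooc[frozenset({"retina", "ophthalmoscope"})])
```

Document frequency stands in for "relevancy of terms" here; a real system would more likely use a weighting such as TF-IDF.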

Page 13: Leveraging Taxonomy Management with Machine Learning

‘Things’ but not Strings: Using a ‘Semantic Knowledge Graph’

http://www.my.com/taxonomy/62346723
- prefLabel: Retina
- image: http://www.my.com/images/90546089

http://www.my.com/taxonomy/97345854
- prefLabel: Funduscope
- altLabel: Ophthalmoscope
- has broader: http://www.mycom.com/taxonomy/4543567

http://www.mycom.com/taxonomy/4543567
- prefLabel: Diagnostic Equipment
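The "things, not strings" idea can be sketched with plain subject-predicate-object triples. The URIs and labels below are copied from the slide; the shortened predicate names, the helper functions, and the assumption that "has broader" links Funduscope to Diagnostic Equipment are illustrative choices, not part of the original.

```python
# Triples (subject, predicate, object) transcribed from the slide.
# In SKOS the predicates would be skos:prefLabel, skos:altLabel and
# skos:broader. Attaching "broader" to the Funduscope concept is an
# assumption about how the diagram was drawn.
RETINA = "http://www.my.com/taxonomy/62346723"
FUNDUSCOPE = "http://www.my.com/taxonomy/97345854"
EQUIPMENT = "http://www.mycom.com/taxonomy/4543567"

triples = [
    (RETINA, "prefLabel", "Retina"),
    (RETINA, "image", "http://www.my.com/images/90546089"),
    (FUNDUSCOPE, "prefLabel", "Funduscope"),
    (FUNDUSCOPE, "altLabel", "Ophthalmoscope"),
    (FUNDUSCOPE, "broader", EQUIPMENT),
    (EQUIPMENT, "prefLabel", "Diagnostic Equipment"),
]


def objects(subject, predicate):
    """All objects stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def find_concepts(label):
    """URIs whose preferred or alternative label equals `label` (case-insensitive)."""
    wanted = label.lower()
    return sorted({s for s, p, o in triples
                   if p in ("prefLabel", "altLabel") and o.lower() == wanted})


def broader_labels(uri):
    """Preferred labels of the concepts that are broader than `uri`."""
    return [lbl for b in objects(uri, "broader") for lbl in objects(b, "prefLabel")]


# Two different strings resolve to the same thing.
print(find_concepts("Ophthalmoscope") == find_concepts("Funduscope"))
print(broader_labels(FUNDUSCOPE))
```

Because both labels resolve to one URI, a search or annotation for either string ends up at the same concept and inherits its broader context.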

Page 14: Leveraging Taxonomy Management with Machine Learning

Shadow Concepts

Use co-occurrences between concepts and terms to extract ‘shadow concepts’

Example:

This site is a 15th-century Inca site located 2,430 metres above sea level. It is located in Cusco, Peru.

It is situated on a mountain ridge above the Sacred Valley through which the Urubamba River flows. Most archaeologists believe that it was built as an estate for the Inca emperor Pachacuti. Often mistakenly referred to as the "Lost City of the Incas", it is the most familiar icon of Inca civilization. The Incas built the estate around 1450, but abandoned it a century later at the time of the Spanish Conquest.

[Diagram: concepts and terms linked to the text: Inca site, Machu Picchu, Cusco, Inca empire, Inca emperor, Peru, Spanish Conquest, Sacred Valley, Chankas, Lost City, Pachacuti]

In addition to explicitly used concepts and terms, Machu Picchu is extracted from the article as a Shadow Concept, although the name itself never appears in the text. As a prerequisite, one has to provide and analyze a representative text corpus first.
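One way to picture shadow concept extraction is as a scoring step over corpus statistics: a concept that is not named in a text is still suggested when enough of the terms it usually co-occurs with are present. The sketch below assumes a hypothetical co-occurrence table with invented weights and an arbitrary threshold; it is not the PoolParty algorithm.

```python
# Hypothetical corpus statistics: for each concept, how strongly it
# co-occurred with certain terms in a previously analysed reference corpus.
# All weights are invented for illustration.
CORPUS_COOCCURRENCE = {
    "Machu Picchu": {"inca": 0.9, "cusco": 0.8, "pachacuti": 0.7,
                     "urubamba": 0.6, "peru": 0.5},
    "Lima": {"peru": 0.8, "capital": 0.7, "pacific": 0.5},
    "Retina": {"eye": 0.9, "ophthalmoscope": 0.6},
}


def shadow_concepts(text, threshold=1.5):
    """Return concepts not named in `text` whose co-occurring terms are present."""
    lowered = text.lower()
    tokens = set(lowered.replace(",", " ").replace(".", " ").split())
    scores = {}
    for concept, weights in CORPUS_COOCCURRENCE.items():
        if concept.lower() in lowered:
            continue  # explicitly mentioned, so not a shadow concept
        score = sum(w for term, w in weights.items() if term in tokens)
        if score >= threshold:
            scores[concept] = score
    # Strongest evidence first.
    return sorted(scores, key=scores.get, reverse=True)


article = ("This site is a 15th-century Inca site located in Cusco, Peru. "
           "It is situated above the Sacred Valley through which the "
           "Urubamba River flows. It was built for the Inca emperor Pachacuti.")
print(shadow_concepts(article))
```

The threshold keeps weakly supported concepts out: "Lima" shares only the term "peru" with the article and stays below it.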

Page 15: Leveraging Taxonomy Management with Machine Learning

Corpus Analysis

Use PoolParty for Deep Text Analysis

Page 16: Leveraging Taxonomy Management with Machine Learning

Bionics

How do we learn from a lot of text?

Bla bla bla bla. Bla bla bla bla

The stove is on. The stove is hot!

▸ Statistical model/co-occurrences → is related: "Bla stove bla bla. Bla bla bla hot"

▸ Taxonomical model → is-a abstractions: "Switched on devices are dangerous devices."

▸ Ontological model → reasoning: "Switched on devices are dangerous, only if the operating temperature is above 100 degrees and the automatic shutdown mechanism is broken."
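The difference between the taxonomical and the ontological reading of the stove example can be sketched as two predicates. The `Device` class, its field names, and the reading of "100 degrees" as degrees Celsius are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    switched_on: bool
    temperature_c: float    # operating temperature, assumed to be Celsius
    shutdown_broken: bool   # True if the automatic shutdown mechanism is broken


def dangerous_taxonomical(device):
    """Is-a abstraction: every switched-on device counts as a dangerous device."""
    return device.switched_on


def dangerous_ontological(device):
    """Rule from the slide: dangerous only if hot and the shutdown is broken."""
    return (device.switched_on
            and device.temperature_c > 100
            and device.shutdown_broken)


stove = Device("stove", switched_on=True, temperature_c=180.0, shutdown_broken=False)
print(dangerous_taxonomical(stove))   # coarse abstraction
print(dangerous_ontological(stove))   # rule with conditions
```

The taxonomical model flags the stove simply because it is on, while the ontological rule also checks the temperature and the shutdown mechanism before drawing that conclusion.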

Page 18: Leveraging Taxonomy Management with Machine Learning

Knowledge graphs as a result of human-machine cooperation

Manually created parts of graph

Supervised learning

Automatically created parts of graph (corpus analysis, RDF transformation, machine learning, …)

Page 19: Leveraging Taxonomy Management with Machine Learning

PoolParty Corpus Analysis

How taxonomists can extend taxonomies with some help from machine learning algorithms

Candidate Concepts derived from sample documents can be easily integrated into the taxonomy. A list of possible Candidate Concepts is shown per document or as a list of most relevant candidates per corpus.

The context of a given taxonomy concept can be visualised with a few mouse clicks. Terms, concepts and shadow concepts can be highlighted per document.

Page 20: Leveraging Taxonomy Management with Machine Learning

Network-based Knowledge Graph Assessment

Thesaurus Harmonizer

▸ Find missing relationships between concepts that are of high semantic relevance

▸ Point out structural flaws in existing thesauri

▸ Identify corpora that only reflect a fraction of a thesaurus

▹ Or, vice versa: identify thesauri that are far too big for their domain applications, and possibly missing details
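Two of these checks can be sketched in a few lines: flagging concept pairs that co-occur strongly in a corpus but are not related in the thesaurus, and measuring what fraction of the thesaurus a corpus actually touches. The function names, the score threshold, and the example data are hypothetical and do not describe the Thesaurus Harmonizer's internals.

```python
def missing_relations(cooccurrence, relations, min_score=0.5):
    """Concept pairs that co-occur strongly but are not linked in the thesaurus."""
    linked = {frozenset(pair) for pair in relations}
    return sorted(
        tuple(sorted(pair))
        for pair, score in cooccurrence.items()
        if score >= min_score and pair not in linked
    )


def coverage(thesaurus_concepts, corpus_concepts):
    """Fraction of thesaurus concepts that occur in the corpus at all."""
    if not thesaurus_concepts:
        return 0.0
    return len(thesaurus_concepts & corpus_concepts) / len(thesaurus_concepts)


# Invented example data.
thesaurus = {"Solar energy", "Wind energy", "Photovoltaics", "Cogeneration"}
relations = [("Solar energy", "Photovoltaics")]
cooccurrence = {
    frozenset({"Solar energy", "Photovoltaics"}): 0.9,
    frozenset({"Wind energy", "Solar energy"}): 0.7,
    frozenset({"Wind energy", "Cogeneration"}): 0.1,
}
found_in_corpus = {"Solar energy", "Wind energy", "Photovoltaics"}

print(missing_relations(cooccurrence, relations))
print(coverage(thesaurus, found_in_corpus))
```

A low coverage value points to a corpus that reflects only part of the thesaurus, or to a thesaurus that is larger than its domain application needs.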

Page 21: Leveraging Taxonomy Management with Machine Learning

Use Cases

Benefit from Semantic Knowledge Graphs and Machine Learning

Page 22: Leveraging Taxonomy Management with Machine Learning

PoolParty Extractor

Extract concepts from text even if not used explicitly


Some domains use text that doesn’t always call a spade a spade. With ‘shadow concept extraction’ those ‘masked’ concepts still can be surfaced.

Since these technologies would have become conventional technologies that are made into products and introduced into market at the time of their introduction, it would be difficult to differentiate them as innovative environmental and energy technologies from other global warming prevention technologies that have already been put to practical use in the industrial, commercial, residential, and energy conversion sectors.- The Innovative Global Warming Prevention Technology Working Group under the Research and Development Subcommittee- Council assessed that innovative global warming prevention technologies would bring about a reduction effect of 7.49 million t-CO2 case of average emissions factor for all power sources of carbon dioxide in 2010. In view of the difficulty in putting innovative carbon dioxide sequestration technology into practical use by 2010, the Working Group reassigned it as an issue of global warming prevention technology to be tackled by 2030. The Central Environment Council, however, has not had the opportunity to examine the contents of these technologies in detail. (Promotion of climate change prevention activities by every social actor)- The Programme encourages every social actor to take actions to prevent global warming. The actions include measures undertaken by the public sector.

Climate Change


Page 23: Leveraging Taxonomy Management with Machine Learning

PoolParty Semantic Classifier

Text Classification based on Machine Learning and Semantic Knowledge Models


PoolParty Semantic Classifier combines machine learning algorithms (SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
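A rough sketch of how knowledge-graph concepts can be fed into a text classifier as extra features. Naive Bayes is used here only because it fits in a few lines; the label-to-concept lookup, the class names in the training data, and the tiny training set are invented, and none of this reflects how the PoolParty Semantic Classifier is implemented.

```python
import math
from collections import Counter, defaultdict

# Hypothetical label-to-concept lookup standing in for a knowledge graph.
CONCEPT_LABELS = {
    "solar": "RenewableEnergy", "wind": "RenewableEnergy",
    "photovoltaic": "RenewableEnergy",
    "insulation": "EnergyEfficiency", "retrofit": "EnergyEfficiency",
}


def features(text):
    """Term features plus one 'concept:' feature per matched label."""
    terms = text.lower().split()
    concepts = ["concept:" + CONCEPT_LABELS[t] for t in terms if t in CONCEPT_LABELS]
    return terms + concepts


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(features(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in features(text):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = ["solar panels on the roof", "wind turbines at sea",
               "insulation of old walls", "retrofit of heating"]
train_labels = ["Renewable Energy", "Renewable Energy",
                "Energy Efficiency", "Energy Efficiency"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("photovoltaic plant"))
```

Neither "photovoltaic" nor "plant" occurs in the training texts, so a purely term-based model has nothing to go on; the shared concept feature is what connects the new document to its class.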

Page 24: Leveraging Taxonomy Management with Machine Learning

Benchmarking the PoolParty Semantic Classifier

Improvement of 5.2% compared to traditional (term-based) SVM

Features used                                    Classifier  F1 (5 folds)  Variance
Terms                                            LinearSVC   0.83175       0.0008
Concepts from REEGLE + Shadow Concepts           LinearSVC   0.84451       0.0011
Concepts from REEGLE                             LinearSVC   0.84647       0.0009
Terms + Concepts from REEGLE + Shadow Concepts   LinearSVC   0.87474       0.0009

Reegle thesaurus

A comprehensive SKOS taxonomy for the clean energy sector (http://data.reeep.org/thesaurus/guide)

● 3,420 concepts
● 7,280 labels (English version)
● 9,183 relations (broader/narrower + related)

Document Training Set

1,800 documents in 7 classes: Renewable Energy, District Heating Systems, Cogeneration, Energy Efficiency, Energy (general), Climate Protection, Rural Electrification
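The 5.2% figure can be reproduced from the benchmark values as a relative improvement of the combined feature set over the term-only baseline. The two F1 scores are taken from the benchmark; the variable names are illustrative.

```python
f1_terms = 0.83175      # Terms only
f1_combined = 0.87474   # Terms + Concepts from REEGLE + Shadow Concepts

absolute_gain = f1_combined - f1_terms
relative_gain = absolute_gain / f1_terms

print(f"absolute gain: {absolute_gain:.5f}")
print(f"relative gain: {relative_gain:.1%}")
```

The improvement is relative to the baseline F1; in absolute terms the gain is about 4.3 F1 points.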

Page 25: Leveraging Taxonomy Management with Machine Learning

Sample Calculation

Based on an improvement of 5.2%

Inbound Documents → PoolParty Semantic Classifier → Experienced Agent

● 100,000 documents (emails, tickets, etc.) per month
● 5 Euros extra costs per document when misrouted
● Cost savings per year:
○ 1,200,000 x €5.0 x 0.052 = €312,000 per annum
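The sample calculation written out as a small function. It follows the slide's simplification that a 5.2% improvement means 5.2% of all inbound documents are no longer misrouted; the function and parameter names are illustrative.

```python
def annual_savings(docs_per_month, cost_per_misrouted, improvement):
    """Yearly savings if `improvement` is the share of documents no longer misrouted."""
    docs_per_year = docs_per_month * 12
    return docs_per_year * cost_per_misrouted * improvement


savings = annual_savings(docs_per_month=100_000,
                         cost_per_misrouted=5.0,
                         improvement=0.052)
print(f"EUR {savings:,.0f} per annum")
```

With other volumes or misrouting costs, the same function gives a quick estimate for a different organisation.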

Page 26: Leveraging Taxonomy Management with Machine Learning

Use Shadow Concepts to improve Recommender Systems

Mini Countryman

And it’s probably more of a crossover than ever, with the design to match. Being a Mini, the Countryman is clearly meant to be the driver’s car among small crossovers. The suspension is sophisticated, and there are lots of chassis options (a stiffer sports setup, variable damping, the electronically controlled ALL4 all-wheel-drive).

But it’s also the crossover for people who’ve bags of cash to blow on personalisation and luxury.

There’s been a lot of effort on ramping up the cabin quality, but then the outgoing Countryman was a sad let-down in that department.

On the outside, plastic wheel-arch extensions, with eyebrow creases in the metalwork above, as well as roof bars and sill protectors all add to the visual crossover-ness. This remains the only Mini with angular rather than oval headlamps, and there’s a load of visual posturing going on in the lower face.

There are eight versions at launch, and they’re exactly what you’d expect. It’s Cooper or Cooper S, each fuelled by petrol or diesel, each of them with front drive or ALL4. Oh and an eight-speed auto, too, if you count that as a separate choice. The Cooper petrol is a three-cylinder, the rest fours.

You get extra kit as standard versus the old car, including navigation, Bluetooth, emergency call and park sensors. Upgrades include a bigger touch-screen nav with high-definition traffic, various posher seats, a HUD, and driver aids. Oh and a cushion thingy that folds out from the boot so you can sit on the rear bumper without getting your clothes mucky.

In June 2017 a Cooper E will launch, which has the Cooper three-cylinder petrol driving the front wheels, and an electric motor for the rears, with a capacity to do a claimed 25 miles of gentle all-electric running. So it has the performance of a Cooper S ALL4 with the tax-busting advantages of a plug-in hybrid. And you wouldn’t use any fuel if you commuted a short distance.

The platform is BMW’s contemporary transverse-engined hardware, in the bigger of its two sizes. That means it shares a lot with the BMW X1. The 4WD system is more sophisticated than the previous Countryman’s. The proportion of drive to the rear is computed by a controller that takes into account parameters including grip, steering angle and throttle position, as well as whether you’ve got the sports mode and sports traction systems selected.

Page 27: Leveraging Taxonomy Management with Machine Learning

Use a Knowledge Graph + Co-occurrences for precise Content Recommendation

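Example: Find similar episodes

[Diagram: two nodes labelled ‘Raving’ and ‘De-Void’, connected through shared concepts and co-occurring terms and marked ‘Similar episodes!’. Labels shown: Scott, attack, Stilinski, friend, shame, O’Brien, woman, married, girl, attractive, love]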
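A minimal sketch of content recommendation over shared concepts and terms, using Jaccard similarity as a stand-in for whatever measure the real system uses. The episode titles and feature names echo the slide, but which features belong to which episode is invented here, as is the third catalogue entry.

```python
def jaccard(a, b):
    """Overlap of two feature sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented annotations; only the names are taken from the slide.
episodes = {
    "Raving": {"Scott", "Stilinski", "attack", "friend"},
    "De-Void": {"Scott", "Stilinski", "attack", "shame"},
    "Other episode": {"woman", "married", "love"},
}


def most_similar(title, catalogue):
    """Rank all other items by feature overlap with `title`, best first."""
    target = catalogue[title]
    ranked = sorted(
        ((jaccard(target, feats), other)
         for other, feats in catalogue.items() if other != title),
        reverse=True,
    )
    return [(other, round(score, 2)) for score, other in ranked]


print(most_similar("Raving", episodes))
```

Annotating episodes with concepts from a knowledge graph rather than raw strings means that synonyms and name variants count as the same feature, which is where the precision gain would come from.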

Page 28: Leveraging Taxonomy Management with Machine Learning

Rules-based Recommender Systems

Example: Wine-to-Cheese Harmonizer

Live Demo

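[Diagram: a wine and a cheese linked to their characteristics.]

Wine: Weingut Weinrieder Grüner Veltliner Alte Reben

Cheese: Nagelkaas

Characteristics shown: Dry, Medium-bodied, High acidity, Nutmeg, Full-bodied, Warm finish, Tobacco, Cumin, Clove, Hard cheese, Higher fat

Edge labels: is characterized by (three times, one of them from a node marked "?"), matches (twice), does not match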
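A rules-based recommender of this kind can be sketched as a lookup over pairs of characteristics. The characteristic names come from the slide, but assigning them to the wine and the cheese is an assumption about the diagram, and the rule table and the decision rule are entirely hypothetical, chosen only to mirror the two "matches" and one "does not match" edges.

```python
# Characteristics as read from the diagram; which node each one belongs to
# is an assumption made for this example.
wine = {"Dry", "Medium-bodied", "High acidity"}
cheese = {"Cumin", "Clove", "Hard cheese", "Higher fat"}

# Hypothetical rules: (wine characteristic, cheese characteristic) -> match?
RULES = {
    ("High acidity", "Higher fat"): True,
    ("Dry", "Hard cheese"): True,
    ("Medium-bodied", "Clove"): False,
}


def evaluate(wine_traits, cheese_traits, rules):
    """Return (matching pairs, conflicting pairs) among the applicable rules."""
    matches, conflicts = [], []
    for (w, c), ok in sorted(rules.items()):
        if w in wine_traits and c in cheese_traits:
            (matches if ok else conflicts).append((w, c))
    return matches, conflicts


def recommend(wine_traits, cheese_traits, rules):
    """Recommend the pairing when matching rules outnumber conflicting ones."""
    matches, conflicts = evaluate(wine_traits, cheese_traits, rules)
    return len(matches) > len(conflicts)


print(evaluate(wine, cheese, RULES))
print(recommend(wine, cheese, RULES))
```

Unlike the statistical approaches, every recommendation here can be traced back to the explicit rules that fired, which is the main appeal of a rules-based system.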

Page 29: Leveraging Taxonomy Management with Machine Learning

Why ‘The Knot’ uses Machine Learning and Semantic Models

▹ XO Group has run ‘The Knot’ since 1996

▹ NYSE: XOXO (S&P 600 Component)

▹ 1.5 million active members

▹ The Knot has helped marry 25 million couples

▹ Partnering with 300,000 wedding vendors

▹ Millions of vendor reviews

Page 30: Leveraging Taxonomy Management with Machine Learning

Thank you for your interest!

Andreas Blumauer, CEO, Semantic Web Company

▸ Mail [email protected]
▸ Company https://www.semantic-web.com
▸ LinkedIn https://www.linkedin.com/in/andreasblumauer
▸ Twitter https://twitter.com/semwebcompany
▸ Blog https://www.linkedin.com/today/author/andreasblumauer

© Semantic Web Company - http://www.semantic-web.com and http://www.poolparty.biz/