Analyzing Text & Images Getting More Insight from Web Content with S4 Marin Dimitrov (CTO of Ontotext) & Georgi Kadrev (CEO of Imagga) Nov 2015
Analyzing Text & Images - Getting More Insight from Web Content with S4

Feb 15, 2017

Technology

Marin Dimitrov
Transcript
Page 1: Analyzing Text & Images - Getting More Insight from Web Content with S4

Analyzing Text & Images Getting More Insight from Web Content with S4

Marin Dimitrov (CTO of Ontotext) & Georgi Kadrev (CEO of Imagga)

Nov 2015

Page 2: Analyzing Text & Images - Getting More Insight from Web Content with S4

Some Ontotext Customers

2 Nov 2015 S4 Webinar - Analyzing Text & Images

Page 3: Analyzing Text & Images - Getting More Insight from Web Content with S4

Smart Data Management

Graph Database

• Flexible RDF graph data model

• Ontology metadata layer

Semantic Search

• Semantic, exploratory search

• Information discovery

• Metadata-driven content

Text Mining & Interlinking

• People, locations, organisations, topics

• Discover implicit relations

• Reuse open knowledge graphs

Page 4: Analyzing Text & Images - Getting More Insight from Web Content with S4

Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A

Page 5: Analyzing Text & Images - Getting More Insight from Web Content with S4

What Is S4?

• Capabilities for Smart Data management and analytics

− Text analytics for news, life sciences and social media

− RDF graph database as-a-service

− Access to large open knowledge graphs

• Available on-demand, anytime, anywhere

− Simple RESTful services

• Simple pay-per-use pricing

− No upfront commitments

Page 6: Analyzing Text & Images - Getting More Insight from Web Content with S4

What Is S4?

+ Image analytics via Imagga

Page 7: Analyzing Text & Images - Getting More Insight from Web Content with S4

S4 Benefits

• Enables quick prototyping

− Instantly available, no provisioning & operations required

− Focus on building applications, don’t worry about software + infrastructure

• Free tier!

• Easy to start, shorter learning curve

− Detailed documentation, various add-ons, SDKs and demo code

• Based on enterprise technology by Ontotext

Page 8: Analyzing Text & Images - Getting More Insight from Web Content with S4

Getting Started in Minutes

1. Register a personal account at s4.ontotext.com

2. Generate an API key pair

3. Check out the docs, demos & sample code at docs.s4.ontotext.com

4. Contact us with questions!

Page 9: Analyzing Text & Images - Getting More Insight from Web Content with S4

Text Analytics with S4

• Text analytics services

− News annotation

− News categorisation

− Biomedical

− Twitter

• Entity linking & disambiguation

− Mappings to DBpedia & GeoNames instances

− Mappings to biomedical data sources (LinkedLifeData)

• HTML, MS Word, XML, plain text input

• Simple JSON output

+ Image analytics via Imagga

Page 10: Analyzing Text & Images - Getting More Insight from Web Content with S4

News Analytics Example

10 Oct 2015

Page 11: Analyzing Text & Images - Getting More Insight from Web Content with S4

News Analytics Example

S4 result

Page 12: Analyzing Text & Images - Getting More Insight from Web Content with S4

News Analytics Example

S4 result

Page 13: Analyzing Text & Images - Getting More Insight from Web Content with S4

News Classification

S4 result

Page 14: Analyzing Text & Images - Getting More Insight from Web Content with S4

Try It!

# News classification (/v1/news-classifier)
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news-classifier"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\", \"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"

# News annotation (/v1/news)
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\", \"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"

Page 15: Analyzing Text & Images - Getting More Insight from Web Content with S4

Biomedical Analytics

Page 16: Analyzing Text & Images - Getting More Insight from Web Content with S4

What Is S4?

Page 17: Analyzing Text & Images - Getting More Insight from Web Content with S4

RDF Graph Databases – Advantages

• Simple, graph based data model

• Agile schema / schema-less / schema-late

• Ontology-based schema

• Global identifiers of resources (entities)

• Inference of implicit facts, based on rules

• Exploratory queries against unknown schema

• Compliance to standards (RDF, SPARQL), no vendor lock-in

Page 18: Analyzing Text & Images - Getting More Insight from Web Content with S4

RDF Graph Databases – Inferring New Facts
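
As a rough illustration of rule-based inference, the sketch below applies two RDFS-style rules (subclass transitivity and type propagation) to a handful of triples until no new facts appear. The triples and the `ex:` names are invented for this example, and a real RDF database would use its own rule engine rather than this naive loop.

```python
# Naive forward chaining over (subject, predicate, object) triples.
# The data and the "ex:" identifiers are hypothetical, for illustration only.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples):
    """Return the input triples plus all facts derivable from two RDFS-style rules."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Only pairs of the form (s, p, X) and (X, subClassOf, o2) matter.
                if o != s2 or p2 != SUBCLASS:
                    continue
                # Rule 1: A subClassOf B, B subClassOf C => A subClassOf C
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))
                # Rule 2: x type A, A subClassOf B => x type B
                elif p == TYPE:
                    new.add((s, TYPE, o2))
        if new <= facts:
            return facts
        facts |= new


explicit = {
    ("ex:Ontotext", TYPE, "ex:Company"),
    ("ex:Company", SUBCLASS, "ex:Organisation"),
    ("ex:Organisation", SUBCLASS, "ex:Agent"),
}
inferred = infer(explicit) - explicit
for triple in sorted(inferred):
    print(triple)
```

Only three facts are stated explicitly; the remaining ones (for example, that `ex:Ontotext` is also an `ex:Agent`) are derived by the rules.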

Page 19: Analyzing Text & Images - Getting More Insight from Web Content with S4

RDF Graph Database-as-a-Service Benefits

• Evaluate the technology

• Instant deployment

• Faster experimentation & application development

• Data services / Open Data publishing

• Reducing TCO & risk

Page 20: Analyzing Text & Images - Getting More Insight from Web Content with S4

Self-managed RDF DBaaS

• Available from AWS Marketplace, “1-Click” purchasing

• Variety of hardware configurations

• Manage large RDF data volumes

• Pay-per-hour pricing, 5-day trial

• Users take care of operations

− Backups, restores

Page 21: Analyzing Text & Images - Getting More Insight from Web Content with S4

Fully Managed RDF DBaaS

• Low-cost graph DBaaS available 24/7 on S4

• Ideal for small to moderate data & query volumes

− Database options: 1M, 10M, 50M, 250M & 1B triples

• Instantly deploy new databases when needed

• Zero administration

− Automated operations, maintenance & upgrades

• Users pay only for the actual database utilisation

• Standard OpenRDF REST API, 3rd party tools

Page 22: Analyzing Text & Images - Getting More Insight from Web Content with S4

Fully Managed RDF DBaaS

Page 23: Analyzing Text & Images - Getting More Insight from Web Content with S4

Fully Managed RDF DBaaS

Page 24: Analyzing Text & Images - Getting More Insight from Web Content with S4

OpenRDF REST API

resource                                        operations               comments
/repositories                                   GET                      Get info on DB repos
/repositories/<REPOSITORY>                      GET, POST, PUT, DELETE   Create*, delete, query a repository
/repositories/<REPOSITORY>/size                 GET                      Gets the number of triples in a repository
/repositories/<REPOSITORY>/statements           GET, POST, PUT, DELETE   Add, read, update, delete statements
/repositories/<REPOSITORY>/rdf-graphs/<GRAPH>   GET, POST, PUT, DELETE   Same as above
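
To make the resource table concrete, here is a small sketch that builds the resource URLs for one repository plus the URL of an exploratory SPARQL query. The base URL and repository name are placeholders rather than real S4 addresses, and the `query` parameter follows the general OpenRDF Sesame HTTP protocol; the exact endpoint and authentication details should be taken from the S4 documentation.

```python
from urllib.parse import quote, urlencode

# Hypothetical values: replace with the repository URL shown in your S4 account.
BASE_URL = "https://example.org/openrdf-server"
REPOSITORY = "my-repo"


def repository_url(base_url, repository, *path):
    """Build a URL below /repositories/<REPOSITORY> from the resource table."""
    parts = ["repositories", repository, *path]
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def query_url(base_url, repository, sparql):
    """URL for a SPARQL query sent with GET to /repositories/<REPOSITORY>."""
    return repository_url(base_url, repository) + "?" + urlencode({"query": sparql})


# An exploratory query: which classes occur in a repository with unknown schema?
EXPLORE = "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 20"

size_url = repository_url(BASE_URL, REPOSITORY, "size")
statements_url = repository_url(BASE_URL, REPOSITORY, "statements")
explore_url = query_url(BASE_URL, REPOSITORY, EXPLORE)

print(size_url)
print(statements_url)
print(explore_url)
```

The resulting URLs can be passed to any HTTP client; a query request would typically also set an `Accept` header such as `application/sparql-results+json`.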

Page 25: Analyzing Text & Images - Getting More Insight from Web Content with S4

Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A

Page 26: Analyzing Text & Images - Getting More Insight from Web Content with S4

Monthly Upgrades of the RDF DBaaS

Page 27: Analyzing Text & Images - Getting More Insight from Web Content with S4

Recent RDF DBaaS Updates

• Improved stability, bugfixes

• Fine-grained access control

− Repositories can be opened for read-only (R/O) public data access

− Useful for Open Data publishing

• Database exports in various formats

• Automated backup & restore

• Context indices

• Sample code in various programming languages

• Improved documentation

Page 28: Analyzing Text & Images - Getting More Insight from Web Content with S4

Python SDK

Page 29: Analyzing Text & Images - Getting More Insight from Web Content with S4

Python SDK

import json
import requests

key = '<api-key>'
secret = '<api-secret>'
endpoint = "https://text.s4.ontotext.com/v1/news"

# Prepare the data
data = {
    "documentUrl": "<document url goes here>",
    "documentType": "text/html",
}
jsonData = json.dumps(data)

# Prepare the POST headers
headers = {
    'Accept': "application/json",
    'Content-type': "application/json",
    'Accept-Encoding': "gzip",
}

# Prepare & execute the request (HTTP Basic auth with the API key pair)
req = requests.post(endpoint, headers=headers, data=jsonData, auth=(key, secret))
response = json.loads(req.content.decode('utf-8'))
print(response)

Page 30: Analyzing Text & Images - Getting More Insight from Web Content with S4

More SDKs via Swagger

Page 31: Analyzing Text & Images - Getting More Insight from Web Content with S4

Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A

Page 32: Analyzing Text & Images - Getting More Insight from Web Content with S4

Text + Image Analytics

• Enrich the entities, categories & keywords extracted from text content with image tags & categories

• Image analytics via Imagga Image Tagging API

• Easy to use with S4

Page 33: Analyzing Text & Images - Getting More Insight from Web Content with S4

Text + Image Analytics

Request

{
  "documentType": "text/html",
  "documentUrl": "<Paste your url here>",
  "imageTagging": true,
  "imageCategorization": true
}

Response

{ "text" : "The text of the document",
  "entities" : {
    "AnnotationType1" : […],
    "AnnotationType2" : […] },
  "images": [
    { "image" : "imageURL",
      "tags" : [
        { "confidence": …, "tag" : "SampleTag" },
        { "confidence": …, "tag" : "SampleTag2" }
      ],
      "categories": [
        { "confidence": …, "name" : "SampleCategory1" },
        { "confidence": …, "name" : "SampleCategory2" }
      ]
    }
  ]
}
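
Once a response in this shape comes back, the image tags can be filtered by confidence before they are combined with the text annotations. The sketch below relies only on the field names shown on this slide; the sample values, the `confident_tags` helper and the threshold of 30 are made up for illustration, and the actual confidence scale should be checked against the Imagga documentation.

```python
import json

# Hypothetical response following the layout on the slide; all values are invented.
SAMPLE_RESPONSE = json.loads("""
{
  "text": "The text of the document",
  "entities": {"Person": [], "Location": []},
  "images": [
    {"image": "http://example.org/a.jpg",
     "tags": [{"confidence": 62.5, "tag": "building"},
              {"confidence": 12.0, "tag": "sky"}],
     "categories": [{"confidence": 80.1, "name": "architecture"}]},
    {"image": "http://example.org/b.jpg",
     "tags": [{"confidence": 45.0, "tag": "flag"}],
     "categories": []}
  ]
}
""")


def confident_tags(response, threshold=30.0):
    """Map each image URL to its tags whose confidence reaches the threshold."""
    result = {}
    for image in response.get("images", []):
        tags = [t["tag"] for t in image.get("tags", [])
                if t["confidence"] >= threshold]
        result[image["image"]] = tags
    return result


tags_by_image = confident_tags(SAMPLE_RESPONSE)
for url, tags in tags_by_image.items():
    print(url, tags)
```

The same pattern applies to the `categories` list, which uses `name` instead of `tag`.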

Page 34: Analyzing Text & Images - Getting More Insight from Web Content with S4

News + Image Analytics

# News annotation with image tagging and categorisation enabled
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\", \"documentType\" : \"text/html\", \"imageTagging\" : true, \"imageCategorization\" : true }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"

# The same request without image analytics
API_KEY=…
KEY_SECRET=…
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
URL="http://www.theguardian.com/world/2015/nov/11/germany-spied-fbi-un-bodies-french-foreign-minister"
JSON_REQUEST="{\"documentUrl\" : \"$URL\", \"documentType\" : \"text/html\" }"
curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
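
The same request can be assembled without hand-escaped JSON. The following sketch uses only the Python standard library to build (but not send) the POST request, with the two image flags as optional arguments. The `build_request` helper is hypothetical, and the key, secret and document URL are placeholders.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://text.s4.ontotext.com/v1/news"  # endpoint as given in the Python SDK slide


def build_request(api_key, key_secret, document_url,
                  image_tagging=False, image_categorization=False):
    """Build a POST request for the news service; sending it is left to the caller."""
    payload = {"documentUrl": document_url, "documentType": "text/html"}
    if image_tagging:
        payload["imageTagging"] = True
    if image_categorization:
        payload["imageCategorization"] = True

    # HTTP Basic authentication with the API key pair.
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Basic " + token,
    }
    return urllib.request.Request(
        ENDPOINT, data=json.dumps(payload).encode("utf-8"),
        headers=headers, method="POST")


request = build_request("<api-key>", "<api-secret>", "http://example.org/article.html",
                        image_tagging=True, image_categorization=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it, pass the request to urllib.request.urlopen() with real credentials.
```

Leaving both flags at their defaults produces the plain text-only request, so one helper covers both variants shown on the slide.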

Page 35: Analyzing Text & Images - Getting More Insight from Web Content with S4

News + Image Analytics

S4 + Imagga result

Page 36: Analyzing Text & Images - Getting More Insight from Web Content with S4

Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A

Page 37: Analyzing Text & Images - Getting More Insight from Web Content with S4

Presentation Outline

• The Self-Service Semantic Suite (S4)

• Recent Updates

• Combining Text + Images Analytics

• Imagga Image Tagging

• Roadmap

• Q & A

Page 38: Analyzing Text & Images - Getting More Insight from Web Content with S4

Roadmap

• RDF graph database-as-a-service

− Regular upgrades

− GraphDB Workbench

− Fully managed DBaaS of up to 1 billion triples

• Text Analytics

− Multi-lingual pipelines

− Large-scale processing

− Improvements of the integrated text + image analytics

• Video analytics via Imagga API

Page 39: Analyzing Text & Images - Getting More Insight from Web Content with S4

Key Takeaways

• S4 provides key capabilities for Smart Data management & analytics

− Text analytics

− RDF graph database-as-a-service

− Knowledge graphs

• S4 enables faster prototyping

• Integrated text+image analytics for more insight from web content

• Check out http://s4.ontotext.com

Page 41: Analyzing Text & Images - Getting More Insight from Web Content with S4

Analyzing Text & Images: Getting More Insight from Web Content with S4

Thank You!