Ilian Uzunov (Georgi Georgiev)
Post on 21-Jan-2018
Scaling Semantic Technology to Increase User Engagement - FT.com
September 16th, 2015
Ontotext, Scaling Semantic Technology #1 Sept, 2015
Outline
• Introducing Ontotext
• Related Reads – an FT.com use case
• What we managed to achieve
• Hands on FT.com live
• Positive signs across the news and media domain
• Hands on NOW – News on the Web demo service
Company Brief
Why? Enable better search, analytics and content delivery
What? Data and content management technology: graph database engine + text-mining solutions
How? Semantic analysis of text, linking text to data; NoSQL database with inference
Best for: dealing with heterogeneous dynamic data
Clients: BBC, FT, Bloomberg, DK, AstraZeneca, Wiley, etc.
Facts: 70 staff; HQ in Sofia; sales in London & New York
USP: the best semantic graph database engine; text-mining platform integrated with graph database
Sample RDF Graph: Data and Schema
[Figure: RDF graph with the instance nodes myData:Maria and myData:Ivan; the classes ptop:Agent, ptop:Person and ptop:Woman; the properties ptop:childOf, ptop:parentOf and owl:relativeOf; and edges labelled rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf and owl:SymmetricProperty, some of them marked "inferred".]
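The kind of inference pictured in the graph can be illustrated with a toy forward-chaining loop. This is a minimal sketch in plain Python, not GraphDB's reasoner. The class and property names come from the slide; the asserted triples themselves (who is whose child, the rdfs:subClassOf links, which property is symmetric) are assumptions, because the edges cannot be recovered from the transcript.

```python
# Toy forward-chaining over RDF-style triples (plain tuples, no RDF library).
# The asserted triples are illustrative assumptions, not read off the slide.
asserted = {
    ("myData:Maria", "rdf:type", "ptop:Woman"),
    ("myData:Ivan", "ptop:childOf", "myData:Maria"),
    ("ptop:Woman", "rdfs:subClassOf", "ptop:Person"),
    ("ptop:Person", "rdfs:subClassOf", "ptop:Agent"),
    ("ptop:childOf", "owl:inverseOf", "ptop:parentOf"),
    ("ptop:parentOf", "rdfs:subPropertyOf", "owl:relativeOf"),
    ("owl:relativeOf", "rdf:type", "owl:SymmetricProperty"),
}


def infer(triples):
    """Apply four simple rules until no new triple appears."""
    closure = set(triples)
    while True:
        inverse = {(s, o) for s, p, o in closure if p == "owl:inverseOf"}
        inverse |= {(o, s) for s, o in inverse}  # inverseOf works both ways
        sub_prop = {(s, o) for s, p, o in closure if p == "rdfs:subPropertyOf"}
        sub_class = {(s, o) for s, p, o in closure if p == "rdfs:subClassOf"}
        symmetric = {s for s, p, o in closure
                     if p == "rdf:type" and o == "owl:SymmetricProperty"}
        new = set()
        for s, p, o in closure:
            new |= {(o, q, s) for a, q in inverse if a == p}      # inverse property
            new |= {(s, q, o) for a, q in sub_prop if a == p}     # super-property
            if p in symmetric:
                new.add((o, p, s))                                # symmetric property
            if p == "rdf:type":
                new |= {(s, "rdf:type", d) for c, d in sub_class if c == o}
        if new <= closure:
            return closure
        closure |= new


inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Under these assumed facts, the single asserted childOf statement is enough to derive the parentOf and relativeOf links in both directions, plus the broader class memberships of myData:Maria.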
Semantic Annotation
[Figure: the document pmid:17714090 (Ian A Yang, Clinical and experimental pharmacology …) linked to the concepts COPD / Chronic Obstructive Airway Diseases, Asthma (umls:C000496), Bronchial Diseases and Respiration Disorders; other identifiers shown are umls:C0035204 and umls:C0006261.]
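A dictionary lookup is the simplest way to picture what semantic annotation produces: text spans linked to concept identifiers. The sketch below is only that, a gazetteer match in plain Python, and is not Ontotext's text-mining pipeline. The `GAZETTEER` table, the `annotate` helper and the `ex:` identifiers are all hypothetical.

```python
import re

# Hypothetical mini-dictionary: surface form -> concept ID (the IDs are made up).
GAZETTEER = {
    "asthma": "ex:Asthma",
    "copd": "ex:COPD",
    "chronic obstructive airway disease": "ex:COPD",
}


def annotate(text):
    """Return (start, end, surface, concept) for every dictionary match."""
    spans = []
    for label, concept in GAZETTEER.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end(), m.group(0), concept))
    return sorted(spans)


for span in annotate("COPD and asthma"):
    print(span)
```

Two surface forms mapping to the same identifier is the point of the exercise: once mentions are linked to one concept, they can be joined with whatever the graph database knows about it.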
Ontotext and Financial Times
Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services
Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it
Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel, with up to 2 months between inception and go-live
• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for user-behavior-based recommendation of relevant content and data across the entire media
FT Primary Objective
Serve relevant articles to increase user engagement and improve usability
Subject, Object, Action
Subject: User
Object: Article, Media Asset, Data, …
Action: Read, Preview, Comment, …
[Figure: diagram with an edge labelled "action".]
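The Subject, Object, Action triple maps naturally onto an event record. The following is a minimal sketch of such a record and of a per-user history lookup; the `Event` class, its field names and the `history` helper are hypothetical and are not taken from the FT implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    subject: str   # user ID
    action: str    # e.g. "read", "preview", "comment"
    obj: str       # ID of the article, media asset or data item
    at: datetime   # when the action happened


def history(events, user, action="read"):
    """Return the objects a user acted on, oldest first."""
    hits = [e for e in events if e.subject == user and e.action == action]
    return [e.obj for e in sorted(hits, key=lambda e: e.at)]


events = [
    Event("u1", "read", "art-2", datetime(2015, 9, 2, tzinfo=timezone.utc)),
    Event("u1", "preview", "art-3", datetime(2015, 9, 3, tzinfo=timezone.utc)),
    Event("u1", "read", "art-1", datetime(2015, 9, 1, tzinfo=timezone.utc)),
    Event("u2", "read", "art-2", datetime(2015, 9, 1, tzinfo=timezone.utc)),
]
print(history(events, "u1"))
```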
Behavioural Recommendation
[Figure: diagram with the labels "Behavioural Similarity" and "User Profile".]
Contextual and Behavioural in Combination
[Figure: diagram with the labels "Behavioural and Contextual Similarity", "Reads" and "User Profile".]
Average News Article Metadata
[Figure: an Article node with the metadata fields ID, URL, title, summary, image, created, updated, promoted (popular), reads, views, votes and comments; the label "NY" also appears.]
FT Article Metadata
[Figure: article metadata fields: Title, Summary, body, editorial, img:alt, people, regions, organisations, IPTC, tags.]
Metadata Used
[Figure: metadata fields used: Title, Summary, body, editorial, img:alt, people, regions, organisations, IPTC, tags, concepts, keyphrases.]
User Actions: Another Perspective
[Figure: graph relating Article, Search Action, Result and Search Log. Edge labels: perform, comments, votes, posts, preview, read, contains, leads to read, leads to preview. Other labels: Date, FTS Q., TagCat, Tag set, results, cat, taxonomy.]
“Content”-based Recommendation
• Relies on the previous choices of an individual user (a user's profile)
• Results are based on the similarity of items, defined in terms of their content
• The recommended content is rather homogeneous
Content-based Ranking Mechanisms
Two-fold scoring approach
• Similarity to recently viewed articles (context)
• Relevance to a long-term user profile
  – Weights reflecting the relative importance of the individual terms (static component)
  – Transition likelihoods among any pair of terms (dynamic component)
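The two-fold scoring can be sketched as two small functions, one per score. This is an illustration under stated assumptions, not the formulas used in production: cosine similarity for the context score, averaged term weights for the static component and averaged pairwise transition values for the dynamic component are all choices made here for the example, and every name is hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def context_score(candidate, recent):
    """Similarity of a candidate's terms to the recently viewed articles."""
    context = Counter()
    for terms in recent:
        context.update(terms)
    return cosine(Counter(candidate), context)


def profile_score(candidate, static, transitions, last_terms):
    """Long-term relevance: static term weights plus term-to-term transitions."""
    if not candidate:
        return 0.0
    static_part = sum(static.get(t, 0.0) for t in candidate) / len(candidate)
    pairs = [(a, b) for a in last_terms for b in candidate]
    dynamic_part = (sum(transitions.get(p, 0.0) for p in pairs) / len(pairs)
                    if pairs else 0.0)
    return static_part + dynamic_part


candidate = ["oil", "opec"]
print(context_score(candidate, recent=[["oil", "opec"], ["oil", "prices"]]))
print(profile_score(candidate,
                    static={"oil": 0.8},
                    transitions={("brexit", "oil"): 0.5},
                    last_terms=["brexit"]))
```

Keeping the two scores separate lets a short reading session dominate when there is little history, while the long-term profile takes over for returning users.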
Collaborative Filtering
• Rely on statistics that reflect the past choices of all users
• Results based on user ratings, and the similarity of users or items
• Content-agnostic
• Aware of the quality of content
Collaborative Ranking Mechanisms
[Figure: diagram with the labels "User to Content Similarity Score", "User to User Sim. Score" and "Content to Content Sim. Score".]
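One common way to obtain a content-to-content score without looking at the content is to count co-visits: how often two items were read by the same user. The sketch below is a generic item-based collaborative filter under that assumption; the function names and the sample histories are hypothetical, and the deck does not state how its similarity scores were computed.

```python
from collections import defaultdict
from itertools import combinations


def co_visitation(histories):
    """Count how often two items were read by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user, histories, top_n=3):
    """Rank unseen items by their co-visits with the user's own items."""
    counts = co_visitation(histories)
    seen = set(histories.get(user, []))
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


histories = {
    "u1": ["a", "b"],
    "u2": ["a", "b", "c"],
    "u3": ["b", "c"],
    "u4": ["a"],
}
print(recommend("u4", histories))
```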
Hybrid Approach
• Combines both approaches to improve the quality of prediction
• Implemented via statistical models
• Takes a wide array of features into consideration
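The deck does not say which statistical models were used. As one possible illustration, the sketch below combines per-scheme scores with a logistic function; the feature names, the weights and the `hybrid_score` helper are hypothetical, and in practice the weights would be fitted to logged user behaviour rather than set by hand.

```python
import math


def hybrid_score(features, weights, bias=0.0):
    """Logistic combination of per-scheme scores into one value in (0, 1)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked weights, for illustration only.
weights = {"content": 1.5, "collaborative": 2.0, "popularity": 0.5}
candidates = {
    "a1": {"content": 0.9, "collaborative": 0.1, "popularity": 0.2},
    "a2": {"content": 0.2, "collaborative": 0.8, "popularity": 0.6},
}
ranked = sorted(candidates,
                key=lambda c: hybrid_score(candidates[c], weights),
                reverse=True)
print(ranked)
```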
Final Architecture
[Figure: deployment diagram on AWS instances behind an AWS Elastic LB. Components: FT API, Fetch & Annotation, OWLIM Worker, Recommendation API, Varnish Cache, SOLR 1–3, CS Nodes 1–3 (Replication Group I) and three RR nodes. Ingestion flow: 1. pull content, 2. annotate, 3. index. Flow when a user reads an article: 1. get related, 2. ask, 3. on cache miss, 4. query. Other labels: annotate content, click stream, store user profiles, update user, update popularity.]
Main Actions
1. Pull content – annotate/enrich – index
2. Accumulate/update user profile
3. Recommend
Implementation Overview
[Figure: data-flow diagram. A Profile Update Request (User ID, Item ID) leads to a Profile Update in Profile Storage (Cassandra). A Recommendation Request (User ID) leads to Query Generation against the Items Index (Solr).
Stored for a user: context, static component, dynamic component. Stored for an article: co-visitation matrix, popularity.
Query generation: boosted sub-queries for all involved ranking schemes: content-based, collaborative, popularity, recency.]
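To make "boosted sub-queries" concrete, the sketch below renders a stored profile as a query string in Lucene/Solr standard query syntax, where `^` attaches a boost to a term or to a parenthesised group. It only builds the string. The field names `concepts` and `id`, the profile layout and the boost values are assumptions; the popularity and recency schemes, which would more likely be expressed as function boosts, are left out; and values are not escaped.

```python
def boosted_clause(field, weighted_values, boost):
    """Render one sub-query such as (field:("a"^2.0 OR "b"^1.0))^0.5."""
    if not weighted_values:
        return None
    terms = " OR ".join(f'"{value}"^{weight:.1f}'
                        for value, weight in sorted(weighted_values.items()))
    return f"({field}:({terms}))^{boost:.1f}"


def build_query(profile, scheme_boosts):
    """Join the per-scheme sub-queries into one disjunctive query string."""
    clauses = [
        # Content-based: concepts weighted by the user's long-term profile.
        boosted_clause("concepts", profile["static_terms"],
                       scheme_boosts["content"]),
        # Collaborative: articles co-visited with what the user has read.
        boosted_clause("id", profile["co_visited"],
                       scheme_boosts["collaborative"]),
    ]
    return " OR ".join(c for c in clauses if c)


profile = {
    "static_terms": {"oil": 2.0, "opec": 1.0},
    "co_visited": {"art-42": 3.0},
}
query = build_query(profile, {"content": 1.0, "collaborative": 0.5})
print(query)
```

Because every ranking scheme becomes one more boosted clause of the same query, the index can score all schemes in a single pass.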
Wrap up - Concept Extraction Highlights
• 8m named entities and metadata about them
• 20m labels of People and Organisations
• CES cluster which can be scaled horizontally to handle peak loads
• Live dictionary updates coming from GraphDB through the EUF (Entity Update Feed) plugin
• Max throughput: 10 docs/sec on a single c3.2xlarge AWS node; multiply by N to get the throughput of an N-node cluster
• Reliability has been 100%, but the solution hasn't been stressed as much as we've designed it for
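The throughput figure supports a quick capacity estimate. The sketch below takes the 10 docs/sec per node from the slide and assumes the perfectly linear scaling the slide describes; the helper names are hypothetical.

```python
import math

DOCS_PER_SEC_PER_NODE = 10  # quoted on the slide for one c3.2xlarge node


def cluster_throughput(nodes):
    """Docs/sec for an N-node cluster, assuming linear scaling."""
    return nodes * DOCS_PER_SEC_PER_NODE


def nodes_needed(target_docs_per_sec):
    """Smallest cluster that sustains the target rate."""
    return math.ceil(target_docs_per_sec / DOCS_PER_SEC_PER_NODE)


print(cluster_throughput(4))
print(nodes_needed(35))
```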
Wrap up - Recommendation Highlights
• 100% reliability in production for a full year (Ontotext also manages the deployment)
• API handling 1.5m requests a day on average, up to 3m requests a day (1/3 recommendations, 1/3 logging user actions, 1/3 checking whether a user has enough history to ask for behavioural recommendations)
• Roughly 200m recommendations served and 200m user actions tracked to date since go-live
• 450,873 documents indexed
• No caching, since everything is effectively a personalized search request
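Converted to a per-second rate, the daily request volumes look as follows. This is plain arithmetic on the slide's figures and gives averages spread evenly over 24 hours, so real peaks will be higher; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(requests_per_day):
    """Average request rate if the load were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


average = per_second(1_500_000)   # typical day
busy = per_second(3_000_000)      # busiest days
print(f"average: {average:.1f} req/s, busy day: {busy:.1f} req/s")
print(f"recommendation share (1/3): {average / 3:.1f} req/s")
```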
Wrap up – GraphDB Highlights
• GraphDB had to comply with a set of tests designed by FT and OT: Network lag, Disk Space, Disk Load, Less Memory, CPU Load, etc.
• Comprehensive support for OWL and SPARQL
• Efficient inference through the entire life-cycle of the data
• High-availability cluster architecture
  – Proven and mature for more than 5 years now
  – GraphDB's first HA implementation has been running at the BBC since 2010
  – Unmatched HA tests and transaction load benchmarks
• FTS and NoSQL Connectors for seamless integration
Positive Signs from the News Industry
• Washington Post tests new ‘Knowledge Map’ feature
  “Our ultimate goal is to mine big data to surface highly personalized and contextual data for both journalistic and native content.”
• New York Times R&D Lab announced an experimental project, “Editor”
  1) recognize a term that can be categorized, 2) link that entity to existing databases or microservices, 3) make this enriched information accessible to journalists
• BBC Structured Journalist Manifesto
  Structured journalism: 1) on the reporter side, automation helps improve a journalist’s reporting and make it less cumbersome; 2) on the audience side, semtech helps scale things that can improve the reader’s experience
Thanks!
We will be delighted to have a word with you after the session or later today or tomorrow!
• Dr. Georgi Georgiev – Head of Ontotext Text Analysis Unit - georgi.georgiev@ontotext.com
• Ilian Uzunov – Sales Director CEMEAA - ilian.uzunov@ontotext.com
• Nikolay Krustev – GraphDB Sales Engineer - nikolay.krustev@ontotext.com