How Elasticsearch powers the Guardian’s newsroom
graham tackley, @tackers, director of architecture, guardian news and media
shay banon, @kimchy, creator, co-founder and cto, elasticsearch

How elasticsearch powers the Guardian's newsroom

Nov 22, 2014



Graham Tackley

http://qconlondon.com/london-2014/presentation/How%20Elasticsearch%20Powers%20the%20Guardian's%20Newsroom:

theguardian.com is one of the world's most popular news websites, visited by over 80 million unique browsers every month. Yet in the past, their journalists and editors found it difficult to get meaningful, timely data on what people were reading.

In response to these issues, Graham and colleagues at the Guardian built "ophan", an in-house real-time analytics system based on Elasticsearch. By working closely with journalists and editors, they've focused on data that staff can act on, both to provide a better experience for the Guardian's existing readers and to enable more people to discover their unique content.

In this talk, Graham will dive into the details of ophan: the obstacles faced by the newsroom that prompted them to build the system, how it works for alerting, and how the tool has made life better for the Guardian's readers and staffers. While Graham explores this real-world use case, Shay will cover the technical underpinnings of ophan with a deep dive into the Elasticsearch features and functionality that power the system.

Attendees will leave with a solid understanding of Elasticsearch's features and architecture, all gained through the lens of a real-world and hyperlocal use case.

Transcript
Page 1: How elasticsearch powers the Guardian's newsroom

How Elasticsearch powers the Guardian’s newsroom

graham tackley ■ @tackers director of architecture

guardian news and media

shay banon ■ @kimchy creator, co-founder and cto elasticsearch

Page 2: How elasticsearch powers the Guardian's newsroom
Page 3: How elasticsearch powers the Guardian's newsroom
Page 4: How elasticsearch powers the Guardian's newsroom

“created in 1936 ... to secure the financial and editorial independence of the Guardian in perpetuity”

Page 5: How elasticsearch powers the Guardian's newsroom
Page 6: How elasticsearch powers the Guardian's newsroom

our in-house real-time traffic tool

Page 7: How elasticsearch powers the Guardian's newsroom
Page 8: How elasticsearch powers the Guardian's newsroom

my desktop workstation

production apaches

something htmly ?

Page 9: How elasticsearch powers the Guardian's newsroom

ssh $SERVER "nice tail -f /apache2/logs/guardian-access_log"

Page 10: How elasticsearch powers the Guardian's newsroom

my desktop workstation

2 x production apaches

publisher

ssh “tail”

zeromq

xSEO

dashboard
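
The first version of the pipeline (tail the access logs over ssh, publish each line, count on the dashboard) can be sketched roughly as follows. This is a minimal illustration, not the Guardian's code: it assumes the logs use the Apache common/combined log format, which the slides do not show, and the names `LOG_LINE`, `parse_line` and `hits_per_minute` are hypothetical. The zeromq publishing step is left out.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed Apache common/combined log format; the real format is not shown in the talk.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def parse_line(line):
    """Return (minute, path) for a successful GET request, else None."""
    m = LOG_LINE.match(line)
    if m is None or m.group("method") != "GET" or m.group("status") != "200":
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    path = m.group("path").split("?", 1)[0]  # drop the query string
    return ts.strftime("%Y-%m-%dT%H:%M"), path


def hits_per_minute(lines):
    """Count successful page requests per (minute, path) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts
```

In a setup like the one on the slide, `lines` could simply be `sys.stdin`, with the output of the `ssh ... tail -f` command piped into the script.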

Page 11: How elasticsearch powers the Guardian's newsroom
Page 12: How elasticsearch powers the Guardian's newsroom
Page 13: How elasticsearch powers the Guardian's newsroom
Page 14: How elasticsearch powers the Guardian's newsroom

my desktop workstationx

Page 15: How elasticsearch powers the Guardian's newsroom

Javascript in browser

SNS

SQS

hidden pixel

Dashboard

Tracker

Page 16: How elasticsearch powers the Guardian's newsroom
Page 17: How elasticsearch powers the Guardian's newsroom
Page 18: How elasticsearch powers the Guardian's newsroom

Javascript in browser

Tracker

SNS

SQS

hidden pixel

SQS

Dashboard

Serf

elasticsearch

Dashboard

Page 19: How elasticsearch powers the Guardian's newsroom
Page 20: How elasticsearch powers the Guardian's newsroom

12 * m3.xlarge

in an autoscaling group (with manual scaling)

instance store (SSD)

https://github.com/guardian/status-app

Page 21: How elasticsearch powers the Guardian's newsroom
Page 22: How elasticsearch powers the Guardian's newsroom

{
  "dt": "2014-03-03T02:01:48.026Z",
  "url": "http://www.theguardian.com/film/2014/mar/03/oscars-2014-winners-list",
  "queryString": "",
  "host": "www.theguardian.com",
  "path": "/film/2014/mar/03/oscars-2014-winners-list",
  "section": "film",
  "platform": "r2",
  "userAgent": {
    "type": "Browser",
    "family": "Safari 5.1.9",
    "os": "OS X 10.6.8",
    "device": "Personal computer"
  },
  "documentReferrer": "http://www.theguardian.com/world",
  "browser": {
    "id": "gA6RUFLhWNQvWdt0rW4r78Fg",
    "isNew": false
  },
  "referringHost": "theguardian.com",
  "referringPath": "/world",
  "isContent": true,
  "contentPublicationDate": "2014-03-03",
  "countryCode": "US",
  "countryName": "United States",
  "location": {
    "lonlat": [-73.4409, 41.2094]
  }
}

⇠filter

⇠filter

⇠count per minute
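
What "filter, filter, count per minute" means for documents like the one above can be shown with a small in-memory sketch. The transcript does not preserve which fields the arrows point at, so this assumes filtering on `path` and optionally on `referringHost`, and bucketing on `dt`. The function name `count_per_minute` is hypothetical, and in ophan this work is done by Elasticsearch rather than by application code.

```python
from collections import Counter


def count_per_minute(events, path, referring_host=None):
    """Count page-view events per minute for one path.

    events: iterable of dicts shaped like the page-view document above.
    referring_host: optional second filter (an assumption of this sketch).
    """
    counts = Counter()
    for event in events:
        if event["path"] != path:
            continue
        if referring_host is not None and event.get("referringHost") != referring_host:
            continue
        # "2014-03-03T02:01:48.026Z" -> "2014-03-03T02:01" (minute bucket)
        counts[event["dt"][:16]] += 1
    return dict(sorted(counts.items()))
```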

Page 23: How elasticsearch powers the Guardian's newsroom

{
  "query": {
    "filtered": {
      "query": { "match_all": { } },
      "filter": {
        "term": { "path": "/film/2014/mar/03/oscars-2014-winners-list" }
      }
    }
  },
  …

Page 24: How elasticsearch powers the Guardian's newsroom

  …
  "facets": {
    "Reddit": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": { "term": { "referringHost": "reddit.com" } }
    },
    "Facebook": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": { "term": { "referringHost": "facebook.com" } }
    },
    "Google": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": {
        "or": {
          "filters": [
            { "prefix": { "referringHost": "www.google." } },
            { "prefix": { "referringHost": "news.google." } }
          ]
        }
      }
    }
  }
}
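
The three facets differ only in their `facet_filter`, so a request of this shape is easy to assemble programmatically. The sketch below builds the same structure as a plain Python dict. The helper names `referrer_facet` and `build_request` are hypothetical, and whatever the "…" on the slides elides between `query` and `facets` is simply left out. Facets were the API of the time; later Elasticsearch versions replaced them with aggregations.

```python
import json


def referrer_facet(filter_clause, field="dt", interval="1m"):
    """One date_histogram facet restricted to a single referrer filter."""
    return {
        "date_histogram": {"field": field, "interval": interval},
        "facet_filter": filter_clause,
    }


def build_request(path):
    """Per-minute counts for one path, split by referrer, as on the slides."""
    google = {
        "or": {
            "filters": [
                {"prefix": {"referringHost": "www.google."}},
                {"prefix": {"referringHost": "news.google."}},
            ]
        }
    }
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"path": path}},
            }
        },
        "facets": {
            "Reddit": referrer_facet({"term": {"referringHost": "reddit.com"}}),
            "Facebook": referrer_facet({"term": {"referringHost": "facebook.com"}}),
            "Google": referrer_facet(google),
        },
    }


if __name__ == "__main__":
    body = build_request("/film/2014/mar/03/oscars-2014-winners-list")
    print(json.dumps(body, indent=2))
```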

Page 25: How elasticsearch powers the Guardian's newsroom
Page 26: How elasticsearch powers the Guardian's newsroom

/graph/breakdown?section=commentisfree

Page 27: How elasticsearch powers the Guardian's newsroom

?section=commentisfree

ophan.StandardFilters

ophan.StandardFiltersToElasticsearch

org.elasticsearch.index.query.FilterBuilder
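
The chain on this slide (URL query string, then `ophan.StandardFilters`, then an Elasticsearch filter) can be approximated as below. This is a sketch in Python rather than ophan's own code, which is not shown in the deck. The function name `standard_filters_to_es` is hypothetical, the default exclusions (country code `GNM`, user agent type `Robot`) are taken from the filter that appears later in the talk, and the section clause is written as a bare `terms` filter as a simplification.

```python
from urllib.parse import parse_qs


def standard_filters_to_es(query_string, start, end):
    """Translate dashboard query-string parameters into an "and" filter.

    start, end: ISO-8601 timestamps bounding the time range (assumed inputs).
    """
    params = parse_qs(query_string.lstrip("?"))
    filters = [
        {"range": {"dt": {"from": start, "to": end,
                          "include_lower": True, "include_upper": False}}},
        # Default exclusions, as in the filter shown later in the deck.
        {"not": {"filter": {"term": {"countryCode": "GNM"}}}},
        {"not": {"filter": {"term": {"userAgent.type": "Robot"}}}},
    ]
    sections = params.get("section")
    if sections:
        filters.append({"terms": {"section": sections}})
    return {"and": {"filters": filters}}
```

For example, `standard_filters_to_es("?section=commentisfree", "2014-03-03T00:00:00.000Z", "2014-03-03T22:30:59.999Z")` yields an `and` filter with four clauses, the last one restricting `section` to `commentisfree`.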

Page 28: How elasticsearch powers the Guardian's newsroom

{
  "query": {
    "filtered": {
      "query": { "match_all": { } },
      "filter": {
        "term": { "path": "/film/2014/mar/03/oscars-2014-winners-list" }
      }
    }
  },
  …

Page 29: How elasticsearch powers the Guardian's newsroom

"filter": {
  "and": {
    "filters": [
      { "range": { "dt": { "from": "2014-03-03T00:00:00.000Z", "to": "2014-03-03T22:30:59.999Z", "include_lower": true, "include_upper": false } } },
      { "not": { "filter": { "term": { "countryCode": "GNM" } } } },
      { "not": { "filter": { "term": { "userAgent.type": "Robot" } } } },
      { "filter": { "terms": { "section": [ "commentisfree" ] } } }
    ]
  }
}

Page 30: How elasticsearch powers the Guardian's newsroom
Page 31: How elasticsearch powers the Guardian's newsroom
Page 32: How elasticsearch powers the Guardian's newsroom

thank you

graham tackley ■ @tackers director of architecture

guardian news and media

shay banon ■ @kimchy creator, co-founder and cto elasticsearch