Page 1: How elasticsearch powers the Guardian's newsroom

How Elasticsearch powers the Guardian’s newsroom

graham tackley ■ @tackers director of architecture

guardian news and media

shay banon ■ @kimchy creator, co-founder and cto elasticsearch

Page 2: How elasticsearch powers the Guardian's newsroom
Page 3: How elasticsearch powers the Guardian's newsroom
Page 4: How elasticsearch powers the Guardian's newsroom

“created in 1936 ... to secure the financial and editorial independence of the Guardian in perpetuity”

Page 5: How elasticsearch powers the Guardian's newsroom
Page 6: How elasticsearch powers the Guardian's newsroom

our in-house real-time traffic tool

Page 7: How elasticsearch powers the Guardian's newsroom
Page 8: How elasticsearch powers the Guardian's newsroom

my desktop workstation

production apaches

something HTML-y?

Page 9: How elasticsearch powers the Guardian's newsroom

ssh $SERVER "nice tail -f /apache2/logs/guardian-access_log"

Page 10: How elasticsearch powers the Guardian's newsroom

my desktop workstation

2 x production apaches

publisher

ssh “tail”

zeromq

xSEO

dashboard
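
The counting step of the tail-based pipeline above can be illustrated with a minimal sketch. It assumes the access log uses Apache's common or combined log format, which the slides do not confirm. The names `parse_path` and `count_hits` and the sample lines are hypothetical, and the zeromq publish step is replaced here by an in-memory counter.

```python
import re
from collections import Counter

# Matches the request part of a common/combined log format line,
# e.g. "GET /world HTTP/1.1". The log format is an assumption.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def parse_path(line):
    """Return the request path without its query string, or None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return match.group(1).split("?", 1)[0]


def count_hits(lines):
    """Count hits per path over an iterable of access log lines."""
    hits = Counter()
    for line in lines:
        path = parse_path(line)
        if path is not None:
            hits[path] += 1
    return hits


# Made-up sample lines for illustration only.
SAMPLE = [
    '203.0.113.7 - - [03/Mar/2014:02:01:48 +0000] '
    '"GET /film/2014/mar/03/oscars-2014-winners-list HTTP/1.1" 200 5120',
    '203.0.113.8 - - [03/Mar/2014:02:01:49 +0000] '
    '"GET /world?page=2 HTTP/1.1" 200 4096',
    '203.0.113.9 - - [03/Mar/2014:02:01:50 +0000] '
    '"GET /film/2014/mar/03/oscars-2014-winners-list HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    for path, hits in count_hits(SAMPLE).most_common():
        print(hits, path)
```

In a live setup the lines would come from the `ssh ... tail -f` stream on standard input rather than from a fixed list.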

Page 11: How elasticsearch powers the Guardian's newsroom
Page 12: How elasticsearch powers the Guardian's newsroom
Page 13: How elasticsearch powers the Guardian's newsroom
Page 14: How elasticsearch powers the Guardian's newsroom

my desktop workstation ✗

Page 15: How elasticsearch powers the Guardian's newsroom

JavaScript in browser

SNS

SQS

hidden pixel

Dashboard

Tracker

Page 16: How elasticsearch powers the Guardian's newsroom
Page 17: How elasticsearch powers the Guardian's newsroom
Page 18: How elasticsearch powers the Guardian's newsroom

JavaScript in browser

Tracker

SNS

SQS

hidden pixel

SQS

Dashboard

Serf

elasticsearch

Dashboard

Page 19: How elasticsearch powers the Guardian's newsroom
Page 20: How elasticsearch powers the Guardian's newsroom

12 * m3.xlarge

in an autoscaling group (with manual scaling)

instance store (SSD)

https://github.com/guardian/status-app

Page 21: How elasticsearch powers the Guardian's newsroom
Page 22: How elasticsearch powers the Guardian's newsroom

{
  "dt": "2014-03-03T02:01:48.026Z",
  "url": "http://www.theguardian.com/film/2014/mar/03/oscars-2014-winners-list",
  "queryString": "",
  "host": "www.theguardian.com",
  "path": "/film/2014/mar/03/oscars-2014-winners-list",
  "section": "film",
  "platform": "r2",
  "userAgent": {
    "type": "Browser",
    "family": "Safari 5.1.9",
    "os": "OS X 10.6.8",
    "device": "Personal computer"
  },
  "documentReferrer": "http://www.theguardian.com/world",
  "browser": {
    "id": "gA6RUFLhWNQvWdt0rW4r78Fg",
    "isNew": false
  },
  "referringHost": "theguardian.com",
  "referringPath": "/world",
  "isContent": true,
  "contentPublicationDate": "2014-03-03",
  "countryCode": "US",
  "countryName": "United States",
  "location": {
    "lonlat": [-73.4409, 41.2094]
  }
}

⇠filter

⇠filter

⇠count per minute
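
What "filter, then count per minute" means for events of this shape can be shown in plain Python. This is a sketch only: it assumes, as the query slides suggest, that `path` and `referringHost` are the filtered fields and `dt` is the timestamp being bucketed. The function names and the sample events (trimmed to the fields used) are hypothetical.

```python
from collections import Counter
from datetime import datetime


def minute_bucket(dt):
    """Truncate a timestamp like 2014-03-03T02:01:48.026Z to the minute."""
    parsed = datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.strftime("%Y-%m-%dT%H:%M")


def hits_per_minute(events, path, referring_host=None):
    """Count events per minute for one path, optionally for one referrer."""
    counts = Counter()
    for event in events:
        if event["path"] != path:
            continue  # first filter: the page being looked at
        if referring_host is not None and event.get("referringHost") != referring_host:
            continue  # second filter: where the reader came from
        counts[minute_bucket(event["dt"])] += 1
    return counts


OSCARS = "/film/2014/mar/03/oscars-2014-winners-list"

# Made-up events for illustration, reduced to the fields used here.
EVENTS = [
    {"dt": "2014-03-03T02:01:48.026Z", "path": OSCARS, "referringHost": "theguardian.com"},
    {"dt": "2014-03-03T02:01:59.000Z", "path": OSCARS, "referringHost": "reddit.com"},
    {"dt": "2014-03-03T02:02:03.500Z", "path": OSCARS, "referringHost": "reddit.com"},
    {"dt": "2014-03-03T02:02:10.000Z", "path": "/world", "referringHost": "reddit.com"},
]

if __name__ == "__main__":
    print(sorted(hits_per_minute(EVENTS, OSCARS).items()))
    print(sorted(hits_per_minute(EVENTS, OSCARS, "reddit.com").items()))
```

Elasticsearch performs the same two steps server-side across the whole index, so the dashboard never has to pull raw events.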

Page 23: How elasticsearch powers the Guardian's newsroom

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "path": "/film/2014/mar/03/oscars-2014-winners-list" }
      }
    }
  },
  …

Page 24: How elasticsearch powers the Guardian's newsroom

  …
  "facets": {
    "Reddit": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": { "term": { "referringHost": "reddit.com" } }
    },
    "Facebook": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": { "term": { "referringHost": "facebook.com" } }
    },
    "Google": {
      "date_histogram": { "field": "dt", "interval": "1m" },
      "facet_filter": {
        "or": {
          "filters": [
            { "prefix": { "referringHost": "www.google." } },
            { "prefix": { "referringHost": "news.google." } }
          ]
        }
      }
    }
  }
}
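
The three facets repeat one pattern: a per-minute `date_histogram` on `dt`, narrowed by a referrer filter. A minimal sketch of generating that part of the request body follows. The helper names are hypothetical, only the `facets` section is built (the elided part of the request is left out), and nothing is sent to a cluster. Note that facets belong to the Elasticsearch releases of that era; later releases replaced them with aggregations.

```python
import json


def term_filter(host):
    """Exact match on the referring host."""
    return {"term": {"referringHost": host}}


def prefix_any(prefixes):
    """Match any referring host that starts with one of the prefixes."""
    return {
        "or": {
            "filters": [{"prefix": {"referringHost": p}} for p in prefixes]
        }
    }


def referrer_facet(filter_body, field="dt", interval="1m"):
    """One histogram facet restricted to a single referrer filter."""
    return {
        "date_histogram": {"field": field, "interval": interval},
        "facet_filter": filter_body,
    }


def build_facets():
    """Build the facets section shown on the slide."""
    return {
        "Reddit": referrer_facet(term_filter("reddit.com")),
        "Facebook": referrer_facet(term_filter("facebook.com")),
        "Google": referrer_facet(prefix_any(["www.google.", "news.google."])),
    }


if __name__ == "__main__":
    print(json.dumps({"facets": build_facets()}, indent=2))
```

Adding another referrer to the graph then becomes a one-line change rather than another copied block of JSON.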

Page 25: How elasticsearch powers the Guardian's newsroom
Page 26: How elasticsearch powers the Guardian's newsroom

/graph/breakdown?section=commentisfree

Page 27: How elasticsearch powers the Guardian's newsroom

?section=commentisfree

ophan.StandardFilters

ophan.StandardFiltersToElasticsearch

org.elasticsearch.index.query.FilterBuilder

Page 28: How elasticsearch powers the Guardian's newsroom

{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "path": "/film/2014/mar/03/oscars-2014-winners-list" }
      }
    }
  },
  …

Page 29: How elasticsearch powers the Guardian's newsroom

"filter": {
  "and": {
    "filters": [
      {
        "range": {
          "dt": {
            "from": "2014-03-03T00:00:00.000Z",
            "to": "2014-03-03T22:30:59.999Z",
            "include_lower": true,
            "include_upper": false
          }
        }
      },
      { "not": { "filter": { "term": { "countryCode": "GNM" } } } },
      { "not": { "filter": { "term": { "userAgent.type": "Robot" } } } },
      { "filter": { "terms": { "section": [ "commentisfree" ] } } }
    ]
  }
}

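The step from a dashboard URL such as `?section=commentisfree` to a combined filter like the one above might look roughly like this in Python. It is a sketch, not Ophan's code: `standard_filters` is a hypothetical stand-in for the `ophan.StandardFilters` to `FilterBuilder` translation, the time range is passed in by the caller, and the section restriction is emitted as a bare `terms` filter rather than with the extra wrapper shown on the slide.

```python
import json
from urllib.parse import parse_qs


def standard_filters(query_string, start, end):
    """Translate dashboard query parameters into an "and" filter body."""
    params = parse_qs(query_string.lstrip("?"))

    # Filters applied to every request, mirroring the slide:
    # a time range, no internal (GNM) traffic, no robots.
    filters = [
        {
            "range": {
                "dt": {
                    "from": start,
                    "to": end,
                    "include_lower": True,
                    "include_upper": False,
                }
            }
        },
        {"not": {"filter": {"term": {"countryCode": "GNM"}}}},
        {"not": {"filter": {"term": {"userAgent.type": "Robot"}}}},
    ]

    # Optional restriction taken from the URL, e.g. ?section=commentisfree
    sections = params.get("section", [])
    if sections:
        filters.append({"terms": {"section": sections}})

    return {"and": {"filters": filters}}


if __name__ == "__main__":
    body = standard_filters(
        "?section=commentisfree",
        "2014-03-03T00:00:00.000Z",
        "2014-03-03T22:30:59.999Z",
    )
    print(json.dumps({"filter": body}, indent=2))
```

Keeping the translation in one place means every graph endpoint accepts the same URL parameters and applies the same exclusions.
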
Page 30: How elasticsearch powers the Guardian's newsroom
Page 31: How elasticsearch powers the Guardian's newsroom
Page 32: How elasticsearch powers the Guardian's newsroom

thank you

graham tackley ■ @tackers director of architecture

guardian news and media

shay banon ■ @kimchy creator, co-founder and cto elasticsearch

