OCTOBER 13-16, 2016 AUSTIN, TX

Queue Based Solr Indexing with Collection Management: Presented by Devansh Dhutia, Gannett Co.

Jan 07, 2017

LucidWorks
Transcript

Queue Based Indexing & Collection Management

Devansh Dhutia
Platform Architect

•  National & Local newspaper/media company
•  92+ Markets in 33 states

Current/Future Architecture

Agenda

•  Solr @ Gannett
•  Current State
•  Collection Management
•  Queuing Solution
•  Future Work
•  Questions

Solr @ Gannett

•  Site Search
•  CMS Search
•  Analytics
•  Personalization

•  40+ Applications: Integral pillar of Gannett’s Digital Platform
•  20M+ total documents, 800,000+ per month: Growing rapidly
•  100,000+ requests per minute: Highly Available
•  ~100ms average response time: Extremely Fast
•  8 nodes, 256 GB memory per availability zone

Current State

•  Synchronous Operations
•  Near Realtime
•  Time Consuming schema changes
•  Visible outage impact

Collection Management

•  Create Collection
•  Deploy Batch Indexer
•  Index new Collection
•  Update Alias to new Collection
•  Run catch up
•  Deploy Search/Index Apps
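
The "Create Collection" and "Update Alias to new Collection" steps correspond to the `CREATE` and `CREATEALIAS` actions of Solr's Collections API. The sketch below only builds the request URLs and makes no network calls. The host, collection and alias names, shard count, and replication factor are all illustrative assumptions; the slides do not give Gannett's actual values.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host, not from the talk


def collections_api_url(action: str, **params: str) -> str:
    """Build a Solr Collections API URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def rebuild_plan(alias: str, new_collection: str, config_name: str) -> list[str]:
    """Return the API calls for the create and alias-swap steps, in order."""
    return [
        # Create the alternate collection alongside the live one.
        collections_api_url(
            "CREATE",
            name=new_collection,
            numShards="1",            # assumed value
            replicationFactor="2",    # assumed value
            **{"collection.configName": config_name},
        ),
        # After the batch index completes, point the alias at the new collection.
        collections_api_url("CREATEALIAS", name=alias, collections=new_collection),
    ]


if __name__ == "__main__":
    for url in rebuild_plan("articles", "articles_v2", "articles_conf_v2"):
        print(url)
```

Because search and index applications address the alias rather than a concrete collection, re-pointing the alias switches traffic without redeploying clients; the batch index and catch-up steps happen between the two calls.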

Realtime Changes / Queries

Prep Alternate Collection

Deploy

Outage Problems

•  Spinning Wheel
•  Duplicate content
•  Unable to find new content
•  Frustrated editors
•  UX & other presentation layers

Enter Queues

•  Asynchronous Write Operations
•  Near Realtime
•  Faster schema changes
•  Auto scale indexing workers
•  Low authoring outage impact
•  Eventually consistent
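
A minimal in-process sketch of the asynchronous write pattern listed above: the authoring side enqueues an update and returns immediately, while a separate worker applies updates to the index later. The standard-library `queue.Queue` and the `index` dict are stand-ins for the message broker and the Solr collection; the names `publish` and `worker` are hypothetical and not taken from the talk.

```python
import queue
import threading

index = {}                  # stand-in for the Solr collection
updates = queue.Queue()     # stand-in for the message broker


def publish(doc: dict) -> None:
    """Authoring side: enqueue the update and return without waiting."""
    updates.put(doc)


def worker() -> None:
    """Indexing worker: apply queued updates in FIFO order."""
    while True:
        doc = updates.get()
        if doc is None:             # sentinel: stop the worker
            updates.task_done()
            break
        index[doc["id"]] = doc      # the "index write"
        updates.task_done()


if __name__ == "__main__":
    t = threading.Thread(target=worker)
    t.start()
    publish({"id": "story-1", "title": "Draft"})
    publish({"id": "story-1", "title": "Final"})
    updates.put(None)
    updates.join()                  # block until every message is processed
    t.join()
    print(index["story-1"]["title"])  # prints "Final"
```

Between `publish` returning and the worker catching up, a search can still see the older version of a document, which is the sense in which the index is eventually consistent. If the index is unavailable, updates wait in the queue instead of failing the author's save.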

Queue Based Indexing

RabbitMQ

•  Clustered & Highly Available
•  FIFO
•  pub/sub model
•  Consistent Hash / Multiple Queues
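
The slides do not say how consistent hashing was configured. RabbitMQ provides a consistent-hash exchange plugin (exchange type `x-consistent-hash`), and one plausible reason to use it is to send every update for a given document to the same queue, so per-document FIFO order survives when work is spread over several queues. That motivation is an assumption. The sketch below illustrates the general technique with a small in-process hash ring; it is not the plugin's algorithm, and the queue names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring mapping routing keys to queue names."""

    def __init__(self, queues: list[str], points_per_queue: int = 64) -> None:
        # Each queue owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in queues
            for i in range(points_per_queue)
        )
        self._hashes = [h for h, _ in self._ring]

    def queue_for(self, routing_key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect_right(self._hashes, _hash(routing_key)) % len(self._ring)
        return self._ring[i][1]


if __name__ == "__main__":
    ring = HashRing(["index.q0", "index.q1", "index.q2"])
    for doc_id in ("story-1", "story-2", "story-1"):
        print(doc_id, "->", ring.queue_for(doc_id))
```

The same document ID always lands on the same queue, and removing a queue only remaps the keys that queue owned; keys on the remaining queues stay where they were.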

Components

•  Realtime Queue
•  Batch Queue
•  Prep Queue
•  Deadletter Queue
•  Indexing Service
•  Prep mode
•  Batch Push Service
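
The slides list a Deadletter Queue next to the working queues but do not describe the retry policy. The sketch below shows one common arrangement, under stated assumptions: an indexing service retries a failing message a fixed number of times and then parks it in the dead-letter queue for later inspection. `MAX_ATTEMPTS`, the message shape, and the `drain` function are hypothetical, and `collections.deque` stands in for broker queues.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget; the slides give no number


def drain(source: deque, deadletter: deque, index_fn) -> None:
    """Index every message in `source`; park repeated failures in `deadletter`."""
    while source:
        msg = source.popleft()
        try:
            index_fn(msg["doc"])
        except Exception as exc:
            msg["attempts"] = msg.get("attempts", 0) + 1
            msg["last_error"] = str(exc)
            if msg["attempts"] >= MAX_ATTEMPTS:
                deadletter.append(msg)   # give up; keep for inspection
            else:
                source.append(msg)       # retry behind the waiting messages


if __name__ == "__main__":
    indexed = []

    def index_fn(doc: dict) -> None:
        if "id" not in doc:
            raise ValueError("missing id")
        indexed.append(doc["id"])

    realtime = deque([
        {"doc": {"id": "a"}},
        {"doc": {"title": "no id"}},
        {"doc": {"id": "b"}},
    ])
    dead = deque()
    drain(realtime, dead, index_fn)
    print(indexed, len(dead))  # prints "['a', 'b'] 1"
```

Re-queuing at the back keeps one bad message from blocking the documents behind it, at the cost of strict FIFO order for the retried message.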

Future Work

•  Continuous Delivery of schema
•  Build payload in one zone only
•  Automated Deadletter handling
•  Earlier notification of potential failure

Thank you

Interested in joining our team at Gannett?

http://www.gannett.com/careers

Questions?