Apr. 13, 2011 Presented to MySQL Conference Building and Deploying Large Scale Real Time News System with MySQL and Distributed Cache


Jun 13, 2015


Tao Cheng

Maintaining a constantly updated large data set is a big challenge, not only for database administrators but also for developers, because it is hard to maintain and expand. The stress grows when the requirement is to serve real-time data to heavy-traffic websites.

In this presentation, we first examine the initial characteristics of AOL’s Real Time News system, the design strategy, and how MySQL fits into the overall architecture. We then review the issues encountered and the solutions applied when the system characteristics changed due to ever growing data set size and new query patterns.

In addition to common MySQL design, trouble-shooting, and performance tuning techniques, we will also share a heuristic algorithm implemented in the application servers to reduce the response time of complex queries from hours to a few milliseconds.

Who am I?

•  Tao Cheng, AOL Real Time News (RTN).
•  Worked on Mail and Browser clients in the '90s, then moved to web backend servers and have worked on them since.
•  Not an expert, but happy to share my experience and brainstorm solutions.


Agenda

•  AOL Real Time News (RTN): what is it?
•  Requirements
•  Technical solutions with focus on MySQL
•  Deployment Topology
•  Operational Monitoring
•  Metrics Collection
•  Tips for query tuning and optimization
•  Heuristic Query Optimization Algorithm
•  Lessons learned
•  Q & A


Real Time News : background


AOL deployed its large scale Real Time News (RTN) system in 2007. The system ingests and processes news from 30,000 sources every second, around the clock. Today its data store, MySQL, has accumulated several billion rows and terabytes of data. Even so, news is delivered to end users in close to real time. This presentation shares how it is done and the lessons learned.


Brief Intro: sample features

•  Data presentation: return the most recent news in
    •  flat view: most recent news about an entity. An entity could be a person, a company, a sports team, etc.
    •  topic clusters: most recent news grouped by topic. A topic is a group of news items about an event, headline news, etc.
•  News filtering by
    •  source type, such as news, blogs, press releases, regional, etc.
    •  relevancy level (high, medium, low, etc.) to the entities.
•  Data delivery: push (to subscribers) and pull
•  Search by entities, categories (National, Sports, Finance, etc.), topics, document ID, etc.

Requirements for Phase I (2006)

•  Commodity hardware: 4 CPUs, 16 GB memory, 600 GB disk space.
•  Data ingestion rate = 250K docs/day; average document size = 5 KB.
•  Data retention period: 7 days to forever
•  Est. data set size: (1.25 GB/day or 456 GB/year) + space for indexes, schema change, and optimization.
•  Response time: < 30 milliseconds/query
•  Throughput: > 400 queries/sec/server
•  Uptime: 99.999%
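The sizing figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, decimal units (1 GB = 1,000,000 KB) are assumed because that is what the slide's totals imply, and the downtime figure is simply what 99.999% works out to over a 365-day year.

```python
# Back-of-the-envelope sizing from the Phase I requirements.
DOCS_PER_DAY = 250_000      # ingestion rate (from the slide)
AVG_DOC_KB = 5              # average document size (from the slide)
KB_PER_GB = 1_000_000       # assumption: decimal units


def raw_gb_per_day(docs_per_day: int = DOCS_PER_DAY,
                   avg_doc_kb: float = AVG_DOC_KB) -> float:
    """Raw document volume per day, before indexes and other overhead."""
    return docs_per_day * avg_doc_kb / KB_PER_GB


def raw_gb_per_year(days: int = 365) -> float:
    """Raw document volume per year at a constant ingestion rate."""
    return raw_gb_per_day() * days


def downtime_minutes_per_year(uptime: float = 0.99999) -> float:
    """Downtime budget implied by an uptime target."""
    return (1 - uptime) * 365 * 24 * 60


if __name__ == "__main__":
    print(f"{raw_gb_per_day():.2f} GB/day")
    print(f"{raw_gb_per_year():.2f} GB/year")
    print(f"{downtime_minutes_per_year():.2f} minutes of downtime/year")
```

Under these assumptions a single 600 GB disk holds a little over a year of raw documents, which is one reason retention "forever" leads to partitioning.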


Solutions: MySQL + Bucky

•  MySQL
    •  Serve raw/distinct queries
    •  Back fill
•  Bucky Technology (AOL's distributed cache & computing framework)
    •  Write-ahead cache: pre-compute query results and push them into cache.
    •  Messaging (optional): push data directly to subscribers
        •  Updates are pushed to data consumers or browsers via AIM Complex.
•  Updates go to both database and cache.
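A toy sketch of the write-ahead idea follows. Bucky is AOL-internal, so nothing here reflects its API: a Python list stands in for MySQL, a dict stands in for the distributed cache, and the class and method names are hypothetical. It only shows the shape of the two paths: every update goes to the database and refreshes the pre-computed result in cache, while reads are served from cache and back-filled from the database on a miss.

```python
class NewsStore:
    """Toy stand-in: a list plays the database, a dict plays the cache."""

    def __init__(self, top_n: int = 3):
        self.top_n = top_n
        self.db = []        # rows of (doc_id, entity_id, published_ts)
        self.cache = {}     # entity_id -> newest doc_ids (pre-computed result)
        self.db_reads = 0   # counts queries that reach the database

    def _query_db(self, entity_id):
        """The raw query: newest top_n documents for one entity."""
        self.db_reads += 1
        rows = [r for r in self.db if r[1] == entity_id]
        rows.sort(key=lambda r: r[2], reverse=True)
        return [r[0] for r in rows[: self.top_n]]

    def ingest(self, doc_id, entity_id, published_ts):
        """Write path: update the database, then push the fresh result to cache."""
        self.db.append((doc_id, entity_id, published_ts))
        self.cache[entity_id] = self._query_db(entity_id)

    def latest(self, entity_id):
        """Read path: serve from cache; back-fill from the database on a miss."""
        if entity_id not in self.cache:
            self.cache[entity_id] = self._query_db(entity_id)
        return self.cache[entity_id]
```

In this sketch a warm cache answers `latest()` without touching the database; clearing the cache (a cold start) forces one back-fill query per entity.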


Architecture Diagram (over-simplified)

[Diagram: components shown are Relegence, Ingestor, Asset DB, Distributed Cache, Gateway, AIM, and WWW; links are labeled pull and push.]

Data Model: SOR vs. Query DB

•  Separate query from storage to keep tables small and queries fast.
•  System of Record (SOR): has all raw data
    •  The authoritative data store; designed for data storage
    •  Normalized schema: for simple key look-up; no table joins.
•  Query DB: de-normalized for query speed
    •  Avoid JOINs, reduce the number of trips to the DB, increase throughput.
•  Read/write a small chunk of data at a time so the database can get requests out quickly and process more.
•  Use replication to achieve linear scalability for reads.

Design Strategies: partitioning (Why)

•  Dataset too big to fit on one host
•  Performance consideration: divide and conquer
    •  Write: more masters (Nx) to take writes
    •  Read: smaller tables + more (NxM) slaves to handle reads.
•  Fault tolerance: distribute the risk and reduce the impact of system failure
•  Easier maintenance: size does matter
    •  Faster nightly backup, disaster recovery, schema change, etc.
    •  Faster optimization: optimization is needed to reclaim disk space after deletion and to rebuild indexes to improve query speed.

Design Strategies: partitioning (How)

•  Partition on the most used keys (look at query patterns)
    •  Document table: on document ID
    •  Entity table: on entity ID
•  Simple hash on IDs: no partition map, and thus no contention for read/write locks on yet another table
•  Managing growth: add another partition set
    •  New documents are written into both the old and new partition sets for a few weeks. Then, stop writing into the old partitions.
    •  Queries go to the new partitions first and then the old ones if insufficient results are found.
•  Works great in our case but might not for everyone.
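The partitioning scheme above can be sketched in a few lines. This is an illustration under stated assumptions, not the RTN code: dicts stand in for MySQL partitions, "simple hash" is taken to mean ID modulo partition count, write order is taken to be time order, and all class and function names are hypothetical.

```python
class PartitionSet:
    """A fixed group of partitions addressed by a simple hash (modulo) on the ID."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, entity_id: int) -> dict:
        # No partition map: the target partition is computed from the ID alone.
        return self.partitions[entity_id % len(self.partitions)]

    def write(self, entity_id: int, doc_id: int) -> None:
        self._partition_for(entity_id).setdefault(entity_id, []).append(doc_id)

    def read(self, entity_id: int, limit: int) -> list:
        if limit <= 0:
            return []
        docs = self._partition_for(entity_id).get(entity_id, [])
        return docs[-limit:][::-1]          # newest first


def write_doc(entity_id, doc_id, new_set, old_set=None):
    """During the overlap window, pass old_set so the doc lands in both sets."""
    new_set.write(entity_id, doc_id)
    if old_set is not None:
        old_set.write(entity_id, doc_id)


def latest_docs(entity_id, limit, new_set, old_set):
    """Query the new set first; fall back to the old set only if results are short."""
    results = new_set.read(entity_id, limit)
    if len(results) < limit:
        for doc_id in old_set.read(entity_id, limit):
            if doc_id not in results:       # skip docs written to both sets
                results.append(doc_id)
            if len(results) == limit:
                break
    return results
```

Queries that the new set can satisfy never touch the old partitions, so the old set sees less and less traffic as the new one fills up.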


Schema design: De-normalization

•  Make query tables small:
    •  put only essential attributes in the de-normalized tables
    •  store long text attributes in separate tables.
•  De-normalization: how to store and match attributes
    •  Single-value attributes (1:1), such as document ID, short string, date time, etc.: one column, one row.
    •  Multi-value attributes (1:many): tricky but feasible
        •  Use multiple rows with a composite index/key: (c1, c2, etc.)
        •  One row, one column: CSV string, e.g., "id1, id2, id3". SQL: val LIKE '%id2%'
        •  One row but multiple columns, e.g., group1, group2, etc. SQL: group1=val1 OR group2=val2 ...
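The three multi-value layouts can be compared side by side. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, with hypothetical table and function names. One detail is an addition rather than something from the slide: the CSV column is stored with leading and trailing commas so that a LIKE pattern for entity 2 does not also match entity 21, which the unwrapped '%id%' pattern would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout 1: one row per value, composite key (entity_id, doc_id)
    CREATE TABLE doc_entity_rows (entity_id INTEGER, doc_id INTEGER,
                                  PRIMARY KEY (entity_id, doc_id));
    -- Layout 2: one row, one column holding a CSV string
    CREATE TABLE doc_entity_csv (doc_id INTEGER PRIMARY KEY, entity_ids TEXT);
    -- Layout 3: one row, a fixed number of columns
    CREATE TABLE doc_entity_cols (doc_id INTEGER PRIMARY KEY,
                                  group1 INTEGER, group2 INTEGER);
""")


def add_doc(doc_id, entity_ids):
    """Store the same document-to-entity links in all three layouts."""
    conn.executemany("INSERT INTO doc_entity_rows VALUES (?, ?)",
                     [(e, doc_id) for e in entity_ids])
    csv = "," + ",".join(str(e) for e in entity_ids) + ","   # e.g. ",7,21,"
    conn.execute("INSERT INTO doc_entity_csv VALUES (?, ?)", (doc_id, csv))
    g1, g2 = (list(entity_ids) + [None, None])[:2]
    conn.execute("INSERT INTO doc_entity_cols VALUES (?, ?, ?)", (doc_id, g1, g2))


def _ids(sql, *params):
    return [row[0] for row in conn.execute(sql, params)]


def by_rows(entity_id):
    return _ids("SELECT doc_id FROM doc_entity_rows "
                "WHERE entity_id = ? ORDER BY doc_id", entity_id)


def by_csv(entity_id):
    return _ids("SELECT doc_id FROM doc_entity_csv "
                "WHERE entity_ids LIKE ? ORDER BY doc_id", f"%,{entity_id},%")


def by_csv_naive(entity_id):
    # Pattern without delimiters: also matches IDs that merely contain the digits.
    return _ids("SELECT doc_id FROM doc_entity_csv "
                "WHERE entity_ids LIKE ? ORDER BY doc_id", f"%{entity_id}%")


def by_cols(entity_id):
    return _ids("SELECT doc_id FROM doc_entity_cols "
                "WHERE group1 = ? OR group2 = ? ORDER BY doc_id",
                entity_id, entity_id)


add_doc(100, [7, 21])
add_doc(101, [2, 7])
```

Only the first layout can use an index for the match; a LIKE pattern with a leading wildcard generally cannot, so the CSV layout suits small tables or already narrowed result sets.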


Tips for indexing

•  Simple key: for metadata retrieval
•  Composite key: find matching documents
    •  Start with low cardinality and most used columns
    •  Order matters: (c1, c2, c3) != (c2, c3, c1)
•  InnoDB: all secondary indexes contain the primary key
    •  Make the primary key short to keep index size small
    •  Queries using a secondary index reference the primary key too.
•  Integer vs. string: comparison of numeric values is faster => index hash values of long strings instead.
•  Index length: title:varchar(255) => idx_title(32)
•  Enforce referential integrity on the application side.
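One way to read "index hash values of long strings" is sketched below. It is an assumption-laden illustration: sqlite3 stands in for MySQL, CRC32 is just one convenient 32-bit hash, and the `source` table and helper names are hypothetical. The index is built on the integer hash, and the lookup still compares the full string so that hash collisions cannot return the wrong row.

```python
import sqlite3
import zlib


def url_hash(url: str) -> int:
    """32-bit hash of a long string; small and cheap to compare in an index."""
    return zlib.crc32(url.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (
        source_id INTEGER PRIMARY KEY,
        url_hash  INTEGER NOT NULL,
        url       TEXT    NOT NULL
    );
    CREATE INDEX idx_source_url_hash ON source (url_hash);
""")


def add_source(url: str) -> int:
    cur = conn.execute("INSERT INTO source (url_hash, url) VALUES (?, ?)",
                       (url_hash(url), url))
    return cur.lastrowid


def find_source(url: str):
    """Narrow by the indexed hash, then confirm with the full string."""
    row = conn.execute(
        "SELECT source_id FROM source WHERE url_hash = ? AND url = ?",
        (url_hash(url), url),
    ).fetchone()
    return row[0] if row else None
```

For the prefix-index tip, MySQL lets an index cover only the first characters of a column, declared along the lines of `KEY idx_title (title(32))`.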


MySQL configuration

•  Storage engine: InnoDB (row-level locking)
•  Table space: one file per table
    •  Easier to maintain (schema change, optimization, etc.)
•  Character set: 'UTF-8'
    •  Disable persistent connection (5.0.x)
    •  skip-character-set-client-handshake
•  Enable slow query log to identify bad queries.
•  System variables for memory buffer size
    •  innodb_buffer_pool_size: data and indexes
    •  sort_buffer_size, max_heap_table_size, tmp_table_size
    •  Query cache size = 0; tables are updated constantly

Runtime statistics (per server)

•  Average write rate:
    •  daily: < 40 tps
    •  max at 400 tps during recovery
    •  Performs best when write rate < 100 tps
•  Query rate: 20~80 qps
•  Query response time: shorter when indexes and data are in memory
    •  75%: ~3 ms when qps < 15; ~2 ms when qps ~= 60
    •  95%: 6~8 ms when qps < 15; 3~4 ms when qps ~= 60
•  CPU idle %: > 99%.


Deployment Topology Consideration

•  Minimum configuration: host/DC redundancy
    •  DC1: host 1 (master), host 3 (slave)
    •  DC2: host 2 (failover master), host 4 (slave)
•  Data locality: significant when network latency is a concern (100 Mbps)
    •  3,000 qps when DB is on a remote host.
    •  15,000 qps when DB is on the local host.
•  Linking dependent servers across data centers
    •  Push the cross link up as far as possible (Topology 3): link to dependent servers in the same data center.

Deployment Topology 1: minimum config

[Diagram labels: WWW, Data Consumer, DB (four hosts), Data Center 1, Data Center 2.]

Topology 2: link across DCs (bad)

[Diagram labels: WWW, GSLB, VIP, DB, Data Consumer.]

Topology 3: link to same DC (better)

[Diagram labels: WWW, GSLB, VIP, DB, Data Consumer.]

Topology 4: use local UNIX socket

[Diagram labels: WWW, GSLB, VIP, DB, Data Consumer.]

Production Monitoring

  Operational Monitoring: logcheck, Scout/NOC alert, etc.

  DB monitoring on replication failure, latency, read/write rate, performance metrics.


Metrics Collection

•  Graphing collected metrics: visualize and collate operational metrics.
    •  Helps analyze and fine-tune server performance.
    •  Helps trace production issues and identify points of failure.
•  What metrics are important?
    •  Host: CPU, MEM, disk I/O, network I/O, # of processes, CPU swap/paging
    •  Server: throughput, response time
•  Comparison: line up charts (throughput, response time, CPU, disk I/O) in the same time window.


Tuning and Optimizing Queries

•  Explain: mysql> explain SELECT ... FROM …
    •  Watch out for tmp table usage, table scan, etc.
•  SQL_NO_CACHE
•  MySQL Query profiler
    •  mysql> set profiling=1;
•  Linux OS cache: leave enough memory on the host
•  USE INDEX hint to choose an index explicitly
    •  Use wisely: most of the time, MySQL chooses the right index for you. But when table size grows, index cardinality might change.
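The habit of reading the plan before and after adding an index can be practiced without a MySQL server. The sketch below swaps in SQLite's `EXPLAIN QUERY PLAN` (through Python's sqlite3) purely as an analogue: the output format differs from MySQL's EXPLAIN, and the table, index, and helper names are hypothetical. In MySQL the equivalent warning signs would be `type: ALL` for a table scan and "Using temporary" or "Using filesort" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_entity (doc_id INTEGER, entity_id INTEGER, published INTEGER)"
)

QUERY = ("SELECT doc_id FROM doc_entity "
         "WHERE entity_id = ? ORDER BY published DESC LIMIT 10")


def query_plan(sql, params=()):
    """Return the human-readable plan steps SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]    # the last column holds the description


# Without an index the plan reports a full scan of the table.
plan_before = query_plan(QUERY, (42,))

# A composite index on the filter column plus the sort column changes the plan.
conn.execute("CREATE INDEX idx_entity_published ON doc_entity (entity_id, published)")
plan_after = query_plan(QUERY, (42,))

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
```

Comparing the two plans is the point of the exercise: the first should mention a scan, the second should name the new index.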


Important MySQL statistics

•  SHOW GLOBAL STATUS…
    •  Qcache_free_blocks
    •  Qcache_free_memory
    •  Qcache_hits
    •  Qcache_inserts
    •  Qcache_lowmem_prunes
    •  Qcache_not_cached
    •  Qcache_queries_in_cache
    •  Select_scan
    •  Sort_scan

Important MySQL statistics (cont.)

•  Table_locks_waited
•  Innodb_row_lock_current_waits
•  Innodb_row_lock_time
•  Innodb_row_lock_time_avg
•  Innodb_row_lock_time_max
•  Innodb_row_lock_waits
•  Select_scan
•  Slave_open_temp_tables

Heuristic Query Optimization Algorithm

•  Primarily for complex cluster queries: find the latest N topics and related stories.
•  Strategy: reduce the number of records the database needs to load from disk to perform a query.
    •  Pick a default query range. If insufficient docs are returned, expand the query range proportionally.
    •  If none are returned => sparse data => drop the range and retry.
    •  Save the query range for future reference.
•  Result: reduce the number of rows to process from millions to hundreds => cut query time down from minutes to less than 10 ms.
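The strategy above can be sketched as follows. This is one reading of the slide, not the production algorithm: the range is assumed to be a look-back window in seconds, the constants `DEFAULT_RANGE`, `MAX_BOUNDED_TRIPS`, and `HEADROOM` are hypothetical, and `run_query` is a caller-supplied function standing in for the database call.

```python
import math
from typing import Callable, Dict, List, Optional

DEFAULT_RANGE = 3600      # hypothetical default look-back window, in seconds
MAX_BOUNDED_TRIPS = 2     # assumed cap on range-bounded trips to the DB
HEADROOM = 1.5            # hypothetical safety margin when prorating a range

# Look-up table: query key -> observed docs per second of range.
ratio_table: Dict[str, float] = {}


def prorate(wanted: int, docs_per_sec: float) -> int:
    """Range (seconds) expected to hold `wanted` docs at the observed density."""
    return max(1, math.ceil(HEADROOM * wanted / docs_per_sec))


def cluster_query(key: str, wanted: int, now: int,
                  run_query: Callable[[Optional[int]], List[int]]) -> List[int]:
    """run_query(since) returns doc timestamps, newest first, with ts >= since;
    since=None means the original, unbounded query."""
    if key in ratio_table:
        rng = prorate(wanted, ratio_table[key])     # reuse the saved ratio
    else:
        rng = DEFAULT_RANGE
    for _ in range(MAX_BOUNDED_TRIPS):
        docs = run_query(now - rng)
        if len(docs) >= wanted:
            ratio_table[key] = len(docs) / rng      # save for future queries
            return docs[:wanted]
        if not docs:
            break                                   # sparse data: drop the range
        rng = prorate(wanted, len(docs) / rng)      # expand proportionally
    return run_query(None)[:wanted]
```

With dense data the first call pays for the default window once, and later calls use a much narrower window derived from the saved ratio; with sparse data the function falls back to the unbounded query after a single empty result.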

[Flowchart of the heuristic. Nodes as extracted: Cluster query; Query range look-up; Has query range?; Use default range; Bound query with the range and send it to DB; NumOfTripToDB++; Sufficient results from query engine?; Compute docs-to-range ratio and save it back to the look-up table for future use; numOfResults == 0?; NumOfTripToDB >= 2?; Compute docs-to-range ratio and prorate it to a range that would return a sufficient number of docs; Send original query to DB; NumOfTripToDB = 0; Return query results to clients.]

Lessons Learned

•  Always load test well ahead of launch (2 weeks) to avoid a fire drill.
•  Don't rely solely on cache. The database needs to be able to serve a reasonable number of queries on its own.
•  Separate cache from applications to avoid cold start.
•  Keep transactions/queries simple and return fast.
•  Avoid table joins; limit them to 2 if really needed.
•  Avoid stored procedures: results are not cached, and a DBA is needed when altering the implementation.

Lessons Learned (cont.)

•  Avoid using 'offset' in the LIMIT clause; use application-based pagination instead.
•  Avoid 'SQL_CALC_FOUND_ROWS' in SELECT
•  If possible, exclude text/blob columns from query results to avoid disk I/O.
•  Store text/blob in a separate table to speed up backup, optimization, and schema change.
•  Separate real-time vs. archive data for better performance and easier maintenance.
•  Keep table size under control (< 100 GB); optimize periodically.
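The slide does not say what "application-based pagination" looked like in RTN. One common reading is keyset pagination, where the application remembers the last ID it showed and asks for rows beyond it, so the database never has to read and discard the skipped rows the way `LIMIT offset, n` does. The sketch below illustrates that reading with sqlite3 as a stand-in for MySQL; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO doc VALUES (?, ?)",
                 [(i, f"story {i}") for i in range(1, 11)])


def next_page(before_id=None, page_size=3):
    """Newest-first doc IDs older than before_id (None means the first page)."""
    if before_id is None:
        rows = conn.execute(
            "SELECT doc_id FROM doc ORDER BY doc_id DESC LIMIT ?",
            (page_size,),
        )
    else:
        # The primary key seeks straight to the cursor position; no OFFSET needed.
        rows = conn.execute(
            "SELECT doc_id FROM doc WHERE doc_id < ? ORDER BY doc_id DESC LIMIT ?",
            (before_id, page_size),
        )
    return [r[0] for r in rows]
```

The caller passes the last ID of the previous page as `before_id`, for example `next_page(next_page()[-1])` for the second page.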


Lessons Learned (cont.)

•  Put SQL statements (templates) in resource files so you can tune them without a binary change.
•  Set up replication in dev & QA to catch replication issues earlier
    •  Transactional (MySQL 5.0.x) vs. data/mixed (5.1 or above)
    •  Auto-increment + (INSERT .. ON DUPLICATE KEY UPDATE …)
    •  Date time column: default to NOW()
    •  Oversized data: increase max_allowed_packet
    •  Replication lag: transactions that involve index update/deletion often take longer to complete.
•  Host and data center redundancy is important: don't put all eggs in one basket.

RTN 3 Redesign

•  Free text search with SOLR
    •  Real-time vs. archive shards.
    •  1 minute latency w/o Ramdisk.
•  Asset DB partitioned: 5 rows/doc -> 25 rows/doc
•  Avoid (system) virtual machines; instead, stack high-end hosts with processes that use different system resources (CPU, MEM, disk space, etc.)
    •  Better network and system resource utilization: cost effective.
    •  Data locality
•  More processors (< 12) help when under load.

Q & A

  Questions or comments?


  THANK YOU !!
