Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, and Vertica

Nov 16, 2014

Jeff Malek

Transcript
Page 1: Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, and Vertica

Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, and Vertica

GAMIFIED REWARDS

4/11/12 @jpmalek

Page 2

What I’ll cover: our reporting/analytics growth stages, their pitfalls, and what we’ve learned:

1. Custom MySQL ETL via shell scripts, visualizations in Tableau
2. ETL via a custom Tungsten applier into Vertica
3. New Tungsten Vertica applier, built by Continuent
4. Sharded transactional system, multiple Tungsten Vertica appliers

Page 3

Stage 1 : Custom MySQL ETL via shell scripts, visualizations in Tableau

1. On slave, dump an hour’s worth of new rows via SELECT INTO OUTFILE
2. Ship data file to aggregations host, dump old hourly snapshot, load new
3. Perform aggregation queries against temporary snapshot and FEDERATED tables
4. Tableau refreshes its extracts after aggregated rows are inserted.
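
The slides do not show the shell scripts themselves. As a rough illustration of step 1 only, the sketch below builds a SELECT ... INTO OUTFILE statement covering one clock hour. The table name, timestamp column, and output path are hypothetical, and the identifiers are interpolated directly into the string, so this is only suitable for trusted, hard-coded names.

```python
from datetime import datetime, timedelta


def hourly_dump_sql(table, ts_column, hour_start, outfile):
    """Build a SELECT ... INTO OUTFILE statement for one hour of rows.

    table, ts_column and outfile are illustrative names, not taken from the talk.
    """
    # Snap to the top of the hour so consecutive runs never overlap.
    start = hour_start.replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} >= '{start.strftime(fmt)}' "
        f"AND {ts_column} < '{end.strftime(fmt)}' "
        f"INTO OUTFILE '{outfile}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


if __name__ == "__main__":
    print(hourly_dump_sql("events", "created_at",
                          datetime(2012, 4, 11, 9, 30), "/tmp/events.csv"))
```

Using a half-open window (>= start, < end) is one way to keep a row from landing in two hourly files.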

Page 4

Detour : RAID for the Win

Big drop in API endpoint latency (writes)

Page 5

Stage 2 : ETL via a custom Tungsten applier into Vertica

Page 6

Stage 2 : Customized Tungsten Replication Setup

[Diagram] MySQL -> Master Replicator -> Slave Replicator -> Vertica

Master Replicator: extract binlog to Tungsten log
Slave Replicator: extract from master to log; extract from log; filter (DDL & unwanted tables); custom Vertica JDBC applier

Page 7

Stage 2 : Issues with the Custom Tungsten Filter

1. OLTP transactions on Vertica are very slow (10 transactions per second vs. around 1000 per second for a MySQL slave). The slave applier could not keep up with the MySQL master.

2. The person who created the applier was no longer with the company.

3. The Tungsten setup, including the custom applier, was difficult to maintain and hard to move to other hosts.
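
To make the first issue concrete, here is a small back-of-the-envelope sketch of how a replication backlog grows when the applier is slower than the incoming write rate. The slides give the applier rate (about 10 transactions per second) but not the master's write rate. The 1000 per second used below is an assumption borrowed from the MySQL slave figure, purely for illustration.

```python
def backlog_after(seconds, incoming_tps, applier_tps):
    """Transactions still waiting to be applied after `seconds` of steady load."""
    # The backlog only grows when writes arrive faster than they are applied.
    return max(0, (incoming_tps - applier_tps) * seconds)


if __name__ == "__main__":
    # Assumed incoming rate of 1000 tps; applier rate of 10 tps is from the slide.
    print(backlog_after(3600, incoming_tps=1000, applier_tps=10))
```

Under that assumption the applier falls further behind every second and never catches up unless the write load drops below its apply rate.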

Page 8

Detour : flexible APIs and baseball schedules

Page 9

Stage 3 : New Tungsten Vertica Applier

Page 10

Stage 3: A Template-Driven Batch Apply Process

[Diagram] MySQL -> Tungsten Replicator Pipeline (Extract-Filter-Apply, Extract-Filter-Apply, Extract-Filter-Apply) -> CSV Files -> COPY (Template) -> Staging Table -> DELETE, then INSERT (Template) -> Base Tables

Staging Table
233, d, 64, …, 1
233, i, 64, …, 2
239, I, 76, …, 3

Base Tables
63, ‘bob’, 23, …
64, ‘sue’, 76, …
67, ‘jim’, 1, …
76, ‘dan’, 25, …
98, ‘joe’, 66, …
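
The diagram above can be approximated with a small, self-contained simulation. This is only a sketch of the delete-then-insert idea, not the applier's actual templates: it uses SQLite from the Python standard library in place of Vertica, plain INSERTs in place of COPY from CSV, and hypothetical column names (id, name, score). The previous value for row 64 is made up, and the staging columns are assumed to be sequence number, operation code, key, data, and row order.

```python
import sqlite3


def apply_batch(conn, staged_rows):
    """Load one batch into the staging table, then DELETE and INSERT into base."""
    cur = conn.cursor()
    cur.execute("DELETE FROM stage")  # each batch starts from an empty staging table
    cur.executemany("INSERT INTO stage VALUES (?, ?, ?, ?, ?, ?)", staged_rows)
    # Step 1: remove every base row whose key has a delete in this batch.
    cur.execute(
        "DELETE FROM base WHERE id IN (SELECT id FROM stage WHERE opcode = 'D')"
    )
    # Step 2: insert the final image of each key, but only when the last
    # staged operation for that key is an insert.
    cur.execute(
        "INSERT INTO base (id, name, score) "
        "SELECT id, name, score FROM stage AS s "
        "WHERE opcode = 'I' "
        "AND row_id = (SELECT MAX(row_id) FROM stage WHERE id = s.id)"
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
    conn.execute(
        "CREATE TABLE stage (seqno INTEGER, opcode TEXT, id INTEGER, "
        "name TEXT, score INTEGER, row_id INTEGER)"
    )
    # Base table before the batch; the old score for id 64 is hypothetical.
    conn.executemany(
        "INSERT INTO base VALUES (?, ?, ?)",
        [(63, "bob", 23), (64, "sue", 75), (67, "jim", 1), (98, "joe", 66)],
    )
    # An update of row 64 arrives as delete + insert; row 76 is a new insert.
    apply_batch(conn, [
        (233, "D", 64, None, None, 1),
        (233, "I", 64, "sue", 76, 2),
        (239, "I", 76, "dan", 25, 3),
    ])
    return conn.execute("SELECT id, name, score FROM base ORDER BY id").fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Applying a whole batch with two set-based statements, rather than one statement per transaction, is the design point the slide is making. The row-order check in step 2 is an added safeguard in this sketch so that a key deleted last in a batch is not re-inserted.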

Page 11

Stage 3 : Batch Applier Replication Setup

[Diagram] MySQL -> Master Replicator -> Slave Replicator -> Vertica

Master Replicator: extract binlog to Tungsten log
Slave Replicator: extract from master to log; extract from log; filter (use built-in filters; DDL ignored); batch applier using SQL template commands
Batch applier: write data to disk files (CSV), then COPY / INSERT into Vertica

Page 12

Stage 3 : Solving Problems to Get the New Applier to Work

1. Testing – Developed a lightweight testing mechanism for heterogeneous replication

2. Batch applier implementation – Two tries to get it right, including SQL templates and full datatype support

3. Character sets – Ensuring consistent UTF-8 handling throughout the replication chain, including CSV files

4. Time zones – Ensuring the Java VM handled time values correctly

5. Performance – Tweaked SQL templates to get a 50x boost over the old applier
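
Items 3 and 4 come down to being explicit about encodings and time zones at every hop. The replicator itself runs on the Java VM, so the following Python sketch is not its implementation. It only illustrates the two ideas on a hypothetical three-column row (id, name, timestamp): encode the CSV as UTF-8 explicitly, and normalize time values to UTC before writing them.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def encode_batch_csv(rows):
    """Return CSV bytes for (id, name, timestamp) rows, UTF-8 encoded, times in UTC."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row_id, name, ts in rows:
        if ts.tzinfo is None:
            # A naive timestamp would silently pick up the host's local zone.
            raise ValueError("naive timestamp; attach a time zone first")
        utc = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        writer.writerow([row_id, name, utc])
    # Encode explicitly rather than relying on the platform default.
    return buf.getvalue().encode("utf-8")


if __name__ == "__main__":
    pacific = timezone(timedelta(hours=-7))
    print(encode_batch_csv([(64, "Zoë", datetime(2012, 4, 11, 9, 0, tzinfo=pacific))]))
```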

Page 13

Detour : Sharding, or Learning How To Sleep In Any Position

Page 14

Stage 4 : Sharded transactional system, multiple Tungsten Vertica appliers

Page 15

Solving Problems to Scale Up The Replication Configuration

1. Implement remote batch apply so Tungsten can run off-board from Vertica

2. Convert replication to a direct pipeline with a single service between MySQL and Vertica

3. Create a script to deploy replicator in a single command

4. Create staging tables on Vertica server

Page 16

Remaining Challenges to Complete Replication Setup

1. Configure replication for global and local DBShards data

2. Ensure performance is up to snuff (currently at 500-1000 transactions per second)

3. Introduce intermediate staging servers to reduce number of replication streams into Vertica

Page 17

Thank You!

In summary:

1. Tungsten is a great tool when it comes to MySQL ETL automation, so check it out as an alternative to custom in-house scripts or other options.

2. Vertica is a high-performance, scalable BI platform that now pairs well with Tungsten. Full360 offers a cloud-based solution.

3. If you’re just getting started on the BI front, hire a BI developer to focus on this stuff, if you can.

4. I see no reason why this framework couldn’t scale to easily handle whatever our business needs in the future.