John Hammink Evangelist
Move data from MySQL to Redshift with (not much more than) a single click
December, 2015
@treasuredata
What’s MySQL?
• Open-source relational database system
• World’s third most widely used RDBMS
• >100 million installations
• Part of the LAMP stack
What’s Redshift?
• Massively Parallel Processing (MPP) database
• Cloud-based, pay as you go
• Migrate to Redshift from:
  - On-premises data warehouses
  - Sharded MySQL/PostgreSQL
Why would you migrate?
What are the problems?
user_id  sign_up_date  action  campaign
123      "2015-09-01"  "view"  "Twitter"
…        …             …       …

{"user_id": 123, "sign_up_date": "2015-09-01", "action": "view", "campaign": "Twitter"}
[Diagram labels: Multi-Source Event Data (www, .jar, .rb, .py); Generic Relational Database (x2); In-Depth Advanced Analytics]
Event Data: Missing Field
Evolving Schema
user_id  sign_up_date  action
123      "2015-09-01"  "view"
…        …             …

{"user_id": 123, "sign_up_date": "2015-09-01", "action": "view"}
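The evolving-schema problem can be reproduced on a small scale. The sketch below is an illustration only: it uses an in-memory SQLite table as a stand-in for a generic relational database, and the table name `events` and helper `insert_event` are hypothetical. It shows that an event carrying the new `campaign` field is rejected until someone alters the table.

```python
import sqlite3

# Hypothetical in-memory table with the three original columns from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, sign_up_date TEXT, action TEXT)")

def insert_event(conn, event):
    """Insert one event dict; every key must already exist as a column.

    Sketch only: column names are taken from trusted dict keys.
    """
    columns = ", ".join(event)
    placeholders = ", ".join("?" for _ in event)
    conn.execute(
        f"INSERT INTO events ({columns}) VALUES ({placeholders})",
        tuple(event.values()),
    )

old_event = {"user_id": 123, "sign_up_date": "2015-09-01", "action": "view"}
new_event = dict(old_event, campaign="Twitter")

insert_event(conn, old_event)  # fits the fixed schema

try:
    insert_event(conn, new_event)  # "campaign" is not a column yet
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema-on-write fix: the table must be altered before loading can resume.
conn.execute("ALTER TABLE events ADD COLUMN campaign TEXT")
insert_event(conn, new_event)

rows = conn.execute(
    "SELECT user_id, action, campaign FROM events ORDER BY rowid"
).fetchall()
print(rejected, rows)
```

The row loaded before the schema change ends up with an empty `campaign` value, which is the "missing field" case from the slide.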
What are a few migration challenges?
REDSHIFT
1. STORAGE?
2. SCHEMA COMPATIBILITY?
3. AUTOMATION?
Manually exporting MySQL to Redshift
1. Create a Redshift cluster
2. Export MySQL data and split it into multiple files
3. Upload the load files to Amazon S3
4. Run a CREATE TABLE command
5. Run a COPY command to load the table
6. Verify that the data was loaded correctly
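A rough sketch of steps 2, 4, 5 and 6, to show how much glue the manual route needs. It only builds the load files and SQL strings in memory; creating the cluster, uploading to S3 and running the statements are left out because they need real AWS access. The table name, bucket prefix and IAM role ARN are made-up placeholders, and the helper functions are hypothetical.

```python
import csv
import io

def split_into_csv_chunks(rows, num_files):
    """Step 2: split rows round-robin into num_files CSV strings."""
    buffers = [io.StringIO() for _ in range(num_files)]
    writers = [csv.writer(buf) for buf in buffers]
    for i, row in enumerate(rows):
        writers[i % num_files].writerow(row)
    return [buf.getvalue() for buf in buffers]

def create_table_sql(table, columns):
    """Step 4: build a CREATE TABLE statement from (name, type) pairs."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} ({cols});"

def copy_sql(table, s3_prefix, iam_role):
    """Step 5: build a COPY statement; COPY loads every file under the prefix."""
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' FORMAT AS CSV;"

def verify_sql(table):
    """Step 6: a simple row count to compare against the MySQL source."""
    return f"SELECT COUNT(*) FROM {table};"

# Example rows as they might come out of a MySQL export (values are illustrative).
rows = [
    (123, "2015-09-01", "view"),
    (124, "2015-09-02", "click"),
    (125, "2015-09-03", "view"),
]
chunks = split_into_csv_chunks(rows, num_files=2)

ddl = create_table_sql(
    "events",
    [("user_id", "INTEGER"), ("sign_up_date", "DATE"), ("action", "VARCHAR(32)")],
)
# Placeholder bucket and role; replace with real values before use.
load = copy_sql(
    "events",
    "s3://example-bucket/events/part_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
check = verify_sql("events")

for statement in (ddl, load, check):
    print(statement)
```

Every one of these pieces has to be rerun, and the CREATE TABLE revisited, whenever the source schema changes.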
What are the solutions?
• Scheduling
• Cloud storage
• Schema on read
Solution: Cloud Data Lake + Redshift
JSON Event Data → Cloud Data Lake (schema-on-read) → Cloud Data Warehouse (schema-on-write)
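To make the schema-on-read idea concrete, here is a minimal sketch, not any particular product's implementation. Raw JSON events are kept as-is, and the column set is worked out only when the data is read, so a new field such as `campaign` never blocks ingestion. The helper names `infer_columns` and `project` are hypothetical.

```python
import json

# Raw events as they might sit in the data lake, one JSON object per line.
raw_lines = [
    '{"user_id": 123, "sign_up_date": "2015-09-01", "action": "view"}',
    '{"user_id": 123, "sign_up_date": "2015-09-01", "action": "view", "campaign": "Twitter"}',
]

def infer_columns(lines):
    """Union of all keys seen, in first-seen order (schema decided at read time)."""
    columns = []
    for line in lines:
        for key in json.loads(line):
            if key not in columns:
                columns.append(key)
    return columns

def project(lines, columns):
    """Turn each event into a fixed-width row; absent fields become None."""
    return [
        tuple(json.loads(line).get(col) for col in columns)
        for line in lines
    ]

columns = infer_columns(raw_lines)
rows = project(raw_lines, columns)

print(columns)
for row in rows:
    print(row)
```

The resulting fixed-width rows are the shape a schema-on-write warehouse expects, with older events simply carrying an empty value for fields they never had.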
DEMO
Tweet after me! “At @sumologic with @treasuredata learning how to migrate easily from #MySQL to #Redshift”