A Breakfast Seminar in London, 4th Feb 2010
Data Warehousing Solutions with MySQL
Sunday, 7 February 2010
9:00 - Welcome Coffee and Tea
9:20 - Introduction
9:30 - MySQL for Data Warehousing
10:00 - Infobright
10:30 - Coffee/Tea Break
10:45 - Talend
11:30 - Seminar Ends.
Introduction
MySQL Market Segments
Open-Source Powers the Web & The Network
• Web / Web 2.0
• OEM / ISVs
• On Demand, SaaS, Hosting
• Enterprise 2.0
• Telecommunications
Timeline
• MAR 2008: Sun's acquisition of MySQL completed
  > Good acquisition, MySQL continues to grow
• APR 2009: Oracle (ORCL) agreement to acquire Sun
• JAN 2010: The EC gives full clearance to the acquisition
• FEB 2010: We continue to develop, maintain, market, sell and support MySQL!
Oracle’s MySQL Strategy
• Becomes part of the Open Source GBU
  > Independent sales organisation – retained from Sun
  > Independent development organisation – retained from Sun
• Make MySQL better
  > Apply Oracle’s expertise and engineering processes
  > A natural extension of what Oracle has done with InnoDB
• Make MySQL support better
  > Leverage Oracle’s award-winning global support infrastructure
• Make MySQL part of the Oracle stack
  > Many customers use both MySQL and Oracle database
  > Integrate with Enterprise Manager, Secure Backup, Audit Vault
http://www.oracle.com/ocom/groups/public/@ocom/documents/webcontent/044521.pdf
Enjoy the event!
Data Warehousing with MySQL
MySQL Data Warehousing Strategy
• Strongly support common data warehouse use cases
• Offer modern technology that adheres to MySQL’s software priorities (reliability, performance, ease-of-use)
• Partner with major BI/ETL vendors
• Offer highly attractive total cost of ownership
The MySQL DW Ecosystem
[Diagram: the ecosystem layers: platform, RDBMS, storage engine, ETL integration, BI/reporting tools]
Common Use Cases
1. Small, semi real-time data marts
2. Continuous, real-time/query data warehousing
3. Traditional, standard reporting warehouse
4. Massive historical warehouse with ad-hoc queries
5. BI and analytics in OLTP applications (emerging…)
[Diagram labels: Real-Time, Data Mart, Traditional, Analytical, Historical, SQL]
MySQL Technical Strategy
• Provide open source architecture to maximize innovation
• Offer core data warehousing feature set
• Provide specialised data warehouse engines for key use cases
• Supply strategies for combating the mixed workload challenge
Pluggable Storage Engine Architecture
MySQL Enterprise
Monitor
• Global Monitoring of All Servers
• Web-Based Central Console
• Built-in Advisors and Expert Advice
• MySQL Query Analyzer
• Replication Monitor
Server
• MySQL Enterprise Server
• Monthly Rapid Updates
• Quarterly Service Packs
• Hot Fix Program
• Indemnification
Support
• 24 x 7 x 365 Production Support
• Web-Based Knowledge Base
• Consultative Help
• High Availability and Scale Out
http://www.mysql.com/products/enterprise/
MySQL Enterprise Monitor
• Single, consolidated view into entire MySQL environment
• Auto discovery of MySQL Servers, Replication Topologies
• New Query Analyzer
• Customisable rules-based monitoring and alerts
• Identifies problems before they occur
• Reduces risk of downtime
• Makes it easier to scale-out without requiring more DBAs
“Your Virtual MySQL DBA” Assistant
http://www.mysql.com/products/enterprise/advisors.html
MySQL Query Analyzer
“Finds code problems before your customers do.”
• Centralised monitoring of queries across all servers
• No reliance on Slow Query Logs, SHOW PROCESSLIST, VMSTAT, etc.
• Aggregated view of query execution counts, time, and rows
• Saves time parsing atomic executions for total query expense
The MySQL Technology behind a DW Strategy
[Diagram: the building blocks: replication, MySQL Proxy, partitioning, sharding, memcached, query cache, storage engines]
Warehouse use cases/mapping
[Diagram: the use cases Real-Time, Data Mart, Traditional, Analytical and Historical (plus SQL), each shown against the same technology list: MyISAM, InnoDB, CSV, Archive, Federated, Query Cache, Replication, Sharding, Proxy, Memcached]
MySQL Data Warehouse Cookbook
Partitioning
• Partition Pruning
• Partitioning key must result in an INT
• Check table lock with MyISAM
• Check the number of open files
• Foreign Keys, Fulltext and spatial indexes are not supported
• No MyISAM LOAD INDEX or INSERT DELAYED
• For DW, it is mainly limited to InnoDB and MyISAM
[Diagram: vertical partitioning splits the columns Col1..Col5 of a table into narrower tables; horizontal partitioning splits the rows of the same columns into separate partitions]
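Partition pruning can be pictured with a small model. The sketch below is plain Python, not MySQL internals: it assumes a table range-partitioned on an integer key, and shows how a range predicate lets the server skip partitions that cannot hold matching rows. The partition names and year bounds are hypothetical.

```python
from bisect import bisect_right

# Hypothetical RANGE partitions on an integer key (e.g. the year of an order date).
# Each entry is (partition_name, exclusive_upper_bound), like VALUES LESS THAN.
PARTITIONS = [("p2007", 2008), ("p2008", 2009), ("p2009", 2010), ("p2010", 2011)]


def prune(lo, hi, partitions=PARTITIONS):
    """Return the names of partitions that can hold keys in the closed range [lo, hi]."""
    bounds = [upper for _, upper in partitions]
    first = bisect_right(bounds, lo)  # index of the partition that would hold lo
    last = bisect_right(bounds, hi)   # index of the partition that would hold hi
    # Slicing past the end is harmless: keys above every bound match no partition.
    return [name for name, _ in partitions[first:last + 1]]


print(prune(2009, 2009))  # a single-year predicate touches one partition
print(prune(2008, 2009))  # a two-year range touches two
```

The model only works because the key is an integer that can be compared against the bounds, which is one way to read the "must result in an INT" rule above.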
SQL Generation
• Multipass SQL or Subqueries
• Avoid complex queries
  > More efficient use of query cache, key buffer and buffer pool
  > More shard friendly
  > More scalable for the current version of MySQL
    – No parallel query
• Use temp tables and stored procedures
• Check with EXPLAIN
  > ALL (sequential scan)
  > Using filesort
  > Using temporary (for GROUP BY and ORDER BY)
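The EXPLAIN checks listed above lend themselves to automation in a BI or SQL-generation layer. A minimal sketch, assuming the EXPLAIN rows have already been fetched as dicts keyed by the EXPLAIN column names (`table`, `type`, `Extra`); the helper name and the sample plan are hypothetical and written by hand for illustration.

```python
def explain_warnings(rows):
    """Return (table, warning) pairs for plan rows that deserve a second look."""
    found = []
    for row in rows:
        table = row.get("table")
        extra = row.get("Extra") or ""
        # type = ALL means the table is read sequentially, without an index.
        if row.get("type") == "ALL":
            found.append((table, "full table scan (type=ALL)"))
        # Both flags tend to show up with GROUP BY / ORDER BY on large results.
        for flag in ("Using filesort", "Using temporary"):
            if flag in extra:
                found.append((table, flag))
    return found


# Hypothetical plan for a fact/dimension join.
sample_plan = [
    {"table": "dim_date", "type": "ref", "Extra": "Using where"},
    {"table": "fact_sales", "type": "ALL", "Extra": "Using temporary; Using filesort"},
]

for table, warning in explain_warnings(sample_plan):
    print(f"{table}: {warning}")
```

A generator that sees such warnings could fall back to a multipass plan with temp tables instead of one complex statement.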
Server Tuning
Thread Buffers
• join_buffer_size
• read_buffer_size
• read_rnd_buffer_size
• sort_buffer_size
• For large resultsets and for a high number of concurrent users, they should be set individually or by role
Temporary Tables
• tmp_table_size
• max_heap_table_size
• Implicit tmp tables can be tricky to control
  • Store intermediate results
  • Connect > Query > Disconnect
Query Cache
• SELECT SQL_NO_CACHE ...
• query_cache_type
• query_cache_limit
• query_cache_size
• No time functions (queries that call functions such as NOW() are not cached)
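Setting the thread buffers "by role" can be done per connection with SET SESSION, since these four variables have session scope. The sketch below only builds the statements; the role names and byte values are hypothetical and would need tuning against a real workload.

```python
MB = 1024 ** 2

# Hypothetical roles and values (bytes); not recommendations.
ROLE_BUFFERS = {
    "dashboard": {"sort_buffer_size": 2 * MB, "join_buffer_size": 1 * MB},
    "power_user": {
        "sort_buffer_size": 64 * MB,
        "join_buffer_size": 32 * MB,
        "read_rnd_buffer_size": 16 * MB,
    },
}

# The thread buffers named on the slide.
THREAD_BUFFERS = {
    "join_buffer_size",
    "read_buffer_size",
    "read_rnd_buffer_size",
    "sort_buffer_size",
}


def session_statements(role, role_buffers=ROLE_BUFFERS):
    """Build the SET SESSION statements to run right after connecting as `role`."""
    settings = role_buffers.get(role, {})
    unknown = set(settings) - THREAD_BUFFERS
    if unknown:
        raise ValueError(f"not a thread buffer: {sorted(unknown)}")
    return [f"SET SESSION {name} = {int(value)}" for name, value in sorted(settings.items())]


for statement in session_statements("power_user"):
    print(statement)
```

Roles without an entry get no statements and keep the server defaults, which keeps memory use predictable when many light users connect at once.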
Modelling
[Diagram: fact tables (PK, dimension keys, metrics) linked to dimension tables (key, description); dimensions shown both flat and as hierarchies of keys; a wide fact table split into multiple fact tables that share the PK]
• Multidimensional, but with care
• Snowflake vs Star Schema
  > Do not denormalise descriptions
  > Multiple fact tables with 1:1 relationships
• Queries
  > Query on Dimension N > Temp Table
  > Query on Fact 1 > Temp Table
  > Query on Fact 2 Join Temp Table
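The three query passes above can be sketched end to end. To keep the example runnable without a server, it uses SQLite from Python's standard library as a stand-in for MySQL; the table names and data are hypothetical, and on MySQL the temp tables would be created with CREATE TEMPORARY TABLE ... SELECT instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales   (product_key INTEGER, amount REAL);
    CREATE TABLE fact_returns (product_key INTEGER, amount REAL);
    INSERT INTO dim_product  VALUES (1, 'books'), (2, 'books'), (3, 'games');
    INSERT INTO fact_sales   VALUES (1, 10.0), (2, 5.0), (3, 99.0), (1, 2.5);
    INSERT INTO fact_returns VALUES (1, 1.0), (3, 9.0);
""")

# Pass 1: query on the dimension, keeping only the matching keys.
conn.execute("""
    CREATE TEMP TABLE tmp_keys AS
    SELECT product_key FROM dim_product WHERE category = 'books'
""")

# Pass 2: query on fact 1, restricted by those keys, into a second temp table.
conn.execute("""
    CREATE TEMP TABLE tmp_sales AS
    SELECT f.product_key AS product_key, SUM(f.amount) AS sold
    FROM fact_sales f
    JOIN tmp_keys k ON k.product_key = f.product_key
    GROUP BY f.product_key
""")

# Pass 3: query on fact 2, joined to the intermediate result.
rows = conn.execute("""
    SELECT s.product_key, s.sold, COALESCE(SUM(r.amount), 0) AS returned
    FROM tmp_sales s
    LEFT JOIN fact_returns r ON r.product_key = s.product_key
    GROUP BY s.product_key, s.sold
    ORDER BY s.product_key
""").fetchall()

for row in rows:
    print(row)
```

Each pass is a simple statement that touches one large table, which is the point of the technique: no single query has to join both fact tables and the dimension at once.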
Storage Engines
MyISAM
• Compressed Tables
• Use different spindles for data and indexes
• Fast inserts - Insert already sorted data (when possible)
• Key Buffers
  • Multiple Key Buffers
  • SET GLOBAL <key_cache_name>.key_buffer_size...
  • CACHE INDEX ... IN ...
  • key_cache_block_size
  • bulk_insert_buffer_size
• Spatial and Fulltext indexes
• All active shared disk cluster
InnoDB
• innodb_file_per_table
• innodb_flush_log_at_trx_commit
• innodb_buffer_pool_size
• The new InnoDB plugin
  • Fast index creation
  • Data compression
• Do not use FK or constraints
CSV
• Good ETL trick
• No partitioning, no indexing, no nulls
Archive
• Data compression and fast retrieval
• INSERT & SELECT
• No index (autoincrement only)
Federated
• Limited indexing
• Tips:
  • Queries can be executed on multiple servers + result collection
  • Use of stored procedures to consolidate results and control the access to the FEDERATED tables
Replication
• [For some] The easiest way to provide real time data marts
• Tips:
  > Delayed replication
  > Rotating servers
  > Support for more power users
[Diagram: a source master feeding rotating slaves that alternate between updating and querying, with BI/report servers reading from them; and a source master feeding delayed slaves at Real Time, -10 Min, -30 Min, -1 Hour, -12 Hours and Yesterday]
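With a set of delayed replicas, something has to decide which server a report should read from. A minimal sketch of one possible policy, assuming each report declares how stale its data may be; the replica names are hypothetical, and the delays loosely follow the ones in the diagram.

```python
# Hypothetical pool of replicas: name -> replication delay in minutes.
REPLICAS = {"rt": 0, "minus10": 10, "minus30": 30, "minus60": 60, "minus720": 720}


def pick_replica(max_staleness_min, replicas=REPLICAS):
    """Pick the most delayed replica that still fits the staleness budget.

    Preferring the most delayed acceptable server is an assumption of this
    sketch: it keeps load off the real-time copy, which is then left to the
    queries that need fresh data.
    """
    eligible = {name: delay for name, delay in replicas.items() if delay <= max_staleness_min}
    if not eligible:
        raise LookupError("no replica is fresh enough")
    return max(eligible, key=eligible.get)


print(pick_replica(0))    # operational dashboard: needs the real-time copy
print(pick_replica(45))   # report that tolerates data up to 45 minutes old
```

A stable, slightly old snapshot also gives long-running reports consistent numbers while the real-time copy keeps changing.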
Sharding
• Sharding
  > Great to distribute the workload
  > Fantastic if the queries can be executed in parallel thanks to a middle or client layer
  > Tips:
    – Replicate the dimensions
    – Specialise shards on facts
    – Partition facts on shards
[Diagram: BI/report servers (read and write paths), a dimensions master, and shards A1, A2, B, C1, C2, D]
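The parallel execution "thanks to a middle or client layer" amounts to scatter-gather: send the same aggregate to every shard, then merge the partial results. The sketch below uses in-memory lists as stand-ins for MySQL shards and a thread pool as the client layer; the shard contents and the region dimension are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# The dimension is replicated to every shard; the facts are partitioned across shards.
DIM_REGION = {1: "north", 2: "south"}
SHARDS = {
    "A1": [(1, 100.0), (2, 40.0)],  # (region_key, amount) fact rows
    "A2": [(1, 25.0)],
    "B": [(2, 10.0), (2, 5.0)],
}


def query_shard(facts):
    """Per-shard step: join local facts to the local dimension copy and aggregate."""
    totals = {}
    for region_key, amount in facts:
        name = DIM_REGION[region_key]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


def scatter_gather(shards=SHARDS):
    """Client-layer step: run the per-shard query in parallel, then merge the partials."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(query_shard, shards.values()):
            for name, total in partial.items():
                merged[name] = merged.get(name, 0.0) + total
    return merged


print(scatter_gather())
```

Because every shard holds its own copy of the dimension, the join never crosses servers, and only small partial aggregates travel back to the client layer.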
More Resources Available
• Webinars
  • http://www-it.mysql.com/news-and-events/web-seminars/
• Consulting
  • MySQL Architecture & Design
  • MySQL Performance tuning
  • http://www.mysql.com/consulting/
• Training
  • MySQL 5.1 for developers
  • MySQL 5.1 for DBAs
  • http://www.mysql.com/training/
• White Papers
  • http://www.mysql.com/why-mysql/white-papers/
Data Warehouse Solutions with MySQL
Thank You!
[email protected]
http://izoratti.blogspot.com