Making MySQL Great for Business Intelligence
Robin Schumacher, VP Products, Calpont
2010 Calpont Corporation – Confidential & Proprietary
Jan 26, 2015
Agenda
• Quick overview of BI
• Looking at the right technology foundation
• General physical MySQL design decisions that impact success
• A look at row vs. column MySQL databases
• Conclusions
A Quick Overview of Business Intelligence
What is Business Intelligence?
Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making.
BI technologies provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies are reporting, online analytical processing, analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics.
Why Business Intelligence?
• All companies now recognize the need for BI
• Information is a weapon that both large and small companies use to better understand their customers, competitors, and marketplace
• Making poorly informed decisions can be disastrous
Overview of Most BI Frameworks
[Diagram: operational source data (OLTP, files/XML, log files) flows through ETL into a staging area or ODS, then through a final ETL step into the data warehouse, which is purged/archived to a warehouse archive. A reporting, BI, and notification layer delivers ad-hoc queries, dashboards, reports, and notifications to users. Data warehouse and metadata management spans the whole framework.]
Simple Reporting Databases
[Diagram: end users and application servers work against an OLTP database; replication, ETL, and a data archiving link feed a read shard and a reporting database.]
Building the Right Technical Foundation
What is the Key Component for Success?
In other words, what you do with your MySQL server (in terms of physical design, schema design, and performance design) will be the biggest factor in whether a BI system hits the mark…
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
What Technology Decisions are Being Made?
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
What General MySQL Design Decisions Help Success?
First – Get/Use a Modeling Tool
Horizontal Partitioning Model
Read Sharding / Horizontal Partitioning
Vertical Partitioning Model
General List of Top BI Design Decisions
• Storage Engine Selection
• Physical Table/Index Partitioning
• Index Creation and Placement
• Proper sizing of memory caches, etc.
• Row vs. Column Engine / Database
Core BI Features for MySQL
• No practical storage limits (1 tablespace = 110TB)
• Automatic storage management
• ANSI-SQL support for all datatypes (including BLOB and XML)
• Data/index partitioning (range, hash, key, list, composite)
• Built-in replication
• Main memory tables (for dimension tables)
• Variety of indexes (b-tree, fulltext, clustered, hash, GIS)
• Multiple configurable data/index caches
• Pre-loading of index data into index caches
• Unique query cache (caches result set + query, not just data)
• Parallel data load (5.1 and higher, multiple files)
• Multi-insert DML
• Data compression (depends on engine)
• Read-only tables
• Fast connection pooling
• Cost-based optimizer
• Wide platform support
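The query cache item above caches a finished result set keyed on the query text, rather than caching data pages. MySQL's query cache of this era reused a result only for byte-identical query text and discarded cached results whenever a table they read was modified. The minimal Python sketch below illustrates that idea only; the `QueryCache` class and its `fetch`/`invalidate` methods are hypothetical names, not MySQL's implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[object, ...]


class QueryCache:
    """Hypothetical result-set cache keyed on exact query text."""

    def __init__(self) -> None:
        # query text -> (cached rows, tables the query read)
        self._entries: Dict[str, Tuple[List[Row], Set[str]]] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, sql: str, tables: Set[str],
              run: Callable[[], List[Row]]) -> List[Row]:
        # Only a byte-identical query string can reuse a cached result.
        if sql in self._entries:
            self.hits += 1
            return self._entries[sql][0]
        self.misses += 1
        rows = run()  # execute the query for real
        self._entries[sql] = (rows, set(tables))
        return rows

    def invalidate(self, table: str) -> None:
        # A write to a table drops every cached result that read it.
        stale = [q for q, (_, used) in self._entries.items() if table in used]
        for q in stale:
            del self._entries[q]


sales = [("east", 10), ("west", 5)]
cache = QueryCache()
q = "SELECT region, amt FROM sales"
cache.fetch(q, {"sales"}, lambda: list(sales))  # miss: runs the query
cache.fetch(q, {"sales"}, lambda: list(sales))  # hit: served from cache
cache.invalidate("sales")                       # e.g. after an INSERT
cache.fetch(q, {"sales"}, lambda: list(sales))  # miss again
print(cache.hits, cache.misses)  # 1 2
```

This is why a result cache suits read-mostly reporting tables: frequent loads into a table keep flushing every cached query that touches it.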
Storage Engines Internal to MySQL
MyISAM
• High-speed query/insert engine
• Non-transactional, table locking
• Good for data marts, small warehouses
Archive
• Compresses data by up to 80%
• Fastest for data loads
• Only allows inserts/selects
• Good for seldom-accessed data
Memory
• Main memory tables
• Good for small dimension tables
• B-tree and hash indexes
CSV
• Comma separated values
• Allows both flat file access and editing as well as SQL query/DML
• Allows instantaneous data loads
Also: Merge for pre-5.1 partitioning
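The ARCHIVE engine compresses rows with zlib as they are inserted, and the ratio depends entirely on the data; "up to 80%" is a best case. This hedged Python sketch only shows why repetitive, warehouse-style rows compress well, using the standard `zlib` module on made-up CSV-style rows. It does not reproduce the engine's on-disk format.

```python
import zlib

# Made-up, repetitive fact rows, similar to what a sales data mart might hold.
rows = [
    f"{i},2010-01-{(i % 28) + 1:02d},store_{i % 10},product_{i % 50},19.99\n"
    for i in range(10_000)
]
raw = "".join(rows).encode("utf-8")

# Compress the whole batch as one stream, the way a bulk load would.
packed = zlib.compress(raw, 6)
saving = 1 - len(packed) / len(raw)

print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, saved {saving:.0%}")

# Compression is lossless: decompressing returns the original bytes.
assert zlib.decompress(packed) == raw
```

Try it with your own exported rows: highly repetitive dimension keys and dates compress far better than unique text or already-compressed BLOBs.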
Partitioning and Performance (5.1+)
mysql> CREATE TABLE part_tab
-> ( c1 int ,c2 varchar(30) ,c3 date )
-> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
-> PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) ,
-> PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) ,
-> PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) ,
-> PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) ,
-> PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010),
-> PARTITION p11 VALUES LESS THAN MAXVALUE );
mysql> create table no_part_tab (c1 int,c2 varchar(30),c3 date);
*** Load 8 million rows of data into each table ***
mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (38.30 sec)
mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (3.88 sec)
90% Response Time Reduction
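The speedup comes from partition pruning: because `part_tab` is range-partitioned on `year(c3)`, the optimizer can tell that a 1995-only predicate can match rows in just one partition and skip the other eleven. The Python sketch below mimics that decision using the same boundaries as the DDL above; the `prune` helper and the year-level simplification are assumptions for illustration, not MySQL's optimizer code. In MySQL itself, `EXPLAIN PARTITIONS SELECT ...` (plain `EXPLAIN` from 5.7 on) lists the partitions a query will read.

```python
from bisect import bisect_right
from typing import List

# Exclusive upper bounds of p0..p10 from the CREATE TABLE above;
# p11 (VALUES LESS THAN MAXVALUE) catches everything else.
BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]


def partition_for_year(year: int) -> int:
    """Index of the RANGE partition that holds rows for this year(c3)."""
    # bisect_right returns the first position whose bound is > year,
    # i.e. the first partition satisfying year < bound.
    return bisect_right(BOUNDS, year)


def prune(lo_year: int, hi_year: int) -> List[str]:
    """Partitions a scan must read for year(c3) between lo_year and hi_year."""
    first = partition_for_year(lo_year)
    last = partition_for_year(hi_year)
    return [f"p{i}" for i in range(first, last + 1)]


# WHERE c3 > '1995-01-01' AND c3 < '1995-12-31' only spans 1995.
print(prune(1995, 1995))  # ['p1']
print(prune(2005, 2012))  # ['p10', 'p11']
```

Pruning only helps when the WHERE clause constrains the partitioning column; a query filtering on `c1` alone would still scan all twelve partitions.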
Index Creation and Placement
• If query patterns are known and predictable, and data is relatively static, then indexing isn’t that difficult
• In a highly ad-hoc environment, indexing becomes more difficult: you must analyze SQL traffic and index as best you can
• Over-indexing a table that is frequently loaded / refreshed / updated can severely impact load and DML performance. Test dropping and re-creating indexes vs. doing in-place loads and DML. Realize, though, that queries run while the indexes are dropped will be affected
• Index maintenance (rebuilds, etc.) can cause issues in MySQL (locking, etc.)
• Remember that some storage engines don’t support normal indexes (Archive, CSV)
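A minimal sketch of the drop, load, re-create comparison from the third bullet. Because MySQL cannot be embedded in a self-contained snippet, it uses Python's built-in `sqlite3` module as a stand-in; the `sales` table, the `ix_sales_store` index, and the batch size are hypothetical, and timings on MySQL will differ by storage engine, index count, and data volume. On MySQL the same job would use `DROP INDEX`, a bulk load such as `LOAD DATA INFILE`, and `CREATE INDEX`.

```python
import sqlite3
import time
from typing import List, Tuple


def fresh_db() -> sqlite3.Connection:
    """In-memory stand-in for a fact table with one secondary index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
    conn.execute("CREATE INDEX ix_sales_store ON sales (store_id)")
    return conn


def bulk_load(conn: sqlite3.Connection, rows: List[Tuple[int, float]],
              drop_index_first: bool) -> float:
    """Load rows, optionally dropping and re-creating the index; return seconds."""
    start = time.perf_counter()
    if drop_index_first:
        # Without the index, each insert skips the index update.
        conn.execute("DROP INDEX IF EXISTS ix_sales_store")
    conn.executemany("INSERT INTO sales (store_id, amount) VALUES (?, ?)", rows)
    if drop_index_first:
        # Rebuild once over the loaded data. Queries issued before this
        # point would have had no index on store_id to use.
        conn.execute("CREATE INDEX ix_sales_store ON sales (store_id)")
    conn.commit()
    return time.perf_counter() - start


# Hypothetical refresh batch: 100,000 rows spread across 1,000 stores.
batch = [((i * 7919) % 1000, i * 0.01) for i in range(100_000)]

results = {}
for drop in (False, True):
    db = fresh_db()
    secs = bulk_load(db, batch, drop_index_first=drop)
    rows_loaded = db.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    results[drop] = rows_loaded
    print(f"drop_index_first={drop}: {rows_loaded} rows in {secs:.3f}s")
```

Run both variants against a realistic batch before choosing; the timing difference is what you weigh against the window in which queries run without the index.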
Row vs. Column Engines / Databases
Column vs. Row Orientation
A column-oriented architecture looks the same on the surface, but stores data differently than legacy/row-based databases…
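To make the difference concrete, here is a hedged Python sketch that stores the same made-up `orders` table once as a list of rows and once as one list per column, then counts how many stored values a two-column query must touch in each layout. It ignores compression, block sizes, and caching, so it illustrates the access pattern only, not real I/O costs.

```python
from typing import Dict, List, Tuple

COLUMNS = ["order_id", "customer", "region", "product", "qty", "price"]

# Row layout: all of a record's values are stored together.
rows: List[Tuple] = [
    (i, f"cust_{i % 100}", ("east", "west")[i % 2], f"prod_{i % 20}", i % 5 + 1, 9.5)
    for i in range(1_000)
]

# Column layout: all of a column's values are stored together.
columns: Dict[str, list] = {
    name: [row[pos] for row in rows] for pos, name in enumerate(COLUMNS)
}


def values_read_row_store(needed: List[str]) -> int:
    # Whole rows are read, so every column comes along regardless of `needed`.
    return len(rows) * len(COLUMNS)


def values_read_column_store(needed: List[str]) -> int:
    # Only the columns the query names are read.
    return sum(len(columns[name]) for name in needed)


# SELECT region, SUM(qty) FROM orders GROUP BY region
needed = ["region", "qty"]
print(values_read_row_store(needed))     # 6000
print(values_read_column_store(needed))  # 2000

# Both layouts produce the same answer; only the access pattern differs.
by_row: Dict[str, int] = {}
for r in rows:
    by_row[r[2]] = by_row.get(r[2], 0) + r[4]
by_col: Dict[str, int] = {}
for region, qty in zip(columns["region"], columns["qty"]):
    by_col[region] = by_col.get(region, 0) + qty
assert by_row == by_col
```

Passing all six column names makes the two counts equal, which is exactly the SELECT * case where a column store loses its advantage.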
Why a Column Database?
• Column databases read only the columns needed to satisfy a query, not full rows
• If you are selecting only a subset of columns from a table and/or are using very wide tables, column DBs are a great choice for BI
• Column databases (most of them…) remove the need for indexing because the column is the index
• Column databases automatically eliminate unnecessary I/O both logically and physically, so they also do away with the need for partitioning, materialized views, etc.
• As a rule of thumb, column databases provide 5-10x (or more) the query performance of legacy RDBMSs
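One common way column stores eliminate unnecessary I/O without user-defined indexes or partitions is to keep the minimum and maximum value of each block of a column and skip any block whose range cannot match the predicate (often called zone maps or min/max metadata). The sketch below is a simplified Python illustration of that technique; `BLOCK_SIZE` and the function names are assumptions, and it is not InfiniDB's actual code.

```python
from typing import List, Tuple

BLOCK_SIZE = 1_000  # hypothetical number of values per block


def build_zone_map(column: List[int]) -> List[Tuple[int, int]]:
    """Record (min, max) for each fixed-size block of a column."""
    zones = []
    for start in range(0, len(column), BLOCK_SIZE):
        block = column[start:start + BLOCK_SIZE]
        zones.append((min(block), max(block)))
    return zones


def scan_between(column: List[int], zones: List[Tuple[int, int]],
                 lo: int, hi: int) -> Tuple[int, int]:
    """Count values with lo <= v <= hi; return (matches, blocks_read)."""
    matches = 0
    blocks_read = 0
    for b, (zmin, zmax) in enumerate(zones):
        if zmax < lo or zmin > hi:
            continue  # block cannot contain a match: skip it without reading
        blocks_read += 1
        block = column[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        matches += sum(1 for v in block if lo <= v <= hi)
    return matches, blocks_read


# Data loaded in date order (days since some epoch), as warehouse fact
# tables often are, so each block covers a narrow range of values.
order_day = [i // 100 for i in range(100_000)]
zones = build_zone_map(order_day)

matches, blocks_read = scan_between(order_day, zones, 250, 259)
print(matches, blocks_read, len(zones))  # 1000 1 100
```

The skipping only pays off when values are clustered by load order, as date-keyed fact data usually is; on randomly ordered data nearly every block's range overlaps the predicate.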
Why a Column Database?
"If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns. When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."
- Donald Feinberg, Gartner Group
Why Not a Column Database?
• If you routinely have SELECT * queries or queries that request the majority of columns in a table
• If you constantly perform lots of singleton inserts and deletes. These are row-based operations, so they normally run somewhat slower on a column DB than on a row-oriented DB (more block touches are needed). Updates tend to run fine, as they are column operations
• If you want to do pure OLTP work. Some column DBs are transactional (so data integrity is ensured), but they are not suited for straight OLTP work
• If you have a small database: at that size, the benefit column databases offer over row DBs largely disappears
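The "more block touches" point in the second bullet is simple arithmetic: a row store writes a new row into one block, while a column store must append one value to a block in every column. This hedged Python sketch counts block touches under that simplified model (one open block per table or per column, no index maintenance, and no write buffering, which real engines use to soften the cost); the numbers are illustrative, not measurements.

```python
def blocks_touched_per_insert(num_columns: int, column_store: bool) -> int:
    """Blocks written to insert one row, under the simplified model above."""
    # Row store: the whole row lands in a single data block.
    # Column store: each column keeps its values in its own blocks.
    return num_columns if column_store else 1


def blocks_touched_per_update(columns_changed: int, column_store: bool) -> int:
    """Blocks written to update some columns of one existing row."""
    # A column store rewrites only the changed columns' blocks, while a
    # row store rewrites the single block holding the row.
    return columns_changed if column_store else 1


for label, is_col in (("row store", False), ("column store", True)):
    print(label,
          "insert:", blocks_touched_per_insert(20, is_col),
          "update 1 col:", blocks_touched_per_update(1, is_col))
```

For a hypothetical 20-column table, a singleton insert costs 20 block touches in the column store versus 1 in the row store, while a one-column update costs 1 in both, matching the bullet's point that updates hold up better than inserts and deletes.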
What is Calpont’s InfiniDB?
InfiniDB is an open source, column-oriented database architected to handle data warehouses, data marts, analytic/BI systems, and other read-intensive applications. It delivers true scale up (more CPUs/cores, RAM) and massively parallel processing (MPP) scale out capabilities for MySQL users. Linear performance gains are achieved either by adding more capacity to one box or by using commodity machines in a scale-out configuration.
[Figure: scale up vs. scale out]
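Scale-out (MPP) query execution generally works by having each node or core aggregate its own slice of the data and then merging the small partial results. The hedged Python sketch below shows that split, aggregate, merge pattern on one machine with `concurrent.futures`; the round-robin slicing and the worker count are assumptions for illustration and say nothing about how InfiniDB distributes work internally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[str, int]  # (region, amount): a made-up fact row


def partial_sum(rows: List[Row]) -> Counter:
    """Aggregate one slice of the data, as a single node/core would."""
    totals: Counter = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def parallel_group_sum(rows: List[Row], workers: int = 4) -> Counter:
    """SELECT region, SUM(amount) ... GROUP BY region, computed in slices."""
    # Deal the rows round-robin into one slice per worker.
    slices = [rows[i::workers] for i in range(workers)]
    merged: Counter = Counter()
    # Threads stand in for nodes here; in CPython they will not speed up
    # pure-Python work, but the split/aggregate/merge shape is the same.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(partial_sum, slices):
            merged.update(part)  # merging small partial results is cheap
    return merged


data = [(("east", "west", "north")[i % 3], i % 10) for i in range(30_000)]
result = parallel_group_sum(data)
assert result == partial_sum(data)  # same answer as a single serial scan
print(dict(result))
```

The design choice that makes this scale is that only the tiny per-group partials cross between workers; for real CPU parallelism, swap the threads for processes or separate machines.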
InfiniDB vs. a Leading Row RDBMS
Test setup: 2 TB of raw data; 16 CPUs, 16 GB RAM, 14 SAS 15K RPM drives in RAID-0, 512 MB cache
Percona’s Test of Column Databases
610 GB of raw data; 8-core machine
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
Calpont Solutions
Calpont Analytic Database Server Editions
• InfiniDB Community Server: column-oriented, multi-threaded, terabyte capable, single server
• InfiniDB Enterprise Server: scale out / parallel processing, automatic failover
Calpont Analytic Database Solutions
• InfiniDB Enterprise Solution: monitoring, 24x7 support, auto patch management, alerts & SNMP notifications, hot fix builds, consultative help
InfiniDB Community & Enterprise Server Comparison
Core Database Server Features | InfiniDB Community | InfiniDB Enterprise
MySQL front end | Yes | Yes
Column-oriented | Yes | Yes
Logical data compression | Yes | Yes
High-speed bulk loader (queries are not blocked while loading) | Yes | Yes
Crash recovery | Yes | Yes
Transaction support (ACID compliant) | Yes | Yes
INSERT/UPDATE/DELETE (DML) support | Yes | Yes
Multi-threaded engine (queries/writes use all CPUs/cores on the box) | Yes | Yes
No indexing necessary | Yes | Yes
Automatic vertical (column) and logical horizontal partitioning of data | Yes | Yes
MVCC support: snapshot read (readers don’t block writers) | Yes | Yes
ALTER TABLE with online add-column capability | Yes | Yes
High concurrency supported | Yes | Yes
Terabyte database capable | Yes | Yes
Multi-node MPP scale-out capable with failover | No | Yes
Support | Forums only | Formal production support
For More Information
• Download InfiniDB Community Edition
• Download InfiniDB documentation
• Read InfiniDB technical white papers
• Read InfiniDB intro articles on MySQL dev zone
• Visit InfiniDB online forums
• Trial the InfiniDB Enterprise Edition: http://www.calpont.com
www.infinidb.org
www.calpont.com