Open Source Data Warehousing
Los Angeles MySQL Meetup
Date: November 18th, 2009
Infobright Presentation
• Introductions: Carl Gelbart, Sr. Systems Engineer
• What is Infobright
  • Database engine for MySQL
  • Optimized for analytical queries
• Who is Infobright
  • Founded in 2005
  • Commercial Open Source in 2008
• What’s new since I was here last
• Infobright Architecture
Infobright
• Cool Vendor in Data Management and Integration 2009
• 2008 High Performance Data Warehousing Partner of the Year
• Infobright: Economic Data Warehouse Choice
• Startup to Watch: Technology Innovation
First commercial open source database geared towards analytics
• Community & Enterprise Editions
• Lowest cost technology in the market
• Simplest and easiest to build and manage
• Powerful & scalable MySQL Technology unlike ANYTHING in the industry

Strong Momentum & Mature Product
• Release 3.2.2 generally available
• > 100 Enterprise Customers in 10 Countries
• > 40 Partners on 6 continents
• A vibrant open source community
  • +1 million visitors
  • Over 20,000 downloads
  • 3,000 active community participants
Partner of the Year 2009
What’s new since I was here last
• Complete SQL query coverage
  • All functions are now supported
  • Math inside a function is now supported
• Better Performance
  • Join performance improvements
  • Insert speed improvements
• Loader improvements
• MicroStrategy Certification
• Virtual Machines for ICE
• Infobright & Pentaho
• Infobright & BIRT
• Infobright & Jaspersoft
• Platform Support Added
  • Linux (SUSE 10, RHEL 5 / CentOS 5, Debian)
  • Windows (32 & 64 bit)
  • Solaris 10 (IEE only)
• High Availability
  • Currently active/passive (active/active in roadmap)
  • Uses your choice of proxy & monitoring
    • LVS, MySQL Proxy, NAT, Port Forwarding
    • Ldirectord, Heartbeat, keepalived, OpenAIS
  • Uses shared storage (SAN or NAS)
  • Clients have to reconnect after a failure
  • In-flight transactions are rolled back in a failure
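Because clients must reconnect after a failover and in-flight transactions are rolled back, client code usually needs to re-run the whole unit of work on a fresh connection. The following is a minimal sketch of that pattern, not Infobright's or MySQL's API: `run_with_reconnect`, `connect`, and `work` are hypothetical names, and `ConnectionError` stands in for whatever error class your MySQL driver raises.

```python
import time


def run_with_reconnect(connect, work, retries=3, delay_seconds=1.0):
    """Run work(connection), reconnecting and restarting it after a failure.

    connect and work are hypothetical callables supplied by the caller.
    ConnectionError is a stand-in for the real driver's error class.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            connection = connect()   # after failover this reaches the standby
            return work(connection)  # the whole transaction is re-run
        except ConnectionError as error:
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay_seconds)  # give the proxy time to switch
    raise last_error
```

The key design choice is that `work` contains the complete transaction, so nothing is lost when a rolled-back attempt is repeated.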
What’s coming soon!
Infobright 3.3 (December 10th)
• UTF8 – phase 1
• Improved DML speed
  • Continued improvements to insert speed
  • Improvements for update & delete
• Alter table
• Improved join performance
• Data Integrity Manager
Infobright Architecture
Infobright and MySQL
Integration provides a simple path to highly scalable data warehousing for MySQL users
No new management interface to learn
MySQL integration enables seamless connectivity to BI tools and MySQL drivers for C, JDBC, ODBC, .NET, Perl, etc.
Infobright is architected on MySQL, backed by strong partnership with MySQL and SUN
Infobright Technology
Smarter architecture
• Load data and go
• No indices or partitions to build and maintain
• Knowledge Grid automatically updated as data packs are created or updated
• Super-compact data footprint can leverage off-the-shelf hardware
Data Packs: data stored in manageably sized, highly compressed data packs
• Data compressed using algorithms tailored to data type
Knowledge Grid: statistics and metadata “describing” the super-compressed data
Column Orientation
Column vs. Row-Oriented
EMP_ID  FNAME  LNAME   SALARY
1       Moe    Howard  10000
2       Curly  Joe     12000
3       Larry  Fine    9000

Row Oriented (1,Moe,Howard,10000; 2,Curly,Joe,12000; 3,Larry,Fine,9000;)
• Works well if all the columns are needed for every query.
• Efficient for transactional processing if all the data for the row is available

Column Oriented (1,2,3; Moe,Curly,Larry; Howard,Joe,Fine; 10000,12000,9000;)
• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
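To make the two layouts concrete, here is a small sketch using the three-employee table from the slide. It only illustrates the access pattern; it is not how Infobright stores data internally, and the function names are made up for this example.

```python
# The three-employee table from the slide, stored row by row.
COLUMNS = ("EMP_ID", "FNAME", "LNAME", "SALARY")
rows = [
    (1, "Moe", "Howard", 10000),
    (2, "Curly", "Joe", 12000),
    (3, "Larry", "Fine", 9000),
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def avg_salary_row_store(rows):
    """Every full row is visited even though only SALARY is needed."""
    return sum(row[3] for row in rows) / len(rows)


def avg_salary_column_store(columns):
    """Only the SALARY column is touched."""
    salaries = columns["SALARY"]
    return sum(salaries) / len(salaries)
```

Both functions return the same average; the difference is how much data each one has to read to get there, which is what matters for aggregate queries over wide tables.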
Data Packs and Compression
Data Packs
• Each data pack contains 65,536 (64K) data values
• Compression is applied to each individual data pack
• The compression algorithm varies depending on data type and distribution

Compression
• Results vary depending on the distribution of data among data packs
• A typical overall compression ratio seen in the field is 10:1
• Some customers have seen results as high as 40:1
• Patent-pending compression algorithms
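The idea of compressing each pack on its own, and of the ratio depending on the data, can be sketched as follows. This is an illustration only: `zlib` stands in for Infobright's own patent-pending algorithms, values are assumed to be 64-bit integers, and the helper names are hypothetical.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Cut one column into consecutive packs of at most pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Serialize a pack as little-endian 64-bit integers and compress it.

    zlib is a stand-in here, not the algorithm Infobright actually uses.
    """
    raw = struct.pack(f"<{len(pack)}q", *pack)
    return raw, zlib.compress(raw)


def compression_ratio(values):
    """Overall raw-to-compressed size ratio across all packs of a column."""
    raw_total = 0
    compressed_total = 0
    for pack in split_into_packs(values):
        raw, compressed = compress_pack(pack)
        raw_total += len(raw)
        compressed_total += len(compressed)
    return raw_total / compressed_total
```

Calling `compression_ratio` on a column of repeated values and on a column of random values shows the spread the slide describes: the more regular the data inside each pack, the higher the ratio.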
Knowledge Grid
This metadata layer = 1% of the compressed volume
• Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information
• Character Maps (CMAPs): Every Data Pack that contains text creates a matrix that records the occurrence of every possible ASCII character
• Histograms: A histogram is created for every Data Pack that contains numeric data, dividing its MIN-MAX range into 1024 intervals
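A rough sketch of what these three kinds of metadata could look like for a single pack is shown below. The details are assumptions made for illustration, not Infobright's actual format: the DPN fields (min, max, sum, count), one boolean flag per histogram interval, and a character map indexed by character position (limited here to the first 64 positions) are all guesses at a plausible layout.

```python
from dataclasses import dataclass

HISTOGRAM_INTERVALS = 1024  # from the slide
CMAP_POSITIONS = 64         # assumed limit, not stated in the slides
ASCII_SIZE = 128


@dataclass
class DataPackNode:
    """Basic statistics for one numeric data pack (field choice is assumed)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_histogram(pack, intervals=HISTOGRAM_INTERVALS):
    """Flag which of the equal-width MIN-MAX intervals contain a value."""
    lo, hi = min(pack), max(pack)
    width = (hi - lo) / intervals or 1  # constant packs get a dummy width
    flags = [False] * intervals
    for value in pack:
        index = min(int((value - lo) / width), intervals - 1)
        flags[index] = True
    return flags


def build_cmap(strings, positions=CMAP_POSITIONS):
    """cmap[position][code] is True if that ASCII code occurs at position."""
    cmap = [[False] * ASCII_SIZE for _ in range(positions)]
    for text in strings:
        for position, char in enumerate(text[:positions]):
            code = ord(char)
            if code < ASCII_SIZE:
                cmap[position][code] = True
    return cmap
```

With metadata like this, a predicate such as `job = 'Shipping'` can rule out a pack without decompressing it if, for example, no value in the pack has "S" in the first position.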
A Simple Query using the Knowledge Grid
SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'TORONTO';

[Figure: one data pack per column (salary, age, job, city) for each group of rows: 1 to 65,536; 65,537 to 131,072; 131,073 to ……. Each pack is marked Completely Irrelevant, Suspect, or All values match.]

1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = 'Shipping'
4. Find the Data Packs that have city = 'TORONTO'
5. Now we eliminate all rows that have been flagged as irrelevant. (All packs in those rows are ignored.)
6. Finally we have identified the data pack that needs to be decompressed. (Only this pack will be decompressed.)
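The three pack states in the walk-through (Completely Irrelevant, Suspect, All values match) can be derived from per-pack minimum and maximum values alone. The sketch below does this for the two numeric predicates of the query; the text predicates on job and city are left out for brevity, the (min, max) figures are made-up sample data, and all names are hypothetical rather than Infobright's.

```python
from enum import Enum


class PackStatus(Enum):
    IRRELEVANT = "completely irrelevant"
    SUSPECT = "suspect"
    ALL_MATCH = "all values match"


def classify_greater_than(pack_min, pack_max, threshold):
    """Classify one pack for the predicate value > threshold."""
    if pack_max <= threshold:
        return PackStatus.IRRELEVANT
    if pack_min > threshold:
        return PackStatus.ALL_MATCH
    return PackStatus.SUSPECT


def classify_less_than(pack_min, pack_max, threshold):
    """Classify one pack for the predicate value < threshold."""
    if pack_min >= threshold:
        return PackStatus.IRRELEVANT
    if pack_max < threshold:
        return PackStatus.ALL_MATCH
    return PackStatus.SUSPECT


def combine(statuses):
    """AND the per-column results for one group of 65,536 rows."""
    statuses = list(statuses)
    if PackStatus.IRRELEVANT in statuses:
        return PackStatus.IRRELEVANT
    if all(status is PackStatus.ALL_MATCH for status in statuses):
        return PackStatus.ALL_MATCH
    return PackStatus.SUSPECT


# Made-up (min, max) statistics for three groups of rows.
pack_rows = [
    {"salary": (20000, 45000), "age": (21, 60)},
    {"salary": (30000, 90000), "age": (25, 70)},
    {"salary": (55000, 95000), "age": (30, 64)},
]


def classify_row(row):
    return combine([
        classify_greater_than(*row["salary"], 50000),
        classify_less_than(*row["age"], 65),
    ])


statuses = [classify_row(row) for row in pack_rows]
```

Only groups classified as SUSPECT need their packs decompressed. For a `count(*)`, a group where all values match can be answered from the stored row count, and an irrelevant group contributes nothing.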
The Infobright Difference
Self-managing – no data partitioning, no index creation or maintenance, Knowledge Grid created automatically
Very low cost – standard commodity servers, minimal administrative costs, simple SMP hardware, much less storage
Scalable, high performance – up to 50TB using single server, very fast data load, superior query performance, industry-leading compression, query and load performance constant as data grows
Very fast time-to-market – fast implementation, no software configuration, no new schema, new queries without any effort, use the BI tools you already have
Community and Enterprise Editions to suit your needs
Common Use Cases
• Log file analytics. Examples: Telecom CDR analysis, Marketing click stream analytics, Financial services trade data analysis
• Data Warehousing: Data Marts, Enterprise Data Warehouse
• Desktop analytic database: Perform ad-hoc analytics on terabytes of data on a desktop or laptop. Create independent sandboxes for high-use users.
• Embedded analytic database: ISV and SaaS vendors embed Infobright as a low cost, small footprint analytic database
Infobright Community Edition
• Column-oriented database ideal for analytics
• Self-managing Knowledge Grid eliminates the need for indexes, data partitioning, specific schemas and manual tuning
• Superior query performance
• Industry-leading data compression (10:1 average, up to 40:1) results in significant reduction of database size, storage needed
• Runs on industry standard Intel and AMD servers
• Supports up to 50TB of data, up to 32 concurrent queries at peak performance (depending on hardware)
• Open source software under GPL v2 License
• Technical assistance, documentation and training services available separately
Infobright Enterprise Edition
Infobright Community Edition product features, plus:
• DML support (INSERT, UPDATE, DELETE)
• Support for temp tables, faster query response
• Fastest data load; up to 300GB/hr concurrent over 4 tables
• Multi-threaded Infobright loader supports text and binary files
• Support for MySQL loader
• Three subscription programs to suit your needs; license, maintenance and support, training discounts, and other benefits
• Product warranty and indemnification
• High availability configuration certification
• Proactive notification of new releases and product information
Comparison of ICE and IEE

Features | ICE | IEE
Technical Support | Forums and/or one-time 4-hr support pack | Silver, Gold, Platinum
Warranty and Indemnification | No | Included
INSERT/UPDATE/DELETE | No | Supported
Infobright Loader | Up to 50 GB/hr | Multi-threaded, up to 300 GB/hr
Data Load Types | Text only | Text, Binary (up to 100% faster)
MySQL Loader | No | Supported
Temp Tables | No | Supported
Platform Support | 64-bit Intel and AMD: RHEL 5, CentOS 5, Debian; 32-bit: Ubuntu 8.04, Fedora 9, Windows XP | 64-bit Intel and AMD: Solaris 10, RHEL 5, CentOS 5, Debian
IEE Annual Subscriptions

Silver
• Hours of service: 9am-5pm EDT (no phone support)
• Response time SLA: Severity 1 = 4 hr, Severity 2 = 8 hr
• Named client contacts: 1
• Support access methods: Web, Email, Self-Service Knowledge Base
• Services: Training and consulting services at published rate
• Two year prepay* charge/TB: $12,950
• One year annual charge/TB: $15,950

Gold
• Hours of service: 8am-6pm (ET for NA, CET Europe)
• Response time SLA: Severity 1 = 2 hr, Severity 2 = 6 hr
• Named client contacts: 3
• Support access methods: Phone, Web, Email, Self-Service Knowledge Base
• Services: Health Check Service included; 10% discount on training and other consulting services; Sev 1 hot fixes
• Two year prepay* charge/TB: $15,950
• One year annual charge/TB: $18,950

Platinum
• Hours of service: 7 x 24 x 365
• Response time SLA: Severity 1 = 1 hr, Severity 2 = 4 hr
• Named client contacts: 5
• Support access methods: Phone, Web, Email, Self-Service Knowledge Base
• Services: Health Check Service included; 20% discount on training and other consulting services; Sev 1 and 2 hot fixes
• Two year prepay* charge/TB: $18,950
• One year annual charge/TB: $21,950

Perpetual
• Perpetual (same as Gold Support Level)
• 18% of purchase price, not to increase by more than 10% per year
• $40,000

* Two year subscription pricing requires two year prepay
Get Started
• Join the forums, learn from the experts
• Sign up for a webinar
• Download a white paper
• Download ICE (Infobright Community Edition)
• Download a free trial of Infobright Enterprise Edition at Infobright.com
www.infobright.com
www.infobright.org