Open Source Data Warehousing Los Angeles MySQL Meetup Date: November 18th, 2009

Los Angeles MySQL Meetupfiles.meetup.com/1310600/Infobright MySQL Meetup Nov 18 2009.pdf · Column vs. Row-Oriented EMP_ID FNAME LNAME SALARY 1 Moe Howard 10000 2 Curly Joe 12000

Aug 13, 2020

Page 1

Open Source Data Warehousing

Los Angeles MySQL Meetup
Date: November 18th, 2009

Page 2

Infobright Presentation

• Introductions
  • Carl Gelbart, Sr. Systems Engineer

• What is Infobright
  • Database engine for MySQL
  • Optimized for analytical queries

• Who is Infobright
  • Founded in 2005
  • Commercial Open Source in 2008

• What’s new since I’ve been here last

• Infobright Architecture

Page 3

Infobright

[Award logos: Cool Vendor in Data Management and Integration 2009; 2008 High Performance Data Warehousing Partner of the Year; Infobright: Economic Data Warehouse Choice; Startup to Watch; Partner of the Year 2009]

• Technology Innovation
  • First commercial open source database geared towards analytics
  • Community & Enterprise Editions
  • Lowest cost technology in the market
  • Simplest and easiest to build and manage
  • Powerful & scalable MySQL
  • Technology unlike ANYTHING in the industry

• Strong Momentum & Mature Product
  • Release 3.2.2 generally available
  • > 100 Enterprise Customers in 10 Countries
  • > 40 Partners on 6 continents
  • A vibrant open source community
    • +1 million visitors
    • Over 20,000 downloads
    • 3,000 active community participants

Page 4

What’s new since I was here last

• Complete SQL query coverage
  • All functions are now supported
  • Math inside a function is now supported

• Better Performance
  • Join performance improvements
  • Insert speed improvements

• Loader improvements
• MicroStrategy Certification
• Virtual Machines for ICE

• Infobright & Pentaho
• Infobright & BIRT
• Infobright & Jaspersoft

Page 5

What’s new since I was here last

• Platform Support Added
  • Linux (SUSE 10, RHEL 5 / CentOS 5, Debian)
  • Windows (32 & 64 bit)
  • Solaris 10 (IEE only)

• High Availability
  • Currently active/passive (active/active in roadmap)
  • Uses your choice of proxy & monitoring
    • LVS, MySQL Proxy, NAT, Port Forwarding
    • Ldirectord, Heartbeat, keepalived, OpenAIS
  • Uses shared storage (SAN or NAS)
  • Clients have to reconnect after a failure
  • In-flight transactions are rolled back in a failure
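Since clients have to reconnect after a failure and in-flight transactions are rolled back, application code typically wraps each unit of work in a reconnect-and-retry loop. The following is a minimal sketch of that idea, not Infobright or MySQL API: `connect`, `work`, `ConnectionLost`, and `run_with_reconnect` are all hypothetical names chosen for illustration.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when the active node goes away."""


def run_with_reconnect(connect, work, retries=3, delay=0.0):
    """Run work(conn) on a fresh connection, retrying the whole unit on failure.

    connect: hypothetical factory that opens a new connection via the proxy
    work:    callable that performs one complete transaction on that connection
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            conn = connect()   # reconnect; the proxy should route to the live node
            return work(conn)  # replay the full transaction from the start
        except ConnectionLost as err:
            last_error = err   # the in-flight work is assumed rolled back
            time.sleep(delay)  # give the passive node time to take over
    raise last_error
```

Retrying the whole transaction, rather than resuming part-way, matches the rollback behavior described on the slide.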

Page 6

What’s coming soon!

Infobright 3.3 (December 10th)

• UTF8 – phase 1
• Improved DML speed
  • Continued improvements to insert speed
  • Improvements for update & delete
• Alter table
• Improved join performance
• Data Integrity Manager

Page 7

Infobright Architecture


Page 8

Infobright and MySQL

• Integration provides a simple path to highly scalable data warehousing for MySQL users

• No new management interface to learn

• MySQL integration enables seamless connectivity to BI tools and MySQL drivers for C, JDBC, ODBC, .NET, Perl, etc.

• Infobright is architected on MySQL, backed by strong partnership with MySQL and Sun

Page 9

Infobright Technology

Smarter architecture

• Load data and go
• No indices or partitions to build and maintain
• Knowledge Grid automatically updated as data packs are created or updated
• Super-compact data footprint can leverage off-the-shelf hardware

Column Orientation

Data Packs – data stored in manageably sized, highly compressed data packs

Data compressed using algorithms tailored to data type

Knowledge Grid – statistics and metadata “describing” the super-compressed data

Page 10

Column vs. Row-Oriented

EMP_ID  FNAME  LNAME   SALARY
1       Moe    Howard  10000
2       Curly  Joe     12000
3       Larry  Fine    9000

Row Oriented (1,Moe,Howard,10000; 2,Curly,Joe,12000; 3,Larry,Fine,9000;)

• Works well if all the columns are needed for every query.
• Efficient for transactional processing if all the data for the row is available

Column Oriented (1,2,3; Moe,Curly,Larry; Howard,Joe,Fine; 10000,12000,9000;)

• Works well with aggregate results (sum, count, avg.)
• Only columns that are relevant need to be touched
• Consistent performance with any database design
• Allows for very efficient compression
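The two layouts can be illustrated with the same three employees. This is only a sketch of the idea using in-memory Python lists; the variable names are illustrative and say nothing about how Infobright lays out bytes on disk.

```python
# The three-row employee table from the slide.
rows = [
    (1, "Moe", "Howard", 10000),
    (2, "Curly", "Joe", 12000),
    (3, "Larry", "Fine", 9000),
]
names = ("EMP_ID", "FNAME", "LNAME", "SALARY")

# Row-oriented storage: the values of one row sit next to each other.
row_store = [value for row in rows for value in row]

# Column-oriented storage: the values of one column sit next to each other.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(names)}

# An aggregate over SALARY reads one contiguous list in the column store.
total_salary = sum(column_store["SALARY"])

# In the row store the same values are scattered, one per row,
# at a stride equal to the number of columns.
width = len(names)
total_salary_rows = sum(row_store[3::width])
```

Both sums give the same answer; the difference is how much unrelated data sits between the values that the query needs.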

Page 11

Data Packs and Compression

[Figure: four data packs, each labeled 64K]

Data Packs

• Each data pack contains 65,536 data values
• Compression is applied to each individual data pack
• The compression algorithm varies depending on data type and distribution

Compression

• Results vary depending on the distribution of data among data packs
• A typical overall compression ratio seen in the field is 10:1
• Some customers have seen results as high as 40:1

Patent Pending Compression Algorithms
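The pack-by-pack approach can be sketched as follows. The pack size of 65,536 values comes from the slide; everything else is an assumption for illustration, in particular the use of `zlib` as a stand-in for Infobright's own (patent-pending, not shown here) algorithms and the fixed 8-byte integer encoding.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per data pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Cut one column into consecutive packs of at most pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Compress a single pack of integers on its own.

    zlib is a stand-in; the real per-type algorithms are not public here.
    """
    raw = struct.pack(f"<{len(pack)}q", *pack)  # 8 bytes per value
    return zlib.compress(raw)


def compression_ratio(values):
    """Return raw bytes divided by compressed bytes for a non-empty column."""
    packs = split_into_packs(values)
    raw_bytes = 8 * len(values)
    packed_bytes = sum(len(compress_pack(p)) for p in packs)
    return raw_bytes / packed_bytes
```

Because each pack is compressed independently, the ratio for a column depends on how the values happen to be distributed across packs, which is consistent with the slide's remark that results vary.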

Page 12

Knowledge Grid

This metadata layer = 1% of the compressed volume

Data Pack Nodes (DPN): A separate DPN is created for every data pack in the database to store basic statistical information.

Character Maps (CMAPs): Every Data Pack that contains text gets a matrix that records the occurrence of every possible ASCII character.

Histograms: A histogram is created for every Data Pack that contains numeric data; each histogram has 1024 MIN-MAX intervals.
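A rough sketch of what these three structures might hold is given below. The slide only says "basic statistical information", so the DPN fields (min, max, sum, count) are an assumption, as are the equal-width histogram intervals and the simplified character map that ignores character position. All function and class names are illustrative.

```python
from dataclasses import dataclass

INTERVALS = 1024  # histogram intervals per numeric pack, from the slide


@dataclass
class DataPackNode:
    """Basic statistics for one data pack (the field choice is an assumption)."""
    minimum: int
    maximum: int
    total: int
    count: int


def build_dpn(pack):
    """Summarize a non-empty numeric pack."""
    return DataPackNode(min(pack), max(pack), sum(pack), len(pack))


def build_histogram(pack, intervals=INTERVALS):
    """Mark which equal-width intervals between MIN and MAX contain a value."""
    lo, hi = min(pack), max(pack)
    span = hi - lo
    present = [False] * intervals
    for value in pack:
        if span == 0:
            index = 0  # all values are equal, so one interval is enough
        else:
            # The maximum would map to `intervals`, so clamp it to the last slot.
            index = min((value - lo) * intervals // span, intervals - 1)
        present[index] = True
    return present


def build_cmap(pack):
    """Record which ASCII characters occur anywhere in a text pack."""
    seen = [False] * 128
    for text in pack:
        for ch in text:
            if ord(ch) < 128:
                seen[ord(ch)] = True
    return seen
```

With structures like these, a query for a value that falls in an empty histogram interval, or a string containing a character the CMAP has never seen, can rule out the pack without decompressing it.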

Page 13

A Simple Query using the Knowledge Grid

SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = 'Shipping' AND city = 'TORONTO';

[Figure: grid of data packs for the columns salary, age, job, and city, over the row ranges 1 to 65,536; 65,537 to 131,072; 131,073 to ……. Legend: Completely Irrelevant, Suspect, All values match. Three row ranges are marked "All packs ignored"; one is marked "Only this pack will be decompressed".]

1.  Find the Data Packs with salary > 50000

2.  Find the Data Packs that contain age < 65

3.  Find the Data Packs that have job = 'Shipping'

4.  Find the Data Packs that have city = 'TORONTO'

5.  Now we eliminate all rows that have been flagged as irrelevant.

6.  Finally we have identified the data pack that needs to be decompressed
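The three pack states in the legend can be derived from per-pack minimum and maximum values alone. The sketch below shows that classification for the two numeric predicates in the query and how the per-column results combine under AND. The statistics in the example are made up, and the function names are illustrative rather than Infobright internals.

```python
from enum import Enum


class PackStatus(Enum):
    IRRELEVANT = "completely irrelevant"  # no row in the pack can match
    SUSPECT = "suspect"                   # some rows might match
    ALL_MATCH = "all values match"        # every row in the pack matches


def classify_greater_than(pack_min, pack_max, threshold):
    """Classify a pack for `value > threshold` from min/max alone."""
    if pack_max <= threshold:
        return PackStatus.IRRELEVANT
    if pack_min > threshold:
        return PackStatus.ALL_MATCH
    return PackStatus.SUSPECT


def classify_less_than(pack_min, pack_max, threshold):
    """Classify a pack for `value < threshold` from min/max alone."""
    if pack_min >= threshold:
        return PackStatus.IRRELEVANT
    if pack_max < threshold:
        return PackStatus.ALL_MATCH
    return PackStatus.SUSPECT


def combine(statuses):
    """AND together the per-column results for one range of rows."""
    if PackStatus.IRRELEVANT in statuses:
        return PackStatus.IRRELEVANT
    if all(s is PackStatus.ALL_MATCH for s in statuses):
        return PackStatus.ALL_MATCH
    return PackStatus.SUSPECT


# Hypothetical (min, max) statistics for three row ranges.
salary_stats = [(20000, 45000), (30000, 90000), (60000, 95000)]
age_stats = [(22, 64), (25, 70), (70, 80)]

result = [
    combine([classify_greater_than(*s, 50000), classify_less_than(*a, 65)])
    for s, a in zip(salary_stats, age_stats)
]
```

In this made-up example only the second row range ends up as suspect, so it is the only one whose packs would need to be decompressed; the other two are ruled out by a single column each.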

Page 14

The Infobright Difference

• Self-managing – no data partitioning, no index creation or maintenance, Knowledge Grid created automatically

• Very low cost – standard commodity servers, minimal administrative costs, simple SMP hardware, much less storage

• Scalable, high performance – up to 50TB using single server, very fast data load, superior query performance, industry-leading compression, query and load performance constant as data grows

• Very fast time-to-market – fast implementation, no software configuration, no new schema, new queries without any effort, use the BI tools you already have

• Community and Enterprise Editions to suit your needs

Page 15

Common Use Cases

• Log file analytics
  • Examples: Telecom CDR analysis, Marketing click stream analytics, Financial services trade data analysis

• Data Warehousing
  • Data Marts, Enterprise Data Warehouse

• Desktop analytic database
  • Perform ad-hoc analytics on terabytes of data on a desktop or laptop. Create independent sandboxes for high use users.

• Embedded analytic database
  • ISV and SaaS vendors embed Infobright as low cost, small footprint analytic database

Page 16

Infobright Community Edition

• Column-oriented database ideal for analytics
• Self-managing Knowledge Grid eliminates the need for indexes, data partitioning, specific schemas and manual tuning
• Superior query performance
• Industry-leading data compression (10:1 average, up to 40:1) results in significant reduction of database size, storage needed
• Runs on industry standard Intel and AMD servers
• Supports up to 50TB of data, up to 32 concurrent queries at peak performance (depending on hardware)
• Open source software under GPL v2 License
• Technical assistance, documentation and training services available separately

Page 17

Infobright Enterprise Edition

Infobright Community Edition product features, plus:

• DML support (INSERT, UPDATE, DELETE)
• Support for temp tables, faster query response
• Fastest data load; up to 300GB/hr concurrent over 4 tables
• Multi-threaded Infobright loader supports text and binary files
• Support for MySQL loader
• Three subscription programs to suit your needs; license, maintenance and support, training discounts, and other benefits
• Product warranty and indemnification
• High availability configuration certification
• Proactive notification of new releases and product information

Page 18

Comparison of ICE and IEE

Features | ICE | IEE
Technical Support | Forums and/or one-time 4-hr support pack | Silver, Gold, Platinum
Warranty and Indemnification | No | Included
INSERT/UPDATE/DELETE | No | Supported
Infobright Loader | Up to 50 GB/hr | Multi-threaded, up to 300GB/hr
Data Load Types | Text only | Text, Binary (up to 100% faster)
MySQL Loader | No | Supported
Temp Tables | No | Supported
Platform Support | 64-bit Intel and AMD: RHEL 5, CentOS 5, Debian; 32-bit: Ubuntu 8.04, Fedora 9, Windows XP | 64-bit Intel and AMD: Solaris 10, RHEL 5, CentOS 5, Debian
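The loader rates above translate into load times with simple division. The sketch below works one example using figures quoted in the talk (up to 50 GB/hr for ICE, up to 300GB/hr for IEE, and the typical 10:1 compression ratio). The 1 TB data size and the convention that 1 TB is 1000 GB are assumptions, and since the rates are "up to" figures the results are best-case estimates.

```python
def load_hours(data_gb, rate_gb_per_hour):
    """Hours needed to load data_gb at a sustained rate (best case)."""
    return data_gb / rate_gb_per_hour


def compressed_gb(raw_gb, ratio):
    """Storage needed after compression at the given ratio (10 means 10:1)."""
    return raw_gb / ratio


raw = 1000  # hypothetical 1 TB of source data, taking 1 TB as 1000 GB

ice_hours = load_hours(raw, 50)    # ICE loader: up to 50 GB/hr
iee_hours = load_hours(raw, 300)   # IEE loader: up to 300 GB/hr
on_disk = compressed_gb(raw, 10)   # typical 10:1 ratio quoted in the talk
```

Under these assumptions the ICE load takes about 20 hours, the IEE load a little over 3 hours, and the data occupies roughly 100 GB once compressed.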

Page 19

IEE Annual Subscriptions

Silver
  Hours of Service: 9am-5pm EDT (no phone support)
  Response Time SLA: Severity 1 = 4 hr, Severity 2 = 8 hr
  # Named Client Contacts: 1
  Support Access Methods: Web, Email, Self-Service Knowledge Base
  Services: Training and consulting services at published rate
  Two Year PrePay Charge/TB*: $12,950
  One Year Annual Charge/TB: $15,950

Gold
  Hours of Service: 8am-6pm (ET for NA, CET Europe)
  Response Time SLA: Severity 1 = 2 hr, Severity 2 = 6 hr
  # Named Client Contacts: 3
  Support Access Methods: Phone, Web, Email, Self-Service Knowledge Base
  Services: Health Check Service included; 10% discount on training and other consulting services; Sev 1 hot fixes
  Two Year PrePay Charge/TB*: $15,950
  One Year Annual Charge/TB: $18,950

Platinum
  Hours of Service: 7 x 24 x 365
  Response Time SLA: Severity 1 = 1 hr, Severity 2 = 4 hr
  # Named Client Contacts: 5
  Support Access Methods: Phone, Web, Email, Self-Service Knowledge Base
  Services: Health Check Service included; 20% discount on training and other consulting services; Sev 1 and 2 hot fixes
  Two Year PrePay Charge/TB*: $18,950
  One Year Annual Charge/TB: $21,950

Perpetual
  Perpetual (Same as Gold Support Level)
  18% of Purchase Price, not to increase by more than 10% per Yr.
  $40,000

* Two year subscription pricing requires two year prepay

Page 20

Get Started

• Join the forums, learn from the experts
• Sign up for a webinar
• Download a white paper
• Download ICE (Infobright Community Edition)
• Download a free trial of Infobright Enterprise Edition at Infobright.com

[email protected]

www.infobright.com

www.infobright.org