© 2ndQuadrant 2013 [email protected] PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor
PostgreSQL BI & Performance - 2ndQuadrant

Feb 12, 2022

dariahiddleston
Transcript
Page 1: PostgreSQL BI & Performance - 2ndQuadrant

© 2ndQuadrant 2013 [email protected]

PostgreSQL Business Intelligence & Performance

Simon Riggs
CTO, 2ndQuadrant
PostgreSQL Major Contributor

Page 2: PostgreSQL BI & Performance - 2ndQuadrant

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 318633

Page 3: PostgreSQL BI & Performance - 2ndQuadrant

AXLE Project

• Analytics on Xtremely Large European data
– Secure
– Big
– Fast
– Hardware optimised
– Visual Analytics

axleproject.eu

Page 4: PostgreSQL BI & Performance - 2ndQuadrant

Topics

• Business Intelligence & Architecture
• BI Performance Feature Effectiveness
• Benchmark Analysis & Opportunities
• New Features in Progress

Page 5: PostgreSQL BI & Performance - 2ndQuadrant

Business Intelligence

• ETL
• Reporting
• Ad-hoc queries
• Data Mining

• Many query types
• Counting
• Summarisation
• Strategic Analysis
• Analytics

Page 6: PostgreSQL BI & Performance - 2ndQuadrant

BI Architecture

• SQL was invented for Business Intelligence
• Classic DW
– DB2 v Teradata

• Specialist Databases

OLTP DW

Page 7: PostgreSQL BI & Performance - 2ndQuadrant

Specialist OLTP Problems

• MongoDB
– Joins don't scale!

• VoltDB
– No concurrency
– All SQL must run in same duration
– Partial SQL implementation

Page 8: PostgreSQL BI & Performance - 2ndQuadrant

Specialist DW Problems

• Second specialist system required
• ETL middleware also often required for loading
• Data delayed en route to second system
• Frequently highly compressed, so read only or difficult to maintain

Page 9: PostgreSQL BI & Performance - 2ndQuadrant

Get Real

• Big Data
– 99% of databases are <100GB

• Real Time results
– Business Intelligence required 24x7
– Closed loop processing requires fast response

• SQL is much easier to use than alternatives
– More expressive and easier to use
– Already the de facto standard for BI

Page 10: PostgreSQL BI & Performance - 2ndQuadrant

Minimal Approach

• Emphasise that additional BI technology will not reduce costs and may not offer solutions
• Keep Business Intelligence on PostgreSQL
• Use Hot Standby to expand capacity and isolate Business Intelligence workloads
• Minimise ETL whenever possible
• Gain benefits of SQL and concurrency
– Immediate access to data

Page 11: PostgreSQL BI & Performance - 2ndQuadrant

Things To Learn

• Query performance is important
• Custom/special data structures are important in increasing performance
• Stale answers are acceptable for many situations

Page 12: PostgreSQL BI & Performance - 2ndQuadrant

BI Feature Effectiveness

• Problem 1: Get the work into the database

• Problem 2: Speed up the work in the database
+++++ Work Avoidance
++ Algorithmic Improvement
+ Brute Force

Page 13: PostgreSQL BI & Performance - 2ndQuadrant

Orange Data Mining

• Orange 3.0 generates SQL for all data flows

• Directly utilises the power of databases

• Integrates with PostgreSQL

Page 14: PostgreSQL BI & Performance - 2ndQuadrant

BI Tuning Opportunities

• COPY batch optimisations defeated
• Btree insert bottlenecks on large data loads
• Aggregate Optimisation
– Use sum()/count() not avg()
• Join Estimate/Actual Mismatch
– Use enable_nestloop = off
• Plan Pushdown
– Manual SQL rewrite
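
The aggregate rewrite listed above can be checked for equivalence before it is applied. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption; the slide concerns PostgreSQL), and the table name sales and column amount are hypothetical. It only shows that sum()/count() returns the same value as avg(); it does not measure the speed difference the slide refers to.

```python
import sqlite3

# Stand-in engine: SQLite in memory. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(n,) for n in range(1, 101)])

# Original form: let the database compute the average directly.
avg_direct = conn.execute("SELECT avg(amount) FROM sales").fetchone()[0]

# Rewritten form: compute sum and count, then divide once at the end.
# The 1.0 multiplier avoids integer division.
avg_rewritten = conn.execute(
    "SELECT sum(amount) * 1.0 / count(amount) FROM sales"
).fetchone()[0]

print(avg_direct, avg_rewritten)
```

Both forms skip NULL values, so the rewrite keeps the same semantics for nullable columns.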

Page 15: PostgreSQL BI & Performance - 2ndQuadrant

Speed Up: Work Avoidance

• Caching
– Result Cache
– Materialized Views
• Approximation
• Partition Elimination
• Improved Optimisation
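
As an illustration of the result cache idea, here is a minimal Python sketch that reuses a stored answer while it is younger than a chosen age limit. The class ResultCache, the function expensive_count and the query text are all hypothetical names invented for this example, not part of PostgreSQL or of the talk.

```python
import time


class ResultCache:
    """Cache query results and reuse them while they are fresh enough."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock
        self.entries = {}    # query text -> (timestamp, result)
        self.executions = 0  # how many times the real query ran

    def fetch(self, query, run_query):
        now = self.clock()
        hit = self.entries.get(query)
        if hit is not None and now - hit[0] <= self.max_age:
            return hit[1]  # work avoided: possibly stale, but within tolerance
        result = run_query(query)
        self.executions += 1
        self.entries[query] = (now, result)
        return result


def expensive_count(query):
    # Placeholder for a real database call (hypothetical).
    return sum(1 for _ in range(100_000))


cache = ResultCache(max_age_seconds=60)
first = cache.fetch("SELECT count(*) FROM big_table", expensive_count)
second = cache.fetch("SELECT count(*) FROM big_table", expensive_count)
print(first, second, cache.executions)
```

The second call returns without running the query again. The trade-off is that the answer may be up to max_age_seconds old.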

Page 16: PostgreSQL BI & Performance - 2ndQuadrant

Speed Up: Algorithms

• Compression
• Column Orientation
• Vectors
• Hardware approaches
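
To make the first two items concrete, the following Python sketch stores a small hypothetical data set column by column and applies run-length encoding to the repetitive columns. Run-length encoding is chosen here only as a simple example of compression; the slides do not say which scheme is intended.

```python
from itertools import groupby

# Hypothetical fact rows: (day, country, amount).
rows = [
    ("2013-01-01", "DE", 10),
    ("2013-01-01", "DE", 12),
    ("2013-01-01", "FR", 7),
    ("2013-01-02", "FR", 3),
    ("2013-01-02", "FR", 9),
]

# Column orientation: one list per column instead of one tuple per row.
day, country, amount = (list(col) for col in zip(*rows))


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


day_rle = rle_encode(day)
country_rle = rle_encode(country)

# An aggregate over one column touches only that column's data.
total = sum(amount)
print(day_rle, country_rle, total)
```

Sorted or low-cardinality columns compress well this way, and decoding restores the original column exactly.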

Page 17: PostgreSQL BI & Performance - 2ndQuadrant

Speed Ups: Brute Force

• Parallel Query is a brute force approach
• Gains in performance come from additional utilisation of resources, not from being smarter
– Reduces overall concurrency
– Still requires extensive optimizer changes
• The industry thinks we need it
• Some queries do require it
• PostgreSQL should do this, 2ndQuadrant can, will and has already helped
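
The brute force point can be sketched as a parallel aggregate: the input is split into chunks, each worker computes a partial (sum, count), and the partials are combined. Every value is still read exactly once, so any gain comes from using more workers rather than from doing less work. The function names below are hypothetical, and threads are used only to show the structure; in CPython they will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(chunk):
    """Per-worker state for an average: (sum, count) over one chunk."""
    return sum(chunk), len(chunk)


def parallel_avg(values, workers=4):
    """Average a non-empty list by combining per-chunk partial aggregates."""
    if not values:
        raise ValueError("values must not be empty")
    # Split the input into roughly equal chunks, one per worker.
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    # Combine step: total work equals a serial scan, only spread out.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, len(chunks)


values = list(range(1, 1001))
average, chunks_used = parallel_avg(values)
print(average, chunks_used)
```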

Page 18: PostgreSQL BI & Performance - 2ndQuadrant

9.4 BI Features In-Progress

• Min Max Indexes
• Parallel Sort & Parallel Query infrastructure
• Materialized Views++
• Multi-core scalability gains (lwlocks)
• (DDL Locking impact reductions)
• (Row Level Security)

Page 19: PostgreSQL BI & Performance - 2ndQuadrant

Min Max Indexes (9.4)

• Automatic Partitioning
– Store min and max tuples for each page range
– Use theorem proving to avoid sections of scan
– Covers all columns, not just defined partition key
– Can be added easily to existing applications
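
The idea of keeping a minimum and maximum per page range and skipping ranges that cannot match can be simulated in a few lines of Python. This is a toy model under stated assumptions, not the PostgreSQL implementation: a Python list stands in for the table, a fixed number of values stands in for a page range, and the function names are hypothetical.

```python
def build_minmax(values, range_size):
    """Summarise each block of range_size values as (start, min, max)."""
    summary = []
    for start in range(0, len(values), range_size):
        block = values[start:start + range_size]
        summary.append((start, min(block), max(block)))
    return summary


def scan_equal(values, summary, range_size, target):
    """Return matching positions, skipping blocks that cannot hold target."""
    matches, blocks_read = [], 0
    for start, lo, hi in summary:
        if target < lo or target > hi:
            continue  # proven empty for this predicate: block is not read
        blocks_read += 1
        for offset, value in enumerate(values[start:start + range_size]):
            if value == target:
                matches.append(start + offset)
    return matches, blocks_read


# Hypothetical data: an append-only column whose values grow over time.
values = list(range(10_000))
summary = build_minmax(values, range_size=100)
matches, blocks_read = scan_equal(values, summary, 100, target=4242)
print(matches, blocks_read, len(summary))
```

The summary holds one small entry per block regardless of how many values the block contains, which is why such an index stays tiny compared with a per-row structure.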

Page 20: PostgreSQL BI & Performance - 2ndQuadrant

Min Max Index results

2 GB table           MinMax   B-Tree
Index build time     11s      96s
Index size           24kB     1.1GB
Load time w index    1        x2-3
Index SEL (1 row)    x2-3     1
Index SEL (many)     same     same

Page 21: PostgreSQL BI & Performance - 2ndQuadrant

MinMax Indexes

• Does not require complex DDL

• Generate almost no index inserts
– Fits in RAM even for Petabytes of data

• Generate almost no additional WAL
– Works well with Hot Standby data warehousing

• Only works with some data distributions
– Additional indexing may be needed
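
The dependence on data distribution can be seen in a small Python simulation. It counts how many blocks a min/max summary would still have to read for one lookup, first on physically ordered data and then on the same values shuffled. The block size, the data and the function name are hypothetical choices for illustration.

```python
import random


def blocks_to_read(values, range_size, target):
    """Count blocks whose [min, max] interval contains target."""
    count = 0
    for start in range(0, len(values), range_size):
        block = values[start:start + range_size]
        if min(block) <= target <= max(block):
            count += 1
    return count


ordered = list(range(10_000))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # fixed seed for repeatability

ordered_reads = blocks_to_read(ordered, 100, 4242)
shuffled_reads = blocks_to_read(shuffled, 100, 4242)
print(ordered_reads, shuffled_reads)
```

With ordered data a single block qualifies. After shuffling, nearly every block spans almost the whole value range, so the summary excludes almost nothing and another index type may be needed.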

Page 22: PostgreSQL BI & Performance - 2ndQuadrant

PostgreSQL BI Roadmap

[Roadmap diagram. Releases: 9.4, 10.0, 10.1. Themes: Advanced Business Intelligence, High Security, Online Change, Very Large Database]

Page 23: PostgreSQL BI & Performance - 2ndQuadrant

2ndQuadrant

Consulting, Migration

Training

Support, RemoteDBA

Open Source Development

Page 24: PostgreSQL BI & Performance - 2ndQuadrant

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 318633