Beyond PostgreSQL FORKS, ADD-ONS, TOOLS AND EXTENSION 18th August 2016
PostgreSQL: Origin and History
• Based on the same relational theory as other RDBMSs
• Started as a follow-up project to Ingres, hence the name Post-gres
• Postgres was commercialized as Illustra, which was later acquired by Informix; its features were merged into Informix
• 1995 – The query language was replaced with SQL
• 1996 – The project started a new life in the Open Source world with devoted developers
• 1996 – PostgreSQL started with v6.0 and has flourished since then
• Today PostgreSQL – World’s Most Advanced Open Source Database
Timeline of PostgreSQL development
Recent Features and Roadmap
• v8.0 – PITR, Tablespaces
• v9.0 – Streaming Replication
• v9.1 – Unlogged Tables, Synchronous Replication, pg_basebackup
• v9.2 – Cascaded Replication, JSON datatype
• v9.3 – Materialized Views, Event Triggers and Data page checksum
• v9.4 – Logical Decoding, Replication Slots, ALTER SYSTEM, JSONB
• Major features on their way – Parallel Processing, Bi-directional replication, Distributed Database/Sharding
Development Community
• Today PostgreSQL is actively developed and maintained by a development group
• It is one of the oldest and strongest development groups
• The community has well-defined roles for the team:
◦ Core Team
◦ Committers
◦ Developers and Contributors
• The community runs mailing lists for development discussions and for people looking for help, ranging from a trivial query to a potential bug/defect
Featured Users
EDUCATION AND MEDIA
• Moscow State University, Moscow, Russia
• University of Sydney
• University of California, Berkeley
• IMDB.com
• Greenpeace
TECHNOLOGY AND TELECOM
• Sun Microsystems
• Apple
• Skype
• NTT Data
• Telstra
• Juniper Networks
• FlightStats
Some popular forks
• EnterpriseDB
• Postgres-XL
• Postgres-XC
• Greenplum
• Vitesse DB
• Netezza
• Vertica
• Amazon Redshift
• Amazon RDS
• CitusDB
Some popular projects, tools and Extensions
• pgBouncer
• pgpool
• BDR
• PostGIS
• pgAdmin
• Extend Features and Capability with Extensions
• Additional Server Side languages
• Foreign Data Wrappers
EnterpriseDB
• Fork of Postgres – EDB Postgres Plus Advanced Server / Postgres Advanced Server
• Continuously merged with the latest code base of PostgreSQL
◦ New release follows the release of PostgreSQL (typically within a quarter)
◦ Many features get contributed back to community PostgreSQL as well
• Commercial License
• Latest Release – v9.5
• Key features –
◦ Oracle Compatibility (PL/SQL, Packages, Oracle syntax and built-in functions/packages etc)
◦ Security (password profiles, SQL/Protect, auditing etc)
◦ Performance enhancements
◦ Tools (replication, monitoring, high availability and data integration etc)
• Website - http://www.enterprisedb.com/
• Also has a managed service available on many popular cloud platforms like Google Cloud, AWS etc
Disclaimer: My employer is a partner of EnterpriseDB
Postgres-XL
• Forked from PostgreSQL as a separate project
• Mainly maintained by 2ndQuadrant now
• Continuously integrated with the last released major version of PostgreSQL
◦ Typically a new release is 9-12 months after a major release of PostgreSQL
• Last release – 9.5 RC3
• Still new but getting better
• Open Source
• Key Features
◦ Massively Parallel Processing with multiple nodes
◦ Sharding and horizontal scaling
• Website - http://www.postgres-xl.org/
Greenplum
• Forked from PostgreSQL 8.1
◦ Now merged with PostgreSQL 8.3
• Was developed independently
• Was proprietary – recently Open Sourced
• Last release - 4.3
• Key Features – Massively Parallel Processing
• Commercial Support available from Pivotal
• Website - http://greenplum.org/
Amazon Redshift
• Forked from PostgreSQL 8.0
• Developed and maintained independently
• No merging or continuous integration with recent code/features of PostgreSQL
• Provided as a managed service by Amazon Web Services
• Main features
◦ Managed Service
◦ Massively Parallel Processing
• Website - https://aws.amazon.com/documentation/redshift/
Amazon RDS
• A fork based on PostgreSQL
• Continuous integration with latest release
• Provided as a managed service by Amazon Web Services
• Latest available – v9.5
• Has a few limitations compared to running PostgreSQL on an EC2 instance
• Does not support all the extensions
• Website - https://aws.amazon.com/rds/postgresql/
CitusDB
• Continuous Integration with latest release
• Maintained by Citus Data
• Was proprietary – now Open Source!
• Last release – 5.2 (compatible with PostgreSQL 9.5)
• Main features
◦ Distributed Database
◦ Horizontal Scaling
• Website - https://www.citusdata.com/
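Horizontal scaling in a distributed database generally means assigning each row to one of several nodes based on a distribution key. The sketch below only illustrates hash-based shard routing in plain Python. The worker names, the shard count and the use of CRC32 are assumptions made for the example; this is not how CitusDB itself is implemented.

```python
import zlib
from collections import defaultdict

# Hypothetical worker names and shard count, chosen only for this illustration.
WORKERS = ["worker-1", "worker-2", "worker-3"]
SHARD_COUNT = 8


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-key value to a shard id using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def worker_for(shard_id: int, workers=WORKERS) -> str:
    """Place shards on workers round-robin."""
    return workers[shard_id % len(workers)]


def route_rows(rows, key_column):
    """Group rows by the worker that owns the shard of their distribution key."""
    placement = defaultdict(list)
    for row in rows:
        shard_id = shard_for(str(row[key_column]))
        placement[worker_for(shard_id)].append(row)
    return dict(placement)


rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 7)]
placement = route_rows(rows, "customer_id")
for worker in sorted(placement):
    print(worker, [r["customer_id"] for r in placement[worker]])
```

Because the hash is stable, a lookup for a given key always goes to the same worker, so single-key queries touch one node while scans can run on all nodes in parallel.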
pgBouncer
• Database-side middleware
• Offers various features
◦ Connection Pooling
◦ Limiting Connections
◦ Offers various pooling modes
◦ Quite lightweight – best tool if you just want connection pooling
• Continuously tries to keep up with new query parser
• Open Source
• Active community
• Website - https://pgbouncer.github.io/
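Connection pooling and connection limiting can be pictured with a small sketch. This is not pgBouncer's implementation or API. It is a plain-Python illustration in which `FakeConnection`, `Pool` and `max_size` are hypothetical names, and a fixed number of server connections is shared among many client requests.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real server connection (hypothetical, for illustration)."""
    opened = 0  # counts how many server connections were ever created

    def __init__(self):
        FakeConnection.opened += 1
        self.id = FakeConnection.opened

    def execute(self, sql):
        return f"conn {self.id} ran: {sql}"


class Pool:
    """Hands out at most max_size connections and reuses them."""

    def __init__(self, max_size):
        self._idle = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks (up to timeout) when all connections are in use,
        # which is how the pool caps the load on the server.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection for reuse


pool = Pool(max_size=2)
results = []
for n in range(5):  # five client requests share two server connections
    with pool.connection() as conn:
        results.append(conn.execute(f"SELECT {n}"))
print(results)
print("server connections opened:", FakeConnection.opened)
```

pgBouncer's pooling modes (session, transaction, statement) differ in when the server connection is handed back to the pool. The sketch returns it at the end of each `with` block.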
pgpool-II
• Database-side middleware
• Offers various features
◦ Connection Pooling
◦ Parallel Query (now deprecated)
◦ High Availability
◦ Load Balancing
◦ Replication
• Continuously tries to keep up with new query parser released with every new release of PostgreSQL
• Open Source
• Active community
• Website - http://www.pgpool.net/
Bi-directional replication a.k.a. BDR
• A fork of PostgreSQL with continuous attempts to merge back
• Currently available as an extension, but aims to be merged soon as an in-core feature
• Latest version – 1.0
• A lot of great development from the project has been merged back into PostgreSQL in smaller chunks
◦ Event Triggers
◦ Logical Decoding
◦ Replication Slots
◦ pg_xlogdump
◦ Logical Replication (aimed for v10.0)
• Actively maintained and pursued by 2ndQuadrant
• Website - http://bdr-project.org/docs/stable/index.html
PostGIS
• A separate project aimed at adding geospatial features to PostgreSQL
• Works as an extension
• Adds great capabilities for those who want to store geographical coordinates, shapes etc in the database
• Additional datatypes, operators and functions are added by PostGIS
• Compatible with many forks e.g. AWS RDS, EnterpriseDB as well
• Website - http://postgis.net/
pgAdmin
• A separate Open Source project maintained by the PostgreSQL community
• A GUI tool to query and manage PostgreSQL, which also serves as a development platform
• Currently in its third generation with pgAdmin-III
• pgAdmin-IV (the 4th generation tool) is already in beta
• Allows you to
◦ Query
◦ Build query graphically
◦ Backup-restore (pg_dump/pg_restore)
◦ Explain Plan
◦ Admin Tasks (VACUUM/ANALYZE/CLUSTER/REINDEX)
◦ …
Some useful extensions - Data Type and Indexing
• btree_gist - provides GiST index operator classes that implement B-tree equivalent behavior for the numeric and date-time data types
• chkpass - adds a new data type which can be used for storing auto-encrypted passwords
• intarray - provides a number of useful functions and operators for manipulating integer arrays; supports indexes as well
Some useful extensions - Data Type and Indexing
• isn - provides data types for the following international product numbering standards: EAN13, UPC, ISBN (books), ISMN (music), and ISSN (serials)
• ltree - adds the ltree data type for storing hierarchical tree-like structures
• seg - confidence-interval data type (GiST indexing example)
• hstore - adds the hstore data type for storing key-value pairs in PostgreSQL
• uuid-ossp - Adds capability to generate UUID
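The numbering standards handled by isn carry a check digit. As a rough illustration of the kind of validation involved, the sketch below computes the EAN-13 / ISBN-13 check digit in plain Python. The function names are made up for this example, and this is not the extension's own code.

```python
def ean13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an EAN-13 number."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Digits are weighted 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """Validate a 13-digit EAN-13 / ISBN-13, ignoring hyphens and spaces."""
    digits = code.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return ean13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_ean13("978-0-306-40615-7"))  # True
print(is_valid_ean13("978-0-306-40615-8"))  # False
```

Storing such values in a dedicated data type means malformed numbers are rejected when they are inserted rather than discovered later.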
Some useful extensions - Additional Features
• PostGIS - Adds spatial capabilities to PostgreSQL
• PL/Proxy - A database partitioning system implemented as a PL language
• adminpack - File and log manipulation routines
◦ Used by pgAdmin and PEM
• lo - Extension for Large Object
• pgRouting - Extends PostGIS with routing and analysis functions
• spi -
• pg_trgm - provides functions and operators for determining the similarity of text based on trigram matching
◦ Also provides index operator classes that support fast searching
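Trigram matching can be sketched in a few lines. The plain-Python illustration below follows the scheme described in the pg_trgm documentation: lower-case the text, split it into words, pad each word with two leading spaces and one trailing space, then compare the sets of three-character substrings. It is an approximation for intuition (ASCII letters and digits only), not the extension's code, and the function names are chosen for this example.

```python
import re


def trigrams(text: str) -> set:
    """Return the set of trigrams of text, padding each word as pg_trgm documents."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(similarity("word", "two words"))
print(similarity("postgres", "postgresql"))
```

Because similarity is computed over small fixed-size fragments, misspellings and partial matches still share most of their trigrams, which is what makes this approach useful for fuzzy text search.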
Some useful extensions - Additional Features
• intagg - Provides integer aggregator and enumerator
• sslinfo - The sslinfo module provides information about the SSL certificate that the current client provided
• pgcrypto - Cryptographic functions
Some useful extensions - Monitoring
• pg_jobmon - Job logging and monitoring for PostgreSQL
• pgfincore - A set of functions to handle low-level management of relations using mincore to explore cache memory
• pg_stat_statements - A view that returns cumulative historical information on all statements executed
• pg_stat_plans - extends pg_stat_statements and records query plans for all executed queries
• pgrowlocks - A function to return row locking information
• pgstattuple - Functions that return tuple-level statistics for a table (dead rows, free space etc)
• pg_buffercache - View to inspect content of shared buffer cache
• pg_freespacemap - View to inspect contents of the free space map (FSM)
Some useful extensions - Interfacing and External Data Access
• dblink
• Texcaller - Texcaller is a convenient interface to the TeX command line tools
• mongres - Runs a custom background worker that can speak the MongoDB wire protocol
• PgMemcache - PgMemcache is a set of PostgreSQL user-defined functions that provide an interface to memcached
Additional Server Side Languages
• PL/Java
• PL/Perl
• PL/Python
• PL/Tcl
• …
Foreign Data Wrappers
• Added in version 9.1
• PostgreSQL can link to other systems to retrieve data via Foreign Data Wrappers (FDWs).
• The linked system can be almost any kind of data source
• Regular DB queries can use these data sources like regular tables and join multiple data sources
• Popular FDWs
◦ oracle_fdw
◦ mongo_fdw
◦ tds_fdw (SQL Server)
◦ couchdb_fdw
◦ mysql_fdw
◦ hdfs_fdw
◦ odbc_fdw
◦ redis_fdw
◦ file_fdw
◦ postgres_fdw
◦ www_fdw (web service)
Thank you!
Stay in Touch!
• Email - [email protected]
• LinkedIn - https://sg.linkedin.com/in/samkumar150288
• Twitter - @sameerkasi200x