Postgres as BI platform

Post on 07-Dec-2021


Andy Fefelov
mastery.pro

Agenda

• Problem statement
• Open source solution
• ROLAP
• Our architecture review
• Postgres features suitable for BI
  • ETL vs ELT (stage-nds-ddm)
  • Column data storage
  • Configuration
  • Special features
• Pros and cons of our solution

Problem statement

• Our customer is one of the largest pharmacy supply chain groups in Ireland
• 4 types of dispensary software
• 250 pharmacies
• To be analyzed:
  • Orders
  • Scripts (prescriptions)
  • Claims
• Goals to be achieved:
  • Purchasing policy optimization
  • A killer feature for marketing

Open source

• SpagoBI
• Pentaho
• Mondrian
• Saiku
• Cubes (databrewery)

ROLAP (R-ROLAP)

• Star schema
  • Facts
  • Dimensions
  • Measures
• No pre-calculated aggregates
• SSD
• Column storage
• ???
• Profit!
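As a rough illustration of the star schema idea above, the sketch below builds one dimension and one fact table and computes an aggregate at query time, with no pre-calculated aggregate tables. It uses Python's built-in sqlite3 purely as a stand-in for Postgres so that it runs anywhere. The table and column names (dim_pharmacy, county, amount) are assumptions made for the example, not the schema from the talk.

```python
import sqlite3

# In-memory SQLite stands in for Postgres (an assumption, to keep the sketch runnable).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes to slice by (hypothetical columns)
    CREATE TABLE dim_pharmacy (
        pharmacy_id INTEGER PRIMARY KEY,
        county      TEXT NOT NULL
    );
    -- Fact: one row per order item; amount is the measure
    CREATE TABLE fact_order_item (
        order_item_id INTEGER PRIMARY KEY,
        pharmacy_id   INTEGER NOT NULL REFERENCES dim_pharmacy (pharmacy_id),
        amount        REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO dim_pharmacy VALUES (?, ?)",
                 [(1, "Dublin"), (2, "Cork"), (3, "Dublin")])
conn.executemany("INSERT INTO fact_order_item (pharmacy_id, amount) VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)])


def total_by_county(connection):
    """Aggregate the measure at query time by joining the fact to its dimension."""
    rows = connection.execute("""
        SELECT d.county, SUM(f.amount)
        FROM fact_order_item AS f
        JOIN dim_pharmacy AS d USING (pharmacy_id)
        GROUP BY d.county
        ORDER BY d.county
    """).fetchall()
    return dict(rows)


totals = total_by_county(conn)
print(totals)
```

Every request recomputes the sums from the fact rows, which is why the slide pairs this approach with fast disks and column storage.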


Our architecture

• Extractors
• Postgres (load, transform)
• Cubes (API)
• Rails + React (UI)
• Saiku (UI)

Architecture – extractors

• Cyclone_client
  • MSSQL (2008-2012)
  • Golang
  • CSV + rsync over ssh
• Kachok
  • Web scraper
• Skytools replication
  • From existing products

Architecture – API + UI

• Cubes - cubes.databrewery.org
  • Easy drilling down
  • Slicing and dicing
  • Serves aggregates, dimension details, facts
  • Provides all necessary metadata for a reporting application
• Rails, React
  • Authorization
  • d3, dc, crossfilter
• Saiku
  • Only for back office

Architecture – Postgres (load, transform)

• stage
  • raw data
  • load_something_to_nds(_pharmacy_id integer)
• nds
  • normalized data store
  • load_something_to_ddm(_pharmacy_id integer)
• ddm
  • cubes and snapshots
  • views

Architecture – Postgres (load, transform)

Stage
• «Raw» data
• Cleaned out completely in every ELT cycle
• Serves as the data source for NDS

Architecture – Postgres (load, transform)

• Normalized Data Store
  • Data is normalized and validated here
  • Serves as the source for ddm
  • Measures for ddm are calculated here
  • The delta to load into ddm is calculated from the last_updated field
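The delta calculation on last_updated could look roughly like the sketch below: each run copies only the rows changed since the previous run for one pharmacy. This is not the talk's PL/pgSQL. The function merely follows the slide's load_something_to_ddm(_pharmacy_id integer) naming pattern; the tables nds_order and ddm_order and the load_watermark bookkeeping table are hypothetical, and sqlite3 stands in for Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nds_order (
        order_id     INTEGER PRIMARY KEY,
        pharmacy_id  INTEGER NOT NULL,
        amount       REAL NOT NULL,
        last_updated TEXT NOT NULL      -- ISO-8601 timestamp, sorts as text
    );
    CREATE TABLE ddm_order (
        order_id    INTEGER PRIMARY KEY,
        pharmacy_id INTEGER NOT NULL,
        amount      REAL NOT NULL
    );
    -- Hypothetical bookkeeping: newest last_updated already loaded, per pharmacy
    CREATE TABLE load_watermark (
        pharmacy_id INTEGER PRIMARY KEY,
        loaded_upto TEXT NOT NULL
    );
""")


def load_orders_to_ddm(connection, pharmacy_id):
    """Copy rows changed since the previous load; return how many were loaded."""
    row = connection.execute(
        "SELECT loaded_upto FROM load_watermark WHERE pharmacy_id = ?",
        (pharmacy_id,)).fetchone()
    since = row[0] if row else ""       # empty string sorts before any timestamp
    delta = connection.execute(
        "SELECT order_id, pharmacy_id, amount, last_updated FROM nds_order "
        "WHERE pharmacy_id = ? AND last_updated > ?",
        (pharmacy_id, since)).fetchall()
    if not delta:
        return 0
    with connection:                    # data and watermark are committed together
        connection.executemany(
            "INSERT OR REPLACE INTO ddm_order VALUES (?, ?, ?)",
            [(r[0], r[1], r[2]) for r in delta])
        connection.execute(
            "INSERT OR REPLACE INTO load_watermark VALUES (?, ?)",
            (pharmacy_id, max(r[3] for r in delta)))
    return len(delta)


# Made-up sample data for pharmacy 7
conn.executemany("INSERT INTO nds_order VALUES (?, ?, ?, ?)", [
    (1, 7, 10.0, "2016-01-01T10:00:00"),
    (2, 7, 20.0, "2016-01-01T11:00:00"),
])
first = load_orders_to_ddm(conn, 7)     # both rows are new
second = load_orders_to_ddm(conn, 7)    # nothing changed since the last run
conn.execute("UPDATE nds_order SET amount = 25.0, "
             "last_updated = '2016-01-02T09:00:00' WHERE order_id = 2")
third = load_orders_to_ddm(conn, 7)     # only the updated row
print(first, second, third)
```

Because the function takes a pharmacy id, runs for different pharmacies touch disjoint rows, which fits the per-pharmacy parallelism mentioned later in the talk.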


Architecture – Postgres (load, transform)

• Dimensional data model
  • Cubes
  • Snapshots
    • Deploy calmly
    • Analyze before/after release states
    • A view is the entry point for the application

Architecture – Postgres (snapshots)

• fact_order_item
• vw_order_item
• s1_order_item
• s2_order_item
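One possible reading of the four names above (an assumption, since only the diagram labels survive) is that s1_order_item and s2_order_item are alternating snapshot copies of fact_order_item, and vw_order_item is the view the application queries. Under that reading, a release rebuilds the idle snapshot and repoints the view to it, while the previous snapshot stays available for a before/after comparison. The sketch uses sqlite3 as a stand-in; in Postgres the swap might use CREATE OR REPLACE VIEW instead of drop and create.

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT ourselves so DDL is grouped explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE fact_order_item (order_item_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO fact_order_item VALUES (1, 10.0), (2, 20.0);
""")

SNAPSHOTS = ("s1_order_item", "s2_order_item")


def publish_snapshot(connection, target):
    """Rebuild one snapshot from the fact table and repoint the view to it."""
    if target not in SNAPSHOTS:         # names are interpolated, so whitelist them
        raise ValueError(f"unknown snapshot: {target}")
    connection.execute("BEGIN")
    try:
        connection.execute(f"DROP TABLE IF EXISTS {target}")
        connection.execute(f"CREATE TABLE {target} AS SELECT * FROM fact_order_item")
        connection.execute("DROP VIEW IF EXISTS vw_order_item")
        connection.execute(f"CREATE VIEW vw_order_item AS SELECT * FROM {target}")
        connection.execute("COMMIT")
    except Exception:
        connection.execute("ROLLBACK")
        raise


def view_total(connection):
    """What the application sees: the sum of amount through the view."""
    return connection.execute("SELECT SUM(amount) FROM vw_order_item").fetchall()[0][0]


publish_snapshot(conn, "s1_order_item")
before = view_total(conn)               # view reads s1
conn.execute("INSERT INTO fact_order_item VALUES (3, 5.0)")
unchanged = view_total(conn)            # new facts are invisible until a publish
publish_snapshot(conn, "s2_order_item")
after = view_total(conn)                # view now reads s2
previous = conn.execute("SELECT SUM(amount) FROM s1_order_item").fetchall()[0][0]
print(before, unchanged, after, previous)
```

Keeping the old snapshot around is what makes the "analyze before/after release states" point cheap: both states are ordinary tables that can be compared with plain queries.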


Column storage

• Suitable for:
  • aggregations
  • showing a fixed number of columns
• cstore_fdw -> https://github.com/citusdata/cstore_fdw
  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.
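To make column projections and skip indexes more concrete, here is a toy in-memory model of both ideas. It is not how cstore_fdw is implemented; the names (RowGroup, ROW_GROUP_SIZE, sum_where_between) and the sample data are invented for the illustration.

```python
from dataclasses import dataclass

ROW_GROUP_SIZE = 4  # tiny on purpose; an arbitrary value for the demo


@dataclass
class RowGroup:
    columns: dict   # column name -> list of values (each column stored separately)
    stats: dict     # column name -> (min, max) for this group


def build_row_groups(rows, column_names):
    """Split rows into groups and store every column with its min/max statistics."""
    groups = []
    for start in range(0, len(rows), ROW_GROUP_SIZE):
        chunk = rows[start:start + ROW_GROUP_SIZE]
        columns = {name: [r[i] for r in chunk] for i, name in enumerate(column_names)}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        groups.append(RowGroup(columns, stats))
    return groups


def sum_where_between(groups, value_col, filter_col, low, high):
    """Sum value_col over rows with low <= filter_col <= high.

    Returns (total, groups_scanned). Only the two named columns are read
    (column projection); groups whose min/max range cannot match are skipped.
    """
    total, scanned = 0, 0
    for group in groups:
        g_min, g_max = group.stats[filter_col]
        if g_max < low or g_min > high:
            continue                    # skip index: no row in this group can match
        scanned += 1
        for f, v in zip(group.columns[filter_col], group.columns[value_col]):
            if low <= f <= high:
                total += v
    return total, scanned


# 12 made-up rows of (day, pharmacy_id, amount), sorted by day so ranges are tight
rows = [(day, day % 3, day * 10) for day in range(1, 13)]
groups = build_row_groups(rows, ["day", "pharmacy_id", "amount"])
total, scanned = sum_where_between(groups, "amount", "day", 5, 8)
print(total, scanned, len(groups))
```

The skip only pays off when the filter column is roughly ordered within the data, as it is here; with randomly ordered values every group's min/max range would overlap the filter.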

Column storage

• Our experience:
  • Not faster than vanilla Postgres (say hello to Cubes)
  • Volume reduced up to 12 times. Wow.
  • No way to back up the traditional way (no need?)
  • No support for DELETE/UPDATE (snapshots)

Configuration

• Load profile:
  • Big-volume read/write I/O
  • Most of the I/O happens in stage and nds
  • ddm is not heavily loaded
• shared_buffers = ½ RAM
• work_mem = 2GB
• maintenance_work_mem = 3GB
• temp_buffers = 2GB
• effective_cache_size = ½ RAM
• max_wal_size = 32GB
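As a small convenience, the settings above can be turned into a postgresql.conf fragment for a given amount of RAM. The helper below is only a sketch: the function name bi_settings is made up, and the values are copied from the slide for this specific ELT workload rather than offered as general recommendations.

```python
def bi_settings(ram_gb):
    """Return the slide's settings as postgresql.conf lines for a host with ram_gb GB of RAM."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    half_ram_mb = ram_gb * 1024 // 2
    settings = {
        "shared_buffers": f"{half_ram_mb}MB",        # 1/2 RAM, per the slide
        "work_mem": "2GB",
        "maintenance_work_mem": "3GB",
        "temp_buffers": "2GB",
        "effective_cache_size": f"{half_ram_mb}MB",  # 1/2 RAM, per the slide
        "max_wal_size": "32GB",
    }
    return [f"{name} = {value}" for name, value in settings.items()]


lines = bi_settings(64)
print("\n".join(lines))
```

Note that work_mem applies per sort or hash operation, so a value this large presumably relies on only a few heavy ELT queries running at once.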

Features

• DDM can be placed on a dedicated server (londiste, pglogical)
• Use COPY / bulk inserts, don't use UPDATE (ke ke ke)
• Think about horizontal and vertical partitioning, and find proper keys for it
• Think about parallelism from the very beginning
• Use TABLESPACES / PARTIAL INDEXES (and more and more disks)
• Have a data storage policy
• Statistics should be collected on a tmpfs volume
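The advice about partitioning keys and early parallelism can be sketched as follows: once every row carries a key such as pharmacy_id, the work splits into independent per-key jobs. The example is a pure-Python stand-in with made-up data. In a real setup each worker would presumably open its own database connection and call a per-pharmacy function like the load_something_to_nds(_pharmacy_id integer) named earlier; that wiring is an assumption and is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def load_pharmacy(pharmacy_id, stage_rows):
    """Stand-in for one per-pharmacy load step: total the amounts for that pharmacy."""
    return pharmacy_id, sum(amount for pid, amount in stage_rows if pid == pharmacy_id)


def load_all(stage_rows, max_workers=4):
    """Run the per-pharmacy step concurrently; jobs share no mutable state."""
    pharmacy_ids = sorted({pid for pid, _ in stage_rows})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda pid: load_pharmacy(pid, stage_rows), pharmacy_ids)
        return dict(results)


# Made-up stage rows of (pharmacy_id, amount)
stage_rows = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0), (2, 1.0)]
totals = load_all(stage_rows)
print(totals)
```

Threads are a reasonable fit when the heavy lifting happens inside the database and the client mostly waits; the key point is that the partitioning key makes the jobs independent.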

Features vol. 2

• Use migrations – Sqitch (by theory)
• You'd better test your ELT – Sqitch (by theory)
• Use pg_stat_statements (add it to monitoring)
• Use profiling – plprofiler 3
• Sometimes you have (not) to use cstore_fdw
• Sometimes you have (not) to use unlogged tables

Pros and cons

• Cons
  • No easy way to scale horizontally
  • Reasonably difficult deployment
• Pros
  • Local data (no big network transfers)
  • Effectively parallelized (thanks to pharmacy_id)
  • PL/pgSQL

Thank you
andy@mastery.pro

Speed limit

• Cubes is not fast (due to serialization)
  • json (12 sec)
  • ujson (4 sec)
  • Postgres JSON output (1.5 sec)
  • DB self time: 0.3-0.7 sec
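The numbers above suggest measuring serialization separately from query time. A minimal way to do that is sketched below with the standard library only: fetch_rows is a made-up stand-in for a query result, and the timings it prints depend entirely on the machine, so none are quoted here. The same timed helper could wrap a third-party encoder such as ujson, or a query that asks Postgres to build the JSON itself (for example with json_agg; whether the talk used that function is not stated).

```python
import json
import time


def fetch_rows(n):
    """Stand-in for a query result: n fact rows as dicts (made-up data)."""
    return [{"order_item_id": i, "pharmacy_id": i % 250, "amount": i * 0.5}
            for i in range(n)]


def timed(fn, *args):
    """Call fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


rows, fetch_time = timed(fetch_rows, 100_000)
payload, dump_time = timed(json.dumps, rows)
print(f"fetch: {fetch_time:.3f}s  serialize: {dump_time:.3f}s  bytes: {len(payload)}")
```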
