Supercharge Your Analytics with ClickHouse
Webinar, September 14th, 2017
Vadim Tkachenko, CTO, Percona
Alexander Zaitsev, CTO, Altinity
© 2017 Percona

Apr 26, 2020



Analytic database landscape


Commercial solutions – fast and expensive

• Vertica
• RedShift
• Teradata
• Etc.

The cost scales with your data


Open Source: somewhat slow, sometimes buggy. But free

• InfiniDB (now MariaDB ColumnStore)
• InfoBright
• GreenPlum (started as commercial)
• Hadoop systems
• Apache Spark


ClickHouse – fast and free! Open-sourced in June 2016


ClickHouse story

• Yandex.ru - Russian search engine
• Yandex Metrika - Russian “Google Analytics”
• Interactive ad hoc reports at multiple petabytes
• 30+ billion events daily

No commercial solution would be cost-effective, and no Open Source solution could handle this scale.

That’s how ClickHouse was born


ClickHouse is extremely fast and scalable.
"We had no choice but to make it fast" (ClickHouse developers)


Initial Requirements

• Fast. Really fast
• Data processing in real time
• Capable of storing petabytes of data
• Fault tolerance in terms of datacenters
• Flexible query language


Technical details

• Vectorized processing
• Massively Parallel Processing
• Shared nothing
• Column store with late materialization (like C-Store and Vertica):
  – Data compression
  – Column locality
  – No random reads

(more details, in Russian: https://clickhouse.yandex/presentations/meetup7/internals.pdf)


Vectorized processing

• Data is represented as small single-dimensional arrays (vectors), easily accessible for CPUs.
• The percentage of instructions spent in interpretation logic is reduced by a factor equal to the vector size.
• The functions that perform work now typically process an array of values in a tight loop.
• Tight loops can be optimized well by compilers, which can generate SIMD instructions automatically.
• Modern CPUs also do well on such loops: out-of-order execution often takes multiple loop iterations into execution concurrently, exploiting the deeply pipelined resources of modern CPUs.
• It was shown that vectorized execution can improve data-intensive (OLAP) queries by a factor of 50.
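
The point about interpretation overhead can be made concrete with a toy model. The sketch below is not ClickHouse code; the function names and the "dispatch" counter are assumptions for illustration. It counts how often the engine would have to decide what to do next when it works value by value versus vector by vector.

```python
# Toy model of interpretation overhead; names are hypothetical.

def run_row_at_a_time(values, op):
    """Dispatch the operator once per value (one interpretation step each)."""
    dispatches = 0
    out = []
    for v in values:
        dispatches += 1
        out.append(op(v))
    return out, dispatches


def run_vectorized(values, op, vector_size):
    """Dispatch once per vector, then process the vector in a tight loop."""
    dispatches = 0
    out = []
    for start in range(0, len(values), vector_size):
        dispatches += 1
        vector = values[start:start + vector_size]
        out.extend([op(v) for v in vector])  # the tight inner loop
    return out, dispatches


def double(v):
    return v * 2


values = list(range(10_000))
rows_out, row_dispatches = run_row_at_a_time(values, double)
vec_out, vec_dispatches = run_vectorized(values, double, vector_size=1_000)
```

Both variants produce the same output, but the vectorized one dispatches 10 times instead of 10,000, a reduction equal to the vector size.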


Column-oriented

* The image is taken from http://www.timestored.com/time-series-data/what-is-a-column-oriented-database


Efficient execution

SELECT Referer, count(*) AS count
FROM hits
WHERE CounterID = 1234 AND Date >= today() - 7
GROUP BY Referer
ORDER BY count DESC
LIMIT 10

(* example from https://clickhouse.yandex/presentations/meetup7/internals.pdf)

• Vectorized processing
• Read only needed columns: CounterID, Referer, Date
• Compression
• With index (CounterID, Date) - fast discard of unneeded blocks
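
A rough sketch of "read only needed columns" and "fast discard of unneeded blocks" follows. It is a simplified model, not the actual ClickHouse storage format: the per-block min/max keys, the toy data, and all names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy data: three counters, ten days each. The "agent" column is
# stored but never read by the query below.
ROWS = [
    {"counter_id": cid, "day": day, "referer": "a" if day % 2 else "b", "agent": "x"}
    for cid in (10, 1234, 5000)
    for day in range(1, 11)
]


def build_blocks(rows, block_size):
    """Sort rows by (counter_id, day) and store each block column by column."""
    rows = sorted(rows, key=lambda r: (r["counter_id"], r["day"]))
    blocks = []
    for i in range(0, len(rows), block_size):
        chunk = rows[i:i + block_size]
        block = {name: [r[name] for r in chunk] for name in chunk[0]}
        block["min_key"] = (chunk[0]["counter_id"], chunk[0]["day"])
        block["max_key"] = (chunk[-1]["counter_id"], chunk[-1]["day"])
        blocks.append(block)
    return blocks


def top_referers(blocks, counter_id, min_day, limit):
    """Count referers for one counter since min_day, skipping blocks by key range."""
    lo = (counter_id, min_day)
    hi = (counter_id, float("inf"))
    counts = Counter()
    blocks_read = 0
    for block in blocks:
        if block["max_key"] < lo or block["min_key"] > hi:
            continue  # the whole block is outside the requested key range
        blocks_read += 1
        # Only the three needed columns are touched.
        for cid, day, ref in zip(block["counter_id"], block["day"], block["referer"]):
            if cid == counter_id and day >= min_day:
                counts[ref] += 1
    return counts.most_common(limit), blocks_read


blocks = build_blocks(ROWS, block_size=5)
top, blocks_read = top_referers(blocks, counter_id=1234, min_day=4, limit=10)
```

In this toy run only 2 of the 6 blocks are read, and the "agent" column is never accessed.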


Single Server - MPP

Use multiple CPU cores on the single server

Real case: Apache log from a real web site – 1.56 billion records

Query:
SELECT extract(request_uri, '(w+)$') p, sum(bytes) sm, count(*) c
FROM apachelog
GROUP BY p
ORDER BY c DESC
LIMIT 100

Query is suited for parallel execution – most time is spent in the extract function


Execution on single server

56 threads / 28 cores | Intel(R) Xeon(R) [email protected]

Query execution time

• With 1 thread allowed: 823.646 sec, ~1.89 mln records/sec
• With 56 threads allowed: 23.587 sec, ~66.14 mln records/sec

Speedup: 34.9x
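
The throughput and speedup figures follow directly from the record count and the two timings. The short check below uses only the numbers from the slides; the per-thread efficiency is a derived figure, not one quoted in the talk.

```python
# Numbers taken from the slides above.
records = 1.56e9          # rows in the Apache log table
time_1_thread = 823.646   # seconds
time_56_threads = 23.587  # seconds

rate_1 = records / time_1_thread / 1e6     # mln records/sec with 1 thread
rate_56 = records / time_56_threads / 1e6  # mln records/sec with 56 threads
speedup = time_1_thread / time_56_threads

# Derived here for illustration: how much of the ideal 56x was achieved.
per_thread_efficiency = speedup / 56

print(f"{rate_1:.2f} mln/s, {rate_56:.2f} mln/s, speedup {speedup:.1f}x, "
      f"efficiency per thread {per_thread_efficiency:.2f}")
```

The speedup exceeds the 28 physical cores, so hyper-threading appears to contribute for this CPU-bound query.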


Query 3

SELECT y, request_uri, cnt
FROM (
    SELECT access_date y, request_uri, count(*) AS cnt
    FROM apachelog
    GROUP BY y, request_uri
    ORDER BY y ASC
)
ORDER BY y, cnt DESC
LIMIT 1 BY y

Less suitable for parallel execution – serialization to build a temporary table for the internal subquery

Speedup: 6.4x


More details in the blog post: https://www.percona.com/blog/2017/09/13/massive-parallel-log-processing-clickhouse/


Data distribution

If a single server is not enough


Distributed query

SELECT foo FROM distributed_table

• Server 1: SELECT foo FROM local_table GROUP BY col1
• Server 2: SELECT foo FROM local_table GROUP BY col1
• Server 3: SELECT foo FROM local_table GROUP BY col1


NYC taxi benchmark

CSV 227 GB, ~1.3 bln rows

SELECT passenger_count, avg(total_amount) FROM trips GROUP BY passenger_count

* Taken from https://clickhouse.yandex/presentations/meetup7/internals.pdf

N Servers    1       3       140
Time, sec    1.224   0.438   0.043
Speedup              x2.8    x28.5
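
A distributed GROUP BY like the avg() query above can be pictured as scatter and gather: every server aggregates its local rows, and the initiating server merges the partial results. The sketch below is a simplified illustration, not ClickHouse internals; the function names and sample rows are assumptions. It keeps (sum, count) per group because averaging the per-server averages would give a wrong answer.

```python
from collections import defaultdict


def local_partial(rows):
    """Runs on each shard: per-group partial state (sum, count)."""
    state = defaultdict(lambda: [0.0, 0])
    for passenger_count, total_amount in rows:
        state[passenger_count][0] += total_amount
        state[passenger_count][1] += 1
    return dict(state)


def merge_partials(partials):
    """Runs on the initiating server: merge states, then finalize avg()."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical rows of (passenger_count, total_amount) on three shards.
shards = [
    [(1, 10.0), (2, 20.0)],
    [(1, 30.0)],
    [(2, 40.0), (2, 60.0)],
]
result = merge_partials(local_partial(rows) for rows in shards)
```

For passenger_count = 2 the merged average is 40.0, while the average of the two shard-level averages (20.0 and 50.0) would be 35.0.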


Reliability

• Any number of replicas
• Any replication topology
• Multi-master
• Cross-DC
• Asynchronous (for speed)
  => Delayed replicas, possible stale data reads
• More on data distribution and replication: https://www.altinity.com/blog/2017/6/5/clickhouse-data-distribution


Benchmarks!


ClickHouse vs Spark vs MariaDB ColumnStore

Wikipedia page counts, fully loaded with the year 2008, ~26 billion rows

https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/


ClickHouse vs Spark vs MariaDB ColumnStore


Cloud: ClickHouse vs RedShift

https://www.altinity.com/blog/2017/6/20/clickhouse-vs-redshift

5 queries based on NYC taxi dataset

Query 1:
SELECT dictGetString('taxi_zones', 'zone', toUInt64(pickup_location_id)) AS zone, count() AS c
FROM yellow_tripdata_staging
GROUP BY pickup_location_id
ORDER BY c DESC
LIMIT 10

• RedShift: 1 instance / 3 instances of ds2.xlarge (4 vCPU / 31 GiB memory)
• ClickHouse: 1 instance r4.xlarge (4 vCPU / 30.5 GiB memory)

[Charts: query time in seconds, per query]


By Yandex, see [2]


ClickHouse – use cases

• Adv networks data
• Web/App analytics
• Ecommerce/Telecom logs
• Online games
• Sensor data
• Monitoring


ClickHouse – wrong cases

• Not an OLTP
• Not a key-value store
• Not a document store
• No UPDATEs/DELETEs – does not support data modification


ClickHouse - limitations

• Custom SQL dialect
• As a consequence, a limited ecosystem (cannot fit into the standard one)
• No deletes/updates:
  – but there are mutable table types (engines)
  – there is a way to connect to external updatable data (dictionaries)
• Somewhat hard to manage for now - no variety of tools to work with
• Somewhat young


Resources for users

• The Documentation is available in English!
  – https://clickhouse.yandex/docs/en/
• GUI Tool
  – http://tabix.io
• Apache Superset https://superset.incubator.apache.org supports ClickHouse
  – a modern, enterprise-ready business intelligence web application
• Grafana integration
  – https://grafana.com/plugins/vertamedia-clickhouse-datasource
• ODBC & JDBC drivers available


Who is using ClickHouse?

Well, besides Yandex

• Carto
  – https://carto.com/blog/inside/geospatial-processing-with-clickhouse/
• Percona
  – We integrate ClickHouse as part of our Percona Monitoring and Management software
• CloudFlare
  – https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/


ClickHouse at CloudFlare

• 33 Nodes
• 8M+ inserts/sec
• 2 PB+ disk size
• More on CloudFlare experience
  – https://www.altinity.com/sfmeetup2017


ClickHouse Demo on MemCloud

Kodiak Data and Altinity now Offer a Cloud Version of ClickHouse

1. FASTEST MPP Open Source DBMS

2. Cutting Edge Cloud for Big Data Apps and Processing

3. World-class ClickHouse Expertise

Try the ClickHouse on MemCloud demo here: http://clickhouse-demo.memcloud.works/


Final words

• Simply try it for your Analytics/Big Data case!
• Need more info - http://clickhouse.yandex

My contact:
• [email protected]
• @VadimTk


Get Your Tickets for Percona Live Europe!

Championing Open Source Databases
▪ MySQL, MongoDB, Open Source Databases
▪ Time Series Databases, PostgreSQL, RocksDB
▪ Developers, Business/Case Studies, Operations
▪ September 25-27th, 2017
▪ Radisson Blu Royal Hotel, Dublin, Ireland

Last Year’s Conference Sold Out! Reserve your spot ASAP.


Talk to Percona Experts at AWS Re:Invent!

Database Performance for Cloud Deployments
▪ Percona Support and Managed Services
  • Amazon RDS, Aurora, Roll Your Own
  • MySQL/MariaDB/MongoDB
  • Reduce costs and optimize performance
▪ Percona Monitoring and Management Demos
  • Point-in-time visibility and historical trending of database performance
  • Detailed query analytics
▪ Booth #1138


ClickHouse Webinar

Alexander Zaitsev

LifeStreet, Altinity


Who am I

• Graduated from Moscow State University in 1999

• Software engineer since 1997

• Developed distributed systems since 2002

• Focused on high performance analytics since 2007

• Director of Engineering at LifeStreet

• Co-founder of Altinity


Agenda

• LifeStreet ClickHouse implementation experience

• MySQL and ClickHouse


• Ad Tech company (ad exchange, ad server, RTB, DSP, DMP) since 2006

• 10,000,000,000+ events/day

• 10+ fact tables, 500+ dimensions, 100+ metrics

• Internal and external users, algos, MLs

• Different solutions tried and used in different years, including MySQL, Oracle, Vertica, many internal POCs

• Now -- ClickHouse


Flashback: ClickHouse at 08/2016

• 1-2 months in Open Source

• Internal Yandex product – no other installations

• No support, roadmap, communicated plans

• 3 official devs

• A number of visible limitations (and many invisible)

• Stories of other doomed open-sourced DBs


Develop production system with “that”?


ClickHouse is/was missing:

• Transactions

• Constraints

• Consistency

• UPDATE/DELETE

• NULLs (not anymore)

• Milliseconds

• Implicit type conversions

• Full SQL support

• Partitioning by any column (date only)

• Cluster management tools


But we tried and succeeded


Migration problem: basic things do not fit


Main Challenges

• Design efficient schema

– Use ClickHouse strengths

– Work around limitations

• Design sharding and replication

• Reliable data ingestion

• Client interfaces


Typical schema: “star”

• Facts
• Dimensions
• Metrics
• Projections


De-normalized vs. normalized

De-normalized (dimensions in fact table):
• Easy
• Simple queries
• No data changes are possible
• Less efficient storage
• Less efficient queries

Normalized (dimensions in separate tables):
• More difficult to maintain
• More complex queries
• Dimensions can change
• More efficient storage
• More efficient queries


Normalized schema: traditional approach - joins

• Limited support in ClickHouse (1 level, cascade sub-selects for multiple)

• Dimension tables are not updatable


Dictionaries - ClickHouse dimensions approach

• Lookup service: key -> value

• Supports multiple external sources (files, databases etc.)

• Refreshable


Dictionaries. Example

SELECT country_name, sum(imps)
FROM T ANY INNER JOIN dim_geo USING (geo_key)
GROUP BY country_name;

vs

SELECT dictGetString('dim_geo', 'country_name', geo_key) country_name, sum(imps)
FROM T
GROUP BY country_name;
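
The difference between the two queries can be sketched outside the database. In the toy model below (names, sample data, and the default value are assumptions for illustration), the join variant builds a lookup table from the dimension rows for each query, while the dictionary variant calls an already loaded key -> value service per row.

```python
from collections import defaultdict

# Hypothetical data: fact rows of (geo_key, imps) and a small geo dimension.
facts = [(1, 100), (2, 50), (1, 25)]
dim_geo_rows = [(1, "US"), (2, "DE")]


def sum_imps_with_join(facts, dim_rows):
    """ANY INNER JOIN: build a hash table per query, keep the first match per key."""
    lookup = {}
    for geo_key, country_name in dim_rows:
        lookup.setdefault(geo_key, country_name)
    totals = defaultdict(int)
    for geo_key, imps in facts:
        if geo_key in lookup:  # INNER: rows without a match are dropped
            totals[lookup[geo_key]] += imps
    return dict(totals)


# The dictionary is loaded once and kept in memory between queries.
dim_geo_dictionary = dict(dim_geo_rows)


def dict_get_string(dictionary, key, default=""):
    """Lookup service: key -> value, with a default for unknown keys."""
    return dictionary.get(key, default)


def sum_imps_with_dictionary(facts, dictionary):
    totals = defaultdict(int)
    for geo_key, imps in facts:
        totals[dict_get_string(dictionary, geo_key)] += imps
    return dict(totals)


joined = sum_imps_with_join(facts, dim_geo_rows)
looked_up = sum_imps_with_dictionary(facts, dim_geo_dictionary)
```

Both variants return the same totals here. In this sketch a key missing from the dictionary falls into the default group, whereas the inner join drops the row.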


Dictionaries. Configuration

<dictionary>
    <name></name>
    <source> … </source>
    <lifetime> ... </lifetime>
    <layout> … </layout>
    <structure>
        <id> ... </id>
        <attribute> ... </attribute>
        <attribute> ... </attribute>
        ...
    </structure>
</dictionary>


Dictionaries. Sources

• file

• mysql table

• clickhouse table

• odbc data source

• executable script

• http service


Dictionaries. Layouts

• flat

• hashed

• cache

• complex_key_hashed

• range_hashed


Dictionaries. range_hashed

• ‘Effective Dated’ queries

<layout>
    <range_hashed />
</layout>
<structure>
    <id>
        <name>id</name>
    </id>
    <range_min>
        <name>start_date</name>
    </range_min>
    <range_max>
        <name>end_date</name>
    </range_max>
    ...
</structure>

dictGetFloat32('srv_ad_serving_costs', 'ad_imps_cpm', toUInt64(0), event_day)
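
An "effective dated" lookup of this kind can be sketched as a key plus a date that selects the value whose validity range contains the date. The rows, the inclusive range bounds, and the default of 0.0 below are assumptions for illustration, not documented ClickHouse behavior.

```python
from datetime import date

# Hypothetical source rows: (id, start_date, end_date, ad_imps_cpm).
rows = [
    (0, date(2017, 1, 1), date(2017, 6, 30), 0.10),
    (0, date(2017, 7, 1), date(2017, 12, 31), 0.12),
]


def build_range_dict(rows):
    """Group validity ranges by key, as a range-keyed dictionary would."""
    ranges = {}
    for key, start, end, value in rows:
        ranges.setdefault(key, []).append((start, end, value))
    return ranges


def dict_get_float(range_dict, key, day, default=0.0):
    """Return the value whose [start, end] range contains the given day."""
    for start, end, value in range_dict.get(key, []):
        if start <= day <= end:
            return value
    return default


costs = build_range_dict(rows)
cpm_march = dict_get_float(costs, 0, date(2017, 3, 15))
cpm_september = dict_get_float(costs, 0, date(2017, 9, 14))
```

The same key returns a different CPM depending on the event day, which is what makes the lookup usable directly in a query over historical facts.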


Dictionaries. Update values

• By timer (default)

• Automatic for MySQL MyISAM

• Using ‘invalidate_query’

• Manually touching config file

• N dict * M nodes = N * M DB connections


Dictionaries. Restrictions

• ‘Normal’ keys are only UInt64

• No on demand update (added in 1.1.54289)

• Every cluster node has its own copy

• XML config (DDL would be better)


Tables

• Engines

• Sharding

• Distribution

• Replication


Engine = ?

• In memory:
  – Memory
  – Buffer
  – Join
  – Set
• On disk:
  – Log, TinyLog
  – MergeTree family
• Virtual:
  – Merge
  – Distributed
  – Dictionary
  – Null
• Special purpose:
  – View
  – Materialized View


Merge tree

• What is ‘merge’

• PK sorting

• Date partitioning

• Query performance
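
As a rough illustration of the "merge" and "PK sorting" points: inserted data lands in parts that are each sorted by primary key, and parts are later combined into larger sorted parts. The sketch below models only that idea with Python lists; the part contents and function names are assumptions, and real parts are columnar files on disk.

```python
import bisect
import heapq


def merge_parts(parts):
    """Merge parts that are each sorted by primary key into one sorted part."""
    return list(heapq.merge(*parts))


def rows_for_key(part, key):
    """Because the part is sorted, rows for one integer key form a contiguous range."""
    lo = bisect.bisect_left(part, (key,))
    hi = bisect.bisect_left(part, (key + 1,))
    return part[lo:hi]


# Hypothetical parts with primary key (counter_id, day), one part per insert.
part_a = [(1, "2017-09-01"), (3, "2017-09-01")]
part_b = [(1, "2017-09-02"), (2, "2017-09-01")]

merged = merge_parts([part_a, part_b])
counter_1 = rows_for_key(merged, 1)
```

After the merge, a query for one key reads a single contiguous range instead of probing every part.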


Data Load

• Multiple formats are supported, including CSV, TSV, JSONs, native binary

• Error handling

• Simple Transformations

• Load locally (better) or distributed (possible)

• Temp tables help

• Replicated tables help with de-dup


The power of Materialized Views

• MV is a table, i.e. engine, replication etc.

• Updated synchronously

• SummingMergeTree – consistent aggregation

• Alters are not straightforward, but possible
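
The combination of a materialized view with SummingMergeTree can be pictured as follows: each insert adds a small pre-aggregated part, and merges later sum rows that share a key. This is a simplified sketch with made-up keys and function names, not the engine's implementation.

```python
from collections import defaultdict


def summing_merge(parts):
    """Merge parts, summing the metric of rows that share the same key."""
    totals = defaultdict(int)
    for part in parts:
        for key, imps in part:
            totals[key] += imps
    return sorted(totals.items())


def query_sum(parts, key):
    """A query still sums over all parts, merged or not."""
    return sum(imps for part in parts for k, imps in part if k == key)


# Hypothetical parts with key (day, country) and metric imps, one per insert.
part1 = [(("2017-09-14", "US"), 10), (("2017-09-14", "DE"), 5)]
part2 = [(("2017-09-14", "US"), 7)]

merged_part = summing_merge([part1, part2])
us_key = ("2017-09-14", "US")
before_merge = query_sum([part1, part2], us_key)
after_merge = query_sum([merged_part], us_key)
```

The query result is the same before and after the merge, which is why such queries keep using sum() with GROUP BY rather than relying on parts being fully merged.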


Data Load Diagram

[Diagram of a ClickHouse node. Components: log files, realtime producers, temp tables (local), buffer tables (local), fact tables (shard), SummingMergeTree tables (shard), MySQL, dictionaries. Arrow labels: INSERT, buffer flush, MV.]


Updates and deletes

• Dictionaries are updatable

• Replacing and Collapsing merge trees

– eventual updates

– SELECT … FINAL

• Partitions
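
"Eventual updates" with a replacing merge can be sketched like this: an update is written as a new row with the same key, and the older version disappears only when parts are merged (or when the query asks for the final state). The row layout with an explicit version column and the function name are assumptions for illustration.

```python
def replacing_merge(parts):
    """Keep only the row with the highest version for each key."""
    latest = {}
    for part in parts:
        for key, version, value in part:
            if key not in latest or version >= latest[key][0]:
                latest[key] = (version, value)
    return sorted((key, version, value) for key, (version, value) in latest.items())


# Hypothetical rows of (key, version, value); part2 "updates" key 1.
part1 = [(1, 1, "draft"), (2, 1, "new")]
part2 = [(1, 2, "published")]

# Before the parts are merged, a plain read may still see both versions of key 1.
unmerged_rows = part1 + part2

# After a merge, or when the final state is requested, one row per key remains.
final_rows = replacing_merge([part1, part2])
```

The gap between the two reads is the "eventual" part: correctness of a plain read depends on whether the merge has already happened.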


Sharding and Replication

• Sharding and Distribution => Performance

– Fact tables and MVs – distributed over multiple shards

– Dimension tables and dicts – replicated at every node (local joins and filters)

• Replication => Reliability

– 2-3 replicas per shard

– Cross DC
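
The layout described above can be sketched in a few lines: fact rows are spread over shards by a hash of some key, while the small dimension data is copied to every node so that joins and filters stay local. The hash function (CRC32), the choice of sharding key, and the sample data are assumptions for illustration.

```python
import zlib


def shard_for(key, n_shards):
    """Pick a shard deterministically from a hash of the sharding key."""
    return zlib.crc32(str(key).encode()) % n_shards


def place_facts(fact_rows, n_shards):
    """Distribute fact rows over shards by their first field."""
    shards = [[] for _ in range(n_shards)]
    for row in fact_rows:
        shards[shard_for(row[0], n_shards)].append(row)
    return shards


# Hypothetical fact rows of (user_id, imps) and a small dimension.
fact_rows = [(user_id, 1) for user_id in range(100)]
dim_geo = {1: "US", 2: "DE"}

n_shards = 3
fact_shards = place_facts(fact_rows, n_shards)
# Every node keeps its own full copy of the dimension data.
dim_copies = [dict(dim_geo) for _ in range(n_shards)]
```

Each fact row lives on exactly one shard, and the same key always maps to the same shard, so per-key work never needs rows from another server.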


SQL

• Supports basic SQL syntax

• Non-standard JOINs implementation:

– 1 level only

– ANY vs ALL

– only USING

• Aliasing everywhere

• Array and nested data types, lambda-expressions, ARRAY JOIN

• GLOBAL IN, GLOBAL JOIN

• Approximate queries

• TopX support (LIMIT N BY)
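
The effect of LIMIT N BY can be sketched as "keep the first N rows for each distinct value of the BY expression, in the current sort order". The helper name and the sample rows below are assumptions for illustration.

```python
def limit_n_by(rows, n, key):
    """Keep at most n rows per key value, preserving the input order."""
    seen = {}
    kept = []
    for row in rows:
        k = key(row)
        seen[k] = seen.get(k, 0) + 1
        if seen[k] <= n:
            kept.append(row)
    return kept


# Hypothetical rows of (day, request_uri, cnt).
rows = [
    ("2017-09-01", "/a", 5),
    ("2017-09-01", "/b", 9),
    ("2017-09-02", "/a", 3),
    ("2017-09-02", "/c", 4),
]

# Equivalent in spirit to: ORDER BY day, cnt DESC LIMIT 1 BY day
ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
top_per_day = limit_n_by(ordered, 1, key=lambda r: r[0])
```

This yields the most requested URI for each day, the same "top X per group" shape as Query 3 earlier in the deck.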


Main Challenges Revisited

• Design efficient schema

– Use ClickHouse strengths

– Work around limitations

• Design sharding and replication

• Reliable data ingestion

• Client interfaces


Migration project timelines

• August 2016: POC

• October 2016: first test runs

• December 2016: production scale data load:

– 10-50B events/day, 20TB data/day

– 12 x 2 servers with 12x4TB RAID10

• March 2017: Client API ready, starting migration

– 30+ client types, 20 req/s query load

• May 2017: extension to 20 x 3 servers

• June 2017: migration completed


ClickHouse at fall 2017

• 1+ year Open Source

• 100+ prod installs worldwide

• Public changelogs, roadmap, and plans

• 10+ devs, community contributors

• Active community, blogs, case studies

• A lot of features added by community requests

• Support by Altinity


So now it is much easier


ClickHouse and MySQL

• MySQL is widespread but weak for analytics

– TokuDB, InfiniDB somewhat help

• ClickHouse is best in analytics

How to combine?


Imagine

MySQL flexibility at ClickHouse speed?


Dreams….


ClickHouse with MySQL

• ProxySQL to access ClickHouse data via MySQL protocol (already available)
• Binlogs integration to load MySQL data in ClickHouse in realtime (in progress)

[Diagram: MySQL and CH (ClickHouse), linked by ProxySQL and a binlog consumer]


ClickHouse instead of MySQL

• Web logs analytics

• Monitoring data collection and analysis

– Percona’s PMM

– Infinidat InfiniMetrics

• Other time series apps


Questions?

Contact me:

[email protected]@altinity.com
skype: alex.zaitsev