
Redshift VS BigQuery

Jan 23, 2018



Kostas Pardalis


Amazon Redshift

Released in 2012 (beta)

Based on ParAccel (itself derived from PostgreSQL)

Designed for OLAP and BI applications

Relational, columnar database

Petabyte scale; exabyte scale when querying data in S3 through Redshift Spectrum


Google BigQuery

Evolution of Dremel (2006)

Initially launched in 2010

Web service built on top of Dremel technology

More of a hybrid system (columnar + nested data)

Petabyte scale


Amazon Redshift VS Google BigQuery

Amazon Redshift: Built on top of a proven technology; relational; SQL; aimed at analysts.

Google BigQuery: Built from scratch; nested data structures are first-class citizens; NoSQL; aimed at developers.


Loading Data


Amazon Redshift: S3; Kinesis; CSV, Avro, JSON.

Google BigQuery: Google Cloud Storage; streaming inserts; Google Analytics Premium; CSV, Avro, JSON.
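On the Redshift side, bulk loading from S3 is done with the COPY command. A minimal Python sketch that assembles such a statement, assuming an IAM role is attached to the cluster; the table name, bucket and role ARN are placeholders, and `build_copy` is a hypothetical helper, not part of any AWS library:

```python
# Hypothetical helper: builds a Redshift COPY statement for files staged in S3.
# Table, S3 prefix and IAM role ARN passed in are placeholders, not real resources.

FORMAT_CLAUSES = {
    "csv": "FORMAT AS CSV",
    "json": "FORMAT AS JSON 'auto'",  # map JSON keys to columns by name
    "avro": "FORMAT AS AVRO 'auto'",  # map Avro fields to columns by name
}


def build_copy(table: str, s3_prefix: str, iam_role: str, fmt: str) -> str:
    """Return a COPY statement loading every S3 object under s3_prefix into table."""
    try:
        format_clause = FORMAT_CLAUSES[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"{format_clause};"
    )


if __name__ == "__main__":
    print(build_copy(
        "events",
        "s3://example-bucket/events/2018-01-23/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftLoader",
        "json",
    ))
```

The resulting statement is executed over an ordinary SQL connection to the cluster. The BigQuery counterpart is a load job from Google Cloud Storage (for example with the `bq load` command-line tool) rather than a SQL statement.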


Data Modeling


Amazon Redshift: Tables are grouped into schemas.

Google BigQuery: Tables are grouped into datasets.


Data Types


Redshift: Closer to the standard SQL data types (e.g. INTEGER/INT4, BIGINT/INT8), but does not support the full range of PostgreSQL data types.

BigQuery: Supports a smaller set of data types. But...


Redshift: Very basic support for JSON (stored as VARCHAR and parsed with functions such as JSON_EXTRACT_PATH_TEXT).

BigQuery: Supports ARRAY and STRUCT types. Nested data structures are first-class citizens.
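To make the difference concrete: in BigQuery an order and its line items can live in a single row (an ARRAY of STRUCTs), while in Redshift the same data is usually split into a parent table and a child table joined on a key. A minimal Python sketch of that split, using a hypothetical order record and a hypothetical `flatten_order` helper:

```python
from typing import Any


def flatten_order(order: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split a nested order (one BigQuery-style row with an "items" array)
    into one parent row plus one child row per item, keyed by order_id,
    which is the shape a relational warehouse such as Redshift usually needs."""
    parent = {k: v for k, v in order.items() if k != "items"}
    children = [
        {"order_id": order["order_id"], "line_no": i, **item}
        for i, item in enumerate(order.get("items", []), start=1)
    ]
    return parent, children


if __name__ == "__main__":
    # Hypothetical record; field names are for illustration only.
    order = {
        "order_id": 7,
        "customer": "acme",
        "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-9", "qty": 1}],
    }
    parent, children = flatten_order(order)
    print(parent)
    print(children)
```

In BigQuery the reverse direction happens at query time: `UNNEST` turns the array back into rows, so the nested form never has to be split on load.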


Working with Data


Data Manipulation

BigQuery used to be append-only; it now supports UPDATE and DELETE through DML, but still with limitations.

Redshift has always supported this via SQL, but with a catch: VACUUM.
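The catch is that in Redshift a DELETE only marks rows as deleted, and an UPDATE is a delete plus an insert, so the space is not reclaimed until VACUUM runs. A deliberately simplified Python model of that behaviour (the `Table` and `Row` classes are illustrative; this is not Redshift's storage engine):

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    key: int
    value: str
    deleted: bool = False  # "ghost" row: still stored, no longer visible


@dataclass
class Table:
    rows: list[Row] = field(default_factory=list)

    def insert(self, key: int, value: str) -> None:
        self.rows.append(Row(key, value))

    def delete(self, key: int) -> None:
        # Only mark matching rows; the storage is not freed yet.
        for row in self.rows:
            if row.key == key and not row.deleted:
                row.deleted = True

    def update(self, key: int, value: str) -> None:
        self.delete(key)         # old version becomes a ghost row
        self.insert(key, value)  # new version is appended

    def visible(self) -> list[Row]:
        return [r for r in self.rows if not r.deleted]

    def vacuum(self) -> int:
        """Physically drop ghost rows; return how many were reclaimed."""
        before = len(self.rows)
        self.rows = self.visible()
        return before - len(self.rows)


if __name__ == "__main__":
    t = Table()
    t.insert(1, "a")
    t.insert(2, "b")
    t.update(1, "c")  # ghost (1, "a") + new (1, "c")
    t.delete(2)       # ghost (2, "b")
    print(len(t.rows), len(t.visible()))  # 3 1
    print(t.vacuum(), len(t.rows))        # 2 1
```

Update-heavy workloads therefore accumulate ghost rows that are still scanned until the next VACUUM.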


Table Manipulation

BigQuery: Limited and expensive via Standard SQL, or via the HTTP API (but then you have to unload and reload the table).

Redshift: Supported via SQL

Both support views, but not materialized views.


Data Consistency


Redshift supports transactions; BigQuery does not. Deduplication is harder to achieve on BigQuery (and also costly), and even more complex when streaming.
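A common way to deduplicate after loading is to keep only the latest version of each key, typically with a `ROW_NUMBER()` window query. A minimal Python sketch of the same logic; the column names `id` and `updated_at` are hypothetical:

```python
from typing import Any, Iterable


def dedupe_latest(rows: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one row per "id": the one with the greatest "updated_at".

    In-memory equivalent of filtering on
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1
    """
    latest: dict[Any, dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["id"]] = row
    return list(latest.values())


if __name__ == "__main__":
    events = [
        {"id": 1, "updated_at": 1, "status": "created"},
        {"id": 1, "updated_at": 2, "status": "paid"},
        {"id": 1, "updated_at": 2, "status": "paid"},  # duplicate delivery
        {"id": 2, "updated_at": 1, "status": "created"},
    ]
    print(dedupe_latest(events))
```

On BigQuery such a query has to read the whole table (or partition) each time it runs, and on-demand queries are billed by the bytes they scan, which is where the extra cost comes from.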


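Kinesis: At-least-once delivery semantics, so the same record can arrive more than once.

BigQuery: Best-effort deduplication of streaming inserts, based on the insertId sent with each row and limited to a time window.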
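Put together: a record that Kinesis delivers twice is only dropped by BigQuery if the retry carries the same insertId and arrives inside the deduplication window. A simplified Python model of that behaviour; the class name and the `window_s` parameter are illustrative assumptions, not BigQuery's implementation or a documented window length:

```python
class InsertIdDeduplicator:
    """Best-effort dedup: a row is dropped only if the same insertId
    was accepted within the last `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.seen: dict[str, float] = {}  # insertId -> time it was accepted

    def accept(self, insert_id: str, now: float) -> bool:
        """Return True if the row is stored, False if dropped as a duplicate."""
        # Forget ids that fell out of the window; this is what makes it "best effort".
        self.seen = {i: t for i, t in self.seen.items() if now - t <= self.window_s}
        if insert_id in self.seen:
            return False
        self.seen[insert_id] = now
        return True


if __name__ == "__main__":
    dedup = InsertIdDeduplicator(window_s=60)
    print(dedup.accept("evt-1", now=0))    # True: first delivery
    print(dedup.accept("evt-1", now=30))   # False: retry inside the window
    print(dedup.accept("evt-1", now=100))  # True: late retry slips through
```

The late retry in the last call is stored a second time, which is why exact deduplication of streamed data still needs a batch step afterwards.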


Cluster Management


Here is where BigQuery really shines: it is fully managed, with high availability (HA) built in.

Redshift does not completely abstract the hardware away from the user, and it is difficult to run it as an HA service.

This changes with Redshift Spectrum.


Connectivity

Redshift: An API plus full JDBC/ODBC support, and access to the standard PostgreSQL tools.

BigQuery: Mainly through the REST API; JDBC/ODBC drivers are available only for queries.


Authentication

Redshift: AWS IAM

BigQuery: OAuth


Quotas

Amazon Redshift: Resources are capped by your cluster size. No quotas on inserts, updates, etc.

Google BigQuery: 2,000 slots per project (on demand). Encourages the append-only model with strict DML quotas.

Both have a limit of 50 concurrent queries. Cluster resizing is a pain with Redshift.
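Because both systems cap concurrent queries, a job that fires many queries at once can throttle itself on the client instead of running into the limit. A minimal asyncio sketch; `run_query` is a hypothetical placeholder for a real Redshift or BigQuery client call, and the cap of 50 simply mirrors the figure above:

```python
import asyncio

MAX_CONCURRENT = 50  # mirrors the concurrency limit quoted above


async def run_query(sql: str) -> str:
    """Hypothetical stand-in for a real client call."""
    await asyncio.sleep(0.01)  # simulated network + execution time
    return f"done: {sql}"


async def run_all(queries: list[str], limit: int = MAX_CONCURRENT) -> tuple[list[str], int]:
    """Run all queries with at most `limit` in flight; also report the peak."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(sql: str) -> str:
        nonlocal in_flight, peak
        async with sem:  # waits while `limit` queries are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_query(sql)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(q) for q in queries))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all([f"SELECT {i}" for i in range(200)]))
    print(len(results), "queries, peak concurrency", peak)
```

A semaphore keeps the sketch simple; a production job would also retry queries rejected for quota reasons.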


Optimization

Redshift: Distribution Keys (Partitioning), Sort Keys, Column Compression

BigQuery: You don’t have to worry about any of this, but it does let you define time-based partitioning.
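Of the Redshift options, the distribution key is the least obvious: it decides which slice of the cluster stores each row. A toy Python model of KEY distribution; the crc32 hash, the slice count and the `slice_for`/`distribute` helpers are assumptions for illustration, since Redshift's internal hash function is not modelled here:

```python
import zlib
from collections import defaultdict
from typing import Any


def slice_for(key: str, num_slices: int) -> int:
    """Pick a slice by hashing the key (crc32 is an arbitrary stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def distribute(rows: list[dict[str, Any]], dist_key: str,
               num_slices: int) -> dict[int, list[dict[str, Any]]]:
    """Group rows by the slice their distribution-key value hashes to."""
    slices: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        slices[slice_for(str(row[dist_key]), num_slices)].append(row)
    return dict(slices)


if __name__ == "__main__":
    orders = [
        {"customer_id": "c1", "amount": 10},
        {"customer_id": "c2", "amount": 5},
        {"customer_id": "c1", "amount": 7},
    ]
    # All c1 orders land on one slice; if the customers table is also
    # distributed on customer_id, its c1 row lands there too, so the join is local.
    for slice_id, slice_rows in sorted(distribute(orders, "customer_id", 4).items()):
        print(slice_id, slice_rows)
```

The flip side is skew: a key with few distinct values, or one very frequent value, concentrates data and work on a few slices.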


With great power comes great responsibility: Vacuuming
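Sort keys pay off through zone maps: Redshift keeps the minimum and maximum value of each 1 MB block, so a range filter on the sort key can skip every block whose range cannot match. Rows added later go into an unsorted region until VACUUM re-sorts the table, which is part of that responsibility. A toy Python model of block skipping; the block size and data are made up, and `build_zone_map`/`blocks_to_scan` are illustrative helpers:

```python
import random


def build_zone_map(values: list[int], block_size: int) -> list[tuple[int, int]]:
    """Split a column into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b)) for b in blocks]


def blocks_to_scan(zone_map: list[tuple[int, int]], lo: int, hi: int) -> int:
    """Count blocks whose [min, max] overlaps the filter range [lo, hi]."""
    return sum(1 for bmin, bmax in zone_map if bmax >= lo and bmin <= hi)


if __name__ == "__main__":
    values = list(range(1000))          # e.g. timestamps
    shuffled = values[:]
    random.Random(42).shuffle(shuffled)  # same data, loaded without sorting

    sorted_zm = build_zone_map(sorted(values), block_size=100)
    unsorted_zm = build_zone_map(shuffled, block_size=100)
    # A narrow range filter, e.g. "one day" of timestamps.
    print("sorted:", blocks_to_scan(sorted_zm, 250, 260), "of", len(sorted_zm))
    print("unsorted:", blocks_to_scan(unsorted_zm, 250, 260), "of", len(unsorted_zm))
```

With sorted data the narrow filter touches a single block; with shuffled data nearly every block's range spans the filter, so little or nothing can be skipped.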


With BigQuery, you can only optimize your queries at the statement level and partition tables over time.
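Time partitioning matters mostly for cost: a query that filters on the partition date only reads, and is only billed for, the partitions it touches. A toy Python model of partition pruning; the table sizes are made up and `bytes_scanned` is a hypothetical helper:

```python
from datetime import date


def bytes_scanned(partitions: dict[date, int], start: date, end: date) -> int:
    """Bytes read by a query filtered to start <= partition_date <= end."""
    return sum(size for day, size in partitions.items() if start <= day <= end)


if __name__ == "__main__":
    # Hypothetical table: 30 daily partitions of 50 GiB each.
    table = {date(2018, 1, d): 50 * 2**30 for d in range(1, 31)}
    full = sum(table.values())
    week = bytes_scanned(table, date(2018, 1, 17), date(2018, 1, 23))
    print(f"full scan: {full / 2**40:.2f} TiB, one week: {week / 2**40:.2f} TiB")
```

BigQuery also reads only the columns a query references, so selecting fewer columns reduces the scanned bytes further on top of partition pruning.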


Costs

BigQuery: Costs are more difficult to estimate, but they scale with usage: $5 per TB of data processed (on-demand pricing).

Redshift: you know exactly how much you will pay but you pay regardless of usage.
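The two pricing models can be compared with a back-of-the-envelope calculation: the break-even point is the monthly volume of data scanned at which on-demand BigQuery costs as much as keeping a Redshift cluster running. A minimal Python sketch; the $5/TB figure comes from the slide above (BigQuery bills 1 TB as 2^40 bytes), while the Redshift node-hour price and cluster size are hypothetical inputs, not current AWS prices:

```python
TIB = 2**40                     # BigQuery bills 1 TB as 2**40 bytes
BQ_ON_DEMAND_USD_PER_TB = 5.0   # on-demand query price quoted above


def bigquery_query_cost(bytes_processed: int) -> float:
    """On-demand cost of the bytes a query processes (storage not included)."""
    return bytes_processed / TIB * BQ_ON_DEMAND_USD_PER_TB


def redshift_cluster_cost(usd_per_node_hour: float, nodes: int, hours: float) -> float:
    """Paid for every hour the cluster runs, whether or not it is queried."""
    return usd_per_node_hour * nodes * hours


def break_even_tb(usd_per_node_hour: float, nodes: int, hours: float) -> float:
    """TB scanned per period at which both models cost the same."""
    return redshift_cluster_cost(usd_per_node_hour, nodes, hours) / BQ_ON_DEMAND_USD_PER_TB


if __name__ == "__main__":
    # Hypothetical cluster: 2 nodes at $0.25/node-hour for 720 hours (30 days).
    print(redshift_cluster_cost(0.25, 2, 720))  # 360.0
    print(break_even_tb(0.25, 2, 720))          # 72.0
    print(bigquery_query_cost(500 * 2**30))     # cost of scanning 500 GiB
```

The sketch ignores BigQuery storage charges, its monthly free tier and minimum billing per query, as well as Redshift reserved-instance discounts, so it only gives an order of magnitude.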


Ecosystem


Performance


Redshift or BigQuery?


Amazon Redshift VS Google BigQuery

The similarities are greater than the differences.


Amazon Redshift: More predictable costs; more intuitive data modeling (for analysts); options for optimization.

Google BigQuery: Easier and cheaper to start with; good for nested data; easier to work with time series.



www.blendo.co

Get the free Amazon Redshift guide http://bit.ly/redshift-guide