BigQuery Architecture

Feb 09, 2022

dariahiddleston
Transcript
Page 1: BigQuery Architecture
Page 2: BigQuery Architecture

Crunching Big Data with BigQuery

Ryan Boyd, Developer Advocate
http://profiles.google.com/ryan.boyd
@ryguyrg

XLDB
Tuesday, September 11th, 2012

Page 3: BigQuery Architecture

How BIG is big?

Page 4: BigQuery Architecture

1 million rows?

Page 5: BigQuery Architecture

[Graphic: 10 boxes, each labeled "1 million"]

10 million rows?

Page 6: BigQuery Architecture

[Graphic: a grid of boxes, each labeled "1 million"]

100 million rows?

Page 7: BigQuery Architecture

[Graphic: a larger grid of boxes, each labeled "1 million"]

500 million rows!

Page 8: BigQuery Architecture

Big Data at Google

60 hours
100 million gigabytes
425 million users

Page 9: BigQuery Architecture
Page 10: BigQuery Architecture
Page 11: BigQuery Architecture

Google's internal technology: Dremel

Page 12: BigQuery Architecture

Big Data at Google - Finding top installed market apps

SELECT top(appId, 20) AS app, count(*) AS count
FROM installlog.2012
ORDER BY count DESC;

Result in ~20 seconds!

Page 13: BigQuery Architecture

Big Data at Google - Finding slow servers

SELECT count(*) AS count, source_machine AS machine
FROM product.product_log.live
WHERE elapsed_time > 4000
GROUP BY source_machine
ORDER BY count DESC

Result in ~20 seconds!

Page 14: BigQuery Architecture

BigQuery gives you this power

Store data with reliability, redundancy and consistency

Go from data to meaning

Quickly!

At scale ...

Page 15: BigQuery Architecture

How are developers using it?

Game and social media analytics

Advertising campaign optimization

Sensor data analysis

Infrastructure monitoring

Page 16: BigQuery Architecture

Agenda

● Show the power
● Running your queries
● Preparing your data
● Visualizing your data
● Underlying architecture design
● Latest and greatest features

Page 17: BigQuery Architecture

Let's dive in!

Page 18: BigQuery Architecture

It's an API

Page 19: BigQuery Architecture

Google Cloud Storage

Upload your Data

Page 20: BigQuery Architecture

BigQuery

Your data is loaded

Google Cloud Storage

Page 21: BigQuery Architecture

Load your data into BigQuery

POST https://www.googleapis.com/bigquery/v2/projects/605902584318/jobs

{
  "jobReference": {"projectId": "605902584318"},
  "configuration": {
    "load": {
      "destinationTable": {
        "projectId": "605902584318",
        "datasetId": "my_dataset",
        "tableId": "widget_sales"
      },
      "sourceUris": ["gs://widget-sales-data/2012080100.csv"],
      "schema": {
        "fields": [
          {"name": "widget", "type": "string"},
          ...
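
The load request above can also be assembled in code. The following is a minimal Python sketch, assuming a hypothetical helper named `build_load_job`. It only builds the URL and JSON body shown on the slide; authentication and the HTTP call itself are left out, and only the one schema field visible on the slide is included, since the rest is elided there.

```python
import json

JOBS_URL = "https://www.googleapis.com/bigquery/v2/projects/{project_id}/jobs"


def build_load_job(project_id, dataset_id, table_id, source_uris, fields):
    """Return (url, body) for a load job shaped like the request on the slide.

    build_load_job is a hypothetical helper, not part of any Google library.
    """
    body = {
        "jobReference": {"projectId": project_id},
        "configuration": {
            "load": {
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "sourceUris": list(source_uris),
                "schema": {"fields": list(fields)},
            }
        },
    }
    return JOBS_URL.format(project_id=project_id), body


url, body = build_load_job(
    "605902584318",
    "my_dataset",
    "widget_sales",
    ["gs://widget-sales-data/2012080100.csv"],
    # Only the first field is visible on the slide; the others are elided there.
    [{"name": "widget", "type": "string"}],
)
print("POST", url)
print(json.dumps(body, indent=2))
```

Keeping the body as a plain dictionary makes it easy to hand to whichever HTTP or client library you already use.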

Page 22: BigQuery Architecture

Query Away!

POST https://www.googleapis.com/bigquery/v2/projects/605902584318/jobs

{
  "jobReference": {"projectId": "605902584318"},
  "query": "SELECT TOP(widget, 50), COUNT(*) AS sale_count FROM widget_sales",
  "maxResults": 100,
  "apiVersion": "v2"
}

Page 23: BigQuery Architecture

Libraries

● Java
● Python
● .NET
● PHP
● JavaScript
● Apps Script
● ... more ...

Page 24: BigQuery Architecture

Libraries - Example JavaScript query

var request = gapi.client.bigquery.jobs.query({
  'projectId': project_id,
  'timeoutMs': '30000',
  'query': 'SELECT state, AVG(mother_age) AS theav FROM [publicdata:samples.natality] WHERE year=2000 AND ever_born=1 GROUP BY state ORDER BY theav DESC;'
});

request.execute(function(response) {
  console.log(response);
  $.each(response.result.rows, function(i, item) {
    ...

Page 25: BigQuery Architecture

BigQuery UI

bigquery.cloud.google.com

Page 26: BigQuery Architecture

Preparing your Data

Page 27: BigQuery Architecture

Schema definition

birth_record

parent_id_mother
parent_id_father
plurality
is_male
race
weight

parents

id
race
age
cigarette_use
state

Page 28: BigQuery Architecture

Schema definition

birth_record

mother_race
mother_age
mother_cigarette_use
mother_state
father_race
father_age
father_cigarette_use
father_state
plurality
is_male
race
weight
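
The two schema slides go from a normalized pair of tables (birth_record referencing parents) to a single flat birth_record with mother_ and father_ columns. A rough Python sketch of that flattening follows. The function name `denormalize` is hypothetical and the sample values are invented for illustration.

```python
def denormalize(birth, parents_by_id):
    """Fold the referenced parent rows into one flat birth_record row."""
    flat = {}
    for role in ("mother", "father"):
        parent = parents_by_id[birth[f"parent_id_{role}"]]
        for column in ("race", "age", "cigarette_use", "state"):
            flat[f"{role}_{column}"] = parent[column]
    for column in ("plurality", "is_male", "race", "weight"):
        flat[column] = birth[column]
    return flat


# Sample values are invented for illustration only.
parents = {
    1: {"id": 1, "race": 1, "age": 20, "cigarette_use": False, "state": "AL"},
    2: {"id": 2, "race": 1, "age": 22, "cigarette_use": True, "state": "AL"},
}
birth = {"parent_id_mother": 1, "parent_id_father": 2,
         "plurality": 1, "is_male": True, "race": 1, "weight": 7.813}

row = denormalize(birth, parents)
print(list(row))
```

The flat row carries everything a query needs, so no join against a parents table is required at query time.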

Page 29: BigQuery Architecture

Schema definition - sharding

birth_record_2011

mother_race
mother_age
mother_cigarette_use
mother_state
father_race
father_age
father_cigarette_use
father_state
plurality
is_male
race
weight

birth_record_2012

mother_race
mother_age
mother_cigarette_use
mother_state
father_race
father_age
father_cigarette_use
father_state
plurality
is_male
race
weight

birth_record_2013

birth_record_2014

birth_record_2015

birth_record_2016
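
Sharding by year, as in the table names above, amounts to routing each row to a table named after its year. A small Python sketch, assuming each row carries a `year` value (an assumed field used here only to pick the shard) and using hypothetical helper names:

```python
from collections import defaultdict


def shard_table_name(year):
    """Name of the per-year table, following the birth_record_<year> pattern."""
    return f"birth_record_{year}"


def shard_rows(rows):
    """Group rows by destination table using each row's 'year' value."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_table_name(row["year"])].append(row)
    return dict(shards)


# Sample rows are invented for illustration only.
rows = [{"year": 2011, "weight": 7.1}, {"year": 2012, "weight": 6.4},
        {"year": 2011, "weight": 8.0}]
shards = shard_rows(rows)
print(sorted(shards))
```

A query that only concerns one year can then name just that year's table instead of scanning all years.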

Page 30: BigQuery Architecture

Data format

1969,1969,1,20,,AL,TRUE,1,7.813,AL,1,20,true
1971,1971,5,7,,NY,FALSE,1,7.213,MA,5,7,true
2001,2001,12,5,,CA,TRUE,2,6.427,CA,12,5,true

CSV
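
As an illustration of light data preparation on rows like these, the Python sketch below parses the three sample lines with the standard `csv` module and lowercases the boolean literals so they are spelled consistently. The slide does not name the columns or say that mixed-case booleans are a problem, so fields are handled by value only and the normalization step is an assumption made for the example.

```python
import csv
import io

SAMPLE = """\
1969,1969,1,20,,AL,TRUE,1,7.813,AL,1,20,true
1971,1971,5,7,,NY,FALSE,1,7.213,MA,5,7,true
2001,2001,12,5,,CA,TRUE,2,6.427,CA,12,5,true
"""


def normalize(field):
    """Lowercase boolean literals; leave every other field untouched."""
    return field.lower() if field.upper() in ("TRUE", "FALSE") else field


rows = [[normalize(f) for f in record] for record in csv.reader(io.StringIO(SAMPLE))]

# Write the cleaned rows back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue(), end="")
```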

Page 31: BigQuery Architecture

Tools to prepare your data

● App Engine MapReduce
● Commercial ETL tools
  ● Pervasive
  ● Informatica
  ● Talend
● UNIX command-line

Page 32: BigQuery Architecture

Visualizing your Data

Page 33: BigQuery Architecture

Commercial visualization tools

Page 34: BigQuery Architecture

Google Spreadsheets

Page 35: BigQuery Architecture

Custom code and the Google Visualization API

Page 36: BigQuery Architecture

BigQuery architecture

Page 37: BigQuery Architecture

“If you do a table scan over a 1TB table, you're going to have a bad time.”

Anonymous
16th century Italian Philosopher-Monk

Page 38: BigQuery Architecture

Goal: Perform a 1 TB table scan in 1 second

● Reading 1 TB / second from disk:
  ● 10k+ disks
● Processing 1 TB / sec:
  ● 5k processors

Parallelize Parallelize Parallelize!
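
The per-device rates implied by these numbers are easy to work out. A short Python calculation, assuming decimal terabytes (10^12 bytes) and treating "10k+" as exactly 10,000 disks:

```python
TB = 10**12  # bytes, using decimal terabytes (an assumption)

scan_bytes_per_second = 1 * TB
disks = 10_000       # "10k+ disks", taken as exactly 10,000
processors = 5_000   # "5k processors"

per_disk = scan_bytes_per_second / disks
per_processor = scan_bytes_per_second / processors

print(f"per disk: {per_disk / 10**6:.0f} MB/s")            # per disk: 100 MB/s
print(f"per processor: {per_processor / 10**6:.0f} MB/s")  # per processor: 200 MB/s
```

Each disk only has to sustain about 100 MB/s and each processor about 200 MB/s, which is why the scan is spread across thousands of machines.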

Page 39: BigQuery Architecture

Data access: Column Store

[Diagram: Record Oriented Storage vs. Column Oriented Storage]
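
The difference between the two layouts can be shown with plain Python containers. This is only a toy sketch with invented sample records, not a description of BigQuery's actual storage format:

```python
# Invented sample records for illustration only.
records = [
    {"url": "a.com", "user": "test1", "elapsed_time": 4200},
    {"url": "b.org", "user": "alice", "elapsed_time": 900},
    {"url": "c.com", "user": "bob", "elapsed_time": 5100},
]

# Record-oriented: each record's fields are stored together.
row_store = [tuple(r.values()) for r in records]

# Column-oriented: each column's values are stored together.
column_store = {name: [r[name] for r in records] for name in records[0]}

# An aggregate over one column reads only that column's values.
slow = sum(1 for t in column_store["elapsed_time"] if t > 4000)
print(slow)  # 2
```

With the column layout, a query that touches one column never has to read the others.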

Page 40: BigQuery Architecture

BigQuery Architecture

[Diagram: serving tree]
Mixer 0
  Mixer 1 (Shard 0-8)
  Mixer 1 (Shard 9-16)
  Mixer 1 (Shard 17-24)
    Shard 0, Shard 10, Shard 12, Shard 20, Shard 24
Distributed Storage (e.g. GFS)
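
One way to read the diagram is as a tree in which each shard computes a partial result and the mixers combine partial results on the way up. The Python sketch below simulates a COUNT over 25 shards grouped as in the diagram (0-8, 9-16, 17-24). The data, the function names, and the choice of a count as the aggregate are all assumptions made for illustration.

```python
def shard_count(rows, predicate):
    """Leaf: scan one shard's rows and return a partial count."""
    return sum(1 for row in rows if predicate(row))


def mixer(partials):
    """Mixer: combine partial counts from the level below."""
    return sum(partials)


def is_slow(elapsed_time):
    return elapsed_time > 4000


# 25 shards (0-24) of made-up elapsed_time values.
shards = {i: [1000 * i + j for j in range(4)] for i in range(25)}
groups = [range(0, 9), range(9, 17), range(17, 25)]  # shard ranges from the diagram

# Each Mixer 1 combines its own shards; Mixer 0 combines the Mixer 1 results.
level1 = [mixer(shard_count(shards[i], is_slow) for i in group) for group in groups]
total = mixer(level1)
print(level1, total)  # [19, 32, 32] 83
```

Counts combine by simple addition, so no node ever needs to see more than the partial results of its children.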

Page 41: BigQuery Architecture

Running your Queries

Page 42: BigQuery Architecture

SELECT COUNT(foo), MAX(foo), STDDEV(foo) FROM ...

BigQuery SQL Example: Simple aggregates

Page 43: BigQuery Architecture

SELECT ... FROM ....
WHERE REGEXP_MATCH(url, "\.com$") AND user CONTAINS 'test'

BigQuery SQL Example: Complex Processing
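
To make the filter concrete, here is a rough Python equivalent of the WHERE clause above, run over a few invented rows. It assumes that REGEXP_MATCH behaves like an unanchored regular-expression search and that CONTAINS is a case-sensitive substring test.

```python
import re

# Invented sample rows for illustration only.
rows = [
    {"url": "example.com", "user": "test_account"},
    {"url": "example.com/path", "user": "test_account"},
    {"url": "example.org", "user": "tester"},
    {"url": "shop.com", "user": "alice"},
]

dot_com = re.compile(r"\.com$")  # same pattern as in the query

matches = [
    row for row in rows
    if dot_com.search(row["url"]) and "test" in row["user"]
]
print(matches)
```

Only the first row passes both conditions: its url ends in ".com" and its user contains "test".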

Page 44: BigQuery Architecture

SELECT COUNT(*) FROM (SELECT foo ..... )
GROUP BY foo

BigQuery SQL Example: Nested SELECT

Page 45: BigQuery Architecture

BigQuery SQL Example: Small JOIN

SELECT huge_table.foo FROM huge_table
JOIN small_table ON small_table.foo = huge_table.foo

Page 46: BigQuery Architecture

BigQuery Architecture: Small Join

[Diagram: serving tree]
Mixer 0
  Mixer 1 (Shard 0-8)
  Mixer 1 (Shard 17-24)
    Shard 0, Shard 20, Shard 24
Distributed Storage (e.g. GFS)
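
A common way to run a join against a small table in a tree like this is to give every shard its own copy of the small table, so each shard joins its slice of the huge table locally. The Python sketch below assumes that strategy (the slide itself only shows the diagram), uses invented data, and assumes `small_table.foo` values are unique.

```python
def shard_join(huge_rows, small_keys):
    """Leaf: join this shard's slice of huge_table against the full small table."""
    return [row["foo"] for row in huge_rows if row["foo"] in small_keys]


# Invented data for illustration only.
small_table = [{"foo": "a"}, {"foo": "c"}]
# Copy handed to every shard; a set works because foo is assumed unique here.
small_keys = {row["foo"] for row in small_table}

huge_table_shards = [
    [{"foo": "a"}, {"foo": "b"}],
    [{"foo": "c"}, {"foo": "d"}],
    [{"foo": "a"}, {"foo": "e"}],
]

# The mixers only need to concatenate what the shards return.
result = [foo for shard in huge_table_shards for foo in shard_join(shard, small_keys)]
print(result)  # ['a', 'c', 'a']
```

Because the small table fits on every shard, no rows of the huge table have to move between machines.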

Page 47: BigQuery Architecture

Other new features!

Page 48: BigQuery Architecture

Batch queries!

● Don't need interactive queries for some jobs?
● priority: "BATCH"
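
As a sketch of where the option goes, the Python below builds a query job body with `priority` set to `"BATCH"` under `configuration.query`, which is where the BigQuery v2 jobs resource places it. The helper name `build_query_job` is hypothetical, and authentication and the HTTP call are omitted.

```python
import json


def build_query_job(project_id, query, batch=False):
    """Return a job body; batch=True adds priority "BATCH" (hypothetical helper)."""
    query_config = {"query": query}
    if batch:
        query_config["priority"] = "BATCH"
    return {
        "jobReference": {"projectId": project_id},
        "configuration": {"query": query_config},
    }


body = build_query_job(
    "605902584318",
    "SELECT TOP(widget, 50), COUNT(*) AS sale_count FROM widget_sales",
    batch=True,
)
print(json.dumps(body, indent=2))
```

Leaving `batch` at its default omits the field, so the job runs at the service's default priority.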

Page 49: BigQuery Architecture

That's it

● API
● Column-based datastore
● Full table scans FAST
● Aggregates
● Commercial tool support
● Use cases

Page 50: BigQuery Architecture

SELECT questions FROM audience

SELECT 'Thank You!' FROM ryan

http://developers.google.com/bigquery

@ryguyrg http://profiles.google.com/ryan.boyd

Page 51: BigQuery Architecture