BigQuery Basics Paris 2014

Big Query Basics

Jan 27, 2015

Ido Green

A 'macro view' of BigQuery:
We start with an overview and some typical uses, then move on to project hierarchy, access control and security.
We end by touching on tools and demos.
Transcript
Page 1: Big Query Basics

BigQuery Basics

Paris 2014

Page 2: Big Query Basics

Who? Why?

Ido Green, Solutions Architect

plus.google.com/greenido

greenido.wordpress.com

Page 3: Big Query Basics

Topics we cover in this lesson

● BigQuery Overview
● Typical Uses
● Project Hierarchy
● Access Control and Security
● Datasets and Tables
● Tools
● Demos

Page 4: Big Query Basics

How does BigQuery fit in the analytics landscape?

● MapReduce-based analysis can be slow for ad-hoc queries
● Managing data centers and tuning software takes time & money
● Analytics tools should be services

Page 5: Big Query Basics

Why BigQuery?

● Generating big data reports requires expensive servers and skilled database administrators
● Interacting with big data has been expensive, slow and inefficient
● BigQuery changes all that
  ○ Reduces the time and expense of querying data

Page 6: Big Query Basics

What's BigQuery?

● Service for interactive analysis of massive datasets (TBs)
  ○ Query billions of rows: seconds to write, seconds to return
  ○ Uses a SQL-style query syntax
  ○ It's a service, accessed by a RESTful API
● Reliable and secure
  ○ Replicated across multiple sites
  ○ Secured through Access Control Lists
● Scalable
  ○ Store hundreds of terabytes
  ○ Pay only for what you use
● Fast (really)
  ○ Run ad hoc queries on multi-terabyte data sets in seconds

Page 7: Big Query Basics

Analyzing Large Amounts of Data... at High Speed

demobigquery.appspot.com

Page 8: Big Query Basics

Uses

Page 9: Big Query Basics

Typical Uses

Analyzing query results using a visualization library such as the Google Charts Tools API

Page 10: Big Query Basics

Typical Uses

Another way to analyze query results: Google Spreadsheets

○ greenido.wordpress.com/2013/12/16/big-query-and-google-spreadsheet-intergration/
○ greenido.wordpress.com/2013/07/24/big-query-power-with-javascript/

Page 11: Big Query Basics

BigQuery Use Cases

● Log Analysis. Making sense of computer-generated records
● Retailer. Using data to forecast product sales
● Ads Targeting. Targeting the right customer segments
● Sensor Data. Collecting and visualizing ambient data
● Data Mashup. Querying terabytes of heterogeneous data

Page 12: Big Query Basics

Some Customer Case Studies

Uses BigQuery to hone ad targeting and gain insights into their business

Dashboards using BigQuery to analyze booking and inventory data

Use BigQuery to provide their customers ways to expand game engagement and find new channels for monetization

Used BigQuery, App Engine and the Visualization API to build a business intelligence solution

Page 13: Big Query Basics

BigQuery Basic Technical Details

Page 14: Big Query Basics

Project Hierarchy

● Project. All data in BigQuery belongs inside a project
  ○ Set of users, APIs, authentication, billing information
● Dataset. Holds one or more tables
  ○ Lowest access control unit (to which ACLs are applied)
● Table. Row-column structure that contains actual data
● Job. Used to start potentially long running queries

Page 15: Big Query Basics

Datasets and Tables

Table name is represented as follows:

● Current Project: <dataset>.<table name>
● Different Project: <project>:<dataset>.<table>
  e.g. publicdata:samples.wikipedia
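
The naming rule above can be sketched in a few lines of Python. The helper names qualified_table_name and parse_table_name are made up for this illustration and are not part of any BigQuery library; the sketch only formats and splits strings.

```python
from typing import Optional


def qualified_table_name(dataset: str, table: str, project: Optional[str] = None) -> str:
    """Build a table name in the [<project>:]<dataset>.<table> form from the slide."""
    name = f"{dataset}.{table}"
    # The project prefix is only needed when the table lives in another project.
    return f"{project}:{name}" if project else name


def parse_table_name(name: str) -> dict:
    """Split a table name back into its project, dataset and table parts."""
    project, colon, rest = name.rpartition(":")
    dataset, dot, table = rest.partition(".")
    if not dot or not dataset or not table:
        raise ValueError(f"expected [<project>:]<dataset>.<table>, got {name!r}")
    return {"project": project if colon else None, "dataset": dataset, "table": table}


print(qualified_table_name("samples", "wikipedia", project="publicdata"))
print(parse_table_name("publicdata:samples.wikipedia"))
```

Leaving project unset gives the short form used inside the current project.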

Page 16: Big Query Basics

Schema Example

● Schema of a table holding demographics about name occurrences:

name:string,gender:string,count:integer
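
To make the name:type notation concrete, here is a minimal Python sketch that expands the schema string above into a list of field descriptions. The function name parse_inline_schema and the set of accepted type names are assumptions made for this example, not a BigQuery API.

```python
import json

# Type names assumed for this sketch (the basic types covered in this deck).
KNOWN_TYPES = {"string", "integer", "float", "boolean", "timestamp"}


def parse_inline_schema(schema: str) -> list:
    """Turn 'name:type,name:type' into a list of {'name', 'type'} dicts."""
    fields = []
    for pair in schema.split(","):
        name, colon, type_name = pair.strip().partition(":")
        if not colon or not name.strip():
            raise ValueError(f"expected column_name:datatype, got {pair!r}")
        type_name = type_name.strip().lower()
        if type_name not in KNOWN_TYPES:
            raise ValueError(f"unknown type {type_name!r} for column {name!r}")
        fields.append({"name": name.strip(), "type": type_name})
    return fields


fields = parse_inline_schema("name:string,gender:string,count:integer")
print(json.dumps(fields, indent=2))
```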

Page 17: Big Query Basics

Data Types

● String
  ○ UTF-8 encoded, <64kB
● Integer
  ○ 64 bit signed
● Float
● Boolean
  ○ "true" or "false", case insensitive
● Timestamp
  ○ String format
    ■ YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
  ○ Numeric format (seconds from UNIX epoch)
    ■ 1234567890, 1.234567890123456E9

(*) Max row size: 64kB
Date type is supported as timestamp
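
The two timestamp notations describe the same kind of value. The Python sketch below converts both to a timezone-aware datetime so they can be compared. It uses only the standard library; treating a missing offset as UTC is an assumption of this example, and the function names are invented for it.

```python
from datetime import datetime, timezone


def from_epoch_seconds(value: float) -> datetime:
    """Numeric format: seconds since the UNIX epoch, interpreted as UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def from_timestamp_string(text: str) -> datetime:
    """String format: 'YYYY-MM-DD HH:MM:SS[.sssss] [+/-HH:MM]'."""
    parts = text.split()
    if len(parts) == 2:
        parts.append("+00:00")  # assumption: no offset means UTC
    if len(parts) != 3:
        raise ValueError(f"unexpected timestamp format: {text!r}")
    date_part, time_part, offset = parts
    fmt = "%Y-%m-%d %H:%M:%S.%f%z" if "." in time_part else "%Y-%m-%d %H:%M:%S%z"
    return datetime.strptime(f"{date_part} {time_part}{offset}", fmt)


print(from_epoch_seconds(1234567890))
print(from_timestamp_string("2009-02-13 23:31:30"))
print(from_timestamp_string("2009-02-14 01:31:30.5 +02:00"))
```

The first two calls describe the same instant; the third is half a second later, written with a +02:00 offset.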

Page 18: Big Query Basics

Data Format

BigQuery supports the following formats for loading data:

1. Comma Separated Values (CSV)
2. JSON
   a. BigQuery can load JSON data faster if your data contains embedded newlines.
   b. Supports nested/repeated data fields

Page 19: Big Query Basics

Repeated and Nested Fields

Loading data with repeated and nested fields is supported by the JSON data format only.

Schema example:

[
  {
    "fields": [
      {
        "mode": "nullable",
        "name": "country",
        "type": "string"
      },
      {
        "mode": "nullable",
        "name": "city",
        "type": "string"
      }
    ],
    "mode": "repeated",
    "name": "location",
    "type": "record"
  },
  ...........
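
As an illustration of what data for such a schema might look like, this Python sketch writes two records as newline-delimited JSON, one object per line. Only the location field from the schema above is used, since the rest of the schema is elided on the slide; the country and city values are made-up sample data.

```python
import json

# Hypothetical sample rows: "location" is a repeated record, so it is a list
# of objects, each with the nullable "country" and "city" fields.
rows = [
    {"location": [{"country": "France", "city": "Paris"},
                  {"country": "France", "city": "Lyon"}]},
    {"location": []},  # a repeated field may hold zero values
]


def to_newline_delimited_json(records) -> str:
    """Serialize each record as one compact JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


payload = to_newline_delimited_json(rows)
print(payload, end="")
```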

Page 20: Big Query Basics

Accessing BigQuery

● BigQuery Web browser
  ○ Imports/exports data, runs queries
● bq command line tool
  ○ Performs operations from the command line
● Service API
  ○ RESTful API to access BigQuery programmatically
  ○ Requires authorization by OAuth2
  ○ Google client libraries for Python, Java, JavaScript, PHP, ...

Page 21: Big Query Basics

Third-party Tools

Visualization and Business Intelligence

ETL tools for loading data into BigQuery

Page 22: Big Query Basics

Example of Visualization Tools

Using commercial visualization tools to graph the query results

Page 23: Big Query Basics

Loading Data Using the Web Browser

● Upload from local disk or from Cloud Storage
● Start the Web browser
● Select Dataset
● Create table and follow the wizard steps

Page 24: Big Query Basics

Loading Data Using bq Tool

"bq load" command syntax:

bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV] destination_table data_source_uri table_schema

● If not specified, the default file format is CSV (comma separated values)
● The files can also use newline delimited JSON format
● Schema
  ○ Either a filename or a comma-separated list of column_name:datatype pairs that describe the file format
● Data source may be on local machine or on Cloud Storage
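
A small Python sketch of how the syntax above maps to an actual command line. It only assembles and prints the argument list and does not run bq. The dataset, table and bucket names (mydataset.names, gs://my-bucket/names.csv) are hypothetical placeholders, and bq_load_command is a helper invented for this example.

```python
import shlex
from typing import List, Optional


def bq_load_command(destination_table: str, data_source_uri: str, table_schema: str,
                    source_format: Optional[str] = None) -> List[str]:
    """Assemble the argument list for 'bq load' following the syntax on the slide."""
    if source_format not in (None, "CSV", "NEWLINE_DELIMITED_JSON"):
        raise ValueError(f"unsupported source format: {source_format!r}")
    command = ["bq", "load"]
    if source_format:  # omitted means the CSV default applies
        command.append(f"--source_format={source_format}")
    command += [destination_table, data_source_uri, table_schema]
    return command


# Placeholder names, for illustration only.
cmd = bq_load_command("mydataset.names", "gs://my-bucket/names.csv",
                      "name:string,gender:string,count:integer")
print(shlex.join(cmd))
```

Building a list rather than a single string keeps each argument intact if the list is later handed to subprocess.run.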

Page 25: Big Query Basics

Load Limitations

● 1,000 import jobs per table per day
● 10,000 import jobs per project per day
● File size (for both CSV and JSON)
  ○ 1GB for compressed file
  ○ 1TB for uncompressed
    ■ 4GB for uncompressed CSV with newlines in strings
● 10,000 files per import job
● 1TB per import job
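
The per-job limits above lend themselves to a quick pre-flight check before submitting a load. The Python sketch below is one way to express them; the numbers are the ones quoted on this slide, the binary interpretation of GB and TB is an assumption, and SourceFile and load_job_problems are names invented for the example.

```python
from dataclasses import dataclass
from typing import List

GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4

MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 1 * TB


@dataclass
class SourceFile:
    size_bytes: int
    compressed: bool = False
    csv_with_quoted_newlines: bool = False


def file_size_limit(f: SourceFile) -> int:
    """Pick the per-file limit from the slide that applies to this file."""
    if f.compressed:
        return 1 * GB
    if f.csv_with_quoted_newlines:
        return 4 * GB
    return 1 * TB


def load_job_problems(files: List[SourceFile]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(files) > MAX_FILES_PER_JOB:
        problems.append(f"{len(files)} files exceeds {MAX_FILES_PER_JOB} per job")
    if sum(f.size_bytes for f in files) > MAX_BYTES_PER_JOB:
        problems.append("total size exceeds 1TB per job")
    for index, f in enumerate(files):
        if f.size_bytes > file_size_limit(f):
            problems.append(f"file {index} exceeds its size limit")
    return problems


print(load_job_problems([SourceFile(2 * GB, compressed=True)]))
```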

Page 26: Big Query Basics

A Few Best Practices

CSV/JSON must be split into chunks less than 1TB
● "split" command with --line-bytes option
● Split to smaller files
  ○ Easier error recovery
  ○ To smaller data unit (day, month instead of year)
● Uploading to Cloud Storage is recommended

[Diagram: Cloud Storage to BigQuery]
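
The idea behind splitting on line boundaries can be sketched in Python as follows: lines are packed into chunks whose total size stays under a byte budget, and no line is ever cut in half. This is an illustration of the technique, not a replacement for the split command; raising on an oversized line is a choice made for this sketch.

```python
from typing import Iterable, Iterator, List


def split_lines_by_bytes(lines: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group whole lines into chunks of at most max_bytes each."""
    chunk: List[bytes] = []
    size = 0
    for line in lines:
        if len(line) > max_bytes:
            raise ValueError("a single line is larger than max_bytes")
        # Start a new chunk when this line would push the current one over budget.
        if chunk and size + len(line) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk


sample = [b"a,1\n", b"bb,2\n", b"ccc,3\n", b"d,4\n"]
chunks = list(split_lines_by_bytes(sample, max_bytes=10))
for number, chunk in enumerate(chunks):
    print(number, b"".join(chunk))
```

With a real file, the lines argument can be a file object opened in binary mode, which yields one line at a time.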

Page 27: Big Query Basics

A Few Best Practices

● Split Tables by Dates
  ○ Minimize cost of data scanned
  ○ Minimize query time
● Upload Multiple Files to Cloud Storage
  ○ Allows parallel upload into BigQuery
● Denormalize your data
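
Splitting tables by date can be pictured as routing each row to a per-day table, so that a query over one day only scans that day's data. The Python sketch below groups rows this way; the prefix_YYYYMMDD naming scheme, the "logs" prefix and the sample rows are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List


def daily_table_name(prefix: str, day: date) -> str:
    """Name a per-day table, e.g. logs_20140127 (naming scheme is hypothetical)."""
    return f"{prefix}_{day:%Y%m%d}"


def group_rows_by_day(prefix: str, rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Bucket rows by the table they would be loaded into, based on row['day']."""
    tables = defaultdict(list)
    for row in rows:
        tables[daily_table_name(prefix, row["day"])].append(row)
    return dict(tables)


# Made-up sample rows.
rows = [
    {"day": date(2014, 1, 27), "event": "login"},
    {"day": date(2014, 1, 27), "event": "query"},
    {"day": date(2014, 1, 28), "event": "login"},
]
for table, table_rows in group_rows_by_day("logs", rows).items():
    print(table, len(table_rows))
```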

Page 28: Big Query Basics

Exercise & Questions

Page 29: Big Query Basics

Exercise

Work through BigQuery Exercise 1 -- Basics
● Use the BigQuery UI
● Use the bq command line tool
● Upload a dataset

You will query the public sample GSOD (global summary of day) weather dataset.

You will get and upload earthquake data.

Page 30: Big Query Basics

Questions

● What are the different ways to load data into BigQuery?
● What is the maximum size of data in a BigQuery table?
● How can we import data into BigQuery?
  ○ What are the limitations?
  ○ What formats does BigQuery accept?

Page 31: Big Query Basics

Google I/O Data Sensing

● Start the BigQuery Web browser
● Click on Display Project in the project chooser dialog window
● Enter data-sensing-lab when prompted
● In the dataset data-sensing-lab:io_sensor_data, select the table moscone_io13
● In the New Query box, enter the following query:
  SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10
● Click the Run Query button
● Scroll to see relevant results

Page 32: Big Query Basics

Data Structure

● Define table schema when creating table
● Data is stored in per-column structure
● Each column is handled separately and only combined when necessary

Advantage of this data structure:
● No need to set index in advance
● Load only the relevant columns
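
A toy Python illustration of the per-column idea: the same rows are regrouped into one list per column, and an aggregate over a single column then touches only that list. This is a conceptual sketch, not BigQuery's actual storage engine. The sample values are made up, and the sketch assumes every row has the same keys.

```python
from typing import Dict, List

# Made-up rows following the name/gender/count schema shown earlier.
rows = [
    {"name": "Mary", "gender": "F", "count": 120},
    {"name": "John", "gender": "M", "count": 95},
    {"name": "Anna", "gender": "F", "count": 60},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Regroup row-oriented records into one list of values per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def column_sum(columns: Dict[str, list], column_name: str) -> int:
    """Aggregate one column; the other columns are never read."""
    return sum(columns[column_name])


columns = to_columns(rows)
print(columns["count"])
print(column_sum(columns, "count"))
```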

Page 33: Big Query Basics

Questions?

Thank you!