Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki

Post on 10-May-2015

DESCRIPTION

Big data is amazing. You can get insights from your users, find interesting patterns and have lots of geek fun. The problem is that big data usually means many servers, a complex setup, intensive monitoring and a steep learning curve. All those things cost money. If you don’t have the money, you miss out on all the fun. In this talk I show how you can use Google BigQuery to manage big data from your application using a hosted solution. And you can start with less than $1 per month.

Transcript

@supercoco9 #devoxxBigQuery

Big Data with Google BigQuery

Javier Ramirez @supercoco9 https://teowaki.com

Managing Big Data with BigQuery

Javier Ramirez

•Writing software since 1996

•Web dev. since 1999 (C++, Java, PHP, Ruby, JS...)

•Founder of https://teowaki.com

•Google Developer Expert on the Cloud Platform

BIG DATA

BIG SERVERS

BIG DEVOPS

BIG MONEY

bigdata is cool but...

hard to set up and monitor

expensive cluster

not interactive enough

big data is doing a full scan over 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds

Google BigQuery

Data analysis as a service

http://developers.google.com/bigquery

Based on “Dremel”

Specifically designed for interactive queries over petabytes of real-time data

Your only worries

•Load data

•Query the dataset

loading data.

You just send the data in text (or JSON) format

up to 100K inserts per second in stream mode
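As a rough illustration of "sending the data in JSON format", the sketch below serializes rows as newline-delimited JSON, one object per line, which is the JSON layout BigQuery load jobs accept. The helper name `to_ndjson` and the sample fields are hypothetical; the actual upload call is deliberately left out.

```python
import json


def to_ndjson(rows):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


# Hypothetical web-server-log rows; the field names are illustrative only.
rows = [
    {"path": "/", "status": 200, "user": "alice"},
    {"path": "/login", "status": 302, "user": "bob"},
]

payload = to_ndjson(rows)
print(payload, end="")
```

Each line of `payload` is an independent JSON document, so a file in this shape can be split or appended to without re-parsing the whole thing.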

It's just SQL

select name from USERS order by date;

select count(*) from users;

select max(date) from USERS;

select sum(total) from ORDERS group by user;
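To show that the four queries above are plain SQL, here is a minimal local sketch that runs them against an in-memory SQLite database. SQLite is only a stand-in, not BigQuery; the table contents are made up, and the last query additionally selects `user` so each sum is labelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (name TEXT, date TEXT);
    CREATE TABLE orders (user TEXT, total REAL);
    INSERT INTO users VALUES ('ana', '2014-01-02'), ('bob', '2014-01-01');
    INSERT INTO orders VALUES ('ana', 10.0), ('ana', 5.0), ('bob', 7.5);
    """
)

# The same shapes of query as on the slide, run locally.
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY date")]
user_count = conn.execute("SELECT count(*) FROM users").fetchone()[0]
latest = conn.execute("SELECT max(date) FROM users").fetchone()[0]
totals = dict(conn.execute("SELECT user, sum(total) FROM orders GROUP BY user"))

print(names, user_count, latest, totals)
```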

Subselects and joins out of the box

SELECT Year, Actor1Name, Actor2Name, Count
FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
         RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       and Actor1CountryCode != '' and Actor2CountryCode != ''
       and Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       and Actor1CountryCode != '' and Actor2CountryCode != ''
       and Actor1CountryCode != Actor2CountryCode),
  WHERE Actor1Name IS NOT null
    AND Actor2Name IS NOT null
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank = 1
ORDER BY Year

http://gdeltproject.org/data.html#googlebigquery

specific extensions for analytics

within, flatten, nest

stddev

top, first, last, nth

variance

var_pop, var_samp

covar_pop, covar_samp

quantiles

correlations

Things you always wanted to try but were too scared to

select count(*) from [publicdata:samples.wikipedia] where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;

223,163,387 Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
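One detail worth noticing in the query above: the pattern `[0-9]*` can match the empty string. Assuming REGEXP_MATCH does unanchored (partial) matching, the way Python's `re.search` does, every non-null title satisfies it, so the query is mostly a measure of raw scan-and-match speed. The sketch below uses Python's `re` module as a stand-in; the titles are made up.

```python
import re

# Made-up titles; only two of them contain a digit.
titles = ["Devoxx", "2014 in film", "", "R2-D2"]

# "[0-9]*" matches zero or more digits, so it succeeds on every string.
matches_star = [t for t in titles if re.search(r"[0-9]*", t) is not None]

# "[0-9]+" requires at least one digit and actually filters.
matches_plus = [t for t in titles if re.search(r"[0-9]+", t) is not None]

print(len(matches_star), len(matches_plus))
```

If the intent is "titles containing a number", a pattern such as `[0-9]+` would be the one to use.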

columnar storage

https://cookbook.experiencesaphana.com/crm/what-is-crm-on-hana/technology-innovation/row-vs-column-based/
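A toy sketch of why columnar storage helps analytical queries: when data is laid out per column, a query that needs one field only touches that field's values. The in-memory lists below merely illustrate the idea and say nothing about BigQuery's real storage format; the field names and helper functions are hypothetical.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"title": "Devoxx", "wp_namespace": 0, "num_characters": 1200},
    {"title": "London", "wp_namespace": 0, "num_characters": 98000},
    {"title": "Talk:London", "wp_namespace": 1, "num_characters": 4300},
]

# Column-oriented layout: one contiguous list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def cells_read_row_store(rows):
    """A row store reads every field of every record, whatever the query needs."""
    return sum(len(row) for row in rows)


def cells_read_column_store(columns, wanted):
    """A column store reads only the columns the query mentions."""
    return sum(len(columns[name]) for name in wanted)


print(cells_read_row_store(rows))                   # all 3 fields of 3 rows
print(cells_read_column_store(columns, ["title"]))  # just the title column
```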

highly distributed execution using a tree
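The tree idea can be sketched in a few lines: leaf workers each scan their own shard and return a partial result, and intermediate nodes combine partial results on the way up to the root. This is a single-process toy under simplifying assumptions, not how the service is actually implemented; `tree_count`, `leaf_count` and the `fanout` parameter are hypothetical names.

```python
def leaf_count(shard, predicate):
    """Leaf node: scan one shard and count the matching values."""
    return sum(1 for value in shard if predicate(value))


def tree_count(shards, predicate, fanout=2):
    """Combine per-shard counts level by level, `fanout` children per parent."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    partials = [leaf_count(shard, predicate) for shard in shards]
    while len(partials) > 1:
        # Each parent node sums the partial counts of up to `fanout` children.
        partials = [
            sum(partials[i:i + fanout]) for i in range(0, len(partials), fanout)
        ]
    return partials[0] if partials else 0


shards = [[0, 1, 0], [0, 0], [2, 0, 0, 0], [1]]
total = tree_count(shards, lambda value: value == 0)
print(total)
```

Because counting is associative, the result is the same whatever the fan-out; only the depth of the tree changes.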

web console screenshot

country segmented traffic

window functions
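For readers new to window functions, here is a small sketch of what `RANK() OVER (PARTITION BY ... ORDER BY ... DESC)` computes: rows are grouped into partitions, ordered inside each partition, and numbered so that ties share a rank. The function name and the sample rows are hypothetical.

```python
from itertools import groupby


def rank_within_partitions(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key DESC)."""
    ordered = sorted(rows, key=lambda r: (r[partition_key], -r[order_key]))
    ranked = []
    for _, group in groupby(ordered, key=lambda r: r[partition_key]):
        rank, previous = 0, None
        for position, row in enumerate(group, start=1):
            # Ties keep the rank of the first row with that value.
            if row[order_key] != previous:
                rank, previous = position, row[order_key]
            ranked.append({**row, "rank": rank})
    return ranked


rows = [
    {"year": 1979, "pair": "A-B", "count": 120},
    {"year": 1979, "pair": "A-C", "count": 300},
    {"year": 1980, "pair": "A-B", "count": 500},
    {"year": 1980, "pair": "B-C", "count": 500},
    {"year": 1980, "pair": "A-C", "count": 200},
]

ranked = rank_within_partitions(rows, "year", "count")
top_per_year = [(r["year"], r["pair"]) for r in ranked if r["rank"] == 1]
print(top_per_year)
```

Filtering on `rank == 1` afterwards is the same move as a `WHERE rank = 1` clause in an outer query.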

our most active user

Worldwide events in the last 36 years

SELECT Year, Actor1Name, Actor2Name, Count
FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
         RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       and Actor1CountryCode != '' and Actor2CountryCode != ''
       and Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year
     FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       and Actor1CountryCode != '' and Actor2CountryCode != ''
       and Actor1CountryCode != Actor2CountryCode),
  WHERE Actor1Name IS NOT null
    AND Actor2Name IS NOT null
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank = 1
ORDER BY Year

http://gdeltproject.org/data.html#googlebigquery

SELECT repository_name, repository_language, repository_description,
       COUNT(repository_name) as cnt, repository_url
FROM github.timeline
WHERE type="WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type="CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
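The `#{yesterday}` placeholder in the query reads like a string interpolation filled in by the calling program. A minimal sketch of how that value could be produced, assuming a `YYYY-MM-DD` date format is what the query expects; the helper name `since_timestamp` is hypothetical.

```python
from datetime import date, timedelta


def since_timestamp(today, hour=20):
    """Return 'YYYY-MM-DD HH:00:00' for the day before `today`."""
    yesterday = today - timedelta(days=1)
    return f"{yesterday.isoformat()} {hour:02d}:00:00"


# In a real job `today` would be date.today(); a fixed date keeps the sketch repeatable.
cutoff = since_timestamp(date(2014, 6, 12))
print(cutoff)
```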

Automation with Apps Script

●Read from BigQuery

●Create a spreadsheet on Drive

●E-mail it everyday as a PDF

https://developers.google.com/apps-script/

bigquery pricing

$26 per stored TB

1,000,000 rows => $0.00416 / month (£0.00243 / month)

$5 per processed TB

1 full scan = 160 MB

1 count = 0 MB

1 full scan over 1 column = 5.4 MB

100 GB => $0.05 / month (£0.03)

Apps Script is free
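The storage figure on this slide can be reproduced with simple arithmetic. The sketch below takes the two per-TB prices and the 160 MB and 5.4 MB sizes from the slide, and assumes decimal units (1 TB = 1,000,000 MB); the function names are hypothetical.

```python
STORAGE_USD_PER_TB_MONTH = 26.0  # from the slide
QUERY_USD_PER_TB = 5.0           # from the slide
MB_PER_TB = 1_000_000            # assumption: decimal units


def storage_cost_usd(megabytes):
    """Monthly cost of keeping `megabytes` of data stored."""
    return megabytes / MB_PER_TB * STORAGE_USD_PER_TB_MONTH


def query_cost_usd(megabytes_processed):
    """Cost of a query that processes `megabytes_processed` of data."""
    return megabytes_processed / MB_PER_TB * QUERY_USD_PER_TB


million_rows_mb = 160.0   # size of 1,000,000 rows, per the slide
one_column_mb = 5.4       # size of a single column, per the slide

print(f"storage per month:  ${storage_cost_usd(million_rows_mb):.5f}")
print(f"full scan:          ${query_cost_usd(million_rows_mb):.6f}")
print(f"single-column scan: ${query_cost_usd(one_column_mb):.6f}")
```

Under these assumptions, storing 160 MB works out to the slide's $0.00416 per month, and a single-column scan costs a small fraction of a full scan because far fewer bytes are processed.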

£0.054307 / month* per 1MM rows**

*the 1st 1TB every month is free of charge

**assuming your rows have web server logs-like info

price per month

THANKS!

Javier Ramirez @supercoco9 https://teowaki.com

Related links at:

https://teowaki.com/teams/javier-community/link-categories/bigquery-talk

Thanks / Creative Commons

•Presentation Template — Guillaume LaForge

•The Queen — A prestigious heritage with some inspiration from The Sex Pistols and funny Devoxxians

•Girl with a Balloon — Banksy

•Tube — Michael Keen
