Data Science at Pebble: Analyzing Data to Make Smarter Watches (June 2, 2015)
Page 1: Partner webinar presentation aws pebble_treasure_data

Data Science at Pebble: Analyzing Data to Make Smarter Watches

June 2, 2015

Page 2: Partner webinar presentation aws pebble_treasure_data

Today’s speakers

Scott Ward

Solutions Architect

Amazon Web Services

Kiyoto Tamura

Head of Marketing

Treasure Data

Susan Holcomb

Head of Analytics

Pebble

Page 3: Partner webinar presentation aws pebble_treasure_data

Data at Pebble

Page 4: Partner webinar presentation aws pebble_treasure_data

What is Pebble?

• Customizable smart watch with crowd-pleasing history

• $10.3MM on Kickstarter with first product

• In March, $20MM on Kickstarter with new product

Page 5: Partner webinar presentation aws pebble_treasure_data

Pebble Data Team: Then vs. Now

One year ago…

No data team

No analytics infrastructure

Barely any data

Barely any insights

Today…

5-person team (& growing!)

Scalable analytics infrastructure via Treasure Data

~60MM records per day

New product influenced by data insights

Page 6: Partner webinar presentation aws pebble_treasure_data

Data Science Workflow

Define the problem

Acquire the data

Fit the model

The work vs. the hype

Page 7: Partner webinar presentation aws pebble_treasure_data

Pebble’s First Problem

How should we measure product success?

Page 8: Partner webinar presentation aws pebble_treasure_data

Engagement Definition

• How can we tell someone likes the watch?
  – Button presses?
  – Apps downloaded / launched?
  – Minimized SW bugs?
  – A crazy formula combining these?

• Simplest: They are wearing the watch
  – Use accelerometer

Page 9: Partner webinar presentation aws pebble_treasure_data

Accessing Data

60MM records per day

• Scheduled jobs in TD to post-process & aggregate data

• Ad hoc queries in TD to explore data (Presto, Hive)

• Dashboards

• Standardized output

Process: ~30 queries to get one result

Page 10: Partner webinar presentation aws pebble_treasure_data

Accelerometer noise threshold

• Accelerometer picks up gestures, net motion (so we can enable cool features)

• Sensitive enough to pick up vibrations of passing train

• Goal: Determine threshold for noise so we can assess when watch is really in use
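A minimal sketch of how such a noise threshold might be applied, assuming each record is a per-sample motion magnitude from the accelerometer. The function name, the sample values, and the threshold of 10.0 are all hypothetical and are not taken from Pebble's pipeline.

```python
from typing import Iterable


def samples_in_use(magnitudes: Iterable[float], noise_threshold: float) -> int:
    """Count samples whose motion magnitude exceeds the noise threshold."""
    return sum(1 for m in magnitudes if m > noise_threshold)


# Hypothetical readings: small values stand in for ambient vibration
# (e.g. a passing train), large values for the watch actually being worn.
samples = [0.0, 2.0, 3.0, 40.0, 55.0, 1.0, 60.0]

print(samples_in_use(samples, noise_threshold=10.0))
```

Anything at or below the threshold is treated as noise, so the choice of threshold directly changes how much "use" gets reported.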

Page 11: Partner webinar presentation aws pebble_treasure_data

Accelerometer noise threshold

Page 12: Partner webinar presentation aws pebble_treasure_data

First result

???

Page 13: Partner webinar presentation aws pebble_treasure_data

Raising the threshold

• Peaks shift left

• Spike remains

• Backlight data matches original threshold!!

Further validated by survey of users
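The effect of raising the threshold can be sketched by recomputing a per-watch activity histogram at several candidate thresholds. This is an illustration on made-up data, not the analysis from the talk; the watch identifiers, readings, and threshold values are assumptions.

```python
from collections import Counter
from typing import Dict, List


def active_count_histogram(watches: Dict[str, List[float]], threshold: float) -> Counter:
    """Histogram of 'number of above-threshold samples per watch'."""
    counts = (
        sum(1 for m in readings if m > threshold)
        for readings in watches.values()
    )
    return Counter(counts)


# Hypothetical data: two worn watches and one that never moves.
watches = {
    "watch_a": [1.0, 12.0, 30.0, 45.0],
    "watch_b": [2.0, 3.0, 50.0, 60.0],
    "watch_c": [0.0, 0.0, 0.0, 0.0],
}

for threshold in (0.5, 5.0, 20.0):
    histogram = active_count_histogram(watches, threshold)
    print(threshold, sorted(histogram.items()))
```

In this toy data the worn watches move toward lower counts as the threshold rises, while the watch that never moves stays at zero regardless of the threshold.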

Page 14: Partner webinar presentation aws pebble_treasure_data

Why this worked

• Rapid, repeated ad hoc querying lets you get an intuitive picture of the data
  – What is the range?
  – Where are the errors?
  – Where are the inflection points?

• Few analytics infrastructure tools optimize for this
  – Too focused on standardized reporting
  – Want to sell you black box that spits out “insights”

Page 15: Partner webinar presentation aws pebble_treasure_data

Problems 2-n

• Building scalable reporting system

• Delivering insights that shaped interface for new product

• Discovering signals on user attrition

• Designing models to segment use cases

• Analyzing dozens of product elements to improve product experience

Page 16: Partner webinar presentation aws pebble_treasure_data

thanks <3

Page 17: Partner webinar presentation aws pebble_treasure_data

Product Overview

Kiyoto Tamura

Director of Developer Relations

Page 18: Partner webinar presentation aws pebble_treasure_data

Event Data is Everywhere…

Smartphones Websites Home Automation

Wearable Devices

Connected Vehicles

Page 19: Partner webinar presentation aws pebble_treasure_data

Event Data is Everywhere…

Smartphones Websites Home Automation

Wearable Devices

Connected Vehicles

{
  "timestamp": "2015-05-22T13:50:00-0600",
  "event": "tap",
  "object": "button_32",
  "user": {
    "name": "Luca",
    "email": "[email protected]",
    "twitter": "luckymethod"
  }
}
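A minimal sketch of how a consumer might turn a nested event like the one above into flat, column-like keys before querying it. The `flatten` helper and the dotted key naming are assumptions made for illustration, not a description of how Treasure Data stores events. The email field is left out because it is redacted in the source.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted keys, e.g. user.name."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


raw = (
    '{"timestamp": "2015-05-22T13:50:00-0600", "event": "tap", '
    '"object": "button_32", '
    '"user": {"name": "Luca", "twitter": "luckymethod"}}'
)

event = flatten(json.loads(raw))
print(sorted(event))
```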

Page 20: Partner webinar presentation aws pebble_treasure_data

Connecting the (big) data dots is hard

credit: Matt Turck @ FirstMark Capital

Page 21: Partner webinar presentation aws pebble_treasure_data

We provide a simple solution

Ingest Analyze Distribute

and more…

Page 22: Partner webinar presentation aws pebble_treasure_data

Why is Treasure Data better?

Ingest

• Streaming or Batch ingestion (or both) with Treasure Agent and Embulk

• Don’t worry about changing the way you send data, Treasure Data handles it all

• 99.99% uptime, our team takes care of running the show so you don’t have to

Analyze

• Query all your data using SQL, no schema required

• Control Treasure Data through our Console, our Command Line Interface or Luigi-TD for complex automated data pipelines

• Choose Hive or Presto

• Run machine learning at scale with Hivemall

Distribute

• Expansive collection of export plugins: send data to Google Docs, Tableau, Excel, PostgreSQL…

• Connect your favorite BI tool

• Fine grained user access control to your data

Page 23: Partner webinar presentation aws pebble_treasure_data

Our growing customer base

• Commerce

• Technology

• Gaming

• Media & Ad Tech

• Energy Company

• IoT

Page 24: Partner webinar presentation aws pebble_treasure_data

Treasure Data on AWS

EC2

• API Servers (c3.2xlarge)

• Hadoop workers (c3.8xlarge)

• Generic workers (c3.4xlarge)

S3

• Powers our schema-free, columnar store

• 50 billion events/day

• No capacity planning needed!

RDS

• Both MySQL & PostgreSQL

• Reduced ops cost

• No dedicated devops for 2.5 years

Page 25: Partner webinar presentation aws pebble_treasure_data

Amazon Relational Database Service (RDS)

Amazon RDS is a fully managed relational DB service that is:
  – Simple to deploy
  – Easy to scale
  – Reliable
  – Cost-effective

Ease of deployment and patching

Push-button scalability

Choice of DB Engines

Automated backups

User snapshots and cloning

Monitoring and automatic host replacement

PostgreSQL

Amazon RDS for Aurora (Preview)

Page 28: Partner webinar presentation aws pebble_treasure_data

Amazon RDS - Multi-Availability Zone Configuration

• Configure your RDS environment for high availability and DR

• Primary database running in one Availability Zone with Standby in another

• DNS name is repointed to the Standby when the primary RDS instance or its Availability Zone becomes unhealthy
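A minimal sketch of the settings such a Multi-AZ configuration might involve, assuming the boto3 RDS client is used. The keys are real `create_db_instance` parameter names; every value (identifier, instance class, storage size, credentials) is a hypothetical placeholder. The dictionary is only built here, not sent to AWS, since a real call needs credentials.

```python
from typing import Any, Dict


def multi_az_instance_params(identifier: str) -> Dict[str, Any]:
    """Build keyword arguments for a Multi-AZ RDS instance request."""
    return {
        "DBInstanceIdentifier": identifier,
        "DBInstanceClass": "db.m3.medium",   # hypothetical size
        "Engine": "postgres",
        "AllocatedStorage": 20,              # GiB, hypothetical
        "MasterUsername": "admin_user",      # placeholder
        "MasterUserPassword": "change-me",   # placeholder, never hard-code
        "MultiAZ": True,                     # standby in another AZ
    }


params = multi_az_instance_params("example-db")
print(params["MultiAZ"])

# Assumed usage, not executed here:
#   boto3.client("rds").create_db_instance(**params)
```

Applications would connect through the instance's endpoint name rather than an IP address, so that a failover to the Standby is picked up through DNS.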

Page 29: Partner webinar presentation aws pebble_treasure_data

RDS Multi-Availability Zone Architecture

[Diagram: Availability Zone #1 and Availability Zone #2, each containing Web Tier and App Tier instances in Auto Scaling groups; an RDPGW sits in Availability Zone #1.]

Page 30: Partner webinar presentation aws pebble_treasure_data

Amazon RDS - Read Replicas


Region #1 Region #2

Page 32: Partner webinar presentation aws pebble_treasure_data

Questions?

Treasure Data

Kiyoto Tamura

@kiyototamura

treasuredata.com

Pebble

Susan Holcomb

getpebble.com

AWS

Scott Ward

aws.amazon.com

Contact us to learn more