SNOWPLOW AND LOOKER AT OYSTER.COM SNOWPLOW MEETUP NYC – MARCH 30, 2016 BEN HOYT, DEVON POHL

Snowplow Analytics and Looker at Oyster.com

Jan 09, 2017


yalisassoon
Page 1: Snowplow Analytics and Looker at Oyster.com

SNOWPLOW AND LOOKER AT OYSTER.COM
SNOWPLOW MEETUP NYC – MARCH 30, 2016
BEN HOYT, DEVON POHL

Page 2: Snowplow Analytics and Looker at Oyster.com

WHAT IS OYSTER.COM?
• “The Hotel Tell-All”
• Authentic hotel reviews and photos
• We visit every hotel in person
• 1000 hotels per month
• 7M high-res photos
• 100k 360° panoramas

Page 3: Snowplow Analytics and Looker at Oyster.com

(SOME OF) OUR TECH STACK

• Python to run our backend: web, scripting, photo processing, ETL
• PostgreSQL for all content data (e.g. hotels, metadata for 12M images)
• Amazon S3 for image storage, EC2 spot instances for photo processing
• Amazon Redshift for analytics and reporting data
• Looker for reporting and visualizations
• Snowplow for analytics tracking and analytics ETL

Page 4: Snowplow Analytics and Looker at Oyster.com

GOOGLE ANALYTICS V. SNOWPLOW

Google Analytics
• Good for web, but little control and flexibility
• Hard to get data out of (your data!)
• Crazy pricing model ($0 for free tier, or $150,000/year for premium)
• Can only do web analytics, not other business reporting

Snowplow
• Free and open source, with great support and paid tiers
• Puts data into a standard, easily-queryable database (Redshift)
• Focuses on tracking and analytics ETL and does that part well

Page 5: Snowplow Analytics and Looker at Oyster.com

WHY & HOW WE SWITCHED (1 YEAR AGO)

• We were considering Looker for reporting and visualization
• Looker rep: “majority of our customers use Snowplow to collect their data”
• We dug into Snowplow and liked what we saw
• Initially the design felt a bit overkill, but it’s definitely built to scale
• We implemented the tracking and pipeline, and haven’t looked back

Page 6: Snowplow Analytics and Looker at Oyster.com

OUR CONTEXT SCHEMA
• We use one “custom fields” schema to rule them all
• Simple, one table, one SQL join gives us all our custom fields

    {
      "self": {"name": "custom_fields", "vendor": "com.oyster", "version": "1-0-9"},
      "properties": {
        "page_type": {"type": "string"},
        "page_subtype": {"type": "string"},
        "template_type": {"type": "string", "enum": ["desktop", "mobile"]},

        "hotel_id": {"$ref": "#/definitions/positiveInteger32"},
        "account_id": {"$ref": "#/definitions/positiveInteger32"},

        "ab_cell": {"type": "integer", "minimum": 1, "maximum": 20},
        "checkin_date": {"type": "string", "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"},
        ...

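The "one SQL join" point can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database in place of Redshift. The table names (`events`, `custom_fields`), the `root_id` join key (borrowed from Snowplow's usual convention of linking context rows to their parent event), the column subset, and all sample rows are assumptions for illustration, not Oyster's actual tables.

```python
import sqlite3

# Hypothetical tables: an events table keyed by event_id, and a single
# custom_fields context table whose root_id points back at the parent event.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id TEXT PRIMARY KEY, event TEXT, page_urlpath TEXT);
    CREATE TABLE custom_fields (
        root_id TEXT,
        page_type TEXT,
        template_type TEXT,
        hotel_id INTEGER,
        ab_cell INTEGER
    );
    INSERT INTO events VALUES
        ('e1', 'page_view', '/hotels/example-a'),
        ('e2', 'page_view', '/hotels/example-b');
    INSERT INTO custom_fields VALUES
        ('e1', 'hotel', 'desktop', 101, 3),
        ('e2', 'hotel', 'mobile', 202, 7);
""")

# A single join brings every custom field alongside the standard event columns.
rows = conn.execute("""
    SELECT e.event_id, e.page_urlpath,
           c.page_type, c.template_type, c.hotel_id, c.ab_cell
    FROM events AS e
    JOIN custom_fields AS c ON c.root_id = e.event_id
    ORDER BY e.event_id
""").fetchall()

for row in rows:
    print(row)
```

With one schema there is only ever one context table to join, which is the simplicity the slide is pointing at. The trade-off is that every new custom field means a new version of the same schema.
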
Page 7: Snowplow Analytics and Looker at Oyster.com

OUR DATASET

• A large, though not a massive, dataset
  • Redshift cluster: 6 dc1.large SSD nodes, ~1TB storage
  • 640 million rows in our events table
  • We add 1.5 million event rows per day
• We copy (a subset of) our PostgreSQL content database into Redshift nightly
  • Enables business reporting and advanced content-based queries
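
As a rough sanity check on these figures, the row count and daily growth can be compared directly. This assumes a constant ingestion rate, which the slides do not claim, so the result is only an order-of-magnitude estimate.

```python
# Figures taken from the slide; the constant-rate assumption is ours.
total_rows = 640_000_000
rows_per_day = 1_500_000

days_of_data = total_rows / rows_per_day
print(f"{days_of_data:.0f} days, about {days_of_data / 365:.1f} years")
# prints "427 days, about 1.2 years"
```

That is in line with the switch to Snowplow having happened roughly a year before the talk.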

Page 8: Snowplow Analytics and Looker at Oyster.com

PAGE TRACKING EXAMPLE
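
A tracked page view would normally carry the custom fields as a Snowplow self-describing JSON context. The following is a minimal sketch of building one. The helper name `build_custom_fields_context` and the sample values are hypothetical, and the Iglu URI is assembled from the `self` block of the context schema. Which fields are required is not visible in the schema excerpt, so treating `page_type`, `template_type` and `ab_cell` as mandatory here is an assumption.

```python
import json
import re

# Assembled from the "self" block of the schema (vendor, name, version).
SCHEMA_URI = "iglu:com.oyster/custom_fields/jsonschema/1-0-9"
CHECKIN_DATE_PATTERN = r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$"


def build_custom_fields_context(page_type, template_type, ab_cell, checkin_date=None):
    """Build a self-describing context, checking the constraints shown in the schema."""
    if template_type not in ("desktop", "mobile"):
        raise ValueError(f"template_type must be desktop or mobile, got {template_type!r}")
    if not isinstance(ab_cell, int) or not 1 <= ab_cell <= 20:
        raise ValueError(f"ab_cell must be an integer from 1 to 20, got {ab_cell!r}")
    data = {"page_type": page_type, "template_type": template_type, "ab_cell": ab_cell}
    if checkin_date is not None:
        if re.fullmatch(CHECKIN_DATE_PATTERN, checkin_date) is None:
            raise ValueError(f"checkin_date must look like YYYY-MM-DD, got {checkin_date!r}")
        data["checkin_date"] = checkin_date
    return {"schema": SCHEMA_URI, "data": data}


context = build_custom_fields_context("hotel", "mobile", ab_cell=7, checkin_date="2016-03-30")
print(json.dumps(context, indent=2))
```

Checking values before they are sent catches bad data at the source. Snowplow's pipeline also validates events against the schema later on.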

Page 9: Snowplow Analytics and Looker at Oyster.com

ANALYTICS AND LOOKER (DEVON POHL)

Page 10: Snowplow Analytics and Looker at Oyster.com

REPORTING
• Snowplow and content data are merged to provide insights into:
  • Product
    • A/B testing
    • Funnel mapping
  • Marketing
    • SEO monitoring
    • Ad Campaigns
  • Operations
    • Workflow Optimization
    • ROI Modeling
  • Business Trends
    • Traffic
    • Revenue

Page 11: Snowplow Analytics and Looker at Oyster.com

VISIT TABLE
• Event data is large and granular – often hard to digest
• The most valuable pre-processing we do is building the visit table
• Built incrementally by a Python ETL job that runs on Redshift
• This is key to most of our reporting infrastructure
• Combines events and custom fields data
• This visit table:
  • Is granular to the user and user session ID (one row per visit)
  • Includes counts of a variety of event types
  • Includes all information associated with the first event of a visit
    • A/B testing cells
    • Referral information
    • Etc.
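
The shape of such a visit table can be sketched in plain Python. This is an in-memory illustration only: the real job runs incrementally on Redshift, and its logic is not shown in the slides. The field names `domain_userid` and `domain_sessionidx` follow Snowplow's standard event columns. The function name `build_visits`, the remaining field names, and the sample rows are assumptions.

```python
from collections import Counter

# Hypothetical event rows, already joined with their custom fields.
events = [
    {"domain_userid": "u1", "domain_sessionidx": 1, "tstamp": "2016-03-01 10:00:00",
     "event": "page_view", "ab_cell": 3, "referrer": "google.com"},
    {"domain_userid": "u1", "domain_sessionidx": 1, "tstamp": "2016-03-01 10:01:00",
     "event": "page_view", "ab_cell": 3, "referrer": "oyster.com"},
    {"domain_userid": "u1", "domain_sessionidx": 1, "tstamp": "2016-03-01 10:02:00",
     "event": "link_click", "ab_cell": 3, "referrer": "oyster.com"},
    {"domain_userid": "u2", "domain_sessionidx": 1, "tstamp": "2016-03-01 11:00:00",
     "event": "page_view", "ab_cell": 7, "referrer": "bing.com"},
]


def build_visits(events):
    """Collapse event rows into one row per (user, session) pair."""
    visits = {}
    # Process in time order so the first event seen for a visit is its earliest.
    for ev in sorted(events, key=lambda e: e["tstamp"]):
        key = (ev["domain_userid"], ev["domain_sessionidx"])
        visit = visits.get(key)
        if visit is None:
            # Visit-level attributes come from the first event of the visit.
            visit = visits[key] = {
                "domain_userid": ev["domain_userid"],
                "domain_sessionidx": ev["domain_sessionidx"],
                "first_tstamp": ev["tstamp"],
                "ab_cell": ev["ab_cell"],
                "referrer": ev["referrer"],
                "event_counts": Counter(),
            }
        visit["event_counts"][ev["event"]] += 1
    return list(visits.values())


visits = build_visits(events)
for visit in visits:
    print(visit)
```

Reports on A/B cells or referrers can then query one compact row per visit instead of scanning the raw events.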

Page 12: Snowplow Analytics and Looker at Oyster.com

LOOKER

• Looker is our core data exploration and reporting tool
  • Web-based YAML + visualization wrapper on Redshift
  • Enables non-technical business owners to self-serve reporting and exploration
• Used for other pre-processing via persistent derived tables (PDTs)
  • PDTs are temporary tables, each defined by a query, that Looker builds and manages
  • Good for small-to-medium size pre-processing
  • Applications include de-duping and revenue attribution
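
As an illustration of the de-duping use case, here is the kind of query a derived table could be defined by, run against an in-memory SQLite database in place of Redshift. Only the SQL is sketched; the Looker syntax for declaring a PDT is not shown. The table layout, the keep-the-earliest-row rule, and the sample data are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id TEXT, collector_tstamp TEXT, event TEXT);
    INSERT INTO events VALUES
        ('e1', '2016-03-01 10:00:00', 'page_view'),
        ('e1', '2016-03-01 10:00:05', 'page_view'),
        ('e2', '2016-03-01 10:01:00', 'link_click');
""")

# Keep one row per event_id, taking the earliest collector timestamp.
DEDUPE_SQL = """
    SELECT event_id, MIN(collector_tstamp) AS collector_tstamp, event
    FROM events
    GROUP BY event_id, event
    ORDER BY event_id
"""

deduped = conn.execute(DEDUPE_SQL).fetchall()
for row in deduped:
    print(row)
```

Materializing the result as a table means reports read the cleaned rows directly and do not repeat the aggregation on every query.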

Page 13: Snowplow Analytics and Looker at Oyster.com

DASHBOARDS / SAVED REPORTS

Page 14: Snowplow Analytics and Looker at Oyster.com

EXPLORATION

Page 15: Snowplow Analytics and Looker at Oyster.com

OYSTER.COM
The Hotel Tell-All