Rob Winters Stefano Oldeman

Big Data at a Gaming Company: Spil Games

Jul 17, 2015


Data & Analytics

Page 1: Big Data at a Gaming Company: Spil Games

Rob Winters

Stefano Oldeman

Page 2: Big Data at a Gaming Company: Spil Games

• Introductions – Rob, Stefano, Spil

• Analytics at Spil, the Journey: Questions that drove the evolution

• Architectural Overview: Designing successful analytics architecture

• Analytical Case Studies: How can you change the business?

▫ Information empowerment (Self-service BI)

▫ Data mining/Predictive Analytics

▫ Personalization

• Key Learnings: What are mistakes for you to avoid?

Today’s Agenda

Page 3: Big Data at a Gaming Company: Spil Games

“The value of an idea lies in the using of it.” - Thomas Edison

“A person who is gifted sees the essential point and leaves the rest as surplus.” - Thomas Carlyle

“If you torture the data long enough, it will confess.” - Ronald Coase

Page 4: Big Data at a Gaming Company: Spil Games

Rob Winters

• Current role: Head of Data Technology, de Bijenkorf

• Formerly Director of Analytics, Spil Games

• Eight years in analytics, four in leadership roles

• Industries include telecom, gaming, retail, and e-commerce

Spil Games

• Web and mobile gaming company based in Hilversum

• >150 million monthly unique visitors, >1 billion gameplays monthly

• Activity measured in >150 countries across ~50 sites

Stefano Oldeman

• Current role: Big Data Developer, Shop2market

• Formerly Developer in the Big Data program, Spil Games

• Four years in high-availability / high-performance applications; three years building BI solutions

Page 5: Big Data at a Gaming Company: Spil Games

The Story of our Journey

Page 6: Big Data at a Gaming Company: Spil Games

What drove Spil to Analytics?

We were here

2014 version

Stagnating growth, needed to act differently

Buzzword ideas without understanding

Zynga was growing by using data, Spil’s growth was slowing

“Data-driven” was a hot buzzword in 2007

2014’s version: “I need a Hadoop!”

Page 7: Big Data at a Gaming Company: Spil Games

• Nightly copies of production data into Postgres

• No integration model for data sources

• Event tracking = Google Analytics

• Reporting was “send out the numbers” rather than “analyze data and answer questions”

Page 8: Big Data at a Gaming Company: Spil Games

Starting the Big Data program

Rationale:

• We needed higher connectivity, flexibility, and quicker insights than were possible with existing solutions

• Wanted to “own” our own data versus having it on an external system

First steps:

• Answer “what do we want to track?”

• Deploy the fundamental components:

▫ Tracking library

▫ Logging infrastructure

▫ MapReduce platform

▫ Scheduling

• Start basic event tracking

Page 9: Big Data at a Gaming Company: Spil Games

Systems and Architecture

Page 10: Big Data at a Gaming Company: Spil Games

Architectural Overview

Plus a scheduler!

Page 11: Big Data at a Gaming Company: Spil Games

Event tracking principles

• Think of it as Information architecture

• Each event should refer to an actual business-user interaction

• Use multiple events over time to tell what happened

• Never tell the system what did NOT happen

• Agree upfront on structures and definitions that explain your business
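
The principles above can be sketched in a few lines of Python. This is only an illustration: the `Event` structure, its field names, and the two event names are hypothetical and are not Spil's actual schema. It shows events that each record a real interaction, an agreed-upon list of event names, and how something that did NOT happen (a game that was never finished) is derived from several events instead of being logged.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical agreed-upon event names; a real list would come from the business.
ALLOWED_EVENTS = {"game_started", "game_finished"}


@dataclass(frozen=True)
class Event:
    event_name: str
    user_id: str
    game_id: str
    timestamp: int  # seconds since epoch
    properties: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the definitions that were agreed on upfront.
        if self.event_name not in ALLOWED_EVENTS:
            raise ValueError(f"unknown event: {self.event_name}")


def unfinished_games(events: list[Event]) -> set[tuple[str, str]]:
    """Derive what did not happen from the events that did happen."""
    started = {(e.user_id, e.game_id) for e in events if e.event_name == "game_started"}
    finished = {(e.user_id, e.game_id) for e in events if e.event_name == "game_finished"}
    return started - finished


events = [
    Event("game_started", "u1", "g1", 100),
    Event("game_finished", "u1", "g1", 160),
    Event("game_started", "u2", "g1", 120),
]
print(unfinished_games(events))  # {('u2', 'g1')}
```

No "game_not_finished" event is ever sent; the absence is computed at analysis time from the events that were tracked.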

Page 12: Big Data at a Gaming Company: Spil Games

Challenges with data pipelines

Think in pipes

Things will fail somewhere

Be generic

Keep moving until the end
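
One possible reading of these four points, sketched in Python under stated assumptions: the `run_pipeline` helper and the two example stages are hypothetical, not the pipeline Spil built. Stages are generic functions chained like pipes, a record that fails is set aside with its error, and the run keeps moving until every record has been attempted.

```python
from typing import Any, Callable, Iterable

Record = dict[str, Any]
Stage = Callable[[Record], Record]


def run_pipeline(
    records: Iterable[Record], stages: list[Stage]
) -> tuple[list[Record], list[tuple[Record, str]]]:
    """Push every record through every stage; set failures aside and keep going."""
    done: list[Record] = []
    failed: list[tuple[Record, str]] = []
    for record in records:
        try:
            current = record
            for stage in stages:
                current = stage(current)
            done.append(current)
        except Exception as exc:  # things will fail somewhere
            failed.append((record, repr(exc)))
    return done, failed


# Two hypothetical stages; any function from record to record fits.
def parse_score(record: Record) -> Record:
    return {**record, "score": int(record["score"])}


def add_level(record: Record) -> Record:
    return {**record, "level": record["score"] // 100}


raw = [{"score": "250"}, {"score": "oops"}, {"score": "90"}]
done, failed = run_pipeline(raw, [parse_score, add_level])
print(done)         # [{'score': 250, 'level': 2}, {'score': 90, 'level': 0}]
print(len(failed))  # 1
```

The failed records keep their original form, so they can be inspected and pushed through again once the cause is fixed.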

Page 13: Big Data at a Gaming Company: Spil Games

• Two-tier architecture: Hadoop/Disco for “big data” persistence and processing, analytical database for data warehousing and analytics

• All data persisted in Hadoop

• Some data is made available in your DB

• Offload big data calculations to Hadoop

• When data is not complete or business logic changes: Replay data from your Hadoop to your DB

The Data Lake
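
The replay idea can be illustrated with a small Python sketch. Everything in it is a stand-in chosen for illustration: an in-memory list plays the role of the raw data kept in Hadoop, SQLite plays the role of the analytical database, and the `daily_gameplays` table and the `min_seconds` rule are hypothetical. When the business logic changes, the affected day is recomputed from the raw records and swapped into the database.

```python
import sqlite3

# Stand-in for raw events persisted in Hadoop (hypothetical records).
RAW_EVENTS = [
    {"day": "2015-07-01", "user": "u1", "seconds": 30},
    {"day": "2015-07-01", "user": "u2", "seconds": 5},
    {"day": "2015-07-02", "user": "u1", "seconds": 45},
]


def replay_day(db: sqlite3.Connection, day: str, min_seconds: int) -> None:
    """Rebuild one day of the derived table from the raw data."""
    count = sum(
        1 for e in RAW_EVENTS if e["day"] == day and e["seconds"] >= min_seconds
    )
    with db:  # one transaction: drop the old rows, insert the recomputed ones
        db.execute("DELETE FROM daily_gameplays WHERE day = ?", (day,))
        db.execute(
            "INSERT INTO daily_gameplays (day, gameplays) VALUES (?, ?)", (day, count)
        )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE daily_gameplays (day TEXT PRIMARY KEY, gameplays INTEGER)")

replay_day(db, "2015-07-01", min_seconds=0)   # original definition: every play counts
replay_day(db, "2015-07-01", min_seconds=10)  # definition changed: replay the day
print(db.execute("SELECT * FROM daily_gameplays").fetchall())  # [('2015-07-01', 1)]
```

Because the raw records are never modified, the same replay can be repeated whenever a definition changes or late data arrives.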

Page 14: Big Data at a Gaming Company: Spil Games

The right tool for the right job: ETL tools plus raw code

Load first, integrate later – ELT versus ETL

Everybody lies. Manage your own metadata and provide a feedback loop

Page 15: Big Data at a Gaming Company: Spil Games

Vertica: Our column store data warehouse

• The goal: offer users complete data in a high performance environment

• Reporting namespace for normalized tables

▫ Names are user friendly

▫ Optimized for drag and drop queries / reports

• Users escalate when they find incorrect data

▫ This feedback is then processed in the data pipeline

▫ Data is processed again to correct mistakes

Why not just Hadoop? Source merging, ad hoc, run-time query performance

Page 16: Big Data at a Gaming Company: Spil Games

Tools for the use case: visual analytics, standard dashboards, statistical environment

Analysts need to use development best practices – version control, deployment mechanisms, metadata-driven models

Everyone else needs something simple, intuitive, and FUN

Performance is critical. You have <5 seconds to load a report

Page 17: Big Data at a Gaming Company: Spil Games

Use Cases

Page 18: Big Data at a Gaming Company: Spil Games

Primary Objective: An organization that can Formulate, Ask, Explore, and Answer questions using data

Page 19: Big Data at a Gaming Company: Spil Games

Engaging the frontline is not the same as management support

Balancing operational needs versus management needs

Scale the BI team support at a better than linear rate

Roadshows, 1:1 sessions, and informal learn@lunches to discuss data questions

Centralize your systems, distribute your support via power users

Challenge all requests equally on a value basis; fit “tweaks” to dev windows

Page 20: Big Data at a Gaming Company: Spil Games

Avoid presumptions in keys; Avoid “interpretation”; keep the raw records!

How do we enforce consistency without limiting future flexibility?

Build to FAIL – jobs should be able to be run at any time, repeatedly, without requiring intervention

How can we be resilient, fail gracefully, and recover automatically?
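
A minimal sketch of a job that can be run at any time, repeatedly, without intervention. The `run_daily_job` function, the file naming, and the sample rows are hypothetical. The output for a day is always recomputed in full and swapped into place atomically, so a rerun overwrites the earlier result instead of appending to it, and a crash mid-write leaves no partial output under the final name.

```python
import json
import os
import tempfile


def run_daily_job(raw_rows: list[dict], day: str, out_dir: str) -> str:
    """Write the aggregate for `day`; safe to rerun at any time."""
    total = sum(r["plays"] for r in raw_rows if r["day"] == day)
    target = os.path.join(out_dir, f"plays_{day}.json")
    # Write to a temporary file first, then atomically swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"day": day, "plays": total}, tmp)
    os.replace(tmp_path, target)
    return target


rows = [{"day": "2015-07-01", "plays": 3}, {"day": "2015-07-01", "plays": 4}]
with tempfile.TemporaryDirectory() as out_dir:
    run_daily_job(rows, "2015-07-01", out_dir)
    path = run_daily_job(rows, "2015-07-01", out_dir)  # rerun: same result, no duplicates
    with open(path) as f:
        print(json.load(f))     # {'day': '2015-07-01', 'plays': 7}
    print(os.listdir(out_dir))  # ['plays_2015-07-01.json']
```

A scheduler can then simply retry a failed run, because running the job twice gives the same state as running it once.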

It’s in the tooling – use systems that don’t require pre-aggregation or complex end-user querying

How can we allow deep exploration without compromising performance?

Page 21: Big Data at a Gaming Company: Spil Games
Page 22: Big Data at a Gaming Company: Spil Games

Is our technology ready to support personalization?

Can we use data to (semi) automatically improve our business?

Page 23: Big Data at a Gaming Company: Spil Games
Page 24: Big Data at a Gaming Company: Spil Games

There are known knowns – Donald Rumsfeld

APIs to integrate with product

Releases and Servers

Page 25: Big Data at a Gaming Company: Spil Games
Page 26: Big Data at a Gaming Company: Spil Games

Can we create additional user and business value by delivering an individual experience to (almost) everyone?

Page 27: Big Data at a Gaming Company: Spil Games
Page 28: Big Data at a Gaming Company: Spil Games

Loading data into production

Non-happy flow...

Page 29: Big Data at a Gaming Company: Spil Games
Page 30: Big Data at a Gaming Company: Spil Games

Key Learnings

Page 31: Big Data at a Gaming Company: Spil Games

• Plan for 10x more than today, design for 100x more

• Build versus off-the-shelf: I’ve built an event tracker

• Testing on production: NOT OPTIONAL

• “Cheap” isn’t always cheap

▫ Expensive software which offsets hardware and (most importantly) people costs can often deliver much lower TCO

Page 32: Big Data at a Gaming Company: Spil Games

• Hadoop is great to work on; there are so many tools.

▫ But you don’t want to worry about the infrastructure (outsourced)

• And yet, the developers and infra engineers have to work closely together

▫ Users/scripts should not store small files (who should/can influence this?)

• Simple is better than complex (Flume + Avro file format).

• Security...

▫ Problem 1: user X uploads a file, Hive can’t read it

▫ Problem 2: user X creates a table with Hive, but user Y can’t write to it

• Conclusion:

▫ Clear agreements on who can do what, and follow defined requirements set by the users (TTL of files, file formats, when to upgrade).

What We Learned about Hadoop

Page 33: Big Data at a Gaming Company: Spil Games

• Reduce your analytical cycles!

▫ 15-minute query time = <20 iterations per day; 2-minute query = >100

• Walk before you run

• Be wary of the HIPPO (Highest Paid Person in the Organization)

• Focus on developing internal bench strength and power users above the general organization

• Old tricks are the best tricks (when done right)

What we learned about analytics
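
The arithmetic behind the first point, on analytical cycles, can be sketched as follows. The daily budget of 270 minutes (about 4.5 hours spent waiting on queries) is an assumption picked purely for illustration; the slides do not state how the iteration counts were derived.

```python
def iterations_per_day(query_minutes: int, query_minutes_per_day: int = 270) -> int:
    """How many query/inspect/refine cycles fit into the assumed daily budget."""
    return query_minutes_per_day // query_minutes


print(iterations_per_day(15))  # 18
print(iterations_per_day(2))   # 135
```

Under that assumption a 15-minute query allows fewer than 20 iterations a day, while a 2-minute query allows well over 100.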

Page 34: Big Data at a Gaming Company: Spil Games

The greatest value of a picture is when it forces us to notice what we never expected to see.

-John Tukey