© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Scaling your Application for Growth using Automation. November 14, 2013. Ken Leung, Euclid Analytics; Greg Narain, Chute.

Scaling your Application for Growth using Automation (CPN209) | AWS re:Invent 2013

May 11, 2015



Growing too quickly may sound like a nice problem to have, unless you are the one having it. A growing business can’t afford not to keep up with customer demand and availability. Don’t be left behind. Come learn how start-ups Chute and Euclid kept up with real-time user-generated data from over 3,000 apps and 2 TB of metadata and stayed ahead of retail peak-time traffic, all with AWS. Hear how they used all that data on their own growth to propel their business even further and deepen relationships with customers. Not planning for growth is just like not planning to grow!
Transcript
Page 1

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Scaling your Application for Growth using Automation

November 14, 2013

Ken Leung, Euclid Analytics

Greg Narain, Chute

Page 2

What is Euclid?

Page 3

Online Analytics for the Offline World

[Figure labels: E-Commerce; Physical Stores]

Page 4

How Euclid Works

We use Wi-Fi technology to turn in-store behavior into actionable insights

• Shopper carrying smartphone walks by or into store
• Wi-Fi AP detects smartphone MAC addresses (XX:XX:XX:XX:XX:XX)
• Euclid analyzes data for trends and insights
• Insights on customer acquisition, engagement and retention

Page 5

Market Leader in Real World Analytics

• First to develop proprietary Wi-Fi based analytics
– Most advanced data analytics capabilities and experience in retail environments
– Backed by tier 1 investors: Series A led by NEA, Series B led by Benchmark Capital
• World-class executive team
– Co-founder of Google Analytics, Founding team of ShopperTrak
– Executive experience from Google, SAP, Ariba and Tibco
• Experience with the world’s leading retailers
– Specialty retail, QSR, department store, big box, automotive, malls and more
• Largest data scale and rapidly accelerating adoption
– Recording >5B events per day
– Dataset with >100M unique devices (shoppers)
• Market leadership recognized by:
– Gartner Cool Vendor 2012; Idea Innovation Award Winner: Business Technology 2012

Page 6

Euclid is a Data Company

Acquire Data
• Reliable
• Durable
• Scalable

Process Data
• Efficient
• Flexible
• Scalable
• Versatile

Deliver Data
• Richness
• Sophistication
• Value

As of October 2013, the Euclid Network:
• Covers over 600 shopping centers, malls, and street locations
• Processes 50 TB of raw data
• Collects over 30 GB of raw data daily

Page 7

Euclid’s Challenges

Common Challenges
• Scaling
• Performance
• Cost effectiveness
• Removing the technical barriers to innovation
• “Failing fast”

Unique Challenges
• Recomputing the entire history of Euclid data!
– Need fast results
– Need a lot of computational power, sometimes more than 100x the regular daily compute needs
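The "more than 100x" figure follows from simple arithmetic: reprocessing many days of history within a short deadline multiplies the daily compute requirement. The sketch below works through that calculation in Java. It assumes processing cost grows linearly with the number of days of data; the class name, method names, and sample numbers are hypothetical and do not come from the talk.

```java
/** Back-of-the-envelope sizing for a full-history recompute (hypothetical helper). */
final class BackfillSizing {
    private BackfillSizing() {}

    /**
     * Returns how many times the regular daily compute capacity is needed
     * to reprocess historyDays of data within deadlineDays.
     * Assumes cost grows linearly with the number of days of data.
     */
    static double computeMultiplier(int historyDays, int deadlineDays) {
        if (historyDays < 0 || deadlineDays <= 0) {
            throw new IllegalArgumentException("historyDays >= 0 and deadlineDays > 0 required");
        }
        return (double) historyDays / deadlineDays;
    }

    /** Rounds the scaled node count up, since a partial node cannot be launched. */
    static int nodesNeeded(int dailyNodes, int historyDays, int deadlineDays) {
        return (int) Math.ceil(dailyNodes * computeMultiplier(historyDays, deadlineDays));
    }

    public static void main(String[] args) {
        // Made-up numbers: 600 days of history, results wanted in 3 days.
        System.out.println(computeMultiplier(600, 3)); // 200.0
        System.out.println(nodesNeeded(4, 600, 3));    // 800
    }
}
```

With these sample inputs, a pipeline that normally runs on 4 nodes would need roughly 800 for three days, which is the kind of short-lived burst that on-demand capacity makes practical.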

Page 8

Euclid’s Use of AWS

Euclid started with AWS from Day One
- Amazon EC2, Amazon RDS, Amazon EMR, Amazon S3
- AWS Elastic Beanstalk
- Amazon Redshift

Heroku from Amazon Partner Network (APN)

Page 9

Architecture

Page 10

Data Acquisition

Elastic Beanstalk

- Multi-AZ, multi-region

- Load balancing, auto scaling

- Monitoring, notification

- Deployment Management

- Amazon EBS-backed volume for failover data recovery

- Log rotation to Amazon S3 (99.999999999% durability)

All built-in.

Page 11

Data Acquisition - code

<%@ page import="java.io.*,java.util.*,com.euclid.spongebob.server.*" %><%
    // Sensor credentials are read from the servlet context.
    Properties sensorCredentials = (Properties) this.getServletContext().getAttribute("sensor_credentials");
    String sensor_id = request.getParameter("sensor_id");
    String credential = request.getParameter("credential");
    String body = request.getParameter("body");

    // Reject unknown sensors and mismatched credentials.
    if (sensor_id == null || !sensorCredentials.containsKey(sensor_id) ||
            !sensorCredentials.getProperty(sensor_id).equals(credential)) {
        response.sendError(HttpServletResponse.SC_UNAUTHORIZED);
        return;
    }

    // Log the raw payload and acknowledge; no other processing happens here.
    java.util.logging.Logger logger = java.util.logging.Logger.getLogger("spongebob");
    logger.log(java.util.logging.Level.INFO, body);
    response.setStatus(HttpServletResponse.SC_OK);
%>
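The credential check in the page above can be pulled out into a plain method so it can be exercised without a servlet container. This is a minimal sketch, not Euclid's code: the class name `SensorAuth`, the method name `isAuthorized`, and the sample credentials are hypothetical. It also rejects a null credentials table and a null credential outright, which the page above does not do.

```java
import java.util.Properties;

/** Stand-alone version of the sensor credential check (hypothetical class name). */
final class SensorAuth {
    private SensorAuth() {}

    /**
     * Returns true only when sensorId is known and its stored credential
     * equals the supplied one. Null inputs are rejected instead of throwing.
     */
    static boolean isAuthorized(Properties sensorCredentials, String sensorId, String credential) {
        if (sensorCredentials == null || sensorId == null || credential == null) {
            return false;
        }
        String expected = sensorCredentials.getProperty(sensorId);
        return expected != null && expected.equals(credential);
    }

    public static void main(String[] args) {
        Properties creds = new Properties();
        creds.setProperty("sensor-1", "s3cret"); // made-up sample data
        System.out.println(isAuthorized(creds, "sensor-1", "s3cret")); // true
        System.out.println(isAuthorized(creds, "sensor-1", "wrong"));  // false
        System.out.println(isAuthorized(creds, "sensor-2", "s3cret")); // false
    }
}
```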

Page 12

Data Acquisition - Principles

• Log to Amazon EBS Volume – high I/O performance
• As “dumb” as possible: reliable
• Fork data from disk to
– Amazon S3 for batch processing
– Kafka messaging service for real time processing
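The "fork" principle above can be sketched as a small fan-out: every logged record goes to a batch sink and to a real-time sink. In this illustration the two sinks are plain `Consumer<String>` stand-ins for an Amazon S3 uploader and a Kafka producer. The class name `RecordForker` and its failure-counting behaviour are assumptions, not details from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Fans each logged record out to a batch sink and a real-time sink (hypothetical sketch). */
final class RecordForker {
    private final Consumer<String> batchSink;    // stand-in for an Amazon S3 uploader
    private final Consumer<String> realTimeSink; // stand-in for a Kafka producer

    RecordForker(Consumer<String> batchSink, Consumer<String> realTimeSink) {
        this.batchSink = batchSink;
        this.realTimeSink = realTimeSink;
    }

    /** Sends every record to both sinks and returns the number of failed deliveries. */
    int fork(List<String> records) {
        int failures = 0;
        for (String record : records) {
            failures += deliver(batchSink, record);
            failures += deliver(realTimeSink, record);
        }
        return failures;
    }

    /** Isolates sink errors so a failure in one sink does not block the other. */
    private static int deliver(Consumer<String> sink, String record) {
        try {
            sink.accept(record);
            return 0;
        } catch (RuntimeException e) {
            return 1; // a real system would retry or alert here
        }
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        List<String> stream = new ArrayList<>();
        RecordForker forker = new RecordForker(batch::add, stream::add);
        int failures = forker.fork(Arrays.asList("event-a", "event-b"));
        System.out.println(batch + " " + stream + " failures=" + failures);
    }
}
```

Because the records are already on disk before they are forked, a failed delivery can be replayed from the log rather than lost, which keeps the acquisition endpoint itself simple.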

Page 13

Data Acquisition – System Monitor

• Low latency

• Low CPU utilization

Page 14

Data Processing - Pipeline

[Diagram labels: Raw Data; MapReduce (EMR); R/D; Analytics; Product dashboard, insights]

Page 15

Pipeline – Dual Purposes

Two worlds, one platform

• Big Data Engineering – NoSQL
– Pig Latin with Amazon EMR (Java, Python UDFs)
– Workflows (Jenkins), shell scripting
• Analytics, Analysts, Business – SQL
– Excel
– Tableau
– Maybe some Python, etc.

Page 16

Pipeline - Architecture

[Architecture diagram labels:]
• MapReduce (Amazon S3): Raw Data; Aggr. Level 1 through Aggr. Level n; Meta Data; 3rd Party Data; Models; Algorithms; R&D
• SQL (SQL DB: MySQL, Redshift): Some Raw Data; Aggr. Level 1 through Aggr. Level n; Meta Data; 3rd Party Data; Models; Algorithms; Analytics
• Direct DB Load
• MySQL; Product dashboard, insights

Page 17

SQL: MySQL, Amazon Redshift, both provided by AWS

• Started with MySQL; Amazon Redshift preview from Jan 2013
• MySQL 1 TB limit vs. Amazon Redshift PB scale
• Performance: night and day
– E.g., count distinct over 100M rows: 5 hours in MySQL, 2 minutes in Amazon Redshift
• Amazon Redshift: killer data warehouse
– Low cost
– No DBA!
– Easy integration

Page 18

Pipeline - Monitoring

• System monitoring provided by AWS
• Workflow monitoring with Jenkins
– Failure notification
– Dependency management
• Data quality (including acquisition) monitoring
– Also utilize Jenkins
– Scripts that check data at various stages
– Each script as a job in the Jenkins workflow
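One of the data-quality scripts described above might look like the following row-count sanity check. This is a hedged illustration only: the talk does not show its scripts, and the class name `RowCountCheck`, the tolerance rule, and the command-line arguments are all assumptions. It relies on the common convention that a build step exiting with a non-zero status is reported by Jenkins as a failed build, which in turn can trigger the failure notification.

```java
/** Row-count sanity check meant to run as one pipeline job (hypothetical sketch). */
final class RowCountCheck {
    private RowCountCheck() {}

    /** Passes when today's count is within the tolerance (0.5 means +/-50%) of the baseline. */
    static boolean withinTolerance(long todayCount, long baselineCount, double tolerance) {
        if (baselineCount <= 0 || tolerance < 0) {
            return false; // no usable baseline: treat as a failed check
        }
        double deviation = Math.abs(todayCount - baselineCount) / (double) baselineCount;
        return deviation <= tolerance;
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("Usage: java RowCountCheck <todayCount> <baselineCount> <tolerance>");
            System.exit(2);
        }
        long today = Long.parseLong(args[0]);
        long baseline = Long.parseLong(args[1]);
        double tolerance = Double.parseDouble(args[2]);
        if (!withinTolerance(today, baseline, tolerance)) {
            System.err.println("Row count " + today + " deviates too far from baseline " + baseline);
            System.exit(1); // non-zero exit signals failure to the calling job
        }
        System.out.println("Row count OK");
    }
}
```

Keeping each check this small makes it easy to add one job per pipeline stage and let the dependency graph stop downstream jobs when an upstream check fails.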

Page 19

Pipeline - Workflow

Part of the Jenkins Dependency Graph

Page 20

AWS Benefits

• “Apps not Ops” – Euclid does not have/need an Ops team

• Scale up and down on demand

• Pay as we go

• Agile (innovations, time-to-market)

Page 21

Chute

1. Data

2. Automation

3. Uptime

4. Monitoring

Page 22

Data

● Real time analytics is hard

● Hadoop!

○ Sqoop imports SQL data to HDFS

○ Clojure

○ Scalding (github.com/twitter/scalding)

● Elasticsearch, Logstash

○ parse logs to track activity for customers
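The idea of parsing logs to track per-customer activity can be illustrated in a few lines of plain Java. This sketch is not a Logstash configuration and does not use Elasticsearch; it swaps in a regular expression and an in-memory map to show the underlying technique. The log line format, the `customer=` field, and the class name `ActivityCounter` are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts requests per customer from log lines (hypothetical log format). */
final class ActivityCounter {
    // Assumed line shape: "<timestamp> customer=<id> path=<path>"
    private static final Pattern CUSTOMER = Pattern.compile("\\bcustomer=(\\w+)");

    private ActivityCounter() {}

    /** Returns a sorted map of customer id to number of matching log lines. */
    static Map<String, Integer> countByCustomer(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CUSTOMER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
            // Lines without a customer field are skipped.
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2013-11-14T10:00:00Z customer=acme path=/albums",
                "2013-11-14T10:00:01Z customer=globex path=/media",
                "2013-11-14T10:00:02Z customer=acme path=/media",
                "malformed line");
        System.out.println(countByCustomer(lines)); // {acme=2, globex=1}
    }
}
```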

Page 23

[Diagram labels: Sharded Postgres; Sqoop server; S3 / HDFS; Hadoop cluster or EMR]

Page 24

[Diagram labels: API and plugin front ends; ELB; N EC2 instances (varnish, logstash); Events Server (nginx, logstash); Redis cluster; Elasticsearch; Kibana]

Page 25

Automation through DevOps

● Chute has 100 servers

○ Configured many manually

○ 82? of 100 now managed by Chef

● Whirr

● Sqoop and Cron to automate data import

● Route 53 with Chef for URLs

Page 26

Uptime

● Architect applications to scale horizontally
○ AWS launches servers on demand
○ Spot and Reserved Instance pricing
● Keep services running with Chef
○ Chef makes it easy to wrap programs as a service on AWS

Page 27

Monitoring

● New Relic

○ server resource monitoring

○ application monitoring

● logstash + kibana

○ elasticsearch backend

○ redis (cluster)

○ can monitor server logs

Page 28

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

CPN209