Since Amazon Redshift launched last year, it has been adopted by a wide variety of companies for data warehousing. In this session, learn how customers NASDAQ, HauteLook, and Roundarch Isobar are taking advantage of Amazon Redshift for three unique use cases: enterprise, big data, and SaaS. Learn about their implementations and how they made data analysis faster, cheaper, and easier with Amazon Redshift.
• Private sale, members-only limited-time sale events
• Premium fashion and lifestyle brands at exclusive prices of 50-75% off
• Over 20 new sale events begin each morning at 8am PST
• Over 14 million members
• Acquired by Nordstrom in 2011
Why a Data Warehouse?
• Centralized storage of multiple data sources
• Consistent reporting across all departments
• Data model that supports analytics, not transactions
• Operational reports vs. analytical reports
– Real-time vs. previous day
Why Amazon Redshift?
• Looked at some competitors:
– Ranged from $ to $$$
– All required software, implementation, and BIG hardware
• Skipped the RFP
• Jumped into the Public Beta of Amazon Redshift and never looked back
How We Implemented Amazon Redshift
• ETL from MySQL and Microsoft SQL Server into AWS over a Direct Connect line, staging the data on S3
• Also used S3 to dump flat files (iTunes Connect data, web analytics dumps, log files, etc.)
• Used AWS Data Pipeline to run Sqoop and Hadoop on EC2 to load data into Amazon Redshift
• Amazon Redshift data model based on a star schema, which looks something like …
Example of Star Schema
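The star schema itself was shown as a diagram that is not reproduced here. As a rough illustration only, the Python sketch below renders Redshift DDL for a hypothetical star schema (the names `fact_orders`, `dim_member`, and `dim_sale_event` are invented, not HauteLook's model) plus a `COPY` statement that bulk-loads gzipped CSV files staged on S3. The bucket and IAM role ARN are placeholders; `IAM_ROLE` is the current `COPY` authorization syntax, whereas older loads used the `CREDENTIALS` clause.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Table:
    name: str
    columns: List[Tuple[str, str]]      # (column name, SQL type)
    dist_key: Optional[str] = None      # None -> replicate the table with DISTSTYLE ALL
    sort_key: Optional[str] = None

    def ddl(self) -> str:
        """Render a Redshift CREATE TABLE statement for this table."""
        cols = ",\n  ".join(f"{col} {typ}" for col, typ in self.columns)
        dist = f"DISTKEY ({self.dist_key})" if self.dist_key else "DISTSTYLE ALL"
        sort = f" SORTKEY ({self.sort_key})" if self.sort_key else ""
        return f"CREATE TABLE {self.name} (\n  {cols}\n) {dist}{sort};"


def copy_from_s3(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Render a COPY that bulk-loads gzipped CSV files under an S3 prefix."""
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV GZIP;")


# Hypothetical schema: small dimensions are replicated to every node,
# while the large fact table is distributed on the key it joins on most.
dim_member = Table("dim_member",
                   [("member_id", "BIGINT"), ("signup_date", "DATE"),
                    ("region", "VARCHAR(32)")],
                   sort_key="member_id")
dim_event = Table("dim_sale_event",
                  [("event_id", "BIGINT"), ("brand", "VARCHAR(128)"),
                   ("starts_at", "TIMESTAMP")],
                  sort_key="event_id")
fact_orders = Table("fact_orders",
                    [("order_id", "BIGINT"), ("member_id", "BIGINT"),
                     ("event_id", "BIGINT"), ("order_date", "DATE"),
                     ("amount", "DECIMAL(12,2)")],
                    dist_key="member_id", sort_key="order_date")

if __name__ == "__main__":
    for t in (dim_member, dim_event, fact_orders):
        print(t.ddl(), end="\n\n")
    # Placeholder bucket and role ARN, for illustration only.
    print(copy_from_s3("fact_orders", "s3://example-bucket/orders/",
                       "arn:aws:iam::123456789012:role/ExampleRedshiftLoad"))
```

Replicating small dimensions with `DISTSTYLE ALL` avoids shuffling them across nodes at join time, while distributing the fact table on its most common join key keeps matching rows together; whether `member_id` is the right key depends on the real query mix.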
Usage with Business Intelligence
• Already selected a BI Tool
• Had difficulty deploying in the cloud
• But worked great on-premises
• Easily tied into Amazon Redshift using ODBC drivers
• BUT, report metadata had to live in MSSQL
• Ported many SSIS/SSRS reports over
– But only the analytical reports!
And it all looks like this
Amazon Redshift Instances
• We use a little under 2TB
• Planned to use two BIG 8XL instances for great performance (in passive failover mode)
• Cost us $$$
• Then we tested a cluster of six XL instances
• Performed better and allowed more concurrent queries, except in a handful of cases that really needed the 8XL power
• Cost us $
• Duh! That’s why we distribute everything else!!
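The cost comparison can be made concrete with simple arithmetic. The sketch below assumes (this ratio is not from the talk) that one 8XL node is priced like eight XL nodes, and expresses each configuration in multiples of one XL node's hourly price.

```python
from fractions import Fraction

# Assumption (not from the talk): an 8XL node is priced at roughly
# eight XL nodes. Change this ratio to match actual per-node pricing.
XL_UNITS_PER_8XL = 8


def cluster_cost_units(xl_nodes: int = 0, eight_xl_nodes: int = 0) -> int:
    """Cost of a configuration in multiples of one XL node's hourly price."""
    return xl_nodes + eight_xl_nodes * XL_UNITS_PER_8XL


# The two configurations described on the slide.
planned = cluster_cost_units(eight_xl_nodes=2)  # two 8XL instances (one passive)
tested = cluster_cost_units(xl_nodes=6)         # six-node XL cluster

ratio = Fraction(tested, planned)
print(f"planned = {planned} XL-units, tested = {tested} XL-units")
print(f"six XL nodes cost {ratio} ({float(ratio):.1%}) of two 8XL nodes")
```

Under that assumption the six-node XL cluster costs 3/8 (37.5%) of the two-8XL plan, while spreading work across six nodes instead of one active node.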
Some First Hand Experience
• ETL was the hardest part
• Amazon Redshift performance is awesome
• Someone needs to make a great client SQL tool
• MicroStrategy works great on it (we just wish it ran as well in EC2)
• Saving a ton, thanks to:
– No hardware costs
– No maintenance/overhead (rack + power)
– Annual costs are equivalent to just the annual maintenance fees of some of the cheaper on-premises DW options
Conclusion/Last Advice
• Only use 8XL instances if you need >2TB of space
– Otherwise distribute on a bunch of XL nodes
• Buy reserved instances (we still need to do this!), since your cluster will likely be always on
• Although we haven’t yet, the idea of a flexible scale-up/down DW is crazy awesome; maybe we will during the holiday season
• Probably could have used Elastic MapReduce instead of running Hadoop ourselves; we weren’t sure how it would play with Sqoop
• Almost all BI tools work with Amazon Redshift now, so choose what is right for your business, and make sure it works in EC2 before putting it there
• Connecting AWS to your data center is easy and fast, but I recommend Direct Connect
• Passed our rigorous information security standards, but we run it in a VPC
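The reserved-instance advice comes down to utilization. A minimal sketch, using made-up prices rather than actual AWS rates: divide the reservation's total cost for the term by what the node would cost on demand if it ran the entire term. It treats the reservation as a fixed cost owed whether or not the node runs, so an always-on cluster benefits whenever that break-even fraction is below 100%.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def break_even_utilization(on_demand_hourly: float,
                           reserved_cost_for_term: float,
                           term_years: int = 1) -> float:
    """Fraction of the term a node must run for a reservation to pay off.

    Assumes the reservation (upfront plus any recurring charges) is owed
    in full regardless of how many hours the node actually runs.
    """
    always_on_cost = on_demand_hourly * HOURS_PER_YEAR * term_years
    return reserved_cost_for_term / always_on_cost


# Hypothetical prices, for illustration only.
u = break_even_utilization(on_demand_hourly=1.00, reserved_cost_for_term=5256.0)
print(f"reservation pays off above {u:.0%} utilization")
```

With these illustrative numbers the reservation wins once the node runs more than 60% of the year, which an always-on data warehouse clears easily.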
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases
Parag Thakker – VP, Roundarch Isobar
Colin McGuigan – Architect, Roundarch Isobar
November 15th, 2013
OUR SERVICES ACROSS BOUGHT, OWNED AND EARNED MEDIA
• Strategies: We digitally transform business processes and disrupt industries
– Business planning: competitive & industry analysis, business cases, maturity models, roadmaps
– Strategies: brand, interactive, multi-channel, social, content
• Campaigns: We create, measure and optimize digitally-focused campaigns
– Audience insight
– Communications planning
– Creative: advertising, visual design, content creation, studio production
– Optimization: analytics, monitoring, SEO, MVT, media ROI analysis
• Experiences: We produce joyful experiences that inspire consumer interaction
– Research: competitive, segmentation, persona development, heuristics
– Requirements and specifications: content analysis and specs, functional requirements, functional specifications
– User experience design: information architecture, taxonomy and meta data, interaction design, mobile
• Platforms: We design and build flexible and scalable technology solutions
– Platforms: content management, search, portals, mobile, front-end technology, internet-enabled devices/wearables, social apps, web services, security, big data, hosting
• Products: We invent digital products that generate new revenue streams
– Digital products
– Digital product extensions
– Brand as a service
roundarch isobar
U.S. Air Force
We have served the U.S. Air Force since 2001, building their enterprise portal and many mission-critical applications.
Key metrics for our USAF work include:
• 4-5+ million pages daily (40-70 Mbit/sec)
• Portal availability over 99.9% of the time
• 28 production enterprise services
• Over 300 applications available
• Public-facing and secure private instances (NIPR & SIPR)
• Portal support for over 5,000 “Communities of Interest”
• 900,000+ registered users
• 700,000+ PK-E users
• Response time worldwide: 3 seconds for 80% of all pages
• Over 1.2 million logins/week
• 124,000 unique daily users
New York Jets
“We brought the big picture close enough to identify new, better ways to do business.”
Transforming in-stadium operations through a touch-screen command center and game data, allowing the Jets owner, Woody Johnson, to monitor the fan experience during game time and make operational decisions that help maximize sales. The command center provides summary-level and drill-down views of stadium operations such as tickets, parking and concessions. It also creates predictive algorithms that help identify pinch points and open revenue opportunities.
William Blair | Investment Research Management System
Through a joint venture with Copia Capital, we created a new product offering for William Blair.
• Facilitates collaboration between portfolio managers and analysts
• Provides a holistic view of a company/stock
– What is everything our organization knows about AAPL?
• Digitizes PDF/Excel tools and reports to enable rich, dynamic interactions
• Simplifies content creation; e.g., comments, recommendation reports, document upload
• Rich charting and visualization of analytics
• User comment: "We love how fast it is!"
Technology:
• JavaScript, HTML5, CSS3
• Uses jQuery, JavaScriptMVC, Less
• JSON web services
• Java, Spring, JPA, MongoDB
What is the focus of your CMO today?
Optimize marketing spend across all channels (Bought, Earned and Owned)
Domain:
• Billions in marketing spend
• Dozens of media channels
• Hundreds of data sources
• Multiple terabytes of data
• Multiple clients
Media channels: Search, Display, Ads, Email, Affiliate, Social, Print, Mobile, Sales, TV, Radio, Web
Marketing effectiveness stages: Analyze, Learn, Optimize
• Centralized cross-channel Big Data Platform
• Standardized cross-channel reporting tools
• Discovery tools to identify channel optimization opportunities
• Modeling solutions
• Channel experience enhancements
• Improved media buying, planning & reporting functions
• Real-time integration into DSP
• A/B-testing-based micro-segment adjustments
Tools: DLP AMNET, Scorecard, Compass, Sonar (real-time and non-real-time)
So what have we accomplished?
We built Radar, a marketing analytics platform that enables timely analytics, reporting, and optimization for multiple clients with customized metrics. It handles 200+ feeds (1TB/week) of varying frequency, granularity, and classification, runs as a scalable multi-tenant SaaS platform on Amazon, and had its first launch within 3 months.
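For a sense of scale, the Radar figures translate into an average ingest rate. This sketch assumes decimal units (1TB = 10^12 bytes) and takes 200, the lower bound of "200+", as the feed count; real feeds arrive in bursts at different frequencies, so peak rates would be well above this average.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800
TB, GB, MB = 10**12, 10**9, 10**6    # decimal units (assumption)

weekly_bytes = 1 * TB                # "1TB/week" from the slide
feeds = 200                          # lower bound of "200+ feeds"

# Average sustained rate if the weekly volume arrived evenly.
avg_mb_per_s = weekly_bytes / SECONDS_PER_WEEK / MB
# Average volume per feed if every feed were the same size.
avg_gb_per_feed_week = weekly_bytes / feeds / GB

print(f"average ingest: {avg_mb_per_s:.2f} MB/s")
print(f"average per feed: {avg_gb_per_feed_week:.1f} GB/week")
```

Spread evenly, that is roughly 1.65 MB/s overall and about 5 GB per feed per week.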
scorecard dashboard
Scorecard logical architecture
Outputs: Scorecard App, Detailed Analytic Reports
Audiences: Media Team, Media Team Planners, Client Team, Client Stakeholders
Data sources (channel: source):
• TV: DDS
• Paid Search: Google, Bing, Marin
• Organic Search: Google, Bing
• Sales: TBD
• Digital Video: Custom
• Site Metrics: Google, Omniture
• Display: Google DFA
• Radio: DDS
• Paid Social: Facebook, Twitter
• Print/OOH: DDS
• Earned Social: Facebook, Twitter
• Competitive: Custom
Voluminous Data
• Sources: Digital, CRM, Research
• Data types: surveys, demographics, campaigns, search, mobile, attribution, site, social, display