AWS Government, Education, and Nonprofit Symposium Washington, DC | June 25-26, 2015
Welcome!
Big Data in The Cloud: Architecting a Better Platform
Brian Kinlaw, Principal Solution Architect, CSC
Today’s Presenters
Brian Kinlaw
Principal Solution Architect
CSC Emerging Business Group
Leads the initiation, development, and execution of Big Data, Analytics, Social Media, Mobile, Cloud, Cyber Security, and Internet of Things (IoT) solutions for the Office of the CTO.
Risk Insights: Reduce fraudulent activity by up to 75%, avoid millions in cost & exposure
Revenue Enhancers
Profit Enhancers
Client Value Achieved
• Prioritized Roadmap of Initiatives to Achieve Growth Vision within 2-3 years: BU Growth from $200M to $1B Through Analytic Insights
Client Value Achieved
• 331% ROI
• Payback Period of 2.1 Months
• 2% Yield Improvement = $300M
Client Value Achieved
• Reduced time to onboard customers by 80%
• Improved visibility on service levels
• Increased customer satisfaction
Client Value Achieved
• BSL Met Strategic Objective (ITaaS)
• Reduced Costs by 20%
• Improved Analytic Cycle Time by 50%
Client Value Achieved
• Access to Information in Minutes versus Weeks
• Speed: Solution Deployed within Days
• Access to Key Next Gen Talent
Client Value Achieved
• Speed to Market: 30 Days to Platform, 60 Days to Full Working Mobile Telematics Application
• Flexible Deployment Options
Achieving Real Business Value With Our Clients
• Healthcare: Integrated data for ~100M people from 40 member companies
• Wholesale: Maximized diamond company profitability through BI and analytics
• Transportation: Railway punctuality improved from 92% to a world-leading 96%
• Government: Reduced tax evasion and litigation through DW and predictive modeling
• Insurance: 16% increase in claims fraud investigations for significant ROI in 6 months
• Retail/CPG: Performance optimization and analytical insights into POS and sales trends
• Printing: $10M reduction in annual operating expenses
• Travel & Leisure: Customer intelligence lifetime value model driving marketing and customer service
• Natural Resources: Use of sensor data for real-time management of mining and mfg. ops and maintenance
• Banking: Comprehensive global view of exposure in near real time
Global Insurance Company
Risk to Traditional Data Model: The Status Quo
RISK: Structuring all data at the point of ingestion (Schema on Write vs Schema on Read)
RESULT:
• Significant upfront expense ( and $$) for planning
• Significant expense ( and $$) to adapt to changes/needs of the business
RISK: Data silos
RESULT:
• Disparate information streams
• Reduced ability to obtain requirements from entire business
• Does not allow for holistic decisions to be made
• No golden source of truth
RISK: Proprietary/custom data warehousing/infrastructure
RESULT:
• Expensive
• Nonstandard to environment
RISK: Scale
RESULT:
• Not economically feasible
• Not technically possible
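The contrast between Schema on Write and Schema on Read named on this slide can be illustrated with a minimal Python sketch. The sample records, the field names (claim_id, amount, region, channel), and the function names are all hypothetical and do not come from the presentation; the point is only that a write-time schema rejects records that do not fit, while a read-time schema keeps everything and applies structure per question.

```python
import json

# Hypothetical raw events; the field names are illustrative assumptions.
RAW_EVENTS = [
    '{"claim_id": 1, "amount": "120.50", "region": "EU"}',
    '{"claim_id": 2, "amount": "75.00", "channel": "mobile"}',
]

# Schema fixed up front, as in a traditional warehouse (assumed fields).
REQUIRED_FIELDS = ("claim_id", "amount", "region")


def ingest_schema_on_write(lines):
    """Structure every record at ingestion; records that do not fit are rejected."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in REQUIRED_FIELDS):
            accepted.append({
                "claim_id": int(record["claim_id"]),
                "amount": float(record["amount"]),
                "region": str(record["region"]),
            })
        else:
            rejected.append(line)
    return accepted, rejected


def ingest_schema_on_read(lines):
    """Store raw lines untouched; no structure is imposed at ingestion."""
    return list(lines)


def query_amounts(raw_store):
    """Apply only the structure this one question needs, at read time."""
    return [float(json.loads(line)["amount"]) for line in raw_store]


if __name__ == "__main__":
    accepted, rejected = ingest_schema_on_write(RAW_EVENTS)
    print(len(accepted), "accepted,", len(rejected), "rejected at write time")
    print(query_amounts(ingest_schema_on_read(RAW_EVENTS)))
```

In this sketch, the second record is lost to the write-time schema because it lacks a region, yet it still answers the amount question under schema on read. Adapting the write-time path to the new channel field would mean changing the schema and re-ingesting, which is the kind of adaptation expense the slide lists.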
Risk of Transforming to a Big Data Business
RISK: Numerous different technologies
RESULT:
• Hard to select the best tool without specific experience with these technologies
RISK: Lack of Big Data specific expertise
RESULT:
• Unreasonable expectations without having done it before
• R&D in Big Data is lost or done only as time permits
• Scope creep is common
• Learning as you go
RISK: Immature Big Data technologies
RESULT:
• Compliance risk
• Security risk
• Complex deployments
• Complex integrations between technologies
• High operational costs
RISK: Large CapEx expenditure
RESULT:
• Buying upfront growth
• More complex to scale
Big Data & Analytic systems should be a tool that gives companies better information and insights, not a roadblock
Business Insights: Descriptive (1.0), Diagnostics (2.0), Predictive (3.0), Prescriptive (4.0)
1. Define the desired insights by stage
2. Define data needed & format for analysis
3. Define the types of analysis
4. Define consumption and interaction
5. Define the right tools for the task at hand
Case Study
HGST, a Western Digital company, develops innovative, advanced hard disk drives, enterprise-class solid state drives, and external storage solutions and services. CSC improved customer support and product quality.
Challenge
• Decrease warranty inquiry response times
• Increase operational efficiency
• Enable the business to extract new insights
Solution
• Conducted 5-week big data strategy assessment
• Established cloud-based big data platform
• Built the apps and analytics to capitalize on the data
Results
• Over 10,000 queries/day
• 30+ data connections
• 1,000+ TB of data
• Response times of 2-3 months now done with a single query
• Improved customer satisfaction
• Reduced churn
• Reduced support costs
• New product management capabilities, fixes
• Better supply chain coordination
• Increased security
• New data and analytics products
• Increased cross-sales and up-sales
• Increased renewals
• Better license compliance
Case Study
Network Rail manages most of the rail infrastructure across Great Britain and is responsible for the control and maintenance of over 2,500 railway stations, 20,000 miles of track, and 40,000 bridges and tunnels. CSC provides a data and analytics hub for massive amounts of imagery and analog track monitoring data.
CHALLENGE
• Network Rail needed a platform that could not only store, but also analyze petabytes of data over the long term:
– Track imagery and video data captured via drones and cameras
– Vibration data captured via maintenance trains
– Other forms of large file size analog data crossed with operational, structured data sets
• Network Rail wanted to implement the solution quickly, and ramp up data volumes at a fast pace
• Goal of leveraging combined services to assist with loading data, managing the underlying infrastructure, and working with and analyzing the data
SOLUTION
• CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to import massive amounts of bulk data on an ongoing basis:
– Core platform (BDPaaS) leveraging Hortonworks Data Platform, including Hive with Tez
• CSC’s platform integrated with ESRI ArcGIS for Big Data geolocation analysis features including geotagging and geo tiles
• CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support/consultation services to the client
RESULTS
• Network Rail is generating insights on how to prioritize in near real time the improvement and maintenance of the massive railway track and infrastructure footprint:
– Advanced analytics of analog data, including geolocation capabilities
– Ability to handle the scale required by the massive amount of data under management and data growth
– Complete transformation of a business unit’s analytics capability on track for success in less than 12 months
[Architecture diagram components: Image Files, Analog Data, Geo Info; AWS S3 Object Storage; HDFS, YARN, Hive, Hue; Hadoop-ArcGIS Connector; ESRI ArcGIS; PostgreSQL, PostGIS; ArcGIS; Geocortex]
Case Study
This Food & Hospitality Retailer has a footprint of over 650 regional hotels, 2,800 coffee shops, and a number of restaurant chains. CSC provides the infrastructure, data platform, and analytics that uncover revenue opportunities in customer web interactions.
CHALLENGE
• The client wanted to quickly evaluate the use of big data and the value it brings in identifying new business opportunities
• Ease of use was a key need in making insights and reporting more accessible to analysts, and in increasing the speed with which they could analyze
• Time to market was a key factor in the decision to implement a comprehensive big data platform. The client realized:
– A bare platform would not be easy to manage
– Their staff did not possess the skills to operate a bare platform
– They needed to focus on the big data applications, rather than the platform
SOLUTION
• CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to transport web activity data within 90 days:
– Core platform (BDPaaS) leveraging Hortonworks Data Platform, including Hive with Tez
– Aggregating various data sources to create one massive web log data set
– Adding data science algorithms to clean up data for better insights
– Providing Pentaho Business Analytics as a comprehensive reporting and dashboard suite for insight presentation
• CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support/consultation services to the client
RESULTS
• The client is generating insights on how customers interact with their website, and improving their services for happier customers and more streamlined business:
– Faster path to ROI with both tech and services
– Creating a real-time customer insights dashboard and set of reports
– Ability to prove the value of big data internally through the mining of data and generation of insights and reports for various teams
– Scalability to more data sources and use cases, including plans for mobile application analytics and operational metrics, as well as operational business analytics combining internal and external data sources
[Architecture diagram components: Logs; Pentaho Data Integration (PDI); Distcp; HDFS, YARN, Hive, Hue; PostgreSQL (onboard); Hive > PostgreSQL; Hive-Pentaho Connector (ODBC); Pentaho Business Analytics]
ETL, Data Transformation, Business Intelligence, Data Mining, Advanced Analytics, Geolocation
COLUMNAR: HBase, Accumulo, DataStax
Big Data PaaS – Standard Reference Architecture
[Architecture diagram.
Ingest types: Events (HTTP(S) / TCP / UDP), Files (Direct Upload / FTP / FTPS / SFTP), Streams, Queries.
Components: Web Listener, File Store / Landing Zone, Kafka Queue, Storm, Hadoop (HDFS, MapReduce, Hive, Spark), HBase or Accumulo, Tez or Impala, DataStax / TitanDB, Elasticsearch or MongoDB; Queries and Jobs.
Command & Control: Splunk (Monitoring & Log File Analysis), FreeIPA + LDAP (ID Access & Management), Git (Versioning Control), Jenkins (Continuous Integration), Puppet (Infrastructure as Code), Agility Server (IT Policy & Governance).]
Command & Control
• $100M+ R&D investment
• 8+ years of R&D
• 25+ distinguished big data engineers
• 125+ related technology engineers (cloud, cybersecurity, etc.)
• Core committers to all major Big Data open source projects
• Detailed Log Monitoring & Troubleshooting
• Complete activity monitoring & audit trail
• Comprehensive system monitoring and alerting suite
FreeIPA + LDAP
• User Account and Permissions Management
• LDAP Integration
Git
• Platform and Application Version Control
• DevOps Push-Pull Application Code Delivery
Agility Server
• IT Policy & Governance Engine
• Hybrid Cloud Workload Interoperability
Production Data Flows
[Diagram: code moves from LOCAL (a VM/sandbox, a “local node” environment, or “direct-dev” on BDPaaS) through Push Code w/ Git and Test w/ Jenkins, then Push to Production during a Maintenance Window. DEV / DR carries Sample Data, Partial/Full Flows, or DR Replication. The PRODUCTION and DEV / DR environments each contain Kafka, Storm, HDFS, Hive, Impala, Elasticsearch, and C&C. User Queries are also shown.]
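The promotion path on this slide (push code with Git, test with Jenkins, push to production in a maintenance window) can be sketched as a simple gate. This is a minimal illustration in Python, not the platform's actual mechanism: the window times, the function names, and the boolean test result are all hypothetical assumptions, since the slides give no such details.

```python
from datetime import datetime, time

# Hypothetical nightly maintenance window; the slides do not state real times.
WINDOW_START = time(1, 0)
WINDOW_END = time(3, 0)


def in_maintenance_window(now: datetime) -> bool:
    """Return True when `now` falls inside the assumed maintenance window."""
    return WINDOW_START <= now.time() < WINDOW_END


def can_push_to_production(tests_passed: bool, now: datetime) -> bool:
    """Allow a production push only after tests pass and inside the window."""
    return tests_passed and in_maintenance_window(now)


if __name__ == "__main__":
    # A build that passed its tests, checked at 02:00 and again at noon.
    print(can_push_to_production(True, datetime(2015, 6, 25, 2, 0)))
    print(can_push_to_production(True, datetime(2015, 6, 25, 12, 0)))
```

Keeping the test result and the window check as separate conditions mirrors the two distinct steps on the slide: a build that passes its tests still waits for the window.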
Production Data Flows
[Diagram: PRODUCTION and DEV / DR environments, each containing Kafka, Storm, HDFS, Hive, Impala, Elasticsearch, and C&C. DEV / DR carries Sample Data, Partial/Full Flows, or DR Replication.]
Operations shown for both PRODUCTION and DEV / DR:
• ADD OR REMOVE NODES
• RECONFIGURE NODES
• RECONFIGURE OVERALL CLUSTER
• ADD OR REMOVE CLUSTERS
• SCALE UP OR SCALE DOWN CPU, RAM, DISK
• ADD OR REMOVE ENVIRONMENTS
[Architecture diagram.
Consumption: Tableau, ODBC, Kibana, API, Revolution R, SAS, Bulk Export, RHadoop / ScaleR.
Platform: Kafka, Storm, HDFS, Hive, Impala, Elasticsearch, Command & Control.
DR: BDR, Kfk Replicate.
Data sources: Teradata, Oracle RDBMS, Twitter, Logs, Video Files, IBM MQ.
Connectors: Sqoop, Hue, Kfk-Hdp Bulk Writer, Kfk-ES Record Writer, Teradata Connector for Hadoop, Distcp, HTTP (GNIP), HTTP, Custom Connector.
AWS services: EBS Volumes, VPN, Amazon S3, Amazon IAM, Amazon Storage Gateway, Direct Connect, Amazon CloudFront, Amazon CloudFormation, AMI Service, Glacier, Ephemeral Local Drives, Amazon RDS.
Instance types: D2, R3, I2, C4, C3, M3.]
Why Public Cloud
• Higher Resource Efficiency for Increased Savings
• Significantly Greater Workload and Resource Flexibility
• More compatible with software-defined-everything approach