Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload Reference Architecture
Dell | Cloudera | Syncsort | Intel
Panel moderator Armando Acosta, Dell
Armando Acosta
• Subject Matter Expert for Dell Big Data Solutions
• Product Manager for the Dell Hadoop Solutions
• Works with customers to transform IT into better business outcomes
• Seventeen years in technology
Sean Anderson, Cloudera
Brandon Draeger, Intel
Mark Muncy, Syncsort
Panel introductions
Organizations actively using data grow 50% faster
The number of organizations that understand the benefits of big data grew slightly: 39% (2014) to 42% (2015).
Older technology can’t keep up: The ability to scale to support all data and unpredictable workloads means effective data management and data integration are key priorities
Data silos hinder decision-making: Need to analyze all data, regardless of type or where it resides, and apply it to use cases
Determining the value: IT/business alignment on strategic business objectives and use cases is critical to achieving ROI from all data
There are challenges that must be addressed
Address data challenges holistically, yet modularly
The basics of big data and analytics
Where data originates
• Databases
• Social media
• Sensor data (IoT)
• Devices
• LOB applications
• Cloud
• External sources
How data is moved and prepared for analysis
• Data integration, aggregation and transformation
Where data is analyzed
• Analytical engine
• Business intelligence
• In-memory computing
• Enterprise data warehouse
Sean Anderson, Cloudera
Product Marketing - IT Solutions at Cloudera
Sean is a tenured infrastructure scaling and cloud strategy consultant with a strong focus on strategic partnerships and innovative hybrid technology. He has been a part of integral shifts in technology including the rise of cloud computing, open source standardization, and big data. Sean quickly became a go-to resource and speaker for data-specific workloads, focusing on technologies like Hadoop, MongoDB, Redis, ElasticSearch, SQL, and Data Warehousing. At Rackspace Hosting, Sean helped build and launch open-source cloud platforms around Hadoop, MongoDB, and Redis. Sean is currently marketing director for IT Solutions at Cloudera, the pioneers of Apache Hadoop.
Inefficient data workloads cost customers money
• Frequent ETL breakdowns
• Long reporting wait times
• Ad hoc access pressure on EDW
• Extreme query complexity
Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure
A new kind of data platform.
• One place for unlimited data
• Unified data access
Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise
Cloudera Navigator Optimizer: Unlock Your Best Hadoop Strategy, Instantly
Active Data Optimization for Hadoop to save you time and money
• Instant workload insights
• Intelligent optimization guidance
• Reduce Hadoop workload development effort
Intel
Brandon Draeger, Director of Marketing and Business Development for Big Data Solutions
Brandon is a Director of Marketing and Business Development for Big Data Solutions at Intel and manages the GTM relationship for Intel and Cloudera and their shared partner ecosystem. Brandon has over 15 years of experience in a variety of enterprise technology disciplines and has held roles in engineering, product management, and strategy at Dell, Symantec, and Dorado Software.
Customers Are Struggling: Traditional Tools Aren’t Working
• 80%: Data integration and transformation workloads consume as much as 80% of EDW capacity
• 70%: Of all data warehouses are performance and capacity constrained
• #1 challenge: Organizations cite TCO as the biggest obstacle to data integration tools
Gartner: “The State of Data Warehousing in 2014,” June 19, 2014
#1 Use Case for Hadoop: Data Warehouse Optimization - ETL Offload
Customer Challenge: Processing and storing ever-increasing data volumes with traditional enterprise data warehouses and related data integration technology, and their legacy pricing models, is taxing stagnant IT budgets
Practitioners who have shifted one or more workloads from legacy data warehouses or mainframes to Hadoop
The most popular workloads being shifted are large-scale data transformations
61%
Customers have implemented Hadoop
Syncsort Customer Survey 2014
Operational efficiency
Connect: Unify all data from disparate tables/sources to reduce existing system load and data transformation costs
Analyze: Deliver streamlined business reporting even with existing analytical tools
Act: Utilize better, faster reporting for improved data-driven decision making
Key use cases
• Data warehouse acceleration
• Log aggregation
• Data pipeline modernization
Data challenges for operational efficiency
Syncsort
Mark Muncy, Technical Product Marketing Manager – Big Data, Syncsort
Mark Muncy leads Technical Product Marketing for Syncsort’s Big Data portfolio, working with technical and client-facing teams to deliver high-value solutions to the most data-intensive companies in the world. Mark brings to his current role over a decade of hands-on experience in data architecture and ETL development in the gaming, data services, and financial services industries.
Too Many Workloads in the EDW: Modernize the Data Pipeline with Hadoop
Traditional Data Pipeline
Disparate data sources → Data staging tool (extract & load data, clean & parse data) → Enterprise data warehouse + ETL (data transformation jobs) → Business reporting
Indicators: query performance, capacity
The Results
• Longer data transformation job times
• Not meeting SLAs for business reporting
• Slow ad hoc query
• Too costly to scale
Modern Data Pipeline
Disparate data sources → Hadoop + ETL (data transformation jobs: clean, parse, transform) → Enterprise data warehouse → Business reporting
Indicators: query performance, capacity
The Results
• Reduced data transformation job times
• Improved SLAs for business reporting
• Fast ad hoc query
• Scales economically
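As a rough illustration of the kind of work the modern pipeline moves out of the warehouse, the sketch below runs a parse, clean, and transform pass over a small delimited extract in plain Python. It is not Syncsort DMX-h or Hadoop code; the sample data, field names, and function names are all hypothetical and chosen only to show the three steps.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw extract from a source system (assumed layout).
RAW = """order_id,customer,amount,order_date
1001, Acme Corp ,250.00,2015-03-01
1002,,99.50,2015-03-02
1003,Globex,not_a_number,2015-03-02
1004,Initech,75.25,2015-03-03
"""

def parse(text):
    """Parse: turn delimited text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Clean: trim whitespace, drop rows with a missing customer or bad amount."""
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if not row["customer"]:
            continue
        try:
            row["amount"] = float(row["amount"])
        except ValueError:
            continue
        yield row

def transform(rows):
    """Transform: reshape rows into the form a warehouse table might expect."""
    for row in rows:
        yield {
            "order_id": int(row["order_id"]),
            "customer": row["customer"].upper(),
            "amount_cents": round(row["amount"] * 100),
            "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
        }

# Only rows that survive cleaning are loaded downstream.
warehouse_ready = list(transform(clean(parse(RAW))))
```

In this sketch the warehouse only ever sees the two valid rows, which is the point of the offload: the staging and transformation effort happens before the data reaches the EDW.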
Syncsort DMX-h: A Complete Solution for Hadoop
Connect Transform Optimize
• Smarter Architecture – Engine runs natively within MapReduce and Spark
• Smarter Connectivity – Connect streaming and batch data sources across the organization, including mainframe, NoSQL and everything in between.
• Smarter Development – GUI for developing & maintaining Hadoop data pipeline
• Smarter Productivity – Use-case Accelerators to fast-track development
• Enterprise Grade Solution – Integrated support for Cloudera Navigator, Sentry, Kerberos and LDAP
Design Once, Deploy Anywhere
• Free users from underlying complexities of Hadoop
• Intelligent Execution dynamically optimizes the job for any platform on premise or in the cloud
• Future-proof your applications!
Operational efficiency architecture
Source
• Operational data sources
• Enterprise data warehouse
• Relational management database
• Data mart
1. Connect: Extract, translate, and load
• Sort
• Aggregate
• Group
• Parse
• Clean
• Translate
2. Analyze
• Enterprise data warehouse
• Relational management database
• Data mart
• Business reporting and query
3. Act
• Price optimization
• Improved forecasting
• Uptime optimization
• Accelerated response
• Faster reporting
• Improved service levels
Management | Services | Security | Dell Financial Services | Infrastructure
Dell | Cloudera | Syncsort | Intel
Microsoft APS, SAP HANA
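The Connect stage of the architecture lists sort, group, and aggregate among its operations. A minimal Python sketch of those three steps follows, assuming hypothetical sales records and a hypothetical helper name; it only illustrates the operations and does not represent how the Dell | Cloudera | Syncsort solution executes them.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records pulled from an operational data source.
records = [
    {"region": "west", "sales": 120},
    {"region": "east", "sales": 200},
    {"region": "west", "sales": 80},
    {"region": "east", "sales": 50},
]

def sort_group_aggregate(rows, key, value):
    """Sort rows by key, group adjacent rows, and sum one field per group."""
    ordered = sorted(rows, key=itemgetter(key))                       # Sort
    summary = {}
    for group_key, members in groupby(ordered, key=itemgetter(key)):  # Group
        summary[group_key] = sum(m[value] for m in members)           # Aggregate
    return summary

totals = sort_group_aggregate(records, "region", "sales")
```

The sort has to come first because `groupby` only merges adjacent rows with the same key, which mirrors why sort-heavy steps dominate transformation workloads.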
Redeploying talent / reducing staff costs
Staffing Hadoop ETL work with an entry-level employee using the Dell | Cloudera | Syncsort solution for Hadoop could save 76.3% over three years compared to a senior engineer using a DIY, open source approach.
Save time and cost on Hadoop ETL jobs.
Total administrative costs over three years to design 4 ETL jobs per month:
• Expert cost (contractor): $559,298
• Expert cost (employee): $279,149
• Beginner cost: $132,326
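The savings percentage can be checked against the three-year cost figures with a few lines of arithmetic. The sketch below reads the contractor figure as $559,298; the function and variable names are illustrative only. Under that reading, the 76.3% quoted on the slide lines up with the contractor baseline, while the comparison against an expert employee works out lower.

```python
# Three-year administrative costs from the slide, in US dollars.
costs = {
    "expert_contractor": 559_298,
    "expert_employee": 279_149,
    "beginner": 132_326,
}

def percent_saved(baseline, alternative):
    """Percentage saved by paying `alternative` instead of `baseline`."""
    return round((1 - alternative / baseline) * 100, 1)

vs_contractor = percent_saved(costs["expert_contractor"], costs["beginner"])
vs_employee = percent_saved(costs["expert_employee"], costs["beginner"])

print(vs_contractor)  # 76.3
print(vs_employee)    # 52.6
```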
Entry Level vs. Senior Engineer
Time to complete ETL jobs, comparing experienced engineers (green) to new hires (blue)
Complete Hadoop jobs faster
• Data validation and pre-processing: 6 min, 15 sec vs. 15 min, 45 sec (60.3% less time)
• Fact dimension load with type 2 SCD: 30 min, 11 sec vs. 36 min, 39 sec (17.6% less time)
• Vendor mainframe file integration: 4 min, 48 sec vs. 5 min, 51 sec (17.9% less time)
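The "less time" percentages follow directly from the completion times on the chart. As a quick check, the sketch below converts each pair of times to seconds and computes the reduction; pairing each shorter time with the next longer one is an assumption based on the arithmetic matching the published percentages, and the helper names are illustrative.

```python
def seconds(minutes, secs):
    """Convert a 'min, sec' reading from the chart into seconds."""
    return minutes * 60 + secs

def percent_less_time(faster, slower):
    """Percentage reduction in time going from `slower` to `faster`."""
    return round((1 - faster / slower) * 100, 1)

# (faster time, slower time) pairs as read from the chart.
pairs = [
    (seconds(30, 11), seconds(36, 39)),
    (seconds(4, 48), seconds(5, 51)),
    (seconds(6, 15), seconds(15, 45)),
]

reductions = [percent_less_time(fast, slow) for fast, slow in pairs]
print(reductions)  # [17.6, 17.9, 60.3]
```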
Save 53.7% in time
Using the Dell | Cloudera | Syncsort solution for Hadoop, the entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time.
Reclaim days of valuable time
[Chart: time spent on the fact dimension load with type 2 SCD, data validation and pre-processing, and vendor mainframe file integration jobs; totals of 8.3 days vs. 3.8 days]
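One of the benchmarked jobs is a fact dimension load with a type 2 SCD (slowly changing dimension), where a changed attribute expires the current dimension row and adds a new version instead of overwriting it. The following is a minimal in-memory sketch of that idea, assuming a hypothetical row layout (`key`, `attrs`, `start_date`, `end_date`) and function name; the slides do not show how the benchmarked job was actually built.

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, effective):
    """Apply a type 2 change: expire the current row, then append a new version."""
    current = next(
        (r for r in dimension if r["key"] == key and r["end_date"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return dimension  # nothing changed, keep the current version
        current["end_date"] = effective  # close out the old version
    dimension.append(
        {"key": key, "attrs": new_attrs, "start_date": effective, "end_date": None}
    )
    return dimension

customer_dim = []
apply_scd2(customer_dim, "C1", {"city": "Austin"}, date(2015, 1, 1))
apply_scd2(customer_dim, "C1", {"city": "Dallas"}, date(2015, 6, 1))
apply_scd2(customer_dim, "C1", {"city": "Dallas"}, date(2015, 7, 1))  # no change
```

After these calls the dimension holds two rows for `C1`: the Austin version closed on the date of the move, and the open-ended Dallas version, so history stays queryable.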
Panel Q&A
Listen to this Webcast On-Demand
Including Panel & Participant Q&A
http://bit.ly/1Rtk2OE
For additional information: Dell.com/Hadoop [email protected]
Thank you.