© 2014 Impact Analytix, LLC
Kickstart Big Data: Combine Existing Analytics Assets with New Hadoop Data Sources
Jen Underwood
Founder & Principal Consultant
Impact Analytix, LLC
[email protected]
www.impactanalytix.com
quickly make a positive impact
Agenda
Presenter: Jen Underwood
Title: Kickstart Big Data: Combine Existing Analytics Assets with New Hadoop Data Sources
Tagline: Use Big Data technologies to Leverage your Existing Data Warehouse and BI/Analytics (OLAP) Investments
Abstract: Explore successful approaches to securing initial quick wins with big data analytics pilot projects without boiling the ocean (data lake). Business intelligence and big data initiatives remain the No. 1 CIO priority for the second consecutive year. In this session we look at practical options to get started by combining existing data warehouse and OLAP assets with new Hadoop data sources.
- Share popular big data analytics use cases
- Discuss modern analytics solution architecture
- How to choose the right pilot project
Key Takeaways:
First Step – Data Warehouse Modernization for speed, scale and outcomes
Next Step – Analytics Optimization for simplicity, alignment and value
The Goal – Advanced Analytics Platform for Big Data as a Service, Feature, Source
Mega-Trends
Source: http://www.burrus.com/resources/daniel-burrus-top-twenty-technology-driven-trends-for-2013/
1. Rapid Growth of Big Data
2. Cloud Computing and Advanced Cloud Services
3. On Demand Services
4. Virtualization
5. Consumerization of IT Increases
Living in the Age of Data Explosion
Exponential increase in unstructured data
New breed of highly distributed, elastic scale non-relational databases
Revolutionary market shift after 40 years of relational database dominance
Big data requires modernizing architecture and approach to analytics
What is Big Data?
Big Data Analytics ≠ Traditional BI with More Data
Volume: 10x increase every five years (petabytes)
Variety: 85% from new data types (Structured & Unstructured)
Velocity: Real Time (Batch & Streaming)
Relational Data
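As a rough worked example of the volume figure above: if the 10x increase is assumed to compound evenly over the five years (an assumption, the slide does not say so), the implied yearly growth rate can be sketched as below. The function name is illustrative only.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Return the constant yearly growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1.0 / years) - 1.0


# Assumption: the 10x increase is spread evenly across the five years.
rate = implied_annual_growth(10.0, 5.0)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, data volume grows by a little under 60 percent per year.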
Big Data Analytics ≠ Traditional BI with More Data
Big Data is redefining the processes of managing master data, data quality, and information lifecycle management
Big Data is NOT replacing EDW and OLAP; it supplements those investments
Big Data ecosystem includes a variety of analytic technologies
• Columnar databases, JSON, and unstructured file stores
• Hadoop and NoSQL platforms are adding SQL, search, and streaming capabilities, while NoSQL platforms are also adding MPP and transactional support
• Data tiering that aggressively leverages SSD (Flash) and DRAM
Source: Gartner
Hadoop: Move Compute to the Data
Inspired by Google’s MapReduce
Infrastructure to automatically scale-out storage and distributed data processing on commodity hardware
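To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases using a word count. It is not Hadoop code: the function names and the sample blocks are hypothetical, and each inner list merely stands in for a data block that would live on a separate node.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(blocks: List[List[str]]) -> Dict[str, int]:
    # Each inner list stands in for a block stored on one node; the mapper
    # is applied to the block where it lives rather than moving the data.
    mapped: List[Tuple[str, int]] = []
    for block in blocks:
        for line in block:
            mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample data split across two "nodes".
blocks = [["big data big value"], ["data lake", "big lake"]]
counts = word_count(blocks)
print(counts)
```

In a real cluster the map calls run in parallel next to each block and only the small (word, count) pairs cross the network, which is the point of moving compute to the data.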
Hadoop: Move Compute to the Data
Source: Datameer
Another way to think about this shift…
Hadoop: Move Compute to the Data
Attribute | Traditional RDBMS | MapReduce
Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes)
Access | Interactive and Batch | Batch
Updates | Read / Write many times | Write once, Read many times
Structure | Static Schema | Dynamic Schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
DBA Ratio | 1:40 | 1:3000
Source: Tom White’s Hadoop: The Definitive Guide
Process Shift from Schema First to Schema Later
Schema first: SLOW VALUE FROM DATA
1. Data arrives
2. Derive schema
3. Cleanse data
4. Transform
5. Load to EDW
6. Analyze
Schema later: RAPID VALUE FROM DATA
1. Data arrives
2. Load to Hadoop
3. Analyze
4. Subsets of data loaded to EDW
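The schema-later flow can be sketched as follows: raw records are kept exactly as they arrive, and a schema is applied only when a question is asked. This is a minimal illustration, not a Hadoop or Hive API; the sample events, field names and the read_with_schema helper are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Raw events are stored exactly as they arrive (step 2: "Load to Hadoop").
# Hypothetical sample data; note that the records do not share one shape.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "channel": "web"}',
    '{"user": "b", "amount": "7", "device": "mobile"}',
    '{"user": "c"}',
]

Schema = Dict[str, Tuple[Callable[[Any], Any], Any]]


def read_with_schema(raw_lines: List[str], schema: Schema) -> Iterator[Dict[str, Any]]:
    """Apply a schema at read time: pick fields, cast types, default missing values."""
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else default
        yield row


# Schema chosen by the analyst for one question (step 3: "Analyze").
spend_schema: Schema = {"user": (str, ""), "amount": (float, 0.0)}
rows = list(read_with_schema(RAW_EVENTS, spend_schema))
total_spend = sum(row["amount"] for row in rows)
print(rows, total_spend)
```

A different question can define a different schema over the same raw lines, which is why nothing has to be cleansed or transformed before loading.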
Modern Analytics Architecture
Modern Data Warehousing
Changes in Data Warehousing Patterns
Free up the EDW from low value tasks
Keep 100% of the source data and historical data
Explore and mine data with "schema on read"
Cold data storage with Hadoop, warm data with MPP/Columnar, hot data in-memory
Non-relational data
Hadoop – Cold Data
MPP/Columnar – Warm Data
In-Memory – Hot Data
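One simple way to picture the hot/warm/cold split is a routing rule based on how recently a data set was used. The sketch below is only an illustration: the 30-day and 365-day thresholds, the function name and the sample data sets are assumptions, not figures from this deck.

```python
from datetime import date

# Hypothetical thresholds; real cut-offs depend on workload and cost.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 365


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the number of days since the data was last used."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "in-memory (hot)"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "MPP/columnar (warm)"
    return "Hadoop (cold)"


today = date(2014, 6, 1)
tiers = {
    "last_week_sales": choose_tier(date(2014, 5, 28), today),
    "last_quarter_sales": choose_tier(date(2014, 2, 1), today),
    "2009_clickstream": choose_tier(date(2009, 12, 31), today),
}
print(tiers)
```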
Changes in Data Warehousing Patterns
Non-relational data: Social apps, Sensor and RFID, Mobile apps, Web apps (Hadoop)
Relational and OLAP data: Traditional schema-based data warehouse applications
EDW HDFS bridge: Enhanced query engine, External table, External data source, External file format
Results: Regular T-SQL
Basically adding a “bridge” to Big Data from your existing investments
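The bridge idea can be approximated on a laptop: expose a raw file as a table next to an existing warehouse table, then join the two with regular SQL. The sketch below uses Python's built-in sqlite3 purely as a stand-in. It is not the external table feature of any EDW product; the table names and data are hypothetical, and unlike a real external table it copies the file contents instead of querying them in place.

```python
import csv
import io
import sqlite3

# Stand-in for a raw file sitting in HDFS (hypothetical content).
EXTERNAL_FILE = io.StringIO("customer_id,page\n1,/home\n1,/pricing\n2,/home\n")

conn = sqlite3.connect(":memory:")

# Existing warehouse asset: a relational customer dimension.
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Enterprise"), (2, "SMB")])

# "Bridge": make the external file visible to the SQL engine as a table.
# Simplification: the rows are copied here; a real bridge reads them in place.
conn.execute("CREATE TEMP TABLE ext_clicks (customer_id INTEGER, page TEXT)")
reader = csv.DictReader(EXTERNAL_FILE)
conn.executemany(
    "INSERT INTO ext_clicks VALUES (?, ?)",
    [(int(row["customer_id"]), row["page"]) for row in reader],
)

# Regular SQL now joins warehouse data with the new source.
query = """
    SELECT d.segment, COUNT(*) AS clicks
    FROM ext_clicks AS e
    JOIN dim_customer AS d ON d.customer_id = e.customer_id
    GROUP BY d.segment
    ORDER BY d.segment
"""
results = conn.execute(query).fetchall()
print(results)
```

The analyst keeps writing ordinary SQL against familiar dimensions while the new data source is attached beside them.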
Changes in Data Warehousing Patterns
Big Data storage aka Data Lake is characterized by three key attributes:
Collect everything: A data lake contains all data, both raw sources over extended periods of time as well as any processed data
Dive in anywhere: A data lake enables users across multiple business units to refine, explore and enrich data on their terms
Flexible access: A data lake enables multiple data access patterns across a shared infrastructure: batch, interactive, online, search, in-memory and other processing engines
Changes in Data Warehousing Patterns
Modern MPP, Columnar and Visual Analytics Innovations:
Nature of Hadoop data access: Historically, querying Hadoop entailed complex Java, and results came from slow batch processes, so improved tools were built to expedite Hadoop data access
External tables, compression, HDFS, Hive, other means: Easy visual analytics tools use business-user-friendly means to access Hadoop data and often bring that data into an in-memory cache for rapid data analysis
Materialized Views “v2” and analytic functions: Big data visual analytic tools improve upon traditional view techniques to bring big data into memory or on chip and intelligently, automatically re-use and refresh those views
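The re-use and refresh behavior described above can be sketched as a small in-memory cache around a slow query. This is an assumption-laden illustration, not how any particular vendor implements it: the CachedView class, its time-based staleness rule and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedView(Generic[T]):
    """Keep a query result in memory and recompute it once it is older than max_age_seconds."""

    def __init__(
        self,
        compute: Callable[[], T],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the example can simulate time passing
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None
        self.refresh_count = 0

    def get(self) -> T:
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at > self._max_age
        if stale:
            self._value = self._compute()  # the slow batch query runs only here
            self._loaded_at = now
            self.refresh_count += 1
        return self._value


# Usage with a simulated clock and a stand-in for an expensive Hadoop query.
fake_now = [0.0]
source_rows = [3, 4]
view = CachedView(lambda: sum(source_rows), max_age_seconds=60, clock=lambda: fake_now[0])

first = view.get()       # first call computes the result
source_rows.append(5)
second = view.get()      # still fresh, so the cached result is re-used
fake_now[0] = 61.0
third = view.get()       # stale, so the view is refreshed
print(first, second, third, view.refresh_count)
```

A time-based rule is the simplest choice; a real tool might instead refresh when the underlying data changes.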
Why Now? What’s the big deal?
“By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”
– Gartner, Mark Beyer, Information Management in the 21st Century
Big Data Adoption
Source: 2014 IDG Enterprise Big Data Research. An online survey of 46 questions was used with 751 respondents randomly selected from CIO, Computerworld, CSO, InfoWorld, ITworld, and Network World subscribers, e-mail subscription lists and LinkedIn forums.
Big Data Changing the Landscape
Beyond hype, it is imperative to understand when it is time to embrace a technology-enabled trend in its formative stages
Organizations are already vastly improving the quality and speed of decision making – big data is a competitive necessity to thrive
Look around you… ALL the major database vendors and analytics software providers are evolving their solution offerings for big data sources
New analytical solutions easily and quickly unlock the value in big data
Big Data Today
Areas of Business Intelligence Tools
Source: http://www.b-eye-network.com/blogs/eckerson/archives/2013/03/a_guide_for_bi.php
Unlocking the Value of Big Data
Today’s easy visual analytics and integration tools empower the business to make smarter decisions and generate more value from more data
Fast, direct, agile access to big data: analyze it in place and blend it with EDW, OLAP and personal data sources, shrinking long BI backlogs for faster actionable insight
Less need to move large volumes of data between platforms just to ask new questions or perform predictive analytics
Integrate Predictive Intelligence
Transform business using “Smart” Apps and Reports
Analytic tool specific integration options
In-Database Predictive UDFs and Predictive Queries
PMML to exchange models
Programming with APIs
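As one illustration of the in-database option, a scoring function can be registered as a user-defined function so that predictions are produced by an ordinary query. The sketch below uses Python's built-in sqlite3 and its create_function call as a stand-in for a warehouse engine. The logistic-regression coefficients, column names and customer rows are hypothetical, not a model from this deck.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients, as if exported from a trained model.
INTERCEPT = -3.0
COEF_DAYS_INACTIVE = 0.1
COEF_SUPPORT_TICKETS = 0.8


def churn_score(days_inactive: float, support_tickets: float) -> float:
    """Return a churn probability between 0 and 1 for one customer row."""
    z = (
        INTERCEPT
        + COEF_DAYS_INACTIVE * days_inactive
        + COEF_SUPPORT_TICKETS * support_tickets
    )
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in.
conn.create_function("churn_score", 2, churn_score)

conn.execute("CREATE TABLE customers (name TEXT, days_inactive INTEGER, support_tickets INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("acme", 10, 0), ("globex", 20, 3)],
)

# Predictive query: the score is computed where the data lives.
scored = conn.execute(
    "SELECT name, churn_score(days_inactive, support_tickets) AS score "
    "FROM customers ORDER BY score DESC"
).fetchall()
print(scored)
```

Any report or app that can issue SQL can then consume the scores without a separate scoring service.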
Unlocking the Value of Big Data
Hunk
Many Others…
Tableau
Datameer
Platfora
SAS Visual Analytics
Excel 2013
Demos
Choosing a Pilot Project
Secure practical, initial quick wins without boiling the ocean (data lake)
How to Start
1. Develop a roadmap
2. Plan to invest in modern infrastructure as a long-term equal analytic partner for traditional EDW and BI assets
3. Develop a "skills matrix" and staffing plan
4. Identify and prioritize projects that present only one or two of the extreme data challenges (volume, variety, velocity, or complexity) and include visual analytics where the business can immediately see value
5. Gradually invest in training by partnering with experts and adding staff as needed
Why Visual Analytics
1. Let’s face it… if the business can see it, they can immediately recognize the value
2. Easy – a few weeks to a month to highly visible results the business can understand and truly appreciate (much more so than a cold data backup!)
3. Choose a project with measurable ROI for a specific, valued business use case
a. Outline the goals of a big data pilot
b. Get assistance from experts to reduce the learning curve and ensure initial successes
c. Sell the vision with the end result imagery
d. Start with a small, specific area with one big data “V” BUT large enough that people care about the results
Key Takeaways
Don’t be left behind. Get started now.
First Step – Data Warehouse Modernization for speed, scale and outcomes
Next Step – Analytics Optimization for simplicity, alignment and value
The Goal – Advanced Analytics Platform for Big Data as a Service, Feature, Source