© 2014 Impact Analytix, LLC Kickstart Big Data: Combine Existing Analytics Assets with New Hadoop Data Sources Jen Underwood Founder & Principal Consultant Impact Analytix, LLC [email protected] www.impactanalytix.com quickly make a positive impact
Combining Big Data with Existing Analytics Technologies

Jan 19, 2017

Data & Analytics

Jen Underwood
Page 1: Combining Big Data with Existing Analytics Technologies

© 2014 Impact Analytix, LLC

Kickstart Big Data: Combine Existing Analytics Assets with New Hadoop Data Sources

Jen Underwood

Founder & Principal Consultant

Impact Analytix, LLC

[email protected]

quickly make a positive impact

Page 2: Combining Big Data with Existing Analytics Technologies

Agenda

Presenter: Jen Underwood

Title: Kickstart Big Data: Combine Existing Analytics Assets with New Hadoop Data Sources

Tagline: Use Big Data technologies to Leverage your Existing Data Warehouse and BI/Analytics (OLAP) Investments

Abstract: Explore successful approaches to securing initial quick wins with big data analytics pilot projects without boiling the ocean (data lake). Business intelligence and big data initiatives remain the No. 1 CIO priority for the second consecutive year. In this session we look at practical options to get started by combining existing data warehouse and OLAP assets with new Hadoop data sources.

- Share popular big data analytics use cases

- Discuss modern analytics solution architecture

- Explain how to choose the right pilot project

Key Takeaways:

First Step – Data Warehouse Modernization for speed, scale and outcomes

Next Step – Analytics Optimization for simplicity, alignment and value

The Goal – Advanced Analytics Platform for Big Data as a Service, Feature, Source

Page 3: Combining Big Data with Existing Analytics Technologies

Mega-Trends

Source: http://www.burrus.com/resources/daniel-burrus-top-twenty-technology-driven-trends-for-2013/

1. Rapid Growth of Big Data

2. Cloud Computing and Advanced Cloud Services

3. On Demand Services

4. Virtualization

5. Consumerization of IT Increases

Page 4: Combining Big Data with Existing Analytics Technologies

Living in the Age of Data Explosion

Exponential increase in unstructured data

New breed of highly distributed, elastic scale non-relational databases

Revolutionary market shift after 40 years of relational database dominance

Big data requires modernizing architecture and approach to analytics

Page 5: Combining Big Data with Existing Analytics Technologies

What is Big Data?

Page 6: Combining Big Data with Existing Analytics Technologies

Big Data Analytics ≠ Traditional BI with More Data

Volume: petabytes; 10x increase every five years

Variety: structured and unstructured; 85% from new data types

Velocity: real time; batch and streaming

Diagram label: Relational Data

Page 7: Combining Big Data with Existing Analytics Technologies

Big Data Analytics ≠ Traditional BI with More Data

Big Data is redefining the processes of managing master data, data quality, and information lifecycle management

Big Data is NOT replacing EDW and OLAP; it supplements those investments

Big Data ecosystem includes a variety of analytic technologies

• Columnar databases, JSON, and unstructured file stores

• Hadoop and NoSQL platforms are adding SQL, search, and streaming capabilities, while NoSQL platforms are also adding MPP and transactional support

• Data tiering that aggressively leverages SSD (Flash) and DRAM

Source: Gartner

Page 8: Combining Big Data with Existing Analytics Technologies

Hadoop: Move Compute to the Data

Inspired by Google’s MapReduce

Infrastructure to automatically scale-out storage and distributed data processing on commodity hardware

Hadoop
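
To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using the classic word-count example. It only illustrates the programming model; it is not Hadoop code. The function names are hypothetical, and a real cluster would run the map tasks on the nodes that already hold the data blocks instead of in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big value", "data lake"])
```

Because each call to `map_phase` depends only on its own line, the map work can be split across many machines, which is what lets storage and processing scale out on commodity hardware.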

Page 9: Combining Big Data with Existing Analytics Technologies

Hadoop: Move Compute to the Data

Source: Datameer

Another way to think about this shift…

Page 10: Combining Big Data with Existing Analytics Technologies

Hadoop: Move Compute to the Data

              Traditional RDBMS          MapReduce
Data Size     Gigabytes (Terabytes)      Petabytes (Exabytes)
Access        Interactive and Batch      Batch
Updates       Read / Write many times    Write once, Read many times
Structure     Static Schema              Dynamic Schema
Integrity     High (ACID)                Low
Scaling       Nonlinear                  Linear
DBA Ratio     1:40                       1:3000

Source: Tom White’s Hadoop: The Definitive Guide

Page 11: Combining Big Data with Existing Analytics Technologies

Process Shift from Schema First to Schema Later

Schema first (slow value from data):

1. Data arrives

2. Derive schema

3. Cleanse data

4. Transform

5. Load to EDW

6. Analyze

Schema later (rapid value from data):

1. Data arrives

2. Load to Hadoop

3. Analyze

4. Subsets of data loaded to EDW
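
The schema-later flow (load first, apply structure only when analyzing) can be sketched in a few lines. This is an illustration under stated assumptions, not a Hadoop implementation: a Python list stands in for the raw store, and the sample records, field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Hypothetical raw events; note the missing field and the bad value.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "amount": "oops", "channel": "mobile"}',
]


def load_raw(lines):
    """Step 2: store records exactly as they arrive, with no upfront schema."""
    return list(lines)


def read_with_schema(raw_store, schema):
    """Step 3: apply a schema at read time; skip records that do not fit it."""
    for line in raw_store:
        record = json.loads(line)
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError):
            continue


raw_store = load_raw(RAW_EVENTS)

# The analysis decides which fields matter, after the data is already loaded.
sales_schema = {"user": str, "amount": float}
sales = list(read_with_schema(raw_store, sales_schema))
total = sum(row["amount"] for row in sales)

# Step 4: only a curated subset moves on to the warehouse.
edw_subset = [row for row in sales if row["amount"] >= 10]
```

All three raw records stay available for later questions, for example an analysis by `channel`, even though this particular schema ignored that field and rejected one record.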

Page 12: Combining Big Data with Existing Analytics Technologies

Modern Analytics Architecture

Page 13: Combining Big Data with Existing Analytics Technologies

Modern Data Warehousing

Page 14: Combining Big Data with Existing Analytics Technologies

Changes in Data Warehousing Patterns

Free up the EDW from low value tasks

Keep 100% of the source data and historical data

Explore and mine data with "schema on read"

Cold data storage with Hadoop, warm data with MPP/Columnar, hot data in-memory

Non-relational data

Hadoop – Cold Data

MPP/Columnar – Warm Data

In-Memory – Hot Data
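
One way to picture the cold, warm, and hot split is a simple placement rule based on how recently a dataset was used. The sketch below is an assumption for illustration only: the slides do not define tier boundaries, so the day thresholds, dataset names, and the `choose_tier` function are hypothetical.

```python
from datetime import date

# Hypothetical thresholds; the slides do not specify tier boundaries.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 365


def choose_tier(last_accessed, today):
    """Map a dataset's last access date to a storage tier."""
    age_days = (today - last_accessed).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "in-memory"      # hot data
    if age_days <= WARM_MAX_AGE_DAYS:
        return "mpp-columnar"   # warm data
    return "hadoop"             # cold data


today = date(2014, 6, 1)
datasets = {
    "daily_sales": date(2014, 5, 20),
    "last_year_orders": date(2013, 9, 1),
    "clickstream_2011": date(2011, 1, 15),
}
placement = {name: choose_tier(seen, today) for name, seen in datasets.items()}
```

In practice the rule would likely also weigh query frequency, data volume, and cost, but the principle is the same: the EDW and in-memory layers hold what is used most, and Hadoop keeps everything else.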

Page 15: Combining Big Data with Existing Analytics Technologies

Changes in Data Warehousing Patterns

[Diagram labels]

Non-relational data: social apps, sensor and RFID, mobile apps, web apps

Hadoop

Relational and OLAP data: traditional schema-based data warehouse applications

EDW

HDFS bridge: enhanced query engine, external table, external data source, external file format

Regular T-SQL

Results

Basically adding a “bridge” to Big Data from your existing investments
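
The "bridge" idea, where file data is exposed as a table so that ordinary SQL can join it with warehouse tables, can be imitated on a small scale. This sketch makes several assumptions: an in-memory SQLite database stands in for the EDW, a CSV string stands in for a file in HDFS, and the table names and the `expose_external_table` helper are hypothetical. Unlike a real external table, this toy version copies the rows into the database instead of reading them in place.

```python
import csv
import io
import sqlite3

# Stand-in for a delimited file stored in HDFS (hypothetical clickstream extract).
HDFS_FILE = "customer_id,page_views\n1,40\n2,7\n3,15\n"

# Stand-in for the EDW with an existing relational table.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
edw.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])


def expose_external_table(conn, table_name, text):
    """Make delimited file contents queryable as a table (the 'bridge')."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute(
        f"CREATE TEMP TABLE {table_name} (customer_id INTEGER, page_views INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO {table_name} VALUES (?, ?)",
        [(int(r["customer_id"]), int(r["page_views"])) for r in rows],
    )


expose_external_table(edw, "ext_clickstream", HDFS_FILE)

# Regular SQL joins warehouse data with the bridged big data source.
results = edw.execute(
    "SELECT c.name, e.page_views "
    "FROM customers AS c "
    "JOIN ext_clickstream AS e ON e.customer_id = c.customer_id "
    "ORDER BY c.name"
).fetchall()
```

The point for existing investments is that analysts keep writing the SQL they already know; only the definition of the external table knows that the data lives outside the warehouse.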

Page 16: Combining Big Data with Existing Analytics Technologies

Changes in Data Warehousing Patterns

Big Data storage aka Data Lake is characterized by three key attributes:

Collect everything: A data lake contains all data, both raw sources over extended periods of time and any processed data

Dive in anywhere: A data lake enables users across multiple business units to refine, explore and enrich data on their terms

Flexible access: A data lake enables multiple data access patterns across a shared infrastructure: batch, interactive, online, search, in-memory and other processing engines

Page 17: Combining Big Data with Existing Analytics Technologies

Changes in Data Warehousing Patterns

Modern MPP, Columnar and Visual Analytics Innovations:

Nature of Hadoop data access: Historically, querying Hadoop entailed complex Java code and results came from slow batch processes, so improved tools were built to expedite Hadoop data access

External tables, compression, HDFS, Hive, other means: Easy visual analytics tools use business-user-friendly means to access Hadoop data and often bring that data into an in-memory cache for rapid data analysis

Materialized Views “v2” and analytic functions: Big data visual analytic tools improve upon traditional view techniques to bring big data into memory or on chip, and to intelligently and automatically reuse and refresh those views

Page 18: Combining Big Data with Existing Analytics Technologies

Why Now? What’s the big deal?

“By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”

– Gartner, Mark Beyer, Information Management in the 21st Century

Page 19: Combining Big Data with Existing Analytics Technologies

Source: 2014 IDG Enterprise Big Data Research. An online survey of 46 questions was used with 751 respondents randomly selected from CIO, Computerworld, CSO, InfoWorld, ITworld, and Network World subscribers, e-mail subscription lists and LinkedIn forums.

Big Data Adoption

Page 20: Combining Big Data with Existing Analytics Technologies

Big Data Changing the Landscape

Beyond hype, it is imperative to understand when it is time to embrace a technology-enabled trend in its formative stages

Organizations are already vastly improving the quality and speed of decision making – big data is a competitive need to thrive

Look around you… ALL the major database vendors and analytics software providers are evolving their solution offerings for big data sources

New analytical solutions easily and quickly unlock the value in big data

Page 21: Combining Big Data with Existing Analytics Technologies

Big Data Today

Page 22: Combining Big Data with Existing Analytics Technologies

Areas of Business Intelligence Tools

Source: http://www.b-eye-network.com/blogs/eckerson/archives/2013/03/a_guide_for_bi.php

Page 23: Combining Big Data with Existing Analytics Technologies

Unlocking the Value of Big Data

Today’s easy visual analytics and integration tools empower the business to make smarter decisions and generate more value from more data

Fast, direct, agile access to big data to analyze in-place, blend with EDW, OLAP and personal data sources, decreasing long BI backlogs for faster actionable insight

Less need to move large volumes of data between platforms just to ask new questions or perform predictive analytics

Page 24: Combining Big Data with Existing Analytics Technologies

Integrate Predictive Intelligence

Transform business using “Smart” Apps and Reports

Analytic tool specific integration options

In-Database Predictive UDF Functions and Predictive Queries

PMML to exchange models

Programming with APIs

Page 25: Combining Big Data with Existing Analytics Technologies

Unlocking the Value of Big Data

Hunk

Many Others…

Page 26: Combining Big Data with Existing Analytics Technologies

Tableau

Page 27: Combining Big Data with Existing Analytics Technologies

Datameer

Page 28: Combining Big Data with Existing Analytics Technologies

Platfora

Page 29: Combining Big Data with Existing Analytics Technologies

SAS Visual Analytics

Page 30: Combining Big Data with Existing Analytics Technologies

Excel 2013

Page 31: Combining Big Data with Existing Analytics Technologies

Demos

Page 32: Combining Big Data with Existing Analytics Technologies

Choosing a Pilot Project

Secure practical, initial quick wins without boiling the ocean (data lake)

Page 33: Combining Big Data with Existing Analytics Technologies

How to Start

1. Develop Roadmap

2. Plan to invest in modern infrastructure as a long-term equal analytic partner for traditional EDW and BI assets

3. Develop a "skills matrix" and staffing plan

4. Identify and prioritize projects that present only one or two of the extreme data challenges (volume, variety, velocity, or complexity) and include visual analytics where the business can immediately see value

5. Gradually invest in training by partnering with experts and adding staff as needed

Page 34: Combining Big Data with Existing Analytics Technologies

Why Visual Analytics

1. Let’s face it… if the business can see it, they can immediately recognize the value

2. Easy – a few weeks to a month to highly visible results the business can understand and truly appreciate (much more so than a cold data backup!)

3. Choose a project with measurable ROI for a specific, valued business use case

a. Outline the goals of a big data pilot

b. Get assistance from experts to reduce the learning curve and ensure initial successes

c. Sell the vision with the end result imagery

d. Start with a small, specific area with one big data “V”, BUT large enough that people care about the results

Page 35: Combining Big Data with Existing Analytics Technologies

Key Takeaways

Don’t be left behind. Get started now.

First Step – Data Warehouse Modernization for speed, scale and outcomes

Next Step – Analytics Optimization for simplicity, alignment and value

The Goal – Advanced Analytics Platform for Big Data as a Service, Feature, Source
