Big Data In Small Steps By Devashish Khatwani January 2015
Page 1: Big Data

Big Data In Small Steps

By Devashish Khatwani

January 2015

Page 2: Big Data

The Three (or Four) Questions

1. What is Big Data and what does it have to do with my IT?

2. How can Big Data deliver value?

3. How do I implement it?


Page 3: Big Data


Question 1: What is Big Data and what does it have to do with my IT?

Page 4: Big Data

A huge amount of data is not, by itself, Big Data. Big Data is defined by four key attributes:

VOLUME: Size of the data being processed. It can run up to petabytes for companies such as Google and Facebook.

VELOCITY: Speed at which the data is generated. To give some perspective, more than 90% of the data generated in human history was generated in the last two years. Banks can relate this to the number of credit card transactions happening per minute.

VARIETY: Types of data in the dataset you need to process. It could be structured data coming from databases, unstructured data such as tweets about your company, or semi-structured data such as responses from an online feedback form.

VERACITY: Uncertainty of the data. Think of it as a partially filled feedback form or a tweet with hashtags such as #YOLO.
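As a rough illustration of the veracity attribute, the sketch below scores how complete a single feedback-form record is. The field names and the sample record are hypothetical and only serve to make the idea concrete.

```python
# Hypothetical feedback-form fields; a real form would define its own.
EXPECTED_FIELDS = ["name", "email", "rating", "comment"]


def completeness(record, expected_fields=EXPECTED_FIELDS):
    """Return the fraction of expected fields that are present and non-empty."""
    filled = 0
    for field in expected_fields:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(expected_fields)


# A partially filled form: rating given, comment left blank, email missing.
partial_form = {"name": "A. Customer", "rating": 4, "comment": "  "}
score = completeness(partial_form)
print(f"completeness = {score:.2f}")
```

A score like this could be used to decide how much weight a record gets in later analysis; the threshold to apply is a business choice and is not implied by the slide.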


Page 5: Big Data

The leaders in Big Data implementation are moving away from the traditional technology stack

[Figure: Traditional Technology Stack vs. Big Data Technology Stack, compared across three layers: (1) Data Storage and Management, (2) Data Processing, (3) Data Analysis and Presentation.]

Traditional Technology Stack: Monolithic Commodity Hardware; Centralized Relational Database; Queries (SQL).

Big Data Technology Stack: Distributed Commodity Hardware; Hadoop; Parallel Relational Database; NoSQL Database; Relational Database; Monolithic Commodity Hardware; Interactive Query; Real Time Query; MapReduce; Data Visualization Tools.
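To make the MapReduce entry in the Big Data stack more concrete, here is a minimal single-process sketch of the MapReduce programming model (map, shuffle, reduce) applied to a word count. It is not Hadoop's API; the function names and sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key (here, by summing)."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # On a real cluster each document (or block) would be mapped on a different node.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


# Hypothetical input: each string stands in for a file stored on a different machine.
docs = ["big data in small steps", "big data delivers value", "small steps"]
counts = word_count(docs)
print(counts["big"], counts["small"], counts["value"])
```

Because the map step only looks at one document at a time, it can be spread over distributed commodity hardware, which is what separates this style of processing from a single centralized query.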


Page 6: Big Data

Migrating to a Big Data Technology Stack has to be gradual so that regular reporting is not hindered

[Figure: two data flows. Existing flow: Data Generation Source, ETL Process, Centralized Relational Database on Monolithic Commodity Hardware, Queries (SQL), Regular Reports. Big Data flow: Data Generation Source, ETL Process, Hadoop and the rest of the Big Data Technology Stack (Distributed Commodity Hardware, Parallel Relational Database, NoSQL Database, Relational Database, Interactive Query, Real Time Query, MapReduce, Data Visualization Tools), with an "Export to the existing Database" step feeding Regular Reports.]
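The "Export to the existing Database" step can be sketched as follows: results computed on the new stack are summarised and written into the table that existing SQL reports already read. This is a minimal sketch under assumptions: an in-memory SQLite database stands in for the existing relational database, and the table name, column names and sample events are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical raw events, standing in for output of the Big Data stack.
raw_events = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 30.0},
]


def summarise(events):
    """Aggregate raw events into the small summary that the reports need."""
    totals = Counter()
    for event in events:
        totals[event["region"]] += event["amount"]
    return sorted(totals.items())


def export_to_existing_db(conn, summary):
    """Write the summary into the table the existing SQL reports already read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_summary (region, total) VALUES (?, ?)", summary
    )
    conn.commit()


# An in-memory SQLite database stands in for the existing relational database.
conn = sqlite3.connect(":memory:")
export_to_existing_db(conn, summarise(raw_events))

# The regular report keeps using plain SQL against the same table.
report = conn.execute(
    "SELECT region, total FROM sales_summary ORDER BY region"
).fetchall()
print(report)
```

Keeping the report query unchanged is the point of the gradual migration: the reporting layer does not need to know that the data was prepared on a different stack.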


Page 7: Big Data


Question 2: How can Big Data deliver value?

Page 8: Big Data

Technology is just an enabler; in order to make money you need to analyse your data

Three levels of analytics, in order of increasing complexity and maturity:

1. Models to present history: This is similar to the reporting that most companies have today, with an added layer of drill-down queries and advanced scorecard reporting.

2. Models to explain history: This covers analyses such as segmentation analysis and sensitivity analysis, which can be used to inform future decisions or to analyse the efficacy of past decisions.

3. Models to predict the future: This maturity level entails advanced statistical techniques such as predictive analytics, simulation, optimization and machine learning.

• Use Big Data to build models which may predict the future
• Use historical data and/or experiments to validate the models
• Make business decisions based on these models to get your ROI
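The idea of validating a predictive model against historical data can be sketched with a simple hold-out test: fit on the earlier periods, then measure the error on later periods the model has not seen. The example below uses a least-squares trend line as a stand-in for a real predictive model, and the monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly sales; months 1-8 train the model, months 9-12 validate it.
months = list(range(1, 13))
sales = [102, 108, 115, 119, 127, 131, 138, 144, 149, 157, 161, 168]

train_x, train_y = months[:8], sales[:8]
test_x, test_y = months[8:], sales[8:]

intercept, slope = fit_line(train_x, train_y)
predictions = [intercept + slope * x for x in test_x]
mae = mean_absolute_error(test_y, predictions)
print(f"slope = {slope:.2f}, hold-out MAE = {mae:.2f}")
```

A low hold-out error is evidence, not proof, that the model will keep working; that is why the slide also mentions experiments as a second way to validate.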


Page 9: Big Data

Big Data Use Cases: The insurance industry can use Big Data to increase customer loyalty, improve actuarial risk management and increase the efficiency of the claims function

1. Increasing Customer Loyalty: By running real-time sentiment analysis on social media platforms, emails, chats, the website and so on, insurance providers can develop custom response approaches to known behaviour patterns. For instance, by analysing users' website activity and correlating it with the subsequent calls made to the call centre, the insurance provider can predict the nature of an incoming call and develop a custom response to the situation.

2. Actuarial Risk Management: Current auto insurance premiums are based on individuals' credit scores, demographic variables and vehicle classifications. Insurance companies can extract driving patterns through telematics and use this data to offer better premiums to their customers. This not only helps the insurance provider segment its customer base more finely but also alleviates the problem of moral hazard.

3. Increasing the Efficiency of the Claims Function: With text analysis of previous claims, a claims officer can cross-reference similar claims more quickly and speed up the claims process. Text analysis of claims can also reveal patterns; combining these with demographic and behavioural data can generate cohorts which the insurance provider can use to detect and avoid fraud.
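One simple way to cross-reference similar claims is to compare the words used in their descriptions. The sketch below uses Jaccard similarity on word sets, which is only one of many possible text-analysis techniques and is not named in the slide; the claim IDs and descriptions are hypothetical.

```python
import re


def tokens(text):
    """Lower-case a claim description and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    set_a, set_b = tokens(a), tokens(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def most_similar(new_claim, past_claims, top_n=2):
    """Rank past claims by similarity to the new claim, most similar first."""
    scored = [
        (jaccard(new_claim, text), claim_id) for claim_id, text in past_claims.items()
    ]
    scored.sort(reverse=True)
    return [(claim_id, round(score, 2)) for score, claim_id in scored[:top_n]]


# Hypothetical claim descriptions; real claims would come from the claims system.
past_claims = {
    "C-101": "rear bumper damage after collision in parking lot",
    "C-102": "water damage to kitchen from burst pipe",
    "C-103": "front bumper damage after collision at traffic light",
}
new_claim = "bumper damage after low speed collision in parking lot"
print(most_similar(new_claim, past_claims))
```

A claims officer would see the closest past claims first. A production system would likely need stemming, stop-word handling and a proper index, none of which are shown here.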


Page 10: Big Data

Big Data Use Cases: The telecom industry can use Big Data for better segmentation, optimizing capacity planning and optimizing promotional spend

1. Better Segmentation: Segmentation is usually based on demographic, geographic, behavioural and psychographic attributes, but with Big Data a telecom provider can start micro-segmenting by adding a layer of:
• Activity-based data (website tracking, purchase history, call centre data, mobile usage data, response to incentives)
• Social network profile
• Social influence and sentiment data

2. Optimize Capacity Planning: Network capacity, workforce capacity and so on can be optimized based on analysis of historical data. For instance, instead of using aggregated time series forecasting for the number of calls received by the call centres, the telecom provider can forecast at customer segment level and then aggregate the forecasts to generate a more accurate prediction.

3. Optimizing Promotional Spend: The ROI of each email campaign, social media campaign and so on can be calculated, and an optimized mix of promotion methods can be generated. Furthermore, factors such as the date, time and text of a campaign can be tested through experiments to find the best combination.
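The segment-level forecasting idea from the capacity planning use case can be sketched as below: forecast each customer segment with its own settings, then add the forecasts up. Simple exponential smoothing is used here as an assumed forecasting method; the segment names, call volumes and smoothing factors are all hypothetical.

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead forecast using simple exponential smoothing."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly call volumes per customer segment.
calls_by_segment = {
    "prepaid": [400, 420, 410, 430, 425],
    "postpaid": [300, 310, 330, 360, 400],  # growing quickly
    "business": [150, 148, 152, 149, 151],  # stable
}

# Hypothetical smoothing factors, tuned per segment: a higher alpha reacts faster.
alpha_by_segment = {"prepaid": 0.3, "postpaid": 0.8, "business": 0.1}

segment_forecasts = {
    segment: exponential_smoothing_forecast(series, alpha_by_segment[segment])
    for segment, series in calls_by_segment.items()
}
bottom_up_total = sum(segment_forecasts.values())

# Baseline: forecast the aggregated series with a single smoothing factor.
aggregate_series = [sum(week) for week in zip(*calls_by_segment.values())]
top_down_total = exponential_smoothing_forecast(aggregate_series, 0.3)

print(f"bottom-up = {bottom_up_total:.1f}, aggregate = {top_down_total:.1f}")
```

If every segment used the same smoothing factor, the bottom-up total would equal the aggregate forecast, because this method is linear in the data. Any gain comes from tailoring the model to each segment. Which forecast is actually more accurate has to be checked against real call volumes.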


Page 11: Big Data


Question 3: How do I implement it?

Page 12: Big Data

Establishing a Centre of Excellence (COE) for Big Data implementation is the fastest way to Big Data success

[Figure: organization charts for the three options, with boxes labelled Corporate, Business Unit, COE, Big Data Project and Analytics team.]

Option 1: Internal Consulting
This setup treats the COE as an internal team of experts which the Business Unit can call upon for projects. The onus of initiating a Big Data project lies with the Business Unit. The COE can be treated either as a cost centre or as a profit centre under this structure.

Option 2: Centralized
Under this setup the COE identifies and executes the Big Data initiative with support from the Business Unit. The onus of identifying a viable Big Data initiative rests with the COE. The resources in the COE must be well versed in the Business Unit's business for this structure to be effective.

Option 3: Centre of Excellence
Under this structure the COE is a small organization with very specialized Big Data skills, and the Business Unit itself has basic analytics capabilities. This structure works well for organizations that have historically been analytically savvy and have taken data-driven decisions in the past.


Page 13: Big Data

Big Data initiatives need to be championed by CXOs in order for them to have maximum impact

Source: LEAP Study 2014 by A.T. Kearney and Carnegie Mellon University


Page 14: Big Data

You need seven types of people for your Big Data Initiative

1. Executive Leader
2. Project Manager
3. Internal Trainer
4. External Liaison
5. Data Technologist
6. Data Scientist
7. Data Analyst

CORE: Data Technologist, Data Scientist, Data Analyst


Page 15: Big Data

The three people who form the core of your Big Data Initiative: Data Technologist, Data Scientist and Data Analyst

Data Technologist
The Data Technologist is responsible for identifying the organization's data sources and should be able to work on different aspects of data management, such as:
1. Data Governance
2. Data Architecture
3. Data Quality
4. Data Security
5. Data Warehousing
6. Data Availability

Data Scientist
The Data Scientist has the know-how to code and perform statistical analyses such as clustering, predictive analytics, sentiment analysis and machine learning. The Data Scientist is the most important link in converting data into actionable insights. Some of the skills that a Data Scientist should possess are:
1. Programming experience in Python, Java, R and SQL
2. Knowledge of data mining, machine learning and statistical methods
3. Experience working with relational databases

Data Analyst
The Data Analyst is responsible for brainstorming the different models which need to be studied and statistically validated by the Data Scientist, and for calculating the dollar impact of actions taken based on Big Data insights. A Data Analyst needs basic statistics and a sound understanding of the product or function for which the Big Data initiative is being run. For instance, if you are running sentiment analysis on social media, the Data Analyst should be an expert in social media marketing.


Page 16: Big Data

Prototyping and then developing a repeatable solution is the best way to extract value from Big Data

Source: LEAP Study 2014 by A.T. Kearney and Carnegie Mellon University


Page 17: Big Data


Getting Started on your Big Data journey

Source: Deloitte, Big Data An Insurance business imperative


Page 18: Big Data


Thank You

Devashish Khatwani
B Tech – Electrical Engineering, IIT Roorkee
MBA – Rotman School of Management
[email protected]