
SymEx 2015 - Agile Process for Big Data Analytic

Jan 21, 2017


Transcript
Page 1: SymEx 2015 - Agile Process for Big Data Analytic

• More than 17 years of experience in the IT industry, with a background in theoretical physics. He started as a scientific programmer at the University of Indonesia’s semiconductor lab and later worked as a software engineer and architect at various software companies. He joined SRIN in 2014 to lead the development of mobile apps and middleware platforms and to conduct research projects on predictive data analytics using deep machine learning technologies. Prior to SRIN, he spent 10 years at Microsoft Indonesia as Director of the Developer Ecosystem (DX) division.



• Context of “Big Data” Science
• Scope of Data Analytics
• Project Management Complexities
• Team Structure and R&R
• Agile Principles and Process Model
• Common Execution Issues
• Q&A


Volume: exceeds the physical limits of vertical scalability

Velocity: the decision window is small compared to the rate at which data changes

Variety: many different formats make integration expensive

Variability: many options or variable interpretations confound analysis
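When volume exceeds what a single, ever-bigger machine can handle, the usual answer is to scale out horizontally: partition the data, process each partition independently, then merge the partial results. Hadoop and Spark, named later in this deck, do this across a cluster; the sketch below only illustrates the map/reduce pattern in one process and does not use or imitate either framework's API. The function names and the word-count task are illustrative assumptions.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def partition(lines: List[str], n_parts: int) -> List[List[str]]:
    """Split lines round-robin into n_parts chunks, one per hypothetical worker node."""
    return [lines[i::n_parts] for i in range(n_parts)]


def map_count(chunk: List[str]) -> Counter:
    """Map step: a worker counts the words in its own chunk, independently of the others."""
    return Counter(word for line in chunk for word in line.split())


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-worker partial counts into one result."""
    return reduce(lambda acc, part: acc + part, partials, Counter())


def scale_out_word_count(lines: List[str], n_parts: int) -> Counter:
    """Partition, map each partition, then reduce. More partitions means more
    workers, instead of one bigger machine."""
    return reduce_counts(map_count(chunk) for chunk in partition(lines, n_parts))


if __name__ == "__main__":
    logs = ["big data", "data velocity", "big variety"]
    print(scale_out_word_count(logs, n_parts=2))
```

The result does not depend on `n_parts`; only the amount of work per worker changes, which is what makes the approach scale.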


By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.

– Gartner, Mark Beyer, “Information Management in the 21st Century”

Data, Data... Everywhere: new data sources, larger data volumes

New data management technologies: Hadoop + Spark + tool ecosystem

New era of data analytics: descriptive, predictive & prescriptive; the data-driven organization

10x increase every five years

85% from new data types

Volume, Velocity, Variety
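The slide does not say which quantity grows "10x every five years"; assuming it means data volume growing at a constant compound rate, the yearly factor is the fifth root of 10, about 1.585, or roughly 58% growth per year. A minimal sketch of that arithmetic (function names are hypothetical):

```python
def annual_growth_factor(total_factor: float, period_years: float) -> float:
    """Constant yearly factor g such that g ** period_years == total_factor."""
    return total_factor ** (1.0 / period_years)


def projected_volume(start: float, total_factor: float, period_years: float, years: float) -> float:
    """Volume after `years`, assuming growth by `total_factor` every `period_years`."""
    return start * total_factor ** (years / period_years)


if __name__ == "__main__":
    g = annual_growth_factor(10.0, 5.0)
    print(f"yearly factor {g:.3f}, i.e. about {100 * (g - 1):.0f}% growth per year")
    # Ten years is two five-year periods, so the volume grows 10 * 10 = 100x.
    print(projected_volume(1.0, 10.0, 5.0, 10.0))
```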


[Figure: timeline, 1940 to 2015, labeled 2013]

Cloud Data Storage is Unlimited

Quincy, WA; Chicago, IL; San Antonio, TX; Dublin, Ireland: Generation 4 DCs


[Figure: timeline, 1940 to 2015, labeled 2015]


Generic Tasks
1. Define Analytic Requirements
2. Set Up Infrastructure
3. Collect Data
4. Data Modeling
5. Data Processing
6. Model Deployment
7. Monitoring
8. Evaluation
9. Etc.

[Figure: analytics types plotted by complexity against value]

Descriptive Analytics: What happened?

Diagnostic Analytics: Why did it happen?

Predictive Analytics: What will happen?

Prescriptive Analytics: How can we make it happen?
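To make the four questions concrete, the sketch below answers each one on a tiny invented dataset (monthly sales and ad spend, hypothetical, not from the talk). It uses a plain least-squares line throughout; that choice is an assumption for brevity, and real projects would use richer models. Note that the diagnostic slope shows association only, not proof of cause.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

# Hypothetical monthly figures, invented for illustration only.
SALES = [100.0, 110.0, 120.0, 130.0]
AD_SPEND = [10.0, 12.0, 14.0, 16.0]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def descriptive(values: Sequence[float]) -> Dict[str, float]:
    """What happened? Summarize the history."""
    return {"total": sum(values), "mean": mean(values), "last": values[-1]}


def diagnostic(driver: Sequence[float], outcome: Sequence[float]) -> float:
    """Why did it happen? Estimated change in outcome per unit of a candidate driver."""
    return fit_line(driver, outcome)[1]


def predictive(values: Sequence[float], steps_ahead: int = 1) -> float:
    """What will happen? Extrapolate a linear time trend."""
    a, b = fit_line(range(len(values)), values)
    return a + b * (len(values) - 1 + steps_ahead)


def prescriptive(driver: Sequence[float], outcome: Sequence[float],
                 options: Sequence[float], cost_per_unit: float) -> float:
    """How can we make it happen? Pick the option with the best estimated profit."""
    a, b = fit_line(driver, outcome)
    return max(options, key=lambda s: a + b * s - cost_per_unit * s)


if __name__ == "__main__":
    print(descriptive(SALES))
    print(diagnostic(AD_SPEND, SALES))
    print(predictive(SALES))
    print(prescriptive(AD_SPEND, SALES, [10.0, 15.0, 20.0], cost_per_unit=2.0))
```

Each step reuses the previous one's output in spirit: describe the data, explain it, project it, then turn the projection into a decision, which mirrors the rising complexity and value in the figure.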


Vision Analytics

Recommendation engines

Advertising analysis

Weather forecasting for business planning

Social network analysis

Legal discovery and document archiving

Pricing analysis

Fraud detection

Churn analysis

Equipment monitoring

Location-based tracking and services

Personalized Insurance

Advanced computation based on machine learning & predictive analytics is a core capability needed throughout future business


Pull-based Batch Loads

Enterprise Data Models

Complex ETL Logic

Poorly Suited to Non-Relational Data

Emergent design is difficult
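The slide lists these limitations without naming an alternative. One common contrast, which is our framing and an assumption rather than the author's, is schema-on-write (a fixed enterprise data model enforced at load time) versus schema-on-read (store raw records, apply a schema at query time). The sketch below uses hypothetical JSON events and hypothetical function names to show why a fixed schema rejects non-relational records while a read-time projection tolerates them.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical raw events whose shape varies from record to record.
RAW_EVENTS = [
    '{"user": "a", "amount": 10}',
    '{"user": "b", "amount": 5, "device": "mobile"}',
    '{"user": "c", "tags": ["promo"]}',
]

# Schema fixed up front, as in an enterprise data model (schema-on-write).
WAREHOUSE_SCHEMA = ("user", "amount")


def load_schema_on_write(lines: Sequence[str], schema: Sequence[str] = WAREHOUSE_SCHEMA
                         ) -> Tuple[List[Tuple[Any, ...]], List[Dict[str, Any]]]:
    """Accept only records whose fields exactly match the predefined schema;
    anything else is rejected and would need new ETL logic to load."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if set(record) == set(schema):
            accepted.append(tuple(record[field] for field in schema))
        else:
            rejected.append(record)
    return accepted, rejected


def read_schema_on_read(lines: Sequence[str], fields: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Keep raw records as-is and apply a schema only at query time;
    missing fields come back as None."""
    return [tuple(json.loads(line).get(field) for field in fields) for line in lines]


if __name__ == "__main__":
    accepted, rejected = load_schema_on_write(RAW_EVENTS)
    print(f"schema-on-write: {len(accepted)} loaded, {len(rejected)} rejected")
    print(read_schema_on_read(RAW_EVENTS, ("user", "device")))
```

With schema-on-read, a new question such as "which device?" needs no up-front redesign, which is one way to support the emergent design the slide says is difficult.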


Much More than Technologies: People and Process


New Roles:
1. Data Engineer
2. Data Scientist


CRISP-DM - Cross Industry Standard Process for Data Mining.

• Framework for guidance
• Process model
• Non-proprietary
• Experience base
• Application/industry neutral
• Tool neutral
• Focus on business issues as well as technical analysis


Business Understanding: Determine Business Objectives; Assess Situation; Determine Data Mining Goals; Produce Project Plan

Data Understanding: Collect Initial Data; Describe Data; Explore Data; Verify Data Quality

Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data

Modeling: Select Modeling Technique; Generate Test Design; Build Model; Assess Model

Evaluation: Evaluate Results; Review Process; Determine Next Steps

Deployment: Plan Deployment; Plan Monitoring & Maintenance; Produce Final Report; Review Project
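CRISP-DM is not a strict waterfall: its usual diagram shows feedback arrows (Business Understanding and Data Understanding in both directions, Data Preparation and Modeling in both directions, Evaluation back to Business Understanding) inside an outer circle that marks the process as cyclic. The sketch below encodes those moves as a transition map so a project's phase history can be checked and its iterations counted; the function names are hypothetical, and modeling the outer circle as a Deployment to Business Understanding edge is our assumption.

```python
from typing import Dict, List, Set

PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

# Allowed moves: forward steps plus the feedback arrows of the usual CRISP-DM
# diagram. Deployment -> Business Understanding stands for the outer circle
# (an assumption: a deployed project can start a new cycle).
TRANSITIONS: Dict[str, Set[str]] = {
    "Business Understanding": {"Data Understanding"},
    "Data Understanding": {"Business Understanding", "Data Preparation"},
    "Data Preparation": {"Modeling"},
    "Modeling": {"Data Preparation", "Evaluation"},
    "Evaluation": {"Business Understanding", "Deployment"},
    "Deployment": {"Business Understanding"},
}


def is_valid_path(path: List[str]) -> bool:
    """True if the path starts at Business Understanding and every step is an allowed move."""
    if not path or path[0] != PHASES[0]:
        return False
    return all(nxt in TRANSITIONS.get(cur, set()) for cur, nxt in zip(path, path[1:]))


def count_iterations(path: List[str]) -> int:
    """Number of backward (feedback) moves, i.e. how often the team looped back."""
    order = {phase: i for i, phase in enumerate(PHASES)}
    return sum(1 for cur, nxt in zip(path, path[1:]) if order[nxt] < order[cur])


if __name__ == "__main__":
    history = PHASES[:4] + ["Data Preparation", "Modeling", "Evaluation", "Deployment"]
    print(is_valid_path(history), count_iterations(history))
```

Counting feedback moves is one simple way to see how much a project actually iterated, which is where the agile process model on the earlier slides comes in.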


Common Issues
• Learning curve for data science and data engineering.
• We can’t design insights; we discover them through exploration.
• Low data quality means fewer insights from the data.
• The results are not good enough.

Key Strategies
• Extra dedicated time to learn before project sprints (e.g., MOOCs).
• Add capabilities to explore data, iterate, and publish intermediate results.
• Improve data quality based on feedback.
• Build-Measure-Release iteration.
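The last three strategies fit together as a loop: improve the data a little, measure, publish an intermediate result, and repeat until the result is good enough. The sketch below is one way to express that Build-Measure-Release loop, assuming a hypothetical quality metric (share of records with a present, non-negative amount), hypothetical cleaning rules, and invented records; none of these come from the talk.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
Rule = Callable[[List[Record]], List[Record]]


def quality(records: List[Record]) -> float:
    """Hypothetical metric: share of records with a present, non-negative amount."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if r.get("amount") is not None and r["amount"] >= 0)
    return ok / len(records)


def build_measure_release(records: List[Record], rules: List[Rule],
                          target: float) -> Tuple[List[Record], List[Dict[str, float]]]:
    """Apply one cleaning rule per iteration (build), re-score the data (measure),
    and publish an intermediate result each time (release); stop at the target."""
    releases = [{"iteration": 0, "quality": quality(records)}]
    current = list(records)
    for i, rule in enumerate(rules, start=1):
        if releases[-1]["quality"] >= target:
            break  # good enough: no further iteration needed
        current = rule(current)  # build
        releases.append({"iteration": i, "quality": quality(current)})  # measure + release
    return current, releases


def drop_missing(rs: List[Record]) -> List[Record]:
    """Hypothetical rule: discard records with no amount."""
    return [r for r in rs if r.get("amount") is not None]


def clip_negative(rs: List[Record]) -> List[Record]:
    """Hypothetical rule: treat negative amounts as entry errors and clip them to 0."""
    return [
        {**r, "amount": max(r["amount"], 0.0)} if r.get("amount") is not None else r
        for r in rs
    ]


if __name__ == "__main__":
    data = [{"amount": 10.0}, {"amount": None}, {"amount": -5.0}, {"amount": 7.0}]
    cleaned, releases = build_measure_release(data, [drop_missing, clip_negative], target=1.0)
    for release in releases:
        print(release)
```

Keeping every intermediate release, rather than only the final one, gives stakeholders something to react to early, and their feedback is what suggests the next cleaning rule.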
