• More than 17 years of experience in the IT industry, with a theoretical physics background. He started as a scientific programmer at University of Indonesia’s semiconductor lab, then later worked as a software engineer and architect at various software companies. He joined SRIN in 2014 to lead development of various mobile apps and middleware platforms, and to conduct research projects on predictive data analytics using deep machine learning technologies. Prior to SRIN, he spent 10 years at Microsoft Indonesia as Director of the Developer Ecosystem (DX) division.
Context of “Big Data” Science
Scope of Data Analytics
Project Management Complexities
Team Structure and R&R
Agile Principles and Process Model
Common Execution Issues
Q&A
Volume: Exceeds physical limits of vertical scalability
Velocity: Decision window is small compared to the data change rate
Variety: Many different formats make integration expensive
Variability: Many options or variable interpretations confound analysis
By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.
- Gartner, Mark Beyer, “Information Management in the 21st Century”
Data, Data, ... Everywhere: New Data Sources, Larger Data Volumes
New Data Management Technologies: Hadoop + Spark + Tool Ecosystem
New Era of Data Analytics: Descriptive, Predictive & Prescriptive; Data-Driven Organization
Infrastructure
3. Collect Data
4. Data Modeling
5. Data Processing
6. Model Deployment
7. Monitoring
8. Evaluation
9. Etc.
[Figure: types of analytics plotted by Complexity against Value]
Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Prescriptive Analytics: How can we make it happen?
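To make the difference between “What happened?” and “What will happen?” concrete, here is a minimal sketch in Python. The monthly sales figures are invented for illustration, and the straight-line extrapolation is only a stand-in for a real predictive model; neither comes from the original slides.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for illustration only.
sales = [100, 110, 120, 130, 140, 150]


def descriptive(series):
    """What happened? Summarize the observed history."""
    return {"total": sum(series), "average": mean(series)}


def predictive(series):
    """What will happen? Fit a least-squares line and extrapolate one step."""
    n = len(series)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(series)
    # Ordinary least-squares slope and intercept for y = intercept + slope * x.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # The next period has index n.
    return intercept + slope * n


summary = descriptive(sales)
forecast = predictive(sales)
print(summary, forecast)
```

Diagnostic and prescriptive analytics would build on the same data, but they need domain context (causes and available actions) that a toy series cannot supply.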
Vision Analytics
Recommendation engines
Advertising analysis
Weather forecasting for business planning
Social network analysis
Legal discovery and document archiving
Pricing analysis
Fraud detection
Churn analysis
Equipment monitoring
Location-based tracking and services
Personalized Insurance
Advanced computation based on machine learning and predictive analytics is a core capability needed throughout future businesses
Pull-based Batch Loads
Enterprise Data Models
Complex ETL Logic
Poorly Suited to Non-Relational Data
Emergent design is difficult
Much More than Technologies: People and Process
New Roles:
1. Data Engineer
2. Data Scientist
CRISP-DM - Cross Industry Standard Process for Data Mining.
Framework for Guidance
Process Model
Non-proprietary
Experience Base
Application/Industry neutral
Tool neutral
Focus on business issues as well as technical analysis
Business Understanding: Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan
Data Understanding: Collect Initial Data, Describe Data, Explore Data, Verify Data Quality
Data Preparation: Select Data, Clean Data, Construct Data, Integrate Data, Format Data
Modeling: Select Modeling Technique, Generate Test Design, Build Model, Assess Model
Evaluation: Evaluate Results, Review Process, Determine Next Steps
Deployment: Plan Deployment, Plan Monitoring & Maintenance, Produce Final Report, Review Project
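The CRISP-DM phases and tasks listed above can be captured as a plain data structure that a team turns into a backlog or checklist. The sketch below is one possible representation, not part of the CRISP-DM standard; the names `CRISP_DM` and `checklist` are made up for this example. It flattens the process into a linear sequence, which is a simplification, since CRISP-DM is iterative and phases are often revisited.

```python
# Phases and tasks as listed on the slide, in CRISP-DM order.
# The dict preserves insertion order (Python 3.7+).
CRISP_DM = {
    "Business Understanding": [
        "Determine Business Objectives",
        "Assess Situation",
        "Determine Data Mining Goals",
        "Produce Project Plan",
    ],
    "Data Understanding": [
        "Collect Initial Data",
        "Describe Data",
        "Explore Data",
        "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data",
        "Clean Data",
        "Construct Data",
        "Integrate Data",
        "Format Data",
    ],
    "Modeling": [
        "Select Modeling Technique",
        "Generate Test Design",
        "Build Model",
        "Assess Model",
    ],
    "Evaluation": [
        "Evaluate Results",
        "Review Process",
        "Determine Next Steps",
    ],
    "Deployment": [
        "Plan Deployment",
        "Plan Monitoring & Maintenance",
        "Produce Final Report",
        "Review Project",
    ],
}


def checklist(phases=CRISP_DM):
    """Yield (step_number, phase, task) for every task, in phase order."""
    step = 0
    for phase, tasks in phases.items():
        for task in tasks:
            step += 1
            yield step, phase, task


steps = list(checklist())
for number, phase, task in steps:
    print(f"{number:2d}. [{phase}] {task}")
```

In an agile setting, each tuple could become a backlog item, with the Evaluation tasks deciding whether the next sprint loops back to an earlier phase.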
Common Issues
Learning curve for data science and data engineering.
We can’t design insights; we discover them through exploration.
Low data quality leads to fewer insights from the data.
The result is not good enough.
Key Strategies
Dedicate extra time to learning before project sprints (e.g., MOOCs).
Add capabilities to explore data, iterate, and publish intermediate results.
Improve data quality based on feedback.
Build-Measure-Release iteration.
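Improving data quality iteratively needs some measurement to track from one iteration to the next. As a hedged illustration, the sketch below computes the share of missing values per field. The records, the field names, and the `missing_ratio` helper are all hypothetical, and real projects would track more dimensions (validity, consistency, timeliness) than this one.

```python
def missing_ratio(records, fields):
    """Return the fraction of records where each field is missing or empty."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


# Hypothetical customer records, invented for illustration only.
records = [
    {"customer_id": 1, "age": 34, "city": "Jakarta"},
    {"customer_id": 2, "age": None, "city": "Bandung"},
    {"customer_id": 3, "age": 41, "city": ""},
    {"customer_id": 4, "age": None, "city": "Depok"},
]

report = missing_ratio(records, ["customer_id", "age", "city"])
print(report)
```

Publishing a report like this after each iteration gives the team and data owners a shared number to act on in the next cycle.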