Building Enterprise Advanced Analytics Platform
SoCal Data Science Conference, 09.25.2016
Raymond Fu, Practice Architect, Trace3
Raymond Fu, Practice Architect, Trace3
16 years of IT experience specializing in big data, business intelligence, and enterprise architecture. A 10-year corporate career with Bank of America, highlighted by leading many data integration and warehousing initiatives arising from mergers and acquisitions.
Founded his own technology company, Xceed Consulting Group, in 2012 to enable data-driven solutions.
Joined California-based consulting company Trace3 in 2016 as a practice architect for the Data Intelligence team.
Blog: Everything About Data
Twitter: @RaymondxFu
Big Data Disruption
• Traditionally, organizations have had a firm grasp of the people, process, and technology required to deliver capabilities, so they could articulate an end-to-end roadmap and identify platforms and resources up front.
• Big Data disrupts this traditional architecture paradigm. Organizations may have an idea or an interest, but they do not necessarily know what will come out of it.
• The answer to an initial question triggers the next set of questions. Pursuing it requires a combination of skill sets that is new and in short supply.
• The pursuit of the answer is advanced analytics.
Advanced Analytics Definition
• The process, tools, technology, and collaboration to create predictive models that enable/drive strategic and operational decisions. The predictive models (1) generate insights and hypotheses and (2) test/score them through experiments, so organizations KNOW what works better.
• Predictive models are created using machine learning, deep learning, advanced data management tools, and visualization tools.
• An integral part of advanced analytics is the operationalization of the predictive models, so that they can be scored and used for decisioning rapidly and at scale.
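The "test/score them through experiments" step in the definition can be illustrated with a small sketch. It is not taken from the talk: it assumes the experiment compares a binary outcome (for example, conversion) between a control group and a group treated according to the model, and it uses a standard two-proportion z-test. The function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare the success rates of two experiment arms.

    Returns (z, p_value), where p_value is two-sided and based on the
    normal approximation with a pooled standard error.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both arms behave the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: control arm A versus model-driven arm B.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A small p-value suggests the difference between the arms is unlikely to be noise; the threshold an organization acts on is a business choice, not something this sketch decides.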
Advanced Analytics Relevancy
• Organizations’ goals
• Advanced Analytics’ goals
• What’s different today
• Obstacles to the goals
Advanced Analytics Process
Data management
• Connectivity
• Landing
• Ingestion
• Knowledge
• Preparation
Analytics creation (business modeling)
• Domain knowledge
• Hypothesis development
• Model architecture
• Algorithm selection and development
• Feature engineering
• Visualization
• Collaboration
• Reproducibility
• Data mining
• Statistical data shaping
• Training
• Cross-validation testing
• Environment and libraries
Analytics operationalization (model production and deployment)
• Production feature generation, modeling, testing
• Deployment
• Parallel experiments
• Performance assessment
• Continuous integration and deployment
• Model iteration and redeployment
• R-T and batch scoring
• Decisioning
Organization and business impact
• Business metric assessment
Roles across the four stages, in order: IT/DE, DS | LoB, DS | DS, IT/DE, LoB | LoB, DS, IT/DE
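The "R-T and batch scoring" and "Decisioning" items in the process can be sketched as one scoring function shared by a real-time path and a batch path. This is a minimal illustration, not the platform described in the talk: the logistic model, the feature names, the coefficients, the threshold, and the decision labels are all hypothetical.

```python
import math
from typing import Dict, Iterable, List

# Hypothetical coefficients of an already-trained logistic model.
WEIGHTS: Dict[str, float] = {"recency_days": -0.05, "purchases_90d": 0.4}
BIAS = -1.0
THRESHOLD = 0.5  # hypothetical decision cut-off


def score(record: Dict[str, float]) -> float:
    """Return the model probability for one record (missing features count as 0)."""
    z = BIAS + sum(w * record.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(probability: float) -> str:
    """Turn a score into an action (the decisioning step)."""
    return "offer" if probability >= THRESHOLD else "no_offer"


def score_realtime(record: Dict[str, float]) -> Dict[str, object]:
    """Real-time path: score and decide on a single incoming record."""
    p = score(record)
    return {"score": p, "decision": decide(p)}


def score_batch(records: Iterable[Dict[str, float]]) -> List[Dict[str, object]]:
    """Batch path: apply exactly the same logic to a collection of records."""
    return [score_realtime(r) for r in records]


if __name__ == "__main__":
    print(score_realtime({"recency_days": 0, "purchases_90d": 5}))
    print(score_batch([{"recency_days": 20, "purchases_90d": 0}]))
```

Routing both paths through the same `score` function is one way to keep real-time and batch results consistent when a model is iterated and redeployed.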
Enterprise Big Data Strategy
• Information management
    • Data architecture, data governance, and metadata management.
    • Address key issues such as data integration and data quality.
• Data platform modernization
    • Enterprise data warehouse offload.
    • Data lake platform assessment.
• Advanced Analytics
    • Methodology
    • Tools recommendation
    • Operationalization
Enterprise Architecture Approach
• Step 1 – Establish Business Context and Scope (incubate ideas)
• Step 2 – Establish an Architecture Vision
• Step 3 – Assess the Current State
• Step 4 – Establish Future State and Economic Model
• Step 5 – Develop a Strategic Roadmap
• Step 6 – Establish Governance over the Architecture
Establishing an Architecture Vision
The architecture development process needs to be more fluid than a traditional SDLC-style architecture process. It must allow organizations to continuously assess progress, correct course where needed, balance cost, and gain acceptance.
Advanced Analytics Capabilities
(Category / Capability / Items)
Organization and business impact
• Fast, informed decisions
    • Time from question to hypothesis to model implementation to informed decision
• Strategic and operational role
    • Degree of input into business/policy decisions
    • Perceived and quantified value of analytics
Analytics operationalization
• Model performance
    • Execution of experiments in parallel
    • Model performance for scoring and decisioning
• Model deployment
    • Continuous integration and deployment
Analytics creation
• Efficient model creation
    • Use of data mining and visualization tools
    • Rapidly spun-up environment customized to individual data scientists that enables execution of large data sets and highly mathematical algorithms
    • Collaboration among data scientists and between data scientists and lines of business; reuse of data sets and models
    • Model reproducibility (including versions, algorithms, data sets, parameters, notes, environment)
• Appropriate model selection
    • Understanding, and appropriate use, of model architecture and algorithms, feature engineering, hyperparameterization, statistical and mathematical concepts, training and validation, scoring, and decisioning
    • Use of ML and DL concepts, tools, and libraries
    • Use of graph systems
Data management
• Data capability
    • Infrastructure and tools to access and cleanse data
• Data knowledge and confidence
    • Understanding of, and confidence in, data (e.g. what is available, their relationships)
• Data access
    • Access to internal and external data through infrastructure, logical associations, and tools
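The "Model reproducibility (including versions, algorithms, data sets, parameters, notes, environment)" capability can be made concrete with a small record type. This is a sketch under assumptions, not a tool named in the talk: the class `ModelRunRecord`, the helper `dataset_digest`, and the choice to exclude free-text notes from the fingerprint are all hypothetical.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


def dataset_digest(rows):
    """Hash a data set (an iterable of JSON-serializable rows) so it can be referenced exactly."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


def _default_environment():
    # Minimal environment capture; a real record would also pin library versions.
    return {"python": platform.python_version(), "platform": sys.platform}


@dataclass(frozen=True)
class ModelRunRecord:
    model_version: str
    algorithm: str
    dataset_sha256: str
    parameters: dict
    notes: str = ""
    environment: dict = field(default_factory=_default_environment)

    def fingerprint(self) -> str:
        """Stable ID over everything that affects the result (notes are excluded)."""
        payload = asdict(self)
        payload.pop("notes")
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    digest = dataset_digest([{"x": 1, "y": 0}, {"x": 2, "y": 1}])
    run = ModelRunRecord("1.0", "logistic_regression", digest, {"C": 1.0}, notes="baseline")
    print(run.fingerprint())
```

Two runs with the same version, algorithm, data, parameters, and environment share a fingerprint, which gives collaborators a simple way to check whether a result was produced under the same conditions.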
[Architecture diagram]
• Structured data sources: CRM, ERP, Accounting
• Unstructured data sources: Clickstream, Sensor Info, Images/Video, Event Logs, Social Media
• Ingestion: ETL, real-time streaming
• RDBMS: Teradata, Operation
• Big Data: HDFS, NoSQL, Cloud Storage
• Consumers: Business Intelligence / Data Visualization, Advanced Analytics
• Tools: Library (ML and DL), Online ML, Machine Learning API
• Products named in the diagram: torch, AWS, Azure, Google Prediction, BigML, IBM Watson
Advanced Analytics Services
(Service Type / Services)
Overall assessment
• Advanced Analytics assessment
Architecture
• Architecture for data science
• Architecture for cloud analytics
ETL/ELT
• Data source identification and integration
• Data virtualization
• Data preparation
Data analysis and modeling (data science)
• Statistical / quantitative analysis
• Descriptive analysis
• Predictive modeling
• Machine learning
• Deep learning
• Graph systems
• Simulation and optimization
Visualization and insight presentation and recommendations
• Data exploration / mining / advanced visualization to understand the data
• Insight presentation and recommendations
Tools recommendation
• Infrastructure
• Software tools
• Software environment, programming, libraries
Process improvement
• Analytics process improvement
• Data governance
• Model governance
• Continuous integration and deployment of models
Organizational capabilities
• Advanced analytics organization structure and roles
• Advanced analytics training
• Advanced analytics staff augmentation
Best Practice
• Align Analytics with Specific Business Goals
• Ease Skills Shortage with Standards and Governance
• Optimize Knowledge Transfer with a Center of Excellence
• Top Payoff is Aligning Unstructured with Structured Data
• Plan Your Discovery Lab for Performance
• Align with the Cloud Operating Model