17.10.19
Modernising Data Architecture for AI
Jon Teo
Solution Specialist, Informatica
2 © Informatica. Proprietary and Confidential.
Objectives:
1. Importance of Data in Healthcare AI
2. Data Challenges in AI & Analytics
3. Modern Data Management Architecture
Data Needs AI
AI Needs Data
Some Working Definitions Before We Begin
digitalready.co
Health AI State of Play
Percentage (%) of healthcare professionals using Digital Health technology in their practice:
n = 3194
Percentage (%) of healthcare professionals comfortable using AI for:
Myriad Healthcare Applications for AI
Health Consumer
Institutions & Providers
Research & Discovery
Some Examples of AI Applications in:
Population Management
Medicare Beneficiaries Leakage
Personalised Medicine
Customised Radiotherapy
Clinical Research
Clinical Trial Candidate Screening
PED Admissions Recruitment
The “Big Data” Vision for Precision Healthcare
[Pan-omics + SDoH + Global Evidence Base] + Deep Learning = Precision Healthcare
The Tapestry of Potentially High-Value Information Sources
That May be Linked to an Individual for Use in Health Care
JAMA. 2014
Deep Medicine: How Artificial Intelligence Can Make
Healthcare Human Again, 1st Ed. Topol, Eric. 2018
Data Challenges in Health AI & Analytics
UBER STRATA SJO 2017
IDEATION EXPLORATION PREPARATION ANALYSIS BUILDING SHARING
SATISFACTION
Common Data Challenges in Analytics and AI Programmes
1. Data Discovery & Access
2. Data Quality & Preparation
3. Protection & Permission
4. Explainable AI
1. Discovery & Access to Data
Challenges for Data Scientists and Analytics teams:
• Large numbers of systems & data sources.
• Hybrid data sources, both in-house and in-cloud.
• “Data Swamps” in existing repositories.
• Reliance on IT to process access to data in a timely manner.
• Addition of new data sources requires lengthy system integration projects.
2. Data Preparation & Data Quality Needs
Developing AI capabilities requires both quantity and quality in data
Image/Pattern Recognition
• Source of Data
• Image Pre-processing
• Tagging
Speech, Voice, NLP, Free-Text
• Ontology Management
• Regional Localisation
• Entity Extraction
Relationship Discovery
• Intelligent Matching
• Ontology Management
• Graph structuring
3. Data Protection & Permissioning
Human Biomedical Research Act
Increased regulations, and expected accountability for Data Use
Top reasons individuals would use digital health technology:
Assurance health data is secure
Sources of Data Protection Friction:
• Data Protection & Privacy Impact Assessments
• Approval processes
• Org behaviors & processes
• Appropriate data protection mechanisms
csoonline.com
4. The need for Explainable AI
AI applied in healthcare settings will likely face challenges if it is considered a “Black-Box”:
1. High-impact / high-cost decisions
2. When unexpected recommendations are made
3. For post-hoc reviews of performance & incidents encountered
4. Concerns about data-driven bias
5. Detecting ML ‘cheating’
Pacemaker
Regulatory Approval
User Adoption
Patient Outcomes
Programme Trust
Modern Data Management Architecture
3 Major Enablers for Modern Data Management
Flexible Data Platform
Collaborative Data Governance
AI-Assisted Data Management
Flexible Data Platform Architecture
1. Modular
2. Scalable
3. Logical Data Warehouse & Data Lake Architecture
4. Metadata-Driven
5. Enterprise Data Management Capabilities
Data Democratization
Real-Time Operational Analytics
Pervasive Analytics & AI
IoT, Machine Data, & Streaming Analytics
Platform Approach for Flexible Data Operations
Data Management
Platform
Business Operations
Business Process
Management
Next-Best
Recommendations
Digital Commerce
Automation
Predictive Analytics
Cognitive Analytics Prescriptive Analytics Descriptive Analytics
Self-service Analytics Streaming Analytics
Analytics
Outbound
Touch Points
Communities
Social
Web
Mobile / Text
Mobile Apps
Touch Point Routing
Inbound
Touch Points
Professional Services
Assisted Interaction
Clinical support
Admin Staff
Customer Service Rep
Email Web
IoT
Mobile Social
Health Consumer Data
Knowledge Base
Forums
Downloads
Unassisted Interaction
Databases
Application Servers
Documents
Mainframe
Operational
Data
External Data
Partner Data
SaaS
Big Data
Machine Data
IoT
Cloud
Clustering Algorithms Learning Algorithms
Natural Language Processing
AI
Recommendations
Categorization Classification
Architect Citizen Integrators IT Specialist Data Scientist Data Analyst Application Developers
Modular Platform for Flexible Data Operations
AI
Inbound
Touch Points
Professional Services
Assisted Interaction
Customer support
Sales Rep
Customer Service Rep
Email Web
IoT
Mobile Social
Consumer Interaction Data
Knowledge Base
Forums
Downloads
Unassisted Interaction
Architect Citizen Integrators IT Specialist Data Scientist Application Developers
Business Operations
Business Process
Management
Next-Best
Recommendations
Digital Commerce
Automation
Predictive Analytics
Cognitive Analytics Prescriptive Analytics Descriptive Analytics
Self-service Analytics Streaming Analytics
Analytics
Outbound
Touch Points
Communities
Social
Web
Mobile / Text
Mobile Apps
Touch Point Routing
Databases
Application Servers
Documents
Mainframe
Operational
Data
External Data
Partner Data
SaaS
Big Data
Machine Data
IoT
Cloud
Clustering Algorithms Learning Algorithms
Natural Language Processing
Recommendations
Categorization Classification
Integration Platform
Data Management Platform
Deployment (cloud, on-premises)
Connectivity
Monitor and Manage
Multi-latency Ingestion, API & Integration Patterns
Metadata Foundation
Master Data Management
360 Insights
Data Quality, Data Governance & Data Privacy
Data Discovery & Cataloging
AI-enabled Automation
Data Analyst
Collaborative Data Governance
Unlocking Value of Data
Traditional Data Governance
• Manual Effort
• Policy – Implementation Gap
• Top-Down & Siloed
Compliance and Risk
• Collaborative Effort: Democratised DG with proper stakeholder interest alignment
• Integrated View: Connecting data & business via multi-dimensional viewpoints
• Automated & Scalable: Feasible to manage ‘4V’ explosion of data
Evolution of Data Governance Practice
“Governance without Implementation is just Documentation”
Policy, Direction & Definition
Risk Management
Compliance & Regulatory
Data Ownership
Data & Digital Strategy
Data Universe
Applications, Data Stores, Cloud, EUC
Operational Implementation
Controls and Measurement
Master & Reference Data Management
Democratised Data Access
Data Lifecycle Management
Data Protection
Enterprise Data Catalog
Data Health Management
Privacy & Security Analytics
CPO / CDO
Governance Office
Data Owner
Data Steward
Data Users
Data Architect
IT Team
“Data Governance is a Team Sport”
Evolution of Data Governance Practice
Governance Outcomes achieved in a cohesive, efficient manner
Policy, Direction & Definition
Risk Management
Compliance & Regulatory
Data Ownership
Data & Digital Strategy
Operational Implementation
Controls and Measurement
Master & Reference Data Management
Democratised Data Access
Data Lifecycle Management
Data Protection
Enterprise Data Catalog
Data Health Management
Privacy & Security Analytics
Data Universe
Applications, Data Stores, Cloud, EUC
AI-Augmented Governance Community
Enterprise Data Visibility
• Business Definitions
• Physical Discovery
Sustainable Data Quality
• Detect & Design
• Implement & Measure
Privacy & Compliance
• Define Policy & Protections
• Enforce & Report
Example of DG Capability - Data Lineage
Business Data
Logical View
Policy Owners
Physical Data Resources
Data Steward
Data Engineers
Document / Enforce
Data Analysts
Validate / Discover
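One way to picture the lineage capability on this slide is as a directed graph that links a business-facing asset back to the physical resources it depends on. The sketch below is an illustration under assumptions: the asset names and the `LINEAGE` mapping are hypothetical, and a catalogue tool would normally harvest these links from metadata rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it is derived from.
LINEAGE = {
    "report.readmission_kpi": ["warehouse.admissions_fact"],
    "warehouse.admissions_fact": ["staging.admissions", "staging.patients"],
    "staging.admissions": ["emr.admissions_table"],
    "staging.patients": ["emr.patient_table"],
}

def upstream_sources(asset, lineage):
    """Return every upstream asset that feeds `asset`, in breadth-first order."""
    seen, order = set(), []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an asset can be reachable through several paths
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order

print(upstream_sources("report.readmission_kpi", LINEAGE))
```

A data analyst could use such a traversal to validate where a figure comes from, while a data steward could use the same graph to document which physical tables a policy applies to.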
Data Governance helps “Context-explainable” AI
towardsdatascience.com
A working approach to improve AI explainability:
1. Consider the whole AI development chain. How much do we trust all the components that went into developing the model?
2. Plan for “Explainability by Design” throughout the development process.
Collaborative Practices for “Context-Validated” AI
For AI Assets: ‘Data + Model + Context’.
• Provide traceability of the AI asset throughout its lifecycle
• Leverage multiple stakeholders in collaborative production of AI asset to provide end-to-end traceability & oversight
Example of Context-Validated AI Development:
1. Model – Logical representation of how data is processed to produce a prediction; may include processing types, staging, weights, etc.
2. Model Code – Algorithm that processes data consistently with the model to produce a prediction
3. Data – Inputs used to produce the prediction
4. KPI – Quantification of the predictive outcome
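The four elements above can be captured together in one traceable record. The following is a minimal sketch, assuming a simple content-hash approach; the class `AIAssetRecord`, the helper `register` and all sample values are hypothetical and invented for illustration, not part of any Informatica product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

def fingerprint(payload: bytes) -> str:
    """Content hash, so any change to code or data is detectable."""
    return hashlib.sha256(payload).hexdigest()

@dataclass(frozen=True)
class AIAssetRecord:
    model_description: str   # 1. Model: logical description
    model_code_hash: str     # 2. Model code: hash of the implementation
    data_hash: str           # 3. Data: hash of the inputs used
    kpi_name: str            # 4. KPI: what was measured
    kpi_value: float
    reviewers: tuple = ()    # context: stakeholders who reviewed the asset

    def record_id(self) -> str:
        """Stable identifier derived from every field of the record."""
        body = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return fingerprint(body)

def register(description, code, data, kpi_name, kpi_value, reviewers):
    return AIAssetRecord(description, fingerprint(code), fingerprint(data),
                         kpi_name, kpi_value, tuple(reviewers))

# Made-up example values, for illustration only.
code = b"def predict(x): ..."
v1 = register("readmission risk model", code, b"id,age\n1,70\n",
              "AUC", 0.81, ["data scientist", "data steward"])
v2 = register("readmission risk model", code, b"id,age\n1,71\n",
              "AUC", 0.81, ["data scientist", "data steward"])
print(v1.record_id() != v2.record_id())  # the data changed, so the id changes
```

Because the identifier is derived from the model description, code, data and KPI together, a reviewer can tell whether a reported KPI still corresponds to the exact data and code that produced it.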
Simple AI development pipeline
Data Scientist
LoB Executive
Citizen Analyst
Intuition
Report
KPI (Predictive) ?
Model + Data ?
Collaboration is Key to Explainable AI
Producer / Consumer
Technical / Business
Data Scientist
Data Engineer
Data Steward
LoB Executive
Citizen Analyst
Intuition
KPI (Predictive)
Model Code (Test)
Data (Training)
Logical Model
Data Dictionary (Production)
Data Quality & Lineage (Production)
Report
AI-assisted Data Management
Automation
• Discovery
• Next-best Actions
• Platform Scaling
Engagement
• Intuitive UX
• Natural Language DQ
• Increase collaboration
Insight
• Patterns
• Sense-making
• Platform Management
AI
Explosion in Volume
Data Controls
New Data Types & Sources
Example 1: Intelligent Structure Discovery
Examples: clickstreams, log files, IoT data, txt, csv, Excel, PDF, Word, etc.
• AI can automatically discover the structure in the data.
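As a much simpler stand-in for the idea of discovering structure automatically, the sketch below uses Python's standard `csv.Sniffer` to guess the delimiter of a text sample whose layout is not known in advance. It is a rule-based heuristic rather than a learned model, it is not the product feature named on this slide, and the sample data is made up.

```python
import csv
import io

# Hypothetical sample: a delimited extract whose layout is not known in advance.
SAMPLE = "patient_id;visit_date;ward\n1001;2019-01-01;A3\n1002;2019-01-02;B1\n"

def discover_structure(text):
    """Guess the delimiter, then return it with the column names and rows."""
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows[0], rows[1:]

delimiter, columns, records = discover_structure(SAMPLE)
print(delimiter, columns, len(records))
```

The first row is treated as a header here by assumption; a fuller approach would also infer data types and nested structure.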
Example 2: Discover & Catalogue Data Entities
Data Problem
“Real-world” Data Catalogue
• Relating physical data to Business Glossary entities is laborious, confusing and not sustainable.
• E.g. address and customer details may be normalized, column names may be cryptic, etc.
AI for Data Cataloguing
Like photo tagging for data
• Unsupervised learning techniques to cluster & classify similar data types.
• Learns associations of user-tagged data types to tag similar concepts across the Enterprise.
• Learns concept hierarchies to derive composite business entities across the Enterprise.
• Semantic search of Enterprise catalogue with AI-led recommendations
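The "photo tagging for data" idea can be illustrated with a deliberately simple heuristic: if an untagged column shares most of its values with a column a user has already tagged, suggest the same tag. This is a sketch under assumptions, using Jaccard overlap in place of the learning techniques listed above; the column names, values, tags and threshold are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two collections of values, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical column samples from different systems.
columns = {
    "crm.cust_postcode": ["2000", "3000", "4000", "2010"],
    "billing.zip": ["2000", "3000", "4000", "6000"],
    "hr.emp_grade": ["A", "B", "C"],
}
# A tag supplied by a user for one column.
user_tags = {"crm.cust_postcode": "Postal Code"}

def propagate_tags(columns, user_tags, threshold=0.5):
    """Suggest a user's tag for untagged columns with similar values."""
    suggestions = {}
    for name, values in columns.items():
        if name in user_tags:
            continue
        for tagged_name, tag in user_tags.items():
            if jaccard(values, columns[tagged_name]) >= threshold:
                suggestions[name] = tag
    return suggestions

print(propagate_tags(columns, user_tags))
```

Here `billing.zip` shares three of five distinct values with the tagged column, so it receives the suggestion, while `hr.emp_grade` shares none and is left untagged.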
Example 3: Handling Data Drift
Data Problem
• Data sources and resources can change ‘unannounced’. Traditional data mapping is brittle.
• Data Drift can happen for formats, structure or meaning.
Same semantics, format change: 01/01/2019, 01-01-2019 and Jan-01-2019
Structural changes within a file: some records contain 10 fields, others contain 8
AI for Data Drift
• Runtime processing can gracefully overcome noise and changes in incoming data.
• Unexpected data can be captured and processed.
Original Log
New version Log
• New fields that are not in the model are mapped to unassigned ports
• New date format is handled correctly
• Added spaces are handled correctly
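The drift cases on this slide (a changed date format, added spaces, missing fields and new fields) can be handled tolerantly with a few lines of rule-based code. This is a minimal sketch, assuming a three-field model; the field names and sample records are hypothetical, and it is a hand-written fallback rather than the AI-driven approach the slide describes.

```python
from datetime import datetime

# Assumed target model; the field names are hypothetical.
MODEL_FIELDS = ("timestamp", "user", "action")
# Formats from the slide's example. Day-first order is an assumption, and
# "%b" matches English month names only under an English/C locale.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%b-%d-%Y")

def parse_date(text):
    """Try each known format in turn; return None if none matches."""
    cleaned = text.strip()  # tolerate added spaces
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None

def map_record(record):
    """Split a raw record into modelled fields and unassigned extras."""
    mapped = {name: None for name in MODEL_FIELDS}  # missing fields stay None
    unassigned = {}
    for key, value in record.items():
        name, cleaned = key.strip(), value.strip()
        if name == "timestamp":
            mapped[name] = parse_date(cleaned)
        elif name in MODEL_FIELDS:
            mapped[name] = cleaned
        else:
            unassigned[name] = cleaned  # new field that is not in the model
    return mapped, unassigned

original = {"timestamp": "01/01/2019", "user": "amy", "action": "login"}
drifted = {"timestamp": " Jan-01-2019", "user": "amy ", "device": "tablet"}
print(map_record(original))
print(map_record(drifted))
```

The drifted record still yields the same date and user, its missing `action` is left empty, and the unexpected `device` field is captured separately instead of breaking the mapping.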
Many other AI applications for Data Management
Problem Definition
Explore Data
Prepare Data
Build & Test Model
Deploy Model
Output Results (& Refine Model)
Data Relationship Inference
Business Term Associations
Dataset Similarity
Entity Extraction
Data Domain Inference
Column Similarity
Data Discovery & Access
Business Rules Translation
Entity Matching
Business Rule Associations
Mass Data Correction
Natural Language Description of Code
Data Quality & Data Preparation
Schema Inference
Protection & Permission
Data Anomaly Detection
Self Secure
Operational Anomaly Detection
Cost of Data Breach
Data Pipeline Management
Self Healing Processing
Self Tuning Processing
Smart Data Visualization
Schedule Optimization
Security Analyst, DPO
Data Scientist, Data Steward
Data Scientist, Data Steward
Data Engineer
Data Engineer
Putting it All Together
Modernised Data Management powers future Analytics & AI development
Flexible Data Platform Architecture
Evolved Data Governance Practices
AI/ML Enablers
-80% reduction in data quality issues
Retailer, Australia
-50% workload for data stewards
-3 man-months of discovery effort
Health Provider, USA
Distributor, USA
The Intelligent Data Platform
PRODUCTS
SOLUTIONS
MULTI-CLOUD
REAL TIME/STREAMING
BIG DATA
TRADITIONAL
MONITOR AND MANAGE
DATA ENGINE
CONNECTIVITY
DATA QUALITY & GOVERNANCE
MASTER DATA MANAGEMENT
BIG DATA MANAGEMENT
ENTERPRISE DATA CATALOG
DATA SECURITY
DATA INTEGRATION
iPaaS
PRODUCT 360
SUPPLIER 360
CUSTOMER 360
REFERENCE 360
SECURE@SOURCE
ENTERPRISE DATA PREPARATION
ENTERPRISE DATA GOVERNANCE
CUSTOMER 360 INSIGHTS
Enhance Patient Lives with Data-Driven Decisions
• A Sanofi company, dedicated to transforming the lives of people with hemophilia and other rare blood disorders through world-class research, development, and commercialization of innovative therapies
• Needed a modern hybrid data architecture to easily integrate and synchronize data from multiple hybrid sources, such as Salesforce and Veeva CRM, into Azure SQL DW
• Built a scalable solution (i.e., hardware and storage) leveraging Informatica Intelligent Cloud Services for data integration and management and Azure as the platform
• Gained faster time-to-insights and made better, faster, data-driven business decisions to respond quickly and reach more patients
Bioverativ’s Cloud Journey
2017: Customer Relationship Management Rollout
• Salesforce & Veeva CRM rollout
• New digital transformation requirements arise that include additional reporting requirements
• Additional patient and data integration requirements increase
2018: Informatica Cloud Data Integration and Azure Rollout
• Implemented CRM Analytics CDW on Azure DWH
• Informatica Intelligent Cloud Services for data integration needs, which included support for vendor & external partner data systems
• Managed hybrid data sources and added new CRM services: Service Cloud, Veeva CRM, Veeva Vault
2019+: Integration with Sanofi & Cloud Data Lakes
• Integrate with Sanofi data warehouse infrastructure
• Implement data catalog, data lakes
• Implement predictive analytics to support data scientists
“Big Data” in Healthcare is about Quantity and Quality
Supervised training needs labelled data.
1.4M hand-labelled images
878% growth in global health data since 2016
800M medical scans per annum (US)
8.41 petabytes average data generated per organisation
Cloud Data Warehouse Modernization Blueprint
ENTERPRISE DATA CATALOG
DATA QUALITY & GOVERNANCE
DATA PRIVACY & PROTECTION
Visualization
Business Intelligence
Machine Learning
Elastic Compute
Cloud Object Store
API & Application Integration
Streaming Ingestion
Common Enterprise Metadata
AI/ML Engine
Cloud Data Integration
Databases
Application Servers
Mainframe
On-Premises
SaaS
Logs
Machine Data
Connected Devices
Edge
Replication & Mass Ingestion
Cloud Data Integration
Cloud Data Warehouses
Cloud Data Integration
Cloud Data Lake
Cloud
Intelligent Data Catalog
Machine
Humans
Metadata Collection
Data Catalog Use Cases
• Data Asset Management
• Data Governance
• Self-Service Analytics
Data Analyst, Data Engineer, Data Architect, Data Scientist, Data Steward
AI Curated Catalog
• Knowledge Graph
• Structure Discovery
• Profile and Domain Discovery
• Recommendations
• Similarity
• Clustering
Business & Crowd Sourced Curation
• Wisdom of Crowd
• Annotations
• Comments
• Ratings
• Business Classifications
• Business Glossary Associations
Business Context
• Glossary
• Process
• Policies
Broad Metadata Source
• Databases
• Documents
• Mainframe
• Cloud Data Warehouse
• Application Servers
• ETL Tools
• Other Metadata Tools
• Business Intelligence