Improving Healthcare Outcomes with Deeper Insight from Anonymized Data
Posted on 01-Nov-2014
www.privacyanalytics.ca | 855.686.4781 | info@privacyanalytics.ca
251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6
WEBINAR: Improving Healthcare Outcomes with
Deeper Insight from Anonymized Data
How to Get More Granular Analysis
by Automating Statistical De-identification
© 2014 Privacy Analytics, Inc.
Presenters
Luk Arbuckle, Director of Analytics,
Privacy Analytics, Inc.
Chris Wright, Vice President, Marketing, and Today’s Moderator,
Privacy Analytics, Inc.
Grant Middleton, Solution Architect,
Privacy Analytics, Inc.
Agenda
1. Analytics Meets Privacy: Balancing Healthcare Imperatives with the Pressing Need to Know More
2. Incorporating Risk-based Approaches to Anonymization
3. Gaining Analytic Utility and Value from Anonymized Data Sets: A Case Study
4. Demonstrating the Application of Business Intelligence and Predictive Modelling to Anonymized Data
5. Summary
6. Question and Answer
Privacy Analytics
For organizations that want to safeguard and enable their data for secondary use …
• Software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information
• Integrated capabilities to anonymize structured and unstructured data from multiple sources
• Peer-reviewed methodologies and value-added services that certify data as de-identified using the expert statistical method under HIPAA
Creating Value with Analytics …
To control healthcare costs and drive greater efficiencies, organizations will need to
become more rigorous in their management, analysis and governance of data and
its privacy.
McKinsey & Company, “The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation,” January 2013
Which is Difficult in an Analytic Ecosystem
Source: Misha Pavel, Computer & Information Science and Engineering Directorate, Information & Intelligent Systems Division, National Science Foundation, webinar, January 2012
Reconciling Analytic Imperatives with Privacy
[Figure: analytic imperatives (Population Health, Comparative Benchmarking, Detecting Fraud, Monetizing Data, Accelerating Research, Post-marketing Surveillance, Marketing) weighed against privacy concerns (Regulation, Releasing Data, Compliance, Data Complexity, Re-identification Risk, Data Breach, Reputation, Ethics)]
Bridging Analytics and Privacy
Leveraging Data for Secondary Use
Secondary use of health data refers to uses outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing and other business applications.
Anonymizing Data for Analytic Utility
[Figure: analytic utility versus privacy governance, comparing current masking software with greater analytic utility]
To allow for richer analysis of anonymized data within the well-defined regulatory and internal privacy protocols of our customers …
1. Identify and classify variables in the data
2. Mask direct identifiers
3. Determine the threshold for de-identification
4. De-identify indirect identifiers
5. Reporting and certification
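Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Privacy Analytics' software: the field names and their classification are hypothetical, names are simply dropped, and the MRN is replaced by a keyed HMAC-SHA256 pseudonym so that the same patient maps to the same value across tables while the original cannot be recovered without the key. Indirect identifiers such as birth date and ZIP code pass through untouched, because steps 3 and 4 handle them statistically.

```python
import hashlib
import hmac

# Step 1: hypothetical classification of variables (a real data set needs its own review).
DROP = {"name"}          # direct identifiers removed outright
PSEUDONYMIZE = {"mrn"}   # direct identifiers replaced by a keyed pseudonym
# Quasi-identifiers (e.g. birth_date, zip_code) are left for steps 3 and 4.

def pseudonym(value: str, key: bytes) -> str:
    """Keyed hash: stable for the same input and key, not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]

def mask_direct_identifiers(record: dict, key: bytes) -> dict:
    """Step 2: mask direct identifiers; all other fields pass through unchanged."""
    masked = {}
    for field, value in record.items():
        if field in DROP:
            continue
        masked[field] = pseudonym(str(value), key) if field in PSEUDONYMIZE else value
    return masked

record = {"name": "Chris Wright", "mrn": "123", "birth_date": "1978-01-15",
          "zip_code": "12345", "income": 82000}
print(mask_direct_identifiers(record, key=b"hypothetical-secret-key"))
```

Using a keyed hash rather than a plain hash matters here: an unkeyed hash of a short MRN could be reversed by hashing every possible MRN.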
Gaining Richer Analytic Value
[Figure: a sample patient's data (for example Chris Wright, cwright@, MRN: 123, born Jan 15, 1978, ZIP code 12345, income $82,000, plan # 54678) drawn from primary structured and unstructured data, internal structured data, external structured data, and EMR data and notes at the last PCP visit, processed under the Safe Harbor method (data masking) and under Expert Determination (statistical de-identification). A second set of values also appears (Robert Wong, rwong@, MRN: 589, born Sept 18, 1978, ZIP code 12346, plan # 65123), and the last PCP visit is shown both as admission 08/05/2012 with discharge 08/07/2012 and as admission 08/08/2012 with discharge 08/10/2012.]
Statistical de-identification allows for richer data analysis
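One common way to shift dates without disturbing intervals is to move every date belonging to a patient by the same offset: for example, moving an admission on 08/05/2012 and a discharge on 08/07/2012 forward three days keeps the two-day stay intact. The sketch below assumes that scheme; the ±30-day window and the function names are hypothetical choices for illustration, not parameters taken from this webinar.

```python
import random
from datetime import date, timedelta

def apply_offset(dates: dict, offset_days: int) -> dict:
    """Shift every date belonging to one patient by the same number of days."""
    delta = timedelta(days=offset_days)
    return {name: value + delta for name, value in dates.items()}

def random_offset(rng: random.Random, max_days: int = 30) -> int:
    """Draw a per-patient offset; the +/-30-day window is an assumption."""
    return rng.randint(-max_days, max_days)

visit = {"admission": date(2012, 8, 5), "discharge": date(2012, 8, 7)}

shifted = apply_offset(visit, 3)
print(shifted["admission"], shifted["discharge"])  # 2012-08-08 2012-08-10

rng = random.Random(2014)
randomized = apply_offset(visit, random_offset(rng))
# The length of stay is preserved whichever offset was drawn
print((randomized["discharge"] - randomized["admission"]).days)  # 2
```

A shared per-patient offset hides the true calendar dates while keeping every interval exact; schemes that also randomize intervals give up a little interval accuracy in exchange for extra protection.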
Presenter
Luk Arbuckle, Director of Analytics,
Privacy Analytics, Inc.
Statistical De-identification Method
Managing the Risk of Re-identification
The de-identification process consists of three steps:
1. Set Risk Threshold: Based on the characteristics of the data recipient, the data, and precedents, a quantitative threshold is set.
2. Measure Risk: Based on plausible re-identification attacks, appropriate metrics are selected and used to measure the actual re-identification risk from the data.
3. Apply Transformations: If the measured risk does not meet the threshold, specific transformations (such as generalization and suppression) are applied to reduce the risk.
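The three-step cycle above can be sketched as a loop: set a threshold, measure the maximum risk, and generalize further (or, as a last resort, suppress records) until the threshold is met. This is a minimal sketch under simplifying assumptions: the only quasi-identifiers are birth year and ZIP code, risk is measured as 1 divided by the size of the smallest group of records sharing the same quasi-identifier values, and the only generalization is masking trailing ZIP digits. Real tools typically search over several generalization hierarchies at once.

```python
from collections import Counter

def generalize_zip(zip_code: str, level: int) -> str:
    """Replace the last `level` digits of a ZIP code with '*'."""
    if level == 0:
        return zip_code
    return zip_code[:-level] + "*" * level

def max_risk(rows) -> float:
    """Maximum re-identification risk: 1 / size of the smallest equivalence class."""
    return 1 / min(Counter(rows).values())

def deidentify(records, threshold: float, max_level: int = 5):
    """records: non-empty list of (birth_year, zip_code) tuples.

    Returns the transformed rows and the ZIP generalization level used."""
    for level in range(max_level + 1):
        rows = [(year, generalize_zip(zip_code, level)) for year, zip_code in records]
        if max_risk(rows) <= threshold:  # measured risk meets the threshold
            return rows, level
    # Generalization alone was not enough: suppress records in classes that are too small.
    sizes = Counter(rows)
    return [row for row in rows if 1 / sizes[row] <= threshold], max_level

# Threshold from a hypothetical minimum cell size of 3: risk must be <= 1/3
records = [(1978, "12345"), (1978, "12346"), (1978, "12347"),
           (1978, "12355"), (1978, "12356"), (1978, "12357")]
rows, level = deidentify(records, threshold=1 / 3)
print(level, rows[0])  # 1 (1978, '1234*')
```

This loop is deliberately naive: given a single outlying record, it masks every ZIP digit before suppressing that record, whereas suppressing it first would keep more detail. A real implementation would weigh generalization against suppression.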
Enabling Analytics to Use Anonymized Data
Managing the Risk of Re-identification is our Starting Point
We measure the risk of re-identification along a spectrum of identifiability that takes into account an individual’s data, the mitigating controls that protect it, and how the data will be used and governed for secondary purposes.
[Figure: spectrum of identifiability running through Individual, Individual’s Data, Mitigating Controls and Analytic Purpose, from Protect Individual Privacy to Gain Analytic Value]
Re-identification Risk: Example
[Figure: example table with callouts marking groups of three rows that match on two quasi-identifiers]
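The risk in an example like this can be computed by grouping records on their quasi-identifiers. Assuming an attacker who knows the target is in the data set, each record's re-identification probability is 1 divided by the size of its group, so three rows that match on two quasi-identifiers each carry a risk of 1/3. The sketch below uses hypothetical quasi-identifiers (sex and birth year) and reports both the maximum and the average risk across records.

```python
from collections import Counter

def record_risks(records, quasi_identifiers):
    """Per-record risk: 1 / number of records sharing the same quasi-identifier values."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    counts = Counter(keys)
    return [1 / counts[key] for key in keys]

def summarize_risk(records, quasi_identifiers):
    """Return (maximum risk, average risk) over all records."""
    risks = record_risks(records, quasi_identifiers)
    return max(risks), sum(risks) / len(risks)

data = [
    {"sex": "M", "birth_year": 1978, "diagnosis": "asthma"},
    {"sex": "M", "birth_year": 1978, "diagnosis": "diabetes"},
    {"sex": "M", "birth_year": 1978, "diagnosis": "flu"},
    {"sex": "F", "birth_year": 1980, "diagnosis": "flu"},
]
max_risk, avg_risk = summarize_risk(data, ["sex", "birth_year"])
print(f"max={max_risk:.2f} avg={avg_risk:.2f}")  # max=1.00 avg=0.50
```

The three matching rows sit at 1/3 each, but the single record that is unique on these two quasi-identifiers drives the maximum risk to 1.0, which is why the maximum and the average can lead to very different conclusions.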
Identifiability Spectrum and Secondary Use
Range of Operational Precedents
Re-identification risk thresholds are based on established precedents used by leading research organizations. These thresholds depend on the situational context and the mitigating controls associated with a data set’s use for secondary purposes. Data is anonymized based on whether records can be matched on their indirect identifiers within a given cell size, that is, whether each combination of indirect identifier values is shared by at least that many records.
[Figure: spectrum from identifiable information to de-identified information, marking operational precedents for minimum cell size of 2, 3, 5, 8, 10, 11, 16 and 20]
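A minimum cell size translates directly into a probability threshold: under the simple matching model used in the risk example, if every combination of indirect identifiers must be shared by at least k records, no record's re-identification probability can exceed 1/k. The sketch below shows that mapping for several of the cell sizes on the spectrum; the function names are hypothetical.

```python
def risk_threshold(min_cell_size: int) -> float:
    """Maximum per-record re-identification probability implied by a minimum cell size."""
    if min_cell_size < 1:
        raise ValueError("minimum cell size must be at least 1")
    return 1 / min_cell_size

def meets_cell_size(class_sizes, min_cell_size: int) -> bool:
    """True if every equivalence class has at least `min_cell_size` records."""
    return min(class_sizes) >= min_cell_size

# Larger cell sizes mean lower thresholds, i.e. more de-identification is required
for k in (2, 3, 5, 10, 20):
    print(f"cell size {k:>2}: maximum risk {risk_threshold(k):.3f}")
```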
Identifiability Spectrum and Secondary Use
Range of Operational Precedents
Re-identification risk thresholds are based on established precedents used by leading research organizations, depending on how they assess the risk of disclosure. As such, these organizations use a wide variety of operational precedents to trigger the application of anonymization techniques. What we’ve done is capture and automate them.
[Figure: cell-size precedents of 2, 3, 5, 8, 10, 11, 16 and 20, ranging from little de-identification to significant de-identification]
Measuring Re-identification Risk
Case Study: EMR Software Vendor
Post-marketing and Public Health Surveillance
• Wanted to anonymize data on 535,595 patients from general practices
• Longitudinal data needed to be used for ongoing and on-demand analytics
Challenges:
• A large and complex data set, holding more than five years of clinical, prescription, laboratory, scheduling and billing data on patients
• Numerous release requests from 2,664 clinics and 5,850 physicians
• Data complexity: 820 columns across 73 tables
Analytic Outcomes:
De-identified data to analyze:
• Post-marketing surveillance of adverse events
• Public health surveillance
• Prescription pattern analysis
• Health services analysis
Assessing Analytic Value – Date Shifting
We compared length of service (LOS) in this EMR data before and after date shifting was performed, to examine whether the date shifting associated with anonymization lengthens or shortens the LOS for patients.
Source: Anonymizing Health Data, Chapter 13, De-identification and Data Quality: A Clinical Data Warehouse
What We Discovered …
Source: Anonymizing Health Data, Chapter 13, De-identification and Data Quality: A Clinical Data Warehouse
Length of service (LOS) was essentially the same before and after statistical de-identification: the mean difference in LOS before and after date shifting is 0.2 days. For the 90% of patients shown in this diagram, the expected LOS follows a normal distribution, which makes date shifting seem like a perfectly natural process.
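A before-and-after check of this kind can be sketched on synthetic data. The shifting scheme below is an assumption for illustration, not the method used in the case study: each patient gets a random offset of up to 30 days, and each date also gets its own jitter of up to one day, so individual stays can change slightly while the average should not. The evaluation is statistical rather than deterministic: it compares the two LOS distributions (mean difference and share of unchanged stays) instead of expecting every record to match.

```python
import random
from datetime import date, timedelta
from statistics import mean

def shift_visit(admit: date, discharge: date, rng: random.Random,
                max_offset: int = 30, jitter: int = 1):
    """Hypothetical randomized date shift: shared per-patient offset plus per-date jitter."""
    offset = rng.randint(-max_offset, max_offset)
    new_admit = admit + timedelta(days=offset + rng.randint(-jitter, jitter))
    new_discharge = discharge + timedelta(days=offset + rng.randint(-jitter, jitter))
    return new_admit, max(new_admit, new_discharge)  # never discharge before admission

rng = random.Random(42)
visits = []
for _ in range(1000):  # synthetic visits, not real patient data
    admit = date(2012, 1, 1) + timedelta(days=rng.randint(0, 365))
    visits.append((admit, admit + timedelta(days=rng.randint(1, 14))))

los_before = [(d - a).days for a, d in visits]
los_after = [(d - a).days for a, d in (shift_visit(a, d, rng) for a, d in visits)]
differences = [post - pre for pre, post in zip(los_before, los_after)]

print(f"mean LOS difference: {mean(differences):+.2f} days")
print(f"stays unchanged: {sum(x == 0 for x in differences) / len(differences):.0%}")
```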
Summary
• Most analyses performed on clinical databases use descriptive statistics and cross-tabulations
• Anonymization meets the requirements of these techniques while maintaining the essential analytic utility of the original data
• Data evaluation should be statistical as opposed to deterministic, comparing the data set before and after anonymization
• In short, anonymization allowed this vendor to fully leverage their data for secondary purposes, all within a reasonable range of optimal utility and value
Presenter
Grant Middleton, Solution Architect,
Privacy Analytics, Inc.
PARAT
Providing organizations with a scalable set of capabilities to automate
the anonymization of structured data
• Evaluate data quality for analysis after de-identification
• Simulate attacks to determine levels of risk associated with the re-identification of personal information
• Configure re-identification risk threshold settings directly from Privacy Analytics’ online Risk Assessment application
• Determine enterprise policies for data sharing using risk-based methodologies for assessing re-identification
• Automate data sharing agreements and certifications that confirm risks are “very small” for re-identification
Stronger Safeguards. Richer Analysis. Integrated Solution.
PARAT + BI
Presenter
Luk Arbuckle, Director of Analytics,
Privacy Analytics, Inc.
PARAT + Predictive
Predictive Modelling and Anonymized Data
Before anonymization
A simple model for a complex problem—predicting the
number of days until the next visit. With more variables, and
more data prep, you can get a much more accurate model.
Predictive Modelling and Anonymized Data
After anonymization
But the point is that the results are almost identical. Date
shifting, with randomized intervals, allows us to develop
predictive models that give us the same answers.
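The claim that date shifting with randomized intervals leaves a model's answers essentially unchanged can be illustrated with a deliberately simple sketch. Everything here is hypothetical: the synthetic outcome (days until the next visit), the binary chronic-condition predictor and its made-up effect of 10 days, and the jitter of up to one day on each visit date. The model is an ordinary least-squares slope, standing in for whatever model was demonstrated in the webinar.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope for a single predictor."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def jitter_interval(days: int, rng: random.Random, jitter: int = 1) -> int:
    """Hypothetical randomized date shift: both visit dates share a per-patient offset
    (which cancels in the interval) and each gets its own jitter of up to +/- `jitter` days."""
    return days + rng.randint(-jitter, jitter) - rng.randint(-jitter, jitter)

rng = random.Random(7)
chronic = [rng.randint(0, 1) for _ in range(2000)]
# Synthetic outcome: chronic patients return about 10 days sooner (made-up effect)
days_to_next = [40 - 10 * c + rng.randint(-5, 5) for c in chronic]
shifted = [jitter_interval(d, rng) for d in days_to_next]

before = ols_slope(chronic, days_to_next)
after = ols_slope(chronic, shifted)
print(f"effect before anonymization: {before:.2f} days, after: {after:.2f} days")
```

Because the jitter is independent of the predictor, it adds a little noise to the outcome without biasing the estimated effect, which is the property that lets the before and after models agree.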
Predictive Modelling and Anonymized Data
• Predictive modelling, done right, is challenging. You need a rich source of data to begin with. Then you need to clean and format it, so that you have quality data to work with.
• Anonymization, done right, can provide you with a rich source of data. The data cleaning and formatting will still be there, but no more than before.
• Predictive modelling can produce the same results, before and after anonymization. Put the time and effort into anonymization so you have quality data to work with.
Balancing Privacy with Data Utility
The Analytic Benefits of a Statistical De-identification Method
1. Data Quality: Ensuring de-identified data has analytic usefulness by minimizing the amount of distortion, while still ensuring that re-identification risk is very small
2. Analytic Granularity: Allowing users to configure the extent of de-identification to match the characteristics of the anticipated analysis
3. Depth of Insight: Enabling analysis of the total patient health experience, compiling a complete picture of this experience from multiple data sources and types
Upcoming Events
• April 16-17, Healthcare Business Intelligence Forum,
Washington, D.C.
• April 23, Noon EST, The Second Part of Webinar Series: Fear and Loathing Data Monetization
• May 21-22, e-Health Initiative, Washington, D.C.
• Take the Anonymization Survey: http://surveys.ronin.com/wix/p1834200753.aspx?src=1
Resources
If you’d like to learn more, we’re offering, free of charge, chapter 13 of Anonymizing Health Data, which provides greater detail on the case study presented today.
• Learn proven methods for anonymizing health data to share meaningful data sets without exposing patient identity
• Leading experts walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of data sets
• Drop me a line if you’d like a copy of the chapter: cwright@privacyanalytics.ca
Also, contact me to learn more. We can set up a personalized demo or have a discussion about your current anonymization needs. Just drop me a line.