www.privacyanalytics.ca | 855.686.4781 [email protected] 251 Laurier Avenue, Suite 200 Ottawa, Ontario, Canada K1P 5J6 WEBINAR: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data How to Get More Granular Analysis by Automating Statistical De-identification

Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Nov 01, 2014

How to Get More Granular Data Analysis by Automating Statistical De-identification
(View webinar by visiting this link: https://vimeo.com/90976799)

Leveraging healthcare data for secondary use can have a positive impact on the efficacy of healthcare service delivery, patient care and population health analysis. And yet, traditional approaches to anonymization limit the ability of statisticians and analysts to gain more granular insight into data sets used for secondary purposes.

Part one of this webinar series examines how organizations can automate statistical de-identification to optimize the application of business intelligence and advanced analytics, enabling more granular-level analysis of anonymized data sets.

Privacy, compliance and data analytics professionals will learn how their organizations can:

Apply risk-based approaches to anonymize data based on situational and governing principles of its use;
Automate statistical de-identification of data sets to maximize the application of business intelligence and predictive software insights; and,
Understand how other organizations have applied statistical de-identification to data with real-world examples and demos.
Transcript
Page 1: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

www.privacyanalytics.ca | [email protected]

251 Laurier Avenue, Suite 200, Ottawa, Ontario, Canada K1P 5J6

WEBINAR: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

How to Get More Granular Analysis by Automating Statistical De-identification

Page 2: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

© 2014 Privacy Analytics, Inc.

Presenters

Luk Arbuckle, Director of Analytics, Privacy Analytics, Inc.

Chris Wright, Vice President, Marketing and Today’s Moderator, Privacy Analytics, Inc.

Grant Middleton, Solution Architect, Privacy Analytics, Inc.

Page 3: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Agenda

1. Analytics Meets Privacy: Balancing Healthcare Imperatives with the Pressing Need to Know More

2. Incorporating Risk-based Approaches to Anonymization

3. Gaining Analytic Utility and Value from Anonymized Data Sets: A Case Study

4. Demonstrating the Application of Business Intelligence and Predictive Modelling to Anonymized Data

5. Summary

6. Question and Answer

Page 4: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Privacy Analytics

For organizations that want to safeguard and enable their data for secondary use …

• Software that automates the de-identification and masking of data using a risk-based approach to anonymize personal information

• Integrated capabilities to anonymize structured and unstructured data from multiple sources

• Peer-reviewed methodologies and value-added services that certify data as de-identified using the expert statistical method under HIPAA

Page 5: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Creating Value with Analytics …

To control healthcare costs and drive greater efficiencies, organizations will need to become more rigorous in their management, analysis and governance of data and its privacy.

McKinsey & Company, “The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation,” January 2013

Page 6: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Which is Difficult in an Analytic Ecosystem

4. Sourced: Misha Paval, Computer & Information Science and Engineering Directorate Information & Intelligent Systems Division, National Science Division, Webinar, January 2012

Page 7: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Reconciling Analytic Imperatives with Privacy

[Diagram labels]

• Population Health
• Regulation
• Comparative Benchmarking
• Releasing Data
• Detecting Fraud
• Monetizing Data
• Compliance
• Accelerating Research
• Data Complexity
• Re-identification Risk
• Post-marketing surveillance
• Data Breach
• Marketing
• Reputation
• Ethics

Page 8: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Bridging Analytics and Privacy

Leveraging Data for Secondary Use

Secondary use of health data applies outside of direct health care delivery. It includes such activities as analysis, research, quality and safety measurement, public health, payment, provider certification or accreditation, marketing and other business applications.

Page 9: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Anonymizing Data for Analytic Utility

[Figure: axes labelled Analytic Utility and Privacy Governance; regions labelled Current Masking Software and Greater Analytic Utility]

To allow for richer analysis of anonymized data within well defined regulatory and internal privacy protocols of our customers …

1. Identify and classify variables in the data

2. Mask direct identifiers

3. Determine the threshold for de-identification

4. De-identify indirect identifiers

5. Reporting and certification
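
As a rough illustration of the first two steps on this slide (classifying variables, then masking direct identifiers), here is a minimal Python sketch. The field names, the classification table and the keyed-hash pseudonym are all assumptions made for illustration; the webinar does not describe how the product performs these steps.

```python
import hashlib
import hmac

# Hypothetical classification of sample fields. In practice this table
# would be produced by an analyst reviewing the data set.
VARIABLE_CLASSES = {
    "name": "direct",
    "email": "direct",
    "mrn": "direct",
    "birth_date": "quasi",
    "zip_code": "quasi",
    "income": "other",
}


def mask_direct_identifiers(record, classes, key):
    """Replace direct identifiers with a keyed pseudonym; copy other fields."""
    masked = {}
    for field, value in record.items():
        if classes.get(field) == "direct":
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()[:12]
        else:
            masked[field] = value
    return masked


def quasi_identifiers(classes):
    """List the fields that still need risk measurement in later steps."""
    return sorted(f for f, c in classes.items() if c == "quasi")


record = {"name": "Chris Wright", "mrn": "123", "zip_code": "12345", "income": 82000}
masked = mask_direct_identifiers(record, VARIABLE_CLASSES, b"demo-key")
```

The quasi-identifiers are left untouched here on purpose: they are handled by the threshold and de-identification steps (3 and 4), not by masking.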

Page 10: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Gaining Richer Analytic Value

Statistical de-identification allows for richer data analysis

[Diagram: a sample record drawn from primary structured and unstructured data, shown alongside the Safe Harbor Method (Data Masking) and Expert Determination (Statistical De-identification). Source groupings in the diagram: Internal Structured Data, External Structured Data, Structured & Unstructured Data.]

Original values shown:

• MRN: 123
• cwright@
• Chris Wright
• Born Jan 15, 1978
• Zip code: 12345
• Income = $82,000
• Plan # 54678
• EMR data and notes at last PCP visit: Admission date: 08/05/2012, Discharge date: 08/07/2012

Altered values shown:

• MRN: 589
• rwong@
• Robert Wong
• Born Sept 18, 1978
• Zip code: 12346
• Plan # 65123
• EMR data and notes at last PCP visit: Admission date: 08/08/2012, Discharge date: 08/10/2012

Page 12: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Presenter

Luk Arbuckle, Director of Analytics, Privacy Analytics, Inc.

Page 13: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Statistical De-identification Method

Managing the Risk of Re-identification

De-identification Process

1. Set Risk Threshold: Based on the characteristics of the data recipient, the data, and precedents, a quantitative threshold is set.

2. Measure Risk: Based on plausible re-identification attacks, appropriate metrics are selected and used to measure actual re-identification risk from the data.

3. Apply Transformations: If the measured risk does not meet the threshold, specific transformations (such as generalization and suppression) are applied to reduce the risk.
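
The threshold, measure, transform cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's algorithm: risk is taken as 1 divided by the size of the smallest group of records sharing the same quasi-identifier values, the only transformation tried is generalizing a hypothetical `age` field into wider bands, and the sample records are invented.

```python
from collections import Counter


def max_risk(records, quasi_ids):
    """Highest per-record risk: 1 / size of the smallest matching group."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return 1.0 / min(groups.values())


def generalize_age(record, width):
    """Replace an exact age with the lower bound of a band of the given width."""
    out = dict(record)
    out["age"] = (record["age"] // width) * width
    return out


def deidentify(records, quasi_ids, threshold, widths=(1, 5, 10, 20)):
    """Try progressively coarser age bands until measured risk meets the threshold."""
    for width in widths:
        candidate = [generalize_age(r, width) for r in records]
        risk = max_risk(candidate, quasi_ids)
        if risk <= threshold:
            return candidate, width, risk
    raise ValueError("threshold not met; suppression or other transformations needed")


# Invented sample data.
records = [
    {"age": 31, "sex": "F"}, {"age": 33, "sex": "F"}, {"age": 34, "sex": "F"},
    {"age": 52, "sex": "M"}, {"age": 57, "sex": "M"}, {"age": 58, "sex": "M"},
]
result, width, risk = deidentify(records, ["age", "sex"], threshold=0.34)
```

With this sample, exact ages and 5-year bands both leave at least one record alone in its group, so the loop settles on 10-year bands, where every group holds three records.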

Page 14: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Enabling Analytics to Use Anonymized Data

Managing the Risk of Re-identification is our Starting Point

We measure the risk of re-identification along a spectrum of identifiability that takes into account an individual’s data, mitigating controls that protect it and how the data will be used and governed for secondary purposes.

[Diagram labels: Individual, Individual’s Data, Mitigating Controls, Analytic Purpose; Protect Individual Privacy, Gain Analytic Value]

Page 15: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Re-identification Risk: Example

[Figure: example table with three callouts, each reading "Two matching quasi-identifiers in three rows."]
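
The table on this slide did not survive extraction, so the following Python sketch uses invented rows to show the same idea: three rows that match on two quasi-identifiers. Treating each row's risk as 1 divided by the number of matching rows is a common convention and is assumed here rather than taken from the slide.

```python
from collections import Counter


def matching_row_counts(rows, quasi_ids):
    """For each row, count how many rows share its quasi-identifier values."""
    keys = [tuple(row[q] for q in quasi_ids) for row in rows]
    counts = Counter(keys)
    return [counts[k] for k in keys]


# Invented rows: the first three match on sex and birth year.
rows = [
    {"sex": "M", "birth_year": 1978, "diagnosis": "asthma"},
    {"sex": "M", "birth_year": 1978, "diagnosis": "diabetes"},
    {"sex": "M", "birth_year": 1978, "diagnosis": "influenza"},
    {"sex": "F", "birth_year": 1965, "diagnosis": "asthma"},
]
sizes = matching_row_counts(rows, ["sex", "birth_year"])
risks = [1.0 / size for size in sizes]
```

The fourth row is unique on its quasi-identifiers, so it is the one a transformation would need to address first.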

Page 16: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Identifiability Spectrum and Secondary Use

Range of Operational Precedents

Re-identification risk thresholds are established precedents used by leading research organizations. These thresholds are based on the situational context and mitigating controls associated with a data set’s use for secondary purposes. Data is anonymized based on whether indirect identifiers can be matched within a given cell size.

[Figure: spectrum from "Identifiable information" to "De-identified Information", marked with the values 5, 20, 3, 2, 10, 811, 16]
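
One way to read a cell-size precedent is as a probability: if every combination of indirect identifiers is shared by at least k records, no record can be matched with probability above 1/k. The Python sketch below assumes that reading. The cell sizes listed are the ones that can be read unambiguously from the figure; the function names are hypothetical.

```python
# Cell sizes that are legible in the slide's figure (an incomplete subset).
PRECEDENT_CELL_SIZES = (2, 3, 5, 10, 20)


def max_probability(cell_size):
    """Upper bound on match probability when every cell has at least cell_size records."""
    return 1.0 / cell_size


def precedents_met(smallest_cell, cell_sizes=PRECEDENT_CELL_SIZES):
    """Return the cell-size precedents satisfied by a data set's smallest cell."""
    return [k for k in cell_sizes if smallest_cell >= k]


thresholds = {k: max_probability(k) for k in PRECEDENT_CELL_SIZES}
```

A data set whose smallest cell holds 7 records would satisfy the precedents of 2, 3 and 5 but not 10 or 20, which is why the required amount of de-identification grows along the spectrum.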

Page 17: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Identifiability Spectrum and Secondary Use

Range of Operational Precedents

Re-identification risk thresholds are established precedents used by leading research organizations depending on how they assess the risk of disclosure. As such, they use a wide variety of operational precedents to trigger the application of anonymization techniques. What we’ve done is capture and automate them.

[Figure: spectrum from "Little De-identification" to "Significant De-identification", marked with the values 5, 20, 3, 2, 10, 811, 16]

Page 18: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Measuring Re-identification Risk

Page 20: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Post-marketing and Public Health Surveillance

Case Study: EMR Software Vendor

• Wanted to anonymize data on 535,595 patients from general practices

• Longitudinal data needed to be used for on-going and on-demand analytics

Challenges:

• Large, complex data set holding more than five years of clinical, prescription, laboratory, scheduling and billing data of patients

• Numerous release requests from 2,664 clinics and 5,850 physicians

• Data complexity: 820 columns/73 tables

Analytic Outcomes:

De-identified data to analyze:

• Post-marketing surveillance of adverse events

• Public health surveillance

• Prescription pattern analysis

• Health services analysis

Page 21: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Assessing Analytic Value – Date Shifting

Length of service (LOS) before and after date shifting was performed on this EMR data. We examined whether the date shifting associated with anonymization lengthens or shortens the LOS for patients.

Source: Anonymizing Health Data, Chapter 13, De-identification and Data Quality: A Clinical Data Warehouse

Page 22: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

What We Discovered …

Source: Anonymizing Health Data, Chapter 13, De-identification and Data Quality: A Clinical Data Warehouse

Length of service (LOS) was nearly the same before and after statistical de-identification. The mean difference in LOS before and after date shifting is 0.2 days. The expected LOS follows a normal distribution for the 90% of patients shown in this diagram, which makes date shifting seem like a perfectly natural process.
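
To make the date-shifting idea concrete, here is a minimal Python sketch of a simpler variant than the one evaluated in the case study: every date for a patient is moved by one random offset, so LOS is preserved exactly. The case study shifts with randomized intervals and reports a 0.2-day mean difference, so its method is not identical to this one. The function names and the 30-day shift window are assumptions.

```python
import random
from datetime import date, timedelta


def shift_patient_dates(visits, max_shift_days, rng):
    """Move every (admission, discharge) pair for one patient by the same offset."""
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [(admitted + offset, discharged + offset) for admitted, discharged in visits]


def length_of_service(visits):
    """LOS in days for each visit."""
    return [(discharged - admitted).days for admitted, discharged in visits]


rng = random.Random(0)  # seeded only so the sketch is repeatable
visits = [
    (date(2012, 8, 5), date(2012, 8, 7)),
    (date(2012, 11, 2), date(2012, 11, 9)),
]
shifted = shift_patient_dates(visits, max_shift_days=30, rng=rng)
los_before = length_of_service(visits)
los_after = length_of_service(shifted)
```

Because both dates of a visit move together, the gap between visits is preserved as well, which matters for longitudinal analysis.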

Page 23: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Summary

• Most analyses performed on clinical databases use descriptive statistics and cross-tabulations

• Anonymization meets the requirements of these techniques, while maintaining the essential analytic utility of the original data

• Data evaluation should be statistical as opposed to deterministic: compare the data set before and after anonymization

• In short, anonymization allowed this vendor to fully leverage their data for secondary purposes, all within a reasonable range of optimal utility and value
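
A statistical before-and-after evaluation can be as simple as comparing descriptive statistics for the same variable in both versions of the data. The Python sketch below uses invented LOS values and the standard `statistics` module; which statistics to compare, and what difference is acceptable, are assumptions left to the analyst.

```python
import statistics


def describe(values):
    """Descriptive statistics of the kind used in most clinical analyses."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def compare(before, after):
    """Difference (after minus before) for each descriptive statistic."""
    b, a = describe(before), describe(after)
    return {name: a[name] - b[name] for name in b}


# Invented LOS values, in days, for the same six visits.
los_original = [2, 3, 1, 4, 2, 5]
los_anonymized = [2, 3, 2, 4, 2, 5]
differences = compare(los_original, los_anonymized)
```

Individual records may differ, as the third visit does here, while the summary statistics an analyst relies on stay close.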

Page 24: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Presenter

Grant Middleton, Solution Architect, Privacy Analytics, Inc.

Page 26: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

PARAT

Providing organizations with a scalable set of capabilities to automate the anonymization of structured data

• Evaluate data quality for analysis after de-identification

• Simulate attacks to determine levels of risk associated with the re-identification of personal information

• Configure re-identification risk threshold settings directly from Privacy Analytics’ online Risk Assessment application

• Determine enterprise policies for data sharing using risk-based methodologies for assessing re-identification

• Automate data sharing agreements and certifications that confirm risks are “very small” for re-identification

Stronger Safeguards. Richer Analysis. Integrated Solution.

Page 27: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

PARAT + BI

Page 28: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Presenter

Luk Arbuckle, Director of Analytics, Privacy Analytics, Inc.

Page 29: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

PARAT + Predictive

Page 30: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Predictive Modelling and Anonymized Data

Before anonymization

[Figure: model fitted before anonymization; labels: days, no, yes, yes]

A simple model for a complex problem: predicting the number of days until the next visit. With more variables, and more data prep, you can get a much more accurate model.

Page 31: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Predictive Modelling and Anonymized Data

After anonymization

[Figure: model fitted after anonymization; labels: days, no, yes, yes]

But the point is that the results are almost identical. Date shifting, with randomized intervals, allows us to develop predictive models that give us the same answers.

Page 32: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Predictive Modelling and Anonymized Data

• Predictive modelling, done right, is challenging. You need a rich source of data to begin with. Then you need to clean and format it, so that you have quality data to work with.

• Anonymization, done right, can provide you with a rich source of data. The data cleaning and formatting will still be there, but no more than before.

• Predictive modelling can produce the same results, before and after anonymization. Put the time and effort into anonymization so you have quality data to work with.
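
The before-and-after comparison can be tried on synthetic data. The Python sketch below is a stand-in, not the model from the demo: it fits a straight line (ordinary least squares) predicting days until the next visit from a hypothetical age variable, then refits after each interval is perturbed by at most one day to imitate randomized date shifting. All data and parameter values are invented.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(42)  # seeded only so the sketch is repeatable
ages = list(range(20, 70))  # 50 synthetic patients

# Synthetic "days until next visit" before anonymization.
days_before = [120 - 1.5 * (age - 20) + rng.uniform(-10, 10) for age in ages]

# Imitate randomized shifting: each interval moves by -1, 0 or +1 day.
days_after = [days + rng.choice([-1, 0, 1]) for days in days_before]

slope_before, intercept_before = fit_line(ages, days_before)
slope_after, intercept_after = fit_line(ages, days_after)
```

With perturbations this small relative to the spread of the data, the two fitted lines are close; larger shifts or sparser data would need to be checked case by case.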

Page 34: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Balancing Privacy with Data Utility

The Analytic Benefits of a Statistical De-identification Method

1. Data Quality: Ensuring de-identified data has analytic usefulness by minimizing the amount of distortion while still ensuring that re-identification risk is very small

2. Analytic Granularity: Allowing users to configure the extent of de-identification to match the characteristics of the analysis that is anticipated

3. Depth of Insight: Enabling analysis of the total patient health experience, to compile a complete picture of this experience from multiple data sources and types

Page 35: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Upcoming Events

• April 16-17, Healthcare Business Intelligence Forum, Washington, D.C.

• April 23, Noon EST, The Second Part of Webinar Series: Fear and Loathing Data Monetization

• May 21-22, e-Health Initiative, Washington, D.C.

• Take the Anonymization Survey: http://surveys.ronin.com/wix/p1834200753.aspx?src=1

Page 36: Improving Healthcare Outcomes with Deeper Insight from Anonymized Data

Resources

If you’d like to learn more, we’re offering Chapter 13 of Anonymizing Health Data free of charge. It covers the case study presented today in greater detail.

• Learn proven methods for anonymizing health data to share meaningful datasets, without exposing patient identity

• Leading experts walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of data sets

• Drop me a line if you’d like a copy of the chapter: [email protected]

Also, contact me to learn more. We can set up a personalized demo or have a discussion on your current anonymization needs. Just drop me a line.