Making Large Data Sets Work for You: Advantages and Challenges. Lesley H Curtis, Soko Setoguchi, Bradley G Hammill.
Post on 25-Dec-2015
Making Large Data Sets Work for You
Advantages and Challenges
Lesley H Curtis
Soko Setoguchi
Bradley G Hammill
Presenter disclosure information
Lesley H Curtis
Large Data Sets: An Overview
FINANCIAL DISCLOSURE: None
UNLABELED/UNAPPROVED USES DISCLOSURE: None
Agenda
Large Data Sets: An Overview
Prescription Drug Data: Advantages, Availability, and Access
Linking Large Data Sets: Why, How, and What Not to Do
Practical Examples
Which large data sets?
Relevant for cardiovascular research
Available to researchers
Potential for linkage
Claims data: federal and commercial
Inpatient registries
Longitudinal cohort studies
Claims data
Derived from payment of bills
Payor-centric
Examples
  Medicare
  Medicaid
  Thomson-Reuters
  United Health Care
Medicare claims data
Inpatient services (Part A)
Outpatient services (Part B)
Physician services (Carrier, Part B)
Durable medical equipment
Home health care
Skilled nursing facilities
Hospice
Medicare claims data elements
What data are available
  Demographics
  Service dates
  Diagnoses
  Procedures
  Hospital / Physician
What data are not available
  Physiological measures
  Test results
  Times of admission, procedures, etc.
  Medications
Medicare claims data coverage
National scope
What patients will be represented?
  Patients enrolled in traditional (fee-for-service) Medicare
What patients will not be represented?
  Patients receiving care through the Veterans Health Administration
  Patients enrolled in Medicare managed care plans
Medicare claims data quality
Main point
  Reliability of specific claims data elements depends on importance for reimbursement
Good data on…
  Major procedures
  Hospitalizations
  Mortality
Inconsistent data on…
  Comorbidities and illness severity
  Procedures with low reimbursement rates
Acquiring CMS claims data
All requests begin with ResDAC (www.resdac.umn.edu)
Cost
  $15K per year of inpatient+denominator data
  $20K per year of 5% data across all files
  $30K+ per year of data for custom requests
Detailed approval process
  Prepare request packet for ResDAC review (4-6 weeks)
  Review by CMS privacy board (4 weeks)
  Request processed by contractor (6-8 weeks)
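The three approval steps imply a total lead time that can be estimated by summing the stage ranges. The sketch below is a rough estimate, not official ResDAC or CMS guidance. It assumes the stages run strictly one after another with no overlap or resubmission, which the slide does not state. The stage labels and the `total_lead_time` helper are illustrative names only.

```python
# Approval stages from the slide, as (min_weeks, max_weeks).
# Assumption: stages run sequentially with no overlap or resubmission.
APPROVAL_STAGES = {
    "ResDAC review of request packet": (4, 6),
    "CMS privacy board review": (4, 4),
    "Contractor processing": (6, 8),
}


def total_lead_time(stages):
    """Return (min_weeks, max_weeks) summed across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_lead_time(APPROVAL_STAGES)
print(f"Estimated wait for data: {low}-{high} weeks")
```

Under those assumptions the wait is roughly 14 to 18 weeks, or about three to four months, before any analysis can start.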
Preparing for CMS claims data
Make space
  16 GB for 100% denominator and inpatient files
  57 GB for 5% denominator, inpatient, outpatient, and carrier* files
Manage expectations
  Time to process files
  Transforming raw claims into usable information
    Coding algorithms
    Coding changes
  Learning curve
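As one illustration of what "transforming raw claims into usable information" can involve, the sketch below flags heart failure hospitalizations from diagnosis codes. It is a hypothetical example, not the presenters' algorithm. The record layout, the field names, and the choice to look only at the primary diagnosis are all assumptions. The one fixed point is that ICD-9-CM category 428 denotes heart failure.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified claim record; real claims files have many more fields.
@dataclass
class Claim:
    patient_id: str
    admit_date: str  # ISO date string, e.g. "2006-03-14"
    # ICD-9-CM codes stored without the decimal point, primary diagnosis first.
    diagnosis_codes: list = field(default_factory=list)


def is_heart_failure_admission(claim: Claim) -> bool:
    """Flag a claim whose primary diagnosis falls in ICD-9-CM category 428."""
    if not claim.diagnosis_codes:
        return False
    return claim.diagnosis_codes[0].startswith("428")


claims = [
    Claim("A", "2006-03-14", ["4280", "25000"]),  # heart failure as primary diagnosis
    Claim("B", "2006-05-02", ["41071", "4280"]),  # heart failure as secondary only
    Claim("C", "2006-07-19", []),                 # no diagnoses recorded
]
flagged = [c.patient_id for c in claims if is_heart_failure_admission(c)]
print(flagged)
```

A rule like this has to be revisited whenever the coding system changes, which is part of the learning curve.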
The Learning Curve
[Figure: CMS data publications (Curtis, Hammill, et al.) by year. 2005: 0; 2006: 0; 2007: 1; 2008: 9; 2009: 16; 2010: 28]
Claims data
Derived from payment of bills
Payor-centric
Examples
  Medicare
  Medicaid
  Thomson-Reuters
  United Health Care
Commercial claims data elements
What data are typically available
  Demographics
  Service dates
  Diagnoses
  Procedures
  Medications
  Hospital / Physician
What data may not be available
  Physiological measures
  Test results
Commercial claims data coverage
National scope
What patients will be represented?
  Individuals who are commercially insured
What patients will not be represented?
  The uninsured
  Medicare managed care?
Commercial claims data quality
Similar to Medicare claims data
  Reliability of specific claims data elements depends on importance for reimbursement
Good data on…
  Major procedures
  Hospitalizations
Inconsistent data on…
  Mortality
  Comorbidities and illness severity
  Procedures with low reimbursement rates
Preparing for commercial claims data
Cost
  $25-70K depending on size, scope of data request
Size
  100 GB per year of data
  Analysis sample sizes will differ from advertised sample sizes
Manage expectations!
Registry data
Observational cohorts of patients undergoing specific treatments or having specific conditions
Purpose may be to assess…
  Quality of care
  Provider performance
  Treatment safety/effectiveness
Of interest today are hospital-based registries
OPTIMIZE-HF registry
Hospital-based quality improvement program and internet-based registry for heart failure
2002-2005: 50,000 patients; > 250 hospitals
Transitioned to GWTG-HF in 2005
Registry data coverage
Only patients treated at participating hospitals will be included
  + All patients at these hospitals included regardless of payor
  - Participating hospitals may not be representative of hospitals nationwide
% of group in selected states
State          US Elderly   Medicare FFS   OPTIMIZE-HF
California     10.1%        7.7%           13.8%
Florida        7.4%         7.0%           8.7%
Michigan       3.4%         4.0%           9.5%
New York       6.6%         6.1%           3.5%
Pennsylvania   5.2%         4.4%           6.7%
Texas          6.0%         6.5%           5.4%
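The state shares on this slide can be turned into a simple over- or under-representation ratio. The sketch below assumes Medicare fee-for-service is the relevant reference population and that a plain ratio of shares is an adequate summary. Both are choices made for illustration, and `representation_ratio` is a hypothetical helper.

```python
# Percent of each group residing in selected states (values from the slide).
SHARES = {
    "California":   {"us_elderly": 10.1, "medicare_ffs": 7.7, "optimize_hf": 13.8},
    "Florida":      {"us_elderly": 7.4,  "medicare_ffs": 7.0, "optimize_hf": 8.7},
    "Michigan":     {"us_elderly": 3.4,  "medicare_ffs": 4.0, "optimize_hf": 9.5},
    "New York":     {"us_elderly": 6.6,  "medicare_ffs": 6.1, "optimize_hf": 3.5},
    "Pennsylvania": {"us_elderly": 5.2,  "medicare_ffs": 4.4, "optimize_hf": 6.7},
    "Texas":        {"us_elderly": 6.0,  "medicare_ffs": 6.5, "optimize_hf": 5.4},
}


def representation_ratio(state, registry="optimize_hf", reference="medicare_ffs"):
    """Registry share divided by reference share; above 1 means over-represented."""
    row = SHARES[state]
    return row[registry] / row[reference]


for state in SHARES:
    ratio = representation_ratio(state)
    label = "over" if ratio > 1 else "under"
    print(f"{state:<13}{ratio:5.2f}  ({label}-represented in the registry)")
```

On these figures Michigan is over-represented in the registry relative to Medicare fee-for-service, while New York and Texas are under-represented.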
Registry data quality
Good data on…
  Many of the things not included in Medicare data:
    Labs, medications, treatment timing, process measures, contraindications (if collected)
Inconsistent data on…
  Post-hospitalization follow-up care
  Outcomes, particularly long-term
Accessing registry data
Networking and partnering
  Many require that analyses be performed at selected analytical centers, which may have long queues
Approval process via steering or executive committee
NHLBI longitudinal cohort studies
Atherosclerosis Risk in Communities Study (ARIC)
Cardiovascular Health Study (CHS)
Framingham Heart Study
Jackson Heart Study
Multi-Ethnic Study of Atherosclerosis
Women’s Health Initiative
Cardiovascular Health Study (CHS)
Prospective, observational study of CV disease in the elderly (Washington Co., MD; Forsyth Co., NC; Sacramento Co., CA; and Pittsburgh, PA)
Baseline exams occurred in 1989-90
Minority cohort added at year 5
Annual exams, with ‘major’ exams occurring at year 5 (1992-93) and year 9 (1996-97). Last exam was year 11 (1998-99).
5,201 participants at baseline plus 687 additional minority participants, for 5,888 in total
Cardiovascular Health Study data elements
What data are available
  Demographics
  Medical, personal history
  Physiological measures, test results
  QOL, depression
  Cognitive function
What data are not available
  Service dates
  Procedures
  Hospital/physician
Cardiovascular Health Study data quality
Main point
  Data collected are of high quality
Good data regarding…
  Cardiovascular risk factors
  Cardiovascular endpoints
  General health
Limited data on…
  Non-cardiovascular risk factors
  Non-cardiovascular endpoints
Accessing NHLBI cohort studies
Via the NHLBI data repository
  HIPAA identifiers, geography removed
Via Coordinating Center for identifiable data
Size
  20 MB per year of data
NHLBI-Medicare linked data sets
CMS linked with…
  CHS (1991-2004, 2005-2009 pending)
  Framingham (2000-2009 pending)
  Jackson Heart Study (2000-2009 pending)
  Multi-Ethnic Study of Atherosclerosis (2000-2009 pending)
  Atherosclerosis Risk in Communities
  Women’s Health Initiative
Conclusion
Large data sets abound
Do yourself a favor…manage expectations!
Contact Information
Lesley Curtis
Lesley.curtis@duke.edu