Posted on 27-Sep-2020
A strategy to promote access to HDSS data for researchers and scientists:
the 1:10 Sample Dataset
Agincourt HDSS-Wits SPH
Nairobi Urban HDSS, APHRC, Nairobi
Institute of Behavioral Science, University of Colorado, Boulder
Outline
• Why develop a sample database?
• What is the 1:10?
• Using the 1:10 for teaching
• What has a sample database achieved?
• Challenges
Access to HDSS data: challenges for sites
• Extracting datasets takes time and resources
• Project-specific datasets require clear, knowledgeable data requests
• User-friendly documentation is needed, e.g. a comprehensive data dictionary
• The quality of research based on HDSS data needs to be ensured
• Protecting the confidentiality of participants and small-area communities
Training issues
• Making data available is not enough: training and support are needed to use complex longitudinal information
• Universities need datasets for graduate student research (masters and doctoral)
• Faculty may not have the skills to analyze longitudinal data (or to supervise students who do)
Challenges
• To increase data access without increasing demand for individual, tailored datasets
• To provide training on longitudinal data management and analysis so that students, their supervisors and faculty can use HDSS data
A response
• 1:10 Sample Database
– conceptualized and developed through collaboration between the Agincourt and Nairobi HDSS sites, Wits University and the University of Colorado at Boulder
Goals of the 1:10 Sample Database
• A public-access database for researchers and students to explore HDSS data
• Improve the quality of data requests to the site
• Provide experiential training courses on the use of longitudinal data
thereby
• Enhancing research and training by increasing access to HDSS data
What is the 1:10 Sample Database?
• Agincourt / Nairobi Sample Database: a version of the full database stripped of identifiers
What is the 1:10?
• Includes 10% of geographic locations in each village, with full information on all individuals in each location over the full period of data collection
– thus retains the relational, temporal and data integrity of the full database
• Maintains the structure of the full database, but simplified
– Observations limited to one per year; single date
– More complex variables removed
– Some adjustment ("normalization") to keep the sample representative of the full dataset
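The sampling rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure the Agincourt team actually used: person records are assumed to carry hypothetical `village` and `location` fields, 10% of locations are drawn per village (at least one) with an arbitrary fixed seed, and every individual in a chosen location is kept for all years. For the "one observation per year" simplification, the sketch keeps the earliest dated observation in each year, which is an assumption; the source says only "single date". People who move between locations are ignored here.

```python
import random
from collections import defaultdict


def sample_locations(locations, fraction=0.10, seed=2006):
    """Draw `fraction` of the locations in each village (at least one).

    `locations` is a list of (village_id, location_id) pairs. The seed is an
    arbitrary, hypothetical choice that only makes the draw reproducible.
    """
    rng = random.Random(seed)
    by_village = defaultdict(list)
    for village, location in locations:
        by_village[village].append(location)
    chosen = set()
    for village, locs in sorted(by_village.items()):
        k = max(1, round(len(locs) * fraction))
        for loc in rng.sample(sorted(locs), k):
            chosen.add((village, loc))
    return chosen


def extract_sample(individuals, chosen):
    """Keep every individual (all years) whose location was drawn."""
    return [p for p in individuals if (p["village"], p["location"]) in chosen]


def one_observation_per_year(observations):
    """Keep one observation per (individual, year): the earliest one.

    Assumes hypothetical `individual_id` and ISO `date` ("YYYY-MM-DD") fields.
    """
    kept = {}
    for obs in sorted(observations, key=lambda o: o["date"]):
        key = (obs["individual_id"], obs["date"][:4])
        kept.setdefault(key, obs)  # first seen in date order = earliest
    return list(kept.values())
```

Because whole locations are sampled rather than individuals, household and kinship links within a location stay intact, which is what lets the sample keep the relational structure of the full database.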
Validated sample
• Counts of births, deaths, in- and out-migrations, and household size were computed and compared at population and sample level to check the representativeness of the sample
• Sample means of these counts fall within one standard deviation of the full database means:
– rates based on event counts are reasonably comparable between the sample and full databases
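A minimal sketch of the one-standard-deviation check, assuming each indicator is stored as a list of per-unit counts (for example births per household). The unit of comparison and the `validate` helper are hypothetical, since the source does not say exactly how the means and standard deviations were computed.

```python
from statistics import mean, stdev


def within_one_sd(full_counts, sample_counts):
    """True if the sample mean lies within one SD of the full-database mean.

    Both arguments are per-unit counts; stdev needs at least two values.
    """
    full_mean = mean(full_counts)
    full_sd = stdev(full_counts)  # sample standard deviation of the full data
    return abs(mean(sample_counts) - full_mean) <= full_sd


def validate(full, sample):
    """Apply the check to every indicator (births, deaths, migrations, ...)."""
    return {name: within_one_sd(full[name], sample[name]) for name in full}
```

This is a coarse screen: passing it shows the sample is not far off, but it is weaker than formally comparing rates with confidence intervals.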
Advantages of the 1:10 Sample Database
• Subsamples can be easily extracted
• The anonymized version can be updated regularly
• User-friendly documentation:
– study setting and publications
– database structure
– data dictionary
– standard agreement on use
Documentation on website
• ADSS 1in10 Dataset Presentation.ppt
• AHPU Data Dictionary v2(Draft).pdf
• AHPU.1_10.dictionary.pdf
• AHPUDataUseAgreeme.pdf
• DM-DataRequest-v7.doc
• DM-SampleRequest-v2.doc
• Display forms
• OneInTenDataset20070416.zip
• SD-TUTORIAL2-V1(ODBC).pdf
• SD_TUTORIAL1-V1(CREATE DATASET).pdf
Controls on database
• A signed data agreement is required in order to use datasets from the sample database
• To publish work developed using the sample database, users must:
– apply for a full tailored dataset
– sign a confidentiality agreement
– re-run analyses on the tailored dataset
Agincourt Data Request form
• Project Name: <title>
• Sample Name: <title>
• Author: <authors>
• Date: <date>
• Version: <version>
• Purpose (condensed to protocol):
– <condensed protocol describing project>
• Analytical Plan:
– <analysis plan that justifies the data requested>
• Variables:
– <list variables from the data dictionary that need to be included with the sample>
– <Village, ExternalID, Name, Surname, Guardian Name, Etc…>
– http://www.npongo.com/agincourt/AHPUDataDic.zip
• Sample requirements
• Sample Population:
– <define population with bulleted specific criteria for each stratum>
– Strata: Criteria 1, Criteria 2, Criteria 3, Criteria 4
• Other Sample Considerations:
– <describe other considerations for drawing the sample, e.g. village clustering, study logistics concerns, exclusion from other studies>
• Method for Drawing Sample:
– <describe the procedure for drawing the sample>
• Unit of analysis:
– <Individual, Household, Village, Site, Mixed>
• <sign and return data use agreement>
– www.npongo.com\agincourt\AHPUDataAgreeme.pdf
• Data Cleaning:
– <description of any dirty data found>
– Case: <description of specific case of dirty data>
– Logic: <logic used to identify dirty data>
– Code: <programmatic code used to identify dirty data>
– Code Type: <e.g. SQL, Stata, SAS>
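As an illustration of the Data Cleaning fields (Case, Logic, Code, Code Type), the sketch below runs one SQL check through Python's built-in `sqlite3` module. The case (events dated after an individual's recorded death), the table and column names, and the event codes are all hypothetical and do not reflect the actual Agincourt schema. Dates are assumed to be ISO `YYYY-MM-DD` strings, so plain text comparison orders them correctly.

```python
import sqlite3

# Hypothetical schema for illustration only; not the Agincourt table layout.
SCHEMA = """
CREATE TABLE individual (id INTEGER PRIMARY KEY, death_date TEXT);
CREATE TABLE event (individual_id INTEGER, event_type TEXT, event_date TEXT);
"""

# Logic: any event dated strictly after the recorded death date is dirty data.
# Code Type: SQL (SQLite dialect).
EVENTS_AFTER_DEATH = """
SELECT e.individual_id, e.event_type, e.event_date, i.death_date
FROM event AS e
JOIN individual AS i ON i.id = e.individual_id
WHERE i.death_date IS NOT NULL
  AND e.event_date > i.death_date
ORDER BY e.individual_id, e.event_date
"""


def find_events_after_death(conn):
    """Return (individual_id, event_type, event_date, death_date) rows to review."""
    return conn.execute(EVENTS_AFTER_DEATH).fetchall()
```

The death event itself is dated on the death date, so the strict `>` comparison does not flag it.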
AGINCOURT STUDENT DATA AGREEMENT
Using the 1:10 for training
• Focus: longitudinal data management & analysis
• Intensive, experiential training
– 2 weeks residential, 1 week self-study
– Context of the site: exposure to the range of research conducted there using diverse study designs
– Use of the Agincourt / Nairobi database for data management and statistical analysis
• Hands-on experience using longitudinal methods
Guided exercises using HDSS data
• Managing longitudinal data using STATA and Microsoft Access
• Introducing students to statistical analysis using STATA
– Descriptive analysis of fertility and mortality trends
– Event history analysis
– Hazard modeling
– etc.
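For the descriptive mortality exercises, a basic building block is the death rate per 1,000 person-years. The sketch below assumes hypothetical residence episodes of `(start, end, died)` with Python `date` objects. It splits each episode's exposure across calendar years (using 365.25-day years) and attributes a death to the year in which the episode ends. It is a simplified teaching illustration, not the course's actual STATA material.

```python
from collections import defaultdict
from datetime import date


def person_years_by_year(episodes):
    """Split each (start, end, died) episode's exposure across calendar years.

    Returns (exposure, deaths): dicts keyed by year, exposure in person-years.
    """
    exposure = defaultdict(float)
    deaths = defaultdict(int)
    for start, end, died in episodes:
        for year in range(start.year, end.year + 1):
            seg_start = max(start, date(year, 1, 1))
            seg_end = min(end, date(year + 1, 1, 1))
            exposure[year] += (seg_end - seg_start).days / 365.25
        if died:
            deaths[end.year] += 1  # death counted in the year the episode ends
    return exposure, deaths


def mortality_rates(episodes, per=1000):
    """Deaths per `per` person-years, for each year with positive exposure."""
    exposure, deaths = person_years_by_year(episodes)
    return {y: per * deaths[y] / exposure[y] for y in sorted(exposure) if exposure[y] > 0}
```

Dividing events by person-time, rather than by a head count, is what lets people who enter or leave the surveillance area part-way through a year contribute correctly to the denominator.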
Experiential learning: Group projects
• Academic skills training
– Developing research questions
– Constructing data sets
– Preliminary analysis
– Literature reviews
– Presentation of work
– Write-up of research paper: research question/hypotheses/objective, analysis, results, discussion, interpretation
(Photo: Student groups at work in computer laboratory)
Examples of student group projects
• Correlates of out-migration in Agincourt HDSS
• Mortality and Food Security: A Multi-Pathway Association
• Communicable and non-communicable causes of death in Agincourt HDSS: Patterns and trends from 2000-2005
• Parity progression in the context of fertility decline in rural South Africa
• Fertility in Agincourt: Does the education level of female participants matter?
(Photo: Project presentations)
What has the 1:10 achieved?

Before
• Researcher or student supervisor submits proposal & data request
• Agincourt team reviews & approves
• Agincourt data specialist writes a unique extraction script to create a tailored dataset
Result:
– Bottleneck, time-consuming
– Costly in time and resources
– Underspecified data requests due to inadequate knowledge
– Feelings of frustration & pressure (HDSS); neglect (researchers, students)

After
• 1:10 sample database and full documentation available on a password-protected website
• Researchers/students can freely explore data
• Researchers/students are thus able to better specify the full data request
Result:
– 1:10 reduces work for the Agincourt data team
– Better-specified data requests
– Faster approval of outside work
– Faster production of tailored datasets
– More student research projects
Increase in masters students corresponds with the 1:10 sample database
(Figure: Masters Enrollment (1996 - 2007). Number of masters students by year registered, 1997-2007; annotation: "1:10 database available in 2006")
Challenges (1)
• The 1:10 is useful for exploring the database… but not popular for final student research reports
– students can request the full dataset if results are promising
• Availability does not ensure that researchers, students and supervisors are able to use the data
– courses on longitudinal data management and analysis are needed
– the Colorado-Wits-Nairobi experience is useful
Challenges (2)
• Usability of a relational database: differences in disciplinary expertise; transforming to flat files
• Expanding the 1:10 model to other HDSS sites in INDEPTH
– requires investment in preparation of the 1:10 and its documentation
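One way to handle the flat-file problem is to join the relational tables into a single individual-event table and export it as CSV. The sketch below uses Python's `sqlite3` and `csv` modules on a hypothetical three-table schema (household, individual, event); the real 1:10 database has its own structure and variable names. A LEFT JOIN keeps individuals with no recorded events, whose event columns come out empty.

```python
import csv
import io
import sqlite3

# Hypothetical schema for illustration; not the actual 1:10 database layout.
SCHEMA = """
CREATE TABLE household (household_id INTEGER PRIMARY KEY, village TEXT);
CREATE TABLE individual (individual_id INTEGER PRIMARY KEY,
                         household_id INTEGER, sex TEXT, dob TEXT);
CREATE TABLE event (individual_id INTEGER, event_type TEXT, event_date TEXT);
"""

# Explicit AS aliases give predictable column names in the CSV header.
FLATTEN = """
SELECT i.individual_id AS individual_id, i.sex AS sex, i.dob AS dob,
       h.household_id AS household_id, h.village AS village,
       e.event_type AS event_type, e.event_date AS event_date
FROM individual AS i
JOIN household AS h ON h.household_id = i.household_id
LEFT JOIN event AS e ON e.individual_id = i.individual_id
ORDER BY i.individual_id, e.event_date
"""


def to_flat_csv(conn, out):
    """Write one row per individual-event (or per individual with no events)."""
    cur = conn.execute(FLATTEN)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)  # NULLs are written as empty fields
```

The result is a "long" file with one row per event. Users who need one row per person would still have to reshape it, which is where disciplinary differences in data-management skills tend to show.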
What have we learned?
• Training is needed for students and researchers to use the HDSS database
• Experience is needed to produce a good data request
• The 1:10 sample database is useful in meeting these needs
• The experience can be transferred to other HDSS sites