Data Management by Shahzad Asghar Arain

Post on 30-Oct-2014


Using EpiData and SPSS

Shahzad Asghar Arain

Shahzad.cdcu@gmail.com
Cell 92 312 514 9114

http://shahzadasghar.info

References

Public domain (PDF) book on data management:

Bennett, et al. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf

EpiData Association Website: http://www.epidata.dk/

Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm

Data Management

Planning data needs
Data collection
Data entry and control
Validation and checking
Data cleaning and variable transformation
Data backup and storage
System documentation
Other

Types of Database Management Systems (DBMSs)

Spreadsheets (e.g., Excel, SPSS Data Editor)
  Prone to error, data corruption, & mismanagement
  Lack data controls, limited programmability
  Suitable only for small and didactic projects
  Also good for last-step data cleaning

Commercial DBMS programs (e.g., MySQL, Oracle, Access)
  Limited data control, good programmability
  Slow & expensive
  Powerful and widely available

Public domain programs (e.g., EpiData, Epi Info)
  Controlled data entry, good programmability
  Suitable for research and field use

We will use two platforms:

EpiData
  controlled data entry
  data documentation
  export (“write”) data

SPSS
  import (“read”) data
  analysis
  reporting

What is EpiData?

EpiData is a small computer program (1.2 MB) for simple or programmed data entry and data documentation
It is highly reliable
It runs on Windows computers
It runs on Macs and Linux only with emulator software

Interface
  pull-down menus
  work bar

History of EpiInfo & EpiData

1976–1995: EpiInfo (DOS program) created by CDC (in wake of swine flu epidemic)
  Small, fast, reliable, 100,000+ users worldwide
1995–2000: DOS dies slow painful death
2000: CDC releases EpiInfo2000
  Based on Microsoft Jet (Access) data engine
  Large, slow, unreliable (resembled EpiInfo in name only)
2001: Loyal EpiInfo user group decides it needs real “EpiInfo for Windows”
  Creates open source public domain program
  Calls program “EpiData”

Goal: Create & Maintain Error-Free Datasets

Two types of data errors

Measurement error (i.e., information bias) – discussed last couple of weeks

Processing errors = errors that occur during data handling – discussed this week

Examples of data processing errors

Transpositions (91 instead of 19)
Copying errors (O instead of 0)
Additional processing errors described on p. 18.2

Avoiding Data Processing Errors

Manual checks (e.g., handwriting legibility)

Range and consistency checks* (e.g., do not allow hysterectomy dates for men)

Double entry and validation*
  Operator 1 enters data
  Operator 2 enters data in separate file
  Check files for inconsistencies

Screening during analysis (e.g., look for outliers)

* covered in lab
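
The double entry idea above (two operators, two files, then a comparison) can be illustrated with a short Python sketch. It is not EpiData's own validation routine: the record layout, the `id` key, the sample values, and the function name `compare_entries` are all assumptions made for illustration.

```python
def compare_entries(first, second, key="id"):
    """Compare two independently entered datasets record by record.

    Returns a list of (record key, field, value in first, value in second)
    tuples, one per disagreement. A record present in only one file is
    reported with field None and two booleans (present in first/second).
    """
    first_by_key = {rec[key]: rec for rec in first}
    second_by_key = {rec[key]: rec for rec in second}
    problems = []
    for rec_key in sorted(first_by_key.keys() | second_by_key.keys()):
        a = first_by_key.get(rec_key)
        b = second_by_key.get(rec_key)
        if a is None or b is None:
            # Record was entered by only one operator.
            problems.append((rec_key, None, a is not None, b is not None))
            continue
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                problems.append((rec_key, field, a.get(field), b.get(field)))
    return problems


# Hypothetical data: operator 2 transposed the digits of AGE in record 1
# and entered an extra record 3.
operator1 = [{"id": 1, "AGE": "19"}, {"id": 2, "AGE": "40"}]
operator2 = [{"id": 1, "AGE": "91"}, {"id": 2, "AGE": "40"}, {"id": 3, "AGE": "33"}]
mismatches = compare_entries(operator1, operator2)
```

Each reported mismatch still has to be resolved by going back to the paper form; the comparison only says where the two files disagree, not which one is right.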

Controlled Data Entry

Criteria for accepting & rejecting data
Types of data controls
  Range checks (e.g., restrict AGE to reasonable range)
  Value labels (e.g., SEX: 1 = male, 2 = female)
  Jumps (e.g., if “male,” jump to Q8)
  Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  Must enters
  etc.
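
The data controls listed above can be mimicked outside EpiData with a few lines of Python. This is a rough sketch under assumptions, not EpiData .CHK syntax: the field names, the 0 to 120 age limits, and the function name `check_record` are hypothetical.

```python
SEX_LABELS = {1: "male", 2: "female"}   # value labels from the slide
AGE_RANGE = (0, 120)                    # assumed "reasonable range"
MUST_ENTER = ("AGE", "SEX")             # assumed must-enter fields


def check_record(record):
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    # Must-enter check: the field has to be present and non-empty.
    for field in MUST_ENTER:
        if record.get(field) is None:
            errors.append(f"{field}: must enter")
    # Range check.
    age = record.get("AGE")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        errors.append("AGE: out of range")
    # Legal-value check against the value labels.
    sex = record.get("SEX")
    if sex is not None and sex not in SEX_LABELS:
        errors.append("SEX: not a legal value")
    # Consistency check across two fields.
    if SEX_LABELS.get(sex) == "male" and record.get("HYSTERECTOMY") == "yes":
        errors.append("HYSTERECTOMY: inconsistent with SEX = male")
    return errors


ok = check_record({"AGE": 45, "SEX": 1})
bad = check_record({"AGE": 450, "SEX": 1, "HYSTERECTOMY": "yes"})
```

Jumps are not shown here because they control the order of entry on screen rather than the acceptability of a stored value.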

Data Processing Steps

1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS

Filenaming and File Management

c:\path\filename.ext
A web address is a good example of a filename, e.g., http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt
Some systems are case sensitive (Unix)
Others are not (Windows)
Always be aware of
  Physical location (local, removable, network)
  Path (folders and subfolders)
  Filename (proper)
  Extension

Demo Windows Network Explorer: right-click Start Bar > Explore
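
The parts of a filename named above (location, path, filename proper, extension) can be pulled apart with Python's standard `pathlib` module. The sketch below is only an illustration; the folder names in the example path and the helper name `describe` are assumptions.

```python
from pathlib import PurePosixPath, PureWindowsPath


def describe(path_text):
    """Split a Windows-style path into the parts a data manager tracks."""
    p = PureWindowsPath(path_text)
    return {
        "drive": p.drive,                 # physical location, e.g. "c:"
        "folders": list(p.parts[1:-1]),   # path: folders and subfolders
        "filename": p.stem,               # filename proper
        "extension": p.suffix,            # extension, including the dot
    }


# Hypothetical location for the demo record file.
info = describe(r"c:\data\study1\demo.rec")

# Windows-style paths compare without regard to case; Unix-style paths do not.
same_on_windows = PureWindowsPath("DEMO.REC") == PureWindowsPath("demo.rec")
same_on_unix = PurePosixPath("DEMO.REC") == PurePosixPath("demo.rec")
```

Here `info["extension"]` is `.rec`, and the two comparisons come out differently, which is why a consistent lower-case naming habit is a safe choice when files move between systems.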

Extension   Software program
.qes        EpiInfo/EpiData questionnaire
.rec        EpiInfo/EpiData records (data)
.chk        EpiInfo/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web browser
.doc        Microsoft Word
.xls        Microsoft Excel

Selected EpiData Variable Types

Variable type        Examples
Text                 _ _
                     <A >
Numeric              ##
                     ##.#
Date                 <mm/dd/yyyy>
                     <dd/mm/yyyy>
Auto ID              <IDNUM>
Soundex (sanitized)  <S >

EpiData Variable Names

Variable name is based on text that occurs before the variable type indicator code
EpiData variable-naming defaults vary depending on installation
Create variable names exactly as specified

To be safe, denote variable names in {curly brackets}

For example, to create a two-byte numeric variable called age, use the question:

What is your {age}? ##
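
To make the curly-bracket convention concrete, the Python sketch below pulls the variable name and the field code out of a question line like the one above. It is a simplified illustration, not EpiData's actual parser: the regular expressions cover only the field codes shown in this deck, and the function name `parse_question` is hypothetical.

```python
import re

# Name written in {curly brackets}.
BRACED_NAME = re.compile(r"\{(\w+)\}")
# Field codes seen in this deck: ## or ##.#, <...> codes, and underscores.
FIELD_CODE = re.compile(r"#+(?:\.#+)?|<[^>]*>|_+")


def parse_question(line):
    """Return (variable name, field code), or None if either part is missing."""
    name = BRACED_NAME.search(line)
    if name is None:
        return None
    # Look for the field code only after the braced name.
    code = FIELD_CODE.search(line, name.end())
    if code is None:
        return None
    return name.group(1), code.group(0)


age_field = parse_question("What is your {age}? ##")
```

For the example question, `age_field` holds the name `age` and the two-character numeric code `##`.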

Demo / Work Along

Create QES file [demo.qes]
Convert QES to REC [demo.rec]
Create CHK file [demo.chk]
Create double entry file [demo2.rec]
Enter data
Validate data

Fname    Lname    DOB         SEX  DEATHAGE
John     Snow     3/15/1813   1    45
George   Orwell   6/25/1903   1    46

Codebooks

Contain info that helps users decipher data file content and structure
Include:
  Filename(s)
  File location(s)
  Variable names
  Coding schemes
  Units
  Anything else you think might be useful

EpiData codebook generators

File Structure Codebook

Full codebook contains descriptive statistics (demo)

Notice descriptive statistics
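
As a rough picture of what a codebook with descriptive statistics holds, the Python sketch below summarizes each variable in a small dataset. It does not reproduce EpiData's codebook output: the function name `build_codebook` and the choice of statistics are assumptions, and the two records are the demo rows from this deck.

```python
def build_codebook(records):
    """Summarize each variable: count, distinct values, and numeric statistics."""
    codebook = {}
    for name in records[0]:
        values = [rec[name] for rec in records]
        entry = {"n": len(values), "distinct": len(set(values))}
        # Add min, max, and mean only for numeric variables.
        if all(isinstance(v, (int, float)) for v in values):
            entry["min"] = min(values)
            entry["max"] = max(values)
            entry["mean"] = sum(values) / len(values)
        codebook[name] = entry
    return codebook


records = [
    {"FNAME": "John", "LNAME": "Snow", "SEX": 1, "DEATHAGE": 45},
    {"FNAME": "George", "LNAME": "Orwell", "SEX": 1, "DEATHAGE": 46},
]
codebook = build_codebook(records)
```

A real codebook would also carry the filename, file location, coding schemes, and units listed earlier; those come from the study documentation rather than from the data themselves.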

Conversion of Data File

Requires common intermediate file format
Examples of common intermediate files
  .TXT = plain text
  .DBF = dBase program
  .XLS = Excel

Steps
  Export .REC file to .TXT file
  Import .TXT file into SPSS
  Save permanent SAV file

Plain (“raw”) TXT data

plain ASCII data format
no column demarcations
no variable names
no labels

tox-samp.txt    tox-samp.not
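
Because a raw text file has no column demarcations or variable names, whoever reads it must supply the column layout separately. The Python sketch below shows the idea with a fixed-width layout. The column positions are assumptions chosen for illustration and do not describe tox-samp.txt; the sample line is built from the John Snow demo record.

```python
# Hypothetical layout: (variable name, start column, end column), 0-based,
# end exclusive. In practice this information comes from the documentation file.
LAYOUT = [
    ("FNAME", 0, 10),
    ("LNAME", 10, 20),
    ("DOB", 20, 30),
    ("SEX", 30, 31),
    ("DEATHAGE", 31, 33),
]


def parse_line(line):
    """Cut one raw line into named fields using the column layout."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


# Build a raw line the way a fixed-width export would pad it.
raw_line = "John".ljust(10) + "Snow".ljust(10) + "3/15/1813".ljust(10) + "1" + "45"
record = parse_line(raw_line)
```

Without the layout, `raw_line` is just a run of characters; the layout is what turns it back into FNAME, LNAME, DOB, SEX, and DEATHAGE.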

SPSS Data Export / Import

[Diagram of file types involved: REC, TXT (raw data), SPS (syntax), SAV]

Lines beginning with * are comments (ignored by command interpreter)

Next set of commands shows file location and structure via SPSS command syntax

Labels being imported into SPSS

Delete * if you want this command to run

Ethics of Data Keeping

Confidentiality (sanitized files – free of identifiers)
Beneficence
Equipoise
Informed consent (To what extent?)
Oversight (IRB)
