Stata Workshop #1 Chiu-Hsieh (Paul) Hsu Associate Professor College of Public Health pchhsu@email.arizona.edu

Dec 27, 2015

Stata Workshop #1

Chiu-Hsieh (Paul) Hsu
Associate Professor

College of Public Health
pchhsu@email.arizona.edu

Outline

• Do files
• Data entry
• Data management
• Data description
• Estimation: Confidence Interval
• Hypothesis testing

Do files

• Stata programs
  – Easy to add or skip comments
  – One click/command can run the whole program
• Reproducible
  – Don’t need to retype all of the commands
• Interactive work vs. do files
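
As a minimal sketch of what a do file can look like: the file name example.do and the use of the auto dataset shipped with Stata are assumptions for illustration, not part of the workshop materials.

```stata
* example.do (hypothetical file name)
* A line starting with * is a comment; // starts a comment inside a do file
sysuse auto, clear      // load an example dataset shipped with Stata
summarize mpg           // descriptive statistics for one variable

/* A block comment is one way to skip commands without deleting them:
list make mpg in 1/5
*/

count                   // number of observations
```

Typing do example in the Command window, or clicking the Do button in the Do-file Editor, runs every command in order, which is what makes the analysis reproducible.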

Data Entry

Stata Commands

1. cd: Change directory
2. dir or ls: Show files in current directory
3. insheet: Read ASCII (text) data created by a spreadsheet
4. infile: Read unformatted ASCII (text) data
5. infix: Read ASCII (text) data in fixed format
6. input: Enter data from keyboard
7. save: Store the dataset currently in memory on disk in Stata data format
8. use: Load a Stata-format dataset
9. count: Show the number of observations
10. list: List values of variables
11. clear: Clear the entire dataset and everything else
12. memory: Display a report on memory usage
13. set memory: Set the size of memory

Ways to enter data

• Change the directory to the folder you like
  – cd c:\Stata
• Comma-separated values (.csv) format files
  – insheet using test.csv, clear (with variable names)
  – infile gender id race ses schtyp str10 prgtype read write math science socst using hs0.raw, clear (without variable names)
• Stata (.dta) files
  – use test
• Type in data one by one
  – input id female race ses str3 schtype prog read write math science socst
  – end (when you are done)
• What’s in the dataset?
  – describe
  – list
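
The keyboard-entry route can be sketched as follows. The three rows of values and the file name mytest are made up for illustration; only the command sequence follows the slide.

```stata
clear
* Declare the variables, then type one observation per line
input id female str3 schtype read write
1 0 "pub" 57 52
2 1 "pri" 68 59
3 1 "pub" 44 33
end

describe                 // variable names, storage types, labels
list                     // print every observation

save mytest, replace     // writes mytest.dta to the current directory
use mytest, clear        // read it back in Stata format
count
```

The str3 prefix tells input that schtype holds strings of up to three characters; without it, Stata would expect numbers.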

Data Management

Stata Commands

1. pwd: show current directory (pwd = print working directory)
2. keep if: keep observations if condition is met
3. keep: keep variables or observations
4. drop: drop variables or observations
5. append: append a data file to current file
6. sort: sort observations
7. merge: merge a data file with current file
8. codebook: show codebook information for file
9. label data: apply a label to a data set
10. order: order the variables in a data set
11. label variable: apply a label to a variable
12. label define: define a set of labels for the levels of a categorical variable
13. label values: apply value labels to a variable
14. encode: create numeric version of a string variable
15. rename: rename a variable
16. recode: recode the values of a variable
17. notes: apply notes to the data file
18. generate: creates a new variable
19. replace: replaces one value with another value
20. egen: extended generate, has special functions that can be used when creating a new variable

Merging two datasets

• test1 and test2 have the same variables but different subjects
    use test1
    append using test2
    save test12
• test3 and test4 have the same subjects and only share a link variable, e.g. ID
    use test3, clear
    sort id
    save test3, replace
    use test4, clear
    sort id
    save test4, replace
    use test3
    merge id using test4
    save test34
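
A self-contained sketch of both operations on toy data is given below. The file names part1, part2 and ages and all values are hypothetical. It uses the merge 1:1 syntax available in Stata 11 and later rather than the older merge id using form shown on the slide; the newer form does not require sorting first.

```stata
* Same variables, different subjects: stack the rows with append
clear
input id score
1 80
2 75
end
save part1, replace

clear
input id score
3 90
4 65
end
save part2, replace

* Same subjects, different variables: link on id with merge
clear
input id age
1 30
2 41
3 25
4 37
end
save ages, replace

use part1, clear
append using part2          // 2 + 2 = 4 observations
merge 1:1 id using ages     // adds age to each subject
tabulate _merge             // 3 = matched in both files
list
```

Checking _merge after every merge is a cheap way to catch subjects that appear in only one of the two files.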

Play with Variables

• use test
• label variable gender "Male"
• rename gender male
• gen female=1-male
• order id male female
• encode prgtype, gen(prog)
• codebook prog
• keep if female==1 (delete male)
• drop female

Dummy Variables

• A categorical variable with K possible levels
• Need K-1 dummy variables (one as the reference)
• Dummy variables are convenient for regression analysis
• How to create dummy variables?
• Use generate command
  – gen female=1-gender
• Use tabulate command
  – tabulate gender, gen(male)
• Use factor variables
  – xi i.gender
  – list, clean
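
To make the K-1 rule concrete, here is a small sketch with a hypothetical three-level variable race (the data are invented). tabulate with generate() creates one indicator per level, and dropping one of them leaves K-1 = 2 dummies with the dropped level as the reference.

```stata
clear
* race is a made-up categorical variable with K = 3 levels
input id race
1 1
2 2
3 3
4 2
end

tabulate race, generate(race_)   // creates race_1, race_2, race_3
list, clean

* Keep K - 1 = 2 dummies; level 1 becomes the reference category
drop race_1
describe
```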

Data Description

Stata Commands

1. describe: describe a dataset
2. log: create a log file
3. summarize: descriptive statistics
4. tabstat: table of descriptive statistics
5. table: create a table of statistics
6. stem: stem-and-leaf plot
7. graph: high resolution graphs
8. kdensity: kernel density plot
9. histogram: histogram for continuous and categorical variables
10. tabulate: one- and two-way frequency tables
11. correlate: correlations
12. pwcorr: pairwise correlations

Example: raw data

• log using test.txt, text replace
• use lead
• describe
• sum maxfwt, detail
• histogram maxfwt, by(Group) normal
• graph box maxfwt, by(Group)
• stem maxfwt
• kdensity maxfwt
• tab Group sex
• pwcorr ageyrs maxfwt, sig
• pwcorr ageyrs maxfwt if sex==1, sig (male only)
• pwcorr ageyrs maxfwt fwt_r, sig
• log close

Note: the sig option (significance levels) belongs to pwcorr; correlate does not accept it.

Example: grouped data

• use group (a grouped dataset)
• sum age [fweight=freq], detail
• hist age [fweight=freq]
• Pretty much the same as raw data. Just need to specify the weight.
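
A short sketch of why the frequency weight gives the same answer as raw data: the grouped table below is invented for illustration, and expand is used to rebuild the equivalent raw dataset.

```stata
clear
* Hypothetical grouped data: each row is an age and how often it occurs
input age freq
20 3
30 5
40 2
end

summarize age [fweight=freq]     // treats the table as 10 observations
scalar wmean = r(mean)
scalar wN    = r(N)

expand freq                      // duplicate each row freq times
summarize age                    // same mean from the expanded raw data
display "weighted mean = " wmean "   raw mean = " r(mean)
```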

Some Review

• Use both location and spread measures to summarize a dataset

• Mean, standard deviation and range are easily affected by extreme observations

• Median and inter-quartile range are less affected by extreme observations

• Coefficient of variation (standard deviation divided by mean) removes the scale effect.
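
These summaries can be pulled from the results that summarize leaves behind. The sketch below uses the auto dataset shipped with Stata as a stand-in, since the workshop data are not assumed to be available; the scalar names are arbitrary.

```stata
sysuse auto, clear
summarize mpg, detail

* Location and spread that are sensitive to extreme observations
scalar cv = r(sd)/r(mean)        // coefficient of variation (unit free)
scalar rng = r(max) - r(min)     // range

* Location and spread that are less sensitive to extreme observations
scalar med = r(p50)              // median
scalar iqr = r(p75) - r(p25)     // inter-quartile range

display "CV = " cv "   range = " rng
display "median = " med "   IQR = " iqr
```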

Estimation

Estimation of Parameters

• Binomial distribution
  – Parameters n (usually known) and p
  – How to estimate p?
• Poisson distribution
  – Parameter λ
  – How to estimate λ?
• Normal distribution
  – Parameters µ and σ²
  – How to estimate µ and σ²?
  – σ² unknown: use the t distribution

Stata Commands

• Raw data
  – ci [varlist] [if] [in] [weight] [, options]
    • confidence intervals for mean, proportion (option b, binomial) and count (option p, Poisson)
• Summary statistics
  – cii #obs #mean #sd [, ciin_option]
    • Normal
  – cii #obs #succ [, ciib_options]
    • Binomial

Examples

• gen female=sex-1
• tab female Group
• What’s the average maxfwt for females in the exposed group?
  – ci maxfwt if female==1 & Group==2 (raw data)
  – sum maxfwt if female==1 & Group==2
  – cii 16 59 20.887, level(95) (summary statistics)
• What’s the proportion of females in the exposed group?
  – gen expose=Group-1
  – ci expose if female==1, b
  – cii 48 16, level(95)
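
For the mean, the interval can also be worked out by hand from the summary statistics on the slide (n = 16, mean = 59, sd = 20.887) using mean ± t(n-1, 0.975) × sd/√n. This is a sketch of the textbook t-based formula; the scalar names are arbitrary.

```stata
scalar nobs  = 16
scalar xbar  = 59
scalar sdev  = 20.887

scalar se    = sdev/sqrt(nobs)              // standard error of the mean
scalar tcrit = invttail(nobs - 1, 0.025)    // upper 2.5% point of t with 15 df
scalar ci_lo = xbar - tcrit*se
scalar ci_hi = xbar + tcrit*se

display "95% CI for the mean: " ci_lo " to " ci_hi
```

The result is roughly 47.9 to 70.1. The binomial interval for the proportion is an exact calculation by default, so it is not reproduced by a simple formula here.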

Hypothesis Testing

Stata Commands (mean)

• ttest
  – Raw data
    • ttest varname == # [if] [in] [, level(#)]
    • ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)]
    • ttest varname1 == varname2 [if] [in] [, level(#)]
    • ttest varname [if] [in] , by(groupvar) [options1]
  – Summary statistics
    • ttesti #obs #mean #sd #val [, level(#)]
    • ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options2]

Examples

• One sample
  – Is the average maxfwt for females in the exposed group significantly lower than 45?
    • ttest maxfwt==45 if female==1 & Group==2
    • ttesti 16 59 20.887 45 (summary statistics)
• Two samples
  – Do females have a higher average maxfwt than males in the exposed group?
    • ttest maxfwt if Group==2, by(female)
    • sum maxfwt if female==0 & Group==2
    • ttesti 16 59 20.887 30 60.167 27.28
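
The one-sample statistic that ttesti reports can be checked by hand as t = (mean - 45)/(sd/√n) with n - 1 degrees of freedom. The sketch below uses the summary numbers from the slide; the scalar names are arbitrary.

```stata
scalar nobs  = 16
scalar xbar  = 59
scalar sdev  = 20.887
scalar mu0   = 45

scalar tstat   = (xbar - mu0)/(sdev/sqrt(nobs))
scalar p_upper = ttail(nobs - 1, tstat)     // P(T > t), for Ha: mean > 45
scalar p_lower = 1 - p_upper                // P(T < t), for Ha: mean < 45

display "t = " tstat "   df = " nobs - 1
display "P(T < t) = " p_lower "   P(T > t) = " p_upper

ttesti 16 59 20.887 45                      // same test from Stata
```

Because the sample mean (59) is above 45, the lower-tail p-value is large, so these data give no evidence that the mean is below 45.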

Stata Commands (variance)

• sdtest
  – Raw data
    • sdtest varname == # [if] [in] [, level(#)]
    • sdtest varname1 == varname2 [if] [in] [, level(#)]
    • sdtest varname [if] [in] , by(groupvar) [level(#)]
  – Summary statistics
    • sdtesti #obs {#mean | . } #sd #val [, level(#)]
    • sdtesti #obs1 {#mean1 | . } #sd1 #obs2 {#mean2 | . } #sd2 [, level(#)]

Examples

• One sample
  – Is the variance of maxfwt for females in the exposed group significantly greater than 100? (sdtest works on the standard deviation, so a variance of 100 is entered as 10.)
    • sdtest maxfwt==10 if female==1 & Group==2
    • sdtesti 16 59 20.887 10 (summary statistics)
• Two samples
  – Do females have a greater variation in maxfwt than males in the exposed group?
    • sdtest maxfwt if Group==2, by(female)
    • sum maxfwt if female==0 & Group==2
    • sdtesti 16 59 20.887 30 60.167 27.28
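
The one-sample variance test rests on the chi-squared statistic (n - 1)s²/σ0² with n - 1 degrees of freedom. A hand calculation with the slide's summary numbers (scalar names are arbitrary) can be compared with sdtesti:

```stata
scalar nobs   = 16
scalar sdev   = 20.887
scalar sigma0 = 10                           // hypothesized sd (variance 100)

scalar chi2stat = (nobs - 1)*sdev^2/sigma0^2
scalar p_upper  = chi2tail(nobs - 1, chi2stat)   // Ha: variance > 100

display "chi2(" nobs - 1 ") = " chi2stat "   upper-tail p = " p_upper

sdtesti 16 59 20.887 10                      // same test from Stata
```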

Stata Commands (proportion)

• prtest
  – Raw data
    • prtest varname == #p [if] [in] [, level(#)]
    • prtest varname1 == varname2 [if] [in] [, level(#)]
    • prtest varname [if] [in] , by(groupvar) [level(#)]
  – Summary statistics
    • prtesti #obs1 #p1 #p2 [, level(#) count]
    • prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]

Examples

• One sample
  – Are more than 50% of females in the exposed group?
    • prtest expose==0.5 if female==1
    • prtesti 48 0.3333333 0.5
• Two samples
  – Are there more females in the exposed group than the control group?
    • prtest female, by(expose)
    • tab expose female, r
    • prtesti 78 0.4103 46 0.3478
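
For the one-sample case, the z statistic is (p̂ - p0)/√(p0(1 - p0)/n), with the standard error evaluated under the null value. A hand calculation using the counts from the earlier slide (16 exposed out of 48 females; scalar names are arbitrary):

```stata
scalar nobs = 48
scalar phat = 16/48
scalar p0   = 0.5

scalar zstat   = (phat - p0)/sqrt(p0*(1 - p0)/nobs)
scalar p_upper = 1 - normal(zstat)           // Ha: proportion > 0.5
scalar p_two   = 2*normal(-abs(zstat))       // Ha: proportion != 0.5

display "z = " zstat "   P(Z > z) = " p_upper "   two-sided p = " p_two

prtesti 48 0.3333333 0.5                     // same test from Stata
```

The observed proportion (about 0.33) is below 0.5, so the upper-tail p-value is close to 1.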

Power and Sample Size

Stata Command (sample size)

• One sample
  – continuous
    • sampsi μ0 μ1, sd(.) p(.) a(.) onesam
    • sampsi 3500 3800, sd(420) p(.9) onesam
  – Binary proportions
    • sampsi p0 p1, p(.) onesam
    • sampsi 0.4 0.25, p(0.9) onesam
• Two samples
  – continuous
    • sampsi μ1 μ2, p(.) sd1(.) sd2(.) a(.)
    • sampsi 132.86 127.44, p(0.8) sd1(15.34) sd2(18.23)
  – Binary proportions
    • sampsi p1 p2, p(.)
    • sampsi 0.4 0.25, p(0.9)
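
As a rough check on the first one-sample example, the usual normal-approximation formula is n = ((z(1-α/2) + z(power)) × sd/(μ1 - μ0))², rounded up. The sketch assumes a two-sided α of 0.05 and assumes that this is the formula behind the sampsi result; the scalar names are arbitrary. In Stata 13 and later, the power command is the documented replacement for sampsi and may give a slightly different answer.

```stata
scalar mu0   = 3500
scalar mu1   = 3800
scalar sdev  = 420
scalar alpha = 0.05          // assumed two-sided significance level
scalar pwr   = 0.90

scalar z_a   = invnormal(1 - alpha/2)
scalar z_b   = invnormal(pwr)
scalar n_raw = ((z_a + z_b)*sdev/(mu1 - mu0))^2
scalar n_req = ceil(n_raw)   // round up to a whole subject

display "required n = " n_req " (unrounded " n_raw ")"
```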

Stata Command (power)

• One sample
  – continuous
    • sampsi μ0 μ1, sd(.) n(.) a(.) onesam
    • sampsi 84.4 90.1, sd(10.3) n(5) onesam onesided
  – Binomial proportion
    • sampsi p0 p1, n1(.) onesam
    • sampsi 0.25 0.4, n1(100) onesam
• Two samples
  – continuous
    • sampsi μ1 μ2, n1(.) n2(.) sd1(.) sd2(.) a(.)
    • sampsi 9 14, n1(100) n2(100) sd1(15.34) sd2(18.23)
  – Binomial proportions
    • sampsi p1 p2, n1(.) n2(.)
    • sampsi 0.4 0.25, n1(100) n2(150)
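
The one-sided, one-sample power in the first example can be approximated by hand as Φ(|μ1 - μ0|√n/sd - z(1-α)). The sketch assumes α = 0.05 and assumes that this normal approximation is what sampsi reports; the scalar names are arbitrary.

```stata
scalar mu0   = 84.4
scalar mu1   = 90.1
scalar sdev  = 10.3
scalar nobs  = 5
scalar alpha = 0.05          // assumed one-sided significance level

scalar shift = abs(mu1 - mu0)*sqrt(nobs)/sdev   // standardized difference
scalar pw    = normal(shift - invnormal(1 - alpha))

display "approximate power = " pw
```

With only five observations the power is low (about 0.34), which is why the sample size calculation is usually done before data collection.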

Useful links

• http://www.ats.ucla.edu/stat/stata/
• Once the D2L site is created, all of the handouts and related materials will be posted on the D2L site.