Introduction to Stata
SHRS, UQ, 7 September 2010
Asad Khan
Overview
• Sample Data
• Stata Interface
• Steps in a Stata Session
• Working Examples
• Dealing with Do-files
• Data Transformation
• Getting Help
Sample Data
Let's start with a dataset that contains responses to a
self-administered survey. This survey was conducted among
postgraduate students enrolled on a Biostatistics course
at the University of Sydney in the late 1990s.
– The questionnaire includes 15 questions with a
variety of response options (continuous, categorical)
– A total of 80 students completed the questionnaire
Suppose the data have been recorded in a Stata data file
(bioqdata.dta) that we want to use in this workshop.
Stata Interface
• Open the Stata program by selecting Stata from the Program menu
• In a Stata session, a number of windows are available to Stata users:
– Results window - shows the numeric output from the Stata commands entered
– Command window – this is where you enter Stata commands to perform analyses
– Variables window – lists the variables that are available in the current dataset
– Review window – lists the commands that have already been run
• Optional windows: Graph, Data editor, Do-file editor
[Figure: the Stata main screen, showing the Review, Command, Variables, Results and Graph windows. Commands are entered in the Command window.]
Steps in a Stata Session
Step 1: Open a log file to store results
Step 2: Load/enter your data into Stata
Step 3: Explore, manipulate and analyse
Step 4: Close the log file to save results
Step 1: Open a log file to store results
At the beginning of a Stata session, it is recommended to
open a log file to save your commands and results, by
selecting this icon:
You can also open a log file by typing (in command window):
Stata: log using h:\myfile.log
Now you have a log file (myfile.log) that will record all your
commands and the (numeric) output that you see in the
Results window.
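A minimal sketch of a slightly more defensive way to open the log, assuming the same h:\myfile.log path as above. The replace and text options are not used in the original slide: replace overwrites an existing log of the same name, and text requests a plain-text log.

```stata
* Close any log left open from an earlier session (no error if none is open)
capture log close
* Open a plain-text log, overwriting an existing file of the same name
log using "h:\myfile.log", replace text
```

Without replace, Stata refuses to open a log file that already exists, which is a common stumbling block when a session is repeated.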
Step 2: Load/enter your data into Stata
• To open/load your Stata data-file (with .dta extension), click on this icon and browse for “bioqdata.dta”
Alternatively,
• Go to: File → Open → Browse for the Stata data-file “bioqdata.dta”
• You can also load your data-file into Stata by typing the command:
Stata: use h:\bioqdata.dta, clear
You can enter, edit or view your data in Stata using the Data Editor (an optional window).
• To open a Data Editor window, click on this icon in the Stata window
• Data Editor can also be opened by simply typing:
Stata: edit
• Either enter your data or copy and paste data from a spreadsheet
• Changes in data are not permanent until you save them
[Figure: bioqdata.dta displayed in the Data Editor.]
Step 3: Exploration, manipulation and analyses
• To view the contents of your loaded data file:
Stata: describe
• To view the variables in your loaded data file:
Stata: codebook
Output of describe:
Contains data from F:\Workshops\bioqdata.dta
  obs:            80
 vars:            15                          3 Sep 2010 11:48
 size:         2,160 (99.8% of memory free)

              storage  display     value
variable name   type   format      label      variable label
sid             byte   %8.0g
age             byte   %8.0g
sex             byte   %8.0g       sexlabel
height          float  %9.0g
weight          int    %8.0g
pulse1          byte   %8.0g
bthord          byte   %8.0g
marital         byte   %8.0g
children        byte   %8.0g
sleep           float  %9.0g
course          byte   %8.0g
pt_ft           byte   %8.0g
smoke           byte   %8.0g
chol            byte   %8.0g
pulse2          int    %8.0g
• To view a particular variable, e.g. sex:
Stata: codebook sex
• To view observations from the loaded data file:
Stata: list sid age sex in 1/5
Output of codebook sex:
sex                                              (unlabeled)
           type:  numeric (byte)
          label:  sexlabel
          range:  [1,2]              units:  1
  unique values:  2              missing .:  0/80
     tabulation:  Freq.   Numeric  Label
                     33         1  Male
                     47         2  Female

Output of list sid age sex in 1/5:
       sid   age   sex
  1.     1    25   Female
  2.     2    29   Male
  3.     3    40   Male
  4.     4    46   Female
  5.     5    43   Female
• To draw a histogram of a variable, e.g. sleep:
Stata: histogram sleep
• To identify cases with sleep > 15 hrs:
Stata: list sid sleep if sleep>15
       sid   sleep
 46.    46       .
 62.    62      28
[Figure: histogram of sleep, with sleep (0 to 25) on the x-axis and density on the y-axis.]
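Note that the listing for sleep>15 includes student 46, whose sleep value is missing (.). Stata treats a missing numeric value as larger than any number, so it satisfies sleep>15. A minimal sketch of how to exclude missing values, using the built-in missing() function:

```stata
* List only genuinely large values, excluding missing sleep
list sid sleep if sleep > 15 & !missing(sleep)

* Count how many students have a missing value for sleep
count if missing(sleep)
```

The same guard is worth adding to any "greater than" condition on a variable that may contain missing values.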
• To draw histograms by group using Menu bar, go to
Graph → Histogram → (Main → By → Density plots)
[Figure: histograms of height (in cm) by sex, one panel for Male and one for Female, each overlaid with a normal density curve and a kernel density estimate; density on the y-axis.]
These graphs can be obtained by typing the command:
Stata: hist height, by(sex) normal kdensity
To create a new variable using Menu bar, go to
Data → Create or change variables → Create new variable
bmi = weight (in kg) / [height (in m)]^2
• To generate BMI, type the command:
Stata: gen bmi=weight/(height/100)^2
• To recode into the same variable (this overwrites bmi), type the command on one line:
Stata: recode bmi (min/18.4=1)(18.5/24.9=2)(25/29.9=3)(30/max=4)
• To recode into a different variable, type one of these commands on one line:
(a) Stata: recode bmi (min/18.4=1)(18.5/24.9=2)(25/29.9=3)(30/max=4), gen(bmi_r)
(b) Stata: recode bmi (min/18.4=1 "Underweight")(18.5/24.9=2 "Normal weight")(25/29.9=3 "Overweight")(30/max=4 "Obese"), gen(bmi_n)
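Because bmi is continuous, the recode ranges used for BMI leave small gaps: a value such as 18.45 falls between 18.4 and 18.5, matches no rule, and is left unchanged. One way to avoid this, sketched here with explicit inequalities, is shown below. The names bmi_cat and bmi_catlabel are hypothetical, chosen for illustration only.

```stata
* Build BMI categories with explicit cut-points so no value falls in a gap
gen bmi_cat = .
replace bmi_cat = 1 if bmi < 18.5
replace bmi_cat = 2 if bmi >= 18.5 & bmi < 25
replace bmi_cat = 3 if bmi >= 25 & bmi < 30
* Missing bmi counts as larger than any number, so exclude it explicitly
replace bmi_cat = 4 if bmi >= 30 & !missing(bmi)

* Attach value labels to the new categories
label define bmi_catlabel 1 "Underweight" 2 "Normal weight" 3 "Overweight" 4 "Obese"
label values bmi_cat bmi_catlabel
tab bmi_cat
```

This assumes bmi still holds the continuous values, that is, it has not already been overwritten by a recode into the same variable.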
• To assign a variable label to pt_ft, type the commands:
Stata: tab pt_ft
Stata: label var pt_ft "Type of candidature"
Stata: tab pt_ft
• To assign value labels to pt_ft, type the commands:
Stata: tab pt_ft
Stata: label define pt_ftlabel 1 "Part-time"
Stata: label define pt_ftlabel 2 "Full-time", add
Stata: label values pt_ft pt_ftlabel
Stata: tab pt_ft
• To examine whether the average height differs between male and female students, go to: Statistics → Summaries, tables and tests → Classical tests of hypotheses → Two-group mean-comparison test
This output can also be obtained by typing the command:
Stata: ttest height, by(sex)
Two-sample t test with equal variances
   Group |  Obs      Mean   Std. Err.  Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------
    Male |   33  176.3333   1.548867   8.897565    173.1784   179.4883
  Female |   47  163.1383   1.020681   6.997439    161.0838   165.1928
---------+--------------------------------------------------------------
combined |   80  168.5812   1.136368   10.16399    166.3194   170.8431
---------+--------------------------------------------------------------
    diff |       13.19504    1.77895               9.653417   16.73665
    diff = mean(Male) - mean(Female)                      t =  7.4173
Ho: diff = 0                             degrees of freedom =      78
    Ha: diff < 0            Ha: diff != 0              Ha: diff > 0
 Pr(T < t) = 1.0000    Pr(|T| > |t|) = 0.0000     Pr(T > t) = 0.0000
• To examine the relationship between weight and height
through a scatter plot, type the command on one line:
Stata: scatter weight height || lfit weight height || lowess weight height
[Figure: scatter plot of weight against height, with the fitted regression line (Fitted values) and a lowess curve overlaid.]
• To obtain correlation between weight and height, type
Stata: corr weight height
             |   weight   height
-------------+------------------
      weight |   1.0000
      height |   0.5264   1.0000
• To construct a simple linear regression model for weight
(as DV) and height (as IV)
Stata: regress weight height
Analysis of variance and fit statistics:
      Source |       SS       df       MS         Number of obs =      80
-------------+------------------------------      F(  1,    78) =   29.90
       Model | 7033.36961      1  7033.36961      Prob > F      =  0.0000
    Residual | 18346.5804     78  235.212569      R-squared     =  0.2771
-------------+------------------------------      Adj R-squared =  0.2679
       Total |   25379.95     79   321.26519      Root MSE      =  15.337
Coefficient estimates:
      weight |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+--------------------------------------------------------------
      height |  .9283337   .1697668    5.47   0.000     .5903541   1.266313
       _cons | -88.47466   28.67081   -3.09   0.003    -145.5539  -31.39545
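To illustrate how the fitted model can be used, the sketch below computes a predicted weight and saves fitted values and residuals. With the estimates reported in the regression output, a student 170 cm tall has a predicted weight of about -88.47 + 0.928 × 170 ≈ 69.3 kg. The variable names weight_hat and weight_res are hypothetical.

```stata
regress weight height

* Predicted weight for a student 170 cm tall, from the stored coefficients
display _b[_cons] + _b[height]*170

* Fitted values and residuals for every observation
predict weight_hat, xb
predict weight_res, residuals

* Residuals against fitted values as a quick check of the linear fit
scatter weight_res weight_hat, yline(0)
```

The predict commands must be run after regress, because they use the most recent estimation results held in memory.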
• If you prefer to run an analysis from a dialogue box, you can open it with the Stata command db
• To obtain a dialogue box to run regression, type:
Stata: db regress
• To obtain a dialogue box to draw a histogram, type:
Stata: db histogram
Working Examples
• Frequency tables
Stata: tab1 sex chol pt_ft
• Graphs
– Stem-and-leaf plots
Stata: stem sleep
– Box plots by a variable
Stata: graph box pulse1, by(sex)
• Proportion (& CI)
Stata: proportion pt_ft
• Summary (descriptive) statistics
Stata: sum pulse2, detail
• To generate a variable
Stata: gen sleep1=sleep if sleep<=15
• To assign a value to a variable
Stata: replace sleep=7 if sleep==28
• To generate a dichotomous (1, 0) variable
Stata: gen obese=(bmi>=30)
• To generate dummy variables for each value of a variable
Stata: tab marital, gen(marital_dmy)
• To keep only selected variables in the data file
Stata: keep age sex bmi
• To keep only selected observations (keep does not accept a variable list and an if condition in the same command, so use two commands)
Stata: keep if sex==2
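The keep command discards the other variables or observations from the data in memory, so it helps to be able to get the full dataset back. A minimal sketch using preserve and restore, assuming bmi has already been generated and that 2 is the code for Female:

```stata
* Take a temporary copy of the data in memory
preserve

* Keep female students only, then keep three variables
keep if sex == 2
keep age sex bmi
summarize age bmi

* Bring back the full dataset as it was before preserve
restore
```

Changes made with keep only affect the copy in memory; the file on disk is unchanged until the data are saved.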
• To test whether the average sleep differs between male and female students:
Stata: tab sex, sum(sleep)
Stata: ttest sleep, by(sex)
• To assess equality of two independent group means using summary data (n, mean and SD for each group in turn):
Stata: ttesti 12 25.34 2.05 15 24.94 2.44
• To investigate whether pulse rate has increased after one minute of exercise, using a paired t-test:
Stata: ttest pulse1==pulse2
• To examine whether average age is different for students in different courses, using one-way ANOVA:
Stata: oneway age course
Also try this with dialogue box using db command:
Stata: db oneway
… keep working on your analyses
In a Stata session, you can ask Stata –
• Not to pause or display the –more– message by typing:
Stata : set more off
• To temporarily stop logging by typing the command:
Stata : log off
• To resume logging by typing the command:
Stata : log on
To save time in editing, consider suspending the log (log off)
when it is not needed, and resuming it afterwards (log on).
Step 4: Close your log file to save results
• After you are done with your analyses, you need to close
the log file by clicking on this icon
and then selecting 'Close log file'
• You can also close your log file by typing the command:
Stata: log close
• To view your log file, click on this icon
and select 'View snapshot of log'
• To exit from Stata, type the command:
Stata: exit
• Once you close your log file, you can view the log using
any text editor or word processor and print the log as you
would print any text file.
• If your log file has .smcl extension, you can translate that
into a text file using the command:
Stata : translate h:\mylog.smcl h:\mylog.txt
• To print a graph, select Print from File menu in a
Graph window
• You can place your graph into a Word document by selecting Copy from Edit menu from Graph window and
then Paste it into your document.
Dealing with Do-files
In a Stata session, you can write and save your Stata
commands in a file called a do-file (like a syntax file in
SPSS).
A do-file is simply a text file that contains a list of Stata
commands that you can run during a Stata session or
save for later use.
Any command that can be executed from the command
line can be placed (by copy and paste) in a do-file.
A do-file allows you to execute several commands at once
and also makes it easier to identify and correct mistakes.
• To open a Do-File Editor, select this icon in Stata window
• A new do-file can be created by typing the command:
Stata: doedit
• To reopen a do-file (say, ana_1.do), type the command:
Stata: doedit h:\ana_1.do
• To run the do-file (ana_1.do), type the command:
Stata: do h:\ana_1.do
• A do-file is typically saved with .do extension.
An example of a do-file
To run the commands in a do-file, click on this icon
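The example do-file on this slide was a screenshot and is not reproduced here. As a stand-in, the sketch below assembles commands used earlier in this workshop into one do-file. The log file name ana_1.log is an assumption, and the h:\ paths follow the earlier examples.

```stata
* ana_1.do: sketch of a do-file for the bioqdata workshop
capture log close
set more off
log using "h:\ana_1.log", replace text

* Load the data and check its contents
use "h:\bioqdata.dta", clear
describe

* Body mass index from weight (kg) and height (cm)
gen bmi = weight/(height/100)^2
label var bmi "Body mass index (kg/m^2)"

* Compare mean height between male and female students
ttest height, by(sex)

* Simple linear regression of weight on height
regress weight height

log close
```

Lines beginning with an asterisk are comments, which makes it easier to see later what each step was for.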
Data Transformation
Stat/Transfer, a file conversion utility, makes it easy
to move data among different spreadsheets and
statistical programs by providing a fast and reliable way
to convert data files from one format to another.
We can use Stat/Transfer to convert SPSS or Excel data-files
(e.g. bioqdata.sav or bioqdata.xls) into a Stata data-file
with .dta extension (e.g. bioqdata.dta).
NB: Stat/Transfer is available for Windows, Mac OS X, and Unix.
Open the software by choosing Stat/Transfer from the
Program list
You will need to select the input file type, the file itself,
the output file type and the output file.
Finally, click the "Transfer" button to complete the conversion
Stat/Transfer will show you the progress it's making as it converts the file
Ref: Keown L (2004) Producing efficient data files using Stat/Transfer,
Information and Technical Bulletin 1(1):13-19.
Import Excel Data into Stata
• Launch Excel and open your Excel file (mydata.xls)
• Save your Excel file as an XML Spreadsheet 2003 (mydata.xml)
• Launch Stata and go to:
File > Import > XML Data > Browse for “mydata.xml”
– Tick “Excel Spreadsheet” box
– Tick “First row in variable names” box
– Click 'OK'
You can also use a Stata command to import the Excel spreadsheet:
Stata: xmluse "H:\mydata.xml", doctype(excel) firstrow
• Save the data as a Stata dataset using File > Save As or the save command
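Put together, the import and save steps might look like the sketch below. The output name H:\mydata.dta is an assumption. The second part is an alternative that applies only to Stata 12 or later, where import excel reads Excel workbooks directly without the XML step.

```stata
* Stata 10/11: read the XML Spreadsheet 2003 file, first row as variable names
xmluse "H:\mydata.xml", doctype(excel) firstrow clear

* Save the imported data as a Stata dataset
save "H:\mydata.dta", replace

* Alternative for Stata 12 or later (use instead of xmluse)
import excel using "H:\mydata.xls", firstrow clear
save "H:\mydata.dta", replace
```

The clear option replaces whatever data are currently in memory, so save any unsaved work first.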
Getting Help
Within Stata:
• To get help on a particular command (e.g. regress)
Stata: help regress
• To obtain all references to a topic (e.g. logistic)
Stata: search logistic
• To find relevant commands on a topic (e.g. anova)
Stata: findit anova
Online Stata support @ www.stata.com/support
AU/NZ distributor for Stata & Stat/Transfer
www.survey-design.com.au
– Stata GradPlan arrangements for students
http://www.survey-design.com.au/gradplan.html
Some References
• Acock A (2008) A Gentle Introduction to Stata, 2nd edition,
Stata Press.
• Hamilton L (2009) Statistics with Stata, Version 10, 7th
edition, Stata Press
• Juul S (2008) An Introduction to Stata for Health
Researchers, 2nd edition, Stata Press
• Hills M & De Stavola B (2007) A Short Introduction to
Stata for Biostatistics, 1st edition, Timberlake Consultants
• Mitchell M (2008) A Visual Guide to Stata Graphics, 2nd
edition, Stata Press
• Stata Manuals (Release 10)
Thank you
Comments
Questions