STATA TUTORIAL: LAB 1
1. STATA windows
• The command window
• The viewer/results window
• The review of commands window
• The variable window
2. Working with STATA
A. Opening Data
B. Using a “log” file
C. Useful Commands
D. Using a “do” file
A. Opening Data
• The data browser shows you your data.
• Check it frequently, especially after commands you are unsure about.
A. Opening your data
If your data is in STATA format, then:
• Go to “File” > “Open”, browse to the location where the data is stored, and double-click the file.
• Or, in the Command window, type: use "Fill In Correct Path Name\filename.dta"
Practice with “Wage1.dta”
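A minimal sketch of loading the practice data from the Command window. It assumes Wage1.dta sits in the current working directory; otherwise put the full path inside the quotes.

```stata
* show the current working directory (where Stata looks for relative paths)
pwd
* load the data; "clear" drops whatever data is currently in memory
use "Wage1.dta", clear
* open the read-only browser to inspect what was loaded
browse
```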
A. Opening your data - Data editor/browser
• The data editor/data browser shows you your data.
• Go to “Window” > “Data Editor”, or click on the “Data Editor” or “Data Browser” icons.
• Editor: you can modify data by typing in a cell, like Excel. Browser: locked, so you can’t make changes.
• It is good to look at the data when you load it, or after commands, so that you understand its structure.
A. Opening your data - Variable Window
Now that you have data loaded, you can see the variables included in the data listed in the variable window.
• Name: name of the variable
• Label: description of what the variable is
• Type/Format: how STATA stores (type) and displays (format) the variable
• Click on a variable and it appears in the command window.
A. Opening your data - What do the variables look like?
Variables: wage educ exper tenure nonwhite female married numdep smsa northcen south west construc ndurman trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq
What values do they take?
• wage...tenure and numdep are actual numbers.
• nonwhite...servocc (except numdep) take values of 0 or 1: qualitative measures of some personal characteristics.
• lwage...tenursq are transformations of other variables (ln, square).
A. Opening your data (advanced)
• If your data is a comma-delimited file: insheet using "filename.txt"
• If your data is a raw data file: it must have a dictionary file, and you must use the “infile” command: infile using "dictionaryname.dct"
• The dictionary file will refer to data that has a “.dat” or “.raw” extension.
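The import commands above might look like this in practice. The file names are the placeholders from the slide, not real files; pick the one command that matches your data. import delimited is the newer counterpart to insheet in Stata 13 and later.

```stata
* comma-delimited text file (placeholder file name)
insheet using "filename.txt", comma clear

* Stata 13 and later: the newer command for delimited text
import delimited using "filename.txt", clear

* raw data described by a dictionary file (placeholder file name)
infile using "dictionaryname.dct", clear
```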
B. The “log” file
• The log file is an “output file”.
• It creates and saves a log with all the actions performed by STATA and all the results.
• How to open/close? Go to “File” > “Log” > “Begin”, and to “File” > “Log” > “Close”.
• How to view it later? Go to “File” > “Log” > “View” and search for your filename, keeping in mind it has the extension “.log”.
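The same steps can be done with commands rather than menus. A minimal sketch, assuming a hypothetical log name lab1.log and Wage1.dta in the working directory:

```stata
* start a plain-text log; "replace" overwrites an existing file of the same name
log using "lab1.log", text replace

use "Wage1.dta", clear
summarize wage

* stop logging and close the file
log close

* open the saved log in the Viewer
view "lab1.log"
```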
C. Useful Commands
“describe”:
• STATA will list all the variables, their labels, and types, and tell you the number of observations.
• Two types of variables:
  1. Numerical
  2. String (usually appear in red in the data browser)
• You can convert a string variable to numerical using the “destring” command, e.g. destring var1, replace or destring var1, force replace (with force, entries that are not numbers become missing values).
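A small sketch of destring on made-up data, so it can be run without any file. The variable names var1 and var1_num and the three values are invented for illustration.

```stata
clear
* build a tiny dataset with one string variable
input str5 var1
"12"
"7.5"
"n/a"
end

describe

* generate() keeps the original string and writes the numbers to a new variable;
* force converts entries that are not numbers (here "n/a") to missing
destring var1, generate(var1_num) force
list
```

Using generate() rather than replace keeps the original strings, so you can check what force turned into missing values.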
C. Useful Commands
“summarize” (abbreviations: sum, summ)
• Tells STATA to compute summary statistics (mean, standard deviation, and so forth) for all variables.
• Useful to identify outliers and get an idea of your data.
• e.g. summarize (will do all variables)
• e.g. summarize wage educ (just does wage and educ; note, no “,” between variables)
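A short sketch of summarize and the results it leaves behind in r(), assuming Wage1.dta is in the working directory. After summarizing several variables at once, r() holds results for the last one only, so each variable is summarized separately here.

```stata
use "Wage1.dta", clear

summarize wage
display "Observations: " r(N)
display "Mean wage:    " r(mean)

summarize tenure
display "Tenure ranges from " r(min) " to " r(max)

* the detail option adds percentiles, which helps when looking for outliers
summarize wage, detail
```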
C. Useful Commands
• How many observations are there?
• What is the average value of wage?
• What is the min and max of tenure?
C. Useful Commands
“tabulate” (abbreviation: tab)
• Shows the frequency and percent of each value of the variable in the dataset.
• e.g. tabulate tenure
• e.g. tab wage (long list; press the space bar to display all of it)
• e.g. tab educ female (gives education by gender)
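C. Useful Commands
“generate” (abbreviation: gen)
• Creates a new variable.
• gen weeklywage=wage*40, then tab weeklywage
• gen prevexper=exper-tenure
• gen lwage=ln(wage) fails because lwage already exists in the data; use a new name instead: gen newlwage=ln(wage)
• gen expersp=exper*exper, or equivalently with (exper)^2 on the right-hand side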
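The generate examples can be run together as below. This is a sketch assuming Wage1.dta is in the working directory; newexpersq is a made-up name chosen so it does not clash with the expersq already in the data.

```stata
use "Wage1.dta", clear

generate weeklywage = wage*40
generate prevexper  = exper - tenure
generate newlwage   = ln(wage)
generate newexpersq = exper^2

summarize weeklywage prevexper newlwage newexpersq

* report how the new variables line up with the ones shipped in the data
compare newlwage lwage
compare newexpersq expersq
```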
C. Useful Commands
“if” qualifier
• Allows you to use only a portion of the observations.
• tab wage if female==1
• sum exper if educ>=13
• gen expermomwkid=exper-1 if female==1 & numdep!=0 (observations that do not meet the condition get a missing value)
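A sketch of how the “if” qualifier behaves, assuming Wage1.dta is in the working directory. The last command shows a common pitfall: Stata treats missing values as larger than any number, so they need to be excluded explicitly.

```stata
use "Wage1.dta", clear

* how many observations satisfy the condition?
count if female==1 & numdep!=0

generate expermomwkid = exper-1 if female==1 & numdep!=0

* everyone who fails the condition is set to missing (.)
count if missing(expermomwkid)

* exclude missing values explicitly when using > or >=
summarize wage if expermomwkid>10 & !missing(expermomwkid)
```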
C. Useful Commands
“reg”
• reg dependent variable independent variable(s)
• e.g. reg wage educ
C. SLR Wage regression
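. reg wage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  103.36
       Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
    Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------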
• An increase in education by 1 unit (year) is predicted to increase hourly wage by $0.54; an increase of 6 years by 6*$0.54=$3.24.
• R-squared=0.1648; variation in education explains about 16.5% of the variation in wages.
• When educ=0, wage is predicted to be -$0.90.
• The standard error of the educ coefficient is 0.0532, so its estimated variance is 0.0532^2, about 0.0028.
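The numbers in these bullets can be pulled out of the regression results directly. A sketch, assuming Wage1.dta is in the working directory; _b[], _se[] and e(r2) are what Stata stores after regress.

```stata
use "Wage1.dta", clear
regress wage educ

display "slope on educ:          " _b[educ]
display "effect of 6 more years: " 6*_b[educ]
display "intercept:              " _b[_cons]
display "std. error of slope:    " _se[educ]
display "variance of slope:      " _se[educ]^2
display "R-squared:              " e(r2)
```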
C. Reading the output table
• SSTotal: the total variability around the mean, $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
• SSResidual: the sum of squared errors, $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
• SSModel (aka SSE): $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
• Observe that SSModel = SSTotal - SSResidual. Note that SSModel / SSTotal is equal to 0.1648, the value of R-squared.
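A quick check of the sum-of-squares identity using the results Stata stores after regress (e(mss) is SSModel, e(rss) is SSResidual). This is a sketch that assumes Wage1.dta is in the working directory.

```stata
use "Wage1.dta", clear
quietly regress wage educ

display "SSModel    = " e(mss)
display "SSResidual = " e(rss)
display "SSTotal    = " e(mss) + e(rss)

* the ratio SSModel/SSTotal should match the reported R-squared
display "SSModel/SSTotal = " e(mss)/(e(mss) + e(rss))
display "e(r2)           = " e(r2)
```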
C. Reading the output table
• Coefficients: wagePredicted = -0.9048516 + 0.5413593*educ
• Statistics (Ch. 4): t and P>|t|. These columns provide the t-value and 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0.
• [95% Conf. Interval]: this shows a 95% confidence interval for the coefficient. (The coefficient is not statistically significant at the 5% level if the confidence interval includes 0.)
C. Reading the output table
After the regression, type:
• predict wagehat, xb
  Gives the predicted value of wage, given that observation’s value of education.
• predict uhat, resid
  Gives the portion of wage that is not explained by the independent variable(s).
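A sketch of the two predict commands in use, assuming Wage1.dta is in the working directory. The variable check is an invented name, used only to show that fitted value plus residual gives back the observed wage.

```stata
use "Wage1.dta", clear
regress wage educ

predict wagehat, xb
predict uhat, resid

* fitted value plus residual reproduces the observed wage
generate check = wagehat + uhat
list wage educ wagehat uhat check in 1/5

* OLS residuals average to zero when the model includes a constant
summarize uhat
```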
C. Useful Commands
• “replace”: replace a value with a new one
  replace wage=4 if wage<4
• “drop”: drop an entire variable or just some observations
  drop prevexper
  drop if educ<=8
• “keep”: keep only the listed variables or the matching observations
  keep wage educ
  keep if educ>=8
Be careful with these commands!!
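One way to be careful is to wrap destructive commands in preserve and restore, which take a snapshot of the data and bring it back afterwards. A sketch, assuming Wage1.dta is in the working directory:

```stata
use "Wage1.dta", clear
count

* take a snapshot of the data in memory
preserve

drop if educ<=8
count
summarize wage

* bring back the full dataset
restore
count
```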
C. Operators
<          less than
>          greater than
<=         less than or equal to
>=         greater than or equal to
==         equal to
!= or ~=   not equal to
&          and
|          or
E. The “do” file
• A text file in which you can type all your commands and store them.
• Helpful for keeping a record of the commands you run in case you want to re-run them later.
• How to open/save a do file? Go to “Window” > “Do-File Editor”, or click on “New Do-File Editor”. Save the do file (.do).
• To open a saved do file, open a new do-file editor and search for where you saved it.
E. The “do” file:
Comments in your do file:
• STATA ignores text placed between /* and */, and text on a line that starts with * (it does not execute it).
• These lines can be used to describe what the commands are doing, or to write comments.
/*the following command summarizes the variable wage*/
sum wage
E. The “do” file
From the STATA do-file editor:
• Click “do” for STATA to execute all commands.
• You can highlight lines and click “do” to execute only the highlighted command lines.
• Click “run” for STATA to execute all commands, but you won’t see results in the viewer/results window.
All the commands in a do-file can be typed into the command window and run from there, but a do-file is helpful if you want to do the same thing over and over.
E. The “do” file
Each command must have its own line.
Stata will not run:
sum wage sum educ
But will run:
sum wage
sum educ
or:
sum wage educ
F. Save your data
Saving in Stata format:
• save "Type in correct path name\file name.dta"
• Or go to “File” > “Save” or “Save As”
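A sketch of saving from a do-file, assuming Wage1.dta is in the working directory. Wage1_lab1.dta is a hypothetical file name, chosen so the original data file is left untouched.

```stata
use "Wage1.dta", clear
generate weeklywage = wage*40

* "replace" allows overwriting Wage1_lab1.dta if it already exists;
* without it, save stops with an error rather than overwrite a file
save "Wage1_lab1.dta", replace
```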
G. Other Commands
• Increasing memory, variables: set memory 200m (needed only in Stata 11 and earlier; newer versions manage memory automatically) and set maxvar 400
• Clear the file: clear
• For long commands: #delimit ; tells STATA that each STATA command ends with a semicolon instead of a line break. Do not forget the “;”, and write it even after the comment lines that start with *.
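A sketch of a do-file fragment using the semicolon delimiter, assuming Wage1.dta is in the working directory. #delimit works only inside do-files, not in the Command window; #delimit cr switches back to one command per line.

```stata
#delimit ;

use "Wage1.dta", clear ;

* comment lines that start with a star also end with a semicolon ;

regress wage educ exper
    female married
    numdep smsa ;

#delimit cr

* back to normal: one command per line
summarize wage
```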
G. Other Commands
• sort
  e.g. sort educ
  e.g. sort educ female
• by educ: summarize wage
  (Note: you must sort by educ first before you can use by educ)
• Graphs
  twoway (scatter wage educ)
  histogram wage
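The sorting, by-group and graph commands can be tried together as below. A sketch, assuming Wage1.dta is in the working directory; bysort is a shortcut that sorts and applies the by prefix in one step.

```stata
use "Wage1.dta", clear

* sort first, then summarize wage within each education level
sort educ
by educ: summarize wage

* bysort does the sorting and the by-group command in one step
bysort educ female: summarize wage

* scatter plot of wage against education, then a histogram of wage
twoway (scatter wage educ)
histogram wage
```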
H. MLR Wage Regression
. reg wage educ exper female married numdep smsa

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  6,   519) =   43.27
       Model |  2387.45905     6  397.909841           Prob > F      =  0.0000
    Residual |  4772.95524   519  9.19644556           R-squared     =  0.3334
-------------+------------------------------           Adj R-squared =  0.3257
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.0326

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5677906   .0543361    10.45   0.000     .4610449    .6745363
       exper |   .0597343   .0111783     5.34   0.000      .037774    .0816945
      female |  -2.101436   .2694909    -7.80   0.000    -2.630863   -1.572009
     married |    .649392   .3036465     2.14   0.033     .0528647    1.245919
      numdep |   .1716186   .1116665     1.54   0.125    -.0477552    .3909924
        smsa |   1.045125   .3053285     3.42   0.001     .4452936    1.644957
       _cons |  -2.575859   .8066152    -3.19   0.001    -4.160491   -.9912264
------------------------------------------------------------------------------
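• Including other covariates doesn’t change the estimate on educ by much (0.54 in the simple regression, 0.57 here).
• R-squared increases from 0.1648 to 0.3334.
• Variables have the expected sign: wage is higher with more experience, for those who are married or have dependents (probably because they are very devoted workers), and for those who live in a metropolitan area. Women generally get paid less than men.
• The coefficient on numdep is not statistically significant at the 5% level (p = 0.125).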
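To put the simple and multiple regressions side by side, one option is to store each set of estimates and tabulate them. A sketch, assuming Wage1.dta is in the working directory; slr and mlr are made-up names for the stored results.

```stata
use "Wage1.dta", clear

quietly regress wage educ
estimates store slr

quietly regress wage educ exper female married numdep smsa
estimates store mlr

* coefficients with standard errors, plus sample size and R-squared
estimates table slr mlr, b(%9.4f) se stats(N r2)
```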