Ann Arbor ASA “Up and Running” Series: Intro Stata Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association November 29, 2011
Jan 31, 2016
Ann Arbor ASA (Up and Running): Stata Intro 2
Agenda
• Why Stata?
• The Stata interface
• The Stata mindset
  – data
  – logging
  – issuing commands via menus
  – understanding command syntax
• Data management
• Descriptive statistics and estimation
• Graphing
• Adding user-written commands
• .do files
Why Stata
• General-purpose, cross-platform package like R or SAS
• Command-line interface combined with point-and-click menus
• Intuitive and standardized command syntax that is well documented with formulas, examples, and references
• Many advanced user-written commands
• Easy to write your own code that is pretty fast
• Excellent corporate tech support and user community
Which Stata: MP, SE, IC or Small
• Stata is not sold in pieces; every flavor has the same commands
• Most flavors are available for 32- and 64-bit Windows, Mac, and Unix/Linux platforms
• Stata/IC (Intercooled) can handle up to 2,047 variables
• Stata/SE (Special Edition) can handle up to 32,767 variables. It also allows longer string variables and larger matrices
• Stata/MP has the same limits, but is faster on multicore and multiprocessor computers
• Small Stata is intended for students and is limited to analyzing datasets with a maximum of 99 variables and 1,200 observations
• All of these versions can read each other’s files within their size limits
The Stata Interface
• Results window: All output appears here, except for graphs, which appear in a separate window. Note that output is not automatically saved to a file
• Command window: Enter commands here interactively
• Variables window: All variables in the current dataset are listed here. Clicking on a variable sends its name to the Command window
• Review window: Previously issued commands are listed here and can be reissued by clicking on them
• Buttons: Shortcuts for many common commands such as log, browse, edit, etc.
• Menus: Convenient for learning Stata command syntax, but time-consuming
• Look and feel is customizable
Lab 1A
• Use the Stata File menu to open the example dataset, auto.dta
The Stata Mindset
• Data
• Logging
• Issuing commands from menus
• Understanding command syntax
Data
• Stata reads an entire dataset into memory. This is a fundamental difference from other stat packages such as SAS and SPSS
• Only one dataset at a time in a Stata session
• This is why there are flavors of Stata – IC, SE, Small
Reading data into memory
• Use the menus
  – File, Open
• Use the command window
    use "C:\...\sample.dta", clear
    use sample.dta, clear
• Use the File Open button
• All methods produce the same result
Saving data
• Use the menus
  – File, Save (or Save As…)
• Use the command window
    save "C:\...\sample.dta" [, replace]
• Use the Save button
• All methods produce the same result
Logging
• Stata does not automatically write output to a file!
• You can do this by starting a log file at the start of your analysis, and closing it at the end
• Use the menus
  – File, Log
• Use the command window
    log using "C:\...\analysis1.log"
• Use the Log button
• All methods produce the same result
• Logs can be created, replaced, suspended, resumed, and appended
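The read, log, and save steps from the last three slides can be strung together. The following is a minimal sketch, meant to be run from a .do file (the // comments are not accepted in the interactive Command window). It assumes the built-in auto example data loaded with sysuse; the file names analysis1.log and auto_copy.dta are hypothetical and are written to the current working directory.

```stata
* Sketch only: the file names below are hypothetical
sysuse auto, clear                        // example dataset shipped with Stata
log using "analysis1.log", replace text   // start a plain-text log
describe                                  // this output is captured in the log
log close                                 // stop logging
save "auto_copy.dta", replace             // write the data in memory to disk
```

The replace option overwrites an existing file of the same name, so leave it off if you would rather get an error than lose an earlier log or dataset.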
Lab 1B
• Use the Stata menus to:
  – change the color scheme
  – change the working directory to your desktop lab folder
  – start a log file called “labs.log” in your desktop lab folder
  – save the example auto.dta dataset to your desktop lab folder
Issuing Commands from Menus
• Menus are great for:
  – Familiarizing yourself with Stata’s capabilities, both big picture and command-specific
  – Getting context-sensitive help
  – Learning Stata command syntax
• The downside:
  – time-consuming, especially for repetitive tasks
  – not all functionality is available through the menus!
Lab 2
• To get a codebook for the auto.dta dataset, use the following menu path:
Data, Describe data, Describe data contents (codebook)
• You will see the codebook dialog. Inspect it closely…
Anatomy of a Dialog Box
The Stata command (keyword) that will be submitted
Multiple tabs
Help, Reset, and Copy Command
Submit and close dialog
Submit and leave dialog open
Anatomy of a Dialog Box
Use if/in to filter rows
Specify logical condition
Specify row #s
Anatomy of a Dialog Box
Command options available on additional tabs
Understanding Command Syntax
• The general syntax for most Stata commands is:
    [prefix:] cmdname [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]
• Elements in square brackets are optional for some commands
• Sometimes cmdname is all that is required, for example, codebook or describe
• In the help files, the underlined portion of cmdname is the shortest abbreviation Stata accepts for the command
• Stata is case sensitive
Understanding Command Syntax: cmdname
• cmdname is Stata’s keyword for a command. Examples:
    generate, replace, drop, regress, logistic, logit, scatter, graph bar, graph box
• Enter cmdname exactly as indicated, taking care to use the proper case (usually lower case for commands)
Understanding Command Syntax: varlist
• You can apply the command to particular variables by specifying a varlist
• Order of variables matters; you can use a hyphen to indicate a series of variables in dataset order, as in:
    codebook x1-x20
• Use wildcard notation for shorthand, such as
    codebook x*
• Use _all to apply the command to all variables
• Remember that Stata is case sensitive! Variables gender and Gender are two different things to Stata
Understanding Command Syntax: =exp
• exp is short for expression
• exp is used by data management commands such as generate and replace
• For example, to create a constant variable x equal to 1, use:
    generate x=1
• You can also use operators and functions this way:
    gen x2 = x^2
    gen x_sq = x*x
    gen logx=ln(x)
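To try =exp on real data, here is a small sketch using the built-in auto dataset (loaded with sysuse). The new variable names price_k, mpg_sq, and log_price are made up for illustration. Run it from a .do file so the // comments are accepted.

```stata
sysuse auto, clear
generate price_k   = price/1000          // arithmetic expression
generate mpg_sq    = mpg^2               // power operator
generate log_price = ln(price)           // built-in function
replace  price_k   = round(price_k, 0.1) // replace overwrites an existing variable
```

generate refuses to overwrite a variable that already exists, which is why the last line uses replace.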
Understanding Command Syntax: if/in exp
• Without a varlist or an if/in qualifier, most commands apply to all variables and all observations in the dataset
• To filter observations, use the if exp clause:
    codebook if (x==2 & z>=3) | w==2
• Note the parentheses!
• Also note the difference between = and == (assignment and test for equality, respectively):
    gen x=1 if y==2
    list if gender=="F"
• Relational and logical operators in Stata are
    ==  (equal to)        !=  (not equal to)
    >   (greater than)    >=  (greater than or equal to)
    <   (less than)       <=  (less than or equal to)
    &   (and)             |   (or)
• Use in range to refer to particular row numbers in the dataset:
    list in 1/10
A Brief but Critical Detour: Missing Data
• While we are talking about selecting cases using an if exp clause, it is important to note that Stata treats missing as larger than any numeric value
• Stata represents missing numeric values with a dot (.)
• Keep this in mind when filtering cases based on a numeric variable:
    replace hieduc = 1 if x>3            (potential problem: also true when x is missing)
    replace hieduc = 1 if x>3 & x<.      (playing it safe)
    replace hieduc = 1 if x>3 & x!=.     (playing it safe, unless extended missing values .a to .z are in use)
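The effect is easy to see with the built-in auto dataset, where the repair record variable rep78 is missing for some cars. The sketch below (run from a .do file) only counts observations, so nothing is modified; the missing() function in the last line is an alternative way to write the same guard.

```stata
sysuse auto, clear
count if rep78 > 3                    // also counts cars whose rep78 is missing
count if rep78 > 3 & rep78 < .        // excludes missing values
count if rep78 > 3 & !missing(rep78)  // same result, using missing()
```

The first count should come out larger than the other two; the difference is the number of cars with rep78 missing.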
Understanding Command Syntax: weight
• Most Stata commands can deal with weighted data, where the weight is a variable in the dataset
• You need to specify the type of weight and the weight variable, using brackets, as in:
    summarize x [iweight=weightvarname]
• Four types of weights:
  – Frequency fweights, for replicated data
  – Probability pweights, for observations sampled with unequal probability of selection
  – Analytic aweights, for data containing averages where the average is weighted by the # obs used in calculating the average
  – Importance iweights, defined by the specific command
Understanding Command Syntax: using filename
• Some commands read in data from external files, or write to files
• These commands contain a using clause, in which the path and filename appear
• Merging two datasets together is an example:
    use "C:\...\master_data.dta", clear
    merge 1:1 id using "C:\...\using_data.dta"
• This performs a 1:1 match using the key variable, id (merge adds new variables). 1:m and m:1 merges are also possible
• Similarly, to stack datasets:
    use "C:\...\one.dta", clear
    append using "C:\...\two.dta"
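A self-contained way to experiment with merge, without needing two files on disk, is to split the built-in auto dataset and put it back together. This is only a sketch: it assumes make uniquely identifies each car (it does in auto.dta), uses a temporary file named through the local macro prices (a hypothetical name), and should be run as one .do file so the macro stays defined.

```stata
sysuse auto, clear
keep make price                  // first piece: key variable plus price
tempfile prices                  // temporary file, removed when the .do file ends
save `prices'

sysuse auto, clear
keep make mpg                    // second piece: key variable plus mpg
merge 1:1 make using `prices'    // adds price back, matching on make
tabulate _merge                  // every observation should be "matched"
```

merge creates the _merge variable automatically; tabulating it is a quick check that the match behaved as expected.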
Understanding Command Syntax: prefix:
• Prefix commands operate on other Stata commands. One common prefix is bysort:
    bysort gender: summarize wage
• The bysort prefix sorts the data by gender and runs summarize separately for each level of gender
• The bysort prefix is also very handy in a data management context, for example, aggregating:
    bysort gender: egen avg_wage = mean(wage)
• Not all commands permit the use of all or even any prefixes
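The same two bysort patterns can be tried on the built-in auto dataset. In this sketch, weight and rep78 stand in for the slide's wage and gender, and avg_weight is a made-up variable name.

```stata
sysuse auto, clear
bysort rep78: summarize weight                  // separate summary per repair record
bysort rep78: egen avg_weight = mean(weight)    // group mean attached to every row
list make rep78 weight avg_weight in 1/5
```

Unlike collapse, the egen approach keeps one row per car and simply repeats the group mean within each group.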
Understanding Command Syntax: Where to get HELP
• If you know the name of a command, enter
    help cmdname
• If you don’t know it, enter
    findit word1 [word2]…
• This queries a keyword database and some of the official internet sources (such as Stata FAQs, Stata Journal articles)
• Google
• Email or call Stata Technical Services (really!)
• Statalist archives
• Email CSCAR Stata support if you are affiliated with the U-M as a grad student, staff or faculty member
Lab 3
• Enter the appropriate commands in the command window (no menus!):
  – open the auto.dta dataset, clearing out what is in memory
  – describe the dataset
  – get the codebook for the first 5 variables in the dataset
  – list out the first 10 observations
  – try out the browse command
  – browse the cases where price is greater than 5000 (but not missing)
  – summarize the price variable where foreign==0 (for domestic cars)
  – use the bysort prefix to summarize the price variable by levels of the foreign variable
Data Management Commands
• We’ve already seen quite a few
    use save              //open/save data
    browse list           //view data
    codebook describe     //10,000 ft view
    gen, egen replace     //create/replace vars
    merge append          //merge/stack datasets
• Next up:
  – importing
  – exporting
  – aggregating
  – keeping/dropping
Data Management Commands: importing files
• use reads Stata-formatted (.dta) datasets
• For data created in another software package:
  – save the data in an Excel file, then use the import excel command (new with Stata 12)
  – save the data in a comma-separated values file (.csv), or a delimited file, then use the insheet command
  – use the other package to save the data in .dta format (SPSS 17+ and SAS 9.2 can do this)
  – use Stat/Transfer to convert the file to .dta
• .dta, delimited, and .csv files are the simplest file types to get into Stata
• Stata will also import data in other formats, but it’s not always straightforward
• To import a .csv file:
    insheet using "C:\...\new_data.csv", comma clear
Data Management Commands: exporting files
• save saves the data in .dta format
• To make the data usable by other software packages:
  – export the data to a comma-separated values file (.csv), or a delimited file, using outsheet
  – use the other package to open the .dta file and save it in another format (SPSS 17+ and SAS 9.2 can do this)
  – use Stat/Transfer to convert the file from .dta to something else
• To export data to a .csv file:
    outsheet using "C:\...\out_data.csv", comma
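A quick round trip shows both commands working together. This sketch writes the built-in auto dataset to a hypothetical file, auto_export.csv, in the current working directory and reads it straight back. It uses the Stata 12 commands from the slides; from Stata 13 on, export delimited and import delimited are the preferred equivalents.

```stata
sysuse auto, clear
outsheet using "auto_export.csv", comma replace   // write a comma-separated file
insheet using "auto_export.csv", comma clear      // read it back into memory
describe                                          // check names and storage types
```

Comparing the describe output before and after is a useful habit, since formats and value labels do not survive a trip through a text file.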
Data Management Commands: aggregating files
• It is a common exercise to aggregate data, or to make a dataset of summary statistics
• Use the collapse command:
    collapse (mean) mn_wage=wage (count) count=gender, by(gender)
  to turn data like this:
    id  gender  wage
    1   M       500
    2   M       550
    3   M       490
    4   F       505
    5   F       410
  into this:
    gender  count  mn_wage
    M       3      ##
    F       2      ##
• Use collapse to produce counts, means, medians, percentiles, extrema, and standard deviations of your data.
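The five-row example above is small enough to type in and collapse directly. The sketch below enters the data with input and then aggregates. One deliberate change from the slide, made as a cautious assumption: the count is taken over the numeric id variable rather than the string variable gender, since that form is certain to run.

```stata
clear
input id str1 gender wage
1 "M" 500
2 "M" 550
3 "M" 490
4 "F" 505
5 "F" 410
end

* One row per gender: mean wage and number of observations
collapse (mean) mn_wage=wage (count) count=id, by(gender)
list
```

By hand, the group means are 1540/3 (about 513.33) for M and 915/2 = 457.5 for F, which is what the ## placeholders in the slide stand for. Note that collapse replaces the data in memory, so save first if you still need the original rows.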
Data Management Commands: keep/drop
• To throw away variables, use keep (retains only the listed variables) or drop (removes the listed variables):
    keep varlist
    drop varlist
• To get rid of particular observations, add an if or in clause with no varlist:
    drop if x==3
    keep in 1/100
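As a small illustration on the built-in auto dataset, the sketch below trims both columns and rows. The cutoff of 10000 and the choice of 20 rows are arbitrary, chosen only to show the syntax.

```stata
sysuse auto, clear
keep make price mpg foreign          // retain only these four variables
drop if price > 10000 & price < .    // remove the expensive cars
keep in 1/20                         // retain the first 20 remaining observations
describe, short                      // confirm the new dimensions
```

Both commands act on the data in memory only; the file on disk is unchanged until you save.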
Lab 4
• Import the “auto.csv” dataset from your desktop lab folder
• Save the file in your desktop lab folder as “auto1.dta”
• Aggregate the dataset by levels of foreign, obtaining the mean and median for price and mpg
• Drop the median price and median mpg variables
• Export the aggregated dataset to a .csv file in your desktop lab folder
Descriptive Statistics and Estimation
• We’ve already seen
    summarize
• Next up:
  – summarizing (with detail)
  – tabulating
  – estimation (modeling)
  – post-estimation
Descriptive Statistics and Estimation: summarizing with detail
• summarize gives descriptive statistics for numeric variables
• Use the detail option to get additional descriptive statistics
    sum x1, detail
• summarize without a varlist will summarize all numeric variables in the dataset
Descriptive Statistics and Estimation: tabulating
• tabulate gives one- and two-way tables for categorical variables
• For two-way tables, use the chi2, row, and col options to get a chi-square test, row %, column %
    tab race
    tab race treatment, chi2 row col
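The summarize and tabulate commands from the last two slides can be tried on the built-in auto dataset. In this sketch, weight and rep78 are simply convenient stand-ins for the slides' x1 and race.

```stata
sysuse auto, clear
summarize weight length               // basic descriptive statistics
summarize weight, detail              // adds percentiles, skewness, kurtosis
tabulate rep78                        // one-way table
tabulate rep78 foreign, chi2 row col  // two-way table with test and percentages
```

By default tabulate leaves out observations with missing values; add the missing option to show them as their own category.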
Descriptive Statistics and Estimation: estimation (modeling)
• Most estimation commands have the same syntax
    cmdname yvar(list) xvarlist [, options]
• Common estimation commands are
    regress           //OLS
    logit, logistic   //logistic
    mlogit            //multinomial
    ologit            //ordinal
    poisson           //poisson
    xtmixed           //mixed
• Example:
    reg y x1 x2 x3
Descriptive Statistics and Estimation: post-estimation
• After you get your estimates you can obtain predictions:
    predict yhat1 if e(sample)
    predict yhat2
    predict resid, residuals
• Adjusting the estimated covariance matrix is straightforward:
    reg y x1 x2 x3, robust
    reg y x1 x2 x3, cluster(clustervar)
• Testing hypotheses about parameters:
    test x1=3
• Hypotheses can also be nonlinear and involve combinations of parameters
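Here is one possible end-to-end sketch of estimation followed by post-estimation, using the built-in auto dataset. The model (mpg on weight and length) and the new variable names mpg_hat and mpg_res are chosen only for illustration.

```stata
sysuse auto, clear
regress mpg weight length            // OLS fit
predict mpg_hat if e(sample)         // fitted values for the estimation sample
predict mpg_res, residuals           // residuals
test weight = 0                      // Wald test on a single coefficient
test weight length                   // joint test that both coefficients are zero
regress mpg weight length, robust    // same model, robust standard errors
```

predict and test always refer to the most recent estimation command, so run them before fitting the next model.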
Lab 5
• Using the auto.dta dataset:
  – summarize the variables price and mpg
  – tabulate the foreign variable
  – regress price on mpg and foreign (OLS regression)
  – save the predicted values in a new variable called yhat
  – save the studentized residuals in a new variable called rstudent
Graphing
• Easily customized graphics
• Graphs can be created via menus or command line
• Manual adjustment can be done after the graph is generated, using the Graph Editor
• Graphs can be saved in various file formats and/or pasted into documents
• Examples:
    histogram y, normal
    twoway (scatter y x) (lfit y x)
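The two example commands can be run as written once y and x are replaced with real variables. The sketch below uses mpg and weight from the built-in auto dataset. The title text and the output file name mpg_weight.png are hypothetical, and PNG export is assumed to be available on your platform (it is in the Windows and Mac versions).

```stata
sysuse auto, clear
histogram mpg, normal                          // histogram with normal overlay
twoway (scatter mpg weight) (lfit mpg weight), ///
    title("Mileage versus weight")             // scatterplot plus fitted line
graph export "mpg_weight.png", replace         // save the current graph
```

The /// at the end of a line continues the command on the next line, which only works in a .do file.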
Lab 6
• Using the auto.dta dataset, create a scatterplot of price on the y-axis, and mpg on the x-axis
• From the Graph window, start the Graph Editor. Modify the plot titles and colors
• Save your graph as a Stata .gph file in your desktop lab folder
• Copy the graph and paste it into a Word or PowerPoint file
Adding User-written Commands
• You can install add-on packages, which are user-written commands made publicly available
• You may run into these packages if you
  – do a findit search
  – Google
  – go to Help, SJ and user-written programs
• Installation is usually as simple as clicking through some links
• My personal most-used add-ons:
    mvpatterns
    gllamm
Lab 7
• Install the mvpatterns add-on package, by typing
    findit mvpatterns
  then click on the blue link starting with dm91
• Follow links to install
• Read the help file for mvpatterns
• Check the missing value patterns for the variables make thru rep78
• Close your log file
.do Files
• .do files are text files that contain sequences of Stata commands (like a SAS command file, or an SPSS syntax file)
• Create them using Stata’s .do file editor, or any text editor.
  – Copy from your Review window
  – Type in the commands directly
• Saving your commands to a .do file is never a bad idea. But use good habits:
  – Comment liberally, using * or /* */ conventions
  – Specify the version of Stata used
  – Use set more off to opt out of Stata’s paging feature, if appropriate
• You can run the entire .do file, or just a small part of it
• Stata will stop processing if an error is encountered when commands from a .do file are submitted
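The good habits listed above might look like this in practice. This is a hypothetical skeleton, not the sample.do file used in the lab: the file names are invented, and version 12 is an assumption based on the Stata release mentioned earlier in the slides.

```stata
* example.do: hypothetical skeleton for an analysis .do file
version 12                  // assumption: written under Stata 12
set more off                // do not pause output at each screenful
capture log close           // close any log left open by an earlier run
log using "example.log", replace text

/* Load the data and run a quick check */
sysuse auto, clear
summarize price mpg

log close
```

The capture prefix suppresses the error that log close would otherwise raise when no log is open, which would stop the .do file on its first run.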
Lab 8
• Open the sample.do file in your desktop lab folder
• Can you describe what is happening in the .do file?
• Copy all of the commands from tonight’s session into a new .do file
• Run a small section of commands
• Run the entire file
Other Misc.
• To manage variable attributes, use the Variables Manager.
• Type help cmdname to find out more about these commands:
    matrix    //matrix algebra
    mata      //fancy matrix programming
    foreach   //looping command
    xt        //panel/longitudinal analysis
    st        //survival analysis
    svy       //analysis of complex survey data
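Of the commands listed, foreach is the one most likely to show up in an everyday .do file. The sketch below loops over three variables in the built-in auto dataset and prints each mean; the loop variable name v is arbitrary.

```stata
sysuse auto, clear
foreach v of varlist price mpg weight {
    quietly summarize `v'                     // results are stored in r()
    display "`v': mean = " %9.2f r(mean)      // print the stored mean
}
```

Inside the loop, `v' (left quote, name, right quote) is replaced by the current variable name on each pass.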
Additional Resources
• Stata website, FAQs:
    http://www.stata.com/support/faqs
• UCLA website
    http://www.ats.ucla.edu/stat/stata/default.htm
• Christopher F. Baum’s Stata handouts
    http://fmwww.bc.edu/GStat/docs/StataIntro.pdf
    http://fmwww.bc.edu/GStat/docs/StataProg.pdf
    http://fmwww.bc.edu/GStat/docs/StataMata.pdf
• Stata NetCourses
    http://www.stata.com/netcourse/
• CSCAR workshops
    http://www.umich.edu/~cscar/workshops/