Ann Arbor ASA “Up and Running” Series: Intro Stata Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association November 29, 2011


Jan 31, 2016

Page 1: Ann Arbor ASA  “ Up and Running ”  Series:  Intro Stata

Ann Arbor ASA “Up and Running” Series:

Intro Stata

Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association

November 29, 2011


Agenda

• Why Stata?
• The Stata interface
• The Stata mindset
  – data
  – logging
  – issuing commands via menus
  – understanding command syntax
• Data management
• Descriptive statistics and estimation
• Graphing
• Adding user-written commands
• .do files


Why Stata

• General-purpose, cross-platform package like R or SAS
• Command line interface combined with point-and-click menus
• Intuitive and standardized command syntax that is well documented with formulas, examples and references
• Many advanced user-written commands
• Easy to write your own code that is pretty fast
• Excellent corporate tech support and user community


Which Stata: MP, SE, IC or Small

• Stata is not sold in pieces; every flavor has the same commands
• Most flavors are available for 32- and 64-bit Windows, Mac, and Unix/Linux platforms
• Stata/IC (Intercooled) can handle up to 2,047 variables
• Stata/SE (Special Edition) can handle up to 32,767 variables. It also allows longer string variables and larger matrices
• Stata/MP has the same limits as SE, but is faster on multicore and multiprocessor computers
• Small Stata is intended for students and is limited to analyzing datasets with a maximum of 99 variables and 1,200 observations

• All of these versions can read each other’s files within their size limits


The Stata Interface

• Results window: All output appears here, except for graphs, which appear in a separate window. Note that output is not automatically saved to a file
• Command window: Enter commands here interactively
• Variables window: All variables in the current dataset are listed here. Clicking on a variable sends its name to the Command window
• Review window: Previously issued commands are listed here and can be reissued by clicking on them
• Buttons: Shortcuts for many common commands such as log, browse, edit, etc.
• Menus: Convenient for learning Stata command syntax, but time-consuming

• Look and feel is customizable


Lab 1A

• Use the Stata File menu to open the example dataset, auto.dta


The Stata Mindset

• Data

• Logging

• Issuing commands from menus

• Understanding command syntax


Data

• Stata reads an entire dataset into memory. This is a fundamental difference from other stat packages such as SAS and SPSS

• Only one dataset at a time in a Stata session

• This is why there are flavors of Stata – IC, SE, Small


Reading data into memory

• Use the menus
  – File, Open
• Use the command window
    use "C:\...\sample.dta", clear
    use sample.dta, clear

• Use the File Open button

• All methods produce the same result


Saving data

• Use the menus
  – File, Save (or Save As…)
• Use the command window
    save "C:\...\sample.dta" [, replace]

• Use the Save button

• All methods produce the same result
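
The read/save cycle can be sketched as follows. This is only an illustration: it assumes the auto example dataset that ships with Stata (loaded with sysuse rather than a full path), and auto_copy.dta is a hypothetical file name in the current working directory.

```stata
* Load the auto example dataset that ships with Stata, discarding data in memory
sysuse auto, clear

* Save a copy in the current working directory; replace overwrites an existing file
save "auto_copy.dta", replace

* Read the copy back into memory and show a short description
use "auto_copy.dta", clear
describe, short
```

Without the replace option, save refuses to overwrite an existing file, which is a useful safeguard when working with original data.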


Logging

• Stata does not automatically write output to a file!
• You can do this by starting a log file at the start of your analysis, and closing it at the end
• Use the menus
  – File, Log
• Use the command window
    log using "C:\...\analysis1.log"
• Use the Log button
• All methods produce the same result
• Logs can be created, replaced, suspended, resumed, and appended
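
A minimal sketch of a logging session, assuming a log file named analysis1.log in the current working directory (the path is an assumption made for illustration) and the auto example dataset:

```stata
* Close any log left open by an earlier run; capture suppresses the error if none is open
capture log close

* Start a plain-text log, overwriting any existing file of the same name
log using "analysis1.log", text replace

sysuse auto, clear
summarize price

* Suspend logging: the describe output is not written to the log
log off
describe

* Resume logging, then close the log at the end of the analysis
log on
log close
```

Using append instead of replace adds the new output to the end of an existing log rather than overwriting it.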


Lab 1B

• Use the Stata menus to:
  – change the color scheme
  – change the working directory to your desktop lab folder
  – start a log file called “labs.log” in your desktop lab folder
  – save the example auto.dta dataset to your desktop lab folder


Issuing Commands from Menus

• Menus are great for:
  – Familiarizing yourself with Stata’s capabilities, both big picture and command-specific
  – Getting context-sensitive help
  – Learning Stata command syntax
• The downside:
  – time-consuming, especially for repetitive tasks
  – not all functionality is available through the menus!


Lab 2

• To get a codebook for the auto.dta dataset, use the following menu path:

Data, Describe data, Describe data contents (codebook)

• You will see the codebook dialog. Inspect it closely…


Anatomy of a Dialog Box

The Stata command (keyword) that will be submitted

Multiple tabs

Help, Reset, and Copy Command

Submit and close dialog

Submit and leave dialog open


Anatomy of a Dialog Box

Use if/in to filter rows

Specify logical condition

Specify row #s


Anatomy of a Dialog Box

Command options available on additional tabs


Understanding Command Syntax

• The general syntax for all Stata commands is:
    [prefix:] cmdname [varlist] [=exp] [if exp] [in exp] [weight] [using filename] [, options]

• Elements in square brackets are optional for some commands

• Sometimes cmdname is all that is required, for example, codebook or describe

• In Stata’s help files, the underlined portion of cmdname is the shortest allowed abbreviation of the command

• Stata is case sensitive


Understanding Command Syntax: cmdname

• cmdname is Stata’s keyword for a command. Examples:
    generate, replace, drop, regress, logistic, logit, scatter, graph bar, graph box

• Enter cmdname exactly as indicated, taking care to use the proper case (usually lower case for commands)


Understanding Command Syntax: varlist

• You can apply the command to particular variables by specifying a varlist

• Order of variables matters; you can use a hyphen to indicate a series of variables in dataset order, as in:
    codebook x1-x20
• Use wildcard notation for shorthand, such as
    codebook x*
• Use _all to apply the command to all variables
• Remember that Stata is case sensitive! Variables gender and Gender are two different things to Stata
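
The varlist shorthands can be tried on the auto example dataset. This is a sketch that assumes the variable order of the shipped dataset (make, price, mpg, rep78, and so on):

```stata
sysuse auto, clear

* Hyphen: the variables from price through rep78, in dataset order
summarize price-rep78

* Wildcard: every variable whose name starts with m
describe m*

* _all: every variable in the dataset
codebook _all, compact
```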


Understanding Command Syntax: =exp

• exp is short for expression
• exp is used by data management commands such as generate and replace
• For example, to create a constant variable x equal to 1, use:
    generate x=1
• You can also use operators and functions this way:
    gen x2 = x^2
    gen x_sq = x*x
    gen logx = ln(x)
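
A short sketch of =exp with generate and replace on the auto example dataset. The new variable names (price_sq, log_price, price_k) are invented for illustration, and the double storage type is a choice made here to avoid rounding in the default float type:

```stata
sysuse auto, clear

* generate creates a new variable from an expression
generate double price_sq  = price^2
generate double log_price = ln(price)

* replace overwrites the values of an existing variable
generate double price_k = price
replace price_k = price_k / 1000

summarize price price_sq log_price price_k
```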


Understanding Command Syntax: if/in exp

• Without a varlist or an if/in qualifier, commands apply to all variables and observations in the dataset
• To filter observations, use the if exp clause:
    codebook if (x==2 & z>=3) | w==2
• Note the parentheses!
• Also note the difference between = and == (assignment and test for equality, respectively)
    gen x=1 if y==2
    list if gender=="F"
• Relational and logical operators in Stata are
    ==  (equal to)        !=  (not equal to)
    >   (greater than)    >=  (greater than or equal to)
    <   (less than)       <=  (less than or equal to)
    &   (and)             |   (or)
• Use in exp to refer to particular row numbers in the dataset:
    list in 1/10
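
The if and in qualifiers can be sketched on the auto example dataset; the particular conditions below are arbitrary choices for illustration:

```stata
sysuse auto, clear

* == tests equality; parentheses make the grouping of & and | explicit
count if (foreign == 1 & mpg >= 25) | price < 4000

* String comparisons need double quotes around the value
list make price if make == "Volvo 260"

* in selects by observation number: here the first five rows
list make price mpg in 1/5
```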


A Brief but Critical Detour: Missing Data

• While we are talking about selecting cases using an if exp clause, it is important to note that Stata treats a missing numeric value as larger than any nonmissing number
• Stata represents missing numeric values with a dot (.)
• Keep this in mind when filtering cases based on a numeric variable:
    replace hieduc = 1 if x>3           (potential problem: also true when x is missing)
    replace hieduc = 1 if x>3 & x<.     (playing it safe)
    replace hieduc = 1 if x>3 & x!=.    (safe unless x holds extended missing values .a to .z)
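
The effect is easy to see with the rep78 variable in the auto example dataset, which contains some missing values. A minimal sketch:

```stata
sysuse auto, clear

* How many observations have rep78 missing?
count if missing(rep78)

* Missing counts as larger than any number, so this count includes the missing rows
count if rep78 > 3

* Adding rep78 < . restricts the count to nonmissing values
count if rep78 > 3 & rep78 < .
```

The second count exceeds the third by exactly the number of missing values reported by the first.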


Understanding Command Syntax: weight

• Most Stata commands can deal with weighted data, where the weight is a variable in the dataset

• You need to specify the type of weight and the weight variable, using brackets, as in:
    summarize x [iweight=weightvarname]
• Four types of weights:
  – Frequency fweights, for replicated data
  – Probability pweights, for observations sampled with unequal probability of selection
  – Analytic aweights, for data containing averages where the average is weighted by the # obs used in calculating the average
  – Importance iweights, defined by the specific command
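
Frequency weights are the simplest case to illustrate. The sketch below uses a small made-up dataset in which each row stands for n identical observations; the variable names x and n are hypothetical:

```stata
clear
input x n
1 2
2 3
5 1
end

* Each row represents n identical observations, so summarize reports 6 observations
summarize x [fweight = n]

* The same result comes from physically replicating the rows
expand n
summarize x
```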


Understanding Command Syntax: using filename

• Some commands read in data from external files, or write to files

• These commands contain a using clause, in which the path and filename appear

• Merging two datasets together is an example:
    use "C:\…\master_data.dta", clear
    merge 1:1 id using "C:\...\using_data.dta"
• This performs a 1:1 match using the key variable, id (merge adds new variables). One-to-many (1:m) and many-to-one (m:1) merges are also possible
• Similarly, to stack datasets:
    use "C:\..\one.dta", clear
    append using "C:\...\two.dta"
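
A self-contained sketch of merge and append, using two tiny made-up datasets and a temporary file in place of real paths. The variable names and values are invented for illustration, and because it relies on a tempfile it should be run as one block from a do-file:

```stata
* Build the "using" dataset and store it in a temporary file
clear
input id wage
1 500
2 550
3 490
end
tempfile wages
save `wages'

* Build the "master" dataset: same key variable, different columns
clear
input id str1 gender
1 "M"
2 "M"
3 "F"
end

* 1:1 merge on id adds wage to the data in memory; _merge records the match status
merge 1:1 id using `wages'
tabulate _merge

* append stacks rows instead of adding columns
drop _merge
append using `wages'
count
```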


Understanding Command Syntax: prefix:

• Prefix commands operate on other Stata commands. One common prefix is bysort:
    bysort gender: summarize wage
• The bysort prefix sorts the data by the gender variable and runs the summarize command separately for each gender
• The bysort prefix is also very handy in a data management context, for example, aggregating:
    bysort gender: egen avg_wage = mean(wage)

• Not all commands permit the use of all or even any prefixes
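
The same two uses of bysort can be sketched on the auto example dataset, with foreign playing the role of the grouping variable (avg_price is a name invented here):

```stata
sysuse auto, clear

* Run summarize separately for domestic and foreign cars
bysort foreign: summarize price

* Attach each group's mean price to every car in that group
bysort foreign: egen avg_price = mean(price)
list make foreign price avg_price in 1/5
```

Unlike collapse, this keeps one row per car and simply adds the group-level statistic as a new column.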


Understanding Command Syntax: Where to get HELP

• If you know the name of a command, enter
    help cmdname
• If you don’t know it, enter
    findit word1 [word2]…
• This queries a keyword database and some of the official internet sources (such as Stata FAQs, Stata Journal articles)
• Google
• Email or call Stata Technical Services (really!)
• Statalist archives
• Email CSCAR Stata support at [email protected] if you are affiliated with the U-M as a grad student, staff or faculty member


Lab 3

• Enter the appropriate commands in the command window (no menus!):
  – open the auto.dta dataset, clearing out what is in memory
  – describe the dataset
  – get the codebook for the first 5 variables in the dataset
  – list out the first 10 observations
  – try out the browse command
  – browse the cases where price is greater than 5000 (but not missing)
  – summarize the price variable where foreign==0 (for domestic cars)
  – use the bysort prefix to summarize the price variable by levels of the foreign variable


Data Management Commands

• We’ve already seen quite a few
    use save              //open/save data
    browse list           //view data
    codebook describe     //10,000 ft view
    gen, egen replace     //create/replace vars
    merge append          //merge/stack datasets
• Next up:
  – importing
  – exporting
  – aggregating
  – keeping/dropping


Data Management Commands: importing files

• use reads Stata-formatted (.dta) datasets.
• For data created in another software package:
  – save the data in an Excel file, then use the import excel command (new with Stata 12)
  – save the data in a comma-separated values file (.csv), or a delimited file, then use the insheet command
  – use the other package to save the data in .dta format (SPSS 17+ and SAS 9.2 can do this)
  – use Stat/Transfer to convert the file to .dta
• .dta, delimited, and .csv files are the simplest file types to get into Stata
• Stata will also import data in other formats, but it’s not always straightforward
• To import a .csv file:
    insheet using "C:\...\new_data.csv", comma clear


Data Management Commands: exporting files

• save saves the data in .dta format
• To make the data usable by other software packages:
  – export the data to a comma-separated values file (.csv), or a delimited file, using outsheet
  – use the other package to open the .dta file and save it in another format (SPSS 17+ and SAS 9.2 can do this)
  – use Stat/Transfer to convert the file from .dta to something else
• To export data to a .csv file:
    outsheet using "C:\...\out_data.csv", comma
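
A round trip through a .csv file can be sketched as below, assuming the auto example dataset and a hypothetical file name auto_out.csv in the current working directory. The sketch uses outsheet and insheet as in the slides; later Stata releases (13 and up) also offer export delimited and import delimited.

```stata
sysuse auto, clear
keep make price mpg foreign

* Write a comma-separated file; nolabel writes 0/1 for foreign instead of its value labels
outsheet using "auto_out.csv", comma nolabel replace

* Read it back; the first line of the file supplies the variable names
insheet using "auto_out.csv", comma clear
describe
```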


Data Management Commands: aggregating files

• It is a common exercise to aggregate data, or to make a dataset of summary statistics
• Use the collapse command:
    collapse (mean) mn_wage=wage (count) count=id, by(gender)
  to turn data like this…            …into this
    id  gender  wage                 gender  count  mn_wage
    1   M       500                  M       3      ##
    2   M       550                  F       2      ##
    3   M       490
    4   F       505
    5   F       410
• Use collapse to produce counts, means, medians, percentiles, extrema, and standard deviations of your data.
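
The five-row example can be reproduced as a self-contained sketch. The count here is taken over the numeric id variable, a choice made for this illustration because gender is entered as a string in the toy data:

```stata
clear
input id str1 gender wage
1 "M" 500
2 "M" 550
3 "M" 490
4 "F" 505
5 "F" 410
end

* One row per gender: mean wage and the number of nonmissing ids
collapse (mean) mn_wage=wage (count) count=id, by(gender)
list
```

Note that collapse replaces the data in memory with the aggregated dataset, so save the original first if it is still needed.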


Data Management Commands: keep/drop

• To throw away variables, use drop with the variables to remove, or keep with the variables to retain:
    keep varlist
    drop varlist
• To get rid of particular observations, add an if or in clause with no varlist:
    drop if x==3
    keep in 1/100
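
A brief sketch of both uses on the auto example dataset; the variables and cutoffs are arbitrary choices for illustration:

```stata
sysuse auto, clear

* Retain four variables and discard the rest
keep make price mpg foreign

* Remove one of the remaining variables
drop mpg

* Remove observations: foreign cars first, then everything past the first 20 rows
drop if foreign == 1
keep in 1/20
describe, short
```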


Lab 4

• Import the “auto.csv” dataset from your desktop lab folder
• Save the file in your desktop lab folder as “auto1.dta”
• Aggregate the dataset by levels of foreign, obtaining the mean and median for price and mpg
• Drop the median price and median mpg variables
• Export the aggregated dataset to a .csv file in your desktop lab folder


Descriptive Statistics and Estimation

• We’ve already seen
    summarize
• Next up:
  – summarizing (with detail)
  – tabulating
  – estimation (modeling)
  – post-estimation


Descriptive Statistics and Estimation: summarizing with detail

• summarize gives descriptive statistics for numeric variables
• Use the detail option to get additional descriptive statistics
    sum x1, detail

• summarize without a varlist will summarize all numeric variables in the dataset


Descriptive Statistics and Estimation: tabulating

• tabulate gives one- and two-way tables for categorical variables
• For two-way tables, use the chi2, row, and col options to get a chi-square test, row %, column %
    tab race
    tab race treatment, chi2 row col
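
Both commands can be tried on the auto example dataset. In this sketch rep78 and foreign stand in for the categorical variables:

```stata
sysuse auto, clear

* Percentiles, skewness, kurtosis and extreme values in addition to the basics
summarize price, detail
display "median price = " r(p50)

* One-way frequency table
tabulate rep78

* Two-way table with row percentages and a chi-square test
tabulate rep78 foreign, row chi2
```

summarize leaves its results behind in r(), which is why the median can be displayed right after the command runs.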


Descriptive Statistics and Estimation: estimation (modeling)

• Most estimation commands have the same syntax
    cmdname yvar(list) xvarlist [, options]
• Common estimation commands are
    regress           //OLS
    logit, logistic   //logistic
    mlogit            //multinomial
    ologit            //ordinal
    poisson           //poisson
    xtmixed           //mixed
• Example:
    reg y x1 x2 x3


Descriptive Statistics and Estimation: post-estimation

• After you get your estimates you can obtain predictions:
    predict yhat1 if e(sample)
    predict yhat2
    predict resid, residuals
• Adjusting the estimated covariance matrix is straightforward:
    reg y x1 x2 x3, robust
    reg y x1 x2 x3, cluster(clustervar)
• Testing hypotheses about parameters:
    test x1=3
• Hypotheses can also be nonlinear and involve combinations of parameters
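
An estimation and post-estimation sequence might look like the sketch below. It uses the auto example dataset with mpg as the outcome, an arbitrary choice for illustration, and vce(robust), which is the option form of robust:

```stata
sysuse auto, clear

* OLS of mpg on weight and the foreign indicator
regress mpg weight foreign

* Fitted values and residuals for the estimation sample
predict mpg_hat if e(sample)
predict mpg_res if e(sample), residuals

* Wald test that the coefficient on foreign is zero
test foreign = 0

* Same model with heteroskedasticity-robust standard errors
regress mpg weight foreign, vce(robust)
```

predict and test always refer to the most recently fitted model, so they must be issued before another estimation command replaces the results.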


Lab 5

• Using the auto.dta dataset:
  – summarize the variables price and mpg
  – tabulate the foreign variable
  – regress price on mpg and foreign (OLS regression)
  – save the predicted values in a new variable called yhat
  – save the studentized residuals in a new variable called rstudent


Graphing

• Easily customized graphics
• Graphs can be created via menus or command line
• Manual adjustment can be done after the graph is generated, using the Graph Editor
• Graphs can be saved in various file formats and/or pasted into documents
• Examples:
    histogram y, normal
    twoway (scatter y x) (lfit y x)
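
The two example graphs can be drawn from the auto example dataset as sketched below. The variables, the title text, and the file name mpg_weight.gph are assumptions made for illustration:

```stata
sysuse auto, clear

* Histogram of mpg with a normal density overlaid
histogram mpg, normal

* Scatterplot with a fitted regression line, then save in Stata's .gph format
twoway (scatter mpg weight) (lfit mpg weight), title("Mileage vs. weight")
graph save "mpg_weight.gph", replace
```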


Lab 6

• Using the auto.dta dataset, create a scatterplot of price on the y-axis, and mpg on the x-axis

• From the Graph window, start the Graph Editor. Modify the plot titles and colors

• Save your graph as a Stata .gph file in your desktop lab folder

• Copy the graph and paste it into a Word or PowerPoint file


Adding User-written Commands

• You can install add-on packages, which are user-written commands made publicly available

• You may run into these packages if you
  – do a findit search
  – Google
  – go to Help, SJ and user-written programs
• Installation is usually as simple as clicking through some links
• My personal most-used add-ons:
    mvpatterns
    gllamm


Lab 7

• Install the mvpatterns add-on package, by typing
    findit mvpatterns
  then click on the blue link starting with dm91
• Follow links to install
• Read the help file for mvpatterns
• Check the missing value patterns for the variables make through rep78
• Close your log file


.do Files

• .do files are text files that contain sequences of Stata commands (like a SAS command file, or an SPSS syntax file)
• Create them using Stata’s .do file editor, or any text editor.
  – Copy from your Review window
  – Type in the commands directly
• Saving your commands to one or more .do files is never a bad idea. But use good habits:
  – Comment liberally, using * or /* */ conventions
  – Specify the version of Stata used
  – Use set more off to opt out of Stata’s paging feature, if appropriate
• You can run the entire .do file, or just a small part of it
• Stata will stop processing if an error is encountered when commands from a .do file are submitted
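
A skeleton .do file that follows these habits might look like the sketch below. The file and log names are hypothetical, and version 12 is assumed because the slides refer to Stata 12:

```stata
* example.do: a skeleton do-file (file and log names are hypothetical)
version 12          // interpret the commands as Stata 12 would
set more off        // do not pause output at each screenful
capture log close   // close any log left open by a previous run
log using "example.log", text replace

/* Load the data and run the analysis */
sysuse auto, clear
summarize price mpg
regress price mpg foreign

log close
```

Starting with capture log close lets the file be rerun after an error without complaining that a log is already open.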


Lab 8

• Open the sample.do file in your desktop lab folder

• Can you describe what is happening in the .do file?

• Copy all of the commands from tonight’s session into a new .do file

• Run a small section of commands

• Run the entire file


Other Misc.

• To manage variable attributes, use the Variables Manager.

• Type help cmdname to find out more about these commands:
    matrix     //matrix algebra
    mata       //fancy matrix programming
    foreach    //looping command
    xt         //panel/longitudinal analysis
    st         //survival analysis
    svy        //analysis of complex survey data
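
As one small taste of these, a foreach loop over variables can be sketched as follows, using the auto example dataset; the ln_ prefix for the new variables is an invented naming choice:

```stata
sysuse auto, clear

* Loop over three variables, creating and summarizing a log-transformed copy of each
foreach v of varlist price mpg weight {
    generate double ln_`v' = ln(`v')
    summarize ln_`v'
}
```

Inside the loop, `v' is replaced by the current variable name on each pass.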


Additional Resources

• Stata website, FAQs: http://www.stata.com/support/faqs
• UCLA website: http://www.ats.ucla.edu/stat/stata/default.htm
• Christopher F. Baum’s Stata handouts:
    http://fmwww.bc.edu/GStat/docs/StataIntro.pdf
    http://fmwww.bc.edu/GStat/docs/StataProg.pdf
    http://fmwww.bc.edu/GStat/docs/StataMata.pdf
• Stata NetCourses: http://www.stata.com/netcourse/
• CSCAR workshops: http://www.umich.edu/~cscar/workshops/