Introduction to the Stata Language

Mark Lunt

Centre for Epidemiology Versus Arthritis
University of Manchester

01/10/2019

Topics Covered Today

Getting help
Stata Windows
Basic Concepts
Manipulation of variables
Manipulation of datasets

Command-line vs. Point-and-Click

Command-line requires more initial learning than point-and-click
Commands must be entered exactly correctly
Only option for any serious work
  1. Reproducible
  2. Editable
  3. More efficient
Some commands can be written more efficiently via point-and-click

Getting Help

Help
Manuals
Search
Stata website
Statalist
Stata Journal
Me

Stata Windows

2 must exist:
  Results
  Command
2 others usually exist:
  Review
  Variables
Others can exist (data editor, graph, do-file editor, help/log viewer)

Command Window: Syntax

command [varlist] [,options]

Roman letters: entered exactly
Italic letters: replaced by some text you enter
Square brackets: that item is optional
The example above means:
  Command is called “command”
  Command name may be followed by a list of variables
  Options may follow a comma
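
As one concrete instance of this template, here is a minimal sketch. The choice of the summarize command and of the auto dataset that ships with Stata is an assumption made for illustration; neither is named on the slide.

```stata
* Load the example dataset shipped with Stata (assumption: any dataset would do)
sysuse auto, clear

* command alone: summarize every variable in the dataset
summarize

* command followed by a varlist
summarize price mpg

* command, varlist, then options after a single comma
summarize price mpg, detail
```

The same three shapes (command, command plus varlist, command plus varlist plus options) apply to most Stata commands.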

Command Window

Can navigate through previous commands with PageUp and PageDown
Pressing the tab key will complete a variable name as far as possible
Case-sensitive: height and HEIGHT are different variables
Syntax must be exact (although abbreviations are possible)
  Only one comma, before all options
  Space before opening parenthesis was the most common error, now accepted (since Stata 12), e.g. level(5), not level (5)

Variables window

List of all variables in current dataset
Clicking adds variable name to command window
May contain label if one has been defined

Review Window

List of commands entered this session
Clicking on a command puts it in command window
Double-clicking runs the command
Can be saved as a script, called a “do-file”

Results Window

Limited size: use a log file to preserve results
Blue = clickable link
Scrolling controlled by Return, Space and q keys
set more [on | off]

Basic Concepts

Do-files
Log files
Interaction with Operating System
Macros
Variable and number lists

Do-Files

List of commands
Can be run from Stata with the command
  do "do-file.do"
All data manipulation and analysis should be done using a do-file:
  Perfectly reproducible
  Can see exactly what was done
  Easy to modify

Profile.do

Stata looks for a file called profile.do every time it starts
If it finds it, it runs it
Useful for
  Setting memory
  User-defined menus
  Logging commands

See help profilew for details

Log Files

Results window of limited size: must log results
Can use plain text or SMCL (Stata Markup and Control Language)
Top of do-file should be:
  capture log close
  log using myfile.log, [append | replace] [text | smcl]
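
A minimal sketch of a do-file that follows this pattern. The file name myfile.log is the slide's; picking replace and text from the bracketed alternatives, and using the auto dataset for the body, are assumptions made for illustration.

```stata
* Close any log left open by an earlier run; capture suppresses the
* error that log close would raise if no log is currently open
capture log close

* Start a plain-text log, overwriting any previous version of the file
log using myfile.log, replace text

* Body of the do-file (illustrative only)
sysuse auto, clear
summarize price

* Close the log at the end of the do-file
log close
```

Using append instead of replace would add the new output to the end of an existing log rather than overwriting it.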

Interaction with Operating System

cd      Change directory
pwd     Display current directory
mkdir   Create directory
dir     List files in current directory
shell   Run another program

Can use either "/" or "\" in directory names
Safer to use "/"
Path names containing spaces must be surrounded by inverted commas (double quotes)
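
A short sketch of these commands in use. The directory name "stata demo" is hypothetical and is chosen only because it contains a space, so it has to be quoted.

```stata
* Show the current working directory
pwd

* Create a directory; capture ignores the error if it already exists
capture mkdir "stata demo"

* Move into it (quotes are needed because the name contains a space)
cd "stata demo"

* List the files here, then go back up one level
dir
cd ..
```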

Macros

Macro name is replaced by definition text when command is run
Very useful for making do-files portable
  Directories used are defined first using macros
  Change in location of data or do-files only means changing macro definitions

Macro Example

Definition:
  global mymac C:/Project/Data
Use:
  use "$mymac/data"
  Loads the file C:/Project/Data/data

Local vs. Global

Global macro retains definition until end of session
Local macro loses definition at end of do-file

         Definition           Use
Global   global mymac defn    $mymac
Local    local mymac defn     `mymac'

Local vs Global macros
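
The difference in quoting is easy to get wrong, so here is a minimal sketch. The macro names datadir and outcome and the use of the auto dataset are hypothetical choices for illustration.

```stata
* Global macro: keeps its definition for the rest of the session
global datadir "C:/Project/Data"
display "$datadir/data"

* Local macro: exists only within the do-file that defines it.
* Note the quoting: a left quote ` before the name, an apostrophe ' after it
local outcome "price"
display "`outcome'"

* Macros are substituted before the command runs,
* so Stata executes: summarize price
sysuse auto, clear
summarize `outcome'
```

Run the lines together as one do-file: a local defined in one run of selected lines is gone by the next run.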

Variable Lists

Shorthand for referring to a lot of variables
prefix* means all variables beginning with prefix
firstvar-lastvar means all variables in the dataset from firstvar to lastvar inclusive
Type help varlist for more details
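
A small sketch of both forms, assuming the auto dataset shipped with Stata (not part of the slides), whose first variables are make, price, mpg, rep78 and headroom in that order.

```stata
sysuse auto, clear

* prefix*: all variables whose names begin with "m" (here make and mpg)
describe m*

* firstvar-lastvar: all variables from price to headroom in dataset order
* (here price, mpg, rep78 and headroom)
summarize price-headroom
```

The firstvar-lastvar form depends on the order of variables in the dataset, so reordering the variables changes what it selects.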

Number Lists

Symbol      Meaning                                     Example      Expansion
            list of numbers                             1 2 3        1 2 3
x/y         whole numbers from x to y inclusive         1/5          1 2 3 4 5
x y to z    numbers from x to z, increasing by y − x    5 10 to 20   5 10 15 20
x y:z       same as x y to z                            5 10:20      5 10 15 20
x(y)z       numbers from x to z, increasing by y        10(10)50     10 20 30 40 50
x[y]z       same as x(y)z                               10[10]50     10 20 30 40 50

Number Lists
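
Number lists are typically used in loops. The following is a minimal sketch rather than anything from the slides; it also uses the numlist command, which expands a list and returns the result in r(numlist).

```stata
* forvalues accepts the range forms, e.g. x(y)z
forvalues i = 10(10)50 {
    display `i'
}

* foreach ... of numlist accepts any number list
foreach i of numlist 1/5 {
    display `i'
}

* numlist expands a list and stores the expansion in r(numlist)
numlist "5 10 to 20"
display "`r(numlist)'"
```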

Manipulating Variables

generate & replace
egen
Labelling
Selecting variables

generate

Used to create a new variable
Syntax: generate [type] newvar = expression
  newvar must not already exist
  type, if present, defines the type of the data
  expression defines the values, e.g.
    generate ltitre = log(titre)
    generate str6 head = substr(name, 1, 6)
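
The slide's examples need variables called titre and name. A runnable sketch of the same ideas, assuming the auto dataset shipped with Stata and substituting its variables price and make, might look like this:

```stata
sysuse auto, clear

* New numeric variable; stored as float because no type is given
generate lprice = log(price)

* New string variable with an explicit type: first 6 characters of make
generate str6 head = substr(make, 1, 6)

* generate refuses to overwrite an existing variable:
* this command fails, and capture stores the error code in _rc
capture generate lprice = log10(price)
display _rc
```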

Variable Types

type     size (bytes)   min               max              precision   missing
byte     1              -127              100              integers    .
int      2              -32,767           32,740           integers    .
long     4              -2,147,483,647    2,147,483,620    integers    .
float*   4              about -1.7×10^38  about 1.7×10^38  7 digits    .
double   8              about -9×10^307   about 9×10^307   15 digits   .
strn     n                                                             ""
strL     varies                                                        ""

Available data types

* float is the default type.
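
Why the type matters can be seen in a short sketch. The variable names heavy, id_f and id_d are hypothetical, and the auto dataset is assumed only to provide some observations.

```stata
sysuse auto, clear

* A 0/1 indicator fits in one byte instead of the default 4-byte float
generate byte heavy = weight > 3000

* A float holds about 7 digits, so a 9-digit integer is not stored exactly;
* a double (or long) holds it without loss
generate float  id_f = 123456789
generate double id_d = 123456789
format id_f id_d %12.0f
list id_f id_d in 1, noobs

* compress converts each variable to the smallest type that holds its values
compress
describe heavy id_f id_d
```

Identifiers with many digits are the usual place where the default float type causes trouble.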

Missing Values

Numerical variables can have several different missing values:
  ., .a, .b, etc.
  May be useful if you know why a variable is missing
  if variable != . may not catch all missing values
All missing values are greater than any number representable by that datatype
Can exclude all missing values with if variable < .
  gen old = age > 65 if age < .
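
A minimal sketch on a made-up four-row dataset (the ages are invented for illustration) shows the difference between the two tests:

```stata
clear
input age
70
50
.
.a
end

* != . only excludes the plain missing value, so .a is still counted
count if age != .

* < . excludes every missing value, because all of them sort above any number
count if age < .

* Without the if qualifier, missing ages would be coded as old,
* since any missing value is greater than 65
generate old = age > 65 if age < .
list
```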

replace

Similar to generate
Cannot change type
newvar must already exist
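
A short sketch of replace, assuming the auto dataset; the variable size and the 3000 cut-off are hypothetical.

```stata
sysuse auto, clear

* The variable has to be created with generate first
generate size = 1

* replace then changes values, usually for a subset of observations
replace size = 2 if weight > 3000

* replace cannot turn a numeric variable into a string one:
* this command fails, and capture stores the error code in _rc
capture replace size = "big" if weight > 4000
display _rc
```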

egen

Extended GENerate
Has more functions available
User can write their own egen functions
No ereplace: must drop the existing variable and create a new one
Examples of its use in the practical
See help egen for details
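
As a taste of what egen adds, here is a minimal sketch using its mean() function with the by() option on the auto dataset (an assumed example, not the one from the practical):

```stata
sysuse auto, clear

* Mean price within each level of foreign, repeated on every observation
egen mean_price = mean(price), by(foreign)

* There is no "ereplace": to recompute, drop the variable and create it again
drop mean_price
egen mean_price = mean(price), by(rep78)
```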

Labelling

Need to label variables themselves
  Show exactly what the variable measures
Need to label values of a variable
  Only for categorical variables
  First define a label
  Then assign it to a variable
  Easier to assign same label to a number of variables
  Can label different missing values

Labelling a variable

Syntax: label variable varname "Description"

Example: label variable height "Height in m."

Labelling values

Syntax:
  label define labelname 1 "string1" ...
  label values varname labelname
Example:
  label define yesno 0 "No" 1 "Yes"
  label values question1 yesno
  label values question2 yesno
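
A runnable sketch of the yesno example on a made-up three-row dataset. The data values, the variable label text and the "Refused" label for .a are assumptions added to show that missing values can be labelled too.

```stata
clear
input question1 question2
0 1
1 .a
1 0
end

* Label the variable itself
label variable question1 "Answer to question 1"

* Define the value label once, including one extended missing value
label define yesno 0 "No" 1 "Yes" .a "Refused"

* Attach the same label to both variables
label values question1 yesno
label values question2 yesno

tabulate question2, missing
```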

Selecting variables

drop varlist

keep varlist
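
The two commands are mirror images: keep names the variables to retain, drop names the ones to remove. A minimal sketch, assuming the auto dataset:

```stata
sysuse auto, clear

* Retain only these four variables
keep make price mpg weight

* Remove one of them again
drop weight

describe
```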

Manipulating Datasets

use & save
append
merge
browse and edit
preserve and restore

use

use "filename" reads a file into Stata
If there is already a file in Stata, need use "filename", clear
Always use inverted commas
Easier to use the menu or button-bar

save

save "filename" saves the current dataset as "filename"
If "filename" already exists, need save "filename", replace
The saveold command allows saving in the format of a previous version of Stata
If you do not include a directory in filename, Stata will try to save it in the current directory
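
A minimal round trip through save and use. The file names auto_copy and auto_copy_old are hypothetical, and the version() option of saveold is only available in more recent Stata releases, so treat that line as an assumption about your installation.

```stata
sysuse auto, clear

* Save into the current directory; replace overwrites an existing file
save "auto_copy", replace

* Read it back; clear discards whatever is currently in memory
use "auto_copy", clear

* Save in an older file format for colleagues with earlier versions of Stata
saveold "auto_copy_old", replace version(12)
```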

Combining Datasets

append
  more subjects, same variables
  append using filename
merge
  same subjects, more variables
  merge 1:1 identifier using filename

Appending Data: Example

ID   common_1   common_2   file1_1   file1_2
1    a1         b1         c1        d1
2    a2         b2         c2        d2
3    a3         b3         c3        d3

Appending Data: File 1

ID   common_1   common_2   file2_1   file2_2
4    a4         b4         e4        f4
5    a5         b5         e5        f5
6    a6         b6         e6        f6

Appending Data: File 2

Appending Data: Example

ID   common_1   common_2   file1_1   file1_2   file2_1   file2_2
1    a1         b1         c1        d1        .         .
2    a2         b2         c2        d2        .         .
3    a3         b3         c3        d3        .         .
4    a4         b4         .         .         e4        f4
5    a5         b5         .         .         e5        f5
6    a6         b6         .         .         e6        f6

Appending Data: Combined Files
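
The example can be reproduced with a short sketch. The placeholder values are entered as strings here, which is an assumption: missing string values are shown as blanks rather than as the "." in the table. The file name file2 is also hypothetical.

```stata
* Build and save File 2
clear
input id str2 common_1 str2 common_2 str2 file2_1 str2 file2_2
4 a4 b4 e4 f4
5 a5 b5 e5 f5
6 a6 b6 e6 f6
end
save "file2", replace

* Build File 1 in memory
clear
input id str2 common_1 str2 common_2 str2 file1_1 str2 file1_2
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
end

* Add the observations of File 2 below those of File 1
append using "file2"
list
```

Variables that exist in only one of the files are filled with missing values for the observations that came from the other file.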

Merging Data

Need an identifier (one or more variables on which to match observations)
The old merge syntax required both files to be sorted by this identifier; merge 1:1 and its relatives sort the data as needed
All observations from both files are used
Variable _merge says whether observation was in first file, second file or both

Merging Files: example

idno   var1   var2
1      a1     b1
2      a2     b2
3      a3     b3

Merging Data: File 1

idno   var3   var4
1      c1     d1
3      c3     d3
4      c4     d4

Merging Data: File 2

Merging Files: example

idno   var1   var2   var3   var4   _merge
1      a1     b1     c1     d1     3
2      a2     b2     .      .      1
3      a3     b3     c3     d3     3
4      .      .      c4     d4     2

Merging Data: Combined Files
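
The same example as a runnable sketch. The file name merge_file2 is hypothetical, and the placeholder values are entered as strings, so missing values appear as blanks rather than ".".

```stata
* Build and save File 2
clear
input idno str2 var3 str2 var4
1 c1 d1
3 c3 d3
4 c4 d4
end
save "merge_file2", replace

* Build File 1 in memory
clear
input idno str2 var1 str2 var2
1 a1 b1
2 a2 b2
3 a3 b3
end

* Match observations on idno, one observation per idno in each file
merge 1:1 idno using "merge_file2"

sort idno
list
tabulate _merge
```

In _merge, 1 marks observations found only in the file in memory, 2 those found only in the file named after using, and 3 those found in both.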

Ensuring Uniqueness

Usually, should only be one observation per unique identifier
May not be the case (e.g. adding family-level data to individual-level data)
If there should be one observation per identifier in both datasets, use the command merge 1:1
If each record in current dataset corresponds to several in the merged dataset, use merge 1:m
Equally, there are merge m:1 and merge m:m commands
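
The family-level case can be sketched as follows. All names and values (famid, income, person, age, the file families) are invented for illustration. Here the data in memory have several individuals per family and the file being merged in has one record per family, so the form needed is merge m:1.

```stata
* Family-level file: one record per family
clear
input famid income
1 30000
2 45000
end
save "families", replace

* Individual-level data in memory: several people per family
clear
input famid person age
1 1 40
1 2 38
2 1 55
end

* Many individuals per famid in memory, one family record per famid on disk
merge m:1 famid using "families"
list
```

If famid turned out not to be unique in the family-level file, merge m:1 would stop with an error, which is the check on uniqueness this slide is about.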

browse & edit

Can open a data editor window with browse

Can choose variables to browse with browse varlist

Cannot modify data while browsing
edit allows data to be changed: don’t use it

preserve & restore

You may wish to change your data temporarily
E.g. collapse to means by group
Type preserve before changing data, restore after
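
A minimal sketch of the collapse case, assuming the auto dataset and grouping by its variable foreign:

```stata
sysuse auto, clear

* Take a snapshot of the data in memory
preserve

* Temporarily reduce the data to one row of means per group
collapse (mean) price mpg, by(foreign)
list

* Return to the full dataset as it was at preserve
restore
describe, short
```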