Introduction to the Stata Language

Mark Lunt

Centre for Epidemiology Versus Arthritis, University of Manchester

01/10/2019



Topics Covered Today

- Getting help
- Stata windows
- Basic concepts
- Manipulation of variables
- Manipulation of datasets


Command-line vs. Point-and-Click

- Command-line requires more initial learning than point-and-click
- Commands must be entered exactly correctly
- Only option for any serious work:
  1. Reproducible
  2. Editable
  3. More efficient
- Some commands can be written more efficiently via point-and-click


Getting Help

- Help
- Manuals
- Search
- Stata website
- Statalist
- Stata Journal
- Me


Stata Windows

- 2 must exist:
  - Results
  - Command
- 2 others usually exist:
  - Review
  - Variables
- Others can exist (data editor, graph, do-file editor, help/log viewer)


Command Window: Syntax

command [varlist] [, options]

- Roman letters: entered exactly
- Italic letters: replaced by some text you enter
- Square brackets: that item is optional
- The example above means:
  - Command is called "command"
  - Command name may be followed by a list of variables
  - Options may follow a comma
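The syntax diagram above can be illustrated with a real command. This is only a sketch: it assumes the auto example dataset that ships with Stata (loaded with sysuse), which is not part of the original slides.

```stata
* Load an example dataset shipped with Stata (an assumption of this sketch).
sysuse auto, clear

* Command only: no varlist, no options (summarises every variable).
summarize

* Command followed by a varlist.
summarize price mpg

* Command, varlist, then a single comma followed by the options.
summarize price mpg, detail
```

Here summarize plays the role of "command", price mpg is the varlist, and detail is an option.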


Command Window

- Can navigate through previous commands with PageUp and PageDown
- Pressing the Tab key completes a variable name as far as possible
- Case-sensitive: height and HEIGHT are different variables
- Syntax must be exact (although abbreviations are possible):
  - Only one comma, before all options
  - A space before an opening parenthesis used to be the most common error (e.g. level (5) instead of level(5)); it has been accepted since Stata 12


Variables window

- List of all variables in current dataset
- Clicking adds variable name to command window
- May contain label if one has been defined


Review Window

- List of commands entered this session
- Clicking on a command puts it in command window
- Double-clicking runs the command
- Can be saved as a script, called a "do-file"


Results Window

- Limited size: use a log file to preserve results
- Blue = clickable link
- Scrolling controlled by the Return, Space and q keys
- set more [on | off]


Basic Concepts

- Do-files
- Log files
- Interaction with Operating System
- Macros
- Variable and number lists


Do-Files

- List of commands
- Can be run from Stata with the command do "do-file.do"
- All data manipulation and analysis should be done using a do-file:
  - Perfectly reproducible
  - Can see exactly what was done
  - Easy to modify


Profile.do

- Stata looks for a file called profile.do every time it starts
- If it finds it, it runs it
- Useful for:
  - Setting memory
  - User-defined menus
  - Logging commands
- See help profilew for details


Log Files

- Results window of limited size: must log results
- Can use plain text or SMCL (Stata Markup and Control Language)
- Top of do-file should be:
  capture log close
  log using myfile.log, [append | replace] [text | smcl]
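A minimal do-file skeleton built around these two lines might look as follows. The choice of replace and text, and the use of the auto example dataset, are assumptions made for illustration; the log is written to the current directory.

```stata
* Close any log left open by an earlier run. capture suppresses the
* error that log close would otherwise give when no log is open.
capture log close

* Start a plain-text log, overwriting any previous version of the file.
log using "myfile.log", replace text

sysuse auto, clear      // example dataset shipped with Stata (an assumption)
summarize price

* Close the log at the end of the do-file.
log close
```

Using replace means that each run of the do-file produces a log of that run only; append would instead add to the end of the existing file.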


Interaction with Operating System

cd      Change directory
pwd     Display current directory
mkdir   Create directory
dir     List files in current directory
shell   Run another program

- Can use either "/" or "\" in directory names
- Safer to use "/", because a backslash directly in front of a macro stops the macro being expanded
- Path names containing spaces must be surrounded by inverted commas (double quotes)
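These commands can be combined as in the sketch below. The directory name "my project" is hypothetical, and the use of c(pwd) to remember the starting directory is an addition for illustration.

```stata
* Remember the starting directory; c(pwd) holds the current directory.
local start "`c(pwd)'"

* Create a directory whose name contains a space. The double quotes are
* required; capture ignores the error if the directory already exists.
capture mkdir "my project"

* Move into it, show where we are, and list its contents.
cd "my project"
pwd
dir

* Return to where we started.
cd "`start'"
```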


Macros

- Macro name is replaced by its definition text when the command is run
- Very useful for making do-files portable:
  - Directories used are defined first using macros
  - A change in the location of data or do-files then only means changing the macro definitions


Macro Example

Definition: global mymac C:/Project/Data

Use: use "$mymac/data"

Loads the file C:/Project/Data/data


Local vs. Global

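- Global macro retains definition until end of session
- Local macro loses definition at end of do-file

          Definition           Use
Global    global mymac defn    $mymac
Local     local mymac defn     `mymac'

Local vs Global macros

Note that a local macro is opened with a left single quote (backtick) and closed with an apostrophe.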
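A short sketch of both kinds of macro in use. The macro names outcome and predictors, and the auto example dataset, are hypothetical choices made for illustration.

```stata
sysuse auto, clear      // example dataset shipped with Stata (an assumption)

* A global macro is referred to with a leading dollar sign.
global outcome "price"

* A local macro is wrapped in a left single quote and an apostrophe.
local predictors "mpg weight"

* Both are replaced by their definitions before the command runs,
* so this line is executed as: regress price mpg weight
regress $outcome `predictors'

display "Outcome: $outcome; predictors: `predictors'"
```

If these lines are run one at a time from the command window rather than together in a do-file, the global still works but the local may already have lost its definition.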


Variable Lists

- Shorthand for referring to a lot of variables
- prefix* means all variables beginning with prefix
- firstvar-lastvar means all variables in the dataset from firstvar to lastvar inclusive
- Type help varlist for more details
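Both kinds of shorthand can be tried on the auto example dataset (an assumption of this sketch). The unab command, which is not mentioned on the slides, is used here only to show what a variable list expands to.

```stata
sysuse auto, clear

* All variables whose names begin with t.
describe t*

* All variables from price to weight, in dataset order.
summarize price-weight

* Store the expanded list in a local macro and show it.
unab range : price-weight
display "`range'"
```

The range depends on the order of variables in the dataset, not on alphabetical order, so reordering the variables changes what firstvar-lastvar selects.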


Number Lists

Symbol     Meaning                                        Example     Expansion
           list of numbers                                1 2 3       1 2 3
x/y        whole numbers from x to y inclusive            1/5         1 2 3 4 5
x y to z   numbers from x to z, increasing by y - x       5 10 to 20  5 10 15 20
x y : z    same as x y to z                               5 10:20     5 10 15 20
x(y)z      numbers from x to z, increasing by y           10(10)50    10 20 30 40 50
x[y]z      same as x(y)z                                  10[10]50    10 20 30 40 50

Number Lists
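The expansions in the table can be checked with the numlist command, which leaves the expanded list in r(numlist). The sketch below also shows a number list driving a forvalues loop, a use the slides do not cover.

```stata
* Expand some of the number lists from the table.
numlist "1/5"
display "`r(numlist)'"

numlist "5 10 to 20"
display "`r(numlist)'"

* The x(y)z form is also what a forvalues loop expects.
forvalues i = 10(10)50 {
    display `i'
}

numlist "10(10)50"
display "`r(numlist)'"
```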


Manipulating Variables

- generate & replace
- egen
- Labelling
- Selecting variables


generate

- Used to create a new variable
- Syntax: generate [type] newvar = expression
  - newvar must not already exist
  - type, if present, defines the type of the data
  - expression defines the values, e.g.
    generate ltitre = log(titre)
    generate str6 head = substr(name, 1, 6)
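The two examples above refer to variables (titre, name) from the course data. A runnable variant, assuming the auto example dataset and its variables price and make instead, could be:

```stata
sysuse auto, clear

* Numeric variable; no type given, so the default type (float) is used.
generate lprice = log(price)

* String variable of at most 6 characters: the start of the make name.
generate str6 head = substr(make, 1, 6)

list make head price lprice in 1/5

* newvar must not already exist: this second attempt fails, and
* capture stores the non-zero return code in _rc instead of stopping.
capture generate lprice = 0
display _rc
```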


Variable Types

type     size (bytes)   min               max              precision   missing
byte     1              -127              100              integers    .
int      2              -32,767           32,740           integers    .
long     4              -2,147,483,647    2,147,483,620    integers    .
float*   4              -1.70 x 10^38     1.70 x 10^38     7 digits    .
double   8              -8.99 x 10^307    8.99 x 10^307    15 digits   .
strn     n                                                             ""
strL     varies                                                        ""

Available data types

* float is the default type.


Missing Values

- Numerical variables can have several different missing values:
  - ., .a, .b, etc.
  - May be useful if you know why a variable is missing
  - if variable != . may not catch all missing values
- All missing values are greater than any number representable by that datatype
- Can exclude all missing values with if variable < .
  gen old = age > 65 if age < .
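The difference between != . and < . can be seen on a tiny made-up dataset. The four ages below are invented for illustration only.

```stata
clear
input age
70
50
.
.a
end

* Misses the extended missing value .a, so three observations pass.
count if age != .

* Excludes every kind of missing value, so two observations pass.
count if age < .

* Without the if condition, both missing ages would be coded as old,
* because missing values are greater than 65.
generate byte old = age > 65 if age < .
list
```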


replace

- Similar to generate
- Cannot change type
- The variable being replaced must already exist


egen

- Extended GENerate
- Has more functions available
- User can write their own egen functions
- No ereplace: must drop the existing variable and create a new one
- Examples of its use in the practical
- See help egen for details
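As a small sketch of egen, and of the drop-then-recreate pattern needed because there is no ereplace, the following assumes the auto example dataset; the variable name group_price is hypothetical.

```stata
sysuse auto, clear

* Mean price within each level of foreign, repeated on every row.
egen group_price = mean(price), by(foreign)

* To change the definition, drop the variable and create it again.
drop group_price
egen group_price = median(price), by(foreign)

tabulate foreign, summarize(group_price)
```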


Labelling

- Need to label variables themselves
  - Show exactly what the variable measures
- Need to label values of a variable
  - Only for categorical variables
  - First define a label
  - Then assign it to a variable
  - Easier to assign same label to a number of variables
  - Can label different missing values


Labelling a variable

Syntax: label variable varname "Description"

Example: label variable height "Height in m."


Labelling values

Syntax:
  label define labelname 1 "string1" ...
  label values varname labelname

Example:
  label define yesno 0 "No" 1 "Yes"
  label values question1 yesno
  label values question2 yesno
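A runnable version of both kinds of labelling, assuming the auto example dataset; the variables expensive and thirsty are invented here so that there is something categorical to label.

```stata
sysuse auto, clear

* Two 0/1 variables to label (hypothetical, for illustration only).
generate byte expensive = price > 6000
generate byte thirsty = mpg < 20

* Variable labels say what each variable measures.
label variable expensive "Price above 6000 dollars"
label variable thirsty "Fewer than 20 miles per gallon"

* Define the value label once, then attach it to both variables.
label define yesno 0 "No" 1 "Yes"
label values expensive yesno
label values thirsty yesno

tabulate expensive thirsty
```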


Selecting variables

drop varlist

keep varlist
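The two commands are mirror images: keep lists what stays, drop lists what goes. A brief sketch, assuming the auto example dataset:

```stata
sysuse auto, clear

* Keep only the four named variables.
keep make price mpg weight

* Then remove one of them.
drop weight

describe
```

Neither command affects the file on disk; the variables are removed only from the copy of the data in memory until the dataset is saved.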


Manipulating Datasets

- use & save
- append
- merge
- browse and edit
- preserve and restore


use

- use "filename" reads a file into Stata
- If there is already a dataset in memory, need use "filename", clear
- Always use inverted commas
- Easier to use the menu or button-bar


save

- save "filename" saves the current dataset as "filename"
- If "filename" already exists, need save "filename", replace
- The saveold command allows saving in the format of a previous version of Stata
- If you do not include a directory in filename, Stata will try to save it in the current directory
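A round trip through save and use might look like the sketch below. It assumes the auto example dataset, and the file name auto_copy is hypothetical; the file is written to the current directory.

```stata
sysuse auto, clear

* Save a copy; replace allows an existing file to be overwritten.
save "auto_copy", replace

* Change the data in memory without saving.
generate lprice = log(price)

* Reload the saved copy. The clear option is needed because the
* unsaved change in memory would otherwise be lost without warning.
use "auto_copy", clear

describe, short
```

After the final use, lprice is gone: the data in memory are again exactly what was saved.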


Combining Datasets

- append
  - More subjects, same variables
  - append using filename
- merge
  - Same subjects, more variables
  - merge 1:1 identifier using filename


Appending Data: Example

ID   common_1   common_2   file1_1   file1_2
1    a1         b1         c1        d1
2    a2         b2         c2        d2
3    a3         b3         c3        d3

Appending Data: File 1

ID   common_1   common_2   file2_1   file2_2
4    a4         b4         e4        f4
5    a5         b5         e5        f5
6    a6         b6         e6        f6

Appending Data: File 2


Appending Data: Example

ID   common_1   common_2   file1_1   file1_2   file2_1   file2_2
1    a1         b1         c1        d1        .         .
2    a2         b2         c2        d2        .         .
3    a3         b3         c3        d3        .         .
4    a4         b4         .         .         e4        f4
5    a5         b5         .         .         e5        f5
6    a6         b6         .         .         e6        f6

Appending Data: Combined Files
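The same pattern can be reproduced with two small made-up files. The file names, variable names and values below are invented for illustration, and the files are written to the current directory.

```stata
* File 1: subjects 1 to 3, with an extra variable x1.
clear
input id common x1
1 10 100
2 20 200
3 30 300
end
save "file1", replace

* File 2: subjects 4 to 6, with a different extra variable x2.
clear
input id common x2
4 40 4000
5 50 5000
6 60 6000
end
save "file2", replace

* Stack file 2 underneath file 1.
use "file1", clear
append using "file2"
list
```

Variables that exist in only one of the files (x1 and x2 here) are filled with missing values for the observations that came from the other file.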


Merging Data

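- Need an identifier (one or more variables on which to match observations)
- The old merge syntax required both files to be sorted by this identifier; merge 1:1 and the related forms sort the data themselves if necessary
- All observations from both files are used
- Variable _merge says whether the observation was in the first file (1), the second file (2) or both (3)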


Merging Files: example

idno   var1   var2
1      a1     b1
2      a2     b2
3      a3     b3

Merging Data: File 1

idno   var3   var4
1      c1     d1
3      c3     d3
4      c4     d4

Merging Data: File 2


Merging Files: example

idno   var1   var2   var3   var4   _merge
1      a1     b1     c1     d1     3
2      a2     b2     .      .      1
3      a3     b3     c3     d3     3
4      .      .      c4     d4     2

Merging Data: Combined Files
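A sketch that reproduces this example with numeric stand-ins for the values. The file names merge1 and merge2 and the numbers are invented for illustration, and the files are written to the current directory.

```stata
* File 1: subjects 1, 2 and 3.
clear
input idno var1 var2
1 11 12
2 21 22
3 31 32
end
save "merge1", replace

* File 2: subjects 1, 3 and 4.
clear
input idno var3 var4
1 13 14
3 33 34
4 43 44
end
save "merge2", replace

* Match the two files on idno.
use "merge1", clear
merge 1:1 idno using "merge2"

sort idno
list
tabulate _merge
```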


Ensuring Uniqueness

- Usually, there should only be one observation per unique identifier
- This may not be the case (e.g. adding family-level data to individual-level data)
- If there should be one observation per identifier in both datasets, use the command merge 1:1
- If each record in the current dataset corresponds to several in the merged dataset, use merge 1:m
- Equally, there are merge m:1 and merge m:m commands
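The family-level example mentioned above could be sketched as follows. All names and values are made up: families is a hypothetical file with one row per family, and the data in memory have one row per individual.

```stata
* Family-level file: one row per family.
clear
input famid income
1 30000
2 45000
end
save "families", replace

* Individual-level data: several individuals can share a family.
clear
input idno famid
1 1
2 1
3 2
end

* Many individuals to one family row, hence m:1.
merge m:1 famid using "families"

sort idno
list
```

Had merge 1:1 been used here, Stata would have stopped with an error because famid does not uniquely identify observations in the individual-level data, which is the safeguard the slide describes.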


browse & edit

- Can open a data editor window with browse
- Can choose variables to browse with browse varlist
- Cannot modify data while browsing
- edit allows data to be changed: don't use it


preserve & restore

- You may wish to change your data temporarily
- E.g. collapse to means by group
- Type preserve before changing data, restore after
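The collapse example could be written as below, assuming the auto example dataset, with foreign as the grouping variable.

```stata
sysuse auto, clear

* Take a snapshot of the data in memory.
preserve

* Replace the data with one row per group, holding the group means.
collapse (mean) price mpg, by(foreign)
list

* Bring back the original data.
restore

describe, short
```

Between preserve and restore the dataset has only two observations; afterwards the full dataset is back in memory.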
