Stata: Getting Started and Being Productive with VA Data
Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
--Abraham Lincoln
Todd Wagner
June 2007
Outline
Getting data into Stata
Editing in Stata
How does Stata handle data
Stata notation and help
Using Stata and basic Stata commands
Transferring Data
Stat/Transfer or DBMS/Copy work
Stat/Transfer often seeks to optimize the Stata dataset by default
– If transferring data with SCRSSN, force Stat/Transfer to transfer SCRSSN as double precision
Stat/Transfer: click on DOUBLE
Editing in Stata
Any ASCII text editor will work
Stata has a built-in text editor, but it is limited
I recommend using another text editor
How Does Stata Handle Data
SAS processes one record at a time
Stata processes all the records at the same time
– Loops are commonly used in SAS
– Loops are very rarely used in Stata
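To make the "all records at the same time" point concrete, here is a minimal sketch. It assumes the auto dataset that ships with Stata as a stand-in for real data; the new variable names are hypothetical.

```stata
* Illustration only: auto.dta ships with Stata and stands in for real data.
sysuse auto, clear

* One statement computes the value for every observation; no loop is needed.
generate price_per_lb = price / weight

* A condition is applied to all rows with -if-, again without a loop.
generate byte expensive = price > 6000 if !missing(price)

summarize price_per_lb expensive
```

In SAS the same work would happen inside the data step as it reads each record; in Stata the statement itself acts on the whole column.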
Loading Data into Memory
Stata reads the data into memory
– set mem 100m (before you load the data)
You must have enough memory for your dataset
With large datasets:
– drop unnecessary variables
– Use the compress command (but don't compress SCRSSN)
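A small sketch of the compress advice, using made-up data so it runs anywhere. The variable names other than SCRSSN and all values are assumptions for illustration. The reason to protect SCRSSN is that a float holds integers exactly only up to 16,777,216, so a nine-digit identifier needs long or double storage.

```stata
* Toy data: values are invented; only the storage types matter here.
* On Stata 11 and earlier you would first run: set mem 100m
* (Stata 12 and later manage memory automatically.)
clear
set obs 3
generate double scrssn = 123456789 + _n
generate double cost   = 100 * _n
generate double age    = 60 + _n

* List every variable except scrssn, then compress only those.
ds scrssn, not
compress `r(varlist)'

* scrssn is still stored as double; cost and age are now smaller types.
describe
```

With a real file, loading only the variables you need (use varlist using filename) avoids reading the unnecessary ones in the first place.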
Stata Abbreviations
Most Stata commands can be abbreviated, often to the first three letters
– regress income education female
could be written
– reg income education female
Can also abbreviate variable names if the abbreviation is unique
– reg inc educ fem
Stata Help
Stata's built-in help is great
– help <command>
Stata manuals are great because they review theory
Stata and the Web
Stata is "web aware"
Check for updates periodically
– update all
You can search for user-written programs
– findit output
– findit outreg (click to install)
Stata in Windows
Page Up scrolls through the previous commands
There is a graphical user interface (menus) if you forget a command
We have Stata on rocky and tasha: no graphical capabilities, no menus, and loss of some shortcuts
Using Stata
Create batch files called ".do" files
I work interactively
– Run Stata and create do file as I go
– I can then use the do file as needed
Debugging code and exploratory data analysis are very fast in Stata
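One possible shape for such a do-file, as a sketch rather than a prescribed layout. The file names example.do and example.log are hypothetical, and the auto dataset stands in for your own data.

```stata
* example.do: a minimal do-file skeleton (file names are hypothetical)
clear
set more off

* Close any log left open by an earlier run, then start a fresh text log.
capture log close
log using example.log, replace text

* Stand-in for your own -use- command.
sysuse auto, clear

* Commands copied over from the interactive session once they work.
summarize price mpg
regress price mpg weight

log close
```

Run it with do example from the command line in Stata; the log keeps the commands and their output together.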
Sysdir, ls and cd
Stata recognizes some Unix commands, such as ls and cd
sysdir provides a listing of Stata's system directories
SAS recognizes ";" as a delimiter
Stata recognizes the carriage return
– Always add a carriage return after your last command
You can change the delimiter to ";"
– #delimit ;
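A short sketch of switching delimiters, assuming the auto dataset as stand-in data. Note that #delimit works only inside a do-file, not at the interactive prompt.

```stata
* Run this from a do-file.
sysuse auto, clear

* Switch to semicolon mode so one command can span several lines.
#delimit ;
regress price mpg
    weight foreign ;

* Switch back to carriage-return mode. ;
#delimit cr

* Alternative: the three-slash marker joins lines without changing modes.
regress price mpg ///
    weight foreign
```

Both regress commands are identical; the second form is often easier to mix with ordinary one-line commands.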
Missing Data
Stata and SAS both use "." as missing
Stata implicitly values a missing as a very large number
SAS implicitly values a missing as a very small number
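Because Stata treats missing as larger than any number, a "greater than" condition silently picks up the missing values. A minimal sketch, assuming the auto dataset (its rep78 variable has some missing values):

```stata
sysuse auto, clear

* How many observations have rep78 missing?
count if missing(rep78)

* Pitfall: missing is treated as very large, so these rows are counted too.
count if rep78 > 3

* Safer: exclude missing values explicitly.
count if rep78 > 3 & !missing(rep78)
```

The second count is larger than the third by exactly the number of missing values. SAS users should expect the opposite surprise with "less than" conditions.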
Generating and Recoding Variables
In SAS you type
quality=0;
if VA=1 then quality=1;
In Stata you type
gen quality=0
recode quality 0=1 if VA==1
or
replace quality=1 if VA==1
Boolean Logic
Stata is picky about Boolean logic
gen y=x if a==b (must use == to test equality)
gen y=x if a>b & b>10 (must use &)
gen y=x if a<=b (< or > must come before =)
Creating Dummy Variables
Goal: create dummy variable for each DRG
gen drgnum1=drg==1
or
tab drg, gen(drgnum)
This second command automatically creates a dummy variable for each DRG value
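The two approaches can be compared on the auto dataset, which is assumed here as a stand-in with rep78 playing the role of DRG. One difference worth knowing: gen d=x==1 sets the dummy to 0 when x is missing, while tab, gen() leaves it missing.

```stata
sysuse auto, clear

* One dummy at a time; the -if- keeps missing rep78 from being coded as 0.
generate byte rep_is3 = rep78 == 3 if !missing(rep78)

* All categories at once: creates repcat1, repcat2, ... one per value.
tabulate rep78, generate(repcat)

describe repcat*
```

The new variables are numbered by the rank of the category, not by its value, so check the variable labels before using them.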
Drop
drop <varnames> (drops variables)
drop if X==1 (drops cases where the value of X is 1)
egen Commands
You want to generate total costs for a medical center
In SAS this is done by proc summary
In Stata, you can type
collapse (sum) cost, by(sta3n)
or
sort sta3n
by sta3n: egen sumcost=total(cost)
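The two commands give the same totals but leave the data in different shapes. A sketch on the auto dataset, assumed as a stand-in with foreign in place of sta3n and price in place of cost:

```stata
sysuse auto, clear

* egen adds the group total to every row; all observations are kept.
bysort foreign: egen sumprice = total(price)
list foreign price sumprice in 1/5

* collapse replaces the data in memory with one row per group.
collapse (sum) price, by(foreign)
list
```

Use egen when the total is needed alongside the record-level data, and collapse when only the summary dataset is needed. Save your data first, since collapse discards the original records from memory.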
ICD-9 Codes
Stata has capabilities to handle ICD-9 diagnosis and procedure codes
You can
– check to see if codes are valid
– generate identifiers based on codes or ranges of codes
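A sketch of both tasks with the icd9 command for diagnosis codes (icd9p is the counterpart for procedure codes). The variable name dx1 and the toy codes are assumptions for illustration; check help icd9 in your release for the exact options.

```stata
* Toy data: dx1 is a hypothetical string variable of ICD-9 diagnosis codes.
clear
input str6 dx1
"250.00"
"410.91"
"401.9"
end

* Report any codes that are not valid ICD-9 diagnosis codes.
icd9 check dx1

* Flag diabetes: 1 if the code begins with 250, otherwise 0.
icd9 generate diabetes = dx1, range(250*)

list
```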
Dates
Same date functions as SAS
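As in SAS, a Stata date is a count of days since January 1, 1960, so date arithmetic is plain subtraction. A minimal sketch with made-up dates; the variable names are hypothetical, and the uppercase "DMY" mask assumes Stata 10 or later.

```stata
clear
set obs 1

* Build dates from components or from text.
generate admitday = mdy(6, 15, 2007)
generate disday   = date("20jun2007", "DMY")

* Display the day counts as readable dates.
format admitday disday %td

* Length of stay is the difference in days.
generate los = disday - admitday

* Extract a component of the date.
generate admit_year = year(admitday)

list
```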
Combining Data
Merge
– this automatically creates a variable called _merge
– _merge==1 obs. from master data
– _merge==2 obs. from only one using dataset
– _merge==3 obs. from at least two datasets, master or using
merge scrssn admitday disday using data_y
Append (stacking data)
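A self-contained sketch of merge and append on toy data. All values are made up, and temporary files stand in for real datasets. It uses the merge 1:1 form introduced in Stata 11; the older form shown on the slide requires both datasets to be sorted by the key variables first.

```stata
* Toy master data.
clear
input double scrssn cost
101 100
102 200
103 300
end
tempfile master
save `master'

* Toy using data.
clear
input double scrssn age
102 60
103 70
104 80
end
tempfile data_y
save `data_y'

* Match records on the key and inspect where each one came from.
use `master', clear
merge 1:1 scrssn using `data_y'
tabulate _merge

* Append stacks the second dataset under the first instead of matching.
use `master', clear
append using `data_y'
count
```

Always tabulate _merge after merging; unexpected counts in categories 1 or 2 usually point to a key problem.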
Explicit Subscripting
Identify the most recent encounter in an encounter database
gsort id -date (ascending sort by ID and reverse by date)
by id : gen n=_n (record counter from 1 to N per person)
by id : gen N=_N (total number of records per person)
gen select=n==1
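The same steps on a toy encounter file, as a runnable sketch. The ids and dates are made up, and the uppercase "DMY" mask assumes Stata 10 or later.

```stata
* Toy encounter data: two visits for person 1, one for person 2.
clear
input id str9 datestr
1 "01jan2007"
1 "15mar2007"
2 "10feb2007"
end
generate date = date(datestr, "DMY")
format date %td

* Within each id, put the most recent date first.
gsort id -date

* _n is the record's position within the by-group, _N the group size.
by id: generate n = _n
by id: generate N = _N

* The first record per person is the most recent encounter.
generate byte select = n == 1

list, sepby(id)
```

If a person has two records on the same date, their order within the tie is arbitrary, so add a tiebreaker variable to the sort when that matters.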
Using Stata
Stata Interface in Windows
Set, Clear and More
Set: sets system parameters
– Need to set memory size to open a database
– set mem 100m
Clear erases data from memory
When output is >1 page, you are asked to continue (use set more off to disable this)
Summarizing Data
. sum gender age educ
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      gender |      4085    1.496206    .5000468          1          2
         age |      4085     64.5601    9.451724         50         94
        educ |      4085    4.398286    1.662883          1          9
sum <varlist>, d provides more details on each variable
tabstat provides summary info, including totals
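The three summary commands side by side, as a sketch on the auto dataset (assumed as stand-in data):

```stata
sysuse auto, clear

* Basic summary: observations, mean, standard deviation, min and max.
summarize price mpg

* Detailed summary: adds percentiles, skewness and kurtosis.
summarize price, detail

* tabstat lets you choose the statistics, including totals, by group.
tabstat price weight, statistics(n mean sd sum) by(foreign)
```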
Outputs data to a delimited file
Delimited file can be read into Excel
Very flexible
Creates publishable tables