Stata version 14 (also works for versions 13 & 12) · people.umass.edu/biep640w/pdf/stata v 14 lab sessio… · 2016-02-01 · Teaching\stata\stata version 14\stata v 14 lab session 1.docx

Apr 22, 2018
  • Stata version 14 Lab Session 1 February 2016

    (mac) 1. Teaching\stata\stata version 14\stata v 14 lab session 1.docx Page 1 of 20

    Stata version 14 Also works for versions 13 & 12

    Lab Session 1 February 2016

    1. Preliminary: How to Screen Capture
    2. Preliminary: How to Keep a Log of Your Stata Session
    3. Preliminary: How to Save a Stata Graph
    4. Enter Data: Create a New Data Set in Stata
    5. Enter Data: How to Import an Excel Data Set
    6. Import a Stata Data Set Directly from the Internet
    7. Describe Your Data - Numerical Descriptions
    8. Describe Your Data - Graphical Descriptions
    9. One and Two Sample Inference
    10. Simple and Multiple Linear Regression

    Please note I do a lot of comments! You will see many of my commands that begin with an asterisk. I've put some of these in green (but not all) so that they are easier to see. Commands in STATA that begin with an asterisk (*) are comments. While recommended, you don't actually have to type these comments.

    1. Preliminary: How to Screen Capture

    When to Screen Capture

    Screen capture is useful when you want to capture a picture to be pasted elsewhere. The picture might be what is on your screen following a Google image search. Or it might be what is on your screen after obtaining an error message. There are two steps: (1) capturing the picture, followed by (2) pasting the picture elsewhere.

    Step 1: Capture the Picture (PC Users)

    Step 1: Capture the Picture (for MAC Users)

    Step 2: Paste the Picture
    __1. Launch WORD
    __2. Scroll to the location where the picture is to be placed. Tip: Make sure there is a blank line above this location.
    __3. Use the TABLE commands to create a table that has 1 row and 1 column
    __4. Position the cursor inside your table. Tip: Center the cursor so that your picture will be centered.
    __5. Use INSERT > PICTURE > FROM FILE to insert your picture.

    YOUR TURN:
    __a. Launch Word. Create an empty Word document called lab1.doc
    __b. Using your browser, go to the welcome page for BIOSTATS 640
    __c. Capture the picture of the lighthouse
    __d. Paste into lab1.doc

    2. Preliminary: How to Keep a Log of Your Stata Session

    When to Keep a Log of Your Stata Session

    The short answer is: ALWAYS! A log is both an archive of your work that can be reproduced later and a record of what you have learned how to do (thus saving you from having to re-learn it later!). It is also useful in report writing. You can import sections of your log, results mostly, into the report that you write. A Stata log can be saved in either of two formats: .smcl or .log

    .smcl: smcl stands for Stata Markup and Control Language. This format preserves all the Stata formatting and controls.

    .log: This is a plain text format. This format is easily imported into MS Word or Notepad.

    Tip: Save your log in format .log

    Step 1: From the main menu at upper left: FILE > LOG > BEGIN

    Step 2: Click on drop down menu at right of FILE FORMAT:

    Step 3: Choose Stata Log

    Step 4: With a name of your choosing, enter name at SAVE AS: Then click SAVE

    YOUR TURN:
    __a. Launch Stata
    __b. Start a log of your lab session, using file format .log, and name it stata_lab1
    Tip: After you select File Format: Stata Log, it is not necessary to type the extension .log. Stata will do this for you.
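
    The menu steps above also have a command equivalent that you can type in the command window. The following is a minimal sketch, assuming you want the log written to Stata's current working directory; the display line is only a placeholder for your own commands.

    ```stata
    * Close any log that is already open (capture hides the error if none is open)
    capture log close
    * Start a plain-text log; the option text gives the .log format
    log using "stata_lab1.log", text replace
    display "my lab session commands go here"
    * Stop logging and close the file
    log close
    ```

    The option replace overwrites an existing stata_lab1.log. Use the option append instead if you would rather add to a log that you started earlier.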

    3. Preliminary: How to Save a Stata Graph

    When to Save a Stata Graph

    Again, the short answer is: ALWAYS! Surely you will want to display your graph somewhere, yes?

    Step 1: Create your graph (for now just follow along by typing the following in the command window)

        * Command sysuse to open a data set that is internal to Stata.
        sysuse auto
        graph twoway (scatter price mpg) (lfit price mpg), title("Simple Linear Regression")

    Step 2: With the graph window still active, click on the SAVE icon.

    Step 3: From drop down menu for File format: Choose PORTABLE NETWORK GRAPHICS (*.png)

    Step 4: Using a name of your choosing, enter your choices at SAVE AS and at WHERE. Then click on the SAVE icon.

    YOUR TURN (Be sure that you are still using the Stata data set auto.dta):
    __a. Launch Stata.
    __b. In the command window, type: histogram foreign, discrete
    __c. Save the graph to your desktop under the name foreign_bar.png
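
    Saving a graph can also be done by command instead of the SAVE icon. The sketch below uses graph export and assumes that the current working directory (not necessarily your desktop) is where you want the file; the file format is taken from the .png extension.

    ```stata
    * Open the built-in data set and draw the bar graph
    sysuse auto, clear
    histogram foreign, discrete
    * Write the graph currently on screen to a PNG file in the working directory
    graph export "foreign_bar.png", replace
    ```

    To write to another folder, give a full path inside the quotes.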

    4. Enter Data: Create a New Data Set in Stata

    Most of the time, you will import data into Stata for analysis from an Excel file or from a Stata data set available on the internet. Once in a while, however, you may want to create a new data set by launching the Data Editor in Stata. Tip: The Data Editor icon is located on the top, horizontal, navigation bar.

    YOUR TURN: Create a Stata Data Set Containing the Following Data

    Follow the commands below to create a data set containing the n=4 observations on the variables id, dob, gender, and weight that are shown below.

        id         dob          gender              weight
        numeric    date         string/character    numeric
        1          3/26/1926    male                161.3
        2          6/9/1956     female              120.1
        3          4/1/1954     male                223.2
        4          11/4/1951    female              124.0

    Tip: The type of variable matters!! In this illustration, you will be creating three different types of variables: numeric, string, and date.

    Launch Stata. Then type the following two commands into the command window:

        clear
        set more off

    Then follow along, issuing the following commands, to create your Stata data set

        . * STEP 1: Define your variables (lower case recommended), set type, and initialize
        . generate id=.
        . generate str8 dob_string=""
        . generate str8 gender=""
        . generate weight=.
        . * STEP 2: Click on DATA EDITOR icon. This will bring you to an empty spreadsheet
        . * ---- Enter the data shown above. Then close the data editor window ---*
        . * STEP 3: Create dob (date of birth) that is a Stata date variable
        . * For date variables with year in 4 digits, use function date with option MDY
        . generate dob=date(dob_string, "MDY")
        . format dob %tdNN/DD/CCYY
        . drop dob_string
        . list
        . * STEP 4: Create 0/1 indicator of female gender
        . generate female=(gender=="female")
        . list
        . * STEP 5: Attach labels to variable names
        . label variable id "Subject id"
        . label variable weight "weight (lbs)"
        . label variable dob "Date of birth"
        . label variable female "0/1 female"
        . * STEP 6: Define dictionary of discrete variable values
        . label define femalef 0 "male" 1 "female"
        . * STEP 7: Attach coding labels to discrete variable values
        . label values female femalef
        . list
        . * To see data with numeric labels provided
        . numlabel, add
        . list
        . * To drop display of numeric labels
        . numlabel, remove
        . list
        . * STEP 8 - Save data set using FILE > SAVE AS
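
    If you would rather not type into the Data Editor, the same four observations can be entered with the command input. This is a sketch of an alternative route, not the method used in this lab; the string widths str10 and str6 are my own choice, picked so that the longest values fit.

    ```stata
    clear
    * Type the data directly; quotes mark the string values
    input id str10 dob_string str6 gender weight
    1 "3/26/1926" "male"   161.3
    2 "6/9/1956"  "female" 120.1
    3 "4/1/1954"  "male"   223.2
    4 "11/4/1951" "female" 124.0
    end
    * Convert the string date (4-digit year) to a Stata date, as in STEP 3
    generate dob = date(dob_string, "MDY")
    format dob %tdNN/DD/CCYY
    drop dob_string
    * 0/1 indicator of female gender, as in STEP 4
    generate female = (gender == "female")
    list
    ```

    Because every value is typed in the commands themselves, this version can be kept in a do-file and rerun later to recreate the data set exactly.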

    5. Enter Data: How to Import an Excel Data Set

    Beware - Take care that the data types are correct, especially for date variables!

    There are multiple methods for importing an Excel data set: (1) copy and paste; (2) importing the Excel spreadsheet; and (3) using StatTransfer (the simplest, but requires purchase of StatTransfer).

    METHOD 1: Copy (from Excel) and Paste (into Stata)

    Step 1: Launch Excel. Open the file stata_lab1.xls. You should see:

    Step 2: In Excel, use FORMAT > CELLS to format each column of data (numeric, text, custom, etc.)
    Tips: (1) For each variable, position cursor over the letter of the column (e.g., column A for formatting the variable ID). (2) If you format a column in Excel as a date variable, it will NOT import correctly into Stata. You must format it as type = custom.

        Variable    Format Cells as
        id          numeric, then choose 0 places after the decimal point
        dob         custom, then choose the type: m/d/yy
        gender      text
        weight      numeric, then choose 2 places after the decimal point

    Step 3: In Excel, select the data to be copied, including row headings with variable names. Use EDIT > COPY to complete selection.

    Step 4: Launch Stata. From the tool bar, click on the icon DATA EDITOR

    Step 5: (a) Position cursor in cell Var1[1]. From the menu bar, use EDIT > PASTE SPECIAL to paste data here. (b) Important - be sure to check the box next to: Treat first row as variable names

    You should now see the following:

    Step 6: Close the DATA EDITOR window. Tip: Don't worry; data is not lost. You will save it later.

    Step 7: In the command window, issue the following commands to convert the Excel date variable (which fails to import as a date variable) into a bona fide Stata date variable. For example:

        * For dates with year recorded in 2 digits, all in the 1900s:
        * Use the function date(stringvariable, "MD19Y") with option MD19Y in quotes.
        generate dob2 = date(dob, "MD19Y")
        format dob2 %tdNN/DD/CCYY
        drop dob
        rename dob2 dob
        describe

    Step 8: From the menu bar, save your Stata data set using FILE > SAVE AS
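
    To see why the option MD19Y matters for 2-digit years, you can try the function date on a single value in the command window. A small sketch, using the first date of birth from this lab:

    ```stata
    * With a 2-digit year and plain "MDY", Stata cannot tell the century
    * and returns missing (.)
    display date("3/26/26", "MDY")
    * "MD19Y" says the 2-digit year belongs to the 1900s
    display %tdNN/DD/CCYY date("3/26/26", "MD19Y")
    * Check against mdy(month, day, year): displays 1 when the two agree
    display date("3/26/26", "MD19Y") == mdy(3, 26, 1926)
    ```

    If your years were all in the 2000s, the corresponding option would be MD20Y.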

    METHOD 2: Importing the Excel spreadsheet

    Tip: Before you do this, make sure that you have previously formatted (and saved) your data types in Excel! See METHOD 1.

    Step 1: Launch Stata. Click on FILE > IMPORT > EXCEL SPREADSHEET (*.xls, *.xlsx)

    Step 2: (a) Click BROWSE to locate file; (b) check the box IMPORT 1st row as variable names; (c) click OK

    You should see the following (but with your path and name, not mine) in your results window:

    Step 3: In the command window, issue the following commands to convert the Excel date variable (which fails to import as a date variable) into a bona fide Stata date variable. For example:

        * For dates with year recorded in 2 digits, all in the 1900s:
        * Use the function date(stringvariable, "MD19Y") with option MD19Y in quotes.
        generate dob2 = date(dob, "MD19Y")
        format dob2 %tdNN/DD/CCYY
        drop dob
        rename dob2 dob
        describe

    Step 4: From the menu bar, save your Stata data set using FILE > SAVE AS
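
    METHOD 2 can also be run from the command window with the command import excel. The sketch below is self-contained: it first writes a small stand-in workbook (stata_lab1_demo.xls is a hypothetical name chosen for this illustration) and then reads it back. With your own stata_lab1.xls you would skip the first part. It assumes that dob arrives in Stata as a string; if Excel stored the column as a true date, it may arrive as a numeric date and the conversion would not be needed.

    ```stata
    * Part 1: build a stand-in workbook so that this sketch runs on its own
    clear
    input id str8 dob str6 gender weight
    1 "3/26/26" "male"   161.3
    2 "6/9/56"  "female" 120.1
    3 "4/1/54"  "male"   223.2
    4 "11/4/51" "female" 124.0
    end
    export excel using "stata_lab1_demo.xls", firstrow(variables) replace

    * Part 2: import, treating the first row as variable names
    import excel using "stata_lab1_demo.xls", firstrow clear
    describe
    * Convert the string date with 2-digit year (all in the 1900s)
    generate dob2 = date(dob, "MD19Y")
    format dob2 %tdNN/DD/CCYY
    drop dob
    rename dob2 dob
    list
    ```

    The option firstrow plays the role of the check box IMPORT 1st row as variable names.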

    YOUR TURN: Create an Excel data set. Bring it into Stata by method 1 or 2. Save.
    __1. Launch Excel.
    __2. Create an Excel data set called stata_lab1.xls
    __3. Create a Stata data set stata_lab1.dta using your Excel data.
    __4. Save your Stata data set.

    Here is the Excel data for you. Note that the dob variable has year in 2 digits, not 4 digits.

        id         dob        gender              weight
        numeric    date       string/character    numeric
        1          3/26/26    male                161.3
        2          6/9/56     female              120.1
        3          4/1/54     male                223.2
        4          11/4/51    female              124.0

    6. Import a Stata Data Set Directly from the Internet

    When to Import a Stata Data Set

    You will do this often for class and perhaps not so often in your work.

    Tips: (1) Be sure to enclose the url in quotes; (2) be sure to use the option clear after the comma; and (3) be sure to save the data onto your computer so that you have it for your use later.

    Use the command use to import a Stata data set. Be sure to enclose the full url path in quotes. The basic command is of the following form and is issued in the command window:

        use "http://fullurlpath", clear

    To save the data onto your computer, from the top menu bar issue: FILE > SAVE AS. Tip!! Be sure to include the extension .dta in the name of the data set; see examples below.

    Examples:

        use "http://www.pauldickman.com/survival/ivf.dta", clear
        use "http://people.umass.edu/biep640w/datasets/week02.dta", clear
        use "http://people.umass.edu/biep640w/datasets/larvae.dta", clear

    YOUR TURN: Import ivf.dta from the internet
    __1. Launch Stata (if you have not already done so)
    __2. In the command window, type: set more off
    __3. In the command window, type: use "http://www.pauldickman.com/survival/ivf.dta", clear
    __4. Use FILE > SAVE AS to save it to your computer

    7. Describe Your Data - Numerical Descriptions

    YOUR TURN: The following is a session that you can duplicate on your own. Tip - Commands that begin with an asterisk (*) are comments; I have not put them in green. The highlights in blue are my doing, not Stata's.

        . * Import data set from the internet - Use the command use "fullurlpath", remembering the quotes.
        . use "http://www.pauldickman.com/survival/ivf.dta", clear
        . * Reorder the variables so that they are in alphabetic order - Use the command aorder
        . aorder
        . * Descriptives for discrete variable - Use the command tab1
        . tab1 sex, missing
        . * Descriptives for discrete variables with display of labels & missing values - Use numlabel, add
        . numlabel, add
        . tab1 sex, missing
        . * Descriptives for continuous variable - Use either tabstat or summarize
        . summarize bweight
        . summarize bweight, detail
        . tabstat bweight, stat(n mean sd sem min q max cv) missing
        . * Descriptives for TWO discrete variables - Use the command tab2
        . * tab2 rowvariable columnvariable, options
        . tab2 sex hyp
        . tab2 sex hyp, row
        . tab2 sex hyp, row column cell exact chi2
        . * Descriptives for ONE continuous variable, by groups defined by a discrete variable
        . * Sorting by the discrete variable first is optional for tabstat with option by( )
        . sort sex
        . tabstat bweight, by(sex) col(stat) stat(n mean sd sem min q max)
        . * use option LONGSTUB if you want to be reminded of the variable you are summarizing
        . tabstat bweight, by(sex) col(stat) stat(n mean sd sem min q max) longstub
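
    If you want to practice these commands without an internet connection, the same ideas can be tried on the built-in data set auto. This is only a sketch on different data: price and foreign stand in for bweight and sex. It also uses order _all, alphabetic, which Stata documents as the replacement for aorder from version 11 onward (aorder still works).

    ```stata
    sysuse auto, clear
    * Put the variables in alphabetic order
    order _all, alphabetic
    describe, simple
    * One discrete variable
    tab1 foreign, missing
    * One continuous variable, overall and by group
    summarize price, detail
    tabstat price, by(foreign) col(stat) stat(n mean sd min q max)
    ```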

    8. Describe Your Data - Graphical Descriptions

    Note: For each graph, I have shown you 2 examples. The first is the simple and very basic graph. The second is the same graph with various aesthetics added (for example titles, x and y-axis tick marks, etc.)

        . * Tell Stata what graph scheme you want to use. I like the scheme s1color
        . set scheme s1color
        . ****** Bar Graphs ******
        . * BAR GRAPH, simple - Use the command histogram with the option discrete
        . histogram hyp, discrete
        . * BAR GRAPH, fancy - Use the command histogram with the option discrete
        . histogram hyp, discrete percent addlabels ylabel(0(20)100) xlabel(0 "Normotensive" 1 "Hypertensive") gap(50) title("Bar Graph -Hyp") subtitle("n=639") caption("hyp_barchart.png")
        . ****** Dot Plots ******
        . * DOT PLOT, simple - Use the command dotplot
        . dotplot matage
        . * DOT PLOT, fancy - Use the command dotplot
        . dotplot matage, center msize(vsmall) xlabel(1 "All") title("Distribution of Maternal Age") subtitle("n=641") caption("dot_matage.png", size(vsmall))
        . * DOT PLOT for more than 1 group, simple - Use the command dotplot with option over( ).
        . * Sorting first is optional.
        . sort sex
        . dotplot matage, over(sex)
        . * DOT PLOT for more than 1 group, fancy.
        . sort sex
        . dotplot matage, over(sex) title("Distribution of Maternal Age") subtitle("by infant sex") caption("dot2_matage.png", size(vsmall))
        . ****** Box Plots ******
        . * BOX PLOT for more than 1 group, simple - Use the command graph box with option over( ).
        . sort sex
        . graph box matage, over(sex)
        . * BOX PLOT for more than 1 group, fancy.
        . sort sex
        . graph box matage, over(sex) title("Distribution of Maternal Age") subtitle("by infant sex") caption("box2_matage.png", size(vsmall))
        . * HORIZONTAL BOX PLOT for more than 1 group, simple - Use the command graph hbox
        . graph hbox matage, over(sex)
        . * HORIZONTAL BOX PLOT for more than 1 group, fancy
        . graph hbox matage, over(sex) title("Distribution of Maternal Age") subtitle("by infant sex") caption("hbox2_matage.png", size(vsmall))
        . ****** Histograms ******
        . * HISTOGRAM - Preliminary: It's a good idea to obtain min and max
        . tabstat matage, stat(min max)
        . * HISTOGRAM, simple - Use the command histogram
        . histogram matage
        . * HISTOGRAM, fancy - Use the command histogram
        . histogram matage, width(5) start(20) percent ylabel(0(10)50) addlabels title("Distribution of Maternal Age") subtitle("n=641") caption("histogram_matage.png", size(vsmall))
        . ****** Scatterplots ******
        . * X-Y SCATTERPLOT - Preliminary: Obtain min and max of the X and Y variables
        . tabstat matage gestwks, stat(min max)
        . * X-Y SCATTERPLOT, simple - Use the command graph twoway (scatter yvariable xvariable)
        . graph twoway (scatter gestwks matage)
        . * X-Y SCATTERPLOT, fancy
        . graph twoway (scatter gestwks matage, msymbol(d) msize(vsmall)), xlabel(20(5)45) ylabel(20(5)45) title("Scatterplot") ytitle("Weeks Gestation",size(small)) caption("scatter.png", size(vsmall))
        . * X-Y SCATTERPLOT WITH OVERLAY LINEAR FIT, simple: graph twoway (lfit yvariable xvariable)
        . graph twoway (scatter gestwks matage) (lfit gestwks matage)
        . * X-Y SCATTERPLOT WITH OVERLAY LINEAR FIT, fancy
        . graph twoway (scatter gestwks matage, msymbol(d) msize(vsmall)) (lfit gestwks matage), xlabel(20(5)45) ylabel(20(5)45) legend(off) title("Scatterplot") subtitle("with overlay linear fit") ytitle("Weeks Gestation",size(small)) caption("lfit.png", size(vsmall))
        . * X-Y SCATTERPLOT WITH OVERLAY FIT & 95% CI, simple: graph twoway (lfitci yvariable xvariable)
        . graph twoway (scatter gestwks matage) (lfitci gestwks matage)
        . * X-Y SCATTERPLOT WITH OVERLAY FIT AND 95% CI, fancy
        . graph twoway (scatter gestwks matage, msymbol(d) msize(vsmall)) (lfitci gestwks matage), xlabel(20(5)45) ylabel(20(5)45) legend(off) title("Scatterplot") subtitle("with overlay linear fit and 95% CI") ytitle("Weeks Gestation",size(small)) caption("lfitci.png", size(vsmall))
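
    The fancy commands above are long. If you keep your commands in a do-file rather than typing them in the command window, you can break one command over several lines by ending each unfinished line with three slashes (///). A sketch on the built-in data set auto, where price and mpg stand in for gestwks and matage:

    ```stata
    sysuse auto, clear
    * One graph command spread over four lines; /// means "continued on next line"
    graph twoway (scatter price mpg, msymbol(d) msize(vsmall)) ///
                 (lfitci price mpg),                           ///
                 legend(off) title("Scatterplot")              ///
                 subtitle("with overlay linear fit and 95% CI")
    ```

    Note that /// is understood in do-files only. In the command window, type the whole command on one line.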

    9. One and Two Sample Inference

    YOUR TURN: Again, follow along!

        . ** ONE CONTINUOUS VARIABLE - 99% CI for mean using command ci with option level(99)
        . ci gestwks, level(99)
        . ** ONE CONTINUOUS VARIABLE - Test of null: mean = 40 using command ttest
        . ttest gestwks=40
        . ** ONE CONTINUOUS VARIABLE - Test of null: standard deviation = 1 using command sdtest
        . sdtest gestwks=1
        . ** TWO CONTINUOUS VARIABLES - Test of null: Equality of 2 INDEPENDENT means using ttest
        . sort sex
        . ttest gestwks, by(sex)
        . ttest gestwks, by(sex) unequal
        . ** TWO CONTINUOUS VARIABLES - Test of equality of 2 independent variances using sdtest
        . sdtest gestwks, by(sex)
        . ** 1 DISCRETE (0/1) VARIABLE - Test of binomial proportion using command bitest and prtest
        . ** Test of null: proportion of female births = .50
        . generate female=(sex==2)
        . bitest female=.50
        . prtest female=.50
        . ** 1 DISCRETE (0/1) VARIABLE - 95% CI for event probability using ci with option binomial
        . ci female, binomial level(95)
        . ** 2 DISCRETE (0/1) VARIABLES - Test of equality of probabilities: prtest with option by( )
        . sort sex
        . prtest hyp, by(sex)
        . ** 2 DISCRETE VARIABLES (any # rows and # columns) - chi square test: tab2 with option chi2
        . tab2 sex hyp, row column chi2
        . ** 2 DISCRETE VARIABLES (any # rows and # columns) - fisher exact test: tab2 with option exact
        . tab2 sex hyp, row column exact
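
    After a test command runs, Stata keeps the key numbers as stored results that you can display or reuse, which is handy for report writing. A minimal sketch on the built-in data set auto, where mpg and foreign stand in for gestwks and sex:

    ```stata
    sysuse auto, clear
    * Two independent groups: test of equality of means
    ttest mpg, by(foreign)
    * List everything the command stored
    return list
    * Pick out individual results
    display "difference in means = " %6.3f r(mu_1) - r(mu_2)
    display "two-sided p-value   = " %6.4f r(p)
    ```

    The same idea applies to the other commands in this section: type return list right after the command to see what it stored.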

    10. Simple and Multiple Linear Regression

        . clear
        . * Data are from Vittinghoff et al. Regression Methods in Biostatistics
        . * Y=glucose X=physact (5 levels), BMI
        . use "http://people.umass.edu/biep640w/datasets/hersdata1000.dta", clear
        . ** descriptives of glucose, by levels of physical activity
        . sort physact
        . tabstat glucose, by(physact) col(stat) stat(n mean sd min p50 max)
        . ** descriptive graphs
        . dotplot glucose, over(physact) center msymbol(d) msize(tiny) xlabel(1 "1" 2 "2" 3 "3" 4 "4" 5 "5")
        . graph matrix glucose age BMI, half msize(tiny)
        . ** Assessment of normality of Y=glucose
        . swilk glucose
        . sfrancia glucose
        . tabstat glucose, stat(min max)
        . histogram glucose, start(0) width(25) percent addlabels normal
        . ** One predictor - continuous
        . regress glucose BMI
        . ** One predictor - nominal physical activity with design variables
        . xi: regress glucose i.physact
        . ** multiple predictor model with both BMI and physact (with design variables)
        . xi: regress glucose BMI i.physact
        . ** Partial F test of BMI controlling for physact: 1 df Partial F
        . testparm BMI
        . ** Partial F test of physical activity controlling for BMI: 4 df Partial F
        . testparm _Iphysact*
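
    In Stata 11 and later, regress also accepts the i. prefix directly (factor-variable notation), so the xi: prefix is optional. The sketch below shows that alternative on the built-in data set auto, not on the HERS data: price, mpg, and rep78 (5 levels) stand in for glucose, BMI, and physact.

    ```stata
    sysuse auto, clear
    * Continuous predictor plus a 5-level nominal predictor, no xi: needed
    regress price mpg i.rep78
    * Partial F test of the continuous predictor: 1 df
    testparm mpg
    * Partial F test of the nominal predictor: 4 df
    testparm i.rep78
    ```

    With this notation no _I design variables are created in the data set, so the partial F test names the factor itself (i.rep78) rather than _Irep78*.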