Introduction to SAS

Introduction to SAS. What is a data set? A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a.

Mar 31, 2015

Jasper Mix

Introduction to SAS

What is a data set?

• A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the data set in question.

There are three types of datasets

• Cross-sectional

• Time-Series

• Panel (combination of cross-sectional and time-series data sets)

Cross-Sectional Data

• Cross-sectional data refers to data collected by observing many subjects (such as individuals, firms or countries/regions) at the same point of time, or without regard to differences in time.

Members   Age   Wage   Years of schooling
John      40    100k   14
Paul      34    110k   17
Mary      28    75k    10
Tom       30    130k   16
Sara      37    50k    15

Time-Series Data

• A time series is a sequence of data points, typically measured at successive times spaced at uniform time intervals.

• Frequencies: daily, weekly, monthly, quarterly, annual

Year GDP xyz Inflation Rate

2004 34 3.2

2005 30 2.5

2006 37 2.7

2007 38 3

2008 41 2.9

2009 43 3.4

Panel Data

• Panel data, also called longitudinal data or cross-sectional time series data, are data where multiple cases (people, firms, countries, etc.) were observed at two or more time periods.

Person   Year   Income   Age   Sex
1        2003   1500     27    1
1        2004   1700     28    1
1        2005   2000     29    1
2        2003   2100     41    2
2        2004   2100     42    2
2        2005   2200     43    2

What should you know about your dataset?

• What type of dataset do you have?
• How many variables do you have?
• How many observations do you have?
• What kind of variables do you have?
– Numeric. A numerical variable is an observed response that is a numerical value.
– String. A string variable is any combination of one or more characters.
• Are there missing values?
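A minimal sketch of how these questions might be answered in SAS. It assumes a small data set built from the cross-sectional table shown earlier; the data set name members, the variable names, the wage values written in thousands, and the missing wage for Sara are illustrative assumptions and are not part of the slides.

```sas
/* Hypothetical data set based on the cross-sectional table.          */
/* Wage is entered in thousands; the missing wage (.) for Sara is     */
/* invented here only to show how missing values are reported.        */
data members;
   input name $ age wage schooling;
   datalines;
John 40 100 14
Paul 34 110 17
Mary 28 75 10
Tom 30 130 16
Sara 37 . 15
;
run;

/* Reports the number of observations, the number of variables,       */
/* and whether each variable is numeric (Num) or character (Char).    */
proc contents data=members;
run;

/* N counts non-missing values and NMISS counts missing values        */
/* for each numeric variable.                                         */
proc means data=members n nmiss;
run;
```

PROC CONTENTS addresses the questions about the number and kind of variables, while the NMISS column from PROC MEANS shows where values are missing.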

How to store your dataset?

• Microsoft Excel Spreadsheets

Accessing SAS Version 9.2

Click on ENGLISH 9.2

1. What does SAS look like?

[Screenshot of the SAS interface. Labeled parts: Editor window, Log window, Output window, Results window, Explorer window, the button that executes the program, and the button for new libraries.]

Anatomy of a SAS Program

(1) Data name statement
(2) Input statement (list of all variables to be read into the program)
(3) Transformation statements
(4) Datalines statement (copy & paste from Excel)
(5) Placement of data
(6) PROC statements
– Means
– Corr
– Reg
– Model
– Autoreg
(7) Run statement
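The seven parts listed above can be sketched in one short program. This is an illustration only: it reuses the values from the time-series table shown earlier, and the names gdpseries, gdp, inflation, and log_gdp are assumptions chosen for the example.

```sas
/* (1) Data name statement */
data gdpseries;
   /* (2) Input statement: variables in the order they appear in each row */
   input year gdp inflation;
   /* (3) Transformation statement: creates a new variable from gdp */
   log_gdp = log(gdp);
   /* (4) Datalines statement */
   datalines;
2004 34 3.2
2005 30 2.5
2006 37 2.7
2007 38 3
2008 41 2.9
2009 43 3.4
;
/* (5) The rows above are the data; the line with a semicolon ends them */

/* (6) PROC statement: descriptive statistics for the listed variables */
proc means data=gdpseries n mean std min max;
   var gdp inflation log_gdp;
/* (7) Run statement */
run;
```

Note that the transformation statement sits between the INPUT and DATALINES statements, so it is applied to every row as it is read.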

Examples

Spaghetti Sauce Program

data spaghettisauce;
input week qclassico pclassico qhunts phunts qnewmans pnewmans qprego pprego qprivl pprivl qragu pragu totalexp;
datalines;
1 29.478939 2.311685 80.401300 1.099910 16.557905 2.287727 117.049632 1.909622 41.363767 1.336341 276.759921 1.677577 937.542909
2 31.380376 2.305388 75.181905 1.105020 15.409280 2.299160 125.986697 1.847495 41.584220 1.311940 206.699207 1.777097 845.490186
3 31.762660 2.299778 69.281355 1.111880 17.952117 2.160420 123.057729 1.870962 34.458333 1.381772 218.231648 1.738040 846.008960
4 28.447741 2.341264 68.898908 1.108804 15.187799 2.321191 114.953810 1.932617 33.825571 1.362120 204.152369 1.752055 804.175192
5 27.772665 2.340832 77.208027 1.080379 15.651408 2.249415 113.247798 1.920066 35.508482 1.356528 180.526273 1.846330 782.554156
6 28.251703 2.362670 96.507708 1.060712 16.511740 2.246208 125.877846 1.899778 61.823708 1.133252 217.893723 1.812694 910.381585
7 26.947404 2.368843 84.284722 1.048924 16.802342 2.203934 120.413152 1.877365 37.019864 1.353188 222.342183 1.796701 864.910385
8 26.669631 2.375479 81.810965 1.071276 16.730153 2.207156 121.300549 1.823129 33.521622 1.359883 187.289016 1.809025 793.463874
9 29.190977 2.354548 97.958015 1.045915 16.885963 2.205486 126.792828 1.855721 38.925676 1.343224 229.625749 1.754891 898.975245
10 30.564590 2.301370 96.337535 1.073289 16.835041 2.216497 112.731447 1.930341 44.831781 1.315669 212.556985 1.798351 869.899250
11 29.502039 2.342312 76.135599 1.104502 16.832199 2.195138 122.730980 1.912570 43.597670 1.272487 244.799346 1.692623 894.705963
12 29.454762 2.383079 69.803347 1.118029 18.328200 2.175770 118.288762 1.892754 56.155822 1.161683 260.440575 1.704789 921.236157
13 28.853887 2.393748 72.185000 1.108094 18.922787 2.168626 133.727889 1.822013 35.419832 1.334668 219.052937 1.772264 869.240450
14 30.275710 2.361550 110.997722 0.970290 18.885386 2.164687 130.808890 1.849916 41.793621 1.296186 286.263290 1.671172 994.631492
15 34.241497 2.290308 91.463049 1.059148 18.770848 2.158183 137.464940 1.858437 42.349396 1.337508 287.937805 1.680284 1011.737278
16 43.102764 2.246922 80.876807 1.067257 18.982216 2.108769 150.014841 1.806350 42.336055 1.277091 204.939009 1.787616 914.592550
17 35.687632 2.329571 80.627606 1.084763 16.899180 2.202753 124.371155 1.881480 65.104947 1.132962 230.349266 1.756095 920.101902
18 37.710794 2.327977 131.616811 0.983176 16.947751 2.238778 136.538891 1.873221 53.952967 1.180342 310.249254 1.587328 1067.052402
19 36.972091 2.265346 90.488003 1.049932 16.030220 2.244342 134.412773 1.827838 41.157579 1.263050 313.810158 1.604585 1015.942051
20 32.236119 2.393364 92.735918 1.032483 17.484182 2.219749 131.812201 1.822419 39.314523 1.331265 290.076840 1.616331 973.126818
21 31.584801 2.409353 77.131493 1.078318 17.947063 2.184630 137.357622 1.816392 40.795877 1.295275 196.146760 1.815537 856.927770
22 33.133108 2.406975 90.895588 1.047093 18.852538 2.178318 169.203190 1.780520 47.883579 1.285596 202.989618 1.779423 940.026516
23 36.753574 2.363383 97.552040 1.048465 19.426319 2.148674 131.769601 1.897437 51.256152 1.274568 210.257725 1.798278 924.339307
24 34.855495 2.399628 92.632436 1.050174 20.321799 2.127138 151.600412 1.848591 43.226880 1.305765 176.466149 1.835858 884.805764
25 39.940000 2.369996 71.949897 1.102729 19.500000 2.136197 131.142332 1.913279 39.030243 1.317081 192.511247 1.820689 868.475691
26 33.047390 2.333880 64.127801 1.073595 16.576976 2.117485 104.450759 1.885650 30.055429 1.322393 133.549765 1.863909 666.704699
27 38.182377 2.358465 82.609900 1.091488 21.204360 2.130235 145.691479 1.834819 50.747411 1.278392 250.367996 1.675408 977.050906
28 39.340907 2.310507 88.645386 1.074976 20.679452 2.071805 140.632073 1.858516 39.082103 1.323507 225.536421 1.708309 927.411188
29 42.142760 2.249801 70.741651 1.112186 18.398206 2.174521 136.383001 1.849351 39.929939 1.311964 196.440037 1.734620 858.853482
30 33.415941 2.359613 64.464785 1.107849 16.740638 2.168873 113.736908 1.876736 33.423684 1.322969 155.285085 1.816932 726.389342
31 38.053214 2.421593 117.697194 0.980586 22.297249 2.124518 164.054088 1.831481 50.552102 1.259899 255.540157 1.666451 1044.930162
32 36.574890 2.448129 111.200050 1.033788 20.088398 2.198892 176.283723 1.757429 47.497145 1.281080 300.352885 1.628485 1108.443481
33 39.515679 2.460343 96.845945 1.047416 19.503119 2.177400 162.312382 1.804064 45.812445 1.294824 281.485505 1.798869 1099.622735
34 49.178044 2.448336 119.427531 0.942305 18.707266 2.173055 138.827152 1.826863 40.742757 1.314488 194.565294 1.816060 934.109753
35 42.717913 2.461972 88.517788 1.054659 18.438739 2.161882 149.951936 1.769869 57.621016 1.141232 224.360623 1.710587 953.331361
36 41.197544 2.471480 94.505700 1.046608 18.941095 2.156862 135.265353 1.858542 60.168718 1.180721 229.472645 1.719798 958.668060
37 39.788842 2.453568 102.044994 0.985755 18.903699 2.151540 147.601324 1.850108 42.290973 1.306216 234.360074 1.651436 954.238256
38 41.314488 2.395698 85.400518 1.028951 20.735704 2.067240 177.527764 1.656228 36.378665 1.345077 239.807719 1.692423 978.530320
39 42.616783 2.383487 85.074060 1.037431 20.512587 2.059879 160.196547 1.799702 42.088850 1.291107 208.047699 1.687847 925.888402
40 41.664717 2.379957 89.145855 1.069841 18.182713 2.111224 163.945941 1.799987 43.358742 1.282735 226.592166 1.639427 955.119559
41 51.567643 2.222860 83.334685 1.057869 18.994402 2.103811 132.379793 1.884823 53.652463 1.147004 192.771435 1.735189 888.292404
42 41.421016 2.451080 74.908315 1.067700 16.726950 2.162721 129.406980 1.867044 37.261913 1.295095 227.179097 1.576445 865.683152
43 53.792936 2.439119 78.807962 1.057600 16.198722 2.182931 135.693022 1.840945 40.210292 1.306747 200.088160 1.645527 881.513904
44 43.606944 2.456948 81.652867 1.071207 16.419492 2.195529 146.731878 1.853020 46.255420 1.258085 192.757055 1.708901 890.149666
45 37.815625 2.486153 90.360281 1.013287 15.269039 2.216885 133.537747 1.840366 38.729663 1.333144 201.300751 1.644203 847.795877
46 37.094566 2.487414 72.384919 1.068425 14.802451 2.229075 141.855150 1.734876 35.310976 1.343831 159.132530 1.762777 776.671417
47 33.204738 2.459258 64.056208 1.073800 13.912000 2.230996 107.101937 1.820402 32.968196 1.318080 171.275693 1.729661 716.152377
48 35.602401 2.476466 73.726567 1.061919 15.994960 2.215290 108.142838 1.873297 36.839772 1.326406 169.023852 1.756630 750.253771
49 34.042741 2.504729 76.487107 1.074008 15.018651 2.213809 123.314975 1.854718 40.085731 1.330138 169.606295 1.745948 778.821860
50 34.286204 2.485415 80.908097 1.029931 14.701141 2.202930 149.748097 1.759291 33.764603 1.345225 157.402089 1.821746 796.548881
51 32.317382 2.512603 60.741143 1.093612 15.676934 2.159240 113.224372 1.859247 33.115052 1.325042 131.361039 1.850146 678.906266
52 39.603541 2.371574 68.719874 1.064090 18.118513 2.107728 115.728895 1.868074 35.667096 1.322699 152.943591 1.786516 741.838900
;
options nodate;
proc means data=spaghettisauce n mean median std min max cv skewness kurtosis;
var qprego pprego;
run;

Notes on the program:
• Data set name: spaghettisauce, given in the DATA statement.
• Input statement: lists the variables in the order they appear in each row of data.
• Placement of data: the rows go after the DATALINES statement.
• A semicolon on its own line is needed after the data.
• OPTIONS NODATE: no date will appear on the output.

proc corr data=spaghettisauce;
var qprego pprego;
run;

proc reg data=spaghettisauce;
model qprego=pprego / dwprob;
output out=datareg r=resqprego p=predqprego;
run;

proc autoreg data=spaghettisauce;
model qprego=pprego / normal;
run;

proc print data=datareg;
var week qprego pprego resqprego predqprego;
run;

proc reg data=spaghettisauce;
model qprego=pprego / pcorr2 clb cli alpha=.10;
run;

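Notes on the program:
• MODEL statement: qprego is the dependent variable and pprego is the explanatory variable.
• OUTPUT OUT=datareg: creation of a data set named datareg which contains the predicted values of the dependent variable (P=) and the residuals (R=).
• NORMAL (PROC AUTOREG): test of normality of the residuals. PROC AUTOREG also produces AIC, SIC, and within-sample MAE, MAPE, and RMSE.
• PROC PRINT: prints the datareg data set.
• CLB: confidence intervals associated with the estimated coefficients.
• PCORR2: square of partial correlation coefficients.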

Statistics in SAS

Use PROC MEANS or PROC CORR

Proc Means Data = ??? N mean median std min max cv skewness kurtosis;
var var_name1 var_name2…;

The SAS System
The MEANS Procedure

Variable    N          Mean        Median       Std Dev       Minimum       Maximum
qprego     52   134.5458093   132.9587700    17.8065350   104.4507590   177.5277640
pprego     52     1.8458800     1.8515640     0.0517779     1.6562280     1.9326170

Variable   Coeff of Variation     Skewness     Kurtosis
qprego             13.2345519    0.5902592   -0.1063091
pprego              2.8050533   -1.0928616    2.5133372
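As a quick check on how the columns relate, the coefficient of variation reported by PROC MEANS is the standard deviation expressed as a percentage of the mean. The worked arithmetic below uses only the Std Dev and Mean values printed above; the symbols s and x-bar are notation introduced here.

```latex
\[
\mathrm{CV} = 100 \times \frac{s}{\bar{x}}
\]
\[
\mathrm{CV}_{\mathrm{qprego}} = 100 \times \frac{17.8065350}{134.5458093} \approx 13.23,
\qquad
\mathrm{CV}_{\mathrm{pprego}} = 100 \times \frac{0.0517779}{1.8458800} \approx 2.81
\]
```

Both results agree with the Coeff of Variation column, which suggests that the quantity of Prego sold varies much more, relative to its mean, than its price does.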

Regression in SAS

Use PROC REG or PROC MODEL

Simple and Multiple Regression

Using SAS PROC REG for Simple Linear Regression

• The general syntax for PROC REG is
– PROC REG <options>; <statements>;
• The most commonly used options are:
– DATA=datasetname
  • Specifies dataset
– SIMPLE
  • Displays descriptive statistics
• The most commonly used statements are:
– MODEL dependentvar = independentvar </ options>;
  • Specifies the variable to be predicted (dependentvar) and the variable that is the predictor (independentvar)
  • Several MODEL options are available.

Example

Proc reg data = spaghettisauce;
Model qprego = pprego / p r cli dwprob;
run;

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: qprego

Number of Observations Read   52
Number of Observations Used   52

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       8631.07541    8631.07541     57.24   <.0001
Error             50       7539.63173     150.79263
Corrected Total   51            16171

Root MSE           12.27977   R-Square   0.5337
Dependent Mean    134.54581   Adj R-Sq   0.5244
Coeff Var           9.12683

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            598.31966         61.32413      9.76     <.0001
pprego       1           -251.24810         33.20935     -7.57     <.0001

Callouts on the slide: SSR (Model sum of squares), SSE (Error sum of squares), SST (Corrected Total sum of squares), R² (R-Square), and adjusted R² (Adj R-Sq).
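The SSR, SSE, SST, and R² labels can be tied together with the numbers in the Analysis of Variance table. The arithmetic below is a sketch using the standard definitions, with n = 52 observations and k = 1 explanatory variable; the symbols n and k are notation introduced here.

```latex
\[
\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE} = 8631.07541 + 7539.63173 = 16170.70714
\]
\[
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{8631.07541}{16170.70714} \approx 0.5337
\]
\[
\bar{R}^2 = 1 - \frac{\mathrm{SSE}/(n-k-1)}{\mathrm{SST}/(n-1)}
          = 1 - \frac{150.79263}{16170.70714/51} \approx 0.5244
\]
\[
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)} = \frac{8631.07541}{150.79263} \approx 57.24
\]
```

These reproduce the R-Square, Adj R-Sq, and F Value entries in the output; the printed Corrected Total of 16171 is the same SST rounded to five significant digits.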

The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: qprego

Durbin-Watson D             1.132
Pr < DW                     0.0004
Pr > DW                     0.9996
Number of Observations      52
1st Order Autocorrelation   0.422

NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation.
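The Durbin-Watson statistic and the first-order autocorrelation in this output are linked by a well-known large-sample approximation, shown here as a rough check rather than as the formula SAS uses. The symbols d and r-hat are notation introduced here.

```latex
\[
d \approx 2\,(1 - \hat{r}) = 2\,(1 - 0.422) = 1.156
\]
```

This is close to the reported D of 1.132. A value well below 2, together with Pr < DW = 0.0004, points to positive autocorrelation in the residuals.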

The SAS System
The AUTOREG Procedure
Dependent Variable qprego

Ordinary Least Squares Estimates
SSE             7539.63173   DFE                50
MSE              150.79263   Root MSE(RMSE)     12.27977
SBC              414.25971   AIC                410.357222
MAE             9.49555836   AICC               410.60212
MAPE            7.12604319   Regress R-Square   0.5337
Durbin-Watson       1.1321   Total R-Square     0.5337

Miscellaneous Statistics
Statistic     Value    Prob     Label
Normal Test   0.4812   0.7862   Pr > ChiSq

Variable    DF    Estimate   Standard Error   t Value   Approx Pr > |t|
Intercept    1    598.3197          61.3241      9.76            <.0001
pprego       1   -251.2481          33.2094     -7.57            <.0001

Callout on the slide: the Normal Test row is the test of normality of residuals.

The SAS System

Obs   week   qprego    pprego    resqprego   predqprego
1     1      117.050   1.90962   -1.4811     118.531
2     2      125.987   1.84750   -8.1534     134.140
3     3      123.058   1.87096   -5.1863     128.244
4     4      114.954   1.93262   2.2005      112.753
5     5      113.248   1.92007   -2.6589     115.907
6     6      125.878   1.89978   4.8738      121.004
7     7      120.413   1.87737   -6.2221     126.635
8     8      121.301   1.82313   -18.9614    140.262
9     9      126.793   1.85572   -5.2805     132.073
10    10     112.731   1.93034   -0.5937     113.325
11    11     122.731   1.91257   4.9409      117.790
12    12     118.289   1.89275   -4.4800     122.769
13    13     133.728   1.82201   -6.8145     140.542
14    14     130.809   1.84992   -2.7229     133.532
15    15     137.465   1.85844   6.0740      131.391
16    16     150.015   1.80635   5.5372      144.478
17    17     124.371   1.88148   -1.2302     125.601
18    18     136.539   1.87322   8.8625      127.676
19    19     134.413   1.82784   -4.6661     139.079
20    20     131.812   1.82242   -8.6281     140.440
21    21     137.358   1.81639   -4.5970     141.955
22    22     169.203   1.78052   18.2358     150.967
23    23     131.770   1.89744   10.1774     121.592
24    24     151.600   1.84859   17.7357     133.865
25    25     131.142   1.91328   13.5304     117.612
26    26     104.451   1.88565   -20.1029    124.554
27    27     145.691   1.83482   8.3666      137.325
28    28     140.632   1.85852   9.2610      131.371
29    29     136.383   1.84935   2.7093      133.674
30    30     113.737   1.87674   -13.0564    126.793
31    31     164.054   1.83148   25.8906     138.164
32    32     176.284   1.75743   19.5148     156.769
33    33     162.312   1.80406   17.2604     145.052
34    34     138.827   1.82686   -0.4966     139.324
35    35     149.952   1.76987   -3.6915     153.643
36    36     135.265   1.85854   3.9008      131.365
37    37     147.601   1.85011   14.1178     133.484
38    38     177.528   1.65623   -4.6678     182.196
39    39     160.197   1.79970   14.0486     146.148
40    40     163.946   1.79999   17.8696     146.076
41    41     132.380   1.88482   7.6183      124.761
42    42     129.407   1.86704   0.1786      129.228
43    43     135.693   1.84095   -0.0927     135.786
44    44     146.732   1.85302   13.9800     132.752
45    45     133.538   1.84037   -2.3934     135.931
46    46     141.855   1.73488   -20.5802    162.435
47    47     107.102   1.82040   -33.8452    140.947
48    48     108.143   1.87330   -19.5145    127.657
49    49     123.315   1.85472   -9.0103     132.325
50    50     149.748   1.75929   -6.5530     156.301
51    51     113.224   1.85925   -17.9630    131.187
52    52     115.729   1.86807   -13.2407    128.970

Callout on the slide: resqprego and predqprego are the residual and predicted variables.
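To see where the residual and predicted columns come from, the first observation can be reproduced by hand from the parameter estimates reported by PROC REG (intercept 598.31966, slope -251.24810). The notation with hats and the subscript 1 is introduced here for illustration.

```latex
\[
\widehat{qprego}_1 = 598.31966 - 251.24810 \times 1.90962 \approx 118.531
\]
\[
e_1 = qprego_1 - \widehat{qprego}_1 = 117.050 - 118.531 = -1.481
\]
```

These match predqprego and resqprego for week 1, up to the rounding of the printed values.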

The REG Procedure
Model: MODEL1
Dependent Variable: qprego

Number of Observations Read   52
Number of Observations Used   52

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1       8631.07541    8631.07541     57.24   <.0001
Error             50       7539.63173     150.79263
Corrected Total   51            16171

Root MSE           12.27977   R-Square   0.5337
Dependent Mean    134.54581   Adj R-Sq   0.5244
Coeff Var           9.12683

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Squared Partial Corr Type II   90% Confidence Limits
Intercept    1            598.31966         61.32413      9.76     <.0001   .                              495.54624    701.09307
pprego       1           -251.24810         33.20935     -7.57     <.0001   0.53375                        -306.90382   -195.59238

Callouts on the slide: confidence limits of the parameter estimates, and square of partial correlation coefficients.
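The 90% confidence limits can be reproduced from the estimate and its standard error. The sketch below assumes the usual t-based interval with 50 error degrees of freedom; the critical value 1.6759 is taken from a t table and does not appear in the SAS output.

```latex
\[
b \pm t_{0.95,\,50} \times \mathrm{se}(b)
\]
\[
\text{pprego: } -251.24810 \pm 1.6759 \times 33.20935 \approx -251.248 \pm 55.656
\;\Rightarrow\; (-306.90,\ -195.59)
\]
\[
\text{Intercept: } 598.31966 \pm 1.6759 \times 61.32413 \approx 598.320 \pm 102.773
\;\Rightarrow\; (495.55,\ 701.09)
\]
```

Because the interval for pprego lies entirely below zero, the negative price effect is statistically significant at the 10% level, consistent with the reported p-value.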

Using SAS PROC REG for Multiple Linear Regression

• The general syntax for PROC REG is
– PROC REG <options>; <statements>;
• The most commonly used options are:
– DATA=datasetname
  • Specifies dataset
– SIMPLE
  • Displays descriptive statistics
• The most commonly used statements are:
– MODEL dependentvar = independentvars </ options>;
  • Specifies the variable to be predicted (dependentvar) and the variables that are the predictors (independentvars)

MODEL STATEMENT OPTIONS
(Place after the slash following the list of explanatory variables.)

• P Requests a table containing predicted values from the model

• R Requests that the residuals be analyzed.

• CLI Requests the 95 percent upper and lower confidence limits for an individual value of the dependent variable.

Example

data firms;
input firm_id capital labor output;
log_output=log(output);
log_capital=log(capital);
log_labor=log(labor);
datalines;
1 8 23 106
2 9 14 81.08
3 4 38 72.8
4 2 97 57.34
5 6 11 66.79
6 6 43 98.23
7 3 93 82.68
8 6 49 99.77
9 8 36 110
10 8 43 118.93
11 4 61 95.05
12 8 31 112.83
13 3 57 64.54
14 6 97 137.22
15 4 93 86.17
16 2 72 56.25
17 3 61 81.1
18 3 97 65.23
19 9 89 149.56
20 3 25 65.43
21 1 81 36.06
22 4 11 56.92
23 2 64 49.59
24 3 10 43.21
25 6 71 121.24
;
options nodate;

proc reg data=firms;
model output=labor capital / pcorr2;
run;

proc reg data=firms;
model log_output=log_labor log_capital / pcorr2;
run;

Transformation statements:

log_output=log(output);
log_capital=log(capital);
log_labor=log(labor);

The REG Procedure
Model: MODEL1
Dependent Variable: output

Number of Observations Read   25
Number of Observations Used   25

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2            17596    8798.14334     54.08   <.0001
Error             22       3578.83410     162.67428
Corrected Total   24            21175

Root MSE          12.75438   R-Square   0.8310
Dependent Mean    84.56080   Adj R-Sq   0.8156
Coeff Var         15.08309

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Squared Partial Corr Type II
Intercept    1              2.15525          9.01440      0.24     0.8132   .
labor        1              0.47631          0.09215      5.17     <.0001   0.54842
capital      1             11.64477          1.13539     10.26     <.0001   0.82703

Callouts on the slide: square of partial correlation coefficients, SSR (Model sum of squares), SSE (Error sum of squares), SST (Corrected Total sum of squares), R² (R-Square), and adjusted R² (Adj R-Sq).

Model: MODEL1
Dependent Variable: log_output

Number of Observations Read   25
Number of Observations Used   25

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2          3.01454       1.50727    177.22   <.0001
Error             22          0.18711       0.00851
Corrected Total   24          3.20165

Root MSE          0.09222   R-Square   0.9416
Dependent Mean    4.37573   Adj R-Sq   0.9362
Coeff Var         2.10760

Parameter Estimates
Variable      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Squared Partial Corr Type II
Intercept      1              2.48108          0.12862     19.29     <.0001   .
log_labor      1              0.25734          0.02696      9.55     <.0001   0.80551
log_capital    1              0.64011          0.03473     18.43     <.0001   0.93917

Callouts on the slide: R² (R-Square) and adjusted R² (Adj R-Sq).
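One common way to read the log-log estimates, offered here as an interpretation the slides do not state, is that each slope is an elasticity: the approximate percentage change in output for a one percent change in the input, holding the other input fixed. The fitted equation below simply restates the parameter estimates from the output above.

```latex
\[
\widehat{\log(\mathit{output})} = 2.48108 + 0.25734\,\log(\mathit{labor}) + 0.64011\,\log(\mathit{capital})
\]
\[
0.25734 + 0.64011 = 0.89745
\]
```

Under that reading, a 1% increase in labor is associated with roughly a 0.26% increase in output, and a 1% increase in capital with roughly a 0.64% increase. The sum of the two elasticities is below one, which would suggest decreasing returns to scale if the log-log form is taken as the model.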