SIAP-SRTC Training Course on Sampling
Acceed Center, AIM, Makati, Philippines
4 April 2002
2000 SPSS Public Sector User Exchange
*SIAP-SRTC Training on Sampling

OUTLINE
  Statistical Computing Resources
  Data Management with Stata
  Table Generation (Tab and Table Commands)
  Survey Commands

Computing Resources
The age of ICT has brought about a synergy of computing and
communications. Implications:
  - More data collected
  - More data stored
  - More data accessible and distributed

A host of statistical software packages provide pre-programmed
analytical and data management capabilities. These packages may be
classified according to use and cost.

Computing Resources: Types of Statistical Software by Usage
  General purpose: SAS, SPSS, R, S-PLUS, Statistica, Stata
  Special purpose: econometric modeling (EViews), seasonal adjustment
  (X12), Bayesian modeling (WinBUGS), survey data tabulation and
  variance estimation (IMPS, CENVAR)

Computing Resources: Types of Statistical Software by Cost
  Commercial software: SAS, SPSS, Stata, S-PLUS
  Freeware: R, IMPS, X12

Computing Resources: Software for Survey Data
  Bascula, from Statistics Netherlands
  CENVAR (and IMPS), from the U.S. Bureau of the Census
  CLUSTERS, from the University of Essex
  Epi Info, from the Centers for Disease Control
  Generalized Estimation System (GES), from Statistics Canada
  IVEware (beta version), from the University of Michigan
  PC CARP, from Iowa State University
  SAS/STAT, from SAS Institute
  Stata, from Stata Corporation
  SUDAAN, from Research Triangle Institute
  VPLX, from the U.S. Bureau of the Census
  WesVar, from Westat, Inc.

Computing Resources: Lists of Statistical Software
  http://members.aol.com/johnp71/javasta2.html
  http://www.stir.ac.uk/Departments/HumanSciences/SocInfo/Statistical.htm
  http://www.fas.harvard.edu/~stats/survey-soft/
  http://www.feweb.vu.nl/econometriclinks/software.html

Computing Resources
This afternoon, we will demonstrate how to use Stata for some of the
most common tasks in data management, statistical computing, and
analysis of survey data.

Computing Resources: Stata
Estimation of means, totals, ratios, and proportions; linear
regression, logistic regression, and probit. Point estimates,
associated standard errors, confidence intervals, and design effects
are displayed for the full population or for subpopulations.

Auxiliary commands display various information for linear combinations
(e.g., differences) of estimators and conduct hypothesis tests.
New in Stata: contingency tables with Rao-Scott corrections of
chi-squared tests; new survey-corrected regression commands, including
tobit, interval, censored, instrumental variables, multinomial logit,
ordered logit and probit, and Poisson.

Stratified designs and cluster sampling are handled. FPCs can be
calculated for simple random sampling without replacement of sampling
units within strata. Variance estimation for multistage sample data is
carried out through the customary between-PSU squared-differences
calculation.

Variance estimation is done through Taylor-series linearization in the
survey analysis commands. There are also commands for jackknife and
bootstrap variance estimation, but these are not specifically oriented
toward survey data.

Computing Resources: Note
We will demonstrate the use of Stata version 6. The current version is
version 7; there is also a Special Edition (SE), which can handle up
to 32,766 variables, strings of up to 244 characters, and matrices of
up to 11,000 x 11,000.

Data Management: Starting Up
Go to Start, Programs, Stata, Intercooled Stata.
Alternatively, from Windows Explorer, go to the folder c:\stata and
double-click wstata.exe.

Data Management: Creating a New Dataset
1. Open the Stata spreadsheet editor.
2. Enter data into the editor; when done, close the editor.
3. In the Stata Command window, enter the command
    save newfile

Data Management: Note
A Stata dataset has the file extension .dta; that is, newfile is
actually newfile.dta. Public-use files of some surveys, e.g. the VLSS
(Vietnam Living Standards Survey), are in Stata format.

Data Management: Inspecting the Dataset
In the Stata Command window, enter the following commands:
    describe
    list
    summarize

Notes:
  - Stata is case sensitive.
  - Stata commands may be abbreviated, e.g. d for describe, sum for
    summarize.
  - We may use the Page Up/Page Down keys or the mouse to re-select
    commands in the Review window.
  - Commands and output are shown in the Results window. Windows may
    be resized.
  - Commands and output may be saved to a log file by pressing the
    Open Log button.

Data Management: Renaming Variables
One way (from the Data Editor): double-click anywhere in the
variable's column to bring up a dialog box.
Second way (in the Stata Command window): enter
    rename var1 domain
    rename var2 hcn
    rename var3 age
    label variable age "HH head age"
    d

Data Management: Saving the Edited Dataset
In the Stata Command window, enter
    save newfile, replace
Note: typing only save newfile will result in an error message,
because newfile.dta already exists.

Data Management: Reading a Pre-existing Stata Dataset
If the dataset is in the folder c:\fies2000 and the filename is
fies00small.dta, enter
    clear
    set mem 64m
    cd c:\fies2000
    use fies00small
Note: set mem is important for memory management.

Data Management: Importing Data
Suppose we have a dataset try.txt in the c:\fies2000 folder.
Note: missing data are coded as . (a period).
Use the infile command, with syntax
    infile variable-list using filename.raw
In particular, enter
    cd c:\fies2000
    infile domain hcn age using try.txt, automatic

Data Management: Trivia on String Variables
When using the infile command with character (string) variables, we
need to identify these variables. For instance,
    infile domain hcn str30 prov using tr.txt
For more details regarding infile, enter
    help infile1

Data Management: Importing Data
Suppose we have a dataset try2.txt in the c:\fies2000 folder, with the
data in specific fields (fixed columns). This assumes the last line of
the file is a blank line.
Use the infix command:
    infix domain 1 hcn 2 age 3-4 using try2.txt, clear

Data Management
Thus, Stata can read text files with
    infile   if the data are separated by spaces and have no strings,
             or if strings are just one word, or if all strings are
             enclosed in quotes
    infix    for fixed-format text
    insheet  if the text file was created by a spreadsheet or
             database program

Note: the commands infile, infix, and insheet read data from ASCII
files; outfile is a way to save the data in ASCII. There are
third-party programs, especially Stat/Transfer and DBMS/COPY, that
translate from one data format (e.g., dBASE, Excel, SAS, SPSS, Stata)
to another.

Data Management: Other Useful Commands
To sort the dataset by age:
    sort age
To get a listing of the dataset:
    list
To get a listing of the 2nd through 4th observations:
    list in 2/4
To summarize the restricted dataset of households whose head's age is
less than or equal to 50:
    summarize if age <= 50
Other relational operators (>, >=, ==, <) work the same way, e.g.
    summarize if age < 35
To get the correlation matrix:
    correlate x y z
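
The commands above can be tried without the course files. The sketch
below is only an illustration: the five observations and the variables
x and y are made-up values, not FIES data.

```stata
* Toy data (hypothetical values): age of household head plus two
* arbitrary numeric variables
clear
input age x y
62 1 2
35 2 4
50 3 6
28 4 9
41 5 10
end

* Order the observations by age, then show the 2nd through 4th
sort age
list in 2/4

* Restrict the summary to heads aged 50 or younger (4 of the 5 rows)
summarize age if age <= 50

* Correlation matrix of x and y
correlate x y
```

The if qualifier restricts which observations a command uses, while
the in qualifier selects observations by position, so list in 2/4
depends on the current sort order.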

Data Management: Generating and Replacing Variables
Suppose we want to obtain the per capita income (pci) of FIES 2000
households:
    clear
    cd d:\fies00
    use fies00small
    gen pci=toinc/hsize

Now tag the household as poor (1) if pci < some threshold, say 13823,
and determine the percent of households that are poor:
    gen poor=1 if pci < 13823
    replace poor=0 if poor==.
    sum poor [aw=rfact]
    save fies00small, replace
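
A minimal sketch of the same steps on made-up data, so the result can
be checked by hand. The four records are hypothetical, not FIES
households. One variation is an assumption of this sketch rather than
the slide's method: the two-step gen/replace above codes a household
with missing pci as not poor (0), whereas the single generate below
leaves it missing.

```stata
* Toy data (hypothetical values): income, household size, weight
clear
input toinc hsize rfact
60000 4 100
50000 5 200
90000 3 100
. 4 100
end

* Per capita income; missing when toinc is missing
generate pci = toinc/hsize

* 1 if below the threshold, 0 otherwise, missing if pci is missing
generate poor = (pci < 13823) if pci < .

* Weighted mean of the 0/1 indicator = weighted share of poor households
summarize poor [aw=rfact]
display "Percent poor: " 100*r(mean)
```

Here only the second household (pci = 10000, weight 200) is poor, so
the weighted share is 200/(100+200+100) = 0.5.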

Data Management: Note
A small portion of the FIES 2000 dataset was used. The Family Income
and Expenditure Survey (FIES) is conducted by the National Statistics
Office (NSO) every 3 years. Data may be purchased through the NSO
website: www.census.gov.ph

SIAP-SRTC Training Course on Sampling
Acceed Center, AIM, Makati, Philippines
5 April 2002

Data Management: Recall
If we use our fies2000 dataset,
    set mem 64m
    cd c:\fies2000
    use fies00small
    sum poor [aw=rfact]
Note: the poverty line we provided is a weighted average of the
different poverty lines in the Philippines (for urban and rural areas
across the different regions).

Estimating the Food Poverty Line
The food poverty line is estimated from low-cost one-day menus
(breakfast, lunch, supper, and snack) constructed for each urban and
rural area of a region by the Food and Nutrition Research Institute
(FNRI). The menus meet 100% sufficiency in energy and protein
requirements and 80% sufficiency in other nutrients and vitamins.
  RDA for energy: 2000 kcal per person
  RDA for protein: 50 grams per person
  29 such menus were constructed on the basis of the 1988 Food
  Consumption Survey.

[Table: Annual Per Capita Food Line, Urban, by Region]

[Table: Annual Per Capita Food Line, Rural, by Region]

Estimating the Poverty Line
    Poverty line = Food threshold / Engel's coefficient
    Engel's coefficient = Food expenditure / Total basic expenditure
Engel's coefficient is estimated by analyzing the consumption pattern
of families having incomes within plus or minus 10 percentage points
of the food threshold.
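
The two formulas can be worked through with Stata's scalar and display
commands. All figures below are hypothetical, chosen only so the
arithmetic is easy to follow; they are not official thresholds.

```stata
* Hypothetical figures for illustration only (not official estimates)
scalar foodline = 9000
scalar foodexp  = 5400
scalar basicexp = 9000

* Engel's coefficient = food expenditure / total basic expenditure
scalar engel = foodexp/basicexp

* Poverty line = food threshold / Engel's coefficient
scalar povline = foodline/engel

display "Engel's coefficient: " engel
display "Poverty line:        " povline
```

With these numbers, Engel's coefficient is 0.6 and the poverty line is
9000/0.6 = 15000: the smaller the food share, the further the poverty
line sits above the food threshold.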

[Table: Annual Per Capita Poverty Line, Urban, by Region]

[Table: Annual Per Capita Poverty Line, Rural, by Region]

Poverty Statistics (Family)
Standard errors in brackets.

Measure              2000            1997
Poverty incidence    33.6% [0.3%]    31.8%
Poverty gap          10.7% [0.1%]    10.0%
Severity index        4.6% [0.1%]     4.3%

[Table: Poverty Incidence, All Areas, by Region]

Small Area Poverty Statistics?
Stata has some add-ons for generating standard errors (SEs) for
poverty statistics. If we try to generate provincial poverty
statistics, we will find that the SEs are too high, i.e. the figures
are unreliable.

Data Management: Note
Stata uses several types of weights:
    fw  frequency weights
    aw  analytic weights
    iw  importance weights
    pw  probability weights
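
A small sketch of how two of these weight types behave with summarize.
The two observations are made up for illustration.

```stata
* Toy data (hypothetical values): a variable x and a weight w
clear
input x w
10 1
20 3
end

* Frequency weights: each row stands for w identical observations,
* so the reported number of observations is 1 + 3 = 4
summarize x [fw=w]

* Analytic weights: the weighted mean is the same,
* but the number of observations stays at the 2 rows in memory
summarize x [aw=w]
```

Both calls give the weighted mean (10*1 + 20*3)/4 = 17.5; what differs
is the observation count, and with it the reported standard deviation.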

Data Management: Note
Within the generate or replace command, we may create or transform
variables by using functions, e.g.
    generate loginc=ln(toinc)
    generate y=cos(x*_pi/180)
    replace newvar=normd(z)
    generate rvar=uniform()
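
The first two functions can be checked on a couple of made-up values.
The data below are hypothetical; x is taken to be an angle in degrees,
which is why it is multiplied by _pi/180 before cos() is applied.

```stata
* Toy data (hypothetical values): an income and an angle in degrees
clear
input toinc x
1000 0
20000 60
end

* Natural log of income
generate loginc = ln(toinc)

* Cosine of x after converting degrees to radians
generate y = cos(x*_pi/180)

list
```

The cosine of 0 degrees is 1 and the cosine of 60 degrees is 0.5, so y
should list as 1 and .5.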

Data Management: Deleting Variables and Data
To drop a variable, say age:
    drop age
To drop some observations:
    drop in 2/3
Try also the command keep.
To drop all data in memory:
    clear

Data Management: Note
So far we have used Stata interactively. We can also do batch
processing through the Do-file Editor.
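
As a sketch, a do-file is simply a text file holding the same commands
that would otherwise be typed one at a time. The file name example.do
and the two records below are hypothetical.

```stata
* example.do (hypothetical file name)
* Toy data (hypothetical values) so that the file runs on its own
clear
input toinc hsize
60000 4
50000 5
end

* The same kind of commands used interactively earlier
generate pci = toinc/hsize
summarize pci
```

Saved as example.do in the working directory, the file would be run by
typing do example in the Command window.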

Data Management: Note
The Stata toolbar has 13 buttons.
The first three are for
  - opening a Stata dataset
  - saving the resident dataset to disk
  - printing a graph or log
The next five are for
  - starting, stopping, or suspending a log
  - bringing the Log window to the front
  - bringing the Dialog window to the front
  - bringing the Results window to the front
  - bringing the Graph window to the front
The last five are for
  - opening the Do-file Editor
  - opening the Data Editor
  - opening the Data Browser
  - telling Stata to continue when it has paused in the middle of
    long output
  - stopping the current task

Exercise
What is the average income of families whose expenditure is below the
mean family expenditure? Of families above it?

Exercise
Compare the correlation of food expenditures (fexp) and nonfood
expenditures for families in rural and urban areas.

Extra
Enter
    graph food nfood
Now try
    sort urb
    graph food nfood, by(urb)
    graph food nfood, by(urb) total
Matrix plots:
    graph toinc food nfood, matrix

Table Generation with tab
Earlier, we showed the use of the tab(ulate) command. Try
    tab urb
    tab urb [aw=rfact]
    tab urb [iw=rfact]
    tab urb regn

The tab command has options for generating one-way tables of
frequencies:
    tab urb, summ(toinc)
and two-way tables:
    tab urb sex
    tab urb sex, row
    tab urb sex, row col chi2
    tab urb sex, all exact

Table Generation with table
Aside from the tab command, we can generate tables of statistics with
the table command. Compare
    tab urb
with
    table urb

To generate the average (family) income and average (family)
expenditure across urban and rural areas, enter
    table urb, c(mean toinc mean toexp)
Using weights:
    table urb [aw=rfact], c(mean toinc mean toexp)

Table
The contents option may specify at most five of the following
statistics:
    freq              frequency
    mean varname      mean of varname
    sd varname        standard deviation
    sum varname       sum
    rawsum varname    sum ignoring any optionally specified weight
    count varname     count of nonmissing observations
    n varname         same as count
    max varname       maximum
    min varname       minimum
    median varname    median
    p1 varname        1st percentile
    p2 varname        2nd percentile
    ...
    iqr varname       interquartile range

Exercise Using table
Obtain the average and median per capita income of households by sex
of household head:
    table sex, c(mean pci median pci)
Obtain the weighted frequency of poor and nonpoor households across
regions:
    table poor regn [iw=rfact]

Using Survey Commands
Stata has a family of commands designed especially for sample surveys.
These commands all begin with svy:
    svyset     set the survey design variables
    svydes     describe strata and PSUs
    svymean    estimate population and subpopulation means
    svytotal   estimate population and subpopulation totals
    svyprop    estimate population and subpopulation proportions
    svyratio   estimate population and subpopulation ratios
    svytab     two-way tables
    svyreg     regression
    svyivreg   instrumental variables regression
    svylogit   logit regression
    svyprobit  probit regression
    svytest    hypothesis testing
    svylc      estimating linear combinations
    svymlog    multinomial logistic regression
    svyolog    ordered logistic regression
    svyoprob   ordered probit regression
    svypois    Poisson regression
    svyintrg   censored and interval regression

Using Survey Commands
Before issuing any svy estimation command, we identify the weight,
strata, and PSU identifier variables:
    svyset pweight rfact
    svyset strata domain
    svyset psu hcn

To obtain the average family income and average family expenditure:
    svymean toinc toexp
To obtain the total family income and total family expenditure by
region:
    svytotal toinc toexp, by(regn)
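
The commands above follow the Stata 6 syntax used in this course.
Later Stata releases declare the design in a single svyset line and
run estimation commands with the svy: prefix. The sketch below assumes
such a later release (version 9 or newer) and uses four made-up
records, not FIES data; the variable names are borrowed from the
slides.

```stata
* Toy data (hypothetical values): 2 strata, 2 PSUs per stratum,
* one household per PSU
clear
input domain hcn rfact toinc
1 1 100 50000
1 2 100 70000
2 3 200 20000
2 4 200 40000
end

* Later-release syntax (assumption: Stata 9 or newer):
* PSU variable first, weight in brackets, strata as an option
svyset hcn [pweight=rfact], strata(domain)

* Weighted mean and weighted total of family income
svy: mean toinc
svy: total toinc
```

With these numbers the weighted total is 24,000,000 and the sum of
weights is 600, so the estimated mean family income is 40,000. The
point estimates match what aweights would give; the svy commands
differ in that the standard errors account for the strata and PSUs.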

Using Survey Commands
To obtain the per capita income and per capita expenditure:
    svyratio toinc/fsize toexp/fsize
To obtain per capita income (pci) and per capita expenditure (pce) by
urban/rural:
    svyratio toinc/fsize toexp/fsize, by(urb)

Using Survey Commands
Linear regression of ln(pci):
    gen loginc=ln(pci)
    svyreg loginc age fsize sex prov urb
Compare the results with the regular regression command:
    reg loginc age fsize sex prov urb

Two-way tables:
    svytab urb poor, row se
compared with
    tab urb poor [aw=rfact], nofreq row

Alternatives to Stata

Learning More about Stata
For the online tutorial, type
    tutorial intro

List of tutorials:
    Tutorial   Description
    ---------  ------------------------------------------------------
    intro      An introduction to Stata
    graphics   How to make graphs
    tables     How to make tables
    regress    Estimating regression models, including 2SLS
    anova      Estimating one-, two-, and N-way ANOVA and ANCOVA models
    logit      Estimating maximum-likelihood logit and probit models
    survival   Estimating ML survival models
    factor     Estimating factor and principal component models
    ourdata    Description of the data we provide
    yourdata   How to input your own data into Stata

Learning More about Stata
Email distribution list: send email to [email protected]. In the body
of your email message, type
    subscribe statalist email@address
or, for a daily summary,
    subscribe statalist-digest email@address

Maraming salamat sa inyong pakikinig. (Thank you for your attention.)