Using Stata to Analyze Data from the Vietnam Living Standards Survey (VLSS)
Chapter I: General introduction to Stata

1. How data are organized in Stata (Dataset in Stata)

Stata is statistical software for managing data, analyzing data and drawing graphs. Stata stores information about the characteristics of the units under study. Data stored in Stata can be displayed as a table, as in the following example:

    hhcode   headname        hhsize   incomepc
    101      Nguyen Van A         6       2100
    102      Le Thi B             5       3210
    103      Tran Van C          10       1200

Observations (records)
Each row of the data table is called an observation, or record, and holds the data for one unit under study. The example above has 3 observations, which store the household code (hhcode), the name of the household head (headname), the household size (hhsize) and the per capita income (incomepc) of 3 households.

Variables (fields; attributes)
Information about the units under study is collected and stored by characteristic. These characteristics are called variables, or fields. Variables are the columns of the data table. The example above has 4 variables, named hhcode, headname, hhsize and incomepc. A variable name is 1 to 32 characters long and begins with a letter or an underscore (_). A name may contain only letters, digits and underscores; other special characters cannot be used.

Identifying variables
A dataset usually contains variables that serve to identify the observations; these are called identifying variables. They make it possible to tell the observations apart: each observation has its own value of these variables. In the example above the identifying variable is hhcode, and each observation takes one value of hhcode.

Properties of variables
Variables can be given labels (annotations). For example, the variable hhcode can be labelled "Household code".
Variables can be formatted as numeric or string variables, with different storage types. Numeric variables can be stored as byte, int, long, float or double. String variables can be stored as str1 to str80, depending on their length.

    Storage type   Bytes   Minimum          Maximum         Kind of number
    byte           1       -127             126             Integer
    int            2       -32,767          32,766          Integer
    long           4       -2,147,483,647   2,147,483,646   Integer
    float          4       -10^36           10^36           Real
    double         8       -10^308          10^308          Real

Numeric variables can be discrete or continuous. Variables such as household size, sex of the household head, geographic region and education level are discrete variables (also called categorical variables). They can be stored as byte, int or long. Continuous variables, such as household income and expenditure, are stored as float or double.

String variables store text. For example, headname is a string variable that stores the name of the household head.

    String storage type   Bytes   Maximum length
    str1                  1       1
    str2                  2       2
    ...
    str80                 80      80
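As an illustration of the storage types above, the following is a minimal sketch that enters the three example households with an explicit storage type for each variable. The types chosen (int, str15, byte, float) are an assumption made for this illustration; they are one reasonable fit for these values, not a requirement.

```stata
* Enter the example table with explicit storage types (illustrative choice)
clear
input int hhcode str15 headname byte hhsize float incomepc
101 "Nguyen Van A"  6 2100
102 "Le Thi B"      5 3210
103 "Tran Van C"   10 1200
end
label variable hhcode "Household code"
* describe shows the storage type, display format and label of each variable
describe
```

The output of describe lists one line per variable, so it is a quick way to check which storage type Stata is actually using.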
2. Starting and exiting Stata (Open and exit)

Stata is started like any other application: double-click the icon of the file wstata.exe in Windows Explorer, or choose Start -> Program -> Stata. To exit, type exit in the Stata Command window, or choose Exit from the File menu.

3. The Stata 7 interface (Stata interface)

Once Stata has started, its interface appears. It consists of the menu bar at the top, the toolbar (tool bar) below it, and the windows.

Note: The Stata 8 interface is similar to that of Stata 7. The main difference is that Stata 8 adds a Statistics entry to the menu bar, which makes it possible to run a number of statistical commands through dialog windows instead of typing them in the Command window.
Stata windows
The Stata windows are opened by choosing the corresponding entries under Windows on the menu bar. The windows are:

    Results          Displays commands and results
    Graph            Displays graphs
    Viewer           Displays the help window and the contents of text files
    Command          Used to type commands
    Review           Displays the commands already executed
    Variables        Displays the list of variables in the dataset
    Data editor      Displays and edits the data as a table
    Do-file editor   Displays the program editing window
Menu bar
Stata carries out different commands when you click the menu bar and choose among its entries. The menu bar contains the following groups of commands:

    File
      Open                 Open a data file
      View                 View Stata files in the Viewer window
      Save                 Save the data file
      Save as              Save the data file under a new name
      File name            Insert a file name into the Command window
      Log                  Close, open or review a log file
      Save graph           Save a graph file
      Print graph          Print a graph
      Print results        Print results
      Exit                 Exit Stata
    Edit
      Copy text            Copy text
      Copy tables          Copy tables
      Paste                Paste
      Table copy options   Options for copying tables
      Graph copy options   Options for copying graphs (not available in Stata 7)
    Prefs                  Options for colours, fonts and sizes
    Windows
      Results              Open the Results window
      Graph                Open the Graph window
      Log                  Open the log file window
      Viewer               Open the help window and view file contents
      Command              Open the Command window
      Review               Open the window of executed commands
      Variables            Open the list of variables in the dataset
      Help/Search          Open the help window
      Data editor          Open the window showing the data as a table
      Do-file editor       Open the program editing window
    Help                   Help on using Stata

Toolbar (tool bar)
The buttons on the toolbar are designed to carry out Stata's most common commands. Moving the pointer over a button displays a short hint. The buttons are:

    Open (use)                      Open a Stata data file
    Save                            Save the data file to disk
    Print results                   Print the contents of the Results window
    Begin log                       Open, close or view a log file
    Start viewer                    Open the help window
    Bring Dialog Window to front    Bring the dialog window to the front
    Bring Result Window to front    Bring the Results window to the front
    Bring Graph Window to front     Bring the Graph window to the front
    Do-file editor                  Open the program editing window
    Data editor                     Open the window for editing data
    Data browser                    Open the window for viewing data
    Clear -more- condition          Clear the more condition
    Break                           Stop the command or program being executed
4. Log files (log file)

When working with Stata, users usually want to keep a record of the session, including the commands, the messages and the results obtained. Stata records a session with the log using command.

Syntax: log using (path\filename) [, append replace [ text | smcl ] ]

Options:
    append    Add the session record to the end of an existing file
    replace   Overwrite an existing file with the session record
    text      Create the log in text format (extension log)
    smcl      Create the log in smcl format (extension smcl); this is the default

Examples:
    log using baitap1              Creates the file baitap1 in the current directory; the default extension is smcl
    log using baitap1, replace     Creates the file baitap1, overwriting the existing baitap1
    log using d:\baitap2, text     Creates the file baitap2 on drive D in text format (extension log)
    log using d:\baitap2, append   Continues recording the session in the file baitap2 on drive D

    . log using baitap1
    ------------------------------------------------------------------------------
           log:  C:\baitap1.smcl
      log type:  smcl
     opened on:  17 Feb 2004, 15:32:03

Files with the extension smcl can be converted to text files with the translate command.
Example: translate baitap1.smcl exercise1.log

log off
This command temporarily stops recording the session in the open log/smcl file.

log on
This command resumes recording in the open log file. It is used after a log has been opened with log using and suspended with log off.

log close
This command closes and saves the open log file.

Note: Stata can also record only what the user types in the Command window, which helps later when writing programs based on the session record.
Syntax: cmdlog using (path\filename) [, append replace]
        cmdlog {off | on | close}

To view log/smcl files, choose File/Log/View on the menu bar (or type view (filename) in the Command window). They can also be opened in other text editors such as MS Word or Notepad.
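The log commands described above can be combined as in the short sketch below. The file name demo_log is hypothetical and chosen only for this illustration; the text option is used so that the result can be read in any editor.

```stata
* Open a text log in the current directory (demo_log is a hypothetical name)
log using demo_log, replace text
display "this line is recorded"
* Suspend logging, run something that should not be recorded, then resume
log off
display "this line is not recorded"
log on
display "recording again"
* Close and save the log
log close
```

After log close, the file demo_log.log holds the first and third display lines but not the second.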
5. Opening, entering and saving data (Use, input and save)

Opening an existing dataset
Syntax: use (path\filename)
This command opens the Stata dataset, with extension .dta, named by filename.

Examples:
    use ho1.dta                          Opens ho1.dta in the current directory
    use "D:\VHLSS 2004\ho1.dta", clear   Opens ho1.dta in the folder VHLSS 2004 on drive D

A Stata dataset can also be opened with Open on the File menu, or with the Open (use) button on the toolbar. If the data file is large, the memory available to Stata must be set first with the command:
    set memory #[k|m]
Examples:
    set mem 32m
    set mem 32000k

Entering data
There are several ways to enter data from the keyboard into Stata's memory.
- Use the Stata Editor window, or type edit in the Command window, and then enter the data in spreadsheet fashion in that window.
- Use the command: input [variable list + storage types if needed]. Then type the values of the variables for each observation in turn. Values are separated by a space. End data entry with the command end.

Example:
    . input hhcode str15 name income
              hhcode             name     income
      1. 101 "Nguyen Van A" 1200
      2. 102 "Nguyen Van B" 1350
      3. 103 "Tran Thi C" 2310
      4. end
Stata can also read data from other database files. The data must first be saved as text (for example from Excel), with one observation per line and the values separated by commas or by tabs. The insheet command then reads the data into Stata.

Syntax: insheet [variable list] using (text filename) [, [no]names comma tab clear]
This command reads the observations of the text file into Stata's memory and names the variables that are created.

Options:
    [no]names   The variable names are (or are not) given on the first line of the text file
    comma       The values in the text file are separated by commas
    tab         The values in the text file are separated by tabs
    clear       The data read in replace the data currently in Stata's memory

Example:
    . insheet using c:\income.txt
    (3 vars, 4 obs)
    . insheet maho hoten thunhap using c:\income.txt
    (note: variable names in file ignored)
    (3 vars, 4 obs)

Saving data
Syntax: save (path\filename) [, replace]
This command saves the data currently in Stata's memory to the file named by filename. If the replace option is specified, the data overwrite the existing file of the same name.

Data can also be saved with Save and Save as on the menu bar, or with the Save button on the toolbar.
Note: See also the commands infile and outfile.
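Entering, saving and reopening data can be tried end to end with the sketch below. It reuses the three records of the input example; the file name income_demo is hypothetical.

```stata
* Enter three records from the keyboard-style input block
clear
input hhcode str15 name income
101 "Nguyen Van A" 1200
102 "Nguyen Van B" 1350
103 "Tran Thi C"   2310
end
* Save to the current directory, overwriting any earlier copy
save income_demo, replace
* Reopen the saved file and display it
use income_demo, clear
list
```

The clear option on use is needed whenever data are already in memory, otherwise Stata refuses to discard them.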
Chapter II: Working with data

1. Stata command syntax (Stata command syntax)

The basic structure of a Stata command is:
    [by variable list:] command [variable list] [expression] [condition] [range] [weight] [, options]
In Stata's Help, the syntax is written in English as:
    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]
Square brackets mark optional elements.

Notes:
- Stata commands are written in lowercase. Variable names are case sensitive. For example, within one dataset Ho_ten and ho_ten are two different variables.
- Options are shown in square brackets [ ]. They may or may not appear in a command. Required arguments (variable names) are shown in angle brackets < >. A command will not run if its required arguments are not supplied.
- Some Stata commands can be abbreviated. For example, summarize can be shortened to sum. In this document, the underlined part of a command's syntax is its abbreviation.
- The examples in this document use data from the 1998 Vietnam Living Standards Survey conducted by the General Statistics Office. Among these files, the aggregate expenditure file Hhexp98n.dta is used most often.
- by variable list (by varlist): Stata executes the command separately for each value of the variables in the list. The data must be sorted by these variables before the command is run.

Example:
    . sort sex
    . by sex: sum rlpcex1

    -> sex = 1
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         rlpcex1 |    4375    2980.906   2430.648     357.318   45801.71

    -> sex = 2
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         rlpcex1 |    1624    3748.368   3231.241    376.9805   30624.77

    . sort sex urban98
    . by sex urban98: sum rlpcex1

    -> sex = 1, urban98 = Rural
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         rlpcex1 |    3344    2308.134   1345.671     357.318   24386.43

    -> sex = 1, urban98 = Urban
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         rlpcex1 |    1031     5163.01   3602.245    682.9575   45801.71

    -> sex = 2, urban98 = Rural
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         rlpcex1 |     925    2553.448   1776.178    376.9805   25527.95

    -> sex = 2, urban98 = Urban
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         rlpcex1 |     699    5329.628   3962.946    1057.797   30624.77
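The by prefix can be tried without the survey file. The sketch below uses four made-up observations (illustrative values, not VLSS data) with the same variable names as the example above.

```stata
* Four illustrative observations; values are invented for this sketch
clear
input byte sex float rlpcex1
1 1000
1 3000
2 2000
2 6000
end
* The data must be sorted by the by-variable first
sort sex
by sex: summarize rlpcex1
```

Running by without the preceding sort stops with a "not sorted" error, which is why the two commands always appear together.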
Variable list (varlist)
Lists the variables on which the command acts. If no variable is given, the command acts on all variables.

Example:
    . sum hhsize sex reg7
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
          hhsize |    5999    4.752292   1.954292          1         19
             sex |    5999    1.270712   .4443645          1          2
            reg7 |    5999     4.01917   2.145305          1          7

    . sum
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
        househol |    5999    19617.86   11201.92        101      38820
            year |    5999    97.94666   .2247337         97         98
           month |    5999    6.340723   3.011082          1         12
    --Break--
    r(1);

This sum command displays the basic statistics of all variables in the dataset.

Condition (if exp)
Stata executes the command only for the observations whose values make the expression true.

Example:
    . sum poor if reg7==1
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
            poor |     859    .4982538   .5002882          0          1
This command acts only on the observations for which reg7 equals 1.

Range (in range)
Specifies the range of observations on which the command acts. The range can take the following forms:

    sum poor in 10       Mean of poor for observation 10 (simply the value of poor at the 10th observation)
    sum poor in 10/100   Mean of poor for observations 10 to 100
    sum poor in f/100    Mean of poor from the first observation to observation 100
    sum poor in 100/l    Mean of poor from observation 100 to the last observation

Weight (weight)
Allows calculations that use weights. Weights are described in detail in section 5 of this chapter.

Options (Options)
Many Stata commands have their own options. Options are given after a comma.

Example: the sum command has the option detail, which computes several statistics in addition to the mean and the standard deviation.

    . sum rlpcex1, detail

                  comp.M&Reg price adj.pc tot exp
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     682.9575        357.318
     5%     1012.433       366.2792
    10%     1238.088       376.9805       Obs                5999
    25%     1671.054       381.3502       Sum of Wgt.        5999

    50%     2397.042                      Mean           3188.667
                            Largest       Std. Dev.      2692.567
    75%     3711.917       26944.64
    90%     5940.803       30624.77       Variance        7249918
    95%      8045.32        31066.5       Skewness       3.791027
    99%     14163.04       45801.71       Kurtosis       29.21398

Notes:
- Stata allows commands and options to be abbreviated. In this document, an underlined part of a command means that the command can be shortened to the underlined characters. For example, use can be shortened to u.
- The syntax of commands in this document is written in English so that readers can compare it with the Help in Stata.
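The if condition, the in range and an option can be compared side by side in the sketch below. The dataset of 100 observations and the variables id and poor are generated for this illustration only.

```stata
* Build 100 illustrative observations: id = 1..100, poor alternates 1,0,1,0,...
clear
set obs 100
generate id = _n
generate poor = mod(_n, 2)
* Condition: only observations with id up to 10
summarize poor if id <= 10
* Range: from observation 91 to the last observation (l = last)
summarize poor in 91/l
* Option after the comma: detailed statistics
summarize id, detail
```

Both the if and the in form above select 10 observations; if tests a condition on the values, while in refers to positions in the current sort order.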
2. Operators and functions (Operators and functions)

Operators (operators)
Operators in Stata are written as follows:

    Symbol   Meaning
    Arithmetic
    +        Addition
    -        Subtraction
    *        Multiplication
    /        Division
    ^        Power
    Relational
    >  <  >=  [...]

[...]

    -> tabulation of urban98
        1:urban 98; |
         0:rural 98 |      Freq.     Percent        Cum.
    ----------------+-----------------------------------
              Rural |       4269       71.16       71.16
              Urban |       1730       28.84      100.00
    ----------------+-----------------------------------
              Total |       5999      100.00

    -> tabulation of reg7
          Code by 7 |
            regions |      Freq.     Percent        Cum.
    ----------------+-----------------------------------
            region1 |        859       14.32       14.32
            region2 |       1175       19.59       33.91
            region3 |        708       11.80       45.71
            region4 |        754       12.57       58.28
            region5 |        368        6.13       64.41
            region6 |       1023       17.05       81.46
            region7 |       1112       18.54      100.00
    ----------------+-----------------------------------
              Total |       5999      100.00
Two-way frequency tables
Syntax:
    tabulate <variable 1> <variable 2> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
    tab2 <variable list> [weight] [if exp] [in range] [, chi2 missing nofreq cell column row]
The tabulate command computes and displays the two-way frequency table of the two variables given. The tab2 command creates a two-way frequency table for every pair of variables in the variable list.

Example:
    . tab urban98 farm
       1:urban |  Type of HH (1:farm;
           98; |       0:nonfarm)
    0:rural 98 |  non farm       farm |     Total
    -----------+----------------------+----------
         Rural |      1021       3248 |      4269
         Urban |      1540        190 |      1730
    -----------+----------------------+----------
         Total |      2561       3438 |      5999
Options:
    chi2      Tests the hypothesis that the two variables are independent
    missing   Observations with missing values are treated as a category of their own
    nofreq    Does not display frequencies
    cell      Displays the relative frequency (%) of each cell
    column    Displays the relative frequency (%) of each cell within its column
    row       Displays the relative frequency (%) of each cell within its row

Examples:
    . tab reg7 urban98, cell nof
               |  1:urban 98; 0:rural
     Code by 7 |          98
       regions |     Rural      Urban |     Total
    -----------+----------------------+----------
       region1 |     11.20       3.12 |     14.32
       region2 |     13.05       6.53 |     19.59
       region3 |     10.00       1.80 |     11.80
       region4 |      8.37       4.20 |     12.57
       region5 |      6.13       0.00 |      6.13
       region6 |      8.57       8.48 |     17.05
       region7 |     13.84       4.70 |     18.54
    -----------+----------------------+----------
         Total |     71.16      28.84 |    100.00

    . tab farm urban98, column row
    Type of HH |  1:urban 98; 0:rural
      (1:farm; |          98
    0:nonfarm) |     Rural      Urban |     Total
    -----------+----------------------+----------
      non farm |      1021       1540 |      2561
               |     39.87      60.13 |    100.00
               |     23.92      89.02 |     42.69
    -----------+----------------------+----------
          farm |      3248        190 |      3438
               |     94.47       5.53 |    100.00
               |     76.08      10.98 |     57.31
    -----------+----------------------+----------
         Total |      4269       1730 |      5999
               |     71.16      28.84 |    100.00
               |    100.00     100.00 |    100.00
3.11. Summary tables with tabulate, summarize
Syntax:
    tabulate <variable 1> [<variable 2>] [weight] [if exp] [in range], summarize(variable 3) [means standard freq missing]
This command creates a one-way or two-way table defined by variable 1, or by variable 1 and variable 2. Each cell shows the mean, the standard deviation and the frequency of variable 3.

Example:
    . tab farm urban98, sum(poor)

            Means, Standard Deviations and Frequencies of poor

    Type of HH |  1:urban 98; 0:rural
      (1:farm; |          98
    0:nonfarm) |     Rural      Urban |     Total
    -----------+----------------------+----------
      non farm |  .2791381  .06168831 | .14837954
               | .44879538  .24066673 | .35554523
               |      1021       1540 |      2561
    -----------+----------------------+----------
          farm | .42302956  .12105263 |  .4063409
               |  .4941161  .32705022 | .49122109
               |      3248        190 |      3438
    -----------+----------------------+----------
         Total |  .3886156  .06820809 | .29621604
               | .48749275  .25217555 | .45662551
               |      4269       1730 |      5999

Options:
    means      Displays only the means
    standard   Displays only the standard deviations
    freq       Displays only the frequencies
    missing    Observations with missing values are treated as a category of their own

Example:
    . replace poor=poor*100
    (1777 real changes made)
    . format poor %4.2f
    . tab reg7 urban98, sum(poor) means

                      Means of poor

               |  1:urban 98; 0:rural
     Code by 7 |          98
       regions |     Rural      Urban |     Total
    -----------+----------------------+----------
       region1 |     61.46       8.02 |     49.83
       region2 |     32.57       5.87 |     23.66
       region3 |     44.83      10.19 |     39.55
       region4 |     37.25      11.51 |     28.65
       region5 |     47.28          . |     47.28
       region6 |     12.45       2.16 |      7.33
       region7 |     35.78      10.28 |     29.32
    -----------+----------------------+----------
         Total |     38.86       6.82 |     29.62
3.12. Summary tables with tabstat
Syntax:
    tabstat <variable list> [weight] [if exp] [in range] [, statistics(statname [...]) by(variable) missing format[(%fmt)]]
This command computes statistics of the variables in the variable list for each value of the categorical variable given in by(variable).

Example:
    . tabstat rlfood rlhhex1, stats(mean median) by(reg7)

    Summary statistics: mean, p50
      by categories of: reg7 (Code by 7 regions)

        reg7 |    rlfood   rlhhex1
    ---------+--------------------
     region1 |  5595.556  9560.349
             |  5350.916  8536.373
    ---------+--------------------
     region2 |  6419.427  12951.14
             |  5664.145  9997.146
    ---------+--------------------
     region3 |  5692.201  10885.38
             |  5369.411  9022.334
    ---------+--------------------
     region4 |  6512.576  13525.41
             |  5790.046  11077.51
    ---------+--------------------
     region5 |  5894.983  11217.05
             |  5380.505  9421.447
    ---------+--------------------
     region6 |  9746.158  23515.01
             |  8428.743  18514.39
    ---------+--------------------
     region7 |  6556.616  13068.11
             |  6066.128  11043.99
    ---------+--------------------
       Total |  6787.898  14010.74
             |  5951.567  10733.19
    ------------------------------

Options:
    statistics(statname [...])   The statistics to compute for the variable list
    by(variable)                 The categorical variable
    missing                      Missing values of the categorical variable are treated as a category
    format[(%fmt)]               The display format of the results

Stata accepts the following statistics in statistics(statname [...]):

    Statistic   Meaning
    mean        Mean
    count       Number of observations
    n           Same as count (number of observations)
    sum         Total
    max         Maximum
    min         Minimum
    range       Range = maximum - minimum
    sd          Standard deviation
    sdmean      Standard deviation of the mean = standard deviation / (number of observations)^0.5
    skewness    Skewness of the distribution
    kurtosis    Kurtosis of the distribution
    median      Median (same as p50)
    p1          1st percentile
    p5          5th percentile
    p10         10th percentile
    p25         25th percentile
    p50         50th percentile (median)
    p75         75th percentile
    p90         90th percentile
    p95         95th percentile
    p99         99th percentile
    iqr         p75 - p25
    q           Equivalent to "p25 p50 p75"

Example:
    . tabstat rlpcex1, stats(mean sd q) by(reg7) format(%5.1f)

    Summary for variables: rlpcex1
         by categories of: reg7 (Code by 7 regions)

        reg7 |      mean        sd       p25       p50       p75
    ---------+--------------------------------------------------
     region1 |    2174.8    1265.1    1328.0    1792.1    2710.8
     region2 |    3294.0    2511.9    1816.7    2532.5    3822.0
     region3 |    2503.3    1918.0    1489.7    2001.2    2808.1
     region4 |    2933.7    2260.5    1697.9    2362.2    3471.4
     region5 |    2087.3    1285.4    1217.3    1850.8    2700.5
     region6 |    5257.5    4005.7    2676.7    4154.1    6431.8
     region7 |    2931.1    2137.2    1680.1    2321.9    3414.7
    ---------+--------------------------------------------------
       Total |    3188.7    2692.6    1671.1    2397.0    3711.9
    ------------------------------------------------------------
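The tabstat syntax can be tried on a handful of made-up observations, as sketched below. The five values are invented for this illustration; only the variable names follow the survey file.

```stata
* Five illustrative observations in two regions
clear
input byte reg7 float rlpcex1
1 1000
1 2000
1 3000
2 4000
2 8000
end
* Mean, standard deviation and median of rlpcex1 for each region
tabstat rlpcex1, statistics(mean sd median) by(reg7) format(%9.1f)
```

With these values the region means are easy to verify by hand: 2000 for region 1 and 6000 for region 2.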
3.13. Summary tables with table
Syntax:
    table <row variable> [column variable [supercolumn variable]] [if exp] [in range] [weight] [, contents(clist) row col format(%fmt) missing]
This command computes the statistics of the variables named in contents and displays them as a table, in which the rows are defined by the row variable and the columns by the column variable (and the supercolumn variable). The row and column variables are categorical variables.

Example:
    . table reg7 urban98 farm, contents(mean poor)

    ----------------------------------------------------
              | Type of HH (1:farm; 0:nonfarm) and
              |       1:urban 98; 0:rural 98
    Code by 7 | ---- non farm ----    ------ farm ------
    regions   |    Rural     Urban       Rural     Urban
    ----------+-----------------------------------------
      region1 | 19.35484  6.015038     65.7377  12.96296
      region2 | 26.66667  4.624278    33.96524  15.21739
      region3 | 40.98361  10.11236     45.8159  10.52632
      region4 |     21.6  11.63793    42.44032        10
      region5 | 30.76923              49.24012
      region6 | 15.04065  2.195609    10.07463         0
      region7 | 38.62816  10.04184    34.35805  11.62791
    ----------------------------------------------------

Options:
    contents(clist)   Lists the statistics and the variables; the statistic names are the same as for tabstat
    row               Displays a total row
    col               Displays a total column
    format(%fmt)      The display format of the results
    missing           Missing values of the categorical variables are treated as a category

Examples:
    . table reg7 urban98 farm, contents(mean poor) row col format(%4.2f)

    ------------------------------------------------------
              | Type of HH (1:farm; 0:nonfarm) and 1:urban
              |            98; 0:rural 98
    Code by 7 | ----- non farm -----   ------- farm ------
    regions   | Rural  Urban  Total    Rural  Urban  Total
    ----------+-------------------------------------------
      region1 | 19.35   6.02  10.26    65.74  12.96  61.45
      region2 | 26.67   4.62  11.29    33.97  15.22  32.70
      region3 | 40.98  10.11  27.96    45.82  10.53  44.47
      region4 | 21.60  11.64  15.13    42.44  10.00  40.81
      region5 | 30.77         30.77    49.24         49.24
      region6 | 15.04   2.20   6.43    10.07   0.00   9.78
      region7 | 38.63  10.04  25.39    34.36  11.63  32.72
              |
        Total | 27.91   6.17  14.84    42.30  12.11  40.63
    ------------------------------------------------------

    . table urban98 farm, contents(mean poor sd poor) row col format(%4.2f)

    ---------------------------------------
    1:urban   |
    98;       |    Type of HH (1:farm;
    0:rural   |         0:nonfarm)
    98        | non farm      farm    Total
    ----------+----------------------------
        Rural |    27.91     42.30    38.86
              |    44.88     49.41    48.75
              |
        Urban |     6.17     12.11     6.82
              |    24.07     32.71    25.22
              |
        Total |    14.84     40.63    29.62
              |    35.55     49.12    45.66
    ---------------------------------------

    . table urban98 farm, contents(mean rlpcex1 mean rlhhex1) row col format(%4.2f)

    ---------------------------------------
    1:urban   |
    98;       |    Type of HH (1:farm;
    0:rural   |         0:nonfarm)
    98        | non farm      farm    Total
    ----------+----------------------------
        Rural |  2835.83   2212.12  2361.29
              | 13242.03  10120.89 10867.36
              |
        Urban |  5476.86   3232.17  5230.33
              | 22984.44  11903.19 21767.43
              |
        Total |  4423.95   2268.49  3188.67
              | 19100.41  10219.39 14010.74
    ---------------------------------------
4. Editing and modifying data (Data manipulation)

4.1. Creating new variables

Creating variables with generate
Syntax: generate <new variable> = exp [if exp] [in range]
This command creates a new variable whose value is the value of the expression given.

Example:
    . gen poor = 1 if rlpcex1 < 1790
    (4222 missing values generated)
    . gen nonpoor=1 if rlpcex1 >= 1790
    (1777 missing values generated)

Creating dummy variables with tabulate, generate
Syntax: tabulate <variable>, generate(new variable)
The generate option of tab creates dummy variables. The new variables are named new variable1, new variable2, new variable3, and so on. They are the dummy variables derived from the categorical variable.

Example:
    . tab reg7, gen(region)
      Code by 7 |
        regions |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        region1 |        859       14.32       14.32
        region2 |       1175       19.59       33.91
        region3 |        708       11.80       45.71
        region4 |        754       12.57       58.28
        region5 |        368        6.13       64.41
        region6 |       1023       17.05       81.46
        region7 |       1112       18.54      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab1 region1 region2

    -> tabulation of region1
    reg7==regio |
             n1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       5140       85.68       85.68
              1 |        859       14.32      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    -> tabulation of region2
    reg7==regio |
             n2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       4824       80.41       80.41
              1 |       1175       19.59      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

Here reg7 has 7 values, from 1 to 7, so 7 dummy variables, region1 to region7, are created. region1 equals 1 if reg7 equals 1, and 0 otherwise. Likewise region7 equals 1 if reg7 equals 7. In this example tabulate, generate is equivalent to the following 7 commands:
    gen region1=(reg7==1)
    gen region2=(reg7==2)
    ...
    gen region7=(reg7==7)

Creating variables with egen
Syntax: egen <new variable> = fcn(arguments) [if exp] [in range] [, by(variable list)]
This command creates a new variable from the value of the function fcn. The new variable takes the same value for every observation (or for every observation within a group when by() is given). The function can be:

    count(exp)    Number of observations of the expression
    mean(exp)     Mean of the expression
    median(exp)   Median of the expression
    sd(exp)       Standard deviation of the expression

Example:
    . egen sumexp=sum(rlpcex1)
    . sum sumexp
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
          sumexp |    5999    1.91e+07          0   1.91e+07   1.91e+07

    . egen g=median( food+ nonfood1)
    . sum g
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
               g |    5999     11063.6          0    11063.6    11063.6

For other functions, see help egen.
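The three ways of creating variables can be compared on a small made-up dataset, as sketched below. The five observations are invented; the threshold 1790 is the one used in the example above. Note one difference from that example: writing the condition as an expression, (rlpcex1 < 1790), gives 0 rather than a missing value to the non-poor observations, which is an alternative chosen here for illustration.

```stata
* Five illustrative observations in two regions
clear
input byte reg7 float rlpcex1
1 1500
1 2500
2 1000
2 3000
2 5000
end
* Indicator: 1 if below 1790, 0 otherwise (no missing values are created)
generate byte poor = (rlpcex1 < 1790)
* Dummy variables region1 and region2 from the categorical variable
quietly tabulate reg7, generate(region)
* Regional mean, repeated on every observation of the region
egen meanexp = mean(rlpcex1), by(reg7)
list
```

With the if form used in the text, poor would be missing for the three observations at or above 1790, so a later mean of poor would not be a poverty rate; the expression form avoids that.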
Replacing the values of a variable
Syntax: replace <variable> = exp [if exp] [in range]
This command replaces the values of an existing variable with the new values given by the expression exp.

Examples:
    replace poor=poor*100
    replace pcexp = hhexp/hhsize

Creating a categorical variable with encode
Syntax: encode <variable> [if exp] [in range], generate(new variable)
This command creates a new numeric categorical variable whose values correspond to the values of the string variable given (taken in alphabetical order).

Example:
    . gen str15(mucsong) = "Kha"
    . drop mucsong
    . gen mucsong="Rat ngheo"
    type mismatch
    r(109);
    . gen str15(mucsong)="Rat ngheo"
    . replace mucsong="Ngheo" if rlpcex1<1790 & rlpcex1>=1290
    (1087 real changes made)
    . replace mucsong="Khong ngheo" if rlpcex1>=1790
    (4222 real changes made)
    . tab mucsong
            mucsong |      Freq.     Percent        Cum.
    ----------------+-----------------------------------
        Khong ngheo |       4222       70.38       70.38
              Ngheo |       1087       18.12       88.50
          Rat ngheo |        690       11.50      100.00
    ----------------+-----------------------------------
              Total |       5999      100.00

    . sum mucsong
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
         mucsong |       0

    . encode mucsong, gen(ma_ms)
    . tab ma_ms
          ma_ms |      Freq.     Percent        Cum.
    ------------+-----------------------------------
    Khong ngheo |       4222       70.38       70.38
          Ngheo |       1087       18.12       88.50
      Rat ngheo |        690       11.50      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . sum ma_ms
        Variable |     Obs        Mean   Std. Dev.        Min        Max
    -------------+------------------------------------------------------
           ma_ms |    5999    1.411235   .6871957          1          3
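How encode assigns its codes can be seen on four made-up records, as in the sketch below. The string values are the ones used in the example above; the records themselves are invented.

```stata
* Four illustrative records of a string variable
clear
input str15 mucsong
"Rat ngheo"
"Ngheo"
"Khong ngheo"
"Ngheo"
end
* Codes are assigned in alphabetical order of the strings:
* "Khong ngheo" = 1, "Ngheo" = 2, "Rat ngheo" = 3
encode mucsong, generate(ma_ms)
label list ma_ms
* The same table with labels and with the underlying numeric codes
tabulate ma_ms
tabulate ma_ms, nolabel
```

Because the order is alphabetical, the codes do not follow the ranking of living standards; the nolabel table makes the actual numbers visible.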
Creating variables with xtile
Syntax: xtile <new variable> = exp [weight] [if exp] [in range] [, nquantiles(#)]
This command creates a variable that groups the expression by quantiles; nquantiles(#) gives the number of quantiles.

Example: creating an expenditure quintile variable.
    . xtile quinexp= rlpcex1, nq(5)
    . tab quinexp
    5 quantiles |
     of rlpcex1 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |       1200       20.00       20.00
              2 |       1200       20.00       40.01
              3 |       1200       20.00       60.01
              4 |       1200       20.00       80.01
              5 |       1199       19.99      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab quinexp, sum( rlpcex1)
                |  Summary of comp.M&Reg price adj.pc
    5 quantiles |               tot exp
     of rlpcex1 |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              1 |   1184.3975   261.20537        1200
              2 |   1803.6331   151.66604        1200
              3 |   2408.4867    211.5407        1200
              4 |   3390.1065   403.08913        1200
              5 |    7160.021   3690.3672        1199
    ------------+------------------------------------
          Total |   3188.6671   2692.5673        5999
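A small sketch of xtile on generated data follows. The variable pcexp and its ten values (100, 200, ..., 1000) are hypothetical and serve only to show how the groups are formed.

```stata
* Ten illustrative values: 100, 200, ..., 1000
clear
set obs 10
generate pcexp = _n * 100
* Split into 5 groups of equal size by increasing pcexp
xtile quin = pcexp, nquantiles(5)
tabulate quin
```

With ten distinct values and five quantiles, each group receives two observations, the two smallest in group 1 and the two largest in group 5.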
4.2. Renaming variables
Syntax: rename <old name> <new name>
This command changes the old name of a variable to a new name.

Examples:
    rename poor nguoingheo
    rename rlpcex1 chitieu

4.3. Dropping variables and observations
Syntax:
    drop <variable list>     Drops the variables in the variable list
    drop if exp              Drops the observations that satisfy the condition
    drop in range [if exp]   Drops the observations in the range (which may also have to satisfy the condition)
    keep <variable list>     Keeps the variables in the variable list; all other variables are dropped
    keep if exp              Keeps the observations that satisfy the condition; all other observations are dropped
    keep in range [if exp]   Keeps the observations in the range (which may also have to satisfy the condition); all other observations are dropped

Examples:
    drop poor urban98   Drops the 2 variables poor and urban98
    drop if sex==1      Drops the observations for which sex equals 1
    drop in 1/20        Drops observations 1 to 20
    keep househol       Keeps only the variable househol; the other variables are dropped
    keep in f/50        Keeps the first 50 observations; the other observations are dropped

4.4. Changing the values of categorical variables
Syntax: recode <variable> old value = new value [if exp] [in range]
This command changes the values of a categorical variable according to the rules that follow the variable name.

Example:
    . recode sex 0=1
    (0 changes made)
    . recode sex . = 0
    (0 changes made)
    . recode hhsize 1/5=1 6/10 = 2 * = 3
    (5785 changes made)
    . tab hhsize
      Household |
           size |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |       4164       69.41       69.41
              2 |       1786       29.77       99.18
              3 |         49        0.82      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . tab urban98
    1:urban 98; |
     0:rural 98 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          Rural |       4269       71.16       71.16
          Urban |       1730       28.84      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . recode urban98 0=1 1=0
    (5999 changes made)
    . tab urban98
    1:urban 98; |
     0:rural 98 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          Rural |       1730       28.84       28.84
          Urban |       4269       71.16      100.00
    ------------+-----------------------------------
          Total |       5999      100.00
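The commands of sections 4.2 to 4.4 can be chained, as in the sketch below. The four observations and the new name gioitinh are made up for this illustration; the recode rules are the ones used in the example above.

```stata
* Four illustrative observations
clear
input byte sex byte hhsize
1 3
2 7
1 12
2 5
end
* 4.2: rename a variable (gioitinh is a hypothetical new name)
rename sex gioitinh
* 4.4: group household size into 1 (1 to 5), 2 (6 to 10) and 3 (all other values)
recode hhsize 1/5=1 6/10=2 *=3
* 4.3: drop the observations with gioitinh equal to 1
drop if gioitinh == 1
list
```

Since recode overwrites the original values, it is often safer to copy the variable with generate first and recode the copy.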
4.5. Labelling variables

Labelling a variable
Syntax: label variable <variable> "variable label"
This command attaches a label, which is a string of characters, to the variable.

Example:
    . gen ngheo=poor
    . des ngheo
                  storage  display     value
    variable name   type   format      label      variable label
    ---------------------------------------------------------------------------
    ngheo           float  %9.0g

    . tab ngheo
          ngheo |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       4222       70.38       70.38
              1 |       1777       29.62      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . label var ngheo "Nguoi co thu nhap duoi chuan ngheo"
    . tab ngheo
       Nguoi co |
       thu nhap |
     duoi chuan |
          ngheo |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |       4222       70.38       70.38
              1 |       1777       29.62      100.00
    ------------+-----------------------------------
          Total |       5999      100.00

    . des ngheo
                  storage  display     value
    variable name   type   format      label      variable label
    ---------------------------------------------------------------------------
    ngheo           float  %9.0g                  Nguoi co thu nhap duoi chuan ngheo
Gn gi tr cho bin phn loi label define # "nhn" [# "nhn" ...] [,
add modify] label dir label list label drop {tn b nhn [tn b nhn
...] | _all} label values [tn b nhn] Lnh label define gn nhn cho mt
b gi tr s. Tn ca b nhn c ch ra sau t kho define, # l gi tr s, nhn l
chui k t tng ng vi gi tr s y. C hai tu chn y: tu chn add thm gi tr
v nhn tng ng vo 1 b nhn c sn. Tu chn modify cho php sa cha gi tr v
nhn ca 1 b nhn c sn. Lnh label dir hin th nhng b nhn c sn, cn lnh
label list hin th gi tr ca b nhn c ch ra. Lnh label drop xo cc b
nhn c sn. V d: To nhn c tn l nngheo vi gi tr 1 c ngha l ngi ngho,
cn 0 c ngha l ngi khng ngho.. label define nngheo 0 "Ngheo" 1
"Khong ngheo" . label dir nngheo region loaiho diploma urban
agegroup . label list nngheo nngheo: 0 Khong ngheo 1 Ngheo . label
drop _all . label dir
The label values command attaches the labels of a label set to the numeric values of a categorical variable. The example below assumes that the label set nngheo is defined with 0 = "Khong ngheo" and 1 = "Ngheo" (it must be defined again if it was removed with label drop _all).

Example:

. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |       4222       70.38       70.38
          1 |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. list ngheo in 1/5

         ngheo
  1.         1
  2.         0
  3.         1
  4.         1
  5.         0

. label values ngheo nngheo
. tab ngheo

      ngheo |      Freq.     Percent        Cum.
------------+-----------------------------------
Khong ngheo |       4222       70.38       70.38
      Ngheo |       1777       29.62      100.00
------------+-----------------------------------
      Total |       5999      100.00

. list ngheo in 1/5

             ngheo
  1.         Ngheo
  2.   Khong ngheo
  3.         Ngheo
  4.         Ngheo
  5.   Khong ngheo
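The labeling steps above can be tried on a small stand-in dataset. The following is a minimal sketch in do-file form; the five values of ngheo are the ones listed in the example, and everything else (entering them with input, the nolabel checks) is an illustrative assumption rather than part of the original workflow.

```stata
* Hypothetical five-observation dataset standing in for the survey variable ngheo
clear
input ngheo
1
0
1
1
0
end

* Variable label, then a value-label set, then attach the set to the variable
label variable ngheo "Nguoi co thu nhap duoi chuan ngheo"
label define nngheo 0 "Khong ngheo" 1 "Ngheo"
label values ngheo nngheo

* The first list shows the text labels, the second the underlying 0/1 codes
list ngheo
list ngheo, nolabel
label list nngheo
```

Listing the variable with and without nolabel is a quick way to confirm that the labels describe the codes the way the text intends (1 = poor).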
4.6. Sorting data

Syntax:
    sort varlist [in range]
    gsort [+|-]varname [[+|-]varname [...]]

The sort command arranges observations in ascending order of the values of the variables in the variable list. The gsort command arranges observations in ascending order of a variable if + is placed before its name (this is also the default), or in descending order if - is placed before its name.

Example:

    sort reg7 hhsize

This command sorts the observations in ascending order of the region variable reg7; within each region, observations are sorted in ascending order of the household size variable hhsize.

    gsort reg7 -hhsize

This command sorts the observations in ascending order of the region variable reg7, but within each region observations are sorted in descending order of the household size variable hhsize.
4.7. Combining data

Collapsing data: the collapse command

Syntax:
    collapse clist [weight] [if] [in] [, by(varlist)]

where clist, the statistics expression, is a list of statistics and the variables they apply to. The statistics are written as in section 3.12 of this chapter. The collapse command creates a new dataset containing the variables named in the list, with values computed by the corresponding statistics. The observations of the original dataset are grouped by the values of the variables named in by(varlist).

Example: we have a data file on the income and expenditure of the members of each household:

ma_tv   ma_ho   thunhap   chitieu
1       101     200       500
2       101     1200      400
3       101     0         200
4       101     0         200
1       102     3200      500
2       102     1200      320
3       102     200       200
1       103     300       500
2       103     2100      250
3       103     0         300
4       103     0         300
1       104     4300      800
2       104     3500      500
3       104     300       500
4       104     0         300
5       104     0         200
6       104     0         200

We use collapse to create a file with the mean income and mean expenditure of each household, and add a variable for household size.

. gen quimo=1
. collapse (mean) thunhap (mean) chitieu (sum) quimo, by(ma_ho)

The new dataset looks like this:

ma_ho   thunhap   chitieu   quimo
101     350       325       4
102     1533.33   340       3
103     600       337.5     4
104     1350      416.667   6

Combining datasets: the merge command
Syntax:
    merge [varlist] using filename [, update replace]

The merge command joins the observations of the dataset currently open in Stata (the master dataset) with the corresponding observations of another dataset named after the keyword using (the using dataset), producing one new dataset. The variables named in the variable list are called identifying variables, and both datasets must be sorted on them with sort (or gsort) before merge is run.

Example: we have the following two datasets.

thunhap.dta
ma_ho   thunhap   chitieu   quimo
101     350       325       4
102     1533.33   340       3
103     600       337.5     4
104     1350      416.667   6

dialy.dta
ma_ho   thanhthi   vung
204     0          1
102     1          4
103     0          3
104     0          6

The merge is carried out as follows:

. use "C:\dialy.dta", clear
. sort ma_ho
. save "C:\dialy.dta", replace
file C:\dialy.dta saved
. use "C:\thunhap.dta", clear
. sort ma_ho
. merge ma_ho using "C:\dialy.dta"
ma_ho was byte now int
. edit

The resulting dataset looks like this:

ma_ho   thunhap   chitieu   quimo   thanhthi   vung   _merge
101     350       325       4       .          .      1
102     1533.33   340       3       1          4      3
103     600       337.5     4       0          3      3
104     1350      416.667   6       0          6      3
204     .         .         .       0          1      2
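The collapse and merge steps above can be chained in a single do-file. The sketch below assumes Stata 11 or later, where the match type is written explicitly (merge 1:1) and no prior sort is needed; it reuses the first two households of the member-level example and a two-row version of the geography file. The tempfile name dialy is only illustrative.

```stata
* Geography file (subset of dialy.dta above); household 204 has no income data
clear
input ma_ho thanhthi vung
204 0 1
102 1 4
end
tempfile dialy
save `dialy'

* Member-level income data for households 101 and 102
clear
input ma_tv ma_ho thunhap chitieu
1 101  200 500
2 101 1200 400
3 101    0 200
4 101    0 200
1 102 3200 500
2 102 1200 320
3 102  200 200
end

* One row per household: mean income, mean expenditure, household size
generate quimo = 1
collapse (mean) thunhap chitieu (sum) quimo, by(ma_ho)

* Household-level file is the master, the geography file is the using dataset
merge 1:1 ma_ho using `dialy'
list, clean
tabulate _merge
```

With these inputs household 101 exists only in the master data, 204 only in the using data, and 102 in both, so all three values of _merge appear. Because the block uses local macros (the tempfile), it is meant to be run as a do-file rather than typed line by line.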
The resulting dataset contains an additional variable named _merge, which takes the following values:

_merge==1   the observation comes from the master dataset only
_merge==2   the observation comes from the using dataset only
_merge==3   the observation comes from both the master and the using dataset

Options: when the two datasets have variables in common, the following options control how their values are handled.

update    If a shared variable has a missing value in the master dataset, the missing value is replaced by the value of that variable in the using dataset.
replace   Values of a shared variable in the master dataset are replaced by the values of that variable in the using dataset.

If neither option is specified, the values of the variables in the master dataset are left unchanged by default.

Appending datasets: the append command

Syntax:
    append using filename

This command appends the dataset named after using to the dataset currently open, matching variables that have the same name and format. The number of observations in the new dataset is the sum of the numbers of observations in the two datasets.

Example: the dataset thunhap2.dta is as follows.

ma_ho   thunhap   chitieu   gioitinh
105     1350      425       1
106     1500      370       0
107     800       556       0
108     1500      417       0
109     2500      540       1

The two datasets are appended as follows:

. use "C:\thunhap.dta", clear
. append using "C:\thunhap2.dta"
. edit

The resulting dataset looks like this:

ma_ho   thunhap   chitieu   quimo   gioitinh
101     350       325       4       .
102     1533.33   340       3       .
103     600       337.5     4       .
104     1350      416.667   6       .
105     1350      425       .       1
106     1500      370       .       0
107     800       556       .       0
108     1500      417       .       0
109     2500      540       .       1

Note: see also the expand command, which is used to create duplicate observations.

4.8. Reshaping data

Syntax:
    reshape wide stubnames, i(varlist) [j(varname [values]) ...]
    reshape long stubnames, i(varlist) [j(varname [values]) ...]
    reshape wide
    reshape long

This command converts data from wide format to long format (reshape long) and from long format to wide format (reshape wide). i(varlist) names the identifying variables that distinguish observations from one another in the wide format (called level-1 observations). j(varname) names the variable that distinguishes the level-2 observations in the long format.

Example 1: we have data in wide format, laid out like a matrix, where maho is the i variable and thunhap95 to thunhap97 are the x_ij variables:

maho   quimo   thunhap95   thunhap96   thunhap97
101    5       4500        4400        5400
102    4       3400        3300        3700
103    6       5000        5400        5500

These data are converted to long format as follows, where maho is the i variable, nam is the j variable and thunhap is the x_ij variable:

maho   quimo   nam   thunhap
101    5       95    4500
101    5       96    4400
101    5       97    5400
102    4       95    3400
102    4       96    3300
102    4       97    3700
103    6       95    5000
103    6       96    5400
103    6       97    5500

The reshape command is written as follows:

. reshape long thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        3   ->       9
Number of variables                   5   ->       4
j variable (3 values)                     ->   nam
xij variables:
          thunhap95 thunhap96 thunhap97   ->   thunhap
-----------------------------------------------------------------------------

* And convert back from long format to wide format as follows
. reshape wide thunhap, i(maho) j(nam)
(note: j = 95 96 97)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   4   ->       5
j variable (3 values)               nam   ->   (dropped)
xij variables:
                                thunhap   ->   thunhap95 thunhap96 thunhap97
-----------------------------------------------------------------------------

Example 2:
We have data in the following table:

maho   sotien1   nguon1        sotien2   nguon2
101    1200      Ngan hang A   2000      Ngan hang A
102    1300      Ngan hang B   .         .
103    2500      Ngan hang A   1000      Ngan hang C
104    3000      Ngan hang A   2000      Ngan hang B

This table is converted to long format as follows:

. reshape long sotien nguon, i(maho) j(lanvay)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->       8
Number of variables                   5   ->       4
j variable (2 values)                     ->   lanvay
xij variables:
                        sotien1 sotien2   ->   sotien
                          nguon1 nguon2   ->   nguon
-----------------------------------------------------------------------------

The long-format table looks like this:

maho   lanvay   sotien   nguon
101    1        1200     Ngan hang A
101    2        2000     Ngan hang A
102    1        1300     Ngan hang B
102    2        .
103    1        2500     Ngan hang A
103    2        1000     Ngan hang C
104    1        3000     Ngan hang A
104    2        2000     Ngan hang B
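The round trip between wide and long format in Example 1 can be reproduced with a few lines. This is a minimal sketch that types the three households in with input; the sepby() option on list is an illustrative choice for readability.

```stata
* Wide data from Example 1: one row per household, one income column per year
clear
input maho quimo thunhap95 thunhap96 thunhap97
101 5 4500 4400 5400
102 4 3400 3300 3700
103 6 5000 5400 5500
end

* Wide -> long: the year suffix becomes the new variable nam
reshape long thunhap, i(maho) j(nam)
list, sepby(maho)

* Long -> wide: restores thunhap95, thunhap96 and thunhap97
reshape wide thunhap, i(maho) j(nam)
list
```

Variables that do not vary within a household, such as quimo, are simply repeated on every long-format row and collapsed back to one value when returning to wide format.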
5. Weights in the VHLSS

5.1. Weights in sample surveys

In a sample survey the observations are selected at random, but they usually have different probabilities of selection. The weight equals the inverse of the probability of being selected into the sample. If observation i has weight w_i, then observation i in the sample represents w_i units of the population. Estimates and inferences about the population must take the sampling weights into account; otherwise the results will be biased.

Example: suppose the Red River Delta consists of two provinces, Ha Noi and Nam Dinh, with populations of 4.5 million and 500 thousand respectively. We want to draw a random sample of 500 observations to study income in the Red River Delta as a whole and in each of the two provinces. If the sample is allocated in proportion to population, we obtain 450 households in Ha Noi and 50 households in Nam Dinh. However, because the sample is drawn at random over the whole region, we may end up with a sample that contains no observations from Nam Dinh, or only very few. For the sample to be representative of each province, it is better to select 400 observations in Ha Noi and 100 observations in Nam Dinh.

If mean income is 900 thousand per month in Ha Noi and 300 thousand per month in Nam Dinh, the mean income of the whole Red River Delta cannot be computed as (900 + 300)/2, because the observations in the sample were not selected in proportion to the size of the provinces. Each observation in Ha Noi represents 11,250 units in the region (4,500,000/400). This is the weight of the observation, the inverse of its probability of selection. Each observation in Nam Dinh represents 5,000 units of the region (500,000/100). The income of the Red River Delta is then computed as:

\[
\text{Thu nhap} = \frac{900 \times 400 \times 11250 + 300 \times 100 \times 5000}{400 \times 11250 + 100 \times 5000} = 840
\]
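The weighted mean in this example can be checked in Stata. The sketch below is illustrative only: it enters one row per province, with the provincial mean income, the sample size n and the population pop taken from the example, and uses frequency weights to expand each row. The variable names are assumptions made for this illustration.

```stata
* One row per province: mean income, sample size, population (from the example)
clear
input str8 tinh thunhap n pop
"Ha Noi"   900 400 4500000
"Nam Dinh" 300 100  500000
end

* Sampling weight = population units represented by each sampled unit
generate w = pop/n
list tinh n pop w

* Ignoring the weights: every sampled unit counts once
summarize thunhap [fweight = n]

* Using the weights: every sampled unit counts w times
generate long popw = n*w
summarize thunhap [fweight = popw]
```

The first summarize reports a mean of 780, which over-represents Nam Dinh; the second reports 840, the regional mean computed in the text.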
VLSS 1998 contains two weights. The first is the household weight, the variable wt, which is the number of households in Viet Nam that each sampled household represents. The second is the household member weight, hhsizewt, which is the number of people in Viet Nam that the members of each sampled household represent. The member weight equals the household weight multiplied by household size.

Example: weights in VLSS 1998.

. tab reg7, sum(wt)

  Code by 7 |      Summary of sample weight
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   3218.4296   850.74246         859
    region2 |   3133.7277   849.12325        1175
    region3 |   3185.1794   801.74266         708
    region4 |     2199.37   492.37202         754
    region5 |   1336.3098   269.14747         368
    region6 |   1963.8964   528.69328        1023
    region7 |   2938.2122   547.72125        1112
------------+------------------------------------
      Total |   2688.5003   900.01379        5999

. tab reg7, sum(hhsizewt)

  Code by 7 |        Summary of =hhsize*wt
    regions |        Mean   Std. Dev.       Freq.
------------+------------------------------------
    region1 |   15790.857   7555.7552         859
    region2 |   12656.003   5970.9089        1175
    region3 |   14814.504   7236.7592         708
    region4 |   10794.537    5235.562         754
    region5 |    7564.731   3185.9336         368
    region6 |   9447.7077   4535.0816        1023
    region7 |   14653.702   6639.8297        1112
------------+------------------------------------
      Total |   12636.546   6597.6574        5999

. di 2688.5003*5999
16128313
. di 12636.546*5999
75806639
5.2. Weight options

Stata allows the following four types of weights:

fweights   frequency weights. Stata interprets the weight as the number of times each observation is repeated in the computation.
pweights   sampling weights. Stata interprets the weight as the inverse of the probability of being selected into the sample, that is, the number of population units that each sample observation represents.
aweights   analytic weights. Stata interprets the weight as inversely proportional to the variance of the observation.
iweights   importance weights. These weights indicate the importance of the observations.
For the living standards survey, commands use the pweights and fweights weight types. Example:

. sum poor

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |    5999     29.6216   45.66255          0        100

. sum poor [fw=hhsize]

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        poor |   28509    34.17517   47.43051          0        100

. tab reg7 urban98

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |       672        187 |       859
   region2 |       783        392 |      1175
   region3 |       600        108 |       708
   region4 |       502        252 |       754
   region5 |       368          0 |       368
   region6 |       514        509 |      1023
   region7 |       830        282 |      1112
-----------+----------------------+----------
     Total |      4269       1730 |      5999

. tab reg7 urban98 [fw= hhsizewt]

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 |  11993763    1570583 |  13564346
   region2 |  11057932    3812871 |  14870803
   region3 |   9582621     906048 |  10488669
   region4 |   5618709    2520372 |   8139081
   region5 |   2783821          0 |   2783821
   region6 |   4545303    5119702 |   9665005
   region7 |  13220727    3074190 |  16294917
-----------+----------------------+----------
     Total |  58802876   17003766 |  75806642
. tab reg7 urban98 , sum(hhsize) means

        Means of Household size

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1205357  3.7326203 | 4.8183935
   region2 |  4.045977  4.0459184 | 4.0459574
   region3 | 4.6666667  4.6759259 | 4.6680791
   region4 | 4.8027888  5.1190476 | 4.9084881
   region5 | 5.7065217          . | 5.7065217
   region6 | 5.0719844  4.7131631 | 4.8934506
   region7 | 5.1373494  4.3971631 | 4.9496403
-----------+----------------------+----------
     Total | 4.8702272  4.4612717 |  4.752292

. tab reg7 urban98 [fw=wt], sum(hhsize) means

   Means and Number of Observations of Household size

           |  1:urban 98; 0:rural
 Code by 7 |          98
   regions |     Rural      Urban |     Total
-----------+----------------------+----------
   region1 | 5.1328749  3.6698008 | 4.9063857
           |   2336656     427975 |   2764631
-----------+----------------------+----------
   region2 | 4.0564115   3.987975 | 4.0386415
           |   2726038     956092 |   3682130
-----------+----------------------+----------
   region3 | 4.6508908  4.6530097 | 4.6510738
           |   2060384     194723 |   2255107
-----------+----------------------+----------
   region4 | 4.8136253   5.132367 | 4.9080132
           |   1167251     491074 |   1658325
-----------+----------------------+----------
   region5 | 5.6609112          . | 5.6609112
           |    491762          0 |    491762
-----------+----------------------+----------
   region6 | 5.0486426  4.6174858 | 4.8106956
           |    900302    1108764 |   2009066
-----------+----------------------+----------
   region7 | 5.1494132  4.3925283 | 4.9872852
           |   2567424     699868 |   3267292
-----------+----------------------+----------
     Total | 4.8003065  4.3841133 | 4.7002214
           |  12249817    3878496 |  16128313

. table reg7 urban98 , c(mean poor) col row format(%4.1f)

------------------------------
          |  1:urban 98; 0:rural
Code by 7 |          98
  regions | Rural  Urban  Total
----------+-------------------
  region1 |  61.5    8.0   49.8
  region2 |  32.6    5.9   23.7
  region3 |  44.8   10.2   39.5
  region4 |  37.3   11.5   28.6
  region5 |  47.3          47.3
  region6 |  12.5    2.2    7.3
  region7 |  35.8   10.3   29.3
          |
    Total |  38.9    6.8   29.6
------------------------------

. table reg7 urban98 [pw=hhsizewt], c(mean poor) col row format(%4.1f)

------------------------------
          |  1:urban 98; 0:rural
Code by 7 |          98
  regions | Rural  Urban  Total
----------+-------------------
  region1 |  65.2    8.3   58.6
  region2 |  36.1    7.0   28.7
  region3 |  51.3   14.3   48.1
  region4 |  43.6   16.6   35.2
  region5 |  52.4          52.4
  region6 |  13.0    2.9    7.6
  region7 |  42.0   15.3   36.9
          |
    Total |  45.5    9.2   37.4
------------------------------
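In more recent versions of Stata (9 and later), sampling weights are usually declared once with svyset and then applied through the svy: prefix, which also produces design-based standard errors. The sketch below is an assumption-laden illustration, not part of the original workflow: it uses a made-up four-household dataset with the variable names of this section and ignores strata and clusters, which a real VLSS analysis would also declare.

```stata
* Hypothetical four-household dataset using the variable names of this section
clear
input poor wt hhsize
1 3000 6
0 2000 4
1 1500 5
0 3500 3
end

* Member weight = household weight x household size, as defined in the text
generate hhsizewt = hhsize*wt

* Declare the sampling weight once, then estimate with the svy prefix
svyset [pweight = hhsizewt]
svy: mean poor
```

Declaring the design once avoids repeating the weight expression on every command and keeps the point estimates consistent with the weighted tables shown above.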
Chapter III: Hypothesis testing and regression analysis

1. Estimation and hypothesis testing

1.1. Estimating the mean with a confidence interval

Syntax:
    ci [varlist] [weight] [if] [in] [, level(#) binomial poisson exposure(varname) total]

This command computes the standard error and confidence interval for the sample mean under the normal, binomial and Poisson distributions.

Options:

level(#)            specifies the confidence level for the confidence interval. # takes values from 10 to 99; the default is 95.
binomial            applies the binomial distribution.
poisson             applies the Poisson distribution.
exposure(varname)   applies the Poisson distribution; varname names the exposure variable (usually time or area) over which the events recorded in the variable list occurred.
total               used together with the by prefix; requests a confidence interval for all groups combined.

Example:

. ci poor

    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    5999     29.6216    .5895501      28.46587    30.77733

. sort reg7
. by reg7: ci poor, total
-> reg7 = region1
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     859    49.82538    1.706961      46.47507    53.17569

-> reg7 = region2
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    1175    23.65957    1.240357      21.22601    26.09314

-> reg7 = region3
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     708    39.54802    1.838899      35.93767    43.15838

-> reg7 = region4
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     754    28.64721     1.64759       25.4128    31.88163

-> reg7 = region5
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |     368    47.28261    2.606121       42.1578    52.40741

-> reg7 = region6
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    1023    7.331378    .8153306      5.731465    8.931292

-> reg7 = region7
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    1112    29.31655    1.365709      26.63689    31.99621

-> Total
    Variable |     Obs        Mean    Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------------------
        poor |    5999     29.6216    .5895501      28.46587    30.77733
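The standard error and interval reported by ci for the full sample can be reproduced by hand from the summary statistics of poor (5999 observations, mean 29.6216, standard deviation 45.66255). The following is a small sketch using Stata scalars; the scalar names are arbitrary.

```stata
* Summary statistics of poor copied from the output in this section
scalar s_n    = 5999
scalar s_mean = 29.6216
scalar s_sd   = 45.66255

* Standard error of the mean and the two-sided 95% critical value of Student's t
scalar s_se = s_sd/sqrt(s_n)
scalar s_t  = invttail(s_n - 1, 0.025)

display "Std. Err.   = " s_se
display "Lower bound = " s_mean - s_t*s_se
display "Upper bound = " s_mean + s_t*s_se
```

The results agree with the ci output up to rounding, which shows that the default interval is the usual mean plus or minus t times the standard error.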
Note: the estimation commands can also be used when only the sample statistics are known. These are called commands using immediate arguments. They are very useful when we do not have the original data for the variable.

    cii #obs #mean #sd [, level(#)]               (normal distribution)
    cii #obs #succ [, level(#)]                   (binomial distribution)
    cii #exposure #events, poisson [level(#)]     (Poisson distribution)

#obs is the number of observations; #succ is the number of times the variable takes the value corresponding to a successful trial (usually the value 1).

Example:

. cii 5999 1777, level(90)

                                                         -- Binomial Exact --
    Variable |     Obs        Mean    Std. Err.     [90% Conf. Interval]
-------------+----------------------------------------------------------
             |    5999     .296216     .005895      .2865107    .3060676

. cii 12 27, poisson

                                                         -- Poisson  Exact --
    Variable | Exposure        Mean    Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------------------
             |       12        2.25    .4330127      1.483144    3.273587
1.2. Statistical hypothesis testing

1.2.1. Testing the sample mean

Zero-one (Bernoulli) distribution

Syntax:
    prtest varname = # [if] [in] [, level(#)]

This command tests a hypothesis about the proportion of a variable that follows the zero-one distribution (Ho: p = p0).

Example:

. prtest poor=0.44 if reg7==1

One-sample test of proportion                   poor: Number of obs =      859
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.         z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
    poor |   .4982538   .0170597   29.2065   0.0000      .4648174    .5316901
------------------------------------------------------------------------------
Ho: proportion(poor) = .44
Ha: poor < .44     z = 3.440    P < z   = 0.9997
Ha: poor ~= .44    z = 3.440    P > |z| = 0.0006
Ha: poor > .44     z = 3.440    P > z   = 0.0003
    prtest varname1 = varname2 [if] [in] [, level(#)]

This command tests the hypothesis that the proportions of the two named variables are equal (Ho: pX = pY).

Example: test whether the poverty rates of region 2 and region 4 differ.

. gen poor2=poor if reg7==2
(4824 missing values generated)
. gen poor4=poor if reg7==4
(5245 missing values generated)
. prtest poor2 = poor4

Two-sample test of proportion                  poor2: Number of obs =     1175
                                               poor4: Number of obs =      754
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.         z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |   .2365957   .0123983   19.0829   0.0000      .2122955    .2608959
   poor4 |   .2864721    .016465   17.3989   0.0000      .2542014    .3187429
---------+--------------------------------------------------------------------
    diff |  -.0498764    .020611                        -.0902732   -.0094796
         |  under Ho:   .0203666  -2.44893   0.0143
------------------------------------------------------------------------------
Ho: proportion(poor2) - proportion(poor4) = diff = 0
Ha: diff < 0     z = -2.449    P < z   = 0.0072
Ha: diff ~= 0    z = -2.449    P > |z| = 0.0143
Ha: diff > 0     z = -2.449    P > z   = 0.9928
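The z statistic in the two-sample output can be traced back to the pooled proportion under Ho. The sketch below assumes the counts implied by the reported means (278 poor households out of 1175 in region 2 and 216 out of 754 in region 4) and uses the immediate command prtesti; the normal() function requires Stata 9 or later.

```stata
* Counts implied by the output above: 278/1175 = .2365957 and 216/754 = .2864721
scalar p2 = 278/1175
scalar p4 = 216/754

* Pooled proportion and standard error of the difference under Ho: p2 = p4
scalar pp  = (278 + 216)/(1175 + 754)
scalar se0 = sqrt(pp*(1 - pp)*(1/1175 + 1/754))

display "diff        = " p2 - p4
display "SE under Ho = " se0
display "z           = " (p2 - p4)/se0
display "P > |z|     = " 2*normal(-abs((p2 - p4)/se0))

* Same test from the counts alone, without the raw data
prtesti 1175 278 754 216, count
```

The hand computation reproduces the "under Ho" line of the prtest table: the standard error used for the test is the pooled one, while the confidence interval for diff uses the unpooled standard error.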
    prtest varname [if] [in], by(groupvar) [level(#)]

This command tests the hypothesis that the proportions of the two groups defined by the grouping variable are equal (Ho: pX1 = pX2).

Example:

. prtest poor, by(sex)

Two-sample test of proportion                      1: Number of obs =     4375
                                                   2: Number of obs =     1624
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.         z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      .3248     .00708   45.8755   0.0000      .3109234    .3386766
       2 |   .2192118   .0102661    21.353   0.0000      .1990906     .239333
---------+--------------------------------------------------------------------
    diff |   .1055882   .0124708                         .0811459    .1300304
         |  under Ho:   .0132673   7.95855   0.0000
------------------------------------------------------------------------------
Ho: proportion(1) - proportion(2) = diff = 0
Ha: diff < 0     z = 7.959    P < z   = 1.0000
Ha: diff ~= 0    z = 7.959    P > |z| = 0.0000
Ha: diff > 0     z = 7.959    P > z   = 0.0000
Binomial distribution

Syntax:
    bitest varname = #p [weight] [if] [in]

This command tests a hypothesis about the parameter p of the binomial distribution (the probability of success in a trial) for the named variable (Ho: p = p0).

Example:

. bitest poor=0.44 if reg7==1

    Variable |        N   Observed k   Expected k   Assumed p   Observed p
-------------+------------------------------------------------------------
        poor |      859          428       377.96     0.44000      0.49825

  Pr(k >= 428) = 0.000344  (one-sided test)
Pr(k = 428) = 0.000344 Pr(k |t| = 0.7444 Ha: mean > 3200 t =
-0.3260 P > t = 0.6278
    ttest varname1 = varname2 [if] [in] [, unpaired unequal level(#)]

This command tests the hypothesis that two variables have equal means (Ho: X = Y).

Options:

unpaired   the data of the two variables are not paired.
unequal    the variances of the two variables are not equal.

Example:

. ttest poor2=poor4, unpaired unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   poor2 |    1175    .2365957    .0124036     .425173    .2122601    .2609314
   poor4 |     754    .2864721    .0164759    .4524128     .254128    .3188163
---------+--------------------------------------------------------------------
combined |    1929    .2560912    .0099404     .436586    .2365962    .2755863
---------+--------------------------------------------------------------------
    diff |           -.0498764    .0206229               -.0903285   -.0094243
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  1532.64
Ho: mean(poor2) - mean(poor4) = diff = 0
Ha: diff < 0     t = -2.4185    P < t   = 0.0079
Ha: diff ~= 0    t = -2.4185    P > |t| = 0.0157
Ha: diff > 0     t = -2.4185    P > t   = 0.9921
    ttest varname [if] [in], by(groupvar) [unequal level(#)]

This command tests the hypothesis that the means of the two groups defined by the grouping variable are equal (Ho: X1 = X2).

Example:

. ttest rlpcex1, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |    4375    2980.906    36.74795    2430.648    2908.862    3052.951
       2 |    1624    3748.368    80.18189    3231.241    3591.097    3905.638
---------+--------------------------------------------------------------------
combined |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
---------+--------------------------------------------------------------------
    diff |           -767.4613     77.6155               -919.6156   -615.3071
------------------------------------------------------------------------------
Degrees of freedom: 5997
Ho: mean(1) - mean(2) = diff = 0
Ha: diff < 0     t = -9.8880    P < t   = 0.0000
Ha: diff ~= 0    t = -9.8880    P > |t| = 0.0000
Ha: diff > 0     t = -9.8880    P > t   = 1.0000
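The group comparison can also be rerun from the printed summary statistics with the immediate command ttesti, which takes the number of observations, mean and standard deviation of each group. The sketch below is illustrative; the second call adds the unequal option, which may be worth comparing here because the two group standard deviations (2430.648 and 3231.241) differ noticeably.

```stata
* Group 1: 4375 obs, mean 2980.906, sd 2430.648
* Group 2: 1624 obs, mean 3748.368, sd 3231.241

* Equal-variance test, matching the default used in the example above
ttesti 4375 2980.906 2430.648 1624 3748.368 3231.241

* Same comparison without assuming equal variances
ttesti 4375 2980.906 2430.648 1624 3748.368 3231.241, unequal
```

Whether the equal-variance assumption is reasonable can itself be tested with sdtest, described in the next subsection.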
1.2.2. Testing the standard deviation

Syntax:
    sdtest varname = # [if] [in] [, level(#)]
    sdtest varname1 = varname2 [if] [in] [, level(#)]
    sdtest varname [if] [in], by(groupvar) [level(#)]

This command tests the standard deviation of a normally distributed random variable named by varname. The syntax of this command is similar to that of ttest.

Example:

. sum rlpcex1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

. sdtest rlpcex1=2700

One-sample test of variance
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 rlpcex1 |    5999    3188.667    34.76379    2692.567    3120.518    3256.817
------------------------------------------------------------------------------
Ho: sd(rlpcex1) = 2700                              chi2(5998) = 5965.022
Ha: sd(rlpcex1) < 2700      P < chi2     = 0.3838
Ha: sd(rlpcex1) ~= 2700     2*(P < chi2) = 0.7676
Ha: sd(rlpcex1) > 2700      P > chi2     = 0.6162
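The chi-squared statistic in this output is (n - 1) times the ratio of the sample variance to the hypothesized variance. A short sketch that recomputes it from the printed figures, followed by the immediate form sdtesti (the dot stands for the mean, which the test does not need):

```stata
* chi2 = (n - 1) * s^2 / sigma0^2, with n = 5999, s = 2692.567, sigma0 = 2700
scalar c2 = (5999 - 1)*(2692.567/2700)^2

display "chi2(5998) = " c2
display "P > chi2   = " chi2tail(5998, c2)
display "P < chi2   = " 1 - chi2tail(5998, c2)

* Immediate version of the same test: #obs, mean (or .), #sd, hypothesized sd
sdtesti 5999 . 2692.567 2700
```

Because the sample standard deviation is close to 2700, the statistic is close to its degrees of freedom and none of the three alternatives is supported.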
2. Correlation and regression analysis

2.1. Correlation analysis

Syntax:
    correlate [varlist] [weight] [if] [in] [, means covariance _coef wrap]

This command computes the matrix of correlation coefficients, or of covariances, for the variables in the variable list. The number of observations used is that of the variable with the fewest observations.

Options:

means        also displays summary statistics: mean, standard deviation, maximum and minimum.
covariance   displays the covariance matrix instead of the correlation coefficients.
_coef        computes the correlation matrix of the coefficients of the most recent estimation.
wrap         displays the rows of the matrix without breaking them up when many variables are listed.

Example:

. corr hhsize poor rlpcex1 sex
(obs=5999)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
        poor |   0.2425   1.0000
     rlpcex1 |  -0.2172  -0.4452   1.0000
         sex |  -0.2570  -0.1028   0.1267   1.0000

. corr hhsize poor rlpcex1 sex, means cov
(obs=5999)

    Variable |        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------
      hhsize |    4.752292   1.954292          1         19
        poor |     .296216   .4566255          0          1
     rlpcex1 |    3188.667   2692.567    357.318   45801.71
         sex |    1.270712   .4443645          1          2

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |  3.81926
        poor |  .216435  .208507
     rlpcex1 | -1142.93 -547.335  7.2e+06
         sex | -.223195 -.020849  151.543   .19746
    pwcorr [varlist] [weight] [if] [in] [, obs sig print(#) star(#)]

This command computes the correlation coefficient for each pair of variables in the variable list.

Options:

obs        displays the number of observations used to compute each correlation coefficient.
sig        displays the significance level of each correlation coefficient.
print(#)   sets a significance level; only correlation coefficients with a significance level below it are displayed.
star(#)    marks with a star the correlation coefficients whose significance level is below the level given in star.

Example:

. pwcorr hhsize poor rlpcex1 sex, obs sig star(5)

             |   hhsize     poor  rlpcex1      sex
-------------+------------------------------------
      hhsize |   1.0000
             |
             |     5999
             |
        poor |   0.2425*  1.0000
             |   0.0000
             |     5999     5999
             |
     rlpcex1 |  -0.2172* -0.4452*  1.0000
             |   0.0000   0.0000
             |     5999     5999     5999
             |
         sex |  -0.2570* -0.1028*  0.1267*  1.0000
             |   0.0000   0.0000   0.0000
             |     5999     5999     5999     5999
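    pcorr varname1 varlist [weight] [if] [in]

This command computes the partial correlation coefficients of the variable named by varname1 with each of the variables in the variable list. Note that the example as printed in the source uses pwcorr, so the matrix shown contains pairwise, not partial, correlations.

Example:

. pwcorr poor hhsize rlpcex1 sex

             |     poor   hhsize  rlpcex1      sex
-------------+------------------------------------
        poor |   1.0000
      hhsize |   0.2425   1.0000
     rlpcex1 |  -0.4452  -0.2172   1.0000
         sex |  -0.1028  -0.2570   0.1267   1.0000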
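The difference between pairwise and partial correlations is easiest to see by running both commands on the same variables. The sketch below uses the auto dataset that ships with Stata (loaded with sysuse, available in Stata 8 and later) instead of the VLSS data, so the variable names are not those of this document.

```stata
* Example dataset shipped with Stata
sysuse auto, clear

* Pairwise correlations: each pair of variables considered on its own
pwcorr price mpg weight

* Partial correlations of price with mpg and with weight,
* each holding the other variable constant
pcorr price mpg weight
```

The partial correlation of price with mpg can differ markedly from the pairwise one because mpg and weight are themselves strongly correlated.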
2.2. Regression analysis

Ordinary least squares

Syntax:
    regress depvar [varlist] [weight] [if] [in] [, options]

This command estimates the coefficients of a function of the dependent variable on the independent variables (the variable list) by ordinary least squares.

Example:

. reg rlpcex1 reg7 sex hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F(  3,  5995) =  194.88
       Model |  3.8639e+09     3  1.2880e+09           Prob > F      =  0.0000
    Residual |  3.9621e+10  5995  6609032.15           R-squared     =  0.0889
-------------+------------------------------           Adj R-squared =  0.0884
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2570.8

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   240.9633    15.5905    15.46   0.000     210.4003    271.5263
         sex |   403.2984   77.38324     5.21   0.000     251.5994    554.9974
      hhsize |  -305.6382   17.70692   -17.26   0.000    -340.3501   -270.9263
       _cons |   3160.201   155.6576    20.30   0.000     2855.056    3465.346
------------------------------------------------------------------------------

Options:

level(#)     specifies the confidence level for the confidence intervals of the coefficients.
noconstant   fits the regression without a constant term (intercept).
noheader     displays only the coefficient table.
beta         displays standardized coefficients, used to compare the size of the effects of the variables with one another.
Maximum likelihood

Syntax:
    probit depvar [varlist] [weight] [if] [in] [, options]

This command regresses the dependent variable on the variables in the variable list by the method of maximum likelihood. The dependent variable is usually a dummy variable with the two values 0 and 1.

Example:

. probit poor reg7 sex hhsize

Iteration 0:   log likelihood = -3645.1363
Iteration 1:   log likelihood = -3367.2185
Iteration 2:   log likelihood = -3364.8032
Iteration 3:   log likelihood = -3364.8025

Probit estimates                                  Number of obs   =       5999
                                                  LR chi2(3)      =     560.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -3364.8025                       Pseudo R2       =     0.0769

------------------------------------------------------------------------------
        poor |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        reg7 |   -.116342   .0084551   -13.76   0.000    -.1329136   -.0997703
         sex |  -.1284525   .0422247    -3.04   0.002    -.2112113   -.0456937
      hhsize |   .1808115   .0095806    18.87   0.000     .1620338    .1995892
       _cons |  -.8088731   .0824798    -9.81   0.000    -.9705306   -.6472157
------------------------------------------------------------------------------
Estimating fitted values and residuals

Syntax:
    predict newvar [if] [in] [, xb stdp resid]

This command is run after regress (or probit) and creates a new variable whose values are computed according to the option given.

Options:

xb      estimates the value of the dependent variable from the regression function:
        \[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]
stdp    estimates the standard error of the fitted value:
        \[ SE_i = \sqrt{\mathrm{Var}(\hat{\beta}_0) + X_i^2\,\mathrm{Var}(\hat{\beta}_1) + 2 X_i\,\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1)} \]
resid   estimates the residual:
        \[ e_i = Y_i - \hat{Y}_i \]

Example:

    predict exphat, xb

creates a new variable exphat containing the fitted values of the dependent variable, computed from the coefficients of the regression.

    predict expres, resid

creates the variable expres containing the residuals.

Testing the coefficients of the regression function

Syntax:
    test [exp = exp]
    test [varlist]
    testparm varlist [, equal]

The test command tests hypotheses about the coefficients of the regression that has just been estimated.

Examples:

    test urban98 = 2000
Tests the hypothesis that the coefficient of urban98 equals 2000.

    test region1 = region2
Tests the hypothesis that the coefficient of region1 equals the coefficient of region2.

    test region1 = (region2+region3)/2
Tests a hypothesis about the relationship among the coefficients of region1, region2 and region3.

    test region1 region2 region3
Tests the hypothesis that the coefficients of region1, region2 and region3 are all equal to 0.

    testparm region*
Tests the hypothesis that the coefficients of region1 to region7 are all equal to 0.
. tab reg7, gen(region) Code by 7 | regions | Freq. Percent Cum.
------------+----------------------------------region1 | 859 14.32
14.32 region2 | 1175 19.59 33.91 region3 | 708 11.80 45.71 region4
| 754 12.57 58.28 region5 | 368 6.13 64.41 region6 | 1023 17.05
81.46 region7 | 1112 18.54 100.00
------------+----------------------------------Total | 5999 100.00
. reg rlpcex1 urban98 region* sex educyr98 hhsize

      Source |       SS       df       MS              Number of obs =    5999
-------------+------------------------------           F( 10,  5988) =  382.87
       Model |  1.6960e+10    10  1.6960e+09           Prob > F      =  0.0000
    Residual |  2.6525e+10  5988  4429712.49           R-squared     =  0.3900
-------------+------------------------------           Adj R-squared =  0.3890
       Total |  4.3485e+10  5998  7249918.40           Root MSE      =  2104.7

------------------------------------------------------------------------------
     rlpcex1 |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     urban98 |   1995.163   66.46943    30.02   0.000     1864.859    2125.467
     region1 |  -923.7066   132.8334    -6.95   0.000    -1184.108   -663.3052
     region2 |  -362.6047   130.2254    -2.78   0.005    -617.8934    -107.316
     region3 |  -558.0354   137.1551    -4.07   0.000    -826.9089   -289.1619
     region4 |  -100.7586   135.8372    -0.74   0.458    -367.0486    165.5313
     region5 |  (dropped)
     region6 |   1742.688   131.9928    13.20   0.000     1483.934    2001.441
     region7 |   151.9854   128.0272     1.19   0.235    -98.99396    402.9648
         sex |   270.9142   66.61031     4.07   0.000     140.3339    401.4944
    educyr98 |   153.3281   6.836934    22.43   0.000     139.9253     166.731
      hhsize |   -257.691   14.73741   -17.49   0.000    -286.5816   -228.8004
       _cons |   2362.355   178.3197    13.25   0.000     2012.784    2711.926
------------------------------------------------------------------------------

. test urban98 = 2000

 ( 1)  urban98 = 2000.0

       F(  1,  5988) =    0.01
            Prob > F =    0.9420

. test region1 = region2

 ( 1)  region1 - region2 = 0.0

       F(  1,  5988) =   34.57
            Prob > F =    0.0000

. test region1 = (region2+region3)/2

 ( 1)  region1 - .5 region2 - .5 region3 = 0.0

       F(  1,  5988) =   27.80
            Prob > F =    0.0000

. test region1 region2 region3

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0

       F(  3,  5988) =   20.22
            Prob > F =    0.0000

. testparm region*

 ( 1)  region1 = 0.0
 ( 2)  region2 = 0.0
 ( 3)  region3 = 0.0
 ( 4)  region4 = 0.0
 ( 5)  region5 = 0.0
 ( 6)  region6 = 0.0
 ( 7)  region7 = 0.0
       Constraint 5 dropped

       F(  6,  5988) =  148.55
            Prob > F =    0.0000
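
As a cross-check on the first test above, the F statistic for a single linear restriction can be reproduced by hand from the coefficient table: it is the square of the t ratio (b - 2000)/se. The lines below are only an illustrative sketch. They copy the coefficient and standard error of urban98 from the regression output, and they assume that the Ftail() function is available in the Stata release being used.

```stata
* Reproduce "test urban98 = 2000" by hand (values copied from the table above)
scalar b  = 1995.163
scalar se = 66.46943

* With one restriction, F(1, 5988) is the squared t ratio
scalar F = ((scalar(b) - 2000)/scalar(se))^2

* Upper-tail probability of F with 1 and 5988 degrees of freedom
* (assumption: Ftail() exists in the installed release)
scalar p = Ftail(1, 5988, scalar(F))

display "F(1, 5988) = " %5.2f scalar(F) "   Prob > F = " %6.4f scalar(p)
```

The displayed values can be compared with the F and Prob > F lines that test prints for the same hypothesis.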
Chapter IV: Drawing graphs

1. Drawing graphs (graph)

Syntax:
    graph [varlist] [weight] [if exp] [in range] [, graph_type specific_options common_options]

where:
    graph_type          specifies the type of graph to draw
    specific_options    options that apply to one particular type of graph
    common_options      options shared by all graph types, such as titles and axis labels
Stata can draw 8 types of graph (graph_type):

(1) Two-way scatterplots
. graph rlpcex1 age
[Figure: scatterplot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7) against Age of household head (16 to 95)]

(2) Two-way scatterplot matrices
. gr rlpcex1 age educyr98 hhsize, matrix
[Figure: scatterplot matrix of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7), Age of household head (16 to 95), schooling year of HH.head (0 to 22), and Household size (1 to 19)]
(3) Histograms
. gr rlpcex1, bin(50) normal
[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Fraction, 0 to .329888]

(4) One-way scatterplots
. gr rlpcex1, oneway
[Figure: one-way scatterplot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7)]

(5) Box-and-whisker plots
[Figure: box-and-whisker plot of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7)]
(6) Bar charts
. sort reg7
. gr poor, bar means by(reg7)
[Figure: bar chart of the mean of poor for regions 1 to 7; vertical axis 0 to .498254]

(7) Pie charts
. for num 1/7: gen poorX=poor if reg7==X
-> gen poor1=poor if reg7==1
(5140 missing values generated)
-> gen poor2=poor if reg7==2
(4824 missing values generated)
-> gen poor3=poor if reg7==3
(5291 missing values generated)
-> gen poor4=poor if reg7==4
(5245 missing values generated)
-> gen poor5=poor if reg7==5
(5631 missing values generated)
-> gen poor6=poor if reg7==6
(4976 missing values generated)
-> gen poor7=poor if reg7==7
(4887 missing values generated)
. graph poor1-poor7, pie
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4, 10% poor5, 4% poor6, 18% poor7]
(8) Star charts
graph_type is star.
[Figure: star charts, one per car (Audi 5000, Audi Fox, BMW 320i, Datsun 200, Datsun 210, Datsun 510, Datsun 810, Fiat Strada, Honda Accord, Honda Civic, Mazda GLC, Renault, Subaru, Toyota Celica, Toyota Corolla, Toyota Corona, VW Dasher, VW Diesel, VW Rabbit, VW Scirocco, Volvo 260); the rays show Price, Mileage (mpg), Repair Record 1978, Headroom (in.), Trunk space (cu. ft.), Weight (lbs.), Length (in.), Turn Circle (ft.), and Displacement (cu. in.)]
Common options (common_options)

* Creating the data set

. tabulate hhsize, sum(rlpcex1)

            | Summary of comp.M&Reg price adj.pc
  Household |               tot exp
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   4696.0254   4619.5012         214
          2 |   4131.4892   3677.2297         497
          3 |   3834.8615   2913.8177         731
          4 |   3428.8011   2599.7301        1404
          5 |   2930.5486   2168.0644        1318
          6 |   2626.6848   2277.1893         867
          7 |   2501.0912   2186.1605         480
          8 |   2329.7009   1803.7873         255
          9 |   2207.0166   1380.5607         126
         10 |   2252.3772   1423.7576          58
         11 |   2370.7034   1404.7148          29
         12 |   1747.3691   924.72977           9
         13 |   2114.1337   2109.0077           4
         14 |     1579.78   990.81152           4
         16 |   2994.5771   2061.6804           2
         19 |    4833.936           0           1
------------+------------------------------------
      Total |   3188.6671   2692.5673        5999

. tab hhsize, sum(educyr98)

            | Summary of schooling year of
  Household |             HH.head
       size |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   3.7897196   4.3956537         214
          2 |   5.7545272   4.7225549         497
          3 |   7.3023256   4.6396425         731
          4 |   8.2578348   4.2659841        1404
          5 |   7.7243298   4.2998488        1318
          6 |   6.8788927   4.0778062         867
          7 |   6.3348958   4.1241759         480
          8 |   5.7333333   3.9623557         255
          9 |   5.7936508   3.4878474         126
         10 |   6.1724138   3.1851516          58
         11 |   4.7931034   3.1665586          29
         12 |   4.4444444   3.6438685           9
         13 |           5   5.0990195           4
         14 |           3   2.1602469           4
         16 |           4   1.4142136           2
         19 |           2           0           1
------------+------------------------------------
      Total |   7.0944185   4.4160917        5999

. rename var71 ahhsize
. rename var72 meanexp
. rename var73 meanedu
. replace meanexp= meanexp/1000
(16 real changes made)
. label var meanexp "Chi tieu binh quan"
. label var meanedu "So nam hoc"
. label var ahhsize "Quy mo ho"
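
The text does not show how the variables var71 to var73 were filled in. One possible route, offered here only as an assumption and not necessarily the one the author used, is to let collapse compute the group means directly. The sketch below uses a small made-up data set in place of the survey file; only the variable names are taken from the text.

```stata
* Toy data standing in for the survey file (the values are made up)
clear
input hhsize rlpcex1 educyr98
1 4000 3
1 5000 5
2 3000 6
2 5000 8
3 2000 7
end

* One observation per household size, holding the two group means
collapse (mean) meanexp=rlpcex1 meanedu=educyr98, by(hhsize)

* Same names, scaling, and labels as in the text
rename hhsize ahhsize
replace meanexp = meanexp/1000
label var meanexp "Chi tieu binh quan"
label var meanedu "So nam hoc"
label var ahhsize "Quy mo ho"
list
```

After collapse the data set contains only the group means, so the survey file has to be reloaded before any further household-level analysis.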
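* Options for titles and axes

Take a two-way graph as an example: the vertical axis shows mean expenditure and the mean years of schooling of the household head, and the horizontal axis shows household size.

. gr meanexp meanedu ahhsize
[Figure: meanexp and meanedu (vertical axis, 1.57978 to 8.25783) against ahhsize (1 to 19)]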
* Title options:

title("string") t1title("string") t2title("string") b1title("string") b2title("string") l1title("string") l2title("string") r1title("string") r2title("string")

These options place titles at the top (t), bottom (b), left (l), and right (r) of the graph.

Example:
gr meanexp meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho") l1title("Chi tieu binh quan (tr dong)") l2title("So nam hoc cua chu ho") b2title("Quy mo ho gia dinh")
[Figure: "Do thi chi tieu va hoc van chu ho"; Chi tieu binh quan and So nam hoc (left titles "Chi tieu binh quan (tr dong)" and "So nam hoc cua chu ho", 1.57978 to 8.25783) against "Quy mo ho gia dinh" (1 to 19)]
* Displaying values on the axes

xlabel[(numlist)] ylabel[(numlist)] rlabel[(numlist)] tlabel[(numlist)]

Example:
gr meanexp meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho") l1title("Chi tieu binh quan (tr dong)") l2title("So nam hoc cua chu ho") b2title("Quy mo ho gia dinh") xlabel ylabel
[Figure: the same graph with labeled axes; vertical axis 2 to 8, horizontal axis "Quy mo ho gia dinh" 0 to 20]

Note: the remaining options are described in the online help, which is opened with the command: help graxes

Options for reference lines and connecting points

xline[(numlist)] yline[(numlist)] rline[(numlist)] tline[(numlist)] connect(c[[p]] ... c[[p]])

Example:
. gr meanexp meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho") l1title("Chi tieu binh quan (tr dong)") l2title("So nam hoc cua chu ho") b2title("Quy mo ho gia dinh") xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)
[Figure: "Do thi chi tieu va hoc van chu ho"; both series drawn as connected lines with reference lines; vertical axis 2 to 8, horizontal axis "Quy mo ho gia dinh" 0 to 20]

2. Some commonly used graphs

2.1. Two-way graphs

Syntax:
    graph [varlist] [weight] [if exp] [in range], twoway [common_options rescale]

The rescale option allows the two vertical axes to be shown on different scales.

. gen meanexp1=meanexp*1000
. label var meanexp1 "Chi tieu binh quan"
. gr meanexp1 meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho") l1title("Chi tieu binh quan (nghin dong)") b2title("Quy mo ho gia dinh") xlabel ylabel rlabel(2 4 to 8) connect(ll) rescale
[Figure: "Do thi chi tieu va hoc van chu ho"; left axis "Chi tieu binh quan (nghin dong)" 1000 to 5000, right axis So nam hoc 2 to 8, horizontal axis "Quy mo ho gia dinh" 0 to 20]
2.2. Histograms

Syntax:
    graph [varname] [weight] [if exp] [in range], histogram [common_options bin(#) freq normal[(#,#)] density(#)]

Options:
    bin(#)          specifies the number of bins; the default is bin(5)
    freq            shows frequencies on the vertical axis
    normal[(#,#)]   overlays a normal density curve
    density(#)      used with normal; specifies the number of points at which the normal density is evaluated

Example: histogram of per capita expenditure
. gr rlpcex1, hist bin(20) normal
[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Fraction, 0 to .56026]

. gr rlpcex1, hist bin(50) normal freq
[Figure: histogram of comp.M&Reg price adj.pc tot exp (357.318 to 45801.7); vertical axis Frequency, 0 to 1979]

. gr rlpcex1, hist bin(50) normal freq by(reg7)
[Figure: one histogram panel for each of region1 to region7; vertical axis Frequency, 0 to 415; horizontal axis 357.318 to 45801.7]
[Figure caption: Histograms by Code by 7 regions]

2.3. Bar charts

Syntax:
    graph [varlist] [weight] [if exp] [in range], bar [common_options [no]alt means stack]

Example: graph the mean schooling of the household head and the mean household size in each of the 7 regions.
. gr educyr98 hhsize, bar means by(reg7)
[Figure: bars of schooling year of HH.head and Household size for regions 1 to 7; vertical axis 0 to 8.64426]
. label define region 1 "region1" 2 "region2" 3 "region3" 4 "region4" 5 "region5" 6 "region6" 7 "region7"
. label values reg7 region
. tab reg7

  Code by 7 |
    regions |      Freq.     Percent        Cum.
------------+-----------------------------------
    region1 |        859       14.32       14.32
    region2 |       1175       19.59       33.91
    region3 |        708       11.80       45.71
    region4 |        754       12.57       58.28
    region5 |        368        6.13       64.41
    region6 |       1023       17.05       81.46
    region7 |       1112       18.54      100.00
------------+-----------------------------------
      Total |       5999      100.00

. gr educyr98 hhsize, bar means by(reg7) ylabel(2 4 to 10) alt
[Figure: bars of schooling year of HH.head and Household size for region1 to region7; vertical axis 2 to 10]
The stack option
. gen persons=1
. gr persons urban98, bar ylabel by(reg7) stack alt
[Figure: stacked bars of persons and 1:urban 98; 0:rural 98 for region1 to region7; vertical axis 0 to 1500]

Exercise: draw the following graph:
[Figure: bars of foodpoor and poor for region1 to region7; vertical axis 0 to 600]
2.4. Pie charts

Syntax:
    graph [varlist] [weight] [if exp] [in range], pie [common_options]

This command draws a pie chart. Each variable takes one slice of the pie, and the size of the slice is determined by the sum of that variable's values over the observations.

Example: graph each region's share of the total number of poor people in the country.
. gr poor1-poor7, pie
[Figure: pie chart with slices 24% poor1, 16% poor2, 16% poor3, 12% poor4, 10% poor5, 4% poor6, 18% poor7]

. gen nonfpood=poor- foodpoor
. label var nonfpood "poor but still above food poverty line"
. gen nonpoor=( rlpcex1>=1790)
. gr foodpoor nonfpood nonpoor, pie
. set textsize 90
[Figure: pie chart with slices 12% foodpoor, 18% poor but still above food povert, 70% nonpoor]

. set textsize 100
. gr foodpoor nonfpood nonpoor, pie by(reg7) total
[Figure: one pie chart for each of region1 to region7 plus Total, with slices foodpoor, poor but still above food povert, and nonpoor; the Total pie shows 12%, 18%, and 70%]
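
Because each slice is the sum of one variable divided by the sum over all plotted variables, the percentages printed on a pie chart can be checked numerically. The sketch below does this on a small made-up data set with three regions instead of seven. It also builds the poorX variables with forvalues rather than the for command used earlier; this assumes a Stata release that provides forvalues.

```stata
* Toy data: region code and a 0/1 poverty indicator (values are made up)
clear
input reg7 poor
1 1
1 0
2 1
2 1
3 0
3 1
end

* One variable per region, missing outside that region
forvalues i = 1/3 {
    generate poor`i' = poor if reg7 == `i'
}

* Total number of poor households over all regions
quietly summarize poor
scalar total = r(sum)

* Share of the poor living in each region, as plotted by the pie option
forvalues i = 1/3 {
    quietly summarize poor`i'
    display "region `i': " %4.1f 100*r(sum)/scalar(total) "% of the poor"
}
```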
3. Saving and displaying graphs (saving and graph using)

To save a graph, go to the Graph window, open the File menu, and choose Save Graph; then choose the path and file name for the graph. The default extension is gph. A graph can also be saved with the option saving(filename [, replace]) written after the graph command.

Example:
. gr educyr98 hhsize, bar means by(reg7) ylabel(2 4 to 10) alt saving("c:\do thi 1")
. gr persons urban98, bar ylabel by(reg7) stack alt saving("c:\do thi 2")

To save a graph without displaying it, graph display can be switched off with the command set graphics { on | off }.

. set graphics off
. gr poor1-poor7, pie saving("c:\do thi 3", replace)
(note: file c:\do thi 3.gph not found)

Stata displays saved graphs with the command:
    graph using [filename filename2 ...] [, margin(#)]

margin(#) specifies the margin left around each graph, as a percentage of the graph's area. The default is 0.

Example:
. set graphics on
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh")
[Figure: "Mot so dac diem cua ho gia dinh"; the three saved graphs shown together: the bar chart by region1 to region7, the stacked bar chart of persons by 1:urban 98; 0:rural 98, and the pie chart of poor1 to poor7]

Note: saving can be combined with using to store the combined result as a new graph.

Example:
. graph using "c:\do thi 1" "c:\do thi 2" "c:\do thi 3", margin(10) title("Mot so dac diem cua ho gia dinh") saving("c:\do thi tong hop")
. graph using "c:\do thi tong hop"
Chapter V: Programming in Stata

1. Introduction to do-files

1.1. Opening and saving do-files

Stata lets you write files, called do-files, that contain Stata commands. Instead of running commands one at a time from the Command window, a do-file runs its commands in sequence. Stata programs are written in the Do-file Editor window. This window is opened by clicking the Window menu and choosing Do-file Editor. Another way to open it is to type doedit in the Command window.

Example: a program can be written in the Do-file Editor as follows:
----------------
clear
set mem 32m
use "C:\VLSS98\Hhexp98n.dta", clear
tab urban98
sum hhsize
gen new=hhsizet
gen new=hhsize
----------------

After editing, the do-file is saved with Save As in the File menu of the Do-file Editor window. The name of the do-file can also be given directly in the doedit command:
    doedit (do-file name)

Do-files have the extension do. In the example above, we can save the program under the name chuong trinh 1 in the folder Vlss98 on drive C.

1.2. Running do-files

To run a do-file, type one of the following two commands in the Command window:
    do filename [, nostop]
    run filename [, nostop]
The run command executes the commands in the do-file but does not display the output on the screen. If a command fails while a do-file is running, Stata reports the error and stops without executing the commands that follow. If the nostop option is specified, however, Stata skips the faulty command and continues with the commands after it.

Example:
. do "c:\vlss98\chuong trinh 1"
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

 1:urban 98; |
 0:rural 98  |      Freq.     Percent        Cum.
-------------+-----------------------------------
       Rural |       4269       71.16       71.16
       Urban |       1730       28.84      100.00
-------------+-----------------------------------
       Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
end of do-file
r(111);

With the nostop option:
. do "c:\vlss98\chuong trinh 1", nostop
. clear
. set mem 32m
(32768k)
. use "C:\VLSS98\Hhexp98n.dta", clear
. tab urban98

 1:urban 98; |
 0:rural 98  |      Freq.     Percent        Cum.
-------------+-----------------------------------
       Rural |       4269       71.16       71.16
       Urban |       1730       28.84      100.00
-------------+-----------------------------------
       Total |       5999      100.00

. sum hhsize

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
      hhsize |    5999    4.752292   1.954292          1         19

. gen new=hhsizet
hhsizet not found
r(111);
. gen new=hhsize
. end of do-file

Running the same file with the run command:
. run "c:\vlss98\chuong trinh 1", nostop
hhsizet not found

Do-files can also be run with the Do option in the File menu, or directly from the Do-file Editor window with Do or Run in the Tools menu.

1.3. Some notes on writing do-files

version #
When writing a do-file, put this command at the top of the program to declare the version of Stata used to write it. For example, if the do-file is written with Stata 7.0, the command is placed at the top as follows:
version 7.0
clear
use Hhexp98n.dta
tab reg7
.
Different versions of Stata may differ in the syntax or meaning of commands. The version command lets the Stata that runs the do-file interpret correctly a file written for another version.

set memory #[k|m]
If the data file needs more memory than Stata is currently using, allocate more memory to Stata with the command above. Note that you should not allocate more memory than the computer's RAM.
Example:
. use "C:\Hhexp98n.dta", clear
no room to add more observations
r(901);
. set mem 32m
(32768k)
. use "C:\Hhexp98n.dta", clear

set more off/on
By default, when the output of a command is longer than the Results window (Stata Results), the screen pauses and a key (for example Enter or the Space bar) must be pressed for the output to continue. The command set more off lets the output scroll without pausing until the command or do-file has finished. The command set more on restores the default.

The characters * and /* */
Stata does not execute lines that begin with * or text placed between /* and */. These characters are used to write comments in do-files.

Example:
-------------------
version 7.0
set mem 32m
use "C:\Hhexp98n.dta", clear
* Create the household income variable
/* This variable equals per capita income multiplied by household size */
gen hhexp = rlpcex1 * hhsize
#delimit ;
When a command in the Do-file Editor is too long, this command declares that each command ends with the character (;). By default a command ends at the line break made by pressing Enter. To restore the default, use #delimit cr.

Example: the graph command from the previous chapter:
graph meanexp meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho") l1title("Chi tieu binh quan (tr dong)") l2title("So nam hoc cua chu ho") b2title("Quy mo ho gia dinh") xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)

is equivalent to:
#delimit ;
graph meanexp meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho")
    l1title("Chi tieu binh quan (tr dong)") l2title("So nam hoc cua chu ho")
    b2title("Quy mo ho gia dinh") xlabel ylabel xline(5 10 to 20)
    yline(2 4 to 8) connect(ll) ;
gen hhexp = rlpcex1 * hhsize ;
..

Afterwards, if the commands that follow each fit on one line, restore the default with:
#delimit cr

Note: a long command can also be written with the characters /* */, as follows:
graph meanexp meanedu ahhsize, title("Do thi chi tieu va hoc van chu ho") /*
    */ l1title("Chi tieu binh quan (tr dong)") l2title("So nam hoc cua chu ho") /*
    */ b2title("Quy mo ho gia dinh") xlabel ylabel xline(5 10 to 20) yline(2 4 to 8) connect(ll)

The #delimit command and the /* */ way of writing long commands work only in do-files, not in the Command window.
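
The notes in this section can be combined into one short do-file. The file below is only an illustrative sketch: the data are typed in with input (taken from the three-household example at the start of this manual) so that it runs without the survey file, and the layout is one possible arrangement rather than a required one.

```stata
* Illustrative do-file combining the notes above
version 7.0
set more off

* Three households typed in directly instead of loading the survey file
clear
input hhsize rlpcex1
6 2100
5 3210
10 1200
end

* Total household expenditure = per capita expenditure times household size
generate hhexp = rlpcex1 * hhsize

/* A long command continued on the next line by commenting out the line break */
summarize hhsize rlpcex1 /*
    */ hhexp

* The same idea with a semicolon as the command delimiter
#delimit ;
list hhsize
     rlpcex1
     hhexp ;
#delimit cr

set more on
```

Restoring #delimit cr and set more on at the end leaves the session in its default state for whatever is run next.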
2. Local and global macros

Macros are the variables used in Stata programs. A macro is a string of characters, called the macro name, that stands for another string of characters, called the macro contents. There are two kinds of macros: local macros and global macros.

2.1. Local macros

If we type:
. local hogd "age hhsize rlpcex1"
(the double quotes may be omitted, that is, we can type: local hogd age hhsize rlpcex1)

then `hogd' is understood as: age hhsize rlpcex1. Here hogd is the macro name and age hhsize rlpcex1 is the macro contents. To use the contents of a local macro, type the macro name between a left single quote (`), found at the top left of the keyboard, and a right single quote ('), found at the lower right of the keyboard. So typing:
. summarize `hogd'
is the same as typing:
. summarize age hhsize rlpcex1

If we type:
. local tb summarize
then the command summarize age hhsize rlpcex1 can be run by typing:
. `tb' `hogd'

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         age |    5999    48.01284    13.7702         16         95
      hhsize |    5999    4.752292   1.954292          1         19
     rlpcex1 |    5999    3188.667   2692.567    357.318   45801.71

To display the contents of a local macro, type macro list _(local macro name).
Example:
. macro list _hogd
_hogd:          age hhsize rlpcex1

To delete a local macro, use macro drop _(local macro name).
Example:
. macro drop _hogd
. macro list _hogd
local macro `hogd' not found
r(111);

2.2. Global macros

If we type:
. global diaban "reg7 province commune"
(or, omitting the double quotes: global diaban reg7 province commune)

then $diaban is understood as: reg7 province commune. Here diaban is the macro name and reg7 province commune is the macro contents. To use the contents of a global macro, type the $ sign immediately before the macro name. So typing:
. describe $diaban
is the same as typing:
. describe reg7 province commune

. describe $diaban

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands

. global mota "describe"
. $mota $diaban

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
reg7            int    %8.0g                  Code by 7 regions
province        float  %9.0g                  Province code
commune         float  %9.0g                  commune code PSU-SVY commands

To display the contents of a global macro, type macro list (global macro name).
Example:
. global diaban "reg7 province commune"
. macro list diaban
diaban:         reg7 province commune

To delete a global macro, use macro drop (global macro name).
Example:
. macro drop diaban
. macro list diaban
global macro $diaban not found
r(111);
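
The macro commands described above can be tried on a small data set typed in by hand. This is only a sketch: the three observations are made up, and the extended macro function word count is included just to show that the contents of a macro are ordinary text that can be inspected.

```stata
* Three made-up households so the example runs without the survey file
clear
input age hhsize rlpcex1
40 6 2100
35 5 3210
60 10 1200
end

* A local macro holding a variable list, and one holding a command name
local hogd "age hhsize rlpcex1"
local tb "summarize"
`tb' `hogd'

* A global macro is expanded with the $ prefix
global diaban "hhsize"
describe $diaban

* Count the words stored in the local macro
local n : word count `hogd'
display "`hogd' contains `n' variable names"

* Remove the global macro again
macro drop diaban
```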
2.3. Differences between local and global macros

A local macro exists only within one program. A program cannot see the local macros used in other programs. A global macro, once declared, is visible to every program and stays in Stata's memory for the whole session.

Example: run a piece of program that declares the local macro a, and then ask Stata to display its contents. The macro does not exist in any other program or in Stata's memory.
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. local a "chuong trinh thong ke Stata"
. end of do-file
. macro list _a
local macro `a' not found
r(111);

For a global macro, by contrast:
. do "C:\WINDOWS\TEMP\STD010000.tmp"
. global b "chuong trinh thong ke Stata"
. end of do-file
. macro list b
b:              chuong trinh thong ke Stata

3. Scalars and matrices (scalar and matrix)

3.1. Matrices (matrix)

Stata defines a matrix A[r, c] as a rectangular array of r rows and c columns.
Example: once matrix A has been created, its contents can be viewed as follows:
. matrix list A

A[3,3]
    c1  c2  c3
r1   1   2   4
r2   3   4   7
r3  10  11  14

Here matrix A has 9 elements: 1, 2, 4, 3, 4, 7, 10, 11, 14. The columns are named c1, c2, and c3, and the rows r1, r2, and r3. The element at the intersection of row 1 and column 2 is written A[1, 2]. In this example A[1, 2] holds the value 2.

3.2. Scalars (scalar)

A scalar holds a single element, which is a number. A scalar is defined with the command:
    scalar scalar_name = expression

Example:
. scalar a = 10
. scalar list a
         a =         10
. scalar b = a* 2
. scalar list b
         b =         20

To some extent a scalar can be regarded as the special case of a matrix with only one element (one row and one column).

3.3. Some commands for working with matrices

Setting the matrix size
By default a matrix can have at most 40 rows and 40 columns. This maximum can be changed with the command:
. set matsize 500
This allows matrices with up to 500 rows and 500 columns.

Creating matrices
Matrices can be created by typing their elements directly.
Example:
matrix mymat = (1,2\3,4)        Elements are separated by commas and rows by backslashes
matrix myvec = (1,5,3,1,3)      Creates a row vector
matrix mycol = (1\5\3\1\3)      Creates a column vector
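
The commands above can be combined in a few lines. This is only a sketch for experimenting with the notation; the names P, a, and b are chosen for illustration.

```stata
* Build a matrix, a row vector, and a column vector
matrix mymat = (1,2\3,4)
matrix myvec = (1,5,3,1,3)
matrix mycol = (1\5\3\1\3)
matrix list mymat

* Dimensions of the two vectors
display "rows of mycol = " rowsof(mycol) ", columns of myvec = " colsof(myvec)

* Copy one matrix element into a scalar and compute with it
scalar a = mymat[2,1]
scalar b = scalar(a)*2
scalar list a b

* Row vector times column vector gives a 1 x 1 matrix
matrix P = myvec*mycol
matrix list P
```

Here mymat[2,1] is the element in row 2 and column 1, so a holds 3 and b holds 6, and the only element of P is the inner product 1 + 25 + 9 + 1 + 9 = 45.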
Matrices can also be created from the data with the command:
    mkmat varlist [if exp] [in range] [, matrix(matrix_name)]

Example:
. input maho quymo thunhap
          maho      quymo    thunhap
  1. 101 6 1200
  2. 103 5 1400
  3. 105 5 3200
  4. 107 9 1000
  5. 109 4 2500
  6. end
. mkmat maho quymo thunhap, matrix(A)
. matrix list A

A[5,3]
       maho    quymo  thunhap
r1      101        6     1200
r2      103        5     1400
r3      105        5     3200
r4      107        9     1000
r5      109        4     2500

Matrix calculations
matrix D = B             Creates matrix D equal to matrix B
matrix C = (C+C')/2      Recomputes matrix C from its own values
matrix D = A*A'          Creates matrix D as the product of matrix A and the transpose of A

Deleting matrices
Matrices and scalars can be removed from memory with the commands:
    matrix drop
    scalar drop

Example:
. matrix drop A
. scalar drop B

4. Conditional commands and loops

4.1. The if ... else command

Syntax:
    if (logical expression) {
        command group 1
    }
    else command

Stata evaluates the logical expression (expression). If the expression is true, the commands in command group 1 are executed; if it is false, the command following else is executed; in the case