SJTU CMGPD 2012 Methodological Lecture Day 4 Household and Relationship Variables

Dec 31, 2015


Earl Floyd
Page 1: SJTU CMGPD 2012 Methodological Lecture Day 4 Household and Relationship Variables.

SJTU CMGPD 2012 Methodological Lecture

Day 4

Household and Relationship Variables

Outline

• Existing household variables
  – Identifiers
  – Characteristics
  – Dynamics
  – Household relationship
• Creation of new variables
  – Use of bysort/egen
• Household relationship variables

Identifiers

• HOUSEHOLD_ID
  – Identifies records associated with a household in the current register
• HOUSEHOLD_SEQ
  – The order of the current household (linghu) within the current household group (yihu)
• UNIQUE_HH_ID
  – Identifies records associated with the same household across different registers
  – New value assigned at time of household division
    • Each of the resulting households gets a new, different

Characteristics

• HH_SIZE
  – Number of living members of the household
  – Set to missing before 1789
• HH_DIVIDE_NEXT
  – Number of households in the next register that the members of the current household are associated with
  – 1 if no division
  – 0 if extinction
  – 2 or more if division
  – Set to missing before 1789


histogram HH_SIZE if PRESENT & HH_SIZE > 0, width(2) scheme(s1mono) fraction ytitle("Proportion of individuals") xtitle("Number of members")

[Figure: histogram of HH_SIZE. y axis: Proportion of individuals (0 to .15); x axis: Number of members (0 to 150)]

• This isn’t particularly appealing
• A log scale on the x axis would help
• In STATA, histogram forces fixed width bins, even when the x scale is set to log
• We can collapse the data and plot using twoway bar or scatter

table HH_SIZE, replace

twoway bar table1 HH_SIZE if HH_SIZE > 0, xscale(log) scheme(s1mono) xlabel(0 1 2 5 10 20 50 100 150)

[Figure: bar chart. y axis: Freq. (0 to 100,000); x axis: Household size, log scale (0, 1, 2, 5, 10, 20, 50, 100, 150)]

• What if we would like to convert to fractions?
• Compute the total by summing table1, then divide each value of table1 by the total
• sum(table1) returns the sum of table1 up to the current observation
• total[_N] returns the value of total in the last observation

drop if HH_SIZE <= 0

generate total = sum(table1)

generate hh_fraction = table1/total[_N]

twoway bar hh_fraction HH_SIZE if HH_SIZE > 0, xscale(log) scheme(s1mono) xlabel(0 1 2 5 10 20 50 100 150) ytitle("Proportion of households")
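The running-sum idiom can be checked outside Stata. The following is a minimal Python sketch, not part of the lecture's code, and the frequencies are made up to stand in for table1. It shows that the last element of a running sum is the grand total, which is the value total[_N] picks out.

```python
from itertools import accumulate

# Hypothetical frequencies standing in for table1 (one entry per HH_SIZE value).
table1 = [120, 300, 450, 130]

# Stata's sum() is a running sum: element i holds the total up to observation i.
total = list(accumulate(table1))

# total[_N] in Stata is the value in the last observation, i.e. the grand total.
grand_total = total[-1]

# Divide every frequency by the grand total to get fractions.
hh_fraction = [count / grand_total for count in table1]

print(total)
print(hh_fraction)
```

Because only the last running total is used as the denominator, the fractions sum to 1 regardless of how the rows are ordered.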

[Figure: bar chart. y axis: Proportion of households (0 to .08); x axis: Household size, log scale (0, 1, 2, 5, 10, 20, 50, 100, 150)]

Households as units of analysis

• The previous figures all treated individuals as the units of analysis
• Every household was represented as many times as it had members
  – A household with 100 members would contribute 100 observations
• In effect, the figures represent household size as experienced by individuals
• Sometimes we would like to treat households as units of analysis
  – So that each household only contributes one observation per register
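The difference between the two units of analysis is easy to see with a toy example. The Python sketch below is illustrative only and uses three made-up households. It computes mean household size once with every individual counted and once with every household counted once.

```python
from collections import Counter

# Hypothetical records: one entry per individual, giving that person's household.
household_of = ["A", "A", "A", "A", "B", "B", "C"]

# Household size = number of individual records sharing a household ID.
size = Counter(household_of)  # A has 4 members, B has 2, C has 1

# Individuals as units: a household of size n is counted n times.
mean_by_individual = sum(size[h] for h in household_of) / len(household_of)

# Households as units: each household is counted once.
mean_by_household = sum(size.values()) / len(size)

print(mean_by_individual)
print(mean_by_household)
```

The individual-level mean is larger because large households contribute more records, which is why the individual-level histogram has more weight on large households.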

Households as units of analysis

• One easy way is to create a flag variable that is set to 1 only for the first observation in each household
• Then select based on that flag variable for tabulations etc.
• This leaves the original individual level data intact

bysort HOUSEHOLD_ID: generate hh_first_record = _n == 1

histogram HH_SIZE if hh_first_record & HH_SIZE > 0, width(2) scheme(s1mono) fraction ytitle("Proportion of households") xtitle("Number of members")

[Figure: histogram of HH_SIZE, one observation per household. y axis: Proportion of households (0 to .3); x axis: Number of members (0 to 150)]

[Figure: the two histograms shown together. First panel: Proportion of individuals (0 to .15) by Number of members. Second panel: Proportion of households (0 to .3) by Number of members]

Another approach to plotting trends

• We can plot average household size by year without ‘destroying’ the data with TABLE, REPLACE or COLLAPSE

bysort YEAR: egen mean_hh_size = mean(HH_SIZE) if HH_SIZE > 0

* Sort within YEAR by mean_hh_size so the flagged record has a non-missing mean (missing values sort last)
bysort YEAR (mean_hh_size): generate first_in_year = _n == 1

twoway scatter mean_hh_size YEAR if first_in_year & YEAR >= 1775, scheme(s1mono) ytitle("Mean household size of individuals") xlabel(1775(25)1900)

[Figure: scatter plot. y axis: Mean household size of individuals (5 to 25); x axis: Year (1775 to 1900)]


Mean household size of individuals by age

keep if AGE_IN_SUI > 0 & SEX == 2 & YEAR >= 1789 & HH_SIZE > 0

bysort AGE_IN_SUI: egen mean_hh_size = mean(HH_SIZE)

bysort AGE_IN_SUI: generate first_in_age = _n == 1

twoway scatter mean_hh_size AGE_IN_SUI if first_in_age & AGE_IN_SUI <= 80, scheme(s1mono) ytitle("Mean household size of individuals") xlabel(1(5)85) xtitle("Age in sui")

lowess mean_hh_size AGE_IN_SUI if first_in_age & AGE_IN_SUI <= 80, scheme(s1mono) ytitle("Mean household size of individuals") xlabel(1(5)85) xtitle("Age in sui") msize(small)

[Figure: scatter plot. y axis: Mean household size of individuals (10 to 20); x axis: Age in sui (1 to 86)]

[Figure: Lowess smoother, bandwidth = .8. y axis: Mean household size of individuals (10 to 20); x axis: Age in sui (1 to 86)]

Household division
Individuals by next register

. tab HH_DIVIDE_NEXT if PRESENT & NEXT_3 & HH_DIVIDE_NEXT >= 0

     Number of |
  household in |
      the next |
     available |
      register |      Freq.     Percent        Cum.
---------------+-----------------------------------
             1 |    789,250       94.98       94.98
             2 |     33,000        3.97       98.95
             3 |      5,815        0.70       99.65
             4 |      1,812        0.22       99.87
             5 |        383        0.05       99.91
             6 |        314        0.04       99.95
             7 |        196        0.02       99.98
             8 |         34        0.00       99.98
             9 |         82        0.01       99.99
            10 |         86        0.01      100.00
---------------+-----------------------------------
         Total |    830,972      100.00

Household division
Households by next register

. bysort HOUSEHOLD_ID: generate first_in_hh = _n == 1

. tab HH_DIVIDE_NEXT if PRESENT & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

     Number of |
  household in |
      the next |
     available |
      register |      Freq.     Percent        Cum.
---------------+-----------------------------------
             1 |    117,317       97.80       97.80
             2 |      2,287        1.91       99.71
             3 |        272        0.23       99.94
             4 |         57        0.05       99.98
             5 |          8        0.01       99.99
             6 |          7        0.01      100.00
             7 |          2        0.00      100.00
             9 |          1        0.00      100.00
            10 |          1        0.00      100.00
---------------+-----------------------------------
         Total |    119,952      100.00

Household division
Example of a simple analysis

generate byte DIVISION = HH_DIVIDE_NEXT > 1

generate l_HH_SIZE = ln(HH_SIZE)/ln(1.1)

logit DIVISION HH_SIZE YEAR if HH_SIZE > 0 & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

. logit DIVISION HH_SIZE YEAR if HH_SIZE > 0 & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

Iteration 0:   log likelihood = -15419.716
Iteration 1:   log likelihood = -14310.848
Iteration 2:   log likelihood = -14127.244
Iteration 3:   log likelihood = -14126.276
Iteration 4:   log likelihood = -14126.276

Logistic regression                               Number of obs   =     132688
                                                  LR chi2(2)      =    2586.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -14126.276                       Pseudo R2       =     0.0839

------------------------------------------------------------------------------
    DIVISION |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     HH_SIZE |   .0882472   .0016549    53.32   0.000     .0850036    .0914908
        YEAR |  -.0122989   .0005941   -20.70   0.000    -.0134633   -.0111345
       _cons |   18.23519   1.087218    16.77   0.000     16.10428     20.3661
------------------------------------------------------------------------------

. logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

Iteration 0:   log likelihood = -15419.716
Iteration 1:   log likelihood = -13953.268
Iteration 2:   log likelihood = -13468.077
Iteration 3:   log likelihood = -13463.036
Iteration 4:   log likelihood = -13463.032
Iteration 5:   log likelihood = -13463.032

Logistic regression                               Number of obs   =     132688
                                                  LR chi2(2)      =    3913.37
                                                  Prob > chi2     =     0.0000
Log likelihood = -13463.032                       Pseudo R2       =     0.1269

------------------------------------------------------------------------------
    DIVISION |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   l_HH_SIZE |   .1341566   .0023316    57.54   0.000     .1295867    .1387265
        YEAR |  -.0130866   .0005775   -22.66   0.000    -.0142185   -.0119547
       _cons |   17.75924   1.048066    16.94   0.000     15.70507    19.81342
------------------------------------------------------------------------------
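Dividing by ln(1.1) makes l_HH_SIZE a logarithm to base 1.1, so a one-unit increase in l_HH_SIZE corresponds to a 10% increase in household size. On that reading, the Python sketch below (not part of the lecture; the function name is illustrative) checks the identity and converts the l_HH_SIZE coefficient reported above into an odds ratio per 10% increase in size.

```python
import math

def l_hh_size(hh_size):
    """Log of household size to base 1.1, mirroring ln(HH_SIZE)/ln(1.1)."""
    return math.log(hh_size) / math.log(1.1)

# Raising household size by 10% (10 -> 11) raises l_HH_SIZE by one unit.
step = l_hh_size(11) - l_hh_size(10)

# Coefficient on l_HH_SIZE copied from the logit output.
coef = 0.1341566

# exp(coef) is the multiplicative change in the odds of division
# associated with a 10% larger household, holding YEAR fixed.
odds_ratio = math.exp(coef)

print(round(step, 6))
print(round(odds_ratio, 3))
```

This reading applies to the log specification only. In the first model the HH_SIZE coefficient refers to one additional household member.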

Creating household variables

• bysort and egen are your friends
• Use HOUSEHOLD_ID to group observations of the same household in the same register
• Let’s start with a count of the number of live individuals in the household

bysort HOUSEHOLD_ID: egen new_hh_size = total(PRESENT)

. corr HH_SIZE new_hh_size if YEAR >= 1789
(obs=1410354)

             |  HH_SIZE new_hh~e
-------------+------------------
     HH_SIZE |   1.0000
 new_hh_size |   1.0000   1.0000


Creating measures of age and sex composition of the household

bysort HOUSEHOLD_ID: egen males_1_15 = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 1 & AGE_IN_SUI <= 15)

bysort HOUSEHOLD_ID: egen males_16_55 = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55)

bysort HOUSEHOLD_ID: egen males_56_up = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 56)

bysort HOUSEHOLD_ID: egen females_1_15 = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 1 & AGE_IN_SUI <= 15)

bysort HOUSEHOLD_ID: egen females_16_55 = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55)

bysort HOUSEHOLD_ID: egen females_56_up = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 56)

generate hh_dependency_ratio = (males_1_15+males_56_up+females_1_15+females_56_up)/HH_SIZE

bysort AGE_IN_SUI: generate first_in_age = _n == 1

bysort AGE_IN_SUI: egen mean_hh_dependency_ratio = mean(hh_dependency_ratio)

twoway line mean_hh_dependency_ratio AGE_IN_SUI if first_in_age & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55, scheme(s1mono) ylabel(0(0.1)0.5) xlabel(16(5)55) ytitle("Household dependency ratio (Prop. < 15 or >= 56 sui)") xtitle("Age in sui")
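The household-level counts and the ratio can be traced by hand on a small example. The Python sketch below is illustrative only: the records are made up, and it assumes the denominator equals the number of present members, which is what the correlation between HH_SIZE and new_hh_size suggests.

```python
from collections import defaultdict

# Hypothetical records: (household_id, age_in_sui, present).
records = [
    ("A", 8, True), ("A", 30, True), ("A", 60, True), ("A", 35, True),
    ("B", 40, True), ("B", 12, False),
]

dependents = defaultdict(int)  # present members aged 1-15 or 56 and up
members = defaultdict(int)     # all present members (assumed equal to HH_SIZE)

for hh, age, present in records:
    if not present:
        continue  # total(PRESENT & ...) only counts present members
    members[hh] += 1
    if 1 <= age <= 15 or age >= 56:
        dependents[hh] += 1

# Share of each household's present members who are in the dependent ages.
ratio = {hh: dependents[hh] / members[hh] for hh in members}
print(ratio)
```

Household B's absent 12-year-old is excluded from both the numerator and the denominator, matching the PRESENT condition inside each total().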

[Figure: line plot. y axis: Household dependency ratio (Prop. < 15 or >= 56 sui) (0 to .5); x axis: Age in sui (16 to 56)]

Numbers of individuals who co-reside with someone who holds a position

. bysort HOUSEHOLD_ID: egen position_in_hh = total(PRESENT & HAS_POSITION > 0)

. tab position_in_hh if PRESENT & YEAR >= 1789

position_in |
        _hh |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |  1,177,575       90.23       90.23
          1 |     87,517        6.71       96.94
          2 |     24,204        1.85       98.79
          3 |      8,019        0.61       99.41
          4 |      4,893        0.37       99.78
          5 |      1,712        0.13       99.91
          6 |        651        0.05       99.96
          7 |        241        0.02       99.98
          8 |        136        0.01       99.99
          9 |        101        0.01      100.00
------------+-----------------------------------
      Total |  1,305,049      100.00

. replace position_in_hh = position_in_hh > 0
(49183 real changes made)

. tab position_in_hh if PRESENT & YEAR >= 1789

position_in |
        _hh |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |  1,177,575       90.23       90.23
          1 |    127,474        9.77      100.00
------------+-----------------------------------
      Total |  1,305,049      100.00

RELATIONSHIP

• String describes relationship of individual to the head of the household
  – Before 1789, describes relationship to head of yihu
• This is the basis of our kinship linkage
  – Automated linkage of children to their parents
  – Automated linkage of wives to their husbands
  – All based on processing of strings describing relationship

RELATIONSHIP
Core

• e is household head
• w is household head’s wife
• m is household head’s mother
• f is household head’s father (usually dead)
• 1yb, 2yb, 2ob etc. are head’s brothers
  – Older brothers of the head are unusual
• 1yz, 2yz, 2oz etc. are head’s unmarried sisters
• 1s, 2s, etc. are head’s sons
• 1d, 2d, etc. are head’s unmarried daughters

RELATIONSHIP
Combining codes

• More distant relationships are built up from these core relationships by combining them
• Examples
  – ff is grandfather of head
  – fm is grandmother of head
  – f2yb is an uncle: father’s second younger brother
    • f2ybw is his wife
  – f2yb1s is a cousin: father’s 2nd younger brother’s 1st son
  – 3yb2s is a nephew: 3rd younger brother’s 2nd son
  – 3s2s is a grandson: 3rd son’s 2nd son
    • 3s2sw is his wife

RELATIONSHIP
Linking wives to husbands

• Strip the w off of a married woman’s relationship and search the household for the remaining string.
  – f2yb1sw -> search for f2yb1s
• Exceptions
  – For w, search for e
  – For m, search for f
  – For fm, search for ff
  – Etc.
• Basically prepare a target string, and then make use of merge on HOUSEHOLD_ID and the target
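The target-string idea can be sketched in a few lines. The Python below is an illustration, not the project's linkage code: the function name, the sample household, and the person IDs are made up, and it treats a code ending in m as the wife of the same code ending in f. A dictionary lookup on (household, target) stands in for the merge.

```python
def husband_target(rel):
    """Return the RELATIONSHIP string to search for to find a woman's husband.

    Returns None when the string does not look like a wife's or mother's code.
    """
    if rel == "w":         # head's wife -> head
        return "e"
    if rel.endswith("m"):  # m -> f, fm -> ff (assumed pairing)
        return rel[:-1] + "f"
    if rel.endswith("w"):  # f2yb1sw -> f2yb1s
        return rel[:-1]
    return None

# Hypothetical household: (HOUSEHOLD_ID, RELATIONSHIP) -> PERSON_ID.
people = {(1, "e"): 101, (1, "w"): 102, (1, "1s"): 103, (1, "1sw"): 104}

# Look up each woman's target within her own household.
husband_id = {
    pid: people.get((hh, husband_target(rel)))
    for (hh, rel), pid in people.items()
    if husband_target(rel) is not None
}
print(husband_id)
```

A lookup that returns None means the husband is not recorded in the household under the expected code, for example because he has died or left the register.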

RELATIONSHIP
Linking children to fathers

• In most cases, strip off the last relationship code and look for the remainder.
  – 1s1s -> look for 1s
  – ff2yb3s2s -> look for ff2yb3s
• Exceptions
  – e look for f
  – 2yb look for f
  – f2yb look for ff
• To link married women to their fathers-in-law, strip off w first, then convert to father’s relationship
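A rough Python sketch of these rules follows. It is an assumption-laden illustration rather than the project's code: the tokenizing pattern and function name are made up, the sibling rule is generalized from the two listed exceptions, and a child of the head (for example 1s) is assumed to link to e. Anything outside those cases returns None.

```python
import re

# One core code: e, f, m, w, or a number followed by yb/ob/yz/oz/s/d.
TOKEN = re.compile(r"[efmw]|\d+(?:yb|ob|yz|oz|s|d)")

def father_target(rel):
    """Return the RELATIONSHIP string of the father (father-in-law for wives)."""
    if rel.endswith("w"):
        # Married woman: switch to her husband's code, then find his father.
        rel = "e" if rel == "w" else rel[:-1]
    if rel == "e":
        return "f"
    tokens = TOKEN.findall(rel)
    if not tokens or "".join(tokens) != rel:
        return None  # string is not built from the core codes
    prefix, last = "".join(tokens[:-1]), tokens[-1]
    if last.endswith(("yb", "ob", "yz", "oz")):
        return prefix + "f"   # sibling of X -> father of X
    if last.endswith(("s", "d")):
        return prefix or "e"  # child of X -> X (assumed: head if no prefix)
    return None

for code in ["1s1s", "ff2yb3s2s", "e", "2yb", "f2yb", "f2yb1sw"]:
    print(code, "->", father_target(code))
```

Splitting the string into core codes first avoids mistaking the s in a code such as 1s for part of a longer token when the last code is removed.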

RELATIONSHIP
Indicators of specific basic relationships to head

generate head = RELATIONSHIP == "e"

generate head_wife = RELATIONSHIP == "w"

generate mother = RELATIONSHIP == "m"

generate father = RELATIONSHIP == "f"

. tab head SEX if PRESENT & SEX >= 1, row col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
|  row percentage   |
| column percentage |
+-------------------+

           |          Sex
      head |    Female       Male |     Total
-----------+----------------------+----------
         0 |   539,935    671,972 | 1,211,907
           |     44.55      55.45 |    100.00
           |     98.69      78.90 |     86.64
-----------+----------------------+----------
         1 |     7,148    179,658 |   186,806
           |      3.83      96.17 |    100.00
           |      1.31      21.10 |     13.36
-----------+----------------------+----------
     Total |   547,083    851,630 | 1,398,713
           |     39.11      60.89 |    100.00
           |    100.00     100.00 |    100.00

RELATIONSHIP
Processing for distant relationships

• Strip out numbers, the seniority modifiers o and y, and the wife marker w
• In a .do file, this will create a new variable with a stripped relationship

generate new_RELATIONSHIP = RELATIONSHIP
local for_removal "1 2 3 4 5 6 7 8 9 o y w"
foreach x of local for_removal {
    replace new_RELATIONSHIP = subinstr(new_RELATIONSHIP,"`x'","",.)
}

Examples

RELATIONSHIP    new_RELATIONSHIP
e               e
w               (empty string)
f               f
m               m
1ob             b
1obw            b
1ob1s           bs
3yb             b
3ybw            b
3yb1s           bs
3yb1d           bd
4yb             b
4ybw            b
f2yb            fb
f2ybw           fb
f2yb1d          fbd
f3yb            fb
f3ybw           fb
f3yb1s          fbs
f3yb1sw         fbs
f3yb1s1s        fbss
f3yb1s1d        fbsd
f3yb2s          fbs
f3yb2sw         fbs
f3yb2s1d        fbsd
f4ybw           fb
f4yb1sw         fbs
f4yb1s1d        fbsd
f4yb1d          fbd
f4yb2d          fbd
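The stripping rule is simple enough to check against the examples above. The Python sketch below is an illustration outside the lecture's Stata code, and the function name is made up. It removes the same characters as the for_removal list (the digits 1 to 9, o, y and w) and compares the result with a few of the listed pairs.

```python
def strip_relationship(rel, for_removal="123456789oyw"):
    """Remove every character listed in for_removal from a RELATIONSHIP string."""
    return "".join(ch for ch in rel if ch not in for_removal)

# A few (RELATIONSHIP, new_RELATIONSHIP) pairs taken from the examples.
examples = {
    "e": "e",
    "w": "",
    "1ob1s": "bs",
    "3ybw": "b",
    "f2yb1d": "fbd",
    "f3yb1s1d": "fbsd",
}

for rel, expected in examples.items():
    print(rel, "->", repr(strip_relationship(rel)), strip_relationship(rel) == expected)
```

Because w is removed, a wife ends up with the same stripped code as her husband, so indicators built from the stripped code need SEX to tell them apart.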

generate brother = new_RELATIONSHIP == "b" & SEX == 2

generate brothers_wife = new_RELATIONSHIP == "b" & SEX == 1 & MARITAL_STATUS != 2 & MARITAL_STATUS > 0

generate sister = new_RELATIONSHIP == "z" & SEX == 1

generate male_cousin = new_RELATIONSHIP == "fbs" & SEX == 2

generate nephew = new_RELATIONSHIP == "bs" & SEX == 2

Proportions of different relationships by age

generate brother = new_RELATIONSHIP == "b"
bysort AGE_IN_SUI: egen males = total(SEX == 2 & PRESENT)
bysort AGE_IN_SUI: egen brothers = total(SEX == 2 & brother & PRESENT)
generate proportion_brothers = brothers/males
by AGE_IN_SUI: generate first_in_age = _n == 1
twoway line proportion_brothers AGE_IN_SUI if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ytitle("Proportion of males who are brother of a head") scheme(s1mono)

bysort AGE_IN_SUI: egen heads = total(SEX == 2 & RELATIONSHIP == "e" & PRESENT)
generate proportion_heads = heads/males
twoway line proportion_heads AGE_IN_SUI if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ytitle("Proportion of males who are household head") scheme(s1mono)

bysort AGE_IN_SUI: egen sons = total(SEX == 2 & new_RELATIONSHIP == "s" & PRESENT)
generate proportion_sons = sons/males
twoway line proportion_sons AGE_IN_SUI if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ytitle("Proportion of males who are son of a head") scheme(s1mono)

[Figure: line plot. y axis: Proportion of males who are household head (0 to .8); x axis: Age in Sui (0 to 80)]

[Figure: line plot. y axis: Proportion of males who are brother of a head (0 to .2); x axis: Age in Sui (0 to 80)]

[Figure: line plot. y axis: Proportion of males who are son of a head (0 to .4); x axis: Age in Sui (0 to 80)]

Relationship at first appearance

bysort PERSON_ID (YEAR): generate fa_nephew = new_RELATIONSHIP[1] == "bs" & AGE[1] <= 10 & SEX == 2 & PRESENT
bysort PERSON_ID (YEAR): generate fa_son = new_RELATIONSHIP[1] == "s" & AGE[1] <= 10 & SEX == 2 & PRESENT
generate fa_nephew_head = fa_nephew & head
generate fa_son_head = fa_son & head
bysort AGE_IN_SUI: egen fa_sons = total(fa_son)
bysort AGE_IN_SUI: egen fa_nephews = total(fa_nephew)
bysort AGE_IN_SUI: egen fa_sons_head = total(fa_son_head)
bysort AGE_IN_SUI: egen fa_nephews_head = total(fa_nephew_head)
generate p_fa_sons_head = fa_sons_head/fa_sons
generate p_fa_nephews_head = fa_nephews_head/fa_nephews
twoway line p_fa_sons_head p_fa_nephews_head AGE_IN_SUI if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ytitle("Proportion") scheme(s1mono)
twoway line p_fa_sons_head p_fa_nephews_head AGE_IN_SUI if AGE_IN_SUI >= 1 & AGE_IN_SUI <= 80 & first_in_age, ytitle("Proportion now head") scheme(s1mono) legend(order(1 "Appeared as sons of head" 2 "Appeared as nephews of head"))
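The key step is the subscript [1] inside a by-group sorted on YEAR, which reads the value from each person's earliest record. The Python sketch below illustrates that step only, with made-up person IDs, years and stripped relationship codes.

```python
# Hypothetical records: (person_id, year, stripped_relationship).
records = [
    (1, 1801, "e"), (1, 1798, "s"),
    (2, 1798, "bs"), (2, 1801, "bs"),
]

# Sort by person, then year, so each person's earliest record comes first
# (the role of bysort PERSON_ID (YEAR)).
first_rel = {}
for person, year, rel in sorted(records):
    first_rel.setdefault(person, rel)  # keep only the first value seen

# Attach the first-appearance relationship to every record of the person,
# as new_RELATIONSHIP[1] does within each by-group.
tagged = [(p, y, rel, first_rel[p]) for p, y, rel in records]

print(first_rel)
print(tagged)
```

Person 1 first appears as a son and is later recorded as head, which is the kind of transition the proportions in this analysis count.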

[Figure: line plot with two series, Appeared as sons of head and Appeared as nephews of head. y axis: Proportion now head (0 to .8); x axis: Age in Sui (0 to 80)]