Key variables 1
Key Variables: Social Science Measurement and Functional Form
Presentation to: ‘Interpreting results from statistical modelling – A seminar for Scottish Government Social Researchers’, Edinburgh, 1 April 2009
Dr Paul Lambert and Professor Vernon Gayle, University of Stirling
Key variables 2
Key Variables: Social Science Measurement and Functional Form
1) Working with variables - ‘Beta’s in Society’ and ‘Demystifying Coefficients’
2) Key Variables and social science measurement
- Harmonisation and standardisation
- An example: occupations
3) Functional Form
Key variables 3
‘Beta’s in Society’ and ‘Demystifying Coefficients’
Dorling, D., & Simpson, S. (Eds.). (1999). Statistics in Society: The Arithmetic of Politics. London: Arnold.
Irvine, J., Miles, I., & Evans, J. (Eds.). (1979). Demystifying Social Statistics. London: Pluto Press.
• Famous works on critical interpretation of social statistics tend to have a univariate / bivariate focus
– Measuring unemployment; averaging income; bivariate significance tests; correlation vs causation
• But social survey analysts usually argue that complex multivariate analyses are more appropriate
– Critical interpretation of joint relative effects
– Attention to effects of ‘key variables’ in multivariate analysis
Key variables 4
• “A program like SPSS .. has two main components: the statistical routines, .. and the data management facilities. Perhaps surprisingly, it was the latter that really revolutionised quantitative social research” [Procter, 2001: 253]
• “Socio-economic processes require comprehensive approaches as they are very complex (‘everything depends on everything else’). The data and computing power needed to disentangle the multiple mechanisms at work have only just become available.” [Crouchley and Fligelstone 2004]
Key variables 5
Large scale survey data: 2 technological themes
• We’re data rich (but analyst poor)
– Plenty of variables (a thousand is common)
– Plenty of cases
• We work overwhelmingly through individual analysts’ micro-computing
– Impact of mainstream software
– Pressure for simple / accessible / popular analytical techniques (whatever happened to loglinear models?)
– Propensity for simple ‘data management’
– Specialist development of very complex analytical packages for very simple sets of variables
Key variables 6
Survey research: Access, manipulate & analyse patterns in variables (‘variable by case matrix’)
Key variables 7
Critical role of syntactical records in working with data & variables
Reproducible (for self)
Replicable (for all)
Paper trail for whole lifecycle
Cf. Dale 2006; Freese 2007
• In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)
Some comments on survey analysis software for analysing variables..
• Data management and data analysis must be seen as integrated processes
• Stata is the most effective software, as it combines advanced data management and data analysis functionality and makes good documentation easy
• For an extended example of using Stata, concentrating on variable operationalisations and standardisations:
– Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk)
E.g. “do http://www.dames.org.uk/rae2008/uoa0108recode.do”
E.g. “use http://www.dames.org.uk/rae2008/rae2008_3.dta, clear”
transformations and functional forms; relative effects in multivariate models
– Data collection and data analysis – Cf. www.longitudinal.stir.ac.uk/variables/
processes by which survey measures are defined and subsequently interpreted by research analysts
Key variables 11
β’s - Where’s the action?
• If we have lots of variables, lots of cases, yet often quite simple techniques and software, the action is primarily in the variable constructions…
• The example of social mobility research – see Lambert et al. (2007)
i. How we choose between alternative measures
ii. How much data management we try (or bother with)
Plus other issues in how we analyse & interpret the coefficients from the models we use (..elsewhere today..)
Key variables 12
i) Choosing measures
See (2) below
• A sensible starting point is with ‘key variables’
• Approaches to standardisation / harmonisation
• {Lack of} awareness of existing resources
See (3) below
• Influence of functional form
Key variables 13
ii) Data management – e.g. recoding data
Count: Highest educational qualification by educ4
educ4 columns: -9.00; 1.00 Degree; 2.00 Diploma; 3.00 Higher school or vocational; 4.00 School level or below

Highest educational qualification    -9.00    1.00    2.00    3.00    4.00    Total
-9 Missing or wild                     323       0       0       0       0      323
-7 Proxy respondent                    982       0       0       0       0      982
1 Higher Degree                          0     425       0       0       0      425
2 First Degree                           0    1597       0       0       0     1597
3 Teaching QF                            0       0     340       0       0      340
4 Other Higher QF                        0       0    3434       0       0     3434
5 Nursing QF                             0       0     161       0       0      161
6 GCE A Levels                           0       0       0    1811       0     1811
7 GCE O Levels or Equiv                  0       0       0       0    2518     2518
8 Commercial QF, No O Levels             0       0       0     331       0      331
9 CSE Grade 2-5, Scot Grade 4-5          0       0       0       0     421      421
10 Apprenticeship                        0       0       0     257       0      257
11 Other QF                            102       0       0       0       0      102
12 No QF                                 0       0       0       0    2787     2787
13 Still At School No QF               138       0       0       0       0      138
Total                                 1545    2022    3935    2399    5726    15627
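The recode summarised in the cross-tabulation above can be written as an explicit, documented lookup. The following is a minimal Python sketch rather than the authors' own syntax: the mapping and the case counts are read off the table, while the names `EDUC4_RECODE`, `SOURCE_COUNTS`, `recode_educ4` and `educ4_totals` are hypothetical.

```python
from collections import Counter

# Source code -> educ4 code, as implied by the cross-tabulation.
# -9 collects the cases that are treated as missing in educ4.
EDUC4_RECODE = {
    -9: -9, -7: -9,        # missing or wild; proxy respondent
    1: 1, 2: 1,            # 1 = Degree
    3: 2, 4: 2, 5: 2,      # 2 = Diploma
    6: 3, 8: 3, 10: 3,     # 3 = Higher school or vocational
    7: 4, 9: 4, 12: 4,     # 4 = School level or below
    11: -9, 13: -9,        # other QF; still at school
}

# Number of cases per source category (row totals of the table).
SOURCE_COUNTS = {
    -9: 323, -7: 982, 1: 425, 2: 1597, 3: 340, 4: 3434, 5: 161,
    6: 1811, 7: 2518, 8: 331, 9: 421, 10: 257, 11: 102, 12: 2787, 13: 138,
}


def recode_educ4(code):
    """Return the educ4 category for a source code.

    An unexpected code raises KeyError instead of being recoded silently.
    """
    return EDUC4_RECODE[code]


def educ4_totals(source_counts):
    """Aggregate source-category counts into educ4 category counts."""
    totals = Counter()
    for code, n_cases in source_counts.items():
        totals[recode_educ4(code)] += n_cases
    return dict(totals)


if __name__ == "__main__":
    for category, n_cases in sorted(educ4_totals(SOURCE_COUNTS).items()):
        print(category, n_cases)
```

Comparing the aggregated counts with the column totals of the cross-tabulation is a cheap check that no category was dropped or assigned twice.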
Key variables 14
ii) Data management – e.g. Missing data / case selection
Key variables 15
ii) Data management – e.g. Linking data
Linking via ‘ojbsoc00’: c1-5 = original data / c6 = derived from data / c7 = derived from www.camsis.stir.ac.uk
Data Management through e-Social Science (DAMES – www.dames.org.uk)
• Supporting operations on data widely performed by social science researchers
– Matching data files together
– ‘Cleaning’ data
– Operationalising variables
– Specialist data resources (occupations; education; ethnicity)
• Why is e-Social Science relevant?
– Dealing with distributed, heterogeneous datasets
– Generic data requirements / provisions
– Lack of previous systematic standards (e.g. metadata; security; citation procedures; resources to review/obtain suitable data)
Key variables 19
Working with variables – further issues
• Re-inventing the wheel
– …In survey data analysis, somebody else has already struggled through the variable constructions you are working on right now…
– Increasing attention to documentation and replicability [cf Dale 2006; Freese 2007]
• Guidance and support
– In the UK, use www.esds.ac.uk
– Most guidance concerns collecting & harmonising data
– Less is directed to analytically exploiting measures
Equivalence
• “the degree to which survey measures or questions are able to assess identical phenomena across two or more cultures” [Harkness et al 2003, p351]
Measurement equivalence involves same instruments and equality of measures (e.g. income in pounds)
Functional equivalence involves different instruments, but addresses same concepts (e.g. inflation adjusted income)
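As a small illustration of the inflation-adjusted income example, the sketch below expresses nominal incomes from different years in the prices of a common base period. The function name `real_income` and all numbers are hypothetical and chosen only to show the arithmetic; a real analysis would take the index from a published price series.

```python
def real_income(nominal, price_index, base_index=100.0):
    """Express a nominal income in base-period prices.

    price_index is the price level in the year the income was measured,
    on the same scale as base_index.
    """
    return nominal * base_index / price_index


if __name__ == "__main__":
    # Hypothetical figures: the same nominal income at two price levels.
    print(real_income(22000, 100.0))  # measured in the base year
    print(real_income(22000, 110.0))  # measured when prices were 10% higher
```

The two incomes are identical in pounds, which is measurement equivalence, but only after deflating do they address the same concept of purchasing power, which is functional equivalence.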
Key variables 27
“Equivalence is the only meaningful criterion if data is to be compared from one context to another. However, equivalence of measures does not necessarily mean that the measurement instruments used in different countries are all the same. Instead it is essential that they measure the same dimension. Thus, functional equivalence is more precisely what is required” [Hoffmeyer-Zlotnik and Wolf 2003, p389]
Key variables 28
Harmonisation & equivalence combined
‘Universality’ or ‘specificity’ in variable constructions
But specificity is theoretically better
Specificity is more easily obtained than is often realised
Especially for well-known ‘key variables’
Key variables 29
Working with key variables - speculation
a) Data manipulation skills and inertia
• I would speculate that around 80% of applications using key variables don’t consult literature and evaluate alternative measures, but choose the first convenient and/or accessible variable in the dataset
– Data supply decisions (‘what is on the archive version’) are critical
• Much of the explanation lies with lack of confidence in data manipulation / linking data
• Too many under-used resources – cf. www.esds.ac.uk
• ‘everything depends on everything else’ [Crouchley and Fligelstone 2004]
• We know a lot about simple properties of key variables
• Key variables often change the main effects of other variables
• Simple decisions about contrast categories can influence interpretations
• Interaction terms are often significant and influential
• We have only scratched the surface of understanding key variables in multivariate context and interpretation
• Key variables are often endogenous (because they are ‘key’!)
• Work on standards / techniques for multi-process systems and/or comparing structural breaks involving key variables is attractive
Key variables 31
An example: Occupations
• In the social sciences, occupation is seen as one of the most important things to know about a person
– Direct indicator of economic circumstances
– Proxy indicator of ‘social class’ or ‘stratification’
• Projects at Stirling (www.dames.org.uk)
– GEODE – how social scientists use data on occupations
– DAMES – extending GEODE resources
Stage 1 - Collecting Occupational Data (and making a mess)

Example 1: BHPS
Occ description            Employment status     SOC-2000    EMPST
Miner (coal)               Employee              8122        7
Police officer (Serg.)     Supervisor            3312        6
Electrical engineer        Employee              2123        7
Retail dealer (cars)       Self-employed w/e     1234        2

Example 2: European Social Survey, parent’s data
Occ description               SOC-2000    EMPST
Miner                         ?8122       ?6/7
Police officer                ?3312       ?6/7
Engineer                      ??          ??
Self employed businessman     ??          ?1/2
Key variables 33
www.geode.stir.ac.uk/ougs.html
Key variables 34
Occupations: we agree on what we should do:
• Preserve two levels of data
– Source data: Occupational unit groups, employment status
– Social classifications and other outputs
• Use transparent (published) methods [i.e. OIR’s]
– for classifying index units
– for translating index units into social classifications
for instance..
Bechhofer, F. 1969. 'Occupations' in Stacey, M. (ed.) Comparability in Social Research. London: Heinemann.
Jacoby, A. 1986. 'The Measurement of Social Class' Proceedings from the Social Research Association seminar on "Measuring Employment Status and Social Class". London: Social Research Association.
Lambert, P.S. 2002. 'Handling Occupational Information'. Building Research Capacity 4: 9-12.
Rose, D. and Pevalin, D.J. 2003. 'A Researcher's Guide to the National Statistics Socio-economic Classification'. London: Sage.
Key variables 35
…in practice we don’t keep to this...
Inconsistent preservation of source data
• Alternative OUG schemes
– SOC-90; SOC-2000; ISCO; SOC-90 (my special version)
• Inconsistencies in other index factors
– ‘employment status’; supervisory status; number of employees
– Individual or household; current job or career
Inconsistent exploitation of Occupational Information
• Numerous alternative occupational information files
– (time; country; format)
• Substantive choices over social classifications
• Inconsistent translations to social classifications – ‘by file or by fiat’
• Dynamic updates to occupational information resources
• Strict security constraints on users’ micro-social survey data
• Low uptake of existing occupational information resources
Key variables 36
GEODE provides services to help social scientists deal with occupational information resources
1) disseminate, and access other, Occupational Information Resources
2) Link together their (secure) micro-data with OIR’s
External user            Occ info (index file)       User’s output
(micro-social data)      (aggregate)                 (micro-social data)
id   oug   sex   .       oug   CS-M   CS-F   EGP     id   oug   CS   .
1    110   1     .       110   60     58     I       1    110   60   .
2    320   1     .       320   69     71     II      2    320   69   .
3    320   2     .       874   39     51     VIIa    3    320   71   .
4    874   1     .                                   4    874   39   .
5    874   2     .                                   5    874   51   .
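The linkage shown in the three panels above can be sketched as a lookup from each respondent's occupational unit group (oug) into the index file, picking the gender-specific CAMSIS score. This is a minimal illustration, not the GEODE service itself. The data values come from the example, the coding sex 1 = male and 2 = female is an assumption inferred from the output column, and the names `MICRO`, `INDEX`, `SCORE_COLUMN` and `link_camsis` are hypothetical.

```python
# Micro-social data held by the user: one row per respondent.
MICRO = [
    {"id": 1, "oug": 110, "sex": 1},
    {"id": 2, "oug": 320, "sex": 1},
    {"id": 3, "oug": 320, "sex": 2},
    {"id": 4, "oug": 874, "sex": 1},
    {"id": 5, "oug": 874, "sex": 2},
]

# Aggregate occupational information (index file): one row per oug.
INDEX = {
    110: {"CS-M": 60, "CS-F": 58, "EGP": "I"},
    320: {"CS-M": 69, "CS-F": 71, "EGP": "II"},
    874: {"CS-M": 39, "CS-F": 51, "EGP": "VIIa"},
}

# Assumed coding of the sex variable (not stated in the source).
SCORE_COLUMN = {1: "CS-M", 2: "CS-F"}


def link_camsis(micro, index):
    """Attach a gender-specific CAMSIS score (CS) to each micro-data row.

    Rows whose oug or sex code cannot be matched get CS = None, so that
    unmatched cases stay visible instead of being dropped.
    """
    linked = []
    for row in micro:
        entry = index.get(row["oug"])
        column = SCORE_COLUMN.get(row["sex"])
        score = entry[column] if entry is not None and column is not None else None
        linked.append({**row, "CS": score})
    return linked


if __name__ == "__main__":
    for row in link_camsis(MICRO, INDEX):
        print(row["id"], row["oug"], row["CS"])
```

Keeping unmatched rows with a missing score, rather than deleting them, makes it easy to count how many cases failed to link before any analysis is run.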
Occupational information resources: small electronic files about OUGs…

Resource                                        Index units         # distinct files (average size kb)    Updates?
CAMSIS, www.camsis.stir.ac.uk                   Local OUG*(e.s.)    200 (100)                             y
CAMSIS value labels, www.camsis.stir.ac.uk      Local OUG           50 (50)                               n
ISEI tools, home.fsw.vu.nl/~ganzeboom           Int. OUG            20 (50)                               y
E-Sec matrices, www.iser.essex.ac.uk/esec       Int. OUG*(e.s.)     20 (200)                              n
Hakim gender seg codes (Hakim 1998)             Local OUG           2 (paper)                             n
Key variables 38
For example: ISCO-88 Skill levels classification
Key variables 39
and: UK 1980 CAMSIS scales and CAMCON classes
Key variables 40
Existing resources on occupations
Popular websites:
• http://www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/
• http://home.fsw.vu.nl/~ganzeboom/pisa/
• www.iser.essex.ac.uk/esec/
• www.camsis.stir.ac.uk/occunits/distribution.html
Emerging resource: http://www.geode.stir.ac.uk/
Some papers:
– Chan, T. W., & Goldthorpe, J. H. (2007). Class and Status: The Conceptual Distinction and its Empirical Relevance. American Sociological Review, 72, 512-532.
– Rose, D., & Harrison, E. (2007). The European Socio-economic Classification: A New Social Class Scheme for Comparative European Research. European Societies, 9(3), 459-490.
– Lambert, P. S., Tan, K. L. L., Gayle, V., Prandy, K., & Bergman, M. M. (2008). The importance of specificity in occupation-based social classifications. International Journal of Sociology and Social Policy, 28(5/6), 179-192.
• Growing interest in longitudinal analysis and use of longitudinal summary data on occupations
– Intuitive measures (e.g. ever in Class I): Lampard, R. (2007). Is Social Mobility an Echo of Educational Mobility? Sociological Research Online, 12(5).
– Empirical career trajectories / sequences: Halpin, B., & Chan, T. W. (1998). Class Careers as Sequences. European Sociological Review, 14(2), 111-130.
• Growing cross-national comparisons
– Ganzeboom, H. B. G. (2005). On the Cost of Being Crude: A Comparison of Detailed and Coarse Occupational Coding. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 241-257). Mannheim: ZUMA, Nachrichten Spezial.
• Treatment of the non-working populations
– Seldom adequate to treat non-working as a category
– ‘Selection modelling’ approaches expanding
Key variables 42
Occupations as key variables
• Extensive debate about occupation-based social classifications
– Document your procedures..
– ..as you may be asked to do something different..
• When choosing between occupation-based measures…
– They all measure, mostly, the same things
– Don’t assume concepts measure measures
– Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the ISA RC28 conference, Montreal (14-17 August), www.camsis.stir.ac.uk/stratif/archive/lambert_bihagen_2007_version1.pdf .
• Endogeneity – ignoring multiprocess system may bias results (e.g. selection bias)
Key variables 52
Pragmatics of model choice
• General rapid expansion in model functionality in statistical packages
• Stata stands out for its wide range of data management and data analysis functionality
– E.g. ‘statsby’; ‘est table’; ‘outreg2’; ‘estout’ facilitate testing and comparing related models with different combinations of variables
Key variables 53
c) Other variables and interaction effects
• A very important influence on one RHS coefficient is what else is in the RHS and what it is interacted with
Some brief comments on:
• Offsets (constraints)
• Interactions
• Logit models’ fixed variance
Key variables 54
A comment on ‘offsets’
- For comparisons between regressions, it is sometimes suitable to force the coefficients of some variables (e.g. controls) to have a certain fixed value
- The example below (predicting income) uses ‘cnsreg’ in Stata, e.g.:
    * Fit the baseline model and store its coefficient vector
    regress lninc fem age femage
    matrix define mod1m=e(b)
    * Keep the coefficient on fem (first column of e(b))
    scalar fem_coef=mod1m[1,1]
    * Constrain fem to that value, then refit with mcamsis added
    constraint def 1 fem=fem_coef
    cnsreg lninc fem age femage mcamsis, constraints(1)
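The logic of fixing a coefficient can be sketched without Stata: if the coefficient on one variable is held at a known value, least squares on the remaining variables is the same as subtracting the fixed term from the outcome and regressing the remainder on the free variables. The Python sketch below is a simplified illustration of that idea with a single free regressor, not a reimplementation of ‘cnsreg’. The function names `ols_simple` and `fit_with_offset` and the toy data are hypothetical.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from least squares of y on a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_with_offset(y, fixed_x, fixed_coef, free_x):
    """Fit y on free_x while holding the coefficient of fixed_x at fixed_coef.

    The fixed term is moved to the left-hand side (an offset), and the
    adjusted outcome is then regressed on the free variable only.
    """
    adjusted = [yi - fixed_coef * fi for yi, fi in zip(y, fixed_x)]
    return ols_simple(free_x, adjusted)


if __name__ == "__main__":
    # Toy data constructed so that lninc = 2.0 - 0.3*fem + 0.01*age exactly.
    fem = [0, 1, 0, 1, 0, 1]
    age = [20, 30, 40, 50, 60, 25]
    lninc = [2.0 - 0.3 * f + 0.01 * a for f, a in zip(fem, age)]
    intercept, age_coef = fit_with_offset(lninc, fem, -0.3, age)
    print(round(intercept, 6), round(age_coef, 6))
```

Because the coefficient on fem is not re-estimated, any change in the remaining coefficients between two such fits reflects the other variables in the model and not a shift in the gender effect.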
• Start with main effects – get a good idea how they work
• Be careful how you fit interaction effects
– Often appealing substantively
– In practice not always significant (especially higher order)
– Hard to interpret higher order interactions
– Over-fit - check for replication (e.g. in other datasets)
– Always wise to formally test interactions (cf. armchair critics)
– Best to construct your own interaction variable(s) and maybe fit them as a single X (especially complicated categorical interactions)
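Constructing interaction variables by hand, as recommended above, might look like the following Python sketch. It builds a product term such as the ‘femage’ variable used in the Stata example, and a single combined categorical variable with one level per sex-by-class cell that can then be expanded into dummy variables. The function names and the label format are hypothetical, and the coding fem 0 = male, 1 = female is an assumption.

```python
def product_term(a, b):
    """Element-wise product of two numeric variables (e.g. fem * age)."""
    return [ai * bi for ai, bi in zip(a, b)]


def combined_category(fem, social_class):
    """Single categorical X with one level per sex-by-class cell."""
    sex_label = {0: "M", 1: "F"}  # assumed coding of the fem indicator
    return [f"{sex_label[f]}:{c}" for f, c in zip(fem, social_class)]


def dummies(values, reference):
    """One 0/1 indicator per level, omitting the chosen reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


if __name__ == "__main__":
    fem = [0, 1, 1]
    age = [30, 40, 50]
    social_class = ["Routine", "Advantaged", "Routine"]
    print(product_term(fem, age))
    cells = combined_category(fem, social_class)
    print(cells)
    print(dummies(cells, reference="M:Routine"))
```

Fitting the combined variable as one set of dummies makes the contrast category explicit, which matters because the choice of reference cell shapes how every other coefficient is read.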
Key variables 56
The fixed variance in logit: linear cf. categorical outcomes
GHS Data
OLS: Y = age left education (years)
Logit: Y = Graduate / Non Graduate
X Vars
Female
4-category social Class (Advantaged; Lower Supervisory; Semi-routine; Routine)
Age (centred at 40)
Key variables 57
Regression Estimates
A B C D E
Female -0.32 -0.34 -0.27
Age (40) -0.06 -0.06 -0.05
Supervisory -1.83 -1.85
Semi-Routine -1.98 -1.88
Routine -2.40 -2.33
Constant 17.52 17.5 17.75 18.22 18.54
Key variables 58
Linear Regression Models
• A 1 unit change in X leads to a β change in Y
• The β is consistent across models – minor insignificant random variation (survey data)
• As long as the X vars are uncorrelated (a classical regression assumption)
Key variables 59
Estimates (logit scale)
A B C D E
Female -0.24 -0.23 -0.20
Age (40) -0.03 -0.03 -0.04
Supervisory -1.46 -1.52
Semi-Routine -1.82 -1.87
Routine -2.65 -2.70
Constant -0.90 -0.80 -0.39 -0.68 -0.04
Parameterization ??
Key variables 60
Logit Model
• Estimates on a logit scale
• The β estimates how a shift from X1=0 to X1=1 changes the log odds of y=1
• Even when the X vars are uncorrelated, including additional variables can lead to changes in estimates
• The β estimates the effect given all other X vars in the model
• Fixed variance in the logit model (π²/3)
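The point that logit estimates move when an uncorrelated variable is added or dropped can be checked by simulation. The sketch below is a hypothetical illustration and does not use the GHS data: the outcome is generated from a logit model with a binary X and an independent continuous Z, and the coefficient on X is then estimated with Z omitted. For a single binary predictor the logit coefficient equals the log odds ratio of the 2x2 table, so no model-fitting library is needed. The function name, parameter values and seed are assumptions.

```python
import math
import random


def simulate_log_odds_ratio(beta_x=1.0, gamma_z=2.0, n=100_000, seed=1):
    """Log odds ratio of y on x when an independent predictor z is omitted."""
    rng = random.Random(seed)
    counts = [[0, 0], [0, 0]]  # counts[x][y]
    for _ in range(n):
        x = rng.randint(0, 1)      # binary X of interest
        z = rng.gauss(0.0, 1.0)    # second X, drawn independently of x
        eta = beta_x * x + gamma_z * z
        p = 1.0 / (1.0 + math.exp(-eta))
        y = 1 if rng.random() < p else 0
        counts[x][y] += 1
    odds_x1 = counts[1][1] / counts[1][0]
    odds_x0 = counts[0][1] / counts[0][0]
    return math.log(odds_x1 / odds_x0)


if __name__ == "__main__":
    # With gamma_z = 0 the omitted variable has no effect on y.
    print(simulate_log_odds_ratio(gamma_z=0.0))
    # With a strong omitted z the estimate for x is pulled towards zero,
    # even though x and z are uncorrelated by construction.
    print(simulate_log_odds_ratio(gamma_z=2.0))
```

The attenuation arises because the residual variance on the logit scale is fixed, so variation in y that is left unexplained by omitted variables rescales the remaining coefficients. In a linear regression with uncorrelated X vars, omitting Z would leave the coefficient on X essentially unchanged.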
Key variables 61
Summary – Social science measurement and functional form
• We argue that the route to better critical understanding of variable effects combines complex analysis with many mundane, prosaic tasks in checking data
Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford University Press.
Bosveld, K., Connolly, H., Rendall, M. S., & (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
Burgess, R. G. (Ed.). (1986). Key Variables in Social Investigation. London: Routledge.
Crouchley, R., & Fligelstone, R. (2004). The Potential for High End Computing in the Social Sciences. Lancaster: Centre for Applied Statistics, Lancaster University, and http://redress.lancs.ac.uk/document-pool/hecsspotential.pdf.
Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
Dorling, D., & Simpson, S. (Eds.). (1999). Statistics in Society: The Arithmetic of Politics. London: Arnold.
Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 2007.
Harkness, J., van de Vijver, F. J. R., & Mohler, P. P. (Eds.). (2003). Cross-Cultural Survey Methods. New York: Wiley.
Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.
Irvine, J., Miles, I., & Evans, J. (Eds.). (1979). Demystifying Social Statistics. London: Pluto Press.
Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring Attitudes Cross-Nationally. London: Sage.
Lambert, P. S., Prandy, K., & Bottero, W. (2007). By Slow Degrees: Two Centuries of Social Reproduction and Mobility in Britain. Sociological Research Online, 12(1).
Prandy, K. (2002). Measuring quantities: the qualitative foundation of quantity. Building Research Capacity, 2, 3-4.
Procter, M. (2001). Analysing Survey Data. In G. N. Gilbert (Ed.), Researching Social Life, Second Edition (pp. 252-268). London: Sage.
Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.
Stacey, M. (Ed.). (1969). Comparability in Social Research. London: Heinemann.