Optimizing Language Variation Analysis: Language Variation Suite
Olga Scrivner, Manuel Díaz-Campos and Rafael Orozco
Indiana University and Louisiana State University
NWAV45, 2016
Outline: Introduction, Data Preparation, Language Variation Suite, Working with Data, Visual Analytics, Inferential Analysis, Data Modification, Mixed Effects, RBRUL, Appendix, Cross Tabulation, Data Modification, References
Language Variation Suite - interactive toolkit for quantitative analysis
Upload file, data summary, adjust data, cross tabulation
3 Visual Analysis
Plotting, cluster classification
4 RBRUL
New version by Daniel Johnson!
5 Inferential statistics
Modeling, regression, conditional trees, random forest, model comparison
Modeling
We are interested in RETENTION = Application
Regression Types
Model
a.) Fixed effects
b.) Mixed effects - individual speaker/token variation (within-group)
Type of Dependent Variable
a.) Binary/categorical (only two values)
b.) Continuous (numeric)
c.) Multinomial - categorical with more than two values
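The two choices on this slide (model type and type of dependent variable) can be sketched as a small decision helper. This is only an illustration of the taxonomy above, not how the Language Variation Suite selects a model; the function name `suggest_regression` and its arguments are hypothetical.

```python
def suggest_regression(values, grouping_factors=()):
    """Suggest (model, family) from the dependent variable's values.

    `values` holds the dependent variable column. `grouping_factors` names
    columns such as speaker or word that would be treated as random effects.
    Both names are assumptions made for this sketch.
    """
    levels = set(values)
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if len(levels) < 2:
        raise ValueError("the dependent variable needs at least two distinct values")
    if len(levels) == 2:
        family = "logistic"      # binary: only two values
    elif numeric:
        family = "linear"        # continuous (numeric)
    else:
        family = "multinomial"   # categorical with more than two values
    model = "mixed effects" if grouping_factors else "fixed effects"
    return model, family


print(suggest_regression(["retention", "deletion", "retention"]))
print(suggest_regression([0.12, 0.30, 0.25, 0.41], grouping_factors=("speaker",)))
```

A numeric column with exactly two values (for example 0 and 1) is treated as binary here, which is a design choice of this sketch.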
Regression
Model Output
Interpretation
1 Estimate: reported in log-odds; the sign shows a negative or positive effect, and values closer to zero indicate a weaker effect
2 P: significance (p < 0.05)
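Because estimates are reported in log-odds, it can help to convert them to odds ratios and probabilities. The sketch below uses the standard logistic transformations; the coefficient values are hypothetical and are not taken from the model output shown in these slides.

```python
import math


def odds_ratio(log_odds):
    """Odds ratio corresponding to a log-odds estimate."""
    return math.exp(log_odds)


def probability(log_odds):
    """Probability corresponding to a total log-odds value (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical intercept and estimates, chosen only for illustration.
intercept = 0.0
for label, estimate in [("negative", -1.0), ("near zero", -0.1), ("positive", 1.0)]:
    print(
        f"{label:>9}: log-odds {estimate:+.1f}, "
        f"odds ratio {odds_ratio(estimate):.2f}, "
        f"probability {probability(intercept + estimate):.2f}"
    )
```

An estimate of zero corresponds to an odds ratio of 1 (no effect), which is why values closer to zero indicate a weaker effect.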
Lexical item Fourth has a negative effect on retention and is significant
Normal style has a slightly negative effect on retention but its coefficient is not significant
Macy’s and Saks have a positive and significant effect on retention. Saks (upper middle class store) is more significant than Macy’s (middle class store)
Fixed Effects Model: All predictors are treated independently.
Underlying assumption - no group-internal variation between speakers or tokens
Mixed Effects Model: Allows for evaluation of individual- and group-level variation
Fixed and Mixed Models: Errors
Fixed Regression Model - ignoring individual variations (speakers or words) may lead to Type I Error: “a chance effect is mistaken for a real difference between the populations”
Mixed Regression Model - prone to Type II Error: “if speaker variation is at a high level, we cannot discern small population effects without a large number of speakers” (Johnson 2009, 2015)
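The Type I error risk can be illustrated with a small simulation. In the sketch below, two groups of speakers have no true difference, but each speaker has their own offset. Pooling all tokens as if they were independent is compared with testing one mean per speaker, which is used here only as a simple stand-in for accounting for speaker grouping, not as a mixed model. All parameter values and function names are assumptions, and the critical values are approximate.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Welch-style t statistic for two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate(n_sims=200, speakers=10, tokens=20,
             speaker_sd=1.0, residual_sd=1.0, seed=1):
    """Return false-positive rates (token-level test, speaker-level test)."""
    rng = random.Random(seed)
    false_pos_tokens = 0
    false_pos_speakers = 0
    for _ in range(n_sims):
        groups = []
        for _group in range(2):  # two populations with NO true difference
            per_speaker = []
            for _speaker in range(speakers):
                offset = rng.gauss(0.0, speaker_sd)  # speaker-specific deviation
                per_speaker.append(
                    [offset + rng.gauss(0.0, residual_sd) for _token in range(tokens)]
                )
            groups.append(per_speaker)
        # (a) Ignore speakers: pool every token as an independent observation.
        pooled = [[x for spk in g for x in spk] for g in groups]
        if abs(t_statistic(pooled[0], pooled[1])) > 1.96:  # normal approximation
            false_pos_tokens += 1
        # (b) Respect speakers: one mean per speaker.
        means = [[statistics.mean(spk) for spk in g] for g in groups]
        if abs(t_statistic(means[0], means[1])) > 2.101:  # approx. t critical, df = 18
            false_pos_speakers += 1
    return false_pos_tokens / n_sims, false_pos_speakers / n_sims


token_rate, speaker_rate = simulate()
print(f"false positives ignoring speakers: {token_rate:.2f}")
print(f"false positives using speaker means: {speaker_rate:.2f}")
```

Under these assumptions the token-level test should flag a "significant" difference far more often than the nominal 5%, because tokens from the same speaker are not independent.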
Mixed Effect Regression
Mixed Model = fixed effects + random effects
Fixed-effects factor - “repeatable and a small number of levels”
Random-effects factor - “a non-repeatable random sample from a larger population” (Wieling 2012)
Examples:
Random: walk, sleep, study, finish, eat, etc. Fixed: aspectual verb, stative verb
Random: speaker1, speaker2, speaker3, etc. Fixed: male, female
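For a binary dependent variable, "fixed effects + random effects" is commonly written as a logistic model with random intercepts. The notation below is an assumption made for illustration (it does not appear in the slides): $i$ indexes speakers, $j$ indexes words, $x_{ij}$ is a fixed-effects predictor such as verb class or gender, and $u_i$ and $w_j$ are the speaker and word random intercepts.

```latex
\operatorname{logit} P(y_{ij} = 1) = \beta_0 + \beta_1 x_{ij} + u_i + w_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2)
```

The $\beta$ terms are estimated for each level of a fixed factor, while for a random factor only the variances $\sigma_u^2$ and $\sigma_w^2$ are estimated, since its levels are a sample from a larger population.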
Mixed Effect Modeling
NULL when the dependent variable is continuous
Fixed Effects - independent variables
Mixed Effects - group-internal variation
Regression Results
Interpretation - Random Effects
1 Standard Deviation: a measure of the variability for each random effect (speakers and tokens)
2 Residual: random variation that is not due to speakers or tokens (residual error)
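One way to read these numbers is to square each standard deviation and compare the resulting variances. The sketch below computes each component's share of the total variance for a model with a continuous dependent variable; the function name and the input values are hypothetical and are not taken from the regression results in these slides.

```python
def variance_shares(sd_by_component):
    """Turn standard deviations into each component's share of total variance."""
    variances = {name: sd ** 2 for name, sd in sd_by_component.items()}
    total = sum(variances.values())
    return {name: var / total for name, var in variances.items()}


# Hypothetical standard deviations, for illustration only.
shares = variance_shares({"speaker": 2.0, "token": 1.0, "residual": 2.0})
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total variance")
```

A large speaker share suggests that much of the variation is between individuals rather than due to the fixed predictors or residual error.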
Interpretation - Fixed Effects
1 Estimate/coefficient: reported in log-odds (negative or positive)
2 P-value: tells you if the level is significant
Language Variation Suite - Structure
1 Demo
Brief introduction
2 Data
Upload file, data summary, adjust data, cross tabulation
Cross-tabulation examines the relationship between two variables (their interaction).
Cross-Tabulation: One Dependent and One Independent Variable
Cross-Tabulation Output
Raw frequency / Proportion by column / Proportion across row
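The three outputs can be sketched with the standard library. This is an illustration only, not the Suite's implementation: the function name, the column names `store` and `r`, and the toy tokens are hypothetical, and the layout assumes the independent variable defines the rows and the dependent variable defines the columns.

```python
from collections import Counter


def cross_tabulate(rows, dependent, independent):
    """Return raw counts, proportions across each row, and proportions by column.

    Keys are (independent_level, dependent_level) pairs.
    """
    counts = Counter((r[independent], r[dependent]) for r in rows)
    row_totals = Counter()
    col_totals = Counter()
    for (ind, dep), n in counts.items():
        row_totals[ind] += n
        col_totals[dep] += n
    across_row = {key: n / row_totals[key[0]] for key, n in counts.items()}
    by_column = {key: n / col_totals[key[1]] for key, n in counts.items()}
    return dict(counts), across_row, by_column


# Toy tokens invented for this example.
tokens = [
    {"store": "Saks", "r": "retention"},
    {"store": "Saks", "r": "retention"},
    {"store": "Saks", "r": "deletion"},
    {"store": "Macy's", "r": "retention"},
    {"store": "Macy's", "r": "deletion"},
    {"store": "Macy's", "r": "deletion"},
]
raw, across_row, by_column = cross_tabulate(tokens, dependent="r", independent="store")
for key in sorted(raw):
    print(key, raw[key], round(across_row[key], 2), round(by_column[key], 2))
```

Proportions across a row sum to 1 within each level of the independent variable, while proportions by column sum to 1 within each value of the dependent variable.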
Appendix 2: Data Modification
Adjust Data
Retain: Select data subset
Exclude: Exclude variables from a factor group
Recode: Combine and rename variables
Change class: Numeric → factor; factor → numeric
Transform: Apply log transformation to a specific column
ADJUSTED DATASET:
Run - to apply all above changes
Reset - to reset to the original dataset
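The adjustments listed above can be sketched as small functions over a list of records. This is a minimal illustration under stated assumptions, not the Suite's code: the function names, the `duration` column, and the sample rows are all hypothetical. Each function returns a new list, so the original dataset stays available for a reset.

```python
import math


def retain(rows, column, keep):
    """Retain: keep only rows whose value in `column` is in `keep`."""
    return [r for r in rows if r[column] in keep]


def exclude(rows, column, drop):
    """Exclude: drop rows whose value in `column` is in `drop`."""
    return [r for r in rows if r[column] not in drop]


def recode(rows, column, mapping):
    """Recode: combine or rename values using an old-to-new mapping."""
    return [{**r, column: mapping.get(r[column], r[column])} for r in rows]


def change_class(rows, column, to):
    """Change class: convert a column, e.g. to=str (factor) or to=float (numeric)."""
    return [{**r, column: to(r[column])} for r in rows]


def log_transform(rows, column):
    """Transform: apply the natural log to a numeric column."""
    return [{**r, column: math.log(r[column])} for r in rows]


# Invented sample rows for illustration.
original = [
    {"style": "normal", "store": "Saks", "duration": 100.0},
    {"style": "emphatic", "store": "Macy's", "duration": 10.0},
    {"style": "normal", "store": "Macy's", "duration": 1.0},
]

# "Run": apply the chosen adjustments in sequence.
adjusted = exclude(original, "style", {"emphatic"})
adjusted = log_transform(adjusted, "duration")
print(adjusted)

# "Reset": the original rows were never modified.
print(original)
```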
Exclude: Emphatic Style
Adjusted Dataset
Adjusting Dataset
To revert to the original data, select RESET:
References I
[1] Baayen, Harald. 2008. Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press
[2] Bentivoglio, Paola and Mercedes Sedano. 1993. Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana. Boletín de Lingüística 8. 3-35
[3] Gries, Stefan Th. 2015. Quantitative designs and statistical techniques. In Douglas Biber & Randi Reppen (eds.), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press
[4] Labov, W. 1966. The Social Stratification of English in New York City. Washington: Center for Applied Linguistics
[5] http://gifsanimados.espaciolatino.com/x bob esponja 8.gif