Applying Generalized Additive Mixed Modeling: Tuscan Dialects vs. Standard Italian
Martijn Wieling1, Simonetta Montemagni2, John Nerbonne1 and Harald Baayen3
1University of Groningen, Center for Language and Cognition Groningen
2Istituto di Linguistica Computazionale “Antonio Zampolli”, CNR, Pisa
3Eberhard Karls University, Tübingen; University of Alberta, Edmonton
Leuven Statistics Days 2012, June 7 - 8, 2012
Martijn Wieling, Simonetta Montemagni, John Nerbonne and Harald Baayen Applying Generalized Additive Mixed Modeling 1/20
Overview
Introduction
  Generalized Additive Mixed Modeling
  Standard Italian and Tuscan dialects
Material
Methods
Results
Discussion
Generalized Additive Modeling (1)
Linear model: linear relationship between predictors and dependent variable: $y = a_1 x_1 + \dots + a_n x_n$
  Non-linearities via explicit parametrization: $y = a_1 x_1^2 + a_2 x_1 + \dots$
Generalized linear model: linear relationship between predictors and dependent variable via link function: $g(y) = a_1 x_1 + \dots + a_n x_n$
  Example: logistic regression for predicting a binary outcome
Generalized additive model (GAM): relationship between individual predictors and (possibly transformed) dependent variable is estimated by a non-linear smooth function: $g(y) = s(x_1) + s(x_2, x_3) + a_4 x_4 + \dots$
  Multiple predictors can be combined in a (hyper)surface smooth
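The three model classes above can be contrasted on simulated data. The following is a minimal sketch, not the authors' analysis: the data, the variable names (x1, y) and the sine-shaped effect are assumptions made only for illustration.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
n  <- 500
x1 <- runif(n, -2, 2)
# Simulated binary outcome whose log-odds depend non-linearly on x1 (assumed shape)
y  <- rbinom(n, 1, plogis(sin(2 * x1)))

# Generalized linear model: g(y) = a0 + a1 * x1 with a logit link
m_glm  <- glm(y ~ x1, family = binomial)

# Non-linearity via explicit parametrization: the analyst chooses the quadratic form
m_quad <- glm(y ~ x1 + I(x1^2), family = binomial)

# GAM: g(y) = s(x1); the shape of s() is estimated from the data
m_gam  <- gam(y ~ s(x1), family = binomial, method = "ML")

# Effective degrees of freedom of the smooth and an AIC comparison
print(summary(m_gam)$edf)
print(AIC(m_glm, m_quad, m_gam))
```

In the GAM call nothing about the shape of the effect is specified by hand; only the predictor to be smoothed is named.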
Generalized Additive Modeling (2)
Advantage of GAM over manual specification of non-linearities: the optimal shape of the non-linearity is determined automatically
  Appropriate degree of smoothness is automatically determined on the basis of cross validation to prevent overfitting
Choosing a smoothing basis
  Single predictor or isotropic predictors: thin plate regression spline
    Efficient approximation of the optimal (thin plate) spline
Generalized Additive Mixed Modeling:
  Random effects can be treated as smooths as well (Wood, 2008)
  R: gam and bam (package mgcv)
For more (mathematical) details, see Wood (2006)
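A minimal sketch of these two ideas in mgcv, on simulated data: a thin plate regression spline for a single predictor and a random intercept written as a smooth. The variables (x, grp, y) and all effect sizes are assumptions for illustration. Smoothness is selected here with method = "ML", as in the calls elsewhere in this deck; mgcv also offers GCV-based selection (method = "GCV.Cp").

```r
library(mgcv)

set.seed(2)
n   <- 400
x   <- runif(n)
grp <- factor(sample(letters[1:10], n, replace = TRUE))
b   <- rnorm(10, sd = 0.5)   # simulated group-level intercepts (assumed)
y   <- sin(2 * pi * x) + b[as.integer(grp)] + rnorm(n, sd = 0.3)

# bs = "tp": thin plate regression spline (mgcv's default basis for s())
# k    : upper limit on the basis dimension; the penalty sets the effective wiggliness
# bs = "re": random intercept per level of grp, represented as a penalized smooth
m <- gam(y ~ s(x, bs = "tp", k = 20) + s(grp, bs = "re"), method = "ML")

# One row per smooth term, with its effective degrees of freedom
print(summary(m)$s.table)
```

The effective degrees of freedom reported for s(x) stay below the upper limit set by k, which is how the automatic choice of smoothness shows up in practice.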
Standard Italian and Tuscan dialects
Standard Italian originated in the 14th century as a written language
It originated from the prestigious Florentine variety
The spoken standard Italian language was adopted in the 20th century
  People used to speak in their local dialect
In this study, we investigate the relationship between standard Italian and Tuscan dialects
  We focus on lexical variation
  We attempt to identify which social, geographical and lexical variables influence this relationship
Material: lexical data
We used lexical data from the Atlante Lessicale Toscano (ALT)
We focus on 213 locations (> 2000 informants) and 170 concepts
We grouped the informants based on age (young/old: born after/before 1930) and used the majority’s lexical form
  It was computationally not feasible to include the informants individually
Total number of cases: 69,259
For every case, we identified if the lexical form was different from standard Italian (1) or the same (0)
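The grouping and coding step can be sketched in base R. This is an illustration under stated assumptions, not the ALT processing pipeline: the data frame resp, its column names, the placeholder forms ("form1", "form2") and the lookup vector std are all hypothetical, and how an informant born exactly in 1930 is classified is likewise an assumption.

```r
# Toy responses: one row per informant answer (all names and values hypothetical)
resp <- data.frame(
  Location  = c("A", "A", "A", "A", "A", "B", "B"),
  Concept   = "c1",
  BirthYear = c(1910, 1920, 1925, 1950, 1960, 1915, 1955),
  Form      = c("form2", "form2", "form1", "form1", "form1", "form2", "form2"),
  stringsAsFactors = FALSE
)
std <- c(c1 = "form1")   # assumed standard Italian form for each concept

# Age group: old = born before 1930 (treatment of the boundary year is an assumption)
resp$IsOld <- resp$BirthYear < 1930

# Most frequent form within a group; ties go to the first form in table order
majority <- function(forms) names(which.max(table(forms)))

# One case per location x concept x age group, carrying the majority form
cases <- aggregate(resp["Form"],
                   by = resp[c("Location", "Concept", "IsOld")],
                   FUN = majority)

# 1 = majority form differs from standard Italian, 0 = same
cases$NotStd <- as.integer(cases$Form != unname(std[as.character(cases$Concept)]))
print(cases)
```

Each row of cases then corresponds to one case in the sense used above: a location, a concept and an age group with a binary outcome.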
Geographic distribution of locations
[Figure: map of the locations; surviving map labels: S, F, P]
Material: additional data
In addition, we obtained the following information:
  Number of inhabitants in each location
  Average income in each location
  Average age in each location
  Frequency of each concept
Modeling geography’s influence with a GAM
# logistic regression: family="binomial"
> library(mgcv) # current version 1.7-17
> geo = gam(NotStd ~ s(Lon,Lat), data=tusc, family="binomial", method="ML")
> vis.gam(geo,view=c("Lon","Lat"),plot.type="contour",color="terrain",...)
[Figure: contour plot of the fitted geographic smooth; x-axis: Longitude (10.0 to 12.0), y-axis: Latitude (42.5 to 44.0); contour levels range from −0.5 to 0.1]
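Some contour levels are negative, so they cannot be probabilities; they are on the scale of the linear predictor (log-odds), which is what vis.gam plots by default (type = "link"). A small sketch of converting such levels to probabilities of a non-standard form, using the levels read off the plot:

```r
# Contour levels as they appear in the plot (logit scale)
levels_logit <- c(-0.5, -0.4, -0.3, -0.2, -0.1, 0, 0.1)

# Inverse logit: probability that the lexical form differs from standard Italian
probs <- plogis(levels_logit)
print(round(probs, 3))
```

A level of 0 corresponds to a probability of exactly 0.5. Passing type = "response" to vis.gam would draw the surface on the probability scale directly.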
Adding a random intercept to create a GAMM
# bam is quicker and less memory-intensive than gam (here 0.3 GB vs. 1.8 GB)
> model = bam(NotStd ~ s(Lon,Lat) + s(Concept,bs="re"),
R-sq.(adj) = 0.323   Deviance explained = 27.3%
ML score = 97869   Scale est. = 1   n = 69259
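The following self-contained sketch shows a geographic smooth combined with a by-concept random intercept in bam. It runs on simulated data and does not reconstruct the call shown in the slides; the data frame d, the number of locations and concepts, the basis dimension k = 15 and all effect sizes are assumptions.

```r
library(mgcv)

set.seed(3)
n_loc <- 40
n_con <- 25
lon <- runif(n_loc, 10, 12)
lat <- runif(n_loc, 42.5, 44)

# One row per location x concept combination (simulated)
d <- expand.grid(Loc = seq_len(n_loc),
                 Concept = factor(paste0("c", seq_len(n_con))))
d$Lon <- lon[d$Loc]
d$Lat <- lat[d$Loc]

# Assumed truth: a north-south gradient plus a concept-specific intercept
u   <- rnorm(n_con, sd = 1)
eta <- -0.3 + 0.8 * (d$Lat - 43.25) + u[as.integer(d$Concept)]
d$NotStd <- rbinom(nrow(d), 1, plogis(eta))

# s(Concept, bs = "re") adds one penalized intercept adjustment per concept
m <- bam(NotStd ~ s(Lon, Lat, k = 15) + s(Concept, bs = "re"),
         data = d, family = "binomial", method = "ML")

# Estimated by-concept adjustments (on the logit scale)
re <- coef(m)[grep("Concept", names(coef(m)))]
print(round(head(re), 2))
print(summary(m)$s.table)
```

The random-intercept term appears in the summary as an ordinary smooth, which is the sense in which random effects are treated as smooths.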
Varying geography’s influence based on concept freq.
Wieling, Nerbonne and Baayen (2011, PLoS ONE) showed that the effect of word frequency varied depending on geography
Here we explicitly include this in the GAMM and we also allow for variation per age group
> m = bam(NotStd ~ te(Lon, Lat, Freq, by=IsOld, d=c(2,1)) + IsOld + CommSize +
          ..., data=tusc, family="binomial", method="ML")
The results will be discussed next...
Note that the exact random-effect structure is subject to change: recent changes to the mgcv package affect the p-value calculations for random-effect smooths (use version ≥ 1.7-17)
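A sketch of the te() construction on simulated data may help in reading the call. In te(), d = c(2, 1) declares that the first two covariates (Lon, Lat) form one two-dimensional marginal smooth and the third (Freq) a one-dimensional marginal; the tensor product lets the effect of frequency vary over geography. The data, the explicit marginal bases bs = c("tp", "cr"), the basis sizes k = c(10, 5), and the coding of IsOld as a factor (which gives one surface per age group) are assumptions of this sketch, not details taken from the slides.

```r
library(mgcv)

set.seed(4)
n <- 2000
d <- data.frame(Lon   = runif(n, 10, 12),
                Lat   = runif(n, 42.5, 44),
                Freq  = rnorm(n),
                IsOld = factor(sample(c(0, 1), n, replace = TRUE)))

# Assumed truth: the frequency effect changes sign from south to north
eta <- 0.5 * (d$IsOld == "1") + (d$Lat - 43.25) * d$Freq
d$NotStd <- rbinom(n, 1, plogis(eta))

# With a factor 'by' variable each level gets its own centered surface,
# so the main effect of IsOld is included as a separate parametric term
m <- bam(NotStd ~ te(Lon, Lat, Freq, by = IsOld, d = c(2, 1),
                     bs = c("tp", "cr"), k = c(10, 5)) + IsOld,
         data = d, family = "binomial", method = "ML")

print(summary(m)$s.table)   # one row per age-group surface
```

If IsOld were coded as a numeric 0/1 variable instead, the by argument would multiply a single surface by that value rather than fit one surface per group.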
Discussion
Using a generalized additive mixed model (GAMM) to investigate lexical differences between standard Italian and Tuscan dialects revealed interesting dialectal patterns
  GAMs are very suitable to model the non-linear influence of geography
  The regression approach allowed for the simultaneous identification of important social, geographical and lexical predictors
  By including many concepts, results are less subjective than traditional analyses focusing on only a few pre-selected concepts
  The mixed-effects regression approach still allows a focus on individual concepts
There are some drawbacks to GAMMs, however...
  gam and bam are computationally much more expensive than linear mixed-effects modeling using lmer (lme4 package)
  Model comparison is problematic when including random-effect smooths (i.e. using anova(gam1,gam2) is useless)
Some advertisements
More information about applying GAMMs in LVLC research:
Martijn Wieling (2012). A Quantitative Approach to Social and Geographical Dialect Variation. PhD thesis, Rijksuniversiteit Groningen.
  Available at http://www.martijnwieling.nl/phd
Workshop on Quantitative Linguistics and Dialectology: June 29, 2012
  Speakers: Mark Liberman, Harald Baayen, Roeland van Hout, Robert G. Shackleton, Jack Chambers, Piet van Reenen, Simonetta Montemagni, Charlotte Gooskens and Wilbert Heeringa
  Registration: http://www.martijnwieling.nl/workshop