Page 1
For Peer Review
Metric structure and rhyme predictability modulate speech intensity during child-directed reading
Journal: Language and Speech
Manuscript ID Draft
Manuscript Type: Original Article
Date Submitted by the Author: n/a
Complete List of Authors: Fitzroy, Ahren; Mount Holyoke College, Psychology and Education; University of Massachusetts, Amherst, Psychological and Brain Sciences
Breen, Mara; Mount Holyoke College, Psychology and Education
Keywords: speech prosody, intensity, meter, rhyme, child-directed reading
https://mc.manuscriptcentral.com/las
Language and Speech
RUNNING HEAD: METER AND RHYME MODULATE SPEECH INTENSITY
Metric structure and rhyme predictability modulate speech intensity during child-directed reading
Ahren B. Fitzroy¹,² & Mara Breen¹
¹Department of Psychology and Education, Mount Holyoke College
²Department of Psychological and Brain Sciences, University of Massachusetts, Amherst
Corresponding author:
Ahren B. Fitzroy, Ph.D.
Mount Holyoke College
50 College Street
South Hadley, MA 01075
USA
E-mail: [email protected]
Abstract
Temporal and phonological predictability in children’s literature may support early literacy
acquisition. Because such texts are typically heard before they are read, realization of predictive
structure in caregiver prosody could guide children’s attention during shared reading in a manner
that supports reading subskill development. However, little is known about how predictive
structure is realized in prosody during child-directed reading. We investigated whether speakers
use word intensity to signal predictive metric and rhyme structure in a corpus of child-directed
and read-alone productions of The Cat in the Hat (Dr. Seuss, 1957). Using linear mixed-effects
regression, we modeled the maximum intensity (dB) of each produced monosyllabic word as a
function of metric strength, rhyme predictability, and a set of control parameters. In the control
model, intensity increased with lower lexical frequency, capitalization, first mention, and
syntactic boundary likelihood. Beyond these control factors, metric structure predicted word
intensity hierarchically. Words were produced with the highest intensity when aligned with beat
one of a 6/8 metric structure, intermediate intensity when aligned with beat four, and the lowest
intensity when aligned with all other beats. Additionally, phonologically predictable rhyme
targets were produced with reduced intensity. Together, these results demonstrate that
predictability is encoded in word intensity during child-directed reading, and that metric
structure is realized hierarchically in word intensity. Further, the manner in which predictability
is encoded in word intensity differs from that previously reported for word duration in this same
corpus (Breen, 2018). This demonstrates that intensity and duration constitute non-identical
prosodic information channels.
Metric structure and rhyme predictability modulate speech intensity during child-directed reading
Two common features of children’s literature across languages are the presence of a
strong metrical framework and the use of rhyming phrases (Burling, 1966; Hanna, Lindner, &
Dufter, 2002). The temporal and phonological predictability created by these features may
support the purported benefits of these texts for literacy acquisition (Goswami, 1999; Huss,
Verney, Fosker, Mead, & Goswami, 2011); early children’s literature is typically heard before it
is read, and temporal and phonological predictability guide the allocation of attention to
important moments in the auditory stream (Astheimer & Sanders, 2009, 2011; Breen, Dilley,
McAuley, & Sanders, 2014; Fitzroy & Sanders, 2015). However, little is empirically known
about how meter and rhyme in children’s literature are instantiated in spoken productions of
these texts, which is a necessary precursor to understanding how such structures might guide
perceptual or literacy learning in young listeners. Breen (2018) recently demonstrated that the
metric and rhyme structures of The Cat in the Hat (Dr. Seuss, 1957) are realized in the duration
and temporal spacing of words in child-directed productions of the text. However, prominence in
speech is imparted through several prosodic cues, with duration and intensity typically thought to
be the most important (Brenier, Cer, & Jurafsky, 2005; Kochanski, Grabe, Coleman, & Rosner,
2005; Kochanski & Orphanidou, 2008; Silipo & Greenberg, 2000). Therefore, in the present
paper we examine the effects of meter and rhyme in The Cat in the Hat on the produced intensity
of words during child-directed reading, and consider the results in light of previous findings of
impacts on word duration (i.e., Breen, 2018).
Multiple studies have demonstrated that increased intensity imparts prominence in
speech. Increased intensity for more prominent syllables has been observed for isolated and
continuous speech in which prominence is subjectively assessed by a listener (Brenier et al.,
2005; Kochanski et al., 2005; Silipo & Greenberg, 2000; Streefkerk, Pols, & Bosch, 1999), and
for metronome-synchronous speech in which prominence is objectively defined as syllabic
alignment with a metronome click (Boutsen, Brutten, & Watts, 2000; Kochanski & Orphanidou,
2008). Intensity variation similarly signals metric structure in expressive musical
performance, with greater intensity indicating greater metric strength. Importantly, intensity in
expressive music performance imparts metric strength in a hierarchical manner: metric structures
that involve three levels of strength (e.g., 6/8 meter) are produced with three corresponding
levels of intensity (Drake & Palmer, 1993). It is unclear whether intensity in continuous speech
imparts metric prominence in a similarly hierarchical manner or only in a binary manner. The
strongly hierarchical metric structure of The Cat in the Hat is reflected in hierarchically produced
word duration (Breen, 2018), suggesting that it may also be signaled through hierarchical
modulation of produced word intensity.
Predictability in speech also modulates the production of words, as more predictable
words are typically acoustically reduced relative to less predictable words (Aylett & Turk, 2004;
Gregory, Raymond, Bell, Fosler-Lussier, & Jurafsky, 1999; Jurafsky, Bell, Gregory, & Raymond,
2001). Moreover, predictability-related reduction is preferentially realized in word intensity,
such that more predictable words are produced with lower intensity regardless of whether they
are repeated in the local context (Lam & Watson, 2010). In contrast, local word repetition is
preferentially associated with shortened word duration (Lam & Watson, 2010). Most stanzas in
The Cat in the Hat are written such that the first line ends with a rhyme prime (e.g., “all” in (1)),
and the second line ends with a rhyme target (e.g., “fall” in (1)). This structure causes the stanza-
final rhyme target to be both phonologically and semantically predictable, which should cause
the rhyme targets to be reduced during production. Breen (2018) observed equivalent word
duration but lengthened inter-word-onset intervals for predictable rhyme targets relative to
rhyme primes in The Cat in the Hat, which may be interpreted as a reduction of the rhyme target.
However, the temporal characteristics alone do not conclusively support this interpretation.
Given that predictability-related reduction is preferentially realized in word intensity (Lam &
Watson, 2010), we predict that word intensity will be reduced for rhyme targets relative to rhyme
primes in The Cat in the Hat.
In the current study, we test the hypotheses that metric structure and rhyme predictability
in The Cat in the Hat are realized in produced word intensity during child-directed reading. We
assess the influence of metric strength and rhyme predictability on word intensity in a corpus of
child-directed and read-alone productions of The Cat in the Hat (Breen, 2018), while controlling
for intrinsic word characteristics, font emphasis, local repetition, and syntactic boundary
strength. We predict that metrically
stronger words will be produced with higher intensity, and that, similar to prior findings in
expressive musical performance, hierarchical metric structure will be realized as hierarchically
produced word intensity. We further predict that predictable rhyme targets will be produced with
reduced intensity relative to rhyme primes.
Method
Participants
Eighteen young adult Mount Holyoke College students (aged 18-35 years) participated in the
current study. All participants identified as female and as native speakers of American English,
defined as speaking English in the United States since at least age five. One participant was later
determined not to be a native speaker of American English and was withdrawn from the study,
leaving 17 participants in the final analyses. All participants were compensated for their time
with research participation credit for Psychology courses.
Stimuli
Participants read The Cat in the Hat (Dr. Seuss, 1957) aloud from a hardcover copy of the
text. The Cat in the Hat is a 61-page illustrated book written primarily in anapestic (i.e., weak-
weak-STRONG) tetrameter. It consists of 1625 words (1576 monosyllabic, 236 unique lexemes)
organized primarily into seventy stanzas, with each stanza containing two lines of four anapests
each (as in (1)). The first line in each stanza ends with a rhyme prime and the second line ends
with a phonologically predictable rhyme target. In addition, there are six single anapestic lines in
the text. In two cases, these single lines rhyme internally (e.g., And then something went BUMP!
How that bump made us jump!). In the other four cases, the final word of the single line rhymes
with the preceding stanza.
(1) “Put me down!” said the fish.
“This is no fun at all!
Put me down!” said the fish.
“I do NOT wish to fall!”
Procedure
Participants were randomly assigned to read The Cat in the Hat aloud in its entirety in
one of two locations: to an audience of preschool students (4-5 years old) in a quiet classroom
(n = 8), or alone in a quiet room (n = 9). In both environments, participants read the book while
seated and holding the book in their hands, and turned the pages on their own. All participants’
productions were digitally recorded using a head-mounted microphone (Shure SM10A)
connected to a pre-amplifier (Rolls MP13 Mini-mic) plugged into the motherboard audio input
of a laptop (in the classroom) or desktop computer (in the quiet room). Production recordings
were digitized at 44,100 Hz with 16-bit resolution. Participants were given no specific
instructions on how to read the text, other than to read it aloud. All participants provided
informed consent before taking part in the study, and parents of the children who listened to the
readings provided written consent for their child’s participation.
Acoustic measures
[FIGURE 1 HERE]
Figure 1 – Word intensity measurement. An excerpt from one production of The Cat in the Hat is
plotted as a time-frequency domain spectrogram (top) and as a time domain waveform (bottom).
Identified word and silence boundaries are plotted as dashed vertical lines. The smoothed
intensity contour generated for this excerpt is plotted in black over the spectrogram, with the
parabolically-interpolated maximum intensity for each word indicated with an asterisk.
Word and silence boundaries (Figure 1) were identified in the production recordings by
automatic forced alignment with the text in Praat (Boersma & Weenink, 2001). Alignment used the
Prosodylab-Aligner (Gorman, Howell, & Wagner, 2011), which relies on the Hidden Markov
Model Toolkit (Young & Young, 1993). Forced-alignment results were manually
inspected and corrected as needed. As shown in Figure 1, linearly-spaced intensity (dB SPL)
contours were calculated automatically in Praat for each production recording. Intensity contours
were computed with a minimum pitch setting of 100 Hz. Praat squares the waveform sample values and
convolves them with a Gaussian analysis window with an effective duration of 32 ms (3.2 divided by
the minimum pitch). This keeps pitch-synchronous intensity variations to less than 0.00001 dB
(Boersma & Weenink, 2001). Word intensity was defined as
the parabolically interpolated maximum of the intensity contour occurring within each word.
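The peak-picking step can be illustrated with a short Python sketch. It assumes an intensity contour already sampled at equal time steps (for example, frames exported from Praat) and word boundaries given in seconds. The function names are hypothetical, and Praat's own maximum query with parabolic interpolation may treat frames at the interval edges differently. The method fits a parabola through the highest frame inside the word and its two neighbours; the vertex of that parabola gives the interpolated peak time and level.

```python
from typing import List, Sequence, Tuple


def parabolic_vertex(y_prev: float, y_mid: float, y_next: float) -> Tuple[float, float]:
    """Vertex of the parabola through three equally spaced samples.

    Returns (offset, value). The offset is in units of the sample step,
    relative to the middle sample; it lies in [-0.5, 0.5] when the middle
    sample is a local maximum.
    """
    curvature = y_prev - 2.0 * y_mid + y_next
    if curvature == 0.0:
        # Collinear samples (no curvature): keep the sampled value.
        return 0.0, y_mid
    offset = 0.5 * (y_prev - y_next) / curvature
    value = y_mid - 0.25 * (y_prev - y_next) * offset
    return offset, value


def word_max_intensity(times: Sequence[float], db: Sequence[float],
                       t_start: float, t_end: float) -> Tuple[float, float]:
    """Time (s) and level (dB) of the interpolated intensity peak in one word."""
    frames: List[int] = [i for i, t in enumerate(times) if t_start <= t <= t_end]
    if not frames:
        raise ValueError("no intensity frames fall inside the word interval")
    peak = max(frames, key=lambda i: db[i])
    # Interpolate only when both neighbours of the peak frame lie inside the word;
    # at the word edges, fall back to the raw frame value.
    if frames[0] < peak < frames[-1]:
        step = times[peak + 1] - times[peak]
        offset, value = parabolic_vertex(db[peak - 1], db[peak], db[peak + 1])
        return times[peak] + offset * step, value
    return times[peak], db[peak]


if __name__ == "__main__":
    # Synthetic contour (illustrative values only): an exact parabola peaking
    # at 0.123 s and 70 dB, sampled every 10 ms.
    times = [k * 0.01 for k in range(30)]
    db = [70.0 - 1000.0 * (t - 0.123) ** 2 for t in times]
    print(word_max_intensity(times, db, 0.05, 0.25))
```

On this synthetic contour, the sketch should recover a peak very close to 0.123 s and 70 dB, which falls between the 0.12 s and 0.13 s frames.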
Multisyllabic words were excluded from further analyses, because the unstressed syllables in
these words have reduced intensities for reasons unrelated to metric structure (Fry, 1955).
Disfluent and incorrect word productions were manually identified and excluded from
further analyses, resulting in a loss of 473 out of 26,792 possible monosyllabic word productions
(1.77%). The maximum intensity values of the remaining correctly produced monosyllabic words
were centered and scaled to standard deviation units (i.e., Z-transformed) separately for each
participant.
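Per-participant standardization can be sketched in Python as follows. This is an illustration, not the authors' R scripts. It assumes the sample standard deviation (n - 1 denominator), which is what R's scale() uses. The participant labels and dB values in the example are invented.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def zscore_by_participant(rows: List[Tuple[str, float]]) -> List[float]:
    """Z-transform values within each participant, preserving input order.

    rows holds (participant_id, max_intensity_db) pairs. Each participant
    needs at least two values, because the sample SD (n - 1) is used.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for pid, value in rows:
        groups[pid].append(value)
    # Per-participant mean and sample standard deviation.
    params = {pid: (mean(vals), stdev(vals)) for pid, vals in groups.items()}
    return [(value - params[pid][0]) / params[pid][1] for pid, value in rows]


if __name__ == "__main__":
    rows = [("P01", 68.0), ("P01", 72.0), ("P01", 70.0),
            ("P02", 60.0), ("P02", 64.0), ("P02", 62.0)]
    print(zscore_by_participant(rows))  # [-1.0, 1.0, 0.0, -1.0, 1.0, 0.0]
```

In the example, the second speaker is 8 dB quieter overall, yet both speakers' words receive identical scores after standardization. Overall level differences between speakers are therefore removed before modeling.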
Data analysis
We analyzed the standardized maximum intensity of correctly produced monosyllabic
words using linear mixed-effects regression modeling implemented in the lme4 package (Bates,
Mächler, Bolker, & Walker, 2014) within R (R Core Team, 2015; RStudio Team, 2014). Data
and analysis scripts can be downloaded from https://osf.io/xgy4t. We predicted maximum word
intensity as a function of metric strength, rhyme predictability, linguistic control factors (number
of phonemes, lexical frequency, word class, font emphasis, intra-stanza repetition, and syntactic
boundary strength), and presence of a child audience.
[FIGURE 2 HERE]
Figure 2 – Regression predictors. Word intensity was modeled using linear mixed-effects
regression with within-subjects factors of metric strength in a 6/8 metric structure (MS), rhyme
predictability (RP), number of phonemes (#P), lexical frequency (LF), word class (WC), font
emphasis (FE), intra-stanza repetition (ISR), and syntactic boundary strength (SBS). Lexical
frequency values are rounded to the nearest tenth for clarity. See text for details.
Metric strength was defined by beat position in a 6/8 metric hierarchy (Figure 2), with
beat 1 of each measure assigned metric strength 3 (strong), beat 4 assigned metric strength 2
(medium), and beats 2, 3, 5, and 6 assigned metric strength 1 (weak). The first stressed syllable
in each stanza was considered beat 1 of the first measure of that stanza. This metric parsing agrees
with previous linguistic descriptions of The Cat in the Hat as anapestic tetrameter in terms of
stress placement (Nel, 2004). However, it adopts a musical perspective on metric phase: the first
stressed event in a phrase is taken to begin a larger metric unit (i.e., a measure), and any
preceding unstressed events are considered anacrustic (Lerdahl &
Jackendoff, 1983). Rhyme predictability was coded as 1 for verse-final rhyme target words (e.g.,
“fall” in (1)) and 0 for all other words.
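The metric-strength and rhyme codes described above can be sketched in Python. The sketch assumes an idealized stanza in which every stressed syllable falls on beat 1 or beat 4 of a 6/8 measure (two anapests per measure). Under that assumption, stresses alternate between beats 1 and 4 from the first stressed syllable onward. Because only monosyllabic words are analyzed, each word is treated as one syllable. Real lines in The Cat in the Hat need not be this regular, and the study's stress annotations are not reproduced here. The function names, and the beat-0 label for anacrustic syllables, are hypothetical.

```python
from typing import List, Sequence, Tuple

# Beat 1 -> strong (3), beat 4 -> medium (2); every other position -> weak (1).
STRENGTH = {1: 3, 4: 2}


def code_stanza(stressed: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (beat, metric_strength) for each syllable of one stanza.

    Assumes stresses alternate between beat 1 and beat 4, starting with
    beat 1 at the first stressed syllable; unstressed syllables after a
    stress take the following beats (2-3 or 5-6). Unstressed syllables
    before the first stress are anacrusis: labeled beat 0 (a hypothetical
    placeholder) and coded weak.
    """
    out: List[Tuple[int, int]] = []
    n_stresses = 0
    beat = 0
    for is_stressed in stressed:
        if is_stressed:
            beat = 1 if n_stresses % 2 == 0 else 4
            n_stresses += 1
        elif n_stresses > 0:
            beat += 1
        out.append((beat, STRENGTH.get(beat, 1)))
    return out


def code_rhyme_predictability(n_words: int) -> List[int]:
    """1 for the stanza-final word (the rhyme target), 0 for all others."""
    if n_words < 1:
        raise ValueError("a stanza needs at least one word")
    return [0] * (n_words - 1) + [1]


if __name__ == "__main__":
    # "Put me down!" said the fish.  Stress marked on "down" and "fish" (illustrative).
    print(code_stanza([False, False, True, False, False, True]))
    # [(0, 1), (0, 1), (1, 3), (2, 1), (3, 1), (4, 2)]
```

"Put" and "me" precede the first stress and are therefore anacrustic. Under this parsing, "down" falls on beat 1 (strong) and "fish" on beat 4 (medium).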
To account for effects of basic word and text characteristics on produced intensity,
several linguistic control factors were included in the regression models (Figure 2). The 1576
monosyllabic words in The Cat in the Hat were annotated using the Medical Research Council
(MRC) Psycholinguistic Database (Coltheart, 1981) for number of phonemes (range = 1-6, M =
2.54, SD = 0.75) and lexical frequency as assessed by Kučera-Francis (K-F; Francis & Kučera,
1982) norms (log-transformed K-F frequency; range = 0-11.16, M = 7.37, SD = 2.31). For words
that did not have raw K-F frequencies (n = 18), the K-F frequency of the singular or infinitival
form of the word was substituted if available. There was no K-F frequency available for the word
plop, so it was excluded from further analyses. The analyzed words were also annotated for
syntactic word class (open [n = 540], closed [n = 1035]), font emphasis (normal font [n = 1550],
SMALL CAPS [n = 25]), intra-stanza repetition (not repeated [n = 1243], repeated [n = 332]), and
syntactic boundary strength as defined by the Left-Right Boundary model (LRB; Breen, Watson,
& Gibson, 2011; Watson & Gibson, 2004; range = 0-5, M = 1.24, SD = 2.00). For a detailed
description of how the LRB model was applied, see Breen (2018). In brief, the LRB model assigns a
higher probability of a phrase boundary as the size of the material a speaker has just produced
(the left-hand side) increases, and as the size of what they are going to produce
(the right-hand side) increases.
We first created a fully saturated control model containing all linguistic control factors as
both fixed effects and random slopes over participant, and audience as a fixed effect. Before
being entered into the model as either a fixed effect or a random slope, continuous predictors
were centered and categorical predictors were sum-coded so that the codes for each contrast
summed to zero. We then refined the control model by removing fixed effects one at a time in order
of ascending |t|. For each removal, a likelihood ratio test comparing model fit with and without
that fixed effect determined whether it was retained. The control model included only fixed effects
that significantly increased model fit. We then added metric strength to the regression model as
both a fixed effect and a random slope over participant, coded using simple contrast coding with
strength level 2 (medium; beat 4) as the reference level. This model did not converge, so we
iteratively removed the random slopes explaining the least variance until the model converged.
Next, we used a likelihood ratio test to compare the resulting metric strength model with an
updated version of the control model containing only those random slopes included in the metric
strength model. This comparison determined whether metric strength predicted word intensity variance
beyond the control parameters. Lastly, we added
rhyme predictability to the model as a fixed effect and random slope over participant, and
compared the result to the prior model containing control factors and metric strength using a
likelihood ratio test to determine whether rhyme predictability accounted for unique word
intensity variance.
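The coding scheme described above can be illustrated with a small Python sketch. For a k-level factor, simple contrast coding equals treatment (dummy) coding minus 1/k. In R, contr.treatment(3, base = 2) - 1/3 gives the matrix printed below for a three-level factor whose second level is the reference. Each coefficient then estimates the difference between one level and the reference, and the intercept estimates the unweighted mean of the level means. The text states only that the codes for binary predictors sum to zero, so the ±0.5 scale used here is an assumption. All function names are illustrative.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def simple_contrasts(levels: Sequence[str],
                     reference: str) -> Tuple[List[str], Dict[str, List[Fraction]]]:
    """Simple contrast codes: treatment coding minus 1/k.

    Each column compares one non-reference level with the reference level,
    and each column sums to zero across levels.
    """
    k = len(levels)
    others = [lv for lv in levels if lv != reference]
    codes = {lv: [Fraction(int(lv == other)) - Fraction(1, k) for other in others]
             for lv in levels}
    return others, codes


def sum_code(flag: bool) -> float:
    """Binary predictor coded -0.5 / +0.5 (assumed scale; only the zero sum is given)."""
    return 0.5 if flag else -0.5


def center(values: Sequence[float]) -> List[float]:
    """Subtract the mean from a continuous predictor."""
    m = sum(values) / len(values)
    return [v - m for v in values]


if __name__ == "__main__":
    columns, codes = simple_contrasts(["beat 1", "beat 4", "beats 2,3,5,6"],
                                      reference="beat 4")
    print(columns)  # ['beat 1', 'beats 2,3,5,6']
    for level, row in codes.items():
        print(level, [str(c) for c in row])
    # beat 1 ['2/3', '-1/3']
    # beat 4 ['-1/3', '-1/3']
    # beats 2,3,5,6 ['-1/3', '2/3']
```

Suppose the intercept and contrast coefficients are b0, b1 and b2. The implied beat 1 mean is then b0 + (2/3)b1 - (1/3)b2, and the beat 4 mean is b0 - (1/3)b1 - (1/3)b2. Their difference is b1, which is why the "beat 1 vs. beat 4" estimate can be read directly as a difference from beat 4.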
Results
Control model
The control model included number of phonemes, lexical frequency, word class, font
emphasis, intra-stanza repetition, and syntactic boundary strength as random slopes over
participant, and lexical frequency, font emphasis, intra-stanza repetition, and syntactic boundary
strength as fixed effects (Table 1). Audience presence did not affect produced word
intensity (p > 0.8) and was not included in the final control model. Similarly, number of
phonemes and word class did not warrant inclusion in the control model as fixed effects
(p’s > 0.3). Across individuals, higher word intensity was predicted by lower lexical frequency
(β = -0.133, SE = 0.006, t = -22.542), presentation in SMALL CAPS (β = 0.784, SE = 0.065,
t = 12.101), and not being a repeat of recent material (β = -0.164, SE = 0.022, t = -7.567).
It was also marginally predicted by higher likelihood of being a syntactic boundary (β = 0.015,
SE = 0.008, t = 1.888; likelihood ratio test p = 0.067, Table 1).
Random effects over participant   Variance     SD
Intercept                         0.0014615    0.03823
Number of phonemes¹               0.0022393    0.04732
Lexical frequency¹                0.0009228    0.03038
Word class                        0.0076330    0.08737
Font emphasis                     0.0437573    0.20918
Intra-stanza repetition¹          0.0048553    0.06968
Syntactic boundary strength¹      0.0015851    0.03981
Fixed effects                     β estimate   SE          t          χ² (p)
Intercept                          0.021509    0.007745     2.777     -
Lexical frequency                 -0.132559    0.005881   -22.542     58.354 (< 0.001)
Font emphasis                      0.784229    0.064807    12.101     38.876 (< 0.001)
Intra-stanza repetition           -0.163933    0.021664    -7.567     37.551 (< 0.001)
Syntactic boundary strength        0.014507    0.007638     1.888      3.358 (0.067)
Table 1 – Control model. Variance estimates and standard deviations are shown for each
by-participant random effect (intercept and random slopes) included in the control model.
β estimates, standard errors, and t-values are shown for each fixed effect justified for inclusion
in the control model, along with chi-square and p values from the likelihood ratio tests justifying
its inclusion. ¹Note that the random slopes over participant for number of phonemes, lexical
frequency, intra-stanza repetition, and syntactic boundary strength were later removed from the
control model to allow direct comparison with the experimental model containing metric strength.
Experimental models
[FIGURE 3 HERE]
Figure 3 – Fixed effect estimates (β) from the final experimental model. Metric strength and rhyme
predictability estimates are highlighted in black; linguistic control factors are shown in grey.
Thicker horizontal bars indicate one standard error; thinner horizontal bars indicate two standard
errors.
As shown in Table 2 and Figures 3 and 4, position in a 6/8 metric hierarchy predicted
word intensity such that beat 1 was produced with highest intensity, beat 4 was produced with
intermediate intensity, and beats 2, 3, 5, and 6 were produced with similarly low intensity (beat 1
vs. beat 4: β = 0.322, SE = 0.058, t = 5.536; beats 2, 3, 5, 6 vs. beat 4: β = -0.191, SE = 0.054, t =
-3.554). Including metric strength in the regression model significantly improved model fit
relative to an updated version of the control model containing the same non-meter factors as
fixed effects and random slopes over participant, χ² (11) = 1157.60, p < 0.001. Therefore, metric
strength provides explanatory power for produced word intensity beyond that explained by
intrinsic and contextual word characteristics.
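The degrees of freedom reported for these likelihood ratio tests can be reproduced by counting parameters. The sketch below assumes that the by-participant random effects were fitted with an unstructured (fully correlated) covariance matrix, which is lme4's default for a term such as (1 + x | participant). Under that assumption, q correlated random effects contribute q(q + 1)/2 variance-covariance parameters. The reduced model has three random effects (intercept, word class, font emphasis). Adding the two metric contrasts as fixed effects and random slopes therefore adds 2 + (15 - 6) = 11 parameters, matching χ² (11). Adding rhyme predictability adds 1 + (21 - 15) = 7, matching χ² (7). The p-value helper uses a closed-form recurrence for the chi-square upper tail with integer df in place of a statistics library, and all function names are illustrative. In lme4 itself, anova() on two fitted models performs this test, and by default it refits REML fits by maximum likelihood first.

```python
import math


def n_cov_params(q: int) -> int:
    """Variance-covariance parameters for q correlated random effects."""
    return q * (q + 1) // 2


def lrt_df(fixed_added: int, q_reduced: int, q_full: int) -> int:
    """Degrees of freedom for comparing two nested mixed models."""
    return fixed_added + n_cov_params(q_full) - n_cov_params(q_reduced)


def chisq_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with integer df.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) for the
    regularized upper incomplete gamma Q, with y = x / 2 and s = df / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """Return (chi-square statistic, p-value) for two nested ML fits."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chisq_sf(stat, df)


if __name__ == "__main__":
    print(lrt_df(fixed_added=2, q_reduced=3, q_full=5))  # 11 (metric strength)
    print(lrt_df(fixed_added=1, q_reduced=5, q_full=6))  # 7 (rhyme predictability)
    print(round(chisq_sf(3.358, 1), 3))                  # 0.067 (Table 1, last row)
```

The last line recovers the p = 0.067 reported for syntactic boundary strength in the control model. This assumes, as seems implied, that each single-predictor removal is a 1-df test.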
Random effects over participant   Variance     SD
Intercept                         0.004285     0.06546
Beat 1 vs. beat 4                 0.058568     0.24201
Beats 2, 3, 5, 6 vs. beat 4       0.041924     0.20475
Word class                        0.020477     0.14310
Font emphasis                     0.051992     0.22802
Fixed effects                     β estimate   SE          t
Intercept                          0.138004    0.012368    11.158
Beat 1 vs. beat 4                  0.322274    0.058215     5.536
Beats 2, 3, 5, 6 vs. beat 4       -0.191348    0.053841    -3.554
Lexical frequency                 -0.115302    0.003829   -30.113
Font emphasis                      0.623397    0.071589     8.708
Intra-stanza repetition           -0.158835    0.014046   -11.308
Syntactic boundary strength        0.002515    0.003936     0.639
Table 2 – Initial experimental model containing metric strength. Variance estimates and standard
deviations are shown for each by-participant random effect (intercept and random slopes) included
in the initial experimental model examining metric strength. β estimates, standard errors, and
t-values are shown for each fixed effect included in this model.
[FIGURE 4 HERE]
Figure 4 – Maximum word intensity as a function of position in a 6/8 metric hierarchy and
rhyme predictability. Produced word intensity increased hierarchically with metric strength (left),
and predictable rhyme targets were produced with lower intensity than other words, including
phonologically similar but unpredictable rhyme primes (right). Note that all words that were not
rhyme targets were coded as unpredictable in the regression models, but only unpredictable
rhyme primes are shown here for clarity.
As shown in Table 3 and Figures 3 and 4, rhyme predictability predicted word intensity
such that predictable rhyme targets (e.g., “fall” in (1)) were produced with lower intensity than
other words, including metrically and phonologically similar rhyme primes (β = -0.434, SE =
0.086, t = -5.030). Adding rhyme predictability to the regression model significantly improved
model fit relative to the model containing final control parameters and metric strength, χ² (7) =
290.73, p < 0.001, demonstrating that rhyme predictability provides explanatory power for
produced word intensity beyond that explained by intrinsic and contextual word characteristics
and metric structure.
Random effects over participant   Variance     SD
Intercept                         0.003865     0.06217
Beat 1 vs. beat 4                 0.035875     0.18941
Beats 2, 3, 5, 6 vs. beat 4       0.021121     0.14533
Rhyme predictability              0.111544     0.33398
Word class                        0.022192     0.14897
Font emphasis                     0.058998     0.24290
Fixed effects                     β estimate   SE          t
Intercept                          0.175713    0.012311    14.272
Beat 1 vs. beat 4                  0.204102    0.041646     4.901
Beats 2, 3, 5, 6 vs. beat 4       -0.302707    0.040018    -7.564
Rhyme predictability              -0.434422    0.086374    -5.030
Lexical frequency                 -0.113800    0.003822   -29.776
Font emphasis                      0.608770    0.074313     8.192
Intra-stanza repetition           -0.159876    0.013965   -11.448
Syntactic boundary strength        0.011017    0.003962     2.781
Table 3 – Final experimental model containing metric strength and rhyme predictability.
Variance estimates and standard deviations are shown for each by-participant random effect
(intercept and random slopes) included in the final experimental model examining both metric
strength and rhyme predictability. β estimates, standard errors, and t-values are shown for each
fixed effect included in this model.
Discussion
As predicted, both metric structure and rhyme predictability in The Cat in the Hat
modulated word intensity during read-aloud productions of the text. Greater metric strength
resulted in greater produced word intensity hierarchically, such that the metrically
strongest words were produced with highest intensity, metrically intermediate words were
produced with intermediate intensity, and metrically weak words were produced with lowest
intensity. Rhyme predictability resulted in clear acoustic reduction, as rhyme targets were
produced with lower intensity than other words including rhyme primes. Moreover, both metric
strength and rhyme predictability provided unique explanatory power for word intensity,
accounting for variance beyond that accounted for by lexical frequency, font emphasis, local
repetition, and syntactic boundary strength. These effects were also distinct from one another,
with rhyme predictability providing explanatory power beyond that accounted for by metric
strength alone.
The effects of intrinsic word and text factors, local repetition, and syntactic structure in
The Cat in the Hat on produced word intensity were largely consistent with prior findings for
produced word duration (Breen, 2018). As with word duration, produced word intensity increased
with lower lexical frequency, higher likelihood of being a syntactic boundary, and font emphasis in
the form of SMALL CAPS, and decreased with repetition. Unlike word duration, however, produced
word intensity was not modulated by word length, word class, or the presence or absence of a child
audience. Together, these results demonstrate that the intrinsic and contextual linguistic
information signaled by duration and intensity during child-directed reading is often redundant
between these two prosodic channels, but not identical.
The realization of hierarchical metric structure in The Cat in the Hat as hierarchical word
intensity demonstrates that during child-directed poetic reading, multiple levels of acoustic
prominence are signaled. This regular, hierarchically organized prominence could provide
temporal guideposts for young listeners, creating expectancies for certain moments in time to
which children can then predictively attend (e.g., Jones, 1976). Such periodic guidance of
temporal attention during child-directed reading could represent a mechanism by which highly
metrical children’s literature leads to positive reading outcomes; better ability to synchronize to
an external auditory rhythm is associated with better pre-reading skills (e.g., phonological
awareness and rapid automatized naming) in pre-readers (Woodruff Carr, White-Schwoch,
Tierney, Strait, & Kraus, 2014), and guided practice tracking auditory rhythms during shared
reading could improve auditory rhythm synchronization skills more broadly. Alternatively,
increased intensity for metrically strong words could reflect speaker-centric factors such as
greater speaker attention to metrically strong words (e.g., Arnold & Watson, 2015). The
absence of an audience effect could indicate that the metric strength effect is not primarily
driven by concerns about the listener. However, the read-alone participants were not given any
specific instructions on how to read the text, and we speculate that because the experimental
material is a prominent children’s book, the read-alone participants may have read it as they
would to a child. Importantly, whether the metric strength effect reflects speaker-centric or
listener-centric motivations, hierarchical prominence in the speech signal could still provide
perceptual benefits to the listener.
The realization of hierarchical meter in The Cat in the Hat as hierarchical word intensity
is consistent with its previously reported realization as hierarchical word duration and inter-word
intervals (Breen, 2018). However, the specific pattern of metric prominence realized in intensity
differs from that realized in temporal variation. In the present study, word intensity was highest
for beat 1 in a 6/8 metric parsing, intermediate for beat 4, and lowest for beats 2, 3, 5, and 6.
Conversely, in Breen (2018), word duration and inter-word intervals were longest for beat 4 in a
6/8 parsing (called metric levels 3, 4, and 5 in the metric model employed by Breen, which was
inspired by Fabb & Halle, 2008), intermediate for beat 1 (called metric level 2), and shortest for beats 2, 3, 5,
and 6 (called metric level 1). These findings demonstrate that although word intensity and word
duration provide important and potentially overlapping cues to prominence in speech, the
information encoded on these two prosodic channels is not identical. In the same productions,
word duration was aligned with the predictions of a linguistic model of rhythm in poetry (Fabb &
Halle, 2008), whereas word intensity was aligned with previous observations of dynamics in
expressive music performance (Drake & Palmer, 1993). Though it is possible that the observed
dissociation between intensity and duration is unique to productions of poetry, which involves
elements of linguistic and musical rhythm, previous reports that prominence in non-poetic
contexts is best explained by a combination of intensity and duration (Kochanski & Orphanidou,
2008; Silipo & Greenberg, 2000) suggest that this dissociation may reflect a more general property
of prosody. Further, these findings indicate that metric strength is realized in both word duration
and word intensity, but that, consistent with prior work (Wagner & Watson, 2010), phrase
structure is preferentially encoded in word duration; in The Cat in the Hat, beat 4 in a 6/8 metric
parsing often occurs at phrase-final positions, whereas beat 1 never does. Although we did not
explicitly model phrase structure in either study, our results across the two studies could be
interpreted as hierarchically increased duration and intensity with metric strength in a 6/8
structure, combined with an additional duration increase for phrase-final beat 4s.
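The additive interpretation in the preceding paragraph can be checked with a toy scoring of the six beats of a 6/8 bar. This is a sketch under stated assumptions, not a reanalysis: all scores are hypothetical ordinal values rather than model estimates, `metric_rank` encodes the shared metric hierarchy, and `PHRASE_FINAL_BONUS` is an assumed duration-only increment applied to beat 4, which is treated here as always phrase-final even though the text says it only often is.

```python
# Toy ordinal scores (higher = more prominent) for the six beats of a 6/8 bar.
# All numbers are hypothetical illustrations, not fitted model estimates.

def metric_rank(beat: int) -> int:
    """Shared 6/8 metric hierarchy: beat 1 strongest, beat 4 intermediate, others weak."""
    return {1: 2, 4: 1}.get(beat, 0)

# Assumed duration-only increment for phrase-final words. Its size is arbitrary;
# it only needs to exceed the beat-1 vs beat-4 metric difference (1) to reorder them.
PHRASE_FINAL_BONUS = 2

def duration_score(beat: int, phrase_final: bool) -> int:
    """Metric hierarchy plus an extra duration increase at phrase-final positions."""
    return metric_rank(beat) + (PHRASE_FINAL_BONUS if phrase_final else 0)

def tiers(score) -> list:
    """Group beats 1-6 into prominence tiers, ordered from most to least prominent."""
    groups = {}
    for beat in range(1, 7):
        groups.setdefault(score(beat), []).append(beat)
    return [groups[s] for s in sorted(groups, reverse=True)]

# Intensity is modeled as tracking metric strength alone.
intensity_tiers = tiers(metric_rank)
# Simplifying assumption: beat 4 is always phrase-final and no other beat is.
duration_tiers = tiers(lambda beat: duration_score(beat, phrase_final=(beat == 4)))

print(intensity_tiers)  # [[1], [4], [2, 3, 5, 6]]
print(duration_tiers)   # [[4], [1], [2, 3, 5, 6]]
```

Setting `phrase_final=False` for every beat collapses the duration tiers back onto the intensity tiers, which is the sense in which both reported patterns could share a single metric hierarchy.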
The observed intensity reduction for highly predictable rhyme targets is consistent with
prior findings that more predictable words are acoustically reduced relative to less predictable
words (e.g., Jurafsky et al., 2001). Moreover, the contrast between the clear predictability-related
reduction in intensity observed here and the equivocal predictability effects on duration
reported by Breen (2018) is consistent with prior findings indicating that predictability-related
reduction is preferentially realized in intensity (Lam & Watson, 2010). There is no local
perceptual benefit to a listener when a talker acoustically reduces a predictable word, as
reduction lowers the effective signal-to-noise ratio of that speech element. Moreover, listeners do
not preferentially allocate attention to predicted moments in speech when the information to be
presented at that time is also completely predictable, lowering the effective signal-to-noise ratio
even further at these times (Astheimer & Sanders, 2011). Nonetheless, highly predictable words are regularly
communicated without error, suggesting that the system is robust to information loss at moments
where information is highly predictable. It may be that listeners shift from a detailed, attentive
perceptual strategy to a template-matching perceptual strategy when information is highly
predicted, which would be both more efficient and more robust to reduced signal-to-noise ratio.
Note that reductions in produced word intensity might result either from speakers anticipating
and accommodating such a listener strategy or from a similar shift in strategy by the speaker
during production without regard to the listener; in both cases, the communicative outcome is
the same.
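The acoustic part of the signal-to-noise argument in the preceding paragraph reduces to simple decibel arithmetic, sketched below. It assumes word level and background noise are both expressed in dB relative to the same reference and that the noise floor is constant; the 70 dB, 55 dB, and 2 dB values are hypothetical numbers chosen for the arithmetic, not measurements from this corpus, and the attentional component of "effective" signal-to-noise ratio is not modeled.

```python
def snr_db(signal_db: float, noise_db: float) -> float:
    """SNR in dB, with signal and noise levels in dB re the same reference."""
    return signal_db - noise_db

def intensity_ratio(delta_db: float) -> float:
    """Convert a level change in dB to the corresponding ratio of acoustic intensities."""
    return 10 ** (delta_db / 10)

# Hypothetical values for illustration only (not corpus measurements).
NOISE_DB = 55.0
unreduced_word_db = 70.0
reduced_word_db = unreduced_word_db - 2.0   # an assumed 2 dB predictability-related reduction

print(snr_db(unreduced_word_db, NOISE_DB))  # 15.0
print(snr_db(reduced_word_db, NOISE_DB))    # 13.0
print(round(intensity_ratio(-2.0), 3))      # 0.631
```

With the noise floor held fixed, every decibel removed from a word is a decibel of signal-to-noise ratio lost; a 3 dB reduction roughly halves the word's acoustic intensity (`intensity_ratio(-3.0)` is about 0.50).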
Collectively, the increased prominence with metric strength and reduction with rhyme
predictability observed in the current study represent two methods by which predictability
modulates produced word intensity. On one hand, temporal predictability, such as that provided
by a regular metric framework, indicates when important information is likely to occur but not
what that information will be. It is then advantageous for listeners to direct attention to these
moments, maximizing the perceptual resources available to encode the unknown important
information. It is in turn advantageous for talkers to impart prominence at these moments, both
to attract listener attention and to increase signal strength. On the other hand, phonological and
semantic predictability provide strong expectations regarding not only when important
information will occur, but also what that information will be. Under these conditions, listeners
need not encode the signal in high perceptual detail, but only in enough detail to confirm or
disconfirm their expectations. In turn, talkers can take advantage of
the lowered communicative demands of highly predictable information by reducing production
effort for such words. Finally, the differences between the realization of metric strength and rhyme
predictability in word intensity in the present study and their realization in word duration and
inter-word intervals in Breen (2018) provide further evidence that intensity and duration are
important but separate prosodic communication channels during speech production.
Funding
This work was supported by the James S. McDonnell Foundation [Understanding Human
Cognition Scholar Award to MB].
References
Arnold, J. E., & Watson, D. G. (2015). Synthesising meaning and processing approaches to
prosody: performance matters. Language, Cognition and Neuroscience, 30(1–2), 88–102.
https://doi.org/10.1080/01690965.2013.840733
Astheimer, L. B., & Sanders, L. D. (2009). Listeners modulate temporally selective attention
during natural speech processing. Biological Psychology, 80(1), 23–34.
https://doi.org/10.1016/j.biopsycho.2008.01.015
Astheimer, L. B., & Sanders, L. D. (2011). Predictability affects early perceptual processing of
word onsets in continuous speech. Neuropsychologia, 49(12), 3512–3516.
https://doi.org/10.1016/j.neuropsychologia.2011.08.014
Aylett, M., & Turk, A. (2004). The smooth signal redundancy hypothesis: a functional
explanation for relationships between redundancy, prosodic prominence, and duration in
spontaneous speech. Language and Speech, 47(1), 31–56.
https://doi.org/10.1177/00238309040470010201
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models
using lme4. ArXiv:1406.5823 [Stat]. Retrieved from http://arxiv.org/abs/1406.5823
Boersma, P., & Weenink, D. (2001). PRAAT, a system for doing phonetics by computer. Glot
International, 5(9/10), 341–345.
Boutsen, F. R., Brutten, G. J., & Watts, C. R. (2000). Timing and intensity variability in the
metronomic speech of stuttering and nonstuttering speakers. Journal of Speech,
Language, and Hearing Research, 43(2), 513–520. https://doi.org/10.1044/jslhr.4302.513
Breen, M. (2018). Effects of metric hierarchy and rhyme predictability on word duration in The
Cat in the Hat. Cognition, 174, 71–81. https://doi.org/10.1016/j.cognition.2018.01.014
Breen, M., Dilley, L. C., McAuley, J. D., & Sanders, L. D. (2014). Auditory evoked potentials
reveal early perceptual effects of distal prosody on speech segmentation. Language,
Cognition and Neuroscience, 29(9), 1132–1146.
https://doi.org/10.1080/23273798.2014.894642
Breen, M., Watson, D. G., & Gibson, E. (2011). Intonational phrasing is constrained by meaning,
not balance. Language and Cognitive Processes, 26(10), 1532–1562.
https://doi.org/10.1080/01690965.2010.508878
Brenier, J. M., Cer, D. M., & Jurafsky, D. (2005). The detection of emphatic words using
acoustic and lexical features. In Ninth European Conference on Speech Communication
and Technology.
Burling, R. (1966). The metrics of children’s verse: a cross-linguistic study. American
Anthropologist, 68(6), 1418–1441. https://doi.org/10.1525/aa.1966.68.6.02a00040
Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of
Experimental Psychology Section A, 33(4), 497–505.
https://doi.org/10.1080/14640748108400805
Dr. Seuss. (1957). The Cat in the Hat. New York, NY: Random House.
Drake, C., & Palmer, C. (1993). Accent structures in music performance. Music Perception: An
Interdisciplinary Journal, 10(3), 343–378. https://doi.org/10.2307/40285574
Fabb, N., & Halle, M. (2008). Meter in poetry: a new theory. Cambridge University Press.
Fitzroy, A. B., & Sanders, L. D. (2015). Musical meter modulates the allocation of attention
across time. Journal of Cognitive Neuroscience, 27(12), 2339–2351.
https://doi.org/10.1162/jocn_a_00862
Francis, W., & Kučera, H. (1982). Frequency Analysis of English Usage. Boston: Houghton
Mifflin Company.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. The Journal
of the Acoustical Society of America, 27(4), 765–768. https://doi.org/10.1121/1.1908022
Gorman, K., Howell, J., & Wagner, M. (2011). Prosodylab-aligner: A tool for forced alignment
of laboratory speech. Canadian Acoustics, 39(3), 192–193.
Goswami, U. (1999). Causal connections in beginning reading: the importance of rhyme. Journal
of Research in Reading, 22(3), 217.
Gregory, M. L., Raymond, W. D., Bell, A., Fosler-Lussier, E., & Jurafsky, D. (1999). The effects
of collocational strength and contextual predictability in lexical production.
Hanna, P. N. A., Lindner, K., & Dufter, A. (2002). The meter of nursery rhymes: universal
versus language-specific patterns. In Sounds and systems: studies in structure and change
(pp. 241–267).
Huss, M., Verney, J. P., Fosker, T., Mead, N., & Goswami, U. (2011). Music, rhythm, rise time
perception and developmental dyslexia: Perception of musical meter predicts reading and
phonology. Cortex, 47(6), 674–689. https://doi.org/10.1016/j.cortex.2010.07.010
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention,
and memory. Psychological Review, 83(5), 323–355. https://doi.org/10.1037/0033-295X.83.5.323
Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. D. (2001). Probabilistic relations between
words: Evidence from reduction in lexical production. In Typological studies in
language, vol. 45: Frequency and the emergence of linguistic structure (pp. 229–254).
Amsterdam, Netherlands: John Benjamins Publishing Company.
https://doi.org/10.1075/tsl.45.13jur
Kochanski, G., Grabe, E., Coleman, J., & Rosner, B. (2005). Loudness predicts prominence:
fundamental frequency lends little. The Journal of the Acoustical Society of America,
118(2), 1038–1054.
Kochanski, G., & Orphanidou, C. (2008). What marks the beat of speech? The Journal of the
Acoustical Society of America, 123(5), 2780–2791. https://doi.org/10.1121/1.2890742
Lam, T. Q., & Watson, D. G. (2010). Repetition is easy: Why repeated referents have reduced
prominence. Memory & Cognition, 38(8), 1137–1146.
https://doi.org/10.3758/MC.38.8.1137
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT
Press.
Nel, P. (2004). Dr. Seuss: American Icon. Bloomsbury Academic.
R Core Team. (2015). R: A Language and Environment for Statistical Computing. Vienna,
Austria. Retrieved from http://www.R-project.org/
RStudio Team. (2014). RStudio: Integrated Development for R. Boston, MA. Retrieved from
http://www.rstudio.org/
Silipo, R., & Greenberg, S. (2000). Prosodic stress revisited: Reassessing the role of fundamental
frequency. In Proc. NIST Speech Transcription Workshop.
Streefkerk, B. M., Pols, L. C., & Bosch, L. F. T. (1999). Acoustical features as predictors for
prominence in read aloud Dutch sentences used in ANN’s. In Sixth European Conference
on Speech Communication and Technology.
Wagner, M., & Watson, D. G. (2010). Experimental and theoretical advances in prosody: A
review. Language and Cognitive Processes, 25(7–9), 905–945.
Watson, D., & Gibson, E. (2004). The relationship between intonational phrasing and syntactic
structure in language production. Language and Cognitive Processes, 19(6), 713–755.
https://doi.org/10.1080/01690960444000070
Woodruff Carr, K., White-Schwoch, T., Tierney, A. T., Strait, D. L., & Kraus, N. (2014). Beat
synchronization predicts neural speech encoding and reading readiness in preschoolers.
Proceedings of the National Academy of Sciences, 111(40), 14559–14564.
https://doi.org/10.1073/pnas.1406219111
Young, S. J., & Young, S. (1993). The HTK hidden Markov model toolkit: Design and
philosophy. University of Cambridge, Department of Engineering.
Figure 1 – Word intensity measurement. An excerpt from one production of The Cat in the Hat is plotted as a time-frequency domain spectrogram (top) and as a time-domain waveform (bottom). Identified word and silence boundaries are plotted as dashed vertical lines. The smoothed intensity contour generated for this excerpt is plotted in black over the spectrogram, with the parabolically interpolated maximum intensity for each word indicated with an asterisk.
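The parabolically interpolated maximum in Figure 1 can be illustrated with generic three-point parabolic peak refinement: take the highest sample of the smoothed intensity contour, fit a parabola through it and its two neighbours, and report the vertex. This is a sketch assuming a uniformly sampled contour in dB (`t0` is the time of the first sample, `dt` the sampling step); the helper `parabolic_peak` is hypothetical and is not presented as the exact routine used for Figure 1. Praat (Boersma & Weenink, 2001) offers a parabolic interpolation option when querying an intensity maximum, and its implementation may differ in detail.

```python
from typing import Sequence, Tuple

def parabolic_peak(contour: Sequence[float], t0: float, dt: float) -> Tuple[float, float]:
    """Return (time_s, level_dB) of the maximum of a uniformly sampled intensity
    contour, refined by fitting a parabola through the highest sample and its
    two neighbours (hypothetical helper, for illustration only)."""
    if not contour:
        raise ValueError("contour is empty")
    i = max(range(len(contour)), key=lambda k: contour[k])
    # A maximum at either edge has only one neighbour: return the raw sample.
    if i == 0 or i == len(contour) - 1:
        return t0 + i * dt, contour[i]
    a, b, c = contour[i - 1], contour[i], contour[i + 1]
    curvature = a - 2.0 * b + c          # <= 0 because b is the largest sample
    if curvature == 0.0:                 # three equal samples: flat top, no refinement
        return t0 + i * dt, b
    offset = 0.5 * (a - c) / curvature   # vertex position in samples, within [-0.5, 0.5]
    peak = b - 0.25 * (a - c) * offset   # parabola value at its vertex
    return t0 + (i + offset) * dt, peak

# Usage: samples taken every 10 ms from a parabola that peaks at 70 dB at t = 13 ms.
contour = [70.0 - 1000.0 * (k * 0.01 - 0.013) ** 2 for k in range(4)]
time_s, level_db = parabolic_peak(contour, t0=0.0, dt=0.01)
print(round(time_s, 6), round(level_db, 6))  # 0.013 70.0
```

Because the test contour is itself a parabola, the refinement recovers the true peak (13 ms, 70 dB) even though no sample falls exactly on it; on a real smoothed contour the vertex is only a local approximation.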
Figure 2 – Regression predictors. Word intensity was modeled using linear mixed-effects regression with within-subjects factors of metric strength in a 6/8 metric structure (MS), rhyme predictability (RP), number
of phonemes (#P), lexical frequency (LF), word class (WC), font emphasis (FE), intra-stanza repetition
(ISR), and syntactic boundary strength (SBS). Lexical frequency values are rounded to the nearest tenth for clarity. See text for details.
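Read alongside Figure 2, the regression can be written schematically as below, with i indexing produced word tokens and j participants, and the outcome being maximum word intensity in dB. This is a sketch rather than the fitted specification: the caption does not state the random-effects structure, so the single by-participant intercept u_j is an assumption, and a categorical predictor with more than two levels (e.g., the three metric strength levels) would contribute one coefficient per contrast rather than the single term shown.

$$
\begin{aligned}
\mathrm{Intensity}_{ij} ={}& \beta_0 + \beta_{\mathrm{MS}}\,\mathrm{MS}_{ij} + \beta_{\mathrm{RP}}\,\mathrm{RP}_{ij} + \beta_{\#\mathrm{P}}\,\#\mathrm{P}_{ij} + \beta_{\mathrm{LF}}\,\mathrm{LF}_{ij} \\
&+ \beta_{\mathrm{WC}}\,\mathrm{WC}_{ij} + \beta_{\mathrm{FE}}\,\mathrm{FE}_{ij} + \beta_{\mathrm{ISR}}\,\mathrm{ISR}_{ij} + \beta_{\mathrm{SBS}}\,\mathrm{SBS}_{ij} + u_{j} + \varepsilon_{ij}, \\
u_{j} \sim{}& \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
$$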
Figure 3 – Fixed effect estimates (β) from the final experimental model. Metric strength and rhyme predictability estimates are highlighted in black; linguistic control factors are shown in grey. Thicker horizontal bars indicate one standard error; thinner horizontal bars indicate two standard errors.
Figure 4 – Maximum word intensity as a function of position in a 6/8 metric hierarchy and rhyme predictability. Produced word intensity increased hierarchically with metric strength (left), and predictable rhyme targets were produced with lower intensity than other words, including phonologically similar but
unpredictable rhyme primes (right). Note that all words that were not rhyme targets were coded as unpredictable in the regression models, but only unpredictable rhyme primes are shown here for clarity.