
For Peer Review

Metric structure and rhyme predictability modulate speech intensity during child-directed reading

Journal: Language and Speech

Manuscript ID Draft

Manuscript Type: Original Article

Date Submitted by the Author: n/a

Complete List of Authors: Fitzroy, Ahren; Mount Holyoke College, Psychology and Education; University of Massachusetts, Amherst, Psychological and Brain Sciences

Breen, Mara; Mount Holyoke College, Psychology and Education

Keywords: speech prosody, intensity, meter, rhyme, child-directed reading

Abstract:

Temporal and phonological predictability in children’s literature may support early literacy acquisition. Because such texts are typically heard before they are read, realization of predictive structure in caregiver prosody could guide children’s attention during shared reading in a manner that supports reading subskill development. However, little is known about how predictive structure is realized in prosody during child-directed reading. We investigated whether speakers use word intensity to signal predictive metric and rhyme structure in a corpus of child-directed and read-alone productions of The Cat in the Hat (Dr. Seuss, 1957). Using linear mixed-effects regression, we modeled the maximum intensity (dB) of each produced monosyllabic word as a function of metric strength, rhyme predictability, and a set of control parameters. In the control model, intensity increased with lower lexical frequency, capitalization, first mention, and syntactic boundary likelihood. Beyond these control factors, metric structure predicted word intensity hierarchically, such that words were produced with highest intensity when aligned with beat one in a 6/8 metric structure, intermediate intensity when aligned with beat four, and lowest intensity when aligned with all other beats. Additionally, phonologically predictable rhyme targets were produced with reduced intensity. Together these results demonstrate that predictability is encoded in word intensity during child-directed reading, and that metric structure is realized hierarchically in word intensity. Further, the manner by which predictability is encoded in word intensity differs from that previously reported for word duration in this same corpus (Breen, 2018), demonstrating that intensity and duration present nonidentical prosodic information channels.

https://mc.manuscriptcentral.com/las

Language and Speech


RUNNING HEAD: METER AND RHYME MODULATE SPEECH INTENSITY

Metric structure and rhyme predictability modulate speech intensity during child-directed reading

Ahren B. Fitzroy¹,² & Mara Breen¹

¹Department of Psychology and Education, Mount Holyoke College

²Department of Psychological and Brain Sciences, University of Massachusetts, Amherst

Corresponding author:

Ahren B. Fitzroy, Ph.D.

Mount Holyoke College

50 College Street

South Hadley, MA 01075

USA

E-mail: [email protected]





Metric structure and rhyme predictability modulate speech intensity during child-directed reading

Two common features of children’s literature across languages are a strong metrical framework and the use of rhyming phrases (Burling, 1966; Hanna, Lindner, & Dufter, 2002). The temporal and phonological predictability created by these features may contribute to the purported benefits of these texts for literacy acquisition (Goswami, 1999; Huss, Verney, Fosker, Mead, & Goswami, 2011): early children’s literature is typically heard before it is read, and temporal and phonological predictability guide the allocation of attention to important moments in the auditory stream (Astheimer & Sanders, 2009, 2011; Breen, Dilley, McAuley, & Sanders, 2014; Fitzroy & Sanders, 2015). However, little is empirically known about how meter and rhyme in children’s literature are instantiated in spoken productions of these texts, and such knowledge is a necessary precursor to understanding how these structures might guide perceptual or literacy learning in young listeners. Breen (2018) recently demonstrated that the metric and rhyme structures of The Cat in the Hat (Dr. Seuss, 1957) are realized in the duration and temporal spacing of words in child-directed productions of the text. Prominence in speech is, however, imparted through several prosodic cues, with duration and intensity typically thought to be the most important (Brenier, Cer, & Jurafsky, 2005; Kochanski, Grabe, Coleman, & Rosner, 2005; Kochanski & Orphanidou, 2008; Silipo & Greenberg, 2000). Therefore, in the present paper we examine the effects of meter and rhyme in The Cat in the Hat on the produced intensity of words during child-directed reading, and consider the results in light of previous findings of impacts on word duration (Breen, 2018).

Multiple studies have demonstrated that increased intensity imparts prominence in speech. Increased intensity for more prominent syllables has been observed for isolated and continuous speech in which prominence is subjectively assessed by a listener (Brenier et al., 2005; Kochanski et al., 2005; Silipo & Greenberg, 2000; Streefkerk, Pols, & Bosch, 1999), and for metronome-synchronous speech in which prominence is objectively defined as syllabic alignment with a metronome click (Boutsen, Brutten, & Watts, 2000; Kochanski & Orphanidou, 2008). Intensity variation signals metric structure similarly during expressive musical performance, with greater intensity indicating greater metric strength. Importantly, intensity in expressive music performance imparts metric strength in a hierarchical manner: metric structures that involve three levels of strength (e.g., 6/8 meter) are produced with three corresponding levels of intensity (Drake & Palmer, 1993). It is unclear whether intensity in continuous speech imparts metric prominence in a similarly hierarchical manner, or only in a binary manner. The strongly hierarchical metric structure of The Cat in the Hat is reflected in hierarchically produced word duration (Breen, 2018), suggesting that it may also be signaled through hierarchical modulation of produced word intensity.

Predictability in speech also modulates the production of words, as more predictable words are typically acoustically reduced relative to less predictable words (Aylett & Turk, 2004; Gregory, Raymond, Bell, Fosler-Lussier, & Jurafsky, 1999; Jurafsky, Bell, Gregory, & Raymond, 2001). Moreover, predictability-related reduction is preferentially realized in word intensity, such that more predictable words are produced with lower intensity regardless of whether they are repeated in the local context (Lam & Watson, 2010). In contrast, local word repetition is preferentially associated with shortened word duration (Lam & Watson, 2010). Most stanzas in The Cat in the Hat are written such that the first line ends with a rhyme prime (e.g., “all” in (1)) and the second line ends with a rhyme target (e.g., “fall” in (1)). This structure makes the stanza-final rhyme target both phonologically and semantically predictable, which should cause rhyme targets to be reduced during production. Breen (2018) observed equivalent word duration but lengthened inter-word-onset intervals for predictable rhyme targets relative to rhyme primes in The Cat in the Hat, which may be interpreted as a reduction of the rhyme target. However, the temporal characteristics alone do not conclusively support this interpretation. Given that predictability-related reduction is preferentially realized in word intensity (Lam & Watson, 2010), we predict that word intensity will be reduced for rhyme targets relative to rhyme primes in The Cat in the Hat.

In the current study, we test the hypotheses that metric structure and rhyme predictability in The Cat in the Hat are realized in produced word intensity during child-directed reading. We assess the influence of metric strength and rhyme predictability on word intensity in a corpus of child-directed and read-alone productions of The Cat in the Hat (Breen, 2018), while controlling for intrinsic word characteristics, font emphasis, and local repetition. We predict that metrically stronger words will be produced with higher intensity, and that, similar to prior findings in expressive musical performance, hierarchical metric structure will be realized as hierarchically produced word intensity. We further predict that predictable rhyme targets will be produced with reduced intensity relative to rhyme primes.

Method

Participants

Eighteen young adult Mount Holyoke students (aged 18-35 years) participated in the current study. All participants identified as female and as native speakers of American English, defined as speaking English in the United States since at least age five. One participant was later determined not to be a native speaker of American English and was withdrawn from the study, leaving 17 participants in the final analyses. All participants were compensated for their time with research participation credit for Psychology courses.

Stimuli

Participants read The Cat in the Hat (Dr. Seuss, 1957) aloud from a hardcover copy of the text. The Cat in the Hat is a 61-page illustrated book written primarily in anapestic (i.e., weak-weak-STRONG) tetrameter. It consists of 1625 words (1576 monosyllabic, 236 unique lexemes) organized primarily into seventy stanzas, with each stanza containing two lines of four anapests each (as in (1)). The first line in each stanza ends with a rhyme prime and the second line ends with a phonologically predictable rhyme target. In addition, there are six single anapestic lines in the text. In two cases, these single lines rhyme internally (e.g., “And then something went BUMP! How that bump made us jump!”). In the other four cases, the final word of the single line rhymes with the preceding stanza.

(1) “Put me down!” said the fish.

“This is no fun at all!

Put me down!” said the fish.

“I do NOT wish to fall!”

Procedure

Participants were randomly assigned to read The Cat in the Hat aloud in its entirety in one of two locations: to an audience of preschool students (4-5 years old) in a quiet classroom (n = 8), or alone in a quiet room (n = 9). In both environments participants read the book while seated and holding the book in their hands, and turned the pages on their own. All participants’ productions were digitally recorded using a head-mounted microphone (Shure SM10A) connected to a pre-amplifier (Rolls MP13 Mini-mic) plugged into the motherboard audio input of a laptop (in the classroom) or desktop computer (in the quiet room). Production recordings were digitized at 44,100 Hz with 16-bit resolution. Participants were given no specific instructions on how to read the text, other than to read it aloud. All participants provided informed consent before taking part in the study, and parents of the children who listened to the readings provided written assent for their child’s participation.

Acoustic measures

[FIGURE 1 HERE]

Figure 1 – Word intensity measurement. An excerpt from one production of The Cat in the Hat is plotted as a time-frequency domain spectrogram (top) and as a time domain waveform (bottom). Identified word and silence boundaries are plotted as dashed vertical lines. The smoothed intensity contour generated for this excerpt is plotted in black over the spectrogram, with the parabolically-interpolated maximum intensity for each word indicated with an asterisk.

Word and silence boundaries (Figure 1) were identified in the production recordings by automatic forced alignment with the text in Praat (Boersma & Weenink, 2001) using the Prosodylab-Aligner (Gorman, Howell, & Wagner, 2011), which relies on the Hidden Markov Model Toolkit (Young & Young, 1993). Automatic forced-alignment results were manually inspected and corrected as needed. As shown in Figure 1, linearly-spaced intensity (dB SPL) contours were calculated automatically in Praat for each production recording. Intensity contours were computed with a minimum pitch of 100 Hz: the amplitude values of the sound were squared and then convolved with a Gaussian window with an effective duration of 32 ms (3.2 / 100 Hz), which kept pitch-synchronous intensity variations to less than 0.00001 dB (Boersma & Weenink, 2001). Word intensity was defined as the parabolically-interpolated maximum of the intensity contour occurring within each word.
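The word-level measure can be illustrated with a minimal sketch. It is not Praat's implementation: it assumes the smoothed contour is available as a list of dB values sampled every `dt` seconds from time `t0`, takes the highest frame whose time falls inside the word's boundaries, and refines it by fitting a parabola through that frame and its two neighbours. The function name `word_max_intensity` and the choice to skip interpolation when the highest in-word frame is not a local peak are our own assumptions.

```python
import math


def word_max_intensity(contour, t0, dt, start, end):
    """Parabolically interpolated maximum of an intensity contour within a word.

    contour    -- intensity values in dB; frame i is sampled at time t0 + i * dt
    start, end -- word boundaries in seconds (e.g., from forced alignment)
    Returns (time_of_peak, peak_db). Hypothetical helper, not a Praat call.
    """
    eps = 1e-9  # guards against floating-point error when converting times to frames
    first = max(0, math.ceil((start - t0) / dt - eps))
    last = min(len(contour) - 1, math.floor((end - t0) / dt + eps))
    if first > last:
        raise ValueError("no intensity frames fall inside the word")

    # Highest frame inside the word boundaries.
    i = max(range(first, last + 1), key=lambda j: contour[j])
    b = contour[i]
    if i == 0 or i == len(contour) - 1:
        return t0 + i * dt, b  # no neighbour on one side: keep the raw frame

    a, c = contour[i - 1], contour[i + 1]
    if a > b or c > b or a - 2 * b + c == 0:
        # Not a local peak (or a flat top): keep the raw frame value.
        return t0 + i * dt, b

    # Vertex of the parabola through (i - 1, a), (i, b), (i + 1, c).
    offset = 0.5 * (a - c) / (a - 2 * b + c)  # lies in [-0.5, 0.5] for a local peak
    peak = b - 0.25 * (a - c) * offset
    return t0 + (i + offset) * dt, peak
```

For a contour that is itself a parabola, for example `[70 - (i - 2.3) ** 2 for i in range(6)]` with `dt = 0.01`, the interpolated peak recovers the true vertex (about 70 dB at 0.023 s) rather than the highest sampled frame (69.91 dB at 0.02 s).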


Multisyllabic words were excluded from further analyses, because the unstressed syllables in these words have reduced intensities for reasons unrelated to metric structure (Fry, 1955). Disfluent and incorrect word productions were manually identified and excluded from further analyses, resulting in a loss of 473 out of 26,792 possible monosyllabic word productions (1.77%). The remaining maximum intensity values of correctly-produced monosyllabic words were centered and scaled to standard deviation units (i.e., Z-transformed) separately for each participant.
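Per-participant standardization can be illustrated with a short sketch. It assumes each observation is a (participant, maximum intensity in dB) pair and uses the sample standard deviation (n - 1 denominator), which is what R's `scale()` uses; the paper does not state which variant was applied, and the function name is illustrative. For reference, the 26,792 possible productions above correspond to 17 participants × 1576 monosyllabic words.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_participant(rows):
    """Z-transform maximum word intensity separately for each participant.

    rows -- list of (participant_id, max_intensity_db) tuples
    Returns the z-scores in the same order as `rows`.
    """
    values = defaultdict(list)
    for pid, db in rows:
        values[pid].append(db)

    # Per-participant mean and sample SD (needs at least 2 observations each).
    params = {pid: (mean(v), stdev(v)) for pid, v in values.items()}
    return [(db - params[pid][0]) / params[pid][1] for pid, db in rows]
```

Standardizing within participant removes each speaker's overall level and spread, so the regression operates on intensity relative to that speaker's own distribution.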

Data analysis

We analyzed the standardized maximum intensity of correctly-produced monosyllabic words using linear mixed-effects regression modeling implemented in the lme4 package (Bates, Mächler, Bolker, & Walker, 2014) within R (R Core Team, 2015; RStudio Team, 2014). Data and analysis scripts can be downloaded from https://osf.io/xgy4t. We predicted maximum word intensity as a function of metric strength, rhyme predictability, linguistic control factors (number of phonemes, lexical frequency, word class, font emphasis, intra-stanza repetition, and syntactic boundary strength), and presence of a child audience.

[FIGURE 2 HERE]

Figure 2 – Regression predictors. Word intensity was modeled using linear mixed-effects regression with within-subjects factors of metric strength in a 6/8 metric structure (MS), rhyme predictability (RP), number of phonemes (#P), lexical frequency (LF), word class (WC), font emphasis (FE), intra-stanza repetition (ISR), and syntactic boundary strength (SBS). Lexical frequency values are rounded to the nearest tenth for clarity. See text for details.

Metric strength was defined by beat position in a 6/8 metric hierarchy (Figure 2), with beat 1 of each measure assigned metric strength 3 (strong), beat 4 assigned metric strength 2 (medium), and beats 2, 3, 5, and 6 assigned metric strength 1 (weak). The first stressed syllable in each stanza was considered beat 1 of the first measure of that stanza. This metric parsing is in agreement with previous linguistic metric descriptions of The Cat in the Hat as anapestic tetrameter in terms of stress placement (Nel, 2004), but adopts a musical perspective of metric phase in that the first stressed event in a phrase likely represents the start of a larger metric unit (i.e., a measure), with any preceding unstressed events considered anacrustic (Lerdahl & Jackendoff, 1983). Rhyme predictability was coded as 1 for verse-final rhyme target words (e.g., “fall” in (1)) and 0 for all other words.
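The beat and strength coding can be sketched as below, using stanza (1) as input. The sketch relies on a simplifying assumption that the paper does not state: each monosyllabic word occupies exactly one eighth-note position, so beat positions repeat with period 6 from the first stressed word (beat 1), and any words before it are anacrustic (beats 5 and 6 of an implied preceding measure). Stanzas with missing or extra weak syllables would need hand annotation. The function names and stress flags are hypothetical.

```python
def beat_positions(stressed):
    """6/8 beat position (1-6) for each word of a stanza.

    stressed -- list of bools, one per monosyllabic word, True if metrically stressed.
    Assumes one word per eighth-note position (hypothetical simplification).
    """
    first = stressed.index(True)  # first stressed word = beat 1 of measure 1
    # Python's % maps anacrustic words (negative offsets) onto beats 5 and 6.
    return [(i - first) % 6 + 1 for i in range(len(stressed))]


def metric_strength(beat):
    """Beat 1 -> 3 (strong), beat 4 -> 2 (medium), beats 2, 3, 5, 6 -> 1 (weak)."""
    return {1: 3, 4: 2}.get(beat, 1)


def rhyme_predictability(n_words):
    """1 for the stanza-final rhyme target, 0 for every other word."""
    return [0] * (n_words - 1) + [1]


# Stanza (1), punctuation removed; stressed words flagged by hand.
stanza = ("Put me down said the fish This is no fun at all "
          "Put me down said the fish I do NOT wish to fall").split()
stress_words = {"down", "fish", "no", "all", "NOT", "fall"}
stressed = [w in stress_words for w in stanza]

beats = beat_positions(stressed)
strengths = [metric_strength(b) for b in beats]
rhyme = rhyme_predictability(len(stanza))
```

Under this simplified parsing, "Put me" is anacrustic (beats 5 and 6), "down" falls on beat 1, and both the rhyme prime "all" and the rhyme target "fall" land on beat 4.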

To account for effects of basic word and text characteristics on produced intensity, several linguistic control factors were included in the regression models (Figure 2). The 1576 monosyllabic words in The Cat in the Hat were annotated using the Medical Research Council (MRC) Psycholinguistic Database (Coltheart, 1981) for number of phonemes (range = 1-6, M = 2.54, SD = 0.75) and lexical frequency as assessed by Kučera-Francis (K-F; Francis & Kučera, 1982) norms (log-transformed K-F frequency; range = 0-11.16, M = 7.37, SD = 2.31). For words that did not have raw K-F frequencies (n = 18), the K-F frequency of the singular or infinitival form of the word was substituted if available. There was no K-F frequency available for the word plop, so it was excluded from further analyses. The analyzed words were also annotated for syntactic word class (open [n = 540], closed [n = 1035]), font emphasis (normal font [n = 1550], SMALL CAPS [n = 25]), intra-stanza repetition (not repeated [n = 1243], repeated [n = 332]), and syntactic boundary strength as defined by the Left-Right Boundary model (LRB; Breen, Watson, & Gibson, 2011; Watson & Gibson, 2004; range = 0-5, M = 1.24, SD = 2.00). For a detailed description of how the LRB model was applied, see Breen (2018); in brief, the LRB model assigns a higher probability of a phrase boundary with increases in the size of the material a speaker has just produced (the left-hand side) and increases in what they are going to produce (the right-hand side).

Our regression approach was to first create a fully saturated control model containing all linguistic control factors as both fixed effects and random slopes over participant, and audience as a fixed effect. Before being entered into the model as either a fixed effect or random slope, continuous predictors were centered and categorical predictors were recoded such that the codes across factor levels summed to zero for each contrast. We then refined the control model by removing each fixed effect individually in order of ascending t magnitude, and using a likelihood ratio test to compare model fit with and without the fixed effect to justify its continued inclusion. The control model included only fixed effects that significantly increased model fit. We then added metric strength into the regression model as both a fixed effect and random slope over participant, coded using simple contrast coding with strength level 2 (medium; beat 4) as the reference level. This model did not converge, so we iteratively removed random slopes in order of least variance explained until the model converged. We then compared the resultant metric strength model to an updated version of the control model containing only those random slopes included in the metric strength model, using a likelihood ratio test to determine whether metric strength predicted word intensity variance beyond the control parameters. Lastly, we added rhyme predictability to the model as a fixed effect and random slope over participant, and compared the result to the prior model containing control factors and metric strength, using a likelihood ratio test to determine whether rhyme predictability accounted for unique word intensity variance.
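The predictor preparation described above can be sketched as follows. The ±0.5 scaling for two-level factors is our assumption (the paper states only that the codes sum to zero), and `simple_contrasts` implements the standard simple-coding scheme (treatment dummies minus 1/k for k levels), under which each column compares one level with the reference level while the intercept reflects the unweighted mean of the level means. Function names are illustrative; the authors' own R scripts are available at the OSF link given above.

```python
from statistics import mean


def center(values):
    """Subtract the mean so a continuous predictor has mean zero."""
    m = mean(values)
    return [v - m for v in values]


def sum_code(levels, positive):
    """Two-level factor coded +0.5 / -0.5, so the two codes sum to zero."""
    return [0.5 if lev == positive else -0.5 for lev in levels]


def simple_contrasts(levels, order, reference):
    """Simple coding: one column per non-reference level, equal to the
    treatment dummy minus 1/k (k = number of levels). Each column's codes
    across the k levels sum to zero."""
    k = len(order)
    columns = {}
    for target in order:
        if target == reference:
            continue
        columns[f"{target} vs. {reference}"] = [
            (1.0 if lev == target else 0.0) - 1.0 / k for lev in levels
        ]
    return columns


# Metric strength with level 2 (medium; beat 4) as the reference level.
codes = simple_contrasts(["strong", "medium", "weak"],
                         order=["strong", "medium", "weak"],
                         reference="medium")
```

The two resulting columns correspond to the "Beat 1 vs. beat 4" and "Beats 2, 3, 5, 6 vs. beat 4" terms reported in the results tables: strong is coded (2/3, -1/3, -1/3) and weak (-1/3, -1/3, 2/3) across the strong, medium, and weak levels.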

Results

Control model

The control model included number of phonemes, lexical frequency, word class, font emphasis, intra-stanza repetition, and syntactic boundary strength as random slopes over participant, and lexical frequency, font emphasis, intra-stanza repetition, and syntactic boundary strength as fixed effects (Table 1). Audience presence did not have an effect on produced word intensity (p > 0.8), and was not included in the final control model. Similarly, number of phonemes and word class were not warranted for inclusion in the control model as fixed effects (p’s > 0.3). Across individuals, higher word intensity was predicted by lower lexical frequency (β = -0.133, SE = 0.006, t = -22.542), presentation in a capitalized font (β = 0.784, SE = 0.065, t = 12.101), not being a repeat of recent material (β = -0.164, SE = 0.022, t = -7.567), and higher likelihood of being a syntactic boundary (β = 0.015, SE = 0.008, t = 1.888).

Random slopes                   Variance     SD
Intercept                       0.0014615    0.03823
Number of phonemes¹             0.0022393    0.04732
Lexical frequency¹              0.0009228    0.03038
Word class                      0.0076330    0.08737
Font emphasis                   0.0437573    0.20918
Intra-stanza repetition¹        0.0048553    0.06968
Syntactic boundary strength¹    0.0015851    0.03981

Fixed effects                   β estimate   β standard error   t         χ² (p)
Intercept                       0.021509     0.007745           2.777     -
Lexical frequency               -0.132559    0.005881           -22.542   58.354 (< 0.001)
Font emphasis                   0.784229     0.064807           12.101    38.876 (< 0.001)
Intra-stanza repetition         -0.163933    0.021664           -7.567    37.551 (< 0.001)
Syntactic boundary strength     0.014507     0.007638           1.888     3.358 (0.067)

Table 1 – Control model. Variance estimates and standard deviations are shown for each random slope over participant included in the control model. β estimates, standard errors, and t-values are shown for each fixed effect justified for inclusion in the control model, along with chi-square and p values from the likelihood ratio tests justifying its inclusion. ¹Note that the random slopes over participant for number of phonemes, lexical frequency, intra-stanza repetition, and syntactic boundary strength were later removed from the control model to allow direct comparison with the experimental model containing metric strength.

Experimental models

[FIGURE 3 HERE]

Figure 3 – Fixed effect estimates (β) from final experimental model. Metric strength and rhyme predictability estimates are highlighted in black; linguistic control factors are shown in grey. Thicker horizontal bars indicate one standard error; thinner horizontal bars indicate two standard errors.

As shown in Table 2 and Figures 3 and 4, position in a 6/8 metric hierarchy predicted word intensity such that beat 1 was produced with highest intensity, beat 4 was produced with intermediate intensity, and beats 2, 3, 5, and 6 were produced with similarly low intensity (beat 1 vs. beat 4: β = 0.322, SE = 0.058, t = 5.536; beats 2, 3, 5, 6 vs. beat 4: β = -0.191, SE = 0.054, t = -3.554). Including metric strength in the regression model significantly improved model fit relative to an updated version of the control model containing the same non-meter factors as fixed effects and random slopes over participant, χ2 (11) = 1157.60, p < 0.001. Therefore, metric strength provides explanatory power for produced word intensity beyond that explained by intrinsic and contextual word characteristics.



Random slopes                   Variance     SD
Intercept                       0.004285     0.06546
Beat 1 vs. beat 4               0.058568     0.24201
Beats 2, 3, 5, 6 vs. beat 4     0.041924     0.20475
Word class                      0.020477     0.14310
Font emphasis                   0.051992     0.22802

Fixed effects                   β estimate   β standard error   t
Intercept                       0.138004     0.012368           11.158
Beat 1 vs. beat 4               0.322274     0.058215           5.536
Beats 2, 3, 5, 6 vs. beat 4     -0.191348    0.053841           -3.554
Lexical frequency               -0.115302    0.003829           -30.113
Font emphasis                   0.623397     0.071589           8.708
Intra-stanza repetition         -0.158835    0.014046           -11.308
Syntactic boundary strength     0.002515     0.003936           0.639

Table 2 – Initial experimental model containing metric strength. Variance estimates and standard deviations are shown for each random slope over participant included in the initial experimental model examining metric strength. β estimates, standard errors, and t-values are shown for each fixed effect included in this model.

[FIGURE 4 HERE]

Figure 4 – Maximum word intensity as a function of position in a 6/8 metric hierarchy and rhyme predictability. Produced word intensity increased hierarchically with metric strength (left), and predictable rhyme targets were produced with lower intensity than other words, including phonologically similar but unpredictable rhyme primes (right). Note that all words that were not rhyme targets were coded as unpredictable in the regression models, but only unpredictable rhyme primes are shown here for clarity.

As shown in Table 3 and Figures 3 and 4, rhyme predictability predicted word intensity such that predictable rhyme targets (e.g., “fall” in (1)) were produced with lower intensity than other words, including metrically and phonologically similar rhyme primes (β = -0.434, SE = 0.086, t = -5.030). Adding rhyme predictability to the regression model significantly improved model fit relative to the model containing final control parameters and metric strength, χ2 (7) = 290.73, p < 0.001, demonstrating that rhyme predictability provides explanatory power for produced word intensity beyond that explained by intrinsic and contextual word characteristics and metric structure.
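The degrees of freedom of the two model comparisons can be checked by counting parameters, and the test itself can be sketched in a few lines. The sketch assumes each model has one correlated (unstructured) random-effects term per participant, as lme4 fits by default for a formula of the form (1 + x1 + x2 | participant), so q random-effect columns contribute q(q + 1)/2 variance-covariance parameters. Under that assumption, adding the two metric strength contrasts to a model whose random effects were the intercept, word class, and font emphasis gives 2 + 15 - 6 = 11 df, and then adding rhyme predictability gives 1 + 21 - 15 = 7 df, matching the reported χ2 (11) and χ2 (7). The chi-square tail function uses the closed-form series for integer degrees of freedom and is an illustration, not a replacement for R's `pchisq()` or the output of `anova()` on lme4 models.

```python
import math


def n_covariance_params(q):
    """Variances plus covariances for q correlated random-effect columns."""
    return q * (q + 1) // 2


def lrt_df(fixed_added, re_cols_before, re_cols_after):
    """Parameter difference between two nested mixed models."""
    return (fixed_added
            + n_covariance_params(re_cols_after)
            - n_covariance_params(re_cols_before))


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic: 2 * (logLik_full - logLik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x, k):
    """P(X > x) for X ~ chi-square with integer k >= 1 degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd k: 2 * P(Z > sqrt(x)) plus a finite series in sqrt(x).
    z = math.sqrt(x)
    result = math.erfc(z / math.sqrt(2.0))
    if k == 1:
        return result
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal pdf at z
    term, total = z, z
    for r in range(2, (k - 1) // 2 + 1):
        term *= x / (2 * r - 1)
        total += term
    return result + 2.0 * density * total


print(lrt_df(fixed_added=2, re_cols_before=3, re_cols_after=5))  # → 11 (metric strength)
print(lrt_df(fixed_added=1, re_cols_before=5, re_cols_after=6))  # → 7 (rhyme predictability)
print(round(chi2_sf(3.358, 1), 3))  # → 0.067 (syntactic boundary strength, Table 1)
```

The same tail function returns p < 0.001 for both reported comparisons, χ2 (11) = 1157.60 and χ2 (7) = 290.73.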


Random slopes                   Variance     SD
Intercept                       0.003865     0.06217
Beat 1 vs. beat 4               0.035875     0.18941
Beats 2, 3, 5, 6 vs. beat 4     0.021121     0.14533
Rhyme predictability            0.111544     0.33398
Word class                      0.022192     0.14897
Font emphasis                   0.058998     0.24290

Fixed effects                   β estimate   β standard error   t
Intercept                       0.175713     0.012311           14.272
Beat 1 vs. beat 4               0.204102     0.041646           4.901
Beats 2, 3, 5, 6 vs. beat 4     -0.302707    0.040018           -7.564
Rhyme predictability            -0.434422    0.086374           -5.030
Lexical frequency               -0.113800    0.003822           -29.776
Font emphasis                   0.608770     0.074313           8.192
Intra-stanza repetition         -0.159876    0.013965           -11.448
Syntactic boundary strength     0.011017     0.003962           2.781

Table 3 – Final experimental model containing metric strength and rhyme predictability. Variance estimates and standard deviations are shown for each random slope over participant included in the final experimental model examining both metric strength and rhyme predictability. β estimates, standard errors, and t-values are shown for each fixed effect included in this model.

Discussion

As predicted, both metric structure and rhyme predictability in The Cat in the Hat modulated word intensity during read-aloud productions of the text. Greater metric strength resulted in greater produced word intensity in a hierarchical manner, such that the metrically strongest words were produced with highest intensity, metrically intermediate words were produced with intermediate intensity, and metrically weak words were produced with lowest intensity. Rhyme predictability resulted in clear acoustic reduction, as rhyme targets were produced with lower intensity than other words, including rhyme primes. Moreover, both metric strength and rhyme predictability provided unique explanatory power for word intensity, accounting for variance beyond that accounted for by lexical frequency, font emphasis, local repetition, and syntactic boundary strength. These effects were also distinct from one another, with rhyme predictability providing explanatory power beyond that accounted for by metric strength alone.

The effects of intrinsic word and text factors, local repetition, and syntactic structure in The Cat in the Hat on produced word intensity were largely consistent with prior findings for produced word duration (Breen, 2018). Consistent with the duration findings of Breen (2018), produced word intensity increased with lower lexical frequency, higher likelihood of being a syntactic boundary, and font emphasis in the form of SMALL CAPS, and decreased with repetition. Contrary to the duration findings of Breen (2018), however, produced word intensity was not modulated by word length, word class, or the presence or absence of a child audience. Together, these results demonstrate that the intrinsic and contextual linguistic information signaled by duration and intensity during child-directed reading is often redundant between these two prosodic channels, but not identical.

The realization of hierarchical metric structure in The Cat in the Hat as hierarchical word

intensity demonstrates that during child-directed poetic reading, multiple levels of acoustic

prominence are signaled. This regular, hierarchically organized prominence could provide

temporal guideposts for young listeners, creating expectancies for certain moments in time to

Page 16 of 28


Language and Speech

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960

For Peer Review


which children can then predictively attend (e.g., Jones, 1976). Such periodic guidance of

temporal attention during child-directed reading could represent a mechanism by which highly

metrical children’s literature leads to positive reading outcomes; better ability to synchronize to

an external auditory rhythm is associated with better pre-reading skills (e.g., phonological

awareness and rapid automatized naming) in pre-readers (Woodruff Carr, White-Schwoch,

Tierney, Strait, & Kraus, 2014), and guided practice tracking auditory rhythms during shared

reading could improve auditory rhythm synchronization skills more broadly. Alternatively,

increased intensity for metrically strong words could reflect speaker-centric factors such as

greater speaker attention to metrically strong words (e.g., Arnold & Watson, 2015). The

observed absence of an audience effect could indicate that the metric strength effect is not primarily

driven by concerns about the listener. However, the read-alone participants were not given any

specific instructions on how to read the text, and we speculate that because the experimental

material is a well-known children’s book, the read-alone participants may have read it as they

would to a child. Importantly, whether the metric strength effect reflects speaker-centric or

listener-centric motivations, hierarchical prominence in the speech signal could still provide

perceptual benefits to the listener.

The realization of hierarchical meter in The Cat in the Hat as hierarchical word intensity

is consistent with its previously reported realization as hierarchical word duration and inter-word

intervals (Breen, 2018). However, the specific pattern of metric prominence realized in intensity

differs from that realized in temporal variation. In the present study, word intensity was highest

for beat 1 in a 6/8 metric parsing, intermediate for beat 4, and lowest for beats 2, 3, 5, and 6.

Conversely, in Breen (2018), word duration and inter-word intervals were longest for beat 4 in a

6/8 parsing (called metric levels 3, 4, and 5 in Breen’s metric model, which was inspired by Fabb

& Halle, 2008), intermediate for beat 1 (called metric level 2), and shortest for beats 2, 3, 5,

and 6 (called metric level 1). These findings demonstrate that although word intensity and word

duration provide important and potentially overlapping cues to prominence in speech, the

information encoded on these two prosodic channels is not identical. In the same productions,

word duration was aligned with the predictions of a linguistic model of rhythm in poetry (Fabb &

Halle, 2008), whereas word intensity was aligned with previous observations of dynamics in

expressive music performance (Drake & Palmer, 1993). Though it is possible that the observed

dissociation between intensity and duration is unique to productions of poetry, which involves

elements of linguistic and musical rhythm, previous reports that prominence in non-poetic

contexts is best explained by a combination of intensity and duration (Kochanski & Orphanidou,

2008; Silipo & Greenberg, 2000) suggest that this dissociation may reflect a more general property of

prosody. Further, these findings indicate that metric strength is realized in both word duration

and word intensity, but that, consistent with prior work (Wagner & Watson, 2010), phrase

structure is preferentially encoded in word duration; in The Cat in the Hat, beat 4 in a 6/8 metric

parsing often occurs at phrase-final positions, whereas beat 1 never does. Though we did not

explicitly model phrase structure in either study, our results across the two studies could be

interpreted as hierarchically increased duration and intensity with metric strength in a 6/8

structure, combined with an additional duration increase for phrase-final beat 4s.
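The additive interpretation in the preceding paragraph can be illustrated with a toy Python sketch. It is not the regression model fitted in either study: the ordinal levels (2 for beat 1, 1 for beat 4, 0 elsewhere), the size of the phrase-final bonus, the simplifying assumption that every beat 4 is phrase-final (in the text it often is, not always), and all function names are hypothetical. The sketch only shows how one shared metric hierarchy, plus a boundary-related lengthening that affects duration but not intensity, would produce the two different prominence orderings described above.

```python
# Toy illustration only: levels and bonus are hypothetical units,
# NOT regression estimates from either study.

METRIC_LEVEL = {1: 2, 4: 1}  # beat 1 strongest, beat 4 intermediate, others 0


def metric_level(beat):
    """Ordinal metric-strength level for a beat (1-6) in a 6/8 bar."""
    if not 1 <= beat <= 6:
        raise ValueError("beat must be in 1..6 for a 6/8 bar")
    return METRIC_LEVEL.get(beat, 0)


def predicted_intensity(beat):
    # Assumption: intensity tracks metric strength alone.
    return float(metric_level(beat))


def predicted_duration(beat, phrase_final, boundary_bonus=1.5):
    # Assumption: duration tracks metric strength plus extra lengthening
    # at phrase-final positions (hypothetical bonus size).
    return metric_level(beat) + (boundary_bonus if phrase_final else 0.0)


def rank(scores):
    """Beats ordered from most to least prominent; ties broken by beat number."""
    return sorted(scores, key=lambda b: (-scores[b], b))


beats = range(1, 7)
intensity = {b: predicted_intensity(b) for b in beats}
# Simplification: treat every beat 4 as phrase-final and no beat 1 as phrase-final.
duration = {b: predicted_duration(b, phrase_final=(b == 4)) for b in beats}

print(rank(intensity))  # [1, 4, 2, 3, 5, 6]
print(rank(duration))   # [4, 1, 2, 3, 5, 6]
```

In this toy version the reversal of beats 1 and 4 in the duration ordering appears only when the hypothetical bonus exceeds the one-level gap between them; with a smaller bonus, duration would follow the same ordering as intensity.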

The observed intensity reduction for highly predictable rhyme targets is consistent with

prior findings that more predictable words are acoustically reduced relative to less predictable

words (e.g., Jurafsky et al., 2001). Moreover, the contrast between the clear predictability-related

reduction in intensity observed here and the equivocal predictability effects on duration

reported by Breen (2018) is consistent with prior findings indicating that predictability-related



reduction is preferentially realized in intensity (Lam & Watson, 2010). There is no local

perceptual benefit to a listener when a talker acoustically reduces a predictable word, as

reduction lowers the effective signal-to-noise ratio of that speech element. Further, listeners do

not preferentially allocate attention to predicted moments in speech when the information to be

presented at that time is also completely predictable, further lowering the effective signal-to-noise ratio

at these times (Astheimer & Sanders, 2011). Nonetheless, highly predictable words are regularly

communicated without error, suggesting that the system is robust to information loss at moments

where information is highly predictable. It may be that listeners shift from a detailed, attentive

perceptual strategy to a template-matching perceptual strategy when information is highly

predicted, which would be both more efficient and more robust to reduced signal-to-noise ratio.

Reductions in produced word intensity might result either from speakers anticipating and

accommodating such a listener strategy or from a similar shift in the speaker’s own production

strategy that occurs without regard to the listener; in both cases, the communicative outcome is the same.

Collectively, the increased prominence with metric strength and reduction with rhyme

predictability observed in the current study represent two ways in which predictability

modulates produced word intensity. On the one hand, temporal predictability, such as that provided

by a regular metric framework, indicates when important information is likely to occur but not

what that information will be. It is therefore advantageous for listeners to direct attention to these

moments, maximizing the perceptual resources available to encode the unknown important

information. It is in turn advantageous for talkers to impart prominence at these moments, both

to attract listener attention and to increase signal strength. On the other hand, phonological and

semantic predictability provide strong expectations regarding not only when important



information will occur, but also what that information will be. Under these conditions, it is less

important for listeners to encode the signal in fine perceptual detail; they need only encode it in

sufficient detail to confirm or disconfirm their expectations. In turn, talkers can take advantage of

the lowered communicative demands of highly predictable information by reducing production

effort for such words. Finally, the differences between the realization of metric strength and rhyme

predictability in word intensity in the present study and their realization in word duration and inter-word

intervals in Breen (2018) provide further evidence that intensity and duration constitute important

but separate prosodic communication channels during speech production.

Funding

This work was supported by the James S. McDonnell Foundation [Understanding Human

Cognition Scholar Award to MB].



References

Arnold, J. E., & Watson, D. G. (2015). Synthesising meaning and processing approaches to

prosody: performance matters. Language, Cognition and Neuroscience, 30(1–2), 88–102.

https://doi.org/10.1080/01690965.2013.840733

Astheimer, L. B., & Sanders, L. D. (2009). Listeners modulate temporally selective attention

during natural speech processing. Biological Psychology, 80(1), 23–34.

https://doi.org/10.1016/j.biopsycho.2008.01.015

Astheimer, L. B., & Sanders, L. D. (2011). Predictability affects early perceptual processing of

word onsets in continuous speech. Neuropsychologia, 49(12), 3512–3516.

https://doi.org/10.1016/j.neuropsychologia.2011.08.014

Aylett, M., & Turk, A. (2004). The smooth signal redundancy hypothesis: a functional

explanation for relationships between redundancy, prosodic prominence, and duration in

spontaneous speech. Language and Speech, 47(1), 31–56.

https://doi.org/10.1177/00238309040470010201

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models

using lme4. ArXiv:1406.5823 [Stat]. Retrieved from http://arxiv.org/abs/1406.5823

Boersma, P., & Weenink, D. (2001). PRAAT, a system for doing phonetics by computer. Glot

International, 5(9/10), 341–345.

Boutsen, F. R., Brutten, G. J., & Watts, C. R. (2000). Timing and intensity variability in the

metronomic speech of stuttering and nonstuttering speakers. Journal of Speech,

Language, and Hearing Research, 43(2), 513–520. https://doi.org/10.1044/jslhr.4302.513

Breen, M. (2018). Effects of metric hierarchy and rhyme predictability on word duration in The

Cat in the Hat. Cognition, 174, 71–81. https://doi.org/10.1016/j.cognition.2018.01.014



Breen, M., Dilley, L. C., McAuley, J. D., & Sanders, L. D. (2014). Auditory evoked potentials

reveal early perceptual effects of distal prosody on speech segmentation. Language,

Cognition and Neuroscience, 29(9), 1132–1146.

https://doi.org/10.1080/23273798.2014.894642

Breen, M., Watson, D. G., & Gibson, E. (2011). Intonational phrasing is constrained by meaning,

not balance. Language and Cognitive Processes, 26(10), 1532–1562.

https://doi.org/10.1080/01690965.2010.508878

Brenier, J. M., Cer, D. M., & Jurafsky, D. (2005). The detection of emphatic words using

acoustic and lexical features. In Ninth European Conference on Speech Communication

and Technology.

Burling, R. (1966). The metrics of children’s verse: a cross-linguistic study. American

Anthropologist, 68(6), 1418–1441. https://doi.org/10.1525/aa.1966.68.6.02a00040

Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of

Experimental Psychology Section A, 33(4), 497–505.

https://doi.org/10.1080/14640748108400805

Dr. Seuss. (1957). The Cat in the Hat. New York, NY: Random House.

Drake, C., & Palmer, C. (1993). Accent structures in music performance. Music Perception: An

Interdisciplinary Journal, 10(3), 343–378. https://doi.org/10.2307/40285574

Fabb, N., & Halle, M. (2008). Meter in poetry: a new theory. Cambridge University Press.

Fitzroy, A. B., & Sanders, L. D. (2015). Musical meter modulates the allocation of attention

across time. Journal of Cognitive Neuroscience, 27(12), 2339–2351.

https://doi.org/10.1162/jocn_a_00862



Francis, W., & Kučera, H. (1982). Frequency analysis of English usage. Boston: Houghton

Mifflin Company.

Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. The Journal

of the Acoustical Society of America, 27(4), 765–768. https://doi.org/10.1121/1.1908022

Gorman, K., Howell, J., & Wagner, M. (2011). Prosodylab-aligner: A tool for forced alignment

of laboratory speech. Canadian Acoustics, 39(3), 192–193.

Goswami, U. (1999). Causal connections in beginning reading: the importance of rhyme. Journal

of Research in Reading, 22(3), 217.

Gregory, M. L., Raymond, W. D., Bell, A., Fosler-Lussier, E., & Jurafsky, D. (1999). The effects

of collocational strength and contextual predictability in lexical production.

Hanna, P. N. A., Lindner, K., & Dufter, A. (2002). The meter of nursery rhymes: universal

versus language-specific patterns. In Sounds and systems: studies in structure and change

(pp. 241–267).

Huss, M., Verney, J. P., Fosker, T., Mead, N., & Goswami, U. (2011). Music, rhythm, rise time

perception and developmental dyslexia: Perception of musical meter predicts reading and

phonology. Cortex, 47(6), 674–689. https://doi.org/10.1016/j.cortex.2010.07.010

Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention,

and memory. Psychological Review, 83(5), 323–355. https://doi.org/10.1037/0033-295X.83.5.323

Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. D. (2001). Probabilistic relations between

words: Evidence from reduction in lexical production. In Typological studies in

language, vol. 45: Frequency and the emergence of linguistic structure (pp. 229–254).



Amsterdam, Netherlands: John Benjamins Publishing Company.

https://doi.org/10.1075/tsl.45.13jur

Kochanski, G., Grabe, E., Coleman, J., & Rosner, B. (2005). Loudness predicts prominence:

fundamental frequency lends little. The Journal of the Acoustical Society of America,

118(2), 1038–1054.

Kochanski, G., & Orphanidou, C. (2008). What marks the beat of speech? The Journal of the

Acoustical Society of America, 123(5), 2780–2791. https://doi.org/10.1121/1.2890742

Lam, T. Q., & Watson, D. G. (2010). Repetition is easy: Why repeated referents have reduced

prominence. Memory & Cognition, 38(8), 1137–1146.

https://doi.org/10.3758/MC.38.8.1137

Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT

Press.

Nel, P. (2004). Dr. Seuss: American Icon. Bloomsbury Academic.

R Core Team. (2015). R: A Language and Environment for Statistical Computing. Vienna,

Austria. Retrieved from http://www.R-project.org/

RStudio Team. (2014). RStudio: Integrated Development for R. Boston, MA. Retrieved from

http://www.rstudio.org/

Silipo, R., & Greenberg, S. (2000). Prosodic stress revisited: Reassessing the role of fundamental

frequency. In Proc. NIST Speech Transcription Workshop.

Streefkerk, B. M., Pols, L. C., & Bosch, L. F. T. (1999). Acoustical features as predictors for

prominence in read aloud Dutch sentences used in ANN’s. In Sixth European Conference

on Speech Communication and Technology.



Wagner, M., & Watson, D. G. (2010). Experimental and theoretical advances in prosody: A

review. Language and Cognitive Processes, 25(7–9), 905–945.

Watson, D., & Gibson, E. (2004). The relationship between intonational phrasing and syntactic

structure in language production. Language and Cognitive Processes, 19(6), 713–755.

https://doi.org/10.1080/01690960444000070

Woodruff Carr, K., White-Schwoch, T., Tierney, A. T., Strait, D. L., & Kraus, N. (2014). Beat

synchronization predicts neural speech encoding and reading readiness in preschoolers.

Proceedings of the National Academy of Sciences, 111(40), 14559–14564.

https://doi.org/10.1073/pnas.1406219111

Young, S. J., & Young, S. (1993). The HTK hidden Markov model toolkit: Design and

philosophy. University of Cambridge, Department of Engineering.


Figure 1 – Word intensity measurement. An excerpt from one production of The Cat in the Hat is plotted as a time-frequency spectrogram (top) and as a time-domain waveform (bottom). Identified word and silence boundaries are plotted as dashed vertical lines. The smoothed intensity contour generated for this excerpt is plotted in black over the spectrogram, with the parabolically interpolated maximum intensity for each word indicated with an asterisk.
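The caption refers to a parabolically interpolated intensity maximum. As an illustration only (not the authors' Praat procedure, whose details may differ), the standard three-point parabolic interpolation can be sketched in Python: the largest sample of the smoothed contour and its two neighbors define a parabola whose vertex gives a refined peak time and height between analysis frames. The function name `parabolic_peak`, the uniform 10 ms frame spacing, and the example contour are hypothetical.

```python
def parabolic_peak(times, values):
    """Estimate the time and height of the maximum of a sampled contour.

    Finds the largest sample, then fits a parabola through it and its two
    neighbors (three-point interpolation). Falls back to the raw sample
    when the peak lies at an edge or the three points are collinear.
    """
    if not values or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equal length")
    i = max(range(len(values)), key=values.__getitem__)
    if i == 0 or i == len(values) - 1:
        return times[i], values[i]
    y0, y1, y2 = values[i - 1], values[i], values[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return times[i], values[i]
    offset = 0.5 * (y0 - y2) / denom   # vertex position, in frames relative to i
    step = times[i + 1] - times[i]     # assumes uniform frame spacing
    peak_time = times[i] + offset * step
    peak_value = y1 - 0.25 * (y0 - y2) * offset
    return peak_time, peak_value


# Hypothetical contour: a parabola in dB peaking at 0.023 s with height
# 70 dB, sampled every 10 ms. The sampled maximum (frame at 0.02 s) is
# lower than the true peak; interpolation recovers it.
times = [0.00, 0.01, 0.02, 0.03, 0.04]
values = [70.0 - 1e4 * (t - 0.023) ** 2 for t in times]
peak_t, peak_db = parabolic_peak(times, values)
print(round(peak_t, 4), round(peak_db, 2))  # 0.023 70.0
```

Because the example contour is itself a parabola, the interpolation is exact here; on a real intensity contour it gives an estimate that is typically closer to the true peak than the largest raw frame value.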



Figure 2 – Regression predictors. Word intensity was modeled using linear mixed-effects regression with within-subjects factors of metric strength in a 6/8 metric structure (MS), rhyme predictability (RP), number of phonemes (#P), lexical frequency (LF), word class (WC), font emphasis (FE), intra-stanza repetition (ISR), and syntactic boundary strength (SBS). Lexical frequency values are rounded to the nearest tenth for clarity. See text for details.



Figure 3 – Fixed effect estimates (β) from the final experimental model. Metric strength and rhyme predictability estimates are highlighted in black; linguistic control factors are shown in grey. Thicker horizontal bars indicate one standard error; thinner horizontal bars indicate two standard errors.



Figure 4 – Maximum word intensity as a function of position in a 6/8 metric hierarchy and rhyme predictability. Produced word intensity increased hierarchically with metric strength (left), and predictable rhyme targets were produced with lower intensity than other words, including phonologically similar but unpredictable rhyme primes (right). Note that all words that were not rhyme targets were coded as unpredictable in the regression models, but only unpredictable rhyme primes are shown here for clarity.

