Animated Speech Prosody Modeling
(sci.tamucc.edu/~cams/projects/264.pdf)

ABSTRACT

Current computer-animated speech systems do not take into account the visual impact of prosody. This leads to non-realistic animated figures, since prosodic effects are fundamental in human communication. Prosody, for a person speaking English, is the stress, intonation, length, and rhythm of syllables and sentences. A person's use of prosody during speech causes his/her mouth, face, jaw, and lips to change visibly.

The project had three distinct phases. The first phase began with an analysis of existing linguistics research on standard American English. During this phase an experimental corpus of words and sentences that exhibits prosody was developed. This corpus was used in a motion capture environment to capture raw data of prosodic effects on the human face. In the second phase, computer-assisted data segmentation was used to remove noise, determine the timing of phonemes, and match the captured data to prosody parameters. During the third phase, computer software was developed to implement an algorithm that extracts jaw, mouth, and facial muscle parameters from the motion capture data. These parameters were used to animate a parametric facial model. The extracted parametric curves can then be used to develop a model of prosody for creating advanced facial animations.


TABLE OF CONTENTS

Abstract
Table of Contents
List of Figures
1. Introduction and Background
   1.1 Background
   1.2 What Is Prosody?
   1.3 What Using a Prosody Model Can Accomplish
   1.4 Current Research
      1.4.1 Talking Head Model
      1.4.2 Structure of Talking Head
2. Animated Speech Prosody Modeling
   2.1 Corpus Selection Criteria
   2.2 Data Segmentation
   2.3 Prosody Parameter Development
3. System Design or Research
   3.1 Determining a Prosody Corpus
   3.2 Use of Corpus during Motion Capture
   3.3 Data Segmentation
   3.4 Inverse Mapping Algorithm
4. Evaluation and Results


   4.1 Corpus Evaluation
   4.2 Corpus Results
      4.2.1 Suprasegmentals
      4.2.2 Word Stress
      4.2.3 Intonation
      4.2.4 Length
      4.2.5 Sentence Stress and Rhythm
   4.3 Data Segmentation Evaluation
   4.4 Data Segmentation Results
   4.5 Prosody Parameter Evaluation
   4.6 Prosody Parameter Results
5. Future Work
6. Conclusion
Bibliography and References
Appendix A. Prosody Corpus
Appendix B. Label and Hierarchy Files
Appendix C. Algorithm Code
Appendix D. Example Phoneme-Viseme Parameter Set


LIST OF FIGURES

Figure 1.1. Facial Animation Schematic
Figure 1.2. Parameter Control Screen
Figure 4.1. EMU Segmentation Results – Timing
Figure 4.2. EMU Segmentation Results – Hierarchy
Figure 4.3. EMU Segmentation Results – Query Tool
Figure 4.4. EMU Segmentation Results – Query Results
Figure 4.5. EMU Segmentation Results for “about”
Figure 4.6. Top 5 Key Parameter Values for “about”
Figure 4.7. Error Reduction for “about”
Figure 4.8. Word Parameter Impact
Figure 4.9. Parameter Impact


1. INTRODUCTION AND BACKGROUND

Prosody is the stress, intonation, length, and rhythm of syllables and sentences as a person speaks English. Audio prosody has been extensively studied; how a person visually changes his/her mouth, face, jaw, and lips in synchrony with audio prosody during speech has not. As a result, computer-animated speech systems do not take into account the visual impact of prosody, which leads to non-realistic animated figures. Conducting basic research on prosody and its visual components was the objective of this study.

1.1 Background

Audio prosody, sometimes called suprasegmentals in linguistics, has been studied extensively in the field of phonetics [Coleman 2005], driven by the need to understand how languages are learned, whether as a first or a subsequent language. Most of this research focused on phonemes, made up of one or more phones (the basic sounds a human can produce), and their use to build syllables, which in turn build words, phrases, and utterances. Applying prosodic effects to these phonemes, syllables, words, phrases, and utterances changes their information content and meaning. How phonemes are physically produced was also studied.

A field of study called English phonology has developed since the initial research interest in phonetics. English phonology is the study of the patterns of speech sounds in the English language [Wikipedia 2005]. In particular, it is this field that appears to be the dominant one in the study of audio prosody. The concept of prosody is applied to a subset of the phonological hierarchy, called the prosodic hierarchy.

Animated speech prosody modeling is a relatively new field of research. Prosodic effects, i.e., how the mouth, face, and lips change as a result of the speaker's use of prosody to communicate, are relatively unexplored at this time.

With the advent of digital computers and the advances in computer graphics since 1990, more and more full-length feature movies are computer animations. In the future, computer animations of humans speaking will become a main human-computer interaction mechanism. The present lack of naturally speaking animations that take into account both visual effects and audio prosody has created an emerging field of study.

1.2 What Is Prosody?

Over the years audio prosody has been defined in many ways, with many different terms, but the underlying concepts of what audio prosody is have been fairly standard. It is useful to explore the various definitions of prosody to build a case that it is the perspective of the prosody researcher that varies, not the underlying concepts.

Three different definitions of prosody are given in Wikipedia, the free encyclopedia [Wikipedia 2005]. They are:

• “Prosody consists of distinctive variations of stress, tone, and timing in spoken language. How pitch changes from word to word, the speed of speech, the loudness of speech, and the duration of pauses all contribute to prosody.

• In linguistics, prosody includes intonation and vocal stress in speech.

• In poetry, prosody includes the scansion and metrical shape of the lines.”

Another definition of prosody can be gleaned from the International Phonetic Alphabet's use of “a group of symbols for stress, length, intonation, syllabification and tone under the general heading “suprasegmentals,” reflecting a conceptual division of speech into “segmental” and “suprasegmental” parts” [Coleman 2005]. Note that this distinction is only partially correct, because the IPA also applies the concepts of stress and intonation to vowels and consonants, which sit at the segmental level of the phonological hierarchy, i.e., the phoneme level.

Another definition of prosody comes from the field of speech synthesis. O'Shaughnessy defines prosody as the relationships between the duration, amplitude, and F0 (the fundamental frequency) of sound sequences [O'Shaughnessy 2000]. He believes “segmentals cue phoneme and word identification, while prosody primarily cues other linguistic phenomena” [O'Shaughnessy 2000]. There are lexically stressed syllables to place emphasis, syntactically and emotionally stressed words and phrases, and efforts to make speech clearer to understand. He also believes that “highlighting stressed syllables against a background of unstressed syllables is a primary function of prosody” [O'Shaughnessy 2000].

A final definition comes from J. Bowen's 1975 book Patterns of ENGLISH Pronunciation [Bowen 1975]. He states, “The term intonation as used in this book is intended to cover a number of separate phenomena, including stress, pitch, juncture (the transitions between phrases from sound to silence at the ends of phrases), and rhythm” [Bowen 1975]. He further highlights that the combination of these phenomena with the syllables making up words and phrases conveys the meaning intended by the person speaking. He then proceeds to examine and describe some of the features of intonation and their combinations in informal spoken English. Finally, he defines prosody as “the science of poetical forms, including quantity and accent of syllables, meter, versification, and metrical composition” [Bowen 1975].

1.3 What Using a Prosody Model Can Accomplish

Bowen's definition of intonation produces almost exactly the same effects as O'Shaughnessy's definition developed in 2000. So, if one considers all of the prosody definitions, it is clear that audio prosody is an essential part of human communication.

Now the question is, “Why do we want to capture the audio and visual elements of prosody in facial computer animations?” The answer is that we want our computer animations to reproduce, as closely as possible, the reality of human communication.

The underlying goal of this project is to conduct research that supports further work toward a mathematical model of the visual effects of audio prosody for computer animations. For example, a computer-animated figure reading poetry with the same level of quality, understandability, and emotion as a real human could result from this and follow-on research.


1.4 Current Research

Speaking facial model animations currently exist, but to the best of my knowledge none of them tightly couples the facial model with the linguistic prosody of standard American English. Part of the problem is that the developers of these facial models do not define prosody in a linguistic sense. An example of this disconnect is apparent in a recent article: Somasundaram believes that visual prosodic elements are the movements of the head, eyes, eyebrows, and eyelids, and that these elements improve the intelligibility of speech [Somasundaram 2005].

Speaking facial model animations are fundamentally different from the facial animations used in today's generation of animated movies. In animated movies, actors provide the voices before the animation is created frame by frame by animation artists. In speaking facial model animations, synthesized or recorded speech drives the facial animation as the speaking occurs.

1.4.1 Talking Head Model

Talking Head is a 3D parametric lip model which supports lip motion for facial animation [King 2001]. It uses the approach shown in the schematic in Figure 1.1. It was primarily designed to provide a lip model for synchronized speech. It has also been used successfully in a text-to-audio-visual-speech system to achieve facial animations with synchronized speech.

1.4.2 Structure of Talking Head

The lip model is created from a B-spline surface whose parameters define the movement of the lip surface and its internal surface [King 2000]. The model parameterization is muscle-based and allows for specification of a wide range of lip motion. Figure 1.2 shows a screen shot of the talking head model's control screen. The lip model is combined with a model of a human head which uses computer graphics to add color, lighting, and surface texture to increase the combined model's realism.

Figure 1.1: A facial animation system general schematic [King 2001]

Figure 1.2: A parameter control screen [King 2001]


2. ANIMATED SPEECH PROSODY MODELING

This section provides an overview of the project, which had three distinct phases. The first phase developed a conceptual basis for determining an experimental corpus of words and sentences that exhibit prosody. This corpus was used in a motion capture environment at The Ohio State University to capture raw data of prosodic effects on the human face. During the second phase, computer-assisted data segmentation was used to remove noise, to determine the timing of phonemes, to match the captured data to prosody parameters, and to better understand the captured data. During the third phase, computer software was developed that used the motion capture data as input and implemented an algorithm to extract jaw, mouth, and facial muscle parameters. These parameters drive an existing parametric talking head facial model and set the stage for further research into the effect of prosody on animated speech.

2.1 Corpus Selection Criteria

Selection criteria were determined which focused on the three primary areas of prosody in linguistics: word stress, intonation, and length. Words, phrases, and sentences were selected to get a broad mix of a, e, i, o, u, and y phonemes together with the consonants that surround them. Sentences and a poem were selected in an attempt to capture some data on sentence stress and rhythm.

The Carnegie Mellon University (CMU) Pronouncing Dictionary was used in the selection process [CMU 2005]. It is a pronunciation dictionary for North American English that contains over 125,000 words and their phoneme construction. It is particularly useful for speech recognition and synthesis research because it maps words to the phoneme set commonly used to pronounce them. It currently contains 39 phonemes, and if stressed vowels are counted separately, the dictionary contains 50 phonemes that are commonly used in standard American English (see Appendix A for a listing of the phonemes).
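The CMU dictionary's plain-text format (one `WORD  PH0 PH1 …` entry per line, with vowel stress marked by a trailing digit) makes this word-to-phoneme mapping straightforward to use programmatically. A minimal parsing sketch, with a couple of entries inlined for illustration (not read from the real dictionary file):

```python
def parse_cmudict(lines):
    """Parse CMU Pronouncing Dictionary entries into word -> phoneme list.

    Vowels carry a stress digit (0 = unstressed, 1 = primary, 2 = secondary),
    which is how the 39 base phonemes expand to ~50 stress-marked symbols.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # ;;; marks comment lines
            continue
        word, *phones = line.split()
        entries[word] = phones
    return entries

# A few entries in the dictionary's format (illustrative sample):
sample = [
    ";;; comment line",
    "ABOUT  AH0 B AW1 T",
    "CONVICT  K AA1 N V IH0 K T",
]
d = parse_cmudict(sample)
```

Because stress is encoded directly in the vowel symbols, corpus words can be screened for the stress patterns of interest without any extra annotation step.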

2.2 Data Segmentation

Data segmentation techniques were used to ensure the data represented the prosodic aspects highlighted in each motion capture session. The timing and other characteristics of the phonemes (which make up the syllables, words, phrases, and sentences) were processed and evaluated.

2.3 Prosody Parameter Development

During the third phase, computer software was developed to extract prosody parameters (jaw, mouth, and other facial movements, i.e., the visemes produced by prosody) from the motion capture data. TalkingHead, an existing set of computer programs, is a parametric talking head facial model that uses a set of these parameters to deform the face. An algorithm performed an inverse mapping from motion capture data to a set of 19 parameters for the 50 phoneme-viseme pairs. The locally optimal set for an individual phoneme-viseme pair is the set of parameters that minimizes the distance (error) between the motion capture data and the virtual marker data for each motion capture frame. Virtual marker locations were initially set for a head at rest and then were transformed for each frame to predict how the animated face should move.
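The per-frame fitting described above (transform the virtual markers, compare against the captured markers, keep the parameter set with the smallest distance error) can be sketched as follows. This is an illustrative Python reconstruction, not the thesis's C++ code from Appendix C; the `deform` callback and marker lists are placeholders:

```python
import math

def marker_error(mocap_frame, virtual_markers):
    """Sum of Euclidean distances between captured and predicted markers.

    Both arguments are lists of (x, y, z) tuples for the same facial markers.
    """
    return sum(math.dist(m, v) for m, v in zip(mocap_frame, virtual_markers))

def best_params(mocap_frame, deform, candidate_sets):
    """Return the parameter set whose deformed virtual markers lie closest
    to the captured frame, i.e. the locally optimal phoneme-viseme set.

    deform(params) must return the predicted (x, y, z) marker positions
    of the facial model under those parameters.
    """
    best, best_err = None, float("inf")
    for params in candidate_sets:
        err = marker_error(mocap_frame, deform(params))
        if err < best_err:
            best, best_err = params, err
    return best, best_err
```

Storing the winning parameter set per frame yields the parametric curves mentioned in the abstract.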


3. SYSTEM DESIGN OR RESEARCH

This section provides a detailed description of how the project was accomplished. As previously mentioned, the project had three phases. The first phase surveyed linguistics research to develop a basis for determining an experimental corpus of words and sentences that exhibit prosody. The second phase used computer-assisted data segmentation to remove noise, to determine the timing of phonemes, to match the captured data to prosody, and to clearly understand the data captured during the first phase. During the third phase, computer software was developed to extract prosody parameters from the motion capture data.

3.1 Determining a Prosody Corpus

A prosody corpus, designed to highlight the principal prosodic aspects of standard American English, was developed using applied phonetics laboratory exercises. Applied phonetics is a field primarily used to train teachers to help people learning English as a second language, people with speaking difficulties, or people who want to improve their pronunciation of American English.

The Carnegie Mellon University (CMU) Pronouncing Dictionary was used to select words and sentences that highlight prosodic aspects of standard American English. The selected words and sentences are in Appendix A.

3.2 Use of Corpus during Motion Capture

Each word, phrase, and sentence was said twice as it would be said in standard American English, to establish a baseline for that particular prosodic aspect, word, or sentence. The procedure was then repeated by saying the corpus content once loudly, then softly, then faster, and then slower. By varying the volume and speaking rate of the corpus, pitch changes under differing conditions were captured, a full range of viseme patterns was available for parameterization, and several candidate viseme patterns for simulating a speaker's emotional activity were captured across the 96 motion capture sessions.

The motion capture data collection design included many features to facilitate the data segmentation process. A wired microphone was used for sound capture. A head jig provided the motion capture equipment a reference set of points which were later used to remove rigid motion effects from the motion capture data. A rigid object with a second set of points was placed next to the speaker for advanced noise estimation. Motion capture was broken up into separate sessions, each tagged with both the session type and a flash card number. The speaker made a deliberate pause between words, phrases, and sentences, during which he closed his mouth. The audio capture was synced to the motion capture, so motion capture data frames and selected audio waveforms exhibiting prosody were time related.
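Removing rigid head motion with the jig's reference points can be illustrated with a simplified sketch that cancels only the translational component; a full pipeline would also remove rotation (e.g., via a least-squares rigid alignment against the jig points). All names here are hypothetical:

```python
def centroid(points):
    """Mean (x, y, z) of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def remove_rigid_translation(jig_ref, jig_frame, face_frame):
    """Shift a frame's face markers so the head-jig centroid matches its
    reference (rest) position, cancelling translational head motion.

    jig_ref:    jig marker positions in the reference frame
    jig_frame:  jig marker positions in the current frame
    face_frame: face marker positions in the current frame
    """
    ref_c = centroid(jig_ref)
    cur_c = centroid(jig_frame)
    shift = tuple(r - c for r, c in zip(ref_c, cur_c))
    return [tuple(x + s for x, s in zip(p, shift)) for p in face_frame]
```

Because the jig is rigidly attached to the head, any displacement of its points must be head motion rather than speech articulation, which is what justifies subtracting it from the face markers.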

3.3 Data Segmentation

The second phase involved computer-assisted data segmentation to remove noise, to adjust the timing of the phonemes, to match the captured data to prosody, and to clearly understand the data captured during the first phase. An open source text-to-speech program called Festival, together with Festvox, a Festival tool, was used to match the spoken phonemes to the 50 phonemes used in the CMU dictionary and to generate initial estimates of the start and stop times of the collected phonemes. The EMU Speech Database System, an audio analysis program, was used to fine-tune the start and stop times of these phonemes against the actual audio waveforms. EMU was also used to develop a hierarchical database of phones, phonemes, syllables, words, phrases, and utterances for future use with the TalkingHead software.

3.4 Inverse Mapping Algorithm

The primary inputs to the inverse mapping algorithm were the x, y, and z locations of the motion capture markers on the speaker's face as the speaker worked through the data generation plan using the prosody corpus. These inputs were compared to the virtual x, y, and z data produced by systematically varying combinations of individual facial model parameters: the virtual marker positions were translated by the resulting facial deformation, and these translated coordinates were then subtracted from the motion capture data for each frame to determine the error. The combination of parameters with the smallest distance error for each frame was taken to be the locally optimal set of parameters for a particular phoneme/viseme combination. The parameter combination for each frame was stored so it could later be used to play back the prosodic features of a particular phoneme, syllable, word, or sentence.

The Talking Head model code, mostly MoCap.C, was modified to create the new code that extracts facial model parameters. This code is given in Appendix C - Algorithm Code. The MoCap.C subroutine MoCapGUICB was modified to control the estimation of parameters on a frame-by-frame basis.


The TalkingHead.C subroutine estimateJawParamFromMocap was used to provide an initial estimate for the Jaw Open parameter, and a second TalkingHead.C subroutine provided an initial estimate for the OrbOris parameter.

The error between the measured markers and the virtual markers was calculated in the newly created TalkingHead.C subroutine markerError, which subtracts the virtual marker locations from the MoCap marker locations.

The newly created subroutine TalkingHead::estimateParamsFromMocap(int f, float t) then estimated 17 parameters to produce a locally optimal phoneme-viseme set for each frame. The subroutine searches realistic values of each parameter in a prioritized order of importance. Once the higher priority parameters were determined, they were used as givens in the remaining searches to reduce computation time. The parameters were estimated in a six-step search process, in the following order: k0, k1; k3, k12, k14; k4, k5, k6; k7, k15, k16; k11, k17, k18; and k8, k9, k10. A description of the parameters follows:

Step 1
  k0 = OPEN_JAW - opens jaw
  k1 = JAW_IN - moves jaw in
Step 2
  k3 = ORB_ORIS - contracts lips, making mouth opening smaller
  k12 = DEP_INF - opens both lips
  k14 = MENTALIS - pulls lips together
Step 3
  k4 = L_RIS, Left Risorius - moves left corner towards ear
  k5 = R_RIS, Right Risorius - moves right corner towards ear
  k6 = L_PLATYSMA, Left Platysma - moves left corner downward and lateral
Step 4
  k7 = R_PLATYSMA, Right Platysma - moves right corner downward and lateral
  k15 = L_BUCCINATOR - pulls back at left corner
  k16 = R_BUCCINATOR - pulls back at right corner
Step 5
  k11 = R_LEV_SUP - moves right top lip up
  k17 = INCISIVE_SUP - top lip rolls over bottom lip
  k18 = INCISIVE_INF - bottom lip rolls over top lip
Step 6
  k8 = L_ZYG, Left Zygomaticus - raises corner up and lateral
  k9 = R_ZYG, Right Zygomaticus - raises corner up and lateral
  k10 = L_LEV_SUP - moves left top lip up
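The prioritized six-step search can be sketched as a staged greedy search: each stage exhaustively tries candidate values for its small group of parameters while holding the winners of earlier stages fixed. This is an illustrative Python reconstruction (the thesis implementation is the C++ code in Appendix C); the `error` function and candidate grids are placeholders:

```python
from itertools import product

# Parameter groups searched in priority order (the six-step process above).
STAGES = [
    ("k0", "k1"),
    ("k3", "k12", "k14"),
    ("k4", "k5", "k6"),
    ("k7", "k15", "k16"),
    ("k11", "k17", "k18"),
    ("k8", "k9", "k10"),
]

def staged_search(error, candidates, stages=STAGES):
    """Greedy staged minimization of error(params).

    error:      maps a {name: value} dict to a scalar fitting error
    candidates: maps each parameter name to its list of realistic values
    Earlier stages' winners are held fixed ("used as givens") in later
    stages, which keeps the search far cheaper than a full grid search.
    """
    fixed = {name: candidates[name][0] for stage in stages for name in stage}
    for stage in stages:
        best_combo, best_err = None, float("inf")
        for combo in product(*(candidates[n] for n in stage)):
            trial = dict(fixed, **dict(zip(stage, combo)))
            err = error(trial)
            if err < best_err:
                best_combo, best_err = combo, err
        fixed.update(zip(stage, best_combo))
    return fixed
```

Because earlier winners are frozen, the cost grows with the sum of the per-stage grids rather than their product, which is the computation-time saving the prioritized ordering buys.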


4. EVALUATION AND RESULTS

This section describes the evaluation and the results. During the first phase, evaluation consisted primarily of a statistical analysis to determine the completeness of phoneme coverage and of developing a theory-based prosody corpus. The second phase evaluation consisted of applying data segmentation techniques and developing a methodology to segment the individual phoneme-viseme pairs collected during motion capture. This included a common-sense examination of the motion capture data, listening to the audio while viewing the waveforms to remove noise, adjusting the start and stop times of the phonemes, matching the captured data to prosody, and understanding the strengths and weaknesses of the motion capture data. During the third phase, prosody parameters (visemes) were generated from the motion capture data.

4.1 Corpus Evaluation

The primary way the corpus was tested, prior to using it to collect prosody data during motion capture, was through statistical analysis. Specifically, a histogram was developed showing the frequency distribution of all 50 phonemes, to ensure complete coverage of these phonemes during data capture.
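This coverage check can be sketched as a simple frequency count over the corpus's phoneme transcriptions. The transcriptions here are placeholder examples; in the project they would come from the CMU dictionary lookup:

```python
from collections import Counter

def phoneme_histogram(transcriptions):
    """Count how often each phoneme appears across the corpus.

    transcriptions: iterable of phoneme lists, one per corpus word.
    """
    counts = Counter()
    for phones in transcriptions:
        counts.update(phones)
    return counts

def coverage_gaps(counts, inventory):
    """Return the phonemes from the target inventory never seen in the corpus,
    i.e. the coverage gaps the histogram is meant to expose."""
    return sorted(p for p in inventory if counts[p] == 0)
```

Any phoneme from the 50-symbol inventory with a zero count flags a word or sentence that still needs to be added to the corpus.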

4.2 Corpus Results

The research into prosody determined that the primary aspects of prosody in standard American English are word stress, intonation, and vowel length; secondary aspects are sentence stress and rhythm. These are the prosodic attributes the corpus was developed to study.


4.2.1 Suprasegmentals

Suprasegmentals (prosodic features) are fundamental to the definition of prosody in linguistics research. Their effects are said to be “superimposed on” or “distributed over” strings of segments [Lehiste 1970].

In standard American English, a phoneme is a segment. A suprasegmental's domain is larger than a single segment and may apply to an entire syllable or word. To study suprasegmentals, comparisons or relative values must be used; e.g., to decide whether a vowel is long it must be compared to another vowel. Different suprasegmental aspects also interact; e.g., stressed vowels tend to be lengthened.

4.2.2 Word Stress

Word stress is a primary prosodic effect in standard American English. Stressed syllables are produced with extra force or increased muscular effort. Stress differences are used to distinguish words containing the same sounds, e.g., CONvict (noun) and conVICT (verb). Some guidelines exist for two-syllable words, but in general stress is not predictable. In syllables with stressed vowels there is a slight change of pitch just prior to the next phoneme.

4.2.3 Intonation

Intonation is a primary prosodic effect in standard American English. Intonation

refers to the function of pitch (vocal fold vibration or fundamental frequency) at the

phrase or sentence level [Lehiste 1970]. Pitch changes signal meaning differences, e.g.,

in statements, exclamations, and questions:

– You’re going.

– You’re going!

15

Page 20: Animated Speech Prosody Modelingsci.tamucc.edu/~cams/projects/264.pdf · fundamental in human communications. Prosody, for a person speaking in English, is the stress, intonation,

– You’re going?

Pitch change patterns also exist for commands; for where, what, why, and when questions; and for yes/no and tag questions.

4.2.4 Length

Length is less significant than word stress and intonation as a prosodic effect in standard American English. Length is the perceived duration of a sound, and quantity refers to how length is used in a language [Lehiste 1970]. Length changes are largely predictable and often depend on neighboring phonemes, for both vowels and consonants.
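One well-known example of this predictability, which the vowel-length word pairs in Appendix A (e.g., "seed"/"seat") were chosen to exercise, is that vowels are longer before voiced consonants than before voiceless ones. A minimal sketch; the phoneme set and the lengthening factor are illustrative assumptions, not measured values from this project.

```python
# Sketch: predicting vowel lengthening from the following consonant.
# Vowels before voiced consonants ("seed") run longer than before
# voiceless ones ("seat"). The 1.3 factor is a made-up placeholder.
VOICED = {"B", "D", "G", "Z", "ZH", "JH", "V", "DH", "M", "N", "NG", "L", "R"}

def predicted_vowel_length(base_length, next_phoneme):
    """Lengthen the vowel when the following consonant is voiced."""
    return base_length * (1.3 if next_phoneme in VOICED else 1.0)

print(predicted_vowel_length(0.10, "D"))  # "seed": lengthened
print(predicted_vowel_length(0.10, "T"))  # "seat": base length
```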

4.2.5 Sentence Stress and Rhythm

Sentence stress is often used in standard American English. Content words, such

as nouns and verbs, are normally stressed by speaking them more slowly or more

distinctly. Sentence stresses tend to occur at regular intervals, so English is said to be a

stress-timed language. Poetry and the rhythm of a poem are an example of the use of

sentence stress to convey different meanings and images.

4.3 Data Segmentation Evaluation

The primary way the motion capture data was tested was through a labor-intensive analysis of the motion capture and audio data using two software programs. These programs are extensively used in speech synthesis and linguistics research. Data anomalies were identified, and judgment calls were made about the timing and validity of the data. For example, recordings with noise and/or other problems, such as the speaker saying extra words, were removed from the data.


4.4 Data Segmentation Results

The following figures show samples of the data segmentation results. Figure 4.1 shows an example of phoneme timing, and Figure 4.2 shows an example of the hierarchical database created for quickly finding phonemes and their characteristics.
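The hierarchical database can be pictured as nested levels (utterance, word, syllable, phoneme), which is what makes phoneme lookup fast. The sketch below illustrates that kind of lookup over a hand-built two-word fragment; it is not the EMU database itself, and the data are illustrative.

```python
# Sketch of the word -> syllable -> phoneme nesting that the EMU
# hierarchical database encodes (Figure 4.2). Data are illustrative.
hierarchy = {
    "about": {"a": ["AH0"], "bout": ["B", "AW1", "T"]},
    "judgment": {"judg": ["JH", "AH1", "JH"], "ment": ["M", "AH0", "N", "T"]},
}

def find_words_with(phoneme):
    """Return every word whose syllables contain the given phoneme."""
    return sorted(
        word
        for word, syllables in hierarchy.items()
        if any(phoneme in phones for phones in syllables.values())
    )

print(find_words_with("AH0"))  # ['about', 'judgment']
print(find_words_with("AW1"))  # ['about']
```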

Figure 4.1: EMU Segmentation Results – Timing


Figure 4.2: EMU Segmentation Results - Hierarchy

Figure 4.3: EMU Segmentation Results – Query Tool


Figure 4.4: EMU Segmentation Results – Query Results

Figure 4.3 shows the EMU query tool searching for the phoneme “IH0” in the sound wave, and Figure 4.4 shows the results of that search. It is now possible to create libraries of the 50 phonemes and their timing characteristics. Appendix B – Label and Hierarchy Files contains an example of the *.lab file (which gives the phoneme timing and is used in TalkingHead) and the *.hlb file (which gives the utterance hierarchy).

4.5 Prosody Parameter Evaluation

The intent of this phase was to determine a set of prosody parameters for each frame that represented a locally optimal solution for each phoneme-viseme pair. During each developmental step of the computer software used to implement the inverse mapping algorithm, the new code was white-box tested to ensure the appropriate action was happening. This was necessary because the new code had to interface with the TalkingHead model code, which is over 50 Mbytes in size and represents numerous years of research work. Also, as with any code project of this magnitude, there were numerous coding techniques, object-oriented relationships, and OpenInventor (a 3D graphics toolkit built on OpenGL) techniques that had to be studied.

4.6 Prosody Parameter Results

A careful comparison of Figure 4.1 and Figure 4.5 for the phonemes AH1 and AH0, respectively, clearly shows the prosodic effect of stress on the phoneme AH. AH0 is the non-stressed form, as indicated by the “0” following AH; AH1 is the same phoneme stressed, as indicated by the “1” following AH. The AH1 audio waveform in “judgment” differs in the time domain, the frequency domain, and pitch.

Figure 4.5: EMU Segmentation Results for “about”


This project has captured similar information for every phoneme, syllable, word,

phrase, and utterance in the prosody experimental corpus. Associated with every audio

prosody event captured is a phoneme-viseme parameter set to animate the Talking Head

model in real time. An example phoneme-viseme parameter set for the word “about” is given in Appendix D.

Figure 4.6 gives the top 5 viseme parameters for the MoCap frames associated with saying the word “about”. These parameters move the Talking Head model and thus exhibit the use of the non-stressed phoneme AH0. The word “about” itself spans approximately frame 1073 to frame 1137.

Figure 4.6: Top 5 Key Parameter Values for “about”

Another set of parameters move the Talking Head model differently for the word

“judgment” for the stressed phoneme AH1. Thus, it is now possible to compare and to


study both the audio differences between AH0 and AH1 and the visual differences caused

by audio prosody.

Figure 4.7 graphically represents the six-step search process and an example of its results. The line labeled “Guess” was a heuristic approach to model two parameters quickly. The line labeled “Initial” was the best-case error given the current limitations of the Talking Head model. These limitations included the model being based on the speaker’s face as it was several years ago, along with errors in the MoCap and virtual-marker determination process.

Figure 4.7: Error Reduction for “about”

Step 1, which used 2 different parameters, was the first step in the process and immediately produced better results than “Guess” before, during, and immediately after “about” was voiced. Each subsequent step converged to the limiting “Initial” error in the virtual marker data. Similar search results were obtained with the words “become” and “judgment.” Also, only 3 of the 117 frames for “about” produced errors of 28-35 millimeters across the 44 marker locations studied.
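The stepwise error reduction reported in Figure 4.7 can be sketched as a greedy search that nudges one parameter at a time and keeps any change that lowers the marker error. This is a simplified stand-in for the project's inverse mapping algorithm: the quadratic error function, targets, step size, and number of passes are all illustrative assumptions.

```python
# Sketch of a stepwise parameter search of the kind Figure 4.7 reports:
# adjust parameters one at a time, keep changes that reduce marker error.
# The error function and target values are illustrative assumptions.
def marker_error(params, targets):
    """Toy stand-in for the distance between model and MoCap markers."""
    return sum((p - t) ** 2 for p, t in zip(params, targets))

def refine(params, targets, step=0.1, passes=6):
    params = list(params)
    for _ in range(passes):                    # one pass per search "step"
        for i in range(len(params)):
            for delta in (+step, -step):
                trial = params[:]
                trial[i] += delta
                if marker_error(trial, targets) < marker_error(params, targets):
                    params = trial             # keep only improvements
    return params

targets = [0.3, -0.2, 0.5]
before = marker_error([0.0, 0.0, 0.0], targets)
after = marker_error(refine([0.0, 0.0, 0.0], targets), targets)
print(after < before)  # True: each pass can only lower the error
```

Because every accepted move strictly reduces the error, the search converges toward a local optimum, mirroring how the project's steps converged to the limiting "Initial" error.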

[Chart: rank (lower means more impact) versus parameter number for the words “about”, “become”, and “judgment”.]

Figure 4.8: Word Parameter Impact

[Chart: average rank (lower means more impact) versus parameter number.]

Figure 4.9: Parameter Impact


Figures 4.8 and 4.9 clearly show that lip movement parameters, followed by jaw movement parameters, dominate the prosody parameters. These figures were created by rank ordering the impact each parameter had in describing the facial movements during the speaking of three different words. These words were chosen to give a wide range of facial movements due to the speaking of different phonemes. A lower number means a higher rank; e.g., a rank of 1 indicates that the parameter was the dominant parameter produced by the inverse mapping algorithm.
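The per-word rankings behind Figure 4.8 can be combined into the average ranks of Figure 4.9 with a simple mean over words. A minimal sketch; the rank values below are made up for illustration and are not the project's measured ranks.

```python
# Sketch: combining per-word parameter rankings (Figure 4.8) into an
# average rank per parameter (Figure 4.9). Lower means more impact.
# The rank values are illustrative, not measured results.
ranks = {
    "about":    {0: 1, 1: 3, 3: 2},   # parameter number -> rank
    "become":   {0: 2, 1: 1, 3: 3},
    "judgment": {0: 1, 1: 2, 3: 3},
}

def average_rank(param):
    """Mean rank of a parameter across all words."""
    values = [word_ranks[param] for word_ranks in ranks.values()]
    return sum(values) / len(values)

# Parameter 0 dominates on average across the three words:
print(min((0, 1, 3), key=average_rank))  # 0
```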


5. FUTURE WORK

Now that a research scope has been established, there are several areas where future work can extend this project. One of the main tasks is improving the “Initial” error. This will involve updating the Talking Head facial model to reflect changes in the speaker’s face and improving the translation process for both the MoCap data and the virtual marker data.

Finding a better way to estimate the parameter sets from motion capture data is another primary task. It is also envisioned that feedback loops could be used to optimize a parameter set for each type of prosody and phoneme used in the corpus.

Visemes can be evaluated by comparing the movies taken during each motion capture session with the Talking Head facial model saying the same word or phrase with the same prosodic effects. This comparison gave a high-level indication of how well the methodology worked.

Libraries of model parameters (visemes) could be developed at the phoneme level. Different words that contain the same phonemes could be used to determine how well the parameter set developed from one use of a phoneme applies to a different use of that phoneme. This would give a more quantifiable measure of the prosody effects captured and lead to the Talking Head exhibiting real-time prosodic effects.

Model parameters (visemes) could also be studied at the word, phrase, or sentence level, spoken at different volume or speed levels. This would give a range of parameter sets for each of the prosody effects captured.


Finally, some of the activities in each phase could be improved upon. Low-priority improvements were not attempted because of the limited time available for this project. Also, applying the methodology to all of the motion capture data was not attempted.

6. CONCLUSION

This project is important because it helps explain how prosody changes an animated figure’s face as the figure speaks. A person’s use of prosody during speech causes his/her mouth, face, jaw, and lips to visually change. These changes have been clearly documented by this project. This project also clearly demonstrates that audio prosody has a visual signal associated with it.

This project has produced a research scope and an experimental methodology that can help in future explorations of visemes in human communications. It has also produced several results that are significant to the use of this methodology. The finding that the “Initial” error in the MoCap and virtual-marker translation process needs to be improved is important for developing trust in the inverse mapping algorithm. The finding that the inverse mapping algorithm converges to the “Initial” error indicates that a tailored search procedure for a locally optimal solution is workable and a great improvement over simplistic heuristic models. Whether a globally optimal solution is even necessary to exhibit prosody in an animated facial model is now in question. The finding that lip movement and jaw movement are the primary parameters for describing prosodic events needs to be further explored. All of these findings, as they are further explored and validated, will lead to more realistic facial animations and improved human-computer interaction with future computer systems.


BIBLIOGRAPHY AND REFERENCES

[Bowen 1975] Bowen, J. Donald. Patterns of ENGLISH Pronunciation. Newbury House Publishers, Inc., Rowley, Massachusetts, 1975.

[Bronstein 1960] Bronstein, Arthur J. The Pronunciation of AMERICAN ENGLISH, An Introduction to Phonetics. Appleton-Century-Crofts, Inc., New York, N.Y., 1960.

[Clarey 1963] Clarey, M. Elizabeth and Dixson, Robert J. Pronunciation Exercises in English. Regents Publishing Company, Inc., New York, N.Y., 1963.

[CMU 2005] Carnegie Mellon University. The CMU Pronouncing Dictionary. Available from http://www.speech.cs.cmu.edu/cgi-bin/cmudict (visited Oct. 30, 2005).

[Coleman 2005] Oxford University Phonetics Laboratory. Prosody (and Suprasegmentals). Available from www.phon.ox.ac.uk/~jcoleman/PROSODY.htm (visited Oct. 4, 2005).

[DeCarlo 2002] DeCarlo, Doug, Stone, Matthew, Revilla, Corey, and Venditti, Jennifer J. Making discourse visible: coding and animating conversational facial displays. Proceedings of Computer Animation, (June 2002), 11-16.

[DeCarlo 2004] DeCarlo, Doug, Stone, Matthew, Revilla, Corey, and Venditti, Jennifer J. Specifying and animating facial signals for discourse in embodied conversational agents. Computer Animation and Virtual Worlds, 15 (2004), 27-38.

[DeCarlo 2005] DeCarlo, Doug and Stone, Matthew. The Rutgers University Talking Head: RUTH. Available from http://www.cs.rutgers.edu/~village/ruth/ruthmanual10.pdf (visited Nov. 6, 2005).

[Edge 2001] Edge, James D. and Maddock, Steve. Expressive Visual Speech using Geometric Muscle Functions. Proceedings of Eurographics UK 2001, (2001).

[Edge 2003] Edge, James D. and Maddock, Steve. Image-based Talking Heads using Radial Basis Functions. Proceedings of the Theory and Practice of Computer Graphics 2003, (2003).

[Edge 2003] Edge, James D., Lorenzo, Manuel S., King, Scott A., and Maddock, Steve. Use and Re-use of Facial Motion Capture Data. Proceedings of Vision, Video, and Graphics 2003, University of Bath, (July 2003), 135-142.

[Edwards 1983] Edwards, Mary Louise and Shriberg, Lawrence D. Phonology: Applications in Communicative Disorders. College-Hill Press, Inc., San Diego, California, 1983.

[Edwards 1986] Edwards, Mary Louise. Introduction to Applied Phonetics, Laboratory Workbook. College-Hill Press, San Diego, California, 1986.

[Geroch 2004] Geroch, Margaret S. Motion Capture for the Rest of Us. Journal of Computing Sciences in Colleges, (2004).

[Goecke 2004] Goecke, Roland. A Stereo Vision Lip Tracking Algorithm and Subsequent Statistical Analyses of the Audio-Video Correlation in Australian English. Ph.D. Thesis, The Australian National University, Canberra, Australia, 2004.

[Harrington 1999] Harrington, Jonathan and Cassidy, Steve. Techniques in Speech Acoustics. Kluwer Academic Publishers, Boston, Massachusetts, 1999.

[Harris 2005] Harris, Randy Allen. Voice Interaction Design: Crafting the New Conversational Speech Systems. Morgan Kaufmann Publishers, Boston, Massachusetts, 2005.

[Horne 2000] Horne, Merle, editor. Prosody: Theory and Experiment, Studies Presented to Gosta Bruce. Kluwer Academic Publishers, Boston, Massachusetts, 2000.

[King 2000] King, Scott A., Parent, Richard E., and Olsafsky, Barbara. An Anatomically-Based 3D Parametric Lip Model to Support Facial Animation and Synchronized Speech. Proceedings of Deform 2000, 29-30 November, Geneva, (2000), 7-19.

[King 2001] King, Scott A. A Facial Model and Animation Techniques for Animated Speech. Ph.D. Thesis, Ohio State University, Columbus, Ohio, 2001.

[Ladefoged 1975] Ladefoged, Peter. A Course in Phonetics. Harcourt Brace Jovanovich, Inc., New York, N.Y., 1975.

[Lehiste 1970] Lehiste, I. Suprasegmentals. The M.I.T. Press, Cambridge, Massachusetts, 1970.

[Lieberman 1967] Lieberman, Philip. Intonation, Perception, and Language. The M.I.T. Press, Cambridge, Massachusetts, 1967.

[O’Shaughnessy 2000] O’Shaughnessy, Douglas. Speech Communications: Human and Machine, 2nd ed. The Institute of Electrical and Electronics Engineers, Inc., New York, N.Y., 2000.

[Sakiey 1980] Sakiey, Elizabeth, Fry, Edward, Goss, Albert, and Loigman, Barry. A Syllable Frequency Count. Visible Language (originally published as The Journal of Typographical Research), Vol. 14.2, (1980).

[Somasundaram 2005] Somasundaram, Arunachalam and Parent, Rick. Audio-Visual Speech Styles with Prosody. Eurographics/ACM SIGGRAPH Symposium on Computer Animation, Posters and Demos, 2005.

[Wennerstrom 2001] Wennerstrom, Ann. The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford University Press, Inc., New York, N.Y., 2001.

[Wikipedia 2005] Wikipedia. Available from en.wikipedia.org/wiki/ (visited Oct. 2, 2005).


APPENDIX A – PROSODY CORPUS

The current CMU phoneme set has 39 phonemes, not counting variants for lexical stress.

Phoneme Example Translation
------- ------- -----------
AA      odd     AA D
AE      at      AE T
AH      hut     HH AH T
AO      ought   AO T
AW      cow     K AW
AY      hide    HH AY D
B       be      B IY
CH      cheese  CH IY Z
D       dee     D IY
DH      thee    DH IY
EH      Ed      EH D
ER      hurt    HH ER T
EY      ate     EY T
F       fee     F IY
G       green   G R IY N
HH      he      HH IY
IH      it      IH T
IY      eat     IY T
JH      gee     JH IY
K       key     K IY
L       lee     L IY
M       me      M IY
N       knee    N IY
NG      ping    P IH NG
OW      oat     OW T
OY      toy     T OY
P       pee     P IY
R       read    R IY D
S       sea     S IY
SH      she     SH IY
T       tea     T IY
TH      theta   TH EY T AH
UH      hood    HH UH D
UW      two     T UW
V       vee     V IY
W       we      W IY
Y       yield   Y IY L D
Z       zee     Z IY
ZH      seizure S IY ZH ER
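The stress variants mentioned above append a digit to each vowel: 1 for primary stress, 2 for secondary, 0 for unstressed. A word's stress pattern can therefore be read directly from its transcription; a minimal sketch:

```python
# Sketch: reading the lexical-stress pattern from a CMU-style
# transcription, where a trailing digit on a vowel marks stress
# (1 = primary, 2 = secondary, 0 = unstressed).
def stress_pattern(phonemes):
    """Return the stress digits of the vowels, in order."""
    return [int(p[-1]) for p in phonemes if p[-1].isdigit()]

print(stress_pattern(["AH0", "B", "AW1", "T"]))                   # about -> [0, 1]
print(stress_pattern(["JH", "AH1", "JH", "M", "AH0", "N", "T"]))  # judgment -> [1, 0]
```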


Two-syllable words (Word Stress) Slide 11 1’ – 2 ma’rry

marry M EH1 R IY0 .

ju’dgment judgment JH AH1 JH M AH0 N T .

la’ter later L EY1 T ER0 .

1 – 2’ abo’ut

about AH0 B AW1 T .

beco’me become B IH0 K AH1 M .

secu’re secure S IH0 K Y UH1 R .

ali’gn align AH0 L AY1 N .

disea’se disease D IH0 Z IY1 Z .

Two-syllable words (Word Stress) Slide 12 1’ – 2` gre’enhou`se

greenhouse G R IY1 N HH AW2 S .

dru’gsto`re drugstore D R AH1 G S T AO2 R .

bla’ckou`t blackout B L AE1 K AW2 T .

i’ncli`ne incline (IH0 N K L AY1 N | IH1 N K L AY0 N) .


1` – 2’ my`se’lf

myself M AY2 S EH1 L F .

hi`mse’lf himself HH IH0 M S EH1 L F .

po`stpo’ne postpone (P OW0 S T P OW1 N | P OW0 S P OW1 N) .

Two-syllable words (Word Stress) Slide 13 1’ – 2` (noun) co’ndu`ct

conduct (K AH0 N D AH1 K T | K AA1 N D AH0 K T) .

su’spe`ct suspect (S AH0 S P EH1 K T | S AH1 S P EH2 K T) .

pe’rmi`t permit (P ER0 M IH1 T | P ER1 M IH2 T) .

re’bel rebel (R EH1 B AH0 L | R IH0 B EH1 L) .

i’nse`rt insert (IH0 N S ER1 T | IH1 N S ER2 T) .

pro’te`st protest (P R OW1 T EH2 S T | P R AH0 T EH1 S T) .

1 – 2’ (verb) condu’ct suspe’ct permi’t rebe’l inse’rt prote’st


Three-syllable words (Word Stress) Slide 14 1’ – 2 – 3` me’mori`ze

memorize M EH1 M ER0 AY2 Z .

ma’gni`fy magnify M AE1 G N AH0 F AY2 .

we’ather va`ne weather vane W EH1 DH ER0 . V EY1 N .

1` – 2 – 3’ pre`matu’re

premature P R IY2 M AH0 CH UH1 R .

a`scerta’in ascertain AE2 S ER0 T EY1 N .

o`verco’me overcome OW1 V ER0 K AH2 M .

u`ndersta’nd understand AH2 N D ER0 S T AE1 N D .

i`ntrodu’ce introduce (IH2 N T R AH0 D UW1 S | IH2 N T R OW0 D UW1 S) .

Four-syllable words (Word Stress) Slide 15 1` – 2 – 3’ – 4 e`duca’tion

education (EH2 JH AH0 K EY1 SH AH0 N | EH2 JH Y UW0 K EY1 SH AH0 N) .

co`ntribu’tion contribution K AA2 N T R AH0 B Y UW1 SH AH0 N .

a`boli’tion abolition AE2 B AH0 L IH1 SH AH0 N .

de`moli’tion demolition D EH2 M AH0 L IH1 SH AH0 N .

si`tua’tion


situation S IH2 CH UW0 EY1 SH AH0 N .

sci`enti’fic scientific S AY2 AH0 N T IH1 F IH0 K .

1’ – 2 – 3` – 4 nego’tia`te

negotiate (N AH0 G OW1 SH IY0 EY2 T | N IH0 G OW1 SH IY0 EY2 T) .

apo’logi`ze apologize AH0 P AA1 L AH0 JH AY2 Z .

ele’ctrify` electrify IH0 L EH1 K T R AH0 F AY2 .

Extra words 1 (Word Stress) Slide 16 a’prico`t

apricot EY1 P R AH0 K AA2 T .

a’mbulato`ry ambulatory AE1 M B Y AH0 L AH0 T AO2 R IY0 .

chi’cken chicken CH IH1 K AH0 N .

a’irwo`rthy airworthy EH1 R W ER2 DH IY0 .

a’irfie`ld airfield EH1 R F IY2 L D .

po’licyho`lder policyholder P AA1 L AH0 S IY0 HH OW2 L D ER0 .

ra’dio` radio R EY1 D IY0 OW2 .

destroy’ destroy D IH0 S T R OY1 .

co’wboy` cowboy K AW1 B OY2 .


abra’sion abrasion AH0 B R EY1 ZH AH0 N .

ca’moufla`ge camouflage K AE1 M AH0 F L AA2 ZH .

Extra words 2 (Word Stress) Slide 17 fla’shpoi`nt

flashpoint F L AE1 SH P OY2 N T .

po’laroi`d polaroid P OW1 L ER0 OY2 D .

the’ta theta TH EY1 T AH0 .

thu’nder thunder TH AH1 N D ER0 .

o’verloo`k overlook OW1 V ER0 L UH2 K .

ca’two`man catwoman K AE1 T W UH2 M AH0 N .

epi’cu`re epicure EH1 P IH0 K Y UH2 R .

sei’zure seizure S IY1 ZH ER0 .

a’zure azure AE1 ZH ER0 .

e’vildo`er evildoer IY1 V AH0 L D UW2 ER0 .

fi’reproo`f fireproof F AY1 ER0 P R UW2 F .


No Vowel Stress 1 - Extra Words Slide 18 ai’rdrop

airdrop EH1 R D R AA0 P .

a’lgol algol AE1 L G AA0 L .

amba’ssador ambassador AE0 M B AE1 S AH0 D ER0 .

anta’cid antacid AE0 N T AE1 S AH0 D .

augme’nt augment AO0 G M EH1 N T .

auto’matio`n automation AO0 T AH0 M EY1 SH AH0 N .

bloo’dhound bloodhound B L AH1 D HH AW0 N D .

co’mpound compound (K AA1 M P AW0 N D | K AH0 M P AW1 N D) .

co’mbine combine (K AA1 M B AY0 N | K AH0 M B AY1 N) .

No Vowel Stress 2 - Extra Words Slide 19 co’mment

comment K AA1 M EH0 N T .

co’mpress compress (K AA1 M P R EH0 S | K AH0 M P R EH1 S) .

disobeye’d disobeyed D IH2 S OW0 B EY1 D .

elli’psoid ellipsoid IH0 L IH1 P S OY0 D .

Ha’noi


Hanoi HH AE1 N OY0 .

co’rrelating correlating K AO1 R AH0 L EY0 T IH0 NG .

de`tainee’ detainee D IY2 T EY0 N IY1 .

hooray’ hooray HH UH0 R EY1 .

jura’ssic jurassic JH UH0 R AE1 S IH0 K .

Statements (Intonation)* Slide 20

• We hi’d it. • We hid it

W IY1 . HH IH1 D . (IH1 T | IH0 T) . • John might se’e us. • John might see us

JH AA1 N . M AY1 T . S IY1 . (AH1 S | Y UW1 EH1 S) . • He just returned from a me’eting. • He just returned from a meeting

HH IY1 . (JH AH1 S T | JH IH0 S T) . (R IH0 T ER1 N D | R IY0 T ER1 N D) . (F R AH1 M | F ER0 M) . (AH0 | EY1) . M IY1 T IH0 NG .

• You should ge’t it. • You should get it

Y UW1 . SH UH1 D . (G EH1 T | G IH1 T) . (IH1 T | IH0 T) . • It was ea’rly. • It was early

(IH1 T | IH0 T) . (W AA1 Z | W AH1 Z | W AH0 Z | W AO1 Z) . ER1 L IY0 . • I’ did. • I did

AY1 . (D IH1 D | D IH0 D) .

• We hi’d. • We hid

W IY1 . HH IH1 D .


• John might se’e. • John might see

JH AA1 N . M AY1 T . S IY1 . • He just returned from a tri’p. • He just returned from a trip

HH IY1 . (JH AH1 S T | JH IH0 S T) . (R IH0 T ER1 N D | R IY0 T ER1 N D) . (F R AH1 M | F ER0 M) . (AH0 | EY1) . T R IH1 P .

• He was la’te. • He was late

HH IY1 . (W AA1 Z | W AH1 Z | W AH0 Z | W AO1 Z) . L EY1 T .

Wh… Questions (Intonation) Slide 21

• Where is my su’itcase? • Where is my suitcase

(W EH1 R | HH W EH1 R) . (IH1 Z | IH0 Z) . M AY1 . S UW1 T K EY2 S . • Who is his bro’ther? • Who is his brother

HH UW1 . (IH1 Z | IH0 Z) . (HH IH1 Z | HH IH0 Z) . B R AH1 DH ER0 . • When’s the pa’rty? • When is the party

(W EH1 N | HH W EH1 N | W IH1 N | HH W IH1 N) . (IH1 Z | IH0 Z) . (DH AH0 | DH AH1 | DH IY0) . P AA1 R T IY0 .

• Who’s the o’wner? • Who is the owner

HH UW1 . (IH1 Z | IH0 Z) . (DH AH0 | DH AH1 | DH IY0) . OW1 N ER0 .

• Where is my su’it? • Where is my suit

(W EH1 R | HH W EH1 R) . (IH1 Z | IH0 Z) . M AY1 . S UW1 T . • Who is he’? • Who is he

HH UW1 . (IH1 Z | IH0 Z) . HH IY1 . • When’s the da’nce? • When is the dance

(W EH1 N | HH W EH1 N | W IH1 N | HH W IH1 N) . (IH1 Z | IH0 Z) . (DH AH0 | DH AH1 | DH IY0) . D AE1 N S .

• Where is my co’at? • Where is my coat

(W EH1 R | HH W EH1 R) . (IH1 Z | IH0 Z) . M AY1 . K OW1 T . • Who just came i’n?


• Who just came in HH UW1 . (JH AH1 S T | JH IH0 S T) . K EY1 M . (IH0 N | IH1 N) .

Yes-no Questions (Intonation) Slide 22

• Did they co’me? • Did they come

(D IH1 D | D IH0 D) . DH EY1 . K AH1 M . • Can you go wi’th us? • Can you go with us

(K AE1 N | K AH0 N) . Y UW1 . G OW1 . (W IH1 DH | W IH1 TH | W IH0 TH | W IH0 DH) . (AH1 S | Y UW1 EH1 S) .

• Are you a’ngry? • Are you angry

(AA1 R | ER0) . Y UW1 . AE1 NG G R IY0 . • Have you met Bo’b? • Have you met Bob

HH AE1 V . Y UW1 . M EH1 T . B AA1 B . • Is that Su’san? • Is that Susan

(IH1 Z | IH0 Z) . (DH AE1 T | DH AH0 T) . S UW1 Z AH0 N .

Tag Questions (Intonation) Slide 23

You don’t want to go’, do’ you?

You don’t want to go, do you Y UW1 . D AA1 N . ? . T IY1 . (W AA1 N T | W AO1 N T) . (T UW1 | T IH0 | T

AH0) . G OW1 . ? . D UW1 . Y UW1 .

You don’t want to go’, do’ you?

You don’t want to go, do you Y UW1 . D AA1 N . ? . T IY1 . (W AA1 N T | W AO1 N T) . (T UW1 | T IH0 | T

AH0) . G OW1 . ? . D UW1 . Y UW1 .

They haven’t se’en him, ha’ve they? They haven’t seen him, have they

DH EY1 . HH EY1 V AH0 N . ? . T IY1 . S IY1 N . (HH IH1 M | IH0 M) . ? . HH AE1 V . DH EY1 .


They haven’t se’en him, ha’ve they? They haven’t seen him, have they

DH EY1 . HH EY1 V AH0 N . ? . T IY1 . S IY1 N . (HH IH1 M | IH0 M) . ? . HH AE1 V . DH EY1 .

Vowel Length Prosody* Slide 24

• faze face • faze

F EY1 Z . • face

F EY1 S . • mat mass • mat

M AE1 T . • mass

M AE1 S . • “H” age • H

EY1 CH . • age

EY1 JH . • maid mate • maid

M EY1 D . • mate

M EY1 T . • match Madge • match

M AE1 CH . • Madge

M AE1 JH . • seed seat • seed

S IY1 D . • seat

S IY1 T . • bead beat • bead

B IY1 D . • beat

B IY1 T . • kiss kit • kiss

K IH1 S .


• kit K IH1 T .

• bit bid • bit

B IH1 T . • bid

B IH1 D . • Bert bird • Bert

B ER1 T . • bird

B ER1 D . • log lock • log

L AO1 G . • lock

L AA1 K . • mop mob • mop

M AA1 P . • mob

M AA1 B . • moot moose • moot

M UW1 T . • moose

M UW1 S . • it suedsu • suit

S UW1 T . • sued

S UW1 D . • buck bug • buck

B AH1 K . • bug

B AH1 G .

Sentence Stress Prosody* Slide 25

• The bo’y bu’ilds ma’ny mo’dels. • The boy builds many models

(DH AH0 | DH AH1 | DH IY0) . B OY1 . B IH1 L D Z . M EH1 N IY0 . M AA1 D AH0 L Z .

• The bo’y is interested in constru’cting mo’dels. • The boy is interested in constructing models

(DH AH0 | DH AH1 | DH IY0) . B OY1 . (IH1 Z | IH0 Z) . (IH1 N T R AH0 S T AH0 D | IH1 N T R IH0 S T IH0 D | IH1 N T ER0 AH0 S T AH0 D | IH1 N T ER0 IH0 S T IH0 D) . (IH0 N | IH1 N) . K AH0 N S T R AH1 K T IH0 NG . M AA1 D AH0 L Z .

Twinkle, Twinkle, Little Star* (Rhythm) Slide 26

Twinkle, twinkle, little star,

T W IH1 NG K AH0 L . ? . T W IH1 NG K AH0 L . ? . L IH1 T AH0 L . S T AA1 R . ? .

How I wonder what you are.

HH AW1 . AY1 . W AH1 N D ER0 . (W AH1 T | HH W AH1 T) . Y UW1 . (AA1 R | ER0) . ? .

Up above the world so high,

AH1 P . AH0 B AH1 V . (DH AH0 | DH AH1 | DH IY0) . W ER1 L D . S OW1 . HH AY1 . ? .

Like a diamond in the sky.

L AY1 K . (AH0 | EY1) . D AY1 M AH0 N D . (IH0 N | IH1 N) . (DH AH0 | DH AH1 | DH IY0) . S K AY1 . ? .

Twinkle, twinkle, little star,

T W IH1 NG K AH0 L . ? . T W IH1 NG K AH0 L . ? . L IH1 T AH0 L . S T AA1 R . ? .

How I wonder what you are!

HH AW1 . AY1 . W AH1 N D ER0 . (W AH1 T | HH W AH1 T) . Y UW1 . (AA1 R | ER0) . ? .

Slide 27

When the blazing sun is gone,

(W EH1 N | HH W EH1 N | W IH1 N | HH W IH1 N) . (DH AH0 | DH AH1 | DH IY0) . B L EY1 Z IH0 NG . S AH1 N . (IH1 Z | IH0 Z) . G AO1 N . ? .

When he nothing shines upon,

(W EH1 N | HH W EH1 N | W IH1 N | HH W IH1 N) . HH IY1 . N AH1 TH IH0 NG . SH AY1 N Z . AH0 P AA1 N . ? .

Then you show your little light,

DH EH1 N . Y UW1 . SH OW1 . (Y AO1 R | Y UH1 R) . L IH1 T AH0 L . L AY1 T . ? .

Twinkle, twinkle, all the night.

T W IH1 NG K AH0 L . ? . T W IH1 NG K AH0 L . ? . AO1 L . (DH AH0 | DH AH1 | DH IY0) . N AY1 T . ? .

Twinkle, twinkle, little star,

T W IH1 NG K AH0 L . ? . T W IH1 NG K AH0 L . ? . L IH1 T AH0 L . S T AA1 R . ? .

How I wonder what you are!

HH AW1 . AY1 . W AH1 N D ER0 . (W AH1 T | HH W AH1 T) . Y UW1 . (AA1 R | ER0) . ? .


APPENDIX B – LABEL AND HIERARCHY FILES

Phoneme Timing File – a01.lab

signal a01
nfields 1
#
2.137369 125 H#
2.234768 125 m
2.301221 125 eh
2.435000 125 r
2.575000 125 iy
4.303340 125 pau
4.354888 125 jh
4.473623 125 ah
4.573807 125 jh
4.599780 125 m
4.683267 125 ah
4.745000 125 n
4.910000 125 t
6.328551 125 pau
6.421517 125 l
6.552619 125 ey
6.597144 125 t
6.735000 125 er
8.221845 125 pau
8.282010 125 ah
8.385000 125 b
8.540329 125 aw
8.609592 125 t
10.121434 125 pau
10.160000 125 b
10.210000 125 ih
10.328170 125 k
10.509793 125 ah
10.594078 125 m
11.969260 125 pau
12.085000 125 s
12.140000 125 ih
12.290000 125 k
12.350000 125 y
12.455000 125 uh
12.540000 125 r
13.784136 125 pau
13.850000 125 ah
13.945000 125 l
14.150000 125 ay
14.308383 125 n
15.563582 125 pau
15.592183 125 d
15.663299 125 ih
15.762243 125 z
16.042069 125 iy
16.230000 125 z
18.395000 125 pau
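The *.lab file above follows an EMU/ESPS-style label layout: header lines, then a `#` separator, then one row per segment giving an end time (seconds), a color/field value, and the phoneme label. A minimal parser sketch, based only on the sample shown here (the format assumptions are mine, not a specification):

```python
# Sketch: parsing the phoneme-timing (*.lab) file shown above.
# After the '#' separator line, each row is: end_time  color  label.
def parse_lab(text):
    """Return (end_time, label) pairs from an EMU/ESPS-style label file."""
    segments, in_body = [], False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            in_body = line == "#"   # skip headers until the separator
            continue
        if not line:
            continue
        time, _color, label = line.split()
        segments.append((float(time), label))
    return segments

sample = """signal a01
nfields 1
#
2.137369 125 H#
2.234768 125 m
2.301221 125 eh
"""
print(parse_lab(sample))  # [(2.137369, 'H#'), (2.234768, 'm'), (2.301221, 'eh')]
```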


Phoneme Hierarchical Labels File – a01.hlb

**EMU hierarchical labels**
183
Utterance Utterance
6 marry 10 pau 26 judgment 32 pau 34 later 144 pau 150 about 145 pau 45 become 158 pau 33 secure 27 pau 25 align 154 pau 7 disease 151 pau
Phrase Phrase
2 marry 15 pau 16 judgment 19 44 pau 141 later 146 pau 147 about 153 pau 157 become 163 pau 164 secure 172 pau 173 align 177 pau 178 disease 183 pau
Word Word Accent Text
1 w n marry 8 pau pau 9 w n judgment 31 pau pau 36 w n later 37 pau pau 142 w n about 143 pau pau 148 w n become


149 pau pau 155 w n secure 156 pau pau 160 w n align 161 pau pau 166 w n disease 167 pau pau
Syllable Syllable Pitch_Accent
0 mar y 3 ry n 4 pau 5 judg y 11 ment n 12 pau 13 lat y 14 er n 17 pau 18 a n 21 bout y 22 pau 23 be n 24 come y 28 pau 29 se n 30 cure y 35 pau 38 a n 39 lign y 40 pau 41 dis n 43 ease y 46 pau
Phoneme Phoneme
47 M 48 EH1 49 R 50 IY0 51 pau 52 JH 53 AH1 54 JH 55 M 56 AH0 57 N 58 T 59 pau 60 L 61 EY1 62 T 63 ER0 64 pau 65 AH0 66 B 67 AW1 68 T


69 pau 70 B 71 IH0 72 K 73 AH1 74 M 75 pau 76 S 77 IH0 78 K 79 Y 80 UH1 81 R 82 pau 83 AH0 84 L 85 AY1 86 N 87 pau 88 D 89 IH0 90 Z 91 IY1 92 Z 93 pau
Phonetic Phonetic
94 m 95 eh 96 r 97 iy 98 pau 99 jh 100 ah 101 jh 102 m 103 ah 104 n 105 t 106 pau 107 l 108 ey 109 t 110 er 111 pau 112 ah 113 b 114 aw 115 t 116 pau 117 b 118 ih 119 k 120 ah 121 m 122 pau 123 s


124 ih 125 k 126 y 127 uh 128 r 129 pau 130 ah 131 l 132 ay 133 n 134 pau 135 d 136 ih 137 z 138 iy 139 z 140 pau 0 47 48 49 94 95 96 1 0 3 47 48 49 50 94 95 96 97 2 0 1 3 47 48 49 50 94 95 96 97 3 50 97 4 51 98 5 52 53 54 99 100 101 6 0 1 2 3 47 48 49 50 94 95 96 97 7 41 43 88 89 90 91 92 135 136 137 138 139 166 178 8 4 51 98 9 5 11 52 53 54 55 56 57 58 99 100 101 102 103 104 105 10 4 8 15 51 98 11 55 56 57 58 102 103 104 105 12 59 106 13 60 61 62 107 108 109 14 63 110 15 4 8 51 98 16 5 9 11 52 53 54 55 56 57 58 99 100 101 102 103 104 105 17 64 111 18 65 112 19 21 66 67 68 113 114 115 22 69 116 23 70 71 117 118 24 72 73 74 119 120 121 25 38 39 83 84 85 86 130 131 132 133 160 173 26 5 9 11 16 52 53 54 55 56 57 58 99 100 101 102 103 104 105 27 35 82 129 156 172 28 75 122 29 76 77 123 124 30 78 79 80 81 125 126 127 128 31 12 59 106 32 12 31 44 59 106 33 29 30 76 77 78 79 80 81 123 124 125 126 127 128 155 164 34 13 14 36 60 61 62 63 107 108 109 110 141 35 82 129 36 13 14 60 61 62 63 107 108 109 110 37 17 64 111 38 83 130


39 84 85 86 131 132 133 40 87 134 41 88 89 90 135 136 137 43 91 92 138 139 44 12 31 59 106 45 23 24 70 71 72 73 74 117 118 119 120 121 148 157 46 93 140 47 94 48 95 49 96 50 97 51 98 52 99 53 100 54 101 55 102 56 103 57 104 58 105 59 106 60 107 61 108 62 109 63 110 64 111 65 112 66 113 67 114 68 115 69 116 70 117 71 118 72 119 73 120 74 121 75 122 76 123 77 124 78 125 79 126 80 127 81 128 82 129 83 130 84 131 85 132 86 133 87 134 88 135 89 136 90 137 91 138 92 139 93 140 94 95 96


97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 13 14 36 60 61 62 63 107 108 109 110 142 18 21 65 66 67 68 112 113 114 115 143 22 69 116 144 17 37 64 111 146 145 22 69 116 143 153 146 17 37 64 111 147 18 21 65 66 67 68 112 113 114 115 142 148 23 24 70 71 72 73 74 117 118 119 120 121 149 28 75 122 150 18 21 65 66 67 68 112 113 114 115 142 147 151 46 93 140 167 183 153 22 69 116 143 154 40 87 134 161 177


155 29 30 76 77 78 79 80 81 123 124 125 126 127 128 156 35 82 129 157 23 24 70 71 72 73 74 117 118 119 120 121 148 158 28 75 122 149 163 160 38 39 83 84 85 86 130 131 132 133 161 40 87 134 163 28 75 122 149 164 29 30 76 77 78 79 80 81 123 124 125 126 127 128 155 166 41 43 88 89 90 91 92 135 136 137 138 139 167 46 93 140 172 35 82 129 156 173 38 39 83 84 85 86 130 131 132 133 160 177 40 87 134 161 178 41 43 88 89 90 91 92 135 136 137 138 139 166 183 46 93 140 167 0


APPENDIX C – ALGORITHM CODE

//Talking Head model code and MoCap.C was written by Dr. Scott King.
//Stanley Leja modified the code and created new code to extract facial model parameters.
//Some of the modified code and all of the newly created code are shown below.

//This subroutine was modified to control the estimation of parameters on a frame-by-frame basis.
void MoCapGUICB(Widget w, XtPointer clientData, XtPointer user)
{
    CGCBData *cbdata = (CGCBData *) clientData;
    TalkingHead *th = ((TalkingHead *)(cbdata->obj));
    int which = cbdata->WhichScale;
    // cerr << "MoCapGUICB which " << which << endl;
    switch (which) {
    case -1:
        int Val;
        XtVaGetValues(w, XmNvalue, &Val, NULL);
        cbdata->GUI->setFrame((int) (Val * .0020 * cbdata->GUI->getMaxFrame()));
        break;
    case 1:
        cbdata->GUI->setFrame(cbdata->GUI->getFrame() - 1);
        break;
    case 2:
        cbdata->GUI->setFrame(cbdata->GUI->getFrame() + 1); //changed by Stan
        break;
    case 3: // Prev Phoneme
        //cbdata->GUI->setTime(cbdata->GUI->getTime() - 1/100.0);
        break;
    case 4: // Next Phoneme
        //cbdata->GUI->setTime(cbdata->GUI->getTime() + 1/100.0);
        break;
    default:
        cerr << "default which is " << which << "!\n";
        break;
    }
    int f = cbdata->GUI->getFrame();
    float t = ((float) f) / 120.0;
    th->SetShowKeysTime(t);
    th->DisplayMoCapFrame(f);


    cbdata->GUI->setTime(t);
    // cerr << "setting time to " << t << " and max is " << cbdata->GUI->getTime() << endl;
    cerr << "frame " << f << " \t Before estimating total markerError is " << th->markerError(f) << endl;
    if (th->estimatingParams) {
        // cerr << "We are estimating params\n";
        th->estimateParamsFromMocap(f, t);
        //cerr << "\nframe = " << f << "\tAfter estimating makerError is " << th->markerError(f) << endl;
    }
}

//This subroutine was used to provide the initial estimate for the Jaw Open parameter.
float TalkingHead::estimateJawParamFromMocap(int f)
{
    float d, v;
    static int StartFrame = 0; // 30 for some datasets!
    int ChinMarker;
    switch (_MoCapData->numMarkers) {
    case 31:
    case 32:
        ChinMarker = 1;
        break;
    case 73: // Same as 74 only no MNOSE (it fell off)
    case 74:
        ChinMarker = 31; // It is the 32nd marker
        break;
    case 75:
        ChinMarker = 31; // It is the 32nd marker
        break;
    case 90:
        ChinMarker = 78;
        break;
    default:
        ChinMarker = 1; // This will cause an error regardless.
        break;
    }
    d = (_MoCapData->Data[f][ChinMarker] - _MoCapData->Data[StartFrame][ChinMarker]).length();
    // Experimentation shows about 76 is the max.
    v = d / 76.0; // Should give us between 0 and 1
    if (v < 0) v = 0.0;


    if (v > 1) v = 1.0;
    return (v);
}

//This subroutine was used to provide the initial estimate for the OrbOris parameter.
float TalkingHead::estimateOrbOrisParamFromMocap(int f, float pJaw)
{
    float d, v;
    static int StartFrame = 0; // 30 for some datasets!
    int LMouthMarker, RMouthMarker;
    switch (_MoCapData->numMarkers) {
    case 31:
    case 32:
        // Mouth markers are
        // 13 LMTH:X LMTH:Y LMTH:Z
        // 16 LULP:X LULP:Y LULP:Z
        // 30 RULP:X RULP:Y RULP:Z
        // 27 RMTH:X RMTH:Y RMTH:Z
        // 24 RLLP:X RLLP:Y RLLP:Z
        // 10 LLLP:X LLLP:Y LLLP:Z
        LMouthMarker = 13;
        RMouthMarker = 27;
        break;
    case 75:
        // jig1:LOH,,,jig1:LIH,,,jig1:RMH,,,jig1:ROH,,,jig1:UPV,,,
        // jig1:MIDV,,,jig1:LOWV,,,sk:RHAIR,,,sk:HAIR,,,sk:LHAIR,,,
        // sk:RFOR,,,sk:LFOR,,,sk:FORE,,,sk:BRDG,,,sk:NOSE,,,      10
        // sk:RNOSE,,,sk:MNOSE,,,sk:LNOSE,,,sk:LMTH,,,sk:LIUL,,,
        // sk:LOUL,,,sk:MUL,,,sk:ROUL,,,sk:RIUL,,,sk:RMTH,,,       20
        // sk:RILL,,,sk:ROLL,,,sk:MLL,,,sk:LOLL,,,sk:LILL,,,
        // sk:RCHN,,,sk:CHIN,,,sk:LCHN,,,sk:RTMP,,,sk:RSID,,,      30
        // sk:REAR,,,sk:ROBW,,,sk:RMBW,,,sk:RIBW,,,sk:RUEYE,,,
        // sk:RIEYE,,,sk:REYE,,,sk:RUCB,,,sk:ROCB,,,sk:RMCB,,,     40
        // sk:RICB,,,sk:ROCK,,,sk:RMCK,,,sk:RICK,,,sk:RLJO,,,
        // sk:RUJO,,,sk:RUJI,,,sk:RLJI,,,sk:RLJM,,,sk:LTMP,,,      50
        // sk:LSID,,,sk:LEAR,,,sk:LOBW,,,sk:LMBW,,,sk:LIBW,,,
        // sk:LUEYE,,,sk:LIEYE,,,sk:LEYE,,,sk:LUCB,,,sk:LOCB,,,    60
        // sk:LMCB,,,sk:LICB,,,sk:LOCK,,,sk:LMCK,,,sk:LICK,,,
        // sk:LLJO,,,sk:LUJO,,,sk:LUJI,,,sk:LLJI,,,sk:LLJM         70
        LMouthMarker = 18;
        RMouthMarker = 24;
        break;
    case 90: // For 2005
        LMouthMarker = 65; // ?? is this right
        RMouthMarker = 59;
        //LMouthMarker = 66;
        //RMouthMarker = 60;


        /* This table gives a listing of all the MoCap marker location names:
            0 placeholder:CEN0 placeholder:CEN1 placeholder:CEN2 placeholder:R1
            4 placeholder:L1 placeholder:L2 SKfaceNEW:HDCENHI SKfaceNEW:HDCENLOW
            8 SKfaceNEW:HDRTUP SKfaceNEW:HDRTMID SKfaceNEW:HDRTBTM SKfaceNEW:HDLTUP
           12 SKfaceNEW:HDLTMID SKfaceNEW:HDLTBTM SKfaceNEW:BROWCENHI SKfaceNEW:BROWCENLOW
           16 SKfaceNEW:BROWRTOUT SKfaceNEW:BROWRTIN SKfaceNEW:BROWLTIN SKfaceNEW:BROWLTOUT
           20 SKface:RTEYEBROWOUT SKfaceNEW:RTEYEBROW3 SKfaceNEW:RTEYEBROW2 SKfaceNEW:RTEYEBROW1
           24 SKfaceN:RTEYEBROWIN SKfaceNEW:LTEYEBROWIN SKfaceNEW:LTEYEBROW1 SKfaceNEW:LTEYEBROW2
           28 SKfaceN:LTEYEBROW3 SKfaceNEW:LTEYEBROWOUT SKfaceNEW:RTEYELIDTOP SKfaceNEW:RTEYELIDBTMOUT
           32 SKfac:RTEYELIDBTMIN SKfaceNEW:LTEYELIDTOPIN SKface:LTEYELIDTOPOUT SKfaceNEW:LTEYELIDBTMOUT
           36 SKfa:LTEYELIDBTMIN SKfaceNEW:BRIDGE SKfaceNEW:TIPNOSE SKfaceNEW:RTNOSTRIL
           40 SKfaceNE:LTNOSTRIL SKfaceNEW:UNDERNOSERT SKfaceNEW:UNDERNOSECEN SKfaceNEW:UNDERNOSELT
           44 SKface:RTUPCHEEKUP SKface:RTUPCHEEKFRONT SKfaceNEW:RTUPCHEEKBACK SKfaceNEW:RTUPCHEEKLOW
           48 SK:RTLOWCHEEKFRONT SKfaceNEW:RTLOWCHEEKMID SKface:RTLOWCHEEKBACK SKfaceNEW:LTUPCHEEKUP
           52 SK:LTUPCHEEKFRONT SKfaceNEW:LTUPCHEEKBACK SKfaceNEW:LTUPCHEEKLOW SKfaceNEW:LTLOWCHEEKFRONT
           56 SKfa:LTLOWCHEEKMID SKface:LTLOWCHEEKBACK SKface:RTMOUTHCORNERLOW SKfaceNEW:(RTMOUTHCORNER)
           60 SK:RTMOUTHCORNERHI SKfaceNEW:UPMOUTHRT SKfaceNEW:UPMOUTHCEN SKfaceNEW:UPMOUTHLT
           64 SK:LTMOUTHCORNERHI SKface:(LTMOUTHCORNER) SKface:LTMOUTHCORNERLOW SKfaceNEW:DOWNMOUTHLT
           68 SKfac:DOWNMOUTHCEN SKfaceNEW:DOWNMOUTHRT SKfaceNEW:CHINRT SKfaceNEW:CHINHI
           72 SKfaceNEW:CHINLOW SKfaceNEW:CHINLT SKfaceNEW:RTJAWOUT SKfaceNEW:RTJAW2
           76 SKfaceNEW:RTJAW1 SKfaceNEW:RTJAWIN SKfaceNEW:CENJAW SKfaceNEW:LTJAWIN
           80 SKfaceNEW:LTJAW1 SKfaceNEW:LTJAW2 SKfaceNEW:LTJAWOUT SKfaceNEW:NECKRTBACK
           84 SKfaceNE:NECKRTMID SKfaceNEW:NECKRTFRONT SKfaceNEW:NECKCEN SKfaceNEW:NECKLTFRONT
           88 SKfaceNE:NECKLTMID SKfaceNEW:NECKLTBACK


        */
    }
    if ((_MoCapData->numMarkers == 31) || (_MoCapData->numMarkers == 32)) {
        SbVec3f Dir = (_MoCapData->Data[f][LMouthMarker] - _MoCapData->Data[f][RMouthMarker]);
        d = (_MoCapData->Data[f][LMouthMarker] - _MoCapData->Data[f][RMouthMarker]).length();
        if (d > 57)
            v = 0;
        else {
            // cerr << "v = (d - 57 - 18*pJaw))/15 = " << " (" << d << " - 57 - 18*" << pJaw << "))/15" << " 57-18*pJaw is " << 57-18*pJaw << endl;
            v = ((57.0 - 18.0*pJaw) - d) / 15.0;
        }
        // Should give us between 0 and 1
        if (v < 0) v = 0.0;
        if (v > 1) v = 1.0;
        //cerr << "For frame:" << f << " d: " << d << " Dir:" << PrintVec(Dir)
        //<< " orb_oris est ~= " << v << endl;
        return (v);
    }
    return (0);
}

//Written by Stanley Leja
//The error between the measured markers and the virtual markers from TalkingHead is calculated here.
// markerError is a member of the TalkingHead class for simplicity.
// It could instead be a convenience function or a member of some new MoCap-processing
// class. The two marker set locations are passed in and used here.
float TalkingHead::markerError(int frame)
{
    float error = 0;
    float squaredError = 0;
    float d;
    int i, N = _VirtualMarkers->point.getNum();
    // Starts at 39 because the mouth muscles only affect markers 39 and up,
    // and stops before the neck markers 83-89.
    for (i = 39; i < N - 7; i++) {
        //cerr << "_MoCapData->Data[f= " << frame << "][i= " << i << "] " << CerrPt(_MoCapData->Data[frame][i]) << endl;
        //cerr << "_VirtualMarkers->point[i]" << CerrPt(_VirtualMarkers->point[i]) << endl;


        //cerr << "VirtualMarkerLocationsXfm[i]->translation.getValue()" << CerrPt(VirtualMarkerLocationsXfm[i]->translation.getValue()) << endl;
        // d = (_MoCapData->Data[frame][i] - (_VirtualMarkers->point[i] + VirtualMarkerLocationsXfm[i]->translation.getValue())).length();
        // d = (_MoCapData->Data[frame][i] - _VirtualMarkers->point[i]).length();
        // Important markers for the 2005 dataset:
        //   78 bottom of chin
        //   68 middle of lower lip
        //   72 top of chin
        //   71 mid chin
        d = (_MoCapData->Data[frame][i] - VirtualMarkerLocationsXfm[i]->translation.getValue()).length();
        //if (i == 78 || i == 68 || i == 72 || i == 71) //changed by Stan
        //cerr << "marker " << i << " frame " << frame << " d is " << d << endl; //changed by Stan
        error += d;
        // squaredError += d*d;
        // weightedError += d*w[i]; // w[i] is some weighting of the errors.
        // You can use an SoMFFloat, or just an array;
        // w can be passed in or made a member of the TalkingHead class
        // or the MoCap class, or a new class could be created for this.
    }
    return error; // or squared error
}

//Written by Stanley Leja
/* This subroutine estimates 17 parameters which produce the phoneme-viseme set.
   It is based on a search of realistic values of each of the parameters in a
   prioritized order of importance. Once the higher-priority parameters are
   locally searched, they are treated as fixed in the remaining local searches
   to reduce computation time. The parameters were searched in the following
   order: k0, k1; k3, k12, k14; k4, k5, k6; k7, k15, k16; k11, k17, k18;
   and k8, k9, k10.

   k0  = OPEN_JAW - opens jaw
   k1  = JAW_IN - moves jaw in
   k2  = JAW_SIDE - does not work
   k3  = ORB_ORIS - contracts lips, making mouth opening smaller
   k4  = L_RIS Left Risorius - moves left corner towards ear
   k5  = R_RIS Right Risorius - moves right corner towards ear
   k6  = L_PLATYSMA Left Platysma - moves left corner downward and lateral
   k7  = R_PLATYSMA Right Platysma - moves right corner downward and lateral
   k8  = L_ZYG Left Zygomaticus - raises corner up and lateral


   k9  = R_ZYG Right Zygomaticus - raises corner up and lateral
   k10 = L_LEV_SUP - moves left top lip up
   k11 = R_LEV_SUP - moves right top lip up
   k12 = DEP_INF - opens both lips
   k13 = DEP_ORIS - does not work
   k14 = MENTALIS - pulls lips together
   k15 = L_BUCCINATOR - pulls back at left corner
   k16 = R_BUCCINATOR - pulls back at right corner
   k17 = INCISIVE_SUP - top lip rolls over bottom lip
   k18 = INCISIVE_INF - bottom lip rolls over top lip */
void TalkingHead::estimateParamsFromMocap(int f, float t)
{
    startFunc("\nestimateParamsFromMoCap");
    float pJaw, pOrb, k0 = 0.0, k1 = 0.0, k3 = 0.0, k12 = 0.0, k14 = 0.0,
          markerErrorStart = 10000.0, markerErrorNew = 9999.0;
    float k4 = 0.0, k5 = 0.0, k6 = 0.0, k7 = 0.0, k8 = 0.0, k9 = 0.0,
          k10 = 0.0, k11 = 0.0, k15 = 0.0, k16 = 0.0, k17 = 0.0, k18 = 0.0;
#define STUFF 1
#if STUFF
    SbVec3f localOptVM_Data[100];
#endif
    _MoCapData->MoCapToVM_Error[f][7] = markerError(f); // establishes marker error without estimation
    //cerr << "time = " << t << endl;
    _MoCapData->MoCapToVM_Value[f][19] = t; //storing time of frame
    for (int i = 0; i < 19; i++) {
        _LipModel->setParameter(i, 0.0);
    }
    duringFunc("\nAbout to do 0 & 1");
    // now doing 0 & 1
    for (int i = 0; i < 10; i++) {
        k0 = (float) i / 10;
        _LipModel->setParameter(0, k0);
        for (int k = 0; k < 3; k++) {
            if (k == 0) k1 = -0.20;
            if (k == 1) k1 = 0.00;
            if (k == 2) k1 = 0.20;
            _LipModel->setParameter(1, k1);


            //needed to adjust prior to deforming characteristic points and markers
            SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                        - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                          _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                        _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                          _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
            DeformCharacteristicPoints();
            DeformMarkers();
            markerErrorNew = markerError(f);
            if (markerErrorNew < markerErrorStart) {
                //cerr << "\n*******markerErrorNew = " << markerErrorNew << "\tmarkerErrorStart = " << markerErrorStart << "\n";
                markerErrorStart = markerErrorNew;
                _MoCapData->MoCapToVM_Value[f][0] = k0; //storing best parameters
                _MoCapData->MoCapToVM_Value[f][1] = k1;
#if STUFF
                for (int iii = 39; iii < 83; iii++) {
                    localOptVM_Data[iii] = VirtualMarkerLocationsXfm[iii]->translation.getValue();
                    /* if (iii == 78 || iii == 68 || iii == 72 || iii == 71) {
                         cerr << "VirtualMarkerLocationsXfm[" << iii << "]->translation.getValue()" << CerrPt(VirtualMarkerLocationsXfm[iii]->translation.getValue()) << endl;
                         cerr << "localOptVM_Data[iii] = " << CerrPt(localOptVM_Data[iii]) << endl;
                       } */
                }
#endif
            }
        } // k1
    } // k0

    // develops comparison data from old estimation process
    pJaw = estimateJawParamFromMocap(f);
    _LipModel->setParameter(_LIP_PARAM_OPEN_JAW, pJaw);
    pOrb = estimateOrbOrisParamFromMocap(f, pJaw);
    _LipModel->setParameter(_LIP_PARAM_ORB_ORIS, pOrb);


    cerr << "The old estimated parameters are pJaw: " << pJaw << "\tpOrb: " << pOrb << endl;
    SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
#if STUFF
    for (int ii = 39; ii < 83; ii++) {
        VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
    }
#endif
    DeformCharacteristicPoints();
    DeformMarkers();
    _MoCapData->MoCapToVM_Error[f][0] = markerError(f);
    cerr << "\tmarkerErrorAfterDeformFace Using Old Estimates = " << markerError(f) << "\n\n";

    // sets parameters to values from new estimation process
    // develops comparison from new process
    for (int i = 0; i < 19; i++) {
        _LipModel->setParameter(i, _MoCapData->MoCapToVM_Value[f][i]);
        if (_MoCapData->MoCapToVM_Value[f][i] != 0.0)
            cerr << " MoCapToVM_Value[" << f << "][" << i << "] = " << _MoCapData->MoCapToVM_Value[f][i] << "\n";
    }
    _MoCapData->MoCapToVM_Error[f][1] = markerErrorStart; // use [1], etc., if decide to make more pass(es) for error reduction
    cerr << " MoCapToVM_Error[" << f << "][1] = " << markerErrorStart << "\n";
    SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
    DeformCharacteristicPoints();
    DeformMarkers();


#if STUFF
    for (int ii = 39; ii < 83; ii++) {
        VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
    }
#endif
    DeformCharacteristicPoints();
    DeformMarkers();
    duringFunc("\nAbout to do 3,12,14");
    // now doing 3, 12, & 14
    k0 = _MoCapData->MoCapToVM_Value[f][0];
    k1 = _MoCapData->MoCapToVM_Value[f][1];
    //cerr << "***k0 = " << k0 << "\tk1 = " << k1 << "\tk3 = " << k3 << "\tk12 = " << k12 << "\tk14 = " << k14 << endl;
    for (int k = 0; k < 5; k++) {
        if (k == 0) k3 = 0.0;
        if (k == 1) k3 = 0.2;
        if (k == 2) k3 = 0.4;
        if (k == 3) k3 = 0.6;
        if (k == 4) k3 = 0.8;
        _LipModel->setParameter(3, k3);
        for (int k = 0; k < 10; k++) {
            k12 = (float) k / 10;
            _LipModel->setParameter(12, k12);
            for (int k = 0; k < 5; k++) {
                if (k == 0) k14 = 0.0;
                if (k == 1) k14 = 0.2;
                if (k == 2) k14 = 0.4;
                if (k == 3) k14 = 0.6;
                if (k == 4) k14 = 0.8;
                _LipModel->setParameter(14, k14);
                SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                            - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                              _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                            _LipModel->getParameter(_LIP_PARAM_JAW_IN) *


                              _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
                DeformCharacteristicPoints();
                DeformMarkers();
                markerErrorNew = markerError(f);
                if (markerErrorNew < markerErrorStart) {
                    markerErrorStart = markerErrorNew;
                    _MoCapData->MoCapToVM_Value[f][3] = k3;
                    _MoCapData->MoCapToVM_Value[f][12] = k12;
                    _MoCapData->MoCapToVM_Value[f][14] = k14;
#if STUFF
                    for (int iii = 39; iii < 83; iii++) {
                        localOptVM_Data[iii] = VirtualMarkerLocationsXfm[iii]->translation.getValue();
                    }
#endif
                }
            } //k14
        } //k12
    } //k3

    // sets parameters to values from new estimation process
    // develops comparison from new process
    for (int i = 0; i < 19; i++) {
        _LipModel->setParameter(i, _MoCapData->MoCapToVM_Value[f][i]);
        //cerr << _MoCapData->MoCapToVM_Value[f][i];
        if (_MoCapData->MoCapToVM_Value[f][i] != 0.0)
            cerr << " MoCapToVM_Value[" << f << "][" << i << "] = " << _MoCapData->MoCapToVM_Value[f][i] << "\n";
    }
    _MoCapData->MoCapToVM_Error[f][2] = markerErrorStart; // use [1], etc., if decide to make more pass(es) for error reduction
    cerr << " MoCapToVM_Error[" << f << "][2] = " << markerErrorStart << "\n";
    SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);


#if STUFF
    for (int ii = 39; ii < 83; ii++) {
        VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
    }
#endif
    DeformCharacteristicPoints();
    DeformMarkers();
    duringFunc("\nAbout to do 4,5,6");
    // now doing 4, 5, & 6
    k3 = _MoCapData->MoCapToVM_Value[f][3];
    k12 = _MoCapData->MoCapToVM_Value[f][12];
    k14 = _MoCapData->MoCapToVM_Value[f][14];
    for (int k = 0; k < 5; k++) {
        if (k == 0) k4 = 0.0;
        if (k == 1) k4 = 0.2;
        if (k == 2) k4 = 0.4;
        if (k == 3) k4 = 0.6;
        if (k == 4) k4 = 0.8;
        _LipModel->setParameter(4, k4);
        for (int k = 0; k < 5; k++) {
            if (k == 0) k5 = 0.0;
            if (k == 1) k5 = 0.2;
            if (k == 2) k5 = 0.4;
            if (k == 3) k5 = 0.6;
            if (k == 4) k5 = 0.8;
            _LipModel->setParameter(5, k5);
            for (int k = 0; k < 5; k++) {
                if (k == 0) k6 = 0.0;
                if (k == 1) k6 = 0.2;
                if (k == 2) k6 = 0.4;
                if (k == 3) k6 = 0.6;
                if (k == 4) k6 = 0.8;
                _LipModel->setParameter(6, k6);
                SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                            - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *


                              _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                            _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                              _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
                DeformCharacteristicPoints();
                DeformMarkers();
                markerErrorNew = markerError(f);
                if (markerErrorNew < markerErrorStart) {
                    markerErrorStart = markerErrorNew;
                    _MoCapData->MoCapToVM_Value[f][4] = k4;
                    _MoCapData->MoCapToVM_Value[f][5] = k5;
                    _MoCapData->MoCapToVM_Value[f][6] = k6;
#if STUFF
                    for (int iii = 39; iii < 83; iii++) {
                        localOptVM_Data[iii] = VirtualMarkerLocationsXfm[iii]->translation.getValue();
                    }
#endif
                }
            } //k6
        } // k5
    } // k4

    // sets parameters to values from new estimation process
    // develops comparison from new process
    for (int i = 0; i < 19; i++) {
        _LipModel->setParameter(i, _MoCapData->MoCapToVM_Value[f][i]);
        //cerr << _MoCapData->MoCapToVM_Value[f][i];
        if (_MoCapData->MoCapToVM_Value[f][i] != 0.0)
            cerr << " MoCapToVM_Value[" << f << "][" << i << "] = " << _MoCapData->MoCapToVM_Value[f][i] << "\n";
    }
    _MoCapData->MoCapToVM_Error[f][3] = markerErrorStart; // use [1], etc., if decide to make more pass(es) for error reduction
    cerr << " MoCapToVM_Error[" << f << "][3] = " << markerErrorStart << "\n";
    SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);


#if STUFF
    for (int ii = 39; ii < 83; ii++) {
        VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
    }
#endif
    DeformCharacteristicPoints();
    DeformMarkers();
    duringFunc("\nAbout to do 7,15,16");
    // now doing 7, 15, & 16
    k4 = _MoCapData->MoCapToVM_Value[f][4];
    k5 = _MoCapData->MoCapToVM_Value[f][5];
    k6 = _MoCapData->MoCapToVM_Value[f][6];
    //cerr << "***k0 = " << k0 << "\tk1 = " << k1 << "\tk3 = " << k3 << "\tk12 = " << k12 << "\tk14 = " << k14 << endl;
    //cerr << "***k4 = " << k4 << "\tk5 = " << k5 << "\tk6 = " << k6 << endl;
    //cerr << "***k7 = " << k7 << "\tk15 = " << k15 << "\tk16 = " << k16 << endl;
    for (int k = 0; k < 5; k++) {
        if (k == 0) k7 = 0.0;
        if (k == 1) k7 = 0.2;
        if (k == 2) k7 = 0.4;
        if (k == 3) k7 = 0.6;
        if (k == 4) k7 = 0.8;
        _LipModel->setParameter(7, k7);
        for (int k = 0; k < 5; k++) {
            if (k == 0) k15 = 0.0;
            if (k == 1) k15 = 0.2;
            if (k == 2) k15 = 0.4;
            if (k == 3) k15 = 0.6;
            if (k == 4) k15 = 0.8;
            _LipModel->setParameter(15, k15);
            for (int k = 0; k < 5; k++) {
                if (k == 0) k16 = 0.0;
                if (k == 1) k16 = 0.2;
                if (k == 2) k16 = 0.4;
                if (k == 3) k16 = 0.6;
                if (k == 4) k16 = 0.8;


                _LipModel->setParameter(16, k16);
                SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                            - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                              _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                            _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                              _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
                DeformCharacteristicPoints();
                DeformMarkers();
                markerErrorNew = markerError(f);
                if (markerErrorNew < markerErrorStart) {
                    markerErrorStart = markerErrorNew;
                    _MoCapData->MoCapToVM_Value[f][7] = k7;
                    _MoCapData->MoCapToVM_Value[f][15] = k15;
                    _MoCapData->MoCapToVM_Value[f][16] = k16;
#if STUFF
                    for (int iii = 39; iii < 83; iii++) {
                        localOptVM_Data[iii] = VirtualMarkerLocationsXfm[iii]->translation.getValue();
                    }
#endif
                }
            } //k16
        } // k15
    } // k7

    // sets parameters to values from new estimation process
    // develops comparison from new process
    for (int i = 0; i < 19; i++) {
        _LipModel->setParameter(i, _MoCapData->MoCapToVM_Value[f][i]);
        //cerr << _MoCapData->MoCapToVM_Value[f][i];
        if (_MoCapData->MoCapToVM_Value[f][i] != 0.0)
            cerr << " MoCapToVM_Value[" << f << "][" << i << "] = " << _MoCapData->MoCapToVM_Value[f][i] << "\n";
    }
    _MoCapData->MoCapToVM_Error[f][4] = markerErrorStart; // use [1], etc., if decide to make more pass(es) for error reduction


    cerr << " MoCapToVM_Error[" << f << "][4] = " << markerErrorStart << "\n";
    SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
#if STUFF
    for (int ii = 39; ii < 83; ii++) {
        VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
    }
#endif
    DeformCharacteristicPoints();
    DeformMarkers();
    duringFunc("\nAbout to do 11,17,18");
    // now doing 11, 17, & 18
    k7 = _MoCapData->MoCapToVM_Value[f][7];
    k15 = _MoCapData->MoCapToVM_Value[f][15];
    k16 = _MoCapData->MoCapToVM_Value[f][16];
    //cerr << "***k0 = " << k0 << "\tk1 = " << k1 << endl;
    //cerr << "***k3 = " << k3 << "\tk12 = " << k12 << "\tk14 = " << k14 << endl;
    //cerr << "***k4 = " << k4 << "\tk5 = " << k5 << "\tk6 = " << k6 << endl;
    //cerr << "***k7 = " << k7 << "\tk15 = " << k15 << "\tk16 = " << k16 << endl;
    //cerr << "***k11 = " << k11 << "\tk17 = " << k17 << "\tk18 = " << k18 << endl;
    //cerr << "***k8 = " << k8 << "\tk9 = " << k9 << "\tk10 = " << k10 << endl;
    for (int k = 0; k < 5; k++) {
        if (k == 0) k11 = 0.0;
        if (k == 1) k11 = 0.2;
        if (k == 2) k11 = 0.4;
        if (k == 3) k11 = 0.6;
        if (k == 4) k11 = 0.8;
        _LipModel->setParameter(11, k11);
        for (int k = 0; k < 5; k++) {
            if (k == 0) k17 = 0.0;


            if (k == 1) k17 = 0.2;
            if (k == 2) k17 = 0.4;
            if (k == 3) k17 = 0.6;
            if (k == 4) k17 = 0.8;
            _LipModel->setParameter(17, k17);
            for (int k = 0; k < 5; k++) {
                if (k == 0) k18 = 0.0;
                if (k == 1) k18 = 0.2;
                if (k == 2) k18 = 0.4;
                if (k == 3) k18 = 0.6;
                if (k == 4) k18 = 0.8;
                _LipModel->setParameter(18, k18);
                SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                            - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                              _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                            _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                              _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
                DeformCharacteristicPoints();
                DeformMarkers();
                markerErrorNew = markerError(f);
                if (markerErrorNew < markerErrorStart) {
                    markerErrorStart = markerErrorNew;
                    _MoCapData->MoCapToVM_Value[f][11] = k11;
                    _MoCapData->MoCapToVM_Value[f][17] = k17;
                    _MoCapData->MoCapToVM_Value[f][18] = k18;
#if STUFF
                    for (int iii = 39; iii < 83; iii++) {
                        localOptVM_Data[iii] = VirtualMarkerLocationsXfm[iii]->translation.getValue();
                    }
#endif
                }
            } //k18
        } // k17
    } // k11


    // sets parameters to values from new estimation process
    // develops comparison from new process
    for (int i = 0; i < 19; i++) {
        _LipModel->setParameter(i, _MoCapData->MoCapToVM_Value[f][i]);
        //cerr << _MoCapData->MoCapToVM_Value[f][i];
        if (_MoCapData->MoCapToVM_Value[f][i] != 0.0)
            cerr << " MoCapToVM_Value[" << f << "][" << i << "] = " << _MoCapData->MoCapToVM_Value[f][i] << "\n";
    }
    _MoCapData->MoCapToVM_Error[f][5] = markerErrorStart; // use [1], etc., if decide to make more pass(es) for error reduction
    cerr << " MoCapToVM_Error[" << f << "][5] = " << markerErrorStart << "\n";
    SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                - _LipModel->getParameter(_LIP_PARAM_JAW_SIDE) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                _LipModel->getParameter(_LIP_PARAM_JAW_IN) *
                  _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
#if STUFF
    for (int ii = 39; ii < 83; ii++) {
        VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
    }
#endif
    DeformCharacteristicPoints();
    DeformMarkers();
    duringFunc("\nAbout to do 8,9,10");
    // now doing 8, 9, & 10
    k11 = _MoCapData->MoCapToVM_Value[f][11];
    k17 = _MoCapData->MoCapToVM_Value[f][17];
    k18 = _MoCapData->MoCapToVM_Value[f][18];
    //cerr << "***k0 = " << k0 << "\tk1 = " << k1 << endl;
    //cerr << "***k3 = " << k3 << "\tk12 = " << k12 << "\tk14 = " << k14 << endl;
    //cerr << "***k4 = " << k4 << "\tk5 = " << k5 << "\tk6 = " << k6 << endl;
    //cerr << "***k7 = " << k7 << "\tk15 = " << k15 << "\tk16 = " << k16 << endl;
    //cerr << "***k11 = " << k11 << "\tk17 = " << k17 << "\tk18 = " << k18 << endl;


//cerr << "***k8 = " << k8 << "\tk9 = " << k9 << "\tk10 = " << k10 << endl;
for (int k = 0; k < 5; k++) {
  if (k == 0) k8 = 0.0;
  if (k == 1) k8 = 0.2;
  if (k == 2) k8 = 0.4;
  if (k == 3) k8 = 0.6;
  if (k == 4) k8 = 0.8;
  _LipModel->setParameter(8, k8);
  for (int k = 0; k < 5; k++) {
    if (k == 0) k9 = 0.0;
    if (k == 1) k9 = 0.2;
    if (k == 2) k9 = 0.4;
    if (k == 3) k9 = 0.6;
    if (k == 4) k9 = 0.8;
    _LipModel->setParameter(9, k9);
    for (int k = 0; k < 5; k++) {
      if (k == 0) k10 = 0.0;
      if (k == 1) k10 = 0.2;
      if (k == 2) k10 = 0.4;
      if (k == 3) k10 = 0.6;
      if (k == 4) k10 = 0.8;
      _LipModel->setParameter(10, k10);
      SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
                  -_LipModel->getParameter(_LIP_PARAM_JAW_SIDE) * _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
                  _LipModel->getParameter(_LIP_PARAM_JAW_IN) * _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
      DeformCharacteristicPoints();
      DeformMarkers();
      markerErrorNew = markerError(f);
      if (markerErrorNew < markerErrorStart) {
        markerErrorStart = markerErrorNew;
        _MoCapData->MoCapToVM_Value[f][8] = k8;
        _MoCapData->MoCapToVM_Value[f][9] = k9;
        _MoCapData->MoCapToVM_Value[f][10] = k10;
#if STUFF
        for (int iii = 39; iii < 83; iii++) {


          localOptVM_Data[iii] = VirtualMarkerLocationsXfm[iii]->translation.getValue();
        }
#endif
      }
    } // k10
  } // k9
} // k8
// sets parameters to values from new estimation process
// develops comparison from new process
for (int i = 0; i < 19; i++) {
  _LipModel->setParameter(i, _MoCapData->MoCapToVM_Value[f][i]);
  //cerr << _MoCapData->MoCapToVM_Value[f][i];
  if (_MoCapData->MoCapToVM_Value[f][i] != 0.0)
    cerr << " MoCapToVM_Value[" << f << "][" << i << "] = "
         << _MoCapData->MoCapToVM_Value[f][i] << "\n";
}
_MoCapData->MoCapToVM_Error[f][6] = markerErrorStart;
// use [1], etc., if decide to make more pass(es) for error reduction
cerr << " MoCapToVM_Error[" << f << "][6] = " << markerErrorStart << "\n";
SetMandible(_LipModel->getParameter(_LIP_PARAM_OPEN_JAW)*40,
            -_LipModel->getParameter(_LIP_PARAM_JAW_SIDE) * _LipModel->getDelta(_LIP_PARAM_JAW_SIDE)[0],
            _LipModel->getParameter(_LIP_PARAM_JAW_IN) * _LipModel->getDelta(_LIP_PARAM_JAW_IN)[2]);
#if STUFF
for (int ii = 39; ii < 83; ii++) {
  VirtualMarkerLocationsXfm[ii]->translation = localOptVM_Data[ii];
}
#endif
//DeformCharacteristicPoints();
//DeformMarkers();
DeformFace();
endFunc("\nestimateParamsFromMoCap");
//used to generate viseme data for individual words


if ((f == 444) || (f == 706) || (f == 930) || (f == 1174) || (f == 1401) ||
    (f == 1634) || (f == 1840) || (f == 2067)) {
  cerr << "\n\nMoCapToVM_Value\n";
  for (int i = 1057; i < 1174; i++) {
    cerr << i << ", ";
    for (int ii = 0; ii < 20; ii++) {
      cerr << _MoCapData->MoCapToVM_Value[i][ii] << ", ";
    }
    cerr << endl;
  }
  cerr << "\n\nMoCapToVM_Error\n";
  for (int i = 1057; i < 1174; i++) {
    cerr << i << ", ";
    for (int ii = 0; ii < 8; ii++) {
      cerr << _MoCapData->MoCapToVM_Error[i][ii] << ", ";
    }
    cerr << endl;
  }
}
}


APPENDIX D –

EXAMPLE PHONEME – VISEME PARAMETER SET



styleA01.non: about                                                                   Error     Error
Frame  Initial Guess  Step 1    Step 2    Step 3    Step 4    Step 5    Step 6    Initial   6 - Initial
1057   182.641   43.7326   39.7317   38.649    38.5266   37.6226   37.6226   38.5266   -0.904
1058   180.08    46.6152   42.0226   40.9576   40.7039   38.441    38.441    38.441     0
1059   176.004   50.977    44.5637   43.4274   43.1544   40.6643   40.6643   40.8307   -0.1664
1060   169.467   55.8736   47.9001   47.9001   47.9001   47.9001   47.9001   42.5283    5.3718
1061   160.677   61.7136   49.7332   49.7332   49.7332   49.7332   49.7332   50.8101   -1.0769
1062   150.729   70.6569   53.8929   53.8929   53.8929   53.8929   53.8929   53.8929    0
1063   141.135   81.6877   58.8964   58.8964   58.8964   58.1585   58.1585   61.0254   -2.8669
1064   132.338   72.8126   59.965    59.734    59.734    59.2683   59.2683   63.8345   -4.5662
1065   125.018   62.16     53.7245   53.4521   53.4521   51.4133   51.4133   53.5863   -2.173
1066   119.457   53.0341   49.3902   49.1482   49.1482   45.7909   45.7909   47.4773   -1.6864
1067   116.1     47.0163   47.0163   46.8321   46.8321   42.7739   42.7739   44.2996   -1.5257
1068   113.49    48.7586   47.227    47.1363   47.1363   42.555    42.555    44.3675   -1.8125
1069   113.066   55.6928   50.9719   50.9719   50.9719   47.1787   47.0009   49.0952   -2.0943
1070   114.959   66.9603   58.4946   58.4946   58.4946   53.2435   52.8622   57.2207   -4.3585
1071   117.116   82.0419   67.9019   67.9019   67.9019   66.2323   65.0709   63.2658    1.8051
1072   120.713   95.2619   73.3093   73.3093   73.3093   71.4427   70.2186   71.4819   -1.2633
1073   126.511   95.7676   95.7676   95.6316   95.6316   89.509    88.1599   78.3233    9.8366
1074   131.891   96.5888   95.7821   95.7821   95.7821   93.9884   92.9548   89.0804    3.8744
1075   135.724   97.6142   94.027    94.027    94.027    91.9875   90.5157   91.9679   -1.4522
1076   139.843   101.494   94.5224   94.5224   94.5224   92.1199   90.1604   91.6359   -1.4755
1077   143.722   106.191   96.0776   96.0776   96.0776   93.3911   91.1143   92.448    -1.3337
1078   147.795   111.53    98.981    98.981    98.981    96.1394   93.5113   94.5946   -1.0833
1079   152.328   116.956   102.633   102.633   102.633   99.6178   97.2057   97.3381   -0.1324
1080   156.253   121.93    106       106       106       101.088   98.4188   99.5538   -1.135
1081   160.075   126.475   108.999   108.999   108.999   103.364   100.815   100.815    0
1082   162.832   129.89    111.132   111.132   111.132   104.395   101.658   102.54    -0.882
1083   164.592   132.457   112.584   112.584   112.584   105.335   102.642   102.642    0
1084   165.294   134.819   114.183   114.183   114.183   106.552   103.841   103.841    0
1085   167.104   137.726   116.172   116.172   116.172   108.391   105.696   105.696    0
1086   169.581   140.003   117.842   117.842   117.842   109.896   107.202   107.202    0
1087   171.764   141.214   118.624   118.624   118.624   110.767   108.034   108.148   -0.114
1088   173.335   141.923   118.82    118.82    118.82    111.164   108.585   108.585    0
1089   173.339   141.938   118.519   118.519   118.519   111.051   108.663   108.663    0
1090   171.692   140.221   140.156   140.156   140.156   137.907   135.763   107.372    28.391
1091   169.567   136.162   113.46    113.46    113.46    106.956   104.814   134.553   -29.739
1092   165.546   128.753   108.499   108.499   108.499   103.748   102.024   101.641    0.383
1093   157.393   117.385   101.425   101.425   101.425   98.8807   97.6085   98.0691   -0.4606
1094   146.958   102.346   93.1146   93.1146   93.1146   91.6763   91.6763   96.2643   -4.588
1095   137.933   88.3096   86.674    86.674    86.674    85.3386   85.3386   90.6027   -5.2641
1096   134.08    88.8577   86.3897   86.3897   86.3897   84.7807   84.7807   91.8373   -7.0566
1097   134.621   107.091   94.9096   94.9096   94.9096   92.7119   92.7119   99.3164   -6.6045
1098   137.956   105.32    98.2062   98.2062   98.2062   95.6867   95.6867   109.787   -14.1003


1099   141.727   96.2601   94.8471   94.8471   94.8471   92.9038   92.9038   93.5513   -0.6475
1100   145.375   92.2448   92.2448   92.2448   91.7253   89.1216   89.1216   91.6761   -2.5545
1101   150.062   91.4752   91.4752   91.4752   91.4752   88.4695   88.4695   89.3561   -0.8866
1102   315.963   264.445   264.445   264.445   264.445   263.795   263.795   263.191    0.604
1103   313.929   264.937   264.937   264.937   264.937   263.854   263.854   263.854    0
1104   309.987   267.882   267.661   267.661   267.661   265.562   265.562   265.871   -0.309
1105   306.597   276.377   273.748   273.748   273.748   270.318   270.318   270.604   -0.286
1106   303.647   292.194   283.4     283.4     283.4     278.406   278.406   279.672   -1.266
1107   302.608   282.318   274.984   274.984   274.157   267.478   267.478   291.434   -23.956
1108   134.226   99.9178   98.9473   98.8513   96.8922   91.5039   91.5039   97.3195   -5.8156
1109   139.04    101.485   98.3321   98.3321   98.3321   92.1479   92.1479   104.182   -12.0341
1110   147.455   119.382   105.868   105.868   105.868   99.242    99.1177   107.007   -7.8893
1111   157.602   128.4     125.472   125.472   125.472   119.844   119.454   112.704    6.75
1112   167.138   126.607   126.607   126.607   126.607   120.707   119.833   124.323   -4.49
1113   174.891   132.442   130.134   130.126   130.126   123.574   120.618   125.231   -4.613
1114   182.028   138.789   134.11    134.076   134.076   127.376   122.537   125.132   -2.595
1115   187.934   144.934   138.566   138.494   138.494   130.425   126.94    127.913   -0.973
1116   192.492   149.278   142.558   142.558   142.558   137.748   136.071   130.044    6.027
1117   197.594   154.185   144.953   144.953   144.953   140.244   138.397   138.397    0
1118   203.799   158.928   147.371   147.371   147.371   142.625   140.786   141.793   -1.007
1119   208.48    162.656   150.678   150.678   150.678   140.653   137.482   143.896   -6.414
1120   211.122   163.939   150.758   150.758   150.758   141.707   139.28    139.445   -0.165
1121   212.153   163.636   149.482   149.482   149.482   141.725   139.436   139.436    0
1122   212.145   161.686   146.874   146.874   146.874   141.167   138.941   139.126   -0.185
1123   211.655   159.089   144.273   144.273   144.273   139.401   137.433   140.061   -2.628
1124   210.925   156.426   143.164   143.164   143.164   139.472   138.076   138.558   -0.482
1125   207.015   152.59    143.255   143.255   143.255   140.855   139.941   140.127   -0.186
1126   201.546   145.951   142.024   142.024   142.024   134.975   134.082   144.926   -10.844
1127   193.854   139.549   138.482   138.482   138.482   136.183   135.606   139.454   -3.848
1128   183.339   138.288   137.83    137.83    137.83    136.917   136.64    138.864   -2.224
1129   170.876   122.646   109.274   109.274   109.274   108.194   107.854   143.097   -35.243
1130   158.567   105.183   100.673   100.673   100.673   100.188   99.5417   104.153   -4.6113
1131   147.655   97.0923   97.0923   97.0923   97.0923   96.9076   96.6258   100.905   -4.2792
1132   139.402   103.863   98.1612   98.1612   98.1612   91.2931   91.2931   103.79    -12.4969
1133   132.658   103.278   92.0073   90.6537   90.6537   85.51     85.51     99.6064   -14.0964
1134   128.155   84.3874   81.3986   81.3986   81.3986   75.1843   75.1843   76.9901   -1.8058
1135   126.706   66.4641   66.4641   66.4641   66.4641   64.9717   64.9717   68.5576   -3.5859
1136   127.641   58.1319   57.0195   57.0195   57.0195   56.1216   56.1216   60.3363   -4.2147
1137   131.345   60.5746   55.485    55.485    55.485    54.9083   54.9083   54.8724    0.0359
1138   138.43    66.7204   58.484    58.484    58.484    58.0211   58.0211   58.0211    0
1139   147.774   73.7758   64.5431   64.5431   64.5431   64.5144   64.5144   64.3494    0.165
1140   155.489   79.6534   70.0169   69.9535   69.9535   69.7945   69.7945   69.759     0.0355
1141   161.412   84.1921   74.8121   74.4662   74.4662   74.0462   74.0462   74.141    -0.0948
1142   166.769   85.7951   66.3458   65.9145   65.9145   64.5382   64.5382   77.9246   -13.3864
1143   171.622   83.4051   65.2404   64.505    64.505    62.9096   62.9096   62.9096    0


1144   175.083   83.7631   65.7321   64.7811   64.7811   62.8727   62.8383   62.8727   -0.0344
1145   177.705   83.148    65.619    64.5382   64.5382   62.3233   62.1947   62.1947    0
1146   178.839   83.464    66.5744   65.381    65.381    62.7812   62.5654   62.5654    0
1147   179.4     84.273    67.6592   66.2154   66.2154   63.5587   63.3234   63.2409    0.0825
1148   181.243   84.0872   67.6003   65.9306   65.9306   63.3736   63.2166   63.2166    0
1149   184.013   83.7346   67.5709   65.757    65.757    63.211    63.1213   63.1213    0
1150   186.497   83.2186   67.6159   65.7126   65.7126   62.943    62.7689   62.7689    0
1151   187.682   82.695    67.6938   65.7496   65.7496   62.7038   62.3944   62.3944    0
1152   188.181   81.8448   67.1062   65.1749   65.1749   62.1008   61.738    61.738     0
1153   188.941   80.6706   66.3466   64.4825   64.4825   61.562    61.1983   61.1983    0
1154   188.942   80.0056   66.1585   64.3154   64.3154   61.4176   61.0574   61.0574    0
1155   188.002   80.1494   66.4149   64.552    64.552    61.4985   61.1748   61.1748    0
1156   187.538   79.3414   65.8676   64.1208   64.1208   61.099    60.8065   60.8065    0
1157   187.78    76.7627   63.8111   62.2687   62.2687   59.9086   59.9086   59.5671    0.3415
1158   187.723   74.1038   61.5793   60.2368   60.2368   57.9945   57.9945   58.249    -0.2545
1159   188.029   71.5758   59.8302   58.6261   58.6261   56.9152   56.9152   56.9152    0
1160   187.599   69.7957   58.9109   57.8528   57.8528   56.5807   56.5807   56.5807    0
1161   187.282   67.5688   57.6793   56.7328   56.7328   55.6699   55.6464   55.7251   -0.0787
1162   188.064   64.8448   56.0058   55.2054   55.2054   54.5396   54.5396   54.8953   -0.3557
1163   190.253   61.0299   53.304    52.8623   52.8623   52.4234   52.211    52.6182   -0.4072
1164   191.602   58.1454   51.4727   51.358    51.358    50.8193   50.7651   51.0178   -0.2527
1165   194.134   54.399    49.5335   49.5225   49.5225   49.3257   49.3257   49.7605   -0.4348
1166   196.37    49.542    43.2887   40.6639   40.3984   37.2275   37.2275   47.5491   -10.3216
1167   198.391   45.8688   40.7717   38.5692   38.4464   36.6932   36.6932   37.2844   -0.5912
1168   201.036   41.9066   38.1132   35.9742   35.9742   34.401    34.401    36.0132   -1.6122
1169   201.061   39.9413   36.7929   34.7655   34.7655   33.7933   33.7933   33.7933    0
1170   200.631   39.581    37.1399   35.1736   35.1367   35.1367   35.1367   34.4207    0.716
1171   201.629   38.9881   37.0701   35.0984   34.9954   34.9954   34.9954   34.9954    0
1172   202.544   37.9633   36.203    34.2748   34.2748   34.1389   34.1389   34.3779   -0.239
1173   202.055   37.8064   35.9037   34.0042   34.0042   34.0042   34.0042   34.3226   -0.3184
