Testing / Evaluation Eric Morley June 1, 2010

Dec 17, 2015

Testing / Evaluation

Eric Morley
June 1, 2010

Papers

• File, P., Todman, J., 2002. Evaluation of the coherence of computer-aided conversations. Augmentative and Alternative Communication 18 (December), 228-241.

• Todman, John and Halina Rzepecka. 2003. Effect of pre-utterance pause length on perceptions of communicative competence in AAC-aided social conversations. Augmentative and Alternative Communication, 19(4):222–234.

• Ball, L. L., Beukelman, D. R., Pattee, G. L., 2004. Acceptance of augmentative and alternative communication technology by persons with amyotrophic lateral sclerosis. Augmentative and Alternative Communication 20 (2), 113-122.

File and Todman 2002

• Voice output communication aids (VOCAs) allow storage of whole utterances, but are not designed for users to rely on these for open-ended conversations
  – Impossible to predict which utterances will be needed
  – Difficult to locate potential follow-up phrases

• May be possible to use “imperfect expressions” in social conversation and still have a positive experience

TALK

• Computer system (Talk Aid Using Pre-Loaded Knowledge)
• Uses pre-stored messages
• For people who understand language but can’t talk (anymore)
• User writes phrases on their own time
  – Phrases stored in person, time and aspect locations
    • Ex: My father was a shop manager (me/past/who)
• Quick fires: “ah yes”, “too bad”, etc.
  – Can be set to say equivalent phrase to one selected
• Comments: useful in many contexts (ex. “what about you?”)
• Allows word by word entry
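
The person/time/aspect storage described on this slide can be pictured as a small keyed lookup. The sketch below is only an illustration: the class name `PhraseStore`, its methods, and the slot values other than the slide's own example are hypothetical and are not taken from the TALK system itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PhraseStore:
    """Hypothetical store of pre-written phrases keyed by (person, time, aspect)."""

    # Maps a (person, time, aspect) slot to the phrases filed under it.
    slots: dict = field(default_factory=lambda: defaultdict(list))
    # Short interjections that are not tied to any slot.
    quick_fires: list = field(default_factory=list)

    def add(self, person, time, aspect, phrase):
        """File a phrase the user wrote ahead of time under one slot."""
        self.slots[(person, time, aspect)].append(phrase)

    def lookup(self, person, time, aspect):
        """Return a copy of the phrases in a slot (empty list if none)."""
        return list(self.slots.get((person, time, aspect), []))


store = PhraseStore(quick_fires=["ah yes", "too bad"])
store.add("me", "past", "who", "My father was a shop manager")
print(store.lookup("me", "past", "who"))
```

Keying on the full triple keeps retrieval to a single lookup, which matters when the goal is to keep pre-utterance pauses short.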

Old TALK

• Evaluations have been getting more realistic since inception (started with cocktail party type situation)

• Rates of up to 40-60wpm (vs 2-15wpm for word by word)
  – Improvements in interaction quality
• Some differences from normal conversation in simulated use with unimpaired user
  – VOCA user was less likely to follow up with narrative

Current Study

• Conversations can be on any topic
• Genuine users interacting with new partners
• 19/68 conversations had the same partner, which allowed assessment of coherence of conversations with a familiar partner
• Analyzed 68 old transcripts of conversations with lag-sequential analysis
• Participants were
  – 1 TALK user (~40y.o w/cerebral palsy w/dysarthria)
  – 56 undergrad psych students
    • 50 students were invited to talk
    • 6 were invited to have a “getting to know you” conversation
• People: TALK user (CP); repeat partner (RP); new partners (NP1 and NP2) from different sets of partners (not repeated)

Hand Labeling

• Speech acts coded by category
  – Questions; answer; observation; agreement; disagreement; repetition; interjection; directive; narrative
  – 2,345 speech acts w/RP; 2,802 acts w/NP1; 4,268 w/NP2
• Seemed to be ~90% agreement on labeling based off of re-labeling 15% of the acts
  – Used kappa statistic to confirm agreement of labelers
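
The slide does not say which kappa variant was used. As a sketch, assuming two coders and Cohen's kappa, chance-corrected agreement can be computed as below. The function name and the toy labels are illustrative only and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels, not taken from the transcripts.
coder_1 = ["question", "answer", "question", "answer"]
coder_2 = ["question", "answer", "answer", "answer"]
print(cohen_kappa(coder_1, coder_2))
```

Raw percent agreement (such as the ~90% above) does not discount agreement that would occur by chance when a few categories dominate, which is why a kappa check is reported alongside it.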

Lag Sequential Analysis

• Statistical analysis of a sequence of terms (here speech acts in a conversation)
  – Finds statistically significant pairs of acts which occur at particular distances (i.e. lags)
  – Lag n means acts a and b with n-1 intervening acts
• Crosslag: speech act a (intervening acts) speech act b
• Autolag: speech act a (intervening acts) speech act a
• Looked at both types together and at crosslag alone
  – Pooled data across conversation sets (RP, NPi)
• Use z-scores: difference between observed and expected lag probabilities
• Only counted long sequences as statistically significant if subsequences were also significant
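
The z-score idea on this slide can be sketched for a single pair of acts at a single lag. The version below uses a simple binomial approximation, which is an assumption: the slides do not give the exact formula used in the paper, and the function name and the toy sequence are illustrative only.

```python
import math


def lag_z_score(acts, a, b, lag=1):
    """z-score for act b following act a at the given lag (binomial approximation).

    Lag n pairs position i with position i + n, i.e. n - 1 intervening acts.
    Returns None when the score is undefined (a never occurs, or b always/never does).
    """
    if lag < 1 or lag >= len(acts):
        raise ValueError("lag must be at least 1 and shorter than the sequence")
    pairs = list(zip(acts[:-lag], acts[lag:]))
    n_a = sum(1 for first, _ in pairs if first == a)
    n_ab = sum(1 for first, second in pairs if first == a and second == b)
    # Expected probability: how often b occupies the second position overall.
    p_b = sum(1 for _, second in pairs if second == b) / len(pairs)
    if n_a == 0 or p_b in (0.0, 1.0):
        return None
    observed = n_ab / n_a  # observed probability of b given a at this lag
    return (observed - p_b) / math.sqrt(p_b * (1.0 - p_b) / n_a)


# Toy sequence: Q = question, A = answer, O = observation.
acts = ["Q", "A", "Q", "A", "Q", "A", "O", "Q", "A"]
print(lag_z_score(acts, "Q", "A", lag=1))  # crosslag: question then answer
print(lag_z_score(acts, "Q", "Q", lag=1))  # autolag: question then question
```

A crosslag call passes two different acts and an autolag call passes the same act twice. Scores well above zero mark pairs that occur more often than chance would predict.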

Results

• Most speech acts were questions, answers, observations or agreements

• VOCA user made 42% of acts (41% of questions and observations, 55% of answers, 10% of agreements)
  – In unaided conversations one would expect many more observations than questions and answers, and some more agreements
• 31 sequences identified
  – 19 had ≥3 speech acts
  – Of those which occurred in all 3 sets, 13 sequences identified, 7 with 3 speech acts
• Question and answer sequences were common
  – Facilitates turn-taking

Initiations

• Observation
  – Often followed by questions, sometimes agreement
• CP doesn’t interject as much as RP
• Agreement
  – Used for turn taking in unaided conversation
  – Questions used for this in aided conversation
• Other
  – CP repetition and N/RP narrative followed by question

Discussion

• Only speakers reliably followed answers with observations or narrative
• Aided partner did not use narrative (possibly because of lack of practice)
  – Maybe training would help this
• More questions in aided conversation
  – More likely that the VOCA user has an appropriate general question than a specific narrative or observation
  – Gives VOCA user control over topic
• Pre-stored utterances seem to be re-usable between new and repeat conversation partners
• Should include “quick fire” interjection support for conversation with RP
  – Maybe the space taken up by this can be taken up with something else for NP mode?

Todman and Rzepecka 2003

• Several types of VOCAs
  – Word by word
    • Need to generate text at some point, even with pre-stored messages. Pauses are so long that speaking rate goes to 2-15wpm
    • Utterances are extremely short, partner dominates conversation, “folk walk away”
  – Whole utterance approach (WUA)
    • Based on the idea that content of conversations is “frequently approximate rather than precise”
    • Should result in faster communication rates when precision is not critical
      – Is this the case?
      – Does quality of conversation degrade?
      – If yes to both of these, does this result in more positive perceptions of VOCA users’ communicative competence and interactions with them?

More on WUA

• TALK system
  – For free-ranging social interactions
  – Uses lots of small talk
  – 40wpm w/o training, 50-80 with
• Frametalker
  – Designed for transactional conversations (restaurant, bank, etc)
  – 45 wpm, rated as having a high degree of naturalness

Quality of Conversation (WUA)

• Todman, Elder and Alm (1995)
  – Speaking person used TALK to converse
  – Parts of these conversations were reenacted with speakers, and pauses were removed
  – Compared with non-aided conversations
  – Aided found to be of higher quality
    • Likely because pre-composed messages will be more coherent
      – Would this be the case to the other listener, or only for people eavesdropping?

Conversation Rate and Perceived Competence

• Variation in conversation rate (CR) can be approximated by looking at pre-utterance pause length

• A pause of even a few seconds can cause problems
  – User perceived as unintelligent
  – Poor quality of social interactions
  – Frustration
  – Abandonment of VOCA
• Previous studies have found
  – Positive correlation between conversational rate and satisfaction
  – Negative correlation between pre-utterance pause length and satisfaction
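
To see why pre-utterance pause length is a reasonable proxy for conversation rate, the arithmetic can be sketched as follows. Defining the rate as words divided by pause time plus speaking time is an assumption made for illustration, and the utterance length and durations are hypothetical, not measurements from any of the studies.

```python
def conversation_rate_wpm(words, speaking_seconds, pause_seconds=0.0):
    """Words per minute when the pre-utterance pause counts toward elapsed time."""
    total_seconds = speaking_seconds + pause_seconds
    if total_seconds <= 0:
        raise ValueError("total time must be positive")
    return words * 60.0 / total_seconds


# Hypothetical 10-word utterance that takes 4 s to play back.
print(conversation_rate_wpm(10, 4.0))                      # no pause
print(conversation_rate_wpm(10, 4.0, pause_seconds=16.0))  # 16 s pause first
```

With playback time held fixed, the pause is the only term that changes, so longer pauses translate directly into a lower effective rate.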

Previous Experiments

• Newman (1982)
  – Pre-utterance pauses of 4-7s led to worse interactions when compared with utterances w/o pauses
  – If the pauses were a result of doing something else (sculpting), this effect disappeared
    • Does this apply to VOCA users?
• Ratcliff, et al (2002)
  – Effects of pauses and speaking rate on naturalness of synthetic speech
  – Increased speaking rate perceived as more natural, pauses didn’t seem to do much on their own (only had an effect since they changed the speaking rate)
• Bedrosian (2002)
  – VOCA users ask for a book, 1 with a mostly irrelevant message (after 4), other after 90sec with a highly relevant message
    • 2nd approach preferred
  – Had VOCA users give too much/too little information quickly, or relevant info after a delay
    • Tradeoff: short delay led to improved affective/behavioral ratings, wrong amount of info led to lower rating of cognitive component

Current Experiment

• How does pre-utterance pause time affect the perception of social conversation?
• Does the amount of experience a VOCA user has with WUA have an effect on this?
• 3 VOCA users with cerebral palsy
  – Used TalkBoards, had varying experience with this
• Partners had 20 min introductory conversation
  – 2 of 3 got sick, so 5 partners, none with VOCA experience
  – Possible effects of having different partners
  – Told there were no restrictions on topic of conversation, but other would use VOCA
• Interactions recorded, 5 min chunk extracted (after small talk)
  – Pre-utterance pauses replaced with pauses of set lengths (2-10s)
    • Pauses didn’t seem to be identical in length, so those are means
  – Also used original interaction
• 28 raters
  – All psych students
  – Used Likert scale and recorded conversations as baseline
  – Told they would first listen to a conversation with some natural speech changed to synthetic
  – Then they would listen to “getting to know you” conversation involving VOCA user

Current Experiment (cont’d)

• 7 point scale to evaluate 4 areas
  – Linguistic, operational, social, strategic

• Raters heard each conversation 1x with one pause variation for each one

• Blocked raters, found high level of agreement among raters

• Effect of pause length found to be significant

Results, Discussion

• Shorter pause time is preferred
  – Linear trend for 2-16sec pause
    • Possible that pause time became salient because raters listened to conversations with multiple pause times
• VOCA user had significant effect
  – May be something other than experience causing this
  – Smaller effect than pause time, no interaction
• Social nature of conversation
  – Perhaps perceived nature of VOCA user had effect
    • Pauses might not be legitimized b/c no shared activity
• WUA is important because of causal relationship between pause time and partner/observer preferences

Ball, et al. 2004

• Amyotrophic Lateral Sclerosis (ALS)
  – Neuromuscular disease which results in weakness, atrophy and paralysis
• 80% eventually require AAC
  – ≥25% of these did not accept AAC
• Little is known about the 80% of ALS patients who need and use AAC
  – High vs. low tech
  – Stage of ALS at adoption
  – “Attitudes toward technology”
• Mathy et al. (2000) gives preliminary information
  – High tech: detailed needs and wants; written communication; stories
  – Low tech: immediate needs and wants; conversation
• Gutmann (1999) found gender differences
  – Women prefer low-tech strategies and VOCAs more than men
  – Men prefer high-tech writing systems more often than women
• Gutmann and Gryfe (1996) found that early and frequent intervention and early introduction of AAC are critical for acceptance
• Using AAC can allow someone with ALS to continue working
• Focus on high-tech AAC
  – Low-tech options haven’t changed very much, and have been examined
  – High-tech options are changing rapidly and becoming more accessible
• Is there a pattern to adoption of high-tech AAC?
• Why do people use/reject AAC?

Group

• 50 ALS patients monitored over 4 years
• 22 females, 28 males
• 17 bulbar, 22 spinal, 11 mixed diagnosis
• Ages 36-78 (μ=60.16 y.o.)
• All spoke English primarily
• 2 had cognitive deficits
• Seen for AAC assessment when their speech began to change
  – Those wanting only written communication were not included in the study
• Wide variety of educational levels, social status

Procedure

• AAC assessment when intelligibility ≤90% or speaking rate ≤100wpm (tested quarterly)
• Patients presented with various devices
  – Tried them during presentation
  – Evaluator made recommendations
  – Could bring home favored device for 1 week trial
• AAC intervention recommended
• AAC acceptance, use, rejection and discontinuance were monitored until their death (4-181mo., μ=43.8mo, SD=37.54mo)
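
The assessment trigger in the first bullet of this slide is a simple either/or threshold rule. A minimal sketch follows, with a hypothetical function name and hypothetical test values; the two thresholds are the ones stated on the slide.

```python
def needs_aac_assessment(intelligibility_pct, speaking_rate_wpm):
    """True when either quarterly measure has reached its threshold."""
    # Thresholds as given on the slide: intelligibility <= 90% or rate <= 100 wpm.
    return intelligibility_pct <= 90.0 or speaking_rate_wpm <= 100.0


# Hypothetical quarterly measurements for three visits.
print(needs_aac_assessment(97.0, 140.0))  # neither threshold reached
print(needs_aac_assessment(95.0, 100.0))  # rate threshold reached
print(needs_aac_assessment(90.0, 150.0))  # intelligibility threshold reached
```

Because the rule is an "or", a drop in speaking rate alone is enough to trigger assessment even while speech is still highly intelligible.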

Results

• Acceptance: 90% immediate, 6% delayed
  – Came from all social classes
  – No gender differences
• Immediate Acceptance
  – In interviews, participants listed communication, participation and employment as reasons for acceptance
  – All used AAC as primary means of communication

Delayed Acceptance

• Ages 30-39, 70-79
• Delay of 6-24 mo.
• Preferred multifunctional devices
• Delayed in part because of family members
  – Believed that they could understand the participants well enough to meet their needs
  – Thought they were providing adequate care w/o AAC
    • 2 thought that AAC questioned the quality of their care

• One physician advised a family to accept dysarthria rather than turn to technology

• Three individuals were in some form of denial

Rejection and Discontinuance

• Rejected by the two participants with cognitive limitation

• No discontinuance
  – High-tech AAC often abandoned at end-of-life

Discussion

• Saw wider adoption than before (1996)
• AAC seems to be more widely accepted in society
• US began funding AAC devices in 2000
• Recommendations
  – Providing appropriate information regarding the speech-language characteristics of ALS
  – Regular contact/monitoring
  – Sustaining awareness of AAC/intervention opportunities

• Doctors must be aware of these options and be able to explain them