Introduction to CHILDES and TalkBank Brian MacWhinney CMU - Psychology, Modern Languages, Language Technologies Institute

Mar 31, 2015

Zoe Bray
Page 1

Introduction to CHILDES and TalkBank

Brian MacWhinney

CMU - Psychology, Modern Languages, Language Technologies Institute

Page 2

The goal of TalkBank

Page 3

The core idea

Human communication is a single unified process.

However, patterns in communication are analyzed by 20 different fields.

The time scales of these processes vary from milliseconds to centuries.

But all of these processes must have their ultimate effect in the Moment.

We can capture the Moment on video.

Page 4

Principles

Data-sharing, Informed Consent

Multimedia

Open Access, Web Access, Commentary

Specified Format

Interoperability

Community integration

Page 5

Availability

http://childes.psy.cmu.edu

http://talkbank.org

programs, manuals, fonts, morphologies, CA conventions, video production guides, XML Schema, links to other programs

data can be either downloaded or played back over the web

Page 6

Current target areas

1. CHILDES

2. PhonBank

3. BilingualBank

4. AphasiaBank

5. CABank

6. ClassBank

Page 7

CHILDES

Child Language Data Exchange System

Founded in 1984 in Concord MA

Director: Brian MacWhinney [email protected]

Programmers: Leonid Spektor, Franklin Chen

3000 Members

130 corpora

Over 3200 published articles

Page 8

CHILDES and TalkBank

               CHILDES      TalkBank
Age            23 years     7 years
Words          44 million   8 + 55 million
Media          750 GB       450 GB
Languages      32           18
Publications   3200+        89
Users          3000+        500

Page 9

Practical Considerations

Learning CLAN takes about a week

Transcription is slow: perhaps a 15:1 ratio of transcription time to recording time. Shortcuts such as BlitzScribe and LENA probably will not work.

Currently available data may not be perfect for a given issue

Corpora may need enhancement through MOR or the Coder's Editor
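The 15:1 figure becomes a planning estimate with simple arithmetic. The sketch below assumes the ratio means minutes of transcription work per minute of recording; the function name is illustrative and not part of CLAN.

```python
def transcription_hours(recording_minutes: float, ratio: float = 15.0) -> float:
    """Estimated transcription effort in hours.

    Assumes `ratio` minutes of work per minute of recording
    (15:1 by default, the rough figure from the slide).
    """
    return recording_minutes * ratio / 60.0


if __name__ == "__main__":
    # One hour of recording at 15:1 -> about 15 hours of transcription work.
    print(transcription_hours(60))  # 15.0
```

At that rate, even a modest corpus of 20 recorded hours implies roughly 300 hours of transcription, which is why reusing existing data is attractive.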


Page 10

Tools from the Web

Data: childes.psy.cmu.edu/data

CLAN: childes.psy.cmu.edu/clan

Manuals: childes.psy.cmu.edu/manuals

Morphosyntax: childes.psy.cmu.edu/morgrams

Phon: childes.psy.cmu.edu/phon

Tutorial videos: talkbank.org/training

Digital video: talkbank.org/dv

CA Methods: talkbank.org/CABank

Page 11

Why no handout?

“Overviews” link has this PPT presentation

CHILDES is now fully electronic. No more paper.

Page 12

Available Methods

Microanalysis - CA, phonetics, ethology

Microgenetic analysis - CA, code-switching (NEXT)

Group and treatment comparisons - Genesee

Error analysis - Yip & Matthews

Diffusion analysis - in preschools

Longitudinal studies - growth curves

Modeling - neural nets, dynamic systems, evolutionary models

Page 13

CLAN Tools

Transcribing

Editing

Counts -- FREQ, KWAL

Analyses: MOR, GRASP, PHON

Interoperability -- ELAN, Praat, SFS, EXMARaLDA, CLAPI, PHON

Page 14

CA marks in Unicode

Page 15

Transcripts linked to media

Page 16

Ground Rules

• Ethical use, informed consent

• Levels of permission

• Respect for dignity of participants

• Respect for contributors

• Requirement to cite sources

• Requirement to contribute data

Page 17

Info-CHILDES and Membership

[email protected]

• Archived at LinguistList

• Info-CHIBolts for nuts and bolts

• Membership list

• IASCL Membership

Page 18

Getting Set Up

• Download CLAN from Programs link

Page 19

Windows issues

• You can work in c:\childes

• But your administrator may have locked this folder, so you may need to use shortcuts.

• Windows IPA is difficult.

• Windows video compression may produce .wmv files

Page 20

Downloading Manuals

CHAT, CLAN

Page 21

Getting Started

• Open CLAN Manual to Chapter 2

• Double-click application

• Control-D to open Commands Window

• Set Working Directory to c:\childes\clan\lib\samples

Page 22

Should look like this:

On Windows, the working directory will be c:\childes\clan\lib\samples

Page 23

Run FREQ

• freq sample.cha

• Click RUN or press Return

• In output, does “want” occur 3 times?
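To make concrete what FREQ is counting, here is a minimal Python sketch of word counting over CHAT main tiers (lines beginning with `*`). It is not CLAN's FREQ: tokenization is simplified (split on whitespace, strip `.?!,`), continuation lines are ignored, and the `SAMPLE` text is hypothetical rather than the contents of sample.cha.

```python
from collections import Counter
from typing import Optional

# Punctuation treated as utterance delimiters rather than words (a simplification).
DELIMITERS = ".?!,"


def freq(chat_text: str, speaker: Optional[str] = None) -> Counter:
    """Count word tokens on CHAT main tiers (lines starting with '*').

    If `speaker` is given (e.g. "CHI"), only that speaker's tier is
    counted, roughly like adding +t*CHI to a CLAN command.
    """
    counts: Counter = Counter()
    for line in chat_text.splitlines():
        if not line.startswith("*"):
            continue  # skip @headers and %dependent tiers
        code, _, utterance = line.partition(":")
        if speaker is not None and code[1:] != speaker:
            continue
        for token in utterance.split():
            word = token.strip(DELIMITERS)
            if word:
                counts[word] += 1
    return counts


# Hypothetical mini-transcript, not sample.cha.
SAMPLE = (
    "@Begin\n"
    "*CHI:\tI want cookie .\n"
    "*MOT:\tyou want a cookie ?\n"
    "*CHI:\twant more .\n"
    "@End\n"
)

if __name__ == "__main__":
    print(freq(SAMPLE)["want"])         # 3
    print(freq(SAMPLE, "CHI")["want"])  # 2
```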

Page 24

Interface Features

• Help

• CLAN

• Files In

• Recall

• Set MOR, Lib, Output directories

Page 25

Files In

Page 26

Building Commands

• mlu +t*CHI +f sample.cha

• mlu *.cha

• Wildcards

• File output

• *.cha
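The first command above selects the child's tier (+t*CHI) and computes mean length of utterance. The sketch below shows the underlying arithmetic in Python, under a stated simplification: it counts words, whereas CLAN's MLU can count morphemes from the %mor tier. The function names and sample text are illustrative only.

```python
from typing import List

DELIMITERS = ".?!,"


def utterance_lengths(chat_text: str, speaker: str = "CHI") -> List[int]:
    """Word count of each main-tier utterance by `speaker` (cf. +t*CHI)."""
    lengths = []
    prefix = f"*{speaker}:"
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            # Drop tokens that are pure punctuation, such as the terminator.
            words = [t for t in line[len(prefix):].split() if t.strip(DELIMITERS)]
            if words:
                lengths.append(len(words))
    return lengths


def mlu_words(chat_text: str, speaker: str = "CHI") -> float:
    """Mean length of utterance in words; 0.0 if the speaker never talks."""
    lengths = utterance_lengths(chat_text, speaker)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical mini-transcript.
SAMPLE = (
    "*CHI:\tI want cookie .\n"
    "*MOT:\tyou want a cookie ?\n"
    "*CHI:\twant more .\n"
)

if __name__ == "__main__":
    print(mlu_words(SAMPLE))  # (3 + 2) / 2 = 2.5
```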

Page 27

Changing Directories

• Set Working to: ne32

combo +t*MOT +s"is^*ing" *.cha

• Set Working to: samples

kwal +sbunny +w2 -w2 0042.cha

• Triple click on output line to go back to source file
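The two searches on this slide can be approximated in a few lines of Python. As we read the options, `^` in a COMBO string requires the joined items to occur in immediate succession and `*` is a wildcard, while KWAL's -w2 and +w2 request two lines of context before and after each hit. This is a sketch under those assumptions, not a reimplementation of COMBO or KWAL; the function names and sample lines are hypothetical.

```python
from typing import Iterator, List, Tuple


def is_then_ing(utterance: str) -> bool:
    """Rough analogue of combo +s"is^*ing": the word 'is' immediately
    followed by a word ending in -ing."""
    words = utterance.split()
    return any(a == "is" and b.endswith("ing") for a, b in zip(words, words[1:]))


def kwal(lines: List[str], keyword: str, before: int = 2,
         after: int = 2) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line index, context window) for each main-tier line that
    contains `keyword` as a whole word (cf. kwal +sbunny -w2 +w2)."""
    for i, line in enumerate(lines):
        if line.startswith("*") and keyword in line.split():
            start = max(0, i - before)
            yield i, lines[start:i + after + 1]


# Hypothetical lines, not taken from 0042.cha.
LINES = [
    "*MOT:\tlook .",
    "*MOT:\tthe bunny is hopping .",
    "*CHI:\tbunny !",
    "*MOT:\tyes .",
]

if __name__ == "__main__":
    print(is_then_ing(LINES[1]))  # True
    for index, window in kwal(LINES, "bunny"):
        print(index, window)
```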

Page 28

GEM

• Set Working to: Workshop

• GEM +s* pau001.cha

• Open output, play audio
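GEM restricts analysis to marked segments ("gems") of a transcript. In CHAT a gem is opened with an @Bg: header and closed with an @Eg: header carrying the same label. A minimal Python sketch of that extraction follows; it illustrates the idea rather than CLAN's GEM, assumes gems do not nest, and uses hypothetical sample text rather than pau001.cha.

```python
from typing import Dict, List, Optional


def gems(chat_text: str) -> Dict[str, List[str]]:
    """Collect the lines inside each @Bg:/@Eg: gem, keyed by gem label.

    Simplification: gems are assumed not to nest or overlap, and lines
    outside any gem are ignored.
    """
    segments: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in chat_text.splitlines():
        if line.startswith("@Bg:"):
            current = line[len("@Bg:"):].strip()
            segments.setdefault(current, [])
        elif line.startswith("@Eg:"):
            current = None
        elif current is not None:
            segments[current].append(line)
    return segments


# Hypothetical transcript with one gem labelled "book".
TEXT = (
    "@Begin\n"
    "*MOT:\thi .\n"
    "@Bg:\tbook\n"
    "*MOT:\tlook .\n"
    "*CHI:\tdoggie .\n"
    "@Eg:\tbook\n"
    "*CHI:\tbye .\n"
    "@End\n"
)

if __name__ == "__main__":
    print(gems(TEXT))
```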

Page 29

Exercises - Chapter 8

• MLU50 – mlu +t*CHI +z50u +f *.cha

• MLU5 – maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex

• TTR – freq +t*CHI +s"*-%%" +f *.cha

Page 30

Batch file

• maxwd +t*CHI +g1 +c5 +dl 14.cha | mlu > 14.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 55.cha | mlu > 55.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 66.cha | mlu > 66.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex

• maxwd +t*CHI +g1 +c5 +dl 98.cha | mlu > 98.ml5.cex

• Batch batch.cex

• Or just run by highlighting in Commands (Windows)
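Because the batch file differs only in the child's file number, it can be generated rather than typed. This Python sketch writes one command per line in exactly the form shown on the slide; the helper names are ours, and the output file name batch.cex simply mirrors the slide.

```python
from typing import Iterable, List

# File numbers of the five children used on the slide.
CHILDREN = ["14", "55", "66", "68", "98"]


def batch_lines(children: Iterable[str]) -> List[str]:
    """One MLU5 command per child, in the form used on the slide."""
    return [f"maxwd +t*CHI +g1 +c5 +dl {c}.cha | mlu > {c}.ml5.cex" for c in children]


def write_batch(path: str, children: Iterable[str]) -> None:
    """Write the commands, one per line, to a batch file such as batch.cex."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(batch_lines(children)) + "\n")


if __name__ == "__main__":
    for command in batch_lines(CHILDREN):
        print(command)
```

Calling write_batch("batch.cex", CHILDREN) in the working directory would produce a file that the batch batch.cex command can then run.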

Page 31

Tables

Child   MLU50   MLU5    TTR     MLT Ratio
14       0.10    0.12    1.84   -0.90
55      -0.70   -0.65   -0.15   -0.94
66      -0.25   -0.19   -0.68   -1.14
68       3.10    2.56   -0.67    1.60
98      -0.95   -1.11   -0.55    0.31

Page 32

The Editor

Page 33

Playing a linked file

• Esc-8

• Esc-A

• Control-click

• F5

Page 34

Linking a File - F5

• Cursor on *FAT

• Find file

• F5

• Press space for each utterance

• Save
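Each space press during F5 linking stores a time mark (a "bullet") on the utterance line. As an assumption about the file format, a bullet is stored as start_end in milliseconds wrapped in U+0015 characters, which CLAN displays as a bullet. Under that assumption, this Python sketch reads the time marks and lists utterances that never got one, which helps with the "missing bullet" problem on the next slide. Function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: bullet = "\x15" + start_end (milliseconds) + "\x15".
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")


def time_marks(line: str) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) for the first bullet on a line, or None."""
    match = BULLET.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def unlinked_utterances(chat_text: str) -> List[str]:
    """Main-tier lines without a bullet, e.g. ones skipped during F5 linking."""
    return [line for line in chat_text.splitlines()
            if line.startswith("*") and time_marks(line) is None]


if __name__ == "__main__":
    linked = "*FAT:\tthere you go . \x151500_2830\x15"
    print(time_marks(linked))  # (1500, 2830)
    print(unlinked_utterances(linked + "\n*CHI:\tmore .\n"))
```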

Page 35

F5 Tricks

• Go back to last good link

• Space quickly through contained overlap

• If a bullet is missing, cut and paste an old one

• For precision, try Sonic Mode

Page 36

Sonic Mode

• Esc-0 to start

• Highlight area

• Shift-click to move edge

• Have cursor on line in file

• S to insert time marks

• Triple click a linked sentence

Page 37

Transcribing

• Open new window (Command-N)

• Insert headers:

– @Begin

– @Languages: en

– @Participants: CHI Target_Child, MOT Mother, FAT Father, ROS Brother

– @Date

• F5 with space at each utterance

• Go back and transcribe each bullet (control-click)

• Adjust time marks using Esc-A
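The header block from this slide can also be generated programmatically, for example when setting up many files at once. This Python sketch reproduces only the headers named on the slide, with a tab after each header name as CHAT expects, and closes the file with @End; transcription lines go between the headers and @End. Current versions of the CHAT manual may require further headers (such as @UTF8 and @ID) and three-letter language codes, so run CHECK on anything built this way. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def chat_skeleton(participants: List[Tuple[str, str]], language: str = "en",
                  date: Optional[str] = None) -> str:
    """Build a minimal CHAT file skeleton with the headers from the slide.

    participants: (code, role) pairs such as ("CHI", "Target_Child").
    Utterance lines should later be inserted before "@End".
    """
    lines = [
        "@Begin",
        f"@Languages:\t{language}",
        "@Participants:\t" + ", ".join(f"{code} {role}" for code, role in participants),
    ]
    if date is not None:
        lines.append(f"@Date:\t{date}")
    lines.append("@End")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(chat_skeleton([("CHI", "Target_Child"), ("MOT", "Mother"),
                         ("FAT", "Father"), ("ROS", "Brother")]))
```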

Page 38

F5, locate sound, enter bullets

click on bullets, transcribe

Page 39

Or use SoundWalker

Page 40

Or use the Video Editor

Page 41

CHECK

• CHECK is CRUCIAL

• Internal: Esc-L

• External: check *.cha

• External CHECK provides fuller control

Page 42

Options

• Backup

• Wrapping

• Line Numbers

• CHECK

Page 43

More Options

Line numbers, F5 bullets, SoundAnalyzer

Page 44

Coder's Editor

• Open barry.cha

• Esc-0

• Cursor on first line

• Open codeshar.cut

• %spa

• Insert $NIA:AC:IN

Page 45

Coder's Editor Commands

• F1 finish current tier and go to the next

• Esc-c finish coding current tier

• Esc-t restrict coding to a particular speaker

• Esc-Esc go on to the next speaker

• Esc-s rotate subcodes

• Control-g cancel illegal command

Page 46

Send to Praat

Open Praat, Click before link, Send to Praat, Run Analysis

Page 47

Learning to Digitize

Page 48

Searching, Replacing

• Control-R, Control-F

• Space, No, !, Control-G

Page 49

Fixing Things

• CHSTRING

• INSERT (inserts @ID headers)

• FIXIT

• LONGTIER

• FIXBULLETS

• REN

• COMBTIER

Page 50

Tour of English MOR Files

• Download a copy

• A-rules

• C-rules

• sf.cut

• Lexicon

Page 51

Running MOR

• Set MOR directory

• mor +xi (dogs)

• mor +xl barry.cha

• Open barry.ulx.cex

• Fix problems using KWAL

• mor *.cha

Page 52

POST

• mor barry.cha +1 (replaces the original file), or else

• mor barry.cha and then

• ren *.mor.cex *.cha +f

• post *.cha +1

Page 53

Fixing POST

• POST is 95% accurate, but some projects need 100% accuracy

• Eve training set may need error checking

• More data will train a better POST

• POST training is mostly about bootstrapping, using regexp to find and correct subcases leading to error

• Need to remove some POS possibilities and add them back through post-POST rules (spell as N)
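The gap between 95% and 100% accuracy is easier to judge as a count of words to hand-check. This Python sketch does that arithmetic; the function name is illustrative, and it treats accuracy as a uniform per-word rate, which real tagger errors need not follow.

```python
def expected_errors(n_words: int, accuracy: float = 0.95) -> int:
    """Approximate number of mis-tagged words, assuming a uniform error rate."""
    return round(n_words * (1.0 - accuracy))


if __name__ == "__main__":
    print(expected_errors(10_000))        # about 500 words to check at 95%
    print(expected_errors(10_000, 0.99))  # about 100 words at 99%
```

Even a small corpus of 10,000 words would leave around 500 tags to correct at 95% accuracy, which is why projects needing full accuracy invest in retraining and post-POST rules.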

Page 54

CHAT

• What is an utterance?

• What is a word?

• Tour of the CHAT manual

Page 55

Web Browsing of Video

Page 56

Some examples

• Forrester

• Rollins

• Yasmin

• Paulo

• Brent, MacWhinney

• Classroom - JLS

Page 57

Rollins Coding

Page 58

Conclusions

• CHILDES and TalkBank provide solid tools for studying language learning and functioning

• Data-sharing has led to major advances in the field

• New approaches emphasize the use of multimedia analysis, computational linguistics, and speech technology
