DCU at the NTCIR-11 SpokenQuery&Doc Task

David N. Racca, Gareth J.F. Jones
CNGL Centre for Global Intelligent Content, School of Computing, Dublin City University, Dublin, Ireland
{dracca, gjones}@computing.dcu.ie

Introduction
• Speech is more than a simple sequence of words.
• Prosodic variation encodes rich information about:
  – emotions, discourse structure, dialogue acts, focus, emphasis, contrast, topic shifting, etc.
• We examined the potential of prosodic prominence in the NTCIR-11 SpokenQuery&Doc Task.

Background and Previous Work
Prosody may be useful in speech search:
• Relationship between stress and TF-IDF scores [1].
• Spoken document retrieval (SDR) exploiting amplitude and duration [2].
• Topic tracking exploiting energy and pitch [3].
• Spoken content retrieval (SCR) exploiting pitch, loudness, and duration [4].

Data Pre-processing
Lecture and query audio (WAV) is processed as follows:
• VAD segments the audio into inter-pausal units (IPUs).
• OpenSMILE extracts F0 and loudness every 10 ms.
• Julius LVCSR produces ASR transcripts; annotation removal, ChaSen, and forced alignment produce time-aligned manual transcripts.
• Per-lecture normalisation maps raw F0 and loudness onto a comparable scale, e.g.:
  max(f0_tf-idf) = 280.44 Hz raw, 0.58 normalised
  max(f0_単語) = 236.46 Hz raw, 0.49 normalised
• The normalised prosodic features are attached to the ASR and manual transcripts (enriched transcripts), which are grouped into IPU and slide-group segments and indexed with Terrier.

Indexing
Stores normalised prosodic features for each term. For a term i in segment j, features are aggregated over its occurrences k:

  f0(i,j) = max_k { max(f0^k_(i,j)) }
  l(i,j) = max_k { max(l^k_(i,j)) }
  f0range(i,j) = max_k { max(f0^k_(i,j)) } − min_k { min(f0^k_(i,j)) }
  d(i,j) = max_k { d^k_(i,j) }

Retrieval
Increases the weights of prominent terms. Terrier matching combines BM25-style text evidence with an acoustic score:

  tf(i,j) = k1 · tf_(i,j) / ( tf_(i,j) + k1 · (1 − b + b · dl_j / avdl) )
  idf(i,C) = log( N / n_i + 1 )

  ac(i,j) = f0(i,j)                Pitch [P]
            l(i,j)                 Loudness [L]
            d(i,j)                 Duration [Dur]
            f0range(i,j)           Pitch Range [Pr]
            l(i,j) · f0(i,j)       [LP]
            l(i,j) · f0range(i,j)  [LPr]

A segment s_j is scored as rel(q, s_j) = Σ_(i ∈ M) w(i,j), where M is the set of query terms matched in s_j and

  w(i,j) = idf(i,C) · [ α · tf(i,j) + (1 − α) · ac(i,j) ]                  LI
  w(i,j) = ( θ_ir · tf(i,j) · idf(i,C) + θ_ac · ac(i,j) ) / (θ_ir + θ_ac)  G
  w(i,j) = tf(i,j) · idf(i,C)                                             TF_IDF

Results
[Figure: MAP by spoken query type (Manual, Match, Unmatch, AM-LM) for IPU segments with prosody; runs LI-LPr-0.5, LI-Pr-0.7, LI-Dur-0.3 vs TF_IDF; MAP range 0–0.14.]
[Figure: MAP by spoken query type for slide-group segments with prosody; runs LI-LPr-0.2, LI-LPr-0.5, LI-P-0.9 vs TF_IDF; MAP range 0–0.14.]
[Figure: MAP by spoken query type on manual transcripts; runs LI-Pr-0.7, LI-LPr-0.7 vs TF_IDF; MAP range 0–0.14.]
[Figure: AveP on Query 1, prosodic-based vs TF_IDF, by spoken query type; AveP range 0–0.8.]

Conclusions
• No significant differences between prosodic- and text-based runs.
• Transcript quality affects retrieval effectiveness.
• Prosodic-based models may be useful for certain queries.

References
[1] F. Crestani. Towards the use of prosodic information for spoken document retrieval. SIGIR'01, 2001.
[2] B. Chen et al. Improved spoken document retrieval by exploring extra acoustic and linguistic cues. INTERSPEECH'01, 2001.
[3] C. Guinaudeau et al. Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news. INTERSPEECH'11, 2011.
[4] D.N. Racca et al. DCU search runs at MediaEval 2014 Search and Hyperlinking. MediaEval 2014 Multimedia Benchmark Workshop, 2014.

This research is supported by Science Foundation Ireland (Grant 12/CE/I2267) as part of CNGL (www.cngl.ie).
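The LI weighting above (a BM25-style tf saturated and interpolated with an acoustic prominence score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constants K1, B, and ALPHA, the function names, and the dictionary-based data layout are all assumptions, and the acoustic score shown is the LPr variant (loudness times pitch range) computed from normalised per-occurrence values.

```python
import math

K1, B, ALPHA = 1.2, 0.75, 0.7  # BM25 constants and interpolation weight (assumed values)

def bm25_tf(raw_tf, dl, avdl, k1=K1, b=B):
    """Saturated term frequency: k1*tf / (tf + k1*(1 - b + b*dl/avdl))."""
    return k1 * raw_tf / (raw_tf + k1 * (1 - b + b * dl / avdl))

def idf(n_docs, df):
    """idf(i, C) = log(N / n_i + 1)."""
    return math.log(n_docs / df + 1)

def lpr_feature(occurrences):
    """ac(i, j) for the LPr runs: l(i,j) * f0range(i,j), aggregating
    normalised loudness and F0 frame values over the term's occurrences.
    Each occurrence is assumed to be {"loudness": [...], "f0": [...]}."""
    l = max(max(o["loudness"]) for o in occurrences)
    f0_max = max(max(o["f0"]) for o in occurrences)
    f0_min = min(min(o["f0"]) for o in occurrences)
    return l * (f0_max - f0_min)

def li_weight(raw_tf, dl, avdl, n_docs, df, ac, alpha=ALPHA):
    """LI scheme: w(i,j) = idf(i,C) * [alpha*tf(i,j) + (1 - alpha)*ac(i,j)],
    where ac is a normalised acoustic prominence score in [0, 1]."""
    return idf(n_docs, df) * (alpha * bm25_tf(raw_tf, dl, avdl) + (1 - alpha) * ac)
```

With alpha = 1 the LI weight reduces to the TF_IDF baseline, so a single scoring path can serve both runs; lowering alpha shifts mass from text evidence to prosodic prominence.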