Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Jan 21, 2016


Jéssica Daniel
Transcript
Page 1:

Hierarchical Segmentation: Finding Changes in a Text Signal

Malcolm Slaney and Dulce Ponceleon

IBM Almaden Research Center

Page 2:

Problem Statement

Problem: How do we browse video?
Goal: Create a table-of-contents
Solution: Look for topic changes in text

Page 3:

TOC Example

Chapter 1

Chapter 2

Page 4:

Overview of This Talk

Goal and approach
Latent semantic indexing (LSI)
Scale space
Combination
Results

[Diagram labels: LSI, Scale Space Filter, Segment]

Page 5:

Approach

Sentences -> Semantic Space
Filter at multiple scales
Look for large jumps
Three subjects (loops) shown

Loop 1: Polychromaticity Artifacts
Loop 2: Emission Tomography
Loop 3: Ultrasound Tomography

Page 6:

Building on Previous Work

LSI and clustering
Text tiling
Change point analysis
Segmentation
Scale space

Courtesy of Jianbo Shi (CMU)

Page 7:

Latent Semantic Indexing

Collect histogram of word frequencies
Use SVD to capture frequent combinations
Orthogonal decomposition
Represent in low-dimensional space

[Figure labels: Words, Docs, 10D]
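The SVD step on this slide can be sketched with NumPy. This is a minimal illustration, not the authors' code: the function name `lsi_project`, the toy counts, and the choice to scale document coordinates by the singular values are all assumptions.

```python
import numpy as np

def lsi_project(counts, k=10):
    """Project documents into a k-dimensional latent space.

    counts: (n_words, n_docs) array of word frequencies.
    Returns an (n_docs, k) array; k is capped at the number of singular values.
    """
    counts = np.asarray(counts, dtype=float)
    # Thin SVD: counts = U @ diag(s) @ Vt, with orthonormal columns in U.
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Assumption: document coordinates are the top-k rows of Vt,
    # scaled by their singular values.
    return (s[:k, None] * Vt[:k]).T

# Toy example: 5 words x 4 documents (made-up counts).
toy = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 1, 1, 1],
])
coords = lsi_project(toy, k=2)
print(coords.shape)  # (4, 2)
```

Keeping all singular values reproduces the document-to-document inner products of the original histogram exactly; truncating to k keeps only the strongest word combinations.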

Page 8:

LSI Within a Document

Split into chunks
  Fixed size
  Sentences
Compute histograms
Perform SVD
Look at results

Sources
  “Principles of Computerized Tomographic Imaging”
  PBS News Hour
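The chunking and histogram steps might look like the following sketch. The sentence splitter and tokenizer are deliberately naive, and all function names and the sample text are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def fixed_size_chunks(text, size):
    # Alternative chunking: groups of `size` whitespace-separated words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def word_histograms(chunks):
    """Return (sorted vocabulary, list of per-chunk word Counters)."""
    hists = [Counter(re.findall(r"[a-z']+", c.lower())) for c in chunks]
    vocab = sorted(set().union(*hists))
    return vocab, hists

def to_matrix(vocab, hists):
    # Rows are words, columns are chunks (a words-by-documents layout).
    return [[h[w] for h in hists] for w in vocab]

text = ("X-rays pass through tissue. Tissue absorbs X-rays! "
        "Does ultrasound differ?")
chunks = split_sentences(text)
vocab, hists = word_histograms(chunks)
matrix = to_matrix(vocab, hists)
print(len(chunks), len(vocab))
```

The resulting matrix is the input to the SVD step; a real system would likely also drop stop words and stem, which this sketch omits.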

Page 9:

LSI – 2D Projection

Chapter 4 of Principles of Computerized Tomographic Imaging

Page 10:

LSI – Self-similarity Measure

Similarity: cosine of angle between “documents”
Plot all pairs of chunks/sentences
Look for block diagonal

Chapter 4 of Principles of Computerized Tomographic Imaging
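A self-similarity matrix of this kind can be computed as below. This is a sketch under assumptions: the name `self_similarity` and the two-topic toy vectors are made up, and zero vectors are mapped to zero similarity by choice.

```python
import numpy as np

def self_similarity(vectors):
    """Cosine of the angle between every pair of rows in `vectors`."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # keep all-zero rows at zero instead of dividing by 0
    unit = v / norms
    return unit @ unit.T

# Made-up data: rows 0-2 point roughly one way, rows 3-5 another.
vecs = np.array([[1.0, 0.1], [0.9, 0.0], [1.0, 0.2],
                 [0.1, 1.0], [0.0, 0.8], [0.2, 1.0]])
S = self_similarity(vecs)
print(np.round(S, 2))
```

When consecutive chunks share a topic, the high values cluster into square blocks along the diagonal, which is the block-diagonal pattern the slide says to look for.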

Page 11:

Scale-space Filtering

What size are the features?
Look at different scales!
Continuous scale
Used for
  Object Recognition
  Feature Detection

Page 12:

Scale-space Movie

Green line marks best high-level segmentation
10D semantic space
Scale varies from 1 to 400 sentences

Page 13:

Scale-space Segmentation

Low pass filter signal
Form image of scale vs. time
Look for changes
Track peaks of vector derivative across scale
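The filtering and derivative steps can be sketched as follows. The slide only says "low pass filter"; the Gaussian kernel, the edge padding, the particular sigma values, and all function names here are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Assumption: a Gaussian truncated at 3 sigma serves as the low-pass filter.
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(signal, sigma):
    """Low-pass filter each column of an (n_sentences, n_dims) signal."""
    signal = np.asarray(signal, dtype=float)
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    # Repeat edge values so the output has the same length as the input.
    padded = np.pad(signal, ((radius, radius), (0, 0)), mode="edge")
    cols = [np.convolve(padded[:, d], k, mode="valid")
            for d in range(signal.shape[1])]
    return np.stack(cols, axis=1)

def scale_space_derivative(signal, sigmas):
    """Return an image of shape (n_scales, n_sentences - 1)."""
    rows = []
    for sigma in sigmas:
        s = smooth(signal, sigma)
        # Length of the vector difference between neighbouring sentences.
        rows.append(np.linalg.norm(np.diff(s, axis=0), axis=1))
    return np.array(rows)

# Made-up signal: topic A for 50 sentences, then topic B for 50.
signal = np.zeros((100, 2))
signal[:50, 0] = 1.0
signal[50:, 1] = 1.0
image = scale_space_derivative(signal, [1, 2, 4, 8])
print(image.shape, image.argmax(axis=1))
```

For this toy step signal every scale peaks at the same boundary (between sentences 49 and 50), and the peak gets lower and wider as the scale grows.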

Page 14:

Scale-space Example

Derivative as function of scale and sentence

Page 15:

LSI and Scale Space

Putting it all together
  Split document/transcript
  Perform LSI analysis
  Look at change in angle
  Perform scale-space segmentation
  Show tree
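One way to read "change in angle" is the angle between the semantic vectors of consecutive chunks. The sketch below takes that reading as an assumption; the name `angle_changes` and the sample vectors are hypothetical.

```python
import numpy as np

def angle_changes(vectors):
    """Angle in radians between each row and the next one."""
    v = np.asarray(vectors, dtype=float)
    a, b = v[:-1], v[1:]
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Assumption: a zero vector yields cosine 0, i.e. an angle of pi/2.
    cos = np.sum(a * b, axis=1) / np.where(denom == 0, 1.0, denom)
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Made-up vectors: the direction flips between rows 1 and 2.
angles = angle_changes([[1, 0], [1, 0], [0, 1], [0, 1]])
print(angles)
```

Using the angle rather than the raw difference makes the measure insensitive to chunk length, since longer chunks produce longer vectors that point in the same direction.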

Page 16:

Scale-Space Image

Peaks in scale-space derivative

Peaks traced to their origin
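Tracing a peak back to its origin can be sketched as a coarse-to-fine walk through the scale-space image. The matching rule used here (nearest local maximum within `max_shift` positions) is an assumption and not taken from the slides, as are the function names and the toy image.

```python
import numpy as np

def local_maxima(row):
    """Interior indices that exceed the left neighbour and are >= the right."""
    row = np.asarray(row, dtype=float)
    inner = np.arange(1, len(row) - 1)
    keep = (row[inner] > row[inner - 1]) & (row[inner] >= row[inner + 1])
    return inner[keep]

def trace_to_finest(image, start_scale, position, max_shift=3):
    """Follow a peak from row `start_scale` down to row 0 (finest scale).

    image: (n_scales, n_positions) derivative magnitudes, finest scale first.
    Returns the position at the finest scale, or None if the track is lost.
    """
    for scale in range(start_scale - 1, -1, -1):
        peaks = local_maxima(image[scale])
        if len(peaks) == 0:
            return None
        nearest = peaks[np.argmin(np.abs(peaks - position))]
        if abs(nearest - position) > max_shift:
            return None
        position = int(nearest)
    return int(position)

# Made-up image: one coarse peak that drifts from position 7 to position 5.
image = np.zeros((3, 20))
image[0, 5], image[0, 12] = 1.0, 0.5
image[1, 6] = 1.0
image[2, 7] = 1.0
print(trace_to_finest(image, start_scale=2, position=7))  # 5
```

The coarse scale decides which boundaries matter, and the finest scale says where each one sits, so a boundary that survives to a large scale ends up higher in the tree.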

Page 17:

Results – CT Comparison

Scale-Space Book Headings

Page 18:

Results – News Comparison

Scale-Space Ground Truth

Page 19:

Results – Autocorrelation

Block sentences
Measure correlation
Positive peak
Anti-correlation

Page 20:

Discussion

Issues
  Evaluation (and ground truth)
  Lafferty’s measure
  Temporal properties
  Histogram/SVD chunking size
  Autocorrelation

Page 21:

Computational Effort

Histogram: O(N)
SVD: O(N^3)
Scale space: O(N^2)
N < 1000
  Number of sentences in a video or document is not large

Page 22:

LSI Document Lookup

Histogram documents
Entropy term weighting
Compute SVD
Use first 10-100 vectors to model space
Encode query as histogram
Look for documents in similar direction
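The lookup steps above can be sketched end to end. The slide names "entropy term weighting" without giving a formula; the log-entropy scheme used here is one common choice and is an assumption, as are the function names, the toy counts, and k=2.

```python
import numpy as np

def log_entropy_weights(counts):
    """Global weight per word for an (n_words, n_docs) count matrix, n_docs > 1."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    # p * log(p), treating 0 * log(0) as 0.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return 1.0 + plogp.sum(axis=1) / np.log(counts.shape[1])

def build_index(counts, k):
    g = log_entropy_weights(counts)
    weighted = g[:, None] * np.log1p(np.asarray(counts, dtype=float))
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, len(s))
    # Returns: word weights, word basis, document coordinates.
    return g, U[:, :k], (s[:k, None] * Vt[:k]).T

def rank_documents(query_counts, g, U_k, doc_coords):
    # Encode the query as a weighted histogram, then project it.
    q = U_k.T @ (g * np.log1p(np.asarray(query_counts, dtype=float)))
    dots = doc_coords @ q
    norms = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q)
    sims = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    return np.argsort(-sims), sims

# Made-up counts: words 0-1 occur in docs 0-1, words 2-3 in docs 2-3,
# and word 4 is spread evenly, so its entropy weight is 0.
toy = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1],
                [0, 0, 1, 2], [1, 1, 1, 1]])
g, U_k, docs = build_index(toy, k=2)
order, sims = rank_documents([1, 1, 0, 0, 0], g, U_k, docs)
print(order[:2], np.round(sims, 3))
```

"Similar direction" is measured by cosine, so a short query and a long document can still match closely.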

Page 23:

LSI Example

Collection of book titles
Differential equations vs. algorithms and applications