AnnoTone: Record-time Audio Watermarking for Context-aware Video Editing. Ryohei Suzuki, Daisuke Sakamoto, Takeo Igarashi (The University of Tokyo). CHI 2015 @ Seoul. Session: What do I hear? Communicating with Sound
Transcript
Page 1: AnnoTone (CHI 2015)

AnnoTone: Record-time Audio Watermarking

for Context-aware Video Editing

RYOHEI SUZUKI

DAISUKE SAKAMOTO

TAKEO IGARASHI

THE UNIVERSITY OF TOKYO

CHI 2015 @ Seoul

Session: What do I hear? Communicating with Sound

1

Page 2: AnnoTone (CHI 2015)

Video recording and sharing have become

casual hobbies for everyone.

2

Page 3: AnnoTone (CHI 2015)

Camera Computer

Software Broadcasting

3

Page 4: AnnoTone (CHI 2015)

Video Editing is Still Difficult

4

Why?

1. Cost of learning video authoring tools is high

2. Context-aware editing is labor-intensive,

requiring careful review and trial and error

• Adding visual effects

• Clipping scenes

• Adding captions and overlays

• Using additional information (e.g., GPS)

Page 5: AnnoTone (CHI 2015)

Our Objective

Annotating videos with contextual information

during recording to facilitate video editing

5

1. Automate and speed up video editing

2. Enhance expressiveness using additional data

Page 6: AnnoTone (CHI 2015)

In this talk, we propose

1. A video-annotation technique requiring

no special equipment

2. A video-editing workflow that exploits

contextual information for efficient editing.

6

Page 7: AnnoTone (CHI 2015)

Core Ideas

• Encoding contextual information as

inaudible sound signals

• Embedding encoded annotations directly

into the audio track of video during recording

• Extracting the embedded information

on demand during the editing process

7

Page 8: AnnoTone (CHI 2015)

Annotation Embedding

with Smartphone

8

Page 9: AnnoTone (CHI 2015)

1. Hardware Setup

• Attach smartphone to video camera

• Launch annotation-embedding application

[Figure captions: Attaching; Launching application]

9

Page 10: AnnoTone (CHI 2015)

2. Video Recording

• Gathering annotations from user input or sensors

• Converting them into inaudible audio signals

[Diagram labels: User Input, Sensors, Scene, Annotation Signals]

10

Page 11: AnnoTone (CHI 2015)

Editing Workflow with

Embedded Annotations

11

Page 12: AnnoTone (CHI 2015)

Workflow Overview

12

• Extract embedded annotation from audio track

• Remove annotation signals after editing

Page 13: AnnoTone (CHI 2015)

Editing Pipeline

Video editing generally involves

a sequence of pipelined processes.

[Pipeline diagram stages: Adding Captions & Effects, Color Correction, Clipping, …]

13

Page 14: AnnoTone (CHI 2015)

Editing Pipeline

An annotated audio track can pass through

the existing pipeline like an ordinary one.

[Pipeline diagram stages: Adding Captions & Effects, Color Correction, Clipping, …]

14

Page 15: AnnoTone (CHI 2015)

Annotation Extraction

[Pipeline diagram stages: Adding Captions & Effects, Color Correction, Clipping, …]

15

Annotation data is extracted on demand

using our Watermark Extraction API.

[Diagram labels: Watermark Extractor, Annotation Data]

Page 16: AnnoTone (CHI 2015)

Annotation Removal

[Pipeline diagram stages: Adding Captions & Effects, Audio Mastering, Clipping, …]

16

After editing, annotation signals

can be removed by applying an audio filter.

[Diagram label: Audio Filter]

Page 17: AnnoTone (CHI 2015)

Applications

17

Page 18: AnnoTone (CHI 2015)

Record-time Editing

Recording: success/failure information

Editing: automatic extraction of successful parts

[Timeline diagram: along the time axis, the recording is marked Success ("Good!"), Failure ("Bad!"), Success ("Good!"); the two Success parts are automatically extracted and combined]
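
The extraction step on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Mark` type, its field names, and the rule that each mark closes the take that began at the previous mark are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    """Hypothetical decoded annotation: a success/failure flag with a timestamp."""
    time: float     # seconds from the start of the recording
    success: bool   # True for "Good!", False for "Bad!"


def successful_segments(marks, start=0.0):
    """Return (begin, end) pairs for takes that ended with a success mark.

    Assumption: each mark closes the take that began at the previous mark
    (or at `start` for the first take).
    """
    segments = []
    begin = start
    for mark in sorted(marks, key=lambda m: m.time):
        if mark.success:
            segments.append((begin, mark.time))
        begin = mark.time
    return segments


marks = [Mark(12.0, True), Mark(20.5, False), Mark(31.0, True)]
print(successful_segments(marks))  # [(0.0, 12.0), (20.5, 31.0)]
```

The resulting time ranges could then be handed to any editor or command-line tool that cuts and concatenates clips.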

18

Page 19: AnnoTone (CHI 2015)

Video-editing with GPS

19

Recording: GPS positions

Editing: location-aware editing

Clipping a movie by sketching on a map

Automatic map overlay
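
One way to picture location-aware clipping: treat the sketched region as a polygon and keep the time ranges whose GPS samples fall inside it. The sketch below assumes the annotations have already been decoded into `(time, longitude, latitude)` tuples; the function names and the sample coordinates are hypothetical, and the slides do not say how the authors implement the selection.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def clip_ranges(samples, polygon):
    """Return (begin, end) time ranges during which the position is inside.

    `samples` is a list of (time_s, lon, lat) tuples sorted by time.
    """
    ranges = []
    begin = None
    last_inside = None
    for t, lon, lat in samples:
        if point_in_polygon(lon, lat, polygon):
            if begin is None:
                begin = t
            last_inside = t
        elif begin is not None:
            ranges.append((begin, last_inside))
            begin = None
    if begin is not None:
        ranges.append((begin, last_inside))
    return ranges


# Hypothetical sketched region (a square) and decoded GPS samples.
region = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
samples = [(0.0, -1.0, 5.0), (1.0, 2.0, 5.0), (2.0, 5.0, 5.0),
           (3.0, 12.0, 5.0), (4.0, 5.0, 5.0)]
print(clip_ranges(samples, region))  # [(1.0, 2.0), (4.0, 4.0)]
```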

Page 20: AnnoTone (CHI 2015)

Automatic Overlaying

20

Recording: chess notation of a game

Editing: automatic overlaying of board graphics

[Figure captions: Notation UI; Synthesized video]

Page 21: AnnoTone (CHI 2015)

Integrating with After Effects

The AnnoTone plugin provides annotation data to After Effects (AE),

which can be used for generating effects.

Exploiting annotations with existing practice

21

Controlling AE animation

with sensor data

Page 22: AnnoTone (CHI 2015)

Integrating with After Effects

1. Analyzing footage to extract annotations

2. Generating a text layer containing JSON-formatted

annotation data at each timeframe

3. Associating video effects/parameters with

annotations using the expressions mechanism

22

[Diagram labels: Footage, JSON text layer, Effect control (JavaScript)]

[{x: 138.0019,y: 38.13840},{x: 139.0133,y: 38.43405}]…
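
The lookup that an effect needs, "which annotation applies at this time?", can be sketched as follows. Real After Effects expressions are written in JavaScript; this Python version only illustrates the logic. The `t` timestamp field is an assumption (the slide shows only `x` and `y`), and the keys are quoted here so that the text is valid JSON.

```python
import bisect
import json


def annotation_at(annotations, t):
    """Return the latest annotation whose time is <= t, or None if there is none.

    `annotations` must be sorted by their (hypothetical) "t" field.
    """
    times = [a["t"] for a in annotations]
    i = bisect.bisect_right(times, t) - 1
    return annotations[i] if i >= 0 else None


# Hypothetical text-layer content; "t" is an assumed timestamp in seconds.
layer_text = (
    '[{"t": 0.0, "x": 138.0019, "y": 38.1384},'
    ' {"t": 1.0, "x": 139.0133, "y": 38.43405}]'
)
annotations = json.loads(layer_text)

current = annotation_at(annotations, 0.5)
print(current["x"], current["y"])  # 138.0019 38.1384
```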

Page 23: AnnoTone (CHI 2015)

Annotation by

Audio Watermarking

23

Page 24: AnnoTone (CHI 2015)

Human Hearing Characteristics

Humans cannot perceive high-frequency sounds.

Sakamoto, Masayuki, et al. "Average thresholds in the 8 to 20 kHz range as a function of age."

Scandinavian Audiology 27.3 (1998): 189-192.

24

Page 25: AnnoTone (CHI 2015)

Data-hiding as High-frequency

Audio Signals

25

[Figure: frequency axis (Hz) with ticks at 20, 18k, 20k, and 22k, showing the Audible Range (Human), the Recordable Range (Microphone), and the High-frequency Range]

We can hide information in the audio track

as high-frequency signals (audio watermarks).

Page 26: AnnoTone (CHI 2015)

Data-hiding as High-frequency

Audio Signals

26

[Figure: spectrogram of the audio track, with the hidden information in the high-frequency region (almost inaudible)]

Page 27: AnnoTone (CHI 2015)

Benefit of Audio Watermarking

27

• Compatible with almost all video cameras

• Consistent synchronization between

annotations and video sequence

• Removable by applying low-pass filter
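
To make the removal point concrete, here is a minimal low-pass sketch using only the standard library. It is a generic windowed-sinc FIR filter, not the filter used by the authors; the 16 kHz cutoff, the 101-tap length, and the 1 kHz and 19 kHz test tones are illustrative assumptions.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Windowed-sinc (Hamming) FIR low-pass coefficients with unity DC gain."""
    fc = cutoff_hz / sample_rate              # normalized cutoff, cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Direct-form convolution; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j >= 0:
                acc += h * signal[i - j]
        out.append(acc)
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


SAMPLE_RATE = 44100
taps = lowpass_taps(16000, SAMPLE_RATE)      # assumed cutoff below the watermark band
audible = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(2000)]
watermark = [math.sin(2 * math.pi * 19000 * n / SAMPLE_RATE) for n in range(2000)]

# Compare levels after the filter's start-up transient.
print("1 kHz gain:", rms(fir_filter(audible, taps)[200:]) / rms(audible[200:]))
print("19 kHz gain:", rms(fir_filter(watermark, taps)[200:]) / rms(watermark[200:]))
```

The 1 kHz tone should pass almost unchanged while the 19 kHz tone is strongly attenuated. The price is that any legitimate content above the cutoff is removed as well.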

Page 28: AnnoTone (CHI 2015)

Watermarking Protocol

28

• Dual-Tone Multi-Frequency (DTMF)

– Representing 4-bit information by a combination of

two single tones from 7 frequencies

• Packet representation

– Variable-length payload

– 400 bps gross data rate

Spectrogram of a watermark packet
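
A DTMF-style symbol can be sketched as follows. The slides do not give the frequency values, the symbol length, or the mapping from values to tone pairs, so everything concrete here is an assumption: seven tones from 18.0 to 19.8 kHz, 10 ms symbols at 44.1 kHz, and one possible reading of "two tones from 7 frequencies" in which any two distinct tones form a pair (21 pairs, of which the first 16 carry the 4-bit values). Detection uses the standard Goertzel algorithm.

```python
import itertools
import math

SAMPLE_RATE = 44100
# Hypothetical parameters, chosen only for illustration.
FREQS = [18000 + 300 * i for i in range(7)]   # seven tone frequencies (Hz)
SYMBOL_SAMPLES = 441                          # 10 ms per symbol
# 21 unordered pairs exist; the first 16 are assigned to values 0..15.
PAIRS = list(itertools.combinations(range(7), 2))[:16]


def encode_nibble(value):
    """Synthesize one symbol: the sum of the two tones assigned to `value`."""
    a, b = PAIRS[value]
    return [
        0.5 * math.sin(2 * math.pi * FREQS[a] * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * FREQS[b] * n / SAMPLE_RATE)
        for n in range(SYMBOL_SAMPLES)
    ]


def goertzel_power(samples, freq):
    """Signal power at `freq`, computed with the Goertzel algorithm."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_nibble(samples):
    """Pick the two strongest tones and map the pair back to a 4-bit value.

    Raises ValueError if the detected pair is not one of the 16 assigned pairs.
    """
    powers = [goertzel_power(samples, f) for f in FREQS]
    strongest = sorted(range(len(FREQS)), key=lambda i: powers[i], reverse=True)[:2]
    return PAIRS.index(tuple(sorted(strongest)))


decoded = [decode_nibble(encode_nibble(v)) for v in range(16)]
print(decoded == list(range(16)))  # True
```

This round trip uses a clean synthetic signal. A real recording would also need packet framing, synchronization, and tolerance to background sound, none of which is shown here.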

Page 29: AnnoTone (CHI 2015)

Related Work

29

Page 30: AnnoTone (CHI 2015)

ContextCam [Patel & Abowd, 2004]

Using a special camera to record contexts of home videos

Storing annotations in frames by image watermarking

Incompatible with existing video cameras.

30

Page 31: AnnoTone (CHI 2015)

Cryptone (Ultra Sound Control) [Hirabayashi & Shimizu, 2012]

Interaction between loudspeaker and smartphones

using high-frequency tones to convey information

AnnoTone uses a similar audio data-hiding method

for video editing support.

31

Page 32: AnnoTone (CHI 2015)

Performance Evaluation

33

Page 33: AnnoTone (CHI 2015)

Data-rate vs. Reliability

[Chart: correct detection rate (%, 0 to 100) versus gross bitrate (667, 571, 500, 444, 400, 364 bps) for four conditions: silent, public, rock, electronic]

~100% correct detection rate was achieved

with a 400 bps annotation data rate.

34
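
The bitrates on the chart axis (667, 571, 500, 444, 400, 364 bps) are consistent with 4 bits per symbol at symbol durations of 6 to 11 ms. That reading is an inference from the numbers, not a parameter stated on the slides. The arithmetic is:

```python
def gross_bitrate(symbol_ms, bits_per_symbol=4):
    """Gross data rate in bits per second for a given symbol duration in ms."""
    return bits_per_symbol * 1000 / symbol_ms


# Assumed symbol durations of 6 to 11 ms reproduce the axis values.
rates = [round(gross_bitrate(ms)) for ms in range(6, 12)]
print(rates)  # [667, 571, 500, 444, 400, 364]
```

Under this reading, the 400 bps operating point corresponds to 10 ms symbols, that is, 100 symbols per second.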

Page 34: AnnoTone (CHI 2015)

Travel Distance

Watermark signal can travel up to 20 cm through air from a smartphone speaker.

35

[Chart: correct detection rate (%, 0 to 100) versus distance between speaker and microphone (0 to 30 cm) for four conditions: silent, public, rock, electronic]

Page 35: AnnoTone (CHI 2015)

Durability against Conversion

36

Watermarks are preserved after conversion into

Ogg Vorbis, AC-3 and AAC at a sufficient bitrate.

[Chart: correct detection rate (%, 0 to 100) versus bit rate (128, 192, 256, 320 kbps) for MP3, Ogg Vorbis, AC-3, and AAC]

Page 36: AnnoTone (CHI 2015)

Transparency to the Human Ear

37

Measured how noticeable the watermarks are to humans

• Participants clicked a button when they noticed noise (6 participants)

[Chart: noticed watermark rate (%, 0 to 100) for the silent, public, rock, and electronic conditions, before and after erasure]

Page 37: AnnoTone (CHI 2015)

Limitations

38

• One-off development of

annotation-embedding applications

• Audio quality loss in watermark removal

• Limited data-rate of annotation

Page 38: AnnoTone (CHI 2015)

Future Work

39

Page 39: AnnoTone (CHI 2015)

Embedding from Public Speaker

40

• Synchronization & integration of a large number

of videos to create multi-view videos, etc.

• Entertainment use at amusement parks, etc.

“Sleeping Beauty Castle at Disneyland” by Lyght

Licensed under CC BY-SA 3.0

“Picture of Stadium” by Jazza5

Licensed under CC BY-SA 3.0

Page 40: AnnoTone (CHI 2015)

Conclusion

41

Page 41: AnnoTone (CHI 2015)

We proposed

42

a video annotation technique using audio watermarking,

and a video-editing workflow exploiting annotations.

Benefit

AnnoTone can facilitate and enhance the non-professional

video editing process without special equipment.

Page 42: AnnoTone (CHI 2015)

43

Page 43: AnnoTone (CHI 2015)

Compared with

Smartphone Recording

Some smartphone camera apps can record

annotations in a metadata format (e.g., Adobe XMP)

– Of course, using such apps is a sensible choice

when recording with a smartphone

What is AnnoTone's advantage?

• Dedicated video cameras are still superior to

smartphone cameras

– In resolution, definition, lens quality, etc.

• No need to deal with external metadata

– Because annotations are directly embedded as sound

44