Carnegie Mellon. Carnegie Mellon Multimedia Michael Christel Alex Hauptmann Rong Jin (TA) alex/mmCourse.

Dec 21, 2015


Multimedia

Michael Christel

Alex Hauptmann

Rong Jin (TA)

http://www.cs.cmu.edu/~alex/mmCourse


How to get in touch with us

• Mike Christel

[email protected]

• http://www.cs.cmu.edu/~christel

• (412)268-7799 or x8-7799

• WeH5212

• Alex Hauptmann

[email protected]

• http://www.cs.cmu.edu/~alex

• (412)268-1448 or x8-1448

• WeH5124

– Office Hours by Appointment


Teaching Assistant

• Rong Jin

[email protected]

• Office WeH5316

• Office hours by appointment

• (412)268-4050 or x8-4050


Course Outline, Part 1 of 3

More details at www.cs.cmu.edu/~alex/mmCourse

October 22 Intro to Multimedia

October 25 Multimedia Enabling Technologies, Macromedia Flash Intro and Demo

October 29 Sound Processing, Speech Recognition

November 1 Digital Video Creation and Transmission

November 5 Speech Synthesis


Course Outline, Part 2 of 3

More details at www.cs.cmu.edu/~alex/mmCourse

November 8 Image Processing

November 12 Digital Music and Music Processing

November 15 Multimedia Internet Protocols, SMIL

November 19 Synthetic Interviews: A Multimedia Company (Experiences from the Field)

November 22 Programming for Interactive Multimedia (CGI Scripts/ASP)


Course Outline, Part 3 of 3

More details at www.cs.cmu.edu/~alex/mmCourse

November 29 Content Analysis and Coding of Digital Audio and Video, Multimedia Storage and Retrieval Management

December 3 Video Retrieval Evaluation and Testing, Multimedia Interface Design, Digital Libraries

December 6 Visual Design, Multimedia Interface Design Guidelines, Multimedia use in the future (Experience on Demand)

December 10 Multimedia as Entertainment Technology, Virtual Reality


Homeworks

• See http://www.cs.cmu.edu/~alex/mmCourse

• 9 Homeworks planned, 10 points each

• One hard homework will be worth 20 points

• No final, no midterm

• Publish homeworks on your web page - email us URL

• Space?


Today: Intro to Multimedia

Apple Knowledge Navigator Vision 1988


[Diagram] Multimedia and related areas: Audio, Images, Video, Information Retrieval, Storage Systems, Networking, Psychology, HCI, Data Compression, Natural Language Processing, CPU Power


Definition of Multimedia

• Multi (Latin multus: numerous)

• Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)

• Multiple types of information captured, stored, manipulated, transmitted, and presented.

• Specifically: Images, Video, Audio (+Speech) and Text


Definition of Multimodal

• Multi (Latin multus: numerous)

• Modal (Latin modus: manner)

• Traditionally refers to input/output formats:

• Input:

• sounds, speech (mike)

• gestures (camera, tablet)

• eye-gaze (camera)

• mouse

• keyboard

• Output:

• sounds, speech

• video

• Pictures

• Animations

• Text


Perceived Information

• Physical Variables

• Sound is a waveform

• An image is a waveform

• light is electromagnetic radiation with different intensity in spatial coordinates

• color corresponds to wavelength


History of Multimedia I

• Analog signals to sensors

• E.g. vinyl records

• Fidelity is faithfulness to the original

• Digital representation (‘60s)

• Sampling

• Quantizing

• Coding

• codec, modem, (A/D and D/A)
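The three digitization steps named on this slide (sampling, quantizing, coding) can be sketched in a few lines. This is only a minimal illustration, not the course's own code: the 1 Hz test tone, the 8 Hz sampling rate, the 3-bit depth, and the function names `sample`, `quantize`, and `encode` are all assumptions chosen to keep the numbers small.

```python
import math

def sample(signal, duration_s, rate_hz):
    """Sampling: read the continuous signal at evenly spaced instants."""
    n = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

def quantize(samples, bits):
    """Quantizing: map each sample in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip to the supported range
        level = int((x + 1.0) / 2.0 * levels)
        codes.append(min(levels - 1, level))  # x == 1.0 falls in the top level
    return codes

def encode(codes, bits):
    """Coding: write each level as a fixed-width binary word."""
    return [format(c, f"0{bits}b") for c in codes]

# Hypothetical input: a 1 Hz sine wave standing in for the analog signal.
tone = lambda t: math.sin(2 * math.pi * 1.0 * t)

samples = sample(tone, duration_s=1.0, rate_hz=8)
codes = quantize(samples, bits=3)
words = encode(codes, bits=3)
print(codes)
print(words)
```

A codec (A/D followed later by D/A) performs these steps in hardware; more bits per sample and a higher sampling rate give higher fidelity at the cost of more data.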


Hardware Advances

• CPU

• Bus

• Network I/O

• Keyboard, Mouse

• Disk

• Mike + A/D Board

• Camera + A/D Board

• Speakers (+ D/A Board)

• Display


History of Multimedia II

• Analog controls only

• Special hardware (Displays, Scanners, FFTs)

• Integrated hardware components

• Further Integration

• Other devices


History of Multimedia III

Limiting Factors:

• Storage Limits

• CPU Speeds

• I/O Speeds

• Network Bandwidth


Why Digital?

• Universal storage, transmission format

• CD, internet

• Precision (Range of values, number of bits, floating point)

• Lossless transmission/storage

BUT:

• a finite sampling rate can distort information (detail between samples is lost)

• size requirements may be ‘large’ compared to analog
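To give a feel for the size requirements, here is a rough calculation of uncompressed (PCM) audio size. The CD audio parameters used below (44,100 samples per second, 16 bits per sample, 2 channels) are the commonly quoted figures and are not taken from the slides; the function name `pcm_bytes` is made up for this sketch.

```python
def pcm_bytes(rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed audio: samples x bits x channels, over time."""
    return rate_hz * bits_per_sample * channels * seconds // 8

# One minute of CD-quality stereo audio (assumed parameters, see above).
one_minute = pcm_bytes(rate_hz=44_100, bits_per_sample=16, channels=2, seconds=60)
print(one_minute)  # 10584000 bytes, roughly 10.6 MB
```

At that rate an hour of audio takes over 600 MB before any compression, which is why storage and bandwidth show up as limiting factors.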


Digitization Process

• Sampling from an analog signal

• Sampling Errors relate to signal frequencies

• Quantization Errors
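Both error types can be put into numbers. The sketch below assumes an ideal uniform sampler and a uniform quantizer that rounds to the nearest level; the function names are hypothetical. A pure tone above half the sampling rate (the Nyquist limit) shows up at a lower, "aliased" frequency, and the worst-case quantization error is half of one quantization step.

```python
def alias_frequency(f_hz, rate_hz):
    """Apparent frequency of a pure tone at f_hz after sampling at rate_hz."""
    return abs(f_hz - rate_hz * round(f_hz / rate_hz))

def max_quantization_error(full_range, bits):
    """Worst-case rounding error: half of one step, where step = range / 2**bits."""
    return full_range / (2 ** bits) / 2

# Sampling at 8 kHz: a 3 kHz tone is preserved, a 7 kHz tone aliases to 1 kHz.
print(alias_frequency(3000, 8000))  # 3000
print(alias_frequency(7000, 8000))  # 1000

# 16-bit quantization of a signal spanning -1.0 to 1.0 (range 2.0).
print(max_quantization_error(2.0, 16))
```

This is why the sampling rate has to be chosen relative to the highest frequency in the signal, while the number of bits controls how closely each sample matches the original amplitude.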


Text

• ASCII, Unicode

• Formatted Text, Rich Text

• Document Formats:

– Structured: TeX, HTML

– Page Descriptions: PostScript, PDF


Graphics

• Objects

– circles, splines, rectangles, lines

• Editable

– resize, reshape, move, colorize

• Synthetic


Images (Pictures)

• Fixed digitized representation

– bitmap, colors per pixel

• Editable in limited ways

– retouch, cut and paste, remap colors, filter [Photoshop tools]

– no ‘model’ of the thing

• Captured

– not just from real life: also clip art, screen dumps
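A bitmap is just a grid of pixel values, so its size follows directly from its dimensions and the bits stored per pixel, and a "filter" is an operation applied to every pixel without any knowledge of what the picture shows. The sketch below is illustrative only: the 640 x 480, 24-bit example and the names `bitmap_bytes` and `invert` are assumptions, not material from the slides.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Uncompressed bitmap size in bytes."""
    return width * height * bits_per_pixel // 8

def invert(bitmap, max_value=255):
    """A simple per-pixel filter: produce the negative of a grayscale bitmap."""
    return [[max_value - p for p in row] for row in bitmap]

print(bitmap_bytes(640, 480, 24))  # 921600 bytes for one 24-bit image

tiny = [[0, 255],
        [100, 200]]                # hypothetical 2 x 2 grayscale image
print(invert(tiny))                # [[255, 0], [155, 55]]
```

Compare this with graphics, where the file stores objects (a circle, a spline) that can still be resized or reshaped after the fact.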


Audio

• Sounds

– hear 15 Hz to 20 kHz

– Speech is 50 Hz to 10 kHz

• Speech Recognition

– It is hard to wreck a nice beach

– Ice cream I scream

• Synthesis

– Speech

– Music

MIDI for 128 instruments, 47 percussion sounds

Notes, timing
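MIDI stores notes and timing rather than a sampled waveform, and a synthesizer turns each note into sound. A minimal sketch of that idea follows; it assumes the usual equal-temperament convention in which MIDI note 69 is the A above middle C at 440 Hz (a convention not stated on the slide), and the event tuples and function name are invented for illustration rather than taken from the MIDI file format.

```python
def midi_note_to_hz(note, a4_hz=440.0):
    """Frequency of a MIDI note number, assuming note 69 = A4 = a4_hz."""
    return a4_hz * 2 ** ((note - 69) / 12)

# Hypothetical score: (start_beat, note_number, duration_beats)
melody = [(0, 69, 1), (1, 57, 1), (2, 81, 2)]

for start, note, duration in melody:
    print(start, duration, midi_note_to_hz(note))
```

Three short tuples describe four beats of music here, whereas the same passage as sampled audio would take many thousands of samples.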


Speech Recognition Issues

• Continuous vs Discrete

• Vocabulary Size

• Channel (Microphone)

• Environment (Location of mike and Speaker)

• Speaker Dependent/Speaker Independent

• Context (Language Model)

• Interactivity (Dialog Model)


Speech Recognition Knowledge Sources

• Acoustic Modeling: describes the sounds that make up speech

• Lexicon: describes which sequences of speech sounds make up valid words

• Language Model: describes the likelihood of various sequences of words being spoken


Speech Variations

• Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate

• Voice Quality: breathy, creaky, whispery, tense, lax, modal

• Context: sport, professional, interview, free conversation, man-machine dialogue

• Speaking Rate: normal, slow, fast, very fast

• Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load


Video

• Frames comprise the video

– Frame rate = number of frames shown per second (delay between successive frames = 1 / frame rate)

– minimal change between frames

• Sequencing creates the illusion of movement

– > 16 fps is “smooth”

– Standards: 29.97 is NTSC, 25 is PAL, 60 is HDTV

– Interlacing

• Display scan rate is different

– monitor refresh rate

– 60 - 70 Hz (= 1/s)
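Frame rate and the delay between frames are reciprocals of each other, which makes the standards above easy to compare. The sketch below uses the frame rates listed on the slide; the function name and the dictionary are assumptions made for illustration.

```python
def frame_interval_ms(fps):
    """Delay between successive frames, in milliseconds, for a given frame rate."""
    return 1000.0 / fps

# Frame rates as listed on the slide.
standards = {"NTSC": 29.97, "PAL": 25, "HDTV": 60}

for name, fps in standards.items():
    print(name, round(frame_interval_ms(fps), 2), "ms per frame")

# The slide's threshold for "smooth" motion: more than 16 frames per second,
# i.e. less than 62.5 ms between frames.
print(frame_interval_ms(16))  # 62.5
```

The same reciprocal applies to the display: a 60 Hz monitor redraws roughly every 16.7 ms, independently of the frame rate of the video being shown.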


Captured vs. Synthetic

• Animation vs Video

• Graphics vs Pictures

• Synthesizer vs Recording

• Storage? Manipulation? Processor Requirements?

• Fidelity to real world

• Hybrids are possible


Why is Multimedia Important?

• Our society

– captures its experience,

– records its accomplishments,

– portrays its past,

– informs its masses…

…in pictures, audio and video

• For many, CNN has become the “publication of record”

• Multimedia learning leverages “multiple intelligences” (Gardner, 1993)

• Multimedia Digital libraries are an essential component of

– formal, informal, and professional learning

– distance education, telemedicine


Technology Push vs Market Pull

– Home Entertainment

– Catalog Ordering

– Multimedia Training, Education

– Videoconferencing

– Professional Video Services

– Videomail

– Speech Recognition


Hype vs. Reality

• What is feasible, under what circumstances?

• What is possible?

• What is impossible?

• What is unlikely?


Multimedia Visions

• DARPA: Dominate the Battle Space

• HP “1995”

• LSI “Flash Point”

• HP “Synergies”


Intro to Multimedia

That’s all for today
