Introduction to Multimedia and MSEC 20-791
Mike Christel, Alex Hauptmann
ARCHIVE: http://www.cs.cmu.edu/~christel/MM2002/syllabus.htm
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon
Contact Information
Mike Christel, [email protected], http://www.cs.cmu.edu/~christel, (412) 268-7799, Wean Hall 5212
Alex Hauptmann, [email protected], http://www.cs.cmu.edu/~alex, (412) 268-1448, Wean Hall 5124
Office Hours by Appointment
Teaching Assistant
Rong Yan, [email protected], http://www.cs.cmu.edu/~yanrong, (412) 268-9515, Newell Simon Hall 4533
Office Hours by Appointment
Carnegie Mellon Campus Map
Wean Hall
Newell-Simon Hall
Course Outline
Oct. 24 Introduction to Multimedia
Oct. 29 Images as Multimedia Interface Components; Intro to Macromedia Flash 5
Oct. 31 Digital Audio; Speech Recognition
Nov. 5 Image Processing and Computer Vision
Nov. 7 Speech Synthesis and Speech Dialogue Applications
Nov. 12 Digital Video
Nov. 14 Multimedia via Cell Phones and PDAs
Nov. 19 Web Specifications, MM Synchronization
Nov. 21 Digital Music and Music Processing
Nov. 26 MM Projects: Project LISTEN, Informedia
Dec. 3 Multimedia Information Retrieval, TREC Interactive Video Track
Dec. 5 Multimedia and Entertainment: Carnegie Mellon’s Entertainment Technology Center
Dec. 10 MM Content Analysis: Digital Human Memory; Informedia Interface Evaluation
Dec. 12 (MM Experiences from the Field planned…)
Grading
• No midterm, no final
• Textbook plus recommended links/readings
• Grading based on homeworks (90%), class presence and participation (10%)
• Homeworks MUST be published to your web site; email me ([email protected]) by next class your base URL from which a “MSEC 20-791” link will exist
• Homework time deadlines are strictly enforced: loss of 10% per day late for each assignment
• Flash homework is worth twice other homeworks
• 10% for class time meant to encourage you to show up mentally and physically for class
Definition of Multimedia
Multi (Latin multus - numerous)
Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
Multiple types of information captured, stored, manipulated, transmitted, and presented.
Specifically: Images, Video, Audio (+Speech) and Text
Related terms: hypermedia, hypertext
Problem: “hypertext”, “hypermedia”, “multimedia” so overused/generalized they now convey little meaning
A Few Items in a Multimedia Timeline
Pre-Digital Age: suggestions?
see “Multimedia: From Wagner to Virtual Reality”, http://www.artmuseum.net/w2vr/timeline/timeline.html
1906 – Color photography made practicable http://www.niepce.com/pagus/pagus-inv.html
1945 – Vannevar Bush, memex, “As We May Think” http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
1960s – Ted Nelson, Xanadu, “a universal instantaneous hypertext publishing network”
1967 – Nicholas Negroponte formed MIT Architecture Machine Group (later in 1985 MIT Media Lab opens)
1987 – RCA’s David Sarnoff Labs announces Digital Video Interactive
1988 – Apple “Knowledge Navigator” vision
Multimedia Timeline, Continued
1989 – Tim Berners-Lee proposed the World Wide Web to CERN
1991 – Moving Picture Experts Group (MPEG)
1993 – NCSA Mosaic
1994 – Netscape; creation of World Wide Web Consortium (W3C)
1995 – Java for platform-independent application development
1996 – PNG (Portable Network Graphics)
1997 – HTML 4.0
1998 – XML 1.0
1999 – XSLT 1.0 and XPath 1.0
2001 – MPEG-7, JPEG 2000, SVG
2002 – intellectual property and JPEG 2000 (www.jpeg.org/newsrel1.html)
Help with alphabet soup: http://www.w3c.org, other on-line multimedia course glossaries, e.g., http://www.cs.cornell.edu/courses/cs631/1999sp/
Top Ten Misconceptions about Multimedia Computing
Ramesh Jain, founding chairman of Virage and CTO of Praja, www.praja.com, presented the following “top ten” MISCONCEPTIONS list as part of his keynote speech at the ACM Multimedia Conference, Ottawa, Canada, October 2, 2001:
10. Video = Multimedia.
9. Multimedia = multi X separate medium.
8. All information is ONLY in the images or video.
7. Editing of media is almost always off-line.
6. Query by example is best access method.
Top Ten Misconceptions about Multimedia Computing, Continued
5. All users have PhDs in multimedia computing.
4. Users have no memory or context.
3. Computers are for computing.
2. Medium is the message.
1. We work for computers.
Ramesh Jain concluded his keynote talk with the observation:
Information Builds Experience, Experience is Life.
[Figure: Multimedia and related areas: Audio, Images, Video, Information Retrieval, Storage Systems, Networking, Psychology, HCI, Data Compression, Natural Language Processing, CPU Power]
Multimedia Physics
• Sound is a waveform
• Imagery is a waveform
  • light is electromagnetic radiation with different intensity in spatial coordinates
  • color corresponds to wavelength (red is the longest wavelength visible by people)
• Introductory treatment of “light behaves as both particle and wave” at http://www.howstuffworks.com/light1.htm
• “Distributed Multimedia” by Palmer Agnew and Anne Kellerman, published by Atomic Dog Publishing, http://www.atomicdogpublishing.com
A Quick Introduction to Light Waves
• Derived from: http://www.pbs.org/deepspace/classroom/activity2.html
• Waves characterized by wavelength and frequency
• Light is a type of electromagnetic radiation in a range for which our eyes are sensitive
• Sound is not electromagnetic radiation, but sound is a wave as well. Higher pitches are caused by higher frequencies of vibrating molecules that reach your eardrum. Lower pitches are likewise caused by lower frequencies.
[Figure: a wave with its wavelength labeled]
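Wavelength/Frequency Spectrum
[Figure: electromagnetic spectrum, from longest to shortest wavelength: long radio waves; TV, FM; microwaves; infrared; visible light; ultraviolet; X-rays; gamma rays]
Visible light, wavelength axis: 700 nm, 600 nm, 500 nm, 400 nm
Visible light, frequency axis: 4.5x10^14 Hz, 5x10^14 Hz, 6x10^14 Hz, 7x10^14 Hz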
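The wavelength and frequency scales are tied together by frequency = c / wavelength, where c is the speed of light. A minimal Python sketch of that conversion, assuming light travelling in a vacuum; the function name is illustrative and not part of the course materials:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition of the metre

def frequency_hz(wavelength_nm):
    """Convert a vacuum wavelength in nanometres to a frequency in hertz."""
    return SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9)

# Visible-light wavelengths taken from the spectrum axis
for nm in (700, 600, 500, 400):
    print(f"{nm} nm -> {frequency_hz(nm):.2e} Hz")
```

Shorter wavelengths give higher frequencies, which is why the two axes run in opposite directions.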
Migration from Analog to Digital Representation
• Analog signals to sensors
  • E.g. vinyl records
  • Fidelity is faithfulness to the original
• Digital representation (1960s)
  • Sampling
  • Quantizing
  • Coding
• Limiting factors in move to digital:
  • Storage limits
  • CPU speeds
  • I/O speeds
  • Network bandwidth
Loss of Fidelity Due to Sampling
Loss of Fidelity Due to Quantizing
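Both kinds of loss can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the 5 Hz test tone, the sampling rates, and the bit depths are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

def sample(signal, duration_s, rate_hz):
    """Sampling: read a continuous signal (a function of time) at a fixed rate."""
    n = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

def quantize(samples, bits, lo=-1.0, hi=1.0):
    """Quantizing: round each sample to the nearest of 2**bits evenly spaced levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((s - lo) / step) * step for s in samples]

def max_error(original, approximated):
    """Largest absolute difference between two equal-length sample lists."""
    return max(abs(x - y) for x, y in zip(original, approximated))

def tone(t):
    """A 5 Hz sine wave with amplitude 1."""
    return math.sin(2 * math.pi * 5 * t)

samples = sample(tone, duration_s=1.0, rate_hz=100)  # well sampled
undersampled = sample(tone, duration_s=1.0, rate_hz=5)  # one sample per cycle

coarse = quantize(samples, bits=3)  # 8 levels
fine = quantize(samples, bits=8)    # 256 levels

print("3-bit max error:", max_error(samples, coarse))
print("8-bit max error:", max_error(samples, fine))
print("undersampled values:", undersampled)
```

Sampling the tone only once per cycle lands on the same point of the wave every time, so the tone disappears entirely. Quantizing with fewer bits leaves a larger rounding error, bounded by half the spacing between levels.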
Overview of Compression Strategies
• Lossless Compression
  • Huffman Encoding
  • Adaptive Huffman Encoding
  • Lempel-Ziv-Welch (LZW)
    • used in GIF
  • JPEG-LS
• Lossy Compression
  • JPEG
  • H.261, MPEG-1, MPEG-2
• Lossless and Lossy Together
  • JPEG 2000
Huffman Encoding Procedure
1. Initialization: Put all items in a list L, sorted by frequency.
2. Repeat until L has only one node left:
  (a) From L pick the two nodes having the lowest frequency and create a parent node for them.
  (b) Assign the sum of the children's frequencies to the parent node and insert it into L (kept in sorted order).
  (c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.
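The procedure can be sketched in Python as follows. This is one possible implementation, not the course's reference code: it swaps the sorted list L for a priority queue (`heapq`), which yields the same two lowest-frequency nodes at each step, and it assumes the input is a string of single-character symbols.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return {symbol: bit string} by repeatedly merging the two
    lowest-frequency nodes until a single tree remains."""
    freq = Counter(text)
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (frequency, tie-breaker, tree); a tree is either
    # a symbol (leaf) or a (left, right) tuple (parent node).
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:  # only one distinct symbol: give it a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # parent: label branches 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the finished code
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

message = "ALOHA HAWAII"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(codes)
print(len(encoded), "bits")
```

Ties between equal frequencies can be broken in more than one way, so the individual codes may differ from run to run of the algorithm on paper, but the total encoded length for a given input does not.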
Huffman Coding Example
• Input: “ALOHA HAWAII”
• Frequency: 4A, 2H, 2I, 1L, 1O, 1 space, 1W
• 96 bits (8 bits * 12 characters) to 32 bits:
[Figure: Huffman tree with leaves A, I, H, L, [space], W, O and branches labeled 0 and 1]
A=0, I=100, H=101, L=1100, space=1101, etc.
RECOMMENDED: Java applet example at http://www.cs.sfu.ca/CC/365/li/squeeze/index.html
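The 32-bit figure can be checked by encoding the input with the code table from this example. In the sketch below the codes for W and O are assumptions: the slide lists them only as "etc.", so 1110 and 1111 are filled in as the two remaining 4-bit codes. The function names are illustrative.

```python
# Code table from the example; the W and O entries are assumed (see note above).
CODES = {
    "A": "0", "I": "100", "H": "101",
    "L": "1100", " ": "1101",
    "W": "1110", "O": "1111",
}

def encode(text, codes):
    """Concatenate the code of every character in the text."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a full code is seen.
    This works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

bits = encode("ALOHA HAWAII", CODES)
print(bits, len(bits))
print(decode(bits, CODES))
```

The most frequent symbol, A, gets the shortest code, which is where the saving over fixed 8-bit characters comes from.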
Why Digital?
• Universal storage, transmission format
  • CD, Internet
• Precision (range of values, number of bits, floating point)
• Lossless transmission/storage
BUT:
• Sampling rate distorts information
• Size requirements may be huge compared to analog, e.g., 4.2 million pixels for a single 35 mm photograph!
results in lots of work on perception-based lossy digital compression strategies
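To put the 4.2 million pixel figure in storage terms, here is a rough calculation. It assumes 24-bit color (3 bytes per pixel) and no compression; the slides do not state a bit depth, so treat that number as an illustrative assumption.

```python
def raw_image_bytes(pixels, bits_per_pixel=24):
    """Uncompressed size of an image in bytes (24 bits per pixel is an assumption)."""
    return pixels * bits_per_pixel // 8

size = raw_image_bytes(4_200_000)
print(f"{size:,} bytes, about {size / 1_000_000:.1f} MB")
```

At that size a single uncompressed photograph runs to millions of bytes, which is the motivation for lossy compression tuned to human perception.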
Why Perception Matters
http://www.libertarian.on.ca/images/Florida%20Recount.jpg
Audio
• Sounds
  • Hear 15 Hz to 20 kHz
  • Speech is 50 Hz to 10 kHz
• Speech Recognition
  • It is hard to wreck a nice beach / It is hard to recognize speech
  • Ice cream / I scream
• Synthesis
  • Speech
  • Music
    • MIDI for 128 instruments, 47 percussion sounds
    • Notes, timing
Speech Recognition Issues
• Continuous vs. discrete
• Vocabulary size
• Channel (microphone)
• Environment (location of microphone and speaker)
• Speaker dependent/speaker independent
• Context (language model)
• Interactivity (dialog model)
Speech Recognition Knowledge Sources
• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken
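To make the language model's role concrete, here is a toy bigram model in Python. Everything in it is hypothetical: the four-sentence training corpus is invented for illustration, add-one smoothing is just one simple choice, and a real recognizer would combine this score with acoustic and lexicon scores rather than use it alone.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count single words and adjacent word pairs; <s> marks a sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(words, words[1:])
    )

# Hypothetical toy training text, made up for this sketch
corpus = [
    "it is hard to recognize speech",
    "computers recognize speech",
    "it is hard to hear",
    "a nice beach",
]
unigrams, bigrams = train_bigram(corpus)

likely = log_prob("it is hard to recognize speech", unigrams, bigrams)
unlikely = log_prob("it is hard to wreck a nice beach", unigrams, bigrams)
print(likely, unlikely)
```

Given two transcriptions that sound alike, the model assigns the higher score to the word sequence it has seen more evidence for, which is how context helps resolve acoustically ambiguous input.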
Speech Variations
• Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
• Voice Quality: breathy, creaky, whispery, tense, lax, modal
• Context: sport, professional, interview, free conversation, man-machine dialogue
• Speaking Rate: normal, slow, fast, very fast
• Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
Video
• Video is made up of frames
  • Frame rate = number of frames shown per second (the delay between successive frames is its reciprocal)
  • Minimal change between frames
  • Sequencing creates the illusion of movement
    • 16 frames per second (fps) is “smooth”
    • Standards: NTSC 29.97 fps, PAL 25 fps, HDTV 60 fps
  • Interlacing
• Display scan rate is different
  • Monitor refresh rate, e.g., 60-70 Hz (60-70 refreshes per second)
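The relationship between frame rate and the delay between successive frames is a simple reciprocal. A small Python sketch, with an illustrative function name; the rates are those named on this slide, with PAL taken as its standard 25 fps:

```python
def frame_interval_ms(fps):
    """Delay between successive frames, in milliseconds, for a given frame rate."""
    return 1000.0 / fps

rates = [
    ("smoothness threshold", 16),
    ("PAL", 25),        # PAL runs at 25 fps
    ("NTSC", 29.97),
    ("HDTV", 60),
]
for name, fps in rates:
    print(f"{name}: {fps} fps -> {frame_interval_ms(fps):.2f} ms per frame")
```

A higher frame rate means a shorter interval, so each frame has less time to be decoded and displayed.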
Captured vs. Synthetic
• Animation vs. Video
• Vector Graphics vs. Bitmap/Raster Pictures
• Synthesizer vs. Recording
• Storage? Manipulation? Processor Requirements?
• Fidelity to real world
• Hybrids are possible
Why is Multimedia Important?
• Our society
  • captures its experience,
  • records its accomplishments,
  • portrays its past,
  • informs its masses
  …in pictures, audio and video
• For many, CNN has become the “publication of record”
• Multimedia learning leverages “multiple intelligences”
• Multimedia Digital Libraries are an essential component of
  • formal, informal, and professional learning
  • distance education, telemedicine
Technology Push vs. Market Pull
• Home Entertainment
• Catalog Ordering
• Multimedia Training, Education
• Videoconferencing
• Professional Video Services
• Videomail
• Speech Recognition
Hype vs. Reality
What is feasible, under what circumstances?
What is possible?
What is impossible?
What is unlikely?
A Multimedia Vision for the Home Market
FX Palo Alto Laboratory
John J. Doherty, Lynn Wilcox, and Andreas Girgensohn
“A Night at the Opera”
Video to appear as part of the ACM Multimedia Conference, 2002 (7:11)
Upcoming Homework
Register: send email to [email protected] with URL where your homeworks will be located (we will use that URL plus your sending email address for future correspondence) – before Oct. 28
Homework 1: Multimedia lookup via the web – Oct. 28
Homework 2: Scanning and image search – Oct. 30/Nov. 4
Homework 3: Animation via Macromedia Flash – Nov. 24
Homeworks 4,5,6,7 for later in the term
Homework 8: Multimedia web site – Dec. 12
See syllabus for details