
A Non-obtrusive Head Mounted Face Capture System

Chandan K. Reddy
Master's Thesis Defense

Thesis Committee:
Dr. George C. Stockman (Main Advisor)
Dr. Frank Biocca (Co-Advisor)
Dr. Charles Owen
Dr. Jannick Rolland (External Faculty)

Modes of Communication

- Text only – e.g., mail, electronic mail
- Voice only – e.g., telephone
- PC-camera-based conferencing – e.g., web cam
- Multi-user teleconferencing
- Teleconferencing through virtual environments
- Augmented-reality-based teleconferencing

Problem Definition

- Face Capture System (FCS)
- Virtual View Synthesis
- Depth Extraction and 3D Face Modeling
- Head Mounted Projection Displays
- 3D Tele-immersive Environments
- High Bandwidth Network Connections

Thesis Contributions

- Complete hardware setup for the FCS
- Camera-mirror parameter estimation for the optimal configuration of the FCS
- Generation of quality frontal videos from two side videos
- Reconstruction of a texture-mapped 3D face model from two side views
- Evaluation mechanisms for the generated frontal views

Existing Face Capture Systems

- Advantages: freedom for head movements
- Drawbacks: obstruction of the user's field of view
- Main applications: character animation and mobile environments

Courtesy:
- FaceCap3d – a product from Standard Deviation
- Optical Face Tracker – a product from Adaptive Optics

Existing Face Capture Systems

- Advantages: no burden for the user
- Drawbacks: highly equipped environments and restricted head motion
- Main applications: teleconferencing and collaborative work

Courtesy:
- National Tele-immersion Initiative
- Sea of Cameras (UNC Chapel Hill)

Proposed Face Capture System

- A novel face capture system that is being developed
- Two cameras capture the corresponding side views through the mirrors

(F. Biocca and J. P. Rolland, "Teleportal face-to-face system", patent filed, 2000.)

Advantages

- The user's field of view is unobstructed
- Portable and easy to use
- Gives very accurate, high-quality face images
- Can process in real time
- Simple and user-friendly system
- Static with respect to the human head
- Flipping the mirror lets the cameras view from the user's viewpoint

Applications

- Mobile environments
- Collaborative work
- Multi-user teleconferencing
- Medical areas
- Distance learning
- Gaming and entertainment industry
- Others

System Design

Equipment Required

Hardware
- 2 lipstick cameras
- 2 lenses with 12 mm focal length
- 2 mirrors with 1.5 inch diameter
- 2 Matrox Meteor II standard cards
- Lighting equipment
- VGA to NTSC converter
- A projector
- A microphone

Software
- MIL-LITE 7.0
- Visual Studio 6.0
- Adobe Premiere 6.0
- Sound Recorder

Network
- Internet2
- NAC 3000 MPEG Encoder
- NAC 4000 MPEG Decoder

Optical Layout

Three components to be considered:
- Camera
- Mirror
- Human face

Specification Parameters

Camera
- Sensing area: 3.2 mm x 2.4 mm (¼").
- Pixel dimensions: the sensed image is 768 x 494 pixels; the digitized image size is 320 x 240 due to RAM size restrictions.
- Focal length (Fc): 12 mm (VCL-12UVM).
- Field of view (FOV): 15.2° x 11.4°.
- Diameter (Dc): 12 mm.
- F-number (Nc): 1, to gather the maximum amount of light.
- Minimum working distance (MWD): 200 mm.
- Depth of field (DOF): to be estimated.
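
As a worked check (not from the slides, but it follows from the sensing area and focal length under the usual pinhole relation), the quoted FOV is:

$$
\mathrm{FOV}_h = 2\arctan\!\left(\frac{3.2/2}{12}\right) \approx 15.2^{\circ},
\qquad
\mathrm{FOV}_v = 2\arctan\!\left(\frac{2.4/2}{12}\right) \approx 11.4^{\circ}
$$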

Specification Parameters (Contd.)

Mirror
- Diameter (Dm) / F-number (Nm)
- Focal length (fm)
- Magnification factor (Mm)
- Radius of curvature (Rm)

Human face
- Height of the face to be captured (H ~ 250 mm)
- Width of the face to be captured (W ~ 175 mm)

Distances
- Distance between the camera and the mirror (Dcm ~ 150 mm)
- Distance between the mirror and the face (Dmf ~ 200 mm)

Customization of Cameras and Mirrors

Off-the-shelf cameras
- Customizing a camera lens is a tedious task
- A trade-off has to be made between the field of view and the depth of field
- The Sony DXC-LS1 with a 12 mm lens is suitable for our application

Custom-designed mirrors
- A plano-convex lens with 40 mm diameter is coated black on the planar side
- The radius of curvature of the convex surface is 155.04 mm
- The thickness at the center of the lens is 5 mm; the thickness at the edge is 3.7 mm
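
One derived figure, under the assumption that the coated element is used as a convex front-surface mirror (the slides list fm as a parameter but do not state its value): for a spherical mirror the focal length is half the radius of curvature,

$$
f_m = \frac{R_m}{2} = \frac{155.04\ \mathrm{mm}}{2} \approx 77.5\ \mathrm{mm}.
$$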

Block diagram of the system

Experimental setup

Virtual Video Synthesis

Problem Statement

- Generating a virtual frontal view from two side views

Data processing

- Two synchronized videos are captured in real time (30 frames/sec) simultaneously.
- For effective capturing and processing, the data is stored in uncompressed format.
- Machine specifications (Lorelei @ metlab.cse.msu.edu):
  - Pentium III processor
  - Processor speed: 746 MHz
  - RAM size: 384 MB
  - Hard disk write speed (practical): 9 MB/s
- MIL-LITE is configured to use 150 MB of RAM.

Data processing (Contd.)

- Size of 1 second of video = 30 x 320 x 240 x 3 bytes ≈ 6.59 MB
- Using 150 MB of RAM, only about 10 seconds of video from the two cameras can be captured
- Why does the processing have to be offline?
  - The calibration procedure is not automatic
  - The disk write speed would have to be at least 14 MB/s (only 9 MB/s is available in practice)
  - To capture two videos at 640 x 480 resolution, the disk write speed would have to be at least about 54 MB/s
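
A minimal sketch (not from the thesis) of the bandwidth arithmetic behind these figures, assuming 24-bit RGB pixels and two uncompressed streams at 30 frames/sec:

```python
# Back-of-the-envelope capture bandwidth, assuming 24-bit RGB frames
# and two uncompressed camera streams at 30 frames/sec.

def stream_mb_per_sec(width, height, fps=30, bytes_per_pixel=3):
    """Size of one second of uncompressed video, in MB (1 MB = 2**20 bytes)."""
    return fps * width * height * bytes_per_pixel / 2**20

one_cam_320 = stream_mb_per_sec(320, 240)        # ~6.59 MB/s per camera
two_cams_320 = 2 * one_cam_320                   # ~13.2 MB/s -> needs roughly 14 MB/s of disk
two_cams_640 = 2 * stream_mb_per_sec(640, 480)   # ~52.7 MB/s -> the ~54 MB/s figure

capture_seconds = 150 / two_cams_320             # ~11 s fit in the 150 MB MIL-LITE buffer
print(one_cam_320, two_cams_320, two_cams_640, capture_seconds)
```

This reproduces the ~6.59 MB per second per camera, roughly the 14 MB/s and 54 MB/s disk-write requirements quoted above, and shows why only about 10 seconds of two-camera video fit in the 150 MB MIL-LITE buffer.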

Structured Light Technique

- A grid is projected onto the frontal view of the face
- A square grid cell in the frontal view appears as a quadrilateral (with curved edges) in the real side view

Color Balancing

- Hardware-based approach: white balancing of the cameras
- Why is this more robust than a software-based approach?
  - There is no change to the camera input
  - Better handling of varying lighting conditions
  - No prior knowledge of the skin color is required
  - No additional overhead
  - It is enough if the two cameras are color balanced relative to each other

Off-line Calibration Stage

[Block diagram: the projector and the left and right calibration face images feed the computation of the transformation tables.]

Operational Stage

[Block diagram: the left and right face images are warped using the transformation tables, and the left and right warped face images are mosaiced into a single frontal face image.]
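
A minimal sketch of the operational stage described above, assuming the transformation tables are per-pixel lookup tables that map each frontal-view pixel to a source pixel in a side view; the function names, the identity tables, and the fixed seam are hypothetical, not from the thesis:

```python
import numpy as np

def warp_with_table(side_image, table_x, table_y):
    """Warp a side-view image into the frontal view using per-pixel lookup
    tables: table_x/table_y give, for each output pixel, the source column
    and row in the side image."""
    return side_image[table_y, table_x]

def mosaic(left_warped, right_warped, seam_col):
    """Naive mosaicing: left half from the left warped image, right half
    from the right warped image (hypothetical fixed seam)."""
    out = right_warped.copy()
    out[:, :seam_col] = left_warped[:, :seam_col]
    return out

# Usage sketch with dummy 240 x 320 RGB frames and identity tables:
h, w = 240, 320
left = np.zeros((h, w, 3), dtype=np.uint8)
right = np.zeros((h, w, 3), dtype=np.uint8)
tx = np.tile(np.arange(w), (h, 1))
ty = np.tile(np.arange(h)[:, None], (1, w))
frontal = mosaic(warp_with_table(left, tx, ty), warp_with_table(right, tx, ty), w // 2)
```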

Virtual video synthesis (Calibration phase)

Virtual video synthesis (contd.)

Virtual Frontal Video

Comparison of the Frontal Views

First row – virtual frontal views; second row – original frontal views

Video Synchronization (Eye Blinking)

First row – virtual frontal views; second row – original frontal views

Face Data through the Head Mounted System

3D Face Model

Coordinate Systems

There are five coordinate systems in our application:
- World Coordinate System (WCS)
- Face Coordinate System (FCS)
- Left Camera Coordinate System (LCCS)
- Right Camera Coordinate System (RCCS)
- Projector Coordinate System (PCS)

Camera Calibration

- Conversion from 3D world coordinates to 2D camera coordinates – perspective transformation model:

$$
s \begin{bmatrix} {}^{L}P_r \\ {}^{L}P_c \\ 1 \end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & c_{13} & c_{14} \\
c_{21} & c_{22} & c_{23} & c_{24} \\
c_{31} & c_{32} & c_{33} & 1
\end{bmatrix}
\begin{bmatrix} {}^{W}P_x \\ {}^{W}P_y \\ {}^{W}P_z \\ 1 \end{bmatrix}
$$

- Writing (u_j, v_j) for the 2D camera coordinates and (x_j, y_j, z_j) for the 3D world coordinates of the j-th calibration point, eliminating the scale factor s gives:

$$
u_j = (c_{11} - c_{31} u_j)\,x_j + (c_{12} - c_{32} u_j)\,y_j + (c_{13} - c_{33} u_j)\,z_j + c_{14}
$$
$$
v_j = (c_{21} - c_{31} v_j)\,x_j + (c_{22} - c_{32} v_j)\,y_j + (c_{23} - c_{33} v_j)\,z_j + c_{24}
$$
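
A minimal least-squares sketch (an assumed implementation, not the thesis code) for estimating the eleven unknowns c_11 … c_33 from the scale-eliminated equations above, given matched 3D calibration points and their 2D image coordinates:

```python
import numpy as np

def calibrate_camera(world_pts, image_pts):
    """Estimate the 3x4 perspective transformation matrix C (with c34 = 1)
    from N >= 6 correspondences between 3D world points (N x 3) and
    2D image points (N x 2), using the scale-eliminated linear equations."""
    A, b = [], []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        # u = c11*x + c12*y + c13*z + c14 - c31*u*x - c32*u*y - c33*u*z
        A.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        b.append(u)
        # v = c21*x + c22*y + c23*z + c24 - c31*v*x - c32*v*y - c33*v*z
        A.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        b.append(v)
    coeffs, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    C = np.append(coeffs, 1.0).reshape(3, 4)  # append the fixed c34 = 1
    return C
```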

Calibration Sphere

- A sphere can be used for calibration
- Calibration points on the sphere are chosen such that:
  - the azimuthal angle is varied in steps of 45°
  - the polar angle is varied in steps of 30°
- The locations of these calibration points are known in the 3D coordinate system with respect to the origin of the sphere
- The origin of the sphere defines the origin of the World Coordinate System
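
An illustrative sketch (the sphere radius here is a hypothetical parameter) of generating the 3D calibration-point coordinates implied by the angular sampling above:

```python
import numpy as np

def sphere_calibration_points(radius, azimuth_step_deg=45, polar_step_deg=30):
    """3D coordinates of calibration points on a sphere centered at the
    world origin, sampled on a regular azimuth / polar-angle grid."""
    points = []
    for polar in np.arange(polar_step_deg, 180, polar_step_deg):   # skip the poles
        for azimuth in np.arange(0, 360, azimuth_step_deg):
            t, p = np.radians(polar), np.radians(azimuth)
            points.append((radius * np.sin(t) * np.cos(p),
                           radius * np.sin(t) * np.sin(p),
                           radius * np.cos(t)))
    return np.array(points)

pts = sphere_calibration_points(radius=100.0)  # radius in mm, hypothetical value
```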

Projector Calibration

- Similar to camera calibration
- 2D image coordinates cannot be obtained directly from a 2D image
- A "blank image" is projected onto the sphere
- The 2D coordinates of the calibration points on the projected image are noted
- More points can be seen from the projector's point of view – some points are common to both camera views
- The results appear to have slightly more error than the camera calibration

3D Face Model Construction

Why?
- To obtain different views of the face
- To generate the stereo pair to view it in the HMPD

Steps required
- Computation of 3D locations
- Customization of the 3D model
- Texture mapping

Computation of 3D Points

- 3D point estimation using stereo
- Stereo between the two cameras is not possible because of occlusion by the facial features
- Hence, two stereo pair computations:
  - left camera and projector
  - right camera and projector
- Using stereo, compute the 3D points of prominent facial feature points in the FCS
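
A minimal linear-triangulation sketch (an assumed approach, not the thesis code): given the 3x4 perspective matrices of a camera and the projector from calibration, the same scale-eliminated equations are solved, now with the 3D point as the unknown:

```python
import numpy as np

def triangulate(C_cam, C_proj, uv_cam, uv_proj):
    """Recover a 3D point from its 2D coordinates in a camera view and in
    the projector 'view', given their 3x4 perspective matrices."""
    rows = []
    for C, (u, v) in ((C_cam, uv_cam), (C_proj, uv_proj)):
        # Each view contributes two linear equations in (x, y, z):
        # (c11 - c31*u) x + (c12 - c32*u) y + (c13 - c33*u) z = u*c34 - c14
        rows.append((C[0, :3] - u * C[2, :3], u * C[2, 3] - C[0, 3]))
        rows.append((C[1, :3] - v * C[2, :3], v * C[2, 3] - C[1, 3]))
    A = np.array([r[0] for r in rows])
    b = np.array([r[1] for r in rows])
    xyz, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xyz
```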

3D Generic Face Model

- A generic face model with 395 vertices and 818 triangles
- Left: front view; right: side view

Texture Mapped 3D Face

Evaluation

Evaluation Schemes

- Evaluation of facial expressions is not studied extensively in the literature
- Evaluation can be done for facial alignment and face recognition on static images
- Lip and eye movements in a dynamic event
- Perceptual quality – how are the moods conveyed?
- Two types of evaluation:
  - objective evaluation
  - subjective evaluation

Objective Evaluation

- Theoretical evaluation
- No human feedback required
- This evaluation can give us a measure of:
  - face recognition
  - face alignment
  - facial movements
- Methods applied:
  - normalized cross-correlation
  - Euclidean distance measures

Evaluation Images

- 5 frames were considered for objective evaluation
- First row – virtual frontal views; second row – original frontal views

Normalized Cross-Correlation

- Regions considered for normalized cross-correlation
- (Left: real image; right: virtual image)

Normalized Cross-Correlation

- Let V be the virtual image and R be the real image
- Let w be the width and h be the height of the images
- The normalized cross-correlation between the two images V and R is given by the expression below
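
The equation itself did not survive extraction; the standard (zero-mean) normalized cross-correlation consistent with the definitions above is:

$$
\mathrm{NCC}(V, R) =
\frac{\sum_{x=1}^{w}\sum_{y=1}^{h}\bigl(V(x,y)-\bar V\bigr)\bigl(R(x,y)-\bar R\bigr)}
{\sqrt{\sum_{x=1}^{w}\sum_{y=1}^{h}\bigl(V(x,y)-\bar V\bigr)^{2}}\;
 \sqrt{\sum_{x=1}^{w}\sum_{y=1}^{h}\bigl(R(x,y)-\bar R\bigr)^{2}}}
$$

where $\bar V$ and $\bar R$ are the mean intensities of V and R over the w x h region.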

Normalized Cross-Correlation

Frame      Left eye   Right eye   Mouth   Eyes + Mouth   Complete face
Frame 1      0.988      0.987     0.993       0.989          0.989
Frame 2      0.969      0.972     0.985       0.978          0.985
Frame 3      0.969      0.967     0.992       0.978          0.986
Frame 4      0.991      0.989     0.993       0.990          0.990
Frame 5      0.985      0.986     0.992       0.988          0.989

Euclidean Distance Measures

- The Euclidean distance between two points i and j is given by the standard formula (reproduced below)
- Let Rij be the Euclidean distance between two points i and j in the real image
- Let Vij be the Euclidean distance between the same two points in the virtual image
- Dij = | Rij - Vij |
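
The distance formula was lost in extraction; for two image points i = (x_i, y_i) and j = (x_j, y_j) it is the standard planar Euclidean distance:

$$
d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},
\qquad
D_{ij} = \lvert R_{ij} - V_{ij} \rvert
$$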

Euclidean Distance Measures

Frame      Daf    Dbf    Dcf    Dcg    Ddg    Deg    Error
Frame 1    2.00   0.80   4.15   3.49   2.95   3.46   2.80
Frame 2    0.59   3.00   0.79   4.91   0.63   0.80   1.79
Frame 3    1.88   3.84   4.29   4.34   2.68   1.83   3.14
Frame 4    1.09   2.97   2.10   6.33   3.01   4.08   3.36
Frame 5    1.62   2.21   5.57   4.99   1.24   1.90   2.92

Subjective Evaluation

- Evaluates human perception
- Measurement of the quality of a talking face
- Factors that might affect it:
  - quality of the video
  - facial movements and expressions
  - synchronization of the two halves of the face
  - color and texture of the face
  - quality of the audio
  - synchronization of the audio
- A preliminary study has been made to assess the quality of the generated videos

Conclusion and Future Work

[Overview figure: the completed results (virtual frontal image, virtual frontal video, texture-mapped 3D face model) lead into the planned future work (3D facial animation).]

Summary

- Design and implementation of a novel Face Capture System
- Generation of a virtual frontal view from two side views in a video sequence
- Extraction of depth information using a stereo method
- Texture-mapped 3D face model generation
- Evaluation of virtual frontal videos

Future Work

- Online processing in real time
- Automatic calibration
- 3D facial animation
- Subjective evaluation of the virtual frontal videos
- Data compression during processing and transmission
- Customization of camera lenses
- Integration with a Head Mounted Projection Display

Thank You

Doubts, Queries & Suggestions
