Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach Yueyu Hu, Shuai Yang, Wenhan Yang, Ling-Yu Duan, Jiaying Liu Peking University IEEE ICME 2020

Nov 28, 2021

Page 1: Towards Coding for Human and Machine Vision: A Scalable ...

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

Yueyu Hu, Shuai Yang, Wenhan Yang, Ling-Yu Duan, Jiaying Liu

Peking University

IEEE ICME 2020

Page 2: Towards Coding for Human and Machine Vision: A Scalable ...

SCENE

Page 3: Towards Coding for Human and Machine Vision: A Scalable ...

WHAT HUMANS SEE

Page 4: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE FEATURES

Page 5: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE ANALYTICS

[Figure: analysis results overlaid on the scene. Detection labels: human, car, flag, traffic light. Part labels: hair, face, up, r-arm, l-arm, r-leg, l-leg, r-shoe, l-shoe.]

Page 6: Towards Coding for Human and Machine Vision: A Scalable ...

IMAGE CODING FOR WHOM?

HUMAN VISION

1) Feature Extraction
• Degraded → High-Quality
• Enhanced Information

2) Guided Reconstruction
• Feature → Image
• Information Generation

MACHINE VISION

1) Feature Extraction
• Redundant → Compact
• Compressed Information

2) Regression
• Feature → Label
• Further Compressed

[Figure: AMOUNT OF INFORMATION axis running from HIGH-LEVEL FEATURES FOR MACHINE VISION to RECONSTRUCTED IMAGES FOR HUMAN VISION, with a "?" in between.]

Page 7: Towards Coding for Human and Machine Vision: A Scalable ...

IMAGE CODING NEXTGEN

Ling-Yu Duan, Jiaying Liu, Wenhan Yang, Tiejun Huang, Wen Gao. Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics. arXiv:2001.03569, 2020.

[Figure: VCM framework diagram. Blocks: Video Acquisition, VCM Encoder, Bitstream, VCM Decoder, Machine Vision, Human Vision. Streams: Video Stream, Feature Stream, Model Stream. Outputs: Reconstructed Video, Reconstructed Features. Also labeled: Scalable Feedback.]

• Scalable (according to use)
• Efficient compression for joint human and machine vision

Page 8: Towards Coding for Human and Machine Vision: A Scalable ...

INFORMATION DENSITY SPECTRUM

[Figure: methods placed along the AMOUNT OF INFORMATION axis, between HIGH-LEVEL FEATURES FOR MACHINE VISION and RECONSTRUCTED IMAGES FOR HUMAN VISION: SIFT, CDVS, CDVA, Compressive Sensing, Super-Resolution, AVS2, AVS3, High Efficiency Video Coding, Versatile Video Coding.]

• Descriptor coding for efficient machine vision analytics (low bit-rate)
• Sophisticated video codecs for improved human vision (high bit-rate)

Page 9: Towards Coding for Human and Machine Vision: A Scalable ...

IMAGE REPRESENTATIONS

EDGES

PROS
• Efficient for structural information
• Maintains scalability
• Sparse and light-weight
• Supports smooth scaling

CONS
• Inefficient for details in images
• Ambiguous in color

Page 10: Towards Coding for Human and Machine Vision: A Scalable ...

IMAGE REPRESENTATIONS

COLOR

PROS
• Avoids color ambiguity
• Sparse and compact
• Related to visual fidelity

CONS
• Usually randomly distributed
• Inefficient to compress further

[Figure: color samples labeled DARK BLUE, LIGHT BLUE, and BLUE.]

Page 11: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN FACES

Faces are naturally salient areas in the images we look at.

Machine vision systems that analyze faces have been widely developed.

It is the reflection of humanity in technology.

Analytics of Faces

Page 12: Towards Coding for Human and Machine Vision: A Scalable ...

SCALABLE FRAMEWORK

[Figure: framework diagram. Labels: Input image, Edge extraction, Structure code, Color code, Encoder, Bitstream, Decoder, G, Machine Vision, Human Vision, Decoded image.]

• Conceptual compression to achieve high quality with compact features
• Scalable bit-stream for different tasks
• Vectorized Edges + Sparse Pixels
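The layered bit-stream idea on this slide can be illustrated with a small sketch. This is not the authors' codec: the `ScalableBitstream` container, its 4-byte length prefix, and the `decode_for` helper are hypothetical names and choices made only for this example. It shows a machine-vision consumer reading just the structure (edge) code, while a human-vision consumer reads both layers.

```python
import struct
from dataclasses import dataclass


@dataclass
class ScalableBitstream:
    """Hypothetical two-layer container: edges first, then color."""
    structure_code: bytes  # base layer (vectorized edges)
    color_code: bytes      # enhancement layer (sparse pixels)

    def pack(self) -> bytes:
        # Assumed layout: 4-byte big-endian length of the base layer,
        # followed by the base layer and then the enhancement layer.
        header = struct.pack(">I", len(self.structure_code))
        return header + self.structure_code + self.color_code

    @staticmethod
    def unpack(data: bytes) -> "ScalableBitstream":
        (n,) = struct.unpack(">I", data[:4])
        return ScalableBitstream(data[4:4 + n], data[4 + n:])


def decode_for(task: str, data: bytes) -> dict:
    """Return only the layers that a given consumer needs."""
    stream = ScalableBitstream.unpack(data)
    if task == "machine":
        return {"structure": stream.structure_code}
    if task == "human":
        return {"structure": stream.structure_code,
                "color": stream.color_code}
    raise ValueError(f"unknown task: {task}")
```

Because the base layer comes first and carries its own length, a machine-vision client in this sketch could stop reading after the structure code and never fetch the color bytes.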

Page 13: Towards Coding for Human and Machine Vision: A Scalable ...

ENCODER · EDGE

• Edge detection via structured forests

P. Dollár and C. L. Zitnick. Structured forests for fast edge detection. ICCV, 2013.

Page 14: Towards Coding for Human and Machine Vision: A Scalable ...

ENCODER · EDGE

• Edge detection via structured forests
• AutoTrace to convert edge pixels to vectorized representations
• Represented by lines and curves
• Short and trivial edges are screened out
• Prediction by Partial Matching (PPM) to losslessly compress vectors

M. Weber. AutoTrace: a program for converting bitmap to vector graphic. 1998. http://autotrace.sourceforge.net/
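The screening and lossless-compression steps can be sketched as follows. Several things here are assumptions for illustration: the `min_length` threshold, the uint16 serialization layout, and above all the compressor. Python's standard library has no PPM coder, so `zlib` (DEFLATE) stands in for it; the point is only that the vector data survives a lossless round trip.

```python
import math
import struct
import zlib
from typing import List, Tuple

Polyline = List[Tuple[int, int]]


def path_length(poly: Polyline) -> float:
    """Total Euclidean length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(poly, poly[1:]))


def screen_short_edges(polys: List[Polyline], min_length: float) -> List[Polyline]:
    """Drop polylines shorter than min_length (a hypothetical threshold)."""
    return [p for p in polys if path_length(p) >= min_length]


def serialize(polys: List[Polyline]) -> bytes:
    """Pack polylines as uint16 counts and coordinates (assumed layout)."""
    out = [struct.pack(">H", len(polys))]
    for p in polys:
        out.append(struct.pack(">H", len(p)))
        out.extend(struct.pack(">HH", x, y) for x, y in p)
    return b"".join(out)


def deserialize(data: bytes) -> List[Polyline]:
    """Inverse of serialize()."""
    (count,) = struct.unpack_from(">H", data, 0)
    offset, polys = 2, []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", data, offset)
        offset += 2
        pts = [struct.unpack_from(">HH", data, offset + 4 * i) for i in range(n)]
        offset += 4 * n
        polys.append(pts)
    return polys


edges = [[(0, 0), (10, 0), (10, 10)], [(5, 5), (6, 5)]]
kept = screen_short_edges(edges, min_length=3.0)
payload = zlib.compress(serialize(kept), 9)  # stand-in for PPM
restored = deserialize(zlib.decompress(payload))
```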

Page 15: Towards Coding for Human and Machine Vision: A Scalable ...

ENCODER · COLOR

• Sparse pixels sampled according to edges
• Segments: sample on both sides

[Figure: a segment with endpoints ps and pt, sample points p1 and p2 on either side of it, and the parameter α.]

Page 16: Towards Coding for Human and Machine Vision: A Scalable ...

ENCODER · COLOR

• Sparse pixels sampled according to edges
• Segments: sample on both sides
• Curves: sample on areas with steepest gradients

[Figure: a curve with labeled points ps, pt, pa, pb, and p1.]
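Sampling on both sides of a segment can be sketched geometrically. The transcript does not define the slide's α, so this example makes its own assumptions: samples are taken at the segment midpoint and displaced along the normal by a hypothetical `offset` parameter. The function name is also illustrative.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def sample_both_sides(ps: Point, pt: Point, offset: float) -> Tuple[Point, Point]:
    """Return two points at `offset` from the segment midpoint, one per side."""
    dx, dy = pt[0] - ps[0], pt[1] - ps[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("ps and pt must be distinct")
    # Unit normal to the segment direction.
    nx, ny = -dy / length, dx / length
    # Midpoint of the segment (an assumed sampling location).
    mx, my = (ps[0] + pt[0]) / 2, (ps[1] + pt[1]) / 2
    p1 = (mx + offset * nx, my + offset * ny)
    p2 = (mx - offset * nx, my - offset * ny)
    return p1, p2
```

The colors read at `p1` and `p2` would then describe the regions on each side of the edge, which is what resolves the color ambiguity of an edge-only representation.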

Page 17: Towards Coding for Human and Machine Vision: A Scalable ...

DECODER · MACHINE VISION

• Image-to-image translation
• Render pixels with vectorized representations
• Edge-to-RGB translation
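Rendering pixels from the vectorized representation amounts to rasterizing the decoded lines back onto a pixel grid. The sketch below uses Bresenham's line algorithm for straight segments only; curves and the learned edge-to-RGB generator are not covered, and the function names are illustrative rather than taken from the authors' code.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]


def draw_line(canvas: List[List[int]], p0: Pixel, p1: Pixel) -> None:
    """Rasterize one segment onto a binary canvas with Bresenham's algorithm."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        canvas[y0][x0] = 1
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_edges(width: int, height: int,
                 polylines: List[List[Pixel]]) -> List[List[int]]:
    """Draw every polyline segment onto a blank width x height edge map."""
    canvas = [[0] * width for _ in range(height)]
    for poly in polylines:
        for a, b in zip(poly, poly[1:]):
            draw_line(canvas, a, b)
    return canvas
```

The resulting binary edge map is the kind of input an edge-to-RGB translation network would consume.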

Page 18: Towards Coding for Human and Machine Vision: A Scalable ...

DECODER · HUMAN VISION

• Image-to-image translation
• Render pixels with vectorized representations
• Generate masks for completion synthesis
• Image inpainting
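Mask generation for completion synthesis can be sketched as placing the transmitted sparse pixels on a blank canvas and marking everything else as missing. The function name and the mask convention (1 for pixels to synthesize, 0 for known samples) are assumptions for this example; the inpainting network itself is not shown.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def build_inpainting_input(width: int, height: int,
                           samples: Dict[Tuple[int, int], RGB]):
    """Place sparse color samples on a blank canvas and mark the rest as missing.

    Returns (image, mask). mask[y][x] is 1 where the pixel must be synthesized
    and 0 where a transmitted sample is present (an assumed convention).
    """
    image: List[List[RGB]] = [[(0, 0, 0)] * width for _ in range(height)]
    mask = [[1] * width for _ in range(height)]
    for (x, y), color in samples.items():
        # Ignore samples that fall outside the canvas.
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = color
            mask[y][x] = 0
    return image, mask
```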

Page 19: Towards Coding for Human and Machine Vision: A Scalable ...

LOSS FUNCTIONS

• Reconstruction Loss

$$\mathcal{L}_r = \mathbb{E}\left[\lambda_1 \lVert I_G - I \rVert + \lambda_2\, \mathrm{SSIM}(I_G, I)\right]$$

• Perceptual Loss

$$\mathcal{L}_p = \mathbb{E}\left[\lambda_3\, \mathrm{PERC}(I_G, I)\right]$$

• Adversarial Objective

$$\mathcal{L}_D = \mathbb{E}\left[\mathrm{ReLU}(\tau - D(I, E, M))\right] + \mathbb{E}\left[\mathrm{ReLU}(\tau + D(I_G, E, M))\right]$$

$$\mathcal{L}_G = -\mathbb{E}\left[D(I_G, E, M)\right]$$

[Figure: training diagram with generator G and discriminator D, labeled Reconstruction Loss + Perceptual Loss and Adversarial Loss.]
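The ReLU and τ terms in the adversarial objective have the form of a hinge loss with margin τ. A minimal numeric sketch, assuming the discriminator scores are already available as plain floats and using a hypothetical default of τ = 1.0:

```python
from statistics import mean
from typing import Sequence


def relu(x: float) -> float:
    return max(0.0, x)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float],
                       tau: float = 1.0) -> float:
    """Hinge loss: penalize real scores below tau and fake scores above -tau."""
    real_term = mean(relu(tau - s) for s in d_real)
    fake_term = mean(relu(tau + s) for s in d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """The generator is rewarded when the discriminator scores its output highly."""
    return -mean(d_fake)
```

For example, with real scores `[2.0, 0.5]` and fake scores `[-2.0, 0.0]`, only the real score 0.5 and the fake score 0.0 fall inside the margin, so the discriminator loss is 0.25 + 0.5 = 0.75.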

Page 20: Towards Coding for Human and Machine Vision: A Scalable ...

EXPERIMENTAL RESULTS

HUMAN VISION

Subjective preference survey.

Measuring fidelity and aesthetics.

MACHINE VISION

Evaluate facial landmark detection.

Measuring information preservation.

Page 21: Towards Coding for Human and Machine Vision: A Scalable ...

SCALABLE OUTPUT

[Figure: two examples, each with panels MACHINE VISION, HUMAN VISION, and INPUT IMAGE. Bit-rates shown: 0.114 bpp and 0.172 bpp (first example); 0.157 bpp and 0.249 bpp (second example).]
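The bit-rates on these slides are given in bits per pixel (bpp), which is the coded size in bits divided by the number of pixels. A short sketch of that arithmetic follows; the image size and byte count in the usage note are made-up values, not figures from the paper.

```python
def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Bit-rate of a coded image: total bits divided by pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return 8 * num_bytes / (width * height)
```

For instance, a hypothetical 512 × 512 image coded in 8192 bytes gives `bits_per_pixel(8192, 512, 512)`, which is 65536 bits over 262144 pixels, or 0.25 bpp.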

Page 22: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN PERCEPTION

[Figure: input image vs. JPEG at 0.266 bpp.]

Page 23: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN PERCEPTION

[Figure: input image vs. Ours at 0.249 bpp.]

Page 24: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN PERCEPTION

[Figure: input image vs. JPEG at 0.208 bpp.]

Page 25: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN PERCEPTION

[Figure: input image vs. Ours at 0.198 bpp.]

Page 26: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN PERCEPTION

[Figure: input image vs. JPEG at 0.178 bpp.]

Page 27: Towards Coding for Human and Machine Vision: A Scalable ...

HUMAN PERCEPTION

[Figure: input image vs. Ours at 0.177 bpp.]

Page 28: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

[Plot: Normalized Point-to-Point Error (y-axis, ticks 2 to 64) vs. Bit-Rate in bpp (x-axis, 0.12 to 0.27). Series: JPEG (qp=8).]

Page 29: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

[Plot: Normalized Point-to-Point Error (y-axis, ticks 2 to 64) vs. Bit-Rate in bpp (x-axis, 0.12 to 0.27). Series: JPEG (qp=8), our (E+C). Highlighted: OUR (edge+color).]

Page 30: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

[Plot: Normalized Point-to-Point Error (y-axis, ticks 2 to 64) vs. Bit-Rate in bpp (x-axis, 0.12 to 0.27). Series: JPEG (qp=7), JPEG (qp=8), our (E+C). Highlighted: JPEG (qp=7).]

Page 31: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

[Plot: Normalized Point-to-Point Error (y-axis, ticks 2 to 64) vs. Bit-Rate in bpp (x-axis, 0.12 to 0.27). Series: JPEG (qp=6), JPEG (qp=7), JPEG (qp=8), our (E+C). Highlighted: JPEG (qp=6).]

Page 32: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

[Plot: Normalized Point-to-Point Error (y-axis, ticks 2 to 64) vs. Bit-Rate in bpp (x-axis, 0.12 to 0.27). Series: JPEG (qp=4), JPEG (qp=6), JPEG (qp=7), JPEG (qp=8), our (E+C). Highlighted: JPEG (qp=4).]

Page 33: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

[Plot: Normalized Point-to-Point Error (y-axis, ticks 2 to 64) vs. Bit-Rate in bpp (x-axis, 0.12 to 0.27). Series: JPEG (qp=4), JPEG (qp=6), JPEG (qp=7), JPEG (qp=8), our (E+C), our (E). Highlighted: OUR (edge).]

Page 34: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION ACCURACY

• Quantitatively evaluate the accuracy of facial landmark detection on the reconstructed images.
• Results show statistically improved accuracy at a lower bit-rate.
• While the base layer maintains high accuracy, the enhancement layer provides more fidelity.
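The plots report a normalized point-to-point error for the detected landmarks. A common way to compute such a metric is the mean Euclidean distance between predicted and reference landmarks, divided by a normalization length. The slides do not state which normalization was used, so the `norm` argument below (for example an inter-ocular distance) is an assumption, as is the function name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def normalized_point_to_point_error(pred: Sequence[Point],
                                    truth: Sequence[Point],
                                    norm: float) -> float:
    """Mean landmark distance divided by a normalization length."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    if norm <= 0:
        raise ValueError("norm must be positive")
    mean_dist = sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)
    return mean_dist / norm
```

Running the same landmark detector on images decoded from each codec and comparing this error at matched bit-rates is the kind of evaluation the plots summarize.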

Page 35: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION RESULTS

[Figure: landmark detection on JPEG at 0.131 bpp vs. Ours at 0.115 bpp.]

Page 36: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION RESULTS

[Figure: landmark detection on JPEG at 0.138 bpp vs. Ours at 0.114 bpp.]

Page 37: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION RESULTS

[Figure: landmark detection on JPEG at 0.145 bpp vs. Ours at 0.108 bpp.]

Page 38: Towards Coding for Human and Machine Vision: A Scalable ...

MACHINE VISION: LANDMARK DETECTION RESULTS

[Figure: landmark detection on JPEG at 0.158 bpp vs. Ours at 0.143 bpp.]

Page 39: Towards Coding for Human and Machine Vision: A Scalable ...

CONCLUSION

APPROACH TO COLLABORATIVE CODING
• Edge + sparse pixels, vectorized representation
• Generative adversarial reconstruction
• Human-machine collaborative feature extraction

INFORMATION SCALABLE FRAMEWORK
• Base layer: semantically accurate
• Enhanced layer: visually faithful
• Efficient feature adaptation

FUTURE DIRECTIONS
• Self-learned feature adaptation
• Multi-task collaborative inference
• Theoretical analysis on collaborative coding

Page 40: Towards Coding for Human and Machine Vision: A Scalable ...

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

PAPER ID 818

[Links: PROJECT, CODE]

Thank You!