Semi-supervised Video Object Segmentation

Jul 13, 2022

dariahiddleston
Page 1: Semi-supervised Video Object Segmentation
Page 2: Semi-supervised Video Object Segmentation

• Benchmarks & Metrics

• Benchmarks

• DAVIS 2016: Popular single-object VOS benchmark

• DAVIS 2017: Multi-object VOS benchmark with high-quality annotations and higher resolution

• YouTube-VOS: The largest and most complex VOS dataset

Page 3: Semi-supervised Video Object Segmentation

• Benchmarks & Metrics

• Metrics

• Jaccard Score (J): IoU between the predicted mask and the ground-truth mask

• Contour Accuracy (F): F1 score between the boundary pixels of the predicted mask and those of the ground-truth mask

• J&F: Arithmetic mean of the two metrics above
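
The region and contour metrics can be sketched in a few lines of NumPy. This is a simplified illustration, not the official benchmark code: the function names are made up for this example, the contour score counts only exactly coinciding boundary pixels (the DAVIS evaluation allows a small spatial tolerance), and the combined score is taken as the arithmetic mean (J + F) / 2, which is the convention used by the DAVIS evaluation.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (assumption)
    return np.logical_and(pred, gt).sum() / union

def boundary(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all foreground.
    interior = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return m[1:-1, 1:-1] & ~interior

def contour_f1(pred, gt):
    """Simplified F: F1 score of exactly coinciding boundary pixels."""
    bp, bg = boundary(pred), boundary(gt)
    if bp.sum() == 0 and bg.sum() == 0:
        return 1.0
    tp = np.logical_and(bp, bg).sum()
    precision = tp / bp.sum() if bp.sum() else 0.0
    recall = tp / bg.sum() if bg.sum() else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def j_and_f(pred, gt):
    """Combined score: arithmetic mean of J and the simplified F."""
    return (jaccard(pred, gt) + contour_f1(pred, gt)) / 2
```

Both inputs are 2-D arrays of the same shape, with non-zero entries marking the object.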


Page 4: Semi-supervised Video Object Segmentation


• Semi-supervised setting

• Given one or more annotated frames (typically the first frame)

• Propagate the manual annotation to the rest of the video
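
The propagation setting can be sketched as a simple loop. This is only an illustration of the task interface: `propagate`, `segment_next` and `copy_previous` are hypothetical names, and `copy_previous` is a trivial placeholder standing in for a real VOS model.

```python
import numpy as np

def propagate(frames, first_mask, segment_next):
    """Carry the annotation of frame 0 through the rest of the video.

    segment_next(prev_frame, prev_mask, frame) -> mask is a hypothetical
    stand-in for whatever model performs the per-frame prediction.
    """
    masks = [first_mask]
    for t in range(1, len(frames)):
        masks.append(segment_next(frames[t - 1], masks[-1], frames[t]))
    return masks

def copy_previous(prev_frame, prev_mask, frame):
    """Trivial placeholder model: reuse the previous mask unchanged."""
    return prev_mask.copy()

# Five dummy RGB frames and one manually annotated mask for the first frame.
frames = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(5)]
first_mask = np.zeros((4, 4), dtype=np.uint8)
first_mask[1:3, 1:3] = 1
masks = propagate(frames, first_mask, copy_previous)
```

The result holds one mask per frame, with the given annotation at index 0.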

Page 5: Semi-supervised Video Object Segmentation

• Multi-object Scenarios

• Post-ensemble manner: earlier methods segment each object separately and merge the per-object predictions afterwards

• AOT associates and segments multiple objects within an end-to-end framework
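
To make the post-ensemble idea concrete, here is a minimal sketch of one common way to merge independently predicted single-object maps into a single label map. The function name and the background rule (product of the complementary probabilities) are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def post_ensemble(per_object_probs):
    """Merge per-object foreground probabilities into one label map.

    per_object_probs: array of shape (N, H, W), one map per object.
    Returns an (H, W) integer map: 0 = background, k = k-th object (1-based).
    """
    probs = np.asarray(per_object_probs, dtype=float)
    # Background score: probability that none of the objects is present.
    background = np.prod(1.0 - probs, axis=0, keepdims=True)
    stacked = np.concatenate([background, probs], axis=0)  # (N + 1, H, W)
    return stacked.argmax(axis=0)
```

With this scheme the model runs once per object, so the cost grows with the number of objects, which is the overhead an end-to-end multi-object framework aims to avoid.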


Page 6: Semi-supervised Video Object Segmentation

Identity Assignment

• Identity Embedding

• Identity Decoding
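
A rough sketch of how identity embedding and decoding could work, assuming a bank of M identity vectors from which each object receives a distinct slot. All names, the bank size and the embedding width are illustrative assumptions; in a real model the bank is learned, whereas here it is random.

```python
import numpy as np

rng = np.random.default_rng(0)
M, C = 10, 8                              # bank size and embedding width (assumed)
identity_bank = rng.normal(size=(M, C))   # learned in a real model; random here

def assign_identities(num_objects):
    """Pick a distinct bank slot for each object (random assignment)."""
    return rng.permutation(M)[:num_objects]

def identity_embedding(one_hot_masks, slots):
    """one_hot_masks: (H*W, N). Returns (H*W, C): each pixel receives the bank
    vector of the object it belongs to (zeros where no object is present)."""
    return one_hot_masks @ identity_bank[slots]

def identity_decoding(logits_over_bank, slots):
    """logits_over_bank: (H*W, M). Keep the assigned slots, softmax over them."""
    z = logits_over_bank[:, slots]
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

Embedding maps the masks of all objects into one shared feature space, and decoding reads per-object probabilities back out of a prediction over all M slots, so several objects can be handled in a single pass.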

Page 7: Semi-supervised Video Object Segmentation

Long Short-Term Transformer (LSTT)

• Long Term Attention

• Short Term Attention
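
The difference between the two attention types can be sketched with plain NumPy. This is a single-head illustration under simplifying assumptions (no learned projections, no positional encoding); the function names and the `radius` parameter are made up for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def long_term_attention(query, mem_keys, mem_values):
    """Every current-frame location attends to every stored memory location.

    query: (H*W, C); mem_keys and mem_values: (T*H*W, C) for T memory frames.
    """
    scores = query @ mem_keys.T / np.sqrt(query.shape[1])
    return softmax(scores) @ mem_values

def short_term_attention(query, prev_keys, prev_values, H, W, radius=1):
    """Each location attends only to a (2*radius+1)^2 window of the previous frame."""
    ys, xs = np.divmod(np.arange(H * W), W)   # row and column of each location
    near = (np.abs(ys[:, None] - ys[None, :]) <= radius) & \
           (np.abs(xs[:, None] - xs[None, :]) <= radius)
    scores = query @ prev_keys.T / np.sqrt(query.shape[1])
    scores = np.where(near, scores, -np.inf)  # mask out far-away locations
    return softmax(scores) @ prev_values
```

Long-term attention is global and can recover objects after large motion or occlusion, while the windowed short-term variant assumes that objects move little between neighbouring frames.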

Page 8: Semi-supervised Video Object Segmentation
Page 9: Semi-supervised Video Object Segmentation

Architecture Overview

• Encoder

• MobileNet V2

• Decoder

• FPN

• Loss Function

• Binary Cross Entropy Loss

• IoU Loss
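
A minimal sketch of combining the two loss terms on a predicted probability map. The equal weighting and the soft (differentiable) form of the IoU term are assumptions for illustration; the slides do not state how the two terms are balanced.

```python
import numpy as np

def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-(target * np.log(prob) + (1 - target) * np.log(1 - prob)).mean())

def soft_iou_loss(prob, target, eps=1e-7):
    """1 - soft IoU, where products and sums replace set intersection and union."""
    inter = (prob * target).sum()
    union = (prob + target - prob * target).sum()
    return float(1.0 - (inter + eps) / (union + eps))

def total_loss(prob, target, w_bce=0.5, w_iou=0.5):
    """Weighted sum of both terms (the weights are assumed, not from the slides)."""
    return w_bce * bce_loss(prob, target) + w_iou * soft_iou_loss(prob, target)
```

The cross-entropy term scores every pixel independently, while the IoU term depends on the overlap of the whole mask, so small objects are not dominated by the background.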

Page 10: Semi-supervised Video Object Segmentation

AOT-Tiny: L=1, m=1

AOT-Small: L=2, m=1

AOT-Base: L=3, m=1

AOT-Large: L=3, m={1,7,13,……}

AOT-Base runs about 4.5 times faster than CFBI

(15.2 fps vs. 3.4 fps)

Page 11: Semi-supervised Video Object Segmentation

Ablation study

Page 12: Semi-supervised Video Object Segmentation

Interpretability — Identity Bank

Page 13: Semi-supervised Video Object Segmentation

Interpretability — Long term & Short term Memory

Page 14: Semi-supervised Video Object Segmentation

Thanks for watching!