Places Challenge 2017 Scene Parsing
Team: WinterIsComing
Riwei Chen, Qi Chen, Xinglong Wu, Yifan Lu, Yudong Jiang, Linfu Wen

presentations.cocodataset.org/Places17-WinterIsComing.pdf · 2020. 4. 1.

Sep 27, 2020

Outline

• Single Model Results
• Method Overview
• Method Details
  • Model Pretraining
  • Pyramid Pooling
  • Batch Size & BN
  • Other Details
  • Submissions
• Visual Results
• Future Direction

Features of ADE20K Dataset: Scene Parsing

• Number of images
  • Training: 20K
  • Validation: 2K
  • Testing: 3K
• Number of categories
  • Semantic categories: 150

Single Model Results on Validation Set

• Single model
• Compared with the best single model result of 2016

| Team | mIoU | Pixel accuracy |
|---|---|---|
| SenseCuSceneParsing [1] | 43.39% | 80.90% |
| Adelaide [2]* | 43.06% | 80.53% |
| WinterIsComing (ours) | 43.98% | 81.13% |

[1] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. CVPR 2017.
[2] Wu Z, Shen C, Hengel A V D. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. 2016.
* The result of “Model C, 2 conv”
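The two reported metrics can be illustrated with a minimal sketch, assuming the usual confusion-matrix definitions of pixel accuracy and mean IoU. The function name and the toy label maps are hypothetical, and benchmark-specific handling (for example, ignored labels) is not modeled here.

```python
def pixel_accuracy_and_miou(pred, truth, num_classes):
    """Return (pixel accuracy, mean IoU) for two equally sized label maps."""
    # confusion[t][p] counts pixels whose true class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            confusion[t][p] += 1

    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both maps
            ious.append(tp / union)

    return correct / total, sum(ious) / len(ious)


# Toy 2 x 3 label maps (hypothetical data, 3 classes).
truth = [[0, 0, 1],
         [1, 1, 2]]
pred = [[0, 1, 1],
        [1, 1, 2]]
acc, miou = pixel_accuracy_and_miou(pred, truth, num_classes=3)
```

In this toy case one of six pixels is wrong, so pixel accuracy is 5/6, while the per-class IoUs (1/2, 3/4, 1) average to 0.75. Mean IoU penalizes errors on rare classes more heavily than pixel accuracy does.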

Method Overview

• Base Network: ResNet38
• Pyramid Pooling
• ImageNet and Places2 pretraining
• Batch Size is critical
• Ensemble models trained with different epochs

Network Structure

[1] Wu Z, Shen C, Hengel A V D. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. arXiv 2016.
[2] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. CVPR 2017.
* Our implementation is based on: https://github.com/itijyou/ademxapp

Building Blocks

Res-MobileNet

| Model | Computation (MACC) |
|---|---|
| ResNet50 | 109.4G |
| Res-MobileNet | 32.5G |
| ResNet38 | 415.5G |
| VGG16 | 618.0G |

* The computation cost of models when input size is 512x512
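The slides do not describe the layers inside Res-MobileNet. As a rough sketch of why a MobileNet-style design can cut multiply-accumulates (MACCs), the code below compares a standard convolution with a depthwise separable one, assuming that Res-MobileNet relies on such layers. The layer shape is hypothetical and is not taken from the slides.

```python
def conv_macc(out_h, out_w, c_in, c_out, k):
    """Multiply-accumulates of a standard k x k convolution."""
    return out_h * out_w * c_in * c_out * k * k


def depthwise_separable_macc(out_h, out_w, c_in, c_out, k):
    """k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = out_h * out_w * c_in * k * k
    pointwise = out_h * out_w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 64 x 64 output map, 256 -> 256 channels, 3 x 3 kernel.
standard = conv_macc(64, 64, 256, 256, 3)
separable = depthwise_separable_macc(64, 64, 256, 256, 3)
ratio = standard / separable  # how many times cheaper the separable layer is
```

For this layer the separable variant needs roughly 8.7 times fewer MACCs. Whole-network savings depend on how many layers are replaced, so the figures in the table cannot be derived from this sketch.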

Model Performance

[1] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. CVPR 2017.
[2] Szegedy C, Ioffe S, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016.
[3] Wu Z, Shen C, Hengel A V D. Wider or Deeper: Revisiting the ResNet Model for Visual Recognition. arXiv 2016.

Pyramid Pooling

• Pyramid Pooling improves the integrity of segmentation

[Figure: image, ground truth, result without Pyramid Pooling, result with Pyramid Pooling]
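The pooling step can be sketched in plain Python, assuming the bin sizes (1, 2, 3, 6) used in the Pyramid Scene Parsing Network paper; the slides do not state which bins the team used. The function names are hypothetical, and the upsampling and concatenation that follow pooling in a real network are omitted.

```python
import math


def adaptive_avg_pool(feature, bins):
    """Average a 2D map (list of rows) into a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Row range covered by bin i (adaptive pooling boundaries).
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(feature, bin_sizes=(1, 2, 3, 6)):
    """Return one pooled grid per pyramid level, keyed by bin size."""
    return {bins: adaptive_avg_pool(feature, bins) for bins in bin_sizes}


# A 6 x 6 single-channel map whose value at (r, c) is r * 6 + c.
feature = [[r * 6 + c for c in range(6)] for r in range(6)]
levels = pyramid_pool(feature)
```

The 1 x 1 level summarizes the whole map, while finer levels keep coarse spatial layout. Feeding such context back to every pixel is one plausible reason why regions come out more complete.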

Pretraining

• ResNet50 without ImageNet pretraining has the lowest accuracy
• Places2 pretraining helps improve accuracy

Batch Size & Batch Norm

• Training batch size is critical
• Experiment with Res-MobileNet
• ResNet38 w/o PP, batch size = 6
• After adding PP, batch size = 2
• Usually use 4 GTX 1080 Ti GPUs

| Training batch size per GPU | Testing pixel accuracy |
|---|---|
| 1 | 68.4% |
| 2 | 69.7% |
| 4 | 70.7% |
| Finetune with fixed BN | 72.9% |
| Finetune ImageNet pretrained model with fixed BN | 74.1% |
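One plausible reading of these numbers is that batch normalization statistics become noisy when each GPU sees only a few samples, and that freezing them ("fixed BN") avoids the noise. The toy simulation below illustrates only that statistical effect. The activation distribution, the helper names, and the idea of one value per sample are all assumptions, not the team's setup.

```python
import random
import statistics


def batch_norm(values, mean, var, eps=1e-5):
    """Normalize values with the given mean and variance (no scale or shift)."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def batch_stats(values):
    """Mean and biased variance computed from the given values only."""
    return statistics.mean(values), statistics.pvariance(values)


random.seed(0)
# Hypothetical activations of one channel, drawn with mean 5 and std 2.
population = [random.gauss(5.0, 2.0) for _ in range(10000)]
# Stands in for the stored statistics that a fixed (frozen) BN layer would use.
fixed_mean, fixed_var = batch_stats(population)


def mean_estimate_spread(batch_size, trials=2000):
    """Standard deviation of the per-batch mean across many random batches."""
    means = [batch_stats(random.sample(population, batch_size))[0]
             for _ in range(trials)]
    return statistics.pstdev(means)


spread_small = mean_estimate_spread(2)   # noisy statistics from tiny batches
spread_large = mean_estimate_spread(32)  # much steadier statistics
normalized = batch_norm(population[:4], fixed_mean, fixed_var)
```

With a batch of 2 the estimated mean fluctuates far more than with a batch of 32, whereas normalizing with `fixed_mean` and `fixed_var` does not depend on the batch at all.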

Other Details

• Training augmentation
  • Multi-scale: [0.7, 1.3]
  • Flip
  • Random crop to 512x512
• Testing augmentation
  • Flip
  • No multi-scale
• SGD solver with lr = 1e-4 for 64 epochs
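The training augmentation listed above could be sampled as in the sketch below. The scale range and crop size come from the slide; the 50% flip probability, the padding of images smaller than the crop, and all names are assumptions made for illustration.

```python
import random


def sample_train_augmentation(height, width, rng, crop=512, scale_range=(0.7, 1.3)):
    """Sample one set of training augmentation parameters for an image."""
    scale = rng.uniform(*scale_range)
    new_h, new_w = round(height * scale), round(width * scale)
    flip = rng.random() < 0.5  # assumed flip probability
    # Assumption: a rescaled image smaller than the crop is padded up to it first.
    pad_h, pad_w = max(crop - new_h, 0), max(crop - new_w, 0)
    top = rng.randint(0, new_h + pad_h - crop)
    left = rng.randint(0, new_w + pad_w - crop)
    return {
        "scale": scale,
        "size": (new_h, new_w),
        "flip": flip,
        "pad": (pad_h, pad_w),
        "crop_box": (top, left, top + crop, left + crop),
    }


rng = random.Random(0)
params = sample_train_augmentation(600, 800, rng)
```

The returned parameters would be applied identically to the image and its label map, so that pixels and annotations stay aligned.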

Submissions

• Submit 1: train with only the ADE20K training set
  • We get 81.13% / 43.98% pixel accuracy / mIoU on the validation set
• Submit 2-4: finetune the model with both the training and validation sets for 5, 22, and 29 epochs, respectively
• Submit 5: ensemble the Submit 1-4 models by voting
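The slides say only that the ensemble works "by voting". A minimal sketch of one possible scheme, per-pixel majority voting over hard label maps, is shown below. The tie-breaking rule (earlier models win) and the toy predictions are assumptions.

```python
from collections import Counter


def vote(label_maps):
    """Per-pixel majority vote over equally sized label maps (lists of rows)."""
    height, width = len(label_maps[0]), len(label_maps[0][0])
    fused = []
    for r in range(height):
        row = []
        for c in range(width):
            votes = Counter(m[r][c] for m in label_maps)
            # Ties keep first-seen order, so labels from earlier models win.
            row.append(votes.most_common(1)[0][0])
        fused.append(row)
    return fused


# Hypothetical 1 x 3 predictions from four models.
predictions = [
    [[1, 2, 3]],
    [[1, 2, 0]],
    [[1, 0, 3]],
    [[0, 2, 0]],
]
fused = vote(predictions)
```

Here the first two pixels have a clear majority, and the third is a 2-2 tie that falls to the first model's label. Averaging class probabilities instead of voting on labels would be an alternative design that avoids ties.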

Summary

• Pretraining is critical, and datasets of similar tasks work better
• Batch size should be large enough
• Fixing BN params can further improve the result (when batch size is small)
• Pyramid Pooling can improve the region integrity of segmentation

Visual Results

[Figure: image, ground truth, result without Pyramid Pooling, result with Pyramid Pooling]

Future Work

• Memory-efficient deep learning framework
• Well-pretrained Res-MobileNet
• Focal loss
• Expert model

Thanks & Questions