SeRanet: Super resolution software through Deep Learning
https://github.com/corochann/SeRanet

SeRanet introduction

Feb 07, 2017

Kosuke Nakago
Transcript
Page 1: SeRanet introduction

SeRanet: Super resolution software through Deep Learning
https://github.com/corochann/SeRanet

Page 2: SeRanet introduction

Table of contents

Introduction
  Machine learning
  Deep learning

SRCNN
  Problem
  Introduction of previous works
    “Image Super-Resolution Using Deep Convolutional Networks”
    waifu2x

SeRanet
  Splice
  Fusion
  CNN model

Result
  Performance

Conclusion

Page 3: SeRanet introduction

Table of contents

Introduction
  Machine learning
  Deep learning

Page 4: SeRanet introduction

What is machine learning?
There are three major categories in machine learning:

・ Supervised learning
  Pairs of input data and “correct”/labeled output data are given during training.
  Goal: train the software to output the “correct”/labeled value for a given input.
  Ex. Image recognition (input: image data, output: recognition result (human, cat, car, etc.))
      Voice recognition (input: human voice data, output: the text the human speaks)

・ Unsupervised learning
  Only a lot of input data is given.
  Goal: categorize the data based on its statistics (find deviations existing in the data).
  Ex. Categorizing types of cancer
      Linking users with similar interests in a web application for recommendation

・ Reinforcement learning
  Problem setting: an agent chooses an “action” inside a given “environment”. The chosen action interferes with the environment, and the agent gets some “reward”.
  Goal: find the action that maximizes the reward the agent can gain.

  Ex. DeepMind DQN, AlphaGo
      Robot self-learning: how to control its own parts

Page 5: SeRanet introduction


SeRanet uses this type of machine learning: supervised learning.

Page 6: SeRanet introduction

Deep learning
“Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.” (quoted from Wikipedia, “Deep Learning”)


Page 7: SeRanet introduction

Table of contents

SRCNN
  Problem
  Introduction of previous works
    “Image Super-Resolution Using Deep Convolutional Networks”
    waifu2x

Page 8: SeRanet introduction

Super resolution task by machine learning
Problem definition
・ You are given a picture compressed to half size. Recover the original picture and output it.
Training phase:

[Figure: compressed picture (half size) → map → original picture]

The goal of this machine learning is to construct a map that converts the compressed picture into the original picture (as closely as possible).

Page 9: SeRanet introduction

Super resolution task by machine learning
After training
・ Input: arbitrary picture → Output: twice-size picture with super resolution

[Figure: picture to be enlarged → map obtained by machine learning → twice-size, high-quality picture]

Page 10: SeRanet introduction

Representation of the “map”
A deep convolutional neural network (CNN) is used.
- The current trend for image recognition tasks

Page 11: SeRanet introduction

Previous work ①“Image Super-Resolution Using Deep Convolutional Networks” Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang https://arxiv.org/abs/1501.00092

・ The original paper which proposed “SRCNN”.

It reports that superior results are obtained for super resolution using a convolutional neural network.

In these slides, this paper is denoted as the “SRCNN paper” in the following.
http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html

Page 12: SeRanet introduction

Algorithm summary
1. Read the picture/image file

2. Enlarge the picture to twice the size in advance as preprocessing (can be generalized to n-times size)

3. Convert the RGB format into YCbCr format, and extract the Y channel

4. Normalization: Convert value range from 0-255 to 0-1

5. Input the Y channel data into the CNN; as output, we obtain Y channel data with normalized values

6. Revert value range to 0-255

7. The Cb/Cr channels are enlarged by a conventional method such as bicubic. Compose the obtained Y channel and Cb/Cr channels to get the final result.

※ Steps 3 and 7 can be skipped when you construct a CNN with RGB-channel input/output.
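The preprocessing steps above can be sketched in NumPy. This is a minimal illustration, not the SRCNN paper's code: nearest-neighbor enlargement stands in for bicubic, the BT.601 luma formula gives the Y channel, and the function names are my own.

```python
import numpy as np

def upscale_nn(img, factor=2):
    """Step 2: enlarge the picture in advance (nearest neighbor as a
    simple stand-in for bicubic interpolation)."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def extract_y(rgb):
    """Steps 3-4: convert RGB to YCbCr (BT.601 luma) and keep only the
    Y channel, normalized to the 0-1 range."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma in 0-255
    return y / 255.0                        # normalize 0-255 -> 0-1
```

After the CNN produces the normalized Y output, step 6 multiplies it back by 255 and step 7 composes it with the conventionally enlarged Cb/Cr channels.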

Page 13: SeRanet introduction

Remark of algorithm ①
・ Deal with only the Y channel as the CNN's input/output data
- Humans are more sensitive to luminance (Y) than to color-difference chrominance (Cb/Cr).
- Dealing with only the Y channel reduces the complexity of training the CNN.

[Figure: YCbCr decomposition into Y, Cb, and Cr channels]

Page 14: SeRanet introduction


・ Is RGB-channel training difficult?
The SRCNN paper surveys the case where the CNN is trained on RGB 3-channel input/output data. The results suggest that a Y-channel CNN outperforms an RGB CNN when the CNN is not very big.

Page 15: SeRanet introduction

Remark of algorithm ②
・ Enlarge the picture/image data in advance, before input to the CNN.
The SRCNN paper uses the bicubic method; waifu2x (explained later) uses the nearest-neighbor method to enlarge the picture before inputting the data to the CNN.

 

[Reason] The input and output picture sizes are almost the same when you implement a convolutional neural network with a machine learning library. (Strictly, the output picture becomes smaller by kernel size - 1 per convolution layer.)
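The size reduction can be checked with a minimal "valid" (no-padding) convolution; this is an illustrative sketch, not library code:

```python
import numpy as np

def valid_conv2d(img, kernel):
    """2D 'valid' convolution: with no padding, each axis of the
    output shrinks by (kernel size - 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out
```

A 6×6 input convolved with a 3×3 kernel yields a 4×4 output, which is why the picture is enlarged (and, in practice, padded) before it enters the CNN.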

[Figure: pipeline. Enlarge the picture and extract the Y channel → CNN → compose. The Cb/Cr channels are only enlarged; the CNN is not used for them.]

Page 16: SeRanet introduction

Previous work ①   CNN model

CNN              Layer1    Layer2     Layer3
In channel       1         32         64
Out channel      32        64         1
Kernel size      9         5          5
# of parameter   2628      51264      1664
# of convolution 2592×4WH  51200×4WH  1600×4WH

Relatively shallow CNN architecture with big kernel size

SRCNN paper’s CNN model, one example (many other parameters are tested in the paper)

※ # of parameters = In channel × Out channel × (kernel size)² + Out channel
※ # of convolutions per pixel = In channel × Out channel × (kernel size)²

Suppose the picture size before enlargement is w × h pixels; the size after enlargement is then 2w × 2h, so we need convolutions for 4wh pixels in total.

total # of parameter : 55556 total # of convolution : 55392×4WH
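The per-pixel convolution counts follow directly from the formula; a quick check with the layer sizes from the table:

```python
def convs_per_pixel(in_ch, out_ch, k):
    """Multiplications per output pixel of one convolution layer."""
    return in_ch * out_ch * k ** 2

# SRCNN 9-5-5 model: (in channel, out channel, kernel size) per layer
layers = [(1, 32, 9), (32, 64, 5), (64, 1, 5)]
counts = [convs_per_pixel(i, o, k) for i, o, k in layers]
total = sum(counts)   # per pixel; multiply by 4wh for a 2w x 2h output
```

The per-layer counts come out to 2592, 51200, and 1600, and the total to 55392, matching the 2592×4WH, 51200×4WH, 1600×4WH, and 55392×4WH figures in the table.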

Page 17: SeRanet introduction

Previous work ②  waifu2x
https://github.com/nagadomi/waifu2x

The term “waifu” comes from the Japanese pronunciation of “wife” (Japanese anime fans use “waifu” for their favorite female anime character).


Open source software, originally published to enlarge anime/art-style images. It now also supports a photo style.

You can try the application on the demo server: http://waifu2x.udp.jp/

Page 18: SeRanet introduction

Previous work ②  waifu2x
waifu2x is open source software, which enables other software engineers to develop related software.

Much derivative software has been published by now.

[Related links (in Japanese)]
・ waifu2x and a list of its derivative software
  http://kourindrug.sakura.ne.jp/waifu2x.html
・ The waifu2x algorithm explained in simple terms
  https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view

Page 19: SeRanet introduction

Previous work ②  waifu2x CNN model

By keeping the convolution kernel size small (3), a deeper neural network is constructed.

CNN              Layer1   Layer2    Layer3     Layer4     Layer5     Layer6      Layer7
In channel       1        32        32         64         64         128         128
Out channel      32       32        64         64         128        128         1
Kernel size      3        3         3          3          3          3           3
# of parameter   320      9248      18496      36928      73856      147584      1153
# of convolution 288×4WH  9216×4WH  18432×4WH  36864×4WH  73728×4WH  147456×4WH  1152×4WH

Deep CNN architecture with small kernel size

total # of parameter : 287585 total # of convolution : 287136×4WH
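The waifu2x totals can be reproduced with the parameter formula from the SRCNN slide (in × out × (kernel size)² + out, the "+ out" term being the biases):

```python
def n_params(in_ch, out_ch, k):
    """Weights plus biases of one convolution layer."""
    return in_ch * out_ch * k ** 2 + out_ch

# waifu2x 7-layer model, all kernels 3x3; channel sequence from the table
channels = [1, 32, 32, 64, 64, 128, 128, 1]
params = [n_params(i, o, 3) for i, o in zip(channels, channels[1:])]
total = sum(params)
```

The per-layer values match the table (320, 9248, 18496, 36928, 73856, 147584, 1153), and the total comes out to 287585 as stated above.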

Page 20: SeRanet introduction

What is special about the SRCNN task
The big differences from the image recognition task:

1. Position sensitivity is required
+ Image recognition task: translation-invariant behavior is welcome, and max pooling or stride techniques are often utilized.
+ SRCNN task: translation-variant behavior is necessary, since super resolution requires position-dependent output.

2. The feature map's image size does not shrink during the CNN processing.
→ As the number of feature maps increases, the amount of calculation increases, so speed/memory restrictions are severe.

Required memory for the CNN ≒ the volume of the rectangles in the CNN model figure.
For an image recognition task, the feature map usually becomes smaller in deeper layers of the CNN, so the number of feature maps can be larger when the image size is smaller.

Page 21: SeRanet introduction

Table of contents

The explanation of the SeRanet project starts from here, with an introduction of the ideas behind SeRanet.

SeRanet
  Idea 1: Splice
  Idea 2: Fusion
  SeRanet CNN model

Page 22: SeRanet introduction

SeRanet  Idea 1: Splice
The input is the pre-scaled picture of size w × h, and the output is a picture of size 2w × 2h.
→ Introduce the “Split” and “Splice” concepts.

[Figure: input (size w × h) → Split → 4 branches (LU, RU, LD, RD) → Splice → output (size 2w × 2h)]

Split: 4 branches of neural network (NN) with size w × h are created.
Splice: the 4 branches of the neural network are merged to obtain one 2w × 2h image.

Page 23: SeRanet introduction

SeRanet  Idea 1: Splice
After the split, the 4 branches of the neural network correspond to the Left-Up (LU), Right-Up (RU), Left-Down (LD), and Right-Down (RD) pixels of the enlarged picture.

[Figure: Split/Splice pixel mapping. Each input pixel position (1,1), (1,2), (2,1), (2,2) is processed by the LU, RU, LD, and RD branches, whose outputs are interleaved into the output image.]

At the splice phase, the 4 branches of the CNN are combined/spliced to get the twice-size image.
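The splice operation amounts to interleaving the four branch outputs into one twice-size image (similar to what is now called pixel shuffle / depth-to-space). A minimal NumPy sketch of the idea, with illustrative names:

```python
import numpy as np

def splice(lu, ru, ld, rd):
    """Interleave four (h, w) branch outputs into one (2h, 2w) image:
    LU fills even rows/even columns, RU even rows/odd columns,
    LD odd rows/even columns, RD odd rows/odd columns."""
    h, w = lu.shape
    out = np.empty((2 * h, 2 * w), dtype=lu.dtype)
    out[0::2, 0::2] = lu   # left-up pixels
    out[0::2, 1::2] = ru   # right-up pixels
    out[1::2, 0::2] = ld   # left-down pixels
    out[1::2, 1::2] = rd   # right-down pixels
    return out
```

For example, splicing the 1×1 branch outputs 1, 2, 3, 4 yields the 2×2 block [[1, 2], [3, 4]].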

Page 24: SeRanet introduction

The effect of introducing Splice → flexibility in neural network modelling.

[Figure: input (w × h) → 1st phase → Split → 2nd phase → Splice → 3rd phase → output (2w × 2h)]

1st Phase: the image size is w × h (before enlargement), so the amount of calculation is 1/4 of the 3rd phase.
→ Larger feature maps and kernel sizes are affordable in this phase.

2nd Phase: 4 CNN branches with image size w × h. The total amount of calculation is the same as the 3rd phase, but the parameters learned in each branch (LU, RU, LD, RD) can differ.
→ The model's representation potential grows. Another advantage is that memory consumption is smaller than the 3rd phase, due to the image size.

3rd Phase: image size 2w × 2h; the last phase, which produces the output. Memory consumption and the amount of calculation are 4 times larger than the 1st phase.

Page 25: SeRanet introduction

Table of contents

SeRanet
  Idea 1: Splice
  Idea 2: Fusion
  SeRanet CNN model

Page 26: SeRanet introduction

Fusion: a method introduced in the Colorization paper.

・ “Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification” Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/

The research aims to convert a monochrome input image into a colorized output image through supervised CNN learning.

https://github.com/satoshiiizuka/siggraph2016_colorization


Page 27: SeRanet introduction

Neural network used in Colorization paper

Upper CNN: the main CNN used for colorization
Lower CNN: a CNN trained for image classification

So a CNN trained for a different purpose is utilized to help improve the performance of the main CNN.

The lower CNN is expected to learn global features, e.g. “the image was taken outside”. The research reports that colorization accuracy increased by “fusing” the global features into the main CNN.
Examples of how global features help colorization (read the paper for details):
- It reduces mistakes such as using sky color at the top of the image when the picture was taken inside.
- It reduces mistakes such as using brown ground color when the picture was taken on the sea.

http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/ 

Page 28: SeRanet introduction

SeRanet  Idea 2: Fusion
SeRanet combines/fuses 2 types of CNN in the 1st phase.

Purpose: combining different types of non-linear activation to get a wider variety of model representations.

※ In future development, I want to use a Convolutional Restricted Boltzmann Machine with pre-training for the lower CNN.

The upper CNN uses Leaky ReLU for activation; the lower CNN uses sigmoid for activation.
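A schematic of the fusion step, assuming (as in the Colorization paper) that the two branches' feature maps are concatenated along the channel axis. The activation choices follow the slide; the Leaky ReLU slope value and the function names are my own assumptions:

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    """Activation of the upper CNN branch (slope value is an assumption)."""
    return np.where(x >= 0, x, slope * x)

def sigmoid(x):
    """Activation of the lower CNN branch."""
    return 1.0 / (1.0 + np.exp(-x))

def fuse(upper_feat, lower_feat):
    """Concatenate the two activated feature maps along the channel axis
    (shape convention: (channels, h, w))."""
    return np.concatenate([leaky_relu(upper_feat), sigmoid(lower_feat)], axis=0)
```

Because the fused tensor simply stacks channels, the next convolution layer sees both kinds of non-linearity at once, which is the flexibility the slide refers to.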

Page 29: SeRanet introduction

SeRanet CNN model
CNN model of seranet_v1

[Figure: seranet_v1 model. 1st phase (Fusion) → Split → 2nd phase → Splice → 3rd phase]

CNN              Layer1   Layer2     Layer3     Layer4     Layer5     Layer6     Layer7      Layer8      Layer9      Layer10
In channel       3        64         64         256        512        256        128         128         128         128
Out channel      64       64         128        512        256        128        128         128         128         3
Kernel size      5        5          5          1          1          3          3           3           3           3
# of parameter   4864     102464     204928     131584     131584     295040     147584      147584      147584      3584
# of convolution 4800×WH  102400×WH  204800×WH  131072×WH  131072×WH  294912×WH  147456×4WH  147456×4WH  147456×4WH  3456×4WH

Fusion: layers 1-3 are computed in the 2 fused branches (× 2); layers 4-6 are computed in the 4 split branches (× 4); layers 7-10 form the 3rd phase, after the splice.

total # of parameters: 3303680  total # of convolutions: 1159150×4WH
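Reading the × 2 / × 4 annotations as "layers 1-3 run in the 2 fused branches and layers 4-6 in the 4 split branches" makes the parameter total come out exactly. The interpretation is mine, but the arithmetic matches the slide's total:

```python
# Per-layer parameter counts from the seranet_v1 table
phase1 = [4864, 102464, 204928]           # layers 1-3, x2 fusion branches
phase2 = [131584, 131584, 295040]         # layers 4-6, x4 split branches
phase3 = [147584, 147584, 147584, 3584]   # layers 7-10, after the splice
total = 2 * sum(phase1) + 4 * sum(phase2) + sum(phase3)
```

This evaluates to 3303680, the total stated on the slide.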

Page 30: SeRanet introduction

Comparison

Parameters: about 10 times more than waifu2x. Convolutions: about 4 times more.

The number of parameters increases more than the number of convolutions (calculations) does. This is because SeRanet has position-dependent parameters (LU, RU, LD, RD).

→ Question: does the increase in parameters and calculation result in better performance?

Model              SRCNN paper  waifu2x     SeRanet_v1
Total parameters   55556        287585      3303680
Total convolutions 55392×4WH    287136×4WH  1159150×4WH

Page 31: SeRanet introduction

Table of contents

Result
  Performance
  Comparison between various resize methods

・ Bicubic, Lanczos: conventional resize methods
・ waifu2x, SeRanet: resize through CNN

- Move forward/backward between slides to compare.
- It may be difficult to see the difference on SlideShare; see the link for a comparison: https://github.com/corochann/SeRanet

Page 32: SeRanet introduction

Result
Input picture

Page 33: SeRanet introduction

Result
Bicubic (OpenCV's resize method is used)

Page 34: SeRanet introduction

Result
Lanczos (OpenCV's resize method is used)

Page 35: SeRanet introduction

Result
waifu2x (http://waifu2x.udp.jp/, Style: photo, Noise reduction: None, Upscaling: 2x)

Page 36: SeRanet introduction

Result
SeRanet

Page 37: SeRanet introduction

Result
Original data (ground truth, for reference)

Page 38: SeRanet introduction

The difference can be found in the fine details of the 1st picture and the thin stalk in the 4th picture (high-frequency components).

Result
Original data (ground truth, for reference)

Page 39: SeRanet introduction

Result
Performance comparison between various resize methods
*Based on personal impression (comparison by a specific measurement is not done yet)

[Figure: Bicubic and Lanczos (conventional resize methods) look almost the same as each other; waifu2x and SeRanet (resize through CNN) look almost the same as each other; the two groups differ from each other and from the original image.]

Page 40: SeRanet introduction

Summary SeRanet

・ A big CNN is used (depth: 9 layers, total parameters: 3303680)
・ RGB 3 channels are used for the CNN's input/output instead of only the Y channel
・ Split/Splice CNN: the Left-Up, Right-Up, Left-Down, and Right-Down branches use different parameters
・ Fusion: different non-linearities are combined for flexibility of model representation
・ Convolutional RBM pre-training

The performance is not mature yet; it can likely be improved further to bring the output closer to the original image.

Page 41: SeRanet introduction

At last:
+ The project is open source, on GitHub: https://github.com/corochann/SeRanet

+ Improvement ideas and discussion are welcome.

+ My Blog: http://corochann.com/

* If there is any inappropriate citation, please let me know.

* SeRanet is a personal project and I may have misunderstandings; please let me know if there is any wrong information.