SeRanet: Super-resolution software through Deep Learning
https://github.com/corochann/SeRanet
Table of contents
・Introduction: Machine learning, Deep learning
・SRCNN: Problem, Introduction of previous works ("Image Super-Resolution Using Deep Convolutional Networks", waifu2x)
・SeRanet: Splice, Fusion, CNN model, Result, Performance
・Conclusion
Table of contents
Introduction: Machine learning, Deep learning
What is machine learning?
There are three major categories in machine learning.
・Supervised learning: a pile of input data together with "correct"/labeled output data is given during training. Goal: train software to output the "correct"/labeled value for given input data. Ex.: image recognition (input: image data, output: recognition result (human, cat, car, etc.)); voice recognition (input: human voice data, output: the text the speaker said).
・Unsupervised learning: only a lot of input data is given. Goal: categorize the data based on its statistics (find deviations that exist in the data). Ex.: categorizing types of cancer; linking users with similar interests in a web application for recommendation.
・Reinforcement learning: the problem setting is that an agent chooses an "action" inside a given "environment". The action affects the environment, and the agent receives some "reward". Goal: find the actions that maximize the reward the agent can gain. Ex.: DeepMind DQN, AlphaGo; a robot learning by itself how to control its own parts.
SeRanet uses supervised learning.
Deep learning
"Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations." (cited from Wikipedia, "Deep Learning")
Table of contents
SRCNN: Problem, Introduction of previous works ("Image Super-Resolution Using Deep Convolutional Networks", waifu2x)
Super resolution task by machine learning
Problem definition
・You are given a picture compressed to half size. Recover the original picture and output it.
Training phase: the goal of the machine learning is to construct a map that converts the compressed picture into the original picture (as closely as possible).
[Figure: original picture ↔ compressed picture (half size), connected by the learned map]
Super resolution task by machine learning
After training
・Input: arbitrary picture → Output: double-size picture with super resolution
[Figure: picture to be enlarged → map obtained by machine learning → double-size, high-quality picture]
Representation of the "map"
A deep Convolutional Neural Network (CNN) is used - the current trend for image recognition tasks.
Previous work ① "Image Super-Resolution Using Deep Convolutional Networks"
Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang
https://arxiv.org/abs/1501.00092
・The original paper which proposed "SRCNN".
It reports that superior results are obtained for super resolution using a Convolutional Neural Network.
In the following slides, this work will be denoted the "SRCNN paper".
http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
Algorithm summary
1. Read the picture/image file.
2. Enlarge the picture to twice its size in advance as preprocessing (this generalizes to n-times size).
3. Convert the RGB format into YCbCr format, and extract the Y channel.
4. Normalization: convert the value range from 0-255 to 0-1.
5. Input the Y channel data into the CNN; as output, we obtain Y channel data with normalized values.
6. Revert the value range to 0-255.
7. Enlarge the CbCr channels by a conventional method (e.g. Bicubic). Compose the obtained Y channel and the CbCr channels to get the final result.
※ Steps 3 and 7 can be skipped when you construct a CNN with RGB channel input/output.
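The steps above can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the actual SRCNN/waifu2x code: the trained CNN is replaced by an identity placeholder, nearest-neighbor enlargement is used (as waifu2x does), and the BT.601 luma weights are an assumption about the exact YCbCr variant.

```python
import numpy as np

def enlarge_nearest_2x(img):
    # Step 2: enlarge the picture to twice its size (nearest neighbor).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def extract_y(rgb):
    # Step 3: extract the Y (luminance) channel from RGB (BT.601 weights).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def super_resolve(rgb_uint8, cnn=lambda y: y):
    # rgb_uint8: (H, W, 3) uint8 image; cnn: placeholder for the trained model.
    big = enlarge_nearest_2x(rgb_uint8)        # step 2
    y = extract_y(big.astype(np.float64))      # step 3
    y_norm = y / 255.0                         # step 4: 0-255 -> 0-1
    y_out = cnn(y_norm)                        # step 5
    return np.clip(y_out * 255.0, 0.0, 255.0)  # step 6: back to 0-255

img = np.full((4, 4, 3), 255, dtype=np.uint8)  # all-white test image
y_big = super_resolve(img)
```

Step 7 (enlarging CbCr conventionally and recomposing) is omitted here; in practice it is a plain Bicubic resize followed by a YCbCr-to-RGB conversion.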
Remark on algorithm ①
・Deal with only the Y channel as input/output data of the CNN.
- Humans are more sensitive to luminance (Y) than to chrominance (CbCr).
- Dealing only with the Y channel reduces the complexity of learning the CNN.
[Figure: YCbCr decomposition into Y, Cb and Cr channels]
・Is RGB channel training difficult? The SRCNN paper surveys the case where the CNN is trained on RGB 3-channel input/output data. The results suggest that the Y-channel CNN outperforms the RGB CNN when the CNN is not very large.
Remark on algorithm ②
・Enlarge the picture/image data before inputting it to the CNN. The SRCNN paper uses the Bicubic method, and waifu2x (explained later) uses the Nearest-neighbor method, to enlarge the picture before feeding the data to the CNN.
[Reason] The input and output picture sizes are almost the same when you implement a convolutional neural network with a machine learning library. (Rigorously, the output picture is smaller by (kernel size - 1) pixels.)
[Figure: enlarge picture and extract Y channel → CNN → compose; the Cr/Cb channels are enlarged conventionally, without the CNN]
Previous work ① CNN model
SRCNN paper's CNN model, one example (many other parameter settings are tested in the paper):

CNN                Layer1     Layer2     Layer3
In channel         1          32         64
Out channel        32         64         1
Kernel size        9          5          5
# of parameters    2628       51264      1664
# of convolutions  2592×4WH   51200×4WH  1600×4WH

Relatively shallow CNN architecture with a big kernel size.
total # of parameters: 55556, total # of convolutions: 55392×4WH

※ # of parameters = In channel × Out channel × (kernel size)² + Out channel
※ # of convolutions per pixel = In channel × Out channel × (kernel size)²
※ Suppose the picture size before enlargement is w×h pixels; then the picture size after enlargement is 2w×2h, so we need convolutions for 4wh pixels in total.
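The counting formulas above can be checked with a couple of one-line Python helpers (a sketch; the layer sizes follow the table above):

```python
def n_params(c_in, c_out, k):
    # weights (c_in * c_out * k * k) plus one bias per output channel
    return c_in * c_out * k * k + c_out

def n_conv_per_pixel(c_in, c_out, k):
    # multiply-accumulates needed to produce all output channels at one pixel
    return c_in * c_out * k * k

# Layer 2 of the SRCNN paper's example model: 32 -> 64 channels, 5x5 kernel
print(n_params(32, 64, 5), n_conv_per_pixel(32, 64, 5))
```

Running this for Layer 2 reproduces the 51264 parameters and 51200 convolutions per pixel in the table, and `n_conv_per_pixel(1, 32, 9)` gives Layer 1's 2592.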
Previous work ② waifu2x
waifu2x https://github.com/nagadomi/waifu2x
The term "waifu" comes from the Japanese pronunciation of "wife" (in Japan, the term "wife" is used for one's favorite female anime character).
Open source software, originally published to enlarge art-style images. It now also supports a photo style.
You can test the application on a server: http://waifu2x.udp.jp/
Previous work ② waifu2x
waifu2x is open source software, which enables other software engineers to develop related software. Many derivative programs have now been published.
[Related links (in Japanese)]
・waifu2x and a list of its derivative software: http://kourindrug.sakura.ne.jp/waifu2x.html
・The waifu2x algorithm, explained simply: https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view
Previous work ② waifu2x CNN model
By taking a small convolution kernel size of 3, a deeper neural network is constructed.

CNN                Layer1   Layer2    Layer3     Layer4     Layer5     Layer6      Layer7
In channel         1        32        32         64         64         128         128
Out channel        32       32        64         64         128        128         1
Kernel size        3        3         3          3          3          3           3
# of parameters    320      9248      18496      36928      73856      147584      1153
# of convolutions  288×4WH  9216×4WH  18432×4WH  36864×4WH  73728×4WH  147456×4WH  1152×4WH

Deep CNN architecture with a small kernel size.
total # of parameters: 287585, total # of convolutions: 287136×4WH
What is special about the SRCNN task
The big differences from the image recognition task:
1. Position sensitivity is required.
+ Image recognition task: translation-invariant behavior is welcome, and max pooling or strides are often utilized.
+ SRCNN task: translation-variant behavior is necessary, since super resolution requires position-dependent output.
2. The feature map size does not shrink during the CNN processing.
→ As the number of feature maps increases, the amount of calculation increases; the speed/memory restrictions are severe.
The required memory for a CNN is roughly the volume of the boxes in the CNN model figure. For an image recognition task, the feature maps usually become smaller in the deeper layers of the CNN, so the number of feature maps can be larger when the map size is smaller.
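A back-of-the-envelope estimate makes the contrast concrete (an assumption: float32 values, no extra buffers): one layer's feature maps cost channels × width × height × 4 bytes, and a super-resolution CNN keeps them at full output resolution.

```python
def feature_map_bytes(n_maps, width, height, bytes_per_value=4):
    # float32 feature maps: channels * W * H * 4 bytes
    return n_maps * width * height * bytes_per_value

# 128 feature maps at a full 1024x1024 output resolution (super resolution)
# vs. 128 maps at the 32x32 resolution typical deep inside a classifier
sr_bytes = feature_map_bytes(128, 1024, 1024)
cls_bytes = feature_map_bytes(128, 32, 32)
print(sr_bytes // (1024 * 1024), "MiB vs", cls_bytes // 1024, "KiB")
```

The full-resolution case needs 512 MiB for a single layer's activations, a thousand times the 512 KiB of the downsampled case, which is why the slide calls the speed/memory restriction severe.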
Table of contents
SeRanet: Idea 1 Splice, Idea 2 Fusion, SeRanet CNN model

The explanation of the SeRanet project starts here: an introduction to the ideas behind SeRanet.
SeRanet Idea 1: Splice
Input: the pre-scaled w×h picture. Output: a 2w×2h picture.
→ Introduce the "Split" and "Splice" concepts.
Split: 4 branches of neural network (NN), each of size w×h, are created.
Splice: the 4 branches of the neural network are merged to obtain one 2w×2h image.
[Figure: input (w×h) → Split → 4 branches (LU, RU, LD, RD) → Splice → output (2w×2h)]
SeRanet Idea 1: Splice
After the split, the 4 branches of the neural network correspond to the Left-Up (LU), Right-Up (RU), Left-Down (LD) and Right-Down (RD) pixels of the enlarged picture. At the splice phase, the 4 branches of the CNN are combined/spliced to obtain the double-size image.
[Figure: Split → Splice; each input pixel (i, j) maps to a 2×2 output block whose LU, RU, LD and RD pixels come from the corresponding branches]
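The splice step can be sketched with NumPy strided slicing. This is a minimal single-channel sketch of the interleaving idea only; the actual SeRanet code in the repository operates on batched multi-channel tensors.

```python
import numpy as np

def splice(lu, ru, ld, rd):
    # Interleave four w x h branch outputs into one 2w x 2h image:
    # each input position (i, j) becomes a 2x2 block in the output.
    h, w = lu.shape
    out = np.empty((2 * h, 2 * w), dtype=lu.dtype)
    out[0::2, 0::2] = lu   # left-up pixel of each 2x2 block
    out[0::2, 1::2] = ru   # right-up
    out[1::2, 0::2] = ld   # left-down
    out[1::2, 1::2] = rd   # right-down
    return out

# four constant 2x2 "branch outputs", for illustration only
lu = np.full((2, 2), 0); ru = np.full((2, 2), 1)
ld = np.full((2, 2), 2); rd = np.full((2, 2), 3)
y = splice(lu, ru, ld, rd)
print(y)  # every 2x2 block is [[0, 1], [2, 3]]
```

This is the same rearrangement that is nowadays called depth-to-space or pixel shuffle in deep learning frameworks.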
The effect of introducing Splice → flexibility of neural network modelling.
[Figure: Input w×h → 1st Phase → Split → 2nd Phase → Splice → 3rd Phase → Output 2w×2h]
1st Phase: the image size is w×h, before enlargement, so the amount of calculation is 1/4 of the 3rd phase. → Larger feature map counts and kernel sizes are affordable in this phase.
2nd Phase: 4 CNN branches with image size w×h. The total calculation amount is the same as the 3rd phase, but the parameters learned in each branch (LU, RU, LD, RD) can differ → the model's representation power grows. Another advantage is that memory consumption is smaller than in the 3rd phase, due to the image size.
3rd Phase: image size 2w×2h. The last phase, producing the output; memory consumption and the amount of calculation are 4 times larger than in the 1st phase.
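The per-phase cost claims above follow from simple pixel counting, sketched here (the uniform per-pixel cost is an illustrative assumption; real layers differ in cost):

```python
def phase_cost(n_branches, width, height, per_pixel_cost=1):
    # Total convolution work = (branches) x (pixels) x (cost per pixel).
    return n_branches * width * height * per_pixel_cost

w, h = 100, 100
p1 = phase_cost(1, w, h)          # 1st phase: one branch at w x h
p2 = phase_cost(4, w, h)          # 2nd phase: four branches at w x h
p3 = phase_cost(1, 2 * w, 2 * h)  # 3rd phase: one pass at 2w x 2h
print(p3 // p1, p3 // p2)         # 3rd phase vs 1st and 2nd
```

So the 1st phase costs 1/4 of the 3rd, and the 2nd phase matches the 3rd in total work while holding 4 smaller feature maps instead of one large one.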
Table of contents
SeRanet: Idea 1 Splice, Idea 2 Fusion, SeRanet CNN model
Fusion: a method introduced in the Colorization paper
・"Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
https://github.com/satoshiiizuka/siggraph2016_colorization
The research aims to convert a monochrome input image into a colorful output image through CNN supervised learning.
[Figure: monochrome input → colorized output]
Neural network used in the Colorization paper
Upper CNN: the main CNN used for colorization. Lower CNN: a CNN trained for image classification.
So a CNN with a different purpose is utilized to help improve the performance of the main CNN.
The lower CNN is expected to learn global features, e.g. "this image was taken outdoors". The research reports that colorization accuracy increased by "fusing" the global features into the main CNN.
Examples of how global features help colorization (read the paper for details):
- They reduce mistaken use of sky color at the top of the image when the picture was taken indoors.
- They reduce mistaken use of brown ground color when the picture was taken on the sea.
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
SeRanet Idea 2: Fusion
SeRanet combines/fuses 2 types of CNN at the 1st phase.
Purpose: combining different types of non-linear activation to get a wider variety of model representations.
The upper CNN uses Leaky ReLU for activation; the lower CNN uses Sigmoid for activation.
※ I want to use a Convolutional Restricted Boltzmann Machine with pre-training for the lower CNN in future development.
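A minimal NumPy sketch of the fusion idea (an assumption: the two branches are merged by channel-wise concatenation, in the spirit of the Colorization paper's fusion layer; the actual merge in the SeRanet repository may differ, and the Leaky ReLU slope is an illustrative choice):

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    # Upper-branch activation: identity for x >= 0, scaled for x < 0.
    return np.where(x >= 0, x, slope * x)

def sigmoid(x):
    # Lower-branch activation, squashing values into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def fuse(upper, lower):
    # Channel-wise concatenation of the two differently-activated branches.
    return np.concatenate([leaky_relu(upper), sigmoid(lower)], axis=0)

upper = np.ones((2, 3, 3))    # (channels, H, W) feature maps
lower = np.zeros((2, 3, 3))
fused = fuse(upper, lower)
print(fused.shape)            # channels from both branches are stacked
```

Subsequent convolution layers then see both non-linearities at once, which is the source of the "wider variety of model representations" mentioned above.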
SeRanet CNN model: the CNN model of seranet_v1
[Figure: 1st phase (with Fusion, ×2 channel merge) → Split → 2nd phase (×4 branches) → Splice → 3rd phase]

CNN                Layer1   Layer2     Layer3     Layer4     Layer5     Layer6
In channel         3        64         64         256        512        256
Out channel        64       64         128        512        256        128
Kernel size        5        5          5          1          1          3
# of parameters    4864     102464     204928     131584     131584     295040
# of convolutions  4800×WH  102400×WH  204800×WH  131072×WH  131072×WH  294912×WH

CNN                Layer7      Layer8      Layer9      Layer10
In channel         128         128         128         128
Out channel        128         128         128         3
Kernel size        3           3           3           3
# of parameters    147584      147584      147584      3584
# of convolutions  147456×4WH  147456×4WH  147456×4WH  3456×4WH

total # of parameters: 3303680, total # of convolutions: 1159150×4WH
Comparison
Parameters: about 10 times more than waifu2x. Convolutions: about 4 times more.
The number of parameters increases more than the number of convolutions (calculation) because SeRanet has position-dependent parameters (LU, RU, LD, RD).
→ Question: does the increase in parameters and calculation result in better performance?

Model               SRCNN paper  waifu2x     SeRanet_v1
Total parameters    55556        287585      3303680
Total convolutions  55392×4WH    287136×4WH  1159150×4WH
Table of contents
Result, Performance: comparison between various resize methods
・Bicubic, Lanczos (conventional resize methods)
・waifu2x, SeRanet (resize through CNN)
- Flip back and forth between the following slides for comparison. On Slideshare it may be difficult to see the difference; see the link for a side-by-side comparison: https://github.com/corochann/SeRanet
Result
- Input picture
- Bicubic (the OpenCV resize method is used)
- Lanczos (the OpenCV resize method is used)
- waifu2x (http://waifu2x.udp.jp/, Style: photo, Noise reduction: none, Upscaling: 2x)
- SeRanet
- Original data (ground truth, for reference)
The differences can be found in the detailed parts of the 1st picture and the thin stalk in the 4th picture (high spatial-frequency regions).
Performance: comparison between various resize methods
*Based on personal impressions (comparison by a quantitative measurement has not been done yet)
・Bicubic and Lanczos (conventional resize methods): almost the same as each other
・waifu2x and SeRanet (resize through CNN): almost the same as each other, but clearly different from the conventional methods, and still different from the original image
Summary: SeRanet
・A big CNN is used (9 layers deep, 3303680 total parameters)
・RGB 3 channels are used for the CNN input/output instead of only the Y channel
・Split/Splice: the Left-Up, Right-Up, Left-Down and Right-Down CNN branches use different parameters
・Fusion: different non-linearities are combined for flexibility of model representation
・Convolutional RBM pre-training
The performance is not mature yet; there is room to improve so the output gets closer to the original image.
At last...
+ The project is open source, on GitHub: https://github.com/corochann/SeRanet
+ Improvement ideas and discussion are welcome.
+ My blog: http://corochann.com/
* If there is any inappropriate citation, please let me know.
* SeRanet is a personal project and I may have misunderstandings; please let me know if there is wrong information.