SeRanet: Super-resolution software through Deep Learning
https://github.com/corochann/SeRanet
Table of contents
・Introduction: Machine learning, Deep learning
・SRCNN: Problem, Introduction of previous works ("Image Super-Resolution Using Deep Convolutional Networks", waifu2x)
・SeRanet: Splice, Fusion, CNN model, Result, Performance
・Conclusion
Table of contents
Introduction: Machine learning, Deep learning
What is machine learning?
There are three major categories in machine learning.
・Supervised learning: a pile of input data together with "correct"/labeled output data is given during training. Goal: train software to output the "correct"/labeled value for given input data. Ex.: image recognition (input: image data, output: recognition result (human, cat, car, etc.)); voice recognition (input: human voice data, output: the text the speaker said).
・Unsupervised learning: only a lot of input data is given. Goal: categorize the data based on its statistics (find deviations that exist in the data). Ex.: categorizing types of cancer; linking users with similar interests in a web application for recommendation.
・Reinforcement learning: the problem setting is that an agent chooses an "action" inside a given "environment". The action affects the environment, and the agent receives some "reward". Goal: find the actions that maximize the reward the agent can gain. Ex.: DeepMind DQN, AlphaGo; a robot learning by itself how to control its own parts.
SeRanet uses supervised learning.
Deep learning
"Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations." (cited from Wikipedia, "Deep Learning")
Table of contents
SRCNN: Problem, Introduction of previous works ("Image Super-Resolution Using Deep Convolutional Networks", waifu2x)
Super resolution task by machine learning
Problem definition
・You are given a picture compressed to half size. Recover the original picture and output it.
Training phase: the goal of the machine learning is to construct a map that converts the compressed picture into the original picture (as closely as possible).
[Figure: original picture ↔ compressed picture (half size), connected by the learned map]
Super resolution task by machine learning
After training
・Input: arbitrary picture → Output: double-size picture with super resolution
[Figure: picture to be enlarged → map obtained by machine learning → double-size, high-quality picture]
Representation of the "map"
A deep Convolutional Neural Network (CNN) is used - the current trend for image recognition tasks.
Previous work ① "Image Super-Resolution Using Deep Convolutional Networks"
Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang
https://arxiv.org/abs/1501.00092
・The original paper which proposed "SRCNN".
It reports that superior results are obtained for super resolution using a Convolutional Neural Network.
In the following slides, this work will be denoted the "SRCNN paper".
http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
Algorithm summary
1. Read the picture/image file.
2. Enlarge the picture to twice its size in advance as preprocessing (this generalizes to n-times size).
3. Convert the RGB format into YCbCr format, and extract the Y channel.
4. Normalization: convert the value range from 0-255 to 0-1.
5. Input the Y channel data into the CNN; as output, we obtain Y channel data with normalized values.
6. Revert the value range to 0-255.
7. Enlarge the CbCr channels by a conventional method (e.g. Bicubic). Compose the obtained Y channel and the CbCr channels to get the final result.
※ Steps 3 and 7 can be skipped when you construct a CNN with RGB channel input/output.
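The steps above can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the actual SRCNN/waifu2x code: the trained CNN is replaced by an identity placeholder, nearest-neighbor enlargement is used (as waifu2x does), and the BT.601 luma weights are an assumption about the exact YCbCr variant.

```python
import numpy as np

def enlarge_nearest_2x(img):
    # Step 2: enlarge the picture to twice its size (nearest neighbor).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def extract_y(rgb):
    # Step 3: extract the Y (luminance) channel from RGB (BT.601 weights).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def super_resolve(rgb_uint8, cnn=lambda y: y):
    # rgb_uint8: (H, W, 3) uint8 image; cnn: placeholder for the trained model.
    big = enlarge_nearest_2x(rgb_uint8)        # step 2
    y = extract_y(big.astype(np.float64))      # step 3
    y_norm = y / 255.0                         # step 4: 0-255 -> 0-1
    y_out = cnn(y_norm)                        # step 5
    return np.clip(y_out * 255.0, 0.0, 255.0)  # step 6: back to 0-255

img = np.full((4, 4, 3), 255, dtype=np.uint8)  # all-white test image
y_big = super_resolve(img)
```

Step 7 (enlarging CbCr conventionally and recomposing) is omitted here; in practice it is a plain Bicubic resize followed by a YCbCr-to-RGB conversion.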
Remark on algorithm ①
・Deal with only the Y channel as input/output data of the CNN.
- Humans are more sensitive to luminance (Y) than to chrominance (CbCr).
- Dealing only with the Y channel reduces the complexity of learning the CNN.
[Figure: YCbCr decomposition into Y, Cb and Cr channels]
・Is RGB channel training difficult? The SRCNN paper surveys the case where the CNN is trained on RGB 3-channel input/output data. The results suggest that the Y-channel CNN outperforms the RGB CNN when the CNN is not very large.
Remark on algorithm ②
・Enlarge the picture/image data before inputting it to the CNN. The SRCNN paper uses the Bicubic method, and waifu2x (explained later) uses the Nearest-neighbor method, to enlarge the picture before feeding the data to the CNN.
[Reason] The input and output picture sizes are almost the same when you implement a convolutional neural network with a machine learning library. (Rigorously, the output picture is smaller by (kernel size - 1) pixels.)
[Figure: enlarge picture and extract Y channel → CNN → compose; the Cr/Cb channels are enlarged conventionally, without the CNN]
Previous work ① CNN model
SRCNN paper's CNN model, one example (many other parameter settings are tested in the paper):

CNN                Layer1     Layer2     Layer3
In channel         1          32         64
Out channel        32         64         1
Kernel size        9          5          5
# of parameters    2628       51264      1664
# of convolutions  2592×4WH   51200×4WH  1600×4WH

Relatively shallow CNN architecture with a big kernel size.
total # of parameters: 55556, total # of convolutions: 55392×4WH

※ # of parameters = In channel × Out channel × (kernel size)² + Out channel
※ # of convolutions per pixel = In channel × Out channel × (kernel size)²
※ Suppose the picture size before enlargement is w×h pixels; then the picture size after enlargement is 2w×2h, so we need convolutions for 4wh pixels in total.
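The counting formulas above can be checked with a couple of one-line Python helpers (a sketch; the layer sizes follow the table above):

```python
def n_params(c_in, c_out, k):
    # weights (c_in * c_out * k * k) plus one bias per output channel
    return c_in * c_out * k * k + c_out

def n_conv_per_pixel(c_in, c_out, k):
    # multiply-accumulates needed to produce all output channels at one pixel
    return c_in * c_out * k * k

# Layer 2 of the SRCNN paper's example model: 32 -> 64 channels, 5x5 kernel
print(n_params(32, 64, 5), n_conv_per_pixel(32, 64, 5))
```

Running this for Layer 2 reproduces the 51264 parameters and 51200 convolutions per pixel in the table, and `n_conv_per_pixel(1, 32, 9)` gives Layer 1's 2592.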
Previous work ② waifu2x
waifu2x https://github.com/nagadomi/waifu2x
The term "waifu" comes from the Japanese pronunciation of "wife" (in Japan, the term "wife" is used for one's favorite female anime character).
Open source software, originally published to enlarge art-style images. It now also supports a photo style.
You can test the application on a server: http://waifu2x.udp.jp/
Previous work ② waifu2x
waifu2x is open source software, which enables other software engineers to develop related software. Many derivative programs have now been published.
[Related links (in Japanese)]
・waifu2x and a list of its derivative software: http://kourindrug.sakura.ne.jp/waifu2x.html
・The waifu2x algorithm, explained simply: https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view
Previous work ② waifu2x CNN model
By taking a small convolution kernel size of 3, a deeper neural network is constructed.

CNN                Layer1   Layer2    Layer3     Layer4     Layer5     Layer6      Layer7
In channel         1        32        32         64         64         128         128
Out channel        32       32        64         64         128        128         1
Kernel size        3        3         3          3          3          3           3
# of parameters    320      9248      18496      36928      73856      147584      1153
# of convolutions  288×4WH  9216×4WH  18432×4WH  36864×4WH  73728×4WH  147456×4WH  1152×4WH

Deep CNN architecture with a small kernel size.
total # of parameters: 287585, total # of convolutions: 287136×4WH
What is special about the SRCNN task
The big differences from the image recognition task:
1. Position sensitivity is required.
+ Image recognition task: translation-invariant behavior is welcome, and max pooling or strides are often utilized.
+ SRCNN task: translation-variant behavior is necessary, since super resolution requires position-dependent output.
2. The feature map size does not shrink during the CNN processing.
→ As the number of feature maps increases, the amount of calculation increases; the speed/memory restrictions are severe.
The required memory for a CNN is roughly the volume of the boxes in the CNN model figure. For an image recognition task, the feature maps usually become smaller in the deeper layers of the CNN, so the number of feature maps can be larger when the map size is smaller.
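A back-of-the-envelope estimate makes the contrast concrete (an assumption: float32 values, no extra buffers): one layer's feature maps cost channels × width × height × 4 bytes, and a super-resolution CNN keeps them at full output resolution.

```python
def feature_map_bytes(n_maps, width, height, bytes_per_value=4):
    # float32 feature maps: channels * W * H * 4 bytes
    return n_maps * width * height * bytes_per_value

# 128 feature maps at a full 1024x1024 output resolution (super resolution)
# vs. 128 maps at the 32x32 resolution typical deep inside a classifier
sr_bytes = feature_map_bytes(128, 1024, 1024)
cls_bytes = feature_map_bytes(128, 32, 32)
print(sr_bytes // (1024 * 1024), "MiB vs", cls_bytes // 1024, "KiB")
```

The full-resolution case needs 512 MiB for a single layer's activations, a thousand times the 512 KiB of the downsampled case, which is why the slide calls the speed/memory restriction severe.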
Table of contents
SeRanet: Idea 1 Splice, Idea 2 Fusion, SeRanet CNN model

The explanation of the SeRanet project starts here: an introduction to the ideas behind SeRanet.
SeRanet Idea 1: Splice
Input: the pre-scaled w×h picture. Output: a 2w×2h picture.
→ Introduce the "Split" and "Splice" concepts.
Split: 4 branches of neural network (NN), each of size w×h, are created.
Splice: the 4 branches of the neural network are merged to obtain one 2w×2h image.
[Figure: input (w×h) → Split → 4 branches (LU, RU, LD, RD) → Splice → output (2w×2h)]
SeRanet Idea 1: Splice
After the split, the 4 branches of the neural network correspond to the Left-Up (LU), Right-Up (RU), Left-Down (LD) and Right-Down (RD) pixels of the enlarged picture. At the splice phase, the 4 branches of the CNN are combined/spliced to obtain the double-size image.
[Figure: Split → Splice; each input pixel (i, j) maps to a 2×2 output block whose LU, RU, LD and RD pixels come from the corresponding branches]
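The splice step can be sketched with NumPy strided slicing. This is a minimal single-channel sketch of the interleaving idea only; the actual SeRanet code in the repository operates on batched multi-channel tensors.

```python
import numpy as np

def splice(lu, ru, ld, rd):
    # Interleave four w x h branch outputs into one 2w x 2h image:
    # each input position (i, j) becomes a 2x2 block in the output.
    h, w = lu.shape
    out = np.empty((2 * h, 2 * w), dtype=lu.dtype)
    out[0::2, 0::2] = lu   # left-up pixel of each 2x2 block
    out[0::2, 1::2] = ru   # right-up
    out[1::2, 0::2] = ld   # left-down
    out[1::2, 1::2] = rd   # right-down
    return out

# four constant 2x2 "branch outputs", for illustration only
lu = np.full((2, 2), 0); ru = np.full((2, 2), 1)
ld = np.full((2, 2), 2); rd = np.full((2, 2), 3)
y = splice(lu, ru, ld, rd)
print(y)  # every 2x2 block is [[0, 1], [2, 3]]
```

This is the same rearrangement that is nowadays called depth-to-space or pixel shuffle in deep learning frameworks.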
The effect of introducing Splice → flexibility of neural network modelling.
[Figure: Input w×h → 1st Phase → Split → 2nd Phase → Splice → 3rd Phase → Output 2w×2h]
1st Phase: the image size is w×h, before enlargement, so the amount of calculation is 1/4 of the 3rd phase. → Larger feature map counts and kernel sizes are affordable in this phase.
2nd Phase: 4 CNN branches with image size w×h. The total calculation amount is the same as the 3rd phase, but the parameters learned in each branch (LU, RU, LD, RD) can differ → the model's representation power grows. Another advantage is that memory consumption is smaller than in the 3rd phase, due to the image size.
3rd Phase: image size 2w×2h. The last phase, producing the output; memory consumption and the amount of calculation are 4 times larger than in the 1st phase.
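The per-phase cost claims above follow from simple pixel counting, sketched here (the uniform per-pixel cost is an illustrative assumption; real layers differ in cost):

```python
def phase_cost(n_branches, width, height, per_pixel_cost=1):
    # Total convolution work = (branches) x (pixels) x (cost per pixel).
    return n_branches * width * height * per_pixel_cost

w, h = 100, 100
p1 = phase_cost(1, w, h)          # 1st phase: one branch at w x h
p2 = phase_cost(4, w, h)          # 2nd phase: four branches at w x h
p3 = phase_cost(1, 2 * w, 2 * h)  # 3rd phase: one pass at 2w x 2h
print(p3 // p1, p3 // p2)         # 3rd phase vs 1st and 2nd
```

So the 1st phase costs 1/4 of the 3rd, and the 2nd phase matches the 3rd in total work while holding 4 smaller feature maps instead of one large one.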
Table of contents
SeRanet: Idea 1 Splice, Idea 2 Fusion, SeRanet CNN model
Fusion: a method introduced in the Colorization paper
・"Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
https://github.com/satoshiiizuka/siggraph2016_colorization
The research aims to convert a monochrome input image into a colorful output image through CNN supervised learning.
[Figure: monochrome input → colorized output]
Neural network used in the Colorization paper
Upper CNN: the main CNN used for colorization. Lower CNN: a CNN trained for image classification.
So a CNN with a different purpose is utilized to help improve the performance of the main CNN.
The lower CNN is expected to learn global features, e.g. "this image was taken outdoors". The research reports that colorization accuracy increased by "fusing" the global features into the main CNN.
Examples of how global features help colorization (read the paper for details):
- They reduce mistaken use of sky color at the top of the image when the picture was taken indoors.
- They reduce mistaken use of brown ground color when the picture was taken on the sea.
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
SeRanet Idea 2: Fusion
SeRanet combines/fuses 2 types of CNN at the 1st phase.
Purpose: combining different types of non-linear activation to get a wider variety of model representations.
The upper CNN uses Leaky ReLU for activation; the lower CNN uses Sigmoid for activation.
※ I want to use a Convolutional Restricted Boltzmann Machine with pre-training for the lower CNN in future development.
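A minimal NumPy sketch of the fusion idea (an assumption: the two branches are merged by channel-wise concatenation, in the spirit of the Colorization paper's fusion layer; the actual merge in the SeRanet repository may differ, and the Leaky ReLU slope is an illustrative choice):

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    # Upper-branch activation: identity for x >= 0, scaled for x < 0.
    return np.where(x >= 0, x, slope * x)

def sigmoid(x):
    # Lower-branch activation, squashing values into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def fuse(upper, lower):
    # Channel-wise concatenation of the two differently-activated branches.
    return np.concatenate([leaky_relu(upper), sigmoid(lower)], axis=0)

upper = np.ones((2, 3, 3))    # (channels, H, W) feature maps
lower = np.zeros((2, 3, 3))
fused = fuse(upper, lower)
print(fused.shape)            # channels from both branches are stacked
```

Subsequent convolution layers then see both non-linearities at once, which is the source of the "wider variety of model representations" mentioned above.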
SeRanet CNN model: the CNN model of seranet_v1
[Figure: 1st phase (with Fusion, ×2 channel merge) → Split → 2nd phase (×4 branches) → Splice → 3rd phase]

CNN                Layer1   Layer2     Layer3     Layer4     Layer5     Layer6
In channel         3        64         64         256        512        256
Out channel        64       64         128        512        256        128
Kernel size        5        5          5          1          1          3
# of parameters    4864     102464     204928     131584     131584     295040
# of convolutions  4800×WH  102400×WH  204800×WH  131072×WH  131072×WH  294912×WH

CNN                Layer7      Layer8      Layer9      Layer10
In channel         128         128         128         128
Out channel        128         128         128         3
Kernel size        3           3           3           3
# of parameters    147584      147584      147584      3584
# of convolutions  147456×4WH  147456×4WH  147456×4WH  3456×4WH

total # of parameters: 3303680, total # of convolutions: 1159150×4WH
Comparison
Parameters: about 10 times more than waifu2x. Convolutions: about 4 times more.
The number of parameters increases more than the number of convolutions (calculation) because SeRanet has position-dependent parameters (LU, RU, LD, RD).
→ Question: does the increase in parameters and calculation result in better performance?

Model               SRCNN paper  waifu2x     SeRanet_v1
Total parameters    55556        287585      3303680
Total convolutions  55392×4WH    287136×4WH  1159150×4WH
Table of contents
Result, Performance: comparison between various resize methods
・Bicubic, Lanczos (conventional resize methods)
・waifu2x, SeRanet (resize through CNN)
- Flip back and forth between the following slides for comparison. On Slideshare it may be difficult to see the difference; see the link for a side-by-side comparison: https://github.com/corochann/SeRanet
Result
- Input picture
- Bicubic (the OpenCV resize method is used)
- Lanczos (the OpenCV resize method is used)
- waifu2x (http://waifu2x.udp.jp/, Style: photo, Noise reduction: none, Upscaling: 2x)
- SeRanet
- Original data (ground truth, for reference)
The differences can be found in the detailed parts of the 1st picture and the thin stalk in the 4th picture (high spatial-frequency regions).
Performance: comparison between various resize methods
*Based on personal impressions (comparison by a quantitative measurement has not been done yet)
・Bicubic and Lanczos (conventional resize methods): almost the same as each other
・waifu2x and SeRanet (resize through CNN): almost the same as each other, but clearly different from the conventional methods, and still different from the original image
Summary: SeRanet
・A big CNN is used (9 layers deep, 3303680 total parameters)
・RGB 3 channels are used for the CNN input/output instead of only the Y channel
・Split/Splice: the Left-Up, Right-Up, Left-Down and Right-Down CNN branches use different parameters
・Fusion: different non-linearities are combined for flexibility of model representation
・Convolutional RBM pre-training
The performance is not mature yet; there is room to improve so the output gets closer to the original image.
At last...
+ The project is open source, on GitHub: https://github.com/corochann/SeRanet
+ Improvement ideas and discussion are welcome.
+ My blog: http://corochann.com/
* If there is any inappropriate citation, please let me know.
* SeRanet is a personal project and I may have misunderstandings; please let me know if there is wrong information.