Page 1
Improved Deep-Learning Side-Channel Attacks using Normalization Layers
Damien Robissout, Gabriel Zaid, Lilian Bossuet, Amaury Habrard
[email protected]
Laboratoire Hubert Curien, Université Jean Monnet
16/04/2019
Page 2
Introduction
Neural networks achieve good performance in side-channel analysis (SCA)
Further improvement is possible using batch normalization and regularization
No existing deep-learning metric is directly usable to evaluate networks for SCA
We propose a metric indicating how well a given architecture could perform
Page 3
Content
1 Batch Normalization
2 ∆train,val: an SCA metric to evaluate performance
3 Regularization
4 Conclusion
Page 4
Content
1 Batch Normalization
2 ∆train,val: an SCA metric to evaluate performance
3 Regularization
4 Conclusion
Page 5
Batch Normalization
Goal
Standardize the data representation across all layers
Consequence
The network focuses on the relative differences between values rather than on their absolute numerical values
[Figure: batch normalization maps the values of the neurons from their empirical distribution (µ, σ²) to a standardized distribution (0, 1)]
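As an illustration, here is a minimal NumPy sketch of the transformation depicted above (the toy values are invented for the example): each neuron's activations within a batch are moved from their empirical distribution (µ, σ²) to (0, 1), up to the learned scale γ and shift β.

```python
import numpy as np

def batch_normalize(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Shift a batch of activations from (mu, sigma^2) to (0, 1),
    then apply the learned scale/shift (gamma, beta)."""
    mu = x.mean(axis=0)                      # per-neuron mean over the batch
    var = x.var(axis=0)                      # per-neuron variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # standardized values
    return gamma * x_hat + beta

# Toy batch: 4 samples, 3 neurons on very different numerical scales
batch = np.array([[10.0, 200.0, -3.0],
                  [12.0, 180.0, -1.0],
                  [ 9.0, 220.0, -2.0],
                  [11.0, 190.0, -4.0]])
print(batch_normalize(batch).mean(axis=0))   # ~0 for every neuron
```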
Page 6
Updated architecture: CNNbn
Network architecture with Batch Normalization
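The exact layout is given in the figure; for reference, here is a hedged Keras sketch of a CNN_best-style network (the public ASCAD baseline: five convolutional blocks and two fully-connected layers) with a BatchNormalization layer added after each convolution. The filter counts follow the published ASCAD CNN_best; the exact placement of the normalization layers in CNNbn may differ.

```python
from tensorflow.keras import layers, models

def cnn_bn(input_size=700, n_classes=256):
    inp = layers.Input(shape=(input_size, 1))
    x = inp
    for filters in (64, 128, 256, 512, 512):              # CONV 1..5
        x = layers.Conv1D(filters, 11, padding='same',
                          activation='relu')(x)
        x = layers.BatchNormalization()(x)                # added layer
        x = layers.AveragePooling1D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)          # FC1
    x = layers.Dense(4096, activation='relu')(x)          # FC2
    out = layers.Dense(n_classes, activation='softmax')(x)
    return models.Model(inp, out)
```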
Page 7
Training on ASCAD desynchronized traces
DesyncN: a random shift between 0 and N applied to the 700 points of the traces
[Figure panels: Desync0, Desync50, Desync100]
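A sketch of how such a desynchronization can be produced (illustrative only: np.roll applies a circular shift, whereas ASCAD desynchronizes by sliding the 700-sample window within longer raw traces):

```python
import numpy as np

def desync(traces, max_shift, seed=None):
    """Apply an independent random shift in [0, max_shift] to each trace."""
    rng = np.random.default_rng(seed)
    shifted = np.empty_like(traces)
    for i, trace in enumerate(traces):
        shifted[i] = np.roll(trace, rng.integers(0, max_shift + 1))
    return shifted

# e.g. Desync50 on a batch of 100 traces of 700 points
desync50 = desync(np.random.rand(100, 700), max_shift=50)
```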
Page 10
Evaluate the performance of a network
Page 11
Training Acc. vs. Validation Acc.
Goal
Evaluate the networks during training
[Figure: training accuracy vs. validation accuracy over epochs, for CNNbest and CNNbn]
Page 13
Content
1 Batch Normalization
2 ∆train,val: an SCA metric to evaluate performance
3 Regularization
4 Conclusion
Page 14
The overfitting phenomenon
[Figure panels: Overfitting vs. Good estimation]
Page 15
∆train,val: evaluation of the generalization capacity
Goal
Give a clear indication of whether the network is overfitting or underfitting, and whether its performance can still be improved
Notations
Ttrain = set of traces used to train the network
Tval = set of traces the network has never seen
Ntrain(model) := min{ ntrain | ∀ n ≥ ntrain, SR¹train(model(n)) ≥ 90% }
Nval(model) := min{ nval | ∀ n ≥ nval, SR¹val(model(n)) ≥ 90% }
Metric
∆train,val(model) = |Nval(model) − Ntrain(model)|
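A minimal Python sketch of this metric (illustrative names; sr_curve[n] is assumed to hold the first-order success rate SR¹ obtained with n traces):

```python
import numpy as np

def n_stable(sr_curve, threshold=0.90):
    """Smallest n such that the success rate stays >= threshold for
    every n' >= n; returns None if the curve never stabilizes."""
    above = np.asarray(sr_curve) >= threshold
    for n in range(len(above)):
        if above[n:].all():
            return n
    return None

def delta_train_val(sr_train, sr_val):
    """Delta_train,val = |N_val - N_train| (assumes both curves stabilize)."""
    return abs(n_stable(sr_val) - n_stable(sr_train))
```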
Page 16
How to use the metric
Page 17
Representation of ∆train,att for CNNbn
Page 18
Content
1 Batch Normalization
2 ∆train,val: an SCA metric to evaluate performance
3 Regularization
4 Conclusion
Page 20
Regularization
Goal
Reduce ∆train,att even further using regularization
Means
Dropout with parameter λD
L2-Norm regularization with parameter λL2
Layer      Tested λD (step 0.1)   Tested λL2 (step 0.1)   Chosen λD (desync100)   Chosen λL2 (desync100)
CONV 1&2   [0, ..., 0.3]          [0, ..., 0.3]           0                       0
CONV 3     [0, ..., 0.8]          [0, ..., 0.3]           0.5                     0.2
CONV 4     [0, ..., 0.8]          [0, ..., 0.3]           0.6                     0.3
CONV 5     [0, ..., 0.8]          [0, ..., 0.3]           0.7                     0.3
FC1        [0, ..., 0.8]          [0, ..., 0.3]           0                       0.3
FC2        [0, ..., 0.3]          [0, ..., 0.3]           0                       0
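A sketch of how these chosen values translate into Keras layers (only the CONV 3 block and FC1 are shown; whether dropout is applied before or after the pooling in CNNbn+reg is an assumption):

```python
from tensorflow.keras import layers, regularizers

# CONV 3 block: lambda_L2 = 0.2 on the kernel, lambda_D = 0.5 dropout
conv3 = layers.Conv1D(256, 11, padding='same', activation='relu',
                      kernel_regularizer=regularizers.l2(0.2))
drop3 = layers.Dropout(0.5)

# FC1: L2 regularization only (lambda_L2 = 0.3, no dropout)
fc1 = layers.Dense(4096, activation='relu',
                   kernel_regularizer=regularizers.l2(0.3))
```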
Page 21
Architecture with regularization: CNNbn+reg
Page 22
Results without regularization: CNNbn
Page 23
Results with regularization: CNNbn+reg
Page 25
Attack on desync100 using λL2 = 0.1 for CNNbn+reg
Page 26
Attack on desync100 using λL2 = 0.2 for CNNbn+reg
Page 27
Attack on desync100 using λL2 = 0.3 for CNNbn+reg
Page 28
Evolution of ∆train,att for different numbers of epochs
Best results on other desynchronizations
            Ntrain   Natt   ∆train,att   FC1: λL2   Nb epochs
Desync0     104      272    168          0.1        125
Desync50    21       279    258          0.1        200
Desync100   76       395    319          0.3        175
Page 29
Content
1 Batch Normalization
2 ∆train,val: an SCA metric to evaluate performance
3 Regularization
4 Conclusion
Page 30
Conclusion
New metric to evaluate the possible improvement of an architecture
Normalization and regularization improve CNN performance in SCA
Given the amount of regularization needed to obtain these results, a better architecture probably exists
Apply this technique to other networks
Page 31
Improved Deep-Learning Side-Channel Attacks using Normalization Layers
Thank you for listening. Do you have any questions?
Page 32
Dropout example
Ref.: Roffo, Giorgio (2017). Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications.
Page 33
Pooling example
Ref.: Max pooling in CNN. Source: http://cs231n.github.io/convolutional-networks/
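For completeness, a tiny NumPy illustration of max pooling (window of size 2, non-overlapping):

```python
import numpy as np

def max_pool_1d(x, size=2):
    """Keep the maximum of each non-overlapping window of `size` points."""
    return x[:len(x) // size * size].reshape(-1, size).max(axis=1)

print(max_pool_1d(np.array([1, 3, 2, 9, 5, 4])))  # -> [3 9 5]
```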