Exploring Complexity Reduction in Deep Learning Sourya Dey PhD Candidate, University of Southern California Advisors: Peter A. Beerel and Keith M. Chugg B. Tech, Instrumentation Engineering, IIT KGP, 2014 January 3, 2020
Outline
Pre-Defined Sparsity
Reduce complexity of neural networks with minimal performance degradation
University of Southern California
Overview
Neural networks (NNs) are key machine learning technologies
➢ Artificial intelligence
➢ Self-driving cars
➢ Speech recognition
➢ Face ID
➢ and more smart stuff…
Basic working of an artificial neural network

[Figure: a small 2-junction network with example weight values on the edges]

Nodes/Neurons in a layer
Edges/Connections in a junction
Weights on the edges

Feedforward: inputs pass through the weighted junctions to produce outputs and a cost
Backpropagation: derivatives of the cost pass backward through the same weights
Update: each weight is adjusted using its gradient

Inference = Feedforward only. Training = Feedforward + Backpropagation + Update

Weights dominate complexity – they are all used in all 3 operations
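The three operations can be illustrated with a minimal NumPy sketch of a toy 2-junction MLP. The sigmoid activation, squared-error cost, and all sizes here are illustrative choices for the sketch, not the networks used in this work; note how every weight matrix appears in all three steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-junction MLP: 4 -> 3 -> 2 nodes, sigmoid activations.
W1 = rng.standard_normal((3, 4))   # junction 1 weights
W2 = rng.standard_normal((2, 3))   # junction 2 weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.standard_normal(4)         # one input sample
y = np.array([1.0, 0.0])           # its target
lr = 0.1                           # learning rate

# Feedforward: weights produce activations layer by layer, then a cost.
a1 = sigmoid(W1 @ x)
a2 = sigmoid(W2 @ a1)
cost = 0.5 * np.sum((a2 - y) ** 2)

# Backpropagation: the same weights carry error derivatives backward.
d2 = (a2 - y) * a2 * (1 - a2)
d1 = (W2.T @ d2) * a1 * (1 - a1)

# Update: every weight is adjusted using its gradient.
W2 -= lr * np.outer(d2, a1)
W1 -= lr * np.outer(d1, x)
```

One gradient step on this single sample lowers the cost, and inference would reuse only the feedforward lines.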
Motivation behind our work
Training can take weeks on CPU; cloud GPU resources are expensive

[Figures: a fully connected (FC) Multilayer Perceptron (MLP); a typical deep CNN]

Modern neural networks suffer from parameter explosion
Our Work: Pre-defined Sparsity

Pre-define a sparse connection pattern prior to training
Use this sparse network for both training and inference

Structured constraints: fixed in- and out-degrees for every node
Overall density compared to FC

Reduced training and inference complexity
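One simple way to satisfy such structured constraints is a strided connection pattern. This is an illustrative sketch: the function name and the particular striding are mine, not the exact patterns of this work, but the resulting mask does have fixed in- and out-degrees.

```python
import numpy as np

def predefined_sparse_mask(n_in, n_out, out_degree):
    """Fixed-degree sparse connection mask for one junction.

    Every input node feeds exactly `out_degree` outputs, and every
    output node receives exactly n_in * out_degree / n_out inputs
    (so n_in * out_degree must be divisible by n_out).
    """
    assert (n_in * out_degree) % n_out == 0, "degrees must divide evenly"
    assert out_degree <= n_out
    mask = np.zeros((n_out, n_in), dtype=bool)
    for i in range(n_in):
        # Stride consecutive connections across outputs so that
        # in-degrees stay perfectly balanced.
        for k in range(out_degree):
            mask[(i * out_degree + k) % n_out, i] = True
    return mask

mask = predefined_sparse_mask(n_in=8, n_out=4, out_degree=2)
density = mask.sum() / mask.size   # fraction of FC weights kept
```

Multiplying a junction's weight matrix by this fixed mask before training, and keeping it fixed throughout, gives a pre-defined sparse junction used for both training and inference.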
Motivation behind pre-defined sparsity
In a FC network, most weights are very small in magnitude after training
Pre-defined sparsity performance on MLPs

Starting with only 20% of parameters reduces test accuracy by just 1%

Datasets: MNIST handwritten digits, Reuters news articles, TIMIT phonemes, CIFAR images, Morse symbols

S. Dey, K. M. Chugg and P. A. Beerel, “Morse Code Datasets for Machine Learning,” in ICCCNT 2018. Won Best Paper award. https://github.com/usc-hal/morse-dataset
Analysis and Applications
Deep dive into pre-defined sparsity for MLPs, and a corresponding application
Designing pre-defined sparse networks
A pre-defined sparse connection pattern is a hyperparameter to be set prior to training
Find trends and guidelines to optimize pre-defined sparse patterns
S. Dey, K. Huang, P. A. Beerel and K. M. Chugg, “Pre-Defined Sparse Neural Networks with Hardware Acceleration,” in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 332-345, June 2019.
Individual junction densities
Latter junctions (closer to the output) need to be denser
Individual junction densities
Each curve keeps ρ2 fixed and varies ρnet by varying ρ1
For the same ρnet, ρ2 > ρ1 improves performance

Mostly similar trends are observed for deeper networks
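By definition, the overall density ρnet is the number of surviving weights divided by the FC weight count, so junction densities combine weighted by junction size. This is why making a small later junction denser is cheap: it carries few weights. A minimal sketch (function name and the example sizes are mine):

```python
def overall_density(layer_sizes, junction_densities):
    """Overall density rho_net of a pre-defined sparse MLP:
    surviving weights / weights of the fully connected network.

    layer_sizes:        [N0, N1, ..., NL]
    junction_densities: [rho_1, ..., rho_L], one per junction
    """
    fc = [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]
    sparse = [w * rho for w, rho in zip(fc, junction_densities)]
    return sum(sparse) / sum(fc)

# Example: a 784-100-10 network. Keeping the small final junction
# fully dense (rho_2 = 1) barely raises the overall density.
rho = overall_density([784, 100, 10], [0.1, 1.0])
```

Here the first junction has 78,400 potential weights and the second only 1,000, so ρnet stays close to ρ1 even with a fully dense second junction.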
Dataset redundancy

[Figure: weight distributions. High redundancy: MNIST with the default 784 features. Low redundancy: MNIST reduced to 200 features – wider spread of weight values]

Less redundancy => less sparsification possible
Effect of redundancy on sparsity
Reducing redundancy leads to increased performance degradation on sparsification
‘Large sparse’ vs ‘small dense’ networks
A sparser network with more hidden nodes will outperform a denser network with fewer hidden nodes, when both have the same number of weights
‘Large sparse’ vs ‘small dense’ networks

Networks with the same number of parameters go from bad to good as the number of nodes in the hidden layers is increased
Regularization

Regularized cost = Original unregularized cost (e.g. cross-entropy) + λ × Regularization term
Pre-defined sparse networks need smaller λ (as determined by validation)
Pre-defined sparsity reduces the overfitting problem stemming from over-parametrization in big networks

Example for MNIST 2-junction networks:

Overall density | λ
100%            | 1.1×10⁻⁴
40%             | 5.5×10⁻⁵
11%             | 0
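As a sketch of the table's effect (assuming an L2 penalty purely for illustration; the slide does not specify the regularizer), a sparser network paired with its smaller validated λ contributes a much smaller penalty term, consistent with sparsity itself curbing over-parametrization:

```python
import numpy as np

rng = np.random.default_rng(0)

def regularized_cost(c0, weights, lam):
    # Regularized cost = unregularized cost + lambda * L2 penalty.
    return c0 + lam * sum(np.sum(w ** 2) for w in weights)

W = rng.standard_normal((100, 784))        # one junction's weights
mask = rng.random(W.shape) < 0.4           # ~40% overall density

# Lambda values taken from the table's 100% and 40% density rows;
# c0 = 1.0 is an arbitrary stand-in for the cross-entropy term.
c_dense = regularized_cost(1.0, [W], lam=1.1e-4)
c_sparse = regularized_cost(1.0, [W * mask], lam=5.5e-5)
```

With fewer surviving weights and a smaller λ, the sparse network's penalty is only a fraction of the dense network's.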
Application: A hardware architecture for on-device training and inference

Degree of parallelism (z) = number of weights processed in parallel in a junction (e.g. z = 3)
z trades off training speed (small z: slow training) against hardware cost (large z: hardware intensive), giving flexibility
Connections designed for clash-free memory accesses to prevent stalling
Prototype implemented on FPGA

S. Dey, Y. Shao, K. M. Chugg and P. A. Beerel, “Accelerating training of deep neural networks via sparse edge processing,” in 26th International Conference on Artificial Neural Networks (ICANN) Part 1, pp. 273-280. Springer, Sep 2017.
S. Dey, P. A. Beerel and K. M. Chugg, “Interleaver design for deep neural networks,” in 51st Annual Asilomar Conference on Signals, Systems, and Computers (ACSSC), pp. 1979-1983, Oct 2017.
S. Dey, D. Chen, Z. Li, S. Kundu, K. Huang, K. M. Chugg and P. A. Beerel, “A Highly Parallel FPGA Implementation of Sparse Neural Network Training,” in 2018 International Conference on Reconfigurable Computing and FPGAs (ReConFig), pp. 1-4, Dec 2018. Expanded pre-print version available at arXiv:1806.01087.
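The clash-free idea can be caricatured in a few lines: partition a junction's weights into z memory banks so that each cycle reads one weight from every bank, never two from the same bank. The actual architecture uses interleaver-based connection patterns (see the citations above); this banking scheme is a simplified stand-in of my own.

```python
def schedule(num_weights, z):
    """Process a junction's weights z at a time.

    With clash-free banking, bank b holds weights
    b*cycles .. (b+1)*cycles - 1, and cycle t reads one weight
    from every bank, so no two reads hit the same bank and the
    pipeline never stalls.
    """
    assert num_weights % z == 0
    cycles = num_weights // z
    plan = [[b * cycles + t for b in range(z)] for t in range(cycles)]
    return plan

plan = schedule(num_weights=12, z=3)   # 4 cycles, 3 banks
```

A larger z shortens the schedule (fewer cycles) at the cost of more parallel arithmetic units and memory ports, which is the speed-versus-hardware trade-off above.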
Model Search
Automate the design of CNNs with good performance and low complexity
Model search is ongoing research and hence not currently available publicly