
Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation

Dr. Yanjun Qi
Department of Computer Science
University of Virginia

Tutorial @ ACM BCB-2018, 8/29/18

Today

• Machine Learning: a quick review
• Deep Learning: a quick review
• Background Biology: a quick review
• Deep Learning for analyzing Sequential Data about Regulation:
  • DeepChrome
  • AttentiveChrome
  • DeepMotif

https://www.deepchrome.org
https://qdata.github.io/deep2Read/


OUR DATA-RICH WORLD

• Biomedicine
  • Patient records, brain imaging, MRI & CT scans, …
  • Genomic sequences, bio-structure, drug effect info, …
• Science
  • Historical documents, scanned books, databases from astronomy, environmental data, climate records, …
• Social media
  • Social interactions data, Twitter, Facebook records, online reviews, …
• Business
  • Stock market transactions, corporate sales, airline traffic, …

Challenge of Data Explosion in Biomedicine

[Figure: data sources in biomedicine]
• Molecular signatures of tumor / blood sample
• Signs & symptoms
• Genetic data
• Public health data
• Patient medical history & demographics
• Medical images
• Mobile medical sensor data

[Figure labels: Traditional Approaches; Data-Driven Approaches; Machine Learning]

BASICS OF MACHINE LEARNING

• "The goal of machine learning is to build computer systems that can learn and adapt from their experience." – Tom Dietterich
• "Experience" in the form of available data examples (also called instances or samples)
• Available examples are described with properties (data points in feature space X)



e.g. SUPERVISED LEARNING

• Find a function f to map input space X to output space Y
• So that the difference between y and f(x) of each example x is small.

Input X: e.g. a piece of English text
x = "I believe that this book is not at all helpful since it does not explain thoroughly the material. it just provides the reader with tables and calculations that sometimes are not easily understood …"

Output Y: {1 / Yes, -1 / No}, e.g. Is this a positive product review?
y = -1

e.g. SUPERVISED Linear Binary Classifier

• Now let us check out a VERY SIMPLE case, e.g.: binary y / linear f / X as $\mathbb{R}^2$

x → f → y

$f(x, w, b) = \mathrm{sign}(w^T x + b)$

$x = (x_1, x_2)$

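SUPERVISED Linear Binary Classifier

x → f → y

$f(x, w, b) = \mathrm{sign}(w^T x + b)$

[Figure: 2D scatter plot with axes x_1 and x_2, where x = (x_1, x_2). Markers denote +1 points, -1 points, and future points (marked "?"). A line separates the region $w^T x + b > 0$ from the region $w^T x + b < 0$. Courtesy slide from Prof. Andrew Moore's tutorial.]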
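The classifier above can be sketched in a few lines of Python. This is only an illustration: the weights `w`, the bias `b`, and the two query points are hypothetical values chosen for the example, and sending the tie z = 0 to +1 is an assumed convention.

```python
def sign(z):
    """Return +1 for z >= 0 and -1 otherwise (the tie at 0 goes to +1 by assumption)."""
    return 1 if z >= 0 else -1


def linear_classifier(x, w, b):
    """f(x, w, b) = sign(w^T x + b) for a point x = (x_1, x_2)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sign(z)


# Hypothetical parameters: the decision boundary is the line x_1 + x_2 - 1 = 0.
w = [1.0, 1.0]
b = -1.0

print(linear_classifier([2.0, 2.0], w, b))  # prints 1  (w^T x + b = 3 > 0)
print(linear_classifier([0.0, 0.0], w, b))  # prints -1 (w^T x + b = -1 < 0)
```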


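Basic Concepts

• Training (i.e. learning parameters w, b)
  • Training set includes
    • available examples' feature representation: x_1, …, x_L
    • available corresponding labels: y_1, …, y_L
  • Find (w, b) by minimizing loss (i.e. difference between y and f(x) on available examples in training set)

$(w, b) = \arg\min_{w, b} \sum_{i=1}^{L} \ell(f(x_i), y_i)$

where $\ell(f(x_i), y_i)$ measures the difference between the prediction $f(x_i)$ and the label $y_i$.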
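To make the argmin concrete, here is a minimal sketch that counts training mistakes (0-1 loss) and picks the best (w, b) from a small grid of candidates. The toy training set and the candidate grid are assumptions made up for illustration; real training uses an optimizer rather than exhaustive search.

```python
def f(x, w, b):
    """Linear binary classifier: sign(w^T x + b), with the tie at 0 sent to +1."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 if z >= 0 else -1


def training_loss(w, b, xs, ys):
    """Number of training examples where f(x) differs from y (0-1 loss)."""
    return sum(1 for x, y in zip(xs, ys) if f(x, w, b) != y)


# Hypothetical training set: L = 4 examples in R^2 with labels in {+1, -1}.
xs = [(2.0, 2.0), (1.5, 2.5), (0.0, 0.5), (0.5, 0.0)]
ys = [1, 1, -1, -1]

# Hypothetical candidate grid standing in for the search over (w, b).
values = (-1.0, 0.0, 1.0)
candidates = [((w1, w2), b0) for w1 in values for w2 in values for b0 in (-2.0, 0.0, 2.0)]

# argmin over the candidates of the training loss.
best_w, best_b = min(candidates, key=lambda c: training_loss(c[0], c[1], xs, ys))
best_loss = training_loss(best_w, best_b, xs, ys)
print(best_w, best_b, best_loss)
```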

Basic Concepts

• Testing (i.e. evaluating performance on "future" points)
  • Difference between true y? and the predicted f(x?) on a set of testing examples (i.e. testing set)
  • Key: example x? not in the training set
• Generalisation: learn function / hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples
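A small sketch of the training/testing distinction: the same error measure is computed once on the training set and once on examples the model has never seen. The parameters and both data sets are hypothetical, chosen so that the model fits the training set perfectly but still makes a mistake on the test set.

```python
def f(x, w, b):
    """Linear binary classifier: sign(w^T x + b), with the tie at 0 sent to +1."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 if z >= 0 else -1


def error_rate(w, b, xs, ys):
    """Fraction of examples on which the prediction f(x) differs from the label y."""
    wrong = sum(1 for x, y in zip(xs, ys) if f(x, w, b) != y)
    return wrong / len(xs)


# Hypothetical parameters, assumed to have been learned already.
w, b = (1.0, 1.0), -2.0

# Hypothetical training set and a disjoint testing set of "future" points.
train_xs = [(2.0, 2.0), (1.5, 2.5), (0.0, 0.5), (0.5, 0.0)]
train_ys = [1, 1, -1, -1]
test_xs = [(3.0, 1.0), (0.2, 0.3), (1.2, 0.5)]
test_ys = [1, -1, 1]

train_error = error_rate(w, b, train_xs, train_ys)
test_error = error_rate(w, b, test_xs, test_ys)
print(train_error, test_error)
```

The test error, not the training error, is the estimate of how well the function generalises.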


Basic Concepts

• Loss function
  • e.g. hinge loss for binary classification task
• Regularization
  • e.g. additional information added on loss function to control f

Maximize Separation Margin => Minimize the norm of w
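As a rough sketch, the hinge loss and a regularized objective can be written as below. The L2 penalty `lam * ||w||^2`, the averaging over examples, and the value of `lam` are assumptions for illustration; the slide does not fix a particular form.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {+1, -1} and a real-valued score w^T x + b."""
    return max(0.0, 1.0 - y * score)


def objective(w, b, xs, ys, lam):
    """Average hinge loss on the training set plus an L2 penalty on w."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]
    data_term = sum(hinge_loss(y, s) for y, s in zip(ys, scores)) / len(xs)
    reg_term = lam * sum(wi * wi for wi in w)  # smaller ||w|| means a larger margin
    return data_term + reg_term


# Hypothetical training set and parameters.
xs = [(2.0, 2.0), (1.5, 2.5), (0.0, 0.5), (0.5, 0.0)]
ys = [1, 1, -1, -1]
value = objective((1.0, 1.0), -2.0, xs, ys, lam=0.1)
print(value)
```

Every example here has y * score >= 1, so the data term is zero and only the penalty on w remains.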

Basics of Machine Learning

[Figure: supervised classification. Training: given input X and output Y, learn f(X) so that f(X) = Y. Testing: given a new input X', compute f(X').]


TYPICAL MACHINE LEARNING SYSTEM

[Figure: pipeline] Low-level sensing → Pre-processing → Feature Extract → Feature Select → Inference, Prediction, Recognition

[Figure labels: Label Collection; Evaluation]

TYPICAL MACHINE LEARNING SYSTEM

[Figure: the same pipeline (Low-level sensing, Pre-processing, Feature Extract, Feature Select, Inference / Prediction / Recognition, Label Collection, Evaluation), annotated with "Data Complexity in X" and "Data Complexity in Y".]

UNSUPERVISED LEARNING: [ COMPLEXITY OF Y ]

• No labels are provided (e.g. no Y provided)
• Find patterns from unlabeled data, e.g. clustering

e.g. clustering => to find "natural" grouping of instances given un-labeled data
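One way to illustrate clustering is k-means, sketched here in one dimension. The slide does not name a specific algorithm, so k-means is a swapped-in example; the data points and the initial centers are hypothetical.

```python
def kmeans_1d(points, centers, n_iters=10):
    """Plain k-means on scalars: assign each point to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster received no points.
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups, and assumed initial centers.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No labels are used anywhere: the grouping comes only from distances between the instances.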

Structured Output Prediction: [ COMPLEXITY IN Y ]

• Many prediction tasks involve output labels having structured correlations or constraints among instances

[Figure: structured dependency between examples' Y. Structure types: Sequence, Tree, Grid.]
Input: "The dog chased the cat"; APAFSVSPASGACGPECA…
Output: CCEEEEECCCCCHHHCCC…

Many more possible structures between y_i, e.g. spatial, temporal, relational …

Structured Input: Kernel Methods [ COMPLEXITY OF X ]

Vector vs. Relational data
e.g. Graphs, Sequences, 3D structures

[Figure: mapping from Original Space to Feature Space]

More Recent: Representation Learning [ COMPLEXITY OF X ]

[Figure labels: Deep Learning; Supervised Embedding; Layer-wise Pretraining]

Why learn features?

When to use Machine Learning?

• 1. Extract knowledge from data
  • Relationships and correlations can be hidden within large amounts of data
  • The amount of knowledge available about certain tasks is simply too large for explicit encoding (e.g. rules) by humans
• 2. Learn tasks that are difficult to formalise
  • Hard to define well, except by examples (e.g. face recognition)
• 3. Create software that improves over time
  • New knowledge is constantly being discovered.
  • Rule or human encoding-based system is difficult to continuously re-design "by hand".


Recap

• Goal of Machine Learning: Generalisation
• Training
• Testing
• Loss


• Deep Learning
  • Why is this a breakthrough?
  • Basics
  • History
  • A Few Recent Trends

Deep Learning is Changing the World

[Figure: example applications; "Many more!"]

Why breakthrough?

[Figure panels: Deep Learning; Deep Reinforcement Learning; Generative Adversarial Network (GAN)]

Breakthrough from 2012 Large-Scale Visual Recognition Challenge (ImageNet)

In one "very large-scale" benchmark competition (1.2 million images [X] vs. 1000 different word labels [Y])

10% improvement with deep CNN

Adapted from the NIPS 2017 DL Trend Tutorial


DNNs help us build more intelligent computers

• Perceive the world,
  • e.g., object recognition, speech recognition, …
• Understand the world,
  • e.g., machine translation, text semantic understanding
• Interact with the world,
  • e.g., AlphaGo, AlphaZero, self-driving cars, …
• Being able to think / reason,
  • e.g., learn to code programs, learn to search deep NN, …
• Being able to imagine / to make analogy,
  • e.g., learn to draw with styles, …


Deep Learning Way: Learning Representation from data

Feature Engineering
• Most critical for accuracy
• Accounts for most of the computation
• Most time-consuming in development cycle
• Often hand-crafted and task dependent in practice

Feature Learning
• Easily adaptable to new similar tasks
• Learn layerwise representation from data


Basics

• Basic Neural Network (NN)
  • single neuron, e.g. logistic regression unit
  • multilayer perceptron (MLP)
  • various loss functions
    • e.g., for multi-class classification, softmax layer
  • training NN with backprop algorithm


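One "Neuron": Expanded Logistic Regression

[Figure: input x = (x_1, x_2, x_3), p = 3, plus a constant +1. Inputs are multiplied by weights w_1, w_2, w_3 and bias b_1, combined by a summing function Σ into z, then passed through a sigmoid function.]

$z = w^T x + b$

$\hat{y} = \mathrm{sigmoid}(z) = \frac{e^z}{1 + e^z}$

$\hat{y} = P(Y = 1 \mid x, w)$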
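The single neuron above can be sketched as follows. The input, weights, and bias are hypothetical values; the two-branch form of the sigmoid is an implementation choice (it avoids overflow in `math.exp` for large |z|) and is algebraically the same as e^z / (1 + e^z).

```python
import math


def sigmoid(z):
    """sigmoid(z) = e^z / (1 + e^z), computed in a numerically safe way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def neuron(x, w, b):
    """One neuron: multiply by weights, sum with the bias, apply the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical input with p = 3 features, and hypothetical parameters.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.0]
b = 0.0

y_hat = neuron(x, w, b)  # interpreted as P(Y = 1 | x, w)
print(y_hat)
```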

E.g., Many Possible Nonlinearity Functions (aka transfer or activation functions)

[Table not recoverable from the slide. Columns: Name, Plot, Equation, Derivative (w.r.t. x). One entry is annotated "usually works best in practice". Source: https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions]
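Since the table itself did not survive, here is a sketch of three widely used activation functions together with their derivatives. The choice of sigmoid, tanh, and ReLU is an assumption made for illustration, not a reconstruction of the slide's rows; `numeric_derivative` is a hypothetical helper used only to sanity-check the formulas.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh(z):
    return math.tanh(z)


def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2


def relu(z):
    return max(0.0, z)


def d_relu(z):
    # Not differentiable at z = 0; using 0 there is a common convention.
    return 1.0 if z > 0 else 0.0


def numeric_derivative(fn, z, eps=1e-6):
    """Central finite difference, used to check the analytic derivatives."""
    return (fn(z + eps) - fn(z - eps)) / (2.0 * eps)


for z in (-1.5, 0.3, 2.0):
    print(z, d_sigmoid(z), d_tanh(z), d_relu(z))
```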

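One "Neuron": Expanded Logistic Regression => "Neuron View"

[Figure: the single neuron (inputs x_1, x_2, x_3 with p = 3, a constant +1, weights w_1, w_2, w_3, bias b_1, summing function Σ, sigmoid function) drawn as one unit.]

$z = w^T x + b$

$\hat{y} = \mathrm{sigmoid}(z) = \frac{e^z}{1 + e^z} = P(Y = 1 \mid x, w)$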

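Multi-Layer Perceptron (MLP) - (Feed-Forward NN)

[Figure: 3-layer MLP-NN. Input x = (x_1, x_2, x_3) passes through the 1st hidden layer (weights W_1), the 2nd hidden layer (weights W_2), and the output layer (weights W_3) to produce ŷ.]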
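A forward pass through such a 3-layer MLP might look like the sketch below. The layer sizes, the weight values, the sigmoid nonlinearity at every layer, and the omission of bias terms are all assumptions for illustration; the figure only names the weight matrices W_1, W_2, W_3.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def sigmoid_vec(z):
    """Apply the sigmoid to each entry of a vector."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def mlp_forward(x, W1, W2, W3):
    """Feed-forward pass: two hidden layers followed by the output layer."""
    h1 = sigmoid_vec(matvec(W1, x))      # 1st hidden layer
    h2 = sigmoid_vec(matvec(W2, h1))     # 2nd hidden layer
    y_hat = sigmoid_vec(matvec(W3, h2))  # output layer
    return y_hat


# Hypothetical weights: 3 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
W1 = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]]
W2 = [[0.6, -0.3], [0.1, 0.8]]
W3 = [[1.0, -1.0]]

y_hat = mlp_forward([1.0, 2.0, 3.0], W1, W2, W3)
print(y_hat)
```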

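History → Perceptron: 1-Neuron Unit with Step

− First proposed by Rosenblatt (1958)
− A simple neuron that is used to classify its input into one of two categories.
− A perceptron uses a step function

$\varphi(z) = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$

[Figure: inputs x_1, x_2, x_3 and a constant +1 are multiplied by weights w_1, w_2, w_3 and bias b_1, combined by a summing function Σ into z, then passed through the step function.]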
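The perceptron unit, together with the classic mistake-driven perceptron learning rule, can be sketched as follows. The slide describes only the unit and its step function; the update rule, the learning rate, and the toy data set are additions for illustration.

```python
def step(z):
    """Step function: +1 if z >= 0, -1 if z < 0."""
    return 1 if z >= 0 else -1


def predict(x, w, b):
    """Perceptron output: weighted sum followed by the step function."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(xs, ys, n_epochs=20, lr=1.0):
    """Perceptron learning rule: on each mistake, move w and b towards the true label."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(n_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if predict(x, w, b) != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


# Hypothetical linearly separable data with three input features.
xs = [(2.0, 1.0, 0.0), (1.0, 2.0, 1.0), (-1.0, -1.0, 0.0), (-2.0, 0.0, -1.0)]
ys = [1, 1, -1, -1]

w, b = train_perceptron(xs, ys)
print(w, b)
```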

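E.g., Cross-Entropy Loss for Multi-Class Classification

[Figure: input x = (x_1, x_2, x_3) passes through weights W_1, W_2, W_3 and summing units Σ to give scores z_1, z_2, z_3, which the softmax converts to outputs ŷ_1, ŷ_2, ŷ_3 (K = 3).]

"Softmax" function: normalizing function which converts each class output to a probability.

$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} = P(y_i = 1 \mid x)$

Cross-entropy loss:

$E_W(\hat{y}, y) = \text{loss} = -\sum_{j=1}^{K} y_j \ln \hat{y}_j$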
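A minimal sketch of the softmax and the cross-entropy loss for K = 3 classes. The scores `z` and the one-hot label `y` are hypothetical; subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math


def softmax(z):
    """Convert K class scores into K probabilities that sum to 1."""
    m = max(z)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_hat, y):
    """E(y_hat, y) = - sum_j y_j * ln(y_hat_j), for a one-hot label vector y."""
    return -sum(yj * math.log(pj) for yj, pj in zip(y, y_hat) if yj > 0)


# Hypothetical scores z_1..z_3 and a one-hot label saying the true class is class 1.
z = [2.0, 1.0, 0.1]
y = [1, 0, 0]

y_hat = softmax(z)
loss = cross_entropy(y_hat, y)
print(y_hat, loss)
```

With a one-hot label, the loss reduces to minus the log of the probability assigned to the true class.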

"Block View"

[Figure: x → *W_1 → z_1 → h_1 (1st hidden layer) → *W_2 → z_2 → h_2 (2nd hidden layer) → *W_3 → z_3 (output layer) → "Softmax" → ŷ → Loss Module E(ŷ, y)]

Building Deep Neural Nets

[Figure: x → f → y. Source: http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf]

Training Neural Networks

How do we learn the optimal weights W_L for our task?
● Stochastic gradient descent:

$W_L^{t} = W_L^{t-1} - \eta \frac{\partial E}{\partial W_L}$

(LeCun et al., Efficient Backpropagation, 1998)

But how do we get gradients of lower layers?
● Backpropagation!
  ○ Repeated application of chain rule of calculus
  ○ Locally minimize the objective
  ○ Requires all "blocks" of the network to be differentiable
  – Main idea: error in hidden layers

[Figure: x → W_1 → W_2 → W_3 → ŷ, with loss E_W(ŷ, y)]
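The chain-rule bookkeeping can be sketched on a deliberately tiny network. Everything specific here is an assumption for illustration: one hidden layer with two sigmoid units, a single sigmoid output, a squared-error loss E = 0.5 (ŷ - y)^2, no bias terms, and hypothetical weights, input, and learning rate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, W2):
    """Return the hidden activations h1 and the output y_hat."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    h1 = [sigmoid(v) for v in z1]
    z2 = sum(w * h for w, h in zip(W2, h1))
    return h1, sigmoid(z2)


def loss(y_hat, y):
    return 0.5 * (y_hat - y) ** 2


def backward(x, y, W1, W2):
    """Backpropagation: push the output error back through each layer with the chain rule."""
    h1, y_hat = forward(x, W1, W2)
    delta2 = (y_hat - y) * y_hat * (1.0 - y_hat)       # dE/dz2 at the output unit
    grad_W2 = [delta2 * h for h in h1]                 # dE/dW2
    delta1 = [delta2 * W2[j] * h1[j] * (1.0 - h1[j])   # dE/dz1: error in the hidden layer
              for j in range(len(h1))]
    grad_W1 = [[d * xi for xi in x] for d in delta1]   # dE/dW1
    return grad_W1, grad_W2


def sgd_step(x, y, W1, W2, lr):
    """One stochastic gradient descent update: W <- W - lr * dE/dW."""
    g1, g2 = backward(x, y, W1, W2)
    new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
    new_W2 = [w - lr * g for w, g in zip(W2, g2)]
    return new_W1, new_W2


# Hypothetical weights and a single training example.
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [0.5, -0.6]
x, y = [1.0, 2.0], 1.0

before = loss(forward(x, W1, W2)[1], y)
W1_new, W2_new = sgd_step(x, y, W1, W2, lr=0.5)
after = loss(forward(x, W1_new, W2_new)[1], y)
print(before, after)
```

Note that `delta1` is computed from `delta2`: the gradient of a lower layer reuses the error already computed for the layer above it.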

Illustrating Objective Loss Function (extremely simplified) and Gradient Descent (2D case)

[Figure: loss surface E_W plotted over two weights W_1 and W_2, i.e. $E_{\{x_i, y_i\}}(W_1, W_2)$]

The gradient points in the direction (in the variable space) of the greatest rate of increase of the function, and its magnitude is the slope of the surface graph in that direction.

Adapted from the NIPS 2017 DL Trend Tutorial
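A sketch of gradient descent in the 2D case. The quadratic bowl used as the loss surface, the starting point, the learning rate, and the number of steps are all hypothetical choices; a real network's loss surface is not this simple.

```python
def E(w1, w2):
    """Hypothetical loss surface: a bowl with its minimum at (1, -2)."""
    return (w1 - 1.0) ** 2 + 2.0 * (w2 + 2.0) ** 2


def grad_E(w1, w2):
    """Gradient of E: points in the direction of steepest increase."""
    return 2.0 * (w1 - 1.0), 4.0 * (w2 + 2.0)


def gradient_descent(w1, w2, lr=0.1, n_steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    for _ in range(n_steps):
        g1, g2 = grad_E(w1, w2)
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return w1, w2


start = (-3.0, 3.0)
w1, w2 = gradient_descent(*start)
print(w1, w2, E(w1, w2))
```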

Important Block: Convolutional Neural Networks (CNN)

• Prof. Yann LeCun invented CNN in 1998
• First NN successfully trained with many layers

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11):2278–2324, 1998.

CNN models Locality and Translation Invariance

Make fully-connected layer locally-connected and sharing weight
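Local connectivity and weight sharing can be sketched with a 1D convolution: every output looks at only a small window of the input, and the same kernel weights are reused at every position. The signal, the kernel, and the final max-pooling step are hypothetical choices for illustration; as in most CNN libraries, the operation below is technically a cross-correlation.

```python
def conv1d(x, kernel):
    """Slide one shared kernel over x; each output depends only on a local window."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


kernel = [1, 0, -1]                    # shared weights, reused at every position
x = [0, 0, 1, 2, 1, 0, 0, 0]           # a small bump in the signal
x_shifted = [0, 0, 0, 1, 2, 1, 0, 0]   # the same bump, moved one step to the right

out = conv1d(x, kernel)
out_shifted = conv1d(x_shifted, kernel)

print(out)            # prints [-1, -2, 0, 2, 1, 0]
print(out_shifted)    # prints [0, -1, -2, 0, 2, 1]
print(max(out), max(out_shifted))  # prints 2 2
```

Shifting the input shifts the feature map by the same amount, and pooling over positions (here a global max) then gives the same response wherever the pattern occurs.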

Important Block: Recurrent Neural Networks (RNN)

• Prof. Schmidhuber invented "Long short-term memory" – Recurrent NN (LSTM-RNN) model in 1997

Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9(8):1735–1780.

Image credits: Christopher Olah

RNN models dynamic temporal dependency

Image credit: WildML

• Make fully-connected layer model each unit recurrently
• Units form a directed chain graph along a sequence
• Each unit uses recent history and current input in modeling
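The recurrence described in the bullets can be sketched with a plain (vanilla) RNN cell. This is not an LSTM: the tanh update, the weight names `W_xh`, `W_hh`, `b_h`, and all numeric values are assumptions chosen for illustration.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One unit in the chain: combine the current input with the previous hidden state."""
    h_new = []
    for i in range(len(b_h)):
        from_input = sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        from_history = sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(from_input + from_history + b_h[i]))
    return h_new


def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Run the same cell along the sequence, passing the hidden state forward."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Hypothetical parameters: 1-dimensional inputs, 2-dimensional hidden state.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]
h0 = [0.0, 0.0]

xs = [[1.0], [0.0], [-1.0]]
states = rnn_forward(xs, h0, W_xh, W_hh, b_h)
states_reversed = rnn_forward(xs[::-1], h0, W_xh, W_hh, b_h)
print(states[-1], states_reversed[-1])
```

Feeding the same inputs in reverse order ends in a different hidden state, which is the sense in which the model captures temporal dependency.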

LSTM for Machine Translation (German to English)


Many classification models invented since late 80's

• Neural networks
• Boosting
• Support Vector Machine
• Maximum Entropy
• Random Forest
• ……


Deep Learning (CNN) in the 90's

• Prof. Yann LeCun invented Convolutional Neural Networks (CNN) in 1998
• First NN successfully trained with many layers

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11):2278–2324, 1998.

Deep Learning (RNN) in the 90's

• Prof. Schmidhuber invented "Long short-term memory" – Recurrent NN (LSTM-RNN) model in 1997

Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9(8):1735–1780.

Image credits: Christopher Olah

Between ~2000 to ~2011: Machine Learning Field Interest

• Learning with Structures! + Convex Formulation!
  • Kernel learning
  • Manifold Learning
  • Sparse Learning
  • Structured input-output learning …
  • Graphical model
  • Transfer Learning
  • Semi-supervised
  • Matrix factorization
  • ……


"Winter of Neural Networks" Since 90's to ~2011

• Non-convex
• Need a lot of tricks to play with
  • How many layers?
  • How many hidden units per layer?
  • What topology among layers? …
• Hard to perform theoretical analysis


Breakthrough in 2012 Large-Scale Visual Recognition Challenge (ImageNet): Milestones in Recent Vision/AI Fields

- 2013, Google acquired Deep Neural Networks company headed by UToronto "Deep Learning" Professor Hinton
- 2013, Facebook built new Artificial Intelligence Lab headed by NYU "Deep Learning" Professor LeCun
- 2016, Google's DeepMind defeats legendary Go player Lee Se-dol in historic victory / 2017 AlphaZero

Reason: Plenty of (Labeled) Data

• Text: trillions of words of English + other languages
• Visual: billions of images and videos
• Audio: thousands of hours of speech per day
• User activity: queries, user page clicks, map requests, etc.
• Knowledge graph: billions of labeled relational triplets
• ………

(From Dr. Jeff Dean's talk)

Reason: Advanced Computer Architecture that fits DNNs


http://www.nvidia.com/content/events/geoInt2015/LBrown_DL.pdf

Some Recent Trends

• 1. Autoencoder / layer-wise training
• 2. CNN / Residual / Dynamic parameter
• 3. RNN / Attention / Seq2Seq, …
• 4. Neural Architecture with explicit Memory
• 5. NTM 4 program induction / sequential decisions
• 6. Learning to optimize / Learning DNN architectures
• 7. Learning to learn / meta-learning / few-shots
• 8. DNN on graphs / trees / sets
• 9. Deep Generative models, e.g., autoregressive
• 10. Generative Adversarial Networks (GAN)
• 11. Deep reinforcement learning
• 12. Validate / Evade / Test / Understand / Verify DNNs

Recap

[Figure: Learned Models]

BREAK 5 mins -> Second Half

