Deep Learning and Application in Bioinformatics

Deep Learning and Application in… · Python libraries for implementation. Example: celltype predictor ... Better alignments of regulatory sequences is helpful for feature detection.

Sep 27, 2020


dariahiddleston
Page 1

Deep Learning and Application in Bioinformatics

Page 2

Neural Network

Picture by me and Google AutoDraw

The human brain is the most sophisticated intelligent system known so far. Can we create algorithms that model the brain's neural network?

Page 3

Neural Network
• Invented to mirror the function of the brain.
• Two resurgences:
  • 1980s: development of backpropagation
  • 2000s:
    • Improved design: CNN, RNN, GAN, ...
    • Techniques of training: unsupervised pre-training, ...
    • Increased computing power: GPU computation
    • Big data
    • Getting a fancy name: Deep Learning

Deep learning: a series of techniques to construct neural networks and to facilitate their learning processes.

Page 4

Neuron Model

[Diagram: inputs x1 and x2, with weights w1 and w2, feed a neuron that computes z and outputs g(z)]

z = w1*x1 + w2*x2 + b

w1, w2: weights
b: bias
g(z): activation function (g can be any form of activation function)

Page 5

Neuron Model

[Diagram: inputs x1 and x2, with weights w1 and w2, feed a neuron with output y]

y = g(z) = g(w1*x1 + w2*x2 + b)

Page 6

Neuron Model

[Diagram: a neuron with inputs x1, x2, weights w1, w2 and output y, next to a plot of a sigmoid-shaped activation g(z): g(0) = 0.5, g(z) > 0.99 at z = 5, g(z) < 0.01 at z = -5]
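
The neuron model and the three values read off the activation plot can be checked with a few lines of Python. This is a minimal sketch that assumes the plotted activation is the logistic sigmoid; the function names `sigmoid` and `neuron` are illustrative and do not come from the slides.

```python
import math

def sigmoid(z):
    """Logistic activation: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x1, x2, w1, w2, b, g=sigmoid):
    """Single neuron: weighted sum of the inputs plus bias, passed through g."""
    z = w1 * x1 + w2 * x2 + b
    return g(z)

# The activation saturates quickly: compare with the values on the plot.
for z in (-5, 0, 5):
    print(z, round(sigmoid(z), 4))
```

Because the output is already below 0.01 or above 0.99 once |z| reaches 5, weights of moderate size are enough to make a neuron behave like a logic gate.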

Page 7

Neuron Model: AND Logic

x1 = 0 or 1, x2 = 0 or 1
y = 1, if x1 = x2 = 1
y = 0, otherwise

y = Sigmoid(w1*x1 + w2*x2 + b)

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

Page 8

Neuron Model: AND Logic

x1 = 0 or 1, x2 = 0 or 1
y = 1, if x1 = x2 = 1
y = 0, otherwise

y = Sigmoid(w1*x1 + w2*x2 + b), with w1 = 10, w2 = 10, b = -15

x1  x2  Wx+b  y
0   0   -15   0
0   1   -5    0
1   0   -5    0
1   1   5     1

Page 9

Neuron Model: NOR Logic

x1 = 0 or 1, x2 = 0 or 1
y = 1, if x1 = x2 = 0
y = 0, otherwise

y = Sigmoid(w1*x1 + w2*x2 + b), with w1 = -40, w2 = -35, b = 25

x1  x2  Wx+b  y
0   0   25    1
0   1   -10   0
1   0   -15   0
1   1   -50   0

Page 10

Neuron Model: OR Logic

x1 = 0 or 1, x2 = 0 or 1
y = 0, if x1 = x2 = 0
y = 1, otherwise

y = Sigmoid(w1*x1 + w2*x2 + b), with w1 = 22, w2 = 18, b = -12

x1  x2  Wx+b  y
0   0   -12   0
0   1   6     1
1   0   10    1
1   1   28    1
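
The AND, NOR and OR truth tables can be reproduced directly from the weights and biases given on the slides. The sketch below is illustrative only: the `GATES` dictionary and the `gate_table` helper are names introduced here, and the output y is rounded to 0 or 1 the same way the slides report it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# (w1, w2, b) as given on the AND, NOR and OR slides
GATES = {
    "AND": (10, 10, -15),
    "NOR": (-40, -35, 25),
    "OR": (22, 18, -12),
}

def gate_table(name):
    """Return rows (x1, x2, Wx+b, y) with y rounded to 0 or 1."""
    w1, w2, b = GATES[name]
    rows = []
    for x1 in (0, 1):
        for x2 in (0, 1):
            z = w1 * x1 + w2 * x2 + b
            rows.append((x1, x2, z, round(sigmoid(z))))
    return rows

for name in GATES:
    print(name)
    for row in gate_table(name):
        print("  ", row)
```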

Page 11

Neuron Model: XNOR Logic

x1 = 0 or 1, x2 = 0 or 1
y = 1, if x1 = x2 = 0 or x1 = x2 = 1
y = 0, otherwise

y = Sigmoid(w1*x1 + w2*x2 + b)

x1  x2  y
0   0   1
0   1   0
1   0   0
1   1   1

This is impossible with one single neuron!

Page 12

Neural Network: XNOR Logic

x1  x2  y
0   0   1
0   1   0
1   0   0
1   1   1

[Diagram: x1 and x2 feed two hidden units, h1 (AND) and h2 (NOR); h1 and h2 feed the output y (OR)]

x1  x2  h1  h2  y
0   0   0   1   1
0   1   0   0   0
1   0   0   0   0
1   1   1   0   1

Neural networks can approximate complex functions by adding hidden layers.
Universal approximation theorem: a NN with a single hidden layer and a finite number of parameters can approximate any continuous function on a bounded input range to arbitrary accuracy.
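
As a rough check of the XNOR construction, the three single-neuron gates can be wired together in code. This sketch reuses the weights from the AND, NOR and OR slides; the function names are made up for illustration, and the final rounding to 0 or 1 is an assumption about how the outputs are read.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(a, b, w1, w2, bias):
    return sigmoid(w1 * a + w2 * b + bias)

def xnor(x1, x2):
    """Two-layer network: hidden units AND and NOR, output unit OR."""
    h1 = neuron(x1, x2, 10, 10, -15)    # AND
    h2 = neuron(x1, x2, -40, -35, 25)   # NOR
    return neuron(h1, h2, 22, 18, -12)  # OR of the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
```

The hidden units are not exactly 0 or 1, but they are close enough that the OR neuron still lands on the correct side of 0.5 for every input.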

Page 13

Neural Network: Hidden Layers

Example of a feedforward neural network

[Diagram: feature extraction across the hidden layers, labelled “Abstract” and “Concrete”]

Page 14

Neural Network: Hidden Layers

Hidden layers are usually hard to explain.

Yann LeCun, Facebook AI Research, father of the convolutional neural network (CNN)

Page 15

Neural Network: Hidden Layers

Yes!

Example: “Is this an 8?”

Page 16

Neural Network: Hidden Layers

No!

Page 17

Neural Network: Hidden Layers

The deeper, the better? How deep is “deep”?

Page 18

Training NN: How Does A NN Learn?

[Diagram: training data → neural net → output → evaluation (cost) → update parameters → back to the neural net]

Page 19

Training NN: Cost Function

Cost of classification models
● Binary
  – One sample:
  – Many samples:
  – Regularization term:
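
The formulas on this slide were images and are missing from the transcript. A common choice, assumed here rather than taken from the slide, is binary cross-entropy with an L2 penalty, where \hat{y} is the network output, y the 0/1 label, m the number of samples, w_j the weights, and \lambda a hypothetical regularization strength:

```latex
\begin{align}
\text{One sample:}\quad & L(\hat{y}, y) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr] \\
\text{Many samples:}\quad & J(w) = \frac{1}{m} \sum_{i=1}^{m} L\bigl(\hat{y}^{(i)}, y^{(i)}\bigr) \\
\text{Regularization term:}\quad & \frac{\lambda}{2m} \sum_{j} w_j^{2}
\end{align}
```

The total cost is the sum of the many-sample loss and the regularization term.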

Page 20

Training NN: Cost Function

Why regularization?

Slides from Andrew Ng

Page 21

Training NN: Cost Function

Cost of classification models
● Binary
  – Loss of incorrect predictions: making your model more accurate
  – Loss of model complexity: prevents overfitting

Page 22

Training NN: Cost Function

Cost of classification models
● Multi-class classification
  – Loss of incorrect predictions: making your model more accurate
  – Loss of model complexity: prevents overfitting
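
The two parts of the cost can be made concrete with a small sketch. The slide's own multi-class formula is not in the transcript, so this assumes the usual softmax cross-entropy for the prediction loss and an L2 penalty for model complexity; `lam` and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Loss of one sample: -log of the probability given to the true class."""
    return -math.log(probs[label])

def cost(score_rows, labels, weights, lam):
    """Average prediction loss plus an L2 penalty on the weights."""
    m = len(labels)
    data_loss = sum(cross_entropy(softmax(s), y)
                    for s, y in zip(score_rows, labels)) / m
    complexity = lam / (2 * m) * sum(w * w for w in weights)
    return data_loss + complexity

# Two samples, three classes, a flat list of (hypothetical) weights
print(cost([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]], [0, 2], [0.5, -1.2, 0.3], lam=0.1))
```

Raising `lam` shifts the balance from fitting the training labels towards keeping the weights small.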

Page 23

Training NN: Gradient Descent

Idea: minimize the cost function.
J(w) decreases fastest when w moves in the direction of the negative gradient.

w_new = w_old - α * ∂J/∂w

w_new: updated w
w_old: old w
α: learning rate
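
A toy example shows the update rule at work. The cost J(w) = (w - 3)^2 and the helper name `gradient_descent` are chosen purely for illustration; they are not from the slides.

```python
def gradient_descent(grad, w, learning_rate=0.1, steps=100):
    """Repeatedly move w a small step against the gradient of the cost."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)  # updated w = old w - learning rate * gradient
    return w

# Toy cost J(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
print(w_final)
```

With this cost, each step shrinks the distance to the minimum by a factor of 0.8; a learning rate above 1.0 would overshoot and diverge.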

Page 24

Training NN: Backpropagation

With multiple hidden layers, it's hard to get an analytic form of a neural net, let alone its gradient. Backpropagation computes the gradient by applying the chain rule layer by layer, from the output back to the input.

[Diagram: inputs x1, x2 → hidden units h1, h2 → output y]

Step 1: Forward propagation

Page 25

Training NN: Backpropagation

Step 2: Calculate error of y

Page 26

Training NN: Backpropagation

Step 3: Calculate gradients of edges connected to y

Page 27

Training NN: Backpropagation

Step 4: Calculate errors of hidden units

Page 28

Training NN: Backpropagation

Step 5: Calculate gradients of edges connected to the hidden layer
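
The five steps can be sketched for the small network in the diagram (two inputs, two hidden units, one output). This is an illustrative sketch, not the author's code: it assumes sigmoid activations and a cross-entropy loss, for which the output error is simply y - t, and the parameter values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    """Step 1: forward propagation through the 2-2-1 network."""
    h = [sigmoid(p["W1"][i][0] * x[0] + p["W1"][i][1] * x[1] + p["b1"][i])
         for i in range(2)]
    y = sigmoid(p["W2"][0] * h[0] + p["W2"][1] * h[1] + p["b2"])
    return h, y

def loss(x, t, p):
    """Binary cross-entropy between the output y and the target t."""
    _, y = forward(x, p)
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

def backprop(x, t, p):
    h, y = forward(x, p)                           # step 1
    d_y = y - t                                    # step 2: error of y
    g_W2 = [d_y * h[i] for i in range(2)]          # step 3: edges into y
    g_b2 = d_y
    d_h = [p["W2"][i] * d_y * h[i] * (1 - h[i])    # step 4: errors of hidden units
           for i in range(2)]
    g_W1 = [[d_h[i] * x[j] for j in range(2)]      # step 5: edges into the hidden layer
            for i in range(2)]
    g_b1 = d_h
    return {"W1": g_W1, "b1": g_b1, "W2": g_W2, "b2": g_b2}

# Arbitrary example parameters and one training sample
params = {"W1": [[0.5, -0.3], [0.8, 0.2]], "b1": [0.1, -0.1],
          "W2": [1.0, -1.5], "b2": 0.05}
x, t = [1.0, 0.5], 1
grads = backprop(x, t, params)
print(grads)
```

A useful sanity check is to nudge one weight by a tiny amount, recompute `loss`, and compare the resulting slope with the matching entry in `grads`; the two should agree to several decimal places.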

Page 29

Training NN: A Bag of Tricks (Geoffrey Hinton)

• Unsupervised pre-training: better initial parameters
• Momentum method: more efficient updates
• Batch normalization: prevents vanishing/exploding gradients
• Stochastic gradient descent: deals with large datasets
• Dropout: prevents overfitting
• Early termination: prevents overfitting
• ...
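
Of the tricks listed, dropout is the easiest to show in a few lines. The sketch below uses the common "inverted dropout" variant, which is an assumption here since the slide does not say which form is meant; the function name and the `rng` parameter are illustrative.

```python
import random

def dropout(activations, rate, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged,
    so nothing needs to be adjusted at prediction time.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the example is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0], rate=0.2, rng=rng))
```

Dropout is applied only during training; at prediction time the activations are passed through unchanged.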

Page 30

Python libraries for implementation

Page 31

Example: celltype predictor

Single-cell RNA-seq data from 10x Genomics
PBMC sample from healthy donors

Page 32

Example: celltype predictor

Single-cell RNA-seq data from 10x Genomics
PBMC sample from healthy donors

Feedforward neural net with two hidden layers
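
The shape of such a predictor can be sketched in plain Python. Everything specific below is an assumption made for illustration: the layer sizes, the ReLU hidden activation, the softmax output, and the random untrained weights. The slides do not state which library or architecture details were actually used.

```python
import math
import random

def dense(x, W, b):
    """One fully connected layer: W @ x + b, written with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def predict(expression, layers):
    """Gene expression vector -> two hidden layers -> cell type probabilities."""
    (W1, b1), (W2, b2), (W3, b3) = layers
    h1 = relu(dense(expression, W1, b1))
    h2 = relu(dense(h1, W2, b2))
    return softmax(dense(h2, W3, b3))

rng = random.Random(0)
n_genes, n_celltypes = 20, 4  # hypothetical sizes, far smaller than real data
layers = [init_layer(n_genes, 16, rng),
          init_layer(16, 8, rng),
          init_layer(8, n_celltypes, rng)]
cell = [rng.random() for _ in range(n_genes)]  # stand-in for one cell's expression
probs = predict(cell, layers)
print(probs)
```

In practice the weights would be learned with gradient descent and backpropagation, and a deep learning library would replace the hand-written layers.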

Page 33

Types of NN: Autoencoder

Encoder

Decoder

Hinton & Salakhutdinov, Science, 2006

Dimension reduction by autoencoder

Page 34

Types of NN: Generative Adversarial Networks (GAN)

Page 35

Types of NN: Convolutional Neural Net (CNN)

Visual system

CNN is the most powerful approach for image recognition so far.

The V1 cortex cell tested in this experiment was active only in response to one simple pattern.
Many identical cells, each connected to a different part of the retina, detect the same pattern.

Page 36

Convolutional Neural Net

Why not fully connected?

[Diagram: a 50 x 50 layer fully connected to a 20 x 20 layer]

Parameters for one single layer: 50*50*20*20 = 1 million!

Page 37

Convolutional Neural Net

[Diagram: a 3 x 3 matrix, called a “filter”, slides over the 50 x 50 input to produce a feature map]

Page 38

Convolutional Neural Net

[Diagram: 50 x 50 input with three example 3 x 3 matrices]

0 0 1     1 0 1     0 0 1
0 1 0     0 1 0     0 1 0
0 1 0     1 0 1     1 0 0

Page 39

Convolutional Neural Net

Averaging neighbors blurs the figure.
Taking the difference with neighbors detects edges.
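
Both effects can be seen on a tiny image. The sketch below slides a 3 x 3 filter over the input without padding; as in most deep learning libraries, the filter is not flipped, so strictly speaking this is cross-correlation. The `BLUR` and `EDGE` filters and the test image are illustrative choices, not taken from the slides.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

BLUR = [[1 / 9] * 3 for _ in range(3)]             # average of the 3 x 3 neighborhood
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]   # center minus its neighbors

# 5 x 6 image: dark left half (0), bright right half (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

edges = convolve2d(image, EDGE)
blurred = convolve2d(image, BLUR)
print(edges[0])    # non-zero only where the window straddles the boundary
print(blurred[0])  # the sharp 0 -> 1 step becomes a gradual ramp
```

The same nine weights are reused at every position, which is why a convolutional layer needs far fewer parameters than a fully connected one.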

Page 40

Convolutional Neural Net

Pooling:
1. Reduces dimensions
2. Allows positional variation
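
Both properties show up in a small example. The slide does not say which kind of pooling is meant, so this sketch assumes non-overlapping max pooling; the function name and the two toy feature maps are illustrative.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]

# The same strong signal at two slightly different positions
a = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

print(max_pool(a))  # 4 x 4 input reduced to 2 x 2
print(max_pool(b))  # same output although the signal moved by one pixel
```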

Page 41

Convolutional Neural Net

Page 42

Application 1: epigenome reader

[Diagram: a chromosome with epigenomic marker tracks, regions of interest, and the target signal]

Page 43

Application 1: epigenome reader

CNN architecture:
• Convolution (k=20, w=4), Pooling (w=4)
• Convolution (k=50, w=2), Pooling (w=4)
• Convolution (k=20, w=1)
• Fully connected (n=50)
• Sigmoid output (n=2)

Regularization parameters: dropout proportion
• Layer 2: 20%
• Layer 4: 20%
• Layer 5: 40%
• All other layers: 0%

           Training  Validation  Testing
Accuracy   93.1%     93.7%       92.1%

Page 44

Application 2: DNA motif detector

Input transformation

Raw sequence:

   A A T T C C G G

Transformed features:

A  1 1 0 0 0 0 0 0
T  0 0 1 1 0 0 0 0
C  0 0 0 0 1 1 0 0
G  0 0 0 0 0 0 1 1
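
The input transformation is a one-hot encoding of the sequence, with one row per base. A minimal sketch, assuming the row order A, T, C, G and using an illustrative function name:

```python
BASES = "ATCG"  # row order assumed for the transformed features

def one_hot(sequence):
    """Return a 4 x len(sequence) matrix; row i marks where BASES[i] occurs."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(BASES)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return [[1 if base == row_base else 0 for base in sequence]
            for row_base in BASES]

for row_base, row in zip(BASES, one_hot("AATTCCGG")):
    print(row_base, *row)
```

Each column contains exactly one 1, so a convolution filter spanning all four rows can score how well a stretch of sequence matches a motif.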

Page 45

Simulation

Simulation_1: learning motif sequence

Positive: AATTCCGG at the mid point of a 100bp sequence
Negative: NNNNNNNN

Fixed position    Training  Testing  Validation
No mutation       0.00%     0.05%    0.00%
2/8 mutations     0.57%     0.65%    0.57%
4/8 mutations     5.31%     5.90%    6.95%
6/8 mutations     47.52%    47.2%    49.98%
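
Simulated data of this kind is straightforward to generate. The slide does not describe how the sequences or the mutations were produced, so the sketch below is only one plausible reading: positives carry the motif (optionally with k of its 8 bases changed) centred in a random 100bp background, and negatives are fully random. All function names and defaults are hypothetical.

```python
import random

BASES = "ATCG"
MOTIF = "AATTCCGG"

def random_sequence(length, rng):
    return "".join(rng.choice(BASES) for _ in range(length))

def mutate(motif, n_mutations, rng):
    """Replace n_mutations distinct positions with a different base."""
    chars = list(motif)
    for pos in rng.sample(range(len(chars)), n_mutations):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)

def positive_sample(rng, length=100, n_mutations=0):
    """Random background with the (possibly mutated) motif centred on the mid point."""
    start = (length - len(MOTIF)) // 2
    background = random_sequence(length, rng)
    motif = mutate(MOTIF, n_mutations, rng)
    return background[:start] + motif + background[start + len(MOTIF):]

def negative_sample(rng, length=100):
    """Fully random sequence (may contain the motif by chance, though rarely)."""
    return random_sequence(length, rng)

rng = random.Random(1)
print(positive_sample(rng))
print(positive_sample(rng, n_mutations=2))
print(negative_sample(rng))
```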

Page 46

Simulation

Simulation_1.1: learning motif sequence and detecting mutations
(negative samples with one mutation in motif)

Positive: AATTCCGG at the mid point of a 100bp sequence
Negative: AATGCCGG

Fixed position          Training  Testing  Validation
1/8 mutation            0.07%     0.00%    0.08%
1/8 mutation [25, 75]   0.07%     0.00%    0.13%

Page 47

Simulation

Simulation_2: learning motif position

Positive: AATTCCGG at the mid point of a 100bp sequence
Negative: AATTCCGG with a random shift

Position                                               Training  Testing  Validation
Positive fixed core, negative randomly-shifted core    0.03%     0.00%    0.10%
Positive core in region, negative core out of region   0.04%     0.00%    0.17%

Page 48

Simulation

Simulation_3: testing positional flexibility

[Diagram: positive and negative samples; AATTCCGG shifts within a specified region, marked ( ), of a 100bp sequence]

Flexibility          Training  Testing  Validation
Region size: 50bp    0.14%     0.03%    0.03%
Region size: 100bp   0.11%     0.08%    0.15%

Conclusion from 1-3: CNN is able to learn both sequence and positional information, while allowing positional flexibility.

Page 49

Simulation

Simulation_4: mixture

[Diagram: positive and negative samples; AATTCCGG shifts within a specified region, marked ( ), of a 100bp sequence]

                                                Training    Testing     Validation
Mixture: 2/8 mutations + 50bp flexible region   9.608333%   9.975000%   9.125000%
Fixed position: 2/8 mutations                   0.57%       0.65%       0.57%

Better alignment of regulatory sequences helps feature detection.

Page 50

Simulation

Simulation_5: learning multiple motifs

[Diagram: positive and negative samples; motif at the mid point of a 100bp sequence]

Multiple motifs                           Training  Testing  Validation
10                                        0.13%     0.05%    0.20%
20                                        0.74%     0.60%    0.82%
40                                        1.25%     1.25%    1.93%
80                                        28.76%    22.15%   23.05%
20 motifs + 50bp region + 1/8 mutations   39.80%    43.25%   43.88%

Page 51

Summary

● Artificial intelligence should be better than humans at reading and understanding biological data.

● Implementing deep learning or training a NN is easier than it seems (but harder than understanding it).

● “It's not who has the best algorithm that wins. It's who has the most data.” (Andrew Ng)