Biological sequence analysis and information processing by artificial neural networks Søren Brunak Center for Biological Sequence Analysis Technical University of Denmark [email protected]

Dec 18, 2015


Biological sequence analysis and information processing by artificial neural networks

Søren Brunak

Center for Biological Sequence Analysis

Technical University of Denmark

[email protected]


Pairwise alignment

>carp Cyprinus carpio growth hormone 210 aa vs.

>chicken Gallus gallus growth hormone 216 aa

scoring matrix: BLOSUM50, gap penalties: -12/-2

40.6% identity; Global alignment score: 487

10 20 30 40 50 60 70

carp MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD

:: . : ...:.: . : :. . :: :::.:.:::: :::. ..:: . .::..: .: .:: :.

chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

10 20 30 40 50 60 70 80

80 90 100 110 120 130 140 150

carp YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN

: ::.:::..:..: ..:::.:. ::.:: : : ::. .:.:. :. ... ::: ::. ::..:.. : .: .

chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

90 100 110 120 130 140 150 160

170 180 190 200 210

carp DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL

.: : .. : . . .:. : ... ::.:::::.:::::::.: .::: .::::.

chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI

170 180 190 200 210
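The identity figure quoted for the alignment can be related to the aligned rows with a small calculation. The sketch below is only illustrative: the function name `percent_identity` is hypothetical, and dividing by the number of alignment columns is an assumption about the denominator, which may differ from what the alignment program used. The sample data are the first ten columns of the carp and chicken rows.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same residue in both rows.

    Assumption: the denominator is the number of alignment columns,
    including columns that contain a gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column counts as identical only if both rows hold the same residue
    # and that residue is not the gap symbol.
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * matches / len(row_a)


# First ten columns of the alignment shown on the slide.
carp_prefix = "MA--RVLVLL"
chicken_prefix = "MAPGSWFSPL"
print(percent_identity(carp_prefix, chicken_prefix))
```

Applied to complete aligned rows, the same function gives a whole-alignment identity under the stated assumption.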


Biological neuron


Diversity of interactions in a network enables complex calculations

• Similar in biological and artificial systems

• Excitatory (+) and inhibitory (-) relations between compute units
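A small hand-wired network shows how mixing excitatory and inhibitory connections lets simple units compute something none of them could compute alone. The weights and thresholds below are chosen by hand for illustration and are not taken from the slides. The example computes exclusive-or, which a single threshold unit cannot represent.

```python
def step(x: float) -> int:
    """Threshold activation: the unit fires (1) when its summed input is positive."""
    return 1 if x > 0 else 0


def unit(inputs, weights, bias):
    """One compute unit: weighted sum of its inputs plus a bias, then a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1: int, x2: int) -> int:
    # Hidden unit that fires when at least one input is on.
    h_or = unit([x1, x2], [1.0, 1.0], -0.5)
    # Hidden unit that fires only when both inputs are on.
    h_and = unit([x1, x2], [1.0, 1.0], -1.5)
    # Output unit: excitatory (+1) connection from h_or,
    # inhibitory (-1) connection from h_and.
    return unit([h_or, h_and], [1.0, -1.0], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Removing the inhibitory connection turns the output into a plain OR, so the negative weight is what makes the exclusive-or possible here.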


Transfer of biological principles to neural network algorithms

• Non-linear relation between input and output

• Massively parallel information processing

• Data-driven construction of algorithms

• Ability to generalize to new data items


Simplest non-trivial classification problem

CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...

• Two categories: positives and negatives

• Data described by two features, e.g. charge, sidechain volume, molecular weight, number of atoms, ...
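Describing each peptide by two features turns it into a point in a plane. The sketch below uses charge and molecular weight as the pair. The charge rule (K and R count +1, D and E count -1, histidine ignored) and the rounded average residue masses are simplifying assumptions made for this example, and the function names are hypothetical.

```python
# Assumption: rounded average residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # one water is added back for the free peptide termini

POSITIVE = set("KR")  # assumption: histidine is treated as neutral
NEGATIVE = set("DE")


def net_charge(peptide: str) -> int:
    """Count of positively charged residues minus negatively charged ones."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in peptide)


def molecular_weight(peptide: str) -> float:
    """Approximate peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def features(peptide: str):
    """Two-dimensional description of a peptide: (charge, molecular weight)."""
    return net_charge(peptide), molecular_weight(peptide)


for pep in ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]:
    print(pep, features(pep))
```

Any other pair from the slide, such as sidechain volume or number of atoms, could be substituted by swapping in a different lookup table.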


Features of phosphorylation sites

• PKG: cGMP-dep. kinase

• PKC

• CaM-II: Ca++/calmodulin-dep. kinase

• cdc2: Cyclin-dep. kinase 2

• CK-II: Casein kinase 2


Homotypical cerebral cortex (from primate), 6 layers


DEMO


Training and error reduction

negative / positive
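One simple way to picture training as error reduction is the classical perceptron rule on two-feature data: each misclassified example nudges the decision line toward the correct side, and the number of errors per pass is tracked. This is a sketch under assumptions. The slides do not say which training rule the demo uses, and the toy points, learning rate, and function names below are invented for illustration.

```python
def predict(w, b, x):
    """Classify a two-feature point: 1 (positive) or 0 (negative)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(data, epochs=50, lr=0.1):
    """Perceptron rule: adjust weights only on misclassified examples.

    Returns the weights, the bias, and the error count of each pass.
    """
    w, b = [0.0, 0.0], 0.0
    errors_per_epoch = []
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            delta = target - predict(w, b, x)  # +1, 0 or -1
            if delta != 0:
                errors += 1
                w[0] += lr * delta * x[0]
                w[1] += lr * delta * x[1]
                b += lr * delta
        errors_per_epoch.append(errors)
        if errors == 0:  # every training example is classified correctly
            break
    return w, b, errors_per_epoch


# Hypothetical, linearly separable toy data: (feature1, feature2), label.
toy = [((2.0, 1.0), 1), ((1.5, 2.0), 1), ((3.0, 2.5), 1),
       ((-1.0, -0.5), 0), ((-2.0, 1.0), 0), ((-0.5, -2.0), 0)]

w, b, history = train_perceptron(toy)
print("errors per pass:", history)
```

The error history falling to zero mirrors the idea of the slide: the boundary between negatives and positives is moved until the training errors disappear.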


Transfer of biological principles to neural network algorithms

• Non-linear relation between input and output

• Massively parallel information processing

• Data-driven construction of algorithms


Sparse encoding of amino acid sequence windows
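A sparse (one-hot) encoding gives every residue in a window its own block of input units, exactly one of which is set to 1. The sketch below slides a window along a peptide and encodes each window. The window size, the residue ordering, and the extra padding symbol `X` for positions beyond the sequence ends are assumptions for this example and are not specified on the slide.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical order
PAD = "X"                             # assumption: 21st symbol for window overhang
ALPHABET = AMINO_ACIDS + PAD


def one_hot(residue: str):
    """Return a 21-element vector with a single 1 at the residue's position."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(residue)] = 1  # raises ValueError for unknown symbols
    return vec


def encode_windows(sequence: str, half_width: int = 3):
    """Encode one window per residue, centred on that residue.

    Each window covers 2 * half_width + 1 positions and becomes a flat
    vector of (2 * half_width + 1) * 21 binary inputs.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    encoded = []
    for i in range(len(sequence)):
        window = padded[i:i + size]
        encoded.append([bit for residue in window for bit in one_hot(residue)])
    return encoded


vectors = encode_windows("CNHSYYP", half_width=1)
print(len(vectors), len(vectors[0]), sum(vectors[0]))
```

Each flat vector contains exactly one 1 per window position, so most inputs are zero, which is why the encoding is called sparse.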


Sparse encoding of nucleotide sequence windows

Nucleotides

4 letter alphabet

Normally no need for a fifth letter

ACGTAGGCAATCTCAGACGTTTATC

1000010000100001100000100010010010001000000101000001010010000010100001000010000100010001100000010100
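The bit string above can be reproduced with four bits per nucleotide. The mapping below (A = 1000, C = 0100, G = 0010, T = 0001) is read off the example itself. The function names are hypothetical.

```python
# Mapping inferred from the slide's example: one bit position per letter.
CODE = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}
DECODE = {bits: base for base, bits in CODE.items()}


def encode_dna(sequence: str) -> str:
    """Concatenate the four-bit code of every nucleotide in the window."""
    return "".join(CODE[base] for base in sequence)


def decode_dna(bits: str) -> str:
    """Invert the encoding by reading the bit string four bits at a time."""
    return "".join(DECODE[bits[i:i + 4]] for i in range(0, len(bits), 4))


window = "ACGTAGGCAATCTCAGACGTTTATC"
encoded = encode_dna(window)
print(len(encoded))                   # four bits per nucleotide
print(decode_dna(encoded) == window)  # the encoding is reversible
```

A 25-nucleotide window therefore becomes 100 binary inputs, of which exactly 25 are set.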
