Lecture 10, CS5671: Neural Network Applications

• Problems

• Input transformation

• Network Architectures

• Assessing Performance


Problems

• Deducing the genetic code

• Predicting genes

• Predicting signal peptide cleavage sites


Deducing the genetic code

• Problem: Given a codon, predict the corresponding amino acid
• Of didactic value
  – Trivial mapping table, after the fact
• Perfect classification problem, rather than prediction
  – Solvable with a minimal network
• Learning issues
  – ‘Similar’ codons code for ‘similar’ amino acids
  – Abundance of amino acids is proportional to code redundancy (this and the previous point undermine the effect of mutations)
  – Third-base ‘wobble’
  – N:1 mapping between codons and amino acids


The genetic code

http://molbio.info.nih.gov/molbio/gcode.html

                  Second base
First base    T             C             A             G
    T         TTT Phe (F)   TCT Ser (S)   TAT Tyr (Y)   TGT Cys (C)
              TTC Phe (F)   TCC Ser (S)   TAC Tyr (Y)   TGC Cys (C)
              TTA Leu (L)   TCA Ser (S)   TAA Ter       TGA Ter
              TTG Leu (L)   TCG Ser (S)   TAG Ter       TGG Trp (W)
    C         CTT Leu (L)   CCT Pro (P)   CAT His (H)   CGT Arg (R)
              CTC Leu (L)   CCC Pro (P)   CAC His (H)   CGC Arg (R)
              CTA Leu (L)   CCA Pro (P)   CAA Gln (Q)   CGA Arg (R)
              CTG Leu (L)   CCG Pro (P)   CAG Gln (Q)   CGG Arg (R)
    A         ATT Ile (I)   ACT Thr (T)   AAT Asn (N)   AGT Ser (S)
              ATC Ile (I)   ACC Thr (T)   AAC Asn (N)   AGC Ser (S)
              ATA Ile (I)   ACA Thr (T)   AAA Lys (K)   AGA Arg (R)
              ATG Met (M)   ACG Thr (T)   AAG Lys (K)   AGG Arg (R)
    G         GTT Val (V)   GCT Ala (A)   GAT Asp (D)   GGT Gly (G)
              GTC Val (V)   GCC Ala (A)   GAC Asp (D)   GGC Gly (G)
              GTA Val (V)   GCA Ala (A)   GAA Glu (E)   GGA Gly (G)
              GTG Val (V)   GCG Ala (A)   GAG Glu (E)   GGG Gly (G)

(Ter = termination/stop codon)
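The slide above calls the code a trivial after-the-fact mapping table; a minimal Python sketch of that table (truncated here to the first-base T and C rows) makes the N:1 codon-to-amino-acid mapping and the code's redundancy concrete:

```python
from collections import defaultdict

# Minimal sketch: the genetic code as a literal lookup table.
# Only the first two rows (first base T and C) of the table above are
# included here; "*" stands for Ter (stop).
CODE = {
    "TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",
    "TAT": "Y", "TAC": "Y", "TAA": "*", "TAG": "*",
    "TGT": "C", "TGC": "C", "TGA": "*", "TGG": "W",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
    "CAT": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
}

# Inverting the table exposes the N:1 mapping (code redundancy):
codons_for = defaultdict(list)
for codon, aa in CODE.items():
    codons_for[aa].append(codon)

print(codons_for["L"])  # ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'] (6 codons for Leu)
```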


Network Architecture

• Orthogonal coding (4 units × 3 positions = 12 inputs)
• 2 hidden neurons (Is this a linear or non-linear problem?)
• 20 output neurons
  – Winner takes all
• Total of 86 parameters (How? See the sketch below)
• FFBP (feed-forward network trained with backpropagation)
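A quick check of the 86-parameter count, assuming full connectivity and one bias per hidden and output neuron (an arithmetic sketch, not taken from the slides):

```python
# Parameter count for the 12-2-20 FFBP network described above,
# assuming full connectivity with one bias per hidden/output neuron.
n_in, n_hid, n_out = 4 * 3, 2, 20   # orthogonal coding: 4 bases x 3 codon positions

params = (n_in * n_hid + n_hid) + (n_hid * n_out + n_out)
print(params)                       # (24 + 2) + (40 + 20) = 86
```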


Deducing the genetic code (Fig 6.7)


Deducing the genetic code (Fig 6.8)


Improving classification error

• Learning rate kept high for misclassified codons, low otherwise (in addition to its dependence on iteration number)

• Balanced cycles (Balanced in terms of amino acids, not codons)

• Adaptive training
  – Present misclassified examples more often (a sketch follows)
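One simple way to realize adaptive training (a hedged sketch; the lecture does not specify the exact scheme) is to keep a sampling weight per training example and boost it whenever the network misclassifies that example, so hard codons are presented more often:

```python
import random

def adaptive_epoch(examples, weights, predict, train_step, boost=2.0):
    """One epoch of weight-proportional example sampling.

    examples: list of (codon, amino_acid) pairs
    weights:  parallel list of per-example sampling weights
    predict, train_step: callables supplied by the surrounding training code
    """
    for _ in range(len(examples)):
        i = random.choices(range(len(examples)), weights=weights, k=1)[0]
        x, y = examples[i]
        if predict(x) != y:
            weights[i] *= boost  # misclassified: present it more often next time
        train_step(x, y)
    return weights
```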


Is it a gene or not a gene?

• Approaches depend on
  – Bias at the junctions of coding and non-coding regions
    • Donor sites (5’ end of intron) and acceptor sites (3’ end of intron) have biases in composition (GT [junk]+ (C/U)+ AG; see the toy pattern below)
  – Bias in the composition of coding regions (but not of non-coding regions, e.g., introns)
    • Exons are “regular guys”, introns are “freshman dorm rooms”
    • Seen as GC bias, codon usage frequency and codon bias
  – Inverse relationship between the two (splice site strength and regularity within exons)
    • “A food exit sign on the highway doesn’t need prominent restaurant signs”
    • “A stretch of prominent restaurant signs doesn’t need a sign indicating food”
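As a toy illustration of those junction biases (my simplification, not a real gene-finder pattern): introns start with GT at the donor site, end with AG at the acceptor site, and typically show a pyrimidine-rich tract (C/T at the DNA level) just before the AG:

```python
import re

# Toy candidate-intron pattern: donor GT, then anything ("junk"),
# then a pyrimidine tract of at least 4 C/T, then acceptor AG.
INTRON = re.compile(r"GT[ACGT]{10,}?[CT]{4,}AG")

seq = "ATGGCCGTAAGTACGTACGTACGTTTTCCTCTCTAGGGCTAA"
m = INTRON.search(seq)
if m:
    print("candidate intron:", m.group(), "at", m.span())
```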


Regularity within coding regions (Fig 6.11; panels: Bacteria, Mammals, C. elegans, A. thaliana)


Predicting Exons: The holy GRAIL

• Neural networks for gene prediction
  – Input representation/transformation is key
  – The NN per se is trivial: an MLP with a single hidden layer and a single output neuron
  – Input = coding-region candidate, transformed to the following features (two are sketched in code below):
    • 6-mer (di-codon) score of the candidate region
    • 6-mer (di-codon) score of the flanking regions
    • GC composition of the candidate region
    • GC composition of the flanking region
    • Markov model score
    • Length of the candidate
    • Splice site score
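Two of these features are simple enough to sketch in code; the forms below are assumptions for illustration, not the exact published GRAIL definitions:

```python
def gc_content(seq):
    """GC composition: fraction of G/C bases in the region."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def dicodon_score(seq, log_odds):
    """Sum of in-frame 6-mer (di-codon) log-odds scores (coding vs
    non-coding); higher means the region looks more coding-like."""
    return sum(log_odds.get(seq[i:i + 6], 0.0) for i in range(0, len(seq) - 5, 3))

# log_odds would be estimated from known coding vs non-coding sequence;
# the table and candidate below are hypothetical placeholders.
log_odds = {"ATGGCC": 1.2, "GCCAAA": 0.7}
candidate = "ATGGCCAAAGGG"
print([gc_content(candidate), dicodon_score(candidate, log_odds)])
```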


Signal peptide (SignalP) prediction

• Signal peptides are N-terminal subsequences of proteins that act as “export tags”, including a “dotted line” (the cleavage site) indicating the point of detachment
  – The coding is species-specific
• Problem analogous to exon/intron delineation
  – Distinguish between the signal peptide and the rest of the protein
  – Find the junction between the signal peptide and the rest of the protein


• Two kinds of networks that output, for each sequence position i,
  – S-score: probability that position i belongs to the signal peptide
  – C-score: probability that position i is the junction (cleavage site)
• Key is post-processing: using the S and C scores to come up with the final prediction
• C-score prediction: based on asymmetric windows (why?)
• S-score prediction: based on symmetric windows (why?)
• Y-score: Yi = (Ci · ΔdSi)^(1/2), where ΔdSi is the difference between the average S-scores in windows of size d flanking position i (a numeric sketch follows)
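A small numeric sketch of the Y-score combination (the window size d and the exact windowing are assumptions for illustration): S should drop right after the signal peptide ends and C should peak at the junction, so their combination is sharpest at the cleavage site:

```python
import math

def y_scores(C, S, d=3):
    """Y_i = sqrt(C_i * dS_i), with dS_i = mean S over the d positions
    before i minus the mean over the d positions at/after i (clamped at 0)."""
    Y = [0.0] * len(C)
    for i in range(d, len(C) - d):
        before = sum(S[i - d:i]) / d
        after = sum(S[i:i + d]) / d
        Y[i] = math.sqrt(C[i] * max(before - after, 0.0))
    return Y

# Hypothetical scores: S high inside the signal peptide, C peaking at the junction.
S = [0.9, 0.9, 0.8, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1]
C = [0.0, 0.0, 0.1, 0.1, 0.2, 0.8, 0.1, 0.0, 0.0]
Y = y_scores(C, S)
print("predicted cleavage site at position", Y.index(max(Y)))  # position 5
```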


Signal peptide (SignalP) prediction (Fig 6.5)
