Optimized Numerical Mapping Scheme for Filter-Based Exon Location in DNA Using a Quasi-Newton Algorithm
P. Ramachandran, W.-S. Lu, and A. Antoniou
Department of Electrical Engineering, University of Victoria, BC, Canada
ISCAS 2010, Paris

Dec 21, 2015

Page 1: Optimized Numerical Mapping Scheme for Filter-Based Exon Location in DNA Using a Quasi-Newton Algorithm

P. Ramachandran, W.-S. Lu, and A. Antoniou

Department of Electrical Engineering, University of Victoria, BC, Canada

ISCAS 2010, Paris

Page 2: DNA

The instructions to build and maintain a living organism are encoded in its DNA.

DNA is composed of smaller components called nucleotides, namely, adenine, thymine, guanine, and cytosine (A, T, G, and C).

DNA comprises a pair of strands.

Page 3: DNA (cont’d)

Nucleotides pair up across the two strands.

A always pairs with T and G always pairs with C.

Figure: Symbolic representation of a DNA sequence.
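The pairing rule can be illustrated with a short sketch. The function name `complement` and the sample strand are hypothetical, chosen only for illustration; they do not come from the slides.

```python
# Pairing rule stated on the slide: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the bases that pair with `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())

example = complement("ATGCGT")
print(example)
```

Because the second strand is fully determined by the first, a DNA sequence is usually written as a single string of characters.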

Page 4: Genes

Regions in a genome that code for proteins are called genes.

Page 5: Exons and Introns

Genes are further split into coding regions called exons and noncoding regions called introns.

Page 6: Location of Exons

Accurate location of exons in genomes is very important for understanding life processes.

The power spectra of DNA segments corresponding to exons exhibit a relatively strong component at 2π/3.

This is known as the period-3 property.

Thus, exons can be located by mapping the DNA characters into numbers and then tracking the strength of the period-3 component along the length of the DNA sequence of interest.
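As a rough illustration of the period-3 property, the sketch below maps a character string to numbers and measures the spectral power at 2π/3. The mapping uses the EIIP values tabulated later in the deck; the function name `period3_power`, the normalization by sequence length, and the two synthetic test strings are assumptions made for this example, not part of the presentation.

```python
import cmath
import math

# EIIP values from the table on the "Filter-Based Exon Location Technique" slide.
EIIP = {"A": 0.1260, "T": 0.1335, "G": 0.0806, "C": 0.1340}

def period3_power(dna, weights=EIIP):
    """Squared magnitude of the spectrum at 2*pi/3, divided by the length."""
    w0 = 2.0 * math.pi / 3.0
    total = sum(weights[base] * cmath.exp(-1j * w0 * n)
                for n, base in enumerate(dna.upper()))
    return abs(total) ** 2 / len(dna)

# A synthetic string with an exact period of 3 versus one with a period of 4.
periodic = period3_power("GCA" * 40)
aperiodic = period3_power("GCAT" * 30)
print(periodic, aperiodic)
```

The period-3 string gives a clearly nonzero value, whereas the period-4 string of the same length gives essentially zero. Real exons are far less regular than the synthetic string, so the measured component is weaker in practice.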

Page 7: EIIP Values

Earlier, we used electron-ion interaction potential (EIIP) values in conjunction with a filtering technique for exon location.

Here, we propose the use of an optimized set of nucleotide weights, which we refer to as pseudo-EIIP values, that significantly improve the accuracy of our exon-location technique.

Page 8: Filter-Based Exon Location Technique

1. The DNA character sequence of interest is mapped onto a numerical sequence using EIIP values.

EIIP Values

Nucleotide    EIIP
Adenine       0.1260
Thymine       0.1335
Guanine       0.0806
Cytosine      0.1340

2. A narrowband bandpass digital filter with its passband centered at the period-3 frequency is used to filter the DNA sequence.

Page 9: Filter-Based Technique (cont’d)

3. The filtered output is an amplitude-modulated signal, which is demodulated by filtering its power, $(y[n])^2$, using a lowpass filter.

The exon locations are identified as distinct peaks.

Figure: Exon location system.
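The three steps (mapping, bandpass filtering, power demodulation) can be sketched end to end as follows. The slides do not give the filter designs, so everything specific here is an assumption: the bandpass filter is a simple second-order resonator with poles at radius `r` and angle 2π/3, the lowpass filter is a moving average, and the names `bandpass_period3`, `moving_average`, and `exon_indicator` are hypothetical.

```python
import math
import random

# EIIP values from the table in step 1.
EIIP = {"A": 0.1260, "T": 0.1335, "G": 0.0806, "C": 0.1340}

def map_sequence(dna, weights=EIIP):
    """Step 1: map DNA characters onto a numerical sequence."""
    return [weights[base] for base in dna.upper()]

def bandpass_period3(x, r=0.98):
    """Step 2 (assumed design): resonator with poles at r*exp(+/-j*2*pi/3)
    and zeros at z = +1 and z = -1."""
    a1 = -2.0 * r * math.cos(2.0 * math.pi / 3.0)
    a2 = r * r
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = xn - x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y

def moving_average(p, window=60):
    """Step 3 lowpass filter (assumed): trailing moving average."""
    out, acc = [], 0.0
    for n, value in enumerate(p):
        acc += value
        if n >= window:
            acc -= p[n - window]
        out.append(acc / min(n + 1, window))
    return out

def exon_indicator(dna, weights=EIIP):
    """Map, bandpass filter, square, and lowpass filter."""
    y = bandpass_period3(map_sequence(dna, weights))
    return moving_average([v * v for v in y])

def random_bases(rng, n):
    return "".join(rng.choice("ATGC") for _ in range(n))

# Synthetic test: a period-3 segment (positions 300 to 599) between random flanks.
rng = random.Random(0)
dna = random_bases(rng, 300) + "GCA" * 100 + random_bases(rng, 300)
indicator = exon_indicator(dna)
peak = max(range(len(indicator)), key=indicator.__getitem__)
print(peak)
```

On this synthetic input the output rises over the period-3 segment and falls off afterwards, which is the peak behavior described on the slide. A real design would choose the filter bandwidth to balance frequency selectivity against how sharply exon boundaries are resolved.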

Page 10: Receiver Operating Characteristic (ROC) Technique

The ROC technique is a tool for evaluating prediction techniques in terms of their performance.

It is based on metrics known as the true positive rate (TPR) and the false positive rate (FPR):

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \quad \text{and} \quad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$

TP, TN, FP, and FN denote the number of true positives, true negatives, false positives, and false negatives, respectively, of the predicted exon locations relative to a set of known true locations.
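A small worked example may help fix the definitions, using the standard formulas TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The per-position labels below are made up for illustration (1 marks a position inside an exon, 0 a position outside), and the function names are hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count TP, TN, FP, and FN for per-position labels (1 = exon, 0 = not exon)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, tn, fp, fn

def rates(predicted, actual):
    """Return (TPR, FPR) = (TP / (TP + FN), FP / (FP + TN))."""
    tp, tn, fp, fn = confusion_counts(predicted, actual)
    return tp / (tp + fn), fp / (fp + tn)

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tpr, fpr = rates(predicted, actual)
print(tpr, fpr)  # 0.75 0.25
```

Here three of the four true exon positions are found (TPR = 0.75) and one of the four non-exon positions is flagged incorrectly (FPR = 0.25).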

Page 11: ROC Technique (cont’d)

The TPR is plotted versus the FPR to obtain a point in the ROC plane as illustrated.

Since the TPR and FPR range from 0 to 1, the total area of the ROC plane is unity.

Figure: ROC plane

Page 12: ROC Technique (cont’d)

The northwest pole, (0, 1), represents perfect prediction and the goal of any prediction technique is to reach this point.

The area under the ROC curve (AUC) is a good indicator of the overall performance of an exon-location technique.

The greater the AUC, the better the performance.

Figure: ROC plane
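One common way to obtain an ROC curve and its AUC is to sweep a decision threshold over per-position scores and integrate the resulting points with the trapezoidal rule. The slides do not say how the ROC points were generated, so the threshold sweep, the function names, and the toy scores below are assumptions made for illustration.

```python
def roc_points(scores, truth):
    """Sweep a threshold over the scores; return (FPR, TPR) points from (0, 0) to (1, 1)."""
    positives = sum(truth)
    negatives = len(truth) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and not t)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
truth = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, truth))
perfect = auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
print(area, perfect)
```

The toy scores rank one negative above one positive, so the AUC is 8/9 rather than 1. The perfectly ranked case passes through the northwest pole (0, 1) and has an AUC of 1.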

Page 13: Proposed Training Procedure

A better set of nucleotide weights can be obtained by maximizing the AUC corresponding to a training set of DNA sequences or, equivalently, by minimizing the quantity 1 − AUC.

A quasi-Newton algorithm based on the BFGS updating formula was found to give good results.

Closed-form expressions for the objective function and gradient are not possible for this problem and, therefore, they are evaluated numerically.
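To make the idea of a quasi-Newton search with a numerically evaluated gradient concrete, here is a minimal sketch. It is not the authors' implementation: the central-difference gradient, the backtracking line search, the curvature safeguard, and the smooth toy objective (standing in for 1 − AUC) are all assumptions chosen to keep the example short and self-contained.

```python
import math

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

def bfgs_minimize(f, x0, tol=1e-6, max_iter=200):
    """Quasi-Newton minimization using the BFGS update of the inverse Hessian."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = numerical_gradient(f, x)
    for _ in range(max_iter):
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with a sufficient-decrease condition.
        fx = f(x)
        slope = sum(gi * di for gi, di in zip(g, d))
        t = 1.0
        while t > 1e-12 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        s = [t * di for di in d]
        x_new = [xi + si for xi, si in zip(x, s)]
        g_new = numerical_gradient(f, x_new)
        y = [gn - gi for gn, gi in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        ss = sum(si * si for si in s)
        yy = sum(yi * yi for yi in y)
        if sy > 1e-10 * math.sqrt(ss * yy):  # update only if curvature is positive
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((1.0 + yHy / sy) * s[i] * s[j]
                                - Hy[i] * s[j] - s[i] * Hy[j]) / sy
        x, g = x_new, g_new
    return x

def toy_objective(x):
    """Smooth stand-in for 1 - AUC; its minimum is at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

x_min = bfgs_minimize(toy_objective, [0.0, 0.0])
print(x_min)
```

In the actual training problem the objective would be 1 − AUC evaluated over the training sequences for a given set of four nucleotide weights, which is far more expensive per evaluation than this toy function.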

Page 14: Training Procedure (cont’d)

For consistency between the optimized nucleotide weights and the EIIP values, we need to ensure that

the four variables are always positive and

their numerical values are normalized at the end of each iteration such that their sum is always equal to the sum of the EIIP values.

Page 15: Training Procedure (cont’d)

Positive values can be achieved by replacing each variable by its square in the objective function.

The normalization can be achieved by applying a scaling factor in each iteration: the constant 0.4741, which is the sum of the actual EIIP values, divided by the sum of the current optimized nucleotide weights.
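The two constraints can be sketched in a few lines. The constant 0.4741 is the sum of the four EIIP values (0.1260 + 0.1335 + 0.0806 + 0.1340). The function name `weights_from_variables` is hypothetical, and treating the squared variables as the "current weights" in the denominator is an assumption about how the two steps combine.

```python
EIIP_SUM = 0.1260 + 0.1335 + 0.0806 + 0.1340  # 0.4741

def weights_from_variables(x):
    """Square each variable (positivity), then rescale so the weights sum to EIIP_SUM."""
    squared = [v * v for v in x]
    scale = EIIP_SUM / sum(squared)  # scaling factor applied in each iteration
    return [scale * s for s in squared]

# The optimizer is free to move the variables anywhere, including negative values.
weights = weights_from_variables([-0.3, 0.4, 0.2, 0.5])
print(weights, sum(weights))
```

Squaring lets an unconstrained optimizer work on the variables while the weights themselves stay positive, and the rescaling keeps the optimized weights on the same overall scale as the original EIIP values.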

Page 16: Model for ROC Curves

ROC curves are not continuous but can be approximated using an exponential model of the form

[equation not recovered]

The model parameters can be determined by minimizing the error function

[equation not recovered]

which is evaluated at points in the ROC plane.

Page 17: Training Procedure (cont’d)

The minimization can be performed using a quasi-Newton algorithm as before.

Figure: Sample ROC curve and its approximation.

Page 18: Results

Simulations were performed to optimize the nucleotide weights using a specific data set and then test the optimized weights on a nonoverlapping test set.

The data sets were chosen from the popular HMR195 database.

Of the 195 sequences in the database, we selected the 160 sequences that have been verified experimentally and divided them into two sets of 80 sequences each, the initial training set and a test set.

Page 19: Results (cont’d)

Termination tolerance: $10^{-6}$

Iterations for minimization of 1 − AUC: 42

Iterations for exponential model: 20

Page 20: Results (cont’d)

Figure: ROC curves corresponding to the actual and pseudo-EIIP values, obtained using the training set. Curve labels: pseudo-EIIP values, EIIP values.

Page 21: Results (cont’d)

Figure: ROC curves corresponding to the actual and pseudo-EIIP values, obtained using a test set with no overlap with the training set. Curve labels: pseudo-EIIP values, EIIP values.

Page 22: Conclusions

A method for obtaining optimized nucleotide weights, referred to as pseudo-EIIP values, has been proposed for use in filter-based exon location in DNA sequences.

The pseudo-EIIP values were found to yield improved exon location with respect to the training set as well as a nonoverlapping set of DNA sequences.

The pseudo-EIIP values render the filter-based exon location technique a more useful computational technique that can be used by biologists as an alternative to expensive and laborious wet experimental techniques.

Page 23: Thank you for your attention.