Genomic Signal Processing Dr. C.Q. Chang Dept. of EEE

Dec 21, 2015

Page 1

Genomic Signal Processing

Dr. C.Q. Chang

Dept. of EEE

Page 2

Outline

• Basic Genomics

• Signal Processing for Genomic Sequences

• Signal Processing for Gene Expression

• Resources and Co-operations

• Challenges and Future Work

Page 3

Basic Genomics

Page 4

Genome

• Every human cell contains about 6 feet of double-stranded (ds) DNA

• This DNA has 3,000,000,000 base pairs representing an estimated 50,000-100,000 genes (later estimates put the number of protein-coding genes nearer 20,000)

• This DNA contains our complete genetic code, or genome

• DNA regulates all cell functions, including response to disease, aging and development

• Gene expression pattern: snapshot of DNA in a cell

• Gene expression profile: DNA mutation or polymorphism over time

• Genetic pathways: changes in genetic code accompanying metabolic and functional changes, e.g. disease or aging

Page 5
Page 6

Gene: protein-coding DNA

DNA (CCTGAGCCAACTATTGATGAA) → transcription → mRNA (CCUGAGCCAACUAUUGAUGAA) → translation → Protein (PEPTIDE)
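
The DNA, mRNA and protein strings on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard genetic code and treating transcription as a plain T-to-U substitution on the coding strand (as the slide does). The function names `transcribe` and `translate` are illustrative, not from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA string for a coding-strand DNA string (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at a stop codon."""
    dna_like = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna_like) - 2, 3):
        amino_acid = CODON_TABLE[dna_like[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

The sketch reads from the first base, so it assumes the reading frame is already known. Predicting that frame is one of the problems listed later in the talk.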

Page 7

In more detail (color ~ state)

Page 8

Signal Processing for Genomic Sequences

Page 9

The Data Set

Page 10

The Problem

• Genomic information is digital: a string over the letters A, T, C and G

• Signal processing deals with numerical sequences, so character strings have to be mapped into one or more numerical sequences

• Identification of protein-coding regions

• Prediction of whether or not a given DNA segment is part of a protein-coding region

• Prediction of the proper reading frame

• Compared with traditional methods, signal processing methods are much quicker and can be even more accurate in some cases

Page 11

Sequence to signal mapping

$a = 1 + j,\quad t = 1 - j,\quad c = -1 - j,\quad g = -1 + j$

$y[n] = x[n] + x[n-1]/2 + x[n-2]/4$
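
A minimal sketch of the mapping and the filter follows. It assumes the complex assignment a = 1 + j, t = 1 - j, c = -1 - j, g = -1 + j and the three-tap filter y[n] = x[n] + x[n-1]/2 + x[n-2]/4. The signs in both formulas are assumptions, and the names `dna_to_signal` and `fir_filter` are illustrative.

```python
# Assumed complex mapping: complementary bases (A/T, C/G) are complex conjugates.
BASE_TO_COMPLEX = {
    "A": 1 + 1j,
    "T": 1 - 1j,
    "C": -1 - 1j,
    "G": -1 + 1j,
}


def dna_to_signal(seq):
    """Map a DNA string to a list of complex samples x[n]."""
    return [BASE_TO_COMPLEX[base] for base in seq.upper()]


def fir_filter(x, taps=(1.0, 0.5, 0.25)):
    """Compute y[n] = sum_k taps[k] * x[n-k], taking x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


x = dna_to_signal("ATC")
print(fir_filter(x))  # [(1+1j), (1.5-0.5j), (-0.25-1.25j)]
```

Once the string is numeric, any standard linear filter or transform can be applied to it, which is the point of the mapping.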

Page 12

Signal Analysis

• Spectral analysis (Fourier transform, periodogram)

• Spectrogram

• Wavelet analysis

• HMT: wavelet-based Hidden Markov Tree

• Spectral envelope (using optimal string to numerical value mapping)
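
As one concrete instance of spectral analysis, the sketch below computes the periodogram value of a DNA string at frequency 1/3, where protein-coding regions are commonly reported to show a peak. It is an illustration only: the slides do not specify this procedure, and the use of four binary indicator sequences (one per base) is an assumption, as are the function names.

```python
import cmath


def indicator_sequences(seq):
    """Return one 0/1 sequence per base marking where that base occurs."""
    seq = seq.upper()
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in "ACGT"}


def dft_power(x, k):
    """Squared magnitude of the k-th DFT coefficient of x."""
    total = len(x)
    coeff = sum(
        v * cmath.exp(-2j * cmath.pi * k * n / total)
        for n, v in enumerate(x)
    )
    return abs(coeff) ** 2


def total_power(seq, k):
    """Sum of the DFT power at bin k over the four indicator sequences."""
    return sum(dft_power(u, k) for u in indicator_sequences(seq).values())


def period3_power(seq):
    """Total power at bin N/3, i.e. at a period of three bases."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return total_power(seq, len(seq) // 3)


window = "ATG" * 10
print(period3_power(window))   # large: the window repeats every 3 bases
print(total_power(window, 1))  # close to zero
```

Sliding such a window along a long sequence and plotting the power at each position gives a spectrogram-style view of where the period-3 component is strong.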

Page 13

Spectral envelope of the BNRF1 gene from the Epstein-Barr virus

(a) 1st section (1000bp), (b) 2nd section (1000bp),

(c) 3rd section (1000bp), (d) 4th section (954bp)

Conjecture: the 4th quarter is actually non-coding

Page 14

Signal Processing for Gene Expression

Page 15

Microarray Life Cycle (taken from Schena & Davis)

Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modeling

Page 16

cDNA microarray experiment (figure):

• Probes: cDNA clones → PCR product amplification and purification → printing onto the microarray (0.1 nl/spot)

• Target: mRNA target hybridised to the microarray

• Detection: excitation with laser 1 and laser 2 → emission → scanning

• Analysis: overlay images and normalise
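
The "overlay images and normalise" step can be illustrated with a small sketch. It assumes the two laser channels give one red and one green intensity per spot, and that normalisation means centring the log2 ratios on their median. The slide does not say which normalisation was used, so this scheme and the function names are assumptions.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Per-spot log2(red / green); intensities must be positive."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_normalise(m_values):
    """Shift the log ratios so that their median is zero."""
    centre = median(m_values)
    return [m - centre for m in m_values]


red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]
print(median_normalise(log_ratios(red, green)))  # [-1.0, 0.0, 1.0]
```

Centring on the median rests on the assumption that most genes are not differentially expressed between the two samples.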

Page 17
Page 18

Image Segmentation

• Simple way: fixed circle method

• Advanced: fast marching level set segmentation

Figure panels: Advanced, Fixed circle
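
The fixed circle method is simple enough to sketch directly. The example assumes the spot centre and radius are already known from the printing grid, and it takes the background to be every pixel of the patch outside the circle. Both are assumptions, as is the function name `fixed_circle_intensity`.

```python
def fixed_circle_intensity(image, centre_row, centre_col, radius):
    """Return (foreground mean, background mean) for one spot.

    image is a list of rows of pixel intensities. Pixels within `radius`
    of the centre count as foreground, all other pixels as background.
    """
    inside, outside = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - centre_row) ** 2 + (c - centre_col) ** 2 <= radius ** 2:
                inside.append(value)
            else:
                outside.append(value)
    foreground = sum(inside) / len(inside)
    background = sum(outside) / len(outside) if outside else 0.0
    return foreground, background


# 5x5 patch: a bright spot of radius 1 on a dim background.
patch = [[2] * 5 for _ in range(5)]
for r, c in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    patch[r][c] = 10
print(fixed_circle_intensity(patch, 2, 2, 1))  # (10.0, 2.0)
```

The method breaks down when spots are irregular or off-centre, which is the motivation for the level set approach listed as the advanced option.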

Page 19

Clustering and filtering methods

Principal approaches:

• Hierarchical clustering (kdb trees, CART, gene shaving)

• K-means clustering

• Self-organizing (Kohonen) maps

• Support vector machines

• Gene filtering via multiobjective optimization

• Independent Component Analysis (ICA)

Validation approaches:

• Significance analysis of microarrays (SAM)

• Bootstrapping cluster analysis

• Leave-one-out cross-validation

• Replication (additional gene chip experiments, quantitative PCR)
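
Of the approaches listed, K-means is the shortest to write down. The sketch below clusters expression profiles (one list of numbers per gene) with squared Euclidean distance. The initial centroids are passed in explicitly to keep the run deterministic. Both are choices made for this illustration and are not taken from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]


def k_means(profiles, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_vector(members)
    return labels, centroids


profiles = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = k_means(profiles, [[0, 0], [10, 10]])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[0.0, 0.5], [10.0, 10.5]]
```

In practice the result depends on the number of clusters and on the starting centroids, which is one reason the validation approaches in the list above matter.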

Page 20

ICA for B-cell lymphoma data

Data: 96 samples of normal and malignant lymphocytes.

Results: scatter plots of 12 independent components

Comparison: closely related to the results of hierarchical clustering

Page 21

Resources and Co-operations

Resources: databases on the internet such as

• GenBank

• Protein Data Bank (PDB)

• Some small databases of microarray data

Co-operation needed:

• First-hand microarray data

• Biological experiments for validation

Page 22

Challenges and Future Work

• Genomic signal processing opens a new signal processing frontier

• Sequence analysis: the signal is symbolic or categorical, so classical signal processing methods are not directly applicable

• The increasingly high dimensionality of genetic data sets and the complexity involved call for fast, high-throughput implementations of genomic signal processing algorithms

• Future work: spectral analysis of DNA sequences and clustering of microarray data; modify classical signal processing methods and develop new ones