EE150a – Genomic Signal and Information Processing On DNA Microarrays Technology October 12, 2004

Dec 26, 2015

Leo Ellis
Page 1

EE150a – Genomic Signal and Information Processing

On DNA Microarrays Technology

October 12, 2004

Page 2

Recall the information flow in cells

• Replication of DNA

– {A,C,G,T} to {A,C,G,T}

• Transcription of DNA to mRNA

– {A,C,G,T} to {A,C,G,U}

• Translation of mRNA to proteins

– {A,C,G,U} to {20 amino-acids}

• Interrupt the information flow and measure gene expression levels!

http://www-stat.stanford.edu/~susan/courses/s166/central.gif

Page 3

Gene Microarrays

• A medium for matching known and unknown sequences of nucleotides based on hybridization (base-pairing: A-T, C-G)

• Applications

– identification of a sequence (gene or gene mutation)

– determination of expression level (abundance) of genes

– verification of computationally determined genes

• Enables massively parallel gene expression studies

• Two types of molecules take part in the experiments:

– probes, orderly arranged on an array

– targets, the unknown samples to be detected
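The base-pairing rule behind hybridization (A-T, C-G) can be illustrated with a minimal Python sketch. The function names and the 8-base sequence are made up for illustration; real probes are far longer, and real binding is a chemical process rather than an exact string match.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with `seq` (read in the opposite direction)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridizes_perfectly(probe: str, target: str) -> bool:
    """True if `target` is the exact complement of `probe`."""
    return target == reverse_complement(probe)


probe = "ATGCCGTT"                    # hypothetical probe sequence
target = reverse_complement(probe)    # its perfect complement
print(target)
print(hybridizes_perfectly(probe, target))
```

In this idealized picture a probe "detects" exactly one target sequence; the later slides on cross-hybridization relax that assumption.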

Page 4

Microarray Technologies

• Oligonucleotide arrays (Affymetrix GeneChips)

– probes are photo-etched on a chip (20-80 nucleotides)

– dye-labeled mRNA is hybridized to the chip

– laser scanning is used to detect gene expression levels (i.e., amount of mRNA)

• cDNA arrays

– complementary DNA (cDNA) sequences “spotted” on arrays (500-1000 nucleotides)

– dye-labeled mRNA is hybridized to the chip (two types of target: control and test)

– laser scanning is used to detect gene expression levels

• There are various hybrids of the two technologies above

Page 5

Oligonucleotide arrays

Source: Affymetrix website

Page 6

GeneChip Architecture

Source: Affymetrix website

Page 7

Hybridization

Source: Affymetrix website

Page 8

Laser Scanning

Source: Affymetrix website

Page 9

Sample Image

Source: The Paterson Institute for Cancer Research

Page 10

Competing Microarray Technologies

• So far considered oligonucleotide arrays:

– automated, on-chip design

– light dispersion may cause problems

– short probes, 20-80 nucleotides

• cDNA microarrays are another technology:

– longer probes obtained via polymerase chain reaction (PCR)

– [sidenote: what is optimal length?]

– probes grown in a lab, robot printing

– two types of targets – control and test

Page 11

http://pcf1.chembio.ntnu.no/~bka/images/MicroArrays.jpg

cDNA Microarrays

Page 12

Sample cDNA Microarray Image

Page 13

Some Design Issues

• Photo-etching based design: unwanted light exposure

– border minimization

– the probes are 20-80 nucleotides long

• Hybridization: binding of a target to its perfect complement

• However, a target may still bind to a probe that differs from its perfect complement by a small number of bases

• This non-specific binding (cross-hybridization) is a source of measurement noise

• In special cases (e.g., arrays for gene detection), the designer has a lot of control over the landscape of the probes on the array
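Cross-hybridization can be pictured as tolerating a few mismatched bases. The sketch below counts mismatches between a target and a probe's perfect complement; the threshold `max_mismatches` and the sequences are hypothetical, and an actual binding decision would depend on thermodynamics (e.g., melting temperature) rather than a simple count.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatches(probe: str, target: str) -> int:
    """Count positions where `target` differs from the perfect complement of `probe`."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    ideal = "".join(COMPLEMENT[base] for base in reversed(probe))
    return sum(a != b for a, b in zip(ideal, target))


def may_bind(probe: str, target: str, max_mismatches: int = 1) -> bool:
    """Crude stand-in for binding: allow up to `max_mismatches` mismatched bases."""
    return mismatches(probe, target) <= max_mismatches


probe = "ATGCCGTT"         # hypothetical probe
perfect = "AACGGCAT"       # its exact complement: specific binding
near_miss = "AACGGCTT"     # one base off: a candidate for cross-hybridization
unrelated = "GGGGGGGG"

for target in (perfect, near_miss, unrelated):
    print(target, mismatches(probe, target), may_bind(probe, target))
```

A probe-design procedure could use a count like this to keep probes on the same array far apart in sequence, which is one way to read the remark about control over the probe landscape.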

Page 14

Dealing with Measurement Noise

• Recent models of microarray noise

– measurements reveal signal-dependent noise (i.e., shot noise) as the major component

– additional Gaussian-like noise due to sample preparation, image scanning, etc.

• Image processing assumes image background noise

– attempts to subtract it

– sets up thresholds

• Lack of models of processes on microarrays

Page 15

Probabilistic DNA Microarray Model

• Consider an $m \times m$ DNA microarray, with $m^2$ unique types of nucleotide probes

• A total of $N$ molecules of $n$ different types of cDNA targets with concentrations $c_1, \ldots, c_n$ is applied to the microarray

• Measurement is taken after the system has reached chemical equilibrium

• Our goal: from the scanned image, estimate the concentrations

Page 16

DNA Microarray Model Cont’d

• Each target may hybridize to only one type of probe

• There are k non-specific bindings

• Model diffusion of unbound molecules by a random walk; the distribution of unbound molecules is uniform on the array

– justified by reported experimental results

• Assume known probabilities of hybridization and cross-hybridization

– Theoretically: from melting temperature

– Experimentally: measurements (e.g., from control target samples)

Page 17

Markov Chain Model

Modeling transitions between the possible states of a target:

• one specific binding state

• $k = 2$ non-specific bindings

• $p_n = 1 - k p_c - p_h$ is the probability that an unbound molecule remains free, where $p_h$ and $p_c$ are the hybridization and cross-hybridization probabilities

Measurement is taken after the system has reached chemical equilibrium, so we need to find the steady state

Page 18

Markov Chain Model Cont’d

Let $\pi_i = [\pi_{i,1} \; \pi_{i,2} \; \ldots \; \pi_{i,k+2}]^T$ be a vector whose components are the numbers of type-$i$ targets in each of the $k+2$ states of the Markov chain

• $\pi_{i,1}$ is the number of hybridized molecules

• $\pi_{i,j}$, $2 < j \le k+2$, is the number of cross-hybridized molecules

Note that $\sum_{j=1}^{k+2} \pi_{i,j} = c_i$ for every $i$.

Page 19

Stationary State of the Markov Chain

• In equilibrium, we want to find $\pi_i$ such that

where the transition matrix $P_i$ is given by

• Clearly, in the stationary state we have

• Finally, the ratio $\pi_i / c_i$ gives the stationary-state probabilities
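A numerical sketch of the steady-state computation may help here. The transition structure below is an assumption made for illustration, not the matrix from the slide: state 0 is the specific binding state, state 1 is the free state, states 2 to k+1 are the non-specific bindings, and every bound state releases with the same probability `p_r`. The parameter values are also illustrative.

```python
import numpy as np


def transition_matrix(p_h: float, p_c: float, p_r: float, k: int) -> np.ndarray:
    """Column-stochastic matrix: column j holds the next-state distribution from state j.

    Assumed layout: 0 = specifically bound, 1 = free, 2..k+1 = cross-hybridized.
    """
    p_n = 1.0 - k * p_c - p_h          # probability that a free molecule stays free
    if p_n < 0:
        raise ValueError("k * p_c + p_h must not exceed 1")
    size = k + 2
    P = np.zeros((size, size))
    P[0, 0], P[1, 0] = 1.0 - p_r, p_r  # specifically bound: stay or release
    P[0, 1], P[1, 1] = p_h, p_n        # free: hybridize or stay free ...
    P[2:, 1] = p_c                     # ... or cross-hybridize to one of k probes
    for j in range(2, size):
        P[j, j], P[1, j] = 1.0 - p_r, p_r  # cross-hybridized: stay or release
    return P


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P)
    idx = np.argmin(np.abs(eigvals - 1.0))
    v = np.real(eigvecs[:, idx])
    return v / v.sum()


P = transition_matrix(p_h=0.5, p_c=0.1, p_r=0.02, k=2)
pi = stationary_distribution(P)
print(pi)            # stationary-state probabilities
print(1e5 * pi)      # expected molecule counts for a target with c_i = 1e5
```

Under these assumptions the chain satisfies detailed balance, so the bound fraction is governed by the ratios p_h / p_r and p_c / p_r; multiplying the probabilities by c_i gives the expected number of molecules in each state.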

Page 20

Linear Microarray Model

• Let matrix Q collect the previously obtained probabilities

• The microarray measurement model can be written as

• Vector $w$ describes inherent fluctuations in the measured signal due to hybridization (shot noise)

• Binding of the $j$-type target to the $i$-type probe is a Bernoulli random variable with variance $q_{i,j}(1 - q_{i,j})$

– hence the variance of $w_i$ is given by

• Vector $v$ consists of i.i.d. Gaussian entries
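One way to write a model consistent with the bullets above is sketched below. The symbol $s$ for the measurement vector is borrowed from the $p(s|c)$ that appears on the following slide; the assumption that the $c_j$ molecules of each target type bind independently (so that Bernoulli variances add) and the symbol $\sigma_v^2$ for the Gaussian noise variance are assumptions of this sketch.

```latex
s = Q c + w + v, \qquad
\operatorname{var}(w_i) = \sum_{j=1}^{n} c_j \, q_{i,j} \, (1 - q_{i,j}), \qquad
v \sim \mathcal{N}(0, \sigma_v^2 I)
```

Written this way, the shot-noise variance grows with the concentrations, which matches the earlier description of the noise as signal-dependent.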

Page 21

Detection of Gene Expression Levels

• A simple estimate is obtained via pseudo-inverse,

• Maximize a posteriori probability p(s|c), which is equivalent to

where the matrix is given by

• Optimization above readily simplifies to
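The pseudo-inverse estimate mentioned in the first bullet can be sketched in a few lines of NumPy. The matrix `Q` and the concentrations are made-up values for a toy 3-probe, 2-target case, and the measurement is taken to be noiseless so that the recovery is exact; this illustrates only the simple estimate, not the a posteriori optimization.

```python
import numpy as np

# Hypothetical binding probabilities: rows are probes, columns are target types.
Q = np.array([
    [0.60, 0.05],
    [0.05, 0.60],
    [0.05, 0.05],
])

c_true = np.array([1e5, 2e5])    # made-up target concentrations
s = Q @ c_true                   # noiseless readout under the linear model

# Least-squares estimate of the concentrations via the Moore-Penrose pseudo-inverse.
c_hat = np.linalg.pinv(Q) @ s
print(c_hat)
```

With noise present the estimate would no longer be exact, and for large or ill-conditioned `Q` a solver such as `np.linalg.lstsq` may be numerically preferable to forming the pseudo-inverse explicitly.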

Page 22

Simulation Results

• Consider an $8 \times 8$ array ($m = 8$)

• Apply n=6 types of targets

• Concentrations: [1e5 2e5 2e5 2e5 1e5 2e5] (N=1e6)

• Assume the following probabilities:

– hybridization – 0.8

– cross-hybridization – 0.1

– release – 0.02

• Let k=3 (number of non-specific bindings)

• Free molecules perform random walk on the array
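A rough stand-in for this kind of experiment is sketched below. It does not reproduce the random-walk simulation from the slides: instead it draws the readout directly from a linear model with binomial shot noise and Gaussian noise. The values `Q_SPECIFIC`, `Q_CROSS` and `SIGMA_V`, the choice of which probes cross-hybridize, the independent-binomial approximation, and the definition of the "raw" estimate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 8, 6, 3                                  # array size, target types, cross bindings
c = np.array([1e5, 2e5, 2e5, 2e5, 1e5, 2e5])       # concentrations from the slide (N = 1e6)

# Hypothetical steady-state binding probabilities and Gaussian noise level.
Q_SPECIFIC, Q_CROSS, SIGMA_V = 0.6, 0.05, 100.0

# Target j binds specifically to probe j and non-specifically to k other probes.
Q = np.zeros((m * m, n))
for j in range(n):
    Q[j, j] = Q_SPECIFIC
    for step in range(1, k + 1):
        Q[(j + step) % n, j] = Q_CROSS

# Shot noise: each of the c_j molecules binds to probe i with probability Q[i, j]
# (independent binomials, an approximation), plus additive Gaussian noise.
bound = rng.binomial(c.astype(int), Q)             # shape (m*m, n)
s = bound.sum(axis=1) + rng.normal(0.0, SIGMA_V, size=m * m)

c_raw = s[:n] / Q_SPECIFIC                         # read each specific probe, ignore cross terms
c_hat = np.linalg.pinv(Q) @ s                      # pseudo-inverse estimate

mse_raw = np.mean((c_raw - c) ** 2)
mse_hat = np.mean((c_hat - c) ** 2)
print(f"MSE raw: {mse_raw:.3g}, MSE pseudo-inverse: {mse_hat:.3g}")
```

In this toy setting the raw readout is biased by cross-hybridization from neighboring targets, while the pseudo-inverse accounts for it, so the second mean-square error comes out much smaller.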

Page 23

Simulation Results: Readout Data

Page 24

Simulation Results: Estimate

Page 25

Some Comments

• Adopt mean-square error (MSE) as the measure of performance

• As expected, we observe a significant improvement in MSE over the raw measurements

• Things to do:

– investigate how to incorporate control sample measurements

– modification of the technique for very large microarrays is needed (matrix inversion may be unstable)

• Experimental verification!

Page 26

Why is this Estimation Problem Important?

• Microarrays measure expression levels of thousands of genes simultaneously

• Assume that we are taking samples at different times during a biological process

• Cluster data in the expression level space

– relatedness in biological function often implies similarity in expression behavior (and vice versa)

– similar expression behavior indicates co-expression

• Clustering of expression level data heavily depends on the measurements

– better estimation may lead to different functionality conclusions
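As a small illustration of "similarity in expression behavior", the sketch below computes pairwise Pearson correlations between time profiles. The three profiles are synthetic and noise-free, and correlation is only one of several similarity measures a clustering method might use; none of this is taken from the slides.

```python
import numpy as np

# Eight hypothetical sampling times during a biological process.
t = np.linspace(0.0, 1.0, 8)

# Synthetic expression profiles (rows are genes, columns are time points).
profiles = np.vstack([
    np.sin(2 * np.pi * t),               # gene A
    2.0 * np.sin(2 * np.pi * t) + 1.0,   # gene B: same shape as A, different scale and offset
    -np.sin(2 * np.pi * t),              # gene C: opposite behavior
])

# Pairwise Pearson correlation between the gene profiles.
corr = np.corrcoef(profiles)
print(np.round(corr, 2))
```

Genes A and B come out as strongly correlated (candidates for co-expression) and gene C as anti-correlated with both. Errors in the estimated expression levels perturb these correlations, which is why the quality of the estimate can change the resulting clusters.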

Page 27

Summary

• Microarray technologies are becoming increasingly important for medicine and biology

– understanding how the cell functions, effects on organism

– towards diagnostics, personalized medicine

• Plenty of interesting problems

– combinatorial design techniques

– statistical analysis of the data

– signal processing / estimation