The Biostatistical & Bioinformatics Challenges in the High Dimensional Data Derived from High Throughput Assays: Today and Tomorrow
Yu Shyr (石瑜), Ph.D.
May 14, 2008
China Medical University
[email protected]
Vanderbilt University (泛德堡大學)
Denoising (wavelets) => Peak Selection (local maximum) => common peak finding across spectra (NLMLE)
(4) Feedback: optimally choosing calibration peaks and setting feature extraction parameters.
Flowchart of the Preprocessing Procedure
[Figure: flowchart with boxes labeled Raw data, De-noising, Peak Detection, Peak Distribution, Baseline Correction, Normalization, Calibration, Alignment, Common Feature Detection, Results]
Convolution Based Calibration Algorithm
1. Simulate the known peaks (choose peaks with high prevalence across spectra and a clear pattern, by feedback: 80%).
2. Convolve each spectrum with the known-peak simulation (Gaussian or Beta). The maximum occurs where the two peak shapes match best.
3. The linear shift (in units) that best matches multiple peaks is the optimal shift.
Notice: all processing is done in the time domain.
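The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`gaussian_template`, `best_shift`, `calibrate`), the Gaussian peak shape, the peak width, and the integer shift search range are all hypothetical choices. Because a Gaussian is symmetric, sliding the simulated peaks over the spectrum and summing the products gives the same score as the convolution described above.

```python
import math

def gaussian_template(n_points, centers, width):
    """Simulated spectrum: a sum of Gaussian peaks at the known positions
    (positions and width are in time-axis samples, an assumption here)."""
    return [sum(math.exp(-0.5 * ((i - c) / width) ** 2) for c in centers)
            for i in range(n_points)]

def best_shift(spectrum, template, max_shift):
    """Return the integer shift s that maximizes sum_i spectrum[i + s] * template[i],
    i.e. the shift at which all known peaks match the observed peaks best."""
    n = len(spectrum)
    best_s, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(spectrum[i + s] * template[i]
                    for i in range(max(0, -s), min(n, n - s)))
        if score > best_score:
            best_s, best_score = s, score
    return best_s

def calibrate(spectrum, shift):
    """Move the spectrum back by `shift` samples, padding with zeros."""
    n = len(spectrum)
    return [spectrum[i + shift] if 0 <= i + shift < n else 0.0 for i in range(n)]

# Synthetic example: the observed peaks sit 4 samples later than the known positions.
known = [50, 120, 180]
template = gaussian_template(256, known, width=3.0)
observed = gaussian_template(256, [c + 4 for c in known], width=3.0)
shift = best_shift(observed, template, max_shift=10)   # 4 for this synthetic data
aligned = calibrate(observed, shift)
```

Using several known peaks at once, rather than one, is what makes a single linear shift well determined: a wrong shift may line up one peak by chance but not all of them.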
Pre-Calibration
Post-Calibration
1. Accurate m/z peak positions (matching the theoretical values)
2. Less variation in peak positions
3. Easy to handle large datasets in batch mode
Baseline is generally considered an artificial bias of the signal.
We propose that the baseline might be caused by delayed charge release.
We apply quadratic splines to the local minima, using sliding windows, to obtain a continuous curve.
Trimmed total ion current (TIC) normalization.
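A rough sketch of the baseline and normalization steps follows. Several details are assumptions made for illustration and are not taken from the slides: consecutive (non-overlapping) windows stand in for sliding windows, the first spline piece is forced to be linear, the baseline is held flat outside the first and last knots, and "trimmed" is read as dropping the largest fraction of intensities before summing. All function names are hypothetical.

```python
import bisect
import math

def window_minima(y, window):
    """Take the lowest point of each consecutive window as a baseline knot."""
    knots = []
    for start in range(0, len(y), window):
        chunk = y[start:start + window]
        j = min(range(len(chunk)), key=chunk.__getitem__)
        knots.append((start + j, chunk[j]))
    return knots

def quadratic_spline(knots):
    """Return f(x), a piecewise-quadratic curve with continuous slope through
    the knots. Assumed boundary condition: the first piece is a straight line."""
    xs = [k[0] for k in knots]
    ys = [k[1] for k in knots]
    if len(xs) < 2:
        return lambda x: ys[0]
    z = [(ys[1] - ys[0]) / (xs[1] - xs[0])]            # slope at each knot
    for k in range(len(xs) - 1):
        z.append(2 * (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) - z[k])
    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        k = bisect.bisect_right(xs, x) - 1
        t = x - xs[k]
        return ys[k] + z[k] * t + (z[k + 1] - z[k]) / (2 * (xs[k + 1] - xs[k])) * t * t
    return f

def subtract_baseline(y, window=20):
    """Subtract the spline baseline and clip negative values to zero."""
    f = quadratic_spline(window_minima(y, window))
    return [max(v - f(i), 0.0) for i, v in enumerate(y)]

def trimmed_tic_normalize(y, trim=0.05):
    """Divide by the total ion current computed without the largest
    `trim` fraction of intensities (an assumed reading of 'trimmed')."""
    n_drop = int(len(y) * trim)
    tic = sum(sorted(y)[:len(y) - n_drop])
    return [v / tic for v in y] if tic > 0 else list(y)

# Synthetic example: two peaks riding on a slowly falling baseline.
raw = [10 - 0.01 * i
       + 50 * math.exp(-0.5 * ((i - 45) / 2) ** 2)
       + 50 * math.exp(-0.5 * ((i - 130) / 2) ** 2) for i in range(200)]
corrected = subtract_baseline(raw, window=20)
```

The window should be wider than the widest peak, otherwise a window can lie entirely inside a peak and its minimum pulls the baseline upward.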
Baseline Data Before Correction
Baseline Corrected Data
Wavelets Denoising
Wavelet: the FBI's image coding standard for digitized fingerprints; successful at reproducing the true signal by removing noise at specific energy levels.
Wavelet methods have been used to denoise signals in a wide variety of contexts.
The wavelet method analyzes the data in both the time and frequency domains to extract more useful information.
An adaptive stationary discrete wavelet denoising method is applied in our research; it is shift-invariant and efficient in denoising.
$$f(t) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} c(j,k)\, \psi_{j,k}(t)$$
$$c(j,k) = \int f(t)\, \psi_{j,k}(t)\, dt$$
Denoising strategy
The stationary discrete wavelet denoising method is shift-invariant and offers both good reconstruction performance and smoothness.
The adaptive denoising method is based on the noise distribution: we set different threshold values at different mass intervals and frequency levels.
Parameters (decomposition and thresholds) are determined by the feedback information.
DWT Decomposition
Denoised Data
Peak list across spectra
Kernel Density Estimation
Peak distribution without high-quality preprocessing
Peak distribution with high-quality preprocessing
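To make the kernel density idea concrete, here is a minimal sketch that pools peak positions from several spectra and looks for modes of the estimated density. The Gaussian kernel, the bandwidth, the evaluation grid, and the m/z values are illustrative assumptions, and the function names are hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return density(x) = (1 / (n h)) * sum_i K((x - x_i) / h), Gaussian K."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

def density_modes(density, grid):
    """Grid points where the density is strictly higher than at both neighbors."""
    values = [density(x) for x in grid]
    return [grid[i] for i in range(1, len(grid) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

# Made-up peak positions (m/z) pooled from several spectra.
pooled = [999.8, 1000.1, 1000.2, 1499.7, 1499.9, 1500.0, 1500.3]
density = gaussian_kde(pooled, bandwidth=1.0)
grid = [990 + 0.1 * i for i in range(5201)]
modes = density_modes(density, grid)   # one mode near 1000, one near 1500
```

With good calibration and alignment the pooled positions cluster tightly, so the density shows sharp, well-separated modes; poorly preprocessed spectra smear the same peaks into broad or split modes.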
Peak Selection
Preprocessing on one spectrum after calibration
1. Read in the spectrum as two columns: m/z values and corresponding intensities.
2. Apply the Adaptive Stationary Discrete Wavelet Transform for denoising.
3. Estimate the baseline with sliding-window splines and subtract it; apply Total Ion Current normalization across the whole spectrum.
4. Local maxima contribute to the peak list across spectra.
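Steps 1 and 4 of the list above can be sketched as follows. The input layout (whitespace- or comma-separated text lines), the neighborhood size, and the minimum height are assumptions for illustration, and the function names are hypothetical; denoising, baseline subtraction, and normalization would run between the two calls.

```python
def read_spectrum(lines):
    """Parse two columns per line: m/z value and intensity.
    Lines that do not start with two numbers are skipped."""
    mz, intensity = [], []
    for line in lines:
        parts = line.replace(",", " ").split()
        try:
            x, y = float(parts[0]), float(parts[1])
        except (IndexError, ValueError):
            continue
        mz.append(x)
        intensity.append(y)
    return mz, intensity

def local_maxima(mz, intensity, min_height=0.0, half_window=2):
    """Return (m/z, intensity) pairs where the intensity is the single largest
    value within +/- half_window samples and is at least min_height."""
    peaks = []
    n = len(intensity)
    for i in range(n):
        window = intensity[max(0, i - half_window):min(n, i + half_window + 1)]
        v = intensity[i]
        if v >= min_height and v == max(window) and window.count(v) == 1:
            peaks.append((mz[i], v))
    return peaks

# Tiny made-up spectrum with a header line.
lines = ["m/z intensity", "1000.0 1.0", "1000.5 3.0", "1001.0 9.0",
         "1001.5 2.0", "1002.0 1.0", "1002.5 4.0", "1003.0 1.5"]
mz, intensity = read_spectrum(lines)
peaks = local_maxima(mz, intensity, min_height=2.0, half_window=1)
# peaks == [(1001.0, 9.0), (1002.5, 4.0)]
```

Collecting the `peaks` list from every spectrum gives the pooled peak list across spectra that the later steps work on.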
[Figure panels: day1, day2, day3, day4]
Expression Profiles
The Results from the Cluster Analysis
[Figure labels: Day, Laser Power]
Why?
Quality Control Assessment - Reproducibility
Preprocessing: Dr. Dean Billheimer, Dr. Ming Li, Dr. Dong Hong, Shuo Chen, Huiming Li
Additional Acknowledgements: Bashar Shakhtour, Dr. William Wu, Dr. Bonnie LeFure
Analysis: Jeremy Roberts, Will Gray, Nimish Gautam, Joan Zhang, Haojie Wu
Dr. Heidi Chen, Dr. Jonathan Xu, Dr. Tatsuki Koyama