Scientific Data Mining: Emerging Developments and Challenges
F. Seillier-Moiseiwitsch
Bioinformatics Research Center
Department of Mathematics and Statistics
University of Maryland - Baltimore County
Bioinformatics:
A View from the Trenches
Some Needed Developments: Simultaneous data mining of databases
• Different types of information in separate databases
GenBank, PDB, HIV-Web, PubMed, …
Data selection
Generic solution
Some Needed Developments: Simultaneous data mining of databases
• Same information in different databases
Meta-analysis
e.g. Gene expression data
Pre-processing
different technologies
sources of variability
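As one illustration of combining the same measurement across studies, a fixed-effect, inverse-variance weighted meta-analysis can be sketched as follows. This is a minimal sketch under stated assumptions, not a method prescribed by these slides: the function name and the example values are hypothetical, the per-study estimates are assumed to have been pre-processed onto a common scale (e.g. log2 fold changes), and between-study heterogeneity from different technologies is ignored.

```python
import math

def fixed_effect_meta(effects, variances):
    """Combine per-study effect estimates with inverse-variance weights.

    effects   -- per-study estimates for one gene (e.g. log2 fold changes)
    variances -- sampling variance of each estimate (each must be > 0)
    Returns (pooled_effect, pooled_standard_error).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need equally sized, non-empty inputs")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # More precise studies (smaller variance) get larger weights.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical estimates for one gene from three studies or platforms.
pooled, se = fixed_effect_meta([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
```

A random-effects model would be a natural next step when the sources of variability differ markedly between technologies.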
Some Needed Developments: Data mining of heterogeneous databases
• Many different types of information in the same database
e.g. Patient records - diagnostics
lab results, DNA, microarray
2D gel images
data compression
features
Some Needed Developments: New Algorithms
• Molecular evolution
Phylogenetic reconstruction
Large number of sequences
Statistical evolutionary models
MCMC, E-M algorithm
Parallel processors
Emerging models
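To make the pairing of a statistical evolutionary model with MCMC concrete, the sketch below samples the evolutionary distance between two aligned sequences under the Jukes-Cantor (JC69) model with a random-walk Metropolis sampler. It is an illustrative toy, not the approach of any particular phylogenetics package: the function names, the exponential prior and its rate, the step size, and the site counts are all assumptions chosen for the example.

```python
import math
import random

def jc69_distance(n_sites, n_diff):
    """Closed-form maximum-likelihood JC69 distance (substitutions per site)."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_log_likelihood(d, n_sites, n_diff):
    """JC69 log-likelihood of distance d > 0, up to an additive constant."""
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # P(a site differs)
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(1.0 - p_diff)

def metropolis_jc69(n_sites, n_diff, n_iter=5000, step=0.02,
                    prior_rate=1.0, seed=0):
    """Random-walk Metropolis draws of d under an assumed Exponential prior."""
    rng = random.Random(seed)

    def log_post(d):
        return jc69_log_likelihood(d, n_sites, n_diff) - prior_rate * d

    d = 0.1  # arbitrary positive starting value
    current = log_post(d)
    samples = []
    for _ in range(n_iter):
        proposal = d + rng.gauss(0.0, step)
        if proposal > 0.0:  # non-positive proposals have zero posterior mass
            candidate = log_post(proposal)
            # Accept with probability min(1, posterior ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                d, current = proposal, candidate
        samples.append(d)
    return samples

# Hypothetical data: 100 differing sites out of 1000 aligned sites.
draws = metropolis_jc69(1000, 100)
posterior_mean = sum(draws[1000:]) / len(draws[1000:])  # drop burn-in
```

With many sequences the same idea extends to whole trees, where each likelihood evaluation is far more expensive; that cost is what motivates the parallel processors mentioned above.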
Some Needed Developments: New Algorithms
• Proteomics
images of 2D gels
clean up, alignment
group composite image
biological vs. experimental variability
easily updated
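The requirement that a group composite image be easily updated can be illustrated with a running pixel-wise mean, which folds in each new gel without reprocessing earlier ones. This is only a sketch: the class name is hypothetical, images are represented as plain lists of rows, and every image is assumed to be already cleaned and aligned to a common grid.

```python
class CompositeImage:
    """Running pixel-wise mean of aligned, equally sized gel images."""

    def __init__(self, height, width):
        self.count = 0
        self.mean = [[0.0] * width for _ in range(height)]

    def update(self, image):
        """Fold one aligned image (list of rows of intensities) into the mean."""
        if len(image) != len(self.mean) or any(
            len(row) != len(mean_row)
            for row, mean_row in zip(image, self.mean)
        ):
            raise ValueError("image shape does not match the composite")
        self.count += 1
        for mean_row, row in zip(self.mean, image):
            for j, value in enumerate(row):
                # Incremental mean: new = old + (x - old) / n
                mean_row[j] += (value - mean_row[j]) / self.count

# Two hypothetical 2x2 gels from the same experimental group.
composite = CompositeImage(2, 2)
composite.update([[0.0, 2.0], [4.0, 6.0]])
composite.update([[2.0, 4.0], [6.0, 8.0]])
```

Tracking a running variance alongside the mean would be one way to begin separating biological from experimental variability, given replicate gels.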
Some Needed Developments: New Algorithms
• Functional genomics
microarray data
background estimation (subjectivity)
automation of analytical protocols
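One way to reduce subjectivity in background estimation is to fix the rule in code so that every spot is treated identically. The sketch below assumes a two-channel array and uses the median of the pixels surrounding a spot as the local background; the function names, the flooring value, and the intensities are hypothetical, and other background estimators are equally possible.

```python
import math
from statistics import median

def background_corrected(foreground, local_background, floor=1.0):
    """Subtract the median local background from a spot's foreground intensity.

    The result is floored so that a later log transform stays defined.
    """
    return max(foreground - median(local_background), floor)

def log2_ratio(red_fg, red_bg, green_fg, green_bg):
    """Background-corrected log2(red / green) for one spot."""
    red = background_corrected(red_fg, red_bg)
    green = background_corrected(green_fg, green_bg)
    return math.log2(red / green)

# Hypothetical spot: foreground intensities and surrounding background pixels.
ratio = log2_ratio(900.0, [90.0, 100.0, 110.0], 300.0, [95.0, 100.0, 105.0])
```

Because the rule is explicit, the same protocol can be rerun unchanged on every array, which is the kind of automation the slide calls for.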
Some Challenges
• Public domain software
• Easy implementation on any computing platform
• Incorporation of state-of-the-art statistical techniques