Introduction to Bioinformatics
9/30/2004, TCSS588A, Isabelle Bichindaritz

Jan 11, 2016


Erica Curtis

Introduction to Bioinformatics


Introduction to Class

• Syllabus
• Schedule
• Web-site: http://courses.washington.edu/tcss588
• Assignments:
  – An application to genetics
  – An application to proteomics
  – …
• Project
  – project teams (proposal due next week)


Introduction to Class

1. Biological foundations.
2. Machine learning algorithms and applications to biology/life sciences.
3. Neural networks.
4. Hidden Markov Models.
5. Graphical models.
6. Case Based Reasoning.
7. Phylogenetic trees induction.
8. Microarrays and gene expression.
9. Image understanding and mining.
10. Biometrics.


Introduction to Class

Day | Date  | Subject | Pre-reading
R   | 9/30  | Introduction to Bioinformatics and the Life Sciences | Chapter 1
T   | 10/5  | Probabilistic Framework | Chapter 2
R   | 10/7  | Probabilistic Inference | Chapter 3
T   | 10/12 | Machine Learning Algorithms (Part I) | 4.1-4.4
R   | 10/14 | Machine Learning Algorithms (Part II) | 4.5-4.8
T   | 10/19 | Neural Networks Theory | Chapter 5
R   | 10/21 | Neural Networks Applications | Chapter 6
T   | 10/26 | Hidden Markov Models Theory | Chapter 7
R   | 10/28 | Hidden Markov Models Applications | Chapter 8
T   | 11/2  | Graphical Models (Part I) | 9.1-9.4
R   | 11/4  | Graphical Models (Part II) | 9.5-9.6
T   | 11/9  | Case Based Reasoning | Handout
R   | 11/11 | Veterans Day Holiday |
T   | 11/16 | Future Trends Discussion / MIDTERM |
R   | 11/18 | Phylogenetic Trees Induction | Chapter 10
T   | 11/23 | Microarrays and Gene Expression | Chapter 12
R   | 11/25 | Thanksgiving Holiday |
T   | 11/30 | Image Understanding | Handout
R   | 12/2  | Image Mining | Handout
T   | 12/7  | Biometrics | Handout
R   | 12/9  | Future Perspectives Discussion / FINAL |
R   | 12/16 | FINAL PROJECT PRESENTATIONS in CP 106, 5:00P–7:15P |


Course Learning Objectives

• Understand biological concepts and the associated set of problems.
• Understand the scientific framework for bioinformatics in statistics, complexity, and information theory.
• Understand machine learning methods for bioinformatics.
• Understand innovative algorithms and methods for bioinformatics.
• Program using available bioinformatics tools.
• Gain familiarity with statistical learning, concept learning, hidden Markov models, case based reasoning, neural networks, knowledge-based systems and ontologies, genetic algorithms, stochastic grammars and linguistics, grid computing, and the semantic Web.
• Design and develop new computer systems for bioinformatics.


Outline

• Informatics / Medical Informatics / Bioinformatics / Computational Biology
• Project examples
  – Care Partner
  – Telemakus
  – Phylsyst
  – Human Genome Project
• Introduction to biology


Informatics / Medical Informatics

• Informatics is “The science of rational and computerized processing of information as it supports human knowledge and communication in scientific, technical, economical, and social domains.”

• When associated with health care and medical research applications, it is often called medical informatics.

• Interdisciplinary field involving medicine, biology, computer science, mathematics, information science, and statistics.


Medical Informatics

• Computer Applications in Health Care

1 communication and telematics
2 storage and retrieval
3 processing and automation
4 diagnosis and decision making
5 therapy and control
6 research and development

INCREASING LEVEL OF COMPLEXITY


Bioinformatics

• Bioinformatics is the discipline that develops technologies for supporting information management in fields like biology.
• Target domains: biology, medicine, pharmacology, agriculture …
• Interdisciplinary field.
• Main tasks: analyze biological sequence data, genome content, and arrangement; predict the function and structure of macromolecules.
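
As a small illustration of what "analyzing biological sequence data" can mean in practice, the sketch below computes two elementary quantities for a DNA string: its GC content and its reverse complement. The function names and the toy sequence are assumptions made for this example; they do not come from the course materials.

```python
# Minimal sketch: two elementary DNA sequence computations.
# The sequence below is a toy example, not real data.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


dna = "ATGCGC"
print(f"GC content: {gc_content(dna):.3f}")
print(f"Reverse complement: {reverse_complement(dna)}")
```

Real analyses would normally rely on an established toolkit rather than hand-written helpers; the point here is only that sequences are handled as strings over a four-letter alphabet.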


Computational Biology

• Computational biology provides algorithms for bioinformatics.
• Target applications:
  – Genomics: DNA, genes
  – Proteomics: proteins
  – Phylogenetics: evolutionary classifications


Care Partner System Description

• A decision support system for stem cell post transplant care:
  – comprehensive knowledge-base (scientific literature, monographs, clinical guidelines, clinical pathways, clinical cases)
  – available on the WWW
  – learns from experience
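
To make "learns from experience" concrete, here is a minimal sketch of the retrieve and retain steps of generic case based reasoning. It is not the Care Partner implementation: the case base, the two numeric features, and the solution labels are all hypothetical placeholders chosen for illustration.

```python
from math import sqrt

# Hypothetical case base: (feature vector, stored solution).
# Features and labels are placeholders, not clinical data.
CASES = [
    ((0.9, 0.1), "plan A"),
    ((0.2, 0.8), "plan B"),
    ((0.5, 0.5), "plan C"),
]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, cases):
    """Return the stored case whose features are closest to the query."""
    return min(cases, key=lambda case: distance(query, case[0]))


def retain(query, solution, cases):
    """Return a new case base that also contains the solved query."""
    return cases + [(query, solution)]


features, solution = retrieve((0.8, 0.2), CASES)
print(solution)
updated = retain((0.8, 0.2), solution, CASES)
print(len(updated))
```

Each solved problem is added back to the case base, which is the sense in which such a system accumulates experience over time.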


Knowledge-Base

            | N LTFU CDSS | SNOMED v. 3.4
Diseases    | 1109        | 35,834
Functions   | 452         | 19,221
Labs        | 1152        | 30,723
Procedures  | 547         | 20,105
Medications | 2684        | 14,846
Sites       | 460         | 5,875


Knowledge-Base

              | N CDSS
Terms         | 739,439
Relations     | 51

              | N CDSS
Patient cases | 4904


Telemakus

• Goal of the Telemakus System:
  – to enhance the knowledge discovery process by developing retrieval, visual and interaction tools to mine and map research findings from the research literature.
• Objective of the research:
  – to create, test and validate an infrastructure to permit the automation of the creation and maintenance of a searchable database that generates knowledge maps via query tools and concept mapping algorithms.
  – to apply natural language processing models and information analysis methods to ultimately speed up the scientific discovery process.


Telemakus


Phylsyst


Phylsyst

• Example – Phylsyst built cladogram

clado1
Level 1 01-10 Doublon split on characters: 8 12 27
Level 1 values: 8(0) 12(1) 27(1)
  Level 2 01-10 Doublon split on characters: 18 29 25
  Level 2 values: 18(0) 29(1) 25(1)
    Taxon: Diphylleia
  Level 2 values: 18(1) 29(0) 25(1)
    Level 3 01-10 Doublon split on characters: 14 17
    Level 3 values: 14(0) 17(1)
      Taxon: Dysosma
Level 1 values: 8(1) 12(0) 27(0)
  Level 2 01-10 Doublon split on characters: 16 29 30 19
  Level 2 values: 16(0) 29(1) 30(0) 19(0)
    Level 3 00-11 Doublon split on characters: 1 7 33 25 23 13 11
    Level 3 values: 1(0) 7(0) 33(0) 25(0) 23(0) 13(0) 11(0) 10(0)
      Level 4 Agglom. Split
        Taxon: Berberis
        Taxon: Mahonia
    Level 3 values: 1(1) 7(1) 33(1) 25(1) 23(1) 13(1) 11(1) 10(1)
      Taxon: Ranzania
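
A cladogram like this is a tree whose leaves are taxa, so it can be held in a simple nested structure. The sketch below encodes one reading of the output above (Diphylleia with Dysosma on one branch; Berberis and Mahonia grouped together, then joined with Ranzania on the other) and prints it in Newick notation. The grouping is an interpretation of the level numbers in the listing, not something Phylsyst itself emitted, and the function names are assumptions for this example.

```python
# A tree is either a taxon name (str) or a tuple of subtrees.
# This topology is one interpretation of the clado1 listing.
CLADO1 = (
    ("Diphylleia", "Dysosma"),
    (("Berberis", "Mahonia"), "Ranzania"),
)


def to_newick(node) -> str:
    """Serialize a nested-tuple tree as a Newick string (no terminator)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def taxa(node) -> list:
    """List the leaf names of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in taxa(child)]


print(to_newick(CLADO1) + ";")
print(taxa(CLADO1))
```

Newick is a common plain-text exchange format for trees, which is why it is used here; the character splits that justify each branch are not represented in this sketch.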


Human Genome Project

• Goal of the Human Genome Project:
  – identify all the approximately 30,000 genes in human DNA,
  – determine the sequences of the 3 billion chemical base pairs that make up human DNA,
  – store this information in databases,
  – improve tools for data analysis,
  – transfer related technologies to the private sector, and
  – address the ethical, legal, and social issues (ELSI) that may arise from the project.

• Completed in 2003
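
To give a sense of scale for the "store this information in databases" goal, the sketch below estimates the raw size of 3 billion bases under the assumption that each base (A, C, G or T) is packed into 2 bits. The encoding table and function names are illustrative choices, not a description of how any genome database actually stores sequences.

```python
BASE_PAIRS = 3_000_000_000  # figure quoted on the slide
BITS_PER_BASE = 2           # assumption: 4 symbols fit in 2 bits each

# Illustrative 2-bit code for each base.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def packed_size_bytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store n_bases at the given width, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


def pack(seq: str) -> int:
    """Pack a short DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


print(packed_size_bytes(BASE_PAIRS) / 1_000_000, "MB")
print(bin(pack("ACGT")))
```

Under that assumption one strand of the sequence fits in 750 MB, which is small by database standards; annotations, quality scores, and multiple individuals account for most of the storage in practice.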


Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001


The Human Genome Project

• The Human Genome Project


The Visible Human Project

• Image understanding – the Visible Human Project