Bioinformatics: Applications. ZOO 5903/4903, Fall 2006, MW 10:30-11:45, Sutton Hall, Room 312. Jonathan D. Wren, Ph.D.
Transcript
Page 1: Bioinformatics

Bioinformatics: Applications

ZOO 5903/4903

Fall 2006, MW 10:30-11:45

Sutton Hall, Room 312

Jonathan D. Wren, Ph.D.

Page 2: Bioinformatics

Prereqs

General knowledge of biology is required

Programming experience will help but is not necessary

Course will emphasize Internet-based applications and databases

Page 3: Bioinformatics

Goals for this course

I. Gain an understanding of the range of problems being tackled by bioinformatics

II. Understand what bioinformatics programs are available and being used in biomedical research

III. Be able to interpret the output & results of bioinformatics programs and understand their limitations

IV. Learn more about Biology from an information standpoint

V. For those taking the advanced course, build a knowledge base

Page 4: Bioinformatics

Overview

Grade weighting

Homework         35% (10 total)
First 3 exams    30%
Final exam       20% (comprehensive)
Report           10%
Participation     5%

(homework is generally given on Wednesday and due the following Monday)
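
As a rough illustration of how the weights above combine, the sketch below computes a weighted course percentage. The category names, the function `course_grade`, and the example scores are hypothetical, and it assumes each category is scored on a 0-100 scale; it ignores late penalties and any curve.

```python
# Category weights taken from the slide, written as fractions of the final grade.
WEIGHTS = {
    "homework": 0.35,
    "first_3_exams": 0.30,
    "final_exam": 0.20,
    "report": 0.10,
    "participation": 0.05,
}


def course_grade(scores):
    """Return the weighted course percentage from per-category percentages (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example_scores = {
    "homework": 90,
    "first_3_exams": 80,
    "final_exam": 85,
    "report": 95,
    "participation": 100,
}

print(round(course_grade(example_scores), 2))
```

Because homework carries the largest single weight, a consistent homework record moves the total more than any one exam does.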

Page 5: Bioinformatics

Textbook

Plus, there will be supplementary reading assignments, usually online, and a few extra (suggested) readings

Page 6: Bioinformatics

General policy/philosophy

Lectures complement and enhance (but don’t duplicate) reading assignments

Homeworks are intended to bridge theory & practice, keep you on track for exams, and permit you to go at your own pace.

Homeworks may be turned in late, but for every class they are late, 10% will be deducted.

Exams are intended to gauge synthesis of topics covered

I do believe in curves, but can’t guarantee their application for every grade

Page 7: Bioinformatics

Office: Room 2025, Stephenson Center for Research and Technology

Hours: 1 pm – 4 pm Monday

1 pm – 4 pm Wednesday

[email protected]

Page 8: Bioinformatics

Strategic overview

Instruction set

Active programs

Product

System

Network

Page 9: Bioinformatics

Outline for 1st exam classes (DNA thread)

Introduction to bioinformatics

Genomes, sequencing and features

Sequence alignment basics

Multiple sequence alignments

Phylogenetics – evolutionary relationships

Genome comparison & diversity

Page 10: Bioinformatics

What is Bioinformatics?

Development of methods & algorithms to organize, integrate, analyze and interpret biological and biomedical data

Study of the inherent structure & flow of biological information

Goals of bioinformatics:

Identify patterns

Classify

Make predictions

Create models

Better utilize existing knowledge

“Two months in the lab can easily save an afternoon on the computer.” –Alan Bleasby

Page 11: Bioinformatics

The “old” biology

The most challenging task for a scientist is to get good data

Page 12: Bioinformatics

The “new” biology

The most challenging task for a scientist is to make sense of lots of data

Page 13: Bioinformatics

Old vs New - What’s the difference? 1) Economics

Miniaturize – less cost

Multiplex – more data

Parallelize – save time

Automate – minimize human intervention

Thus, you must be able to deal with large amounts of data and trust the process that generated it

Page 14: Bioinformatics

What’s the difference? 2) Scale

From gene sequencing (~ 1 KB) to genome sequencing (many MB, even GB)

From picking several genes for expression studies to analyzing the expression patterns of all genes

From a catalog of key genes in a few key species to a catalog of all genes in many species

Analyzing your data in isolation makes less sense when you can make much more powerful statements by including data from others

Page 15: Bioinformatics

What’s the difference? 3) Logic

Hypothesis-driven research to data-driven research

Expertise-driven approach versus information-driven approach

Reductionist versus integrationist

How to answer the question becomes how to question an answer

Algorithmic approaches for filtering, normalizing, analyzing and interpreting become increasingly important

Page 16: Bioinformatics

Data-driven Science Done Wrong

Must have some hypothesis – data is not the end goal of science

Finding patterns in the data is where analysis starts, not ends

Must understand the limits of high-throughput technology (e.g. microarrays measure transcription only, one genome does not tell you about species variation, etc.)

Must understand or explore the limits of your algorithm

Page 17: Bioinformatics

Data is being collected faster and in greater amounts

[Figure: estimated number of databases (y-axis, 0 to 700) by year (x-axis, 1980 to 2005)]

Page 18: Bioinformatics

Growth in microarray publications

[Figure: number of microarray papers (y-axis, 0 to 14,000) by year (x-axis, 1998 to 2005)]

Page 19: Bioinformatics

Growth in information & knowledge

MEDLINE spans:

>4,800 Journals

>16,000,000 records

672,000 new papers in 2005 (~1,840 per day)

[Figure: number of articles in MEDLINE, in millions (y-axis, 0 to 16) by year (x-axis, 1973 to 2005)]

Page 20: Bioinformatics

The use of software & algorithms is becoming more common in biomedical research

Page 21: Bioinformatics

Bio informatics

Page 22: Bioinformatics

Discipline                        # Journals  Total Papers  10 YR   3 YR    Trend
Biophysics                        28          158,004       6.6%    10.2%   55.1%
Physiology                        36          111,827       5.6%    8.1%    44.0%
Genetics                          47          107,440       6.3%    8.7%    37.8%
Food industry                     22          17,359        5.2%    7.0%    35.7%
Biotechnology                     66          94,690        6.8%    9.2%    33.8%
Oncology                          64          223,394       4.3%    5.6%    30.0%
Immunology                        81          159,287       1.6%    2.0%    29.4%
Pharmacology                      137         238,989       3.8%    4.9%    29.4%
Toxicology                        48          76,085        4.3%    5.4%    27.0%
Biomedicine                       101         149,531       11.5%   14.5%   26.2%
Botany                            62          39,747        3.4%    4.3%    25.6%
Multidisciplinary Sciences        34          120,273       6.1%    7.6%    25.6%
Dermatology                       25          58,757        3.4%    4.3%    25.1%
Microbiology                      77          178,945       2.9%    3.6%    25.0%
Gastroenterology                  30          80,279        6.5%    8.1%    23.6%
Dentistry                         36          59,034        8.6%    10.6%   23.3%
Critical care                     33          80,706        8.8%    10.9%   23.1%
Biochemistry                      110         398,602       5.7%    7.0%    23.0%
Virology and Infectious Diseases  27          76,220        2.9%    3.5%    22.4%
Obstetrics and Gynecology         36          92,132        5.4%    6.6%    21.0%
Biology                           167         221,553       4.3%    5.2%    21.0%
Endocrinology                     83          197,518       4.2%    5.1%    19.5%
Veterinary                        64          97,027        3.2%    3.9%    19.0%
Urology                           38          100,264       6.8%    8.0%    17.2%
General Medicine                  118         236,577       8.5%    10.0%   16.5%
Agriculture                       25          11,818        2.6%    3.1%    16.3%
Psychiatry                        43          76,524        8.5%    9.9%    15.9%
Chemistry                         36          44,772        8.2%    9.4%    14.0%
Geriatrics                        16          22,254        5.8%    6.6%    14.0%
Surgery                           86          234,211       12.3%   14.0%   13.9%
Health Research                   60          76,445        9.3%    10.6%   13.5%
Otorhinolaryngology               19          36,150        16.1%   18.2%   13.0%
Animal Husbandry and Research     14          27,631        5.1%    5.5%    9.1%
Orthopedics                       43          88,183        10.1%   11.0%   8.6%
Acoustics                         10          16,558        23.1%   25.0%   8.1%
Radiology                         66          123,267       45.1%   48.5%   7.5%
Neurology                         140         334,854       7.8%    8.3%    6.9%
Pediatrics                        49          111,268       8.5%    9.1%    6.9%
Cardiology                        107         295,901       7.4%    7.9%    6.2%
Organic Chemistry                 10          15,674        3.0%    3.1%    5.3%
Applied and Medicinal Chemistry   23          38,244        5.6%    5.9%    4.8%
Sport sciences                    34          44,932        9.5%    9.8%    3.1%
Psychology                        21          17,060        12.6%   12.9%   3.0%
Pathology                         40          84,812        6.4%    6.5%    1.0%
Computer Applications             17          10,439        85.0%   85.5%   0.6%
Analytical Chemistry              22          55,174        8.0%    8.0%    0.3%
Physics                           12          15,279        6.7%    6.6%    -1.7%
Ophthalmology                     34          86,172        8.9%    8.7%    -2.0%
Environmental                     53          48,044        5.3%    5.1%    -4.3%
Zoology                           41          32,012        3.5%    3.1%    -12.9%

The use of bioinformatics and its rate of growth are field-dependent
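
The slide does not define the Trend column. The sketch below tests one plausible reading, labeled here as an assumption: that Trend is the relative change from the 10 YR percentage to the 3 YR percentage. The function name `relative_change` is hypothetical. Because the table shows rounded inputs, the computed values can only be expected to match the listed ones approximately.

```python
def relative_change(ten_year_pct, three_year_pct):
    """Percent change from the 10-year value to the 3-year value (assumed meaning of Trend)."""
    return (three_year_pct - ten_year_pct) / ten_year_pct * 100.0


# (10 YR, 3 YR, listed Trend) copied from three rows of the table.
rows = {
    "Biophysics": (6.6, 10.2, 55.1),
    "Radiology": (45.1, 48.5, 7.5),
    "Computer Applications": (85.0, 85.5, 0.6),
}

for name, (ten, three, listed) in rows.items():
    computed = relative_change(ten, three)
    print(f"{name}: computed {computed:.1f}%, listed {listed}%")
```

Under this reading, a field such as Radiology can have a high absolute percentage yet a small trend, while Biophysics starts low and grows quickly.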

Page 23: Bioinformatics

Data => Information => Knowledge

Gene X mutated in disease Y

Page 24: Bioinformatics

Data => Information => Knowledge

Gene X mutated in disease Y

Gene X unmutated in normal controls

Gene X is correlated with disease Y

Page 25: Bioinformatics

Data => Information => Knowledge

Gene X mutated in disease Y

Gene X unmutated in normal controls

Gene X is correlated with disease Y

Gene X homozygotes have severe disease

Gene X is probably causal in disease Y
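
The step from data (mutation status in patients and controls) to information (a correlation) can be sketched with a 2x2 table. This is one common way to quantify such an association, not necessarily the approach used in the course; the counts are invented for illustration and the function names are hypothetical. It computes an odds ratio and a one-sided Fisher exact p-value using only the standard library.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]; requires b and c to be nonzero."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` mutated cases if mutation and disease
    were unrelated, with the row and column totals held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    favorable = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return favorable / total


# Hypothetical counts, for illustration only:
#                     mutated   unmutated
# disease Y patients     18          2
# normal controls         1         19
a, b, c, d = 18, 2, 1, 19

print("odds ratio:", odds_ratio(a, b, c, d))
print("one-sided p-value:", fisher_one_sided(a, b, c, d))
```

A strong association of this kind supports only the "correlated" statement on the slide; the move to "probably causal" relies on additional evidence such as the severity seen in homozygotes.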

Page 26: Bioinformatics

Bioinformatics – what can it do for you?

Most research has a growing computational aspect to it

No matter what you do after graduation, being tech-savvy gives you an advantage

Boundaries between disciplines are blurring

Page 27: Bioinformatics
Page 28: Bioinformatics
Page 29: Bioinformatics

For next time

Read Mount, Chapter 2, pages 33-40 and page 45 (FASTA format)

Read Mount, Chapter 11, pages 496-511
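
As a preview of the FASTA format named in the first reading, here is a minimal sketch of a reader. It assumes the common convention that each record starts with a header line beginning with ">" and is followed by one or more sequence lines. The function name `parse_fasta` and the example sequences are made up for illustration; the assigned reading remains the reference for the format's details.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without the leading '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up records, for illustration only.
example = """>seq1 hypothetical gene fragment
ATGGCGTAC
GGATTC
>seq2 another hypothetical fragment
TTGACC
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence), sequence)
```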