CS882: Advanced Topics in Bioinformatics History and Frontier of Bioinformatics

CS882: Advanced Topics in Bioinformaticsbinma/cs882/1.pdf · –Proteomics • Bioinformatics is too broad an area to be fully surveyed in a ... • Students choose between a seminar

Mar 18, 2018


phamdieu

CS882: Advanced Topics in Bioinformatics

History and Frontier of Bioinformatics


Experience is a dear teacher, but fools will learn at no other.

-- Benjamin Franklin

Why Study History?


This Course

• I will try to introduce bioinformatics problems in the context of history.
– Developments in biology that led to bioinformatics
– Sequence comparison
– Genome sequencing
– Proteomics

• Bioinformatics is too broad an area to be fully surveyed in one course.
– This course samples the work in the field.

• From the course, I hope you can learn
– Many bioinformatics research problems.
– How the bioinformatics area evolved.
– How to choose research problems.
– How to formulate interesting, useful, and solvable problems.


Evaluation

• Students choose between a seminar and a course project.
• Seminars
– Read several (10+) related research articles.
– Write a survey, and comment on the significance and impact of the papers.
– Why (bother)? Who (did it)? What (was achieved)? When? How? So what (the impacts)?
– Predict future developments.
• Projects
– A few small course projects are available for selection.
– Coding is involved.
– Write a report.
• Both options require a presentation.
• Evaluation:
– participation (20%)
– verbal presentation (40%)
– written report (40%)


Ways to Find Survey Topics

• A research area (or a sub-area)

– Sequence comparison

– Genome sequencing

– Proteomics

– Phylogeny

– Gene expression

– Protein-protein interaction network

– Haplotype

– Protein structures

• A research problem

– Find a research paper (RECOMB 2013, 2014)

– Read its references.

– Read the papers that cite the important references (Google Scholar is useful).

– Survey the history and development of that research problem.


Research Projects

• Select one of the following

– Study the genome and proteome similarity between organisms.

– Succinct representation of redundant data (compression isn’t the only goal; efficient access is also important).

• EST

• Protein

• NGS

• I’ll come up with more next week…


A QUICK REVIEW OF BACKGROUND

Important Biology Advancements that Led to Bioinformatics


Central Dogma


Ancient Time

• Our ancestors knew something about genetics.

• Inheritance: Things produce children just like themselves.

• Selective breeding.

Image credit: The Cartoon Guide to Genetics by Gonick and Wheelis


Gregor Mendel (1822 – 1884)

A scientist and a monk from Austria.

Mendel studied pea plants.


Genotypes Control Phenotypes

Homozygous Heterozygous

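• In this case, there are two alleles (variants) of the same gene, A and a, for the height of the pea plants.

• Each individual has two copies of the gene, one from each parent. The pair of copies (AA, Aa, or aa) is the individual's genotype.

• A dominates a: an individual with at least one copy of A shows the trait of A.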


Hybridization

Homozygous Heterozygous


Descendants of Heterozygous Pea
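
The expected descendants of two heterozygous (Aa) plants can be enumerated in a few lines. This is a minimal sketch, assuming each parent passes on either of its two copies with equal probability; the function names `cross` and `phenotype` and the labels "tall" and "short" are illustrative and do not come from the slides.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes over all equally likely allele pairings."""
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' count as the same genotype
        # (uppercase sorts before lowercase, giving 'Aa').
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """A dominates a: one copy of A is enough to show the dominant trait.

    The labels 'tall' and 'short' are assumed for illustration.
    """
    return "tall" if "A" in genotype else "short"


genotypes = cross("Aa", "Aa")
phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count

print(dict(genotypes))   # genotype counts out of 4 pairings
print(dict(phenotypes))  # phenotype counts out of 4 pairings
```

Under these assumptions the genotypes AA, Aa, and aa appear in a 1:2:1 ratio, and the dominant and recessive phenotypes in a 3:1 ratio.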


A Comment

• Experiment → Data → Knowledge

• In the early days, the data were small enough to be processed by a human.

• But today it requires computers; that is bioinformatics.


Rediscovery of Mendel's Work

• Mendel’s work was ignored by the world until it was rediscovered around 1900 by Hugo de Vries and Carl Correns.

– 16 years after Mendel died.


Darwin (1809 – 1882)

“I have called this principle, by which each slight variation, if useful, is preserved, by the term Natural Selection.”

Evolution


Phylogeny Trees

• In the past, people dug up fossils to study evolution.

• Now we use characteristics (e.g. DNA sequences) of today’s species to computationally reconstruct the evolutionary history.


Chromosomes

Walter Sutton (left) and Theodor Boveri (right) independently developed the chromosome theory of inheritance in 1902.

Humans have 23 pairs of chromosomes.


Genetic Map

If two genes are on the same chromosome, they tend to be inherited together.

(AB, ab) x (AB, ab) will not give the combination Ab if there is no crossover.


Chromosome Crossover

With this model, can you suggest a way to build a genetic map (or linkage map)?
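
One possible answer, sketched under simplifying assumptions: the farther apart two genes sit on a chromosome, the more often a crossover falls between them, so the fraction of offspring carrying a recombinant combination (Ab or aB) can serve as a measure of their distance. The counts below are made up for illustration, and `recombination_fraction` is a hypothetical helper name.

```python
def recombination_fraction(offspring_counts: dict[str, int],
                           parental: set[str]) -> float:
    """Fraction of offspring whose allele combination is not a parental one."""
    total = sum(offspring_counts.values())
    recombinant = sum(count for combination, count in offspring_counts.items()
                      if combination not in parental)
    return recombinant / total


# Made-up counts of the combinations passed on by an (AB, ab) parent.
counts = {"AB": 430, "ab": 420, "Ab": 80, "aB": 70}
r = recombination_fraction(counts, parental={"AB", "ab"})
print(f"recombination fraction = {r:.2f}")
```

Repeating this estimate for many gene pairs gives pairwise distances, and genes can then be ordered along the chromosome so that the distances are roughly additive.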


DNA


Base Pairs

• 4 different nucleotides in DNA.

– A, C, G, T

• A single strand is a sequence of A, C, G, T.

• The complementary strand can be computed from it: A pairs with T, and C pairs with G. These are base pairs.
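
The pairing rule translates directly into code. This is a minimal sketch, assuming the input contains only the uppercase letters A, C, G, T; the function names are illustrative. Because the two strands run in opposite directions, the complementary strand is usually written reversed (the "reverse complement").

```python
# Base-pairing rule: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the paired base for each position, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in its own direction."""
    return complement(strand)[::-1]


print(complement("ATACTCAC"))          # TATGAGTG
print(reverse_complement("ATACTCAC"))  # GTGAGTAT
```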


DNA Replication


Before the Discovery of DNA

• In 1869, DNA was first isolated by the Swiss physician Friedrich Miescher. He called it "nuclein" because it was found in the nuclei of cells.

• In 1878, Albrecht Kossel isolated the non-protein component of "nuclein", nucleic acid, and later isolated its five primary nucleobases.

• In 1927, Nikolai Koltsov proposed that traits would be inherited via a "giant hereditary molecule" which would be made up of "two mirror strands that would replicate in a semi-conservative fashion using each strand as a template".


First Confirmation of DNA’s Role

• 1928, Griffith concluded that the type II-R strain had been "transformed" into the lethal III-S strain by a "transforming principle" that was somehow part of the dead III-S strain bacteria.

• 1944, Oswald Avery, Colin MacLeod, and Maclyn McCarty confirmed that DNA was the “transforming principle”.

Griffith’s Experiment


Discovery of DNA Structure

• 1952, Rosalind Franklin and Raymond Gosling used X-ray crystallography to help visualize the structure of DNA.

• 1953, James D. Watson and Francis Crick suggested the first correct double-helix model of DNA structure.

• In 1962, after Franklin's death, Watson, Crick, and Wilkins jointly received the Nobel Prize in Physiology or Medicine.

Rosalind Franklin


DNA sequencing

• Sanger Sequencing was developed in 1977 and soon became the method of choice.

ATACTCAC… (DNA to be sequenced)

Grow the other strand using the target DNA as a template.

• Monomers: A, C, G, T, and a modified A.
• If a growing strand incorporates the modified A, its growth stops.


Sanger Sequencing

Do the experiment for all four bases, and separate different lengths with gel electrophoresis.
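
The read-out logic can be sketched as a toy simulation. It is idealized: it assumes that, for each base, a stopped fragment is observed at every position where that base occurs in the newly grown strand, and that the gel resolves lengths exactly. The names `sanger_lanes` and `read_gel` are illustrative only.

```python
def sanger_lanes(new_strand: str) -> dict[str, list[int]]:
    """For each base, list the lengths of fragments stopped by that modified base."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # In the experiment for this base, a growing copy may pick up the
        # modified monomer here and stop, leaving a fragment of this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by visiting fragment lengths from shortest to longest."""
    length_to_base = {length: base
                      for base, lengths in lanes.items()
                      for length in lengths}
    return "".join(length_to_base[length] for length in sorted(length_to_base))


new_strand = "TATGAGTG"  # strand grown on the template ATACTCAC
lanes = sanger_lanes(new_strand)
print(lanes)
print(read_gel(lanes))
```

Each of the four experiments reveals only the positions of one base; merging the four lanes by fragment length recovers the full sequence of the grown strand.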


Popularity of DNA Sequencing

• The private sector played an important role.

• Applied Biosystems made the first automated DNA sequencer and a lot of money.

ABI 3130


Applied Biosystems

• May 1981, the company was founded by two scientists/engineers from Hewlett-Packard, Sam Eletr and Andre Marion.

• 1982, first commercial instrument, the Model 470A Protein Sequencer. 40 employees, $400K revenue.

• 1983, 80 employees, IPO, revenue US$5.9 million. Model 380A DNA Synthesizer. Licensed automated sequencing technology using fluorescent dyes from CIT (California Institute of Technology).

• 1984, revenue US$18 million, 200 employees.

• 1985, revenue US$35 million.

• 1986, revenue US$52 million. The release of the Model 370A DNA Sequencing System, using fluorescent tags, revolutionized gene discovery.

• 1987, revenue US$85 million, 788 employees.

• 1988, revenue US$132 million, 1000 employees. In that year, for the first time, genetic science was able to identify individuals by their DNA.

• 1989, revenue reached nearly $160 million.

• 1990, the U.S. government approved financing to support the Human Genome Project.

• 1993, acquired by Perkin Elmer.


Human Genome Project

• The Human Genome Project (HGP) was an international project with the primary goal of sequencing and annotating the human genome.

– October 1990, launch.

– 2003, finished sequencing and initial analysis.

– Publicly funded; more than 3 billion dollars spent.


Celera Corporation

• Founded 1998 by PE Corporation and Dr. J. Craig Venter.

• Craig Venter’s team at TIGR (The Institute for Genomic Research) sequenced the first genome of a free-living organism, the bacterium Haemophilus influenzae, in 1995.

• Competing with the public effort on finishing human genome.

• 2003, finished the human genome at almost the same time as the public effort (Venter announced the victory).

• Data:
– Public: 13 years, $3 billion
– Celera: 5 years, $300 million

• Celera had access to prior knowledge.


Competing in Bioinformatics

Gene Myers vs. Jim Kent


• “In a short time it will be hard to realize how we managed without the sequence data. Biology will never be the same again.”

-- N. Williams. Closing in on the complete yeast genome sequence. Science, 268:1560-61. 1995.

The rise of Bioinformatics!


Genome Sequenced, So What?

• It turns out that genome sequencing isn’t the end of the story.
• People have called the period after 2003 the post-genome era.
• It was a landmark but didn’t solve all problems.
• First, your genome and my genome are different.
– 1000 Genomes Project.
• Second, genes are expressed differently at different times and in different cells and conditions.
– Gene expression (microarray)
– A flash in the pan.
• Third, proteins are not only expressed differently, but also modified.
– Proteomics (mass spectrometry).
– Human Proteome Organization (HUPO).
– Proteomics has started to produce some “biomarkers”.
• Metabolomics, glycomics …
– Life is very complex.


Wrap Up

• Pre-bioinformatics developments in biology

• Emergence of bioinformatics
– Bioinformatics deals with data

• Initial impacts of bioinformatics

• Many more years of challenging problems for bioinformatics.
– Triggered by new measurement technologies.
– In turn, accelerating the development of new measurement technologies.