Introduction to Bioinformatics Dr. Yael Mandel-Gutfreund TA: Oleg Rokhlenko

Jan 18, 2018


Toby Sherman

Transcript
Page 1

Introduction to Bioinformatics

Dr. Yael Mandel-Gutfreund
TA: Oleg Rokhlenko

Page 2

Course Objectives

• To introduce the bioinformatics discipline
• To make the students familiar with the major biological questions that can be addressed by bioinformatics tools
• To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)

Page 3

Course Requirements

1. Submit written assignments.
   1. 9/12 short class assignments, 4/4 home assignments.
   2. Each assignment is to be done and submitted in pairs (except the first two class assignments).
   3. The pairs are ideally composed of a person from computer science and a person from life science.
2. A final project or a take-home exam, submitted in pairs.
3. The course web site: http://webcourse.cs.technion.ac.il/236523

Page 4

Grading

• 10% class assignments
• 30% home assignments
• 60% final project/test

Page 5

Literature list

• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.

Advanced Reading

• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

Page 6

Course Outline

• Introduction to bioinformatics
• Bioinformatics databases
• Pairwise and multiple sequence alignment
• Searching for sequences in databases
• Searching for motifs in sequences
• Phylogenetics
• RNA secondary structure
• Protein structure: secondary and tertiary structure
• Protein families: motifs, domains, clustering
• The Human Genome Project
• Gene prediction, alternative splicing
• Gene expression analysis (DNA microarrays)
• Comparative genomics, biological networks


Page 8

Introduction to Bioinformatics

• What is Bioinformatics?
• From DNA to Genome
• What’s next? The post-genomic era

Page 9

What is Bioinformatics?

“the field of science in which biology, computer science, and information technology merge to form a single discipline.

Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.”

Page 10

Central Paradigm in Molecular Biology

Gene (DNA) → [Transcription] → mRNA → [Translation] → Protein

DNA → RNA → Protein → Symptoms (Phenotype)
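The two steps named on this slide, transcription and translation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy input sequence, and the deliberately partial codon table are not from the course material, and the input is assumed to be the coding strand read in frame from its first base.

```python
# Minimal sketch of the DNA -> mRNA -> protein flow (illustrative names).
# Assumption: only a handful of codons are listed; the full table has 64.
CODON_TABLE = {
    "AUG": "M", "GGA": "G", "CAA": "Q", "UUG": "L",
    "GUU": "V", "UCU": "S", "CUG": "L", "AAU": "N",
    "UGA": "*", "UAA": "*", "UAG": "*",  # "*" marks a stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # "X" stands for a codon missing from the partial table above.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence chosen for this example only.
mrna = transcribe("ATGGGACAATTGGTTTCTTGA")
protein = translate(mrna)
print(mrna, protein)
```

A real analysis would use a complete genetic code table and would also have to decide which strand and reading frame to translate.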

Page 11

21st century biology: from a purely lab-based science to an information science

Page 12

Central Paradigm of Bioinformatics

Genetic Information → Molecular Structure → Biochemical Function → Symptoms

Page 13

From DNA to Genome

[Timeline figure, 1955 to 1985. Events shown:]

• Watson and Crick DNA model
• Sanger sequences insulin protein
• ARPANET (early Internet)
• Sanger dideoxy DNA sequencing
• PDB (Protein Data Bank)
• Needleman-Wunsch (N-W) sequence alignment
• GenBank database
• PCR (Polymerase Chain Reaction)
• Dayhoff’s Atlas of Protein Sequences

Page 14

[Timeline figure, continued, 1990 to 2000. Events shown:]

• SWISS-PROT database
• USA’s NCBI
• WWW (World Wide Web)
• Celera Genomics
• First human genome draft
• Israel’s INN
• Human Genome Initiative
• BLAST algorithm
• FASTA algorithm
• First bacterial genome
• Europe’s EBI
• Yeast genome

Page 15

Complete Genomes

• 1994: 0
• 1995: 1
• 2004: 234

• eukaryotes: 20
• bacteria: 194
• archaea: 19

Page 16

What’s Next?

The “post-genomics” era

Goal: to understand the functional networks of a living cell

• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics

Page 17

Annotation

Open reading frames

Functional sites

Structure, function

Page 18

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATGCGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAACTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTCAGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGAAGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAATAT GGA CAA TTG GTT TCT TCT CTG AAT .................... TGAAAAACGTA

Page 19

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATGCGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAACTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTCAGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGAAGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAATAT GGA CAA TTG GTT TCT TCT CTG AAT .................................

.............. TGAAAAACGTA

[Figure labels on the sequence: TF binding site, promoter, Transcription Start Site, Ribosome binding site]

ORF = Open Reading Frame
CDS = Coding Sequence
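To make the ORF label concrete, here is a minimal sketch of an open reading frame scan on the forward strand. It is an illustration, not the annotation method used in the course: the function name `find_orfs`, the `min_codons` parameter, and the toy sequence are assumptions made for this example, and a real gene finder would also scan the reverse strand and use far more evidence than start and stop codons.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for ORFs on the forward strand.

    An ORF here is ATG followed by in-frame codons up to and including
    the first stop codon. Coordinates are 0-based and end-exclusive.
    """
    dna = re.sub(r"[^ACGT]", "", dna.upper())  # drop spaces, dots, etc.
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence chosen for this example only.
orfs = find_orfs("CCATGGGACAATTGTGAAA")
print(orfs)
```

The scan keeps the three reading frames separate because a stop codon only terminates an ORF when it is in the same frame as the start codon.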

Page 20

Comparative genomics

Comparing ORFs

Identifying orthologs

Drawing conclusions about structure and function

Comparing functional sites

Drawing conclusions about regulatory networks

Page 21

Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.

Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.

Page 22

Ultraconserved Elements in the Human Genome

Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, David Haussler

There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.
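The criterion in the abstract (100% identity with no insertions or deletions over at least 200 bp) can be illustrated with a short sketch. The slide does not describe how the authors actually computed these segments, so this is only an assumption-laden toy: it presumes the orthologous regions have already been aligned into equal-length gapped strings, and the function name `conserved_runs`, the toy sequences, and the small `min_length` used in the demo are all made up for illustration.

```python
def conserved_runs(aligned_seqs, min_length=200):
    """Return (start, end) column ranges where all sequences are identical.

    `aligned_seqs` is a list of equal-length strings from a multiple
    alignment; '-' marks a gap and always breaks a run.
    Coordinates are 0-based, end-exclusive alignment columns.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    run_start = None
    # Iterate one column past the end so a run touching the end is closed.
    for col in range(length + 1):
        identical = (
            col < length
            and aligned_seqs[0][col] != "-"
            and all(s[col] == aligned_seqs[0][col] for s in aligned_seqs)
        )
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_length:
                runs.append((run_start, col))
            run_start = None
    return runs


# Toy alignment; real ultraconserved elements use min_length=200.
human = "ACGTACGTTTGACA"
mouse = "ACGTACGTATGACA"
rat = "ACGTACGT-TGACA"
runs = conserved_runs([human, mouse, rat], min_length=5)
print(runs)
```

In the toy alignment the single mismatched, gapped column splits the sequence into two perfectly conserved runs; raising `min_length` to 6 would keep only the first.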

Page 23

Functional genomics

Genome-wide profiling of:
• mRNA levels
• Protein levels

Co-expression of genes and/or proteins

Identifying protein-protein interactions

Networks of interactions

Page 24

Understanding the function of genes and other parts of the genome

Page 25

Structural genomics

Assign structure to all proteins encoded in a genome

Page 26

Structural Genomics Expectations

Currently: 27,761 structures and ~300 unique folds in PDB

Page 27

Structural Genomics Expectations

Estimate: 1000-3000 unique folds in “structure space”


Page 29

Database Types

Sequence databases
  General: GenBank, EMBL, PIR, Swiss-Prot
  Special: TF binding sites, promoters
  Genomes

Structure databases
  General: PDB
  Special: specific protein families, folds

Databases of experimental results
  Co-expressed genes, protein-protein interactions, etc.

Page 30

• World Wide Web
  – USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
  – European Bioinformatics Institute: www.ebi.ac.uk
  – ExPASy Molecular Biology Server: www.expasy.org
  – Israeli National Node: inn.org.il

http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm

Page 31

Entrez – NCBI Engine

• Entrez is the integrated, text-based search and retrieval system used at NCBI for the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others.

http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?itool=toolbar
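The slide describes Entrez as a web search page. NCBI also exposes Entrez programmatically through its E-utilities, which the slides do not mention, so the sketch below is an addition made under that assumption. It only builds an ESearch request URL and performs no network access; the helper name `esearch_url` and the example query are hypothetical.

```python
from urllib.parse import urlencode

# ESearch endpoint of the NCBI E-utilities (not mentioned in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL; no network request is made here."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example query chosen for illustration only.
url = esearch_url("nucleotide", "IGFALS AND human[organism]")
print(url)
```

Choosing a different `db` value, such as `pubmed` or `protein`, targets another of the databases that Entrez integrates.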

Page 32

Entrez – NCBI Engine

Page 33

Nucleotide

The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, and PDB.

April 2004: 38,989,342,565 bases

Page 34

PubMed

• MEDLINE publication database
  – Over 17,000 journals
  – Some other citations
• Papers from the 1960s
  – Over 12,000,000 entries
• Alerting services
  – http://www.pubcrawler.ie/
  – http://www.biomail.org/

Page 35

OMIM

• Online Mendelian Inheritance in Man
  – Genes and genetic disorders
  – Edited by a team at Johns Hopkins
  – Updated daily
• Entries
  – 10670 single-loci phenotypes (*)
  – 1294 multi-loci phenotypes (#)
  – 2415 unclassified phenotypes

Page 36

Searching PubMed

• Structureless searches
  – Automatic term mapping
• Structured searches
  – Field names, e.g. [au], [ta], [dp], [ti]
  – Boolean operators, e.g. AND, OR, NOT, ()
• Additional features
  – Subsets, limits
  – Clipboard, history
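A structured search combines the field names and Boolean operators listed above into a single query string. The sketch below shows one way to assemble such a string; the helper names `field`, `all_of`, and `any_of` are hypothetical conveniences invented for this example, and the search terms are illustrative.

```python
def field(term: str, tag: str) -> str:
    """Attach a PubMed field tag to a term, e.g. [au], [ta], [dp], [ti]."""
    return f"{term}[{tag}]"


def all_of(*parts: str) -> str:
    """Join sub-queries with AND inside parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts: str) -> str:
    """Join sub-queries with OR inside parentheses."""
    return "(" + " OR ".join(parts) + ")"


# Title contains "bioinformatics", published in 2003 or 2004.
query = all_of(
    field("bioinformatics", "ti"),
    any_of(field("2003", "dp"), field("2004", "dp")),
)
print(query)
```

The parentheses matter: they make the grouping explicit rather than relying on the search engine's default evaluation order for mixed AND and OR.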

Page 37

Searching OMIM

• Search Fields
  – Disease name, e.g. hypertension
  – Cytogenetic location, e.g. 1p31.6
  – Inheritance, e.g. autosomal dominant
• Browsing Interfaces
  – Alphabetical by disease
  – Genetic map
• Additional features like PubMed