A BIOINFORMATIC GENE HUNTING

A BIOINFORMATIC GENE HUNTING. E-learning "Tools and tips for science teachers" .

Dec 19, 2015

Page 1

A BIOINFORMATIC GENE HUNTING

Page 2

E-learning "Tools and tips for science teachers"

http://ariel.ctu.unimi.it/corsi/bioteach/home

Page 3

Bioinformatics

When biology meets informatics

Page 4

What is bioinformatics?

• Creation and maintenance of databases to store biological information

• Development of mathematical and statistical tools for analysis, interpretation and continuous updating of biological information

• Development of new tools to assess relationships among members of large data sets in order to obtain a comprehensive picture of normal cellular activities and their alterations

• Data sharing

Page 5

Bioinformatics includes:

1. Databases collecting experimental data generated in research laboratories

2. Software for navigating databases

Page 6

Where does bioinformatics stem from?

Human Genome Project

Experimental efforts to determine structure and function of biological molecules

Production of large data sets

Molecular biology databases (genes and proteins)

Interpretation

Techniques, tools, algorithms for analysis, comparison, classification, interpretation

Page 7

The global approach to the study of biological data refers to the analysis and comparison of:

• Genomes (the whole genetic information of a given organism)

• Transcriptomes (the full set of RNAs of a given organism)

• Proteomes (the full set of proteins of a given organism)

Page 8

Applications of bioinformatics analysis

MEDICINE

AGRICULTURE

PHARMACEUTICS

Page 9

Databases

A database is a collection of information.

Databases are made of “entries”.

Page 10

Biological databases

• A biological database is a large collection of information and data derived from laboratory studies (in vitro and in vivo analysis), from bioinformatics (in silico analysis) and from the scientific literature.

• Data are structured so as to enable efficient user access and management of different types of information.

Page 11

Bioinformatics was essential to obtain the complete sequence of the human genome

Whole genome shotgun

Genomic DNA

Random long (5-20 kb) and short (0.4-1.2 kb) fragments derived from mechanical breakage of DNA were cloned into vectors and sequenced.

Bidirectional automated sequencing

Computerized reconstruction of genomic sequence
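The computerized reconstruction step can be illustrated with a toy greedy overlap assembler: it repeatedly joins the two fragments whose end and beginning overlap the most. This is a minimal Python sketch, not the software actually used for the human genome; it assumes error-free reads from one strand, no repeated sequences and a minimum overlap of 3 bases, and the reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if the rest of a (from that point) is a prefix of b
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap.

    Returns the resulting contigs (a single string if all reads could be joined).
    """
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical overlapping reads taken from the sequence ATGTTGAAGTTCAAGTATGGT
reads = ["TCAAGTATGGT", "ATGTTGAAG", "GAAGTTCAAG"]
print(greedy_assemble(reads))  # ['ATGTTGAAGTTCAAGTATGGT']
```

Real assemblers also have to cope with sequencing errors, repeats and reads from both DNA strands, which a simple greedy merge cannot handle.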

Page 12

Primary and specialized databases

Primary databases collect nucleotide (DNA, RNA) or protein sequences, together with the general information needed to retrieve each sequence and to identify its species of origin and its function.

Specialized databases collect large sets of homogeneous records (taxonomic, functional, literature, etc.), with additional annotations and specific information.

Page 13

---ATGTTGAAGTTCAAGTATGGT---

--MLKFKYG--

Nucleotide sequence database

Amino acid sequence database

3D structures database

Genetic diseases database

Gene expression database
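The first two lines of this slide are linked by the genetic code: each DNA triplet (codon) is read as one amino acid. A minimal Python sketch, assuming the standard genetic code, a sequence that already starts in the correct reading frame, and translation that stops at the first stop codon (written here as *):

```python
from itertools import product

# Standard genetic code: amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # an incomplete final codon is ignored
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTTGAAGTTCAAGTATGGT"))  # MLKFKYG
```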

Page 14

How to extract information from a database

By entering text in a search box (as with a web search engine, e.g. Google) or by filling in a form.

Different criteria can be combined with Boolean operators to intersect (operator AND), add (operator OR) or exclude (operator BUT NOT) information. Further operators (IN, NEAR and WITH) are available in some search systems for more sophisticated searches.

AND

OR

BUT NOT
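The three basic operators behave like set operations on the lists of entries returned by each search term. A small Python sketch with hypothetical entry IDs (E1, E2, ...) and hypothetical search terms:

```python
# Hypothetical search results: each set holds the IDs of the entries matching one term
kinase = {"E1", "E2", "E3", "E5"}
human = {"E2", "E3", "E4"}
mouse = {"E3", "E6"}

kinase_and_human = kinase & human    # AND: entries matching both terms (intersection)
human_or_mouse = human | mouse       # OR: entries matching either term (union)
kinase_not_human = kinase - human    # BUT NOT: drop entries matching the second term

# Operators can be combined: (kinase AND human) BUT NOT mouse
combined = (kinase & human) - mouse

print(sorted(kinase_and_human))  # ['E2', 'E3']
print(sorted(human_or_mouse))    # ['E2', 'E3', 'E4', 'E6']
print(sorted(kinase_not_human))  # ['E1', 'E5']
print(sorted(combined))          # ['E2']
```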

Page 15

Algorithms in bioinformatics

Algorithms to compare sequences:
- to assess similarities
- to study molecular evolution and phylogenesis

Algorithms to predict:
- genes
- regulatory elements (promoters, etc.)
- RNA structures
- protein structures

Page 16

Some important results obtained by bioinformatics:

• Search for homologous genes in the same and in different species

• Identification of genes and genetic markers

• Identification of disease-associated genes

• Prediction of three-dimensional structures of proteins

• Design of new drugs

• Data sharing

Page 17

Page 18

Genetically based differences in the response to drugs

When two human genomes are compared, single-base differences are found on average every 1,200-1,500 base pairs

Each individual is unique

A new “omics” discipline: PHARMACOGENOMICS
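To get a feel for the figure quoted above, a short Python sketch counts single-base differences between two aligned sequences and scales the quoted spacing up to a whole genome. The genome size used here (about 3.1 billion base pairs for a haploid human genome) is an assumption of this sketch, and the toy sequences are made up.

```python
GENOME_SIZE_BP = 3_100_000_000  # assumption: approximate size of a haploid human genome


def count_single_base_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def expected_differences(genome_size: int, spacing_bp: int) -> int:
    """Number of differences expected if one occurs every `spacing_bp` bases."""
    return genome_size // spacing_bp


# Toy example with made-up sequences: two differences in 10 bases
print(count_single_base_differences("ACGTACGTAC", "ACGTTCGTAA"))  # 2

for spacing in (1200, 1500):
    n = expected_differences(GENOME_SIZE_BP, spacing)
    print(f"1 difference every {spacing} bp -> about {n / 1e6:.1f} million differences")
# prints about 2.6 million (1200 bp) and about 2.1 million (1500 bp)
```

Under this assumption, two human genomes would differ at roughly 2.1 to 2.6 million single positions, which is why each individual is genetically unique.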

Page 19

What is pharmacogenomics for?

Standard patient: standard drug dose

Patient with a genetic defect: reduced dose of the drug (e.g. 1/10 of the standard thiopurine dose)

Page 20

What do you need to know to “surf among the genomes” without being submerged by the waves!

Page 21

Chromosome structure and classification

Figure: metacentric, submetacentric and acrocentric chromosomes, labelled with the short arm (p), long arm (q), centromere and satellite.

Page 22

Human karyotype and chromosome map

Page 23

Chromosome banding

Karyotype: Q banding

Karyotype: G banding

Page 24

Each chromosome has a specific banding pattern

Page 25

Chromosome mutations

Fig. 10.2.1 Chromosome mutations: deletion, translocation, inversion

Lost bases

Gene mutations or point mutations

GAC-AAA-GGA-TGA-CTG original sequence

GAC-AAA-CGA-TGA-CTG substitution

GAC-AAA-TGG-ATG-ACT-G insertion

GAC-AA~G-GAT-GAC-TG deletion
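The three point mutations above can be reproduced in a few lines of Python, which makes the difference between them visible: the substitution changes a single codon, while the insertion and the deletion shift the reading frame of every codon downstream (a frameshift). A minimal sketch; the 0-based positions passed to the helper functions were chosen here to reproduce the slide's examples and are not part of the original.

```python
def as_codons(seq: str) -> str:
    """Split a sequence into triplets; the reading frame starts at the first base."""
    return "-".join(seq[i:i + 3] for i in range(0, len(seq), 3))


def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert `base` before 0-based position `pos`."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq: str, pos: int) -> str:
    """Remove the base at 0-based position `pos`."""
    return seq[:pos] + seq[pos + 1:]


original = "GACAAAGGATGACTG"
print(as_codons(original))                       # GAC-AAA-GGA-TGA-CTG
print(as_codons(substitute(original, 6, "C")))   # GAC-AAA-CGA-TGA-CTG
print(as_codons(insert_base(original, 6, "T")))  # GAC-AAA-TGG-ATG-ACT-G
print(as_codons(delete_base(original, 5)))       # GAC-AAG-GAT-GAC-TG
```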

Page 26

Identification of genes and genetic markers

Page 27

Identification of disorder-associated genes

Page 28

From gene to protein

Figure: a gene (DNA, 5'-3') made of exons 1-4 separated by introns 1-3, with a 5' UTR and a 3' UTR, from the start to the end of transcription. Transcription produces a pre-mRNA (preRNA), maturation (splicing) produces the mRNA, and translation produces the protein (from H2N to COOH).

Page 29

Prediction of genes within a genomic region

• Internal exons (---exon---gt---intron---ag---exon---)

• First exon (5’ UTR sequence)

• Last exon (3’ UTR sequence)

• Unique exons

• Alternative splicing sites

• Promoters (TATA and CAAT boxes)

• Polyadenylation signals (AAUAAA)

• Start codon ATG

• Stop codon
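Of the signals listed above, the start codon and the stop codon are the simplest to search for. The Python sketch below only finds open reading frames (ATG followed by an in-frame stop codon) on the forward strand of an intron-free sequence; real gene prediction programs also model splice sites, promoters, UTRs and both strands. The example sequence and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    `start` and `end` are 0-based, end-exclusive; `min_codons` counts the codons
    before the stop codon, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three reading frames of the forward strand
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAAGG"))  # [(2, 14, 'ATGAAATTTTAA')]
```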

Page 30

splice sites

Page 31

Splicing

Page 32

Alternative splicing

Page 33

Alternative splicing

Page 34

Here is a comprehensive view of what you should find among the genome waves… enjoy your surfing!

Page 35


Finding the Genes

Dr. Blat helping a gene find itself.

Page 37

Bioinformatics uses algorithms

Algorithms to compare sequences:
- to assess similarities
- to study molecular evolution and phylogenesis

Algorithms to predict:
- genes
- regulatory elements (promoters, etc.)
- RNA structures
- protein structures

Page 38

Figure: Sequence similarity searches. A genome sequence accumulates mutations, producing variant copies (“Genome sequerce”, “Ganome sequence”, “Genome spequence”, “Genme sequence”) that illustrate genetic variability.

Page 39

Sequence conservation and evolution

• Evolution implies the generation of morphological and molecular variants.

• At the molecular level, variants are created by errors (mutations) during DNA replication that are not corrected by DNA repair systems.

• Because mutations (single-base substitutions, deletions, insertions) accumulate, DNA segments with the same function in different organisms do not share exactly the same sequence.

Page 40

Sequence alignment programs to study variability

Sequence alignment establishes a one-to-one correspondence between the positions of two sequences (or parts of them), minimizing the number of operations (substitutions, insertions, deletions) needed to transform one sequence into the other.
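The idea of a minimal number of operations can be made concrete with the classic edit (Levenshtein) distance, computed by dynamic programming. A minimal Python sketch, assuming every substitution, insertion and deletion costs 1; real alignment programs use substitution matrices and gap penalties instead. For the sequences SA and SB of Page 42 it returns 5, the same as the 2 mismatches plus 3 gaps of the alignment shown there.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] = distance between the first i-1 characters of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("EVDQKISKWD", "EVKKITRPKWD"))  # 5
```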

Page 41

Alignment is obtained by comparing sequences in a pairwise fashion

Each comparison is given a score, which is a measure of the degree of similarity

Page 42

Alignment:

SA = E V D Q K I S K W D

SB = E V K K I T R P K W D

E V D Q K I S - - K W D
| |     | |       | | |
E V - K K I T R P K W D

Legend: gap (-), mismatch, match (|)

When sequences are not identical, the alignment must contain gaps and mismatches
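The alignment of SA and SB can be summarized by counting its columns. A minimal Python sketch; the scoring values (match +2, mismatch -1, gap -2) are a hypothetical linear scheme chosen for illustration, whereas real alignment programs use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def column_counts(row_a: str, row_b: str) -> tuple[int, int, int]:
    """Count matches, mismatches and gaps in two aligned rows ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def alignment_score(row_a: str, row_b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Score with a hypothetical linear scheme (not a real substitution matrix)."""
    m, mm, g = column_counts(row_a, row_b)
    return m * match + mm * mismatch + g * gap


row_a = "EVDQKIS--KWD"
row_b = "EV-KKITRPKWD"
m, mm, g = column_counts(row_a, row_b)
print(m, mm, g)                           # 7 2 3
print(alignment_score(row_a, row_b))      # 6
print(f"identity: {m / len(row_a):.0%}")  # identity: 58%
```

Identity is computed here over all 12 alignment columns, gaps included; some tools divide by the length of the shorter sequence instead, which would give 70% for the same alignment.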

Page 43

Identity, Similarity and Homology

Identity: the extent to which two sequences are invariant

Similarity: a quantitative parameter defined by the alignment score

Homology: origin from a common ancestor sequence

Page 44

Homologous Sequences

ATA GAAKAVALVLPNLKGKLNGIALRVPTPNVSVVDLVVQVSKK-TFAEEVNAAFRDSAEK-- 328
ATB GAAKAVSLVLPQLKGKLNGIALRVPTPNVSVVDLVINVEKKGLTAEDVNEAFRKAANG-- 351
HS  GAAKAVGKVIPELNGKLTGMAFRVPTANVSVVDLTCRLEKP-AKYDDIKKVVKQASEG-- 268
MM  GAAKAVGKVIPELNGKLTGMAFRVPTPNVSVVDLTCRLEKP-AKYDDIKKVVKQASEG-- 266
XL  GAAKAVGKVIPELNGKITGMAFRVPTPNVSVVDLTCRLQKP-AKYDDIKAAIKTASEG-- 266
DM  GAAKAVGKVIPALNGKLTGMAFRVPTPNVSVVDLTVRLGKG-ASYDEIKAKVQEAANG-- 265
CE  GAAKAVGKVIPELNGKLTGMAFRVPTPDVSVVDLTVRLEKP-ASMDDIKKVVKAAADG-- 274
SP  GAAKAVGKVIPALNGKLTGMAFRVPTPDVSVVDLTVKLAKP-TNYEDIKAAIKAASEG-- 268
ATC GAAKAVGKVLPALNGKLTGMSFRVPTVDVSVVDLTVRLEKA-ATYEEIKKAIKEESEG-- 272
OS  GAAKAVGKVLPDLNGKLTGMSFRVPTVDVSVVDLTVRIEKA-ASYDAIKSAIKSASEG-- 270
SC  GAAKAVGKVLPELQGKLTGMAFRVPTVDVSVVDLTVKLNKE-TTYDEIKKVVKAAAEG-- 266
ECA GAAKAVGKVLPELNGKLTGMAFRVPTPNVSVVDLTVRLEKA-ATYEQIKAAVKAAAEG-- 266
HI  GAAKAVGKVLPALNGKLTGMAFRVPTPNVSVVDLTVNLEKP-ASYDAIKQAIKDAAEGKT 268
ECC GAAKAIGLVIPELSGKLKGHAQRVPVKTGSVTELVSILGKK-VTAEEVNNALKQATTN-- 266

Homologous sequence comparison helps in:

• identifying important structural and functional domains of a given protein

• identifying aa residues responsible for common features and those responsible for different features of a given protein
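Fully conserved positions, such as the initial GAAKA stretch shared by all the sequences above, are the first candidates for structurally or functionally important residues. A Python sketch that lists the columns where all sequences carry the same residue; it assumes the rows are already aligned and of equal length, and as an example uses only the first 10 columns of three rows (ATA, HS, ECC).

```python
def conserved_columns(rows: list[str]) -> list[int]:
    """0-based indices of columns where every row has the same residue (gap columns excluded)."""
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all aligned rows must have the same length")
    return [
        i for i in range(length)
        if rows[0][i] != "-" and all(r[i] == rows[0][i] for r in rows)
    ]


# First 10 columns of rows ATA, HS and ECC of the alignment above
rows = ["GAAKAVALVL", "GAAKAVGKVI", "GAAKAIGLVI"]
print(conserved_columns(rows))  # [0, 1, 2, 3, 4, 8]
```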

Page 45

Degree of sequence conservation

• In sequence alignment, both sequence identity and the degree of conservation of the different aa residues at positions where the two sequences differ are taken into consideration.

• Molecules with similar primary aa sequences tend to have similar secondary and tertiary structures.

• If two proteins share 50% of their sequence, the probability that they have superimposable 3D structures is very high.

Conservative substitutions (two aa with similar chemical properties)

Semi-conservative substitutions

Non-conservative substitutions

Page 46

Genes in evolution

Homologous genes are those that evolved from a common ancestral precursor gene:

• orthologous genes: genes in different species that evolved directly from an ancestral gene, generally maintaining the same function.

• paralogous genes: two genes or clusters of genes at different chromosomal locations in the same organism that have structural similarities and diverged from the parent copy after duplication. In general, their functions differ, although they remain related to that of the ancestral precursor gene.

Page 47

The three-letter and one-letter amino acid code

Page 48

Amino acid polarity

Figure: polar aa and non-polar aa; charged residues marked + and -

Page 49

Sequence conservation during evolution

• Natural selection acts mainly on the 3D structure of proteins (which determines their function), rather than directly on DNA sequences or on the primary structure of proteins

• As a consequence of this and of the degeneracy of the genetic code, the 3D structure of proteins is more conserved than their primary structure, which in turn is more conserved than the nucleotide coding sequence

-ATGTTGAAGTTT-   -M L K F-
-ATGTTGAAGTTT-   -M L K F-
-ATGTTGAAGTTT-   -M L K F-
-ATGTTGAAGTTC-   -M L K F-   aa sequence identity
-ATGTTGAAGTAT-   -M L K Y-   Different aa sequence, conserved structure
-ATGTTGAAGGTT-   -M L K V-   Different aa sequence, altered 3D structure
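The identity values behind this slide can be computed directly. In the Python sketch below the codon table is deliberately partial (only the codons that occur in the example) and the comparison assumes ungapped sequences of equal length. Every variant differs from the reference at one base (11/12 nucleotide identity), yet the protein is either unchanged or differs at one residue; identity alone cannot tell the conserved substitution (F to Y) from the structure-altering one (F to V), which depends on the chemical properties of the amino acids involved.

```python
# Partial codon table: only the codons that occur in the example above
CODONS = {"ATG": "M", "TTG": "L", "AAG": "K", "TTT": "F",
          "TTC": "F", "TAT": "Y", "GTT": "V"}


def translate(dna: str) -> str:
    """Translate an in-frame sequence using the partial table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two equal-length, ungapped sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


reference = "ATGTTGAAGTTT"
for variant in ("ATGTTGAAGTTC", "ATGTTGAAGTAT", "ATGTTGAAGGTT"):
    nt = identity(reference, variant)
    aa = identity(translate(reference), translate(variant))
    print(variant, translate(variant), f"nt {nt:.2f}", f"aa {aa:.2f}")
# ATGTTGAAGTTC MLKF nt 0.92 aa 1.00
# ATGTTGAAGTAT MLKY nt 0.92 aa 0.75
# ATGTTGAAGGTT MLKV nt 0.92 aa 0.75
```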

Page 50

Model organisms

Zebrafish (Danio rerio): 30,000 genes

Mouse (Mus musculus): 30,000 genes

Nematode (Caenorhabditis elegans): 19,000 genes

Fruit fly (Drosophila melanogaster): 13,600 genes

Page 51

Unknown function

The specific function of most of the human genome is unknown