
The Human Genome

HAGenetics.org

Dr. Hasan Alhaddad Guest lecturer: Molecular Basis of Human Diseases

October 12th, 14th, 16th 2014 Room 244 (1 PM)

Lecture structure

• Part I (Sunday Oct 12th):
  • The book of life (Matt Ridley’s analogy with modifications).
  • Introduction to the technologies at the time.

• Part II (Tuesday Oct 14th):
  • Why sequence genomes/the human genome?
  • Genome war (public and private projects).
  • Sequencing the genome.

• Part III (Thursday Oct 16th):
  • Genome assembly revisited.
  • Genome annotation.
  • Genome outcome.
  • The Genomic era.

AIMS (part III)

• Learn the basic principles and terminology of genome assembly.

• Understand the importance of genome annotation.

• Become familiar with the outcomes of the human genome.

• Understand the technologies and applications that were developed due to the human genome project.

• Become familiar with the OMICS.


Genome Assembly Revisited


DNA sequence: the sequence reads produced by the sequencing machine.

This can be considered the primary sequence of the genome.


Sequence alignment: order and connect overlapping sequence reads to form a Contig.

This is something you are likely to do when you sequence a gene.
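The idea of connecting overlapping reads can be sketched in a few lines of Python. This is only a toy illustration, assuming error-free reads that are already listed in the correct order; the function names `overlap_length` and `merge_reads` and the example reads are made up for this sketch and are not part of any real assembler.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads: list[str], min_overlap: int = 3) -> str:
    """Chain reads (assumed to be given in order) into a single contig."""
    contig = reads[0]
    for read in reads[1:]:
        size = overlap_length(contig, read, min_overlap)
        if size == 0:
            raise ValueError("no overlap found; the reads belong to different contigs")
        # Append only the part of the read that extends past the overlap.
        contig += read[size:]
    return contig


# Hypothetical reads invented for the example.
reads = ["ATGCGTAC", "GTACCTGA", "CTGAAGT"]
print(merge_reads(reads))  # ATGCGTACCTGAAGT
```

Real assemblers must also work out the order of the reads, tolerate sequencing errors, and handle both DNA strands, which this sketch ignores.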


We can consider Contigs the secondary level of genome assembly.


Scaffolds are the tertiary level of genome assembly.

Scaffolds are also referred to as Super Contigs.

Scaffolds are formed by connecting ordered Contigs.


Scaffolds are formed by connecting ordered Contigs. How?


Genome assembly quality is measured by Contig/scaffold N50 or similar measures.
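As a worked example of this measure: N50 is the length of the contig (or scaffold) at which, going from the longest to the shortest, the running total first reaches half of the total assembly length. A minimal sketch follows; the function name `n50` and the lengths are illustrative assumptions.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (in kb): total = 290, half = 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70, because 80 + 70 = 150 >= 145
```

A larger N50 means that more of the assembly sits in long pieces, but it says nothing about whether those pieces were joined correctly.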


What affects the quality of genome assembly?

1. Repeat elements.

2. Variations between the individuals sequenced (segmental duplications).

Genome Annotation


Genome annotation is very important to study the biology of an organism.

Without a proper annotation, the sequence is useless.

Remember!

A book that cannot be read and understood is useless knowledge.


Genome

• Coding
  • Genes (proteins or RNA)

• Non-coding
  • Introns, regulators, etc.
  • Repetitive DNA
    • Interspersed: SINE, LINE, LTR, transposons
    • Tandem: satellite, minisatellite, microsatellite

The genome sequence can be classified into different groups based on the overall sequence composition and structure.


Genome annotation can be divided into two approaches:

1. Structural annotation:
   1. Largely in silico.
   2. Utilizing the accumulated knowledge of genes and genomes to identify sequence signatures.

2. Functional annotation:
   1. Requires a lot of work and time.
   2. Studying the function of the book/code.
   3. Involves biochemical analyses of the genome.
   4. Gene expression and regulation.

Structural annotation


[Figure: structure of a gene. Labels: regulation sequence, promoter sequence, 5’ UTR, start, exons, introns, end, 3’ UTR (UTR = Un-Translated Region).]


Hidden Markov Models are used for bioinformatic annotation.
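To give a feel for how a Hidden Markov Model labels a sequence, here is a toy two-state model decoded with the Viterbi algorithm. Everything in it is an assumption made for illustration: the two hidden states ("AT-rich" and "GC-rich"), all the probabilities, and the function name `viterbi`. Real gene-finding models use many more states (exons, introns, splice sites and so on) with parameters learned from known genes.

```python
import math

# Toy model: all states and probabilities below are invented for illustration.
STATES = ("AT-rich", "GC-rich")
START = {"AT-rich": 0.5, "GC-rich": 0.5}
TRANS = {
    "AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
    "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9},
}
EMIT = {
    "AT-rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC-rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(sequence: str) -> list[str]:
    """Return the most probable hidden-state path for a DNA sequence."""
    if not sequence:
        return []
    # scores[state] = best log-probability of any path ending in `state`.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][sequence[0]]) for s in STATES}
    backpointers = []
    for base in sequence[1:]:
        new_scores, pointers = {}, {}
        for state in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][state]))
            new_scores[state] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][state])
                + math.log(EMIT[state][base])
            )
            pointers[state] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


labels = viterbi("AAAAAAAAGGGGGGGG")
print("".join("a" if s == "AT-rich" else "g" for s in labels))  # aaaaaaaagggggggg
```

The model switches state only when the evidence from the bases outweighs the cost of a transition, which is the same principle an annotation HMM uses to decide where a gene feature begins and ends.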

Genome Outcome


A timeline of developments in genomics


The human genome contains ~22,000 genes, which constitute ~1.5% of the genome.


Genes categorized


Disease genes


Potential Drug targets


RNA genes are present in multiple copies in the human genome. Why?


Exon and intron size compared to other taxa


Overall GC content of the human genome


GC is correlated with genes


CpG islands in the promoter region can regulate gene expression
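GC content and CpG enrichment are simple to compute from a sequence. The sketch below uses the usual observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G); the function names and the example sequences are assumptions made for illustration.

```python
def gc_content(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(sequence: str) -> float:
    """Observed/expected ratio of CpG dinucleotides in the sequence."""
    seq = sequence.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")
    # Expected CpG count if C and G were placed independently.
    expected = c_count * g_count / len(seq)
    return observed / expected


# Hypothetical sequences invented for the example.
print(gc_content("ATGC"))       # 0.5
print(cpg_obs_exp("CGCGCGCG"))  # 2.0
```

A commonly used rule of thumb calls a stretch of at least 200 bp a CpG island when its GC content is above 50% and its observed/expected CpG ratio is above 0.6, although the exact thresholds vary between tools.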


We are repeat elements with some genes :-)

Tandem Repeat elements


Minisatellite: Variable Number Tandem Repeats (VNTR)

Repeat unit size = tens to hundreds of base pairs

[Figure: example alleles with the unit repeated 4 times, 8 times, and 20 times.]

Microsatellite: Short Tandem Repeats (STR) – Simple Sequence Repeats (SSR)

Repeat unit size = 2 - 6 base pairs
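Because microsatellites are short units repeated back to back, a regular expression is enough to find simple cases. The sketch below is an illustration only: the function name `find_microsatellites`, the minimum of 4 repeats, and the example sequence are assumptions, and dedicated tandem-repeat tools handle imperfect repeats that this pattern would miss.

```python
import re


def find_microsatellites(sequence: str, min_unit: int = 2, max_unit: int = 6,
                         min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, unit, repeat_count) for each perfect tandem repeat found."""
    # Lazy unit match so the shortest repeating unit is reported (CA, not CACA).
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence: a (CA)6 repeat flanked by unique sequence.
print(find_microsatellites("TTG" + "CA" * 6 + "GGT"))  # [(3, 'CA', 6)]
```

Individuals differ in the number of repeats at a given locus, which is why the repeat count, not the sequence itself, is what gets genotyped. Note that this simple pattern also reports long single-base runs as dinucleotide repeats.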


Genome Outcome


• Evolutionary relationships.

• Syntenic regions are the regions conserved across taxa.


A summary of each chromosome


SNP as a marker


Single Nucleotide Polymorphism

1. Many are found throughout the genome.

2. Found in nuclear and mitochondrial DNA.

3. No need for a lot of DNA.

4. Can be used on degraded DNA.

5. Easy to detect (many platforms).

6. Polymorphism lower than microsatellites.


The SNPs identified by the Human Genome Project allowed the development of SNP arrays (SNP chips).

SNP arrays allow the genome to be surveyed for variation between individuals easily and at low cost.
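The output of a SNP array can be thought of as a table of genotype calls, one per SNP. The sketch below compares two individuals on the SNPs they share; the SNP names, the genotypes, and the function name `differing_snps` are all hypothetical placeholders, not real data.

```python
def differing_snps(person_a: dict[str, str], person_b: dict[str, str]) -> list[str]:
    """Return the shared SNPs at which the two individuals have different genotypes."""
    shared = sorted(person_a.keys() & person_b.keys())
    # Genotypes are unordered pairs of alleles, so "AG" and "GA" are the same call.
    return [snp for snp in shared if sorted(person_a[snp]) != sorted(person_b[snp])]


# Hypothetical genotype calls (SNP name -> two alleles).
person_a = {"snp1": "AA", "snp2": "AG", "snp3": "CC"}
person_b = {"snp1": "AA", "snp2": "GA", "snp3": "CT"}
print(differing_snps(person_a, person_b))  # ['snp3']
```

A real array reports hundreds of thousands of such calls per individual, but the comparison works the same way.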


Commercial uses of SNP markers to learn about ancestry and health


Genome-wide association studies (GWAS)
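At its core, a GWAS asks for each SNP whether one allele is more common in cases than in controls. One simple way to do that is a chi-square test on a 2×2 table of allele counts, sketched below for a single SNP. The counts are invented, the function name `allelic_chi_square` is an assumption, and real studies use dedicated software and correct for the very large number of SNPs tested.

```python
import math


def allelic_chi_square(case_ref: int, case_alt: int,
                       control_ref: int, control_alt: int) -> tuple[float, float]:
    """Chi-square statistic (1 degree of freedom) and p-value for a 2x2 allele table."""
    a, b, c, d = case_ref, case_alt, control_ref, control_alt
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: cases carry 60 A and 40 G, controls 40 A and 60 G.
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(chi2)  # 8.0
print(p_value)
```

Repeating this test for every SNP on the array and plotting the p-values along the chromosomes gives the familiar GWAS summary plot.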

Beyond the genome


The ENCyclopedia Of DNA Elements (ENCODE)

1. Transcripts
2. Regulatory elements
3. Enhancers
4. Silencers
5. Origins of replication
6. CpG islands
7. Histone modification sites
8. Open chromatin sites


Genome papers are no longer news

The OME Era
