Heading for full solution to Now Generation Informatics BGI-Shenzhen Sep 19, 2011
Page 1: 20110919 beyond genome

Heading for full solution to Now Generation Informatics

BGI-Shenzhen

Sep 19, 2011

Page 2: 20110919 beyond genome

Nothing in biology makes sense except in the light of evolution

Theodosius Dobzhansky

“Tree” type of thinking in genomics

They are different, yet they are also related

Page 3: 20110919 beyond genome

What is the scope of bioinformatics?

• The goal of bioinformatics is to understand the tree of life.

• Bioinformatics will:

– Draw trees (basic information)

– Map information on trees (association/cause-effect)

– Show the trees (visualizations, databases, clouds)

Page 4: 20110919 beyond genome

Mission 1: Tree of Species

Different sets of genes (sequences) make different forms of life

Page 5: 20110919 beyond genome

Mission 1: Tree of Species

• Draw

– De novo genome assembly

– Multiple sequence mapping and alignment

– Phylogenetic tree construction

• Map

– In-depth annotation

– Comparative genomics

• Show

– Genome browsers
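De novo assembly is the first "Draw" step. Many short-read assemblers, BGI's SOAPdenovo among them, build a de Bruijn graph of k-mers and walk it to spell out contigs. The toy sketch below shows only that core idea. It assumes error-free reads, a single linear sequence and no repeated (k-1)-mers. Real assemblers must also handle sequencing errors, repeats, reverse complements and uneven coverage. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer seen in the reads adds one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the walk deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists in the graph."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = sorted(set(out_deg) | set(in_deg))
    # A path starts at the node with one more outgoing than incoming edge.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), nodes[0])
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the contig: the first node, then the last base of every following node."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["GATTACA", "TACAGG", "CAGGCT"]
print(assemble(reads, 4))  # GATTACAGGCT
```

Choosing k is the main trade-off. A larger k resolves more repeats but needs more coverage for every k-mer to be observed.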

Page 6: 20110919 beyond genome

Dinner

kung pao chicken

Peking Duck

Mapo tofu

Cabbage

Cucumber

oyster

“Tastes good, sequence it!”

Page 7: 20110919 beyond genome

Factory

Silk and silkworm

Oil and castor bean

Cloth and cotton

“Useful, sequence it!”

Page 8: 20110919 beyond genome

Zoo

Panda

Polar bear and penguin

Antelope

“Looks cute, sequence it!”

Page 9: 20110919 beyond genome

Mission 2: Tree of Individuals

Different sets of variations (sequences) make different human individuals and cells

Page 10: 20110919 beyond genome

An Evolutionary perspective

The oldest human alleles originated in Africa well before the diasporas of modern humans 50,000–60,000 years ago.

These oldest alleles are common in all populations worldwide.

Approximately 90% of the variability in allele frequencies is of this sort.

From Mary-Claire King

Page 11: 20110919 beyond genome

• International project to construct a next generation baseline data set for human genetics

– Sequence level HapMap, an order of magnitude deeper

– Consortium with multiple centres, platforms, funders

• Aims

– Find >95% of accessible SNPs at allele frequencies above 1%, down towards 0.1% in coding regions

– Genotype them and place them on haplotype backgrounds

– Also discover and characterize indels and structural variants
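The first aim above rests on estimating allele frequencies from genotype calls. The following is a minimal sketch. It assumes each diploid genotype is coded as its count of alternate alleles (0, 1 or 2, with None for a missing call). Real pipelines for low-coverage sequencing data are considerably more involved. `allele_frequency` and `common_snps` are hypothetical helper names.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0/1/2; None means missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))  # two alleles per called individual


def common_snps(sites, min_freq=0.01):
    """Keep sites whose minor allele frequency is at least min_freq (1% by default)."""
    kept = {}
    for site_id, genotypes in sites.items():
        p = allele_frequency(genotypes)
        if p is None:
            continue
        maf = min(p, 1 - p)  # fold the frequency onto the minor allele
        if maf >= min_freq:
            kept[site_id] = maf
    return kept


# Hypothetical genotypes for 100 individuals per site.
sites = {
    "snp1": [0, 1, 2, 0, 1] * 20,  # alternate allele frequency 0.4
    "snp2": [0] * 99 + [1],        # frequency 0.005, below 1%
    "snp3": [2] * 100,             # fixed, not polymorphic
    "snp4": [None] * 100,          # no calls
}
print(common_snps(sites))  # {'snp1': 0.4}
```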

Page 12: 20110919 beyond genome

An Evolutionary perspective

From Mary-Claire King

• Germline de novo substitution rate ≈ 1 × 10⁻⁸ per base pair per generation

• Somatic/LCL substitution rate is 7–12× higher than the germline rate

• Male mutation rate is ~7× higher than the female mutation rate

From 1000G Project

Development of agriculture in the past 10,000 years and of urbanization and industrialization in the past 700 years has led to rapid population growth and therefore to the appearance of vast numbers of new alleles, each individually rare and specific to one population or even to one family.
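The rates above translate into rough per-genome numbers. The sketch below rests on two assumptions that are not stated on the slide:

- The haploid human genome is about 3.1 × 10⁹ bp.
- Both parents contribute equal amounts of sequence.

Under the second assumption, a 7:1 male-to-female rate ratio places about 7/8 of de novo substitutions on the paternal copy.

```python
MU = 1e-8                  # germline de novo substitutions per bp per generation (slide)
HAPLOID_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size
MALE_TO_FEMALE = 7         # male mutation rate ~7x the female rate (slide)

# A child carries two genome copies, one from each parent.
expected_de_novo = MU * 2 * HAPLOID_GENOME_BP
# Equal sequence from each parent, so the paternal share is 7 / (7 + 1).
paternal_fraction = MALE_TO_FEMALE / (MALE_TO_FEMALE + 1)

print(f"expected de novo substitutions per child: {expected_de_novo:.0f}")  # 62
print(f"expected paternal share: {paternal_fraction:.1%}")  # 87.5%
```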

Page 13: 20110919 beyond genome

What is the whole picture of genetic variants?

Figure: effect size versus allele frequency (50%, 5%, 0.5%, 0.05%). Common alleles have weaker effects, rarer alleles have stronger effects and very rare alleles have the strongest effects, spanning common/rare disease through Mendelian disease. Examples: MC4R, ABCA1, 1q21.1 in SCZ, CFTR ΔF508, PCSK9 C679X. Billion Genomes Project: personal genomics with phenotype information.

Page 14: 20110919 beyond genome

• The history of silkworm domestication

D: domesticated; W: wild

Panels: silkworm domestication history; silkworm phylogenetic tree

Published in Science 16 Oct.

• The relationships do not simply follow geographic distribution, reflecting gene flow and other population-level processes related to human activities such as ancient commercial trade

• The domestication event led to a 90% reduction in effective population size during the initial bottleneck
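A bottleneck matters for diversity because of drift. Under a simple Wright-Fisher model, drift erodes expected heterozygosity by a factor of (1 - 1/(2Ne)) each generation. The sketch compares two hypothetical effective population sizes, the second 90% smaller than the first. The numbers are illustrative and are not estimates for silkworm.

```python
def retained_heterozygosity(ne, generations):
    """Expected fraction of initial heterozygosity left after drift: (1 - 1/(2 Ne))^t."""
    return (1 - 1 / (2 * ne)) ** generations


# Hypothetical sizes: a 90% reduction takes Ne from 1,000 to 100.
for ne in (1000, 100):
    kept = retained_heterozygosity(ne, generations=100)
    print(f"Ne = {ne}: {kept:.2f} of heterozygosity kept after 100 generations")
```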

Page 15: 20110919 beyond genome

From Andersson and Georges, Nature Reviews Genetics 5: 202–212 (2004)

Selective sweep: the region around an adaptive allele is inherited along with it (genetic hitchhiking), reducing variation nearby

Extent of selective sweeps for domestication in maize: tb1 locus (60–90 kb) (Clark et al. 2004); Y1 locus (about 600 kb) (Palaisa et al. 2004)
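Because a sweep removes variation around the selected allele, one common way to localize candidate regions is a window scan. The scan looks for windows where nucleotide diversity (π) in domesticated samples is much lower than in wild samples. The sketch below is a simplified version of that idea. The window size, the ratio threshold, the function names and the toy data are all hypothetical.

```python
def site_diversity(alt_count, n):
    """Pairwise difference at one biallelic site among n sequences: 2c(n - c) / (n(n - 1))."""
    return 2 * alt_count * (n - alt_count) / (n * (n - 1))


def window_pi(sites, start, end, n):
    """Per-bp nucleotide diversity in [start, end); sites maps position -> alt-allele count."""
    total = sum(site_diversity(c, n) for pos, c in sites.items() if start <= pos < end)
    return total / (end - start)


def sweep_candidates(wild, domestic, n_wild, n_dom, length, window, min_ratio):
    """Windows where diversity in domesticated samples is far below that in wild samples."""
    hits = []
    for start in range(0, length, window):
        end = min(start + window, length)
        pi_wild = window_pi(wild, start, end, n_wild)
        pi_dom = window_pi(domestic, start, end, n_dom)
        lost_all = pi_dom == 0 and pi_wild > 0
        if lost_all or (pi_dom > 0 and pi_wild / pi_dom >= min_ratio):
            hits.append((start, end))
    return hits


# Hypothetical toy data: 10 sequences per group on a 300-bp region.
wild = {10: 5, 50: 3, 150: 4, 250: 5}
domestic = {10: 4, 50: 2, 250: 6}
print(sweep_candidates(wild, domestic, 10, 10, length=300, window=100, min_ratio=3.0))
# [(100, 200)]
```

Window size should reflect the expected sweep extent, which, as the maize examples above show, can range from tens to hundreds of kilobases.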

Page 16: 20110919 beyond genome

Domestication

• Genome variation during silkworm domestication

Published in Science 16 Oct.

354 candidate domesticated genes

159 with tissue-specific expression (silk gland, midgut, testis)

Page 17: 20110919 beyond genome

Exomes of 50 Tibetans and 40 Han individuals have been sequenced

Function further validated by:

– Association with blood hemoglobin level

– Expression level differences in placenta

EPAS1: endothelial Per-Arnt-Sim (PAS) domain protein 1

EPAS1 shows the strongest selection signal, with an allele-frequency difference of nearly 80% between populations (Han: 9%; Tibetan: 87%)

The signal of selection
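The slide reports only the two EPAS1 allele frequencies. One simple way to turn such a difference into a differentiation statistic is Nei's G_ST. The sketch computes it for a single site, weighting the two populations equally. It is an illustration and not necessarily the statistic used in the original study.

```python
def gst(freqs):
    """Nei's G_ST at one biallelic site from per-population allele frequencies (equal weights)."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


han, tibetan = 0.09, 0.87  # EPAS1 allele frequencies from the slide
print(f"frequency difference: {tibetan - han:.2f}")  # 0.78
print(f"G_ST: {gst([han, tibetan]):.2f}")  # 0.61
```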

Page 18: 20110919 beyond genome

Your Micro-Environment, Your other genome?

Page 19: 20110919 beyond genome

PCA of 85 Danish samples (based on gene profiles)

BMI data

Gene level
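The slide does not say how the PCA was computed. The sketch below assumes a samples × genes abundance matrix and computes PCA through a singular value decomposition of the centered data. The data are simulated and are not the Danish cohort. Centering without scaling is a design choice; scale each gene first if abundances span very different ranges.

```python
import numpy as np


def pca(matrix, n_components=2):
    """PCA via SVD; rows are samples, columns are features (e.g. gene abundances)."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)  # center every feature
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)                  # fraction of variance per PC
    return scores, explained[:n_components]


# Hypothetical data: 85 samples x 200 genes; samples 40 to 84 are shifted in the first 50 genes.
rng = np.random.default_rng(0)
data = rng.normal(size=(85, 200))
data[40:, :50] += 3.0
scores, explained = pca(data)
print(scores.shape, explained.round(3))
```

Plotting the first two score columns, colored by a trait such as BMI, gives the kind of view shown on the slide.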

Page 20: 20110919 beyond genome

Mission 2: Tree of Individuals

• Draw

– (Complete spectrum of) variation identification

– Population frequencies and spectra

• Map

– Selection and evolution

– Phenotypic traits

– Intermediate phenotypes

Page 21: 20110919 beyond genome

Mission 3: Tree of Cells

• Cell lineages are characterized by single biological levels and their inter-correlations.

Page 22: 20110919 beyond genome

+: cancer; *: normal; *: cells possibly mixed (from tumor, but clustered with normal cells)

On DNA: differentiating cancer and normal cells by PCA

ET AML

BTCC

These cancers are highly heterogeneous.

Page 23: 20110919 beyond genome

ET AML

Phylogenetic trees clearly show subpopulations in ET and AML cancers

ET: essential thrombocythemia; AML: acute myeloid leukemia
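The slides do not say which method built these trees. UPGMA is a simple distance-based stand-in; neighbor-joining and maximum likelihood are common alternatives. The sketch builds a rooted topology from a matrix of pairwise distances between single cells. The distances and cell labels are hypothetical, and branch lengths are omitted to keep the code short.

```python
def upgma(dist, labels):
    """Rooted UPGMA topology (nested tuples) from a symmetric distance matrix."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # cluster id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])  # closest pair, a < b
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Average linkage: size-weighted mean of the merged clusters' distances.
        merged = {
            c: (size_a * d[(min(a, c), max(a, c))] + size_b * d[(min(b, c), max(b, c))])
            / (size_a + size_b)
            for c in clusters
        }
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for c, v in merged.items():
            d[(c, next_id)] = v  # every existing id is smaller than next_id
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical pairwise distances between two normal and two tumor cells.
labels = ["n1", "n2", "t1", "t2"]
dist = [
    [0, 1, 8, 9],
    [1, 0, 8, 9],
    [8, 8, 0, 2],
    [9, 9, 2, 0],
]
print(upgma(dist, labels))  # (('n1', 'n2'), ('t1', 't2'))
```

UPGMA assumes a roughly constant rate of change across lineages. That assumption may not hold for cancer subclones, which is one reason other tree methods are often preferred.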

Page 24: 20110919 beyond genome

Consensus Tree

Inferring key genes in AML (a typical heterogeneous cancer)

Key Gene?

Key Gene for sub-pop?

Page 25: 20110919 beyond genome

G1~G6: different subpopulations from AML cancer

MLL: myeloid/lymphoid or mixed-lineage leukemia; involved in recurrent translocations in acute leukemias, which may be characterized as acute myeloid leukemia (AML; MIM 601626), acute lymphoblastic leukemia (ALL), or mixed-lineage (biphenotypic) leukemia (MLL).

Key genes for AML

Page 26: 20110919 beyond genome

Inferring key genes in AML (a typical heterogeneous cancer)

G1~G6: different subpopulations from AML cancer

LILRA1: leukocyte immunoglobulin-like receptor

Page 27: 20110919 beyond genome

G1~G6: different subpopulations from AML cancer

CTNNA1: leukocyte transendothelial migration; pathways in cancer

Inferring key genes in AML (a typical heterogeneous cancer)

Page 28: 20110919 beyond genome

G1~G6: different subpopulations from AML cancer

CTSS: cathepsin S

Inferring key genes in AML (a typical heterogeneous cancer)

Page 29: 20110919 beyond genome

G1~G6: different subpopulations from AML cancer

PPP2R1A: TGF-beta signaling pathway

Inferring key genes in AML (a typical heterogeneous cancer)

Page 30: 20110919 beyond genome

G1~G6: different subpopulations from AML cancer

DIAPH1: Focal adhesion; Regulation of actin cytoskeleton

Inferring key genes in AML (a typical heterogeneous cancer)

Page 31: 20110919 beyond genome

G1~G6: different subpopulations from AML cancer

LILRA1: leukocyte immunoglobulin-like receptor

Inferring key genes in AML (a typical heterogeneous cancer)

Page 32: 20110919 beyond genome
Page 33: 20110919 beyond genome

Mission 3: Tree of Cells

• Draw

– Single-cell information acquisition technologies

• Map

– Single-cell metrics measurement technologies

Page 34: 20110919 beyond genome

Integrating DNA variation, molecular traits, and phenotypes to construct causal gene networks

Genes work in networks!

Page 35: 20110919 beyond genome
Page 36: 20110919 beyond genome

Finally: Where are the papers?

• On what paper do you draw, map and show?

– It is harder and harder to find a platform efficient enough

• Sample house

• High-throughput biology

• Capable computing system with high I/O performance

• Interlinked database and standardized formats

• Bioinformatics workflows to perform in silico analysis on data

Figure: growth of storage (Tb), CPU power (TFLOPS) and data generation (Tb), 2007 to 2010 (log scale, 0.1 to 10,000).

Page 37: 20110919 beyond genome

Making data PUBLIC!

• Does not mean making data downloadable in theory

• Does mean the public could make use of data

• New types of databases that support operations on the data are required

• New academic credit system to motivate high-quality easy-to-access datasets.

http://climb.genomics.cn

http://www.gigasciencejournal.com

Page 38: 20110919 beyond genome

Acknowledgements

• Great International Efforts

– The Genome 10K Consortium

– The 1000 Genomes Project Consortium

– The 1000 Plant Genomes Project Consortium

– The 5000 insects Project Consortium (pending)

• BGI Initiatives and collaboration framework

– The 1000 Plant and Animal Genomes Project

– The 10K Microbial Genomes Project

http://ldl.genomics.org.cn

Page 39: 20110919 beyond genome

Acknowledgements

• Prof. Rasmus Nielsen’s lab at UC Berkeley and the University of Copenhagen

• Prof. Richard Durbin’s lab at the Wellcome Trust Sanger Institute

• Prof. Tak-Wah Lam and Siu-Ming Yiu’s lab in the Department of Computer Science, University of Hong Kong

• Dr. Heng Li at the Broad Institute

• …