5/3/2015 Yannick Boursin NGS, Cancer and Bioinformatics 1

NGS, Cancer and Bioinformaticsrssf.i2bc.paris-saclay.fr/transfert/IFSBM/IFSBM_intro_NGS-YB.pdf · NGS and Oncology 5/3/2015 Yannick Boursin NGS is now widely used as: • A research

Jul 28, 2020


NGS, Cancer and Bioinformatics


NGS and Clinical Oncology

• NGS in hereditary cancer genome testing
  • BRCA1/2 (breast/ovary cancer)
  • XPC (melanoma)
  • ERCC1 (colorectal cancer)

• NGS for personalized cancer treatment
  • Clinical trials: MOSCATO (GR), SAFIR (GR), SHIVA (Curie), …
  • Ipilimumab (anti-CTLA4), Nivolumab (anti-PD1), Trastuzumab (anti-HER2), Cetuximab (anti-EGFR)

• Detection of chimeric transcripts
  • Chronic Myeloid Leukemia: Philadelphia chromosome (BCR/ABL)
  • Non-Small-Cell Lung Cancer: EML4-ALK


NGS and Oncology

NGS is now widely used as:

• A research tool to screen large numbers of cancer samples

• A clinical/diagnostic tool in daily practice

These projects require dedicated bioinformatics integration to access and analyze this huge amount of data.

Why do we need computers for NGS?

Sequencing data size evolution

Needs to address:

• Store petabytes of data (1 PB = 1000 TB).

• Share data around the world through networks

• Analyze huge amounts of data with complex algorithms


Bioinformatics and Oncology

• Problem: finding, extracting, and presenting relevant information.

• Partial solution: designing workflows to ease data analysis.


Interdisciplinary collaboration


Bioinformatics acts as a hub between the different fields. Trust between partners is needed, and so is training, for efficient mutual understanding.


Standard Workflow for NGS Analysis


A typical NGS workflow


Step 1: Quality Check and improvements


Standard Workflow for NGS Analysis


A typical NGS workflow


NGS data: what do they look like?


A raw data file (.fastq, .sff, .fa, .csfasta/.qual) with millions of short reads of the same size (SOLiD, HiSeq) or reads of different sizes (Ion PGM/Proton)

Enhanced view of the reads in a fastq file


FASTQ format


• 1 sequence = 1 read = 4 lines in the file

• First line = sequence identifier


FASTQ format


• Fourth line = Quality

• ASCII-encoded (to reduce the file size)
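Because each read occupies exactly four lines, a FASTQ file can be read one record at a time. Below is a minimal Python sketch of such a reader, not a production parser. It assumes the usual FASTQ conventions: line 2 holds the bases, and line 3 is a `+` separator that may repeat the identifier. `FastqRecord` and `read_fastq` are hypothetical names introduced for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # line 1, without the leading "@"
    sequence: str    # line 2: the called bases
    quality: str     # line 4: one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per block of 4 lines (1 read = 4 lines)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Tiny in-memory example with hypothetical reads.
example = "@read1\nACGTN\n+\nIIII#\n@read2\nGGCA\n+\n5555\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0].identifier, records[0].sequence)  # read1 ACGTN
```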


Sequence quality encoding
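Each quality character encodes a Phred score Q. Q is defined from the probability P that the base call is wrong by Q = -10 log10(P), so Q20 means a 1% error probability and Q30 means 0.1%. The minimal Python sketch below assumes the Phred+33 offset used by Sanger and by Illumina pipelines from version 1.8 onwards. Older Illumina versions used an offset of 64. The function names are hypothetical.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string: Q = ASCII code - 33."""
    return [ord(char) - 33 for char in quality]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P): P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


scores = phred33_scores("II5+#")
print(scores)                 # [40, 40, 20, 10, 2]
print(error_probability(20))  # about 0.01, i.e. 1 expected error in 100 bases
```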


Why look at sequencing quality?


• Data quality is very important for various downstream analyses:
  • Sequence assembly or mapping
  • Variant detection
  • Gene expression studies
  • ...

• If data quality is poor:
  • Try to find a reason
  • Can we correct/improve the quality?
  • Poor quality may lead to erroneous conclusions


Quality controls on raw reads: which metrics to check?


Mainly:
• Quality score per base and over the reads

But also:
• Read length distribution
• Sequence content per base and % of GC
• k-mer content
• Overrepresented sequences
• Duplicated reads


Quality scores


• Per base (box-and-whisker plot) -> to see whether base calls fall into low quality (commonly towards the end of a read)

• Per sequence (mean quality distribution) -> to see whether a subset of your sequences has universally low quality values
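The two plots summarise the same Phred scores in two directions. One looks at each position across all reads; the other looks at each read across all its positions. The minimal Python sketch below computes the numbers behind both plots. It assumes Phred+33 qualities and uses hypothetical function names. QC tools such as FastQC draw these plots directly, so this is only meant to show what is being computed.

```python
from statistics import mean, median


def per_base_quality(qualities: list[str], offset: int = 33) -> list[dict]:
    """Min/median/max Phred score at each read position (data behind a per-base box plot)."""
    longest = max(len(q) for q in qualities)
    summary = []
    for pos in range(longest):
        # Only reads long enough to have a base at this position contribute to it.
        column = [ord(q[pos]) - offset for q in qualities if pos < len(q)]
        summary.append({"position": pos + 1, "min": min(column),
                        "median": median(column), "max": max(column)})
    return summary


def per_sequence_mean_quality(qualities: list[str], offset: int = 33) -> list[float]:
    """Mean Phred score of each read (data behind the per-sequence distribution)."""
    return [mean(ord(c) - offset for c in q) for q in qualities]


quals = ["IIII#", "III5"]  # hypothetical Phred+33 strings of two reads
for row in per_base_quality(quals):
    print(row)
print(per_sequence_mean_quality(quals))
```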


Quality scores


Quality scores


Standard Workflow for NGS Analysis


A typical NGS workflow


Reads cleaning: removing bad quality bases

• After QC, we need to remove bad-quality bases and reads.

• This is often done by scanning reads with a sliding window algorithm.


Read-ends trimming by a quality trimming algorithm. In red: bad quality bases. In blue: good quality bases.
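A sliding-window trimmer fits in a few lines of Python. The sketch below is a simplified variant under stated assumptions: a window of 4 bases, a mean Phred threshold of 20, Phred+33 encoding, and a cut at the start of the first window that fails. Tools such as Trimmomatic implement their own, more refined rules. The function name is hypothetical.

```python
def sliding_window_trim(sequence: str, quality: str, window: int = 4,
                        threshold: float = 20.0, offset: int = 33) -> tuple[str, str]:
    """Cut the read at the first window whose mean Phred score drops below `threshold`."""
    scores = [ord(c) - offset for c in quality]
    # Slide the window from the 5' end towards the 3' end of the read.
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return sequence[:start], quality[:start]
    # No low-quality window found (or read shorter than one window): keep the read as is.
    return sequence, quality


print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

Cutting at the start of the failing window is conservative: in the example, one good base (position 6, Q40) is discarded together with the bad tail.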


Reads cleaning: adapters removal


• An adapter is a small piece of known DNA located at the end of the reads
• Adapter roles:
  • Anchor the read to the sequencer flow cell
  • Allow specific PCR enrichment of reads carrying the adapter
  • Used in multiplex sequencing (pooled samples)

• Available tools to trim adapters:
  • Cutadapt
  • Trimmomatic
  • RmAdapter


In blue: adapters. In orange: informative part of the read.
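Adapter removal at the 3' end can be illustrated with exact string matching. The Python sketch below is hypothetical and deliberately simple. It cuts at a full adapter occurrence, or at a partial adapter of at least `min_overlap` bases hanging off the read end. Dedicated tools such as Cutadapt also tolerate sequencing errors inside the adapter. The adapter string is only an assumed example.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter, or a partial adapter at the read end (exact matches only)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    # Case 1: the full adapter occurs in the read -> drop it and everything after it.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends with the beginning of the adapter -> drop that suffix.
    # Longer overlaps are tried first so the largest adapter fragment is removed.
    for overlap in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "AGATCGGAAGAGC"  # assumed example adapter sequence
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGAT", ADAPTER))              # ACGTACGT (partial adapter)
```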


Standard Workflow for NGS Analysis


A typical NGS workflow


Step 2: Short Reads Alignment


Standard Workflow for NGS Analysis


A typical NGS workflow


Reads alignment - Vocabulary


• Reference genome: a known sequence, supposed to be as close as possible to the input genome, used as an anchor to organize the information from individual reads.
• Alignment (mapping): read alignment transforms individual reads into an organized and reduced set of information by giving each read a genomic position.
• Mismatch: discrepancy between two aligned nucleotides.
• Gap: a bridge within the read alignment (i.e. a small insertion/deletion).
• Indels: insertions/deletions relative to the reference genome.
• Mappability: uniqueness of a region (repeated region = low mappability, unique region = good mappability).
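Mappability can be approximated by counting how often each k-mer of the reference occurs. A read of length k drawn from a position whose k-mer appears n times has n equally good placements. The minimal Python sketch below makes simplifying assumptions: exact matches only, forward strand only, and a hypothetical function name. Real mappability tracks also account for mismatches and the reverse strand.

```python
from collections import Counter


def kmer_mappability(reference: str, k: int) -> list[float]:
    """For each start position, 1 / (number of occurrences of its k-mer in the reference).

    1.0 = unique k-mer (good mappability); smaller values = repeated k-mer (low mappability).
    """
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return [1 / counts[kmer] for kmer in kmers]


print(kmer_mappability("ACGTACGTTT", 4))  # [0.5, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0]
```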


Reads alignment – Two strategies


Read alignment aims at transforming individual reads into an organized and reduced set of information.

Two strategies can be applied:

- De novo read assembly: used when no reference genome is available. It aims at reconstructing long scaffolds from the individual reads.

- Alignment on a reference genome: the reads are directly compared to a known reference genome.


Alignment on a reference genome


The reference genome is a known sequence, supposed to be as close as possible to the input genome, which is used as an anchor to organize the information from individual reads.


Alignment of reads against reference genome


Alignment on a reference genome - Challenges


New alignment algorithms must address the requirements and characteristics of NGS reads:

– Millions of reads per run (e.g. 30x genome coverage)
– Reads of different sizes (35 bp - 200 bp)
– Different types of reads (single-end, paired-end, mate-pair, etc.)
– Base-calling quality factors
– Sequencing errors (~1%)
– Repetitive regions
– Sequenced organism vs. reference genome
– Must adjust to evolving sequencing technologies and data formats


Alignment on a reference genome – Bioinformatics tools


Finding the best alignment - Rationale


Given a reference and a set of reads, report at least one “good” local alignment for each read if one exists. What is “good”? For now, we concentrate on:

– Fewer mismatches is better

– Failing to align a low-quality base is better than failing to align a high-quality base

This relies on a scoring system, e.g. a score for a match (1), a mismatch penalty (3), a gap open penalty (5) and a gap extension penalty (2). The best alignment is the one with the highest score.
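The scoring idea above can be made concrete with the Smith-Waterman local alignment recurrence, extended to affine gaps (Gotoh). The Python sketch below uses the example scores from this slide and returns only the best score, not the alignment itself. It assumes a gap of length L costs `gap_open + (L - 1) * gap_extend`; aligners differ in this convention. Production aligners rely on genome indexes and heuristics rather than filling a full matrix, so this is illustrative only.

```python
def local_alignment_score(read: str, ref: str, match: int = 1, mismatch: int = 3,
                          gap_open: int = 5, gap_extend: int = 2) -> int:
    """Best local alignment score (Smith-Waterman with affine gaps, Gotoh recurrences)."""
    neg_inf = float("-inf")
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score of an alignment ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in the read
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in the reference
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap (from H) or extend an existing one (from E or F).
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diagonal = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else -mismatch)
            # Local alignment: scores never drop below 0, so the alignment can restart anywhere.
            H[i][j] = max(0, diagonal, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(local_alignment_score("ACGT", "TTACGTTT"))          # 4: exact match of 4 bases
print(local_alignment_score("ACGTTACGTA", "ACGTAACGTA"))  # 6: 9 matches minus 1 mismatch penalty
```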


Alignment key parameters - Repeats


Approximately 50% of the human genome is composed of repeats

Treangen T.J. and Salzberg S.L. 2012. Nature Reviews Genetics 13, 36-46

Alignment key parameters - Repeats


Close proximity to genes: intergenic and intragenic positions


BRCA2: a mosaic of repeated regions


Alignment key parameters – Repeats – 3 strategies


-1- Report only unique alignments
-2- Report the best alignments and randomly assign reads across equally good loci
-3- Report all (best) alignments

Treangen T.J. and Salzberg S.L. 2012. Nature Reviews Genetics 13, 36-46

Figure: strategies -1-, -2- and -3- illustrated on two loci, A and B
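The three reporting strategies can be written as a small function over the candidate hits of a single read. This Python sketch is illustrative only. The hits are hypothetical `(locus, score)` pairs. Here "unique" means a single best-scoring locus; a stricter reading would require a single hit of any score.

```python
import random
from typing import Optional


def assign_multiread(hits: list[tuple[str, int]], strategy: str,
                     rng: Optional[random.Random] = None) -> list[str]:
    """Return the locus/loci reported for one read, given all its (locus, score) hits."""
    if not hits:
        return []  # unaligned read
    best_score = max(score for _, score in hits)
    best_loci = [locus for locus, score in hits if score == best_score]
    if strategy == "unique":  # -1- report the read only if its best hit is unique
        return best_loci if len(best_loci) == 1 else []
    if strategy == "random":  # -2- pick one of the equally good loci at random
        return [(rng or random.Random()).choice(best_loci)]
    if strategy == "all":     # -3- report all best alignments
        return best_loci
    raise ValueError(f"unknown strategy: {strategy!r}")


hits = [("chr1:1000", 40), ("chr7:5000", 40), ("chr9:200", 31)]  # hypothetical read in a repeat
print(assign_multiread(hits, "unique"))  # []
print(assign_multiread(hits, "all"))     # ['chr1:1000', 'chr7:5000']
print(assign_multiread(hits, "random", random.Random(0)))
```

Strategy -1- loses reads in repeats, -2- keeps coverage but may place reads at the wrong copy, and -3- keeps all the information at the cost of ambiguity downstream.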


Alignment on a reference genome


Key points

• The alignment is a crucial step of the NGS analysis.

• The reference genome has to be carefully chosen.

• The mappability of the region of interest has to be taken into account (primer design).

• The scoring method has to be chosen according to the sequencing error rate and the quality of the raw reads.

• The alignment parameters have to be set properly.


Limitations of Alignment Tools


Even though good tools now exist to align reads on a reference genome, several issues remain important:

- Homopolymer mapping
- Efficiently aligning small indels
- Alignment on several genomes
- Alignment on repeated sequences
- ...


Alignment formats


• Many formats exist:
  • SAM
  • BAM
  • ELAND (Illumina specific)
  • MAQ map
  • …

SAM and BAM are now the standard for aligned data


SAM format


• SAM stands for Sequence Alignment/Map
• Tab-delimited text file
• 1 line per alignment (typically one per read)
• Each line is composed of at least 11 fields
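The SAM specification defines the 11 mandatory columns in this order: QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL. Optional `TAG:TYPE:VALUE` fields may follow. The minimal Python sketch below splits one alignment line into these fields. The function name and the example line are hypothetical.

```python
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INTEGER_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment")
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) < 11:
        raise ValueError(f"expected at least 11 tab-separated fields, got {len(columns)}")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for key in INTEGER_FIELDS:
        record[key] = int(record[key])
    record["TAGS"] = columns[11:]  # optional TAG:TYPE:VALUE fields, kept as strings
    return record


line = "read1\t99\tchr1\t100\t60\t8M\t=\t300\t208\tACGTACGT\tIIIIIIII\tNM:i:0"
rec = parse_sam_line(line)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])  # chr1 100 8M
```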


SAM format


SAM format


• The second field (FLAG) can be used to quickly filter the file

• With Samtools (command line), using the -f and -F options
• Useful webpage:

  • http://broadinstitute.github.io/picard/explain-flags.html
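FLAG is a bit field in which each power of two encodes one property of the alignment. The Python sketch below decodes it and mimics the filtering logic of `samtools view`. With `-f`, an alignment is kept only if all the given bits are set; with `-F`, it is dropped if any of the given bits is set. The function names are hypothetical, and the labels paraphrase the SAM specification.

```python
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def explain_flag(flag: int) -> list[str]:
    """List the properties whose bit is set in `flag`."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]


def passes_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Same logic as `samtools view -f REQUIRE -F EXCLUDE`."""
    return (flag & require) == require and (flag & exclude) == 0


print(explain_flag(99))  # 99 = 1 + 2 + 32 + 64
print(passes_filter(99, require=0x2, exclude=0x4))  # True: properly paired and mapped
```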


BAM format


• BAM stands for Binary Alignment/Map
• Corresponds to the SAM format compressed with BGZF
• Reduces the size of the alignment file about 5-fold
• Not directly readable like the SAM format
• Requires Samtools
• Best format for sharing alignment files
• Coupled with an index file (BAI)
  • Avoids a sequential read of the complete file


Standard Workflow for NGS Analysis


A typical NGS workflow


QC 3: which metrics to check?

In practice, how do I validate my alignment?

Be aware of the mapping strategy used.
Look at simple descriptive statistics:

– Number of aligned reads
– Coverage/depth
– Mapping quality
– Number of normal/abnormal pairs for paired-end data
– ...
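Most of these statistics can be derived from FLAG, MAPQ and alignment positions, and `samtools flagstat` reports similar counts. The minimal Python sketch below rests on several assumptions. The function names are hypothetical. Secondary and supplementary alignments are excluded from the counts. Aligned intervals are given as 0-based, half-open coordinates that lie inside the region.

```python
def alignment_summary(alignments: list[tuple[int, int]]) -> dict[str, float]:
    """Simple statistics from (FLAG, MAPQ) pairs, counting primary alignments only."""
    primary = [(flag, mapq) for flag, mapq in alignments if not flag & (0x100 | 0x800)]
    mapped = [mapq for flag, mapq in primary if not flag & 0x4]
    paired = [flag for flag, _ in primary if flag & 0x1]
    proper = [flag for flag in paired if flag & 0x2]
    return {
        "reads": len(primary),
        "mapped_fraction": len(mapped) / len(primary) if primary else 0.0,
        "mean_mapq": sum(mapped) / len(mapped) if mapped else 0.0,
        "proper_pair_fraction": len(proper) / len(paired) if paired else 0.0,
    }


def depth_profile(intervals: list[tuple[int, int]], length: int) -> list[int]:
    """Per-base depth from aligned [start, end) intervals, using a difference array."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        diff[start] += 1  # coverage begins here
        diff[end] -= 1    # and stops just before `end`
    depth, running = [], 0
    for position in range(length):
        running += diff[position]
        depth.append(running)
    return depth


print(alignment_summary([(99, 60), (147, 60), (4, 0), (65, 3), (355, 0)]))
print(depth_profile([(0, 4), (2, 6)], 6))  # [1, 1, 2, 2, 1, 1]
```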


NGS analysis: how can I work with my NGS data?

• Difficult on a personal computer (lack of resources)
• 1 alignment = 4 processors + 15 GB RAM (to multiply by the number of samples)
• Impossible to open files in software such as text editors
• Need a very large storage capacity
• Data backup administration
• Application servers connected to a computing cluster and storage array:
  • Commercial solutions (CLC Bio, NextGene, ...)
  • Galaxy server: https://galaxy.gustaveroussy.fr/galaxyprod


Standard Workflow for NGS Analysis


A typical NGS workflow


After sequencing: data analysis

Main challenges:

• The rapid evolution of the high-throughput technologies

• The rapid evolution of the bioinformatics solutions

• The rapid evolution of the biological/medical knowledge


Data analysis

• Chimeric transcript search
• Alternative transcript study
• Differential expression study
• Methylation study
• Detection of genomic variants
• Detection of copy-number variation


Chimeric transcripts

Do the tumor cells express any chimeric transcripts?

History of the bcr-abl fusion
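One common signal of a chimeric transcript such as BCR-ABL is a read pair whose two mates map to different genes. The Python sketch below only counts such discordant pairs. The input format `(gene of mate 1, gene of mate 2)` and the support threshold are hypothetical. Real fusion detection tools also use reads that span the junction itself and filter many artefacts.

```python
from collections import Counter


def fusion_candidates(pair_genes: list[tuple[str, str]],
                      min_support: int = 2) -> dict[tuple[str, str], int]:
    """Count read pairs whose mates fall in two different genes (orientation ignored)."""
    counts = Counter(tuple(sorted(pair)) for pair in pair_genes if pair[0] != pair[1])
    return {genes: n for genes, n in counts.items() if n >= min_support}


# Hypothetical gene assignments of read pairs after alignment and annotation.
pairs = [("BCR", "ABL1")] * 3 + [("ABL1", "BCR")] + [("GAPDH", "GAPDH")] * 5 + [("EML4", "ALK")]
print(fusion_candidates(pairs))  # {('ABL1', 'BCR'): 4}
```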


Alternative transcripts


Differential expression

Are there genes that are strongly expressed in one kind of tumor but not in the other?

Can we group tumors according to their expression profiles?

Clustering differential expression in breast tumours.
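Comparing expression between two kinds of tumors starts from read counts per gene, normalised for library size. The Python sketch below computes counts per million (CPM) and a log2 fold change with a pseudocount, using hypothetical counts. It is only an illustration of effect size, not a statistical test. Dedicated packages such as DESeq2 or edgeR model biological replicates to decide which differences are significant.

```python
import math


def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Counts per million: rescale raw read counts by library size."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def log2_fold_changes(group_a: dict[str, int], group_b: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2(B / A) per gene after CPM normalisation; the pseudocount avoids log(0).

    Assumes both dictionaries contain the same genes.
    """
    a, b = cpm(group_a), cpm(group_b)
    return {gene: math.log2((b[gene] + pseudocount) / (a[gene] + pseudocount)) for gene in a}


tumour_a = {"GENE1": 100, "GENE2": 900}  # hypothetical read counts
tumour_b = {"GENE1": 400, "GENE2": 600}
for gene, lfc in log2_fold_changes(tumour_a, tumour_b).items():
    print(gene, round(lfc, 2))  # GENE1 2.0, then GENE2 -0.58
```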


Methylome

Is there any difference between DNA methylation in tumors and in normal cells?

How does methylation promote cancer?

Detection of copy-number variations

Are there any copy-number alterations (gain or loss of chromosomal regions, amplifications …) that could explain tumorigenesis?

Copy-number variations in cancer. MYC and KRAS are amplified.
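A read-depth approach to copy-number detection compares tumor and normal coverage in fixed genomic bins. The Python sketch below uses hypothetical per-bin read counts and hypothetical ±0.3 log2 thresholds. Real tools also segment neighbouring bins, correct for GC bias, and account for tumor purity and ploidy.

```python
import math
from typing import Optional


def log2_ratios(tumor: list[int], normal: list[int]) -> list[Optional[float]]:
    """Per-bin log2(tumor / normal) depth after library-size normalisation."""
    tumor_total, normal_total = sum(tumor), sum(normal)
    ratios: list[Optional[float]] = []
    for t, n in zip(tumor, normal):
        if t == 0 or n == 0:
            ratios.append(None)  # undefined without reads in both samples
        else:
            ratios.append(math.log2((t / tumor_total) / (n / normal_total)))
    return ratios


def call_bins(ratios: list[Optional[float]], gain: float = 0.3, loss: float = -0.3) -> list[str]:
    """Label each bin using hypothetical log2-ratio thresholds."""
    labels = []
    for ratio in ratios:
        if ratio is None:
            labels.append("no data")
        elif ratio >= gain:
            labels.append("gain")
        elif ratio <= loss:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


tumor_bins = [200, 100, 50, 50]  # hypothetical reads per bin
normal_bins = [100, 100, 100, 100]
ratios = log2_ratios(tumor_bins, normal_bins)
print(ratios)             # [1.0, 0.0, -1.0, -1.0]
print(call_bins(ratios))  # ['gain', 'neutral', 'loss', 'loss']
```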


Detection of genomic variants

Are there mutational events that are specific to the tumor genome? Could tumorigenesis be explained by them? Is there any drug targeting those mutations?

Pancreas adenocarcinoma: from normal cells to tumoral cells
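A basic way to flag candidate tumor-specific (somatic) variants is to compare the variant allele frequency (VAF) in the tumor with the VAF in matched normal tissue. The Python sketch below uses hypothetical depth and VAF thresholds and hypothetical function names. Somatic variant callers rely on statistical models rather than fixed cut-offs.

```python
def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: fraction of reads supporting the alternative allele."""
    return alt_reads / depth if depth else 0.0


def is_candidate_somatic(tumor_alt: int, tumor_depth: int, normal_alt: int, normal_depth: int,
                         min_depth: int = 10, min_tumor_vaf: float = 0.05,
                         max_normal_vaf: float = 0.02) -> bool:
    """Hypothetical filter: enough coverage, present in the tumor, (almost) absent in the normal."""
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False  # too few reads to decide
    return (vaf(tumor_alt, tumor_depth) >= min_tumor_vaf
            and vaf(normal_alt, normal_depth) <= max_normal_vaf)


print(is_candidate_somatic(15, 100, 0, 80))   # True: 15% in tumor, absent in normal
print(is_candidate_somatic(15, 100, 10, 80))  # False: also seen in normal (germline or artefact?)
```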


Limitations: Detection of genomic variants

Between 1.4% and 8.9% of the variants are technology-specific

Limitations: Detection of genomic variants


Common genomic variants between different variant callers


Conclusion

• Nowadays, NGS is widely used in cancer centers to categorize cancers and match patients with personalized treatments (precision medicine)

• NGS is also used in cancer research to discover new oncogenic mechanisms, to understand how a treatment works, to link biological and genetic characteristics …

• NGS might not answer your questions. It is important to know that the technique is limited:

• A) by the question you asked in the first place. If a cancer cannot be explained by mutational events, it might be explained by other mechanisms. But sometimes nothing can be found in the data.

• B) by technical issues. Sequencers and software are prone to errors. Statistically, any analysis will contain at least one error. You can often limit the effects of these limitations with biological and technical replicates.


Paired-end mapping


• Insert-size checking

• % of "All Good" = both reads in the pair have aligned and "the pair is properly aligned", meaning that they mapped within a proper distance from each other
• % of "All Bad" = neither the read nor its mate mapped
• % of "Only one read maps" = only one read in a pair is mapped
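The categories above, together with the insert-size check, can be computed pair by pair. In this Python sketch the expected insert-size range (100 to 600 bp) is a hypothetical parameter. Pairs with both mates mapped but at an unexpected distance get their own label. The absolute value is taken because TLEN is negative for the rightmost mate.

```python
from collections import Counter
from typing import Optional


def classify_pair(mate1_mapped: bool, mate2_mapped: bool, insert_size: Optional[int],
                  min_insert: int = 100, max_insert: int = 600) -> str:
    """Assign a read pair to one of the categories of the slide (plus improper distance)."""
    if mate1_mapped and mate2_mapped:
        if insert_size is not None and min_insert <= abs(insert_size) <= max_insert:
            return "All Good"
        return "both mapped, unexpected distance"
    if mate1_mapped or mate2_mapped:
        return "Only one read maps"
    return "All Bad"


def pair_percentages(pairs: list[tuple[bool, bool, Optional[int]]]) -> dict[str, float]:
    """Percentage of pairs in each category."""
    counts = Counter(classify_pair(*pair) for pair in pairs)
    return {label: 100 * n / len(pairs) for label, n in counts.items()}


pairs = [(True, True, 300), (True, True, -250), (True, True, 5000), (True, False, None)]
print(pair_percentages(pairs))
```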


Alignment key parameters – Using single or paired-end reads ?


The type of sequencing (i.e. single or paired-end reads) is often driven by the application.
Example: finding large indels, genomic rearrangements, ...
However, in most cases, the pair information can improve mapping specificity:

- Single-end alignment – repeated sequence

- Paired-end alignment – unique sequence


Alignment of reads against reference genome


NGS Toolkit : SAMtools

http://samtools.sourceforge.net/samtools.shtml
Interacting with SAM/BAM format

SAMtools provides the following commands:
• view: transform and filter SAM or BAM data
• sort: sort a BAM file by genomic location or name
• index: create an index file that allows fast look-up of data in a (sorted) SAM or BAM
• mpileup: SNV/indel detection
• rmdup: remove duplicated reads
• flagstat: compute statistics on the SAM/BAM file
• ...


NGS Toolkit : BEDTools


http://code.google.com/p/bedtools/

• Addresses common genomics tasks such as finding feature overlaps and computing coverage
• Can handle BED, GFF/GTF, VCF and SAM/BAM
• Unix-like commands
• Fast
• All intersection or annotation tasks can be done with BEDTools

Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6), pp. 841–842.
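The core operation behind an intersection is deciding whether two intervals overlap, and it is short enough to sketch. The Python example below is a naive quadratic version for BED-like `(chrom, start, end, name)` tuples. It uses 0-based, half-open coordinates, as in the BED format. The data and the function name are hypothetical. `bedtools intersect` performs this kind of task far more efficiently on large files.

```python
def intersect(features_a: list[tuple[str, int, int, str]],
              features_b: list[tuple[str, int, int, str]]) -> list[tuple[str, str]]:
    """Names of overlapping pairs; coordinates are 0-based, half-open [start, end)."""
    overlaps = []
    for chrom_a, start_a, end_a, name_a in features_a:
        for chrom_b, start_b, end_b, name_b in features_b:
            # Half-open intervals overlap when each one starts before the other ends.
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                overlaps.append((name_a, name_b))
    return overlaps


exons = [("chr1", 100, 200, "exon1"), ("chr1", 500, 600, "exon2")]
variants = [("chr1", 150, 151, "var1"), ("chr2", 150, 151, "var2"), ("chr1", 200, 201, "var3")]
print(intersect(exons, variants))  # [('exon1', 'var1')]: var3 starts exactly where exon1 ends
```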


How to visualise data?

IGV : Integrative Genome Viewer

http://www.broadinstitute.org/igv/

• Java application (local version)
• Annotations available on the Broad server
• Batch command line
• Supports many different file formats (variant visualization)
• Easy to use
• Limited in terms of annotations

Screencast: How to use IGV (French) https://www.youtube.com/watch?v=Wx3zHYK0cNg


How to visualise data?

UCSC Genome Browser

http://genome.ucsc.edu/

• Hundreds of annotation datasets
• Hundreds of public (ENCODE) profiles
• Table functions
• Fully online (sessions)
• Uploading big data files can be difficult (new formats: bigBed, bigWig, etc.)

Screencast: How to use the UCSC genome browser (French) https://www.youtube.com/watch?v=VPeoeJebdFM


Sequence length distribution

• Sequencers generate either:
  • sequence fragments of uniform length
  • or reads of widely varying lengths.

Helps to identify and remove reads with abnormal length.
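A length distribution is simply a histogram of read lengths, and filtering keeps the reads inside an expected range. The minimal Python sketch below uses hypothetical function names and bounds.

```python
from collections import Counter


def length_distribution(sequences: list[str]) -> Counter:
    """Histogram: read length -> number of reads."""
    return Counter(len(seq) for seq in sequences)


def filter_by_length(sequences: list[str], min_length: int, max_length: int) -> list[str]:
    """Keep reads whose length lies in [min_length, max_length] (hypothetical bounds)."""
    return [seq for seq in sequences if min_length <= len(seq) <= max_length]


reads = ["ACGT", "ACG", "ACGTACGT", "ACGT"]
print(sorted(length_distribution(reads).items()))  # [(3, 1), (4, 2), (8, 1)]
print(filter_by_length(reads, 4, 6))               # ['ACGT', 'ACGT']
```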


Sequence length distribution


Sequence content

• Proportion of A, C, G and T called at each base position

• GC content at each base position -> in random libraries, there should be little to no difference between the different bases

• N content per base -> an N is called when the sequencer is unable to make a base call with sufficient confidence
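Per-position composition is obtained by counting how often each base was called at each read position. The minimal Python sketch below reports the fractions of A, C, G, T and N at each position, plus GC = C + G. The function name and the reads are hypothetical.

```python
def per_position_content(sequences: list[str]) -> list[dict[str, float]]:
    """Fraction of A, C, G, T and N called at each position, plus GC = C + G."""
    longest = max(len(seq) for seq in sequences)
    table = []
    for pos in range(longest):
        # Only reads long enough to have a base at this position contribute to it.
        column = [seq[pos].upper() for seq in sequences if pos < len(seq)]
        fractions = {base: column.count(base) / len(column) for base in "ACGTN"}
        fractions["GC"] = fractions["C"] + fractions["G"]
        table.append(fractions)
    return table


reads = ["ACGN", "AGGT"]  # hypothetical reads
for pos, fractions in enumerate(per_position_content(reads), start=1):
    print(pos, fractions)
```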


Sequence content


Over-represented sequences

• Sequences that are highly duplicated in your library, as well as any primer and/or adapter dimers that were present in the original library.

• Run A:
  • Sequence: GACTCGGCAGCATCTCCATCCAAACTTTTCATTTCTGCTTTTAAAGGAAA
  • Count: 37
  • Percentage: 0.1%
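Over-represented sequences can be found by counting identical reads and reporting those above a given fraction of all reads. In this Python sketch, the default threshold of 0.1% is an assumption chosen to match the percentage in the example above. The function name is hypothetical.

```python
from collections import Counter


def overrepresented(sequences: list[str],
                    min_fraction: float = 0.001) -> list[tuple[str, int, float]]:
    """(sequence, count, fraction) for identical reads above min_fraction of all reads."""
    total = len(sequences)
    counts = Counter(sequences)
    # most_common() lists the sequences from the most to the least frequent.
    return [(seq, n, n / total) for seq, n in counts.most_common() if n / total > min_fraction]


reads = ["AAAA"] * 5 + ["CCCC"] * 3 + ["GGGG"] * 2  # hypothetical reads
print(overrepresented(reads, min_fraction=0.25))  # [('AAAA', 5, 0.5), ('CCCC', 3, 0.3)]
```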


Duplicate reads

• Different reads which have the same sequence
• A duplicate can be a PCR effect, the same fragment read twice, or come from enrichment
• Reads which align to the identical location on the reference

• Remove duplicates? It depends on the application.
• Example: for targeted sequencing, you do not want duplicates to be removed
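Position-based duplicate detection, as in the third bullet, groups alignments that share the same chromosome, 5' position and strand, and keeps one representative per group. This single-end Python sketch is hypothetical: it keeps the alignment with the highest score. Tools such as `samtools rmdup` or Picard MarkDuplicates apply their own rules and also use mate positions for paired reads.

```python
def find_duplicates(alignments: list[tuple[str, str, int, str, int]]) -> set[str]:
    """Names of reads flagged as duplicates.

    Each alignment is (read_name, chrom, five_prime_position, strand, score). Reads sharing
    chrom, 5' position and strand form one group; the highest-scoring read is kept.
    """
    best: dict[tuple[str, int, str], tuple[str, int]] = {}
    for name, chrom, position, strand, score in alignments:
        key = (chrom, position, strand)
        if key not in best or score > best[key][1]:
            best[key] = (name, score)
    kept = {name for name, _ in best.values()}
    return {alignment[0] for alignment in alignments} - kept


alignments = [("r1", "chr1", 100, "+", 30), ("r2", "chr1", 100, "+", 35),
              ("r3", "chr1", 100, "-", 30), ("r4", "chr1", 200, "+", 20)]
print(find_duplicates(alignments))  # {'r1'}
```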


Duplicate reads


Data analysis

• Motif search
• Chimeric transcript search
• Microbiota study
• Alternative transcript search
• Differential expression study
• Methylation study
• Detection of genomic variants
• Detection of copy-number variation


Motif search

How does my protein interact with DNA?

Chimeric transcripts

Are there any chimeric transcripts?

Microbiota

What kinds of species grow in the human gut? Could those species be associated with tumorigenesis?

Alternative transcripts

Are there any differences between normal cells and tumor cells regarding splicing events?

Differential expression

Are there genes that are strongly expressed in one kind of tumor but not in the other?

Can we group tumors according to their expression profiles?

Methylome

Is there any difference between DNA methylation in tumors and in normal cells?

Detection of copy-number variations

Are there any copy-number alterations (gain or loss of chromosomal regions, amplifications …) that could explain tumorigenesis?

Detection of genomic variants

Are there mutational events that are specific to the tumor genome? Could tumorigenesis be attributed to them?

Quality controls on raw reads: let's start after sequencing

A raw read is characterized by three parameters:
• Its length
• Its sequence
• Its per-base quality

Raw reads