Next Generation Sequencing & Transcriptome Analysis

Post on 03-Dec-2014

DESCRIPTION

How to use next generation sequencing in transcriptomics and how to analyse those data.

Transcript

NEXT GENERATION SEQUENCING

AND HOW TO USE THE DATA GENERATED

FOR TRANSCRIPTOMICS

METHODS

454 SEQUENCING

SOLEXA / ILLUMINA

SOLID

454 SEQUENCING

SEQUENCING BY SYNTHESIS

PYROSEQUENCING

> 400 BASEPAIRS IN A SINGLE READ

454 SEQUENCING

REPEATS OF SINGLE NUCLEOTIDES ARE DETECTED BY SIGNAL STRENGTH

WORKS FOR UP TO 8 CONSECUTIVE BASES
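
The signal-strength rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the instrument's actual base caller: the flow order `TACG`, the simple rounding rule, and the function name `decode_flowgram` are all assumptions made for the example; only the limit of 8 consecutive bases comes from the slide.

```python
def decode_flowgram(flow_order, signals, max_run=8):
    """Turn per-flow light signals into a base sequence.

    Each flow offers one nucleotide; the (normalised) signal strength is
    assumed to be proportional to how many copies were incorporated.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = int(signal + 0.5)  # round to the nearest whole base
        if run_length > max_run:
            # Beyond this length the signal is assumed to be unreliable.
            raise ValueError(f"homopolymer longer than {max_run} at flow {i}")
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical signals: T x1, A x0, C x2, G x1, T x0, A x3, C x0, G x1
signals = [1.02, 0.05, 2.10, 0.97, 0.03, 3.05, 0.0, 1.1]
print(decode_flowgram("TACG", signals))  # TCCGAAAG
```

The closer a signal lies to a half-integer, the less certain the run length becomes, which is one way to picture why long homopolymers are error-prone.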

SOLEXA / ILLUMINA

AGAIN: SEQUENCING BY SYNTHESIS

ANOTHER DETECTION-APPROACH

UP TO 100 BASEPAIRS IN A SINGLE READ

SOLEXA / ILLUMINA

[FIGURE SERIES: TEMPLATE BASES READ ONE AT A TIME]

ADVANTAGES OF NGS

CAN RUN IN PARALLEL

PREPARATION CAN BE AUTOMATED

MUCH CHEAPER WHEN COMPARED TO TRADITIONAL SEQUENCING

TRANSCRIPTOME ANALYSIS

REVEALS EXPRESSION CHANGES BETWEEN:

DIFFERENT CELL TYPES

DIFFERENT ENVIRONMENTAL CONDITIONS

DISEASES

DIFFERENT DEVELOPMENTAL STAGES

TRANSCRIPTOME ANALYSIS

CAN BE USED TO IDENTIFY NEW GENES

CAN BE APPLIED TO NON-MODEL ORGANISMS

HOW TO ANALYSE TRANSCRIPTOMES

TRADITIONALLY: EXPRESSED SEQUENCE TAGS (ESTS)

USING NGS: RNA-SEQ

FIRST STEP: GET THE DATA

ESTS

DONE USING SHOTGUN-SEQUENCING

USES CDNA CLONES OF EXPRESSED MRNA

CHEAP TO PRODUCE

RNA-SEQ

SAME PRINCIPLE:

GET AVAILABLE MRNA

THEN SEQUENCE IT IN PARALLEL VIA NGS

RNA-SEQ == EST + NGS

HOW TO ANALYSE TRANSCRIPTOMES

ASSEMBLY OF READS

DETECTION OF SNPS

GENE ANNOTATION

DETECTION OF OPEN READING FRAMES

DETECTION OF HOMOLOGOUS GENES

ASSEMBLY

AVAILABLE TOOLS:

CAP3

MIRA

...

CAP3

SMITH-WATERMAN TO CLIP BAD ENDINGS

GLOBAL ALIGNMENT TO FIND FALSE OVERLAPS
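
To make the idea of clipping with Smith-Waterman more concrete, here is a minimal local-alignment sketch in Python. It is not CAP3's implementation: the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name `smith_waterman` are assumptions chosen for illustration. The function returns the best local score and the half-open ranges of both sequences covered by the alignment; anything outside those ranges would be a candidate for clipping.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, (start, end) in a, (start, end) in b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_a), (j, end_b)


read = "NNNNACGTACGTNN"   # hypothetical read with low-quality ends
other = "ACGTACGT"        # hypothetical overlapping read
print(smith_waterman(read, other))  # (16, (4, 12), (0, 8))
```

In this toy case, `read[4:12]` is the well-aligned core and the flanking `N` bases fall outside the local alignment.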

MIRA

COMBINES ASSEMBLY & SNP-DETECTION

USES:

TRACE FILES

TEMPLATE INSERT INFORMATION

REDUNDANCY

MIRA

FAST READ COMPARISON TO DETECT POTENTIAL OVERLAPS

CONFIRMS OVERLAPS USING SMITH-WATERMAN AND CREATES ALIGNMENTS

ASSEMBLES READ-PAIRS BY FINDING BEST PATH

CHECKS ASSEMBLIES FOR ERRORS AND BEGINS AGAIN

MIRA: THE WORKFLOW

MIRA

RESULTS:

CONSENSUS CONTIGS MADE OF READS THAT OVERLAP

SNPS THAT ARE CALLED DURING ASSEMBLY PROCESS

SNP DETECTION

TOOLS:

MIRA

QUALITYSNP

AND SOME MORE

QUALITYSNP

USES CAP3-FILES

INPUT: CLUSTERS OF POTENTIAL HAPLOTYPES

CALCULATES SIMILARITY BETWEEN SEQUENCES TO CONSTRUCT HAPLOTYPES AND REMOVES PARALOGS

QUALITYSNP

REMOVES HAPLOTYPES THAT CONSIST OF ONLY ONE SEQUENCE

DETECTS SYNONYMOUS AND NON-SYNONYMOUS SNPS

PROVIDES A WEB-FRONTEND CALLED HAPLOSNPER
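
The distinction between synonymous and non-synonymous SNPs mentioned above can be illustrated with a short Python sketch. It is not QualitySNP's code; it assumes the standard genetic code, a known reading frame, and a hypothetical helper called `classify_snp` that receives a single codon, the position of the SNP within it, and the alternative base.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alt_base):
    """Classify a SNP by whether it changes the encoded amino acid."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


print(classify_snp("GCT", 2, "C"))  # GCT (Ala) -> GCC (Ala): synonymous
print(classify_snp("GAA", 1, "T"))  # GAA (Glu) -> GTA (Val): non-synonymous
```

In practice the reading frame first has to be determined, for example by the ORF detection step described later in the slides.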

HOMOLOGY DETECTION

FINDS GENES THAT SHARE A COMMON ANCESTOR

USUALLY ONE SEARCHES AGAINST A DATABASE

HOMOLOGY DETECTION

DIFFERENT KINDS OF SEARCHES:

PROTEIN AGAINST PROTEIN

NUCLEOTIDE AGAINST NUCLEOTIDE

PROTEIN AGAINST NUCLEOTIDE

NUCLEOTIDE AGAINST PROTEIN

HOMOLOGY DETECTION

TOOLS:

BLAST

FASTX / FASTY

HMMER

PATTERNHUNTER

BLAST

AVAILABLE FOR ALL TYPES OF COMPARISONS

ONE OF THE OLDEST ALGORITHMS

WIDELY USED

FAVOURS SPEED OVER SENSITIVITY

FASTX / FASTY

PART OF THE FASTA PACKAGE

COMPARE NUCLEOTIDES AGAINST PROTEINS

DETERMINES A HYPOTHESIZED CODING REGION (HCR)

FASTX IS FASTER, FASTY IS MORE ACCURATE

HMMER

PROTEIN-QUERIES AGAINST PROTEIN-DATABASE

USES HIDDEN MARKOV MODELS

MAPS SMITH-WATERMAN PARAMETERS ONTO A PROBABILISTIC MODEL

IMPROVES ACCURACY

PATTERNHUNTER

NUCLEOTIDE-QUERIES AGAINST OTHER NUCLEOTIDE-SEQUENCES

USES NON-CONSECUTIVE SEEDS FOR INCREASED SENSITIVITY

COMPARES HUMAN GENOME TO MOUSE GENOME IN 20 CPU-DAYS
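
The idea of non-consecutive (spaced) seeds can be shown with a toy Python example. This is a sketch only, not PatternHunter's implementation: the seed pattern `"1101"` is a deliberately short, hypothetical one (`1` means the position must match, `0` means it is ignored), and the names `seed_key` and `find_seed_hits` are made up for this illustration.

```python
from collections import defaultdict


def seed_key(seq, start, seed):
    """Collect only the bases at the '1' positions of the seed."""
    return "".join(seq[start + i] for i, bit in enumerate(seed) if bit == "1")


def find_seed_hits(query, target, seed="1101"):
    """Return (query_pos, target_pos) pairs where the spaced seed matches."""
    span = len(seed)
    # Index every seed-sized window of the target by its spaced key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, seed)].append(t)
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, seed), []):
            hits.append((q, t))
    return hits


# The sequences differ at the third base, which the spaced seed ignores.
print(find_seed_hits("ACGT", "ACTT", seed="1101"))  # [(0, 0)]
print(find_seed_hits("ACGT", "ACTT", seed="1111"))  # []
```

The consecutive seed `"1111"` misses the hit because of a single mismatch, while the spaced seed tolerates it; each hit would then be extended into a full alignment.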

ORF DETECTION

READING FRAMES CAN BE DETECTED IN EST-DATA

ALLOWS SCREENING FOR PREVIOUSLY UNKNOWN GENES

YIELDS A PUTATIVE PROTEIN SEQUENCE
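
As a rough illustration of what detecting a reading frame means, the following Python sketch scans all six frames of a sequence for stretches that run from a start codon (`ATG`) to a stop codon. It is a naive rule-based approach under simplifying assumptions, not the method used by any particular tool: it ignores sequencing errors, requires both a start and a stop codon (real ESTs are often partial), and the names `find_orfs`, `longest_orf`, and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, orf_sequence) for every ATG...stop stretch."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    # Keep it only if enough codons precede the stop.
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:pos + 3]))
                    start = None
    return orfs


def longest_orf(seq):
    """Return the longest ORF found, or None if there is none."""
    return max(find_orfs(seq), key=lambda orf: len(orf[2]), default=None)


print(longest_orf("CCATGAAATTTGGGTAACC"))  # ('+', 2, 'ATGAAATTTGGGTAA')
```

Translating the returned stretch codon by codon would give the potential protein sequence.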

ORF DETECTION

TOOLS:

ESTSCAN

ORFPREDICTOR

...

ESTSCAN

USES HIDDEN MARKOV MODELS

ROBUST AGAINST FRAMESHIFT ERRORS

SENSITIVE (5 % FALSE NEGATIVES, 18 % FALSE POSITIVES)

ORFPREDICTOR

WEB-BASED

USES BLASTX AS GUIDELINE IF POSSIBLE

USES A DEFINED RULESET FOR DEFINING ORFS

GENE ANNOTATION

BLAST2GO VIA GENE ONTOLOGY

FINDS HOMOLOGOUS GENES TO ANNOTATE THE FUNCTIONS OF A GENE OF INTEREST

GENE ONTOLOGY

3 ONTOLOGIES:

MOLECULAR FUNCTION

CELLULAR COMPONENT

BIOLOGICAL PROCESS

CONCLUSIONS

NGS PROVIDES A FAST AND CHEAP WAY TO GENERATE DATA

TONS OF TOOLS EXIST TO ANALYSE TRANSCRIPTOME DATA

ALL TOOLS HAVE THEIR OWN PROS & CONS

MOST OF THOSE TOOLS ARE UNSUITABLE FOR A „NORMAL USER“
