Integration and visualization of genome-wide data

Jan 01, 2022

dariahiddleston
Page 1: Integration and visualization of genome-wide data

Integration and visualization of genome-wide data

Page 2: Integration and visualization of genome-wide data

Data integration and visualization

http://genomeview.org/

http://www.broadinstitute.org/igv/

Stand-alone genome browsers

http://genome.ucsc.edu/

http://www.ensembl.org/

http://www.ncbi.nlm.nih.gov/mapview/

Web-based genome browsers

Page 3: Integration and visualization of genome-wide data

Web-based Genome Browsers

• Software designed to enable a user to access and display genome sequence data

• Visual integration and correlation of different types of information

• Organize large amounts of genome sequence data

Page 4: Integration and visualization of genome-wide data

Web-based Genome Browsers

• UCSC, Ensembl and NCBI are based on the same reference genome

• The three genome browsers differ mainly in their interfaces and in the annotations available.

• Some genomes are available in one genome browser but not the other.

Page 5: Integration and visualization of genome-wide data

UCSC genome browser http://genome.ucsc.edu/

88 species annotated in the UCSC genome browser:
• MAMMALS
• VERTEBRATES
• DEUTEROSTOMES
• NEMATODES
• OTHER (Sea Hare, Yeast)

No plants

Page 6: Integration and visualization of genome-wide data

UCSC genome browser

• Allows aligning sequences to the genome via BLAT

• Table Browser

• Creation of PDF

• Provides access to all the data produced by the project, and to the software used to analyze and present it

• Site produces and maintains annotation tracks

http://genome.ucsc.edu/

Page 7: Integration and visualization of genome-wide data

Annotation tracks

• Genomic data: known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, homologies, ChIP-Seq data, DNase-Seq data, expression data, …

• Annotation tracks are both computed at UCSC from publicly available sequence data and provided by collaborators

• Users can also add their own custom tracks to the browser

Page 8: Integration and visualization of genome-wide data

UCSC genome browser search

Page 9: Integration and visualization of genome-wide data

UCSC genome browser search

Page 10: Integration and visualization of genome-wide data

UCSC genome browser layout http://genome.ucsc.edu/

Current chromosomal location, navigation and search bar.

Page 11: Integration and visualization of genome-wide data

UCSC genome browser http://genome.ucsc.edu/

Graphical visualization of the chromosome and current view location

Page 12: Integration and visualization of genome-wide data

UCSC genome browser http://genome.ucsc.edu/

Coordinates on reference genome

Page 13: Integration and visualization of genome-wide data

UCSC genome browser layout http://genome.ucsc.edu/

Annotation tracks

Page 14: Integration and visualization of genome-wide data

UCSC genome browser layout http://genome.ucsc.edu/

Annotation tracks

Genes

SNPs

H3K27Ac epigenetic mark

ChIP-Seq data (TF binding sites)

Conservation of sequence in mammals (PhyloP)

Multiple alignments

Repeated sequences

Page 15: Integration and visualization of genome-wide data

UCSC genome browser layout http://genome.ucsc.edu/

GENES

Page 16: Integration and visualization of genome-wide data

UCSC genome browser layout http://genome.ucsc.edu/

GENES

H3K27Ac

Page 17: Integration and visualization of genome-wide data

UCSC genome browser layout http://genome.ucsc.edu/

GENES

H3K27Ac

SNPs

Page 18: Integration and visualization of genome-wide data

Tracks available

Below the main view of the UCSC genome browser many more tracks are available. Tracks are grouped in:
• Mapping and sequencing tracks
• Phenotype and Disease Associations
• Genes and Gene Prediction Tracks
• Literature
• mRNA and EST Tracks
• Expression
• Regulation
• Comparative Genomics
• Neandertal Assembly and Analysis
• Denisova Assembly and Analysis
• Variation and Repeats

Track groups

Different visualization options:
- Hide
- Dense
- Full
- Squish
- Pack

Page 19: Integration and visualization of genome-wide data

Example search: BRCA1

Page 20: Integration and visualization of genome-wide data

Example search: BRCA1

Transcript isoforms

Page 21: Integration and visualization of genome-wide data

Example search: BRCA1

Page 22: Integration and visualization of genome-wide data

Example search: BRCA1

Activate Catalogue Of Somatic Mutations In Cancer (COSMIC) track

Page 23: Integration and visualization of genome-wide data

Example search: BRCA1

Activate Catalogue Of Somatic Mutations In Cancer (COSMIC) track

Page 24: Integration and visualization of genome-wide data

Example search: BRCA1

Activate Catalogue Of Somatic Mutations In Cancer (COSMIC) track

Page 25: Integration and visualization of genome-wide data

Get DNA for

Right-clicking on a feature (for example, a gene) opens the feature menu. Clicking “Get DNA for” followed by the gene name downloads the gene sequence.

Page 26: Integration and visualization of genome-wide data

Get DNA for

Page 27: Integration and visualization of genome-wide data

Get DNA for

Page 28: Integration and visualization of genome-wide data

Get DNA for

Page 29: Integration and visualization of genome-wide data

Get DNA for

Page 30: Integration and visualization of genome-wide data

Display of custom tracks

Page 31: Integration and visualization of genome-wide data

Display of custom tracks

Paste here your own track data!!

Page 32: Integration and visualization of genome-wide data

Tracks formats

• BED: flexible format to define data lines that are displayed in an annotation track.

• GTF: gene transfer format, generally used to display gene annotation data

• WIG: displays continuous-valued data

• BAM: standard alignment format for NGS sequence aligners

• VCF: variant call format, used to display sequence variants

• Many more…
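To make "continuous-valued data" concrete, here is a minimal Python sketch that writes a small fixedStep WIG block. The function name `to_fixed_step_wig` and the values are hypothetical, chosen only for illustration. Note that fixedStep `start` positions are 1-based, unlike BED coordinates.

```python
def to_fixed_step_wig(chrom, start, values, step=1, span=1, name="example signal"):
    """Return WIG fixedStep text for evenly spaced values.

    `start` is 1-based, as the wiggle format expects.
    """
    lines = [
        f'track type=wiggle_0 name="{name}"',
        f"fixedStep chrom={chrom} start={start} step={step} span={span}",
    ]
    # One value per line; each value covers `span` bases, `step` bases apart.
    lines.extend(str(v) for v in values)
    return "\n".join(lines) + "\n"


# Illustrative values only: three 100-base bins on chr7.
wig_text = to_fixed_step_wig("chr7", 127471197, [0.5, 1.0, 2.5], step=100, span=100)
print(wig_text)
```

The resulting text can be pasted into the custom track box in the same way as BED data.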

Page 33: Integration and visualization of genome-wide data

BED file format

Name Description

chrom* The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).

chromStart* The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.

chromEnd* The ending position of the feature in the chromosome or scaffold.

name Defines the name of the BED line.

score A score between 0 and 1000.

strand Defines the strand - either '+' or '-'.

thickStart The starting position at which the feature is drawn thickly.

thickEnd The ending position at which the feature is drawn thickly.

itemRgb An RGB value of the form R,G,B (e.g. 255,0,0).

blockCount The number of blocks (exons) in the BED line.

blockSizes A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.

blockStarts A comma-separated list of block starts, relative to chromStart. The number of items in this list should correspond to blockCount.

track name=junctions description="TopHat junctions"
test_chromosome 180 402 JUNC00000001 46 + 180 402 255,0,0 2 70,52 0,170
test_chromosome 349 550 JUNC00000002 38 + 349 550 255,0,0 2 51,50 0,151

* Required fields

Flexible format, easy to adapt from BLAST and BLAT outputs…
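The field table above can be read as a small record type. The following Python sketch parses one BED line and expands blockSizes/blockStarts into absolute block coordinates. The names `BedRecord`, `parse_bed_line` and `browser_position` are hypothetical, and splitting on any whitespace is a simplifying assumption (tab-separated files may contain spaces inside the name field).

```python
from typing import NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # chromStart: 0-based, first base of the feature
    end: int    # chromEnd: 0-based, not included in the feature
    name: str
    score: int
    strand: str
    blocks: Tuple[Tuple[int, int], ...]  # absolute (start, end) of each block


def parse_bed_line(line: str) -> BedRecord:
    """Parse one BED data line (3 to 12 whitespace-separated fields)."""
    f = line.split()
    if len(f) < 3:
        raise ValueError("BED needs at least chrom, chromStart and chromEnd")
    start, end = int(f[1]), int(f[2])
    blocks = ()
    if len(f) >= 12:
        sizes = [int(x) for x in f[10].split(",") if x]
        offsets = [int(x) for x in f[11].split(",") if x]
        if not (len(sizes) == len(offsets) == int(f[9])):
            raise ValueError("blockSizes/blockStarts do not match blockCount")
        # blockStarts are offsets relative to chromStart.
        blocks = tuple((start + o, start + o + s) for o, s in zip(offsets, sizes))
    return BedRecord(
        chrom=f[0], start=start, end=end,
        name=f[3] if len(f) > 3 else ".",
        score=int(f[4]) if len(f) > 4 else 0,
        strand=f[5] if len(f) > 5 else ".",
        blocks=blocks,
    )


def browser_position(rec: BedRecord) -> str:
    """Convert 0-based BED coordinates to a 1-based chrom:start-end string."""
    return f"{rec.chrom}:{rec.start + 1}-{rec.end}"


rec = parse_bed_line(
    "test_chromosome 180 402 JUNC00000001 46 + 180 402 255,0,0 2 70,52 0,170"
)
print(rec.blocks)             # absolute coordinates of the two blocks
print(browser_position(rec))  # 1-based position string
```

Because chromStart is 0-based and chromEnd is exclusive, the feature length is simply `end - start`, and only the start needs shifting by one when converting to a 1-based position.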

Page 34: Integration and visualization of genome-wide data

BED file example

chr7 127471196 127472363 Pos1 0 +

chr7 127472363 127473530 Pos2 100 +

chr7 127473530 127474697 Pos3 100 +

chr7 127474697 127475864 Pos4 1000 +

chr7 127475864 127477031 Neg1 1000 -

chr7 127477031 127478198 Neg2 0 -

chr7 127478198 127479365 Neg3 0 -

chr7 127479365 127480532 Pos5 0 +

chr7 127480532 127481699 Neg4 0 -

Page 35: Integration and visualization of genome-wide data

Display of custom tracks

Page 36: Integration and visualization of genome-wide data

Display of custom tracks

Page 37: Integration and visualization of genome-wide data

Display of custom tracks

Page 38: Integration and visualization of genome-wide data

Display of custom tracks

Right-clicking on the track lets you change how it is displayed:

Page 39: Integration and visualization of genome-wide data

Display of custom tracks

If “full” is selected

Page 40: Integration and visualization of genome-wide data

Display of custom tracks

Clicking on the name of the track opens the settings that control how the track is displayed

Page 41: Integration and visualization of genome-wide data

Display of custom tracks

Page 42: Integration and visualization of genome-wide data

Display of custom tracks

For example, you can change the name and description of the track, and also color the track differently depending on the strand

track name='my data' description='this is an example track' colorByStrand="255,0,0 0,0,255"
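A custom track is just a track line followed by data lines, so it can be generated by a script. This is a minimal Python sketch under that assumption. The helpers `track_line` and `custom_track` are hypothetical names, and values containing spaces are wrapped in double quotes here rather than the single quotes used on the slide.

```python
def track_line(**attrs):
    """Build a 'track ...' header line from key=value attributes."""
    parts = ["track"]
    for key, value in attrs.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # quote values that contain spaces
        parts.append(f"{key}={text}")
    return " ".join(parts)


def custom_track(features, **attrs):
    """Return custom track text: one header line plus tab-separated BED rows."""
    rows = ["\t".join(str(x) for x in feature) for feature in features]
    return "\n".join([track_line(**attrs)] + rows) + "\n"


features = [
    ("chr7", 127471196, 127472363, "Pos1", 0, "+"),
    ("chr7", 127475864, 127477031, "Neg1", 1000, "-"),
]
text = custom_track(
    features,
    name="my data",
    description="this is an example track",
    colorByStrand="255,0,0 0,0,255",
)
print(text)
```

The printed text can be pasted into the custom track box. Other track line attributes can be passed the same way as extra keyword arguments.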

Page 43: Integration and visualization of genome-wide data

Display of custom tracks

chr7 127471196 127472363 Pos1 0 +

chr7 127472363 127473530 Pos2 100 +

chr7 127473530 127474697 Pos3 100 +

chr7 127474697 127475864 Pos4 1000 +

chr7 127475864 127477031 Neg1 1000 -

chr7 127477031 127478198 Neg2 0 -

chr7 127478198 127479365 Neg3 0 -

chr7 127479365 127480532 Pos5 0 +

chr7 127480532 127481699 Neg4 0 -

Page 44: Integration and visualization of genome-wide data

Display of custom tracks

track name='my data' description='this is an example track' useScore=1

With the useScore=1 option, the features of the track are shown in shades of gray depending on the value in the score column
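Since BED scores must lie between 0 and 1000, measurements on another scale need rescaling before score-based shading is meaningful. This is a minimal Python sketch of a linear min-max rescaling. The function name `rescale_to_bed_scores` is hypothetical, and the linear mapping is an illustrative choice, not something the browser requires.

```python
def rescale_to_bed_scores(values):
    """Linearly map arbitrary numbers onto integer BED scores in 0..1000."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical inputs all get the maximum score.
        return [1000 for _ in values]
    return [round(1000 * (v - lo) / (hi - lo)) for v in values]


# Illustrative measurements, e.g. signal strength per feature.
scores = rescale_to_bed_scores([0.0, 2.5, 5.0, 10.0])
print(scores)
```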

Page 45: Integration and visualization of genome-wide data

Display of custom tracks

chr7 127471196 127472363 Pos1 0 +

chr7 127472363 127473530 Pos2 100 +

chr7 127473530 127474697 Pos3 100 +

chr7 127474697 127475864 Pos4 1000 +

chr7 127475864 127477031 Neg1 1000 -

chr7 127477031 127478198 Neg2 0 -

chr7 127478198 127479365 Neg3 0 -

chr7 127479365 127480532 Pos5 0 +

chr7 127480532 127481699 Neg4 0 -

Page 46: Integration and visualization of genome-wide data

UCSC Genome Browser Tools

Page 47: Integration and visualization of genome-wide data

UCSC Genome Browser Tools

Page 48: Integration and visualization of genome-wide data

UCSC Genome Browser Tools

Page 49: Integration and visualization of genome-wide data

UCSC Genome Browser Tools

Page 50: Integration and visualization of genome-wide data

Extracting information with the Table Browser

All the data displayed in the UCSC genome browser is stored on a public MySQL server (a relational database), which can be accessed:
• directly, with the standard mysql command
• through the Table Browser

The Table Browser is useful for retrieving the data associated with a track in text format, calculating intersections between tracks, and retrieving the DNA sequence covered by a track.
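For direct access, a query can be prepared from a script. The Python sketch below only builds the `mysql` command line and does not run it. The host `genome-mysql.soe.ucsc.edu`, the user `genome`, the `hg19` database and the `refGene` table are assumptions based on UCSC's public server documentation and are not stated on the slides, so check the current documentation before relying on them.

```python
import shlex


def ucsc_mysql_command(database, query,
                       host="genome-mysql.soe.ucsc.edu", user="genome"):
    """Return the argument list for a mysql client call (nothing is executed)."""
    return [
        "mysql",
        f"--user={user}",
        f"--host={host}",
        "-A",            # skip auto-rehash for faster startup
        "-D", database,  # database to use, e.g. a genome assembly
        "-e", query,     # run the query and exit
    ]


# Assumed table and columns: refGene with name, chrom, strand, txStart, txEnd, name2.
query = ("SELECT name, chrom, strand, txStart, txEnd "
         "FROM refGene WHERE name2 = 'BRCA1'")
cmd = ucsc_mysql_command("hg19", query)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd)` on a machine with the mysql client installed.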

Page 51: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Clade, genome, genome version

Page 52: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Page 53: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Track type

Page 54: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Track

Page 55: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Track

Page 56: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Page 57: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Description of the data contained in the table

Page 58: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Filtering

Page 59: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Filtering

Page 60: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Page 61: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Page 62: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Page 63: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Allows intersecting the selected track with another track.
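To illustrate what intersecting two tracks means, here is a minimal Python sketch that reports the overlapping portion of every pair of intervals on the same chromosome. It is not the Table Browser's implementation. The function name `intersect` and the coordinates are hypothetical, and intervals are assumed to be 0-based and half-open, as in BED.

```python
def intersect(track_a, track_b):
    """Return overlaps between two lists of (chrom, start, end) intervals."""
    by_chrom = {}
    for chrom, start, end in track_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    overlaps = []
    for chrom, a_start, a_end in track_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # half-open: touching intervals do not overlap
                overlaps.append((chrom, start, end))
    return overlaps


# Illustrative coordinates only.
genes = [("chr7", 100, 200), ("chr7", 300, 400)]
peaks = [("chr7", 150, 350), ("chr8", 100, 200)]
shared = intersect(genes, peaks)
print(shared)
```

This pairwise comparison is quadratic per chromosome, which is fine for small tracks. Sorted sweeps or interval trees are the usual choice for genome-scale data.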

Page 64: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Select output format

Page 65: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Select which features to save into the BED file

Page 66: Integration and visualization of genome-wide data

Extracting information with the Table Browser

Page 67: Integration and visualization of genome-wide data

Ensembl http://www.ensembl.org

71 species in the Ensembl database

Includes automatic annotations of eukaryotes made by Ensembl

Page 68: Integration and visualization of genome-wide data

EnsemblGenomes http://www.ensemblgenomes.org/

Extension to standard Ensembl:
• Ensembl Bacteria
• Ensembl Fungi
• Ensembl Metazoa
• Ensembl Plants
• Ensembl Protists

Page 69: Integration and visualization of genome-wide data

Accessing data in Ensembl

Page 70: Integration and visualization of genome-wide data

Search Ensembl data

Page 71: Integration and visualization of genome-wide data

Search Ensembl data

Page 72: Integration and visualization of genome-wide data

Search Ensembl data

Page 73: Integration and visualization of genome-wide data

Search Ensembl data

Report on transcripts encoded by gene ENSG00000012048

Page 74: Integration and visualization of genome-wide data

Search Ensembl data

Visualization of the region on a genome browser

Page 75: Integration and visualization of genome-wide data

Search Ensembl data

Variations in ENSG00000012048

Page 76: Integration and visualization of genome-wide data

Search Ensembl data

Page 77: Integration and visualization of genome-wide data

Ensembl genome browser layout

Chromosome view

Region view

Page 78: Integration and visualization of genome-wide data

Ensembl genome browser layout

Chromosome view

Region view

Clicking on a gene name shows more information and links

Page 79: Integration and visualization of genome-wide data

Ensembl genome browser layout

Tracks

Page 80: Integration and visualization of genome-wide data

Configuring displayed tracks

Clicking on the “Configure this page” button lets you configure which tracks are shown

Page 81: Integration and visualization of genome-wide data

Configuring displayed tracks

Click on a leaf on the tree to select a group

Page 82: Integration and visualization of genome-wide data

Configuring displayed tracks

Configure the track by clicking on the tick box

Page 83: Integration and visualization of genome-wide data

Configuring displayed tracks

Save and close

Page 84: Integration and visualization of genome-wide data

Configuring displayed tracks

A new track has been added to the view

Page 85: Integration and visualization of genome-wide data

Phylogenetic trees

Page 86: Integration and visualization of genome-wide data

Phylogenetic trees

Page 87: Integration and visualization of genome-wide data

NCBI Map Viewer

NCBI is also the source of data for Ensembl and UCSC.

Genomes available:
• 28 Vertebrates
• 17 Invertebrates
• 19 Protozoa
• 118 Plants

http://www.ncbi.nlm.nih.gov/mapview/

Page 88: Integration and visualization of genome-wide data

NCBI Map Viewer layout

Chromosome selection

Chromosome ideogram

Unigene clusters

Zoom

“Navigation” ideogram

Page 89: Integration and visualization of genome-wide data
Page 90: Integration and visualization of genome-wide data

Configuring the view

Page 91: Integration and visualization of genome-wide data

Configuring the view

Page 92: Integration and visualization of genome-wide data

Configuring the view

Page 93: Integration and visualization of genome-wide data

Stand-alone Genome Browsers

Integrative Genomics Viewer

- Stand-alone (Java)
- Easily configurable
- Useful when you want to work on your own data
- No need to upload the data to public servers
- Memory is limited to that of the Java instance; memory-hungry if many tracks are loaded

http://www.broadinstitute.org/igv/

Page 94: Integration and visualization of genome-wide data

Integrative Genomics Viewer

Chromosome location

Conservation data

1000 genomes data

Alignments

Coverage

Gene annotations

dbSNP

More tracks can be loaded (expression, methylation, GC content, …)