Data Mining in Ensembl with BioMart

Jan 03, 2016


Ashlynn McCoy
Transcript

Simple Text-based Search Engine

‘Mouse Gene’ Gives Us Results

A More Complex Query is Not as Useful

BioMart: Data Mining

• BioMart is a search engine that can retrieve several kinds of information at once and return them as a table.

• For example: human gene IDs, with chromosome and base pair position

• No programming required!

General or Specific Data Tables

• All the genes for one species

• Or… only genes on one specific region of a chromosome

• Or… genes on one region of a chromosome associated with a disease

BioMart Data Sets

• Ensembl genes

• Vega genes

• SNPs

• Markers

• Phenotypes

• Gene expression information

• Gene ontology

• Homology predictions

• Protein annotation

Web Interface

With BioMart, quickly extract gene-associated information from the Ensembl databases.

Information Flow

• Choose the species of interest (Dataset)

• Decide what you would like to know about the genes (Attributes): sequences, IDs, descriptions…

• Narrow down to a smaller gene set using Filters: enter IDs, choose a region…
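
The three stages can be pictured as a small data structure. The sketch below is only an illustration: the `MartQuery` class and its field names are hypothetical and are not part of BioMart itself, which needs no programming. It reuses the earlier example of human gene IDs with chromosome and base pair position.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MartQuery:
    """Hypothetical container mirroring Dataset, Attributes and Filters."""
    dataset: str                                           # species of interest
    attributes: List[str] = field(default_factory=list)    # what we want to know
    filters: Dict[str, str] = field(default_factory=dict)  # what we already know

    def describe(self) -> str:
        wanted = ", ".join(self.attributes)
        known = " AND ".join(f"{k} = {v}" for k, v in self.filters.items())
        # With no filters the query covers every gene in the dataset.
        scope = f"genes where {known}" if known else "all genes"
        return f"{self.dataset}: show [{wanted}] for {scope}"


query = MartQuery(
    dataset="Homo sapiens genes",
    attributes=["gene ID", "chromosome", "base pair position"],
)
print(query.describe())
```

Leaving `filters` empty corresponds to asking for all the genes of one species; adding entries narrows the gene set.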

Web Interface

Three main stages: Dataset, Attributes and Filters.

Choose the species of interest.

Choose what information to view.

Choose the gene set using what we know.

The First Step: Choose the Dataset

Homo sapiens genes are the default.

The Second Step: Attributes

Attributes are what we want to know about the genes.

Four output pages.

The SNP Attribute Page

Output variation information such as SNP reference ID and alleles.

Filters Allow Gene Selection

Choose the gene set by region, gene ID(s), or protein/domain type.

Export Sequence or Tables

Genes and attributes are exported as sequences (FASTA format) or as tables.
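
For readers unfamiliar with FASTA, a record is a header line starting with `>` followed by the sequence wrapped over several lines. The sketch below shows that layout only; the `to_fasta` helper, the identifier and the sequence are placeholders made up for illustration, and BioMart's own header fields may differ.

```python
def to_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(lines) + "\n"


# Placeholder identifier and sequence, not real Ensembl data.
record = to_fasta("example_gene", "ATGC" * 20)
print(record, end="")
```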

Query:

• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI.

• In the query:

Attributes: what we want to know.

Filters: what we know


A Brief Example

Change the dataset to mouse (Mus musculus).

A Brief Example

Dataset has changed.

Attributes (Output Options)

Click Attributes.

Attributes allow us to choose what we wish to know.

IDs are found in the ‘Features’ page.

Click on ‘GENE’.

Attributes (Output Options)

Default options selected: Ensembl Gene ID and Transcript ID.

Ensembl Gene ID is selected.

Attributes (Output Options)

Scroll down to select MGI symbol. Also select the accession number.

‘Markersymbol ID’ will give us the MGI ID.

The Results Table

‘Results’ gives us gene IDs for all mouse genes in the Ensembl database.

Select a Smaller Gene Set

Select ‘Filters’

Expand the REGION panel

Instead of all mouse genes, select protein coding genes on chromosome 10.

Select Genes on Chromosome 10

Select chromosome 10.


Select Protein Coding Genes

Filters are set to chromosome 10 and protein-coding genes. Genes must meet BOTH criteria to be in the result table.

Gene type: protein coding
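
The "BOTH criteria" rule is a logical AND across filters. A minimal sketch of that behaviour follows; the gene records, their IDs and the `passes_all` helper are hypothetical and exist only to show how the filters combine.

```python
# Hypothetical gene records; the IDs are placeholders, not Ensembl identifiers.
genes = [
    {"id": "gene_A", "chromosome": "10", "gene_type": "protein coding"},
    {"id": "gene_B", "chromosome": "10", "gene_type": "pseudogene"},
    {"id": "gene_C", "chromosome": "2", "gene_type": "protein coding"},
]

filters = {"chromosome": "10", "gene_type": "protein coding"}


def passes_all(gene, filters):
    """Return True only if the gene matches every filter (logical AND)."""
    return all(gene.get(name) == value for name, value in filters.items())


selected = [gene["id"] for gene in genes if passes_all(gene, filters)]
print(selected)  # ['gene_A']
```

Only `gene_A` survives: `gene_B` is on chromosome 10 but is not protein coding, and `gene_C` is protein coding but on another chromosome.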

Results (Preview)

This is a preview. If you are happy with the table, click ‘Go’ for the full result table.

Full Result Table

Columns: Ensembl Gene ID, Transcript ID, MGI symbol, MGI Accession Number
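
Once exported as a table, the result can be read with any tab-separated parser. The sketch below assumes a tab-separated export with a header row matching the four columns above; the row values are placeholders, not real identifiers.

```python
import csv
import io

# Placeholder export: two transcripts belonging to the same gene.
export = (
    "Ensembl Gene ID\tTranscript ID\tMGI symbol\tMGI Accession Number\n"
    "GENE_ID_1\tTRANSCRIPT_ID_1\tsymbol_1\tMGI_ACC_1\n"
    "GENE_ID_1\tTRANSCRIPT_ID_2\tsymbol_1\tMGI_ACC_1\n"
)

rows = list(csv.DictReader(io.StringIO(export), delimiter="\t"))
gene_ids = {row["Ensembl Gene ID"] for row in rows}

print(len(rows), "rows,", len(gene_ids), "distinct gene(s)")
```

Because Transcript ID is one of the columns, a gene with several transcripts appears on several rows, so counting rows is not the same as counting genes.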

Original Query:

• For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI.

• In the query:

Attributes: columns in the Result Table

Filters: what we know
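
For those who do want to script it, the same query can be written in BioMart's XML query format, where a `Query` holds a `Dataset` with `Filter` and `Attribute` elements. The sketch below only builds the XML text and sends nothing over the network. The internal names used here (`mmusculus_gene_ensembl`, `chromosome_name`, `biotype`, `ensembl_gene_id`, `ensembl_transcript_id`, `mgi_symbol`, `mgi_id`) are assumptions that vary between releases and should be checked against the server in use; `build_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters):
    """Build BioMart-style query XML: filters = what we know, attributes = columns."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated table output
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed internal names; verify them on the BioMart server before use.
xml_text = build_query(
    dataset="mmusculus_gene_ensembl",
    attributes=["ensembl_gene_id", "ensembl_transcript_id", "mgi_symbol", "mgi_id"],
    filters={"chromosome_name": "10", "biotype": "protein_coding"},
)
print(xml_text)
```

The two `Filter` elements carry what we know (chromosome 10, protein coding) and the four `Attribute` elements become the columns of the result table.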

Other Export Options (Attributes)

• Sequences: UTRs, flanking sequences, cDNA and peptides, etc.

• Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)

• Microarray data

• Protein functions/descriptions (InterPro, GO)

• Orthologous gene sets

• SNP/Variation data

Central Server

www.biomart.org

WormBase

HapMap

Population frequencies

Inter-population comparisons

Gene annotation

DictyBase

UniProt, MSD

GRAMENE

Rice, Maize, Arabidopsis genomes…

How to Get There

• Either www.biomart.org/biomart/martview

• Or click on ‘BioMart’ from Ensembl

Q&A

Thanks: Arek Kasprzyk, Benoît Ballester, Syed Haider, Richard Holland, Damian Smedley