Page 1: Webinar about JASPAR BioPython module and MANTA.

www.cmmt.ubc.ca

JASPAR BioPython & MANTA

Anthony Mathelier, David Arenillas & Wyeth Wasserman

[email protected] & [email protected]

Wasserman Lab

Page 2

Outline

● JASPAR BioPython module
 – What is JASPAR?
 – How to construct matrices from JASPAR files using the JASPAR BioPython module.

● MANTA
 – What is stored in MANTA?
 – How to interrogate the MANTA DB using Python and our web application.

Page 3

http://jaspar.genereg.net

Mathelier et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014 PMID 24194598

Page 4

Modelling Transcription Factor Binding Sites (TFBS)

A [ 1  0 19 20 18  1 20  7 ]
C [ 1  0  1  0  1 18  0  2 ]
G [17  0  0  0  1  0  0  3 ]
T [ 1 20  0  0  0  1  0  8 ]

Example: FOXD1 PFM – Position Frequency Matrix

[Figure: sequence logo of the FOXD1 PFM]

gctaaGTAACAATgcgcacttaaGTAAACATcgctcccaatGTAAACAAacggagaaagGTAAACAAtgggc GTAAACATgtactcttgtGTAAACAAaaagccttaaGTAAACACgtccgcttatGTCAACAGtgggt tGTAAACATtgcat GTAAACAAtgcgacttagGTAAACATtttcgTTAAGTAAaca caaaATAAACAAcgtgcgctaaCTAAACAGagagagtgttGTAAACATtggaa taatGTAAACAAtgcgggaaagGTAAACATaagaacctaaGTAAACACaacgccctaaGTAAACATtcttatGTAAACAGaggtc

Known binding sites
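The PFM above is a per-position count over the aligned known sites. The minimal pure-Python sketch below (an illustration, not code from the webinar) counts the uppercase core of each of the 20 FOXD1 sites listed above, ignoring the lowercase flanks, and reproduces the matrix shown. The helper name `build_pfm` is hypothetical.

```python
# Uppercase cores of the 20 known FOXD1 binding sites listed above,
# in the order they appear; lowercase flanking bases are ignored.
SITES = [
    "GTAACAAT", "GTAAACAT", "GTAAACAA", "GTAAACAA", "GTAAACAT",
    "GTAAACAA", "GTAAACAC", "GTCAACAG", "GTAAACAT", "GTAAACAA",
    "GTAAACAT", "TTAAGTAA", "ATAAACAA", "CTAAACAG", "GTAAACAT",
    "GTAAACAA", "GTAAACAT", "GTAAACAC", "GTAAACAT", "GTAAACAG",
]


def build_pfm(sites, alphabet="ACGT"):
    """Return {base: [count at position 0, 1, ...]} for aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in alphabet}
    for site in sites:
        for position, base in enumerate(site.upper()):
            pfm[base][position] += 1
    return pfm


pfm = build_pfm(SITES)
for base in "ACGT":
    print(base, pfm[base])
```

Every column sums to 20, the number of sites, which is what makes a PFM directly comparable across positions.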

Page 5

Scoring putative TFBS sequences

PFM – Position Frequency Matrix:

A [ 1  0 19 20 18  1 20  7 ]
C [ 1  0  1  0  1 18  0  2 ]
G [17  0  0  0  1  0  0  3 ]
T [ 1 20  0  0  0  1  0  8 ]

The PFM is converted into a PWM – Position Weight Matrix (aka PSSM – Position Specific Scoring Matrix):

A [-1.5 -2.5  1.7  1.8  1.6 -1.5  1.8  0.4 ]
C [-1.5 -2.5 -1.5 -2.5 -1.5  1.6 -2.5 -1.0 ]
G [ 1.6 -2.5 -2.5 -2.5 -1.5 -2.5 -2.5 -0.6 ]
T [-1.5  1.8 -2.5 -2.5 -2.5 -1.5 -2.5  0.6 ]

To score the sequence A C G A G T T A A A C A A G C T A, the PWM is slid along it and, at each offset, the scores at each position are summed. For the window T T A A A C A A:

Score = 9.2
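The conversion and scoring can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JASPAR or Biopython implementation: it assumes a uniform background (0.25 per base) and a pseudocount of sqrt(N) per column shared equally among the four bases (N = number of sites). With those choices the log2-odds weights round to the PWM shown, and the best window in the example sequence scores 9.2. The helper names `pfm_to_pwm`, `score` and `scan` are hypothetical.

```python
import math

# FOXD1 position frequency matrix from the slide (rows A, C, G, T).
PFM = {
    "A": [1, 0, 19, 20, 18, 1, 20, 7],
    "C": [1, 0, 1, 0, 1, 18, 0, 2],
    "G": [17, 0, 0, 0, 1, 0, 0, 3],
    "T": [1, 20, 0, 0, 0, 1, 0, 8],
}


def pfm_to_pwm(pfm):
    """Convert counts to log2-odds weights.

    Assumptions: uniform background (0.25 per base) and a pseudocount of
    sqrt(N) per column, split equally (sqrt(N)/4 per base).
    """
    bases = "ACGT"
    pwm = {b: [] for b in bases}
    for i in range(len(pfm["A"])):
        n = sum(pfm[b][i] for b in bases)  # number of sites in this column
        pseudo = math.sqrt(n)
        for b in bases:
            p = (pfm[b][i] + 0.25 * pseudo) / (n + pseudo)
            pwm[b].append(math.log2(p / 0.25))
    return pwm


def score(pwm, site):
    """Sum the per-position weights for one site (forward strand only)."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def scan(pwm, seq):
    """Score every motif-length window; return (start, window, score) tuples."""
    width = len(pwm["A"])
    return [(i, seq[i:i + width], score(pwm, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]


pwm = pfm_to_pwm(PFM)
hits = scan(pwm, "ACGAGTTAAACAAGCTA")
best = max(hits, key=lambda hit: hit[2])
print(best[0], best[1], round(best[2], 1))  # 5 TTAAACAA 9.2
```

Only the forward strand is scanned here. In Biopython the analogous steps are `motif.counts.normalize(pseudocounts=...)` followed by `.log_odds()` on the result.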

Page 6

Overview of the JASPAR 2014 database

Page 7

JASPAR Biopython modules

➢ Bio.motifs.jaspar

➢ Read / write motifs encoded in the JASPAR flat file formats: sites, PFM and jaspar

➢ Bio.motifs.jaspar.db

➢ Search / fetch motifs from a JASPAR formatted database.

http://biopython.org*

*Cock et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. PMID: 19304878

These modules extend Biopython's Bio.motifs module to support the construction of TFBS matrices from JASPAR-supported formats.

Page 8

Constructing a matrix from a JASPAR sites formatted file

The JASPAR sites format consists of a list of known binding sites for a motif.
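A minimal sketch of reading a sites file with Biopython (assumed to be installed), using an in-memory file so it runs without downloads. Each record is a ">" header line followed by one sequence line; uppercase letters mark the binding site and lowercase letters are flanking sequence, which the reader discards. The sequences are the 20 FOXD1 sites listed earlier (the first three with their flanks, the rest as bare cores); the header text is illustrative only.

```python
from io import StringIO

from Bio import motifs

# The 20 FOXD1 sites: the first three keep their lowercase flanks, the rest
# are bare cores. Only the uppercase letters are counted.
SITE_SEQUENCES = [
    "gctaaGTAACAATgcgca", "cttaaGTAAACATcgctc", "ccaatGTAAACAAacgga",
    "GTAAACAA", "GTAAACAT", "GTAAACAA", "GTAAACAC", "GTCAACAG",
    "GTAAACAT", "GTAAACAA", "GTAAACAT", "TTAAGTAA", "ATAAACAA",
    "CTAAACAG", "GTAAACAT", "GTAAACAA", "GTAAACAT", "GTAAACAC",
    "GTAAACAT", "GTAAACAG",
]

# Build a sites-format file: one ">" header line and one sequence line per site.
sites_text = "".join(
    f">FOXD1 {i}\n{seq}\n" for i, seq in enumerate(SITE_SEQUENCES, start=1)
)

# read() expects exactly one motif; a sites file describes a single motif.
foxd1 = motifs.read(StringIO(sites_text), "sites")
print(foxd1.length)
print(foxd1.counts)
```

The resulting `foxd1.counts` should match the FOXD1 PFM shown on the earlier slide.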

Page 9

Constructing a matrix from a JASPAR pfm formatted file

The JASPAR pfm format simply describes a frequency matrix for a single motif.
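Reading the FOXD1 counts from the pfm format, again as a sketch that assumes Biopython is installed and uses an in-memory file: the file holds four rows of counts in A, C, G, T order and no header line.

```python
from io import StringIO

from Bio import motifs

# JASPAR pfm format: four rows of counts in A, C, G, T order, one column per
# motif position, no header line.
PFM_TEXT = """\
 1  0 19 20 18  1 20  7
 1  0  1  0  1 18  0  2
17  0  0  0  1  0  0  3
 1 20  0  0  0  1  0  8
"""

foxd1 = motifs.read(StringIO(PFM_TEXT), "pfm")
print(foxd1.length)     # motif width
print(foxd1.consensus)  # most frequent base at each position
print(foxd1.counts)
```

Note the consensus ends in T: in the last column T (8) narrowly beats A (7).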

Page 10

Constructing matrices from a JASPAR jaspar formatted file

The JASPAR jaspar format allows for multiple motifs. Each record consists of a header line followed by four lines defining the frequency matrix.

Note the use of the parse method, rather than the read method, to read multiple motifs.
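A sketch of parsing a multi-record jaspar file with Biopython (assumed installed). The IDs `EX0001.1` and `EX0002.1` are placeholders, not real JASPAR matrix IDs, and the second record is simply the reverse complement of the FOXD1 matrix, included only so the file holds more than one motif.

```python
from io import StringIO

from Bio import motifs

# Two records in JASPAR "jaspar" format: a header line (matrix ID, then name)
# followed by one labelled, bracketed row of counts per base.
JASPAR_TEXT = """\
>EX0001.1 FOXD1
A  [ 1  0 19 20 18  1 20  7 ]
C  [ 1  0  1  0  1 18  0  2 ]
G  [17  0  0  0  1  0  0  3 ]
T  [ 1 20  0  0  0  1  0  8 ]
>EX0002.1 FOXD1_rc
A  [ 8  0  1  0  0  0 20  1 ]
C  [ 3  0  0  1  0  0  0 17 ]
G  [ 2  0 18  1  0  1  0  1 ]
T  [ 7 20  1 18 20 19  0  1 ]
"""

# parse() returns every motif in the file; read() would reject two records.
records = list(motifs.parse(StringIO(JASPAR_TEXT), "jaspar"))
for motif in records:
    print(motif.matrix_id, motif.name, motif.length)
```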

Page 11

Constructing matrices from a JASPAR jaspar formatted file cont'd

The frequency portions of the file can be specified in a simpler format identical to the pfm format.
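To illustrate, the sketch below (assuming Biopython is installed; the ID `EX0001.1` is a placeholder) parses the same FOXD1 record written both ways, with labelled bracketed rows and with bare pfm-style rows read in A, C, G, T order, and checks that the counts agree.

```python
from io import StringIO

from Bio import motifs

LONG_FORM = """\
>EX0001.1 FOXD1
A  [ 1  0 19 20 18  1 20  7 ]
C  [ 1  0  1  0  1 18  0  2 ]
G  [17  0  0  0  1  0  0  3 ]
T  [ 1 20  0  0  0  1  0  8 ]
"""

# Same record, but the matrix rows use the simpler pfm layout.
SHORT_FORM = """\
>EX0001.1 FOXD1
 1  0 19 20 18  1 20  7
 1  0  1  0  1 18  0  2
17  0  0  0  1  0  0  3
 1 20  0  0  0  1  0  8
"""

long_motif = motifs.read(StringIO(LONG_FORM), "jaspar")
short_motif = motifs.read(StringIO(SHORT_FORM), "jaspar")

# Both layouts should yield identical counts for every base.
same_counts = all(
    list(long_motif.counts[base]) == list(short_motif.counts[base])
    for base in "ACGT"
)
print(short_motif.matrix_id, short_motif.name, same_counts)
```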

Page 12

The JASPAR DB module

Connect to a JASPAR database:

Modelled after the Perl TFBS modules*.

Specifically, the Bio.motifs.jaspar.db.JASPAR5 BioPython class is modelled after the TFBS::DB::JASPAR5 Perl class.

Fetch a specific motif by its JASPAR ID:

* Lenhard et al. TFBS: Computational framework for transcription factor binding site analysis. Bioinformatics. 2002 PMID 12176838
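A hedged sketch of connecting to a JASPAR database and fetching one motif by ID. It assumes Biopython, the MySQLdb driver and a reachable MySQL-hosted JASPAR database; the environment variable names (`JASPAR_DB_HOST` and so on) are our own convention, and `MA0004` is only an example ID. The Biopython import sits inside `connect_jaspar` because `Bio.motifs.jaspar.db` needs MySQLdb when it is imported, and the main block only runs when connection details are configured.

```python
import os


def connect_jaspar():
    """Open a JASPAR5 connection from (hypothetical) environment variables."""
    # Imported lazily: Bio.motifs.jaspar.db needs the MySQLdb driver at import time.
    from Bio.motifs.jaspar.db import JASPAR5

    return JASPAR5(
        host=os.environ["JASPAR_DB_HOST"],
        name=os.environ["JASPAR_DB_NAME"],
        user=os.environ["JASPAR_DB_USER"],
        password=os.environ["JASPAR_DB_PASS"],
    )


def fetch_by_id(jdb, matrix_id):
    """Fetch a single motif by its JASPAR matrix ID.

    Without a version suffix (e.g. "MA0004" rather than "MA0004.1") the
    latest version of the matrix is expected to be returned.
    """
    return jdb.fetch_motif_by_id(matrix_id)


# Only talk to a database when connection details are configured.
if __name__ == "__main__" and "JASPAR_DB_HOST" in os.environ:
    print(fetch_by_id(connect_jaspar(), "MA0004"))
```

Keeping the query in a function that takes an open connection makes it easy to reuse one connection for many lookups.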

Page 13

JASPAR DB module cont'd

Fetch multiple motifs according to various attributes.

Example: fetch the motifs of all the vertebrate and insect transcription factors from the CORE JASPAR collection which are part of the Forkhead family and which have an information content of at least 12 bits:

Note that selection criteria (such as 'tax_group' and 'tf_family') that allow multiple values may be specified either as a single value or as a list of values.
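A sketch of the Forkhead query described above, using `JASPAR5.fetch_motifs` keyword arguments. It assumes Biopython with MySQLdb and a reachable JASPAR database; the environment variable names are our own convention. `tax_group` is given as a list and `tf_family` as a single value to show both forms.

```python
import os

# Selection criteria from the slide: CORE collection, vertebrate and insect
# TFs, Forkhead family, total information content of at least 12 bits.
# tax_group and tf_family accept a single value or a list of values.
FORKHEAD_CRITERIA = {
    "collection": "CORE",
    "tax_group": ["vertebrates", "insects"],
    "tf_family": "Forkhead",
    "min_ic": 12,
}


def fetch_forkhead_motifs(jdb):
    """Run the slide's query against an open JASPAR5 connection."""
    return jdb.fetch_motifs(**FORKHEAD_CRITERIA)


if __name__ == "__main__" and "JASPAR_DB_HOST" in os.environ:
    # Imported here: Bio.motifs.jaspar.db needs the MySQLdb driver.
    from Bio.motifs.jaspar.db import JASPAR5

    jdb = JASPAR5(
        host=os.environ["JASPAR_DB_HOST"],
        name=os.environ["JASPAR_DB_NAME"],
        user=os.environ["JASPAR_DB_USER"],
        password=os.environ["JASPAR_DB_PASS"],
    )
    for motif in fetch_forkhead_motifs(jdb):
        print(motif.matrix_id, motif.name)
```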

Page 14

For more information...

For an overview and examples of using these modules, please see the JASPAR sub-section under the “Reading motifs” section of the BioPython Tutorial and Cookbook: http://biopython.org/DIST/docs/tutorial/Tutorial.html

For more technical information see the Bio.motifs.jaspar section of the BioPython API docs: http://biopython.org/DIST/docs/api

Page 15

MANTA

MongoDB for Analysis of TFBS Alteration

Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biology. 2015. PMID 25903198

Page 16

MANTA

DB

...gctaaGTAACAATgcgca...

...cttaaGTAAACATcgctc...

...ccaatGTAAACAAacgga...

Adapted from Szalkowski and Schmid (2010). Briefings in Bioinformatics.

Page 17

MANTA Statistics

ChIP-seq experiments 477

Transcription factors 103

TFBSs 9,510,336

Unique bases covered 76,160,599 (~2.25% of the human genome)

Page 18

AMIA TBI&CRI March 19th-23rd, 2012

Variations may impact TF binding

[Figure: a TF recognizes its binding site and transcription is initiated; when the binding sequence is mutated, the TF fails to recognize the binding site and transcription fails to initiate. Diagram labels: TF, 5’ UTR, Exon.]

Binding sequence:
AGCTAGCTATATTTAAACAACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA

Mutated binding sequence:
AGCTAGCTATATTTAATCCACACTGTCTAGCATTGCCTGATAGATGAGCCGTCGCAGCTGGA

Page 19

[Figure: DNA sequence with a TFBS]

Assessing the impact of variations on TF binding

Page 20

[Figure: DNA sequence with an SNV]

Assessing the impact of variations on TF binding

Page 25

[Figure: DNA sequence with an SNV]

Record best TFBS hit with the mutated sequence

Assessing the impact of variations on TF binding
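The procedure on these slides (mutate each TFBS position to each alternative allele, then record the best TFBS hit in the mutated sequence) can be sketched as follows. The slides do not define MANTA's exact impact score, so this sketch assumes it is the ratio alt/ref of relative scores (raw PWM score rescaled to [0, 1] between the worst and best possible site), as the alt/ref axis of the density plot suggests; it also assumes the PWM construction from the scoring slide (uniform background, sqrt(N) pseudocount). The function names are hypothetical, and the sequence is the example from the scoring slide.

```python
import math

# FOXD1 counts from the earlier slides (rows A, C, G, T).
PFM = {
    "A": [1, 0, 19, 20, 18, 1, 20, 7],
    "C": [1, 0, 1, 0, 1, 18, 0, 2],
    "G": [17, 0, 0, 0, 1, 0, 0, 3],
    "T": [1, 20, 0, 0, 0, 1, 0, 8],
}
BASES = "ACGT"


def pfm_to_pwm(pfm):
    """log2-odds weights, uniform background, sqrt(N) pseudocount (assumed)."""
    pwm = {b: [] for b in BASES}
    for i in range(len(pfm["A"])):
        n = sum(pfm[b][i] for b in BASES)
        pseudo = math.sqrt(n)
        for b in BASES:
            p = (pfm[b][i] + 0.25 * pseudo) / (n + pseudo)
            pwm[b].append(math.log2(p / 0.25))
    return pwm


def relative_score(pwm, site):
    """Rescale a raw PWM score to [0, 1] between the worst and best possible site."""
    width = len(pwm["A"])
    lo = sum(min(pwm[b][i] for b in BASES) for i in range(width))
    hi = sum(max(pwm[b][i] for b in BASES) for i in range(width))
    raw = sum(pwm[base][i] for i, base in enumerate(site))
    return (raw - lo) / (hi - lo)


def snv_impacts(pwm, seq, tfbs_start):
    """Map (position, alt allele) -> alt/ref for every possible SNV in one TFBS.

    ref: relative score of the TFBS in the reference sequence.
    alt: best relative score among windows of the mutated sequence that
         overlap the SNV (the best TFBS hit with the mutated sequence).
    """
    width = len(pwm["A"])
    ref = relative_score(pwm, seq[tfbs_start:tfbs_start + width])
    impacts = {}
    for pos in range(tfbs_start, tfbs_start + width):
        for alt in BASES:
            if alt == seq[pos]:
                continue  # same as reference: not a variant
            mutated = seq[:pos] + alt + seq[pos + 1:]
            # Window starts whose window still covers the mutated position.
            starts = range(max(0, pos - width + 1), min(pos, len(seq) - width) + 1)
            best_alt = max(relative_score(pwm, mutated[s:s + width]) for s in starts)
            impacts[(pos, alt)] = best_alt / ref
    return impacts


pwm = pfm_to_pwm(PFM)
# TTAAACAA, the best FOXD1 hit in this sequence, starts at index 5.
impacts = snv_impacts(pwm, "ACGAGTTAAACAAGCTA", tfbs_start=5)
for (pos, alt), ratio in sorted(impacts.items()):
    print(pos, alt, round(ratio, 2))
```

Ratios below 1 indicate a mutation that weakens the predicted site; ratios above 1 (for example turning the leading T into the consensus G) indicate a stronger best hit.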

Page 26

[Figure: DNA sequence with a TFBS, and density plot of alt/ref score ratios; x-axis: alt/ref (0.80 to 1.10), y-axis: Density]

Assessing the impact of variations on TF binding

Page 27

[Figure: DNA sequence with an SNV, and density plot of alt/ref score ratios (x-axis: alt/ref, 0.80 to 1.10; y-axis: Density) with the label “Alternative”]

Assessing the impact of variations on TF binding

Page 28

Example of Application of MANTA

Mathelier et al. Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas. Genome Biology. 2015. PMID

Page 29

The MANTA Database

Implemented with MongoDB (http://www.mongodb.org)

Consists of 3 collections:

Experiments

- experiment name, type, TF name, JASPAR matrix ID, etc.

Peaks

- peak position (chromosome, start, end), score, position of maximum peak height, etc.

TFBSs / SNVs

- position (chromosome, start, end), strand and score of the unmutated TFBS, plus similar information and an impact score for each position / alternative allele mutation.
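One way to picture the three collections is as document shapes. The sketch below uses Python `TypedDict` classes; every field name is hypothetical (the slide lists the information stored, not MANTA's actual keys), and embedding the per-SNV results inside each TFBS document is an assumption about the layout.

```python
from typing import List, TypedDict


class Experiment(TypedDict):
    """One ChIP-seq experiment (hypothetical field names)."""
    name: str
    type: str
    tf_name: str
    jaspar_id: str       # JASPAR matrix used to predict TFBSs in its peaks


class Peak(TypedDict):
    """One ChIP-seq peak (hypothetical field names)."""
    chrom: str
    start: int
    end: int
    score: float
    summit: int          # position of maximum peak height


class Mutation(TypedDict):
    """Best hit after one possible SNV inside a TFBS (hypothetical)."""
    position: int        # mutated genomic position
    alt: str             # alternative allele
    start: int
    end: int
    strand: str
    score: float         # score of the best hit with the mutated sequence
    impact: float


class TFBS(TypedDict):
    """One predicted TFBS plus every possible SNV within it (hypothetical)."""
    chrom: str
    start: int
    end: int
    strand: str
    score: float         # score of the unmutated TFBS
    mutations: List[Mutation]
```

Embedding the mutations means a single document lookup returns a site together with all of its precomputed impacts, which suits MongoDB's document model.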

Page 30

MANTA DB with Python

Example: connect to the MANTA DB and fetch all TFBSs affected by an SNV at position 6425005 on chromosome 19.
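A hedged sketch of such a query with PyMongo. The database name (`manta`), collection name (`tfbs_snvs`), field names, chromosome naming (`chr19`) and inclusive coordinates are all assumptions, not MANTA's documented schema; the `MANTA_MONGO_URI` environment variable is our own convention. The filter matches TFBSs whose interval contains the SNV position.

```python
import os


def tfbs_at_query(chrom, position):
    """MongoDB filter matching TFBSs whose [start, end] interval contains position.

    Field names and coordinate conventions are assumptions.
    """
    return {
        "chrom": chrom,
        "start": {"$lte": position},
        "end": {"$gte": position},
    }


def fetch_affected_tfbs(uri, chrom, position, db_name="manta", collection="tfbs_snvs"):
    """Return all TFBS documents overlapping an SNV position (hypothetical names)."""
    from pymongo import MongoClient  # imported lazily so the query helper needs no driver

    with MongoClient(uri) as client:
        return list(client[db_name][collection].find(tfbs_at_query(chrom, position)))


# Only query a server when a connection URI is configured.
if __name__ == "__main__" and "MANTA_MONGO_URI" in os.environ:
    for doc in fetch_affected_tfbs(os.environ["MANTA_MONGO_URI"], "chr19", 6425005):
        print(doc)
```

Separating the filter from the connection keeps the query logic testable without a running MongoDB server.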

Page 31

MANTA Web Interface

URL: http://manta.cmmt.ubc.ca/manta

Source code: https://github.com/wassermanlab/MANTA

Page 32

Page 33

Page 34

Thanks!

Any questions?

Contacts:
Anthony Mathelier, [email protected]
David Arenillas, [email protected]

URLs:
Wasserman Lab: www.cisreg.ca
BioPython: http://biopython.org
MANTA: manta.cmmt.ubc.ca/manta