The AMADEUS Motif Discovery Platform. C. Linhart, Y. Halperin, R. Shamir (Tel-Aviv University). ApoSys workshop, May ’08; Genome Research 2008.

Feb 11, 2016

Page 1: The AMADEUS Motif Discovery Platform

C. Linhart, Y. Halperin, R. Shamir, Tel-Aviv University

ApoSys workshop, May ’08; Genome Research 2008

Page 2: Promoter Analysis: Extremely brief intro

• Transcription is regulated primarily by transcription factors (TFs): proteins that bind to DNA subsequences called binding sites (BSs)

• TFBSs are located mainly (but not always!) in the gene’s promoter: the DNA sequence upstream of the gene’s transcription start site (TSS)

• TFs can promote or repress transcription

[Figure: two TFs bound to BSs in the promoter, upstream of the gene’s TSS (5’ to 3’)]

Page 3: Promoter Analysis (cont.): TFBS models

• The BSs of a particular TF share a common pattern, or motif, which is often modeled using:
  – A consensus string, e.g., TASDAC (S = {C,G}, D = {A,G,T})
  – A position weight matrix (PWM / PSSM), e.g.:

Pos   1    2    3    4    5    6
A     0.1  0.8  0    0.7  0.2  0
C     0    0.1  0.5  0.1  0.4  0.6
G     0    0    0.5  0.1  0.4  0.1
T     0.9  0.1  0    0.1  0    0.3

Strings whose probability under the PWM (the product of the matching entries) exceeds the threshold 0.01:

TACACC (0.06), TAGAGC (0.06), TACAAT (0.015), …

Page 4: Promoter Analysis (cont.): Typical pipeline

Co-regulated gene set, obtained from one of:
• Gene expression microarrays → Clustering → Cluster I, Cluster II, Cluster III
• Location analysis (ChIP-chip, …)
• Functional group (e.g., GO term)

Co-regulated gene set → Promoter sequences → Motif discovery

Page 5: Promoter Analysis (cont.): Goals

Reverse-engineer the transcriptional regulatory network, i.e., find the TFs (and their BSs) that regulate the studied biological process.

Input: A set of co-expressed genes

Output: “Interesting” motif(s):

1. Known motifs: PRIMA, ROVER, …
2. Novel motifs: MEME, AlignACE, …
3. A group of co-occurring motifs, i.e., a cis-regulatory module (CRM): MITRA, CREME, …

AMADEUS

Page 6: Promoter Analysis: Status of motif discovery tools

• Extant tools perform reasonably well for:
  – Finding known/novel motifs in organisms with short, simple promoters, e.g., yeast
  – Identifying some of the known motifs in complex species, e.g., TFs whose BSs are usually close to the TSS
• … but often fail in other cases!
• Each tool is custom-built for a specific target score, which is often parametric (i.e., assumes a background (BG) model) or uses only a small part of the genome as the BG reference
• The majority of tools can efficiently handle only dozens of genes
• Comparison of tools: [Tompa et al. ’05]

Page 7: AMADEUS: A Motif Algorithm for Detecting Enrichment in mUltiple Species

• Research platform:
  – Extensible: add new algorithms, scores, and motif models
  – Flexible: control the parameters, algorithms, and scores of each execution
• Experimental tool:
  – Sensitive: finds subtle signals
  – Efficient: analyzes many long sequences
  – Informative: shows lots of information on each motif
  – User-friendly: nice GUI

Page 8: Main features: I/O

Input:
• Type: target set / expression data
• Multiple species / target sets
• Sequence region (promoter, 1st intron, 3’ UTR, …)

Output:
• Non-redundant set of motifs
• Rich information per output motif:
  1. Graphical motif logo
  2. Multiple scores & combined p-value
  3. Similarity to known TFBS models
  4. List of target genes
  5. BS localization graph
  6. Targets’ mean expression graph

Page 9: Main features: algorithm

Algorithm: multiple refinement phases:
• Each phase receives the best candidates of the previous phase and refines them (e.g., uses a more complex motif model)
• The first phases are simple and fast (e.g., try all k-mers); the last phases are more complex (e.g., optimize a PWM using EM)
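The phase structure can be illustrated with a toy two-phase pipeline. This is a hypothetical sketch, not the AMADEUS implementation:

- Phase 1 tries every k-mer and ranks it by a crude ratio of occurrence in target versus background sequences.
- Phase 2 turns each surviving k-mer into a PWM by counting its near-matches (at most one mismatch) in the targets.

A real last phase would instead optimize the PWM, e.g., with EM, as the slide describes. The function names, the scoring ratio and the pseudocount are all assumptions.

```python
from collections import Counter
from itertools import product


def phase1_kmers(targets: list[str], background: list[str], k: int, top: int) -> list[str]:
    """Fast phase: score every k-mer by how much more often it occurs in target
    sequences than in background sequences (forward strand only, for brevity)."""
    scored = []
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        in_targets = sum(kmer in seq for seq in targets) / len(targets)
        in_background = sum(kmer in seq for seq in background) / len(background)
        # Small constant avoids division by zero for k-mers absent from the background.
        scored.append(((in_targets + 1e-3) / (in_background + 1e-3), kmer))
    scored.sort(reverse=True)
    return [kmer for _, kmer in scored[:top]]


def phase2_pwm(targets: list[str], kmer: str, pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Refinement phase: build a PWM from all target windows within one mismatch of `kmer`."""
    k = len(kmer)
    counts = [Counter() for _ in range(k)]
    for seq in targets:
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            if sum(a != b for a, b in zip(window, kmer)) <= 1:
                for pos, base in enumerate(window):
                    counts[pos][base] += 1
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: (column[base] + pseudocount) / total for base in "ACGT"})
    return pwm


def find_motifs(targets: list[str], background: list[str], k: int = 6,
                top: int = 3) -> list[tuple[str, list[dict[str, float]]]]:
    """Each phase receives only the best candidates of the previous phase."""
    candidates = phase1_kmers(targets, background, k, top)
    return [(kmer, phase2_pwm(targets, kmer)) for kmer in candidates]
```

Phase 1 is exhaustive but cheap per candidate. Phase 2 does more work, but only on the few candidates that survive, which mirrors the cost split described on the slide.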

Page 10: Main features: scores

Motif scores:
• The user selects which scores to use, a subset of:
  – Target-set: over/under-representation:
    1. Hypergeometric
    2. GC-content + length binned binomial
  – Expression:
    1. Enrichment of ranked expression (multiple conditions)
  – Global/spatial (not yet in the public version):
    1. Localization
    2. Strand bias
    3. Chromosomal preference
• Scores are combined into a single p-value
• The scores do not assume specific models for the distribution of BSs and/or expression values
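Two of the ingredients above can be sketched in a few lines.

- Hypergeometric score: how surprising is it that x of the n target genes carry the motif, when K of all N genes do?
- Combining scores: the slide does not say how AMADEUS merges its scores into one p-value. Fisher's method, used below, is only one standard option, and it assumes the individual p-values are independent.

The function names are hypothetical.

```python
import math


def hypergeom_tail(N: int, K: int, n: int, x: int) -> float:
    """P(X >= x): chance that n genes drawn at random from N (K of which carry
    the motif) include at least x motif-carrying genes."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(x, upper + 1))
    return hits / math.comb(N, n)


def fisher_combine(pvalues: list[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square distribution with
    2m degrees of freedom under the null. For even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{i<m} (X/2)^i / i!.
    All p-values must be > 0."""
    m = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    return math.exp(-half_x) * sum(half_x**i / math.factorial(i) for i in range(m))


if __name__ == "__main__":
    # Hypothetical numbers: 12 of 50 targets carry a motif found in 1,000 of 20,000 promoters.
    p_target = hypergeom_tail(N=20000, K=1000, n=50, x=12)
    print(p_target, fisher_combine([p_target, 0.2]))
```

Computing the tail exactly with integer binomial coefficients avoids any distributional approximation, which is in the spirit of the last bullet on the slide.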

Page 11: Main features: misc.

GUI:
• Control all parameters
• Save/load parameters from file
• Save textual + graphical output to file
• TFBS viewer

Other:
• Ignore redundant sequences (with identical subsequences)
• Applicable to multiple genome-scale promoter sequences
• Bootstrapping: empirical p-value estimation using random target sets / shuffled data
• Execution modes: GUI, batch
• Interoperability: Java application
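The bootstrapping idea compares a motif's score on the real target set with its scores on random target sets of the same size. It can be sketched as follows. This is a generic empirical p-value with the usual +1 correction, not the exact AMADEUS procedure. `empirical_pvalue` and `score_fn` are hypothetical names, and any set-level score could be plugged in.

```python
import random
from collections.abc import Callable, Sequence


def empirical_pvalue(
    observed: float,
    score_fn: Callable[[list], float],
    universe: Sequence,
    set_size: int,
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of random target sets (drawn without replacement from `universe`)
    that score at least as high as the real set, with a +1 correction so the
    estimate is never exactly zero."""
    rng = random.Random(seed)
    at_least_as_high = 0
    for _ in range(n_random):
        random_set = rng.sample(universe, set_size)
        if score_fn(random_set) >= observed:
            at_least_as_high += 1
    return (at_least_as_high + 1) / (n_random + 1)


if __name__ == "__main__":
    genes = list(range(1000))
    motif_genes = set(range(100))  # hypothetical: genes whose promoter has the motif

    def motif_count(gene_set: list) -> int:
        return sum(g in motif_genes for g in gene_set)

    target_set = list(range(0, 40)) + list(range(500, 510))
    print(empirical_pvalue(motif_count(target_set), motif_count, genes, len(target_set)))
```

The smallest reachable value is 1 / (n_random + 1). Very strong signals therefore need more random sets, which is one reason the efficiency of the underlying score matters.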

Page 12: Case study: G2 & G2/M phases of the human cell cycle [Whitfield et al. ’02]

Motifs: CHR (not in TRANSFAC), NF-Y

Module: the CHR and NF-Y motifs co-occur (the module was reported in [Linhart et al., ’05], [Tabach et al. ’05])

Page 13: Benchmark I: Yeast TF target sets [Harbison et al. ’04]

Source: ChIP-chip [Harbison et al., ’04]
Data: target sets of 83 TFs with known BS motifs
Average set size: 58 genes (= 35 Kbps)
Success rates (for the top 2 motifs of lengths 8 & 10): [chart not included in this transcript]

Page 14: Performance on metazoan datasets

Results on 42 target sets:
• Collected from 29 publications
• Based on high-throughput experiments
• Species: human, mouse, fly, worm
• Sets: 26 TFs, 8 microRNAs
• All have known motifs

Page 15: Global Analysis I: Localized human + mouse motifs

Input:
• All human & mouse promoters (2 × ~20,000)
• Region: -500…100 (w.r.t. TSS)
• Total sequence length: ~26 Mbps
• [No target set / expression data]
• Score: localization

Results:
• Recovered known TFs: Sp1, NF-Y, GABP, TATA, Nrf-1, ATF/CREB, Myc, RFX1
• Recovered the splice donor site
• Identified several novel motifs
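The slide does not define the localization score. The sketch below shows one simple way, purely as an assumption, to measure whether a motif's occurrences cluster at a fixed distance from the TSS:

1. Split the analyzed region (here -500…100) into equal windows.
2. Take the window with the most occurrences.
3. Compute a binomial tail p-value against a uniform spread, Bonferroni-corrected for the number of windows.

The window width and the function names are hypothetical.

```python
import math
from typing import Optional


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)


def localization_pvalue(positions: list[int], start: int = -500, end: int = 100,
                        window: int = 50) -> tuple[float, Optional[int]]:
    """Most enriched window of motif positions (relative to the TSS) and its
    Bonferroni-corrected binomial p-value under a uniform null.
    Assumes the region spans at least two windows."""
    n_bins = (end - start) // window
    counts = [0] * n_bins
    for pos in positions:
        if start <= pos < start + n_bins * window:
            counts[(pos - start) // window] += 1
    n = sum(counts)
    if n == 0:
        return 1.0, None
    best = max(range(n_bins), key=counts.__getitem__)
    p = binom_tail(n, counts[best], 1.0 / n_bins)
    return min(1.0, p * n_bins), start + best * window
```

For example, `localization_pvalue([-40, -30, -20, -10] * 5)` reports the window starting at -50 with a vanishingly small p-value. Positions spread one per window yield a p-value of 1.0.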

Page 16: Global Analysis II: Chromosomal preference

Input:
• All fly promoters (~14,000)
• Region: -1000…200 (w.r.t. TSS)
• Total sequence length: ~11 Mbps
• [No target set / expression data]
• Score: chromosomal preference

Results:
• DNA Replication Element Factor (DREF) on the X chromosome

Page 17: Global Analysis II: Chromosomal preference (cont.)

Input:
• All worm promoters (~18,000)
• Region: -500…100 (w.r.t. TSS)
• Total sequence length: 6.6 Mbps
• [No target set / expression data]
• Score: chromosomal preference

Results:
• Novel motif on chromosome IV
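The chromosomal preference score is not specified on these slides either. As an assumption, a plausible sketch tests each chromosome for an excess of motif-containing promoters using a hypergeometric tail, then Bonferroni-corrects for the number of chromosomes. The inputs are hypothetical:

- `gene_chrom` maps each gene to its chromosome.
- `motif_genes` holds the genes whose promoter contains the motif.

```python
import math
from collections import Counter


def hypergeom_tail(N: int, K: int, n: int, x: int) -> float:
    """P(X >= x) when n of N promoters are drawn at random and K of the N lie on the chromosome."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(x, upper + 1))
    return hits / math.comb(N, n)


def chromosomal_preference(gene_chrom: dict[str, str], motif_genes: set[str]) -> dict[str, float]:
    """Bonferroni-corrected p-value, per chromosome, for over-representation of
    motif-containing promoters on that chromosome."""
    N = len(gene_chrom)
    n = len(motif_genes)
    genes_per_chrom = Counter(gene_chrom.values())
    motif_per_chrom = Counter(gene_chrom[g] for g in motif_genes)
    n_tests = len(genes_per_chrom)
    return {
        chrom: min(1.0, hypergeom_tail(N, K, n, motif_per_chrom[chrom]) * n_tests)
        for chrom, K in genes_per_chrom.items()
    }
```

Like the Page 15 sketch, this needs no target set or expression data. Only the genome-wide promoter set and the chromosome of each gene are used, matching the "[No target set / expression data]" input on these slides.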

Page 18: Summary

• Developed the Amadeus motif discovery platform:
  – Easy to use
  – Feature-rich, informative
  – Sensitive & efficient
• Constructed a large, real-life, heterogeneous benchmark for testing motif finding tools
• Demonstrated various applications of motif discovery
• http://acgt.cs.tau.ac.il/amadeus

Page 19: Acknowledgements

Tel-Aviv University: Chaim Linhart, Yonit Halperin, Ron Shamir

The Hebrew University of Jerusalem: Gidi Weber