Introduction to Bioinformatics - Tutorial no. 5
MEME – Discovering motifs in sequences
MAST – Searching for motifs in databanks
TRANSFAC – The Transcription Factor DB
Dec 21, 2015
http://weblogo.berkeley.edu
WebLogo - Input
Aligned sequences (e.g. output of ClustalW)
RUN !
WebLogo - Output
Genes:
Proteins:
MEME
http://meme.sdsc.edu/
Motif discovery from unaligned sequences
Genomic or protein sequences
Identifies profile motifs
Multiple motifs for any input
Flexible model of motif presence
Motif can be absent in some sequences
Can appear several times in one sequence
MEME Input
Email address
Multiple input sequences
How many times in each sequence?
How many motifs?
How many sites?
Range of motif lengths
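The choices on the input form correspond roughly to command-line options of a locally installed MEME. The sketch below only builds the argument list and does not run anything. It assumes the MEME Suite `meme` program with its `-dna`, `-mod`, `-nmotifs`, `-w`, `-minw` and `-maxw` options; the function name and the file name `sequences.fasta` are hypothetical.

```python
def build_meme_command(fasta_path, model="zoops", n_motifs=1,
                       width=None, min_width=None, max_width=None):
    """Return an argument list for a local `meme` run (nothing is executed)."""
    # How many times in each sequence?
    #   oops  = exactly one occurrence per sequence
    #   zoops = zero or one occurrence per sequence
    #   anr   = any number of repetitions
    if model not in ("oops", "zoops", "anr"):
        raise ValueError("model must be 'oops', 'zoops' or 'anr'")

    cmd = ["meme", fasta_path, "-dna", "-mod", model,
           "-nmotifs", str(n_motifs)]          # How many motifs?

    # Range of motif lengths: either one fixed width or a min/max range.
    if width is not None:
        cmd += ["-w", str(width)]
    else:
        if min_width is not None:
            cmd += ["-minw", str(min_width)]
        if max_width is not None:
            cmd += ["-maxw", str(max_width)]
    return cmd


# Hypothetical input file; motif may be absent from some sequences.
command = build_meme_command("sequences.fasta", model="zoops",
                             n_motifs=1, min_width=6, max_width=12)
print(" ".join(command))
```

The returned list can be passed to `subprocess.run` on a machine where MEME is installed.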
MEME Output (1)
Motif length
Number of times
Like BLAST
“Position-Specific Probability Matrix”
= Motif Profile
Divergence of motif position from background
Most popular symbols
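The three quantities labelled on this output (the position-specific probability matrix, how far each position departs from background, and the most popular symbol per position) can be illustrated with a small sketch. The four sites below are invented toy data, the background is assumed uniform, and the function names are hypothetical. This is not MEME's own algorithm, only the arithmetic behind the displayed numbers.

```python
import math


def probability_matrix(sites, alphabet="ACGT"):
    """Column-wise symbol frequencies of equally long motif sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(width):
        column = [site[i] for site in sites]
        matrix.append({a: column.count(a) / len(sites) for a in alphabet})
    return matrix


def relative_entropy(column, background):
    """Information content (bits) of one column relative to background."""
    return sum(p * math.log2(p / background[a])
               for a, p in column.items() if p > 0)


def most_popular(matrix):
    """Most frequent symbol per column (first in alphabet order on ties)."""
    return "".join(max(col, key=col.get) for col in matrix)


# Toy sites, invented for illustration only.
sites = ["TGACGTCA", "TGACGTAA", "TGACGCAA", "TTACGTCA"]
background = {a: 0.25 for a in "ACGT"}   # assumed uniform background

pspm = probability_matrix(sites)
for position, column in enumerate(pspm, start=1):
    bits = relative_entropy(column, background)
    print(position, column, round(bits, 2))
print(most_popular(pspm))
```

A fully conserved DNA position reaches the maximum of 2 bits under a uniform background; a position split evenly between two bases gives 1 bit.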
MEME Output (2)
Sequence names
Reverse complement (genomic input only)
Position in sequence
Strength of match
Motif within sequence
MEME Output (3)
Overall strength of motif matches
Original sequence lengths
Motif instance
MAST
Searches for motifs (one or more) in sequence databases:
Like BLAST but motifs for input
Similar to iterations of PSI-BLAST
Profile defines strength of match
Multiple motif matches per sequence
Combined E value for all motifs
MEME uses MAST to summarize results:
Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
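The idea that a profile defines the strength of a match can be sketched as a log-odds scan over both strands of a DNA sequence. This is a simplified illustration: MAST itself reports p-values and a combined E value, which are not computed here. The motif matrix, the pseudocount, the test sequence and all function names are assumptions made for the example, and the sequence is expected to contain only A, C, G and T.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(prob_matrix, background, pseudo=0.01):
    """Convert column probabilities to log2 odds against the background."""
    n = len(background)
    return [{a: math.log2(((col[a] + pseudo) / (1 + n * pseudo))
                          / background[a]) for a in background}
            for col in prob_matrix]


def scan(sequence, lom):
    """Score every window on both strands; best-scoring hits come first."""
    width = len(lom)
    reverse = sequence.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse)):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(lom[i][ch] for i, ch in enumerate(window))
            # Report the position in forward-strand coordinates (0-based).
            pos = start if strand == "+" else len(seq) - width - start
            hits.append((score, pos, strand, window))
    return sorted(hits, reverse=True)


# Toy 4-column motif preferring T, G, A, C (values invented).
def column(best):
    return {a: (0.85 if a == best else 0.05) for a in "ACGT"}


motif = [column(b) for b in "TGAC"]
background = {a: 0.25 for a in "ACGT"}
lom = log_odds_matrix(motif, background)

best = scan("AAGTCATT", lom)[0]
print(best)
```

In this toy sequence the best match lies on the reverse strand, which is why the output records an orientation alongside each position.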
MAST Input
Email address
Database (like BLAST)
Motif file (e.g. MEME output)
Consider matched sequence length
E value threshold
MAST Output (1)
Matched accession
Match E value
Length of sequence
Link to GenBank
MAST Output (2)
Motif diagram
MAST Output (3)
Position of each instance
P value of instance
Matched parts of sequence
Motif ‘consensus’
Motif and orientation
TRANSFAC
Database of eukaryotic DNA transcription regulation:
Individual regulatory sites (SITES table)
Genes to which they belong
Proteins which bind them
Proteins which bind sites (FACTORS table)
Cellular source of protein
Nucleotide motif profile for binding
Some grouping and classification
Classification of factors (CLASS table)
Position-specific matrices for select factors (MATRIX table)
Cell localization (CELL table)
Searching TRANSFAC
www.gene-regulation.com
Search a single table
By identifier, factor name, gene name
By species, author
Browse your way from table to table
Search within a sequence
MatInspector, TFScan (EMBOSS package)
TRANSFAC Factor
DT Date; author
FA Factor name
GE Encoding gene
SF Structural features
CP Cell specificity (positive)
CN Cell specificity (negative)
EX Expression pattern
FF Functional features
IN Interacting factors
MX Matrix
BS Binding site
DR External databases
References:
RN Reference no.
RX MEDLINE ID
RA Reference authors
RT Reference title
RL Reference data
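Because every line of an entry starts with a two-letter field code, a record is easy to split into fields. The sketch below groups lines by code. The record shown is invented (accession, factor name and gene are not real entries), and the `XX` spacer lines and `//` terminator are assumed from the EMBL-style layout of the flat file.

```python
from collections import defaultdict


def parse_record(text):
    """Group the lines of one flat-file record by two-letter field code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines, spacer lines and the end-of-record marker.
        if len(line) < 2 or line.startswith(("XX", "//")):
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)


# Invented record for illustration; not a real TRANSFAC entry.
record = """\
AC  T99999
XX
FA  ExampleFactor
XX
GE  EXG1
CP  liver
CP  kidney
//
"""

parsed = parse_record(record)
print(parsed["FA"][0], parsed["CP"])
```

Values are kept as lists because a code such as CP can occur on several lines of one entry.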
TRANSFAC Matrix
Accession
Position Specific Matrix
Statistical basis
Consensus (IUPAC subset symbols)
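A consensus in IUPAC symbols can be derived from the position-specific matrix column by column. The thresholds used below (one base if it exceeds 50% and is more than twice the runner-up, a two-base code if the top two together exceed 75%, otherwise N) are a simplified rule of thumb assumed for this sketch and may differ from the exact procedure TRANSFAC applies. The count matrix is invented.

```python
# IUPAC codes for pairs of bases.
TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def iupac_symbol(counts):
    """Pick a consensus symbol for one matrix column (simplified rule)."""
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get, reverse=True)
    first, second = ranked[0], ranked[1]
    f1, f2 = counts[first] / total, counts[second] / total
    if f1 > 0.5 and f1 > 2 * f2:
        return first                          # one clearly dominant base
    if f1 + f2 > 0.75:
        return TWO_BASE[frozenset((first, second))]
    return "N"                                # no clear preference


# Invented count matrix: one dict of base counts per motif position.
matrix = [
    {"A": 0, "C": 0, "G": 1, "T": 9},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 3, "C": 3, "G": 2, "T": 2},
    {"A": 0, "C": 8, "G": 1, "T": 1},
]

consensus = "".join(iupac_symbol(column) for column in matrix)
print(consensus)
```

Only a subset of IUPAC symbols is produced here (single bases, two-base codes and N), in keeping with the slide's note that the consensus uses a subset of the symbols.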
TRANSFAC Site (1)
Accession number
DNA or RNA
Gene
Gene region
Sequence of regulatory element
Position range of factor binding site
TRANSFAC Site (2)
Binding factor
accession
Factor name
Binding ‘quality’
1 functionally confirmed
2 binding of pure protein
3 immunologically characterized extract
4 via known binding sequence
5 extract protein binding to bona fide element
6 unassigned
Organism
Cellular source
Methods of identifying site
External links
TRANSFAC Factor (1)
AC: Accession number
FA: Factor name
SY: Other names
OS: Organism
OC: Taxonomy
HO: Homologs
CL: Classification
SZ: Size
SQ: Amino acid sequence
TRANSFAC Factor (2)
Protein sequence reference
Features and positions
Structural features
Cell specificity
Question
A biologist at your university has found 15 target genes that she thinks are co-regulated. She gives you 15 upstream regions, each 50 base pairs long, in FASTA format (file DNASample50.txt), and asks you to identify the motif and, if possible, the potential regulating protein. She tells you the sequences are from Homo sapiens, and her intuition is that the motif is 8 bases long. She wants you to suggest only the best candidate motif.
Question
After you have run all the programs, your biologist friend confesses that she is not sure whether her intuition about the motif length was correct. Re-run the tool without specifying the motif length. Do you get the same results?
Determine a potential DNA binding protein using TRANSFAC