
Psi-BLAST, Prosite, UCSC Genome Browser Lecture 3.

Dec 18, 2015

Page 1: Psi-BLAST, Prosite, UCSC Genome Browser Lecture 3.



Searching for remote homologs

Sometimes BLAST isn't enough: in a large protein family, BLAST may find only the close members, but we also want the more distant members.


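PSI-BLAST

PSI-BLAST = Position-Specific Iterative BLAST

1. Regular BLAST search with the query
2. Construct a profile from the BLAST results
3. BLAST search with the profile (steps 2 and 3 repeat until no new sequences are found or an iteration limit is reached)
4. Final results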


Consensus, Pattern, PSSM

Pos    1 2 3 4 5 6
Seq1   A T C T T G
Seq2   A A C T T G
Seq3   A A C T T C

Consensus: the most frequent character in each column is chosen
A A C T T G

Pattern: represents the alignment as a regular expression
A-[TA]-C-T-T-[GC]

Profile = PSSM (Position-Specific Score Matrix): the frequency of each nucleotide at each position

Nuc/Pos  1     2     3     4     5     6
A        1     0.67  0     0     0     0
C        0     0     1     0     0     0.33
G        0     0     0     0     0     0.67
T        0     0.33  0     1     1     0
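The three representations above can be computed directly from the alignment. Below is a minimal Python sketch, assuming gapless sequences of equal length; the function names are illustrative, not taken from any library. Residues inside brackets are listed in order of first appearance, which reproduces the slide's [TA] and [GC] (the order inside brackets does not change the meaning).

```python
from collections import Counter

# Aligned sequences from the slide (gapless, equal length).
ALIGNMENT = ["ATCTTG", "AACTTG", "AACTTC"]
NUCLEOTIDES = "ACGT"


def columns(seqs):
    """Return the alignment as a list of columns (tuples of characters)."""
    return list(zip(*seqs))


def consensus(seqs):
    """Pick the most frequent character in every column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in columns(seqs))


def pattern(seqs):
    """Write each column as one residue, or [..] listing every residue seen there."""
    parts = []
    for col in columns(seqs):
        seen = list(dict.fromkeys(col))  # unique residues, in order of first appearance
        parts.append(seen[0] if len(seen) == 1 else "[" + "".join(seen) + "]")
    return "-".join(parts)


def pssm(seqs):
    """Frequency of each nucleotide in each column (one row per nucleotide)."""
    n = len(seqs)
    return {nt: [col.count(nt) / n for col in columns(seqs)] for nt in NUCLEOTIDES}


if __name__ == "__main__":
    print(consensus(ALIGNMENT))  # AACTTG
    print(pattern(ALIGNMENT))    # A-[TA]-C-T-T-[GC]
    for nt, row in pssm(ALIGNMENT).items():
        print(nt, " ".join(f"{f:.2f}" for f in row))
```

The consensus keeps only one letter per column, the pattern keeps every allowed letter but forgets how often each occurs, and the PSSM keeps the frequencies as well.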


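Nuc/Pos  1     2     3     4     5     6
A        1     0.67  0     0     0.25  0.33
C        0     0.33  1     1     0.25  0
G        0     0     0     0     0.25  0.33
T        0     0     0     0     0.25  0.33

The score of a sequence is the product of the matrix entries for its letters at each position:

$S(\text{AACCAA}) = 1 \times 0.67 \times 1 \times 1 \times 0.25 \times 0.33 \approx 0.055$

$S(\text{GACCAA}) = 0$, because G never occurs at position 1.

Sequences with higher scores -> higher chance of being related to the PSSM.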
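The scoring rule can be written as a short function. Below is a minimal Python sketch using the matrix values from this slide; `score` is a hypothetical helper name. It follows the slide's simplified scheme of multiplying frequencies. Real PSSMs, including the ones PSI-BLAST builds, store log-odds scores relative to background frequencies and add them, which is the same as multiplying probability ratios.

```python
from math import prod

# PSSM from the slide: for each nucleotide, its frequency at positions 1..6.
PSSM = {
    "A": [1.0, 0.67, 0.0, 0.0, 0.25, 0.33],
    "C": [0.0, 0.33, 1.0, 1.0, 0.25, 0.0],
    "G": [0.0, 0.0, 0.0, 0.0, 0.25, 0.33],
    "T": [0.0, 0.0, 0.0, 0.0, 0.25, 0.33],
}


def score(seq, matrix=PSSM):
    """Multiply the matrix entry of each letter at its position (the slide's rule)."""
    width = len(next(iter(matrix.values())))
    if len(seq) != width:
        raise ValueError(f"sequence length {len(seq)} != PSSM width {width}")
    return prod(matrix[ch][i] for i, ch in enumerate(seq))


if __name__ == "__main__":
    print(round(score("AACCAA"), 4))  # 0.0553
    print(score("GACCAA"))            # 0.0 (G never occurs at position 1)
```

A single zero entry makes the whole product zero, so with raw frequencies one unseen letter rules a sequence out completely; this is one reason practical profiles add pseudocounts.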



BLAST – PSI-BLAST


PSI-BLAST - results


PSI-BLAST

Advantage: PSI-BLAST looks for sequences that are close to the query and learns from them to extend the circle of friends.

Disadvantage: if a WRONG (unrelated) hit is included, it becomes part of the profile and leads the search to further unrelated sequences (contamination). This gets worse with each iteration.
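Both the advantage and the disadvantage come from the same loop, which can be illustrated with a toy. This is a hypothetical, heavily simplified sketch, not NCBI's algorithm: sequences are gapless, equal-length DNA strings compared position by position; the profile uses log-odds scores against an assumed uniform background of 0.25 with a pseudocount of 1; and the 2.5-bit threshold and the tiny database are made up for the example. Real PSI-BLAST starts with an ordinary gapped BLAST search using a substitution matrix and decides which hits enter the profile with an E-value cutoff.

```python
from math import log2

NUCLEOTIDES = "ACGT"
BACKGROUND = 0.25  # assumed uniform background frequency


def build_profile(seqs, pseudocount=1.0):
    """Log-odds profile: log2(observed frequency / background) per position and letter.

    The pseudocount keeps unseen letters at a finite negative score instead of -inf.
    """
    n = len(seqs)
    profile = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        row = {}
        for nt in NUCLEOTIDES:
            freq = (column.count(nt) + pseudocount) / (n + pseudocount * len(NUCLEOTIDES))
            row[nt] = log2(freq / BACKGROUND)
        profile.append(row)
    return profile


def profile_score(seq, profile):
    """Sum the log-odds scores of the letters of seq (same length as the profile)."""
    return sum(profile[i][nt] for i, nt in enumerate(seq))


def iterative_search(query, database, threshold, max_iterations=5):
    """Repeat: build a profile from the current hits, then rescan the database.

    Stops when the hit set no longer changes (convergence) or after max_iterations.
    Returns the final hit set and the hit set found in each iteration.
    """
    hits = {query}
    history = []
    for _ in range(max_iterations):
        profile = build_profile(sorted(hits))
        new_hits = {s for s in database if profile_score(s, profile) >= threshold}
        new_hits.add(query)
        history.append(new_hits)
        if new_hits == hits:
            break  # converged: no new sequences were pulled in
        hits = new_hits
    return hits, history


# Made-up example database (the query itself is not in it).
DATABASE = ["AACTTC", "ATCTTG", "ATCTTC", "GGGAAA"]

if __name__ == "__main__":
    hits, history = iterative_search("AACTTG", DATABASE, threshold=2.5)
    for i, found in enumerate(history, start=1):
        print(i, sorted(found))
    # 1 ['AACTTC', 'AACTTG', 'ATCTTG']
    # 2 ['AACTTC', 'AACTTG', 'ATCTTC', 'ATCTTG']
    # 3 ['AACTTC', 'AACTTG', 'ATCTTC', 'ATCTTG']
```

ATCTTC differs from the query at two positions and is missed in the first round, but it is pulled in once the profile has learned from AACTTC and ATCTTG; GGGAAA is never included. The same mechanism causes contamination: if an unrelated sequence cleared the threshold, it would be built into the next profile and could pull in more unrelated sequences.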


PSI-BLAST

Which of the following is/are correct?

1. PSI-BLAST is expected to give more hits than BLAST
2. PSI-BLAST is an iterative search method
3. PSI-BLAST is faster than BLAST
4. Each iteration of PSI-BLAST can only improve the results of the previous iteration


Turning information into knowledge

The outcome of a sequencing project is a mass of raw data.

The challenge is to turn these raw data into biological knowledge.

A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be processed.


From sequence to function

Nature tends to innovate rather than invent.

Proteins are composed of functional elements: domains and motifs.

Domains are structural units that carry out a specific function. They are shared between different proteins.

Motifs are shorter and are usually critical for the biological activity.


http://www.expasy.ch/prosite


Prosite

By analyzing conserved regions in protein sequences, it is possible to derive signatures of motifs and domains.

Prosite consists of annotated sites/motifs/signatures/fingerprints.

Given an uncharacterized translated protein sequence, Prosite tries to predict which motifs and domains make up the protein and thus identify the family to which it belongs.


Prosite

Prosite represents entries with patterns or profiles:

Alignment:
A A C T T C
A T C T T G
A A C T T G

Pattern: A-[TA]-C-T-T-[GC]

Profile:
Nuc/Pos  1     2     3     4     5     6
A        1     0.67  0     0     0     0
T        0     0.33  0     1     1     0
C        0     0     1     0     0     0.33
G        0     0     0     0     0     0.67

Profiles are used in Prosite when the motif is relatively divergent and is difficult to represent as a pattern. Profiles also characterize domains over their entire length, not just the motif.


Prosite sequence query


Patterns with a high probability of occurrence

Entries describing commonly found post-translational modifications or compositionally biased regions.

They are found in the majority of known protein sequences. Because of this high probability of occurrence, Prosite filters them out by default.
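Why short, permissive patterns turn up almost everywhere can be seen with a rough estimate. Below is a minimal Python sketch, assuming every position of a protein is an independent, uniformly random choice among the 20 standard amino acids (real proteins are not uniform, so this is only a ballpark figure) and handling only single-position pattern elements: a residue, [..], {..} or x, without repeat counts. The example uses the N-glycosylation site pattern N-{P}-[ST]-{P}; the protein length of 400 is an arbitrary choice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def element_probability(element):
    """Chance that one random residue satisfies a single Prosite pattern element."""
    if element == "x":
        allowed = set(AMINO_ACIDS)                       # any residue
    elif element.startswith("["):
        allowed = set(element[1:-1])                     # one of the listed residues
    elif element.startswith("{"):
        allowed = set(AMINO_ACIDS) - set(element[1:-1])  # any residue except those listed
    else:
        allowed = {element}                              # exactly this residue
    return len(allowed) / len(AMINO_ACIDS)


def expected_random_matches(pattern, protein_length):
    """Expected number of matches in a uniformly random protein of the given length.

    Only single-position elements are supported (no x(2,4)-style repeats).
    """
    elements = pattern.rstrip(".").split("-")
    p_match = 1.0
    for element in elements:
        p_match *= element_probability(element)
    windows = max(protein_length - len(elements) + 1, 0)  # possible start positions
    return windows * p_match


if __name__ == "__main__":
    print(round(expected_random_matches("N-{P}-[ST]-{P}", 400), 2))  # 1.79
```

A random 400-residue protein would, on average, contain almost two such "sites" by chance alone, so a match to this kind of pattern says very little about the family a protein belongs to.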


Scanning Prosite

Query: a sequence -> Result: all patterns found in the sequence

Query: a pattern -> Result: all sequences that match this pattern
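Both query directions can be mimicked with regular expressions, since a Prosite pattern is essentially a restricted regular expression. Below is a minimal Python sketch. The translation covers the common pattern syntax ('-' separators, [..] alternatives, {..} exclusions, x wildcards, (n) or (n,m) repeats, < and > anchors, trailing period) but not every special case, for example '>' inside brackets. The N-glycosylation pattern is real; the "hypothetical" motif, the protein names, and the sequences are made up. `re.finditer` reports non-overlapping matches only.

```python
import re


def prosite_to_regex(pattern):
    """Translate a Prosite pattern such as 'C-x(2,4)-C' into a Python regex string."""
    regex = pattern.strip().rstrip(".")                  # drop the optional trailing period
    regex = regex.replace("-", "")                       # '-' only separates elements
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} = any residue except P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) = 2 to 4 repeats
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- / C-terminal anchors
    return regex


def scan_sequence(sequence, patterns):
    """Query = a sequence: report every pattern that matches, with 1-based positions."""
    found = {}
    for name, pattern in patterns.items():
        hits = [(m.start() + 1, m.end(), m.group())
                for m in re.finditer(prosite_to_regex(pattern), sequence)]
        if hits:
            found[name] = hits
    return found


def scan_database(pattern, database):
    """Query = a pattern: report every sequence in the database that matches it."""
    regex = re.compile(prosite_to_regex(pattern))
    return [seq_id for seq_id, seq in database.items() if regex.search(seq)]


PATTERNS = {
    "N-glycosylation site": "N-{P}-[ST]-{P}",
    "hypothetical C-x(2,4)-C motif": "C-x(2,4)-C",
}
DATABASE = {"protein_1": "MANLTKCPGCE", "protein_2": "MNPSAW", "protein_3": "GGNASG"}

if __name__ == "__main__":
    print(scan_sequence("MANLTKCPGCE", PATTERNS))
    print(scan_database("N-{P}-[ST]-{P}", DATABASE))  # ['protein_1', 'protein_3']
```

protein_2 is rejected because the residue after N is P, which the {P} element excludes.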


Prosite pattern query


UCSC Genome Browser


UCSC Genome Browser - Gateway

Reset all settings from previous sessions


UCSC Genome Browser - Gateway


Results


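Annotation tracks

Tracks labeled in the screenshot: Base position, UCSC Genes, RefSeq Genes, mRNAs (GenBank), Mammal conservation, Species alignment, SNPs, Repeats

Gene features labeled: gene direction, coding region, intron, UTR (in gene tracks, coding exons are drawn as tall blocks, UTRs as shorter blocks, and introns as thin lines whose arrowheads show the gene direction)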


UCSC Gene


UCSC Genome Browser - movement

Zoom x3 + Center


Controlling annotation tracks


Malaria distribution

Sickle-cell anemia distribution


BLAT

BLAT = BLAST-Like Alignment Tool

BLAT is designed to find sequences of >95% similarity for DNA and >80% similarity for protein.

It searches rapidly by indexing the entire genome.

Good for:

1. Finding the genomic coordinates of a cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
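The "index the entire genome" idea can be sketched in a few lines. Below is a toy Python sketch of the seeding step only, loosely following how BLAT is described as working: the genome is indexed by non-overlapping k-mers ("tiles"), and every overlapping k-mer of the query is looked up in that index. The small k = 4 (BLAT's default tile size for DNA is 11), the made-up exon/intron sequences, and the function names are assumptions for illustration; real BLAT goes on to cluster and extend these seeds into full alignments.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every non-overlapping k-mer of the genome to its 0-based start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):  # step k: tiles do not overlap
        index[genome[start:start + k]].append(start)
    return index


def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos) hits."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, gpos))
    return seeds


def group_by_diagonal(seeds):
    """Group seeds by diagonal = genome_pos - query_pos; one diagonal = one ungapped block."""
    groups = defaultdict(list)
    for qpos, gpos in seeds:
        groups[gpos - qpos].append((qpos, gpos))
    return dict(groups)


# Made-up example: exon 1 + intron + exon 2 in the genome; the cDNA lacks the intron.
GENOME = "ATGGCCTA" + "GTACCCAG" + "CGATTGCA"
CDNA = "ATGGCCTA" + "CGATTGCA"
K = 4

if __name__ == "__main__":
    index = build_index(GENOME, K)
    seeds = find_seeds(CDNA, index, K)
    print(group_by_diagonal(seeds))  # {0: [(0, 0), (4, 4)], 8: [(8, 16), (12, 20)]}
```

The seeds fall on two diagonals 8 apart: the first half of the cDNA lines up with the genome directly, the second half 8 bases further along, which is how aligning a spliced cDNA reveals an intron. Indexing only non-overlapping tiles keeps the index small, while scanning the query with overlapping k-mers guarantees that any exact match at least 2k - 1 bases long contains a whole tile.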


BLAT on UCSC Genome Browser


BLAT search


BLAT Results


BLAT Results

Alignment view legend: match; non-match (mismatch/indel); indel boundaries. Sequences shown: query and hit.