Facts and Fallacies about de Novo Sequencing & Database Search.

Post on 23-Dec-2015



1. There are a large number of high quality spectra left unassigned after DB search.

True / False

Leftover

Unassigned Spectra in ABRF/iPRG 2011 Study

Unassigned Spectra

• Nonspecific trypsin cleavages
• Novel peptide/incomplete database
• PTM
• Mutations

[Figure labels: PEAKS PTM; SPIDER; PEAKS DB; De novo sequencing]

Speed

• PEAKS 6 de novo sequences 15 spectra/second.
  – Intel i7 quad core, 8 GB RAM.
  – Trypsin.
  – Orbitrap CID MS/MS, mostly charge +2/+3.

• PEAKS 7 (coming soon):
  – Improves speed on high charge states and longer peptides.
  – Adds 8-core support in the standard (desktop) license.
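As a rough sense of scale, the quoted rate can be turned into a wall-clock estimate. This is only a back-of-the-envelope sketch: the run size of 50,000 spectra is a hypothetical figure chosen for illustration, and the function name is not from any PEAKS interface.

```python
# Only the 15 spectra/second rate comes from the slide; the rest is illustrative.
SPECTRA_PER_SECOND = 15      # PEAKS 6 de novo rate quoted above
n_spectra = 50_000           # hypothetical run size (assumption)


def de_novo_minutes(n_spectra: int, rate: float = SPECTRA_PER_SECOND) -> float:
    """Return the estimated de novo sequencing time in minutes."""
    return n_spectra / rate / 60


print(f"{de_novo_minutes(n_spectra):.1f} minutes")
```

Under these assumptions a run of that size finishes in under an hour on the desktop hardware listed above.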

4. De novo should be done after DB search.

True / False

[Diagram labels: DB search; DB peptides; de novo seq.; unassigned spectra; de novo peptides]

Order of de Novo and DB

• Better to conduct de novo on all spectra.
  – De novo is not slow, and computing is cheap.
  – De novo provides independent validation for the DB result.

[Figure: true vs. false hits by score, without de novo and with de novo. Axis labels: score; # consensus AA (de novo vs. DB search).]
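The "# consensus AA" measure can be sketched as a count of residues on which the de novo peptide and the database peptide agree. The version below is a simplification under stated assumptions: it aligns plain strings with Python's `difflib` rather than aligning by mass, it treats I and L as the same residue because they have identical mass, and the peptides are made up. It is not presented as the PEAKS implementation.

```python
from difflib import SequenceMatcher


def normalize(peptide: str) -> str:
    """Map isoleucine to leucine; the two residues have the same mass."""
    return peptide.upper().replace("I", "L")


def consensus_aa(de_novo: str, db_peptide: str) -> int:
    """Count residues shared, in order, by the de novo and database peptides."""
    matcher = SequenceMatcher(
        None, normalize(de_novo), normalize(db_peptide), autojunk=False
    )
    # The final matching block is a zero-length sentinel, so it adds nothing.
    return sum(block.size for block in matcher.get_matching_blocks())


# Hypothetical peptides, for illustration only.
print(consensus_aa("LSSPATLNSR", "ISSPATLNSR"))  # full agreement
print(consensus_aa("LSSPATGGSR", "LSSPATNSR"))   # partial agreement
```

A database hit with a high score but few consensus residues is a candidate false hit, which is the kind of separation the second axis is meant to show.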

5. My protein sequence is confirmed with two unique peptide hits.

True / False

Routine Full Protein Coverage

• For regular proteins, full sequence coverage can be routinely achieved with
  – 3 or more enzyme digests, and
  – multiple algorithms in PEAKS 6.

• For highly variable proteins (such as antibodies), BSI offers a data analysis service for antibody sequencing.
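Sequence coverage, as used on this slide, is the fraction of a protein's residues that fall inside at least one identified peptide. A minimal sketch, assuming exact string matches and a made-up protein and peptide list:

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Return the fraction of protein residues covered by at least one peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein) if protein else 0.0


# Hypothetical protein and peptide hits, for illustration only.
protein = "MKWVTFISLLK"
two_hits = ["MKWV", "ISLLK"]
print(f"{sequence_coverage(protein, two_hits):.0%}")
```

Two unique peptide hits identify the protein but leave the residues between them unobserved, which is why overlapping peptides from several digests are needed before the whole sequence can be called confirmed.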

6. If a peptide is identified with 1% FDR, then its sequence is 99% correct.

True / False

Peptide Validation vs. Amino Acid Validation

You are confident about the peptide sequence only if
• you can de novo sequence it, and
• the de novo sequence matches the database peptide.

[Figure labels: weak hits; confident protein; weak protein]

Target-Decoy Incompatible with Certain Highly Optimized Search Engines

• Adding a “protein bonus” to peptide hits increases accuracy.
• But it creates a bias between target and decoy.
  – In the extreme, the bonus is so large that only peptides from target proteins are selected.
  – This gives the wrong impression that FDR = 0, while there are still false peptides in the result.

[Figure labels: weak hits; confident protein; weak protein]

Decoy Fusion Is A More Powerful Validation Method

• Decoy fusion appends a decoy sequence to each protein.
• This recreates the balance between target and decoy.
• It has been the built-in validation method since PEAKS 5.3.
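The idea of appending a decoy to each protein can be sketched as follows. The slides do not say how the decoy sequence is generated or how hits are classified, so both are assumptions here: the decoy is the reversed target, and a peptide match counts as a decoy hit when it starts in the appended half. All function names are hypothetical, and peptides spanning the junction are ignored.

```python
def fuse_decoy(protein: str) -> str:
    """Append a decoy to the target sequence (decoy = reversed target, an assumption)."""
    return protein + protein[::-1]


def build_fused_database(proteins: dict[str, str]) -> dict[str, str]:
    """Return a database in which every entry carries its own decoy."""
    return {name: fuse_decoy(seq) for name, seq in proteins.items()}


def is_decoy_hit(start: int, target_length: int) -> bool:
    """Classify a match by where it starts within the fused entry."""
    return start >= target_length


# Hypothetical two-protein database, for illustration only.
fused = build_fused_database({"P1": "MKWVTF", "P2": "ISLLK"})
print(fused)
```

Because target and decoy live in the same entry, any protein-level bonus is earned by both halves equally, so the decoy hits remain a fair estimate of the false target hits.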

Error Accumulation

• In PEAKS, the inChorus algorithm automatically selects a common FDR below 1% for each engine, so that the combined FDR is approximately 1%.

Target (decoy) FDR%

PEAKS DB only      1696 (37)   2.4%
Both engines       2174 (1)    0.1%
Mascot only         195 (22)   13%

PEAKS DB (total)   3870 (38)   1%
Mascot (total)     2369 (23)   1%

• Correct < sum of the two
• Error ≈ sum of the two

Combined FDR = 1.5%
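The accumulation can be checked from the slide's own counts. The sketch below assumes the simple estimate FDR = decoy hits / target hits, and it reads the three target(decoy) pairs as PEAKS DB only, both engines, and Mascot only; that reading is an inference from the fact that they sum to the per-engine totals of 3870(38) and 2369(23).

```python
# (target hits, decoy hits) per region, copied from the slide's counts.
PEAKS_ONLY = (1696, 37)
BOTH = (2174, 1)
MASCOT_ONLY = (195, 22)


def fdr(*regions: tuple[int, int]) -> float:
    """Estimate FDR as decoy hits / target hits over the given regions (assumed formula)."""
    targets = sum(t for t, _ in regions)
    decoys = sum(d for _, d in regions)
    return decoys / targets


print(f"PEAKS DB: {fdr(PEAKS_ONLY, BOTH):.1%}")
print(f"Mascot:   {fdr(MASCOT_ONLY, BOTH):.1%}")
print(f"Combined: {fdr(PEAKS_ONLY, BOTH, MASCOT_ONLY):.1%}")
```

Each engine sits at about 1% on its own, but the union is about 1.5%: the two engines share most of their correct hits, while their decoy hits barely overlap, so errors add up faster than correct identifications.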

10. There is no automated way to validate de novo sequencing results.

True / False
