
Post on 26-Jul-2020


Seeing things differently: Innovation in Computational Mass Spectrometry

Rob Smith, Ph.D.

Associate Professor, Department of Computer Science, University of Montana

“Pain-free MS data processing”

Founder | CEO

“I undertook something that not everyone may undertake: I descended into the depths, I bored into the foundations.”

-Nietzsche, “Dawn of Morning”

Overview: Where is the innovation?

[Figure: a box bounded by “Current Limits”, with “Innovation” lying outside it on every side.]

Why don’t we go there?


• Need to identify the limits.

• Need to take risks.

What does the journey look like?

You are on the right track when…

a) The old guard says, “why would you want to do that?”

Their world looks like this:

[Figure: the “Current Limits” and “Innovation” diagram.]

Not this:

Inside the Box

“There are no unsolved problems.” - A Developer

“Conversations with 100 scientists in the field reveal a bifurcated perception of the state of mass spectrometry software.” R. Smith, Journal of Proteome Research, 2018.

Inside the Box

“How could you possibly make significant improvements to the state of the art?!” - A Bigwig

Outside the Box

• “All scientific software sucks. It is idiosyncratic, it makes no sense, it has glitches, it is a pain in the ass!” - A User

• “[There are] a few mediocre ones, the rest are absolute crap.” - A User

• “They are complete trash.” - A User

“Conversations with 100 scientists in the field reveal a bifurcated perception of the state of mass spectrometry software.” R. Smith, Journal of Proteome Research, 2018.

b) You ask “why not,” and you find there isn’t a sufficiently good reason.


Why not?

The limits of the possible can only be defined by going beyond them into the impossible.

-Arthur C. Clarke

c) You need a new vocabulary to describe your solution.

Innovation occurs in the space between reality and the language we use to describe it.

d) You are able to see and measure limitations in the status quo.

Outline

• The old guard says, “why would you want to do that?”

• You ask “why not,” and you find there isn’t a sufficiently good reason.

• You need a new vocabulary to describe your solution.

• You are able to see and measure limitations in the status quo.


Part 1: Words and Concepts

“Many problems are caused by the difference between how things actually work, and the language / tools / paradigms / tropes we use to describe and engage with them.”

-Gregory Bateson

“Language allows you to have ideas otherwise un-haveable, and that by extension people who own different words live in different conceptual worlds.”

-Joshua Hartshorne

Innovation occurs in the space between reality and the language we use to describe it.

You can’t code what you can’t describe.

“Current controlled vocabularies are insufficient to uniquely map molecular entities to mass spectrometry signal” Smith et al., BMC Bioinformatics 16(7), 2015.

Part 2: Asking different questions

• What we think we are asking

• What we are actually asking

• What we should be asking

These are not the same!

What we think we are doing: p(x)

• What we want to measure

What we are actually doing: p(x|a,b,c,…)

• An analog or estimate, conditioned on our assumptions a, b, c, …

• EASIER TO CALCULATE

But what if a, b, c, … are wrong?
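A toy numerical sketch of the gap between p(x) and p(x|a). Every number and name below is invented for illustration and does not come from the talk. By the law of total probability, p(x) = p(x|a) p(a) + p(x|not a) p(not a), so the conditional stands in for the marginal only when the assumption always holds.

```python
def marginal(p_x_given_a: float, p_x_given_not_a: float, p_a: float) -> float:
    """Law of total probability: p(x) = p(x|a) p(a) + p(x|not a) (1 - p(a))."""
    return p_x_given_a * p_a + p_x_given_not_a * (1.0 - p_a)


# Hypothetical values: x = "the reported result is correct",
# a = "the assumption baked into the method holds".
P_X_GIVEN_A = 0.95      # what the method reports, taking a for granted
P_X_GIVEN_NOT_A = 0.30  # how the method behaves when a fails
P_A = 0.60              # how often a actually holds

p_x = marginal(P_X_GIVEN_A, P_X_GIVEN_NOT_A, P_A)
gap = P_X_GIVEN_A - p_x
print(f"p(x|a) = {P_X_GIVEN_A:.2f}, p(x) = {p_x:.2f}, gap = {gap:.2f}")
```

Setting P_A to 1.0 closes the gap; any smaller value means the easier-to-calculate quantity overstates what we wanted to measure.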

What we think we are doing:

Given:

• a spectrum

• context

…what do I have?

What we are actually doing:

Assuming:

• a single species.

• the most abundant ions are from the same species.

• ion abundance = parent abundance.

• there are few to no modifications.

• the database contains the correct match.

…what matches best?
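A minimal sketch of why “what matches best?” is a different question from “what do I have?”. It assumes spectra are already binned into equal-length intensity vectors and scored by cosine similarity; the library entries, the query, and the function names are hypothetical. The search returns a winner even though the query's true source is not in the library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(query, library):
    """Return (name, score) of the highest-scoring library entry."""
    scored = [(name, cosine(query, spectrum)) for name, spectrum in library.items()]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical binned spectra; the query comes from a species absent from the library.
LIBRARY = {
    "peptide_A": [0, 10, 0, 5, 0, 1],
    "peptide_B": [8, 0, 2, 0, 6, 0],
}
QUERY = [0, 3, 7, 4, 0, 0]

name, score = best_match(QUERY, LIBRARY)
print(f"best match: {name} (cosine = {score:.2f})")
```

Nothing in the result flags that the last assumption in the list failed; a mediocre score is the only hint.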

What we think we are doing:

What is the likelihood that a match is correct (FDR)?

What we are actually doing:

What is the similarity between matched spectra and shuffled or reversed spectra?

Assumes:

• Target/decoy accurately simulates the likelihood of a false positive match.

• Decoy sequences are dissimilar to target sequences.

• The database size is chosen such that the FDR is accurate.
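For reference, a minimal sketch of one commonly used form of the target/decoy estimate: decoy hits divided by target hits above a score threshold. Conventions vary between tools, and this is not presented as the method discussed in the talk. The scores and decoy labels are invented.

```python
def target_decoy_fdr(scores, is_decoy, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits)."""
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    return decoys / targets if targets else 0.0


# Hypothetical match scores; True marks a match to a shuffled or reversed (decoy) entry.
SCORES = [9.1, 8.7, 8.2, 7.9, 7.5, 7.1, 6.8, 6.0, 5.5, 5.1]
IS_DECOY = [False, False, False, True, False, False, True, False, True, True]

fdr_strict = target_decoy_fdr(SCORES, IS_DECOY, threshold=7.0)
fdr_loose = target_decoy_fdr(SCORES, IS_DECOY, threshold=5.0)
print(f"estimated FDR at 7.0: {fdr_strict:.2f}, at 5.0: {fdr_loose:.2f}")
```

The estimate is only a count of decoy hits. It equals the true false-positive rate only to the extent that the three assumptions above hold.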

What we think we are doing: Correspondence

What we are actually doing: Alignment

Assumes:

• Elution order never changes

• MS/MS ID rates are high

• m/z doesn’t shift

• RT shifts are monotonic

“LC-MS alignment in theory and practice: a comprehensive algorithmic review.” Smith et al. Briefings in Bioinformatics 16(1), 2015.
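The monotonic-RT assumption can be checked directly when the same features are known in two runs. The sketch below counts feature pairs whose elution order flips between runs; the retention times and the function name are hypothetical. Any nonzero count means no monotonic warping function can map one run onto the other.

```python
from itertools import combinations


def elution_order_swaps(rt_run1, rt_run2):
    """Count feature pairs whose elution order differs between two runs.

    rt_run1[i] and rt_run2[i] are retention times of the same feature.
    """
    swaps = 0
    for i, j in combinations(range(len(rt_run1)), 2):
        # Opposite signs mean the pair eluted in a different order in each run.
        if (rt_run1[i] - rt_run1[j]) * (rt_run2[i] - rt_run2[j]) < 0:
            swaps += 1
    return swaps


# Hypothetical retention times (minutes) for five features seen in both runs.
RT_RUN1 = [10.2, 12.5, 15.1, 18.4, 21.0]
RT_RUN2 = [10.9, 13.4, 13.1, 19.0, 21.8]  # the second and third features swap

swaps = elution_order_swaps(RT_RUN1, RT_RUN2)
print(f"pairs with changed elution order: {swaps}")
```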

What we think we are doing:

• Which PTMs are in this sample?

• At what index are these peptides modified?

What we are actually doing:

• Does this sample contain this particular PTM?

• At what index is this particular PTM found?

• One modification at a time.

• Only the modification we are looking for.

What we think we are doing: Validating accuracy w/ CV

What we are actually doing: Measuring consistency w/ CV

Assumes:

• Sameness -> correctness

• Correct peak integration
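A small sketch of the difference between consistency and accuracy, using invented replicate values. CV is computed as sample standard deviation over mean. The “biased” replicates stand for a peak that is integrated the same wrong way every time.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean."""
    return statistics.stdev(values) / statistics.mean(values)


def mean_relative_error(values, truth):
    """Average of |value - truth| / truth; requires knowing the true value."""
    return statistics.mean(abs(v - truth) / truth for v in values)


TRUE_ABUNDANCE = 100.0  # hypothetical ground truth
BIASED = [60.0, 61.0, 59.5, 60.5, 60.0]   # consistent, but far from the truth
NOISY = [92.0, 108.0, 97.0, 104.0, 99.0]  # scattered, but centered on the truth

for label, reps in (("biased", BIASED), ("noisy", NOISY)):
    cv = coefficient_of_variation(reps)
    err = mean_relative_error(reps, TRUE_ABUNDANCE)
    print(f"{label}: CV = {cv:.3f}, mean relative error = {err:.3f}")
```

Ranked by CV alone, the biased replicates look better, even though their error against the true value is several times larger.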

What we think we are doing: Validating algorithms

What we are actually doing: Measuring agreement between algorithms

Assumes:

• Sameness -> correctness
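The same point for algorithm comparison, as a toy sketch. The identification sets are invented, and Jaccard overlap is just one possible agreement measure. Two algorithms that share a flawed assumption can agree with each other far more than either agrees with the truth.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of identifications."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical identifications; TRUTH would come from a sample of known composition.
TRUTH = {"P1", "P2", "P3", "P4"}
ALGO_1 = {"P1", "P5", "P6", "P7"}
ALGO_2 = {"P1", "P5", "P6", "P8"}

agreement = jaccard(ALGO_1, ALGO_2)
overlap_with_truth = jaccard(ALGO_1, TRUTH)
print(f"agreement between algorithms: {agreement:.2f}")
print(f"overlap of algorithm 1 with truth: {overlap_with_truth:.2f}")
```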

If we had more time…

• Signal to noise is meaningful.

• DIA >> DDA

• 2-dimensional signals should be used (XICs, TICs, etc.)

• Predicting spectra is hard; machine learning can make it easier.

Summarizing

• Analogs are not the same as equals.

• We ignore massive and often provably incorrect assumptions.

• Bad assumptions = incorrect results.

Summarizing

• What is the space between reality and the language we use to describe it?

• Are our estimates actually any good?

• Can estimates be improved?

• Can we actually measure what we are currently only estimating?

Acknowledgements

www.primelabs.ms

Smith Computational Mass Spectrometry Lab

Funding:

• NSF Career Award 1552240

• NSF SBIR 1819290

• NSF I-Corps 1741270

• MTBRCT 19-51-031


rob.smith@primelabs.ms
