What everybody knows but nobody says can hurt interdisciplinary research John V. Carlis University of Minnesota
Feb 25, 2016
“What we have here is a lack of communication”
Messages
What everybody knows, nobody says
Missed (not mis-) communication: painful surprises
Plan for success
Exponential growth in data, beyond human scale
Yeah! Good work to do
Invent together
Inter-Disciplinary Research
IT-ist [CS/Eng. … /Math/Stat] Tool Builder [content neutral]
Biologist [soils/neuro/microbio/dent/biochem/ecol/vet] Content Seeker/Maker [tool user]
Surprises & Do
Inter-Alien Research
Can an IT-ist become a Biologist or vice versa?
Well, life’s too short
specialization of labor
bioinformatics grad minor
CBCB in our future?
Ante-Disciplinary?
Business Surprise
User: NO!
IT: but I built what you told me to build
User: I gave you a typical example, but of course there are exceptions
IT: you didn’t tell me
User: you didn’t ask, and, besides, everybody knows that
worse in science – but why?
Science for IT is harder
Business – human decides complexity
Science -- reality >> models
Exponential growth in data
Competing models
Lots of vocabulary
Specific vs Abstract
Vocabulary sloshes
Surprises
Surprise (1/5): Context
pH is not pH: need to remember instrument used
Annotation
Beyond genome is harder: What plus When & Where [microarray; mass spec]
harder to share/re-use data
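One way to picture "need to remember instrument used" is a record that keeps the what, when, and where together with the value. The sketch below is illustrative only: the `Measurement` class, its field names, the instrument labels, and the rule that readings are comparable only when the instrument matches are all assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One observed value plus the context needed to interpret it."""
    quantity: str          # what was measured, e.g. "pH"
    value: float
    instrument: str        # hypothetical instrument label
    measured_at: datetime  # when
    location: str          # where (hypothetical site label)


def comparable(a: Measurement, b: Measurement) -> bool:
    """Assumed rule: readings compare directly only if quantity and instrument match."""
    return a.quantity == b.quantity and a.instrument == b.instrument


probe = Measurement("pH", 6.8, "glass-electrode", datetime(2016, 2, 25, 9, 0), "plot-A")
strip = Measurement("pH", 6.8, "indicator-strip", datetime(2016, 2, 25, 9, 5), "plot-A")

# Same number, same quantity name, different instrument: not treated as the same thing.
print(comparable(probe, strip))  # False
```

Dropping the `instrument` field would make the two readings look identical, which is the kind of lost context that makes shared data hard to re-use.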
Surprise (2/5): Casual Vocabulary
chimp
chimp + baby
chimp + offspring
chimp + offspring + close personal friend
Surprise (3/5): Success Brings Pain
Prosite’s curated protein patterns + descriptions: ~2 MB of free (con)text
human browses
toooo little success
tooooooo many
GenBank
Obsolete fields
“misc”
Parsing free text is hard & error-prone
Surprise (4/5): Vocabulary missing/overloaded/off
Text readable only by those who already know
Nouns – pretty good
Verbs -- Janeway’s “Immunology”: mediate, …
“Pathway”
BAD diagrams
248
e.g., “metadata”
Surprise (5/5): Idiosyncratic brain viewing
Different machines, conditions & warping parameters
Fuss ‘til it looks right
a day’s work!
requires scarce expertise
Doesn’t scale to comparisons among images
processing plan is data too
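A minimal sketch of "processing plan is data too": if the steps and their parameters are stored as plain data, the same plan can be saved and replayed on another input instead of being re-fussed by hand. The step names (`scale`, `shift`) and the `apply_plan` function are hypothetical stand-ins, not the actual brain-image operations.

```python
import json

# Hypothetical processing plan: an ordered list of steps with their parameters.
plan = [
    {"step": "scale", "factor": 2.0},
    {"step": "shift", "offset": -1.0},
]


def apply_plan(values, plan):
    """Apply each recorded step, in order, to a list of numbers."""
    out = list(values)
    for step in plan:
        if step["step"] == "scale":
            out = [v * step["factor"] for v in out]
        elif step["step"] == "shift":
            out = [v + step["offset"] for v in out]
        else:
            raise ValueError(f"unknown step: {step['step']}")
    return out


# The plan survives a round trip through storage and can be replayed later.
saved = json.dumps(plan)
replayed = apply_plan([1.0, 2.0], json.loads(saved))
print(replayed)  # [1.0, 3.0]
```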
Can IT-ist ignore performance?
IT-ist expects specifications
Short-run efficiency for given specs
get it working
but cycles/space cheap/available
Change? Plan for unplanned changes
not trained/rewarded
attitude
lack vision
Togetherness
Communication
Anchored/Enabled/Rewarded
Vocabulary
Mantra: what do we mean by one of this type?
Data Model
What to remember, not how
Fine distinctions [singular/plural]
disease vs affliction
host vs pathogen
Multi, not single function, so not partition cluster
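One reading of "multi, not single function, so not partition cluster" is that an item with several functions must be allowed to sit in several groups at once, which a partition forbids. The sketch below assumes that reading; the item and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (item, function) pairs; one item may carry several functions.
annotations = [
    ("protein_A", "binding"),
    ("protein_A", "catalysis"),
    ("protein_B", "binding"),
]


def group_by_function(pairs):
    """Build overlapping groups: each function maps to the set of items that have it."""
    groups = defaultdict(set)
    for item, function in pairs:
        groups[function].add(item)
    return dict(groups)


groups = group_by_function(annotations)

# A partition would force protein_A into exactly one group; here it appears in two.
overlap = groups["binding"] & groups["catalysis"]
print(sorted(overlap))  # ['protein_A']
```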
Hit Limits
DBMS Extensions
“manual” brain image manipulations
new content-neutral operators
“this” is a special case of what more general task
constant vector
multi-hull
not “the” query; parachute in then explore territory
Interdisciplinary Impedance Mismatch
Mundane vs interesting
Messy problem (seeking insights) vs optimal solution (irrelevant but hopeful)
Good clusters/fast algorithm/DB not directly a Bio goal
Some professional danger but big potential reward
Good Work
Expect to struggle to communicate
invent vocabulary
define verbs
Seek visionary colleagues