Publishing 3.0, or: Why we will all be disintermediated
(and that is a good thing!)
Anita de Waard, Disruptive Technologies Director, Elsevier Labs, Burlington, VT (= not what the program says!)
AAMC GREAT/GRAND Meeting, September 21, 2012
May 10, 2015
What's the big deal with big data?
• "Decoding the human genome involves analysing 3 billion base pairs; it took ten years the first time it was done, in 2003, but can now be achieved in one week." (Data, Data Everywhere, The Economist, February 25, 2010)
• "Mobile Internet devices will outnumber humans this year, Cisco predicts… Global mobile data traffic is expected to increase 18-fold over the next five years to 10.8 exabytes per month. Cloud traffic is expected to account for 71%, or 7.6 exabytes per month, of total mobile data traffic by 2016."
• "'Big data' offers huge challenges for biomedicine in an era of massive data sets…" (Francis Collins, Director of NIH, yesterday)
• Facebook stores 100 petabytes in Hadoop.
Your funders are telling you to share your data:
• NSF Data Sharing Policy: Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of the work under NSF grants.
• NIH Data Sharing Policy: Final Research Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data. Final Research Data means recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables; rather, it means the data on which summary statistics and tables are based.
So are you sharing your data? Really?
Creating more data by the minute.
[Figure: clickstream map of visitor paths through a website, from Home and Search to pages such as Policies & docs, Employment law, Employment law reference manual, What's new, Legal guidance, Legal reports and Statutory rates, labelled with the percentage of visitors taking each path and per-page statistics (time, age, bounce rate, N).]
This plant tweets!
• Internet of things: we can interact with 'objects that blog', or 'Blogjects', that:
– track where they are and where they've been;
– have histories of their encounters and experiences;
– have agency;
– have a voice on the social web.
Larry Smarr creates lots of data:
• He wears:
– a Fitbit to count his every step;
– a Zeo to track his sleep patterns;
– a Polar WearLink that lets him regulate his maximum heart rate during exercise.
• 23andMe analyzed his DNA for disease susceptibility.
• Your Future Health analyzed blood and stool samples for 100 biomarkers:
– At one point, C-reactive protein (CRP) stood out as higher than normal.
– A blood test showed that his CRP had climbed to 14.5 during the attack.
– He took antibiotics, the symptoms resolved, and his CRP dropped to 4.9, but that was still unusually high.
– Lactoferrin, too, rose several times to sky-high levels (200, whereas the normal count is less than 7.3), in tandem with CRP.
• Smarr now thinks his diverticulitis attack was actually Crohn's disease, and his gastroenterologist (reluctantly) agreed.
Clearity Foundation: A translational medicine and public service foundation for:
• Providing doctors access to molecular profiling for their ovarian cancer patients
• Providing doctors and patients clinical trial options informed by individual tumor biology
• Providing financial support for the profiling work for patients
Oprah approved!
As are lots of other 'Quantified Selfers':
But who uses all that data?
• It knows where you are
• And who you talked to
• And what you bought
• And how much you paid
• And whether you need another pair of shoes
• And when and where you can get them…
does!
Brittany Wenger does!
17-year-old Brittany Wenger developed a cloud-based neural network that can seamlessly and accurately assess tissue samples for signs of breast cancer, giving more credence to the currently used (less reliable) minimally invasive procedure called Fine Needle Aspiration (FNA). By looking at nine different input features and comparing them to the training examples, Brittany's cloud-based neural network can detect malignant breast tumors with an accuracy of 99.11%. Because her neural network is deployed in the cloud using Google App Engine, it can be accessed from existing medical systems as well as through a web browser or mobile apps.
Winner of the Google Science Fair 2012
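To make "looking at nine input features and comparing them to training examples" concrete, here is a single-layer perceptron, the simplest kind of neural network, trained on labelled samples. This is a swapped-in illustration, not Wenger's model (hers was a more elaborate network running on Google App Engine). The feature values are invented toy numbers on an assumed 1 to 10 scale, not real FNA measurements.

```python
from typing import List, Sequence, Tuple

Sample = Sequence[float]


def predict(weights: Sequence[float], bias: float, x: Sample) -> int:
    """Return 1 (malignant) if the weighted sum is positive, else 0 (benign)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples: List[Sample], labels: List[int],
                     max_epochs: int = 10_000) -> Tuple[List[float], float]:
    """Classic perceptron rule: nudge the weights towards each misclassified sample.

    Guaranteed to converge when the two classes are linearly separable.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:  # every training sample is now classified correctly
            break
    return weights, bias


# Hypothetical toy data: nine features per sample; label 0 = benign, 1 = malignant.
BENIGN = [[1, 1, 2, 1, 1, 1, 2, 1, 1],
          [2, 1, 1, 1, 2, 1, 1, 1, 1],
          [1, 2, 1, 2, 1, 1, 1, 1, 2]]
MALIGNANT = [[9, 10, 9, 8, 10, 10, 9, 9, 8],
             [10, 9, 10, 9, 9, 8, 10, 9, 9],
             [8, 9, 10, 10, 9, 9, 8, 10, 9]]

if __name__ == "__main__":
    X = BENIGN + MALIGNANT
    y = [0] * len(BENIGN) + [1] * len(MALIGNANT)
    w, b = train_perceptron(X, y)
    accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
    print(f"training accuracy: {accuracy:.2%}")
```

Training accuracy on toy data says nothing about real diagnostic performance; a figure like 99.11% only means something when measured on held-out clinical samples.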
Using what is known about interactions in fly & yeast, predict new interactions with a human protein:
running over data on the web that he neither created nor knew about!
Mark Wilkinson does! Given a protein P in Species X:
Find proteins similar to P in Species Y
Retrieve interactors in Species Y
Sequence-compare Y-interactors with the Species X genome
Keep only those with a homologue in Species X → (1)
Find proteins similar to P in Species Z
Retrieve interactors in Species Z
Sequence-compare Z-interactors with (1)
→ Putative interactors in Species X
These are different Web services! (And none of them are Mark's.) They are selected at run-time based on the same model.
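The workflow above can be sketched as set operations. In Wilkinson's setting each step is a separate Web service discovered at run-time; here two hypothetical lookup tables, `HOMOLOGS` and `INTERACTIONS`, stand in for the similarity-search and interaction services, and every protein name is made up. Reading "sequence-compare Z-interactors with (1)" as "keep the Species X homologues of Z-interactors that are also in (1)" is an assumption.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-ins for remote services; all identifiers are invented.
# (protein, target species) -> similar proteins / homologues in that species
HOMOLOGS: Dict[Tuple[str, str], Set[str]] = {
    ("P", "Y"): {"Py"},
    ("P", "Z"): {"Pz"},
    ("Iy1", "X"): {"Ix1"},
    ("Iy2", "X"): {"Ix2"},
    ("Iz1", "X"): {"Ix1"},
    ("Iz2", "X"): {"Ix3"},
}
# protein -> known interaction partners within its own species
INTERACTIONS: Dict[str, Set[str]] = {
    "Py": {"Iy1", "Iy2"},
    "Pz": {"Iz1", "Iz2"},
}


def similar(protein: str, species: str) -> Set[str]:
    """Stand-in for a similarity-search service."""
    return HOMOLOGS.get((protein, species), set())


def interactors(protein: str) -> Set[str]:
    """Stand-in for an interaction-database service."""
    return INTERACTIONS.get(protein, set())


def homologues_in_x_of_interactors(p: str, x: str, other: str) -> Set[str]:
    """Map P into `other`, take its interactors there, and map them back to species x."""
    result: Set[str] = set()
    for p_other in similar(p, other):
        for partner in interactors(p_other):
            result |= similar(partner, x)  # partners with no homologue in x drop out
    return result


def putative_interactors(p: str, x: str, y: str, z: str) -> Set[str]:
    step1 = homologues_in_x_of_interactors(p, x, y)  # set (1), via species Y
    via_z = homologues_in_x_of_interactors(p, x, z)  # the same route via species Z
    return step1 & via_z                             # supported by both species
```

With these toy tables, `putative_interactors("P", "X", "Y", "Z")` returns `{"Ix1"}`: the only Species X protein supported by interaction evidence from both Y and Z.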
Running the web like an experiment:
Putting it another way:
Science is becoming distributed:
Tools
Thoughts
Data
Data is king!
• Data needs to say what it's about
• Data needs to say where it comes from
• Data needs to know who owns it
• Data needs to be sensitive to privacy
• Data needs to know how it's used
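One way to read the slide is as a checklist a dataset carries with it. The record below is a minimal sketch with one field per bullet; the class and field names are hypothetical, and real metadata schemas such as DataCite or Dublin Core define their own fields.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative, not a standard."""
    title: Optional[str]            # what the data is about
    keywords: List[str]             # subject terms that aid discovery
    source: Optional[str]           # where it comes from (instrument, study, parent dataset)
    owner: Optional[str]            # who owns it
    contains_personal_data: bool    # privacy flag
    usage_log: List[str] = field(default_factory=list)  # how it has been used

    def missing(self) -> List[str]:
        """Names of descriptive fields that are still empty."""
        return [f.name for f in fields(self)
                if f.name != "usage_log" and getattr(self, f.name) in (None, "", [])]

    def record_use(self, user: str, purpose: str) -> None:
        """Append a usage entry so the data 'knows how it's used'."""
        self.usage_log.append(f"{user}: {purpose}")
```

For example, a record created with an empty `keywords` list and no `source` reports `["keywords", "source"]` from `missing()`, which is the kind of gap a repository could refuse to accept.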
Tools rule!
• Tools can be made by everyone
• Tools are open and free
• Tools will know where data lives
• Tools need to know about data:
– Privacy/ownership
– Trustworthiness
– Provenance
If data and tools are ubiquitous, what matters most are the questions you ask:
• What is interesting?
• What is important?
• Who cares?
Science is becoming more distributed:
So where does that leave you?
How can you prepare (your students) for this future?
Well, you can't, not really. But there are a few habits you can instill (and model):
Habit #1: Be a good data producer
• Know that you are creating data
• Be aware of privacy and IPR issues regarding your data
• Assume that someone, some time, will be using this data for some purpose you cannot imagine
• Learn which data repositories exist in your field, how they work, and what they need from you
• Set up your work habits to automatically create (or force you to add) metadata to enable discovery and use of your data
• Store your data in the repositories. Every time.
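"Automatically create metadata" can be as simple as never saving a data file without a description next to it. The sketch below writes a CSV plus a JSON sidecar recording creator, description, license, timestamp and a checksum; the function name, the `.meta.json` naming convention and the field names are assumptions, and a real repository will impose its own schema.

```python
import csv
import datetime
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Sequence


def save_with_metadata(rows: Sequence[Sequence[object]], path: Path, creator: str,
                       description: str, license_id: str) -> Path:
    """Write rows as CSV plus a JSON 'sidecar' describing them; return the sidecar path."""
    path = Path(path)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    meta = {
        "file": path.name,
        "creator": creator,
        "description": description,
        "license": license_id,
        "created_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "rows": len(rows),
        # the checksum lets a repository (or a future reader) verify the file is unchanged
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(meta, indent=2))
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        side = save_with_metadata([["animal", "latency_s"], ["r1", "12.5"]],
                                  Path(tmp) / "pain_test.csv", creator="A. Researcher",
                                  description="Hypothetical example data",
                                  license_id="CC-BY-4.0")
        print(side.read_text())
```

Because the metadata is written by the same call that writes the data, there is no separate step to forget.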
Habit #2: Be a good data consumer
• Find out which data exists that might be relevant to your work
• Learn how to query available data
• Be aware of privacy and IPR licenses
• Give credit where it's due:
– Cite any data sources that you use
– Share your knowledge on querying data
– Deposit any data you've derived from other data!
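The consumer habits fit together if every derived dataset remembers its inputs: querying, citing and depositing then share one provenance list. A minimal sketch, with hypothetical class and function names and made-up dataset identifiers (in practice these would be DOIs or accession numbers):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class Dataset:
    identifier: str  # hypothetical; in practice a DOI or accession number
    rows: List[Row]
    derived_from: List[str] = field(default_factory=list)  # provenance: upstream datasets


def query(ds: Dataset, keep: Callable[[Row], bool]) -> List[Row]:
    """Select the rows of one dataset that satisfy a condition."""
    return [r for r in ds.rows if keep(r)]


def derive(new_id: str, sources: List[Dataset], keep: Callable[[Row], bool]) -> Dataset:
    """Build a new dataset from several sources, recording which sources were used."""
    rows = [r for ds in sources for r in query(ds, keep)]
    return Dataset(new_id, rows, [ds.identifier for ds in sources])


def data_citations(ds: Dataset) -> List[str]:
    """One citation line per upstream dataset, e.g. for a methods section."""
    return [f"Data source: {src}" for src in ds.derived_from]
```

The derived `Dataset` is what you would deposit, and `data_citations` produces the credit lines for the sources it came from.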
Habit #3: Learn to code
• Brittany Wenger was born in 1995!
• All sorts of people are using technology that was invented after the birth of your oldest grandchild.
• Use anything at your disposal to learn:
– your students
– your kids
– online forums
– video tutorials
– etc., etc.
• E.g., the Coursera course on Clinical Research Informatics; see Cynthia Gadd (Vanderbilt)
Habit #4: Expect to keep learning
• This will only get worse! (Or: better?)
• Listen to Douglas Engelbart (he invented the mouse, and pioneered the on-screen cursor and computer-supported collaborative work):
– "[For] improving the intellectual effectiveness of the individual human being…[o]ne of the tools that shows the greatest immediate promise is the computer…" (1962)
– "The grand challenge is to boost the collective IQ of organizations and of society." (2000)
• Expect to keep learning, from anyone and anywhere: the only thing that can limit your success is the idea that you can't/don't have to learn/change/adapt/evolve
Richard Feynman on Scientific Integrity: If you're doing an experiment, you should report everything that you think might make it invalid, not only what you think is right about it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
Habit #5: Don't find what you already know.
Habit #6: Anyone can come up with a great idea.
• To paraphrase Anton Ego's review of Remy the rat's cooking (Ratatouille): 'Not everyone can be a great scientist, but a great scientist can come from anywhere'
• Grand challenges, hackathons, open invitations, etc. can offer great solutions to difficult problems (see Cameron for the story of Tim Gowers, who crowdsourced math)
• See also Collins’ talk yesterday: issues with race/ethnicity need to be overcome; involve students from around the world
• Involve K-12 students: get more kids excited about science!
Six habits that might help:
1. Be a good data producer
2. Be a good data consumer
3. Learn to code
4. Expect to keep learning
5. Don't find what you already know
6. Anyone can come up with a great idea!
Anyway, how are we going to publish all of this?
Not like this!
How are we going to publish all of this?
We’re not. YOU are.
(With support from 'us' = publishers, libraries, institutions, crowd…)
Maybe as Executable Papers…
Or by linking data to hospital information systems…
[Figure: Electronic Patient Records, Clinical Guideline, Data]
Step 1: Patient data + diagnosis link to Guideline recommendation
Step 2: Guideline recommendation links to evidence in report or data
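The two linking steps amount to following a chain of references: patient to diagnosis, diagnosis to recommendation, recommendation to evidence. A minimal sketch using in-memory dictionaries; every identifier is hypothetical, and a real system would use coded terms (e.g. ICD codes) and resolvable links into the hospital's records.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical identifiers throughout.
PATIENT_DIAGNOSES: Dict[str, str] = {"patient-001": "diagnosis-A"}
GUIDELINE_FOR: Dict[str, str] = {"diagnosis-A": "recommendation-7"}  # step 1 links
EVIDENCE_FOR: Dict[str, List[str]] = {
    "recommendation-7": ["report-123", "dataset-456"],               # step 2 links
}


def trace(patient_id: str) -> Tuple[Optional[str], List[str]]:
    """Follow patient -> diagnosis -> guideline recommendation -> supporting evidence."""
    diagnosis = PATIENT_DIAGNOSES.get(patient_id)
    recommendation = GUIDELINE_FOR.get(diagnosis) if diagnosis else None
    evidence = EVIDENCE_FOR.get(recommendation, []) if recommendation else []
    return recommendation, evidence
```

Each hop degrades gracefully: an unknown patient or an unlinked diagnosis yields `(None, [])` rather than an error.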
Or by creating Linked Data stores…
Images from: Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral, "Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism," Bioinformatics, Vol. 26 (ECCB 2010), pages i547–i553, doi:10.1093/bioinformatics/btq382
Step 1: Manually identify DDIs and drug names in a wide collection of content sources
Step 2: Develop a model of Drug-Drug Interaction and define candidates
Step 3: Automate this process and store as Linked Data
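Steps 2 and 3 can be sketched as a rule applied to subject-predicate-object triples, with the results serialized as N-Triples. The rule used here (if drug A inhibits an enzyme that metabolizes drug B, flag A and B as a candidate interaction) is a simplified assumption in the spirit of the metabolism-based reasoning in the cited paper, not its actual model; the drug and enzyme names and the `example.org` namespace are hypothetical. A production system would more likely use an RDF library and a triple store; plain tuples keep the sketch dependency-free.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary

FACTS: Set[Triple] = {
    (EX + "drugA", EX + "inhibits", EX + "enzymeE"),
    (EX + "drugB", EX + "metabolizedBy", EX + "enzymeE"),
    (EX + "drugC", EX + "metabolizedBy", EX + "enzymeF"),
}


def infer_ddis(triples: Set[Triple]) -> Set[Triple]:
    """Rule: if A inhibits enzyme E and B is metabolized by E, flag (A, B) as a candidate."""
    inhibits = {(s, o) for s, p, o in triples if p == EX + "inhibits"}
    substrates = {(s, o) for s, p, o in triples if p == EX + "metabolizedBy"}
    return {
        (a, EX + "mayInteractWith", b)
        for a, enzyme in inhibits
        for b, enzyme_b in substrates
        if enzyme == enzyme_b and a != b
    }


def to_ntriples(triples: Set[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))
```

Here `to_ntriples(infer_ddis(FACTS))` yields a single line linking `drugA` to `drugB`; `drugC` is untouched because nothing inhibits `enzymeF`.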
Or by grafting stories onto your data…
1. Add metadata to everything
2. Use a workflow tool
3. Write in a shared space
4. Invite reviews
5. The reviewer approves (or comments, author revises, etc.)
6. Run nifty apps over all of this.
(Calculate, coordinate… Compile, comment, compare…)
[Figure: a shared draft moving through Review, Edit and Revise, with the sample text "Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-"]
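The review loop in steps 3 to 5 behaves like a small state machine: a draft goes out for review, comes back for revision any number of times, and ends approved. A minimal sketch; the state and action names are hypothetical, not taken from any particular workflow tool.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class State(Enum):
    DRAFT = "draft"          # step 3: written in the shared space
    IN_REVIEW = "in review"  # step 4: reviews invited
    REVISING = "revising"    # reviewer commented, author revises
    APPROVED = "approved"    # step 5: reviewer approves


TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.DRAFT, "invite_reviews"): State.IN_REVIEW,
    (State.IN_REVIEW, "comment"): State.REVISING,
    (State.REVISING, "resubmit"): State.IN_REVIEW,
    (State.IN_REVIEW, "approve"): State.APPROVED,
}


def step(state: State, action: str) -> State:
    """Apply one action, rejecting moves the workflow does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot '{action}' while {state.value}") from None


def run(actions: Iterable[str], state: State = State.DRAFT) -> State:
    """Replay a sequence of actions from a starting state."""
    for action in actions:
        state = step(state, action)
    return state
```

Keeping the allowed moves in one table makes it easy for a workflow tool (step 2) to log every transition as metadata (step 1) for later apps (step 6).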
Or in other ways…
• Force11.org: 'Future of Research Communications and e-Science':
– A 'society' for thinking about new ways of communicating science and the humanities
– Inviting general participation
– Please join!
In summary:
• Big data and linked tools are completely changing the face of science by distributing the creation of data, the building of tools, and the intelligent use of both
• Social media and open education are changing who can do science, and how it is done
• Publishing all of this will not be a simple act, and not something publishers can do alone.
• All of this offers tremendous opportunities to expand the practice and promise of science
• The best thing you can do is prepare to be amazed…
P.S.: Do we have any jobs for your graduates? Maybe! Some intriguing ideas:
• Internships/traineeships?
• Use cases for classes on informatics, e.g.:
– Elsevier provides content/ontologies
– Students develop ways to integrate data and publications
– Students help with user testing/UI and model development
• Host joint grand challenges?
• Certainly there will be lots of work in the informatics arena: with publishers, digital repositories, startups, etc.
Questions?