CENDI wilbanks

May 17, 2015

john wilbanks

Talk given to the meeting of the CENDI group in early November 2013. CENDI is a volunteer-powered membership organization that serves the federal information community - that is, all those who create, manage, aggregate, organize, and provide access to federally-funded data and publications resulting from the nation’s $150 billion annual investment in federal R&D. Member organizations represent a cross-section of federal data and publication providers, including libraries, data centers, aggregators, information technology developers, and content management providers.
Transcript
Page 2: CENDI wilbanks

1. the policy environment. it is not sufficient.

Page 5: CENDI wilbanks

http://www.systemswiki.org/images/8/8a/Wisdom.png

Page 13: CENDI wilbanks

“is it open?” is perhaps not the right frame.

Page 14: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

Page 16: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

EASY TO USE

NO OPEN LICENSE

Page 20: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

NO OPEN LICENSE

DOWNLOAD AVAILABLE

DOCUMENTATION IN PDF

Page 21: CENDI wilbanks

2. doing research in the open: early returns. it is not sufficient.

Page 22: CENDI wilbanks

“how accurately can we predict if a female breast cancer survivor will develop a second tumor?”

Page 23: CENDI wilbanks

may the best (statistical) model win

Page 24: CENDI wilbanks

code sharing a prerequisite.

Page 26: CENDI wilbanks

accuracy of model jumped three orders of magnitude in nine days.

Page 27: CENDI wilbanks

76% accurate.

Page 28: CENDI wilbanks

(not a biologist)

Page 29: CENDI wilbanks

21 february 2013

17 april 2013

ongoing...

Page 33: CENDI wilbanks

SHOW ME THE CODE!

Page 35: CENDI wilbanks

...

Page 36: CENDI wilbanks

...

Page 37: CENDI wilbanks

...

Page 38: CENDI wilbanks

...

Page 39: CENDI wilbanks

...

Page 43: CENDI wilbanks

if we don’t have the article in machinable form with rights to transform? doesn’t happen.

Page 44: CENDI wilbanks

can we predict clinical utility from genetics of arthritis?

Page 45: CENDI wilbanks

can we predict scores on alzheimer’s cognitive tests from existing data?

Page 48: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

0 25 25 25 25

THREE OPTIONS TO DOWNLOAD

NO CLEAR LICENSE

PRIVACY RESTRICTIONS

METADATA

Page 49: CENDI wilbanks

accessibility

adaptability

leverage

ease of mastery

IMPACT OF PRIVATE INTERVENTION

Page 50: CENDI wilbanks

68 core projects

Page 51: CENDI wilbanks

248 researchers

Page 52: CENDI wilbanks

28 institutions

Page 53: CENDI wilbanks

1070 datasets

Page 54: CENDI wilbanks

1723 results

Page 55: CENDI wilbanks

Omberg, et al. Nature Genetics

Page 56: CENDI wilbanks

colorectal cancer subtyping

Page 57: CENDI wilbanks

A B C D E F

1 2 3 4 5 6

datasets, subtypes, analysis groups

Page 58: CENDI wilbanks

A B C D E F

1 2 3 4 5 6

datasets, analysis groups

G ...

subtypes

Page 59: CENDI wilbanks

analysis groups

G

Page 60: CENDI wilbanks

A B C D E F

1 2 3 4 5 6

datasets, analysis groups

G ...

subtypes

Page 61: CENDI wilbanks

3. research and culture are on a collision course, driven by data.

Page 62: CENDI wilbanks

tension between anonymity and utility.

Page 63: CENDI wilbanks

“more like plutonium than gold”

Page 64: CENDI wilbanks

tension between expectation and reuse.

Page 65: CENDI wilbanks

68% want their data shared for science

Page 66: CENDI wilbanks

tension between value of individual and value of aggregate.

Page 68: CENDI wilbanks

$.50 to $2.50 for SSN, birthdate, etc.

Page 69: CENDI wilbanks

$5 to $15 for credit, background checks.

Page 70: CENDI wilbanks

~40 records for $2100

Page 71: CENDI wilbanks

tension between “research” data and “consumer” data.

Page 74: CENDI wilbanks

https://www.scienceexchange.com/

Page 79: CENDI wilbanks

it’s likely that we will end up with a data network effect of some sort.

Page 80: CENDI wilbanks

a. the incremental institution.

Page 81: CENDI wilbanks

b. the walled garden.

Page 82: CENDI wilbanks

c. big networks of small things.

Page 84: CENDI wilbanks

thank you!

@wilbanks