CENDI wilbanks

Post on 17-May-2015

DESCRIPTION

Talk given to the meeting of the CENDI group in early November 2013. CENDI is a volunteer-powered membership organization that serves the federal information community - that is, all those who create, manage, aggregate, organize, and provide access to federally-funded data and publications resulting from the nation’s $150 billion annual investment in federal R&D. Member organizations represent a cross-section of federal data and publication providers, including libraries, data centers, aggregators, information technology developers, and content management providers.

Transcript

1. the policy environment. it is not sufficient.

http://www.systemswiki.org/images/8/8a/Wisdom.png

“is it open?” is perhaps not the right frame.

accessibility

adaptability

leverage

ease of mastery

EASY TO USE / NO OPEN LICENSE

accessibility

adaptability

leverage

ease of mastery

NO OPEN LICENSE / DOWNLOAD AVAILABLE / DOCUMENTATION IN PDF

2. doing research in the open: early returns. it is not sufficient.

“how accurately can we predict if a female breast cancer survivor will develop a second tumor?”

may the best (statistical) model win

code sharing a prerequisite.

accuracy of model jumped three orders of magnitude in nine days.

76% accurate.

(not a biologist)

21 february 2013

17 april 2013

ongoing...

SHOW ME THE CODE!

...

if we don’t have the article in machinable form with rights to transform? doesn’t happen.

can we predict clinical utility from genetics of arthritis?

can we predict scores on Alzheimer’s cognitive tests from existing data?

accessibility

adaptability

leverage

ease of mastery

THREE OPTIONS TO DOWNLOAD / NO CLEAR LICENSE / PRIVACY RESTRICTIONS / METADATA

accessibility

adaptability

leverage

ease of mastery

IMPACT OF PRIVATE INTERVENTION

68 core projects

248 researchers

28 institutions

1070 datasets

1723 results

Omberg et al., Nature Genetics

colorectal cancer subtyping

[figure, shown in three builds: items labeled A to G ... and 1 to 6, with the labels "datasets", "analysis groups", and "subtypes"]

3. research and culture are on a collision course, driven by data.

tension between anonymity and utility.

“more like plutonium than gold”

tension between expectation and reuse.

68% want their data shared for science

tension between value of individual and value of aggregate.

$.50 to $2.50 for SSN, birthdate, etc.

$5 to $15 for credit, background checks.

~40 records for $2100

tension between “research” data and “consumer” data.

https://www.scienceexchange.com/

it’s likely that we will end up with a data network effect of some sort.

a. the incremental institution.

b. the walled garden.

c. big networks of small things.

thank you!

@wilbanks wilbanks@nitrd.gov
