13 november 2008
ucla graduate school of education and information sciences
the knowledge web
knowledge gaps
process failures
transaction costs
lost opportunities
is the answer more information?
many information products advance
incrementally
the discovery process:
thanks to the products, we already know a lot...
we need information innovations and process innovations
to match product innovations.
1.
the “digital commons” represents a methodology that lowers the cost and
increases the volume of transactions at the “knowledge layer” of the net
does the ability to ask more questions, faster, lead us to more
knowledge or just more data?
what’s different about communications and computers?
1. we know stuff.
2. open networks.
physical
code
content
knowledge
knowledge rights
“the commons”
“digital commons”
interoperability
low transaction costs
law and technology
user interface to copyright
140,000,000+ digital objects online under our licenses
licenses “ported” to 50+ countries
integrated with Google, Yahoo, Firefox, Microsoft Office...
2.
the digital commons is a stable methodology to manage data,
materials, and content for science.
“think market”
project development
funding
pro bono
community
“do no harm”
“running code”
early focus on life sciences
exploring climate change, geospatial, elsewhere
what would move via the science network?
Open Access Content
making knowledge legally and technically available for re-use and composition into new knowledge.
we use digital tools to replicate paper technology
© covers creative expression
© does not cover ideas or facts
E = mc²
the container, not the facts.
but © locks the container.
IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases
http://orpheus-1.ucsd.edu/acq/license/cdlelsevier2004.pdf
indexing: disallowed.
image from the Public Library of Science, licensed to the public under CC-BY 3.0
>1000 journals under CC
PubMedCentral: ~1,000,000 articles
permissions granted: 50,000
(6% of PMC legal for transformative use)
(.003 of all PubMed records)
what do these ideas mean in
a world of integrated data?
creative work?
“So, out of all of this discussion my
question is whether ChemSpider is
Content or Data.” - Antony Williams
“The motivation behind this memorandum is interoperability of scientific data.”
is it legal?
1. Converge on the public domain by waiving all rights based on intellectual property.
2. Converge on the public domain by waiving other statutory or intellectual property rights.
3. Converge on the public domain by imposing no contractual controls.
4. Provide for interoperation with databases not available under the Protocol through open metadata.
a protocol, not a license.
conflicts with the protection instinct
the protection instinct is frequently an instinct to protect “freedom”
3.
we have to build infrastructure for data into the web of documents that we have.
solves the legal problem
but not the container problem.
web 2.0, science 3.0, what about making Google work better?
over 200 years at one paper/day
what you want is a list of genes.
not a list of documents.
building a web for data: the “semantic web”
Web page -- links to --> Web page
making computers understand links between documents
drinking coffee -- causes --> feel awake
making computers understand relationships between concepts
drinking coffee -- causes --> feel awake
http://ontology.foo.org/drinking coffee
http://ontology.foo.org/causes
http://ontology.foo.org/feel awake
http://ontology.foo.org/receptor
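A minimal sketch of what "relationships between concepts" can look like to a program: each statement is a (subject, predicate, object) triple whose parts are URIs. The `Triple` type and the `uri` and `objects` helpers are hypothetical names for illustration, and replacing spaces with underscores is an assumption (the slides write the names with spaces); `http://ontology.foo.org/` is the placeholder namespace from the slides.

```python
from typing import NamedTuple

# Placeholder namespace taken from the slides; not a real ontology.
BASE = "http://ontology.foo.org/"


class Triple(NamedTuple):
    """One statement: subject --predicate--> object, all given as URIs."""
    subject: str
    predicate: str
    obj: str


def uri(name: str) -> str:
    """Build a URI under BASE. Using underscores for spaces is an assumption."""
    return BASE + name.replace(" ", "_")


# The single statement from the slide: drinking coffee causes feeling awake.
graph = {
    Triple(uri("drinking coffee"), uri("causes"), uri("feel awake")),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {t.obj for t in graph
            if t.subject == subject and t.predicate == predicate}


effects = objects(graph, uri("drinking coffee"), uri("causes"))
print(effects)
```

Because the relationship itself has a URI, a program can ask "what does drinking coffee cause?" without parsing any prose.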
192.168.1.1
we need a Domain Name System for concepts:
http://sciencecommons.org
coffee -> http://ontology.foo.org/coffee
“coffee”, “cafe”, “kopi” -> http://ontology.foo.org/coffee
use the web to integrate information from different places and different names
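The "Domain Name System for concepts" idea can be sketched as a lookup from local names to one shared URI, after which records from different places can be grouped by concept. This is only an illustration: the `CONCEPT_URIS` table, the `resolve` and `integrate` helpers, and the source names are hypothetical, and a real resolver would be a networked service rather than an in-memory dictionary.

```python
# Hypothetical lookup table: several names, one concept URI (from the slides).
CONCEPT_URIS = {
    "coffee": "http://ontology.foo.org/coffee",
    "cafe": "http://ontology.foo.org/coffee",
    "kopi": "http://ontology.foo.org/coffee",
}


def resolve(label):
    """Map a human-readable label to its concept URI, or None if unknown."""
    return CONCEPT_URIS.get(label.strip().lower())


def integrate(records):
    """Group (source, label) records by the concept URI their label resolves to.

    Records whose label cannot be resolved are skipped.
    """
    merged = {}
    for source, label in records:
        concept = resolve(label)
        if concept is None:
            continue
        merged.setdefault(concept, []).append(source)
    return merged


# Three made-up sources using three different names for the same thing.
records = [("site-a", "coffee"), ("site-b", "cafe"),
           ("site-c", "kopi"), ("site-d", "tea")]
merged = integrate(records)
print(merged)
```

The grouping step never compares the labels with each other; it only compares the URIs they resolve to, which is the point of giving concepts stable names.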
[diagram: concept graph around “drink coffee”. nodes: bed, get out of bed, open eyes, drink coffee, drink, coffee, wet, cup, make coffee, pour coffee, pick up cup, feel awake, feel jittery, person, cafe, sugar. relations: located at, located in, first subevent, last subevent, subevent, after, is a, is for, property of, often near, wants, does not want, causes]
(too much work for coffee)
(distributed, networked approaches start to look pretty good)
Open Source Data Integration
formatting digital knowledge into modular building blocks for
composition into new knowledge.
e pluribus unum.
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>
select ?genename ?processname
where
{  graph <http://purl.org/commons/hcls/pubmesh>
     { ?paper ?p mesh:D017966 .
       ?article sc:identified_by_pmid ?paper.
       ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
     }
   graph <http://purl.org/commons/hcls/goa>
     { ?protein rdfs:subClassOf ?res.
       ?res owl:onProperty ro:has_function.
       ?res owl:someValuesFrom ?res2.
       ?res2 owl:onProperty ro:realized_as.
       ?res2 owl:someValuesFrom ?process.
       graph <http://purl.org/commons/hcls/20070416/classrelations>
         {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166}
           union
          {?process rdfs:subClassOf go:GO_0007166 }}
       ?protein rdfs:subClassOf ?parent.
       ?parent owl:equivalentClass ?res3.
       ?res3 owl:hasValue ?gene.
     }
   graph <http://purl.org/commons/hcls/gene>
     { ?gene rdfs:label ?genename }
   graph <http://purl.org/commons/hcls/20070416>
     { ?process rdfs:label ?processname}
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
we can transform complex queries into links
http://hcls1.csail.mit.edu:8890/sparql/?query=prefix%20go%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0Aprefix%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0Aprefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aprefix%20mesh%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Frecord%2Fmesh%2F%3E%0Aprefix%20sc%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fscience%2Fowl%2Fsciencecommons%2F%3E%0Aprefix%20ro%3A%20%3Chttp%3A%2F%2Fwww.obofoundry.org%2Fro%2Fro.owl%23%3E%0A%0Aselect%20%3Fgenename%20%3Fprocessname%0Awhere%0A%7B%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fpubmesh%3E%0A%20%20%20%20%20%7B%20%3Fpaper%20%3Fp%20mesh%3AD017966%20.%0A%20%20%20%20%20%20%20%3Farticle%20sc%3Aidentified_by_pmid%20%3Fpaper.%0A%20%20%20%20%20%20%20%3Fgene%20sc%3Adescribes_gene_or_gene_product_mentioned_by%20%3Farticle.%0A%20%20%20%20%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fgoa%3E%0A%20%20%20%20%20%7B%20%3Fprotein%20rdfs%3AsubClassOf%20%3Fres.%0A%20%20%20%20%20%20%20%3Fres%20owl%3AonProperty%20ro%3Ahas_function.%0A%20%20%20%20%20%20%20%3Fres%20owl%3AsomeValuesFrom%20%3Fres2.%0A%20%20%20%20%20%20%20%3Fres2%20owl%3AonProperty%20ro%3Arealized_as.%0A%20%20%20%20%20%20%20%3Fres2%20owl%3AsomeValuesFrom%20%3Fprocess.%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%2Fclassrelations%3E%0A%20%20%20%20%20%7B%7B%3Fprocess%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2Fobo%23part_of%3E%20go%3AGO_0007166%7D%0A%20%20%20%20%20%20%20union%0A%20%20%20%20%20%20%7B%3Fprocess%20rdfs%3AsubClassOf%20go%3AGO_0007166%20%7D%7D%0A%20%20%20%20%20%20%20%3Fprotein%20rdfs%3AsubClassOf%20%3Fparent.%0A%20%20%20%20%20%20%20%3Fparent%20owl%3AequivalentClass%20%3Fres3.%0A%20%20%20%20%20%20%20%3Fres3%20owl%3AhasValue%20%3Fgene.%0A%20%20%20%20%20%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fgene%3E%0A%20%20%20%20%20%7B%20%3Fgene%20rdfs%3Alabel%20%3Fgenename%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%3E%0A%20%20%20%20%20%7B%20%3Fprocess%20rdfs%3Alabel%20%3Fprocessname%7D%0A%7D&format=&maxrows=50
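Turning a query into a link is mostly percent-encoding. The sketch below uses only Python's standard `urllib.parse`; the parameter names `query`, `format` and `maxrows` are copied from the link on the slide, while `query_to_link`, `link_to_query` and the short stand-in query are hypothetical. Whether any given endpoint accepts these parameters is an assumption.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def query_to_link(endpoint, sparql, fmt="", maxrows=50):
    """Percent-encode a SPARQL query into a shareable GET link.

    quote_via=quote encodes spaces as %20 (not +), matching the slide's link.
    """
    params = {"query": sparql, "format": fmt, "maxrows": maxrows}
    return endpoint + "?" + urlencode(params, quote_via=quote)


def link_to_query(link):
    """Recover the original query text from a link built by query_to_link."""
    params = parse_qs(urlsplit(link).query, keep_blank_values=True)
    return params["query"][0]


# Endpoint as shown on the slide; the query is a short stand-in.
ENDPOINT = "http://hcls1.csail.mit.edu:8890/sparql/"
QUERY = ("select ?genename\n"
         "where { ?gene <http://www.w3.org/2000/01/rdf-schema#label> ?genename }")

link = query_to_link(ENDPOINT, QUERY)
print(link)
```

Since the encoding is reversible, a link is a complete, portable copy of the question: anyone who receives it can decode, read and edit the query.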
prefix go: <http://purl.org/obo/owl/GO#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix mesh: <http://purl.org/commons/record/mesh/>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix ro: <http://www.obofoundry.org/ro/ro.owl#>
select ?genename ?processname
where
{  graph <http://purl.org/commons/hcls/pubmesh>
     { ?paper ?p mesh:D009369 .
       ?article sc:identified_by_pmid ?paper.
       ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
     }
   graph <http://purl.org/commons/hcls/goa>
     { ?protein rdfs:subClassOf ?res.
       ?res owl:onProperty ro:has_function.
       ?res owl:someValuesFrom ?res2.
       ?res2 owl:onProperty ro:realized_as.
       ?res2 owl:someValuesFrom ?process.
       graph <http://purl.org/commons/hcls/20070416/classrelations>
         {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0006610}
           union
          {?process rdfs:subClassOf go:GO_0006610 }}
       ?protein rdfs:subClassOf ?parent.
       ?parent owl:equivalentClass ?res3.
       ?res3 owl:hasValue ?gene.
     }
   graph <http://purl.org/commons/hcls/gene>
     { ?gene rdfs:label ?genename }
   graph <http://purl.org/commons/hcls/20070416>
     { ?process rdfs:label ?processname}
}
we can help scholars “remix” queries
Mesh: Cancer
GO: Ribosomal Protein
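The two queries in the slides differ only in their MeSH and GO identifiers, so "remixing" can be as simple as swapping terms in the query text. A minimal sketch under that assumption: `remix` is a hypothetical helper, and `FRAGMENT` holds just the two lines of the query that carry the identifiers, not the full query.

```python
# Two lines from the first query on the slides; the rest is omitted here.
FRAGMENT = """\
?paper ?p mesh:D017966 .
{?process rdfs:subClassOf go:GO_0007166 }
"""


def remix(query, swaps):
    """Return a copy of `query` with each old term replaced by its new term.

    Raises ValueError if a term to be swapped does not occur in the query,
    so a typo in an identifier is caught instead of silently ignored.
    """
    for old, new in swaps.items():
        if old not in query:
            raise ValueError(f"term not found in query: {old}")
        query = query.replace(old, new)
    return query


# Swap in the identifiers used by the second query on the slides.
remixed = remix(FRAGMENT, {
    "mesh:D017966": "mesh:D009369",
    "go:GO_0007166": "go:GO_0006610",
})
print(remixed)
```

A scholar who cannot write the graph patterns can still change the question by changing two identifiers, which is what makes a shared query worth re-using.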
we can build a corpus of queries as links
we can re-use cultural tools for scholarship
we can make science user-driven.
4.
institutions have to provide a stable foundation for the knowledge web.
Parkinson’s
Huntington’s
ALS
Multiple Sclerosis
Autism
process revolutions: the network
Parkinson’s
Huntington’s
ALS
Multiple Sclerosis
Autism
institutional revolutions: the network
the library to me: location, structure, discovery, preservation
the infrastructure for this is very, very shaky.
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
what are the odds that the organizations making the namespaces will be here in 50 years? 100 years?
Parkinson’s
Huntington’s
ALS
Multiple Sclerosis
Autism
library
“In any case, it is clear that a library containing all possible books, arranged at random, is equivalent (as a source of
information) to a library containing zero books.”
http://en.wikipedia.org/wiki/The_Library_of_Babel
[chart: exponential content growth, 1990 to 2002, plotted against our brain capacity; y axis 0 to 5.00, units not given]
1. Books are for use.
2. Every reader his [or her] book.
3. Every book its reader.
4. Save the time of the User.
5. The library is a growing organism.
what’s the digital version of the five laws?
call to action:
1. join up with the semantic people - support discipline-driven namespaces and ontologies
2. queries are the interface - the average user doesn’t know how to ask complicated questions on the research web.
3. make the library the hub of the research web.
thank you
http://sciencecommons.org