Automatically indexing science using natural-language processing, RDF and SPARQL
Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust
February 16, 2008

SemanticCampLondon, 16th February 2008

May 12, 2015

My presentation at SemanticCamp London, 16th February 2008
Transcript
Page 1: SemanticCampLondon, 16th February 2008

Automatically indexing science using natural-language processing, RDF and SPARQL

Andrew Walkingshaw, Nick Day, Peter Corbett, Jim Downing, Joe Townsend, Peter Murray-Rust

February 16, 2008

Gathering data

Extracting (meta)data

Using the data

Thanks

Page 2: SemanticCampLondon, 16th February 2008

Data sources

• Supplemental and experimental data

• Journals

• Self-archived papers (e.g. arXiv)

• Mainstream journalism

• Blogs

Page 7: SemanticCampLondon, 16th February 2008

Supplemental data: CrystalEye

• http://wwmm.ch.cam.ac.uk/crystaleye/

• Repository for crystallographic data

Page 9: SemanticCampLondon, 16th February 2008

Journals and arXiv

• “Traditional” journal articles

• Titles and abstracts. . .

Page 11: SemanticCampLondon, 16th February 2008

Journalism and blogs

• Unstructured text with little semantics;

• . . . hence Google Scholar, Web of Science, etc.

Page 13: SemanticCampLondon, 16th February 2008

Semi-structured data: Golem

• We’ve got a lot of chemical data as CML

• http://en.wikipedia.org/wiki/Chemical_Markup_Language

• . . . but we still need to get data out of that and into a more useful form

• hence Golem: http://www.lexical.org.uk/science/golem/

• GRDDLish strategy for extracting data from CML files: identify dialect-specific concepts with XPath expressions and XSLT stylesheets

• upshot: we can extract JSON objects from CML files.
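
The XPath half of that strategy can be sketched in a few lines. This is not Golem's own code or API: the CML fragment, the `ex:` dictionary references and the concept names below are all made up for illustration, and only the Python standard library is used.

```python
import json
import xml.etree.ElementTree as ET

# Namespace commonly used by CML documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical CML fragment; the ex: dictRef values are invented.
SAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <module dictRef="ex:finalResults">
    <property dictRef="ex:totalEnergy">
      <scalar dataType="xsd:double" units="units:eV">-1234.5</scalar>
    </property>
    <property dictRef="ex:cellVolume">
      <scalar dataType="xsd:double" units="units:ang3">160.2</scalar>
    </property>
  </module>
</cml>"""

# Hypothetical "dialect": one XPath expression per concept we care about.
CONCEPTS = {
    "total_energy": ".//cml:property[@dictRef='ex:totalEnergy']/cml:scalar",
    "cell_volume": ".//cml:property[@dictRef='ex:cellVolume']/cml:scalar",
}


def extract_concepts(cml_text, concepts=CONCEPTS):
    """Return a JSON-ready dict of every concept found in the CML text."""
    root = ET.fromstring(cml_text)
    found = {}
    for name, xpath in concepts.items():
        node = root.find(xpath, CML_NS)
        if node is not None:
            found[name] = {"value": float(node.text), "units": node.get("units")}
    return found


observables = extract_concepts(SAMPLE_CML)
print(json.dumps(observables, indent=2, sort_keys=True))
```

Swapping in a different CONCEPTS table is all it takes to handle another dialect, which is the point of keeping the expressions as data rather than code.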

Page 19: SemanticCampLondon, 16th February 2008

Free text: OSCAR3

• http://oscar3-chem.sourceforge.net/

• Natural-language parser for documents about chemistry

• Dark magic: don’t ask me how it works!

• . . . but it can be run as a Jetty webservice so as long as it does, I’m happy

• Author’s blog: http://wwmm.ch.cam.ac.uk/blogs/corbett/

Page 24: SemanticCampLondon, 16th February 2008

Getting the data in

• Everything (more or less) talks RSS nowadays. . .

• RSS 0.91, RSS 1.0 (which one?), Atom, etc etc etc.

• Thankfully: feedparser (http://feedparser.org/)
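
A minimal sketch of what feedparser buys you: the same few lines work whichever RSS or Atom flavour the feed speaks. The helper names `entry_text` and `read_feed` are made up for this example, and the import is guarded only so the snippet still loads where feedparser isn't installed.

```python
try:
    import feedparser  # third-party: pip install feedparser
except ImportError:  # keep the helper below usable without it
    feedparser = None


def entry_text(entry):
    """Return (link, title, free text) for one parsed feed entry.

    feedparser normalises the different feed formats, so the same keys
    are looked up whatever the source format was.
    """
    title = entry.get("title", "")
    summary = entry.get("summary", "")
    return entry.get("link", ""), title, (title + " " + summary).strip()


def read_feed(url_or_xml):
    """Parse a feed (URL or raw XML string) and summarise its entries."""
    if feedparser is None:
        raise RuntimeError("feedparser is not installed")
    parsed = feedparser.parse(url_or_xml)
    return [entry_text(entry) for entry in parsed.entries]
```

The free text returned for each entry is what would then be handed to a text-mining step.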

Page 27: SemanticCampLondon, 16th February 2008

Serializing metadata

• RDF – using:

• Dublin Core terms

• A homebrew ontology based on the IUCr’s CIF data format

• and another homebrew ontology for OSCAR annotations

• (it’d be good to standardise these, but to be honest, not many people are doing this sort of thing)
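
To make the serialization step concrete, here is a rough sketch that writes N-Triples by hand using only the standard library. The two namespace URIs are the ones used elsewhere in this talk; the property name `cell_volume`, the example.org subject URI and the choice to store contributors as plain literals are assumptions for illustration, not the real ontology.

```python
DCTERMS = "http://purl.org/dc/terms/"
CE = "http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#"


def literal(value):
    """Serialise a value as a plain N-Triples string literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"%s"' % escaped


def triple(subject, predicate, obj):
    """One N-Triples line with a URI subject, URI predicate, literal object."""
    return "<%s> <%s> %s ." % (subject, predicate, literal(obj))


def describe_entry(uri, title, contributors, observables):
    """Dublin Core terms for the metadata, CE-namespace terms for the data."""
    lines = [triple(uri, DCTERMS + "title", title)]
    for name in contributors:
        lines.append(triple(uri, DCTERMS + "contributor", name))
    for key, value in sorted(observables.items()):
        # Hypothetical property names: the real dictionary terms may differ.
        lines.append(triple(uri, CE + key, value))
    return "\n".join(lines)


example = describe_entry(
    "http://example.org/entry/1",
    "A structure",
    ["A. Author"],
    {"cell_volume": 160.2},
)
print(example)
```

In practice a library such as rdflib would handle datatypes and other serializations; the hand-rolled version just shows how little is going on underneath.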

Page 32: SemanticCampLondon, 16th February 2008

The process

• For each feed in a list of feeds:

• If it’s supplying CML data, set Golem on each entry, get the observables out, and turn them into triples; run OSCAR3 over the title and/or abstract

• If it’s not, extract the free text from each entry, send it to the OSCAR web service, and assign triples based on the chemical entities OSCAR finds

• Upload the RDF to your triple store

• (I’m using the Talis platform, so that’s just curl)

• And. . .
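
The loop described above might look roughly like this. It is a sketch of the control flow only: the three callables are stand-ins for Golem, the OSCAR web service and the triple-store upload, and the entry keys and the `mentions` predicate are invented for the example.

```python
def index_feeds(feeds, extract_observables, find_entities, upload):
    """Walk every entry of every feed and upload the triples found for it.

    feeds: iterable of (supplies_cml, entries) pairs; each entry is a dict.
    The three callables are hypothetical stand-ins for Golem, OSCAR3 and
    the triple store.
    """
    for supplies_cml, entries in feeds:
        for entry in entries:
            subject = entry["link"]
            triples = []
            if supplies_cml:
                # Structured route: observables from the CML payload...
                observables = extract_observables(entry["cml"])
                for name, value in sorted(observables.items()):
                    triples.append((subject, name, value))
                # ...plus entity recognition over the title and/or abstract.
                parts = [entry.get("title"), entry.get("summary")]
                text = " ".join(part for part in parts if part)
            else:
                # Free-text route: the whole entry goes to entity recognition.
                text = entry.get("text", "")
            for entity in find_entities(text):
                triples.append((subject, "mentions", entity))
            upload(triples)


# Stubs so the flow can be exercised without any services running.
uploaded = []
index_feeds(
    feeds=[
        (True, [{"link": "http://example.org/a", "cml": "<cml/>",
                 "title": "benzene structure"}]),
        (False, [{"link": "http://example.org/b", "text": "notes on ethanol"}]),
    ],
    extract_observables=lambda cml: {"cell_volume": 160.2},
    find_entities=lambda text: [w for w in text.split()
                                if w in {"benzene", "ethanol"}],
    upload=uploaded.append,
)
```

Passing the services in as callables keeps the loop testable and leaves the choice of triple store open.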

Page 38: SemanticCampLondon, 16th February 2008

SPARQL is great.

Just post queries at a SPARQL endpoint:

authortemplate='''
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ce: <http://wwmm.ch.cam.ac.uk/crystaleye/dictionary#>
DESCRIBE ?file WHERE { ?file dc:contributor some author . }
'''
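
Filling in such a template and sending it might look like the sketch below, which follows the standard SPARQL protocol (the query travels in a `query` parameter). The endpoint URL is a placeholder, and treating the author as a plain string literal is an assumption about how the data is stored.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

AUTHOR_TEMPLATE = """PREFIX dc: <http://purl.org/dc/terms/>
DESCRIBE ?file WHERE { ?file dc:contributor %s . }"""


def sparql_literal(text):
    """Quote a Python string as a SPARQL string literal."""
    return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')


def build_query_url(endpoint, author):
    """URL for a GET request carrying the filled-in query."""
    query = AUTHOR_TEMPLATE % sparql_literal(author)
    return endpoint + "?" + urlencode({"query": query})


def describe_author(endpoint, author):
    """Send the query and return the raw RDF/XML response body."""
    request = Request(
        build_query_url(endpoint, author),
        headers={"Accept": "application/rdf+xml"},
    )
    with urlopen(request) as response:
        return response.read()


# Placeholder endpoint: substitute the SPARQL URL of your own store.
url = build_query_url("http://example.org/sparql", "A. Author")
```

Escaping the author name before substitution matters as soon as the value comes from a web form.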

Page 39: SemanticCampLondon, 16th February 2008

SPARQL isn’t (entirely) great.

• Scientists shouldn’t have to know this stuff.

• So we need to build a front end which your average senior academic might be able to use. . .

• (i.e. it’s got to look like a website.)

Page 42: SemanticCampLondon, 16th February 2008

What queries do we want?

• What experimental data is an author responsible for?

• What chemical entities are in some data?

• Where is a given chemical entity talked about?

• So we can build a web app around these queries.

• django + rdflib + sparql + Talis Platform
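
One way a web app could be organised around those three questions is a small table of query templates, one per question. This is only a sketch: `dc:contributor` is taken from the talk, but the `ex:` namespace and its `mentions` predicate are placeholders, not the homebrew ontology's real terms.

```python
from string import Template

PREFIXES = """PREFIX dc: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/terms#>
"""

# One template per question; ex:mentions is a hypothetical predicate.
QUERIES = {
    "data_by_author": Template(
        PREFIXES + "SELECT ?file WHERE { ?file dc:contributor $author . }"
    ),
    "entities_in_data": Template(
        PREFIXES + "SELECT ?entity WHERE { $file ex:mentions ?entity . }"
    ),
    "data_about_entity": Template(
        PREFIXES + "SELECT ?file WHERE { ?file ex:mentions $entity . }"
    ),
}


def render(question, **terms):
    """Fill in one template; terms must already be valid SPARQL terms."""
    return QUERIES[question].substitute(**terms)


query = render("entities_in_data", file="<http://example.org/entry/1>")
```

Each view in the front end then only has to pick a question, validate the user's input, and hand the rendered query to the endpoint.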

Page 47: SemanticCampLondon, 16th February 2008

Demo!

And here it is.

Page 48: SemanticCampLondon, 16th February 2008

Thanks to. . .

• Talis (http://n2.talis.com/) for access to their platform

• and to the RSC and IUCr for their support of CrystalEye.
