Crawling Across the Web of Chemistry Using ChemSpider

May 10, 2015

ChemSpider is a free-access website for chemists, built to provide a structure-centric community. It was developed to index available sources of chemical structures and their associated data into a single searchable repository and to make that repository available to everybody at no charge. While a large number of databases containing chemical compounds and data are available online, their inherent quality, accuracy and completeness are severely lacking. ChemSpider provides a platform through which the chemistry community can help improve the quality of data online and expand it to include reaction syntheses, analytical data, experimental properties and links to other valuable resources. It has grown into a resource containing over 21 million unique chemical structures from over 200 data sources.
This presentation will provide an overview of ChemSpider, its value to chemists as a search tool and as a public repository of information, and how it can become one of the primary foundations of internet-based chemistry. I will also discuss the vision for ChemSpider and some of the lofty goals we are setting for the system moving forward.
Transcript
Page 1

Crawling Across the Web of Chemistry Using ChemSpider

Page 2

Citizen Scientists Enable the Web

Who is writing about chemical compounds on Wikipedia?

Who is writing critical reviews of Chemistry online?

Who is blogging about chemistry on the web?

Page 3

For Synthesis…TotallySynthetic.com

Page 4

Org Prep Daily (Blog)

Page 5

Molbank (Open Access Journal)

Page 6

Synthetic Pages (Website)

Page 7

Encyclopedic Articles (Wikipedia)

Page 8
Page 9

Chemistry online – An Overview

Encyclopedic articles (Wikipedia)

Chemical vendor databases

Metabolic pathway databases

Property databases

Chemical synthesis procedures

Scientific publications

Chemical vendors

Blogs

Wikis

Open Notebook Science

Page 10

What and who do you trust?

Page 11

Compounds and Identifiers

Page 12

What is ChemSpider?

Building a Structure Centric Community for Chemists

ChemSpider is:

>23 million compounds, ca. 250 data sources

A deposition and curation platform

A publishing platform for the community

Grows daily – more depositions, more links, more data sources

Page 13

Search Cholesterol

Page 14

Search Cholesterol

Page 15

Search Cholesterol

Page 16

Search Cholesterol

Page 17

Search Cholesterol

Page 18

Linked across the internet

Page 19

Link off a structure in ChemSpider

Chemical suppliers

Other publications

Analytical Data

Related Reactions

Wikipedia

Patents

“Everything”

Page 20

Linked to Millions of Articles

Page 21

Answering Questions for Chemists

Questions a chemist might ask…

What is the melting point of n-butanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 22

What is the structure of Flibanserin?

Page 23

What is the structure of Flibanserin?

Page 24

Complex Data and Information

Page 25

Various Searches

Structure searching

Substructure searching

Subset searching – choose from 200 data sources

Property searching

Searches are used in various ways by different types of chemists…
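The slides list the search types without describing how ChemSpider implements them. As a rough illustration of what a substructure query asks ("does this fragment occur anywhere inside that molecule?"), here is a minimal sketch using backtracking subgraph matching. `Mol` and `substructure_match` are hypothetical names, and the toy representation (heavy atoms only, implicit hydrogens, bond orders and aromaticity ignored) is an assumption made for brevity, not how a production cheminformatics engine works.

```python
from typing import Dict, List, Optional, Set, Tuple


class Mol:
    """Toy molecular graph (hypothetical): element symbols plus undirected bonds.

    Hydrogens are implicit and bond orders are ignored, a deliberate simplification.
    """

    def __init__(self, atoms: List[str], bonds: List[Tuple[int, int]]):
        self.atoms = atoms
        self.adj: Dict[int, Set[int]] = {i: set() for i in range(len(atoms))}
        for a, b in bonds:
            self.adj[a].add(b)
            self.adj[b].add(a)


def substructure_match(query: Mol, target: Mol) -> Optional[Dict[int, int]]:
    """Return one mapping of query atom indices onto target atom indices, or None.

    Backtracking search: each query atom maps to a distinct target atom with the
    same element, and every query bond must also be a bond in the target.
    """
    mapping: Dict[int, int] = {}

    def extend(q: int) -> bool:
        if q == len(query.atoms):
            return True  # every query atom has been placed
        used = set(mapping.values())
        for t in range(len(target.atoms)):
            if t in used or target.atoms[t] != query.atoms[q]:
                continue
            # bonds from q to already-mapped query atoms must exist at t
            if all(mapping[n] in target.adj[t] for n in query.adj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]  # undo and try the next candidate
        return False

    return dict(mapping) if extend(0) else None


# n-butanol as a heavy-atom chain C-C-C-C-O; query: a carbon bonded to an oxygen
butanol = Mol(["C", "C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3), (3, 4)])
c_o = Mol(["C", "O"], [(0, 1)])
print(substructure_match(c_o, butanol))  # {0: 3, 1: 4}
```

Real systems typically screen candidate structures with fingerprints first and only run an atom-by-atom match like this on the survivors, because exhaustive graph matching across millions of records is expensive.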

Page 26

ChemSpider Searches

Page 27

ChemSpider Searches

Page 28

Caution! Question Everything!

Page 29

Vancomycin

Who will curate?

PubChem is not resourced to clean these errors

How would you clean such a large dataset?

Page 30

Vancomycin on ChemSpider

1 compound – discussions over 3 days

Page 31

The EXPERTS must get it right?!

Page 32

Wikipedia, C&E News, PubChem

C&E News (from ACS)

Page 33

“Lathosterol”

Page 34

“Lathosterol”

Page 35

“Lathosterol”

Page 36

“Lathosterol” Removed

Page 37
Page 38

“Lathosterol” on PubChem

Page 39

Crowd-sourcing Chemistry Curation

Crowd-sourced curation: identify and tag errors, edit names and synonyms, identify records to deprecate

Page 40

Citizen Scientists

Page 41

Become a Data Source

Page 42
Page 43

Synthesis Procedures

Page 44

Links to Data or Deposit Data

Page 45

Your Blog Posted Online?

Page 46

Upload Spectral Data, OPEN Data?

Page 47

Data as DOIs

Primary Data for Chemistry Available for the First Time

…Thieme is the first publisher to make primary chemistry data accessible worldwide

Analytical data, from various experiments, is the foundation of research work and scientific papers

From now on, primary data will be registered and made available online using digital object recognition in the form of Digital Object Identifiers (DOI)
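The slide does not say how these data DOIs are minted or which registration agency issues them. As general background only: any registered DOI becomes a web link by prefixing it with the `https://doi.org/` resolver, which is what lets a reader follow a DOI to its data. The sketch below validates a DOI against a simplified pattern (an assumption that covers most modern DOIs but not every form the standard allows) and builds that resolver URL. `doi_to_url` is a hypothetical helper, and `10.1000/182` is the DOI of the DOI Handbook, used here purely as an example.

```python
import re
from urllib.parse import quote

# Simplified DOI pattern (assumption): "10." + a numeric registrant code, "/",
# then any non-space suffix chosen by the registrant.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Validate a DOI string and return its doi.org resolver URL."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path; keep "/" literal.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1000/182"))  # https://doi.org/10.1000/182
```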

Page 48

Linking Data By DOI

Page 49

Semantic Mark-up for Chemistry

Semantic mark-up for chemistry is here

RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies), based on the OSCAR system

ChemSpider Journal of Chemistry

Nature Publishing Group compound linking

Page 50

ChemSpider and Publishing

Curation led to a set of validated dictionaries

Integrated entity extraction with validated name dictionaries

Additional dictionaries covered reactions, groups, families, hardware and software vendors, etc.
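The slide does not describe how the entity extraction works internally. As one plausible illustration of extraction driven by validated name dictionaries, the sketch below tags dictionary names in free text using greedy longest match, so a longer entry such as "sodium chloride" wins over "sodium". The dictionary contents, the placeholder record IDs and the name `extract_entities` are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical "validated dictionary": lower-case name -> placeholder record ID.
# The IDs are invented for illustration; they are not ChemSpider identifiers.
NAMES: Dict[str, str] = {
    "n-butanol": "ID-001",
    "cholesterol": "ID-002",
    "sodium": "ID-003",
    "sodium chloride": "ID-004",
}

TOKEN = re.compile(r"[\w-]+")  # words, keeping hyphenated names like n-butanol whole


def extract_entities(text: str, names: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag dictionary names in text using greedy longest match.

    Returns (surface_text, record_id) pairs in the order they appear.
    """
    max_len = max(len(n.split()) for n in names)
    spans = [(m.start(), m.end()) for m in TOKEN.finditer(text)]
    hits: List[Tuple[str, str]] = []
    i = 0
    while i < len(spans):
        # try the longest token window first, so "sodium chloride" beats "sodium"
        for width in range(min(max_len, len(spans) - i), 0, -1):
            start, end = spans[i][0], spans[i + width - 1][1]
            surface = text[start:end]
            key = re.sub(r"\s+", " ", surface).lower()
            if key in names:
                hits.append((surface, names[key]))
                i += width
                break
        else:
            i += 1  # no dictionary name starts at this token
    return hits


print(extract_entities("Sodium chloride was added to n-butanol, then cholesterol.", NAMES))
# [('Sodium chloride', 'ID-004'), ('n-butanol', 'ID-001'), ('cholesterol', 'ID-002')]
```

Comparing the original text span, rather than re-joined tokens, means punctuation between words (as in "sodium, chloride") blocks a multi-word match instead of silently producing one.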

Page 51

ChemMantis and CJOC

Page 52

Name-Structure Pairs

Page 53

Deposit Structures

Page 54

Species – linked to Wikipedia

Page 55

Semantic Linking of Structures

What would you want to link off a structure? Chemical suppliers Other publications Analytical Data Related Reactions Wikipedia Patents “Everything”

Page 56

RSC’s Project Prospect

Page 57

In Development ChemSpider Synthesis

ChemSpider Synthesis will be a home for all things “synthetic”

An online resource for synthetic procedures from blogs, other online resources, RSC supplementary info, other publishers etc.

Public peer-review and feedback for synthetic procedures

Page 58

RSC Supplementary Info

Page 59

Online Journals and Live Data

Page 60

ChemSpider Everywhere: Embed

Page 61

ChemSpider Everywhere: Spectral Game

Page 62

ChemSpider Everywhere: Crowdsourced Curation of Spectra

Page 63

ChemSpider Everywhere: ChemMobi

Page 64

ChemSpider Web Services

Page 65

ChemSpider Everywhere

Linked from Wikipedia

Linked from Open Notebook Science sites

Linked from Blogs using Structure/Spectra

Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets

Page 66
Page 67

Where is ChemSpider Lacking?

ChemSpider is limited to “defined chemicals”. No support for:

Polymers

Minerals

Markush structures

ChemSpider is very dependent on InChIs:

Stereochemistry around non-carbon centers

Organometallics are not correctly represented

There are millions of errors on ChemSpider
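One way to read "very dependent on InChIs" is that the InChI acts as the key under which depositions from different sources are merged into a single compound record. The sketch below assumes exactly that; the slides do not describe ChemSpider's internal deduplication. Records with an identical InChI string collapse into one entry that collects every source and synonym. The same mechanism explains the weakness listed above: any structural distinction the InChI does not capture is lost at merge time. The function `merge_by_inchi` and the source names are hypothetical; the two InChI strings are the standard InChIs of ethanol and methanol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (data source, name as deposited, standard InChI)


def merge_by_inchi(records: Iterable[Record]) -> Dict[str, Dict[str, Set[str]]]:
    """Collapse depositions that share an InChI into one entry.

    Each entry collects every source and every name (synonym) seen for it.
    """
    merged: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"sources": set(), "names": set()}
    )
    for source, name, inchi in records:
        entry = merged[inchi.strip()]
        entry["sources"].add(source)
        entry["names"].add(name)
    return dict(merged)


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

# Source names are hypothetical placeholders.
deposits = [
    ("source-A", "ethanol", ETHANOL),
    ("source-B", "ethyl alcohol", ETHANOL),
    ("source-B", "methanol", METHANOL),
]
compounds = merge_by_inchi(deposits)
print(len(compounds))                       # 2
print(sorted(compounds[ETHANOL]["names"]))  # ['ethanol', 'ethyl alcohol']
```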

Page 68

What’s next?

Keep cleaning and depositing data

Enable discovery via the semantic web (RDF)

Integrate software: Symyx JDraw, NMRShiftDB

Integrate RSC content – a massive archive!

Integrate RSC publishing workflows and databases

Page 69

Continue Building Community for Chemistry

Building a Public ADME/Tox database

Delivering ChemSpider Synthetic Pages

Delivering ChemSpider Analytical Data

Delivering ChemSpider Education

Project Focus

Page 70

People Make Change Happen

You are invited:

Curate ChemSpider data and link to us

Deposit your data with us:

Structures

Spectra

Synthesis procedures

ChemSpider Synthesis is under development

Page 71

People Make Change Happen

ChemSpider was a “hobby project”

Housed in a basement and running off three servers – one bought, two built

Sensitive to weather and power stability

Went live at ACS Spring 2007 in Chicago

ca. 6000 visitors a day, >50,000 transactions daily

Page 72

Organizations Scale Innovation

Page 73

There is a Downside…

Page 74

There is a Downside…

Page 75

Thank you

[email protected]: ChemSpiderman

www.chemspider.com/blog

SLIDES: www.slideshare.net/AntonyWilliams