How community crowdsourcing and social networking is helping to build a quality online resource for chemists
Dec 13, 2014
A Pragmatic Vision: “Build a Structure-Centric Community to Serve Chemists”
- Integrate chemical structure data on the web
- Create a “structure-based hub” to information and data
- Provide access to structure-based “algorithms”
- Let chemists contribute their own data
- Allow the community to curate/correct data
www.chemspider.com
We’re Out to Answer Questions
Questions a chemist might ask…
- What is the melting point of n-heptanol?
- What is the chemical structure of Xanax?
- Chemically, what is phenolphthalein?
- What are the stereocenters of cholesterol?
- Where can I find publications about xylene?
- What are the different trade names for Ketoconazole?
- What is the NMR spectrum of Aspirin?
- What are the safety handling issues for Thymol Blue?
Search for a Chemical…by name
Available Information…
Linked to vendors, safety data, toxicity, metabolism
Available Information….
Search for a chemical…by structure
Substructure search coming…
Crowdsourcing – Wikipedia definition
“Crowdsourcing is a distributed problem-solving and production model.
Problems are broadcast to an unknown group of solvers in the form of an open call for solutions. Users—also known as the crowd—typically form into online communities, and the crowd submits solutions.
The crowd also sorts through the solutions, finding the best ones.”
Annotating, Cleaning and Growing...
Almost 25 million chemicals from 400 diverse data sources
“Diverse” data sources…
- Quality ranges from high, through questionable, to wrong
- Records range from rich content (Wikipedia links, YouTube videos and photographs) to “stub records” containing “just a structure”
All records can be further enhanced…
25 million compounds need annotation by the masses
ChemSpider Searching
- Most chemists perform text-based searches first
- To get the correct structure from a text-based search, the name-structure association needs to be “correct”: should Viagra return sildenafil or sildenafil citrate?
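The name-structure problem above can be illustrated with a minimal sketch. Everything here is hypothetical: the `NameIndex` class, its methods and the "curator-confirmed" flag are invented for illustration and are not ChemSpider's actual data model. The idea is only that a name may point at several candidate records until someone in the community confirms the right one.

```python
from dataclasses import dataclass, field


@dataclass
class NameIndex:
    """Hypothetical name -> record index with a curation flag per link."""

    # lowercased name -> {record title: confirmed by a curator?}
    _links: dict = field(default_factory=dict)

    def add(self, name: str, record: str, confirmed: bool = False) -> None:
        """Associate a name with a record, unconfirmed by default."""
        self._links.setdefault(name.lower(), {})[record] = confirmed

    def confirm(self, name: str, record: str) -> None:
        """Mark an existing name-record link as curator-confirmed."""
        self._links[name.lower()][record] = True

    def lookup(self, name: str) -> list:
        """Return confirmed records if any exist, else every candidate."""
        candidates = self._links.get(name.lower(), {})
        confirmed = sorted(r for r, ok in candidates.items() if ok)
        return confirmed if confirmed else sorted(candidates)


index = NameIndex()
# Two uncurated candidates for the same trade name: the search is ambiguous.
index.add("Viagra", "sildenafil")
index.add("Viagra", "sildenafil citrate")
ambiguous = index.lookup("viagra")

# A confirmed link resolves to a single record.
index.add("Vitamin H", "biotin", confirmed=True)
resolved = index.lookup("vitamin h")
```

Until a curator calls `confirm`, a text search on the trade name returns both candidates, which is exactly the situation the slide asks about.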
Search “Vitamin H”
“Curate” Identifiers
General curation activities
- Remove incorrect names
- Correct spellings
- Remove names with/without stereo compared to the structure
- Correct registry numbers and other numeric identifiers (Beilstein, EINECS etc.)
- Add multilingual names
- Add alternative names
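One of the activities above, correcting registry numbers, can be partly automated, because CAS Registry Numbers carry a check digit. The sketch below applies the public check-digit rule (weight the digits before the check digit 1, 2, 3… from right to left, sum, take modulo 10). The function name is made up for this example, and whether ChemSpider runs this particular check is an assumption; a passing check digit also says nothing about whether the number belongs to the right structure.

```python
import re


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit matches."""
    # Format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


aspirin_ok = cas_check_digit_ok("50-78-2")    # aspirin
water_ok = cas_check_digit_ok("7732-18-5")    # water
typo_ok = cas_check_digit_ok("50-78-3")       # last digit mistyped
```

A check like this can flag mistyped numbers for a curator to review, but it cannot replace the human step of confirming that the identifier matches the structure.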
Crowdsourced “Annotations”
Registered users can add:
- Descriptions/Syntheses/Commentaries
- Links to PubMed articles
- Links to articles via DOIs
- Spectral data
- Crystallographic Information Files
- Photos
- MP3 files
- Videos
Spectra Linked
Web Services
www.SpectralGame.com
http://www.jcheminf.com/content/1/1/9
Spectral Game
Increasing Complexity
Reactions and ChemSpider
ChemSpider intends to be a high-quality source of structure-based information
What about chemical reactions?
ChemSpider SyntheticPages
Submission process
- Register as a user
- Use the Submit button and fill in the fields…
Submission Process
Submissions reviewed by editorial board
Published as is or comments sent to author
Online Peer Review process
Data supported include web movies, images, live spectra etc.
Community crowdsourcing and social networking is helping to build a quality online resource for chemists
- Community provides and/or deposits data
- Community curation, feedback, annotation
- Social networking tools keep the community engaged and connected: the latest web design was voted for on a blog
The path is working and we will continue to optimize
ChemSpider demos and training
ChemSpider demos at booth 301: Royal Society of Chemistry
Hands-on ChemSpider Training
Room: Room 102B
Location: Boston Convention Center
Date: Tuesday 24th August
Time: 3:30-6pm
Thank you
[email protected]: ChemSpiderman
www.chemspider.com/blog
SLIDES: www.slideshare.net/AntonyWilliams