How an Online Chemistry Resource Could Change Our World
Antony Williams
Triangle Chromatography Discussion Group, Raleigh, NC, May 2009
How an Online Resource for Chemistry Can Change Our World
This presentation was given at the Triangle Chromatography Discussion Group. It focuses on mass spectrometry, associated web services, and what is possible for chromatographers.
Transcript
How an Online Chemistry Resource Could Change Our World
Antony Williams
Triangle Chromatography Discussion Group, Raleigh, NC, May 2009
Building a Structure Centric Community for Chemists
Imagine a time when…
The internet is searchable by chemical structure and substructure (e.g. Wikipedia, Google Scholar)
There is an online database of NMR, IR, and MS spectra and chromatography methods, built by and available to the community
Chemistry articles are indexed and searchable by “chemistry”
The web is linked together through the “language of chemistry”
Publicly funded research data can be shared and discussed in the open, maybe as Open Notebook Science
Cheminformatics has as much of a public face and success as bioinformatics (Protein Data Bank, GenBank, etc.)

The Language of Chemistry
My language…

And its dialects…

As a chemist…
I look for information about chemicals/chemistry
What is a particular structure?
What alternative names/identifiers?
Reaction synthesis?
Physical properties?
Analytical data?
Purchase?
Tell me more?
Similar stuff: what other compounds are “like” mine?

Linked Data Cloud

Chemistry on the Internet
Much of the information online is User Beware!
The quality of information is “diverse”
Technologies can “link and connect” information, but validation and curation are key to providing quality
The LinkedData web is of less value when the data linked are “wrong”

“Good Stuff”: TotallySynthetic.com

PubChem

Questions a chemist might ask…
What is the melting point of n-butanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for Ketoconazole?
What is the NMR spectrum of Aspirin?
What are the safety handling issues for Thymol Blue?

Search Cholesterol

Link outs

Complex Data and Information

Online Analytical Data

Various Searches
Structure searching
Substructure searching
Subset searching: choose from 200 data

Value for Mass Spectrometrists and Chromatographers?

ChemSpider for Mass Spectrometrists
What would a mass spectrometrist want to do?
Search the database based on mass (various forms)
Search selected subsets of the database based on mass
Search based on mass and substructure(s)
Search for structures based on name(s) or database IDs
Search for structures based on elements/not elements
Download the structure/structures in standard format
Search literature for information
Identify related data sources: chemical vendors, pathway databases, etc.

Search Database Based on Mass

Mass Based Searches?
What compounds have a mass of 300 ± 0.001?
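A mass query like the one above amounts to a window filter: keep every record whose mass lies within the target plus or minus the tolerance. The sketch below is only an illustration of that idea, not ChemSpider's implementation; the `Compound` record, the `search_by_mass` helper, and the four example entries are all hypothetical, and the masses are made-up placeholders rather than reference values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    # Hypothetical record layout, for illustration only.
    name: str
    mono_mass: float  # monoisotopic mass in Da


def search_by_mass(compounds, target, tolerance):
    """Return compounds whose mass lies within target +/- tolerance (inclusive)."""
    return [c for c in compounds if abs(c.mono_mass - target) <= tolerance]


# Tiny made-up collection; the masses are placeholders, not real data.
EXAMPLE_DB = [
    Compound("example A", 299.9985),
    Compound("example B", 300.0004),
    Compound("example C", 300.0009),
    Compound("example D", 300.0150),
]

hits = search_by_mass(EXAMPLE_DB, target=300.0, tolerance=0.001)
print([c.name for c in hits])
```

A linear scan is fine for a handful of records. A collection of millions of structures would presumably rely on an index over the mass column instead, but that is an assumption about scale, not something the slides describe.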

59 hits in 1.3 seconds from 21.5 MILLION

Substructure and Property

Elemental Constraints

Search based on Data Sources

Outlinks to vendors and other databases
Example databases of interest to mass spectrometrists:
HMDB: Human Metabolome Database
KEGG: Kyoto Encyclopedia of Genes and Genomes
BioCyc: collection of Pathway/Genome Databases
University of Minnesota Biodegradation DB: information on microbial biocatalytic reactions and biodegradation pathways for primarily xenobiotic chemical compounds
WikiPathways: new initiative to build crowdsourced pathway data management

Links out to KEGG (Kyoto Encyclopedia of Genes and Genomes)

WikiPathways Link

Download Structure(s)
Download individual record: molfile
Download SDF file (group of structures)
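An SDF file is a sequence of molfile records, each terminated by a line containing only `$$$$`, so a downloaded group of structures can be split back into individual records with a few lines of code. The sketch below is a minimal illustration of that splitting step; the helper names `split_sdf` and `record_title` are hypothetical, and the two single-atom records are hand-written for the example rather than downloaded from any service.

```python
def split_sdf(sdf_text):
    """Split SDF text into records, dropping the '$$$$' delimiter lines."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def record_title(record):
    """Return the first header line of a record, which holds the molecule name."""
    return record.split("\n", 1)[0].strip()


# Two hand-written single-atom records, for illustration only.
EXAMPLE_SDF = "\n".join([
    "methane",
    "  hand-written example",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "$$$$",
    "water",
    "  hand-written example",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "$$$$",
])

records = split_sdf(EXAMPLE_SDF)
print([record_title(r) for r in records])
```

For real work a cheminformatics toolkit would normally parse the atom and bond blocks as well; this sketch only separates the records.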

Web Service Integration
ChemSpider is presently integrated with Bruker, Waters and Thermo; more vendors coming…
Direct integration to vendor data processing tools

MassSpec API Web Services

Test results

Waters Integration

Outlinks from Table

For Chromatographers?
“Structure-based methods” being linked
Structure-centric searching of methods
We can host chromatograms for display
LogPs and LogDs (pH 5.5 and 7.4) calculated for >21 million compounds using ACD/Labs software
We’d love to host collections from the column vendors!

From 21.5 MILLION molecules…
Data are gathered/deposited from >200 data sources
Government databases
Chemical vendors
Wikipedia
There are “imperfections” in all online data sources
How bad can it get?

What is “wrong”?

Quality is a Major Issue: Search Butanol
OLD EXAMPLE, now fixed

Vancomycin
Who will curate?
PubChem is not resourced to clean these errors
How would you clean such a large dataset?

Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text.
Those compounds are linked out to other information resources including PubChem and ChemSpider.

It Happened in a Basement!!
Homebuilt servers
Cable internet
Software donations
Lots of hard work
>8000 users per day
>80,000 transactions per day

And now…
The Royal Society of Chemistry announced on May 11th that it has acquired ChemSpider, heralding a breakthrough investment for the organisation and for the Chemistry Community. This acquisition reflects RSC's commitment to providing access to rich resources of chemistry data and information.