Introduction
• I will be discussing an expert system developed to determine the chemical structure of an unknown compound (structure elucidation)
• The expert system is implemented on a blackboard
Introduction: Motivation
• Structure elucidation is a fundamental component of organic chemistry
• Requires a wide range of expertise
– Each elucidation technique has its own unique vocabulary that needs to be mastered
• An expert system can be used to simplify this process
Introduction: Outline
• Outline of presentation:
1) Fundamentals of blackboard systems
2) The expertise being modeled
– General spectroscopic techniques
3) Description of the expert system
Blackboard Systems
“Metaphorically, we can think of a set of workers, all looking at the same blackboard: each is able to read everything that is on it and to judge when he has something worthwhile to add to it.” – Newell, 1969
Blackboard Systems
• A set of experts independently modify solution elements on a central database to produce a complete solution
• The experts communicate solely through their contributions to the central database
• Three major components:
1) a globally accessible database (the blackboard)
2) a set of knowledge sources (the experts)
3) a control mechanism (the scheduler)
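A minimal Python sketch of how the three components fit together, assuming a blackboard that is just a set of entries and a scheduler whose only strategy is "run the first triggered expert". The names `Blackboard`, `KnowledgeSource` and `schedule`, and the two example experts, are hypothetical. Note that the experts never call each other; they cooperate only by reading and writing the blackboard.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Blackboard:
    """1) Globally accessible database: every expert can read every entry."""
    entries: set = field(default_factory=set)

@dataclass
class KnowledgeSource:
    """2) An expert: it only reads and writes the blackboard, never other experts."""
    name: str
    condition: Callable[[Blackboard], bool]  # does it have something worthwhile to add?
    action: Callable[[Blackboard], None]     # what it adds

def schedule(blackboard, sources, max_cycles=20):
    """3) Control mechanism: run one triggered expert per cycle until none is triggered."""
    for _ in range(max_cycles):
        triggered = [ks for ks in sources if ks.condition(blackboard)]
        if not triggered:
            break
        triggered[0].action(blackboard)  # placeholder strategy: first triggered expert
    return blackboard

# Hypothetical experts: one posts a formula, the other reacts to it.
poster = KnowledgeSource(
    "formula-poster",
    condition=lambda bb: "formula" not in bb.entries,
    action=lambda bb: bb.entries.add("formula"),
)
ir = KnowledgeSource(
    "IR",
    condition=lambda bb: "formula" in bb.entries and "C=O" not in bb.entries,
    action=lambda bb: bb.entries.add("C=O"),
)
bb = schedule(Blackboard(), [ir, poster])
print(sorted(bb.entries))  # ['C=O', 'formula']
```

Even though the IR expert is listed first, it only fires once the other expert's contribution is on the blackboard.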
Blackboard Systems: The Blackboard
• The blackboard is structured as an abstraction hierarchy
• Different knowledge sources can work on the problem at different levels of the hierarchy
• Items on the blackboard are called entries
• Entries on the same level or on different levels of the hierarchy are linked
• Linked entries constitute a potential solution
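The hierarchy, entries and links can be pictured as a small data structure. In this sketch (hypothetical names `Entry`, `Blackboard`, `link`, `potential_solution`), each level holds entries, links may join entries on the same or different levels, and a potential solution is everything reachable from one entry through links.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality/hashing so entries can live in sets
class Entry:
    level: int                                 # position in the abstraction hierarchy
    content: str
    links: list = field(default_factory=list)  # linked entries (any level)

class Blackboard:
    def __init__(self, n_levels):
        self.levels = [[] for _ in range(n_levels)]

    def post(self, level, content):
        entry = Entry(level, content)
        self.levels[level].append(entry)
        return entry

def link(a, b):
    """Links are symmetric and may join entries on the same or different levels."""
    a.links.append(b)
    b.links.append(a)

def potential_solution(start):
    """All entries reachable from `start` through links form one potential solution."""
    seen, stack = set(), [start]
    while stack:
        entry = stack.pop()
        if entry not in seen:
            seen.add(entry)
            stack.extend(entry.links)
    return seen

bb = Blackboard(n_levels=2)
ch3 = bb.post(0, "CH3")
ch2 = bb.post(0, "CH2")
oh = bb.post(0, "OH")          # posted, but not linked to anything yet
ethyl = bb.post(1, "CH3-CH2-")
link(ch3, ch2)                 # same-level link
link(ch2, ethyl)               # cross-level link
print(sorted(e.content for e in potential_solution(ch3)))  # ['CH2', 'CH3', 'CH3-CH2-']
```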
Blackboard Systems: The Knowledge Sources
• Knowledge sources are structured as condition-action pairs
– The condition component monitors the blackboard for any changes
– The action component makes changes to the blackboard when the condition part is satisfied
• When the condition is satisfied, the knowledge source is “triggered” and the scheduler decides whether the knowledge source will execute its action
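The distinction between being triggered and actually executing can be sketched as follows, assuming a dictionary blackboard and a scheduler reduced to a yes/no callback. The knowledge sources `generator` and `checker` and the function `on_change` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class KnowledgeSource:
    name: str
    condition: Callable[[dict, str], bool]  # inspects the blackboard after a change
    action: Callable[[dict], None]          # modifies the blackboard

def on_change(blackboard, changed_key, sources, scheduler_allows):
    """Condition and action are separate steps; the scheduler sits between them."""
    triggered = [ks for ks in sources if ks.condition(blackboard, changed_key)]
    executed = []
    for ks in triggered:
        if scheduler_allows(ks):  # triggering alone does not run the action
            ks.action(blackboard)
            executed.append(ks.name)
    return [ks.name for ks in triggered], executed

def generate(bb):
    bb["candidates"] = ["set-1", "set-2"]

def check(bb):
    bb["checked"] = True

generator = KnowledgeSource("generator", lambda bb, key: key == "formula", generate)
checker = KnowledgeSource("checker", lambda bb, key: key in ("formula", "candidates"), check)

bb = {"formula": "C7H12O4"}
triggered, executed = on_change(bb, "formula", [generator, checker],
                                scheduler_allows=lambda ks: ks.name == "generator")
print(triggered, executed)  # ['generator', 'checker'] ['generator']
```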
Blackboard Systems: The Scheduler
• The scheduler implements one or more problem-solving strategies
• The scheduler examines the current state of the blackboard and decides which triggered knowledge source to execute based on the problem-solving strategy in place
• The scheduler can abandon a strategy and adopt a new one or ignore a strategy altogether in order to pursue the most promising solution
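One way to model strategies is as rating functions over triggered knowledge sources. This sketch assumes the scheduler tries strategies in order and abandons one whose best candidate scores below a threshold; the strategies `interpret_unread_spectra` and `verify_candidates` and the 0.5 threshold are hypothetical, and leaving a strategy out of the list amounts to ignoring it.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy rates how promising a triggered knowledge source is in the current state.
Strategy = Callable[[str, dict], float]

def choose(triggered: Sequence[str], state: dict, strategies: Sequence[Strategy],
           threshold: float = 0.5) -> Tuple[Optional[str], Optional[str]]:
    """Return (knowledge source, strategy name); abandon a strategy whose
    best-rated candidate falls below `threshold` and move on to the next."""
    if not triggered:
        return None, None
    for strategy in strategies:
        best = max(triggered, key=lambda ks: strategy(ks, state))
        if strategy(best, state) >= threshold:
            return best, strategy.__name__
    return None, None

def interpret_unread_spectra(ks: str, state: dict) -> float:
    """Hypothetical strategy: favour experts whose spectrum is still uninterpreted."""
    return 1.0 if ks in state["uninterpreted"] else 0.0

def verify_candidates(ks: str, state: dict) -> float:
    """Hypothetical strategy: favour verification once candidate structures exist."""
    return 0.9 if ks == "verifier" and state["candidates"] else 0.1

strategies = [interpret_unread_spectra, verify_candidates]
print(choose(["ir", "verifier"], {"uninterpreted": {"ir"}, "candidates": []}, strategies))
# ('ir', 'interpret_unread_spectra')
print(choose(["ir", "verifier"], {"uninterpreted": set(), "candidates": ["c1"]}, strategies))
# ('verifier', 'verify_candidates')
```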
Structure Elucidation
• Modern structure elucidation is done using spectroscopy
• In absorption spectroscopy a sample of the unknown is irradiated with light over a range of frequencies and the amount of light the compound absorbs at each frequency is measured
• The resulting data is analyzed by an expert and information about the structure of the unknown can be obtained
• The information collected from each spectrum is integrated to determine the complete structure
Infrared Spectroscopy
• Involves the absorption of light in the infrared region of the electromagnetic spectrum
• Used primarily to determine what functional groups are present in a molecule
Infrared Spectroscopy
• The broad peak at around 3000 cm⁻¹ indicates the presence of a hydroxyl group (OH)
• The strong, sharp peak at around 1750 cm⁻¹ indicates the presence of a carbonyl group (C=O)
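Reading an IR spectrum this way amounts to matching peaks against rules. The sketch below encodes only the two rules from this slide; the numeric windows (2500–3600 cm⁻¹ for a broad OH band, 1650–1850 cm⁻¹ for a sharp C=O band) are illustrative assumptions, not the system's actual rule base.

```python
from typing import NamedTuple

class Peak(NamedTuple):
    wavenumber: float  # cm^-1
    shape: str         # "broad" or "sharp"

# Illustrative rules: (low, high, required shape, functional group).
RULES = [
    (2500, 3600, "broad", "hydroxyl (OH)"),
    (1650, 1850, "sharp", "carbonyl (C=O)"),
]

def functional_groups(peaks):
    """Return each functional group whose rule matches at least one peak."""
    found = []
    for peak in peaks:
        for low, high, shape, group in RULES:
            if low <= peak.wavenumber <= high and peak.shape == shape and group not in found:
                found.append(group)
    return found

print(functional_groups([Peak(3000, "broad"), Peak(1750, "sharp")]))
# ['hydroxyl (OH)', 'carbonyl (C=O)']
```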
UV Spectroscopy
• Involves the absorption of light in the ultraviolet region of the electromagnetic spectrum
• Used to determine the extent of conjugation in the unknown
– Conjugation is a pattern of alternating single and double bonds
• UV spectroscopy is not very useful in structure elucidation
Proton NMR
• Contains information about the hydrogens in the molecule
• Three key aspects:
1) chemical shift – the “type” of hydrogen
2) integration – ratio of different types of hydrogens
3) splitting – nearest-neighbour relationship
• Can be used to identify the presence of certain functional groups
• Used primarily to determine how the different functional groups present fit together (the connectivity)
Proton NMR
• The peak at around 10 ppm indicates the presence of an aldehyde
• The peak at 2.6 ppm is split into 4 peaks (a quartet), indicating that these hydrogens are adjacent to a carbon bearing 3 hydrogens
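The quartet reasoning is the first-order n + 1 rule: a signal split into n + 1 peaks has n equivalent hydrogens on the neighbouring carbon(s). A minimal sketch, assuming a first-order spectrum (the hypothetical helpers are `neighbouring_hydrogens` and `describe`):

```python
MULTIPLETS = {1: "singlet", 2: "doublet", 3: "triplet", 4: "quartet", 5: "quintet"}

def neighbouring_hydrogens(peak_count):
    """n + 1 rule: a signal split into n + 1 peaks has n equivalent neighbouring H."""
    if peak_count < 1:
        raise ValueError("a signal has at least one peak")
    return peak_count - 1

def describe(shift_ppm, peak_count):
    name = MULTIPLETS.get(peak_count, f"{peak_count}-line multiplet")
    n = neighbouring_hydrogens(peak_count)
    return f"{shift_ppm} ppm {name}: {n} hydrogens on adjacent carbon(s)"

print(describe(2.6, 4))  # 2.6 ppm quartet: 3 hydrogens on adjacent carbon(s)
```

The rule only holds when the splitting is first order, which is one reason a spectroscopy expert needs heuristics for ambiguous spectra.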
Carbon-13 NMR
• Contains information about the carbons in the molecule
• Three key aspects:
1) chemical shift – the “type” of carbon
2) splitting – the number of hydrogens bonded to each carbon
3) the number of unique carbons present
• Used to determine connectivity
• Peak at 190 ppm indicates the presence of a carbonyl (C=O)
• There are 7 peaks in total, indicating that the molecule contains only 7 unique carbons
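The three aspects of a carbon-13 spectrum translate into simple bookkeeping: one peak per unique carbon, a multiplicity that gives the number of attached hydrogens, and a chemical shift that flags carbonyls. This sketch assumes singlet/doublet/triplet/quartet codes for C/CH/CH2/CH3 and an assumed cut-off of 160 ppm for carbonyl carbons; the peak list is made-up input, not data from the example.

```python
# Multiplicity code -> carbon type (each carbon is split by its own hydrogens).
CARBON_TYPE = {"s": "C", "d": "CH", "t": "CH2", "q": "CH3"}
CARBONYL_CUTOFF_PPM = 160  # assumed threshold for C=O carbons

def interpret_c13(peaks):
    """peaks: list of (shift in ppm, multiplicity code), one per unique carbon."""
    unique_carbons = len(peaks)
    types = [CARBON_TYPE[code] for _, code in peaks]
    has_carbonyl = any(shift >= CARBONYL_CUTOFF_PPM for shift, _ in peaks)
    return unique_carbons, types, has_carbonyl

# Made-up three-peak spectrum for illustration.
print(interpret_c13([(190.0, "d"), (30.0, "t"), (10.0, "q")]))
# (3, ['CH', 'CH2', 'CH3'], True)
```

Symmetry-equivalent carbons share a peak, so the peak count is a lower bound on the number of carbons, which is why the formula from mass spectrometry is still needed.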
Mass Spectrometry
• Mass spectrometry is used to determine the molecular formula of the unknown compound
• The structural information in mass spectrometry data (the fragmentation pattern) tends to be unreliable, so it is used only to verify a possible structure or when the other spectral techniques are unsuccessful
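A molecular formula is pinned down by mass: formulas that share a nominal (integer) mass differ in their exact mass, which high-resolution measurements can separate. A minimal sketch using the standard monoisotopic masses of ¹²C, ¹H, ¹⁴N and ¹⁶O; the helper names `parse` and `masses` are hypothetical, and the parser handles only simple formulas without brackets.

```python
import re

# Masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}

def parse(formula):
    """'C7H12O4' -> {'C': 7, 'H': 12, 'O': 4}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts

def masses(formula):
    counts = parse(formula)
    nominal = sum(NOMINAL[e] * n for e, n in counts.items())
    exact = sum(MONOISOTOPIC[e] * n for e, n in counts.items())
    return nominal, round(exact, 4)

print(masses("C7H12O4"))  # (160, 160.0736)
print(masses("C8H16O3"))  # (160, 160.1099): same nominal mass, different exact mass
```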
Structure Elucidation: Applicability of a Blackboard Architecture
• Each type of spectroscopy is unique
• A human expert will often analyze a set of spectra as a whole, selectively determining which spectral information to utilize at a given time
• The blackboard architecture is ideal for this approach
• The blackboard architecture also allows for new experts to be added (new spectroscopic techniques)
The Expert System: The Blackboard
• An expert system implemented on a distributed blackboard has been developed to determine the structure of a chemical compound
• A sequential implementation of a blackboard would allow only one expert to access the blackboard at a time
• In a distributed system experts can access different sections of the blackboard at the same time
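The difference between sequential and distributed access can be sketched with one lock per blackboard section: experts writing to different sections never wait for each other, whereas a sequential blackboard would behave like a single lock around everything. The section names and `PartitionedBlackboard` class are hypothetical, and Python threads only illustrate the access pattern; they are not the parallel hardware the system was built for.

```python
import threading

class PartitionedBlackboard:
    """One lock per section: only experts using the same section serialize."""
    def __init__(self, sections):
        self.data = {s: [] for s in sections}
        self.locks = {s: threading.Lock() for s in sections}

    def post(self, section, entry):
        with self.locks[section]:
            self.data[section].append(entry)

def expert(board, section, entries):
    for entry in entries:
        board.post(section, entry)

board = PartitionedBlackboard(["ir", "h_nmr", "c13_nmr"])
threads = [
    threading.Thread(target=expert, args=(board, "ir", ["C=O", "OH"])),
    threading.Thread(target=expert, args=(board, "h_nmr", ["CH3-CH2"])),
    threading.Thread(target=expert, args=(board, "ir", ["C-O"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(board.data["ir"]), board.data["h_nmr"])
```

The order of entries within a shared section depends on thread scheduling, which is why the output is sorted.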
The Expert System: The Blackboard
• The hierarchy of the blackboard is based on the complexity of the structures being produced
– Low-level, basic structures occupy a certain level of the blackboard, while more complicated structures occupy a different level
The Expert System: The Experts
• There are two main types of experts:
1) Structure generation routines
2) Spectroscopy experts
Structure Generation Routines: Storing Structures
• Ideally every possible chemical structure could be stored, but this is not feasible
– Even a simple formula such as C23H48 has 5,731,580 structural isomers
• Instead a set of substructures (components) is stored such that any possible structure can be formed from a combination of these components
• There are 630 components in total
• Components are classified as primary, secondary or tertiary components
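The 5,731,580 figure counts constitutional isomers (stereoisomers excluded) and can be reproduced. The sketch below uses the centroid method from chemical graph enumeration, a standard counting technique rather than part of the described system: an alkane skeleton is a tree of carbons with at most four neighbours each, rooted either at a single central carbon (all branches smaller than half the molecule) or at a central C-C bond joining two equal halves. The helper names are hypothetical.

```python
from math import comb

def multisets(a, max_items, max_size, total):
    """Count multisets of at most `max_items` branches whose carbon counts lie in
    1..max_size and sum to `total`; a[s] = number of distinct branches with s carbons."""
    # f[j][t]: ways to pick j branches holding t carbons, using the sizes seen so far
    f = [[0] * (total + 1) for _ in range(max_items + 1)]
    f[0][0] = 1
    for s in range(1, max_size + 1):
        g = [row[:] for row in f]  # take no branch of size s
        for j in range(max_items + 1):
            for t in range(total + 1):
                if f[j][t]:
                    m = 1          # take m branches of size s, repeats allowed
                    while j + m <= max_items and t + m * s <= total:
                        g[j + m][t + m * s] += f[j][t] * comb(a[s] + m - 1, m)
                        m += 1
        f = g
    return sum(f[j][total] for j in range(max_items + 1))

def alkyl_counts(n_max):
    """a[n] = number of alkyl groups with n carbons (each carbon has <= 3 child branches)."""
    a = [0] * (n_max + 1)  # a[0] unused: "missing" branches are hydrogens
    for n in range(1, n_max + 1):
        a[n] = multisets(a, 3, n - 1, n - 1)
    return a

def alkane_isomers(n):
    """Constitutional isomers of CnH(2n+2)."""
    a = alkyl_counts(n)
    # One central carbon: up to 4 branches, each smaller than half the molecule.
    count = multisets(a, 4, (n - 1) // 2, n - 1)
    if n % 2 == 0:
        # Central C-C bond: an unordered pair of alkyl halves with n/2 carbons each.
        half = a[n // 2]
        count += half * (half + 1) // 2
    return count

print(alkane_isomers(23))  # 5731580
```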
Structure Generation Routines: Types of Components
• 1) Primary Components:
– Primary components are the most basic components for constructing organic molecules (CH3, CH2, CH, C, CO, OH, O, NH2, NH, N, SH, S, F, Cl, Br, I)
• 2) Secondary Components:
– Secondary components are combinations of primary components
– There are 86 secondary components
• 3) Tertiary Components:
– Tertiary components are secondary components with a restriction on what the component can bond to
Structure Generation Routines
• The structure generation routines produce sets of primary, secondary or tertiary components based on input data
• The sets can be further pruned using spectral information
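Generating sets of primary components from a formula is an exact-composition search, and pruning is a filter driven by spectral evidence. This sketch assumes a hypothetical subset of seven C/H/O primary components, a simple connectivity test (bonds pair up free valences, and k components need at least k − 1 bonds), and made-up pruning requirements; because the component list is reduced, its counts are not expected to match the system's.

```python
# Hypothetical subset of primary components: (carbons, hydrogens, oxygens, free valences).
PRIMARY = {
    "CH3": (1, 3, 0, 1),
    "CH2": (1, 2, 0, 2),
    "CH": (1, 1, 0, 3),
    "C": (1, 0, 0, 4),
    "C=O": (1, 0, 1, 2),
    "OH": (0, 1, 1, 1),
    "O": (0, 0, 1, 2),
}

def primary_sets(c, h, o):
    """Enumerate multisets of components whose atoms add up exactly to CcHhOo."""
    names = list(PRIMARY)
    results = []

    def extend(i, remaining, chosen):
        if remaining == (0, 0, 0):
            parts = sum(chosen.values())
            valences = sum(PRIMARY[n][3] * k for n, k in chosen.items())
            # Bonds pair up free valences; a connected molecule needs >= parts - 1 bonds.
            if valences % 2 == 0 and valences // 2 >= parts - 1:
                results.append(dict(chosen))
            return
        if i == len(names):
            return
        name = names[i]
        dc, dh, do, _ = PRIMARY[name]
        k = 0
        while min(remaining[0] - k * dc, remaining[1] - k * dh, remaining[2] - k * do) >= 0:
            if k:
                chosen[name] = k
            extend(i + 1, (remaining[0] - k * dc, remaining[1] - k * dh,
                           remaining[2] - k * do), chosen)
            k += 1
        chosen.pop(name, None)

    extend(0, (c, h, o), {})
    return results

def prune(sets, required):
    """Keep only sets containing at least the components the spectra demand."""
    return [s for s in sets if all(s.get(n, 0) >= k for n, k in required.items())]

sets = primary_sets(7, 12, 4)
pruned = prune(sets, {"C=O": 2, "OH": 2, "CH3": 2})  # hypothetical spectral evidence
```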
Spectroscopy Experts
• There is an expert for each type of spectroscopy:
1) Infrared Expert
2) Ultraviolet Expert
3) Proton NMR Expert
4) Carbon-13 NMR Expert
5) Mass Spectrometry Expert
Spectroscopy Experts
• The data contained in a spectrum may be unreliable or ambiguous
– e.g. in a proton NMR spectrum, if the chemical shift difference between two hydrogens is < 1, the observed splitting may be inaccurate
• Heuristic rules are used to handle this ambiguity
• Certainty factors are attached to each conclusion drawn from the spectra
Spectroscopy Experts
• Each spectroscopy expert translates the data contained in its spectrum into molecular fragments
• These fragments are placed in an “active list” which is used to direct and restrict the structure generation routines
• If fragments from different experts conflict then the fragment with the highest certainty factor is used
• The conflicting fragment is placed in an “inactive list” which is used in the event that a correct structure is not found using the active list
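Conflict resolution by certainty factor can be sketched as "keep the most certain fragment per conflict, park the rest". Here two fragments are assumed to conflict when they describe the same hypothetical `site` of the molecule; the fragments, sites and certainty values are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    expert: str
    site: str         # hypothetical: which part of the molecule the fragment explains
    structure: str
    certainty: float  # certainty factor attached by the expert

def split_lists(fragments):
    """Active list: the most certain fragment per site. Inactive list: the rest,
    kept for a retry if no correct structure is found from the active list."""
    best = {}
    for frag in fragments:
        if frag.site not in best or frag.certainty > best[frag.site].certainty:
            best[frag.site] = frag
    active = list(best.values())
    active_ids = {id(f) for f in active}
    inactive = [f for f in fragments if id(f) not in active_ids]
    return active, inactive

fragments = [
    Fragment("IR", "carbonyl", "C=O (ketone)", 0.6),
    Fragment("1H NMR", "carbonyl", "CHO (aldehyde)", 0.8),
    Fragment("13C NMR", "chain", "CH2CH3", 0.7),
]
active, inactive = split_lists(fragments)
print([f.structure for f in active])    # ['CHO (aldehyde)', 'CH2CH3']
print([f.structure for f in inactive])  # ['C=O (ketone)']
```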
Spectroscopy Experts
• The spectroscopy experts are also used to test generated structures for consistency with the spectral information
• The system is able to identify when there is not enough information to verify a possible structure
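Testing a generated structure against the spectra needs three outcomes, not two, so that "no data" is not mistaken for "consistent". A minimal sketch, assuming each candidate predicts a set of yes/no spectral features and each spectrum reports only the features it can observe; the feature names and `min_checks` parameter are hypothetical.

```python
from enum import Enum

class Verdict(Enum):
    CONSISTENT = "consistent"
    INCONSISTENT = "inconsistent"
    UNDETERMINED = "not enough information"

def verify(predicted, observed, min_checks=1):
    """predicted: feature -> bool expected for the candidate structure.
    observed: feature -> bool actually seen; a missing feature means no usable data."""
    checks = 0
    for feature, expected in predicted.items():
        if feature not in observed:
            continue  # the spectra say nothing about this feature
        checks += 1
        if observed[feature] != expected:
            return Verdict.INCONSISTENT
    return Verdict.CONSISTENT if checks >= min_checks else Verdict.UNDETERMINED

candidate = {"C=O stretch near 1750 cm-1": True, "aldehyde H near 10 ppm": False}
print(verify(candidate, {"C=O stretch near 1750 cm-1": True}))  # Verdict.CONSISTENT
print(verify(candidate, {"aldehyde H near 10 ppm": True}))       # Verdict.INCONSISTENT
print(verify(candidate, {}))                                     # Verdict.UNDETERMINED
```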
An Example…
• Formula of unknown: C7H12O4
• 93 possible sets of primary components are produced
• Using these primary sets, 497 sets of secondary components are possible
– The number of sets of secondary components can be decreased if the primary component sets are pruned using spectral data
An Example…
• After pruning the sets of primary components, only one possible set remains:
– The set contains 2 CH3, 2 C=O, 2 OH, 1 C and 2 CH2
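As a check on the surviving set (terms in the order CH3, C=O, OH, C, CH2), its atoms reproduce C7H12O4 exactly, and, assuming every bond between components is single, its free valences leave no room for a ring:

```latex
\begin{aligned}
\mathrm{C}&: 2(1) + 2(1) + 2(0) + 1(1) + 2(1) = 7\\
\mathrm{H}&: 2(3) + 2(0) + 2(1) + 1(0) + 2(2) = 12\\
\mathrm{O}&: 2(0) + 2(1) + 2(1) + 1(0) + 2(0) = 4\\
\text{free valences}&: 2(1) + 2(2) + 2(1) + 1(4) + 2(2) = 16 \;\Rightarrow\; 8 \text{ bonds}\\
\text{rings}&: 8 - (9 - 1) = 0
\end{aligned}
```

This agrees with the formula's degree of unsaturation, (2·7 + 2 − 12)/2 = 2, which the two C=O groups already account for.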
Conclusion
• Determining the chemical structure of an unknown is an important part of organic chemistry
• Expert system technology can be applied to this domain
• A blackboard architecture is especially well suited to this task
References
1) Craig, I. D., Blackboard Systems, Artificial Intelligence Review, 1988, 2, 103-118.
2) Funatsu, K., Susuta, Y., Sasaki, S., Introduction of Two-Dimensional NMR Spectral Information to an Automated Structure Elucidation System, CHEMICS. Utilization of 2D-INADEQUATE Information, J. Chem. Inf. Comput. Sci., 1989, 29, 6-11.
3) Sobczak, R. S., Matthews, M. M., An Expert System for Chemical Structure Elucidation Implemented on a Blackboard, Proceedings of the 3rd International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 1990, 91-98.
4) Sobczak, R. S., Matthews, M. M., A Massively Parallel Expert System Architecture for Chemical Structure Analysis, Distributed Memory Computing Conference, 1990, 11-17.
5) Sasaki, S., Kudo, Y., Structure Elucidation System Using Structural Information from Multisources: CHEMICS, J. Chem. Inf. Comput. Sci., 1985, 25, 252-257.