“Found in Translation” – Neural machine translation models for chemical reaction prediction

Philippe Schwaller (@phisch124)

Theophile Gaudin, Riccardo Pisoni, David Lanyi, Costas Bekas & Teodoro Laino

IBM Research Zurich, Switzerland

Alpha Lee, University of Cambridge

Chem. Sci., 2018, 9, 6091-6098


Exploring the nearly endless chemical space

[Figure: assorted molecular structures drawn as skeletal formulas]


Design, Make, Test
- Design: de novo molecules (VAE / GAN / RL)
- Make: synthetic route planning, reaction outcome prediction
- Test: experimental verification

Credits to Marwin Segler


Chemical reaction prediction


Data

US patents
→ text-mining: Lowe (2012, 2017)
→ reaction SMARTS: reactants>>products, partly correct atom mapping, more than 1M reactions
→ filtering: Jin et al. (NIPS 2017)
→ benchmark dataset (USPTO_500k)

[Figure: example atom-mapped reaction in which every atom carries its map number (Cl:1, C:2, ..., N+:15, O-:16, O:17), linking each reactant atom to its position in the product]
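To make the reaction format concrete, here is a minimal Python sketch that splits a reaction SMILES of the form reactants>reagents>products (the reagent field may be empty, as in reactants>>products) and strips atom-map numbers from bracket atoms. The helper names are hypothetical, and the atom-map removal is purely textual: "[CH3:1]" becomes "[CH3]" rather than a canonical SMILES, which would require a toolkit such as RDKit.

```python
import re

# Atom-map numbers sit inside bracket atoms as ":<digits>]", e.g. "[CH3:1]".
ATOM_MAP = re.compile(r":\d+\]")


def _molecules(field: str) -> list[str]:
    """Split one reaction field into molecules; an empty field means none."""
    return field.split(".") if field else []


def split_reaction(rxn: str) -> tuple[list[str], list[str], list[str]]:
    """Split 'reactants>reagents>products' into three lists of SMILES."""
    if rxn.count(">") != 2:
        raise ValueError(f"expected exactly two '>' in {rxn!r}")
    reactants, reagents, products = rxn.split(">")
    return _molecules(reactants), _molecules(reagents), _molecules(products)


def strip_atom_maps(smiles: str) -> str:
    """Remove atom-map numbers textually: '[CH3:1]' becomes '[CH3]'."""
    return ATOM_MAP.sub("]", smiles)
```

For example, `split_reaction("CC=C1CCCCC1.Cl>>CCC1(Cl)CCCCC1")` returns `(["CC=C1CCCCC1", "Cl"], [], ["CCC1(Cl)CCCCC1"])`.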


Representing molecules for ML
- Molecular fingerprints: 000010000…0100
- Text-based representations: SMILES / InChI, e.g. CN1C=NC2=C1C(=O)N(C(=O)N2C)C
- Graph-based
- 3D structure

1D → 2D → 3D

[Figure: caffeine shown as a 2D molecular graph and as a 3D structure]
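As an illustration of the fingerprint idea (a fixed-length bit vector), the sketch below hashes character n-grams of a SMILES string into bits and compares two vectors with the Tanimoto similarity. This is a toy assumption for illustration, not ECFP/Morgan fingerprints: real fingerprints are computed from the molecular graph, so they do not depend on how the SMILES string happens to be written.

```python
import hashlib


def toy_fingerprint(smiles: str, n_bits: int = 64, max_n: int = 3) -> list[int]:
    """Toy fingerprint: hash every character n-gram (n = 1..max_n) into a bit vector."""
    bits = [0] * n_bits
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            gram = smiles[i:i + n].encode()
            # md5 gives a hash that is stable across runs (Python's hash() is salted).
            index = int.from_bytes(hashlib.md5(gram).digest()[:4], "big") % n_bits
            bits[index] = 1
    return bits


def tanimoto(a: list[int], b: list[int]) -> float:
    """Tanimoto similarity of two equal-length bit vectors: |A and B| / |A or B|."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0
```

A molecule compared with itself scores 1.0; unrelated strings score lower, but because the bits come from text rather than structure, two different SMILES spellings of the same molecule would generally not score 1.0 here.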


Atoms as letters, molecules as words

SMILES-to-SMILES prediction with sequence-to-sequence models: Schwaller et al., Chem. Sci., 2018, 9, 6091-6098; Nam & Kim, arXiv:1612.09529
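Treating atoms as letters requires splitting a SMILES string into tokens, so that multi-character atoms such as Br, Cl or bracket atoms like [Na+] stay whole. The regex below is one common way to do this; the exact token set used in the paper is an assumption here.

```python
import re

# Bracket atoms, two-letter halogens, organic-subset atoms (aromatic in lowercase),
# bonds, branches, ring-closure digits (including %nn) and '.' / '>' separators.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse to silently drop characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens
```

For example, `tokenize("Br.CC=C1C")` gives `["Br", ".", "C", "C", "=", "C", "1", "C"]`, the same kind of space-separated source sequence that a sequence-to-sequence model reads.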


How do Seq-2-Seq models work?


Step-by-step

Inspired by Hendrik’s talk in Zurich
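The step-by-step generation can be sketched as a greedy decoding loop: at each step the decoder picks the most probable next token given the source and the tokens emitted so far, until an end-of-sequence token appears. The `next_token_probs` callable, the `</s>` end token and the toy model are hypothetical stand-ins for a trained network; beam search, which keeps several candidate prefixes, is the usual alternative to greedy selection. Taking the product of the chosen tokens' probabilities as a confidence score is also an assumption of this sketch.

```python
import math
from typing import Callable

# Hypothetical decoder interface: (source tokens, emitted prefix) -> next-token distribution.
NextTokenFn = Callable[[list[str], list[str]], dict[str, float]]


def greedy_decode(source: list[str], next_token_probs: NextTokenFn,
                  eos: str = "</s>", max_len: int = 200) -> tuple[list[str], float]:
    """Emit the most probable token at each step until the end token appears."""
    prefix: list[str] = []
    log_prob = 0.0
    for _ in range(max_len):
        probs = next_token_probs(source, prefix)
        token, p = max(probs.items(), key=lambda kv: kv[1])
        log_prob += math.log(p)
        if token == eos:
            break
        prefix.append(token)
    # Confidence = product of the chosen tokens' probabilities (assumed definition).
    return prefix, math.exp(log_prob)


# Hypothetical toy model, for illustration only: always 90% sure of the next
# token of a fixed target sequence.
_TOY_TARGET = ["C", "C", "O", "</s>"]


def toy_model(source: list[str], prefix: list[str]) -> dict[str, float]:
    return {_TOY_TARGET[len(prefix)]: 0.9, "N": 0.1}
```

With the toy model, `greedy_decode(["x"], toy_model)` returns the tokens `["C", "C", "O"]` with confidence 0.9 to the fourth power (four steps, including the end token).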


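Step-by-step with attention

Link to reaction prediction: reactants to products

[Figure: the tokenized reactants "Br . C C = C 1 C …" form the source sequence; the decoder emits the product prediction token by token, starting with "C"]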


Attention weights

Plotting the attention weights at every decoder time step.
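A minimal sketch of how such an attention map can be computed, assuming dot-product scoring between the current decoder state and each encoder state (the scoring function of the paper's model may differ). Each row of the resulting matrix is a softmax distribution over source tokens for one decoder time step; plotting the rows gives the heatmap.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: list[float],
                  encoder_states: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return (weights over source tokens, context vector) for one decoder step."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


def attention_matrix(decoder_states: list[list[float]],
                     encoder_states: list[list[float]]) -> list[list[float]]:
    """One row per decoder time step, one column per source token."""
    return [dot_attention(s, encoder_states)[0] for s in decoder_states]
```

Every row sums to 1, so a bright cell in the plot means that, at that output step, the decoder drew most of its context from that source token.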


Attention weights: chloro Buchwald-Hartwig amination (US20170240534A1)

[Figure: attention-weight map with the source (reactants) on one axis and the target (product) on the other]


SMILES-2-SMILES
- Trained end-2-end
- Template-free
- Fully data-driven
- No chemical knowledge incorporated
- No atom mapping required
- Attention weights and confidence score
- Less than 0.5% invalid SMILES
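Counting invalid SMILES needs a validity test. The function below is only a rough, hypothetical pre-filter: it checks that bracket atoms are closed, branches balance and ring-closure labels pair up. It cannot catch valence or aromaticity errors; a full check needs a cheminformatics parser (for example, RDKit's `Chem.MolFromSmiles` returns `None` for strings it cannot parse).

```python
def rough_smiles_check(smiles: str) -> bool:
    """Cheap syntactic checks only: brackets, branches and ring closures must pair up."""
    if not smiles:
        return False
    depth = 0
    open_rings: set[str] = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom; its digits are charges/H counts, not ring labels.
            end = smiles.find("]", i)
            if end == -1:
                return False
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label such as %12.
            label = smiles[i:i + 3]
            if len(label) < 3 or not label[1:].isdigit():
                return False
            open_rings ^= {label}  # first occurrence opens, second closes
            i += 3
            continue
        elif ch.isdigit():
            open_rings ^= {ch}
        i += 1
    return depth == 0 and not open_rings
```

For example, the caffeine SMILES passes, while `"CC(C"` (unclosed branch) and `"C1CC"` (unclosed ring) fail.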


Limitations
- Only as good as the training data
- Difficult to include negative data
- Chemically unreasonable predictions possible


What other template-free approaches exist?

Graph neural networks
• Jin et al. (MIT, 2017): Weisfeiler-Lehman Networks (WLDN)
  • Network 1: reaction center identification
  • Network 2: product candidate ranking
  • Trained separately
  • Outperforms template-based approaches by 10% on the USPTO_500k dataset
• Bradshaw et al. (Cambridge, 2018): electron path prediction
  • Gated Graph Neural Networks
  • Outperforms WLDN on the USPTO_350k dataset (a subset without the more difficult reactions, e.g. cycloadditions)

Fundamental limitation: these approaches require atom-mapped training sets.
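For intuition about the Weisfeiler-Lehman idea behind WLDN, here is a sketch of classical WL relabeling on a molecular graph: each atom's label is repeatedly replaced by a compressed summary of its own label and its neighbors' labels, so atoms in equivalent environments end up with equal labels. This is the classical graph-isomorphism heuristic, not Jin et al.'s network, which (roughly speaking) replaces the hash with learned neural updates; the function name and hashing scheme are assumptions.

```python
import hashlib


def wl_refine(labels: list[str], neighbors: list[list[int]], iterations: int = 2) -> list[str]:
    """Classical Weisfeiler-Lehman relabeling on a graph given as adjacency lists."""
    for _ in range(iterations):
        new_labels = []
        for node, label in enumerate(labels):
            # Own label plus the sorted multiset of neighbor labels.
            signature = label + "|" + ",".join(sorted(labels[j] for j in neighbors[node]))
            # Compress the signature into a short, deterministic label.
            new_labels.append(hashlib.sha1(signature.encode()).hexdigest()[:8])
        labels = new_labels
    return labels
```

For the heavy atoms of propane (C-C-C), the two terminal carbons keep identical labels and the central carbon gets a different one; for ethanol (C-C-O), all three atoms become distinguishable.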


How do we perform compared to others?

Top-1 accuracy on the USPTO_MIT benchmark dataset

Reactants and reagents separated (reactants > reagents → products):
Data set    | Jin et al., WLDN | Schwaller et al., Seq-2-seq | Bradshaw et al., GGNN | Our new model
MIT_500k    | 79.6 %           | 80.3 %                      | –                     | 88.8 %
subset 350k | 84.0 %           | –                           | 87.0 %                | –

Reactants and reagents mixed (reactants & reagents → products):
Data set    | Jin et al. | Schwaller et al. | Our new model
MIT_500k    | 74.0 %     | <74.0 %          | 87.3 %

Schwaller et al.: Chem. Sci., 2018, 9, 6091-6098; Jin et al.: NIPS, 2017, 30, 2607-2616; Bradshaw et al.: arXiv:1805.10970
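The two settings compared above differ only in how the model input is written. A minimal sketch, assuming reagents are marked off with a '>' separator in the separated setting and simply joined with the reactants by '.' in the mixed setting; the exact string format of the benchmark files is an assumption, the function name is hypothetical, and the dichloromethane (ClCCl) reagent in the example is illustrative.

```python
def make_source(reactants: list[str], reagents: list[str], mixed: bool) -> str:
    """Build the model input for one reaction.

    separated: 'reactants>reagents' tells the model which molecules are reagents;
    mixed: all molecules joined with '.', so the model must infer which ones react.
    """
    if mixed:
        # Sorting removes any ordering hint about which molecules are reagents.
        return ".".join(sorted(reactants + reagents))
    return ".".join(reactants) + ">" + ".".join(reagents)
```

For example, with reactants `["CC=C1CCCCC1", "Cl"]` and reagent `["ClCCl"]`, the separated input is `"CC=C1CCCCC1.Cl>ClCCl"` and the mixed input is `"CC=C1CCCCC1.Cl.ClCCl"`. The mixed task is harder because the role information is gone, which is consistent with the lower accuracies in the mixed table.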


What else can we do? Reaction scoring

When we provide the model with reactants>>products, it returns a score for the proposed outcome:
CC=C1CCCCC1.Cl>>CCC1(Cl)CCCCC1   Markovnikov        Score: 0.99
CC=C1CCCCC1.Cl>>CC(Cl)C1CCCCC1   Anti-Markovnikov   Score: 0.001
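Scoring a given reactants>>products pair can be sketched with teacher forcing: feed the known product tokens one at a time and multiply the probabilities the model assigns to each of them, plus the end token. Whether the scores on the slide are computed exactly this way is an assumption; `next_token_probs` and the `</s>` end token are hypothetical stand-ins for the trained model's interface.

```python
import math
from typing import Callable

# Hypothetical decoder interface: (source tokens, prefix) -> next-token distribution.
NextTokenFn = Callable[[list[str], list[str]], dict[str, float]]


def score_reaction(source: list[str], target: list[str],
                   next_token_probs: NextTokenFn, eos: str = "</s>") -> float:
    """Teacher forcing: probability the model assigns to producing exactly `target`."""
    log_prob = 0.0
    prefix: list[str] = []
    for token in target + [eos]:
        p = next_token_probs(source, prefix).get(token, 0.0)
        if p == 0.0:
            return 0.0  # the model rules this product out entirely
        log_prob += math.log(p)
        prefix.append(token)
    return math.exp(log_prob)
```

Because the same model scores both candidate products from the same reactants, the regioselectivity shows up as a large gap between the two scores, as in the Markovnikov and anti-Markovnikov pair on the slide.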


IBM RXN for Chemistry
Freely available now: research.ibm.com/ai4chemistry

#RXNFORCHEMISTRY

Booth #530


[email protected] / @phisch124

IBM RXN for Chemistry: research.ibm.com/ai4chemistry

Questions?

Philippe Schwaller