# Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

@article{GmezBombarelli2018AutomaticCD, title={Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules}, author={Rafael G{\'o}mez-Bombarelli and David Kristjanson Duvenaud and Jos{\'e} Miguel Hern{\'a}ndez-Lobato and Jorge Aguilera-Iparraguirre and Timothy D. Hirzel and Ryan P. Adams and Al{\'a}n Aspuru-Guzik}, journal={ACS Central Science}, year={2018}, volume={4}, pages={268--276} }

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule…
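
The encoder and decoder described above work on a fixed-size numerical version of the discrete representation. A minimal sketch of that input/output layer, assuming SMILES strings as the discrete representation, a hypothetical toy vocabulary (`VOCAB`) and a hypothetical maximum length (`MAX_LEN`), is one-hot encoding with right padding plus greedy (argmax) decoding. The neural encoder, decoder, and predictor themselves are not shown.

```python
import numpy as np

# Hypothetical toy vocabulary and length; a real model would derive both from its training set.
PAD = " "
VOCAB = [PAD, "C", "N", "O", "c", "n", "o", "1", "2", "(", ")", "=", "#"]
CHAR_TO_IDX = {ch: i for i, ch in enumerate(VOCAB)}
MAX_LEN = 20  # shorter strings are right-padded with PAD


def one_hot_encode(smiles: str) -> np.ndarray:
    """Map a SMILES string to a (MAX_LEN, len(VOCAB)) one-hot matrix."""
    if len(smiles) > MAX_LEN:
        raise ValueError(f"SMILES longer than {MAX_LEN} characters: {smiles!r}")
    padded = smiles.ljust(MAX_LEN, PAD)
    matrix = np.zeros((MAX_LEN, len(VOCAB)), dtype=np.float32)
    for pos, ch in enumerate(padded):
        matrix[pos, CHAR_TO_IDX[ch]] = 1.0  # KeyError for characters outside VOCAB
    return matrix


def greedy_decode(probs: np.ndarray) -> str:
    """Take the most likely character at each position and strip trailing padding.

    `probs` may be a one-hot matrix or a decoder's per-position probabilities.
    """
    indices = probs.argmax(axis=1)
    return "".join(VOCAB[i] for i in indices).rstrip(PAD)


x = one_hot_encode("c1ccccc1O")  # phenol
print(x.shape)           # (20, 13)
print(greedy_decode(x))  # c1ccccc1O
```

Greedy decoding of a real decoder's output can yield strings that are not valid SMILES, so decoded candidates would normally be validated with a cheminformatics toolkit before use.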

#### 1,343 Citations

Learning Continuous and Data-Driven Molecular Descriptors by Translating Equivalent Chemical Representations

- 2018

There has been a recent surge of interest in using machine learning across chemical space in order to predict properties of molecules or design molecules and materials with desired properties. Most…

3DMolNet: A Generative Network for Molecular Structures

- Computer Science, Biology
- ArXiv
- 2020

This work proposes a new approach to efficiently generate molecular structures that are not restricted to a fixed size or composition, based on a variational autoencoder that learns a translation-, rotation-, and permutation-invariant low-dimensional representation of molecules.
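
The three invariances named above can be made concrete with a hand-crafted descriptor. A minimal sketch, assuming a hypothetical descriptor built from sorted interatomic distances (not the learned representation of the cited work): distances do not change under translation or rotation, and sorting removes the dependence on atom order. The helper names `sorted_pairwise_distances` and `random_rotation` are hypothetical.

```python
import numpy as np


def sorted_pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Sorted upper triangle of the interatomic distance matrix (hypothetical descriptor)."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(dists[iu])  # sorting makes the result independent of atom order


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random proper rotation matrix from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:     # turn a reflection into a rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))  # toy 5-atom "molecule"
moved = coords @ random_rotation(rng).T + np.array([1.0, -2.0, 0.5])  # rotate, translate
moved = moved[rng.permutation(5)]  # shuffle atom order
print(np.allclose(sorted_pairwise_distances(coords), sorted_pairwise_distances(moved)))  # True
```

This descriptor is also unchanged by mirror reflection and is not injective (different structures can share it), which is one reason learned invariant representations are attractive.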

Learning a Continuous Representation of 3D Molecular Structures with Deep Generative Models

- Computer Science, Biology
- ArXiv
- 2020

Deep generative models for three-dimensional molecular structures are described, using atomic density grids and a novel fitting algorithm that converts continuous grids to discrete molecular structures.

Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations

- Computer Science, Physics
- Mach. Learn. Sci. Technol.
- 2021

PASITHEA is proposed: a direct gradient-based molecule optimization method that applies inceptionism techniques from computer vision. It forms an inverse regression model capable of generating molecular variants optimized for a certain property.
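
The summary above describes optimizing a molecule by back-propagating a property prediction into the model's input while the model weights stay fixed. A minimal NumPy sketch of that inceptionism-style loop follows, assuming a hypothetical one-hidden-layer regressor with random weights in place of a trained network, a continuous input relaxed to the box [0, 1], and a hypothetical `dream` function. It is not PASITHEA's actual model or input encoding.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for a trained property regressor f(x) = v . tanh(W x + b).
# Random weights replace a model that would really be trained on molecules.
W = rng.normal(size=(16, 8)) / np.sqrt(8)
b = rng.normal(size=16)
v = rng.normal(size=16)


def predict(x: np.ndarray) -> float:
    """Predicted property value for a relaxed molecular input vector x."""
    return float(v @ np.tanh(W @ x + b))


def grad_wrt_input(x: np.ndarray) -> np.ndarray:
    """Gradient of predict() with respect to x, weights held fixed."""
    h = np.tanh(W @ x + b)
    return W.T @ (v * (1.0 - h**2))


def dream(x0: np.ndarray, lr: float = 0.02, steps: int = 300) -> np.ndarray:
    """Projected gradient ascent on the input, kept inside [0, 1] like a relaxed one-hot."""
    x = x0.copy()
    for _ in range(steps):
        x = np.clip(x + lr * grad_wrt_input(x), 0.0, 1.0)
    return x


x0 = rng.uniform(size=8)
x_opt = dream(x0)
print(f"predicted property: {predict(x0):.3f} -> {predict(x_opt):.3f}")
```

With a small enough step size each projected step cannot lower the prediction. A real pipeline would then map the relaxed vector back to a discrete molecule (for example, argmax per position) and re-evaluate it.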

Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

- Computer Science, Physics
- ArXiv
- 2019

A method is presented to encode and decode the positions of atoms in 3-D molecules, using a dataset of nearly 50,000 stable crystal unit cells that contain from 1 to over 100 atoms.

Representation of molecular structures with persistent homology for machine learning applications in chemistry

- Medicine
- Nature Communications
- 2020

A molecular representation derived from persistent homology is demonstrated through an active-learning approach for predicting CO₂/N₂ interaction energies at the density functional theory (DFT) level.
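
Persistent homology tracks how topological features appear and disappear as a distance threshold grows. A minimal sketch of the simplest case follows: 0-dimensional persistence (connected components) of the Vietoris-Rips filtration on a point cloud such as atom coordinates, computed with union-find over edges sorted by length. The summary above does not say which homology dimensions or vectorization the cited representation uses, so treat this, and the hypothetical `h0_persistence` helper, as an illustration of the underlying idea only.

```python
import itertools
import math


def h0_persistence(points):
    """Hypothetical helper: 0-dimensional Vietoris-Rips persistence of a point cloud.

    Every point is born at 0; a component dies at the edge length where it
    merges into another. Returns (birth, death) pairs, one of them infinite.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components: one of them dies
            parent[root_i] = root_j
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars


print(h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))
# [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```

The finite death times are exactly the edge lengths of a minimum spanning tree, which is why Kruskal-style union-find suffices for this dimension.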

ChemoVerse: Manifold traversal of latent spaces for novel molecule discovery

- Computer Science, Biology
- ADGN@ECAI
- 2020

This work presents manifold traversal with heuristic search to explore the latent chemical space of various generative models, such as grammar variational autoencoders, addressing the randomized generation and validity of compounds.

Optimization of Molecular Characteristics via Machine Learning Based on Continuous Representation of Molecules

- 2021

We demonstrate an automatic materials design method using a continuous representation of a molecule and its atomic arrangement via a neural network algorithm. This method is applied to optimizing and…

Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties

- Computer Science
- 2021

This work demonstrates an active learning approach to improve the performance of multi-target generative chemical models by utilizing their inherent generative and predictive aspects for self-refinement in situations where any number of properties with varying degrees of correlation must be optimized simultaneously.

Molecular Optimization by Capturing Chemist's Intuition Using Deep Neural Networks

- Computer Science
- 2020

This work seeks to capture the chemist’s intuition from matched molecular pairs using machine translation models and shows that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists.

#### References

Showing 10 of 80 references.

ChemTS: an efficient python library for de novo molecular generation

- Computer Science, Physics
- Science and Technology of Advanced Materials
- 2017

ChemTS, a novel Python library that explores chemical space by combining Monte Carlo tree search and an RNN, is presented; it showed superior efficiency in finding high-scoring molecules in a benchmark problem of optimizing the octanol-water partition coefficient and synthesizability.
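
Monte Carlo tree search implementations commonly choose which branch to descend with the UCT rule (UCB1 applied to trees): a child's mean reward plus an exploration bonus that shrinks as the child is visited more. A minimal sketch follows, assuming a hypothetical `Node` class whose children stand for candidate next SMILES symbols and a hypothetical exploration constant `c`. The RNN that expands nodes and completes rollouts, and ChemTS's actual reward, are omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node: one SMILES symbol appended to its parent's prefix."""

    token: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.0) -> float:
    """Mean reward plus exploration bonus; c = 1.0 is a hypothetical default."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(2.0 * math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.0) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


root = Node("", visits=10, children=[
    Node("C", visits=6, total_reward=3.0),
    Node("O", visits=4, total_reward=2.8),
    Node("N"),
])
print(select_child(root).token)  # N (unvisited, so its score is infinite)
```

Once every child has been visited, the bonus term lets a less-visited child ("O" here) overtake one with a similar mean reward, which is how the search balances exploration and exploitation.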

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

- Computer Science, Medicine
- ACS Central Science
- 2018

This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model.

Application of Generative Autoencoder in De Novo Molecular Design

- Computer Science, Medicine
- Molecular Informatics
- 2018

The results show that the autoencoder's latent space preserves the chemical similarity principle and thus can be used to generate analogue structures for de novo molecular design.

Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks

- Computer Science, Physics
- ArXiv
- 2017

This work shows that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing, and demonstrates that the properties of the generated molecules correlate very well with those of the molecules used to train the model.

Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space

- Physics, Medicine
- The Journal of Physical Chemistry Letters
- 2015

A systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules is achieved by a vectorized representation of molecules (the so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space.
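
A Bag of Bonds vector is assembled from Coulomb-matrix entries. A minimal sketch follows, assuming the usual Coulomb-matrix conventions: off-diagonal terms Z_i Z_j / |R_i - R_j| grouped into bags by element pair, diagonal terms 0.5 Z_i^2.4 grouped by element, and each bag sorted in descending order and zero-padded to a fixed length. The function name `bag_of_bonds`, the caller-supplied `bag_sizes`, and the water geometry are hypothetical; coordinates may be in any consistent length unit (ångström in the example).

```python
import itertools
from collections import defaultdict

import numpy as np


def bag_of_bonds(Z, R, bag_sizes):
    """Hypothetical Bag-of-Bonds featurizer.

    Z: atomic numbers, length n.
    R: (n, 3) array of coordinates in one consistent length unit.
    bag_sizes: dict mapping a bag key, (z,) for an element or (z1, z2) with
        z1 <= z2 for an element pair, to its fixed length across a dataset.
    """
    bags = defaultdict(list)
    for z in Z:
        bags[(z,)].append(0.5 * z**2.4)  # Coulomb-matrix diagonal term
    for i, j in itertools.combinations(range(len(Z)), 2):
        key = tuple(sorted((Z[i], Z[j])))
        dist = float(np.linalg.norm(R[i] - R[j]))
        bags[key].append(Z[i] * Z[j] / dist)  # Coulomb-matrix off-diagonal term

    unknown = set(bags) - set(bag_sizes)
    if unknown:
        raise ValueError(f"no size given for bags {sorted(unknown)}")

    features = []
    for key in sorted(bag_sizes):  # fixed bag order so vectors line up across molecules
        values = sorted(bags.get(key, []), reverse=True)
        if len(values) > bag_sizes[key]:
            raise ValueError(f"bag {key} has {len(values)} entries > {bag_sizes[key]}")
        features.extend(values + [0.0] * (bag_sizes[key] - len(values)))  # zero-pad
    return np.array(features)


# Water, coordinates in ångström (illustrative geometry).
Z = [8, 1, 1]
R = np.array([[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]])
sizes = {(1,): 2, (8,): 1, (1, 1): 1, (1, 8): 2}
print(bag_of_bonds(Z, R, sizes).round(3))
```

Sorting within each bag removes the dependence on atom ordering, while fixed bag sizes (typically the largest count seen in a dataset) give every molecule a vector of the same length.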

Quantum-chemical insights from deep tensor neural networks

- Medicine, Physics
- Nature Communications
- 2017

An efficient deep learning approach is developed that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems, and unifies concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate chemical space predictions.

Chemical space as a source for new drugs

- Chemistry
- 2010

The chemical space is the ensemble of all possible molecules, which is believed to contain at least 10⁶⁰ organic molecules below 500 Da of possible interest for drug discovery. This review summarizes…

Stochastic voyages into uncharted chemical space produce a representative library of all possible drug-like compounds

- Chemistry, Medicine
- Journal of the American Chemical Society
- 2013

The construction of a "representative universal library" spanning the small molecule universe (SMU) that samples the full extent of feasible small molecule chemistries is described, generated using the newly developed Algorithm for Chemical Space Exploration with Stochastic Search (ACSESS).

Designing molecules by optimizing potentials

- Chemistry, Medicine
- Journal of the American Chemical Society
- 2006

It is shown that the optimal structures can be determined without enumerating and separately evaluating the characteristics of the combinatorial number of possible structures, a process that would be much slower.

Chemical Space Travel

- Chemistry, Medicine
- ChemMedChem
- 2007

A “spaceship” program is reported which travels from a starting molecule A to a target molecule B through a continuum of structural mutations, and thereby charts unexplored chemical space.