Generic Context-Aware Group Contributions

Christoph Flamm, Marc Hellmuth, Daniel Merkle*, Nikolai Nojgaard, Peter F Stadler

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Many properties of molecules vary systematically with changes in the structural formula and can thus be estimated from regression models defined on small structural building blocks, usually functional groups. Typically, such approaches are limited to a particular class of compounds and require hand-curated lists of chemically plausible groups. This limits their use, in particular in the context of generative approaches that explore large chemical spaces. Here we overcome this limitation by proposing a generic group contribution method that iteratively identifies significant regressors of increasing size. To this end, LASSO regression is used, and the context-dependent contributions are 'anchored' around a reference edge to reduce ambiguities and prevent overcounting due to multiple embeddings. We benchmark our approach, which is available as 'Context AwaRe Group cOntribution' (CARGO), on artificial data and on typical applications from chemical thermodynamics. As we shall see, this method yields stable results with accuracies comparable to other regression techniques. As a by-product, we obtain interpretable additive contributions for individual chemical bonds and correction terms depending on local contexts.
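To make the regression idea in the abstract concrete, the following is a minimal sketch of only the simplest level of such a model: additive contributions per bond type, fitted with LASSO. It is not the CARGO implementation and does not cover the iterative growth of contexts or the anchoring around a reference edge. The bond-count features, the helper names, and the "true" contribution values are all hypothetical and invented for illustration; they are not thermodynamic data. The solver is a plain cyclic coordinate descent written in pure Python so that the block is self-contained.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=2000):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # feature never occurs; its weight stays zero
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical features: counts of C-C, C-H, C-O and O-H bonds per molecule.
BOND_TYPES = ["C-C", "C-H", "C-O", "O-H"]
MOLECULES = {
    "methane":        [0, 4, 0, 0],
    "ethane":         [1, 6, 0, 0],
    "propane":        [2, 8, 0, 0],
    "butane":         [3, 10, 0, 0],
    "methanol":       [0, 3, 1, 1],
    "ethanol":        [1, 5, 1, 1],
    "1-propanol":     [2, 7, 1, 1],
    "dimethyl ether": [0, 6, 2, 0],
    "water":          [0, 0, 0, 2],
}

# Invented per-bond contributions, used only to generate a noise-free toy target.
TRUE_CONTRIB = [2.0, -1.0, 3.0, 0.5]

X = [[float(c) for c in counts] for counts in MOLECULES.values()]
y = [sum(c * t for c, t in zip(row, TRUE_CONTRIB)) for row in X]

# A small penalty recovers the additive contributions up to a slight shrinkage.
fitted = lasso_coordinate_descent(X, y, alpha=1e-3)
for bond, weight in zip(BOND_TYPES, fitted):
    print(f"{bond}: {weight:.3f}")
```

In this toy setting the fitted weights can be read directly as per-bond contributions, which is the kind of interpretability the abstract refers to. Raising `alpha` shrinks more weights to exactly zero; in a context-aware method, that same mechanism would decide which larger candidate substructures are kept as correction terms.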

Original language: English
Title of host publication: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 19
Publisher: IEEE
Publication date: 2022
Edition: 1
Pages: 429-442
DOIs
Publication status: Published - 2022
Series: IEEE/ACM Transactions on Computational Biology and Bioinformatics
ISSN: 1545-5963

Keywords

  • Group contributions
  • cheminformatics
  • frequent subgraph mining
  • lasso regression
  • thermodynamics
  • Regression Analysis
  • Thermodynamics
