Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases

Publication: Contribution to journal, Journal article, Research, peer-reviewed

Abstract

Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, including metabolomics, systems biology, and drug discovery. No single database can be complete, so it is often necessary to combine data from several sources. However, the molecular structure recorded for a given compound is not necessarily consistent between databases. This article presents StructRecon, a novel tool for resolving unique molecular structures from database identifiers. Currently, identifiers from BiGG, ChEBI, Escherichia coli Metabolome Database (ECMDB), MetaNetX, and PubChem are supported. StructRecon traverses the cross-links between entries in different databases to construct what we call identifier graphs. These graphs aim to offer a more complete view of the total information available on a given compound across all the supported databases. To reconcile discrepancies encountered during the traversal, we develop an extensible model for molecular structure that supports multiple independent levels of detail, which allows standardization of the structure to be applied iteratively. In some cases, our standardization approach yields multiple candidate structures for a given compound; in that case, a random walk-based algorithm is used to select the most likely structure among the incompatible alternatives. As a case study, we applied StructRecon to the EColiCore2 model. We found at least one structure for 98.66% of its compounds, more than twice as many as can be found when the databases are used in more standard ways that do not consider the complex network of cross-database references captured by our identifier graphs. StructRecon is open-source and modular, which enables support for more databases in the future.
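The abstract does not give the details of the traversal or of the random walk. As a rough illustration only, the Python sketch below builds an identifier graph by breadth-first traversal of a hypothetical in-memory `cross_links` map. It then scores conflicting candidate structures with a random walk with restart from the queried identifier. This is not StructRecon's code or API. The identifiers, the placeholder structures `S_A`/`S_B`/`S_C`, the restart probability, the iteration count, and the voting rule are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy data: identifiers and structures are placeholders,
# not real database entries.
EXAMPLE_LINKS = {
    "bigg:cpd": ["metanetx:m1", "chebi:c1"],
    "metanetx:m1": ["chebi:c1", "pubchem:p1"],
    "chebi:c1": ["pubchem:p1"],
    "pubchem:p2": ["ecmdb:e1"],  # not reachable from bigg:cpd
}
EXAMPLE_STRUCTURES = {
    "chebi:c1": "S_A",
    "pubchem:p1": "S_A",
    "metanetx:m1": "S_B",  # conflicting structure for the same compound
    "pubchem:p2": "S_C",
}


def build_identifier_graph(start, cross_links):
    """Breadth-first traversal of outgoing cross-links from `start`.

    Returns an undirected adjacency dict over every identifier reached.
    """
    graph = {start: set()}
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in cross_links.get(node, ()):
            # Store each discovered link in both directions.
            graph[node].add(neighbour)
            graph.setdefault(neighbour, set()).add(node)
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return graph


def random_walk_scores(graph, start, restart=0.15, iterations=100):
    """Approximate random-walk-with-restart visit probabilities by power iteration."""
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            neighbours = graph[node]
            if not neighbours:
                nxt[start] += p  # dead end: all mass jumps back to the start
                continue
            share = (1.0 - restart) * p / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
            nxt[start] += restart * p
        prob = nxt
    return prob


def select_structure(graph, start, structures, **walk_kwargs):
    """Pick the candidate structure whose supporting identifiers gather the most mass."""
    scores = random_walk_scores(graph, start, **walk_kwargs)
    votes = {}
    for identifier, structure in structures.items():
        if identifier in scores:
            votes[structure] = votes.get(structure, 0.0) + scores[identifier]
    if not votes:
        return None  # no structure attached to any reachable identifier
    return max(votes, key=votes.get)


if __name__ == "__main__":
    g = build_identifier_graph("bigg:cpd", EXAMPLE_LINKS)
    print(select_structure(g, "bigg:cpd", EXAMPLE_STRUCTURES))  # prints S_A
```

The walk restarts at the queried identifier, which keeps the scores local to it. As a result, structures asserted by entries closely linked to the query outweigh those reached only through long chains of cross-references. The abstract does not say whether StructRecon weights its walk in this way.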

Original language: English
Journal: Journal of Computational Biology
Volume: 31
Issue number: 6
Pages (from-to): 498-512
ISSN: 1066-5277
DOI
Status: Published - 2024
