
md_harmonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases



Huan Jin et al. Metabolites. 2023 Dec 17;13(12):1199. doi: 10.3390/metabo13121199. Free PMC article.

Abstract

A major challenge to integrating public metabolic resources is the use of different nomenclatures by individual databases. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across various metabolic databases. The md_harmonize package uses a neighborhood-specific graph coloring method to generate a unique identifier for each compound from atom identifiers derived from the compound’s chemical structure. The resulting harmonized compounds and reactions can be used for various downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized using a variety of computational techniques to make certain NP-complete problems handled by the software tractable for these specific use cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.
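The general idea of deriving a compound identifier from atom identifiers can be illustrated with a small neighborhood-based coloring sketch. This is not the md_harmonize implementation: the function names `color_atoms` and `compound_identifier`, the input format (element symbols plus index pairs for bonds), and the fixed number of refinement rounds are all assumptions made for illustration, and bond types are ignored.

```python
from collections import Counter


def color_atoms(elements, bonds, rounds=2):
    """Give each atom a color built from its element and its neighbors' colors.

    elements: list of element symbols, one per atom (hypothetical input format).
    bonds: list of (i, j) index pairs; bond order is ignored in this sketch.
    """
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    colors = list(elements)  # round 0: the element symbol alone
    for _ in range(rounds):
        # Each new color is computed from the previous round's colors only.
        colors = [
            colors[i] + "(" + ",".join(sorted(colors[j] for j in neighbors[i])) + ")"
            for i in range(n)
        ]
    return colors


def compound_identifier(elements, bonds, rounds=2):
    """Combine the atom colors into an identifier that ignores atom numbering."""
    counts = Counter(color_atoms(elements, bonds, rounds))
    return ";".join(f"{color}x{count}" for color, count in sorted(counts.items()))


if __name__ == "__main__":
    # Heavy-atom skeleton of ethanol under two different atom numberings.
    print(compound_identifier(["C", "C", "O"], [(0, 1), (1, 2)]))
    print(compound_identifier(["O", "C", "C"], [(0, 1), (1, 2)]))
```

Because the identifier depends only on the multiset of atom colors, renumbering the atoms does not change it. Simple refinement of this kind cannot separate every pair of distinct structures, so it should be read as a sketch of the principle rather than a complete harmonization method.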


Keywords:

Python package; database harmonization; maximum common substructure; metabolite.


Conflict of interest statement

The authors declare no conflict of interest.

Figures


Figure 1

Example of matrix representation of a compound structure: (A) KEGG compound C00207 with the atoms numbered for comparison to rows and columns in the matrix; (B) matrix representation of KEGG compound C00207.


Figure 2

Example of a mapping matrix between two compound structures: (A) KEGG compound C00207 with atoms numbered to match rows in the matrix; (B) KEGG compound C00466 with atoms numbered to match columns in the matrix; and (C) the mapping matrix between KEGG compound C00207 and KEGG compound C00466.


Figure 3

Flowchart of backtracking algorithm for generating one-to-one atom mappings of two compound structures.


Figure 4

Example of the shortest distances between all pairs of atoms in a compound structure: (A) KEGG compound C00466 with atoms numbered to match rows and columns in the matrix; (B) the shortest distance matrix D of KEGG compound C00466.
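A shortest distance matrix of this kind can be sketched with a breadth-first search from every atom. The sketch below assumes that distance is counted in bonds and that every bond has unit length; the function name `shortest_distance_matrix` and the input format are hypothetical and are not taken from md_harmonize.

```python
from collections import deque


def shortest_distance_matrix(n_atoms, bonds):
    """Return D where D[i][j] is the number of bonds on a shortest path i -> j.

    Unreachable pairs keep the value -1. Bonds are (i, j) index pairs.
    """
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    D = [[-1] * n_atoms for _ in range(n_atoms)]
    for source in range(n_atoms):
        D[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if D[source][v] == -1:  # first visit gives the shortest distance
                    D[source][v] = D[source][u] + 1
                    queue.append(v)
    return D


if __name__ == "__main__":
    # Hypothetical four-atom branched skeleton: atom 1 is bonded to atoms 0, 2, and 3.
    for row in shortest_distance_matrix(4, [(0, 1), (1, 2), (1, 3)]):
        print(row)
```

The matrix is symmetric for an undirected bond graph, so a larger structure could compute only the upper triangle and mirror it.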


Figure 5

Flowchart of the modified Dijkstra algorithm for generating the shortest distance between each atom and the R groups in a compound. The “*” represents the multiplication operator.


Figure 6

Shortest distance to the R groups in a compound structure: (A) KEGG compound C05205 with atoms numbered to match indices in the array; (B) the array of shortest distances from each atom to the R groups in KEGG compound C05205.
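An array of distances from each atom to the nearest R group can be sketched with a plain multi-source Dijkstra search. This is the textbook algorithm with unit bond weights, not the modified version described in the paper; the function name `distance_to_r_groups` and the input format are assumptions for illustration.

```python
import heapq


def distance_to_r_groups(n_atoms, bonds, r_group_atoms):
    """Return dist where dist[i] is the bond count from atom i to its nearest R group.

    All R group atoms start in the priority queue at distance 0, so one pass
    covers every source. Atoms that cannot reach any R group keep infinity.
    """
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    dist = [float("inf")] * n_atoms
    heap = []
    for r in r_group_atoms:
        dist[r] = 0
        heapq.heappush(heap, (0, r))

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry; a shorter path was already found
        for v in neighbors[u]:
            if d + 1 < dist[v]:
                dist[v] = d + 1
                heapq.heappush(heap, (d + 1, v))
    return dist


if __name__ == "__main__":
    # Hypothetical five-atom chain with R groups at both ends (atoms 0 and 4).
    print(distance_to_r_groups(5, [(0, 1), (1, 2), (2, 3), (3, 4)], [0, 4]))
```

With unit weights a multi-source breadth-first search would give the same result; the priority queue is kept here so the sketch extends to weighted bonds.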


Figure 7

Organization of the md_harmonize package presented with UML diagrams. (A) UML package diagram of the md_harmonize Python library; (B) UML class diagram of the md_harmonize Python package.


Figure 8

Command-line interface of the md_harmonize package.


Figure 9

Comparison of substructure performance after algorithm optimization.


Figure 10

Example of an incorrect compound pair, indicated via an HMDB reference, with different structure representations: (A) MetaCyc CPD-10813; (B) HMDB HMDB0000265.

