
md_harmonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases

Metabolites. 2023 Dec 17;13(12):1199.

doi: 10.3390/metabo13121199.


Huan Jin et al.




A major challenge in integrating public metabolic resources is that individual databases use different nomenclatures. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across metabolic databases. The package uses a neighborhood-specific graph coloring method to derive atom identifiers from a compound's chemical structure and, from these, a unique identifier for each compound. The resulting harmonized compounds and reactions can be used for downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized with a variety of computational techniques so that certain NP-complete problems handled by the software remain tractable for these specific use cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.
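
To make the idea of structure-derived identifiers concrete, the following is a minimal sketch of generic iterative color refinement, not the neighborhood-specific method implemented in md_harmonize. The function names refine_colors and compound_identifier, the number of rounds, and the use of a SHA-256 hash are all assumptions made for illustration; bond orders and stereochemistry are ignored.

```python
import hashlib


def refine_colors(elements, bonds, rounds=3):
    """Recolor each atom from its own color plus the sorted colors of its neighbors."""
    neighbors = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    colors = list(elements)  # round 0: the element symbol is the initial color
    for _ in range(rounds):
        colors = [
            colors[i] + "(" + ",".join(sorted(colors[j] for j in neighbors[i])) + ")"
            for i in range(len(elements))
        ]
    return colors


def compound_identifier(elements, bonds, rounds=3):
    """Hash the sorted atom colors so the result does not depend on atom order."""
    colors = refine_colors(elements, bonds, rounds)
    return hashlib.sha256("|".join(sorted(colors)).encode()).hexdigest()


# Hypothetical input: the same branched skeleton written with two atom orderings
first = compound_identifier(["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])
second = compound_identifier(["O", "C", "C", "C"], [(0, 1), (1, 2), (1, 3)])
print(first == second)
```

Because the atom colors are sorted before hashing, two entries that describe the same connectivity with different atom numbering receive the same identifier in this sketch. Plain color refinement cannot distinguish every pair of non-identical graphs, so it should be read as an illustration of the principle only.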


Keywords: Python package; database harmonization; maximum common substructure; metabolite.

Conflict of interest statement

The authors declare no conflict of interest.


Figure 1

Example of matrix representation of a compound structure: (A) KEGG compound C00207 with the atoms numbered for comparison to rows and columns in the matrix; (B) matrix representation of KEGG compound C00207.
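
A matrix representation of this kind can be sketched as a symmetric atom-by-atom table. The example below assumes that each entry stores the bond order between two atoms and that 0 means no bond; the function name bond_matrix and the four-atom skeleton are hypothetical and are not taken from the package or from the figure.

```python
def bond_matrix(n_atoms, bonds):
    """Return an n x n symmetric matrix; entry [i][j] is the bond order, 0 if unbonded."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        matrix[i][j] = order
        matrix[j][i] = order  # bonds are undirected, so mirror the entry
    return matrix


# Hypothetical skeleton: atoms 0 and 2 single-bonded to atom 1, atom 3 double-bonded to atom 1
example = bond_matrix(4, [(0, 1, 1), (1, 2, 1), (1, 3, 2)])
for row in example:
    print(row)
```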

Figure 2

Example of a mapping matrix between two compound structures: (A) KEGG compound C00207 with atoms numbered to rows in the matrix; (B) KEGG compound C00466 with atoms numbered to columns in the matrix; and (C) mapping matrix between KEGG compound C00207 and KEGG compound C00466.
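
One simple way to build such a matrix is to mark which atom pairs are candidates for mapping. The sketch below assumes that compatibility means "same element symbol" only; the criteria used by md_harmonize may be stricter, and the function name mapping_matrix is hypothetical.

```python
def mapping_matrix(elements_a, elements_b):
    """Rows follow the first structure, columns the second; 1 marks a candidate atom pair."""
    return [[1 if a == b else 0 for b in elements_b] for a in elements_a]


# Hypothetical inputs: a two-atom structure against a three-atom structure
candidates = mapping_matrix(["C", "O"], ["O", "C", "C"])
for row in candidates:
    print(row)
```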

Figure 3

Flowchart of backtracking algorithm for generating one-to-one atom mappings of two compound structures.
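
The general shape of a backtracking search for one-to-one atom mappings can be sketched as follows. This is a plain textbook version written under simplifying assumptions (atoms match on element symbol, bonds are unweighted, no pruning heuristics), not a transcription of the flowchart; the function name atom_mappings is hypothetical.

```python
def atom_mappings(elements_a, bonds_a, elements_b, bonds_b):
    """Yield each one-to-one mapping of A's atoms into B that keeps elements and A's bonds."""
    adj_a = {frozenset(bond) for bond in bonds_a}
    adj_b = {frozenset(bond) for bond in bonds_b}
    n_a = len(elements_a)

    def extend(mapping):
        i = len(mapping)  # next atom of A to place; mapping[k] is the B atom chosen for k
        if i == n_a:
            yield dict(enumerate(mapping))
            return
        for j in range(len(elements_b)):
            if j in mapping or elements_a[i] != elements_b[j]:
                continue
            # Every bond from atom i to an already placed atom k must also exist in B
            if all(frozenset((j, mapping[k])) in adj_b
                   for k in range(i) if frozenset((i, k)) in adj_a):
                mapping.append(j)
                yield from extend(mapping)
                mapping.pop()  # backtrack and try the next candidate

    yield from extend([])


# Hypothetical inputs: a C-O fragment searched for inside a C-C-O chain
found = list(atom_mappings(["C", "O"], [(0, 1)], ["C", "C", "O"], [(0, 1), (1, 2)]))
print(found)
```

The search places one atom at a time and abandons a partial assignment as soon as a bond of the first structure has no counterpart in the second, which is what keeps the number of explored assignments far below the full set of permutations in typical cases.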

Figure 4

Example of the shortest distance between any two atoms in a compound structure: (A) KEGG compound C00466 with atoms numbered to rows and columns in the matrix; (B) the shortest distance matrix D of KEGG compound C00466.
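
A shortest distance matrix D of this kind can be computed by running a breadth-first search from every atom. The sketch below assumes that distance is counted in bonds and that unreachable atoms are marked with -1; the function name distance_matrix and the example skeleton are hypothetical.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Return D where D[i][j] is the number of bonds on a shortest path from i to j."""
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[-1] * n_atoms for _ in range(n_atoms)]  # -1 marks "not reached"
    for source in range(n_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[source][v] == -1:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Hypothetical skeleton: atoms 0, 2 and 3 are all bonded to the central atom 1
D = distance_matrix(4, [(0, 1), (1, 2), (1, 3)])
for row in D:
    print(row)
```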

Figure 5

Flowchart of the modified Dijkstra algorithm for generating the shortest distance between each atom and the R groups in a compound. The “*” represents the multiplication operator.

Figure 6

Shortest distance to the R groups in a compound structure. (A) KEGG compound C05205 with atoms numbered to indices in the array; (B) the array of the shortest distance from each atom to R groups in KEGG compound C05205.
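
An array of this kind can be produced with a multi-source shortest-path search that starts from all R groups at once. The sketch below uses the standard Dijkstra algorithm with unit bond weights, not the modified variant described in the paper; it assumes that R groups are labeled with the symbol "R", and the function name distance_to_r_groups is hypothetical.

```python
import heapq


def distance_to_r_groups(elements, bonds):
    """Return, for each atom, the number of bonds to the nearest atom labeled 'R'."""
    neighbors = [[] for _ in elements]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [float("inf")] * len(elements)
    heap = []
    for i, symbol in enumerate(elements):
        if symbol == "R":  # every R group is a source at distance 0
            dist[i] = 0
            heapq.heappush(heap, (0, i))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v in neighbors[u]:
            if d + 1 < dist[v]:
                dist[v] = d + 1
                heapq.heappush(heap, (d + 1, v))
    return dist


# Hypothetical chain R-C-C-C-R
print(distance_to_r_groups(["R", "C", "C", "C", "R"], [(0, 1), (1, 2), (2, 3), (3, 4)]))
```

With unit weights a breadth-first search would give the same result; the priority queue is kept here only so the sketch stays close to the Dijkstra formulation named in the Figure 5 caption.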

Figure 7

Organization of the md_harmonize package presented with UML diagrams. (A) UML package diagram of the md_harmonize Python library; (B) UML class diagram of the md_harmonize Python package.

Figure 8

Command line interface of md_harmonize package.

Figure 9

Comparison of substructure performance after algorithm optimization.

Figure 10

Example of incorrect compound pair indicated via HMDB reference with different structure representations. (A) MetaCyc CPD-10813; (B) HMDB HMDB0000265.

