This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

TCM-MS2Link: A Unified AI-Ready Dataset Integrating TCM Herb–Compound Knowledge and MS/MS Spectral Data

1 School of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
2 School of Information Engineering, China Jiliang University, Hangzhou 310018, China
3 Center for Metrology Scientific Data, National Institute of Metrology, Beijing 100029, China
4 National Metrology Data Center, Beijing 100029, China
5 Key Laboratory of Metrology Digitalization and Digital Metrology, State Administration for Market Regulation, Beijing 100029, China
* Author to whom correspondence should be addressed.
Data 2026, 11(5), 113; https://doi.org/10.3390/data11050113
Submission received: 30 March 2026 / Revised: 29 April 2026 / Accepted: 8 May 2026 / Published: 10 May 2026
(This article belongs to the Section Chemoinformatics)

Abstract

This study presents TCM-MS2Link, a standardized mass spectrometry-based association dataset for traditional Chinese medicine (TCM), intended as a resource for natural product research in TCM. The dataset adopts a dual-layer “knowledge–data” architecture. The first layer, TCM-MolLink, comprises curated herb–compound association data, constructed by integrating multiple heterogeneous databases and applying rigorous consistency filtering to establish high-confidence relationships between TCM herbs and their chemical constituents. The second layer, MS2-MLReady, is a benchmark dataset for mass spectrometry-based machine learning; after systematic data cleaning, standardized preprocessing, and well-designed data partitioning, it can directly support the training and evaluation of artificial intelligence models. TCM-MS2Link addresses key limitations of existing public resources, including data fragmentation, inconsistent annotations, and insufficient computational usability, which are major bottlenecks in the systematic analysis of TCM components and in data-driven research. By improving the reliability of herb–compound associations and the modeling readiness of mass spectrometry data, it provides a standardized and reusable data foundation for applications such as TCM knowledge base construction and automated spectrum–structure identification, thereby supporting the advancement of TCM informatics.
Keywords: traditional Chinese medicine; natural products; MS/MS spectra; spectral dataset; mass spectrometry

Share and Cite

MDPI and ACS Style

Li, Q.; Zhao, F.; Zhang, J.; Zhou, H.; Guo, L.; Xiong, X. TCM-MS2Link: A Unified AI-Ready Dataset Integrating TCM Herb–Compound Knowledge and MS/MS Spectral Data. Data 2026, 11, 113. https://doi.org/10.3390/data11050113

Chicago/Turabian Style

Li, Qianjin, Feifan Zhao, Jihang Zhang, Heng Zhou, Lin Guo, and Xingchuang Xiong. 2026. "TCM-MS2Link: A Unified AI-Ready Dataset Integrating TCM Herb–Compound Knowledge and MS/MS Spectral Data" Data 11, no. 5: 113. https://doi.org/10.3390/data11050113
