GC-MS Fingerprinting Combined with Chemical Pattern-Recognition Analysis Reveals Novel Chemical Markers of the Medicinal Seahorse

Seahorse is a valuable marine-animal drug widely used in traditional Chinese medicine (TCM) and was first documented in the “Ben Cao Jing Ji Zhu” during the Liang Dynasty. Hippocampus kelloggi (HK) is the most common seahorse species in the medicinal material market and is one of the genuine sources of medicinal seahorse documented in the Chinese Pharmacopoeia. It is mainly cultivated in Shandong, Fujian, and Guangxi Provinces in China. However, pseudo-HK, represented by Hippocampus ingens (HI), which has a similar appearance and traits, is often found in the market, compromising the safety and efficacy of clinical use. Currently, there is a lack of reliable methods for identifying these species based on their chemical composition. In this study, we employed, for the first time, a strategy combining gas chromatography-mass spectrometry (GC-MS) fingerprints and chemical pattern recognition to identify HK and HI; it is also the first metabolomic study of the chemical components of HI. The results revealed that the chemical fingerprints of the two species were highly similar overall, although significant differences were also observed. Using hierarchical cluster analysis (HCA) and principal component analysis (PCA) based on the relative contents of the characteristic peaks, all 34 samples were successfully differentiated according to their species of origin, with samples from the same species forming distinct clusters. Moreover, nonadecanoic acid and behenic acid were detected exclusively in HK samples, further distinguishing them from HI samples. Additionally, the relative contents of lauric acid, tetradecanoic acid, pentadecanoic acid, n-hexadecanoic acid, palmitoleic acid, margaric acid, oleic acid, fenozan acid, eicosapentaenoic acid (EPA), and docosahexaenoic acid (DHA) differed significantly between HK and HI (p < 0.0001), as determined by an unpaired t-test. Orthogonal partial least squares discriminant analysis (OPLS-DA) identified seven components (DHA, EPA, n-hexadecanoic acid, tetradecanoic acid, palmitoleic acid, octadecanoic acid, and margaric acid) with high discriminatory value (VIP value > 1). Thus, nonadecanoic acid, behenic acid, and these seven compounds can be used as chemical markers for distinguishing HK from HI. In conclusion, our study developed a combined strategy of GC-MS fingerprinting and chemical pattern recognition for the identification of HK and HI, and discovered chemical markers that can directly differentiate between the two species. This study provides a foundation for the authentication of Hippocampus and is important for the conservation of wild seahorse resources.


Introduction
Seahorse is a valuable marine-animal drug widely used in traditional Chinese medicine (TCM). It was first documented in the "Ben Cao Jing Ji Zhu" during the Liang Dynasty and is revered for its kidney-warming, Yang-strengthening, knot-dispersing, and swelling-reducing properties, earning the title of "Animal Ginseng" [1]. The Chinese Pharmacopoeia (2020 edition) specifies five seahorse species as medicinal seahorse: Hippocampus kelloggi Jordan et Snyder, H. histrix Kaup, H. kuda Bleeker, H. trimaculatus Leach, and H. japonicus Kaup [2]. However, our preliminary market survey and literature research have revealed a significant issue of species misidentification among medicinal seahorse, especially for H. kelloggi (HK). HK is the most common medicinal seahorse species in the traditional medicine market, and it is cultured in Shandong, Guangxi, Hainan, and other provinces of China [3,4]. It is called "Xian-wen seahorse" in Chinese, meaning "a seahorse with linear stripes", because of the intermittent small white dots on its head and trunk that are connected by vertical lines (Figure 1A). As it happens, another seahorse, H. ingens (HI), also possesses dense white vertical lines (Figure 1B) [5], making it visually similar to HK and leading to its fraudulent sale as HK in the market. Regrettably, HI is a unique seahorse species found in the Eastern Pacific Ocean and is not native to Chinese waters. It has not been documented in any traditional medicine literature, and there are no current studies on its chemical composition and pharmacological activity. Its risks and efficacy as a medicine are unclear. Furthermore, there are no reports on the artificial breeding of HI to date. It is quite possible that the HI appearing in the market is wild and mistaken for HK in international trade. The misidentification of medicinal seahorse not only weakens the safety and efficacy of treatments, but also exacerbates the threat to dwindling populations of wild seahorse.

Species authenticity plays a crucial role in ensuring the quality and clinical efficacy of traditional Chinese medicine (TCM), making it a significant focus in research on TCM resources. Compared with plant-based drugs, animal-derived drugs exhibit components that more closely resemble the human body, facilitating their absorption and therapeutic effects [6,7]. Hence, they are often referred to as "medicinals with lineage affinity to flesh and blood" in TCM, and are known for their abundant resources, high activity, and remarkable healing properties. However, the diversity of species and the complexity of their composition, as well as challenges in species identification, pose a prominent issue of species confusion in animal-derived drugs, adversely affecting their efficacy and safety.
So far, the authenticity of seahorse has mainly been addressed through molecular techniques, such as DNA barcode-based identification using the COI gene [8–10]. However, this method is limited to the analysis of whole bodies, tissues, or powders from which DNA can be extracted, and it cannot be applied to extracts or preparations. Furthermore, DNA is chemically unstable and easily degraded, making it challenging to extract high-quality DNA from dried seahorse samples. Factors like temperature, light, and storage period can further impact the stability of DNA. Additionally, previous studies have indicated that the primary active components in Hippocampus are fatty acids, steroids, and other compounds [11]. DNA-based methods do not provide information on these active components, making them unsuitable for research establishing the effective substances, quality markers, and pharmacological activities of seahorse. Therefore, it is advisable to explore an identification strategy for seahorse based on its chemical components.
The TCM fingerprint technique has gained international recognition as an effective method for identifying natural products, and is valued for its ability to provide a comprehensive representation of the overall chemical composition of TCM [12]. Gas chromatography-mass spectrometry (GC-MS) is an effective method for the separation, analysis, and identification of fatty acids, sterols, and other volatile components, with the advantages of high sensitivity, simple pre-treatment, low solvent consumption, and good resolution, and it has been widely used in the study of the chemical composition of marine-animal drugs [11,13]. Chemical pattern recognition comprises multivariate analysis techniques, such as hierarchical cluster analysis (HCA), principal component analysis (PCA), and orthogonal partial least squares discriminant analysis (OPLS-DA), which can reveal the rules behind measured data and have significant advantages in distinguishing samples through the analysis and visualization of high-dimensional data [14]. In recent years, GC-MS fingerprinting combined with chemical pattern recognition has been effectively used in the identification, characterization, and quality evaluation of traditional medicines and foods rich in lipophilic components [15–19]. Therefore, it may be a viable approach to solving the aforementioned problem of medicinal seahorse.
In this study, in response to the issue of HI being falsely sold as genuine HK, a research strategy incorporating GC-MS fingerprints and chemical pattern recognition, based on the characteristic chromatographic peaks, was employed. The differences in the characteristic components of HK and HI were detected for the first time through GC-MS fingerprinting. Subsequently, the HCA, PCA, and OPLS-DA techniques were applied to identify chemical markers that can directly differentiate between the two species. This study presents a novel method for authenticating medicinal seahorse and is important for the conservation of wild seahorse resources.

Fingerprints of H. kelloggi and H. ingens
The GC-MS data for the 17 batches of HK were imported into the "Chromatographic Fingerprint Similarity Evaluation System for Traditional Chinese Medicine" (2012 version), and the width of the time window was set to 0.1 min. The peaks with relatively high contents and distinct separation were selected for multi-point calibration using HK1 as the reference spectrum, the marked peaks were matched, and the median method was used to generate the control spectrum (R). The GC-MS data for the control spectrum (R) and the 34 batches of Hippocampus were then loaded into the software, as shown in Figure 2, and the control spectrum (R) was used as the reference spectrum. Likewise, a multi-point calibration was performed, and the marked-peak matching method was employed to calculate the similarity between the spectrum of each batch and the control spectrum (R). As presented in Table 1, except for HI samples No. 1 and No. 5, the similarity values between all samples and the reference spectrum were above 0.930, indicating that the GC fingerprints of HK and HI were highly similar.
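The similarity calculation described above can be illustrated with a small sketch. The algorithm used by the evaluation software is not described in this text, so the cosine of the angle between peak-area vectors is used here as an assumed, commonly used similarity measure; the peak areas and the function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def median_reference(fingerprints):
    """Build a reference fingerprint as the per-peak median across batches."""
    return [median(peak) for peak in zip(*fingerprints)]


def cosine_similarity(a, b):
    """Cosine of the angle between two peak-area vectors (1.0 = same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical matched peak areas (arbitrary units), one row per batch.
batches = [
    [120.0, 45.0, 300.0, 80.0],
    [110.0, 50.0, 310.0, 75.0],
    [130.0, 40.0, 290.0, 85.0],
]
reference = median_reference(batches)  # plays the role of control spectrum (R)
similarities = [cosine_similarity(b, reference) for b in batches]
```

A similarity close to 1 means that the relative peak pattern of a batch closely follows the reference, which is why two species with many shared peaks can still score high.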
Additionally, an unpaired t-test was used to compare the relative content levels of the characteristic peaks across all samples in order to identify compounds that can serve as index components for the comparison of HK and HI. The p-value was used as the criterion for retaining components. The results revealed that the two species of Hippocampus differed significantly in their relative contents of a variety of compounds (Figure 3). The content levels of lauric acid, tetradecanoic acid, pentadecanoic acid, n-hexadecanoic acid, palmitoleic acid, margaric acid, and oleic acid in HK were significantly higher than those in HI (p < 0.0001), while the content levels of fenozan acid, EPA, and DHA in HI were significantly higher than those in HK (p < 0.0001). It is worth mentioning that in this study, Hippocampus was found to be rich in DHA and EPA. The relative content levels of DHA and EPA in HK extract were 4.21 ± 0.77% and 4.53 ± 1.39%, respectively, and their relative content levels in HI extract were 9.16 ± 3.13% and 10.04 ± 1.73%, respectively. The relative content levels of EPA and DHA were markedly higher in HI than in HK. EPA and DHA are both omega-3 polyunsaturated fatty acids. EPA can reduce blood viscosity, improve blood circulation, enhance tissue oxygenation, eliminate fatigue, and prevent atherosclerosis [20]. DHA, which is known as "brain gold", is a major component of prostate glands and sperm, is crucial for cell growth, and is involved in the maintenance of the nervous system [21–23]. DHA and EPA are important active components of several marine drugs which play a nourishing role and demonstrate strong pharmacological effects. Although HI contains very high levels of DHA and EPA, it has not been included in any pharmacopeia, and its pharmacological activity and effectiveness deserve further study.
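As an illustration of the unpaired comparison described above, the following sketch computes a two-sample t statistic. It assumes the pooled-variance (Student) form, which is not stated in the text, and the DHA values are hypothetical rather than the measured data.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical relative DHA contents (%) for a few batches of each species.
hk_dha = [4.0, 4.5, 3.8, 4.2, 4.6]
hi_dha = [9.0, 8.5, 10.2, 9.6, 8.9]
t_stat, df = unpaired_t(hk_dha, hi_dha)
```

The p-value follows from the t distribution with `df` degrees of freedom; a statistics package (for example `scipy.stats.ttest_ind`) would normally be used for that step.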

HCA
HCA is a clustering method that assesses the degree of dissimilarity or similarity between the objects to be clustered. HCA roughly groups similar samples of Hippocampus into the same cluster based on the relative content levels of each component in the extracts. To identify objective groupings among the Hippocampus samples, the relative content levels of 20 characteristic components in the extracts were analyzed using HCA, with the parameters set to "Between-groups linkage" and "Squared Euclidean distance". As shown in Figure 4, when the classification distance was 25, the 34 samples were divided into two categories. This demonstrated that HK and HI could be effectively distinguished by the HCA model based on those 20 characteristic components.
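The clustering settings named above can be sketched in a few lines. "Between-groups linkage" corresponds to average linkage: the distance between two clusters is the mean of all pairwise distances between their members. The implementation below is a minimal illustration of that rule with squared Euclidean distance, not the SPSS procedure itself, and the three-component sample values are hypothetical.

```python
from itertools import combinations


def sq_euclidean(u, v):
    """Squared Euclidean distance between two samples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def average_linkage(samples, n_clusters):
    """Merge the two clusters with the smallest mean pairwise squared
    Euclidean distance until only n_clusters remain."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        def between(pair):
            ca, cb = clusters[pair[0]], clusters[pair[1]]
            total = sum(sq_euclidean(samples[i], samples[j])
                        for i in ca for j in cb)
            return total / (len(ca) * len(cb))
        i, j = min(combinations(range(len(clusters)), 2), key=between)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical relative contents (%) of DHA, EPA and n-hexadecanoic acid.
samples = [
    [4.2, 4.5, 26.0], [4.0, 5.0, 27.1], [4.6, 4.1, 25.5],    # HK-like
    [9.1, 10.0, 21.0], [8.7, 9.5, 22.3], [10.2, 11.0, 20.1],  # HI-like
]
clusters = average_linkage(samples, n_clusters=2)
```

With these values the two remaining clusters coincide with the two hypothetical species groups.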

PCA
PCA is an unsupervised analytical method used for the clustering and visualization of high-dimensional data through the application of dimensionality reduction and the extraction of several comprehensive indicators that can be used to explain the information obtained [24]. The first two principal components (PCs) explained approximately 86.1% of the variability of the original data. From the PCA scatter plot it can be observed that HK and HI clustered into two distinct regions, which was consistent with the results of the cluster analysis (Figure 5).
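To make the "explained variability" figure concrete, the sketch below computes principal component scores and the fraction of variance explained by each component. It assumes mean-centring without further scaling, which may differ from the settings used in the study, and it relies on NumPy; the input values are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Return PC scores and the fraction of total variance each PC explains."""
    Xc = X - X.mean(axis=0)                        # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # variance ratio per component
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Hypothetical relative contents (%) of DHA, EPA and n-hexadecanoic acid.
X = np.array([
    [4.2, 4.5, 26.0], [4.0, 5.0, 27.1], [4.6, 4.1, 25.5],    # HK-like
    [9.1, 10.0, 21.0], [8.7, 9.5, 22.3], [10.2, 11.0, 20.1],  # HI-like
])
scores, explained = pca_scores(X)
```

Plotting the two columns of `scores` against each other gives a score plot; samples from the two groups fall on opposite sides of the first component in this toy data set.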

OPLS-DA
Although HCA and PCA could clearly distinguish between HK and HI, neither could show how individual variables contributed to the classification of a sample. Thus, the relative content levels of the 20 components were analyzed using a supervised multivariate technique, OPLS-DA, to further elucidate the distinction between HK and HI and to identify the critical variables (key markers) that can be applied for the categorization of the samples into either of the two species. A variable's contribution to the classification of the samples increases with its variable importance in the projection (VIP) value. In this study, a component with a VIP value > 1 was chosen as a primary chemical marker for the sample classification. Seven components with VIP values > 1, namely, DHA, EPA, n-hexadecanoic acid, tetradecanoic acid, palmitoleic acid, octadecanoic acid, and margaric acid, were chosen as prospective markers (Figure 6).

In combination with the results of the previous unpaired t-test, six compounds (DHA, EPA, n-hexadecanoic acid, tetradecanoic acid, palmitoleic acid, and margaric acid) were found to possess significant (p < 0.0001) discriminative values. The relative content levels of DHA were <5% in all HK samples except HK11, whereas they were >6% in all HI samples. The relative content levels of EPA were <7% in all HK samples, while they were >7% in almost all the HI samples, except HI15 and HI16. The relative content levels of n-hexadecanoic acid were >24% in all HK samples, while they were <24% in almost all the HI samples, except HI3 and HI6. The relative content levels of tetradecanoic acid were >5% in all the HK samples, whereas they were <5% in almost all the HI samples, except HI15 and HI16. The relative content levels of palmitoleic acid were >5.8% in all the HK samples except HK11, while they were <5.5% in almost all the HI samples, except HI4 and HI12. The relative content levels of margaric acid were >2.9% in all the HK samples except HK11, while they were <2.1% in all the HI samples.
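The cut-off values quoted above can be combined into a simple screening rule, sketched below. The cut-offs are taken from the text, but the majority-vote scheme, the function names, and the two example samples are hypothetical; since the text notes exceptions for individual batches, such a rule would be a rough pre-screen and not a validated classifier.

```python
# marker: (side on which HK falls, HK cut-off %, HI cut-off %), as quoted in the text
CUTOFFS = {
    "DHA": ("low", 5.0, 6.0),
    "EPA": ("low", 7.0, 7.0),
    "n-hexadecanoic acid": ("high", 24.0, 24.0),
    "tetradecanoic acid": ("high", 5.0, 5.0),
    "palmitoleic acid": ("high", 5.8, 5.5),
    "margaric acid": ("high", 2.9, 2.1),
}


def marker_votes(sample):
    """Count how many markers point to HK and how many point to HI."""
    votes = {"HK": 0, "HI": 0}
    for marker, (hk_side, hk_cut, hi_cut) in CUTOFFS.items():
        value = sample[marker]
        if hk_side == "low":
            if value < hk_cut:
                votes["HK"] += 1
            elif value > hi_cut:
                votes["HI"] += 1
        else:
            if value > hk_cut:
                votes["HK"] += 1
            elif value < hi_cut:
                votes["HI"] += 1
    return votes


def screen(sample):
    """Hypothetical majority vote over the six markers."""
    votes = marker_votes(sample)
    if votes["HK"] == votes["HI"]:
        return "inconclusive"
    return max(votes, key=votes.get)


# Hypothetical relative contents (%), for illustration only.
hk_like = {"DHA": 4.2, "EPA": 4.5, "n-hexadecanoic acid": 26.0,
           "tetradecanoic acid": 5.6, "palmitoleic acid": 6.1, "margaric acid": 3.0}
hi_like = {"DHA": 9.2, "EPA": 10.0, "n-hexadecanoic acid": 21.0,
           "tetradecanoic acid": 4.1, "palmitoleic acid": 4.9, "margaric acid": 1.8}
results = [screen(hk_like), screen(hi_like)]
```

Values that fall between the two cut-offs of a marker (for example DHA between 5% and 6%) cast no vote for either species.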

Samples
Hippocampus samples were collected from the Chengdu Hehuachi Chinese Herbal Medicine Market, Guangxi Yulin Chinese Herbal Medicine Market, and Guangzhou Qingping Chinese Herbal Medicine Market. A total of 34 batches, including 17 batches each of H. kelloggi and H. ingens, were collected and identified using a strategy of combined morphological identification and DNA barcoding. Information on the 34 batches of samples is shown in Table 4.
Sodium hydroxide, hydrochloric acid, anhydrous ethanol, and anhydrous sodium sulfate, all of analytical grade, were purchased from the Chengdu Kolong Chemical Co., Ltd. (Chengdu, China); n-hexane and methanol, both of chromatographic grade, were purchased from Thermo Fisher Scientific (China) Co., Ltd. (Shanghai, China).

Extraction of Samples
The different batches of seahorses were crushed separately to obtain a coarse powder. Then, 2 g of each batch of powder was accurately weighed and extracted with a tenfold amount of anhydrous ethanol for 60 min by ultrasonication. The extracts were centrifuged for 10 min at 3000 rpm, and the supernatant was collected. The supernatant was concentrated to obtain the total extract.

Methyl Esterification
In accordance with the relevant literature [25–28], an acid-base combined catalysis method was selected. Because different incubation times are used in the literature, the incubation time was optimized. Nine incubation times, namely, 0.5, 1, 2, 5, 10, 20, 30, 60, and 120 min, were tested; the results are shown in Supplementary Figure S1 and Supplementary Tables S2 and S3. Although the peak areas of the fatty acid methyl esters increased with longer incubation times (Table S2), reflecting more complete methyl esterification, the final calculated relative content levels of the fatty acids changed little (Table S3), and there was little difference between the incubation times. The similarities between these nine GC fingerprints (Figure S1) were all greater than 0.993 (Table 5), as evaluated by the "Chromatographic Fingerprint Similarity Evaluation System for Traditional Chinese Medicine" (2012 version). These findings indicate that the incubation time has little impact on the results. Based on the above and on the criterion of experimental efficiency, we chose the method of Jiang et al. [26]. The sample extracts were methylated as follows: the total extract was dissolved in n-hexane, and the same solvent was used to bring the final volume to 10 mL. Then, 2 mL was taken, and 1 mL of 2 mol/L NaOH-CH3OH was added for saponification, followed by vortexing and shaking for 10 min. Next, the solution was heated in a water bath at 50 °C for 5 min. The solution was cooled to room temperature, methyl esterified by adding 2 mL of 2 mol/L HCl-CH3OH, vortexed and shaken for 10 min, and then heated for 5 min in a water bath at 50 °C. The supernatant was washed with 2 mL of distilled water, the aqueous layer was removed, and, after dehydration with anhydrous sodium sulfate, the supernatant was collected and filtered through a 0.22 µm microporous membrane for GC-MS analysis.

GC-MS Analysis
The chromatographic conditions were set as follows: HP-5 MS column (Agilent, Santa Clara, CA, USA; 30 m × 0.25 mm × 0.25 µm); carrier gas: He (99.999% purity); flow rate: [...]
The compounds were identified through the NIST14 mass spectrometry database (National Institute of Standards and Technology). The area normalization method was used to calculate the relative content of each compound in the chromatogram.
For the precision test, the samples were analyzed in five successive injections. The similarities between the spectra were >0.99. The relative retention time RSD of the 15 major common peaks was <0.01% and the peak area RSD was 0.02–0.16%, indicating the high precision of the instrument used.
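The area normalization step mentioned above expresses each peak as a percentage of the total integrated peak area. A minimal sketch follows; the peak areas are hypothetical and the function name is chosen only for illustration.

```python
def relative_contents(peak_areas):
    """Area normalization: each peak area as a percentage of the total area."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical integrated peak areas (arbitrary units) from one chromatogram.
areas = {
    "tetradecanoic acid": 520.0,
    "n-hexadecanoic acid": 2600.0,
    "EPA": 450.0,
    "DHA": 430.0,
    "other peaks": 6000.0,
}
contents = relative_contents(areas)
```

Because every value is divided by the same total, the relative contents of one chromatogram always sum to 100%, which is consistent with the observation that more complete esterification raises the absolute peak areas while changing the relative contents only slightly.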

Stability Test
The samples were analyzed at six time points: 0, 2, 4, 8, 16, and 24 h. The similarities between all the obtained spectra were >0.99. The relative retention time RSD of the 15 major common peaks was <0.01%, and the peak area RSD ranged from 0.13 to 0.27%, indicating that the sample solution was stable for at least 24 h.

Repeatability Test
Five samples from the same batch of seahorses were extracted and analyzed separately. The similarity between the obtained spectra was >0.99, as determined by the repeatability test. The relative retention time RSD of the 15 major common peaks was <0.01%, and the peak area RSD was 0.06–0.25%, indicating that the method used was highly reproducible.
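The relative standard deviation (RSD) used in the precision, stability, and repeatability tests is the sample standard deviation divided by the mean, expressed as a percentage. The sketch below shows the calculation for one peak; the replicate peak areas are hypothetical.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%): sample standard deviation / mean * 100."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical peak areas of one common peak across five replicate runs.
replicate_areas = [1000.0, 1002.0, 999.0, 1001.0, 998.0]
rsd = rsd_percent(replicate_areas)
```

For these values the RSD is roughly 0.16%, which is of the same order as the peak area RSDs reported for the validation tests.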

Data Analysis
The similarity between the GC fingerprints was evaluated using the "Chromatographic Fingerprint Similarity Evaluation System for Traditional Chinese Medicine" (2012 version). The unpaired t-tests were carried out using GraphPad Prism 9 (GraphPad Software Inc., La Jolla, CA, USA). The data were also analyzed by HCA, PCA, and OPLS-DA using SPSS 26.0 (SPSS Inc., Chicago, IL, USA) or SIMCA-P 14.1 (Umetrics, Umeå, Sweden). The results are expressed as mean values ± standard error of the mean (SEM), and p < 0.05 was considered to be statistically significant.

Conclusions
In summary, this study established, for the first time, an effective strategy combining GC-MS fingerprints and chemical pattern recognition to identify genuine HK seahorse and its counterfeit, HI seahorse, and further found chemical markers that can be used directly for the identification. The GC-MS fingerprints showed that there were 15 peaks common to HK and HI. Although the fingerprints of HK and HI were similar, significant differences were also observed. The comprehensive analysis utilizing GC-MS fingerprints, HCA, and PCA demonstrated a clear distinction between HK and HI samples. Notably, nine compounds, namely nonadecanoic acid, behenic acid, DHA, EPA, n-hexadecanoic acid, tetradecanoic acid, palmitoleic acid, octadecanoic acid, and margaric acid, were identified as chemical markers crucial for distinguishing HK from HI. These findings emphasize the significance of species origin in determining the quality of TCM. This study provides a scientific and technical foundation for the explicit authentication of Hippocampus, paves the way for further research on its chemical composition, and contributes to the conservation of wild natural resources.

Figure 2. GC-MS chromatograms of HK and HI samples.

Figure 4. HCA dendrogram of 34 Hippocampus samples from the two species using the between-groups linkage method based on squared Euclidean distance.

Figure 5. Score plot of principal component analysis of 34 samples of Hippocampus from two species.

Molecules 2023 , 17 Figure 6 .
Figure 6. VIP values of the 20 characteristic components of 34 samples of Hippocampus from the two species, based on OPLS-DA.

Table 2. The relative content levels of the peaks common to the two species in HK.

Table 3. The relative content levels of the peaks common to the two species in HI.

Table 4. Information on the 34 batches of Hippocampus samples.

Table 5. Similarity of GC fingerprints for nine different incubation times.