3.1. Heatmap and Cluster Analysis
A total of 297 volatile organic compounds (VOCs) were tentatively identified across all analyzed milk samples. The heatmap, together with the dendrogram, revealed a clear hierarchical structure in the volatile profiles of the cow milk and plant-based milk samples (
Figure 1). The most prominent pattern was the primary division between animal- and plant-derived samples. Cow milk appeared as a clear outlier, showing a dense, high-abundance red block in the first third of the volatile compounds; these features were almost entirely absent in all plant-based samples, which were predominantly blue in this region. This strong compositional difference was also reflected in the dendrogram, where cow milk occupied its own distinct primary branch, confirming that its molecular profile was substantially different from all plant-based alternatives [
7,
10].
A particularly notable feature was the distinctive behavior of barley milk. In the heatmap, barley milk exhibited a unique set of high-abundance volatiles in the middle-right section, forming red bands that did not overlap with either cow milk or the other cereal-based plant milks. This indicates a singular volatile fingerprint, likely associated with malt-like characteristics. This uniqueness was supported by the dendrogram: although barley milk remained clearly distinct from cow milk, it clustered closer to cow milk than many of the other plant-based alternatives. This suggests that barley milk shares some degree of overall compositional similarity with cow milk, despite possessing its own characteristic volatile signature, a finding that has been scarcely reported in previous studies.
Another important pattern emerged in the commercial plant-based milk group, which included soy, oat, almond, and coconut milks. Although these products originate from biologically different raw materials (such as legumes, grains, and nuts), the heatmap showed that they shared several common high-abundance stripes on the right side. These shared signals may constitute an industrial fingerprint, likely representing markers of Maillard reactions generated during thermal sterilization (e.g., UHT processing) or common stabilizers and masking agents used to standardize commercial plant milks [
11]. The dendrogram also supported this interpretation by grouping almond and coconut milks together, while soy and oat milks formed another nearby sub-cluster, suggesting that processing-related similarities may partially override differences in botanical origin.
In contrast, the laboratory-prepared plant milk samples (sorghum, rice, corn, and quinoa milks, excluding barley) formed a relatively neutral group in the heatmap, characterized by generally low abundances across many of the VOCs associated with cow milk. This greenish cluster defined the baseline for a non-dairy, pure plant volatile profile. The dendrogram further resolved this pattern by clustering sorghum, rice, and corn milks closely together, indicating highly similar volatile patterns that likely reflect a composition dominated by carbohydrates. These samples showed very low abundances in parts of the regions that are specific to barley samples. Additionally, unlike the commercial plant-based milks, this laboratory-prepared group lacked the red bands observed on the right side of the heatmap, further supporting the idea that their volatile profiles better preserved the integrity of the raw material and showed little evidence of industrial processing signatures.
Overall, the combined heatmap and dendrogram analyses demonstrated that sample grouping was driven by both biological origin and processing history. Cow milk was clearly separated from all plant-based samples, barley milk showed a unique profile while clustering closer to cow milk than the other plant-based milks, the commercial plant-based milks shared processing-related signatures despite their different raw materials, and the laboratory-prepared grain-based samples formed a mild, neutral group characterized by low-abundance, less industrially altered volatile profiles. These results highlight that volatile composition in milk alternatives is shaped not only by the source material itself but also by the extent and type of industrial processing [
10,
11].
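The grouping logic behind a dendrogram of this kind can be illustrated with a minimal sketch. The text does not state which distance metric or linkage rule was used, so Euclidean distance and average linkage are assumptions here, and the abundance values and the `profiles` table are hypothetical rather than data from this study.

```python
import math

# Hypothetical relative abundances (4 VOCs per sample); not data from this study.
profiles = {
    "cow":     [9.0, 8.5, 0.2, 0.1],
    "barley":  [2.0, 1.5, 7.0, 0.3],
    "rice":    [0.3, 0.2, 0.5, 0.4],
    "sorghum": [0.4, 0.3, 0.6, 0.5],
}

def euclidean(a, b):
    """Euclidean distance between two abundance vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters (assumed linkage)."""
    dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)

def agglomerate(names):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        pairs = [(average_linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        dist, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], round(dist, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = agglomerate(list(profiles))
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

In this toy table the two most similar samples merge first and the most dissimilar sample joins last, which is how an outlier ends up on its own primary branch. The toy merge order is illustrative only and does not reproduce the branch structure reported in Figure 1.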
3.2. Principal Component Analysis (PCA)
PCA was performed on unit variance-scaled data to evaluate differences in volatile profiles among cow milk, commercial plant-based milks, and laboratory-prepared grain-based samples. The first three principal components explained 62.2% of the total variance, with PC1, PC2, and PC3 accounting for 26.0%, 24.0%, and 12.2%, respectively (
Figure 2). Eigenvalue analysis showed that PC1 (76.84), PC2 (70.94), and PC3 (36.19) made the largest contributions among the retained components and together captured the main structure of variation in the dataset. Although PC1 and PC2 explained the major variation, PC3 provided additional resolution for clearer spatial separation among samples in the 3D score plot.
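As a quick consistency check on the figures above, the share of variance explained by a component equals its eigenvalue divided by the total variance. The total variance is not reported in the text, so the sketch below backs it out from each reported pair; the variable names are illustrative.

```python
# Values as reported in the text above.
eigenvalues = {"PC1": 76.84, "PC2": 70.94, "PC3": 36.19}
explained_pct = {"PC1": 26.0, "PC2": 24.0, "PC3": 12.2}

# Cumulative variance captured by the three retained components.
cumulative = round(sum(explained_pct.values()), 1)

# explained fraction = eigenvalue / total variance,
# so each (eigenvalue, percentage) pair implies a total variance.
implied_total = {
    pc: eigenvalues[pc] / (explained_pct[pc] / 100)
    for pc in eigenvalues
}

print("cumulative %:", cumulative)
for pc, total in implied_total.items():
    print(pc, "implies a total variance of about", round(total, 1))
```

The three implied totals agree to within rounding and fall close to the 297 compounds reported in Section 3.1. That is what unit variance scaling would be expected to give if each variable contributes a variance of about one, although this is an inference and not something the text states.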
PC1 (26.0%) clearly separated cow milk from all plant-based samples, indicating that the major source of variation was the fundamental biological difference between animal milk and plant-based matrices. Cow milk was located at the positive end of PC1, consistent with its characteristic high-abundance volatile compounds, whereas all plant-based milks clustered on the negative side, reflecting the absence of key dairy-associated aroma markers.
PC2 (24.0%) mainly distinguished barley milk from the other grain-based samples. Barley milk was positioned on the extreme negative side of PC2, indicating a unique volatile fingerprint, likely related to its aroma compounds characteristic of malt and cereal notes. In contrast, sorghum, rice, corn, and quinoa milks were positioned closer to the center along PC2, suggesting relatively milder and less distinctive volatile profiles.
PC3 (12.2%) represented a secondary separation associated with sample origin and processing background. Most commercial samples were located in the positive PC3 region, whereas laboratory-prepared samples were generally positioned away from this cluster. The commercial samples were also more broadly scattered, suggesting greater variability and compositional heterogeneity within this group. This suggests that PC3 reflected volatile characteristics related to industrial processing, such as UHT treatment and notes derived from Maillard reactions, together with the more complex volatile background of animal milk [
11]. In contrast, the laboratory-prepared samples remained chemically distinct from this commercial cluster.
The relatively tight clustering of the laboratory-prepared samples also indicates good reproducibility, suggesting that the preparation procedure was sufficiently controlled and uniform. In addition, their more consistent volatile profiles, which were less influenced by processing, may provide a cleaner basis for the future formulation of plant-based milk substrates that could serve as promising alternatives to cow milk.
3.3. Functional Group Analysis
Alkanes, alkenes, esters, alcohols, and ketones were the major volatile classes forming the structural basis of the aroma profile. Against this background, cow milk showed a characteristic enrichment in carboxylic acids (
Figure 3). This feature is a typical marker of dairy aroma and likely reflects lipid and protein degradation pathways associated with animal fat, contributing to a sweeter, creamier, and more dairy-like volatile background. It also helps explain the clear separation of cow milk from the plant-based samples along PC1 [
12].
The higher levels of pyridines and ketones observed in grain-based samples (
Figure 3) such as corn, barley, and quinoa may be associated with heat reactions, including Maillard-type pathways and Strecker degradation, during extraction, homogenization, or sterilization. These reactions generate roasted, toasted, and cereal-like volatiles that are more characteristic of processed plant matrices than of fresh dairy milk [
13]. The greater abundance of phenolic compounds in the plant-based groups, especially in soy, can also be linked to the intrinsic composition of the raw materials. Phenols are widely distributed in plant tissues as secondary metabolites or degradation products, and their predominance in plant-based milks, together with their near absence in bovine milk, highlights a fundamental compositional difference between plant and animal material matrices. These compounds may also contribute beany, smoky, or astringent sensory notes commonly associated with plant-based products [
4,
14]. In addition, the elevated levels of ketones and aromatic hydrocarbons in certain plant-based samples likely stem from the autoxidation or enzymatic degradation of polyunsaturated fatty acids, such as linoleic and linolenic acids, which are prevalent in corn and quinoa matrices [
15,
16]. These findings align with previous reports indicating that postharvest treatments like drying or stabilization can further accelerate the formation of these lipid-derived volatiles by promoting hydroperoxide decomposition [
17,
18]. Furthermore, the presence of specific aromatic hydrocarbons may be attributed to both the intrinsic raw material background and environmental conditions during cultivation. While the inherent characteristics of the cereal matrix provide a fundamental basis for aroma, cultivation conditions may influence the availability of key precursor substances, such as reducing sugars, amino acids, and lipids, thereby affecting subsequent aroma formation and final product quality [
19,
20].
The differentiation of barley milk along PC2 was also consistent with its functional group profile. Compared with the other laboratory-prepared samples, barley milk showed higher abundances of furans, pyrazines, and other heterocyclic compounds associated with cereal, malt, toasted, and roasted notes, together with relatively higher levels of aldehydes, alcohols, and alkanes (
Figure 3). The elevated aldehyde content may reflect more active Strecker degradation and other heat-induced reactions, which are known to generate malty and toasted aroma compounds. Kinetic studies have indicated that barley contains higher levels of precursor amino acids, such as leucine and valine, which facilitate a more rapid conversion into branched-chain aldehydes compared with other grains [
21]. This enhanced thermal reactivity likely contributed to the more pronounced formation of these key odorants, resulting in a more distinctive volatile composition in barley than in the other laboratory-prepared grain samples. The higher abundance of alcohols and alkanes also points to a stronger contribution from lipid-derived volatiles, likely related to differences in substrate composition and oxidation behavior [
21]. These compounds may have added grassy, waxy, and fatty nuances, giving barley milk a fuller and more complex aroma profile. Overall, barley milk exhibited a more compositionally rich volatile pattern than sorghum, rice, corn, and quinoa milks, which showed relatively milder and more neutral functional group distributions. These differences likely contributed to the unique volatile fingerprint of barley milk and help explain its clear separation from the other laboratory-prepared samples [
11].
The separation observed along PC3 could also be interpreted through functional group composition. Market samples, shown in blue tones, tended to exhibit greater contributions from volatile classes related to processing, including pyrazines, amides, and thiols, which are often associated with heat treatment, UHT processing, and Maillard reactions (
Figure 3). These compounds may contribute roasted, nutty, or slightly pungent notes, bringing commercial products closer to a more processing-influenced volatile profile [
7]. In contrast, the laboratory-prepared samples showed a comparatively simpler and more compact distribution, with a lower contribution from functional groups associated with processing. This is consistent with their tight clustering in the PCA plot and supports the view that the laboratory preparation procedure was sufficiently controlled and reproducible. These samples were more consistently characterized by ketones and aldehydes, which likely reflect the primary lipid oxidation pathways that remain undisturbed by the thermal treatments or antioxidants often used in industrial settings. The presence of sulfur-containing compounds and furans in these controlled samples suggests a delicate balance of natural degradation that has not yet been masked by the heavy Maillard reaction products typical of large-scale thermal processing. These features supported a volatile background that was grassier, fresher, and more grain-like, with less apparent influence from industrial processing [
10].
3.4. Volcano Plot Analysis
To systematically interpret the compositional differences among samples, cow milk was used as the reference group for comparison with all other samples. Volcano plot analysis was first conducted to identify significantly upregulated and downregulated volatile compounds in each comparison. The significantly altered compounds were then prioritized according to −log10(p) values, ranked from high to low so that the most statistically robust differences could be examined first. While the volcano plots indicate which compounds changed and how statistically reliable those changes were, abundance data provide information on the practical weight of those changes within the overall volatile profile. Therefore, the compounds selected via volcano plot screening were further evaluated alongside their abundance levels to assess their actual contribution to sample differentiation. The full volcano plots are provided in the Supplementary Materials (Figures S1 and S2).
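The screening and ranking step described above can be sketched as follows. The text does not report the fold-change or significance thresholds, so the cut-offs below (`FC_CUTOFF`, `P_CUTOFF`) are assumptions, as is the sign convention for "up" and "down". The compound names and values are hypothetical.

```python
import math

# Hypothetical (compound, log2 fold change, p-value) records; not study data.
records = [
    ("compound_A", 3.2, 0.0004),
    ("compound_B", -2.5, 0.0100),
    ("compound_C", 0.4, 0.0010),    # significant p-value but small fold change
    ("compound_D", 1.8, 0.2000),    # large fold change but not significant
    ("compound_E", -1.2, 0.00002),
]

# Assumed cut-offs; the text does not state which thresholds were used.
FC_CUTOFF = 1.0     # |log2FC| >= 1, i.e. at least a two-fold change
P_CUTOFF = 0.05

def screen(rows):
    """Keep compounds passing both cut-offs, ranked by -log10(p) from high to low."""
    hits = []
    for name, log2fc, p in rows:
        if abs(log2fc) >= FC_CUTOFF and p < P_CUTOFF:
            direction = "up" if log2fc > 0 else "down"
            hits.append((name, direction, -math.log10(p)))
    return sorted(hits, key=lambda h: h[2], reverse=True)

ranked = screen(records)
for name, direction, score in ranked:
    print(f"{name}: {direction}, -log10(p) = {score:.2f}")
```

Ranking by −log10(p) orders compounds by statistical robustness only. As the paragraph above notes, the abundance of each retained compound still has to be checked separately to judge its practical weight.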
In addition, the volcano plot results were interpreted in parallel with the PCA structure. To align with PC1, all plant-based samples, including both market and laboratory-prepared samples, were individually compared against cow milk. Compounds that were consistently and significantly upregulated or downregulated across these comparisons were identified, and their abundance patterns were examined to determine whether they represented the common compositional gap between dairy and plant-based systems. The corresponding heatmap clearly visualized this separation by showing the volatile compounds that consistently differentiated cow milk from the plant-based milk alternatives (
Figure 4).
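Selecting compounds that are consistently altered across all pairwise comparisons amounts to intersecting the per-comparison hit lists and keeping only those whose direction never flips. A minimal sketch follows, with hypothetical hit lists and placeholder compound names; the function name `consistent_markers` is illustrative and not taken from the study.

```python
# Hypothetical significant hits per comparison (plant-based sample vs cow milk).
significant = {
    "soy":    {"compound_A": "up", "compound_B": "down", "compound_X": "up"},
    "oat":    {"compound_A": "up", "compound_C": "down", "compound_X": "up"},
    "barley": {"compound_A": "up", "compound_X": "down"},
}

def consistent_markers(hits_by_sample):
    """Compounds significant in every comparison, with the same direction throughout."""
    shared = set.intersection(*(set(h) for h in hits_by_sample.values()))
    result = {}
    for compound in sorted(shared):
        directions = {h[compound] for h in hits_by_sample.values()}
        if len(directions) == 1:        # drop compounds whose direction flips
            result[compound] = directions.pop()
    return result

common = consistent_markers(significant)
print(common)
```

Here `compound_X` is significant in all three comparisons but is excluded because its direction differs in one of them, so only `compound_A` would be carried forward to the abundance check.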
Among the commonly upregulated compounds, cow milk showed a clear enrichment in acetic acid, hexadecanoic acid esters, including the ethyl and methyl esters, and several long-chain fatty acid derivatives such as compounds related to 9,12-octadecadienoic acid. This pattern was consistent with the functional group results, which indicated that cow milk samples were characterized by relatively high levels of carboxylic acids. These compounds were present at high relative abundances in the bovine milk samples, whereas they were absent or detected only at very low levels in most plant-based samples, suggesting that they represent the core volatile markers of the dairy profile in this dataset. The pronounced difference in abundance further indicates that these acid- and ester-related volatiles provide an important chemical basis for the separation of cow milk from plant-based systems along PC1 and contribute to the rich and creamy sensory character typically associated with dairy products [
14].
In contrast, the common downregulated compounds were generally characterized by lower abundances in cow milk but higher abundances in selected plant-based samples, reflecting the presence of plant-associated volatile features. For example, coconut and oat samples showed relatively high levels of compounds such as decane, hexanoic acid, and pyrazine, whereas almond and soy were characterized by an enrichment of octane and 2-octene, which may be associated with their nutty or beany aroma attributes [
12,
14,
22]. In addition, quinoa and sorghum exhibited comparatively higher levels of benzene derivatives and 2-butanone. These results indicate that, although plant-based samples clustered together apart from dairy milk at the global PCA level, they still retained matrix-specific volatile enrichments within the broader plant-associated chemical space.
Using the same interpretation strategy, PC2 was examined by focusing on the distinctive displacement of barley milk. This was achieved by comparing the volcano plot of barley milk versus cow milk with those of the other laboratory-prepared plant milks versus cow milk. Compounds that were significantly upregulated in the barley comparison but were not significantly, or only marginally, different in the other laboratory-prepared comparisons were considered candidate markers underlying the unique barley-associated shift. The corresponding heatmap revealed a clear barley-specific enrichment pattern, as the selected compounds were consistently abundant in barley milk but present at low or near-absent levels in the other laboratory-prepared samples, including sorghum, quinoa, and corn (
Figure 5). This marked abundance gap indicates that these volatiles were the main contributors driving the separation of barley milk from the other lab-made plant-based milks along PC2, highlighting a barley-specific volatile signature rather than a general plant-based feature.
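The barley-specific selection can be expressed as a set difference: compounds flagged in the barley comparison minus anything flagged in the other laboratory-prepared comparisons. The sketch below uses hypothetical hit sets and placeholder compound names. It simplifies the criterion in the text by treating "marginally different" compounds as not flagged.

```python
# Hypothetical sets of compounds significantly upregulated vs cow milk; not study data.
up_vs_cow = {
    "barley":  {"voc_1", "voc_2", "voc_3", "voc_4"},
    "sorghum": {"voc_3"},
    "rice":    {"voc_3", "voc_5"},
    "corn":    {"voc_4"},
    "quinoa":  set(),
}

def specific_to(target, hits):
    """Compounds flagged for the target sample and for none of the other samples."""
    others = set().union(*(s for name, s in hits.items() if name != target))
    return hits[target] - others

barley_only = specific_to("barley", up_vs_cow)
print(sorted(barley_only))
```

The same helper could in principle be reused for any other sample of interest by changing the `target` argument.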
Examination of the individual compounds further suggested that this barley-associated shift was driven by chemically meaningful aroma contributors. For example, 2-butyl-3-methylpyrazine was one of the most notable barley-enriched compounds. As pyrazines are commonly associated with roasted, toasted, nutty, or malty aroma notes, its selective enrichment in barley milk provides a plausible explanation for the cereal-like and toasted sensory impression often associated with barley-based systems [
4,
23]. In addition, compounds such as 2-methyl-1-undecanol and hexanoic acid, 2-propenyl ester point to a distinct lipid-derived volatile background in barley milk, which may contribute to creamy, fatty or fruity nuances in the overall aroma profile [
21,
24]. This interpretation is further supported by previous sensory studies, since the selective enrichment of pyrazines in barley milk is in line with the roasted and nutty attributes reported in barley-based systems and with the recognized sensory role of pyrazines in such aroma perception [
7,
25].
Several hydrocarbon-related compounds, including bicyclo[4.1.0]heptane, 3-methyl-7-pentyl- and pyrene, hexadecahydro-, were also preferentially enriched in barley. Although such compounds may not always be dominant odorants individually, their collective enrichment suggests that barley possesses a more complex and distinctive volatile matrix than the other tested lab-made grain samples. Notably, these compounds have not previously been reported in the volatile profiles of plant-based milks, further suggesting that the barley matrix possesses unique volatile characteristics.
Likewise, to align with PC3 (12.2%), which reflected a possible industrialization-related fingerprint, the screening targeted compounds that were significantly upregulated in commercial samples versus cow milk but showed little change or remained at very low levels in laboratory-prepared samples versus cow milk. These compounds were interpreted as candidate markers associated with industrial processing, formulation, or additive-driven differentiation. The corresponding heatmap showed that these compounds were almost entirely absent in cow milk and remained relatively low in the grain-based laboratory-prepared samples, whereas the strongest enrichment was concentrated in commercial almond, oat, and soy products (
Figure 6). This distribution indicates that the separation captured by PC3 was driven not only by plant origin but also by volatile features associated with commercial formulation and processing history.
Several of the enriched compounds further supported this interpretation. For example, 2-furanmethanol, 5-methyl-, which showed particularly high abundances in oat and soy samples, is commonly associated with thermal processing and Maillard-type reactions [
26]. Its enrichment in commercial samples suggests that intensive heat treatment, roasting, or shelf-stability processing may have contributed to the generation of cooked or caramelized volatile notes that were not evident in the laboratory-prepared samples. This explanation is also supported by previous sensory studies showing that processed commercial plant-based beverages can exhibit more pronounced caramel-like, nutty, or stale sensory characteristics due to various industrial processing methods [
27,
28]. In addition, a series of hydrocarbon-related compounds, including substituted cyclohexane and heptane derivatives, were particularly prominent in almond milk. Although these compounds may not necessarily act as dominant odorants individually, their preferential occurrence in commercial products suggests a possible link to industrial oil handling, extraction history, or formulation-related processing aids [
29]. Another notable compound was (S)-(+)-1,2-propanediol, which was abundant in almond and oat samples. As this compound is commonly used as a stabilizer to prevent separation, particularly in products requiring high viscosity or moisture retention [
30], its presence further supports the view that the volatile profile of commercial samples reflected a higher degree of formulation complexity than that of the laboratory-prepared systems. In addition, ether, 2-ethylhexyl tert-butyl was detected, a compound that has rarely been reported in plant-based milk systems. Its occurrence in the commercial samples suggests that, beyond processing history and formulation, packaging may also contribute to VOC differentiation in market products. Such compounds have been associated with the migration of volatile substances, including ethers and compounds derived from plasticizers, from packaging materials into food products, particularly beverages [
31]. Although direct evidence in plant-based milk remains limited, sensory research in fluid milk has demonstrated that packaging type can influence sensory properties and off-flavor perception, supporting the possibility that packaging-related aroma interactions may also contribute to sensory variation in commercial plant-based milk products [
32].
Furthermore, to investigate the compositional basis underlying the clustering of laboratory-prepared samples (excluding barley), the significant-difference lists from the relevant volcano plots were compared for overlap. The resulting heatmap revealed a set of compounds that were consistently enriched across the laboratory-prepared samples, including sorghum, quinoa, corn, and rice, while remaining low or absent in cow milk (
Figure 7). These shared compounds likely constituted the common volatile background that bound these samples into a relatively tight cluster in the PCA space. Among them, compounds such as pyrazine and 1H-pyrrole, 1-ethyl- suggest that the lab-scale preparation process generated a common toasted or cereal-like aromatic foundation across different grain matrices, as reported in previous studies [
33]. In addition, hexanoic acid, 1-cyclopentylethyl ester, although not yet reported in plant-based milk systems, has been described as a compound associated with green top notes [
34]. Likewise, there are currently no direct reports linking 2,6-diphenyl-6-methyl-1,3-dioxan-4-one to plant-based milk products. Given its woody and fruity odor characteristics and its reported use in perfume oils, its detection in the present study may reflect contributions from the intrinsic material background of the laboratory-prepared samples rather than commercial processing influence [
35].
In addition, nonanal and 2-butanone, which are typical lipid oxidation-related volatiles, were detected across the laboratory-prepared group, indicating that these samples also shared a similar oxidative background associated with plant-based emulsions and thermal preparation [
36].
The heatmap also showed a complementary set of compounds, including acetic acid, dodecanoic acid, and hexadecanoic acid esters, that were consistently lower in the laboratory-prepared samples than in cow milk. The uniform absence of these milk-associated fatty acid and ester markers further reinforced the similarity among the laboratory-prepared samples, as all of them shared the same compositional distance from cow milk, as observed before. This relatively consistent and simplified volatile background suggests that these samples may serve as promising candidate matrices for future plant-based milk development. Compared with commercial systems, their less complex volatile composition may offer a more controllable starting point for formulation and process optimization, allowing target aroma traits to be introduced and managed with greater clarity and precision.