Rapid Discrimination and Authentication of Korean Farmstead Mozzarella Cheese through MALDI-TOF and Multivariate Statistical Analysis

Geographical origin and authenticity are the two crucial factors that propel overall cheese perception in terms of quality and price; therefore, they are of great importance to consumers and commercial cheese producers. Herein, we demonstrate a rapid, accurate method for discrimination of domestic and import mozzarella cheeses in the Republic of Korea by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). The protein profiles’ data aided by multivariate statistical analysis successfully differentiated farmstead and import mozzarella cheeses according to their geographical location of origin. A similar investigation within domestic samples (farmsteads/companies) also showed clear discrimination regarding the producer. Using the biomarker discovery tool, we identified seven distinct proteins, of which two (m/z 7407.8 and 11,416.6) were specific in farmstead cheeses, acting as potential markers to ensure authentication and traceability. The outcome of this study can be a good resource in building a database for Korean domestic cheeses. This study also emphasizes the combined utility of MALDI-TOF MS and multivariate analysis in preventing fraudulent practices, thereby ensuring market protection for Korean farmstead cheeses.


Introduction
Food authentication procedures are gaining much importance in the international food market because of increasing consumer awareness regarding food quality and safety. Market globalization and industrialization of food products forced the industry to become very competitive and financially profitable by creating more opportunities in adulteration/false labeling [1,2]. The growing demand for high-quality foods, the desire for new delicate products or foods having specific organoleptic traits, and the consumer's ability to pay a high price provoke more vulnerability in counterfeiting [3]. The increased exposure of food scandals and their adverse effects on public health in recent years has urged consumers to demand more evidence on the authenticity and quality of popularized foods. Substitution of rare/expensive ingredients with cheaper ones to attain high financial gains is crucial in animal-origin foods, especially in dairy products [3,4].
Cheese is the most prominent dairy food that has become the subject of counterfeiting in the global market. The premium price for specialty cheeses and fluctuation in milk availability make it profitable for the cheese manufacturers to engage in adulteration and reduce production costs [5,6]. Food frauds in cheeses entail several undesirable consequences, and they are consumed widely from infants to aged people in many countries. Fraudulent substitution of low-cost milk and/non-milk fat during the cheese production and mislabeling of cheese composition/geographical origin can expose consumers to potential allergens and deceive consumers' gratification [1,7,8]. The cheese producers must compulsorily provide the species origin of milk as few consumers evade cow's milk for allergic/intolerance disorders and ethical/religious reasons [5,9]. Because of these legal aspects and consumerism and reliability in the dairy industry, authentication becomes mandatory to define the uniqueness against frauds or commercial disputes [2,10].
Cheese consumption in the Republic of Korea (ROK) is steadily increasing over the decades, outpacing domestic production. To meet the growing demand in the consumer market and food sectors, the country highly depends on imported cheeses from the United States, the European Union, New Zealand, and Australia [11]. Despite the high market price, growing health concerns towards food safety and a healthy lifestyle have forced many Koreans to rely upon domestic farmstead cheeses, consequently increasing the proportion of domestic dairy farms [12,13]. The high production costs, demand, and uniqueness drive the necessity to authenticate farm products, thereby protecting consumers against the deception of adulterant and imitant cheeses. To identify mislabeling about the geographical origin for cheeses is an international issue gaining increased attention, as the specialty cheeses or cheeses produced in particular regions relate to a high price for their superior and precise attributes [1,8]. Most of the authentication techniques in dairy products depend upon the analysis of fatty acid profiles or milk protein content. In last few years, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, a powerful analytical tool, is gaining increased attention to detect adulteration in milk and cheeses [6,7,14,15]. This tool is already well-known for microbial identification [16] and biomarker discovery in cancer diagnosis [17] owing to its high precision, rapidity, simple operation, and great sensitivity, even in low molecular weight samples [15]. More recently, protein/peptide fingerprints from MALDI-TOF is demonstrated for effectively discriminating the geographical origin or species in foods, including kimchi [18], wine [19], honey [20], mushrooms [21], vegetable oil [22], and meat and gelatin [23].
In this study, our goal was to employ MALDI-TOF MS to determine the protein fingerprints of domestic and import mozzarella cheeses, as well as to combine multivariate statistical analysis to discriminate the samples on the basis of geographical origin. To the best of our knowledge, we are the first to analyze the protein profiles in the domestic mozzarella cheeses of the ROK. Moreover, identifying potential markers allows for functional application in safeguarding the authenticity and traceability of the farmstead cheeses.

Spectra of Mozzarella Cheese Samples
Cheese is growing as a part of the Korean dietary culture, and there is a growing demand for the mozzarella type for its wide application in foodservice and food processing sectors. MALDI-TOF has been suggested as a valuable tool in the investigation of milk and cheese protein profiles. Few studies have shown MALDI-TOF suitability to evaluate fraudulence in commercially produced water buffalo mozzarella cheese [24] and Pecorino cheese [5]. To the best of our knowledge, there is no research on protein profiles of the domestic cheeses produced in the ROK. Herein, we attempted to develop a direct method using MALDI-TOF MS to characterize the protein profile and to discriminate the mozzarella cheeses (from cow milk) according to geographical origin. For this purpose, cheese samples from 24 domestic farms, 9 domestic companies, and 10 import types were analyzed for their protein profiles, and a representative example from each type is illustrated in Figure 1. The y-axis corresponds to the ion intensity (arbitrary units), while the mass/charge ratio (m/z) is represented on the x-axis. Peaks with m/z from 1000 to 5000 can be well observed.  The mass spectra and protein profiles clearly define the complexity of proteolysis; further, innate enzymes in cheese and thermal events hydrolyze the casein fractions into several diverse peptides [7,25]. Moreover, the pH gradient developed during the cheese ripening process imparts favorable or unfavorable conditions for the activity of the enzymes involved in the ripening process that corroborates with the complexity of proteolytic process [25]. The m/z peaks at 1869.6, 2754.8, 3470.6, 4478.6, 4693.1, 4909.1, 6127.5, 7049.6, 8631.3, 11,810.9, and 12,254.5 were common in all the mozzarella samples used in this study. The results are also in good agreement with the earlier findings of Rau et al. [26] and support the presence of these signals only in mozzarella cheeses produced from cow's milk.
The proteins from MALDI-TOF spectra can be determined and quantified by identifying the several peaks and assigning them to specific proteins on the basis of the protein molecular mass data published earlier. However, individual proteins can deviate from the exact position of the peak, and subsequently their molecular mass due to genetic and environmental factors, in particular, the processing conditions of milk (thermal denaturation and proteolysis) can modify individual protein structure [14]. Accordingly, the m/z peaks 1869.6 and 2754. 8 showing the higher relative intensities in all the samples can be assigned to a s1 -CNf(1-16) and a s1 -CNf(1-23) fragments corresponding to m/z 1877 and 2764, respectively, mentioned in previous studies [27][28][29][30]. The a s1 -CNf(1-23) recognized for immunomodulatory and antimicrobial potential is found mainly in casein-hydrolyzing peptidases from the starter microorganisms [25,31].
Statistical analysis of all the peak intensities between the groups (farm/company/import) using one-way ANOVA revealed a total of 17 significant peaks (ranging from m/z 1231.80 to 14,092.60) ( Table 1). For simplicity, only peaks of statistical significance are discussed. The representative average intensities and standard deviation for the above-mentioned peaks in each sample group are shown in Table 1. The variation can be attributed to the fact that the cheese samples used in this study were obtained from different producers and different geographical locations, as well as being produced at different intervals. The protein profiles and their intensities in the cheeses are greatly influenced by the different production protocols of each production unit [2,9]. For multiple comparisons between the groups, we submitted p-values obtained from the ANOVA to Tukey's post hoc analysis in GraphPad Prism 6.0 software. Pairwise comparisons between the groups revealed higher significance for farmstead samples against both domestic companies and imports. Four peaks in farmstead samples (m/z 1231.80, 4025.02, 4909.33, and 10,426.62) were significantly different (**** p < 0.0001) in comparison to samples from imports and domestic companies. On the other hand, the significance was found to be much lower in the pairwise comparison between company and import groups. a -ANOVA p-value, FDR obtained using the one-way ANOVA analysis in the software MetaboAnalyst 5.0. b -Tukey's HSD test was performed with ANOVA p-values in GraphPad Prism 8.0 (* p < 0.05, ** p < 0.01, *** p < 0.001, **** p < 0.0001; ns = not significant).

Discrimination of Cheese Samples from Farm and Import
The statistical analysis of data representing the peptide profile is a powerful approach that can be applied to verify the origin of cheese samples. With the aim to obtain precise information about the peptides responsible for the discrimination of groups (farmstead/company/import) according to producer/geographical origin, we used a combination of univariate (volcano plot) and multivariate analyses (PLS-DA).
The volcano plot analysis between the cheese samples from farmstead and import (p < 0.01, fold ≥ 1.5) revealed 21 peaks as statistically significant, of which 12 were downregulated and 9 were upregulated ( Table 2).
The discriminant analysis by PLS-DA ( Figure 2) applied between the same groups explained a total variance of 52.5%. The score plot (Figure 2a) shows a clear differentiation between farmstead and import samples, indicating higher variation in their mass data and protein profiles/intensities. Each point in the score plot represents an individual sample. The farmstead samples were found to be tightly clustered inside the Hotelling T2 ellipse (95% confidence region), indicating they share major similar profiles. In contrast, the import samples are sparsely located from each other inside the ellipse. This can be attributed to the fact that the import samples used in this study were obtained from different geographical locations and produced at different time intervals using different procedures. The corresponding loading plot (Figure 2b) presents variables that define the samples in score plot, whereas important variables for group discrimination were certainly in higher positive and negative loadings of PC1. The samples located on the right side of the score plot tended to have higher relative intensities of proteins on positive loadings and a lower abundance of proteins on negative loadings. Interestingly, the variables identified as discriminative were the same as those identified by univariate analysis as having a trend toward being different between the two groups. The variation in climatic conditions, feed type, grassland, and phenological stages influences the protein profiles of the milk, which in turn influences the protein composition of the cheeses. These results show that the geographical origin (and indirectly the environment and diet) plays a key role in the peptide profile of cheeses [2]. m/z-mass value; p-peak comparison of the respective groups; FC-fold change; "+" denotes fold change (upregulated); "−"denotes fold change (downregulated).
The VIP plot from the PLS-DA enables selection of the most essential variables in discrimination on the basis of their VIP scores (VIP >1) [32,33]. Ten peak lists (Figure 2c) were selected as potential classifiers, among which intensity of six peaks (m/z 1869.6, 4024.7, 1231.8, 2243.9, 4909.4, and 10,426.6) was higher in farmstead samples, while four (m/z 3976.2, 4479.9, 3470.1, and 12,255.2) were higher in imports. In general, every classification analysis including PLS-DA needs strict validation as it can easily overfit the data. Therefore, the cross-validation method is crucial to measure the predictive accuracy and reliability of the model with limited samples. In our study, the performance of the PLS-DA model was validated through 10-fold cross-validation (10-fold CV). The cross-validation method was validated using the two parameters (R 2 X and Q 2 ), where R 2 X and R 2 Y signify the "goodness of fit" for "predictors" and "responses", respectively, and Q 2 indicates the "goodness of prediction". The model is said to have a good predictability when both the parameters are approaching 1 [32,33]. The R 2 X and Q 2 of the PLS-DA model were 0.919 and 0.817, respectively, representing a good predictive ability.
Finally, hierarchical clustering analysis (HCA) was performed in order to monitor the related clusters and sub-clusters visualized in a dendrogram graph, in which the samples with maximum similarities were clustered preferentially [32]. The dendrogram shown in Figure 2d was obtained by scaling the same variables used for PLS-DA and applying a complete linkage to Euclidean distance between the two groups. The results confirmed the clear separation of cheese samples (farm and import) into two overall clusters, visually representing the variation in geographical origin. The overall results confirm the practical possibility of using protein profiles as a screening tool in discrimination of the cheeses according to geographical origin.

Discrimination of Cheese Samples from Company and Farmstead
The volcano plot analysis (p < 0.01, fold ≥ 1.5) between the cheese samples produced in the Korean companies and farmsteads revealed 18 significant peaks, among which 8 were downregulated and 10 were upregulated ( Table 2).
The PLS-DA analysis ( Figure 3) explained a total variance of 54.7% (Figure 3a) with a slight overlapping between the groups. It can be related to the raw milk composition used for cheese production in both groups being from the same geographical location of origin. One farmstead sample was found to be lying outside the Hotelling T2 ellipse (Figure 3a), while most of the other farmstead samples were tightly clustered together inside the ellipse. In terms of samples from domestic companies, some were sparsely located inside the ellipse, while some clustered together. On the contrary, a complete separation of variables on positive and negative sides of the loading plot (Figure 3b) was evident. The variables located in the loading plot's negative and positive side represented higher intensities of proteins for samples on the left (domestic companies) and right (farmstead) sides of the score plot.
The variables identified as discriminative in the loading plots were identical to those identified by univariate analysis as having a trend toward being different between the two groups. The variation between the samples from farmsteads and domestic companies can be allied to the usage of imported curds in cheese production by domestic companies. The result agrees with the findings of Feeney et al. [33] that starter culture can adversely affect the composition of small peptides and amino acids in mozzarella cheese. Moreover, several studies have revealed the influence of starter composition on the cheese characteristics and quality [34,35]. The VIP plot displayed seven potential proteins (Figure 3c  The dendrogram of HCA analysis (Figure 3d) displays four samples from Korean companies clustered in the same clade of the farmstead samples, while remaining samples from the domestic companies clustered as a separate clade. The variation within the cheese samples from domestic companies can be related to the differences in the cheese processing methodology and starter culture composition [34,35]. The results collectively demonstrate the ability of MALDI-TOF protein profiles in discrimination of the domestic cheeses from farmstead and companies, even though both the producers utilize raw milk from the same geographical location of origin.

Discrimination of Cheese Samples from Company and Import
The volcano plot analysis (p < 0.01, fold ≥ 1.5) between cheese samples from domestic companies and imports revealed only three significant signals, of which two were downregulated and one was upregulated ( Table 2).
The PLS-DA ( Figure 4) score plot between the two groups shows partial overlapping, with a total variance of 30.7% (Figure 4a). It is worth mentioning that the total variance decreased < 50% in comparison with earlier plots. Moreover, the variables in loading plots (Figure 4b) did not display clear separation between the groups. These results indicate that the peptide profiles in the cheese samples between the two groups shared major similarities. This may have been due to the usage of imported curds in domestic companies for cheese production. The impact of geographical location of origin of the starter cultures on peptide profiles has been confirmed earlier in a few cheese varieties [34,35]. The VIP plot (Figure 4c) shows nine of the most potential variables in the group discrimination, of which five (m/z 4692.5, 4232.1, 1869.6, 2753.8, and 2244.3) were higher in samples from domestic companies, while four (m/z 1470.8, 5905.4, 3715.3, and 4479.6) were higher in imports. The R 2 X and Q 2 of the PLS-DA model were 0.943 and 0.305, respectively, indicating a weak model.
The result of the HCA analysis visualized as a dendrogram also shows mixed clustering of samples from import with company ( Figure 4d). As stated earlier, the mixed clustering might have been due to the influence of import starters employed in domestic companies for cheese production, which contributed to a major similarity of the peptide profiles between the two groups. Previous research works stated that the microbial composition of the traditional water buffalo mozzarella [34] and Caciocavallo Silano [35] cheeses are found to be highly influenced by the natural whey starters used in their production and are closely related to the geographical origin of the starters. Therefore, the peptide profiles of cheeses from the Korean companies did not differ greatly with the import samples, even though the former ones utilize native milk for cheese production. These reasons perhaps explain why the cheese samples from domestic companies and import were clustered in the PLS-DA (loading plot) and HCA graphics. Overall, the statistical analyses showed a low and insignificant discrimination potential between the groups, indicating a major share of similar protein profiles/intensities due to the usage of imported curd as a starter in cheese production by the domestic companies.

Potential Marker Proteins in Discrimination of Cheeses
The application of MALDI-TOF MS in dairy products mainly focuses on the attention towards using intact proteins as a marker for adulterations [6]. Identification of peptides specific to milk species has been unveiled as a convenient approach for detection of fraudulence in milk and cheeses [1,7,8]. This study focused on the ability to use specific proteins as a biomarker fingerprint for authentication and discrimination of the mozzarella cheeses according to the producer/geographical origin, which was determined using the biomarker discovery (inter-class analysis) tool in Mass-Up software. For that, the m/z peaks with a q-value <0.2 calculated by Benjamini-Hochberg FDR with 100% detection thresholds were selected as potential markers with the most significant discriminatory power as they denote the presence of specific peaks in the sample while not in others. In this study, a total of seven peaks (m/z 1101. 5, 1231.8, 3976.2, 4024.7, 4233.6, 7407.8, and 11,416.6) were selected as markers on the basis of their discrimination power (Table 3). Five peaks were able to be assigned to protein mass available in earlier studies [27,28,[36][37][38][39][40] associated with cheese/dairy products (Table 3). Interestingly, the remaining two unidentified peaks at m/z 7407.8 and 11,416.6 were specific to Korean farmstead cheeses. The three peaks at m/z 1101.5, 1231.8, and 4024.7 found in all Korean domestic (farmstead and companies) cheeses were absent in import samples, entailing variation in raw milk composition. Likewise, two peaks at m/z 3976.2 and 4233.6 common in cheese samples from domestic companies and import samples were not found in farmstead cheeses. These specific peaks can serve as robust markers to enable effective and high-reliable identification of cheeses originating from both Korean farmstead and companies, as well as from other countries. Hence, the method utilized herein can be applied to ascertaining the geographical origin of cheeses samples. The comparison of predicted and theoretical m/z confirmed that the se-lected peptides can serve as good candidates in the authentication of the cheeses. Therefore, constructing a database of mass spectra of the cheeses available in the Republic of Korea together with their geographical facts could allow for the identification of geographical location of origin of cheese samples within minutes.

Sample Collection
For this study, we used a total of 43 mozzarella cheese samples (n = 43) available commercially in the ROK and assorted into three categories according to the producers, namely, farmsteads (F), companies (C), and imports (I). The farmstead cheeses (n = 24) were purchased directly from dairy farms, while samples of domestic companies (n = 9) and imports (n = 10) were purchased via online and/retail markets. All samples were analyzed within a week or before the end of the shelf-life period.

Sample Preparation
The sample preparation for protein profiling in MALDI-TOF MS was based on the method described by Ham et al. [41]. For dissociation of the caseins and insoluble hydrolysis products, we homogenized each cheese sample (1 g) using 2.5 mL reduction buffer (pH 8.0; 73 mg of trisodium citrate dehydrate + 38 mg of dl-DTT in 37.5 mL of 8 M urea) and left for at least 1 h at room temperature. After centrifugation (15,000× g, 2 min), the clear supernatant (2 µL) was mixed with a matrix solution (2 µL) [sinapinic acid dissolved in 30% trifluoroacetic acid acetonitrile mixture (7 mL 0.1% trifluoroacetic acid + 3 mL acetonitrile)] in a 1:1 ratio. Approximately 1 µL of the mixture was hand-spotted onto the ground steel MALDI target plate (Bruker Daltonics, Bremen, Germany). The droplet was allowed to air dry at room temperature, and then the plate was inserted into the mass spectrometer.

Mass Spectrometry
The acquisition of the MALDI-TOF spectra from the cheeses was made using a Microflex TOF mass spectrometer (Bruker Daltonics, Germany) equipped with a pulsed N 2 laser (337 nm, 3-ns pulse duration). The spectrometer was run with the default parameter settings: positive linear mode, 60 Hz laser frequency, ion source 1 voltage: 20 kV, ion source 2 voltage: 18 kV, and mass range of m/z 1000 to 20,000 using the FlexControl 3.0 software (Bruker Daltonics, Germany). Three independent spectra were collected manually for each cheese sample to verify the mass accuracy and reproducibility, which was always within the range of 0.5 and 1%. The external mass calibration was measured daily using the Protein Calibration Standard I kit (Bruker Daltonics, Bremen, Germany).

Data Acquisition and Processing
Processing of the mass spectra data files was performed using the Flex Analysis 3.0 software (Bruker Daltonics, Bremen, Germany), and the raw data files were converted to mzmL format (m/z, intensity lists) using ProteoWizard 3.0 MSConvert [42]. The mzmL files were then imported into the Mass-Up open software (Mass-Up, Vigo, Spain) [43] and subjected to analysis, which included (i) intensity transformation (none), (ii) smoothing (none), (iii) baseline correction (Snip algorithm), and (iv) standardization (total ion current). Peak detection was achieved using MALDIquant (a multifaceted statistical package method) with a signal-to-noise ratio of 3, half-window size of 60, and minimum peak intensity (0.001). Peaks were matched through intra-sample and inter-sample matching (MALDIquant: tolerance (0.002), without selecting the generate consensus spectrum). Additionally, to improve the analysis, we applied quality control to detect and remove low-quality spectra.

Statistical Analysis
The pre-processed mass spectral data containing m/z values and intensity of the peaks obtained from the Mass-Up was subjected to statistical analysis.
The univariate and multivariate analyses were performed in the online statistical software MetaboAnalyst 5.0 (https://www.metaboanalyst.ca/, accessed on 4 January 2021). Peak lists with a statistically significant difference in terms of signal intensity/mass value were determined through volcano plot (univariate analysis).
For visualization and classification of cheese samples according to geographical origin, multivariate analysis using partial least-squares discriminant analysis (PLS-DA) was applied. Data normalization was performed by sum and scaled using the Pareto method. The PLS-DA is a supervised tool that aids in selecting the most relevant variables for sample discrimination, according to the variable importance in the projection score (VIP score > 1). The model was validated by multiple correlation coefficients (R 2 ) and cross-validation (Q 2 ) parameters.

Conclusions
This study demonstrated the protein profiles of domestic mozzarella cheeses produced in the ROK for the first time. This rapid and straightforward method was mainly developed to identify food fraud and counterfeiting in Korean farmstead cheeses. Combining MALDI-TOF mass fingerprint data with multivariate statistical analysis discriminated farmstead and import cheeses according to geographical origin. Interestingly, we observed huge differences within the protein profiles of domestic cheeses produced on farms and companies, although both producers were of similar geographical origin. The usage of imported curd as a starter in domestic companies greatly influenced their cheese protein profiles. The outcome of specific proteins in this preliminary study allows for further research on safeguarding Korean domestic cheese authenticity and traceability towards ascribing protected designation of origin. Furthermore, extensive cohort studies are necessary to verify the reliability and reproducibility of protein biomarkers as they may have a significant effect on proteolytic activity, starter materials, and processing conditions. It is eminent that peptides derived from the casein hydrolysis may present multiple biological activities. Hence, identification and characterization of the two unknown proteins designated as biomarkers in the farmstead samples and analyzing their biological activities can make the farmstead cheeses capable of being designated as functional food from a consumer's health perspective.