Towards a Standardized Approach for the Geographical Traceability of Plant Foods Using Inductively Coupled Plasma Mass Spectrometry (ICP-MS) and Principal Component Analysis (PCA)

This paper presents a systematic literature review of the use of inductively coupled plasma mass spectrometry (ICP-MS) combined with principal component analysis (PCA), a multivariate technique, for determining the geographical origin of plant foods. Recent studies that applied ICP-MS and PCA to the geographical traceability of plant foods were selected and examined. The collected results indicate that ICP-MS with PCA is a useful and widely used tool for authenticating and certifying the geographical origin of plant foods. The review encourages scientists and managers to discuss the possibility of introducing an international standard for plant food traceability using ICP-MS combined with PCA. A standard method would reduce the time and cost of analysis and improve the efficiency of trade and circulation of goods. Furthermore, the main steps needed to establish a standard for this traceability method are reported, including the development of guidelines and quality control measures, which play a pivotal role in providing consumers and authorities with authentic product information at each stage of production, processing, and distribution. This might form the basis for establishing standards for examining and controlling the quality of foods in the markets, ensuring safety for consumers.

Despite the use of analytical ICP-MS methods to evaluate the content of elements in plant-based foods, relatively few publications exist on the metal content of food analyzed by ICP-MS. Most ICP-MS data have been used for research on health risk assessment, daily uptake, and the development of analytical methods. Recently, multielement analytical data combined with principal component analysis have been increasingly applied in research on food geographical traceability [36,37]. The rapid development of ICP-MS applications in food traceability has made it a useful tool for verifying the origin of agricultural products and foods [38]. However, to date, no national or international standard methods for tracing the geographical origin of plant foods have been established. Food traceability is still primarily conducted through the protected geographical indication (PGI) system, which involves identifying the specific ingredients of the food. Analytical methods, by contrast, are mainly used to assess food quality rather than to determine its origin. PGI is more of a commitment to agreement than a technical approach, and thus its application is limited in many countries.
Table 1. Detected multielement in the plant-based food matrix.
Several common multivariate statistical methods are used for geographical origin determination, including principal component analysis (PCA), cluster analysis (CA), linear discriminant analysis (LDA), canonical discriminant analysis (CDA), and hierarchical cluster analysis (HCA). Among these, PCA is the most commonly used method due to its ability to effectively distinguish data with similar characteristics [73]. Therefore, this paper specifically focuses on the use of PCA for plant food traceability.

Multielement Analysis
Generally, trace elements act as geographical tracers of specific soil conditions: they are absorbed via the roots and transferred to various parts of the plant. The distribution of trace elements therefore reflects the elemental signature of the soil of origin. In addition, the isotope ratios of the elements link products to soil characteristics. In particular, isotopes of heavy metals have been considered the most suitable for tracing the origin of plant-based foods. By contrast, although the isotopes of light elements such as hydrogen, nitrogen, oxygen, and sulfur are considered reliable indicators for food authentication, their ratios are too variable to serve as tracers of the soil where a product is produced [74][75][76].
ICP-MS is a robust analytical technique for the determination of multi-elemental composition (qualitatively), concentration (quantitatively), and isotopic abundances of various matrices. The structure and operating principles of the ICP-MS device have been presented in numerous previous documents. The ICP-MS analyzer can detect many elements and simultaneously identify their isotopes.
Out of the 118 elements in the periodic table, 16 elements are not recommended for measurement on ICP-MS and 44 elements cannot be measured on the ICP-MS instrument.

• The elements not recommended for ICP-MS are B, Si, Cl, Ca, Br, Hg, P, S, Zr, Nb, Tc, I, Hf, Ta, W, and Os. Of the remaining 58 elements, rare earth elements (REEs) can act as geochemical markers; however, little information is available on the use of REEs in foodstuff traceability [77]. Additionally, other elements occur at low content in food samples, such as Ga, Ge, Rb, Y, Ru, Ir, Au, U, and Te. After removing the elements that are not present, 36 elements remain for food samples. The elements selected for analysis on ICP-MS are presented in Table 2. These 36 elements are commonly analyzed using ICP-MS methods for elemental or multi-elemental determination in food traceability.
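As an illustration of the screening described above, the following minimal Python sketch removes from a list of measured elements those that the text names as not recommended for ICP-MS or as typically low in content. The function name `screen_elements` and the example input list are hypothetical; only the two exclusion lists and the counts 118, 16, and 44 are taken from the text.

```python
# Elements the text lists as not recommended for ICP-MS measurement.
NOT_RECOMMENDED = {"B", "Si", "Cl", "Ca", "Br", "Hg", "P", "S",
                   "Zr", "Nb", "Tc", "I", "Hf", "Ta", "W", "Os"}

# Elements the text lists as having low content in food samples.
LOW_CONTENT = {"Ga", "Ge", "Rb", "Y", "Ru", "Ir", "Au", "U", "Te"}


def screen_elements(measured):
    """Return the measured elements that are in neither exclusion list,
    preserving their original order."""
    excluded = NOT_RECOMMENDED | LOW_CONTENT
    return [element for element in measured if element not in excluded]


# Arithmetic from the text: 118 elements in total, 16 not recommended,
# and 44 that cannot be measured on the instrument.
remaining = 118 - len(NOT_RECOMMENDED) - 44

# Hypothetical list of elements reported by a laboratory.
candidates = ["Na", "Mg", "Ca", "Mn", "Fe", "Rb", "Cu", "Zn", "Hg", "Pb"]
kept = screen_elements(candidates)
print(remaining, kept)
```

The sketch does not reproduce the final selection of 36 elements, which depends on Table 2.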

Sample Preparation
Solid samples are digested in hot, strong acid, such as HNO3, HNO3/HCl, HNO3/H2O2, or HNO3/HF, depending on the specific matrix. In general, samples are digested with pure HNO3 (65-70%) in a microwave oven and then diluted with ultra-pure water [78]. Several techniques are available for preparing solid samples or converting them into aerosols, including electrothermal vaporization (ETV), laser ablation (LA), microwave-assisted digestion (MAD), and spark ablation. The samples are then transported to the plasma by an inert gas. Among these techniques, ETV is used for combustible samples, while spark ablation is applied to conductive samples and samples large spots with a diameter of 1-3 mm. The LA microanalysis technique uses high-irradiance (UV) lasers to sample very small spots (2-750 µm in diameter) on almost any solid sample, whilst MAD is used for sample preparation prior to analysis by ICP-MS, inductively coupled plasma atomic emission spectrometry (ICP-AES), graphite furnace atomic absorption spectrophotometry (GFAA), and flame atomic absorption spectrophotometry (FLAA).

PCA Tools
PCA is a popular multivariate statistical algorithm that separates components from one another in six main steps using matrix transformations [79]. PCA plays an important role in reducing the dimensionality of complex datasets, converting them into a simpler form while minimizing information loss [80]. PCA and LDA are considered the most powerful discriminators among multivariate analysis tools and are commonly used to discriminate the geographical origin of plant-based agricultural products. Both PCA and LDA identify linear combinations of features that best explain the data; however, LDA is a supervised dimensionality-reduction technique that simultaneously performs data classification. LDA concentrates on finding a feature subspace that enhances the separability between groups, whilst PCA is an unsupervised technique that disregards class labels and concentrates on capturing the directions of maximum variation in the dataset [79,80]. Figure 1 shows the number of research publications using PCA, LDA, K-nearest neighbor (KNN), and HCA multivariate statistical methods for geographical origin determination. PCA is a popular technique for determining the geographical origin of agricultural products because it reduces dimensionality by using a few principal components (PCs) to express the information spread across numerous columns, wherein the first few PCs can account for an important proportion of the total variance. These PCs are then used as explanatory variables in machine-learning models. In addition, classes in datasets with more than three features or dimensions can be difficult to visualize. A clear distinction between clusters or classes can often be observed on the first two PCs, which allows for a simpler and more effective visualization of the data [81].
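The dimensionality reduction described above can be sketched in a few lines of Python. The example below is a minimal sketch, not the procedure of any cited study: it assumes NumPy is available, autoscales the data (mean-centering and scaling to unit variance, a common but assumed pre-treatment), and obtains the PCs from a singular value decomposition. The function name `pca` and the synthetic "element concentrations" for two regions are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of autoscaled data via SVD.

    Returns the sample scores, the variable loadings, and the fraction of
    total variance explained by each retained component."""
    X = np.asarray(X, dtype=float)
    # Autoscale: centre each column and scale it to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions; s holds the singular values.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]


# Synthetic data: 10 samples from each of two regions, 6 elements each.
rng = np.random.default_rng(0)
base_a = np.array([10.0, 5.0, 2.0, 8.0, 1.0, 4.0])
base_b = np.array([14.0, 3.0, 2.5, 6.0, 1.8, 4.2])
region_a = base_a + rng.normal(0.0, 0.3, size=(10, 6))
region_b = base_b + rng.normal(0.0, 0.3, size=(10, 6))
X = np.vstack([region_a, region_b])

scores, loadings, ratio = pca(X, n_components=2)
print("explained variance ratio:", ratio)
print("mean PC1 score, region A:", scores[:10, 0].mean())
print("mean PC1 score, region B:", scores[10:, 0].mean())
```

Plotting the first column of `scores` against the second gives the two-dimensional score plot typically used to inspect whether samples from different regions form separate clusters.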
Principal component analysis possesses some advantages: it is an effective computation algorithm that can speed up machine learning processes and prevent data overflow; PCA can improve the performance of machine learning (ML) algorithms by eliminating unnecessary correlated variables; the leading PCs retain a high share of the variance, which allows better visualization of the data; and PCA can contribute to reducing noise, which cannot be automatically ignored, making it a valuable tool for data analysis. On the other hand, a few disadvantages of principal component analysis have been reported: PCA can sometimes be difficult to interpret, particularly when identifying the most important characteristics even after the principal components have been calculated; the calculation of covariances and covariance matrices may sometimes be challenging; and, in some cases, the computed principal components might be more difficult to understand than the original set of variables.
Principal Component Analysis (PCA) can be a complex statistical technique that requires expertise in mathematics. However, there are several software programs available that make it easier for non-specialists to perform PCA. These programs can be particularly useful for determining the geographical origin of a sample. To this end, Table 3 lists several commonly used software options.
- The option to perform PCA on specific subsets of data, such as specific rows or columns of a matrix
- The ability to save PCA results and load them for future analysis or comparison [85]

IBM SPSS Statistics Version 26, United States
SPSS (Statistical Package for the Social Sciences), an IBM product, is statistical software widely used in various fields, such as psychology, marketing, healthcare, and education. It has a broad range of statistical analysis options, making it a versatile tool for data analysis. Additionally, SPSS allows for data cleaning, data transformation, and data management, which are essential steps in the data analysis process. With SPSS, users can conduct various multivariate techniques, including principal component analysis, factor analysis, cluster analysis, and discriminant analysis, among others. The software is regularly updated to incorporate the latest statistical techniques and methods, making it a reliable and up-to-date tool for data analysis.
SPSS offers a variety of features for data analysis, including statistical analysis, data mining, and predictive analytics. Some of its key features related to PCA include the ability to perform principal component analysis to identify underlying structure in the data and reduce its dimensionality, the option to perform factor analysis to identify latent variables underlying observed variables, and the ability to generate graphical output to visualize the results. SPSS also offers a user-friendly interface, making it accessible to non-technical users, as well as a range of advanced statistical techniques for more experienced users. Additionally, SPSS allows users to automate analysis and reporting, making it a time-efficient option for large datasets. [86,87]

Weka, New Zealand
Weka is a collection of machine learning algorithms for data mining tasks developed at the University of Waikato. It is open-source software written in Java. Weka includes tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It provides a graphical user interface for exploring data and running machine learning algorithms, as well as a command-line interface for batch processing and integration with other software systems. Weka is widely used in both academia and industry for research, education, and practical applications in areas such as bioinformatics, text mining, image analysis, and more.
Weka has a built-in PCA algorithm that can perform principal component analysis on data sets. It allows users to select the number of principal components to be extracted and provides options for normalization and centering of the data. Weka's PCA algorithm can also be used in combination with the other machine learning algorithms available in the software for tasks such as classification and clustering. Weka provides several visualization tools for exploring and interpreting the results of PCA, including scatter plots, parallel coordinate plots, biplots, and correlation matrices. These charts can help users to better understand the relationships between variables and identify patterns in the data. Additionally, Weka supports the export of charts and graphs to various formats, such as PNG and PDF, for easy sharing and presentation of results. [88]
Foods 2023, 12, 1848
In addition to the widely used software packages mentioned earlier, there are many other specialized software programs built for specific purposes depending on the type of data being analyzed. Some of these specialized software packages are designed for genomics, proteomics, medical imaging, or other specialized fields. The availability of these specialized software packages reflects the diverse and complex nature of modern data analysis.
While most software packages use similar algorithms and produce similar results, each package has its unique features, interfaces, and outputs. Some software packages, such as XLSTAT, have a vivid visual interface that makes it easy for non-expert users to interact with the software and interpret results. Other software packages, such as R, offer more advanced customization options and allow for greater flexibility in data analysis.
The ability to customize the analysis and interpret the results accurately is critical in scientific research and decision-making. Therefore, the choice of software package depends on the specific research question, the type of data being analyzed, and the expertise of the user. In addition to the analytical capabilities, software packages also offer various chart and graph options to visualize the data in different ways, allowing users to communicate the results more effectively. Overall, choosing the right software package is essential for efficient and effective data analysis.
There are numerous specialized software programs available for various purposes, depending on the type of data being analyzed. Most require a purchased license. While the programs generally produce similar results because they use similar algorithms, each offers different information, features, and interfaces. Among them, XLSTAT is often preferred for its graphical representation and user-friendly interface. However, professional users tend to use R, which allows greater flexibility in adjusting the code to refine the analysis (Figure 2).

Applications of ICP-MS Combined with PCA for Determining the Origin of Agricultural Products
In recent years, numerous scientific studies worldwide have successfully developed methods for determining the origin of different agricultural commodities. Many of these studies have used ICP-MS analysis combined with PCA to determine the origin of various food products, such as wine [90,91], pork [7], sheep [92], mutton [15], bivalve mollusks [93], and sea cucumbers [94]. Studies on tracing the origin of plant-based food products are summarized in Table 4. Table 4 illustrates that a large number of samples is necessary to determine the origin of food products. Typically, more than three samples are required from a single region. The more samples are collected, the more accurately the distinctive characteristics of the region can be identified, resulting in a more precise identification. Additionally, more elements need to be determined by ICP-MS than in other types of studies, with a minimum of 20 elements being the most effective. Fewer elements provide less information for identification and lead to inaccurate results. Likewise, if the data are insufficient, PCA will not yield complete information, a situation comparable to having high precision but low accuracy. These factors highlight the significant effort, time, and analytical costs involved in food traceability studies, as more information is required than in other types of research [132,133].
Table 4 also reveals that rice is the most extensively researched commodity worldwide in terms of traceability. The authenticity of rice has become an increasingly crucial issue in recent years. To authenticate rice, a range of techniques has been employed, such as determining its geographical origin, distinguishing between different cultivars, verifying organic rice authenticity, and detecting impurities in rice [134].
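The sampling requirements stated above (more than three samples per region and at least 20 elements) can be checked before any statistical analysis. The following is a minimal Python sketch under those two thresholds; the function name `check_design`, its parameters, and the example region labels are hypothetical and serve only as an illustration.

```python
from collections import Counter


def check_design(regions, n_elements, min_per_region=4, min_elements=20):
    """Return a list of problems found in a study design.

    `regions` holds one region label per sample. The default thresholds
    encode "more than three samples per region" and "at least 20 elements".
    An empty list means that both requirements are met."""
    problems = []
    counts = Counter(regions)
    for region, n_samples in sorted(counts.items()):
        if n_samples < min_per_region:
            problems.append(f"{region}: {n_samples} samples (< {min_per_region})")
    if n_elements < min_elements:
        problems.append(f"only {n_elements} elements (< {min_elements})")
    return problems


# Hypothetical design: six samples from one region, three from another,
# and 18 elements determined by ICP-MS.
regions = ["North"] * 6 + ["South"] * 3
print(check_design(regions, n_elements=18))
```

The thresholds are exposed as parameters so that stricter values, such as 5 or 10 samples per region, can be substituted.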

Current Related Standard
Although various traceability studies have been conducted on different food products as presented above, the current method of food traceability is still not regulated, making it difficult to accurately evaluate. Quality standards and packaging regulations are primarily set to evaluate the quality of products and determine their origin [80]. Several compelling factors are driving the need for accurate analytical methods to authenticate the origin of our food. The UK Food Standards Agency (FSA) has solicited public input on various food labeling issues. According to the FSA's research, "country of origin labeling" ranked high on consumers' list of demands for change [135].
Geographical indications (GIs) refer to the use of place names to identify products that originate from specific regions and protect their quality and reputation. They are commonly used for wines, spirits, and agricultural products. By granting certain foods recognition for their distinctiveness, GIs differentiate them from other foods in the marketplace, making them commercially valuable. GIs may also provide relief from acts of infringement or unfair competition and protect consumers from deceptive or misleading labels. Some examples of registered or established GIs include Parmigiano Reggiano cheese and Prosciutto di Parma ham from the Parma region of Italy, Toscano olive oil from Tuscany, Roquefort cheese, Champagne from the region of the same name in France, Irish Whiskey, Darjeeling tea, Florida oranges, Idaho potatoes, Vidalia onions, Washington State apples, and Napa Valley Wines. Over the past decade, determining the geographical origin of food has become an increasingly important issue for countries worldwide. Consumers are concerned about the authenticity of the food they eat [75].
To establish geographical indications, scientists need to conduct a series of chemical composition studies using various analytical methods. This is a complex and expensive task that can sometimes be too costly for businesses to afford [61,99,136]. Moreover, without prior experience, one has to search for suitable methods, which takes considerable time and effort because no pre-defined method exists. Therefore, issuing a feasible international standard method that can distinguish the geographical origin of plant foods will help reduce costs and time in PGI research and improve efficiency in agricultural production.

Main Steps of the Proposed Standard Method
Undoubtedly, creating standards for determining the origin of plant-based food is of the utmost importance. Based on the data collected, several essential steps need to be incorporated into the development of the standard, as illustrated in Figure 3. This entails the integration of two processes: (1) the profiling method and (2) the geographical traceability method. The profiling method involves the following key steps:

• Step 1-Sample Collection: each collected sample must be accompanied by relevant information about the variable, geographical details, and coordinates. At least 5 to 10 samples should be collected per geographical region.

• Step 2-Sample Analysis: this step comprises two methods, sample preparation and analysis, which are built depending on the equipment of each laboratory. However, at least 20 elements must be analyzed on the ICP-MS equipment. When building the standard, it is necessary to specify which elements and method parameters are included.

• Step 3-Input data into PCA: the parameters of the PCA software must be set, and the accuracy and reliability of the PCA method need to be determined.
In the initial stages of the geographical traceability method, the sample of interest is an unknown entity. These samples are taken to the laboratory to undergo meticulous analysis using ICP-MS. The resulting multielement data is then fed into the PCA, allowing for effective differentiation and source identification. It is imperative that the findings are presented with precision and reliability.
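The two processes described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the proposed standard itself: it assumes NumPy, autoscaled data, two retained PCs, and assignment of the unknown sample to the region whose centroid is nearest in PC space. The nearest-centroid rule is an assumption introduced here for illustration, since the text does not prescribe a decision rule. The function names `fit_profile` and `trace_origin` and all data are hypothetical.

```python
import numpy as np


def fit_profile(X, labels, n_components=2):
    """Profiling method: autoscale the reference data, fit PCA by SVD,
    and store the centroid of each region in PC space."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Z @ loadings
    centroids = {str(r): scores[labels == r].mean(axis=0)
                 for r in np.unique(labels)}
    return {"mean": mean, "std": std,
            "loadings": loadings, "centroids": centroids}


def trace_origin(profile, x):
    """Traceability method: project an unknown sample onto the stored PCs
    and return the nearest region together with all centroid distances."""
    z = (np.asarray(x, dtype=float) - profile["mean"]) / profile["std"]
    score = z @ profile["loadings"]
    distances = {region: float(np.linalg.norm(score - centroid))
                 for region, centroid in profile["centroids"].items()}
    return min(distances, key=distances.get), distances


# Synthetic reference set: 3 regions, 5 samples each, 20 elements.
rng = np.random.default_rng(42)
n_elements = 20
bases = {r: rng.uniform(1.0, 10.0, n_elements) for r in "ABC"}
X = np.vstack([bases[r] + rng.normal(0.0, 0.2, size=(5, n_elements))
               for r in "ABC"])
labels = [r for r in "ABC" for _ in range(5)]
profile = fit_profile(X, labels)

# An "unknown" sample simulated from region B's elemental pattern.
unknown = bases["B"] + rng.normal(0.0, 0.2, n_elements)
region, distances = trace_origin(profile, unknown)
print(region, distances)
```

Reusing the stored mean, standard deviation, and loadings ensures that the unknown sample is projected into the same PC space as the reference profile rather than into a newly fitted one.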

Conclusions
The present review summarizes research on the application of ICP-MS and PCA to the geographical origin authentication of agricultural products. In summary, ICP-MS is a robust, accurate, and highly sensitive technique for determining inorganic elements in food, whereas PCA can reduce dimensionality, speed up machine learning processes, prevent data overflow, and reduce noise. The combination of ICP-MS and PCA can be considered a powerful tool and a standardized approach for authenticating and certifying the geographical origin of plant-based foods, which plays an important role in protecting quality products. In addition, it might serve as a basis for producers' decisions to enhance the effectiveness of product certification and meet consumer demand in the markets.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.