Geochemical Modeling of Copper Mineralization Using Geostatistical and Machine Learning Algorithms in the Sahlabad Area, Iran

: Analyzing geochemical data from stream sediment samples is one of the most proactive tools in the geochemical modeling of ore mineralization and mineral exploration. The main purpose of this study is to develop a geochemical model for prospecting copper mineralization anomalies in the Sahlabad area, South Khorasan province, East Iran. In this investigation, 709 stream sediment samples were analyzed using inductively coupled plasma mass spectrometry (ICP-MS)


Introduction
At the reconnaissance stage of mineral exploration, the location of potential mineral deposits (metallic or non-metallic) or prospecting areas must be exclusively determined [1][2][3][4]. The importance of the reconnaissance stage is due to the identification and separation of potential and barren areas in terms of ore mineralization; the potential areas will be lost or not identified if there is any error or lack of accuracy during this stage [5,6]. Geochemical modeling of stream sediment samples using methods such as multifractal analysis, clustering, principal component analysis (PCA), factor analysis, and artificial neural networks (ANN) is a valuable tool for mineral exploration [7][8][9][10][11]. Each method has advantages and disadvantages for geochemical modeling of stream sediment samples [12][13][14]. The advantages of using these methods are that they are simple in predicting the nonlinear geochemical behavior of the elements relative to each other and detecting the limits of the geochemical anomaly threshold. Perhaps the main disadvantage of the above methods is that the results are directly related to the sampled values and the smallest error in the raw data can greatly affect the output results. Accordingly, an integration and simultaneous use of the analyzing methods for geochemical modeling can diminish the percentage of errors and intensify the accuracy of results for mineral exploration [15][16][17].
Numerous studies have used fractal and correlation analysis and machine learning methods to identify potential mineralization zones separately [5,9,[18][19][20][21][22][23][24][25]. However, a few studies have used integrated and hybrid methods for the geochemical modeling of ore mineralization [26][27][28][29]. Geochemical studies based on correlation analysis can only determine the relationship between two or more elements and commonly provide the correlation coefficients [30,31]. Although there is not necessarily a direct or an inverse relationship between two elements, their geochemical behavior in most cases is a nonlinear relationship that a second-or third-degree curve must be explained. Clustering methods (e.g., k-means clustering) are dominant tools for detecting elements' nonlinear geochemical relationships [32,33].
Machine learning methods have various applications in geochemical modeling, such as detecting trace elements, clustering of elements, anomaly separation, the combination of layers [34,35], testifying large data volumes, and discovering specific trends and patterns [36]. However, there are some disadvantages to using machine learning methods, including the existence of comprehensive data, a long learning time which involves a powerful computer, and the interpretation of complex results [37,38]. Previous studies have scrutinized machine learning methods for geochemical modeling of ore mineralization in metallogenic provinces [39][40][41]. It is definitely noteworthy that a combination of machine learning methods with geostatistical methods can optimize the geochemical modeling results [9]. For instance, it is essential to determine statistical communities when applying the linear discriminant analysis (LDA) algorithm [42,43]. Threshold limits of each community can be calculated with various methods, such as k-means clustering or concentration-area (C-A) fractal analysis. Separation of these communities can affect the results of classifying elements by means of the methods used. Integration of different methods (e.g., machine learning and geostatistical techniques) can reduce the methods' weaknesses and intensify their advantages for an accurate geochemical modeling of ore mineralization [44][45][46][47][48][49][50].
The main purpose of this study is to develop an approach for comprehensive geochemical modeling of copper mineralization by integrating geostatistical methods (based on correlation analysis) and machine learning techniques for prospecting copper anomaly zones in the Sahlabad area, Birjand, East Iran. The Sahlabad region is located in the Sistan structural zone of Eastern Iran ( Figure 1A,B) [50], which has prodigious potential for copper, gold, magnesite, chromium, and iron deposits [51]. In the Sahlabad area, there are three active copper mines, namely Mesgaran, Chah-Rasteh (Kooh Kheiri) and Zahri, two old inactive mines (i.e., Kasrab and Cheshmeh-Zangi), and nine copper mineralization occurrences [52,53]. The mineralization type of the copper deposits is massive sulfide. Only a few geological studies have been documented for the Sahlabad area [54][55][56][57]; however, no comprehensive geochemical modeling for copper mineralization has yet been accomplished or reported in this area. The main objectives of this study are (1) to identify trace elements and the predictor composition of copper mineralization in the study area; (2) to separate copper geochemical communities; (3) to analyze geochemical behavior of copper element versus its trace elements. Therefore, geostatistical and machine learning algorithms are used to separate geochemical communities, analyze the geochemical behavior of elements, and cluster stream sediment data in the Sahlabad area. Consequently, this investigation makes a key contribution to develop an integrated geochemical model for copper mineralization using hierarchical analysis, factor analysis, k-means clustering, concentration-area (C-A) fractal analysis, linear discriminant analysis, and correlation analysis.  [57]. (B) The geographical location of the Sistan structural zone in Iran and study area. An: andesite; Ba: basalt; Co: conglomerate; Dd: dacitic dyke; Lm: limestone; Lv: listwanite; Ml: ophiolite melange; Mt: metadiabase; Qt: Quaternary sediments; Tu: tuff; Ub: ultrabasic rocks; Sch: schist; Sh: shale.

Geology of the Study Area
The Sahlabad area is located in the east of Iran (Birjand, South Khorasan province). It is positioned between longitudes 59°30′ to 60° and latitudes 32° to 32°30′ ( Figure 1A,B). The study area is entirely located in the flysch belt and ophiolite melange of eastern Iran in the Sistan structural zone [51]. This structural zone is situated between the Nehbandan (in the west) and the Harirod fault (in the east), which is 800 km long and 200 km wide [52,53]. This zone has undergone evolutionary stages from oceanic crust to continental crust and is one of the derivations of the "young Tethys" type [53][54][55]. In this area, igneous, metamorphic, and sedimentary lithological units are exposed from the late Cretaceous to the Neogene [56]. The geological formations in the Sahlabad area include rocks with the characteristics of the flysch belt and ophiolite melange belt, which are attributed to the upper Cretaceous and lower Tertiary, and the volcanic cover and younger Tertiary sediments [54].

Copper Deposits and Ore Mineralization
Copper, gold, nickel, chromium and magnesite mineralization and some old mining activity and excavations are documented in different lithological units (e.g., ultrabasic, intermediate and acidic rock units, metamorphic rocks, listwanites, etc.) of the study area [55][56][57][58][59][60]. The spatial location of copper mines, deposits, and indices are shown in the geology map of the study region ( Figure 1A). Malachite, chalcopyrite, and chalcocite were documented in the copper mineralization zones [61][62][63]. Characteristics of copper mineralization zones in the study area are shown in Table 1.

Geochemical Sampling
Geochemical sampling in Sahlabad area was prepared through the stream sediment sampling method. The initial design of the sampling points was mainly based on determining the center of stream's gravity. For this purpose, a map of the drainage systems of the study area was generated using topographic maps and aerial images. Stream sediment samples (709 samples) with a particle size of −40 mesh were collected from the study area.
The design and collection of stream sediment samples in the Sahlabad area were performed by the Geological Survey of Iran (GSI) [58]. Figure 2 shows the location of stream sediment samples in the Sahlabad area. The Inductively coupled plasma mass spectrometry (ICP-MS) method [64] was used to analyze geochemical samples collected from the study area. Zn, Cr, Ti, Mn, Sr, Ba, Au, As, Sb, Bi, Hg, W, Pb, Ni, Mo, Sn, Ag, Co, Fe and Cu were selected in this analysis. In geochemistry analysis, one of the main components of general error in exploration operations is laboratory error, and obtaining this error is important to know the precision of the analysis. Since, in regional-scale geochemical projects, the goal is to measure the relative values of each element corresponding to each other for finding promising areas, hence the precision of measurements is more important than their accuracy. For this reason, the precision of the operation has been investigated by repeated analysis of geochemical samples. To check the accuracy of the analysis, for every 10 to 15 geochemical samples, a duplicate sample (about 10% of the total samples) was analyzed. In this project, 79 samples were selected randomly in the study area. Figure 3 shows a diagram of the relative error rate for different elements. It shows that Au, W, Cr and As have high relative error rates.

Methodology and Data Processing
Geochemical data processing to evaluate relationships between elements, determine the threshold limits of geochemical communities, and identify the geochemical predictor composition of Cu mineralization was carried out using geostatistical and machine learning methods in this study. In doing so, an integrated geochemical model, including hierarchical analysis, factor analysis, k-means clustering, fractal analysis, linear discriminant analysis, and correlation analysis, was used to optimize geochemical analysis to identify copper anomalies in the Sahlabad area. To use raw data in statistical methods, censored and outlier data need to be identified and replaced. In the study area, censored data were unavailable. Outlier data were identified using the boxplot method [59,60] and corrected with the largest and smallest numbers. Typically, the research path with the mentioned approaches follows three main objectives, which are actually different dimensions of the geochemical model.
The first objective is to determine the geochemical predictor composition of Cu mineralization in the study area. For this purpose, at first, hierarchical analysis was applied on geochemical data to identify elemental groups based on Pearson correlation coefficient. In order to analyze the nature of the geochemical data and the obtained results, the factor analysis method was used. In this method, the data are evaluated from the aspect of principal components analysis and variance justification. Thus, the geochemical predictor composition of Cu mineralization was determined and confirmed. After that, Spearman's rank correlation coefficient analysis was used to quantify the relationship between the elements of the predictor composition elements.
The second objective is to separate the Cu geochemical communities based on the geochemical data of stream sediments. This approach was studied using the k-means clustering method as an innovative technique for this purpose. Threshold limits of background, anomaly, and enrichment communities were obtained through the results of kmeans clustering analysis. In order to validate the results, the data were processed again using concentration-area fractal analysis.
The third objective is to investigate the behavior of Cu element versus its geochemical tracer elements. At first, a clustering analysis of predictor composition elements of Cu mineralization (Cu geochemical tracer elements) was performed using Linear Discriminant Analysis (LDA). The goal was to evaluate the clusters within the group. LDA analysis determines the clusters using geochemical threshold limits of the target element (Cu). As a consequence, the genetic relationship of the tracer elements versus the Cu can be investigated. Subsequently, the k-means method was implemented on the data with the aim of quantifying the behavior of geochemical tracer elements. The mechanism of this method is that it first divides the data into clusters based on the Euclidean distance, and the optimal number of clusters in this research was determined based on the utility function (S(i)). Then, with the regression method, the centers of the optimal clusters are trended, and in this way, the trend of changes in the concentration of different elements versus the Cu is evaluated. Relativ Error (%) Figure 4 shows an overview of the methodological flowchart in the study area. Additionally, in the following subsections, calculation principles and resources for further studies are presented.

Correlation Coefficients Method
In data mining and statistics, "hierarchical clustering" is a method that categorizes and groups observations and data in a hierarchical manner. The point that sets this method apart from other clustering methods is the top-down (or bottom-up) order that exists in this technique [61]. In this method, unlike other clustering methods, each obser-vation may be placed in more than one cluster because clusters are formed based on different levels of distance; therefore, each cluster may be a subset of another cluster at a distance [62]. The clustering aims to classify the data into similar groups to identify elements with similar genetic characteristics in the area [63,64]. After identifying the copper element family, the correlation coefficient method was used for further study. Spearman's rank method was used to understand the effective processes in the formation of deposits and determine the correlation coefficients in the geochemical family of copper element. This method was based on the degree of the dependence of two variables measured in a set of individual data and the dispersion of different elements in the rock units of a deposit.

Factor Analysis (FA) Method
Factor analysis (FA) is a multivariate statistical method that establishes a special relationship between a large set of seemingly unrelated variables under a hypothetical model. This method was used to reduce the size of the data in the first step and then identify the area's main factors of copper mineralization [65]. In the FA method, a large number of variables are expressed in terms of a small number of dimensions or structures, which is called a factor [66]. Directions with maximum variability were identified using eigenvalues and eigenvectors.

K-Means Clustering Method
The K-means clustering method was used to group the data into clusters based on common characteristics [67]. Instead of simultaneously examining large numbers of data, clusters and their centers with specific features were used in the analysis. The K-means algorithm is presented as follows in five steps [68]: (1) n members are divided into k clusters. The K number is selected randomly.
(3) Equation (2) calculates the center of each class while the algorithm is running. The x indicates the vector of each member of Cj, and the number of Cj class members is represented by #Cj. [69].
(4) The objective function is calculated based on Equation (2), which determines the total distance of each member from the center of the class.
The objective function is minimized, and the number of optimal classes (K) is determined based on the minimum objective function.
Shirazy et al. [70,71] presented a software to increase the speed of execution of the above algorithm. The primary purpose of using this method in the first step was to cluster the data related to the element copper and identify groups with common characteristics in the study area. In other words, the geochemical threshold of the copper element was calculated in the background, anomaly, and enrichment communities. The next step in using the K-mean method was to investigate the geochemical behavior of the copper family. Using the previous methods, the geochemical family of a copper element was identified, but how members of this family behave regarding the copper element is the result of the K-means method.

Concentration-area (C-A) Fractal Method
Unusual phenomena with irregular shapes do not follow the principles of Euclid's geometry. The geometry used to describe these phenomena is called fractal geometry. Fractal methods have many applications in surface geological and geochemical studies due to the geometric shape of the anomalies, the spatial distribution of the data, and the use of all data in the computational process. Many researchers have studied the use of fractal methods in Earth sciences. Shirazy et al. [70] proposed various methods of using this geometry to separate geochemical communities. An equation is presented below for the concentration of materials or fractal properties: is the cumulative area enclosed by concentration level lines whose corresponding concentration is greater than or equal to ν. The value of α represents the fractal dimension of the different amplitudes [72,73]. The purpose of using C-A fractal method in geochemical modeling was to determine the fractal dimension of copper concentration data in stream sediment samples. This method identifies the area's various geochemical communities of copper concentrations. It also confirms the results of K-means clustering.

Linear Discriminant Analysis (LDA) Method
Linear discriminant analysis is a statistical method of data analysis used to measure the relationship between a parameter and known communities. In this method, the target communities need to be known in advance or at least defined [74,75]. Therefore, the results of K-means clustering and C-A fractal methods were used as one of the inputs of this method [74]. The purpose of using the LDA method was to investigate the intragroup behavior of geochemical elements in the copper family. In other words, the genetic relationship of elements identified as group elements with copper was investigated using this method. It was determined which elements of the previously identified geochemical family are genetically related to the copper element.

Preparation and Descriptive Statistics Analysis of Raw Data
Outlier data were identified using the boxplot method for analysis of the geochemical stream sediment data. The data were corrected using the replacement method with the largest and smallest values. The boxplot and histogram of the copper element as an example in the raw and corrected data are shown in Figures 5 and 6, respectively.  In order to correct the outlier values, the identified outliers were removed first and then the largest and smallest values were determined. The outlier datapoint whose value was greater than the largest value was replaced with it, and if the outlier datapoint was less than the smallest value, it was replaced with it. Table 2 presents the statistical characteristics of each element. Mahboob et al. [31] and Shirazy et al. [32] state that the data neither change continuously nor change in frequency in geochemical exploration. It is better to divide the total data into classes or categories with equal intervals and then draw their distribution curve. According to the resultant curve, the statistical population distribution can be identified [75,76]. In this study, a histogram of the refined data (replacement of outlier data) was drawn, and a bell curve was matched (see Figures 5 and 6). The normal distribution of the statistical population is a prerequisite for using data in many statistical methods. According to the statistical characteristics listed in Table 2, the skewness and kurtosis test were used to determine the frequency distribution of elements. In general, the normal bell distribution has zero skewness and kurtosis, so if these values in the data are close to zero, the distribution is also closer to the normal distribution [77]. However, if the data kurtosis and skewness range from −2 to 2, the data distribution is confirmed in terms of normality and can be used in statistical methods [78][79][80][81][82][83][84]. In the study area, the skewness and kurtosis coefficients for the data range from −1 to +1. Accordingly, the frequency distribution of data can be considered normal. It should be noted that the raw data of most elements, including outlier data, show an abnormal distribution, and after correcting outlier data, the distribution returned to normal. Figure 7 shows a hierarchical clustering diagram in relation to the elements of Ag, Zn, Pb, Mo, Sn, and Cu. From the geochemical data, five elements are genetically related, and they are effective in identifying each other. For further analysis, the correlation coefficients were calculated for this cluster. The correlation coefficient is a specific measure that quantifies the strength of the linear relationship between two variables in a correlation analysis. The study of a correlation coefficient is one of the most critical cases in geochemical studies because it can be used to understand the environment and processes influential in forming deposits [85][86][87]. As shown in Table 3, the results indicate a direct relationship between Ag, Zn, Pb, Mo, Sn, and Cu.  The purpose of using the FA method, in addition to confirming the results of the hierarchical analysis, was to calculate the scores related to the main component of Cu mineralization. To carry out the FA method, the validity of the data was evaluated using the Kaiser-Meyer-Olkin (KMO) index and performed on geochemical data of stream sediment samples. The KMO index for the data is 0.885. It means that these data have high validity and adequacy for the FA. Table 4 shows the FA results of geochemical data in the Sahlabad area. As shown in Table 4, the percentage of variance justification is presented by different components. From Figure 8, four factors (the fracture point of the diagram) were considered according to the Scree plot and eigenvalues related to each factor. The results show that the variance justification in the four components is more than 65%, which is a significant amount. Table 5 presents the matrix results of four factors from the factor analysis. From the results presented in Table 4, the elements were identified in relation to the highest scores of the copper element.  The appropriate component was factor number one, which included the elements of Ag, Cu, Mo, Pb, Zn, and Sn. Based on the FA method, metallic and quasi-metallic elements, presented in the linear composition of the first component with an appropriate variance justification, can be considered predictors of copper mineralization in the study area. The FA results confirmed the hierarchical analysis results, which indicated the proper performance of methods and data quality.

Hierarchical Clustering and Correlation Coefficients Analysis
To represent the potential areas represented by the elements of the first factor, a map obtained from the matrix of this factor was produced using the kriging interpolation method (Figure 9). Because topography plays an essential role in controlling stream sediment anomalies, a three-dimensional model of potential areas based on the first component matrix of the FA matched to Sahlabad's topography was also presented in Figure 9. In Figure 10, the earthy brown color indicates the depletion of the main factor, and the red color designates the enrichment. Since the geochemical data are stream sediment samples, the area's topography controls this enrichment and depletion. Figure 9. A three-dimensional model of potential areas based on the first component matrix of factor analysis adapted to the Sahlabad's topography (for factor score please see Figure 10).

K-Means Clustering Results
Using the k-means clustering method, the copper element in the geochemical data was divided into three clusters. Table 6 presents the results of this clustering, separated by the different specifications. Additionally, Figure 11 shows the map of copper element clustering. As shown in Figure 11, the three groups are separated by blue, green, and red colors, representing the background, anomaly, and enrichment of geochemical communities, respectively. Figure 11. A map of K-means clustering applications on geochemical data of copper element in stream sediment samples. Table 6. Clustering results of copper element using the k-means clustering method.  Figure 12 shows a map resulting from the Kriging interpolation method. In this regard, a concentration-area table was prepared using map networking and area calculation, including different concentrations of the copper element. The threshold range of each statistical community was calculated by plotting the logarithm of concentration and the area containing the copper element, based on the breaking points of the diagram. According to the C-A fractal diagram presented in Figure 13, the statistical community of copper element concentration data in stream sediment samples is divided into three groups. The concentration threshold of these groups is shown in Table 7. It can be concluded that the results of K-means clustering and C-A fractal methods confirm each other with a good accuracy.   The FA results showed that Ag, Cu, Mo, Pb, Zn, and Sn in the first factor have the highest variance justification. Since the target was the element copper and its behavior with other elements, these six elements were used in the LDA method. From K-means clustering and C-A fractal methods, the separated communities were used as the inputs of the LDA method. The number of focal functions is less than the number of groups considered for the dependent quantity (i.e., three standard groups for the copper element). According to Figure 14, separating the data related to elements is in the direction of function one with the least possible interference. Consistent with data processed by the LDA method (Table 8), for every 110 analyzed samples, at least one sample is lost and excluded from the analysis. Therefore, out of a total of 709 samples, 701 samples were analyzed. Figure 15 shows the histogram of discriminant scores calculated based on the discriminant function for the data used in each discriminated group.

Number of Processed Data 709
Excluded Missing or out-of-range group codes 0 At least one missing discriminating variable 110 The number of data used in the output 701 Figure 15. Histogram of discriminant scores based on the discriminant function for the data used in discriminated groups of (A-C). Figure 16 presents the results of the linear discriminant analysis performed on the data for grouping the elements. It was observed that the silver element is not inherently correlated with other elements and is in an independent group. Although silver correlates well with other elements, it does not play a role in tracing the copper. The elements copper, molybdenum, tin, lead, and zinc are separated into a group, indicating their inherent correlation in the study area. Furthermore, Wilks' Lambda test was used to validate the linear discriminant analysis. In this regard, Table 9 represents standard correlation coefficient and eigenvector values obtained from the discriminant functions. The first canonical discriminant functions were used in the analysis. The canonical correlation coefficient represents the Pearson correlation between the differentiation scores calculated by the above function and the initial grouping values. The eigenvalues represent the amount of variance expressed by the above functions. So that eigenvalues more significant than one represent a more appropriate discriminant function. The canonical correlation coefficient also indicates the correlation between discriminant scores and the dependent variable of the analysis, and the larger values of this parameter represent the higher power of the discriminant function. The calculated value for the canonical correlation coefficient is 0.903. This means that the discriminant function can model the variability of more than 81% related to the groups. This value is too significant for the data used in the analysis due to the study area's scale. The value of the Wilks' Lambda coefficient in the discriminant analysis is 0.184, which is a deficient value and indicates the high ability of the function in the correct classification of data. This number, on the other hand, equals the percentage of the total variance that this discriminant function cannot express. In this analysis, the elements of copper, molybdenum, lead, zinc, and tin were identified as a group of elements involved in tracing each other. In this regard, the behavior of two elements with each other was summarized in a number, indicating the direct or inverse correlation of two elements to each other. The copper's geochemical behavior with each identified group element was genetically related to Cu investigated in pairs. To identify the optimal number of clusters in the K-mean clustering, the K number was increased from 3 to 10. Figure 17 shows the value of the utility function against the number of clusters for all elements in the same group as the copper element. In line with the diagram shown in Figure 17, the number of three clusters is the optimal number of clusters for the behavior of copper element compared to other elements in its group. Figure 18 presents the profile of the clusters and the utility values for the optimal classification performed for the desired elements.

Zn Mo
Sn Pb Figure 18. The profile of the clusters and the utility function values for the optimal classification (k = 3) performed for Cu vs. Mo, Pb, Sn, and Zn.

Geochemical Behavior of Cu and Mo Elements
Optimal clustering was used to investigate the geochemical behavior of copper element compared to Mo element. The centers of the designated clusters for K = 3 are plotted in Figure 19. Based on this clustering, according to the diagram presented in Figure 19, the concentration of Cu in the stream sediment samples of Sahlabad area increases the concentration of Mo. The behavior of Cu and Mo elements with each other is nonlinear. The fitted curve with regression coefficient 1 is a quadratic curve whose equation is presented in Equation (4).
From Equation (4), the element concentration of Cu and Mo can be calculated in relative to each other. Equation (4) helps to see more about the behavior of Cu and Mo elements in the area's stream sediments.

Geochemical Behavior of Cu and Zn
As shown in Figure 17, the three-class clustering has the highest utility value among all the classifications. Subsequently, Figure 20 shows the concentration of the cluster centers per k = 3 to identify the behavior of the Cu element relative to the Zn element. Along with the above diagram, which shows the changes in the concentration of Cu against Zn in stream sediment samples, it is observed that the concentration of Zn also increases with increasing concentration of Cu. The behavior of Cu and Zn elements with each other is nonlinear, and the curve fitting on their behavior is of the quadratic equation with a nonlinear regression coefficient of 1 (R 2 = 1), which is the behavioral equation of these two elements in Equation (5) Geochemical Behavior of Cu and Pb Elements As shown in Figure 17, the three-class clustering (K = 3) was selected as the most appropriate classification with the highest utility function value (s(i)). The graph of concentration values in the cluster centers of this classification is also presented in Figure 21 to describe the geochemical behavior of Cu versus Pb.
Following the graph of the behavior of Cu and Pb elements, an increasing trend of these two elements relative to each other is observed. The fitted curve, which shows the concentration behavior of the studied elements with each other, is a quadratic equation with a nonlinear regression coefficient of 1 (R 2 = 1), presented in Equation (6)

Geochemical Behavior of Cu and Sn Elements
Considering the optimal clustering, the concentration of Cu in the stream sediment samples of the Sahlabad area increases concentration of Sn. This behavior was fitted with a quadratic curve, and its equation is presented in Equation (7). The copper and tin concentrations in the clusters' centers are shown in Figure 22.

Discussion
The technical flow presented in the current research provides a path for the multifaceted analysis of stream sediment geochemical data. The Self-Validation System (SVS) in this methodology is a type of confirmation of the high accuracy of the results (based on the principle of reproducibility of the results). However, the challenge is to choose appropriate analytical methods for the study system. Therefore, one of the limitations of the presented technical flow is the analytical methods and subsystems related to the performance procedure of each of them. For example, can the interpolation method make the results more accurate (closer to reality)? A detailed and documented answer to this question requires another research definition based on the present article. Other limitations include the lack of other types of geochemical data, such as heavy minerals or rock samples. If all types of geochemical data are used, more information can be deduced.
Given that the relationships obtained between the copper element and the predictor elements of Cu mineralization (geochemical family) were all direct (including the elements Mo, Pb, Zn, and Sn), it means that as the grade of copper increases, the grade of these elements also increases, and vice versa. Geochemically, it can be interpreted that the results of the previous steps acknowledged that these elements are predictors of Cu mineralization in the Sahlabad area. Therefore, geochemical halos related to the Cu mineralization can be identified using these elements. On the other hand, these results show that the relationships between geochemical elements are not necessarily linear. In other words, they show a kind of nonlinear behavior concerning each other.
The strength of the integrated geochemical model presented in this research is the evaluation of the results of each method by the results of previous and subsequent methods. In such a way, the results of each part geochemically confirm the results of other parts. In geochemical studies based on stream sediment data, it is always important to identify the predictor composition of the target element mineralization, determine the geochemical threshold of the target element, and investigate the geochemical behavior of elements in the same group with the target element. Based on the analysis of stream sediment data by the geochemical model proposed in this study, a more accurate mineral exploration campaign can be performed. Another strength of the current geochemical model is the comprehensive review of geochemical data. This comprehensiveness (the steps presented in the flowchart (see Figure 4)) gives the geochemistry researchers a broader view of exploratory geochemical modeling based on stream sediment data.

Conclusions
In this study, correlation analysis and machine learning techniques were implemented and incorporated to develop a geochemical model of the data obtained from stream sediment samples for the Sahlabad area, South Khorasan province, East Iran. This study aimed to identify the geochemical anomalies of the copper element based on its trace elements. Trace elements were first identified to achieve this goal using hierarchical analysis and factor analysis (FA) methods. Then, the map of copper mineralization anomalies was produced using the main component scores of the factor analysis. Successively, the group behavior was investigated to clarify the geochemical relationships of trace elements with Cu. The geochemical behavior of trace elements versus copper element was modeled as quadratic equations. Typically, it can be said that the geochemical distribution of copper in the Sahlabad area is related to Mo, Sn, Pb, Zn, and Ag. Based on the results of linear discriminant analysis (LDA), which was applied to the family of copper tracer elements, it was found that Ag has a different behavior from other elements. Since it is in a different group from other elements, it can be understood that it has no direct genetic geochemical relationship with the copper element in the area. However, based on the initial clustering of the elements, it can be used as a copper tracer in the area, because this element's genetic relationship with the copper element is indirectly defined. The geochemical model developed in this investigation is systematically appropriate for copper prospectivity mapping in the Sahlabad area and other analogue metallogenic provinces around the world.