Factor Analysis of XRF and XRPD data on the Example of the Rocks of the Kontozero Carbonatite Complex (NW Russia). Part I: Algorithm

Abstract: This paper aims to develop a principle for selecting the most informative samples for geological research from extensive collections of rock material. As a tool for this selection, we chose an original method of statistical comparison of X-ray powder diffraction (XRPD) and X-ray fluorescence (XRF) data using factor analysis (FA). A collection of carbonatites and aluminosilicate rocks from the Kontozero Devonian carbonatite paleovolcano complex (198 samples) is used to test our technique. The factors extracted during FA were successfully mineralogically interpreted according to peak positions on the graphs of factor loadings. For the studied rock collection, this approach allowed us to identify more than 20 rock-forming minerals based only on XRPD data. We also found about ten mineral phases, the lines of which are low-intensity, and/or which overlap with more intense peaks of other minerals in the diffraction patterns. The mineralogical interpretation of the factors of such hidden minerals can be performed through electron probe microanalysis (EPMA) of the samples previously selected using FA. In this study, we report on an algorithm that facilitates the selection of the rock samples exhibiting the greatest contrast in mineral and chemical composition and which contain the entire set of mineral phases occurring in the geological object under study. From the collection of Kontozero rocks we examined, the 30 most representative samples were selected, amounting to about 15% of the initial sample set.


Introduction
Analytical instruments are continually evolving, and their performance increases rapidly. Recently, a line of portable devices for field X-ray fluorescence (XRF) and X-ray powder diffraction (XRPD) analysis has appeared (e.g., [1]). As a result, researchers have the opportunity to work with ever-larger collections of geological samples. The number of samples is exceptionally high when drilling, during which many hundreds of core samples end up in the hands of geologists. Time and labor problems arise when selecting a representative subset of samples that can provide the most comprehensive (mineralogical, geochemical, isotopic, etc.) information for further research. In many scientific fields, with modern large-scale data processing, various methods of selecting a sample set and reducing its size using statistical tools have proven to be effective, and even irreplaceable [2]. However, in geology, the standard solution to this problem is based on expert judgment, i.e., on a mostly intuitive approach. At the same time, some techniques use statistical calculations (e.g., [3] and references to the reviews therein). This direction is promising, as it greatly simplifies and speeds up the solution of the sample selection problem.
One statistical approach to this problem is the processing of XRPD datasets using cluster analysis. This approach is successfully implemented, for example, in the commercial software PolySNAP (University of Glasgow, Glasgow, UK [4,5]), distributed by Bruker AXS. This software handles several tasks, including phase identification, automatic mixture detection, and quantitative analysis. There are some examples of successful application of cluster analysis to XRPD datasets in the Earth sciences (e.g., [6,7]). However, this approach is not yet widely used in rock studies.
Factor analysis (FA), as applied to the statistical processing of XRPD datasets, is a promising alternative to cluster analysis. For example, FA can improve the signal-to-noise ratio in diffraction patterns [8]. Factor analysis is also widely used, in the form of principal component analysis (PCA), for various types of time-resolved datasets (XRPD, X-ray absorption spectroscopy including extended X-ray absorption fine structure, optical waveguide absorption spectroscopy, etc.) [9-12]. In these cases, a system is studied in which crystalline phases change due to chemical reactions and/or structural transformations under changing P-T parameters of the medium. The subject of this analysis is a system of equations consisting of diffraction patterns or spectral characteristics at each moment of the experiment. The extreme members (initial and final) and, in the case of multistage reactions, single intermediate members have a contrasting composition. The other intermediate members pass into each other evolutionarily and are similar to neighboring members. They can be distinguished by a small shift in the proportions of the phases that appear and disappear during the reaction. The response to the course of the reaction is a regular decrease in the intensities of individual lines, which makes it possible to monitor the dynamics of reaction transformations using PCA. Thus, when using PCA for time-resolved data, researchers deal with mixtures of a relatively small set of phases, usually known in advance.
In the case of XRPD datasets of polymineral rocks, many members of the system of equations can (and most likely will) have contrasting compositions. Moreover, many geological objects (including alkaline-carbonatite complexes, considered here as an example) exhibit abrupt rather than gradual transitions. Thus, for natural rocks, the primary data have a more complex structure due to the natural variability of the object of study. In such studies, the problems of identifying a priori unknown mineral phases and finding the most contrasting samples come to the fore.
The original technique of statistical comparison of XRPD and XRF data by factor analysis (FA) proposed in [13] proved to be appropriate for the mathematical identification of major, minor, and accessory minerals and the rough estimation of their contents. This technique makes it possible to find samples with the highest and the lowest concentrations of a particular mineral in the collection under study [13]. FA makes the study blind, which substantially reduces the influence of the researcher on the result. This paper presents the results of an FA-based investigation of XRPD patterns and complementary XRF data on a rock sample collection from the Kontozero carbonatite complex. Based on these results, we developed an algorithm that considerably facilitates the selection of samples exhibiting the greatest contrast in mineral and chemical composition and which contain the entire set of mineral phases constituting the rocks of the study collection. For statistical processing, we used the IBM SPSS Statistics software (IBM Corp., Armonk, NY, USA; [14]), which is widely used in science. The "Factor Analysis" module implemented in this program has a user-friendly interface and does not require specialized mathematical training.

Theory
In general terms, the procedure of factor analysis can be described as follows. Suppose we have a data matrix X of size (N × M), where N is the number of observations (rows) and M is the number of independent variables (columns). FA can be carried out for both variables (R-technique) and observations (Q-technique). The analysis involves examining either the correlation matrix or the covariance matrix of X. The former approach is most often used. The X matrix must be converted to a standardized matrix X_S, which is then decomposed into several latent variables (factors). These are calculated as eigenvectors of the correlation matrix of the standardized data. The magnitude of the corresponding eigenvalues represents the variance of the data along the eigenvector directions [15].
Decomposition of the X_S data matrix separates the data into two parts, a structure part and an error part:
$$X_S = A B^{T} + E,$$
where A is the matrix of "factor scores" (of size N × n), B is the matrix of "factor loadings" (of size M × n), E is the matrix of residuals (of size N × M), the superscript T denotes the transpose, and n ≤ min(N, M) (where N is the number of samples, M is the number of independent variables, and n is the number of factors). The above inequality stipulates the transition to a space of lower dimension. The optimal n can be calculated through the sum of the eigenvalues of the used factors, which represents the data dispersion explained by these factors. The residuals (errors) are collected in the E matrix. The A matrix of factor scores describes the position of the samples in the new coordinate system, and the B matrix of factor loadings describes the new axes, which are built on the original ones. The factor score (FS) values describe the magnitude of a factor in each observation; that is, FSs characterize the observations. The factor loadings (FLs) are the coefficients of correlation between the factors and the original variables. They characterize the entire dataset and not a specific observation. In particular cases, the FA procedure becomes more complicated (for example, due to the application of the singular value decomposition algorithm [16]). However, the general FA principle remains the same.
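The decomposition into factor scores A, factor loadings B, and residuals E described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions (principal-component extraction from the correlation matrix, R-technique, no rotation); the function name `pca_factor_analysis` and its arguments are hypothetical and do not reproduce the SPSS procedure used in this study.

```python
import numpy as np

def pca_factor_analysis(X, n_factors):
    """Principal-component factor extraction from the correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable (column): zero mean, unit sample variance.
    # Assumes no column is constant (constant columns must be removed first).
    X_s = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the variables (M x M).
    R = np.corrcoef(X_s, rowvar=False)
    # eigh returns eigenvalues in ascending order; keep the largest n_factors.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Factor loadings B (M x n): correlations between variables and factors.
    B = eigvecs * np.sqrt(eigvals)
    # Factor scores A (N x n): projections scaled to unit variance.
    A = X_s @ eigvecs / np.sqrt(eigvals)
    # Residual (error) part of X_s = A B^T + E.
    E = X_s - A @ B.T
    # Share of the total dispersion explained by each retained factor.
    explained = eigvals / R.shape[0]
    return A, B, E, explained
```

When all factors are retained, E vanishes up to rounding; dropping factors with small eigenvalues moves the corresponding dispersion into E.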
Thus, FA is instrumental in moving from the M original (independent) variables to n new variables (factors) that concentrate correlated information from the initial volume of data. This function makes FA effective when applied to XRPD datasets. It is known [17,18] that the X-ray diffraction spectrum of a polymineral rock is a superposition of the diffraction spectra of the constituent minerals. Each mineral always displays the same diffraction spectrum, characterized by a set of interplanar distances d(hkl), which can also be expressed as values of the 2θ angle, and the corresponding line intensities I(hkl), unique to each mineral. For the methodology of FA, the key fact is that the intensities of individual peaks of each mineral are proportional to each other, and therefore mutually correlated. As shown in [13], FA extracts mineral-specific information from the entire volume of XRPD data and aggregates it into factors. The relationship between a factor and the corresponding mineral is illustrated by the identical position of the intense peaks on the FL graph of the factor and the mineral peaks on the diffraction patterns of the samples (Figure 1).
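The key fact used above, that peaks of one mineral are mutually correlated across a sample set while peaks of independent minerals are not, can be illustrated with synthetic data. Everything in this sketch is hypothetical: the two "minerals", their peak positions and relative intensities, the Gaussian peak shape, and the noise level are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
two_theta = np.arange(10.0, 60.0, 0.1)

def pattern(peaks):
    """Sum of Gaussian peaks; `peaks` is a list of (position, relative intensity)."""
    y = np.zeros_like(two_theta)
    for pos, height in peaks:
        y += height * np.exp(-0.5 * ((two_theta - pos) / 0.15) ** 2)
    return y

# Two hypothetical minerals with fixed, mineral-specific patterns.
mineral_a = pattern([(20.0, 1.0), (35.0, 0.5)])
mineral_b = pattern([(27.0, 1.0), (48.0, 0.6)])

# Each synthetic rock is a superposition weighted by independent mineral contents.
n_samples = 200
content_a = rng.uniform(0, 1, n_samples)
content_b = rng.uniform(0, 1, n_samples)
X = (np.outer(content_a, mineral_a) + np.outer(content_b, mineral_b)
     + rng.normal(0, 0.01, (n_samples, two_theta.size)))

def column(angle):
    """Intensities of all samples at the grid point closest to `angle`."""
    return X[:, np.argmin(np.abs(two_theta - angle))]

# Two peaks of the same mineral versus peaks of different minerals.
corr_same = np.corrcoef(column(20.0), column(35.0))[0, 1]
corr_diff = np.corrcoef(column(20.0), column(27.0))[0, 1]
print(f"same mineral: {corr_same:.3f}, different minerals: {corr_diff:.3f}")
```

In this idealized setting the first correlation is close to 1 and the second is close to 0, which is the structure that FA aggregates into mineral-specific factors.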
As a rule, one mineral corresponds to one factor, although on rare occasions the relationship between factor and mineral(s) is more complicated. As a consequence, the interpretation of such rare factors becomes more sophisticated. We have established two types of complex factors. First, several minerals can be combined into a single factor. This combination results from direct or inverse proportionality of the content of the minerals involved. The petrological rationale for this circumstance is detailed in [13]. Second, the information about one mineral can theoretically be distributed into several factors. This assumption is based on the fact that the XRPD dataset does not strictly fit the idealized description above. Several reasons lead to a change in the intensities of the peaks and, less often, their positions. These are instrumental problems (X-ray tube degradation, power settings, choice of scan time, etc.), problems caused by sample preparation (e.g., inaccurate vertical alignment of the sample, uneven powder grinding, and preferred orientation of crystallites), and those associated with sample properties (variations in the content of isomorphic impurities in minerals, microabsorption, etc.). FA is as sensitive to distortion of primary XRPD data as cluster analysis (see [7] and a review therein). Based on the FA methodology, we assume the following. Regardless of the nature of the distortion in the data, if the errors are systematic, they do not affect the FA result. If only the absolute intensity is affected, the values of the factors will change, but the factor loadings will remain unchanged. A random change in the relative intensity of some peaks decreases the FL value in the region of these peaks.
A displacement of the zero point or a systematic change in the relative intensity in a sample group (e.g., due to the mineralogical/petrophysical specifics of a sample group leading to a uniform preferred orientation of the crystallites of a mineral, or due to insufficient grinding of a batch of samples) can lead to the separation of XRPD data from one mineral into several factors. All or most peaks in the products of this separation (clone factors) should have close positions on the FL graphs of the factors but different intensities.
We have not yet encountered a proven case of the separation of a mineral into several factors; however, during this study, we found several candidates for clone factors.
Figure 1. Comparison of the diffraction pattern of an exemplary sample with the factor loading (FL) graphs of factors of the minerals constituting this sample (quantitative analysis was performed using the MAUD program [19,20]).

Sample Description
For this case study, we used 198 core samples from the Kontozero volcano-plutonic alkaline-carbonatite complex. Kontozero belongs to the Kola Alkaline Province [21,22], formed in the Devonian period at 360-380 Ma [23]. The predominance of volcanic rocks distinguishes the Kontozero complex from other complexes of the Kola Province [24,25]. It complicates petrographic investigations due to the small size of mineral grains and the diversity of their structural relationships. Other features of the complex are the ubiquity of breccias and, as in any alkaline-carbonatite formation, its mineral diversity and the presence of rare minerals. All these features, along with insufficient geological exploration, have hampered the study of the rocks of Kontozero. Thus, at the beginning of this study, we had only minimal mineralogical information about the samples from our collection. The sample collection includes (1) carbonatites sensu stricto (calcio-, magnesio-, and ferro-) containing < 20 wt% SiO2, (2) silicocarbonatites (essentially carbonate rocks of endogenous origin containing > 20 wt% SiO2), and (3) a variety of carbonate-bearing silicate and aluminosilicate rocks (from normal to alkali content, with both Na and K alkalinity types).

Analytical Techniques
The primary source of information on the mineral composition was X-ray powder diffraction (XRPD) of bulk rock samples. The chemical composition of each sample was determined by X-ray fluorescence analysis (XRF). Both analytical methods are rapid, allowing researchers to obtain results for extensive sample collections at the earliest stage of the research.

XRPD
The XRPD data of the bulk rock samples were collected at room temperature using a Shimadzu XRD-6000 diffractometer (Shimadzu Corp., Kyoto, Japan) with the Bragg-Brentano theta-theta geometry. Loading was carried out by loosely filling the sample holders with the finely ground powders (<7.4 µm), followed by pressing with frosted glass without horizontal movements. Measurements were taken using a Cu target X-ray generator with a graphite monochromator. Our task was to show how the proposed technique copes with quickly obtained large datasets (which is important, for example, in mineral exploration, when drill cores are recovered in large quantities). For this reason, when choosing the measurement mode, we gave preference to speed, which inevitably degrades the quality of the diffractograms. The scan range of the Bragg angle (2θ) was from 4.00° to 70.00° in the continuous regime, with a scan speed of 2.00°/min; the sampling pitch was 0.02°. Figure 2 shows the quality of the diffraction patterns obtained in this mode. The work was performed on the analytical equipment of the Institute of Mineralogy of the Ural Branch of the Russian Academy of Sciences (Miass, Russia; [26]). Raw XRPD data are listed in Supplementary Table S1.
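As a rough check of what this measurement mode implies, the number of intensity values per pattern and the scan time per sample follow directly from the scan range, sampling pitch, and scan speed given above. The point count assumes that both end points of the range are recorded.

```latex
M_{\mathrm{XRPD}} = \frac{70.00^\circ - 4.00^\circ}{0.02^\circ} + 1 = 3301,
\qquad
t_{\mathrm{scan}} = \frac{70.00^\circ - 4.00^\circ}{2.00^\circ/\mathrm{min}} = 33\ \mathrm{min}
```

For the 198 samples of the collection, this amounts to 198 × 33 min = 6534 min, or about 109 h of scan time, not counting sample loading.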

XRF
The XRF data of the bulk rock samples were collected using an S4 Pioneer wavelength dispersive X-ray fluorescence spectrometer (Bruker AXS, Karlsruhe, Germany). Instrumental operating conditions for the main rock-forming elements (Na, Mg, Al, Si, P, K, Ca, Ti, Mn, and Fe) and some minor elements (Ba, Sr, and Zr) were the following: 30 kV at 80 mA for NaKα, MgKα, AlKα, SiKα, PKα, KKα, and CaKα analytical lines and 50 kV at 40 mA for TiKα, MnKα, FeKα, SrKα, ZrKα, and RhKα lines. To minimize the mineral and particle size effects, the samples were homogenized using the fusion sample preparation technique. Samples were preliminarily dried and calcined to determine the loss on ignition values. A total of 0.5 g of calcined sample was then mixed with 7.5 g of flux (a mixture of lithium metaborate and lithium tetraborate) and fused in a TheOX electric furnace (Claisse, Canada) to obtain glasses appropriate for further analysis. Certified reference materials (RMs) of igneous and sedimentary rocks, as well as apatite concentrates, were used to build calibration curves. The lower detection limits were 0.05 wt% for all measured elements. Spectral overlap and matrix effect corrections were carried out using the fundamental parameters method; Sr and Zr contents were calculated using the intensity of the incoherent (Compton) scatter peak of the anode (Rh) emission. The XRF analysis technique used, including estimates of measurement errors, is detailed in [27]. The research was performed using equipment from the Joint Use Center for Isotope-geochemical Research of the A.P. Vinogradov Institute of Geochemistry, Siberian Branch of the Russian Academy of Sciences (Irkutsk, Russia; [28]). The obtained XRF data are listed in Supplementary Table S2.

Data Processing
Before data processing, we applied some spectral manipulations, namely baseline correction and smoothing of the diffraction patterns. Smoothing was performed using PeakFit v. 4.12 (Systat Software Inc., San Jose, CA, USA) with Loess regression (a level of 0.5%). Baseline correction of diffractograms was accomplished in the QualX v. 2.24 program (Institute of Crystallography (IC)-CNR, Bari, Italy; [29,30]), using a "Bezier Spline" (the points selected by the program were interpolated via the Bézier curve). The data processed in this way were compiled into a single database in Microsoft Excel (see Supplementary Table S3). Other preparatory manipulations (detailed in [13]) performed using this program were (1) the removal of variables whose values dropped to zero in all diffractograms after the baseline fitting, and (2) the addition of a small constant (for example, 0.01, which in the case of our data is three orders of magnitude less than the background values) to each intensity value. We also took into account that, after processing the data using QualX, the maximum peak of the diffractogram is automatically set at 1000. The diffractograms were scaled by multiplying the intensities at each 2θ by the coefficient k = (I_max - I_min), where I_max and I_min represent the maximum and minimum values in the corresponding "raw" diffractogram. Since most diffractograms showed subhorizontal baselines, this simplified approach for estimating the intensity of the principal peak was sufficiently accurate. The set of diffractograms, thus transformed, was supplemented with the contents of the chemical elements in the corresponding samples. Factor analysis was performed by the principal component method using IBM SPSS Statistics v. 23 (IBM Corp., Armonk, NY, USA; [14]). An R-technique of FA (by variables) was used. The VARIMAX rotation [31], which is the most commonly used orthogonal rotation in FA, was also applied.
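The preparatory steps listed above (removal of all-zero variables, addition of a small constant, scaling by k, and appending the XRF contents) can be sketched as follows. The function name, argument names, and the order of the steps are assumptions made for illustration; the coefficient k is taken exactly as printed in the text, and smoothing and baseline correction are assumed to have been done beforehand in PeakFit and QualX.

```python
import numpy as np

def prepare_dataset(patterns, raw_max, raw_min, xrf, offset=0.01):
    """Build the FA input table from processed diffractograms and XRF data.

    patterns : (N, M) intensities after smoothing and baseline correction
    raw_max, raw_min : (N,) maximum and minimum of each raw diffractogram
    xrf : (N, K) element contents of the same samples
    """
    patterns = np.asarray(patterns, dtype=float)
    # (1) Drop 2-theta variables that are zero in every diffractogram.
    keep = np.any(patterns != 0.0, axis=0)
    patterns = patterns[:, keep]
    # (2) Add a small constant to each intensity value.
    patterns = patterns + offset
    # Scale each diffractogram by k = I_max - I_min of its raw pattern.
    k = np.asarray(raw_max, dtype=float) - np.asarray(raw_min, dtype=float)
    patterns = patterns * k[:, None]
    # Append the XRF element contents as additional variables (columns).
    data = np.hstack([patterns, np.asarray(xrf, dtype=float)])
    return data, keep
```

The returned mask `keep` records which 2θ positions survived step (1), so that factor loadings can later be plotted against the correct angles.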
Factors were identified using the online American Mineralogist Crystal Structure Database (AMCSD) [32], the QualX v. 2.24 program with the open-access indexed XRPD database POW_COD [33], and the commercial PDF2 database [34]. The calculated factor loadings and factor scores are listed in Supplementary Tables S4 and S5, respectively.
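For readers who want to see what the VARIMAX rotation mentioned in this section does, the sketch below implements the raw varimax criterion with Kaiser's classical pairwise planar rotations. It is an illustration, not the SPSS implementation: statistical packages commonly apply Kaiser row normalization before rotating, which is omitted here, and the function name and parameters are hypothetical.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax rotation by pairwise planar rotations (no Kaiser normalization)."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        max_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u, v = x**2 - y**2, 2.0 * x * y
                A, B = u.sum(), v.sum()
                C, D = (u**2 - v**2).sum(), (2.0 * u * v).sum()
                # Angle maximizing the varimax criterion in the (i, j) plane.
                phi = 0.25 * np.arctan2(D - 2.0 * A * B / p,
                                        C - (A**2 - B**2) / p)
                c, s = np.cos(phi), np.sin(phi)
                L[:, i] = c * x + s * y
                L[:, j] = -s * x + c * y
                max_angle = max(max_angle, abs(phi))
        # Stop when a full sweep no longer rotates any pair noticeably.
        if max_angle < tol:
            break
    return L
```

Because every step is an orthogonal rotation, the communalities (row sums of squared loadings) are preserved; only the distribution of loadings among the factors changes, which is what makes the FL graphs easier to read.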

Types of the Extracted Factors
Data processing of the Kontozero collection yielded 107 factors, describing 100% of the dispersion of the raw data. Analysis of the FL graphs showed that more than a third of the obtained factors describe the noise component (Figure 3). The graphs of the noise factors have no distinct peaks but contain many "outliers" (single points with high FL values surrounded by those with low FL values). Therefore, we excluded all these factors from consideration.
Figure 3 is taken from [35].
About a third of FL graphs have either one intense peak or a series of low-intensity peaks (Figure 4). Factors with these FL graphs are interpretable only in exceptional cases (when analyzing them, we used the techniques described below, in most cases to no avail). Note that, altogether, factors of this type explain less than 10% of the data dispersion.
Only the remaining third of the factors, which showed many distinct peaks on FL graphs (Figure 5), turned out to be informative. This group includes mainly the first 30 factors that altogether account for 90% of the total data dispersion. We focus on these factors below, because they were subject to mineralogical interpretation.
Figure 5. Examples of FL graphs of the most informative factors (red line: factor #5 "alkaline amphibole"; blue line: factor #9 "orthoclase"; green line: factor #12 "monticellite").
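Statements such as "the first 30 factors account for 90% of the total data dispersion" rest on the cumulative share of the eigenvalues. A minimal sketch of that bookkeeping is given below; the function name is hypothetical, and the eigenvalues in the usage line are made-up numbers, not results of this study.

```python
import numpy as np

def factors_for_share(eigenvalues, share=0.90):
    """Number of leading factors needed to explain at least `share` of the dispersion."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # Cumulative fraction of the total dispersion explained by the leading factors.
    cumulative = np.cumsum(ev) / ev.sum()
    # First position at which the cumulative fraction reaches the requested share.
    n = int(np.searchsorted(cumulative, share) + 1)
    return n, cumulative

# Made-up eigenvalues: the cumulative shares are 0.5, 0.8, 0.9, and 1.0.
n, cumulative = factors_for_share([5.0, 3.0, 1.0, 1.0], share=0.85)
```

Such a count only bounds the number of candidate factors; whether a factor is informative is still decided from the shape of its FL graph, as described above.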

Evaluation of the Stability of the Factor Solution
To evaluate the stability of the obtained solution, we conducted two additional numerical experiments.
First, it was necessary to check whether the performed operations with the spectra (baseline correction and smoothing) affected the result. Therefore, we compared the FA results for the following three datasets:
1. Raw X-ray diffraction patterns not subjected to any spectral operations;
2. Raw diffraction patterns subjected to baseline correction;
3. Smoothed diffraction patterns subjected to baseline correction (the data used in the method of [13]).
As a result, we obtained three sets of factors. For the overwhelming majority of interpreted factors from each set, analogs were found among the factors of other sets. The similarity is distinct both in the high values of the correlation coefficient between FSs (Table 1, Figure 6A) and in the identity of the FL graphs (Figure 6B).
The difference in the serial numbers of analogous factors in the considered sets (see Table 1) is due to differences in the share of data dispersion explained by these factors. For instance, the factor shown as an example in Figure 6 explains 3.0% of the dispersion of the dataset composed of raw diffraction patterns. In the case of raw diffractograms with baseline correction, it accounts for 3.4% of data dispersion. Lastly, it explains 6.9% of the dispersion for the dataset of smoothed diffraction patterns. As a result, the factor under consideration has the ordinal number #6 in the first set of factors, #5 in the second set, and #4 in the third set. Typically, differences in the explained dispersion are small, and permutations occur between neighboring positions (see Table 1).
Against the background of the considered analogous factors, a specific factor #3, extracted from the dataset of raw diffraction patterns, stands out. First, the FL graphs of its closest analogs from the results of processing other datasets have only a distant similarity with the FL graph of this factor (Figure 7). The FSs of all these factors are poorly correlated (see Table 1). Second, on the FL graph of factor #3, in addition to sharp peaks, a broad maximum occurs in the range of small 2θ angles. All diffraction patterns used in this study curve upward in this area (see Figure 2). This artifact disappears after the baseline correction procedure. Hence, the specificity of factor #3, extracted from raw diffraction patterns, is due to the information on the background component contained therein.
Thus, our analysis showed that, in theory, even raw data are appropriate for research. However, if no baseline correction is made, additional factors overloaded by supplementary background information appear, which complicates the interpretation of the FA results. In contrast, diffractogram smoothing, combined with baseline correction, simplifies the interpretation.
The FL graphs obtained after such spectral manipulations are devoid of the outliers typical of raw data processing results (see Figure 6B). Nevertheless, the FL graphs retain all the nuances of morphology. It follows that the performed spectral operations do not distort the information contained in the XRPD data and do not affect the results of FA.
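The search for analogous factors between two factor solutions obtained on the same samples, as in the comparison above, can be expressed through the correlation of their factor scores. The sketch below is one possible formalization, not the procedure behind Table 1; the function name is hypothetical, and the sign of a factor is ignored when choosing the best match because it is arbitrary.

```python
import numpy as np

def match_factors(scores_a, scores_b):
    """For each factor of solution A, find the most similar factor of solution B.

    scores_a, scores_b : (N, n_a) and (N, n_b) factor scores of the same N samples.
    Returns a list of (factor in A, best factor in B, Pearson r).
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    n_a = a.shape[1]
    # Cross-correlation block between the columns of A and the columns of B.
    r = np.corrcoef(a, b, rowvar=False)[:n_a, n_a:]
    # Best analog by absolute correlation.
    best = np.argmax(np.abs(r), axis=1)
    return [(i, int(j), float(r[i, j])) for i, j in enumerate(best)]
```

A factor without a convincing analog (low |r| for every candidate) would show up in this list in the same way as the background-related factor #3 discussed above.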
The second numerical experiment we conducted aimed to determine whether the result of FA is particular or general. By a "particular" result, we refer to its relation to the tested sample subset (the working hypothesis implies that different sample subsets can yield different factor solutions). By a "general" result, we mean a solution resistant to any rearrangement of samples.
In this numerical experiment, we subjected the following three datasets to FA:
1. The subset of carbonatites sensu stricto selected according to the formal principle "SiO2 < 20 wt%", commonly used for the classification of carbonatites [36] (99 observations);
2. The cumulative subset of all other rocks of the collection (silicocarbonatites and carbonate-bearing silicate rocks) (99 observations);
3. The entire set of samples from the Kontozero collection (198 observations).
After processing these datasets, as in the previous experiment, we revealed many analogous factors characterized by very similar FL graphs (Figure 8). Thus, the result of processing XRPD data with the FA represents a robust general solution. This solution is determined solely by the specifics of the mineral composition. It only requires the samples to be representative (that is, the set of samples must contain all mineral phases characteristic of the object of study).

Interpretation of the Results of Factor Analysis
After the stability of the solution obtained with the FA was proven, we proceeded to interpret the results. This procedure involves decoding the information hidden in the FL and FS parameters. The positions of peaks on the FL graphs of the factors chosen for interpretation (see Section 4.1) coincide with the positions of the corresponding peaks on the diffractograms of certain minerals from the databases. For example, intense peaks in the FL graph of one of the factors occur at the same 2θ as in the fluorapatite diffractogram from the RRUFF database [37] (Figure 9A). Note that a similar FL graph was observed when studying the carbonatites of the Petyayan-Vara area (Vuoriyarvi massif, NW Russia) [13] (Figure 9B). We compared the FL graphs and the diffraction patterns of those samples in which the FSs of the corresponding factors were the highest (i.e., the highest expected content of the mineral associated with a factor [38]). Visual comparison of graphs (Figure 10A) in all considered cases was effective and sufficient. However, the comparison can also be formalized mathematically. For this procedure, it is necessary to construct a ranked series of FL values and analyze it using a methodology similar to "Cattell's scree test" [39], which is often utilized in FA to determine the number of interpreted factors. The essence is to identify the inflection point on the ranked series, after which the rate of decrease of the values changes (Figure 10B). This procedure allows us to identify the most significant peaks on FL graphs (i.e., those whose FL values exceed the FL value of the inflection point) and to compare the diffraction patterns with only them (Figure 10C). Mention should be made of the critical FL value, which can be estimated using standard statistical tests. Below this critical value, FL loses its statistical significance (for 198 samples, the critical value modulus is 0.18 at the significance level of p = 0.01).
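Both numerical criteria mentioned above can be sketched with the Python standard library. The critical value is approximated here through the Fisher z-transformation of the correlation coefficient, which is an assumption (the text does not say which test was used), and the inflection point is located by a simple "farthest from the chord" heuristic, one of several possible formalizations of a scree-type test. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def critical_loading(n_samples, p=0.01):
    """Approximate two-sided critical |r| via the Fisher z-transformation."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return math.tanh(z / math.sqrt(n_samples - 3))

def knee_index(values):
    """Rank and value of the point farthest from the chord of the ranked series."""
    ranked = sorted(values, reverse=True)
    n = len(ranked)
    x0, y0, x1, y1 = 0, ranked[0], n - 1, ranked[-1]

    def dist(i):
        # Unnormalized distance of the point (i, ranked[i]) from the chord
        # joining the first and last points of the ranked series.
        return abs((y1 - y0) * i - (x1 - x0) * ranked[i] + x1 * y0 - y1 * x0)

    k = max(range(n), key=dist)
    return k, ranked[k]

# For 198 samples at p = 0.01 this gives about 0.18, in line with the text.
r_crit = critical_loading(198)
```

Peaks whose FL values exceed the value returned by `knee_index` would then be the ones compared with the diffraction patterns. Note that the chord heuristic depends on how the rank axis is scaled relative to the FL axis, so it should be treated as a rough guide rather than a strict test.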
For many factors, the most intense peaks in the FL plots coincided with several pronounced peaks in the diffraction patterns (Figure 10A). For a mineralogical explanation of these factors, we used two techniques: (1) a mineral search in the online XRPD AMCSD database [32] using the "Diffraction Search" tool for the most intense peaks in the FL graph, with the "Tolerance" parameter equal to 0.1 (Figure 10B); and (2) qualitative identification of the phase of interest in the diffraction pattern of the sample with the maximum FS using the QualX v. 2.24 program [30] (via the peaks simultaneously occurring in both the diffractogram and the FL graph; see Figure 10C). It should be borne in mind that in cases where the contents of some minerals in the rocks are close to proportional, these minerals can be combined into one factor [13]. During this study, we also observed peaks of several minerals (for example, magnetite and diopside) in a single FL graph. Given the possibility of "mixing" several minerals into one factor, the analysis of diffraction patterns with maximum FSs is preferable. However, it is technically more complex, and we achieved the best results by combining both interpretation techniques.
In addition, we extracted several specific factors with intense peaks in FL graphs. Diffraction patterns of samples with maximum FSs of these factors in the corresponding regions have only weak lines and/or lines overlapped by lines of other minerals (e.g., Figure 12A,B). For such factors, the qualitative identification of the phase of interest by the diffractogram with the maximum FS is not practical. Thus, we diagnosed these factors based on the position of the most intense peaks in the AMCSD database. The subsequent mineralogical study of samples with maximum FSs confirmed our assumptions. Ultimately, a comparison with the identified factors revealed several minerals in the studied rocks, which, although not abundant, are of significant petrological interest. Examples are burbankite (Na,Ca)3(Sr,Ba,Ce)3(CO3)5 (Figure 12C) and baryte (Figure 12D). Their diffractograms do not contain pronounced peaks due to the low content of these minerals in the rocks (up to several wt%). However, the proposed technique turned out to be sensitive even to these accessory phases.

The Algorithm for the Selection of Representative Samples
Summarizing the results of this study, we propose the following algorithm for selecting the most representative samples in an extensive collection of rocks of a geological object:
1. XRPD and XRF analysis and primary data processing, including baseline corrections and removal of zero values. The output of this step is a database suitable for FA (an example is shown in Supplementary Table S3);
2. Factor analysis of the obtained results. The outputs are (a) tables of FL values for XRPD and XRF variables (see Supplementary Table S4), (b) graphs of factor loadings for XRPD variables (e.g., Figure 5), and (c) factor scores for each sample (see Supplementary Table S5);
3. Compilation and examination of all FL graphs on a single chart. The output is the rejection of all non-interpretable noise factors (see Figure 3);
4. Comparison of each factor graph with the diffraction pattern of the sample that shows the maximum score value of this factor. The output is a division of factors into easy to interpret (e.g., Figure 10A) and difficult to interpret (e.g., Figure 12A);
5. Interpretation of the easily interpretable factors by combining the two proposed techniques (Figure 10). The output is a highly confident mineralogical explanation of these factors;
6. Interpretation of difficult-to-interpret factors by searching for the mineral(s) according to the position of the most intense peaks in the AMCSD database. The output is an assumption about the nature of these factors;
7. Routine mineralogical (optical microscopy, SEM + EPMA, Raman) examination of the samples with the highest FSs. The output is a verification of the FA results, a collection of the most representative samples, and an idea of the mineral composition of all studied rocks at the level of main, minor, and most accessory minerals, all within a reasonably short time.
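The final selection in the algorithm above, picking the samples with the highest factor scores for each interpreted factor, can be sketched as follows. The function name, the sample labels, and the choice of how many samples to take per factor are hypothetical; in practice, the list of interpreted factors comes from steps 3 to 6 and the scores from Supplementary Table S5.

```python
import numpy as np

def select_representative(scores, sample_ids, factor_ids, per_factor=1):
    """Union of the samples with the highest scores on each interpreted factor.

    scores : (N, n) factor scores; sample_ids : N labels;
    factor_ids : column indices of the factors retained after interpretation.
    """
    scores = np.asarray(scores, dtype=float)
    selected = []
    for f in factor_ids:
        # Indices of the samples with the largest scores on factor f.
        top = np.argsort(scores[:, f])[::-1][:per_factor]
        for idx in top:
            # A sample that is extreme on several factors is listed only once.
            if sample_ids[idx] not in selected:
                selected.append(sample_ids[idx])
    return selected

# Toy example with four hypothetical samples and three factors.
toy_scores = [[2.0, -1.0, 0.1], [0.5, 3.0, 0.2], [1.9, 0.0, 4.0], [-1.0, 2.9, -3.0]]
chosen = select_representative(toy_scores, ["K-1", "K-2", "K-3", "K-4"], [0, 1, 2])
```

Because one sample can carry the maximum score of several factors, the resulting subset is usually smaller than the number of interpreted factors multiplied by `per_factor`.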

Conclusions and Future Perspectives
The proposed FA-based technique reduced a large and unwieldy set of original data on the rocks of Kontozero to a small set of factors. The number of factors was minimized to the extent necessary to represent all non-random differences in the combined (XRPD + XRF) dataset. After this analysis, all the relevant information remained in full, which was confirmed by numerical experiments that demonstrated the stability of the factor solution. The factor loading graphs of many extracted factors show pronounced peaks that coincide in position with the lines on the diffraction patterns of the samples. This makes it possible to carry out mineralogical identification of factors using two techniques: (1) based on the graphs of factor loadings and (2) based on the diffraction patterns of samples with the maximum factor score. The first technique also reveals phases that do not yield pronounced peaks in the diffractograms, owing to their low content in the rocks. We note that in the latter case, additional methods of mineral diagnostics are required. In sum, the results of this study allowed us to develop an algorithm for selecting the most representative samples with the highest contents of certain minerals.
Further prospects for the discussed methodology include: an in-depth assessment of the impact of noise and the possibility of its quantification; determining the impact of pre-treatment of XRPD data on the FA result; FA-based investigation of XRPD datasets with wider 2θ ranges (up to 100° or more) and determination of data range effects; examination of the influence of characteristics distorting the primary XRPD data (primarily the preferred orientation of crystallites) on the FA result and an assessment of the robustness of the proposed technique to these distortions; comparison of the results of factor and cluster analyses of the same XRPD dataset and determination of possible ways of using them jointly; and the use of several FA and similar techniques (PCA, independent component analysis, etc.), including those that allow the result to be constrained, for example, non-negative matrix factorization [10].
Supplementary Materials: The following are available online at www.mdpi.com/2073-4352/10/10/874/s1, Table S1: Raw diffraction patterns of the samples; Table S2: The measured content of major elements; Table S3: The dataset prepared for factor analysis; Table S4: Factor loadings of XRPD and XRF variables; Table S5: Factor scores of the samples.