Imputation of Gold Recovery Data from Low Grade Gold Ore Using Artificial Neural Network

Costa, Fabrizzio Rodrigues; Carneiro, Cleyton de Carvalho; Ulsen, Carina

doi:10.3390/min13030340


Fabrizzio Rodrigues Costa ¹,*, Cleyton de Carvalho Carneiro ¹,² and Carina Ulsen ¹,³

¹ Department of Mining and Petroleum Engineering, Universidade de São Paulo, Escola Politécnica, São Paulo 05508030, Brazil
² InTRA—Integrated Technologies for Rock and Fluid Analysis, Universidade de São Paulo, Escola Politécnica, Santos 11013552, Brazil
³ Technological Characterization Laboratory, Department of Mining and Petroleum Engineering, Universidade de São Paulo, Escola Politécnica, São Paulo 05508030, Brazil
* Author to whom correspondence should be addressed.

Minerals 2023, 13(3), 340; https://doi.org/10.3390/min13030340

Submission received: 4 December 2022 / Revised: 16 February 2023 / Accepted: 22 February 2023 / Published: 28 February 2023

(This article belongs to the Special Issue Process Mineralogy, Plant Practice and Developments in Mineral Processing)


Abstract

In a multivariate database, missing data can be estimated through several imputation techniques, which are particularly useful for data that are difficult to obtain for any reason, that have high uncertainties, or that come from scarce variables. A Self-Organizing Map (SOM) neural network is an effective tool for analysing multidimensional data and can be applied to data imputation. In this paper, data from drilling were used for training, testing, and validation with the variables total Au recovery (%), meaning gold recovery from gravity concentration plus hydrometallurgical processing, Au (g/t), As (ppm), S (%), Al₂O₃ (%), CaO (%), K₂O (%), and MgO (%). After training, 10% to 50% of the Au content and recovery values were omitted to evaluate the imputation performance for those variables. The values imputed by the SOM were compared with the original data and evaluated using descriptive statistics; the coefficient of determination was 85% when 50% of the data were omitted and 93% when 10% of the data were omitted. The correlation demonstrated between the original data and the SOM imputation can help geologists and metallurgists obtain highly reliable estimates of metallurgical recovery from related chemical variables, making SOM analysis a powerful tool for imputing analytical data. One practical application of the proposed model is to produce imputed data patterns that can be a good alternative for constructing or generating a synthetic geometallurgical database when data are missing.

Keywords:

gold ore; artificial neural network; SOM imputation; data analysis of applied mineralogy

1. Introduction

In the mining industry, and in gold mining in particular, continuous exploitation has reduced the resources of high-quality ores, which are gradually being depleted because of high demand. Refractory gold ores, which have an average recovery efficiency of <50% using direct cyanidation, account for about 25%–30% of total gold production from gold ores [1]. Gold ores are commonly classified into two major types: free-milling ores, defined as those from which over 90% of the gold can be recovered by conventional cyanide leaching or some combination of flotation and cyanidation, and refractory ores, which give low gold recoveries or achieve acceptable recoveries only with significantly more reagents or a complex pre-treatment process, such as pre-oxidation prior to cyanide leaching [2,3,4,5]. Maximizing gold recovery is an important guarantee of the sustainable development of the gold industry, yet there are many challenges, especially in the treatment of sulphide refractory gold ore, such as increased operating costs, low gold recovery, and larger volumes of waste.

An essential factor in developing a more efficient gold extraction route is the inherent mineralogy of the gold ore being processed. Mineralogy determines the design and performance of the gold extraction route [6,7,8,9,10], which usually comprises crushing, SAG (semi-autogenous grinding) and ball milling, a flotation circuit, and hydrometallurgical treatment of the high-grade gold concentrate. In the hydrometallurgical process, the concentrate is leached in a cyanide solution, the gold is adsorbed onto activated carbon and stripped, and it is then smelted in the foundry into bars (bullion) containing gold and impurities. The gold-bearing sulphide minerals are often recovered into a sulphide concentrate by gravity or flotation concentration, depending on the liberation characteristics of the sulphides.

Mineralogical data are validated using supporting chemical and metallurgical data or by automated scanning electron microscopy techniques [11]. Artificial Neural Networks (ANNs), a technique still little explored, have been used in several mineral processing applications, such as modelling for particle size analysis [12], liberation modelling [13], geological block modelling [14], and geometallurgy [15,16]. Geometallurgy is a multidisciplinary approach that integrates information from several disciplines, incorporating principles of process mineralogy, chemical analysis, and ore characterization [17,18,19,20,21]. The imputation of missing data can serve as a complementary tool for predictive geological block modelling, making the models more consistent. The geometallurgy concept has received increasing attention due to the development of modern analytical techniques combined with ANN analysis [15].

In the search for solutions for datasets with missing attribute values, recent developments in computing technologies have produced several machine learning algorithms, especially ANNs, that can impute those values. Such datasets typically contain a considerable proportion of missing values, outliers, and redundant records, which must be eliminated or treated to build a robust model. There are many reasons for such omissions, including improper or mistaken data, data unavailability, data collection problems, missing sequences, and incomplete information.

The Self-Organizing Maps algorithm (SOM) is a heuristic model used to visualize and explore linear and non-linear relationships in high-dimensional datasets; it is based on the principles of vector quantization [22,23] and measures of vector similarity. In an SOM analysis, each sample is treated as an n-dimensional (nD) vector in a data space defined by its variables, and the map units act as unsupervised classifiers trained by competitive learning. Once the n-dimensional space has been created, where n is the number of variables involved, each sample elects a Best Matching Unit (BMU) whose prototype vector provides a value for every variable; this makes it possible to impute values for samples with missing values from their respective BMUs [24].
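The BMU-based imputation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SiroSOM implementation: it assumes an already trained codebook (one prototype vector per map unit), variables already normalised to comparable scales in real use, and a BMU search that simply skips the missing (NaN) entries of a sample. The names `find_bmu` and `impute_from_bmu` and the toy codebook values are hypothetical.

```python
import numpy as np

def find_bmu(sample, codebook):
    """Index of the best matching unit, ignoring the sample's NaN entries."""
    observed = ~np.isnan(sample)
    diff = codebook[:, observed] - sample[observed]
    dist2 = np.sum(diff ** 2, axis=1)   # squared Euclidean distance per node
    return int(np.argmin(dist2))

def impute_from_bmu(sample, codebook):
    """Fill the NaN entries of `sample` with the BMU prototype values."""
    bmu = find_bmu(sample, codebook)
    filled = sample.copy()
    missing = np.isnan(sample)
    filled[missing] = codebook[bmu, missing]
    return filled, bmu

# Hypothetical trained codebook: 3 nodes x 3 variables (Au, As, total recovery).
# Values are shown unnormalised for readability.
codebook = np.array([[0.2, 100.0, 75.0],
                     [0.5, 400.0, 82.0],
                     [1.0, 900.0, 89.0]])
sample = np.array([np.nan, 420.0, np.nan])   # only As was measured
filled, bmu = impute_from_bmu(sample, codebook)
print(bmu, filled)
```

Because only the observed variables enter the distance, a sample with hidden Au and recovery still finds a BMU through its remaining assays, and the hidden values are read from that node's prototype.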

The method can be used to perform broad categories of operations, such as prediction or estimation, imputation [25,26], clustering, and classification [27,28], and is thus considered an exploratory data analysis tool.

The SOM model is not affected by missing values: it can process incomplete input datasets and can be trained on an incomplete dataset to predict the missing values in the input data. In addition, SOM is an unsupervised method, so no prior knowledge is required regarding the nature or number of groupings within the dataset. This makes it an ideal technique for the analysis of complex and disparate geoscientific problems.

Aiming to implement SOM analysis as an alternative tool to impute the missing data, in this research, assay data were collected directly from drilling for training, testing, and validation. Because the used drillholes covered a reasonable part of the deposit, the trained network had sufficient information to characterize and generalize the spatial variation of chemical analyses in the region encompassed by the drillholes. The network’s model performance was assessed through two statistical criteria: the coefficient of determination R² (Equation (1)) and root mean squared error RMSE (Equation (2)).

Coefficient of determination: R^{2} = \frac{\sum_{i} (\hat{y}_{i} - \bar{y})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}} (1)

Root mean squared error: \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_{i} - y_{i})^{2}} (2)

where:

\hat{y}_{i}: predicted value of y for observation i;
\bar{y}: mean of the observed y values;
y_{i}: observed y value for observation i;
n: number of samples.
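Equations (1) and (2) translate directly into code. The sketch below implements R² exactly as written in Equation (1), the sum of squares of the predictions about the observed mean divided by the total sum of squares, together with the RMSE of Equation (2); the function names and toy values are illustrative only. This form of R² coincides with the more common 1 − SS_res/SS_tot only for least-squares fits with an intercept, so for imputed values the two definitions can differ.

```python
import numpy as np

def r2_eq1(y, y_hat):
    """R² as written in Equation (1): explained over total sum of squares."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_bar = y.mean()                        # mean of the observed values
    return float(np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2))

def rmse(y, y_hat):
    """Root mean squared error, Equation (2)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

y_obs = [80.0, 85.0, 90.0]   # toy observed total recovery (%)
y_imp = [81.0, 85.0, 89.0]   # toy imputed values
print(r2_eq1(y_obs, y_imp), rmse(y_obs, y_imp))
```

For these toy values Equation (1) gives 0.64, whereas 1 − SS_res/SS_tot would give 0.96, which illustrates why the definition used should always be stated alongside reported R² values.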

The SOM analysis used the variables total recovery (%), meaning gold recovery from gravity concentration plus hydrometallurgical processing, Au (g/t), As (ppm), S (%), Al₂O₃ (%), CaO (%), K₂O (%), and MgO (%). These chemical variables are associated with a mineralogy composed mainly of pyrite and arsenopyrite (sulphides) and carbonaceous phyllite with a predominance of chlorite.

The analytical dataset derived from drilling, together with the gold recovery data (gold recovery from gravity concentration plus hydrometallurgical processing, i.e., total recovery), was tested with omissions of 10%, 20%, 30%, 40%, and 50% of the total recovery and Au content data. The values imputed by the SOM technique were compared with the original analytical geochemical values and evaluated using descriptive statistics.

Gold concentration was carried out in two independent stages. In the first, ore from the pit was subjected to comminution, classification, grinding, and gravimetric concentration through hydrocyclones, jigs, and flotation cells (gold recovery). In the second, the hydrometallurgical stage, the high-grade gold concentrate produced in the first stage was subjected to CIL (carbon-in-leach) leaching and electrodeposition to obtain the final product, Au bullion.

In a multivariate database such as the one used in this research, the SOM analysis treats each sample as a vector, allowing synthetic values to be created for samples with missing values from their respective vectors. Handling incomplete datasets through imputation is a significant problem in data mining, big data analysis, and ANN-based decision-making, because the results of the mining or analysis can be negatively affected when incomplete records make up a large share of the database. Few studies have been published on missing value imputation as the primary solution for chemical–mineralogical datasets containing one or more missing attribute values.

Once demonstrated, the correlation between the original data and the SOM imputation can help geologists and metallurgists obtain highly reliable estimates of metallurgical recovery from related chemical variables, making it possible to implement SOM analysis as an alternative tool for imputing analytical data, with the values taken directly from the prototype vectors of the BMUs. This research aimed to validate a new approach for estimating total recovery from other variables without compromising the quality standards of the results.

2. Materials and Methods

2.1. Dataset

The dataset of sulphide refractory gold ore used in this analysis was provided by a low-grade (<0.6 g/t) gold producing company in Minas Gerais State, Brazil. The deposit is hosted in carbonaceous sericitic phyllite with intercalations of phyllosilicates, essentially chlorite, and millimetric quartzite lenses. Sulphides are represented mainly by pyrite and arsenopyrite, with sparse occurrences of pyrrhotite, sphalerite, chalcopyrite, and galena.

The evaluation and imputation of gold recovery were carried out on 107,258 chemical analysis intervals from 2209 diamond drillholes, all located in the homogeneous phyllites of the deposit. Because of the large size of the dataset, and to observe the compositional characteristics of the variables, the data were pre-processed by calculating the average arsenic, sulphur, and oxide contents for each 0.001 g/t gold grade, resulting in 1397 gold grade values.

Figure 1 shows, for each gold content (e.g., 0.001 g/t), the calculated average of each chemical element. The same gold content can occur many times across several drillholes. Equation (3) represents the calculation.

{Au}_{Grade\,i} = \frac{\sum y_{i}}{n_{i}} (3)

where:

{Au}_{Grade\,i}: average content of the chemical element at gold grade i;
y_{i}: the y value of the chemical element for interval i;
n_{i}: number of samples with gold grade i.
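The grouping behind Equation (3) can be sketched with the Python standard library. This is an illustration under stated assumptions: each interval is a dictionary of assay values, gold grades are grouped at a resolution of 0.001 g/t by rounding, and missing assays (None or NaN) are skipped, so each variable is averaged only over the intervals where it was measured. The function name and toy values are hypothetical.

```python
import math
from collections import defaultdict

def average_by_gold_grade(intervals, decimals=3):
    """Average every non-Au variable within each 0.001 g/t gold grade class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in intervals:
        grade = round(row["Au"], decimals)      # gold grade class
        for key, value in row.items():
            if key == "Au" or value is None or math.isnan(value):
                continue                         # skip Au itself and missing assays
            sums[grade][key] += value
            counts[grade][key] += 1
    return {
        grade: {key: sums[grade][key] / counts[grade][key] for key in sums[grade]}
        for grade in sorted(sums)
    }

# Hypothetical intervals from different drillholes (As in ppm, S in %).
intervals = [
    {"Au": 0.412, "As": 300.0, "S": 0.8},
    {"Au": 0.412, "As": 500.0, "S": 1.2},
    {"Au": 0.950, "As": 1200.0, "S": None},
]
result = average_by_gold_grade(intervals)
print(result)
```

Averaging per grade class is what collapses the 107,258 intervals into one row per distinct gold grade; a dataframe `groupby` would achieve the same in a few characters if a tabular library is available.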

Chemical analyses were carried out by X-ray fluorescence (XRF) on fused beads to determine the main compounds Al₂O₃, CaO, K₂O, and MgO; Au was determined by fire assay, As by ICP-OES, and S by pyrolysis in an induction furnace with infrared cell detection.

The estimation of gold recovery was developed based on historical operations, according to Equation (4).

Total recovery (%) = \left[ \frac{grade - (grade \times 0.0359 + 0.059)}{grade \times 0.985 + 0.032} \times 0.96 + 0.04 \right] \times 100 (4)

where:

(grade \times 0.0359 + 0.059) = tailings grade of the concentration plant;
grade = gold content in the ore;
0.985 = correction factor;
0.032 = 3.2% gain from desulphurization (the objective of desulphurization is to remove sulphides from the tailings by separating them into desulphurised tailings that are non-acid-generating and a sulphide-rich concentrate);
0.96 = hydrometallurgical recovery.
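Equation (4) can be evaluated directly as a function of feed grade. The sketch below is a straightforward transcription of the equation with its constants; the function name is illustrative.

```python
def total_recovery(grade):
    """Total Au recovery (%) for a feed grade in g/t, following Equation (4)."""
    tailings = grade * 0.0359 + 0.059      # tailings grade of the concentration plant
    denominator = grade * 0.985 + 0.032    # correction factor and desulphurization gain
    hydromet = 0.96                        # hydrometallurgical recovery
    return ((grade - tailings) / denominator * hydromet + 0.04) * 100

for g in (0.5, 0.9, 1.5):
    print(f"{g:.1f} g/t -> {total_recovery(g):.1f} %")
```

For 0.5, 0.9, and 1.5 g/t the function returns roughly 81.4%, 88.5%, and 92.2%. As written, the expression becomes negative for grades below about 0.06 g/t, so it is only meaningful within the operating grade range of the historical data it was fitted to.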

Figure 2 shows the gold recovery trend (total recovery) as a function of the gold content in the feed. The graph was obtained by applying Equation (4). For high gold grades (>0.90 g/t), the recovery curve flattens and shows no significant further gains.

Missing values of chemical elements are common in many databases. They can arise for various reasons, such as analytical equipment errors, quality control failures, insufficient sample mass for a given analysis, or outlier values, among others. Predicting missing values by modelling, for example with artificial neural networks such as SOM, is an alternative way to deal with this problem when it is not feasible to repeat the full laboratory workflow, which includes sampling, sample preparation by grinding and pulverizing, aliquot preparation, analysis in duplicate or triplicate, and quality control of the results.

2.2. Experimental Procedure

Once the samples were selected, the data underwent a pre-processing step to prepare them for imputation. In this step, randomly chosen values of Au content and total recovery were deliberately excluded so that they could later be estimated by the SOM analysis. The imputed values were then compared with the original values obtained by laboratory chemical analysis and with the theoretical total recovery curve. To assess the performance of the SOM analysis, new data tables were generated in which 10%, 20%, 30%, 40%, and 50% of the samples were randomly hidden, as shown in Table 1. Each data table was then loaded into the SOM platform of the SiroSOM® software (3rd, Commonwealth Scientific and Industrial Research Organisation, Canberra, Australia).
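The random-omission step can be sketched as follows. The source does not say whether Au and total recovery were hidden in the same rows or independently, so this sketch assumes an independent random draw for each column; the function name, column indices, seed, and stand-in data are hypothetical.

```python
import numpy as np

def hide_random(data, columns, fraction, seed=0):
    """Copy `data` and set a random `fraction` of rows to NaN in each column."""
    rng = np.random.default_rng(seed)
    masked = np.array(data, dtype=float)   # np.array copies by default
    n_rows = masked.shape[0]
    n_hide = int(round(fraction * n_rows))
    hidden = {}
    for col in columns:
        rows = rng.choice(n_rows, size=n_hide, replace=False)
        masked[rows, col] = np.nan
        hidden[col] = np.sort(rows)        # kept for comparing imputed vs. original
    return masked, hidden

# Stand-in for the 1397 x 8 table (column 0: Au, column 1: total recovery).
data = np.random.default_rng(1).random((1397, 8))
scenarios = {
    f"E{round(p * 100)}%": hide_random(data, [0, 1], p, seed=42)[0]
    for p in (0.1, 0.2, 0.3, 0.4, 0.5)
}
print({name: int(np.isnan(table[:, 0]).sum()) for name, table in scenarios.items()})
```

Keeping the hidden row indices makes it straightforward to compare each imputed value with its original laboratory value afterwards.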

2.3. Self-Organizing Maps Technique

SOM is a data analysis, visualization, and interpretation tool, based on the principles of vector quantization and measures of vector similarity, that combines an input layer with a competitive layer of processing neurons, typically organized as a two-dimensional grid [27]. The SOM implements a characteristic non-linear projection from high-dimensional spaces. Most SOM models have been applied to the visualization of high-dimensional systems and to data mining [23,24], and they can be used for broad categories of operations such as function fitting, prediction or imputation, clustering, noise reduction, and classification. The SOM is based on unsupervised learning, meaning that no human intervention is required during the learning process and that little needs to be known in advance about the characteristics of the input data.

The learning process consists of an ordering phase, in which large changes are made to the neurons, and a tuning phase, in which smaller changes are made to the winning neuron, known as the Best Matching Unit (BMU), and to the units immediately adjacent to it. It is these BMUs or seed vectors that are projected onto the enveloping hypersurface and transformed to produce the SOM representation of the data.

This training process results in an organized map in which the asymptotic local multivariate density of the weights approaches that of the training set. These maps can be visualized as a unified distance matrix (U-matrix) [29]. The U-matrix shows the closeness between adjacent nodes on the map, typically in terms of the Euclidean distance. A colour–temperature scale is used in which cooler colours (blues) indicate adjacent nodes that are close together (similar) and hotter colours indicate larger Euclidean separations (dissimilar).
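A U-matrix can be computed as the mean distance between each node vector and its grid neighbours. The sketch below assumes a rectangular lattice with four neighbours per node and wrap-around edges, which approximates a toroidal map; the hexagonal lattice used in this study would give each node six neighbours instead. The function name and toy codebook are hypothetical.

```python
import numpy as np

def u_matrix(codebook, rows, cols):
    """Mean Euclidean distance between each node and its grid neighbours."""
    w = codebook.reshape(rows, cols, -1)
    dists = []
    # Four neighbours (up, down, left, right); np.roll wraps around the edges,
    # mimicking a toroidal map on a rectangular lattice.
    for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        neighbour = np.roll(w, shift, axis=axis)
        dists.append(np.linalg.norm(w - neighbour, axis=2))
    return np.mean(dists, axis=0)

codebook = np.zeros((9, 2))
codebook[4] = [3.0, 4.0]          # centre node of a 3 x 3 map differs from the rest
print(np.round(u_matrix(codebook, 3, 3), 2))
```

In this toy map the centre node stands out with a large value (a "hot" cell), its direct neighbours get intermediate values, and the corners stay at zero, which is the pattern the colour–temperature scale makes visible.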

Additionally, component plots providing another visualization of the nodes on the self-organized map are possible. Because each node is a vector in the data space defined by the input variables, it is possible to visualize each node’s contribution to a particular variable and display the values again using a colour–temperature scale so that low values are displayed in blue and high ones in red.

Similarities among patterns are mapped into similar weights of neighbouring neurons. The SOM can therefore act as a clustering algorithm, with one cluster associated with each neuron on the map: the BMU of each input vector denotes the cluster to which that vector belongs, and the connection weights of the BMU define the centre of that cluster.

2.4. SOM Preparation and Training Criteria

The SOM data mining and analysis are carried out using SiroSOM software. A schematic of the SOM data mining attributes is presented in Figure 3.

In the training phase, the node vectors are trained to represent the original distribution in an n-dimensional space, which is defined by the input samples and the required output map size. The number of seed vectors is determined by the size of the required output map, which in the study was a 20 × 20 sized map, thus meaning 400 seed vectors. A two-step process is applied to each input sample: (1) a competitive step: the input sample is compared to all seed vectors within a particular radius of the input sample, and ultimately, a winning seed vector is determined as being the most similar; (2) a cooperative step: the seed vectors within a given radius of the winning seed vector are also modified so that their properties are changed by a percentage to more closely resemble the input sample in question.
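The competitive and cooperative steps can be illustrated with the classic online (sequential) SOM update. This is a generic sketch rather than SiroSOM's code: it assumes a rectangular grid, a Gaussian neighbourhood on grid distance, a learning rate `alpha`, and a comparison of the sample with every node rather than only those within a radius; all names are hypothetical.

```python
import numpy as np

def grid_coords(rows, cols):
    """(row, col) position of every node, in the same order as the codebook."""
    r, c = np.divmod(np.arange(rows * cols), cols)
    return np.column_stack([r, c]).astype(float)

def train_step(codebook, coords, x, radius, alpha):
    """One online SOM update for sample `x`; modifies `codebook` in place."""
    # (1) Competitive step: the node most similar to x wins.
    bmu = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
    # (2) Cooperative step: nodes near the winner on the grid move towards x,
    # by an amount that decays with grid distance (Gaussian neighbourhood).
    grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
    h = np.exp(-grid_d2 / (2.0 * radius ** 2))
    codebook += alpha * h[:, None] * (x - codebook)
    return bmu

coords = grid_coords(2, 2)
codebook = np.zeros((4, 2))
codebook[3] = [0.9, 0.9]
x = np.array([1.0, 1.0])
bmu = train_step(codebook, coords, x, radius=1.0, alpha=0.5)
print(bmu, codebook)
```

The winner moves the most towards the sample, and its grid neighbours move by a fraction that shrinks with their distance from it, which is what gradually orders the map.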

The number of output neurons in an SOM can be selected using the heuristic rule suggested by Vesanto [31] and applied by Carneiro [25] in a study on the imputation of reactive silica and available alumina: the optimal number of map units is close to 5√n, where n is the number of training samples (sample vectors).

The application of the SOM to the chemical analysis data is carried out using a fixed number of neurons and topological relations. Specifically, a hypertoroidal surface with hexagonal neurons (hexagonal grid) is selected, whose size is established based on the ratio between the eigenvalues of the training data. This hypertoroidal surface was used for the projection of the neurons or BMUs.
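The sizing rules in the two paragraphs above can be sketched as follows. The 5√n rule is taken from the text; the way the eigenvalue ratio is turned into a row/column split (square root of the ratio of the two largest covariance eigenvalues, with a √0.75 correction for hexagonal lattices) is an assumption modelled on common SOM practice, not a description of SiroSOM. Function and variable names are hypothetical.

```python
import numpy as np

def suggest_map_size(data, hexagonal=True):
    """Suggest map units and a (rows, cols) shape for a samples x variables array."""
    n = data.shape[0]
    munits = 5.0 * np.sqrt(n)                  # heuristic: about 5 * sqrt(n) units
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(data, rowvar=False)))[::-1]
    ratio = np.sqrt(eigvals[0] / eigvals[1])   # elongation along the main directions
    if hexagonal:
        ratio /= np.sqrt(0.75)                 # assumed correction: hex rows sit closer
    cols = max(1, int(round(np.sqrt(munits / ratio))))
    rows = max(1, int(round(munits / cols)))
    return munits, rows, cols

rng = np.random.default_rng(0)
stretched = rng.normal(size=(1397, 7))
stretched[:, 0] *= 4.0                         # one dominant direction of variance
print(suggest_map_size(stretched))
```

For n = 1397 the heuristic gives about 187 units, and data with one dominant direction of variance yield an elongated map. The study itself used a 20 × 20 map (400 units).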

For the analysis in the SOM environment, seven variables with mineralogical significance were selected: Au (g/t), As (ppm), S (%), Al₂O₃ (%), CaO (%), K₂O (%), and MgO (%). These variables directly reflect the mineralogy of the deposit, such as the occurrence of sulphides and silicates associated with the gold content, and the type of host rock. The total recovery variable imputed in this study is important because it affects the viability of the mineral deposit, and it depends on the gold grade.

3. Results and Discussion

3.1. Geological Database

One of the elements necessary for an accurate SOM application is model information diversity. Diversity reflects the incorporation of measurement information characterizing relations across different elements, representing the analysed sample space, which must contain sufficient and necessary information for the imputation to be effective.

The chemical dataset was obtained from the drillhole assays and filtered to remove items that did not meet specifications (outliers), ensuring data quality. As mentioned, the dataset used in this study for input, training, testing, and validation consisted of 1397 gold grade intervals, with some missing data. Table 2 shows the summary statistics of the seven variables analysed.

The data presented in Figure 4 and Figure 5 show a positive correlation among arsenic, sulphur, and gold contents, which corroborates that, given its mineralogy and mineral associations, the gold in the deposit can be classified as sulphide refractory gold ore. From a flotation perspective, this makes it possible to evaluate the floatability of pyrite, arsenopyrite, and other sulphides, since each of these minerals carries a different proportion of associated gold.

There is a mineralogical association between gold, pyrite, and arsenopyrite (gold occurring in "solid solution" or as sub-microscopic particles). Above 1 g/t of gold, the relation becomes more dispersed due to the occurrence of predominantly sulphide clusters, as shown in Figure 5.

In previous studies carried out at the deposit by Costa [32], the mineralogical compositions determined by SEM-MLA showed a considerable presence of quartz and mica (81.5% of the total), followed by minor proportions of albite (5.2%), heavy minerals (goethite + ilmenite + rutile; 1.8%), chlorite (clinochlore; 2.4%), arsenopyrite (0.7%), pyrite (1.6%), galena and pyrrhotite (0.4%), and very rarely chalcopyrite and sphalerite. According to the same study, gold occurred mainly as fine-grained alloys containing 20% silver, and typical refractory gold was concentrated in pyrite and arsenopyrite and, in minor proportions, in silicates.

The total recovery data calculated from the gold grades also show a positive trend, and the gold versus arsenic relationship lends more precision to the SOM imputation. Figure 6 presents the scatterplot of arsenic content and gold grade.

High Al₂O₃ and K₂O contents indicate the occurrence of silicate and aluminosilicate minerals, mainly quartz, mica, and chlorite, and are accompanied by lower gold contents, reflecting the weak association between Au and silicates. Another important factor is the possible occurrence of gold inclusions in silicates (Figure 7, Figure 8 and Figure 9). Although the dataset exhibits higher magnesium contents, mineralogical investigations from other studies indicate that most of the magnesium occurs in carbonates in the form of ankerite and dolomite (Figure 10). Significant amounts of deleterious materials such as pyrrhotite [33] require emphasis on the identification and quantification of these components, such as cyanide and oxygen consumers, preg-robbers, and other minerals such as clay minerals.

As seen in Figure 8, Figure 9 and Figure 10, K₂O, MgO, and CaO do not show a clear trend with the gold grade, which is consistent with the low variation in potassium and magnesium contents, as indicated by the standard deviations in Table 2. This may be because the deposit consists of a single lithology (phyllite).

3.2. Self-Organizing Maps Analysis

After testing a series of training runs for the SOM analysis with a 20 × 20 grid, stable, repeatable results were achieved with the training parameters shown in Table 3. At the end of each SOM analysis, the mapping quality was also measured with the quantization error and the topographic error.

The quantization error (QE) was calculated as the average distance between each sample and its node vector, which reflects the map resolution; samples with a high quantization error represent outliers in the dataset. Likewise, the final topographic error (Te) was calculated as the proportion of all data vectors whose first and second BMUs are not adjacent on the map, which quantifies how well the map represents the topology of the input data. Table 3 shows the number of rows and columns calculated from the desired map size and the parameters calculated at the initialization of the SOM analysis for the five data deletion steps of 10%, 20%, 30%, 40%, and 50%, denoted E10%, E20%, E30%, E40%, and E50%, together with their quantization and topographic errors.
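Both quality measures can be sketched from a trained codebook. This is a minimal illustration assuming the QE is the mean Euclidean distance from each sample to its BMU and that two nodes are adjacent when their grid positions are at most one unit apart, as on a rectangular lattice; on the hexagonal, toroidal map used in the study the adjacency test would differ. Names and toy values are hypothetical.

```python
import numpy as np

def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its BMU."""
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def topographic_error(data, codebook, coords, max_grid_dist=1.0):
    """Share of samples whose first and second BMUs are not grid neighbours."""
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    best_two = np.argsort(d, axis=1)[:, :2]
    gap = np.linalg.norm(coords[best_two[:, 0]] - coords[best_two[:, 1]], axis=1)
    return float(np.mean(gap > max_grid_dist + 1e-9))

coords = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])   # a 1 x 3 map
ordered = np.array([[0.0], [1.0], [2.0]])                 # well-ordered codebook
twisted = np.array([[0.0], [2.0], [1.0]])                 # topology broken
data = np.array([[0.1], [1.9]])
print(quantization_error(data, ordered),
      topographic_error(data, ordered, coords),
      topographic_error(data, twisted, coords))
```

Both codebooks give the same small QE, but only the twisted one places a sample's two closest nodes far apart on the grid, which is exactly what the topographic error detects.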

The application of the SOM to these training data is carried out using a fixed number of neurons and topological relations. The selected grid shape is a toroid with hexagonal neurons, whose number is established based on the ratio between the eigenvalues of the training data.

Training of the map is conducted in two phases: first rough and then fine. The rough training involves 20 iterations using a Gaussian neighbourhood with an initial and final radius of 29 units and 8 units, and the fine training involves 400 iterations using a Gaussian neighbourhood with an initial and final radius of 8 units and 1 unit. Table 4 presents the SOM training parameters.
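The two-phase schedule can be sketched with the batch SOM algorithm, a common variant in which every node is replaced by a neighbourhood-weighted mean of the samples in each iteration. The radii and iteration counts are those reported above; the linear radius decay within each phase, the rectangular non-toroidal grid, the random-sample initialisation, and the toy data are assumptions, since the source does not describe SiroSOM's internals. Names are hypothetical.

```python
import numpy as np

def batch_som(data, rows, cols, phases, seed=0):
    """Batch SOM training with a Gaussian neighbourhood and phased radii.

    `phases` is a list of (initial_radius, final_radius, n_iterations); the
    radius is assumed to decrease linearly within each phase.
    """
    rng = np.random.default_rng(seed)
    r_idx, c_idx = np.divmod(np.arange(rows * cols), cols)
    coords = np.column_stack([r_idx, c_idx]).astype(float)
    grid_d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=2)
    # Initialise node vectors with randomly chosen samples.
    codebook = data[rng.choice(len(data), rows * cols)].astype(float)
    for r_start, r_end, n_iter in phases:
        for radius in np.linspace(r_start, r_end, n_iter):
            # Competitive step: BMU of every sample.
            d2 = np.sum((data[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
            bmus = np.argmin(d2, axis=1)
            # Cooperative step: each node becomes a neighbourhood-weighted mean.
            h = np.exp(-grid_d2[bmus] / (2.0 * radius ** 2))   # (samples, nodes)
            codebook = (h.T @ data) / h.sum(axis=0)[:, None]
    return codebook, coords

# Rough phase (radius 29 -> 8, 20 iterations), then fine phase (8 -> 1, 400).
phases = [(29.0, 8.0, 20), (8.0, 1.0, 400)]
toy = np.random.default_rng(1).random((300, 7))   # stand-in for normalised assays
codebook, coords = batch_som(toy, 20, 20, phases)
print(codebook.shape)
```

The large rough-phase radius lets the whole map unfold to the overall shape of the data, while the fine phase, with its shrinking neighbourhood, lets individual nodes specialise on local structure.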

The representation of the self-organizing maps in the form of a U-matrix (unified distance matrix) in terms of Euclidean distance is presented in Figure 11, demonstrating the distance between neighbouring nodes for all attributes (chemical analysis), where the colour-temperature scale represents the similarities between adjacent nodes in blue–green and the dissimilarities in orange–red. From Figure 11, it is possible to establish relationships of similarities between samples with a more homogeneous pattern, mainly seen in Figure 11b,e.

3.3. Correlation and Results Evaluation

To obtain new values for the total recovery, self-organizing maps were simulated in five stages of 10%–50% exclusion of the total recovery data and gold results, from which the accuracy of the imputed total recovery data was evaluated. To evaluate the results obtained, the summary statistic parameters of total recovery of the original data, as well as those obtained by the BMU generated by the SOM analysis, are shown in Table 5.

Plots comparing the observed and imputed values for the independent variables depict a high degree of correspondence, indicating that the SOM is a reasonable estimator. For example, the degree of correlation between the observed and imputed values ranged from R² = 0.93 for 10% exclusion to R² = 0.85 for 50% exclusion. The different exclusion scenarios are presented to compare the influence of the proportion of deleted data on their imputation.

The indices R² and RMSE make it possible to assess the quality of the model with good accuracy and promote the comparison of results. For the RMSE indicator, the lowest value denotes the best performance of the model, while for R², the higher its value, the better the fit between the analytical values and those that were predicted. Figure 12 and Figure 13 show the results of these two indices.

The RMSE and R² values show the correlation between the original and imputed total recovery for the SOM model in the tests. In general, the SOM model performed very well in imputing total recovery, with R² values of 93%, 90%, 89%, 86%, and 85% for 10%, 20%, 30%, 40%, and 50% exclusion, respectively. The medians of the original and imputed values were close, indicating that the median can be a good error reference since, unlike the mean, it is not biased by extreme values.

Looking at the coefficient of determination (R²), the share of the total variance not explained by the model increases considerably once 30% of the observed data are excluded, causing a considerable loss of adherence to the model. For RMSE, unlike R², the largest deviations occur above 40% exclusion, because RMSE gives greater weight to large deviations between the original and predicted data.

Based on historical data accumulated over more than 20 years of operation, the average recovery is 81% in the concentration plant and 97% in the hydrometallurgical process, resulting in an overall recovery of 88.9%. The concentration of points in the 80%–100% range of the imputation correlation graphs (Figure 14) reflects the higher frequency of recoveries in that range. The dispersion below 80% is strongly associated with poor recovery when there is an excess of gold associated with silicate minerals or contaminating minerals.

4. Conclusions

The principle of self-organizing maps (SOM) has been used extensively as an analytical and visualization tool in exploratory data analysis, with plenty of practical applications. This study presented an imputation of the total recovery variable under different percentages of intentionally missing data.

The relationships between the geochemical variables, gold, and total recovery can be very well visualized by using partly pre-processed data with the SOM model. The relationships revealed by the SOM model agreed with corresponding correlation analyses. The SOM model performed well in predicting total recovery mainly between 10%–30% of exclusion, where the predictions were accurate between 88% and 93%. Above 30%, the indices presented by the coefficient of determination (R²) and root mean squared error (RMSE) did not show good adherence to the model, with calculated indices of 0.86 and 0.85 for R². These indices, also shown by the RMSE (E_40%: 4.96 and E_50%: 5.89), translate into a greater variance between the original and predicted data.

The SOM model, which showed a good prediction performance, replaced the missing values and outliers, was not affected by missing values (mainly oxides), and processed incomplete datasets relatively well, leading to good predictions. Regarding the influence of the chemical parameters and the calculated variable used in this study, the SOM proved to be efficient when tested on samples from these sources. It should be taken into account that the result of the recovery variable may be vulnerable to possible errors in the proposed formula. In conclusion, the results indicated that the SOM model is a practical tool for imputation of the chemical composition of analytical data, besides its recognized ability of classification, integration, and interpretation of multivariate data. It is a tool that can be useful in the development of geometallurgical models, especially when working with a large volume of information and missing data.

Based on the high correlation between the original values measured by chemical analysis in the laboratory and those imputed by SOM, the effectiveness of SOM for imputing data with up to 50% of missing values was demonstrated.

Author Contributions

F.R.C.—methodology, investigation, writing—original draft, C.d.C.C.—conceptualization, writing—review & editing, validation. C.U.—conceptualization, methodology, resources, writing—review & editing, project administration, funding. All authors have read and agreed to the published version of the manuscript.

Funding

Infrastructure was provided by LCT and InTRA Laboratories. Scholarship from F.R.C. M.T. was provided by Coordination for the Improvement of Higher Education Personnel (CAPES).

Data Availability Statement

Data supporting the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We thank the technical team at the Laboratório de Caracterização Tecnológica at Escola Politécnica da USP for their analytical support; InTRA for software availability; FINEP (01.18.0041.00 and 01.18.0137.00) and FAPESP (2020/06754-0 and 2020/08476-8) for infrastructure; and CAPES for the scholarship awarded to F.R. Costa. The authors also thank the anonymous referee for reviewing the manuscript and providing valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Konadu, K.J.; Huddy, R.J.; Harrison, S.T.L.; Osseo-Asare, K.; Sasaki, K. Sequential pretreatment of double refractory gold ore (DRGO) with a thermophilic iron oxidizing archeaon and fungal crude enzymes. Miner. Eng. 2019, 138, 86–94.
2. Lorenzen, L. Some guidelines to the design of a diagnostic leaching experiment. Miner. Eng. 1995, 8, 247–256.
3. Soltani, F.; Darabi, H.; Badri, R.; Zamankhan, P. Improved recovery of a low-grade refractory gold ore using flotation–preoxidation–cyanidation methods. Int. J. Min. Sci. Technol. 2014, 24, 537–542.
4. Dunne, R. Flotation of gold and gold-bearing ores. In Gold Ore Processing: Project Development and Operations, 1st ed.; Adams, M.D., Ed.; Elsevier: Amsterdam, The Netherlands, 2016; pp. 315–338.
5. Asamoah, R.K.; Skinner, W.; Addai-Mensah, J. Enhancing gold recovery from refractory bio-oxidised gold concentrates through high intensity milling. Miner. Process. Extr. Metall. 2020, 129, 64–73.
6. Butcher, A.R. A practical guide to some aspects of mineralogy that affect flotation. In Flotation Plant Optimisation, 16th ed.; Greet, C.J., Ed.; Spectrum Series; Australasian Institute of Mining & Metallurgy: Perth, Australia, 2010; pp. 83–93.
7. Chryssoulis, S.L. Using Mineralogy to Optimize Gold Recovery by Flotation. JOM 2001, 53, 48–50.
8. Gu, Y. Automated scanning electron microscope based mineral liberation analysis: An introduction to JKMRC/FEI mineral liberation analyser. J. Min. Mater. Charact. Eng. 2003, 2, 33–41.
9. Goodall, W.R. Characterisation of mineralogy and gold deportment for complex tailings deposits using QEMSCAN. Miner. Eng. 2008, 21, 518–523.
10. Goodall, W.R.; Scales, P.J.; Butcher, A.R. The use of QEMSCAN and diagnostic leaching in the characterisation of visible gold in complex ores. Miner. Eng. 2005, 18, 877–886.
11. Coetzee, L.L.; Theron, S.J.; Martin, G.J.; Van Der Merwe, J.; Stanek, T.A. Modern gold deportments and its application to industry. Miner. Eng. 2011, 24, 565–575.
12. Maxwell, A.P.; Denby, B.; Pitts, W. The Application of Neural Networks to Size Analysis of Minerals on Conveyors. In Proceedings of the 25th International Symposium on the Application of Computers and Operations Research in the Minerals Industries (APCOM), Brisbane, Australia, 9–14 July 1995.
13. Petersen, K.; Lorenzen, L.; Amandale, D. The use of neural network analysis of diagnostic leaching data in gold liberation modeling. J. S. Afr. Inst. Min. Metall. 2003, 103, 113–118.
14. Bakarr, J.A.; Sasaki, K.; Yaguba, J.; Karim, B.A. Integrating artificial neural networks and geostatistics for optimum 3D geological block modeling in mineral reserve estimation: A case study. Int. J. Min. Sci. Technol. 2016, 26, 581–585.
15. Lishchuk, V.; Lund, C.; Ghorbani, Y. Evaluation and comparison of different machine-learning methods to integrate sparse process data into a spatial model in geometallurgy. Miner. Eng. 2019, 134, 156–165.
16. Oliver, S.; Willingham, D. Maximise orebody value through the automation of resource model development using machine learning. In Proceedings of the Third AusIMM International Geometallurgy Conference, Perth, Australia, 15–16 June 2016; pp. 295–301.
17. Lishchuk, V.; Koch, P.H.; Ghorbani, Y.; Butcher, A.R. Towards integrated geometallurgical approach: Critical review of current practices and future trends. Miner. Eng. 2020, 145, 2–16.
18. Lund, C.; Lamberg, P.; Linderberg, T. Practical way to quantify minerals from chemical assays at Malmberget iron ore operations. An important tool for the geometallurgical program. Miner. Eng. 2013, 49, 7–16.
19. Lamberg, P. Particles: the bridge between geology and metallurgy. In Proceedings of the Conference on Mineral Engineering, Luleå, Sweden, 7–9 February 2013; pp. 1–16.
20. Lane, G.R.; Martin, C.; Pirard, E. Techniques and applications for predictive metallurgy and ore characterization using optical image analysis. Miner. Eng. 2008, 21, 568–577.
21. Henley, K.J. Ore-dressing mineralogy: A review of techniques, applications and recent developments. Geol. Soc. S. Afr. 1983, 7, 175–200.
22. Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 1982, 43, 59–69.
23. Kohonen, T. The self-organizing map. Proc. IEEE 1990, 78, 1464–1480.
24. Kohonen, T. Self-Organizing Maps, 3rd ed.; Springer Series in Information Sciences; Springer: Berlin/Heidelberg, Germany, 2001.
25. Carneiro, C.C.; Silva Yanez, D.N.V.; Ulsen, C.; Fraser, J.S.; Antoniassi, J.L.; Paz, S.; Angélica, R.S.; Kahn, H. Imputation of Reactive Silica and Available Alumina in Bauxites by Self-Organizing Maps. In Proceedings of the 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization, Nancy, France, 28–30 June 2017; WSOM: Geneva, Switzerland, 2017.
26. Malek, M.A.; Harun, A.; Shamsuddin, S.M.; Mohamad, I. Imputation of time series data via Kohonen Self-Organizing Maps in the presence of missing data. Proc. World Academy Sci. Eng. Technol. 2008, 31, 502–507.
27. Fraser, S.J.; Dickson, B.L. A New Method for Data Integration and Integrated Data Interpretation: Self-Organising Maps. In Proceedings of the Exploration 07: Fifth Decennial International Conference on Mineral Exploration, Toronto, ON, Canada, 9–12 September 2007; Milkereit, B., Ed.; pp. 907–910.
28. Carneiro, C.C.; Fraser, S.J.; Crósta, A.P.; Silva, A.M.; Barros, C.E.D.M. Semiautomated geologic mapping using self-organizing maps and airborne geophysics in the Brazilian Amazon. Geophysics 2012, 77, 17–24.
29. Ultsch, A.; Siemon, H.P. Kohonen’s Self Organizing Feature Maps for Exploratory Data Analysis. In Proceedings of the International Neural Network Conference, Paris, France, 9–13 July 1990; pp. 305–308.
30. Friedel, M.J. Modeling hydrologic and geomorphologic responses across post-fire landscapes using a self-organizing map approach. Environ. Model. Softw. 2011, 26, 1660–1674.
31. Vesanto, J.; Alhoniemi, E. Clustering of the self organized map. IEEE Trans. Neural Netw. 2000, 11, 586–600.
32. Costa, F.R. Caracterização Tecnológica do Minério de Ouro da Mina Morro do Ouro, Paracatu, MG. Master’s Thesis, Escola Politécnica, Universidade de São Paulo, São Paulo, Brazil, 2016.
33. Deschênes, G.; Rousseau, M.; Tardif, J.; Prud’homme, P.J.H. Effect of the composition of some minerals on cyanidation and use of lead nitrate and oxygen to alleviate their impact. Hydrometallurgy 1998, 50, 205–222.

Figure 1. Drillhole scheme and analytical ranges. Gray and green hatching indicate similar gold grade ranges.

Figure 2. Gold recovery trend (total recovery) as a function of the gold content.

Figure 3. Schematic of SOM data mining attributes adapted from Friedel [30].

Figure 4. Scatterplot of arsenic versus gold grade.

Figure 5. Scatterplot of sulphur versus gold grade.

Figure 6. Scatterplot of arsenic content and total gold recovery.

Figure 7. Scatterplot of aluminium oxide versus gold grade.

Figure 8. Scatterplot of K₂O versus gold grade.

Figure 9. Scatterplot of MgO versus gold grade.

Figure 10. Scatterplot of CaO versus gold grade.

Figure 11. Representation of the U-matrix, with the exclusion of (a) 10%, (b) 20%, (c) 30%, (d) 40%, and (e) 50%.

Figure 12. RMSE for 10% to 50% sample exclusion.

Figure 13. Coefficient of determination for 10% to 50% sample exclusion.

Figure 14. Correlation between the original and imputed total gold recovery data from (a) 10%, (b) 20%, (c) 30%, (d) 40%, and (e) 50% of sample exclusion. R² is the coefficient of determination.

Table 1. Number of samples excluded for SOM imputation analysis.

Exclusion %	E_0%	E_10%	E_20%	E_30%	E_40%	E_50%	Variables
Number of samples	1397	140	279	419	559	699	7

E_%: percentage of excluded data.
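
The counts in Table 1 are consistent with taking 10% to 50% of the 1397 samples and rounding halves up (for example, 0.5 × 1397 = 698.5 gives 699). This rounding rule is inferred from the numbers, not stated by the authors. The short Python check below confirms it. Python's built-in round() rounds halves to the nearest even integer and would return 698 for the 50% case.

```python
import math

def excluded_count(n_total, fraction):
    """Samples to withhold for a given exclusion fraction, rounding halves up."""
    return math.floor(n_total * fraction + 0.5)

counts = [excluded_count(1397, p / 100) for p in (10, 20, 30, 40, 50)]
print(counts)  # [140, 279, 419, 559, 699]
```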

Table 2. Summary statistics of variables analysed.

Statistic	Au (g/t)	As (ppm)	S (%)	Al₂O₃ (%)	CaO (%)	K₂O (%)	MgO (%)
N	1397	1397	1390	1073	1174	1205	1168
Mean	0.825	2204	0.956	4.603	0.306	0.181	0.463
Median	0.771	2108	0.943	4.186	0.311	0.175	0.461
SD	0.488	1005	0.258	2.150	0.074	0.031	0.065
Maximum	1.99	6171	1.920	15.683	0.490	0.290	0.656
Minimum	0.001	44	0.352	1.480	0.100	0.100	0.230

N: number of samples; SD: standard deviation.

Table 3. Sample preparation and parameters for SOM input.

Exclusion	Rows	Columns	Te	Qe
E_10%	20	20	0.106	0.776
E_20%	20	20	0.092	0.769
E_30%	20	20	0.106	0.752
E_40%	20	20	0.109	0.739
E_50%	20	20	0.082	0.730

Topographic error (Te): the proportion of all data vectors for which the first and second best-matching units (BMUs) are not adjacent units. Quantization error (Qe): the average distance between each data vector and its BMU, a measure of map resolution.
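
The two quality measures in Table 3 can be computed directly from their definitions, as in the sketch below. It assumes Euclidean distance and treats two units as adjacent when they are 8-neighbours on a rectangular lattice (Chebyshev grid distance of 1). The lattice type and data normalization behind Table 3 are not stated here, so this sketch is not expected to reproduce those values. The toy map and data are hypothetical.

```python
import numpy as np

def _sorted_units(data, codebook):
    """Units ordered by Euclidean distance from each data vector, plus the distances."""
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return np.argsort(d, axis=1), d

def quantization_error(data, codebook):
    """Qe: mean Euclidean distance between each data vector and its BMU."""
    order, d = _sorted_units(data, codebook)
    return float(np.mean(d[np.arange(len(data)), order[:, 0]]))

def topographic_error(data, codebook, coords):
    """Te: share of data vectors whose first and second BMUs are not adjacent.

    Adjacency is assumed to be the 8-neighbourhood of a rectangular lattice
    (Chebyshev grid distance of 1); a hexagonal lattice needs a different test.
    """
    order, _ = _sorted_units(data, codebook)
    first, second = order[:, 0], order[:, 1]
    step = np.max(np.abs(coords[first] - coords[second]), axis=1)
    return float(np.mean(step > 1))

# Toy 1 x 3 map with one-dimensional prototypes (illustrative values only).
coords = np.array([[0, 0], [0, 1], [0, 2]], dtype=float)
codebook = np.array([[0.0], [1.0], [2.0]])
data = np.array([[0.1], [0.9], [1.6]])
print(quantization_error(data, codebook), topographic_error(data, codebook, coords))
```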

Table 4. SOM training parameters.

Rough Training			Fine Training
Ir1	Fr1	TL1	Ir2	Fr2	TL2
29	8	20	8	1	400

Ir: initial radius; Fr: final radius; TL: training length. Subscript 1 refers to rough training and subscript 2 to fine training.
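
Table 4 gives an initial radius, final radius, and training length for each of the two training phases. The sketch below assumes the radius decreases linearly within each phase, as in some batch-SOM implementations, and that a Gaussian neighbourhood is used. This section does not state the decay law, the neighbourhood function, or the unit of training length (for example, epochs), so all three are assumptions.

```python
import numpy as np

def radius_schedule(r_init, r_final, length):
    """Neighbourhood radius for each training step, decreasing linearly (assumed)."""
    return np.linspace(r_init, r_final, length)

def gaussian_weight(grid_distance, radius):
    """Gaussian neighbourhood weight of a unit at a given grid distance from the BMU."""
    return np.exp(-grid_distance ** 2 / (2.0 * radius ** 2))

rough = radius_schedule(29, 8, 20)     # Ir1, Fr1, TL1 (Table 4)
fine = radius_schedule(8, 1, 400)      # Ir2, Fr2, TL2 (Table 4)
schedule = np.concatenate([rough, fine])

# Weight at the far corner of a 20 x 20 map (grid distance sqrt(19² + 19²))
# at the start of rough training and at the end of fine training.
corner = np.hypot(19, 19)
print(gaussian_weight(corner, schedule[0]), gaussian_weight(corner, schedule[-1]))
```

On the 20 × 20 maps in Table 3, the largest grid distance is about 26.9 units. With an initial radius of 29, the far-corner weight is exp(−722/1682) ≈ 0.65, so rough training moves almost the whole map together. By the end of fine training, at radius 1, that weight is effectively zero and updates are local.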

Table 5. Statistical analyses of the total recovery variable (from 10% to 50% of sample exclusion).

Data	Exclusion (%)	Samples	Minimum	Maximum	Mean	Median	SD	R²	RMSE
Original	10	140	15.30	93.29	76.04	84.15	17.85	0.93	4.42
BMU	10	140	26.01	92.70	75.49	83.26	17.20	0.93	4.42
Original	20	279	10.36	93.56	73.54	81.74	18.94	0.90	4.79
BMU	20	279	27.02	92.63	74.00	81.33	17.76	0.90	4.79
Original	30	419	10.36	93.56	80.74	86.20	15.09	0.89	4.83
BMU	30	419	25.10	92.81	80.43	86.32	14.84	0.89	4.83
Original	40	559	9.48	93.40	78.92	85.06	15.62	0.86	4.96
BMU	40	559	30.75	92.70	78.99	85.22	14.31	0.86	4.96
Original	50	699	8.58	93.40	77.71	83.64	15.92	0.85	5.89
BMU	50	699	26.01	92.60	77.99	84.82	15.62	0.85	5.89

BMU: best matching unit; SD: standard deviation; R²: coefficient of determination; RMSE: root mean square error.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Costa, F.R.; Carneiro, C.d.C.; Ulsen, C. Imputation of Gold Recovery Data from Low Grade Gold Ore Using Artificial Neural Network. Minerals 2023, 13, 340. https://doi.org/10.3390/min13030340

AMA Style

Costa FR, Carneiro CdC, Ulsen C. Imputation of Gold Recovery Data from Low Grade Gold Ore Using Artificial Neural Network. Minerals. 2023; 13(3):340. https://doi.org/10.3390/min13030340

Chicago/Turabian Style

Costa, Fabrizzio Rodrigues, Cleyton de Carvalho Carneiro, and Carina Ulsen. 2023. "Imputation of Gold Recovery Data from Low Grade Gold Ore Using Artificial Neural Network" Minerals 13, no. 3: 340. https://doi.org/10.3390/min13030340

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
