ELM-Based AFL–SLFN Modeling and Multiscale Model-Modification Strategy for Online Prediction

Abstract: Online prediction of key parameters (e.g., process indices) is essential in many industrial processes because online measurement is not available. Data-based modeling is widely used for parameter prediction. However, model mismatch usually occurs owing to the variation of the feed properties, which changes the process dynamics. Current neural-network online prediction models usually use fixed activation functions, and it is not easy to modify them dynamically. Therefore, a few methods are proposed here. Firstly, an extreme learning machine (ELM)-based single-layer feedforward neural network with activation-function learning (AFL–SLFN) is proposed. The activation functions of the ELM are adjusted to enhance the ELM network structure and accuracy. Then, a hybrid model with adaptive weights is established by using the AFL–SLFN as a sub-model, which improves the prediction accuracy. To track the process dynamics and maintain the generalization ability of the model, a multiscale model-modification strategy is proposed. Here, small-, medium-, and large-scale modification is performed in accordance with the degree and the causes of the decrease in model accuracy. In the small-scale modification, an improved just-in-time local modeling method is used to update the parameters of the hybrid model. In the medium-scale modification, an improved elementary effect (EE)-based Morris pruning method is proposed for optimizing the sub-model structure. Remodeling is adopted in the large-scale modification. Finally, a simulation using industrial process data for tailings grade prediction in a flotation process reveals that the proposed method has better performance than some state-of-the-art methods. The proposed method can achieve rapid online training and allows optimization of the model parameters and structure for improving the model accuracy.


Introduction
Data-based modeling is becoming important in the field of engineering [1]. However, such models may become inaccurate or ineffective owing to process dynamics and disturbances, such as variations of the feed properties and the process conditions, and equipment aging [2,3]. In particular, for systems with high complexity, uncertainty, or stochastic characteristics, the mechanism may not be clear; thus, an accurate process model cannot be built. For example, in the grinding and flotation processes in mineral processing, the feed properties vary frequently and, over long-term production, substantially. Additionally, the process structure may be modified to adapt to the feed properties. Thus, the model usually becomes increasingly ineffective, because the generalization ability of data-based models is limited [4]. If the model cannot accurately reflect the behavior of the process, the model-mismatch problem occurs. Therefore, online model modification is important. Meanwhile, considering that model mismatch may have different causes, an effective modification strategy is also necessary.
Activation-function learning for a single-layer feedforward neural network (AFL-SLFN) was proposed by our research group. In this study, to further improve the model accuracy and make the modeling method more effective for online application in industrial processes, an adaptive weighted hybrid intelligent modeling method with a multiscale online modification strategy is proposed and validated using the flotation process.
This paper makes the following contributions:
(1) This paper combines ELM and AFL-SLFN. This allows the activation function of the ELM to change adaptively and makes it convenient to prune the network to a simpler structure. Because a single model generally does not perform well, a hybrid model with adaptive weights is established by using the AFL-SLFN as a sub-model, which improves the prediction accuracy.
(2) To track the process dynamics and maintain the generalization ability of the model, a multiscale model-modification strategy is proposed. That is, small-, medium-, and large-scale modification is performed in accordance with the degree and the causes of the decrease in model accuracy.
In the small-scale modification, the just-in-time learning model can quickly reflect the change of working conditions. To improve the prediction accuracy of the just-in-time learning model, both the spatial distance and the cosine of the angle between the input sample point and each historical sample point are considered when calculating their similarity, which improves the quality of the just-in-time dataset. In the medium-scale modification, the Morris method is improved by redefining the elementary effect (EE): the model input parameters are mapped to a new interval, and the scope of application of the method is therefore expanded. Simulation results obtained using industrial data from a flotation process are presented and analyzed.
The remainder of this paper is organized as follows: Section 2 describes the ELM and AFL-SLFN-based adaptive hybrid modeling method. In Section 3, the online modification strategy is proposed. Simulation results are presented and discussed in Section 4. Lastly, the conclusion and future work are presented in Section 5.

ELM and AFL-SLFN-Based Adaptive Hybrid Modeling Methodology
In industrial processes, particularly those where natural resources are used as raw materials, e.g., mineral processing, metallurgical processes, and petrochemical processes, the process performance is related to not only the process conditions but also the properties of the raw materials. The process becomes more complicated and varies with time owing to the variation of the feed and the process conditions; thus, the process models used for prediction, control, and optimization usually become worse over time. Hence, an online modification method must be adopted to enhance the adaptability of the model. Additionally, owing to the requirement of real-time production, online modification with fast learning is necessary. Therefore, in this study, a modeling method based on an ELM and an AFL-SLFN is proposed for the prediction or soft-sensing of key process parameters. However, owing to the complexity of the industrial process, a single network is not adequate. According to the measurement theory, the average value of multiple measurements can approach the true value. Thus, an AFL-SLFN model base is established, and the adaptive weighted average of the values of multiple networks is used as the final prediction value.

ELM-Based AFL-SLFN
In neural-network modeling, the activation functions are determined before the training of the network and are not subsequently changed. However, these activation functions do not have physical meaning. For some actual processes, the relationship between the inputs and outputs can be represented by a set of simple functions. Therefore, to improve the adaptive ability of the network and make it match the physical relationship between the inputs and outputs more closely, a new type of SLFN with learning for the activation function is proposed. The structure of the single-hidden-layer neural network with multiple inputs and one output is illustrated in Figure 1. In Figure 1, $x = [x_1, x_2, \cdots, x_i, \cdots, x_n]$ is the input of the neuron, and $y$ is the output of the neuron; $w$ and $b$ are randomly determined weight and bias parameters, which are not changed after they are initially determined.
$\phi_1, \phi_2, \cdots, \phi_m$ is a cluster of base functions, such as the trigonometric sines {sin(x), sin(2x), sin(3x), …} or polynomial functions {1, x, x^2, x^3, …}. We can select the activation function according to the characteristics of the data. The activation function and parameters β are not fixed; they are regulated in the training procedure to obtain optimal network performance. For a general description of the activation-function learning neural network, please refer to our previous work [34]. ELM is used as the learning algorithm. In this algorithm, the weight and bias parameters of the input to the hidden layer are randomly assigned, and only the weights of the output layer are regulated [35][36][37]. Thus, it has fast learning and high accuracy [38][39][40], making it suitable for online learning. For details regarding the extreme learning algorithm, please refer to References [35,36].
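The training scheme described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`train_afl_slfn`, `hidden_output`, `predict`) are hypothetical, the base-function pool is the five-function set quoted later in the case study, and each hidden neuron is simply assigned a base function at random. The activation-function learning procedure of Reference [34] is not reproduced. Only the output weights β are fitted, by least squares through the pseudoinverse, as in a standard ELM.

```python
import numpy as np

# Assumed base-function pool (the set quoted in the case study section).
BASE_FUNCTIONS = [
    lambda z: 1.0 / (1.0 + np.exp(-z)),     # sigmoid
    np.sin,                                 # sin x
    lambda z: 0.5 * np.exp(-z**2 / 10.0),   # Gaussian-type function
    lambda z: np.exp(-z),                   # exponential
    lambda z: np.cos(2.0 * z),              # cos 2x
]


def hidden_output(X, W, b, act_idx):
    """Apply each neuron's own base function to its pre-activation."""
    Z = X @ W + b
    H = np.empty_like(Z)
    for j, idx in enumerate(act_idx):
        H[:, j] = BASE_FUNCTIONS[idx](Z[:, j])
    return H


def train_afl_slfn(X, y, n_hidden=25, seed=0):
    """ELM-style training: random fixed (W, b), least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # never retrained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # never retrained
    # Simplification: one base function per neuron, drawn at random.
    act_idx = rng.integers(0, len(BASE_FUNCTIONS), size=n_hidden)
    H = hidden_output(X, W, b, act_idx)
    beta = np.linalg.pinv(H) @ y  # minimum-norm least-squares solution
    return W, b, act_idx, beta


def predict(model, X):
    W, b, act_idx, beta = model
    return hidden_output(X, W, b, act_idx) @ beta
```

Because no iterative optimization is involved, retraining amounts to one pseudoinverse computation, which is what makes this family of models attractive for online use.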

Adaptive Hybrid Model Based on Multiple AFL-SLFNs
In application, the simultaneous training of multiple SLFNs online is time-consuming. Thus, to reduce the computation time, an SLFN model base is constructed using the historical data of the process. Then, when a new sample is obtained, multiple SLFNs are activated and combined to obtain the prediction results.
We previously developed a hybrid model based on multiple excellent ELM models, which combines the advantages of each sub-model [33]. However, the weights and activation functions of the different sub-models are the same. Considering the differences in performance among the sub-models, an adaptive weighted hybrid model based on the AFL-SLFN is proposed, in which the weights are calculated from the prediction errors.
The structure of the hybrid model based on multiple SLFNs is shown in Figure 2. In Figure 2, there are R sub-models for hybrid modeling, and the prediction value after data fusion, $\hat{y}_{new}$, is
$$\hat{y}_{new} = \sum_{r=1}^{R} p_r \hat{y}^{r}_{new}$$
where $p_r$ is the weight of the r-th model and $\hat{y}^{r}_{new}$ is the prediction of the r-th model.

The same weight is not optimal for all the SLFNs. Thus, an adaptive weight is assigned to each SLFN.
The mean and variance equations of the single model are shown in Equations (7) and (8), respectively. Then, the variance of the hybrid model is calculated using Equation (9).
where $\hat{y}^{r}_{j,new}$ is the prediction data at the j-th moment from the r-th SLFN. The measured output value for the new sample is denoted as $y_{new}$. The sample interval is selected by sliding the window to maintain the number of samples as $n_s$.
Then, $\sigma^2$ is obtained as a second-order function $f(p_r)$ of the model weight $p_r$. We can determine the optimal weight $p_r$ for obtaining the minimum $\sigma^2_{\min}$. This yields an optimization problem with the objective function $f(p_r)$. It is easy to mathematically prove that the model with optimal weights for the different sub-models is more accurate than the model with the average weight for the sub-models. Additionally, this was demonstrated by the examples in Reference [41]. The mathematical proof is omitted here.
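One common way to solve this kind of weight optimization is inverse-variance weighting, sketched below. The sketch rests on assumptions that are not confirmed by the text: the sub-model errors are treated as uncorrelated, the weights are constrained to sum to one, and the hybrid variance is taken as the weighted sum $\sum_r p_r^2 \sigma_r^2$. Under those assumptions the minimizer is $p_r = (1/\sigma_r^2) / \sum_s (1/\sigma_s^2)$. The function names `adaptive_weights` and `fuse` are hypothetical.

```python
import numpy as np


def adaptive_weights(window_preds, window_targets):
    """Inverse-variance weights from errors inside the sliding window.

    window_preds:   array of shape (R, n_s), one row per sub-model
    window_targets: array of shape (n_s,), measured outputs
    """
    errors = np.asarray(window_preds, float) - np.asarray(window_targets, float)
    variances = np.var(errors, axis=1)        # error variance of each sub-model
    inverse = 1.0 / (variances + 1e-12)       # small constant avoids division by zero
    return inverse / inverse.sum()            # weights sum to one


def fuse(weights, new_preds):
    """Weighted combination of the sub-model predictions for a new sample."""
    return float(np.dot(weights, new_preds))
```

A sub-model whose recent errors scatter less receives a larger weight, so the fused prediction leans on whichever sub-models currently track the process best.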
To estimate the performance of the hybrid model, the root-mean-square error (RMSE), mean relative error (MRE), correlation coefficient ($R^2$), and average runtime (Time) are used as evaluation indicators. The RMSE, MRE, and $R^2$ are frequently used statistical indicators. Time indicates the computation time of the modification procedure. The closer the RMSE, MRE, and Time are to 0, the better the performance; the closer $R^2$ is to 1, the better the regression.
where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, and $N$ is the number of samples.
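For reference, the three statistical indicators can be computed as in the sketch below. The definitions used are the usual textbook ones and are assumptions with respect to this paper: the RMSE as the square root of the mean squared error, the MRE as the mean of $|y_i - \hat{y}_i| / |y_i|$, and $R^2$ as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return (RMSE, MRE, R2) under the assumed textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmse = np.sqrt(np.mean(residual**2))
    mre = np.mean(np.abs(residual) / np.abs(y_true))   # requires nonzero y_true
    ss_res = np.sum(residual**2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(rmse), float(mre), float(r2)
```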

Online Modification Method of Hybrid Process Model
In Section 2, an adaptive hybrid model was established. Because the mechanism and dynamics of industrial processes are usually complex and time-varying, when the feed properties and operating conditions change, the hybrid model may not adapt to the new samples. Thus, an online model-modification strategy, which includes small-scale modification, medium-scale modification, and large-scale modification, is designed to update the model parameters or model structure or rebuild the model, respectively. In a period of production time, the distribution of prediction errors is statistically analyzed, and the corresponding modification strategies are made for the hybrid model.
The absolute value of the error, relative error, or RMSE can be used to evaluate the model for updating. The absolute value of the error is calculated as
$$\varepsilon_i = |p_i - \hat{p}_i|$$
where $p_i$ represents the measured output of the i-th sample, and $\hat{p}_i$ represents the corresponding predicted value. The accuracy of the hybrid model is evaluated online via simulation. The sample interval is selected by sliding the window to maintain the number of samples as $n_s$, and the absolute value of the error is statistically analyzed. If the prediction error of the hybrid model is below a threshold determined by technicians according to the benefit of the process, the model is considered to be accurate. Otherwise, the reason why the model is inaccurate is analyzed, and the model is modified at a different scale according to the variation range of the model error. The modified model is monitored for a time period to ensure that the modification is effective. If it is not, the modification process continues. The more detailed modification procedure, as shown in Figure 3, is described below.
The prediction error for a single sample is denoted as $\varepsilon$, and the thresholds of the prediction error are $\varepsilon_0$, $\varepsilon_1$, and $\varepsilon_2$, with $0 < \varepsilon_0 < \varepsilon_1 < \varepsilon_2$. In a time period, the numbers of errors in $[0, \varepsilon_0]$, $[\varepsilon_0, \varepsilon_1]$, $[\varepsilon_1, \varepsilon_2]$, and $[\varepsilon_2, +\infty)$ are denoted as $n_0$, $n_1$, $n_2$, and $n_3$, respectively. The probability of the error in the four ranges is calculated in Equation (18):
$$pr_i = \frac{n_i}{n_0 + n_1 + n_2 + n_3}, \quad i = 0, 1, 2, 3 \qquad (18)$$
Suppose that the four thresholds for the probability are $P_i$ ($i = 0, 1, 2, 3$). Then, the modification is performed according to the relationship between $pr_i$ and $P_i$.
If $pr_0 \geq P_0$, the model is considered to be accurate, and no modification is needed. If $pr_1 \geq P_1$, the error of the prediction model is slightly too large; thus, the requirement of the process is not satisfied. Usually, this is not caused by a large variation of the feed properties. Thus, only the parameters of the sub-models are modified. An improved K-neighbor just-in-time learning algorithm is proposed to retrain the SLFN models for improving the accuracy, which is called "small-scale modification". Details of the method can be found in Section 3.1.
If $pr_2 \geq P_2$, the error of the prediction model is large. Then, structure modification is performed on the SLFN sub-model by using the structure pruning method, and the model is updated. This is called "medium-scale modification". Details of the method can be found in Section 3.2.
If $pr_3 \geq P_3$, the model error is very large, and the variations of the feed properties and the working conditions are considered at the same time. Firstly, all the hybrid SLFN models are modified by updating both the model parameters and the model structure. If the model is still not accurate, it is not applicable to the current working conditions, and remodeling is considered. This is called "large-scale modification".
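The decision rule above can be sketched as a small function. Several details are assumptions made for illustration: the function name `modification_scale` is hypothetical, errors that fall exactly on a threshold are assigned to the upper range, the conditions are checked from the most severe to the least severe when more than one holds, and "none" is returned when no condition is met. The default thresholds are the values reported later in the case study.

```python
import numpy as np


def modification_scale(abs_errors, eps=(0.08, 0.15, 0.25), P=(0.6, 0.3, 0.3, 0.6)):
    """Classify a window of absolute errors into a modification scale."""
    errors = np.asarray(abs_errors, dtype=float)
    # Range index 0..3 for each error; thresholds act as the range edges.
    ranges = np.digitize(errors, eps)
    counts = np.bincount(ranges, minlength=4)   # n_0, n_1, n_2, n_3
    pr = counts / counts.sum()                  # error probabilities per range
    if pr[0] >= P[0]:
        return "none", pr       # model considered accurate
    if pr[3] >= P[3]:
        return "large", pr      # remodeling is considered
    if pr[2] >= P[2]:
        return "medium", pr     # structure pruning
    if pr[1] >= P[1]:
        return "small", pr      # just-in-time retraining
    return "none", pr           # assumption: no trigger, keep the model
```

With error counts of four, four, one, and zero in a window of nine samples, this returns the small-scale decision, since 4/9 is below 0.6 for the first range and above 0.3 for the second.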

Improved K-Neighbor Just-In-Time Learning for Small-Scale Modification
In this section, a small-scale modification method using just-in-time modeling is described. Here, a method is proposed to select a better modeling dataset for the just-in-time method, and the original model is then retrained to update the model parameters. When the just-in-time learning method is used for online modeling, firstly, the samples most similar to the current sample are selected, and then the selected samples are employed to construct a new model or retrain the existing model. The current input sample is denoted as x q (1 × n), where n is the number of input variables. This is also called a query vector.
The historical input samples are denoted as $X$ ($N \times n$), with rows $x_i$, $i = 1, 2, \cdots, N$, where $x_i$ represents the i-th historical input sample with n inputs, which is also called the response vector, and N represents the number of samples. Then, the Euclidean distance $d_{qi}$ and the cosine of the vectorial angle $\cos\theta_{qi}$ of $x_q$ and $x_i$ are given as follows:
$$d_{qi} = \|x_q - x_i\|_2 = \sqrt{\sum_{j=1}^{n} (x_{qj} - x_{ij})^2}$$
$$\cos\theta_{qi} = \frac{x_q x_i^{T}}{\|x_q\|_2 \, \|x_i\|_2}$$
where $x_{ij}$ is the j-th variable of sample $x_i$. Then, the similarity between the query vector $x_q$ and the response vector $x_i$ is evaluated to select samples for just-in-time modeling. Usually, the similarity is calculated from $d_{qi}$ and $\cos\theta_{qi}$ using Equation (23) [12,13], in which $\lambda \in [0, 1]$ is an unknown parameter that is usually determined by experience. According to many studies [12][13][14][15][16][17][18][19][20], the samples used for just-in-time learning significantly affect the performance of the model. To improve the quality of the dataset selected for just-in-time modeling, a new similarity-evaluation equation is proposed below.
Here, $c_{qi}$ represents the similarity of the samples $x_q$ and $x_i$. When $c_{qi} < 0$, the angle between the two samples is large, the similarity is small, and $x_i$ is not suitable for just-in-time modeling. When $c_{qi} > 0$, the similarity is larger. If $c_{qi}$ is close to 1, the sample $x_i$ is selected for just-in-time modeling.
The first k samples in descending order of similarity are selected for just-in-time modeling. Furthermore, the modeling samples can be optimized in a principal component analysis (PCA) model. From the viewpoint of the correlation among variables, the Q statistic is used as a distance measure, and $T^2$ is used to ensure that the samples lie in a local region. Then, the two indices are fused to obtain a novel index, and the value of k at which this index is minimal is the optimal size of the local sample set [15]. In this study, k = 8.
Then, the SLFN models in the model base are retrained using the ELM and the method described in Section 2.2.
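The sample-selection step can be sketched as follows. Note that this sketch uses the conventional just-in-time similarity, $\lambda\sqrt{e^{-d_{qi}^2}} + (1-\lambda)\cos\theta_{qi}$, which is a widely used form and an assumption here; it is not the improved similarity index proposed in this section. The function name `select_jit_samples` and the default λ = 0.5 are hypothetical.

```python
import numpy as np


def select_jit_samples(x_q, X_hist, k=8, lam=0.5):
    """Return indices and similarities of the k most similar historical samples."""
    x_q = np.asarray(x_q, dtype=float)
    X_hist = np.asarray(X_hist, dtype=float)
    # Euclidean distance between the query and every historical sample.
    d = np.linalg.norm(X_hist - x_q, axis=1)
    # Cosine of the angle between the query and every historical sample.
    norms = np.linalg.norm(X_hist, axis=1) * np.linalg.norm(x_q)
    cos_theta = (X_hist @ x_q) / (norms + 1e-12)
    # Conventional weighted similarity (assumed form, see lead-in).
    similarity = lam * np.sqrt(np.exp(-d**2)) + (1.0 - lam) * cos_theta
    order = np.argsort(-similarity)[:k]   # descending similarity
    return order, similarity[order]
```

The selected rows of the historical data, together with their measured outputs, then form the local dataset on which the sub-models are retrained.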

Morris-Based SLFN Structure Pruning for Medium-Scale Modification
For medium-scale modification, the structure of the SLFNs is regulated. Based on the influence of neurons on the output of the model, the pruning is carried out to optimize the network structure. An improved EE is used to evaluate the contribution of a node to the output of the network. If the EE is below a threshold, the node is deleted. When determining the influence of the input layer on the model output, the input variable refers to the input feature (the input of the model). When determining the influence of the hidden layer on the output, the input variable refers to the output of the hidden layer neuron.
The EE was defined by Morris in 1991 [21]. In the standard EE-based Morris pruning method, the values of the input variables are mapped to the range [0, 1], and the space of the input variables is a hypercube of dimension k. If the output y is differentiable, $\partial_i(x) = \partial y / \partial x_i$ can be used as an index to evaluate the influence of the input variable $x_i$ on the output y. Then, $\partial_i(x)$ may be equal to 0 or a nonzero constant for all input vectors x. It may also be a non-constant function of $x_i$ or of one or more $x_j$ ($j \neq i$). These situations correspond to four cases in which the influence of $x_i$ on y is negligible, is linear and additive, is nonlinear, or involves interactions with other variables, respectively.
All the input values lie in the range [0, 1]. The input values are discretized, and the input variable $x_i$ is set equal to one of the values in $\{0, 1/(p-1), 2/(p-1), \cdots, 1\}$, where p is an even number determined by experience.
The EE of $x_i$ is defined as
$$d_i(x) = \frac{y(x_1, \cdots, x_{i-1}, x_i + \Delta, x_{i+1}, \cdots, x_k) - y(x)}{\Delta} \qquad (26)$$
where $\Delta$ is a predetermined multiple of $1/(p-1)$. That is, one EE value is obtained by running the model twice. The first time, the values of the input variables $x_j$ ($j = 1, 2, 3, \cdots, k$) can be randomly selected, and, the second time, the input value $x_i$ should have an increment of $\Delta$. Then, the significance of the input to the output can be determined by running the model a number of times proportional to $k$ or $k^2$. For each input, $p^{k-1}[p - \Delta(p-1)]$ EE values can be obtained. In the Morris method, it is assumed that the "basic factor (EE)" obeys a certain distribution $F_i$. The mean quantifies the individual effect of the input on the output, whereas the standard deviation estimates the combined effects of the input due to nonlinearities or interactions with other inputs. These sensitivity measures can be used to rank the inputs according to their relative importance and to determine non-influential parameters that may be fixed in subsequent model calibration.
In the standard EE-based Morris pruning method, the input values of a variable must be mapped to the range [0, 1], and the distribution of the input value is usually uniform [42,43]. However, in practice, the value of a variable may vary in a large range and be distributed nonuniformly. Mapping all the values in a large range to the range [0, 1] may result in an unreasonable sample density. Additionally, different input variables may have different types of distributions. In such cases, the standard EE-based Morris pruning method is not suitable. Thus, herein, an improved EE is proposed.
The input variables are denoted as x = [x 1 , x 2 , · · · , x k ], and the output variable is denoted as y, where k is the number of input variables. Suppose that there is only one output. The range of the i-th input variable x i is denoted as [a i , b i ], and different variables have different types of distributions.
Let $ab_i = b_i - a_i$ denote the width of the range of $x_i$. Then,
$$ab_{\min} = \min(ab_i), \quad i = 1, 2, 3, \cdots, k,$$
$$0 < \Delta < ab_{\min}/p.$$
To minimize the number of model evaluations required to compute the sensitivity measures, Morris designed a random orientation matrix
$$B^* = \left(J_{m,1} X^* + (\Delta/2)\left[(2B - J_{m,k}) D^* + J_{m,k}\right]\right) P^*,$$
where $B^*$ is constructed using $B$, $B$ is an $m \times k$ strictly lower triangular matrix of ones, $J_{m,k}$ is an $m \times k$ matrix of ones, $D^*$ is a $k \times k$ diagonal matrix with elements chosen randomly from the set $\{-1, 1\}$, and $P^*$ is a $k \times k$ matrix constructed by randomly permuting the columns of a $k \times k$ identity matrix. For more information on the sampling design, please refer to References [21,42].
The acquisition of the above matrix is a randomization process. Consider two rows of $B$ that differ only in their i-th elements ($i = 1, 2, \ldots, k$),
$$B(i) = \begin{bmatrix} x_1 & x_2 & \cdots & x_{i-1} & x_{i,1} & x_{i+1} & \cdots & x_k \\ x_1 & x_2 & \cdots & x_{i-1} & x_{i,2} & x_{i+1} & \cdots & x_k \end{bmatrix},$$
and the result of only the first two stages of the randomization process on these rows; then, $J_{2,1} X^* + (\Delta/2)[(2B(i) - J_{2,k}) D^* + J_{2,k}]$ can be obtained. In this study, for the input variable $x_i$, let $l_i = ab_i/(p-1)$. Then, the probability of each discretized value of $x_i$ is calculated, and each has equal probability. Then, for an input vector x, the improved EE of the i-th input is defined by Equation (26), but the range of $x_i$ is different: $x_i$ takes values in $[a_i, b_i]$ rather than in [0, 1]. Therefore, its scope of application is expanded.
The distribution of the EE of the input $x_i$ is denoted as $F_i$, that is, $d_i(x) \sim F_i$. The number of EE values is $2^{k-1} p^{k}$, with the distribution $F_i$. Then, by analyzing $F_i$ in the same way as the standard EE method does, the significance of the input variable is determined. The number of times that the model must be run for obtaining the EE values should be designed economically to limit the computation time. For more detail, please refer to Reference [21].
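The idea of computing elementary effects on the original input ranges can be sketched as follows. This is a simplified illustration under stated assumptions, not the trajectory design based on the matrix $B^*$: each repetition draws a random base point on the grid of $[a_i, b_i]$ and perturbs one input at a time. The function name `elementary_effects`, the choice $\Delta = 0.5\, ab_{\min}/p$ (which satisfies $0 < \Delta < ab_{\min}/p$), and the number of repetitions are all hypothetical.

```python
import numpy as np


def elementary_effects(model, a, b, p=4, n_repeats=20, seed=0):
    """Mean and standard deviation of the EEs of each input on [a_i, b_i]."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    k = a.size
    step = (b - a) / (p - 1)              # grid step l_i of each input
    delta = 0.5 * np.min(b - a) / p       # common increment, below ab_min / p
    ee = np.empty((n_repeats, k))
    for t in range(n_repeats):
        # Base point on the grid; the top level is excluded so x_i + delta <= b_i.
        levels = rng.integers(0, p - 1, size=k)
        x = a + levels * step
        y0 = model(x)
        for i in range(k):
            x_shift = x.copy()
            x_shift[i] += delta           # perturb one input at a time
            ee[t, i] = (model(x_shift) - y0) / delta
    return ee.mean(axis=0), ee.std(axis=0)
```

When the inputs are hidden-neuron outputs rather than input features, the same routine applies with `model` mapping the hidden-layer outputs to the network output; a neuron whose EE mean and standard deviation are both near zero is a candidate for pruning.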
According to the modification strategy, when the accuracy of the hybrid model worsens and $pr_2 \geq P_2$, the improved EE-based Morris pruning method is activated to improve the model. The procedure for SLFN structure optimization using the pruning method is as follows:
Step 1: Using Equation (12), determine the weights of all the sub-models. Among the sub-models that have not yet been modified, the one with the largest weight is selected for modification.
Step 2: Input the sampling matrix into the sub-model with the largest input weight. Then, calculate the EE value of each hidden-layer neuron and its statistical analysis value.
Step 3: Calculate the mean and standard deviation of the neuronal EE values and optimize the neural-network structure by pruning.
Step 4: Retrain the network model. Then, use the test data to test this model, and calculate the model error again.
Step 5: If the model is still in the medium-scale modification region and there are unpruned sub-models, return to Step 1.

Large-Scale Modification
When the error of the model is too large and $pr_3 \geq P_3$, more process data are collected, and all the models are rebuilt from the first step.

Case Study: Online Simulation Using Industrial Data
In this section, a case study is presented, in which the foregoing modeling method and model-modification strategy are employed for modeling and prediction of the tailings grade in mineral processing. In mineral processing, the particle size in the grinding-classification process, recovery rate, concentrate grade, and tailings grade are key indices. However, in many actual processes, these indices are measured offline, causing delays for process control. Additionally, owing to the complexity of the process and the frequent changes of the feed property, prediction models for the indices using neural networks and other methods cannot adapt to the variation of the process. To solve these problems, online simulation using the proposed AFL-SLFN modeling method and the multiscale modification strategy is used to predict the tailings grade, which can improve the computation and model updating speed and the precision of the model.

Preprocessing of the Dataset
The froth features, process conditions, ore compositions and grade, concentrate grade, and tailings grade were collected in a bauxite beneficiation plant. Then, Pearson correlation analysis and significance test analysis were performed to reduce data redundancy. The variables with a coefficient larger than 0.25, as shown in Table 1, were used as inputs to predict the tailings grade. That is, 12 input features were selected out of a total number of 26 features. The grade was analyzed every 8 h, and the froth features were obtained online by analyzing froth videos and photographs. According to the flowchart of the process, it takes approximately 10 min for the ore to pass from the first scanning cell to the tailings. Therefore, the froth features between 10 and 20 min before the sampling time of the tailings were averaged and used for prediction of the tailings grade. Finally, 450 groups of data were obtained. Thus, the data covered 150 days of production. Among these data, 360 groups were used as training sets, and the remaining 90 groups were used as test sets.
There are some missing values in the original industrial data due to human causes and mechanical failures. Therefore, an associated K-nearest neighbor (KNN) method was used to impute the missing data. Firstly, the candidate input features related to the input feature containing each missing value were determined by correlation analysis, and all values of the input features were normalized to be dimensionless. Then, the K samples nearest to the sample with the missing value were determined according to the Euclidean distance computed over the candidate input features. The K values were then weighted by distance to estimate the missing data.
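The imputation step can be sketched as below. The sketch assumes that the candidate features have already been chosen by the correlation analysis and are fully observed for the row being imputed, that min-max scaling is used for normalization, and that the neighbors are weighted by inverse distance. These choices and the function name `knn_impute` are illustrative assumptions, since the text does not specify them.

```python
import numpy as np


def knn_impute(X, row, col, candidate_cols, k=3):
    """Estimate the missing entry X[row, col] from its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    cand = X[:, candidate_cols]
    # Min-max normalization so that the distances are dimensionless.
    lo = np.nanmin(cand, axis=0)
    hi = np.nanmax(cand, axis=0)
    Z = (cand - lo) / np.where(hi > lo, hi - lo, 1.0)
    # Donor rows: target value observed and candidate features complete.
    donors = np.array([r for r in range(X.shape[0])
                       if r != row
                       and not np.isnan(X[r, col])
                       and not np.isnan(Z[r]).any()])
    dist = np.linalg.norm(Z[donors] - Z[row], axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] + 1e-12)   # closer donors count more
    return float(np.sum(weights * X[donors[nearest], col]) / np.sum(weights))
```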

Small-Scale Modification of Prediction Model for Tailings Grade
R represents the number of sub-models, and H represents the number of hidden nodes. Grid searches of R on {2, 3, . . . , 9, 12} and of H on {5, 8, 11, 14, . . . , 50} were performed to identify the optimal values. To select the activation functions, we first set up a base-function pool comprising trigonometric, logarithmic, polynomial, exponential, Gaussian, and mixed function clusters. Each activation function was a base function, and different base functions were used for different activation functions. Then, the base functions were adjusted according to the learning accuracy to obtain an optimal combination of the base functions as activation functions. Each model had different activation functions that were randomly selected from $\{1/(1 + e^{-x}), \sin x, 0.5e^{-x^2/10}, e^{-x}, \cos 2x\}$.
The effects of the parameters R and H on the results are presented in Table 2. To test the stability of the models, we conducted these experiments 20 times. The results shown in Table 2 are the stable ones.
With an increase in the number of hidden-layer neurons and sub-models, the model runtime increased only slightly, but the accuracy of the model did not always improve; thus, the parameters R and H were set as 6 and 25, respectively, which were the optimal values. Using the training data, six SLFN sub-models were obtained by conducting the training six times. Each SLFN model had 25 hidden neurons, as determined by the tests above. The prediction results for the test dataset are shown in Figure 4. Here, the prediction results represent the adaptive weighting of the six sub-models. The error of the model was calculated using the measured and predicted values, as shown in Figure 5. There were many points with large errors. Therefore, online modification was necessary.
For model modification, the number of samples in the sliding window $n_s$ was set as nine, corresponding to three days of production. According to the requirements of the actual production process and the experience of the operators and the technicians, the error thresholds $\varepsilon_0$, $\varepsilon_1$, and $\varepsilon_2$ were set as 0.08, 0.15, and 0.25, respectively, and the model prediction error probability thresholds $P_0$, $P_1$, $P_2$, and $P_3$ were set as 0.6, 0.3, 0.3, and 0.6, respectively. Here, 0.08 and 0.15 were values accepted by the technicians in the plant, and the other values were set according to experience.
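The sliding-window check can be illustrated with the short sketch below. The region boundaries are the error thresholds given above; the decision rule (testing the most severe region first against $P_3$, $P_2$, and $P_1$) is an assumption made for illustration, since the exact rule is defined elsewhere in the paper, and the role of $P_0$ is left out. The error values are invented so that the region counts are four, four, one, and zero.

```python
def region_probabilities(errors, edges=(0.08, 0.15, 0.25)):
    """Share of absolute errors in [0, e0), [e0, e1), [e1, e2), and [e2, inf)."""
    counts = [0] * (len(edges) + 1)
    for e in errors:
        idx = sum(abs(e) >= edge for edge in edges)  # region containing |e|
        counts[idx] += 1
    return [c / len(errors) for c in counts]

def modification_scale(probs, p1=0.3, p2=0.3, p3=0.6):
    """Assumed rule: check the most severe error region first."""
    if probs[3] >= p3:
        return "large"
    if probs[2] >= p2:
        return "medium"
    if probs[1] >= p1:
        return "small"
    return "none"

# Nine invented errors with counts 4, 4, 1, 0 in the four regions.
window = [0.01, 0.05, 0.03, 0.07, 0.09, 0.10, 0.12, 0.14, 0.20]
probs = region_probabilities(window)
scale = modification_scale(probs)
```

Under this assumed rule, a window with four of nine errors in [0.08, 0.15) exceeds the 0.3 threshold for that region and triggers a small-scale modification.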
Then, the distribution probability of the model error in different regions was calculated. For the first nine samples, the number of errors in the four regions was four, four, one, and zero, as shown in Table 3. Therefore, for the 10th sample, small-scale modification was needed. The improved just-in-time local modeling method was then used to update the model. In this section, only small-scale modification is considered; medium-scale modification is discussed in the next section. The online modification procedure was as follows:
Step 1: According to the k-nearest-neighbor principle, the eight samples closest to the query sample were selected from the historical data and used for retraining to obtain six new SLFN sub-models, each having five hidden nodes.
Step 2: The new models were then used to predict the tailings grade of the query sample. The six old models were kept in the model base.
Step 3: The sliding window was moved ahead by one sample, and the prediction errors of the samples in the new window were calculated.
Step 4: According to the distribution of the prediction errors, if modification was needed, the foregoing procedure was repeated to construct new models using the just-in-time method; otherwise, the six old models were used for prediction.
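Steps 1 and 2 can be sketched as follows. The sketch only shows the neighbor selection and the local retraining hook; `train` is a hypothetical callable standing in for the retraining of the six SLFN sub-models, and a simple local-mean learner is used here so that the example stays self-contained. The historical data are invented.

```python
import math

def nearest_samples(query, history_X, history_y, k=8):
    """Return the k historical samples closest to the query (Euclidean distance)."""
    order = sorted(range(len(history_X)),
                   key=lambda i: math.dist(query, history_X[i]))[:k]
    return [history_X[i] for i in order], [history_y[i] for i in order]

def jit_predict(query, history_X, history_y, train, k=8):
    """Train a local model on the k nearest samples and predict the query.

    `train` is a hypothetical callable: train(X, y) -> predict function.
    """
    local_X, local_y = nearest_samples(query, history_X, history_y, k)
    predict = train(local_X, local_y)
    return predict(query)

def mean_learner(X, y):
    """Stand-in learner: the local mean (a real run would retrain the sub-models)."""
    avg = sum(y) / len(y)
    return lambda q: avg

# Invented historical data: 20 samples on a line.
history_X = [[float(i), 0.0] for i in range(20)]
history_y = [float(i) for i in range(20)]
estimate = jit_predict([10.2, 0.0], history_X, history_y, mean_learner, k=8)
```

Steps 3 and 4 would then move the window by one sample, recompute the error distribution, and call `jit_predict` again only when the distribution indicates that modification is needed.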
The prediction results and the corresponding errors of the small-scale modified AFL-SLFN model are illustrated in Figures 6 and 7, respectively. In Figure 7, the point with a vertical arrow represents a small-scale modification point.
As shown in Figures 4-7, the small-scale modification was effective for tailings grade prediction, and the adaptability of the model was improved. However, for samples 11-20 and 38-44, the prediction errors were large, and the small-scale modification was not adequate. Therefore, medium-scale modification was necessary. Similarly, there were small fluctuations for samples 51-54 and 68-72.

Medium-Scale Modification of Prediction Model for Tailings Grade
As depicted in Figure 6, the measured value of the tailings grade changed significantly for samples 11 and 38 and deviated for several consecutive samples. According to the distribution of errors listed in Table 3, for sample 16, a medium-scale modification was needed. Thus, the improved Morris pruning method was used to update the sub-model with the largest weight. The pruned model and the other five sub-models were then used to predict the next sample, and the errors were calculated and checked to determine whether further modification was needed. For sample 40, another medium-scale modification was required; in total, two medium-scale modifications were performed over the 90 test samples. Details regarding the pruning method are presented below, taking the second medium-scale modification as an example.
For the second medium-scale modification, the sub-model with the second-largest weight was modified. The hidden layer of this sub-model had 25 neurons. The learning process of the sub-model for the 360 training samples is shown in Figure 8, where the threshold value of the mean square error for training was 0.1. As shown in Figure 8, the training stopped at the 17th iteration. The RMSEs were 0.093518 and 0.099472 for the training and test sets, respectively. To accurately describe the effect of each hidden-layer neuron on the overall prediction output of the network model, the EE value corresponding to each neuron was calculated using the improved Morris population sampling method. According to the Morris method [21,42,43], the number of EE values of each input was set as r, and then the average of the r EE values was determined. Here, r (≥2) may be chosen arbitrarily; according to experience, the following values were used: r = 6, p = 8, and k = 12. After the model ran once, six independent EE values were obtained for each hidden-layer neuron, and the economy of the design was k/(k + 1) = 12/13. The mean value was taken as the abscissa, and the standard deviation was taken as the ordinate, as shown in Figure 9. To further describe the role of each neuron in the model, the evaluation index is shown in Figure 10.
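The mean and standard deviation of the elementary effects can be illustrated with the sketch below. It uses a simplified one-at-a-time design on a p-level grid in [0, 1] and a generic scalar function; it is not the improved population sampling method of this paper, and the function name and toy model are hypothetical. Each repetition costs k + 1 model runs for k effects, which is the source of the economy k/(k + 1).

```python
import random
import statistics

def elementary_effects(f, k, r=6, p=8, seed=0):
    """Return per-input mean and standard deviation of r elementary effects.

    f maps a list of k values in [0, 1] to a scalar; p is the number of grid levels.
    """
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))                    # standard Morris step for even p
    levels = [i / (p - 1) for i in range(p // 2)]  # base levels keeping x + delta <= 1
    effects = [[] for _ in range(k)]
    for _ in range(r):
        base = [rng.choice(levels) for _ in range(k)]
        f_base = f(base)                           # one run shared by all k effects
        for i in range(k):
            moved = list(base)
            moved[i] += delta                      # perturb input i only
            effects[i].append((f(moved) - f_base) / delta)
    mean = [statistics.fmean(e) for e in effects]
    std = [statistics.stdev(e) for e in effects]
    return mean, std

# Linear toy model: x[1] has no influence, so its effects are all zero.
toy = lambda x: 2.0 * x[0] + 3.0 * x[2]
mu, sigma = elementary_effects(toy, k=3)
```

For a linear model the effects equal the coefficients and their standard deviation vanishes; a large standard deviation would instead point to nonlinearity or interaction, which is how the mean and standard deviation plot is read.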
As shown in Figures 9 and 10, the mean and the standard deviation of 21 of the EE values were relatively large, indicating that the corresponding neurons had great influence on the output and could be considered important neurons. The mean and standard deviation of the ninth, 13th, 14th, and 17th EE values were close to zero. Thus, the corresponding neurons were considered unimportant, and the ninth, 13th, 14th, and 17th neurons of the hidden layer were deleted. Additionally, as shown in Figure 9, the 22nd input had strong nonlinearity or interaction with other inputs. Hence, the previous 360 samples (counting back from the current query sample) were used as training samples to retrain the pruned sub-model. The simulation results after pruning are shown in Figures 11-13.
Following the pruning of the sub-model structure, the training stopped after the seventh iteration. The training and test RMSEs were 0.071579 and 0.081624, respectively. Thus, the training speed and model accuracy were both improved.
The simulation results of the AFL-SLFN hybrid model with the multiscale modification strategy are illustrated in Figure 14, where small-scale and medium-scale modification was performed by simulating online prediction. The prediction errors are depicted in Figure 15. The distribution of the model errors in all the sliding windows and the modification are presented in Table 3.
As indicated by Table 3, the sample was in the small-scale modification area in the first sliding window, and the local sample set was constructed using the improved K-nearest neighbor just-in-time learning algorithm to predict the 10th test sample. In the fifth sliding window, the error probability of each region was below the threshold value; thus, modification was not necessary, and the initial model was used to predict the 14th test sample. In the seventh sliding window, the sample was in the medium-scale modification area, and the sub-model with the largest weight was selected. The Morris pruning method based on the improved EE value was used to optimize the sub-model structure. After updating the model, the 16th test sample was predicted. In the eighth sliding window, the error probability of each region was below the threshold; thus, modification was not necessary, and the 17th test sample was predicted using the updated model. By analogy, all predictions were finally obtained.
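The pruning decision used in the medium-scale modification (deleting hidden neurons whose EE mean and standard deviation are both close to zero) can be sketched as follows. The tolerance of 0.05 and the EE statistics are invented for illustration; only the indices of the four deleted neurons follow the case described above.

```python
def unimportant_neurons(mean_ee, std_ee, tol=0.05):
    """1-based indices of neurons whose EE mean and std are both near zero."""
    return [j + 1 for j, (m, s) in enumerate(zip(mean_ee, std_ee))
            if abs(m) < tol and s < tol]

def prune(params_per_neuron, drop):
    """Keep the parameters of all neurons not listed in `drop` (1-based indices)."""
    drop = set(drop)
    return [w for j, w in enumerate(params_per_neuron, start=1) if j not in drop]

# Invented EE statistics for 25 neurons; four are set near zero for illustration.
mean_ee = [0.4] * 25
std_ee = [0.3] * 25
for j in (9, 13, 14, 17):
    mean_ee[j - 1], std_ee[j - 1] = 0.01, 0.02

drop = unimportant_neurons(mean_ee, std_ee)
kept = prune(list(range(1, 26)), drop)  # neuron labels stand in for their parameters
```

After the selected neurons are removed, the remaining 21-neuron sub-model is retrained on the most recent 360 samples, as described above.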
The distributions of the model errors in Figures 5, 7, and 15 are plotted in Figure 16. The statistics of the errors for different modifications, as well as the time consumed (including model modification and prediction), are presented in Table 4. Figure 16a-c present the error distributions under no modification, small-scale modification, and multiscale modification, respectively. The prediction error had a normal distribution, and the normality of the data was verified by a D-normality test. The results indicated that the models with and without modification were valid. As shown in Figure 16c, the prediction-error distribution for the multiscale modification strategy was the most consistent with the normal distribution, and this strategy yielded the highest accuracy.
The results in Table 4 indicate that the hybrid model with only small-scale modification improved the model accuracy significantly. However, the model with multiscale modification tracked the process dynamics with higher accuracy than that with only small-scale modification. In the online modification process, the improved just-in-time local modeling method had to reselect the samples, the medium-scale modification process required pruning, and both of them had to retrain the model. Thus, the computation time increased for the multiscale modification. However, owing to the high speed of the ELM, the runtime of the prediction model did not change significantly. Thus, the model had stronger dynamic adaptability and higher prediction accuracy under the joint action of the small-scale and medium-scale modification strategies and could predict the flotation grade accurately and stably over a long period.

Model Comparison
To better demonstrate the capability of the proposed model, its performance was compared with that of state-of-the-art algorithms, including support vector regression (SVR), the ELM, the online sequential ELM (OS-ELM) [44], the weighted ELM [45], and the online recurrent ELM (OR-ELM) [46]. In the following experiments, we used the sigmoid function as the activation function of the basic ELM. Grid searches of C (tradeoff constant) on $\{2^{1}, 2^{2}, 2^{3}, \ldots, 2^{15}, 2^{16}\}$ for the weighted ELM and of L (hidden-layer nodes) on {5, 10, 15, 20, 25, . . . , 50} were performed to identify the optimal values for all the models. For the SVR model, the RBF kernel was used. The kernel-function parameters were determined using the approach described in Reference [47]. The regularization parameter was set as four, and the kernel-function parameter was set as 1.0. The prediction results of the proposed model and the other models are illustrated in Figures 14 and 17. Table 5 presents the prediction performances of all the models according to the following performance metrics: the RMSE, MRE, $R^2$, and Time. Clearly, the AFL-SLFN hybrid model with modification achieved the best outcomes: RMSE = 0.0845, MRE = 0.0361, and $R^2$ = 0.8748. Regarding the runtime, because the proposed method involved a dynamic online modification process, the overall prediction time was longer than that of the static model, but the difference was negligible relative to the 8-h interval between tailings grade assays.
Table 5. Comparison of different models for tailings grade prediction. SVR: support vector regression; ELM: extreme learning machine; OS: online sequential; OR: online recurrent; AFL-SLFN: single-layer feedforward neural network with activation-function learning.
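For reference, the three accuracy metrics can be computed as in the sketch below. The usual textbook definitions are assumed (with the MRE taken as the mean of |measured - predicted| / |measured|), since the formulas are not restated in this section; the sample values are invented.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mre(y_true, y_pred):
    """Mean relative error, assuming nonzero measured values."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented measured and predicted values, for illustration only.
measured = [1.0, 2.0, 4.0]
predicted = [1.0, 2.0, 3.0]
scores = (rmse(measured, predicted), mre(measured, predicted), r2(measured, predicted))
```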

Conclusions
An adaptive weighted hybrid intelligent modeling method with a multiscale online modification strategy for the prediction of industrial-process indices was proposed and verified. The hybrid modeling method is based on an SLFN using an ELM and activation-function learning, where different combinations of base functions are used as activation functions. Thus, the network parameters are trained quickly, and optimization of the structure is convenient. Considering the model mismatch caused by the process dynamics and the instability of the feed properties in industrial processes, a multiscale modification strategy for online estimation was proposed. In this strategy, an improved just-in-time method is used for local modeling, i.e., small-scale modification. The weight distribution of the Euclidean distance and the cosine information do not need to be considered, and the reliability of the local modeling dataset is enhanced. An improved EE-based Morris pruning method is used for optimizing the sub-model parameters and structure, i.e., medium-scale modification. Here, the mapping range and the distribution of input variables can be generalized; thus, the model structure can be optimized conveniently. The method was compared with other state-of-the-art methods via a simulation using preprocessed industrial data. The results indicated that the proposed method can achieve higher accuracy and better adaptability. Moreover, because the modification was performed by simulating continuous industrial production, it can be concluded that the generalization ability of the model with modification is reasonable.
In this study, only pruning was used for the modification of the network structure. However, sometimes, neurons must be added to the networks. Additionally, adaptively increasing or decreasing the number of input variables and neurons should be investigated for achieving the optimal network structure. Thus, future work will continue to focus on optimizing the network structure.