Elliot and Symmetric Elliot Extreme Learning Machines for Gaussian Noisy Industrial Thermal Modelling

This research proposes an Elliot-based Extreme Learning Machine approach for the regression of industrial thermal processes. The main contribution of this paper is an Extreme Learning Machine model with Elliot and Symmetric Elliot activation functions that searches for the fittest number of neurons in the hidden layer. The methodological proposal is tested on an industrial thermal drying process. Thermal drying is relevant in many industries, such as food processing, biofuel production, powdered detergents and dyes, pharmaceuticals, reprography, textiles and others. The proposal of this paper outperforms the following techniques: Linear Regression, k-Nearest Neighbours regression, Regression Trees, Random Forest and Support Vector Regression. In addition, all the experiments have been benchmarked using four error measurements (MAE, MSE, MEDAE, R²).


Introduction
The most relevant aspects of the drying technology are the mathematical modelling of the process and the equipment [1]. The modelling of drying processes consists of the design of a set of equations that describes the modelled system as accurately as possible. Simulation models are needed in the design, construction and operation of drying systems. Many authors in the scientific literature have focused their efforts on the modelling of the convective drying kinetics for different products such as vegetables, fruits and agro-based products like prunes [2], carrots [3], bananas [4], potatoes and apples [5], olive cakes [6] and mint leaves [7].
In all the above-mentioned works, the falling rate period was the most relevant stage, and Fick's Law of diffusion was used to describe the drying process. Semi-theoretical and empirical models, which consider only external resistance to moisture transfer between product and air, are the most widely used [8]. The mathematical models for the convective drying processes were proposed using nonlinear regression together with a multiple regression analysis.
We propose Extreme Learning Machines (ELMs). An ELM is a single-hidden-layer feedforward neural network in which the weights of the hidden neurons are randomly assigned. One of the most common applications of neural networks is regression [9]. In this paper, we design an Extreme Learning Machine approach with a dynamic number of nodes in the hidden layer for forecasting industrial drying processes: the model searches for the best number of neurons in the hidden layer, and this dynamic hidden layer is one of the contributions of this study. The authors applied the proposal to an industrial thermal drying process.
Linear Regression [10], k-Nearest Neighbours [11], Regression Trees [12], Random Forest [13] and Support Vector Regression [14] have been applied to regression tasks. These techniques can be considered state-of-the-art algorithms and this proposal is benchmarked with them.
Our approach is tested on the industrial thermal drying process. The moisture loss of a drying process is a complex problem with associated risks and uncertainties because it involves two simultaneous processes, transfer of heat and transfer of mass, with the possible appearance of physical, chemical and even biological transformation processes.
These processes can change the characteristics of the product to be dried and therefore alter the mechanisms of heat and mass transfer. Moreover, experimental drying tests that keep the essential external variables constant (temperature, humidity, rate and direction of airflow, the physical form of the solid and so on) are necessary for forecasting moisture loss and for dryer design. The relevance of this proposal is to achieve an improved dryer design, foreseeing the behaviour under different values of the external variables without having to run experimental drying tests again.
The remainder of this paper is organised as follows: Section 2 provides a theoretical background knowledge on industrial drying processes and modelling approaches. Section 3 focuses on the Extreme Learning Machine technique, including the proposed activation functions. Section 4 details the experimental approach and Section 5 proposes a methodology with special focus on the error measurements and the state-of-the-art techniques. Section 6 shows the results of the experiments and the last section has the conclusions of the paper.

Industrial Drying Processes
The industrial processing of tomato leads to a great variety of output products: concentrated tomato, pizza sauce, tomato powder, peeled tomato (either whole or diced), ketchup, tomato sauce seasoned with vinegar, sugar, salt and some spices, etc. The tomato processing generates two different by-products: one is the product that results from peeling tomatoes and removing the seeds and the other is the sludge from wastewater treatment plants.
Those residues have a high moisture content. A thermal drying operation of industrial tomato by-products is highly recommended, so that they can be used for livestock feed production, for lycopene extraction or for boiler fuel as pellets (peels-seeds) and soil amendment (sludge).
Drying is a complex process involving simultaneous heat and mass transfer and it can result in important changes in the physical properties of the product. The operation of drying converts a solid, semisolid or liquid product into a solid product. In the process of convective drying, heat is necessary to evaporate moisture from the product and a flow of air is needed to remove the moisture.
Four phenomena of transfer occur during the process: heat transfer from air to the product; heat transfer from solid-air interface to the inside of the product; mass transfer through the product, either by diffusion or capillarity and mass transfer from solid-air interface into surrounding air.
The parameters governing the velocity of these phenomena are what define the drying velocity. Thus, knowledge of the drying characteristics of the products being dried, together with simulation models, is needed in the design, construction and operation of drying systems [15]. The above-mentioned residues from the industrial processing of tomato are used in the experiments performed in this work.

Modelling Approaches
Prediction and modelling of solar thermal systems have been addressed in the literature. Broadly speaking, two approaches are available for modelling thermal systems: the first is built upon the analytical view of the thermodynamic processes within the system; the second is a promising field based on machine learning techniques. This research is focused on the second approach.
The computation of the performance of a solar thermal system with an analytical approach is extremely complex. Normally, modelling physical phenomena with computational models takes a great amount of time and computational power. Karim et al. [16] propose a numerical modelling methodology applied to a v-groove solar collector. The resulting method can forecast both the air temperature at any place in the solar collector and the efficiency of the system. Notton et al. [17] develop a mix of finite-difference and electrical-analogy models to compute the outlet temperature of a solar thermal collector integrated in a building. Dowson et al. [18] applied a numerical modelling technique to an aerogel-covered, building-integrated solar air collector for computing outlet temperatures.
These modelling approaches obtain accurate computations of solar thermal energy performance, but they need highly complex mathematical models based on thermodynamic principles.
Overall, analytical models such as those mentioned above are computationally intensive and, normally, an exhaustive exploration of the parameter search space, as required for online control, is not possible. In this sense, simpler, generic modelling approaches are preferable, as they predict key variables more efficiently. This paper proposes an easy-to-use, accurate and computationally efficient modelling approach: the Extreme Learning Machine. The next section focuses on the theoretical details of this computational technique.

ELM Fundamentals
Extreme Learning Machine (ELM) is an emerging machine learning technique with an outstanding generalisation performance and an extremely fast learning speed [19]. Many scholars have concentrated their research on the classification of unstructured data and on incremental learning for data clustering, power consumption and sustainability. De and Gao [20] applied Particle Swarm Optimisation to generate the input layer weights and an ELM algorithm to determine the hidden layer threshold. Madhusudhanan et al. [21] propose a framework which clusters the metadata, assigns a label to each cluster and then generates an ELM model that learns from each batch of arriving data. Salerno and Rabbeni [22] apply ELMs to power disaggregation, aimed at determining appliance-by-appliance electricity consumption while leveraging a single meter only, which measures the entire power demand.
ELMs [23] are neural networks with a single hidden layer and a novel feature: the weights of the hidden neurons (input weights and biases) are randomly assigned (Figure 1). The weights of the edges between the hidden layer and the output layer can then be determined, via the Moore-Penrose generalised inverse, as the smallest-norm least-squares solution of a general linear system, without any learning iteration [24][25][26].
Let $\Psi$ be a training dataset with $c$ instances, where $x_i = [x_{i1}, \ldots, x_{in}]^T$ holds the $n$ input attributes and $o = [o_1, \ldots, o_m]^T$ the labels as outputs. The input and hidden layers are connected by an $n \times k$ weight matrix whose $j$th column is denoted $\mathrm{in}_j$. Each neuron of the hidden layer has a bias, represented by the $1 \times k$ row vector $\beta = [\beta_1, \ldots, \beta_k]$. The hidden and output layers are connected by a $k \times m$ weight matrix whose $j$th row is denoted $\mathrm{out}_j$. The output of the ELM is a row vector $O \in \mathbb{R}^m$. The output for the $i$th instance is computed as follows:

$$o_i = \sum_{j=1}^{k} \mathrm{out}_j \, f(\mathrm{in}_j \cdot x_i + \beta_j), \qquad (1)$$

where $f(\cdot)$ is the transformation function and $\mathrm{in}_j \cdot x_i$ is the inner product between the weights of the connections from the input to the hidden layer and the values of the input layer. Equation (1) can be written in a more compact way as follows:

$$O = H \cdot \mathrm{out}, \qquad (2)$$

where $H \in \mathbb{R}^{c \times k}$ is the hidden-layer output matrix:

$$H_{ij} = f(\mathrm{in}_j \cdot x_i + \beta_j). \qquad (3)$$

In ELM, the weights of the connections between the input and the hidden layers and the hidden biases are chosen randomly, while the output weights can be calculated analytically as

$$\mathrm{out} = H^{\dagger} O, \qquad (4)$$

where $H^{\dagger} = (H^T H)^{-1} H^T$ is the Moore-Penrose generalised inverse of the matrix $H$ [27,28].
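The two-step procedure described above (random hidden parameters, then an analytic least-squares solve via the pseudoinverse) can be sketched in a few lines of NumPy. The toy target function, the sizes and the tanh activation below are illustrative assumptions, not the paper's dataset or configuration:

```python
import numpy as np

def elm_fit(X, y, k, rng):
    """Random input-to-hidden weights and biases; analytic output weights."""
    W_in = rng.standard_normal((X.shape[1], k))   # random input weights
    beta = rng.standard_normal(k)                 # random hidden biases
    H = np.tanh(X @ W_in + beta)                  # hidden-layer output matrix
    W_out = np.linalg.pinv(H) @ y                 # Moore-Penrose pseudoinverse solve
    return W_in, beta, W_out

def elm_predict(X, W_in, beta, W_out):
    return np.tanh(X @ W_in + beta) @ W_out

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]               # smooth toy target
params = elm_fit(X, y, k=50, rng=rng)
mse = np.mean((elm_predict(X, *params) - y) ** 2)
```

Note that no iterative training takes place: the only fitted quantity is `W_out`, obtained in a single linear-algebra step.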

Activation Functions
The activation function is one of the essential parameters in ELMs as shown in Figure 2. These functions transform the output of the presynaptic aggregation function and generate the postsynaptic output.
It is possible to use different activation functions. The most common ones are the unipolar sigmoid (Equation (5a) and Figure 3) and the hyperbolic tangent (Equation (5b) and Figure 3):

$$f(x) = \frac{1}{1 + e^{-\lambda x}}, \qquad (5a)$$

$$f(x) = \tanh(\lambda x) = \frac{e^{\lambda x} - e^{-\lambda x}}{e^{\lambda x} + e^{-\lambda x}}, \qquad (5b)$$

where $\lambda$ is the slope of the activation function. If the activation function is the unipolar sigmoid, according to Equation (5a), then Equation (3) can be written as follows:

$$H_{ij} = \frac{1}{1 + e^{-\lambda(\mathrm{in}_j \cdot x_i + \beta_j)}}. \qquad (6)$$

If the activation function is the hyperbolic tangent, according to Equation (5b), then Equation (3) can be written as follows:

$$H_{ij} = \tanh\big(\lambda(\mathrm{in}_j \cdot x_i + \beta_j)\big). \qquad (7)$$

The authors propose the Elliot and Symmetric Elliot activation functions [29] for Extreme Learning Machines as alternatives to the unipolar sigmoid and hyperbolic tangent. The Elliot activation function maps the output into the range [0, 1] and is a faster approximation of the unipolar sigmoid. The Symmetric Elliot activation function maps the output into the range [−1, +1] and is a faster approximation of the hyperbolic tangent [30]. The Elliot and Symmetric Elliot activation functions are computed as shown in Equations (8a) and (8b):

$$f(x) = \frac{0.5\,\lambda x}{1 + |\lambda x|} + 0.5, \qquad (8a)$$

$$f(x) = \frac{\lambda x}{1 + |\lambda x|}, \qquad (8b)$$

where $\lambda$ is the slope of the function. If the activation function is Elliot, according to Equation (8a), then Equation (3) can be written as follows:

$$H_{ij} = \frac{0.5\,\lambda(\mathrm{in}_j \cdot x_i + \beta_j)}{1 + |\lambda(\mathrm{in}_j \cdot x_i + \beta_j)|} + 0.5. \qquad (9)$$

Moreover, if the activation function is Symmetric Elliot, according to Equation (8b), then Equation (3) can be written as follows:

$$H_{ij} = \frac{\lambda(\mathrm{in}_j \cdot x_i + \beta_j)}{1 + |\lambda(\mathrm{in}_j \cdot x_i + \beta_j)|}. \qquad (10)$$

The shapes of the four detailed activation functions with λ = 1.0 are shown in Figure 3. Elliot and Symmetric Elliot are algebraic activation functions used for approximating Gaussian-shaped functions [29,31]. Both are sigmoid-like functions, such as the unipolar sigmoid and the hyperbolic tangent, but faster to compute.
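As a quick sketch (not the authors' code), the four activation functions can be written directly from Equations (5) and (8); the rational Elliot variants avoid the exponential call entirely, which is where their speed advantage comes from:

```python
import numpy as np

def sigmoid(x, lam=1.0):
    # unipolar sigmoid, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-lam * x))

def hyperbolic_tangent(x, lam=1.0):
    # output in (-1, 1)
    return np.tanh(lam * x)

def elliot(x, lam=1.0):
    # rational approximation of the unipolar sigmoid; no exp() call
    return 0.5 * lam * x / (1.0 + np.abs(lam * x)) + 0.5

def elliot_symmetric(x, lam=1.0):
    # rational approximation of the hyperbolic tangent; no exp() call
    return lam * x / (1.0 + np.abs(lam * x))

x = np.linspace(-6.0, 6.0, 7)
```

All four functions pass through the same midpoint (0.5 for the unipolar pair, 0 for the bipolar pair) and share the same saturation limits, which is what makes the Elliot variants drop-in replacements.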
One of the aims of this paper is to test the accuracy of these activation functions for a real-world industrial regression problem. In addition, these functions are benchmarked with the most common activation functions. It is for this reason that an experimental approach has been applied.

Drying Equipment and Experimental Procedure
A convective dryer was used as experimental equipment for collecting the kinetic data for the study of convective drying. The dryer was located inside a laboratory room so that it could work in appropriate operating conditions: surrounding air temperature between 15 °C and 25 °C, and 60% maximum relative humidity. Figure 4 illustrates the experimental set-up used in the determination of drying curves. The set-up consists of a fan, a resistance battery with a heating control system, an air duct, a tray and measurement instruments. The air fan has a maximum volumetric flow rate of 700 m³/h and a power of 33 W. The air flow was controlled by a revolution speed regulator and measured during the experiments with a flow sensor (Schmidt SS 20.260, SCHMIDT Technology GmbH, St. Georgen, Germany; measurement range 0.2 m/s to 2.5 m/s, ±5% maximum deviation). The heating system consisted of a battery of seven resistances (500 W each) placed inside the duct. The heating control system, a stepped switch to control heating power, allows the power supply to range from 500 W to 3500 W. The total dimensions of the equipment are 2540 × 750 × 1350 mm, and those of the air duct 2540 × 390 × 390 mm (350 × 350 mm inner dimensions). The tray was constructed from AlMg3, with dimensions 400 × 300 × 15 mm. Two sensors were installed in the air duct (Galltec+Mela TFK80J, Galltec Mess and MELA Sensortechnik GmbH, Bondorf, Germany; measuring range from −10 °C to 90 °C, accuracy ±0.2 K) in order to control the air temperature. The main features of the electronic balance used in this work (Kern & Sohn GmbH KB10000-1, Balingen, Germany) are 8000 g maximum load and 0.1 g resolution. Different products can be dried with this system by adjusting the drying parameters.
According to the requirements of the research, these parameters could be changed within an appropriate range [32]. At the beginning of each experiment, the dryer was allowed to reach steady state at the desired airflow rate and inlet air temperature. When steady-state conditions had been attained, the sample was introduced into the drying chamber. The samples were placed on the tray as a thin layer, and the sample thickness was kept constant for each experiment (10 mm).
The drying air temperature ranged between 25 °C and 50 °C. Drying experiments were conducted at three different air velocities (0.9, 1.0 and 1.3 m/s) according to the samples. Moisture loss was recorded at 5 min intervals during the process. The experiments ended when the moisture content in the samples was reduced to approximately 10% by weight (wet basis). This moisture content value is appropriate for the case of peels and seeds, and it was adopted as an equal reference for the case of sludge. The initial moisture content of the samples was determined separately before the start of the experiments.
The moisture content was computed using the following expression:

$$M = \frac{W_w - W_d}{W_d},$$

where $M$ is the moisture content (g water/g dry matter), $W_w$ is the wet weight and $W_d$ is the dry weight. The values of $W_d$ were measured by heating the product samples at 110 °C in an oven for two hours and then weighing them on an analytical balance. All drying experiments were performed in triplicate, and the arithmetic means of the results obtained in each case were used in the drying curves. The values obtained for the moisture content were converted into the moisture ratio, MR. The dimensionless moisture ratio was calculated using the expression

$$MR = \frac{M_t - M_e}{M_0 - M_e},$$

which was simplified to

$$MR = \frac{M_t}{M_0},$$

where $M_t$ and $M_0$ are the moisture content at any given time and the initial moisture content, respectively. The simplification holds because the values of the equilibrium moisture content, $M_e$, are relatively small compared to $M_t$ or $M_0$, and because of the continuous fluctuation of the relative humidity of the air during the drying process. Experimental data from the different drying runs were expressed as moisture ratio versus drying time; the results for the cases that concern us (tomato sludge and peels-seeds) are shown in [32,33].
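The two computations above can be sketched in a few lines; the sample weights below are made-up illustrative values, not the paper's measurements:

```python
def moisture_content(w_wet, w_dry):
    """M = (W_w - W_d) / W_d, in g water per g dry matter."""
    return (w_wet - w_dry) / w_dry

def moisture_ratio(m_t, m_0):
    """Simplified dimensionless moisture ratio MR = M_t / M_0 (M_e neglected)."""
    return m_t / m_0

m_0 = moisture_content(100.0, 40.0)   # hypothetical sample: 100 g wet, 40 g dry matter
m_t = moisture_content(60.0, 40.0)    # the same sample later in the drying run
mr = moisture_ratio(m_t, m_0)         # MR decreases from 1 towards 0 as drying proceeds
```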

Proposed Methodology
The goal of the methodology is to forecast the moisture content of a noisy thermal process. As far as the authors know, there are no previous works using ELMs for thermal modelling. In addition, the application of a computational approach to noisy thermal modelling is another novelty of this research.
The proposed methodology for forecasting industrial drying processes is detailed in Algorithm 1. Figure 5 shows the ELM model for industrial drying process regression. The input layer is composed of three neurons, each representing a relevant concept for industrial drying (velocity, temperature and time). The output layer is composed of just one neuron, modelling the predicted moisture loss. Note that the conventional laboratory method of measuring moisture in solid or semi-solid materials is loss on drying.
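A hedged sketch of the dynamic hidden-layer search: train one ELM per candidate size and keep the size with the lowest validation MSE. The toy drying-like target, the candidate range and the Elliot slope below are illustrative assumptions, not the paper's exact algorithm or data:

```python
import numpy as np

def elm_train(X, y, k, f, rng):
    """Random input weights/biases; analytic output weights via pseudoinverse."""
    W_in = rng.standard_normal((X.shape[1], k))
    b = rng.standard_normal(k)
    return W_in, b, np.linalg.pinv(f(X @ W_in + b)) @ y

def elm_out(X, W_in, b, W_out, f):
    return f(X @ W_in + b) @ W_out

def elliot(x, lam=1.0):
    return 0.5 * lam * x / (1.0 + np.abs(lam * x)) + 0.5

def search_hidden_size(X_tr, y_tr, X_val, y_val, k_range, f, seed=0):
    """Keep the hidden-layer size with the lowest validation MSE."""
    best_k, best_mse = None, np.inf
    for k in k_range:
        model = elm_train(X_tr, y_tr, k, f, np.random.default_rng(seed))
        mse = np.mean((elm_out(X_val, *model, f) - y_val) ** 2)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (300, 3))      # velocity, temperature, time (scaled)
t = (X[:, 2] + 1.0) / 2.0
y = np.exp(-2.0 * t)                      # toy exponential drying-like curve
k_best, mse_best = search_hidden_size(X[:200], y[:200], X[200:], y[200:],
                                      range(5, 55, 5), elliot)
```

Because each candidate ELM trains in a single linear-algebra step, sweeping the hidden-layer size this way stays cheap, which is what makes the dynamic search practical.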
The proposed Elliot-ELM and Elliot Symmetric-ELM are benchmarked with the state-of-the-art techniques and the commonly used Tanh-ELM and Sigmoid-ELM.

Error Measurements
The performance of all the regression techniques is measured using four different error measurements for a more comprehensive benchmarking. The following sections detail the error measurements.

Mean Absolute Error
The Mean Absolute Error (MAE) computes a measurement related to the expected value of the absolute error loss, or l1-norm loss. The MAE uses the same scale as the dataset and is computed as follows:

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|,$$

where $\hat{y}_i$ is the forecasted value of the $i$th sample and $y_i$ is the real-world value; the MAE is estimated over $n$ samples. The MAE is a measure of the quality of a regressor. It is always positive, and values closer to zero are better.

Mean Squared Error
The Mean Squared Error (MSE) computes a measurement related to the expected value of the squared (quadratic) error loss and is computed as follows:

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

where $\hat{y}_i$ and $y_i$ have the same meaning as in the MAE. The MSE is a measure of the quality of each regressor. It is always positive, and values closer to zero are better.

Median Absolute Error
An interesting property of the Median Absolute Error (MEDAE) is that it is very robust to outliers. It is always positive and the values closest to zero are better. The loss is computed as the median of all absolute differences between the real value and the forecasted value: MEDAE(y,ŷ) = median(|y 1 −ŷ 1 |, . . . , |y n −ŷ n |).

Coefficient of Determination R²
The coefficient of determination, R², indicates the proportion of the variance in the dependent variable that is predictable from the independent variables. R² can be negative, and the best possible score is 1.0.
An R² = 0.0 means that the dependent variable cannot be predicted from the independent variables. An R² = 1.0 means that the dependent variable can be predicted without error. If 0.0 < R² < 1.0, it indicates the extent to which the dependent variable is predictable; for instance, R² = 0.4 means that 40% of the variance in the dependent variable is predictable from the independent variables. R² is computed as follows:

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ is the average of the real-world values.
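The four error measurements can be sketched directly from their definitions; the sample vectors below are illustrative:

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def medae(y, y_hat):
    return np.median(np.abs(y - y_hat))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.0, 4.2])
```

MAE and MEDAE stay on the scale of the data, MSE penalises large deviations quadratically, and R² is scale-free, which is why reporting all four gives a more complete picture of each regressor.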

State-of-the-Art Regression Techniques
To evaluate the proposed approach, we performed two kinds of experiments on industrial drying processes. The first experiment is the regression of the moisture from accurate source data and the second uses noisy data. The error measurements achieved by the ELMs in both experiments are compared to the state-of-the-art forecasting techniques.
Those techniques include the following: Linear Regression, k-Nearest Neighbours, Regression Trees, Random Forest and Support Vector Regression. All the techniques used for comparative purposes were tested under the same conditions.
A brief overview of these algorithms is given in the following subsections.

Linear Regression
Linear regression evaluates the linear dependency between variables. The linear regression technique explains the behaviour of the unknown value $\hat{y}$ in terms of the known quantities $x$, the parameters $\beta$ and random noise $\epsilon$. Linear regression can be expressed as follows:

$$\hat{y} = x\beta + \epsilon,$$

where $\hat{y}$ is the dependent variable (forecasted value), $x$ is a vector of independent variables, $\beta$ is a vector of parameters and $\epsilon$ is a random error [34].

k-Nearest Neighbours Regression
The k-Nearest Neighbours (k-NN) is a non-parametric method that can be used for both classification and regression tasks [35]. In the k-NN regression whenever there is a new point to forecast, its k-Nearest Neighbours are selected from the training data. Then, the forecasted value can be computed as the average of its k-Nearest Neighbours.
The k-NN algorithm estimates the response at a testing point $x_t$ as the average of the responses of the $k$ closest training points in the neighbourhood of $x_t$. k-NN regression can be expressed as follows:

$$\hat{y}_t = \frac{1}{k} \sum_{i \in \Phi_k(x_t)} y_i,$$

where $\Phi_k(x_t)$ is the set of indices of the k-Nearest Neighbours of $x_t$ given the training samples $T$.
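A minimal sketch of this estimator using Euclidean distance (the toy training set is illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_t, k):
    """Average the responses of the k training points closest to x_t."""
    dist = np.linalg.norm(X_train - x_t, axis=1)   # Euclidean distances to x_t
    nearest = np.argsort(dist)[:k]                 # indices Phi_k(x_t)
    return np.mean(y_train[nearest])

X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_predict(X_train, y_train, np.array([1.2]), k=3)
```

Note how the far-away training point at 10.0 is excluded from the neighbourhood and so does not influence the prediction.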

Regression Trees
Machine learning algorithms generating decision trees have been adapted by Kenesei and Abonyi [36] for numeric attributes forecasting. Regression Trees (RTs) are a special kind of decision trees that can be applied to solve regression problems. Decision trees are not as popular for regression as for classification tasks, but they are highly competitive with common machine learning techniques [37].
The leaves of an RT contain regression models. RTs are built from the split values of the predictive attributes; a regression model is then generated for each leaf. After that, the RT prunes its leaves to decrease the error until the optimum model is reached [38].

Random Forest
The Random Forest (RF) algorithm [39] is based on RTs. RFs can predict numeric values based on a group (forest) of RTs. This algorithm includes bagging and bootstrapping techniques.
An RF is an ensemble of RTs which adds an extra layer of randomness to bagging. Moreover, RFs change the way regression trees are built: each tree is built from a different bootstrap sample of the data. In RFs, each node is split using the best predictor among a subset of predictors randomly chosen at that node [39].

Support Vector Regression
Support Vector Machines (SVMs) are statistical learning tools that follow the structural risk minimisation inductive principle to achieve good generalisation from a finite number of learning patterns. SVMs were originally proposed by Vapnik [40] for classification tasks and were later extended to regression problems. Support Vector Regression (SVR) is an extension [41] of large-margin kernel methods to regression. The regression analysis looks for a mapping function that approximates the training data to the labels. Since residuals are unavoidable, a loss function is needed to measure the prediction accuracy. Training an SVR can be formulated as a convex optimisation problem in which $\langle \cdot, \cdot \rangle$ is the dot product and $\epsilon$ is the precision. SVR uses kernels to transform the input space into a feature space [42], $\mathbb{R}^n \xrightarrow{\phi(\cdot)} \mathbb{R}^h$, where $h > n$ and $\phi(\cdot)$ is the kernel mapping. For both experiments, the authors applied several kernels and selected the best result of all. The kernels are the following:
• Linear. It is the simplest kernel of all: $K(x, x') = \langle x, x' \rangle$.
• Polynomial. It is well suited when all the training data are normalised: $K(x, x') = (\langle x, x' \rangle + c)^d$.
• Hyperbolic tangent. It is a continuous function widely used in neural networks as a transformation function: $K(x, x') = \tanh(\lambda \langle x, x' \rangle + c)$, where $\lambda$ is the slope and $c$ a constant.
• Radial Basis Function. It is a continuous function whose result depends only on the distance between points. The Gaussian RBF is as follows: $K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$.
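The four kernels listed above can be sketched as plain functions; the default values of c, d, λ and σ are illustrative assumptions, not the paper's tuned hyperparameters:

```python
import numpy as np

def k_linear(x, z):
    return x @ z

def k_polynomial(x, z, c=1.0, d=3):
    return (x @ z + c) ** d

def k_tanh(x, z, lam=1.0, c=0.0):
    return np.tanh(lam * (x @ z) + c)

def k_rbf(x, z, sigma=1.0):
    # Gaussian RBF: depends only on the distance between x and z
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2.0 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
```

Each kernel evaluates the inner product in the implicit feature space without ever computing φ(·) explicitly, which is the kernel trick that keeps SVR tractable for h > n.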

Results
Section 6.1 details the experiment focused on the regression of the moisture from accurate (no noise) data and Section 6.2 explains the regression of the moisture from noisy data.
Furthermore, the experiment is done with two tomato derivatives: peels-seeds and sludge. The peels and seeds showed an initial moisture content of 66% by weight (wet basis). In this case, drying experiments were conducted at 25 °C, 35 °C and 45 °C drying air temperatures and at 1.0 m/s and 1.3 m/s air velocities, according to the methodology indicated.
Moreover, the samples of fresh sludge were obtained from the wastewater treatment plant of a local tomato industry located in the province of Badajoz (in the southwest of Spain). These samples showed an initial moisture content of 63% by weight (wet basis). This value was calculated as indicated by the Norm UNE 32001 [43]. Drying experiments were conducted at 30 °C, 40 °C and 50 °C drying air temperatures and at 0.9 m/s and 1.3 m/s air velocities, according to the explained methodology.

No Noise
The results of the peels and seeds experiment without noise are detailed in Table 1. ELMs achieved better results than the state-of-the-art techniques: Elliot-ELM got the lowest MAE, Elliot Symmetric-ELM the lowest MSE and the highest R² score, and Tanh-ELM the lowest MEDAE. According to all the error measurements, ELM outperforms the tested forecasting techniques. The second-best technique for MAE and MSE is Random Forest, for MEDAE it is k-NN and for R² it is Regression Trees.
The results of the sludge experiment without noise are detailed in Table 2. ELMs again achieved better results than the state-of-the-art techniques. Sigmoid-ELM got the best score for all the error measurements; for R², the other ELMs (Tanh, Elliot and Symmetric Elliot) got the same score as the winner. The second-best technique for MAE is Linear Regression, for MSE and R² it is k-NN, and for MEDAE it is Random Forest.

Noise
Real-world data are affected by several issues where noise is a critical factor [44]. In real-world applications, it is not possible to avoid the presence of noise in data. The source of noise relevant for this paper is the attribute noise [45]. The goal of this experimental approach is to test the accuracy of the proposed methodology in the presence of real-world noise.
In this work, a Gaussian distribution is used to add noise to the data. Gaussian noise is statistical noise whose probability density function is that of the Gaussian (normal) distribution; that is, the noise values follow a bell-shaped curve.
We compute the Gaussian noise with the following probability density:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$

where $\mu$ is the mean and $\sigma$ the standard deviation. We use three amplitudes of noise in the training data, computed with the same mean ($\mu = 0.0$) and different standard deviation values (low: σ = 0.1, middle: σ = 0.2 and high: σ = 0.3). Figure 6 shows the Gaussian probability densities applied for generating the noise. Figure 7 shows the three Gaussian noises added to the training data for each sample.
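A sketch of how the three noise amplitudes can be generated and added to a training signal; the toy signal and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
signal = np.linspace(1.0, 0.1, 100)               # toy moisture-ratio-like curve
sigmas = {"low": 0.1, "middle": 0.2, "high": 0.3}  # the three amplitudes
noisy = {name: signal + rng.normal(loc=0.0, scale=s, size=signal.shape)
         for name, s in sigmas.items()}
```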
Regarding this experiment, data gathered in laboratory environments normally does not include noise: the data are clean, the sensors and their connections are very accessible and specialised staff are in charge of their care. However, in industrial working conditions the situation changes. In the environment of the sensors there may be dust, grease and other fouling agents; the sensors are not usually located at points where they can be easily checked, and it is not usual to have staff in charge of their continuous review.
In the industrial drying processes in which we are interested, it is usual for distorted data records to invalidate the measurements, and therefore the process control becomes less accurate. For these reasons, it is worth testing the proposal with noisy data.
The results of the peels and seeds experiment with Gaussian noise (σ = 0.1) are detailed in Table 3. ELMs achieved better results than the state-of-the-art techniques: Tanh-ELM got the lowest MAE, while Elliot Symmetric-ELM got the lowest MSE and MEDAE and the highest R² score. According to all the error measurements, ELM outperforms the tested forecasting techniques, and the Elliot-based ELMs are the best among the ELM flavours. The second-best technique for MAE and MEDAE is k-NN; for MSE and R² it is Random Forest.
The results of the sludge experiment with noise (σ = 0.1) are detailed in Table 4. ELMs achieved better results than the state-of-the-art techniques: Elliot-ELM got the lowest MAE and MEDAE and the highest R², while Elliot Symmetric-ELM got the lowest MSE. According to all the error measurements, ELM outperforms the tested forecasting techniques, and Elliot-ELM is the best among the ELM flavours. The second-best technique for MAE, MSE and MEDAE is k-NN; for R² it is Random Forest.
The results of the peels and seeds experiment with noise (σ = 0.2) are detailed in Table 5. ELMs achieved better results than the state-of-the-art techniques: Elliot-ELM got the best score for all the error measurements, so it is the best among the ELM flavours. The second-best technique for MAE, MSE and R² is Random Forest; for MEDAE it is k-NN.
The results of the sludge experiment with noise (σ = 0.2) are detailed in Table 6. ELMs achieved better results than the state-of-the-art techniques.
Elliot-ELM got the best score for all the error measurements, so it is the best among the ELM flavours. The second-best technique for all the error measurements is Random Forest.
The results of the peels and seeds experiment with noise (σ = 0.3) are detailed in Table 7. ELMs achieved better results than the state-of-the-art techniques: Symmetric Elliot-ELM got the lowest MAE and MEDAE and the highest R², while Elliot-ELM got the lowest MSE. According to all the error measurements, ELM outperforms the tested forecasting techniques, and the Elliot-based ELMs are the best among the ELM flavours. The second-best technique for MAE, MSE and MEDAE is k-NN; for R² it is Random Forest.
The results of the sludge experiment with noise (σ = 0.3) are detailed in Table 8. ELMs achieved better results than the state-of-the-art techniques: Elliot-ELM got the lowest MSE and the highest R², while Tanh-ELM got the lowest MAE and MEDAE. According to all the error measurements, ELM outperforms the tested forecasting techniques, and the second-best technique for MAE, MSE, MEDAE and R² is Random Forest.

Conclusions
In this paper, we proposed Elliot and Symmetric Elliot activation functions for Extreme Learning Machines applied to the regression of industrial drying processes. The authors compared the Elliot-based ELM proposal against several state-of-the-art techniques. Moreover, a range of numbers of neurons in the hidden layer was checked.
The authors applied the proposal to an industrial thermal drying process. As far as we know, this is a novel application of Extreme Learning Machines. In addition, in this study, the authors utilise four error measurements to assure the goodness of the proposal.
According to the results of the experiments, our ELM proposal outperforms the state-of-the-art algorithms, and the Elliot and Symmetric Elliot ELMs are competitive with the conventional sigmoid and hyperbolic tangent activation functions, obtaining the best performance in most cases.
Furthermore, the experiments were also carried out with the inclusion of three levels of Gaussian noise. The Elliot-based ELMs were the best proposals in most of the experiments according to the four error measurements.