1. Introduction
In the current decade, the use of aerospace technology in passenger transport and logistics has increased significantly. According to [
1], the global market for aerospace forgings was estimated at USD 30 billion in 2022, and this figure is expected to rise to more than USD 50 billion by the end of 2035, with a compound annual growth rate of 8%. However, one of the biggest challenges facing this industry is the high cost of maintenance, repair, and overhaul (MRO). According to [
2], MRO costs were estimated at USD 62 billion in 2021, representing approximately 11.2% of total airline operating costs, with engines being the largest cost segment at 37% of these costs.
Remaining useful life (RUL) prognostics is one of the most effective strategies widely used to optimise MRO operations. The main objective of prognostics is to accurately predict how long an asset can continue to perform its intended function [
3,
4]. Such prediction enables an MRO operation to be performed in accordance with the actual condition of the component, which, in turn, can save costs on unnecessary MRO operations. Indeed, several studies, e.g., [
3,
4], report that the high fluctuation of usage patterns and operating conditions makes scheduled maintenance inaccurate, while [
5] estimates that a total of USD 3 billion is wasted on the no fault found (NFF) inventory. In addition to the direct monetary savings that can be achieved through prognostics, a reduction in MRO activities can also reduce the human errors that occur during this process; such errors are far from negligible, accounting for about 80% of MRO errors [
6,
7]. Another key advantage of prognosis-based MRO optimisation is its ability to extend component life and reduce maintenance delay [
8,
9]. A study conducted by the authors of [
8] on a fleet of 100 long-haul aircraft engines shows that prognosis-based MRO can prolong engine life by about 30–40%, while [
9] shows that a 20% reduction in maintenance time can be achieved. Besides the above benefits, some studies show the benefits of using prognostics in terms of improving spare parts supply chains [
10], increasing fleet availability [
11], and reducing collateral damage during repairs and on-ground aviation [
12].
The significant benefits that RUL prognosis can offer to the aviation sector have sparked researchers’ interest in using the most advanced modelling approaches to optimise MRO operations. Many of the pioneering models use analytical approaches to develop mathematical models capable of characterising the degradation behaviour of physical systems, e.g., [
13,
14,
15,
16,
17,
18]. Although these models have demonstrated their feasibility, the need to account for the various interactions within the modelled system and its operating conditions can pose challenges to their solvability and, in some cases, make them intractable [
19]. In response to these challenges, the principles of data-driven models have been utilised as contemporary modelling techniques. Data-driven models use external observations generated by the system to make predictions about its degradation status. This, in turn, makes data-driven models well suited to the continuous evolution that modern aviation systems undergo [
20]. Among the various approaches used in data-driven models, the deep artificial neural network (DANN) has emerged as the mainstream architecture. Simply put, a DANN [
21] is an acyclic graph consisting of computational units with learnable parameters that are organised into layers and trained to minimise an objective function. During training, the model is presented with a set of observations along with their corresponding desired outcomes. The model then attempts to adjust its learnable parameters so that its output is as close as possible to the desired outcomes. This adjustment process is usually facilitated by the backpropagation algorithm, which is an error correction learning strategy. In backpropagation, the errors calculated at the output layer are passed backwards to the preceding layers so that each layer adjusts its learnable parameters in response to the errors it produces. While several works using DANNs have achieved remarkable results by tailoring the hypothesis spaces of the model to the dataset, e.g., [
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37], it is important to recognise that there are several reported limitations associated with the DANN architecture. Among the most important are the high vulnerability of DANN models to data and concept drift [
38,
39], the increase in gradient instabilities with the depth of the DANN model [
21,
40], the high sensitivity of DANNs to noise and outliers in the datasets [
21,
41], and the high computational cost.
Motivated by the importance of developing a reliable data-driven prognosis model and the shortcomings of existing DANN models, this paper proposes a novel model based on the counter-propagation network (CPN) [
42]. The core approach underlying the CPN is that a combination of unsupervised and supervised learning strategies within the same architecture can improve the learning capability of the model and solve some of the main problems related to backpropagation. In a CPN, a raw dataset is fed into a self-organising map (SOM), which is a nonlinear vector quantisation algorithm known for its ability to preserve the topological order of its inputs [
43]. The output of the SOM is then fed into a supervised network based on Grossberg’s learning approach, which is known for its high fidelity and low computational budget. Several works comparing the performance of CPN with that of DANN, e.g., [
44,
45,
46,
47,
48,
49], show the superiority of CPN in terms of higher convergence speed, greater resilience to noise and outliers, better computational efficiency, better interpretability and explainability, and lower sensitivity to concept/data drifts.
However, one of the biggest challenges standing in the way of applying CPN directly to RUL prognosis tasks is the fact that the original CPN was designed to process time-agnostic datasets. In contrast, most datasets that characterise RUL are a collection of multivariate time series. Typically, each instance of such a dataset consists of a series of observations taken from various subcomponents of the assets being monitored. Due to the mutual coupling amongst these subcomponents and the fact that they are exposed to the same operating conditions, the RUL dataset usually exhibits a high degree of correlation and hierarchical dependencies. This work addresses the unique nature of RUL datasets by proposing a novel variant of a SOM map dubbed recursive growing hierarchical SOM (ReGHSOM). ReGHSOM combines the strengths of ReSOM [
50], which was developed to allow traditional SOMs to handle temporal relationships of the dataset through a fixed architecture, and GHSOM [
51], which was developed to allow dynamic evolution of the SOM without considering temporal dependencies. This combination enables the proposed ReGHSOM algorithm to effectively handle the high correlations and hierarchical dependencies of multivariate time series datasets. Indeed, ReGHSOM does not impose any constraints or prior assumptions on the architecture of the model, which, in turn, allows ReGHSOM to deal with differently shaped datasets without having to tailor the model’s hypothesis space to a particular dataset. Another important aspect of the ReGHSOM is its ability to transform nonlinear statistical relationships embedded in multivariate time series observations into a simpler geometric representation that can preserve their topological order. Therefore, latent relationships can be thoroughly visualised and rigorously quantified. Furthermore, feeding the supervised layer with a meaningful and low-dimensional representation of the original dataset not only improves prediction accuracy by reducing the impact of noisy data points, but also reduces the time required for these predictions by reducing the computational complexity of the model.
The performance of the proposed model was comprehensively evaluated using the commercial modular aero-propulsion system simulation (C-MAPSS) dataset [
52]. This dataset was selected for evaluation because it is one of the most commonly used benchmark datasets in this field, allowing a fair comparison between the results of the proposed model and others. Another important feature of this dataset is that it uses different operating conditions, fault modes, and noise levels to generate the readings. Performing the evaluations under these cases allows us to assess the suitability of the proposed model for dealing with quasi-real datasets. In addition to comparing the results of the model with relevant works, the evaluation of the proposed model also covers its learning ability and its evolution under the different subsets of C-MAPSS. All evaluations are conducted using standard statistical metrics, including mean absolute error, root mean square error, and score. The results of this evaluation show that the proposed model is able to achieve an average root mean square error of 5.24 and an average score of 293 for the C-MAPSS dataset, which are better than most of the comparable works.
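For clarity, the sketch below shows how such metrics could be computed for a vector of RUL predictions. The scoring function shown is the widely used asymmetric PHM08/C-MAPSS formulation, which we assume is the score referred to above; the function name and example values are purely illustrative.

```python
import numpy as np

def rul_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and an asymmetric scoring function for RUL predictions.

    The score assumed here is the common PHM08/C-MAPSS formulation, which
    penalises late predictions (d > 0) more heavily than early ones.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    d = y_pred - y_true                      # positive d = late prediction
    rmse = np.sqrt(np.mean(d ** 2))
    mae = np.mean(np.abs(d))
    score = np.sum(np.where(d < 0, np.exp(-d / 13.0) - 1.0,
                                   np.exp(d / 10.0) - 1.0))
    return rmse, mae, score

# Example: predictions for five engines (toy numbers)
print(rul_metrics([112, 98, 69, 82, 91], [108, 103, 70, 75, 95]))
```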
To summarise, the main contribution of this work is to develop a versatile RUL prognostics model that can dynamically adapt its architecture to the characteristics of the degradation dataset in real time. This adaptability extends the applicability of the model to entire engines or even specific components without requiring extensive adjustments to the model’s hypothesis spaces. The high prediction accuracy that the proposed model can achieve makes it a valuable method not only for optimising standard MRO operations but also for contemporary non-destructive testing (NDT) from different perspectives. This includes reducing the cost and effort associated with performing unnecessary NDTs or MROs, as the prediction generated by the model can reveal the status of the system under its actual operational conditions. Moreover, incorporating the readings from different components in the proposed model facilitates predicting the performance of those components that cannot be easily inspected by NDT. Another important benefit of the proposed model in this context stems from its high computational feasibility, which, in turn, facilitates incorporating it seamlessly with other operational processes.
The rest of this paper is organised as follows:
Section 2 reviews the most pertinent works presented in the open literature,
Section 3 describes the proposed model,
Section 4 presents the results and discussion, and
Section 5 concludes this paper.
2. Related Works
The development of an effective model that can predict RUL or other related aircraft component degradation metrics is one of the most active research areas and has received much attention due to its key role in saving lives and optimising aviation MRO practices. Most of the work presented in the open literature can be divided into two broad groups: physics-based models and data-driven models.
The core concept underlying most of the physics-based models is that the behaviour exhibited by a system during its life cycle can be quantified mathematically. Therefore, the signs of deterioration can be identified simply by interpreting these models in the light of fundamental laws of science and their derivations [
19]. Broadly speaking, most of these models can vary according to several criteria. These include the factors that contribute to degradation (e.g., environmental conditions (e.g., [
13]) and operating conditions (e.g., [
14])); the mechanisms by which degradation occurs (e.g., competitive degradation (e.g., [
15]) and multistage degradation (e.g., [
16])); and the methods used to represent uncertainty in the model (e.g., deterministic (e.g., [
17]) or stochastic approaches (e.g., [
18])). Although physics-based models have the potential to achieve a high level of fidelity, they often entail a significant trade-off between the level of detail that goes into the model and its solvability. A very detailed model can represent the complexity of the real world but may be difficult to solve, while a simplified model may be easier to work with but may not fully represent real-world scenarios. Another notable limitation of physics-based models is their lack of versatility, as models developed for specific machines or systems cannot be easily applied to other machines.
Data-driven models, on the other hand, are based on the assumption that the degradation characteristics of a system can be determined by analysing the observations generated by that system. This, in turn, makes data-driven models advantageous, as they do not require tracking the internal state space of systems or a mathematical representation of the machine. The proliferation of high-precision sensors and rapid advances in deep learning further reinforce this trend by facilitating the integration of extensive sensor-derived information for accurate predictions.
Deep artificial neural networks (DANNs) are one of the predominant modelling approaches in this context. The work presented in [
22] proposes a deep learning model for predicting the remaining useful life of aircraft turbofan engines. In developing this model, it was assumed that removing outliers and noisy data points can reduce the time and computational complexity of the model, which, in turn, can lead to a faster learning curve and better predictions. Therefore, four preprocessing phases were applied to the raw dataset. In the first phase, a correlation analysis is performed between the RUL values and the sensor trajectories for each sub-dataset. All trajectories whose correlation coefficients are less than 10% are excluded from the subsequent preprocessing phases, while the remaining trajectories are run through a moving median filter with an adaptive time window. In the third and fourth preprocessing phases, Z-score normalisation and an improved piecewise linear degradation model are used. The proposed model uses LSTM, drop-out, and fully connected architectures to obtain the RUL values, while an iterative grid search technique is used to adjust the hyperparameters of the model (including the number of layers, the number of neurons in each layer, batch size, etc.). In this work, the C-MAPSS dataset is used to evaluate the accuracy of the proposed models. The results of this evaluation show that the prediction metrics vary between the sub-datasets, with the best root mean square error (RMSE) being 7.78 and the worst being 17.63. The work presented in [
23] follows the same procedures presented in [
22], but uses different preprocessing techniques. More specifically, this work uses maximum information coefficient theory (MICT) instead of the correlation analysis used in [
22] to determine the degree of association between sensor trajectories and the given RUL in each training subset of C-MAPSS. The processed data are then treated with a technique that combines both the simple moving average method and kernel principal component analysis to smooth the noisy data points and map the remaining data points to a low-dimensional space before feeding them into the deep learning model. The proposed model consists of a series of LSTM layers followed by drop-out and fully connected layers. The results of this work show that the best RMSE value is 9.65 and the worst is 22.21. Following the same modelling approach, the authors of [
24] investigate the impact of different correlation-based filtering methods and feature selection wrapper techniques on the prediction performance using the C-MAPSS benchmark dataset. In this work, the MLP architecture with different numbers of layers and neurons is used. The results of this work show that the best RMSE value of 44.71 can be achieved when using the evolutionary wrapper selection method with four fully connected layers followed by a drop-out layer and a single output layer.
The great success that the convolutional neural network (CNN) has achieved in computer vision and related disciplines has inspired a cadre of scholars to use it to predict RUL. A CNN-based model relies mainly on the ability of this network architecture to extract the salient feature automatically without a need for pre-adjusting. The work presented in [
25], for example, proposed a CNN model consisting of two pairs of convolutional layers, each pair followed by a pooling layer, and a fully connected layer from which the predicted RUL values are derived. A sliding window of length 15 is used to segment the multivariate time series of the raw datasets into smaller units before processing them with the proposed CNN model (a sketch of this segmentation step is given below). The results of this work show that the proposed model performs better than three comparison models developed using MLP, support vector regression (SVR), and relevance vector regression (RVR). However, the RMSE value of 18.4480 reported for the proposed model does not improve on the values reported by comparable works. The work in [
26] is another example that uses a CNN architecture to predict the RUL values of C-MAPSS data. This work aims to reduce the loss of information that results from the change in dimensionality of the dataset when it is processed through convolutional layers. The idea of this work is to use zero-padding convolutional layers for primitive feature extraction and a unit kernel convolutional layer for combining all previously extracted features. Despite the tolerable performance values of the proposed model, the comprehensive evaluation of CNN architectures in the context of RUL prediction concludes that there is a proportional relationship between the number of convolutional layers and the prediction performance, but this advantage is offset by the increased computational budget and training time.
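As a concrete illustration of the sliding-window segmentation step used in [25], the following minimal sketch splits one run-to-failure trajectory into fixed-length windows, labelling each window with the RUL at its final cycle. The function name, the labelling convention, and the toy data are our own assumptions rather than details taken from that work.

```python
import numpy as np

def sliding_windows(trajectory, rul, window=15):
    """Segment a multivariate run-to-failure trajectory into fixed-length windows.

    trajectory: array of shape (cycles, sensors); rul: array of shape (cycles,).
    Each window is paired with the RUL at its last cycle.
    """
    X, y = [], []
    for end in range(window, trajectory.shape[0] + 1):
        X.append(trajectory[end - window:end])   # (window, sensors) slice
        y.append(rul[end - 1])                    # label = RUL at window end
    return np.stack(X), np.asarray(y)

# Example: one engine with 200 cycles and 14 retained sensor channels (toy data)
traj = np.random.rand(200, 14)
rul = np.arange(199, -1, -1)
X, y = sliding_windows(traj, rul, window=15)
print(X.shape, y.shape)   # (186, 15, 14) (186,)
```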
Besides the above, there are other works that aim to improve predictive performance by integrating different ANN architectures over the same model, using continual learning techniques and federated learning principles. An example of this direction is [
27], which uses three different architectures: (i) a CNN to extract the features from the dataset, (ii) a convolution block attention module to discriminate the most relevant features and discard the rest of the features extracted by the CNN, and (iii) an LSTM to reveal the latent relationships between the selected features and the predicted RUL. The result of this work shows the ability of the proposed model to achieve an RMSE of 5.50 on the C-MAPSS dataset, but there is no further information about whether this value refers to the whole dataset or just a part of it. An example of the use of continual learning was presented in [
28]. The basic idea of this work is to use the elastic weight consolidation (EWC) approach to mitigate the negative impact of catastrophic forgetting on prediction performance. Catastrophic forgetting is one of the well-known limitations of deep learning models. It occurs when the model cannot retrieve the knowledge it gained from processing previous samples when a more recent instance is processed. EWC addresses this limitation by regulating the model’s parameter spaces according to the importance of the acquired knowledge. The performance reported in this paper shows that it outperforms other models based on CNN, restricted Boltzmann machine, and LSTM architectures. The authors of [
29] propose a federated learning model in which the learning tasks are distributed across multiple nodes rather than exhausting the resources of a single machine with a massive training dataset. The performance of the proposed model is evaluated using both synchronous and asynchronous weight aggregation algorithms, and it is shown that better prediction performance can be obtained with the proposed model. The work proposed in [
30] provides a new perspective on the development of RUL estimation models by assuming that this estimation can be formulated as a decision-making problem rather than a regression problem, as is the case in other work. In this work, the Markov decision process is used to model the set of observations in the dataset as a linked state space, while deep reinforcement learning is used as a means to identify the best estimation strategy. The work proposed in [
31] attempts to overcome the high complexity of traditional spatio-temporal deep learning by proposing a lightweight operator and using it with the GRU architecture. In this work, it is claimed that the proposed operator is able to extract the relevant information for the given dataset and seamlessly insert it into the following layers of the model. In addition, some recent works such as [
32,
33,
34,
35,
36,
37] have been devoted to improving the prediction performance by incorporating one or more of the above architectures.
3. The Proposed Model
The main contribution of this work is to develop a novel data-driven model that can predict the values of RUL based on counter-propagation network (CPN) principles [
42]. This approach was chosen for its robustness in processing large amounts of multivariate data, even when contaminated with noise and outliers. In addition, CPN is known for its effective learning ability and fast convergence. Because of these properties, CPN has been used to solve real-world problems with intricate data structures, including the mapping and interpretation of infrared spectra of compounds [
53], inferring how molecules behave in acidic or basic environments [
54], phylogenetic classification of ribosomal RNA [
55], and structural analysis and design [
44].
The CPN framework was introduced by Hecht-Nielsen as a hybrid artificial neural network that seamlessly integrates supervised and unsupervised learning strategies into a single architecture. In the unsupervised learning phase, the self-organising map (SOM) [
43] is used to encode the high-dimensional input data into a low-dimensional space that preserves its topological order, while in the supervised learning phase, the Grossberg network is used to associate the low-dimensional representations generated by the SOM with a set of target outcomes. CPNs can be constructed in two main configurations: a full CPN and a forward-only CPN. A full CPN consists of two input layers, a SOM map, and two Grossberg layers. The two input layers are designed to receive a set of observations and the corresponding target outputs, while the two Grossberg layers are responsible for generating the best possible approximations of these inputs. The SOM layer acts as a mediator that facilitates the transformation between the output and input spaces. In contrast, in a forward-only CPN, the readings are received by the SOM layer and an approximation is generated by the Grossberg layer. This makes the full CPN suitable for bidirectional function approximation and the forward-only CPN suitable for unidirectional function approximation. Considering that the main objective of the proposed model is to map the set of observations into RUL values, the suitability of the forward-only CPN for this purpose becomes clear. However, since the original forward-only CPN architecture was designed for processing non-sequential datasets, a new form of this architecture is proposed here. In our proposal, the recursive SOM [
50] is combined with the growing hierarchical SOM [
51] architecture to form a novel unsupervised learning model, which we refer to here as recursive growing hierarchical SOM (ReGHSOM), which effectively processes RUL data. To illustrate the proposed model,
Figure 1 shows a high-level abstraction of the different components of this model. As shown in the figure, the multivariate time series of sensor readings are fed into the unsupervised layer (ReGHSOM), which clusters them hierarchically to reflect the different granularities of the dataset. The centre of each cluster, shown in colour (also known as the best matching unit), is connected to the supervised layer (Grossberg), from which the predicted RUL values are generated. The rest of this section is organised as follows:
Section 3.1 provides a formal description of the RUL prediction and the underlying assumptions used to develop the model.
Section 3.2 and
Section 3.3 provide a detailed description for the ReGHSOM and Grossberg layer, respectively.
3.1. Problem Formulation and Underlying Assumptions
This study considers a collection of multivariate time series, denoted by $\mathcal{X}$, representing measurements taken from different parts of several assets and the conditions under which these assets operate; the term observations is used here to refer to these measurements and operating conditions collectively. The dimensionality of $\mathcal{X}$ is defined as $m \times \sum_{e=1}^{|E|} c_e$, where $m$ is the width of an observation, $E$ is a set containing the identifiers of all assets accommodated in $\mathcal{X}$, $|\cdot|$ is the cardinality symbol, and $c_e$ is the number of cycles over which the observations related to an arbitrary asset $e$ are monitored. For the sake of generality, it is assumed that the monitoring cycles of different assets are not necessarily congruent, i.e., $c_{e_i} \neq c_{e_j}$ in general. Furthermore, let $\mathcal{X}_e$ be the subset of $\mathcal{X}$ that contains all observations related to the asset $e$; this set can be written as $\mathcal{X}_e = \{\mathbf{x}_e(1), \ldots, \mathbf{x}_e(c_e)\}$. Let $R$ be the set of length $|E|$ containing the RUL values of all assets. Based on the above, the goal of a data-driven model is to find a function $f$ that accepts the set of observations as input and produces a vector of values, denoted here by $\widehat{R}$, that is as close as possible to the real RUL values; hence, $f$ can be expressed as $f: \mathcal{X} \rightarrow \widehat{R}$.
A data-driven model aims to derive $f$ by applying a learning strategy to a collection of learnable computing units constructed according to a particular hypothesis space. Although there are no golden rules that can be followed in defining the hypothesis space or the learning strategy, we briefly state the underlying assumptions used in developing the proposed model. First, the model should use computing units that can recognise the temporal structure embedded in the observations, since most of the observations provided by RUL datasets are time series. Second, the model should adopt a versatile hypothesis space that can be easily adapted to the structure and complexity of the dataset.
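To make the notation above concrete, the following sketch shows one possible in-memory layout of the observation collection $\mathcal{X}$, the asset identifiers $E$, the per-asset cycle counts $c_e$, and the sought mapping $f$. The asset names, array shapes, and placeholder predictor are purely illustrative and are not part of the proposed model.

```python
import numpy as np

# Illustrative layout of the observation collection: each asset e in E owns a
# multivariate time series of c_e cycles, and the cycle counts need not be
# congruent across assets.
rng = np.random.default_rng(0)
m = 24                                                    # width of one observation
E = ["asset_1", "asset_2", "asset_3"]                     # asset identifiers
c = {"asset_1": 192, "asset_2": 287, "asset_3": 179}      # monitoring cycles c_e

X = {e: rng.normal(size=(c[e], m)) for e in E}            # X_e, shape (c_e, m)
R = {"asset_1": 31.0, "asset_2": 12.0, "asset_3": 57.0}   # true RUL values (toy numbers)

def f(series: np.ndarray) -> float:
    """Placeholder for the sought mapping f: X -> R-hat (not the proposed model)."""
    return float(series.shape[0])

R_hat = {e: f(X[e]) for e in E}
print(R_hat)
```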
3.2. ReGHSOM Unsupervised Layer
The self-organising map (SOM) is a type of connectionist system introduced by Kohonen in 1982 [
43] and is, therefore, also referred to as the Kohonen map in some references. SOM was inspired by the mechanism by which cortical maps evolve automatically during growth. Indeed, several research studies on neuronal information processing have shown that interactions between cortex cells in response to a given stimulus are dominated by their lateral spacing. In this process, cells that are better able to interpret a stimulus increase their activation by emitting excitatory signals to their neighbours, while keeping distant cells in suspension by sending inhibitory signals. These interactions lead to self-organisation of cortical maps in a topographically meaningful order.
SOM resembles the self-organising phenomena of the brain described above in that it combines the competitive learning approach [
56,
57] with a spatiotemporal function called the neighbourhood function. In its simplest form, a SOM consists of several artificial neurons arranged in a two-dimensional lattice. Each neuron in the map is connected to all neurons in the input layer by a weighting vector, referred to here as the receptive weighting vector (it is also called the codebook vector or prototype), whose dimension is set according to the number of neurons in the input layer. In addition, each neuron in the SOM map is connected to the other neurons in the same layer by either an excitatory or an inhibitory weighting, depending on their lateral distance. During the training phase, all weighting vectors are randomly initialised, after which an instance of the observation dataset is presented to the SOM. Each neuron in the SOM map then applies a radial (distance-based) function to calculate the extent to which its receptive weighting vector matches the presented instance. The neuron with the best match is then nominated, and this begins the weighting update process, in which the receptive weighting vectors of the unit with the best match and its neighbours are moved closer to the given readings, while the vectors of the other neurons remain unaffected. At the end of the training phase, the SOM should be able to transform nonlinear statistical relationships embedded in high-dimensional observations into a simpler geometric representation that can preserve their topological order.
However, the lack of effective mechanisms by which the standard SOMs can incorporate temporal dependencies into their clustering formation, as well as their rigid topologies, stand in the way of straightforward application of SOMs to RUL prediction. Some works have focused on improving SOM capabilities for processing sequential datasets (e.g., temporal SOM, hypermap, recurrent SOM, and recursive SOM) [
58], while others have concentrated on extending the SOM topology according to the nature of the dataset under consideration (e.g., growing SOM [
59] and growing hierarchical SOM [
51]). This work aims to enhance the capabilities of the SOM from both perspectives by combining ReSOM with GHSOM. The underlying approach on which ReSOM was developed is to allow the classical SOM to learn from its past activities by feeding it with a time-lagged copy of its own activity as an additional input. Therefore, at each time instant, the neurons of a ReSOM receive two inputs: the first is a feedforward input representing the instance of the training dataset corresponding to that time point, and the second is the activity of the SOM generated at the preceding time step. These two inputs are concatenated and then fed into a classical SOM map. This, in turn, allows the ReSOM to follow the same procedures and mechanisms of the classical SOM map, including learning rules, weight updates, and the neighbourhood function.
The principle of the GHSOM, on the other hand, is to construct a SOM map that can grow dynamically in accordance with the dataset that the map encounters at runtime. Such growth can occur vertically by adding new maps to the existing structure and horizontally by adding new neurons to the same map. This process continues until a suitable SOM topology emerges that can effectively represent the different patterns exhibited by the datasets and their relationships to each other.
To explain how the integration of ReSOM and GHSOM works, assume that at a time instant $t$ there is a SOM map with a number of neurons at level $l$; here, $\mathcal{N}_l$ is used to refer to the set that accommodates these neurons. The feedforward and feedback weight vectors of any neuron $i$ at time $t$, i.e., $i \in \mathcal{N}_l$, are denoted by $\mathbf{w}^{x}_{i}(t)$ and $\mathbf{w}^{y}_{i}(t)$, respectively, where $\mathbf{w}^{x}_{i} \in \mathbb{R}^{m}$ and $\mathbf{w}^{y}_{i} \in \mathbb{R}^{|\mathcal{N}_l|}$. At this point, the map is presented with a realisation of the input, denoted $\mathbf{x}(t)$, and each neuron calculates its distance with respect to $\mathbf{x}(t)$ as:
$$d_i(t) = \alpha \, \lVert \mathbf{x}(t) - \mathbf{w}^{x}_{i}(t) \rVert^{2} + \beta \, \lVert \mathbf{y}(t-1) - \mathbf{w}^{y}_{i}(t) \rVert^{2} \quad (1)$$
where $\alpha$ and $\beta$ are hyperparameters used to control the extent to which the historical information is involved in computing the distance, whereas $\mathbf{y}(t-1)$ is the vector of outputs generated by the map at the preceding time step, whose $i$-th element (the output of neuron $i$) is computed as:
$$y_i(t-1) = \exp\!\big(-d_i(t-1)\big) \quad (2)$$
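A minimal sketch of Equations (1) and (2) is given below, assuming the feedforward and feedback weights of one map are stored as NumPy matrices; the function name and the default values of $\alpha$ and $\beta$ are illustrative choices, not values taken from the model.

```python
import numpy as np

def resom_distances(x_t, y_prev, Wx, Wy, alpha=1.0, beta=0.5):
    """Eqs. (1)-(2): distance and activity of every neuron on one map.

    x_t:    current observation, shape (m,)
    y_prev: map activity at the previous time step, shape (n,)
    Wx, Wy: feedforward (n, m) and feedback (n, n) weight matrices
    """
    d = alpha * np.sum((x_t - Wx) ** 2, axis=1) \
        + beta * np.sum((y_prev - Wy) ** 2, axis=1)   # Eq. (1)
    y_t = np.exp(-d)                                   # Eq. (2)
    return d, y_t

# Toy usage: a 3x3 map (n = 9 neurons) receiving 4-dimensional observations
rng = np.random.default_rng(1)
Wx, Wy = rng.random((9, 4)), rng.random((9, 9))
d, y = resom_distances(rng.random(4), np.zeros(9), Wx, Wy)
print(d.argmin(), y.round(3))
```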
The neuron with the minimum distance to the given input at time instance $t$ is nominated as the best matching unit (BMU), i.e., $b(t) = \arg\min_{i \in \mathcal{N}_l} d_i(t)$, and then the weight vectors are updated according to:
$$\mathbf{w}^{x}_{i}(t+1) = \mathbf{w}^{x}_{i}(t) + \eta(t)\, h_{b,i}(t)\, \big(\mathbf{x}(t) - \mathbf{w}^{x}_{i}(t)\big) \quad (3)$$
$$\mathbf{w}^{y}_{i}(t+1) = \mathbf{w}^{y}_{i}(t) + \eta(t)\, h_{b,i}(t)\, \big(\mathbf{y}(t-1) - \mathbf{w}^{y}_{i}(t)\big) \quad (4)$$
where $\eta(t)$ is the learning rate (the rate at which the learning process is paced), which is typically defined as $\eta(t) = \eta_0 \exp(-t/\tau)$; $\eta_0$ is the initial value of the learning rate, usually $0 < \eta_0 < 1$; and $\tau$ is the time constant. $h_{b,i}(t)$ is the neighbourhood function, which is defined as $h_{b,i}(t) = \exp\!\big(-d^{2}_{b,i}/(2\sigma^{2}(t))\big)$, where $d_{b,i}$ is the lateral distance between the neuron $i$ and the BMU $b$, and $\sigma(t)$ is the effective width of the topological neighbourhood, which is defined as $\sigma(t) = \sigma_0 \exp(-t/\tau)$; here, $\sigma_0$ is the initial value of the effective width and again $\tau$ is the time constant. Depending on the representation power that the neurons of the level-$l$ map provide, the training can be conducted for one or more epochs, and at the end of them each neuron computes its mean quantisation error (MQE) as:
$$\mathrm{mqe}_i = \frac{1}{|C_i|} \sum_{\mathbf{x} \in C_i} \lVert \mathbf{w}^{x}_{i} - \mathbf{x} \rVert \quad (5)$$
where $C_i$ is the subset of the dataset represented by neuron $i$, i.e., the data points whose BMU is $i$. It is worth noting that we define $\mathrm{mqe}_i$ in terms of the feedforward weight vector without considering the feedback weight vector. This definition is justified by the fact that the feedforward weight vector connects the neuron to the input space and is solely responsible for representing the data points.
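The following sketch ties Equations (1)–(5) together into a single training pass over one sequence. The square-lattice coordinates, the shared decay schedules for the learning rate and neighbourhood width, and the default hyperparameter values are assumptions made for illustration rather than settings taken from the proposed model.

```python
import numpy as np

def train_level(X_seq, Wx, Wy, grid, eta0=0.5, sigma0=2.0, tau=1000.0,
                alpha=1.0, beta=0.5):
    """One training pass over a sequence of observations, following Eqs. (1)-(5).

    X_seq: iterable of observations, each of shape (m,)
    Wx:    feedforward weights, shape (n, m); Wy: feedback weights, shape (n, n)
    grid:  (n, 2) lateral coordinates of the neurons on the map lattice
    Returns updated weights, per-neuron mean quantisation errors, and BMU hit counts.
    """
    n = Wx.shape[0]
    y_prev = np.zeros(n)                                   # map activity at t-1
    hits, qerr = np.zeros(n), np.zeros(n)
    for t, x_t in enumerate(X_seq):
        d = alpha * np.sum((x_t - Wx) ** 2, axis=1) \
            + beta * np.sum((y_prev - Wy) ** 2, axis=1)    # Eq. (1)
        y_t = np.exp(-d)                                   # Eq. (2)
        b = int(np.argmin(d))                              # best matching unit
        eta = eta0 * np.exp(-t / tau)                      # decaying learning rate
        sigma = sigma0 * np.exp(-t / tau)                  # decaying neighbourhood width
        lat = np.sum((grid - grid[b]) ** 2, axis=1)        # squared lateral distances to BMU
        h = np.exp(-lat / (2.0 * sigma ** 2))              # neighbourhood function
        Wx += eta * h[:, None] * (x_t - Wx)                # Eq. (3)
        Wy += eta * h[:, None] * (y_prev - Wy)             # Eq. (4)
        hits[b] += 1
        qerr[b] += np.linalg.norm(Wx[b] - x_t)
        y_prev = y_t
    mqe = np.divide(qerr, hits, out=np.zeros(n), where=hits > 0)   # Eq. (5)
    return Wx, Wy, mqe, hits
```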
Following the computation of $\mathrm{mqe}_i$ for all neurons, the MQE of the entire map at level $l$ is computed as the mean of the $\mathrm{mqe}$ values of all BMU neurons in $\mathcal{N}_l$, i.e., $\mathrm{MQE}_l = \frac{1}{|B_l|} \sum_{i \in B_l} \mathrm{mqe}_i$, where $B_l$ is the subset of $\mathcal{N}_l$ containing all the BMUs. Once these computations are performed, a decision has to be made as to whether there is a need to add more neurons to the same level or to add new layers. Such a decision is made by comparing $\mathrm{MQE}_l$ with the $\mathrm{mqe}$ of its parent, i.e., the neuron $p$ in the upper level $l-1$ from which this level emerged:
$$\mathrm{MQE}_l < \tau_1 \cdot \mathrm{mqe}_p \quad (6)$$
If the condition of Equation (6) evaluates as false, it means that the current map cannot represent the dataset at the desired level of granularity and, therefore, the process of horizontal growth must be initiated. This process starts by selecting the neuron with the maximum $\mathrm{mqe}$ value in level $l$, i.e., $a = \arg\max_{i \in \mathcal{N}_l} \mathrm{mqe}_i$, and the furthest neighbour within its receptive field in terms of the feedforward weight vector, denoted by $a'$. A new set of neurons is then inserted between $a$ and $a'$. The new map architecture is then trained and evaluated against the condition given in Equation (6). Once this condition is met, the horizontal growth is terminated, and the vertical growth process begins. The main goal of this process is to determine whether or not each neuron in the current map is placed at the correct level. This determination is made by comparing the $\mathrm{mqe}$ of every neuron with the $\mathrm{mqe}$ of the single neuron at level 0, i.e., $\mathrm{mqe}_0$, using Equation (7). If a neuron does not meet this condition, a new map is spawned from it at the next level.
$$\mathrm{mqe}_i < \tau_2 \cdot \mathrm{mqe}_0 \quad (7)$$
where Equation (7) is evaluated for the neurons of all maps across the horizontal and vertical levels, and $\tau_1$ and $\tau_2$ are the hyperparameters of the model whose values are set to 0.05 and 1.0, respectively.
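A compact sketch of the growth logic of Equations (6) and (7) is shown below; it operates on the per-neuron $\mathrm{mqe}$ values and BMU hit counts produced by the training pass sketched earlier. The function name and return convention are ours, while the default thresholds follow the values stated in the text.

```python
import numpy as np

def growth_decisions(mqe, hits, mqe_parent, mqe_0, tau1=0.05, tau2=1.0):
    """GHSOM-style growth checks corresponding to Eqs. (6) and (7).

    mqe, hits:  per-neuron mean quantisation errors and BMU hit counts (arrays)
    mqe_parent: mqe of the parent neuron one level up
    mqe_0:      mqe of the single neuron at level 0
    Returns a flag for horizontal growth and the indices of BMU neurons
    that should spawn a child map at the next level.
    """
    bmus = hits > 0
    map_mqe = mqe[bmus].mean() if bmus.any() else 0.0        # MQE_l over BMUs only
    grow_horizontally = not (map_mqe < tau1 * mqe_parent)    # Eq. (6) violated -> add neurons
    expand = np.where(bmus & ~(mqe < tau2 * mqe_0))[0]       # Eq. (7) violated -> new child map
    return grow_horizontally, expand
```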
3.3. Grossberg Supervised Layer
The output layer, as defined in the original CPN architecture, is a single layer with one or more artificial neurons, each of which is fully connected to all neurons in the SOM layer. Although this makes the layer look like an MLP architecture, it differs significantly from that architecture both in the way the weighting connections are updated and in the strategy used to perform the learning. In this layer, the actual target values (ground truth) are used to perform the learning process, whereas in a traditional MLP network, the magnitude of the deviation of the target value from the predicted value (i.e., the prediction error) is used instead. Using the actual values not only speeds up the convergence of the model but also reduces the possibility of becoming trapped in local minima, which typically occurs when the error is too small to be captured by the learning rate. Furthermore, the output of this layer uses the Grossberg learning rule, where the new value of a weight is calculated from the current weight, the ground truth, and the output of the SOM layer, without the need for complicated mathematical operations (i.e., gradients, as in the MLP architecture). The main advantage of the Grossberg learning rule, besides its low computational cost, is its high level of robustness against data deviations. More specifically, adjusting the weights of the neurons in this layer in accordance with all the fired/triggered SOM neurons facilitates the retention of the valuable information related to the various clusters derived from the unsupervised learning strategy in the mapping space of each neuron. It is worth noting that the artificial neurons of the CPN output layer do not contain an activation function, as is the case with their counterparts in MLP networks. As a result, the CPN output layer avoids the limitations associated with selecting an inappropriate activation function, such as output space constraints, potential bias shifts, and lack of smoothness.
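To illustrate the learning rule described above, the following sketch updates the output weights directly towards the ground-truth RUL in proportion to the activations of the unsupervised layer, and forms the prediction as an activation-weighted combination of those weights. The exact weighting of non-BMU neurons, the learning-rate value, and the prediction rule are assumptions made for illustration rather than the paper's precise formulation.

```python
import numpy as np

def grossberg_update(w_out, y_som, rul_true, lr=0.1):
    """Move the output weights directly towards the ground-truth RUL,
    weighted by the activation of each unsupervised-layer neuron (no gradients)."""
    return w_out + lr * y_som * (rul_true - w_out)

def grossberg_predict(w_out, y_som):
    """Predicted RUL as an activation-weighted combination of the output weights."""
    return float(np.dot(y_som, w_out) / (np.sum(y_som) + 1e-12))

# Toy usage: 9 unsupervised neurons, one scalar RUL output
rng = np.random.default_rng(2)
w = np.full(9, 100.0)                       # initial output weights
y = rng.random(9)                           # activations from the ReGHSOM layer
w = grossberg_update(w, y, rul_true=85.0)   # one supervised step using the ground truth
print(round(grossberg_predict(w, y), 2))
```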