1. Introduction
Epilepsy is one of the most prevalent neurological disorders, affecting tens of millions of individuals worldwide. Patients diagnosed with epilepsy experience seizures, which are the primary clinical manifestation of the condition. A diagnosis of epilepsy is established after a patient experiences two or more unprovoked seizures, separated by at least 24 h [1]. Seizures vary in type, duration, and severity, and regardless of the specific characteristics, they represent a distressing and, at times, painful experience for the patient. These seizures may range from brief lapses of consciousness to severe muscular spasms, and in some instances, they pose significant risks. Patients may sustain injuries from falls or while handling hazardous objects or machinery. The abnormal neuronal activity (reaching up to 500 discharges per second) associated with seizures can also lead to brain cell damage, particularly during prolonged seizures or those occurring in rapid succession, a condition known as status epilepticus [2].
Numerous studies emphasize the importance of implementing automated methods for the detection of epileptic activity, as well as for the diagnosis and prediction of epileptic seizures [3]. Such methods can help minimize human errors by medical professionals, particularly those caused by fatigue after long periods of analyzing subtle differences in numerous recorded MEG images. This is crucial for preventing epileptic seizures and enables continuous machine-based monitoring of patients in critical condition.
While several advanced models have been proposed for EEG classification [3], fewer publications address the classification of MEG data, with most utilizing advanced artificial neural networks (ANNs). However, none of these studies have conducted a comprehensive examination of basic ANN models.
Gao et al. proposed an automatic feature-extraction and classification method for EEG spectra using deep convolutional neural networks, integrating three DCNNs (Inception-ResNet-V2, Inception V3, and ResNet152) and achieving an average classification accuracy of over 90% [4].
Gomez et al. in their work [5] tested a deep learning strategy derived from robust methods for object recognition tasks in the computer vision field on the EPILEPSIAE and CHB-MIT Scalp EEG Signal datasets, achieving global accuracy and specificity levels of 92.9 ± 21.8% and 93.1 ± 21.9%, respectively.
Ilias et al. experimented with pretrained models including AlexNet, DenseNet201, EfficientNet, ResNet18, etc. They also introduced a multimodal two-branch CNN which can extract low- and high-frequency features from EEG signals. This model obtains a performance comparable to that of the state-of-the-art approaches, with 97% accuracy. The dataset used was the publicly available EEG dataset of the University of Bonn, consisting of five subsets, each containing 100 single-channel EEG segments of 23.6 s duration [6].
In [7], Meenakshi Sood extracted features from several wavelet transforms of EEG signals and fed them into different ANNs, achieving a best accuracy of 99.4% with a model called the NN-D ensemble classification method, also using the dataset of the University of Bonn.
Sandeep et al. applied a radial basis function neural network (RBFNN) that uses a modified particle swarm optimization (PSO) algorithm to optimize the mean square error [8]. This method produced a maximum accuracy of 99%, tested on two publicly available datasets: an EEG dataset for epileptic seizure identification and an EEG dataset for eye state prediction.
Aoe et al. developed MNet, a novel deep convolutional neural network to classify multiple neurological diseases using resting-state magnetoencephalography (MEG) signals. The dataset consisted of MEG signals of 67 healthy subjects, 26 patients with spinal cord injuries, and 140 patients with epilepsy and was used to train and test the network using 10-fold cross-validation. The model performed with an accuracy of 70.7 ± 10.6% in classifying the healthy subjects and those with the two neurological diseases. The specificity of classification for each disease ranged from 86 to 94% [9].
Alotaiby et al. explored the use of eight statistical features and Genetic Programming (GP) with the K-nearest neighbor (KNN) algorithm for interictal spike detection, achieving a 91.75% average sensitivity and 92.99% average specificity [10].
Raghu et al. attempted to classify seven variants of seizures along with non-seizure EEG through the application of convolutional neural networks (CNNs) and transfer learning. Several pretrained networks were fed with spectrograms of EEG signals acquired from the Temple University Hospital EEG corpus. The highest classification accuracies of 82.85% (using GoogLeNet) and 88.30% (using Inception V3) were achieved using the transfer learning and extracted image features approaches, respectively [11].
Hussein et al. developed a novel convolution module named “semi-dilated convolution” that better exploits the geometry of wavelet scalograms and non-square-shaped images. They then proposed a neural network architecture named “semi-dilated convolutional network (SDCN)” that uses the aforementioned module to expand the receptive field along the long dimension (image width) while maintaining high resolution along the short dimension (image height), resulting in an average seizure prediction sensitivity of 98.90% for scalp EEG and 88.45–89.52% for invasive EEG [12].
Hirano et al. created FAMED, a fully automated AI-based system for MEG interictal epileptiform discharge identification and ECD estimation, which achieved a mean Area Under Curve (AUC) of 0.9868 (10-fold patient-wise cross-validation), while the sensitivity and specificity were 0.7952 and 0.9971, respectively [13].
Zhu Liang et al. introduced a hybrid model combining generative and discriminative approaches for decoding visual information from MEG data across different subjects. They employed stacked generalization to enhance the decoding performance, demonstrating its effectiveness in handling inter-subject variability [14].
Henson et al. explored the combination of MEG and MRI data to classify individuals with Mild Cognitive Impairment (MCI). The authors utilized a form of stacked generalization, referred to as “late combination,” where predictions from modality-specific classifiers are combined to improve the overall classification accuracy [15].
Olivetti et al. proposed, among other techniques, the use of ensemble learning, and specifically of stacked generalization, to address the variability across subjects within the training data, with the aim of producing more stable classifiers [16].
The review of the aforementioned literature led us to several conclusions:
1. The number of studies examining MEG signals is significantly smaller than the corresponding number for EEG, especially for epileptic signals.
2. There is a focus on investigating advanced classification models, while references to and evaluations of the effectiveness of simpler, fundamental classification models are notably scarce.
3. Due to the absence of comparative studies between advanced and basic models, it remains unclear what performance gains are achieved by their application relative to the cost (computational and implementation cost).
Additionally, MEG as a method is relatively new and quite expensive, with limited clinical application. As a result, MEG signal samples, particularly those concerning specific categories of diseases such as epilepsy, are quite rare. Neurologists are still not familiar with recognizing MEG signals and tend to avoid incorporating them into clinical practice.
All the above served as the motivations and simultaneously defined the objectives and direction of our work, which are the following:
1. Evaluation of basic artificial neural network (ANN) models in the classification of epileptic signals.
2. Comparison of the basic models with some slightly more advanced ones to clarify the performance gain and whether it justifies the additional effort and cost.
3. Testing a technique of slight optimization of ANNs based on the use of pretrained models.
Our aim was not to contribute to theoretical induction, but to assist in the creation of an effective MEG classification method that would have both educational and clinical utility, as it could aid in the learning process of MEG signal recognition by specialized neurologists.
After conducting a preliminary study testing and comparing various basic ANN models to build a solid understanding of the characteristics of our data and gain insights into model behavior, we proceeded to test a 1D-CNN in comparison to an AFFNN and to test a lightweight optimization method based on variations of stacked generalization, namely concatenated stacking (various interconnections of pretrained and untrained models) [17,18]. The results of our study are very encouraging and can constitute a basis for future research on epileptic seizure recognition, prediction, and prevention by means of Genetic Algorithms and Recursive ANNs.
2. Material and Methods
Magnetoencephalography is a neuroimaging technique that utilizes an array of sensors placed slightly above the scalp. The use of superconducting quantum interference devices (SQUIDs), which can detect and record biomagnetic fields of the order of 1 fT (10⁻¹⁵ T), makes MEG very sensitive to the microscopic alterations of the magnetic field produced by neural electrical activity. Thus, it achieves a very good spatial resolution (up to 5 mm), as well as a great time resolution at the scale of one millisecond or even better, which makes MEG a great tool for tracing real-time changes in brain activity and state. It can be used in combination with other imaging techniques (MRI, fMRI, PET, PET-CT) to give a detailed 3D imaging of brain activity in specific areas. It is non-invasive and completely safe, causing no discomfort. Moreover, it can detect epileptic activity and spot epileptic foci with the patient resting in the interictal state, without inducing unpleasant and even painful seizures through the intermittent visual and auditory stimuli that are routine during EEG recording [19].
Epileptic activity appears in EEG and MEG as irregular patterns in the form of spikes, spikes-and-slow waves, or sharp waves. The morphology of spikes and sharp waves in EEG was thoroughly analyzed by Gotman, and these waves can be used for epilepsy diagnosis [20]. Although studies are being carried out [21], there is still no formal definition of epileptic spikes in MEG. However, even if it seems an oxymoron, in comparison to EEG signals, “MEG spike yield and localization is superior compared with EEG” [22]. Epileptic signals in MEG have different morphological characteristics (duration, shape, and sharpness) from those in EEG. This can be explained by the small effect of interference from the skull and scalp on the MEG signal. Furthermore, muscular activity and eye movement have much less effect on MEG [23].
Figure 1 depicts two segments of MEG signals used in the present study; segment (a) is classified as non-epileptic and segment (b) as epileptic. For each segment, we show the signal plot, the spectrum bar graph, and the heatmap of the corresponding channel’s Short-Time Fast Fourier Transform (ST-FFT) (approximately 9 s or 2100 samples). Panel (c) shows the spectrum bar graph and the ST-FFT heatmap of the same non-epileptic MEG segment as (a), with different axis scaling to better depict the signal’s details. The signal segments were deliberately selected to exhibit significant differences, ensuring that these variations are clearly represented in the plots. It is important to note that these segments do not represent the average pair of epileptic and non-epileptic signals.
It is important to note that significantly less research has been conducted applying deep learning models to MEG data compared to EEG data. The primary reason for this disparity is the high cost associated with SQUID technology: the device itself is expensive, and the installation expenses are even more substantial. Ideally, the installation requires a Faraday cage to ensure complete electromagnetic isolation for the highly sensitive SQUID sensors, along with acoustic and anti-vibration isolation. These requirements can raise the total cost to several million euros or USD.
The MEG signals used in this study were recorded in the MEG Unit of the Laboratory of Medical Physics, Department of Medicine, Democritus University of Thrace, located in Alexandroupolis, Greece, using a Neuromag-122™ system (Elekta Neuromag Ltd., Finland) based on SQUID technology. MEG recordings were obtained from 5 epileptic patients, 3 male and 2 female, aged 23–40 years (mean age 33.5 years). All 5 patients were diagnosed by specialized neurologists with pharmacoresistant Idiopathic Generalized Epilepsy, according to the International League Against Epilepsy (ILAE) classification. The diagnoses were based on clinical manifestations, electroencephalography findings, and, when necessary, brain MRI scans. MEG signals were recorded while the patients were in a resting state with their eyes closed.
In this study, we analyzed MEG signals recorded from 122 points on the patients’ brains, with a sampling frequency of 256 Hz and a duration of 9 s. A low-pass filter with a cutoff frequency of 30 Hz was applied to all channels. In a few instances, due to sensor malfunctions, six (6) channels (6 × 9 = 54 segments) contained values outside the acceptable range and were subsequently excluded from analysis. The remaining signals were segmented into [(5 × 122) − 6] × 9 = 5436 items (signal segments), each consisting of 1 channel with a duration of 1 s, resulting in 256 samples per segment. Each segment was independently classified by specialized neurologists as either containing epileptic activity (2059 segments) or not (3377 segments). Despite the absence of data from healthy patients, our dataset includes a sufficient number of non-epileptic segments (62.1%). The data were randomly divided into three sets: 80% for the training set, 10% for the test set, and 10% for the validation set.
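As a rough illustration of this segmentation step, the following MATLAB sketch (with hypothetical variable names such as recordings and badChannels, not the authors’ actual script) splits each 9 s, 122-channel recording into 1 s single-channel segments:

```matlab
% Minimal sketch, assuming recordings{p} holds a 122 x 2304 matrix per patient
% and badChannels is an N x 2 list of [patient channel] pairs to exclude.
fs = 256;                                 % sampling frequency (Hz)
segLen = 1 * fs;                          % 256 samples per 1 s segment
X = [];                                   % columns: segments; rows: samples
for p = 1:numel(recordings)
    R = recordings{p};
    for ch = 1:size(R, 1)
        if ismember([p ch], badChannels, 'rows'), continue; end  % faulty sensor
        segs = reshape(R(ch, :), segLen, []);   % 256 samples x 9 segments
        X = [X, segs];                    %#ok<AGROW>  X: 256 x totalSegments
    end
end
```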
To statistically support our results, we performed within-subject statistical tests on the spectra of the measured signals. For each one of the 31 frequency components (0–30 Hz), we performed a t-test on the null hypothesis that the power of the frequency band in the segments showing epileptic activity comes from the same distribution as that of the segments that do not show epileptic activity. In total, 31 × 5 tests were performed, and the null hypothesis was rejected with very small p-values (mean p-value: 1.38 × 10⁻²) for all but 6 tests (Subject 1—Frequency 5 Hz; Subject 2—Frequency 7 Hz; Subject 5—Frequencies 8, 9, 11, 27 Hz) [24].
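A minimal sketch of these per-subject, per-frequency tests, assuming a powerSpec matrix (segments × 31 frequency components) with accompanying subject and labels vectors (all names hypothetical), might look as follows:

```matlab
% Minimal sketch: two-sample t-tests per subject and frequency component
% on the spectral power of epileptic vs. non-epileptic segments.
p = zeros(5, 31);
for s = 1:5
    for f = 1:31                          % frequency components 0-30 Hz
        epi = powerSpec(labels == 1 & subject == s, f);
        non = powerSpec(labels == 0 & subject == s, f);
        [~, p(s, f)] = ttest2(epi, non);  % H0: same distribution
    end
end
```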
Furthermore, the correlation across the signal channels was measured, showing that the degree of correlation is rather small (mean correlation coefficient: 0.066). The number of channel pairs having correlation above 0.5 is 1105 (0.6%), and the number of channel pairs having correlation above 0.3 is 3427 (1.88%). This can be explained by the fact that MEG, in contrast to EEG, is capable of recording the brain signal in great detail, at mm scale, with minimum interference, since the recording is performed in a Faraday cage [19].
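For reference, the channel-pair correlations can be computed with a sketch like the following, where R is an assumed channels-by-samples matrix for one recording:

```matlab
% Minimal sketch: correlation between all channel pairs of one recording.
C = corrcoef(R.');                        % channel-by-channel correlation matrix
mask = triu(true(size(C)), 1);            % keep each unique pair once
fprintf('mean r = %.3f; pairs with r > 0.5: %d\n', ...
        mean(C(mask)), nnz(C(mask) > 0.5));
```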
Two additional points must be stressed. First, due to its high sensitivity, MEG is capable of detecting epileptiform activity even if the patient is in the interictal state. As pointed out even by early MEG studies in epilepsy, this kind of activity is persistent even if the patient is not experiencing an epileptic seizure. In addition, epileptiform activity is not only present in the interictal state but also manifests spatiotemporal stationarity [25]. The second point is that the MEG signals were recorded using second-order SQUID sensors that avoid capturing overlapping magnetic activity emitted by nearby brain locations [26,27].
However, even if there were significant correlation between many neighboring channels, this would not diminish the significance of our experiment. There is a need to move towards a device capable of collecting brain signals from a single patient and deciding whether the patient is showing signs of epileptic brain activity that may lead to a seizure.
Our models were built and tested on a Hewlett-Packard Elite 800 G9 (HP Development Company, Palo Alto, CA, USA) machine with an Intel i5-10500 3.1 GHz 6-core processor (Intel Corporation, Santa Clara, CA, USA) and 16 GB of RAM, using MATLAB R2018a (9.4.0.813654) as the programming environment, which was provided by the Laboratory of Applied Mathematics at the Hellenic Open University.
In this study, we employed several artificial neural network (ANN) models to classify the collected MEG segments. We began by evaluating simple feed-forward neural networks (FFNNs), one of the most basic ANN architectures. These models consist of multiple fully connected layers of artificial neurons. The input MEG segment, comprising 256 samples, is fed into all 256 neurons of the input layer. Each neuron transforms its input by applying the function described in Equation (1):

y = f(Σᵢ wᵢ·xᵢ + b),    (1)

where xᵢ are the inputs of the neuron, wᵢ are the weights, b is the bias applied to the neuron, and f is a nonlinear activation function. In our study, we used the Hyperbolic Tangent Sigmoid Transfer Function in the hidden layers and the Linear Transfer Function for the output layer. The outputs of all the neurons of one layer serve as the inputs to all the neurons of the next layer. The parameters wᵢ and b are adjusted through the procedure called training, whose goal is to minimize the difference between the network’s actual output and the desired output [28]. In this study, we used the Levenberg–Marquardt training algorithm.
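A minimal MATLAB sketch of such an FFNN, assuming Xtrain (a 256 × N matrix of raw segment samples) and Ttrain (a 1 × N label vector), both hypothetical names, would be:

```matlab
% Minimal sketch, not the exact experimental script: a 5-hidden-layer FFNN
% with tansig hidden layers, purelin output, and Levenberg-Marquardt training.
net = feedforwardnet([64 32 32 32 16], 'trainlm');
for k = 1:5
    net.layers{k}.transferFcn = 'tansig';    % hyperbolic tangent sigmoid
end
net.layers{end}.transferFcn = 'purelin';     % linear output layer
net = train(net, Xtrain, Ttrain);            % Xtrain: 256 x N, Ttrain: 1 x N
```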
Convolutional NNs, on the other hand, use a series of convolution, pooling, and fully connected layers in order to extract feature hierarchies, beginning from low-level and moving towards higher-level patterns. The role of the convolution layer is fundamental: an optimized feature extractor, called the kernel, is applied to each position of the signal. Every layer’s output is the input of the next layer, so the extracted features may progressively become more complex. The parameters of the kernels are optimized through training performed by means of the backpropagation algorithm and gradient descent [29].
CNNs exploit local spatial correlations by processing small receptive fields rather than treating all input features as independent. The computational complexity per layer of CNNs is O(K² · C · F), where K is the filter size, C is the number of channels, and F is the number of filters. For FFNNs, the computational complexity per layer is O(N · M), where N is the number of input neurons and M is the number of output neurons. Since K² ≪ N, CNN layers tend to require far fewer computations than fully connected layers with the same number of neurons.
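As an illustration with hypothetical values: a convolutional layer with filter size K = 5, one input channel (C = 1), and F = 64 filters costs on the order of 5² × 1 × 64 = 1600 operations per layer, whereas a fully connected layer mapping N = 256 inputs to M = 64 neurons requires 256 × 64 = 16,384 weight multiplications.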
Furthermore, training a neural network involves backpropagation, where gradients of the loss function are computed with respect to each parameter. Since CNNs have fewer parameters than FFNNs, they require fewer gradient updates per epoch, leading to faster convergence during training [30].
In the first part of our study, we conduct four groups of experiments using feed-forward neural networks (FFNNs) and one-dimensional convolutional neural networks (1D-CNN).
The first group (Group I) consists of two experiments (Exp. 1, 2) that are identical both in terms of the type of data and the model used. However, the results differ due to the stochastic nature of the training procedure (e.g., random weight initialization). We use a feed-forward neural network (FFNN) with five hidden layers consisting of 64, 32, 32, 32, and 16 neurons, respectively. The data used in this experiment are the raw values of the 256 samples for each segment of our signal.
In the second group (Group II), we employ three FFNNs of different sizes. In Exp. 3, we train a two-hidden-layer network with 32 and 16 neurons, respectively. In Exp. 4, we train a four-hidden-layer network with 64, 32, 32, and 16 neurons, respectively. In Exp. 5, we train a three-hidden-layer network with 64, 32, and 16 neurons, respectively. In all three experiments, the input consists of elements with 260 spectral values obtained by applying the ST-FFT to the initial signal segments. The ST-FFT was applied to each signal segment using a sliding window of 128 samples and a step of 32 samples, yielding 4 × 65 = 260 spectral values per segment. This approach results in an input size roughly equivalent to that of the experiments with the 256 samples of the MEG signal. In this case, the results are significantly better, as shown by the detailed findings presented in the following section.
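The following sketch shows how such spectral features could be extracted in MATLAB for one 256-sample segment (seg is a hypothetical name; the parameter choices follow the description above, and the exact window count depends on how edge windows are handled):

```matlab
% Minimal sketch: ST-FFT features for one segment, using a 128-sample
% window with a 32-sample step (i.e., a 96-sample overlap).
fs = 256;
[S, F, T] = spectrogram(seg, 128, 128 - 32, 128, fs);  % S: 65 x numWindows
feat = abs(S(:)).';      % flattened magnitude spectra used as input features
```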
In the third group (Group III), we use two FFNNs of different sizes. In Exp. 6, we train a three-hidden-layer network with 64, 32, and 16 neurons, respectively. In Exp. 7, we train a four-hidden-layer network with 64, 32, 32, and 16 neurons, respectively. In these two experiments, the input consists of a combination of the 256 sample values of each item and the corresponding 260 spectral values obtained through ST-FFT applied to the initial signal segment, resulting in a total of 516 values for each item.
The activation function used in all FFNN hidden layers was the Hyperbolic Tangent Sigmoid, and for the output layers, Linear Activation was used. The typical Nguyen–Widrow initialization method was used for the weights of the hidden layers, and random small values were used to initialize the input and output layers. The Levenberg–Marquardt backpropagation algorithm was used, which is a second-order optimization method that adjusts learning dynamically. The loss function was the mean squared error (MSE), and validation and stopping criteria were based on the early-stopping technique using a validation set; the learning rate was adjusted dynamically. The default number of epochs was deliberately set very high (1000), but the algorithm always terminates based on the early-stopping criteria: training stops if the validation error increases for a certain number of epochs (six in our case), if the performance gradient becomes too small (below 1 × 10⁻⁷), if the network reaches a specified error threshold, or if the adaptive parameter mu exceeds 1 × 10¹⁰. The batch size is automatically determined based on the training function [31].
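In MATLAB terms, these criteria correspond to training parameters along the following lines (a sketch of the settings, not the full experimental script):

```matlab
% Minimal sketch: stopping criteria and data split for trainlm, as above.
net.trainParam.epochs   = 1000;    % upper bound; early stopping acts first
net.trainParam.max_fail = 6;       % stop after 6 validation-error increases
net.trainParam.min_grad = 1e-7;    % stop if the performance gradient is too small
net.trainParam.mu_max   = 1e10;    % stop if adaptive parameter mu exceeds this
net.divideParam.trainRatio = 0.8;  % 80/10/10 split, as in our experiments
net.divideParam.valRatio   = 0.1;
net.divideParam.testRatio  = 0.1;
```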
In the fourth group (Group IV), we conduct five experiments using 1D convolutional neural networks (1D-CNNs) with varying configurations; a sketch of the Exp. 8 architecture is given below. In Exp. 8, we train a network with four convolutional layers consisting of 64, 128, 256, and 256 filters, respectively, and a filter size of 5. In Exp. 9 and 10, we train a four-layer network similar to the previous one, but with a pooling layer inserted between the second and third layers. For Exp. 11 and Exp. 12, we add two additional ReLU layers after the first and third convolutional layers. One-dimensional convolutional networks are designed for processing one-dimensional signals; thus, their input, as in the first group, consists of the raw values of the 256 samples for each segment of our signal.
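Since MATLAB R2018a offers no dedicated 1-D convolution layer, an architecture equivalent to Exp. 8 could be assembled by treating each segment as a 1 × 256 image, as in this sketch (an assumed reconstruction, not the authors’ exact layer stack):

```matlab
% Minimal sketch of an Exp. 8-like architecture (1 x 5 kernels emulate
% 1-D convolutions over the 256-sample segment).
layers = [
    imageInputLayer([1 256 1])
    convolution2dLayer([1 5],  64, 'Padding', 'same')
    convolution2dLayer([1 5], 128, 'Padding', 'same')
    convolution2dLayer([1 5], 256, 'Padding', 'same')
    convolution2dLayer([1 5], 256, 'Padding', 'same')
    fullyConnectedLayer(2)            % epileptic vs. non-epileptic
    softmaxLayer
    classificationLayer];
```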
All CNNs use Glorot (Xavier) initialization, and the optimizer is SGDM (Stochastic Gradient Descent with Momentum). The learning rate is 0.01, and the learning rate schedule is “piecewise” decay. The number of epochs is set to 30. The batch size is set to 64, based on our system’s memory and the size of our training set. Training stops early if the validation loss does not improve for 5 consecutive epochs (‘Patience’). The loss function used is cross-entropy (for classification), and the validation loss is used as the validation criterion [32].
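These settings map onto MATLAB’s trainingOptions roughly as follows (XTrain, YTrain, XVal, and YVal are hypothetical names; XTrain would be a 1 × 256 × 1 × N array with categorical labels):

```matlab
% Minimal sketch of the CNN training configuration described above.
opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'LearnRateSchedule', 'piecewise', ...
    'MaxEpochs', 30, ...
    'MiniBatchSize', 64, ...
    'ValidationData', {XVal, YVal}, ...
    'ValidationPatience', 5, ...       % early stopping on validation loss
    'Shuffle', 'every-epoch');
net = trainNetwork(XTrain, YTrain, layers, opts);
```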
In all experiments, the networks are trained using the training set (80% of the data: 4348 items/segments), followed by testing and validation with the corresponding sets (each one 10% of the data). The VC dimension [33] of the largest and most complex FFNN, having 5 layers with 64, 32, 32, 32, and 16 neurons, respectively, and represented by a graph with a number of edges (which approximates the number of the network’s trainable parameters) E = 64 × 32 + 32 × 32 + 32 × 32 + 32 × 16 = 4608, is bounded by O(|E|) for FFNNs with weights coming from a finite set [34]. This makes all the networks used, each individually trained, suitable for training on 4348 items with a low probability of overfitting.
Based on the above theoretical limitation of the FFNN size as well as the limited resources of our computer system, we decided to explore the possibility of leveraging knowledge from a pretrained neural network to construct a larger model that could potentially yield improved results. The significant memory requirements apply exclusively to the training phase, not to subsequent usage of the trained network, such as for validation or classification of new data during production. Once a model is trained on the training set, the entire network can be saved or “frozen”. In training any subsequent models, the pretrained model can be reused on the initial training set to generate output, which corresponds to the classification result for the training set. The computational cost and memory consumption for this process are negligible. Subsequent models are trained on a combination of the original input (the initial training set) and the output of the pretrained model. The pretrained model’s output is a single value between 0 and 1 for each classified item, representing the model’s confidence in its prediction (a value of 0 indicates that the item is classified as non-epileptic with 100% certainty, while 1 indicates it is classified as epileptic with 100% certainty). Adding this single-value column to the original input matrix is analogous to introducing an additional feature for training the next model.
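A sketch of this augmentation step, assuming a frozen network preNet and the 516-row training matrix Xtrain from the earlier sketches (hypothetical names throughout), could be:

```matlab
% Minimal sketch: append the frozen model's scalar confidence to the
% original features before training the next model (concatenated stacking).
yPre    = preNet(Xtrain);             % 1 x N confidences in [0, 1]
XtrainA = [Xtrain; yPre];             % 516 feature rows -> 517
nextNet = feedforwardnet([64 32 32 16], 'trainlm');
nextNet = train(nextNet, XtrainA, Ttrain);
```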
In alternative experiments, we explored freezing all layers of the pretrained network except for the last one and modifying the final layer to generate more output values (e.g., 16 or 32), with the aim of providing more information to the subsequent models. However, this modification did not yield improved results. The accuracy remained comparable to the approach where the pretrained network produced only one output value per classified item. Consequently, we adopted the more efficient approach, which requires significantly less computational effort.
Once the training of the models on the training set is complete, the parameters (weights and biases) remain unchanged. The validation set is then applied to the input of the first model, generating an output corresponding to the classification result. As in the previous process, this output is combined with the original input (the initial validation set) and fed into the subsequent model to produce the next output. This procedure continues for all models in the chain (refer to Groups VII and VIII). The output of the final model represents the ultimate classification result.
We selected the best-performing model in terms of classification accuracy to test this simple method of optimization based on reusing pretrained models. Our goal was to assess the upper limits of classification accuracy attainable from the MEG data, and as such, we disregarded training time and computational cost. We selected the FFNN model that was trained on the combination of MEG signals and ST-FFT spectral values. Several experiments were conducted to measure classification accuracy and other relevant metrics, taking into account computational costs.
Stacked generalization, or stacking, is a model ensembling technique that leverages the principle that different models capture different error patterns. By learning an optimal combination of base models’ outputs, stacking can effectively reduce both bias and variance, leading to improved generalization performance [35].
Several theoretical foundations support the effectiveness of stacking:
Bias–Variance Tradeoff: Stacking mitigates bias by incorporating multiple perspectives from diverse base models while reducing variance by averaging out overfitting tendencies of individual learners [36]. The meta-learner, trained on the predictions of base models, acts as a higher-level function that smooths the decision boundary and enhances robustness [37].
Universal Approximation Property: Theoretically, a sufficiently expressive meta-learner can approximate any function of the base models’ outputs, ensuring that stacking can asymptotically match or exceed the performance of its best component model [38].
Convex Combination and Model Averaging: When the meta-learner assigns weights to base models, it effectively performs a weighted averaging of predictions. Under convex loss functions, this form of ensembling is known to improve generalization bounds, as shown in studies on weighted model averaging [39].
Error-Correction Theory: The meta-learner operates as an error-correction mechanism, learning to identify and adjust for systematic mistakes made by base models. This aligns with the principles of boosting, where weak learners iteratively refine decision boundaries to compensate for previous errors [40].
Transfer learning–stacked generalization was introduced in the following manner: (a) the output (single or multiple values) of one pretrained FFNN was used as the input to a second, untrained FFNN (Figure 2); (b) the output (single or multiple values) from the pretrained FFNN was combined with the MEG data as input to the untrained FFNN (Figure 3); (c) the single-value classification outputs from several pretrained FFNNs were combined with the MEG data as input to the untrained FFNN (Figure 4); or (d) a chain was built of FFNNs, each trained on a combination of the MEG data and the output of the previous FFNN model (Figure 5).
In the first case, we used a 4-hidden-layer FFNN (64, 32, 32, and 16 hidden neurons, respectively) pretrained on the combined input of the MEG signal values with the ST-FFT spectral values (a total of 516 values, as explained in the experiments of Group III), and then a new FFNN was created and configured for 16 outputs (and 32 outputs for the second experiment). We then copied the parameters of the first four layers of the pretrained network to the new network, thus cloning the pretrained network, except that the clone gives an output of 16 (or 32) values; a sketch of this cloning step follows. The idea was to feed the untrained FFNN with more input data from the pretrained network than just a single numeric value per item. Finally, we created a 5-layer untrained network and trained it with the output of the pretrained network. Testing this FFNN on the test set gave poor results: the prediction accuracy was similar to, or even lower than, that of the original pretrained model.
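The cloning step could be sketched as follows, assuming preNet is the pretrained 4-hidden-layer network from above (MATLAB’s configure is used here to set the new output size; this is an illustrative reconstruction, not the authors’ code):

```matlab
% Minimal sketch: clone the four hidden layers of preNet into a network
% whose final layer emits 16 values instead of one.
clone = feedforwardnet([64 32 32 16], 'trainlm');
clone = configure(clone, Xtrain, zeros(16, size(Xtrain, 2)));  % 16 outputs
clone.IW{1,1} = preNet.IW{1,1};          % input-to-first-layer weights
for k = 1:3
    clone.LW{k+1,k} = preNet.LW{k+1,k};  % hidden-to-hidden weights
    clone.b{k}      = preNet.b{k};
end
clone.b{4} = preNet.b{4};                % last hidden layer's biases
% The output layer (LW{5,4}, b{5}) is left newly initialized.
```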
In our second series of optimization experiments (Groups V, VI, and VII), we combined the output of the pretrained network with the original MEG input (256 signal sample values and 260 values obtained from the ST-FFT) as input to the untrained FFNN (Figure 3). We tested both pretrained networks with multiple-value outputs and pretrained networks with single-value outputs, the latter being much easier to use because no special configuration was needed. All experiments gave a small improvement in model performance compared to that of the pretrained model but showed no significant differences among them. The gain from the use of multiple-value outputs was approximately the same as the gain from single-value outputs, so in the next steps, we proceeded using only pretrained models with single-value outputs.
In the third and fourth cases (Groups VIII and IX), we expanded our models by increasing the number of pretrained networks used to feed the new, untrained model. We experimented with both parallel and serial configurations, as shown in Figure 4 and Figure 5, respectively, using five pretrained networks in each case. Although the structure in these last two experiments may seem highly complicated, the individually pretrained networks used in these experiments are rather small and easily controllable, since, as previously explained, all individual models are separately trained using the output of the previously trained models along with the variables of the training set.
In Group VIII, we use a serial chain of five FFNNs. The first FFNN is pretrained on the Train-Set data, the 516-column matrix (256 signal sample values and 260 values obtained from the ST-FFT). All subsequent FFNN models are trained on the classification output of the previous model on the Train-Set combined with the initial Train-Set, as explained earlier and as shown in Figure 3. After all FFNN models are sequentially trained on the Train-Set, we use the Validation-Set as the input to the chain. Each FFNN produces a classification output that is combined with the initial Validation-Set and fed to the next FFNN. The output of the final FFNN is the final classification output, which we expect to be more accurate than the output of the first FFNN. In the process, we measure each model’s accuracy to track the overall progress step by step; a sketch of this chain is given below.
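Under the same assumptions as the earlier sketches (Xtrain and Xval as 516-row feature matrices, Ttrain as labels; hidden-layer sizes are illustrative), the Group VIII chain could be expressed as:

```matlab
% Minimal sketch of the serial chain: each model is trained on the previous
% model's Train-Set output appended to the original Train-Set.
nets = cell(1, 5);
nets{1} = train(feedforwardnet([64 32 32 16], 'trainlm'), Xtrain, Ttrain);
Xcur = Xtrain;
for k = 2:5
    Xcur    = [Xtrain; nets{k-1}(Xcur)];              % append previous output
    nets{k} = train(feedforwardnet([64 32 32 16], 'trainlm'), Xcur, Ttrain);
end
% Inference: replay the chain on the Validation-Set.
Xv = Xval;
for k = 1:4
    Xv = [Xval; nets{k}(Xv)];
end
yFinal = nets{5}(Xv);                                 % final classification
```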
The experiments in Group IX are adequately described by Figure 4. Five FFNNs are pretrained separately on the Train-Set and then are used to produce a single-value output each, which is the classification result for each item in the Train-Set. These five outputs are combined with the original Train-Set (516 columns) into a new 521-column matrix, which is used to train the untrained FFNN. Then, the Validation-Set is applied to the five pretrained models in the same manner to produce a 521-column Validation-Set for the final FFNN, which gives us the classification output; a sketch follows.
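A corresponding sketch for the parallel configuration (same assumed names; note that MATLAB’s toolbox arranges features as rows, whereas the text counts them as columns) might read:

```matlab
% Minimal sketch of parallel stacking: five pretrained FFNNs each contribute
% one confidence row, giving a 521-row input for the final (meta) FFNN.
base = cell(1, 5);
P = zeros(5, size(Xtrain, 2));
for k = 1:5
    base{k} = train(feedforwardnet([64 32 32 16], 'trainlm'), Xtrain, Ttrain);
    P(k, :) = base{k}(Xtrain);
end
meta = train(feedforwardnet([64 32 32 16], 'trainlm'), [Xtrain; P], Ttrain);
% Validation: same augmentation, then classify with the meta model.
Pv = zeros(5, size(Xval, 2));
for k = 1:5
    Pv(k, :) = base{k}(Xval);
end
yFinal = meta([Xval; Pv]);
```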