A Model-Assisted Combined Machine Learning Method for Ionospheric TEC Prediction

In order to improve the prediction accuracy of ionospheric total electron content (TEC), a combined intelligent prediction model (MMAdapGA-BP-NN) based on a multi-mutation, multi-cross adaptive genetic algorithm (MMAdapGA) and a back propagation neural network (BP-NN) was proposed. The model combines the international reference ionosphere (IRI), statistical machine learning (SML), BP-NN, and MMAdapGA. Compared with the IRI, SML-based, and other neural network models, MMAdapGA-BP-NN has higher accuracy and a more stable prediction effect. Taking the Athens station in Greece as an example, the root mean square errors (RMSEs) of MMAdapGABP-NN in 2015 and 2020 are 2.84TECU and 0.85TECU, respectively, 52.27% and 72.13% lower than the IRI model. Compared with the single neural network model, the MMAdapGA-BP-NN model reduced RMSE by 28.82% and 24.11% in 2015 and 2020, respectively. Furthermore, compared with the neural network optimized by a single mutation genetic algorithm, MMAdapGA-BP-NN has fewer iterations ranging from 10 to 30. The results show that the prediction effect and stability of the proposed model have obvious advantages. As a result, the model could be extended to an alternative prediction scheme for more ionospheric parameters.


Introduction
The ionosphere is an important part of the near-ground space environment.It is 60-1000 km above the ground and is an area where the excitation of high-energy solar radiation and cosmic rays ionizes the earth.The ionosphere consists mainly of charged particles, neutral atoms, and molecules.In the Sun-Earth space environment, the ionosphere is one of the most important parts directly affecting human space activities [1].The ionosphere is a medium for radio wave transmission and a link between magnetic layers and a neutral atmosphere.The formation of ionizing layers and its changes are comprehensively controlled by various factors such as solar electromagnetic radiation, particle radiation, magnetic field disturbance, geomagnetic field change, and upper atmospheric movement, showing complex weather-like and climate-like variations, which will have a serious impact on satellite navigation, radar imaging, short-wave communication, and other technical means involving radio propagation and electromagnetic environment.The important characteristics parameters of the ionospheric characteristics include the critical frequency, profile height, total electron content (TEC), etc.These parameters are also widely used in satellite communication, navigation timing, radar detection, direction finding and positioning, spectrum management, and other civil and military fields [2].Among them, TEC refers to the total number of electrons along the signal propagation path.It is an important parameter for analyzing the temporal-spatial distribution, morphological structure, delay characteristics, and disturbance of the ionosphere [3].Using satellite Remote Sens. 2023, 15, 2953 2 of 22 observation data to solve TEC will cause TEC data to lag relatively.TEC can be estimated in near-real-time using a certain ionospheric model [4].Accurately predicting TEC is not only very important to ensure the stability of the radio system but also very significant in pre-earthquake identification and other fields.These predictions are also essential for many aerospace and astronomical observation applications.
Due to the complexity of the ionosphere, many methods have been used for TEC modeling [5][6][7].One of the models is the TEC prediction model established based on statistical experience.The most representative model is the International Reference Ionosphere (IRI) model.It is a well-known empirical model and is a recommended international standard [8][9][10].The other is the empirical model of parameter prediction established by the ionosphere parameter observation data.This type of model is widely used in the forecast of ionosphere parameters due to its simple model structure and easy calculation.
In order to improve the prediction accuracy of ionospheric parameters, the artificial intelligence method was introduced [11] to predict the critical frequency of the F2 layer (foF2) [12], the 3000 km transmission factor of the F2 layer (M(3000) F2) [13], TEC [14], peak height [15], and other parameters [16].In particular, the rapid development of machine learning [17] and intelligent computing technology provides more abundant prediction methods for nonlinear and nonstationary ionospheric parameter data.For example, the regional long-term prediction method based on statistical machine learning using the principal component method [14] and the long-term prediction method using the autocorrelation prediction method [18] are both intelligent prediction algorithms.In addition, through the extrapolation method based on physical observation without the input of prediction interval, a new TEC prediction method for the Arctic region is proposed using data from the past 14 years [19].
A deep learning algorithm, such as a neural network (NN), is a very important ionospheric parameter prediction method and has been widely used in recent years [17,[20][21][22].It often requires a large amount of data for training, so it has a good effect when dealing with large samples of data for a long time.It is very important to improve the prediction accuracy of ionospheric parameters, which is also a very popular research content [6,7].The improvement and optimization of deep learning algorithms are significant means to improve the prediction accuracy of ionospheric parameters.The genetic algorithm (GA) can optimize the neural network to establish a short-or long-term prediction model of ionospheric parameters in a single station or a certain region and can get good results [23][24][25].There are also some innovative optimization algorithms used to improve neural networks.For example, the artificial neural network (ANN) based on multi-source data and optimized by the genetic algorithm can be used to build a global ionospheric model for predicting various ionospheric parameters such as foF2 and hmF2 [26].The radial basis function (RBF) neural network classification algorithm based on partial least squares (PLS) and GA (PLS-GA-RBF) is proposed to solve some recognition problems caused by small samples.The results indicate a good picture of the PLS-GA-RBF algorithm; the operating efficiency and recognition accuracy improved substantially [27].The prediction accuracy of ionospheric parameters based on the backpropagation neural network (BP-NN) can be improved by using an improved particle swarm optimization algorithm [28].
In order to improve the long-term prediction accuracy of TEC, a new combined intelligent prediction algorithm was proposed by combining the IRI model, TEC statistical learning model, artificial neural network, and genetic algorithm.Furthermore, to improve the prediction accuracy, the weight and threshold of BP-NN are optimized by an adaptive genetic algorithm with a multi-mutation and multi-cross location.
The detailed arrangement of this paper is as follows: Section 2 is the modeling data and processing process; Section 3 proposes the model principle and modeling process; in Section 4, the predicted results are compared with the measured data and the predicted data of other models to verify the practicability of the method.

Data Collection
The modeling data used in this paper are divided into three parts: the TEC observation data of Athens Station, the solar radio wave flux of 10.7 cm wavelength (F10.7), and the sunspot number (SSN).In this paper, we can download the required TEC observation data by selecting the corresponding time, observed station, and data type through the official websites of the National Oceanic and Atmospheric Administration (NOAA) [29] and the Global Ionospheric Radio Observatory (GIRO) [30,31].The downloaded data can be saved as a text file that records 24 h of TEC data for each day of the year at a station.In order to meet the modeling requirements of the Long-term regional ionospheric TEC prediction model, the monthly 24-h TEC mean values of each TEC observation station are calculated, and one observation station corresponds to a standardized modeling data file.As shown in Figure 1, by observing data, it can be found that TEC presents a certain periodic fluctuation, and the intensity of different years of fluctuations varies in different years.After the statistics of the above data, 5184 sets of data were obtained.Section 4, the predicted results are compared with the measured data and the predicted data of other models to verify the practicability of the method.

Data Collection
The modeling data used in this paper are divided into three parts: the TEC observation data of Athens Station, the solar radio wave flux of 10.7 cm wavelength (F10.7), and the sunspot number (SSN).In this paper, we can download the required TEC observation data by selecting the corresponding time, observed station, and data type through the official websites of the National Oceanic and Atmospheric Administration (NOAA) [29] and the Global Ionospheric Radio Observatory (GIRO) [30,31].The downloaded data can be saved as a text file that records 24 h of TEC data for each day of the year at a station.In order to meet the modeling requirements of the Long-term regional ionospheric TEC prediction model, the monthly 24-h TEC mean values of each TEC observation station are calculated, and one observation station corresponds to a standardized modeling data file.As shown in Figure 1, by observing data, it can be found that TEC presents a certain periodic fluctuation, and the intensity of different years of fluctuations varies in different years.After the statistics of the above data, 5184 sets of data were obtained.

Data Processing
Here, we first preprocessed the TEC data by filtering.In this paper, mean filtering is used to filter the data.In order to ensure that the filtered data waveform is close to the original data waveform and to achieve the effect of suppressing interference and supplementing missing data points, we may use the data in an appropriate time window to carry out mean filtering.For each TEC data, the three values before and after the TEC data and itself were taken, and their mean value was calculated as the filtered data.
where TEC f i is the filtered TEC, and the TEC i is the original TEC.The processing in the following text is based on filtered TEC.
After the mean filtering of the original data, the missing data in the data set was supplemented with the fitting data of cubic spline interpolation, and the scatter plot was drawn and connected to compare with the original data, as shown in Figure 2. It can be found that, compared with before denoising, the fluctuation amplitude of the data after denoising is reduced, supplementing the missing data points in the dataset.Here, we first preprocessed the TEC data by filtering.In this paper, mean filtering is used to filter the data.In order to ensure that the filtered data waveform is close to the original data waveform and to achieve the effect of suppressing interference and supplementing missing data points, we may use the data in an appropriate time window to carry out mean filtering.For each TEC data, the three values before and after the TEC data and itself were taken, and their mean value was calculated as the filtered data.
where TEC i f is the filtered TEC, and the TEC i is the original TEC.The processing in the following text is based on filtered TEC.
After the mean filtering of the original data, the missing data in the data set was supplemented with the fitting data of cubic spline interpolation, and the scatter plot was drawn and connected to compare with the original data, as shown in Figure 2. It can be found that, compared with before denoising, the fluctuation amplitude of the data after denoising is reduced, supplementing the missing data points in the dataset.For F10.7, a scatter plot is drawn in time order and connected.TEC data and F10.7 were drawn on the same image, as shown in Figure 3a.Similarly, the same processing was performed on SSN, as shown in Figure 3b.It can be found that the overall variation of TEC data is strongly similar to the F10.7 data and SSN.For F10.7, a scatter plot is drawn in time order and connected.TEC data and F10.7 were drawn on the same image, as shown in Figure 3a.Similarly, the same processing was performed on SSN, as shown in Figure 3b.It can be found that the overall variation of TEC data is strongly similar to the F10.7 data and SSN.

Methods
As shown in Figure 4, this paper combines several classical models and statistical machine learning (SML) models to obtain a high-accuracy prediction model.The prediction and validation of the combined model constructed in this paper mainly include four processes: Step 1: Based on the statistical learning method, F10.7 and SSN were used to predict TEC.
Step 2: The IRI model was used to obtain TEC data in the prediction region.
Step 3: The BP-NN model is to predict final prediction results by training the data from the above IRI and statistical learning models, wherein the determination of the model using multi-mutation and multi-crossover adaptive genetic algorithm optimization algorithm.
Step 4: Compare the deviation between the TEC data predicted by the proposed model and the TEC data calculated by other models and the actual observed data, and analyze the capability gap between the proposed model and the existing model.

Methods
As shown in Figure 4, this paper combines several classical models and statistical machine learning (SML) models to obtain a high-accuracy prediction model.The prediction and validation of the combined model constructed in this paper mainly include four processes: Step 1: Based on the statistical learning method, F10.7 and SSN were used to predict TEC.
Step 2: The IRI model was used to obtain TEC data in the prediction region.
Step 3: The BP-NN model is to predict final prediction results by training the data from the above IRI and statistical learning models, wherein the determination of the model using multi-mutation and multi-crossover adaptive genetic algorithm optimization algorithm.
Step 4: Compare the deviation between the TEC data predicted by the proposed model and the TEC data calculated by other models and the actual observed data, and analyze the capability gap between the proposed model and the existing model.

TEC Prediction Based on the IRI Model
The IRI model, a joint project of the International Union of Radio Science (URSI) and the Committee on Space Research (COSPAR) is the international standard for climate specifications of ionospheric parameters and an empirical model based on extensive terrestrial and spatial data [8,9].It can describe the monthly mean value of electron density, electron temperature, ion temperature, and ion composition for a given location, data, and time in the non-auroral ionosphere over an altitude range of 50-1500 km.Since its inception in 1969, the IRI model has evolved with newer data and better mathematical descriptions of global and temporal patterns of change.In addition, numerous independent studies have validated the usability of the IRI model and compared it with direct and indirect ionospheric measurements not used in model development [9,32,33].Therefore, the model used in this paper is IRI-2020 [34], through which the monthly one-hour average TEC data of Athens station from 2004 to 2021 are obtained for subsequent analysis and processing.

TEC Prediction Based on the IRI Model
The IRI model, a joint project of the International Union of Radio Science (URSI) and the Committee on Space Research (COSPAR) is the international standard for climate specifications of ionospheric parameters and an empirical model based on extensive terrestrial and spatial data [8,9].It can describe the monthly mean value of electron density, electron temperature, ion temperature, and ion composition for a given location, data, and time in the non-auroral ionosphere over an altitude range of 50-1500 km.Since its inception in 1969, the IRI model has evolved with newer data and better mathematical descriptions of global and temporal patterns of change.In addition, numerous independent studies have validated the usability of the IRI model and compared it with direct and indirect ionospheric measurements not used in model development [9,32,33].Therefore, the model used in this paper is IRI-2020 [34], through which the monthly one-hour average TEC data of Athens station from 2004 to 2021 are obtained for subsequent analysis and processing.

TEC Prediction Based on Statistical Machine Learning
Fourier expansion is known to be useful in characterizing periodic waveforms.Because TEC is inherently diurnal, Fourier analysis is superior to other ionospheric modeling methods in building empirical models of TEC predictions.In particular, the Fourier function was used to model the ionospheric TEC diurnal variation, with a period of 24 h, as shown in the following function [12]:

N m p ssn T a m p ssn nT b m p ssn nT
where λ is the geographical latitude, φ is the geographical longitude, m is the month, p is the solar radio wave flux of 10.7 cm wavelength, ssn is the number of sunspots, T is the given time, expressed as between −180° and 180°, N is the maximum number of harmonics

TEC Prediction Based on Statistical Machine Learning
Fourier expansion is known to be useful in characterizing periodic waveforms.Because TEC is inherently diurnal, Fourier analysis is superior to other ionospheric modeling methods in building empirical models of TEC predictions.In particular, the Fourier function was used to model the ionospheric TEC diurnal variation, with a period of 24 h, as shown in the following function [12]: where λ is the geographical latitude, ϕ is the geographical longitude, m is the month, p is the solar radio wave flux of 10.7 cm wavelength, ssn is the number of sunspots, T is the given time, expressed as between −180 • and 180 • , N is the maximum number of harmonics used to represent the diurnal variation and can be determined by the regression analysis, n (n = 1, 2, . . ., N) is the harmonic number, a n and b n are Fourier coefficients related to geographic coordinates and solar activity parameters.
For the given geographical location in Athens, Greece (38 • 02 N, 23 • 44 E, UTC/GMT + 2 h), the main factors affecting TEC are time and solar activity level.One of the determinants of solar activity level is F10.7, which is mainly determined by the corona and the upper layer of the chromosphere.Another determinant is SSN, which is influenced by the photosphere and the chromosphere.Therefore, we can predict TEC from solar cycle variation parameters and time parameters (year, semi-year, and month).The prediction function can be written as a harmonic function [12]: where p is the average F10.7 over 12 months (unit is 10 −22 /W/m −2 /Hz −1 ); ssn is the average SSN over 12 months; m is the month; K represents the upper limit of annual, semi-annual, seasonal, and monthly cycles; J represents the upper limit of solar cycles.K and J can be determined by regression analysis.In addition, the coefficients in the formula were determined by the least square fitting method.
Either an empirical ionospheric model or a principle-based physical model is derived from observations.In exactly the same way, the ionospheric parameters of the above harmonic functions can be determined from observations.The root-mean-square error (RMSE) could be easily estimated with the least squares regression analysis for TEC [12,16].The calculation shows that the maximum harmonic number N in Equation ( 2) is up to 11 in the above model to ensure the highest precision and robustness.Similarly, when the regression order J = 2 and K = 2 in Equations ( 3) and (4), the calculated RMSE of TEC is smaller.It can be seen that it has better convergence and stronger robustness.

Modeling of Neural Network Optimization Based on Genetic Algorithm
In this section, the TEC data predicted by solar activity level index, IRI-2020, and harmonic function fitting were taken as input, and TEC data were taken as output to establish a prediction model based on BP-NN.In addition, the multi-mutation and multicrossover adaptive genetic algorithm (MMAdapGA) was used to optimize the model, and existing data (the actual filtered TEC data, time parameter (UTC), TEC data predicted by the IRI model, and the SML-based model, etc.) were used as training dataset and testing dataset to train the model.
ANN simulates the process of neuron signal transmission [35].The neuron is a topological network established by biological research and the response mechanism of the brain, simulating the process of neural conflict.The end of multiple dendrites receives external signals and transmits them to neurons for processing and fusion.Finally, the nerve is transmitted to other neurons or effectors through axons.ANN approximates complex nonlinear functions as an algorithm model of basic nonlinear function combinations.This model has been successfully applied to TEC modeling and prediction [35].ANN does not need to determine the mathematical equation of the mapping relationship between input and output in advance.Through its own training, ANN can obtain the result closest to the expected output value when the input value is given.
As a classical algorithm, BP-NN simulates the neuronal processing process [36], which is a multi-layer feedforward network trained by error backpropagation.Its basic idea is the gradient descent method which uses the gradient search technology to minimize the mean square error of the actual output value and the expected output value of the network [37].
The input of BP-NN designed in this paper is respectively coordinated Universal Time UTC (year, month), F10.7 observation data, SSN observation data, SML-based model fitting TEC data through F10.7 (SML-F10.7),SML-based model fitting TEC data through SSN (SML-SSN), and the IRI model prediction TEC data.The output is the TEC prediction data.The input of the TEC data containing predictions of multiple models can solve the problem of a large deviation of a single model after a certain prediction time.The BP-NN constructed in this paper has a neural network with eight input neurons, one output neuron and two hidden layers.The first hidden layer comprises 15 hidden neurons, and the second hidden layer is composed of 10 hidden neurons, as shown in Figure 5.
fitting TEC data through F10.7 (SML-F10.7),SML-based model fitting TEC data through SSN (SML-SSN), and the IRI model prediction TEC data.The output is the TEC prediction data.The input of the TEC data containing predictions of multiple models can solve the problem of a large deviation of a single model after a certain prediction time.The BP-NN constructed in this paper has a neural network with eight input neurons, one output neuron and two hidden layers.The first hidden layer comprises 15 hidden neurons, and the second hidden layer is composed of 10 hidden neurons, as shown in Figure 5.
The above neural network is trained.Each data set is normalized to [−1,1] as a training set to carry out fitting training on the BP neural network model, in which the normalization formula is [38]: for the neuron i (i = 1, 2, …, 15) in the first hidden layer, x i 1 ,x i 2 ,⋯,x i 8 is the input of the neuron.The input is the independent variable that has a key influence on the system model.The input in this model is eight kinds of data, as shown in Figure 5. 8 is the connection weight, which is used to adjust the weight ratio of each input.According to biology, neurons are activated only when the amount of information they receive reaches a threshold.Therefore, Φ i1 is used in this paper to represent the threshold value of this neuron, which is determined by the multi-mutation and multicrossover adaptive genetic algorithm.Similarly, the connection weight of neuron r (r = 1, 2, …, 10) in the second hidden layer is W r2 1 ,W r2 2 ,⋯,W r2 15 .The threshold value of this neuron The above neural network is trained.Each data set is normalized to [−1, 1] as a training set to carry out fitting training on the BP neural network model, in which the normalization formula is [38]: for the neuron i (i = 1, 2, . . ., 15) in the first hidden layer, i is the input of the neuron.The input is the independent variable that has a key influence on the system model.The input in this model is eight kinds of data, as shown in Figure 5.
i1 is the connection weight, which is used to adjust the weight ratio of each input.According to biology, neurons are activated only when the amount of information they receive reaches a threshold.Therefore, Φ i1 is used in this paper to represent the threshold value of this neuron, which is determined by the multi-mutation and multi-crossover adaptive genetic algorithm.Similarly, the connection weight of neuron r (r = 1, 2, . . ., 10) in the second hidden layer is . The threshold value of this neuron is Φ r2 .The connection weight of the hidden layer to the output layer is 10 .The threshold value is Ψ.All activation functions in this paper are Sigmod functions [17].can improve the computational efficiency and accelerate the convergence rate of the algorithm [26].Multi-mutation and multi-crossover can improve individual mutation rate and crossover rates and reduce the probability of local minimization resulting in a dead cycle [39].The MMAdapGA can optimize the weight and threshold of BP-NN, improve the computational efficiency and accuracy of BP-NN, and reduce the error [25].In this paper, MMAdapGA is used to calculate and optimize the weights and thresholds between each layer of BP-NN.The flowchart of the algorithm is shown in Figure 6b.
− + 1 x e BP-NN is mainly composed of the topology structure of the neural network (number of input/output parameters, number of layers of neural network), the weight and threshold between each layer, and the training and prediction of the neural network.The algorithm flow chart of BP-NN is shown in Figure 6a.GA is a global optimization algorithm with good global search ability.The adaptive method mainly changes the individual crossover and mutation probability automatically with the change of fitness function value, which can improve the computational efficiency and accelerate the convergence rate of the algorithm [26].Multi-mutation and multi-crossover can improve individual mutation rate and crossover rates and reduce the probability of local minimization resulting in a dead cycle [39].The MMAdapGA can optimize the weight and threshold of BP-NN, improve the computational efficiency and accuracy of BP-NN, and reduce the error [25].In this paper, MMAdapGA is used to calculate and optimize the weights and thresholds between each layer of BP-NN.The flowchart of the algorithm is shown in Figure 6b.Since the topological structure of BP-NN has been determined, that is, the weights and thresholds between each layer of the neural network have been determined, then the Since the topological structure of BP-NN has been determined, that is, the weights and thresholds between each layer of the neural network have been determined, then the number of optimization parameters of the genetic algorithm can be determined.In addition, the weights and thresholds of BP-NN are random numbers in the interval [−0.5, 0.5] initialized randomly, so the coding length of the individual population can be determined.In this paper, two crossover points and two mutation points are set, and the mating and mutation probabilities are changed automatically with the change of fitness function value.The optimization process of MMAdapGA for BP-NN is as follows: Step 1: Population initialization.The individual coding method is real coding, and each individual is a real number string, which consists of six parts: the connection weight of the input layer and hidden layer, the connection weight of hidden layer and the hidden layer, a threshold of two hidden layers, connection weight of the hidden layer and the output layer, and threshold of the output layer.The individual contains all the weights and thresholds of BP-NN.Under the condition that the network structure remains unchanged, a neural network with definite structure, weights and thresholds can be formed.
Step 2: Fitness function.The initial weight of the neural network is obtained according to the individual.After training the neural network with training data, the TEC output is predicted.The sum of absolute error E between the predicted output TEC f and the expected output TEC r is used as the calculation formula for the individual fitness value.
TEC f tj is the TEC value predicted by an input j in the data set, and the neural network population it uses is composed of a fixed topology structure and the threshold and weight obtained by decoding individual a t .TEC r j is the expected output, namely the actual TEC value corresponding to j in the data set.k is the coefficient.
Step 3: Selection operation.Regeneration individuals are selected according to fitness.Individuals with high fitness have a higher probability of being selected, while individuals with low fitness are eliminated.It can be seen from Formula (8) that the fitness of individuals with smaller fitness values is higher, so the reciprocal of fitness values is taken to calculate the selection probability.
Step 4: Crossover operation.Since the individuals are coded by real numbers, the crossover operation method is the real number crossover method.Two random numbers need to be set to cross individuals, as two points are used for crossing operations.Let two crossing points cross at the same time.When crossing at the m time, the changes of individual a k (k = 1, 2, • • • , 135) and individual a l (l = 1, 2, • • • , 135) in the population can be expressed by the following formula: where, a km and a lm are individual a k and a l at the m crossover time.b 1 and b 2 are the random number in the interval of [0, 1].Since the adaptive genetic algorithm is adopted, the probability of the cross operation of individual a k and individual a l is [40]: E max is the maximum population fitness.E avg is the average population fitness.E kl = max{E k , E l } is the larger adaptation value of individual a k and individual a l .k 1 = 0.3, k 2 = 0.5 are the coefficients.
Step 5: Variation operation.Mutative operation is performed on a weight W or threshold V. Select gene a kp and gene a kq of individual a k for the mutation operation as follows: where, r is a random number in the range of [0, 1].g is the current iteration number.G max is the maximum number of evolutions.Due to the adaptive genetic algorithm, the mutation probability of gene a kp in individual a k is expressed by the following formula: where E k is the fitness of the individual a k .k 3 = 0.1, k 4 = 0.2 are the coefficients.
Step 6: A new generation of population is generated from crossover and mutation, then return to Step 2. Repeat the process 100 times.Finally, the relative optimal solution of weights and thresholds of each BP-NN layer is obtained.
Step 7: The BP-NN is trained and optimized by MMAdapGA, and the error is calculated.The gradient descent method is used to update the weights and thresholds again.Finally, MMAdapGA-BP-NN with errors within the allowed range is obtained, and the model is tested by predicting the TEC of the test set.

Results and Discussion
In this paper, the MMAdapGA-BP-NN prediction model is proposed.At the same time, a single BP-NN prediction model (prediction of TEC based on BP neural network) and GA-BP-NN prediction model (prediction of TEC based on neural network optimized by a single mutation and single crossover genetic algorithm) are proposed.As can be seen from Figure 3, 2015 is a year of high solar activity, and 2020 is a year of low solar activity.Data from 2015 and 2020 are taken as test sets, respectively, and other data are taken as training sets to train the model.In this paper, the above three models are all run 50 times, and the running results are analyzed.The performance of the above three models, the IRI model and the SML-based model, is compared and analyzed.The error between each predicted TEC data and the actual observed TEC data err i is defined in the following formula: The performance indicators are as follows [41]: • Mean Relative Error (MRE) • Root Mean Square Error (RMSE) In Equations ( 13)-( 17), TEC f i is the predicted TEC data, and TEC r i is the actual observed TEC data.TEC f is the average value of predicted TEC data, and TEC r is the average value of actual observed TEC data.
The comparison results are shown in Table 1, where SML-F10.7 is the SML-based model for prediction based on the actual observed data of F10.7, and SML-SSN is the SML-based model for prediction based on the actual observed data of SSN.2015) is more obvious than that in a low solar activity year (2020).
We record the operation results of BP-NN, GA-BP-NN and MMAdapGA-BP-NN TEC prediction models, respectively.The results of each model are recorded 50 times, as shown in Figures 7 and 8.
seen that the optimization effect of MMAdapGA on BP-NN is very obvious.Moreover, the optimization effect of MMAdapGA on BP-NN in a high solar activity year (2015) is more obvious than that in a low solar activity year (2020).
We record the operation results of BP-NN, GA-BP-NN and MMAdapGA-BP-NN TEC prediction models, respectively.The results of each model are recorded 50 times, as shown in Figures 7 and 8.It can be observed from Figures 7 and 8 that: 1.
In the year of high solar activity (2015), the average number of iterations converging for 50 running times of MMAdapGA-BP-NN, GA-BP-NN, and BP-NN are 19.5, 32.7, and 40.3, respectively.In the year of low solar activity (2020), the average number of iterations converging for 50 running times of the three models are 15.0, 23.3, and 33.5, respectively.MMAdapGA-BP-NN has fewer iterations and faster training speed.

3.
In the year of high solar activity (2015), the standard deviations of iterations converging for 50 running times of MMAdapGA-BP-NN, GA-BP-NN, and BP-NN are 3.71, 3.95, and 4.12.The standard deviations of RMSE of the three models are 1.46, 2.15, and 3.57.The standard deviations of ρ of the three models are 0.08, 0.14, and 0.34, respectively.In the year of low solar activity (2020), the standard deviations of iterations converging for 50 running times of the three models are 2.14, 2.34, and 5.02.The standard deviations of RMSE of the three models are 0.74, 1.03, and 1.46.The standard deviations of ρ of the three models are 0.06, 0.12, and 0.23, respectively.Compared with GA-BP-NN and BP-NN, MMAdapGA-BP-NN has a lower standard deviation of performance indexes, and the training effect is more stable.Moreover, the three models are generally more stable in the low solar activity year (2020) than in the high solar activity year (2015).It can be observed from Figures 7 and 8 that: 1. MMAdapGA-BP-NN results of 50 runs generally have smaller RMSE, larger ρ, and better prediction accuracy.2. In the year of high solar activity (2015), the average number of iterations converging for 50 running times of MMAdapGA-BP-NN, GA-BP-NN, and BP-NN are 19.5, 32.7, and 40.3, respectively.In the year of low solar activity (2020), the average number of iterations converging for 50 running times of the three models are 15.0, 23.3, and 33.5, respectively.MMAdapGA-BP-NN has fewer iterations and faster training speed.3.In the year of high solar activity (2015), the standard deviations of iterations converging for 50 running times of MMAdapGA-BP-NN, GA-BP-NN, and BP-NN are 3.71, 3.95, and 4.12.The standard deviations of RMSE of the three models are 1.46, 2.15, and 3.57.The standard deviations of ρ of the three models are 0.08, 0.14, and 0.34, respectively.In the year of low solar activity (2020), the standard deviations of iterations converging for 50 running times of the three models are 2.14, 2.34, and 5.02.The standard deviations of RMSE of the three models are 0.74, 1.03, and 1.46.The standard deviations of ρ of the three models are 0.06, 0.12, and 0.23, respectively.Compared with GA-BP-NN and BP-NN, MMAdapGA-BP-NN has a lower standard deviation of performance indexes, and the training effect is more stable.Moreover, the three models are generally more stable in the low solar activity year (2020) than in the high solar activity year (2015).
We calculated the mean value of the results of 50 runs of MMAdapGA-BP-NN, GA-BP-NN, and BP-NN, which are used as the average prediction results of the three models.The curve graph of actual observed TEC data and the scatter graph of predicted data of            Monthly performance indicators of MMAdapGA-BP-NN in the high solar activity year (2015) and low solar activity year (2020) are calculated, respectively, and variation curves are drawn, as shown in Figure 12.Around April 2015 and March 2020, TEC data fluctuated sharply, and the prediction error was large.Obviously, the prediction error of TEC data during the daytime is generally larger than that at night.fluctuated sharply, and the prediction error was large.Obviously, the prediction error of TEC data during the daytime is generally larger than that at night.Geomagnetic storms can cause the disturbance of TEC.Multiple storm periods occurred during the years 2015 and 2020 [42].For example, a geomagnetic storm occurred on 17 March 2015 [43].The impact of geomagnetic storm intensity on TEC prediction Geomagnetic storms can cause the disturbance of TEC.Multiple storm periods occurred during the years 2015 and 2020 [42].For example, a geomagnetic storm occurred on 17 March 2015 [43].The impact of geomagnetic storm intensity on TEC prediction errors is analyzed next.And the Ap index can also characterize the geomagnetic storm intensity.The qualitative expression of geomagnetic storm intensity is shown in Table 3.Furthermore, the qualitative analysis yields the number of A p > 15 days and the number of A p ≥ 30 days in high solar activity years (2015) and low solar activity years (2020), as shown in Figure 13.It can be found that there is no obvious correlation between the absolute error of predicted TECs and the geomagnetic storm days.In addition, the correlation coefficient ρ between the A p index, and the absolute error is calculated quantitatively and is −0.36 and 0.35, respectively.It can be found that the correlation coefficient between the A p index and the absolute error is small.errors is analyzed next.And the Ap index can also characterize the geomagnetic storm intensity.The qualitative expression of geomagnetic storm intensity is shown in Table 3.Furthermore, the qualitative analysis yields the number of  > 15 days and the number of  ≥ 30 days in high solar activity years (2015) and low solar activity years (2020), as shown in Figure 13.It can be found that there is no obvious correlation between the absolute error of predicted TECs and the geomagnetic storm days.In addition, the correlation coefficient ρ between the  index, and the absolute error is calculated quantitatively and is −0.36 and 0.35, respectively.It can be found that the correlation coefficient between the  index and the absolute error is small.

Conclusions
Real-time and accurate TEC monitoring is essential for many aerospace applications.The combined prediction method based on the neural network can reflect the implicit relationship between TEC data and external features, which can be used to improve the accuracy of TEC prediction.In order to solve the defects of a single model and further improve the prediction accuracy, a new combined prediction model is proposed for TEC over Athens, Greece.The combined prediction model is a combination of the IRI model,

Conclusions
Real-time and accurate TEC monitoring is essential for many aerospace applications.The combined prediction method based on the neural network can reflect the implicit relationship between TEC data and external features, which can be used to improve the accuracy of TEC prediction.In order to solve the defects of a single model and further improve the prediction accuracy, a new combined prediction model is proposed for TEC over Athens, Greece.The combined prediction model is a combination of the IRI model, SML-based model, BP-NN model and MMAdapGA parameter optimization model.We use TEC data from the Athens station in Greece as an example to validate the rationality of MMAdapGA-BP-NN, so the data of other stations are not verified.In future work, we will use this method to predict TEC data for more sites and longer time horizons to verify the completeness of MMAdapGA-BP-NN.By comparing with a single model, it can be found that MMAdapGA-BP-NN has higher accuracy and better prediction effect.Compared with a single neural network model, MMAdapGA-BP-NN has a better prediction effect and is more stable.Compared with the neural network optimized by a single mutation genetic algorithm, MMAdapGA-BP-NN has a higher prediction accuracy, fewer iterations, and a more stable prediction effect.This paper provides a new way of thinking about TEC prediction, and future studies are expected to extend the applicability of the proposed model to a wider range of regions and even the globe.In addition, this modeling method is expected to be used to predict other ionospheric parameters, such as peak height and critical frequency of the F2 layer.Based on the current results, the errors can be further reduced by further adjusting the parameters of the neural network, improving the neural network optimization method, adding more parameters affecting TEC into the model and other methods.In future research, we will consider the influence of more factors on TEC prediction, such as the geomagnetic storm intensity index and more complex solar activity, to further improve the accuracy and robustness of the model.It is hoped that the MMAdapGA-BP-NN model can provide a basis for the establishment of the international model.

Figure 1 .
Figure 1.VTEC data curve for 24 h from 2004 to 2022 in Athens, Greece.Figure 1. VTEC data curve for 24 h from 2004 to 2022 in Athens, Greece.

Figure 1 .
Figure 1.VTEC data curve for 24 h from 2004 to 2022 in Athens, Greece.Figure 1. VTEC data curve for 24 h from 2004 to 2022 in Athens, Greece.

Figure 2 .
Figure 2. Comparison between original and filtered TEC data.

Figure 2 .
Figure 2. Comparison between original and filtered TEC data.

Figure 3 .
Figure 3.The curve diagram of TEC and solar activity parameters: (a) the curve diagram of filtered TEC data and F10.7 data; (b) the curve diagram of filtered TEC data and SSN data.

Figure 3 .
Figure 3.The curve diagram of TEC and solar activity parameters: (a) the curve diagram of filtered TEC data and F10.7 data; (b) the curve diagram of filtered TEC data and SSN data.

Figure 4 .
Figure 4. Data flow diagram of the MMAdapGA-BP-NN prediction model.

Figure 4 .
Figure 4. Data flow diagram of the MMAdapGA-BP-NN prediction model.

Figure 5 .
Figure 5. Schematic structure of the neural network, which is used in MMAdapGA-BP-NN.The BP-NN model test dataset accounts for about 10% of the total dataset, and the training set accounts for about 90% of the total dataset.The training set is represented by the formula shown below:

Figure 5 .
Figure 5. Schematic structure of the neural network, which is used in MMAdapGA-BP-NN.The BP-NN model test dataset accounts for about 10% of the total dataset, and the training set accounts for about 90% of the total dataset.The training set is represented by the formula shown below: D = x 1 1 , . . ., x 8 1 , y 1 , x 1 2 , . . ., x 8 2 , y 2 . . .x 1 n , . . ., x 8 n , y n .(5) BP-NN is mainly composed of the topology structure of the neural network (number of input/output parameters, number of layers of neural network), the weight and threshold between each layer, and the training and prediction of the neural network.The algorithm flow chart of BP-NN is shown in Figure6a.GA is a global optimization algorithm with good global search ability.The adaptive method mainly changes the individual crossover and mutation probability automatically with the change of fitness function value, which Remote Sens. 2023, 15, 2953 9 of 22

Figure 7 .
Figure 7.In the high solar activity year (2015), the scatter diagram and normal distribution diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN after running 50 times: (a) the scatter diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN running 50 times; (b) the normal distribution diagram of iteration times when the three models running 50 times; (c) the normal distribution diagram of RMSE when the three models running 50 times; and (d) the normal distribution diagram of ρ when the three models running 50 times.

Figure 7 .
Figure 7.In the high solar activity year (2015), the scatter diagram and normal distribution diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN after running 50 times: (a) the scatter diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN running 50 times; (b) the normal distribution diagram of iteration times when the three models running 50 times; (c) the normal distribution diagram of RMSE when the three models running 50 times; and (d) the normal distribution diagram of ρ when the three models running 50 times.

Figure 8 .
Figure 8.In the low solar activity year (2020), the scatter diagram and normal distribution diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN running 50 times: (a) the scatter diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN running 50 times; (b) the normal distribution diagram of iteration times when the three models running 50 times; (c) the normal distribution diagram of RMSE when the three models running for 50 times; (d) the normal distribution diagram of ρ when the three models running 50 times.

Figure 8 .
Figure 8.In the low solar activity year (2020), the scatter diagram and normal distribution diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN running 50 times: (a) the scatter diagram of the performance indicators of BP-NN, GA-BP-NN, and MMAdapGA-BP-NN running 50 times; (b) the normal distribution diagram of iteration times when the three models running 50 times; (c) the normal distribution diagram of RMSE when the three models running for 50 times; (d) the normal distribution diagram of ρ when the three models running 50 times.We calculated the mean value of the results of 50 runs of MMAdapGA-BP-NN, GA-BP-NN, and BP-NN, which are used as the average prediction results of the three models.The curve graph of actual observed TEC data and the scatter graph of predicted data of each model in a high solar activity year (2015) are shown in Figure9a.The curve graph of the prediction error of the corresponding models is shown in Figure9b.The curve graph of actual observed TEC data and the scatter graph of predicted data of each model in a low solar activity year (2020) are shown in Figure9c.The curve graph of the prediction error of the corresponding models is shown in Figure9d.Monthly performance indicators (average of 50 runs) of the above six models in a high solar activity year (2015) and low solar activity year (2020) are calculated, and variation curves are drawn, as shown in Figure10.In the months with high TEC fluctuations, the prediction effect of the model is poor.In March, April, and May 2015, actual observed TEC data were larger than the normal condition, and the forecast data of MMAdapGA-BP-NN showed large errors.In March and April 2020, actual observed TEC data were smaller than normal, and the MMAdapGA-BP-NN forecast data had larger errors than in other months.But in general, compared with other models, the prediction results of MMAdapGA-BP-NN are more accurate and stable.

Figure 9 .
Figure 9. Actual observed TEC data predicted TEC data and prediction error with IRI-2020, SML-F10.7,SML-SSN, BP-NN, GA-BP-NN, and MMAdapGA-BP-NN in a high solar activity year (2015) and low solar activity year (2020): (a) predicted TEC data of each model and actual observed TEC data in a high solar activity year (2015); (b) prediction error of each model in a high solar activity year (2015); (c) predicted TEC data of each model and actual observed TEC data in low solar activity year (2020); (d) prediction error of each model in a low solar activity year (2020).Monthly performance indicators (average of 50 runs) of the above six models in a high solar activity year (2015) and low solar activity year (2020) are calculated, and variation curves are drawn, as shown in Figure10.In the months with high TEC fluctuations, the prediction effect of the model is poor.In March, April, and May 2015, actual observed TEC data were larger than the normal condition, and the forecast data of MMAdapGA-BP-NN showed large errors.In March and April 2020, actual observed TEC data were smaller than normal, and the MMAdapGA-BP-NN forecast data had larger errors than in other months.But in general, compared with other models, the prediction results of MMAdapGA-BP-NN are more accurate and stable.

Figure 9 . 22 Figure 10 .
Figure 9. Actual observed TEC data predicted TEC data and prediction error with IRI-2020, SML-F10.7,SML-SSN, BP-NN, GA-BP-NN, and MMAdapGA-BP-NN in a high solar activity year (2015) and low solar activity year (2020): (a) predicted TEC data of each model and actual observed TEC data in a high solar activity year (2015); (b) prediction error of each model in a high solar activity year (2015); (c) predicted TEC data of each model and actual observed TEC data in low solar activity year (2020); (d) prediction error of each model in a low solar activity year (2020).Remote Sens. 2023, 15, x FOR PEER REVIEW 16 of 22

Figure 10 .
Figure 10.(a-d) are the variation curves of monthly performance indicators (MAE, MRE, RMSE, and ρ of IRI-2020, SML-F10.7,SML-SSN, BP-NN, GA-BP-NN, and MMAdapGA-BP-NN in a high solar activity year (2015); (e-h) are the variation curves of monthly performance indicators of the six models in a low solar activity year (2020); (f-l) are the variation curves of monthly performance indicators of the average of two years of data.

Taking 6 :
00 to 18:00 as daytime and the rest of the time as night, the diurnal deviation of the predicted results is analyzed.The deviation of TEC data in daytime and night predicted by MMAdapGA-BP-NN in the high solar activity year (2015) and low solar activity year (2020) is shown in Figure11.The prediction error is large when actual observed TEC data fluctuate violently.Overall, the prediction error is smaller at night than during the

Figure 10 .
Figure 10.(a-d) are the variation curves of monthly performance indicators (MAE, MRE, RMSE, and ρ of IRI-2020, SML-F10.7,SML-SSN, BP-NN, GA-BP-NN, and MMAdapGA-BP-NN in a high solar activity year (2015); (e-h) are the variation curves of monthly performance indicators of the six models in a low solar activity year (2020); (f-l) are the variation curves of monthly performance indicators of the average of two years of data.

Taking 6 :
00 to 18:00 as daytime and the rest of the time as night, the diurnal deviation of the predicted results is analyzed.The deviation of TEC data in daytime and night predicted by MMAdapGA-BP-NN in the high solar activity year (2015) and low solar activity year (2020) is shown in Figure11.The prediction error is large when actual observed TEC data fluctuate violently.Overall, the prediction error is smaller at night than during the daytime.The performance indexes of MMAdapGA-BP-NN in daytime and night in the high solar activity year (2015) and low solar activity year (2020) are calculated, as shown in Table2.In the year of high solar activity (2015), MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN during the night decreased by 45.31%, 48.30%, 35.62%, and 14.63% compared with daytime.In the low solar activity year (2020), MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN during the night decreased by 48.65%, 60.70%, 39.42%, and 7.78% compared with daytime.
daytime.The performance indexes of MMAdapGA-BP-NN in daytime and night in the high solar activity year (2015) and low solar activity year (2020) are calculated, as shown in Table2.In the year of high solar activity (2015), MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN during the night decreased by 45.31%, 48.30%, 35.62%, and 14.63% compared with daytime.In the low solar activity year (2020), MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN during the night decreased by 48.65%, 60.70%, 39.42%, and 7.78% compared with daytime.

Figure 11 .
Figure 11.The deviation of TEC forecast data in daytime and night from MMAdapGA-BP-NN in a high solar activity year (2015) and low solar activity year (2020): (a) Prediction error stem of daytime and night in a high solar activity year (2015); (b) Prediction error stem of daytime and night in a low solar activity year (2020).

Figure 11 .
Figure 11.The deviation of TEC forecast data in daytime and night from MMAdapGA-BP-NN in a high solar activity year (2015) and low solar activity year (2020): (a) Prediction error stem of daytime and night in a high solar activity year (2015); (b) Prediction error stem of daytime and night in a low solar activity year (2020).

Figure 12 .
Figure 12. (a-d) are the performance indicators of MMAdapGA-BP-NN during daytime and night in a high solar activity year (2015) (average of 50 run results); (e-h) are the performance indicators curve of MMAdapGA-BP-NN during daytime and night a low solar activity year (2020) (average of 50 run results).

Figure 12 .
Figure 12. (a-d) are the performance indicators of MMAdapGA-BP-NN during daytime and night a high solar activity year (2015) (average of 50 run results); (e-h) are the performance indicators curve of MMAdapGA-BP-NN during daytime and night in a low solar activity year (2020) (average of 50 run results).

Figure 13 .
Figure 13.The absolute error of predicted TEC and geomagnetic storm days bar graph: (a) The absolute error of predicted TEC and geomagnetic storm days in high solar activity years (2015); (b) The absolute error of predicted TEC and geomagnetic storm days in low solar activity years (2020).

Figure 13 .
Figure 13.The absolute error of predicted TEC and geomagnetic storm days bar graph: (a) The absolute error of predicted TEC and geomagnetic storm days in high solar activity years (2015); (b) The absolute error of predicted TEC and geomagnetic storm days in low solar activity years (2020).

Table 1 .
Performances of each model in a high solar activity year (2015) and a low solar activity year (2020).NN decreased by 25.91%, 27.86%, 28.82%, and 10.98% compared with BP-NN.In the year of low solar activity (2020), the MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN decreased by 31.71%,41.39%, 24.11%, and 14.63% compared with BP-NN.Overall, the MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN decreased by 26.58%, 29.07%, 36.60%, and 9.52% compared with BP-NN.Based on the above data, it can be seen that the optimization effect of MMAdapGA on BP-NN is very obvious.Moreover, the optimization effect of MMAdapGA on BP-NN in a high solar activity year ( By observing the performance indicators of each model in Table1, it can be found that:1.According to the high solar activity year (2015), low solar activity year (2020) and the mean values of the two years of performance indicators, the IRI model has the largest prediction error; the SML-based model has a slightly better prediction effect than the IRI model; MMAdapGA-BP-NN has the smallest prediction error and the best effect among all models.2.The prediction results of all models are generally slightly better in a low solar activity year (2020) than in a high solar activity year (2015).The MAE, MRE, RMSE, and ρ of MMAdapGA-BP-NN decreased by 86.21%, 53.91%, 70.07%, and 3.

Table 2 .
Performance indices of MMAdapGA-BP-NN during daytime and night in a high solar activity year (2015) and low solar activity year (2020).

Table 2 .
Performance indices of MMAdapGA-BP-NN during daytime and night in a high solar activity year (2015) and low solar activity year (2020).

Table 3 .
The qualitative expression of geomagnetic storm intensity.

Table 3 .
The qualitative expression of geomagnetic storm intensity.