1. Introduction
The total electron content (TEC) is a key physical parameter in ionospheric research, valuable both for correcting radio wave propagation and for testing ionospheric theory. When the frequency of a satellite signal is known, the ionospheric delay follows directly from the TEC along the signal path, so TEC is an effective descriptor of the ionospheric delay in satellite signals. Dual-frequency or multi-frequency users can form an ionosphere-free linear combination of their observations, which eliminates the first-order ionospheric delay. Single-frequency users, however, generally cannot derive the ionospheric delay from their own measurements and must rely on an ionospheric TEC model for corrections. Such TEC models are widely used in Global Navigation Satellite Systems (GNSS) [1]. Different GNSS adopt different ionospheric models; for example, GPS and the BeiDou Navigation Satellite System use the Klobuchar model to correct the ionospheric delay, while the European system Galileo uses the NeQuick model [2,3,4]. However, the accuracy of these models is not satisfactory: the Klobuchar model corrects only 50% to 60% of the ionospheric delay [1]. In addition, the ionospheric TEC calculated from dual-frequency or multi-frequency observations can itself be modeled, providing single-frequency GNSS users with a reference for the ionospheric delay.
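The relation between TEC and delay described above can be illustrated with a minimal sketch. It assumes the standard first-order relation (delay = 40.3 · TEC / f²) and the nominal GPS L1 and L2 frequencies; the function names and the sample range are illustrative and are not taken from the cited works.

```python
K = 40.3           # m^3/s^2, first-order ionospheric constant
TECU = 1.0e16      # electrons per m^2 in one TEC unit
F_L1 = 1575.42e6   # Hz, nominal GPS L1 frequency
F_L2 = 1227.60e6   # Hz, nominal GPS L2 frequency


def iono_delay_m(tec_tecu, freq_hz):
    """First-order ionospheric group delay (meters) for a TEC given in TECU."""
    return K * tec_tecu * TECU / freq_hz ** 2


def iono_free_range(p1, p2, f1=F_L1, f2=F_L2):
    """Ionosphere-free combination of two pseudo ranges (meters)."""
    return (f1 ** 2 * p1 - f2 ** 2 * p2) / (f1 ** 2 - f2 ** 2)


if __name__ == "__main__":
    geometric_range = 2.2e7   # meters, hypothetical satellite-receiver range
    tec = 50.0                # TECU, hypothetical slant TEC
    p1 = geometric_range + iono_delay_m(tec, F_L1)
    p2 = geometric_range + iono_delay_m(tec, F_L2)
    print("L1 delay (m):", iono_delay_m(tec, F_L1))
    print("ionosphere-free range error (m):",
          iono_free_range(p1, p2) - geometric_range)
```

A single-frequency user has only `p1`, so the delay term must come from a TEC model; a dual-frequency user can cancel it with the combination in `iono_free_range`.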
Obtaining TEC quickly and accurately is therefore crucial. Existing techniques fall into two categories: measurement-based methods and ionospheric model methods. The first category derives TEC from GNSS dual-frequency observations, TOPEX/Poseidon dual-frequency altimeter data, radio occultation data, and ionosonde data. The second category obtains TEC from ionospheric models, which are divided into physical models and empirical models. Physical models solve the continuity, energy, and momentum equations that follow from the physicochemical properties of the ionosphere. However, because of the complexity of the ionosphere’s intrinsic structure and its spatial variability, physical models cannot comprehensively describe its spatiotemporal characteristics, so related research often focuses on smaller regions. Empirical models, by contrast, describe the spatiotemporal characteristics of the ionosphere with suitable functions whose coefficients are fitted to long-term observational records. Empirical TEC models reproduce these characteristics reasonably well and are therefore the usual choice in practical applications.
The GPS network has been operational on a global scale for over two decades and has accumulated a large volume of satellite observations during this period [5]. These data provide abundant material for building empirical TEC models at individual stations. Mao et al., for instance, used TEC data from 1980 to 1990 and empirical orthogonal function (EOF) analysis to develop an empirical TEC model over the Wuhan station [6]. Huang et al. combined a Gaussian mixture model with an improved radial basis function neural network to forecast short-term TEC over a single station [7]. Huang’s team also used a hybrid of a genetic algorithm and a back propagation (BP) artificial neural network to construct a one-hour forecast model of single-station ionospheric TEC [8]. Feng et al., focusing on an Antarctic Peninsula station in the MSNA area, proposed a single-station empirical TEC model named “SSM-month” (single station model-month) [9]. This model comprises twelve sub-models, each of which describes the TEC variation in one month independently of the others. Yet these single-station models have limitations [10]. The SSM-month model, for example, is complex and has many coefficients, which makes it inconvenient in practical applications. Moreover, single-station models are valid only over relatively small areas [7,9]. These issues remain open for future work.
Models covering wider regions have also developed rapidly. Orus et al. improved the accuracy of the global ionosphere map (GIM) of the Polytechnic University of Catalonia by using the Kriging interpolation algorithm [11]. Jakowski’s team built a global empirical model of the ionosphere using the nonlinear least squares method and GIM data issued by the Center for Orbit Determination in Europe (CODE) from 1998 to 2007. The model is driven mainly by the F10.7 solar flux index and has twelve coefficients [12]. Mukhtarov et al. used the same method to build a global empirical model of ionospheric TEC from CODE GIM data collected from 1999 to 2011 [13,14]. Ercha’s team constructed a global ionosphere model using the EOF method and GIM data provided by the Jet Propulsion Laboratory (JPL) between 1999 and 2009 [15]. Also using the EOF method, Wan et al. modeled the global ionospheric TEC using JPL GIM data from 1998 to 2011 [16]. Feng et al. used CODE GIM data from 1999 to 2015 and non-linear least squares estimation, applied grid point by grid point, to fit the model parameters and develop a new global ionospheric TEC model [10]. By analyzing the previous day’s ionospheric modeling results, Wang’s team obtained spherical harmonic coefficients together with satellite and receiver code biases, and then used Bayesian estimation to improve the accuracy of the Wuhan University global ionospheric map [17]. Wang’s team went on to develop an adaptive autoregressive model for predicting global maps of ionospheric vertical total electron content (VTEC). The model predicts the spherical harmonic coefficients with an autoregressive model and uses the F-test to determine the autoregressive order adaptively [18]. In 2020, Wang et al. further proposed an improved adaptive autoregressive algorithm for predicting VTEC at grid points.
Meanwhile, Cherrier used deep neural networks and a series of CODE TEC data from 2014 to 2016 to design a global ionospheric model. The model predicts global TEC maps from past TEC maps without introducing any prior information [19]. Xiong proposed an extended encoder–decoder long short-term memory model (ED-LSTME), which performed well in terms of long-term time-series consistency, determination of the optimal delay, and prediction [20]. However, these large-scale models have inherent problems. Some cover only specific areas or time periods, and some are built on datasets of inconsistent accuracy, which limits prediction accuracy and stability.
Taking into account the strengths and weaknesses of single-station models and regional models, this study develops a new method for constructing a regional ionospheric model from single-station measurement data. The modeling dataset consists of high-precision, uniformly processed GPS single-station data from mainland China. The model is built from empirical subcomponents whose coefficients are fitted with the non-linear least squares method, and ionospheric anomalies are taken into account when modeling these subcomponents. The results indicate that the model is more accurate than empirical models such as IRI2020 and NeQuick2. This provides a new approach for improving current ionospheric models and offers more accurate data support for applications in space weather, communication, and navigation.
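To illustrate what fitting coefficients by non-linear least squares involves, the following minimal sketch fits a toy diurnal model, TEC = a + b·cos(2π(LT − φ)/24), to synthetic data. The toy model, the starting values, and the Levenberg-Marquardt damping scheme are assumptions chosen for illustration; they are not the model components or the solver used in this study.

```python
import math

W = 2.0 * math.pi / 24.0  # angular frequency of the diurnal cycle (rad/hour)


def model(lt, p):
    """Toy diurnal TEC: mean a, amplitude b, peak local time phi (hours)."""
    a, b, phi = p
    return a + b * math.cos(W * (lt - phi))


def jacobian_row(lt, p):
    """Partial derivatives of the model with respect to (a, b, phi)."""
    _, b, phi = p
    arg = W * (lt - phi)
    return [1.0, math.cos(arg), b * W * math.sin(arg)]


def solve(a_mat, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a_mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def sse(ts, ys, p):
    """Sum of squared residuals between observations and the model."""
    return sum((y - model(t, p)) ** 2 for t, y in zip(ts, ys))


def fit(ts, ys, p0, max_iter=200):
    """Damped Gauss-Newton (Levenberg-Marquardt style) parameter fit."""
    p, lam = list(p0), 1e-3
    cost = sse(ts, ys, p)
    n = len(p)
    for _ in range(max_iter):
        if cost < 1e-18:
            break
        jtj = [[0.0] * n for _ in range(n)]
        jtr = [0.0] * n
        for t, y in zip(ts, ys):
            r = y - model(t, p)
            row = jacobian_row(t, p)
            for i in range(n):
                jtr[i] += row[i] * r
                for j in range(n):
                    jtj[i][j] += row[i] * row[j]
        for i in range(n):
            jtj[i][i] += lam * (jtj[i][i] + 1e-9)  # damping on the diagonal
        step = solve(jtj, jtr)
        trial = [pi + si for pi, si in zip(p, step)]
        trial_cost = sse(ts, ys, trial)
        if trial_cost < cost:   # accept the step and relax the damping
            p, cost, lam = trial, trial_cost, max(lam / 10.0, 1e-12)
        else:                   # reject the step and increase the damping
            lam *= 10.0
    return p


if __name__ == "__main__":
    hours = list(range(24))
    synthetic = [model(h, (20.0, 10.0, 14.0)) for h in hours]  # hypothetical truth
    print(fit(hours, synthetic, [15.0, 5.0, 12.0]))
```

The phase term φ is what makes the problem non-linear: the model cannot be written as a linear combination of fixed basis functions in the unknowns, so the coefficients are refined iteratively from an initial guess.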
2. GPS-TEC Data
GPS stations provide pseudo-range and carrier-phase measurements on two L-band frequencies. Differencing the code or carrier-phase observations of the two frequencies yields the pseudo-range TEC ($STEC_a$) and the phase TEC ($STEC_r$) along the path from the satellite to the receiver [21]. The computation formula for pseudo-range TEC is as follows:

$$STEC_a = \frac{1}{k}\,\frac{f_1^2 f_2^2}{f_1^2 - f_2^2}\left[(P_2 - P_1) - c\,(b^s + b_r)\right]$$

In this formula, $k$ represents 40.3 m$^3$/s$^2$; $f_1$ and $f_2$ are GPS signal frequencies; $P_1$ and $P_2$ are recorded pseudo ranges; $c$ represents the speed of light; $b^s$ and $b_r$ designate the satellite’s and receiver’s differential code biases, respectively. The estimation of the differential code bias requires a reduction in the differences in ionospheric delays between corresponding measurements [22,23]. The equation for phase TEC is as follows:

$$STEC_r = \frac{1}{k}\,\frac{f_1^2 f_2^2}{f_1^2 - f_2^2}\left(L_1\lambda_1 - L_2\lambda_2\right)$$

where $L_1$ and $L_2$ signify carrier phases, and $\lambda_1$, $\lambda_2$ represent the corresponding wavelengths. Provided that cycle slips do not interrupt the observations, the differential code biases and the integer cycle ambiguities remain constant within a continuous observation arc. Based on $STEC_a$, by smoothing $STEC_r$ for a specific time over a set of continuous epochs, a more accurate slant TEC ($STEC_i$) can be obtained [24,25].
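The computation of pseudo-range TEC, phase TEC, and their combination can be sketched as follows. This is a minimal illustration under stated assumptions: the nominal GPS L1/L2 frequencies, differential code biases expressed in seconds in the P2 − P1 sense, and a simple arc-mean offset used to level the phase TEC to the code TEC. The leveling scheme and all function names are illustrative choices, not necessarily the smoothing procedure of [24,25].

```python
C = 299_792_458.0   # m/s, speed of light
K = 40.3            # m^3/s^2
TECU = 1.0e16       # electrons per m^2 in one TEC unit
F1 = 1575.42e6      # Hz, nominal GPS L1
F2 = 1227.60e6      # Hz, nominal GPS L2
LAM1 = C / F1       # m, L1 wavelength
LAM2 = C / F2       # m, L2 wavelength

# Conversion from differential delay (meters) to slant TEC (TECU)
FACTOR = F1 ** 2 * F2 ** 2 / (K * (F1 ** 2 - F2 ** 2)) / TECU


def stec_code(p1, p2, dcb_sat_s=0.0, dcb_rcv_s=0.0):
    """Pseudo-range slant TEC (TECU); DCB sign convention is an assumption."""
    return FACTOR * ((p2 - p1) - C * (dcb_sat_s + dcb_rcv_s))


def stec_phase(l1_cycles, l2_cycles):
    """Phase slant TEC (TECU); precise but offset by unknown ambiguities."""
    return FACTOR * (l1_cycles * LAM1 - l2_cycles * LAM2)


def level_phase(stec_a, stec_r):
    """Shift phase TEC by the arc-mean code-minus-phase offset."""
    offset = sum(a - r for a, r in zip(stec_a, stec_r)) / len(stec_r)
    return [r + offset for r in stec_r]


if __name__ == "__main__":
    # Hypothetical three-epoch arc without cycle slips
    code = [20.4, 24.7, 30.1]        # noisy but unbiased code TEC
    phase = [-5.0, 0.0, 5.0]         # smooth but biased phase TEC
    print(level_phase(code, phase))
```

The leveled series keeps the low noise of the phase observations while inheriting the absolute level of the code observations, which is the purpose of combining the two quantities.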
To make slant TEC suitable for regional analysis, it must be converted into vertical TEC (VTEC). The conversion applies a mapping function at each ionospheric pierce point, defined as the intersection of the line of sight with the ionospheric shell. The charged particles in the ionosphere are assumed to be concentrated in a single thin layer concentric with the Earth. The height of this layer varies with time of day, geographical location, and solar zenith angle, among other factors, and generally lies between 350 and 480 km. Its height can be computed quickly. For elevation angles exceeding 30 degrees, the accuracy of the conversion improves significantly, which makes it feasible in most regions of the globe. VTEC is calculated with the following formula [24,26]:

$$VTEC = STEC_i \cdot \cos\left[\arcsin\left(\frac{R_E \cos E_0}{R_E + h_m}\right)\right]$$

Here, $R_E$ represents the Earth’s average radius, and $h_m$ indicates the F2 layer’s peak height. $E_0$ denotes the elevation angle of the satellite at the station [27,28]. It is worth noting that the data at some stations of the crustal observation network in China may be unstable. In our TEC solution, we found occasional TEC values reaching thousands of TECU. To ensure data quality, all TEC records exceeding 200 TECU are excluded from the subsequent experiments.
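The slant-to-vertical conversion and the 200 TECU quality-control threshold can be sketched as below. The thin-shell mapping function, the mean Earth radius of 6371 km, and the default shell height of 400 km are assumptions made for illustration (the text only states that the height generally lies between 350 and 480 km); the function names are hypothetical.

```python
import math

R_E_KM = 6371.0   # km, mean Earth radius (assumed value)


def mapping_factor(elev_deg, h_m_km=400.0):
    """Thin-shell factor cos(arcsin(R_E * cos(E0) / (R_E + h_m)))."""
    sin_zenith = R_E_KM * math.cos(math.radians(elev_deg)) / (R_E_KM + h_m_km)
    return math.cos(math.asin(sin_zenith))


def slant_to_vertical(stec_tecu, elev_deg, h_m_km=400.0):
    """Convert slant TEC to vertical TEC at the pierce point (TECU)."""
    return stec_tecu * mapping_factor(elev_deg, h_m_km)


def quality_control(vtec_records, max_tecu=200.0):
    """Drop records above the threshold stated in the text (200 TECU)."""
    return [v for v in vtec_records if v <= max_tecu]


if __name__ == "__main__":
    # Hypothetical observations: (slant TEC in TECU, elevation in degrees)
    observations = [(50.0, 30.0), (40.0, 60.0), (3500.0, 45.0)]
    vtec = [slant_to_vertical(s, e) for s, e in observations]
    print(quality_control(vtec))
```

At zenith the factor equals one and slant and vertical TEC coincide; at lower elevations the factor shrinks, which is why low-elevation observations are more sensitive to the assumed shell height.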
4. Model Comparison
To evaluate the performance of the model, we used the 2017 solar activity parameters, the geographic coordinates (longitude and latitude) of each location, the day of the year, and the local time as input variables for the MEFM-ITCR model, with the aim of predicting the TEC at various locations in China for 2017. To validate the accuracy of our model, we additionally used the IRI2020 and NeQuick2 models for comparison. The ground truth was the TEC calculated at 30 GPS stations, which are detailed in Table 1. We compared the three models at different geographic locations, in different seasons, and under different geomagnetic disturbance conditions. In this way, we aim to gain a comprehensive understanding of the accuracy and forecasting ability of the MEFM-ITCR model, as well as its strengths and weaknesses relative to the other models.
4.1. Overview of IRI2020 and NeQuick2 Models
The International Reference Ionosphere 2020 (IRI2020) is an internationally recognized ionosphere model intended to depict the physical and chemical characteristics of the global ionosphere. This model can provide forecasts of ionospheric parameters (such as electron density, ionospheric height, TEC, etc.) and indices related to solar and geomagnetic activities. Based on the actual data from multiple monitoring stations worldwide and combined with physical models and statistical methods, IRI2020 can accurately model and predict ionospheric characteristics at different latitudes, longitudes, seasons, and times. This model plays a key role in scientific research, astronomy, communication, and navigation systems among others. As an international standardization model based on global cooperation, IRI2020 continues to iterate and optimize to provide more accurate ionosphere information and forecast data.
NeQuick2 is another ionosphere model used to predict the physical characteristics of the global ionosphere under a variety of conditions. It is one of the ionospheric models recommended by the International Telecommunication Union (ITU) and is widely used in Global Navigation Satellite System (GNSS) applications. The model draws on a substantial amount of global ionospheric observational data, combined with relevant physical models, to describe the vertical electron density distribution of the ionosphere, from which parameters such as TEC can be calculated. In GNSS applications, NeQuick2 provides ionospheric delay corrections that help to improve the accuracy of navigation and positioning. The model was developed by the Abdus Salam International Centre for Theoretical Physics (ICTP) together with the University of Graz, and a version of it has been adopted by the European Space Agency (ESA) for Galileo single-frequency corrections. In this study, we obtained NeQuick2 output using the Fortran source code provided by the ICTP. The download link is
https://t-ict4d.ictp.it/nequick2/ (accessed on 19 August 2023).
4.2. Evaluation Parameters
The evaluation metrics are $R^2$ (the coefficient of determination), RMSE (root mean square error), MAE (mean absolute error), and $\rho^2$ (the square of Pearson’s correlation coefficient). Each of these metrics is defined and calculated as follows:
$R^2$, the coefficient of determination, quantifies the proportion of the variance in the observations that is explained by the model predictions. Its value is at most 1, and it becomes negative when the model fits the observations worse than their mean does. The formula for $R^2$ is given below:

$$R^2 = 1 - \frac{SSR}{SST}$$

Here, SSR is the residual sum of squares (the sum of squared differences between each observed value and its corresponding predicted value), and SST is the total sum of squares (the sum of squared deviations of each observed value from the mean of the observed values).
RMSE (root mean square error) measures the deviation between a model’s predicted values and the true values. It is calculated as follows:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Here, $n$ is the number of samples, $y_i$ denotes the observed value, and $\hat{y}_i$ represents the corresponding predicted value.
MAE (mean absolute error) measures the average absolute difference between the predicted values and the true values. It is calculated as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Here, $n$ is the number of samples, $y_i$ indicates the observed value, and $\hat{y}_i$ is the corresponding predicted value.
$\rho^2$ is the square of Pearson’s correlation coefficient, which measures the strength of the linear association between two variables. The square of this coefficient, $\rho^2$, is the fraction of the variance in one variable that is explained by a linear relationship with the other. It is computed as follows:

$$\rho^2 = \frac{\left[\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)\right]^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2\,\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $x_i$, $y_i$ are the data, and $\bar{x}$, $\bar{y}$ are the means of x and y, respectively.
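The four metrics defined above can be computed with a few lines of standard-library Python. This is a minimal sketch for illustration; the function names and the sample values are hypothetical, and $R^2$ is taken as 1 − SSR/SST.

```python
import math


def r2(obs, pred):
    """Coefficient of determination, 1 - SSR/SST."""
    mean_obs = sum(obs) / len(obs)
    ssr = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ssr / sst


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rho2(x, y):
    """Square of Pearson's correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]     # hypothetical observed TEC values
    predicted = [1.1, 1.9, 3.2, 3.8]    # hypothetical model output
    print(r2(observed, predicted), rmse(observed, predicted),
          mae(observed, predicted), rho2(observed, predicted))
```

$R^2$ and $\rho^2$ answer different questions: a model with a constant bias can still reach $\rho^2$ close to 1 while its $R^2$ drops, so reporting both separates linear agreement from absolute accuracy.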
6. Conclusions
In this research, we built function models of the diurnal component, seasonal variation component, geomagnetic activity component, MSNA correction component, and solar activity-related component of the ionospheric total electron content (TEC), based on TEC measured at GPS stations together with solar flux and geomagnetic activity data. Fitting the undetermined coefficients with the non-linear least squares method, we constructed an empirical model, MEFM-ITCR, to forecast the regional ionospheric TEC over China.
We evaluated the performance of the MEFM-ITCR model along several dimensions: variation with geographic position, variation with season, response to geomagnetic disturbance, and regional model comparison. Over China, across the latitudes, longitudes, seasons, and geomagnetic disturbance states examined, the MEFM-ITCR model outperformed the IRI2020 and NeQuick2 models in predictive power, linear correlation, and accuracy. The regional model comparison showed that the model extends well beyond the area covered by its data. Even in the Indian Peninsula and Indian Ocean regions, which are not covered by the modeling dataset, it accurately predicts the increase in TEC, which suggests that the model has potential for application in other areas. However, because the modeling dataset includes only GPS stations of the crustal observation network within China, the model’s ability to predict ionospheric anomalies such as the Equatorial Ionization Anomaly (EIA) still needs improvement. Further research on both the MSNA and the EIA is therefore needed to continue improving the model.