A Comprehensive Method for GNSS Data Quality Determination to Improve Ionospheric Data Analysis

Global Navigation Satellite Systems (GNSS) are now recognized as cost-effective tools for ionospheric studies by providing the global coverage through worldwide networks of GNSS stations. While GNSS networks continue to expand to improve the observability of the ionosphere, the amount of poor quality GNSS observation data is also increasing and the use of poor-quality GNSS data degrades the accuracy of ionospheric measurements. This paper develops a comprehensive method to determine the quality of GNSS observations for the purpose of ionospheric studies. The algorithms are designed especially to compute key GNSS data quality parameters which affect the quality of ionospheric product. The quality of data collected from the Continuously Operating Reference Stations (CORS) network in the conterminous United States (CONUS) is analyzed. The resulting quality varies widely, depending on each station and the data quality of individual stations persists for an extended time period. When compared to conventional methods, the quality parameters obtained from the proposed method have a stronger correlation with the quality of ionospheric data. The results suggest that a set of data quality parameters when used in combination can effectively select stations with high-quality GNSS data and improve the performance of ionospheric data analysis.


Introduction
Global Navigation Satellite Systems (GNSS) signals are measurably delayed as they pass through the Earth's ionosphere. Because the ionosphere is a dispersive medium, dual frequency GNSS observations can be utilized to generate Total Electron Content (TEC) estimates and thus contribute to various ionospheric studies. These include: modeling the ionosphere [1,2]; mapping global and local ionospheric TEC [3,4]; and studying ionospheric deformation caused by natural hazards such as earthquakes, volcanic eruptions, and tsunamis [5][6][7][8]. Recently ionospheric anomalies which can threaten navigation integrity of GNSS augmentation systems have also been extensively studied [9].
To be used in these various arenas, it is necessary to compute consistent high precision ionospheric measurements such as ionospheric delay on GNSS signal, ionospheric spatial gradient, and TEC using dual-frequency code and carrier measurements collected from GNSS reference station networks. However, in GNSS data processing, poor-quality GNSS observations can deteriorate the quality of ionospheric measurements. A discontinuity (i.e., data jump) in GNSS observations can occur as a result of a cycle slip, that is, a sudden jump of a number of integer cycles in carrier-phase measurements. Cycle slips are caused by the loss of lock of receiver phase lock loops due to a receiver/antenna being shaded in an "urban canyon" or foliated environments, low Signal-to-Noise Ratio (SNR) of satellite signals, or failure from electromagnetic interference in the receiver itself. Multipath caused by the reflection of satellite signals from the ground, buildings, or other obstacles incurs rapidly changing errors in GNSS observations. Large multipath errors on the code measurements when used to level the carrier measurements introduce errors on the delay estimates. Short arcs caused by cycle slips, outliers, or invalid observations are typically subjected to large leveling errors and thus make poor delay estimates [10]. While these errors are detected, removed, and controlled in data pre-processing, they cause estimation errors on ionospheric measurements since the treatment cannot be perfect especially when the GNSS data quality is very poor.
Recently, GNSS networks have become more widespread around the world and the number of stations has increased, which has led to an improvement in the observability of ionospheric behaviors. The Continuously Operating Reference Stations (CORS) network has over 2100 stations as of 2013 in the conterminous United States (CONUS) compared to about 400 stations prior to 2004. Japan has one of the most densely populated Global Positioning System (GPS) networks, with the GPS Earth Observation Network (GEONET) consisting of over 1200 stations. The increase in the total number of stations has led to a corresponding increase in the number of station with poor GNSS data quality as well. The use of poor quality data degrades the accuracy of ionospheric measurements. Therefore it is necessary to have a tool to characterize the quality of GNSS data collected from stations. If the stations with high quality data are effectively selected by using quality information of those stations, it would help obtain more reliable results from ionospheric data analysis. well suited to monitor ionospheric activity associated with natural hazards both in the locations of the occurrence and outside of it with sufficient spatial and temporal resolution to infer key properties such as velocity, direction, and magnitude of ionospheric disturbances. Because these applications use data collected from GNSS reference stations, the quality of the data from each station can affect the results of these studies.
As shown in Figure 1, some stations of the CORS network in CONUS are actually located in unfriendly environments (e.g., next to a solar panel, in a bush, and even in-between towers [15]). Because signal loss and attenuation are induced by obstacles, such as metal plates and branches, raw GPS measurements from these stations are corrupted and consequently may produce erroneous estimates of ionospheric measurements.  [15]. Figure 2a shows the ionospheric delay estimates of GPS L1 signals in the slant domain (i.e., along the actual path between satellite and receiver) observed from two nearby stations OKEE and AVCA (separated by 18.01 km) while they tracked PRN 22 on a nominal day (24 May 2012). OKEE is a good example of a station with poor GPS data quality. From the many fragments of ionospheric delay estimates from OKEE (red), it is evident that its carrier-phase measurements are corrupted by numerous cycle slips resulting in outliers and short arcs of ionospheric observations. By dividing the differences in the ionospheric delays by the separation distance, the ionospheric spatial gradients between the two stations are estimated [16], as shown in Figure 2b. The ionospheric-delay leveling errors due to the short arcs from OKEE are observed at each end of the curve, and the many fragments due to the excessive cycle slips on OKEE are evident in the center of the curve. Figure 3a,b shows vertical TEC (VTEC) estimates and VTEC perturbations of PRN 22 observed at OKEE and AVCA. TEC estimates in the slant domain were converted to equivalent vertical TEC (i.e., in the zenith or 90-degree upward direction above the observing receiver) via a geometric mapping function [16]. The perturbations of VTEC were estimated by subtracting the large-scale trend of VTEC variation from the original VTEC [14]. The hourly trend of VTEC was estimated using a moving average filter with a time window of two hours. It is obvious in Figure 3b that the cycle clips and short arcs on OKEE cause large VTEC perturbations which are not real, because this rapid variation of VTEC is not expected under quiet conditions (see Table 1). These examples illustrate how poor data quality degrades the accuracy of ionospheric measurements and can produce erroneous results.

Figure 2.
Examples of ionospheric measurements corrupted by poor quality GNSS data: (a) dual-frequency slant ionospheric delay estimates for CORS stations OKEE (poor quality data) and AVCA (good quality data) and (b) ionospheric spatial gradient estimates between OKEE and AVCA viewing PRN 22.

GNSS Data Quality Measurement Algorithms
A methodology for determining the quality of GNSS data has been developed by utilizing the GNSS data pre-processing technique for ionospheric data analysis as a basis and augmenting it with the TEQC algorithm and adaptive filter algorithm [11,[17][18][19]. The purpose of this method is to provide comprehensive and accurate quality information to users for the selection of high-quality GNSS data. The input of the method is the RINEX file collected from a station of our interest for two consecutive days and the output is the GNSS data quality parameters of the corresponding station. This method is composed of mainly three parts, as shown in Figure 4: Long-Term Ionospheric Anomaly Monitoring (LTIAM) pre-processing algorithm, TEQC algorithm, and adaptive filter algorithm.
The number of IOnospheric Delay (IOD) cycle slips, the number of outliers, and the number of short arcs are counted as separate data quality parameters using the LTIAM pre-processing algorithm. The percentage of observations, the Root Mean Square (RMS) of multipath on L1 code and L2 code measurements, and the number of data jumps detected using multipath estimates are computed by implementing the TEQC algorithm. Lastly, the mean of receiver noise on code measurements is calculated by designing an adaptive filter algorithm.

LTIAM Pre-Processing Algorithm
The LTIAM tool is an automated software package developed by the authors to build ionospheric anomaly threat models for GBAS and to evaluate the validity of the threat model over the life cycle of system by continually monitoring ionospheric behavior [17,18]. This tool automatically gathers GPS observation data during the potential days of anomalous ionospheric events which are selected based on external data from public space weather sites. After computing ionospheric delays and gradients using GPS data, the tool automatically searches for any anomalous gradients that are large enough to be potentially hazardous to GBAS users. The selected anomaly candidates will be manually validated and reported if deemed to be real anomalies.
High-quality ionospheric measurements are essential for the product of LTIAM. Thus, this tool includes a sophisticated pre-processing algorithm, which performs cycle slip detection, short arc removal, outlier removal, and code-carrier smoothing, to obtain precise estimates of ionospheric delays. In this paper, we utilize the existing LTIAM pre-processing algorithm to detect IOD cycle slips, outliers, and short arcs, and to count the numbers of these events as data quality parameters. The detection algorithms are described in Section 3.1.1, and detection thresholds are determined in Section 3.1.2. The common term, k n r , represents the sum of the true range between the n th receiver and k th satellite, receiver clock biases, satellite clock biases, and tropospheric error.
The dual-frequency code-derived estimate, I ρ , is noisier than the carrier-derived estimate, I φ , because the carrier-phase measurements have lower multipath and receiver noise errors than the code measurements (i.e., Li Cycle slips detected by using I φ ∇ are usually defined as "IOD cycle slips" [11]. The LTIAM IOD cycle detection algorithm performs better than the conventional data quality checking algorithm (e.g., TEQC, see Section 4) by applying three detection criteria: data jump, data gap, and loss of lock indicator. First, the difference between two adjacent ionospheric delays, I φ ∇ , is examined to detect a data jump greater than the slip detection threshold of 0.5 m for a nominal day. The determination of the threshold is explained in the following subsection. Second, the absence of L1 or L2 carrier-phase measurements is considered as cycle slips. Third, the Loss of Lock Indicator (LLI) of each observation from raw GPS data in RINEX format is also utilized as an indicator of potential cycle slips. After performing detection of cycle slips, continuous arcs are divided into several sub-arcs. The step of outlier detection is carried out for each sub-arc. Two approaches, the polynomial fit method and the adjacent point difference method, are executed in parallel to detect outliers [17,20]. First, a polynomial fit is performed on the carrier-derived ionospheric delay estimates, I φ , and the residuals (i.e., the I φ data minus the polynomial fit, fit P ), R , are computed for each epoch i t as shown in Equation (9). If the largest value of differential residuals, R ∇ , between adjacent points exceeds an outlier detection threshold of 0.5 m, this point is classified as a potential outlier. The determination of the threshold is explained in the following subsection: Second, the Outlier Factor (OF) between adjacent points of the point p at time p t is computed as: where p I and q I are I φ at time p t and q t , respectively [20]. w is the weight between two points, p and q at time p t and q t . In this equation, the set "adjacent" includes all points within fifteen minutes centered at the point p at time p t . If the potential outlier identified from the polynomial fit method returns the largest OF, the point is recorded as an outlier and removed. To detect all outliers, this process is repeated until no more outliers remain. LTIAM also detects short arcs which are continuous arcs of less than ten data points or five minutes when an interval of data points is thirty seconds. The short arcs need to be discarded because leveling errors for those arcs are typically large and cause ionospheric delay estimation errors [17]. In this step, we count the number of IOD cycle slips, the number of outliers, and the number of short arcs as data quality parameters.

Determination of Detection Thresholds
LTIAM was originally designed to process data from the period of anomalous ionospheric events. Thus, LTIAM pre-processing algorithms use relaxed detection thresholds in order to prevent ionospheric data from being misjudged as cycle slips or outliers and discarded under ionospheric storm conditions. A threshold of 2.5 m for cycle slip detection and a threshold of 0.8 m for outlier removal were used as defaults [17]. However, data quality checks are commonly conducted by using data from nominal days on which anomalous ionospheric events rarely happen. This section thus newly determines cycle slip and outlier detection thresholds respectively through statistical analyses. We first collect data from CORS stations in CONUS for seven consecutive days and obtain statistical distributions of differential ionospheric delays, I φ ∇ , and differential residuals, R ∇ (where the residuals, R , are the carrier-derived ionospheric delays, , minus the polynomial fit of I φ ).
The geomagnetic conditions on these seven consecutive days are shown with two indices of global geomagnetic activity from space weather databases [21]: planetary K (Kp) and disturbance storm time (Dst). In this period, a total of 1654 CORS network stations were operating in CONUS. As Kp and Dst in Table 1 indicate, the geomagnetic storm condition was quiet. This allows CORS station data quality to be observed while minimizing any influence of abnormal ionospheric behavior.  Figure 5a,b shows the probability density function (PDF) on a logarithmic scale and cumulative distribution function (CDF) of I φ ∇ derived from data for seven consecutive days respectively.
In Figure 5a, we see that the PDF of I φ ∇ steadily decreases as I φ ∇ increases when I φ ∇ is smaller than 0.5 meters, and PDF stays almost the same on the order of 10 −4 for I φ ∇ greater than 0.5 m. The CDF of Figure 5b shows that the probability that I φ ∇ goes beyond 0.5 meters is approximately 0.2 percent. This statistical result indicates that the rare occurrences of I φ ∇ greater than 0.5 meters are likely to be due to cycle slips. One example of the comparison between erroneous I φ ∇ and normal I φ ∇ is shown in Figure 6. Figure 6a,c show the ionospheric delay estimates of all GPS satellites in the slant domain observed from stations 1SUN and OKEE on a nominal day (24 May 2012). The CDFs of I φ ∇ derived from each station are shown in Figure 6b,d, respectively. The carrier-phase measurements of OKEE were corrupted by numerous cycle slips, resulting in inaccurate ionospheric delay estimates while good quality data of 1SUN produce precise ionospheric delay estimates. Approximately 10 percent of total I φ ∇ of OKEE has a value greater than 0.5 m, while no I φ ∇ from 1SUN exceeds 0.5 m. From these results, a threshold of 0.5 m was determined for cycle slip detection for nominal days. (c) (d) Figure 7a,b shows the PDF on a logarithmic scale and CDF of the differential residuals, R ∇ , (where the residuals are the ionospheric delay data minus the polynomial fit) derived from data for seven consecutive days respectively. The distribution of R ∇ shows that the probability is very small (on the order of 10 −4 ) for R ∇ greater than 0.5 m. As shown in Figure 7b, the probability that R ∇ exceeds 0.5 m is approximately 0.02 percent. A threshold of 0.5 m is used for outlier detection in this study.

TEQC Algorithm
The TEQC software is commonly used to check data quality of GPS data in the RINEX format [11]. We selected and implemented some parts of TEQC algorithms to develop a comprehensive quality determination method for supporting broader communities including users for ionospheric studies. The quality parameters include the percentage of observations, the RMS of multipath on L1 and L2 code measurements, and the number of data jumps detected using multipath estimates. The percentage of observations is the ratio of "possible observations" to "complete observations," where "possible observations" indicate the total number of possible observation epochs in a given time window, and "complete observations" are the number of epochs that actually observed code and carrier-phase data.
The LTIAM IOD cycle slip detection algorithm performs better than the IOD cycle slip detection of TEQC by applying three detection criteria. However, if data jumps occur in carrier-phase measurements due to receiver clock jumps (i.e., receiver clock slips) on both L1 and L2 signals simultaneously, these cannot be detected using IOD measurements. Thus, we augmented slip detection by incorporating the TEQC method, which detects data jumps using multipath estimates. The data jumps detected by using multipath (MP) estimates are defined as MP slip. The MP slip method uses linear combinations of L1/L2 code ( 1 ρ L , 2 ρ L ) and carrier-phase ( 1 φ L , 2 φ L ) measurements [11]. These linear combinations are defined as: Li M and Li m are the multipath errors on code and carrier-phase measurements on the Li ( 1, 2) i = signals, respectively. The bias terms, 1 B and 2 B , are: Li N is the integer ambiguity of the Li frequency signals, and γ is the square of the frequency ratio as shown in Equation (5). When the difference between two consecutive points (at epoch i t and epoch 1 i t − ) in each continuous arc of MP1 or MP2 is greater than a threshold of 10 m as shown in Equation (17), it is identified as a data jump. If the data jump occurs at a different point in time compared to an IOD cycle slip, this data jump is referred to as an MP slip: After performing IOD cycle slip and MP slip detection, the arcs are divided by the detected slips. The biases, 1 B and 2 B , of the sub-arcs of MP1 and MP2 are assumed to be constants unless an undetected slip is remaining. Therefore, these constants are removed from each arc, and the RMS values of these linear combinations are reported. Although the portion of carrier-phase multipath is included in this reported value, the amount is small compared to that of code multipath. Thus, the bias-removed MP1 and MP2 can be approximated to be the multipath errors on L1 code and L2 code measurements, respectively.

Adaptive Filter Algorithm
An adaptive filter algorithm is designed to estimate receiver noise on code measurements. After removing the bias components, 1 B and 2 B , of MP1 and MP2 from Equations (13) [19]. The adaptive filter takes two inputs: a primary input and a reference input. In this study, _ MPi new for the day of interest is set as the primary input, and _ MPi new for the previous day is set as the reference input. Then, the output of a Finite-duration Impulse Response (FIR) filter is calculated using the reference input and weights. A least-mean-square (LMS) algorithm has been used to adaptively adjust the weights of the FIR filter to minimize the sum of squared estimation errors.
The adaptive filter returns the part of the primary input that is strongly correlated with the reference input as its output. Thus, the i mp of the primary input (i.e., the multipath estimate on the code measurement) is calculated as the output of the adaptive filter. The estimation error of the filter approximately represents the code receiver noise, i ε , because it represents the value with i mp removed from the primary input. As explained, in order to estimate the receiver noise, i ε , correlation between the MPi of two consecutive days must exist. However, there are cases where such correlation is not clearly visible depending on receiver/antenna type and environmental changes. In these cases, the receiver noise in the quality output is presented as "not available (N/A)".

Results
The CORS data on the dates in Table 1 were collected and analyzed to evaluate the performance of the data quality measurement algorithms. As explained above, these seven consecutive days during which the geomagnetic storm condition was quiet are suitable for observing GNSS data quality because the chance of cycle slips and outliers being falsely detected due to any influence of abnormal ionospheric behavior is minimized. Using the results of this method, the comparative analysis on the performance of stations in the CORS network was conducted in Sections 4.1 and 4.2. In Section 4.3, we also examine the correlation between the data quality parameters obtained in this study and TEC perturbation which well represents the quality of ionospheric data under ionospherically quiet conditions. These results from correlation analysis are compared to that of the TEQC software. Section 4.4 discusses the selection of high quality data which can be conducted by utilizing data quality parameters through case studies.

Data Quality Parameter Output per Station
The statistics of quality parameters obtained from the tests are used to compare the performance of each station. Table 2 shows the results from the GNSS data quality measurement algorithms for station NVLA on 27 May 2012. The receiver model and the type of antenna can be found in the header part of the RINEX file collected from the station. While the RINEX file records the SNR for L1 and L2 frequencies, the unit of SNR is dependent on each receiver and not all stations provide SNR. Since the GPS observations at low elevation angles (i.e., weaker received signal strengths) are affected by larger multipath errors and prone to loss of lock, an elevation cutoff angle of 10 degrees (as a default) is used. The number of IOD cycle slips, the number of outliers, and the number of short arcs, the percentage of observations, the number of MP slips, the RMS of multipath errors on L1 and L2 code measurements, and the mean of receiver noise on L1 and L2 code measurements are computed using the proposed data quality determination method.
The quality measurements corresponding to those in Table 2 are obtained from each station every day during the seven days listed in Table 1. Table 3 shows the rank of stations for five quality parameters (each parameter of stations is averaged over all seven days) among a total of thirteen parameters. The worst station is on the top for each quality parameter, and the same station is highlighted with the same color. Table 3 shows that the worst stations are likely to be identified by multiple data quality parameters. Recall that, among the highlighted stations in this table, station OKEE was introduced as an example of station with poor GPS data quality in Section 2.

Distributions of Data Quality Parameters
The results of analyzing the quality parameters of the CORS stations in CONUS show us how widely station performance can vary. Figure 8a shows the total number of IOD cycle slips counted over all satellites during 24 h at each station. These numbers are counted for the seven consecutive days. The station ID is plotted (in no particular order) along the x-axis, and the number of IOD cycle slips is plotted along the y-axis. The blue circle shows the mean value of all seven days on each station and the red dot represents the minimum value among seven days on each station, respectively. These two values over seven days are close together for most stations, indicating that poor data quality of a station persists for an extended period. From this test, 1.2 percent of stations had more than 500 IOD cycle slips per day, and more than 15 percent of the stations had more than 50 IOD cycle slips. Note that the mean value over all seven days and all stations is 39.94. Figure 8b shows the total number of short arcs counted over all satellites during 24 h at each station. If cycle slips frequently occur, the number of short arcs increases because ionospheric delay data are divided into sub-arcs by the cycle slips. Thus, a high correlation exists between the number of IOD cycle slips and the number of short arcs. More than 10 percent of the stations had more than 50 short arcs per day while the mean value over all seven days and all stations is 34.48. The number of outliers also widely varies depending on station performance as shown in Figure 8c. This quality parameter has the mean value of 1.21 over all days and all stations, and as many as 2.4 percent of stations had more than 10 outliers per day. Figure 8d presents the total number of MP slips counted at each station per day. While the occurrence of IOD cycle slips mainly depends on environmental conditions around receivers, MP slips occurring due to receiver clock jumps rely upon the receiver itself. Thus, the distribution of the number of MP slips is dissimilar to that of IOD cycle slips. The mean value over all seven days and all stations is 15.02 MP slips per day, while one percent of stations had more than 500 MP slips per day, and more than 3.6 percent of the stations had more than 50 MP slips. The RMS values of multipath errors on L1 code and L2 code measurements of each station are shown in Figure 8e,f, respectively. Since multipath errors are caused by the reflection of satellite signals from the environment around receivers such as the ground, buildings, or other obstacles, the distributions of RMS multipath on L1 and L2 code measurements are very much alike. The mean values of RMS of multipath errors on L1 and L2 code measurements over all days and all stations are 0.3411 and 0.3876 m, respectively.
In most quality parameters, better data quality results in smaller values. However, higher values of percentage of observations indicate better quality of data. Thus, the mean values (blue circle) and the maximum values (green dots) of the percentage of observations of each station are compared to confirm that the poor data quality of a station persists for an extended period. In the percentage of observations, the maximum value (green dots) of a station across seven days and the mean value (blue circle) over seven days are also close together for most stations as shown in Figure 8g. The mean value of the percentage of observations over all days and all stations is 97.39 percent. As can be seen in Figure 8a-g, the range of good and poor performance varies noticeably for each quality parameter. It can be observed that most stations maintain similar performance for the duration of this data set. This information suggests that we can select high quality GNSS data in a station basis and the quality parameters of each station should be useful for the selection.  Figure 9a through 9g show the PDF of each quality parameter on each station per day in logarithmic scale. These test statistics are obtained from data collected for the seven days in Table 1. As an example, the PDF of the number of IOD cycle slips on each station per day is shown in Figure 9a. The dashed vertical lines in Figure 9a-f refer to the value of 9 µ σ + (the mean value plus 9 times the sample standard deviation) for each parameter. In Figure 9a-f, since data (blue) exist continuously from 0 to this line and the continuity of data ceases beyond this line, the data that go beyond 9 µ σ + are considered to be extreme outliers (i.e., stations with poor data quality). The dashed vertical line in Figure 9g represents the value of 9 µ σ − . The percentage of observations which falls lower than this line indicates extremely poor data quality.

Correlation between Data Quality Parameters and TEC Perturbations
To examine the possibility of selecting high quality GNSS data (i.e., high quality stations) based on data quality parameters for ionospheric studies, the correlation between TEC perturbation and each quality parameter was investigated. The TEC perturbation measurements are generated by processing dual-frequency GPS measurements collected from the CORS network using the LTIAM software. In Figure 10, normalized standard deviations of TEC perturbation calculated during the seven days listed in Table 1 are plotted along the x-axis. The standard deviations of TEC perturbation obtained from data over all satellites during 24 h at each station are averaged over the seven days. The averaged standard deviations of TEC perturbation at each station are normalized by removing their mean over all stations and dividing them by their standard deviations. Large TEC perturbation is not expected to be seen in mid latitude regions on the dates (listed in Table 1) during which geomagnetic activities were quiet. Thus the large TEC perturbations observed from some stations in Figure 10 are likely due to poor quality GPS data corrupted by cycle slips, outliers, multipath, and so on. The correlation between TEC perturbation and the number of IOD cycle slips counted using the TEQC software tool is shown in Figure 10a. As done for the proposed method in this paper, TEQC also used differential ionospheric delay, I φ ∇ , for cycle slip detection. To make a direct comparison of its performance to that of the proposed method (from which the results are shown in Figure 10b), we set a threshold of 0.5 m in TEQC which is the same value determined in Section 3.1.2. The number of cycle slips is counted using TEQC over all satellites during 24 h at each station. These numbers are averaged over the seven days and then normalized over all stations to have zero mean and unit variance. In this case, the Pearson's correlation coefficient between the number of cycle slips and the TEC perturbation, r , is 0.4019. The six quality parameters (the number of IOD cycle slips, the number of short arcs, the number of outliers, the number of MP slips, the RMS of MP1, and the percentage of observation) obtained from the proposed method in this paper are calculated, averaged over seven days, normalized over all stations, and plotted along the y-axis in Figure 10b-g. As shown in Figure 10b, the TEC perturbation is more highly correlated with the number of IOD cycle slips ( r = 0.4970) than with the number of cycle slips ( r = 0.4019) derived from TEQC. This result demonstrates that the cycle slip detection of the proposed method performs better and its output more accurately represents GPS data quality. The correlation coefficients were also considerably high in the cases of the number of short arcs (r = 0.4785) and the number of outliers (r = 0.4891), indicating that these would affect the quality of ionospheric data. The correlations of TEC perturbation with MP1 and the percentage of observation are not strong, and that with the number of MP slips which are not visible on TEC estimates is weak. Assuming that the normalized quality parameters are independent and the sum of the normalized quality parameters has a normal distribution, we combine multiple quality parameters into a new quality parameter which has a stronger correlation with TEC perturbation. As shown in Figure 10h, the correlation increased ( r = 0.6486) after adding the three quality parameters which have the first to third highest correlation: the number of IOD cycle slip, the number of outliers, and percentage of observation. In this combination set, the number of short arcs was not included to avoid adding duplicated information because its distribution is almost equal to that of the number of IOD cycle slips. More effective station selection would be possible using the parameter with higher correlation (which will be discussed in the following subsection).

Case Study: Station Selection
This subsection shows the possibility of utilizing the data quality parameters driven by the proposed method to select stations with high quality GNSS data. The performances of two cases which use the TEQC-driven cycle slip parameter and the combined quality parameter respectively were compared. Figure 10a,h which present results from correlation analyses for the two parameters were redrawn in Figure 11a,b. Based on the standard deviation of TEC perturbation, sixteen stations (1 percent of the total stations) that produce ionospheric data with the poorest quality were identified and denoted with black asterisks in Figure 11a-d. To exclude these worst case stations using the number of cycle slips obtained from TEQC, the threshold of the normalized number of cycle slips is lowered to 0.0074 (demarcated with the dashed horizontal line in Figure 11a). However, if we select stations that fall below the threshold, 286 stations denoted with red crosses (17 percent of the total stations) in Figure 11a are sacrificed although these stations have good quality data. On the other hand, in the case of using the newly defined parameter by combining three quality parameters which have the highest correlation coefficients, only 164 (10 percent of the total stations) stations marked with red crosses in Figure 11b are additionally removed because of exceeding a threshold of 0.6469 (demarcated with the dashed horizontal line in Figure 11b). The stations marked with blue diamonds are selected as shown in Figure 11c,d (the zoomed-in plots of Figure 11a,b). Note that a value of one was added to all of the data prior to plotting, because negative data cannot be represented in a logarithmic scale. It is evident that the use of parameter which has stronger correlation with the quality of ionospheric data improves the performance of station selection. The set of data quality parameters when used in combination allowed the effective selection of high quality GNSS data and better performance compared to the TEQC parameter, although this is not necessarily the best solution. Research on the optimal means of utilizing data quality parameters generated by the proposed method for selecting high quality stations is in progress [22] and beyond the scope of this paper.

Conclusions
The use of corrupted GNSS data degrades the quality of ionospheric measurements. Thus, it is necessary to check the quality of observation data and use high quality GNSS data only for ionospheric data analysis. This paper presents a methodology to determine the quality of GNSS observations collected from a reference station for the purpose of ionospheric studies. This method provides a comprehensive set of quality control parameters calculated using the sophisticated pre-processing algorithms of the LTIAM which are augmented with the TEQC algorithm and adaptive filter algorithm. These quality parameters include the number of cycle slips, the number of short arcs, the number of outliers, the number of MP slips, the percentage of observations, the RMS of multipath on L1 and L2 code measurements, and the mean of receiver noise. The results from analyzing the GNSS data quality of the CORS network showed that the range of good and poor qualities varies noticeably for each quality parameter and the performance of individual stations persists for an extended time period. This indicates that high quality data can be selected in a station basis by utilizing data quality parameters. The correlation analysis between data quality parameters and TEC perturbations which well represent the quality of ionospheric data demonstrated that the quality parameters obtained from proposed method have stronger correlation than that of TEQC and thus enable a better performance when used for station selection. Furthermore, a set of quality parameters was used in combination, its correlation with TEC perturbations increased and the performance of selecting high quality stations was improved.
As the number of GNSS stations and also GNSS applications where their observations can be employed steadily increase, it becomes more important to characterize the quality of GNSS observations. The proposed method should be applicable for the GNSS users of various applications to check the quality of GNSS observations and accordingly select high-quality data. This will especially help to improve the performance of applications for which precise GNSS data is essential, such as Real Time Kinematic (RTK), precise orbit determination of satellites, and the estimation of the Earth Rotation Parameters (ERP). The use of the statistical information on quality parameters obtained from this method allows selecting stations desired for specific applications. Research on the best means of utilizing these statistical results and effectively selecting stations with high quality data is an ongoing research topic that will benefit a wide range of GNSS applications.