2.4.2. Quality TOPSIS Scoring
This module executes an eight-step process to generate a comprehensive quality score for each station in S1.
(1) Construction of the Quality Metric System. The quality assessment begins by processing the observation data of all stations in S1 with the Anubis software. A comprehensive metric system is established, comprising 40 indicators, 10 for each of the four GNSS constellations. For any constellation Q, where Q ∈ {G, R, E, C} (representing GPS, GLONASS, Galileo, and BDS, respectively), a 10-element metric vector Qqc is formed; the definitions of these metrics correspond to those in Section 2.1. The processing parameters were configured with a 7° elevation cut-off angle, a 600 s data-gap threshold, and an 1800 s minimum continuous observation segment. Stations with less than 0.5 h of data are consequently excluded. If a station lacks data for an entire constellation, a default penalty value is assigned to its corresponding metrics.
(2) Directional Normalization. To create a standardized decision matrix, all 40 metrics are normalized to a common scale using min-max scaling. The metrics Qnobs, Qcnr1, and Qcnr2 are benefit-type indicators, whereas the remaining seven are cost-type indicators. For benefit-type indicators, the normalization is performed as follows:

$$x_{\mathrm{norm},ij} = \frac{x_{ij}^{+} - x_{j,\min}^{+}}{x_{j,\max}^{+} - x_{j,\min}^{+}}$$

where $x_{ij}^{+}$ is the original value of the j-th benefit-type metric for the i-th station, $x_{j,\max}^{+}$ and $x_{j,\min}^{+}$ are the maximum and minimum values of that metric across all stations, and $x_{\mathrm{norm},ij}$ is the resulting normalized value. For cost-type indicators, the normalization formula is:

$$x_{\mathrm{norm},ij} = \frac{x_{j,\max}^{-} - x_{ij}^{-}}{x_{j,\max}^{-} - x_{j,\min}^{-}}$$

where the superscript “−” denotes a cost-type indicator, and $x_{j,\min}^{-}$ is the minimum value of the j-th cost-type metric across all stations.
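The directional min-max scaling can be sketched in a few lines of NumPy. The function name and the handling of constant columns (mapped to zero to avoid division by zero) are illustrative choices, not the paper's implementation:

```python
import numpy as np

def directional_normalize(X, is_benefit):
    """Min-max normalize each column of X (stations x metrics).

    Benefit-type columns map their maximum to 1; cost-type columns map
    their minimum to 1, so that 'larger is better' holds for all columns.
    """
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # guard constant columns
    norm = (X - xmin) / span
    return np.where(is_benefit, norm, 1.0 - norm)
```

After this step every column is oriented the same way, which is what allows a single ideal solution (column-wise maximum) later in the TOPSIS step.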
(3) Subjective Weighting via AHP. As a multi-criteria decision-making method, the AHP provides a formal structure for quantifying subjective assessments by transforming them into numerical values through a series of pairwise comparisons [17]. First, a 10 × 10 pairwise comparison matrix, Ah, is constructed based on expert knowledge. The core principle of this AHP matrix is to quantify the relative impact of the degradation of any two quality metrics on the ultimate goal: GNSS four-constellation POD. The matrix is populated using the fundamental 1–9 scale (Table 1). The order of the metrics in the matrix corresponds to their sequence in the Qqc vector. For example, the element Ah[0][6] = 3 indicates that the number of observations is considered slightly more important than the multipath error. The rationale is that a sufficient quantity of observations is a prerequisite for reliable POD, rendering low multipath values meaningless in its absence. Similarly, Ah[6][7] = 1 implies that multipath effects on the first and second frequencies are of equal importance, as the quality of observations on both frequencies is equally critical for the dual-frequency ionosphere-free combination. Conversely, Ah[3][8] = 1/5 signifies that receiver clock jumps are considered significantly less important than the carrier-to-noise ratio. This judgment is based on the fact that clock jumps are typically detectable and correctable during data preprocessing, whereas a low C/N0 indicates poor signal quality that systematically increases the noise level of all observations. The consistency of the matrix was validated, yielding a Consistency Ratio (CR) of 0.06, which falls below the standard threshold of 0.1. The subjective weight vector is derived by normalizing the columns of Ah and then averaging the elements in each row:

$$w_i^{\mathrm{sub}} = \frac{1}{10}\sum_{j=1}^{10} \bar{a}_{ij}, \qquad \bar{a}_{ij} = \frac{a_{ij}}{\sum_{k=1}^{10} a_{kj}}$$

where $w_i^{\mathrm{sub}}$ is the subjective weight of the i-th metric, $\bar{a}_{ij}$ is the element of the column-normalized matrix, and $a_{ij}$ denotes the element in the i-th row and j-th column of matrix Ah. The denominator $\sum_{k=1}^{10} a_{kj}$ is the sum of the 10 elements of the j-th column of Ah.
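The column-normalize-and-row-average weight derivation, together with a consistency check, can be sketched as follows. The random index (RI) table is Saaty's standard one, and the principal eigenvalue is estimated from the derived weights rather than computed exactly; both are common simplifications, not necessarily the paper's exact procedure:

```python
import numpy as np

def ahp_weights(A):
    """Weights from a pairwise comparison matrix A (n x n):
    column-normalize, then average each row. Also returns the
    Consistency Ratio (CR) using Saaty's random-index table."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    w = (A / A.sum(axis=0)).mean(axis=1)      # column-normalize, row-average
    lam = (A @ w / w).mean()                  # estimate of principal eigenvalue
    ci = (lam - n) / (n - 1)                  # consistency index
    ri = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
          6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}[n]
    cr = ci / ri if ri else 0.0
    return w, cr
```

A matrix passes the consistency check when CR < 0.1, as with the CR of 0.06 reported above.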
(4) Objective Weighting via the Entropy Weight Method. The fundamental principle of the Entropy Weight Method is to determine objective weights from the amount of information conveyed by each indicator. An indicator whose values vary more across the alternatives is considered to carry more information, exerts a greater influence on the evaluation, and is therefore assigned a higher weight; conversely, an indicator with less variation receives a lower weight [18].
We first calculate the information entropy of each indicator according to the following equation:

$$e_j = -\frac{1}{\ln m}\sum_{i=1}^{m} p_{ij}\,\ln\!\left(p_{ij} + \epsilon\right)$$

In this equation, $e_j$ denotes the information entropy of the j-th indicator, m represents the total number of stations, and $\epsilon$ is a smoothing parameter introduced to prevent numerical instability. The term $p_{ij}$, representing the probability of the j-th indicator for the i-th station, is calculated as follows:

$$p_{ij} = \frac{x_{\mathrm{norm},ij}}{\sum_{i=1}^{m} x_{\mathrm{norm},ij}}$$

Finally, the objective weight of the j-th indicator, $w_j^{\mathrm{obj}}$, is computed using the subsequent formula:

$$w_j^{\mathrm{obj}} = \frac{1 - e_j}{\sum_{j=1}^{10}\left(1 - e_j\right)}$$
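The three entropy-weight formulas combine into a short NumPy sketch. The value of the smoothing parameter ε is an illustrative assumption; the paper does not state it:

```python
import numpy as np

def entropy_weights(Xn, eps=1e-12):
    """Objective weights from the normalized decision matrix Xn
    (m stations x n metrics). eps is the smoothing term that keeps
    log(0) out of the entropy sum; its value is assumed here."""
    Xn = np.asarray(Xn, dtype=float)
    m = Xn.shape[0]
    p = Xn / (Xn.sum(axis=0) + eps)                      # p_ij per column
    e = -(p * np.log(p + eps)).sum(axis=0) / np.log(m)   # information entropy
    d = 1.0 - e                                          # degree of diversification
    return d / d.sum()
```

A column whose values are identical across stations has entropy close to 1 and thus weight close to 0, which is exactly the "less variation, lower weight" behavior described above.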
(5) Fusion of Subjective and Objective Weights. The subjective and objective weights are combined using a weighted average to form a single-system metric weight:

$$w_j^{\mathrm{comb}} = \alpha\, w_j^{\mathrm{sub}} + (1 - \alpha)\, w_j^{\mathrm{obj}}$$

where α is a coefficient representing the preference for subjective weights, set to 0.7 in this study to emphasize the importance of expert knowledge.
(6) Application of Constellation-Specific Weights. We use a three-thread parallel processing strategy (Thread 1: GPS + GLONASS; Thread 2: GPS + Galileo; Thread 3: GPS + BDS) in this study. Since GLONASS, Galileo, and BDS are all processed in conjunction with GPS, a higher weight is assigned to the GPS-related metrics. The final weight for each metric is thus defined as:

$$w_j^{\mathrm{final}} = w_Q \cdot w_j^{\mathrm{comb}}$$

where $w_Q$ is the constellation-specific weight: wG = 0.4, wR = 0.2, wE = 0.2, and wC = 0.2.
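Steps (5) and (6) reduce to a weighted average followed by a per-constellation scaling. In this sketch the function names are illustrative, and the combined weights are renormalized to sum to one (as in Algorithm 1 below); the 10-element w_sub and w_obj vectors are assumed to come from the AHP and entropy steps:

```python
import numpy as np

# Constellation-specific weights as stated in the text (order G, R, E, C).
w_sys = {"G": 0.4, "R": 0.2, "E": 0.2, "C": 0.2}

def fuse_weights(w_sub, w_obj, alpha=0.7):
    """Weighted average of subjective and objective weights, renormalized."""
    w = alpha * np.asarray(w_sub) + (1 - alpha) * np.asarray(w_obj)
    return w / w.sum()

def final_weights(w_comb_per_sys, w_sys):
    """Scale each constellation's 10 combined weights by its system weight
    and concatenate into one 40-element vector (order G, R, E, C)."""
    return np.concatenate([w_sys[q] * np.asarray(w_comb_per_sys[q])
                           for q in "GREC"])
```

Because each constellation's combined weights sum to one and the system weights themselves sum to one, the final 40-element vector also sums to one.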
(7) Calculation of the Final TOPSIS Score. A weighted normalized decision matrix Z is constructed using the final metric weights, that is, $z_{ij} = w_j^{\mathrm{final}}\, x_{\mathrm{norm},ij}$, where $z_{ij}$ represents the weighted value of the j-th indicator for the i-th station. The ideal and negative-ideal solutions are then determined as described in Section 2.2, and the final TOPSIS score $C_i$ of the i-th station, where $C_i \in [0, 1]$, is calculated.
(8) Station Quality Classification. Finally, stations are classified into four quality levels based on their TOPSIS scores: Excellent (C ≥ 0.8), Good (0.6 ≤ C < 0.8), Fair (0.4 ≤ C < 0.6), and Poor (C < 0.4). The thresholds were derived empirically from the statistical distribution of the scores. Our analysis indicated that stations with scores above 0.8 generally exhibit high data integrity, whereas scores between 0.6 and 0.8 often signify missing data for a specific GNSS system or for a particular frequency within a system. This indicates that the TOPSIS score can serve as a preliminary diagnostic tool to identify potential issues in a station’s data.
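Steps (7) and (8) can be sketched as follows, assuming the standard TOPSIS closeness coefficient with Euclidean distances to the ideal and negative-ideal solutions (the paper's exact definitions are given in Section 2.2); the small constant in the denominator is an illustrative guard against division by zero:

```python
import numpy as np

def topsis_scores(Z):
    """Closeness coefficients C_i from a weighted normalized matrix Z
    (stations x metrics), with all columns oriented so larger is better."""
    Z = np.asarray(Z, dtype=float)
    v_pos, v_neg = Z.max(axis=0), Z.min(axis=0)   # ideal / negative-ideal
    d_pos = np.linalg.norm(Z - v_pos, axis=1)     # distance to ideal
    d_neg = np.linalg.norm(Z - v_neg, axis=1)     # distance to negative-ideal
    return d_neg / (d_pos + d_neg + 1e-12)        # C_i in [0, 1]

def quality_level(c):
    """Map a TOPSIS score to the four quality classes defined in step (8)."""
    return ("Excellent" if c >= 0.8 else
            "Good" if c >= 0.6 else
            "Fair" if c >= 0.4 else "Poor")
```

A station equal to the ideal solution in every metric scores 1, one equal to the negative-ideal scores 0, and everything else falls in between.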
The entire quality scoring procedure is summarized in Algorithm 1.
| Algorithm 1: TOPSIS_SCORE |
| Input: |
| Station metric matrix X (dimensions: n stations × 40 metrics); Set of constellation identifiers Q = {G, R, E, C}; Metric directionality vector dir (benefit-type/cost-type); Subjective weight vectors wsub (from AHP); Constellation weights wsys; Subjective weight coefficient α = 0.7 |
| Output: |
| A CSV file containing station names, coordinates, TOPSIS scores, and quality levels |
| Procedure: |
| X^ ← directional_normalize(X, dir) |
| for s in Q: |
| sub ← columns of X^ in system s (10 cols) |
| wobj ← entropy_weight(sub) |
| wcomb ← normalize(α·wsub + (1 − α)·wobj) |
| apply column weights wsys[s]*wcomb to sub → Ys |
| Y ← concat(Ys) by columns |
| v+ ← colwise max(Y); v− ← colwise min(Y) |
| for each station i: |
| Di+ ← ‖Yi − v+‖; Di− ← ‖Yi − v−‖; Ci ← Di−/(Di+ + Di−) |
| Rank stations by Ci and assign quality levels based on predefined thresholds |
| Write results to a CSV file |
2.4.3. Clustering and Station Selection
(1) Normalization to the Unit Sphere. Acknowledging that the station distribution lies on a sphere rather than a plane, the k-means algorithm is adapted to use spherical distance. This requires first normalizing the three-dimensional Earth-Centered, Earth-Fixed (ECEF) coordinates of each station onto a unit sphere:

$$\hat{x}_i = \frac{x_i}{\lVert x_i \rVert}$$

where $x_i$ represents the ECEF coordinates of the i-th station, and $\hat{x}_i$ denotes its corresponding coordinates on the unit sphere.
(2) Initialization via Spherical k-means++. To mitigate the sensitivity of k-means to initial conditions, the k-means++ algorithm [19] is used to determine an intelligent initial set of k cluster centroids. The procedure (summarized in Algorithm 2) begins by selecting the first centroid uniformly at random. Each subsequent centroid is then chosen from the remaining stations with a probability proportional to the squared spherical distance to the nearest existing centroid; this is repeated until all k centroids are selected. For the selection of the j-th cluster centroid, the spherical distance from each point to its nearest previously selected centroid is calculated as follows:

$$D_i = \min_{1 \le l \le j-1} \arccos\left(\hat{x}_i \cdot c_l\right)$$

where $D_i$ denotes the minimum spherical distance from the i-th point to the set of the first j − 1 selected centroids. The probability of selecting a point as the next centroid is then calculated using the following formula:

$$p_i = \frac{D_i^2}{\sum_{l=1}^{n} D_l^2}$$
The entire procedure for the spherical distance-based k-means++ initialization is summarized in Algorithm 2.
| Algorithm 2: Spherical K-Means Plus Plus |
| Input: |
| Normalized station coordinates (dimensions: n stations × 3); the number of clusters k. |
| Output: |
| A set of k centroids {c1 … ck} |
| Procedure: |
| pick first center c1 uniformly at random |
| for t = 2…k: |
| Di ← minc arccos(clip(dot(x^i, c), −1, 1)) |
| pi ← Di² / (∑j Dj²) |
| sample ct ~ Categorical(p) |
| return centers {c1…ck} |
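Algorithm 2 translates almost directly into NumPy. The seeding interface (a `seed` argument) is an illustrative choice:

```python
import numpy as np

def spherical_kmeanspp(xhat, k, seed=0):
    """k-means++ seeding with great-circle (arc) distance on the unit
    sphere. xhat: (n, 3) array of unit vectors; returns k initial
    centroids drawn from the stations themselves."""
    rng = np.random.default_rng(seed)
    n = xhat.shape[0]
    centers = [xhat[rng.integers(n)]]                  # first centroid: uniform
    for _ in range(1, k):
        cos = np.clip(xhat @ np.array(centers).T, -1.0, 1.0)
        d = np.arccos(cos).min(axis=1)                 # distance to nearest centroid
        p = d**2 / (d**2).sum()                        # D²-proportional sampling
        centers.append(xhat[rng.choice(n, p=p)])
    return np.array(centers)
```

Points that already serve as centroids have zero distance and therefore zero selection probability, so the same station is never chosen twice.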
(3) Spherical k-means Iteration and Final Selection. The final selection stage involves three key aspects:
(a) Assignment Step: each station is assigned to the nearest centroid using spherical distance as the metric:

$$l_i = \arg\min_{1 \le j \le k} \arccos\left(\hat{x}_i \cdot c_j\right)$$

where $l_i$ denotes the cluster assignment of the i-th station, and k is the desired number of clusters (equivalent to the number of selected stations).
(b) Update Step: the new centroid of each cluster is calculated as the vector mean of all station coordinates within that cluster, which is then re-normalized to the unit sphere:

$$c_j = \frac{\frac{1}{|S_j|}\sum_{i \in S_j} \hat{x}_i}{\left\lVert \frac{1}{|S_j|}\sum_{i \in S_j} \hat{x}_i \right\rVert}$$

where $c_j$ is the updated centroid of the j-th cluster, $S_j$ represents the set of stations belonging to cluster j, and $|S_j|$ denotes the number of stations in that cluster.
(c) Optimality Criterion: the standard k-means procedure is iterated until convergence. To ensure a robust result, this entire process, from initialization to convergence, is repeated independently R times. The optimal clustering result is identified by minimizing the inertia J, defined as the sum of squared spherical distances of stations to their respective cluster centroids:

$$J = \sum_{i=1}^{n} \arccos\left(\hat{x}_i \cdot c_{l_i}\right)^2$$

where $c_{l_i}$ is the centroid of the cluster to which the i-th station belongs. The final station list, $S_{\mathrm{best}}$, is composed of the highest-scoring stations from the clusters of the run that yields the minimum inertia value.
The complete algorithm for clustering-based station selection is summarized in Algorithm 3.
| Algorithm 3: K-Means Based Station Selection |
| Input: |
| DataFrame df containing station coordinates, TOPSIS scores, etc.; Desired number of stations k; Number of independent runs R |
| Output: |
| The final optimal station list Sbest. |
| Procedure: |
| coords ← df.xyz; q ← df.topsis_score |
| x^ ← row_normalize(coords) |
| for r in 1…R: |
| c0 ← SphericalKMeansPlusPlus(x^, k) (Algorithm 2) |
| labels, centers ← KMeans(x^, init = c0) |
| Jr ← ∑i arccos(clip(dot(x^i, normalize(centers[labelsi])), −1, 1))² |
| Sr ← argmaxi qi within each cluster |
| r* ← argminrJr |
| Sbest ← Sr* |
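Algorithm 3 can be sketched as below. As a simplification so the sketch stands alone, each run is seeded by uniform random sampling rather than the spherical k-means++ of Algorithm 2, the iteration cap is arbitrary, and all names are illustrative:

```python
import numpy as np

def spherical_kmeans_select(xhat, scores, k, R=10, seed=0):
    """Run spherical k-means R times and, within the minimum-inertia run,
    keep the highest-scoring station of each cluster.
    xhat: (n, 3) unit vectors; scores: per-station TOPSIS scores."""
    rng = np.random.default_rng(seed)
    n = xhat.shape[0]
    best_J, best_labels = np.inf, None
    for _ in range(R):
        centers = xhat[rng.choice(n, size=k, replace=False)]  # simple seeding
        for _ in range(100):                            # Lloyd iterations
            cos = np.clip(xhat @ centers.T, -1.0, 1.0)
            labels = cos.argmax(axis=1)                 # nearest = largest cosine
            new = centers.copy()
            for j in range(k):
                members = xhat[labels == j]
                if len(members):
                    m = members.mean(axis=0)
                    new[j] = m / np.linalg.norm(m)      # re-project onto sphere
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.clip(xhat @ centers.T, -1.0, 1.0).argmax(axis=1)
        d = np.arccos(np.clip((xhat * centers[labels]).sum(axis=1), -1.0, 1.0))
        J = (d**2).sum()                                # spherical inertia
        if J < best_J:
            best_J, best_labels = J, labels
    picks = []
    for j in range(k):                                  # best station per cluster
        idx = np.flatnonzero(best_labels == j)
        if idx.size:
            picks.append(int(idx[np.argmax(np.asarray(scores)[idx])]))
    return sorted(picks)
```

Minimizing arccos of the dot product is equivalent to maximizing the dot product itself, which is why the assignment step uses `argmax` over cosines instead of recomputing arc distances.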