Article

Geospatial Clustering of GNSS Stations Using Unsupervised Learning: A Statistical Framework to Enhance Deformation Analysis for Environmental Risk Management

by Daniel Álvarez-Ruiz, Alberto Sánchez-Alzola * and Andrés Pastor-Fernández
School of Engineering, University of Cádiz, 11519 Puerto Real, Cádiz, Spain
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(5), 855; https://doi.org/10.3390/math14050855
Submission received: 30 December 2025 / Revised: 19 February 2026 / Accepted: 25 February 2026 / Published: 3 March 2026

Abstract

The global expansion of continuous GNSS networks has generated large-scale spatiotemporal datasets whose analysis requires robust mathematical and statistical tools. This study introduces a geospatial, multivariate statistical framework for classifying 21,548 GNSS stations from the University of Nevada repository. The methodology integrates harmonic regression, stochastic noise modeling, quality assessment, and slope estimation into a unified feature space suitable for high-dimensional analysis. Using unsupervised clustering implemented in custom code built entirely on free and open-source software, we identify homogeneous station groups that reflect dominant signal properties (periodicity, noise structure, data quality, and long-term velocity) together with their spatial context. The resulting clusters exhibit strong mathematical coherence and reveal continental-scale patterns driven by seasonal forcing, tectonic regime, climatic variability, and monument stability. By grouping stations with similar statistical behavior, the proposed framework improves reference-site selection, enhances deformation-field interpretation, and supports the detection of anomalous or hazard-related behavior. Overall, this approach provides a scalable, data-driven mathematical tool for analyzing complex spatiotemporal signals and contributes to more reliable deformation modeling and environmental risk assessment.
MSC:
62H30; 62M10; 86A32

1. Introduction

In recent years, permanent GNSS positioning networks have expanded remarkably [1,2]. Since the first permanent stations of international networks such as the International GNSS Service (IGS) and the European Reference Frame (EUREF) were established at the end of the 20th century, more and more countries have deployed denser and more widespread positioning networks [3]. Currently, more than 20,000 stations are registered, and their data are freely available online from institutions and governments around the world. These stations span a wide range of accuracy and are used to realize reference frames for applications in the geosciences [4].
The widespread use of permanent GNSS networks has been vital to geodynamic advances in determining crustal deformation and in the study of earthquakes and active volcanism [1,5,6]. In regions with large numbers of permanent stations, such as North America, Europe, and Japan, dense GNSS networks have greatly helped institutions and governments model seismic and volcanic risk [5,6].
Accurate characterization of GNSS time series is essential for obtaining reliable deformation models, as their quality is influenced by multiple factors including noise level, periodicity, data gaps, time-series length, and station stability [7,8]. Stations affected by horizon obstructions often exhibit elevated noise levels [9], while those installed on buildings or elevated structures may be impacted by thermal expansion or local monument deformation. These site-specific effects must be carefully assessed to ensure that geodynamic interpretations derived from GNSS observations remain robust and scientifically defensible [10].
Recent research has placed considerable emphasis on improving the stochastic modeling of continuous GNSS (CGNSS) time series. Several studies have demonstrated the importance of accounting for temporally correlated and heteroscedastic noise to enhance the reliability of velocity estimates and the detection of subtle geophysical signals [11]. More sophisticated stochastic frameworks, such as autoregressive (AR) and generalized autoregressive conditional heteroscedasticity (GARCH) models, have been proposed to better capture the complex noise behavior observed in GNSS data [2]. Additionally, the spatial variability of noise characteristics and the influence of monument stability and environmental conditions on the stochastic properties of GNSS time series have been increasingly recognized [12].
Beyond these stochastic considerations, continuous GNSS time series are also affected by strong periodic signals, equipment-related discontinuities (e.g., antenna or receiver changes), and local site instabilities. Noise itself can be classified according to its spectral properties: white noise exhibits no temporal correlation, whereas real GNSS observations commonly contain colored noise components such as flicker and random walk. Flicker noise reflects long-term temporal correlations, while random-walk noise introduces cumulative deviations that may bias long-term displacement estimates [13]. Data gaps and local deformations further distort the temporal structure of the signal, complicating the extraction of geophysical information. Proper identification and modeling of these effects are therefore crucial to ensure the robustness and geodynamic reliability of GNSS-based interpretations [14].
Statistics provides essential tools for the analysis of geodetic time series [15]. Multivariate statistical approaches, including unsupervised machine-learning techniques, can be applied to group and classify stations according to the quality of their time series [16,17,18]. Such classifications offer valuable support to users by enabling a clearer assessment of station quality based on the cluster to which each station is assigned.
Cluster analysis and discriminant analysis are widely used statistical tools in the processing of GNSS time series, aimed at identifying patterns, classifying stations, and improving geodynamic interpretations [19,20]. Cluster analysis enables the grouping of geodetic stations that exhibit similar behavior in terms of displacement, noise levels, or periodicities, without requiring prior information about existing categories [14]. This is particularly useful for detecting regions with homogeneous tectonic responses or identifying stations affected by similar local conditions [21]. In contrast, discriminant analysis constructs functions that separate previously defined groups, facilitating the classification of new stations or validating groupings obtained through other methods [16]. When applied together, these methods provide valuable insights into the spatial coherence of geodetic motion, the quality of the time series, and the characterization of local or regional geophysical phenomena [22].
Despite the significant advances achieved by existing statistical and machine-learning approaches applied to GNSS time series, several limitations remain when addressing large-scale and heterogeneous networks. Many state-of-the-art studies focus on specific aspects of the signal—such as stochastic noise modeling [8], seasonal signal extraction [23], or velocity estimation—often relying on sophisticated or computationally demanding methods applied to restricted regions or limited numbers of stations. While these approaches provide high-quality results at local or regional scales, their complexity can reduce transparency, reproducibility, and scalability when extended to continental or global datasets.
Furthermore, previous classification and clustering efforts frequently rely on a reduced set of variables or treat data quality, noise characteristics, and deterministic signals separately, limiting the construction of an integrated and interpretable representation of station behavior. This fragmentation complicates cross-regional comparisons and restricts the practical use of classification results for network management, reference-station selection, and large-scale geodynamic interpretation [8,11]. These considerations motivate the need for a unified, robust, and reproducible multivariate framework that deliberately favors well-established and interpretable statistical models, and that can be systematically applied to tens of thousands of GNSS stations using freely available software tools.
In this study, we develop and apply a scalable multivariate statistical framework for the systematic classification of 21,548 GNSS stations listed in the University of Nevada database (https://geodesy.unr.edu/, accessed on 24 February 2026). The methodology is based on four key parameters—noise, slope, periodicity, and quality—directly derived from the harmonic modeling of GNSS time series and representative of their mathematical and physical properties.
Python scripts were used to automatically retrieve and organize the stations by country, followed by a filtering procedure to remove sites with known issues, short observational records, or anomalous values. The periodicity, stochastic noise characteristics, quality indicators, and long-term velocity (slope) were then computed using custom R scripts specifically developed for this work. These descriptors were integrated into a unified feature space enabling large-scale unsupervised analysis. Clustering techniques were applied to identify coherent and physically interpretable station families, and discriminant analysis was used to evaluate cluster separability.
The principal contributions of this work include (i) the construction of quantitative variables capturing periodic behavior, noise structure, instrumental stability, and long-term deformation trends; (ii) the identification of statistically coherent cluster families reflecting dominant signal characteristics; and (iii) the demonstration that these clusters exhibit spatial patterns governed by geodynamic, climatic, and operational factors. Overall, the study shows that multivariate statistical classification provides a transferable framework for improving the interpretation, reliability, and practical use of GNSS time series in geophysical applications.
This research addresses the following questions:
  • RQ1: Is it possible to classify GNSS stations into homogeneous groups using time-series slope, periodicity, noise, and quality indicators?
  • RQ2: Is there a relationship between the geographic location of stations and their statistical classification?
  • RQ3: Can the cluster profiles of CGNSS stations serve as a statistical framework to enhance deformation analysis and risk management?

2. Background and Theoretical Context

Continuous GNSS time series contain a mixture of deterministic signals and stochastic noise whose proper characterization is essential for extracting reliable geophysical information. This section introduces the theoretical foundations required to interpret these components, covering the harmonic modeling of periodic signals, the main noise processes affecting CGNSS data, and the quality factors that influence the stability and reliability of long-term geodetic records.

2.1. Harmonic Regression Model for Periodic GNSS Signals

In this study, the periodic components of GNSS time series are modeled using a linear harmonic regression framework. This model is designed to represent regular oscillations in the signal associated with cyclic environmental and geophysical influences. The general form of the model for a GNSS coordinate component $y(t)$ is:
$$y(t) = \beta_0 + \beta_1 t + a_1 \sin(2\pi t) + b_1 \cos(2\pi t) + a_2 \sin(4\pi t) + b_2 \cos(4\pi t) + \delta \cdot J(t) + \varepsilon(t), \tag{1}$$
where
  • $t$ denotes time expressed in decimal years,
  • $\beta_0$ and $\beta_1$ represent the intercept and the linear trend (velocity),
  • $a_1$ and $b_1$ correspond to the annual harmonic component,
  • $a_2$ and $b_2$ correspond to the semiannual harmonic component,
  • $J(t)$ is an indicator function that takes the value 1 when a discontinuity (jump) occurs at time $t$, and 0 otherwise,
  • $\delta$ represents the magnitude of the discontinuity,
  • $\varepsilon(t)$ denotes the stochastic residual component (noise).
In practice, the jump indicator function $J(t)$ is implemented using a tolerance window around known change epochs, such as antenna replacements or monumentation modifications. When multiple discontinuities are present, the model is extended by introducing several indicator functions $J_1(t), J_2(t), \ldots$, each associated with a corresponding jump coefficient $\delta_1, \delta_2, \ldots$, allowing the model to account for multiple abrupt changes in the GNSS time series.
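Because the model in Equation (1) is linear in its coefficients, it can be fitted by ordinary least squares. The Python sketch below (standard library only) illustrates that idea and is not the authors' R implementation. It builds the design matrix, solves the normal equations, and returns the coefficients and residuals. Two choices are our own assumptions. First, each offset epoch $t_j$ enters as a Heaviside step $H(t - t_j)$, which may differ from the tolerance-window implementation described above. Second, time is centred at its mean, so `beta0` is the intercept at that reference epoch. The helper names are hypothetical.

```python
import math


def design_row(t, t_ref, jumps):
    """One row of the design matrix for Equation (1)."""
    row = [
        1.0,                        # beta0 (intercept at t_ref)
        t - t_ref,                  # beta1 (velocity)
        math.sin(2 * math.pi * t),  # a1 (annual)
        math.cos(2 * math.pi * t),  # b1 (annual)
        math.sin(4 * math.pi * t),  # a2 (semiannual)
        math.cos(4 * math.pi * t),  # b2 (semiannual)
    ]
    # Assumption: each offset epoch t_j is modeled as a Heaviside step H(t - t_j).
    row += [1.0 if t >= tj else 0.0 for tj in jumps]
    return row


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_harmonic(t, y, jumps=()):
    """OLS fit of Equation (1) via the normal equations; returns (coefficients, residuals)."""
    t_ref = sum(t) / len(t)  # centring time keeps the normal equations well conditioned
    X = [design_row(ti, t_ref, jumps) for ti in t]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    names = ["beta0", "beta1", "a1", "b1", "a2", "b2"]
    names += [f"delta{k + 1}" for k in range(len(jumps))]
    residuals = [yi - sum(bk * xk for bk, xk in zip(beta, row)) for row, yi in zip(X, y)]
    return dict(zip(names, beta)), residuals


# Usage: four years of synthetic daily epochs (decimal years) with one offset at 2022.0.
t = [2020 + i / 365.25 for i in range(1461)]
y = [0.003 * (ti - 2020) + 0.002 * math.sin(2 * math.pi * ti)
     + (0.004 if ti >= 2022.0 else 0.0) for ti in t]
coef, res = fit_harmonic(t, y, jumps=(2022.0,))
print({k: round(v, 6) for k, v in coef.items()})
```

For tens of thousands of stations, a QR-based solver such as R's `lm` would be the more robust choice. The explicit normal equations are used here only to keep the sketch dependency-free.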

2.2. Noise in CGNSS Time Series

To properly model and interpret GNSS time series, it is essential to understand the different types of noise that may affect the data. These noise processes differ in their temporal correlation structure and spectral properties, which have important implications for velocity estimation, uncertainty quantification, and geophysical interpretation.

2.2.1. White Noise

Definition 1 (White Noise). Let $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. The sequence $\{\varepsilon_t\}$ is said to be a white noise process if it satisfies the following conditions:
1. Zero mean: $E[\varepsilon_t] = 0, \quad \forall t \in \mathbb{Z}$.
2. Constant variance (homoscedasticity): $\mathrm{Var}[\varepsilon_t] = \sigma^2 < \infty, \quad \forall t \in \mathbb{Z}$.
3. No temporal correlation (lack of memory):
$$\mathrm{Cov}(\varepsilon_t, \varepsilon_s) = E[\varepsilon_t \varepsilon_s] = \begin{cases} \sigma^2, & \text{if } t = s, \\ 0, & \text{if } t \neq s. \end{cases}$$
Remark 1. If, in addition, the variables $\varepsilon_t$ are independent and identically distributed (i.i.d.), then the process is called i.i.d. white noise. In the particular case where the distribution is normal $N(0, \sigma^2)$, the process is referred to as Gaussian white noise: $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$ [24].
White noise is a weakly stationary stochastic process characterized by constant mean and variance, and zero autocorrelation between time points. Its power spectral density is flat, indicating equal energy across all frequencies and the absence of periodic or temporal structure. Due to the lack of correlation, white noise is inherently unpredictable. Figure 1 shows the CGNSS station LINO, located in Great Britain on the stable western margin of the Eurasian plate. The vertical component (Up) exhibits high white noise levels, minimal trend, and clear annual periodicity. We estimated a white noise amplitude of $(5.43 \pm 0.11) \times 10^{-3}$ m.
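Condition 3 of Definition 1 can be checked empirically through the sample autocorrelation. The short Python sketch below uses only the standard library. The seed, the series length, and σ = 1 are arbitrary choices of ours, not values from this study. It simulates Gaussian white noise and computes its autocorrelation at a few lags. Apart from lag 0, these values should be close to zero.

```python
import random


def sample_autocorrelation(x, lag):
    """Sample autocorrelation at `lag` (biased estimator: both sums divided by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    ck = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n
    return ck / c0


rng = random.Random(42)  # arbitrary seed
eps = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # Gaussian white noise, sigma = 1
for lag in (1, 2, 5):
    print(lag, round(sample_autocorrelation(eps, lag), 4))  # near 0 for white noise
```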

2.2.2. Random Walk Process

Definition 2 (Random Walk Process). Let $\{X_t\}_{t \in \mathbb{Z}}$ be a stochastic process defined recursively by:
$$X_t = X_{t-1} + \varepsilon_t, \quad \text{with } \varepsilon_t \sim \mathrm{WN}(0, \sigma^2),$$
where $\{\varepsilon_t\}$ is a white noise process with zero mean and constant variance $\sigma^2$.
The process $\{X_t\}$ is called a random walk process, also known as cumulative noise. For $t \geq 1$, it can be expressed as:
$$X_t = X_0 + \sum_{k=1}^{t} \varepsilon_k.$$
The random walk process is a non-stationary stochastic model characterized by constant mean but linearly increasing variance over time. Its values exhibit persistent temporal dependence, as past perturbations accumulate and influence future states. Consequently, the autocovariance depends not only on the lag but also on the time index [25].
Spectrally, the process concentrates energy in low frequencies, with a power spectral density that decays approximately as the inverse square of frequency, which is typical of red or Brownian noise. This long-range memory implies that distant past values can still affect current observations. Such behavior is especially relevant in geodetic applications, where cumulative effects like tectonic drift or sensor instability can bias trend estimations if not properly accounted for. Identifying random walk noise is therefore essential for reliable GNSS time series analysis. Figure 1 shows CGNSS station X086, located in Okinawa (Japan) on an active tectonic boundary. The East component exhibits high random walk noise, with no clear linear trend and strong autocorrelation. We estimated a random walk noise amplitude of $(5.13 \pm 0.13) \times 10^{-2}$ m.
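The linear growth of the variance in Definition 2 can be illustrated by simulation. In the sketch below, which is ours, the seed is arbitrary, σ = 1, there are 2000 realisations of length 100, and $X_0 = 0$. It builds random walks by cumulative summation of Gaussian white noise and compares the ensemble variance of $X_t$ with the theoretical value $t\sigma^2$.

```python
import random


def random_walk(n, sigma, rng, x0=0.0):
    """X_t = X_{t-1} + eps_t, i.e. X_t = X_0 + sum_{k<=t} eps_k (Definition 2)."""
    x, path = x0, []
    for _ in range(n):
        x += rng.gauss(0.0, sigma)
        path.append(x)
    return path


def variance_across_paths(paths, t):
    """Unbiased ensemble variance of X_t over independent realisations (t is 1-based)."""
    vals = [p[t - 1] for p in paths]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


rng = random.Random(7)  # arbitrary seed
paths = [random_walk(100, 1.0, rng) for _ in range(2000)]
for t in (10, 50, 100):
    # Theory: Var[X_t] = t * sigma**2, so these should be roughly 10, 50 and 100.
    print(t, round(variance_across_paths(paths, t), 1))
```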

2.2.3. Flicker Noise

Definition 3 (Flicker Noise). Let $\{X_t\}_{t \in \mathbb{Z}}$ be a zero-mean stochastic process with finite variance over finite time intervals. Denote
$$\gamma(\tau) = E[X_t X_{t+\tau}],$$
where $\gamma(\tau)$ denotes the (formal) autocovariance function of the process, whenever it exists.
The power spectral density (PSD) of the process is defined as the discrete-time Fourier transform of the autocovariance function [26,27]:
$$f(\lambda) = \frac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} \gamma(\tau)\, e^{-i\tau\lambda}, \quad \lambda \in [-\pi, \pi].$$
The process $\{X_t\}$ is said to exhibit flicker noise (or $1/f$ noise) if its power spectral density satisfies the scaling law
$$f(\lambda) \propto \frac{1}{|\lambda|}$$
for frequencies $\lambda$ within a finite interval $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
Flicker noise represents an intermediate stochastic behavior between white noise and random walk noise. Unlike white noise, it exhibits long-range temporal dependence, while, in contrast to random walk processes, its variance does not grow unbounded with time. From a theoretical perspective, an ideal 1 / f spectrum is not integrable at the origin, implying that a pure flicker noise process is not strictly stationary. In practical applications, flicker noise is therefore interpreted as an approximate spectral behavior observed over a limited frequency band.
Spectrally, flicker noise concentrates more energy at low frequencies than white noise but less than random walk noise. This slow decay of the spectrum reflects persistent correlations across multiple time scales, making flicker noise particularly relevant in geodetic time series, where it can significantly bias velocity estimates if neglected. Figure 1 shows the CGNSS station P566, where the vertical component exhibits a clear $1/f$-type spectral decay, indicative of flicker noise. We estimated a flicker noise amplitude of $(3.71 \pm 0.07) \times 10^{-3}$ m.
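One common way to relate the three noise types, not necessarily one used in this study, is to generate power-law noise with PSD proportional to $f^{\kappa}$. This is done by filtering white noise with the fractional-integration coefficients of $(1 - B)^{-d}$, where $d = -\kappa/2$: $h_0 = 1$, $h_k = h_{k-1}(k - 1 - \kappa/2)/k$. With $\kappa = 0$ the filter leaves white noise unchanged, $\kappa = -2$ gives a random walk, and $\kappa = -1$ gives flicker noise. The Python sketch below, with hypothetical function names, makes that intermediate position explicit.

```python
import random


def powerlaw_filter(n, kappa):
    """Impulse response h_0..h_{n-1} for power-law noise with PSD ~ f**kappa."""
    h = [1.0]
    for k in range(1, n):
        h.append(h[-1] * (k - 1 - kappa / 2.0) / k)
    return h


def powerlaw_noise(n, kappa, sigma, rng):
    """Convolve Gaussian white noise with the power-law filter (O(n^2), fine for a sketch)."""
    h = powerlaw_filter(n, kappa)
    w = [rng.gauss(0.0, sigma) for _ in range(n)]
    return [sum(h[j] * w[i - j] for j in range(i + 1)) for i in range(n)]


for kappa in (0.0, -1.0, -2.0):  # white, flicker, random walk
    print(kappa, [round(c, 4) for c in powerlaw_filter(5, kappa)])

rng = random.Random(3)  # arbitrary seed
flicker = powerlaw_noise(500, -1.0, 1.0, rng)
```

With $\kappa = -1$ the coefficients decay slowly (0.5, 0.375, 0.3125, ...), so past innovations keep influencing the present. With $\kappa = -2$ every coefficient equals 1, which is exactly the cumulative sum in Definition 2.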

2.2.4. Noise Estimation Methodology

The following noise estimates should be interpreted as empirical noise metrics derived from the residual series, rather than as strict identification of ideal stochastic processes. The stochastic component $\varepsilon(t)$ in Equation (1) is estimated from the residuals obtained after fitting the deterministic harmonic regression model using ordinary least squares (OLS), a standard approach in GNSS time-series analysis [28]. Three stochastic noise processes are quantified: white noise, flicker (low-frequency) noise, and random walk noise. Each component is estimated using complementary time-domain and frequency-domain methods applied to the regression residuals.
  • White noise amplitude is computed as the standard deviation of the residual series, representing the high-frequency uncorrelated component.
  • Flicker (low-frequency) noise is evaluated in the spectral domain using the power spectral density (PSD) of the residuals. The noise level is approximated from the spectral energy concentrated in the lowest frequency band, capturing long-period correlated variability consistent with the 1 / f behavior commonly observed in geodetic time series.
  • Random walk noise is estimated by fitting an ARIMA(0, 1, 0) model to the residual series using maximum likelihood estimation. The square root of the innovation variance of the fitted model provides an estimate of the random walk noise amplitude.
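The three metrics listed above can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' R code. The white-noise amplitude is the sample standard deviation. The flicker proxy uses a naive periodogram and averages the lowest 5% of Fourier frequencies; the band width `fraction` is a hypothetical choice, since the text does not specify the exact band. The random-walk amplitude relies on the fact that, for an ARIMA(0, 1, 0) model without drift, the maximum-likelihood innovation variance equals the mean of the squared first differences.

```python
import cmath
import math
import random


def white_noise_amplitude(res):
    """Sample standard deviation of the residual series."""
    n = len(res)
    m = sum(res) / n
    return math.sqrt(sum((r - m) ** 2 for r in res) / (n - 1))


def periodogram(res):
    """Raw periodogram at Fourier frequencies k/n, k = 1..n//2 (naive DFT, O(n^2))."""
    n = len(res)
    m = sum(res) / n
    out = []
    for k in range(1, n // 2 + 1):
        s = sum((r - m) * cmath.exp(-2j * math.pi * k * i / n) for i, r in enumerate(res))
        out.append(abs(s) ** 2 / n)
    return out


def low_frequency_level(res, fraction=0.05):
    """Hypothetical flicker proxy: sqrt of mean power in the lowest `fraction` of frequencies."""
    p = periodogram(res)
    m = max(1, int(len(p) * fraction))
    return math.sqrt(sum(p[:m]) / m)


def random_walk_amplitude(res):
    """ARIMA(0,1,0) without drift: MLE innovation variance is the mean squared first difference."""
    d = [b - a for a, b in zip(res, res[1:])]
    return math.sqrt(sum(x * x for x in d) / len(d))


rng = random.Random(1)  # arbitrary seed
wn = [rng.gauss(0.0, 1.0) for _ in range(512)]  # white noise, sigma = 1
rw, level = [], 0.0
for e in wn:  # random walk built from the same innovations
    level += e
    rw.append(level)
for name, series in (("white", wn), ("random walk", rw)):
    print(name, round(white_noise_amplitude(series), 3),
          round(low_frequency_level(series), 3), round(random_walk_amplitude(series), 3))
```

Applied to white noise, the ARIMA(0, 1, 0) metric returns roughly $\sqrt{2}\,\sigma$ rather than zero, because differencing uncorrelated values doubles their variance. The three metrics are therefore complementary descriptors rather than a decomposition of the residual variance.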
We note that the adequacy of the noise characterization was assessed through inspection of the residual autocorrelation structure and spectral behavior. The identification of white noise, flicker noise, and random walk noise therefore relies on their expected statistical and spectral signatures. Several factors may affect the accuracy of noise quantification, including limited time-series length, data gaps, imperfect modeling of discontinuities, seasonal signal leakage into the residuals, and irregular temporal sampling. These effects may introduce bias in spectral estimates and influence the clustering results derived from noise-related features. Despite these limitations, this empirical characterization provides a consistent and scalable framework for comparing noise behavior across thousands of CGNSS stations.

2.3. Periodicity in CGNSS Time Series

GNSS time series typically exhibit both deterministic and stochastic components. Among the deterministic components, periodic signals play a central role in modeling the geophysical behavior of the stations. These signals are often associated with environmental effects such as seasonal loading (hydrological, atmospheric or snow-related), thermal expansion of the monumentation, and systematic errors in satellite orbits or equipment calibration [29]. From a mathematical perspective, periodicity in the GNSS signal is often represented using harmonic components, which are modeled as sinusoidal functions with specific frequencies and phases. The most prominent periodicities observed in CGNSS data are the annual and semiannual cycles [23].
In this study, only the annual and semiannual harmonic components were considered in the regression model because they represent the dominant and most stable seasonal signals consistently observed in CGNSS time series [30,31]. Higher-order harmonics generally exhibit smaller amplitudes, lower temporal stability, and greater sensitivity to data gaps and noise contamination. Including additional frequencies would therefore increase model complexity without significantly improving the characterization of periodic behavior in large-scale GNSS station datasets. This parsimonious representation is commonly adopted in GNSS time-series modeling.
Figure 2 illustrates representative examples of the different types of periodic behavior observed in the CGNSS time series. These cases highlight how stations may exhibit low periodic content, strong annual signals, or pronounced semiannual components depending on their geophysical and environmental context. In Equation (1), the sinusoidal terms of the model represent the annual and semi-annual periodic components of the time series.
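The annual and semiannual terms in Equation (1) are written as sine/cosine pairs, and each pair can be rewritten as a single amplitude and phase, which are often easier to compare across stations. Since $A\sin(\omega t + \varphi) = A\cos\varphi\,\sin(\omega t) + A\sin\varphi\,\cos(\omega t)$, we have $a = A\cos\varphi$ and $b = A\sin\varphi$. Hence $A = \sqrt{a^2 + b^2}$ and $\varphi = \operatorname{atan2}(b, a)$. The Python helper below (a hypothetical name) implements this identity as one possible way to summarize the periodic terms.

```python
import math


def amplitude_phase(a, b):
    """Return (A, phi) such that a*sin(w t) + b*cos(w t) == A*sin(w t + phi)."""
    return math.hypot(a, b), math.atan2(b, a)


# Usage with hypothetical annual coefficients a1, b1 (metres).
A, phi = amplitude_phase(0.003, 0.004)
print(round(A, 6), round(phi, 4))
```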

2.4. Quality in CGNSS Time Series

In this study, the quality of continuous GNSS (CGNSS) time series is formally defined as a set of structural and observational descriptors that characterize the completeness, continuity, and instrumental stability of each station record. Specifically, the variables considered are time series length, data availability, and the presence of offsets. These quality descriptors are treated as indicators of dataset integrity and are mathematically independent from the stochastic noise metrics and deterministic signals used in other analyses.
Offsets arising from equipment changes, such as antenna or radome replacements, represent a major factor affecting CGNSS time series quality (Figure 3). These discontinuities introduce artificial jumps that, if not properly documented and corrected, may bias velocity estimates and long-term geophysical interpretations. Studies within the EUREF Permanent Network identify antenna-related changes as a primary source of offsets and highlight the importance of systematic metadata tracking and detection strategies [32,33].
Data gaps also reduce time series quality by limiting observational continuity and availability. Although missing observations can influence stochastic noise estimation—particularly colored noise components such as flicker and random walk [34]—they are considered here exclusively as indicators of observational completeness. Similarly, the length of the GNSS time series constrains the reliability of velocity estimation, seasonal signal separation, and stochastic interpretation, with longer records generally enabling more robust parameter estimation [34]. Consequently, time series length is incorporated as a fundamental descriptor of observational capacity.
To maintain consistency and comparability in the clustering framework, each quality descriptor is normalized and optionally weighted before integration into the multivariate analysis. Offsets are coded as the number of documented equipment changes, data availability is expressed as number of observations, and series length is normalized to a fixed scale. In this way, the quality feature contributes proportionally to cluster assignments without dominating or being dominated by noise metrics [35]. This ensures that the four primary feature domains—slope, periodicity, noise, and quality—remain conceptually distinct and separable in the feature space.
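The normalization step can be sketched as follows. The text does not specify the scaling method, so the choices here are our assumptions: min-max rescaling to [0, 1] and the dictionary keys `n_offsets`, `n_obs`, and `length_yr`. The optional weights multiply each rescaled descriptor.

```python
def min_max(values):
    """Rescale a list to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quality_features(stations, weights=None):
    """stations: list of dicts with hypothetical keys 'n_offsets', 'n_obs', 'length_yr'."""
    keys = ["n_offsets", "n_obs", "length_yr"]
    weights = weights or {k: 1.0 for k in keys}
    cols = {k: min_max([s[k] for s in stations]) for k in keys}
    return [{k: weights[k] * cols[k][i] for k in keys} for i in range(len(stations))]


stations = [
    {"n_offsets": 0, "n_obs": 1000, "length_yr": 3.0},
    {"n_offsets": 2, "n_obs": 1500, "length_yr": 4.5},
    {"n_offsets": 4, "n_obs": 2000, "length_yr": 6.0},
]
for row in quality_features(stations):
    print(row)
```

In this sketch, a larger `n_offsets` value means more equipment changes and therefore lower quality, while larger `n_obs` and `length_yr` values mean better coverage. Whether to invert any descriptor before clustering is a separate modeling choice.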

3. Materials and Methods

This study followed a systematic methodology encompassing the automated download, processing, filtering, and analysis of GNSS time-series data. First, the GNSS data were downloaded automatically. Next, the stations were labeled by country, and the files were adapted to the format of the SARI program [36]. We then computed the slopes, periodicities, and noise levels of the stations, filtering and cleaning the data to improve the analysis. Finally, we applied descriptive statistics and several multivariate techniques, such as clustering and discriminant analysis. The whole procedure is described in detail below.

3.1. Preprocessing and Download of the GNSS Data

Our work began with an automated download of the GNSS data from (https://geodesy.unr.edu/, accessed on 24 February 2026), a public repository provided by the UNAVCO/Nevada Geodetic Laboratory. A total of 21,548 geodetic stations were downloaded globally. A Python script was used to retrieve the time-series files for the East, North, and Up components; it used web scraping through the requests library to identify all accessible stations and systematically download the corresponding files. The downloaded files were adapted into a custom .tenv format to facilitate subsequent organization and processing.
The downloaded files were organized locally into country-specific folders based on each station's identification code. This classification used the information contained in each station's filename or download URL. Grouping by country enabled regional comparative analyses and facilitated quality control. The number of stations per country reflects the uneven development of GNSS networks: regions such as Europe, North America, and active seismic zones have denser networks than sparsely populated or lower-income areas.
We then adopted the analysis approach of SARI (https://sari-gnss.github.io/, accessed on 24 February 2026), a rigorous and widely used program for estimating parameters in geodetic time series. The ".tenv" files were converted into a format similar to the one used by this software. The conversion rewrote the data into columns of decimal dates and displacements (East, North, Up) in the required order, ensuring a uniform number of columns and consistent formatting.
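As an illustration of the conversion step, the sketch below turns (datetime, East, North, Up) records into whitespace-separated columns that begin with a decimal year. Several details are our own assumptions: the column order, the numeric precision, and the decimal-year convention (fraction of the elapsed calendar year). The exact input layout expected by SARI should be checked against its own documentation.

```python
from datetime import datetime


def decimal_year(dt):
    """Calendar datetime -> decimal year (fraction of the elapsed calendar year)."""
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    return dt.year + (dt - start) / (end - start)


def to_columns(records):
    """records: iterable of (datetime, east, north, up) -> whitespace-separated text lines."""
    return [f"{decimal_year(dt):.4f} {e:.5f} {n:.5f} {u:.5f}" for dt, e, n, u in records]


rows = [(datetime(2020, 1, 1), 0.001, -0.002, 0.003),
        (datetime(2021, 7, 2, 12), 0.004, -0.001, 0.002)]
print("\n".join(to_columns(rows)))
```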

3.2. Computation of Velocities, Periodicities, and Noise Levels

The computational workflow was implemented using both Python (version 3.12.4) and RStudio (version 2026.01.1+403). Python was used for automated data retrieval and dataset construction. In particular, the libraries requests, pandas, os, and time were used to download GNSS time-series files from the Nevada Geodetic Laboratory repository, organize them into country-based directories, and generate structured input datasets for subsequent analysis. The estimation of velocities, seasonal components, noise metrics, filtering procedures, clustering analyses, and visualizations were implemented in R (version 4.4.2). Data manipulation and preprocessing were performed using the tidyverse ecosystem, while clustering and statistical analyses relied on the packages stats, cluster, MASS, clusterSim, mclust, and e1071. Visualization and mapping were carried out using ggplot2, leaflet, and sf. All scripts used in the workflow were developed by the authors to ensure reproducibility of the analysis.
Building on the workflow described above, we developed a second R script to extract the key geodetic parameters from the GNSS time series. First, we fitted a linear model to each time series to estimate the velocities in the East, North, and Up components. This model also incorporated sinusoidal terms with annual and semiannual frequencies to capture the seasonal behavior of the displacements, as described in Section 2.1. The residuals of this fit were then used to characterize three main noise components: white noise, colored (flicker) noise, and random walk noise. Finally, we organized all the estimated parameters for each station into a structured table, ensuring that the data were ready for subsequent analysis.
Following the parameter estimation, we compiled the results into a structured database to facilitate further analysis. In this table, each row represents an individual GNSS station, while each column corresponds to one of the calculated variables, including velocities, seasonal amplitudes, noise levels, and the length of the time series. This organization enables efficient statistical evaluations and comparative studies across the entire dataset.
To ensure the robustness and reliability of the analyses, a multi-step data filtering pipeline was applied to the 21,548 downloaded stations. First, only records from 2020 onwards were considered, in order to standardize the time series and ensure that they have approximately the same length; 15,661 stations remained after this step. Next, because some records contained missing values in the velocity components (East, North, and Up), only complete observations were kept. Stations with physically reasonable slope values (between −10 and 10 mm yr−1) were retained. Finally, stations with time series shorter than three years were excluded, since shorter records do not allow a reliable estimation of seasonal periodicities. After this basic filtering stage, 12,383 of the 15,661 stations were retained.
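The basic filtering rules above can be expressed as a simple check per station. In this Python sketch, the station dictionaries and their keys are hypothetical: `ve`, `vn`, and `vu` are velocities in mm yr−1, and `length_yr` is the record length in years. The slope bound is applied inclusively to each velocity component. The helper `trim_from` shows the 2020 temporal cut on a single series of decimal-year epochs.

```python
def trim_from(epochs, values, start=2020.0):
    """Keep only (epoch, value) pairs with epoch (decimal years) at or after `start`."""
    return [(t, v) for t, v in zip(epochs, values) if t >= start]


def basic_filter(stations, min_years=3.0, slope_limit=10.0):
    """Keep stations with complete E/N/U velocities within +/- slope_limit and long records."""
    kept = []
    for s in stations:
        v = [s.get("ve"), s.get("vn"), s.get("vu")]
        if any(x is None or x != x for x in v):  # missing or NaN velocity component
            continue
        if any(abs(x) > slope_limit for x in v):  # outside the plausible slope range
            continue
        if s["length_yr"] < min_years:  # too short for reliable seasonal terms
            continue
        kept.append(s)
    return kept


stations = [
    {"id": "A", "ve": 1.0, "vn": -2.0, "vu": 0.5, "length_yr": 4.0},
    {"id": "B", "ve": 1.0, "vn": None, "vu": 0.5, "length_yr": 4.0},
    {"id": "C", "ve": 1.0, "vn": 2.0, "vu": -12.0, "length_yr": 4.0},
    {"id": "D", "ve": 1.0, "vn": 2.0, "vu": 0.5, "length_yr": 2.5},
]
print([s["id"] for s in basic_filter(stations)])
```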
Subsequently, multivariate outliers among the velocity components were identified using the Mahalanobis distance, computed from the covariance structure of the East, North, and Up trends. Stations whose squared Mahalanobis distance exceeded the 99th percentile of the $\chi^2$ distribution with three degrees of freedom were considered anomalous and removed. This additional step is needed because the basic filtering is univariate: it considers each velocity component and the time-series length separately and therefore cannot detect atypical relationships among the East, North, and Up components. The Mahalanobis distance identifies anomalous observations based on the covariance structure of the dataset. It can therefore detect stations whose components behave inconsistently with one another, even when each individual value falls within acceptable limits. This step yields a more robust and internally consistent dataset for subsequent analyses, reducing it from 12,383 to 12,290 stations (78.48% of the 15,661 stations retained after the temporal cut). Table 1 summarizes the structure and variables of the four datasets available for subsequent analyses.
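The Mahalanobis step can be sketched in plain Python for the three velocity components. The sketch uses the classical sample mean and covariance and an explicit 3 × 3 inverse. The cut-off for the squared distance is the tabulated value $\chi^2_{0.99}(3) \approx 11.345$. The function names and the synthetic data are ours.

```python
import random

CHI2_99_DF3 = 11.3449  # tabulated 99th percentile of the chi-square distribution, 3 d.o.f.


def mean_cov(rows):
    """Sample mean vector and (n - 1)-normalised covariance matrix."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def inv3(m):
    """Inverse of a 3 x 3 matrix via the adjugate (transposed cofactor matrix)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def mahalanobis_outliers(rows, threshold=CHI2_99_DF3):
    """Indices of rows (E, N, U velocities) whose squared Mahalanobis distance exceeds threshold."""
    mu, cov = mean_cov(rows)
    inv = inv3(cov)
    flagged = []
    for idx, r in enumerate(rows):
        d = [r[j] - mu[j] for j in range(3)]
        d2 = sum(d[i] * inv[i][j] * d[j] for i in range(3) for j in range(3))
        if d2 > threshold:
            flagged.append(idx)
    return flagged


rng = random.Random(0)  # arbitrary seed
rows = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
rows.append([10.0, 10.0, 10.0])  # an obviously inconsistent station
flagged = mahalanobis_outliers(rows)
print(flagged)
```

Because the classical mean and covariance are themselves influenced by outliers, robust estimators (for example, the minimum covariance determinant) are a common alternative when many anomalous stations are expected.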
Finally, an interquartile range (IQR) criterion was independently applied to each derived dataset (noise parameters, velocity components, periodic terms, and quality-related metrics) in order to identify and remove univariate outliers. In the case of the slope dataset, absolute values were used prior to the IQR filtering. After this filtering step, the retained samples consisted of 8816 records for the noise dataset (56.29% of the original data), 9197 for the slope dataset (58.73%), 8444 for the periodic terms dataset (53.92%), and 7921 for the quality dataset (50.58%). This additional filtering ensured that only statistically consistent and physically plausible values were preserved for subsequent analyses. In Figure 4, the percentage of data retained after each filtering stage for three datasets is shown. An additional figure related to the filtering procedure is provided in the Supplementary Materials (Figure S1).
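A minimal sketch of the IQR criterion follows. The fence multiplier of 1.5 (Tukey's conventional choice) is an assumption, since the text does not state it. Quartiles use linear interpolation, which matches R's default `quantile()` type 7. A row is dropped if any of its variables falls outside its own fence, which is also our reading of the procedure. The `abs_columns` argument mirrors the use of absolute slope values.

```python
def quantile_type7(sorted_x, p):
    """Linear-interpolation quantile of an already sorted list (R's default, type 7)."""
    h = (len(sorted_x) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def iqr_mask(values, k=1.5, use_abs=False):
    """True where a value lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = [abs(v) for v in values] if use_abs else list(values)
    s = sorted(x)
    q1, q3 = quantile_type7(s, 0.25), quantile_type7(s, 0.75)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [lo <= v <= hi for v in x]


def iqr_filter_rows(rows, k=1.5, abs_columns=()):
    """Keep a row only if every column passes its own IQR fence."""
    ncol = len(rows[0])
    masks = [iqr_mask([r[j] for r in rows], k, use_abs=(j in abs_columns)) for j in range(ncol)]
    return [r for i, r in enumerate(rows) if all(m[i] for m in masks)]


vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_mask(vals))
```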
After the data filtering stages illustrated in Figure 4, the clustering process was prepared by selecting the optimal number of groups for each dataset. To this end, a robust version of the elbow method was implemented, which evaluates the within-cluster sum of squares (WSS) for increasing values of k. The optimal number of clusters (k_opt) was identified as the point with the maximum perpendicular distance from the straight line connecting the first and last WSS values. The method was applied independently to the four datasets (periodicity, noise, quality, and slope), and Figure 5 shows the resulting WSS curves together with the optimal number of clusters for each dataset.
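The chord-distance rule for k_opt can be sketched in a few lines. This Python sketch is illustrative only, and the WSS values below are hypothetical. The point selected is the one whose (k, WSS) coordinates lie farthest from the line through the first and last points of the curve.

```python
import math


def elbow_k(ks, wss):
    """k whose (k, WSS) point lies farthest from the chord joining the first and last points."""
    x1, y1, x2, y2 = ks[0], wss[0], ks[-1], wss[-1]
    norm = math.hypot(x2 - x1, y2 - y1)

    def perp_dist(x, y):
        # distance from (x, y) to the line through (x1, y1) and (x2, y2)
        return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm

    return max(zip(ks, wss), key=lambda p: perp_dist(*p))[0]


# Hypothetical WSS curve for k = 1..8
ks = list(range(1, 9))
wss = [100, 40, 20, 17, 15, 14, 13, 12]
print(elbow_k(ks, wss))  # 3
```

Rescaling either axis multiplies every point's distance to the chord by the same factor, so the selected k does not depend on the units of WSS and no prior normalization is required.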
The elbow method indicated three clusters for all analyzed datasets except noise, for which it suggested two (stations with higher and lower noise levels). To evaluate the robustness of this choice, we compared the k = 2 and k = 3 solutions for the noise dataset. Although k = 2 captures the main contrast between low and high noise levels, it merges distinct geophysical behaviors into overly broad groups. The k = 3 configuration provides a clearer and more informative separation, revealing an intermediate cluster with coherent spatial and geophysical patterns. For this reason, and because the elbow method provides only a heuristic, indicative estimate of the appropriate number of clusters, we adopt k = 3 as the preferred solution for the noise dataset as well.

3.3. Clustering Analysis

To explore structural characteristics within the GNSS-derived datasets (noise parameters, periodic terms, general quality and velocity components), we applied five distinct clustering algorithms to each dataset independently. The clustering procedures were implemented in R, using standardized versions of the data matrices to ensure comparability across variables. The selected methods are detailed below.
In this study, several clustering approaches were applied and organized according to their hierarchical or non-hierarchical nature, as well as their partition-based or probabilistic formulation, in order to capture different latent structures in the data and to evaluate the robustness of the results.
Among the non-hierarchical partition-based methods, the k-means algorithm was implemented as a distance-based approach that minimizes the within-cluster sum of squares. To reduce sensitivity to initial conditions and the risk of converging to a poor local minimum, 50 random initializations were performed (nstart = 50), and the solution with the lowest total within-cluster sum of squares was retained. In addition, the Partitioning Around Medoids (PAM) algorithm was used as a robust alternative to k-means. By selecting actual observations as cluster centers (medoids), PAM improves interpretability and provides increased resistance to the influence of outliers.
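The role of multiple initializations can be illustrated with a minimal sketch. It uses Lloyd's algorithm in Python, which is a simplification: R's kmeans() uses the Hartigan-Wong algorithm by default. Each run starts from k randomly chosen observations, and the run with the lowest WSS is kept. The function names and the synthetic three-group dataset are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, k, rng, max_iter=100):
    """One Lloyd run from k randomly chosen observations; returns labels, centers, WSS."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the previous center if a cluster becomes empty
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def kmeans(points, k, nstart=50, seed=0):
    """Run nstart independent initializations and keep the lowest-WSS solution."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(nstart)), key=lambda run: run[2])


# Three well-separated synthetic groups of 9 points each
offsets = [-0.5, 0.0, 0.5]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx in offsets for dy in offsets]
labels, centers, wss = kmeans(points, k=3)
print(sorted(labels.count(c) for c in range(3)), wss)
```

A single run that happens to place two initial centers in the same group can stall in a split/merge local minimum. Taking the best of many runs makes that outcome very unlikely.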
A hierarchical clustering approach was also employed using Ward’s minimum variance criterion (ward.D2 option). This agglomerative method constructs a dendrogram based on pairwise distances between observations, which was subsequently cut at a predefined number of clusters. The resulting dendrograms were saved to allow for visual inspection of the hierarchical structure and relationships among clusters.
Finally, non-hierarchical fuzzy and probabilistic methods were considered. Fuzzy C-means clustering was applied to allow partial membership of each observation across clusters, with the fuzziness parameter set to m = 2, following the classical formulation of Bezdek (1981) [37], as it provides a well-established balance between membership fuzziness and centroid definition. The resulting membership matrices were exported for further analysis. In parallel, Gaussian Mixture Models (GMMs) were fitted as a fully probabilistic framework, assuming that the data were generated from a mixture of multivariate normal distributions. The number of components was fixed at k_opt, and Bayesian Information Criterion (BIC) values were recorded to assess and compare model fit.
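For reference, the alternating updates of the classical fuzzy C-means formulation with m = 2 can be sketched as follows. This Python sketch is illustrative and is not the authors' R code. It uses random initial memberships and a maximum-change stopping rule, both of which are assumptions. Centers are means weighted by u_ij^m, and memberships follow u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m−1)). The data are synthetic.

```python
import random


def fuzzy_cmeans(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means; returns the membership matrix u (n x c) and the cluster centers."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        u.append([x / sum(row) for x in row])
    for _ in range(max_iter):
        # centers: means weighted by u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centers.append([sum(w[i] * points[i][t] for i in range(n)) / sum(w)
                            for t in range(d)])
        # memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)); distances floored at 1e-12
        new_u = []
        for p in points:
            dists = [max(sum((a - b) ** 2 for a, b in zip(p, v)) ** 0.5, 1e-12)
                     for v in centers]
            new_u.append([1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in dists)
                          for dj in dists])
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break
    return u, centers


offsets = [-0.5, 0.0, 0.5]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 10)]
          for dx in offsets for dy in offsets]
u, centers = fuzzy_cmeans(points, c=2)
hard = [max(range(2), key=lambda j: row[j]) for row in u]  # defuzzified labels
```

Each row of `u` sums to one. Taking the arg-max of a row gives a hard label comparable to k-means, while the row itself keeps the partial-membership information that is exported as the membership matrix.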
For each method, the cluster assignments and cluster centers (or medoids) were extracted and stored. These results enabled comparative evaluations of clustering consistency and facilitated subsequent geospatial and statistical interpretations.

3.4. Clustering Evaluation Criteria

To assess the quality and consistency of the clustering results, several internal validation metrics were computed for each method and dataset. These metrics provide complementary perspectives on the compactness, separation, and discriminative power of the resulting clusters. Following the recommendation of reference [38], three complementary cluster evaluation criteria were selected in order to determine the most suitable clustering method for the available data. These indices were chosen because they jointly assess cluster compactness, separation, and overall structural validity, providing a balanced framework for comparing alternative clustering solutions.
The Average Silhouette Width was used to quantify both cohesion and separation of the resulting clusters. Values close to 1 indicate well-separated and compact clusters, whereas values near 0 suggest overlapping or ambiguous cluster assignments. This metric allows an intuitive interpretation of clustering quality at both the global and individual observation levels. In addition, the Davies–Bouldin Index (DBI) and the Calinski–Harabasz Index (CHI) were employed as complementary validation measures. Lower DBI values indicate superior clustering performance by reflecting low intra-cluster dispersion and high inter-cluster separation. Conversely, higher CHI values are associated with better-defined clusters, as this index favors solutions that maximize between-cluster variance while minimizing within-cluster variance.
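The three indices can be written down directly from their textbook definitions, as in the minimal Python sketch below. Library implementations may differ in details: this sketch defines the Davies–Bouldin scatter as the mean Euclidean distance to the centroid and assigns s(i) = 0 to points in singleton clusters. The toy data and the two labelings are hypothetical.

```python
import math


def _groups(points, labels):
    g = {}
    for p, lab in zip(points, labels):
        g.setdefault(lab, []).append(p)
    return g


def _centroid(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]


def silhouette(points, labels):
    """Average silhouette width; points in singleton clusters contribute s(i) = 0."""
    g = _groups(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = g[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # d(p, p) = 0 is included
        b = min(sum(math.dist(p, q) for q in pts) / len(pts)
                for other, pts in g.items() if other != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio; lower is better."""
    g = _groups(points, labels)
    cent = {k: _centroid(v) for k, v in g.items()}
    scatter = {k: sum(math.dist(p, cent[k]) for p in v) / len(v) for k, v in g.items()}
    return sum(max((scatter[i] + scatter[j]) / math.dist(cent[i], cent[j])
                   for j in g if j != i) for i in g) / len(g)


def calinski_harabasz(points, labels):
    """[B / (k - 1)] / [W / (n - k)]: between- vs within-cluster dispersion; higher is better."""
    g = _groups(points, labels)
    n, k = len(points), len(g)
    overall = _centroid(points)
    cent = {c: _centroid(v) for c, v in g.items()}
    between = sum(len(v) * math.dist(cent[c], overall) ** 2 for c, v in g.items())
    within = sum(math.dist(p, cent[c]) ** 2 for c, v in g.items() for p in v)
    return (between / (k - 1)) / (within / (n - k))


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1, 0, 1]
for name, lab in [("good", good), ("bad", bad)]:
    print(name, silhouette(pts, lab), davies_bouldin(pts, lab), calinski_harabasz(pts, lab))
```

On the toy data, the correct labeling yields a silhouette near 0.9, a small Davies–Bouldin value, and a large Calinski–Harabasz value, while the interleaved labeling reverses all three. This illustrates why the indices are read in opposite directions.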
All metrics and visualizations were computed using the standardized data matrices to ensure comparability across variables. The results were stored and exported for each clustering method and dataset, enabling a comprehensive comparison of clustering performance across the GNSS signal components. The resulting validation metrics are summarized in Table 2.
Across the four dataframes used to characterize the GNSS stations, k-means generally provided the most effective grouping among the methods tested. It yielded the best results for the velocity, periodicity, and noise dataframes on all three evaluation criteria. For the dataframe describing overall station quality, by contrast, the best-performing method was Fuzzy C-Means, although the k-means solution was very similar and only slightly inferior.
Reducing the number of variables can substantially alter the behavior of clustering algorithms, often favoring soft approaches such as Fuzzy C-Means (FCM) over hard methods like k-means. Lower dimensionality typically yields more compact data structures and smoother group boundaries, enabling fuzzy-membership algorithms to better capture gradual transitions between clusters. Previous studies have shown that FCM consistently achieves better Silhouette, Davies–Bouldin, and Calinski–Harabasz scores (higher Silhouette and Calinski–Harabasz values, lower Davies–Bouldin values) in settings with subtle group separation or partial overlap [39,40]. Moreover, FCM tends to be more stable than k-means when the variable set is small, as its fuzzy formulation accommodates structural uncertainty and avoids rigid assignments that may be unsuitable in low-dimensional spaces [41]. These findings support the observation that, under reduced dimensionality, FCM can emerge as the best-performing clustering method, as reflected in this study.
Based on the validation indices reported in Table 2 (Silhouette, Davies–Bouldin, and Calinski–Harabasz), the clustering method with the best overall performance was selected independently for each component. Specifically, k-means was retained for Slope (Velocity), Noise, and Period, while Fuzzy C-means was selected for the Quality component. The subsequent analysis and interpretation are therefore based on these optimal configurations rather than on a joint clustering of the four quantities.

3.5. Linear Discriminant Analysis of Cluster Separability

To evaluate the separability of the clusters obtained from each clustering method, a Linear Discriminant Analysis (LDA) was performed using the lda() function from the MASS package in R. The analysis was applied to the standardized numerical variables used for clustering, with cluster assignments treated as class labels. For each clustering solution, model performance was assessed using leave-one-out cross-validation (LOOCV), from which confusion matrices and classification accuracy were computed. Discriminant projections shown in the figures correspond to the first two linear discriminant components obtained from an LDA model fitted without cross-validation and are used solely for visualization of group separation in the reduced discriminant space. LDA projections were generated only when all clusters contained at least two observations to ensure numerical stability of the discriminant model. This analysis provides a complementary evaluation of clustering quality by quantifying the degree of linear separability among the identified groups in the feature space, alongside other internal validation metrics.
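The leave-one-out evaluation can be sketched without MASS. The minimal Python sketch below uses the standard linear discriminant rule with a pooled within-class covariance and class-proportion priors, refitting the model once per left-out station and tallying a confusion table. It is a simplified illustration, not a reproduction of lda(..., CV = TRUE). The synthetic two-cluster data and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lda_fit(X, y):
    """Linear discriminants from class means, class-proportion priors and pooled covariance."""
    classes = sorted(set(y))
    n, d = len(X), len(X[0])
    means, log_prior = {}, {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        log_prior[c] = math.log(len(rows) / n)
    pooled = [[0.0] * d for _ in range(d)]
    for x, lab in zip(X, y):
        diff = [xi - mi for xi, mi in zip(x, means[lab])]
        for i in range(d):
            for j in range(d):
                pooled[i][j] += diff[i] * diff[j] / (n - len(classes))
    # delta_c(x) = x' S^-1 mu_c - 0.5 mu_c' S^-1 mu_c + log(pi_c)
    model = {}
    for c in classes:
        w = solve(pooled, means[c])
        model[c] = (w, -0.5 * sum(mi * wi for mi, wi in zip(means[c], w)) + log_prior[c])
    return model


def lda_predict(model, x):
    return max(model, key=lambda c: sum(xi * wi for xi, wi in zip(x, model[c][0])) + model[c][1])


def loocv(X, y):
    """Leave-one-out accuracy and confusion counts {(true, predicted): count}."""
    preds = [lda_predict(lda_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:]), X[i])
             for i in range(len(X))]
    confusion = {}
    for t, p in zip(y, preds):
        confusion[(t, p)] = confusion.get((t, p), 0) + 1
    accuracy = sum(t == p for t, p in zip(y, preds)) / len(y)
    return accuracy, confusion


offsets = [-1.0, 0.0, 1.0]
X = [[cx + dx, cy + dy] for cx, cy in [(0, 0), (5, 5)] for dx in offsets for dy in offsets]
y = [0] * 9 + [1] * 9
accuracy, confusion = loocv(X, y)
print(accuracy, confusion)
```

Refitting without the held-out observation prevents a station from influencing the discriminant that classifies it. This is why the cross-validated accuracy is a fairer separability measure than the resubstitution accuracy of the model used for the projection plots.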

3.6. Geospatial Visualization of Clustering Results

To support the interpretation of clustering outcomes, interactive geospatial maps were generated for each clustering method and dataset. These maps display the spatial distribution of GNSS stations colored by their assigned cluster, enabling visual assessment of regional patterns and potential geophysical correlations. The maps were created using the leaflet library in R, which allows dynamic rendering of geographic data. Each station was plotted using its latitude and longitude coordinates, and clusters were distinguished using an adaptive color palette from the viridis package. Pop-up labels were included to display station identifiers and cluster assignments upon interaction.
For each clustering method applied to each dataset (noise parameters, periodic terms, and velocity components), a separate map was generated and exported in HTML format. These visualizations provide an intuitive overview of how stations group spatially, revealing regional similarities, tectonic behaviors, or anomalies that may not be evident from numerical analysis alone.
In addition to the interactive maps, the R workflow also produces geospatial files in standard GIS formats—specifically ESRI Shapefiles—so that the results can be further explored, edited, or integrated within QGIS 3.40 software (https://qgis.org/, accessed on 24 February 2026). This ensures full interoperability with common geospatial analysis tools and facilitates more advanced spatial queries or cartographic representations. The use of clean basemaps (CartoDB.Positron) and consistent styling ensures clarity and comparability across maps. Legends were added to facilitate interpretation, and all outputs were saved for further exploration and presentation.

4. Results

In this section, we present the main outcomes of the clustering analysis applied to the GNSS time series. First, centroid profiles were computed for each cluster to characterize the dominant temporal behavior of the stations across the different signal components. We then generated Linear Discriminant Analysis (LDA) projections to evaluate the separability of the clusters and to visualize the discriminant structure captured by each dataset.
In addition, geospatial maps were generated to visualize the spatial distribution of GNSS stations according to their assigned clusters, enabling detailed regional comparisons and the identification of coherent geographic or geophysical patterns. These maps were produced using R and subsequently exported in standard GIS formats, allowing further inspection, refinement, and spatial analysis within QGIS software. This integration facilitates a more comprehensive interpretation of the clustering results, particularly in regions where local tectonics, environmental conditions, or data quality may influence station behavior. The resulting geospatial products form a key component of the analysis and are examined in detail in the following subsections.
To aid the interpretation of the clustering results, each GNSS signal component was grouped according to the dominant behavior observed in its time series. For the periodic component, clusters P1, P2, and P3 capture distinct patterns of seasonal variability: P1 includes stations with a strong annual signal in the East and Up components and a weaker response in the North; P2 groups stations dominated by an annual cycle in the North component together with a pronounced semiannual contribution in the horizontal components; and P3 comprises stations with intermediate annual amplitudes horizontally but a markedly enhanced semiannual signal in the vertical direction. A similar nomenclature is applied to the remaining components—Q1–Q3 for quality, N1–N3 for noise, and S1–S3 for slope—each summarizing the main structural characteristics of the stations within their respective cluster families. These patterns are detailed in Table 3.
To further characterize the structure and separability of the clusters obtained for each GNSS signal component, we examined the centroid profiles derived from the clustering results. These profiles summarize the representative behavior of each group and provide a direct visualization of the dominant temporal patterns related to slope, noise, periodicity, and quality. In addition, Linear Discriminant Analysis (LDA) projections were generated to evaluate the discriminant power of the clustering solutions and the degree of separation in a reduced feature space. Together, these graphical representations offer a clear view of the internal consistency of the clusters and their ability to capture meaningful differences across stations. The corresponding centroid profiles and LDA projections are shown in Figure 6 and Figure 7. Additional figures related to the cluster box plots are provided in the Supplementary Materials (Figures S2–S5).

5. Discussion

This section discusses the physical interpretation and geodynamical implications of the clustering results obtained from the analysis of GNSS time-series characteristics. By jointly considering periodicity, slope, quality, and noise, the clustering framework reveals coherent spatial patterns that reflect the combined influence of regional geophysical processes, tectonic setting, climatic forcing, and network-related factors. Rather than treating each metric in isolation, the discussion emphasizes how the identified groups capture systematic differences in crustal behavior across continents and tectonic domains. The following subsections examine each component in detail, relate the cluster distributions to established findings in the literature, and highlight how the observed patterns contribute to a more integrated understanding of global GNSS station behavior. Figure 8 displays the global station-clustering results for the four parameters analyzed. To facilitate interpretation, Figure 9 provides a detailed view focusing on specific regions, including Europe, the continental United States, Australia, New Zealand, and Japan.

5.1. Periodicity

Regarding the periodic component, the analysis of the characteristics of groups P1 and P2 reveals a marked regional variability in the expression of seasonal signals, particularly in the horizontal components. In group P1, which encompasses much of Europe, the annual periodicity appears to be dominant in the East component. In contrast, in group P2, covering large areas of North and South America, the periodicity is predominantly expressed in the North component. Several studies have shown that in the Americas—particularly at IGS stations located in the United States and South America—the North component exhibits larger seasonal amplitudes and greater discrepancies with loading models, suggesting a dominance of this component in the annual periodicity. This trend has been documented in global analyses demonstrating a higher sensitivity of the North component to hydrological and atmospheric variations across the American continent, as well as a greater complexity in its modeling [42,43].
In contrast, in Europe, the specialized literature indicates that the East component tends to exhibit a more stable and coherent seasonal behavior across stations, with well-defined annual amplitudes and lower dispersion relative to loading models. Studies focusing on European GNSS networks have identified that the seasonal signal in the East component is particularly robust, suggesting a more direct control by regional geophysical processes such as continental hydrological loading and thermoelastic crustal variations [44,45]. Taken together, these results support the hypothesis that the dominant periodicity differs between regions, with a clear predominance of the North component in the Americas and the East component in Europe.
As a specific case within group P2, where semiannual periodicity is more prominent in the horizontal components, several global studies have shown that GNSS stations located in the American continent exhibit greater complexity in their horizontal seasonal signals, often characterized by a more pronounced semiannual component. This behavior has typically been linked to the combined influence of atmospheric, hydrological, and oceanic loading, whose interannual variability is particularly strong in the Americas. Recent analyses indicate that semiannual amplitudes in the North and East components are systematically larger at American stations compared with those in Europe, suggesting a more dynamic crustal response in this region, modulated by geophysical processes with higher temporal variability [23,42].
Regarding group P3, which includes coastal areas of the Pacific Ocean (Japan, Korea and Taiwan, Australia, New Zealand, and the Polynesian islands) as well as the southwestern United States, Mexico, and Alaska, the group profile reveals a stronger presence of semiannual periodicity in the vertical component. Studies focused on the southwestern Pacific indicate that the combined influence of oceanic and hydrological loading, together with tectonic processes, generates highly seasonal vertical motion patterns that are not purely annual, with significant subannual contributions at coastal stations on Pacific islands near Australia and New Zealand [46]. This behavior may be related to tidal effects associated with the semiannual solar declination cycle.
Consistently, global analyses of relative sea level and vertical motion have shown that in tectonically active margins—such as the Pacific Rim and high-latitude regions—the vertical component exhibits a nonlinear behavior dominated by loading variations that amplify subannual signals (including the semiannual one), particularly in areas such as Alaska and other sectors of the northern Pacific [47]. Additionally, regional studies integrating GNSS observations with hydrological loading models demonstrate that in areas subjected to strong seasonal climatic forcing, the vertical component robustly incorporates both annual and semiannual terms [48]. This is consistent with the presence of an enhanced semiannual vertical signal in the southwestern United States and Mexico, where combined oceanic, hydrological, and tectonic loading effects converge.
As a specific example of group P3 within the European region, the case of southern Finland stands out. The detection of a semiannual signal in the vertical component in this area is consistent with glacial isostatic uplift mechanisms, which make the crust particularly sensitive to seasonal loading variations. The combined effects of snow accumulation and melt, the presence of lakes, changes in hydrological storage, and semiannual variations in atmospheric pressure and Baltic Sea ocean loading generate a bimodal vertical response that can be observed in GNSS time series.

5.2. Slope

The clustering of slope fields is strongly linked to tectonic context and reflects the grouping of stations according to the geodynamic setting in which they are embedded. Groups S1 and S3 exhibit contrasting characteristic profiles, with relatively low and high horizontal slopes, respectively. Group S1 is mainly located in the tectonically stable part of the Americas, as well as in regions of Turkey and Greece and parts of Japan and the Philippines. Group S3, in contrast, is predominant across Europe and Africa.
In regions where both groups converge, tectonically active and seismically dynamic areas emerge as a result of the relative slope differences between stations, as observed in California, Panama, Chile, Greece, and Turkey, as well as in volcanic regions such as Japan and the Philippines. The relationship between relative slope gradients and seismicity and volcanism has been extensively documented. In California, for example, the lateral interaction between the Pacific and North American plates generates strong slope gradients and highly seismic transform faults [49,50]. In subduction zones such as Chile, Japan, and the Philippines, rapid and heterogeneous plate convergence leads to deformation accumulation, seismic rupture, and magma generation through fluid-induced melting [51,52]. In the eastern Mediterranean, relative slope differences between Africa, Arabia, and Eurasia account for the intense distributed deformation, seismic activity, and volcanism associated with the Hellenic Arc [53].
In the case of group S2, characterized by a high relative slope in the vertical component (whether uplift or subsidence), its presence is predominant in regions undergoing post-glacial isostatic adjustment, such as northern Europe (Norway, Sweden, and Finland) and areas of Canada and Greenland where uplift is occurring. This group is also found in volcanic regions such as the Azores, Taiwan, and Japan. In the specific context of isostasy, high-precision geodetic measurements in these regions confirm this behavior [54,55,56]. In contrast, vertical variations—both positive and negative—in volcanic areas may be related to magmatic adjustments, intrusions, and deformation associated with subduction arcs, hydrological loading changes, or oceanic rift processes, as documented in studies from the Azores, Japan, and Taiwan [57].
To provide a preliminary quantitative validation of the clustering results, we compared the observed GNSS velocities with the plate-motion predictions of the GSRMv1 model [58]. The stations included in the analysis yielded correlation coefficients of 0.446 for the East component and 0.783 for the North component. These values indicate a systematically better agreement in the meridional direction, consistent with previous studies reporting larger residuals and higher noise levels in the East component of GNSS velocities. In addition, the spatial distribution of the residual vectors reveals differences of up to 40–50 mm/yr in major plate-convergence zones, such as western North America, the Hellenic–Anatolian region, and the eastern Eurasian margin, in line with the expected behavior in areas of strong tectonic interaction.
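The comparison reduces to per-component Pearson correlations and horizontal residual magnitudes, as in the minimal Python sketch below. It assumes the GSRMv1 predictions have already been evaluated at each station location, which is not shown here. The velocities are hypothetical placeholders, not the values used in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def horizontal_residuals(obs_e, obs_n, mod_e, mod_n):
    """Magnitude of the observed-minus-model horizontal velocity at each station."""
    return [math.hypot(oe - me, on - mn)
            for oe, on, me, mn in zip(obs_e, obs_n, mod_e, mod_n)]


# Hypothetical velocities (mm/yr) at four stations
obs_e, obs_n = [20.1, -12.4, 3.0, 25.0], [15.2, 8.1, -1.0, 30.0]
mod_e, mod_n = [19.0, -10.0, 2.5, 21.0], [14.8, 9.0, -0.5, 27.0]
r_east, r_north = pearson(obs_e, mod_e), pearson(obs_n, mod_n)
res = horizontal_residuals(obs_e, obs_n, mod_e, mod_n)
print(round(r_east, 3), round(r_north, 3), [round(v, 2) for v in res])
```

Correlating each component separately shows whether the East and North fields agree with the model to different degrees, while the residual magnitudes show where the disagreement concentrates geographically.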
It is worth noting that the boundaries between the velocity-based clusters tend to coincide with regions of enhanced seismic, tectonic, or deformational activity. These transition zones separate areas with distinct kinematic behavior and may therefore reflect local stress accumulation or deformation gradients. This pattern provides additional geophysical support for the clustering structure identified in this study.

5.3. Quality

For the quality component, although all stations generally exhibit good performance due to prior filtering and preprocessing, the algorithm identified three distinct clusters. Groups Q1 and Q2 show similar percentages of available data in the time series. Q2 presents slightly lower quality than Q1, which is the group with the lowest average number of offsets and therefore the highest overall quality. Group Q3 also shows good quality, with a higher percentage of available data but without standing out in terms of offset occurrence. All groups have a comparable average time span, since the same five-year period starting in 2020 was selected for all stations.
The spatial distribution of the clusters is heterogeneous but shows a clear geographical pattern linked to the network to which each station belongs. The optimal-quality group Q1 is strongly represented in stations located in South Carolina and Pennsylvania (United States), Poland, Bavaria (Germany), Catalonia and Aragon (Spain), and Romania. For the good-quality group Q3, a higher concentration is observed in New Zealand, Japan, and Korea in the Asia–Pacific region; in stations across the U.S. states of Texas, Alabama, Florida, California, and Alaska; in Chile; in New South Wales (Australia); in southern Italy; and in Israel. Several studies have shown that well-maintained continuous GNSS networks exhibit a very low incidence of jumps, discontinuities, and data gaps, resulting in more stable and reliable time series [25,59].

5.4. Noise

Regarding the noise component, most stations exhibit generally low noise levels, and the algorithm has classified them into group N2. Stations in group N1, by contrast, show higher values of random-walk noise and white noise. This group is predominantly distributed across equatorial and tropical regions, as well as northern Australia. It is also present in mountainous areas such as the Appalachian range in the United States and several European mountain chains, including the Alps, the Apennines, and the Scandinavian Mountains. Additionally, this group extends over large areas of Greenland and appears in seismically and volcanically active regions such as parts of Japan, the Hawaiian Islands, and New Zealand.
These patterns have been documented in equatorial and tropical regions, where atmospheric variability and hydrological loading increase the stochastic component of the time series [12]. Likewise, global studies have shown that stations located in mountainous areas exhibit higher noise levels due to local geomorphological and meteorological effects [60]. This behavior is also observed in high-latitude regions such as Greenland, where extreme atmospheric conditions intensify random-walk noise [61]. In addition, in seismically and tectonically active areas such as Japan and New Zealand, geodynamic activity has been shown to increase the stochastic variability of GNSS time series, leading to elevated noise levels [25].
Stations in group N3 exhibit higher levels of flicker noise. Their spatial distribution is more diffuse and complex, although notable concentrations appear in northern Sweden and in volcanic regions such as New Zealand and Japan. In the United States, this group is present in areas of western Texas, northern Minnesota, the region near Vancouver, and the Mammoth Lakes area in California. A significant presence is also observed in the states of Alaska and Indiana, as well as in central Mexico.
Correlated noise in GNSS time series exhibits a marked geographical dependence, with stronger intensities in high-latitude regions such as northern Scandinavia and Alaska [9], where ionospheric and atmospheric conditions favor the emergence of flicker-type components. This effect has also been documented in studies focused on the polar ionosphere, where high-frequency variability increases correlated noise in GNSS observations [62]. Moreover, the greater presence of flicker noise in volcanic regions such as Japan, New Zealand, or Mammoth Lakes is consistent with studies showing that magmatic activity and associated deformation generate significant increases in non-white noise, including flicker noise [63]. Finally, the occurrence of this group in regions with active aquifers or strong hydrological loading, such as central Mexico or parts of the U.S. Midwest, aligns with the findings of [12], who demonstrated that hydrogeological dynamics introduce correlated-noise components into GNSS time series.

6. Conclusions

In this study, a comprehensive multivariate statistical framework was applied to the analysis of GNSS time series from 21,548 permanent stations registered in the University of Nevada database. By combining harmonic regression, noise characterization, quality assessment, and slope estimation with clustering and discriminant analysis techniques, we demonstrate that large GNSS networks can be systematically classified into homogeneous groups that reflect both the dominant signal characteristics of the time series and their underlying geophysical context.
Regarding the first research question (RQ1), the results clearly show that GNSS stations can be robustly classified into well-defined and interpretable clusters using variables related to periodicity, noise, quality, and slope. The identified cluster families (P1–P3, Q1–Q3, N1–N3, and S1–S3) capture the main structural features of the time series and summarize the dominant behavior of the stations in a compact and meaningful way. The consistency of the cluster profiles across thousands of stations supports the suitability of multivariate statistical methods for large-scale GNSS network characterization.
In relation to the second research question (RQ2), a strong correlation is observed between the spatial distribution of the clusters and the geographical and geodynamic setting of the stations. Distinct regional patterns emerge in all analyzed components. Periodicity clusters reveal marked continental-scale differences in the dominance of seasonal signals, with contrasting behavior between Europe, North and South America, and the Pacific region. Slope clusters closely reflect tectonic regimes, distinguishing stable continental interiors, actively deforming plate boundaries, volcanic arcs, and regions affected by post-glacial isostatic adjustment. Similarly, noise clusters exhibit clear associations with climatic zones, topography, latitude, and tectonic activity, while quality clusters reflect differences in network management, monument stability, and operational standards. These results confirm that the statistical classification of GNSS time series is not arbitrary, but strongly controlled by physical and environmental factors.
Concerning the third research question (RQ3), the cluster profiles derived in this work provide a robust statistical framework for enhancing deformation analysis and geohazard assessment. By grouping stations with similar signal characteristics, the proposed approach facilitates the identification of reliable reference stations, the detection of anomalous behavior, and the interpretation of spatially coherent deformation patterns. This is particularly relevant for applications in seismic and volcanic risk analysis, where distinguishing between tectonic signals and noise-related artifacts is essential.
The main limitation of the proposed framework is its simplified representation of GNSS time-series behavior through a reduced set of features. While these features capture the dominant statistical characteristics, they cannot fully reflect the physical complexity of site-specific processes such as monument–environment interactions, nonlinear deformation, or transient geophysical events. Likewise, the noise characterization relies on empirical metrics derived from regression residuals rather than on a formal stochastic model selection, so the estimated noise components should be viewed as descriptive indicators of temporal variability rather than strict identifications of ideal stochastic processes.
The heterogeneity of data across the global network introduces additional constraints. Differences in time-series length, metadata completeness, monument stability, and maintenance practices may affect the extracted features and, consequently, the clustering results, even after normalization and filtering. Furthermore, the unsupervised clustering approach is inherently sensitive to the choice of feature space and distance metric, so alternative feature definitions, weighting schemes, or clustering algorithms could lead to different groupings. For this reason, the resulting clusters should be interpreted as statistically meaningful patterns within the proposed analytical framework rather than as definitive categories of GNSS station behavior.
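Synthetic data can show this sensitivity to the feature space: running the same k-means on unscaled and on standardized features yields different partitions. The sketch below is a hedged illustration built on made-up features, not GNSS data, and uses scikit-learn. The adjusted Rand index (ARI) measures agreement with the known grouping, where 1 means identical partitions and values near 0 mean chance-level agreement.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)  # hidden two-group structure
# Feature A: large numeric range but no structure; feature B: small range but carries the groups.
feature_a = rng.uniform(0.0, 1000.0, size=n)
feature_b = group + rng.normal(0.0, 0.05, size=n)
X = np.column_stack([feature_a, feature_b])

raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
scaled_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)

ari_raw = adjusted_rand_score(group, raw_labels)
ari_scaled = adjusted_rand_score(group, scaled_labels)
print(f"ARI raw: {ari_raw:.2f}, ARI standardized: {ari_scaled:.2f}")
```

Without scaling, the Euclidean distance is dominated by the feature with the largest numeric range. This is one concrete way that weighting choices change the groupings.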
Overall, this study demonstrates that multivariate statistical classification constitutes a powerful and scalable tool for extracting geophysical insight from large GNSS networks. The proposed methodology is transferable to different regions and temporal windows and can be readily extended by incorporating alternative stochastic models or additional geodetic observables. These results highlight the value of statistical clustering and discriminant analysis as complementary approaches for improving the interpretation, reliability, and practical use of GNSS time series in geodynamic studies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math14050855/s1, Figure S1: GNSS Stations Removed During the Filtering Steps for Each Dataset (S, N, P, Q). Figure S2: Distributions of quality features by cluster. Figure S3: Distributions of velocity features by cluster. Figure S4: Distributions of noise features by cluster. Figure S5: Distributions of periodicity features by cluster.

Author Contributions

Conceptualization, A.S.-A., A.P.-F. and D.Á.-R.; methodology, A.S.-A. and D.Á.-R.; software, D.Á.-R.; validation, A.S.-A. and D.Á.-R.; formal analysis, A.S.-A. and D.Á.-R.; investigation, A.S.-A., A.P.-F. and D.Á.-R.; resources, A.S.-A. and D.Á.-R.; data curation, D.Á.-R.; writing—original draft preparation, A.S.-A., A.P.-F. and D.Á.-R.; writing—review and editing, A.S.-A., A.P.-F. and D.Á.-R.; visualization, A.S.-A., A.P.-F. and D.Á.-R.; supervision, A.S.-A. and A.P.-F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data and code that support the findings of this study are available from the following link: https://github.com/DanielAlvarezRuiz/Mathematics_Alvarez_et_al_2026 (accessed on 30 December 2024).

Acknowledgments

The authors would like to thank their families for the support provided throughout the development of this research article, with special gratitude to Paula Álvarez Ruiz and to Gonzalo, Felipe, and Carlota Alés Sánchez.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Ge, M.; Gendt, G.; Dick, G.; Zhang, F.P.; Rothacher, M. A new data processing strategy for huge GNSS global networks. J. Geod. 2006, 80, 199–203.
  2. Wang, S.; Huang, S.; Fang, H. Ground GNSS station selection to generate the global ionosphere maps using the information content. Space Weather 2022, 20, e2020SW002675.
  3. Cao, Y.; Kuznetsov, E.; Miao, C.; Chen, S.; Meng, T. The current situation of Russia’s GNSS continuous operating reference station network and thinking on future development. Adv. Space Res. 2024, 73, 3896–3908.
  4. Altamimi, Z.; Collilieux, X.; Métivier, L. Reference frames for applications in geosciences. In International Association of Geodesy Symposia; Springer Nature: Berlin/Heidelberg, Germany, 2013; Volume 138, pp. 137–145.
  5. Ohno, K.; Ohta, Y.; Takamatsu, N.; Munekane, H.; Iguchi, M. Real-time modeling of transient crustal deformation through the quantification of uncertainty deduced from GNSS data. Earth Planets Space 2024, 76, 140.
  6. Blewitt, G.; Hammond, W.C.; Kreemer, C. Harnessing the GPS data explosion for interdisciplinary science. Eos 2018, 99, e2020943118.
  7. Williams, S.D.P.; Penna, N.T. Non-tidal ocean loading effects on geodetic GPS heights. Geophys. Res. Lett. 2011, 38, L09314.
  8. Langbein, J. Noise in GPS displacement measurements from southern California and southern Nevada. J. Geophys. Res. 2008, 113, B05405.
  9. Gobron, K.; Rebischung, P.; Chanard, K.; Altamimi, Z. Anatomy of the spatiotemporally correlated noise in GNSS station position time series. J. Geod. 2024, 98, 34.
  10. Ren, Y.; Lian, L.; Wang, J. Analysis of seismic deformation from global three-decade GNSS displacements: Implications for a three-dimensional Earth GNSS velocity field. Remote Sens. 2021, 13, 3369.
  11. He, X.; Bos, M.S.; Montillet, J.P.; Fernandes, R.M.S. Investigation of the noise properties at low frequencies in long GNSS time series. J. Geod. 2019, 93, 1271–1282.
  12. He, Y.; Nie, G.; Wu, S.; Li, H. Analysis and discussion on the optimal noise model of global GNSS long-term coordinate series considering hydrological loading. Remote Sens. 2021, 13, 431.
  13. Dmitrieva, K.; Segall, P.; Bradley, A.M. Effects of linear trends on estimation of noise in GNSS position time-series. Geophys. J. Int. 2016, 208, 281–288.
  14. Kaczmarek, A.; Kontny, B. Identification of the noise model in the time series of GNSS station coordinates using wavelet analysis. Remote Sens. 2018, 10, 1611.
  15. Montillet, J.P.; Bos, M.S.; Fernandes, R.M.S.; Williams, S.D.P.; Yu, K. Geodetic Time Series Analysis in Earth Sciences; Springer Geophysics; Springer: Berlin/Heidelberg, Germany, 2020.
  16. Li, M.; Huang, G.; Wang, L.; Xie, W. Comprehensive classification assessment of GNSS observation data quality by fusing k-means and KNN algorithms. GPS Solut. 2024, 28, 21.
  17. Liu, Y.; Nagahata, H.; Uchiyama, H.; Taniguchi, M. Discriminant and cluster analysis of possibly high-dimensional time series data by a class of disparities. Commun. Stat.—Simul. Comput. 2017, 46, 7875–7892.
  18. Alonso, A.M.; Berrendero, J.R.; Hernández, A.; Justel, A. Time series clustering based on forecast densities. Comput. Stat. Data Anal. 2006, 51, 762–776.
  19. Barba, P.; Rosado, B.; Ramírez-Zelaya, J.; Berrocoso, M. Comparative analysis of statistical and analytical techniques for the study of GNSS geodetic time series. Eng. Proc. 2021, 5, 21.
  20. Bogusz, J.; Klos, A. On the significance of periodic signals in noise analysis of GPS station coordinates time series. GPS Solut. 2015, 19, 403–417.
  21. Rosado, B.; Plaza, S.; Páez, R.; Gárate, J.; Fernández-Palacín, F.; Berrocoso, M. Establishing models of surface deformation from geodetic time series GNSS in the southern region of the Iberian Peninsula and North Africa (Spain). In Proceedings of the 31st IUGG Conference on Mathematical Geophysics, Paris, France, 6–10 June 2016; p. 203.
  22. Costantino, G.; Giffard-Roisin, S.; Dalla Mura, M.; Socquet, A. Denoising of geodetic time series using spatiotemporal graph neural networks: Application to slow slip event extraction. arXiv 2024, arXiv:2405.03320.
  23. Klos, A.; Bogusz, J.; Bos, M.S.; Gruszczynska, M. Modelling the GNSS time series: Different approaches to extract seasonal signals. In Geodetic Time Series Analysis in Earth Sciences; Montillet, J.P., Bos, M.S., Eds.; Springer: Cham, Switzerland, 2018; pp. 211–234.
  24. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2011.
  25. Zhang, J.; Lian, L.; Huang, C.; Xu, C.; Zhang, S. Automated offset detection approaches: Case study in IGS Repro2 and Repro3. GPS Solut. 2024, 28, 123.
  26. Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1991.
  27. Percival, D.B.; Walden, A.T. Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques; Cambridge University Press: Cambridge, UK, 1993.
  28. Tao, Y.; Liu, C.; Liu, C.; Zhao, X.; Hu, H. Empirical Wavelet Transform Method for GNSS Coordinate Series Denoising. J. Geovisualization Spat. Anal. 2021, 5, 9.
  29. Santamaría-Gómez, A.; Mémin, A. Geodetic secular velocity estimation using multiyear GNSS data and the MIDAS robust trend estimator. J. Geophys. Res. Solid Earth 2015, 120, 7343–7361.
  30. Blewitt, G.; Lavallée, D. Effect of Annual Signals on Geodetic Velocity. J. Geophys. Res. Solid Earth 2002, 107, ETG 9-1–ETG 9-11.
  31. Santamaría-Gómez, A.; Bouin, M.N.; Collilieux, X.; Wöppelmann, G. Correlated errors in GPS position time series: Implications for velocity estimates. J. Geophys. Res. Solid Earth 2011, 116, B01405.
  32. Dawidowicz, K.; Krzan, G.; Wielgosz, P. Offsets in the EPN station position time series resulting from antenna/radome changes: PCC type-dependent model analyses. GPS Solut. 2023, 27, 9.
  33. Khazraei, S.M.; Amiri-Simkooei, A.R. Improving offset detection algorithm of GNSS position time-series using spline function theory. Geophys. J. Int. 2021, 224, 257–270.
  34. Sun, X.; Lu, T.; Hu, S.; Huang, J.; He, X.; Montillet, J.P.; Ma, X.; Huang, Z. The relationship of time span and missing data on the noise model estimation of GNSS time series. Remote Sens. 2023, 15, 3572.
  35. Qiu, Y.; Fang, C.; Song, S.; Huang, X.; Wang, C.; Wang, J. TsQuality: Measuring Time Series Data Quality in Apache IoTDB. Proc. VLDB Endow. 2023, 16, 3982–3985.
  36. Santamaría-Gómez, A. SARI: Interactive GNSS position time series analysis software. GPS Solut. 2019, 23, 52.
  37. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Advanced Applications in Pattern Recognition; Springer: New York, NY, USA, 1981.
  38. Chicco, D.; Campagner, A.; Spagnolo, A.; Ciucci, D.; Jurman, G. The Silhouette coefficient and the Davies-Bouldin index are more informative than Dunn index, Calinski-Harabasz index, Shannon entropy, and Gap statistic for unsupervised clustering internal evaluation of two convex clusters. PeerJ Comput. Sci. 2025, 11, e3309.
  39. Salman, Z.; Alomary, A. Performance of the K-means and fuzzy C-means algorithms in big data analytics. Int. J. Inf. Technol. 2024, 16, 465–470.
  40. Pickens, A.; Sengupta, S. Benchmarking studies aimed at clustering and classification tasks using K-means, fuzzy C-means and evolutionary neural networks. Mach. Learn. Knowl. Extr. 2021, 3, 695–719.
  41. Cebeci, Z.; Yildiz, F. Comparison of K-means and Fuzzy C-means algorithms on different cluster structures. J. Agric. Inform. 2015, 6, 13–23.
  42. Niu, Y.; Wei, N.; Li, M.; Rebischung, P.; Shi, C.; Chen, G. Quantifying discrepancies in the three-dimensional seasonal variations between IGS station positions and load models. J. Geod. 2022, 96, 31.
  43. Li, Z.; Yang, K.; Jiang, W.; Deng, L.; Zou, Y. Impacts of period offset and period variation on modeling seasonal signals in GNSS coordinate time series. Geo-Spat. Inf. Sci. 2025, 1–16.
  44. Kaczmarek, A.; Kontny, B. Estimates of seasonal signals in GNSS time series and environmental loading models with iterative least-squares estimation (iLSE). Acta Geodyn. Geomater. 2018, 15, 131–141.
  45. Nistor, S.; Suba, N.S.; El-Mowafy, A.; Apollo, M.; Malkin, Z.; Nastase, E.I.; Kudrys, J.; Maciuk, K. Implication between geophysical events and the variation of seasonal signal determined in GNSS position time series. Remote Sens. 2021, 13, 3478.
  46. Ballu, V.; Gravelle, M.; Wöppelmann, G.; de Viron, O.; Rebischung, P.; Becker, M.; Sakic, P. Vertical land motion in the Southwest and Central Pacific from available GNSS solutions and implications for relative sea levels. Geophys. J. Int. 2019, 218, 1537–1551.
  47. Oelsmann, J.; Marcos, M.; Passaro, M.; Sanchez, L.; Dettmering, D.; Dangendorf, S.; Seitz, F. Regional variations in relative sea-level changes influenced by nonlinear vertical land motion. Nat. Geosci. 2024, 17, 137–144.
  48. Xu, P.; Jiang, T.; Li, W.; Xu, G.; Zhang, C.; Wang, W.; Tian, K.; Feng, J. Analyzing the seasonal vertical displacement fluctuations using the Global Navigation Satellite System and hydrological load: A case study of the Western Yunnan Region. Water 2024, 16, 1260.
  49. Argus, D.F.; Gordon, R.G. Present tectonic motion across the Coast Ranges and San Andreas fault system in central California. Geol. Soc. Am. Bull. 2001, 113, 1580–1592.
  50. McCaffrey, R. Block kinematics of the Pacific–North America plate boundary in the southwestern United States from inversion of GPS, seismological, and geologic data. J. Geophys. Res. 2005, 110, B07401.
  51. Moreno, M.; Melnick, D.; Rosenau, M.; Bolte, J.; Klotz, J.; Echtler, H.; Baez, J.; Bataille, K.; Chen, J.; Bevis, M.; et al. Heterogeneous plate locking in the South–Central Chile subduction zone: Building up the next great earthquake. Earth Planet. Sci. Lett. 2011, 305, 413–424.
  52. Loveless, J.P.; Meade, B.J. Geodetic imaging of plate motions, slip rates, and partitioning of deformation in Japan. Earth Planet. Sci. Lett. 2010, 297, 34–41.
  53. Reilinger, R.; McClusky, S.; Vernant, P.; Lawrence, S.; Ergintav, S.; Cakmak, R.; Ozener, H.; Kadirov, F.; Guliev, I.; Stepanyan, R.; et al. GPS constraints on continental deformation in the Africa–Arabia–Eurasia collision zone. J. Geophys. Res. 2006, 111, B05411.
  54. Milne, G.A.; Davis, J.L.; Mitrovica, J.X.; Scherneck, H.G.; Johansson, J.M.; Vermeer, M.; Koivula, H. Space–geodetic constraints on glacial isostatic adjustment in Fennoscandia. Science 2001, 291, 2381–2385.
  55. Sella, G.F.; Stein, S.; Dixon, T.H.; Craymer, M.; James, T.S.; Mazzotti, S.; Dokka, R.K. Observations of glacial isostatic adjustment in North America from GPS. Geophys. Res. Lett. 2007, 34, L02306.
  56. Khan, S.A.; Sasgen, I.; Bevis, M.; van Dam, T.; Bamber, J.L.; Wahr, J.; Willis, M.; Kjær, K.H.; Wouters, B.; Helm, V.; et al. Geodetic measurements reveal differences between Greenland ice sheet flow and GIA uplift. Geophys. Res. Lett. 2016, 43, 7815–7823.
  57. Ching, K.-E.; Rau, R.-J.; Johnson, K.M.; Lee, J.-C.; Hu, J.-C. Present-day kinematics of active mountain building in Taiwan from GPS observations during 1995–2005. J. Geophys. Res. 2011, 116, B09405.
  58. Kreemer, C.; Blewitt, G.; Klein, E.C. A geodetic plate motion and global strain rate model. Geochem. Geophys. Geosyst. 2014, 15, 3849–3889.
  59. Lau, L.; Tai, K.W. A data quality assessment approach for high-precision GNSS Continuously Operating Reference Stations (CORS) with case studies in Hong Kong and Canada/USA. Remote Sens. 2023, 15, 1925.
  60. Wang, L.; Wu, Q.; Wu, F.; He, X. Noise content assessment in GNSS coordinate time-series with autoregressive and heteroscedastic random errors. Geophys. J. Int. 2022, 231, 856–876.
  61. Wu, Z.; Lu, C.; Tan, Y.; Zheng, Y.; Liu, Y.; Jin, K. Real-time GNSS tropospheric delay estimation with a novel global random walk processing noise model (GRM). J. Geod. 2023, 97, 112.
  62. Guo, K.; Veettil, S.V.; Weaver, B.J.; Aquino, M. Mitigating high latitude ionospheric scintillation effects on GNSS Precise Point Positioning exploiting 1-s scintillation indices. J. Geod. 2021, 95, 30.
  63. Carbonari, R.; Riccardi, U.; De Martino, P.; Cecere, G.; Di Maio, R. Wavelet-like denoising of GNSS data through machine learning: Application to the time series of the Campi Flegrei volcanic area (Southern Italy). Geomat. Nat. Hazards Risk 2023, 14, 2187271.
Figure 1. Types of noise observed in CGNSS time series: (a) Up component with high white noise levels: CGNSS station LINO, Great Britain. (b) East component with high random walk noise: CGNSS station X086, Okinawa, Japan. (c) Up component with flicker noise: CGNSS station P566, California, USA.
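The sketch below makes the three noise types in Figure 1 concrete. It synthesizes white, flicker, and random-walk noise with the power-law filter of Kasdin (1995), then recovers each spectral index from the slope of a log-log periodogram fit. This is an illustrative descriptive diagnostic only, not the maximum-likelihood noise-model estimation used in dedicated geodetic packages. The helper names `power_law_noise` and `spectral_index` and the fitting band `fmax` are assumptions.

```python
import numpy as np


def power_law_noise(n, alpha, rng):
    """Noise with power spectrum ~ 1/f**alpha via the Kasdin (1995) filter.

    alpha = 0: white noise, alpha = 1: flicker noise, alpha = 2: random walk.
    """
    h = np.ones(n)
    for k in range(1, n):
        h[k] = h[k - 1] * (alpha / 2.0 + k - 1) / k
    w = rng.standard_normal(n)
    m = 2 * n  # zero-padding makes the FFT product a linear (not circular) convolution
    y = np.fft.irfft(np.fft.rfft(w, m) * np.fft.rfft(h, m), m)
    return y[:n]


def spectral_index(x, fmax=0.1):
    """Slope of log-periodogram vs log-frequency over 0 < f <= fmax (cycles/sample)."""
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    keep = (freqs > 0) & (freqs <= fmax)
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(power[keep]), 1)
    return slope


rng = np.random.default_rng(42)
for name, alpha in [("white", 0.0), ("flicker", 1.0), ("random walk", 2.0)]:
    x = power_law_noise(2**14, alpha, rng)
    print(f"{name:12s} estimated spectral index {spectral_index(x):+.2f}")
```

The estimated indices should fall near 0, -1, and -2. Real residuals mix several processes and contain gaps, which is why such slopes are best read as descriptive indicators.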
Figure 2. Types of periodicity observed in CGNSS time series are illustrated: (a) North component with low periodicity levels: CGNSS station COCOS, Cocos Keeling Island. (b) East component with high annual periodicity: CGNSS station TALR, Spain. (c) North component with high semiannual periodicity: CGNSS station SEY2, Seychelles.
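The annual and semiannual amplitudes and phases listed in Table 1, together with the slope, can be obtained from a standard harmonic regression. The sketch below is a minimal ordinary least-squares version for a single coordinate component. It assumes time in decimal years and the phase convention A cos(ωt − φ); the authors' code may parameterize the harmonics differently, and the function name is hypothetical.

```python
import numpy as np


def fit_trend_and_seasonal(t, y):
    """Least-squares fit of y = c + v*t + annual + semiannual terms (t in years).

    Each harmonic a*cos(w t) + b*sin(w t) is rewritten as A*cos(w t - phi),
    so A = hypot(a, b) and phi = atan2(b, a) in radians.
    """
    w1, w2 = 2 * np.pi, 4 * np.pi  # annual and semiannual angular frequencies (rad/yr)
    X = np.column_stack([
        np.ones_like(t), t,
        np.cos(w1 * t), np.sin(w1 * t),
        np.cos(w2 * t), np.sin(w2 * t),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    c, v, a1, b1, a2, b2 = coef
    return {
        "offset": c,
        "velocity": v,
        "annual_amp": np.hypot(a1, b1), "annual_phase": np.arctan2(b1, a1),
        "semiannual_amp": np.hypot(a2, b2), "semiannual_phase": np.arctan2(b2, a2),
        "residuals": y - X @ coef,
    }


# Synthetic, noise-free daily series (values and units are illustrative only).
t = np.arange(0, 5, 1 / 365.25)
y = 2.0 + 3.0 * t + 4.0 * np.cos(2 * np.pi * t - 0.5) + 1.5 * np.cos(4 * np.pi * t - 1.0)
fit = fit_trend_and_seasonal(t, y)
print({k: round(float(v), 3) for k, v in fit.items() if k != "residuals"})
```

The returned residuals are the kind of input from which empirical noise metrics can then be derived.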
Figure 3. Examples of low-quality behavior observed in CGNSS time series: (a) North component affected by earthquakes: CGNSS station J819, Japan; the time series shows offsets associated with the 2004 and 2011 earthquakes. (b) Up component with numerous antenna changes: CGNSS station MKEA, Hawaii (USA). (c) Up component with many gaps and a clear antenna change in 2022: CGNSS station LAE1, Papua New Guinea.
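Figure 3 illustrates behavior captured by the quality features in Table 1 (time-series length, data count, and offset count). The sketch below shows how such descriptors might be computed for one station. The integer day-number input, the extra completeness ratio, and the rule that an offset counts only if it falls inside the observed span are all assumptions for illustration, not the authors' definitions.

```python
import numpy as np


def quality_features(obs_days, offset_days):
    """Quality descriptors for one station.

    obs_days: integer day numbers with a position solution (hypothetical input format).
    offset_days: day numbers of known offsets, e.g. equipment changes or earthquakes.
    """
    days = np.unique(np.asarray(obs_days))
    span_days = int(days[-1] - days[0]) + 1
    offsets = np.asarray(offset_days)
    inside = (offsets > days[0]) & (offsets <= days[-1])
    return {
        "length_years": span_days / 365.25,
        "data_count": int(days.size),
        "completeness": days.size / span_days,  # 1.0 means no daily gaps
        "offset_count": int(np.count_nonzero(inside)),
    }


# A one-year record with a 50-day gap; two of the three offsets fall inside the span.
obs = list(range(0, 100)) + list(range(150, 366))
print(quality_features(obs, offset_days=[50, 200, 400]))
```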
Figure 4. Percentage of data retained after each filtering stage for four datasets: Noise, Quality, Period and Slope.
Figure 5. Elbow plots for the four GNSS signal components—slope, noise, periodicity, and quality—used to determine the suggested number of clusters.
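The elbow plots in Figure 5 show how the within-cluster sum of squares decreases as clusters are added. The sketch below reproduces the idea with scikit-learn's `KMeans.inertia_` on synthetic, well-separated data. The ratio-based rule that picks k automatically is a simple heuristic assumed here, not the procedure used in the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_curve(X, k_max=6, seed=0):
    """Within-cluster sum of squares (k-means inertia) for k = 1..k_max."""
    return np.array([
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in range(1, k_max + 1)
    ])


def elbow_by_ratio(inertias):
    """Heuristic: the k whose addition produced the largest relative drop in inertia."""
    ratios = inertias[:-1] / inertias[1:]  # ratios[i] = W(k=i+1) / W(k=i+2)
    return int(np.argmax(ratios)) + 2


X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)
inertias = elbow_curve(X)
print(np.round(inertias, 1), "suggested k =", elbow_by_ratio(inertias))
```

On real GNSS features the elbow is usually less sharp, which is why internal validity indices such as those in Table 2 are useful as a complement.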
Figure 6. Centroid profiles of the clusters obtained for each GNSS signal component. Panels show the representative patterns for (a) slope, (b) noise, (c) period, and (d) quality, summarizing the characteristic behavior of each cluster.
Figure 7. Linear Discriminant Analysis (LDA) projections for the clustering results of each GNSS signal component: (a) slope, (b) period, (c) quality, and (d) noise.
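Figure 7 projects each clustering onto its linear discriminants. The sketch below shows this step with scikit-learn, using synthetic data rather than the GNSS features and using k-means labels as the classes. Because the labels are derived from the same features, the resubstitution accuracy works as a consistency check on cluster separability rather than as an independent validation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic 4-feature data standing in for one standardized feature set.
X, _ = make_blobs(n_samples=300, centers=[[0, 0, 0, 0], [8, 0, 0, 0], [0, 8, 0, 0]],
                  cluster_std=1.0, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# With 3 clusters, LDA yields at most 3 - 1 = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, labels)
print(Z.shape, "resubstitution accuracy:", round(lda.score(X, labels), 3))
```

Plotting the two columns of `Z` colored by `labels` gives a projection of the kind shown in each panel.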
Figure 8. Global distribution of GNSS stations classified by cluster analysis for four parameters: (a) periodicity, (b) noise, (c) quality, and (d) slope. In each map, stations are grouped into three clusters representing the dominant characteristics of the corresponding component, highlighting the spatial variability of GNSS time series behavior at the global scale. Colors are consistent across all components: the same color always represents the same cluster number (Cluster 1, Cluster 2, Cluster 3), while the P/Q/S/N labels indicate the component-specific cluster assignment.
Figure 9. Regional distribution of GNSS stations classified by cluster analysis for four signal components: periodicity (P), quality (Q), noise (N), and slope (S). Each row corresponds to a component: (a–c) periodicity, (d–f) quality, (g–i) noise, and (j–l) slope. Columns represent regions: Europe, USA, and other areas (for periodicity, quality, and noise, ‘Other’ includes remaining global regions; for slope, ‘Other’ corresponds to Japan in panel (l) and Australia in panels (f,i)). Stations are classified into three clusters per component. This figure highlights how GNSS signal characteristics and data quality are spatially structured at regional and continental scales. Colors are consistent across all components: the same color always represents the same cluster number (Cluster 1, Cluster 2, Cluster 3), while the P/Q/S/N labels indicate the component-specific cluster assignment.
Table 1. Datasets and their corresponding fully expanded variables.
| Dataset | Variables |
|---|---|
| Noise | Station, Latitude, Longitude, White Noise East, North and Up, Flicker Noise East, North and Up, Random Walk Noise East, North and Up |
| Slope | Station, Latitude, Longitude, East Slope, North Slope, Up Slope |
| Period | Station, Latitude, Longitude, Annual East, North and Up Amplitudes and Phases, Halfyear East, North and Up Amplitudes and Phases |
| Quality | Station, Latitude, Longitude, Time Series Length, Data Count, Offset Count |
Note: The variables Station, Latitude, and Longitude are not used in any computation; they are included only as informational metadata.
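Following the note to Table 1, only the signal features enter the computation. The sketch below assembles and normalizes one dataset with pandas and scikit-learn. Z-score standardization via `StandardScaler` is assumed here as the normalization step, the column names simply mirror Table 1, and the rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

METADATA = ["Station", "Latitude", "Longitude"]  # informational only, per the Table 1 note


def feature_matrix(df: pd.DataFrame):
    """Drop metadata columns and z-score the remaining features column by column."""
    features = df.drop(columns=METADATA)
    X = StandardScaler().fit_transform(features.to_numpy(dtype=float))
    return X, list(features.columns)


# Hypothetical rows of the Slope dataset (values are made up).
slope = pd.DataFrame({
    "Station": ["STA1", "STA2", "STA3"],
    "Latitude": [36.5, 40.1, 35.7],
    "Longitude": [-6.2, -3.7, 139.7],
    "East Slope": [0.019, 0.021, -0.005],
    "North Slope": [0.016, 0.017, 0.002],
    "Up Slope": [0.000, -0.001, 0.003],
})
X, names = feature_matrix(slope)
print(names, X.round(2))
```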
Table 2. Performance metrics for the clustering methods applied to each GNSS signal component. The table reports Silhouette, Davies–Bouldin, and Calinski–Harabasz indices for all tested algorithms.
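| Component | Method | Silhouette | Davies–Bouldin | Calinski–Harabasz |
|---|---|---|---|---|
| Slope | k-means | 0.406 | 1.109 | 5170.972 |
| | PAM | 0.403 | 1.139 | 5125.477 |
| | Hierarchical | 0.336 | 1.260 | 3803.265 |
| | Fuzzy C-means | 0.399 | 1.162 | 5096.332 |
| | G-M | 0.227 | 1.648 | 1994.017 |
| Quality | k-means | 0.487 | 1.042 | 5113.207 |
| | PAM | 0.492 | 1.082 | 4725.828 |
| | Hierarchical | 0.442 | 1.089 | 4209.896 |
| | Fuzzy C-means | 0.489 | 1.028 | 5079.954 |
| | G-M | 0.251 | 1.444 | 1223.629 |
| Noise | k-means | 0.347 | 1.643 | 3147.169 |
| | PAM | 0.174 | 1.833 | 2864.923 |
| | Hierarchical | 0.257 | 1.819 | 2390.510 |
| | Fuzzy C-means | 0.187 | 2.003 | 2869.742 |
| | G-M | 0.151 | 2.842 | 1549.336 |
| Period | k-means | 0.199 | 1.971 | 1921.221 |
| | Hierarchical | 0.171 | 2.159 | 1602.448 |
| | Fuzzy C-means | 0.164 | 2.237 | 1641.621 |
| | G-M | 0.134 | 2.285 | 1430.125 |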
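The three internal validity indices in Table 2 are available in scikit-learn. The sketch below computes them on synthetic data for three of the algorithm families compared in the table. PAM and fuzzy C-means are not part of scikit-learn and are omitted, and the data and settings are assumptions chosen only to show the calls. Higher Silhouette and Calinski–Harabasz values and lower Davies–Bouldin values indicate more compact, better-separated clusters.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)

candidates = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "Hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X),
    "Gaussian mixture": GaussianMixture(n_components=3, random_state=0).fit(X).predict(X),
}

scores = {}
for name, labels in candidates.items():
    scores[name] = (
        silhouette_score(X, labels),         # higher is better, in [-1, 1]
        davies_bouldin_score(X, labels),     # lower is better, >= 0
        calinski_harabasz_score(X, labels),  # higher is better
    )
    print(f"{name:20s} S={scores[name][0]:.3f} DB={scores[name][1]:.3f} CH={scores[name][2]:.1f}")
```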
Table 3. Summary of the main characteristics of the clusters identified for each GNSS signal component (periodicity, quality, noise, and slope), including the number of stations, their relative proportion, and the defining features of each cluster profile.
| Component | Cluster | N | % | Profile |
|---|---|---|---|---|
| Period | P1 | 2013 | 23.84% | Strong annual periodicity in East and Up; weak in North |
| | P2 | 3574 | 42.33% | Strong annual periodicity in North; marked semiannual in horizontal |
| | P3 | 2857 | 33.83% | Moderate annual periodicity in horizontal; strong semiannual in Up |
| Quality | Q1 | 956 | 12.07% | Very few offsets (highest quality) |
| | Q2 | 2540 | 32.07% | More frequent offsets |
| | Q3 | 4425 | 55.86% | High data completeness |
| Noise | N1 | 1692 | 19.19% | High random-walk and white noise |
| | N2 | 5628 | 63.84% | Generally low noise levels |
| | N3 | 1496 | 16.97% | Elevated flicker noise |
| Slope | S1 | 4334 | 47.12% | Low velocities in all components |
| | S2 | 1507 | 16.39% | Intermediate velocities; high in Up |
| | S3 | 3356 | 36.49% | Strong horizontal velocities |
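The percentage column of Table 3 is computed within each component. Each component therefore sums to its own number of retained stations rather than to the 21,548 stations of the full repository, and the differing totals are consistent with the component-specific filtering summarized in Figure 4. The short check below recomputes the column from the listed counts.

```python
# Cluster sizes from Table 3; percentages are relative to each component's own total.
counts = {
    "Period": {"P1": 2013, "P2": 3574, "P3": 2857},
    "Quality": {"Q1": 956, "Q2": 2540, "Q3": 4425},
    "Noise": {"N1": 1692, "N2": 5628, "N3": 1496},
    "Slope": {"S1": 4334, "S2": 1507, "S3": 3356},
}

totals = {comp: sum(c.values()) for comp, c in counts.items()}
percent = {
    comp: {k: round(100 * n / totals[comp], 2) for k, n in c.items()}
    for comp, c in counts.items()
}
print(totals)            # {'Period': 8444, 'Quality': 7921, 'Noise': 8816, 'Slope': 9197}
print(percent["Period"])  # {'P1': 23.84, 'P2': 42.33, 'P3': 33.83}
```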
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Álvarez-Ruiz, D.; Sánchez-Alzola, A.; Pastor-Fernández, A. Geospatial Clustering of GNSS Stations Using Unsupervised Learning: A Statistical Framework to Enhance Deformation Analysis for Environmental Risk Management. Mathematics 2026, 14, 855. https://doi.org/10.3390/math14050855

AMA Style

Álvarez-Ruiz D, Sánchez-Alzola A, Pastor-Fernández A. Geospatial Clustering of GNSS Stations Using Unsupervised Learning: A Statistical Framework to Enhance Deformation Analysis for Environmental Risk Management. Mathematics. 2026; 14(5):855. https://doi.org/10.3390/math14050855

Chicago/Turabian Style

Álvarez-Ruiz, Daniel, Alberto Sánchez-Alzola, and Andrés Pastor-Fernández. 2026. "Geospatial Clustering of GNSS Stations Using Unsupervised Learning: A Statistical Framework to Enhance Deformation Analysis for Environmental Risk Management" Mathematics 14, no. 5: 855. https://doi.org/10.3390/math14050855

APA Style

Álvarez-Ruiz, D., Sánchez-Alzola, A., & Pastor-Fernández, A. (2026). Geospatial Clustering of GNSS Stations Using Unsupervised Learning: A Statistical Framework to Enhance Deformation Analysis for Environmental Risk Management. Mathematics, 14(5), 855. https://doi.org/10.3390/math14050855

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
