A Softsensor for Wind Measurements in Karst Caves

Kocijan, Juš; Perne, Matija; Gabrovšek, Franci; Mlakar, Primož; Grašič, Boštjan; Božnar, Marija Zlata

doi:10.3390/s26010022

by Juš Kocijan ^1,2,*; Matija Perne ^1,3; Franci Gabrovšek ^3,4; Primož Mlakar ⁵; Boštjan Grašič ⁵ and Marija Zlata Božnar ⁵

¹ Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia
² Centre for Information Technologies and Applied Mathematics, University of Nova Gorica, 5000 Nova Gorica, Slovenia
³ Faculty of Mathematics and Physics, University of Ljubljana, 1000 Ljubljana, Slovenia
⁴ Karst Research Institute, Research Centre of the Slovenian Academy of Sciences and Arts ZRC-SAZU, 6230 Postojna, Slovenia
⁵ MEIS d.o.o., Mali Vrh pri Šmarju, 1293 Šmarje–Sap, Slovenia
* Author to whom correspondence should be addressed.

Sensors 2026, 26(1), 22; https://doi.org/10.3390/s26010022

Submission received: 13 November 2025 / Revised: 11 December 2025 / Accepted: 17 December 2025 / Published: 19 December 2025

(This article belongs to the Special Issue Novel Sensing Technologies for Environmental Monitoring and Detection)


Abstract

A data-driven soft sensor of wind in a cave passage is developed as an alternative to physical anemometers for measuring wind velocity. It is intended to either fill data gaps during periods without physical measurements or to serve as a substitute for the physical sensor. It is implemented as a Gaussian process model, trained on one year of half-hourly measurements. Statistical measures and visual inspection of the test data indicate that both selected model structures perform well. Therefore, soft sensors represent a viable tool in underground meteorology. They may replace physical sensors that are fragile, power-intensive, or expensive. Alternatively, they can fill data gaps when a physical sensor is unavailable.

Keywords: soft sensor; karst; cave; meteorology

1. Introduction

The use of soft sensors offers excellent potential for studying karst underground air. We present an example of a successfully developed soft sensor of wind along a cave passage.

Karst is a type of topography formed on soluble bedrock and characterised by distinctive landforms such as caves. As a result of dissolution, karst contains extensive air-filled underground voids of large volumes and dimensions, which can extend to considerable depths below the surface. Several reasons exist to study the underground air within these voids. One application is the use of speleothems, cave mineral deposits, as paleoclimate proxies: the speleothem’s $δ^{13}$C value reflects the $δ^{13}$C value and dissolved inorganic carbon (DIC) concentration in drip water [1,2,3]. However, it also depends on the CO₂ partial pressure $p_{\mathrm{CO}_{2}}$ and isotopic composition in the cave atmosphere [1,2,4]. Thus, understanding the composition of cave air is necessary for climatic interpretation of the $δ^{13}$C signal [2]. Interpretation of speleothem $δ^{18}$O signals is analogous: they are influenced both by regional climate and by local cave conditions [5,6]. Karst underground air is also a significant source of radon in buildings [7,8] and responds dynamically to climate changes [9,10]. It provides insight into inaccessible parts of the karst subsurface [2,11]. Moreover, karst aquifers are an important water source, supplying an estimated 9% [12] to 25% [13] of the global population. Air conditions constrain the sustainable use of caves in tourism [14,15,16,17,18]. In carbonate karst, the dissolution reaction consumes CO₂ [19,20] and is being investigated as a potential carbon sink [21,22]. In aquifer regions dominated by free-surface flow [23], carbonate dissolution is primarily governed by underground air transporting the necessary CO₂ [24,25].

The movement of underground air strongly influences its role in storing and transporting chemical species and heat in karst. Consequently, underground air velocity—wind and ventilation—has been extensively studied [11,26,27], particularly regarding CO₂ concentration [25,28,29], speleothem deposition, and paleoclimate records [30,31].

The wide-ranging effects of air movement are reflected in the various principles used to measure it [32,33]. Nevertheless, accurately measuring low air velocities remains challenging. The typical sensor of choice, the ultrasonic anemometer, is often fragile, power-intensive, or costly. While useful in cave settings, its deployment in challenging locations and over extended time periods may be impractical.

A potential alternative is a soft sensor. A soft sensor, also called a virtual or inferential sensor [34], is a computational model that estimates the value of an unmeasured variable. It uses measurements of interdependent variables and a process model to estimate the unmeasured variable, employing either first-principles or data-driven models. Soft sensors are typically used when variables are complex, expensive, or impossible to measure directly in real time. They are widely applied in process engineering [35] and manufacturing [36], and recent surveys cover the topic [37,38,39]. Although soft sensors are common in engineering, they have not yet been adopted in karst science.

In this study, wind sensors are replaced by a combination of other sensors and a mathematical model to estimate wind velocity and direction. Inputs to the soft sensor can include temperature measurements, which are easier to obtain in caves than wind, and surface meteorological variables measured or predicted near the cave. Because karst underground is typically poorly characterised, soft sensors generally rely on data-driven rather than first-principles models [40]. Developing a soft sensor for wind velocity, therefore, requires wind measurements over a calibration period. The key benefit of using a soft sensor is simpler maintenance of the measurement system. Alternatively, the soft sensor can operate alongside the physical anemometer, filling in the data gaps. Importantly, a soft sensor can be developed retrospectively for past periods with available input data. Temperatures in a cave passage can be monitored first to determine whether wind measurement would be beneficial. Once an anemometer is added, a soft sensor can be calibrated and used to reconstruct wind data for the earlier period of temperature monitoring.

The paper’s main original contributions are as follows:

A data-driven method for developing a soft sensor of a meteorological variable in karst underground;
Demonstration of the method’s effectiveness via a case study of wind in a selected cave passage.

The rest of the paper is structured as follows: Section 2 outlines the theoretical tools, Section 3 presents the case study and instrumentation, Section 4 shows results, Section 5 discusses them, and Section 6 provides a concise summary and concludes.

2. Methods

2.1. Procedure for the Development of a Soft Sensor

Developing a soft sensor follows a procedure similar to mathematical modelling. The main steps for the development are as follows:

Selection of variables to model and defining the soft sensor requirements, such as accuracy and precision.
Data collection and preprocessing.
Modelling the system using methods such as experimental modelling (system identification), first-principles modelling, or observer design. The topic is well-documented in literature, e.g., [41,42].
Validation of the soft sensor against preset requirements using data not used in the development of the soft sensor.
Implementation on-site and regular maintenance.

Soft-sensor performance is highly dependent on measurements of other interdependent variables. Therefore, measurements must be consistent with field conditions, and equivalent sensor measurements should be used during development of the soft sensor and its operation. Any inconsistency leads to deviations in soft-sensor outputs. Examples of such discrepancies include sensor relocation or changes in the operational environment of the sensor.

Here, we focus on data-driven soft-sensor development using experimental modelling, also known as system identification. Any suitable data-driven modelling method, such as linear or nonlinear regression or classification, may be employed. We use a Gaussian process (GP) regression model to both obtain the prediction and quantify its uncertainty. Alternative models [43] include various neural network models, such as multilayer perceptrons or long short-term memory networks; kernel models, such as support vector machines; and fuzzy models. Compared with these alternatives, the main strengths of the GP model are its flexibility, because it can model complex nonlinear functions without an explicitly specified structure; its uncertainty quantification, because it provides the prediction variance; and its good performance with small-to-moderate datasets. The main weaknesses of the method are scalability and computational cost: the computational load rises with the third power of the number of data points, although this can be mitigated using approximation methods.

2.2. Assumptions and Constraints

Observations used to develop the model underlying our soft sensor came from cave and surface measurements, supplemented with numerical weather forecasts. The latter are essential when weather measurements are not available. Sensor placement is fixed and must remain unchanged. Accurate instruments are essential because the soft sensor’s accuracy depends on them.

All variables are measured synchronously.

2.3. Performance Metrics

Modelling performance was evaluated using two cost functions. The first is selected to evaluate the model’s time-dependent predictions relative to the original system’s response. This evaluation uses the normalised mean-squared error (NMSE), defined as [44]

$$\mathrm{NMSE} = \frac{1}{N} \frac{\lVert y - E(\hat{y}) \rVert^{2}}{σ_{y}^{2}}, \tag{1}$$

where

$y$ —the vector of observations,
$E (\hat{y})$ —the mean value of estimations $\hat{y}$ , where $\hat{y}$ is considered as a stochastic variable, which gives different values at each repeated observation,
$σ_{y}^{2}$ —the variance of observations,
N—the number of observations.

NMSE is a commonly used standardised measure for the accuracy of predicted mean values; an NMSE of 0 indicates a perfect model. The coefficient of determination $R^{2}$ can be derived from NMSE and is defined as [45]

$$R^{2} = \left(1 - \frac{\lVert y - E(\hat{y}) \rVert^{2}}{N σ_{y}^{2}}\right) \cdot 100\% = (1 - \mathrm{NMSE}) \cdot 100\%. \tag{2}$$

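It equals 100% for a perfect model, and higher values indicate better performance. It drops to 0% for a model that only predicts the mean of the observations and becomes negative for models that perform worse than that.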
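As a minimal sketch of Equations (1) and (2), the two measures can be computed from the observations and the predicted means as shown below. The function names are illustrative, and the population variance of the observations (the NumPy default) is assumed for $σ_{y}^{2}$; the text does not state which variance estimator was used.

```python
import numpy as np

def nmse(y, y_pred_mean):
    """Normalised mean-squared error, Eq. (1): mean squared error over observation variance."""
    y = np.asarray(y, dtype=float)
    y_pred_mean = np.asarray(y_pred_mean, dtype=float)
    # Assumption: population variance (ddof=0) of the observations.
    return np.mean((y - y_pred_mean) ** 2) / np.var(y)

def r2_percent(y, y_pred_mean):
    """Coefficient of determination in percent, Eq. (2)."""
    return (1.0 - nmse(y, y_pred_mean)) * 100.0
```

With these definitions, a model that reproduces the observations exactly gives NMSE = 0, while a model that always predicts the mean of the observations gives NMSE = 1.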

The second measure is the Mean Standardised Log Loss (MSLL) [46],

$$\mathrm{MSLL} = \frac{1}{2N} \sum_{i=1}^{N} \left[ \ln(σ_{i}^{2}) + \frac{(E(\hat{y}_{i}) - y_{i})^{2}}{σ_{i}^{2}} \right] - \frac{1}{2N} \sum_{i=1}^{N} \left[ \ln(σ_{y}^{2}) + \frac{(y_{i} - E(y))^{2}}{σ_{y}^{2}} \right], \tag{3}$$

where $σ_{i}^{2}$ is the variance of prediction in the $i$-th step, $σ_{y}^{2}$ is the variance of observations, $y$ is the vector of the observations and $E(\cdot)$ denotes the expectation, i.e., the mean value.

MSLL is a standardised measure suitable for evaluating predictions of random variables. It incorporates the variance of predictions: prediction errors are weighted more heavily when the associated prediction variance is smaller. The MSLL is approximately zero for a trivial model that predicts with the mean and variance of the observations, and negative for better-performing models.
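The following sketch evaluates Equation (3) directly. It assumes, as in the definitions above, that the reference term uses the mean and variance of the observations themselves; the function name and the use of the population variance are illustrative choices.

```python
import numpy as np

def msll(y, pred_mean, pred_var):
    """Mean standardised log loss, Eq. (3)."""
    y = np.asarray(y, dtype=float)
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_var = np.asarray(pred_var, dtype=float)
    # Model term: log predictive variance plus variance-weighted squared error.
    model_term = np.log(pred_var) + (pred_mean - y) ** 2 / pred_var
    # Reference term: the same expression for the mean and variance of the observations.
    var_y = np.var(y)  # assumption: population variance
    reference_term = np.log(var_y) + (y - np.mean(y)) ** 2 / var_y
    return 0.5 * np.mean(model_term - reference_term)
```

By construction, the measure is exactly zero when every prediction equals the mean of the observations with a predictive variance equal to their variance.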

2.4. Gaussian Process Models

Gaussian process (GP) modelling [46,47], or kriging, models the input-output relationship $f(x)$ of the regression vector $x$ with a GP. A GP model is a nonparametric, probabilistic model used for regression, classification, and function approximation. It is particularly powerful when both the prediction and the associated uncertainty estimate are required. GP is a stochastic process containing random variables $f(x_{i})$ with a normal probability distribution,

$$p(f(x_{1}), \dots, f(x_{N}) \mid x_{1}, \dots, x_{N}) = \mathcal{N}(m, K). \tag{4}$$

The vectors $x_{i}$ are regressor vectors, $f$ denotes the GP, $m$ is the mean vector and $K$ is the covariance matrix of the Gaussian distribution $\mathcal{N}$. In GP modelling, we describe the GP with a mean function and a covariance function,

$$m_{i} = m(x_{i}), \quad K_{ij} = C(x_{i}, x_{j}), \tag{5}$$

where $m(x_{i})$ is the mean function and $C(x_{i}, x_{j})$ is the covariance function. GP models flexibly approximate complex functions (Figure 1) using covariance kernels [47].
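To make the role of the mean and covariance functions concrete, the sketch below implements exact GP regression with a zero mean function, a Gaussian likelihood, and an isotropic Matérn covariance function with smoothness 3/2. It is only an illustration: the hyperparameter values are fixed by hand rather than optimised, the function names are hypothetical, and it is not the toolbox implementation used in this study.

```python
import numpy as np

def matern32(XA, XB, length_scale=1.0, signal_std=1.0):
    """Isotropic Matern covariance function with smoothness parameter 3/2."""
    sq = (np.sum(XA**2, axis=1)[:, None] + np.sum(XB**2, axis=1)[None, :]
          - 2.0 * XA @ XB.T)
    a = np.sqrt(3.0) * np.sqrt(np.maximum(sq, 0.0)) / length_scale
    return signal_std**2 * (1.0 + a) * np.exp(-a)

def gp_predict(X_train, y_train, X_test,
               length_scale=1.0, signal_std=1.0, noise_std=0.1):
    """Predictive mean and variance of a zero-mean GP (hand-picked hyperparameters)."""
    K = matern32(X_train, X_train, length_scale, signal_std)
    K += noise_std**2 * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # cubic cost in the number of training points
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = matern32(X_train, X_test, length_scale, signal_std)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Variance of a noisy observation: prior variance + noise - explained part.
    var = signal_std**2 + noise_std**2 - np.sum(v**2, axis=0)
    return mean, var

# Toy data: one regressor, sine-shaped response.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X).ravel()
mean, var = gp_predict(X, y, np.array([[2.5], [20.0]]))
```

In this toy example, the predictive variance at the test point inside the training range is smaller than at the point far outside it, where the prediction reverts to the prior. The Cholesky factorisation is the step whose cost grows with the third power of the number of training points.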

2.5. Models of Dynamic Systems

The system under study is dynamic. Its output depends on current and past inputs, unlike in a static system, where it depends only on current input. This consideration requires careful selection of the regression model structure. Various dynamic model structures can be employed for soft sensors. We focus on two commonly used models: the Finite Impulse Response (FIR) and the AutoRegressive model with exogenous input (ARX).

2.5.1. Finite Impulse Response (FIR) and Autoregressive (ARX) Models

Nonlinear finite-impulse-response (NFIR) models employ solely the present and past samples of the input signal $u \in \mathbb{R}$, specifically $u(k)$ and $u(k - i)$ at time step $k \in \mathbb{N}$ for $i \in \mathbb{N}$, as regressors. Since regressors are restricted to input measurements only, NFIR structures are inherently stable, a crucial advantage when dealing with nonlinear systems, where ensuring stability is more involved. The main advantages of NFIR models are stability, simplicity, and, consequently, fast training. A primary disadvantage is that they frequently need many previous values of the input variables to reproduce the system dynamics, which increases the model’s complexity. NFIR structures are well-suited to tasks such as control, identification of dynamical systems, noise suppression, modelling nonstationary time series, adaptive equalisation of communication channels, and various other signal-processing applications. A block diagram of the GP-NFIR model is shown in Figure 2.

Nonlinear ARX (NARX) models use both input values $u(k - i)$; $i, k \in \mathbb{N}$, and measured output values $y(k - i)$ as regressors. The NARX formulation, also referred to as the equation-error or series–parallel representation, estimates the next output sample $\hat{y}(k)$ from the latest available lagged input and output measurements,

$$\hat{y}(k) = f(y(k - 1), y(k - 2), \dots, y(k - n), u(k - 1), u(k - 2), \dots, u(k - m)) + ν, \tag{6}$$

where $n$ is the maximum lag in the output values, $m$ is the maximum lag in the input values, and $ν$ is the white Gaussian noise. The main advantages of NARX models are that they are more compact, as they require fewer lags, and that they can capture internal states of systems, which yields better predictive performance when such states are present. The main disadvantages are potential simulation instability due to feedback; more challenging and often iterative estimation and simulation, because past outputs are used in the model, with a correspondingly higher computational demand; and possible bias when outputs are noisy. The GP-NARX model is schematically shown in Figure 2.
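The difference between the two structures lies in how the regressor vectors are assembled. The sketch below builds them for a single input signal; the helper names are hypothetical, and a real soft sensor would stack the lagged columns of several input signals in the same way.

```python
import numpy as np

def lagged_matrix(signal, lags):
    """Column j holds signal(k - lags[j]); rows without a full history are NaN."""
    signal = np.asarray(signal, dtype=float)
    out = np.full((len(signal), len(lags)), np.nan)
    for j, lag in enumerate(lags):
        out[lag:, j] = signal[:len(signal) - lag]
    return out

def nfir_regressors(u, input_lags):
    """NFIR: present and past input samples only."""
    return lagged_matrix(u, input_lags)

def narx_regressors(u, y, n, m):
    """NARX, Eq. (6): y(k-1)..y(k-n) followed by u(k-1)..u(k-m)."""
    return np.hstack([lagged_matrix(y, range(1, n + 1)),
                      lagged_matrix(u, range(1, m + 1))])
```

Row `k` of either matrix is the regressor vector used to estimate the output at time step `k`. The leading rows contain NaN because their history is incomplete, so they would be dropped before training.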

2.5.2. What Are Prediction, Forecasting or Multi-Step Ahead Prediction, and Simulation?

One-step-ahead prediction (in short, prediction) denotes estimating the output at the immediate next sample. This is the standard task for prediction-type models such as ARX. The model’s long-term behaviour—used for forecasting over extended horizons or for validation—is typically assessed by simulation. Here, simulation means multistep-ahead prediction, where the horizon is either unbounded or matches the time span of interest for the analysis. For autoregressive schemes, multistep prediction can be performed by [47]:

direct approaches, which train a distinct model for each target horizon, or
iterative approaches, which obtain longer-horizon forecasts by repeatedly applying a one-step predictor.

The direct approach is limited because the prediction horizon must be chosen in advance and fixed; changing the horizon forces retraining. It also demands much more training data for strongly nonlinear systems that require long horizons. In the iterative formulation for Gaussian process predictors, the current output forecast is a function of prior output forecasts and observed inputs, expressed as:

$$\hat{y}(k) = f(\hat{y}(k - 1), \dots, \hat{y}(k - n), u(k - 1), \dots, u(k - m)) + ν(k), \tag{7}$$

where $\hat{y}(k - i)$; $i = 1, \dots, n$, denotes the output estimate $i$ samples or time steps in the past.
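A minimal sketch of the iterative formulation in Equation (7) is given below. It feeds only the predicted mean back into the regressor vector, which is the simplest variant and ignores the propagation of predictive uncertainty; `one_step_model` is a hypothetical callable standing in for the mean prediction of a trained model.

```python
import numpy as np

def simulate_narx(one_step_model, u, y_init, n, m):
    """Iterate Eq. (7), feeding past output estimates back as regressors."""
    u = np.asarray(u, dtype=float)
    start = max(n, m)
    y_hat = np.empty(len(u))
    # Assumption: the first max(n, m) outputs are known and seed the recursion.
    y_hat[:start] = np.asarray(y_init, dtype=float)[:start]
    for k in range(start, len(u)):
        past_y = y_hat[k - n:k][::-1]  # y_hat(k-1), ..., y_hat(k-n)
        past_u = u[k - m:k][::-1]      # u(k-1), ..., u(k-m)
        y_hat[k] = one_step_model(np.concatenate([past_y, past_u]))
    return y_hat

# Toy one-step model: y(k) = 0.5 * y(k-1) + u(k-1).
toy_model = lambda x: 0.5 * x[0] + x[1]
y_sim = simulate_narx(toy_model, np.ones(5), [0.0], n=1, m=1)
```

Because each estimate depends on earlier estimates, errors can accumulate over the simulation horizon, which is why simulation is a stricter test of a NARX model than one-step-ahead prediction.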

3. Case Study—Brezimeni Rov Passage in Postojna Cave, Slovenia

Measuring airflows in caves is crucial for understanding underground air dynamics. However, this is challenging because it requires sensitive, accurate anemometers, which are expensive and power-intensive. Using soft sensors can be a convenient and cost-effective alternative. There are also a couple of other reasons for using the soft sensor that are even more important in our study. The reasons are as follows:

Due to the impracticality of anemometers, wind data are missing for some past periods when temperatures were recorded. We believe our site is no exception in this regard. Monitoring often begins with only the more convenient sensors, with advanced ones added if the site warrants further study. The only way to obtain the data on quantities that were not measured during the initial period is with soft sensors.
Without real-time data connections, equipment failures are detected only during periodic visits, creating substantial data gaps, possibly months long. Most underground monitoring sites encounter this challenge. Depending on the type of study, such gaps may be a significant obstacle, but soft sensors can be used to fill them.

Postojna Cave, in Slovenia’s karst region, is a popular tourist attraction [48]. Its entrance is located at 45.783° N, 14.204° E, and its map is presented in Figure 3. The influence of the large number of visitors on the underground environment presents both a need for monitoring for conservation purposes and a research opportunity. Brezimeni Rov, marked in Figure 3, contributes substantially to the ventilation of the most visited parts of the cave and has electrical power available at its entrance. Thus, it is a particularly suitable location for monitoring the underground air.

Brezimeni Rov branches from the Stara Jama passage. It is approximately 325 m long along its axis and is relatively uniform with few splits and loops. The initial several tens of metres consist of a simple channel formed in bedrock, partly filled with sediments and flowstone. Occasional chambers with some collapse follow, until after 180 m, the passage rises into a collapsed hall approximately 20 m across. From the hall to the point 230 m from the entrance, one can progress along different levels. At 222 m from the start of the passage, there is a dome, and at 250 m, the passage splits. The eastern part is 30 m long, narrow (1 × 2 m), and mostly horizontal, while the southern part is 75 m long and more diverse, with ascents and descents and abundant flowstone and sediment.

Preliminary observations indicate that primary airflow through Brezimeni Rov occurs between the passage start and the dome at 222 m, from which we can infer that the dome has a connection to the surface that is passable for air. No additional air inflows or outflows were detected. The airflow through Brezimeni Rov accounts for an important fraction of the total ventilation of Postojna Cave. Its intensity and direction imply that it is mainly driven by the chimney effect: the air flows into the passage and up the dome in cold periods, the flow reverses in hot periods, and it is strongest at the most extreme outside temperatures. Based on these observations, we decided to measure air velocity at a spot convenient for estimating airflow. Temperatures are measured at multiple points along the airflow path to quantify heat exchange between the rock and the air. Locations of sensors are shown in Figure 4.

The wind sensor used is an Ultrasonic Anemometer 3D manufactured by Thies Clima, sampling at 20 Hz. The measurements are averaged over 30 min periods. Because of the cave layout (Figure 5), wind direction is primarily constrained to the two directions along the passage. To simplify sensor development, wind velocity and direction were combined into a single signal, with positive and negative signs indicating direction.

CO₂ concentration is measured with the Carbon Dioxide Probe GM252 by Vaisala Oyj, Helsinki, Finland, offering a range of 0–10,000 ppm CO₂ with an accuracy of ±40 ppm CO₂ at concentrations up to 3000 ppm and ±2% above. Temperature T1 is measured with a Pt-100 sensor pt100-1 by Microstep-MIS, Bratislava, Slovakia, and T2 with a Pt-100 sensor HYGROCLIP S3C03-PT15 by Rotronic, Ettlingen, Germany; both are accurate to ±0.1 K. The temperatures are read and logged by the AMS-111 data logger from Microstep-MIS, Bratislava, Slovakia. Ten additional temperatures are recorded with autonomous loggers of the type MX2203 TidbiT MX Temperature 400’ Data Logger by HOBO, Onset Computer Corporation, Bourne, MA, USA, accurate to ±0.2 K and with a resolution of 0.01 K. The locations of the sensors are presented in Table 1.

The sensor ‘Temperature HOBO Brezi deade’ 277 m into the passage serves as a control, verifying that conditions beyond the dome at 222 m are stable with minimal changes in the weather.

The sensors ‘Temperature HOBO Stara dol’ and ‘Temperature HOBO Stara gor’ are located in the Stara Jama passage that the Brezimeni Rov passage connects to. They are north of the entrance to the Brezimeni Rov passage, that is, further away from the cave entrance and deeper into the massif. Ceiling height at this location is 7.7 m. The sensor ‘Temperature HOBO Stara dol’ occasionally experiences flooding.

At the wind sensor site, the passage is approximately 6.2 m wide and 2.7 m high. The sensor is located 2.2 m from the passage wall, and its centre is 0.65 m above the passage floor.

Outside weather data were obtained from the Slovenian Environmental Agency (ARSO) station at 45.772° N, 14.197° E. The measurements of the Pivka River are taken at the sink at the Postojna Cave entrance, also by ARSO.

Assuming cave wind events relate to atmospheric conditions above Postojna Cave, regressors are chosen to maximise information about external conditions. The scheme of conditions of the Brezimeni Rov passage is illustrated in Figure 5.

The representativeness of the measurements at the Postojna ground-level meteorological station of the national measurement network is limited. In particular, wind measurements, carried out as standard at 10 m, can reflect a very local situation. The reason is that the surroundings of Postojna are characterised by very complex terrain. Weather forecasting offers an alternative to station measurements. The weather forecasting model generally operates on slightly larger horizontal spatial cells and therefore does not capture all terrain details. Consequently, wind direction and speed in a spatial cell represent the broader area, rather than the specific micro-location of the meteorological station, as direct measurements do.

Postojna Cave has several larger, locally distributed openings and presumably also many small openings through which the cave air is connected to the outer atmosphere. Thus, data on wind, temperature, and pressure should represent a wider area than that covered by the meteorological station.

Weather reanalysis from the MEIS company, produced with the WRF model [49,50], is used. The model simulations, in a setup used for weather forecasting, have also been successfully validated in the past at several locations across Slovenia, where ground measurements and vertical profiles of wind and temperature are available [51,52].

Reanalysis is an improved weather forecast that reconstructs past or current weather conditions without projecting into the future. It utilises measured weather data for boundary and initial conditions to reconstruct variables across the model’s entire 3D domain. This approach generally yields more accurate results than forecasting future weather.

Characteristics of the used reconstruction of meteorological variables of the external atmosphere are as follows:

The model used is WRF [49,50].
Two nested domains.
The inner domain is the area of Slovenia with minimal surroundings.
The inner domain has a horizontal spatial resolution of 4 km × 4 km.
The temporal resolution is half an hour.

We use the results for a 4 km × 4 km ground cell covering the area around Postojna, where the main cave entrance is located. The data are treated as if a ground station were located in the centre of the cell, reflecting smoothed terrain characteristics rather than highly local variations. Among other variables (the rest are listed in Table A2), the reanalysis provides the following:

Air temperature at 2 m.
Relative air humidity at 2 m.
Wind speed and direction at 10 m.
Air pressure at the ground.
Short-wave incoming radiation.
Precipitation.
Cloud cover.

Figure 3. Position of the Brezimeni Rov passage within the Postojna Cave. Adapted from Šebela [53].

Figure 4. Brezimeni Rov passage with sensor locations. The dashed line in Stara Jama is the tourist railway. Drawn after Gallino [54].

Figure 5. Scheme of conditions of the Brezimeni Rov passage. The arrows show wind direction.

4. Results

4.1. Data

Data used for statistical modelling comprise cave signal measurements, weather station data from the surrounding area, and weather forecasts. Model output data are extracted from the signal combining wind velocity and direction (Table A1 in Appendix A), while input data or regressors are obtained from the remaining signals, listed in Table A2 in Appendix A.

The data points are separated by 30 min, which is the sampling interval of the weather forecast. The measurements are taken every 5 or 10 min, so they are downsampled: only the samples synchronous with the weather forecast are kept. The only signal that requires more preprocessing is the cumulative precipitation, where three 10 min totals are summed into each 30 min total.
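The two resampling operations can be sketched as follows. The helper names are illustrative, and the alignment is an assumption: the series is taken to start on a forecast time stamp, and each 30 min precipitation total is formed from three consecutive 10 min totals.

```python
import numpy as np

def downsample(signal, step):
    """Keep every step-th sample, e.g. step=3 for 10 min data on a 30 min grid."""
    return np.asarray(signal)[::step]

def sum_blocks(totals, block):
    """Sum consecutive groups of `block` totals; an incomplete trailing group is dropped."""
    totals = np.asarray(totals, dtype=float)
    usable = len(totals) - len(totals) % block
    return totals[:usable].reshape(-1, block).sum(axis=1)
```

For example, `downsample(temperature_10min, 3)` keeps the samples synchronous with the forecast, and `sum_blocks(precipitation_10min, 3)` produces the 30 min totals, where both array names are placeholders for measured series.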

These data were, according to modelling practice, divided into three sets: a training dataset, a validation dataset, and a test dataset. In modelling from data, ‘training data’ is the dataset used to teach the model by adjusting its parameters, ‘validation data’ is used during training to select structure, tune hyperparameters, and prevent overfitting, and ‘test data’ is a separate dataset used after training to evaluate the model’s final performance on unseen data.

While outside weather measurements and forecasts are often gap-free, this cannot always be said for cave measurements. Consequently, data should be preprocessed to remove the gaps. There are two options for eliminating them. The first is data imputation [55], e.g., interpolating, linearly or with other suitable functions, between the existing data points; however, this incorporates prior assumptions about how the data should look in the gaps and is only sensible when the gaps are narrow. The second option is to build the matrix of all regression vectors, called the regressor or feature matrix, for model training, and then eliminate rows with missing data to obtain a complete matrix. While this procedure suits the training dataset, the test dataset should contain only time series without gaps, even if this limits the test data range. In our case, we selected data ranges with no gaps for training, validation, and testing.
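The second option, removing incomplete rows from the regressor matrix, can be sketched as follows. Missing values are assumed to be encoded as NaN, and the function name is illustrative.

```python
import numpy as np

def drop_incomplete_rows(X, y):
    """Remove rows in which any regressor or the target is missing (NaN)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[complete], y[complete]
```

Note that a single missing sample of a lagged signal removes several rows, one for each lag at which that sample appears as a regressor.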

The dataset included approximately 18,000 points for training and validation, and 11,500 for testing. Examination of data distributions is beneficial for identifying anomalies, such as outliers or data points that, for various reasons, are outside the expected range. Figure 6 shows the distributions of signals used for modelling. The training signal (the reference wind velocity) consisted of data from 2024. The test signal comprised data from January 2025 until the end of August 2025. The test dataset is used solely for the model’s final evaluation.

The final step in data preprocessing is data normalisation, which yields scale alignment, improved numerical stability, faster and more stable optimisation, and other benefits. There are different ways to normalise data. We used standardisation, in which we scaled the input data to have a zero mean and unit variance. One needs to keep in mind that model responses must be denormalised before comparison with the original data.
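A minimal sketch of standardisation and its inverse is shown below. It assumes that the mean and standard deviation are estimated on the training data only and then reused unchanged for validation and test data, and that the same transformation is applied to the output signal; the function names are illustrative.

```python
import numpy as np

def fit_standardiser(X_train):
    """Column-wise mean and standard deviation, estimated from training data."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.mean(axis=0), X_train.std(axis=0)

def standardise(X, mean, std):
    """Scale to zero mean and unit variance using the training statistics."""
    return (np.asarray(X, dtype=float) - mean) / std

def destandardise(Z, mean, std):
    """Map standardised values, e.g. model responses, back to physical units."""
    return np.asarray(Z, dtype=float) * std + mean
```

Applying `destandardise` to the model responses with the statistics of the output signal returns them to the original units before comparison with the measurements.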

4.2. Model Structure

Model structure selection depends partly on the chosen modelling method. Two model structures are compared: NFIR and NARX. For each modelling method, regressors have to be selected. The regressors, as well as other structural parameters, are determined using cross-validation. In our case, we used a 3-fold cross-validation, in which the data, excluding the test data, is divided into three equal subsets. Each subset is used once as a validation dataset, while the remaining two are used as a training dataset. The scores of statistical measures from three modelling runs are averaged, and the structure inputs or regressors with the best score are selected. The obtained results are presented below.
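The cross-validation loop described above can be sketched as follows. The split into three contiguous blocks is an assumption made here because the data form a time series; the text does not state how the subsets were formed. `score_fn` is a hypothetical callable that trains a candidate structure on the training part and returns its score on the validation part.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=3):
    """Yield (train, validation) index arrays for contiguous, nearly equal folds."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, folds[i]

def cross_validate(score_fn, X, y, n_folds=3):
    """Average the validation score of one candidate structure over all folds."""
    scores = [score_fn(X[tr], y[tr], X[va], y[va])
              for tr, va in k_fold_indices(len(y), n_folds)]
    return float(np.mean(scores))
```

Regressor selection then amounts to calling `cross_validate` once per candidate regressor set and keeping the set with the best averaged score.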

4.2.1. Nonlinear Finite Impulse Response—NFIR Structure

The regressor-selection procedure, using the backward elimination method [47], yielded six signals, comprising measurements of variables taken both inside and outside the cave, as well as weather forecasts. These signals were included with one to five lags to introduce dynamics into the model. Finally, 30 regressors were used as input to the NFIR model, as shown in Table A3 in Appendix A, with one output representing cave wind velocity. The isotropic rational quadratic covariance function [47] was selected as the most appropriate based on cross-validation (Table A5 in Appendix A). Other parameters used for GP models (see [46,47] for details) include a zero mean function and exact inference and prediction with a Gaussian likelihood.

4.2.2. Nonlinear AutoRegressive Model with Exogenous Input—NARX Structure

The regressor-selection procedure, employing the forward selection method [47], yielded nine signals, comprising measurements of variables taken inside and outside the cave, as well as weather forecasts. These signals were included with one to two lags to introduce dynamics into the model. The number of lags and, consequently, the number of regressor candidates are smaller for the NARX model. Finally, 11 regressors were used as input to the NARX model, as shown in Table A4 in Appendix A, with one output representing cave wind velocity. The isotropic Matérn covariance function with the smoothness parameter equal to $\frac{3}{2}$ [47] was selected as the most appropriate based on cross-validation (Table A5 in Appendix A). Other parameters used for GP models (see [46,47] for details) include a zero mean function and exact inference and prediction with a Gaussian likelihood.

4.3. Learning of the Model and Test Results

The full dataset, excluding test data, was used for model training. The MATLAB Statistics and Machine Learning Toolbox was used to train and test the models. The hyperparameters of models are given in Table A6 in the Appendix A. The obtained results are shown in Figure 7 for the NFIR model and in Figure 8 for the NARX model. In the NFIR model, there is no distinction between prediction and simulation because of its structure. The NARX model employs the iterative method to simulate the dynamic model. The statistical evaluation of simulation results on test signals yields NMSE = 0.056, R² = 94.44% and MSLL = −1.437 for the GP-NFIR model, and NMSE = 0.066, R² = 93.37% and MSLL = −1.113 for the GP-NARX model.

If the simulation results are split into the absolute value of wind velocity and the wind direction, the statistical measures for the simulated absolute values of wind velocity are NMSE = 0.216, R² = 78.39% and MSLL = −0.738 for the GP-NFIR model, and NMSE = 0.246, R² = 75.41% and MSLL = −0.568 for the GP-NARX model. The statistical evaluation of direction results yields NMSE = 0.092 and R² = 90.83% for the GP-NFIR model, and NMSE = 0.128 and R² = 87.24% for the GP-NARX model. For wind direction evaluation, data points with velocities below 0.1 m/s were excluded.
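A minimal sketch of such a split is given below. It assumes that the wind velocity is a signed quantity along the passage (Table A1 describes it as velocity with incorporated direction), that direction is represented by its sign, and that the 0.1 m/s exclusion is applied to the magnitude of the measured velocity. These are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sign(v: float) -> float:
    """Direction of a signed velocity; zero is treated as positive."""
    return 1.0 if v >= 0.0 else -1.0


def split_speed_direction(
        v_meas: Sequence[float],
        v_sim: Sequence[float],
        min_speed: float = 0.1,
) -> Tuple[List[float], List[float], List[Tuple[float, float]]]:
    """Return measured speeds, simulated speeds and (measured, simulated) directions.

    Direction pairs are kept only where the measured speed reaches min_speed.
    """
    speed_meas = [abs(v) for v in v_meas]
    speed_sim = [abs(v) for v in v_sim]
    direction_pairs = [(sign(m), sign(s))
                       for m, s in zip(v_meas, v_sim)
                       if abs(m) >= min_speed]
    return speed_meas, speed_sim, direction_pairs


v_meas = [0.5, -0.3, 0.05, -0.8]
v_sim = [0.4, -0.2, -0.02, 0.1]
speed_meas, speed_sim, direction_pairs = split_speed_direction(v_meas, v_sim)
```

The same statistical measures can then be evaluated separately on the speed series and on the retained direction pairs.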

5. Discussion

Our goal was to develop a statistical model capable of substituting direct measurements of underground wind velocity. The model is considered effective if it can forecast the variable of interest using available measurements. The simulation results using independent data indicate that the goal has been achieved. No significant differences in simulation quality were observed between the model structures, as shown in Figure 7 and Figure 8. While statistical measures provide insight into model performance, visual inspection of all model responses remains essential. The evidence suggests that the resulting models are suitable for use as soft sensors.

Regressors obtained via a systematic machine-learning procedure differ between the two model structures. This difference confirms that the regressors provide statistical information to the model, rather than physics-based information. Physical causality should not be inferred from statistical models alone, although a hypothesis of physical causality can be confirmed using statistical methods. In this study, causality cannot be inferred from the obtained model, as indicated by the differing optimal regressors.

While the lists of selected regressors do not imply causality, they do match the general expectations. The regressors of the GP-NFIR model, listed in Table A3, are based on outside temperature, wind velocity, and wind direction, and on temperatures at three locations inside the Brezimeni Rov passage. Underground airflows are driven by the wind and by the temperature differences between the subsurface and the outside air [27], which are precisely the quantities captured by the selected regressors. The temperatures inside the cave are also affected by the cave airflow, which can further explain their relevance. Several lags are present due to the dynamics of the system—trends in these quantities affect the airflow as well. The signals underlying the regressors are of good quality, measured with sufficient accuracy, and distributions of their values cover the anticipated ranges, as demonstrated in Figure 6.

The regressors of the GP-NARX model are listed in Table A4. Past values of cave airflow are prominent; they encode information about the forces that previously drove the airflow and therefore serve as strong predictors of the current airflow. The other regressors that further improve the predictions contain the outside air temperature and wind, and a couple of air temperatures in the cave, just like in the case of GP-NFIR. In GP-NARX, the outside wind information is obtained from the reanalysis, while the source in GP-NFIR is the measurement. Although they originate from different sources, both pairs of signals contain similar information, making it difficult to explain the models’ differing preferences. Unlike in GP-NFIR, we also see regressors based on the forecasted potential temperature and sun elevation, and the measured temperature of the river that sinks into the cave. The potential temperature provides more information on the weather; the elevation of the Sun depends on the part of the day and the season. The river temperature both affects the cave airflow and provides information on the weather in the past. Additional studies would be necessary in order to determine whether the soft sensor benefits more from the river temperature’s direct thermal influence on cave airflow or from its role as a proxy for past weather conditions.

In our case, the simulation accuracy of both evaluated models, GP-NFIR and GP-NARX, is quite similar. Nevertheless, one must consider the properties of each model, listed in Section 2.5, when selecting the model structure.

Statistical models are generally reliable for interpolation but not for extrapolation. Consequently, the model—and thus the soft sensor—is reliable only within the range of the training data. Beyond this range, model predictions may be inaccurate. In Bayesian models, including Gaussian process models, low confidence in model responses is indicated by high variance.
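In practice, the predictive variance can be used to mark soft-sensor outputs that should be treated with caution. The sketch below computes a 95% band as mean ± 1.96 standard deviations, which holds for a Gaussian predictive distribution, and flags samples whose predictive standard deviation exceeds a threshold. The threshold `max_std` is a hypothetical user choice, not a value from this study.

```python
import math
from typing import List, Sequence, Tuple


def confidence_band(mean: Sequence[float], var: Sequence[float],
                    z: float = 1.96) -> List[Tuple[float, float]]:
    """Lower and upper limits of the band mean +/- z standard deviations."""
    return [(m - z * math.sqrt(v), m + z * math.sqrt(v))
            for m, v in zip(mean, var)]


def flag_low_confidence(var: Sequence[float], max_std: float) -> List[bool]:
    """True where the predictive standard deviation exceeds max_std."""
    return [math.sqrt(v) > max_std for v in var]


pred_mean = [0.0, 0.4, -0.2]
pred_var = [1.0, 0.04, 0.25]
band = confidence_band(pred_mean, pred_var)
flags = flag_low_confidence(pred_var, max_std=0.4)
```

Flagged samples typically correspond to regressor values far from the training data, where the model extrapolates.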

The close agreement between model responses and independent data supports the use of dynamic models. Because dynamic models are considerably more complex than static models, it remains important to assess whether the dynamics can be neglected.

By demonstrating the use of a statistical model to develop a soft sensor for underground wind velocity, we fulfilled the goal of showing how soft sensors can be utilised when hardware sensors are unavailable.

A common challenge in environmental monitoring is the occurrence of data gaps due to sensor failure [56]. Data on karst systems is no exception [57,58,59]. Such gaps may significantly bias statistics and impede long-term analyses [60]. In the case of Brezimeni Rov, heat balance studies cannot be conducted during periods with gaps in a quantity as crucial as wind velocity. The soft sensor enables such studies by providing estimates to fill data gaps.
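Gap filling with a soft sensor can be sketched as follows, assuming the measured series marks missing samples as `None` or NaN and that a soft-sensor estimate is available for every time step. The function name and the toy values are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple


def fill_gaps(measured: Sequence[Optional[float]],
              estimated: Sequence[float]) -> Tuple[List[float], List[bool]]:
    """Replace missing measurements by soft-sensor estimates.

    Returns the filled series and a flag per sample that is True
    where the value comes from the soft sensor.
    """
    filled: List[float] = []
    from_soft_sensor: List[bool] = []
    for m, e in zip(measured, estimated):
        missing = m is None or math.isnan(m)
        filled.append(e if missing else m)
        from_soft_sensor.append(missing)
    return filled, from_soft_sensor


measured = [0.5, None, float("nan"), -0.2]
estimated = [0.4, 0.1, 0.0, -0.3]
filled, from_soft_sensor = fill_gaps(measured, estimated)
```

Keeping the flag alongside the filled series allows later analyses, such as heat balance studies, to distinguish measured from estimated values.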

Cave monitoring sites similar to our case study of Brezimeni Rov are widespread [18,61,62,63]. The choice of site for soft sensor development was not based on any special local characteristics. It can be inferred that applying this method at another well-instrumented site would produce a similarly successful soft sensor; hence, the approach can be recommended.

6. Conclusions

A data-driven soft sensor of wind velocity along a cave passage has been successfully implemented. It allows estimation of wind velocity in the passage from other measurements, without direct measurement. The constructed soft sensor demonstrates reasonable accuracy based on statistical measures and visual inspection of independent test data, supporting its use as a replacement for direct measurements.

When using a soft sensor, one must keep in mind that the data-driven model developed for it cannot be transferred to another location; it is invalid even after minor changes of location. However, the demonstrated method for developing a soft sensor can be applied to any cave system at any location, regardless of site characteristics, provided that key measurements are available. Some prior knowledge of cave-system physics is necessary to select appropriate variables for measurement, based on potential relationships with the soft-sensor output variable. As a simple rule, more candidate variables are better.

The demonstrated method has several advantages over direct measurement. A primary advantage is convenience, as most accurate wind sensors are expensive and power-intensive. Knowing wind velocity without measuring it is thus beneficial. Even with a physical sensor in place, a soft sensor can supplement missing data in the measurement time series. A soft sensor trained after installation of the physical sensor can also reconstruct past periods, provided the necessary input data were recorded.

Being data-driven, the soft sensor requires training data that includes the output quantity. The development of the soft sensor for wind measurement, therefore, requires that the physical anemometer be installed for a specific period. Only with sufficient training data can a soft sensor replacing the physical anemometer be developed.

This soft sensor will aid further research on cave meteorology and climatology. It can reconstruct wind speeds when no anemometer measurements exist and fill any data gaps. It may also allow downsizing of monitoring equipment; the anemometer could be removed while wind velocity continues to be obtained from the soft sensor. The present study serves as a proof-of-concept demonstrating the utility of soft sensors in karst science. More applications for various variables in underground systems are envisaged in future work, and these additional case studies will strengthen the proposed concept.

Author Contributions

Conceptualisation, J.K. and M.P.; methodology, J.K., F.G. and M.P.; software, J.K.; validation, J.K., F.G. and M.P.; investigation, J.K. and M.P.; resources, P.M., B.G. and M.Z.B.; data curation, M.P. and B.G.; writing—original draft preparation, J.K., M.P., P.M., B.G. and M.Z.B.; writing—review and editing, J.K., M.P., P.M., B.G. and M.Z.B.; project administration, J.K. and M.P.; funding acquisition, J.K. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge projects “Air in karst underground as a sink of greenhouse gases”, Id N2-0299, “Atmosphere Identification for Protection of Population in Preparation for Accidental Releases—MARIONETTE”, Id L2-60149, and Research Core Funding No. P2-0001, which were all financially supported by the Slovenian Research and Innovation Agency.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

Authors Primož Mlakar, Boštjan Grašič and Marija Zlata Božnar were employed by the company MEIS d.o.o. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Available Signals, Selected Regressors and Covariance Function

Table A1. Signal/variable from which output data for training of the soft sensor is extracted (symbol ‘x’ stands for check mark).

Variable	Cave
Cave wind velocity (with incorporated direction)	x

Table A2. Signals/variables from which regressors were selected (symbol ‘x’ stands for check mark).

Variable	Cave	Outside	Forecast
CO₂	x
Temperature T1	x
Temperature T2	x
Temperature HOBO Stara dol	x
Temperature HOBO Stara gor	x
Temperature HOBO Anemo	x
Temperature HOBO Kamra	x
Temperature HOBO Voda	x
Temperature HOBO Dvorana	x
Temperature HOBO Rov dvor	x
Temperature HOBO Kamin dol	x
Temperature HOBO Za kaminom	x
Temperature HOBO Brezi deade	x
Air temperature		x
Relative humidity		x
Global radiation		x
Diffuse sky radiation		x
Pressure		x
Wind velocity		x
Wind direction		x
Precipitation cumulative 10 min		x
Snow height		x
Pivka water temperature		x
Pivka water level		x
Precipitation cumulative		x
Air temperature			x
Wind velocity WRF			x
Wind direction WRF			x
Precipitation cumulative			x
QNH			x
Cloudiness			x
Relative humidity			x
Global radiation			x
Diffuse sky radiation			x
Pressure			x
Temperature 1000			x
Temperature red prit			x
Irradiation direct			x
Irradiation global			x
Irradiation diffuse			x
Irradiation JDN			x
Irradiation elevation			x
Irradiation azimuth			x
Irradiation declination			x
Visibility TCDCclm			x
Visibility LCDClcll			x
Visibility MCDCmcll			x
Visibility HCDChcll			x
Visibility VISsfc			x

Table A3. Signals/variables and lags in samples selected as winning regressors in GP-NFIR model-structure optimisation.

Variable	Lag in Samples
Temperature T1	1
Temperature T1	2
Temperature T1	3
Temperature T1	4
Temperature T1	5
Temperature HOBO Rov dvor	1
Temperature HOBO Rov dvor	2
Temperature HOBO Rov dvor	3
Temperature HOBO Rov dvor	4
Temperature HOBO Rov dvor	5
Temperature HOBO Kamin dol	1
Temperature HOBO Kamin dol	2
Temperature HOBO Kamin dol	3
Temperature HOBO Kamin dol	4
Temperature HOBO Kamin dol	5
Air temperature	1
Air temperature	2
Air temperature	3
Air temperature	4
Air temperature	5
Wind velocity	1
Wind velocity	2
Wind velocity	3
Wind velocity	4
Wind velocity	5
Wind direction	1
Wind direction	2
Wind direction	3
Wind direction	4
Wind direction	5

Table A4. Signals/variables and lags in samples selected as winning regressors in GP-NARX model-structure optimisation.

Variable	Lag in Samples
Cave wind velocity	1
Cave wind velocity	2
Temperature HOBO Dvorana	2
Temperature HOBO Kamin dol	1
Air temperature	1
Air temperature	2
Pivka water temperature	1
Wind velocity WRF	2
Wind direction WRF	2
Temperature red prit	1
Irradiation elevation	2

Table A5. Mean-squared-error 3-fold cross-validation loss for covariance functions [47] during the selection procedure, where regressors as in Table A3 and Table A4 are used. ISO means isotropic covariance function, and ARD, which stands for Automatic Relevance Determination, means anisotropic covariance function. Loss values in bold are the winning values.

Covariance Function	GP-NFIR Loss	GP-NARX Loss
Matérn 3/2 ISO	0.037	0.033
Rational quadratic ISO	0.033	0.034
Rational quadratic ARD	0.040	0.035
Squared exponential ISO	0.048	0.034
Squared exponential ARD	0.046	0.034
Linear, one parameter ISO	0.194	0.035
Linear ISO	0.195	0.035
Linear ARD	0.195	0.035
Linear ARD + constant	0.195	0.035
Squared exponential ISO	0.051	0.034
Linear ISO + squared exponential ISO	0.038	0.035
Linear ISO + rational quadratic ISO	0.034	0.034
Lin. one parameter ISO + rational quadratic ISO	0.034	0.034
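The losses in Table A5 come from 3-fold cross-validation. A minimal sketch of such a loss computation for one candidate model is given below. It assumes contiguous folds, since the partitioning of the folds is not stated here. The callable `fit` is hypothetical: it trains on the training folds and returns a prediction function. The mean predictor `fit_mean` is only a toy stand-in for a GP with a given covariance function.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def k_fold_mse(fit: Callable[[List[Sequence[float]], List[float]], Predictor],
               X: Sequence[Sequence[float]],
               y: Sequence[float],
               k: int = 3) -> float:
    """Mean squared error over all held-out samples of a k-fold split."""
    n = len(y)
    fold_of = [i * k // n for i in range(n)]  # contiguous folds (assumption)
    squared_errors: List[float] = []
    for fold in range(k):
        train_X = [x for x, f in zip(X, fold_of) if f != fold]
        train_y = [t for t, f in zip(y, fold_of) if f != fold]
        predict = fit(train_X, train_y)
        squared_errors.extend((predict(x) - t) ** 2
                              for x, t, f in zip(X, y, fold_of) if f == fold)
    return sum(squared_errors) / n


def fit_mean(train_X, train_y) -> Predictor:
    """Toy model: always predicts the mean of the training targets."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


X_toy = [[float(i)] for i in range(6)]
y_toy = [0.0, 0.0, 3.0, 3.0, 6.0, 6.0]
loss = k_fold_mse(fit_mean, X_toy, y_toy, k=3)
```

Repeating this computation for each candidate covariance function and choosing the smallest loss corresponds to the selection summarised in the table.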

Table A6. Hyperparameters of GP-NFIR and GP-NARX models.

Model	GP-NFIR	GP-NARX
Covariance function	Rational quadratic ISO	Matérn 3/2 ISO
signal standard deviation	1.1	0.93
length scale	97	1.3
scale-mixture parameter	0.09	/
noise standard deviation	0.26	0.17
mean function	‘constant’	‘constant’
value	−0.66	−0.66
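For orientation, the two selected covariance functions can be written out in their standard isotropic forms [46], where isotropic means that a single length scale is shared by all regressors. The sketch below uses the values from Table A6 as default arguments. Whether the toolbox parameterisation matches these textbook forms exactly, and how the regressors were scaled, are assumptions; the function names are illustrative.

```python
import math
from typing import Sequence


def _distance(x: Sequence[float], z: Sequence[float]) -> float:
    """Euclidean distance between two regressor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def matern32_iso(x: Sequence[float], z: Sequence[float],
                 sigma_f: float = 0.93, length_scale: float = 1.3) -> float:
    """Matern covariance with smoothness 3/2 and one shared length scale."""
    a = math.sqrt(3.0) * _distance(x, z) / length_scale
    return sigma_f ** 2 * (1.0 + a) * math.exp(-a)


def rational_quadratic_iso(x: Sequence[float], z: Sequence[float],
                           sigma_f: float = 1.1, length_scale: float = 97.0,
                           alpha: float = 0.09) -> float:
    """Rational quadratic covariance; alpha is the scale-mixture parameter."""
    r2 = _distance(x, z) ** 2
    return sigma_f ** 2 * (1.0 + r2 / (2.0 * alpha * length_scale ** 2)) ** (-alpha)


k_same = matern32_iso([0.0, 0.0], [0.0, 0.0])
k_near = matern32_iso([0.0], [1.0])
k_far = matern32_iso([0.0], [2.0])
```

Both functions equal the signal variance at zero distance and decay as the regressor vectors move apart, with the length scale setting how quickly the correlation is lost.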

References

1. Lambert, W.J.; Aharon, P. Controls on dissolved inorganic carbon and δ¹³C in cave waters from DeSoto Caverns: Implications for speleothem δ¹³C assessments. Geochim. Cosmochim. Acta 2011, 75, 753–768.
2. Mattey, D.; Atkinson, T.; Hoffmann, D.; Boyd, M.; Ainsworth, M.; Durell, R.; Latin, J.P. External controls on CO₂ in Gibraltar cave air and ground air: Implications for interpretation of δ¹³C in speleothems. Sci. Total Environ. 2021, 777, 146096.
3. Wang, S.; Zhu, J.; Gao, K.; Zhao, B.; Zhang, Z.; Wang, Y.; Cheng, H. A drought event in central eastern China during MIS 6 revealed by elemental ratio and δ¹³C records from Hulu Cave. Palaeogeogr. Palaeoclimatol. Palaeoecol. 2025, 676, 113129.
4. Baker, A.; Ito, E.; Smart, P.L.; McEwan, R.F. Elevated and variable values of ¹³C in speleothems in a British cave system. Chem. Geol. 1997, 136, 263–270.
5. Hendy, C. The isotopic geochemistry of speleothems—I. The calculation of the effects of different modes of formation on the isotopic composition of speleothems and their applicability as palaeoclimatic indicators. Geochim. Cosmochim. Acta 1971, 35, 801–824.
6. Lachniet, M.S. Climatic and environmental controls on speleothem oxygen-isotope values. Quat. Sci. Rev. 2009, 28, 412–432.
7. Haneberg, W.C.; Wiggins, A.; Curl, D.C.; Greb, S.F.; Andrews, W.M., Jr.; Rademacher, K.; Rayens, M.K.; Hahn, E.J. A Geologically Based Indoor-Radon Potential Map of Kentucky. GeoHealth 2020, 4, e2020GH000263.
8. Long, S.C.; Fenton, D.; Scivyer, C.; Monahan, E. Factors underlying persistently high radon levels in a house located in a karst limestone region of Ireland: Lessons learned about remediation. Nukleonika 2016, 61, 327–332.
9. Badino, G. Cave temperatures and global climatic change. Int. J. Speleol. 2004, 33, 103–113.
10. Domínguez-Villar, D.; Lojen, S.; Krklec, K.; Baker, A.; Fairchild, I.J. Is global warming affecting cave temperatures? Experimental and model data from a paradigmatic case study. Clim. Dyn. 2015, 45, 569–581.
11. Covington, M.D.; Perne, M. Consider a cylindrical cave: A physicist’s view of cave and karst science. Acta Carsologica 2015, 44, 363–380.
12. Stevanović, Z. Karst waters in potable water supply: A global scale overview. Environ. Earth Sci. 2019, 78, 662.
13. Ford, D.C.; Williams, P.W. Karst Hydrogeology and Geomorphology; John Wiley & Sons: Chichester, UK, 2007.
14. Bastian, F.; Jurado, V.; Nováková, A.; Alabouvette, C.; Saiz-Jimenez, C. The microbiology of Lascaux Cave. Microbiology 2010, 156, 644–652.
15. Vouvé, J.; Brunet, J.; Marsal, J. Les oeuvres rupestres de Lascaux (Montignac, France): Maintien des conditions de conservation. Stud. Conserv. 1983, 28, 107–116.
16. Pulido-Bosch, A.; Martín-Rosales, W.; López-Chicano, M.; Rodríguez-Navarro, C.M.; Vallejos, A. Human impact in a tourist karstic cave (Aracena, Spain). Environ. Geol. 1997, 31, 142–149.
17. Pinçon, G. Compte rendu critique des actes du symposium international sur Lascaux et la conservation en milieu souterrain. Les Nouv. L’Archéologie 2012, 128, 47–51.
18. Novas, N.; Gázquez, J.A.; MacLennan, J.; García, R.M.; Fernández-Ros, M.; Manzano-Agugliaro, F. A real-time underground environment monitoring system for sustainable tourism of caves. J. Clean. Prod. 2017, 142, 2707–2721.
19. Plummer, L.N.; Wigley, T.M.L.; Parkhurst, D.L. The kinetics of calcite dissolution in CO₂–water systems at 5° to 60 °C and 0.0 to 1.0 atm CO₂. Am. J. Sci. 1978, 278, 179–216.
20. Kaufmann, G.; Dreybrodt, W. Calcite dissolution kinetics in the system CaCO₃–H₂O–CO₂ at high undersaturation. Geochim. Cosmochim. Acta 2007, 71, 1398–1410.
21. Liu, Z.; Dreybrodt, W.; Wang, H. A new direction in effective accounting for the atmospheric CO₂ budget: Considering the combined action of carbonate dissolution, the global water cycle and photosynthetic uptake of DIC by aquatic organisms. Earth-Sci. Rev. 2010, 99, 162–172.
22. Liu, Z.; Macpherson, G.L.; Groves, C.; Martin, J.B.; Yuan, D.; Zeng, S. Large and active CO₂ uptake by coupled carbonate weathering. Earth-Sci. Rev. 2018, 182, 42–49.
23. Perne, M.; Covington, M.; Gabrovšek, F. Evolution of karst conduit networks in transition from pressurized flow to free-surface flow. Hydrol. Earth Syst. Sci. 2014, 18, 4617–4633.
24. Covington, M.D.; Prelovšek, M.; Gabrovšek, F. Influence of CO₂ dynamics on the longitudinal variation of incision rates in soluble bedrock channels: Feedback mechanisms. Geomorphology 2013, 186, 85–95.
25. Covington, M.D.; Knierim, K.J.; Young, H.A.; Rodriguez, J.; Gnoza, H.G. The impact of ventilation patterns on calcite dissolution rates within karst conduits. J. Hydrol. 2021, 593, 125824.
26. Cigna, A.A. An analytical study of air circulation in caves. Int. J. Speleol. 1968, 3, 41–54.
27. Gabrovšek, F. How do caves breathe: The airflow patterns in karst underground. PLoS ONE 2023, 18, e0283767.
28. Lang, M.; Faimon, J.; Godissart, J.; Ek, C. Carbon dioxide seasonality in dynamically ventilated caves: The role of advective fluxes. Theor. Appl. Climatol. 2017, 129, 1355–1372.
29. Kukuljan, L.; Gabrovšek, F.; Covington, M.D.; Johnston, V.E. CO₂ dynamics and heterogeneity in a cave atmosphere: Role of ventilation patterns and airflow pathways. Theor. Appl. Climatol. 2021, 146, 91–109.
30. Spötl, C.; Fairchild, I.J.; Tooth, A.F. Cave air control on dripwater geochemistry, Obir Caves (Austria): Implications for speleothem deposition in dynamically ventilated caves. Geochim. Cosmochim. Acta 2005, 69, 2451–2468.
31. James, E.W.; Banner, J.L.; Hardt, B. A global model for cave ventilation and seasonal bias in speleothem paleoclimate records. Geochem. Geophys. Geosystems 2015, 16, 1044–1051.
32. Busch, N.E.; Christensen, O.; Kristensen, L.; Lading, L.; Larsen, S.E. Cups, Vanes, Propellers, and Laser Anemometers. In Air-Sea Interaction: Instruments and Methods; Dobson, F., Hasse, L., Davis, R., Eds.; Springer: Boston, MA, USA, 1980; pp. 11–46.
33. Garrick, R.D.; Villasmil, L.; Lee, J.; Gutterman, J.S. Comparison of new ultrasonic and hot wire thermo-anemometer gas flow meters. In Proceedings of the 2011 Future of Instrumentation International Workshop (FIIW) Proceedings, Online, 7–8 November 2011; pp. 164–167.
34. Karba, R.; Kocijan, J.; Bajd, T.; Karer, M.Ž.; Karer, G. Terminological Dictionary of Automatic Control, Systems and Robotics; Springer Nature: Cham, Switzerland, 2024; Volume 104.
35. Zhu, X.; Rehman, K.U.; Wang, B.; Shahzad, M. Modern soft-sensing modeling methods for fermentation processes. Sensors 2020, 20, 1771.
36. Gallareta, J.G.; González-Menorca, C.; Muñoz, P.; Vasic, M.V. Advancements in Soft-Sensor Technologies for Quality Control in Process Manufacturing: A Review. IEEE Sens. J. 2025, 25, 14575–14588.
37. Souza, F.A.; Araújo, R.; Mendes, J. Review of soft sensor methods for regression applications. Chemom. Intell. Lab. Syst. 2016, 152, 69–79.
38. Jiang, Y.; Yin, S.; Dong, J.; Kaynak, O. A Review on Soft Sensors for Monitoring, Control, and Optimization of Industrial Processes. IEEE Sens. J. 2021, 21, 12868–12881.
39. Perera, Y.S.; Ratnaweera, D.; Dasanayaka, C.H.; Abeykoon, C. The role of artificial intelligence-driven soft sensors in advanced sustainable process industries: A critical review. Eng. Appl. Artif. Intell. 2023, 121, 105988.
40. Yan, W.; Tang, D.; Lin, Y. A Data-Driven Soft Sensor Modeling Method Based on Deep Learning and its Application. IEEE Trans. Ind. Electron. 2017, 64, 4237–4245.
41. Baillieul, J.; Samad, T. Encyclopedia of Systems and Control; Springer: London, UK, 2021.
42. Atherton, D.P.; Borne, P. Concise Encyclopedia of Modelling and Simulation; Pergamon Press: Oxford, England, 2013; Volume 5.
43. Kocijan, J. Modelling Dynamic Systems with Artificial Neural Networks and Related Methods; University of Nova Gorica Press: Nova Gorica, Slovenia, 2023.
44. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
45. Coefficient of Determination. 2025. Available online: https://en.wikipedia.org/wiki/Coefficient_of_determination (accessed on 25 February 2025).
46. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
47. Kocijan, J. Modelling and Control of Dynamic Systems Using Gaussian Process Models; Springer International: Cham, Switzerland, 2016.
48. Tičar, J.; Tomić, N.; Valjavec, M.B.; Zorn, M.; Marković, S.B.; Gavrilov, M.B. Speleotourism in Slovenia: Balancing between mass tourism and geoheritage protection. Open Geosci. 2018, 10, 344–357.
49. Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Barker, D.M.; Duda, M.G.; Huang, X.Y.; Wang, W.; Powers, J.G. A Description of the Advanced Research WRF Version 3; NCAR/TN–475+STR; NCAR Technical Note; National Center for Atmospheric Research: Boulder, CO, USA, 2008.
50. Powers, J.G.; Klemp, J.B.; Skamarock, W.C.; Davis, C.A.; Dudhia, J.; Gill, D.O.; Coen, J.L.; Gochis, D.J.; Ahmadov, R.; Peckham, S.E.; et al. The weather research and forecasting model: Overview, system efforts, and future directions. Bull. Am. Meteorol. Soc. 2017, 98, 1717–1737.
51. Grašič, B.; Mlakar, P.; Božnar, M.Z.; Kocijan, J. Validation of numerically forecast vertical temperature profile with measurements for dispersion modelling. Int. J. Environ. Pollut. 2018, 64, 22–34.
52. Mlakar, P.; Kokal, D.; Grašič, B.; Božnar, M.Z.; Gradišar, D.; Kocijan, J. Validation of meteorological forecasts in fine spatial and temporal resolution produced as an input for dispersion models. Int. J. Environ. Pollut. 2017, 62, 236–246.
53. Šebela, S. Tektonska Zgradba Sistema Postojnskih Jam; Založba ZRC: Ljubljana, Slovenia, 1998; Volume 18.
54. Gallino, L. Tavole del Rilievo dell R.R. Grotte di Postumia 1:500; Karst Research Institute ZRC SAZU: Postojna, Slovenia, 1924.
55. Afrifa-Yamoah, E.; Mueller, U.A.; Taylor, S.M.; Fisher, A.J. Missing data imputation of high-resolution temporal climate time series data. Meteorol. Appl. 2020, 27, e1873.
56. Biber, E. The Challenge of Collecting and Using Environmental Monitoring Data. Ecol. Soc. 2013, 18, Art. 68.
57. Gil, D.; Sánchez-Gómez, M.; Tovar-Pescador, J. Microclimate Variability in a Highly Dynamic Karstic System. Geosciences 2025, 15, 280.
58. Giese, M.; Caballero, Y.; Hartmann, A.; Charlier, J.B. Trends in long-term hydrological data from European karst areas: Insights for groundwater recharge evaluation. Hydrol. Earth Syst. Sci. 2025, 29, 3037–3054.
59. Luetscher, M.; Lismonde, B.; Jeannin, P.Y. Heat exchanges in the heterothermic zone of a karst system: Monlesi cave, Swiss Jura Mountains. J. Geophys. Res. Earth Surf. 2008, 113, Art. F02025.
60. Zhang, Y.; Thorburn, P.J. Handling missing data in near real-time environmental monitoring: A system and a review of selected methods. Future Gener. Comput. Syst. 2022, 128, 63–72.
61. Perrier, F.; Bourges, F.; Girault, F.; Mouël, J.L.L.; Genty, D.; Lartiges, B.; Losno, R.; Bonnet, S. Temperature variations in caves induced by atmospheric pressure variations—Part 1: Transfer functions and their interpretation. Geosystems Geoenvironment 2023, 2, 100145.
62. Kranjc, A.; Opara, B. Temperature Monitoring in Škocjanske jame Caves. Acta Carsologica 2016, 31, 85–96.
63. Cigna, A.A. Modern Trend in Cave Monitoring. Acta Carsologica 2016, 31, 35–54.

Figure 1. The principle of the Gaussian process model.

Figure 2. Two GP model structures: (a) GP-NFIR model, where the output predictions ŷ are functions of m previous measurements of input signals u, (b) GP series-parallel or equation-error or NARX model, where the output predictions are functions of previous measurements of the input and output signals.

Figure 6. Histograms of data that were used for modelling, some for one and some for another model structure.

Figure 7. The simulation response of GP-NFIR wind-velocity model (train signals: complete year 2024)—red line with 95% confidence band compared with the measured test signal of the first eight months of year 2025—blue line.

Figure 8. The simulation response of GP-NARX wind-velocity model (train signals: complete year 2024)—red line with 95% confidence band compared with the measured test signal of the first eight months of year 2025—blue line.

Table 1. Sensors (symbol ‘x’ stands for check mark).

Name/Quantity	Location (m from start)	Sampling rate 30 min	Sampling rate 5 min	Elevation (m above ground)
Wind	73	x		0.65
CO₂	73	x		0.3
Temperature T1	73	x		0.5
Temperature T2	12	x		0.6
Temperature HOBO Stara dol	−21		x	0.1 m above floor
Temperature HOBO Stara gor	−18		x	3.4 m above floor
Temperature HOBO Anemo	51		x	0.6
Temperature HOBO Kamra ponvi	150		x	0.5
Temperature HOBO Voda	166		x	underwater
Temperature HOBO Dvorana	192		x	0.5
Temperature HOBO Rov dvor	206		x	0.9
Temperature HOBO Kamin dol	222		x	1.4
Temperature HOBO Za kaminom	250		x	0.2
Temperature HOBO Brezi deade	277		x	1.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
