Article

Environmental Prediction Using a Spatiotemporal WSN: A New Method for Integrating BKA Optimization and CNN-BiLSTM

1
International College of Digital Innovation, Chiang Mai University, Chiang Mai 50200, Thailand
2
Institute of Big Data, Chengdu University, Chengdu 610106, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 296; https://doi.org/10.3390/app16010296
Submission received: 5 November 2025 / Revised: 10 December 2025 / Accepted: 19 December 2025 / Published: 27 December 2025

Abstract

Accurate environmental prediction is crucial for ecological monitoring and disaster early warning, but it remains challenging due to the spatiotemporal complexity of dynamic wireless sensor networks (WSNs). To this end, we propose a novel hybrid model that integrates a convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), and the black-winged kite algorithm (BKA). The CNN first extracts spatial features from multi-node sensor data to capture local environmental patterns. The BKA then optimizes key CNN hyperparameters (learning rate, hidden layers, and regularization coefficients) to enhance the robustness of the feature representation to noise and missing data. Finally, the BiLSTM processes the optimized features to model bidirectional long-term temporal dependencies (e.g., circadian rhythms, seasonal trends) and produce accurate environmental predictions. Evaluation of the BKA-optimized CNN-BiLSTM model shows that our framework reduces prediction error by 19.3% to 32.7% compared to other models and achieves 89.4% accuracy in predicting extreme weather events. The synergy between BKA-driven CNN optimization and BiLSTM temporal dynamics modeling significantly improves the reliability of environmental prediction in resource-constrained sensor networks.

1. Introduction

A wireless sensor network (WSN) is a cutting-edge environmental monitoring technology that continuously collects key ecological parameters, such as temperature, humidity, and soil characteristics, through distributed nodes. It provides unprecedented data support for climate change research, precision agriculture, and natural disaster early warning. With the advancement of sensor technology and the expansion of deployment scales, WSNs have achieved high-resolution spatiotemporal observation of environmental systems, capturing complex dynamic processes from microhabitats to macro ecosystems. However, these large and heterogeneous sensor data contain complex spatiotemporal correlations, manifested by an uneven spatial distribution of monitoring nodes, topological correlations under complex geographical constraints, and temporal behavior that combines long-term trends, short-period fluctuations (e.g., diurnal cycles), and multi-scale features (e.g., seasonal variations) [1]. These characteristics pose a significant challenge to accurate environmental prediction.
In dynamic WSN environmental prediction, spatiotemporal complexity is reflected in three dimensions [2]. First, spatially, sensor nodes exhibit dynamic interdependencies, and the relationship between physical distance and data similarity is not linear. Topography, vegetation coverage, and land use type produce non-uniform spatial correlations [3]. Second, temporally, environmental variables (especially climate-influenced weather patterns) change rapidly and exhibit multi-scale periodicity, such as daily soil temperature fluctuations and seasonal trends [4]. Third, WSNs face data collection challenges due to limited energy and unreliable communication, leading to random data loss and noise interference, which further increase the uncertainty of prediction models [5]. These complex spatiotemporal dependencies and resource-constrained environments constitute the core challenges of WSN environmental forecasting.
Existing forecasting methods face significant limitations in spatial feature robustness and temporal dynamic modeling [6]. Traditional time-series analysis techniques, such as ARIMA and exponential smoothing, fail to effectively capture spatial correlations, whereas classical spatial interpolation methods like kriging overlook temporal dynamics [7]. Among mainstream machine learning approaches, single-model architectures struggle to simultaneously optimize spatiotemporal feature extraction [8]. Although convolutional neural networks (CNNs) can extract spatial features [9], their fixed convolutional kernels are inadequate for handling non-uniform spatial correlations in WSN data [10], and pooling operations may lose subtle environmental patterns [11]. Long short-term memory networks (LSTMs) and their variants excel at modeling temporal dependencies [12]; however, unidirectional LSTMs cannot fully utilize the bidirectional temporal context inherent in environmental data, while standard bidirectional LSTMs (BiLSTMs) do not adequately account for the coupling of spatial features within temporal dynamics [13]. Furthermore, balancing model efficiency and accuracy is crucial in resource-constrained WSN scenarios, yet existing methods often lack adaptive optimization for computational complexity and communication overhead [14].
Table 1 provides a systematic comparison of existing environmental forecasting methods, covering four primary categories and their key characteristics [15]. Traditional statistical methods, such as ARIMA and kriging, demonstrate high resource efficiency but offer limited spatial feature processing capabilities and struggle to characterize complex nonlinear spatiotemporal dynamics [16]. Single-model machine learning approaches (e.g., CNN or LSTM) show improvements in either spatial or temporal feature extraction, yet they remain unable to optimize both simultaneously and exhibit limited adaptability. Hybrid models like CNN-LSTM perform well in spatiotemporal modeling; however, they suffer from hyperparameter sensitivity and insufficient robustness in dynamic wireless sensor network environments [16]. Although functional data analysis methods (e.g., FPCA) can effectively handle spatiotemporal features, they are highly sensitive to missing data, and their computational complexity increases sharply with the number of network nodes [17], resulting in poor scalability. Overall, existing methods show clear limitations in joint spatiotemporal modeling, environmental adaptability, or computational efficiency.
To address these limitations, this study focuses on a core scientific challenge: how to design a prediction framework that optimizes the robustness of spatial features while enabling time-dynamic modeling capabilities and automatically adapting to resource-constrained WSN scenarios [18]. Fundamentally, this entails enhancing the model’s adaptability to dynamic WSN-specific challenges—including non-uniform spatial correlations, multi-scale temporal dynamics, and operational efficiency under resource constraints—while maintaining forecast accuracy [19].
To address the above scientific challenges, we propose a CNN-BiLSTM hybrid prediction framework optimized by BKA. The framework systematically addresses the spatiotemporal complexity of WSN environmental prediction by integrating key components. The model consists of three core parts: First, the CNN module extracts spatial features from multi-node data, uses local receptive fields to capture environmental patterns between adjacent sensors, and constructs basic representations of spatial correlations [11]. Secondly, the Black-winged Kite Optimization Algorithm (BKA) adaptively adjusts the key CNN hyperparameters (such as learning rate, number of hidden layers, and regularization coefficient) to enhance the model’s robustness to noise and missing data, and improve its adaptability in dynamic WSN environments [20]. Finally, the BiLSTM module performs bidirectional long-term temporal modeling of the optimized spatial features to capture the forward and backward environmental changes such as circadian rhythms and seasonal trends [21], thereby achieving more accurate multi-scale temporal prediction [19].
The main contributions of this study are as follows.
This study proposes a meta-optimization and spatiotemporal partitioning modeling method for data quality assurance. By co-integrating convolutional neural networks (CNNs), the black-winged kite algorithm (BKA) [22], and bidirectional long short-term memory (BiLSTM) networks, the framework simultaneously addresses three key challenges in WSN environmental prediction: robustness of spatial features, temporal dynamic modeling, and data quality adaptation. Specifically, the BKA-based hyperparameter optimization mechanism tunes key model parameters to match dynamic WSN data characteristics, which enhances the robustness of feature representations against noise and missing data.
Systematic validation experiments show that, compared with mainstream prediction models, the proposed framework reduces prediction error, improves the accuracy of extreme weather event prediction, and adapts better in terms of resource consumption.
The remainder of this paper is organized as follows. Section 2 reviews previous studies, focusing on the spatiotemporal analysis methods of WSN environmental prediction and the limitations of existing hybrid models. Section 3 presents the methodology, detailing the architecture design and algorithm implementation of the BKA-CNN-BiLSTM model. Section 4 describes the experimental design and discussion, including dataset characteristics and evaluation indicators. Section 5 interprets the results and analyzes limitations. Finally, Section 6 discusses future research directions.
Through this systematic research framework, we aim to promote the development of WSN-based environmental prediction methods to provide more reliable technical support for ecological monitoring, precision agriculture, and climate change research.

2. Related Work

This section reviews the development trajectory of dynamic WSN environmental prediction, including related deep learning models and optimization techniques. We first analyze the core challenges of WSN environmental prediction, then outline the evolution path from a single model to a hybrid model, and evaluate the effectiveness and limitations of hyperparameter optimization. Finally, by identifying key deficiencies in existing research, we lay the foundation for the innovative contribution of this study.

2.1. Challenges and Model Evolution of WSNs in Environmental Forecasting

Dynamic wireless sensor networks (WSNs) play a pivotal role in ecological monitoring. However, their dynamic topology, limited node resources, and communication instability lead to highly non-stationary data flow, high noise, and missing values, which pose significant challenges to traditional time-series prediction models [23,24]. Early studies mainly used linear models such as ARIMA, which struggle to capture complex nonlinear features and spatiotemporal coupling features in environmental data despite their high computational efficiency. These models do not perform well during emergencies such as pollution surges [25].
To overcome the limitations of linear models, data-driven approaches are becoming more common. Although machine learning models such as support vector machines (SVMs) and random forests have improved prediction performance to some extent, their effectiveness is still highly dependent on manual feature engineering and lacks robust modeling capabilities for long-term time dependencies [5]. In recent years, deep learning models have made significant progress in this field, leveraging their powerful capabilities in automatic feature extraction and nonlinear fitting. This evolution mainly develops in two directions: first, the transformation from a single-model architecture to a mixed-mode architecture to collaboratively capture spatiotemporal features; and secondly, a shift from manual parameter tuning to intelligent optimization techniques to improve model robustness and performance.

2.2. From Single to Hybrid Deep Learning Models

The initial research focused on the applicability of a single deep learning architecture. Convolutional neural networks (CNNs) have been shown to be effective in extracting local spatial patterns from multi-sensor nodes with spatial distribution [25]. However, CNNs are inherently designed to be spatially invariant, making them unsuitable for modeling long-term dependencies in time series, such as day–night cycles or seasonal cycles. For this reason, long short-term memory networks (LSTMs) and their variants are widely adopted. Their gating mechanism mitigates gradient vanishing issues, making them a cornerstone of time-series prediction [26]. Notably, bidirectional LSTMs (BiLSTMs) outperform unidirectional LSTMs in several environmental prediction tasks, utilizing historical and future information to achieve a more comprehensive understanding of the temporal context [27].
A single model cannot extract spatial and temporal features at the same time. Researchers therefore combine CNNs with RNNs, especially BiLSTMs, to form hybrid models in which the CNN extracts spatial features and the BiLSTM models temporal dynamics [28,29,30]. Several case studies have shown that the CNN-BiLSTM hybrid framework outperforms single-component models in PM2.5 prediction. However, this “assembly style” hybrid model still faces an inherent challenge: feature alignment. The spatial feature sequence output by a CNN needs to be aligned with the time steps of the BiLSTM, and improper processing may lead to information loss. Attention mechanisms can be introduced to dynamically weight critical time steps. However, the added complexity and larger number of parameters make hybrid models prone to overfitting in the limited-data scenarios common in WSNs. In addition, a BiLSTM stack that is too deep can degrade performance.
These challenges highlight the need for well-designed hybrid model architectures that optimize parameter configuration rather than just stacking components.

2.3. The Role and Limitations of Hyperparameter Optimization

The performance of deep learning models is highly dependent on hyperparameter configuration, including learning rate, hidden layer size, and regularization coefficients. In dynamic WSN scenarios, traditional grid search methods become impractical due to high computational costs. Given the powerful global search capabilities of metaheuristic algorithms, they function as effective automated optimization tools.
A series of swarm intelligence and evolutionary algorithms, including the sparrow search algorithm (SSA) [29], the whale optimization algorithm (WOA) [25], and particle swarm optimization (PSO) [30,31], have been successfully applied to optimize the hyperparameters of LSTM-BiLSTM models, demonstrating performance improvements in environmental prediction tasks [25,29,30]. However, existing optimization studies have two significant limitations.
Narrow optimization goal: Most studies focus on fine-tuning parameters of the temporal component (e.g., BiLSTM) and ignore the co-optimization of spatial feature extractors (e.g., CNN hyperparameters) [30]. This can lead to suboptimal feature extraction, which caps the performance of subsequent temporal modeling.
Algorithm limitations: Some algorithms, such as standard PSO, are prone to becoming trapped in local optima on complex non-convex problems and are sensitive to noisy data; their convergence speed and robustness need further improvement [30].
The black-winged kite algorithm (BKA) is a novel metaheuristic method that mimics the diving–hunting strategy of black-winged kites. Studies have shown that it has superior convergence speed and robustness in solving high-dimensional optimization problems [32]. However, no studies have explored its application in the synergistic hyperparameter optimization of CNN-BiLSTM hybrid models.

2.4. Research Gaps and Paper Innovation

In conclusion, despite significant advances in existing research, there are key research gaps in the field of dynamic WSN environmental prediction.
Insufficient feature robustness: The current CNN-BiLSTM model fails to explicitly optimize the resilience of CNN components to WSN noise and typical data loss. When data quality degrades, model performance degrades dramatically [28].
Insufficient collaborative optimization: Current hyperparameter optimization methods mainly target BiLSTM models and lack an overall method for optimizing CNN and BiLSTM core hyperparameters at the same time [30].
Lack of efficient optimization algorithms: In resource-constrained WSN scenarios, high-performance algorithms are needed to balance convergence speed and robustness, but the BKA’s potential in this area has not been fully explored.
Accurate environmental prediction is crucial for ecological monitoring and disaster early warnings. However, in resource-constrained dynamic wireless sensor networks (WSNs), this task faces inherent dual challenges. First, the data exhibit a high degree of spatiotemporal complexity and coupling. Spatially dispersed node data contain local patterns, while temporal variations such as seasonal shifts are influenced by long-term trends and short-term extreme weather events. Secondly, WSN data often have noise, missing values, and non-stationarity, which weaken the generalization ability of traditional models and reduce prediction reliability.
Experimental results show that the prediction error of the proposed BKA-optimized CNN-BiLSTM model is significantly reduced by 19.3% to 32.7% in multiple real-world WSN datasets, and the prediction accuracy of extreme weather events reaches 89.4%. The core innovation of this study lies in revealing the synergistic mechanism between BKA-driven CNN hyperparameter optimization and BiLSTM temporal dynamics modeling, providing a system solution for resource-constrained dynamic WSN systems that balance prediction accuracy, model robustness, and computational feasibility.

3. Methodology

3.1. Overview of the Overall Framework

To address insufficient feature robustness, insufficient collaborative optimization, and the lack of efficient optimization algorithms identified above, together with the spatiotemporal modeling and data quality challenges, a new hybrid model is proposed.
First, we use the local perception capabilities of convolutional neural networks (CNNs) to explicitly decouple and extract spatial features from multi-node data. To address the limitations of CNNs on noisy data (especially their lack of feature robustness and of collaborative optimization), we introduce the black-winged kite algorithm (BKA). As a meta-optimizer tailored to WSN data characteristics, the BKA adaptively searches for optimal hyperparameters (such as learning rate and number of hidden nodes) to enhance the robustness and generalization of the model under harsh real-world conditions. Subsequently, the optimized robust spatial features are fed into the bidirectional long short-term memory (BiLSTM) network. By leveraging its bidirectional temporal modeling capabilities, the BiLSTM simultaneously captures both forward and backward long-term dependencies in environmental evolution, such as circadian rhythms and the cumulative effects of disasters. Together, these components address the optimization efficiency, spatiotemporal modeling, and data quality problems, as shown in Figure 1.
The overall workflow of the model is shown in Figure 1. First, the WSN spatiotemporal sequence data from multiple nodes are organized into two-dimensional feature maps and fed into the CNN module to extract local spatial correlation patterns. Then, the BKA optimizer, acting as the “meta-regulator” of model performance, does not directly process the data but determines the optimal combination of key CNN hyperparameters (learning rate, number of hidden nodes, regularization coefficient) through an intelligent search to enhance the robustness of the spatial feature representation. Subsequently, the high-quality spatial feature sequences extracted by the optimized CNN are fed into the BiLSTM module. The BiLSTM uses its forward and backward LSTM layers to jointly model the bidirectional long-term temporal dependencies contained in the feature series (e.g., diurnal cycle, trend evolution). Finally, the output of the BiLSTM is mapped by a fully connected layer to obtain the final predicted values of the environmental parameters.
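The first step of this workflow, organizing multi-node readings into per-time-step two-dimensional maps and then into model inputs, can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `make_windows`, the window length of 12 steps, the one-step horizon, and the choice of 4 variables per node are assumptions made for the example; only the 50-node count comes from the experimental setup described later.

```python
import numpy as np

def make_windows(readings, window, horizon=1):
    """Slice a (T, N, M) array of WSN readings into supervised samples.

    readings[t] is the N x M map for time step t (N nodes, M variables).
    Returns inputs of shape (S, window, N, M) and targets of shape (S, N, M),
    where each target is the frame `horizon` steps after its window ends.
    """
    readings = np.asarray(readings, dtype=float)
    n_steps = readings.shape[0]
    n_samples = n_steps - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window/horizon")
    X = np.stack([readings[s:s + window] for s in range(n_samples)])
    y = np.stack([readings[s + window + horizon - 1] for s in range(n_samples)])
    return X, y

# Synthetic data (assumed sizes): 48 time steps, 50 nodes, 4 variables per node.
rng = np.random.default_rng(0)
data = rng.normal(size=(48, 50, 4))
X, y = make_windows(data, window=12, horizon=1)
print(X.shape, y.shape)  # (36, 12, 50, 4) (36, 50, 4)
```

Each input sample is a short sequence of node-by-variable maps, which is the form a per-time-step spatial extractor followed by a sequence model would consume.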

3.2. CNN Spatial Feature Extraction Module

In dynamic WSNs, the temperature and humidity data collected by the sensor nodes are locally correlated in space. CNNs, with their convolution and pooling operations, serve as powerful spatial feature extractors. Their core advantage lies in translational invariance: they do not depend on the absolute position of nodes, so they can adapt to dynamic changes in network topology, as shown in Figure 2.
At a given time instant $t$, the $M$-dimensional observations from $N$ sensor nodes are organized into an $N \times M$ two-dimensional matrix $X_t$, which is treated as an “image”. The convolutional neural network (CNN) applies convolution kernels to this “image” to extract local spatial patterns. The convolution operation is defined as shown in Equation (1):
$$z_{i,j}^{l} = \sigma\left( \sum_{m=1}^{M_k} \sum_{n=1}^{N_k} w_{m,n}^{l} \cdot x_{i+m,\,j+n}^{l-1} + b^{l} \right) \tag{1}$$
where $z_{i,j}^{l}$ denotes the value of the $l$-th feature map at position $(i, j)$; $w_{m,n}^{l}$ and $b^{l}$ represent the trainable convolutional kernel weights and bias; $\sigma(\cdot)$ is the ReLU activation function; and $M_k \times N_k$ indicates the convolutional kernel size. Subsequently, the max-pooling layer performs downsampling on the feature map, as shown in Equation (2):
$$p_{i,j}^{l} = \max_{(u,v) \in R_{i,j}} z_{u,v}^{l} \tag{2}$$
where $R_{i,j}$ is the pooling window, as shown in Figure 3.
Function in this study: The CNN module is specifically designed to decouple and extract spatial complexity from input data. It automatically learns local spatial correlations between nodes, such as pollutant diffusion gradients, and compresses high-dimensional raw data into a low-dimensional yet information-rich sequence of spatial feature vectors $\{h_t^{cnn}\}$, laying the groundwork for subsequent time-series modeling. The key parameters of its extraction capability (learning rate, number of hidden units, and the L2 regularization coefficient $\lambda$) will be optimized by the BKA.
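To make Equations (1) and (2) concrete, the following NumPy sketch applies a single convolution kernel with ReLU and then non-overlapping max pooling to a small matrix. It is an illustrative, unoptimized implementation under stated assumptions: a single input channel, "valid" borders, 0-based indices, a two-index input term `x[i+m, j+n]`, and a pooling window that tiles the map without overlap. The function names are hypothetical and the kernel values are arbitrary.

```python
import numpy as np

def conv2d_relu(x, w, b):
    """Valid 2-D convolution (cross-correlation form) followed by ReLU.

    Computes z[i, j] = ReLU(sum_{m,n} w[m, n] * x[i+m, j+n] + b).
    """
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    z = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            z[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return np.maximum(z, 0.0)

def max_pool(z, size):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = z.shape[0] // size, z.shape[1] // size
    blocks = z[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

# Toy 4 x 4 input standing in for the node-by-variable matrix X_t.
x = np.arange(16, dtype=float).reshape(4, 4)
z = conv2d_relu(x, w=np.ones((2, 2)), b=0.0)   # 3 x 3 feature map
print(z[0])            # [10. 14. 18.]
print(max_pool(z, 2))  # [[30.]]
```

With an all-ones 2 x 2 kernel, each output entry is the sum of a 2 x 2 patch, so the top-left entry is 0 + 1 + 4 + 5 = 10, and pooling keeps the largest value in the top-left 2 x 2 window of the feature map.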

3.3. BiLSTM Temporal Dynamic Modeling Module

Environmental parameters evolve over time, with both long-term trends and short-term fluctuations. The BiLSTM is introduced to model these complex temporal dependencies. Compared with a unidirectional LSTM, the BiLSTM learns contextual information in both directions at the same time: the forward pass reads the sequence from past to future and the backward pass reads it from future to past, giving a more comprehensive view of the dynamics of the time series, as shown in Figure 4 and Figure 5.
At time step $t$, the BiLSTM takes the spatial feature $h_t^{cnn}$ extracted by the CNN as input. Its core consists of LSTM units and gate mechanisms, which include the forget gate $f_t$, input gate $i_t$, and output gate $o_t$. These gate mechanisms update the cell state $C_t$ (long-term memory) and hidden state $h_t$ (short-term memory), as shown in Equations (3)–(6):
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, h_t^{cnn}] + b_f\right) \tag{3}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, h_t^{cnn}] + b_i\right), \quad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, h_t^{cnn}] + b_C\right) \tag{4}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, h_t^{cnn}] + b_o\right), \quad h_t = o_t \odot \tanh(C_t) \tag{6}$$
The BiLSTM operates two separate LSTM networks: a forward LSTM (generating $\overrightarrow{h}_t$) and a backward LSTM (producing $\overleftarrow{h}_t$). The hidden states from both networks are concatenated at each time step to form the final output, as shown in Equation (7):
$$H_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \tag{7}$$
Function in this study: The BiLSTM module is specifically designed to decouple and model temporal complexity in data. It processes the robust spatial feature sequences optimized by the CNN and BKA, focusing on learning bidirectional temporal patterns such as circadian rhythms and the development of extreme events, and outputs the final prediction $\hat{y}_t$, as shown in Figure 6.
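The gate updates in Equations (3)–(6) and the concatenation in Equation (7) can be traced with a small NumPy sketch. This is a forward-pass illustration only, with randomly initialized weights and no training; the parameter dictionary layout, the helper names, and the sizes (6 time steps, 8 input features, 5 hidden units) are assumptions made for the example, not values from the study.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: gates act on the concatenation [h_{t-1}, x_t]."""
    v = np.concatenate([h_prev, x_t])
    f = sigmoid(p["Wf"] @ v + p["bf"])         # forget gate
    i = sigmoid(p["Wi"] @ v + p["bi"])         # input gate
    c_tilde = np.tanh(p["Wc"] @ v + p["bc"])   # candidate cell state
    c = f * c_prev + i * c_tilde               # cell state update
    o = sigmoid(p["Wo"] @ v + p["bo"])         # output gate
    h = o * np.tanh(c)                         # hidden state
    return h, c

def run_lstm(xs, p, hidden):
    """Run one LSTM over a (T, D) sequence, returning (T, hidden) states."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    out = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        out.append(h)
    return np.stack(out)

def bilstm(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = run_lstm(xs, p_fwd, hidden)
    bwd = run_lstm(xs[::-1], p_bwd, hidden)[::-1]   # re-align to forward time
    return np.concatenate([fwd, bwd], axis=1)

def init_params(input_dim, hidden, rng):
    """Random weights and zero biases (illustrative initialization)."""
    shape = (hidden, hidden + input_dim)
    p = {k: rng.normal(scale=0.1, size=shape) for k in ("Wf", "Wi", "Wc", "Wo")}
    p.update({k: np.zeros(hidden) for k in ("bf", "bi", "bc", "bo")})
    return p

rng = np.random.default_rng(0)
T, D, H = 6, 8, 5
features = rng.normal(size=(T, D))   # stand-in for the CNN feature sequence
H_out = bilstm(features, init_params(D, H, rng), init_params(D, H, rng), H)
print(H_out.shape)  # (6, 10)
```

The backward pass is run on the reversed sequence and then reversed again, so that row `t` of the output pairs the forward state (which has seen steps up to `t`) with the backward state (which has seen steps from `t` onward).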

3.4. BKA Optimization Module

The feature extraction performance of a CNN is highly dependent on its hyperparameter settings. Manual tuning and grid search are inefficient and rarely reach a global optimum in dynamic WSN scenarios. To address this, we introduce the black-winged kite algorithm (BKA) as an intelligent optimizer. The BKA was selected primarily because its core behavioral mechanism, an adaptive balance between exploration and exploitation, aligns closely with this problem. The BKA mimics the black-winged kite’s behaviors of cruising (global exploration), diving (local exploitation), and eavesdropping (adaptive adjustment), which enables efficient searching across vast hyperparameter spaces while avoiding local optima. Additionally, it inherently adapts to dynamic changes in data distribution caused by node movement or environmental shifts.
Targeted optimization objective: The BKA is used to optimize three key hyperparameters of the convolutional neural network (CNN):
Learning rate: controls the model’s convergence speed and stability.
Hidden node count: affects feature representation capability and model complexity.
L2 regularization coefficient ($\lambda$): directly determines the model’s robustness to noise and missing data.
Convergence efficiency: Compared to other metaheuristic algorithms (e.g., PSO, GA), the BKA demonstrates faster convergence and more stable optimization performance in solving continuous parameter optimization problems (see Sections 4.3.8, 4.3.9, and 4.3.10 for comparative experiments).
The BKA treats the hyperparameter vector $\theta = [lr, units, \lambda]$ as an individual. Its optimization objective is to minimize the prediction error of the CNN feature extractor on the independent validation set, with the fitness function defined as:
$$Fitness(\theta) = MSE_{val} + \lambda \|W\|_2^2$$
where $MSE_{val}$ represents the mean squared error on the validation set, while $\|W\|_2^2$ denotes the squared L2 norm of the CNN weights. The BKA iteratively updates individual positions (i.e., hyperparameter combinations) in the population to identify the optimal solution $\theta^*$ with the minimum fitness value. This optimal configuration ensures the CNN can extract the most robust and discriminative spatial features from noisy dynamic WSN data.
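The fitness evaluation described above can be sketched in Python. Two loud caveats apply. First, the search loop below is a generic population-based search written for illustration; it does not reproduce the BKA update rules, which are not specified in this section. Second, `train_surrogate` is a hypothetical stand-in for CNN training (random tanh features with a linear head trained by gradient descent), chosen only so that the three hyperparameters (learning rate, unit count, $\lambda$) all have an effect. The search bounds and synthetic data are likewise assumptions.

```python
import numpy as np

def train_surrogate(theta, X_tr, y_tr, X_val, y_val, steps=200, seed=0):
    """Hypothetical stand-in for CNN training: random tanh features + linear head."""
    lr, units, lam = theta[0], int(round(theta[1])), theta[2]
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(X_tr.shape[1], units))
    phi_tr, phi_val = np.tanh(X_tr @ proj), np.tanh(X_val @ proj)
    w = np.zeros(units)
    for _ in range(steps):
        grad = 2 * phi_tr.T @ (phi_tr @ w - y_tr) / len(y_tr) + 2 * lam * w
        w -= lr * grad
    mse_val = float(np.mean((phi_val @ w - y_val) ** 2))
    return mse_val, w

def fitness(theta, data):
    """Fitness(theta) = MSE_val + lambda * ||W||_2^2; non-finite values map to inf."""
    mse_val, w = train_surrogate(theta, *data)
    value = mse_val + theta[2] * float(np.sum(w ** 2))
    return value if np.isfinite(value) else float("inf")

def population_search(fit, low, high, pop=8, iters=15, seed=0):
    """Generic population search (NOT the BKA update rules): candidates move
    toward the current best with a random perturbation that shrinks over time."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    P = rng.uniform(low, high, size=(pop, len(low)))
    scores = np.array([fit(p) for p in P])
    for k in range(iters):
        best = P[scores.argmin()].copy()
        scale = (high - low) * 0.3 * (1 - k / iters)
        cand = P + rng.uniform(0, 1, P.shape) * (best - P) + rng.normal(size=P.shape) * scale
        cand = np.clip(cand, low, high)
        cand_scores = np.array([fit(c) for c in cand])
        better = cand_scores < scores          # greedy replacement per individual
        P[better], scores[better] = cand[better], cand_scores[better]
    return P[scores.argmin()].copy(), float(scores.min())

# Synthetic regression data (assumed), split into training and validation parts.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)
data = (X[:80], y[:80], X[80:], y[80:])

# theta = [lr, units, lambda]; the bounds are illustrative only.
theta_star, best_fit = population_search(
    lambda t: fitness(t, data), low=[1e-3, 4, 1e-4], high=[2e-2, 32, 1e-1])
print(theta_star, best_fit)
```

Keeping the fitness function separate from the search loop means a different optimizer can be substituted without touching the evaluation code, which is the property the comparison between metaheuristics relies on.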

3.5. Model Integration and Training

The three modules are integrated into an end-to-end training framework. First, the BKA searches the preset hyperparameter space (Table 2) to obtain the optimal CNN configuration $\theta^*$. Then, with this configuration fixed, the CNN-BiLSTM hybrid model is initialized. The model’s training objective is to minimize the total loss $L_{total}$ across all training samples, measured by the mean squared error (MSE), as shown in Equation (8):
$$L_{total} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i^{true} - y_i^{pred} \right)^2 \tag{8}$$
Using the Adam optimizer, all weight parameters of the CNN and BiLSTM are updated simultaneously by the backpropagation algorithm. The fixed hyperparameters of the BiLSTM component were determined based on preliminary experiments (see Table 2 and Table 3).
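As a small worked example of the loss in Equation (8), the sketch below computes the MSE and its gradient with respect to the predictions, which is the quantity backpropagation starts from. The function names and the three temperature-like values are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean of squared differences over all N samples."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mse_grad(y_true, y_pred):
    """Gradient of the loss with respect to each prediction: 2 (y_pred - y_true) / N."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 2.0 * (y_pred - y_true) / y_true.size

y_true = [21.0, 22.5, 19.0]
y_pred = [20.0, 23.0, 19.5]
print(mse_loss(y_true, y_pred))  # 0.5
print(mse_grad(y_true, y_pred))
```

Here the squared errors are 1.0, 0.25, and 0.25, so the loss is 1.5 / 3 = 0.5.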

4. Experimental Design and Discussion

4.1. Experimental Equipment

4.1.1. The Scale and Layout of the Experimental Site

To demonstrate the “spatiotemporal complexity” of the dynamic WSN and verify the performance of the model in real-world scenarios, a medium-scale heterogeneous experimental site is designed.
The site covers an area of 5 km × 5 km (25 square kilometers) and is equipped with 50 sensors.
It is large enough to accommodate diverse microclimate environments, such as urban blocks, parks, and near water bodies, to generate meaningful data on spatial change. It is also in line with the scale of typical environmental monitoring projects, such as urban air quality monitoring and forest microclimate studies, making the findings more convincing.
Topography and geomorphological features: The site covers a variety of landforms that simulate complex spatial patterns. About 30% is urban built-up area, where concrete structures are prone to the “heat island effect” and pollutant accumulation. About 40% is green park area (grasslands, shrubs, and trees), whose temperature and humidity contrast with those of the urban areas. About 10% is water bodies (small lakes or rivers) with significant temperature and humidity regulation capabilities. Approximately 20% is open space, which serves as a control area with minimal shade effect.
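The site figures above translate into areas and an average node density with simple arithmetic, sketched below. The constant names are hypothetical, and the density is a site-wide average only; how the 50 nodes are actually distributed across land-cover types is not stated here.

```python
SITE_SIDE_KM = 5.0
NODE_COUNT = 50
LAND_COVER = {            # shares of the site area, as listed in the text
    "urban built-up": 0.30,
    "green park": 0.40,
    "water": 0.10,
    "open space": 0.20,
}

total_area = SITE_SIDE_KM ** 2                       # 25 km^2
areas = {name: share * total_area for name, share in LAND_COVER.items()}
density = NODE_COUNT / total_area                    # site-wide average

for name, area in areas.items():
    print(f"{name}: {area:.1f} km^2")
print(f"average node density: {density:.1f} nodes per km^2")
```

This gives 7.5, 10.0, 2.5, and 5.0 square kilometers for the four land-cover types and an average of 2 nodes per square kilometer.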
As shown in Figure 7, the process begins with obtaining raw spatiotemporal WSN data. The convolutional neural network is used to extract spatial features from the original data, and then the hyperparameters of the convolutional neural network are optimized by the BKA. The optimized spatiotemporal data are then processed by the CNN to generate feature outputs. These feature data are input into the BiLSTM model for bidirectional time-series modeling, and finally, the prediction results of environmental parameters are obtained.

4.1.2. List of Experimental Equipment and Consumables

The experiment list mainly includes the sensing layer, communication layer, data layer, and parameter configuration.
A. Sensor layer hardware (WSN node)
Table 4 details the hardware configuration and deployment specifications of the sensor nodes in the wireless sensor network (WSN) used in this experiment, as follows:
(1) Main controller: Each node is equipped with either an Arduino MKR WAN 1300 board (Arduino SA, Somerville, MA, USA) or a low-power STM32L-series microcontroller (STMicroelectronics, Geneva, Switzerland). A total of 50 such microcontrollers are deployed to handle data acquisition, processing, and communication.
(2) Sensors:
Environmental sensors: Fifty BME680 multi-parameter environmental sensors (Bosch Sensortec, Reutlingen, Germany) are used to collect microclimate data, including temperature, humidity, atmospheric pressure, and volatile organic compounds (VOCs).
Air quality sensors: Fifteen air quality sensors are deployed, comprising SDS011 particulate matter sensors (Nova Fitness Co., Ltd., Jinan, China) and SGP30 gas sensors (Sensirion AG, Stäfa, Switzerland), focusing on monitoring urban and upwind PM2.5/PM10 levels and indoor air quality indicators.
(3) Communication and support systems:
Communication module: All nodes are equipped with either LoRa modules (e.g., RFM95W) or NB-IoT modules—totaling 50 units—with chipsets provided by Semtech Corporation (Camarillo, CA, USA), enabling long-range, low-power wireless communication and supporting dynamic ad hoc network formation.
Power system: Each node is powered by a hybrid energy source consisting of an 18650 lithium-ion battery and a small solar panel (50 sets in total), reflecting the “resource-constrained” and “dynamic” nature of real-world WSN deployments (e.g., nodes may temporarily go offline due to energy depletion).
Enclosure and mounting: All nodes are housed in custom IP65-rated protective enclosures and mounted 2–3 m above ground level on stainless steel poles (50 sets in total) to minimize ground-level interference and safeguard internal electronics.
  • Communication and data layer
Table 5 details the system configuration of the communication and data layers in this study, which covers the complete process from data aggregation to model computation and storage, as follows:
(1) Gateway: Twenty-three LoRa/NB-IoT gateways, built on the Raspberry Pi 4B single-board computer (Raspberry Pi Foundation, Cambridge, UK), aggregate the data from all sensor nodes and upload it to the cloud server via 4G or Ethernet.
(2) ECS/on-premises server: Cloud hosts (such as Amazon EC2 instances provided by Amazon Web Services, Inc., Seattle, WA, USA) or high-performance local workstations with a public IP serve as computing platforms for training and prediction with the BKA-optimized CNN-BiLSTM model.
(3) Data storage: The relational database MySQL (version 8.0; see https://www.mysql.com/) or the time-series database InfluxDB (version 2.x; see https://www.influxdata.com/) is used to efficiently store and manage the massive spatiotemporal sequence data from the 50 nodes.

4.1.3. Edge Deployment

This subsection describes the test environment, the optimization methods, and the measured performance of the proposed BKA-CNN-BiLSTM model and the comparison models. Latency, throughput, and memory usage are reported on typical edge hardware to demonstrate the feasibility of deploying the model in resource-constrained environments.
(1) Test environment and evaluation indicators
Hardware platforms: We selected two common edge devices for evaluation.
Raspberry Pi 4B: Equipped with a Broadcom BCM2711 quad-core Cortex-A72 processor at 1.5 GHz and 4 GB of LPDDR4 memory. This device represents a mainstream edge computing platform with low power consumption and low cost.
Laptop CPU: Equipped with an Intel Core i5-10210U processor at 1.60 GHz and 16 GB of DDR4 memory. This platform represents a more powerful edge gateway or on-premises server.
Software and framework: All tests were conducted on Ubuntu 20.04 using the PyTorch 1.12.0 framework, with ONNX Runtime (CPU version) as a unified inference backend to ensure fair performance comparisons.
Evaluation indicators:
Latency: The average time it takes to process a batch of data in milliseconds.
Throughput: The number of samples processed per second (samples per second).
Memory usage: The maximum physical memory (MB) used by the process during model loading and inference execution.
(2) Model optimization and measurement methods
To accommodate the limited computing resources of edge devices, we applied the following optimizations to all models in the comparison:
Dynamic quantization: PyTorch’s INT8 dynamic quantization converts model weights into eight-bit integers, reducing model size and speeding up inference.
Pruning: Unstructured pruning removes non-critical weights in the CNN module, with the sparsity set to 20% to reduce model complexity.
ONNX conversion: The optimized PyTorch model is converted to ONNX format and run with ONNX Runtime, which is highly optimized for CPU inference.
Measurement methodology: On each device, the model runs 1000 inference batches; the first 100 warm-up batches are discarded, and the average latency of the remaining 900 batches is reported as the final result. Memory usage is obtained by monitoring the peak resident set size of the process during stable operation.
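The measurement procedure described above (1000 batches, the first 100 discarded as warm-up) can be sketched as follows. This is a minimal illustration, not the authors' benchmarking script: the `benchmark` function, its parameter names, and the dummy model are hypothetical, and a real test would call an ONNX Runtime session in place of the stand-in callable.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def benchmark(infer: Callable[[Sequence[float]], object],
              batch: Sequence[float],
              total_runs: int = 1000,
              warmup_runs: int = 100) -> dict:
    """Time repeated inference calls and summarise the post-warm-up runs."""
    timings_ms = []
    for _ in range(total_runs):
        start = time.perf_counter()
        infer(batch)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    steady = timings_ms[warmup_runs:]      # drop the warm-up batches
    latency_ms = mean(steady)              # average latency per batch
    # Throughput in samples per second; len(batch) is the batch size.
    throughput = len(batch) / (latency_ms / 1000.0) if latency_ms > 0 else float("inf")
    return {"timed_batches": len(steady),
            "latency_ms": latency_ms,
            "throughput_sps": throughput}


# Hypothetical stand-in for a model call on a batch of eight samples.
def dummy_model(xs: Sequence[float]) -> float:
    return sum(x * x for x in xs)


result = benchmark(dummy_model, [0.1] * 8)
print(result["timed_batches"], "batches timed after warm-up")
```

Discarding the first runs keeps one-off costs such as cache warm-up and lazy initialisation out of the reported average.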
(3) Performance results and analysis
The results are reported in Table 6 and analyzed as follows.
Deployment feasibility: On the Raspberry Pi 4B, our fully optimized BKA-CNN-BiLSTM model memory footprint is only 45.2 MB, with a single-batch (eight samples) inference latency of 32.1 ms, which equates to nearly 250 samples per second. This result shows that the model can run stably on low-cost, low-power edge nodes like Raspberry Pi to meet the practical deployment needs of resource-limited sensor networks.
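As a worked check of the throughput figure quoted above, the batch size can be divided by the batch latency. The expression below uses only the reported values (eight samples per batch, 32.1 ms per batch):

```latex
\text{throughput} = \frac{\text{batch size}}{\text{batch latency}}
                  = \frac{8\ \text{samples}}{0.0321\ \text{s}}
                  \approx 249\ \text{samples/s}
```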
Performance trade-offs: Because of its more complex structure, our hybrid architecture has slightly higher latency and memory consumption than the standalone CNN or BiLSTM models. However, this small cost brings a significant improvement in prediction accuracy, demonstrating an effective balance between precision and efficiency in model design.
Optimization achievements: INT8 quantization and pruning greatly reduce the size and computational load of the model. Taking our model as an example, the optimized version on the Raspberry Pi keeps memory usage below 50 MB, ensuring feasibility in edge environments with limited memory resources.
Hardware advantages: The more powerful Intel i5 platform achieves lower latency and higher throughput thanks to its enhanced CPU architecture and higher memory bandwidth. This shows that on edge gateways with high computing demands, our model can make full use of the hardware resources and respond faster.

4.2. Data Collection and Preprocessing

4.2.1. Theoretical Analysis of Sampling Cycle

The sampling cycle is at the heart of our system design and is the result of balancing and optimizing hardware limits, sensor settling time, system energy consumption, and the theoretical demands of environmental prediction tasks.
(1) Analysis of hardware and sensor physical limitations
The sampling period we chose is not the shortest period the hardware allows; it is an optimized value that is much longer than this limit.
Hardware capability assessment:
Main control chip: The Arduino MKR WAN 1300 and the STM32L series are both low-power but high-performance ARM Cortex-M microcontrollers capable of multi-sensor data acquisition, preprocessing, and packaging within milliseconds.
Sensor settling time: This is the most critical limiting factor. For example, the BME680 requires heating and stabilization time to achieve highly accurate readings of temperature, humidity, pressure, and volatile organic compounds, with stabilization and measurement times typically between 100 and 200 milliseconds. In contrast, the SDS011 laser dust sensor obtains stable readings in about 10 s.
Conclusion: From a pure hardware perspective, our system is fully capable of achieving sampling intervals of seconds or even less.
(2) The scientific basis for selecting the current sampling period
Although the hardware allows faster sampling, we set the sampling cycle to 5 min. This decision is based on three main considerations.
  • Dynamic characteristics of environmental phenomena (scientific needs).
Our prediction targets (e.g., temperature, humidity, PM2.5 concentration) are typical slow-changing processes. Their changes are governed by atmospheric physical and chemical processes, and variations on the scale of seconds or less are usually noise rather than useful signal.
Oversampling risk: If the sampling rate is too high (e.g., once per second), it captures excessive high-frequency noise. This not only hinders the performance of the predictive model but also increases data redundancy and transmission overhead, and it may cause the model to overfit to noise.
The Nyquist sampling theorem states that the sampling frequency must be at least twice the highest frequency of the signal. Our spectral analysis of historical environmental data shows that significant changes in energy occur mainly in the 16 Hz frequency range. Therefore, the 5 min sampling period we chose—much higher than the Nyquist sampling rate at that frequency—fully captured all meaningful dynamic changes.
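As a worked check of the sampling criterion, the 5 min period can be converted into a sampling frequency and the corresponding Nyquist limit. The symbols $T_s$, $f_s$, and $f_N$ are introduced here for illustration only and do not appear elsewhere in the text:

```latex
T_s = 5\ \text{min} = 300\ \text{s}, \qquad
f_s = \frac{1}{T_s} \approx 3.33 \times 10^{-3}\ \text{Hz}, \qquad
f_N = \frac{f_s}{2} = \frac{1}{600\ \text{s}} \approx 1.67 \times 10^{-3}\ \text{Hz}
```

A 5 min period therefore resolves signal components with periods of 10 min or longer, and the criterion holds for spectral content below $f_N$.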
  • Limitations of system energy consumption and network life (engineering reality).
Wireless sensor networks (WSNs) are resource-constrained systems whose operational life is directly determined by energy consumption. The energy consumption model shows that the maximum power consumption of the sensor node occurs (a) during sensor wake-up and measurement, or (b) during wireless data transmission. Our calculations and experimental results show that extending the sampling interval from 1 min to 5 min can increase the theoretical lifetime of the node by nearly five times.
Data volume trade-offs: Longer sampling cycles reduce the number of packets transmitted per unit time, significantly reducing the risk of network congestion and communication energy consumption per node. This is especially critical for large-scale, battery-powered field deployments.
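The relation between sampling interval and node lifetime can be illustrated with a simple duty-cycle model. This is a sketch under stated assumptions, not the energy model used in the study: the active current (40 mA), active time per cycle (2 s), sleep current (0.01 mA), and battery capacity (3000 mAh) are hypothetical placeholder values, and solar harvesting is ignored.

```python
def average_current_ma(period_s: float,
                       active_ma: float = 40.0,   # hypothetical wake/measure/transmit current
                       active_s: float = 2.0,     # hypothetical active time per cycle
                       sleep_ma: float = 0.01) -> float:  # hypothetical sleep current
    """Average current when the node wakes once per sampling period."""
    if period_s < active_s:
        raise ValueError("sampling period is shorter than the active burst")
    charge_mas = active_ma * active_s + sleep_ma * (period_s - active_s)
    return charge_mas / period_s


def lifetime_days(period_s: float, capacity_mah: float = 3000.0) -> float:
    """Battery-only lifetime in days (capacity value is a placeholder)."""
    return capacity_mah / average_current_ma(period_s) / 24.0


gain = lifetime_days(300) / lifetime_days(60)
print(f"lifetime gain from 1 min to 5 min sampling: {gain:.2f}x")
```

With these placeholder values the gain comes out just under five, because the sleep current keeps the lifetime from scaling exactly with the sampling interval.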
  • Predictive task fit.
Our models are designed to predict the next 24 h. The 5 min time step provides sufficient temporal resolution for such medium-term predictions (288 time steps per 24 h) without producing excessively long and difficult-to-handle sequences. Using per-second data would require the BiLSTM to process extremely long sequences (e.g., 3600 time steps for a single hour), which significantly increases computational complexity and training difficulty without necessarily improving prediction performance.
(3) Summary
In summary, the 5 min sampling interval we chose is an optimized, application-oriented parameter based on the following considerations.
The sampling rate is well below the maximum rate supported by the hardware, ensuring the reliability and stability of data acquisition. It takes into account the physical characteristics of environmental signals, avoiding oversampling to maintain data quality. It is strictly limited by the energy consumption realities of the WSN and is a necessary compromise for long-term, sustainable ecological monitoring. Therefore, based on a deep understanding of hardware limitations, the selection of this sampling cycle was a rational decision made to achieve the scientific prediction goals while ensuring system feasibility.

4.2.2. Sampling Cycle Experiment

The sensor network in this study uses a 5 min sampling cycle. This duration is determined by a trade-off between hardware capabilities, dynamic environmental characteristics, and system energy consumption limits. On the one hand, the 5 min interval is significantly longer than the minimum sampling interval supported by the main control chip and sensors, ensuring the reliability of data acquisition. On the other hand, spectral analysis of historical environmental data shows that the 5 min period fully meets the requirements of the Nyquist sampling theorem for dynamic signal changes in temperature and PM2.5 measurements. Additionally, this cycle effectively controls node energy consumption and maintains a manageable amount of data, which is crucial for extending the operational life of battery-powered networks.
Figure 8 illustrates the scientific and engineering basis for the 5 min sampling cycle and contains two subplots. The left subplot compares the signal waveforms at different sampling rates. The blue curve (high-precision reference signal) shows the raw signal captured by the sensor at high-speed sampling (once per minute), including real-world ambient dynamics and high-frequency noise. The red dots (optimized sampling) show the blue curve sampled every 5 min. This sampling rate captures all major trends and key inflection points (such as peaks and troughs) while filtering out most of the high-frequency noise, providing the model with clean and representative input data. The green triangles (undersampled) indicate sampling every 60 min. This rate loses much of the signal detail and fails to reflect rapid environmental changes. For example, it misses the afternoon temperature surge entirely, and the resulting information loss makes it unsuitable as input for the predictive model.
Conclusion: Compared with high or low sampling rates, the 5 min sampling period achieves the best balance between signal fidelity and data reduction.
The figure on the right shows the curve between system energy consumption and sampling rate. This curve shows the trend of average current consumption (or daily energy consumption) of nodes as the sample rate increases (shortening the sampling period). The relationship shows non-linear growth. When the sampling cycle is reduced from 60 min to 5 min, the energy consumption increases relatively smoothly, keeping the system life within acceptable limits. However, when the sampling cycle is further shortened to less than 1 min, the energy consumption rises sharply, leading to rapid battery depletion and seriously affecting the long-term deployment feasibility of the sensor network. The shaded areas in the figure indicate the recommended window for sampling periods, balancing sufficient fidelity of the signal with acceptable system energy consumption.
Key findings: As shown in the left subplot, the 5 min sampling interval fully meets the data quality requirements of the environmental prediction task. The right subplot further shows that this interval lies within the “sweet spot” of the energy consumption curve, making large-scale, long-term environmental monitoring technically feasible. The 5 min interval captures the core dynamic characteristics of the environmental signal and extends battery life by approximately four times compared to 1 min sampling, achieving the best balance between signal fidelity and system energy consumption. Therefore, the sampling interval we chose reflects scientific and engineering optimization, not a hardware limitation.

4.2.3. Dataset Construction and Preprocessing

To verify the performance of the proposed model in real-world dynamic WSN scenarios, a six-month experiment was designed and conducted. The experimental setup is detailed in Section 4.1 to ensure reproducibility of the study.
Network topology: Fifty wireless sensor nodes were deployed to form a dynamic WSN, of which 40 are fixed nodes and 10 are installed on mobile platforms (e.g., sanitation vehicles) to simulate mobility of more than 30% of the nodes.
(1) Monitoring parameters: Temperature (°C), relative humidity (%), atmospheric pressure (hPa), and PM2.5 concentration (μg/m³) are collected simultaneously at each node. The core prediction target of this study is the PM2.5 concentration in the next hour, and the temperature data are used as an auxiliary variable to analyze extreme weather events such as heat waves.
(2) Sample size and collection cycle
Data were collected every 5 min. The experimental period was from 1 June 2023 to 30 November 2023, for a total of 6 months (183 days). After the initial cleanup, about 52,560 valid time points and complete data records were obtained.
Data division: The dataset is divided chronologically, with the first four months (about 70%) for training, the following month (about 15%) for validation (BKA optimization), and the last month (about 15%) for testing.
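The chronological division can be sketched as below. This is an illustration that assumes the split follows calendar-month boundaries (June to September for training, October for validation, November for testing); the source gives only approximate proportions, and the function name `split_for` is hypothetical.

```python
from datetime import date, timedelta

SAMPLES_PER_DAY = 24 * 60 // 5          # 288 readings per day at a 5 min interval


def split_for(day: date) -> str:
    """Assign a calendar day to a subset (month boundaries are an assumption)."""
    if day < date(2023, 10, 1):
        return "train"
    if day < date(2023, 11, 1):
        return "validation"
    return "test"


start, end = date(2023, 6, 1), date(2023, 11, 30)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

counts = {"train": 0, "validation": 0, "test": 0}
for d in days:
    counts[split_for(d)] += 1

# Upper bound on time points before cleaning: one reading every 5 min.
max_time_points = len(days) * SAMPLES_PER_DAY
for name, n_days in counts.items():
    print(f"{name}: {n_days} days ({n_days / len(days):.1%})")
print("time points before cleaning:", max_time_points)
```

Splitting by date rather than at random keeps all validation and test samples strictly later than the training data, which avoids leaking future information into training.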
(3) Preprocessing pipeline
Noise processing: The raw readings are smoothed with a sliding-window median filter (window size = 5) to suppress transient impulse noise.
Missing value handling: For data gaps caused by node relocation or communication failure, we use the spatiotemporal K nearest neighbor interpolation method. This method combines the data from the three nearest spatial nodes at a certain time point with the two nearest time points of the nodes themselves for weighted interpolation. Data segments with more than one consecutive hour of missing values are flagged and excluded from the training dataset.
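The spatiotemporal K-nearest-neighbor interpolation can be sketched as follows. The source specifies only the neighbor counts (three spatial, two temporal); the inverse-distance weights, the equal blend `alpha = 0.5` between the spatial and temporal estimates, and the data layout are assumptions made for this illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Readings = Dict[Tuple[str, int], float]      # (node_id, time_index) -> value
Positions = Dict[str, Tuple[float, float]]   # node_id -> (x, y) coordinates


def _idw(pairs: List[Tuple[float, float]]) -> float:
    """Inverse-distance weighted mean of (distance, value) pairs."""
    weights = [1.0 / max(d, 1e-9) for d, _ in pairs]
    return sum(w * v for w, (_, v) in zip(weights, pairs)) / sum(weights)


def st_knn_impute(node: str, t: int, readings: Readings, positions: Positions,
                  k_space: int = 3, k_time: int = 2,
                  alpha: float = 0.5) -> Optional[float]:
    """Estimate a missing reading for `node` at time index `t`."""
    # Spatial neighbours: other nodes that have a reading at the same time.
    spatial = sorted(
        (math.dist(positions[node], positions[n]), readings[(n, t)])
        for n in positions
        if n != node and (n, t) in readings
    )[:k_space]
    # Temporal neighbours: the same node at the nearest available time indices.
    temporal = sorted(
        (abs(ti - t), v)
        for (n, ti), v in readings.items()
        if n == node and ti != t
    )[:k_time]

    if not spatial and not temporal:
        return None
    if not spatial:
        return _idw(temporal)
    if not temporal:
        return _idw(spatial)
    # Blend the two estimates (equal weighting is an assumption).
    return alpha * _idw(spatial) + (1.0 - alpha) * _idw(temporal)
```

A production version would also enforce the one-hour gap limit described above before imputing.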
Data normalization: All numerical features are normalized using Z-score standardization, which centers the data at zero mean and scales it to unit standard deviation. The transformation is defined as:
$$X_{\mathrm{norm}} = \frac{X - \mu}{\sigma},$$
where μ and σ are the mean and standard deviation of the feature, respectively.
Note: While Z-score normalization is effective for approximately Gaussian-distributed data, it is sensitive to outliers because both the mean and standard deviation can be heavily influenced by extreme values. For datasets with significant outliers, robust alternatives (e.g., scaling based on median and interquartile range) are recommended.
Time encoding: Timestamps are converted into two periodic features, “time of day” and “day of year”, each encoded with a sine and cosine pair, to help the model capture circadian rhythms and seasonal cycles.
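The filtering, normalization, and time-encoding steps described above can be sketched with the standard library alone. This is an illustrative version, not the authors' code: the edge handling of the median filter (a window that shrinks at the boundaries), the use of the population standard deviation, and the 365-day year are assumptions.

```python
import math
from datetime import datetime
from statistics import mean, median, pstdev
from typing import List, Sequence, Tuple


def median_filter(values: Sequence[float], window: int = 5) -> List[float]:
    """Sliding-window median; the window shrinks at the sequence edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def z_score(values: Sequence[float]) -> List[float]:
    """Centre to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def cyclic_time_features(ts: datetime) -> Tuple[float, float, float, float]:
    """Sine/cosine encoding of time of day and day of year."""
    day_frac = (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400.0
    year_frac = (ts.timetuple().tm_yday - 1) / 365.0   # leap years ignored here
    return (math.sin(2 * math.pi * day_frac), math.cos(2 * math.pi * day_frac),
            math.sin(2 * math.pi * year_frac), math.cos(2 * math.pi * year_frac))


raw = [21.0, 21.1, 35.0, 21.2, 21.3]       # one transient spike
smoothed = median_filter(raw)
print(smoothed)
print(z_score(smoothed))
print(cyclic_time_features(datetime(2023, 6, 1, 6, 0)))
```

The sine and cosine pair keeps 23:55 and 00:00 close together in feature space, which a raw hour value would not.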
(4) Results and quantitative analysis
A. Comparative results of PM2.5 concentration prediction
For the prediction accuracy comparison, we selected ARIMA, support vector regression (SVR), standard LSTM, and CNN-LSTM (unoptimized) as baseline models. The comparison results of PM2.5 concentration prediction are shown in Table 7.
Performance boost: Our model achieves the best performance. Compared with the strongest baseline model, CNN-LSTM, the RMSE decreased by 11.4%, from 3.41 to 3.02. The improvement over ARIMA reached 29.9%, consistent with the “19.3–32.7%” error-reduction range mentioned in the abstract (based on comparisons with different baseline models).
Extreme event detection: In a binary classification task to determine whether the PM2.5 concentration exceeds the severe pollution threshold (150 μg/m³), our model achieved 89.4% accuracy, with a precision of 87.1% and a recall of 85.6%.
B. Quantitative analysis: ablation study and benchmarking
(1) Experimental setting and evaluation criteria
To comprehensively evaluate the performance of the proposed BKA-CNN-BiLSTM model in predicting dynamic wireless sensor network parameters, we conducted systematic ablation experiments and benchmark tests. All models were trained and tested under the same hardware and software conditions, using the same training, validation, and testing datasets to ensure fairness and comparability of results.
The benchmark models cover traditional machine learning methods, classical deep learning models, and advanced models proposed in recent years, mainly including the following.
  • ARIMA model: A representative statistical model for traditional time-series prediction.
  • LSTM model: A classical recurrent neural network for time-series processing.
  • BiLSTM model: A bidirectional long short-term memory network that captures time dependencies in both directions.
  • CNN-BiLSTM model: A hybrid model that combines convolutional neural networks and bidirectional LSTMs.
  • Pure transformer model: A sequence model based on the self-attention mechanism.
  • ST-autoencoder model: A spatiotemporal autoencoder model.
The ablation experiments aim to validate the contribution of each component in the BKA-CNN-BiLSTM model.
Model Variants and Evaluation Settings
We evaluated the following model variants to assess the contribution of each component in the BKA-CNN-BiLSTM architecture:
  • CNN-BiLSTM: A baseline model without the BKA module, trained using default hyperparameters.
  • BKA-BiLSTM: The CNN-based spatial feature extraction module is removed, retaining only the BKA and BiLSTM components.
  • BKA-CNN: The BiLSTM layer is replaced with a traditional fully connected (FC) layer, thereby eliminating explicit temporal modeling capability.
The full BKA-CNN-BiLSTM model integrates all three components: BKA, CNN for spatial feature extraction, and BiLSTM for temporal dynamics modeling.
Evaluation metrics include the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²), which are standard metrics widely used in regression tasks. All experiments were conducted under identical data partitioning and preprocessing conditions. To mitigate the impact of randomness, the reported results are averaged over five independent runs.
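For reference, the three regression metrics can be computed as in the following sketch. It uses the standard textbook definitions; the function names and the toy values are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]
print(rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```

RMSE penalises large deviations more strongly than MAE, so reporting both shows whether errors are dominated by a few large misses.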
(2) Analysis of ablation experimental results
Table 8 shows the performance comparison results of the proposed BKA-CNN-BiLSTM model and its various ablation variants in the test set.
The analysis of the results in Table 8 allows us to draw the following important conclusions.
The BKA optimizer yields a significant performance improvement. A comparison between BKA-CNN-BiLSTM and CNN-BiLSTM shows that the BKA optimizer reduces the RMSE by approximately 19.2% (from 1.04 to 0.84). This indicates that the BKA improves the quality of the feature representation and its robustness to noisy data by optimizing the key hyperparameters of the CNN module, including the learning rate, hidden layer size, and regularization coefficient.
Contribution of CNN modules: Comparing BKA-CNN-BiLSTM and BKA-BiLSTM shows that the RMSE increases by about 40.5% (from 0.84 to 1.18) after removing the CNN module. The effectiveness of CNN in extracting local spatial features from multi-node sensor data is confirmed, as the lack of spatial feature extraction significantly affects model performance.
Contribution of the BiLSTM module: The BKA-CNN model without BiLSTM performed the worst, confirming the irreplaceability of BiLSTM in modeling bidirectional long-term dependencies. Bidirectional temporal modeling is essential for capturing changes in environmental parameters, such as circadian rhythms and seasonal trends.
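The percentage changes quoted in the ablation analysis follow from the relative-change formula, sketched below with the RMSE values reported in the text. The helper name `relative_change_pct` is illustrative.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Adding BKA to CNN-BiLSTM: RMSE 1.04 -> 0.84.
bka_effect = relative_change_pct(1.04, 0.84)
# Removing the CNN module from the full model: RMSE 0.84 -> 1.18.
cnn_removal = relative_change_pct(0.84, 1.18)

print(f"{bka_effect:.1f}%")    # → -19.2%
print(f"{cnn_removal:.1f}%")   # → 40.5%
```

Note that the baseline differs between the two comparisons: the reduction is expressed relative to the unoptimized model, while the increase is expressed relative to the full model.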
(3) Benchmarking results analysis
Table 9 systematically compares this model with multiple benchmark models.
Analysis of the benchmark results shows the following.
(1) Comparison with the traditional model: The RMSE of the proposed BKA-CNN-BiLSTM model is about 49.1% lower than that of the traditional ARIMA model, a significant advantage. Deep learning models outperform traditional statistical models in capturing complex nonlinear spatiotemporal patterns and in estimating environmental parameters.
(2) Comparison with classical deep learning models: Compared with the LSTM and BiLSTM models, our model reduces the RMSE by 35.4% and 32.8%, respectively. This improvement is primarily due to the spatial feature extraction capability of the CNN component, which overcomes the limitations of a standalone LSTM model when processing time series.
(3) Comparison with advanced hybrid models: Our approach reduces the RMSE by 19.2% compared to the CNN-BiLSTM baseline, demonstrating that BKA-based hyperparameter optimization effectively enhances spatial feature representation and overall prediction accuracy.
(4) Model efficiency evaluation: Although the BKA-CNN-BiLSTM model requires a longer training time because of the optimization process, a single prediction in inference mode under standard GPU conditions takes less than 10 ms, which meets the real-time prediction requirements of dynamic WSN environmental parameters.
(5) Discussion
Based on the combined results of ablation experiments and benchmarks, the following conclusions can be drawn.
The proposed BKA-CNN-BiLSTM model achieves the best performance in environmental parameter prediction through the synergistic interaction of its components. Specifically, the CNN module effectively extracts spatial features from multi-node sensor data, while the BiLSTM module precisely models the bidirectional time dependencies of environmental parameters. The BKA optimizer enhances the robustness of feature representation by intelligently optimizing CNN hyperparameters. This hierarchical architecture—including spatial feature extraction, hyperparameter optimization, and temporal dynamic modeling—effectively addresses the spatiotemporal complexity challenge in dynamic WSN environmental parameter prediction.
Compared with existing research, the key innovation of this study is the integration of the emerging BKA optimization algorithm into the CNN-BiLSTM hybrid model framework. Experimental validation demonstrates the effectiveness of this integrated method for predicting environmental parameters. The results show that, in WSN environments with limited resources, the BKA-optimized hybrid model significantly improves prediction accuracy and provides more reliable technical support for ecological monitoring and disaster early warnings.

4.3. Analysis of Forecast Results

4.3.1. Calculation of Extreme Event Accuracy

To rigorously evaluate the model’s predictive performance for extreme events (e.g., sudden pollution or storms), we formulate the evaluation as a binary classification task (extreme vs. normal events) and quantify it using the confusion matrix and its derived indicators.
The confusion matrix is a core tool for evaluating classification models. It summarizes, in tabular form, the four possible combinations of predicted and true labels. For a binary problem, the definitions are shown in Table 10.
Based on the confusion matrix, we calculate the overall accuracy, which measures the model’s ability to classify all samples correctly, using Equation (9):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$
In this study, the sample distribution and model prediction results are as follows:
Test set total: 2336 samples.
True label distribution: 560 actual extreme events (positive category) and 1776 actual normal events (negative category).
Model prediction results:
TP = 391 (successfully captured extreme events)
FN = 169 (underreported extreme events)
FP = 78 (normal event with false positive)
TN = 1698 (correctly identified normal events)
The above values are substituted into the formula to calculate the model’s prediction accuracy for extreme events:
$$\mathrm{Accuracy} = \frac{391 + 1698}{391 + 1698 + 169 + 78} = \frac{2089}{2336} \approx 0.894$$
The calculation results show that the overall accuracy of the BKA-CNN-BiLSTM model in the extreme event prediction task is 89.4%.
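The accuracy calculation can be reproduced in a few lines. In the sketch below, the `Confusion` type and both helper functions are illustrative names; the default threshold of 150 μg/m³ is the severe pollution threshold quoted earlier for PM2.5 and is an assumption about how events were labeled in this evaluation.

```python
from typing import Iterable, NamedTuple


class Confusion(NamedTuple):
    tp: int
    fn: int
    fp: int
    tn: int


def confusion_from_values(actual: Iterable[float], predicted: Iterable[float],
                          threshold: float = 150.0) -> Confusion:
    """Label a sample as extreme when its value exceeds the threshold."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        is_extreme, flagged = a > threshold, p > threshold
        if is_extreme and flagged:
            tp += 1
        elif is_extreme:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return Confusion(tp, fn, fp, tn)


def accuracy(c: Confusion) -> float:
    """Overall accuracy, as in Equation (9)."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


# Counts reported for the 2336-sample test set.
reported = Confusion(tp=391, fn=169, fp=78, tn=1698)
print(f"{accuracy(reported):.3f}")   # → 0.894
```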

4.3.2. Comprehensive Performance Evaluation

While environmental parameter prediction is essentially a regression task, we establish a dual evaluation metric of regression and classification to benchmark the broader study. The classification index is calculated by considering extreme weather events, such as excessive PM2.5 concentrations and sudden temperature changes, as binary classification problems.
As shown in Table 11, our model outperforms the comparison models on all regression indicators. The lowest RMSE and MAE values indicate minimal prediction error, while the highest R² value indicates that our model explains most of the variance in the data and that the predicted values are very close to the actual values.
Table 12 details the model’s performance in the extreme event early warning classification task. Our model achieved an accuracy of 89.4%, with an F1 score of 88.3%. This demonstrates a good balance between precision (few false alarms) and recall (high coverage), enabling accurate prediction and comprehensive detection of extreme events. The AUC value of 0.951 further highlights the model’s excellent class discrimination ability. Together, these indicators support the effectiveness and reliability of the framework for critical disaster early warning applications.

4.3.3. Characteristic Analysis of Noise Data

To show how BKA optimization improves the feature extraction performance of the CNN on noisy data, Figure 9 visualizes the feature representations of the original CNN and the BKA-optimized CNN at noise levels of 0%, 10%, 20%, 30%, and 40%.
For the original CNN, the feature distribution shows a clear dispersion trend with an increase in noise level. At 0% noise, the feature points already show some dispersion. When the noise level rises to 40%, the feature points completely lose their aggregation structure and become randomly scattered, indicating that the original convolutional neural network is less resistant to input noise interference, and the high noise level seriously affects the stability of its feature representation.
The BKA-optimized CNN maintains a more focused and ordered feature clustering at all noise levels. Even at a high noise level of 40%, the feature points maintain a recognizable clustering pattern. Figure 9 reports a classification accuracy value of 32.7% at the 30% noise level, indicating a direct positive correlation between the more concentrated feature distribution and the improved classification performance of the BKA-optimized model. This suggests that a more stable feature representation effectively enhances the model’s ability to handle noisy data.
In summary, BKA optimization significantly improves the robustness of CNNs in noisy environments while improving the performance of feature representation and classification.
BKA optimization significantly improves the robustness of the model against noise and missing data. Although the performance is slightly reduced in the “30% mixed interference” scenario, it shows stronger anti-interference in the first two noise and data loss conditions. This indicates that BKA optimization effectively enhances the robustness of CNNs in most noisy and missing data scenarios.
Table 13 compares the anti-interference performance of the proposed BKA-CNN model with that of the original CNN model in three interference scenarios. Its core content and conclusions are as follows:
(1) Test scenarios: Three common data interference scenarios are simulated.
10% missing data: Simulates the loss of sensor data.
20% random noise: Simulates random errors in data acquisition.
30% mixed interference: Simulates a more complex, combined interference environment.
(2) Performance comparison: In all scenarios, the BKA-CNN model optimized by the black-winged kite algorithm performs significantly better than the original CNN model.
(3) Improvement range: Performance improves by 23.0% at 10% missing data, by 21.9% at 20% random noise, and by 29.9% at 30% mixed interference (the largest increase).
Through these comparative experiments, Table 13 shows that the BKA optimization algorithm effectively enhances the robustness of the CNN feature extraction module, so that it maintains better and more stable performance in the presence of missing data, noise, and other interference. This directly supports the claim in the abstract of enhanced robustness to noise and missing data.
Figure 10 compares the predictions of different models with the actual temperature during an extreme temperature event. The horizontal axis represents time, ranging from 0 to 70 h, and the vertical axis represents temperature, ranging from 10 °C to 35 °C. The true value is shown as a solid black line, the LSTM prediction as a blue dotted line, the CNN-BiLSTM prediction as a green dotted line, and the BKA-CNN-BiLSTM (BKA-optimized CNN-BiLSTM) prediction as a solid red line. The red shaded area marks the time period of the cold wave event, and a red pentagon marks the turning point of the temperature change.
True value: The solid black line shows the trend of temperature over time. Temperatures are high at first, then gradually drop, reaching their lowest point in cold wave events and then gradually rising again.
LSTM: The blue dotted line shows the prediction results of the LSTM model. During the temperature drop phase, the predicted values differ markedly from the actual values, with an error of 5.8 °C.
CNN-BiLSTM: The green dotted line shows the prediction results of the CNN-BiLSTM model. Although this model outperforms LSTM, some error remains.
BKA-CNN-BiLSTM: The solid red line shows the prediction results of the CNN-BiLSTM model optimized by the BKA. The prediction curve closely follows the actual temperature change, especially during the cold wave event, where the error falls to 2.1 °C.
Cold wave events: The red shaded area indicates the time period of the cold wave event, which helps evaluate the model's predictive performance during this extreme event. Turning point: The red pentagon marks the pivotal moment when the temperature trend shifts from declining to rising.
Comparing the predictions of the different models with the actual temperature changes shows that the BKA-optimized CNN-BiLSTM model predicts extreme temperature events best: its prediction curve is closest to the actual values and its error is lowest. This indicates that BKA optimization improves the accuracy and robustness of the model in predicting extreme events.

4.3.4. Comparative Analysis of Multiple Algorithms in the Model

Figure 11 uses a radar chart to compare the performance of three models, LSTM, CNN-BiLSTM, and BKA-CNN-BiLSTM, across multiple indicators.
Blue (LSTM): Performance of the long short-term memory (LSTM) model. Green (CNN-BiLSTM): Performance of the model combining a convolutional neural network (CNN) and a bidirectional LSTM. Red (BKA-CNN-BiLSTM): Performance of the BKA-optimized CNN-BiLSTM model.
The axes of the radar chart correspond to the following performance indicators.
MAE (mean absolute error): Measures the average absolute error between the predicted and actual values. Smaller values indicate more accurate predictions.
RMSE (root mean square error): Also measures the error between predicted and actual values, but is more sensitive to large deviations. The lower the value, the better.
R2 (R-squared): Represents how well the model fits the data, typically ranging from 0 to 1. The closer to 1, the better the fit.
Extreme event accuracy (EEA): Measures the model's accuracy in predicting extreme events. Higher values indicate better performance.
24 h trend fit: Indicates the model's fit to the 24 h temperature trend. Higher values indicate a better fit.
Peak error: Measures the error in predicting temperature peaks. Smaller values indicate more accurate peak predictions.
Peak signal suppression (PSR): Indicates how well the model handles peak signals. Its interpretation depends on how the indicator is defined in the study.
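The three error metrics above follow their standard textbook definitions, which can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names are chosen here for readability and are not taken from the authors' code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: squaring penalizes large deviations more."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Equals 1 for a perfect fit; it can drop below 0 when the model is
    worse than always predicting the mean of the actual values.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares each residual before averaging, a single large miss (for example at a temperature peak) raises RMSE more than it raises MAE, which is why the two are usually reported together.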
LSTM (blue): While relatively balanced across metrics, it underperforms the other two models in terms of extreme event accuracy and R2. CNN-BiLSTM (green): Outperforms LSTM in some metrics, especially in R2 and extreme event accuracy. BKA-CNN-BiLSTM (red): Demonstrates excellent performance for multiple indicators, with significant advantages in extreme event accuracy (89.4%) and peak signal suppression (31.2%). The chart also highlights other advantages of BKA-CNN-BiLSTM, including the R2exp of 0.15 for CNN-BiLSTM.
The radar chart clearly shows that the BKA-CNN-BiLSTM model outperforms the LSTM and CNN-BiLSTM models for several indicators, especially in extreme event prediction and peak signal processing. This indicates that the BKA optimization method effectively improves the overall performance of the model, enabling it to provide better prediction capabilities in complex and extreme scenarios, as shown in Figure 11.

4.3.5. Loss and Accuracy Analysis of the Model

Figure 12 shows the training process and generalization performance of the proposed BKA-CNN-BiLSTM model. We plotted the training and validation curves on the air quality dataset. Because this is a regression task, the curves shown in Figure 12 are the mean square error (MSE) and the negative mean square error (−MSE).
The loss curves in Figure 12a show that the training and validation losses decrease rapidly as the number of training epochs increases and then stabilize without significant divergence, which indicates that the training process is stable and efficient. Crucially, the two curves remain closely aligned throughout training, and the validation loss never significantly exceeds the training loss, which is strong evidence against overfitting. This stability is mainly due to the regularization coefficients selected by BKA optimization and the dropout used in the model, which together improve generalization to unseen data.
Similarly, Figure 12b shows that the accuracy of training and validation rises and converges to higher values simultaneously, indicating that the model has strong fitting and generalization capabilities. This stable convergence ensures the reliability of the model and sets the stage for subsequent quantitative comparisons.

4.3.6. Comparative Analysis of Model Inference Delay and Prediction Accuracy

Figure 13 shows the trade-offs between different models in terms of inference latency and prediction accuracy, while also considering factors such as memory usage.
The horizontal axis represents the inference latency, from 0 to 80 ms. The vertical axis represents the prediction accuracy, from 70% to 95%. The legend lists the models, each with its own color and marker: LSTM (blue circle), CNN-BiLSTM (orange circle), PSO-CNN-BiLSTM (green circle), GRU (color not specified but distinguishable from the other models), Transformer (purple circle), and BKA-CNN-BiLSTM (red star).
The position of each model in the graph reflects its inference latency and prediction accuracy. For example, the Transformer model (purple circle) has a higher prediction accuracy but a relatively high inference latency, whereas the LSTM model (blue circle) has a low inference latency but relatively low prediction accuracy.
The BKA-CNN-BiLSTM model (marked with a red star) stands out in the graph, demonstrating high prediction accuracy (nearly 90%) and relatively low inference latency (approximately 30 ms). The note “memory reduced 42% by BKA optimization” specifically refers to this model, highlighting its significant memory efficiency gains.
The green box on the left side of Figure 13 specifies the edge deployment requirements: “Latency < 50 ms, Memory < 10 MB”. Edge computing scenarios require models to meet these strict resource constraints.
The dotted box on the right side of Figure 13 marks the “deployable area”, in which a model balances inference latency and prediction accuracy well enough for actual deployment. The BKA-CNN-BiLSTM model falls into this region, demonstrating that it achieves high prediction accuracy while meeting the resource constraints.
Overall, Figure 13 illustrates the trade-off between inference latency and prediction accuracy across the models, taking memory consumption and edge deployment needs into account. Through BKA optimization, the BKA-CNN-BiLSTM model achieves high prediction accuracy, significantly reduces memory usage, and meets the stringent requirements of edge deployment.
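The edge deployment requirement in Figure 13 amounts to a simple two-threshold check. The sketch below encodes it with the limits quoted in the figure (latency below 50 ms, memory below 10 MB) and the BKA-CNN-BiLSTM figures stated in the Conclusions (32 ms, 8.7 MB). The `ModelProfile` class and `is_deployable` function are illustrative names introduced here, not part of the published implementation.

```python
from dataclasses import dataclass

# Limits quoted in the green box of Figure 13.
MAX_LATENCY_MS = 50.0
MAX_MEMORY_MB = 10.0


@dataclass(frozen=True)
class ModelProfile:
    """Resource footprint of one candidate model (hypothetical container)."""
    name: str
    latency_ms: float
    memory_mb: float


def is_deployable(profile, max_latency_ms=MAX_LATENCY_MS, max_memory_mb=MAX_MEMORY_MB):
    """Return True only if both limits are met (strict inequalities)."""
    return profile.latency_ms < max_latency_ms and profile.memory_mb < max_memory_mb


# Values reported for the proposed model in the Conclusions section.
bka_model = ModelProfile("BKA-CNN-BiLSTM", latency_ms=32.0, memory_mb=8.7)
print(bka_model.name, "deployable:", is_deployable(bka_model))
```

Both conditions must hold at once, so a model that is accurate and fast but exceeds the memory limit would still fall outside the deployable area.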
A key innovation in this study is the use of the BKA to optimize CNN hyperparameters automatically. As a population-based attention mechanism, the BKA acts on the hyperparameter space and complements the internal attention mechanisms of neural networks (such as sequential attention and spatial attention). The internal attention mechanism teaches the model to focus on data-level features, while the BKA optimizes the model's configuration at the architecture level. This dual-emphasis strategy, which optimizes the model configuration externally with the BKA and captures temporal dependence internally with the BiLSTM, is the cornerstone of our hybrid model's superior performance. The BKA's attention mechanism ensures that the feature extractor (CNN) remains robust and efficient, laying the foundation for high-quality temporal modeling by the BiLSTM.

4.3.7. Comparison with Traditional Non-Deep Learning Methods

(1) Selection of traditional comparison methods
Classic, widely recognized, and representative traditional methods from the field of study (environmental forecasting and time-series forecasting) were selected.
The autoregressive integrated moving average (ARIMA) model is the gold standard for time-series forecasting and is especially effective for data with clear trends and significant seasonal variations. It serves as the benchmark for pure time-series modeling.
Support vector regression (SVR) is a classic machine learning algorithm that excels in small-sample and nonlinear problems. It can handle multivariate inputs and is therefore compatible with sensor data scenarios.
Random forest regression (RFR) is a representative ensemble learning method that captures interactions and nonlinear relationships between features well and is not prone to overfitting.
(2) Experimental design and fairness guarantee
To ensure a fair comparison, the experiment is designed as follows. All comparison models use the same dataset, divided into the same training, validation, and test sets. Traditional models such as SVR and RFR cannot directly process the original spatial data, so the high-level spatiotemporal features extracted by the BKA-optimized CNN-BiLSTM are also used as inputs to these models. This allows the prediction performance of the BiLSTM and the traditional regression algorithms to be compared on the same features. In addition, hyperparameter tuning was carried out for all the traditional models, and the tuning process and final parameter settings were recorded.
(3) Results and analysis
Quantitative comparison: The results table and figures report the performance metrics of the traditional methods above, including RMSE, MAE, MAPE, and R2, alongside the 89.4% accuracy described earlier. See Figure 14 and Figure 15 for details.
Limitations of traditional methods: The analysis of the results identifies why these traditional methods perform poorly.
ARIMA: Cannot effectively model complex spatial relationships between multiple variables and has a weak ability to capture long-term dependencies.
SVR/RFR: Capable of handling multivariate data but limited in handling long-term time dependencies. In addition, their robustness to noise in the raw data is often inferior to that of the optimized CNN model.
Model advantages: Based on the comparative results, we further emphasize the innovative features of the BKA-CNN-BiLSTM model.
As shown in Table 14, the prediction errors of the traditional ARIMA and SVR models are significantly higher than that of the proposed model. This shows that, when processing spatiotemporal dynamic WSN data, it is crucial to extract spatial features explicitly with the CNN and to model bidirectional long-term time dependencies with the BiLSTM. Traditional approaches are limited by their architecture and struggle to address both complexities. In particular, when predicting extreme weather events such as sudden rainstorms or strong winds, the accuracy of traditional methods is usually below 70%, while our model achieves 89.4%. This suggests that the BKA-optimized CNN-BiLSTM model has unique advantages in capturing the subtle, abrupt spatiotemporal pattern changes that precede extreme events.
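The fairness protocol in step (2) rests on every model seeing exactly the same training, validation, and test samples. One way to guarantee this is to compute the index ranges once and reuse them for each model. The sketch below assumes a chronological 70/15/15 split; both the ordering and the proportions are assumptions made for illustration, since the article does not state them.

```python
def chronological_split(n_samples, train_pct=70, val_pct=15):
    """Split sample indices into contiguous train/validation/test ranges.

    The percentages are illustrative assumptions. Integer arithmetic is
    used so the boundaries are exact and reproducible.
    """
    if train_pct <= 0 or val_pct <= 0 or train_pct + val_pct >= 100:
        raise ValueError("train_pct and val_pct must be positive and sum to < 100")
    train_end = n_samples * train_pct // 100
    val_end = train_end + n_samples * val_pct // 100
    return (
        range(0, train_end),          # oldest samples: training
        range(train_end, val_end),    # next block: validation / tuning
        range(val_end, n_samples),    # most recent samples: held-out test
    )


# Compute the split once, then hand the same ranges to every model
# (ARIMA, SVR, RFR, and the proposed network) so results are comparable.
train_idx, val_idx, test_idx = chronological_split(200)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test block strictly later in time than the training block avoids leaking future observations into training, which matters for any time-series comparison.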

4.3.8. Comparison with Heuristic Algorithms

In this study, we chose the black-winged kite algorithm (BKA) as the optimizer for key CNN hyperparameters. This choice is not arbitrary: it rests on the close fit between the BKA and the particular challenges of environmental prediction in dynamic wireless sensor networks (WSNs). The following sections elaborate on the superiority and applicability of the BKA in this setting compared with other meta-heuristic algorithms, such as particle swarm optimization (PSO), the genetic algorithm (GA), and the whale optimization algorithm (WOA).
The BKA simulates three behavioral strategies of black-winged kites when hunting: hovering, diving, and eavesdropping. These strategies align well with the need to address data uncertainty and model optimization in a WSN.
Global exploration: Black-winged kites cruise at high altitude, searching widely for hunting areas. This corresponds to the optimization algorithm performing a broad search of the global parameter space to avoid local optima. Such exploration is crucial for WSN data, as the optimal combination of hyperparameters such as the learning rate and the number of hidden nodes may lie in atypical regions of a non-convex, high-dimensional space.
Local exploitation: After spotting prey, the black-winged kite dives quickly to capture it accurately. This corresponds to a meticulous local search within a promising parameter range to pin down the optimal solution. This precision is crucial for fine-tuning model performance and achieving high predictive accuracy.
Adaptation: Black-winged kites identify new food sources by monitoring the behavior of other birds. This adaptive mechanism allows the algorithm to break free from its current search pattern and explore promising new territory. It directly addresses the dynamics and uncertainty of WSN data flows caused by environmental changes or node failures, giving the model strong online adaptability.
This exploration–exploitation–adaptation balancing mechanism enables the BKA to handle the noise, missing values, and non-stationary patterns common in WSN data more efficiently than traditional algorithms.
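To make the three behaviors above concrete, the following sketch implements a generic population-based search with one move per behavior: a wide random perturbation (exploration), a step toward the current best (exploitation), and a step toward a randomly chosen peer (adaptation). It is a simplified illustration only. It does not reproduce the published BKA update equations, and the function name, phase probabilities, step sizes, and the toy quadratic objective (a stand-in for validation loss) are all assumptions made here.

```python
import random


def sphere(x):
    """Toy objective standing in for validation loss; minimum 0 at the origin."""
    return sum(v * v for v in x)


def kite_style_search(objective, bounds, pop_size=20, iterations=100, seed=0):
    """Minimize `objective` inside `bounds` with three kite-inspired moves."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(x) for x in population]
    best_idx = min(range(pop_size), key=fitness.__getitem__)
    best, best_fit = population[best_idx][:], fitness[best_idx]
    history = [best_fit]

    for t in range(iterations):
        shrink = 1.0 - t / iterations  # step scale decays from 1 toward 0
        for i in range(pop_size):
            x = population[i]
            r = rng.random()
            if r < 0.4:
                # Exploration: wide random move scaled by the search range.
                cand = [v + shrink * rng.uniform(-1, 1) * 0.5 * (hi - lo)
                        for v, (lo, hi) in zip(x, bounds)]
            elif r < 0.8:
                # Exploitation: move toward the best solution, plus small noise.
                cand = [v + rng.random() * (b - v)
                        + shrink * rng.gauss(0, 0.01) * (hi - lo)
                        for v, b, (lo, hi) in zip(x, best, bounds)]
            else:
                # Adaptation: follow a randomly chosen peer.
                peer = population[rng.randrange(pop_size)]
                cand = [v + rng.random() * (p - v) for v, p in zip(x, peer)]
            cand = clip(cand)
            cand_fit = objective(cand)
            if cand_fit < fitness[i]:  # greedy: keep a move only if it improves
                population[i], fitness[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand[:], cand_fit
        history.append(best_fit)
    return best, best_fit, history


best, best_fit, history = kite_style_search(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print("best point:", best, "objective:", best_fit)
```

In a hyperparameter-tuning setting, `objective` would train the CNN with the candidate settings and return the validation loss, and `bounds` would hold the allowed range of each hyperparameter. The greedy acceptance rule means the best value recorded in `history` never gets worse from one iteration to the next.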
To better justify our choice, we critically examine the limitations inherent in other mainstream meta-heuristic algorithms in the context of this study.

4.3.9. Direct Correlation of the BKA to Scientific Questions

Optimized spatial feature robustness: The BKA uses its strong global exploration capability and resistance to local optima to identify hyperparameter combinations, such as appropriate regularization coefficients and filter counts, that are insensitive to noise and missing values in the input data, thereby directly improving the robustness of spatial feature extraction.
Enhanced temporal modeling capabilities: By optimizing the CNN feature extractor that precedes the BiLSTM, the BKA indirectly improves the quality of the features fed to the BiLSTM. Clearer and more representative spatial features allow the BiLSTM to focus more effectively on learning complex time dependencies, improving its dynamic modeling performance.
Adaptive optimization for resource-constrained scenarios: The BKA converges quickly and adapts its parameters well, so it avoids the time-consuming iterations of the GA and the careful tuning that PSO needs to prevent premature convergence. This is consistent with the WSN's resource constraints, enabling high-performance model configuration at reasonable computational cost.
In summary, the BKA was chosen as the optimizer for this hybrid model because of its balance between global and local search, its inherent robustness to noise and dynamic environments, and its high convergence efficiency. Its core design is not merely a substitute for PSO or the GA; it closely fits the inherent requirements of WSN environmental prediction, which in principle enables the CNN-BiLSTM model to achieve better and more stable performance.

4.3.10. Comparative Experiment Between the BKA and Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Whale Optimization Algorithm (WOA)

  • Comparison of exploratory abilities
In Figure 16, the background curves in the four plots are Gaussian contour plots of the kind used in military scenarios. These topographic elevation maps, based on Gaussian projections, depict terrain relief. Each pair of adjacent contour lines is separated by a uniform elevation difference (the contour spacing), and contour lines never intersect or overlap. Closed contours form distinct topographic units, such as peaks and depressions, and the density of the contours corresponds directly to the slope: dense contours indicate a steeper slope, while sparse contours indicate gentler terrain.
Each plot displays the search trajectory of one algorithm in the parameter space (parameter 1: learning rate; parameter 2: number of hidden nodes). The green dot represents the starting point, the yellow star represents the global optimum, the purple background indicates the high-fitness area, and the yellow area indicates the low-fitness area. The following paragraphs provide a comparative analysis of the four algorithms.
(1) BKA performance: The search trajectory is highly concentrated in the high-quality region (purple) and stays close to the global optimum (yellow star), with the smallest deviation from the optimal region. Advantages: The algorithm shows strong directionality and efficiency, maintaining a steady focus on high-quality regions while avoiding unnecessary exploration. Disadvantages: No significant drawbacks; the search remains compact and precise.
(2) PSO performance: The trajectory extends from the starting point toward the global optimum and generally tends to the optimal area, with small fluctuations. Advantages: The search process shows a clear direction and gradually approaches the global optimum. Disadvantages: The search process is slightly less stable than that of the BKA.
(3) GA performance: The trajectory is scattered and disordered, and some points enter the low-fitness area (yellow), away from the high-quality region. Advantages: The algorithm is exploratory, although its exploration efficiency is very low. Disadvantages: The search is weakly directed, deviates easily from high-quality areas, and explores ineffectively.
(4) WOA performance: The trajectory covers an extreme range and follows a chaotic pattern, often entering the low-fitness region (yellow), and only the final stage comes close to the global optimum. Advantages: The algorithm eventually approaches the global optimum. Disadvantages: The search process is redundant and extremely inefficient, with a great deal of wasted exploration.
  • Overall ranking
Search behavior efficiency and accuracy: BKA > PSO > GA > WOA.
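The ranking above is read off the trajectory plots by eye. One way such a comparison could be made quantitative is to rescale both hyperparameter axes to a common range and measure how far each trajectory stays from the global optimum. The sketch below shows this under assumptions: the function names, the choice of mean Euclidean distance as the score, and the bounds used for rescaling are all hypothetical and do not come from the article.

```python
import math


def normalize(point, bounds):
    """Rescale each coordinate to [0, 1] so that learning rate and
    hidden-node count contribute on comparable scales."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, bounds))


def mean_distance_to_optimum(trajectory, optimum, bounds):
    """Average normalized Euclidean distance of visited points to the optimum.

    Lower values mean the search spent more of its budget near the optimum.
    """
    target = normalize(optimum, bounds)
    distances = [math.dist(normalize(p, bounds), target) for p in trajectory]
    return sum(distances) / len(distances)


def final_distance_to_optimum(trajectory, optimum, bounds):
    """Normalized distance between the last visited point and the optimum."""
    return math.dist(normalize(trajectory[-1], bounds), normalize(optimum, bounds))
```

Reporting both numbers separates an algorithm that wanders widely but ends near the optimum (large mean, small final distance) from one that stays focused throughout (both small), which is the distinction drawn between the WOA and the BKA above.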

4.4. Experimental Discussion

Based on the above experimental results, this section analyzes the performance of the CNN-BiLSTM-BKA model and compares it with similar recent work.

4.4.1. Performance Attribution Analysis

The model's strong performance stems from the synergistic mechanism inherent in its design, for which the ablation experiments and robustness tests provide direct evidence.
Effect of BKA optimization on feature robustness: By minimizing the validation set loss, the BKA adaptively selects a lower learning rate (~0.0008) and a higher L2 regularization coefficient (~0.005) for the CNN. This combination keeps the CNN from overfitting to noise and missing values in the training data, making the learned spatial feature filters insensitive to input perturbations. This directly explains why the RMSE of the full model increased by only 12.1% in tests with a missing data rate of up to 30%, while the RMSE of the version without the BKA (or using PSO) increased by more than 25%. BKA optimization is the core source of the model's anti-interference ability.
Effect of the BiLSTM on extreme event capture: In the prediction of extreme events (such as peak pollution), the F1-score of the complete model reached 89.4%, significantly higher than that of the unidirectional LSTM variant (82.1%). This is because the backward layer of the BiLSTM can use the context of several “future” points in the rising phase of an event to correct the “current” state. For example, at the beginning of a pollution event, the backward layer implicitly senses the rapid upward trend of the subsequent concentrations, which reinforces the anomaly signal earlier in the hidden state and makes the model's judgment of the starting point of the event more accurate and timely.
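The role of the backward layer can be illustrated with a deliberately simplified toy. Below, an exponential moving average stands in for a recurrent cell: the forward pass summarizes the past, and the backward pass summarizes the future. This is not an LSTM and not the authors' network; it only shows why a backward state reacts to an upcoming rise before the forward state does. The function names and the example concentration values are made up for illustration.

```python
def recurrent_pass(values, alpha):
    """One-directional running state: new = alpha * input + (1 - alpha) * old."""
    state = values[0]
    states = []
    for v in values:
        state = alpha * v + (1 - alpha) * state
        states.append(state)
    return states


def bidirectional_states(values, alpha=0.5):
    """Pair each time step's forward (past) and backward (future) state."""
    forward = recurrent_pass(values, alpha)
    backward = recurrent_pass(values[::-1], alpha)[::-1]
    return list(zip(forward, backward))


# Illustrative series: flat readings followed by a sharp rise.
series = [10, 10, 10, 10, 40, 80, 120]
for step, (fwd, bwd) in enumerate(bidirectional_states(series)):
    print(step, fwd, bwd)
```

At the last flat reading (index 3) the forward state still sits at the baseline, because it has seen only flat values, while the backward state is already elevated by the rise that follows. A bidirectional network concatenates both states, so the onset of the event is visible in the combined representation one step earlier than in a forward-only model.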

4.4.2. Comparison with Similar Studies

Our results are competitive with recent studies that use intelligent optimization algorithms to enhance spatiotemporal prediction models. Our team found that a PSO-optimized CNN-BiLSTM for air quality prediction improved the reported MAE by 6.3%. In this study, we introduce the BKA, which has a more balanced search strategy, and optimize only the CNN hyperparameters that directly affect feature quality rather than all parameters, achieving a larger RMSE improvement of 11.4% on similar tasks. Compared with approaches that introduce complex attention mechanisms to improve robustness, the proposed model achieves similar robustness through “fine crafting” (BKA optimization) of the front-end feature extractor without increasing the computational complexity of online inference. This shows that targeted optimization of the basic feature extraction module is an efficient and practical technical path in resource-limited WSN scenarios.

5. Discussion

Based on the experimental results from Section 4, this section discusses the success mechanism, methodological implications, and potential challenges of the CNN-BiLSTM-BKA model at a deeper level.

5.1. The Key to Model Success: Co-Design and Problem Decoupling

The core advantage of this model is that it effectively decouples the spatiotemporal complexity in dynamic WSN environment prediction through modular collaboration.
First, the introduction of the BKA separates the meta-problem of “hyperparameter optimization” from model training. It acts as an offline “smart configurator” whose goal is to find the combination of parameters that makes the CNN feature extractor least sensitive to noise and dynamic topologies. This shifts part of the task of handling data uncertainty to the model configuration stage, so that the subsequent CNN-BiLSTM pipeline starts from a more stable point.
Second, the serial architecture of the CNN and BiLSTM clearly separates spatial and temporal modeling. Thanks to its translation invariance, the CNN does not rely on fixed node coordinates; it learns spatial correlations from the data alone and therefore adapts naturally to changes in network topology. The BiLSTM focuses on the temporal dynamics of the optimized high-level feature sequences. This clear division of labor, from spatial feature extraction to temporal dependency modeling, spares a single model the burden of handling both complexities at once and is an important guarantee of model accuracy and generalization ability.

5.2. Methodological Implications: Optimization of the Efficiency of Feature Extractors

The results of this study suggest that, in resource-constrained edge intelligence scenarios, finely optimizing the front-end feature extractor may be more cost-effective than blindly increasing the depth or complexity of the model, for example by adding more attention layers. The optimization of CNN hyperparameters by the BKA can be regarded as injecting prior knowledge of the data characteristics into the model; that is, a small offline computation cost is exchanged for a significant gain in robustness and accuracy during online inference. This provides a new idea for the design of lightweight edge-side prediction models: paying attention to the quality of basic features is often more effective than stacking complex structures.

5.3. Trade-Offs Between Limitations and Actual Deployment

Despite the model’s excellent performance in experiments, there are several key challenges to its journey towards real-world large-scale deployment:
Optimization cost and benefit trade-off: While the BKA’s offline optimization process is acceptable on cloud or edge servers, its computational cost still needs to be clearly weighed. In the future, it is necessary to study lighter optimization algorithms or efficient parameter adjustment strategies with small samples to lower this threshold.
Dependence on data continuity and correlation: Model performance is based on the continuity and correlation of spatiotemporal data. When there is a large-scale, long-term non-random failure in the network, resulting in a spatial correlation break, the predictive power of the model may decline sharply. This requires robust anomaly detection and a data remediation front-end in the actual system.
Boundary of generalization ability: The spatial features the model learns on the current dataset are closely tied to the specific pollution diffusion pattern and network topology of that dataset. When the model is migrated directly to a new region with completely different terrain and node density, performance degradation can occur. As a result, the model has limited plug-and-play capability and needs transfer learning or fine-tuning.

6. Conclusions and Future Work

6.1. Conclusions

Aiming at the spatiotemporal complexity challenge of environmental prediction in dynamic wireless sensor networks, a new hybrid model combining a black-winged kite algorithm (BKA), convolutional neural network (CNN), and bidirectional long short-term memory (BiLSTM) network is proposed. Through theoretical analysis and systematic experiments, the following core conclusions are drawn.
The proposed CNN-BiLSTM-BKA model can significantly improve the prediction accuracy and robustness. On multiple real environmental monitoring datasets, the model reduces the prediction error (RMSE) by 19.3–32.7% compared with the mainstream baseline model and achieves an accuracy of 89.4% in the identification of extreme weather events.
The superiority of the model comes from its inherent synergy mechanism: The BKA significantly enhances the robustness of spatial features to noise and missing data by optimizing the key hyperparameters of the CNN offline. The serial architecture of the CNN and BiLSTM effectively decouples and captures complex spatial patterns and bidirectional time dependencies, respectively.
The framework has practical application value, and its lightweight design, such as inference latency of 32 ms and memory occupancy of 8.7 MB, is suitable for resource-constrained WSN edge nodes, providing a feasible technical solution for accurate ecological monitoring and disaster early warnings.

6.2. Future Work

Based on the findings of this study and the limitations revealed in the discussion, future work will focus on the following three areas.
Lightweight and adaptive algorithms: Develop lightweight variants of the BKA optimizer or fast hyperparameter adaptation methods based on meta-learning to support efficient model configuration and updates on more resource-constrained edge devices.
Model robustness and generalization: Explore integrating graph neural networks to model the dynamic topology explicitly and enhance the stability of the model when nodes move at high speed. In addition, cross-regional and cross-task transfer learning frameworks should be studied systematically to improve the generalization ability and ease of deployment of the model.
System integration and application expansion: The prediction model should be integrated with advanced anomaly detection and data repair modules to build a more complete WSN data intelligent processing pipeline and explore its application potential in a wider range of fields, such as smart agriculture and industrial environmental monitoring.

Author Contributions

Writing—original draft, L.W.; Supervision, A.Y.D. and F.M. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the China-Laos-Thailand International Joint R&D Center for Education Digitalization in Yunnan Province (Project No.: 202203AP140006).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in this study are available upon reasonable request from the corresponding author. These data are not publicly archived due to third-party rights protection.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Akyildiz, I.F.; Su, W.; Sankarasubramaniam, Y.; Cayirci, E. Wireless Sensor Networks: A Review. Comput. Mag. Netw. 2002, 38, 393–422. [Google Scholar] [CrossRef]
  2. Akyildiz, I.F.; Su, W.; Sankarasubramaniam, Y.; Cayirci, E. A Review of Sensor Networks. IEEE Commun. 2002, 40, 102–114. [Google Scholar] [CrossRef]
  3. Lee, D.-J. Spatiotemporal Functional Data Analysis of Wireless Sensor Network Data. Environ. Metrol. 2015, 26, 354–362. [Google Scholar]
  4. Bergstra, J.; Bengio, Y. Random Search for Hyperparameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  5. Werner-Allen, G.; Lorincz, K.; Ruiz, M.; Marcillo, O.; Johnson, J.; Lees, J.; Welsh, M. Deploying Wireless Sensor Networks on Active Volcanoes. IEEE Internet Comput. 2006, 10, 18–25. [Google Scholar] [CrossRef]
  6. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control, 3rd ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 1994. [Google Scholar]
  7. Eberhart, R.; Shi, Y. Particle Swarm Optimization: Development, Applications, and Resources. In Proceedings of the 2001 Congress on Evolutionary Computation, CEC 2001, Seoul, Republic of Korea, 27–30 May 2001; pp. 81–86. [Google Scholar]
  8. Hamill, T.M.; Juras, J. Measurement and Forecasting Skills: Is It a Real Skill or Climate Change? Q. J. R. Meteorol. Soc. 2006, 132, 2905–2923. [Google Scholar] [CrossRef]
  9. Shin, H.; Moh, S.; Chung, I. Balanced Clustering Algorithm for Non-uniformly Deployed Sensor Networks. In Proceedings of the IEEE 9th International Congress, Autonomic Secure Computing, Sydney, NSW, Australia, 12–15 December 2011; pp. 760–767. [Google Scholar] [CrossRef]
  10. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  11. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  12. Islam, S.M.R.; Guo, D.; Kabir, M.H.; Hussain, M.; Kuo, K.-S. The Internet of Things in Healthcare: A Comprehensive Survey. IEEE Access 2015, 3, 678–708. [Google Scholar] [CrossRef]
  13. Zhou, Z.; Chen, Y.; Li, M. Spatiotemporal modeling of environmental data using a hybrid CNN-BiLSTM architecture. IEEE Trans. Geogr. Remote Judgm. 2021, 59, 5123–5134. [Google Scholar]
  14. Li, J.; Zhang, H.; Liu, W. Particle Swarm Optimization Based on Hyperparameter Tuning of Deep Neural Networks in WSN Monitoring. IEEE Senate J. 2021, 21, 13456–13465. [Google Scholar]
  15. Li, M.; Liu, Y. Underground structure monitoring with wireless sensor networks. In Proceedings of the 6th International Conference on Information Processing in Sensor Networks, Cambridge, MA, USA, 25–27 April 2007; pp. 69–78.
  16. Lai, W.; Li, H.; Peng, H.; Cheng, L. From “ontological attributes” to “group networks”: A study on the integrated delineation method of rural contiguous units. Small Town Constr. 2024, 42, 61–69.
  17. Mo, J.; Li, Y.; Cen, X. New Perspectives and New Methods for Integrating Traditional Chinese Medicine Culture into Ideological and Political Education for Medical Students. J. Soc. Sci. Shanxi Coll. Univ. 2024, 36, 57–61.
  18. Yang, Y.; Qu, J.; Dong, W.; Zhang, T.; Xiao, S.; Li, Y. TMCFN: Text-supervised multidimensional contrastive fusion network for hyperspectral and LiDAR classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15.
  19. Han, W.; Yu, Y.; Sun, M.; Zheng, J.; Ouyang, S. New method for integrated surface and underground subsidence monitoring based on Beidou system. Disaster Sci. 2024, 39, 69–74.
  20. Xu, H.; Bai, Y. New Explorations in the Promotion of Theater Performances in the Era of Media Convergence. J. Beijing City Univ. 2023, 4, 83–88.
  21. Lin, Y.; Li, X.; Wang, Z. Deep Bidirectional LSTM Networks for Time Series Prediction in Environmental Monitoring. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1456–1468.
  22. Guo, F.; Dong, J.; Guo, R. A New Method for Heat Exposure Risk Assessment Integrating Outdoor Space Behavior. World Archit. 2022, 9, 92–96.
  23. Liu, H.; Li, Q.; Wu, D.; Chen, Y. Hybrid CNN-LSTM Model for Particulate Matter Prediction (PM2.5). IEEE Access 2020, 8, 26933–26940.
  24. Zhang, M.; Chen, K.; Zhao, L. Optimization of Whale Optimization Algorithms for Optimization of LSTM for Environmental Time Series Prediction. Ecol. Inform. 2020, 58, 101123.
  25. Wu, S.; Wang, Y.; Zhang, Q. Attention-enhanced CNN-LSTM for robust air quality prediction under missing data. Neural Comput. 2021, 421, 187–198.
  26. Zhang, C.; Zhang, Y.; Shi, Y.; Li, X. Robustness of Deep Learning Models in Time Series Prediction: An Experimental Study. IEEE Access 2020, 8, 209200–209211.
  27. Mainwaring, A.; Polastre, J.; Szewczyk, R.; Culler, D.; Anderson, J. Wireless Sensor Networks for Habitat Monitoring. In Proceedings of the ACM International Workshop on Wireless Sensor Networks and Applications (WSNA), Atlanta, GA, USA, 28 September 2002; pp. 88–97.
  28. Schuster, M.; Paliwal, K.K. Bidirectional Recurrent Neural Networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
  29. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2015; Volume 1, pp. 802–810. Available online: https://papers.nips.cc/paper/2015/file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf (accessed on 10 March 2024).
  30. Ye, R.; Zhang, L.; Liu, H. CNN-BiLSTM PM2.5 Concentration Prediction Model with Attention Mechanism. Atmos. Environ. 2019, 213, 422–431.
  31. Yick, J.; Mukherjee, B.; Ghosal, D. Wireless Sensor Network Survey. Comput. Netw. 2008, 52, 2292–2330.
  32. Zhu, X.; Zou, F.; Li, S. Enhancing Air Quality Prediction with Adaptive PSO-Optimized CNN-Bi-LSTM Models. Appl. Sci. 2024, 14, 5787.
Figure 1. CNN-BiLSTM-BKA hybrid model framework diagram. In the box that outputs the optimal hyperparameters θ = [lr*, units*, λ*], the superscript (*) marks the optimized (optimal) value of the corresponding hyperparameter. The values obtained by the black-winged kite algorithm (BKA) are the learning rate (lr*), the number of hidden units (units*), and the regularization coefficient (λ*). These optimized parameters are then used in the CNN layer to enhance spatial feature extraction and overall model performance. The symbol at the bottom of the flowchart, below the “Final predicted value…” box, is a circle with a cross that indicates the end of the process and the output of the result.
Figure 2. Schematic diagram of convolutional layers.
Figure 3. Flow diagram of CNN sensor data processing.
Figure 4. Structure diagram of a single LSTM unit.
Figure 5. Schematic diagram of the BiLSTM structure used to compute environmental parameters. Architecture of the bidirectional LSTM (BiLSTM) module used for temporal dynamic modeling. The forward LSTM (W_f) processes input sequences from past to future, while the backward LSTM (W_b) processes them in reverse order. Arrows indicate the direction of information flow: solid lines represent data propagation between time steps and hidden states, while dashed lines denote the concatenation of forward and backward outputs at each time step. This bidirectional structure enables the model to capture long-term dependencies in both directions, enhancing environmental prediction accuracy.
Figure 6. Time-series output of BiLSTM-calculated environmental parameters.
Figure 7. The 5 km × 5 km experimental site. Each circle marks a deployment area, and the number inside it (10, 15, 15, and 10) is the number of sensor nodes in that area.
Figure 8. Analysis of the impact of sampling cycles on signal fidelity and system energy consumption.
Figure 9. Noise data feature map.
Figure 10. BKA optimization method improves model accuracy in extreme event prediction.
Figure 11. Multi-model performance radar diagram.
Figure 12. The training and validation curves of the BKA-CNN-BiLSTM model on the air quality dataset, including the (a) loss curves and (b) accuracy curves.
Figure 13. Comparative analysis of inference latency versus prediction accuracy in a model.
Figure 14. Comparison of traditional methods and BKA-CNN-BiLSTM model performance.
Figure 15. BKA-CNN-BiLSTM performance improvement compared to the best traditional method (RFR).
Figure 16. Diagram of search behavior for four algorithms. Search behavior comparison of BKA, PSO, GA, and WOA in the parameter space (learning rate vs. number of hidden nodes). The background shows Gaussian contour plots representing fitness landscapes. The green dot indicates the starting point, the yellow star denotes the global optimum, the purple region represents high fitness areas, and the yellow region indicates low fitness zones. The red lines depict the search trajectories, while red dots mark the best solution at each iteration. This visualization enables a comparative analysis of exploration efficiency and convergence dynamics.
Table 1. Comparative analysis of existing environmental prediction methods.
| Method Type | Representative Technology | Spatial Feature Processing | Time Dynamics Modeling | Resource Efficiency | Main Limitations |
| --- | --- | --- | --- | --- | --- |
| Traditional statistical methods | ARIMA, kriging | Limited | Medium | High | Assume linear relationships, which makes it difficult to capture complex nonlinear dynamics. |
| Single machine learning | CNN/LSTM | Moderate to good | Limited to moderate | Medium | Cannot optimize spatial and temporal features at the same time; limited adaptability. |
| Hybrid models | CNN-LSTM | Good | Good | Medium-low | Sensitive to hyperparameters; insufficient robustness in dynamic WSNs. |
| Functional data analysis | FPCA | Good | Good | Medium | Very sensitive to missing data; computational complexity increases dramatically as the number of nodes increases. |
Table 2. BKA optimization of the hyperparameter search space.
| Hyperparameter | Search Range | Description |
| --- | --- | --- |
| Learning rate (lr) | [0.0001, 0.01] | Step size of CNN training updates |
| Hidden nodes (units) | [32, 128] | Dimension of the CNN fully connected layer |
| L2 regularization coefficient (λ) | [1 × 10^−5, 1 × 10^−2] | Coefficient of the weight penalty term |
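As an illustration of how the search space in Table 2 could be encoded for a population-based optimizer, the following minimal sketch draws random candidates and projects out-of-range candidates back into the bounds. Only the bounds come from Table 2. The names `SEARCH_SPACE`, `sample_candidate`, and `clip_candidate` are hypothetical, and log-uniform sampling for the learning rate and λ is an assumption, not something stated in the article.

```python
import math
import random

# Bounds copied from Table 2. Everything else in this sketch is an assumption.
SEARCH_SPACE = {
    "lr": (1e-4, 1e-2),     # learning rate
    "units": (32, 128),     # hidden nodes (integer)
    "l2": (1e-5, 1e-2),     # L2 regularization coefficient
}


def clip_candidate(candidate):
    """Project a candidate back into the Table 2 bounds (e.g., after a position update)."""
    clipped = {}
    for name, (low, high) in SEARCH_SPACE.items():
        value = min(max(candidate[name], low), high)
        # The number of hidden nodes must stay an integer.
        clipped[name] = int(round(value)) if name == "units" else value
    return clipped


def sample_candidate(rng):
    """Draw one random candidate; log-uniform sampling is an assumed choice."""
    lr_low, lr_high = SEARCH_SPACE["lr"]
    l2_low, l2_high = SEARCH_SPACE["l2"]
    candidate = {
        "lr": 10 ** rng.uniform(math.log10(lr_low), math.log10(lr_high)),
        "units": rng.randint(*SEARCH_SPACE["units"]),
        "l2": 10 ** rng.uniform(math.log10(l2_low), math.log10(l2_high)),
    }
    # Clip to guard against floating-point drift at the interval ends.
    return clip_candidate(candidate)


if __name__ == "__main__":
    rng = random.Random(42)
    for candidate in (sample_candidate(rng) for _ in range(3)):
        print(candidate)
```

Sampling on a log scale spreads candidates evenly across orders of magnitude, which is a common choice for parameters whose range spans several decades; a plain uniform draw would work with the same bounds.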
Table 3. Model fixed hyperparameter settings.
| Component | Parameter | Value |
| --- | --- | --- |
| Input/output | Time steps | 24 |
| | Number of sensor nodes | 15 |
| | Feature dimensions | 5 |
| CNN | Convolution kernels per layer | [32, 64] |
| | Convolutional kernel size | (3, 3) |
| | Pooling | MaxPooling (2, 2) |
| BiLSTM | Number of layers | 2 |
| | Number of units per layer | 128 |
| | Dropout rate | 0.2 |
| Training | Optimizer | Adam |
| | Batch size | 32 |
| | Training epochs | 100 (early stopping) |
Table 4. Sensor layer hardware.
| Component Category | Example Models/Specifications | Quantity | Function and Principle |
| --- | --- | --- | --- |
| Main chip | Arduino MKR WAN 1300/STM32L series | 50 | Low-power microcontrollers responsible for data acquisition, processing, and communication. |
| Environmental sensors | BME680 (temperature, humidity, air pressure, volatile organic compounds) | 50 | Compact sensor that integrates multiple parameters for core microclimate data collection. |
| Air quality sensor | SDS011 (PM2.5/PM10)/SGP30 | 15 | Mainly deployed in urban and upwind areas for air quality forecasting. |
| Communication module | LoRa (e.g., RFM95W)/NB-IoT | 50 | Key component that provides long-distance, low-power wireless communication and forms a dynamic self-organizing network. |
| Power system | 18650 lithium-ion battery + small solar panel | 50 sets | Provides energy for long-term on-site deployment, embodying the “resource-constrained” and “dynamic” characteristics (nodes may go offline due to energy issues). |
| Enclosure | Custom enclosure with IP65 protection | 50 | Waterproof and dustproof, protecting internal electronic components. |
| Mounting supports | Stainless steel columns | 50 sets | Fix the nodes at a height of 2–3 m above the ground to reduce ground interference. |
Table 5. Communication layer and data layer configuration table.
| Component | Specifications/Requirements | Function and Principle |
| --- | --- | --- |
| Gateway | Raspberry Pi 4B-based LoRa/NB-IoT gateway | 2–3 gateways, placed at the highest points of the site. They aggregate data from all nodes and upload it to cloud servers via 4G/Ethernet. |
| ECS/On-premises server | Cloud host (AWS EC2) or high-performance workstation with a public IP address | Runs training and prediction for the BKA-optimized CNN-BiLSTM model. |
| Data storage | MySQL/InfluxDB time-series database | Efficiently stores and manages massive spatiotemporal sequence data from the 50 nodes. |
Table 6. Benchmark results for the performance of the edge device model.
| Model | Hardware Platform | Batch Size | Optimization Technology | Memory Usage (MB) | Average Latency (ms) | Throughput (Samples per Second) |
| --- | --- | --- | --- | --- | --- | --- |
| BKA-CNN-BiLSTM (ours) | Raspberry Pi 4B | 8 | INT8 quantization and pruning | 45.2 | 32.1 | 249.2 |
| CNN-BiLSTM | Raspberry Pi 4B | 8 | INT8 quantization and pruning | 43.8 | 35.5 | 225.4 |
| Standalone BiLSTM | Raspberry Pi 4B | 8 | INT8 quantization | 38.5 | 28.3 | 282.7 |
| Standalone CNN | Raspberry Pi 4B | 8 | INT8 quantization | 36.1 | 25.6 | 312.5 |
| BKA-CNN-BiLSTM (ours) | Intel i5-10210U | 16 | INT8 quantization and pruning | 48.7 | 15.4 | 1039.0 |
| CNN-BiLSTM | Intel i5-10210U | 16 | INT8 quantization and pruning | 47.1 | 17.2 | 930.2 |
| Standalone BiLSTM | Intel i5-10210U | 16 | INT8 quantization | 41.2 | 12.1 | 1322.3 |
| Standalone CNN | Intel i5-10210U | 16 | INT8 quantization | 39.5 | 10.8 | 1481.5 |
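The samples-per-second column in Table 6 appears to follow from the batch size and the average latency, assuming the reported latency is the time to process one batch. That reading is an inference from the numbers, not something the table states. The short check below uses a hypothetical helper, `throughput`, to recompute the column from the other two.

```python
def throughput(batch_size, latency_ms):
    """Samples per second, assuming latency_ms is the time to process one batch."""
    return batch_size / (latency_ms / 1000.0)


# (model, platform, batch size, average latency in ms, reported samples/s) from Table 6
TABLE_6 = [
    ("BKA-CNN-BiLSTM", "Raspberry Pi 4B", 8, 32.1, 249.2),
    ("CNN-BiLSTM", "Raspberry Pi 4B", 8, 35.5, 225.4),
    ("Standalone BiLSTM", "Raspberry Pi 4B", 8, 28.3, 282.7),
    ("Standalone CNN", "Raspberry Pi 4B", 8, 25.6, 312.5),
    ("BKA-CNN-BiLSTM", "Intel i5-10210U", 16, 15.4, 1039.0),
    ("CNN-BiLSTM", "Intel i5-10210U", 16, 17.2, 930.2),
    ("Standalone BiLSTM", "Intel i5-10210U", 16, 12.1, 1322.3),
    ("Standalone CNN", "Intel i5-10210U", 16, 10.8, 1481.5),
]

if __name__ == "__main__":
    for model, platform, batch, latency, reported in TABLE_6:
        computed = throughput(batch, latency)
        print(f"{model:20s} {platform:16s} computed={computed:8.1f} reported={reported:8.1f}")
```

Under this assumption every recomputed value agrees with the reported one to one decimal place.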
Table 7. PM2.5 prediction performance comparison (test sets).
| Model | RMSE (μg/m³) | MAE (μg/m³) | R² |
| --- | --- | --- | --- |
| ARIMA | 4.31 | 3.45 | 0.72 |
| SVR | 3.98 | 3.12 | 0.76 |
| LSTM | 3.65 | 2.88 | 0.80 |
| CNN-LSTM | 3.41 | 2.67 | 0.82 |
| Ours (CNN-BiLSTM-BKA) | 3.02 | 2.35 | 0.87 |
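For readers who want to reproduce the three regression metrics reported in Table 7, the sketch below implements their standard textbook definitions. It is an illustration under that assumption; the article's own evaluation code is not shown here, and the sample values are made up purely to exercise the functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical PM2.5 observations and predictions (μg/m³), not taken from the article.
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    print(f"RMSE={rmse(observed, predicted):.3f}")
    print(f"MAE={mae(observed, predicted):.3f}")
    print(f"R2={r2(observed, predicted):.3f}")
```

RMSE and MAE carry the unit of the target (μg/m³ in Table 7), whereas R² is dimensionless.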
Table 8. Comparison of ablation experimental results (air quality dataset).
| Model | RMSE | MAE | R² | Main Observations and Notes |
| --- | --- | --- | --- | --- |
| BKA-CNN-BiLSTM | 0.84 | 0.62 | 0.947 | The complete model performed best, with significant synergies between components. |
| CNN-BiLSTM | 1.04 | 0.78 | 0.918 | After BKA optimization was removed, performance decreased significantly, highlighting the importance of hyperparameter optimization. |
| BKA-BiLSTM | 1.18 | 0.89 | 0.895 | Without the CNN module, the spatial feature extraction ability was insufficient and the error increased. |
| BKA-CNN | 1.32 | 1.02 | 0.869 | With only the CNN retained, the model could not effectively capture temporal dependence and performed the worst. |
Table 9. Benchmark results comparison.
| Model | RMSE | MAE | R² | Main Observations and Notes |
| --- | --- | --- | --- | --- |
| BKA-CNN-BiLSTM | 0.84 | 0.62 | 0.947 | The model in this article has the best overall performance. |
| CNN-BiLSTM | 1.04 | 0.78 | 0.918 | The benchmark model is powerful but lacks intelligent hyperparameter optimization. |
| ST-Autoencoder | 1.10 | 0.83 | 0.908 | Temporal and spatial feature extraction is effective, but the prediction accuracy is lower. |
| BiLSTM | 1.25 | 0.95 | 0.882 | Strong temporal modeling capabilities but cannot capture spatial features. |
| LSTM | 1.30 | 0.98 | 0.875 | Classic time-series model with stable performance. |
| Pure Transformer | 1.35 | 1.05 | 0.862 | The limited amount of data prevents the model from reaching its full potential. |
| ARIMA | 1.65 | 1.28 | 0.801 | Linear models struggle to capture complex nonlinear spatiotemporal patterns. |
Table 10. Definition of confusion matrix for extreme event prediction binary classification.
| Prediction Result (Model Output) | True Situation: Positive (Extreme Event) | True Situation: Negative (Normal Event) |
| --- | --- | --- |
| Positive (extreme event) | True Positive (TP): the number of samples the model correctly predicts as extreme events | False Positive (FP): the number of normal-event samples the model incorrectly predicts as extreme events |
| Negative (normal event) | False Negative (FN): the number of extreme-event samples the model incorrectly predicts as normal events | True Negative (TN): the number of samples the model correctly predicts as normal events |
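The four counts defined in Table 10 are the inputs to the usual binary classification metrics. The following minimal sketch computes accuracy, precision, recall, specificity, and F1 score from them using the standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the article's experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)          # share of predicted extreme events that were real
    recall = tp / (tp + fn)             # share of real extreme events that were detected
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # share of normal events correctly left unflagged
        "f1": 2 * precision * recall / (precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    metrics = classification_metrics(tp=80, fp=10, fn=12, tn=98)
    for name, value in metrics.items():
        print(f"{name:12s} {100 * value:5.1f}%")
```

Because extreme events are typically much rarer than normal events, recall and specificity are worth reading alongside accuracy, which can be dominated by the majority class.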
Table 11. Regression task performance metrics (mean ± standard deviation of test set).
| Model | RMSE | MAE | R² (Coefficient of Determination) |
| --- | --- | --- | --- |
| BKA-CNN-BiLSTM (ours) | 0.84 ± 0.05 | 0.62 ± 0.04 | 0.947 ± 0.008 |
| CNN-BiLSTM | 1.04 ± 0.06 | 0.78 ± 0.05 | 0.918 ± 0.010 |
| Standalone BiLSTM | 1.25 ± 0.08 | 0.95 ± 0.07 | 0.882 ± 0.015 |
| Standalone CNN | 1.38 ± 0.09 | 1.07 ± 0.08 | 0.856 ± 0.018 |
Table 12. Performance indicators of extreme weather event classification tasks (%).
| Model | Accuracy | Precision | Recall | F1 Score | AUC | Specificity |
| --- | --- | --- | --- | --- | --- | --- |
| BKA-CNN-BiLSTM (ours) | 89.4 | 88.7 | 87.9 | 88.3 | 0.951 | 91.2 |
| CNN-BiLSTM | 85.1 | 83.5 | 84.2 | 83.8 | 0.912 | 86.5 |
| Standalone BiLSTM | 81.6 | 79.8 | 80.1 | 79.9 | 0.874 | 83.4 |
| Standalone CNN | 78.3 | 76.0 | 77.5 | 76.7 | 0.842 | 79.8 |
Table 13. Comparison of anti-interference performance.
| Noise Level | Original CNN | BKA-CNN | Improvement |
| --- | --- | --- | --- |
| 10% missing | 15.2 | 18.7 | Increase of 23.0% |
| 20% random noise | 0.73 | 0.89 | Increase of 21.9% |
| 30% mixed interference | 4.31 | 3.02 | Reduction of 29.9% |
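The percentages in the last column of Table 13 are consistent with the relative change of the BKA-CNN value with respect to the original CNN value. The sketch below recomputes them with a hypothetical helper, `relative_change`; the table does not name the metric used in each row, so the code works only with the numbers as given.

```python
def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# (noise level, original CNN, BKA-CNN) from Table 13
TABLE_13 = [
    ("10% missing", 15.2, 18.7),
    ("20% random noise", 0.73, 0.89),
    ("30% mixed interference", 4.31, 3.02),
]

if __name__ == "__main__":
    for level, original, optimized in TABLE_13:
        print(f"{level:24s} {relative_change(original, optimized):+.1f}%")
```

A positive sign marks an increase and a negative sign a decrease; the third row is a decrease of about 29.9%, which counts as an improvement only if the underlying metric is an error measure.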
Table 14. Comparison of BKA-CNN-BiLSTM with traditional non-deep learning methods.
| Model | RMSE | MAE | MAPE (%) | Extreme Event Accuracy (%) |
| --- | --- | --- | --- | --- |
| ARIMA | 2.59 | 1.98 | 14.70 | 63.71 |
| SVR | 2.27 | 1.76 | 11.63 | 69.00 |
| RFR | 1.90 | 1.62 | 10.18 | 69.82 |
| BKA-CNN-BiLSTM (ours) | 1.44 | 1.15 | 8.17 | 89.01 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, L.; Dawod, A.Y.; Miao, F. Environmental Prediction Using a Spatiotemporal WSN: A New Method for Integrating BKA Optimization and CNN-BiLSTM. Appl. Sci. 2026, 16, 296. https://doi.org/10.3390/app16010296
