A Comparative and Regional Study of Atmospheric Temperature in the Near-Space Environment Using Intelligent Modeling

Li, Zhihui; Han, Zhiming; Zhang, Huanwei; Liao, Qixiang

doi:10.3390/forecast8010001

by Zhihui Li, Zhiming Han, Huanwei Zhang * and Qixiang Liao *

The College of Meteorology and Oceanology, National University of Defense Technology, Changsha 410073, China

* Authors to whom correspondence should be addressed.

Forecasting 2026, 8(1), 1; https://doi.org/10.3390/forecast8010001

Submission received: 6 August 2025 / Revised: 11 December 2025 / Accepted: 19 December 2025 / Published: 23 December 2025

(This article belongs to the Section Weather and Forecasting)

Highlights

What are the main findings?

Novel Hybrid ConvLSTM Model.
A ConvLSTM-based hybrid model integrating 3D convolution and a residual attention mechanism is proposed, effectively capturing spatiotemporal features of near-space atmospheric temperature with superior performance.
State-of-the-Art Prediction Accuracy.
The model achieves an RMSE of 2.187 K, a correlation coefficient (R) of 0.994, and a mean relative error (MRE) of 0.67% on the test set, outperforming traditional CNN and sequential models like LSTM and GRU.
Seasonal and Vertical Error Analysis.
Prediction stability is higher in winter (RMSE < 1.23 K) than in summer (peak RMSE = 3.33 K), with systematic overestimation observed in the mesosphere (50–70 km), attributed to complex atmospheric processes and data resolution limitations.

What are the implications of the main findings?

Comprehensive Model Comparison.
Extensive experiments compare multiple architectures (CNN, FCN, LSTM, BiLSTM, GRU, and hybrid variants), demonstrating the advantage of spatiotemporal modeling over spatial-only or temporal-only approaches.

Abstract

The high-precision prediction of near-space atmospheric temperature holds significant importance for aerospace, national defense security, and climate change research. To address the feature-extraction deficiencies of conventional convolutional neural networks, this paper designs a ConvLSTM hybrid model that combines the spatiotemporal feature extraction capability of 3D convolution with a residual attention mechanism, effectively capturing the dynamic evolution patterns of the near-space temperature field. A comparative analysis against various models, including GRU, shows that the proposed model demonstrates superior performance, achieving an RMSE of 2.433 K, a correlation coefficient R of 0.993, and an MRE of 0.76% on the test set. Seasonal error analysis reveals that prediction stability is better in winter than in summer, with errors in the mesosphere primarily stemming from the complexity of atmospheric processes and limitations in data resolution. Compared to traditional CNNs and single time-series models, the proposed method significantly enhances prediction accuracy, providing a new technical approach for near-space environmental modeling.

Keywords:

multimodal data; deep learning; intelligent modeling and prediction; multidimensional comparison

1. Introduction

Near-space refers to the atmospheric region between 20 km and 100 km above the Earth’s surface, encompassing the stratosphere, mesosphere, and lower thermosphere [1,2]. The temperature trends in this region serve as indicators for global atmospheric climate change. The accurate characterization and reliable prediction of the near-space atmospheric temperature are of significant strategic importance for aerospace, national defense security, and climate change research [3,4,5,6,7]. Current research on near-space atmospheric temperature faces limitations due to multiple factors, primarily manifesting as single data sources, low precision, and weak data adaptability [8,9,10]. Therefore, leveraging the structural advantages of multi-source data and integrating multidimensional information sources using deep learning techniques is essential to comprehensively describe the complex processes of near-space atmospheric variations and enhance the model’s ability to represent multidimensional data. Although traditional numerical models have strict physical consistency and good interpretability, their operation usually requires huge computational resources and depends on parameterization schemes for complex physical processes, which contain significant uncertainties. Machine learning models, especially deep learning models, have proven to have excellent abilities in learning complex nonlinear mapping relationships from data, and their prediction speed is extremely fast, which is suitable for scenarios requiring rapid or large-scale applications.

In recent years, alongside the increase in available observational data, advancements in artificial intelligence have enriched and promoted the application of data-driven machine learning methods in the atmospheric sciences [11,12,13,14]. Charles et al. [15] integrated and analyzed various artificial neural networks (ANNs), systematically demonstrating ANNs’ capability to discover nonlinear relationships between input and output parameters, thus establishing their applicability for temperature prediction. Common traditional machine learning models include Long Short-Term Memory networks (LSTM) [16,17] and convolutional neural networks (CNNs) [18], which have achieved good results in short-term temperature prediction. Chen et al. [19] processed single-point temperature values from the MERRA-2 reanalysis dataset at a 20–50 km altitude during 1991–2000 to construct LSTM and BiLSTM deep learning networks, enabling fine short-term (48 h) forecasting. Tian et al. [20] combined a 3D convolutional neural network (3DCNN) with an attention-based gated recurrent network to propose a novel temperature prediction model (D3AT-LSTM) for daily air temperature prediction in the Yellow River Basin. The model shows higher prediction accuracy than other methods. In studies on atmospheric temperature profile retrieval, Backpropagation Neural Networks (BPNN) exhibit good data adaptability [21]. Jiang et al. [3] validated this using datasets from 31 locations in China; the optimal model for atmospheric temperature profile retrieval achieved an MAE of less than 1.9 K, an RMSE of less than 2.5 K, and an R of approximately 0.99. Hybrid deep learning models are widely used for short-term temperature prediction [22]. Gao et al. [23] proposed a Multiscale Large Kernel Spatiotemporal Attention Neural Network (MSLKSTNet), which is capable of simultaneously learning large-scale global features and small-to-medium-scale local features. Compared to the widely used Convolutional Long Short-Term Memory (ConvLSTM) temperature prediction model, it improved the MSE metric by 42% in temperature forecasting tasks. Sun et al. [24] constructed ConvLSTM and Convolutional Gated Recurrent Unit (ConvGRU) neural networks, analyzing a 10-year MERRA-2 dataset to develop short-term temperature and wind speed forecasting models within near-space. He et al. [25] used a CNN-LSTM hybrid model to achieve a 12 h short-term forecast of atmospheric temperature in the near space (50–90 km). Their experiments show that the CNN-LSTM model has better predictive performance than a single CNN or LSTM model. Cheng et al. [26] adopted a bidirectional deep network structure to extract reverse features, proposing a Bidirectional Multiscale Skip LSTM (BMS-LSTM) short-term temperature prediction model with an average prediction error of approximately 3.890 K, outperforming comparison models. Kreuzer et al. [27] used a ConvLSTM network based on data from five German weather stations for longer time-span predictions (6–24 h), concluding that multivariate neural networks outperform single-variable neural networks as the time span increases. Wang et al. [28] developed a machine learning algorithm that combines historical observations and ground measurements to retrieve atmospheric temperature profiles. The algorithm demonstrates good retrieval accuracy, with small retrieval biases, root mean square errors, and mean absolute errors at all heights.

In summary, the application of machine learning and deep learning for temperature prediction and retrieval is feasible and effective. The systematic comparative analysis of different deep learning model performances is crucial for improving model accuracy. Ihsan et al. [29] proposed the innovative hybrid models GRU-CNN and LSTM-CNN, comparing them with various traditional statistical and machine learning models for daily average air temperature prediction using multiple statistical criteria (MAE, RMSE, MSE, and R²), validating the innovative models while providing references for model selection. However, current comparative studies on models still suffer from the limitation of single data sources and lack multidimensional comparative validation.

How to conduct multidimensional comparisons of different deep learning hybrid models based on multimodal datasets is the core of this study. Due to the diversity of selected data sources, using a Fully Convolutional Network (FCN) can overcome the shortcomings of traditional feature extraction in conventional CNNs. At the same time, pure convolutional structures offer high parallelism and fast training and inference, and can efficiently handle long-term data. Hybrid model algorithms can comprehensively consider the spatial characteristics and temporal dependencies of the data, thereby better fitting global characteristics, breaking through the limitations of single information dimensions, and accurately predicting near-space atmospheric motion trends. This study aims to evaluate and compare several advanced architectures, including an FCN for enhanced spatial feature extraction, hybrid models that integrate multiple neural network paradigms, and ConvLSTM models for spatiotemporal modeling, to determine the strengths and weaknesses of each model. The technical approach and advantages of this paper are shown in Figure 1.

2. Study Area and Data

2.1. Study Area Description

Changsha City, Hunan Province, was selected as the study area. Located in the lower reaches of the Xiangjiang River, it occupies a position on the western margin of the Liuyang-Changsha Basin [30]. The main stream of the Xiangjiang River divides it into two parts, forming an important hydrological framework. The city lies between 111°53′ E–114°15′ E and 27°51′ N–28°41′ N, exhibiting characteristics of mid-latitude complex underlying surfaces, uniform topography, and seasonal climate change. It features typical temperature variations: large fluctuations in spring, concentrated precipitation and sustained high temperatures in summer, and dry cold conditions in winter. This study focuses on the profile 27°30′ N–28°30′ N/111°30′ E–114°30′ E, which largely overlaps with the Changsha region. Although the model was developed and validated for a single city, its core architecture does not rely on parameters specific to Changsha, and the input data it requires is globally available. The method therefore has the potential to be applied to other regions. However, it still needs to be validated and calibrated in cities of different climate zones and scales in future research to comprehensively evaluate its universality. A schematic diagram of the Changsha area is shown in Figure 2.

2.2. Data Description

2.2.1. ERA5 Dataset Description

The ERA5 dataset is a fifth-generation global atmospheric reanalysis dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF). The data used here has a spatial resolution of 1° × 1° and is divided into 137 vertical levels, extending from the surface to an altitude of approximately 80 km. ERA5 assimilates multiple observational data sources, such as radiosondes, dropsondes, aircraft measurements, and wind, temperature, and humidity observations from numerous satellite platforms. This study selected all atmospheric temperature profile data and surface meteorological data averaged over the study area [31]. The models are trained and analyzed using this averaged data. The detailed description of the ERA5 data used in this study is provided in Table 1.

Although the horizontal resolution of the ERA5 dataset is 1° × 1°, which may be considered moderate for regional-scale analysis, it remains well-suited for near-space temperature modeling for several reasons:

(1): The near-space atmosphere exhibits relatively homogeneous horizontal structures at synoptic scales, diminishing the necessity for very high horizontal resolution;
(2): ERA5 incorporates assimilation of multiple observational sources—including satellite data, radiosondes, and aircraft measurements—ensuring high data quality and physical consistency;
(3): The high vertical resolution (137 levels) adequately captures key thermal variations throughout the stratosphere and mesosphere.

2.2.2. FY-4A Dataset Description

The FY-4A (Fengyun-4A Satellite) is a new-generation geostationary meteorological satellite independently developed by China, launched in December 2016. It primarily serves weather monitoring, climate research, and disaster early warning. This study uses the atmospheric temperature profile product (AVP) obtained by the Geostationary Interferometric Infrared Sounder (GIIRS). This dataset has high precision, with a root mean square error (RMSE) of 1.5–2.5 K. Furthermore, the dataset covers altitudes from near-surface to approximately 85 km, encompassing the study area [32]. Hence, it was selected for comparative analysis with the processed ERA5 data.

2.3. Data Preprocessing and Quality Control

The solar activity cycle (approximately 11 years) is a core indicator of solar activity intensity. Its variations directly impact the Earth’s climate system by influencing total solar irradiance, especially ultraviolet radiation. Considering the influence of solar activity on temperature, data spanning 22 years from 2002 to 2023 were selected for this study. This period covers two complete solar activity cycles, enabling subsequent research to explore in depth the modulating effects of solar-terrestrial energy coupling mechanisms on atmospheric thermal structure.

2.3.1. ERA5 Data Preprocessing

Using the 137-level ERA5 data, the relationship between the model level number (‘lev’) and the geopotential height was obtained from the ERA5 official website and used to convert level numbers to heights. This showed that 50 levels fall within the 20–80 km altitude range. The data for these 50 levels were therefore extracted, and the model level numbers were mapped to their corresponding geopotential heights for subsequent processing.

Horizontally, because the selected area is narrow and elongated (the latitude span is only 1°), a 0.5° buffer zone (27° N–29° N, 111° E–115° E) was retained during the initial cropping. To improve horizontal resolution, the data was interpolated to a 0.25° × 0.25° regular grid using cubic spline interpolation and then cropped to the study region (27.5° N to 28.5° N and 111.5° E to 114.5° E). This method was selected for its recognized effectiveness and applicability in meteorological data processing: cubic spline interpolation generates smooth surfaces with continuous second derivatives, which suits atmospheric variables with continuous and smooth physical characteristics such as temperature and pressure. Previous studies have confirmed that, in the quality control of high-altitude sounding data, an improved version of the algorithm can also achieve high accuracy (e.g., temperature RMSE < 0.08 K) [33]. In addition, analysis has shown that when using ERA5 data, the final results obtained by different interpolation schemes have comparable accuracy [34]. Therefore, the choice of cubic spline interpolation in this study is reliable and reasonable. The specific process is illustrated in Figure 3.
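The buffer-then-interpolate-then-crop sequence described above can be sketched as follows. This is a minimal illustration, assuming SciPy's `CubicSpline` applied separately along latitude and longitude; the function names `refine_grid` and `crop` and the synthetic temperature field are hypothetical and are not taken from the authors' code.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def refine_grid(field, lat, lon, step=0.25):
    """Interpolate a (lat, lon) field onto a finer regular grid.

    A cubic spline is fitted along latitude first, then along longitude.
    """
    n_lat = int(round((lat[-1] - lat[0]) / step)) + 1
    n_lon = int(round((lon[-1] - lon[0]) / step)) + 1
    new_lat = np.linspace(lat[0], lat[-1], n_lat)
    new_lon = np.linspace(lon[0], lon[-1], n_lon)
    along_lat = CubicSpline(lat, field, axis=0)(new_lat)
    fine = CubicSpline(lon, along_lat, axis=1)(new_lon)
    return new_lat, new_lon, fine


def crop(field, lat, lon, lat_bounds, lon_bounds, tol=1e-9):
    """Keep only the grid points inside the given inclusive bounds."""
    lat_mask = (lat >= lat_bounds[0] - tol) & (lat <= lat_bounds[1] + tol)
    lon_mask = (lon >= lon_bounds[0] - tol) & (lon <= lon_bounds[1] + tol)
    return lat[lat_mask], lon[lon_mask], field[np.ix_(lat_mask, lon_mask)]


# Synthetic 1-degree field over the buffered region (values are illustrative).
coarse_lat = np.array([27.0, 28.0, 29.0])
coarse_lon = np.array([111.0, 112.0, 113.0, 114.0, 115.0])
coarse = 250.0 + 0.5 * (coarse_lat[:, None] - 27.0) + 0.2 * (coarse_lon[None, :] - 111.0)

fine_lat, fine_lon, fine = refine_grid(coarse, coarse_lat, coarse_lon, step=0.25)
study_lat, study_lon, study = crop(fine, fine_lat, fine_lon, (27.5, 28.5), (111.5, 114.5))
```

Interpolating on the buffered region before cropping keeps the spline's boundary behaviour outside the study area, which is presumably why the buffer was retained.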

In this study, ERA5 reanalysis data were used as the “ground truth” for model training, validation, and testing. To assess the reliability of ERA5 data within the study region and height range, we later compared it with temperature profiles retrieved from the FY-4A satellite. The results indicate that the two exhibit good consistency in overall trends and structures. Nevertheless, any model evaluation based on ERA5 should be understood as reflecting performance within the internal consistency framework of this dataset, with error metrics implicitly incorporating the inherent uncertainties of ERA5 itself.

2.3.2. FY-4A Data Preprocessing

The Level-2 atmospheric temperature profile product (ATP) from FY-4A GIIRS was downloaded via the National Satellite Meteorological Center data platform. Based on the Quality Flag (QF), only data points with QF = 0 were retained, while data with QF ≥ 1 (indicating errors) were discarded. Data for the required location (28° N, 112° E) was extracted and temporally averaged for subsequent comparative analysis. The specific process is illustrated in Figure 4. In this study, the high-precision temperature profiles retrieved by the FY-4A satellite were mainly used to verify the feasibility and rationality of the preprocessed ERA5 data. The purpose of this comparison was to evaluate whether the two datasets were generally consistent in trend and magnitude of change, so as to ensure that the data used for subsequent model training was reliable and physically consistent. The FY-4A profile used for comparison was the monthly average profile calculated from all high-quality (QF = 0) instantaneous observations of the given month, allowing a meaningful comparison with the ERA5 averages on the same time scale.

2.4. Evaluation Metrics

To effectively evaluate the models, this study selected four commonly used metrics to assess the inversion performance [35,36,37]: the root mean square error (RMSE), the Pearson Correlation Coefficient (R), the mean absolute error (MAE), and the mean relative error (MRE). These quantitatively evaluate the inversion accuracy. The RMSE and MAE primarily reflect the absolute accuracy of predicted values. The R focuses on whether the model can reproduce the variation trends and patterns of the true values. The MRE provides a relative assessment of accuracy. The four metrics are defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - x_{i})}^{2}} \quad (1)

R = \frac{\sum_{i = 1}^{N} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}} \quad (2)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} \left| y_{i} - x_{i} \right| \quad (3)

\mathrm{MRE} = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{y_{i} - x_{i}}{x_{i}} \right| \quad (4)

where N represents the number of samples, x_{i} represents the true temperature value, y_{i} represents the temperature value inverted by the neural network, \bar{x} represents the average true temperature value, and \bar{y} represents the average temperature value inverted by the neural network.
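As a concrete reading of Equations (1)–(4), the four metrics can be computed with NumPy as in the sketch below. The function name `evaluation_metrics` and the sample values are illustrative assumptions; `x` holds the true values and `y` the predicted values, matching the symbols defined above.

```python
import numpy as np


def evaluation_metrics(x, y):
    """Return RMSE, R, MAE and MRE for true values x and predictions y."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    err = y - x
    dx, dy = x - x.mean(), y - y.mean()
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),                        # Eq. (1)
        "R": np.sum(dx * dy)
             / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))),  # Eq. (2)
        "MAE": np.mean(np.abs(err)),                               # Eq. (3)
        "MRE": np.mean(np.abs(err / x)),                           # Eq. (4)
    }


# Illustrative temperatures in kelvin (not data from the study).
metrics = evaluation_metrics([200.0, 250.0, 300.0], [202.0, 249.0, 303.0])
```

Because temperatures are expressed in kelvin, `x` stays far from zero and the division in the MRE is well behaved; the same code would need a guard if applied to values in degrees Celsius.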

2.5. Analysis of Processed Data

2.5.1. Monthly Average Atmospheric Temperature Profile Distribution

Based on the processed data, monthly average atmospheric temperature profile distribution maps for 2022 can be generated, as shown in Figure 5. The figure illustrates the temperature variation characteristics of the different height layers. Overall, the temperature initially decreases, then increases, and subsequently decreases significantly with increasing altitude, conforming to the typical temperature distribution pattern of the atmosphere (especially the stratosphere and mesosphere).

2.5.2. Spatial Autocorrelation Analysis

To verify whether the data meets model training requirements, Moran’s I index is used for quantitative analysis. The index measures the strength of spatial autocorrelation and thereby indicates data suitability [38]. The spatial autocorrelation analysis of the processed data across vertical levels is shown in Figure 6. Moran’s I indices for all height layers are greater than 0.8, indicating strong global spatial autocorrelation and a non-random spatial structure throughout the atmospheric column. This pervasive spatial organization is a fundamental prerequisite for employing convolutional operations. However, this global metric does not capture finer-scale local heterogeneities or anisotropic patterns, which are also present in near-space dynamics (e.g., gravity waves) and are learnable by the CNN architectures used in this study.
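For reference, global Moran's I on a regular grid can be computed as in the following sketch. It assumes binary rook-contiguity weights (each cell is a neighbour of the cells directly above, below, left and right of it); the weighting scheme actually used in this study is not stated, so this choice, the function name `morans_i`, and the synthetic fields are assumptions.

```python
import numpy as np


def morans_i(grid):
    """Global Moran's I of a 2D field with binary rook-contiguity weights."""
    grid = np.asarray(grid, dtype=float)
    rows, cols = grid.shape
    z = grid - grid.mean()
    # Each neighbouring pair appears twice in the weight matrix (i->j and j->i).
    cross = 2.0 * (np.sum(z[:-1, :] * z[1:, :]) + np.sum(z[:, :-1] * z[:, 1:]))
    total_weight = 2.0 * ((rows - 1) * cols + rows * (cols - 1))
    return (grid.size / total_weight) * cross / np.sum(z ** 2)


# A smooth gradient is strongly autocorrelated; a checkerboard is the opposite.
ii, jj = np.indices((7, 13))
smooth_field = (ii + jj).astype(float)
checkerboard = np.where((np.indices((4, 4)).sum(axis=0) % 2) == 0, 1.0, -1.0)

smooth_index = morans_i(smooth_field)
checker_index = morans_i(checkerboard)
```

Values near +1 indicate that neighbouring cells carry similar values, which is the property convolutional kernels exploit; values near −1 indicate alternating patterns, and values near 0 indicate spatial randomness.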

2.5.3. Comparison of Temperature Profiles with FY-4A

To validate the effectiveness of the processed ERA5 data, we compare its atmospheric temperature profiles with independent retrievals from the FY-4A satellite at the location (28° N, 112° E) for January and July 2022 (Figure 7). The blue curve represents the monthly average profile derived from FY-4A instantaneous observations (strictly quality-controlled, QF = 0), while the orange curve represents the monthly averaged and spatially interpolated ERA5 data from this study.

As shown in Figure 7, the preprocessed ERA5 temperature profile shows a high degree of consistency with the FY-4A data in terms of overall morphology and trend, particularly within the 20–50 km stratospheric region. Quantitative analysis confirms this: in this layer, the average deviation (bias) between the two datasets is less than 1.5 K, and the root mean square error (RMSE) is approximately 2.0 K. This indicates that the ERA5 data can reliably capture the main characteristics of the atmospheric vertical structure, providing a valid foundation for model training.

However, significant differences emerge above 50 km in the mesosphere. The deviation increases markedly, with local temperature differences exceeding 5 K in July 2022. This discrepancy is attributed to three main factors:

(1): Inherent spatiotemporal representativeness differences: FY-4A provides a point-scale monthly average of instantaneous soundings sensitive to local perturbations, whereas ERA5 produces a grid-scale monthly mean that smooths sub-grid variability through assimilation and interpolation;
(2): Complexity of mesospheric processes: physical and dynamical processes above 50 km (e.g., gravity wave breaking, non-LTE effects) are challenging to accurately represent in reanalysis systems, leading to greater inherent uncertainty;
(3): Weaker observational constraints: the mesosphere lacks the dense observational network available for the stratosphere, making reanalysis products more model-dependent and uncertain in this region.

Consequently, the ERA5 data used as the “ground truth” in this study contains non-negligible uncertainty, particularly in the mesosphere. The model prediction errors reported hereafter (e.g., RMSE) therefore represent performance within the ERA5 reanalysis framework, reflecting the model’s ability to learn the spatiotemporal patterns present in this dataset.

3. Models and Methods

3.1. Model Principles

3.1.1. MLP Neural Network

The Multilayer Perceptron (MLP) is the most fundamental feedforward neural network, consisting of an input layer, several hidden layers, and an output layer, each containing multiple neurons. Signals are transmitted between these neurons via fully connected layers, using nonlinear activation functions (such as the Rectified Linear Unit (ReLU) and Sigmoid) to achieve complex mappings. MLPs optimize their weights through the backpropagation algorithm to minimize prediction error.

3.1.2. CNN Neural Network Algorithm

The convolutional neural network (CNN) is a type of feedforward neural network, a variant of MLP. Its core structures for extracting and processing input data include convolutional layers, pooling layers, and fully connected layers. The convolutional layers extract local features of the data by sliding filters (kernels), significantly reducing parameters through weight sharing. Pooling layers reduce spatial dimensions and enhance translation invariance. The CNN models exhibit good adaptability when processing complex data and possess strong pattern recognition capabilities, enabling them to handle multi-layer meteorological systems.

3.1.3. LSTM and BiLSTM Neural Network Algorithm Principles

Long Short-Term Memory networks (LSTM) are a special type of Recurrent Neural Network (RNN). They overcome the bottleneck of learning long-term dependencies by introducing a cell state and a three-gate mechanism (input, forget, and output gates). Bidirectional Long Short-Term Memory networks (BiLSTM) structurally employ two independent LSTM modules, which process the input data in forward and reverse temporal directions, respectively. The output vectors from feature extraction are concatenated to form the final output feature vector.

3.1.4. GRU Neural Network Algorithm Principles

The Gated Recurrent Unit (GRU) is a streamlined design based on LSTM. It merges the input and forget gates into a single update gate, reducing the number of gates. This results in higher computational efficiency and reduced memory usage.
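The efficiency gain can be made concrete by counting trainable parameters per layer. The sketch below assumes the classic formulations with one weight matrix pair and one bias vector per gate or candidate block (four blocks for LSTM, three for GRU); some library implementations add a second bias vector to the GRU, which changes the count slightly. The function names and layer sizes are illustrative.

```python
def lstm_param_count(input_dim, units):
    """Parameters of one LSTM layer: 4 blocks (input, forget, output, candidate)."""
    return 4 * (units * (input_dim + units) + units)


def gru_param_count(input_dim, units):
    """Parameters of one classic GRU layer: 3 blocks (update, reset, candidate)."""
    return 3 * (units * (input_dim + units) + units)


# Illustrative sizes: 50 input features (one per vertical level), 128 units.
lstm_params = lstm_param_count(50, 128)
gru_params = gru_param_count(50, 128)
ratio = gru_params / lstm_params
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer of the same width, independent of the layer size.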

3.1.5. ConvLSTM Neural Network Algorithm Principles

ConvLSTM (Convolutional Long Short-Term Memory) is a deep learning model designed for spatiotemporal sequence data prediction. Its core innovation lies in replacing the fully connected operations in a traditional LSTM with convolutional layers. This enables the model to employ convolutional kernels for capturing spatial local correlations (e.g., spatial patterns in temperature fields) while simultaneously processing temporal dependencies through gating mechanisms (input gate, forget gate, and output gate). This synchronous, end-to-end learning of spatiotemporal features makes ConvLSTM highly suitable for predictive tasks involving data with strong spatiotemporal evolution patterns, such as atmospheric temperature fields. The structure of a ConvLSTM cell, which implements this convolutional-gating hybrid approach, is illustrated in Figure 8.
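For reference, the gating described above can be written out as follows. This is the commonly implemented form without peephole connections (the original ConvLSTM formulation additionally includes peephole terms from the cell state); whether the model in this study uses peepholes is not stated, so the form below is an assumption. Here * denotes convolution, \odot the element-wise product, X_t the input field, H_t the hidden state, C_t the cell state, and W and b are learnable kernels and biases.

```latex
i_t = \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right)
f_t = \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right)
o_t = \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right)
\tilde{C}_t = \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
H_t = o_t \odot \tanh\left(C_t\right)
```

Replacing every * with a matrix product recovers the standard LSTM, which makes explicit that the only structural change is the use of convolution on spatially arranged states.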

3.2. Parameter Training

Hyperparameter tuning is essential for neural network models performing temperature prediction due to the intrinsic chaotic nature of meteorological systems and the complex spatiotemporal dependencies in climate data. Suboptimal hyperparameter configurations can amplify biases in modeling nonlinear atmospheric processes, lead to underfitting or overfitting, and aggravate issues such as vanishing gradients in recurrent architectures.

This study adopted a systematic and multi-strategy optimization method: First, we used manual search and a coarse-to-fine strategy to preliminarily tune basic models (such as CNN and LSTM) to determine the reasonable range of key hyperparameters (such as the number of layers and units). Subsequently, for more complex models such as GRU and ConvLSTM, we introduced an automated hyperparameter tuning tool based on Bayesian optimization, namely Keras Tuner, to efficiently explore the high-dimensional parameter space for a near-optimal configuration. The Bayesian optimization was configured with a search budget of 50 trials per model, using the expected improvement acquisition function, and all searches were repeated with fixed random seeds to ensure stability of the results. In addition, learning rate scheduling and early stopping methods were integrated into the training process of all models as effective methods for dynamic regularization and automated determination of training epochs. The core objective of the entire hyperparameter tuning process was to achieve the lowest RMSE on the validation set.
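The early stopping rule mentioned above can be illustrated with a few lines of plain Python. This is a sketch of the general technique, not the training code used in this study; the function name `early_stopping_epoch`, the `min_delta` parameter, and the validation history are hypothetical.

```python
def early_stopping_epoch(val_rmse, patience=20, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation RMSE values.

    Training stops once `patience` consecutive epochs have passed without the
    validation RMSE improving on its best value by more than `min_delta`.
    """
    best_value = float("inf")
    best_epoch = -1
    for epoch, value in enumerate(val_rmse):
        if value < best_value - min_delta:
            best_value, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_rmse) - 1


# Hypothetical validation curve: improves for three epochs, then degrades.
history = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 3.8]
best_epoch, stop_epoch = early_stopping_epoch(history, patience=3)
```

In practice the weights from `best_epoch` are restored after stopping, so the reported validation RMSE corresponds to the best epoch rather than the last one.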

The following sections will elaborate on the optimization process and final parameters of each specific model in detail.

3.2.1. Training Samples

This study uses the data processed in Section 2: 22 years (2002 to 2023, covering two sunspot cycles) of monthly averaged near-space atmospheric temperature profile data for the selected region. Data from 2002 to 2020 were used as the training set, data from 2021 to 2022 as the validation set, and data from 2023 as a separate test set.
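The chronological split can be sketched as follows, assuming one sample per calendar month labelled by its year. The function name `split_by_year` is illustrative; the year ranges are those stated above.

```python
import numpy as np


def split_by_year(years, train=(2002, 2020), val=(2021, 2022), test=(2023, 2023)):
    """Return boolean masks selecting the training, validation and test samples."""
    years = np.asarray(years)

    def in_range(bounds):
        return (years >= bounds[0]) & (years <= bounds[1])

    return in_range(train), in_range(val), in_range(test)


# One label per monthly sample: 22 years x 12 months.
sample_years = np.repeat(np.arange(2002, 2024), 12)
train_mask, val_mask, test_mask = split_by_year(sample_years)
```

Splitting by year rather than at random keeps the validation and test periods strictly later than the training period, so no future information leaks into training.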

3.2.2. CNN Model Training

After optimizing hyperparameters for the classic CNN model, its prediction performance was found to be poor (RMSE > 4.8 K). Consequently, the CNN structure was modified. Due to the small horizontal latitude/longitude range of the data (7 × 13 points), overfitting was severe. To mitigate this, pooling layers were initially used, but the small input spatial dimensions risked excessive loss of spatial information. Compression in the longitude dimension yielded poor results, leading to the adoption of dilated convolutions. In addition, using only two convolutional layers resulted in insufficient feature extraction, so residual connections (Residual Blocks) were introduced to increase convolutional depth. The optimized model adopted an encoder–decoder structure, avoiding fully connected layers, making it a variant of a Fully Convolutional Network (FCN). The overall improvement strategy was as follows: Input (7, 13, 50) → Encoder (extracts multi-scale features) → Decoder (reconstructs target output, i.e., monthly average temperature profile) → Output (7, 13, 50). During the parameter search, each parameter was varied while the others remained fixed. For faster comparison, the model performance was assessed after 40 training epochs. Considering prediction RMSE (K) and training time, the final parameters were selected as follows: eight channels for the first convolutional layer, kernel size of 3 × 3, and 128 neurons in the fully connected layer. With this structure, the model was trained for 100 epochs with early stopping (a widely used regularization strategy in deep learning training, patience = 20 for all models); the prediction RMSE reached 3.3634 K, a significant improvement over the classic CNN.

After hyperparameter optimization of the classical CNN model, we observed suboptimal prediction performance. The root cause lies in the traditional CNN architecture, which typically incorporates one or more fully connected layers at the end. These layers “flatten” the two-dimensional spatial feature maps extracted by convolutional layers into one-dimensional vectors for processing. This operation leads to two critical issues:

(1): Loss of spatial information: the flattening operation destroys the spatial structure of feature maps, preventing the model from utilizing the relative positions and spatial relationships between pixels in subsequent processing. For tasks like temperature field prediction that require outputs with explicit spatial structure, this is a significant disadvantage;
(2): Parameter explosion and overfitting: the enormous number of parameters in fully connected layers (proportional to input size and neuron count) readily triggers severe overfitting on small datasets (e.g., the 7 × 13 spatial grid in this study), compromising the model’s generalization capability.

To address these limitations, this study adopts a fully convolutional encoder–decoder network architecture. Specifically, our implementation replaces all fully connected layers with convolutional operations [39]. The encoder path extracts multi-scale spatial features through convolutional and downsampling layers, while the decoder path restores the spatial resolution to the original input dimensions via transposed convolutional layers (i.e., learnable upsampling). Skip connections are incorporated to fuse high-level semantic features from the decoder with fine-grained spatial details from the encoder, facilitating precise reconstruction of the temperature field. This architecture offers the following core advantages for our task:

Spatial Preservation: The network maintains the spatial dimensions of data throughout, enabling pixel-to-pixel (or grid-to-grid) prediction of the entire spatial temperature field, which is essential for preserving the spatial structure and contextual information from input to output.

Enhanced Generalization and Parameter Efficiency: By eliminating parameter-intensive fully connected layers, the model significantly reduces the total number of trainable parameters. This reduction substantially lowers the risk of overfitting on our dataset with a limited spatial grid (7 × 13 points) and improves model robustness. The inherent weight-sharing property of convolutional layers further enhances parameter efficiency.

Flexible Input Size: As the architecture relies solely on convolutional operations, it imposes no fixed constraints on input dimensions and can theoretically accommodate inputs of arbitrary spatial size, offering greater flexibility.

Thus, while still built upon convolutional layers, this fully convolutional encoder–decoder structure is better suited, both theoretically and practically, to spatially structured output tasks than traditional CNNs with fully connected heads. The experimental results of this study corroborate this conclusion: the optimized FCN model reaches an RMSE of 3.3634 K, compared with 4.8023 K for the classical CNN.
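The parameter-efficiency argument above can be illustrated with a small count. The sketch below is not the authors' implementation: it takes the channel count (8), kernel size (3 × 3), and dense width (128) from Table 2, and assumes a padding-preserving convolution with no pooling, so that the flattened feature map has 7 × 13 × 8 entries.

```python
# Illustrative parameter count: dense head vs. convolutional layer on a 7 x 13 grid.
# Channel count (8), kernel size (3 x 3) and dense width (128) follow Table 2;
# the "same"-padded, pooling-free layout is an assumption made for this sketch.

GRID_H, GRID_W = 7, 13      # spatial grid used in the study
CHANNELS = 8                # feature maps produced by the convolutional layer
KERNEL = 3                  # 3 x 3 convolution kernel
DENSE_UNITS = 128           # neurons in the fully connected layer


def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a 2D convolution; independent of grid size."""
    return kernel * kernel * c_in * c_out + c_out


flattened = GRID_H * GRID_W * CHANNELS                   # features after flattening
fc_head = dense_params(flattened, DENSE_UNITS)           # flatten -> dense(128)
conv_layer = conv2d_params(KERNEL, CHANNELS, CHANNELS)   # 8 -> 8 channels, 3 x 3

print(f"dense head : {fc_head} parameters")
print(f"conv layer : {conv_layer} parameters")
print(f"ratio      : {fc_head / conv_layer:.1f}x")
```

Under these assumptions the dense head alone holds roughly two orders of magnitude more parameters than a convolutional layer of the same width, and only the convolutional count stays fixed if the grid grows.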

3.2.3. Time-Series Neural Network Model Training

The architectural hyperparameters for the time-series models were systematically determined prior to training. Given the structural similarities between the Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM) networks, a two-layer stacked architecture was adopted as a common framework. A sequential grid search was employed to optimize the number of units in each layer. Initially, the second layer was fixed at 64 units while the number of units in the first layer was evaluated; an optimal size of 128 units was identified. Subsequently, with the first layer fixed at 128 units, the second layer was optimized, resulting in a final configuration of (128, 96) units. This architecture was subsequently applied to both the standard LSTM and BiLSTM models.

To enhance the model’s ability to focus on relevant temporal information, an encoder–decoder architecture incorporating an attention mechanism was implemented. Following the paradigm introduced by Bahdanau et al. [40], the model utilizes LSTM layers for both the encoder and decoder. The additive attention mechanism computes a context vector through a learnable alignment model, enabling a probabilistic “soft alignment” between the input and output sequences. The architectural hyperparameters for this model were optimized using a similar sequential search, yielding a final encoder configuration of (128, 64) units and a decoder configuration of (64, 128) units.
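As a minimal sketch of the additive attention step, the code below computes the alignment scores e_j = vᵀ tanh(W_enc h_j + W_dec s), normalizes them with a softmax, and forms the context vector as the weighted sum of encoder states. All matrices and state values are arbitrary toy numbers, and the function names are illustrative rather than taken from the study's code.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(enc_states: List[Vector], dec_state: Vector,
                       w_enc: Matrix, w_dec: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoder step."""
    dec_proj = matvec(w_dec, dec_state)
    scores = []
    for h in enc_states:
        enc_proj = matvec(w_enc, h)
        hidden = [math.tanh(a + b) for a, b in zip(enc_proj, dec_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))   # e_j = v^T tanh(...)
    weights = softmax(scores)                   # the "soft alignment" over time steps
    dim = len(enc_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, enc_states)) for k in range(dim)]
    return weights, context


# Toy example: 3 encoder time steps, state size 2, alignment size 2 (values arbitrary).
enc = [[0.1, 0.2], [0.5, -0.3], [0.9, 0.4]]
dec = [0.3, 0.7]
W_enc = [[0.5, -0.2], [0.1, 0.8]]
W_dec = [[0.3, 0.4], [-0.6, 0.2]]
v_align = [1.0, -1.0]

attn_weights, context_vec = additive_attention(enc, dec, W_enc, W_dec, v_align)
print(attn_weights, context_vec)
```

The weights are positive and sum to one, so the context vector is a convex combination of the encoder states; this is what allows the decoder to emphasize some input months over others.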

For the Gated Recurrent Unit (GRU) model, a more extensive hyperparameter search was conducted. Building upon an initial (128, 96) configuration, a bidirectional GRU architecture was optimized using a Bayesian optimization strategy via the Keras Tuner library over 20 trials. The finalized encoder architecture comprised 192 and 128 units in its first and second layers, respectively, while the decoder contained 128 and 64 units. A dropout layer with a rate of 0.2 was applied to the encoder input for regularization. This core architecture was further refined through the integration of the aforementioned attention mechanism [40] and dynamic learning rate scheduling.

All time-series models were trained under a unified configuration. The Mean Squared Error (MSE) served as the loss function, optimized using the Adam algorithm. To mitigate overfitting, early stopping was employed with a patience of 20 epochs, monitoring the validation loss. The models without attention (LSTM, BiLSTM) were trained for a maximum of 100 epochs, while the attention-based LSTM and each major optimization stage of the GRU were trained for 50 epochs. The GRU model underwent an additional model compression phase involving distillation and quantization. The final prediction performance on the test set was as follows: the LSTM with attention achieved an RMSE of 2.7173 K, the BiLSTM model reached 2.6473 K, and the optimized GRU model attained the best performance with an RMSE of 2.4896 K, an improvement of over 0.3 K compared to its baseline configuration.
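The early-stopping rule used here can be sketched in a framework-independent way. The function below is an illustration, not the study's training code: it assumes that training stops once the validation loss has failed to improve for `patience` consecutive epochs, and the loss curve is synthetic.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(val_losses: Iterable[float], patience: int = 20) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs, or when the losses run out.
    """
    best_loss = float("inf")
    best_epoch = 0
    epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return epoch, best_epoch


# Synthetic curve: improves for 30 epochs, then plateaus at a slightly worse value.
losses = [1.0 / e for e in range(1, 31)] + [0.05] * 70
stop, best = early_stopping_epoch(losses, patience=20)
print(stop, best)
```

On this synthetic curve the best epoch is 30 and training halts at epoch 50, well before the 100-epoch ceiling; in practice the weights from the best epoch would typically be restored.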

3.2.4. Training Time-Series Neural Networks with CNN Feature Extraction

Additional inversion results were obtained for the CNN-LSTM, CNN-BiLSTM, and CNN-GRU models, which combine CNN models with time-series models. In these models, the CNN first extracts features from the input, which are then fed into the time-series model. The parameters are the optimal values obtained from tuning the respective models above, and training runs for up to 100 epochs with early stopping. The final prediction RMSE values were CNN-LSTM = 2.8639 K, CNN-BiLSTM = 2.8047 K, and CNN-GRU = 2.5987 K. The relatively poor performance (compared to pure time-series models) might arise because the explicit convolutional operations extract only local spatial patterns, leading to a loss of spatial structural information. The subsequent time-series networks (LSTM, etc.) then need to reconstruct high-dimensional features in a low-dimensional space, causing errors to be amplified layer by layer. Additionally, the absence of an attention mechanism meant that all timestep features were treated equally, which limited the models’ capacity to focus on key meteorological events such as cold wave outbreaks.

3.2.5. ConvLSTM Model Training

The ConvLSTM (Convolutional Long Short-Term Memory) model combines the strengths of convolutional neural networks (CNNs) and Long Short-Term Memory (LSTM) networks. It is designed for spatiotemporal data and is particularly suitable for tasks that must account for both spatial features and temporal dependencies, such as the temperature prediction problem explored in this paper. Unlike the traditional LSTM, which relies on fully connected transformations, the ConvLSTM applies convolutional operations in the computation of each gate, enabling it to extract spatial features from the input directly. Because the same kernel is used to compute features at every location, convolution also reduces the parameter count and computational complexity of the model. Convolutional networks additionally possess translation invariance, meaning that the same feature is detected regardless of its position, which is crucial for spatiotemporal sequence data.
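To make the parameter-count argument concrete, the sketch below compares a fully connected LSTM layer operating on the flattened grid with a ConvLSTM layer. The sizes are assumptions chosen for illustration only (50 input channels, one per vertical level; 64 hidden channels; a 2 × 2 kernel; no peephole weights) and do not reproduce the exact layer shapes of the study.

```python
# Parameter count of a fully connected LSTM layer versus a ConvLSTM layer.
# Assumed sizes for illustration only: 50 input channels (one per vertical level),
# 64 hidden channels, the 7 x 13 grid, and a 2 x 2 kernel; no peephole weights.

GRID_POINTS = 7 * 13
C_IN, C_HIDDEN, KERNEL = 50, 64, 2


def lstm_params(n_in: int, n_hidden: int) -> int:
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


def convlstm_params(kernel: int, c_in: int, c_hidden: int) -> int:
    """Four gates, each a convolution over the concatenated input and hidden state."""
    return 4 * (kernel * kernel * (c_in + c_hidden) * c_hidden + c_hidden)


# A fully connected LSTM must flatten every grid point of every channel.
fc_lstm = lstm_params(GRID_POINTS * C_IN, GRID_POINTS * C_HIDDEN)
conv_lstm = convlstm_params(KERNEL, C_IN, C_HIDDEN)

print(f"fully connected LSTM: {fc_lstm:,} parameters")
print(f"ConvLSTM            : {conv_lstm:,} parameters")
```

Because the convolutional gates share one kernel across all 91 grid points, their parameter count does not depend on the grid size, whereas the fully connected variant grows quadratically with it.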

Tuning followed the procedure described earlier, with training over 50 epochs. Specifically, stacked bidirectional ConvLSTM layers with a 2 × 2 kernel are used to capture spatiotemporal correlations in the temperature field. The first layer (128 ConvLSTM units) extracts local features, and the second layer (64 ConvLSTM units) extracts global features. A Convolutional Residual Attention (CRA-ConvLSTM) module is added to the encoder to dynamically assign importance weights to different spatiotemporal locations; its spatial attention focuses on key regions, especially boundary regions, which have a smaller spatial extent. The decoder’s node counts mirror the encoder’s. The hierarchical ConvLSTM is combined with skip connections that fuse features from different encoder layers, alleviating issues such as vanishing gradients. A 1 × 1 convolution at the end compresses the channels, and the predicted temperature field is output at the original spatial dimensions through a Sigmoid activation. The learning rate follows a cosine annealing schedule with warm restarts (initial LR = 1 × 10⁻⁴, restarted every 20 epochs); gradually decreasing the learning rate within each cycle helps the model fine-tune its parameters toward their optimal values. Early stopping was applied, monitoring the validation RMSE with a patience of 20 epochs to prevent overfitting. The final prediction RMSE is 2.4330 K, the best value among all models. The final structures of all models are summarized in Table 2.
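The learning-rate schedule can be sketched as below. The initial rate (1 × 10⁻⁴) and the 20-epoch restart period come from the text; the minimum rate of zero and the fixed cycle length (no period multiplier) are assumptions, since the text does not state them.

```python
import math


def cosine_warm_restart_lr(epoch: int, lr_max: float = 1e-4, lr_min: float = 0.0,
                           period: int = 20) -> float:
    """Learning rate at a 0-indexed epoch with restarts every `period` epochs."""
    position = epoch % period                       # epochs since the last restart
    cosine = 0.5 * (1.0 + math.cos(math.pi * position / period))
    return lr_min + (lr_max - lr_min) * cosine


schedule = [cosine_warm_restart_lr(e) for e in range(50)]
for e in (0, 10, 19, 20, 40):
    print(f"epoch {e:2d}: lr = {schedule[e]:.2e}")
```

Within each 20-epoch cycle the rate decays smoothly from its maximum toward the minimum, then jumps back to the maximum at the restart, which gives the optimizer repeated chances to leave poor local minima.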

4. Results and Analysis

This section presents the experimental results and comparative analysis of the intelligent models for monthly average atmospheric temperature profile prediction. The performance of various deep learning architectures is first evaluated using quantitative metrics such as root mean square error (RMSE) to identify the most effective model for this specific dataset and task. Subsequently, a comprehensive visualization and analysis of the best-performing model’s predictions are provided to offer an intuitive assessment of its capabilities and error characteristics.

The test set RMSE (K) values for the different time-series neural networks used in this study are shown in Table 3. The ConvLSTM achieves the best prediction result with an RMSE of 2.4330 K, indicating its effectiveness in fusing spatiotemporal features and its superior performance in this temperature prediction task. The ConvLSTM significantly improves prediction accuracy by extracting spatial features through convolutional layers and leveraging LSTM’s temporal modeling capabilities. The GRU (2.4896 K) performs the second best, while also having fewer parameters and higher computational efficiency. The pure CNN model yields the highest RMSE (4.8023 K), indicating that spatial-only modeling is inadequate for capturing the temporal evolution of temperature. In comparison, the FCN-based improved architecture achieves a significantly lower RMSE (3.3634 K), demonstrating the effectiveness of its structural enhancements. Nevertheless, the overall performance highlights that spatial modeling must be integrated with temporal networks to better represent spatiotemporal processes. Additionally, the narrow and elongated shape of the study region may have further limited the spatial representation capacity of the pure CNN.

As shown in Figure 9, the predicted monthly temperature profile over 12 months indicates the lowest errors in winter (January–February), particularly in February (RMSE = 0.98 K); this is likely due to stable winter atmospheric conditions (e.g., frequent temperature inversions). Larger errors in May (3.33 K), June (3.05 K), and August (3.05 K) were likely related to enhanced convective activity and cloud cover interference. The intermediate errors in spring (March–April) and autumn (September–November) were possibly due to variable meteorological conditions (e.g., frontal activity). To address high errors in May–August, introducing attention mechanisms is suggested. For seasons with stable temperature changes, the model structures could be simplified to reduce computational costs.

Figure 10 displays a scatter plot comparing the predicted and observed monthly average temperature profiles for the target year produced by the optimal ConvLSTM model (refer to Table 2). The test dataset consists of 54,600 independent samples, derived from 50 vertical layers, 12 months, and 91 horizontal grid points (arranged in a 7 × 13 grid across the study region). The reference values correspond to the monthly averaged ERA5 reanalysis data, as described in Section 2.3.1. The correlation coefficient (R) between the predicted and observed values is 0.993.

To evaluate potential systematic bias, the distribution of data points relative to the line of perfect agreement (y = x) is examined. The scatter plot exhibits a largely symmetrical distribution around the y = x line, suggesting minimal systematic overestimation or underestimation. The root mean square error (RMSE) is 2.433 K, the mean absolute error (MAE) is 1.733 K, and the mean relative error (MRE) is 0.76%. Visual analysis reveals that prediction errors exhibit slightly greater bias in the mid-temperature range (~220–260 K) compared to the temperature extremes. The apparent sparsity of points in certain regions is attributable to substantial overlap due to the high data density and model precision.
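For reference, the statistics quoted above can be computed as in the following sketch. The metric definitions are the conventional ones and are assumed rather than taken from the paper; in particular, MRE is taken here as the mean of |prediction − observation| / |observation|, expressed in percent. The temperature values are synthetic and serve only to show the call.

```python
import math
from typing import Dict, Sequence

N_LEVELS, N_MONTHS, N_GRID = 50, 12, 7 * 13
N_SAMPLES = N_LEVELS * N_MONTHS * N_GRID          # size of the test set described above


def evaluate(pred: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MRE (%) and Pearson R for paired temperatures in kelvin."""
    n = len(obs)
    errors = [p - o for p, o in zip(pred, obs)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mre = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, obs)) / n
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / math.sqrt(var_p * var_o)
    return {"rmse": rmse, "mae": mae, "mre": mre, "r": r}


# Synthetic values only, to show the call; these are not the study's data.
observed = [200.0, 220.0, 240.0, 260.0]
predicted = [202.0, 219.0, 243.0, 258.0]
metrics = evaluate(predicted, observed)
print(N_SAMPLES, metrics)
```

RMSE weights large errors more heavily than MAE, so RMSE is never smaller than MAE on the same sample, which matches the ordering of the reported values (2.433 K versus 1.733 K).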

The latitude-longitude distribution map of the mean error between the ConvLSTM’s predictions and the true temperature values is shown in Figure 11. The errors across the entire region range between 2.050 K and 2.225 K, with an extremely small range (difference of only 0.175 K). It indicates high prediction consistency of the model within this geographical scope, and is consistent with error fluctuation ranges reported in near-space temperature field studies (e.g., ±0.2 K). The absence of extreme outliers (e.g., >2 K or <1 K) validates the robustness of this model. The smooth color transition between 27.5° N and 29.0° N latitude suggests the model’s relatively balanced capability to capture the north–south temperature gradient.

Using the ConvLSTM model, the zonal and meridional average error distribution maps of the mean error between predictions and true values are generated. For the zonal average error map, the errors of all data points along the same longitude are averaged. Conversely, for the meridional average error map, the errors of all data points along the same latitude are averaged. Results are shown in Figure 12. The 50–70 km altitude interval in both zonal and meridional plots predominantly shows orange-red hues (error 2.0–3.0 K), indicating systematic overestimation of mesospheric temperature by the model. This is related to the frequent gravity wave activity and the complex thermodynamic processes in the mesosphere, compounded by the limited vertical resolution of ERA5 data (only 12 layers between the stratopause and mesosphere, ~50–80 km). The lowest errors occur at 20–33 km (error < 1.0 K), likely due to stable atmospheric stratification in the stratosphere and the good assimilation quality of ERA5 reanalysis data in this region. Near 28.2° N latitude, the errors are below 1.0 K in the 20–35 km height range. This benefits from the better assimilation constraints in the stratosphere (e.g., ozone sounding data fusion) and the ConvLSTM’s strong ability to capture stable stratification. The mechanistic analysis of error characteristics in this study—such as attributing lower errors in winter to stable stratification, higher errors in summer to enhanced convective activity, and the systematic overestimation in the mesosphere (50–70 km) to gravity wave complexity and coarse data vertical resolution—currently relies primarily on qualitative physical reasoning. A critical direction for future research involves employing quantitative diagnostic tools to rigorously validate these hypotheses.
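The two averaging operations described above can be sketched as follows. The naming follows the wording of this section (points sharing a longitude are averaged for the "zonal" map, points sharing a latitude for the "meridional" map); the `[level][lat][lon]` array layout and the tiny error field are assumptions for illustration, not the study's data.

```python
from typing import List

Field3D = List[List[List[float]]]   # indexed as [level][lat][lon]


def average_over_latitude(err: Field3D) -> List[List[float]]:
    """Average points sharing a longitude; result is indexed [level][lon]."""
    return [
        [sum(row[j] for row in level) / len(level) for j in range(len(level[0]))]
        for level in err
    ]


def average_over_longitude(err: Field3D) -> List[List[float]]:
    """Average points sharing a latitude; result is indexed [level][lat]."""
    return [[sum(row) / len(row) for row in level] for level in err]


# Tiny synthetic field: 2 levels, 2 latitudes, 3 longitudes (values are arbitrary).
error_field = [
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    [[0.5, 0.5, 0.5], [1.5, 1.5, 1.5]],
]
zonal_map = average_over_latitude(error_field)        # the text's "zonal" map
meridional_map = average_over_longitude(error_field)  # the text's "meridional" map
print(zonal_map)
print(meridional_map)
```

Each result is a two-dimensional height cross-section, which is the form in which altitude bands such as 50–70 km can be compared against the stratospheric levels.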

5. Summary

5.1. Main Work and Conclusions

Focusing on the problem of predicting monthly average atmospheric temperature profiles in near-space (20–80 km), this study systematically constructed a high-precision spatiotemporal prediction framework through multi-source data fusion and deep learning model innovation. The main work and conclusions are as follows:

(1): Deep neural network model optimization and validation are conducted. An improved Fully Convolutional Network (FCN) is proposed to address the spatial information loss in traditional CNNs. By introducing dilated convolutions and residual connections, the prediction RMSE is reduced from 4.8 K to 3.23 K, showing that the FCN modification greatly improves the accuracy of the CNN model. LSTM, BiLSTM, GRU, and their attention-based variants are also constructed. The LSTM encoder–decoder model with attention (RMSE = 2.71 K) outperforms the base LSTM (RMSE = 2.75 K), supporting the attention mechanism’s ability to focus on key meteorological events. The error of the hybrid model is also significantly lower than that of the standalone CNN model, which demonstrates the advantage of combining models for joint spatiotemporal problems. The ConvLSTM-based hybrid architecture emerged as the most effective framework, achieving the lowest RMSE relative to ERA5 reanalysis data (2.433 K), a near-perfect correlation (0.993), and a mean error approaching zero. This result confirms the high efficacy of synchronously capturing spatiotemporal features with 3D convolutions, augmented by advanced attention and decoding mechanisms, for achieving highly accurate and robust predictions in the complex near-space environment. It should be emphasized that these performance indicators describe the model’s behavior within the ERA5 data system: they measure the model’s ability to learn and reproduce the spatiotemporal evolution patterns contained in ERA5. The “absolute prediction accuracy” of the model is fundamentally constrained by the uncertainty of the ERA5 data itself, especially in the mesosphere above 50 km, where observations are sparse and physical processes are complex.
(2): Model performance analysis is performed. The differences in model performance across seasons and vertical layers are revealed: the lowest errors occur in winter (January–February, RMSE < 1.23 K), while errors surge in summer (May–August, peak RMSE = 3.33 K), which coincides with periods of enhanced convective activity and associated atmospheric instability. Systematic overestimation is identified in the mesosphere (50–70 km, error 2.0–3.0 K), which may be related to the complexity of gravity wave activity and insufficient data resolution. Errors are the lowest in the stratosphere (30–50 km, error < 1.0 K), a region characterized by stable atmospheric conditions and where high-quality assimilated data are generally available. It is also observed that pure CNN models easily overfit in small-grid (7 × 13) scenarios: even after architectural optimization, this model family performs the worst (RMSE = 3.23 K), highlighting the necessity of joint spatiotemporal modeling.

5.2. Future Work

While this study has achieved certain results in predicting monthly average temperature profiles, there remains room for improvement:

(1): Further innovation in model algorithms is needed. Seasonally adaptive attention mechanisms are essential to improve modeling of the high-error summer periods. Hybrid models combining Transformer architectures with ConvLSTM should be explored, using self-attention to capture long-range spatiotemporal dependencies. Introducing physics-constrained loss functions and embedding thermodynamic equations into network training could further enhance the physical consistency of predictions.
(2): The model evaluation in this study is based on ERA5 reanalysis data as the reference truth. Although ERA5 undergoes rigorous assimilation processing and shows good consistency with other observational data (such as FY-4A), it still carries some degree of uncertainty. Therefore, the forecast errors reported in this paper include error components inherent to the ERA5 data itself. Future research will attempt to incorporate more independent observational datasets to further disentangle model errors from the uncertainties of the data itself.
(3): Building on these results, the next step will be to quantitatively analyze the physical processes driving the errors. This will be achieved by calculating objective meteorological metrics, such as Convective Available Potential Energy (CAPE) and Convective Inhibition (CIN), to statistically correlate atmospheric instability with the elevated prediction errors observed during summer months. Furthermore, to elucidate the systematic mesospheric bias, we plan to compute the vertical distribution of Gravity Wave Potential Energy (GWPE) by analyzing perturbations in high-resolution wind and temperature data. This will allow for a direct quantitative assessment of gravity wave activity’s specific contribution to the model’s overestimation. Finally, sensitivity experiments will be designed to systematically isolate and quantify the impact of ERA5’s vertical resolution on temperature prediction uncertainties in the mesosphere, thereby clarifying the relative role of data limitations versus model physics in shaping the observed biases.
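For orientation, one widely used temperature-based definition of gravity wave potential energy is given below. It is shown as an assumption about how GWPE might be computed, not as the method the authors will adopt. Here T′ = T − T̄ is the temperature perturbation about a background profile T̄, g is the gravitational acceleration, c_p is the specific heat of air at constant pressure, z is height, and N is the Brunt–Väisälä frequency; the outer overbar denotes an average over the analysis window.

```latex
E_p = \frac{1}{2}\left(\frac{g}{N}\right)^{2}\,\overline{\left(\frac{T'}{\overline{T}}\right)^{2}},
\qquad
N^{2} = \frac{g}{\overline{T}}\left(\frac{\partial \overline{T}}{\partial z} + \frac{g}{c_p}\right)
```

Under this definition, E_p scales with the squared relative temperature perturbation, so a vertical profile of E_p could be compared level by level with the 50–70 km error band.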

Author Contributions

Conceptualization, Q.L. and H.Z.; methodology, Z.L. and Z.H.; validation, Q.L.; writing—original draft preparation, Z.L.; writing—review and editing, Z.H.; code implementation, Z.L. and Z.H.; Z.L. and Z.H. contributed equally to this work and are listed in alphabetical order. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 42275060, 42405065, 42474225, and 42305048, and by the Independent Innovation Science Fund of the National University of Defense Technology, grant numbers 24-ZZCX-JDZ-45 and 25-ZZCX-BC-10.

Data Availability Statement

The data used in this paper are not publicly available.

Acknowledgments

We are grateful for the strong support from the Space Environment Intelligent Perception Team from the National University of Defense Technology. We also thank the anonymous reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhao, X.R.; Sheng, Z.; Shi, H.Q.; Weng, L.B.; He, Y. Middle atmosphere temperature changes derived from SABER observations during 2002–20. J. Clim. 2021, 34, 7995–8012.
2. Sheng, Z.; He, Y.; Wang, S.C.; Chang, S.; Leng, H.; Wang, J.; Zhang, J.; Wang, Y.; Zhang, H.; Sui, H.; et al. Dynamics, chemistry, and modelling studies in the aviation and aerospace transition zones. Innovation 2025, 6, 101012.
3. Li, B.; Cui, R.; Weng, L. Investigation of the solar activity and QBO effects on the near-space environment. Acta Geophys. 2025, 73, 2127–2136.
4. Jiang, S.; Ma, Y.; Deng, F.; Lei, L. A deep learning framework for enhanced retrieval of atmospheric temperature and humidity profiles across China: Unifying inversion algorithms across multiple stations. Atmos. Res. 2025, 315, 107793.
5. Qu, W.; Gong, W.; Chen, C.; Zhang, T.; He, Z. Optimization design and experimental verification for the mixed-flow fan of a stratospheric airship. Aerospace 2023, 10, 107.
6. Yang, X.J.; Li, Y.F.; Wang, H.M.; Wang, Y.; Fu, J. Bayesian and least square method for temperature inversion of adjacent space atmosphere based on oxygen A-band. Chin. J. Space Sci. 2021, 41, 769–777.
7. Liao, Q.; Sheng, Z.; Guo, P. Research on the Wavelet Denoising Algorithm for Thorpe Analysis Based on the Radiosonde Data. Remote Sens. 2025, 17, 114.
8. Zhu, J.W.; Zhou, C.; Zhao, Z.Y.; Liu, Y. Inversion of atmospheric temperature based on THz radiometer. Chin. J. Space Sci. 2025, 45, 125–134.
9. Mooers, G.; Pritchard, M.; Beucler, T.; Ott, J.; Yacalis, G.; Baldi, P.; Gentine, P. Assessing the potential of deep learning for emulating cloud superparameterization in climate models with real-geography boundary conditions. J. Adv. Model. Earth Syst. 2021, 13, e2020MS002385.
10. He, Y.; Zhu, X.; Sheng, Z.; He, M. Identification of stratospheric disturbance information in China based on the round-trip intelligent sounding system. Atmos. Chem. Phys. 2024, 24, 3839–3856.
11. Niu, Z.Y.; Zou, X.L. A new machine learning approach for deriving atmospheric temperatures and typhoon warm cores from FY-3E MWTS-3 observations. J. Geophys. Res. Mach. Learn. Comput. 2024, 1, e2024JH000170.
12. Daubechies, I.; DeVore, R.; Foucart, S.; Hanin, B.; Petrova, G. Nonlinear Approximation and (Deep) Networks. Constr. Approx. 2022, 55, 127–172.
13. Wu, Y.; Sheng, Z.; Zuo, X. Application of deep learning to estimate stratospheric gravity wave potential energy. Earth Planet. Phys. 2022, 6, 70–82.
14. Irrgang, C.; Boers, N.; Sonnewald, M.; Barnes, E.A.; Kadow, C.; Staneva, J.; Saynisch-Wagner, J. Towards neural Earth system modelling by integrating artificial intelligence in Earth system science. Nat. Mach. Intell. 2021, 3, 667–674.
15. Johnstone, C.; Sulungu, E.D. Application of neural network in prediction of temperature: A review. Neural Comput. Appl. 2021, 33, 11755–11773.
16. Liao, Q.; Mai, Y.; Sheng, Z.; Wang, Y.; Ni, Q.; Zhou, S. The Comparison of Long Short-Term Memory Neural Network and Deep Forest for the Evaporation Duct Height Prediction. IEEE Trans. Antennas Propag. 2023, 71, 4444–4450.
17. Qin, C.; Gao, X.G.; Wan, K.F. Deep spatio-temporal convolutional long-short memory network. Acta Autom. Sin. 2020, 46, 451–462. (In Chinese)
18. Yao, S.H.; Guan, L. Atmospheric temperature and humidity profile retrievals using a machine learning algorithm based on satellite-based infrared hyperspectral observations. Infrared Laser Eng. 2022, 51, 20210707. (In Chinese)
19. Chen, B.; Sheng, Z.; Cui, F. Refined Short-Term Forecasting Atmospheric Temperature Profiles in the Stratosphere Based on Operators Learning of Neural Networks. Earth Space Sci. 2024, 11, e2024EA003509.
20. Tian, T.; Wu, H.; Liu, X.; Hu, Q. D3AT-LSTM: An Efficient Model for Spatiotemporal Temperature Prediction Based on Attention Mechanisms. Electronics 2024, 13, 4089.
21. Zhang, T.; Meng, S.; Jiang, G.; Ye, H. Retrieval of tropical cyclone temperature and humidity profiles from FY-3E MWTS and MWHS data using deep learning algorithm. In Proceedings Volume 12980, Proceedings of the Fifth International Conference on Geoscience and Remote Sensing Mapping (ICGRSM 2023), Lianyungang, China, 13–15 October 2023; SPIE: Washington, DC, USA, 2024; pp. 558–563.
22. Wang, Y.; Huo, P.; Han, Y.; Chen, T.; Wang, X.; Wen, H. A survey of deep learning-based weather forecasting models. Comput. Sci. 2025, 52, 112–126.
23. Gao, F.; Fei, J.; Ye, Y.; Liu, C. MSLKSTNet: Multi-Scale Large Kernel Spatiotemporal Prediction Neural Network for Air Temperature Prediction. Atmosphere 2024, 15, 1114.
24. Sun, X.; Zhou, C.; Feng, J.; Yang, H.; Zhang, Y.; Chen, Z.; Xu, T.; Deng, Z.; Zhao, Z.; Liu, Y.; et al. Research on Short-Term Forecasting Model of Global Atmospheric Temperature and Wind in the near Space Based on Deep Learning. Atmosphere 2024, 15, 1069.
25. He, H.Z.; Gu, S.Y.; Qin, Y.S.; Liu, Y.X. Research on temperature forecasting in near space based on CNN-LSTM algorithm. Rev. Geophys. Planet. Phys. 2025, 56, 118–125. (In Chinese)
26. Cheng, P.Y.; Zhao, J.; Han, L.Z.; Zhang, Y.Y.; Wu, Y.N. The short-term temperature prediction based on bidirectional multi-scale LSTM. J. Jiangxi Norm. Univ. (Nat. Sci.) 2022, 46, 134–139.
27. Kreuzer, D.; Munz, M.; Schlüter, S. Short-term temperature forecasts using a convolutional neural network—An application to different weather stations in Germany. Mach. Learn. Appl. 2020, 2, 100007.
28. Wang, H.; Liu, D.; Xia, Y.; Xie, W.; Wang, Y. Retrieval of Atmospheric Temperature Profile from Historical Data and Ground-Based Observations by Using a Machine Learning Algorithm. Remote Sens. 2023, 15, 2717.
29. Uluocak, I.; Bilgili, M. Daily air temperature forecasting using LSTM-CNN and GRU-CNN models. Acta Geophys. 2024, 72, 1991–2004.
30. Zhou, W.; Chen, Z.; Li, R.; Xiao, S. Correlation analysis between weather and tourism based on network attention: Taking Changsha as an example. Meteorol. Sci. Technol. 2020, 48, 607–614.
31. Jiang, S.H.; Wei, L.Y.; Ren, L.L.; Zhang, L.Q.; Wang, M.H.; Cui, H. Evaluation of IMERG, TMPA, ERA5, and CPC precipitation products over mainland China: Spatiotemporal patterns and extremes. Water Sci. Eng. 2023, 16, 45–56.
32. Kim, M.; Kim, J.; Lim, H.; Lee, S.; Cho, Y.; Chan, P.W. Implementation of the Yonsei Aerosol retrieval algorithm in the GK-2A/AMI and FY-4A/AGRI remote-sensing systems. AIP Conf. Proc. 2024, 2988, 050002.
33. Yang, R.F.; Ma, R.Q.; Zhang, G.L. Quality control of radiosonde data based on improved cubic spline interpolation. J. Trop. Meteorol. 2023, 39, 183–192.
34. Li, R.H.; Yuan, Y.B.; Zhang, H.X. Applicability analysis of ERA5 in regional tropospheric research. J. Navig. Position. 2021, 9, 29–37.
35. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)? —Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
36. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.
37. Hodson, T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. 2022, 15, 5481–5487.
38. Mao, X.; Zhuang, Y. Spatiotemporal heterogeneity of carbon emission intensity distribution in the tourism industry and its calculation methods. Sustain. Energy Res. 2025, 12, 14.
39. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
40. Bahdanau, D.; Cho, K.H.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–15.

Figure 1. The technical approach and advantages of this paper.

Figure 2. Schematic diagram of the Changsha area. The map delineates the core study domain (red triangle: 27°30′ N–28°30′ N, 111°30′ E–114°30′ E) and the extended buffer zone used for data processing.

Figure 3. Preprocessing flowchart for ERA5 data. Key steps include (a) extraction of 50 pressure levels between 20 and 80 km, (b) conversion from model levels to geopotential height, (c) spatial cropping with a 0.5° buffer, (d) cubic spline interpolation to a 0.25 × 0.25° grid, and (e) final cropping to the study region.

Figure 4. Preprocessing flowchart for FY-4A data. The process involves (a) downloading Level-2 ATPs, (b) applying quality control (QF = 0), (c) extracting data for the target location (28° N, 112° E), and (d) temporal averaging for comparison with ERA5.

Figure 5. Monthly average atmospheric temperature profile distribution for 2022 at (28° N, 112° E).

Figure 6. Spatial autocorrelation analysis of processed data across vertical levels. The vertical line segments represent the interannual fluctuation range (minimum to maximum), with the center point indicating the multi-year average for each level. The color bar scale, from 0 (dark blue) to 49 (yellow-green), indicates vertical levels from low to high (surface to upper atmosphere).

Figure 7. Comparison of temperature profiles between FY-4A satellite inversion and ERA5 reanalysis at (28° N, 112° E) in January and July 2022. (a) Results for January 2022. (b) Results for July 2022. The FY-4A data is the result of strict quality control (QF = 0); the ERA5 data is monthly average data and interpolated to this point.

Figure 8. Schematic diagram of a ConvLSTM cell. Schematic notation: solid arrows show data flow; (*) is element-wise multiplication; (·) is vector concatenation. At time step t, the input $X_{t}$ and the previous hidden state $h_{t-1}$ are processed through convolutional operations within the gates ($f_{t}$, $i_{t}$, $o_{t}$) to update the cell state $c_{t}$ and output the new hidden state $h_{t}$.

Figure 9. Monthly average temperature profile prediction results. The gray shaded area highlights the high-error period, identified as months where RMSE exceeded 3.0 K. Specific RMSE values for February (minimum) and May (maximum) are annotated on the plot.

Figure 10. Scatter plot of prediction results vs. true temperature values. The dashed line is the linear regression fit. Key statistics (N, R, RMSE, MAE, MRE) are shown.
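The statistics annotated on such a scatter plot can be computed as in the following sketch. It uses only the Python standard library and made-up temperatures. The MRE definition used here (mean of absolute error divided by the absolute true value, in percent) is an assumption, as is the function name `regression_metrics`; the paper's own definitions are given in its Evaluation Metrics section.

```python
import math

def regression_metrics(pred, true):
    """Return N, Pearson R, RMSE, MAE and MRE (%) for paired values."""
    n = len(true)
    if n == 0 or n != len(pred):
        raise ValueError("pred and true must be non-empty and of equal length")
    errors = [p - t for p, t in zip(pred, true)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumed MRE definition: mean of |error| / |true|, expressed in percent.
    mre = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, true)) / n
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    r = cov / math.sqrt(var_p * var_t)
    return {"N": n, "R": r, "RMSE": rmse, "MAE": mae, "MRE": mre}

# Made-up temperatures in kelvin, for illustration only (not data from the paper).
true_k = [220.0, 240.0, 260.0, 250.0]
pred_k = [222.0, 239.0, 258.0, 251.0]
stats = regression_metrics(pred_k, true_k)
print(stats)
```

Because temperatures are expressed in kelvin, the true values are far from zero and the relative error stays well defined.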

Figure 11. Latitude–longitude grid distribution of mean error between prediction results and true temperature values.

Figure 12. Zonal and meridional averages of the mean error between prediction results and true temperature values.
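One common way to obtain such averages from a latitude–longitude error grid is sketched below. It assumes the usual convention that a zonal mean averages over longitude (one value per latitude) and a meridional mean averages over latitude (one value per longitude). The grid values are made up, and the function names are hypothetical.

```python
def zonal_mean(grid):
    """Average over longitude: one value per latitude row."""
    return [sum(row) / len(row) for row in grid]

def meridional_mean(grid):
    """Average over latitude: one value per longitude column."""
    n_lat = len(grid)
    return [sum(col) / n_lat for col in zip(*grid)]

# Made-up mean-error grid in kelvin (rows = latitudes, columns = longitudes).
error_k = [
    [0.2, -0.1, 0.5],
    [0.4, 0.0, -0.4],
]

zonal = zonal_mean(error_k)            # length equals number of latitudes
meridional = meridional_mean(error_k)  # length equals number of longitudes
print(zonal, meridional)
```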

Table 1. ERA5 dataset parameter description.

Data Properties	Value
Projection mode	Regular latitude–longitude grid
Spatial range	27°30′ N–28°30′ N, 111°30′ E–114°30′ E
Original spatial resolution	1° × 1°
Original temporal resolution	3 h
Vertical range	137 levels (global); 50 levels with heights ranging from 20 to 80 km are selected
Time range	2002 to 2023

Table 2. Structure and parameters of different time-series neural networks.

Name of Temporal Neural Network	Structure and Parameters
CNN	The first convolutional layer has 8 channels, the convolution kernel size is 3 × 3, and the fully connected layer has 128 neurons.
LSTM and BiLSTM	The encoder has (128, 64) nodes and the decoder has (64, 128) nodes.
GRU	The encoder has 192 GRU units in the first layer and 128 in the second layer. The decoder has 128 GRU units in the first layer and 64 in the second layer.
ConvLSTM	The encoder has (128, 64) nodes and the decoder has (64, 128) nodes.
CNN-LSTM & CNN-BiLSTM	The CNN part uses a 3 × 3 convolution kernel and a fully connected layer with 128 neurons. The encoder of the sequential neural network has (128, 64) nodes and the decoder has (64, 128) nodes.
CNN-GRU	The CNN part uses a 3 × 3 convolution kernel and a fully connected layer with 128 neurons. The encoder has 192 GRU units in the first layer and 128 in the second layer. The decoder has 128 GRU units in the first layer and 64 in the second layer.

Table 3. Network training reference metrics for different time-series neural networks: test set RMSE (K).

Name of Temporal Neural Network	Test Set RMSE (K)
CNN	4.8023
FCN	3.3634
LSTM	2.7173
BiLSTM	2.6473
GRU	2.4896
ConvLSTM	2.4330
CNN-LSTM	2.8639
CNN-BiLSTM	2.8047
CNN-GRU	2.5987
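To make the gaps in Table 3 easier to read, the short sketch below ranks the models and computes how much lower the best RMSE is relative to each other model. The RMSE values are copied from Table 3. The relative-reduction measure and the helper name `relative_reduction` are illustrative choices, not quantities reported in the paper.

```python
# Test-set RMSE values (K) copied from Table 3.
rmse_k = {
    "CNN": 4.8023, "FCN": 3.3634, "LSTM": 2.7173, "BiLSTM": 2.6473,
    "GRU": 2.4896, "ConvLSTM": 2.4330, "CNN-LSTM": 2.8639,
    "CNN-BiLSTM": 2.8047, "CNN-GRU": 2.5987,
}

def relative_reduction(baseline, candidate):
    """Percentage by which the candidate RMSE is lower than the baseline RMSE."""
    return 100.0 * (baseline - candidate) / baseline

# Sort model names from lowest to highest RMSE.
ranking = sorted(rmse_k, key=rmse_k.get)
best = ranking[0]

# Relative RMSE reduction of the best model against every other model.
reductions = {
    name: relative_reduction(value, rmse_k[best])
    for name, value in rmse_k.items()
    if name != best
}

for name in ranking[1:]:
    print(f"{best} vs {name}: {reductions[name]:.1f}% lower RMSE")
```

With these values, ConvLSTM ranks first, with an RMSE about 2.3% lower than that of GRU, the next-best model, and about 49% lower than that of the plain CNN.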

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Z.; Han, Z.; Zhang, H.; Liao, Q. A Comparative and Regional Study of Atmospheric Temperature in the Near-Space Environment Using Intelligent Modeling. Forecasting 2026, 8, 1. https://doi.org/10.3390/forecast8010001
