Enhancing the Usability of CALIPSO Low-Confidence Cloud Products Using a Multilayer Perceptron-Based Data Refinement Framework

Luo, Xiaolu; Song, Wenkai; Yan, Shiqi; Zhang, Miao; Han, Ge

doi:10.3390/atmos17040413

by Xiaolu Luo ^1,2, Wenkai Song ^3, Shiqi Yan ^3, Miao Zhang ^3 and Ge Han ^2,*

^1 College of Resources and Environment, Yangtze University, Wuhan 430100, China

^2 Perception and Effectiveness Assessment for Carbon-Neutrality Efforts, Engineering Research Center of Ministry of Education, Institute for Carbon Neutrality, Wuhan University, Wuhan 430072, China

^3 Engineering Research Center of Environmental Laser Remote Sensing Technology and Application of Henan Province, Nanyang Normal University, Nanyang 473061, China

^* Author to whom correspondence should be addressed.

Atmosphere 2026, 17(4), 413; https://doi.org/10.3390/atmos17040413

Submission received: 16 March 2026 / Revised: 14 April 2026 / Accepted: 14 April 2026 / Published: 18 April 2026

(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Abstract

The CALIPSO V4.10 5 km cloud-layer product contains a small yet influential fraction of low-confidence and “unknown” cloud-type labels, which constrains its effectiveness in climatological analyses and limits its utility for downstream Earth system applications. To improve the practical usability and completeness of these observations, this study develops a multilayer perceptron (MLP)-based refinement framework using global summer daytime CALIPSO data from 2006–2021. High-confidence cloud samples (76% of the dataset), defined as cases with high Feature Type QA and high Ice/Water Phase QA, were used as the reliable supervision subset to train the MLP model using 11 geolocation-, optical-, and microphysics-related variables, including cloud optical depth, cloud thickness, depolarization ratio, and color ratio. The trained model was subsequently applied to a separately defined low-confidence cloud subset (~5% of the dataset), consisting of cases with high Feature Type QA but low Ice/Water Phase QA, of which over 60% were originally labeled as “unknown”, to generate probabilistic assignments of three cloud types: ice clouds, water clouds, and oriented ice crystals. Evaluation using withheld high-confidence samples indicates a strong level of agreement with operational CALIPSO classifications (~94.99%). Moreover, the refined low-confidence results exhibit physically coherent vertical structural characteristics consistent with established cloud thermodynamic regimes. It is emphasized that the proposed framework does not establish an independent physical truth beyond CALIOP’s measurement capability; instead, it provides a physically consistent and statistically robust approach to improving the completeness and practical usability of CALIPSO cloud-type products for large-scale scientific and modeling applications.

Keywords:

CALIPSO; cloud classification; multilayer perceptron (MLP); machine learning; remote sensing

1. Introduction

Clouds play a fundamental role in the Earth system by regulating radiative balance, modulating the hydrological cycle, and influencing climate variability and change [1,2,3]. Accurate cloud classification is therefore essential for quantifying cloud radiative forcing, understanding cloud feedback mechanisms, and improving the representation of clouds in weather and climate models [4,5,6]. The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) mission, carrying the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP), has provided unprecedented global active-lidar observations of cloud vertical structure and optical properties [7,8,9]. In the CALIPSO Version 4.10 5 km cloud-layer product, the classification of ice clouds, water clouds, and oriented ice crystals has been improved through updates to the operational cloud classification algorithm [10]. Nevertheless, a small but non-negligible fraction of observations remains classified with low confidence, and a large proportion of these cases are labeled as “unknown”. Although these samples account for only a minority of the total dataset, they are concentrated in atmospherically and observationally challenging situations, such as weak backscatter conditions, ambiguous phase transitions, or structurally complex cloud scenes, thereby reducing the completeness and downstream usability of the product.

Accurate identification of cloud type is a prerequisite for understanding cloud–climate effects because different cloud categories exhibit distinct radiative and microphysical characteristics [11,12]. For example, high-level ice clouds influence the Earth’s energy budget mainly through longwave greenhouse effects, whereas low-level water clouds strongly modulate incoming solar radiation through their albedo effect [13]. Oriented ice crystals, although less frequent, can substantially modify optical properties and lidar scattering behavior [14]. Consequently, ambiguous or low-confidence cloud labels may introduce uncertainty into climatological statistics, regional cloud-type occurrence analyses, and subsequent model evaluation or data-assimilation applications. From a product perspective, the key challenge is therefore not simply to maximize classification accuracy in already well-labeled samples, but to provide physically reasonable and practically usable cloud-type information for those observations for which the operational algorithm is uncertain.

In recent years, machine learning has been increasingly applied to cloud detection and classification in satellite remote sensing. Representative studies have demonstrated that machine-learning methods can improve cloud detection and thermodynamic-phase classification from passive satellite observations, often using CALIOP products as reference labels for model development and evaluation [15,16,17]. In parallel, CALIOP/CALIPSO observations have also been used in machine-learning studies for broader atmospheric feature recognition tasks, such as aerosol subtype identification or cloud–aerosol scene classification, rather than for post-processing refinement of the operational cloud-type product itself [18,19]. These studies demonstrate the strong potential of machine learning for learning complex nonlinear relationships among satellite-observed variables. However, most existing efforts have focused either on using CALIOP-based labels to support cloud classification for other sensors, or on performing general cloud or aerosol recognition tasks from satellite measurements. In contrast, relatively limited attention has been paid to the refinement of low-confidence and “unknown” labels within the CALIPSO cloud-type product itself. This represents a distinct product-oriented research gap: how to improve the completeness and usability of low-confidence CALIPSO cloud observations when no independent truth labels are available for those ambiguous cases.

To address this issue, this study develops a multilayer perceptron (MLP)-based probabilistic refinement framework for low-confidence CALIPSO cloud-type observations. Using global summer daytime CALIPSO data from 2006 to 2021, we train the model on the high-confidence subset and then apply it to low-confidence samples to infer the most probable cloud type among ice clouds, water clouds, and oriented ice crystals. It should be emphasized that the purpose of this framework is not to replace the CALIOP operational algorithm or to establish an independent physical truth beyond CALIOP’s observational capability. Rather, the goal is to learn robust statistical relationships from high-confidence samples and use them to provide physically consistent and probabilistically interpretable cloud-type assignments for low-confidence cases. The main contributions of this work are threefold: (1) formulating CALIPSO low-confidence cloud-type refinement as a product-oriented probabilistic classification problem; (2) developing an MLP-based framework that integrates geolocation, optical, and polarization-related variables from CALIOP observations; and (3) evaluating the refined results from both statistical consistency and physical plausibility perspectives, thereby demonstrating their value for enhancing the completeness and practical usability of the CALIPSO cloud-type product [20,21].

2. Materials and Methods

2.1. Data Preprocessing

For this study, we used the CALIPSO Version 4.10 5 km cloud-layer product acquired by the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) mission, which carries the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP). The dataset was obtained from the official CALIPSO data archive and includes global summer daytime observations from 2006 to 2021, which provided ample samples for model training and validation. Summer daytime observations were selected to reduce seasonal heterogeneity in cloud phase and vertical structure, thereby establishing a relatively homogeneous and well-sampled dataset for this initial low-confidence cloud-type refinement experiment. The CALIOP lidar has a unique advantage in cloud detection and classification, using atmospheric backscatter signals and polarization measurements [7]. CALIOP not only provides precise cloud location information (layer heights) but also retrieves cloud microphysical properties, offering a rich observational basis for cloud type identification [22].

To ensure data quality and effective model training, a dedicated data preprocessing workflow was designed. During feature engineering, all input variables were standardized using Z-score normalization to center them to zero mean and scale them to unit variance. This normalization removes differences in feature magnitude, which helps speed up model convergence, improves numerical stability, and prevents features with larger scales from dominating the training.
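The Z-score step described above can be illustrated with a minimal NumPy sketch. The random array stands in for the 11 input variables and is purely hypothetical. Fitting the statistics on the training set and reusing them for other subsets is an assumption here, since the text does not state where the mean and standard deviation were computed.

```python
import numpy as np

def fit_zscore(train_features):
    """Return per-feature mean and standard deviation of the training data."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    # Guard against constant features to avoid division by zero.
    std = np.where(std == 0.0, 1.0, std)
    return mean, std

def apply_zscore(features, mean, std):
    """Center each feature to zero mean and scale it to unit variance."""
    return (features - mean) / std

rng = np.random.default_rng(0)
# Hypothetical stand-in for 1000 samples of the 11 input variables.
train = rng.normal(loc=5.0, scale=3.0, size=(1000, 11))
mean, std = fit_zscore(train)
train_scaled = apply_zscore(train, mean, std)
```

The same `mean` and `std` would then be passed to `apply_zscore` for any other subset, so that all inputs share one scale.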

We combined physical insight with statistical analysis for feature selection. Through an in-depth understanding of CALIOP’s measurement principles and relevant cloud physical processes, we identified 11 discriminative feature parameters as model inputs. These features included geolocation (longitude and latitude) and cloud top and base heights to describe cloud location; cloud optical depth and cloud geometric thickness to characterize cloud physical properties; and cloud depolarization ratio and color ratio to reflect cloud microphysical properties [23,24], among others.

From the dataset, we defined a strict high-confidence subset (76% of all samples) for model training and a strict low-confidence subset (~5% of all samples) as the target set for reclassification. These two groups do not represent a complete binary partition of the dataset. Samples with intermediate or otherwise non-selected confidence conditions, which did not satisfy our criteria for either reliable supervision or low-confidence refinement, were excluded from both subsets; these excluded samples account for approximately 19% of the total dataset. Confidence was determined according to the CALIOP quality-assessment framework associated with feature identification and ice–water phase assignment. The high-confidence subset was randomly divided into training and validation sets at an 80:20 ratio using a fixed random seed. The original training data were markedly imbalanced, with 36.75% ice clouds (12,043,202 samples), 62.67% water clouds (20,536,500 samples), and only 0.58% oriented ice crystals (190,916 samples). To mitigate the adverse effects of this imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the minority class, improving the model’s ability to learn rare cloud categories [25]. This procedure increased the representation of the minority class without reducing the number of samples in the majority classes.
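The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified NumPy illustration and not the implementation used in the study. The function name, the value of `k`, and the toy data are all hypothetical.

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority samples
    and one of their k nearest minority-class neighbours."""
    n = minority.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)
    # Pairwise squared distances within the minority class (fine for small n).
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[picked] - minority[base])

# Hypothetical minority class: 50 samples with 11 normalized features.
minority = np.random.default_rng(1).normal(size=(50, 11))
synthetic = smote_like_oversample(minority, n_new=200)
```

Because each synthetic row lies on a segment between two real minority samples, the majority classes are left untouched, which matches the statement that no majority samples were removed. The dense distance matrix would not scale to hundreds of thousands of samples. A nearest-neighbour index would be needed at that size.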

The definition of high- and low-confidence samples follows the CALIOP cloud-phase and quality-assessment framework. In the Version 4 CALIOP product, cloud thermodynamic phase is determined from physically meaningful lidar observables, primarily cloud-layer depolarization and attenuated backscatter, together with auxiliary information such as cloud-layer temperature and, where applicable, color-ratio- and viewing-angle-related criteria. When phase discrimination is uncertain, the product assigns lower confidence or reports an “unknown” phase.

Based on this framework, the present MLP model was designed as a probabilistic refinement tool for low-confidence observations rather than as a direct replication of the operational CALIOP decision tree. The CALIOP confidence indicators were used only to define the sample subsets and were not included as input variables or as an auxiliary information layer in the MLP. The selected input variables describe cloud optical, geometric, and microphysical properties relevant to lidar phase discrimination, while the high-confidence subset provides the supervision source and the low-confidence subset serves as the refinement target. In this sense, the model is data-driven, but its sample construction and feature design remain closely linked to the physical and product-level logic of CALIOP.
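The subset construction can be expressed as boolean masks over the two QA fields. The sketch below is illustrative only. The integer level codes and the choice to treat a "none" phase QA as low confidence are assumptions, not details taken from the product documentation or from this study.

```python
import numpy as np

# Hypothetical integer codes for QA levels; the real product encoding may differ.
QA_NONE, QA_LOW, QA_MEDIUM, QA_HIGH = 0, 1, 2, 3

def split_by_confidence(feature_type_qa, phase_qa):
    """Return masks for the high-confidence, low-confidence, and excluded samples."""
    feature_high = feature_type_qa == QA_HIGH
    high_conf = feature_high & (phase_qa == QA_HIGH)
    # Assumption: "low" phase QA also covers samples with no phase confidence.
    low_conf = feature_high & (phase_qa <= QA_LOW)
    excluded = ~(high_conf | low_conf)
    return high_conf, low_conf, excluded

feature_type_qa = np.array([3, 3, 3, 2, 3, 1])
phase_qa = np.array([3, 1, 2, 3, 0, 1])
high_conf, low_conf, excluded = split_by_confidence(feature_type_qa, phase_qa)
```

The three masks are mutually exclusive and cover every sample, mirroring the 76% / ~5% / ~19% breakdown reported above. The QA values serve only to build the masks and are never passed to the model as features.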

2.2. MLP Neural Network Model Design

A deep neural network based on a multilayer perceptron was designed to meet the needs of CALIPSO cloud classification. The network uses a progressive dimensionality reduction structure (illustrated in Figure 1), which controls model complexity while maintaining feature extraction capability. The input layer receives an 11-dimensional normalized feature vector derived from CALIPSO observations, which includes physical, optical, and structural variables describing cloud and aerosol properties (summarized in Table 1). These input features are transformed through three hidden layers, and the final output layer produces a probability distribution over the three cloud categories. Such a hierarchical architecture supports incremental representation learning and facilitates stable model performance when integrating heterogeneous remote-sensing parameters.

To provide an empirical basis for selecting the MLP architecture, we performed a structured hyperparameter search on a stratified high-confidence subset from a representative year of the CALIPSO dataset. The search space included hidden-layer architecture, dropout configuration, initial learning rate, L2 regularization coefficient, batch size, and SMOTE oversampling multiplier. A total of 23 valid hyperparameter trials were evaluated. The final configuration was selected primarily according to validation Macro-F1, with weighted F1-score, overall accuracy, and computational cost considered as supplementary criteria.

Based on the hyperparameter-search results (Figure 2), the final MLP architecture adopted in this study balances predictive performance and computational efficiency for the present task. Although Trial 2 achieved the highest validation Macro-F1, Trial 13 was selected as the final configuration because it provided comparable validation performance with substantially lower training cost. Specifically, the selected architecture reduced training time from 39.33 s to 18.75 s while maintaining a validation Macro-F1 of 0.9295. The final model therefore uses a progressively reduced hidden-layer structure (256 → 128 → 64). The first hidden layer contains 256 neurons, a broad layer that captures rich input feature representations. The second hidden layer has 128 neurons, beginning to filter and combine features, and the third hidden layer has 64 neurons, achieving higher-level feature abstraction and compression. Each hidden layer is followed by a batch normalization layer to accelerate training and improve stability. We use the Rectified Linear Unit (ReLU) activation function for all hidden neurons. Mathematically, $\mathrm{ReLU}(x) = \max(0, x)$, which means the output is $x$ if $x > 0$ and 0 otherwise. This simple non-linear function is computationally efficient and helps mitigate the vanishing gradient problem. To prevent overfitting, dropout layers are inserted after each hidden layer with dropout rates of 0.4, 0.3, and 0.2 for the first, second, and third layers respectively. This decreasing dropout schedule maintains regularization while avoiding excessive suppression of higher-level features. The output layer uses a softmax function to produce a probability distribution $p_{i}$ over the three classes (ice cloud, water cloud, oriented ice crystal), where

$$p_{i} = \frac{e^{z_{i}}}{\sum_{j = 1}^{3} e^{z_{j}}}, \quad i = 1, 2, 3, \tag{1}$$

and $z_{i}$ is the network’s raw output for class $i$. This ensures $p_{1} + p_{2} + p_{3} = 1$, providing a clear categorization as well as a confidence level for each prediction.
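To make the layer dimensions and the softmax output concrete, the following NumPy sketch runs an inference-mode forward pass through an 11 → 256 → 128 → 64 → 3 network with randomly initialised weights. It is not the trained model. Batch normalization is omitted for brevity, dropout is inactive at inference, and the initialisation scheme is an assumption.

```python
import numpy as np

LAYER_SIZES = [11, 256, 128, 64, 3]  # input, three hidden layers, output

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    # Subtracting the row maximum improves numerical stability
    # without changing the resulting probabilities.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def init_params(seed=0):
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        # He-style initialisation (an assumption; the text does not specify one).
        weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params

def forward(x, params):
    """Inference-mode pass: ReLU hidden layers followed by a softmax output."""
    h = x
    for weights, bias in params[:-1]:
        h = relu(h @ weights + bias)
    weights, bias = params[-1]
    return softmax(h @ weights + bias)

params = init_params()
batch = np.random.default_rng(1).normal(size=(4, 11))  # four hypothetical samples
probs = forward(batch, params)  # one row of (ice, water, oriented ice) per sample
n_weights = sum(w.size + b.size for w, b in params)
```

Each row of `probs` sums to one, as Equation (1) requires. Counting only the dense-layer weights and biases, this layout has 44,419 trainable parameters, which is small relative to the tens of millions of training samples.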

For model training, we adopted a high-performance computing framework and the Adam optimizer. We used a batch size of 1024, which balances computational efficiency with convergence stability. The loss function was a weighted cross-entropy to account for class imbalance, combined with a regularization term. Specifically, let $y_{ij}$ denote the one-hot encoded ground-truth label for sample $i$ and class $j$, such that $y_{ij} = 1$ if sample $i$ belongs to class $j$, and $y_{ij} = 0$ otherwise. Thus, $y_{ij}$ is a discrete binary indicator variable. Let $p_{ij}$ denote the predicted probability for sample $i$ belonging to class $j$. The weighted cross-entropy loss is then defined as:

$$L_{weighted} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{3} w_{j} y_{ij} \log (p_{ij}) \tag{2}$$

where $N$ is the batch size and $w_{j}$ is a weight for class $j$. The weight $w_{j}$ is set inversely proportional to the class frequency, $w_{j} = \frac{N_{total}}{3 N_{j}}$, with $N_{j}$ being the number of samples of class $j$ (ensuring rarer classes receive higher weight). In our dataset, the sample distribution after oversampling roughly balanced the three classes, but we still retained slight weighting to account for any residual imbalance [25]. We also included an $L_{2}$ regularization (weight decay) term to penalize large weights and further prevent overfitting. The total loss was the sum of the weighted cross-entropy and the regularization term.
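The class-weight formula and Equation (2) can be written out in a few lines of NumPy. The sketch below uses the pre-oversampling class counts from Section 2.1 purely for illustration. The weights actually used after SMOTE would be much closer to one. The toy probabilities are hypothetical and the L2 term is left out.

```python
import numpy as np

# Training-set class counts reported in Section 2.1 (before oversampling):
# ice, water, oriented ice.
CLASS_COUNTS = np.array([12_043_202, 20_536_500, 190_916], dtype=float)

def class_weights(counts):
    """w_j = N_total / (3 * N_j), i.e. inverse class frequency for three classes."""
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Equation (2): batch mean of -w_j * log(p_ij) for the true class j."""
    true_class_prob = probs[np.arange(len(labels)), labels]
    return float(np.mean(-weights[labels] * np.log(true_class_prob + eps)))

weights = class_weights(CLASS_COUNTS)

# Hypothetical softmax outputs for three samples, one per class.
probs = np.array([[0.90, 0.08, 0.02],
                  [0.20, 0.70, 0.10],
                  [0.30, 0.10, 0.60]])
labels = np.array([0, 1, 2])
loss = weighted_cross_entropy(probs, labels, weights)
```

Because $y_{ij}$ is one-hot, the double sum in Equation (2) reduces to picking out the predicted probability of the true class, which is what the indexing in `weighted_cross_entropy` does. With all weights set to one, the function reduces to the ordinary cross-entropy.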

Regarding the sensitivity to input features, the proposed classification framework is designed to rely on a set of physically meaningful and complementary variables. By jointly learning from these heterogeneous features, the MLP is expected to form class-discriminative representations based on feature combinations rather than on any single predictor. In addition, all input features are normalized prior to training, and regularization mechanisms such as batch normalization and dropout are applied to reduce excessive dependence on individual variables. As a result, the model is intended to be robust to moderate variability in any single input feature while maintaining physically interpretable classification behavior.

Our training strategy included adaptive learning rate scheduling and early stopping. The initial learning rate was set to 0.001. We reduced the learning rate to 10% of its value every 10 epochs (i.e., $\eta_{t} = 0.1 \times \eta_{t - 10}$) and employed an early stopping criterion: training was halted if the validation loss did not improve for 5 consecutive epochs. To monitor training progress and facilitate model selection, we evaluated the model on the validation set every 30 iterations and recorded key metrics (loss, accuracy, etc.) throughout the training. This comprehensive training strategy ensured efficient convergence and a reliable final model.
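The schedule and stopping rule can be sketched in plain Python. Zero-based epoch indexing and the example validation-loss curve are assumptions made for illustration. The sketch also checks the loss once per epoch, whereas the study additionally evaluated every 30 iterations.

```python
def learning_rate(epoch, initial_lr=0.001, drop_factor=0.1, drop_every=10):
    """Step decay: multiply the rate by drop_factor every drop_every epochs."""
    return initial_lr * drop_factor ** (epoch // drop_every)

def early_stopping_epoch(val_losses, patience=5):
    """Return the epoch index at which training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation-loss curve: the best value occurs at epoch 2.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.40]
stop_at = early_stopping_epoch(val_losses)
```

With a patience of 5 and the best loss at epoch 2, training in this toy curve stops at epoch 7. A run that stops this early never reaches the first learning-rate drop at epoch 10.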

3. Results

3.1. MLP Model Validation

To further evaluate the suitability of the selected MLP for the present task, we compared it with two widely used machine learning baselines—Random Forest (RF) and Support Vector Machine (SVM)—using the same high-confidence subset, train/validation/test split and evaluation metrics (Table 2). The results of this comparison showed that the MLP achieved classification performance comparable to the two baselines, while maintaining the lowest computational cost of the three models. Therefore, it was chosen as the final framework for the subsequent large-scale refinement experiments, as it provided the best overall trade-off between classification performance and computational efficiency.

To evaluate the performance of the MLP model on the cloud classification task, we established a comprehensive evaluation framework. The evaluation encompassed two main aspects. (1) Probability Distribution Analysis: We analyzed the model’s predicted probability distribution for each sample across the three categories (ice, water, oriented ice) to assess classification confidence and uncertainty. (2) Confusion Matrix and Metrics: We constructed confusion matrices, counting true positives, false positives, false negatives, and true negatives for each category. From these, we computed precision, recall, and F1-score for each class as key performance indicators. This multi-level evaluation provides a robust basis for validating model performance.

The validation results demonstrate that the MLP neural network achieved excellent classification performance. According to the confusion matrix (Figure 3), the model performed very well for the major cloud types. For the ice cloud category (Class 1), the per-class accuracy (recall) reached 98.8%; only ~1.2% of ice cloud samples were misclassified, indicating a high recognition rate. The water cloud category (Class 2) also performed well, with 92.9% accuracy and 7.1% of water cloud samples misclassified. For the oriented ice crystal category (Class 3), the model’s recognition ability was weaker: accuracy was 84.1%, with 15.9% of oriented ice crystal samples misclassified (most of them, 14.7% of the class, were wrongly predicted as regular ice clouds). This behavior is consistent with the known similarity in optical characteristics between oriented ice crystals and non-oriented ice clouds, which makes their separation more challenging for both algorithmic and learned classifiers.

Using receiver operating characteristic (ROC) curves and the derived metrics (Figure 4), we observed that the model’s discriminative ability was high for all classes (area under each ROC curve, AUC > 0.99). However, the performance metrics reveal important disparities. For ice clouds (Class 1, ~36.8% of samples) and water clouds (Class 2, ~62.7%), the model achieved outstanding precision (91.89% and 99.12% respectively), recall (98.83% and 92.92%), and F1-scores (~0.95 and 0.96). In contrast, for oriented ice crystals (Class 3, only 0.58% of samples), despite a high AUC of 0.991, the precision was much lower (42.57%) with a recall of 84.12% (F1-score ~0.57).
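The reported F1-scores follow directly from the precision and recall values as their harmonic mean. The short check below reproduces them from the numbers quoted in the text. The dictionary keys are illustrative names only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Precision and recall as reported in the text, expressed as fractions.
reported = {
    "ice": (0.9189, 0.9883),
    "water": (0.9912, 0.9292),
    "oriented_ice": (0.4257, 0.8412),
}
f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
```

The harmonic mean is pulled toward the smaller of the two values, which is why the oriented-ice F1 stays near 0.57 even though its recall exceeds 84%.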

Two main factors contributed to this discrepancy for Class 3 (oriented ice crystals). (1) Severe class imbalance: Oriented ice crystal samples were originally far fewer than ice or water cloud samples (only ~0.58% of the training data, versus millions of samples for each of the other classes), and the synthetic samples generated by SMOTE cannot fully substitute for real observations, making it difficult for the model to learn this minority class thoroughly. (2) Complex cloud coexistence: Oriented ice crystals often coexist with or transition into other cloud types under certain conditions [26], blurring the boundaries between classes and complicating the classification task.

These factors collectively imply that while the model can effectively detect the presence of oriented ice crystals (reflected in the high recall), it tends to over-predict oriented ice crystals for some samples that are actually other cloud types (hence the low precision). This leads to reduced overall performance on this minority class, despite the promising ROC curve.

As shown in Figure 5, the distribution of predicted probabilities offers further insight into the model’s confidence characteristics. For ice clouds (Class 1) and water clouds (Class 2), the model shows strong discriminative power: the predicted probabilities for correctly classified samples (blue bars) are mostly very high (0.8–1.0 range), whereas the probabilities when the model is wrong (red bars) are generally very low (0–0.2). This clear separation indicates high confidence in identifying these cloud types. In contrast, for oriented ice crystals (Class 3), the probability distribution is more dispersed. While correctly classified oriented ice samples still tend to have high predicted probabilities (peak in 0.8–1.0 range), the incorrect predictions are spread more broadly, especially in the mid-range (0.4–0.8). This reflects the model’s relatively high uncertainty in recognizing oriented ice crystals—a direct consequence of their limited training sample size (only ~0.58% of data). Analyzing the probability outputs in this way not only quantifies model confidence but also helps identify which classification results should be treated with caution.
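One simple way to act on these probability outputs is to flag predictions whose winning-class probability falls below a chosen threshold. The sketch below is hypothetical. The threshold of 0.8 is borrowed from the 0.8–1.0 range noted above and is not a value prescribed by the study.

```python
import numpy as np

CLASS_NAMES = ("ice", "water", "oriented_ice")

def summarize_predictions(probs, caution_threshold=0.8):
    """Return the predicted class index, its probability, and a caution flag."""
    predicted = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    needs_caution = confidence < caution_threshold
    return predicted, confidence, needs_caution

# Hypothetical softmax outputs for three samples.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.10, 0.85, 0.05],
                  [0.45, 0.05, 0.50]])
predicted, confidence, needs_caution = summarize_predictions(probs)
labels = [CLASS_NAMES[i] for i in predicted]
```

In this toy batch only the third sample is flagged: it is assigned to oriented ice crystals with a probability of 0.50, barely ahead of ice at 0.45.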

3.2. Physical Consistency Validation of Reclassified Cloud Types

In addition to statistical evaluation using high-confidence CALIPSO samples, we assess the physical consistency of the MLP-reclassified cloud types by examining their vertical cloud-top height characteristics. This analysis is designed to complement conventional supervised performance metrics and does not rely on CALIPSO categorical labels as a validation reference. Instead, it evaluates whether the reclassified cloud categories exhibit physically plausible vertical structures that are consistent with established cloud physical understanding, thereby providing an additional, label-independent assessment of the model output.

Figure 6 summarizes the cloud-top height characteristics of the MLP-reclassified ice and water clouds at global and latitudinal scales. As shown in Figure 6a, the two cloud types exhibit clearly distinct height distributions: ice clouds are predominantly associated with higher altitude ranges, while water clouds are mainly concentrated at lower levels. Figure 6b further shows that the relative vertical positioning of ice and water clouds varies with latitude. Although water clouds may reach comparable or locally higher median cloud-top heights in certain low- to mid-latitude regions, the overall height distributions remain physically distinct. This latitude-dependent behavior reflects the influence of regional thermodynamic and dynamical conditions and is consistent with known cloud formation processes. Overall, the observed vertical structure characteristics demonstrate that the reclassified cloud types exhibit physically meaningful and climatologically reasonable behavior, supporting the physical plausibility of the reclassification results.

In summary, the multi-faceted validation confirms that our MLP model provides reliable classification performance, especially for the primary cloud types (ice and water clouds). Overall, the model has sufficient accuracy and generalization ability to support the subsequent reclassification of low-confidence data.

3.3. Reclassification of Low-Confidence Data

Before reclassification, we first examined the composition of CALIPSO’s cloud classification dataset (Figure 7). We focused on the CALIPSO V4.10 5 km first-layer cloud data for summer daytime (2006–2021). As expected, high-confidence data constitute the vast majority of the dataset, reflecting the overall reliability of CALIPSO’s operational cloud classification algorithm [22]. However, the proportion of low-confidence (low-quality) data, while only a few percent, corresponds to a substantial absolute number of samples (millions worldwide). Moreover, most of these low-confidence samples were labeled as “unknown,” indicating that the current algorithm struggles under certain conditions (for example, low signal-to-noise ratios or complex edge-of-cloud situations).

In-depth analysis of the high-confidence subset reveals that water clouds dominate the lowest cloud layers in summer daytime, followed by ice clouds, with oriented ice crystals being relatively rare. This distribution is physically reasonable, as summertime boundary layer conditions (warm temperatures and ample moisture) favor water cloud formation. The secondary presence of ice clouds corresponds mainly to cirrus clouds at higher altitudes (above ~5–6 km) [27]. The small fraction of oriented ice crystals indicates specific microphysical conditions where ice crystals align horizontally; while rare, their presence is important for understanding cloud radiative effects [28].

Our MLP-based reclassification of the 2,216,512 low-confidence (low-quality) samples yielded several important findings. Figure 8a shows the original breakdown of these low-quality data: about 64% were in the “unknown” category, ~27% were labeled water cloud, ~5% oriented ice crystal, and ~4% ice cloud. This underscores the limitations of the current algorithm when confronted with marginal data: it defaulted to “unknown” in the majority of questionable cases.

After applying our trained MLP model, the reclassified results (Figure 8b) showed a dramatically different distribution: 59.64% water clouds, 35.81% ice clouds, and 4.55% oriented ice crystals (with the “unknown” category effectively eliminated). The new distribution is highly consistent with the expected climatology for summer daytime low-level clouds, which are predominantly water clouds. In other words, the MLP reassignment of formerly “unknown” cases appears physically plausible and aligns with known atmospheric behavior.

To understand how individual samples were reclassified, we analyzed the detailed category conversions (Figure 8c,d): (1) For samples originally labeled ice cloud in the low-confidence set, the MLP was very stable—it retained the ice classification for ~91.2% of them, converting only 7.7% to water cloud. This high retention suggests that even when labeled with low confidence, most of these ice cloud identifications by CALIPSO were correct, and our model agrees. (2) For samples originally labeled water cloud, the MLP kept 64.71% as water cloud, but reclassified 30.37% as ice cloud and 4.92% as oriented ice crystal. This indicates that some low-confidence “water” cloud samples had features more consistent with ice clouds. Such confusion can occur in mixed-phase cloud regions, where supercooled water droplets and ice crystals coexist, giving water-like optical properties to what is partly an ice cloud. (3) Notably, for samples originally labeled oriented ice crystal, about half (50.29%) were reclassified by the MLP as regular ice clouds, 22.31% as water clouds, and only 27.40% remained oriented ice crystals. This suggests that many of the “oriented ice” labels in low-confidence data may actually have been transitional cases between ice and water clouds. This interpretation is consistent with prior cloud microphysical observations of mixed-phase clouds and horizontally oriented crystals [26]. (4) The most dramatic change was for the “unknown” category (which comprised 64% of the low-quality data). After MLP reclassification, 33.46% of these previously unknown cases were identified as ice clouds, 63.72% as water clouds, and 2.82% as oriented ice crystals (Figure 8d). The dominance of water cloud assignments in formerly unknown data aligns with the understanding that summer low-level clouds are mostly warm (liquid) clouds.
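The per-category conversion rates can be checked against the overall reclassified distribution by weighting each row by its original share. The sketch below uses the original shares quoted for the low-confidence subset (4.1%, 27.1%, 5.1%, 63.7%). The oriented-ice entry in the ice row (1.1%) is not stated in the text and is assumed here as the remainder after 91.2% and 7.7%.

```python
import numpy as np

# Original label shares in the low-confidence subset:
# ice, water, oriented ice, unknown.
original_share = np.array([0.041, 0.271, 0.051, 0.637])

# Rows: original label; columns: MLP label (ice, water, oriented ice).
transition = np.array([
    [0.9120, 0.0770, 0.0110],  # ice (last entry is an assumed remainder)
    [0.3037, 0.6471, 0.0492],  # water
    [0.5029, 0.2231, 0.2740],  # oriented ice
    [0.3346, 0.6372, 0.0282],  # unknown
])

# Share of each MLP label after reclassification.
reclassified_share = original_share @ transition

# Overall distribution reported for the reclassified results.
reported_share = np.array([0.3581, 0.5964, 0.0455])
max_abs_diff = np.abs(reclassified_share - reported_share).max()
```

The reconstructed shares agree with the reported 35.81%, 59.64%, and 4.55% to within about 0.1 percentage point. That residual is in line with the rounding of the inputs, so the conversion statistics and the overall distribution are mutually consistent.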

These results can be explained from several physical perspectives. (1) Thermodynamic perspective: In daytime summer boundary layers, temperature is high and humidity is abundant, conditions very conducive to the formation and maintenance of water clouds [29]. (2) Altitude perspective: The lowest cloud layers are in the lower troposphere, where temperatures are usually above 0 °C, favoring the existence of liquid water clouds. (3) Microphysical perspective: It is generally easier for the atmosphere to satisfy the conditions for water cloud droplet formation than for ice crystal formation. Thus, many ambiguous “unknown” cases default to water cloud when analyzed physically. All the above mechanisms support the plausibility of the MLP model’s reclassification outcomes.

We also examined the global spatial distribution of the reclassified results (Figure 9). Panels (a–d) show the original low-confidence data distribution: ice clouds (~4.1% of the low-confidence data) were mainly at high latitudes (especially the Arctic); water clouds (~27.1%) were globally distributed with a concentration in the Northern Hemisphere; oriented ice crystals (~5.1%) had “hotspots” at high latitudes (again near the Arctic); and the unknown category (~63.7%) was widespread but with notably higher density in high-latitude regions (e.g., polar areas), where extreme conditions often challenge the classification algorithm. After reclassification by the MLP (panels e,f), the low-confidence samples fall predominantly into two categories: ice clouds (~35.9%) and water clouds (~59.6%). The remaining ~4.5% corresponds to oriented ice crystal assignments (Figure 8c,d), which are not mapped separately. The spatial patterns of the reclassified clouds remained consistent with physical expectations: ice clouds show the highest densities in cold, high-latitude regions (e.g., the Arctic; ice clouds account for ~30% of global cloud cover [5]), whereas water clouds are ubiquitous worldwide, with higher densities in warmer regions with abundant moisture.
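The shares quoted for the spatial analysis can be cross-checked against the conversion rates reported for Figure 8c,d by propagating the original class shares through the conversion table. The sketch below does this arithmetic. One value is an assumption: the text gives only the ice and water rates for originally ice-labeled samples, so the remaining 1.1% is assigned here to oriented ice crystals.

```python
# Original shares of the low-confidence subset (%), as quoted for Figure 9a-d.
original_share = {"ice": 4.1, "water": 27.1, "oriented_ice": 5.1, "unknown": 63.7}

# Conversion rates (%) quoted for Figure 8c,d.
# ASSUMPTION: the 1.1% remainder of the "ice" row goes to oriented ice crystals.
conversion = {
    "ice": {"ice": 91.2, "water": 7.7, "oriented_ice": 1.1},
    "water": {"ice": 30.37, "water": 64.71, "oriented_ice": 4.92},
    "oriented_ice": {"ice": 50.29, "water": 22.31, "oriented_ice": 27.40},
    "unknown": {"ice": 33.46, "water": 63.72, "oriented_ice": 2.82},
}


def refined_share(original_share, conversion):
    """Propagate original class shares (%) through row-wise conversion rates (%)."""
    out = {"ice": 0.0, "water": 0.0, "oriented_ice": 0.0}
    for orig, share in original_share.items():
        for new, rate in conversion[orig].items():
            out[new] += share * rate / 100.0
    return out


shares = refined_share(original_share, conversion)
for name, value in shares.items():
    print(f"{name}: {value:.1f}%")
```

Under these assumptions the propagated shares come out near 36% ice and 60% water, in line with the ~35.9% and ~59.6% quoted above, with the remaining few percent falling in the oriented ice crystal class. Small differences are expected because the input percentages are rounded.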

It is worth noting that summer atmospheric dynamics can obscure some cloud features (e.g., intense convection can produce complex cloud-layer structures), which makes classification more difficult for any algorithm. That the MLP maintained high agreement with the high-confidence labels under these conditions suggests that it captured the main distinguishing features of the cloud types. In particular, the model assigned many samples that were originally “unknown” to the CALIPSO algorithm to water or ice clouds on the basis of their optical and physical characteristics. This capability improves the completeness and usability of the CALIPSO dataset.

Taken together, the existing analyses provide an atmospheric interpretation of both the original “unknown” subset and the major relabeling pathways. The cloud-top-height consistency analysis (Figure 6) shows that the reclassified ice and water clouds retain physically distinct vertical characteristics, while the conversion statistics (Figure 8c,d) indicate that the dominant label transitions are not arbitrary but are consistent with plausible thermodynamic and phase-transition conditions. In addition, the global spatial distributions before and after reclassification (Figure 9) suggest that the original “unknown” samples are concentrated in observationally and physically challenging regions, and that the reclassified cloud types remain consistent with known climatological patterns. Therefore, although these analyses do not establish an independent physical truth for ambiguous samples, they support the physical plausibility and atmospheric interpretability of the refined labels.

4. Discussion

This study applied an MLP neural network to generate refined cloud-type information for CALIPSO low-confidence cloud-layer observations and achieved an overall agreement of ~94.99% with the high-confidence subset. The use of 11 geolocation-, optical-, and microphysics-related features provided the model with a sufficiently diverse parameter space to characterize the main CALIPSO cloud categories. The progressive reduction in hidden-layer size (256 → 128 → 64 neurons) facilitated hierarchical feature abstraction while maintaining manageable model complexity.
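To give a rough sense of what "manageable model complexity" means for this layer layout, the sketch below counts trainable parameters for a plain fully connected stack. It assumes 11 inputs, hidden layers of 256, 128, and 64 units, 3 output classes, and one bias per unit; any additional parameterized layers in the actual model (for example, normalization layers) are not known from the text and are not counted.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# ASSUMPTION: 11 input features -> 256 -> 128 -> 64 -> 3 output classes.
layer_sizes = [11, 256, 128, 64, 3]

per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = dense_param_count(layer_sizes)

print(per_layer)  # [3072, 32896, 8256, 195]
print(total)      # 44419
```

At roughly 44 thousand parameters under these assumptions, the network is small by current standards, which is consistent with the short training time reported in Table 2.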

Despite employing strategies to address data imbalance, such as SMOTE oversampling and weighted loss, the performance of the rare oriented ice crystal class remains relatively limited compared to the two major cloud types (precision ~42.6%, F1 ~0.57). This indicates that, within the current framework, the issue of class imbalance is merely mitigated rather than fully resolved. At the same time, this result reflects the inherent difficulty in distinguishing rare cloud states, as these states share similar optical characteristics with more common clouds. When applied to the low-confidence subset, the model assigned most previously “unknown” samples to water clouds (63.7%) or ice clouds (33.5%), consistent with typical summer cloud-phase distributions. Likewise, approximately half of the low-confidence oriented ice crystal samples were mapped to regular ice clouds and about 22% to water clouds, a pattern aligned with their known tendency to occur within transitional or mixed-phase conditions. Spatial patterns of the refined results also agreed with established climatological tendencies—for example, water clouds dominating warm regions and ice clouds being more prevalent at high latitudes.
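The quoted precision and F1 for the oriented ice crystal class jointly imply a recall value, because F1 is the harmonic mean of precision and recall. Solving F1 = 2PR / (P + R) for R gives R = F1 · P / (2P − F1). The sketch below applies this to the rounded values in the text; the result is only approximate, and the recall shown in Figure 4 takes precedence over it.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall: F1 must be smaller than 2 * precision")
    return f1 * precision / denom


# Rounded values quoted in the text for the oriented ice crystal class.
precision = 0.426
f1 = 0.57

recall = implied_recall(precision, f1)
print(f"implied recall: {recall:.3f}")
print(f"round-trip F1:  {f1_score(precision, recall):.3f}")
```

A recall well above the precision would mean that the model finds most true oriented ice crystal samples but also labels many other samples as oriented ice, a pattern that oversampling and class-weighted losses commonly produce for rare classes.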

Looking forward, several directions may further strengthen this line of research. (1) Enhanced imbalance mitigation: Future efforts may incorporate more advanced techniques, such as generative-model-based sample augmentation or meta-learning frameworks, to better represent rare cloud types in the training set. (2) External observational comparison: Future work will consider comparison with additional observational products as an auxiliary reference for the refined cloud-type results. (3) Unknown-sample characterization: Future work should further investigate the atmospheric and observational conditions associated with the original “unknown” labels, including their distributions in optical, geometric, and polarization-related variables. Such analysis would help clarify whether these samples are concentrated in specific phase-ambiguous, transitional, or otherwise challenging cloud regimes. (4) Model interpretability analysis: Applying post hoc interpretation tools, such as SHAP or other feature-attribution methods, may provide further insight into how depolarization-related, optical, geometric, and structural variables jointly contribute to the probabilistic refinement of low-confidence cloud-type labels. Such analyses could also help identify the key variables associated with major relabeling pathways, particularly for samples that change between ice and water cloud categories, thereby improving the physical interpretability and transparency of the framework.

Overall, the proposed framework offers a practical means of improving the completeness and usability of CALIPSO cloud-type products by providing probabilistic cloud-type information for observations flagged as low confidence.
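As a minimal sketch of what probabilistic cloud-type information can look like for a single observation, the example below converts three raw network outputs into class probabilities with a softmax and reports the most likely class. The softmax output layer, the class order, the example logits, and the `min_confidence` threshold are all assumptions for illustration; the threshold in particular is not part of the described framework.

```python
import math

# ASSUMPTION: output order of the three classes.
CLASSES = ("ice", "water", "oriented_ice")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probabilistic_label(logits, min_confidence=0.5):
    """Return (label, {class: probability}); label is 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best] if probs[best] >= min_confidence else "uncertain"
    return label, dict(zip(CLASSES, probs))


# Hypothetical logits for one low-confidence cloud layer.
label, probs = probabilistic_label([0.2, 2.1, -1.0])
print(label, {name: round(p, 3) for name, p in probs.items()})
```

Keeping the full probability vector alongside the label lets downstream users apply their own confidence cutoffs instead of relying on a single hard assignment.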

5. Conclusions

This study presents a multilayer perceptron (MLP)-based framework for generating refined cloud-type information for CALIPSO low-confidence cloud-layer observations. The approach achieved an overall agreement of approximately 95% with the high-confidence subset and provided cloud-type assignments for the samples originally labeled as “unknown” (~63.7% of the low-confidence subset), thereby improving the completeness of the CALIPSO V4.10 5 km cloud-type product. The refined outputs offer more usable cloud-type information for downstream climatological and modeling applications. Our analysis of the reclassification outcomes, including statistical performance and physical plausibility, provides insight into cloud classification and offers a methodological reference for improving the quality of satellite lidar remote sensing datasets. In future work, we will explore more effective class-balancing strategies and incorporate physical constraints into the model to further enhance performance. Overall, this study contributes to remote sensing data processing methods and provides more complete data support for research on the climatic effects of clouds.

Author Contributions

Conceptualization, M.Z.; Methodology, W.S.; Software, X.L.; Validation, W.S. and S.Y.; Formal analysis, W.S.; Investigation, W.S.; Resources, M.Z.; Data curation, M.Z.; Writing—original draft, X.L. and W.S.; Writing—review & editing, M.Z. and G.H.; Visualization, W.S.; Supervision, M.Z.; Project administration, M.Z.; Funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Program for Science & Technology Innovation Talents in Universities of Henan Province of China: 24HASTIT018; Program of Undergraduate Universities Young Backbone Teacher Training of Henan Province of China: 2024GGJS104.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ramanathan, V.; Cess, R.D.; Harrison, E.F.; Minnis, P.; Barkstrom, B.R.; Ahmad, E.; Hartmann, D. Cloud-radiative forcing and climate: Results from the Earth Radiation Budget Experiment. Science 1989, 243, 57–63.
2. Zhang, Y.; Han, G.; Huang, Y.; Wang, H.; Zhang, H.; Pei, Z.; Pu, Y.; Luo, H.; Yi, J.; Gong, W. Attributing GHG emissions to individual facilities using multi-temporal hyperspectral images: Methodology and applications. ISPRS J. Photogramm. Remote Sens. 2026, 232, 937–956.
3. Rossow, W.B.; Schiffer, R.A. Advances in understanding clouds from ISCCP. Bull. Am. Meteorol. Soc. 1999, 80, 2261–2287.
4. Platnick, S.; King, M.D.; Ackerman, S.A.; Menzel, W.P.; Baum, B.A.; Riédi, J.C.; Frey, R.A. The MODIS cloud products: Algorithms and examples from Terra. IEEE Trans. Geosci. Remote Sens. 2003, 41, 459–473.
5. Winker, D.; Chepfer, H.; Noel, V.; Cai, X. Observational Constraints on Cloud Feedbacks: The Role of Active Satellite Sensors. Surv. Geophys. 2017, 38, 1483–1508.
6. Chen, T.; Rossow, W.B.; Zhang, Y.-C. Radiative Effects of Cloud-Type Variations. J. Clim. 2000, 13, 264–286.
7. Winker, D.M.; Pelon, J.; McCormick, M.P. The CALIPSO mission: Spaceborne lidar for observation of aerosols and clouds. In Lidar Remote Sensing for Industry and Environment Monitoring III; SPIE: Bellingham, WA, USA, 2003.
8. Winker, D.M.; Pelon, J.; Coakley, J.A., Jr.; Ackerman, S.A.; Charlson, R.J.; Colarco, P.R.; Flamant, P.; Hoff, R.M.; Kittaka, C. The CALIPSO Mission: A Global 3D View of Aerosols and Clouds. Bull. Am. Meteorol. Soc. 2010, 91, 1211–1229.
9. Winker, D.M.; Vaughan, M.A.; Omar, A.; Hu, Y.; Powell, K.A.; Liu, Z.; Hunt, W.H.; Young, S.A. Overview of the CALIPSO Mission and CALIOP Data Processing Algorithms. J. Atmos. Ocean. Technol. 2009, 26, 2310–2323.
10. Avery, M.A.; Ryan, R.A.; Getzewich, B.J.; Vaughan, M.A.; Winker, D.M.; Hu, Y.; Garnier, A.; Pelon, J.; Verhappen, C.A. CALIOP V4 cloud thermodynamic phase assignment and the impact of near-nadir viewing angles. Atmos. Meas. Tech. 2020, 13, 4539–4563.
11. Bony, S.; Colman, R.; Kattsov, V.M.; Allan, R.P.; Bretherton, C.S.; Dufresne, J.-L.; Hall, A.; Hallegatte, S.; Holland, M.M.; Ingram, W.; et al. How well do we understand and evaluate climate change feedback processes? J. Clim. 2006, 19, 3445–3482.
12. Hartmann, D.L.; Ockert-Bell, M.E.; Michelsen, M.L. The Effect of Cloud Type on Earth’s Energy Balance: Global Analysis. J. Clim. 1992, 5, 1281–1304.
13. Cesana, G.; Storelvmo, T. Improving climate projections by understanding how cloud phase affects radiation. J. Geophys. Res. Atmos. 2017, 122, 4594–4599.
14. Noel, V.N.; Chepfer, H. A global view of horizontally oriented crystals in ice clouds from Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO). J. Geophys. Res. Atmos. 2010, 115, D00H23.
15. Haynes, J.M.; Noh, Y.-J.; Miller, S.D.; Haynes, K.D.; Ebert-Uphoff, I.; Heidinger, A. Low cloud detection in multilayer scenes using satellite imagery with machine learning methods. J. Atmos. Ocean. Technol. 2022, 39, 319–334.
16. Guo, B.; Zhang, F.; Li, W.; Zhao, Z. Cloud classification by machine learning for geostationary radiation imager. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4102814.
17. Zeng, S.; Omar, A.; Vaughan, M.; Ortiz, M.; Trepte, C.; Tackett, J.; Yagle, J.; Lucker, P.; Hu, Y.; Winker, D.; et al. Identifying aerosol subtypes from CALIPSO lidar profiles using deep machine learning. Atmosphere 2021, 12, 10.
18. Salcedo, A.; Rocadenbosch, F.; López-Martínez, C. Retrieval of planetary boundary layer height from CALIPSO satellite: A big data and machine learning approach. In Proceedings of the 2025 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Brisbane, Australia, 3–8 August 2025; IEEE: New York, NY, USA, 2026; pp. 5519–5523.
19. Brakhasi, F.; Matkan, A.; Hajeb, M.; Khoshelham, K. Atmospheric scene classification using CALIPSO spaceborne lidar measurements in the Middle East and North Africa (MENA), and India. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 721–735.
20. Han, G.; Zhang, H.; Huang, Y.; Chen, W.; Mao, H.; Zhang, X.; Ma, X.; Li, S.; Zhang, H.; Liu, J.; et al. First global XCO₂ observations from spaceborne lidar: Methodology and initial result. Remote Sens. Environ. 2025, 330, 114954.
21. Han, G.; Huang, Y.; Shi, T.; Zhang, H.; Li, S.; Zhang, H.; Chen, W.; Liu, J.; Gong, W. Quantifying CO₂ emissions of power plants with Aerosols and Carbon Dioxide Lidar onboard DQ-1. Remote Sens. Environ. 2024, 313, 114368.
22. Yost, C.R.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, W.L. CERES MODIS Cloud Product Retrievals for Edition 4—Part II: Comparisons to CloudSat and CALIPSO. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3695–3724.
23. Li, Z.; Shen, H.; Weng, Q.; Zhang, Y.; Dou, P.; Zhang, L. Cloud and cloud shadow detection for optical satellite imagery: Features, algorithms, validation, and prospects. ISPRS J. Photogramm. Remote Sens. 2022, 188, 89–108.
24. Tan, Z.; Wang, X.; Liu, Y.; Li, J. Assessing Overlapping Cloud Top Heights: An Extrapolation Method and Its Performance. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4107811.
25. Chen, P.; Ren, Y.; Zhang, B.; Zhao, Y. Class Imbalance in the Automatic Interpretation of Remote Sensing Images: A Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 9483–9508.
26. Westbrook, C.D.; Illingworth, A.J.; O’Connor, E.J.; Hogan, R.J. Doppler lidar measurements of oriented planar ice crystals falling from supercooled and glaciated layer clouds. Q. J. R. Meteorol. Soc. 2010, 136, 260–276.
27. Wang, Q.; Chen, L.; Wang, F.; Zhang, X. Obtaining Cloud Base Height and Phase from Thermal Infrared Radiometry Using a Deep Learning Algorithm. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4105914.
28. Ou, S.S.C.; Kahn, B.H.; Liou, K.-N.; Takano, Y.; Schreier, M.M.; Yue, Q. Retrieval of Cirrus Cloud Properties from the Atmospheric Infrared Sounder: The K-Coefficient Approach Using Cloud-Cleared Radiances as Input. IEEE Trans. Geosci. Remote Sens. 2013, 51, 1010–1024.
29. Minnis, P.; Yost, C.R.; Sun-Mack, S.; Chen, Y. CERES MODIS Cloud Product Retrievals for Edition 4—Part I: Algorithm Changes. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2744–2780.

Figure 1. Structure of the MLP neural network model.

Figure 2. Performance–cost trade-off of the 23 hyperparameter-search trials. The vertical axis shows validation Macro-F1, and the horizontal axis shows training time.

Figure 3. Confusion matrix for test results (Class 1: ice clouds; Class 2: water clouds; Class 3: oriented ice crystals).

Figure 4. ROC curves and performance metrics for each class (Orange bars: precision; Blue: recall; Green: F1-score for Class 1: ice, Class 2: water, Class 3: oriented ice).

Figure 5. Predicted probability distributions for each class: (a) ice clouds, (b) water clouds, and (c) oriented ice crystals. Blue indicates correct classifications, and red indicates misclassifications.

Figure 6. Physical consistency assessment of MLP-reclassified cloud types based on cloud-top height characteristics.

Figure 7. Hierarchical representation of CALIPSO cloud data quality in the summer daytime dataset. High-confidence, low-confidence, and excluded/non-selected samples represent the overall dataset partition, whereas the HQ subcategories indicate the internal composition of the high-confidence subset only.

Figure 8. (a) Original low-confidence cloud classification (Class 1: ice; 2: water; 3: oriented ice; 4: unknown); (b) after MLP reclassification; (c,d) detailed conversion of original labels to new labels.

Figure 9. Global spatial distribution of CALIPSO low-confidence cloud types before (a–d) and after (e,f) MLP reclassification (sample rate 1%).

Table 1. Input features and expected output.

Category	Variable	Description
Input	Longitude	Geographical longitude of CALIPSO footprint
Input	Latitude	Geographical latitude of CALIPSO footprint
Input	AOD	Aerosol optical depth (AOD)
Input	Flux_AOD	Flux-based aerosol optical depth
Input	Cloud_Top_Height	Cloud top height (km)
Input	Cloud_Base_Height	Cloud base height (km)
Input	Cloud_Thickness	Cloud thickness (top minus base, km)
Input	Number_of_Layers	Number of detected cloud layers
Input	AOD_Ratio	Relative contribution of AOD (ratio)
Input	Depolarization_Ratio	Lidar depolarization ratio (microphysical indicator)
Input	Color_Ratio	Lidar color ratio (particle size proxy)
Output	Cloud_Type	Classification target: {Ice cloud, Water cloud, Oriented ice crystals}

Table 2. Comparison of the selected MLP, Random Forest, and SVM on the same high-confidence subset.

Model	MLP	SVM	RF
Validation Accuracy	0.974478	0.977850	0.976817
Validation Macro-F1	0.939500	0.939625	0.940600
Validation Weighted-F1	0.974862	0.977672	0.977463
Test Accuracy	0.974583	0.977786	0.977259
Test Macro-F1	0.939689	0.940034	0.942711
Test Weighted-F1	0.975024	0.977565	0.977885
Training Time (s)	18.75	33.94	85.53
