Multivariable Characterization of Atmospheric Environment with Data Collected in Flight

: The In-Cloud Icing and Large-drop Experiment (ICICLE) ﬂight campaign, led by the United States Federal Aviation Administration, was conducted in the geographical region over US Midwest and Western Great Lakes, between January and March 2019, with the aim to collect atmospheric data and study the aircraft icing hazard. Measurements were taken onboard the National Research Council of Canada (NRC) Convair-580 aircraft, which was equipped with more than 40 in situ probes, sensors, and remote sensing instruments in collaboration with Environment and Climate Change Canada (ECCC). In each ﬂight, aerosol, cloud microphysics, atmospheric and aircraft state data were collected. Atmospheric environment characterization is critical both for cloud studies and for operational decision making in ﬂight. In this study, we use the advantage of multiple input parameters collected in-ﬂight together with machine learning and clustering techniques to characterize the ﬂight environment. Eleven parameters were evaluated for the classiﬁcation of the sampled environment along the ﬂight path. Namely, aerosol concentration, temperature, hydrometeor concentration, hydrometeor size, liquid water content, total water content, ice accretion rate, and radar parameters in the vicinity of the aircraft. In the analysis of selected ﬂights, we were able to identify periods of supercooled liquid clouds, glaciated clouds, two types of mixed-phase clouds, and clear air conditions. This approach offers an alternative characterization of cloud boundaries and a complementary identiﬁcation of ﬂight periods with hazardous icing conditions.


Introduction
Regulators, air traffic controllers, and pilots use atmospheric data for airborne operations. In the decision-making process, the interpretation of the incoming data may vary based on individuals' experience [1]. New weather forecast and operational tools aim to overcome human factor biases and improve characterization of the hazardous flight environment via multiplatform analyses, i.e., airborne, spaceborne, ground, and models [2][3][4][5]. Such characterization of the atmospheric environment would highly depend on the selection of tools and the final application. Common cases of interpretation biases include that of clear-air or out-of-cloud environment definition and that of mixed phase clouds, where the ratios of sizes and numbers of hydrometeors play a key role. Meanwhile, atmospheric scientists distinguish between different types of clouds by the dominant physical processes involved in cloud evolution. All identified categories are often inconsistent due to the use of different approaches, e.g., selection of observations location, resolution and sensitivity. For example, Oktem and Romps [6] have used ground stereo cameras in a multi-year 2.1.1. Aircraft Instruments The exhaustive list of instruments integrated into NRC's Convair-580 aircraft for the flight campaign is described in detail in [24]. In Table 1, we list the source instruments whose data was used as selected input features in the characterization approaches. The Table includes both the probes for in situ measurements and remote-sensing instruments.

Nevzorov
The Nevzorov sensor has constant-temperature hot-wire elements, which measure LWC and TWC in clouds. These elements differ in shape of their collectors: the LWC detector is mostly sensitive to liquid droplets and discriminates them from ice particles that instantly shatter when collide with the convex surface of the sensor, while the conical hollow collector of the TWC enables to catch both ice and liquid particles. The operational  [24]. Top row: NRC's radar system (W-and X-band) and its axial pointing options and underwing cloud probes. Middle row: Aerosol isokinetic sampling inlet and bulk water detectors installed on the fuselage. Bottom row: Underwing cloud and atmosphericstate probes. Bottom middle: Cabin configuration with operator stations.

Aircraft Instruments
The exhaustive list of instruments integrated into NRC's Convair-580 aircraft for the flight campaign is described in detail in [24]. In Table 1, we list the source instruments whose data was used as selected input features in the characterization approaches. The Table includes both the probes for in situ measurements and remote-sensing instruments. Remote sensing NAW Figure 1. Illustration of the NRC Convair-580 aircraft instrumented for the ICICLE campaign in 2019, reproduced with permission from [24]. Top row: NRC's radar system (W-and X-band) and its axial pointing options and underwing cloud probes. Middle row: Aerosol isokinetic sampling inlet and bulk water detectors installed on the fuselage. Bottom row: Underwing cloud and atmospheric-state probes. Bottom middle: Cabin configuration with operator stations.

Rosemount Icing Detector
A Rosemount Icing Detector (RID) is mounted on an oscillating rod, which changes its natural vibration frequency as impinging ice droplets accrete and freeze on the sensor. The decrease in frequency is proportional to the increase in the accumulated mass and induces a voltage gain digitized to 1 mV precision. If the accreted ice exceeds a voltage threshold, the sensor's tip is heated for a fixed time interval to remove the ice. Previous studies have shown that the instrument has insignificant response to ice crystals; hence, the RID can not only provide independent supercooled liquid water (SLW) detection, but also can be useful in segregating liquid and glaciated conditions. The detailed principle of RID operation and limitations are described in [28][29][30].

OAP and Scattering Probes
In this study, we used composite Particle Size Distributions (PSD) covering the full size-range of the hydrometeors (2 µm to 2 cm). Normally, these PSDs are composed of data from one scattering probe (e.g., CDP, FCDP, FSSP) one mid-size range Optical Array Probe (OAP), typically 2D-S, and one precipitation-size range OAP, typically HVPS-3, similar to PSDs presented in Nguyen et al. [31]. A detailed description of these common commercial probes can be found in Baumgardner et al. [32] and references therein. For images, a particle area equivalent diameter was used for sizing of particles that are captured fully within the detection array.

Rosemount Temperature Sensor
In this study, we use one of the 5 Rosemount Temperature sensors [33] installed on the aircraft. The accuracy of the static temperature measured by this sensor is known to be ±0.1 • C [33]. However, due to known sampling biases and spread, including the location near the fuselage, it is estimated to be ±1 • C.

UHSAS
The Ultra-High Sensitivity Aerosol Spectrometer [34] was integrated in a cabin rack and connected to a sampling isokinetic inlet located on top of the fuselage. The instrument recorded 1 Hz PSD data in the size range of 60 to 1000 nm. For our analysis, we used the total number concentration of aerosol counted by the instrument.

NAW
For remote sensing, we used the NRC's Airborne W-band radar (NAW), which typically has three pointing direction options [31,35]. In this study, we used data from the side (horizontal) sensing direction. The most informative parameters were found to be Radar Reflectivity (Z), Differential Reflectivity (ZDR), and Linear Depolarization Ratio (LDR) ( Table 1). We used the remote sensing data within the vicinity of the aircraft, at a distance of 250 m. This distance was selected to ensure the radar data are not affected by the close-range biases [31] while remaining within the same flight environment as experienced by the in situ probes (for more details, see Section 2.3.2). The assumption here is uniformity of cloud microstructure on a spatial scale of 250 m. This assumption works quite well for homogeneous cases. However, it does not work in the case of an inhomogeneous cloudy environment within a 250 m range. Such issues may occur on cloud edges; however, since here we use the side antenna which is leveled on the plane of the in situ instruments, the probability of lateral inhomogeneity is much lower.
Additional instruments, not listed here, were used for quality assessment of the characterization outcomes. Some of those include but not limited to Airborne Elastic Cloud Lidar (AECL, [36]), Cloud Particle Imager (CPI, [37]) and other OAP images, as well as GOES satellite channels and other data, which are available in the ICICLE project repository (see Data Availability Statement section).

ICICLE Flight Campaign
The data used in this study was collected during the In-Cloud Icing and Large-drop Experiment (ICICLE) flight campaign, led by the United States Federal Aviation Administration (FAA). It was conducted between January and March 2019, over the US Midwest and Western Great Lakes. An essential aim of this campaign was to study microphysical cloud properties in order to better understand how hazardous icing conditions (particularly freezing rain, freezing drizzle, and ice pellets) form and evolve over time [20,24]. In these flights, the observed cloud composition (i.e., PSD and phase) varied between ice particles of various sizes and shapes, freezing drizzle, and freezing rain, and supercooled cloud droplets. Other properties, such as ice accretion, LWC, and TWC, were also collected to characterize bulk cloud microphysical properties. Here, a subset of data is used for the selected two flight test cases with a rich variety of conditions. The following subsections and Figures 2-4 provide an overview of those flight environments. Experiment (ICICLE) flight campaign, led by the United States Federal Aviation Administration (FAA). It was conducted between January and March 2019, over the US Midwest and Western Great Lakes. An essential aim of this campaign was to study microphysical cloud properties in order to better understand how hazardous icing conditions (particularly freezing rain, freezing drizzle, and ice pellets) form and evolve over time [20,24]. In these flights, the observed cloud composition (i.e., PSD and phase) varied between ice particles of various sizes and shapes, freezing drizzle, and freezing rain, and supercooled cloud droplets. Other properties, such as ice accretion, LWC, and TWC, were also collected to characterize bulk cloud microphysical properties. Here, a subset of data is used for the selected two flight test cases with a rich variety of conditions. The following subsections and Figures 2-4 provide an overview of those flight environments.        On the morning of 23 February 2019, a strong area of low pressure over the Texas Panhandle area was lifting north-eastward into Kansas, encroaching into the western side of an area of high pressure centered over New England. The advancement of the precipitation into southern Wisconsin led to the development of a deep layer dominated by ice crystals falling into a melting layer, then subsequently, into the near-surface sub-freezing layer to form classical freezing rain (FZRA) there during the morning hours [38]. These environments and the classical FZRA mechanism were the focus of this day's mission. On the morning of 23 February 2019, a strong area of low pressure over the Texas Panhandle area was lifting north-eastward into Kansas, encroaching into the western side of an area of high pressure centered over New England. The advancement of the precipitation into southern Wisconsin led to the development of a deep layer dominated by ice crystals falling into a melting layer, then subsequently, into the near-surface sub-freezing layer to form classical freezing rain (FZRA) there during the morning hours [38]. These environments and the classical FZRA mechanism were the focus of this day's mission. Vertical profiles were made at Janesville (KJVL) and Madison (KMSN), including missed approaches to document the environment essentially down to the surface. In addition, level flight legs were flown at select altitudes within the lower (3000-4000 ft) and upper sub-freezing layers (10,000 and 12,000 ft) between the two sites and in the vicinity. Later, the aircraft captured data at the higher altitudes to the west, near the Wisconsin-Iowa border, as well as during the return to the operations base at Rockford, IL, USA (KRFD; Figures 2-4).

Flight 28, 5 March 2019
On the morning of 5 March 2019, an area of low pressure was present across the northern Great Lakes, with a developing cold front moving rapidly from Wisconsin into Michigan. West-northwesterly flow was present over this region, bringing cold temperatures and cold air advection across Lake Michigan. The mission was focused on the boundary layer; lake effect clouds over and immediately downwind of Lake Michigan, which had pockets of surface water temperatures on the order of 5 • C. The cellular stratocumulus lake-effect clouds had rather cold (−22 to −24 • C), liquid tops, as was evidenced by signals in the visible, long-and short-wave infrared satellite channels. There was a rapid conversion from liquid to snow from the cloud tops downward and cellular streaks of snow were present in reflectivity from the Grand Rapids, Michigan NEXRAD (not shown). The layer of higher, weaker, similarly cold, mixed-phase clouds was also evident over the southeastern part of Lake Michigan in satellite imagery ( Figure 3).
The Convair-580 departed KRFD at~11:54 UTC, climbed to~8000 ft, then flew across the southern part of the lake, briefly sampling the upper cloud layer near 9500 ft, then descending through the boundary-layer stratocumulus to perform a missed approach at the South Haven airport at~12:58 UTC. After performing a deep vertical profile while flying to the north along the eastern lake shore, the aircraft performed a missed approach at Muskegon at~13:34 UTC. This was followed by a series of north-south and east-west runs across the eastern half of the lake and just inland, while cycling altitudes through the stratocumulus deck. Upon reaching the western-most point where sampling was possible, the aircraft climbed into clear air above the stratocumulus deck and returned to the base at KRFD, landing at~16:28 UTC (Figures 2-4).

Classification Approaches
In this study, we composed a list of selected variables, derived from in situ singleparticle and bulk, as well as remote sensing measurements. Each of these variables on its own is not sufficient for characterization without the others; aerosol information is useful to cluster periods of clear-air and cloud-formation events; microphysical information makes use of hydrometeor features to characterize clouds but provides limited information about the physical processes. In many applications, such as remote sensing, knowledge of the microphysics is essential. One key parameter in remote sensing interpretation is the hydrometeor size distribution [31]. Here, we present clustering and other methods when the end goal is classification of the environment. For simplification of the discussion and the comparisons, we use the terms clustering and classification interchangeably. However, the clustering method becomes classification only after the subsequent step of manual attribution of class labels to the identified clusters.

Clustering Methods
Generally, clustering methods can be divided into two main groups: hierarchical and non-hierarchical clustering. Hierarchical clustering algorithms group data points and provide a natural graphical representation of data. The graphical representation resulting from hierarchical clustering is a dendrogram representing the nested grouping of data points and the similarity levels at which groupings change [39]. Non-hierarchical (also known as partitional) clustering methods yield a single partition of the data instead of a clustering structure. They perform the partition of data points into K clusters so that data in the same cluster are more similar to each other than those in different clusters. Partitional clustering methods can be categorized as either hard or fuzzy [22,40,41]. Hard (also known as crisp) clustering assigns each data point to a single cluster, while fuzzy clustering methods assign degrees of membership in several clusters to each data point. A fuzzy clustering can be transformed to a hard clustering by assigning each point to the cluster with the dominant degree of membership [22]. In the following subsections, we will present briefly the clustering methods used in our study.

K-Means
K-means clustering [42] is an iterative data-partitioning algorithm which assigns input data points to one of K clusters, where K is defined before the initialization of the algorithm. In this study, we use the k-means clustering function from MATLAB. The cluster centroid initialization is performed according to the K-means++ algorithm [43]. It is known that the k-means output can strongly depend on the initialization [44]. Various initial cluster centroid positions can eventually result in different k-means outcomes. To avoid such instability, we set the 'Replicates' attribute of the MATLAB (version R2022a) function to 100, repeating the clustering algorithm a hundred times, using new initial cluster centroid positions at each iteration.
K-means clustering serves as a primary tool for the multivariable characterization of cloud phase in this study. However, one must keep in mind that the k-means is an unsupervised learning algorithm that groups the input data into categories without labeling them. Instead of determining the cluster labels, k-means algorithm can detect internal structure in data and partition observations according to identified patterns. Unsupervised learning nature of this method has the advantages of simplicity and applicability to large datasets; however, the k-means algorithm does not come without drawbacks. The apparent shortcoming is the predetermined number of clusters K, the choice of which is generally not explicit. Furthermore, each entity is assigned to one cluster, which assumes well-defined class boundaries. Unfortunately, this is not the case in many real-life applications in which the boundaries between classes may overlap and the entities may belong to more than one cluster.
The optimal number of clusters in this method was established using the "elbow" location [45], when the algorithm is run several times with a range of K values (e.g., from 1 to 15) and for each K value the cost function is calculated. The inflection point of the resulting curve indicates the optimal number of clusters. It was found that the K value for two selected flights is 5. As mentioned above, the biggest drawback is the lack of labeled outputs. Therefore, a subsequent analysis of the clustering results is required to label the clusters. In consideration of this limitation, we implemented a manual inspection of the input parameters with individually assigned clusters and characterized the clusters according to their physical significance, followed by validation using complementary data.

Fuzzy Clustering
Fuzzy clustering methods have been proposed to address the limitations of traditional clustering methods. Particularly, fuzzy clustering methods have the advantage of being able to deal with the complex relationships between samples, which can help in reducing data noise. Fuzzy C-Means (FCM) is one of the most popular and frequently used fuzzy clustering methods [41]. The FCM method alternately optimizes membership degrees and centroids until the best clusters are found. In the FCM method, the fuzzy assignment of samples to each cluster is essentially based on the relative distance between one sample and all cluster centroids. The FCM method is a heuristic that aims to minimize the objective function by optimizing the membership degree and the centroids [41]. The FCM method has been proven to provide better solutions in many machine learning and pattern recognition applications in comparison to hard clustering methods, such as k-means. Furthermore, the Fuzzy J-means (FJM) method introduced by Belacel et al. [22] uses all possible centroids-topattern re-locations in order to construct move-defined neighborhoods. The membership values and centroids are calculated in the same way as in the FCM algorithm. Comparative results previously published indicate that the FJM method outperforms the FCM method for both simulated and real data sets [22,46].
To avoid the limitation of the predetermined clustering number, we have tried two approaches: The first one is based on empirical experiments, as in the case of k-means. The second approach applies the clustering based on hybrid metaheuristic SA-IAFSA algorithm, to obtain the number of clusters instead of defining it a priori.

Decision Tree Dendrogram Classification Method
In this paper, we validate the k-means algorithm outcomes using complementary data, e.g., AECL, NAX radar, and imagery observations and compare it to other classification approaches. One of these classification approaches was developed to characterize the flight conditions during the ICICLE campaign using aircraft in situ measurements with 1 s temporal resolution [20]. This method applies a set of conditional statements on water content and temperature data to partition observations into five primary categories: "liquid T < 0 • C", "liquid T > 0 • C", "mixed phase", "glaciated", and "clear air". As such, classification is based on a flowchart, we will further refer to it by its common name, a Decision Tree (DT) classification [47,48]. The DT methodology in ICICLE is depicted in Figure 5. LWC and IWC data for the DT classification are extracted from Nevzorov sensors and the temperature is taken from static temperature measurements of one of the aircraft Rosemount temperature sensors. The 'liquid T > 0 • C' category may generally contain a portion of mixed phase clouds with melting ice. Here, only flight F20 had such flight water content and temperature data to partition observations into five primary categories: "liquid T < 0 °C", "liquid T > 0 °C", "mixed phase", "glaciated", and "clear air". As such, classification is based on a flowchart, we will further refer to it by its common name, a Decision Tree (DT) classification [47,48]. The DT methodology in ICICLE is depicted in Figure 5. LWC and IWC data for the DT classification are extracted from Nevzorov sensors and the temperature is taken from static temperature measurements of one of the aircraft Rosemount temperature sensors. The 'liquid T > 0 °C' category may generally contain a portion of mixed phase clouds with melting ice. Here, only flight F20 had such flight segments of positive temperatures ( Figure 4e); therefore we can expect a fraction of this category in DT to include mixed phase. Figure 5. Decision tree illustrating the methodology of the classification based on conditional statements on data from bulk water content and temperature sensors [20]. The full decision tree includes further sub-classification using particle diameters.  [20]. The full decision tree includes further sub-classification using particle diameters.

Data Preparation
Selecting and preparing the input parameters for multivariable characterization of atmospheric environment is a challenging task. In recent literature, a wide variety of in situ and remote sensing measurements have been used for similar classification objectives (see Section 1). There are numerous studies on input feature set selection in machine learning problems (e.g., [49][50][51]). Unfortunately, there is no universal procedure for input parameters selection that could be applied to most classification problems. Here, we use the assumption that coincident and collocated data from both in situ and remote-sensing instruments in the vicinity of the aircraft, which makes the atmospheric environment analysis, along the flight path, more coherent [31]. In addition, aerosol was included in the analysis as it is often involved in cloud processes e.g., as condensation and ice nuclei [52] while the static temperature plays a major role in cloud formation and phase determination, and it is often used in IWC parameterization (e.g., [53,54]).
Atmospheric data collected in flight with complex instrumentation, normally, has to pass numerous steps of data processing and corrections to reduce bias in the analysis. For example, OAP probes have corrections for out-of-focus particles, inter-arrival time correction for shattered particles, numerous options of sizing projected images of particles (e.g., [55,56]). Bulk probe data corrections include but not limited to reference-signal corrections, dead-time correction during deicing, collection efficiency corrections, latent heat release corrections [32,57]. Temperature can be corrected with a combination of readings from multiple sources. Airborne aerosol corrections include sampling efficiency, refractive index corrections, sizing corrections. Radar signal can include pointing vector correction, ground signal clutter correction, close-range bias correction, noise subtraction.
Here, we show that with minimal to no data treatment, it is possible to achieve a valid approximation of the flight conditions. The steps of data preparation we have taken are listed below.
First, data from all the different probes and instruments were temporally aligned with matching temporal resolution of 1 s in the overlapping flight segments. The variables were then tested for interdependence and those with high interdependence across numerous flights were removed from the input set to enhance variable complementarity and eliminate redundant features which presumably do not provide much additional information. Next, we selected the optimal input parameters, which provide the most valuable information to determine the phase of the detected clouds. From the atmospheric physics perspective, most commonly measured parameters include size, shape, and concentration of hydrometeors; bulk water content; atmospheric state; aerosol; and radar measurements. Hence, for clustering objective, bulk parameters such as LWC and TWC from Nevzorov sensors, static temperature, FSSP and OAP droplet concentration, mean volume diameter (MVD), W-band radar reflectivity, LDR and ZDR, concentration of aerosols, and RID voltage signal were taken as inputs (Table 1). A correlation matrix of the selected input variables is depicted in Figure 6. Red color indicates higher correlation, while blue color indicates higher anti-correlation. The hydrometeor number concentrations from a scattering probe and OAP ( Figure  4c,d) are merged into a composite concentration, ranging from 2 µm to 2 cm in particle diameter. Concentration of the full-size spectrum were used for the derivation of the mean volume diameter (Equation (1)), where Ni and Di are the number concentration and middle point of the i-th size bin, re- The next step in data preparation is to filter the input set, which can have biased values remaining as a result of instrument noise, icing, or malfunctions. For example, values of LWC and TWC from Nevzorov detectors were removed if LWC was larger than TWC or one of the variables was less than the sensitivity of the Nevzorov sensor. Such unphysical values of data may occur because LWC and TWC are measured independently by different detectors, therefore their values may not always obey the physics of LWC ≤ TWC, even if properly calibrated. LWC and TWC are calculated by subtracting measured power and a baseline corresponding to clear air power [14], when the latter might have some error that in turn could lead to negative values of LWC or TWC.
The hydrometeor number concentrations from a scattering probe and OAP (Figure 4c,d) are merged into a composite concentration, ranging from 2 µm to 2 cm in particle diameter. Concentration of the full-size spectrum were used for the derivation of the mean volume diameter (Equation (1)), where N i and D i are the number concentration and middle point of the i-th size bin, respectively. The size distribution of the hydrometeors has a direct impact on the W-band radar variables, recorded using the side-pointing antenna (see Section 2.1.1). In order to avoid unreliable and noisy signal, all three radar variables (i.e., Z, ZDR, LDR) were masked using a SNR threshold of 0 dB. Differential reflectivity exhibited close range biases within 1-km distance from the aircraft (not shown) and was calibrated using flight segments in small liquid drops, where ZDR is approximately 0 dB.
Further steps in data preparation addressed the issue of missing values. Keeping missing input values (e.g., NaN, Inf) induces large voids in each input parameters dataset, which will remain unclassified, a scenario that would lead to a larger data void in the clustering product because of the overlap of multiple input datasets. A common practice is to fill the gaps in observations with the mean of the non-missing values or some predefined constants. However, such data manipulation does not work well and in most cases will likely result in the 'cloud' class inflation during the time intervals when the instrument was legitimately not in operation, or the signal was below its sensitivity threshold and hence could not be detected. Rather than ignoring missing data in the input dataset or replacing it with mean values, we impute the missing values for each input variable, independently, according to the following procedure: LWC, TWC, aerosol concentration, MVD, and number particle concentration missing values were replaced with 0; measurement gaps from the RID were equated to the nearest non-missing value, temperature was subjected to linear interpolation of neighboring non-missing values; W-band radar reflectivity, ZDR and LDR voids were substituted with −35 dBZ, −5 dB and −40 dB, respectively.
The input variables (Table 1) are not commensurate and had to be normalized to the range 0-1 to avoid biases in the classification due to the span in orders of magnitudes between absolute values of different variables.

Results
In this section, we present two independent flight-case analyses as well as an intercomparison between several different classification approaches, such as k-means, DT, and fuzzy clustering. 2D-S imagery and lidar measurements were used for manual validation of the classification results.

Variety of Conditions in Flight
Two flights with different conditions of interest were selected for multivariable atmospheric environment characterization: flight F20 on 23 February 2019 and flight F28 on 5 March 2019 (see Section 2.2). Bernstein et al. [20] summarized the environmental conditions of interest, highlighting cases with high IWC, supercooled LWC, freezing drizzle and freezing rain, small drops, and mixed phase. Figure 7 depicts the percentages of occurrence of each of the five primary categories using the DT approach. Flight F20 is characterized by the significant IWC amount detected above the layer of freezing rain, shown as a dominant portion of glaciated conditions throughout the flight [38]. Flight F28 represents a case with a wide variety of sizes and shapes and the presence of supercooled liquid water below −20 • C. as a dominant portion of glaciated conditions throughout the flight [38]. Flight F28 represents a case with a wide variety of sizes and shapes and the presence of supercooled liquid water below −20 °C.

Flight 20
The results of our analysis based on the probability distribution functions (PDF) and assigned clusters suggest that the blue cluster represents a clear-air environment as it has low signatures in the Nevzorov sensors and missing values in radar measurements. Moreover, the absence of imagery data from the 2D-S probe between 15:33 and 15:37 UTC supports the clear-air labeling in this time interval. Concentration of larger aerosol may be detected in this period by the optical probe and the aerosol instruments. Glaciated phase (green color) is notable for the low LWC/TWC ratio, RID signal below 3 V, temperatures below 0 • C, smaller concentration and larger MVD of the particles, and ZDR shifted to positive values. Mixed phase identified here by the algorithm (purple color) is characterized by LWC/TWC in the range 0 to 0.5, temperatures below and above the freezing point, and wider range of RID and number concentration. Interestingly, in many studies the choice of an LWC/TWC phase fraction threshold is dictated by the type and accuracy of the instrumentation, such as 0.1 and 0.9 [14]. There are two purely liquid phases observed during flight F20 corresponding to the red and yellow clusters. The main difference between these two liquid conditions is seen in number concentration, temperature and RID measurements. Yellow cluster is characterized by temperatures below 0 • C and oscillating RID signal indicating supercooled liquid water freezing and accretion, whereas red cluster implies liquid above 0 • C (Figure 8). It is worth noting that some studies show that pure liquid droplets larger than 100 µm may be misclassified by Nevzorov as mixed phase [58].
during flight F20 corresponding to the red and yellow clusters. The main difference between these two liquid conditions is seen in number concentration, temperature and RID measurements. Yellow cluster is characterized by temperatures below 0 °C and oscillating RID signal indicating supercooled liquid water freezing and accretion, whereas red cluster implies liquid above 0 °C (Figure 8). It is worth noting that some studies show that pure liquid droplets larger than 100 µm may be misclassified by Nevzorov as mixed phase [58].  We have selected a couple of flight periods to examine cases with more scrutiny. Figure 9 shows the time series plots of the aircraft pressure-altitude and six input variables between 13:41 and 13:57 UTC during flight F20. The NRC Convair-580 aircraft climbed through four different cloud conditions during this time interval, from freezing rain layer (A), through a warm nose (B), into mixed phase and more ice-laden layers above (C, D). The K-means algorithm separated the supercooled liquid state into a yellow cluster, as seen in flight section 'A' in Figure 9. This separation is supported by the increasing RID reading and sub-zero temperature. Next, in period B (dominated by red cluster) we observe positive temperatures and mostly a small-droplet environment, also seen in the 2D-S imagery. In A and B, the LDR value is near the detection limit of −30 dB (not shown). An LDR reading in raining spherical drops is likely a leaked co-pol signal measured in cross-pol channel. In the subsequent transition from B to C, we see a detectable signal increase as large ice appears. The transition between mixed and glaciated environments (C and D in Figure 9) is characterized primarily by changes in RID readings, temperature, and LWC/TWC ratio while W-band radar LDR value in ice remains above the values seen in section A, B. Figure 10 depicts the PDFs of each input variable recorded in flight F28 where color coding denotes the number of clusters assigned by the k-means algorithm. The low values of liquid and total water content, hydrometeor concentration and size, as well as distinct detached peaks in the W-band radar reflectivity, ZDR, and LDR indicate that the blue cluster signifies clear-air flight segments. LWC/TWC ratios differentiate glaciated phase from liquid and mixed conditions. Lower concentration and higher MVD suggest that the purple cluster, likely, represents an ice-dominated environment. Positive radar dBZ values and mean volume diameter larger than 100 µm indicates the presence of large ice aggregates within the glaciated class. It is interesting to note that the concentration of these samples has a multimodal distribution, implying different types of ice particles. Time series of concentration and size in the purple cluster and 2D-S images confirm this assumption, showing spheroids, columnar crystals, and aggregates of ice corresponding to the time intervals of glaciated conditions (Figures 11-13). RID response greater than 4 V indicates the presence of supercooled liquid water in two of the clusters, which we refer to as "Mixed 2" and "liquid" (Figure 10). When liquid water is present, the increase in RID signal in the time series is notable (Figure 11), whereas flat or decreasing RID signal correspond to the absence of liquid water or a LWC below the sensitivity threshold of the detector at temperatures consistently below −10 • C throughout this flight. "Mixed 1" class identified is distinct from other classes for its lower values in the LWC and TWC sensors and in the RID detector and can be interpreted as mixed phase with greater portion of ice. Therefore, green and red clusters represent mixed phase clouds with different proportions of liquid and ice particles. Green cluster (Mixed 1) has also lower hydrometeor concentration compared to the red cluster (Mixed 2), indicating that the green cluster could be a transition phase between clear air and mixed phase. The K-means algorithm separated the supercooled liquid state into a yellow cluster, as seen in flight section 'A' in Figure 9. This separation is supported by the increasing RID reading and sub-zero temperature. Next, in period B (dominated by red cluster) we observe positive temperatures and mostly a small-droplet environment, also seen in the 2D-S imagery. In A and B, the LDR value is near the detection limit of −30 dB (not shown). An LDR reading in raining spherical drops is likely a leaked co-pol signal measured in crosspol channel. In the subsequent transition from B to C, we see a detectable signal increase as large ice appears. The transition between mixed and glaciated environments (C and D in Figure 9) is characterized primarily by changes in RID readings, temperature, and LWC/TWC ratio while W-band radar LDR value in ice remains above the values seen in section A, B.  identified is distinct from other classes for its lower values in the LWC and TWC sensors and in the RID detector and can be interpreted as mixed phase with greater portion of ice. Therefore, green and red clusters represent mixed phase clouds with different proportions of liquid and ice particles. Green cluster (Mixed 1) has also lower hydrometeor concentration compared to the red cluster (Mixed 2), indicating that the green cluster could be a transition phase between clear air and mixed phase. After analyzing the individual variables, identifying the five clusters assigned by the k-means algorithm using physical interpretation (Figures 10 and 11), and validating the labels with complementary information, we can now describe the overall flight conditions. Figure 12 shows the NRC Convair-580 aircraft pressure-altitude flight profile on 5 March 2019 during flight F28. The color-coded parts of the trajectory denote five clusters assigned by the k-means algorithm and further manually labeled. On ascent, the aircraft  In a deeper analysis of the first missed approach segment, we notice a layered macro structure. Figure 13 shows the aircraft pressure-altitude flight profile, its corresponding W-band reflectivity, color-coded according to the identified clusters, and complementary lidar signal and depolarization ratio plots. The lidar depolarization ratio used here as an auxiliary qualitative tool for validation, since it was not calibrated for absolute values. In section A of this flight segment, identified as liquid by the k-means algorithm, we observe mostly 10 µm droplets, depicted as single dark pixels recorded by the 2D-S probe. Lidar's low depolarization ratio is another indicator of the liquid phase observed at 3 km altitude. Next, in section B, radar reflectivity is increased, and the segment is classified mostly as "mixed 1". Lidar plots indicate a transition between layers on descent, where a larger variety of ice particles is detected by the 2D-S probe. Lidar depolarization ratio is medium to high when the aircraft enters section C of the flight, classified as "mixed 2" by the kmeans algorithm. The 2D-S probe images show 10 µm droplets together with larger 100 µm hydrometeors. At the warmer temperature, in section D (identified as "glaciated"), radar reflectivity is further increased in correlation with the 2D-S imagery of larger ice aggregates. The RID constant reading ( Figure 11) indicates limited to no additional accretion and therefore we conclude prevailing ice composition.

Flight 28
Such validation analyses show that the k-means algorithm cluster assignment corresponds well to the physical cloud structure observed with the complementary tools in flight F28. From the clustering results, we can infer some valuable information on the characteristic range of instruments' values corresponding to different cloud phases. For example, the glaciated phase in this flight has an LWC/TWC ratio below 0.25, large MVD in the range of 100-1000 µm, fairly flat (non-accreting) RID signal, and enhanced aerosol concentration about 10 3 cm −3 . As opposed to the glaciated environment, liquid conditions in flight F28 have lower MVD and radar reflectivity values, larger hydrometeor concentration and an LWC/TWC ratio greater than 0.8. Undoubtedly, future studies will need to combine observations from numerous flights with various conditions to obtain universal thresholds for each cloud category.

Classification Intercomparison
In the previous section, we focused on k-means classification however, different approaches for atmospheric classification exist for applications along the flight track (see Section 1). In this section, we compare the results from four such classification techniques using data from flight F28, namely: k-means, DT, fuzzy clustering, and image sphericity derived classification produced by ECCC and distributed with their dataset (see Data Availability Statement section). The latter method is based on particle size distributions for Liquid and Ice categories, classified using area equivalent diameter and maximal diameter for any imaged particle, a mixed phase is classified when both categories are present and clear-air is classified when none of the categories are present. In fuzzy clustering method for this flight, the optimal number of 4 clusters was determined through the SA-IAFSA algorithm (see Section 2.3) after the data preparation procedure. Figure 14a-d il- After analyzing the individual variables, identifying the five clusters assigned by the k-means algorithm using physical interpretation (Figures 10 and 11), and validating the labels with complementary information, we can now describe the overall flight conditions. Figure 12 shows the NRC Convair-580 aircraft pressure-altitude flight profile on 5 March 2019 during flight F28. The color-coded parts of the trajectory denote five clusters assigned by the k-means algorithm and further manually labeled. On ascent, the aircraft passed a clear air segment of the flight and made transition into a mixed phase, at 12:22 UTC. It entered the narrow interval of liquid prevalent conditions at cloud top at 12:33 UTC, followed by a descent into a mixed phase environment and a missed-approach maneuver in the glaciated cloud phase at 12:41 UTC. The NRC Convair-580 then ascended from the glaciated to mixed phase environment and subsequently from mixed phase to liquid layers and ultimately to the clear air environment. The almost symmetrical occurrence of clusters at different altitude levels, over~1 h segment, over the east coast of Lake Michigan (Figures 2 and 3), suggest a persistent cloud layer and composition. After these transitions, in the interval 13:10-13:22 UTC, we observe a few patchy periods of liquid water between clear air segments. It is discernible that the glaciated layers were present at the lower altitude, mostly below 850 hPa, whereas the mixed phase and the liquid conditions were observed near cloud tops during the porpoising maneuvers of flight F28 in the interval 14:00-15:18 UTC.
In a deeper analysis of the first missed approach segment, we notice a layered macro structure. Figure 13 shows the aircraft pressure-altitude flight profile, its corresponding W-band reflectivity, color-coded according to the identified clusters, and complementary lidar signal and depolarization ratio plots. The lidar depolarization ratio used here as an auxiliary qualitative tool for validation, since it was not calibrated for absolute values. In section A of this flight segment, identified as liquid by the k-means algorithm, we observe mostly 10 µm droplets, depicted as single dark pixels recorded by the 2D-S probe. Lidar's low depolarization ratio is another indicator of the liquid phase observed at 3 km altitude. Next, in section B, radar reflectivity is increased, and the segment is classified mostly as "mixed 1". Lidar plots indicate a transition between layers on descent, where a larger variety of ice particles is detected by the 2D-S probe. Lidar depolarization ratio is medium to high when the aircraft enters section C of the flight, classified as "mixed 2" by the kmeans algorithm. The 2D-S probe images show 10 µm droplets together with larger 100 µm hydrometeors. At the warmer temperature, in section D (identified as "glaciated"), radar reflectivity is further increased in correlation with the 2D-S imagery of larger ice aggregates. The RID constant reading ( Figure 11) indicates limited to no additional accretion and therefore we conclude prevailing ice composition.
Such validation analyses show that the k-means algorithm cluster assignment corresponds well to the physical cloud structure observed with the complementary tools in flight F28. From the clustering results, we can infer some valuable information on the characteristic range of instruments' values corresponding to different cloud phases. For example, the glaciated phase in this flight has an LWC/TWC ratio below 0.25, large MVD in the range of 100-1000 µm, fairly flat (non-accreting) RID signal, and enhanced aerosol concentration about 10 3 cm −3 . As opposed to the glaciated environment, liquid conditions in flight F28 have lower MVD and radar reflectivity values, larger hydrometeor concentration and an LWC/TWC ratio greater than 0.8. Undoubtedly, future studies will need to combine observations from numerous flights with various conditions to obtain universal thresholds for each cloud category.

Classification Intercomparison
In the previous section, we focused on k-means classification however, different approaches for atmospheric classification exist for applications along the flight track (see Section 1). In this section, we compare the results from four such classification techniques using data from flight F28, namely: k-means, DT, fuzzy clustering, and image sphericity derived classification produced by ECCC and distributed with their dataset (see Data Availability Statement section). The latter method is based on particle size distributions for Liquid and Ice categories, classified using area equivalent diameter and maximal diameter for any imaged particle, a mixed phase is classified when both categories are present and clear-air is classified when none of the categories are present. In fuzzy clustering method for this flight, the optimal number of 4 clusters was determined through the SA-IAFSA algorithm (see Section 2.3) after the data preparation procedure. Figure 14a  Overall, in this flight, while fuzzy clustering captures well the transition between clear air, mixed and glaciated environments, it fails to capture the pure liquid phase with the same accuracy as k-means or DT algorithms. On the other hand, k-means clustering, and DT classification also have noticeable distinctions between them. For instance, one can notice the difference in clear-air classification, around 13:16 UTC where the k-means algorithm underestimated the clear-air class occurrence in comparison to DT, instead classifying some of the segments as mixed phase. This can be explained by the inclusion of the radar measurements in the input parameters set, an increase in LDR signal from −40 dB to −25 dB was found for the samples of mixed phase while other input parameters showed no significant difference between clear-air and mixed-phase around 13:16 UTC (Figure 11).
One other distinction of the k-means output in comparison to the DT classification is the segregation of two mixed-phase classes from glaciated conditions. To examine these discrepancies between the two methods with higher scrutiny, we selected a test case between 13:01 and 13:05 UTC in ascent above 800 hPa pressure-altitude level ( Figure 15). We show the time-series of the NRC Convair-580 aircraft pressure-altitude flight profile color-coded with the classes assigned by the k-means and DT algorithms (Figure 15a,b) temporally aligned with lidar depolarization ratio (Figure 15c), and selected input variables time series (Figure 15d,e). Comparing mixed phase 1 (green cluster) in k-means with the corresponding glaciated (purple cluster) classified in DT in the period where the aircraft ascends from mixed-phase 2 (red cluster), we notice a significant increase in the complementary lidar depolarization information. The highly attenuated lidar depolarization ratio becomes less attenuated with medium to high depolarization ratio values. Another major change identified during the ascent is the decrease in the bigger diameter hydrometeors (>1 mm) (Figure 15d) and stronger fluctuations in their concentration. Other variations were observed also in a decrease in radar reflectivity in comparison to the glaciated phase earlier in the flight and a significant decrease in the aerosol concentration (Figure 15e). Overall, in this flight, while fuzzy clustering captures well the transition between clear air, mixed and glaciated environments, it fails to capture the pure liquid phase with the same accuracy as k-means or DT algorithms. On the other hand, k-means clustering, and DT classification also have noticeable distinctions between them. For instance, one can notice the difference in clear-air classification, around 13:16 UTC where the k-means algorithm underestimated the clear-air class occurrence in comparison to DT, instead classifying some of the segments as mixed phase. This can be explained by the inclusion of the radar measurements in the input parameters set, an increase in LDR signal from −40 dB to −25 dB was found for the samples of mixed phase while other input parameters showed no significant difference between clear-air and mixed-phase around 13:16 UTC (Figure 11).
One other distinction of the k-means output in comparison to the DT classification is the segregation of two mixed-phase classes from glaciated conditions. To examine these discrepancies between the two methods with higher scrutiny, we selected a test case between 13:01 and 13:05 UTC in ascent above 800 hPa pressure-altitude level ( Figure 15). We show the time-series of the NRC Convair-580 aircraft pressure-altitude flight profile colorcoded with the classes assigned by the k-means and DT algorithms (Figure 15a,b) temporally aligned with lidar depolarization ratio (Figure 15c), and selected input variables time series (Figure 15d,e). Comparing mixed phase 1 (green cluster) in k-means with the corresponding glaciated (purple cluster) classified in DT in the period where the aircraft ascends from mixed-phase 2 (red cluster), we notice a significant increase in the complementary lidar depolarization information. The highly attenuated lidar depolarization ratio becomes less attenuated with medium to high depolarization ratio values. Another major change identified during the ascent is the decrease in the bigger diameter hydrometeors (>1 mm) (Figure 15d) and stronger fluctuations in their concentration. Other variations were observed also in a decrease in radar reflectivity in comparison to the glaciated phase earlier in the flight and a significant decrease in the aerosol concentration ( Figure  15e). The sphericity derived classification is in good agreement with the k-means clustering for clear-air and glaciated periods. However, the pure liquid phase is almost not present in the former method, instead it is classified as mixed phase (Figure 14e). This is due to the lowest manual threshold set to classify as 'mixed' cases when both ice and liquid classes contain concentration larger than zero. The challenge of setting such thresholds is addressed in depth by Korolev et al. [59] and it is discussed in the next section.

Discussion and Practical Applications
In cloud phase characterization, the scientific community is currently relying on a variety of definitions. Korolev et al. [59], in their review paper, stated that there is no consensus regarding what defines "liquid", "mixed phase", and "ice" conditions. As an example, they illustrated the complexity in defining extreme ratios between hydrometeor phases, spatial clustering of phases (i.e., two types: genuine and conditional volumetric mixed-phase), and longevity of the phases. They listed the known instrumental limitations (both in situ and remote) and concluded that in the future, only the synergy of in situ measurements with different platform remote sensing will lead to an advanced understanding of mixed-phase clouds across the globe, even though the mixed-phase definition might change.
Nguyen et al. [31] have implemented this approach and demonstrated its efficiency in unprecedented, collocated, airborne in situ and radar measurements, including in characterization of mixed phase and riming events in the Arctic clouds and showed agreement with models. In the study presented here, we used the same sampling methodology, focusing our effort on the vicinity of the aircraft. While simultaneous collection of data using multiple measurement techniques should potentially improve the characterization of clouds, the subjectivity inherent in interpreting results from multiple techniques also introduces the potential to misinterpret results and/or introduce bias in order to find agreement with predisposed opinions [59]. In order to minimize the aforementioned opinions, we tested a number of taxonomy techniques, which resulted in a split of the mixed phase category into Mixed1 and Mixed2, as predicted by Korolev et al. [59]. It should be noted that this analysis is based on a limited number of flight hours and while it requires further scrutiny, the advantage of the different capability of lidar and radar to detect various sizes of liquid droplets and ice crystals and the detection capability of in situ instruments in a convoluted product is profound and can provide a more consistent definition of various types of mixed-phases than the definitions derived from each individual instrument separately.
In some numerical models, the cloud phase is still separated by simplistic thresholds assumptions, e.g., temperature threshold, size threshold, or concentration threshold, into liquid and ice. However, the reality is much more convoluted with the presence of counterintuitive, small spheroidal ice, large rain drops, supercooled liquid (down to −40 • C) and high ice concentration (e.g., after ice multiplication events). Misclassified phase will essentially propagate into errors in the derived properties like the optical depth of the cloud as well as cloud longevity, timescale and nature of cloud chemistry taking place in a certain cloud phase.
Finally, it is worth noting that due to its multi-instrument nature, the presented classification approach produces useful output even in the absence of exceedingly sophisticated quality-control or corrections of input parameters that would normally be required for quantitative analysis of cloud microphysics measurements.
The classification of events is a common problem crossing many fields. The outcome of this study has an additional practical application for the classification of icing environments using an impartial approach. A development of cloud masking method based on the classified portions of the flight would enable determination of cloud boundaries and cloud types, subsequently paving the path for characterization of hazardous conditions via multiplatform analyses, i.e., airborne, spaceborne, ground, and models.

Conclusions
In this work, we have presented a multivariable characterization of the atmospheric environment with data collected in flight. The data was collected onboard the NRC Convair-580 aircraft in two selected flights during the ICICLE campaign in 2019. We implemented a multivariable classification approach to facilitate segregation of cloud analyses into different atmospheric state categories as well as to develop a tool for operational decision onboard the aircraft. The variety of sampling technologies installed on our aircraft (e.g., optical, electrical, vibrational, remote sensing, thermal) enabled us to validate the results of a common k-means clustering characterization approach with complementary independent measurements e.g., imagery and lidar. This new capability was compared with other advanced classification methods. The number of identified classes agrees with the lowest number of expected categories of the physical atmospheric content in these flight environments (e.g., clear-air, liquid, glaciated, mixed-phase/multimodal). An additional cluster was identified, indicating a split of the mixed-phase category. The pristine classes of clear-air and glaciated clouds have shown an excellent agreement with the ECCC phase separated datasets, which passed a manual expert evaluation of the phase.
The time demanding processing of individual datasets collected by each of the instruments and the subsequent quality assessment often requires high scrutiny. Our study shows that untreated and uncorrected data can be used in some methods to produce an adequate representation of the flight environment, validated with complementary tools, therefore potentially saving time and cost of processing.
This single-flight, individual, classification approach can be expanded, with more computing power, to include a greater number of flights for an offline study of sampling features across a higher variety of environments. The findings of this study have enabled the identification of the minimal number of key input parameters for objective classification of the environment. Those will be used in a development of a Convolutional Neural Network (CNN) approach for real-time implementation (e.g., [60]) in flight. However, the selection of the best methodology to use still strongly depends on the backend user application of the classification results.
Icing studies onboard NRC's Convair-580 have been ongoing for decades (e.g., [4,19,61]) and will continue in the foreseen future. The methodology presented here has the potential to become an operational tool for the NRC aircraft fleet, which is heavily used in international atmospheric science missions and can be transferred to other aircraft that have a similar suite of common instruments onboard.
Combining this approach with ground, modeling, and satellite datasets in this project will help to develop more accurate modeling tools for weather forecasting and ground detection of icing hazard.