Characterization of Volcanic Cloud Components Using Machine Learning Techniques and SEVIRI Infrared Images

Volcanic explosive eruptions inject several different types of particles and gasses into the atmosphere, giving rise to the formation and propagation of volcanic clouds. These can pose a serious threat to the health of people living near an active volcano and a hazard to air traffic. Many efforts have been devoted to monitoring and characterizing volcanic clouds. Satellite infrared (IR) sensors have been shown to be well suited to volcanic cloud monitoring tasks. Here, a machine learning (ML) approach was developed in Google Earth Engine (GEE) to detect a volcanic cloud and to classify its main components using satellite infrared images. We implemented a supervised support vector machine (SVM) algorithm to segment a combination of thermal infrared (TIR) bands acquired by the geostationary MSG-SEVIRI (Meteosat Second Generation, Spinning Enhanced Visible and Infrared Imager). This ML algorithm was applied to some of the paroxysmal explosive events that occurred at Mt. Etna between 2020 and 2022. We found that the ML approach using a combination of TIR bands from the geostationary satellite is very efficient: it achieves an accuracy of 0.86 and is able to automatically detect, track and map volcanic ash clouds in near real-time.


Introduction
Any emission of gasses and particles from a volcano that reaches the atmosphere is referred to as a volcanic cloud. The principal component of volcanic clouds is the ash derived from magmatic material that emerges as solid into large particles [1], while the major volcanic gasses are water vapor (H2O), sulfur dioxide (SO2) and carbon dioxide (CO2). H2O and CO2 are widely distributed in the atmosphere, making it difficult to distinguish these gasses of volcanic origin from background quantities. Therefore, the approaches proposed to characterize the major components of a volcanic cloud are mainly focused on the detection and quantification of ash and SO2 [2]. The focus on ash and SO2 is mainly driven by the impacts that these components have on the atmosphere and the environment [3,4]. Moreover, both of these emissions can be hazardous to public health and aviation activities [5][6][7]. SO2 is converted to sulfuric acid, forming small droplets that affect the Earth's radiation balance by reflecting solar radiation away from the surface [8]. The resulting disturbance to the Earth's radiation balance affects surface temperatures through direct radiative effects as well as through indirect effects on atmospheric circulation, making volcanic eruptions an important natural cause of climate change on many timescales [9]. Volcanic ash can also affect the radiation balance, but since it only lasts a few days in the atmosphere, its effects are mostly local. An eruptive column still carrying these hot particles can produce a pyroclastic flow, which can be deadly to those at the base of the volcano [10][11][12]. Nevertheless, volcanic ash is a hazard for aircraft, because it can damage the jet engines [13].
The discrimination between volcanic and weather clouds is a major issue, because in the region where the two types of clouds overlap, the brightness temperature difference (BTD) can be negative for both volcanic and meteorological clouds. For this reason, correction procedures have been applied [40][41][42].
Recently, advanced approaches such as machine learning (ML) techniques have been developed to set the best BTD threshold for the detection of volcanic ash clouds. ML algorithms offer an innovative paradigm to automatically process a huge amount of satellite data for volcanology applications [43][44][45]. In this way, it is possible to create the volcanic cloud mask automatically, avoiding time-consuming manual drawing. ML techniques can be used for classification purposes since they are able to learn complex patterns and trends from data. The pixel classification of volcanic cloud components is useful because, by processing satellite images acquired with high temporal resolution, it is possible to follow the evolution of a volcanic cloud, its composition and how its components are dispersed in the atmosphere. The extraction of information related to a volcanic cloud from satellite data is challenging, because it is sometimes difficult to distinguish volcanic ash clouds from more common meteorological clouds, such as thin cirrus clouds, and in most cases, the supervision of an operator is necessary to discriminate the components of a volcanic cloud. Supervised ML approaches learn from a training dataset, which includes a set of coupled inputs and expected outputs. In contrast, an unsupervised ML algorithm does not use a training dataset but extracts common features from the input data based on their similarity.
We recently explored the potential of ML techniques to detect volcanic clouds by using both supervised and unsupervised approaches, namely support vector machine (SVM) and K-means [46,47]. As expected, the results show that SVM outperforms K-means, since human intervention in choosing and labeling the training samples makes the model much more accurate and, moreover, generalizable.
Here, we take a step forward toward the full characterization of volcanic clouds. We propose an SVM-based method both to detect a volcanic cloud and to discriminate its components, namely, to classify its pixels as rich in ash, rich in SO2 or characterized by mixed components (ash, SO2 and other). The SVM algorithm was implemented in the Google Earth Engine (GEE) platform [48]. This cloud computing platform offers unique opportunities for remote sensing data collection, processing, analysis, and visualization at a regional scale with direct access to a multi-petabyte analysis-ready data catalogue. The proposed SVM takes as input a combination of bands in the infrared regions of images acquired by the SEVIRI sensor and returns as output an image with four classes: ash, SO2, mix of ash and SO2 (or simply mix), and background. This ML algorithm was applied to some of the paroxysmal explosive events that occurred at Mt. Etna between 2020 and 2022, using separate datasets for training and testing.

2020-2022 Etna Paroxysmal Events
Mt. Etna (Figure 1) is the largest active volcano in Europe, and its activity often originates from its summit area, which includes four craters, named Northeast Crater (NEC), Voragine (VOR), Bocca Nuova (BN) and Southeast Crater (SEC), the most active crater since 1998 [49]. The youngest cone is the New Southeast Crater (NSEC), which began to grow after 2011 over the east flank of the SEC cone [50,51]. Eruptive activity after 2013 has led to the merging of the NSEC with the SEC [52]. Between 13 December 2020 and 21 February 2022, a new explosive phase took place at Mt. Etna, giving rise to 66 paroxysmal lava fountain episodes (from INGV weekly bulletins at www.ct.ingv.it, accessed on 1 July 2022) [53][54][55]. These events were characterized by Strombolian explosions, lava fountains, formation of short-lived lava flows and generation of eruptive columns [56]. An eruptive column is a cloud composed of ash, tephra and gasses emitted during a volcanic eruption. It can rise several kilometers above the vent of the volcano and can feed ash plumes. Ash plumes generated during the 2020-2022 Etna paroxysmal events reached an altitude of about 9-10 km above sea level and were spread by the wind into the atmosphere. The explosive events considered in this study are those characterized by intense and violent activity that produced large and high volcanic clouds. These occurred on 23 February 2021, 12 March 2021, 9 August 2021 and 10 February 2022.

Satellite Data Sources
SEVIRI is an instrument onboard the Meteosat Second Generation (MSG) geostationary satellite, operated by EUMETSAT [57]. It measures radiances in 12 spectral channels, which cover the range from the visible to the infrared, with a spatial resolution of 3 km at the equator to 4.5 km at Mediterranean latitudes. The main advantage of SEVIRI is its high temporal resolution of 15 min for the full disc and 5 min in rapid scan mode, which mainly covers Europe [58]. For the detection and characterization of volcanic cloud components, only the thermal infrared (TIR) channels were considered, and in particular, the spectral bands centered at 8.7, 10.8 and 12.0 µm.

Methods
The SVM is a general supervised learning method used for classification and regression. The foundations of SVM were developed by Vapnik [59], and the method gained popularity due to many promising features, such as better empirical performance. The SVM algorithm aims to find the hyperplane or set of hyperplanes maximizing the distance between the samples of different classes that lie closest to the boundary [60,61]. The SVM was already exploited by the authors to detect a volcanic cloud and has been shown to be more efficient than unsupervised algorithms such as K-means [47]. One alternative to the traditional SVM is the neural network (NN), which sometimes makes classification problems much easier and faster to compute. NNs have already been employed to detect volcanic clouds [62][63][64][65][66][67]. However, NNs are not always the answer to all classification problems. An SVM requires 2 parameters (c and γ, which will be described in Section 3.2), a kernel to map the input data to a higher dimensional space and few input data, which need minimal processing. In contrast, an NN requires several parameters that depend on the number of layers, a non-linear activation function and many input data, which generally need a lot of processing. Therefore, since we have limited data and a simple classification problem, we prefer to implement an SVM algorithm rather than more complex algorithms such as NNs, because it is easy to implement and produces results quickly.
The objective of this work is to train an SVM model for a multi-class classification problem, in order to segment a combination of SEVIRI TIR bands, called Ash RGB (Red, Green, Blue), into 4 classes: pure ash cloud, pure SO2 cloud, mix cloud and background. In this way, it is possible to detect and characterize a volcanic cloud produced during a volcanic eruption. Once the model has been trained on some labeled samples and tested on new images, due to its high accuracy, it can be applied to any new images without being trained again. The possibility of applying the model to new images makes it generalizable and applicable during a volcanic event, even in near real time, to monitor the evolution of a volcanic cloud.
We implemented the SVM model in GEE to automatically process and analyze SEVIRI Ash RGB images. The Ash RGB uses only infrared window channels, and therefore, it can be used both day and night for the detection and monitoring of volcanic ash as well as for sulfur dioxide gas.
In Figure 2, a general scheme of the proposed ML algorithm is reported. It represents the three main steps of our procedure: (a) the input feature preparation, i.e., the combination of the SEVIRI TIR bands to construct the Ash RGB images, (b) classification, i.e., the design of the SVM classifier, and finally, (c) performance evaluation, to determine whether our model is working well or not.

Figure 2. Scheme of the proposed ML algorithm: (a) Input Feature Preparation; (b) Classification; (c) Performance Evaluation. The first step is to prepare the input feature of the SVM, obtained by opportunely combining the thermal infrared (TIR) bands centered at 8.7, 10.8 and 12.0 µm. The second is the implementation in GEE of the SVM algorithm to detect a volcanic cloud and identify its main components, while the last is the evaluation of the performance of the model using a confusion matrix.

Input Feature Preparation
The first step of our procedure is to prepare the input feature of the SVM algorithm. This step is fundamental since a discriminative set of features can properly separate thin volcanic ash from cloud objects, provide SO2 detection, and allow for analysis of scenes containing a mix of ash and sulfur dioxide (also called mixed region). SEVIRI images are provided by the EUMETSAT Data Store. In particular, we used the product MSG Level 1.5 Image Data, which corresponds to image data that have been corrected for all unwanted radiometric and geometric effects, geolocated using a standardized projection, and calibrated and radiance-linearized [68]. The Level 1.5 images, provided in a geostationary projection (GEOS Projection), were georeferenced to the reference system EPSG:4326-WGS 84.
By opportunely combining TIR bands centered at 8.7, 10.8 and 12.0 µm, it is possible to obtain an Ash RGB image highlighting the presence of the main components of a volcanic cloud. In particular, Ash RGB images are realized by combining the brightness temperature (BT) of the three SEVIRI TIR channels in this way: red: BT_12.0 − BT_10.8; green: BT_10.8 − BT_8.7; blue: BT_10.8 [69]. The channel combination in the red beam is the reverse of the "split window" method [23]: thin volcanic ash tends to have a strong reddish color, while meteorological clouds have no contribution. The green channel emphasizes the presence of SO2 (marked by green pixels), since it compares the SO2 absorption band at 8.7 µm with the non-absorbing 10.8 µm band. Finally, the 10.8 µm band in the blue beam provides a high contrast background for ash detection and removes the influence of cumulonimbus clouds. Therefore, depending on the concentration, red pixels indicate the presence of thin volcanic ash, green pixels the presence of SO2, while yellow pixels mark the mixed regions of a volcanic cloud characterized by both ash and SO2. There are some limitations related to an Ash RGB image, such as the difficult identification of ash and SO2 when they are mixed with cirrus clouds. Another important limitation is related to the viewing angle, since the colors of a SEVIRI Ash RGB image depend on it. At high satellite viewing angles (>65°), it is difficult to correctly discriminate the volcanic cloud components, especially the SO2 gas, because water clouds appear in a green color similar to that of SO2. When the observed area is close to the sub-satellite point, the volcanic cloud components can be discriminated more easily and accurately. However, the main advantage of using this type of image is that the different components of the volcanic cloud are easily recognized thanks to its intuitive colors [70].
These considerations led us to use the Ash RGB image as input feature of our SVM model. Thus, for each SEVIRI image, we selected the three TIR bands centered at 8.7, 10.8 and 12.0 µm and combined them to obtain an Ash RGB composite image.
Pixels of the Ash RGB image were normalized between 0 and 1 by applying Equation (1):

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (1)$$

where x is the pixel to normalize, x_min the minimum pixel value of the image and x_max the maximum pixel value of the image.
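As an illustration of the input feature preparation, the band combination and the 0-1 rescaling can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions: the function names are hypothetical, the brightness temperatures are invented values used for illustration, and the min-max rescaling is applied separately to each component, which the text does not specify.

```python
def ash_rgb(bt_087, bt_108, bt_120):
    """Return the (red, green, blue) Ash RGB components for one pixel.

    Inputs are brightness temperatures (K) at 8.7, 10.8 and 12.0 µm.
    """
    red = bt_120 - bt_108    # reverse of the split-window difference
    green = bt_108 - bt_087  # sensitive to SO2 absorption at 8.7 µm
    blue = bt_108            # window channel used as background
    return red, green, blue


def min_max_normalize(values):
    """Rescale a sequence of pixel values to the range [0, 1]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant band is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - v_min) / (v_max - v_min) for v in values]


# Hypothetical brightness temperatures (8.7, 10.8, 12.0 µm) for three pixels.
pixels = [(270.0, 275.0, 273.0), (280.0, 281.0, 283.0), (250.0, 260.0, 258.0)]
components = [ash_rgb(*p) for p in pixels]

# Assumption: each component is normalized on its own.
red_band = min_max_normalize([c[0] for c in components])
green_band = min_max_normalize([c[1] for c in components])
blue_band = min_max_normalize([c[2] for c in components])
```

In this toy case, the second pixel has the largest positive 12.0 − 10.8 µm difference and therefore the highest normalized red value.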

Classification
The SVM classifier was implemented in GEE to detect a volcanic cloud and to identify its main components, namely to classify its pixels as rich in ash, rich in SO2 or with mixed components (ash, SO2 and other). The SVM technique has already been applied to volcano monitoring, where it was shown to provide successful results [71,72], due to its high performance and low computational cost. Until now, algorithms and models have been developed for the retrieval of the components of a volcanic cloud [73][74][75], but without the implementation of artificial intelligence approaches. Machine learning and deep learning techniques, instead, have been exploited mainly for the detection of a volcanic cloud [63,66,76,77].
We designed the SVM classifier using a radial basis function (RBF) as kernel, whose goal is to take data as input and transform them into the required form [78]. The RBF kernel function takes the form of Equation (2):

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right) \qquad (2)$$

where x_i and x_j are two input feature vectors and γ is the "spread" of the kernel, which should be carefully tuned according to the problem. Another important parameter is the cost c, which is the penalty for misclassifying a data point. We set the parameter γ = 0.5 and the cost c = 10.
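To make the role of γ concrete, the kernel can be sketched as follows, assuming the common form K(x, y) = exp(−γ‖x − y‖²). The function name and the sample vectors are hypothetical; only the values γ = 0.5 and c = 10 come from the text.

```python
import math

GAMMA = 0.5  # kernel parameter reported in the text
COST = 10    # penalty c reported in the text; used by the SVM solver, not by the kernel


def rbf_kernel(x, y, gamma=GAMMA):
    """RBF similarity between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical normalized Ash RGB pixels (red, green, blue).
pixel_a = (0.9, 0.2, 0.4)
pixel_b = (0.1, 0.8, 0.4)
similarity = rbf_kernel(pixel_a, pixel_b)
```

The kernel equals 1 for identical pixels and decays toward 0 as pixels move apart in feature space. A larger γ makes the decay faster, so each training sample influences a smaller neighborhood.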
The realization of any supervised model, such as an SVM classifier, consists of two parts: the training phase and the testing phase.
The first step in the training phase is to build a dataset of pixels from Ash RGB images labeled as pixel rich in ash belonging to a volcanic cloud (pure ash pixel), pixel rich in SO2 belonging to a volcanic cloud (pure SO2 pixel), pixel characterized by mixed components belonging to a volcanic cloud (mixed components pixel) and pixel not belonging to the cloud (background pixel). As a training dataset for the SVM classifier, small areas belonging to volcanic clouds are manually selected and labeled as pure ash, pure SO2 or mixed components. Moreover, background regions are also defined and labeled. This approach is called multi-class classification, since the machine learning classification task consists of more than two classes [79,80]. In our case, the number of classes is four: (1) pure ash, (2) pure SO2, (3) mix of ash and SO2 and (4) background. The training samples are extracted from three SEVIRI Ash RGB images acquired on 23 February 2021 01:27 UTC, 23 February 2021 06:12 UTC and 12 March 2021 10:12 UTC. We chose these images because each of them presents a volcanic cloud with a different composition: respectively, a pure-SO2 cloud, a pure-ash cloud and a cloud characterized by both ash and SO2. For each component, we used as training samples 10% of the total pixels related to that component contained in the three training images. For example, for the ash class, we used 10% of the total pure ash pixels contained in the three training images. Once the training dataset is built, the SVM model learns how to produce the right output.
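The per-class 10% selection can be sketched as a stratified random draw. This is a hypothetical illustration, not the procedure used in the study: in the paper the training areas are selected manually, whereas the random draw, the function name, the seed and the pixel counts below are all assumptions.

```python
import random


def sample_training_pixels(labeled_pixels, fraction=0.10, seed=0):
    """Draw the same fraction of pixels from every class.

    labeled_pixels maps a class name to the list of its labeled pixels.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    training = {}
    for label, pixels in labeled_pixels.items():
        # Keep at least one sample per non-empty class, never more than available.
        n = min(len(pixels), max(1, round(fraction * len(pixels))))
        training[label] = rng.sample(pixels, n)
    return training


# Hypothetical labeled pixels: (red, green, blue) tuples with invented values.
labeled_pixels = {
    "ash": [(i, 0.0, 0.0) for i in range(50)],
    "SO2": [(0.0, i, 0.0) for i in range(30)],
    "mix": [(i, i, 0.0) for i in range(20)],
    "background": [(0.0, 0.0, i) for i in range(200)],
}
training = sample_training_pixels(labeled_pixels)
```

Sampling the same fraction from each class keeps the class proportions of the labeled areas in the training set.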
The second step is the testing phase, in which the model learned from the training data is applied to the testing data, which are new SEVIRI Ash RGB images not containing training samples. To ensure that the trained model generalizes properly, it is important to verify the performance on different testing data that are not part of the training data.
We decided to use less training data than testing data to limit the errors related to the labeling process, which is manual, and therefore to avoid ambiguity. In this way, we created a classifier that is as generalizable as possible and is able to discriminate the volcanic cloud from the background and to characterize its main components, resulting in an image with four classes (pure ash, pure SO2, mix and background).
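In Table 1, the images used as training and as testing of the proposed SVM model are reported.

Table 1. List of images used as training and as testing of the proposed SVM model.

Training Images | Testing Images
23 February 2021 01:27 UTC | 12 March 2021 09:27 UTC
23 February 2021 06:12 UTC | 9 August 2021 03:27 UTC
12 March 2021 10:12 UTC | 10 February 2022 21:57 UTC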

Performance Evaluation
Since we have a classification model for categorical classes, a confusion matrix can be used to evaluate the performance. The confusion matrix gives a comparison between actual and predicted values and is represented as an N × N matrix, where N is the number of classes. Each column of the matrix represents the instances in an actual class while each row represents the instances in a predicted class, or vice versa [81].
Let us consider the confusion matrix for a simple binary classification example. If we compare the actual classification values to the predicted classification values, there are 4 different outcomes: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). From these outcomes, the following metrics can be derived:
• Precision, the fraction of predicted positives that are actually positive: TP / (TP + FP)
• Recall, the fraction of actual positives that are correctly predicted: TP / (TP + FN)
• F1-score, which combines precision and recall into a single measure: 2 × Precision × Recall / (Precision + Recall)
In the case of multi-class classification, we need to calculate TP, FN, FP and TN for each individual class [82]. Figure 2c shows the confusion matrix for a multi-class classification problem with four classes (Ash, SO2, Mix and Background). The dark green diagonal represents correct predictions, while the other light green cells indicate the incorrect predictions (E). As shown, TP_ash is the number of true positive ash samples in class Ash, while E_ash,SO2 is the number of samples from class Ash that were incorrectly classified as class SO2. The false negative for the class Ash is the sum of E_ash,SO2, E_ash,mix and E_ash,background, which indicates the sum of all class Ash samples that were incorrectly classified as class SO2, Mix or Background. The false positive for any predicted class, which is located in a row, represents the sum of all errors in that row.
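The indices used to establish the reliability of our results are:
• Micro-averaged F1-score (Micro-F1): it is calculated by using the regular F1-score formula considering the total TP, total FP and total FN of the model. It is a global metric since it does not consider each class individually.
• Macro-averaged F1-score (Macro-F1): it is the unweighted mean of the F1-scores calculated for each class individually.
• Weighted-averaged F1-score (Weighted-F1): it is the mean of the per-class F1-scores, each weighted by the number of actual samples of that class.
In the case of a supervised machine learning model, we aim to build a model from a training dataset that also achieves very high performance indices in the test phase and is therefore able to work well with new datasets.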
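The three averaged F1 indices can be computed directly from a multi-class confusion matrix, as in the sketch below. It assumes the layout described for Figure 2c (rows are predicted classes, columns are actual classes) and the standard definitions of the micro, macro and weighted averages. The function name and the 3 × 3 matrix are hypothetical and do not reproduce the values of the study.

```python
def f1_scores(cm):
    """Return (micro, macro, weighted) F1 from a square confusion matrix.

    cm[i][j] is the number of samples predicted as class i whose actual class is j.
    """
    n = len(cm)
    tp = [cm[k][k] for k in range(n)]
    # False positives of class k: all errors in row k (predicted k, actual other).
    fp = [sum(cm[k]) - cm[k][k] for k in range(n)]
    # False negatives of class k: all errors in column k (actual k, predicted other).
    fn = [sum(cm[i][k] for i in range(n)) - cm[k][k] for k in range(n)]
    support = [tp[k] + fn[k] for k in range(n)]  # actual samples per class

    def f1(t, p, q):
        denominator = 2 * t + p + q
        return 2 * t / denominator if denominator else 0.0

    per_class = [f1(tp[k], fp[k], fn[k]) for k in range(n)]
    micro = f1(sum(tp), sum(fp), sum(fn))
    macro = sum(per_class) / n
    weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
    return micro, macro, weighted


# Hypothetical counts for three classes (ash, SO2, mix), for illustration only.
example_cm = [
    [8, 1, 0],
    [2, 9, 1],
    [0, 0, 9],
]
micro_f1, macro_f1, weighted_f1 = f1_scores(example_cm)
```

When every sample receives exactly one label, the micro-averaged F1 coincides with the overall accuracy, here 26 correct samples out of 30.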

Results
The SVM classifier was trained on samples extracted from three SEVIRI Ash RGB images (see Table 1). First, the SVM model is applied to the SEVIRI Ash RGB images from which the training samples were extracted in order to classify the remaining pixels not used for the training. Each image is fed into the model, and for each case, we obtain an outcome image in which all pixels are classified as pure ash, pure SO2, mix ash/SO2 or background.
In Figure 3a,c,e, we can see the SEVIRI Ash RGB images (23 February 2021 01:27 UTC, 23 February 2021 06:12 UTC and 12 March 2021 10:12 UTC) used as input of the SVM model, and in Figure 3b,d,f, the corresponding outcomes. These results are not used to evaluate the accuracy of the model, since training samples were extracted from these three images. In Figure 3b,d,f, the pixels colored in red, green and yellow correspond, respectively, to the classes pure ash, pure SO2 and mixed components predicted by the SVM. The contours in red, green and yellow correspond, respectively, to the actual areas characterized by pure ash, pure SO2 and mixed components, defined manually by visual inspection. Even visually, we can note that our algorithm classifies the pixels within a volcanic cloud well.
In order to test the classifier, we selected three new images not used during the training phase. The previously trained SVM is applied to the SEVIRI Ash RGB images used as testing (see Table 1). The testing images were chosen in order to take into account volcanic clouds produced by eruptions from Etna that occurred in different and non-consecutive periods in the 2020-2022 timeframe, and therefore to make the evaluation of the accuracy of our algorithm more robust. The classification outcomes for the testing images are shown in Figure 4b,d,f.
The accuracy of the proposed classifier was computed using the confusion matrix for the multi-class classification. For each testing image, we built a confusion matrix, based on the areal extensions, and then extracted the respective indices (Micro-F1, Macro-F1 and Weighted-F1). Subsequently, we combined the confusion matrices of the three testing images into one, summing them element by element, in order to obtain the 4 × 4 total confusion matrix. In Table 2, the confusion matrix with percentage values is shown. Starting from the total confusion matrix, we calculated the overall performance indices. In Table 3, the performance indices for the multi-class classification are collected, calculated considering the three testing images separately and finally using the total confusion matrix.
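The combination of the per-image confusion matrices can be sketched as an element-by-element sum followed by a conversion to percentages. The sketch assumes that the percentages are taken over each actual class (each column sums to 100%), which is how the ash column is read in the Discussion. The function names and the 2 × 2 counts are hypothetical.

```python
def sum_confusion_matrices(matrices):
    """Sum several equally sized confusion matrices element by element."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def column_percentages(cm):
    """Express every column (one actual class) as percentages of its total."""
    n = len(cm)
    totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    return [
        [100.0 * cm[i][j] / totals[j] if totals[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 matrices standing in for two testing images.
image_1 = [[1, 0], [1, 2]]
image_2 = [[3, 1], [0, 1]]
total_cm = sum_confusion_matrices([image_1, image_2])
total_pct = column_percentages(total_cm)
```

Summing the counts before computing the indices gives each pixel the same weight, so images with larger clouds contribute more to the overall scores than they would if the per-image indices were simply averaged.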
For the calculation of these performance indices, we did not take into account the values related to the class background, because the region outside the volcanic cloud is far larger than the one inside. Otherwise, we would have a very large value of TP_background, which would make the indices very high (close to 1). Specifically, we are interested in evaluating the classifier's ability to discriminate components within the cloud. For this reason, in the performance evaluation, we considered only the predicted and actual values related to ash, SO2 and mix.

Discussion
The SVM model has been found to be very efficient, since it is able to properly classify an image in a few minutes. Training samples were extracted from three reference images, acquired on 23 February 2021 01:27 UTC, 23 February 2021 06:12 UTC and 12 March 2021 10:12 UTC. We decided to use these images because each of them contains a volcanic cloud with different and specific characteristics. Specifically, the 23 February 2021 01:27 UTC image presents a pure-SO2 volcanic cloud, the 23 February 2021 06:12 UTC image a pure-ash volcanic cloud, and the 12 March 2021 10:12 UTC image a cloud with both components.
First, we applied the SVM model to the three images from which the samples were extracted, and the results of the classification are shown in Figure 3b,d,f. We can see that the shape of the volcanic clouds is detected correctly, and most classified pixels belong to their real class. However, as the testing dataset, we used three new images where no training was performed, acquired on 12 March 2021 09:27 UTC, 9 August 2021 03:27 UTC and 10 February 2022 21:57 UTC. The results of the classification of the testing images are shown in Figure 4b,d,f.
To evaluate the performance of the algorithm, we took into account only the three images used as testing, and we created the confusion matrix reported in Table 2. On the vertical axis, we have the actual labels, while on the horizontal axis, the predicted labels of the testing samples. A perfect classifier is characterized by a confusion matrix with values only on the diagonal. If we look at the first column, which refers to the ash class, we can see that 76.8% of the ash test samples are correctly classified, while the remaining 23.2% are wrongly predicted: 0.09% are classified as SO2, 6.5% as mix and 16.62% as background. This confusion matrix allowed us to retrieve the performance metrics of the model, which are reported in Table 3. The performance indices Micro-F1, Macro-F1 and Weighted-F1 are exploited to evaluate the ability of the model to correctly classify the volcanic cloud components. If we consider the three testing images separately, from Table 3, it is clear that the performances related to the first testing image (acquired during the day, at 09:27 UTC) are lower than those of the other two testing images (acquired during the night, respectively, at 03:27 and 21:57 UTC). Most probably, this is not related to the different times of day at which the acquisitions took place, because SEVIRI Ash RGB images work well during both day and night, but to other factors. One of them is surely the greater difficulty in recognizing the ash inside a cloud. This difficulty in discriminating the ash component is found especially when a volcanic cloud contains several components. In Figure 4b, we can see how part of the volcanic cloud containing ash was not classified as such, but as background. In contrast, in the case of a pure-ash volcanic cloud (such as that in Figure 3c), this component is detected very well (Figure 3d).
On the whole, the values of these performance indices are always higher than 0.80 for the three testing images, which means that the actual components present in a volcanic cloud are well predicted. Finally, we calculated the performance indices considering all the values of the four-class confusion matrix. In this way, we obtained an overall estimate of the indices Micro-F1, Macro-F1 and Weighted-F1 (respectively, 0.86, 0.83 and 0.86), and we can conclude that the model performs well.
We decided to use another machine learning algorithm as a comparison for our SVM model. In particular, we implemented a random forest (RF) model, which is a classification algorithm consisting of many decision trees. Each decision tree is trained independently using samples drawn from the training dataset with replacement, and their results are combined to obtain the final RF outcome based on the majority vote [83], i.e., the class selected by most trees. The RF is a supervised ML algorithm; therefore, we used the same training dataset exploited to train our SVM model. The parameter to set in the RF is the number of trees. We performed four tests choosing as the number of trees 10, 50, 100 and 150. We observed that, in all four cases, the results obtained by this algorithm are acceptable, but in some cases, they are not very good.
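The two ingredients of the RF described above, sampling with replacement and the majority vote, can be sketched as follows. This is a hypothetical illustration with invented per-tree predictions. It does not train any decision tree and is not the GEE implementation used in the study.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement, as done for each tree."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
training_ids = list(range(10))                      # hypothetical training sample ids
tree_subset = bootstrap_sample(training_ids, rng)   # training set for one tree

# Hypothetical predictions of five trees for a single pixel.
pixel_class = majority_vote(["ash", "SO2", "ash", "mix", "ash"])
```

In this sketch, a tie between classes is resolved in favor of the class that appears first in the list of predictions. Real implementations may break ties differently.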
In Figure 5, we report the outcomes of the RF model, with the number of trees equal to 100, applied to the images of Table 1. The input of the model is a SEVIRI Ash RGB image. Figure 5a-c shows the classification results for the three images from which the training samples were extracted, whereas Figure 5d-f shows the results for the three testing images, on which no training was performed. By visual inspection, we can observe that the RF model detects volcanic clouds with an area greater than their real extent (Figure 5b,f). For the cases of 23 February 2021 01:27 UTC, 12 March 2021 10:12 UTC and 12 March 2021 09:27 UTC, the RF model seems to work well (Figure 5a,c,d). However, in the case of 9 August 2021 03:27 UTC, the RF classification fails, because the SO2 is confused with most of the background (Figure 5e), while our SVM model is able to discriminate the cloud from the background and classify its components (Figure 4d).
The accuracy of the RF model is computed using the confusion matrix for the multiclass classification, from which we extracted the performance indices. Figure 6 compares the performance indices of the SVM and the RF. The overall accuracy of the RF model is very low, mainly because the image of 9 August 2021 03:27 UTC is not classified well. We can conclude that our SVM model is more efficient than the RF for the task of detecting and characterizing a volcanic cloud. It works well not only when applied to images from which the training samples are extracted (Figure 3a,c,e), but also on images where no training is performed (Figure 4a,c,e). Therefore, the possibility of applying the SVM model to new images makes it generalizable and applicable during a volcanic event in near real time to monitor the evolution of a volcanic cloud.
In Figure 7, a 3D plot of the BT values of the pixels is shown, where the axes are the channels of the SEVIRI Ash RGB images and each point is colored according to its class (red for ash, green for SO2, and yellow for the mix of ash and SO2). The first channel (BT 12 [...] Table 4. The ash centroid has a very high value in channel 1, which is the channel related to the presence of thin volcanic ash. Furthermore, the SO2 centroid has a high value in channel 2, which is related to the presence of an SO2 gas plume. Lastly, the mix centroid has a high value for both channel 1 and channel 2, since the pixels classified as mix contain both ash and SO2. Consequently, the pixels classified as ash lie on the right side of the 3D graph, which is the part with the highest values in channel 1. The pixels classified as SO2 are instead concentrated on the left side of the 3D graph, where the highest values of channel 2 are found. The pixels classified as mix lie in a position between the ash class and the SO2 class. A SEVIRI pixel has a coarse spatial resolution (around 16 km² at the considered latitudes); being very large, it can contain several components that are difficult to discriminate. For this reason, some pixels may not be correctly classified and may lie far from the centroid to which they have been assigned. Furthermore, misclassification may be due to the dependence of the colors on the satellite viewing angle and to the difficult identification of ash and SO2 when they are mixed with cirrus clouds. However, we found that the pixels of each class are well distributed around their own centroid. Since we have verified that the accuracy of our model is high, we can say that the centroids have been correctly determined.
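The notion of a class centroid in the three-channel space, and of a pixel lying far from the centroid to which it has been assigned, can be illustrated with the short sketch below. It assumes that each pixel is described by three channel values and a class label; the function names and the numeric values are hypothetical and are not those of Table 4.

```python
import math


def class_centroids(pixels, labels):
    """Mean channel vector of the pixels assigned to each class."""
    sums, counts = {}, {}
    for pixel, label in zip(pixels, labels):
        acc = sums.setdefault(label, [0.0] * len(pixel))
        for i, value in enumerate(pixel):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc)
            for label, acc in sums.items()}


def distance_to_centroid(pixel, centroid):
    """Euclidean distance between a pixel and a centroid in channel space."""
    return math.dist(pixel, centroid)


# Hypothetical (channel 1, channel 2, channel 3) values, for illustration only.
toy_pixels = [(4.0, -2.0, 270.0), (6.0, -1.0, 272.0),   # labelled ash
              (-1.0, 5.0, 268.0), (-3.0, 7.0, 266.0)]   # labelled SO2
toy_labels = ["ash", "ash", "SO2", "SO2"]

centroids = class_centroids(toy_pixels, toy_labels)
for pixel, label in zip(toy_pixels, toy_labels):
    print(label, pixel, round(distance_to_centroid(pixel, centroids[label]), 3))
```

A pixel whose distance to its own centroid is much larger than that of the other pixels of the same class is a natural candidate for a mixed or misclassified pixel.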
The proposed SVM model and the high temporal resolution of SEVIRI make it possible to visualize and follow a volcanic cloud during an eruptive episode, from its formation to its complete dispersion in the atmosphere. Figure 8 shows the tracking of the major components of the volcanic cloud produced during the 12 March 2021 event at Etna volcano. This analysis was conducted on 18 SEVIRI Ash RGB images acquired on 12 March 2021 (from 07:42 UTC to 11:57 UTC). The SVM model was applied to each image. The graph in Figure 8 shows the total number of pixels inside the detected volcanic cloud and the number of pixels related to each component inside the cloud. The volcanic cloud is visible from 08:12 UTC, and its size increases until 11:12 UTC. Afterward, the volcanic cloud starts to disperse into the atmosphere, and the number of pixels begins to decrease progressively. Thus, we demonstrated how our SVM model, combined with the high sampling rate of SEVIRI, can be exploited to track the dispersion of a volcanic cloud in near real time.
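The tracking described above amounts to counting, for each classified image, the pixels assigned to each cloud component. A minimal sketch is given below, assuming that every classified image is available as a flat sequence of class labels with an acquisition time; the function names, the class names and the toy series are hypothetical and do not reproduce the data of Figure 8.

```python
from collections import Counter


def track_components(classified_images, cloud_classes=("ash", "SO2", "mix")):
    """Return (time, total cloud pixels, pixels per component) for each image.

    Pixels whose label is not in `cloud_classes` (e.g. background)
    are not counted as part of the volcanic cloud.
    """
    series = []
    for timestamp, labels in classified_images:
        counts = Counter(labels)
        per_class = {c: counts.get(c, 0) for c in cloud_classes}
        series.append((timestamp, sum(per_class.values()), per_class))
    return series


def peak_time(series):
    """Acquisition time at which the cloud covers the most pixels."""
    return max(series, key=lambda record: record[1])[0]


# Hypothetical classified images (time label, per-pixel classes).
toy_images = [
    ("t0", ["background"] * 6),
    ("t1", ["ash", "ash", "background", "background", "background", "background"]),
    ("t2", ["ash", "ash", "mix", "SO2", "background", "background"]),
    ("t3", ["ash", "SO2", "background", "background", "background", "background"]),
]

for timestamp, total, per_class in track_components(toy_images):
    print(timestamp, total, per_class)
print("largest extent at", peak_time(track_components(toy_images)))
```

Multiplying the pixel counts by the pixel area would turn the same series into an estimate of the areal extent of each component.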

Conclusions
Accurate detection, tracking, and ultimately nowcasting (i.e., near-real-time tracking and short-term forecasting) of volcanic ash clouds, combining high temporal resolution satellite data and machine learning techniques, have immediate applications to the real-time monitoring of volcanic explosive eruptions. By monitoring, we mean here both following the manifestations of the eruption once it has started and forecasting the areas potentially threatened by major components of a volcanic cloud. The need for integrated and efficient monitoring systems, operating on a global scale and including tools for producing different scenarios as eruptive conditions change, is a primary challenge for volcanic hazard assessment.
We described and demonstrated the operation of an SVM classifier designed for the detection of volcanic clouds and the classification of their main components. The results provide a unique temporal dataset of major volcanic cloud components, from their formation to their dispersion in the atmosphere, hosted and processed by a cloud computing platform. Because this platform can collect and process time series data within minutes, it enabled the rapid assessment of eruption evolution.

We validated our approach in an operational context at Etna volcano during the paroxysmal explosive events that occurred between 2020 and 2022. As input to the classifier, we used Ash RGB composite images generated from a combination of three TIR bands from SEVIRI. The advantage of this model is that, once trained, thanks to its good accuracy, it can be applied to new images without having to be trained again. As a result, the proposed SVM is able to detect and characterize the volcanic cloud in any new image, thus allowing one to detect and track the entire volcanic event thanks to the high temporal resolution of SEVIRI. This approach makes it possible to monitor the evolution of a volcanic cloud in near real time and therefore to understand which regions may be most affected by its impact, since the emissions of ash and SO2 can be hazardous to public health.
Although further analyses are required to fully evaluate the performance of the SVM model, it may support operational monitoring centers involved in managing volcanic ash clouds, such as the Etna Volcano Observatory (EVO), by providing time-critical hazard information. Therefore, the developed technology is expected to improve operational hazard detection, alerting, and management capabilities, minimizing the impact of dangerous and highly destructive paroxysmal events on populations and the environment.