Ensemble-Based Data Assimilation of Volcanic Ash Clouds from Satellite Observations: Application to the 24 December 2018 Mt. Etna Explosive Eruption

Accurate tracking and forecasting of ash dispersal in the atmosphere and quantification of its uncertainty are of fundamental importance for volcanic risk mitigation. Numerical models and satellite sensors offer two complementary ways to monitor ash clouds in real time, but limits and uncertainties affect both techniques. Numerical forecasts of volcanic clouds can be improved by assimilating satellite observations of atmospheric ash mass load. In this paper, we present a data assimilation procedure aimed at improving the monitoring and forecasting of volcanic ash clouds produced by explosive eruptions. In particular, we applied the Local Ensemble Transform Kalman Filter (LETKF) to the results of the Volcanic Ash Transport and Dispersion model HYSPLIT. To properly simulate the release and atmospheric transport of volcanic ash particles, HYSPLIT has been initialized with the results of the eruptive column model PLUME-MoM. The assimilation procedure has been tested against SEVIRI measurements of the volcanic cloud produced during the explosive eruption occurred at Mt. Etna on 24 December 2018. The results show how the assimilation procedure significantly improves the representation of the current ash dispersal and its forecast. In addition, the numerical tests show that the use of the sequential Ensemble Kalman Filter does not require a precise initialization of the numerical model, being able to improve the forecasts as the assimilation cycles are performed.


Introduction
During explosive eruptions, volcanoes can release large volumes of ash into the atmosphere. The term volcanic ash refers to those particles that, with a diameter < 2 mm, can remain suspended into the atmosphere for days or longer and can therefore be transported over great distances from the volcanic source. Monitoring and forecasting ash dispersal and deposition patterns is of crucial importance for volcanic hazard mitigation, which includes safety procedures for aviation [1,2] and populations living near volcanoes [3,4].
South-East Crater producing violent Strombolian activity. A second small fissure opened between the New-South-East and the North-East craters generating weak Strombolian activity lasted a few minutes. The two fissures fed a lava flow going east. A sustained ash plume rose from the two fissures and the summit craters of Bocca Nuova and North-East. The ash cloud drifted south-east and ash fall-out was found in the area of Zafferana Etnea, a village located at the slopes of Mt. Etna. From 13:30 ash emission from the summit craters decreased, while the Strombolian activity from the fissures was still active and persisted with violent and discontinuous emissions losing intensity during the night. A high level of seismicity persisted during the whole eruptive event with earthquakes up to magnitude 4 M located in the Etna region [58,59].
The sustained volcanic column generated during the peak in activity was recorded by the ground-based visible (VIS) camera of INGV-OE located in Catania (27 km from the summit craters), Figure 1 Panel (b). The camera had been calibrated to estimate column height [60] and INGV-OE reported a maximum height of 8.5 km asl at around 11:30. The camera estimates are in agreement with the plume height computed by exploiting the well-established dark pixel procedure applied to the SEVIRI measurements collected during the eruption [30]. Both estimates show a rapid decrease in column height which reached 5 km asl at around 12:00. Etna, while a picture taken during the eruption is displayed in the small lower-left box (photo taken by Boris Behncke). Panel (b) reports volcanic column height above sea level as estimated by the dark pixel procedure applied to SEVIRI data (blue dots) and by the calibrated VIS camera (gray dots).

Numerical Models: Plume-Mom and Hysplit
To simulate the release and the atmospheric dispersal of ash particles we coupled the eruptive column model PLUME-MoM [56] with the VATD model HYSPLIT [15]. PLUME-MoM is an integral plume model which simulates the rise of volcanic plumes in steady-state conditions and in a 3-D coordinate system. The transport of the volcanic mixture (solid particles and gas) is simulated by solving the set of transport equations for mass, momentum and energy, where the effect of the wind and the loss of particles during the rise are considered. The treatment of the polydisperse nature of the pyroclastic mixture is done applying the method of moments, which allows the treatment of particle aggregation and fragmentation. For this reason, the set of equations commonly employed by volcanic plume integral models [18,61] is reformulated accounting for the transport of the moments of each solid phase defining the volcanic mixture (more details and the complete description of the model are in de' Michieli Vitturi et al. [56]). ESPs and atmospheric conditions at the vent location are needed to initialize PLUME-MoM. ESPs include column height (from which mass eruption rate can be estimated) or mass eruption rate directly, Total Grain Size Distribution (TGSD) of the solid particles, water vapor content and mixture temperature. In case PLUME-MoM is initialized with column height instead of mass flow rate, an inversion procedure is operated internally by the model to find the best-fit mass eruption rate by searching for the optimum combination of vent diameter and initial mixture velocity [56]. When the inversion is done, initial mixture temperature and water content are kept constant. TGSD at the vent is defined as a series of bins each representing a specific diameter and containing a certain mass fraction (diameters are expressed in the logarithmic phi-scale).
The VATD model that we used to simulate the atmospheric transport of volcanic ash is the HYbrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model [15]. HYSPLIT has been developed by the NOAA Air Resources Laboratory's (ARL) and computes simple air parcel trajectories, as well as complex transport, dispersion, chemical transformation and deposition simulations. HYSPLIT is currently used as VATD model by Darwin, Wellington, Washington and Anchorage VAACs. HYSPLIT calculates the dispersion of a pollutant by assuming either puff or particle dispersion. In the first approach, pollutant advection/dispersion is modelled by following the mean trajectories of puffs (packets of particles) which expand horizontally (and optionally vertically) due to turbulent mixing in the atmosphere until their dimensions exceed the size of a few meteo grid points. At this point, puffs are split into multiple smaller puffs, each with a proportional fraction of the original mass. Modelling pollutant dispersion through puffs instead of single particles allows a reduction of the computational times ensuring good model performance.
The results of PLUME-MoM in terms of mass fluxes of volcanic particles lost from the edges of the column are used to initialize HYSPLIT. Such particles are both those lost from the column during plume ascent and those injected into the atmosphere at the neutral buoyancy level. Routines written in Python automatically produce the HYSPLIT input files from the results of PLUME-MoM. A single control file allows the user to specify the input parameters needed by the two codes. In this file the user can set the computational domain (dimension and grid size), the duration of the eruption and the Eruptive Source Parameters (column height, Total Grain Size Distribution, water mass fraction, mixture temperature, etc.) defining the investigated event. Moreover, HYSPLIT requires meteorological wind data which can be both reanalysis or forecast data. Operationally, NOAA's Air Resources Laboratory (ARL) uses meteo data provided by the National Weather Service's National Centers for Environmental Prediction (NCEP). Such data can be downloaded for free and used to run the simulations [62]. PLUME-MoM takes into account atmospheric conditions to simulate the rise of a buoyant plume. For this reason, wind velocity, temperature, pressure and humidity profiles at the vent location are given as input to the model. To enforce the coupling between PLUME-MoM and the dispersal model HYSPLIT, these parameters are extracted from the meteorological data file used to perform the dispersal simulation through HYSPLIT. In the following, we refer to PLUME-MoM&HYSPLIT to indicate the complete model used to perform the simulations. Tadini et al. [63] show the application of PLUME-MoM&HYSPLIT for volcanic hazard assessment of Andean volcanoes. Figure 2 shows an example of the main outcomes of a typical PLUME-MoM&HYSPLIT simulation applied to the Mt. Etna case study. Plume height was set equal to 8300 m asl (i.e., 5000 m above the vent) and we set an eruption duration of 2 h (from 11:30 to 13:30 on 24 December 2018). TGSD is formed by 9 particle classes ranging from −3 to 5 φ with ∆φ = 1. Mixture temperature and water mass fractions are 1300 K and 0.03, respectively. Panel (a) shows the 3D structure of the plume as computed by PLUME-MoM, while plume radius, plume velocity, mixture density and relative density are displayed in Panel (b). Please note that we indicate with relative density the difference between the density of the volcanic mixture and the density of the surrounding air. A negative value of the relative density means that the density of the volcanic mixture is lower than the density of the atmosphere and thus the plume is buoyant. Mass fractions of the particle classes lost from the column during plume ascent are shown in Panel (c), where classes are numbered from CL1, corresponding to −3φ, to CL9, 5φ. As can be seen from the figure, the coarsest class (CL1) loses 35% of its initial mass during the ascent, while finer classes are transported up to the neutral buoyancy height without losing significant amount of mass. Finally, Panel (d) shows atmospheric ash columnar content (burden) in t km −2 as computed by HYSPLIT at two different time slices. A complete simulation (i.e., column generation and ash dispersion) can be done in a few minutes on a standard PC. For this reason, this tool has the potentiality for real-time applications, and it is particularly suitable for EnKF applications where hundreds of simulations are performed.

Satellite Data
The volcanic ash retrievals were based on measurements supplied by the Spin Enhanced Visible and Infrared Imager (SEVIRI) on board the MSG geostationary satellite [64]. SEVIRI has 12 spectral channels going from Visible (VIS) to Thermal InfraRed (TIR), a nadir spatial resolution of 3 km and a temporal resolution that ranges from 15 min (Earth full disk) to 5 min (rapid scan mode over Europe and Northern Africa). The retrieval has been realized by exploiting the SEVIRI channels centered at 11 and 12 µm and applying the well-established Volcanic Plume Retrieval (VPR) procedure, formerly Volcanic Plume Removal [25,65,66]. The VPR procedure is based on the computation of the volcanic ash cloud transmittances in the TIR channels and requires as input only the volcanic cloud altitude. From those, aerosol optical depth and effective radius are obtained, and ash column density is derived by applying the simplified formula introduced by Wen and Rose [28]. For the investigated eruption, we considered a cloud height of 8.5 km asl as estimated from the ground-based VIS camera measurements [60]. The total error estimated for ash columnar content is 40% of the retrieved values. This estimate is the result of sensitivity analysis performed on the retrieval procedure and takes into account uncertainties in Earth surface characteristics (temperature and emissivity), plume geometry (especially altitude and thickness), ash optical properties and interference of water vapor [67].

Data Assimilation: Algorithms And Tools
Kalman Filter (KF) is a sequential filter method. This means that the numerical model is integrated forward in time and, whenever measurements are available, these are assimilated to produce a new analyzed state with minimized errors with respect to previous model results and observations. The analyzed state is then used to reinitialize the model before the integration continues. From its first formulation [68], KF underwent a massive development both in mathematical formulations and applications. One of the main innovations has been the formulation of the so-called Ensemble Kalman Filters (EnKFs) [69,70] to treat large-scale numerical models. Indeed, classical KF requires high resources of storage and computational time to handle and compute error statistics. EnKFs overcome this limit by applying a Monte-Carlo method to forecast error statistics. The way in which EnKFs compute error statistics is by using an ensemble of model realizations to represent the state estimate as a mean state (i.e., ensemble mean) and a covariance matrix (i.e., ensemble covariance).
Several variants of the original EnKFs have been proposed over the recent years. Among them, the ensemble square-root Kalman Filters (EnSRKFs) are one of the most popular developments [71]. EnSRKFs compute the covariance matrices using the ensemble perturbations as square-root of the error covariance matrices. The advantage of this variant is that the assimilation is performed without the need to perturb the observations as done in the original EnKF formulations [70,72].
According to EnSRKFs theory, the state of a system, such as ocean or atmosphere, is estimated through a collection (ensemble) of m realizations of the system at time t k (x (α) k with α = 1, ..., m). The state estimate is given by the ensemble meanx k : and by the ensemble covariance matrix P k : In Equation (2), X k is the matrix of ensemble perturbations k ] being the ensemble matrix andX k = [x k , ...,x k ] is the collection of the ensemble mean. Please note that x (m) k andx k are column vectors, thus X k andX k are matrices. From Equation (2), we can observe that the square root of P k is given by the matrix of ensemble perturbations X k scaled by (m − 1) −0.5 .
Observations at time t k are in the form of the vector y o k of size p. The measurement operator H k links model state and observations as: with k being the vector of observation errors which are assumed to be Gaussian and with covariance matrix R. We assume the observations to be uncorrelated, thus R is a diagonal matrix. EnSRKFs, and KF in general, consist of two steps. In the first step (forecast step), ensemble members are advanced in time until observations are available (time t k ). In the forecast step, the statistics defining the forecast state arex f k and P f k , given by Equations (1) and (2) respectively. In the second step (analysis step), the filter is applied to provide a new analyzed ensemble whose statistics arex a k and P a k , both at time t k . The analyzed ensemble represents the linear combination of forecast state and observations minimizing overall uncertainty. This is the general idea behind data assimilation which combines information with known error statistics from different sources to reduce overall uncertainties.
For simplicity, in the following we omit the time index k, remembering that the assimilation cycle is performed at time t k .
For the present application, we used the Ensemble Transform Kalman Filter (ETKF) [73]. For the description of the algorithm we follow the notation of Nerger et al. [74]. When observations are available, the forecast state, whose statistics arex f k and P f k , is transformed into the analyzed state through the transformation matrix A defined by: where the forgetting factor γ has the role to increase the ensemble spread avoiding filter collapse. The analysis state covariance matrix P a and the state estimatex a are computed from the transformation matrix as: x a =x f + X fwETKF , withw ETKF being the weight vector: To compute the square root of the analysis state covariance matrix, X f is transformed as: The weight W ETKF is computed as: with C as the square root of A (CC T = A) and λ as an arbitrary orthogonal matrix of size mxm or the identity. Equations (6) and (8) can be combined into a single transformation of X f as: withW ETKF = [w ETKF , ...,w ETKF ]. From Equation (10), the analysis ensemble X a can be computed directly without updating the state estimate by Equation (6). In the following, we applied the ETKF in the localized version (Local Ensemble Transform Kalman Filter, LETKF). Local ETKF solves the same equations of the global filter, but it works on local domains and on localized observations [75]. Each local domain is a portion of the full model grid on which the filter is applied. In our case, each pixel of the computational grid represents a local domain. Only the observations lying within a fixed influence radius from the local domain are considered. At the end of the assimilation cycle, local analysis states are merged to form the global state vector and ensemble array. More details on local filters and their implementation can be found in Hunt et al. [55], Nerger et al. [76].
The tool that we adopted to perform the assimilation cycles is the Parallel Data Assimilation Framework (PDAF) [57]. PDAF is a software environment for ensemble DA developed and maintained at the Computing Center of the Alfred Wegener Institute. We followed the off-line implementation approach, which means that the numerical model doing the ensemble integration is executed separately from the assimilation program. When observations are available, model integration is stopped, and results are passed to PDAF which produces the analyzed ensemble. Each member of the analyzed ensemble is then used to advance the numerical integration stopped before the assimilation procedure. With the off-line implementation, model core is not modified, and files exchanged between the model and PDAF is done through routines which come with the PDAF package. PDAF offers a series of DA filters already implemented and the LETKF, the one we used for the present work, is one of the available options.

Data Assimilation Applied to Plume-Mom& HYSPLIT
We developed a DA procedure which uses SEVIRI observations of volcanic ash to correct the predictions (forecast state) produced by PLUME-MoM&HYSPLIT. In this application, the state is defined by the values of ash load (kg) on a pre-defined computational grid. Observations are assimilated to produce a new ash state (analyzed state) with minimized errors with respect to the forecast state and the observations. The analyzed state is used to initialize new model simulations which are integrated forward in time until new observations are available. At this point, a new assimilation cycle is performed ( Figure 3).

Ensemble Creation
The creation of the ensemble members is of fundamental importance for the success of the assimilation cycle. Indeed, the statistical moments describing the system (mean and covariance) are computed from the ensemble of model realizations. The sampling error done in representing the system with a finite number of model realizations decreases proportional to 1/ √ m, where m is the number of ensemble members. Thus, ensemble with enough members should be used to accurately reproduce the investigated system.
The first way that we investigate to generate the ensemble is to perform m PLUME-MoM&HYSPLIT simulations each initialized and carried forward with a perturbed version of a reference wind data. To generate the perturbed wind, a perturbation on horizontal wind direction and intensity is added to all the wind vectors forming the meteorological grid. In particular, the perturbations on wind direction are created from N rotation angles θ corresponding to N percentiles of a Gaussian distribution with µ θ and σ θ as mean value and standard deviation. The same is done to define the perturbations on wind intensity. In this case, a Gaussian distribution of mean and standard deviation µ int and σ int is defined and N coefficients are selected by considering N percentiles of the distribution. Each coefficient is a multiplication factor used to increase or decrease the intensity of the wind vectors. By combining the perturbations on wind direction and intensity, N 2 perturbed wind-fields are created (the original unperturbed wind data is included in the N 2 winds). After each assimilation cycle, ensemble member integration is carried forward by associating at each member a wind-field randomly sampled from the N 2 winds. It is important to remark that we do not force each ensemble member to have the same wind perturbation at different cycles.
We also tested a second way to generate the initial ensemble which is based on defining different eruptive scenarios each characterized by a different column height. Indeed, ash dispersal patterns as computed by VATD models strongly depend on the height at which ash particles are injected into the atmosphere at the volcanic vent location. For well-monitored volcanoes, as Mt. Etna is, estimates of column height can be supplied in real time by ground-and satellite-based measurements. However, the majority of sub-aerial volcanic systems is not routinely monitored and, in case of an eruption, estimates of column height are highly uncertain. For this reason, we tested the possibility to create the members of the ensemble from a set of hypothetical column heights spanning from 4300 to 18,300 m asl (i.e., 1000 to 15,000 m above the vent).
Uncertainties on ESPs other than column height are not considered in this work for the creation of the ensemble members. This means that, for example, initial Total Grain Size Distribution and water mass fraction are the same for all the ensemble members.

Observations Preprocessing
SEVIRI observations used in the assimilation cycles are re-gridded according to the computational grid used by HYSPLIT. Moreover, original ash columnar content (t km −2 ) is converted into ash load (kg) for consistency with the results of HYSPLIT. To define the observation covariance matrix (R), we assume that each pixel forming the observed ash cloud has its own error which is 40% of the ash mass contained in the pixel. This value is set according to the estimates done by Corradini et al. [67] about the uncertainty on ash columnar content from satellite retrievals. Satellite retrieved ash mass loading are 2D data, while HYSPLIT produces a 3D ash loading map. Thus, a measurement operator (H) is defined to compare observations and model results. H integrates in the vertical direction the modelled ash profile resulting in a total ash loading contained in each pixel. This quantity can be directly compared with the observed ash loading coming from SEVIRI observations.

Data Assimilation Cycle
When observations are available, the m numerical simulations forming the forecast ensemble are stopped and the corresponding state vectors enter as input in PDAF. Additional input data are the observation vector and the observation covariance matrix. We modified the interface routines of PDAF to adapt the assimilation cycle to our case study, while no modifications to the core routines of PDAF have been made. The core routines contain the assimilation algorithms that we used as a sort of black box. The filter that we used for our case study is the Local Ensemble Transform Kalman Filter (LETKF). We also tested global filters, but we found better performance with the localized ones. The influence radius for the observations to be considered was set equal to 5 pixels (about 50 km) and we set the forgetting factor equal to 0.8. The result of each DA cycle is a new analyzed ensemble whose members are used to continue forward with the simulations stopped before the assimilation step.

Simulation Settings
To test our assimilation procedure, we performed different experiments by varying the observation sampling time interval (∆t obs ), the number of ensemble members (m), and the way in which the initial ensemble is created.
The set of experiments that we performed is reported in Table 1. Table 1. Set of experiments performed to test the assimilation procedure. σ θ is the standard deviation of the perturbation in wind direction that we imposed to create the perturbed wind-fields, while σ int is the standard deviation of the perturbation in horizontal wind intensity. For all the experiments, µ θ = 0 • and µ int = 1. N is the number of percentiles that we considered from each Gaussian distribution (direction and intensity) and n wind indicates the total number of perturbed wind-fields used to create the ensemble members (n wind = N 2 ). H col is the number of hypothetical column heights from which ensemble members are created. m is the product of n wind by H col and indicates the number of members forming the ensemble. ∆t obs specifies the sampling time interval for the satellite observations to be considered in the assimilation procedure. The duration of the paroxysmal phase of the eruption is 1 h for each experiment, from 11:30 to 12:30 on 24 December 2018. The advection/dispersion of the ash cloud is simulated until 18:30 on the same day. INGV-OE bulletins and satellite observations reveal that ash emission persisted after the end of the paroxysmal phase. Thus, we simulated a continuous ash emission (from 12:30 to 18:30) setting a plume height of 4300 m asl (i.e., 1000 m above the vent).
The numerical wind-field used to test the assimilation procedure is produced by the Global Data Assimilation System (GDAS) which is managed by the National Weather Service's National Centers for Environmental Prediction (NCEP). We used the 3 hourly, global, 1 • latitude longitude dataset on pressure surfaces [77]. This data is the reference wind-field from which the perturbed winds are generated.
The ensemble members of EXP1 were created by perturbing the reference wind-field in both direction and intensity. In particular, we defined two Gaussian distributions, one for wind direction and one for wind intensity. The distribution for wind direction has µ θ = 0 • and σ θ = 15 • , while the distribution for wind intensity has µ int = 1 m s −1 and σ int = 0.5 m s −1 . We extracted 7 values from each curve by using the following percentiles: 16th, 25th, 40th, 50th, 60th, 75th, 84th. This operation resulted in 7 rotation angles equal to: −15 • , −10.1 • , −3.8 • , 0 • , 3.8 • , 10.1 • , 15 • and 7 intensity coefficients equal to: 0.5, 0.66, 0.87, 1, 1.12, 1.33, 1.5. Thus, a total number of 49 perturbed wind-fields was created by combining perturbations on wind direction and intensity. Column height was set equal to 7750 m asl from 11:30 to 11:45, 7050 m asl from 11:45 to 12:00, 5500 m asl from 12:00 to 12:15 and 5350 m asl from 12:15 to 12:30. These values were taken from the results of the dark pixel procedure applied to SEVIRI measurements done during the paroxysmal phase of the eruption (see Figure 1). We highlight that for this experiment, all the ensemble members were initialized with the same column height, which varies during the eruption. Thus, the parameter H col in Table 1 is equal to 1.
The TGSD used to initialize PLUME-MoM has mean value of 2 φ and std of 2.5 φ. The mean was set according to the TGSD reported in Scollo et al. [78] for the 2001 Mt. Etna flank eruption, while we increased the standard deviation to include in the distribution also the fine ash (i.e., particles with diameter up to 0.5 µm). TGSD was discretized in classes from −5 φ to 11 φ with ∆φ = 2. SEVIRI retrievals detect the very fine ash only (particles with diameter from 0.5 to 10 µm). For this reason, we assimilated and show the results only for classes 7, 9 and 11 φ, while the remaining classes do not enter in the procedure.
In EXP1 we assimilated SEVIRI observations every hour from 12:30 to 18:30. EXP2 and EXP3 differ from EXP1 for the standard deviation of the Gaussian distribution used for altering wind direction (σ θ = 10 • for EXP2 and σ θ = 20 • for EXP3). EXP4 and EXP5 present the same simulation settings of EXP1, but observations are assimilated every 30 min and 2 h respectively. Finally, in EXP6 ensemble members were created by varying both the eruptive scenario and the wind-field. In particular, we considered 5 possible column heights equal to 4300, 7300, 11,300, 15,300, 18,300 m asl (i.e., 1000, 4000, 8000, 12,000, 15,000 m above the vent). Moreover, we created 9 perturbed wind-fields from the set of rotation angles and intensity coefficients corresponding to the percentiles 16th, 50th and 84th. This results in the definition of an ensemble formed by 45 members (5 eruptive scenarios and 9 perturbed wind-fields).
The remaining input parameters were kept constant for all the experiments and are summarized in Table 2. These parameters define the properties of the eruptive mixture used to initialize PLUME-MoM and the setting parameters of the computational grid on which HYSPLIT simulations are performed. For the present application, particle aggregation was not considered and HYSPLIT simulations were done using the puff approach. However, in case of Mt. Etna eruptions, ash aggregation is not dominant [79]. It is worth noting that the same TGSD was used as input for all the experiments. Experiments from 1 to 5 aimed at testing the data assimilation procedure for uncertainties on meteorological conditions and for different observation sampling intervals. Since ESPs, including TGSD, are kept constant for all the experiments, the three particle classes which are assimilated carry the same fraction of the total ash mass loading for all the experiments (from 1 to 5). On the contrary, EXP6 was designed to test the effects that unknown ESPs (in this case column height) have on the assimilation results. A variation in column height determines a variation in mass eruption rate, and thus in the ash mass loading carried by the assimilated particle classes. Previous studies show that plume height is weakly dependent on the initial TGSD [80]. The reason is that the large amount of air entrained in the volcanic column reduces the contributions of the solid fraction to the overall dynamics [56,81]. Moreover, we are not interested in reproducing the deposit, which is more strongly related to initial TGSD, but we only look at the dispersal patterns of the fine ash. For these reasons, the choice that we made on initial TGSD determines the fraction of the total ash loading, which is assimilated only without major repercussions on the dynamics of the volcanic column.
We defined a computational domain formed by n x × n y cells (longitude and latitude directions) of 0.1 • by 0.1 • (1 • is about 100 km) and 7 vertical levels equal to 3300, 4000, 5000, 6000, 7000, 8000, 12,000 (the last level is equal to 20,000 m for EXP6).
For each experiment, the forecast state is formed by an ensemble of m model realizations each being the result of a PLUME-MoM&HYSPLIT simulation (Equation (1)). Model realizations are in the form of state vectors containing the variables to be assimilated, in our case the atmospheric ash load (kg) associated with each particle class, atmospheric level and computational cell. For the present case study, the dimension of the state vectors is 103761. This number is the product of n x (equal to 81), n y (equal to 61), the number of atmospheric levels (equal to 7) and the number of particle classes (equal to 3). Table 2. Input parameters used for the PLUME-MoM&HYSPLIT simulations. T mix and wv are the temperature and the water vapor mass fraction of the eruptive mixture, while C p is the specific heat of the solid particles. Particles density was set according to Bonadonna and Phillips [82], with a linearly increasing density from ρ 2 to ρ 1 for particles with diameter in the range 2 mm -8 µm, constant density ρ 2 for particles with diameter > 2 mm and constant density ρ 1 for particles with diameter < 8 µm. SF is particles shape factor. ∆ lat and ∆ lon is the resolution of the computational grid. Mt. Etna latitude and longitude coordinates are vent lat and vent lon respectively, while H vent is Mt. Etna elevation. numpar indicates the number of puffs released per cycle, while maxpar expresses the maximum number of puffs carried by the simulation. Finally, delt indicates HYSPLIT integration step in minutes.

Evaluation Metrics
For this work, we used several metrics to evaluate the performance of the numerical model (PLUME-MoM&HYSPLIT) and of the DA algorithm. Evaluation metrics for model forecast include the rank histogram [83], the Rank Probability Score [84], the Jaccard similarity coefficient, model precision and model sensitivity [85]. Such metrics were calculated at each assimilation cycle and were used to compare the numerically predicted ash state (forecast state) with the observed state. Please note that the satellite data were re-gridded to match the computational grid used for the numerical simulations.
The rank histogram, also known as Talagrand diagram [83], is a tool for assessing the reliability of ensemble forecasts. It is used to check whenever the ensemble spread is consistent with the assumption that observations statistically belong to the probability distributions of the forecast ensembles. Thus, for a good forecast, observations are indistinguishable from model forecasts and they can be considered to be members of the forecast distributions. The procedure that we adopted to construct the rank histogram is the following: at each grid point of the computational domain, we ranked the m ensemble members from the lowest to the highest producing m + 1 bins (the two extremes are included and they are rank 1 and rank m + 1). We identified which bin the observation falls into and we placed the observation in the appropriate bin. We tallied over all the grid points and we created a histogram (the Talagrand diagram). In case two or more ensemble members and the observation have the same value (most commonly 0), we randomly selected which bin receives the count. A flat Talagrand diagram means that observations are indistinguishable from any other ensemble member meaning that the ensemble spread correctly represents the uncertainty of the observations. An U-shaped diagram reflects a poorly spread ensemble, while a Dome-shaped one means that the spread of the ensemble is too large with respect to observation uncertainties. Indeed, in the first case (U-shaped), observations fall too often outside the extremes of the ensemble, while in the second case (Dome-shaped) observations fall too often near the center of the ensemble. An asymmetric histogram means that the ensemble contains bias. It is worth noticing that a flat histogram is not a necessary condition for a reliable ensemble, but it only means that the probability distribution of the observations is well represented by the ensemble spread [83].
While the rank histogram gives information on the ensemble for the whole domain, we also computed the Rank Probability Score (RPS) to analyze the ensemble results on the single observation points [84]. RPS measures the quadratic distance between the forecast and the observed probability distributions computed for a specific point of the domain: where m is the number of ensemble members, CDF f c,i is the i-th value of the forecast Cumulative Density Function (CDF), while CDF obs,i is the i-th value of the observed CDF. The CDFs for the forecasts were computed empirically from the ensemble values, while the CDFs for the observations are step functions centered at the observed values. As it is a quadratic operator, RPS penalizes larger deviations from the observed probability much stronger than smaller ones. RPS ranges from 0, in case of a perfect forecast, to 1. Model performance has also been evaluated through three indices named Jaccard similarity coefficient (R j ), also called Threat Score or Critical Success Index [86], model precision (R mp ), also called Positive Predictive Value, and model sensitivity (R ms ), also called Hit Rate or True Positive Rate. For the description of these indices we used the notation of Charbonnier et al. [85]. To calculate R j , R mp and R ms , we considered the number of False Positives, False Negatives, True Positives, and True Negatives resulting from the comparison of the forecast state (simulation results) with the observed state. In our case, pixels where ash is both numerically predicted and observed are denoted as True Positives (TP), while pixels where ash presence is simulated, but not observed, are called False Positives (FP). False Negative (FN) indicates that a pixel is numerically ash-free, but the presence of volcanic ash has been detected from space. Finally, ash-free pixels resulting from both model simulations and observations are denoted as True Negatives (TN).
Next, the Jaccard similarity coefficient (R j ) is computed as: where R j expresses the ratio between the intersection (TP) and the union (TP+FP+FN) of the number of pixels where ash is both simulated and observed. This coefficient allows us to quantify the similarity between the simulated and the observed plume and should be 100 in the ideal case of a complete overlapping between the two. Model precision R mp and model sensitivity R ms are evaluated as: High values of R mp indicate that the forecast plume reproduces well the observed one. However, some pixels where ash was detected from space could not be included in the simulation results. Model sensitivity (R ms ) gives the percentage of the observed ash cloud that the simulation reproduces, with no penalty for FP. This means that there could be FP pixels where ash is simulated but not observed. Values of R ms close to 100 guarantee that the number of False Negative pixels is close to 0. Since FN pixels falsely report no ash where ash in fact exists, they are particularly dangerous for aviation safety. Thus, model forecast should guarantee a low number of FN pixels.
Filter performance was evaluated by the root mean square error (RMSE): where n is the dimension of the state vector, m is the number of ensemble members, x i,j is the j-th member of i-th state variable andx i is the mean value of the i-th state variable. RMSE is an indicator of the ensemble spread around the mean state. High values of RMSE mean that the considered state (forecast or analyzed) is highly uncertain with the ensemble members spreading a lot around the mean state. The goal of the assimilation cycle is to produce a new state with a reduced uncertainty with respect to the observations and the forecast state. The RMSE of the observed state is computed as: where p is the dimension of the observation vector and i is the observation error. Finally, as the assimilation cycles are performed, we tracked the total atmospheric ash mass loading of the observed, forecast and analyzed states.

Results
In this section, the results of the assimilation experiments are presented and discussed. Evaluation metrics (R j , R mp and R ms and RMSE) are reported for all the experiments, while the complete set of figures showing the observed, forecast and analyzed ash clouds is presented for EXP1 only. Figures for the remaining experiments can be found in the supplementary material ( Figures S1-S25).
We first present the results of simulations done without DA, Figure 4. Only the portion of the cloud formed by the particle classes considered for the DA is displayed in columns named "Deterministic run" and "Ensemble no assimilation" (i.e., classes 7, 9, and 11 φ). The deterministic simulation was done using the reference wind-field and the model settings of Table 2. The members of the ensemble simulation were created using the setting parameters of EXP1, see Table 1, while parameters in Table 2 were used to initialize the PLUME-MoM&HYSPLIT simulations. Ensemble integration was stopped at observation time slices and re-initialized from the ensemble members without performing the assimilation. As for the DA experiments, the wind-field used by each member was sequentially changed at each re-initialization by randomly sampling from the 49 perturbed wind-fields. It emerges that simulations performed through ensembles appear to produce results more similar to the observed state even when DA is not performed. Table 3 shows the evaluation metrics R j , R mp and R ms computed for the deterministic and the ensemble simulations. It is clear that the ensemble simulation improves the quality of the forecast, with values of R ms close to 100%, while the deterministic run shows values of~25% for the different time slices.
Thus, as previously shown by Dare et al. [87], Zidikheri et al. [88], ensemble prediction is encouraged with respect to deterministic forecasting.
The results of the assimilation procedure are shown from Figures 5-8. At each assimilation cycle, three ash states are presented. From the left to the right, these are: the ash cloud as seen from space (Observations), the prediction done by the model (Forecast state) and the result of the assimilation procedure (Analyzed state). The statistics describing each ash state are the mean value and the standard deviation calculated for the ash columnar content of each pixel. For the forecast and analyzed states, these quantities are computed from the ensemble members by applying Equations (1) and (2) (standard deviation is the square root of the covariance). The mean state is displayed in Figures 5 and 6, while the standard deviation in Figures 7 and 8. For the forecast and the analyzed states, ash load (kg) contained in each pixel was integrated over the particle classes and the atmospheric levels to produce 2D maps. Then, ash load was converted into ash columnar content (t km −2 ) for comparison with the SEVIRI observations. A concentration cut-off of 0.01 t km −2 was applied to the plot of observations, forecast and analyzed states. Despite the concentration cut-off, 99% of the initial ash loading is still present into the regions plotted in the figures for the measured, forecast and analyzed ash states. It is worth noting that the simulated clouds are bigger than the observed ones. This result is mainly due to the way in which the ensemble was created, which is through perturbations of the wind-field. Changing the way in which the ensemble is generated (e.g., through perturbations of ESPs such as column height or TGSD) or varying the perturbation coefficients (i.e., µ θ , σ θ , µ int , σ int ) would affect the size and the shape of the simulated cloud. These effects can be seen in the experiments reported in the supplementary material (Figures S1-S25 Table 2 and performed using the reference wind-field. Panels (c,f,i) present the outcomes of an ensemble simulation performed using the setting of EXP1.
From Figures 5 and 6 it emerges that the analyzed state produced by the DA cycle improves the numerical ash forecasting by taking into account the observations. This is evident starting from the first assimilation cycle (12:30), where the peak in ash column amount of the forecast state was modified considering the observed state. The ensemble members forming the analyzed state at each assimilation cycle were used to reinitialize the dispersal simulation to provide a new forecast.
The ability of the filter in producing a new analyzed state with minimized errors is evident when looking at the standard deviations computed for the observed, forecast and analyzed states (Figures 7 and 8         Evaluation metrics were computed to check the performance of the numerical model in producing the forecast states used in the assimilation procedure. Panels from (a) to (g) of Figure 9 show the rank histograms computed for the different time slices (i.e., 12:30, 13:30, 14:30, 15:30, 16:30, 17:30 and 18:30), while Panel (h) presents the cumulative histogram obtained by summing histograms from (a) to (g). Each histogram was constructed by considering all the points of the computational domain and by applying a concentration cut-off of 0.01 t km −2 , to be consistent with the procedure previously applied. This means that ensemble values less than 0.01 t km −2 were replaced with zero. Rank histograms appear flat, without peaks at the extremes (U-shaped) or at the central bins (Dome-shaped). This means that the ensemble spread is appropriate in reproducing the actual model uncertainty, with the higher bias for times 16:30 and 17:30. This result can be partially due to the fact that in most of the pixels of the domain there is a zero concentration for both the ensemble members and the observation. For this reason, to better analyze the local performance of the data assimilation algorithm, we also computed the Rank Probability Score (RPS) for each pixel presenting an observation value different from zero.  1  20  50  10  30 40  1  20  50  10  30 40   1  20  50  10  30 40  1  20  50  10  30 40  1  20  50  10  30 40  1  20  50  10  30 40   1  20  50  10  30 40  1  20  50  10  30   RPS allows us to check the accuracy of the ensemble prediction locally (i.e., at each observation point), see Figure 10. The magenta pixels in Figure 10 indicate the observations falling outside the range of values predicted by the model. About 95% of the observed values are within the predicted values and RPSs range from 0 to 0.4 (0 means perfect prediction).
Overall, the rank histograms and the RPSs indicate that our ensembles are reliable in reproducing the true variability of the observations and thus they can be used for the DA procedure. At each assimilation time, True Positive, False Positive and False Negative areas were computed from the observed and the forecast clouds ( Figure 11). The number of pixels forming each region was used to calculate indices R j , R mp and R ms by applying Equations (12)- (14). As previously stated, high values of R ms are necessary to ensure reliable assimilation results. We found that the creation of the ensemble members through perturbed versions of a reference wind-field allows us to obtain high values of R ms . Indeed, excluding the first assimilation cycle (12:30), R ms computed for the following cycles is close to 100% for all the experiments, see Table 4.  The performance of the filter has been evaluated through the RMSEs computed by applying Equation (15) for the forecast and the analyzed states and Equation (16) for the observations. Figure 12 shows the RMSEs evaluated for the 6 experiments. It can be noticed that the RMSEs of the forecast and the analyzed states decrease as the assimilation cycles are performed. This trend is shared by all the experiments and indicates that the assimilation progressively reduces the uncertainty of the simulated ash clouds, both forecast and analyzed. Moreover, at each assimilation cycle, the analyzed states have a RMSE lower than the forecast states. This means that the assimilation procedure results in a new state where the uncertainty on ash amount and spatial distribution is minimized with respect to the predicted state.
Results of EXP1, EXP2 and EXP3 (Panels (a), (b) and (c) in Figure 12) are similar in terms of RMSE, while a variation can be observed for EXP4, EXP5 and EXP6 (Panels (d), (e) and (f) in Figure 12). This means that similar filter performance is obtained varying the angle used to perturb the wind-field (15 • for EXP1, 10 • for EXP2 and 20 • for EXP3).
On the contrary, the observation sampling time (1 h for EXP1, EXP2 and EXP3; 30 min for EXP4 and 2 h for EXP5) plays a more incisive role in filter performance. Indeed, short observation sampling times (30 min of EXP4) force the convergence of the analyzed state toward the forecast state. This is because, after each model re-initialization, the advection/diffusion mechanisms acting on the puffs forming the cloud have a limited time window to act before a new assimilation is done. Thus, the forecast state at time t k+1 remain similar to the analyzed state at time t k and so their RMSEs. RMSEs of the observations are independent from the sampling time interval and, for this case study, are always higher than the RMSEs of the forecast states. For this reason, the forecast states have a higher influence on the assimilated states than the observations. From Figure 12 it can be seen that long observation sampling times (2 h for EXP5) allow the advection/diffusion mechanisms to increase cloud spreading and thus to reduce filter convergence rate. A different trend emerges for EXP6 for which ensemble members have been created both by perturbing the wind-field and by considering the height of the eruptive column unknown (1000, 4000, 8000 and 12,000 m above the vent). The forecast RMSE computed for the first assimilation cycle (12:30) is about 250 t. This value is one order of magnitude higher than that computed for the other experiments (11 t). However, the analyzed state resulting from the first assimilation cycles has a RMSE of 75 t, which is 70% lower than the RMSE of forecast state. This trend is maintained for the following cycles and starting from the 14:30 assimilation cycle the forecast and analyzed RMSEs are consistent with what found in the previous experiments. EXP6 shows that DA enables a strong reduction to the uncertainties deriving from the initialization of the numerical model with highly uncertain ESPs. This result is of great value for the number of applications where ESPs are unknown or difficult to estimate in real time. Indeed, while Mt. Etna is extensively monitored from the ground and from space, the majority of worldwide volcanoes are not monitored [89] and ESPs are difficult to be assessed in real time when eruptions occur. In such cases, we found that the effects of highly uncertain ESPs are limited to the first assimilation cycles only and that the filter allows a significant reduction of the RMSE leading to results which are similar to well-constrained cases (i.e., EXP1, EXP2 and EXP3). Figure 13 shows the total atmospheric ash mass loading computed for the observed, forecast and analyzed states as the assimilation cycles are performed. Satellite retrievals report about 10 kt of fine ash released during the eruption. It is worth noting that the forecast ash loadings are higher than the analyzed loadings computed at the previous assimilation cycle. This because a continuous ash emission is simulated during the experiments. The trend shared by the 6 experiments is the tendency of the analyzed ash loading to converge to the observed one as the assimilation cycles are performed. This is particularly evident for EXP6, where the forecast ash loading at the first assimilation cycle (12:30) is 253 kt. This high value is due to the creation of the ensemble members through perturbations of volcanic column height, and thus mass flow rate. Thanks to the assimilation, the analyzed ash loading at 12:30 drops to 70 kt. This value further decreases to 31 kt at 13:30 and to 22 kt at 14:30. At the end of the assimilation experiment (18:30), the analyzed ash loading is 12 kt, which is a value in line with the 10 kt retrieved from space. This example highlights that sequential EnKF does not need a precise initialization of the numerical model. It is the filter that calibrates the analyzed state considering the uncertainties on observations and numerical predictions. 12 Figure 13. Total atmospheric ash mass loading of observed, forecast and analyzed states. Panels from (a-f) show the results for the different experiments ordered as in Figure 12.

Conclusions
We presented the implementation of a Data Assimilation procedure inside the VATD model HYSPLIT. The aim was developing a methodology to correct the predictions done by HYSPLIT with satellite observations of ash columnar content. To properly simulate explosive eruptions, HYSPLIT was initialized with the results of the eruptive column model PLUME-MoM. The coupling between the two codes has been completely automated and the two models allow us to simulate the release and atmospheric transport of volcanic ash clouds produced by explosive eruptions. The toolkit used to perform the assimilation is the Parallel Data Assimilation Framework (PDAF). PDAF routines were linked to the numerical model to perform the assimilation cycles in an automated way and at prescribed times (i.e., when observations are available). We used the Local Ensemble Transform Kalman Filter and tested it to track the ash cloud produced by the explosive eruption occurred at Mt. Etna on 24 December 2018. Measurements of ash columnar content done by the sensor SEVIRI were used as observations to be assimilated.
We performed different experiments (6 in total) varying both the type of parameters included in the ensembles, their respective ranges, and the observations sampling time. We showed that even the simple application of an ensemble strategy (without EnKFs) makes model predictions more similar to the observed ash cloud than those resulting from single deterministic simulations. This first result enhances the importance of ensemble-based ash forecasting.
Next, we showed how assimilation of satellite data further improve ash monitoring and forecasting. Indeed, the analyzed ash states are less uncertain with respect to both forecast and observed states and the peak in ash column amount is calibrated in agreement with what observed from space.
The ability of EnKF to reduce state uncertainty when ESPs are highly uncertain is another important feature of this kind of filter. In this way, accurate knowledge of ESPs is not mandatory for model initialization enforcing the use of EnKFs for ash forecasting of not routinely monitored volcanoes. Future research could aim at implementing variational filters for the retrieval of initial eruptive parameters from satellite observations.
Although we developed and applied the assimilation procedure to an historical eruption, Data Assimilation is perfectly suitable for real-time applications to track and forecast the advection/dispersal patterns of any ash cloud. Indeed, estimates of ash columnar content are furnished in real time by a variety of satellite sensors. For this reason, this kind of approach has the potential to greatly improve reliability of volcanic ash forecasting done by VATD models, with implications on aviation hazard avoidance. However, additional research is needed to improve the present approach in the context of aviation safety. First, while VAACs respond to volcanic crises producing 3-D ash concentration charts, the retrievals of volcanic clouds made from satellite measurements are in the form of 2-D maps. Information on cloud height and cloud thickness are needed to convert 2-D satellite maps into 3-D data. To that end, some approaches have been developed [49], but they rely on assumptions about cloud thickness, which is a parameter very difficult to estimate. Thus, additional research is needed to improve the conversion of 2-D satellite maps into 3-D charts. In this way, it will be possible to assimilate directly the 3-D ash concentration and thus to produce outcomes ready to be used for aviation avoidance. Secondly, a distinction between ash-contaminated and ash-free pixels based on concentration statistics and not on a purely binary approach would help to develop a more robust method for aviation risk assessment.
Finally, our approach could be effective also for monitoring and forecasting volcanic SO 2 clouds. Indeed, as for ash clouds, many satellite sensors can detect and measure volcanic SO 2 . Beyond SEVIRI, recent example instruments are the sensors TROPOMI [90] and EPIC [91]. In particular, TROPOMI enables the detection of SO 2 produced by passive degassing activity and low intensity eruptions. Since SO 2 is extremely dangerous for human health, the possibility to accurately forecast the dispersion patterns of such gas is of great value for volcanic risk mitigation.
Overall, we highlight that Data Assimilation aimed at monitoring and forecasting volcanic emissions (both ash and gases) can represent a real breakthrough for improving our assessment of volcanic hazards.