Using Satellite Data to Determine Empirical Relationships between Volcanic Ash Source Parameters

: Poor knowledge of dispersion model source parameters related to quantities such as the total fine ash mass emission rate, its effective spatial distribution, and particle size distribution makes the provision of quantitative forecasts of volcanic ash a difficult problem. To ameliorate this problem, we make use of satellite-retrieved mass load data from 14 eruption case studies to estimate fine ash mass emission rates and other source parameters by an inverse modelling procedure, which requires multidimensional sampling of several thousand trial simulations with different values of source parameters. We then estimate the dependence of these optimal source parameters on eruption height. We show that using these empirical relationships in a data assimilation procedure leads to substantial improvements to the forecasts of ash mass loads, with the use of empirical relationships between parameters and eruption height having the added advantage of computational efficiency because of dimensional reduction. In addition, the use of empirical relationships, which encode information in satellite retrievals from past case studies, implies that quantitative forecasts can still be issued even when satellite retrievals of mass load are not available in real time due to cloud cover or other reasons, making it especially useful for operations in the tropics where ice and water clouds are ubiquitous.


Introduction
Due to the hazards posed by volcanic ash on airborne aircraft, the monitoring and forecasting of volcanic ash is a key service for the aviation industry [1]. The Australian Bureau of Meteorology provides this service through the Darwin Volcanic Ash Advisory Centre (VAAC), one of nine such centers globally. It is responsible for issuing volcanic ash advisories in the volcanically active region covering Indonesia, Papua New Guinea, and the southern Philippines. The volcanic ash advisories issued by the VAACs comprise polygon coordinates that identify the regions of contaminated airspace. These advisories are qualitative in the sense that no estimates of the amount of ash in the contaminated region are provided. However, because of the potentially high economic impact of eruption events on airline operations, as for example during the high-impact Icelandic Eyjafjallajökull volcanic eruption event of 2010 [2], there has been an increasing demand for quantitative forecasts that provide information on the amount of ash in the contaminated regions. Quantitative forecasts will enable aircraft operators to better manage the risks of flying through regions of airspace with some likelihood of ash contamination.
Dispersion models, which are driven by gridded fields such as wind and rainfall obtained from meteorological Numerical Weather Prediction (NWP) models, are the primary tool for issuing volcanic ash forecasts. However, using these tools to provide reliable quantitative forecasts of airborne volcanic ash is a challenging problem. A major source of uncertainty lies in the specification of the 'source term', which includes the estimation of the total mass eruption rate, the fraction of airborne 'fine ash', the spatial distribution of ash, and the particle size distribution of the ash at the source. The uncertainty arises because the physical processes near the source are either poorly understood or require more complex and computationally demanding modelling [3] than what is available in the short time frames required for operational volcanic ash forecasting. Temporal variations of the source term, which are mainly driven by internal volcanological processes, are another source of uncertainty. Apart from these source-term uncertainties, all forecasts of ash are plagued by uncertainties in the ash transport and removal mechanisms within dispersion models. These include errors in the NWP fields [4][5][6], and parameterization of various physical processes such as sedimentation, deposition, and aggregation.
A commonly used approach to estimating mass eruption rates is based on the work of Mastin et al. [7]. In that study, data collected from nearly 30 past eruptions were used to estimate a power-law relationship between mass eruption rate and eruption height. The mass eruption rates in the case studies were estimated using field studies of volcanic tephra deposits. The empirical mass eruption rate was demonstrated to follow an approximate fourth-power relationship with the eruption height, in close agreement with theoretical predictions based on plume-rise modelling [8][9][10]. Many VAACs use such empirical relationships to estimate mass eruption rates [11]. Despite the good agreement with theory, the errors associated with these estimates can be significant, typically a factor of four [7]. In addition, these relationships predict the total mass eruption rate but do not specify the amount of mass in the size ranges considered most relevant for aviation forecasting, typically less than about 100 μm in diameter. Hence, the estimated mass eruption rate must be multiplied by a parameter, commonly called the 'fine ash fraction', to specify this amount. Mastin et al. [7] proposed a classification scheme to assign the fine ash fraction to different volcanos based on field studies of these or similar volcanoes during previous eruptions, but this approach is crude at best and does not work very well in general. Many VAACs instead use a set value for the fine-ash fraction, typically 5% [2]. Apart from the difficulties outlined above, the Mastin et al. [7] approach does not predict the correct spatial distribution of mass. In the absence of this information, the commonly used approach is to assume a uniform distribution of mass in the vertical. In addition, the particle size distribution needs to be specified and this is usually based on post-event analyses of previous eruption case studies.
A different approach towards an improved source term and reliable quantitative forecasts is to make use of satellite data. Over the last 10 years, the widespread availability of better multispectral sensors on geostationary satellite platforms like Himawari-8 has spurred the development of sophisticated remote-sensing algorithms for volcanic ash [12][13][14]. These algorithms, in addition to being able to better detect volcanic ash, enable the retrieval of volcanic ash cloud properties such as cloud top height, mass load and particle size information. This quantitative ash cloud data, particularly the mass load, combined with various inverse modelling methods [15][16][17][18][19][20][21][22][23] enables the estimation of the source mass emission rate, including the appropriate vertical distribution of mass, that needs to be specified for dispersion models. In this approach, the source mass emission rate does not require multiplication by the fine ash fraction because the retrieved mass loads are predominantly associated with the fine ash component and not the whole spectrum of particle sizes. This becomes clear if we consider that the retrievals have a typical upper effective radius cut-off of about 15 μm [14], which implies that particles less than 50 in radius (or equivalently, 100 in diameter) contain about 90% of the total mass within a retrieved pixel at this upper retrieval limit, assuming a log-normal number distribution with a geometric standard deviation of 2.1; the mass proportion in this particle size range will be even higher when the effective radius is significantly less than the retrieval limit. It should be noted that this is the distal fine ash fraction. Even though there might be higher proportions of larger particles closer to the source, these will fall out quickly and are, therefore, less important for longer-range aviation forecasting. Therefore, in this paper, any reference to fine ash fraction as derived from satellite retrievals is to be understood in this sense. A drawback of this approach is that the remote-sensing algorithms are not always able to successfully detect and retrieve ash properties due to surrounding meteorological (ice and water) clouds. This problem is particularly acute in the Darwin VAAC region due to the frequent presence of convective clouds.
In this paper, we present a new approach that addresses limitations in the empirical and satellitebased approaches. The satellite-based approach has the advantage of not requiring knowledge of the ash mass fraction, but it has the disadvantage of being dependent on atmospheric conditions (such as presence of meteorological cloud cover) for its usability in operational contexts. The empirical approach, in addition to other problems outlined above, requires specification of the poorly understood fine ash mass fraction, but it has the advantage of being usable regardless of the atmospheric conditions. The new approach uses satellite retrievals obtained from the NOAA VOLcanic Cloud Analysis Toolkit (VOLCAT) software, based on the algorithm of Pavolonis et al. [14], with inverse modelling methods based on multidimensional sampling of model source parameter space [3,6,[24][25][26], to estimate the source term (total mass emission rate, spatial mass distribution, and particle size distribution) for 14 eruption case studies in the Darwin VAAC area. These source term quantities are expressed as relatively simple functions of a few parameters such as the fine ash fraction, umbrella cloud diameter, and mean particle radius. From these cases, we then estimate empirical relationships between the calculated source-term parameters and eruption height as in the approach of Mastin et al. [7]. In this way, we aim to develop a system that is usable irrespective of atmospheric conditions but is less dependent on ad hoc parameters. In addition, because several source parameters become functions of eruption height, in this formulation the dimensionality of the inverse modelling problem is greatly reduced, thereby making it better suited for operational requirements.

Ash Observations
Automated detection and retrieval of volcanic ash are obtained using the VOLCAT remotesensing software. VOLCAT provides both automated detection [27] of ash in satellite imagery as well as retrieval of ash properties such as mass load for the detected pixels [14]. As noted in Section 1, the algorithm performs retrieval for pixels with ash particle effective radii less than 15 μm. However, as discussed by Pavolonis et al. [14], this should not be taken to imply that larger particles do not contribute to the radiance measured by the satellite. The particle size distribution is assumed to be log-normal (with a geometric standard deviation of 2.1), and an ash cloud with an effective radius at the upper limit of 15 μm, for example, will have a significant portion (~60%) of its mass contributed by particles larger than 15 μm. Retrievals from 14 eruptive events (listed in Table 1), all within the tropics (13 in the Darwin VAAC region and one in the Wellington VAAC region), are used as the base data to enable the estimation of the relationships between various source parameters and eruption height. Part of this dataset is also used for forecast verification purposes. These events occurred between 2014 and 2019 and represent eruptions with top heights extending from the midtroposphere to the lower stratosphere. Events from July 2015 onward are sampled using the JMA Himawari-8 satellite; prior to that date, the older, lower-resolution MTSAT-2 was used. Additional discussion of the case studies is provided in Section 3. VOLCAT does not have a 100% success rate for detection of ash for a variety of reasons, including substantial meteorological cloud cover [28]. However, we have specifically chosen case studies with successful detections of ash over most of the cloud within a time window of at least a few hours in this study. Table 1. Case studies arranged by decreasing eruption height. Also shown are estimated eruption start times (in the format year/month/day/hour) and time windows (relative to the start time) used in multidimensional inverse modelling (see Section 4), low-dimensional data assimilation (see Section 5), and forecast verification (see Section 5). For Rinjani I and II, the eruption is continuous, so the start time is somewhat arbitrary, as explained in more detail in Section 3. The roman numbers indicate that different eruption phases of the same volcano and S. Api is an abbreviation for Sangeang Api.

Dispersion Model
The hybrid single particle Lagrangian integrated trajectory (HYSPLIT) model [29,30] is used to simulate the dispersion of ash from the source. In this study, HYSPLIT was driven by an ensemble of 24 NWP fields from the ACCESS-GE global model [31], which has a timestep of 3 h and a resolution of about 60 km in the tropics. As well as dispersion by gridded wind fields and parameterised turbulence, the bureau's version of HYSPLIT has a dry deposition scheme which removes particles from the atmosphere by sedimentation, and eventual surface deposition, using the Ganser fall speed formulation [32] with a shape factor of 0.8 and a wet deposition scheme.

Source Term in Reference Runs
The source term is a cylinder of particles, in diameter, with its base at the volcano summit, , and top (eruption height), , initiated at time for a duration . In the standard setting, which is used to provide reference data by which we evaluate model improvements in the rest of this paper, is set to 20 km, is determined mainly by VAAC reports of the eruption height; and are estimated by a combination of VAAC reports and manual inspection of satellite data. The mass emission rate of fine ash (in kg s -1 ) is determined by the empirical Mastin relationship ̇= ̇, where ̇= 140.8( − ) . (with and in km) and is the fine-ash fraction, which is specified to be 5% [2]. The distribution of mass within the eruption column is assumed to be uniform. The particle size distribution is based on the distribution obtained by Hobbs et al. (1991) [33] and comprises a continuous spectrum of particle radii, , in the range 0 < < 50 m, which defines the particle size range considered most relevant for aviation forecasting. The emission rate is assumed to be a step function in time; it is constant for a duration and zero thereafter. This simple source formulation is similar to what is currently used at the Darwin VAAC and in many other volcanic ash advisory centres.

Source Term in Experimental Runs
In the experimental runs, which are the focus of this paper, we introduce a crude representation of an umbrella cloud. The source is a still cylinder of particles; however, the diameter, , of the cylinder represents the size of the umbrella cloud at the time that wind-driven dispersion becomes the dominant ash transport mechanism. The initialisation time is at + ∆ rather than the start of the eruption, . The time lag ∆ is the time taken for the umbrella cloud to reach a diameter of . Prior to the initialisation time, it is assumed that the umbrella cloud grows axisymmetrically at a rate determined by ∆ and . In general, the diameter of the ash column is less than the diameter of the umbrella cloud and should be initialised at rather than + ∆ , but here we choose to ignore this distinction to minimise the number of free parameters and errors arising from initialising the dispersion model from a small area due to NWP field errors. The emission duration is , relative to + ∆ rather than as in the reference runs. The source mass emission rate of fine ash particles for a given height ℎ and particle radius can be written as: where (ℎ) in Equation (1) describes the vertical mass distribution: which comprises a uniform emission rate denoted by , describing the eruption column, and a variable emission rate describing the umbrella cloud, which is chosen to be Gaussian with centre and standard deviation . In Equation (2), we define the width of the umbrella cloud = 4.29193 , which corresponds to full width at one-tenth maximum of the Gaussian distribution. For convenience, we also define the ratio of the mass emitted from the umbrella cloud to the mass emitted from the eruption column as = √2 ( − ) and deal with rather than hereon. This expression can be derived by integrating in the vertical to obtain the total mass emission rate for both the umbrella cloud and eruption column and dividing the two expressions, assuming that << − and << − . This interpretation of strictly only applies to the larger eruptions considered in this study. For smaller eruptions, which do not satisfy this condition, the mass ratio is in general very small as most of the mass is contained within the eruption column, and will be an overestimate of the mass ratio.
In addition to improving the umbrella cloud representation of the source, in the experimental runs the particle size number distribution is assumed to be described by a log-normal distribution with a variable mean geometric radius, . The particle size mass distribution is given by: Where in Equation (3) is the geometric standard deviation, which is fixed to a value of 1.5, and R = 50 μm is the maximum particle radius retained in the distribution [3]. The total mass eruption rate, ̇, in Equation (1) is given by the Mastin relationship as in the reference runs.
In summary, the parameters required to construct the improved source term are , ∆ , , , , , , and . In the following section we describe how this can be done by inverse modelling.

Inverse Modelling by Multidimensional Optimisation
The above source parameters can be estimated for given observations of ash by using the inverse modelling approach of Zidikheri et al. [3,6,[24][25][26]. The method is based on sampling the uncertainty bound of each parameter to form a large parameter grid (in hyperspace) given an initial estimate of the parameter bounds. Each point in this parameter grid (except for , as further discussed below) represents a possible configuration of the dispersion model source term, and each of these possible source configurations is used to run the dispersion model forward in time. The resulting model output is compared against observations within a specified time window (determined by the availability of observations) by using the pattern correlation metric [24]. We refer to these simulations as trial simulations in this paper. The inverse modelling algorithm then selects the best-performing trial simulations to form the forecast ensemble. The optimal pattern correlation cut-off, above which a given trial simulation is included in the forecast ensemble, is determined by choosing a range of different cut-offs and selecting the cut-off that maximises the Brier skill score ( [6] and references therein) of the forecast ensemble within the analysis time window.
Because the inverse modelling method relies on a grid search algorithm, the number of forward trial runs grows exponentially with the number of free parameters, and this limits its ability to handle the number of parameters required to completely solve the inverse problem posed above. This is exacerbated by using an NWP ensemble-such as ACCESS-GE-to drive the dispersion model because each source parameter configuration needs to be run with all the meteorological ensemble members. In this study, we ameliorate these problems by firstly limiting the number of grid points per parameter and secondly by dividing the problem into two parts, namely, the detection problem and the retrieval problem.
The detection problem involves optimising the parameters relative to the ash detections obtained from VOLCAT. In practice, this simply means transforming the VOLCAT mass load retrievals into a new binary-valued field, which has values of 1 where the mass load is greater than zero (and 0 elsewhere). The mass loads from the trial simulations are similarly given a value of 1 where there is any ash and 0 elsewhere to create a new binary-valued simulated detection field. The source parameters are then optimised with respect to these binary-valued fields. This approach has been used successfully in the past to estimate parameters such as eruption height [24][25][26]. Using this approach, we can initially disregard the source parameters that describe the total mass eruption rate and its vertical distribution, namely , , , , and , because they do not affect the simulated ash detection field. These parameters only change the amount of ash at a given location. The only parameters that affect the location of simulated ash are , ∆ , , and (as well as the meteorological ensemble members). Having used the ash detections to reduce the number of possible grid points for the inverse problem, the remaining grid points (optimal with respect to detections) are then combined with the grid points formed by sampling the parameters , , , and (the parameter is computed as described further below). This new parameter grid is then optimised relative to ash mass load retrievals; this forms the so-called retrieval problem. Splitting the problem this way is computationally more efficient because we do not have to sample the whole parameter space spanned by the unknown parameters.
The ash mass that is retrieved from satellite data is only a fraction of the total mass emitted from the volcano because it is only the fine ash that remains in the atmosphere at the time scales of hours or longer and can be detected by remote-sensing algorithms. This detectable component is given by ̇= ̇, where ̇ is the total mass eruption rate as estimated from the Mastin et al. [7] empirical relationship, and is the fine ash fraction, which is determined by the inverse modelling procedure. Note, however, that is just a scaling factor and has no effect on the pattern correlation. It is computed by scaling each of the trial simulations so that they have on average the right magnitudes relative to the ash mass load retrievals; that is: where Λ ( ) is the spatial mean of the observed (retrieved) mass loads at time , Λ ( ) is the spatial mean of the trial simulation mass loads, and is the number of temporal observations within the analysis time window. It should be noted that defining the fine ash fraction in this way in Equation (4) implicitly assumes that the Mastin relationship is accurate. If the Mastin relationship is in error, then accounts for both the fine ash fraction and the error in the Mastin relationship.

Deriving Empirical Scaling Relationships between Source Parameters and Eruption Height
The inverse modelling algorithm as described above requires the sampling of a ninedimensional parameter grid (eight source parameters plus the space formed by the meteorological ensemble members). This generally requires several thousands of trial simulations and is, therefore, not practical for operational volcanic ash forecasting, where timely forecasts are essential. In addition, the availability of satellite retrievals is strongly dependent on atmospheric conditions such as the presence of (water and ice) clouds and so, on the face of it, the method is not very reliable for operational use, especially in the tropical region covered by the Darwin VAAC. Therefore, we firstly seek to find empirical relationships between various parameters and eruption height using past eruption case studies. Having established the approximate dependence of the source parameters to eruption height, the optimisation problem is reduced to finding optimal eruption heights and NWP ensemble members only, which is much less computationally expensive, and therefore suitable for operational requirements. In addition, because eruption height can be estimated even without satellite retrievals [25,26], the method could still be used even when no real-time satellite retrievals are available.
Motivated by the known approximate fourth-power relationship between mass eruption rate and height (relative to the summit), we seek similar power-law relationships for the other source term parameters (except , for reasons explained below) as follows: where in Equation (5) is the i-th element of the vector = (Δ , , , , , , ) , comprised of the different source parameters; is a multiplicative error factor, which arises both due to sampling error and the fact that a simple power-law relationship between model parameters and eruption height is only a crude approximation at best (the error is considered as multiplicative, rather than additive, because some parameters, such as mass eruption rate, span several orders of magnitude and, therefore, the errors should scale accordingly); the superscript represents different case studies, in number ( =1,2,3.. ), obtained by considering a range of volcanic eruptions chosen to span a wide range of maximum eruption heights; the superscript represents different forecast ensemble members ( = 1,2,3.. , where is the total number of forecast ensemble members yielded by the inverse model, and is different for each case study, ); and are unknown coefficients. We have not included the eruption duration, , in the vector of source parameters above and instead assume it is an independent variable whose value must be estimated by inspection of satellite imagery, or other means, at the time of a volcanic eruption. The coefficients and are calculated by first converting (5) into logarithmic form, . The coefficients and may be evaluated by linear regression (that is, minimisation of ∑ ∑ ). However, to avoid the problem of dealing with data points with errors that might not be completely uncorrelated (due to the presence of the clusters of data points from the same case study), we first calculate the ensemble mean of each term in (6): where 〈 〉 = ∑ , 〈 〉 = ∑ , and 〈 〉 = ∑ . The error, 〈 〉, in Equation (7) is then less likely to be significantly correlated for different values of (for a given parameter, ), as each case study ( ) utilises different observations of mass load. Prior to regressing the data points (〈 〉, 〈 〉), we also need to consider that the data points are associated with different degrees of uncertainty for the different case studies because each case study is associated with different numbers of model ensemble members, degree of agreement with observations, and observation times relative to the initialisation time. Therefore, it is appropriate to first weight the data points so that: Where in Equation (8) = 〈 〉, = 〈 〉, and = 〈 〉. The weights are prescribed as: Each weight, as indicated in Equation (9), is directly proportional to the product of the ensemble mean pattern correlation value, 〈 〉, and the mean analysis time, ̅ , relative to the eruption start time . This ensures that parameter values associated with high pattern correlation values receive greater weight in the regression. Furthermore, it ensures that case studies with high pattern correlations obtained from using observations at later times, relative to the start time, receive greater weight than case studies utilising only observations soon after the eruption start time. The reasoning behind this is that pattern correlation values typically fall with increasing time as the dispersion pattern usually becomes more complex with increasing time; this must be accounted for when regressing data points based on observations at different times relative to the start time. Each weight is also inversely proportional to the ensemble standard deviation Ω from the mean parameter value 〈 〉, which is another measure of the degree of uncertainty of the inversion. This ensures that mean parameter values associated with small standard deviations receive greater weights than parameter values with large standard deviations. The coefficient ∈ {0,1} is a binary variable. It enables some case studies ( ) to be removed entirely from the regression for certain parameters ( ) (by setting = 0). This point will be discussed in more detail in Section 4.
The parameters and are estimated by linear regression from Equation (8), as shown in Appendix A. Although we estimate the parameters by linear regression of the weighted ensemble means, it is also useful to know the statistics of the errors in the ensemble members due to linear regression of the ensemble means (the statistics of the parameter in Equation (5)). These are described by the error covariance matrix, , whose elements are given by: Where in Equation (10) = ∑ ∑ .
The diagonal elements of contain the variances of the parameter errors, so the standard multiplicative errors (in normal space, not logarithmic) for each parameter are given by:

Application of Multidimensional Inverse Modelling to Selected Case Studies
The eruption case studies that are used in this paper for the purpose of estimating relationships between various source parameters and eruption height, as outlined in Section 2, are listed in Table  1. These case studies are discussed more comprehensively in the supplementary materials. Table 1 also lists the start times and observational time windows used in the multidimensional inverse modelling procedure for estimating parameter values. In this section, we shall also compare the performance of the reference runs, which employ a simple source formulation without source optimisation, with experimental runs. The experimental runs employ a more complicated source (with a crude representation of umbrella cloud effects) that is optimised with respect to VOLCAT satellite retrievals; these results are summarised in Table 2. The alternate value is based on CALIPSO imagery for Kelut (Case Study 1) and pilot reports for S. Api I and Rinjani III; the rest are based on brightness temperature analyses. All heights are relative to mean sea level. The mass loads shown are approximate peak values that we have extracted from the plots of retrieved and simulated values. In the latter, we have removed any grid points with particle effective radii greater than 15 m, the upper limit of the retrievals, in order to facilitate a better comparison between the two. The results in Table 2 show that the eruption height estimates based on inverse modelling agree quite well with Darwin VAAC estimates in most cases, albeit with a slight tendency towards higher values. In cases where there are significant discrepancies, further analysis using Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) overpasses (Kelut), pilot reports (Sangeang Api I, Rinjani III), and satellite brightness temperature imagery (Tinakula, Merapi, and Agung) reveals that the correct eruption height is likely to be closer to the inverse modelling estimates. Table  2 also reveals that the use of inverse modelling improves the simulated mass load estimates. In the reference runs, the mass loads are significantly higher than those in the retrievals, in some cases by more than an order of magnitude. This is especially true for the stronger (higher level) eruptions. This phenomenon was also noted by   [3].

Case
The application of multidimensional inverse modelling to all the 14 eruption case studies reveals significant improvement to forecast performance relative to reference runs without optimised source parameters using metrics such as mass load values, pattern correlations, and Brier skill scores. However, as noted in Section 2, because the number of trial simulations grows exponentially with the number of source parameters, it is very computationally intensive and therefore not suitable for implementation in operational contexts. We instead use the inferred parameter values to estimate relationships between various parameters and eruption height, which is done in Section 4, with the aim of constructing more computationally efficient data assimilation schemes, as demonstrated in Section 5.

Estimating Relationships between Source Parameters and Eruption Heights
Here we focus on representing the source term parameters obtained in the inverse modelling procedure, described in Sections 2 and 3, as power-law relationships with respect to eruption height above summit, − . The coefficients and defining the power-law ( − ) for each parameter are obtained by linear regression of the (log-transformed) data points as described in Section 2, and are shown in Table 3. The variation of the parameter values with eruption height are shown in Figures 1 and 2. The empirically derived curves (solid blue) and the error margins (dashed and dotted blue) are shown in both figures. It should be noted that the error margins are based on Equation (12). They describe the average deviations of the parameter ensemble members relative to the optimal fits, and not just the deviations of the ensemble means from the optimal fits, which are smaller in magnitude. The weights employed for each data point (Equation (9)) are shown in red. As alluded to in Section 2, some data points have been assigned a weight of 0 for some parameters (by setting = 0 in Equation (9)). These include the parameters and in Rinjani I and II because in those case studies the simulations were initialised within a time interval of 24 h prior to the assimilation time window indicated in Table 1 rather than the start of the eruption (as further discussed in the supplementary section) and, therefore, and in the sense of Section 2 could not be determined. In addition, the data points for the parameters , , and , which are related to the vertical mass distribution, have also been assigned weights of 0 for small eruptions with top heights less than about 5 km above the summit because this height range was deemed too small to accurately determine the correct vertical distribution of mass. Table 3. Source parameters determined by empirical power-law relationships in this study. The top four data rows show the power law coefficients and obtained for different source parameters and their respective standard regression errors. The bottom seven data rows show the coefficients of the covariance matrix as calculated in Equation (10) with the diagonal elements shown in bold. The units of the source parameters are also indicated ( and are dimensionless ratios and is in km above the summit). The independent variable − in the power-law relationships is measured in km. The results in Table 3, Figure 1a, and Figure 1b show that the start-time time lag, ∆ , grows slowly (power of ~0.7) with eruption height while the source diameter, , grows approximately linearly (power of ~0.95) with eruption height. In a qualitative sense at least, both results appear to be consistent with gravity-current driven flows during the early stages of large eruptions. In the limit of small eruptions ( → ), the parameterised ∆ and approach small values (3.8 min and 4.8 km, respectively), which are not significantly different from the settings used in many volcanic ash models (i.e., line source initialised at the eruption start time). The effects of the empirical power-law relationships for these two variables are, therefore, only significant for large eruptions in which both ∆ and are substantially larger than used in the default settings. For the largest eruption in this study (Kelut, 13 February 2014) the predicted start-time lag was over 30 min and the predicted umbrella cloud diameter nearly 100 km. The error margins are factors of 1.44 and 1.41 for Δ and , respectively, which indicates that the empirical relationship predicts the correct parameter values with reasonable skill.

Parameter
The coefficients associated with the parameters describing the vertical distribution of mass at the source also appear to be consistent with the physical picture of umbrella clouds. The centre of mass of the umbrella cloud, , grows with eruption height (albeit quite slowly, with a power of only ~0.59) while the depth of the umbrella cloud, , falls rapidly (power of ~−1.5) with increasing eruption height. In addition, the ratio of mass in the umbrella cloud to the mass in the eruption column, , grows approximately linearly with increasing eruption height. Therefore, the cumulative effect of these relationships is that for large eruptions, the vertical mass distribution is tightly confined within the altitudes spanned by the umbrella cloud while for small eruptions the distribution is approximately uniform (in the limit → , ~ 94 km, which is in effect a uniform distribution). As for ∆ and , the effect of these new parameterisations, therefore, would only be significant for large eruptions. The error for is a factor of 1.08, which is smaller than for the other parameters, indicating relatively high predictive skill for correct parameter value (see Figure 2a). The errors for and are however large, being factors of 6.76 and 6.38, respectively, and therefore we expect limited predictive skill for these parameters using the empirical relationships (see Figure 2b and Figure 2c). The large errors probably reflect the fact that the vertical mass distribution is much more complex than the simple Gaussian function representation that we have employed. They could also be related to the fact that, for the smaller eruptions, both large values of and small values of result in a nearly uniform distribution, which leads to considerable scatter in the values obtained by inverse modelling. and (c) mean particle radius, . Also shown are model-derived ensemble mean values (dots), optimal fit (solid blue), optimal fit multiplied by standard error factor (dashed blue), optimal fit divided by standard error factor (dotted blue) and prescribed normalised weights (solid red). Table 3 also indicate that the fine ash fraction, , falls approximately linearly with eruption height. Unlike the five parameters described above, whose characteristics can at least qualitatively be linked to the growth of a gravity-wave driven umbrella cloud, it is not completely clear why this is the case. There are other studies that have hinted that there is indeed an inverse relationship between eruption height and fine ash fraction. For example, Tupper et al. [34] found by running a high-resolution three-dimensional plume model that tropical volcanic eruptions result in the ash plume rising higher than suggested by the mass eruption rate, but with lower fine ash content. Gouhier et al. [35] also found that very strong eruptions have relatively small fine ash content by a using satellite-based methodology for estimating the fine ash fraction. The derived empirical relationship indicates that for small eruptions the fine ash fraction approaches 7% while for large eruptions, e.g., reaching tropopause height or 16 km in altitude, it is less than 1%. These results are consistent with previous studies [3,6], which found that for large eruptions (such as 13 February 2014 Kelut), the mass loads were vastly overestimated upon employing the default value of 5% for the fine ash fraction. The error margin for (and therefore in the fine ash emission rate) is a factor of 2.41, which is quite large (Figure 2d). However, we should put this uncertainty in the context of the commonly used Mastin scheme [7]. The error in the Mastin relationship is at least a factor of 4 [7]. In addition, in the Mastin approach, one must also specify the fine ash fraction, which is associated with significant uncertainty [2]. The estimate of the fine ash emission rate in this approach is, therefore, a significant improvement over the Mastin scheme in this regard. The physics behind the parameterisation of the mean particle radius, , is also not clear. It appears to grow slowly (power of ~0.59) with eruption height. Given that most of the VOLCAT mass load retrievals continue for at most a few hours past the commencement of an eruption in most case studies, it is very difficult to extract a reliable signal related to particle size distribution from the mass load retrievals because it typically takes several hours for the smaller particles to fall significant distances. However, the 13 February 2014 Kelut eruption (the data point with the highest − ), seems to be an exception, with a clear signature of particles falling to lower levels [3] and that together with the relatively large computed weight may explain the slight bias towards large particle sizes in the results shown in Table 3 (implied by the power of 0.59 variation with eruption height). The error is a factor of 6.60, which indicates that the predictive skill to be gained from this empirical relationship is limited, however (Figure 1c).

A Computationally Efficient Data Assimilation Scheme Based on Empirical Source Relationships
In the context of this study, data assimilation refers to the use of satellite data to optimise volcanic ash forecasts. It comprises an analysis phase in which the inverse model is used to find optimal model parameters within a relatively short time window in which observations are available (usually from the start of the eruption up to the time at which the forecast is issued), and a forecast phase in which the optimal parameters are used to propagate the models further forward in time. In both phases, an ensemble approach is pursued, with uncertainties in the source parameters and different ensemble NWP realisations being used to generate the ensemble members. A principal aim of this study is to investigate the viability of a more computationally tractable data assimilation scheme, which does not require high-dimensional optimisation (that is, many free parameters). Another principal aim is to investigate the viability of providing quantitative forecasts of ash mass load even when satellite retrievals are not available in real time. These aims seek to address two principal constraints in an operational environment, namely the requirement for timely forecasts to provide end users with sufficient time to act on available information, and the challenging tropical environment of the Darwin VAAC in which satellite retrievals are frequently not available due to the frequent presence of ice and water clouds.
The development of empirical relationships between various source parameters and eruption height in Section 4 is crucial because it enables the search space for model parameters to be greatly reduced when satellite retrievals are available in real time. Instead of independently searching for optimal parameters in nine-dimensional space, requiring thousands of trial simulations, the problem is reduced to finding the optimal eruption height, which requires only a few hundred trial simulations (in the order of 10 simulations for each of the 20-40 meteorological ensemble members available). In the case when satellite retrievals are not available, but ash can still be delineated from surrounding clouds, the eruption height may be estimated using inverse modelling as shown in other studies [24][25][26] and the empirical source term relationships used to provide quantitative forecasts of mass load (it should be noted that in contrast to Section 2, in which we treat any simulated ash as part of the detection field, here we impose a lower mass load limit to detections, which we take to be 0.1 g m -2 in order to better account for the variety of situations in practice such as long-lived eruptions in which some of the ash might have dissipated to quite low levels and, therefore, not be visible in satellite imagery or be detectable by automated algorithms such as VOLCAT). Finally, even when ash cannot be delineated from surrounding clouds, provided that an estimate of eruption height is available by other means, such as ground observer or pilot reports, the empirical relationships may still be used to provide quantitative forecasts of ash mass load. We show in this section how these various estimates of mass load in different operational contexts compare with estimates of mass load in reference runs, which utilise default settings for the source term as currently used operationally.
Ideally, we should use new case studies for this verification exercise. Unfortunately, identifying case studies with good retrievals, particularly in the tropical atmosphere, is a manual, time-consuming, process and it was not possible to pursue this approach in this study. Instead, we use the same case studies used in the training phase for forecast verification. However, here we divide the satellite retrieval data into two parts, namely analysis data and forecast verification data as shown in Table 1. The analysis data are for the most part the same data that were used in the training phase with multidimensional inverse modelling (although there are some differences in case studies with short time windows in which retrievals were available, as shown in Table 1). In this case, the analysis data are used to perform the two-dimensional optimisation of height and meteorological ensemble members instead of the computationally intensive nine-dimensional optimisation, with empirical source relationships being used to determine the other source parameters as functions of eruption height. On the other hand, the forecast verification data, for the most part, comprise retrievals that were deemed incomplete in some sense because they only covered a fraction of the ash cloud (the rest of the ash cloud did not yield retrievals either because the ash was too optically thin, or too optically thick, or was obscured by cloud, or some other reason) and, therefore, not suitable for the training phase. These data, which in all cases comprise retrievals at times greater than the times used for the analysis time window, are used for verification purposes instead. Hence, even though the same case studies are used here as in the training phase, for the most part the forecast verification segment of the data were not used in the training phase. It is quite likely, therefore, that the forecast performance here will generally be indicative of the performance of the (low-dimensional) system in general because of this division of the dataset.
In the analysis phase, we envisage three operational scenarios as far as observations are concerned, namely that full retrievals are available, only detections are available, and no retrievals or detections are available. In the first two scenarios, the eruption height and meteorological ensemble members are optimised with respect to available observations (retrievals and detections, respectively) while in the last scenario there is no optimisation and the a priori eruption heights in Table 2 (in brackets) are used instead to determine the source parameter values from the derived empirical relationships. We calculate Brier skill scores for each scenario as shown in Figure 3. The skill scores for the analysis phase are shown in Figure 3a while the forecast phase skill scores are shown in Figure 3b. In the forecast phase we use available retrievals to calculate the Brier skill scores in all three scenarios. Brier skill scores greater than zero imply improvement relative to reference runs (see Section 2). In a broad sense, we can see that there is overall improvement irrespective of which observation type is used. However, there are some differences between the runs, which are also evident. In the analysis phase, optimising with respect to the retrieved mass loads leads to superior results, which is not a surprising result since that is what the algorithm is designed to do. What is encouraging, however, is the extent to which optimising with respect to ash detections only leads to better agreement between simulated mass loads and retrievals; in fact, in a few cases, the detectionoptimised results are better than the results with retrieval optimisation. Employing the empirical source parameter relationships without optimising the eruption height also leads to improvement in all but three of the 14 case studies. These improvements are more modest, however, even in the 11 cases where the Brier skill score is positive. In the forecast phase (in which the retrievals are used only for verification and not to optimise the simulations in any way), the retrieval-optimised runs are still overall superior in the majority of cases, but the detection-optimised runs are actually significantly better in three cases and not much worse in about four or five cases.
The runs not employing optimisation in the analysis phase (but employing empirical source relationships) are generally slightly poorer compared to the optimised runs in both phases. However, they perform better in the forecast phase than in the analysis phase. This is evident if we consider that in the analysis phase the non-optimised runs are inferior to the optimised runs in all cases while in the forecast case there are at least three cases where they are superior to at least one of the optimised runs. Moreover, in the forecast phase the non-optimised runs are always superior to the reference runs, but this is not the case in the analysis phase. The reason for this is not completely clear. However, the results do demonstrate that the empirical relationships can improve the forecast skill even when observations are not available.  Table 1) for (a) analysis and (b) forecast phases in the low-dimensional optimisation runs. The results for the runs utilising retrievals are shown in red, the runs utilising detections only are shown in blue, and the runs without optimisation are shown in yellow (for which, in practice, there are no distinct analysis and forecast phases). Brier skill scores greater than zero imply improvement relative to reference simulations, which are chosen to be hybrid single particle Lagrangian integrated trajectory (HYSPLIT) runs with default source settings. Figure 4a shows the VOLCAT retrievals in the 13 February 2014 Kelut case study, which we take as representative of the larger magnitude eruptions considered in this paper, at 0530 UTC on 14 February. As also noted by   [3], VOLCAT retrievals covering most of the observed ash cloud were available in the period 13/2330-14/0230 UTC. After this time, however, only partial retrievals, which do not include most of the areas covered by the umbrella cloud (indicated by the red arrow), were available. There is enough data, however, to perform a verification study for the forecast phase even without the umbrella cloud. The mass load forecast obtained from the reference run, comprising the default source term without optimisation, is shown in Figure 4b. We can see that the ensemble mean load in the reference forecast is overestimated by at least an order of magnitude in the region where retrievals are available. The tendency of mass loads in large eruptions to be overestimated (and erroneously distributed) was also noted by   [3], and this continues to be the case here. The forecasted ensemble-mean mass loads for the run employing VOLCAT retrieval-optimised source parameters in real-time are shown in Figure 4c. The agreement between the ensemble mean loads and the VOLCAT retrievals is substantially improved over the reference run, both in terms of magnitudes and spatial distributions. The forecasts employing satellite detections only to optimise the ( Figure 4d) and empirical source relationships without optimisation (Figure 4e) also show similar improvements. These results are encouraging because they demonstrate that the empirical relationships between the source term parameters and eruption height, as derived in this study, do encode useful predictive skill since retrieval data at this stage of the eruption was not in any way included in the statistical analysis used to derive these relationships, as alluded to above.
VOLCAT retrievals and forecasted mass loads for the May 2019 Agung eruption, taken to represent typical smaller-sized eruptions, at 24/1500 UTC, about 3.5 h after the eruption, are shown in Figure 5. The VOLCAT retrieval at this time ( Figure 5a) shows a fairly localised band of ash to the south-west of the volcano with mass loads less than 2 g m −2 . The ensemble mean mass load from the reference forecast (Figure 5b) predicts the magnitude and general movement of the ash quite well, but is more spread out spatially, with a significant component moving more slowly to the south-west and, therefore, still over land at this time. Both types of optimisations (retrieval-based in (Figure 5c) and detection based in (Figure 5d)) are better able to capture the location of the ash, with a smaller portion being over land even during the forecast phase (about one hour after the latest observation was assimilated-see Table 1). This trend continues until at least 1700 UTC, after which there were no more useful retrievals to verify against. The run employing empirical source relationships but without optimisation (Figure 5e) does not appear to be much better than the reference run in this case. This suggests that in this case most of the improvement in the optimised runs (with respect to the reference run) originate from the optimisation rather than the use of the empirical relationships.

Discussion
In this study we have compiled a dataset of NOAA's VOLCAT mass load retrievals from 14 different eruption events that were chosen so that the dataset spanned a wide range of eruption heights. For each event, the inverse modelling procedure of Zidikheri et al. (2018) [6] was used to estimate the range of possible values of eight source parameters, namely the eruption height, , the start-time lag, ∆ , the effective umbrella cloud diameter, , the umbrella cloud centre of mass, , the umbrella cloud depth, , the umbrella cloud to eruption column mass ratio, , the fine ash fraction, , and the mean particle radius, , subject to the satellite retrievals. The average value of each parameter for each case study was then fitted to a power-law relationship with respect to the average eruption height. The start-time lag, ∆ , and the umbrella cloud diameter, , describe the horizontal extent of the ash source. We found that both the start-time lag and the effective umbrella cloud diameter increase with eruption height, which is consistent with the physics of umbrella clouds. Because the start-time lag is simply the time difference between the start of the eruption and the time at which wind-driven transport becomes dominant, we expect the time lag to increase for large eruptions with strong gravity currents. Similarly, we expect the umbrella cloud diameter, , at this time to be greater for bigger eruptions than for smaller eruptions because of the stronger gravity current.
The three parameters, , , and , described the vertical distribution of mass of the source. We found that increased with eruption height, which is consistent with plume modelling results. The umbrella cloud centre of mass, , is expected to be at the level of neutral buoyancy and this will increase with the size of the eruption. Different models predict values ranging from 0.5 to 0.8 for this parameter [35], which is broadly consistent with the values of 0.86 , 0.65 , and 0.55 obtained from the derived power-law relationship in Table 3 for = 10, 20, and 30 km, respectively.
The umbrella cloud 'depth', , was found to decrease with increasing eruption height and the umbrella to eruption column mass ratio, , was found to increase with eruption height. These results are to an extent a consequence of our modelling choice. We effectively employ two cylinders, one representing the eruption column, and the other representing the umbrella cloud. Large eruptions with large umbrella clouds will have most of the ash mass within the umbrella cloud by the time wind-driven ash transport becomes dominant and, therefore, we expect , the mass ratios of the two cylinders, to increase with eruption height. In addition, the mass distribution in the `umbrella-cloud' cylinder needs to be tightly confined around , the neutral buoyancy level, to properly model the effect of the umbrella cloud. For smaller eruptions, the umbrella cloud effect is not as strong and, therefore, wind-driven ash transport is the dominant mechanism throughout the length of the eruption column. Therefore, we expect the mass distribution of the source to be more uniform, consistent with larger values of the umbrella-cloud `depth', .
The last two parameters, and , are related to the microphysical properties of the ash. The fine ash fraction, , describes the fraction of the total emitted ash that remains airborne at time scales at which wind-driven dispersion is dominant. For the most part, this will comprise smaller ash particles as larger particles will be rapidly deposited to the surface. We found that decreases with increasing eruption height. The reason for this is not completely clear although such an effect has been suggested by other studies [34]. One explanation is that in large eruptions, particularly in the moist tropics, there are significant amounts of ice embedded within the ash particles, which somehow enhances the fallout of ash particles, therefore reducing the amount of airborne ash. We also found that the mean particle radius, , increases with eruption height. This might similarly be related to the complex interaction between ash and ice particles. However, testing this hypothesis was outside the scope of this study.
The empirical scaling relationships derived from the compiled dataset can then be used to construct a more computationally efficient data assimilation procedure. Instead of employing thousands of dispersion model simulations to sample multidimensional space for optimal parameter values, the problem is reduced to sampling just the eruption height and the space of possible meteorological fields (provided by the ensemble NWP fields). This is particularly advantageous in an operational setting where fast algorithms are crucial to enable end users maximum time to respond to the issued forecast. The scaling relationships can also be used to provide forecasts of mass loads in situations where satellite retrievals are not available in real time insofar as estimates of eruption height are available. We have shown that this low-dimensional sampling technique, with or without real-time satellite retrievals, leads to significant improvement to the mass load forecast skill, and presumably also to the ash concentration forecast skill.
So far we have not explicitly discussed how temporal variations in eruption strength could be handled within the framework considered in this paper. Most of the eruptions considered here were relatively short-lived and could, therefore, be considered as having constant strength for the duration of the eruption. For long-lived eruptions, such as the November 2015 Rinjani eruptions, we have also assumed that the eruption strength was constant, which was a reasonable first approximation in that case. Clearly, for long-lived eruptions with significant variation in strength, however, this approach might not work very well. A practical solution to this problem is to estimate the eruption height, ( ), as a function of time from a series of satellite brightness temperatures. These numbers could be input into the model as initial guesses and then optimally rescaled as ( ) = ( ) , with the rescaling parameter taking the place of (the time-invariant) in the inverse modelling (analysis) phase described in Sections 2 and 5. The other source parameters considered in this paper would of course be estimated as functions of ( ) and also become functions of time.
The case studies used to derive the empirical scaling relationships in this study were all chosen to be in the tropics. That means that these relationships implicitly account for typical atmospheric conditions in the tropics and will work best in those regions. To make these relationships applicable outside of the tropics, the coefficients in Table 3 will have to be recomputed using case studies relevant to the region of interest. A natural extension of this study would be to investigate to what extent the predictive skill of the source parameters could be improved by considering not only their dependence on eruption height but also on local atmospheric state variables such as wind, temperature, and humidity. This would make the empirical relationships more generally applicable and possibly more accurate. Another aspect that could be improved in the future is the representation of various source quantities in our model. For example, scaling relationships that relate umbrella cloud diameter to eruption strength and time [36] could be used to improve the crude umbrella cloud representation in this paper. The empirically derived scaling relationships derived in this study have significant errors, particularly for the umbrella cloud depth, mass ratio, and particle size distribution as discussed in Section 4 and readily seen in Figure 1 and Figure 2. This implies in its current form that the search space in our low-dimensional inverse modelling algorithm might be too narrow. In subsequent publications, we shall demonstrate how this might be improved by considering the errors in the optimal source parameter values.

Conclusions
We have demonstrated how an inverse modelling technique, based on sampling many trial simulations, may be used to infer relationships between various volcanic ash source parameters and eruption height by utilising a dataset comprising satellite mass load retrievals for 14 eruptions. These relationships may then be used to construct more computationally efficient, low-dimensional, data assimilation procedures suitable for operational requirements, with some versions having the added benefit of not requiring real-time mass load retrievals. We have shown that these computationally efficient data assimilation schemes yield mass load forecasts that are significantly improved over current operational settings of the dispersion model.
Supplementary Materials: VOLCAT satellite retrieval data used in this study are available at http://doi.org/ 10.5281/zenodo.3579613. A detailed description of the application of multidimensional inverse modelling to different case studies is available online at www.mdpi.com/xxx/s1. Figure

Acknowledgments:
We would like to thank Richard Dare and Rodney Potts, both from the Bureau of Meteorology, for reviewing the first draft of this paper and providing useful comments. Useful comments were also provided by three anonymous external reviewers and integrated into the final version of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
For completeness, here we present the formulas for estimating the coefficients and in Equation (5), and their standard errors, by weighted linear regression. Equation (8) Table 3 based on 14 eruption case studies.