An Evaluation of Algebraic Turbulence Length Scale Formulations

: Turbulence kinetic energy (TKE) schemes are routinely used for turbulence parameterization in numerical weather prediction models. A key component of these schemes is the so-called turbulence length scale. Novel scale-aware, budget-based diagnostics that account for the cross-scale transfer of variances are used to evaluate the performance of selected turbulence length scale formulations in the gray zone of turbulence. The diagnostics are computed using the coarse-graining method on high resolution large eddy simulation data for selected idealized cases. The vertical proﬁles and the temporal evolution of the turbulence length scales are analyzed. Additionally, the local normalized root mean square error and a non-local three-component technique tailored speciﬁcally to the turbulence length scale proﬁles are used for the evaluation. Based on our analyses, we recommend using turbulence length-scale formulations that depend not only on the boundary layer height, but also on the TKE and stratiﬁcation. Such formulations are able to perform satisfactorily in different ﬂow regimes, but their scale-awareness is still limited. Only the Honnert et al. formulation shows a stronger scale-awareness thanks to its cut-off relationship in the gray zone. However, in contrast to the turbulence length scale diagnostics, its resolution dependence does not change with height.


Introduction
Turbulent motions in the atmosphere occur on a range of different scales, from millimeter to kilometer scales [1,2]. As some turbulent motions occur on scales that are always smaller than the resolution of an atmospheric model, basically all atmospheric models require a turbulence parameterization to account for the influence of sub-grid-scale turbulence. The level to which the turbulent flow is parameterized is thus dependent on the resolution of the model. In numerical weather prediction (NWP) and global circulation (GC) models, all turbulent motions, including both large and small eddies, are parameterized. However, in large-eddy simulation (LES) models, only the isotropic turbulent eddies (the smaller eddies) need to be parameterized, because the energy-containing anisotropic turbulent eddies (the larger eddies) are resolved in LES [1,3].
The parameterization of turbulence in models with a resolution between the classical NWP/GC and the LES models is naturally also required, but it is more difficult. This is because the anisotropic turbulent eddies are already partly resolved, and only the unresolved part of the large eddies needs to be parameterized. In this gray zone of turbulence, some classical NWP assumptions need to be revisited [4][5][6]. We focus in this paper particularly on the fact that the net sub-grid cross-scale transfer of turbulence kinetic energy (TKE) is not zero anymore in the gray zone [1].
In order to account for the influence of the cross-scale transfer of TKE, Reference [3] proposed a modification of the turbulence length scale in TKE turbulence schemes. The turbulence length scale is one of the key components in TKE turbulence schemes. Primarily, the turbulence length scale is used to parameterize the viscous dissipation in the prognostic TKE equation. Consequently, the primary output of a turbulence scheme, the vertical turbulent fluxes, are directly affected by the turbulence length scale (see [3,7] for more details). The modifications of the turbulence length scale should lead to a scale-aware formulation of the turbulence length scale, which would enable a better representation of turbulence in the gray zone.
The reference for such a new formulation is a scale-aware diagnostic, where the turbulence length scale is defined through the viscous dissipation of TKE. The TKE dissipation term is calculated from the TKE budget using high-resolution LES data. To obtain the length scale at different resolutions, a coarse-graining method is used on the LES data (averaging over sub-domains of different sizes) [3].
While this TKE-based turbulence length scale diagnostic is accurate in most situations, the use of a TKE budget faces a potential problem when gravity waves are present in and above the atmospheric boundary layer (ABL). Specifically, the velocity fluctuations in the LES data cannot be easily separated between turbulence and wave kinetic energy [2]. In the presence of gravity waves, the TKE can be over-estimated, which yields unrealistic values of the turbulence length scale [3]. This problem can be partly avoided if the length scale is diagnosed from the budgets of the following scalar variances: liquid water potential temperature variance and total specific water content variance. Therefore, we also extend the budget-based diagnostic to scalar variances in this paper.
All three diagnostics are then used for the evaluation of several existing algebraic formulations: the height-dependent Blackadar (1962) [8] and Bastak Duran et al. (2018) [9] formulations, the TKE-dependent Bougeault and Lacarrere (1989) [10] and Honnert et al. (2021) [11] formulations, and the combined Nakanishi and Niino (2009) [12,13] formulation. This particular set represents the most frequently used algebraic turbulence length scale formulations in NWP models. The evaluation is performed for several idealized cases, covering different atmospheric boundary layer conditions. The paper is organized as follows. Section 2 gives a detailed description of the different budget-based turbulence length scale diagnostics and algebraic formulations, along with a description of the LES model used to simulate a variety of different boundary layer cases. The main results of the study are shown in Section 3, looking in particular at the scale dependence of the budget-based diagnostics, the evaluation of the algebraic turbulence length scale formulations, and the temporal development of the turbulence length scale over the simulated time period. Finally, a summary of the study's findings along with the main conclusions are found in Section 4.

Turbulence Length Scale Diagnostics
In this study, we evaluate existing algebraic turbulence length scale formulations using turbulence length scale diagnostics computed from coarse-grained LES data (see [3]). Two sets of diagnostics are used for this purpose.
First, the turbulence length scale is diagnosed without consideration of the transfer of TKE or scalar variances across scales. The turbulence length scales are derived from the definition of the viscous dissipation term in the prognostic equations for the TKE and from the definitions of the molecular dissipation terms for the variance of the liquid water potential temperature, θ l , or the total specific water content, q t (see [3,9] for more details): where the [] l c operator denotes an average over a horizontal sub-domain of size l c (equivalent to a horizontal grid box in NWP) and over time (in our case over 1 h; see Section 2.3), and the fluctuations from this average are denoted with : where ψ stands for any relevant quantity, and the average over all sub-domains with size l c is denoted by <> (see [3] for more details about the LES coarse graining). e k (l c ), θ 2 l l c , and q 2 t l c are the sub-grid scale TKE, the variance of θ l , and the variance of q t for subdomains with size l c (corresponding to a NWP computational mesh with a grid spacing of l c ). L ,e k , L ,θ l , and L ,q t are the respective diagnosed turbulence length scales. C = 0.871 and C p = 0.3677 are closure constants [14,15]. , θ l , and q t are the viscous/molecular dissipation rates of the TKE, the variance of θ l , and the variance of q t in their respective Reynolds-averaged (assuming horizontal homogeneity) prognostic equations (see, e.g., [16]): where the operator denotes an average over the horizontal domain and in time (in our case over the whole LES domain and 1 h; see Section 2.3), and the fluctuations from this average are denoted with : x, y, and z are horizontal and vertical coordinates; u, v and w are the corresponding components of the wind velocity; t is the time; p is the atmospheric pressure; θ v is the virtual potential temperature; θ vr is a reference virtual potential temperature; ρ 0 is a reference density; and g is the gravitational acceleration. The first terms on the left-hand side in Equations (6)- (8) are the tendency terms. The remaining terms on the left-hand side are the horizontal and vertical advection terms. Please note that under the assumption of horizontal homogeneity, the contribution from the advection terms is equal to zero (see, e.g., [1]). The first terms on the right-hand side represent the vertical turbulent transport. For TKE, the turbulent transport term also includes the pressure correlation term (the last expression). The second terms in Equations (6)-(8) are the shear production term for TKE and the gradient terms for scalar variances. The third term on the right-hand side in Equation (6) is the buoyancy production/destruction term. Second, the above formulations are extended to account for the transfer of TKE or scalar variances across scales: where , θ l , and q t are the effective viscous/molecular dissipation rates of the TKE, the variance of θ l , and the variance of q t , defined as: where T e k r (l c ), T θ l r (l c ), and T q t r (l c ) are the cross-scale transfers at the cutoff scale l c for TKE, the variance of θ l , and the variance of q t .
Effective dissipation rates are estimated from the budgets of the corresponding variables for a sub-domain of size l c (obtained as residuals from all other terms computed from the LES data; see [3]):

Algebraic Turbulence Length Scale Formulation
Five representative algebraic turbulence length scale formulations are evaluated in this study. The length scales are computed for all sub-domain sizes from the corresponding averaged LES data (see Section 2.1). While all the studied algebraic formulations are dependent on different aspects of the boundary layer, all five share a common height dependence, where the turbulence length scale is proportional to height via the von Karman constant, k, in a neutral surface layer.

Blackadar Formulation
The first algebraic formulation used in this study was proposed by Blackadar (1962) [8] and is referred to as L B for the remainder of this paper. The mixing length l B is defined as follows: where z is the altitude and λ is an asymptotic parameter set to 150 m for unstable layers and to 30 m for stable layers. It should be noted that l B was derived from similarity law relationship in neutral stratification. The corresponding length scale for a TKE scheme, L B , is computed from l B according to [9,17]: C K = 0.0882 is a closure constant [7].

Bastak Duran et al. Formulation
The next formulation used in Bastak Duran et al. [9], L BD1 , is an algebraic formulation derived from [18]: where H abl is the ABL height, calculated using the bulk Richardson number. The constants λ m , a m , b m , and β m are set to be 350 m, 5.5, 3.0, and 0.02, respectively.

Nakanishi and Niino Formulation
The formulation by Nakanishi and Niino (2009) [12,13], L NN , comprises three different turbulence length scale formulations, L Sur f , L T , and L Bouy : where B 1 = 24.0 and B 2 = 15.0 are closure constants and L NN,e k , L NN,θ l , and L NN,q t are values used for TKE and scalar variances, respectively. Please note that the introduction of L NN,e k , L NN,q t , and L NN,θ l is only required for our evaluation since the formulations for the dissipation terms in [13] differ from the formulation that was used in our turbulence length scale diagnostic (Equations (1)-(3)). The three length scales considered for L NN are dependent on different aspects of the boundary layer. The first of the three length scales, L Sur f , is the turbulence length scale within the surface layer. Above the surface layer, the value of L Sur f increases and therefore becomes less significant in the overall calculation. L Sur f is computed as follows (Equation (53) in [13]): where ζ is a stability parameter, L MO = −θ vr u 3 * /(kg(w θ v ) s ) is the Monin-Obukhov length, u * is the friction velocity, and (w θ v ) s is virtual potential temperature flux at the surface.
The second of the three length scales, L T , is the length scale that is dependent on the boundary layer height [12,13,19]: where h is the vertical limit of the simulated domain. The final length scale, L Bouy is related to how far a parcel of air, with a given TKE, can travel vertically before the buoyancy force stops the motion: where N is the Brunt-Väisälä frequency and q c is a velocity scale (see [13]). L Buoy is most accurate in stable layers.

Bougeault and Lacarrere Formulation
The Bougeault and Lacarrere [10] length scale, L BL , takes the balance between the TKE and buoyancy into account. This length scale is computed by combining the upward and downward free paths, L up and L down : The value of L up is defined as the movement of a parcel of air with a given TKE in the upward direction until a point where it is stopped by buoyancy effects [10,20]: The derivation of L down is similar to L up , but it is calculated in the opposite direction: It also has an additional requirement in that the parcel cannot go below the surface. The L BL is scaled similarly to L B (see Equation (21)) to ensure similarity law relationship near the surface.

Honnert et al. Formulation
The final formulation, derived by Honnert et al. (2021) [11], L H21 , is a combination of a non-local turbulence length scale and grid box size: where ∆x and ∆y are horizontal sizes of the grid box (in our case sub-domain), α = 0.5 is a calibration parameter, and L R is an extension of L BL by a contribution from shear according to [21]: where σ is the vertical wind shear, L Rup and L Rdown are the upward and downward free paths, and C 0 = 0.5 is a calibration constant.

LES Data
A total of five idealized cases with different ABL conditions are used for the evaluation of the turbulence length scale formulations: a continental cumulus case based on data from the Atmospheric Radiation Measurement (ARM) program at the Cloud and Radiation Testbed site in Oklahoma [22], a stratocumulus case with drizzle based on data recorded during the first research flight (RF01) of the second period of the Dynamics and Chemistry of Marine Stratocumulus (DYCOMS-II) campaign [23], a stable boundary layer case based on data from the Global Energy and Water Cycle Experiment (GEWEX) Atmospheric Boundary Layer Study (GABLS1) [24,25], a trade-wind cumulus case from the Barbados Oceanographic and Meteorological Experiment (BOMEX) [26], and a precipitating shallow cumulus case from the Rain in Shallow Cumulus over the Ocean (RICO) campaign [27].
All cases are simulated using the MicroHH LES model [28] with a very high resolution that ensures accuracy of the diagnostic method according to the recommendation in [3]. The LES were first run with a coarse resolution and then with subsequently finer resolutions, until the budget terms converged sufficiently. Sufficient convergence was declared if the values of the second-order moments changed less than by a predefined threshold (5% of the value) with increasing resolution. The finest resolution was then considered as adequate for the turbulence length scale diagnostic. The domain size was chosen to be large enough to be able to include sub-domains with spatial scales above the energy production range of TKE; i.e., the spatial scale of the larger sub-domains correspond to the resolution of typical NWP models. The smallest sub-domains have a spatial scale in the gray zone of turbulence for the particular case.
A detailed description of the model, including its governing equations and parameterizations, can be found in [28]. A summary of the setups for the five LES cases is listed in Table 1 (an adaptive time step was used).
The LES data are sampled every 60 s. The LES statistics and the subsequent turbulence length scale profiles, diagnosed or computed according to the algebraic formulations, are obtained by averaging over one hour in time (moving average). In order to study the scale-dependence of the turbulence length scale, the diagnostics and the algebraic formulations are computed on coarse-grained data with varying resolution. The horizontal domain of the LES is divided into a series of sub-domains of equal size. The number (N sd ) and the size (l c ) of the sub-domains are chosen so that the number of grid points in a sub-domain is large enough to ensure the representativeness of the resulting quantities (see the respective studies of the selected cases). Furthermore, the smallest sub-domains have a spatial scale in the gray zone of turbulence.
A summary of the sizes for all the sub-domains is listed in Table 2. A schematic of the LES domain can be seen in Figure 2 of [3]. It should be noted that estimation of the dissipation rates and the effective dissipation rates is not always accurate. In such cases, the values of these quantities are either too small or negative.
Typically, these difficulties with accuracy occur above the ABL, where the contribution from gravity waves to the budgets of TKE and scalar variances can be expected. To avoid these regions, all data points that have an effective TKE dissipation rate smaller than a prescribed fraction, C = 5 × 10 −5 s −1 , of the estimated surface TKE [29]: per second ,are filtered out. Here w * is the convective velocity scale, and c 0 = 3.75 and c 1 = 0.2 are empirical parameters [29].
Additionally, values that fall under a prescribed threshold are not used for computation of the turbulence length scale formulations. The prescribed thresholds are 10 −5 m 2 s −3 for dissipation rates of TKE, 10 −8 K 2 s −1 for dissipation rates of the θ l variance, and 10 −12 kg 2 kg −2 s −1 for dissipation rates of the q t variance.

Evaluation Scores
The turbulence length scale formulations are evaluated using root mean square of a normalized (by H abl ) length scale error: where z i is the height of the i-th model level, n z is number of model levels in the ABL, t j is the time after j time steps, n t is number of time steps, L F is length scale computed according to one of the turbulence length scale formulations listed in Section 2.2, and L D is one of the three length scale diagnostics (L C,e k , L C,θ l , and L C,q t ). The RMSE n score shows the average of the spatially local (independently at each level) accuracy of the length scale formulation. In order to also evaluate non-local properties of the formulations, such as amplitude and shape of the profile, a three-component score inspired by a three-component technique of [30] is used. In this paper, however, the three-component technique is tailored specifically for the turbulence length scale evaluation in the ABL. The first component is the amplitude component, A, which is defined as the mean (in time) difference in the normalized length scale paths: where L F P and L D P are normalized (by H abl ) length scale paths in the ABL. A expresses the error in the mixing potential of the whole length scale profile. When looking at the shape of the length scale profile, the most prominent feature is a peak in the ABL (for details see Section 3.1). Therefore, two shape components are used to specify the mean (in time) error in the normalized (by H abl ) peak location, S l , and peak magnitude, S m : where L F,max and L D,max are the maximum values of the length scales in the ABL, and z F,max and z D,max are the heights of these peaks, respectively. In case there are multiple local maxima in the ABL, the lowest maxima is taken. If the maxima are at adjacent model levels, the peak location is set to the average height of these points (typically for L H21 ). All three components are then displayed in one scatter plot, where S l and S m are the x and y coordinates, and the A value determines the coloring of the point.
Please note that all scores are normalized by H abl to enable a more consistent comparison between different cases and between different points in time. H abl is determined from the vertical profile of for the given sub-domain size. H abl is the height above which drops below a critical value of 10 −5 m 2 s −3 .

Results
The turbulence length scale diagnostics based on the budget of TKE and scalar variances for selected idealized cases are presented in this section. Subsequently, these diagnostics are used for the evaluation of the chosen algebraic turbulence length scale formulations.

Turbulence Length Scale Diagnostics
The new scale-aware turbulence length scales are computed from the effective dissipation rates, which are obtained from the variance budgets (see [3] and Section 2.3 for more details). To visualize the accuracy of the effective dissipation rates diagnostics, the individual terms of the three budgets are presented in Figure 1 for the BOMEX case. It can be seen that the grid spacing, vertical resolution, and the domain size of the LES are sufficient to obtain smooth profiles of the individual terms for all sub-domain sizes (only the results for the second largest, second smallest, and the smallest sub-domain sizes are shown). Such smooth profiles are necessary for further computation of the effective dissipation rates (violet lines). Similar results are found for all five idealized cases (not showed here). When comparing the diagnostics for different sub-domain sizes, the magnitude of the source terms (buoyancy term-green, gradient terms-blue), the turbulent transport terms (orange) is in general found to be smaller for the smaller subdomain sizes.
The vertical advection terms (gray) should be equal to zero because horizontal homogeneity is assumed for all sub-domains. However, the vertical advection terms deviate from zero for the smallest sub-domains (N sd = 32 2 ). This implies that the condition of horizontal homogeneity is not fulfilled at this scale and thus the length scale diagnostics for the smallest sub-domain size is less accurate. Therefore, only results for larger sub-domains are used in the evaluation. The accuracy of the method for the smallest sub-domain size can be increased by including horizontal terms (i.e., the buoyancy, shear/gradient, and turbulence transport terms) in the respective budgets.
The turbulence length scale diagnostics resulting from the budgets can be seen in Figure 2 for all cases. As expected, all length scales have a similarly shaped profile. Near the surface, the length scale is proportional to the distance from the surface according to the von Karman theory. At higher levels, the length scale continues to grow, but the growth rate decreases until the length scale reaches a maximum in the ABL. The location and the intensity of this peak depend on the ABL regime, the resolution, and the type of diagnostic (see below). Above the peak, the turbulence length scale decreases towards the top of the ABL, where it reaches either a positive value or it converges to zero. The L C,e k and L ,e k diagnostics tend to show a secondary length scale peak near the top of the ABL in the convective cases (ARM, BOMEX, RICO) [3]. Such an increase in the turbulence length scale would imply an increase in the representation of the top entrainment. However, for the L C,θ l , L C,q t , L ,θ l , and L ,q t diagnostics, the secondary peak is either significantly smaller (ARM, BOMEX, RICO) or non-existent (L ,θ l and L C,θ l for RICO). This suggests that the amplitude of the secondary peak in the L C,e k and L ,e k diagnostics is over-estimated due to the lower accuracy of the method caused by the presence of gravity waves near the top of the ABL [3].
In the cloudy cases, the location of the ABL peak correlates with the height of the cloud base. In the dry GABLS1 case, the peak is significantly weaker and is located roughly in the middle of the ABL. This means that the presence of clouds and related processes significantly affects the shape and the amplitude of the turbulence length scale profile.
At lower resolutions (larger sub-domains), the six different diagnostics are in general closer to each other for all cases. At higher resolutions (smaller sub-domain sizes), which enter the gray zone of turbulence (starting roughly with 8 2 sub-domains), the differences between the classical (L ,e k , L ,θ l and L ,q t ) and new diagnostics (L C,e k , L C,θ l , L C,q t ) become more apparent. While the classically diagnosed turbulence length scales monotonously decrease with the sub-domain size due to their clear dependence on the TKE and the scalar variances, the changes in the new diagnostics are height-dependent because of their additional dependence on the effective dissipation rates [3]. Basically, the newly diagnosed length scales decrease more slowly or even increase in the region of the ABL peak and decrease faster in the remaining regions with increasing resolution. Such changes make the shape of the vertical profile of the turbulence length scales resolution-dependent, where the ABL peak is more pronounced at higher resolutions.
Similarly, there are also bigger differences visible between the diagnostic based on TKE (L ,e k , L C,e k ) and diagnostics based on scalar variances (L ,θ l , L ,q t , L C,θ l , L C,q t ) at higher resolutions. The ABL peak is sharper in the scalar variances diagnostics than in the TKE diagnostic. The scalar variances diagnostics are relatively close to each other with the exception of the RICO case, where the L ,θ l and L C,θ l have an additional strong peak in the sub-cloud layer. Differences between the TKE diagnostics and the scalar variance diagnostics in the ABL peak could be attributed to the accuracy of the diagnostic method, because the effective dissipation rates for scalar variances have relatively small values in the sub-cloud layer for higher resolutions (particularly for potential temperature variance). However, despite the potential accuracy issues, which is most evident in the highest resolution, the increase in the ABL peak for the scalar variances in the sub-cloud layer appears consistently in all cloudy cases. The differences between L C,e k and L C,θ l , L C,q t could indicate that the dissipation rates for scalars are not entirely proportional to the dissipation rates of TKE in the gray zone of turbulence as is usually assumed at lower resolutions (see, e.g., [7,14,31]), and that their dependence could be height-or regime-dependent.  (1st column; a,d,g,j,m), the θ l variance budget (2nd column; b,e,h,k,n), and the q t variance budget (3rd column; c,f,i,l) for the five boundary layer cases (rows). The different sub-domain sizes are represented by different colors of lines, where the full lines are our new diagnostics (L C ) and dashed lines are the classical diagnostics (L ). N sd indicates the number of sub-domains. L C,q t and L ,q t are not plotted for the dry GABLS1 case. The budgets are plotted for the ARM case after 10 h of integration, for the BOMEX and the RICO case after 7 h of integration and for the DYCOMS-II case and the GABLS1 case after 3 h of integration. The horizontal black dashed lines indicate the ABL height, and the horizontal black solid lines indicate the cloud base height.

Temporal Evolution of the Turbulence Length Scale Diagnostics
For a better understanding of the behavior of the turbulence length scale, we present the temporal evolution of its diagnostics in Figures 3a-c, 4a-c, 5a-c, 6a-c and 7a-c. Due to potential accuracy issues with the smallest sub-domain sizes (highest resolution), discussed in Section 3.1, the temporal evolution is analyzed for the second smallest sub-domain size.
For all cases, all diagnostics show a similar shape of the vertical profile for the turbulence length scale as described in Section 3.1: a linear increase with height near the surface; a peak in the ABL near the cloud base when clouds are present; and a decrease towards the ABL top, where zero or a positive asymptotic value is reached. As expected, it can be seen that the shapes of the profile follow the changes in the ABL height (dashed black line) and the cloud base height (full black line), which implies that these two heights should play an important role in the parameterization of the turbulence length scale. An exception to the dependence on the ABL height is visible in the initial phase of the GABLS1 case, where the amplitude of the ABL peak changes without any correlation to the ABL height (no clouds are present).
When comparing the three types of diagnostics (TKE, θ l variance, and q t variance diagnostics), the results are consistent with findings in Section 3.1 for this (second smallest) sub-domain size, namely that the scalar variance diagnostics tend to have a more pronounced peak (sharper gradients of shading) in the ABL than the TKE diagnostic during the whole integration interval. Furthermore, a stronger secondary peak near the ABL top is present only in the TKE-based diagnostic.

Evaluation of the Algebraic Turbulence Length Scale Formulations
The selected algebraic formulations (see Section 2.2) computed from the coarse-grained LES data (see Section 2.3) are compared to the turbulence length scale diagnostics in Figure 8 for all cases.
All formulations exhibit an almost linear growth of the turbulence length scale near the surface as an extension of the von Karman theory, which is valid in the surface layer. A slight over-estimation compared to the diagnostics can be seen in the L BL and L H21 formulations, which is caused by the influence of the upward length scale component, L up or L Rup , that can be larger than the distance from the surface. The linear behavior near the surface layer does not significantly change with the resolution.   Figure 3, but for the GABLS1 case. L C,q t is not plotted for the dry GABLS1 case.
There are differences in the location and amplitude of the ABL peak between the length scale formulations. In order to achieve a more objective assessment of these characteristics, the time-averaged three-component plots (see Section 2.4) for two resolutions are presented in Figures 9 and 10. L B does not have an ABL peak, which significantly decreases its accuracy, as can be seen in the local normalized RMSE scores (see Equation (38)) that are presented in Figures 11 and 12 for two selected resolutions. Additionally, L B is the same for all cases and all resolutions, because it depends only on the height. L BD1 has a built-in ABL peak in its formulation, but its use is more suited for cloudy cases, where the estimation of the height and magnitude of the ABL peak is relatively accurate. L BD1 strongly over-estimates the magnitude of the ABL peak in the GABLS1. While L BD1 could be improved by calibration for the GABLS1 case, it would decrease its performance in the cloudy cases. Its overall performance across cases is thus limited. The shape of the L BD1 profile is closer to the shape of the L C,e k profile (the less pronounced ABL peak) than the profiles for scalar diagnostics. Therefore, its ABL peak estimation fits the TKE diagnostic better.
Both L B and L BD1 significantly over-estimate the magnitude of the length scale in the GABLS1 case, which can be seen not only in the non-local overall mixing A-component (red color), but also in the local RMSE scores. The L BD1 also over-mixes in the cloudy cases, except for the ARM case, but its local RMSE score is relatively good compared to other formulations. L BD1 only scales with the ABL height; therefore, its overall performance decreases with increasing resolution.
L NN,e k , L BL , and L H21 have A-component and RMSE scores similar to L BD1 for the shallow convection cases (ARM, BOMEX and RICO). For these cases, their ABL peak magnitude closely matches the diagnosed magnitudes. However, their ABL peak is mostly placed below the diagnosed heights.
For the stratocumulus DYCOMS-II case, L NN,e k under-estimates the ABL peak magnitude, and L BL and L H21 over-estimate the ABL peak magnitude and under-estimate the ABL peak height. In the GABLS1 case, L BL over-estimates the amplitude and under-estimates the location of the ABL peak. L NN,e k shows a better ability in this respect. L H21 also shows a good ability due to its explicit dependence on the vertical wind shear (see Equations (35) and (36)). The differences in the ABL peak for DYCOMS-II and GABLS1 cases are also mirrored in the differences in the A-component and the RMSE scores.
Because L NN,e k , L BL , and L H21 scale not only with the ABL height, but also with the TKE and stratification, their case-awareness is better than for L B and L BD1 . Their resolution awareness is also improved due to this dependence. However, only L H21 shows significant adjustment to resolution due to its cut-off formulation (see Equation (32)). Therefore, L H21 out-performs L BL at higher resolutions, particularly for the DYCOMS-II case, as can be seen in all scores. L H21 and L BL are very close to each other at lower resolutions for the cloudy cases. L NN,e k shows better performance for higher resolutions. This is probably caused by the choice of that particular calibration rather than by the scale-awareness of L NN,e k , since L NN,e k changes relatively slowly with resolution.
When all aspects of the evaluation are taken in to account, L NN,e k shows the best performance among the selected length scale formulations. This is due to its composite formulation that introduces dependence to the ABL height, the TKE, and the stratification. Due to these properties, L NN,e k adjusts to the specific flow regimes in the selected cases. L NN,e k also slightly adapts to the changes in the resolution. However, L NN,e k underestimates the height of the ABL peak compared to the diagnostics. The scale-awareness of L NN,e k is rather weak and depends on the specific calibration of the constants. L H21 and L BL have similar scores to L NN,e k for the cloudy cases, but these formulations place the ABL peak even lower than L NN,e k and also over-estimate the ABL peak magnitude and overall mixing for DYCOMS-II. For the GABLS1 case, L H21 is comparable with L NN,e k and clearly out-performs L BL due to its dependence on the vertical wind shear. While L BD1 is well suited for cloudy cases, it strongly over-estimates the turbulence length scale for the GABLS1 case. Additionally, L BD1 lacks scale awareness since it only scales with the ABL height. Adjustment to other resolutions and flow regimes can thus be achieved only via recalibration. L B is the simplest formulation, and thus it is not surprising that it has the worst scores. In particular, it does not have a peak in the ABL, and does not scale with any characteristic of the ABL.
When looking at the temporal evolution of the turbulence length scale formulations (Figures 3-7), the above overall evaluation can be confirmed for all formulations. L NN,e k , L H21 , L BL , and L BD1 change according to the changes in the ABL height. L B does not have this dependence. In addition, the TKE-dependent formulations (L NN,e k , L H21 , L BL ) are noisier than L BD1 in the shallow convection cases, which could deteriorate their performance when used as an an active component of a numerical model. Figure 8. Comparison of the turbulence length scale diagnostics (L C,e k , L C,q t , L C,θ l , L ,e k , L ,q t , and L ,θ l ) and the turbulence length scale formulations (L B , L BD1 , L NN,e k , L NN,q t , L BL , and L H21 ). Vertical profiles are plotted for the ARM case after 10 h of integration, for the BOMEX and the RICO case after 7 h of integration, and for the DYCOMS-II case and the GABLS1 case after 3 h of integration. L C,q t and L ,q t are not plotted for the dry GABLS1 case. The length scales are computed for the second-largest sub-domain size (1st column; a,c,e,g,i) and the second-smallest sub-domain size (2nd  column; b,d,f,h,j). N sd indicates the number of sub-domains.

Summary, Discussion, and Future Work
We have evaluated selected algebraic turbulence length scale formulations with budget-based turbulence length scale diagnostics that account for the cross-scale transfer of variances [3]. The diagnostics were computed using a coarse-graining method on high resolution LES data for selected idealized ABL cases: ARM (a continental cumulus case), BOMEX (a trade-wind cumulus case), RICO (a precipitating shallow cumulus case), DYCOMS-II (a stratocumulus case with drizzle), and GABLS1 (a weakly stable boundary layer case).
All vertical profiles of the length-scale diagnostics have a typical shape. Near the surface, the length scale is proportional to the distance from surface according to the von Karman theory. At higher levels, the length-scale growth rate decreases until it reaches a maximum in the ABL, whose location and intensity depend on the ABL regime, the resolution, and the type of diagnostic. Above the peak, the turbulence length scale decreases until it reaches an asymptotic value (positive or zero) at the top of the ABL.
In the cloudy cases, the location of the ABL peak correlates with the height of the cloud base. In the dry GABLS1 case, the peak is significantly weaker and is located roughly in the middle of the ABL. The new scale-aware diagnostics change with resolution (sub-domain size), but contrary to the changes in the classical turbulence length scale diagnostics, the resolution changes in the new diagnostics are height-dependent.
Compared to our previous study [3], we used not only the diagnostic based on the TKE budget, but also the diagnostics based on the budgets of the liquid-water potential temperature variance and the total specific-water-content variance. This extension helps to mitigate the accuracy problems of the TKE-based diagnostic in the presence of gravity waves in the convective cases near the top of the ABL, where the TKE diagnostic tends to show a secondary peak. Indeed, the scalar variance diagnostics indicate that the TKE diagnostic probably over-estimates the magnitude of this secondary peak in the ABL. In addition, we can observe that the ABL peak is sharper in the scalar variance diagnostics than in the TKE diagnostic. This could indicate that the dissipation rates for scalars are not entirely proportional to the dissipation rates of TKE in the gray zone of turbulence, as is usually assumed at lower resolutions, and that their dependence could be height-or regime-dependent.
We have used the local normalized RMSE (see Equation (38)) and the non-local threecomponent technique tailored specifically for the turbulence length scale profiles (see Section 2.4) for the evaluation of the turbulence length scale formulations.
Overall, L NN,e k shows the best performance among the selected length scale formulations. This is due to its multi-component composition and its dependence on the ABL height, the TKE, and the stratification. Thanks to these properties, L NN,e k adjusts to the specific flow regimes in the selected cases and also slightly adapts to the changes in the resolution. However, L NN,e k under-estimates the ABL peak location, and its scale-awareness could be stronger. L H21 and L BL have similar scores to L NN,e k for the cloudy cases but are less accurate in the estimation of the ABL peak location and magnitude, particularly for the DYCOMS-II case. For the GABLS1 case, L H21 is comparable with L NN,e k and clearly out-performs L BL due to its dependence on the vertical wind shear. L BD1 is well suited for cloudy cases, but it strongly over-estimates the turbulence length scale for the GABLS1 case. Additionally, L BD1 only scales with the ABL height, and thus it has almost no scale awareness. L B is the simplest formulation, and thus it is not surprising that it has the worst scores. In particular, it lacks the peak in the ABL and does not scale with any characteristic of the ABL.
When looking at the temporal evolution of the turbulence length scale formulations, L NN,e k , L H21 , L BL , and L BD1 adequately change according to the changes in the ABL height. In addition, the TKE-dependent formulations (L NN,e k , L H21 , L BL ) are noisier than L BD1 in the shallow convection cases, which could deteriorate their performance when used as an an active component of a numerical model.
It is clear from the evaluation that a proper scaling with the ABL height, TKE, stratification, and the vertical wind shear can improve the performance of the formulations. Such a scaling behavior makes the formulations more universal in terms of the ABL flow regime. To further improve this behavior, additional scaling variables could be introduced. In particular, all evaluated formulations have difficulties in the determination of the height of the ABL peak. As we have seen in the diagnostics, the ABL peak is usually placed just under the cloud base height. However, none of the formulations have an explicit dependence on the cloud base height, and thus the introduction of such a feature could be beneficial for future turbulence length scale formulations. We plan to investigate the scaling potential of the cloud base height in our future work.
The scale awareness in most of the evaluated formulations is relatively poor. The TKE's dependence on the resolution and the stratification introduces a degree of scale awareness to L NN,e k , L H21 , L BL , but it is not sufficient. Only L H21 shows a stronger scale awareness thanks to its cut-off formulation in the gray zone. However, in contrast to the turbulence length scale diagnostics, this dependence on resolution does not change with height. To improve the representation of turbulence in the gray zone of turbulence, we would recommend a turbulence length scale formulation that has a scale dependence that changes with height. We plan to propose such a formulation in our future work.
Better performance of the evaluated formulations can be achieved via recalibration of constants. This option can be used when changing the resolution of the model. Still, such a recalibration is rather time-consuming and does not support seamless changes in model resolution. Furthermore, a recalibration cannot be used to mitigate the lack of flow-regime dependence.
Based on our evaluation, we recommend using the L NN,e k , L H21 or L BL formulation in TKE turbulence schemes. Nevertheless, we would like to point out that the turbulence length scale formulation is only a part of a turbulence scheme, and thus the overall performance of the scheme can decrease if the turbulence length scale formulation is changed. This is because the scheme was previously calibrated with a different length scale formulation. Hence, an introduction of a new turbulence length scale formulation requires recalibration of the whole scheme, and/or recalibration of other physical parameterizations of the model.

Data Availability Statement:
The data for this study were generated with the LES model MicroHH and are openly available in Zenodo at https://zenodo.org/record/822842 [32] (accessed on 26 February 2022). The configuration files, outputs, and visualization scripts for the LES are openly available in Zenodo at: https://doi.org/10.5281/zenodo.6372434 [33] (accessed on 26 February 2022).

Acknowledgments:
The authors would like to note that this research was made possible using the resources of the Deutsches Kimarechenzentrum (DKRZ) granted by its Scientific Steering Committee (WLA) under project ID bb1096.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: