Maximum Likelihood instead of Least Squares in fracture analysis by means of a simple Excel sheet with VBA macro

This short note illustrates a linear regression algorithm based on Maximum Likelihood Estimation, with related Excel sheet and VBA program, adapted to the case of fracture aperture data sets in which sampling of smallest values was problematic. The method has been tested using Monte Carlo simulations and exhibits surprisingly better convergence against Least Squares criterion. As the method is conceptually simple and, following the here illustrated indications, the relative spreadsheet is easily achievable, it may be routinely used instead of the Least Squares in fracture analysis, as well as all those numerous cases in geology and geophysics in which there are systematic biases at the lower limits of the sampling scale.


Introduction
In order to study statistical features of aperture values in stratabound fracture sets, Guerriero et al. (2015) carried out an accurate statistical investigation involving two Lower Cretaceous (Albian) carbonate successions, outcropping at Faito and Chianello Mts.(southern Italy).These successions, which had been selected as surface analogues of buried oil reservoirs in Val D'Agri (Basilicata, Italy), were previously well studied in terms of sedimentology and petrophysics (Giorgioni et al., 2016;Iannace et al., 2008 and references therein), as well as of geological structural settings (Guerriero et al., 2010(Guerriero et al., , 2011(Guerriero et al., , 2013;;Vitale et al., 2012).It should be taken into account that, while stratabound fracture spacing has been extensively analysed (e.g., Odling et al., 1999, Bai andPollard, 2000; Guerriero and Mazzoli, 2021 and references therein), their aperture is little studied.The statistical analysis carried out by Guerriero et al. (2015) pointed out that stratabound joint aperture depends on bed thickness, according to a linear function, characterized by a non-zero constant term.This may have important implications for the hydraulic and structural characterization of fractured rocks (Guerriero et al., 2015).Nevertheless, in that analysis, due to evident di culties in eld sampling of joint aperture, it was decided to set a minimum threshold value of 0.265 mm for them, so including into this single category all detected aperture values lesser than this limit.In several cases, this artifact may induce a marked tendency of the best t line to intersect the ordinate axis near this threshold (Guerriero et al., 2015).As the constant term value identi ed by Guerriero et al. (2015) was just close to this limit, the suspicion arose that it could have been affected by an artifact.Therefore, it has been decided to repeat the analysis of those data using a more effective method, based on Maximum Likelihood Estimation (MLE).Illustrating the results of such analysis goes beyond the scope of this work and these will be presented in a companion paper to be published later.
The aim of this paper is to describe the adopted statistical method, the details about its validation and effectiveness, as well as the utilized algorithm and software.

Linear regression by means of Maximum Likelihood Estimation
A simple probabilistic model for joint aperture value distribution, compatible with eld fracture data is of Log Normal kind, in which the median is a linear function of mechanical bed thickness, and the standard deviation is constant (Guerriero et al., 2015).To estimate the aperture-bed thickness regression line parameters, in this work the MLE (e.g., Dekking et al., 2005) is used rather than that of Least Squares (LSM), in order to reduce biases related to small fracture sampling.Furthermore, in this analysis, residuals r j are calculated as difference between the logarithms of the observed and predicted values (Guerriero et al., 2015): r j = ln z j -ln y j ; (1) Let denote by x i the limit value between contiguous aperture classes s i .In this instance x i is an intermediate value between s i and s i+1 , opportunely chosen (Fig. 1).Under the hypothesis that aperture values exhibit Log Normal distribution F mn.d (x), whose mean is ln y and standard deviation is d (calculated according to Eq. 1).Here y is a linear function of bed thickness T, whose parameters are coe cient m (mm/cm) and constant term n (mm).Then the probability that a measured aperture value S, falls within the class s i , here denoted by p mn,d (s i ), for i > 1, is: Searching for the maximum of this function on the space of the three parameters m,n and d (numerically), the maximum likelihood estimates of these three parameters are achieved.Should be noted as according to the LSM, for each aperture value belonging to the 0.265 mm class, it is stated that this is substantially close to the theoretical one (false statement) whereas, according to the MLE (i.e, according to Eq. 3), it is stated that the observed value is less than or equal to the value x 1 (true statement).This is the substantial difference between MLE and LSM.

The Excel sheet and VBA program
The Excel folder utilized in this work includes three sheets: MLE, LSM and Results.Figure 1 illustrates in detail the sheet MLE; with respect to this latter the sheet LSM is different only in column K cells, whose formula calculate the square of residual in the adjacent cell in column J ().The rst 30 rows, which are not depicted here, include a header illustrating some user instructions.The routines, written in Visual Basic for Applications (VBA), which utilize this folder to analyse data and carry out Monte Carlo simulations are: Sub Maximize() and Sub Minimize_LS(): analyse a data set by maximizing or minimizing an object function, which is Log likelihood for the former and residual standard deviation for the latter.Sub Simul_Apert_Data(): based on eld data (#7 in Fig. 1) and on model true values of m, n and d (#2), it produces a 35 item data set by (i) resampling thickness data and (ii) producing, for each thickness, a random aperture value.Then, it identi es which class it belongs to and its limits (#6).Sub Simul_100(): For each triplet of true values m, n and d, it produces 100 simulated data sets and analyses each one by means of Sub Maximize(), in sheet MLE, and Sub Minimize_LS(), in sheet LSM.Then it saves the estimated values in columns S, T and U. Sub Monte_Carlo(): varies the n true value in the range 0.05-4.75mm, and for each value produces simulations by calling Sub Simul_100(), then saves the related results (#11) in sheet Results (Fig. 2).
The core of the calculation method in the MLE sheet is in formulas in column K, in which for each aperture value (in column E, #6) the probability logarithm that it falls within the range to which it belongs, is calculated as follows: LN(LOGNORMDIST(G60;LN(I$55*C60 + I$57);J$48)-LOGNORMDIST(F60;LN(I$55*C60 + I$57);J$48)) The formula LOGNORMDIST(G60;LN(I$55*C60 + I$57);J$48) provides the probability that an aperture value is lesser or equal to the limit in cell G60, when its median value is a linear function of thickness (term LN(I$55*C60 + I$57)) and its standard deviation is the value d in cell J48.The difference between these distributions furnishes the probability that an aperture value lies within the range limited by cells G60-F60.The sum of logarithms of these probabilities returns the Log likelihood (e.g., Dekking et al., 2005) in cell J55.The routine Sub Maximize() starts calculations by assigning likely initial values to parameters m, n and d achieved by Excel least squares functions in cells J46:L46 (#4) by means of instructions such as: Range("I55") = Range("K46") Range("I57") = Range("L46") Range("J48") = Range("J46") then, it iteratively adjusts the values of parameters m, n and d, in cells I55, I57 and J48 respectively (#3), according to a simple steeper descent algorithm, until the maximum of the Log Likelihood (cell J55) is reached.

by means of Monte Carlo simulation
A series of Monte Carlo simulations was carried out to verify MLE effectiveness and to make a comparison with the LSM, with parameter n (which was critical in our analysis) varying in the range 0,05 − 0,475 mm.From sheet MLE, the Sub Monte_Carlo() is called, which assigns, as "true" values from which simulated data are produced, m = 0.3, d = 0.2 and varies n, starting from 0.05 to 0.475 with step of 0.025.For each n value, 35 thickness values are chosen from eld data by means of random numbers (Sub Simul_Apert_Data()).For each thickness value, an aperture one is produced by means of a random number and of the function in cell F52: LOGINV(E52;LN(H41*G52 + H43);I$48) which returns the inverse Log Normal distribution of: (1) random number, (2) mean as logarithm of a linear function with m value from cell H41 and n from H43 and (3) standard deviation from cell I48.Then, this subroutine individuates the class and related limits, to which this value belongs, and saves these in columns E, F and G. Hence, the Sub Maximize() is called, which maximize the Log Likelihood.Then, the simulated data set is copied within sheet LSM, and the best t line is calculated, by minimizing the sum of square of residuals, calculated according to Eq. 1 (Sub Minimize_LS()).
For each n value, this procedure is repeated 100 times, then (m, n, d) are saved in columns U, T and S respectively.After 100 iterations, the average and standard deviation of these column values are saved into the sheet Results.Then, n is incremented of 0.025 mm.In calculating the coe cient and constant term, the MLE estimator shows modest deviation from the true value as early as n = 0.05 and converges for n > = 0.1.Also in estimating the standard deviation d of Log aperture value (i.e., the standard deviation of residuals according to Eq. 1) the MLE exhibits signi cant better convergence against LSM, over the whole studied range.Therefore, MLE is particularly effective in analyzing fracture data sets in which eld measurement of minor fracture aperture are problematic.As this linear regression method can be easily performed with an Excel spreadsheet, it may be routinely used in fracture analysis and in various experimental situations where biases occur at the lower limit of the sampling scale.From the left: expected aperture value (from linear law), residual of Log aperture, according to Eq. 1, and Log of probability calculated by means of Excel function LOGNORMDIST() (see main text).#9: Object function; in sheet MLE, it is the Log Likelihood, in sheet LSM it is the sum of square of residuals.#10: Data to build up probability plots of residuals.Use of these plots in fracture analysis lies outside the scope of this paper and is illustrated by Mazzoli et al., (2009).#11: Monte Carlo simulation output data.

discussion and concluding remarks
Three columns on the left include each one 100 estimated values of m, nand d; on the right side average and standard deviation of these columns.

Figure
Figure shows the results of Monte Carlo simulations in which the MLE and modi ed LSM are compared.The MLE shows a surprising effectiveness in the n range values 0.05-0.25,just where the LSM fails.In calculating the coe cient and constant term, the MLE estimator shows modest deviation from the true value as early as n = 0.05 and converges for n > = 0.1.Also in estimating the standard Development (MiSE), Italy, Grant Id: C19C20000520004Con icts of interest/Competing interestsNo, I declare that authors have no competing interests as de ned by Springer, or other interests that might be perceived to in uence the results and/or discussion reported in this paper.

Figures
Figures