Optimizing Laboratory Investigations of Saline Intrusion by Incorporating Machine Learning Techniques

: Deriving saltwater concentrations from the light intensity values of dyed saline solutions is a long-established image processing practice in laboratory scale investigations of saline intrusion. The current paper presents a novel methodology that employs the predictive ability of machine learning algorithms in order to determine saltwater concentration fields. The proposed approach consists of three distinct parts, image pre-processing, porous medium classification (glass bead structure recognition) and saltwater field generation (regression). It minimizes the need for aquifer-specific calibrations, significantly shortening the experimental procedure by up to 50% of the time required. A series of typical saline intrusion experiments were conducted in homogeneous and heterogeneous aquifers, consisting of glass beads of varying sizes, to recreate the necessary laboratory data. An innovative method of distinguishing and filtering out the common experimental error introduced by both backlighting and the optical irregularities of the glass bead medium was formulated. This enabled the acquisition of quality predictions by classical, easy-to-use machine learning techniques, such as feedforward Artificial Neural Networks, using a limited amount of training data, proving the applicability of the procedure. The new process was benchmarked against a traditional regression algorithm. A series of variables were utilized to quantify the variance between the results generated by the two procedures. No compromise was found to the quality of the derived concentration fields and it was established that the proposed image processing technique is robust when applied to homogeneous and heterogeneous domains alike, outperforming the classical approach in all test cases. Moreover, the method minimized the impact of experimental errors introduced by small movements of the camera and the presence air bubbles trapped in the porous medium.


Introduction
Currently, half the global population is living less than 200 km from the coast. This is forecast to rise to 75% by 2050 [1]. The predicted rapid increase in freshwater demand, in conjunction with the effects of climate change, will increasingly endanger the sustainability of groundwater resources. Excessive pumping and limited recharge can lead to saline intrusion (SI) of coastal aquifers in multiple locations. Since mixing fresh water with just 3% of sea water makes it unfit for drinking, prevention of saline intrusion in coastal aquifers will be one of the most important environmental challenges for the coming decades.
Saltwater intrusion has been studied extensively over the years [2][3][4]. Researchers have used either sand [5][6][7][8][9] or glass beads [10][11][12][13] to represent the porous medium of coastal aquifers in sandbox laboratory investigations of saltwater intrusion. Experimental data have been successfully used to validate analytical solutions [14,15] calibrate numerical models of groundwater flow [16,17], investigate specific characteristics of saltwater intrusion in aquifers, such as the interface between fresh and saltwater [18][19][20] and the existence of a mixing zone [21,22] or evaluate the implementation of specific coastal protection techniques [23,24]. Nevertheless, the majority of the aforementioned experimental investigations lack automated image analysis. Instead, data acquisition was based on manual characterisation of the different dye concentration zones through simple visual observation, significantly reducing the spatial and temporal resolution of the presented results.
Based on the investigations by Schincariol et al. [25], Zhang et al. [26,27] introduced an image analysis method to better determine the effects of intruding saltwater (SW) in contaminant transport on homogeneous aquifers. Nine different saltwater concentrations were used to establish a linear relationship between concentration and the optical density of dye. The results consisted of three isolines of relative saltwater concentrations, constituting the first depiction of mixing zone at laboratory scale. It is worth noting, that since in most sandbox experiments the mixing zone is relatively thin, researchers exclusively used sharp interface models to approximate the laboratory data up to that point. Konz et al. [28] proposed an image analysis procedure, calculating saltwater concentration in a homogeneous aquifer using a second order exponential function of the observed light intensity values. The method was further tested in two-dimensional heterogeneous aquifers, delivering contour plots of saltwater concentration of intruding wedges with two [29] and four [30] isolines, respectively. A similar approach was suggested by Liu et al. [31], using a least squares method to fit eight values of salt concentration with the corresponding measured grayscale pixel intensity values. The use of glass beads, instead of actual sand, and the transformation of coloured images into grayscale were common practice among all the presented image analysis investigations. Kuan et al. [32] presented the only study to utilise quartz sand in simulating an unconfined coastal aquifer while applying binary style automated image processing, which classified each pixel into one of two categories-freshwater or saltwater-according to its luminance levels. The position of lights in relation to the sand tank is another crucial element of the laboratory methodologies. In cases where a visually transparent porous medium is available, using a thin sandbox with clear glass beads, the tank was placed between the light source and the camera (light transmission technique). In all other studies, lighting was placed in front of the sandbox (light reflection technique). A detailed comparison between the two experimental approaches was attempted by Konz et al. [33] Robinson et al. [34] introduced a completely automated image analysis method applied in laboratory investigations of saline intrusion in homogeneous coastal aquifers. This pixel wise regression, further optimized by Etsias et al. [35], uses eight calibration images of the investigated aquifer, with standard seawater salt concentrations, to derive a power law correlation between them and the observed light intensities for each specific pixel. The process allowed for the precise calculation of values of the intruding toe length and width of the mixing zone, under both steady state and transient conditions [36]. Using the same method, the suitability of physical barriers to prevent saltwater intrusion in laboratory scale homogeneous [37] and heterogeneous [38,39] aquifers was investigated.
Employing sand tanks in the study of groundwater flow has not been limited to saltwater intrusionrelated phenomena. Laboratory setups, consisting of sandboxes or flow cells of varying size and shape have been successfully utilized to investigate the migration of non-aqueous phase liquids (NAPL). Wipfler et al. [40] conducted experimental measurements in a thin sandbox filled with fine and coarse sand to study the infiltration of NAPL on stratified aquifers. Similarly, Oostrom et al. [41] and O'Carroll and Sleep [42] quantified through sandbox testing the impact of water table fluctuations and temperature on NAPL transport. Finally, Sun [43] introduced an advanced multi-spectral image analysis technique, applicable to laboratory studies of NAPL aquifer contamination. Tracer tests conducted using similar laboratory rigs have been successfully utilized in the study of reactive transport through porous media [44,45]. Castro-Alcala et al. [46] concentrated on the visualization of the mixing processes in heterogeneous aquifers while Poonoosamy et al. [47] focused on the dissolution-precipitation processes occurring during reactive transport. The presented studies indicate that advances in the experimental methodologies applied in the study of saline intrusion could have a wider impact beyond this specific research field.
Sandbox setups have a relatively small size, being only a tiny fraction of the real-world aquifers they approximate. This means that variations in the applied boundary conditions and porous medium characteristics may not have the profound effect they have in nature, which constitutes their successful identification critical in the study of saltwater intrusion dynamics. One characteristic example is the inability to recreate on a laboratory scale the extensive mixing zones observed in fieldwork studies of tidally affected coastal aquifers [48]. The aforementioned image analysis techniques permit the quantification of laboratory scale variations in the toe length, the width of the mixing zone, the angle of saline intrusion and other SWI characteristics with progressively higher accuracy. This trend, assisted by advances in digital photography and the access to stronger computers, will gradually render the traditional laboratory methodology, based on simple visual observation of the dye concentrations inside a sand tank, obsolete. One of the main challenges in applying these methodologies, and the pixel wise regression, in particular, is the need for a full set of calibration images recreated for each investigated aquifer, thus making the experimental process time consuming. This issue makes its implementation on larger scale experimental setups impractical. An improved methodology is therefore required if the accuracy found in using an automated image analysis system is to be employed.
The analytics developed in machine learning techniques (MLT) continue to evolve and are now established as an efficient tool for predicting trends within experimental data. In the field of groundwater saline intrusion, their implementation has been mostly focused on the prediction of the amount of intruding saltwater [49,50] as well as the efficient management of vulnerable aquifers [51][52][53]. Robinson et al. [54] utilised the Random Forests ensemble learning method to shorten the experimental procedure in sandbox investigations. The resulting methodology, while taking less time, generated saltwater concentration fields that significantly deviated from those derived by the traditional pixel wise process. The current study incorporates established MLTs into a single method, that reduces experimentation time by as much as 50%, yet produces high quality temporal and spatial saltwater concentration fields in both homogeneous and heterogeneous aquifers alike. To the best of the authors' knowledge, it is the first time this has been successfully achieved.

Laboratory Setup and Investigated Aquifers
The sandbox apparatus depicted in Figure 1, described in detail by Robinson et al. [34], was used to investigate multiple scenarios of intrusion that allowed the generation of controlled and repeatable experimental data. It consists of a viewing chamber of dimensions 0.38 × 0.15 × 0.01 m with two cylindrical reservoir tanks on each side. The coastal aquifers were approximated by filling the central chamber with clear glass beads of predetermined diameters, supplied by Whitehouse Scientific ® . In most test cases the reservoir on the left was filled with fresh water, while the one on the right contained saltwater with a density of 1025 kg/m 3 . The saltwater was dyed using red food colouring at a mix ratio of 0.15 g/L of solution. Two fine mesh acrylic screens were used to retain the beads in the main part of the apparatus while still permitting water circulation between the two tanks and the viewing chamber. Adjustable overflow outlets controlled the water level in the two side chambers and incremental changes in water level were achieved via the use of two Microsonic ® (mic+25/DIU/TC, Microsonic GmbH, Dortmund, Germany) sensors. The measurements took place in a totally dark room. Lighting was introduced by two Camtree 600 LED light panels. A Nikon D850 Digital SLR Camera (D850 Digital SLR Camera, Nikon Corporation, Tokyo, Japan) equipped with a Nikkor FX 60 mm lens (Nikkor FX 60 mm, Nikon Corporation, Tokyo, Japan) was used to capture the experimental images. The 8-bit version of the images were used in the current investigation. The light panels were put on the back of the rig at a distance of 25 cm from the sandbox. The viewing chamber was positioned between the light source and the camera. The sandbox to camera distance was 95 cm. The domain area to be analysed was automatically derived from the rest of the image using the pre-processing algorithm introduced by Robinson et al. [34] using MATLAB R2018a (The MathWorks, Inc., Natick, MA, USA) application and programming language. The final cropped images had a resolution of 1185 × 3506 with a pixel size of 0.105 mm.
Water 2020, 12, x FOR PEER REVIEW 4 of 21 light panels were put on the back of the rig at a distance of 25 cm from the sandbox. The viewing chamber was positioned between the light source and the camera. The sandbox to camera distance was 95 cm. The domain area to be analysed was automatically derived from the rest of the image using the pre-processing algorithm introduced by Robinson et al. [34] using MATLAB R2018a (The MathWorks, Inc., Natick, MA, USA) application and programming language. The final cropped images had a resolution of 1185 × 3506 with a pixel size of 0.105 mm. Beads of three different diameters, 780, 1090, and 1325 µm, were used in the current investigation. An intrinsic flow test on the experimental domain allowed calculation of the permeability of the porous media using Darcy's law. Porosity was equal to 0.385, while permeability was equal to 7.98 × 10 −10 m 2 , 1.83 × 10 −9 m 2 , 2.39 × 10 −9 m 2 , for the 780, 1090, and 1325 µm beads, respectively. In total, 15 aquifers were created (Table 1); four homogeneous aquifers for each utilized bead size and three heterogeneous ones. The heterogeneous cases, which are referred to as Layered1, Layered2 and Layered3, had three horizontal layers composed of distinct bead diameters ( Figure 2). In Layered1, bead size increased in each heterogeneous layer from top to bottom, with 780 µm being the diameter in the upper layer, 1090 µm in the middle, and 1325 µm in the lower layer. In Layered2, the layout of the bead sizes was reversed such that 1325, 1090, and 780 µm were the sizes of the top, middle, and bottom layers, respectively. Layered3 had a low permeability layer, with a bead size of 780 µm, in the middle surrounded by two 1090 µm layers. Beads of three different diameters, 780, 1090, and 1325 µm, were used in the current investigation. An intrinsic flow test on the experimental domain allowed calculation of the permeability of the porous media using Darcy's law. Porosity was equal to 0.385, while permeability was equal to 7.98 × 10 −10 m 2 , 1.83 × 10 −9 m 2 , 2.39 × 10 −9 m 2 , for the 780, 1090, and 1325 µm beads, respectively. In total, 15 aquifers were created (Table 1); four homogeneous aquifers for each utilized bead size and three heterogeneous ones. The heterogeneous cases, which are referred to as Layered1, Layered2 and Layered3, had three horizontal layers composed of distinct bead diameters ( Figure 2). In Layered1, bead size increased in each heterogeneous layer from top to bottom, with 780 µm being the diameter in the upper layer, 1090 µm in the middle, and 1325 µm in the lower layer. In Layered2, the layout of the bead sizes was reversed such that 1325, 1090, and 780 µm were the sizes of the top, middle, and bottom layers, respectively. Layered3 had a low permeability layer, with a bead size of 780 µm, in the middle surrounded by two 1090 µm layers.
Two classical saltwater intrusion experiments, comparable to those introduced for the first time by Schincariol and Schwartz et al. [25] and Goswami1 and Clement [16], were recreated to develop and test the methodology employed. In the first, saltwater intrusion was initiated by hydraulic head difference between the freshwater and saltwater chambers ranging from 4 to 6 mm. This created a typical saltwater wedge in three homogeneous and the Layered1 and Layered2 heterogeneous aquifers (Figure 2a,b). In the second experiment, 10 mL of saltwater were injected into the upper part of the saturated freshwater only Layered3 aquifer. The saltwater plume was left to intrude deeper in the Water 2020, 12, 2996 5 of 21 aquifer for five minutes. Subsequently flow was introduced by a 6 mm hydraulic head difference between the two tanks. Images were acquired each minute for 28 min, until the saline plume was completely flushed out of the aquifer (Figure 2c). Two classical saltwater intrusion experiments, comparable to those introduced for the first time by Schincariol and Schwartz et al. [25] and Goswami1 and Clement [16], were recreated to develop and test the methodology employed. In the first, saltwater intrusion was initiated by hydraulic head difference between the freshwater and saltwater chambers ranging from 4 to 6 mm. This created a typical saltwater wedge in three homogeneous and the Layered1 and Layered2 heterogeneous aquifers (Figure 2a,b). In the second experiment, 10 mL of saltwater were injected into the upper part of the saturated freshwater only Layered3 aquifer. The saltwater plume was left to intrude deeper in the aquifer for five minutes. Subsequently flow was introduced by a 6 mm hydraulic head difference between the two tanks. Images were acquired each minute for 28 min, until the saline plume was completely flushed out of the aquifer (Figure 2c).

Methodology
The methodology introduced in this paper comprises three sections ( Figure 3). The first part involves experimental image pre-processing via both established and novel procedures. The second part treats the task of recognizing aquifer heterogeneity as a multinomial pixel wise classification problem. Lastly the third section derives the saltwater concentration fields via regression analysis on the values of light intensity (LI) and the corresponding bead sizes. Artificial Neural Networks (ANN) were the learners favoured in this investigation, but the applicability of different machine learning algorithms was also established. Sections 3.1-3.3 present the components of the subroutines in detail, including each of the steps followed by the authors in order to train the appropriate learners. Three homogeneous aquifers served as the training datasets for all the machine learning algorithms. The remaining aquifers were utilized for quantifying the models' generalization ability (Table 1). It was established that all the training data necessary for the implementation of the method in heterogeneous aquifers with three different bead diameters, 780 µm, 1090 µm, and 1325 µm, could

Methodology
The methodology introduced in this paper comprises three sections ( Figure 3). The first part involves experimental image pre-processing via both established and novel procedures. The second part treats the task of recognizing aquifer heterogeneity as a multinomial pixel wise classification problem. Lastly the third section derives the saltwater concentration fields via regression analysis on the values of light intensity (LI) and the corresponding bead sizes. Artificial Neural Networks (ANN) were the learners favoured in this investigation, but the applicability of different machine learning algorithms was also established. Section 3.1, Section 3.2, Section 3.3 present the components of the subroutines in detail, including each of the steps followed by the authors in order to train the appropriate learners. Three homogeneous aquifers served as the training datasets for all the machine learning algorithms. The remaining aquifers were utilized for quantifying the models' generalization ability (Table 1). It was established that all the training data necessary for the implementation of the method in heterogeneous aquifers with three different bead diameters, 780 µm, 1090 µm, and 1325 µm, could be generated by a single homogeneous aquifer per utilised bead size. After the training of the aforementioned learners was complete, saltwater concentration fields could be successfully generated with no further calibration effort for any number of new homogeneous or heterogeneous aquifers.

Image Pre-Processing
Data pre-processing significantly affects the generalization performance of supervised machine learning algorithms [55]. A 15 × 15 pixel median filter was applied to the acquired experimental data to diminish the impact of outliers. Despite this, the images of homogeneous aquifers were characterized by significant irregularity in the distribution of light intensity values. This can be attributed to two distinct factors: the different optical attributes of each specific glass bead and the lighting conditions. While the first cause creates a random variation in the observed LI values, the second is responsible for a distinct distribution, where the images appeared lighter in the centre of the sandbox. This lighting pattern was efficiently filtered out by developing the Mean Homogenization Factor (MHF) presented in Equation (1): where x, y are the pixel coordinates and n represents the three homogeneous aquifers, 780, 1090 and 1325 µm, used for neural training in the current investigation. The mean value of light intensity for all colours in each of the aforementioned homogeneous images was calculated. Subsequently, the red (R), green (G), blue (B) and greyscale LI of every pixel was divided by the mean LI, giving the values of a Homogenization Factor for each pixel. By averaging the values of these factors for the three homogeneous aquifers, the effect of non-uniform backlighting was isolated from the random deviation caused by each individual bead. Dividing any given image with the values of MHF led to the successful normalization of its LI. The impact of the Mean Homogenization Factor on the LI distribution of the three training images is visualized in Figure 4. It is evident, that the application of MHF resulted in a starker LI distribution contrast between the three utilised bead sizes. This contributed to a significant improvement in the training and generalization performance of the machine learning algorithms tested in the next section.
Water 2020, 12, x FOR PEER REVIEW 7 of 21

Image Pre-Processing
Data pre-processing significantly affects the generalization performance of supervised machine learning algorithms [55]. A 15 × 15 pixel median filter was applied to the acquired experimental data to diminish the impact of outliers. Despite this, the images of homogeneous aquifers were characterized by significant irregularity in the distribution of light intensity values. This can be attributed to two distinct factors: the different optical attributes of each specific glass bead and the lighting conditions. While the first cause creates a random variation in the observed LI values, the second is responsible for a distinct distribution, where the images appeared lighter in the centre of the sandbox. This lighting pattern was efficiently filtered out by developing the Mean Homogenization Factor (MHF) presented in Equation (1): where x, y are the pixel coordinates and n represents the three homogeneous aquifers, 780, 1090 and 1325 µm, used for neural training in the current investigation. The mean value of light intensity for all colours in each of the aforementioned homogeneous images was calculated. Subsequently, the red (R), green (G), blue (B) and greyscale LI of every pixel was divided by the mean LI, giving the values of a Homogenization Factor for each pixel. By averaging the values of these factors for the three homogeneous aquifers, the effect of non-uniform backlighting was isolated from the random deviation caused by each individual bead. Dividing any given image with the values of MHF led to the successful normalization of its LI. The impact of the Mean Homogenization Factor on the LI distribution of the three training images is visualized in Figure 4. It is evident, that the application of MHF resulted in a starker LI distribution contrast between the three utilised bead sizes. This contributed to a significant improvement in the training and generalization performance of the machine learning algorithms tested in the next section.

Porous Medium Classification
A necessary step towards the elimination of aquifer-specific calibrations and to move beyond current pixel wise methodologies, is the ability to recognize the size of individual beads and thereby any given heterogeneous structure. Various classical and soft-computing approaches to image segmentation have been proposed over the years [56]. In the current paper a soft-computing method based on supervised machine learning is favoured. The proposed technique dissociates from image specific parameter configuration; based on qualitative evaluation of the investigated heterogeneous structures by the user, which is essential in classical methods, such as image thresholding or edge detection. This enables the automatic recognition of more complex, including totally random,

Porous Medium Classification
A necessary step towards the elimination of aquifer-specific calibrations and to move beyond current pixel wise methodologies, is the ability to recognize the size of individual beads and thereby any given heterogeneous structure. Various classical and soft-computing approaches to image segmentation have been proposed over the years [56]. In the current paper a soft-computing method based on supervised machine learning is favoured. The proposed technique dissociates from image specific parameter configuration; based on qualitative evaluation of the investigated heterogeneous structures by the user, which is essential in classical methods, such as image thresholding or edge detection. This enables the automatic recognition of more complex, including totally random, experimental aquifers, while minimizing the time needed for recreating each one of them. This part of the investigation is based on images of the 15 aquifers taken prior to saltwater intrusion. The pixels of three homogeneous aquifers served as the training subset. The remaining twelve images were used to evaluate the effectiveness of the method. The training subset was deliberately kept as small as possible, in order to ensure the easy implementation of the approach.

Input Variable Selection
In machine learning applications, it has been demonstrated that successful variable selection can improve the prediction performance of the predictors while leading to faster and more cost-effective learners [57]. The three light intensity values (R, G, B) for each pixel, alongside their greyscale combination, were the basis of the input variable investigation. Moreover, it was established that the LI variance of the 15-by-15 neighbourhood around each input pixel provided an additional, useful feature for successfully distinguishing the three utilized bead sizes. This is because the larger bead diameters were associated with higher light intensity variance as depicted in Figure S1 of the supplementary material.
The relatively limited variable search space in the current problem, permitted the use of the more computationally intensive step-wise variable selection [58], instead of the much more commonly used, but less reliable, filter approaches. Artificial Neural Networks, with two hidden layers, consisting of 30 neurons each, were utilized for input variable selection. This ANN architecture was derived via trial and error. A sigmoid transfer function was used in the hidden neural layers and a softmax function [59] in the output layer, while Scaled Conjugate Gradient Backpropagation [60] served as the training algorithm. A network was trained for each one of the individual variables presented in Figure 5, to evaluate their fitness. The models were trained in parallel in a multi-core CPU, using MATLAB's Neural Network Toolbox version 11.1 and Parallel Computing Toolbox version 6.12. The utilized subsets were randomly divided, such that 70% of the data was used for training, 15% for validation and 15% for testing. Early stopping was applied to prevent potential network overfitting [61]. The neural networks' predictive ability was quantified by the percentage of successfully classified pixels on the 12 testing aquifers (Table 1).
Water 2020, 12, x FOR PEER REVIEW 8 of 21 experimental aquifers, while minimizing the time needed for recreating each one of them. This part of the investigation is based on images of the 15 aquifers taken prior to saltwater intrusion. The pixels of three homogeneous aquifers served as the training subset. The remaining twelve images were used to evaluate the effectiveness of the method. The training subset was deliberately kept as small as possible, in order to ensure the easy implementation of the approach.

Input Variable Selection
In machine learning applications, it has been demonstrated that successful variable selection can improve the prediction performance of the predictors while leading to faster and more cost-effective learners [57]. The three light intensity values (R, G, B) for each pixel, alongside their greyscale combination, were the basis of the input variable investigation. Moreover, it was established that the LI variance of the 15-by-15 neighbourhood around each input pixel provided an additional, useful feature for successfully distinguishing the three utilized bead sizes. This is because the larger bead diameters were associated with higher light intensity variance as depicted in Figure S1 of the supplementary material.
The relatively limited variable search space in the current problem, permitted the use of the more computationally intensive step-wise variable selection [58], instead of the much more commonly used, but less reliable, filter approaches. Artificial Neural Networks, with two hidden layers, consisting of 30 neurons each, were utilized for input variable selection. This ANN architecture was derived via trial and error. A sigmoid transfer function was used in the hidden neural layers and a softmax function [59] in the output layer, while Scaled Conjugate Gradient Backpropagation [60] served as the training algorithm. A network was trained for each one of the individual variables presented in Figure 5, to evaluate their fitness. The models were trained in parallel in a multi-core CPU, using MATLAB's Neural Network Toolbox version 11.1 and Parallel Computing Toolbox version 6.12. The utilized subsets were randomly divided, such that 70% of the data was used for training, 15% for validation and 15% for testing. Early stopping was applied to prevent potential network overfitting [61]. The neural networks' predictive ability was quantified by the percentage of successfully classified pixels on the 12 testing aquifers (Table 1).   The single variable which maximized model performance was the greyscale LI, thus it was chosen as the basis of the investigation. Subsequently, more variables were individually added to the training subset, and a new network was trained each time. As presented in Figure 5, the ANNs' generalization ability on the heterogeneous aquifers was slightly lower than their performance on the homogeneous ones. This phenomenon was attributed to the distinct pattern in LI distribution observed in the heterogeneous cases due to the overlapping of layers of different bead sizes. The incorporation of blue LI, even though it improved the prediction of the homogeneous aquifers, caused an increase in pixel misclassification in the layered cases, particularly the 780 and 1090 µm ones, while 1325 µm beads were unaffected. The two smaller sizes, being predominantly blue (Figure 4), were prone to higher fluctuations of the values of B that negatively affected the network's generalization performance; thus, any incorporation of blue LI or blue related variables was rejected. Through neural training, using nine different variable combinations, it was established that R, G, greyscale LI, and grayscale variance was the optimum training subset (Case 4 in Figure 5).

Evaluation of Various Machine Learning Algorithms
With the establishment of the optimum variable combination, the suitability of multiple machine learning techniques was investigated. MATLAB Statistics and Machine Learning Toolbox version 11.3 was utilized to train six well-known learners on 1185 × 3506 × 3 = 12.46 × 10 6 data points, and their performance was compared with that of the neural network: All the MLT architectures were established via trial and error. The algorithms were trained in parallel and five-fold cross-validation was employed to prevent the learners from overfitting. Training times varied significantly between the seven MLTs (Table S1 of the Supplementary Material). In general, all the methods were able to predict bead structure of heterogeneous aquifers ( Figure 6) with an accuracy of 98.85% or higher (Figure 7). It can be concluded that there was no need for stronger, more sophisticated learning algorithms or bigger training datasets. This indicates that the method proposed in the current paper can be easily adapted to different experimental setups in the future, utilizing various learners.
Water 2020, 12, x FOR PEER REVIEW 9 of 21 The single variable which maximized model performance was the greyscale LI, thus it was chosen as the basis of the investigation. Subsequently, more variables were individually added to the training subset, and a new network was trained each time. As presented in Figure 5, the ANNs' generalization ability on the heterogeneous aquifers was slightly lower than their performance on the homogeneous ones. This phenomenon was attributed to the distinct pattern in LI distribution observed in the heterogeneous cases due to the overlapping of layers of different bead sizes. The incorporation of blue LI, even though it improved the prediction of the homogeneous aquifers, caused an increase in pixel misclassification in the layered cases, particularly the 780 and 1090 µm ones, while 1325 µm beads were unaffected. The two smaller sizes, being predominantly blue ( Figure  4), were prone to higher fluctuations of the values of B that negatively affected the network's generalization performance; thus, any incorporation of blue LI or blue related variables was rejected. Through neural training, using nine different variable combinations, it was established that R, G, greyscale LI, and grayscale variance was the optimum training subset (Case 4 in Figure 5).

Evaluation of Various Machine Learning Algorithms
With the establishment of the optimum variable combination, the suitability of multiple machine learning techniques was investigated. MATLAB Statistics and Machine Learning Toolbox version 11.3 was utilized to train six well-known learners on 1185 × 3506 × 3 = 12.46 × 10 6 data points, and their performance was compared with that of the neural network: 30-learner Random Forest [63].
All the MLT architectures were established via trial and error. The algorithms were trained in parallel and five-fold cross-validation was employed to prevent the learners from overfitting. Training times varied significantly between the seven MLTs (Table S1 of the Supplementary Material). In general, all the methods were able to predict bead structure of heterogeneous aquifers ( Figure 6) with an accuracy of 98.85% or higher (Figure 7). It can be concluded that there was no need for stronger, more sophisticated learning algorithms or bigger training datasets. This indicates that the method proposed in the current paper can be easily adapted to different experimental setups in the future, utilizing various learners.

Generating Saltwater Concentration Fields
The image processing methods that have been presented up to now were based on the same principle [25,27,28,30,31]. A number of calibration images were acquired for each aquifer, these data were used to formulate a relationship between LI values and SW concentrations via regression analysis. While relatively accurate, these techniques were disadvantaged by the need to acquire new calibration images for every investigated aquifer, making them slow and labour intensive. Eliminating this need by the use of machine learning, will represent a step change in the ability to run significant laboratory investigations.
A traditional MLT approach would feed the raw data from the eight calibration images (with saltwater concentrations of 0%, 5%, 10%, 20%, 30%, 50%, 70%, and 100%), of one homogeneous aquifer for each utilized bead size, into a sufficiently strong learner that would automatically establish a relationship between LI and SW% [54]. Nevertheless, it is well documented [31,33] that experimental data of saltwater intrusion generated in sandbox setups are characterized by a high percentage of irregularity, mostly due to the deviating light absorbing attributes of the individual beads. The nature of this type of data makes their straightforward use for neural training problematic.
Instead of the actual images, a feedforward ANN with one 10-neuron hidden layer was trained on the mean values of monochromatic green LI for each specific aquifer ( Figure 8) using Levenberg-Marquardt as its training algorithm [64]. This led to low training time and minimum training error (MSE~0.2 × 10 −19 ), proving that the network's size was sufficient. In 8-bit digital images, light intensities are expressed with 256 values, varying between 0 and 255. It was observed that in the pure saltwater images, the blue LI component values were minimized getting values close or equal to 0, while for many pixels red LI was equal to 255; thus, monochromatic green was favoured for SW concentration calculation [35].

Generating Saltwater Concentration Fields
The image processing methods that have been presented up to now were based on the same principle [25,27,28,30,31]. A number of calibration images were acquired for each aquifer, these data were used to formulate a relationship between LI values and SW concentrations via regression analysis. While relatively accurate, these techniques were disadvantaged by the need to acquire new calibration images for every investigated aquifer, making them slow and labour intensive. Eliminating this need by the use of machine learning, will represent a step change in the ability to run significant laboratory investigations.
A traditional MLT approach would feed the raw data from the eight calibration images (with saltwater concentrations of 0%, 5%, 10%, 20%, 30%, 50%, 70%, and 100%), of one homogeneous aquifer for each utilized bead size, into a sufficiently strong learner that would automatically establish a relationship between LI and SW% [54]. Nevertheless, it is well documented [31,33] that experimental data of saltwater intrusion generated in sandbox setups are characterized by a high percentage of irregularity, mostly due to the deviating light absorbing attributes of the individual beads. The nature of this type of data makes their straightforward use for neural training problematic.
Instead of the actual images, a feedforward ANN with one 10-neuron hidden layer was trained on the mean values of monochromatic green LI for each specific aquifer ( Figure 8) using Levenberg-Marquardt as its training algorithm [64]. This led to low training time and minimum training error (MSE~0.2 × 10 −19 ), proving that the network's size was sufficient. In 8-bit digital images, light intensities are expressed with 256 values, varying between 0 and 255. It was observed that in the pure saltwater images, the blue LI component values were minimized getting values close or equal to 0, while for many pixels red LI was equal to 255; thus, monochromatic green was favoured for SW concentration calculation [35].
In order to minimize the impact of noisy data, caused by the presence of beads of varying colour tones, a quantification of the optical irregularities of the glass bead medium was essential. This procedure took place in the freshwater-only images of the tested aquifers. To do so, the LI values of each individual pixel were divided with the corresponding measured average LI value of its bead size. In doing so, a Bead Optical Irregularity Metric (BOIM) was derived, characterising each specific bead of the investigated aquifers, as shown in Equation (2), assigning higher and lower BOIM values to lighter and darker beads, respectively.
BOIM(x, y) = LI(x, y) meanLI (2) where "meanLI" is specific to the local pixel bead size. In order to minimize the impact of noisy data, caused by the presence of beads of varying colour tones, a quantification of the optical irregularities of the glass bead medium was essential. This procedure took place in the freshwater-only images of the tested aquifers. To do so, the LI values of each individual pixel were divided with the corresponding measured average LI value of its bead size. In doing so, a Bead Optical Irregularity Metric (BOIM) was derived, characterising each specific bead of the investigated aquifers, as shown in Equation (2), assigning higher and lower BOIM values to lighter and darker beads, respectively.
where "meanLI" is specific to the local pixel bead size. By dividing the freshwater only image of any aquifer with its corresponding BOIM matrix, the mean LI values the feedforward ANN was trained upon are returned. When these values were fed on the neural network, the 0% SW concentration fields was perfectly recreated. The same procedure was repeated for saltwater-only images of the aquifers, deriving the BOIM corresponding to the 100% saltwater percentage. This led to the perfect prediction of the 100% SW concentration components of the testing images. The two derived components of bead irregularity were not identical to each other in any of the investigated cases, proving that the level of light absorbance was not only correlated to the composition of each specific bead, but also to the dye concentration present. Thus, the distinction between the bead optical irregularity metric at C = 0% (BOIM0) and C = 100% (BOIM100) was deemed necessary.
Intruding saltwater wedge images are dominated by two distinct regions: the saltwater wedge, with 100% SW concentration; the pure freshwater part of the aquifer, where the intrusion has not occurred. The mixing zone makes up no more than 5% of the acquired data. Dividing the SWI test images with BOIM0 and BOIM100 creates two data subsets that, when fed to the neural network, generate a perfect prediction of the two main concentration regions. The effect of the implementation of the two factors on a test image is depicted in Figure 9. By dividing the freshwater only image of any aquifer with its corresponding BOIM matrix, the mean LI values the feedforward ANN was trained upon are returned. When these values were fed on the neural network, the 0% SW concentration fields was perfectly recreated. The same procedure was repeated for saltwater-only images of the aquifers, deriving the BOIM corresponding to the 100% saltwater percentage. This led to the perfect prediction of the 100% SW concentration components of the testing images. The two derived components of bead irregularity were not identical to each other in any of the investigated cases, proving that the level of light absorbance was not only correlated to the composition of each specific bead, but also to the dye concentration present. Thus, the distinction between the bead optical irregularity metric at C = 0% (BOIM0) and C = 100% (BOIM100) was deemed necessary.

Green Light Intensity
Intruding saltwater wedge images are dominated by two distinct regions: the saltwater wedge, with 100% SW concentration; the pure freshwater part of the aquifer, where the intrusion has not occurred. The mixing zone makes up no more than 5% of the acquired data. Dividing the SWI test images with BOIM0 and BOIM100 creates two data subsets that, when fed to the neural network, generate a perfect prediction of the two main concentration regions. The effect of the implementation of the two factors on a test image is depicted in Figure 9.
Water 2020, 12, x FOR PEER REVIEW 12 of 21 Figure 9. Effect of the implementation of BOIM0 and BOIM100 on a SWI image of the Layered2 aquifer.

Optimum Combination of Predicted Saltwater Concentration Fields
A crucial step towards the correct representation of the mixing zone on the salt-freshwater interface was the proper combination of values of the two, BOIM0 and BOIM100-modified, SW concentration fields. Figure 10 illustrates the mean squared error (MSE) generated by the ANN for the actual data of the homogeneous calibration images, expressing the difference between the

Optimum Combination of Predicted Saltwater Concentration Fields
A crucial step towards the correct representation of the mixing zone on the salt-freshwater interface was the proper combination of values of the two, BOIM0 and BOIM100-modified, SW concentration fields. Figure 10 illustrates the mean squared error (MSE) generated by the ANN for the actual data of the homogeneous calibration images, expressing the difference between the calculated and the real SW%. For concentrations closer to 0%, the BOIM0-modified subsets generate less MSE. The same applies to the BOIM100-modified figures for bigger SW concentrations. A prediction combination based on this error distribution was formulated. For extremely small or big saltwater concentrations only the BOIM0 (zone 1) and the BOIM100 (zone 5) predictions were taken into account. In between zones 1 and 5, three distinct zones were favoured, where either simple (zone 3) or weighted averaging (zones 2 and 4) between the two predictions took place. A genetic algorithm, presented in detail in Appendix A, was utilised to optimize the limits of the five aforementioned concentration areas in order to minimize the generated MSE (Equation (A1)). The algorithm was executed eight times (Table A1) to derive an optimum solution. According to this solution, the best prediction combination was as follows: zone 1 = 0%-16%, zone 2 = 16%-33%, zone 3 = 33%-51%, zone 4 = 51%-88% and zone 5 = 88%-100% ( Figure 10). Since prediction error depends on the optical characteristics of the glass bead medium, dye and back lighting (factors which remain consistent), the prediction combination needs to be optimized only once for each experimental setup. The presented procedure took advantage of the capabilities of supervised machine learning. It turned a highly noisy experimental dataset, in which relatively strong learners failed to predict its concentration values, into two subsets by filtering out much of the noise introduced by the variance in light absorbing attributes of individual glass beads. The two subsets were then fed into a less complex neural network. By properly combining the two predictions, high quality saltwater concentration fields were obtained.

Results
Saltwater concentration fields generated using the novel machine learning based method presented in this paper and concentration fields derived from the pixel wise regression method introduced by Robinson et al. [34] were compared with each other. For the first experimental case, steady state images of SWI, introduced by a hydraulic head difference equal to 6 mm, in the three The presented procedure took advantage of the capabilities of supervised machine learning. It turned a highly noisy experimental dataset, in which relatively strong learners failed to predict its concentration values, into two subsets by filtering out much of the noise introduced by the variance in light absorbing attributes of individual glass beads. The two subsets were then fed into a less complex neural network. By properly combining the two predictions, high quality saltwater concentration fields were obtained.

Results
Saltwater concentration fields generated using the novel machine learning based method presented in this paper and concentration fields derived from the pixel wise regression method introduced by Robinson et al. [34] were compared with each other. For the first experimental case, steady state images of SWI, introduced by a hydraulic head difference equal to 6 mm, in the three homogeneous aquifers (780, 1090, and 1325 µm) and Layered1, Layered2 heterogeneous aquifers were processed using the two methodologies (Figure 11a-e). To further investigate the ability of the new procedure to recreate quality concentration fields of heterogeneous aquifers, and to quantify any performance difference between steady and unsteady state conditions, two additional cases were investigated. The images of the layered aquifers were acquired exactly 20 min after the difference of hydraulic head between the freshwater and saltwater chamber was reduced from 6 to 4 mm and while the intruding SW wedge had not yet stabilized (Figure 11f,g). In the second experimental case, the actual test figures of moving saltwater plume are presented alongside the corresponding generated flow fields two, five, nine and fifteen minutes after the injection of saltwater in the Layered3 heterogeneous aquifer (Figure 12).
Recognizing any significant difference between the SW concentration fields generated by the two methods through simple visual observation ( Figure 11 and Figure S2 of the Supplementary Material) was not feasible for the majority of the test cases. During the data acquisition procedure, air bubbles were formed in the upper part of the homogeneous 1090 and 1325 µm aquifers. This is an experimental error reported before by Robinson et al. [34], that can only be avoided through the use of degassed water, further complicating the laboratory procedure. Even though bubbles appeared as spots of over-predicted SW concentration in the pixel wise calculated concentration maps (Figure 11b(i),c(i)), they did not affect the maps generated with the novel technique (Figure 11b(ii),c(ii)) that recreated the freshwater only parts of the images successfully. This makes the use of regular tap water compatible with the presented image analysis procedure.
Three variables were utilized to quantify the difference between the results of the two procedures; the toe length (TL) of the intruding wedge, the width of the mixing zone (WMZ) and the average difference between the concentration values generated by the two procedures (∆C) (Figure 13). TL was defined as the maximum horizontal distance of the 50% concentration isoline from the right aquifer edge, while WMZ equalled the average vertical distance between the 25% and 75% concentration isolines.
The maximum TL difference between the two methods was observed in the steady state case of the Layered2 heterogeneous aquifer. It was less than 0.6 mm, or 6 pixels, which correspond to between 45% and 76% of a single glass bead diameter. The prediction of the width of the mixing zone was extremely similar for the two methods, giving a difference of 0.01 mm~0.23 mm. Average ∆C across most testing images was less than 2%, with the exception of the Layered2 steady state case. No distinct difference was observed in the performance of the new method between the homogenous and heterogeneous aquifers, steady and unsteady state cases, or the wedge shaped and saline plume intrusion data. This demonstrates the method's resilience, indicating it can be applied to a diverse set of laboratory investigations. Water 2020, 12, x FOR PEER REVIEW 14 of 21  Recognizing any significant difference between the SW concentration fields generated by the two methods through simple visual observation ( Figure 11 and Figure S2 of the Supplementary Material) was not feasible for the majority of the test cases. During the data acquisition procedure, air bubbles were formed in the upper part of the homogeneous 1090 and 1325 µm aquifers. This is an experimental error reported before by Robinson et al. [34], that can only be avoided through the use of degassed water, further complicating the laboratory procedure. Even though bubbles appeared as spots of over-predicted SW concentration in the pixel wise calculated concentration maps ( Figure  11b(i),c(i)), they did not affect the maps generated with the novel technique (Figure 11b(ii),c(ii)) that recreated the freshwater only parts of the images successfully. This makes the use of regular tap water compatible with the presented image analysis procedure.
Three variables were utilized to quantify the difference between the results of the two procedures; the toe length (TL) of the intruding wedge, the width of the mixing zone (WMZ) and the average difference between the concentration values generated by the two procedures (ΔC) ( Figure  13). TL was defined as the maximum horizontal distance of the 50% concentration isoline from the right aquifer edge, while WMZ equalled the average vertical distance between the 25% and 75% concentration isolines.  The maximum TL difference between the two methods was observed in the steady state case of the Layered2 heterogeneous aquifer. It was less than 0.6 mm, or 6 pixels, which correspond to between 45% and 76% of a single glass bead diameter. The prediction of the width of the mixing zone was extremely similar for the two methods, giving a difference of 0.01 mm~0.23 mm. Average ΔC across most testing images was less than 2%, with the exception of the Layered2 steady state case. No distinct difference was observed in the performance of the new method between the homogenous and heterogeneous aquifers, steady and unsteady state cases, or the wedge shaped and saline plume intrusion data. This demonstrates the method's resilience, indicating it can be applied to a diverse set of laboratory investigations.

Conclusions
This paper presents a novel method of incorporating machine learning techniques in laboratory investigations of saltwater intrusion in coastal aquifers. The procedure consists of three distinct parts. Initially an approach of filtering out back lighting noise was presented. The second section included the training of a learner to conduct pixel-wise bead size classification analysis that automatically recreated the ground stratification of the investigated laboratory aquifers. In the third part, the generated bead size was used alongside the acquired values of monochromatic green light intensity to train an Artificial Neural Network in correlating LI values with saltwater concentrations (regression analysis). A procedure for quantifying the optical irregularity of individual glass beads was proposed. This enabled the minimization of the irregularity's impact on the generalization performance of the regression ANN, leading to highly accurate SW concentration fields.
The main advantage of the methodology is the minimization of aquifer-specific calibrations, thus significantly lowering the time needed for experimental data acquisition. Instead of obtaining calibration images for every tested aquifer, they are obtained only once at the initial stage of the investigation and are used for neural training. Apart from the freshwater and saltwater only images, any subsequent acquisition of calibration data, for any new aquifers, is deemed redundant. In the utilized apparatus, this leads to time savings of up to five hours of labour-intensive laboratory work per tested case. This advantage becomes crucial for larger-scale investigations that would include multiple heterogeneous aquifers with potentially larger sandbox dimensions. In conclusion, the machine learning oriented technique gave satisfactory results for homogeneous and heterogeneous aquifers alike. The proposed methodology presented a series of additional advantages over prior methods.

•
The method outperformed pixel-wise regression in most cases. It perfectly recreated the pure freshwater and saltwater zones of the experimental SW intrusion data. The impact of air bubbles was limited. Since the regression was not performed on a pixel wise level, there was no need for perfect overlap between the test and calibration images, thus small movements of the camera introduced minimal error.

•
The approach was implemented successfully for the wedge-shaped intrusion and the irregularly shaped saltwater plume alike, proving its resilience and applicability on diverse setups of saline intrusion. The basic principles of the methodology may be extensible to laboratory investigations of phenomena beyond this specific research field, such as contaminant transport of dense or non-aqueous phase liquids.

•
The results confirmed that a minimum of one homogeneous aquifer per utilized bead size generated sufficient training data for the implementation of the method. It was established that the procedure can be recreated by substituting the ANNs with different machine learning algorithms. All the algorithms used were well established and their architecture was derived through trial and error. The application of more complex learning procedures, such as deep learning, was not necessary. The aforementioned facts demonstrate that the method's implementation is straightforward and does not require advanced machine learning skills, making it a user friendly scientific tool. .
( 1, . . 5) = ( ( ℎ ( , ) ( , ), ( 1, ) ))/3 (A1) where j is the number of homogeneous aquifers (three) and i is the number of calibration concentrations (eight). An initial solution population of 500 was deemed sufficient, while the genetic procedures were implemented for 100 generations. The algorithm was executed for a total of eight times (Table A1) to derive the optimum solution. Figure A1. Calculation of weights for the weighted averaging of generated Mean Squared Error.  Figure A1. Calculation of weights for the weighted averaging of generated Mean Squared Error.