Can Deep Learning Extract Useful Information about Energy Dissipation and Effective Hydraulic Conductivity from Gridded Conductivity Fields?

We confirm that energy dissipation weighting provides the most accurate approach to determining the effective hydraulic conductivity (K eff ) of a binary K grid. A deep learning algorithm (UNET) can infer K eff with extremely high accuracy (R 2 > 0.99). The UNET architecture could be trained to infer the energy dissipation weighting pattern from an image of the K distribution with high fidelity, although it was less accurate for cases with highly localized structures that controlled flow. Furthermore, the UNET architecture learned to infer the energy dissipation weighting even if it was not trained on this information directly . However, the weights were represented within the UNET in a way that was not immediately interpretable by a human user. This reiterates the idea that even if ML/DL algorithms are trained to make some hydrologic predictions accurately, they must be designed and trained to provide each user-required output if their results are to be used to improve our understanding of hydrologic systems most effectively.


Introduction
Numerical modeling is fundamental to understanding hydrologic systems, and to predicting outcomes to be used for water resources management and groundwater contaminant remediation [1]- [4].Water movement through the subsurface is controlled largely by the hydraulic conductivity of the region, which can vary over orders of magnitude across multiple scales [5].
There is a rich body of literature on the upscaling of hydraulic conductivity.[6] categorized upscaling techniques as being either local or non-local.Local techniques, which include simple averaging, power averaging, renormalization, and percolation theory, are based on the assumption that effective upscaled conductivity depends only on the statistical distribution of media of different conductivities contained within the medium [7], [8], [17]- [21], [9]- [16].However, while the method is very fast and efficient, severe errors can occur in the final estimates at the scale of the largest blocks due to unrealistic boundary representations during the recursive upscaling process [22].Non-local techniques, which include inverse modeling and energy dissipation, consider how boundary conditions affect flow [23]- [27].However, this approach requires solving the flow problem, including the boundary conditions, which requires that many observations are available to properly constrain the parameter estimation problem.This can be very computationally demanding and several authors, further, have shown that this approach can provide imperfect results [27], [28].
The most direct approach to determining how spatially variable averaging of hydraulic conductivities occurs during flow is through energy dissipation analysis.This approach is largely limited to steady-state problems, and also requires solving the flow problem to determine the effective, upscaled parameter value.In essence, the energy dissipation approach defines the energy per unit time required to force the fluid through each block of the porous medium; this value is normalized for the shape of the domain and the boundary conditions, and then can be used to define the spatial distribution of weights to be applied to the local conductivity values when upscaling to determine Keff.In this regard, [29] and [30], [31] suggested that Keff can be determined from a grid of cells by assuming that dissipated energy must be preserved during the equivalent block conductivity computation.
Although the energy dissipation approach is computationally demanding and requires that the flow problem be solved for both the homogeneous and heterogeneous case, it has been found to be the most accurate and mathematically rigorous way to upscale conductivity for steady state problems [32].Further, it can provide significant insight into the specific locations that contribute most to the upscaled value of Keff.Borrowing on the approach to defining the sample area of time domain reflectometry probes using this approach [33], it is possible to identify relatively small areas of the domain that contribute disproportionately to the value of Keff, thereby identifying key structures in the subsurface that may be controlling flow.
Reviews of several studies in the context of estimating effective parameter values (e.g., [34], [35]) indicate that data-driven approaches are efficient, and can even outperform stochastic modeling or local (i.e., structure-based) techniques.The architecture underlying convolutional neural networks (CNNs) allows for the preservation of spatial structure and correlation information, and we might therefore expect that the CNN approach is particularly suitable for problems involving gridded inputs, such as hydraulic conductivity fields [2], [14], [36], [37].In the context of effective subsurface parameter estimation, the accuracy of CNN-based approaches can be attributed to the fact that, unlike classic stochastic approaches that only consider the first and second statistical moments [37] of a highly spatially variant media, the machine learning approaches can account for spatial patterns that are not explicitly characterized by those statistical moments or by classical structure-based models.[37] used a CNN to map conductivity fields to macro-dispersivity, [38] combined images of porous media with integral quantities of porosity and specific surface area to estimate pore-scale permeability, and [36] parameterized a non-Gaussian conductivity field using a convolutional adversarial autoencoder as well as proposing a deep residual dense CNN to map spatially distributed conductivity to head and solute concentration for 2D and 3D media.
Despite their impressive predictive power, ML-based models can suffer from a lack of interpretability [39], [40].Most studies have mapped from measured inputs to outputs without due consideration of the underlying physical processes involved [37], [38], [41], [42].Consequently, several studies, have attempted to incorporate physical constraints into DL algorithms.For example, [43] used a knowledge-based neural network to estimate head distribution by taking into consideration the residuals of the governing equations, boundary conditions, and expert knowledge when formulating the loss function used to train the model.[34] incorporated governing flow partial differential equation constraints (the Darcy and Richards equations) along with training data into a DL algorithm to infer the hydraulic conductivity map based on sparse observations of head and conductivity during saturated flow through a heterogeneous medium and to infer the constitutive pressure-conductivity relationship from observations of capillary pressures during unsaturated flow.
To date, little attention has been paid to the design of the underlying ML/DL architecture.In particular, we found no publications addressing the problem of how the ML/DL approach extracts and uses information from the heterogeneous field in the process of inferring Keff.Here, we make use of recently developed approaches that facilitate comparing the activation patterns of different DL models [44] to examine how these ML tools extract and use the knowledge that is relevant to the process of upscaling (i.e.energy dissipation weighting).
This study has two primary objectives.The first is to examine the potential for a specific type of CNN, an image to image translation algorithm known as UNET, to infer the effective hydraulic conductivities of two-dimensional binary conductivity fields.Conceptually, these binary fields can be viewed as simplifications of bimodal K fields that can result from coastal depositional processes and fracturing in low permeability media [45].The second objective is to understand weather UNET learns to use energy dissipation weighting as an intermediate step for Keff .Therefore, we compare the ability of a UNET to infer Keff from a binary K grid when trained intermediately on the energy dissipation weighting to that when trained only on the K grid.
In this regard, we examine how information is processed by the UNET, to examine whether it is accounting for energy dissipation 'naturally', even when it is not explicitly trained using such information.

Methodology
We examined the effect of the structure of a binary medium on the effective hydraulic conductivity, Keff, using the MODFLOW numerical 2-D groundwater model to produce the steady-state head distribution over a square grid with a 1-D applied gradient.We computed Keff from the geometry of the grid, the applied Type I boundary conditions, and the steady-state flow through the system for different random distributions of two media with different K values.We also computed the energy dissipation in every cell to examine whether this information can provide insight into the spatial weighting of the K values used to determine Keff.Further, we examined the UNET algorithm to infer Keff from the K grid, with and without information about the energy dissipation distribution.Finally, we used central kernel alignment similarity [44] to infer the hidden layer representation for Keff estimation in an attempt to understand how and whether the deep learning algorithm considers energy dissipation during Keff estimation.

Flow through Heterogeneous Binary Grids (Dataset Generation)
We defined 25 by 25 grid domains with no flow boundaries at the top and bottom and constant head boundaries of 2 m and 1 m on the left and right boundaries, respectively.Each cell has a length of 1 m on a side.Two media populated the grid, with K values of 1, high K, and 0.001 cm/s, low K. Different percentages of the prevalence of the high K material were considered, ranging from 1% to 99%.For each high K percent, 3000 random distributions of the media were modeled.Figure 1 shows one example of a grid with 50% high K material.
For each grid, the effective hydraulic conductivity was computed based on Darcy's Law, the global gradient applied over the domain, and the steady-state flow through the system.The convergence criterion on the head used in MODFLOW was 0.01 m.To account for small errors that persisted when the convergence criterion was met, the value of Keff was calculated based on the flow into the left boundary and the flow out of the right boundary.The resulting Keff values calculated at both boundaries agreed within 1%, and the average value was used for all analyses.

Energy Dissipation Weighting Method
Conceptually, energy dissipation is defined as the energy per unit time necessary to force the fluid through the porous medium [30].The value of Keff can be thought of as a weighted average of the spatially distributed values of K. Energy dissipation can be used to define the spatial distribution of weighting factors based on the square of the gradient of the potential at each location normalized by the sum of the square of the gradient of the potential for the same boundary conditions for a domain filled with a homogeneous medium [29].The weighting factor at a point at (x,y) can be expressed as:

𝑊 (𝑥, 𝑦) = [∇∅( , )] ∬[∇∅ ( , )]
Eq.1 where w(x,y) is the weighting factor at point (x,y), ∅(x,y) is the potential at each location, and ∅ is the potential distribution for the equivalent homogenous field.showed that spatially variable properties (e.g. for K) can be weighted to determine an upscaled property (here Keff) as the sum of the local K weighted by the energy dissipation weighting factor over the domain, as: In this study, the steady-state head values were used to compute the energy dissipation distribution.Because MODFLOW determines head values at the nodes and the K values are defined over the cells, the gradient and K values are not aligned.There are two approaches to compute Keff with the energy dissipation approach for these conditions.First, the gradient can be computed at each cell edge and the value of Keff at the edge can be determined based on the K value in the two neighboring cells.Second, the head values can be interpolated to the edges, allowing for gradients to be computed at the nodes, matching the locations of the K grid.Both of these approaches were tested and were found to agree within 1%; accordingly, the average of these two estimates of Keff was used for each grid for further analyses.Hereafter, the energy dissipation weights are referred to as ED weights, or simply as weights.

Estimating Keff with and without ED Weights
For a given percent of high K material, the energy dissipation distribution depends on the structure and arrangement of high K and low K cells in the domain.As a result, the K distribution and the ED weight distribution are related (but not identical) sources of information for inferring Keff.In particular, given that the knowledge of energy dissipation has been shown to provide valuable information regarding the weighting required to define Keff, the problem of estimating Keff from a grid of K values can be seen as a problem that has two stages.The first is to estimate the energy dissipation weighting at each cell, and the second is to use the estimates of the spatially distributed ED weights to estimate Keff.
Recent advances in the application of deep learning to image processing have led to the development of powerful machine learning architectures.The UNET architecture [46] was developed to address problems that require consideration of multiple scales by including skip connections, which recombine information from earlier hidden layers with that of later hidden layers.Here, we propose a modified UNET architecture that estimates the spatial weight distribution and then combines this estimate with the K grid to estimate Keff (Figure 2).We applied the UNET in two different ways to understand if and how ED weighting is used in the estimation of Keff.In the first implementation, referred to as 'informed', the model is trained using the freeze-training technique [47], [48], in which the lower branch of the model, Figure (2), up to the point that the K grid information is reintroduced, is first trained to estimate the spatially distributed ED weights.This is achieved by providing the ED weights to intermediate layers during training.Once trained, the informed UNET is then used to predict Keff without being provided ED weights.This is possible because UNET models are a variation of encoder-decoder algorithms, which include a contracting path (like a vanilla CNN) followed by an expanding path.
The contracting path (i.e., encoder) is responsible for capturing the context while the expanding path (i.e., decoder) enables localization.Through the use of encoder-decoder paths, the UNET can provide an output that has the same dimensions as the input.In our application, this property is necessary to obtain ED weights on a grid having the same size as the K grid.Making use of this structure, we trained UNET to infer the ED weights and then used those inferred weights to predict Keff by training the rest of architecture In other words, for the informed UNET, the weights of the lower branch were frozen after training, and training was then continued by feeding only the K grid into the UNET.The algorithm then provided estimates of the ED weights, which were concatenated with the K grid and fed into the final fully-connected layer.This model was trained to estimate Keff.
The second implementation of UNET is referred to as 'uninformed'.The model structure was identical to the informed UNET, but was only provided K grid information; it was not trained using any information about the actual ED weights.Rather, all weights in the model were fitted simultaneously during training to fit Keff.
The details of our UNET structure are provided in the Appendices (Table A1-B).Briefly, the contracting path is comprised of repeated blocks of two consecutive 3x3 convolutional kernels with rectified linear activation functions (Relu) followed by a 2 x 2 max-pooling layer with a stride of 2 to reduce the number of parameters and diminish the next layer input size.On the contracting path, multilevel decomposition is applied to each layer, doubling the number of feature maps (i.e., filters) at each step.The expanding path consists of repeated blocks of transposed convolution layers with a kernel size of 2X2 and a stride of 2. In each block, the output of the transposed convolution layer is concatenated with the cropped feature map of the corresponding step from the encoding procedure (a skip connection).The concatenated values are subjected to two consecutive 3x3 convolutional kernels with Relu activation functions.The skip connections help to recover information that may be lost by down-sampling during decoding.The cropping procedure in the concatenation ensures that the tensor extracted from the encoder will have the same size as the corresponding layer in the decoder.During decoding, the convolutional layer halves the number of channels.A final convolution layer with a kernel size of 1X1 and linear activation maps the current number of channels to a single layer.A skip connection was introduced to recover information of the original grid, like the percent of high K, that may be lost by when inferring the ED weights.Specifically, the inferred ED weights were concatenated with the K grid and fed through a convolutional layer and a dense, fully-connected layer to estimate Keff.It should be noted that as part of preprocessing, we padded the input image to 32×32 to make the final output of the UNET the same as the original image.

Model Evaluation
Identical data were provided to both methods; specifically, the K grids (3000 realizations, each with 99 different percent high K values) MODFLOW-determined Keff values, and (where applicable) ED weights.The inputs and targets were divided into training, validation, and testing subsets.A random selection of 65% of these inputs was used for training and 15 % were used as a validation dataset for hyperparameter tuning.The same training/validation/testing sets were used for all of the analyses reported herein.Model performance is reported using the testing data set, comprising the remaining 20% of the data.Before training, the inputs were standardized by subtracting the mean value and dividing by the standard deviation.All hyper-parameters were tuned using a grid search approach.The root mean squared error (RMSE) between the observed and model-calculated values (of Keff or ED weight) is used to assess the prediction quality of each model.The R 2 value was also calculatedbut is only used to further illustrate the quality of the predictions.

Deep Learning Implementation
The deep learning architecture was implemented in Python 3.6.9with Tensorflow V. 2.2.0 and CUDA version 10.1.Training and predictions were done on a P100 NVIDIA GPU.For both the "informed" and "uninformed" models, we used Adamax with a learning rate of 5e-4 as the optimizer.Training was stopped when performance on the validation dataset stopped improving within a patience value equals to 50.

CKA and Similarity Analysis
In addition to investigating whether machine learning algorithms can be trained to predict Keff using gridded binary K information, we also wanted to determine whether these tools can infer the underlying pattern of energy-dissipation in the process of inferring Keff.If it can be shown that the deep learning procedure naturally infers the spatial distribution of energy dissipation, then it would provide an example of how DL tools can "learn" underlying concepts.Further, because the distribution of energy dissipation indicates which parts of the medium are having the largest impact on steady-state flow, the ability to make inferences regarding these patterns would also enable an understanding of the relationship between Keff and the structure of the K distribution.Such knowledge would also be valuable for understanding soil property distributions that may impact dispersion, colloid trapping/mobilization, and erosion/piping.
To investigate the ability of deep learning tools to make inferences regarding the underlying pattern of energy dissipation, we applied the UNET methodology in both informed and uninformed modes, as described above.To compare how information flowed through the UNET in informed and uninformed modes, we examined the intermediate representations (i.e., hidden layer outputs) of each trained model.Specifically, the hidden layer outputs, known as hidden representations, characterize the "features" learned by a hidden layer of a neural network from an input (i.e,K grid), represented in a machine-readable format.Similarity measurements can be used to compare these intermediate representations between networks.[44] showed that for a similarity index to be suitable, it should be invariant to orthogonal transformation and isotropic scaling, and not be an invertible linear transformation.We use the Hilbert-Schmidt independence criterion (HSIC) [49], which is a kernel-based statistical measure of the independence between two sets of variables: (, ) = ( ) () Eq.3 where: , ,  ∈  × in which H is the centering matrix  =  − 11 ,1 is identity matrix, and K=  ( ) ,  ( ) ,  = ( ( ) ,  ( ) ) are positive semidefinite kernel functions.For linear kernels, K=(, ) = X .An HSIC value of 0 implies independence.Other researchers [12], [13], [44] showed that HSIC can be made to be invariant to isotropic scaling by normalization.This normalized HSIC index is known as centered kernel alignment (CKA): Eq.4 In this study, we used the Centered Kernel Alignment (CKA) metric proposed by [44] with linear kernels to evaluate the similarities of layer representations in our trained networks.Specifically, we calculated the CKA between corresponding intermediate representations of the informed and uninformed networks.To assess the similarity between corresponding intermediate representations of model 1 and model 2 at layer i and j, we flattened the representations and let  ∈  × and  ∈  × be the matrix of intermediate representations of model 1 and model 2 with m1 and m2 neurons for n examples.Then, we constructed the linear kernel matrices: K= X and L= Y .Finally, we used equation [4] to compute the CKA metric.We compared similarities for all paired combinations of layers to explore how information flowed through both networks.

Results
The main goal of this study was to investigate the impact that "structure" has on the effective value of hydraulic conductivity (Keff) of a binary heterogeneous medium.We examined this for multiple realizations of random fields that contain different percentages of the higher K material.
A key insight regarding this was presented by [29] and [30] who showed that the spatial distribution of energy dissipation during steady-state flow can be used to define spatially distributed weights on K that can be used to compute Keff.We first confirm this finding for the set of binary grids examined.Then, we examine whether deep learning algorithms can predict Keff with and without information regarding the ED weights.By comparing DL algorithms trained with and without access to energy dissipation information, we seek to understand the mechanism by which Keff is inferred by the DL.

Analysis of the Effective Hydraulic Conductivity (Keff) and High K Percentage
The steady-state flow problem, Figure (1), was solved for 3000 random realizations of a binary flow field for high conductivity mixtures ranging from 1 to 99%.Keff was computed from the overall gradient applied over the domain and the steady-state flow through the domain.Figure 3 indicates how Keff varies as a function of the percent high K material present in the realization.The parallel and series arrangements for each percent high K realization were calculated analytically and are seen to place limits on the ranges that Keff values can take.The mean value of Keff for each high K percentage is shown in the figure.
The plot demonstrates the nonlinear dependence of Keff on percent high conductivity.At low percentages of high conductivity, Keff is only minimally affected by the addition of more high K material and remains approximately equal to the conductivity of the lower K material.A nonlinear transition zone is seen to occur at approximately 40 to 70% high K, and the relationship becomes approximately linear above 70%.For a given percentage of high K, the maximum variance of Keff occurs in the central transition zone.
These results illustrate the two related but different challenges for inferring Keff from a binary grid: predicting mean Keff as a function of the percent high K material; and predicting Keff for a specific grid given knowledge regarding the percentage of high K material present.
Analysis of the Energy-Dissipation Weighting Method to Explain the Keff [29] showed that the pattern of energy dissipation, calculated from the square of the gradient of the potential, can be used to determine an upscaled property like Keff.This fact is confirmed by our study (Figure 4).The energy dissipation approach can be thought of as computing a weighted average of the local K values on the grid that perfectly recovers the flow-based Keff.Despite the power of the energy dissipation approach, the weights are very difficult to identify visually.For example, the two grids are shown in Figures 5a and 5b both have 80% high conductivity material but have strikingly different Keff values (0.53 and 0.24 respectively).The corresponding maps of the ED weights are shown in Figures 5c and 5d, illustrating that the grid with the lower Keff has a much more localized pattern of ED weighing.While it might be tempting to attribute this localized weighting to the connected pattern of low K cells running vertically through Figure 5b, beyond this qualitative assessment it is essentially impossible to visually infer the values of the ED weights from the knowledge of the spatial organization of K. Of course, both the pattern of ED weights and their values can be computed readily by solving the steady-state flow problem, but then the value of Keff can be determined directly and knowledge of the ED weights is superfluous.Accordingly, the ED weighting approach is best seen as a method for understanding spatial organization (e.g.[33]), rather than a practical approach for inferring Keff from a K grid.
By classifying the domain into high and low weight areas, we can see that the spatial structure of high-weight areas varies systematically with the percent high conductivity material.The paired images in Figure 6a show how the fractions of high energy cell relate to the corresponding ED maps for several K grids with different percentages of high K material.Figure 6b shows the expected fraction of high energy cells as a function of the threshold used to define high weight areas.There are two clear conclusions.First, the high energy area is restricted in a relatively small area for percent high conductivity conditions between approximately 50 and 80%.Second, the results are not highly sensitive to the choice of threshold.Finally, Figure 6c indicates a strong relationship between Keff and the fraction of high energy dissipation cells (defined with a threshold of 95%), but with some interesting complications to that relationship in the range of 50 to 60% high conductivity material.These results suggest that information regarding the fraction of high energy cells may be informative for inferring Keff for a given percent high conductivity material fraction, but that the relationship is likely to be complex.

Inferring Keff with UNET with and without ED weights
The uninformed UNET achieved (Figure 7a) a RMSE=0.0113and R2 =0.9984 when evaluated on testing data.On the other hand, the results of the informed UNET (Figure 7b) are interesting as it offers marginal performance improvement over uninformed with RMSE=0.0106 and an R2=0.9986 while direct training on the ED weights.This indicates that the structure of the UNET enables it to learn something that allows it to achieve this outstanding performance.Although the informed UNET was provided information regarding the ED weights during training, its predictions of Keff are made based solely on the K grid.In other words, training with knowledge of the ED distribution mainly affects the internal weights of the UNET.The result is that the "uninformed" and "informed" versions of UNET exhibit similar predictive performance (indicating equally informative representations of the overall input-output mapping) while learning different internal representations of the mapping from gridded K to Keff.

Inferring ED weights with UNET
The performance of the informed UNET for inferring EC is illustrated for some example grids in Figure 8.The correspondence between the ED weights predicted by the informed UNET and the value calculated directly from the flow model shows low RMSE (0.0069) and high R2 (0.9549) and the ability of the UNET to infer the fraction of high energy cells is likewise good (RMSE=0.04876and R2=0.9832).However, there is still a considerable mismatch (Figure 9A).In particular, UNET consistently under-predicts the ED weights for cells that have very high actual weight, while consistent over-predicting the fraction of high energy cells for cases with intermediate percent high K (Figure 9B).From Figure 5b, these are the conditions that give rise to the most concentrated weighting.Taken together, these results suggest that the UNET has difficulty in inferring the ED weights when they are concentrated in highly localized areas (e.g.60% high K material in Figure 5b).

Discussion
Based on the results presented above, we focuse to discuss two issues.First, can DL learn relationships that can predict both the trend and grid-specific variation of Keff based on the binary grid of K? Second, can DL architecture be designed to infer patterns associated with ED weighting in the intermediate layers without providing that information during training?
Dependence of the ED Weighting Distribution on the K Field The Keff associated with binary grids shows a highly nonlinear dependence on the percentage of high K material (Figure 3).Specifically, Keff is closer to the arithmetic mean for materials with low to medium percentage of high K, while being approximately halfway between the arithmetic and harmonic means for materials with a higher percentage of high K.The variation in this trend is due to the influence of specific structural patterns in the spatial distribution of high and low K cells among grid realizations.The maximum degree of variability occurs for materials with intermediate percentages of high K values.In general, both the trend and the specific variations in Keff are very well explained by ED-weighted averaging (Figure 4).
Given that the energy dissipation weights carry information regarding the impact of structure on the effective conductivity of a binary K field, we examined the nature of this weighting as a function of the percentage of high K material present in the medium.Specifically, we defined the minimum area that contains 95% of all of the ED weight, and classified the cells within this region as being 'high energy cells'.
At high and low percent high K conditions, the medium is nearly homogeneous, but the energy is distributed over ~75% of the domain (Figure 6b).The ED weighting is more highly constricted, residing in a smaller number of high energy cells, for 60% high K material grids.The restricted high K areas centered around 60% high K material tend to form localized regions within which most of the energy dissipation occurs, indicating the influence of structures that force the flow to occur through regions of relatively low K, leading to high energy loss.However, as the percentage of high K increases to 80%, the high weight areas become concentrated in a small number of unconnected regions, suggesting a different structural mechanism whereby flow is forced through a small number of low K cells, rather than being channeled through a continuous structure.

Comparison of Performance
In summary (Table 1), performance improves when ED information is provided.In terms of RMSE and R 2 , Both inormed and uninformed UNET performed extremely well.So, the differences in performance are mainly due to their abilities to make case-specific use of structural (pattern) information, which manifests as variations in Keff at any given percentage of high K material (Figure 7).The performance was poorest when Keff values are low (Figure 7).The performance was also relatively poor for intermediate percentage levels of high K (Figure 10).That is, the methods had the most difficulty when localized structures act to impede flow, whether those structures are organized as a continuous region (intermediate high K percentage) or as isolated blocks of low K material (low Keff).

Hidden Layer Representation Analysis
The superior performance of the informed UNET is notable because it does not require that the flow problem be solved to make predictions for the testing set.Specifically, once trained with ED weight information (requiring solving the flow problem during testing and validation), the UNET algorithm uses the learned relationships to infer the values of the ED weights for the test samples and combines this with the K grid to infer Keff.
The performance of the uninformed UNET, for which ED weight information was never presented, so the flow problem never had to be solved, is comparable to that of the trained UNET.Given that the ED weights are thought to srepresent a key mechanism linking the K grid to the value of Keff, this raises the question of whether the uninformed UNET is somehow inferring information regarding the distribution of ED without being explicitly provided with such information during training.
For the informed UNET, the output layer of the lower branch, which is concatenated with the K grid before the final step of inferring Keff, represents the ED weight distribution.Examining the corresponding layer of the uninformed UNET shows no correlation with the true ED weights.However, a more advanced analysis, based on computing the centered kernel alignment similarity (CKA) [44], provides a more complete picture of the information flows through the informed and uninformed UNETs.These results are visualized as a similarity matrix (Figure 11).The output of each layer of the informed model is compared to other layers of the uninformed model to examine the degree of similarity between them while accounting for the presence of invertible linear transformations.A similarity value of zero between two layers indicates that their representations are not invertible linear transformations of each other while a similarity value of 1 indicates that the two layers are equivalent up to a linear transformation.
We first compared the results for the informed UNET with that of an untrained network with random initial weights and the same architecture (Figure 11a).The values on the diagonal (representing the same layer in the two networks) have high CKA similarity for the first three layers; this makes sense given that both networks are being fed the same inputs.However, the similarity begins to diminish beyond that point; they show very strong dissimilarity at the output layer, where the informed UNET is constrained to predict values that correspond to the ED weights.They also differ strongly at the final dense layer because the untrained network did a poor job of inferring Keff.
Comparing the informed and uninformed UNETs gave striking results (Figure 11b).Namely, layer similarity remains high for all layers except the output layer, where the informed UNET is required to predict values that correspond to the ED weights.Further, the final dense layer is also highly similar, reflecting the near-identical skill in predicting Keff achieved by both the informed and uninformed UNET.
In general, these results and patterns of similarity are consistent with the findings of [44], [50].They show that there can be many possible intermediate architectural solutions to achieve the same task, but that the representations learned for the layers closer to the inputs and the outputs tend to be similar.We interpret this to mean that the untrained UNET can "learn" some useful information that is related to the ED weights directly from the K grids.This information is not a direct map of actual ED weights.So, when required to produce such a map (training under-informed conditions), the UNET learns an intermediate relationship that can provide this map and to the user.It then uses the ED distribution to infer Keff.However, when not required to produce an ED map (training under uninformed conditions), the UNET does not develop a layer to translate the information to a user-readable ED map.Rather, the latent information about the ED weights propagates through the UNET, with an associated change in the final dense layer to produce high-quality inferences of Keff.The CKA analysis cannot uncover relationships between networks in the presence of invertible nonlinear transformations.To examine this, we sequentially swapped the weights of the uninformed UNET with those of the informed UNET.Specifically, at each step of this analysis (i.e., for each layer), we used the weights of the uninformed model for the preceding layers while maintaining the informed UNET weights for the succeeding layers.The results (Figure 12) are presented with the deepest layer at the top left, progressing along each row and then downward to the final layer at the bottom right.There are strong linear correlations between the observed Keff and that predicted with the 'swapped' network until the substitutions reach the conv2d_12 layer.This is consistent with the high CKA representation similarity to this layer (Figure 11).There is a strongly nonlinear relationship for conv2d_13, which corresponds with a low CKA value at this layer.In the final layer (i.e, output layer), we see a strong negative linear correlation between the output of the mixed structure model and that of the informed model.This pattern is consistent with the high CKA value observed in Figure 11 and suggests that an orthogonal transformation between the weights was necessary to overcome the changes applied in the deeper layers and recover the correct Keff values.This analysis suggests that both the informed and uninformed UNET are implementing similar computational processes, ostensibly extracting information corresponding to the ED distribution from the K grid, but representing it differently in n-d dimensional space.Further, that the user-imposed requirement to produce a readable ED map results in a nonlinear transformation that must be compensated in later layers to produce accurate inferred Keff values.

Conclusions
We have investigated the ability of DL algorithms to infer the effective hydraulic conductivity of binary K grids.DL methods were able to infer Keff with extremely high accuracy (R2 > 0.99) when provided with only the binary grid.
Relying on previous work that showed the value of energy dissipation weighting for understanding and inferring Keff, we examined whether providing such information improved the DL performance.While adding information derived from the ED distribution improved the performance of each algorithm, the improvement was similar to that realized by increasing the algorithmic complexity.The UNET architecture could be trained to infer the ED weighting from the K grid.This finding was supported by a similarity analysis of the hidden layers of UNETs with and without ED information provided.The accuracy of the inferred ED weights was lower when the energy dissipation weights were concentrated into small areas; i.e., the UNET was better able to infer the impacts of diffuse structures than highly localized structures.This finding may be due to the relatively small number of realizations that showed strong structural control in our sample set, suggesting that future work should examine this possibility.
While the UNET extracted the relevant ED weight information from the K grids, it only translated this information to a user-readable map if forced to do so.This may have other implications for the use of ML/DL techniques in subsurface hydrology.For example, ML/DL algorithms may be able to implicitly infer head distribution information 'naturally' if they are trained to predict streamflow; but the head distributions may not be available to the user unless the algorithms are specifically designed to produce them.This may be an important consideration if ML/DL algorithms are applied to models with multiple calibration data types or if the models will be used for multi-objective decision support.

Figure 1 .
Figure 1.Sample 25x25 cell grid with 50% high K (white) and 50% low K (grey) cells, constant head boundaries (blue), and no flow boundaries (diagonal lines).The left boundary has a constant head of 2 and right boundary has a constant head of 1, with flow occurring from left to right.

Figure2)
Figure2) Proposed U-net architecture.The architecture is composed of two submodel.Energy dissipation model has a UNET shape structure followed by a CNN model to map output of UNET to Keff.Blue box corresponds to a multi-channel feature map.The number of channels is denoted on top of the box.The x-y-size is provided at the lower left edge of the box.White boxes represent skipped connection.The arrows are operations performed on feature maps described in the legend.

Figure 3
Figure 3 Keff distribution as a function of percent high K for medium K contrast condition

Figure 4 )Figure 5 )
Figure 4) Keff estimation using the energy dissipation method

Figure 6 )
Figure 6) The energy dissipation pattern for different percent of high K materials.A: Grid samples and their corresponding energy dissipation weightings of high contributing cells as a function of percent of high K material.B: Average fraction of high energy dissipation cells as a function of the percent high K material, shown for definitions of ''high energy dissipation".C: relationship between high energy cells and Keff for different ranges of high k percentage.

Figure 7 )
Figure 7) the testing performance of Keff estimation using different methods.A: Keff estimation using the energy dissipation Uninformed UNET model.B: Keff estimation using Informed UNET model with pre-training on energy dissipation.

Figure 9 )
Figure 9 ) Performance of informed UNET model in energy dissipation estimation A: Energy dissipation weighting prediction for all grids.B: Fraction of high energy dissipation cells prediction performance as function of percent of high K material.

Figure 8 )
Figure 8) Samples of energy dissipation weight distributions prediction for different ranges of percent of high K material.Panel A: Observation.Panel B: Predicted values.

Figure10)
Figure10) Difference between inferred and actual fraction of high K cells for each grid.To compare the errors of grids at each high k percentage, the values of left y axis is scaled by average of actual number of high k cells at each k percentage.The fraction of high K cells for a 95% threshold is presented by blue line.

Figure 11 )
Figure 11) CKA similarity matrix between A) Informed Unet and untrained Unet b) Informed Unet and Uninformed Unet

Figure 12 )
Figure 12) Correlation between True Keff and the output of Unet model built up by sequential substitution of Informed model weights with Uninformed Unet collectively.Each subplots labeled with the corresponding layer name in DL model.

Table 1 )
Training, validation, and testing performance of all models