Ising Model for Interpolation of Spatial Data on Regular Grids

We apply the Ising model with nearest-neighbor correlations (INNC) to the problem of interpolating spatially correlated data on regular grids. The correlations are captured by short-range interactions between "Ising spins". The INNC algorithm can be used with label data (classification) as well as with discrete and continuous real-valued data (regression). In the regression problem, INNC approximates continuous variables by means of a user-specified number of classes. INNC predicts the class identity at unmeasured points by means of Monte Carlo simulation conditioned on the observed data (partial sample). The algorithm locally respects the sample values and globally aims to minimize the deviation between an energy measure of the partial sample and that of the entire grid. INNC is non-parametric and is, thus, suitable for non-Gaussian data. The method is found to be very competitive with standard methods in terms of interpolation accuracy and computational efficiency. Thus, it provides a useful tool for filling gaps in gridded data such as satellite images.


Introduction
The current availability of massive remotely sensed georeferenced datasets, pertaining to land cover, terrain elevation, population, meteorological variables, and atmospheric pollution creates increasing demands for efficient processing and analysis methods. The information contained in such Earth-observation data can help to develop reliable tools for ecosystem management, environmental policy, the design of real-time hazard warning systems, and various other decision-making tasks. However, the Earth-observation data typically require preprocessing before they can be used in standard analytic methods. A typical problem is data heterogeneity, i.e., the fact that data are acquired by different modalities, using different methodologies and space-time resolutions. Furthermore, data coverage is often incomplete due to different reasons, such as limited resources (material, human, and technical), equipment limitations (detection level and resolution), sensor malfunctions, or adverse meteorological conditions (observations hindered by clouds) [1,2].
Resolution differences between different sensors as well as data gaps create the missing data problem. In order to apply standard tools for the analysis of space-time Earth-observation data, there is a need to fill the gaps and to unify the resolution. These tasks involve downscaling (refining) data with sparse resolution and generating optimal estimates at points without measurements. The mathematical problem of gap filling is interpolation: Estimates of the variable under consideration need to be generated at the target point based on the available data in the vicinity of the target point. Depending on the nature of the modeled variable, interpolation involves either a classification problem (if the variable takes discrete labels) or a regression problem (if the variable is continuously valued).

Definition of the Interpolation Problem
Let us consider a set of sampling points G_s = {r_i}, where r_i = (x_i, y_i) ∈ R² and i = 1, …, N. These points are assumed to be distributed on a rectangular grid G̃ of size N_G = L_x × L_y, where L_x and L_y are, respectively, the horizontal and vertical dimensions of the rectangle (in terms of the unit length) and N < N_G. Let z_i be the value attributed to the point r_i. Then, the set Z_s = {z_i ∈ R}_{i=1}^{N} represents the sample of the process. Let G_p = {r_p}_{p=1}^{P} be the set of prediction points, such that G̃ = G_s ∪ G_p. For label data, Z_s takes values in a set of discrete labels {C_q}_{q=1}^{N_c}. For regression applications, Z_s can be represented as a realization of an underlying continuously valued random field Z(r_i). In the following, we discretize the continuous distribution using a number of classes C_q, q = 1, …, N_c. The classes are defined with respect to a set of threshold levels t_k, k = 1, …, N_c + 1, where t_1 = min(z_1, …, z_N) and t_{N_c+1} = max(z_1, …, z_N). Each class C_q corresponds to an interval as follows: C_1 = [t_1, t_2] and C_q = (t_q, t_{q+1}] for q = 2, …, N_c.
We define the indicator field I_Z(r_p) to take integer values q ∈ {1, …, N_c} equal to the appropriate class index for the value z(r_p). In particular, I_Z(r_p) = q implies that z(r_p) ∈ C_q (where C_q is a specific label in the classification problem or a specific interval in the regression problem). The interpolation problem can then be posed as a classification problem for both classification and regression applications, i.e., each point in G_p is assigned a class label. In order to estimate the class identity of Z at the prediction points, we use the well-studied Ising model. Once all the prediction points have been assigned to a class, a map of the process Z can be generated, consisting of equivalent-class (isolevel) contours.
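As a minimal illustration of this discretization, the following sketch maps continuous values to class indices q ∈ {1, …, N_c} using the threshold levels t_k; the function name `class_indices` is ours, not part of the original implementation.

```python
import numpy as np

def class_indices(z, thresholds):
    """Map continuous values to class indices 1..N_c.

    thresholds: sorted array [t_1, ..., t_{N_c+1}] with t_1 = min(z) and
    t_{N_c+1} = max(z).  Class C_q covers (t_q, t_{q+1}]; the minimum value
    (and anything at or below t_2) falls into class 1.
    """
    z = np.asarray(z)
    # np.digitize with right=True returns i such that bins[i-1] < z <= bins[i];
    # using the interior thresholds t_2..t_{N_c} and adding 1 yields the class q
    return np.digitize(z, np.asarray(thresholds)[1:-1], right=True) + 1
```

For example, with thresholds [0, 2, 4, 6, 8] (four classes), the value 2.5 falls into class 2 and the maximum value 8 into class 4.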

The Ising Model
For each class q, let us assume a set of variables {s_i^q}_{i=1}^{N} ("spins") that can take the value s_i^q = 1 ("spin-up") or s_i^q = −1 ("spin-down"). The Ising model considers pairwise interactions between the spins, expressed by the following Hamiltonian (for brevity, we drop the class index) [22]:

H[{s}] = − ∑_{⟨i,j⟩} J_{i,j} s_i s_j − ∑_i h_i s_i,    (1)

where ⟨i,j⟩ runs over pairs of sites and the symbol H[{s}] denotes that the energy is a function of the set of spin values (spin configuration).
In general, spin configurations that result in lower energy are more likely to be realized. The first term in the energy corresponds to the "spin-spin exchange" interaction energy. The coupling strength J_{i,j} controls the strength as well as the type of the interaction: if J_{i,j} > 0, it is "ferromagnetic" (it favors spins of the same sign), but if J_{i,j} < 0 it is "antiferromagnetic" (favoring spins of the opposite sign). The second term corresponds to a symmetry-breaking bias, which is caused by the presence of a site-dependent "external field" h_i. Positive (negative) values of the external field favor positive (negative) spins. Hence, h_i controls the overall distribution of the "spin" values between 1 and −1 (the magnetization). The coupling strength J_{i,j} is usually considered to be uniform, and its range is limited to nearest neighbors. However, the model can be generalized to also include non-uniform, longer-range couplings.
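A nearest-neighbor energy of this form can be sketched for a rectangular grid as follows; this is an illustrative implementation (free boundaries, each nearest-neighbor pair counted once, uniform J), not the authors' code.

```python
import numpy as np

def ising_energy(s, J=1.0, h=None):
    """Energy H[{s}] = -J * sum_<i,j> s_i s_j - sum_i h_i s_i for spins on a
    rectangular grid with nearest-neighbor interactions and free boundaries."""
    s = np.asarray(s, dtype=float)
    # horizontal and vertical nearest-neighbor pairs, each counted once
    exchange = -J * (np.sum(s[:, :-1] * s[:, 1:]) + np.sum(s[:-1, :] * s[1:, :]))
    field = 0.0 if h is None else -np.sum(np.asarray(h) * s)
    return exchange + field
```

For a 2 × 2 grid of aligned spins with J = 1 and h = 0, the four nearest-neighbor bonds give H = −4; flipping one spin raises the energy.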
The probability density function for a spin configuration {s} is given by the following Boltzmann-Gibbs exponential expression:

f({s}) = (1/Z) exp(−H[{s}] / k_B T),    (2)

where k_B is Boltzmann's constant and T is the temperature. The partition function Z is a normalization factor obtained by summing the exponential exp(−H[{s}]/k_B T) over all possible spin configurations. Hence, it is a function only of the model parameters J_{i,j} and h_i, not of a particular configuration.
In the forward problem, the coupling strength and the polarizing field are known, and one is interested in the most probable spin configurations or in the calculation of the spin correlation function. In the inverse problem, the "spins" at certain locations are known (they can be obtained from the sampled field values). The estimation process focuses on inferring the model parameters (e.g., by means of the maximum likelihood method) that best represent the observations. Unfortunately, the normalizing constant Z is in most cases intractable by analytical means, and its numerical evaluation is a computational bottleneck. Possible approaches to circumvent this problem, such as the maximum pseudo-likelihood approach [23] or various Markov Chain Monte Carlo estimation techniques [24], can be either very inaccurate or prohibitively slow, respectively.
Once the model parameters are determined, the optimal values of the "spins" at non-sampled locations (i.e., where the data gaps are), can be determined by maximizing the conditional (on the data) probability f (equivalently by minimizing H) with respect to the unknown values.
In order to circumvent the difficult problem of parameter estimation, we use a non-parametric method, explained below, that does not require knowledge of the Ising model's parameters.

Non-Parametric Nearest-Neighbor Model
In the following, we use the ideas motivating the Ising Hamiltonian (Equation (1)). In this study, we restrict the Ising model to the simplest energy functional:

H[{s}] = −J ∑_{⟨i,j⟩ ∈ N.NB.} s_i s_j.

We set the polarizing field uniformly to zero, i.e., h_i = 0 for i = 1, …, N, and limit the exchange interactions to a uniform "ferromagnetic" strength between nearest neighbors (N.NB.) only, i.e., J_{i,j} = J > 0 if i ∈ N.NB.(j) and J_{i,j} = 0 otherwise. The choice of zero polarizing field does not allow explicit control of the ratio of "up" versus "down" spins. As explained below, this is achieved in the simulations by selecting the initial "spin" values so as to reflect the "up-down" spin distribution of the sample.
If we are dealing with an interpolation problem that involves only two classes, the interpolation is performed in a single pass. The "data" Z_s = {z_i}_{i=1}^{N} are transformed into discrete variables ("spins"). If the problem involves multiple classes (resulting from distinct labels or from the discretization of a continuous variable), a hierarchical interpolation scheme is used. In this scheme, the sample, G_s^q, and prediction, G_p^q, subgrids are progressively updated as the class index q changes from 1 to N_c. For the lowest class, G_s^1 = G_s and G_p^1 = G_p, where G_s and G_p are the initially defined sampling and interpolation grids, respectively. For all classes q = 1, …, N_c, G_p^q ∪ G_s^q = G̃. As q increases, the sites with negative spins join the updated sample subgrid, and they are simultaneously removed from the prediction subgrid. At each level q, the discretization is binary with respect to the respective threshold value, i.e.,

s_i^q = −1 if z_i ≤ t_{q+1},  s_i^q = +1 if z_i > t_{q+1},  i = 1, …, N_q,

where N_q is the number of sites with known values at level q. For q > 1, the sample (prediction) subgrid is augmented (diminished) by the grid nodes r_l ∈ G_p for which s_l^{q−1} = −1. It follows that N_1 = N and that the set S_s^q = {s_i^q}_{i=1}^{N_q}, where q = 1, …, N_c, includes all the sample spin values for the class index q. The union of the two sets containing the sample and prediction values at level q, i.e., S^q = S_s^q ∪ S_p^q, contains the "spin" values over the entire grid G̃ for the specific level. The Ising model can then be used to represent spatial interactions between the spins S^q for level q, which means that the spins are defined with respect to the corresponding binarization threshold.
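The level-wise binarization can be sketched as below; the helper name `level_spins` is ours, assuming a 0-based threshold array t = [t_1, …, t_{N_c+1}] so that t[q] holds t_{q+1}.

```python
import numpy as np

def level_spins(z, thresholds, q):
    """Binary 'spins' at hierarchy level q (1-based): s = -1 where z <= t_{q+1},
    s = +1 otherwise.  Sites with s = -1 at level q belong to class q or lower
    and join the sample subgrid for level q + 1."""
    z = np.asarray(z)
    return np.where(z <= thresholds[q], -1, 1)  # thresholds[q] is t_{q+1}
```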
The hierarchical scheme outlined above avoids the parameter inference problem and suggests a non-parametric approach. This approach utilizes a cost function, U(S_p^q | S_s^q), that measures the squared deviation between a suitably normalized energy, C_s^q, of the sample configuration at level q and the respective energy, C^q, of the spin configuration over the entire grid:

U(S_p^q | S_s^q) = [C^q − C_s^q]²,

where C_s^q and C^q denote the correlation energy per nearest-neighbor "spin" pair of the sample and of the entire grid, respectively.

[Algorithm 1, pseudocode (final part shown): after the optimization at each level q, every prediction point r_p with spin −1 is assigned the class index, Î_Z(r_p) ← q, and the counter n_q of assigned sites is incremented; the number of prediction sites is then updated as P_{q+1} ← P_q − n_q before the class index is increased. After the last level, any sites still unassigned (Î_Z(r_n) = NaN) are set to the top class N_c.]
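A minimal sketch of this cost function, under the assumption that the normalized energy is the mean spin-spin product over nearest-neighbor pairs (the function names are ours):

```python
import numpy as np

def corr_energy(s, mask=None):
    """Normalized correlation energy: mean of s_i * s_j over nearest-neighbor
    pairs.  If mask is given, only pairs with both sites inside the mask count
    (e.g., the sample subgrid)."""
    s = np.asarray(s, dtype=float)
    if mask is None:
        mask = np.ones_like(s, dtype=bool)
    pairs, total = 0.0, 0
    for a, b, ma, mb in [
        (s[:, :-1], s[:, 1:], mask[:, :-1], mask[:, 1:]),   # horizontal pairs
        (s[:-1, :], s[1:, :], mask[:-1, :], mask[1:, :]),   # vertical pairs
    ]:
        valid = ma & mb
        pairs += np.sum(a[valid] * b[valid])
        total += np.sum(valid)
    return pairs / total

def cost(s_full, sample_mask):
    """U = (C_grid - C_sample)^2: squared deviation between the correlation
    energy of the full grid and that of the sample configuration."""
    return (corr_energy(s_full) - corr_energy(s_full, sample_mask)) ** 2
```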

Hierarchical Strategy
The hierarchical algorithm starts at the lowest binarization threshold and proceeds by increasing the class index. The binary discretization and the classification of the non-measured sites are initially performed with respect to the first class and then repeated sequentially for the remaining classes. The "gaps" in the prediction subgrid, G_p, are gradually filled as the algorithm proceeds through consecutive levels. At each level, all the locations identified as having −1 spin values at the lower levels are used as input (sample data) in the current stage. The reduced prediction subgrid, G_p^q, for the class index q contains P_q points, so that for q′ > q it holds that P_{q′} ≤ P_q and P_1 = P. In the case of continuous variables, the classes C_q can be defined as desired and do not need to represent intervals of uniform size.
The INNC algorithm uses the rejection ratio, which is defined as follows:

ρ = (number of rejected states) / (number of simulated states).
The rejection ratio is constantly updated and is used to control when the algorithm should stop proposing new states and move on to the next class level q.
The main steps of the INNC method are shown in the pseudocode of Algorithm 1. This algorithm returns an indicator field Î_Z = I_Z(G_s) ∪ Î_Z(G_p), which consists of the original sample classes and the class estimates at G_p. The indicator values at the sampling sites are exactly reproduced because the initial state respects these values and the iterative steps skip over sites in the updated sample set S_s^q. Below, we refer to I_Z(G_s) as the training set.
Note that Algorithm 1 is presented in non-vectorized form, but the generation of new states (line 12) is actually realized using vectorized single-spin Metropolis updating. The vectorization is enabled by the fact that the square grid can be divided into two interpenetrating subgrids in a checkerboard fashion (checkerboard decomposition). Hence, given the short-range character of the interaction, restricted to nearest neighbors, the spins in the first subgrid interact only with spins of the second subgrid and vice versa. By means of this decomposition, it is possible to apply the updating algorithm to spins belonging to the same subgrid in parallel. The algorithm sweeps through the lattice several times until the rejection ratio reaches the threshold value (herein, it is set to one).
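The checkerboard idea can be sketched as follows. For illustration only, the acceptance rule below lowers the nearest-neighbor exchange energy (with J = 1) rather than the INNC cost function U, and the function name is ours.

```python
import numpy as np

def checkerboard_sweep(s, updatable):
    """One zero-temperature sweep using the checkerboard decomposition.

    Spins on the 'black' sublattice interact only with 'white' neighbors (and
    vice versa), so all spins of one color can be updated in parallel.  A flip
    is accepted if it lowers the nearest-neighbor exchange energy.  `updatable`
    marks prediction sites; sample spins are never touched.  Returns the number
    of accepted flips."""
    Lx, Ly = s.shape
    color = np.indices((Lx, Ly)).sum(axis=0) % 2  # 0 = black, 1 = white
    accepted = 0
    for c in (0, 1):
        # sum of nearest-neighbor spins with free boundaries
        nb = np.zeros_like(s, dtype=float)
        nb[1:, :] += s[:-1, :]
        nb[:-1, :] += s[1:, :]
        nb[:, 1:] += s[:, :-1]
        nb[:, :-1] += s[:, 1:]
        # flipping s_i changes the energy by dE = 2 * s_i * nb_i (J = 1)
        flip = (color == c) & updatable & (2 * s * nb < 0)
        s[flip] *= -1
        accepted += int(np.sum(flip))
    return accepted
```

A minority spin surrounded by aligned neighbors is flipped in the pass that visits its sublattice, while aligned spins are left untouched.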
Spin updating is performed at zero temperature. The T = 0 constraint means that there is no stochastic acceptance of unfavorable spin flips. Hence, candidate "spins" are flipped only if the flip lowers the cost function. This is called a "greedy" Monte Carlo algorithm [25], and it guarantees convergence, which is usually very fast. In comparison, in simulated annealing, T is slowly lowered starting from an initial high-temperature state. This approach is much slower computationally, but the resulting configuration is less sensitive to the initial state Ŝ^(0). The sensitivity of the greedy algorithm is known to be especially pronounced in high-dimensional spaces with non-convex energies. In such cases, the greedy algorithm is more likely to become stuck in local minima instead of converging to the global minimum. However, this is not a concern for the interpolation problem. In fact, targeting the global minimum of the cost function U would strongly emphasize the sample correlation energy per "spin" pair, C_s^q, ignoring the fact that the latter is influenced by sample-to-sample fluctuations.
The initial configuration can be selected by a number of methods. Since the proposed model aims to provide a fast and automatic interpolation method, the initial configuration should minimize the relaxation path (in state space) to equilibrium. It should also be selected preferably with little or no user intervention. Assuming a certain degree of spatial continuity, which is common in geospatial data sets, the initial state Ŝ_p^{q(0)} is determined based on the "augmented sample" states S_s^q in the immediate neighborhood of each individual prediction point. The neighborhood of such a node r_p is determined by an adaptable m × m stencil (where m = 2l + 1) centered at r_p. The stencil size m ≤ m_max is adaptively determined, reflecting the local sampling density and spin value distribution. Starting from an initial value of m = 3, we test whether a clear majority of either positive or negative spin values is established within the stencil. If this is not the case, we increase m to the next odd value, test again, and repeat as necessary. An upper bound m_max is imposed on the stencil size to prevent oversmoothing and to restrict the computational load (memory and CPU time). Then, ŝ_p^{q(0)} is assigned by majority rule, based on the prevailing value of its neighbors in S_s^q inside the stencil. If there is no prevailing sign (i.e., if an equal number of +1 and −1 values are present or if r_p has no neighbors in S_s^q inside the stencil), the initial value is assigned randomly.
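A sketch of the adaptive-stencil majority-rule initialization; the function name and the 0-for-unknown encoding are our assumptions, not the authors' implementation.

```python
import numpy as np

def initial_spin(known, r, m_max=5, rng=None):
    """Initial spin at prediction node r by majority rule over an adaptive
    m x m stencil (m = 3, 5, ..., m_max) of known 'sample' spins.

    `known` holds +1 / -1 at sample sites and 0 at prediction sites, so
    unknown sites do not contribute to the majority count.  If no clear
    majority emerges up to m_max, the value is assigned at random."""
    rng = rng or np.random.default_rng()
    i, j = r
    for l in range(1, (m_max - 1) // 2 + 1):  # stencil half-width; m = 2l + 1
        window = known[max(i - l, 0):i + l + 1, max(j - l, 0):j + l + 1]
        total = np.sum(window)  # sign of the sum gives the prevailing spin
        if total != 0:
            return 1 if total > 0 else -1
    return int(rng.choice([-1, 1]))  # tie or empty neighborhood: random
```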
The proposed INNC updates are accepted unconditionally as long as they lower the cost function of Equation (3). Using the vectorized checkerboard algorithm, the entire grid is swept in two steps. The simulation terminates for a given class index q if one complete sweep through the interpolation subgrid G_p^q does not produce a single successful update. The hierarchical scheme implies that the computational load is reduced with increasing q, in line with the shrinking size of the subgrid G_p^q. The input information required by the algorithm thus involves the definition of the class intervals and the maximum stencil size m_max used to generate the initial state. The number of classes depends on the nature of the problem and the objective of the study: If the interpolation problem involves discrete class labels, the number of labels is predetermined and no discretization is needed. If the interpolation problem involves a continuous-valued process, the discretization depends on the objective of the study. If the goal is to determine exceedance levels, binary classification is sufficient. For environmental monitoring and decision-making purposes, a moderate number (e.g., six or eight) of classes is often sufficient. For example, the Fire Weather Index used to measure fire risk in Europe is mapped into six classes (very low, low, medium, high, very high, and extreme) [26]. However, a higher number of classes can be used if one desires to resolve the values of the modeled process in finer detail.

Data Description
The performance of the INNC interpolation method is demonstrated with two environmental data sets as well as with synthetic (simulated) data. The first set represents a map of soil quality data categorized in different classes. This data set contains a finite number of discrete levels; thus, the prediction of missing data is inherently a classification problem.
The second data set represents a map of surface elevation; the latter is a continuously valued variable. For the purpose of generating an isolevel elevation map corresponding to some predefined resolution, the elevation data should be discretized according to the desired resolution to allow applying INNC.
The above two environmental data sets are used to assess the classification and regression performance of INNC for gap filling. Both data sets exhibit a skewed non-Gaussian probability distribution, which allows testing the ability of INNC to operate under non-Gaussian conditions. Finally, we generate synthetic sets of spatially correlated data of different sizes. These enable us to assess the computational complexity and ability of INNC to automatically fill gaps in very large data sets, such as remote sensing images.

Soil Quality
This data set describes soil quality for crop production over a major part of Europe and is obtained from the Harmonized World Soil Database. The latter is a 30 arc second raster database with over 16,000 different soil mapping units that combines existing regional and national updates of soil information worldwide with the information contained within the 1:5,000,000 scale FAO-UNESCO Soil Map of the World [Soil data2008].
Our chosen data represent the soil nutrient availability segregated in seven classes according to the degree of constraints imposed on soil quality (1: no or slight; 2: moderate; 3: severe; 4: very severe; 5: mainly no soil; 6: permafrost; 7: water). In this classification, lower numbers correspond to "better" soil quality. The spatial domain considered is a rectangle of 120 × 85 pixels. Some summary statistics are as follows: size N_G = 10,200, z_min = 1, z_max = 7, z̄ = 1.785, z_0.50 = 2, and σ_z = 0.984. The skewness coefficient is 2.02 and the kurtosis coefficient is 9.632. The frequency histogram of soil quality class values is presented in Section 7.1.

Surface Elevation
This data set represents the surface elevation on a 5 min latitude/longitude grid over part of the territory of North America (approximately 80°-110°W, 55°-40°N). The data form a rectangle comprising 400 × 200 pixels [Surface data1988]. Elevation values (in meters) are referenced to the center of each cell with a resolution of 1 m.
Some summary statistics are as follows: size N_G = 80,000, z_min = 1 m, z_max = 3790 m, z̄ = 774.41 m, z_0.50 = 441 m, and σ_z = 713.17 m. The skewness coefficient is equal to 1.37, and the kurtosis coefficient is equal to 4.07. As evidenced from the above statistics, the data are non-Gaussian and positively skewed. The elevation frequency histograms corresponding to the respective class intervals considered in the study are presented in Sections 7.2-7.4.

Synthetic Data
The synthetic data are simulated from the joint Gaussian distribution with mean m = 50 and standard deviation equal to σ = 10, i.e., Z ∼ N(m = 50, σ = 10). The spatial correlations are imposed by means of the exponential covariance C(r) = σ 2 exp(−r/ξ), where ξ = 5 and r represents the Euclidean distance between any two grid nodes. The exponential covariance function implies that the spatial process is relatively rough and, thus, is appropriate for modeling, e.g., soil processes.
The data are generated on square grids with L nodes per side, where L = 32, . . . , 2048 using the spectral simulation method [4,29,30]. The largest grid size examined is typical of data sets collected by various remote sensing techniques.
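A rough sketch of an FFT-based spectral simulation of such a field follows. The spectral density used is the standard two-dimensional exponential-covariance model, S(k) ∝ (1 + ξ²k²)^(−3/2), and the empirical normalization at the end is our simplification, not necessarily the exact procedure of [4,29,30].

```python
import numpy as np

def gaussian_field(L, xi=5.0, mean=50.0, sigma=10.0, seed=0):
    """Approximate spectral simulation of a Gaussian random field with an
    exponential covariance C(r) = sigma^2 exp(-r / xi) on an L x L grid.

    White noise is filtered in Fourier space with the square root of the 2-D
    exponential-covariance spectral density, then rescaled empirically."""
    rng = np.random.default_rng(seed)
    k = 2 * np.pi * np.fft.fftfreq(L)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    spec = (1.0 + xi ** 2 * (kx ** 2 + ky ** 2)) ** (-1.5)  # 2-D exponential model
    noise = np.fft.fft2(rng.standard_normal((L, L)))
    field = np.real(np.fft.ifft2(noise * np.sqrt(spec)))
    field = (field - field.mean()) / field.std()  # normalize empirically
    return mean + sigma * field
```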

Missing Data Simulation and INNC Performance Assessment
In order to generate data sets with missing data (gaps), we follow the methodology described below. From each complete data set, we generate a partial sample Z_s of size N = (1 − p)N_G by randomly removing P = pN_G nodes. The removed values are set aside for validation purposes. For three different degrees of thinning, p = 0.33, 0.5, and 0.66, we generate 100 different partial sample configurations. These differ from each other with respect to the set of grid nodes that have been removed. The values of the process at these validation nodes are then estimated by using the INNC interpolation Algorithm 1.
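The thinning step can be sketched as follows (the function name is ours):

```python
import numpy as np

def thin_grid(z, p, seed=0):
    """Randomly remove a fraction p of the N_G grid nodes.  Returns the sample
    mask (True at retained nodes) and the flat indices of the removed
    (validation) nodes."""
    rng = np.random.default_rng(seed)
    n_total = z.size
    removed = rng.choice(n_total, size=int(round(p * n_total)), replace=False)
    mask = np.ones(n_total, dtype=bool)
    mask[removed] = False
    return mask.reshape(z.shape), removed
```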
In order to assess INNC performance, the estimated values at the validation nodes are compared with the true values (which were removed from the respective sample). In classification performance evaluation, the indicator values I_Z(G_p) at the validation nodes are compared with the estimates Î_Z(G_p), obtained after removing the set of nodes G_p from the data. The originally discrete data (soil quality) are used without further processing.
To test the interpolation performance for continuously valued data (surface elevation and synthetic), we first discretize the data according to the desired resolution. We use different resolutions and respective class intervals. For the surface elevation data set, a resolution of 500 m is used first, which segregates the data into N_c = 8 classes with C_q = [500(q − 1), 500q) m, q = 1, …, 8. Second, a finer resolution of 250 m is used, resulting in N_c = 15 classes corresponding to the intervals C_q = [250(q − 1), 250q) m, q = 1, …, 15. Finally, we gradually increase the resolution up to N_c = 100 classes in order to test the interpolation performance for data with almost continuous variations.
In the case of synthetic data, we arbitrarily discretize the entire range of observed values into N c = 8 classes and test the interpolation performance with increasing domain size. This design aims to study the scaling of INNC computational complexity with size. The INNC algorithm is applied in all cases with a maximum stencil size m max = 5.
The measure that we use to assess the interpolation performance of the INNC algorithm in the case of classification is the misclassification rate, i.e., the fraction of misclassified pixels, defined as follows:

F = (1/P) ∑_{p=1}^{P} [1 − δ(I_Z(r_p), Î_Z(r_p))],

where P is the number of validation points, I_Z(r_p) is the true value at the validation point, Î_Z(r_p) is the INNC estimate, and δ(·,·) is the Kronecker delta: δ(I, I′) = 1 if I = I′ and δ(I, I′) = 0 if I ≠ I′.
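A direct sketch of the misclassification rate:

```python
import numpy as np

def misclassification_rate(true_classes, est_classes):
    """F = (1/P) * sum_p [1 - delta(I_Z(r_p), I_hat_Z(r_p))]: the fraction of
    validation points whose estimated class differs from the true class."""
    true_classes = np.asarray(true_classes)
    est_classes = np.asarray(est_classes)
    return float(np.mean(true_classes != est_classes))
```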
In the case of a large number of classes, the root mean square error (RMSE) is a more suitable measure for evaluating INNC interpolation performance. As will be shown in the following section, the RMSE typically decreases with increasing N_c up to some threshold N_c*, beyond which it stabilizes and becomes independent of N_c. The value of N_c* depends on the data set under consideration, but it appears to decrease with sample sparseness. The RMSE is defined as follows:

RMSE = √[ (1/P) ∑_{p=1}^{P} (Z(r_p) − Ẑ(r_p))² ],

where Z(r_p) is the original true value at the point r_p, and Ẑ(r_p) is the estimate of the continuously valued field. This estimate is obtained from the classification Î_Z(r_p) and a subsequent back-transformation from the indicator scale to the original continuum scale.
The back-transformation assigns each estimate the midpoint of its class interval:

Ẑ(r_p) = t_{Î_Z(r_p)} + (1/2) [ t_{Î_Z(r_p)+1} − t_{Î_Z(r_p)} ].

In Section 7, we use the RMSE to assess the interpolation of the surface elevation data. Average values of the misclassification rate and the RMSE, respectively, are obtained from ensembles of different missing-data realizations with the same degree of thinning.
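The midpoint back-transformation and the RMSE can be sketched together (the function names are ours; a 0-based threshold array holds t_1, …, t_{N_c+1}):

```python
import numpy as np

def back_transform(class_idx, thresholds):
    """Map a 1-based class index q to the midpoint of its interval:
    Z_hat = t_q + (t_{q+1} - t_q) / 2, with thresholds stored 0-based."""
    t = np.asarray(thresholds, dtype=float)
    q = np.asarray(class_idx)
    return t[q - 1] + 0.5 * (t[q] - t[q - 1])

def rmse(z_true, z_est):
    """RMSE = sqrt((1/P) * sum_p (Z(r_p) - Z_hat(r_p))^2)."""
    z_true = np.asarray(z_true, dtype=float)
    z_est = np.asarray(z_est, dtype=float)
    return float(np.sqrt(np.mean((z_true - z_est) ** 2)))
```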
The computations are performed in the Matlab® programming environment on a desktop computer with 16 GB of RAM and an Intel® Core™ i7-4790 processor with a 3.60 GHz clock.

Soil Quality
The map and the histogram of the complete data are shown in Figure 1. It is evident in these plots that the first three (1-3) classes clearly dominate over the remaining ones, covering almost 97% of the spatial domain. Figure 1b, in addition to the histogram of the class values for the complete data (left bar), also includes the histograms of the reconstructions by means of INNC classification for the three degrees of thinning (p = 0.33, 0.5, and 0.66). These histograms are shown by the respective bars 2-4 (moving from left to right) in Figure 1b.
The histograms (bars 2-4 in Figure 1b) display mean values obtained from 100 realizations (missing-data configurations for a given p). The match between the distributions of the original and the reconstructed data deteriorates with increasing p. In particular, the second class is overestimated, mainly at the cost of the third class. Note that the second class is the closest to the mean value (calculated as the sum of the class values, each multiplied by the respective probability). Hence, the overestimation of the second class can be attributed to the averaging effect, which is common in interpolation methods. Figure 2 presents the reconstructed maps based on INNC. These maps are obtained from a single realization (the first from the ensemble of one hundred). As already suggested by the histograms in Figure 1b, the visual agreement between the original map (shown in Figure 1a) and the reconstructions deteriorates with increasing sparsity of the sampling subgrid. For example, for p = 0.66, the most apparent misclassification is observed in the sixth class (permafrost, shown in orange color). This class appears in the complete data in very small and disconnected clusters, which are surrounded by bigger clusters that belong to different classes. Therefore, permafrost clusters can be viewed as "hot spots", which are difficult to predict, particularly because sampling points in this class are sparse. Misclassification also occurs along the borders between different classes. Note that the values on the sampling subgrid G_s (which varies between realizations) do not contribute to misclassification, since INNC by construction honors the values on G_s. Table 1 lists quantitative measures of the INNC classification performance based on statistics calculated over the ensemble of 100 realizations. These include the mean misclassification rate F*, the standard deviation STD_F* of the misclassification rate, and the CPU time T_cpu.
The above are complemented by measures intrinsic to the current method, such as the mean number of Monte Carlo steps, N_MC, required to optimize the cost function and the mean cost function U* at termination. The averaging of N_MC and T_cpu is performed over individual realizations for the cumulative values over all the class levels. The averaging of the cost function U* is performed over both the ensemble of realizations and all class levels of each realization.
To validate the classification ability and computational performance of INNC, we compare it with the commonly used fuzzy k-nearest neighbor (FKNN) classification algorithm [31], implemented in the Matlab function fknn [32]. We chose the FKNN method because it has been shown to dominate its non-fuzzy counterpart in terms of lower error rates and also to compare well with other, more sophisticated standard classification methods. At the same time, it is still relatively simple and computationally efficient enough to process larger data sets (it would be computationally infeasible to perform the analysis presented below using some more sophisticated classification methods, such as Support Vector Machines). As expected, the misclassification rate increases with p for both methods. However, INNC exhibits superior performance: the INNC misclassification rate is lower than that of the FKNN method. As for the remaining INNC measures, one can notice a slight increase in N_MC with p. However, the total number of steps remains very low, and the difference between the smallest value for p = 0.33 and the largest value for p = 0.66 does not exceed one MC sweep. There is also some increase in U* with p, but all the values obtained reflect a satisfactory level of convergence to the optimum. We note that even though the greedy algorithm does not pursue global minima, the values of the cost function are quite close to zero.

Surface Elevation: Resolution 500 m-8 Classes
The isolevel map and the histogram of the complete data, discretized according to the vector of thresholds corresponding to this resolution, are shown in Figure 3. In the map, the elevations in the range 0 ≤ Z < 500 m (first class) dominate, covering about 55% of the area, while those above 3500 m correspond to only 0.1%. Figure 3b presents the histogram of the class values for the complete data (left histogram bar) as well as the histograms of the INNC reconstructions for the three degrees of thinning. The histograms of the reconstructions based on the training sets with p = 0.33, 0.5, and 0.66 are shown by bars 2-4 (moving from left to right). The histograms represent average values obtained from 100 realizations. The match between the probability distributions of the complete data and the reconstructions is excellent. As mentioned above, this was achieved without explicit control by means of an external field, i.e., by using h_i = 0 in Equation (1). Figure 4 helps to visualize the interpolation results in terms of reconstructed maps. The isolevel maps are obtained from a single realization (the first from the set of one hundred). We observe that the reconstructed maps provide a close visual match to the original map, shown in Figure 3a. This is the case not only at lower p; even at p = 0.66, the spatial patterns of the original map are reconstructed surprisingly well. The first part of Table 2 presents the quantitative measures of INNC interpolation performance along with the measures obtained by means of FKNN. As expected, the misclassification rates increase with p for both INNC and FKNN. However, the INNC misclassification rates are again much lower than those of the FKNN method, i.e., F*(INNC)/F*(FKNN) = 0.74, 0.75, and 0.78 for p = 0.33, 0.50, and 0.66, respectively. However, in comparison with the soil quality data, the relative differences between the methods are smaller.
On the other hand, the computational efficiency of INNC with respect to FKNN is even higher, i.e., T cpu (INNC) / T cpu (FKNN) = 0.004, 0.004, and 0.005 for p = 0.33, 0.50, and 0.66, respectively.
The values of N MC are slightly higher than for the soil quality data, and their increase with p is more apparent. While the overall increase in N MC for the surface elevation data can be ascribed to the increased size of the data set, the increase in N MC with p can generally be ascribed to the fact that N MC is a measure of the "spin" system's relaxation time.
At the same time, increasing p translates into a higher P and, thus, a larger state space of size 2 P . Since the number of prediction nodes, P q , decreases with q due to the progressive filling of gaps by the INNC hierarchical scheme, the Metropolis sampler tends to speed up as q increases. The relaxation time is further shortened by a proper choice of the initial state.
There are interlevel differences in the value of U(S q p |S q s ), but their magnitudes remain relatively small. For example, even at p = 0.66, which results in the highest values of the cost function, max(U * ) ≤ 10 −3 . The average CPU time needed for the optimization at any p is on the order of a fraction of a second. The very low values of N MC and T cpu are also due to the vectorized implementation of spin updating using the checkerboard algorithm.
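A vectorized checkerboard sweep with greedy (downhill-only) acceptance can be sketched as follows. This is an illustrative Python/NumPy sketch under simplifying assumptions (binary ±1 spins rather than the multi-class case, unit coupling J = 1, and periodic boundaries), not the authors' Matlab implementation:

```python
import numpy as np

def greedy_checkerboard_sweep(S, fixed):
    """One greedy sweep over an Ising grid S (values ±1).
    'fixed' marks sample nodes that keep their observed values.
    Each sublattice (parity) is updated simultaneously, which is
    valid because same-parity spins are not nearest neighbors."""
    L = S.shape[0]
    ii, jj = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    for parity in (0, 1):
        mask = ((ii + jj) % 2 == parity) & ~fixed
        # sum of the four nearest neighbors (periodic boundaries assumed)
        nb = (np.roll(S, 1, 0) + np.roll(S, -1, 0) +
              np.roll(S, 1, 1) + np.roll(S, -1, 1))
        dE = 2.0 * S * nb                       # energy change of a flip
        S = np.where(mask & (dE < 0), -S, S)    # greedy: downhill flips only
    return S
```

Because each half-sweep updates an entire sublattice with array operations, the cost per sweep is O(L^2) with small constants, in line with the very low T cpu values reported above.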

Surface Elevation: Resolution 250 m, 15 Classes
Next, we repeat the classification experiment using a resolution of 250 m. The isolevel map in Figure 5a clearly has higher resolution than the eight-level map in Figure 3a. The most and least represented classes are the second and the last, which contain approximately 34% and 0.1% of the values, respectively.
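The discretization of continuous elevations by a threshold vector can be sketched as follows. The equally spaced 250 m thresholds below are our assumption for illustration; the paper's actual threshold vector may differ:

```python
import numpy as np

# Hypothetical threshold vector for the 250 m resolution (15 classes):
# 250, 500, ..., 3500 m, i.e., 14 thresholds separating 15 classes.
thresholds = np.arange(250, 3750, 250)

def discretize(z, thresholds):
    """Map continuous elevations to class indices 1..N_c,
    where class 1 covers z below the first threshold."""
    return np.digitize(z, thresholds) + 1

z = np.array([120.0, 830.0, 3600.0])
classes = discretize(z, thresholds)   # -> classes 1, 4, 15
```

The same function covers the 500 m / 8-class case by passing a coarser threshold vector.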
Due to the higher resolution, an increase in the misclassification rate is expected. Nevertheless, as is evident in Figures 5b and 6, both the class distributions and the visual patterns are recovered quite well by the reconstructions in all cases.
A comparison of the numerical values of F * shows that increasing the number of classes from 8 to 15 almost doubles the misclassification rates (see the second part of Table 2). Nevertheless, the ratio of the misclassification rates obtained by means of the INNC and

Surface Elevation: Increasing Resolution-Crossover to Continuous Interpolation
It is interesting to investigate the interpolation performance of the INNC method for a gradually increasing number of levels. For sufficiently large values of N c , it makes more sense to evaluate prediction performance in terms of prediction errors, such as the RMSE defined in Equation (6). Then, INNC can be compared with a standard interpolation method. For this purpose, we used the inverse distance weighted (IDW) method [33], implemented in the Matlab function fillnans [34]. The IDW parameters were as follows: power = 2.7 and an unlimited search radius.
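For clarity, the IDW baseline can be sketched as follows. This is a minimal Python sketch with the stated parameters (power = 2.7, unlimited search radius), not the fillnans code itself:

```python
import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2.7):
    """Inverse distance weighted interpolation: every sample point
    contributes to every prediction, weighted by 1/d**power."""
    out = np.empty(len(query_xy))
    for n, q in enumerate(query_xy):
        d = np.linalg.norm(sample_xy - q, axis=1)
        if np.any(d == 0):                  # query coincides with a sample
            out[n] = sample_z[d == 0][0]
            continue
        w = 1.0 / d ** power
        out[n] = np.dot(w, sample_z) / w.sum()
    return out
```

Note that each prediction uses the entire sample, which explains both the global smoothing behavior of IDW and its higher CPU cost relative to the local INNC updates.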
In Figure 7 we present the evolution of the RMSE of INNC (blue circles) with an increasing number of levels and compare it with the RMSE of IDW (red line) for different degrees of thinning p. In all cases, one can observe a gradual decrease in RMSE with increasing N c up to a certain threshold value N * c , beyond which the RMSE levels off. This threshold appears to decrease with increasing p: it corresponds to N * c ≈ 50 for p = 0.33, N * c ≈ 30 for p = 0.50, and N * c ≈ 20 for p = 0.66. In comparison with the IDW method, for p = 0.33, the RMSE of INNC reaches the IDW value of 81.13 ± 0.60 at N c ≈ 25 and, beyond N * c ≈ 50, levels off at the value 72.88 ± 0.92, where the errors represent one standard deviation. The range of RMSE estimates is based on the ensemble of 100 realizations. However, for larger p, the relative superiority of the INNC method seems to diminish. For p = 0.50, the optimal RMSE of the INNC method, 88.55 ± 1.03, achieved for N c > 30, is comparable with the IDW value of 87.68 ± 0.81; for p = 0.66, the IDW method is clearly superior, with an RMSE of 95.86 ± 1.26 versus the optimal INNC value of 108.29 ± 6.26 beyond N * c ≈ 20. In addition, we observed that, for all values of p, the dispersion of the RMSE obtained by INNC is greater than that for IDW. Both of these patterns can be attributed to the fact that IDW uses information from the entire sample at each prediction node. This results in improved estimates compared to INNC, especially for sparser data sets (higher p). The improved performance of IDW compared to INNC for higher p is also due to the spatial patterns of the elevation data, which exhibit spatial correlations that extend over a large portion of the grid. A different data set with less spatial continuity would be more favorable for INNC.
The insets in the respective panels of Figure 7 represent the computational efficiency of the two methods. They show the evolution of the CPU time of the INNC method (blue circles) with increasing N c and compare it with that of the IDW method (red line). Since the CPU time of INNC is very small and increases only linearly with the number of levels (note the semi-log scale), one can conclude that, even for N c ≫ N * c , the INNC CPU times are on average about two orders of magnitude smaller than the CPU time of the IDW method. This behavior reflects the fact that INNC is a local method, while IDW takes into account all the data on the sample subgrid for the interpolation at each prediction node.

Synthetic Data: Scaling with Data Size
Finally, we study the performance of the INNC method on increasing grid sizes N G = L × L, with L = 2 n and n = 5, . . . , 11. We use the above-described spatially correlated synthetic data discretized to obtain N c = 8 levels. For illustration, in Figure 8 we show the original data set for the selected size L = 256 after discretization, along with the reconstructions for the thinning values p = 0.33, 0.5, and 0.66. The results showing both the interpolation and computational performance for different values of L and p are presented in Figure 9. In particular, Figure 9a shows that the misclassification rates gradually decrease with increasing grid size, from initial values of F * = (0.46 ± 0.03, 0.49 ± 0.03, 0.54 ± 0.03) for p = (0.33, 0.5, 0.66) and L = 32 down to F * = (0.31 ± 0.003, 0.33 ± 0.004, 0.36 ± 0.006) for p = (0.33, 0.5, 0.66) and L = 2048. The decrease is less steep at larger L, but it nevertheless continues for all sizes up to L = 2048. Figure 9b shows the behavior of the corresponding CPU time versus grid size on a log-log plot. The plots indicate an almost linear increase with the grid size N G = L 2 (the actual fits produce slightly superlinear scaling with an exponent of approximately 1.05). As already observed in the previous cases, the CPU time also slightly increases with p.
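The misclassification rate F * used throughout these experiments can be computed as follows (a short sketch with our own variable names; the paper averages this quantity over 100 realizations):

```python
import numpy as np

def misclassification_rate(true_classes, predicted, prediction_mask):
    """F*: fraction of prediction (gap) nodes whose reconstructed
    class label differs from the true class label. Averaging over
    realizations is left to the caller."""
    t = true_classes[prediction_mask]
    p = predicted[prediction_mask]
    return float(np.mean(t != p))
```

Only the thinned-out (prediction) nodes enter the rate; the sample nodes are fixed by construction and would otherwise bias F * toward zero.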

Conclusions
We investigated the INNC interpolation method, which can be used to fill gaps in gridded spatial data. The latter can represent processes that take either discrete class labels or real values, whether discrete or continuous. We showed that INNC is suitable for automatic mapping of large spatial data sets and demonstrated that its interpolation performance on different real-world and synthetic data sets is competitive with standard methods.
The INNC method is inspired by the Ising model. It is based on minimizing a cost function that measures the distance between the sample-based, normalized, discrete correlation energies and the respective energies of the entire domain (grid). INNC is implemented by using greedy Monte Carlo simulation conditioned on the sample values. Owing to a thoughtful initialization of the unknown values on the prediction subgrid, a greedy optimization approach, and vectorization, INNC is computationally fast. The time needed for the Monte Carlo relaxation is very short, and the resulting CPU time varies almost linearly with respect to both the number of classes and the grid size. Furthermore, the INNC method is universal with regard to the data probability distribution (i.e., owing to its non-parametric nature, it makes no distributional assumptions). In addition, it is almost automatic and can be applied with no ad hoc inputs. The only parametrization in the proposed approach involves the number of discretization classes to be used for continuous data. The number of classes can be set arbitrarily large if high resolution is needed.
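The cost function just described can be sketched as follows. This is a minimal Python sketch under simplifying assumptions (binary ±1 spins, J = 1, free boundaries, and a squared-difference form of the distance); the paper's normalization details may differ:

```python
import numpy as np

def nn_energy(S, mask=None):
    """Normalized nearest-neighbor correlation energy of a spin grid.
    If a mask of observed nodes is given, only pairs with both nodes
    observed contribute (the sample-based energy)."""
    if mask is None:
        mask = np.ones(S.shape, dtype=bool)
    e, npairs = 0.0, 0
    for ax in (0, 1):                         # vertical, then horizontal pairs
        a = np.swapaxes(S, 0, ax)
        b = np.swapaxes(mask, 0, ax)
        prod = a[:-1] * a[1:]
        valid = b[:-1] & b[1:]
        e -= prod[valid].sum()
        npairs += valid.sum()
    return e / npairs                         # assumes at least one valid pair

def cost(S_grid, sample_mask):
    """U: squared distance between the sample-based normalized energy
    and the full-grid normalized energy (squared form is our assumption)."""
    return (nn_energy(S_grid, sample_mask) - nn_energy(S_grid)) ** 2
```

A greedy Monte Carlo run would flip only the non-sample spins and accept flips that decrease this cost, which is why the reported U * values near zero indicate good convergence.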
The model is demonstrated herein for regular grids. However, the extension to irregularly spaced data is straightforward. The interaction constant J i,j in Equation (1) can be defined via a kernel function (such as a radial basis function). The interaction neighborhood (nearest neighbors) of any point r can be defined to include those points whose Voronoi cells share a boundary with that of r. Furthermore, possible extensions could include the incorporation of further-neighbor and/or "multi-spin" correlation energies in the Hamiltonian. Overall, based on the studies presented herein, INNC has great potential as a method for gap filling in remote-sensing data products, with minimal if any intervention by the user. We will investigate this further in forthcoming publications.

Data Availability Statement: The soil quality data set (sq1.asc file) is available from the Harmonized World Soil Database at https://webarchive.iiasa.ac.at/Research/LUC/External-Worldsoil-database/HTML/SoilQualityData.html?sb=11 (accessed on 2 July 2021). The surface elevation data set is available from Global Digital Elevation Models at https://ngdc.noaa.gov/mgg/global/global.html (accessed on 18 June 2021). The code is available from the Matlab File Exchange at https://www.mathworks.com/matlabcentral/fileexchange/99749-innc-interpolation-method (accessed on 23 September 2021).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: