# Random Forest Spatial Interpolation

^{1}

^{2}

^{3}

^{*}

## Abstract

**:**

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Methodology

#### 2.1.1. Deterministic Interpolation Methods

#### 2.1.2. Kriging

#### 2.1.3. Random Forest

#### 2.1.4. Random Forest Spatial Interpolation

#### 2.2. Datasets and Covariates

#### 2.2.1. Synthetic Dataset

#### 2.2.2. Precipitation Dataset

^{2}(Figure 2). Catalonia was chosen as a study area because it has a well-established network of meteorological stations and observations are freely available through the daily Global Historical Climatological Network (GHCN-daily) [40]. GHCN-daily is an integrated database of daily meteorological summaries, among others precipitation, from land surface stations from over 100,000 stations in 180 countries and territories. The observations were updated and quality assurance checks performed on a daily basis. The Catalonia station dataset that was used to model daily precipitation with the tested methodologies consists of observations from 87 GHCN-daily stations for a three-year period, from 2016 to 2018. All observations which failed any of the GHCN-daily quality assurance checks (2948, 3.1% of the total) were removed from the dataset. Coordinates were reprojected from WGS 84 global reference system to UTM zone 31N projection (which is appropriate for Catalonia) before computing Euclidean distances to nearest stations, as required in RFSI. The station locations and a histogram of the observations are shown in Figure 2. About 69% (63,880 of a total of 92,404 observations) of the GHCN-daily precipitation data are zero. The maximum observed daily precipitation amount is 220.9 mm.

#### 2.2.3. Temperature Dataset

#### 2.3. Accuracy Assessment

#### 2.3.1. Synthetic Case Study

#### 2.3.2. Real-World Case Studies

## 3. Results

#### 3.1. Synthetic Case Study

#### 3.2. Precipitation Case Study

#### 3.2.1. Space–Time Regression Kriging (STRK)

#### 3.2.2. IDW and Random Forest Models

#### 3.2.3. Accuracy Assessment

#### 3.3. Temperature Case Study

#### 3.3.1. IDW and Random Forest models

#### 3.3.2. Accuracy Assessment

## 4. Discussion

#### 4.1. RFSI Performance

- RFSI is much closer to the philosophy of spatial interpolation than standard RF and RFsp. RFSI uses observations nearby in a direct way to predict at a location. RFsp uses a much more indirect way to include the spatial context in RF prediction. In fact, RFSI mimics kriging much more than RFsp, with the additional advantage that it is not restricted to a weighted linear combination of neighbouring observations.
- Compared to kriging, RFSI is easier to fit, because there is no need for semivariogram modelling and stringent stationarity assumptions.
- RFSI provides a model with more interpretative power than RFsp, i.e., the importance of the first, second, third, etc., nearest observations can be assessed and compared with each other (Figure 6) and with the importance of environmental covariates (Figure 10 and Figure 14). RFsp variable importance shows how important buffer distances from observation points are, but this is difficult to interpret, because it is unclear why certain buffer distance layers have high importance and others do not. However, it should be noted that feature importance is difficult to measure objectively in cases where covariates are cross-correlated and their influence may be masked by other covariates.
- RFSI has several orders of magnitude better scaling properties than RFsp. In RFsp the number of spatial covariates equals the number of observations, whereas in RFSI it is optimized and fairly independent of the number of observations.
- Hengl et al. [8] recommended using RFsp for fewer than 1000 locations. For more than 1000 locations RFsp becomes slow because buffer distances cannot be computed quickly (Table 1). The calculation of spatial covariates needed to apply RFSI, (Euclidean) distances and observations to the nearest locations, is not computationally extensive.
- RFsp cannot be spatially cross-validated properly, i.e., with nested LLOCV. Considering that in nested LLOCV entire stations are held out, the buffer distance covariates in the test dataset (consisting of one main fold) and nested folds of the calibration dataset (consisting of the other folds) are not the same. Therefore, RFsp hyperparameters tuned on the nested folds with one set of buffer distance covariates can be a poor choice to make predictions on the test dataset.

#### 4.2. Extensions and Improvements

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

RFSI | Random Forest Spatial Interpolation |

RFsp | Random Forest for Spatial Predictions framework |

NN | Nearest Neighbour |

IDW | Inverse Distance Weighting |

TS | Trend Surface mapping |

BLUP | Best Linear Unbiased Predictor |

ML | Machine Learning |

RS | Remote Sensing |

RK | Regression Kriging |

KED | Kriging with External Drift |

OK | Ordinary Kriging |

RF | Random Forest |

SVM | Support Vector Machines |

ANN | Artificial Neural Networks |

NEX-GDM | NASA Earth Exchange Gridded Daily Meteorology |

GRF | Geographical Random Forest |

STRK | Space–Time Regression Kriging |

CART | Classification And Regression Trees |

OOB | Out-Of-Bag |

GHCN-daily | daily Global Historical Climatological Network |

DEM | Digital Elevation Model |

IMERG | Integrated Multi-satellitE Retrievals for GPM |

TMAX | Maximum Temperature |

TMIN | Minimum Temperature |

MODIS LST | Moderate Resolution Imaging Spectroradiometer Land Surface Temperature |

CCC | Lin’s Concordance Correlation Coefficient |

MAE | Mean Absolute Error |

RMSE | Root Mean Square Error |

ESS | Error Sum of Squares |

TSS | Total Sum of Squares |

LLOCV | Leave-Location Out Cross-Validation |

QRF | Quantile Regression Forest |

IQR | Interquartile Range |

RFSI${}_{0}$ | RFSI without environmental covariates |

DOY | Day Of Year |

CDATE | Cumulative Day from a Date |

## References

- Thiessen, A.H. Precipitation averages for large areas. Mon. Weather Rev.
**1911**, 39, 1082–1089. [Google Scholar] [CrossRef] - Willmott, C.J.; Rowe, C.M.; Philpot, W.D. Small-Scale Climate Maps: A Sensitivity Analysis of Some Common Assumptions Associated with Grid-Point Interpolation and Contouring. Am. Cartogr.
**1985**, 12, 5–16. [Google Scholar] [CrossRef] - Chorley, R.J.; Haggett, P. Trend-Surface Mapping in Geographical Research. Trans. Inst. Br. Geogr.
**1965**, 37, 47–67. [Google Scholar] [CrossRef] - Matheron, G. Principles of geostatistics. Econ. Geol.
**1963**, 58, 1246–1266. [Google Scholar] [CrossRef] - Goovaerts, P. Geostatistics for Natural Resources Evaluation; Oxford University Press: Oxford, UK, 1997. [Google Scholar] [CrossRef]
- Diggle, P.J.; Ribeiro, P.J. Model-Based Geostatistics; Springer Series in Statistics; Springer: New York, NY, USA, 2007. [Google Scholar] [CrossRef]
- Webster, R.; Oliver, M.A. Geostatistics for Environmental Scientists; Statistics in Practice; John Wiley & Sons, Ltd.: Chichester, UK, 2007. [Google Scholar] [CrossRef][Green Version]
- Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.M.; Gräler, B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ
**2018**, 6, e5518. [Google Scholar] [CrossRef][Green Version] - Journel, A.G. Nonparametric estimation of spatial distributions. J. Int. Assoc. Math. Geol.
**1983**, 15, 445–468. [Google Scholar] [CrossRef] - Carrera-Hernández, J.; Gaskin, S. Spatio temporal analysis of daily precipitation and temperature in the Basin of Mexico. J. Hydrol.
**2007**, 336, 231–249. [Google Scholar] [CrossRef] - Castro, L.M.; Gironás, J.; Fernández, B. Spatial estimation of daily precipitation in regions with complex relief and scarce data using terrain orientation. J. Hydrol.
**2014**, 517, 481–492. [Google Scholar] [CrossRef] - Gräler, B.; Rehr, M.; Gerharz, L.; Pebesma, E.J. Spatio-Temporal Analysis and Interpolation of PM10 Measurements in Europe for 2009. ETC/ACM Tech. Paper 2012/08 2013, 30p. Available online: https://www.eionet.europa.eu/etcs/etc-atni/products/etc-atni-reports/etcacm_2012_8_spatio-temp_pm10analyses (accessed on 1 February 2020).
- Li, J.; Heap, A.D. Spatial interpolation methods applied in the environmental sciences: A review. Environ. Model. Softw.
**2014**, 53, 173–189. [Google Scholar] [CrossRef] - Li, J.; Heap, A.D.; Potter, A.; Daniell, J.J. Application of machine learning methods to spatial interpolation of environmental variables. Environ. Model. Softw.
**2011**, 26, 1647–1659. [Google Scholar] [CrossRef] - Appelhans, T.; Mwangomo, E.; Hardy, D.R.; Hemp, A.; Nauss, T. Evaluating machine learning approaches for the interpolation of monthly air temperature at Mt. Kilimanjaro, Tanzania. Spat. Stat.
**2015**, 14, 91–113. [Google Scholar] [CrossRef][Green Version] - Hengl, T.; Heuvelink, G.B.M.; Kempen, B.; Leenaars, J.G.B.; Walsh, M.G.; Shepherd, K.D.; Sila, A.; MacMillan, R.A.; Mendes de Jesus, J.; Tamene, L.; et al. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions. PLoS ONE
**2015**, 10, e0125814. [Google Scholar] [CrossRef] - Kirkwood, C.; Cave, M.; Beamish, D.; Grebby, S.; Ferreira, A. A machine learning approach to geochemical mapping. J. Geochem. Explor.
**2016**, 167, 49–61. [Google Scholar] [CrossRef][Green Version] - Hashimoto, H.; Wang, W.; Melton, F.S.; Moreno, A.L.; Ganguly, S.; Michaelis, A.R.; Nemani, R.R. High-resolution mapping of daily climate variables by aggregating multiple spatial data sets with the random forest algorithm over the conterminous United States. Int. J. Climatol.
**2019**, 39, 2964–2983. [Google Scholar] [CrossRef] - Veronesi, F.; Schillaci, C. Comparison between geostatistical and machine learning models as predictors of topsoil organic carbon with a focus on local uncertainty estimation. Ecol. Indic.
**2019**, 101, 1032–1044. [Google Scholar] [CrossRef] - Mohsenzadeh Karimi, S.; Kisi, O.; Porrajabali, M.; Rouhani-Nia, F.; Shiri, J. Evaluation of the support vector machine, random forest and geo-statistical methodologies for predicting long-term air temperature. ISH J. Hydraul. Eng.
**2018**. [Google Scholar] [CrossRef] - He, X.; Chaney, N.W.; Schleiss, M.; Sheffield, J. Spatial downscaling of precipitation using adaptable random forests. Water Resour. Res.
**2016**, 52, 8217–8237. [Google Scholar] [CrossRef] - Čeh, M.; Kilibarda, M.; Lisec, A.; Bajat, B. Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments. ISPRS Int. J. Geo-Inf.
**2018**, 7, 168. [Google Scholar] [CrossRef][Green Version] - Georganos, S.; Grippa, T.; Niang Gadiaga, A.; Linard, C.; Lennert, M.; Vanhuysse, S.; Mboga, N.; Wolff, E.; Kalogirou, S. Geographical random forests: A spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int.
**2019**. [Google Scholar] [CrossRef][Green Version] - Behrens, T.; Schmidt, K.; Viscarra Rossel, R.A.; Gries, P.; Scholten, T.; MacMillan, R.A. Spatial modelling with Euclidean distance fields and machine learning. Eur. J. Soil Sci.
**2018**, 69, 757–770. [Google Scholar] [CrossRef] - Zhu, X.; Zhang, Q.; Xu, C.Y.; Sun, P.; Hu, P. Reconstruction of high spatial resolution surface air temperature data across China: A new geo-intelligent multisource data-based machine learning technique. Sci. Total Environ.
**2019**, 665, 300–313. [Google Scholar] [CrossRef] [PubMed] - Hengl, T.; Heuvelink, G.B.M.; Perčec Tadić, M.; Pebesma, E.J. Spatio-temporal prediction of daily temperatures using time-series of MODIS LST images. Theor. Appl. Climatol.
**2012**, 107, 265–277. [Google Scholar] [CrossRef][Green Version] - R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012. [Google Scholar]
- Burrough, P.A.; McDonnell, R. Principles of Geographical Information Systems; Oxford University Press: Oxford, UK, 1989. [Google Scholar]
- Webster, R. Is soil variation random? Geoderma
**2000**, 97, 149–163. [Google Scholar] [CrossRef] - Chilès, J.P.; Delfiner, P. Geostatistics: Modeling Spatial Uncertainty, 2nd ed.; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012. [Google Scholar] [CrossRef]
- Ahmed, S.; De Marsily, G. Comparison of geostatistical methods for estimating transmissivity using data on transmissivity and specific capacity. Water Resour. Res.
**1987**. [Google Scholar] [CrossRef] - Kilibarda, M.; Hengl, T.; Heuvelink, G.B.M.; Gräler, B.; Pebesma, E.J.; Perčec Tadić, M.; Bajat, B. Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution. J. Geophys. Res. Atmos.
**2014**, 119, 2294–2313. [Google Scholar] [CrossRef][Green Version] - Breiman, L. Bagging predictors. Mach. Learn.
**1996**, 24, 123–140. [Google Scholar] [CrossRef][Green Version] - Breiman, L. Random Forests. Mach. Learn.
**2001**, 45, 5–32. [Google Scholar] [CrossRef][Green Version] - Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Abingdon, UK, 2017. [Google Scholar] [CrossRef]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer Texts in Statistics; Springer: New York, NY, USA, 2013; Volume 103. [Google Scholar] [CrossRef]
- Amit, Y.; Geman, D. Shape Quantization and Recognition with Randomized Trees. Neural Comput.
**1997**. [Google Scholar] [CrossRef][Green Version] - Pebesma, E.J. Multivariable geostatistics in S: The gstat package. Comput. Geosci.
**2004**, 30, 683–691. [Google Scholar] [CrossRef] - Bivand, R.S.; Pebesma, E.J.; Gómez-Rubio, V. Applied Spatial Data Analysis with R; Springer: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
- Menne, M.J.; Durre, I.; Vose, R.S.; Gleason, B.E.; Houston, T.G. An Overview of the Global Historical Climatology Network-Daily Database. J. Atmos. Ocean. Technol.
**2012**, 29, 897–910. [Google Scholar] [CrossRef] - Huffman, G.J.; Bolvin, D.T.; Nelkin, E.J. Integrated Multi-satellitE Retrievals for GPM (IMERG), Late Run, Version V06A. 2014. Available online: ftp://jsimpson.pps.eosdis.nasa.gov/data/imerg/gis/ (accessed on 31 March 2019).
- Lin, L.I.K. A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics
**1989**, 45, 255–268. [Google Scholar] [CrossRef] [PubMed] - Wright, M.N.; Ziegler, A. Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Softw.
**2017**, 77. [Google Scholar] [CrossRef][Green Version] - Elseberg, J.; Magnenat, S.; Siegwart, R.; Andreas, N. Comparison of nearest-neighbor-search strategies and implementations for efficient shape registration. J. Softw. Eng. Robot.
**2012**, 3, 2–12. [Google Scholar] - Meyer, H.; Reudenbach, C.; Hengl, T.; Katurji, M.; Nauss, T. Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation. Environ. Model. Softw.
**2018**, 101, 1–9. [Google Scholar] [CrossRef] - Pejović, M.; Nikolić, M.; Heuvelink, G.B.M.; Hengl, T.; Kilibarda, M.; Bajat, B. Sparse regression interaction models for spatial prediction of soil properties in 3D. Comput. Geosci.
**2018**, 118. [Google Scholar] [CrossRef] - Meinshausen, N. Quantile regression forests. J. Mach. Learn. Res.
**2006**, 7, 983–999. [Google Scholar] - Heuvelink, G.B.M.; Pebesma, E.J.; Gräler, B. Space-Time Geostatistics. In Encyclopedia of GIS; Shekhar, S., Xiong, H., Zhou, X., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 1–20. [Google Scholar] [CrossRef]
- Tank, A.K.; Zwiers, F.W.; Zhang, X. Guidelines on Analysis of Extremes in a Changing Climate in Support of Informed Decisions for Adaptation; Technical Report WCDMP-No. 72, WMO-TD No. 1500; World Meteorological Organization: Geneva, Switzerland, 2009. [Google Scholar]
- Zimmerman, D.; Pavlik, C.; Ruggles, A.; Armstrong, M.P. An experimental comparison of ordinary and universal kriging and inverse distance weighting. Math. Geol.
**1999**, 31, 375–390. [Google Scholar] [CrossRef] - MacCormack, K.E.; Brodeur, J.J.; Eyles, C.H. Evaluating the impact of data quantity, distribution and algorithm selection on the accuracy of 3D subsurface models using synthetic grid models of varying complexity. J. Geogr. Syst.
**2013**, 15, 71–88. [Google Scholar] [CrossRef] - Nevtipilova, V.; Pastwa, J.; Boori, M.S.; Vozenilek, V. Testing Artificial Neural Network (ANN) for Spatial Interpolation. J. Geol. Geosci.
**2014**, 3, 1–9. [Google Scholar] [CrossRef][Green Version] - Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLoS ONE
**2018**, 13, e0194889. [Google Scholar] [CrossRef][Green Version] - Malamos, N.; Koutsoyiannis, D. Bilinear surface smoothing for spatial interpolation with optional incorporation of an explanatory variable. Part 2: Application to synthesized and rainfall data. Hydrol. Sci. J.
**2016**, 61, 527–540. [Google Scholar] [CrossRef] - Liao, Y.; Li, D.; Zhang, N. Comparison of interpolation models for estimating heavy metals in soils under various spatial characteristics and sampling methods. Trans. GIS
**2018**, 22, 409–434. [Google Scholar] [CrossRef] - Qiao, P.; Li, P.; Cheng, Y.; Wei, W.; Yang, S.; Lei, M.; Chen, T. Comparison of common spatial interpolation methods for analyzing pollutant spatial distributions at contaminated sites. Environ. Geochem. Health
**2019**, 41, 2709–2730. [Google Scholar] [CrossRef] - Long, J.; Liu, Y.; Xing, S.; Zhang, L.; Qu, M.; Qiu, L.; Huang, Q.; Zhou, B.; Shen, J. Optimal interpolation methods for farmland soil organic matter in various landforms of a complex topography. Ecol. Indic.
**2020**, 110, 105926. [Google Scholar] [CrossRef] - Goovaerts, P. Estimation or simulation of soil properties? An optimization problem with conflicting criteria. Geoderma
**2000**, 97, 165–186. [Google Scholar] [CrossRef] - Wadoux, A.M.; Brus, D.J.; Heuvelink, G.B.M. Sampling design optimization for soil mapping with random forest. Geoderma
**2019**, 355, 113913. [Google Scholar] [CrossRef] - Davies, M.M.; van der Laan, M.J. Optimal Spatial Prediction Using Ensemble Machine Learning. Int. J. Biostat.
**2016**, 12, 179–201. [Google Scholar] [CrossRef][Green Version]

**Figure 2.**GHCN-daily station locations on top of a digital elevation model (DEM) of the study area (

**left**) and histogram of daily precipitation for Catalonia (

**right**). The histogram contains 92,404 GHCN-daily observations for the 2016–2018 period.

**Figure 4.**Comparison of average MAE estimated for each of the interpolation methods, for all nugget-to-sill ratios and ranges. Coloured bars are average MAE for test locations from 100 different simulations. Error bars are standard errors computed from 100 simulations.

**Figure 5.**Prediction maps made using 500 sample locations with nugget-to-sill ratio 0.25 and range 50, for one of the 100 simulated realities. The top left map (SIM) shows the simulated reality and the locations of the 500 samples.

**Figure 6.**Covariate importance plot for RFsp (

**left**) and RFSI (

**right**), for the case shown in Figure 5. The importance index is scaled to a maximum of 1, obsi and disti represent observations and distances to the i-th nearest observation location, and layer.i represents buffer distances to the i-th observation location.

**Figure 7.**RFSI prediction maps made using 500 sample locations with nugget-to-sill ratio 0.25 and range 50, with different number of nearest locations (n).

**Figure 8.**RMSE vs number of nearest locations (n) used in RFSI for one simulation with nugget-to-sill ratio 0.25, ranges 50 and 200, using 100 (

**top**), 500 (

**middle**) and 2000 (

**bottom**) sample locations. Larger discs represent the optimal number of nearest locations with minimum RMSE.

**Figure 10.**Covariate importance plot for RF (

**left**), RFsp (

**middle**) and RFSI (

**right**), for the precipitation case study. The importance index is scaled to a maximum of 1. The importance of covariates IMERG, TMAX, and TMIN is shown in red.

**Figure 11.**Scatter density plots of predictions vs. observations with 1:1 line for the precipitation case study.

**Figure 12.**Prediction maps of daily precipitation (mm) for the five models, for 1–4 January 2016. The bottom row shows the maximum observed precipitation for each day.

**Figure 14.**Covariate importance plot for RF (

**left**), RFsp (

**middle**) and RFSI (

**right**), for the temperature case study. The importance index is scaled to a maximum of 1. The importance of environmental covariates is shown in red.

**Figure 15.**Prediction maps of daily temperature (${}^{\xb0}$C) for the four models, for 2 February 2008. The bottom row shows the maximum and minimum observed temperature.

**Table 1.**Distance calculation time and modelling time for RFSI and RFsp, and prediction time for RFSI, RFsp and OK. All results refer to the synthetic case study and represent the average computing time computed from 100 simulations. All calculations and time estimations were done on a personal computer with Intel

^{®}Core™ i7-7820X CPU @ 3.60GHz × 16 processor and 126 GB of RAM.

Criteria | Method | Number of Points | |||||
---|---|---|---|---|---|---|---|

100 | 200 | 500 | 1000 | 2000 | 5000 | ||

Distance calculation time [s] | RFsp | 24.98 | 47.75 | 114.42 | 263.08 | 477.37 | 3832.88 |

RFSI | 1.40 | 1.48 | 1.62 | 1.65 | 1.69 | 1.75 | |

Modelling time [s] | RFsp | 0.06 | 0.27 | 2.35 | 13.50 | 71.73 | 498.21 |

RFSI | 0.02 | 0.04 | 0.09 | 0.20 | 0.42 | 1.18 | |

Prediction time [s] | OK | 5.25 | 5.72 | 6.38 | 6.81 | 7.11 | 8.03 |

RFsp | 5.47 | 9.57 | 22.30 | 46.32 | 70.58 | 312.12 | |

RFSI | 2.93 | 3.37 | 4.05 | 4.74 | 5.60 | 6.83 |

Component | Nugget [mm ^{2}] | Sill [mm ^{2}] | Range | Function | Anisotropy Ratio |
---|---|---|---|---|---|

Spatial | 0.00 | 0.89 | 218.8 km | Spherical | n/a |

Temporal | 1.63 | 4.15 | 2.6 days | Spherical | n/a |

Spatio-temporal | 9.51 | 11.30 | 91.7 km | Spherical | 120 km/day |

Model | Mtry | Min.Node.Size | Sample.Fraction | n | p |
---|---|---|---|---|---|

IDW | n/a | n/a | n/a | 13 | 2.2 |

RF | 2 | 20 | 0.65 | n/a | n/a |

RFsp | 58 | 4 | 0.29 | n/a | n/a |

RFSI | 4 | 6 | 0.95 | 7 | n/a |

**Table 4.**Accuracy metrics of all six prediction methods as assessed using nested 5-fold LLOCV for the precipitation case study.

Method | R${}_{1:1}^{2}$ [%] | CCC | MAE [mm] | RMSE [mm] |
---|---|---|---|---|

STRK | 67.5 | 0.815 | 1.2 | 3.9 |

IDW | 69.6 | 0.820 | 1.1 | 3.8 |

RF | 49.4 | 0.674 | 1.7 | 4.9 |

RFsp | 53.3 | 0.690 | 1.6 | 4.7 |

RFSI | 69.5 | 0.820 | 1.1 | 3.8 |

RFSI${}_{0}$ | 68.6 | 0.814 | 1.2 | 3.9 |

**Table 5.**Performance of all models for precipitation below and above 1 mm. Values for the hits and misses are number of observations (obs.) for a given condition. Predictions (pred.) used in this table are from nested 5-fold LLOCV. Overall accuracy represents the percentage of correct classifications. Numbers in bold represent the best performance.

Hits | Misses | RMSE [mm] | Overall | |||||
---|---|---|---|---|---|---|---|---|

Method | obs. pred. | < 1 mm < 1 mm | ≥ 1 mm ≥ 1 mm | < 1 mm ≥ 1 mm | ≥ 1 mm < 1 mm | < 1 mm | ≥ 1 mm | Accuracy [%] |

STRK | 68,272 | 15,894 | 6307 | 1847 | 1.2 | 8.6 | 91.2 | |

IDW | 68,398 | 16,164 | 6181 | 1577 | 1.0 | 8.4 | 91.6 | |

RF | 63,382 | 14,914 | 11,197 | 2827 | 1.7 | 10.6 | 84.8 | |

RFsp | 64,235 | 15,273 | 10,344 | 2468 | 1.6 | 10.2 | 86.1 | |

RFSI | 68,031 | 16,524 | 6548 | 1217 | 1.0 | 8.4 | 91.6 | |

RFSI${}_{0}$ | 67,917 | 16,535 | 6662 | 1206 | 1.0 | 8.5 | 91.5 |

Model | Mtry | Min.Node.Size | Sample.Fraction | n | p |
---|---|---|---|---|---|

IDW | n/a | n/a | n/a | 11 | 1.8 |

RF | 6 | 3 | 0.85 | n/a | n/a |

RFsp | 154 | 2 | 0.77 | n/a | n/a |

RFSI | 5 | 15 | 0.90 | 10 | n/a |

**Table 7.**Accuracy metrics of all six prediction methods as assessed using nested 10-fold LLOCV for the temperature case study. Note that accuracy metrics for STRK are taken from Hengl et al. [26].

Method | R${}_{1:1}^{2}$ [%] | CCC | MAE [mm] | RMSE [mm] |
---|---|---|---|---|

STRK | 91.0 | n/a | n/a | 2.4 |

IDW | 95.0 | 0.974 | 1.2 | 1.8 |

RF | 95.7 | 0.978 | 1.1 | 1.6 |

RFsp | 95.5 | 0.976 | 1.1 | 1.6 |

RFSI | 96.6 | 0.983 | 1.0 | 1.4 |

RFSI${}_{0}$ | 94.9 | 0.974 | 1.2 | 1.8 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sekulić, A.; Kilibarda, M.; Heuvelink, G.B.M.; Nikolić, M.; Bajat, B. Random Forest Spatial Interpolation. *Remote Sens.* **2020**, *12*, 1687.
https://doi.org/10.3390/rs12101687

**AMA Style**

Sekulić A, Kilibarda M, Heuvelink GBM, Nikolić M, Bajat B. Random Forest Spatial Interpolation. *Remote Sensing*. 2020; 12(10):1687.
https://doi.org/10.3390/rs12101687

**Chicago/Turabian Style**

Sekulić, Aleksandar, Milan Kilibarda, Gerard B.M. Heuvelink, Mladen Nikolić, and Branislav Bajat. 2020. "Random Forest Spatial Interpolation" *Remote Sensing* 12, no. 10: 1687.
https://doi.org/10.3390/rs12101687