Imputation of GPS Coordinate Time Series Using missForest
Abstract
1. Introduction
2. Methods and Data
2.1. missForest
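- $y_{obs}^{(s)}$, the non-missing (observed) values of variable $X_s$;
- $y_{mis}^{(s)}$, the missing values of variable $X_s$;
- $x_{obs}^{(s)}$, the variables other than $X_s$, at the observations $i_{obs}^{(s)}$ (the rows where $X_s$ is observed);
- $x_{mis}^{(s)}$, the variables other than $X_s$, at the observations $i_{mis}^{(s)}$ (the rows where $X_s$ is missing).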
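Algorithm 1. missForest

Require: $X$, an $n \times p$ matrix; stopping criterion $\gamma$
1: $k \leftarrow$ indices of the columns (stations) of $X$, sorted by amount of missing values in descending order;
2: Make an initial guess for the missing values using another method;
3: while not $\gamma$ do
4:     $X_{old}^{imp} \leftarrow$ store previously imputed matrix;
5:     for $s$ in $k$ do
6:         Fit a random forest: $y_{obs}^{(s)} \sim x_{obs}^{(s)}$;
7:         Predict $y_{mis}^{(s)}$ using $x_{mis}^{(s)}$;
8:         $X_{new}^{imp} \leftarrow$ update imputed matrix, using predicted $y_{mis}^{(s)}$;
9:     update $\gamma$;
10: return the imputed matrix $X^{imp}$.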
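
The loop in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `RandomForestRegressor` as the forest, column means as the initial guess in step 2 (the algorithm only says "another method"), and the stopping rule of the original missForest paper (stop when the normalized squared change between successive imputations first increases). The function name `miss_forest` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def miss_forest(X, max_iter=10, n_estimators=100, random_state=0):
    """Impute NaNs in a numeric (n x p) array with a missForest-style loop.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)  # True where a value is missing

    # Step 2 (assumption): initial guess is the column mean.
    X_imp = np.where(mask, np.nanmean(X, axis=0), X)

    # Step 1: visit columns (stations) in descending order of missing count,
    # skipping columns that have nothing to impute.
    order = [s for s in np.argsort(-mask.sum(axis=0)) if mask[:, s].any()]
    if not order:
        return X_imp

    prev_diff = np.inf
    for _ in range(max_iter):
        X_old = X_imp.copy()  # step 4: keep the previous imputation
        for s in order:  # step 5
            obs, mis = ~mask[:, s], mask[:, s]
            others = np.delete(np.arange(X.shape[1]), s)
            rf = RandomForestRegressor(
                n_estimators=n_estimators, random_state=random_state
            )
            # Step 6: fit on rows where column s is observed.
            rf.fit(X_imp[obs][:, others], X_imp[obs, s])
            # Steps 7-8: predict the missing entries and update the matrix.
            X_imp[mis, s] = rf.predict(X_imp[mis][:, others])

        # Step 9 (assumption): normalized squared change between iterations.
        diff = np.sum((X_imp - X_old) ** 2) / np.sum(X_imp ** 2)
        if diff > prev_diff:
            return X_old  # change grew again: return the previous imputation
        prev_diff = diff
    return X_imp
```

Passing a days-by-stations array with `np.nan` at the gaps returns an array of the same shape in which only the gaps have been filled; observed values are left untouched.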
2.2. Baseline Methods
2.3. Evaluation Indicators
2.4. PCA
2.5. Out-of-Bag Error (OOB)
2.6. GPS Time Series and Experiment Settings
3. Imputation Results
3.1. Different Gap Size Analysis
3.1.1. 2-Day Gap
3.1.2. 7-Day Gap
3.1.3. 30-Day Gap
3.1.4. 180-Day Gap
3.2. Different Missing Rate Analysis
3.3. OOB versus NRMSE
3.4. PCA of Different Gap Sizes
3.5. Time Consumption
4. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

Method | Evaluation | 90 × 2-Day | 20 × 7-Day | 6 × 30-Day | 1 × 180-Day |
---|---|---|---|---|---|
Cubic spline | MAE (mm) | 5.00 | 5.10 | 5.31 | 6.18 |
Cubic spline | NRMSE | 0.115 | 0.116 | 0.121 | 0.141 |
Cubic spline | Rp | 0.68 | 0.67 | 0.68 | 0.42 |
Orthogonal polynomial | MAE (mm) | 3.67 | 6.27 | 19.08 | 99.88 |
Orthogonal polynomial | NRMSE | 0.088 | 0.152 | 0.471 | 2.509 |
Orthogonal polynomial | Rp | 0.85 | 0.64 | 0.30 | 0.11 |
Hermite | MAE (mm) | 3.02 | 3.68 | 4.47 | 5.63 |
Hermite | NRMSE | 0.072 | 0.088 | 0.106 | 0.129 |
Hermite | Rp | 0.89 | 0.83 | 0.79 | 0.61 |
RegEM | MAE (mm) | 2.81 | 2.84 | 2.96 | 3.40 |
RegEM | NRMSE | 0.065 | 0.066 | 0.068 | 0.074 |
RegEM | Rp | 0.92 | 0.91 | 0.91 | 0.83 |
missForest | MAE (mm) | 2.55 | 2.61 | 2.71 | 2.95 |
missForest | NRMSE | 0.062 | 0.063 | 0.065 | 0.066 |
missForest | Rp | 0.93 | 0.92 | 0.92 | 0.90 |
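
The three indicators reported per method (MAE, NRMSE and the Pearson correlation Rp) can be computed along the lines of the sketch below. The normalization used for NRMSE is not shown in this text, so dividing the RMSE by the range of the true values is an assumption; the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(y_true, y_imp):
    """Return MAE, NRMSE and Pearson correlation for imputed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_imp = np.asarray(y_imp, dtype=float)
    err = y_imp - y_true

    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Assumption: normalize by the range of the true values.
    nrmse = rmse / (y_true.max() - y_true.min())
    # Pearson correlation between true and imputed values.
    rp = np.corrcoef(y_true, y_imp)[0, 1]
    return {"MAE": mae, "NRMSE": nrmse, "Rp": rp}
```

Both arguments should contain only the entries that were artificially removed, so that the scores describe the imputed values rather than the untouched observations.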

Method | Evaluation | 10% | 20% | 30% | 40% |
---|---|---|---|---|---|
Cubic spline | MAE (mm) | 5.48 | 5.52 | 5.53 | 5.58 |
Cubic spline | NRMSE | 0.126 | 0.128 | 0.129 | 0.130 |
Cubic spline | Rp | 0.64 | 0.64 | 0.64 | 0.62 |
Orthogonal polynomial | MAE (mm) | 6.89 | 6.89 | 6.92 | 7.04 |
Orthogonal polynomial | NRMSE | 0.161 | 0.161 | 0.162 | 0.165 |
Orthogonal polynomial | Rp | 0.65 | 0.64 | 0.64 | 0.62 |
Hermite | MAE (mm) | 3.71 | 3.82 | 4.02 | 4.17 |
Hermite | NRMSE | 0.093 | 0.093 | 0.095 | 0.098 |
Hermite | Rp | 0.82 | 0.82 | 0.82 | 0.81 |
RegEM | MAE (mm) | 2.76 | 2.82 | 3.12 | 3.48 |
RegEM | NRMSE | 0.065 | 0.066 | 0.073 | 0.081 |
RegEM | Rp | 0.92 | 0.91 | 0.90 | 0.88 |
missForest | MAE (mm) | 2.64 | 2.66 | 2.68 | 2.74 |
missForest | NRMSE | 0.061 | 0.062 | 0.063 | 0.064 |
missForest | Rp | 0.92 | 0.92 | 0.92 | 0.91 |

Method | Gap Size (days) | PC1 (%) | PC2 (%) | PC3 (%) | SUM (%) | D_distance | A_angle |
---|---|---|---|---|---|---|---|
Original | - | 75.24 | 8.73 | 3.57 | 87.55 | 0 | 0 |
Cubic spline | 2 | 72.53 | 9.17 | 3.63 | 85.35 | 0.049 | 2.828 |
Cubic spline | 7 | 72.22 | 9.09 | 3.56 | 84.88 | 0.050 | 2.890 |
Cubic spline | 30 | 72.32 | 9.41 | 3.38 | 85.12 | 0.048 | 2.790 |
Cubic spline | 180 | 71.05 | 9.28 | 3.51 | 83.85 | 0.057 | 3.26 |
Orthogonal polynomial | 2 | 74.17 | 8.76 | 3.58 | 86.52 | 0.009 | 0.555 |
Orthogonal polynomial | 7 | 69.87 | 8.55 | 3.72 | 82.15 | 0.318 | 1.823 |
Orthogonal polynomial | 30 | 45.89 | 6.86 | 4.25 | 57.02 | 0.341 | 19.707 |
Orthogonal polynomial | 180 | 22.00 | 5.69 | 5.41 | 33.10 | 1.041 | 63.12 |
Hermite | 2 | 74.89 | 8.81 | 3.56 | 87.27 | 0.023 | 1.374 |
Hermite | 7 | 74.11 | 8.77 | 3.60 | 86.49 | 0.026 | 1.494 |
Hermite | 30 | 73.37 | 8.87 | 3.41 | 85.66 | 0.029 | 1.668 |
Hermite | 180 | 71.48 | 9.26 | 3.53 | 84.29 | 0.050 | 2.881 |
RegEM | 2 | 75.86 | 8.65 | 3.54 | 88.06 | 0.008 | 0.501 |
RegEM | 7 | 75.79 | 8.69 | 3.51 | 88.00 | 0.015 | 0.859 |
RegEM | 30 | 75.83 | 8.73 | 3.44 | 88.01 | 0.027 | 1.572 |
RegEM | 180 | 75.27 | 8.69 | 3.57 | 87.53 | 0.037 | 2.121 |
missForest | 2 | 76.36 | 8.59 | 3.43 | 88.38 | 0.009 | 0.572 |
missForest | 7 | 76.38 | 8.61 | 3.41 | 88.41 | 0.010 | 0.619 |
missForest | 30 | 76.45 | 8.68 | 3.31 | 88.45 | 0.013 | 0.764 |
missForest | 180 | 76.23 | 8.55 | 3.45 | 88.24 | 0.024 | 1.408 |

Method | Cubic spline | Orthogonal polynomial | Hermite | RegEM | missForest |
---|---|---|---|---|---|
Time (s) | 0.14 | 0.10 | 0.09 | 0.48 | 5.58 |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, S.; Gong, L.; Zeng, Q.; Li, W.; Xiao, F.; Lei, J. Imputation of GPS Coordinate Time Series Using missForest. Remote Sens. 2021, 13, 2312. https://doi.org/10.3390/rs13122312