Optimal Estimate of Global Biome—Speciﬁc Parameter Settings to Reconstruct NDVI Time Series with the Harmonic ANalysis of Time Series (HANTS) Method

: Terrestrial remote sensing data products retrieved from radiometric measurements in the optical and thermal infrared spectrum such as vegetation spectral indices can be heavily contaminated by atmospheric conditions, including cloud and aerosol layers. This contamination results in gaps or noisy observations. The harmonic analysis of time series (HANTS) has been widely used for time series reconstruction of remote sensing imagery in recent decades. To use HANTS model, a series of parameters, such as number of frequencies (NF), ﬁtting error tolerance (FET), degree of over-determinedness (DoD), and regularization factor (Delta), need to be deﬁned by users. These parameters provide ﬂexibilities, but also make it difﬁcult for non-expert users to determine appropriate settings for speciﬁc applications. This study systematically evaluated the reconstruction performance of the model under different parameter setting scenarios by simulating pixel-wise reference and noisy NDVI time series. The results of these numerical experiments were further used to identify optimal settings and improve global NDVI reconstruction performance. The results suggested optimal settings for different areas (local optimization). If a user opts to use unique settings for global reconstruction, the setting NF = 4, FET = 0.05, DoD = 5, and Delta = 0.5 can produce the best performance across all setting scenarios (global optimization). In addition, several internal improvements, such as dynamic weighting scheme, polynomial and inter-annual harmonic components, and ancillary attributes of input data can be used to further improve the performance of reconstruction. With these results, future non-expert users can easily determine appropriate settings of HANTS for speciﬁc applications in different regions.


Introduction
A wealth of terrestrial satellite data products has been accumulated since the Earth Resources Technology Satellite (ERTS-1) was launched into space in 1972 [1,2]. Vegetation spectral indices such as the Normalized Difference Vegetation Index (NDVI) are widely applied to monitor and to evaluate regional and continental vegetation dynamics [3,4]. The medium to coarse spatial resolution sensors, such as Advanced Very High Resolution Radiometer (AVHRR), Moderate Resolution Imaging Spectroradiometer (MODIS), Satellite Pour l'Observation de la Terre Vegetation (SPOT-VEGETATION), and Visible Infrared Imaging Radiometer Suite (VIIRS), onboard sun-synchronous polar orbiting satellites provide daily global coverage observations [5,6]. Moreover, satellites carrying sensors with higher spatial resolution, such as Landsat Thematic Mapper (TM) series and Sentinal-2A/B Multispectral Instrument (MSI), have a re-revisit time ranging from four to more than 15 days [7,8]. The instruments onboard geo-synchronous orbit satellites (such as the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) onboard Meteosat Second Generation (MSG) spacecraft and Advanced Geosynchronous Radiation Imager (AGRI) onboard FengYun-4) can even capture regional vegetation variability at the sub-daily scale [9]. In other words, the temporal coverage of NDVI observations is limited by the combination of platforms and sensors [10]. Moreover, the measurements of spectral radiance by a space-borne imaging radiometer are heavily affected by clouds, aerosols, and atmospheric water vapor [11]. Clouds cover more than 50% of the earth surface at any given time and this is the main constraint on retrieving reliable time series of at-surface NDVI observations [12]. In general, cloud cover causes a large decrease in NDVI compared with clear sky conditions and changes more quickly than vegetation phenology as captured by NDVI [10,13].
To suppress the impact of less-than-ideal atmospheric conditions, especially the cloud cover, daily NDVI observations by moderate spatial resolution sensors are temporally composited to a 10-or 16-day time window using maximum value compositing (MVC) to generate NDVI data products [11]. The maximum NDVI value in each pixel and time window is retained and assumed to capture vegetation conditions in the entire time window [14][15][16]. This procedure may not remove cloud-contaminated observations completely and create gaps in the time series of NDVI, e.g., because there might be no cloud-free observation in a number of pixels and time windows, particularly in some super-humid regions [14][15][16]. Longer compositing window lengths (e.g., one month) can reduce cloud-affected observations further but might also remove meaningful phenological information in terrestrial vegetation [17]. Accordingly, time windows shorter than one month are applied in temporal compositing of NDVI retrievals [18]. To further reduce residual noise in the MVC-produced NDVI time series, other time series reconstruction (TSR) methods have been proposed, such as asymmetric Gaussian (AG) [19], double logistic (DL) [19], Savitzky-Golay (SG) [20], iterative interpolation for data reconstruction (IDR) [21], and Whitaker [22,23]. Some of these methods use temporally close clear-sky observations to estimate NDVI in a given pixel for time windows with no cloud-free observations [20,21]. Comprehensive reviews on TSR methods can be found in [12,[24][25][26]. Each TSR method is based on different theories and assumptions about the properties of time series and applies different user-defined parameters, with the consequence that each method performs differently in different regions when applied to global reconstruction of observations [10,12]. For example, the AG and DL methods are more suitable to reconstruct NDVI time series at high latitudes, especially in the boreal and tropical forest areas, where the large faction of noisy observations over tropical humid areas may yield large errors and/or unstable fittings [12]. For continental and global studies on terrestrial vegetation, a TSR method providing stable and accurate reconstruction without excessive local tune-up is necessary [27,28].
Fourier-based harmonic analysis techniques have been extensively applied in modeling time series of remote sensing data especially vegetation index by mimicking surface dynamic with several harmonic components at different frequencies [29][30][31][32][33]. As a Fourier-based model, the harmonic analysis of time series (HANTS) is one of the most popular algorithms for the reconstruction of extended time-series of satellite data and originally conceived for NDVI observations [16,18,30,34]. The main advantages of HANTS include [34] (1) inherent coherence of harmonic components with periodical phenology rhythms; (2) low-pass filtering to preserve the slower phenological signals while excluding the high frequency noise induced by adverse atmospheric conditions; (3) simple (via iteratively linear least square fitting) implementation of the method; (4) impressive compression power for raw time series. Since it was coded by Verheof [35], the algorithm has been implemented with different programming languages, including Fortran, IDL, C, Matlab, R, and Python as well as the Google Earth Engine (GEE) platform [36] (Table 1) to meet the requirements of different users. Notwithstanding its popularity in the field of time series analysis of satellite data, the reconstruction performance is heterogeneous when applied for global reconstruction [18]. The global reconstruction performance of HANTS was systematically evaluated by Zhou et al. [18], who presented a replicable measure of reconstruction accuracy for different regions on Earth. The same HANTS parameter setting was applied globally, however, while at the same time showing that performance varied with parameter setting, as expected [18]. For instance, HANTS gave rather high reconstruction errors in the boreal forest region as well as in part of cropland area, because of long gaps in time series or an inadequate choice of the number of harmonic components to capture higher frequency features in the NDVI signal [18]. Zhou et al. [12] carried out a comparative evaluation of the global reconstruction performance of several popular time series reconstruction methods, including HANTS, asymmetric Gaussian (AG), double logistic (DL), Savitzky-Golay (SG), and Whitaker smoother (WS). This evaluation suggested that HANTS performance was comparable with and in some cases better than the other methods included in the study. The results confirmed the potential relevance of a site (biome)-specific parameter setting towards better performance. The HANTS implemented by Verheof [35] designs several user-defined control parameters including the length of base period (BP), number of frequencies (NF), fitting error tolerance (FET), degree of over-determinedness (DoD), and regularization factor (i.e., Delta). These parameters all determine the reconstruction performance to a large extent. These parameters offer users a useful flexibility in the analy-Remote Sens. 2021, 13,4251 4 of 20 sis of complex time series. On the other hand, effective parameter setting is a challenge for non-expert users and no other choice was available so far but to set these parameters based on a combination of literature and trial-and-error [18]. For the global reconstruction of complex signals affected by heterogeneous gaps, the reconstruction performance may be more sensitive to parameter setting than in regional applications [27,28]. For example, Zhou et al. [27] investigated the sensitivity of the reconstruction error to the selection of harmonic components, fitting error tolerance, as well as weighting schemes in HANTS and concluded that the overall performance can be improved with optimized parameter settings instead of applying a widely accepted setting scheme. Following the initial work in Zhou et al. [27], the optimal setting of other user-defined parameters still needs to be further explored and identified.
Besides the above-mentioned parameters to prescribe the operation of the HANTS algorithm, the impact of several characteristics of the input datasets on the performance of global reconstruction need to be taken into account. These characteristics include but are not limited to: (1) Time window applied in compositing the input data. Most freely available NDVI data apply a 10-or 16-or 30-day time window in compositing daily observations by applying maximum value composition (MVC) [37][38][39]. This procedure was believed to be sufficient to eliminate most cloudy observations [11,15]. Most applications were based on such composite data products [16,18,34], although, to our knowledge, no study evaluated the performance of HANTS in processing raw daily NDVI data or products with different composition time windows. (2) Quality control (QC) information. Pixel-based QC information, which indicates the retrieving reliability (e.g., good, marginal, snow/ice or cloudy), is an indispensable attribute of quantitative remote sensing data products [40]. Previous studies suggested this information could help to exclude low quality observations and improve reconstruction performance [20,37]. The accuracy of QC information, however, may also degrade its reliability [10,12], and the degree to which the QC information may impact the global reconstruction performance of HANTS needs to be investigated. (3) Actual acquisition date of each observation. For each pixel and time window the MVC procedure selects the maximum NDVI value but does not retain the actual date of acquisition [15,41]. This implies that, e.g., in a 7-day composite, there might be a difference in acquisition time of up to 14 days between the NDVI observations retained in adjacent pixels. This inconsistency can be mitigated by assigning an approximate time stamp, e.g., start, middle or end of the time window, to each selected maximum NDVI observation. However, this solution may still yield large differences between this approximate time stamp and the actual date of acquisition for the retained maximum NDVI value. When applying longer time windows, e.g., 30 days, or when observing critical or shorter phenological stages, the vegetation signal can change significantly during the time window [41,42]. It still needs to be evaluated, therefore, whether the global reconstruction performance might be improved by applying the actual date acquisition of each retained NDVI observation in the reconstruction of the time series.
As a summary, the successful worldwide applications of HANTS during the past three decades [24,34] has suggested that the model can be a robust and promising algorithm for global NDVI reconstruction, although the performance requires improvements in some regions [12,18].
The objective of this study is to describe and evaluate an improved HANTS method by systematically optimizing parameter settings and taking into account key-characteristics of input time series data. If successful, the study is likely to trigger further and wider use of HANTS for global and regional reconstruction of NDVI time-series.

Materials
Daily NDVI data were generated by using daily land surface reflectance retrieved from Terra/MODIS measurements, i.e., the data product MOD09GA-MODIS/Terra Surface Reflectance Daily L2G Global 1 km and 500 m, for the period 2001-2020 [37]. The Terra/MODIS measurements alone provide more than 6900 independent NDVI estimations for each pixel, which is enough for the statistical analysis of this study (see methods section). Thus, Aqua/MODIS measurements, which provide similar NDVI estimation as Terra/MODIS, were ignored [12]. The "QC_500 m" layer of the MOD09GA product provides an indication of the accuracy of each observation [40]. In order to speed up the evaluation procedure, only the NDVI observations of 445 BELMANIP2 (Benchmark Land Multisite Analysis and Inter-comparison of Products) sites [43] were downloaded from Google Earth Engine (GEE) and used in the evaluation. The BELMANIP2 sites were carefully selected by Baret et al. [43] to represent the global terrestrial vegetation types and their phenology (Figure 1), where the percentage of sites for each biome closely match the global fractional abundance of each biome. The number of sites sampling each biome is given in Figure 1, in which vegetation sites are dominated by grasses/cereal crops (GCC) (97 sites), savanna (SAV) (62 sites), evergreen broadleaf forest (EBF) (61 sites), and shrubs (SHR) (60 sites).
rithm for global NDVI reconstruction, although the performance requires improvements in some regions [12,18].
The objective of this study is to describe and evaluate an improved HANTS method by systematically optimizing parameter settings and taking into account key-characteristics of input time series data. If successful, the study is likely to trigger further and wider use of HANTS for global and regional reconstruction of NDVI time-series.

Materials
Daily NDVI data were generated by using daily land surface reflectance retrieved from Terra/MODIS measurements, i.e., the data product MOD09GA-MODIS/Terra Surface Reflectance Daily L2G Global 1 km and 500 m, for the period 2001-2020 [37]. The Terra/MODIS measurements alone provide more than 6900 independent NDVI estimations for each pixel, which is enough for the statistical analysis of this study (see methods section). Thus, Aqua/MODIS measurements, which provide similar NDVI estimation as Terra/MODIS, were ignored [12]. The "QC_500 m" layer of the MOD09GA product provides an indication of the accuracy of each observation [40]. In order to speed up the evaluation procedure, only the NDVI observations of 445 BELMANIP2 (Benchmark Land Multisite Analysis and Inter-comparison of Products) sites [43] were downloaded from Google Earth Engine (GEE) and used in the evaluation. The BELMANIP2 sites were carefully selected by Baret et al. [43] to represent the global terrestrial vegetation types and their phenology (Figure 1), where the percentage of sites for each biome closely match the global fractional abundance of each biome. The number of sites sampling each biome is given in Figure 1

Overview of the Evaluation and Optimization Procedures
The procedure to optimize the HANTS configuration for global NDVI time series reconstruction includes an evaluation and an optimization step ( Figure 2). For each site, an annual reference NDVI time series and a set of annual noisy series based on raw MODIS daily NDVI time series and QC information was generated using a time series simulator. The simulated noisy series are further processed by applying HANTS for different configurations considering internal parameter settings, improvement schemes, and several external influence factors. The HANTS algorithm coded by Verhoef [35] with predefined parameters was referred as "classical HANTS" to differentiate it from later HANTS versions with several improvements; see Section 2.2.3 for details. The difference between the reconstructed noisy series and reference series was used to quantify the reconstruction performance in terms of overall reconstruction error (ORE) [12] (Section 2.2.4 for details). By comparing the ORE obtained for a set of sites and for different configurations, the configurations providing better reconstruction global performance, i.e., lower ORE, can be identified.

Overview of the Evaluation and Optimization Procedures
The procedure to optimize the HANTS configuration for global NDVI time series reconstruction includes an evaluation and an optimization step ( Figure 2). For each site, an annual reference NDVI time series and a set of annual noisy series based on raw MODIS daily NDVI time series and QC information was generated using a time series simulator. The simulated noisy series are further processed by applying HANTS for different configurations considering internal parameter settings, improvement schemes, and several external influence factors. The HANTS algorithm coded by Verhoef [35] with predefined parameters was referred as "classical HANTS" to differentiate it from later HANTS versions with several improvements; see Section 2.2.3 for details. The difference between the reconstructed noisy series and reference series was used to quantify the reconstruction performance in terms of overall reconstruction error (ORE) [12] (Section 2.2.4 for details). By comparing the ORE obtained for a set of sites and for different configurations, the configurations providing better reconstruction global performance, i.e., lower ORE, can be identified.

Simulation of Reference and Noisy NDVI Time Series
To measure the reconstruction performance of HANTS quantitatively, the immediate method is to evaluate how close a reconstructed noisy series get to clear-sky NDVI series (or "ground truth"). One cannot expect to find ideal clear-sky NDVI series, however, as few vegetated pixels on Earth can be completely free from cloud cover during the vegetation growth season. We assumed that vegetation phenology and seasonal cloud cover at a specific location (pixel) remain roughly similar across the years. Likewise, in earlier studies [10,12,18], long-term historical NDVI observations were applied to simulate reference

Simulation of Reference and Noisy NDVI Time Series
To measure the reconstruction performance of HANTS quantitatively, the immediate method is to evaluate how close a reconstructed noisy series get to clear-sky NDVI series (or "ground truth"). One cannot expect to find ideal clear-sky NDVI series, however, as few vegetated pixels on Earth can be completely free from cloud cover during the vegetation growth season. We assumed that vegetation phenology and seasonal cloud cover at a specific location (pixel) remain roughly similar across the years. Likewise, in earlier studies [10,12,18], long-term historical NDVI observations were applied to simulate reference annual NDVI series, representing cloud-free vegetation phenology, and noisy NDVI series including cloud contaminated observations. Zhou et al. [12] proposed a robust scheme to synthesize pixel-based annual reference series and simulate noisy conditions (e.g., caused by cloud cover) using long-term historical NDVI observations. The annual reference time series were constructed by targeting the 445 BELMANIP2 (Benchmark Land Multisite Analysis and Intercomparison of Products) sites [43]. The method used daily NDVI retrievals for 14 years and the Quality Assessment (QA) flags indirectly, i.e., to separate the daily observations into high and low quality (HQ, LQ). Temporal composites of the daily HQ observations were generated and further treated as being clear-sky to construct the reference time series for each site, while noise is added to mimic the effect of clouds and snow, generating the noisy time series. The noise is generated by taking into account the probability of LQ observations estimated from the daily time series of each site. In this study, the method of Zhou [12] was applied to construct an annual time series of reference and noisy NDVI observations. A detailed description of the method can be found in [16]. Specifically, for each site, the procedure uses daily MODIS reflectance time series (from MOD09GA) and QC information ("QC_500 m" layer) as input and generates one annual daily reference NDVI series (365 samples) and 100 annual daily noisy NDVI series. The daily NDVI observations were labeled as "high" and "low" quality respectively using the QC flags, with "high" quality NDVI assumed to be less affected by cloud conditions, observation geometry and instrumental errors. In particular, a NDVI observation is assessed "high quality" when the QC flags are "0000" in both bits 2-5 (band 1 quality) and bits 6-9 (band 2 quality) and "00" in bits 0-1 (cloud state). All other NDVI observations are assessed as "low quality". The "high quality" and "low quality" NDVI observations were used to simulate reference and noisy NDVI time series. In this way, the QC flags are not applied to select specific outliers in the gap-filling stage, but are only applied to extract a large sample of HQ observations. Moreover, we used 20 years (2001~2020) of daily high-quality observations, which we deemed sufficient to construct a robust annual reference series to capture pixel seasonal NDVI dynamic.

Configurations for Evaluation and Optimization
In this study, three kinds of Configurations were developed to evaluate the reconstruction performance (Table 2). Firstly, the classical settings of the NF, FET, DOD, and Delta parameters were evaluated for all possible values. Next step, the candidate improvements were evaluated separately and finally a new functionality was added to the current algorithm to optimize parameter settings. Finally, configurations on external data attributes, e.g., temporal windows applied in compositing, QC flag for initial weights setting, and the actual acquisition date of the retained observations in each pixel, were evaluated to identify procedures for the further improvement of reconstruction performance.  . In turn, the frequency is the reciprocal of P(i).
Since atmospheric contamination mainly introduces high frequency noise, a few low frequency components (i.e., NF < 4) were used in previous studies (e.g., [18,30,35]). In this study, NF in the range from 2 (12-month and 6-month components) to 6 was applied, i.e., 2 months was the shortest period/highest frequency component, in the global evaluation of time series reconstruction by HANTS. (c) Fitting error tolerance (FET): The acceptable maximum deviation between raw observations and the result of the reconstruction. In the case of NDVI(t), at each iteration, a negative deviation from the modeled time series larger than FET will be excluded from further iterative processing. The iteration is terminated when all deviations between the remaining valid observations and the fitted model are smaller than the pre-defined FET value. A small FET may erroneously remove some valid observations as outliers while some real outliers cannot be correctly identified with a too large FET. FET is frequently set between 0.05 and 0.1 in the reconstruction of NDVI time series by HANTS [16,18,44,45]. In this study, a FET range from 0.01 to 0.12 with a 0.01 step was applied. (d) Degree of over-determinedness (DoD): A Fourier series including the harmonic components determined by BP and NF is used to model the time series of observations. The coefficients of the modeled series are obtained by solving a linear system of equations, which requires at least 2NF + 1 independent observation, since amplitude and phase value need to be determined for each harmonic component of the series. The observations are inherently accompanied with errors, solutions of the system of equations by using > 2NF + 1 observations may improve the accuracy of estimated amplitude and phase. Such an overdetermined system of equations is best solved using least square method, where more observations give a smaller error of estimate. HANTS is designed to identify and remove outliers iteratively. The DoD is defined as the minimum number of required additional observations, i.e., the minimum difference between the remaining valid observation size and 2NF + 1 [35]. In other words, if the total input observation size is N0, then the removed outliers should not exceed (N0 − (DoD + 2NF + 1)). Therefore, the DoD can be set between 0 and (N0 − (2NF + 1)) and it is a second termination criterion of iterations beside FET. A too large DoD tends, however, to prevent the iteration procedure from detecting possible outliers [16]. The minimum number of valid observations needed to solve a system of equations is 13 when NF = 6. So, the maximum possible DoD for 16-day composited yearly NDVI (N0 = 23) is 10. In this study, we varied DoD between 0 and 12 in steps of 1 to analyze the impact of DoD on model performance. These non-unique solutions may yield large fluctuations in the fitted results, especially in subsequent iterations. The ridge regression method was applied to solve the linear system in the classical HANTS and a regularization factor (i.e., Delta) was used to damp the randomness of the solutions [20]. The Delta is generally set as a small positive value (e.g., 0.1). The impact on model performance was evaluated by varying Delta between 0 and 1 in step of 0.1. (f) HiLo flag: This parameter indicates whether outliers are expected either above or below the fitted model, which depends on the nature of observations [16]. For instance, cloud covered or contaminated targets yield lower NDVI values compared to clear-sky conditions, thus outliers are below the fitted model and are rejected (i.e., HiLo = "Low"). Given the type of observations, HiLo parameter is unambiguously defined and there is no need to evaluate the impact on HANTS performance. (g) Valid range (VR): The valid range of input signals is determined by the nature of the observations and is defined for each data product [16]. For instance, the land surface NDVI generally ranges between 0 (or −0.2 if including water or snow) and 1.0. Observations outside this range can be rejected as outliers directly.
In summary, the NF, FET, DoD, and Delta are the four most critical parameters of classical HANTS controlling the reconstruction performance. The NF, FET, and DoD jointly determine profile fitting and outlier detection of the model, and thus their settings are evaluated jointly. Here the "joint" evaluation means calculating performance metrics over all sites under each possible combination of NF, FET, and DoD settings. For each site, there will be 5 (NF settings) × 12 (FET settings) × 13 (DoD settings) = 780 combinations. Based on the joint evaluation result, the Delta scenarios are further evaluated, after which the optimized parameter settings for global reconstruction using classical HANTS can be derived.

(2) Model improvements
In addition to the procedure to optimize parameter settings of the classical HANTS, several ways to improve global reconstruction performance by adapting the design of HANTS were proposed. Specifically, the proposed improvements and the procedures to evaluate them are described below: (a) Dynamic update of weights. Initially, all input observations are assigned a weight = 1 by HANTS. The algorithm was originally designed to apply varying weights, but it was first implemented with binary weights, i.e., = 1 for valid observations and = 0 for outliers. As explained above, at each iteration, any observation negatively deviating from the current Fourier series by more than the FET are assigned a weight = 0 and excluded in further iterations. The FET setting is set on the basis of user experience and erroneous detection of outlier is unavoidable. The dynamic update of weights was proposed to improve the performance. In practice, the weight w i k of the i-th observation is updated at each k-th iteration taking into account the deviation from the current Fourier series as: where yr k is the vector of estimates by the Fourier series at the k-th iteration. In this case, if the k-th estimates are larger than estimates in the previous iteration, weights are increased. This leads to the estimates in later iterations to approach the upper envelope of NDVI time-series.
Polynomial or inter-annual harmonic components: If HANTS is applied to annual NDVI time series, the estimated Fourier components can only explain variations over periods shorter than a year while real signals may contain trends or inter-annual variations [12]. Earlier studies on Fourier analysis of NDVI time series were focused on investigate inter-annual variation of vegetation regulated by climate using multi-annual data records [30,[46][47][48].
Multi-annual components in the Fourier series or alternatively 3-order polynomial components can be used to capture inter-annual variability.
(3) Impact of input data attributes The impact on reconstruction performance of key-characteristics of input data was evaluated, specifically the time window applied in the compositing, QC information, and the actual date of acquisition of the observation retained in the temporal composite for each time window and each pixel:

Performance Metrics
The ORE was defined as the RMSE between the reconstructed noisy series and reference series [12] and was used as the main matric in this study to quantify the performance of HANTS under different scenarios: where: i = 1, 2, . . . , 445 sites; j = 1, 2, . . . , 100 noisy series; yr i, j noise,k is the k-th estimate of j-th reconstructed noisy NDVI series for the i-th site; yr i re f ,k is the k-th estimate of simulated reference NDVI series for i-th site; N (=365) is the number of samples in each reconstructed time series.
For each site, there were 100 simulated noisy NDVI series, resulting in 100 noisyreference series pairs. Zhou et al. [12] applied the mean ORE over 100 replications as a measure of reconstruction performance at each site under specific scenarios. The standard deviation of ORE over the 100 replications, i.e., Std-ORE i , reflects the stability of model performance under different noise conditions [12]. Smaller mean ORE i and Std-ORE i indicate better performance. The two metrics can be used to rank model configurations independently or can be combined. The configuration yielding the smallest ORE i , however, may not yield the smallest Std-ORE i , i.e., the optimal configuration was identified via a trade-off between ORE i and Std-ORE i [12].
The configurations defined by the settings of NF, FET and DoD were ranked separately in the order of increasing ORE i and Std-ORE i , giving two rankings for each configuration. Then the configuration with the lowest sum of the two rankings was selected as the best setting of NF, FET and DoD for each site. The best setting of NF, FET, and DoD may vary with sites, which leads to "local optimization" [27]. Contrariwise, the global reconstruction of a NDVI dataset would require a unique setting of NF, FET, and DoD. The configuration ranked first at most sites was taken as the "global optimization" [10,27]. For example, if the combination of (NF = 3, FET = 0.05, DoD = 5) gave the smallest ORE i for 200 sites and none other configuration gave this performance for more than 200 sites, the configuration (NF = 3, FET = 0.05, DoD = 5) was applied as global optimization to all sites. The global optimal configuration was the same as the local optimal configuration for some sites, while it was different for other sites, i.e., the overall performance achieved by applying the local optimal configuration settings was higher than that with the global optimal configuration. Different settings of Delta scenarios were evaluated for cases obtained with the best local and global settings of NF, FET and DoD. Zero or small Delta may result in non-unique solutions of the linear system of equations, giving an extremely large ORE for some sites. The normalized ORE, i.e., rORE, was applied to evaluate the impact of Delta settings on reconstruction performance: where ORE min and ORE mean are the minimum and mean ORE values of all Delta settings for a site. ORE delta is the ORE value for a specific Delta setting.

Optimization of Parameter Setting
Optimal settings for NF, FET, and DOD: There are 780 combinations of settings for NF, FET and DoD for each site, from which the optimal setting was identified by applying ORE and Std-ORE criteria (see Section 2.2.4). The local optimization, i.e., in principle with a different best setting for each site, achieved a better reconstruction performance than the global optimal setting in terms of both ORE and Std-ORE (Figure 3). The best setting based on ORE (L1, G1) gave the smallest ORE but a sub-optimal Std-ORE, and the other way around when applying the best ranking based on Std-ORE. The setting based on the trade-off of ORE and Std-ORE, i.e., L3 (G3), gave a performance in between L1 and L2 (G1 and G2).  The local optimal settings (L3) for global sites were shown in Figure 4. To achieve the best reconstruction performance requires 4-5 or even six harmonic components at high latitudes (e.g., >50 °N) ( Figure 4A). In the humid areas at low latitudes, such as the rainforest area, it is better to use two harmonic components ( Figure 4A). Cloud contami-  The local optimal settings (L3) for global sites were shown in Figure 4. To achieve the best reconstruction performance requires 4-5 or even six harmonic components at high latitudes (e.g., >50 °N) ( Figure 4A). In the humid areas at low latitudes, such as the The local optimal settings (L3) for global sites were shown in Figure 4. To achieve the best reconstruction performance requires 4-5 or even six harmonic components at high latitudes (e.g., >50 • N) ( Figure 4A). In the humid areas at low latitudes, such as the rainforest area, it is better to use two harmonic components ( Figure 4A). Cloud contamination of the NDVI observations is more frequent in humid areas due to higher cloud cover, which may result in multiple and adjacent "bad" observations. High frequency harmonic components can capture rapid variations in time series, as the ones due to clouds, and may prevent HANTS from identifying these occurrences as outliers [18]. This results in larger reconstruction errors compared with settings involving only low frequency harmonic components. As regards the FET, the results suggested that a higher value, say 0.05 to 0.09, should be set at higher latitudes in the Northern hemisphere ( Figure 4B). Contrariwise, at lower latitudes, it is better to use a lower FET, i.e., 0.01-0.03. The best DoD setting is rather variable across sites, except in the tropical rainforest areas where DoD should be less than 3 ( Figure 4C).
Interested readers should in a first instance query the optimal parameter settings for the sites sampling a specific area of interest. To present of an overview of the site-specific results, the optimal parameter settings have been stratified by biome (Figure 4) using the land cover classes applied in the MODIS data product described earlier. The aggregated results suggested that four harmonics can result in better reconstruction performance in the DNF, ENF, DBF, SHR, and GCC classes, while two and three harmonics would be sufficient in EBF, SAV, and BCR respectively ( Figure 4A). The optimal FET for DNF, ENF, DBF, and SAV was 0.05 on average, while smaller values were suggested for other biomes ( Figure 4B). The mean DoD across different biomes was ranging from 2 to 7, i.e., with a limited dependence on the biome ( Figure 4C).
The global best setting G3 corresponds to NF = 4, FET = 0.05, and DoD = 5, which gave a much higher ORE and Std-ORE than the L3 setting ( Figure 5). Particularly, the ORE and Std-ORE given by L3 and G3 in the boreal forest area and in the equatorial rainforest area at low latitudes can reach 0.06 and 0.018 respectively, which are much higher than in other areas ( Figure 5).
Optimal settings for Delta: The boxplots of ORE with different Delta settings were similar (not shown). The mean rORE was smaller with zero or small Delta value (<0.4), compared to larger Delta values ( Figure 6). The former settings, however, produced a severely skewed distribution of rORE across global sites, i.e., with more outliers and averages much higher than the upper quantiles ( Figure 6), which suggested an unstable reconstruction. In contrast, less skewed distribution of rORE can be expected with higher Delta values, although a too large Delta may degrade reconstruction performance, i.e., increasing mean ORE globally ( Figure 6). As a trade-off, setting Delta between 0.4 and 0.6 seems the best option that can avoid both outliers at some sites caused by small Delta and lower performance caused by a large Delta. As regards the best settings for different regions, a Delta >0.8 is needed for humid areas such as tropical rainforest regions, since the small seasonality of NDVI can aggravate collinearity of the observations and the solutions of the system of equations. For most of the other sites, a Delta < 0.5 is preferred, which presents a slight biome-dependent pattern (Figure 7).

Impact of Proposed Improvements
The three proposed improvements, i.e., dynamic weighting, three-order polynomial component, and inter-annual harmonic components, gave ORE values comparable to classical HANTS with global optimized settings (G3) for most sites. Moreover, smaller ORE, i.e., significantly better reconstruction performance was obtained for part of the sites as shown in Figure 8. Overall, dynamic weighting had a limited effect at sites where the reconstruction error was low to medium with the global optimal settings of NF, FET, and DoD, while more sites saw a significant improvement with dynamic weighting observed when local optimal settings of NF, FET, and DoD were applied ( Figure 8A). Interested readers should in a first instance query the optimal parameter settings for the sites sampling a specific area of interest. To present of an overview of the site-specific results, the optimal parameter settings have been stratified by biome (Figure 4) using the land cover classes applied in the MODIS data product described earlier. The aggregated results suggested that four harmonics can result in better reconstruction performance in the DNF, ENF, DBF, SHR, and GCC classes, while two and three harmonics would be sufficient in EBF, SAV, and BCR respectively ( Figure 4A). The optimal FET for DNF, ENF, DBF, and SAV was 0.05 on average, while smaller values were suggested for other biomes Sites with historical maximum NDVI less than 0.2 were excluded for evaluation (gray filled cycles). The inset barplots presented the average value of the parameters for specific biome.

Impact of Input Data Attributes
The global performance of reconstruction by HANTS with global optimized settings (G3) but different composite lengths of five, eight, and 16 days produced similar ORE statistics (Figure 9). This suggested that composite length does not have a significant effect on reconstruction performance. Contrariwise, the direct use of daily NDVI series without MVC composition gave a higher reconstruction error. Including QC information in the initial weighting (QC + HANTS) in the reconstruction can improve global performance, especially when using daily NDVI time series. Some studies filtered out low quality NDVI observations using QC information [13,49], while our result suggested that the "QC only" reconstruction gave a much larger ORE than all other configurations when detecting and removing outliers. Applying the actual acquisition date (AAD) resulted in a slight improvement in reconstruction performance only for longer composite windows (i.e., 16-day composition). The limited improvement in reconstruction performance by the AAD treatment over short composite windows may be explained by the insufficiently low frequency harmonics used in HANTS to capture the detail of variation in NDVI caused by different time stamps within composite windows.
Remote Sens. 2021, 13, x FOR PEER REVIEW 14 of 21 ( Figure 4B). The mean DoD across different biomes was ranging from 2 to 7, i.e., with a limited dependence on the biome ( Figure 4C). The global best setting G3 corresponds to NF = 4, FET = 0.05, and DoD = 5, which gave a much higher ORE and Std-ORE than the L3 setting ( Figure 5). Particularly, the ORE and Std-ORE given by L3 and G3 in the boreal forest area and in the equatorial rainforest area at low latitudes can reach 0.06 and 0.018 respectively, which are much higher than in other areas ( Figure 5). Optimal settings for Delta: The boxplots of ORE with different Delta settings were similar (not shown). The mean rORE was smaller with zero or small Delta value (<0.4), compared to larger Delta values ( Figure 6). The former settings, however, produced a severely skewed distribution of rORE across global sites, i.e., with more outliers and averages much higher than the upper quantiles ( Figure 6), which suggested an unstable reconstruction. In contrast, less skewed distribution of rORE can be expected with higher Delta values, although a too large Delta may degrade reconstruction performance, i.e., increasing mean ORE globally ( Figure 6). As a trade-off, setting Delta between 0.4 and 0.6 seems the best option that can avoid both outliers at some sites caused by small Delta and lower performance caused by a large Delta. As regards the best settings for different regions, a Delta >0.8 is needed for humid areas such as tropical rainforest regions, since the small seasonality of NDVI can aggravate collinearity of the observations and the solutions of the system of equations. For most of the other sites, a Delta < 0.5 is preferred, which presents a slight biome-dependent pattern (Figure 7).   The local optimal settings (L3) for global sites were shown in Figure 4. To achieve the best reconstruction performance requires 4-5 or even six harmonic components at high latitudes (e.g., >50 °N) ( Figure 4A). In the humid areas at low latitudes, such as the rainforest area, it is better to use two harmonic components ( Figure 4A). Cloud contami- Figure 6. Normalized ORE, i.e., rORE, obtained for the L3 and G3 settings vs. Delta values. The lower whiskers, lower side of boxes, horizontal line inside boxes, upper side of boxes, and upper whiskers correspond to minimum, lower quartile, median, upper quartile, and maximum values of the samples. The asterisk ("*") and star ("☆") indicate the mean values and outliers for each boxplot.

Figure 7.
Best Delta setting for all sites. The NF, FET, and DoD are according to the G3 setting, i.e., NF = 4, FET = 0.05, and DoD = 6. Sites with historical maximum NDVI less than 0.2 were excluded for evaluation (gray filled cycles).

Impact of Proposed Improvements
The three proposed improvements, i.e., dynamic weighting, three-order polynomial component, and inter-annual harmonic components, gave ORE values comparable to classical HANTS with global optimized settings (G3) for most sites. Moreover, smaller ORE, i.e., significantly better reconstruction performance was obtained for part of the sites as shown in Figure 8. Overall, dynamic weighting had a limited effect at sites where the reconstruction error was low to medium with the global optimal settings of NF, FET, and DoD, while more sites saw a significant improvement with dynamic weighting observed when local optimal settings of NF, FET, and DoD were applied ( Figure 8A).

Impact of Input Data Attributes
The global performance of reconstruction by HANTS with global optimized settings (G3) but different composite lengths of five, eight, and 16 days produced similar ORE statistics ( Figure 9). This suggested that composite length does not have a significant effect on reconstruction performance. Contrariwise, the direct use of daily NDVI series without MVC composition gave a higher reconstruction error. Including QC information in the initial weighting (QC + HANTS) in the reconstruction can improve global performance, especially when using daily NDVI time series. Some studies filtered out low quality NDVI observations using QC information [13,49], while our result suggested that the "QC only" reconstruction gave a much larger ORE than all other configurations when detecting and removing outliers. Applying the actual acquisition date (AAD) resulted in a slight improvement in reconstruction performance only for longer composite windows (i.e., 16day composition). The limited improvement in reconstruction performance by the AAD treatment over short composite windows may be explained by the insufficiently low frequency harmonics used in HANTS to capture the detail of variation in NDVI caused by different time stamps within composite windows.

Impact of Input Data Attributes
The global performance of reconstruction by HANTS with global optimized settings (G3) but different composite lengths of five, eight, and 16 days produced similar ORE statistics ( Figure 9). This suggested that composite length does not have a significant effect on reconstruction performance. Contrariwise, the direct use of daily NDVI series without MVC composition gave a higher reconstruction error. Including QC information in the initial weighting (QC + HANTS) in the reconstruction can improve global performance, especially when using daily NDVI time series. Some studies filtered out low quality NDVI observations using QC information [13,49], while our result suggested that the "QC only" reconstruction gave a much larger ORE than all other configurations when detecting and removing outliers. Applying the actual acquisition date (AAD) resulted in a slight improvement in reconstruction performance only for longer composite windows (i.e., 16day composition). The limited improvement in reconstruction performance by the AAD treatment over short composite windows may be explained by the insufficiently low frequency harmonics used in HANTS to capture the detail of variation in NDVI caused by different time stamps within composite windows. Figure 9. Boxplot of ORE vs. composite length for different attributes of input data; reference setting: HANTS without QC weighting and actual acquisition date of observations; QC+HANTS" using QC of input data to assign the initial weights, i.e., weight = 1 for QC = 1 and weight = 0 for Figure 9. Boxplot of ORE vs. composite length for different attributes of input data; reference setting: HANTS without QC weighting and actual acquisition date of observations; QC+HANTS" using QC of input data to assign the initial weights, i.e., weight = 1 for QC = 1 and weight = 0 for QC = 0; QC Only: filtering outliers with only QC and no iteration in HANTS; AAD: using the actual date of acquisition in the reconstruction. The asterisk ("*") and star (" The local optimal settings (L3) for global sites were shown in Figure 4. To achieve the best reconstruction performance requires 4-5 or even six harmonic components at high latitudes (e.g., >50 °N) ( Figure 4A). In the humid areas at low latitudes, such as the 4. Discussion

The Improved Harmonic ANalysis of Time Series (iHANTS)
The improved harmonic analysis of time series (iHANTS) proposed in this study can reconstruct global NDVI datasets with improved performance on the basis of the systematic evaluation of the impacts of a broad set of parameters. Specifically, the systematic evaluation of configurations led to the following suggestions: (1) The critical internal parameters, i.e., NF, FET, DoD and Delta can be set on the basis of either a "local" or "global" ranking for global reconstruction. For regional applications, users can refer to Figures 3 and 6 for best local settings. The best global settings, based on the trade-off between ORE and Std-ORE, are: NF = 4, FET = 0.05, DOD = 5, and Delta = 0.5, which is the default setting scheme used by [18] to evaluate the performance of global NDVI reconstruction. (2) Reconstruction performance can be significantly improved in specific regions of the Earth by using dynamic weighting instead of the classical rigid weighting scheme and by adding 3-order polynomial or inter-annual harmonic components to account for inter-annual variability. Dynamic weighting and 3-order polynomial components require a revised implementation of HANTS. (3) Global reconstruction performance can be improved by using QC information of the dataset to set initial weight and applying the actual date of acquisition for each input observation. Most freely accessible implementations of HANTS do not support custom setting of initial weights and input timestamps. Thus, a revised implementation of the model is needed again.
Several freely available NDVI products are generated using the radiometric data acquired by different sensors, such as AVHRR, MODIS, and SPOT-VEGETATION, and may be provided with different composition lengths [38]. The results of this study suggest that the composition length has a limited impact on global reconstruction performance. This may be due to two opposite impacts. On the one hand, a shorter composition length captures rapid variations in NDVI, and on the other hand, it also collects more low-quality observations, which impacts reconstruction performance in opposite directions [14,15]. In other words, no improvement can be achieved by using a different composition length. Of course, if observations are acquired at higher frequency, e.g., by combining radiometric data acquired by both Terra/MODIS and Aqua/MODIS, performance can be improved, since a higher frequency in data acquisition can increase the probability of high-quality observations [17].
Although the gap-filling performance of HANTS is improved by the optimized parameter settings, there remain limitations that should be highlighted. Firstly, parameter settings were optimized using the annual reference NDVI series constructed by Zhou et al. [12]. This implies that the study assumed that the intra-annual variability captured by the annual reference series dominate pixel NDVI dynamics. For pixels with large inter-annual variability or land-cover change developing over even longer periods of time, the real NDVI dynamics may be poorly represented by the annual reference series. In these cases, the optimized parameter settings should be interpreted with caution [12]. Secondly, the optimization procedure applied to synthetic noisy time series constructed using the daily observations acquired by both Terra and Aqua MODIS sensors. This characterization of noisy conditions is not directly applicable to data acquired by satellites with a lower revisit frequency, such as Landsat-8 and Sentinel-2A/B. Lastly, we used the absolute deviation between reference and reconstructed series (i.e., ORE) as a performance metric, which may not be the most appropriate to characterize the accuracy if capturing phenological features. The latter should be addressed in a follow-up study.

Applying Optimaized HANTS to Other Terrestrial Remote Sensing Variables
Our evaluation of HANTS configurations has been based on NDVI time series, but some findings are applicable to time series of other observables, such as leaf area index (LAI) [50] and land surface temperature (LST) [24,51]. The LAI signal is similar to NDVI in terms of phenology curve, as well as the direction of outliers by clouds effect. It is thus reasonable to assume that the best settings of NF, DoD, and Delta would also improve the performance in the reconstruction of LAI time-series. With regard to the FET, the best global setting is 0.05 for NDVI, i.e., 5% of the valid dynamic range of NDVI (0-1). Taking into account that the valid dynamic range of LAI is from 0 to 10, the FET can be set at about 0.5 (10 × 5%) for the reconstruction of LAI time-series. For LST products, most previous studies [24,45,51] applied small NF (NF = 2 or 3) in HANTS processing, while the Xu et al. [52] reported the optimal NF should be set from 7 to 9 to reconstruct LST over Yangtze River Delta. The appropriate NF settings for global reconstruction still need to be carefully evaluated in future. The FET should also be adapted on the basis of the dynamic range of LST.
Fourier-based gap-filling methods fit time series of available observations with predefined harmonic components but differ in the way potentially contaminated observations are identified and replaced to generate time series of cloud-free images. Besides HANTS, another example is the method proposed by Zhu et al. [29]. Both methods require a threshold to detect outliers. Zhu et al. [29] also use QC flags on clouds, cloud cover, and snow, while other settings were described less precisely and may need to be adjusted depending on the area (biome) observed. We may safely assume, therefore, that optimization towards biome-specific parameter settings is likely to benefit the performance of Fourier-based gap-filling methods.

Other Application Topics for the HANTS
HANTS has been widely applied to analyze long time series of terrestrial remote sensing observables for almost 30 years, either for gap-filling or extracting accurate harmonic components for further analysis [34,44,46,47]. The algorithm has been implemented using different programming languages and platforms, such as Fortran, ENVI/IDL, Matlab, Python, R, etc. (see Table 1). In 2014, the new release of IDL (8.4 version) included HANTS as an official function ( Table 1). Most of these implementations only implement the core of HANTS for single series processing and are commonly used for small scale processing by researchers [18,51]. When applying the method to long time series of images for a large region or at high temporal and spatial resolution, the computational constraints of personal computers (PC) may limit its usage. One needs to consider parallelizing computer processing to make full use of computational resources, such as a multiple-core CPU of the PC or even implementing the method on super computers or clusters [12]. Moreover, cloud-based geo-computation platforms such as the Google Earth Engine (GEE) platform provide a data warehouse of popularly and freely available remote sensing datasets and rich cloud computing resources for earth observation users [2]. Users can design and implement applications without spending a lot of time to download and process large volumes of data on local PCs [5]. We have implemented HANTS on the GEE platform, which can process various remote sensing datasets on request and very efficiently. For instance, to reconstruct one year global 16-day NDVI product with 0.05-degree spatial resolution (MOD13C1), HANTS on the GEE platform only takes 10 min on average, while it may take more than two hours on a local PC without parallelization, not to speak of the time spent on downloading data. The Javascript code for the HANTS implementation on GEE is available by personal request to the senior author of this paper. The full GEE version of HANTS will be published soon.
The amplitudes and phases of the periodic components of NDVI signals are quantitative phenological metrics of vegetation vigor across timescales and have been frequently applied in land cover change detection or vegetation-climate interaction analysis [34,[46][47][48]. The main purpose of HANTS is the quantitative analysis of observable signals in the frequency domain. To this end, efficient and accurate reconstruction and gap-filling are necessary and that is the service that globally improved HANTS can provide. The performance of any time series reconstruction method is fundamentally dependent on the quality of raw observations, which means that one cannot expect to perfectly recover clear-sky NDVI signals for pixels where most observations are contaminated by clouds.

Conclusions
HANTS has been one of the most widely used time series reconstruction methods in the remote sensing community. Sant attention, however, has been paid to investigating the optimal parameter settings under different conditions and non-expert users have to identify suitable settings by a lengthy, subjective trial and error process, which impedes the method from being applied in a larger community. This study systematically evaluated and quantified the impacts of HANTS configurations on global NDVI reconstruction performance and proposed best settings for each configuration. The evaluation was performed by generating pixel-wise reference and noisy NDVI time series using long-term historical observations from MODIS. The results suggested both local and global optimal settings of critical parameters of the model, i.e., NF, FET, DoD, and Delta. To facilitate the non-expert users of the model, the local optimal settings for global sites have been listed in Supplementary Materials, Table S1. The dynamic weighting scheme, inter-annual harmonic and 3-order polynomial components can be used to improve global reconstruction performance by updating the implementation of classical HANTS. In addition, by including attributes of input NDVI data, such as data quality flag (QC) and the actual acquisition date of each observation retained in the temporal composites, the performance of global reconstruction can be further improved. Future users can refer to the settings described in this study towards the better performance of HANTS for the regional or global reconstruction of time-series of bio-geophysical remote sensing observables.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/rs13214251/s1, Table S1: A CSV file listing the local optimal settings for all BELMANIP2 sites was provided as a supplementary document.