How Does Scale E ﬀ ect Inﬂuence Spring Vegetation Phenology Estimated from Satellite-Derived Vegetation Indexes?

: As an important land-surface parameter, vegetation phenology has been estimated from observations by various satellite-borne sensors with substantially di ﬀ erent spatial resolutions, ranging from tens of meters to several kilometers. The inconsistency of satellite-derived phenological metrics (e.g., green-up date, GUD , also known as the land-surface spring phenology) among di ﬀ erent spatial resolutions, which is referred to as the “scale e ﬀ ect” on GUD , has been recognized in previous studies, but it still needs further e ﬀ orts to explore the cause of the scale e ﬀ ect on GUD and to quantify the scale e ﬀ ect mechanistically. To address these issues, we performed mathematical analyses and designed up-scaling experiments. We found that the scale e ﬀ ect on GUD is not only related to the heterogeneity of GUD among ﬁne pixels within a coarse pixel, but it is also greatly a ﬀ ected by the covariation between the GUD and vegetation growth speed of ﬁne pixels. GUD of a coarse pixel tends to be closer to that of ﬁne pixels with earlier green-up and higher vegetation growth speed. Therefore, GUD of the coarse pixel is earlier than the average of GUD of ﬁne pixels, if the growth speed is a constant. However, GUD of the coarse pixel could be later than the average from ﬁne pixels, depending on the proportion of ﬁne pixels with later GUD and higher growth speed. Based on those mechanisms, we proposed a model that accounted for the e ﬀ ects of heterogeneity of GUD and its co-variation with growth speed, which explained about 60% of the scale e ﬀ ect, suggesting that the model can help convert GUD estimated at di ﬀ erent spatial scales. Our study provides new mechanistic explanations of the scale e ﬀ ect on GUD .


Defining GUD in VI Time-Series
We determined GUD by using Zhang's logistic method [13], because it has been employed in MODIS land-surface phenology products [2]. To be exact, we first used the four-parameter logistic function to fit the VI time-series during a rising period, which is from the start of a year to the date when the annual maximum VI occurs: where t is the day of year (DOY), and VI fit (t) is the fitted VI value at t. The parameter c represents the difference between annual maximum and minimum VI values, and d is the background VI value. The parameter a controls the translation of the time-series and parameter b is associated with the rate of VI increase. Then, from the fitted logistic curve of Equation (1), the curvature (K) of VI fit (t) can be calculated as: Thus, the GUD is defined as the date when the rate of change of the K reaches its first local maximum value. Note that VI fit (t) and VI fit (t) are the first and second order derivatives of VI fit (t) with respect to t, K is thus independent of the constant addition item d (the background values of VI times-series). As a result, GUD is determined by the parameters a, b, and c.

Defining the Scale Effect of VI on GUD Estimation
The GUD at a coarse pixel (GUD coarse ) was directly estimated from the VI time-series of the coarse pixel. To compare GUD coarse with the GUD of fine pixels (GUD fine ) at a same footprint, we aggregated GUD fine covered by the coarse pixel through averaged approach (calculating the unweighted average of GUD fine ) as suggested by Peng et al. [7]. The result was noted as GUD fine−ave . Thus, we evaluated the scale effect of GUD using the difference between GUD coarse and GUD fine−ave , expressed as: Given that GUD is estimated from the logistic function, we can infer that GUD coarse is controlled by its own fitting parameters a coarse , b coarse , and c coarse , and the GUD fine−ave is determined by the fitting parameters (i.e., a fine , b fine , and c fine ) of fine VI time-series. Then, we used the linear mixing model of VI times series to link the parameters at different scale, that is, the VI at a coarse pixel (VI coarse ) is assumed to be a linear mixture of the total number of VIs at fine pixels within the coarse pixel, expressed as: where n is the number of fine pixels within the coarse pixel. Although the normalized difference vegetation index (NDVI) value of a coarse pixel tends to be slightly lower than the average of NDVI of corresponding fine pixels [14], it only introduces small errors when modeling the coarse VI time-series (i.e., EVI time-series) from fine pixels by a linear mixture model [6,15,16]. Based on Equation (4), Remote Sens. 2019, 11,2137 4 of 20 parameters a coarse , b coarse , and c coarse of a coarse VI time-series are closely related to the corresponding fine VI time-series and their parameters (i.e., a fine , b fine , and c fine ). Thus, the bias value in Equation (3) is a function (f ) of the controlling parameters of fine pixels, expressed as:

Factors Influencing the Scale Effect on GUD Estimation
To clarify the key factors influencing the scale effect of GUD estimation, we first investigated the two-pixel mixture case. As suggested in Section 2.2, the scale effect can be expressed as: Bias = f (a fine,1 , b fine,1 , c fine,1 , a fine,2 , b fine,2 , c fine,2 ) (6) Considering that the parameters a, b and c are less intuitive, we replaced them by using GUD, length of the period (MP) from GUD to maturity date (MD) and greenness change (GC). Here, the maturity date was calculated as the date when the rate of change of the curvature of a logistic function fitted by VI time series reaches the second local maximum value from the start of a year [13]. The GC is defined as the difference between minimum VI before GUD and annual maximum VI (See Figure 1). According to Shang et al. [17], the parameters a, b, and c can be expressed by GUD, MP, and GC, as (details are given in Appendix A): The meanings of the parameters GUD, MP and GC are easy to understand. The MP and GC represent the growth speed of VI, defined as the amount of increasing in VI per unit time, at the horizontal axis (time required) and vertical axis (the range of VI). Therefore, the Bias in the left of Equation (1) can be represented as a function of GUD, MP and GC, and Equation (6) can be rewritten as: Previous studies have suggested that the scale effect is mainly caused by the difference among fine pixels within a coarse pixel [5,6,18]. Based on Equation (8), we therefore assume that the bias is mainly influenced by the difference in the parameters between two fine pixels (i.e., ∆GUD fine , ∆MP fine , and ∆GC fine ). Thus, Equation (8) becomes: The three factors (∆GUD fine , ∆MP fine and ∆GC fine ) are considered as the biophysical factors that influence the GUD at coarse pixels, which is illustrated in Figure 1. mainly influenced by the difference in the parameters between two fine pixels (i.e., ∆ , ∆ , and ∆ ). Thus, Equation (8) becomes: The three factors (∆ , ∆ and ∆ ) are considered as the biophysical factors that influence the GUD at coarse pixels, which is illustrated in Figure 1.   (9) We divided the function f in Equation (9) into two parts: (1) the bias caused only by ∆GUD fine , and (2) the bias caused by heterogeneity of growth speed at a given ∆GUD fine , expressed as:

Form of Function f in Equation
For simplicity, we assume that the effects of ∆MP fine and ∆GC fine on Bias are independent of each other, so Equation (10) can be rewritten as: To be strict, the effects of ∆MP fine and ∆GC fine may not be independent and thus there may be interactions between ∆MP fine and ∆GC fine at a certain ∆GUD fine . However, incorporating the interaction effect could make the scale-effect model more complicated and difficult to parameterize. We thus employed the three main items in Equation (11) and investigated the performance of the model in explaining the scale effect.
We explored the possible function forms of each item in Equation (11) by conducting simulation experiments for three scenarios. In scenario I, we simulated two VI curves for the fine pixel 1 and pixel 2 with different GUD but identical MP and GC. We assumed that GUD for pixel 1 was fixed DOY 110, whereas the GUD of pixel 2 varied between DOY 90 and 130 (the GUD difference changes from −20 to 20 days), which was used to analyze variation in ∆GUD fine (Figure 2a). This scenario shows that bias ∆GUD fine values varies with ∆GUD fine in a quadratic function (Figure 2d): where c 1 is the fitting parameter. Negative bias in Figure 2d suggests an advanced GUD coarse caused by the scale effect, indicating that ∆GUD fine between fine pixels will lead to an earlier GUD coarse .
where is the fitting parameter. Negative bias in Figure 2d suggests an advanced caused by the scale effect, indicating that ∆ between fine pixels will lead to an earlier . Figure 2. Results of the simulation experiments (three scenarios) exploring the function forms of three variables in Equation (11). (a-c) VI Curves 1 (blue) and 2 (red to yellow) in Scenarios I, II, and III. (d-f) bias ∆GUD fine , bias ∆MP fine |∆GUD fine , and bias ∆GC fine |∆GUD fine change with the corresponding factors.
The negative values of ∆MP fine and positive ∆GC fine indicate faster growth speed of pixel 1 than that of pixel 2. The different color of point refers to different levels of ∆GUD fine (i.e., −20, −10, 0,10 and 20 days) and the negative ∆GUD fine values indicate earlier green-up date (GUD) of pixel 1 than that of pixel 2. The dashed lines represent the proposed function forms (Equations (12) and (13)) used to fit these bias values. (g-i) Fitting performance of each function form, respectively.
In scenario II, we investigated the effect of ∆MP fine on bias (bias ∆MP fine |∆GUD fine ) at different levels of ∆GUD fine (Figure 2b). The MP in pixel 1 was fixed at 75 days, whereas that of pixel 2 changed from 55 to 95 days to simulate different ∆MP fine values (−20 to 20 days). In addition, we set the GUD of pixel 1 from DOY 90 to 130 and fixed the GUD in pixel 2 at DOY 110 to simulate different levels of ∆GUD fine (−20, −10, 0, 10, 20 days). As shown in Figure 2e, at a given level of ∆GUD fine , the bias caused by ∆MP fine approximately follows a linear function. In addition, a greater effect of ∆MP fine on bias is observed when ∆GUD fine is larger (Figure 2h). Scenario III (Figure 2c) was the same scenario II but for ∆GC fine . Similarly, we obtained linear functions between the bias and ∆GC fine (Figure 2f). Based on these observations, we used the following equations to describe their effect in scenarios II and II: where c 2 and c 3 are the fitting parameters. The linear fitting using Equation (13) achieved R 2 values of 0.941 and 0.969 for these variables (Figure 2h,i); the former (∆GUD fine × ∆MP fine ) showed a negative slope and the latter (∆GUD fine × ∆GC fine ) had a positive slope. Note that a smaller MP and a greater GC both indicate a greater growth speed. The form of Equation (13) suggests that the synchronous change of GUD and growth speed (i.e., positive sign of ∆GUD fine × ∆GC fine ) can delay GUD coarse and vice versa.
In summary, given that the GUD is derived from the VI curve in the logistic-form, results of the simulation experiments suggest that heterogeneity of ∆GUD fine leads to an earlier GUD coarse , and this advance in GUD coarse is further adjusted by the relationship between ∆GUD fine and ∆MP fine or ∆GC fine , which can enhance or offset the negative bias in GUD coarse depending on the reversal and synchronous changes between the GUD and growth speed. Further, it can be deduced that the detected GUD of a coarse pixel is closer to the fine pixel with an earlier GUD and a faster growth speed (shorter MP and greater GC).
By combining Equations (11)-(13), we achieved the final model for estimating the scale effect of GUD for the two-endmember case: Following the function form, we generalized Equation (14) for the multi-endmember case by using variance and covariance (i.e., var GUD , cov MP,GUD , and cov GC,GUD instead of ∆GUD 2 fine , ∆GUD fine × ∆MP fine , and ∆GUD fine × ∆GC fine , respectively): Next, we describe a series of experiments designed to test Equations (14) and (15).

Phenology Observations
We collected some typical vegetation greenness annual curves from eight phenological camera observation sites (Table 1) provided by the PhenoCam website [19] (https://phenocam.sr.unh.edu/ webcam/). PhenoCam data were used because digital photographs at the ground were cloud-free and were taken repeatedly at a high temporal frequency (e.g., 0.5 h). We calculated the green chromatic coordinate (GCC) and used it as the indicator of land-surface greenness, according to previous studies [20][21][22]: where R, G, and B represent digital numbers at the red, green, and blue bands, respectively. We generated the time-series of GCC to represent the annual greenness change of vegetation at each site ( Figure 3). greenness, according to previous studies [20][21][22]: where R, G, and B represent digital numbers at the red, green, and blue bands, respectively. We generated the time-series of GCC to represent the annual greenness change of vegetation at each site ( Figure 3).  Table 1. We used the PhenoCam data to test the two-endmember model in Equation (14). For this, we repeatedly chose two curves from the eight GCC curves ( Figure 3) at each time, which generated a total of 28 combinations. In each combination, we first mixed the two curves ( ) with equal proportions to simulate the GCC curve of a coarse pixel ( ) according to Equation (4). We then calculated ∆ , ∆ , and ∆ between the two curves. The bias, as an indicator of the scale effect (Equation (3)), was calculated as the difference between the GUD of and the average of GUD of the two curves. The model includes three influential  Table 1.

Testing the Two-Endmember Model
We used the PhenoCam data to test the two-endmember model in Equation (14). For this, we repeatedly chose two curves from the eight GCC curves ( Figure 3) at each time, which generated a total of 28 combinations. In each combination, we first mixed the two curves (GCC fine ) with equal proportions to simulate the GCC curve of a coarse pixel (GCC coarse ) according to Equation (4). We then calculated ∆GUD fine , ∆MP fine , and ∆GC fine between the two GCC fine curves. The bias, as an indicator of the scale effect (Equation (3)), was calculated as the difference between the GUD of GCC coarse and the average of GUD of the two GCC fine curves. The model includes three influential factors (∆GUD fine , ∆MP fine , and ∆GC fine ), so for comparison, we also performed linear fitting at each time by only employing one factor. Finally, the bias values in all 28 combinations were fitted by the proposed two-endmember model and the one factor linear model, respectively. Because different equations may have different numbers of parameters, we calculated the adjusted R 2 and root-mean-square error (RMSE) for evaluations.

Testing the Multi-Endmember Model Based on SIMMAP Simulated Data
We tested the multi-endmember model by using the data simulated by the SIMMAP (simulación de mapas) software [23]. The software can generate landscape spatial patterns that contain various categories of different degrees of landscape fragmentation based on the modified random cluster method. The degrees of landscape fragmentation were control by parameter p, which was defined as the initial probability that a pixel was marked. The larger the p parameter, the smaller the number of patches and the more homogeneous the surface (as shown in Figure 4). More detailed descriptions and download links of SIMMAP can be found on the website: http://www2.montes.upm.es/personales/ saura/software.html.
We first generated three images (1000 × 1000 pixels) as fine resolution images with different degrees of landscape fragmentation, controlled by the parameter p in SIMMAP (p = 0.45, 0.50, 0.55; Figure 4). In each image, we included the eight vegetation types with equal proportion, and for each type a PhenoCam GCC curve (Table 1) was assigned. We then scaled up the three fine resolution images to coarser with aggregation rates of 3 × 3, 4 × 4, . . . , 20 × 20 pixels (Figure 5a,b) based on Equation (4). GUD coarse , corresponding GUD fine−ave , and three model parameters (i.e., the variance and covariance) were then calculated from coarse and fine resolution images (Figure 5c,d). The bias values were thus calculated as the difference of GUD coarse and GUD fine−ave . Finally, we tested our multi-endmember model on bias at each scale respectively. Note that we randomly split all of the pixels into two-thirds for training (determining the three coefficients in Equation (15)) and one-third for validation.
We first generated three images (1000 × 1000 pixels) as fine resolution images with different degrees of landscape fragmentation, controlled by the parameter p in SIMMAP (p = 0.45, 0.50, 0.55; Figure 4). In each image, we included the eight vegetation types with equal proportion, and for each type a PhenoCam GCC curve (Table 1) was assigned. We then scaled up the three fine resolution images to coarser with aggregation rates of 3 × 3, 4 × 4, …, 20 × 20 pixels (Figure 5a,b) based on Equation (4).
, corresponding , and three model parameters (i.e., the variance and covariance) were then calculated from coarse and fine resolution images (Figure 5c,d). The bias values were thus calculated as the difference of and . Finally, we tested our multi-endmember model on bias at each scale respectively. Note that we randomly split all of the pixels into two-thirds for training (determining the three coefficients in Equation (15)) and one-third for validation.

Testing the Multi-Endmember Model Based on Landsat-MODIS Fused EVI2 Data
We further employed the same satellite data used by Zhang et al. [5] to test the multi-endmember model. The image ( Figure 6) was estimated form Landsat-MODIS fused EVI2 (a two-band EVI [24]) in 2014 with a spatial resolution of 30 m and a frequency of 3 days. The study area was in central Iowa, USA, and mainly includes nine land-cover types: corn, soybean, hay, other crops, grass, forests, shrubs, non-vegetated areas, and open water/wetlands. More

Testing the Multi-Endmember Model Based on Landsat-MODIS Fused EVI2 Data
We further employed the same satellite data used by Zhang et al. [5] to test the multi-endmember model. The GUD fine image ( Figure 6) was estimated form Landsat-MODIS fused EVI2 (a two-band EVI [24]) in 2014 with a spatial resolution of 30 m and a frequency of 3 days. The study area was in central Iowa, USA, and mainly includes nine land-cover types: corn, soybean, hay, other crops, grass, forests, shrubs, non-vegetated areas, and open water/wetlands. More detailed information regarding the data was reported by Zhang et al. [5]. Likewise, we simulated coarser satellite images by up-scaling the EVI2 data from 30 m to 150, 210, 270, 330, 390, and 450 m, which corresponds to aggregation rates of 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15, respectively. During this process, we excluded some pixels with missing observations or inaccurate phenology estimations, as suggested by Zhang et al. [5]. Up-scaling operations can avoid the errors caused by geometric registration, atmospheric effect, and sensor differences, which allowed us to focus on the scale effect rather than other factors. We again randomly selected two-thirds of the pixels as training data and the remaining pixels were used for validation.

Performance of the Two-Endmember Model
Performance of the two-endmember scale-effect model on 28 mixed GCC curves is shown in Figure 7. It can be seen that the two-endmember model achieved good performance (R 2 adj = 0.826, RMSE = 5.53 days), suggesting its ability to account for the scale effect (Figure 7a). Linear models with only one factor performed poorly, with R 2 adj < 0.2 and RMSE > 12 days (Figure 7b-d). These investigations emphasized that integrating the three influential factors with a proper function form is very important for explaining the scale effect. Likewise, we simulated coarser satellite images by up-scaling the EVI2 data from 30 m to 150, 210, 270, 330, 390, and 450 m, which corresponds to aggregation rates of 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15, respectively. During this process, we excluded some pixels with missing observations or inaccurate phenology estimations, as suggested by Zhang et al. [5]. Up-scaling operations can avoid the errors caused by geometric registration, atmospheric effect, and sensor differences, which allowed us to focus on the scale effect rather than other factors. We again randomly selected two-thirds of the pixels as training data and the remaining pixels were used for validation.

Performance of the Two-Endmember Model
Performance of the two-endmember scale-effect model on 28 mixed GCC curves is shown in Figure 7. It can be seen that the two-endmember model achieved good performance (R 2 adj = 0.826, RMSE = 5.53 days), suggesting its ability to account for the scale effect (Figure 7a). Linear models with only one factor performed poorly, with R 2 adj < 0.2 and RMSE > 12 days (Figure 7b-d). These investigations emphasized that integrating the three influential factors with a proper function form is very important for explaining the scale effect.  (14)) and (b-d) linearly fitting using only one of the three influential factors. Note that the bias is unrelated to the sign of the factor in three linear factor models, so we used the absolute values of the factors in (b-d) to not account for their signs. ** p < 0.01, * p < 0.05.

Performance of Multi-Endmember Model Based on SIMMAP Simulated Data
We showed the changing patterns of the average of bias (bıas ) values at different scales for the training and validation datasets, respectively (Figure 8a,d). The average of bias (negative values) decreases as scale becoming coarser under all conditions of landscape fragmentation (p = 0.45, 0.50, and 0.55), suggesting a greater scale effect for larger differences of spatial resolution between images. Moreover, we found larger absolute values of bıas in the image with higher landscape fragmentation (i.e., lower p values), which suggests that the scale effect is more obvious in heterogeneous areas. We used the proposed multi-endmember model to fit the bias values and achieved > 0.6 for all landscapes, both for training and validation data ( Figure 8).  (14)) and (b-d) linearly fitting using only one of the three influential factors. Note that the bias is unrelated to the sign of the factor in three linear factor models, so we used the absolute values of the factors in (b-d) to not account for their signs. ** p < 0.01, * p < 0.05.

Performance of Multi-Endmember Model Based on SIMMAP Simulated Data
We showed the changing patterns of the average of bias (bias) values at different scales for the training and validation datasets, respectively (Figure 8a,d). The average of bias (negative values) decreases as scale becoming coarser under all conditions of landscape fragmentation (p = 0.45, 0.50, and 0.55), suggesting a greater scale effect for larger differences of spatial resolution between images. Moreover, we found larger absolute values of bias in the image with higher landscape fragmentation (i.e., lower p values), which suggests that the scale effect is more obvious in heterogeneous areas. We used the proposed multi-endmember model to fit the bias values and achieved R 2 > 0.6 for all landscapes, both for training and validation data (Figure 8).

Performance of Multi-Endmember Model Based on MODIS-Landsat Fused EVI2 Data
We presented the average of bias values at various scales of Landsat-MODIS fused EVI2 images in Figure 9a,d. We found a similar decreasing pattern at the larger scale, as that in Figure 8, which confirms a more considerable scale effect when the spatial resolution became coarser. We fitted these bias values by using our multi-endmember model and achieved the performance with > 0.56 and RMSE < 2.3 days. Taking 450-m resolution as an example, we showed the scatter plot (1:1 line) of bias and predicted bias in Figure 10. An > 0.70 in the validation dataset was achieved by the proposed model. Furthermore, in addition to the negative bias values, the model can also account for the positive bias values (Figure 10), suggesting that the new model can explain the delayed , which was not well understood in previous studies [6].

Performance of Multi-Endmember Model Based on MODIS-Landsat Fused EVI2 Data
We presented the average of bias values at various scales of Landsat-MODIS fused EVI2 images in Figure 9a,d. We found a similar decreasing pattern at the larger scale, as that in Figure 8, which confirms a more considerable scale effect when the spatial resolution became coarser. We fitted these bias values by using our multi-endmember model and achieved the performance with R 2 > 0.56 and RMSE < 2.3 days. Taking 450-m resolution as an example, we showed the scatter plot (1:1 line) of bias and predicted bias in Figure 10. An R 2 > 0.70 in the validation dataset was achieved by the proposed model. Furthermore, in addition to the negative bias values, the model can also account for the positive bias values (Figure 10), suggesting that the new model can explain the delayed GUD coarse , which was not well understood in previous studies [6].

Conceptual Explanations of the Scale Effect
Previous studies attributed the scale effect of detected GUD to the heterogeneity of land cover or GUD [5,6]. However, in this study, we found that the scale effect is explained by the heterogeneities both of GUD and vegetation growth speed during spring. We further incorporated these factors into the scale-effect model (Equation (15)) that we developed. The good performance of the model in a series of experiments provided the rationale for clarifying these influential factors.
Here, we further demonstrated in detail why the scale effect is influenced by these three factors and

Conceptual Explanations of the Scale Effect
Previous studies attributed the scale effect of detected GUD to the heterogeneity of land cover or GUD [5,6]. However, in this study, we found that the scale effect is explained by the heterogeneities both of GUD and vegetation growth speed during spring. We further incorporated these factors into the scale-effect model (Equation (15)) that we developed. The good performance of the model in a series of experiments provided the rationale for clarifying these influential factors. Here, we further demonstrated in detail why the scale effect is influenced by these three factors and

Conceptual Explanations of the Scale Effect
Previous studies attributed the scale effect of detected GUD to the heterogeneity of land cover or GUD [5,6]. However, in this study, we found that the scale effect is explained by the heterogeneities both of GUD and vegetation growth speed during spring. We further incorporated these factors into the scale-effect model (Equation (15)) that we developed. The good performance of the model in a series of experiments provided the rationale for clarifying these influential factors. Here, we further demonstrated in detail why the scale effect is influenced by these three factors and how they can be used to explain some phenomena of caused by scale effect (i.e., the scale effect increases with up-scaling and GUD coarse maybe later than GUD fine−ave in some areas).
First, we graphically explain why the detected GUD at a coarse pixel is earlier than the average of GUDs of all fine pixels within it, a phenomenon reported recently [5], and further why this tendency is stronger with greater heterogeneity of GUD. We simulated two coarse pixels (pixels A and B in Figure 11a,b) by the linear spectral mixture model, in which the average GUDs of fine pixels are identical and all the VI curves of corresponding fine pixels have exactly the same shape (no difference in GC and MP) but have differences in GUD and their proportions. The heterogeneity of GUD for pixel B is larger than that for pixel A (bottom-right insets of Figure 11a,b). It can be observed that the VI curve for pixel B with greater GUD heterogeneity obviously increases earlier in spring than pixel A, and thus, pixel B has an earlier GUD (Figure 11c). Such an advance (negative bias) is mainly because the larger number of fine pixels with earlier GUD in pixel B makes a greater contribution to VI values of the mixed coarse pixel in spring, by the linear spectral mixture model. This simulation experiment highlights that the spectral mixture process in VI can linearly propagate the heterogeneity of GUD from the fine scale to the coarse scale, leading to a significant scale effect. how they can be used to explain some phenomena of caused by scale effect (i.e., the scale effect increases with up-scaling and maybe later than in some areas). First, we graphically explain why the detected GUD at a coarse pixel is earlier than the average of GUDs of all fine pixels within it, a phenomenon reported recently [5], and further why this tendency is stronger with greater heterogeneity of GUD. We simulated two coarse pixels (pixels A and B in Figure 11a,b) by the linear spectral mixture model, in which the average GUDs of fine pixels are identical and all the VI curves of corresponding fine pixels have exactly the same shape (no difference in GC and MP) but have differences in GUD and their proportions. The heterogeneity of GUD for pixel B is larger than that for pixel A (bottom-right insets of Figure 11a,b). It can be observed that the VI curve for pixel B with greater GUD heterogeneity obviously increases earlier in spring than pixel A, and thus, pixel B has an earlier GUD (Figure 11c). Such an advance (negative bias) is mainly because the larger number of fine pixels with earlier GUD in pixel B makes a greater contribution to VI values of the mixed coarse pixel in spring, by the linear spectral mixture model. This simulation experiment highlights that the spectral mixture process in VI can linearly propagate the heterogeneity of GUD from the fine scale to the coarse scale, leading to a significant scale effect. Figure 11. A simulation experiment is designed to help understand the influence of heterogeneity of GUD on scale effect. (a,b) Coarse pixels A and B consist of five fine VI curves with five different GUD values but the same VI curve shape. However, the shape of the coarse VI curves is different from the shape of fine VI curves. The heterogeneity of GUD is larger in pixel B than that in pixel A, as shown by the histograms of fine pixel GUD in the bottom-right; (c) The VI curves for coarse pixels A and B derived from the linear spectral mixture model, respectively. , is earlier than , and they are both earlier than (the vertical dashed line in the right of , ).
Next, we demonstrated why heterogeneity of MP and GC under the condition of GUD heterogeneity can also influence the scale effect. We simulated three coarse pixels (pixels A, B, and C; black lines in Figure 12a, 12b and 12c, respectively) consisting of two fine pixels with different GUDs (blue and orange curves). The three coarse pixels have identical GUDs of corresponding fine pixels, but the later green-up fine pixel (orange line) in pixels B and C has smaller MP or greater GC. It is clear that the fine pixels with smaller MP or greater GC have greater amplitude of (orange lines in Figure 12e,f). Consequently, the fine pixel with greater values caused by greater growth speed (smaller MP or greater GC) contributes more in the mixing process, making the GUD of pixel B or C later than that of pixel A and closer to that of the later fine pixel with faster growth speed and greater value (Figure 12b,d). Combining Figures 11 and 12, it is clear that the heterogeneity of GUD, MP, and GC leads to scale effect from the fine to coarse scales through the linear mixture process both in VI and , meaning that the larger the heterogeneity of the three factors, the more significant the scale effect. Figure 11. A simulation experiment is designed to help understand the influence of heterogeneity of GUD on scale effect. (a,b) Coarse pixels A and B consist of five fine VI curves with five different GUD values but the same VI curve shape. However, the shape of the coarse VI curves is different from the shape of fine VI curves. The heterogeneity of GUD is larger in pixel B than that in pixel A, as shown by the histograms of fine pixel GUD in the bottom-right; (c) The VI curves for coarse pixels A and B derived from the linear spectral mixture model, respectively. GUD coarse, B is earlier than GUD coarse, A and they are both earlier than GUD fine−ave (the vertical dashed line in the right of GUD coarse, A ).
Next, we demonstrated why heterogeneity of MP and GC under the condition of GUD heterogeneity can also influence the scale effect. We simulated three coarse pixels (pixels A, B, and C; black lines in Figures 12a, 12b and 12c, respectively) consisting of two fine pixels with different GUDs (blue and orange curves). The three coarse pixels have identical GUDs of corresponding fine pixels, but the later green-up fine pixel (orange line) in pixels B and C has smaller MP or greater GC. It is clear that the fine pixels with smaller MP or greater GC have greater amplitude of K (orange lines in Figure 12e,f). Consequently, the fine pixel with greater K values caused by greater growth speed (smaller MP or greater GC) contributes more in the K mixing process, making the GUD of pixel B or C later than that of pixel A and closer to that of the later fine pixel with faster growth speed and greater K value (Figure 12b,d). Combining Figures 11 and 12, it is clear that the heterogeneity of GUD, MP, and GC leads to scale effect from the fine to coarse scales through the linear mixture process both in VI and K , meaning that the larger the heterogeneity of the three factors, the more significant the scale effect. Our findings are based on the GUD detected by the curvature method. Here, we further used the widely-used relative threshold method (10% and 20% relative threshold) to estimate GUD and investigated whether the scale effect can still be accounted for by the three proposed factors (i.e., ∆ , ∆ , and ∆ ). We performed the same experiments as those in Figure 2, and found that the relationship between the GUD bias and the three factors (Figures 13) was similar to those based on the curvature method (Figure 2d-f). Nevertheless, we observed that the range of the change in the bias with respect to those three factors based on the relative threshold method seems to be slightly smaller than that based on the curvature method. For example, when ∆ varies between −20 and 20 days, the change of the bias values caused by ∆ are 3 and 2.5 days for the curvature method and the relative threshold method, respectively (comparing Figure 2d-f with Figure 13). These small differences are probably because the relative threshold method is slightly less sensitive to scale effects, compared with the curvature method [11]. The scale effects on GUD estimated using the relative threshold method can also be explained by the three factors proposed in this study. Our findings are based on the GUD detected by the curvature method. Here, we further used the widely-used relative threshold method (10% and 20% relative threshold) to estimate GUD and investigated whether the scale effect can still be accounted for by the three proposed factors (i.e., ∆GUD, ∆MP, and ∆GC). We performed the same experiments as those in Figure 2, and found that the relationship between the GUD bias and the three factors ( Figure 13) was similar to those based on the curvature method (Figure 2d-f). Nevertheless, we observed that the range of the change in the bias with respect to those three factors based on the relative threshold method seems to be slightly smaller than that based on the curvature method. For example, when ∆GUD varies between −20 and 20 days, the change of the bias values caused by ∆GUD are 3 and 2.5 days for the curvature method and the relative threshold method, respectively (comparing Figure 2d-f with Figure 13). These small differences are probably because the relative threshold method is slightly less sensitive to scale effects, compared with the curvature method [11]. The scale effects on GUD estimated using the relative threshold method can also be explained by the three factors proposed in this study. Remote Sens. 2019, 11, x FOR PEER REVIEW 16 of 20 We conducted an up-scaling experiment over the contiguous United States. Our results confirmed the findings in Peng et al. [6] and also found that occurred later in the processing of up-scaling in some area, which indicates a positive bias (later green-up for coarse pixel; Figures 7 and 10). The influence of growth speed of the fine-pixel VI curve may explain these observations. As we discussed earlier, the scale effect is caused by two kinds of biases (Equation (10)), which are accounted for by the heterogeneity of GUD and that of growth speed, respectively. The bias due to GUD heterogeneity is systematically negative (earlier green-up for coarse pixel), but the sign of the second bias depends on whether the change of GUD and growth speed is synchronous. In other words, when the growth speed of vegetation with later GUD is significantly greater than that with early GUD, the heterogeneity of growth speed will lead to a positive bias and even exceed the advanced effect caused by the negative former bias, eventually resulting in a positive bias. Therefore, our study highlights the importance to consider the heterogeneity of the growth speed of vegetation in the GUD up-scaling method, rather than to only consider the heterogeneity of GUD. For example, the spatial variability in the threshold of percentile method [5], observed by Peng et al. [7], may be caused by ignoring the heterogeneity of growth speed.

Implications and Limitations
Understanding the scale effect of the VI on GUD detection is useful for cross-scale comparisons and validation of satellite VI-derived phenology. Considering the coarse spatial resolution of commonly used VI products, which ranges from 250 m to 8 km, heterogeneity inevitably exists for most of the Earth's land surface and consequently leads to significant discrepancy among cross-scale comparisons and validations [5,25].
Our analyses suggest that it is helpful to reduce uncertainty to perform cross-scale phenology validation or evaluation of remote-sensing retrieval of phenology based on observations of a homogenous land surface in terms of the influential factors (i.e., the heterogeneity of GUD, MP, GC). For a heterogeneous surface, a practical solution is to include the scale effect in the process of comparison or validation. Table 2 lists a set of coefficients of the proposed scale-effect model for both simulation data (taking p = 0.5 as an example) and Landsat-MODIS fused EVI2 data. The standardized coefficients indicate the different importance of each item, and the unstandardized coefficients values are relatively stable across scales for different surfaces, suggesting the possibility We conducted an up-scaling experiment over the contiguous United States. Our results confirmed the findings in Peng et al. [6] and also found that GUD coarse occurred later in the processing of up-scaling in some area, which indicates a positive bias (later green-up for coarse pixel; Figures 7 and 10). The influence of growth speed of the fine-pixel VI curve may explain these observations. As we discussed earlier, the scale effect is caused by two kinds of biases (Equation (10)), which are accounted for by the heterogeneity of GUD and that of growth speed, respectively. The bias due to GUD heterogeneity is systematically negative (earlier green-up for coarse pixel), but the sign of the second bias depends on whether the change of GUD and growth speed is synchronous. In other words, when the growth speed of vegetation with later GUD is significantly greater than that with early GUD, the heterogeneity of growth speed will lead to a positive bias and even exceed the advanced effect caused by the negative former bias, eventually resulting in a positive bias. Therefore, our study highlights the importance to consider the heterogeneity of the growth speed of vegetation in the GUD up-scaling method, rather than to only consider the heterogeneity of GUD. For example, the spatial variability in the threshold of percentile method [5], observed by Peng et al. [7], may be caused by ignoring the heterogeneity of growth speed.

Implications and Limitations
Understanding the scale effect of the VI on GUD detection is useful for cross-scale comparisons and validation of satellite VI-derived phenology. Considering the coarse spatial resolution of commonly used VI products, which ranges from 250 m to 8 km, heterogeneity inevitably exists for most of the Earth's land surface and consequently leads to significant discrepancy among cross-scale comparisons and validations [5,25].
Our analyses suggest that it is helpful to reduce uncertainty to perform cross-scale phenology validation or evaluation of remote-sensing retrieval of phenology based on observations of a homogenous land surface in terms of the influential factors (i.e., the heterogeneity of GUD, MP, GC). For a heterogeneous surface, a practical solution is to include the scale effect in the process of comparison or validation. Table 2 lists a set of coefficients of the proposed scale-effect model for both simulation data (taking p = 0.5 as an example) and Landsat-MODIS fused EVI2 data. The standardized coefficients indicate the different importance of each item, and the unstandardized coefficients values are relatively stable across scales for different surfaces, suggesting the possibility of estimating scale-derived bias, once we have developed the model for different surfaces. Accordingly, the scale-derived bias estimation will provide a reference that excludes the scale effect or will help estimate uncertainty. Table 2. The model coefficients (c 1 , c 2 , c 3 ) in simulation data (taking p = 0.5 as an example) and Landsat-MODIS fused EVI2 data. Scale refers to the size of the aggregated window. In the temperature-limited biomes (e.g., temperate forests), green-up of a plant species mainly depends on whether the accumulated spring temperature required for green-up is satisfied or not [26,27]. For a heterogeneous surface, the accumulated temperatures required by various species can be quite different [28,29], which may lead to substantial GUD heterogeneity. However, such GUD heterogeneity can vary among years due to temperature fluctuation, even if the vegetation community composition remains unchanged. In years with a warmer spring, the more favorable temperature conditions and faster temperature increase can make it easier to satisfy the requirements of accumulated temperature for different species. Thus, GUD heterogeneity among species may be reduced in these years [30]. Accordingly, it is reasonable to speculate that the heterogeneity of GUD within a coarse pixel could be smaller in years with a warmer spring than in those with a colder spring. Consequently, validation performed in warmer spring years may be less affected by the scale effect. Therefore, we recommend using data in warmer spring years to conduct validation. Moreover, such interannual variations of scale effect caused by different GUD heterogeneity could further contribute to the phenological temporal changes in heterogeneous areas. In particular, the advance of remotely sensed GUD under climatic warming in recent decades [31] could have been partly underestimated, because the negative bias of GUD caused by its heterogeneity in warmer years is smaller than the colder years. This point should be addressed when analyzing the response of phenology metrics to climate change for ecosystems detected by coarse VI products.
We recognize that our current scale-effect analyses still have some limitations. First, our analyses used the logistic function to describe vegetation growth. However, VI time-series data may deviate much from curve of logistic function in some cases (e.g., Figure 3) due to the constraints of various environmental conditions on vegetation growth [32]. Fortunately, the performance of our model in these cases seems to be acceptable (Figures 7 and 8). Further evaluations for scale effect in the case with non-logistic VI curve are necessary in future studies. Second, our investigations were based on up-scaled data rather than on coarse and fine data from different sensors. We did this because various errors from different sensors could not be excluded in the scale-effect analyses, such as differences in imaging conditions, band design and spectral response functions of sensors, and geometric registration, among others. Third, satellite-derived phenology metrics were detected from VI time-series data. Some Vis, such as NDVI, have several known flaws, especially with sensitivity to soil background and spring snowmelt, saturation at moderate to high greenness, and nonlinear scaling [33][34][35]. Clarifying the effects of these complex confounding factors may be needed in future studies to further improve our understanding of the scale effect of land-surface phenology.

Conclusions
Our analyses revealed that the scale effect of GUD was controlled by the heterogeneities of both GUD fine and the vegetation growth speed (MP fine and GC fine ) rather than land-cover or vegetation types. We developed a multivariate scale-effect model (Equation (15)) to account for the GUD bias across different scales. Using both simulated data and MODIS-Landsat fused EVI2 data, we confirmed that the heterogeneity of GUD fine is the most important factor driving the scale effect and this factor directly causes systematically negative bias (i.e., GUD coarse is smaller than GUD fine−ave ). The heterogeneity of vegetation growth speed makes the GUD coarse closer to the GUD of vegetation with faster growth, and the direction of the effect (positive or negative bias) depends on whether there is synchronous change in the GUD and growth speed. Our findings provide a mechanistic explanation of the correlation between the scale effect and land-surface heterogeneity, as well as a reference to understand or further convert GUD acquired at different spatial resolutions.