Detection of Change Points in Pseudo-Invariant Calibration Sites Time Series Using Multi-Sensor Satellite Imagery

: The remote sensing community has extensively used Pseudo-Invariant Calibration Sites (PICS) to monitor the long-term in-ﬂight radiometric calibration of Earth-observing satellites. The use of the PICS has an underlying assumption that these sites are invariant over time. However, the site’s temporal stability has not been assured in the past. This work evaluates the temporal stability of PICS by not only detecting the trend but also locating signiﬁcant shifts (change points) lying behind the time series. A single time series was formed using the virtual constellation approach in which multiple sensors data were combined for each site to achieve denser temporal coverage and overcome the limitation of dependence related to a speciﬁc sensor. The sensors used for this work were selected based on radiometric calibration uncertainty and availability of the data: operational land imager (Landsat-8), enhanced thematic mapper (Landsat-7), moderate resolution imaging spectroradiometer (Terra and Aqua), and multispectral instrument (Sentinel-2A). An inverse variance weighting method was applied to the Top-of-Atmosphere (TOA) reﬂectance time series to reveal the underlying trend. The sequential Mann–Kendall test was employed upon the weighted TOA reﬂectance time-series recorded over 20 years to detect abrupt changes for six reﬂective bands. Statistically signiﬁcant trends and abrupt changes have been detected for all sites, but the magnitude of the trends (maximum of 0.215% change in TOA reﬂectance per year) suggest that these sites are not changing substantially over time. Hence, it can be stated that despite minor changes in all evaluated PICS, they can be used for radiometric calibration of optical remote sensing sensors. The new approach provides useful results by revealing underlying trends and providing a better understanding of PICS’ stability.


Introduction
Satellite remote sensing is used to acquire information related to the Earth's resources and to monitor changes that occur in the land surface, coastal and oceanic ecosystems. Radiometric calibration of remote sensing sensors is the foundation for using its data in a further quantitative application. The complete radiometric calibration of a sensor onboard a satellite typically consists of both pre-launch and post-launch radiometric calibration. Pre-launch radiometric calibration involves evaluating sensor and onboard calibrators in a controlled environment under a variety of operating conditions [1]. These onboard calibrators and sensors are characterized and calibrated using the Système International (SI) traceable standards to achieve the needed accuracy [2]. Nevertheless, once the satellites are launched into space, exposure to ultraviolet radiation and other electrical, mechanical, and thermal effects, may degrade their calibration over time [3]. In consequence, continuous re-calibration is required during the entire sensor lifetime (post-launch calibration).
Post-launch radiometric calibration of remote sensing sensors can be achieved using artificial and natural ground targets on the Earth surface. This technique is known as has changed has not yet been evaluated. In this context, this work aims to expand the trend analysis of PICS by focusing on detecting significant shifts lying behind the PICS time series trend. The sites investigated in this work are PICS that are commonly used in calibration analysis at the South Dakota State University Image Processing Laboratory (SDSU IPLAB): Libya 1, Libya 4, Sudan 1, Egypt 1, Niger 1, and Niger 2. The goal of change point assessments is to describe the nature and degree of the known change and to identify a stable dataset over a time period and make its use for a successful radiometric calibration.
To monitor the long-term temporal stability of the PICS, virtual constellation approach was implemented. There are currently more than 150 Earth-observing satellites that are orbiting the Earth, carrying sensors that measure different sections of the visible, microwave, and infrared regions of the electromagnetic spectrum. The increase in the number of remote sensing sensors provides the community a unique opportunity to combine data sets. Combining data from two or more sensors into a single data set creates a "virtual constellation" [14]. According to CEOS, a virtual constellation is a "coordinated set of space or ground segment capabilities from different partners that focuses on observing a particular parameter or a set of parameters of the Earth system" [15]. The three significant advantages of combining data from multiple sensors into a single seamless time series are: (a) it increases the temporal resolution of the dataset; (b) it overcomes the dependence limitation of the potential trend that exists in any particular sensor; and (c) it allows a revealing new understanding of PICS. The multiple sensor data fusion involves reducing the effects of differences in the sensor's radiometric performance, such as spectral response and acquisition viewing/illumination geometry. The sensors applied in this work are Operational Land Imager (OLI) on-board Landsat-8 satellite, Enhanced Thematic Mapper (ETM+) on-board Landsat-7, Moderate Resolution Imaging Spectroradiometer (MODIS) on-board Terra, MODIS on-board Aqua, and Multispectral Instrument (MSI) on-board Sentinel-2A. These sensors are selected based on the radiometric calibration uncertainty and availability of the data.
The paper is structured as follows: Section 2 provides the work methodology along with the satellite sensors and the location of the six PICS utilized in this work. It also includes the pre-processing steps applied to the satellite sensors imagery to create a virtual constellation and the non-parametric statistical tests used for the analysis. Section 3 presents the results and the interpretation of the results for all six PICS. Section 4 presents the discussion and, finally, concluding remarks are presented in Section 5.

Satellite Sensor Overview
A brief overview of satellite sensors used for this work is described in this section. The image data used in this work came from five orbital platforms: Landsat-7, Landsat-8, Sentinel-2A, Aqua, and Terra. For implementing the virtual constellation approach, it is necessary to combine sensors with similar spectral characteristics to match them with minimal processing requirements [16]. Hence, the sensors from different orbital platforms used in this work have similar spectral coverage, enabling synergistic use. The common spectral bands for the five sensors are denominated blue, green, red, NIR, SWIR 1, and SWIR 2. These six spectral bands are listed in Table 1.  The Landsat series of sensors have acquired the longest continuous global imagery  of the Earth's surface. Landsat-8 and Landsat-7 are part of the Landsat series and were  launched on 11 February 2013, and 15 April 1999. The revisit period of the Operational Land Imager (OLI) onboard Landsat-8 is 16 days, which is eight days out of phase with the Enhanced Thematic Mapper plus (ETM+) onboard Landsat-7. OLI is a push broom imager that has nine spectral bands covering a 185 km swath width and ETM+ is a whiskbroom imager with 8 spectral bands and a 183 km swath width. Both OLI and ETM+ have a spatial resolution of 30 m in their multispectral bands. Landsat-7 has a local overpass time of 10:30 am, which is close to that of Landsat-8 (i.e., 10:13 am) [17]. The ETM+ has an estimated uncertainty of ±5%, and OLI has an estimated post-launch calibration uncertainty of 2% and pre-launch calibration uncertainty of approximately 3% in reflectance products [18,19].

Sentinel MSI
The Multi-Spectral Instrument (MSI) onboard Sentinel-2A is part of the European Space Agency's (ESA, Noordwijk, Netherland) Copernicus program that has been contributing Earth remote sensing data since 2015 [19]. It provides images of the entire Earth every 10 days with spatial resolution varying between 10 m to 60 m for 13 spectral bands. MSI is a push broom imager that has a 295km swath width. The orbit is Sun-synchronous at 786 km altitude with a 10:30 am equatorial crossing time [20]. The data obtained from the onboard Multi-Spectral Instrument (MSI) is comparable to other well-calibrated sensors such as the OLI. The OLI and MSI showed stable radiometric calibration of approximately 2.5% with consistency between matching spectral bands [21].

Terra and Aqua MODIS
The Moderate Resolution Imaging Spectroradiometer (MODIS) onboard Terra and Aqua satellites is one of the key instruments for the National Aeronautics and Space Administration (NASA, Washington DC, USA) Earth Observing System (EOS) providing data for more than 20 years [22]. The NASA MODIS Characterization Support Team (MCST) is responsible for on-orbit calibration, characterization, and sensor operation. Terra has been collecting imagery since 18 December 1999, and Aqua since 04 May 2002. The MODIS sensor is designed to image the Earth in 36 spectral bands with a spatial resolution ranging from 250 m to 1000 m. The instrument is a whiskbroom scanner with a wide field of view of 2330 km and a view zenith angles up to ±45 • . Due to the regular assessment of the sensor's on-orbit performance, both Terra and Aqua MODIS are viewed as well-calibrated sensors. Terra and Aqua MODIS are also commonly called the AM and PM series of satellites because of their equatorial crossing time of 10:20 am and 1:30 pm. They can view the entire globe once in every two days [22]. The estimated radiometric calibration uncertainties of the MODIS TOA reflectance products are 2% [23,24].

Study Area
The sites investigated in this work are the Pseudo-Invariant Calibration Sites located in North Africa. These are the six SDSU IPLAB PICS (Libya 4, Niger 1, Sudan 1, Niger 2, Egypt 1, and Libya 1), that demonstrate better temporal stability and have larger areas as compared to other CEOS referenced sites [25]. The region identified by the PICS Normalization Process (PNP) algorithm developed by the SDSU IPLAB as "optimal" regions was used as the region of interest. This region exhibits temporal, spatial, and spectral variability of 3% or less [8]. Table 2 shows the corresponding latitude and longitude coordinates defining the Region of Interest (ROI) along with the Sentinel UTM/WGS84 projection tiles and Landsat Worldwide Reference System 2 (WRS-2) path and row.

Image Pre-Processing
Radiometrically and geometrically corrected standard MSI data product images referred to as Level 1C (L1C) are available at the Copernicus Open Access Hub (https://scihub.copernicus.eu/ (accessed on 2 October 2020)). Landsat-8 OLI and Landsat-7 ETM+ Collection-1 Level-1 data are available at the United States Geological Survey (USGS) Earth Explorer website (https://earthexplorer.usgs.gov (accessed on 27 March 2020)). For the MODIS data, Collection 6.1 image data products for both Terra and Aqua can be accessed from MODAPS Web Services (https://modwebsrv.modaps.eosdis.nasa.gov/ (accessed on 16 August 2020)). Imagery data for the corresponding Sentinel-2A MSI, Landsat-7 ETM+, Landsat-8 OLI, Terra MODIS, and Aqua MODIS for the selected PICS were downloaded and later retrieved from the existing SDSU IPLAB archive. The preprocessing flowchart of this work is shown in Figure 1.

Conversion to TOA Reflectance
Landsat Collections Level-1 data product consists of calibrated and quantized scaled Digital Numbers (DN) representing the multispectral image data. These reflective band DN's were converted into TOA reflectance using the rescaling coefficients from the MTL file [26,27].
where, ρ λ is TOA reflectance without the solar angle correction , M ρ is band-specific multiplicative rescaling factor from the metadata, Q cal is quantized and calibrated standard product pixel values DN, and A ρ is band specific additive rescaling factor from the metadata. The solar angle correction of the TOA reflectance was calculated by using the formula: where, ρ λ is TOA reflectance without the solar angle correction, and SZA is solar zenith angle. The conversion of the Sentinel-2A MSI DN's to TOA reflectance was done by using a single constant for scaling, which accounts for the exo-atmospheric irradiance, Earth-sun distance, and the solar angle cosine correction: where, DN cal is calibrated DN value (pixel value), and g = 1000 is the scale factor. The MODIS image data products were processed to TOA reflectance using the calibration parameters from the metadata for the same region of interest used for other sensors at their specified spatial resolution. The input calibration parameters' values were transformed into the reflectance scales and reflectance offsets by using Equation (4).
where, ρ cos(θ) B,T,FS is reflectance product for reflected solar bands, SI B,T,FS is level 1B product scaled integer, and reflectance_scales B and reflectance_offsets B are parameters deduced from Level 1B calibration parameters.

Data Filtering to Remove Cloud, Cloud Shadow, and Large Sensor View Angles
After calculation of the cosine corrected mean TOA reflectance value for the Region of Interest (ROI) of each image, filtering was required to obtain cloud and cloud shadow-free imagery. A quality assessment band is included with Landsat Level-1 data products in which each pixel in the Quality Assessment band consists of a decimal value, which represents bit-packed combinations of sensor, atmosphere, and surface conditions. Thus, the quality assessment band data can be used to filter out the cloud-contaminated pixels from ETM+ and OLI imagery. The pixel value of 672 (clear pixels) of the quality assessment band was used to form a cloud binary mask for removing cloudy pixels from ETM+ imagery and a pixel value of 2720 for OLI imagery [3]. Likewise, for Sentinel, cloud mask data from a cloud mask file was used to remove the cloudy pixels for all the bands with different spatial resolutions. Some outliers remained in the dataset even after applying the cloud mask using the quality assessment band and the cloud mask data. Thereby, an empirical sigma (σ) filtering approach was used for identifying outliers. Basically, a scene was considered a potential outlier when the mean TOA reflectance value of the scene exceeds one sigma (± 1σ) threshold. Subsequently, a visual inspection was carried out for the images that were considered as a potential outlier. When cloud, cloud shadow, or other artifacts were identified during the visual inspection of the image within the ROI for any spectral band of an image, the entire scene was discarded from the analysis. Figure 2 illustrates a potential identified outlier and the image corresponding to the point highlighted by the red circle.
As can be seen, the red rectangle represents the ROI on the image, and it contains clouds, therefore this image was discarded from the analysis.

Scaling Adjustment Factor (Near Coincident Scene Pairs)
The five sensors used in this work collect data in similar spectral regions; however, there are some differences in the Spectral Response Function (SRF) (see Table 1). These differences in SRF between sensors may generate a systematic error during the process of combining the data. Consequently, for mitigating the effects of differences in SRF between sensors, a procedure to achieve spectral consistency and to correct the absolute difference in the data was employed [13,16].
The TOA reflectance generated from ETM+, MODIS, and MSI sensors were scaled to match the OLI sensor by applying a Scaling Adjustment Factor (SAF), i.e., the TOA reflectance of each sensor was normalized by using the adjustment factor. The SAF was calculated by taking the mean ratio of cosine corrected TOA Reflectance values from near coincident acquisitions with the OLI (see Equation (5)). Near coincident acquisition refers to the scenes that are imaged within a maximum acceptable window of days. For MODIS and Landsat-7 ETM+, 'near-coincident' refers to the scene pairs that are imaged approximately eight days apart as they are eight days out of phase with the OLI. The scene pairs that are imaged within the window of four days were used for Sentinel-2A since they provide optimal scaling factor with minimum differences in between the sensors.
where, SAF is scaling adjustment factor, ρ is mean TOA reflectance from OLI sensor, and ρ / / is mean TOA reflectance from ETM+, MSI, or MODIS sensor.

Bidirectional Reflectance Distribution Function (BRDF) Normalization
PICS are not perfect Lambertian surfaces, and consequently, directional reflectance effects are present in satellite remote sensors reflectance retrievals due to variability in the geometry of acquisition, especially in longer wavelength bands. The TOA reflectance is dependent on the illumination and the viewing geometry, which vary significantly from one acquisition to another. To address the impacts due to the different solar and viewing geometries on the derived TOA reflectance, a Bidirectional Reflectance Distribution Function (BRDF) normalization was applied to the dataset.
Several approaches have been reported in the literature to model the BRDF data. Liu [28] employed a BRDF model based only on the solar zenith angle for cross-calibration between MODIS and MVIRS. Schlapfer [29] also proposed a normalization method for All satellite sensors used in this work had view zenith angles of 7.5 degrees or less except MODIS. The Field of View (FOV) of the MODIS sensor is ± 49.5 • , which can increase the BRDF differences between the sensors due to uneven illumination and reflectance. Therefore, for MODIS, the scenes having a large view zenith angle (VZA > 15 • ) were excluded from the analysis in order to minimize the BRDF differences between the sensors.

Scaling Adjustment Factor (Near Coincident Scene Pairs)
The five sensors used in this work collect data in similar spectral regions; however, there are some differences in the Spectral Response Function (SRF) (see Table 1). These differences in SRF between sensors may generate a systematic error during the process of combining the data. Consequently, for mitigating the effects of differences in SRF between sensors, a procedure to achieve spectral consistency and to correct the absolute difference in the data was employed [13,16].
The TOA reflectance generated from ETM+, MODIS, and MSI sensors were scaled to match the OLI sensor by applying a Scaling Adjustment Factor (SAF), i.e., the TOA reflectance of each sensor was normalized by using the adjustment factor. The SAF was calculated by taking the mean ratio of cosine corrected TOA Reflectance values from near coincident acquisitions with the OLI (see Equation (5)). Near coincident acquisition refers to the scenes that are imaged within a maximum acceptable window of days. For MODIS and Landsat-7 ETM+, 'near-coincident' refers to the scene pairs that are imaged approximately eight days apart as they are eight days out of phase with the OLI. The scene pairs that are imaged within the window of four days were used for Sentinel-2A since they provide optimal scaling factor with minimum differences in between the sensors.
where, SAF is scaling adjustment factor, ρ OLI is mean TOA reflectance from OLI sensor, and ρ ETM+/MSI/MODIS is mean TOA reflectance from ETM+, MSI, or MODIS sensor.

Bidirectional Reflectance Distribution Function (BRDF) Normalization
PICS are not perfect Lambertian surfaces, and consequently, directional reflectance effects are present in satellite remote sensors reflectance retrievals due to variability in the geometry of acquisition, especially in longer wavelength bands. The TOA reflectance is dependent on the illumination and the viewing geometry, which vary significantly from one acquisition to another. To address the impacts due to the different solar and viewing geometries on the derived TOA reflectance, a Bidirectional Reflectance Distribution Function (BRDF) normalization was applied to the dataset.
Several approaches have been reported in the literature to model the BRDF data. Liu [28] employed a BRDF model based only on the solar zenith angle for cross-calibration between MODIS and MVIRS. Schlapfer [29] also proposed a normalization method for sensors having a wide field of view based on the solar zenith, view zenith, and relative azimuth angles (i.e., the difference between the solar and view azimuth angles). Likewise, Mishra [25] derived an empirical linear BRDF model based on solar and view zenith angles using near nadir images of Terra MODIS over Libya 4. Farhad [30] developed a linear BRDF model where the reflectance's were modeled as a quadratic function of view zenith angles. This model was further extended by Kaewmanee [31] with quadratic terms for all four angles (solar zenith/azimuth and view zenith/azimuth), including the interaction terms. This model provided the best BRDF model characterization and the best after normalization uncertainty. For this work, Kaewmanee's 15 coefficient quadratic model derived from four angles was chosen for BRDF normalization (see Equation (6)).
where, β 0 , β 1 , β 2 , . . . . are the model coefficients, and Y 1 , X 1 , Y 2 , X 2 are Cartesian coordinates representing the planar projections of the solar and sensor angles originally given in spherical coordinates: where, SZA is solar zenith angle, SAA is solar azimuth angle, VZA is view zenith angle, and VAA is view azimuth angle. The BRDF-normalized TOA reflectance for each sensor was calculated using: where, ρ obs is observed mean TOA reflectance from each scene , ρ model is model predicted TOA reflectance, and ρ ref is TOA reflectance with respect to a set of "reference" solar and sensor position angles. For this analysis, the reference SZA, SAA, VZA, and VAA angles were chosen by taking the mode of the corresponding SZA, SAA, VZA, and VAA angles from all processed scenes. The reference angle chosen for six PICS for the normalization procedure is listed in Table 3. In the course of selecting an appropriate TOA reflectance harmonization algorithm for the TOA reflectance dataset from five different sensors, the use of a single BRDF model worked well in the data by giving a minimum coefficient of variation (lesser amount of dispersion around the mean) in the normalized dataset. Although previous work [13] used different BRDF models for different sensors for the same site, it is well known that the directional effect is a property related to the target. Hence, it would be more reasonable to normalize the combined TOA reflectance data using a single BRDF model for a site with a common reference angle rather than using a different BRDF model for different sensors data for the same site.

Statistical Analysis 2.4.1. Inverse Variance Weighted Moving Average Method
Intrinsic in the TOA reflectance data collected over time is some form of random variation. Smoothing (or averaging) is a technique applied to time series to reduce the effect caused due to random variation in the data. The idea is to remove the noise from the data and expose the signal to the underlying trend, seasonal and cyclical components in the time series. There are several smoothing techniques, which involve averaging methods and exponential smoothing methods. Simple averaging and averaging by weights fall under averaging methods, where simple averaging implies taking the average of the observations and averaging by weights implies assigning equal or unequal weights to all the observations and then averaging them. In this work, weighted moving averages were applied, with the weights based on each observation's uncertainty. Non-parametric statistical analyses were applied to the resulting smoothed series [32].
The method of weighting the TOA reflectance data by uncertainties and applying the moving average is referred to as the inverse variance weighted average method. Calculating a moving average involves producing a new time series where the values comprise the weighted average of observations in the original time series. There is a need for weighting the data before applying statistical tests because there may be an over-representation of certain characteristics in the dataset. For instance, the test might be more biased towards the OLI sensor as we are using OLI as a reference for the scaling adjustment. On the other hand, having a larger number of observations than other sensors could also influence the test results. Weighting compensates for biases in the data. The moving average technique requires a specific window size, which defines the number of original observations used to calculate the moving average value.
In this work, a moving average with a window size of twelve months with seven days shifting was applied to reduce the seasonal effects, smooth out the short-term fluctuations in the time series data, and highlight long-term trends. For this application, the weights were allocated by making each point's weight inversely proportional to the square of its uncertainty [33] (see Equation (13)). Hence, data points in the time series with less uncertainty are given more weight than data points with higher uncertainty values (see Equation (12)). (13) where, ρ i is virtual constellation TOA reflectance time series, w i is weight, and σ is total uncertainty associated with each TOA reflectance. The total uncertainty associated with each observation of the time series was calculated taking into account four sources of uncertainties: (1) spatial uncertainty; (2) BRDF model uncertainty; (3) scaling adjustment factor uncertainty; and (4) sensor calibration uncertainty (see Equation (14)).
The spatial uncertainty was calculated considering the variation of mean TOA reflectance within the ROI. BRDF model uncertainty was evaluated by taking the residual errors between the observed and the predicted TOA reflectance. Scaling adjustment factor uncertainty is the ratio of the standard deviation to the mean scaling adjustment factor. Lastly, the sensor calibration uncertainties for different sensors used for this work are listed in Table 4. After applying the inverse variance weighting method, change point detection test is applied to the dataset. The purpose of the change point detection test is to investigate possible dates when the behavior of time series data has changed. The sequential version of the Mann-Kendall test [34] is a non-parametric test applied to time-series data to detect potential change points. This analysis has been increasingly employed in different fields, including finance, aerospace, meteorology, and climate changes. Bisai [35] examined the fluctuations of trend and the abrupt change point of atmospheric temperature time series in the Kolkata observatory using the sequential Mann-Kendall test. Kundu [36] also employed the Mann-Kendall test and Sen's slope estimator to determine the annual variability in the meteorological parameters of Western Rajasthan.
The test uses the ranked values of the original data in the analysis and locates the beginning year of the trend. It sets up two series, one progressive u(t) and another retrograde u (t) by computing the rank values y i from the original values (x 1 , x 2 , x 3 . . . , x n ) and the magnitudes of y i (i = 1,2,3 . . . ,n) are compared with y j (j = 1,2,3 . . . ,i-1). For each comparison, if y i >y j , they are counted and denoted by n i . The test statistic t i is given by: The distribution of test statistic t i has a mean as and variance as The sequential values for the series were calculated for each of the test statistic variables t i using: Here, the progressive sequential statistic u(t i ) is estimated using the original time series (x 1 ,x 2, x 3 . . . , x n ) whereas the values of a retrograde sequential statistic u (t i ) are estimated in the same way but starting from the end of the series (x n ,x n−1 , x n−2 . . . , x 1 ). The standardized variable u(t) has a zero mean and unit standard deviation and fluctuates around zero [37]. When the values of u(t i ) (progressive series) and u (t i ) (retrograde series) are plotted as a function of time, the intersection point of these curves represents the approximate turning point of trend within the time series. When either the progressive or retrograde curve exceeds a certain threshold before or after the intersection point, this trend turning point (change point) is considered significant at the corresponding level [35]. The threshold value in this study is ±2.58 (99% significance level), with the crossing point estimating the year at which the trend begins.

Statistical Test for Trend Analysis (Mann-Kendall Test)
The Mann-Kendall test is a non-parametric trend detection technique that tests the null hypothesis H 0 : data are independently and randomly ordered so that there is no monotonic trend present against the alternative hypothesis H a : there is a monotonic trend in the data [38,39]. In other words, the Mann-Kendall test analyzes data collected over time for consistently monotonic trends (increasing or decreasing trends).
For this work, the Mann-Kendall test was applied to the same time series using two different strategies: (a) by dividing the time series into different regions based on the change points detected by the sequential Mann-Kendall test and applying the test in those regions to identify the slope magnitude of the trend; and (b) applying the test to the entire time series to look into the presence of an overall trend in every band. The mathematical theory behind the Mann-Kendall test is that it compares the difference in signs between earlier and later data points. The concept is that if a trend is present, the sign values will tend to increase or decrease based on the direction of the trend. Let x 1 , x 2 , x 3 . . . , x n represent n data points where x j represents the data point at time j. The Mann-Kendall test statistic is given by: where: The initial value of the Mann-Kendall statistic, S, is assumed to be 0, and each data value is compared to all the subsequent data values in the ordered time series. If the data value from a later time period is higher than the data value of an earlier time period, then S is incremented by 1. If the data value from the later time period is lower than the data value of the earlier time period, S is decremented by 1. A very high positive value of S indicates an increasing trend, and a very low negative value indicates a decreasing trend [40]. This test was performed at a 99% significance level.

Results
The results obtained at each step of the methodology for Sudan 1 are presented in Section 3.1 and the summary of results for rest of the sites are presented in Section 3.2.

Stability Analysis of Sudan 1 3.1.1. Scaling Adjustment Factor
To implement the virtual constellation approach, TOA reflectance data was homogenized using the scaling adjustment factor as described in Section 2.3.3. Figure 3 illustrates TOA reflectance from the combined satellite sensors over time for Sudan 1 site without ( Figure 3a) and with (Figure 3b) scaling adjustment factor. The offsets seen in Figure 3a caused by atmospheric effects, differences in the SRF, and the spectral signature of the ground target, of the individual sensors, are reduced after applying the scaling adjustment factor.

Bidirectional Reflectance Distribution Function (BRDF) Normalization
The adjusted TOA reflectance from the combined sensor data was BRDF normalized in order to minimize seasonal effects. Figure 4 shows the TOA reflectance predicted by the 15-coefficient BRDF model explained in Section 2.3.4. The residual error of this model was estimated by taking the mean percentage difference between the observed value and value predicted by the BRDF model for all the sites, which is presented in Table 5. The residual errors are expected to be close to zero when the model is working well for the data. From the table, it is evident that the residuals are close to zero except for the blue and SWIR 2 band. This might be due to the atmospheric and water absorption features of these bands. Figure 4. In red, the TOA reflectance predicted by the BRDF model; and in black, the observed TOA reflectance data of virtual constellation over Sudan 1 for blue, green, red, NIR, SWIR 1, and SWIR 2 bands.

Inverse Variance Weighted Moving Average Method
Next, the inverse variance weighted moving average method was applied to scaling adjusted BRDF normalized TOA reflectance dataset. Table 6 summarizes the total average estimated uncertainty considering all four sources of uncertainty mentioned in Section 2.4.1. The scaling adjusted BRDF normalized TOA reflectance and the TOA reflectance obtained after applying the weighted moving average trends over Sudan 1 are shown in Figure 5.
The weighted averaged TOA reflectance shows no fluctuations in the dataset and highlights the long-term trends. In addition, the residual seasonality seen in the dataset even after BRDF normalization has been minimized.

Sequential Mann-Kendall Test
The results of the sequential Mann-Kendall (SQMK) test statistic for the virtual constellation weighted averaged TOA reflectance dataset of Sudan 1 reveal a statistically significant change point. The yearly plots for the progressive series u(t i ) and retrograde series u (t i ) for each band are shown in Figure 6. The level of significance was set to 99% (i.e., critical value ±2.58). The SQMK result for the SWIR 2 band is also similar. The trend and change behavior of band pairs blue and green, red and NIR, and SWIR 1 and SWIR 2 are expected to be similar since their frequency range along the electromagnetic spectrum to acquire the data are close to each other. Table 7 summarizes change points detected by the SQMK test for all six spectral bands for Sudan 1 site.

Mann-Kendall Test
The presence of monotonic trends and their statistical significance was verified using the non-parametric Mann-Kendall (MK test) at a significance level of α = 0.01. The MK test was applied to: (a) the entire time series to indicate the long-term trends; and at (b) sub-periods based on change-point results to indicate short-term trends.
The results of the MK test applied to the entire time series to identify a long-term trend is presented in Table 8. The test results indicate that there is a presence of long-term monotonic trends in all the bands. On the other hand, the test was not able to detect a trend in some of the bands (red, NIR, SWIR 1, and SWIR 2) when applied to the unweighted TOA reflectance due to the presence of random variation in the dataset. This suggests that applying the statistical test to a weighted moving averaged dataset gives an advantage of detecting the underlying trend over the unweighted dataset. For the descriptive analysis of the site's stability, the MK test was applied to the subperiods based on the change point results (SQMK test) presented in Section 3.1.4. Figure 7 highlights the significant monotonic trends before and after the turning points throughout the time series. The points in between the trend line represents the inflection point

Stability Analysis of Libya 4, Egypt 1, Niger 1, Niger 2 and Libya 1
The summary of results obtained for Libya 4, Egypt 1, Niger 1, Niger 2 and Libya 1 are presented in this section.
The MK test results applied to the sub-periods defined by the results of the SQMK test are shown in Figure 8. The figure shows the weighted averaged TOA reflectance dataset divided into sub-periods over Libya 4 for corresponding bands. These sub-periods are fitted with a significant trend line (α = 0.01) based on the MK test results both before and after the change points. Figure 8 depicts pronounced upward and downward trends occurring at different time periods throughout the entire time series. The MK test indicates a significant upward trend followed by a downward trend in the blue band, highlighting the actual change points in the years 2009, 2013, and 2018. The green band has a similar trend pattern as the blue band, but the upward trend starts one year earlier with the change points in the years 2008, 2013, and 2017. Although the slope of these trend lines is small (∼ 10 −4 ), the inflection points (where the function changes concavity) between these trend lines are statistically significant. The red and the NIR bands also indicate an upward trend and a downward trend with the inflection point in the year 2008 between these trend lines. The inflection points around the year 2012 for SWIR 1 and SWIR 2 can be seen with a downward trend before and upward trend after the inflection point. Table 9 summarizes the change points detected by the SQMK test for all the six spectral bands for the Libya 4 site.  For Egypt 1, (see Figure 9) there is an increasing trend in the blue and green bands from the beginning of the time series, followed by a decreasing trend towards the end of the time series. However, the change point is detected at different years (2017 and 2019 respectively) for each band. The red and NIR bands have a similar trend, indicating change point in the same year (2006, 2016, and 2019). Although the SWIR bands are also expected to exhibit a similar trend pattern, the test results show contrasting results for this specific site. The decreasing trend is seen at the beginning of the time series until the year 2016 and an increasing trend after that for SWIR 1 band, whereas there is an increasing trend followed by a decreasing trend with the turning point in the year 2016 for SWIR 2 band. The reason behind the contradiction in the trend result for this band pair might be due to the absence of AQUA data (noisy detector) for the SWIR 1 band [41]. Table 10 summarizes the change points detected by the SQMK test for all the six spectral bands for Egypt 1 site. Figure 9. TOA reflectance data of the virtual constellation over Egypt 1 for blue, green, red, NIR, SWIR 1, and SWIR 2 bands with fitted trend lines before and after turning points. For Niger 1, (see Figure 10)  Table 11 summarizes the change points detected by the SQMK test for all the six spectral bands for Niger 1 site. Figure 10. TOA reflectance data of the virtual constellation over Niger 1 for blue, green, red, NIR, SWIR 1, and SWIR 2 bands with fitted trend lines before and after turning points. For Niger 2, (see Figure 11) it is evident that, like other sites, the band pair blue, green and red-NIR have similar trend patterns and the same year of change point but different trend patterns and same years of the change point for SWIR bands. Table 12 summarizes the change points detected by the SQMK test for all the six spectral bands for the Niger 2 site. Figure 11. TOA reflectance data of the virtual constellation over Niger 2 for blue, green, red, NIR, SWIR 1, and SWIR 2 bands with fitted trend lines before and after turning points. For Libya 1, (see Figure 12) the graph for each band indicates the turning point detected by the SQMK test and the trend behavior before and after the turning point. The change points and the trend patterns are again similar for the band pairs blue-green, red-NIR, and SWIR1-SWIR 2. The change points detected by the SQMK test for all the bands are summarized in Table 13. Figure 12. TOA reflectance data of the virtual constellation over Libya 1 for blue, green, red, NIR, SWIR 1, and SWIR 2 bands with fitted trend lines before and after turning points. The Mann-Kendall test results for the overall long-term trend for Libya 1, Egypt 1, Niger 1, Niger 2 and Libya 1 for all six bands are presented in Table 14. This analysis provides sufficient statistical evidence to state that there is a long-term upward trend in all the bands for Libya 4, Libya 1, and Egypt 1. In contrast, Sudan 1 PICS data indicates a long-term downward trend in all common bands except for Red and NIR bands. Likewise, Niger 1 PICS data indicates a long-term downward trend in all the bands except for the blue band, and Niger 2 indicates temporal stability across all the bands except for green and SWIR 2 bands.

Discussion
Taking the advantage of the virtual constellation approach to combine the data from multiple sensors into a single time series and separating sensor changes from site changes, this work focused on quantifying the temporal variability of six PICS. Based on the analysis of the results, it can be inferred that there is a presence of both short-term and long-term monotonic trends in the six common reflective bands of five sensors for all six evaluated PICS. Table 15 summarizes the long-term reflectance change for all the bands over all the six PICS sites. The negative and positive sign in this table indicates a monotonic downward and upward trend, respectively. Even though the trend detected are statistically significant, the magnitude of the slope is noticeably small ranging from 0.008% to 0.215% change in reflectance unit per year. Hence, these trends do not suggest that these sites are changing excessively over time. The findings of the current study and findings of the previous study by Tuli [13] are different in context of presence of long-term trend in some of the sites, which is expected because both the work employs different statistical approaches. For example, the result of the previous study does not indicate any monotonic trend in six reflective solar bands for Libya 4 and Egypt 1 whereas; the current study shows that there is a presence of long-term upward trend in all the solar reflective bands for Libya 4 and Egypt 1. Likewise, for other sites the current study indicates presence of long-term monotonic trend that previous study was unable to detect. This implies that the current approach is more robust as weighting the TOA reflectance by uncertainty removes the biasness of the statistical test on the data and highlights the underlying long-term trend.
The results also provide evidence from a statistical perspective to indicate significant change points in all the sites. The change points have wider implication for those utilizing the calibration sites as they highlight the dynamic nature of the sites. These changes might be atmospheric change, climate change, or changes due to human activities. Therefore, the analysis presented here could be extended further to try to find out the reason behind the change or relate these changes to an actual physical change with the use of independent data and meteorological records. From the results, it is also clear that the amount of temporal change throughout the time series varies at different sub-periods for different spectral bands, but the results on the trend behavior are broadly consistent with the band pairs blue/green, red/NIR, and SWIR1/SWIR 2 in most of the sites.
The findings of this study suggest that, even though some changes are statistically significant, they can still be used for radiometric calibration. However, for the appropriate use of the PICS data, the user should be careful while implying the change observed in the temporal stability of the PICS directly to the sensor response as some amount of changes might be coming from the site itself. The magnitude of the trends (amount of site change) reported in this work can be useful to reduce the calibration uncertainty of the satellite sensors. Another important aspect to take into consideration is the stability requirement of the PICS for each of the satellite sensor missions. For example, the maximum amount of temporal change detected in all the sites was 0.215% per year for Niger 2. This magnitude of change increases with the increase in length of time of change. In 10 years' time, the magnitude of the change becomes 2.15%, which is less than the stated mission requirement (i.e., 5% calibration uncertainty) for Landsat-7 ETM+; therefore, the site change may be neglected. Nevertheless, some sensors demonstrate finer calibration uncertainty. The radiometric accuracy of sensor Traceable Radiometry Underpinning Terrestrial and Helio Studies (TRUTHS) is 0.3% [42]. In this case, the observed temporal change in 10 years would be 3%, which cannot be overlooked, and the site cannot be considered as a viable source of radiometric calibration.
In summary, this analysis seems to improve the understanding of the stability of PICS by locating the change point and revealing the underlying trend that previous study methods were incapable of detecting. It also has outlined the importance of quantifying the temporal stability of PICS, showing that temporal invariance of the site cannot be assumed. This paper may serve as a reference for the calibration engineers who intend to use PICS for the radiometric calibration and performance monitoring of the satellite sensors in the future.

Conclusions
Pseudo-invariant calibration sites have been widely used over the last decades for the regular monitoring of the radiometric performance of the Earth-observing satellite sensors. PICS are the regions on the Earth's surface that are assumed to exhibit minimal change over a long period. Historically, any change observed in the temporal stability of the PICS was directly implied to the change that occurred due to the sensor response. However, this assumption has not been assessed extensively in the past. The present study utilizes the virtual constellation approach to form a long-term TOA reflectance time series from five different satellite sensors (Landsat-8 OLI, Landsat-7 ETM+, Sentinel-2A MSI, Terra MODIS, and Aqua MODIS) over six different PICS sites (Libya 4, Sudan 1, Niger 1, Niger 2, Libya 1 and Egypt 1). The underlying trend and the locations where the behavior of PICS data has changed in this time series were identified using the statistical sequential Mann-Kendall test. Finally, the change point determined by the sequential Mann-Kendall test was used to detect short-term trends by applying the Mann-Kendall test to sub-periods of the time series. The results show that the magnitude of weighted averaged TOA reflectance trends for Sudan 1, Libya 4, Egypt 1, Niger 1, and Libya 1 over 20 years are small changing no more than 0.042% per year even though they are statistically significant. For Niger 2, there is a maximum change of 0.215% reflectance unit per year for the SWIR 2 band compared to all other sites that have been evaluated.
The proposed algorithm provides useful results for identifying the stable dataset over a time period to its usefulness for radiometric calibration. It is important to highlight that this approach can be used to detect change points not only for the selected PICS but also for any time-series data that is used by the remote sensing community. Therefore, the analysis presented here could be further extended to determine the temporal stability of other CEOS recommended PICS, extended PICS, and even satellite sensors.