Annual Seasonality Extraction Using the Cubic Spline Function and Decadal Trend in Temporal Daytime MODIS LST Data

Examining climate-related satellite data that strongly relate to seasonal phenomena requires appropriate methods for detecting the seasonality to accommodate different temporal resolutions, high signal variability and consecutive missing values in the data series. Detection of satellite-based Land Surface Temperature (LST) seasonality is essential and challenging due to missing data and noise in time series data, particularly in tropical regions with heavy cloud cover and rainy seasons. We used a semi-parametric approach, involving the cubic spline function with the annual periodic boundary condition and weighted least square (WLS) regression, to extract annual LST seasonal pattern without attempting to estimate the missing values. The time series from daytime Aqua eight-day MODIS LST located on Phuket Island, southern Thailand, was selected for seasonal extraction modelling across three different land cover types. The spline-based technique with appropriate number and placement of knots produces an acceptable seasonal pattern of surface temperature time series that reflects the actual local season and weather. Finally, the approach was applied to the morning and afternoon MODIS LST datasets (MOD11A2 and MYD11A2) to demonstrate its application on seasonally-adjusted long-term LST time series. The surface temperature trend in both space and time was examined to reveal the overall 10-year period trend of LST in the study area. The result of decadal trend analysis shows that various Land Use and Land Cover (LULC) types have increasing, but variable surface temperature trends.


Introduction
Satellite-based climate-related data, such as Land Surface Temperature (LST), play a critical role in monitoring climatological processes, land surface energy interactions and water balance at regional to global scales [1][2][3][4], as well as climate change impacts. LST is the skin temperature of the land surface measured at the interface between surface materials (top of plant canopy, water, soil, ice or snow surface) and the atmosphere [5]. Some of the most advanced and widely-used LST products are those from the Moderate Resolution Imaging Spectroradiometer (MODIS) on-board the NASA Earth Observation System (EOS) Terra and Aqua satellites, launched in 1999 and 2002, respectively. Both space-borne sensors detect and measure the Earth's surface temperature four times per day, at 10:30/22:30 (Terra) and 13:30/1:30 (Aqua) local overpass-times. With a high temporal frequency of curve that does not display too much rapid fluctuation by selecting the right number and placements of the spline function components. In this study, we are aware of making a necessary compromise between two curve estimations aspects, acceptable fit to data and the smoothness and continuity of the fitted curve. Lastly, we apply the method and analyse trends and seasonality over the island of Phuket, Thailand, a tropical island study area with forests and complex land use/cover patterns.

Study Area
The study area is Phuket Island (see Figure 1). It has a tropical monsoon weather designation using the Köppen climate classification [19]. Its climate is typically divided into two distinct seasons, dry and rainy, with transitional periods in between. The dry season begins in November and usually lasts until March, which is influenced by the northeast monsoon that brings cold and dry air from the Asian continent. The rainy season starts in April and lasts until October. The strong wind of the southwest monsoon draws warm moisture from the India Ocean, resulting in plenty of rain. The average rainfall in Phuket is around 200-400 mm per month during the wet and humid period. Phuket Island is divided into three administrative districts and one local administration. The details of district area and its land cover are described in Table 1 Table 2.
Remote Sens. 2017, 9, 54 3 of 19 (2) obtaining an estimated curve that does not display too much rapid fluctuation by selecting the right number and placements of the spline function components. In this study, we are aware of making a necessary compromise between two curve estimations aspects, acceptable fit to data and the smoothness and continuity of the fitted curve. Lastly, we apply the method and analyse trends and seasonality over the island of Phuket, Thailand, a tropical island study area with forests and complex land use/cover patterns.

Study Area
The study area is Phuket Island (see Figure 1). It has a tropical monsoon weather designation using the Köppen climate classification [19]. Its climate is typically divided into two distinct seasons, dry and rainy, with transitional periods in between. The dry season begins in November and usually lasts until March, which is influenced by the northeast monsoon that brings cold and dry air from the Asian continent. The rainy season starts in April and lasts until October. The strong wind of the southwest monsoon draws warm moisture from the India Ocean, resulting in plenty of rain. The average rainfall in Phuket is around 200-400 mm per month during the wet and humid period. Phuket Island is divided into three administrative districts and one local administration. The details of district area and its land cover are described in Table 1. The land use/cover distribution is illustrated in the map of the year 2009 LULC classification ( Figure 1b). There are three major land cover types found in the study area are Forest (F), Agricultural land (A) and Urban and built-up land (U), according to the Level-1 LULC types classified by the Land Development Department (LDD), Ministry of Agriculture and Cooperatives of Thailand. The inland Water bodies (W) are mostly dams, canals, old mine pits and aquaculture farms that comprise 1.5% of the entire island area. The abandoned open ground mines, which eventually turn into scrub, open woodland and grassland, comprise the primary land use type of Miscellaneous land (M) that cover around 7.1% of the study area. Figure 1b also shows the locations of three samples of 1-km MODIS LST pixels with 1:30 p.m. overpass-time MODIS LST time series that were extracted from and subsequently were used to model annual seasonality. The details of the sample sites location and their LULC types are described in Table 2.

MODIS LST
The Collection-6 (C6) daytime MODIS/Terra and MODIS/Aqua land surface temperature and emissivity eight-day composite L3 global 1-km sinusoidal grid products (MOD11A2 and MYD11A2) downloaded from the Global Subsets Tool [20] were used in this study. The C6 MODIS LST/emissivity products have a much better quality than the qualities of the C4.1 LST product (produced by the version 4 algorithm from C5 input data) and C5 products (generated by the version 5 algorithm from C5 input data) because they provide more stable results. Detailed validation of the new version of MODIS LST products is given in the most recently published paper [18,21]. As one of the aims of this study was to demonstrate how to apply the cubic spline function for detecting seasonal patterns in satellite-based temperature data, three samples of daytime temperature time series were extracted from a MYD11A2 product (LST_Day_1km). Moreover, its product quality control layers (QC_day), which provide information on the accuracy of the retrievals, were also used. Each time series consists of a total of 690 observations (4 July 2002-26 June 2017), which is precisely 15 years of data observation. Moreover, daytime LST data from both MODIS sensors on Terra and Aqua satellites were also used to apply the proposed pixel-based seasonal extracting method and further decadal trend investigation.

Meteorological Data
The 15-year historical weather data acquired from the National Climatic Data Center (NCDC) under public access provided by NOAA's National Center of Environmental Information (NCEI). The data collected from two meteorological stations in the study area were used to compare the trend of seasonality produced from the proposed seasonality extraction method with the local climate. The daily average surface air temperatures, including maximum and minimum temperature reported during the day from the year 2002-2016, reduced the effect of variation by averaging into monthly data. However, only climatic data from the meteorological station located in Phuket downtown were used in comparison with the seasonality of LST extracted from Study Site 3, which is an urban and built-up land cover site, because the region of the study site covers the location of the meteorological station (see Figure 1b).

Cubic Spline Function
A k-th order spline model is a well-known, numeric curve fitting function with derivatives that users can define as necessary for its applications [22]. Not only the degree of the spline function, but other parameters such as a number of polynomial pieces connecting points (known as knots), the position of the knots and the free coefficients of the spline function are the user's choice [23]. The most commonly-used spline is a cubic spline, of order 3. A cubic spline function is a piecewise cubic polynomial with continuous second derivatives and is smoothest among all functions in the sense that it has a minimal integrated squared second derivative. Moreover, it is easily fitted using linear least squares regression [24]. The cubic spline function is widely used for smoothing data in various fields of study such as interactive computer graphics [24,25], real-time digital signal processing [26][27][28] and satellite-based time series data [29][30][31][32][33][34]. The main advantages of the spline fit are that the first and second derivatives are continuous [24], including its simple calculation, good stability, high precision and smoothness [30]. The gradients derived from cubic spline functions are smoothly joined parabolas, not the abruptly joined straight line segments characteristic of parabolic spline smoothing [35].
As MODIS LST data are collected over time, they fluctuate over the season. The seasonal pattern is assumed to be the same for every year, and the change in other parameters such as the land cover change that has a direct or indirect effect on the LST data is consistently increasing or decreasing. A sudden natural disaster, for instance a forest fire or landslide or tsunami that could cause the instantaneous change of data behaviour, is excluded in this model consideration. Ideally, the model should provide continuous seasonal patterns for each day of the year within a year. These considerations suggest that the most appropriate model is a cubic spline with specific boundary conditions that ensure smooth periodicity. The model and all of the parameters are illustrated in Figure 2. The formula for a cubic spline function is: Where t denotes time, t 1 < t 2 < · · · < t p are specified knots and (t − x) + is (t − x) for t > x and 0 otherwise.

Cubic Spline Function
A k-th order spline model is a well-known, numeric curve fitting function with derivatives that users can define as necessary for its applications [22]. Not only the degree of the spline function, but other parameters such as a number of polynomial pieces connecting points (known as knots), the position of the knots and the free coefficients of the spline function are the user's choice [23]. The most commonly-used spline is a cubic spline, of order 3. A cubic spline function is a piecewise cubic polynomial with continuous second derivatives and is smoothest among all functions in the sense that it has a minimal integrated squared second derivative. Moreover, it is easily fitted using linear least squares regression [24]. The cubic spline function is widely used for smoothing data in various fields of study such as interactive computer graphics [24,25], real-time digital signal processing [26][27][28] and satellite-based time series data [29][30][31][32][33][34]. The main advantages of the spline fit are that the first and second derivatives are continuous [24], including its simple calculation, good stability, high precision and smoothness [30]. The gradients derived from cubic spline functions are smoothly joined parabolas, not the abruptly joined straight line segments characteristic of parabolic spline smoothing [35].
As MODIS LST data are collected over time, they fluctuate over the season. The seasonal pattern is assumed to be the same for every year, and the change in other parameters such as the land cover change that has a direct or indirect effect on the LST data is consistently increasing or decreasing. A sudden natural disaster, for instance a forest fire or landslide or tsunami that could cause the instantaneous change of data behaviour, is excluded in this model consideration. Ideally, the model should provide continuous seasonal patterns for each day of the year within a year. These considerations suggest that the most appropriate model is a cubic spline with specific boundary conditions that ensure smooth periodicity. The model and all of the parameters are illustrated in Figure 2. The formula for a cubic spline function is: Where denotes time, ⋯ are specified knots and − is − for and 0 otherwise. is the cubic spline function for and ; is the cubic spline function at ; and is the cubic spline function at .
In this case, we need a special annual periodic boundary condition that requires the quadratic and cubic coefficients of the spline function to be 0 for and , where and are the location of the first knot and the last knot, respectively. The functions (between and ) and (between and ) are linear functions with the same slope. Thus, the following two equations must be satisfied: s(t) is the cubic spline function for t > T 0 and t < T p ; s(T 0 ) is the cubic spline function at t < t 1 ; and s T p is the cubic spline function at t > t p .
In this case, we need a special annual periodic boundary condition that requires the quadratic and cubic coefficients of the spline function to be 0 for t > t p and t < t 1 , where t 1 and t p are the location of the first knot and the last knot, respectively. The functions s(T 0 ) (between T 0 and t 1 ) and s T p Remote Sens. 2017, 9,1254 6 of 17 (between t p and T p ) are linear functions with the same slope. Thus, the following two equations must be satisfied: To ensure a smooth and continuous curve at the start and the endpoint, the slopes at t 1 and t p need to be imposed, and they need to be equal. However, the values at the end and the start points are not necessarily matched in order to allow the trend to appear in the function. When considering at T p , the function beyond t p is linear as the function before t 1 . Therefore, s(T 0 ) = s T p in all derivatives of the cubic polynomial (n − 1, where n is the degree of the spline function). Thus, Incorporating two constraints (Equations (2) and (3)) into the equation above, therefore, In this case, the cubic spline function with degree 3 needs m + n + 1 free coefficients, where m is the number of knots (each polynomial piece has n + 1 coefficients, and the continuity conditions introduce n bands per knot, leaving (m + 1)(n + 1) − nm = m + n + 1 free coefficients), then the cubic spline function can be rewritten as: where: This cubic spline function was used in this study to capture the seasonal component in LST time series within the annual period. Note that this cubic spline function has a dissimilar boundary condition from the "periodic end condition" when the cubic spline function satisfies the following condition [25]: S t p = S(t 1 ), S t p = S (t 1 ), S t p = S (t 1 ) and S (t 1 ) = S t p = 0 for a natural cubic spline.

Knots Selection
Choosing the location and number of knots for smoothing the spline curve is a critical issue. However, there are current strategies for the optimal selection of those parameters based on procedures to add knots in intervals where the residuals show trends as indicated by autocorrelation or in intervals where the residuals are inadmissibly significant [23]. The number of knots corresponds to the number of linear segments. More knots would result in a smoother covariance surface, but would require more parameters to estimate. Locations of knots should correspond to the pattern of changes along the curve. Rapid changes require a larger number of knots in a given region, whereas small changes would need a sparser distribution of knots [36].
There are, practically, two distinctive philosophical approaches in choosing the optimal number and placement of knots. The first one is the subjective choice, for which one could place more knots where the data vary the most rapidly, and fewer knots where they are more stable. It is one of the advantages of using cubic spline that the data can be obtained at unequal intervals [37]. In contrast, there could be as few knots as possible to ensure that there are at least 4 or 5 points per interval for the cubic spline [22]. This restriction corresponds to the usual practice of keeping the number of parameters as small as possible since the flexibility of the spline function could over-fit. The second approach is the automatic method where the optimal knots are chosen directly by the data. This objective approach would be to use an iterative algorithm for the free knot spline [38] where the spline parameters are automatically estimated using uniform quartiles to find the knot positions and generalized cross-validation [39,40] to quantify the appropriate number of knots to be used. The basic idea of leave-one-out cross-validation is to omit some portion of data randomly and fit the model using the remaining data, then to use the fitted model to make predictions on the omitted data. Finally, the errors are computed (called predicted residuals). The step is repeated multiple times, and then, the Mean Square Error (MSE) is calculated from the predicted residuals (called cross-validated errors).
To select the optimal number and placement of knots to be used with LST data in this study, we compromised between goodness of fit and smoothness of the seasonal curve. Figure 3 shows the result of the adjusted r-squared and cross-validated error, respectively, when we varied the number of knots in different scenarios of their position. The criteria of three separate cases, which are subjected to the knots placement method, are (1) knots are placed at an equal interval (denoted by Case 1), (2) knots are placed at uniform quartiles (denoted by Case 2) and (3) knots are placed at a "best practice" locations (denoted by Case 3). The "best practice" knot locations are defined based on the actual behaviour of the LST data. LST data, especially in the tropical region, generally have high proportions of missing and highly variable data during the rainy season. We fix four knots at Julian Days 10, 115, 310 and 350 to avoid misplacing knots in the high variation and low distribution of data segmentation and to ensure that at least two knots are located near the beginning and the end of the annual range. Then, we allocate the rest between those four knots by equal intervals to cover the rapid change of the curve. It is important that the knots be located where the data are stable and not too far from the beginning and the end of the annual data series to ensure a continuous seasonal pattern between years.
Remote Sens. 2017, 9, 54 7 of 19 using the remaining data, then to use the fitted model to make predictions on the omitted data. Finally, the errors are computed (called predicted residuals). The step is repeated multiple times, and then, the Mean Square Error (MSE) is calculated from the predicted residuals (called cross-validated errors).
To select the optimal number and placement of knots to be used with LST data in this study, we compromised between goodness of fit and smoothness of the seasonal curve. Figure 3 shows the result of the adjusted r-squared and cross-validated error, respectively, when we varied the number of knots in different scenarios of their position. The criteria of three separate cases, which are subjected to the knots placement method, are (1) knots are placed at an equal interval (denoted by Case 1), (2) knots are placed at uniform quartiles (denoted by Case 2) and (3) knots are placed at a "best practice" locations (denoted by Case 3). The "best practice" knot locations are defined based on the actual behaviour of the LST data. LST data, especially in the tropical region, generally have high proportions of missing and highly variable data during the rainy season. We fix four knots at Julian Days 10, 115, 310 and 350 to avoid misplacing knots in the high variation and low distribution of data segmentation and to ensure that at least two knots are located near the beginning and the end of the annual range. Then, we allocate the rest between those four knots by equal intervals to cover the rapid change of the curve. It is important that the knots be located where the data are stable and not too far from the beginning and the end of the annual data series to ensure a continuous seasonal pattern between years.

LST Outlier Elimination
Since we fit the annual climatological profile using spline functions, all 15-year data series were re-ordered into Julian days. Extreme values can be detected for each group of data at each Julian day using the Interquartile Range (IQR). However, this outlier detection method may not adequately filter out the extreme values where there is less data density and the LST values separate from other values in a specific Julian day. The simple additional anomaly detection using the 3-sigma rule could be used if the entire time series were normally distributed. Any LST value that exceeds three standard deviations from the norm (±3σ) was thus marked as an outlier.

Weighted Least Squares Regression
Due to the negative effects on a number of phenomena that contaminate the measured LST signal, including clouds, atmospheric perturbations, variable illumination and viewing geometry [37], which tend to cause fluctuation in the LST values, weighted least squares regression can be applied to penalize negative errors instead of using an ordinary least squares regression. The

LST Outlier Elimination
Since we fit the annual climatological profile using spline functions, all 15-year data series were re-ordered into Julian days. Extreme values can be detected for each group of data at each Julian day using the Interquartile Range (IQR). However, this outlier detection method may not adequately filter out the extreme values where there is less data density and the LST values separate from other values in a specific Julian day. The simple additional anomaly detection using the 3-sigma rule could be used if the entire time series were normally distributed. Any LST value that exceeds three standard deviations from the norm (±3σ) was thus marked as an outlier.

Weighted Least Squares Regression
Due to the negative effects on a number of phenomena that contaminate the measured LST signal, including clouds, atmospheric perturbations, variable illumination and viewing geometry [37], which tend to cause fluctuation in the LST values, weighted least squares regression can be applied to penalize negative errors instead of using an ordinary least squares regression. The method of Weighted Least Squares (WLS) regression can be utilized when the Ordinary Least Squares (OLS) assumption of the constant variation in the errors is violated and is less sensitive to large changes in small parts of the observations. The WLS method is adopted in this study as a way of dealing with heteroscedastic errors. WLS can be achieved by minimizing the weighted sum of squares, where w i are weights inversely proportional to the variance of the residuals (i.e., w i = 1/σ 2 i ). This includes OLS as the special case where all the weights w i = 1.
In this study, we set the rule as extreme LST values are given zero weight (w i = 0) and give a score range from 4-1 according to the quality control data [5,6]. We give a score of 4 when average LST error ≤ 1 K, a score of 3 for average LST error ≤ 2 K, a score of 2 for average LST error ≤ 3 K and a score of 1 for average LST error > 3 K.

Placement and Number of Knots
From the optimal knots sensitivity testing result, as shown in Figure 3, the "best practice" approach for knots placement (Case 3) could enhance the periodic cubic spline model fitting, which gives high adjusted r-squared and low cross-validated error in all sample LST time series when the number of knots is larger. The Case 3 lines have constant values of the goodness of fit and error after using eight or nine knots in most of the sample data series. Increasing the number of knots gives no better fit to data and tends to have more errors when using more than 10 knots. Additional knots contribute to the roughness of a curve that appears in highly fluctuating LST data. Using eight knots as the "best practice" may be more appropriate in this case. We can achieve our aim to trade-off between suitable curve fitting and a smooth, good-looking curve. Moreover, this approach introduces fewer free coefficients in the cubic spline function, which ensures that data are not overfitted. Figure 4 demonstrates how the cubic spline curve tracks the pattern of the LST data in the annual period when choosing eight knots at the "best practice" location. The estimated seasonal component was calculated after eliminating all extreme LST values and applying WLS regression using the weighted conditions as described in Section 3.3. The adjusted r-squared values display how well the proposed cubic spline function with optimal knots and their appropriate allocation fitted the data. Although they are not considered to be the best fit, we cannot accomplish a better fit to the data due to the high variability from different years in the LST data. This is typical behaviour for satellite-based temperature data, especially in the tropical climate region. The weighted regression that considers the outliers and the bias from data measurement error produces a curve more closely reflecting the actual trend of the data. The proposed cubic spline function with the unique boundary condition produces a smooth seasonal curve when compared to the eight-day and monthly average MODIS LST, as shown in Figure 5. Averaging data in a specific period is the traditional approach to overcome the missing value issue and variability in the data. When averaging data within a short period, there is high fluctuation in seasonality due to a high variation in LST data. The smoothness of seasonality could be achieved when a longer span of data was used in averaging. However, this means the number of data is significantly Remote Sens. 2017, 9, 1254 9 of 17 reduced, as well. Figure 5 also shows the trend of seasonality from the urban and built-up site along with the trend of monthly surface air temperature measured from the meteorological station at the study site that represents the local weather. Although the surface skin temperature (T skin ) and the surface air temperature (T air ) are different measures with regard to both their physical meaning and their magnitude (an absolute value of LST is usually higher than the surface air temperature), the two temperatures are complementary in their contribution of valuable information in climate study [41]. LST has been used for mapping and estimating surface air temperature in the different parts of the globe [42][43][44][45]. The seasonal curve resulting from the cubic spline function shows a similar pattern and trend to the local monthly average air temperature.

Annual Seasonal Patterns of LST Time Series
Remote Sens. 2017, 9, 54 9 of 19 contribution of valuable information in climate study [41]. LST has been used for mapping and estimating surface air temperature in the different parts of the globe [42][43][44][45]. The seasonal curve resulting from the cubic spline function shows a similar pattern and trend to the local monthly average air temperature. The green bar presents the data distribution density. The "r-sq" label shows the adjusted r-squared value of the cubic spline function fitted to the data. The dark (black) dots indicate high-quality LST data, and the dots turn pale (grey) when the LST data are subjected to data quality control. The surface temperature is constant during the wet season, despite the high variation and less intensity of data during this time. It keeps a steady level until early December, which is a month after the formal end of the rainy season. This delay of time could be because high moisture is still stored in the ground during the rainy season, as well as the variation of seasons in different years. The surface temperature starts rising around the middle of December and touches the highest point around late February to the middle of March. The surface temperature then begins to decrease when the first rain of the southwest monsoon comes around late March, often followed by more rainfall. However, the regular local rainy season usually starts in April, when the amount of rainfall and the number of the rainy days are substantial. Figure 6a shows the annual seasonality of sample LST data from forest, agriculture and urban land cover sites. The temperature of the urban surface is the The crosses indicate all the outliers excluded from the cubic spline fitting, and the total number of detached observations is displayed as "zero weighted". The green bar presents the data distribution density. The "r-sq" label shows the adjusted r-squared value of the cubic spline function fitted to the data. The dark (black) dots indicate high-quality LST data, and the dots turn pale (grey) when the LST data are subjected to data quality control. contribution of valuable information in climate study [41]. LST has been used for mapping and estimating surface air temperature in the different parts of the globe [42][43][44][45]. The seasonal curve resulting from the cubic spline function shows a similar pattern and trend to the local monthly average air temperature. The crosses indicate all the outliers excluded from the cubic spline fitting, and the total number of detached observations is displayed as "zero weighted". The green bar presents the data distribution density. The "r-sq" label shows the adjusted r-squared value of the cubic spline function fitted to the data. The dark (black) dots indicate high-quality LST data, and the dots turn pale (grey) when the LST data are subjected to data quality control. The surface temperature is constant during the wet season, despite the high variation and less intensity of data during this time. It keeps a steady level until early December, which is a month after the formal end of the rainy season. This delay of time could be because high moisture is still stored in the ground during the rainy season, as well as the variation of seasons in different years. The surface temperature starts rising around the middle of December and touches the highest point around late February to the middle of March. The surface temperature then begins to decrease when the first rain of the southwest monsoon comes around late March, often followed by more rainfall. However, the regular local rainy season usually starts in April, when the amount of rainfall and the number of the rainy days are substantial. Figure 6a shows the annual seasonality of sample LST data from forest, agriculture and urban land cover sites. The temperature of the urban surface is the warmest in all months, while the forest surface is the coolest. The rising rate (the slope from Julian The surface temperature is constant during the wet season, despite the high variation and less intensity of data during this time. It keeps a steady level until early December, which is a month after the formal end of the rainy season. This delay of time could be because high moisture is still stored in the ground during the rainy season, as well as the variation of seasons in different years. The surface temperature starts rising around the middle of December and touches the highest point around late February to the middle of March. The surface temperature then begins to decrease when the first rain of the southwest monsoon comes around late March, often followed by more rainfall. However, the regular local rainy season usually starts in April, when the amount of rainfall and the number of the rainy days are substantial. Figure 6a shows the annual seasonality of sample LST data from forest, agriculture and urban land cover sites. The temperature of the urban surface is the warmest in all months, while the forest surface is the coolest. The rising rate (the slope from Julian Day 33-57 where the rising seasonality curve has maximum linearity during this period) of the urban impervious surface temperature during the drying out phase is approximately 0.126 • C/day. It is also higher than the rising temperature of the forest and agricultural area, which are approximately 0.079 and 0.068 • C/day, respectively. The difference between the maximum (at Julian Day 65) and minimum (at Julian Day 337) surface temperature during the dry season (∆T dry ) for the urban surface site is also higher than other locations. The ∆T dry of the urban site is around 6.03 • C, whereas the ∆T dry of the entire forest and agriculture land cover sites is around 4.03 and 4.89 • C, respectively. The urban and built-up land surface causes more variability of maximum and minimum surface temperature during the dry period. Figure 6b displays the ratio of surface temperature seasonality in urban and agricultural land cover area to the forest (S urban /S forest and S agric. /S forest ). It highlights the impact of urbanization and agriculture land use on natural surface vegetation.
Remote Sens. 2017, 9, 54 10 of 19 impervious surface temperature during the drying out phase is approximately 0.126 °C/day. It is also higher than the rising temperature of the forest and agricultural area, which are approximately 0.079 and 0.068 °C/day, respectively. The difference between the maximum (at Julian Day 65) and minimum (at Julian Day 337) surface temperature during the dry season (ΔTdry) for the urban surface site is also higher than other locations. The ΔTdry of the urban site is around 6.03 °C, whereas the ΔTdry of the entire forest and agriculture land cover sites is around 4.03 and 4.89 °C, respectively. The urban and built-up land surface causes more variability of maximum and minimum surface temperature during the dry period. Figure 6b displays the ratio of surface temperature seasonality in urban and agricultural land cover area to the forest (Surban/Sforest and Sagric./Sforest). It highlights the impact of urbanization and agriculture land use on natural surface vegetation.

Application to Daytime MODIS LST Data
We applied the proposed seasonal extraction method to the morning and afternoon LST data from MOD11A2 and MYD11A2 products. The data observed in the daytime MOD11A2 data (10:30 a.m. overpass-time) starts from 5 March 2000, which is two years earlier than the daytime MYD11A2 data (1:30 p.m. overpass-time). However, for the simplicity of data processing and comparing the result, we decided to select data observation from 4 July 2002-26 June 2017, which are 690 observations in total (46 observations each year for precisely 15 years). There are 611 MODIS LST pixels within Phuket Island excluding one pixel situated over the Bangwad dam. The seasonal component was calculated in a series by series (pixel by pixel) procedure. There is no distinction of the seasonal pattern found in the daytime MODIS LST datasets since our study area is considered as local scale with the same local weather and seasonal effect throughout the area. Therefore, we selected eight knots and placed them as the "best practice" position, which gives the best goodness of fit in the optimal knots process for both 10:30 a.m. and 1:30 p.m. LST datasets, to all LST time series located over the Phuket Island.
Once the seasonality is identified, the seasonal effect can be eliminated from the data series. This process is called seasonal adjustment, or deseasonalizing. Since the seasonal component from the spline-based seasonal extraction process is the additive model [46], the seasonally-adjusted series was determined by subtracting the LST series from the seasonal component and then adding the differentiation of averaged seasonal component and averaged seasonal subtracted LST, which is denoted as:

Application to Daytime MODIS LST Data
We applied the proposed seasonal extraction method to the morning and afternoon LST data from MOD11A2 and MYD11A2 products. The data observed in the daytime MOD11A2 data (10:30 a.m. overpass-time) starts from 5 March 2000, which is two years earlier than the daytime MYD11A2 data (1:30 p.m. overpass-time). However, for the simplicity of data processing and comparing the result, we decided to select data observation from 4 July 2002-26 June 2017, which are 690 observations in total (46 observations each year for precisely 15 years). There are 611 MODIS LST pixels within Phuket Island excluding one pixel situated over the Bangwad dam. The seasonal component was calculated in a series by series (pixel by pixel) procedure. There is no distinction of the seasonal pattern found in the daytime MODIS LST datasets since our study area is considered as local scale with the same local weather and seasonal effect throughout the area. Therefore, we selected eight knots and placed them as the "best practice" position, which gives the best goodness of fit in the optimal knots process for both 10:30 a.m. and 1:30 p.m. LST datasets, to all LST time series located over the Phuket Island.
Once the seasonality is identified, the seasonal effect can be eliminated from the data series. This process is called seasonal adjustment, or deseasonalizing. Since the seasonal component from the spline-based seasonal extraction process is the additive model [46], the seasonally-adjusted series was determined by subtracting the LST series from the seasonal component and then adding the differentiation of averaged seasonal component and averaged seasonal subtracted LST, which is denoted as: where S is the seasonal component and i is the observation ordered in the data series. Then, the LST data had become a stationary time series. Subsequently, detecting the lag one autocorrelation in the seasonal stationary data using first-order autoregression model, written as AR(1) [47,48], is required. If the autocorrelation is found, then it can be eliminated using linear filtering [49]. This procedure is vital to time series data analysis to meet the assumption of random error terms (independent errors), especially when fitting the linear trend using the OLS regression model. Autocorrelation in data will cause standard errors of regression coefficients to underestimate the true standard error and produce narrow confidence intervals. We found that the maximum lag one autocorrelation in 10:30 a.m. and 1:30 p.m. LST data is 0.26 and 0.24, respectively. There is around 25% of the time series that requires being filtered using a linear filter (reverse subtraction) to remove the autocorrelation influence. Finally, the surface temperature trend of the sufficient number of LST observations was directly estimated using a linear regression model, and the p-value can be obtained to weight the strength of the evidence.

Spatial Distribution of Daytime LST Trends
To observe the linear trend for each LST time series, the linear least squares estimation can be used to fit the data with any function of the form Y i = α + βx i + i for i = 1, 2, . . . , N. A method is a general approach that estimates α and β by minimizing the sum of squared deviations observed from expected responses. The linear model can be used over long ranges of data, but it is assumed to have poor extrapolation properties and sensitivity to outliers. When dividing the β parameter with the ∆x in the LST time series, we can obtain the level of temperature change during the ∆x period. Although we have data for 15 years and we can find the linear trend of LST over the 15-year time, a "decadal" trend of temperature change seems to be more appropriate and more widely used in the scientific community. The result of 10-year period LST trends can be depicted as thematic maps. Figure 7a The areas that have warming surface temperatures higher than 1 • C per decade with high statistical significance (p-value < 0.01) appear mostly in the agriculture lands (~54% for 10:30 a.m. and~76% for 1:30 p.m. LST data) and urban area (~20% for 10:30 a.m. and~10% for 1:30 p.m. LST data).

Trend of Daytime LST in Different LULC Types
We further investigated the overall trend of daytime LST in all LULC types in the entire study area. To discover the overall LST trend for more extensive area, we used the Generalized Estimating Equations (GEEs) [50,51]. GEE is a general statistical approach to fit a marginal model for longitudinal/clustered data analysis [52]. The GEE method takes clustered correlations into account by means of keeping high correlations within the group and fewer correlations between the group. This statistical model is ideal for time series analysis of multiple series such as temporal LST from many locations that have a spatial correlation with other nearby locations. Figure 8 shows the overall trends of land surface temperature in various LULC types and areas of 10:30 a.m. and 1:30 p.m. MODIS LST datasets. The LST data have been grouped into five main LULC categories and four administrative regions. In each LULC group, there is the combination of uniform and mixed land cover types in a 1-km 2 LST pixel area. In the study area, there are 0.24 and 0.12 • C increases in the surface temperature in a decade for the 10:30 a.m. and 1:30 p.m. datasets, respectively, and those uprising trends are statistically significant. All diurnal surface temperature trends are warmer in all various land cover types and areas. The rising surface temperature trend in the morning is higher than in the afternoon. The overall LST trends, both 10:30 a.m. and 1:30 p.m. overpass-time, in the miscellaneous land (total 39 time series) increase more than in other areas. The second warming trend is found in the urban area (total of 62 time series), followed by the forest (total of 92 time series) and the agricultural (total of 389 time series) land cover areas. However, the afternoon LST trend in a cultivated area is more significant than in the forest area. Moreover, LST trends in Phuket City and Mueang Phuket districts, which have an extreme density of asphalt streets and concrete buildings, are higher than Thalang and Kathu districts, which are mainly para-rubber plantations and upland evergreen rain forest, respectively.  the agricultural (total of 389 time series) land cover areas. However, the afternoon LST trend in a cultivated area is more significant than in the forest area. Moreover, LST trends in Phuket City and Mueang Phuket districts, which have an extreme density of asphalt streets and concrete buildings, are higher than Thalang and Kathu districts, which are mainly para-rubber plantations and upland evergreen rain forest, respectively.

Discussion
The daytime MYD11A2 subset of data over the study area contained a large percentage of highquality data, which was greater than 67% of the total observations. There were 33% of missing observations from the selected forest site and 30% missing observations from the sample agricultural and urban sites, as shown in Figure 4. The primary cause of missing data is due to a cloud and other sensor-related artefacts resulting from sun-sensor geometries and aerosols. The LST retrieval in C6 daily MODIS LST product series (MOD11A1 and MYD11A1) is retrieved using the generalized splitwindow algorithm under clear-sky conditions (at a confidence ≥ 95% over land that has an altitude ≤ 2000 m or ≥ 66% over land that has an altitude >2000 m and at a confidence of ≥ 66% over lakes) [6,53]. During the C6 Level-3 data processing, the cloud-contamination LSTs are removed by using constraints on the temporal variations in clear-sky LSTs over a period of 32 days by the refinement implemented in Product Generation Executive code (PGE16) [26], and they are indicated as zero (as missing value) in the obtained C6 eight-day composite MODIS LST dataset; although the proportion of gaps produced by missing daily observations is reduced mainly because of the compositing over eight-day periods of the daily LST product. Moreover, LST values that pass the cloud filter could still be subjected to a degree of cloud contamination causing data quality issues. The information about

Discussion
The daytime MYD11A2 subset of data over the study area contained a large percentage of high-quality data, which was greater than 67% of the total observations. There were 33% of missing observations from the selected forest site and 30% missing observations from the sample agricultural and urban sites, as shown in Figure 4. The primary cause of missing data is due to a cloud and other sensor-related artefacts resulting from sun-sensor geometries and aerosols. The LST retrieval in C6 daily MODIS LST product series (MOD11A1 and MYD11A1) is retrieved using the generalized split-window algorithm under clear-sky conditions (at a confidence ≥95% over land that has an altitude ≤2000 m or ≥66% over land that has an altitude >2000 m and at a confidence of ≥66% over lakes) [6,53]. During the C6 Level-3 data processing, the cloud-contamination LSTs are removed by using constraints on the temporal variations in clear-sky LSTs over a period of 32 days by the refinement implemented in Product Generation Executive code (PGE16) [26], and they are indicated as zero (as missing value) in the obtained C6 eight-day composite MODIS LST dataset; although the proportion of gaps produced by missing daily observations is reduced mainly because of the compositing over eight-day periods of the daily LST product. Moreover, LST values that pass the cloud filter could still be subjected to a degree of cloud contamination causing data quality issues. The information about the quality of measured LST data is flagged in the quality control (QC) for LST and emissivity data. From the sample time series, the problem of missing data, which is indicated by the green bar in Figure 4, occurs more during the wet period of the year. High variability of surface temperatures is also found during the wet season.
Despite the missing value issues and the fluctuation of LST values, the particular boundary restricted cubic spline function with optimal knots successfully achieved the extraction of annual LST seasonality. The pattern of seasonality from the three study sites was generally similar, except for the magnitude of seasonality, which varied for the different land cover types. The impervious surfaces such as asphalt road, concrete ground/building and manmade rooftop materials in the urban area absorb the Sun's energy during the day more and faster than natural landscapes such as soil and vegetation. Thus, the overall magnitude of LST seasonality from the urban surface is the highest whereas that from the natural forest is the lowest. The cross-correlation coefficients, which are for comparing the temporal structure and mutual information of the signals, were between 0.877 and 0.960, and there were insignificant time shifts among the three seasonal profiles. Figure 9 displays the plot of the cross-correlation function that shows that the seasonalities from natural forest and urban sites have the most similarity. The graph of cross-correlation function indicates two identical series when the graphs on the positive and negative lag are symmetric. Table 3 also shows the normalized cross-correlation coefficients with time shift between the seasonality of the three different study sites. Both seasonal patterns from natural forest and agricultural sites highly correlate (0.933) with the time shift of two observation points (16 days). There is the highest correlation (0.963) of seasonality with the time shift of one observation point (eight days) from the urban and agricultural sites, whereas there is no time shift between seasonality from the forest and urban sites with cross-correlation at 0.960. Although the seasonal curve produced by the proposed cubic spline function is predominantly resulted from the optimal number of knots and their placement, if the same interval of knots is applied for the entire LST time series at such a local scale, it will extract similar seasonality of LST according to the local weather and season that persistently affect the surface temperature. In the case of extending the method over broader area coverage, such as at regional or global scales, the investigation of the different seasonal patterns may need to be conducted. The different optimal numbers of knots and their placements may need to be applied to the LST time series in the diverse regions.
whereas that from the natural forest is the lowest. The cross-correlation coefficients, which are for comparing the temporal structure and mutual information of the signals, were between 0.877 and 0.960, and there were insignificant time shifts among the three seasonal profiles. Figure 9 displays the plot of the cross-correlation function that shows that the seasonalities from natural forest and urban sites have the most similarity. The graph of cross-correlation function indicates two identical series when the graphs on the positive and negative lag are symmetric. Table 3 also shows the normalized cross-correlation coefficients with time shift between the seasonality of the three different study sites. Both seasonal patterns from natural forest and agricultural sites highly correlate (0.933) with the time shift of two observation points (16 days). There is the highest correlation (0.963) of seasonality with the time shift of one observation point (eight days) from the urban and agricultural sites, whereas there is no time shift between seasonality from the forest and urban sites with cross-correlation at 0.960. Although the seasonal curve produced by the proposed cubic spline function is predominantly resulted from the optimal number of knots and their placement, if the same interval of knots is applied for the entire LST time series at such a local scale, it will extract similar seasonality of LST according to the local weather and season that persistently affect the surface temperature. In the case of extending the method over broader area coverage, such as at regional or global scales, the investigation of the different seasonal patterns may need to be conducted. The different optimal numbers of knots and their placements may need to be applied to the LST time series in the diverse regions. Figure 9. Plot of cross-correlation function among the seasonality of the three selected study sites. The blue dashed lines represent a 95% confidence interval at 0.3. Table 3. Cross-correlation coefficients among the seasonality of the three selected study sites.  The blue dashed lines represent a 95% confidence interval at ±0.3 Table 3. Cross-correlation coefficients among the seasonality of the three selected study sites. One particular statistical method that delivers an excellent fit to data and produces a seasonal curve that reflects the real seasonality is the WLS regression, which had been applied to fit a cubic spline function to the data instead of using OLS regression. The main advantage of weighted regression is that it works well when successive false lows are rare so that local valleys occur separately [54,55]. However, it is not appropriate to use when several false lows occur in sequence because it results in false local peaks, which bias the estimated value downwards. This approach might not be suitable for daily data, and its applicability will depend on the frequency of cloud contamination and the strength of seasonality. In this study, we applied this robust weighted regression on LST data annually where each point of observation (46 observations for each year) contains a minimum of one value to a maximum of 15 values. Hence, the behaviour of the data may not cause any drawback of this model for this circumstance. Although there is a possibility that there are missing values at particular Julian days for the entire 15 years of LST observations, which could be found during the wet season, this occurrence is infrequent and does not consecutively happen in the dataset.

Conclusions
In this paper, we presented a simple method to extract annual seasonality in MODIS LST time series using a spline-based approach. The results indicate a high potential for using a cubic spline function with annual periodic boundary conditions, including an appropriate number of knots and their placement, to extract the seasonal component in MODIS time series under the assumption that the seasonal pattern is the same for every year and the change of other parameters that have a direct and indirect effect on the data is monotonic. The cubic spline function satisfactorily extracted the seasonal pattern, even when there were many successive missing values in the data series, as well as under the condition of high interannual data variability. Moreover, cubic splines efficiently modelled the MODIS LST data that were influenced by spatial autocorrelation, since the proposed method can be applied to each LST time series (pixel) in any period and scale of area coverage. It could be considered as an alternative method to describe the seasonality of remotely-sensed environmental and climate data, which is often difficult to achieve due to the continuous missing value issues that may not be appropriately estimated using interoperation methods. The proposed method can be adopted as the first procedure for time series analysis of temporal LST data. Seasonally-adjusted or deseasonalized data can be directly calculated by subtracting the raw data from the seasonal component, which is produced using the proposed method. Subsequently, the linear and polynomial trends can be used to estimate and analyse for forecasting purpose. Since time series forecasting involves taking models fit on historical data and using them to predict future observations, appropriate time series decomposition may result in effective predicting performance. Modelling these time series components using traditional statistical methods is the most common and efficient way to forecast the future values. We also demonstrated the application of the proposed method by examining decadal daytime MODIS LST linear trends for the LULC types and areas in spatial and categorical perspectives. The proposed method, as well as the example of its application are beneficial for a long-term Earth surface heat studies such as the Urban Heat Island (UHI) and climate change at any scale.