#### 2.3.1. Cubic Spline Function

A k-th order spline model is a well-known numerical curve-fitting function whose derivatives users can define as necessary for their applications [22]. Not only the degree of the spline function but also other parameters, such as the number of polynomial pieces joined at connecting points (known as knots), the position of the knots and the free coefficients of the spline function, are the user’s choice [23]. The most commonly used spline is the cubic spline, of order 3. A cubic spline function is a piecewise cubic polynomial with continuous second derivatives and is smoothest among all functions in the sense that it has a minimal integrated squared second derivative. Moreover, it is easily fitted using linear least squares regression [24]. The cubic spline function is widely used for smoothing data in various fields of study, such as interactive computer graphics [24,25], real-time digital signal processing [26,27,28] and satellite-based time series data [29,30,31,32,33,34]. The main advantages of the spline fit are that its first and second derivatives are continuous [24], together with its simple calculation, good stability, high precision and smoothness [30]. The gradients derived from cubic spline functions are smoothly joined parabolas, not the abruptly joined straight-line segments characteristic of parabolic spline smoothing [35].

Because MODIS LST data are collected over time, they fluctuate with the seasons. The seasonal pattern is assumed to be the same every year, and changes in other factors that directly or indirectly affect the LST data, such as land cover change, are assumed to be consistently increasing or decreasing. Sudden natural disasters, for instance a forest fire, landslide or tsunami, that could cause an instantaneous change in the behaviour of the data are excluded from this model. Ideally, the model should provide a continuous seasonal pattern for each day of the year. These considerations suggest that the most appropriate model is a cubic spline with specific boundary conditions that ensure smooth periodicity. The model and all of its parameters are illustrated in Figure 2. The formula for a cubic spline function is:

$$s\left(t\right)=a+bt+{\sum}_{k=1}^{p}{c}_{k}{\left(t-{t}_{k}\right)}_{+}^{3}\qquad (1)$$

where $t$ denotes time, ${t}_{1}<{t}_{2}<\cdots <{t}_{p}$ are specified knots and ${\left(t-x\right)}_{+}$ is $\left(t-x\right)$ for $t>x$ and 0 otherwise.
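As a minimal sketch of how such a function can be evaluated, the snippet below assumes the truncated power form $s(t)=a+bt+\sum_{k}c_{k}(t-t_{k})_{+}^{3}$, which matches the derivative used later in this section. The function names, knot positions and coefficient values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def truncated_cubic(t, knot):
    """Return (t - knot)_+^3: the cubed difference if t > knot, else 0."""
    return (t - knot) ** 3 if t > knot else 0.0


def cubic_spline(t, a, b, c, knots):
    """Evaluate s(t) = a + b*t + sum_k c[k] * (t - knots[k])_+^3."""
    if len(c) != len(knots):
        raise ValueError("need one coefficient per knot")
    return a + b * t + sum(ck * truncated_cubic(t, tk) for ck, tk in zip(c, knots))


# Hypothetical knots (days of year) and coefficients, for illustration only.
knots = [10.0, 115.0, 310.0, 350.0]
a, b, c = 25.0, 0.01, [1e-6, -2e-6, 3e-6, -2e-6]

before_first_knot = cubic_spline(5.0, a, b, c, knots)   # only a + b*t contributes
after_first_knot = cubic_spline(20.0, a, b, c, knots)   # first knot term is active
```

Before the first knot every truncated term vanishes, so the function reduces to the straight line $a+bt$; each knot then switches on one additional cubic term.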

In this case, we need a special annual periodic boundary condition that requires the quadratic and cubic coefficients of the spline function to be 0 for $t>{t}_{p}$ and $t<{t}_{1}$, where ${t}_{1}$ and ${t}_{p}$ are the locations of the first and last knots, respectively. The functions $s\left({T}_{0}\right)$ (between ${T}_{0}$ and ${t}_{1}$) and $s\left({T}_{p}\right)$ (between ${t}_{p}$ and ${T}_{p}$) are linear functions with the same slope. Thus, the following two equations must be satisfied:

$${\sum}_{k=1}^{p}{c}_{k}=0\qquad (2)$$

$${\sum}_{k=1}^{p}{c}_{k}{t}_{k}=0\qquad (3)$$

To ensure a smooth and continuous curve at the start and end points, the slopes at ${t}_{1}$ and ${t}_{p}$ must be constrained to be equal. The values at the start and end points, however, are not required to match, so that a trend can appear in the function. At ${T}_{p}$, the function beyond ${t}_{p}$ is linear, as is the function before ${t}_{1}$. Therefore, $s\left({T}_{0}\right)$ and $s\left({T}_{p}\right)$ must agree in all derivatives of the cubic polynomial (up to order $n-1$, where $n$ is the degree of the spline function).

Thus, ${s}^{\prime}\left({T}_{0}\right)={s}^{\prime}\left({T}_{p}\right)$, which makes $b=b+3{\sum}_{k=1}^{p}{c}_{k}{\left({T}_{p}-{t}_{k}\right)}^{2}$ and consequently, ${{T}_{p}}^{2}{\sum}_{k=1}^{p}{c}_{k}-2{T}_{p}{\sum}_{k=1}^{p}{c}_{k}{t}_{k}+{\sum}_{k=1}^{p}{{t}_{k}}^{2}{c}_{k}=0$. Incorporating the two constraints (Equations (2) and (3)) into the equation above removes its first two terms and therefore gives:

$${\sum}_{k=1}^{p}{c}_{k}{{t}_{k}}^{2}=0\qquad (4)$$
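The effect of these constraints can be checked numerically. The sketch below assumes the truncated power form $s(t)=a+bt+\sum_{k}c_{k}(t-t_{k})_{+}^{3}$ and uses illustrative values that are not from the study: four equally spaced knots with third-difference coefficients, chosen only because they make $\sum c_{k}$, $\sum c_{k}t_{k}$ and $\sum c_{k}t_{k}^{2}$ all zero.

```python
def spline(t, a, b, c, knots):
    """s(t) = a + b*t + sum_k c[k] * max(t - knots[k], 0)^3."""
    return a + b * t + sum(ck * max(t - tk, 0.0) ** 3 for ck, tk in zip(c, knots))


def slope(t, a, b, c, knots, h=1e-3):
    """Central-difference estimate of s'(t)."""
    return (spline(t + h, a, b, c, knots) - spline(t - h, a, b, c, knots)) / (2 * h)


# Illustrative values: four equally spaced knots and third-difference
# coefficients, which satisfy all three constraints exactly.
knots = [0.0, 1.0, 2.0, 3.0]
c = [1.0, -3.0, 3.0, -1.0]
a, b = 2.0, 0.5

# sum c_k, sum c_k t_k and sum c_k t_k^2 (all expected to be zero)
constraints = [sum(ck * tk ** power for ck, tk in zip(c, knots)) for power in (0, 1, 2)]

slope_before = slope(-1.0, a, b, c, knots)            # left of the first knot
slope_after = slope(5.0, a, b, c, knots)              # right of the last knot
offset = spline(5.0, a, b, c, knots) - (a + b * 5.0)  # vertical shift of the end line
```

In this example both end segments have slope $b$, while the line beyond the last knot is shifted vertically relative to $a+bt$. This is the behaviour described above: equal slopes at both ends, with values that need not match.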

In general, a spline function of degree $n$ (here $n=3$) with $m$ knots has $m+n+1$ free coefficients: each of the $m+1$ polynomial pieces has $n+1$ coefficients, and the continuity conditions impose $n$ constraints per knot, leaving $\left(m+1\right)\left(n+1\right)-nm=m+n+1$ free coefficients. The cubic spline function can then be rewritten as:

where:

This cubic spline function was used in this study to capture the seasonal component of the LST time series within the annual period. Note that its boundary condition differs from the “periodic end condition”, under which the cubic spline function satisfies [25] $S\left({t}_{p}\right)=S\left({t}_{1}\right)$, ${S}^{\prime}\left({t}_{p}\right)={S}^{\prime}\left({t}_{1}\right)$ and ${S}^{\prime\prime}\left({t}_{p}\right)={S}^{\prime\prime}\left({t}_{1}\right)$, and from that of a natural cubic spline, for which ${S}^{\prime\prime}\left({t}_{1}\right)={S}^{\prime\prime}\left({t}_{p}\right)=0$.

#### 2.3.2. Knots Selection

Choosing the number and location of knots for the smoothing spline curve is a critical issue. There are, however, established strategies for the optimal selection of these parameters, based on procedures that add knots in intervals where the residuals show trends, as indicated by autocorrelation, or in intervals where the residuals are unacceptably large [23]. The number of knots corresponds to the number of linear segments. More knots result in a smoother covariance surface but require more parameters to be estimated. The locations of knots should correspond to the pattern of changes along the curve: rapid changes require a larger number of knots in a given region, whereas small changes need a sparser distribution of knots [36].

In practice, there are two distinct approaches to choosing the optimal number and placement of knots. The first is subjective choice: one can place more knots where the data vary most rapidly and fewer knots where they are more stable. One advantage of the cubic spline is that the data can be obtained at unequal intervals [37]. Alternatively, one can use as few knots as possible, ensuring that there are at least 4 or 5 points per interval for the cubic spline [22]. This restriction corresponds to the usual practice of keeping the number of parameters as small as possible, since the flexibility of the spline function can lead to over-fitting. The second approach is the automatic method, in which the optimal knots are chosen directly from the data. This objective approach uses an iterative algorithm for the free-knot spline [38], in which the spline parameters are estimated automatically, using uniform quartiles to find the knot positions and generalized cross-validation [39,40] to determine the appropriate number of knots. The basic idea of cross-validation is to omit some portion of the data at random, fit the model using the remaining data and then use the fitted model to make predictions for the omitted data. The prediction errors (called predicted residuals) are then computed. This step is repeated multiple times, and the Mean Square Error (MSE) is calculated from the predicted residuals (called the cross-validated error).
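The cross-validation procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the generalized cross-validation used in the cited work: the two fitting functions (`fit_mean` and `fit_line`) are hypothetical stand-ins for spline fits with fewer or more knots, and the fold count and random seed are arbitrary assumptions.

```python
import random


def fit_mean(xs, ys):
    """Stand-in for an overly rigid model: predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Stand-in for a more flexible model: ordinary least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x: y_bar + b * (x - x_bar)


def cross_validated_mse(xs, ys, fit, n_folds=5, seed=0):
    """Mean squared predicted residual over randomly assigned folds."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)          # random assignment to folds
    squared_errors = []
    for fold in range(n_folds):
        held_out = set(order[fold::n_folds])    # omitted portion of the data
        train = [i for i in order if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors += [(ys[i] - predict(xs[i])) ** 2 for i in held_out]
    return sum(squared_errors) / len(squared_errors)


# Synthetic data: a linear signal plus a small deterministic wiggle.
xs = [float(i) for i in range(40)]
ys = [1.0 + 0.5 * x + 0.1 * (-1) ** i for i, x in enumerate(xs)]

cv_mean = cross_validated_mse(xs, ys, fit_mean)
cv_line = cross_validated_mse(xs, ys, fit_line)
```

The candidate with the smaller cross-validated error would be preferred. Setting `n_folds` equal to the number of observations gives the leave-one-out variant, in which each observation is omitted exactly once.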

To select the optimal number and placement of knots to be used with LST data in this study, we compromised between goodness of fit and smoothness of the seasonal curve. Figure 3 shows the adjusted r-squared and the cross-validated error, respectively, when the number of knots is varied under different knot placement scenarios. The three cases, which differ in the knot placement method, are: (1) knots are placed at equal intervals (denoted by Case 1), (2) knots are placed at uniform quartiles (denoted by Case 2) and (3) knots are placed at “best practice” locations (denoted by Case 3). The “best practice” knot locations are defined based on the actual behaviour of the LST data. LST data, especially in the tropical region, generally have a high proportion of missing values and are highly variable during the rainy season. We fix four knots at Julian Days 10, 115, 310 and 350 to avoid placing knots in segments where the data are highly variable and sparse, and to ensure that at least two knots are located near the beginning and the end of the annual range. We then allocate the remaining knots between those four knots at equal intervals to cover the rapid changes of the curve. It is important that the knots be located where the data are stable and not too far from the beginning and the end of the annual data series, to ensure a continuous seasonal pattern between years.
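The three placement schemes can be sketched as below. Several details are assumptions rather than the authors' procedure: the function names are hypothetical, "uniform quartiles" is read as evenly spaced quantiles of the observation days, the extra knots in Case 3 are placed between the two inner fixed knots (days 115 and 310), and the choice of seven knots is arbitrary.

```python
def equal_interval_knots(n_knots, first_day=1, last_day=365):
    """Case 1: interior knots at equal spacing across the year."""
    step = (last_day - first_day) / (n_knots + 1)
    return [first_day + step * (k + 1) for k in range(n_knots)]


def quantile_knots(observed_days, n_knots):
    """Case 2: knots at evenly spaced quantiles of the observation days."""
    days = sorted(observed_days)
    knots = []
    for k in range(1, n_knots + 1):
        position = k / (n_knots + 1) * (len(days) - 1)
        lower = int(position)
        upper = min(lower + 1, len(days) - 1)
        weight = position - lower               # linear interpolation weight
        knots.append(days[lower] * (1 - weight) + days[upper] * weight)
    return knots


def best_practice_knots(n_knots, fixed=(10, 115, 310, 350)):
    """Case 3: four fixed knots plus extra knots spread evenly in between.

    Assumption: the extra knots go between the two inner fixed knots
    (days 115 and 310); the text does not pin this down.
    """
    if n_knots < len(fixed):
        raise ValueError("need at least as many knots as fixed positions")
    extra = n_knots - len(fixed)
    start, end = fixed[1], fixed[2]
    step = (end - start) / (extra + 1)
    between = [start + step * (k + 1) for k in range(extra)]
    return sorted(list(fixed) + between)


case1 = equal_interval_knots(7)
case2 = quantile_knots(range(1, 366), 7)   # every day observed: same as Case 1
case3 = best_practice_knots(7)
```

With gaps in the observations (for example, during the rainy season), the Case 2 knots shift towards the days that actually have data, whereas Case 3 keeps its outer knots fixed regardless of data density.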

#### 2.3.4. Weighted Least Squares Regression

A number of phenomena contaminate the measured LST signal, including clouds, atmospheric perturbations, variable illumination and viewing geometry [37], and tend to cause fluctuations in the LST values. Weighted least squares regression can therefore be applied to penalize negative errors, instead of ordinary least squares regression. The method of Weighted Least Squares (WLS) regression can be used when the Ordinary Least Squares (OLS) assumption of constant variance in the errors is violated, and it is less sensitive to large changes in small parts of the observations. The WLS method is adopted in this study as a way of dealing with heteroscedastic errors. WLS can be achieved by minimizing the weighted sum of squares,

$${\sum}_{i=1}^{N}{w}_{i}{\left({y}_{i}-{\hat{y}}_{i}\right)}^{2}$$

where ${y}_{i}$ and ${\hat{y}}_{i}$ are the observed and fitted values of the $i$-th of $N$ observations, and ${w}_{i}$ are weights inversely proportional to the variance of the residuals (i.e., ${w}_{i}=1/{\sigma}_{i}^{2}$). This includes OLS as the special case where all the weights ${w}_{i}=1$.
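To illustrate how the weights change the fit, the sketch below minimizes the weighted sum of squares for a straight line, which stands in for the spline model to keep the example short. The function name and the synthetic data are assumptions for illustration only.

```python
def weighted_line_fit(ts, ys, ws):
    """Minimise sum_i w_i * (y_i - a - b*t_i)^2 and return (a, b)."""
    total = sum(ws)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    t_bar = sum(w * t for w, t in zip(ws, ts)) / total   # weighted means
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    stt = sum(w * (t - t_bar) ** 2 for w, t in zip(ws, ts))
    sty = sum(w * (t - t_bar) * (y - y_bar) for w, t, y in zip(ws, ts, ys))
    b = sty / stt
    return y_bar - b * t_bar, b


# Synthetic series: y = 2 + 3t, with one contaminated (too cold) observation.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, -20.0, 11.0, 14.0]

ols = weighted_line_fit(ts, ys, [1.0] * 5)                   # OLS: all weights 1
wls = weighted_line_fit(ts, ys, [1.0, 1.0, 0.0, 1.0, 1.0])   # outlier gets w = 0
```

With equal weights the contaminated value pulls the intercept down, whereas giving it zero weight recovers the underlying line.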

In this study, extreme LST values are given zero weight (${w}_{i}=0$), and the remaining observations are given a score from 4 to 1 according to the quality control data [5,6]: a score of 4 when the average LST error is ≤ 1 K, a score of 3 when it is ≤ 2 K, a score of 2 when it is ≤ 3 K and a score of 1 when it is > 3 K.
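The scoring rule can be written as a small lookup. In this sketch the function name and the `is_extreme` flag are hypothetical: the text does not state how extreme values are identified, so the flag is simply passed in by the caller.

```python
def lst_weight(avg_error_k, is_extreme=False):
    """Map the average LST error (in kelvin) to a regression weight score."""
    if is_extreme:
        return 0          # extreme LST values are excluded from the fit
    if avg_error_k <= 1:
        return 4
    if avg_error_k <= 2:
        return 3
    if avg_error_k <= 3:
        return 2
    return 1              # average LST error > 3 K


weights = [lst_weight(e) for e in (0.5, 1.5, 2.5, 4.0)]
```

The resulting scores can be used directly as the weights ${w}_{i}$ in the weighted sum of squares.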