The Curve Estimation of Combined Truncated Spline and Fourier Series Estimators for Multiresponse Nonparametric Regression

: Nonparametric regression becomes a potential solution if the parametric regression assumption is too restrictive while the regression curve is assumed to be known. In multivariable nonparametric regression, the pattern of each predictor variable’s relationship with the response variable is not always the same; thus, a combined estimator is recommended. In addition, regression modeling sometimes involves more than one response, i.e., multiresponse situations. Therefore, we propose a new estimation method of performing multiresponse nonparametric regression with a combined estimator. The objective is to estimate the regression curve using combined truncated spline and Fourier series estimators for multiresponse nonparametric regression. The regression curve estimation of the proposed model is obtained via two ‐ stage estimation: (1) penalized weighted least square and (2) weighted least square. Simulation data with sample size variation and different error variance were applied, where the best model satisfied the result through a large sample with small variance. Additionally, the application of the regression curve estimation to a real dataset of human development index indicators in East Java Province, Indonesia, showed that the proposed model had better performance than uncombined estimators. Moreover, an adequate coefficient of determination of the best model indicated that the proposed model successfully explained the data variation.


Introduction
As one of the renowned methods of regression analysis, parametric regression has been used for many years in various scientific fields. However, some parametric regression assumptions are too restrictive, such as identifying a regression curve's shape with some prespecified functional form (e.g., linear, quadratic, or cubic) [1]. In real datasets, not all regression curves have a visible pattern due to the absence of relationship information between response and predictor variables. Therefore, in such scenarios, nonparametric regression analysis is more recommended [2]. Nonparametric regression is able to degrade the misspecification risk because data inherently represent the real shape of a regression curve without interference due to the researcher's subjectivity [2].
To date, researchers have investigated various functions for estimating the regression curve in nonparametric regression, such as spline [3][4][5][6], Fourier series [7][8][9], kernel [10][11][12], polynomial [13][14][15], and wavelet [16][17][18] functions. The present study explores the nonparametric regression approach to analyzing spatial data, i.e., the use of the nonparametric truncated spline function in geographically weighted regression [19,20]. In nonparametric regression with more than one predictor variable (multivariable), the pattern of the relationship between each predictor variable and the response variable is not always the same; therefore, in this study, we employ a nonparametric regression model with a combined estimator. Considering the concept of semiparametric regression, some studies have proposed combined estimator models to estimate the regression curve, such as spline and kernel functions [21][22][23]. Similarly, in [24], the researchers developed a combined estimator of kernel and Fourier series, whereas in [25][26][27], the researchers presented nonparametric regression with a mixed estimator of truncated spline and Fourier series. Recently, researchers have also been giving attention to a mixed model of longitudinal data, as carried out by [28].
So far, studies on nonparametric regression with combined estimators have not dealt with more than two responses, i.e., multiresponse nonparametric regression. Numerous studies on nonparametric regression have been applied to many scientific fields, such as sociodemographic analysis [23][24][25][26], finance [5,29], climatology [4], and economics [30], wherein some cases have involved real data with two or more correlated response variables. Therefore, this study makes a major contribution to the research on multiresponse nonparametric regression by using a combined estimator. Of the several types of regression functions for nonparametric regression, the truncated spline function has the major advantages of high flexibility, good visual interpretation, and the ability to handle smooth function characters and data that have changed behavior at certain subintervals [2]. On the other hand, the Fourier series function is used to estimate the regression curve if the data are smooth and follow a repeated pattern at specific intervals [9]. The Fourier series function is also a trigonometric polynomial that can adjust the data's local nature effectively. In this study, we adopt a cosine Fourier series function [9], which follows a trend line as the regression curve estimator. A Fourier series with a cosine function was employed since it is an even function, such that the second derivative usually mathematically obtains a scalar or a nonzero value. Furthermore, the penalties in the penalized least square method can be well defined. Considering some of the advantages of these two functions, as outlined above, this study highlights the combined use of truncated spline and Fourier series estimators for multiresponse nonparametric regression.
One of the most well-known tools for the estimation of regression is the ordinary least square (OLS) method. However, the OLS method cannot be directly used in nonparametric regression because of the unknown curve shape. By utilizing smooth, continuous, and differentiable properties, as well as other components, the OLS method was modified with conditional optimization to create the penalized least-square (PLS) regression method [31]. The combined use of estimators for multiresponse nonparametric regression leads to the use of an error variance-covariance matrix as a weighting matrix. Therefore, PLS optimization is still added with a weight; accordingly, the proposed model was obtained through penalized weighted least square (PWLS) optimization. To evaluate the proposed model performance, we employed a simulation study with three different sample sizes and three error variances and applied the proposed model to a real dataset.
Given the view above, the objective of this study is to obtain a curve estimation using combined truncated spline and Fourier series estimators for multiresponse nonparametric regression. This aim is followed by estimating an error variance-covariance matrix as a weighting matrix. To estimate the proposed model, a two-stage estimation method was used. Stage 1 was to estimate the Fourier series component using PWLS optimization. The following stage was to estimate the truncated spline component by utilizing weighted least square (WLS) optimization. In addition to curve estimation, we attempted to simulate and apply the proposed model to a dataset of the HDI in East Java Province, Indonesia. The paper is organized as follows: Section 1 provides an overview of the topic while identifying the knowledge gap and stating the aims of the research. Section 2 describes the truncated spline function and the Fourier series function, along with the PWLS method. Section 3 presents the findings of the regression curve estimation via the combined use of truncated spline and Fourier series estimators for multiresponse nonparametric regression, along with an estimation of the variance-covariance matrix, smoothing parameter selection, and a simulation study following the application of a real dataset. Section 4 gives a brief conclusion and makes future recommendations.

Multiresponse Nonparametric Regression, Truncated Spline Function, and Fourier Series Function
Given paired data , , … , , , … , , , , … , , the relationship between response , , … , and predictor , , … , , , , … , variables is assumed to follow multiresponse nonparametric regression, as follows: , , … , , , , … , where ℎ 1,2, … , and 1,2, … , . In this study, r is defined as the number of response variables, and n refers to the number of observations for each response and predictor variable. For convenience, Equation (1) can be rewritten in the following matrix form: , The regression curve in Equation (1) can be assumed to be an additive model.
In Equation (3), the regression curve ; 1, 2, … , is assumed to be smooth and is approached by a truncated spline function. Meanwhile, the regression curve ; 1,2, … , is approached by a Fourier series function, which assumes that the regression curve is unknown and is contained in continuous space 0, . Hence, the multiresponse nonparametric regression in Equation (3) can be written as Under the assumption ~ 0, , as described in [32], ℓ for ℎ ℓ; ℎ, ℓ 1,2, … , . This term refers to situations in which the response variables and ℓ are a pair, such that a correlation between the ℎ-th response and the ℓ-th response yields ℓ , and zero otherwise. The error correlation between responses is the same for every response, defined as ℓ ℓℓ ; 1,2, … , ; ℎ ℓ; ℎ, ℓ 1,2, … , .
Thus, the regression curve in Equation (4) is approached by a linear truncated spline function with knots , , … , , as follows: , with the truncated function , 0 , Adopting the Fourier series function in [9], in Equation (4) is approached by a Fourier series cosine function following the trend line ( ), as given in Equation (6):

Penalized Weighted Least Square
Suppose that the dataset , , … , , , , … , , , , … , follows the multiresponse nonparametric regression in Equation (1) and is rewritten in matrix form as Equation (2). The random error in Equation (2) is normally distributed, with mean 0 and error variance-covariance matrix , such that it can be written as ~ , . In particular, as a weighting matrix plays an important role in accommodating the correlation between responses. If there is a correlation between responses, the correlation is defined as ℓ ℓℓ , such that ℓ ℓℓ . Note that is the identity matrix. When all observations come in pairs, has ,ℓ elements, as described below: The regression curve of the combined truncated spline and Fourier series estimators for multiresponse nonparametric regression was estimated by carrying out the following PWLS optimization: where refers to the number of observations for all response variables, alternatively written as ∑ . The first component in Equation (7) is a function that measures the goodness-of-fit (GoF), while the second component is the penalty. In addition, is a weighted component, and serves as a positive smoothing parameter that controls the balance between the GoF and the penalty. In this study, during the process of regression curve estimation, the parameters and are given. Note that the superscript T refers to transpose and the superscript " refers to the second derivative of the function.
As stated in the Introduction section, we estimated the regression curve of the proposed model in two stages. The first stage involves completing the estimation of the regression curve through PWLS optimization, which results in Theorem 1 (see the Section 3.1). Subsequently, the second stage involves completing the estimated regression curve using WLS optimization, which results in Theorem 2 (see the Section 3.1).

Curve Estimation of Combined Truncated Spline and Fourier Series Estimators for Multiresponse Nonparametric Regression
As previously stated, PWLS and WLS optimization were used to obtain combined truncated spline and Fourier series estimators for multiresponse nonparametric regression. Therefore, some lemmas and theorems were required. Whenever Lemma 1 presents the GoF, Lemma 2 presents the penalty of the PWLS optimization. By using the results of Lemma 1 and Lemma 2, the first stage estimation is obtained, which is formulated in Theorem 1. The second stage involves estimating the regression curve using WLS optimization, as presented in Lemma 3, with the main result presented in Theorem 2. All the proofs of the lemmas and theorems are presented in Appendix A to Appendix E.

Lemma 1. If the multiresponse nonparametric regression model is written as Equation (4) and the regression curve
is as given in Equation (6), the GoF can be formulated as follows: where is the ℎ ℎ weighting matrix, The proof of Lemma 1 is given in Appendix A. (6), the penalty component is as follows:

Lemma 2. If the multiresponse nonparametric regression model is written as Equation (4) and the regression curve is as presented in Equation
A complete description of the proof of Lemma 2 can be found in Appendix B.
Having discussed how to construct the GoF and the penalty, Theorem 1 addresses potential solutions to the first-stage estimation using PWLS optimization in Equation (7). , .
The evidence of Theorem 1 is given in Appendix C. The second stage involves estimating the regression curve using WLS optimization, as described in Lemma 3, with the result in Theorem 2. (5), WLS optimization can be obtained as

Lemma 3. If the regression curve is as described in Equation
As a note, Lemma 3′s proof is provided in Appendix D.

Theorem 2. If WLS optimization is given by Lemma 3, the regression curve estimation for multiresponse nonparametric regression can be attained from WLS optimization such that
, , in Theorem 1 can be rewritten as Equation (12). First, substituting Equation (9) into Equation (A9) gives Equation (10): .
If the result of Equation (10) is substituted into Equation (A10), the regression curve estimation of the Fourier series can be rewritten as Equation (12). As a note, Equation (A9) and Equation (A10) can be found in Appendix C. .
Another main finding of this study is the regression curve of the combined truncated spline and Fourier series estimators for multiresponse nonparametric regression, as presented in Corollary 1. (8) and Equation (11), the regression curve estimation of the combined truncated spline and Fourier series estimators for the additive model is presented below: , , , .

Corollary 1. According to the main result in Equation
Proof. By using an additive model in Equation (3), the regression curve estimation of the combined estimator for multiresponse nonparametric regression can be written in the following matrix form: Based on the result of , , , in Equation (8)  , , For simplification, by substituting Equation (9) and Equation (12) into Equation (13), can also be expressed as , , , . □

Estimation of Error Variance-Covariance Matrix
The result in Equation (13) shows that curve estimation of the proposed model leads to the use of an error variance-covariance matrix as a weighting matrix. Hence, estimation of the error variance-covariance matrix is presented in Theorem 3.

Theorem 3. If the regression curve estimation is given by Corollary 1 and random error is normally distributed with mean 0 and error variance-covariance matrix
such that it can be written as ~ , , the error variance-covariance matrix estimation is as follows: The proof of Theorem 3 is given in Appendix F.

Smoothing Parameter Selection
Another critical step in nonparametric regression modeling is selecting the optimal knot, oscillation parameter, and smoothing parameter. A large smoothing parameter produces a smoother estimator function but constrains the capability mapping of data. In contrast, if the smoothing parameter is too small, the function estimator is rougher [33]. We concluded that a suitable method was required to determine the optimal smoothing parameters. The smoothing parameter selection in this study was based on the GCV method, as conducted by [33]; the detailed procedure and optimal properties of this method have been carried out by [34]. The modified GCV method for combined truncated spline and Fourier series estimators in the multiresponse nonparametric regression model is as follows: where , , , , , , By substituting Equation (15) into Equation (14), the optimum smoothing parameter is obtained by taking the minimum of modified GCV for the proposed model, as presented below , , , , where , , is the regression curve estimation of the proposed model, as in Equation (13).

Simulation Study
Simulations with 100 replications were carried out to verify and validate the theoretical result of the proposed model. Using three data sample sizes ( = 20, 50, and 100) and three different error variances ( = 0.1, 0.5, and 1), random data were generated for multiresponse nonparametric regression. As shown in Figure 1, each predictor variable describes different functions, i.e., a polynomial function represents the truncated spline ( ), while a trigonometry function with a trend represents the Fourier series function ( ). As a note, Figure 1 only presents a partial scatterplot of the numerical example function for = 100 and = 0.1. The simulation study followed the model in Equation (4), so the polynomial and trigonometric functions used in this numerical example were obtained from the following function: In particular, the predictor variables and are generated from a Uniform (0,1) distribution, while random errors are generated from a multivariate normal distribution. In addition, , , and are correlated with , , , 0.9. In order for ease of the computational process and based on the results of the partial scatterplot identification from Figure 1, the simulation study was carried out with the combination of three knots (K = 1, 2, 3) and three oscillation parameters (T = 1, 2, 3). To determine the optimal smoothing parameter, the minimum GCV criteria were used.   20 and 50 is provided in Appendix G. Table 1 shows that the smallest GCV (6.098) occurs for variance 0.1 with the combination of the three knots-three oscillation model. This model yields adequate results, with coefficient of determination (R 2 ) = 93,202 and mean-square-error (MSE) = 5.675. Surprisingly, the same result is seen for the other sample sizes ( 20 and 50), where the smallest GCV is obtained for variance 0.1. In addition, a comparison between sample sizes gives the result in which the smallest GCV is gained from the simulation with the largest sample number ( 100). Thus, a smaller variance with a large sample size will produce a minimum GCV, which implies that the regression curve will be better estimated as well. These findings are consistent with [23], who developed mixed estimators in nonparametric regression for cross-sectional data.

Data Application
This section attempts to provide the application of the proposed model to a real dataset. The proposed model was applied to four indicators of HDI data in 38 regencies/cities across East Java Province, Indonesia, in 2018. To measure human development in some regions, the UNDP developed the HDI in 1990. In the 1990 Human Development Report, three basic dimensions of human development were defined in the HDI: income (economy), health, and education [35]. From these dimensions, four indicators of HDI data were derived: life expectancy rate, expected years of schooling, mean years of schooling, and adjusted per capita expenditure [35]. In this study, these four indicators are the response variables, , , , and , respectively.
The issue of the HDI has received considerable critical attention in every province/regency in Indonesia as it is one of the determining components of the general allocation fund (DAU) [36]. As one of the largest provinces in Indonesia, East Java Province contributes significantly to Indonesia's HDI. East Java Province's HDI in 2018 was 70.77 [37], slightly lower than the national achievement. In addition, it was the lowest among the five other provinces of Java island in the last three years. Therefore, further studies on East Java Province's HDI are to be potentially discussed.
To determine whether the correlation matrix has an identity matrix (homogeneity of variances), we used Bartlett's test of sphericity with statistical significance 0.05. The results showed a statistical value of = 121.993 with p-value = 0.00, which means that the decision was to reject the null hypothesis as p-value < . According to this result, we can infer that there is a significant correlation among the response variables; hence, multiresponse nonparametric regression analysis can be used.
Several studies have highlighted the potential factors associated with Indonesia's HDI, such as the percentage of people living in poverty [25,38], the unemployment rate [25,38,39], population density [25,38], and the per capita gross regional domestic product (GRDP) [25,39,40]. Based on these, the predictor variables used in this study were population density and the percentage of people living in poverty. In addition, the relationship between the response variable and each predictor variable was identified by a partial scatterplot, as shown in Figure 2. The partial scatter plot between the four indicators of the HDI and population density ( ) showed changes at a particular subinterval that is fit for the truncated spline function. Meanwhile, the partial scatterplot between the response variable and the percentage of people living in poverty ( ) followed a pattern that repeated at a certain interval, with a particular trend; thus, this variable was approached by the Fourier series function. As shown in Table 2, some scenarios using a combined estimator (Model 1) and uncombined estimators (Model 2 and Model 3) were performed to compare the effectiveness of the proposed model. By using the modified GCV formula in Section 3.3, we selected the best model according to the minimum GCV as the criterion. In this study, a comparison of these three models revealed that the proposed model (Model 1) had the smallest GCV compared to the other uncombined estimator models for multiresponse nonparametric regression. In summary, these results indicate that the proposed model is better recommended for modeling the 2018 HDI data of East Java Province. Another important result from Table 2 is the comparison of the number of knots and oscillation combinations from the proposed model. Similar to the simulation studies, we used a combination of three knot and oscillation parameters each. According to the minimum GCV as the criterion, the leading model was a three knots-three oscillations model with a GCV equal to 1.39189 and MSE = 1.08377. In addition, this model yields a coefficient of determination (R 2 ) of 99.84%. Due to the satisfactory result of R 2 , it can be said that the proposed model is able to describe the variance of the response variable through the predictor variables exceptionally well. Another interesting result was the number of knot and oscillation combinations of the best model in the simulation study, in line with those of data application.
Appendix H presents the results of parameter estimation from the best model obtained from the minimum GCV in Table 2. As such, we obtained a three knots-three oscillations model. Based on the parameter estimation results for each response variable for the 2018 HDI data of East Java Province, Indonesia, the multiresponse nonparametric regression model with a combined truncated spline and Fourier series estimator can be written as

Discussion and Conclusions
This study presents the major findings from regression curve estimation with a combined truncated spline and Fourier series estimator for an additive model in multiresponse nonparametric regression using PWLS and WLS optimization, as shown below: , Furthermore, the result of error variance-covariance matrix estimation is as follows: From the simulation study, the minimum GCV obtained a satisfactory outcome through a large sample with small variance. In addition, data application to a real dataset of the 2018 HDI in East Java revealed that the proposed model had a better result than the uncombined model. An adequate coefficient of determination (R 2 ) from the best model indicates that the proposed model can explain the data variation remarkably well. Interestingly, the number of knots and oscillations of the best model in this simulation study and data application is consistent; that is a combination of three knots-three oscillations. In summary, these findings have significant implications for the understanding of regression curve estimation when using combined estimators for multiresponse nonparametric regression. Although the focus of the research was on combined truncated spline and Fourier series estimators, the procedure in this study can be applied to other combinations of estimators for multiresponse nonparametric regression.
The scope of this study was limited in terms of curve estimation and estimation of the variance-covariance matrix; thus, the major limitation was the absence of hypothesis testing and confidence intervals testing. Considering its necessity for good statistical practice, there is a commitment to perform such a goodness-of-fit test (model checking). Therefore, further research will be conducted to achieve hypothesis testing and confidence intervals testing. Another possible area is that a further simulation study could be performed using another function, such as an exponential or logarithmic function, to validate the capability of the proposed model. An additional variation of in a simulation study could be used to gain more insights into the performance evaluation of the proposed model.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
GoF goodness-of-fit GCV generalized cross-validation MSE mean square error MLE maximum likelihood estimation OLS ordinary least square PLS penalized least square PWLS penalized weighted least square WLS weighted least square

Appendix A
Lemma 1 can be proven by the completed , , … , as the GoF of PWLS optimization in Equation (7), as shown below: If ∑ , then Equation (A1) can be written as where is the Fourier series function, as shown in Equation (6). Furthermore, the function with one predictor variable, symbolized as , can be written in the following matrix form: Similarly, for 1,2, … , predictors, the Fourier series function for the multiresponse nonparametric regression presented in Equation (A3) can be written as Thus, the GoF in Equation (7), where is the Fourier series function, can be expressed as , , … , As a result, the GoF component in Equation (A5) can be drawn in the following matrix form: , … , .

Appendix B
The penalty component of the PWLS optimization in Equation (7) is as follows: where is presented in Equation (6). To prove Lemma 2, let us begin with solving the second derivative in Equation (A6), as follows:

Appendix C
According to the result of the GoF in Lemma 1 and the penalty in Lemma 2, the PWLS optimization in Equation (7) can be written as To solve the optimization in Equation (A8), the estimator can be obtained by deriving the partial Equation (A8) against and equating the result with 0, as follows: The subsequent step is to substitute in Equation (A9) into Equation (A4), such that the estimator of the truncated spline function can be written as . (A11)

Appendix D
The solution in Theorem 1 still contains an component, where is a truncated linear spline function, as described in Equation (5). Hence, to complete the PWLS optimization in Equation (7), it is necessary to estimate the regression curve of . First, the multiresponse nonparametric regression curve of the truncated spline component in Equation (5), which involves only one predictor, is written in the following matrix form: Consequently, with the 1,2, … , predictor can be written as follows: According to Equation (A13), the truncated spline function for multiresponse nonparametric regression can be expressed in matrix form, as follows: To solve the regression curve estimation of the truncated spline and Fourier series estimators for multiresponse nonparametric regression with PWLS optimization, substitute Equations (A11) and (A14) into Equation (4), as follows: ; . (A15) The subsequent step is to solve Equation (A15) with WLS optimization. Therefore, the estimator can be obtained by completing the following WLS:

Appendix E
The second stage of the proposed method is performed using WLS optimization, as follows: , then we perform the multiplication in parentheses to obtain, WLS optimization can be obtained by partial derivation of against . When the expression is equal to 0, the result is Equation ( The maximum value of the likelihood function is obtained by partially deriving Equation (A24) against ℓ and equating the result with 0, as follows: The subsequent step is to estimate and using the ordinary least square (OLS) method, with the result as follows: