Article

A Truncated Spline and Local Linear Mixed Estimator in Nonparametric Regression for Longitudinal Data and Its Application

by Idhia Sriliana 1,2, I Nyoman Budiantara 1,* and Vita Ratnasari 1
1 Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2 Department of Statistics, Faculty of Mathematics and Natural Sciences, University of Bengkulu, Bengkulu 38371, Indonesia
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(12), 2687; https://doi.org/10.3390/sym14122687
Submission received: 7 November 2022 / Revised: 10 December 2022 / Accepted: 13 December 2022 / Published: 19 December 2022

Abstract:
Longitudinal data modeling is widely carried out using parametric methods. However, when the parametric model is misspecified, the obtained estimator might be severely biased and lead to erroneous conclusions. In this study, we propose a new estimation method for longitudinal data modeling using a mixed estimator in nonparametric regression. The objective of this study was to estimate the nonparametric regression curve for longitudinal data using two combined estimators: truncated spline and local linear. The weighted least square method with a two-stage estimation procedure was used to obtain the regression curve estimation of the proposed model. To account for within-subject correlations in the longitudinal data, a symmetric weight matrix was given in the regression curve estimation. The best model was determined by minimizing the generalized cross-validation value. Furthermore, an application to a longitudinal dataset of the poverty gap index in Bengkulu Province, Indonesia, was conducted to illustrate the performance of the proposed mixed estimator. Compared to the single estimator, the truncated spline and local linear mixed estimator had better performance in longitudinal data modeling based on the GCV value. Additionally, the empirical results of the best model indicated that the proposed model could explain the data variation exceptionally well.

1. Introduction

Regression analysis is a statistical technique that plays a crucial role in inferential statistics and is widely employed in numerous scientific disciplines. The classical regression method used for many years is parametric regression, which assumes that the shape of the regression curve follows a specified functional form, such as linear, quadratic, or cubic [1]. Along with the development of computational science and the limitations of parametric regression models, i.e., the assumption of a specified functional form for the regression curve, nonparametric regression models, which do not require numerous assumptions, are increasingly recommended for solving problems in various applied fields. The nonparametric regression method has a high degree of flexibility since the data determine the form of the estimated curve without interference from the researcher’s subjectivity [2]. In nonparametric regression, various functions are utilized to estimate the regression curve, including local linear [3,4,5], spline [6,7,8], kernel [9,10,11], local polynomial [12,13,14], and Fourier series [15,16,17] functions. In addition, [18] developed moving extremes ranked set sampling (MERSS) to estimate a simple linear regression model.
The nonparametric regression approach with a single (non-mixed) estimator is capable of modeling a dataset, but it has a weakness: it is limited to one form of data pattern. Theoretically, it assumes that every predictor variable has the same relationship pattern with the response variable. In reality, however, the data pattern of each predictor can take a different form, and nonparametric regression with a single estimator is unable to handle different data patterns between predictors. Because of these weaknesses, researchers have developed nonparametric regression models involving two or more estimators, from now on referred to as mixed estimators. The idea of developing a mixed estimator is taken from the concept of semiparametric regression, such as that carried out by [19]. Several mixed estimator studies have been conducted to estimate the nonparametric regression curve; for example, [20,21,22] developed a mixed estimator of spline and kernel for estimating nonparametric regression curves. Similarly, in [23,24], researchers presented nonparametric regression curve estimation using an estimator that combined truncated spline and Fourier series. On the other hand, [25] proposed two nonparametric estimators of the regression function with mixed measurement errors.
The development of research that uses the regression method not only deals with cross-sectional data but also with longitudinal data. Longitudinal data modeling commonly uses a parametric regression method. Several methods in parametric models for longitudinal data have been developed and can be read about in [26], with related references listed therein. Occasionally, parametric models for longitudinal data are too restrictive for many applications because when the parametric model is misspecified, the estimators might be severely biased and lead to erroneous conclusions. In recent years, a large amount of literature on longitudinal data analysis has proposed several estimators in nonparametric regression, such as spline [27,28], kernel [29,30], Fourier series [31], and local linear [32,33] to overcome this difficulty.
In longitudinal studies, the nonparametric regression model has not yet dealt with two combined estimators, i.e., a mixed estimator. Therefore, this study developed an adaptive method to estimate regression curves for longitudinal data by using a mixed estimator in nonparametric regression. Among the estimators in nonparametric regression, the truncated spline is one of the more renowned due to its high flexibility in handling data whose pattern changes at particular subintervals. Moreover, this estimator has a clear visual interpretation [2]. Meanwhile, the local linear estimator is widely used in nonparametric regression because it is simple and easy to understand. The local linear estimator is one of the smoothing techniques used in the nonparametric approach [34]. It is well suited to modeling data with a monotonic pattern, such as an upward or downward trend. The estimate is obtained by locally fitting a first-degree polynomial to the data via weighted least squares (WLS) optimization. Considering the advantages of these two estimators, the primary objective of this study is to obtain a nonparametric regression curve estimation for longitudinal data using a mixed estimator that combines truncated spline and local linear functions. The curve estimation was carried out using WLS optimization through a two-stage estimation procedure. In addition, to illustrate the performance of the proposed model, an application to a real dataset is given for modeling poverty gap index data in Bengkulu Province, Indonesia.
The remainder of the paper is structured as follows: Section 2 presents an overview of the nonparametric regression model for longitudinal data, the truncated spline function, the local linear function, and WLS optimization. Section 3 comprises three subsections. The estimation of the nonparametric regression curve for longitudinal data using a truncated spline and local linear mixed estimator is presented in Section 3.1, followed by the selection of the optimal knot point and bandwidth parameter to obtain the best model in Section 3.2. Section 3.3 provides the implementation of the proposed model in a real longitudinal data case. A discussion of the findings and future research directions is given in the final section.

2. Materials and Methods

In longitudinal studies, data from individuals are collected repeatedly over time. Longitudinal data are usually correlated between observations within a subject but are independent between subjects [35]. Given a paired longitudinal dataset $\left(t_{1il},\ldots,t_{pil},x_{1il},\ldots,x_{qil},y_{il}\right)$, the relationship between the predictor variables $\left(t_{1il},\ldots,t_{pil},x_{1il},\ldots,x_{qil}\right)$ and the response variable $y_{il}$ is assumed to follow a nonparametric regression model for longitudinal data, which can be expressed as:
$$y_{il}=\mu_{il}\left(t_{1il},\ldots,t_{pil},x_{1il},\ldots,x_{qil}\right)+\varepsilon_{il},\quad i=1,2,\ldots,n,\quad l=1,2,\ldots,L \quad (1)$$
where μ is the regression curve and ε i l is a random error that is assumed to be identical, independent and normally distributed. In this study, n represents the number of subjects, p and q denote the number of predictor variables, and L represents the number of observations for each subject. For simplicity, Equation (1) can be represented in matrix form as follows:
$$\mathbf{y}=\boldsymbol{\mu}\left(\mathbf{t},\mathbf{x}\right)+\boldsymbol{\varepsilon} \quad (2)$$
The regression curve $\mu_{il}\left(t_{1il},\ldots,t_{pil},x_{1il},\ldots,x_{qil}\right)$ for each $i$-th subject is assumed to be unknown and additive. Thus, it can be written as:
$$\mu_{il}\left(t_{1il},\ldots,t_{pil},x_{1il},\ldots,x_{qil}\right)=\sum_{j=1}^{p}f_{ji}\left(t_{jil}\right)+\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right) \quad (3)$$
in which $\sum_{j=1}^{p}f_{ji}\left(t_{jil}\right)$ represents the truncated spline component and $\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)$ is the local linear component. According to Equations (1) and (3), the paired data $\left(t_{1il},\ldots,t_{pil},x_{1il},\ldots,x_{qil},y_{il}\right)$ following the nonparametric regression model for longitudinal data can be rewritten as follows:
$$y_{il}=\sum_{j=1}^{p}f_{ji}\left(t_{jil}\right)+\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)+\varepsilon_{il} \quad (4)$$
where the random error $\varepsilon_{il}$ satisfies the following assumptions:
$$E\left(\varepsilon_{il}\right)=0;\qquad \operatorname{Cov}\left(\varepsilon_{il},\varepsilon_{il'}\right)=\begin{cases}\sigma_{il}^{2}, & \text{if } l=l'\\ \sigma_{i(l,l')}, & \text{if } l\neq l'\end{cases}\qquad \text{for } i=1,2,\ldots,n \text{ and } l,l'=1,2,\ldots,L$$
such that the error variance–covariance matrix $\operatorname{Cov}\left(\boldsymbol{\varepsilon}\right)=\mathbf{V}$ can be written as follows:
$$\mathbf{V}=\begin{bmatrix}\mathbf{V}_{1}&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\mathbf{V}_{2}&\cdots&\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\cdots&\mathbf{V}_{n}\end{bmatrix},\qquad \mathbf{V}_{i}=\begin{bmatrix}\sigma_{i1}^{2}&\sigma_{i(1,2)}&\cdots&\sigma_{i(1,L)}\\ \sigma_{i(2,1)}&\sigma_{i2}^{2}&\cdots&\sigma_{i(2,L)}\\ \vdots&\vdots&\ddots&\vdots\\ \sigma_{i(L,1)}&\sigma_{i(L,2)}&\cdots&\sigma_{iL}^{2}\end{bmatrix},\quad i=1,2,\ldots,n$$
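To make the block-diagonal structure of the weight matrix concrete, the following short Python sketch (ours, not from the paper; the AR(1)-style covariance values are purely hypothetical) assembles such a matrix from per-subject covariance blocks.

```python
import numpy as np
from scipy.linalg import block_diag

# Toy dimensions: n subjects, each observed L times (values are hypothetical).
n, L = 3, 4

def toy_block(L, rho, sigma2=1.0):
    """One L x L within-subject covariance block; an AR(1)-like stand-in for V_i."""
    idx = np.arange(L)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

V_blocks = [toy_block(L, rho=0.3 + 0.1 * i) for i in range(n)]
V = block_diag(*V_blocks)   # (nL) x (nL), zeros between subjects
print(V.shape)              # (12, 12)
```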
Furthermore, the regression curve component $f_{ji}\left(t_{jil}\right)$ is approximated by a linear truncated spline with knots $\lambda_{1j},\lambda_{2j},\ldots,\lambda_{Sj}$, as given in Equation (5):
$$f_{ji}\left(t_{jil}\right)=\theta_{0i}+\theta_{1ji}t_{jil}+\sum_{m=1}^{S}\alpha_{mji}\left(t_{jil}-\lambda_{mji}\right)_{+} \quad (5)$$
with the truncated function
$$\left(t_{jil}-\lambda_{mji}\right)_{+}=\begin{cases}t_{jil}-\lambda_{mji}, & t_{jil}\geq\lambda_{mji}\\ 0, & t_{jil}<\lambda_{mji}\end{cases} \quad (6)$$
Meanwhile, the regression curve component $g_{ki}\left(x_{kil}\right)$ is approximated by a local linear function at the fixed points $x_{01i},x_{02i},\ldots,x_{0qi}$. Assume that the $g_{ki}\left(x_{kil}\right)$ are independent on different intervals and have a $(d+1)$-st derivative, with $d=1$, at $x_{0ki}$. By Taylor expansion, $g_{ki}\left(x_{kil}\right)$ can be locally approximated by a local linear function, defined as follows:
$$g_{ki}\left(x_{kil}\right)\approx g_{ki}\left(x_{0ki}\right)+g_{ki}'\left(x_{0ki}\right)\left(x_{kil}-x_{0ki}\right)=\beta_{0i}+\beta_{1ki}\left(x_{kil}-x_{0ki}\right) \quad (7)$$
where $x_{kil}\in I_{h}\left(x_{0ki}\right)$, and $I_{h}\left(x_{0ki}\right)=\left[x_{0ki}-h,x_{0ki}+h\right]$ is the local neighborhood whose size is specified by a constant $h>0$ called the bandwidth parameter.
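The two building blocks can be illustrated with a minimal Python sketch (our own illustration; the function names and numeric values are assumptions, not the paper's): the linear truncated-power term of Equation (6) and the centred local-linear column restricted to the neighborhood $I_h(x_0)$ of Equation (7).

```python
import numpy as np

def truncated_term(t, knot):
    """(t - knot)_+ from Equation (6): t - knot where t >= knot, 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= knot, t - knot, 0.0)

def local_linear_design(x, x0, h):
    """Columns [1, x - x0] for the observations inside I_h(x0) = [x0 - h, x0 + h]."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x - x0) <= h
    X = np.column_stack([np.ones(inside.sum()), x[inside] - x0])
    return X, inside

print(truncated_term([30.0, 50.0, 80.0], knot=46.36))   # [ 0.    3.64 33.64]
X, inside = local_linear_design([8.0, 9.0, 12.0], x0=9.26, h=1.0)
print(X)                                                 # one retained row: [1, -0.26]
```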
In general terms, the WLS optimization for estimating the regression curve $\mu_{il}$ using the truncated spline and local linear mixed estimator in Equation (4) is equivalent to minimizing the goodness-of-fit component, which can be defined as
$$\min_{f_{j},g_{k}}\sum_{i=1}^{n}\sum_{l=1}^{L}v_{il}\left[y_{il}-\sum_{j=1}^{p}f_{ji}\left(t_{jil}\right)-\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)\right]^{2}w_{h}\left(x_{kil}-x_{0ki}\right) \quad (8)$$
However, the regression curve estimate of the proposed model is obtained through a two-stage estimation technique. The first stage completes the estimation of the local linear component; the second stage completes the estimation of the truncated spline component. Both components are estimated using WLS optimization. The estimation results for each component are given by Theorems 1 and 2 in Section 3.1.

3. Results

3.1. Estimation of the Nonparametric Regression Curve for Longitudinal Data Using a Truncated Spline and Local Linear Mixed Estimator

As mentioned previously, a two-stage estimation technique using weighted least squares (WLS) optimization was adopted to generate the truncated spline and local linear mixed estimator in nonparametric regression for longitudinal data. Consequently, some lemmas and theorems are needed to obtain the regression curve estimation of the proposed model. The first lemma describes the goodness of fit of the local linear component. The first stage of estimation, as stated in Theorem 1, is derived by using the result of Lemma 1. Lemma 2 gives the second stage of estimation, i.e., the WLS optimization used to estimate the regression curve of the truncated spline component, with the estimation results presented in Theorem 2. Appendix A, Appendix B, Appendix C and Appendix D provide all the proofs for the lemmas and theorems.
 Lemma 1. 
If the regression curve of the local linear component $g_{ki}\left(x_{kil}\right)$ in the nonparametric regression model for longitudinal data is given by Equation (7), then the goodness of fit can be determined using the following equation:
$$\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)$$
where $\mathbf{W}_{h}=\operatorname{diag}\left(\mathbf{W}_{h1},\mathbf{W}_{h2},\ldots,\mathbf{W}_{hn}\right)$ and $\mathbf{V}=\operatorname{diag}\left(\mathbf{V}_{1},\mathbf{V}_{2},\ldots,\mathbf{V}_{n}\right)$ are $nL\times nL$ symmetric matrices serving as the weights of the local linear component and of the longitudinal data, respectively, and
$$\mathbf{y}^{*}=\left(\mathbf{y}_{1}^{*T},\mathbf{y}_{2}^{*T},\ldots,\mathbf{y}_{n}^{*T}\right)^{T},\quad \mathbf{y}_{i}^{*}=\left(y_{i1}^{*},y_{i2}^{*},\ldots,y_{iL}^{*}\right)^{T},\quad y_{il}^{*}=y_{il}-\sum_{j=1}^{p}f_{ji}\left(t_{jil}\right),\quad i=1,2,\ldots,n,\ l=1,2,\ldots,L,$$
$$\mathbf{X}\left(x_{0}\right)=\begin{bmatrix}\mathbf{X}_{1}\left(x_{01}\right)&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\mathbf{X}_{2}\left(x_{02}\right)&\cdots&\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\cdots&\mathbf{X}_{n}\left(x_{0n}\right)\end{bmatrix},\quad \mathbf{X}_{i}\left(x_{0i}\right)=\begin{bmatrix}1&x_{1i1}-x_{01i}&x_{2i1}-x_{02i}&\cdots&x_{qi1}-x_{0qi}\\ 1&x_{1i2}-x_{01i}&x_{2i2}-x_{02i}&\cdots&x_{qi2}-x_{0qi}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&x_{1iL}-x_{01i}&x_{2iL}-x_{02i}&\cdots&x_{qiL}-x_{0qi}\end{bmatrix},$$
$$\boldsymbol{\beta}=\left(\boldsymbol{\beta}_{1}^{T},\boldsymbol{\beta}_{2}^{T},\ldots,\boldsymbol{\beta}_{n}^{T}\right)^{T},\quad \boldsymbol{\beta}_{i}=\left(\beta_{0i},\beta_{11i},\beta_{12i},\ldots,\beta_{1qi}\right)^{T}.$$
The proof of the first lemma can be seen in Appendix A.
 Theorem 1. 
If the goodness of fit is given in Lemma 1, then the regression curve estimation of the local linear component can be obtained from WLS optimization as follows:
$$\hat{\mathbf{g}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{J}\mathbf{y}^{*}$$
where $\mathbf{y}^{*}=\mathbf{y}-\mathbf{f}$ and $\mathbf{J}=\mathbf{X}\left(x_{0}\right)\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}$.
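As a numerical illustration of Theorem 1 (our own sketch; the identity longitudinal weight and the triangular kernel are toy choices, not the paper's data or weights), the hat matrix $\mathbf{J}$ can be formed directly from the local-linear design and the two weight matrices and then applied to a partial residual $\mathbf{y}^{*}$.

```python
import numpy as np

def hat_matrix_J(X, V, Wh):
    """J = X (X^T V Wh X)^{-1} X^T V Wh, the local-linear smoother of Theorem 1."""
    XtVW = X.T @ V @ Wh
    return X @ np.linalg.solve(XtVW @ X, XtVW)

rng = np.random.default_rng(1)
nL = 12                                           # toy size: n*L stacked observations
x = rng.uniform(6, 12, nL)
x0, h = 9.0, 3.0
X = np.column_stack([np.ones(nL), x - x0])        # local-linear design at x0
V = np.eye(nL)                                    # toy longitudinal weight
Wh = np.diag(np.maximum(1 - np.abs(x - x0) / h, 0))  # toy triangular kernel weight

y_star = rng.normal(size=nL)                      # partial residual y* = y - f
g_hat = hat_matrix_J(X, V, Wh) @ y_star           # g_hat = J y*
print(g_hat.shape)                                # (12,)
```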
The proof of Theorem 1 is provided in Appendix B. Furthermore, Lemma 2 describes the second stage of WLS optimization to estimate the regression curve of the truncated spline component, with Theorem 2 being the estimation result.
 Lemma 2. 
If the regression curve of the truncated spline component $f_{ji}\left(t_{jil}\right)$ is as presented in Equation (5), then the WLS optimization can be formulated as follows:
$$\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]^{T}\mathbf{V}\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]$$
where $\mathbf{J}=\mathbf{X}\left(x_{0}\right)\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}$ and $\mathbf{V}=\operatorname{diag}\left(\mathbf{V}_{1},\mathbf{V}_{2},\ldots,\mathbf{V}_{n}\right)$ is the $nL\times nL$ symmetric weight matrix of the longitudinal data, with
$$\mathbf{T}(\lambda)=\left[\begin{array}{ccc|ccc}\mathbf{P}_{1}&\cdots&\mathbf{0}&\mathbf{R}_{1}&\cdots&\mathbf{0}\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\cdots&\mathbf{P}_{n}&\mathbf{0}&\cdots&\mathbf{R}_{n}\end{array}\right],\qquad \mathbf{P}_{i}=\begin{bmatrix}1&t_{1i1}&t_{2i1}&\cdots&t_{pi1}\\ 1&t_{1i2}&t_{2i2}&\cdots&t_{pi2}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 1&t_{1iL}&t_{2iL}&\cdots&t_{piL}\end{bmatrix},$$
$$\mathbf{R}_{i}=\begin{bmatrix}\left(t_{1i1}-\lambda_{11i}\right)_{+}&\cdots&\left(t_{1i1}-\lambda_{S1i}\right)_{+}&\cdots&\left(t_{pi1}-\lambda_{1pi}\right)_{+}&\cdots&\left(t_{pi1}-\lambda_{Spi}\right)_{+}\\ \left(t_{1i2}-\lambda_{11i}\right)_{+}&\cdots&\left(t_{1i2}-\lambda_{S1i}\right)_{+}&\cdots&\left(t_{pi2}-\lambda_{1pi}\right)_{+}&\cdots&\left(t_{pi2}-\lambda_{Spi}\right)_{+}\\ \vdots&&\vdots&&\vdots&&\vdots\\ \left(t_{1iL}-\lambda_{11i}\right)_{+}&\cdots&\left(t_{1iL}-\lambda_{S1i}\right)_{+}&\cdots&\left(t_{piL}-\lambda_{1pi}\right)_{+}&\cdots&\left(t_{piL}-\lambda_{Spi}\right)_{+}\end{bmatrix},$$
$$\boldsymbol{\gamma}=\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\alpha}\end{bmatrix},\quad \boldsymbol{\theta}=\left(\theta_{01},\theta_{111},\theta_{121},\ldots,\theta_{1p1},\ldots,\theta_{0n},\theta_{11n},\theta_{12n},\ldots,\theta_{1pn}\right)^{T},\quad \boldsymbol{\alpha}=\left(\alpha_{111},\ldots,\alpha_{S11},\ldots,\alpha_{1p1},\ldots,\alpha_{Sp1},\ldots,\alpha_{11n},\ldots,\alpha_{S1n},\ldots,\alpha_{1pn},\ldots,\alpha_{Spn}\right)^{T}.$$
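The block structure of $\mathbf{T}(\lambda)=\left[\mathbf{P}\,|\,\mathbf{R}\right]$ is straightforward to generate programmatically. The sketch below is illustrative only (one spline predictor per subject, hypothetical data and knot values); it builds $\mathbf{P}_{i}$ and $\mathbf{R}_{i}$ for each subject and assembles the block-diagonal matrices.

```python
import numpy as np
from scipy.linalg import block_diag

def spline_blocks(t_i, knots_i):
    """P_i = [1, t_il]; R_i = [(t_il - lambda_m)_+] for each knot (Lemma 2, p = 1)."""
    t_i = np.asarray(t_i, dtype=float)
    P_i = np.column_stack([np.ones_like(t_i), t_i])
    R_i = np.column_stack([np.maximum(t_i - lam, 0.0) for lam in knots_i])
    return P_i, R_i

# Toy data: 2 subjects, 4 repeated observations, 2 knots each (values hypothetical).
t = [np.array([10., 30., 55., 80.]), np.array([15., 40., 60., 85.])]
knots = [np.array([46.36, 76.44]), np.array([35.54, 78.41])]

blocks = [spline_blocks(ti, ki) for ti, ki in zip(t, knots)]
P = block_diag(*[b[0] for b in blocks])
R = block_diag(*[b[1] for b in blocks])
T_lambda = np.hstack([P, R])   # T(lambda) = [P | R]
print(T_lambda.shape)          # (8, 8): nL rows, n*(2 + 2) columns
```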
The proof of Lemma 2 can be found in Appendix C.
 Theorem 2. 
If the WLS optimization is as given in Lemma 2, then the regression curve estimation of the truncated spline component of the nonparametric regression model for longitudinal data in Equation (4) can be obtained by WLS optimization, such that
$$\hat{\mathbf{f}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\mathbf{y} \quad (10)$$
where $\mathbf{K}=\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)-2\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)+\mathbf{V}\mathbf{T}(\lambda)$ and $\mathbf{L}=\left(\mathbf{I}-\mathbf{J}\right)^{T}\mathbf{V}\left(\mathbf{I}-\mathbf{J}\right)$.
In addition, an explanation of how to prove Theorem 2 is shown in Appendix D.
After obtaining the estimation of the truncated spline component $\hat{\mathbf{f}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ in Theorem 2, the estimation result of $\hat{\mathbf{g}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ in Theorem 1 can be expressed as Equation (12). We start by substituting Equation (10) into Equation (A9), which yields the following equation:
$$\begin{aligned}\hat{\boldsymbol{\beta}}&=\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}\\ &=\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}-\hat{\mathbf{f}}\right)\\ &=\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}-\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}\right)\\ &=\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\mathbf{y}\right)\\ &=\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\mathbf{y}\end{aligned} \quad (11)$$
Finally, the regression curve estimation of the local linear component is obtained by substituting Equation (11) into Equation (A2), and it can be rewritten as Equation (12):
$$\hat{\mathbf{g}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{X}\left(x_{0}\right)\hat{\boldsymbol{\beta}}=\mathbf{X}\left(x_{0}\right)\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\mathbf{y}=\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\mathbf{y} \quad (12)$$
The most important finding of this study is the curve estimation of the truncated spline and local linear mixed estimator in the nonparametric regression for longitudinal data. This finding is shown in Corollary 1.
 Corollary 1. 
Based on the estimation of the truncated spline and local linear components in Equations (10) and (12), respectively, the estimation of the nonparametric regression curve for longitudinal data using the truncated spline and local linear mixed estimator can be expressed in matrix form as:
$$\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\mathbf{y}+\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\mathbf{y} \quad (13)$$
 Proof of Corollary 1. 
The regression curve estimation of the mixed estimator in the nonparametric regression model for longitudinal data in Equation (3) can be rewritten in the following matrix form:
$$\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\hat{\mathbf{f}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)+\hat{\mathbf{g}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$$
By substituting the regression curve estimation results of $\hat{\mathbf{f}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ in Equation (10) and $\hat{\mathbf{g}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ in Equation (12), $\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ can be defined as follows:
$$\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\mathbf{y}+\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\mathbf{y}=\mathbf{A}(\lambda,h)\mathbf{y}+\mathbf{B}(\lambda,h)\mathbf{y}$$
For simplicity, $\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ can also be expressed as
$$\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\left[\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}+\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\right]\mathbf{y}$$
□
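Given matrices $\mathbf{K}$, $\mathbf{L}$, $\mathbf{J}$, and $\mathbf{T}(\lambda)$ computed for a particular knot set and bandwidth, assembling $\mathbf{A}(\lambda,h)$, $\mathbf{B}(\lambda,h)$, and the fitted values of Corollary 1 takes only a few lines of linear algebra. The sketch below is our own illustration; it takes K and L as precomputed inputs rather than recomputing them, and it assumes K is square and conformable with L.

```python
import numpy as np

def combined_fit(y, T_lambda, J, K, L):
    """mu_hat = [T(lambda) K^{-1} L + J (I - T(lambda) K^{-1} L)] y  (Corollary 1).

    A = T(lambda) K^{-1} L is the truncated-spline part and
    B = J (I - T(lambda) K^{-1} L) the local-linear part.
    K is assumed square and conformable so the products below are defined.
    """
    nL = len(y)
    TKinvL = T_lambda @ np.linalg.solve(K, L)   # T(lambda) K^{-1} L
    A = TKinvL
    B = J @ (np.eye(nL) - TKinvL)
    mu_hat = (A + B) @ y
    return mu_hat, A, B
```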

3.2. Optimal Number of Knots and Bandwidth Selection

One method that is commonly used to determine the optimal knot is generalized cross-validation (GCV) [36], in which the optimal knot is obtained by taking the minimum GCV value. In the case of longitudinal data, Wu and Zhang [35] generalized the GCV method for selecting the optimal knot. In this study, the GCV method was modified for selecting the knot and bandwidth parameters of the truncated spline and local linear mixed estimator in the nonparametric regression model for longitudinal data. The modified GCV method is given by Lemmas 3 and 4.
 Lemma 3. 
If the regression curve estimation of the truncated spline and local linear mixed estimator in nonparametric regression for longitudinal data is given as in Equation (13), then the mean square error (MSE) of the model is as follows:
$$\operatorname{MSE}(\lambda,h)=\frac{1}{N}\left\|\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right\|^{2} \quad (14)$$
where $N=n\times L$, $\mathbf{A}(\lambda,h)=\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}$, and $\mathbf{B}(\lambda,h)=\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)$.
 Proof of Lemma 3. 
From Theorems 1 and 2, we obtain the curve estimation of the truncated spline and local linear mixed estimator in a nonparametric regression model for longitudinal data written as μ ^ λ , h t , x . Thus, based on μ ^ λ , h t , x in Equation (13), the MSE of the model is given as follows:
$$\begin{aligned}\operatorname{MSE}(\lambda,h)&=\frac{1}{N}\left(\mathbf{y}-\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)\right)^{T}\left(\mathbf{y}-\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)\right)\\ &=\frac{1}{N}\left(\mathbf{y}-\mathbf{A}(\lambda,h)\mathbf{y}-\mathbf{B}(\lambda,h)\mathbf{y}\right)^{T}\left(\mathbf{y}-\mathbf{A}(\lambda,h)\mathbf{y}-\mathbf{B}(\lambda,h)\mathbf{y}\right)\\ &=\frac{1}{N}\left(\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right)^{T}\left(\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right)\\ &=\frac{1}{N}\left\|\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right\|^{2}\end{aligned}$$
 Lemma 4. 
If the regression curve estimation $\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ in Equation (13) and $\operatorname{MSE}(\lambda,h)$ in Lemma 3 are given, then the GCV function for the truncated spline and local linear mixed estimator in a nonparametric regression model for longitudinal data is given by:
$$\operatorname{GCV}(\lambda,h)=\frac{N^{-1}\left\|\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right\|^{2}}{\left[N^{-1}\operatorname{trace}\left(\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right)\right]^{2}} \quad (15)$$
 Proof of Lemma 4. 
Based on the regression curve estimation $\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)$ in Equation (13) and $\operatorname{MSE}(\lambda,h)$ in Lemma 3, the GCV function for the truncated spline and local linear mixed estimator in a nonparametric regression model for longitudinal data can be formulated as follows:
$$\operatorname{GCV}(\lambda,h)=\frac{\operatorname{MSE}(\lambda,h)}{\left[N^{-1}\operatorname{trace}\left(\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right)\right]^{2}}=\frac{N^{-1}\left\|\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right\|^{2}}{\left[N^{-1}\operatorname{trace}\left(\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right)\right]^{2}}$$
The optimal knot $\lambda$ and bandwidth parameter $h$ are obtained by minimizing the modified GCV function for the proposed mixed estimator model in Equation (15), as shown below:
$$\operatorname{GCV}\left(\lambda_{opt},h_{opt}\right)=\min_{\lambda,h}\left\{\frac{N^{-1}\left\|\left[\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right]\mathbf{y}\right\|^{2}}{\left[N^{-1}\operatorname{trace}\left(\mathbf{I}-\mathbf{A}(\lambda,h)-\mathbf{B}(\lambda,h)\right)\right]^{2}}\right\}$$
where $N=n\times L$, $\mathbf{A}(\lambda,h)=\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}$, and $\mathbf{B}(\lambda,h)=\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)$ derive from the regression curve estimation of the proposed mixed estimator model, as in Equation (13). □
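A minimal grid-search sketch over candidate knot sets and bandwidths (our own illustration; `fit_hat_matrix` is a hypothetical user-supplied function returning the combined smoother matrix A(λ,h) + B(λ,h) for a given configuration) implements the GCV criterion of Equation (15).

```python
import numpy as np

def gcv(y, H):
    """GCV for a linear smoother with hat matrix H = A(lambda,h) + B(lambda,h), Eq. (15)."""
    N = len(y)
    resid = (np.eye(N) - H) @ y
    mse = resid @ resid / N
    denom = (np.trace(np.eye(N) - H) / N) ** 2
    return mse / denom

def select_by_gcv(y, candidate_knots, candidate_bandwidths, fit_hat_matrix):
    """Return (gcv, knots, h) minimising GCV; fit_hat_matrix(knots, h) is user-supplied."""
    best = None
    for knots in candidate_knots:
        for h in candidate_bandwidths:
            score = gcv(y, fit_hat_matrix(knots, h))
            if best is None or score < best[0]:
                best = (score, knots, h)
    return best
```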

3.3. Application to Real Data

In this section, we demonstrate how the proposed model can be applied to a real case. The proposed mixed estimator model, along with its curve estimation and modified GCV function, was applied to model poverty gap index data from 10 regencies across Bengkulu Province, Indonesia, over a twelve-year period (2010–2021). The dataset was a longitudinal observation consisting of ten regencies as subjects, each with twelve repeated observations. The observed response ($y$) in this study was the poverty gap index in each regency. The poverty gap index (hereinafter PGI-P1) is one of the poverty indicators established by Statistics Indonesia to measure poverty intensity. PGI-P1 is defined as the average expenditure gap of the poor population relative to the poverty line. A decrease in PGI-P1 indicates that the average expenditure of poor people tends to be closer to the poverty line, which means that the expenditure inequality of the poor is also decreasing [37].
Poverty eradication is the first goal of the Sustainable Development Goals (SDGs) established by the United Nations in 2015. In Indonesia, the poverty issue has become a strategic topic and a research priority for both central and local governments. Bengkulu is one of the provinces in Indonesia that requires tremendous attention in poverty alleviation programs. The poverty rate in Bengkulu is approximately double the national poverty rate. BPS socio-economic data (March 2022) report that Bengkulu is among Indonesia’s 10 poorest provinces, with 14.62% of the population living in poverty [38]. According to BPS socio-economic data, over the past twelve years, the poverty rate in Bengkulu has generally declined; however, this is not in line with the decrease in the poverty gap index. In this regard, PGI-P1 could help evaluate (public or private) policy in the area of poverty reduction programs. Therefore, PGI-P1 was a potential topic to be discussed in this research, particularly the PGI-P1 data in Bengkulu Province.
Several longitudinal studies have been conducted to analyze the factors that significantly affect PGI-P1, such as the average length of school years [39,40], literacy rate [40], gross regional domestic product (GRDP) per capita [41,42], and percentage of households working in the agricultural sector [40,42]. However, we used only the average length of school years and percentage of households working in the agricultural sector as predictor variables in this study. Furthermore, the partial relationship between the response variable and each predictor variable for ten regencies is demonstrated by the scatterplot in Figure 1.
Based on Figure 1, the partial scatterplot between PGI-P1 and the average length of school years ($x$) tends to change monotonically, with some local variation. Thus, the average length of school years is assumed to be a predictor variable for the local linear component. Meanwhile, the partial scatterplot between PGI-P1 and the percentage of households working in the agricultural sector ($t$) indicates a change in the data pattern at particular subintervals for some subjects; this is a good fit for the truncated spline component.
Another important point, observed in Table 1, is the comparison of several proposed model combinations. Based on the GCV criterion, the leading PGI-P1 model produces the smallest GCV of 19.2324, obtained from the model with a combination of 2 knots, the second weight type ($\mathbf{V}=n^{-1}\mathbf{I}$), and a bandwidth parameter of 0.9067. This model produces a coefficient of determination (R-squared) of 92.17% and a mean square error (MSE) of 0.2917. The R-squared value implies that 92.17% of the variance in PGI-P1 in Bengkulu Province could be explained by the predictors in the model, while variables not incorporated in the model make only a relatively small contribution to describing the data variability. The results of the knot locations for each subject and the best model parameter estimates are presented in Appendix E and Appendix F, respectively.
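For completeness, the reported fit statistics can be recomputed from the fitted values in a few lines (illustrative sketch only; `y` and `y_hat` stand for the stacked observed and fitted PGI-P1 values):

```python
import numpy as np

def fit_statistics(y, y_hat):
    """Return (R-squared, MSE) of the fitted model, with MSE = SSE / N."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    resid = y - y_hat
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, ss_res / len(y)
```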
In general, the estimation of the nonparametric regression model for longitudinal data using the truncated spline and local linear mixed estimator with two knots and two predictors, one of the predictors following spline function and the other following local linear function, can be formulated as follows:
$$\hat{y}_{il}=\hat{\theta}_{0i}+\hat{\theta}_{1i}t_{il}+\hat{\alpha}_{1i}\left(t_{il}-\lambda_{1i}\right)_{+}+\hat{\alpha}_{2i}\left(t_{il}-\lambda_{2i}\right)_{+}+\hat{\beta}_{0i}+\hat{\beta}_{1i}\left(x_{il}-x_{0i}\right)$$
for $i=1,2,\ldots,10$ and $l=1,2,\ldots,12$.
Based on the results of parameter estimation for the best model in Appendix F, the nonparametric regression model with a truncated spline and local linear mixed estimator for modeling the PGI-P1 data of Bengkulu over the past twelve years for each regency (subject) can be written as follows
Subject 1
$$\begin{aligned}\hat{y}_{1l}&=-6.7469+0.1778\,t_{1l}-0.1705\left(t_{1l}-46.36\right)_{+}+0.0721\left(t_{1l}-76.44\right)_{+}+2.2127-0.1967\left(x_{1l}-9.26\right)\\ &=-4.5343+0.1778\,t_{1l}-0.1705\left(t_{1l}-46.36\right)_{+}+0.0721\left(t_{1l}-76.44\right)_{+}-0.1967\left(x_{1l}-9.26\right)\end{aligned}$$
Subject 2
$$\begin{aligned}\hat{y}_{2l}&=5.1266-0.0022\,t_{2l}+0.0226\left(t_{2l}-35.54\right)_{+}-0.0953\left(t_{2l}-78.41\right)_{+}-3.2662-1.0230\left(x_{2l}-8.28\right)\\ &=1.8604-0.0022\,t_{2l}+0.0226\left(t_{2l}-35.54\right)_{+}-0.0953\left(t_{2l}-78.41\right)_{+}-1.0230\left(x_{2l}-8.28\right)\end{aligned}$$
$$\vdots$$
Subject 10
$$\begin{aligned}\hat{y}_{10,l}&=-1.2430+0.3772\,t_{10,l}-0.2532\left(t_{10,l}-7.98\right)_{+}-0.3447\left(t_{10,l}-23.92\right)_{+}+2.8213+1.3042\left(x_{10,l}-3.51\right)\\ &=1.5783+0.3772\,t_{10,l}-0.2532\left(t_{10,l}-7.98\right)_{+}-0.3447\left(t_{10,l}-23.92\right)_{+}+1.3042\left(x_{10,l}-3.51\right)\end{aligned}$$
Interpretation of the model in each subject over time is generally divided for each predictor and each subinterval of the truncated spline function. As an example, for subject 1 (South Bengkulu Regency), the interpretation of the model for predictor (t), percentage of households working in the agricultural sector, is as follows: if it is assumed that the other predictor (average length of school years) is constant, then the influence of the percentage of households working in the agricultural sector on PGI-P1 in South Bengkulu can be expressed in the following equation:
$$\hat{y}_{1l}=0.1778\,t_{1l}-0.1705\left(t_{1l}-46.36\right)_{+}+0.0721\left(t_{1l}-76.44\right)_{+}+c \quad (16)$$
in which
$$c=-4.5343-0.1967\left(x_{1l}-9.26\right)$$
The model in Equation (16) possesses three subintervals and it can be interpreted using the following truncated function:
$$\hat{y}_{1l}=\begin{cases}0.1778\,t_{1l}+c, & t_{1l}\leq 46.36\\ 7.9049+0.0073\,t_{1l}+c, & 46.36<t_{1l}\leq 76.44\\ 5.5116+0.0794\,t_{1l}+c, & t_{1l}>76.44\end{cases} \quad (17)$$
Based on the truncated function in Equation (17), in which the first subinterval was assigned to the percentage of households working in the agricultural sector in South Bengkulu over twelve years that was less than 46.36, an increase of one point in percentage of households working in the agricultural sector will increase the PGI-P1 by 0.1778 points. The second subinterval contains the percentage of households working in the agricultural sector in the range of 46.36 to 76.44 and also had a positive correlation; an escalation of one point in the percentage of households working in the agricultural sector will add 0.0073 points to PGI-P1. Meanwhile, the last subinterval was applied to the percentage of households working in the agricultural sector that was greater than 76.44, which occurred only in 2010. In that year, the percentage of households working in the agricultural sector also had a positive correlation with PGI-P1. If there is an increase of one point in the percentage of households working in the agricultural sector, then the index of PGI-P1 would increase by 0.0794 points. This interpretation is applicable to other subjects in the same way.
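The piecewise reading of Equation (17) can be checked directly from the fitted coefficients. The short sketch below (ours, using the Subject 1 estimates from Appendix F) recovers the subinterval slopes by accumulating the truncated-term coefficients and evaluates the spline component of Equation (16) without the constant $c$.

```python
import numpy as np

# Subject 1 (South Bengkulu) estimates from Appendix F.
theta1, alpha1, alpha2 = 0.1778, -0.1705, 0.0721
knot1, knot2 = 46.36, 76.44

def f_subject1(t):
    """Truncated-spline component of the Subject 1 model (Equation (16), without c)."""
    t = np.asarray(t, dtype=float)
    return theta1 * t + alpha1 * np.maximum(t - knot1, 0) + alpha2 * np.maximum(t - knot2, 0)

# Slopes on the three subintervals: theta1, theta1 + alpha1, theta1 + alpha1 + alpha2.
print(theta1, theta1 + alpha1, theta1 + alpha1 + alpha2)   # 0.1778, 0.0073, 0.0794
print(f_subject1([40.0, 60.0, 80.0]))
```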
Furthermore, based on the empirical results of the best model, it is also possible to visually compare the actual and fitted values of the response variable for each subject. A comparison between the response variable (blue line) and the fitted values (red line) using the proposed model is presented in Figure 2. Some of the fitted values, as shown on the graph, have a similar pattern to the actual data, while others do not; nonetheless, the discrepancy is not extremely large. In summary, this application study certainly contributes to our understanding of the proposed model, which is the truncated spline and local linear mixed estimator in nonparametric regression for longitudinal data, notwithstanding its limitations.

4. Discussion and Conclusions

In this study, nonparametric regression with a new mixed estimator is proposed for estimating curves in longitudinal data modeling. We combine the truncated spline and the local linear as the classes of estimators in nonparametric regression. The estimation of the proposed model’s regression curve using two-stage WLS optimization is as follows:
$$\hat{\boldsymbol{\mu}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\left[\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}+\mathbf{J}\left(\mathbf{I}-\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\right)\right]\mathbf{y}$$
Furthermore, the application to a real dataset, modeling the PGI-P1 data in Bengkulu Province, shows that the proposed mixed estimator model produces better results than the single estimator models. One of the most important findings is that the best PGI-P1 model is obtained from the proposed model using a combination of two knots, the second weight type, and the selected bandwidth parameter. The best model yields a high R-squared value, indicating that the predictors in the model explain a large share of the data variability. In summary, these implementation studies provide an understanding of regression curve estimation using the truncated spline and local linear mixed estimator in nonparametric regression for longitudinal data.
A major limitation of this research is the absence of a confidence interval estimation and hypothesis testing of the proposed model. Therefore, further research ought to be conducted to attain a confidence interval estimation and to perform hypothesis testing. The applicability of the approach described in this paper to different mixed estimators in nonparametric regression for longitudinal data is also a potential issue for further research. Additionally, other case studies could be performed using combinations of the proposed model with more predictors, higher knot numbers and more varied bandwidth parameters to learn more about the performance evaluation of the proposed model.

Author Contributions

Conceptualization, I.S., I.N.B. and V.R.; methodology, I.S. and I.N.B.; software, I.S. and V.R.; validation, I.N.B. and V.R.; formal analysis, I.S. and I.N.B.; investigation, I.S. and V.R.; data curation, I.S.; writing—original draft preparation, I.S.; writing—review and editing, I.N.B. and V.R.; visualization, I.S.; supervision, I.N.B. and V.R.; project administration, I.N.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors would like to thank the anonymous reviewers and the editor for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The regression curve $g_{ki}\left(x_{kil}\right)$ is assumed to be unknown but smooth and contained in a specific function space. This regression curve is approximated using a local linear estimator, as presented in Equation (7). The local linear function $g_{ki}\left(x_{kil}\right)$ with one predictor variable ($k=1$) can be written as
$$\mathbf{g}_{ki}=\left(g_{ki}\left(x_{ki1}\right),g_{ki}\left(x_{ki2}\right),\ldots,g_{ki}\left(x_{kiL}\right)\right)^{T},\quad i=1,2,\ldots,n$$
According to Equation (7), the local linear function in the regression curve component $\mathbf{g}_{ki}$ written above can be described in the following matrix form:
$$\mathbf{g}_{ki}=\begin{bmatrix}g_{ki}\left(x_{ki1}\right)\\ g_{ki}\left(x_{ki2}\right)\\ \vdots\\ g_{ki}\left(x_{kiL}\right)\end{bmatrix}=\begin{bmatrix}\beta_{0i}+\beta_{1ki}\left(x_{ki1}-x_{0ki}\right)\\ \beta_{0i}+\beta_{1ki}\left(x_{ki2}-x_{0ki}\right)\\ \vdots\\ \beta_{0i}+\beta_{1ki}\left(x_{kiL}-x_{0ki}\right)\end{bmatrix}=\begin{bmatrix}1&x_{ki1}-x_{0ki}\\ 1&x_{ki2}-x_{0ki}\\ \vdots&\vdots\\ 1&x_{kiL}-x_{0ki}\end{bmatrix}\begin{bmatrix}\beta_{0i}\\ \beta_{1ki}\end{bmatrix}=\mathbf{X}_{ki}\left(x_{0ki}\right)\boldsymbol{\beta}_{ki} \quad (A1)$$
Consequently, the local linear function with $q$ predictors for the regression curve component $\mathbf{g}_{i}$ given in Equation (A1) can be expressed as follows:
$$\mathbf{g}_{i}=\mathbf{X}_{1i}\left(x_{01i}\right)\boldsymbol{\beta}_{1i}+\mathbf{X}_{2i}\left(x_{02i}\right)\boldsymbol{\beta}_{2i}+\cdots+\mathbf{X}_{qi}\left(x_{0qi}\right)\boldsymbol{\beta}_{qi}=\sum_{k=1}^{q}\mathbf{X}_{ki}\left(x_{0ki}\right)\boldsymbol{\beta}_{ki}=\mathbf{X}_{i}\left(x_{0i}\right)\boldsymbol{\beta}_{i},$$
and thus we obtain
$$\mathbf{g}=\begin{bmatrix}\mathbf{g}_{1}\\ \mathbf{g}_{2}\\ \vdots\\ \mathbf{g}_{n}\end{bmatrix}=\begin{bmatrix}\mathbf{X}_{1}\left(x_{01}\right)\boldsymbol{\beta}_{1}\\ \mathbf{X}_{2}\left(x_{02}\right)\boldsymbol{\beta}_{2}\\ \vdots\\ \mathbf{X}_{n}\left(x_{0n}\right)\boldsymbol{\beta}_{n}\end{bmatrix}=\begin{bmatrix}\mathbf{X}_{1}\left(x_{01}\right)&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\mathbf{X}_{2}\left(x_{02}\right)&\cdots&\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\cdots&\mathbf{X}_{n}\left(x_{0n}\right)\end{bmatrix}\begin{bmatrix}\boldsymbol{\beta}_{1}\\ \boldsymbol{\beta}_{2}\\ \vdots\\ \boldsymbol{\beta}_{n}\end{bmatrix}=\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta} \quad (A2)$$
The model of nonparametric regression for longitudinal data in Equation (4) can be rewritten as
$$y_{il}-\sum_{j=1}^{p}f_{ji}\left(t_{jil}\right)=\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)+\varepsilon_{il}\;\Longrightarrow\; y_{il}^{*}=\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)+\varepsilon_{il},\quad i=1,2,\ldots,n,\ l=1,2,\ldots,L \quad (A3)$$
In the matrix form, the model in Equation (A3) can be written as in Equation (A4).
$$\mathbf{y}-\mathbf{f}=\mathbf{g}+\boldsymbol{\varepsilon}\;\Longrightarrow\;\mathbf{y}^{*}=\mathbf{g}+\boldsymbol{\varepsilon} \quad (A4)$$
Thus, based on Equation (A3), where g k i x k i l is the local linear function, the goodness of fit for WLS optimization as presented in Equation (8) can be written as follows:
$$\sum_{i=1}^{n}\sum_{l=1}^{L}v_{il}\left[y_{il}^{*}-\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)\right]^{2}w_{h}\left(x_{kil}-x_{0ki}\right)=\sum_{i=1}^{n}\sum_{l=1}^{L}v_{il}\left[y_{il}^{*}-\sum_{k=1}^{q}\left(\beta_{0i}+\beta_{1ki}\left(x_{kil}-x_{0ki}\right)\right)\right]^{2}w_{h}\left(x_{kil}-x_{0ki}\right) \quad (A5)$$
As a result, the goodness of fit for the local linear component in Equation (A5) can be represented by the matrix form below
$$\left\|\mathbf{V}\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)\right\|_{\mathbf{W}_{h}}^{2}=\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)$$

Appendix B

According to the goodness of fit in Lemma 1, the WLS optimization in Equation (8) can be expressed in the form
$$\min_{f_{j},g_{k}}\sum_{i=1}^{n}\sum_{l=1}^{L}v_{il}\left[y_{il}^{*}-\sum_{k=1}^{q}g_{ki}\left(x_{kil}\right)\right]^{2}w_{h}\left(x_{kil}-x_{0ki}\right) \quad (A6)$$
In matrix form, Equation (A6) is written as
$$\min_{\boldsymbol{\beta}}\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right) \quad (A7)$$
Suppose $Q(\boldsymbol{\beta})=\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)^{T}\mathbf{V}\mathbf{W}_{h}\left(\mathbf{y}^{*}-\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right)$; then Equation (A7) can be expanded as follows:
$$Q(\boldsymbol{\beta})=\mathbf{y}^{*T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}-2\boldsymbol{\beta}^{T}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}+\boldsymbol{\beta}^{T}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}$$
Subsequently, Equation (A7) can be rewritten in the form:
$$\min_{\boldsymbol{\beta}}Q(\boldsymbol{\beta})=\min_{\boldsymbol{\beta}}\left\{\mathbf{y}^{*T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}-2\boldsymbol{\beta}^{T}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}+\boldsymbol{\beta}^{T}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right\} \quad (A8)$$
The estimator $\hat{\boldsymbol{\beta}}$ can be obtained by solving the optimization in Equation (A8). This is done by taking the partial derivative of $Q(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ and setting the result equal to zero, as follows:
$$\frac{\partial Q(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}=\mathbf{0}\;\Longrightarrow\;\frac{\partial}{\partial\boldsymbol{\beta}}\left\{\mathbf{y}^{*T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}-2\boldsymbol{\beta}^{T}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}+\boldsymbol{\beta}^{T}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\boldsymbol{\beta}\right\}=\mathbf{0}\;\Longrightarrow\;-2\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}+2\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\hat{\boldsymbol{\beta}}=\mathbf{0}$$
giving the result
$$\hat{\boldsymbol{\beta}}=\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*} \quad (A9)$$
By substituting β ^ in Equation (A9) into Equation (A2), the regression curve estimation of the local linear component can be written as
$$\hat{\mathbf{g}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{X}\left(x_{0}\right)\hat{\boldsymbol{\beta}}=\mathbf{X}\left(x_{0}\right)\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{y}^{*}=\mathbf{J}\mathbf{y}^{*} \quad (A10)$$
in which $\mathbf{y}^{*}=\mathbf{y}-\mathbf{f}$ and $\mathbf{J}=\mathbf{X}\left(x_{0}\right)\left[\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}\mathbf{X}\left(x_{0}\right)\right]^{-1}\mathbf{X}\left(x_{0}\right)^{T}\mathbf{V}\mathbf{W}_{h}$.

Appendix C

The estimation result of Theorem 1 for the local linear component still includes the linear truncated spline function $\mathbf{f}$, as shown in Equation (5). Therefore, to complete the WLS optimization in Equation (8), the estimation of the regression curve for the truncated spline component $f_{ji}\left(t_{jil}\right)$ is required. According to Equation (5), the truncated spline component in the nonparametric regression curve for longitudinal data, with only one predictor ($j=1$), can be expressed in the following matrix form:
$$\mathbf{f}_{ji}=\begin{bmatrix}f_{ji}\left(t_{ji1}\right)\\ f_{ji}\left(t_{ji2}\right)\\ \vdots\\ f_{ji}\left(t_{jiL}\right)\end{bmatrix}=\begin{bmatrix}\theta_{0i}+\theta_{1ji}t_{ji1}+\sum_{m=1}^{S}\alpha_{mji}\left(t_{ji1}-\lambda_{mji}\right)_{+}\\ \theta_{0i}+\theta_{1ji}t_{ji2}+\sum_{m=1}^{S}\alpha_{mji}\left(t_{ji2}-\lambda_{mji}\right)_{+}\\ \vdots\\ \theta_{0i}+\theta_{1ji}t_{jiL}+\sum_{m=1}^{S}\alpha_{mji}\left(t_{jiL}-\lambda_{mji}\right)_{+}\end{bmatrix}=\begin{bmatrix}1&t_{ji1}\\ 1&t_{ji2}\\ \vdots&\vdots\\ 1&t_{jiL}\end{bmatrix}\begin{bmatrix}\theta_{0i}\\ \theta_{1ji}\end{bmatrix}+\begin{bmatrix}\left(t_{ji1}-\lambda_{1ji}\right)_{+}&\left(t_{ji1}-\lambda_{2ji}\right)_{+}&\cdots&\left(t_{ji1}-\lambda_{Sji}\right)_{+}\\ \left(t_{ji2}-\lambda_{1ji}\right)_{+}&\left(t_{ji2}-\lambda_{2ji}\right)_{+}&\cdots&\left(t_{ji2}-\lambda_{Sji}\right)_{+}\\ \vdots&\vdots&\ddots&\vdots\\ \left(t_{jiL}-\lambda_{1ji}\right)_{+}&\left(t_{jiL}-\lambda_{2ji}\right)_{+}&\cdots&\left(t_{jiL}-\lambda_{Sji}\right)_{+}\end{bmatrix}\begin{bmatrix}\alpha_{1ji}\\ \alpha_{2ji}\\ \vdots\\ \alpha_{Sji}\end{bmatrix}=\mathbf{P}_{ji}\boldsymbol{\theta}_{ji}+\mathbf{R}_{ji}\boldsymbol{\alpha}_{ji} \quad (A11)$$
Therefore, using Equation (A11), the function of truncated spline f i for j = 1 , 2 , , p predictors, can be written as follows:
$$\mathbf{f}_{i}=\mathbf{P}_{1i}\boldsymbol{\theta}_{1i}+\mathbf{R}_{1i}\boldsymbol{\alpha}_{1i}+\mathbf{P}_{2i}\boldsymbol{\theta}_{2i}+\mathbf{R}_{2i}\boldsymbol{\alpha}_{2i}+\cdots+\mathbf{P}_{pi}\boldsymbol{\theta}_{pi}+\mathbf{R}_{pi}\boldsymbol{\alpha}_{pi}=\left(\mathbf{P}_{1i}\boldsymbol{\theta}_{1i}+\cdots+\mathbf{P}_{pi}\boldsymbol{\theta}_{pi}\right)+\left(\mathbf{R}_{1i}\boldsymbol{\alpha}_{1i}+\cdots+\mathbf{R}_{pi}\boldsymbol{\alpha}_{pi}\right)=\mathbf{P}_{i}\boldsymbol{\theta}_{i}+\mathbf{R}_{i}\boldsymbol{\alpha}_{i} \quad (A12)$$
According to Equation (A12), the truncated spline component function in nonparametric regression for longitudinal data can be written in the following matrix form:
$$\mathbf{f}=\begin{bmatrix}\mathbf{f}_{1}\\ \mathbf{f}_{2}\\ \vdots\\ \mathbf{f}_{n}\end{bmatrix}=\begin{bmatrix}\mathbf{P}_{1}\boldsymbol{\theta}_{1}\\ \mathbf{P}_{2}\boldsymbol{\theta}_{2}\\ \vdots\\ \mathbf{P}_{n}\boldsymbol{\theta}_{n}\end{bmatrix}+\begin{bmatrix}\mathbf{R}_{1}\boldsymbol{\alpha}_{1}\\ \mathbf{R}_{2}\boldsymbol{\alpha}_{2}\\ \vdots\\ \mathbf{R}_{n}\boldsymbol{\alpha}_{n}\end{bmatrix}=\begin{bmatrix}\mathbf{P}_{1}&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\mathbf{P}_{2}&\cdots&\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\cdots&\mathbf{P}_{n}\end{bmatrix}\begin{bmatrix}\boldsymbol{\theta}_{1}\\ \boldsymbol{\theta}_{2}\\ \vdots\\ \boldsymbol{\theta}_{n}\end{bmatrix}+\begin{bmatrix}\mathbf{R}_{1}&\mathbf{0}&\cdots&\mathbf{0}\\ \mathbf{0}&\mathbf{R}_{2}&\cdots&\mathbf{0}\\ \vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\mathbf{0}&\cdots&\mathbf{R}_{n}\end{bmatrix}\begin{bmatrix}\boldsymbol{\alpha}_{1}\\ \boldsymbol{\alpha}_{2}\\ \vdots\\ \boldsymbol{\alpha}_{n}\end{bmatrix},$$
such that
$$\mathbf{f}=\left[\begin{array}{ccc|ccc}\mathbf{P}_{1}&\cdots&\mathbf{0}&\mathbf{R}_{1}&\cdots&\mathbf{0}\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\ \mathbf{0}&\cdots&\mathbf{P}_{n}&\mathbf{0}&\cdots&\mathbf{R}_{n}\end{array}\right]\begin{bmatrix}\boldsymbol{\theta}_{1}\\ \vdots\\ \boldsymbol{\theta}_{n}\\ \boldsymbol{\alpha}_{1}\\ \vdots\\ \boldsymbol{\alpha}_{n}\end{bmatrix}=\left[\mathbf{P}\,|\,\mathbf{R}\right]\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\alpha}\end{bmatrix} \quad (A13)$$
Equation (A13) can be rewritten in the form
$$\mathbf{f}=\mathbf{T}(\lambda)\boldsymbol{\gamma} \quad (A14)$$
where $\mathbf{T}(\lambda)=\left[\mathbf{P}\,|\,\mathbf{R}\right]$ is an $nL\times n(1+p+S)$ matrix and $\boldsymbol{\gamma}=\begin{bmatrix}\boldsymbol{\theta}\\ \boldsymbol{\alpha}\end{bmatrix}$ is an $n(1+p+S)\times 1$ vector.
Furthermore, the additive model of nonparametric regression for longitudinal data in Equation (4) can be expressed in this matrix form:
$$\mathbf{y}=\mathbf{f}+\mathbf{g}+\boldsymbol{\varepsilon} \quad (A15)$$
Substituting Equation (A10) into Equation (A15) yields
$$\mathbf{y}=\mathbf{f}+\mathbf{J}\mathbf{y}^{*}+\boldsymbol{\varepsilon} \quad (A16)$$
In order to estimate the regression curve of the truncated spline component through WLS optimization, Equation (A16) can be formulated as follows:
$$\begin{aligned}\mathbf{y}-\mathbf{J}\mathbf{y}^{*}&=\mathbf{f}+\boldsymbol{\varepsilon}\\ \mathbf{y}-\mathbf{J}\left(\mathbf{y}-\mathbf{f}\right)&=\mathbf{f}+\boldsymbol{\varepsilon}\\ \left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}&=\left(\mathbf{I}-\mathbf{J}\right)\mathbf{f}+\boldsymbol{\varepsilon}\\ \left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}&=\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}+\boldsymbol{\varepsilon} \quad (A17)\end{aligned}$$
Thus, the error model ε can be written as
$$\boldsymbol{\varepsilon}=\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma} \quad (A18)$$
Therefore, the regression curve estimation can be obtained by solving Equation (A18) using WLS optimization. The WLS is given by
$$\boldsymbol{\varepsilon}^{T}\mathbf{V}\boldsymbol{\varepsilon}=\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]^{T}\mathbf{V}\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]$$

Appendix D

The second stage of the estimation procedure in the proposed method estimates the regression curve component approximated by the truncated spline function using WLS optimization. According to the WLS result in Lemma 2, the WLS optimization can be written as follows:
$$\min_{\boldsymbol{\gamma}}\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]^{T}\mathbf{V}\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]$$
Let $Q(\boldsymbol{\gamma})=\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]^{T}\mathbf{V}\left[\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y}-\left(\mathbf{I}-\mathbf{J}\right)\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]$; expanding the products in $Q(\boldsymbol{\gamma})$, we obtain
$$\begin{aligned}Q(\boldsymbol{\gamma})&=\left[\mathbf{y}-\mathbf{J}\mathbf{y}+\mathbf{J}\mathbf{T}(\lambda)\boldsymbol{\gamma}-\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]^{T}\mathbf{V}\left[\mathbf{y}-\mathbf{J}\mathbf{y}+\mathbf{J}\mathbf{T}(\lambda)\boldsymbol{\gamma}-\mathbf{T}(\lambda)\boldsymbol{\gamma}\right]\\ &=\mathbf{y}^{T}\mathbf{V}\mathbf{y}+\mathbf{y}^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{y}+\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\boldsymbol{\gamma}+\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{T}(\lambda)\boldsymbol{\gamma}-2\mathbf{y}^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{y}\\ &\quad+2\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{y}-2\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{y}-2\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{y}+2\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{J}\mathbf{y}-2\boldsymbol{\gamma}^{T}\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\boldsymbol{\gamma}\end{aligned}$$
The WLS optimization is completed by setting the partial derivative of $Q(\boldsymbol{\gamma})$ with respect to $\boldsymbol{\gamma}$ equal to zero, i.e.,
$$\frac{\partial Q(\boldsymbol{\gamma})}{\partial\boldsymbol{\gamma}}=\mathbf{0}$$
The partial derivation yields the parameter estimate of γ ^ , which is as follows:
$$\begin{aligned}2\mathbf{T}(\lambda)^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}+2\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}+2\mathbf{T}(\lambda)^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{y}-2\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{y}-2\mathbf{T}(\lambda)^{T}\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{y}+2\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{J}\mathbf{y}-4\mathbf{T}(\lambda)^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}&=\mathbf{0}\\ 2\mathbf{T}(\lambda)^{T}\left[\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}+\mathbf{V}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}+\mathbf{J}^{T}\mathbf{V}\mathbf{y}-\mathbf{V}\mathbf{y}-\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{y}+\mathbf{V}\mathbf{J}\mathbf{y}-2\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}\right]&=\mathbf{0}\\ \mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}+\mathbf{V}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}+\mathbf{J}^{T}\mathbf{V}\mathbf{y}-\mathbf{V}\mathbf{y}-\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{y}+\mathbf{V}\mathbf{J}\mathbf{y}-2\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}&=\mathbf{0}\end{aligned}$$
Solving the above equation gives the result in Equation (A20):
$$\hat{\boldsymbol{\gamma}}=\left[\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)-2\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)+\mathbf{V}\mathbf{T}(\lambda)\right]^{-1}\left(\mathbf{I}-\mathbf{J}\right)^{T}\mathbf{V}\left(\mathbf{I}-\mathbf{J}\right)\mathbf{y} \quad (A20)$$
Equation (A20) can be rewritten in the following form:
$$\hat{\boldsymbol{\gamma}}=\mathbf{K}^{-1}\mathbf{L}\mathbf{y} \quad (A21)$$
where $\mathbf{K}=\mathbf{J}^{T}\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)-2\mathbf{V}\mathbf{J}\mathbf{T}(\lambda)+\mathbf{V}\mathbf{T}(\lambda)$ and $\mathbf{L}=\left(\mathbf{I}-\mathbf{J}\right)^{T}\mathbf{V}\left(\mathbf{I}-\mathbf{J}\right)$.
Finally, the regression curve estimation for the truncated spline component is obtained by substituting Equation (A21) into the truncated spline estimator component as in Equation (A14), which is given by
$$\hat{\mathbf{f}}_{\lambda,h}\left(\mathbf{t},\mathbf{x}\right)=\mathbf{T}(\lambda)\hat{\boldsymbol{\gamma}}=\mathbf{T}(\lambda)\mathbf{K}^{-1}\mathbf{L}\mathbf{y} \quad (A22)$$

Appendix E

Table A1. Knot locations for each subject of the best model.

Subject      λ1      λ2
Subject 1    46.36   76.44
Subject 2    35.54   78.41
Subject 3    45.47   67.80
Subject 4    55.74   80.09
Subject 5    50.53   78.84
Subject 6    36.89   69.97
Subject 7    45.06   79.98
Subject 8    49.91   77.73
Subject 9    43.98   77.67
Subject 10   7.98    23.92

Appendix F

Table A2. The best model parameter estimation results.

Subject      θ̂_0i      θ̂_1i      α̂_1i      α̂_2i      β̂_0i      β̂_1i
Subject 1    −6.7469   0.1778    −0.1705   0.0721    2.2127    −0.1967
Subject 2    5.1266    −0.0022   0.0226    −0.0953   −3.2662   −1.0230
Subject 3    1.4510    −0.1051   0.1202    0.0197    5.1926    0.0526
Subject 4    8.5899    −0.2437   0.3035    −0.1710   8.0732    −0.5480
Subject 5    2.0107    0.0535    −0.0766   0.1059    −1.4896   −1.4033
Subject 6    −1.5656   0.0207    −0.0142   −0.0162   2.5207    −0.4925
Subject 7    −3.5361   −0.0593   0.0017    0.1892    7.9063    −1.6067
Subject 8    1.1165    −0.0094   0.0138    0.0050    1.2300    −0.5594
Subject 9    3.4895    −0.0491   0.0727    −0.0930   −0.1362   0.7152
Subject 10   −1.2430   0.3772    −0.2532   −0.3447   2.8213    1.3042

References

1. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 5th ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012.
2. Eubank, R.L. Nonparametric Regression and Spline Smoothing, 2nd ed.; Marcel Dekker, Inc.: New York, NY, USA, 1999.
3. Linke, Y.; Borisov, I.; Ruzankin, P.; Kutsenko, V.; Yarovaya, E.; Shalnova, S. Universal Local Linear Kernel Estimators in Nonparametric Regression. Mathematics 2022, 10, 2693.
4. Luo, S.; Zhang, C.Y.; Xu, F. The Local Linear M-Estimation with Missing Response Data. J. Appl. Math. 2014, 2, 1–10.
5. Cheruiyot, L.R. Local Linear Regression Estimator on the Boundary Correction in Nonparametric Regression Estimation. J. Stat. Theory Appl. 2020, 19, 460–471.
6. Chen, Z.; Chen, M.; Ju, F. Bayesian P-Splines Quantile Regression of Partially Linear Varying Coefficient Spatial Autoregressive Models. Symmetry 2022, 14, 1175.
7. Du, R.; Yamada, H. Principle of Duality in Cubic Smoothing Spline. Mathematics 2020, 8, 1839.
8. Lestari, B.; Fatmawati; Budiantara, I.N. Spline Estimator and Its Asymptotic Properties in Multiresponse Nonparametric Regression Model. Songklanakarin J. Sci. Technol. 2020, 42, 533–548.
9. Kayri, M.; Zirhlioglu, G. Kernel Smoothing Function and Choosing Bandwidth for Non-Parametric Regression Methods. Ozean J. Appl. Sci. 2009, 2, 49–54.
10. Zhao, G.; Ma, Y. Robust Nonparametric Kernel Regression Estimator. Stat. Probab. Lett. 2016, 116, 72–79.
11. Yang, Y.; Pilanci, M.; Wainwright, M.J. Randomized Sketches for Kernels: Fast and Optimal Nonparametric Regression. Ann. Stat. 2017, 45, 991–1023.
12. Syengo, C.K.; Pyeye, S.; Orwa, G.O.; Odhiambo, R.O. Local Polynomial Regression Estimator of the Finite Population Total under Stratified Random Sampling: A Model-Based Approach. Open J. Stat. 2016, 6, 1085–1097.
13. Chamidah, N.; Budiantara, I.N.; Sunaryo, S.; Zain, I. Designing of Child Growth Chart Based on Multi-Response Local Polynomial Modeling. J. Math. Stat. 2012, 8, 342–347.
14. Opsomer, J.D.; Ruppert, D. Fitting a Bivariate Additive Model by Local Polynomial Regression. Ann. Stat. 1997, 25, 186–211.
15. Bilodeau, M. Fourier Smoother and Additive Models. Can. J. Stat. 1992, 20, 257–269.
16. Kim, J.; Hart, J.D. A Change-Point Estimator Using Local Fourier Series. J. Nonparametr. Stat. 2011, 23, 83–98.
17. Yu, W.; Yong, Y.; Guan, G.; Huang, Y.; Su, W.; Cui, C. Valuing Guaranteed Minimum Death Benefits by Cosine Series Expansion. Mathematics 2019, 7, 835.
18. Yao, D.S.; Chen, W.X.; Long, C.X. Parametric Estimation for the Simple Linear Regression Model under Moving Extremes Ranked Set Sampling Design. Appl. Math. J. Chin. Univ. 2021, 36, 269–277.
19. Ruppert, D.; Wand, M.P.; Carroll, R.J. Semiparametric Regression; Cambridge University Press: Cambridge, UK, 2003; ISBN 9780511755453.
20. Hidayat, R.; Budiantara, I.N.; Otok, B.W.; Ratnasari, V. The Regression Curve Estimation by Using Mixed Smoothing Spline and Kernel (MsS-K) Model. Commun. Stat.-Theory Methods 2021, 50, 3942–3953.
21. Sauri, M.S.; Hadijati, M.; Fitriyani, N. Spline and Kernel Mixed Nonparametric Regression for Malnourished Children Model in West Nusa Tenggara. J. Varian 2021, 4, 99–108.
22. Budiantara, I.N.; Ratnasari, V.; Ratna, M.; Zain, I. The Combination of Spline and Kernel Estimator for Nonparametric Regression and Its Properties. Appl. Math. Sci. 2015, 9, 6083–6094.
23. Mariati, N.P.A.M.; Budiantara, I.N.; Ratnasari, V. The Application of Mixed Smoothing Spline and Fourier Series Model in Nonparametric Regression. Symmetry 2021, 13, 2094.
24. Nurcahayani, H.; Budiantara, I.N.; Zain, I. The Curve Estimation of Combined Truncated Spline and Fourier Series Estimators for Multiresponse Nonparametric Regression. Mathematics 2021, 9, 1141.
25. Yin, Z.H.; Liu, F.; Xie, Y.F. Nonparametric Regression Estimation with Mixed Measurement Errors. Appl. Math. 2016, 7, 2269–2284.
26. Diggle, P.J.; Heagerty, P.; Liang, K.Y.; Zeger, S.L. Analysis of Longitudinal Data; Oxford Univ. Press, Inc.: Oxford, NY, USA, 2002.
27. Mardianto, M.F.F.; Gunardi; Utami, H. An Analysis about Fourier Series Estimator in Nonparametric Regression for Longitudinal Data. Math. Stat. 2021, 9, 501–510.
28. Fernandes, A.A.R.; Budiantara, I.N.; Otok, B.W.; Suhartono. Spline Estimator for Bi-Responses Nonparametric Regression Model for Longitudinal Data. Appl. Math. Sci. 2014, 8, 5653–5665.
29. Vogt, M.; Linton, O. Classification of Non-Parametric Regression Functions in Longitudinal Data Models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2017, 79, 5–27.
30. Cheng, M.Y.; Paige, R.L.; Sun, S.; Yan, K. Variance Reduction for Kernel Estimators in Clustered/Longitudinal Data Analysis. J. Stat. Plan. Inference 2010, 140, 1389–1397.
31. Jou, P.H.; Akhoond-Ali, A.M.; Behnia, A.; Chinipardaz, R. A Comparison of Parametric and Nonparametric Density Functions for Estimating Annual Precipitation in Iran. Res. J. Environ. Sci. 2009, 3, 62–70.
32. Sun, Y.; Sun, L.; Zhou, J. Profile Local Linear Estimation of Generalized Semiparametric Regression Model for Longitudinal Data. Lifetime Data Anal. 2013, 19, 317–349.
33. Yao, W.; Li, R. New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data. J. R. Stat. Soc. Ser. B Stat. Methodol. 2013, 75, 123–138.
34. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Chapman & Hall: London, UK, 1996.
35. Wu, H.; Zhang, J. Nonparametric Regression Methods for Longitudinal Data Analysis; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006.
36. Wahba, G. Spline Models for Observational Data; SIAM, Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1990.
37. BPS-Statistics Indonesia. Penghitungan dan Analisis Kemiskinan Makro Indonesia Tahun 2019; BPS-Statistics Indonesia: Jakarta, Indonesia, 2021.
38. BPS-Statistics of Bengkulu Province. Profil Kemiskinan Provinsi Bengkulu September 2021; BPS-Statistics of Bengkulu Province: Bengkulu, Indonesia, 2022.
39. Asrol, A.; Ahmad, H. Analysis of Factors That Affect Poverty in Indonesia. Rev. Espac. 2018, 39, 14–25.
40. Ghazali, M.; Otok, B.W. Pemodelan Fixed Effect Pada Regresi Data Longitudinal Dengan Estimasi Generalized Method of Moments (Studi Kasus Data Penduduk Miskin Di Indonesia). Statistika 2016, 4, 39–48.
41. Sinaga, M. Analysis of Effect of GRDP (Gross Regional Domestic Product) Per Capita, Inequality Distribution Income, Unemployment and HDI (Human Development Index). Budapest Int. Res. Critics Inst. J. 2020, 3, 2309–2317.
42. Fajriyah, N.; Rahayu, S.P. Pemodelan Faktor-Faktor Yang Mempengaruhi Kemiskinan Kabupaten/Kota Di Jawa Timur Menggunakan Regresi Data Panel. J. Sains Seni ITS 2016, 5, 2337–3520.
Figure 1. Partial scatterplot of 10 subjects between (a) the poverty gap index (PGI-P1) and average length of school years; (b) the poverty gap index (PGI-P1) and percentage of households working in the agricultural sector. * scatterplot between response and each predictor.
Figure 2. Comparison between actual and fitted values for each subject.
Table 1. Summary of GCV results for PGI-P1 modeling.

Model 1: Nonparametric regression with truncated spline and local linear mixed estimator for longitudinal data
Number of Knots   Weight Type   Bandwidth Parameter (x)   GCV
1                 V = N⁻¹I      0.7771                    19.3669
1                 V = n⁻¹I      1.2089                    19.7119
1                 V = Σ⁻¹       5.4400                    31.5315
2                 V = N⁻¹I      0.7953                    19.9016
2                 V = n⁻¹I      0.9067                    19.2324 *
2                 V = Σ⁻¹       0.3181                    68.3926

Model 2: Nonparametric regression with truncated spline estimator for longitudinal data
Number of Knots   Weight Type   GCV
1                 V = N⁻¹I      19.7483
1                 V = n⁻¹I      20.0859
1                 V = Σ⁻¹       24.2408
2                 V = N⁻¹I      22.9741
2                 V = n⁻¹I      21.1933
2                 V = Σ⁻¹       24.3359

Model 3: Nonparametric regression with local linear estimator for longitudinal data
Weight Type       Bandwidth Parameter (x)   Bandwidth Parameter (t)   GCV
V = N⁻¹I          5.448                     7.38                      30.4703
V = n⁻¹I          5.448                     7.38                      31.7384
V = Σ⁻¹           5.448                     7.38                      56.7212

* The minimum value of GCV.