Article

Estimation of Biresponse Semiparametric Regression Model for Longitudinal Data Using Local Polynomial Kernel Estimator

1 Doctoral Study Program, Faculty of Science and Technology, Airlangga University, Surabaya 60115, Indonesia
2 Department of Statistics, Faculty of Sciences and Technology, Universitas Muhammadiyah Semarang, Semarang 50273, Indonesia
3 Department of Mathematics, Faculty of Science and Technology, Airlangga University, Surabaya 60115, Indonesia
4 Research Group of Statistical Modeling in Life Science, Faculty of Science and Technology, Airlangga University, Surabaya 60115, Indonesia
5 Department of Mathematics, Faculty of Mathematics and Natural Sciences, The University of Jember, Jember 68121, Indonesia
6 Department of Statistics, Faculty of Science, Muğla Sıtkı Koçman University, Muğla 48000, Turkey
7 Department of Mathematics, University of Wisconsin, Oshkosh Algoma Blvd, Oshkosh, WI 54901, USA
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(3), 392; https://doi.org/10.3390/sym17030392
Submission received: 12 January 2025 / Revised: 12 February 2025 / Accepted: 25 February 2025 / Published: 4 March 2025
(This article belongs to the Section Mathematics)

Abstract

When handling longitudinal data in regression models, we often encounter problems involving two interrelated response variables. These response variables may display an unknown curve shape in their relationship with one predictor variable, referred to as the nonparametric component, while maintaining a linear relationship with other predictor variables, referred to as the parametric component. In such cases, a Biresponse Semiparametric Regression (BSR) approach is a suitable solution. This research estimates the BSR model for longitudinal data using the Local Polynomial Kernel (LPK) estimator with an estimated symmetric variance–covariance matrix, validates the estimator on simulated data, and applies it to a real dataset on Dengue Hemorrhagic Fever (DHF). The parameters are estimated with a combination of Least Squares (LS) and Weighted Least Squares (WLS), and the optimal bandwidth is selected with the Generalized Cross-Validation (GCV) method. The simulation results indicate that, with kernel weighting, using weights derived from the inverse of the variance–covariance matrix significantly improves the estimation accuracy of the BSR model. In the DHF application, where platelets and hematocrit are the response variables and hemoglobin and examination time are the predictor variables, the estimated model achieves an R² of 92.8%.

1. Introduction

Regression analysis has long been used to identify the functional relationship between predictor (independent) variables and response (dependent) variables by estimating the regression curve. A scatter plot is usually made first to examine the shape of the curve relating these variables; common shapes include linear, quadratic, and exponential patterns [1]. Regression models fall into two types: parametric and nonparametric. A parametric regression model assumes that the shape of the regression curve is known, that is, that prior information about its shape is available. A nonparametric regression model assumes that the shape of the curve is unknown, meaning that no prior information about it is available. Nonparametric regression is highly flexible because the regression curve is only assumed to be smooth [2].
Nonparametric regression (NR) is an approach in which the regression curve is constructed from information in the data rather than having a preset form; the specific curve connecting the predictor with the response variable is unknown. The NR model does not require the assumptions of parametric regression: the regression curve is only assumed to be smooth, that is, to belong to a certain function space. Because the estimated NR curve is not driven by the researcher's subjective choice of form, the approach is highly flexible. A regression analysis that combines parametric and nonparametric components is called Semiparametric Regression (SR). An SR model has at least two predictors: the functional relationship between the response and some predictor variables follows a known pattern, while the pattern for the other predictor variables is unknown [3].
In some problems with multiple predictors, the relationship between the response variable and one predictor variable has an unknown curve shape, while its relationship with the other predictor variables has a known shape. In this situation, the SR model approach is suggested [4]. SR models can be estimated with several estimators, such as kernel, spline, Fourier series, and local polynomial estimators. Several studies on one-response SR have been conducted, including those by Asrini [5], who used the Fourier series estimator; Diana et al. [6], who used a Bayesian approach; and Mohaisen and Abdulhussein [7], who used spline estimators. Meanwhile, biresponse or multiresponse NR modeling has been studied by Lestari et al. [8,9], who estimated the regression curve with a smoothing spline estimator.
Regression models are applied to three types of data: cross-sectional, time series, and longitudinal data. Cross-sectional data contain n independent subjects, each observed only once. Time series data are a sequence of observations taken periodically at fixed time intervals [1]. Longitudinal data are collected from n independent subjects, each observed repeatedly at different times. Longitudinal data have advantages over cross-sectional data: with the same number of subjects, they give a more precise estimate of the treatment effect because each subject is observed repeatedly. In addition, longitudinal studies are more powerful than cross-sectional studies because fewer subjects are needed to obtain the same statistical power [10].
The main objective of SR modeling is to estimate the regression curve. Several estimators can be used for this purpose, including the Local Polynomial Kernel (LPK) estimator. The LPK estimates the regression curve locally with a polynomial whose order is chosen according to the data pattern, and it assigns a weight to each observation through a kernel function. These weights are controlled by a smoothing parameter, the bandwidth. LPK has several advantages, including reduced asymptotic bias and good estimation performance [11]. To estimate the regression curve with the LPK estimator, we optimize a Weighted Least Squares (WLS) function and select the bandwidth with the Generalized Cross-Validation (GCV) criterion. WLS is an extension of Ordinary Least Squares (OLS) used to handle heteroscedasticity, which occurs when the error terms do not have equal variances. WLS is a special case of Generalized Least Squares; it restores the efficiency of the estimator while retaining its unbiasedness and consistency [12]. GCV is widely used to choose the bandwidth in nonparametric regression because it has optimal asymptotic properties, is invariant to transformation, and does not require the population variance to be known. The optimum bandwidth is the one that minimizes GCV [13,14].
The Biresponse Nonparametric Regression (BNR) model for longitudinal data has been widely studied, and estimators of its parameters have been derived. The SR approach has also been widely used for one-response regression models with longitudinal, time series, and cross-sectional data. Longitudinal data have advantages over other types of data: they capture changes and patterns over time in the observed subjects, and they can control for subject heterogeneity because each subject is observed repeatedly. This makes it possible to investigate changes in each subject over time, which cross-sectional data cannot do. Previous studies of longitudinal data were limited to one-response SR models, whereas some real cases involve biresponse or multiresponse variables. This research therefore examines the estimation of SR models with biresponse variables for longitudinal data.
The biresponse regression model differs from the one-response regression model in that it consists of several equations with correlated responses. The novelty of this research is therefore the estimation of the BSR model using the Local Polynomial Kernel estimator with an estimated variance–covariance matrix. In the estimation process, we use Weighted Least Squares (WLS) optimization, in which a weight matrix, the inverse of the variance–covariance matrix, is included in the Least Squares objective, and the bandwidth is chosen with the GCV criterion. Estimating the variance–covariance matrix is therefore essential in this research, and it distinguishes the proposed approach from earlier classical estimation methods. The LPK estimator combined with the variance–covariance matrix estimate is validated on simulated data and then applied to data on hospitalized DHF patients. The response variables are the platelet count and hematocrit of DHF patients during hospitalization, and the predictor variables are the examination time and hemoglobin level; examination time enters as the nonparametric component and hemoglobin as the parametric component. The original contribution of this research is the development of a semiparametric model with two responses for longitudinal data using the LPK estimator, applied to DHF data in which platelets and hematocrit are correlated, while time and hemoglobin are the nonparametric and parametric predictor variables, respectively.

2. Literature Review

In this section, we briefly discuss the research concepts based on studies from various references. The discussion summarizes previous research on the problems examined in this study and establishes the originality of the present work.

2.1. Nonparametric Regression (NR) Model

NR models, both one-response and biresponse, have been widely used by previous researchers with longitudinal, cross-sectional, and time series data. Researchers who have studied one-response or biresponse NR modeling include Mardianto et al. [13], Ketrin et al. [14], and Iriany and Fernandes [15], who used the Fourier series estimator; Rahmawati et al. [16], who used the spline smoothing estimator; Geller and Neumann [17] and Fan and Gijbels [18], who discussed local polynomials in NR models; Araveeporn [19], who used truncated spline estimators for time series data; and Mariati et al. [20], who used a mixed estimator of smoothing spline and Fourier series to estimate the NR model. Studies of longitudinal data using the NR model have also been widely conducted, including spline regression estimation with Least Squares [21], truncated spline regression estimation [22,23], smoothing spline estimation [24], kernel NR estimation [25,26], and mixed NR estimators [27]. These studies are limited to one response, even though some cases involve several correlated response variables. In longitudinal research, biresponse or multiresponse variables are often encountered in NR modeling. Previous studies on Biresponse Nonparametric Regression (BNR) for longitudinal data include those by Xiang et al. [28], who used the Local Polynomial Kernel approach, and Fernandes et al. [29], who used spline smoothing. The researchers mentioned above discussed nonparametric regression models for longitudinal data only; they have not yet addressed semiparametric regression models for longitudinal data.

2.2. Semiparametric Regression (SR) Model

Several researchers have estimated SR models using different estimators. For example, Asrini [5] used the Fourier series estimator; Diana et al. [6] estimated an SR model using the Bayesian approach; Nisa and Budiantara [30] used a mixed estimator of truncated spline and Fourier series; Mohaisen and Abdulhussein [7] used the spline estimator; and Ampa et al. [31] used the kernel estimator. Various studies have also estimated SR models for longitudinal data, for example Sun and You [32], who used the Least Squares spline approach; Jazi and Pullenayegum [33], who performed variable selection for a longitudinal SR model; and Utami et al. [34], who used the local polynomial estimator. These studies are limited to one-response SR models. In real cases, however, we frequently encounter several correlated response variables in SR models for longitudinal data, which calls for a biresponse or multiresponse SR model for longitudinal data. Previous researchers who studied biresponse or multiresponse SR modeling for longitudinal data include Li [35] and Jia et al. [36], who estimated the covariance matrix of errors to improve the estimation efficiency of the coefficients of the longitudinal SR model; Jiao et al. [37], who analyzed longitudinal data with a Bayesian method; and Chamidah et al. [38], who estimated the multiresponse SR model using the smoothing spline estimator.
Researchers generally accommodate the correlation between responses through a symmetric variance–covariance matrix when estimating model parameters. In theory, this matrix is assumed to be known, whereas in real cases its value is unknown and must be estimated from the data. Several studies have addressed the estimation of the variance–covariance matrix in nonparametric and semiparametric regression models, including those by Lestari et al. [8,9], Li [35], Jia et al. [36], and Ampulembang et al. [39]. In previous research, however, the estimation results were applied only to simulated data. Those simulations showed that estimates using the variance–covariance matrix were better than estimates without it. It is therefore worthwhile to estimate the BSR model for longitudinal data using local polynomials, evaluate the estimator by simulation with an estimated variance–covariance matrix, and then apply it to real data on DHF patients. Compared with previous methods, the BSR model for longitudinal data with the LPK estimator, using the inverse variance–covariance matrix as weights, can model two correlated responses in longitudinal data with a semiparametric pattern, so that the estimated model fits the data well.

3. Materials and Methods

To accomplish the research goals, this section briefly explains the materials and research steps used: the Weighted Least Squares (WLS) estimator, the SR model for longitudinal data, the Local Polynomial Kernel estimator in the BNR model, the goodness of fit of the estimated model, Dengue Hemorrhagic Fever (DHF), and the research steps.

3.1. Weighted Least Squares (WLS) Estimator

According to Montgomery et al. [40], the error assumptions of a regression model are often violated in practice. One such violation is indicated by $\mathrm{Var}(\boldsymbol\varepsilon) \neq \mathbf{I}\sigma^2$, and the WLS method is used to handle it. WLS addresses violations of the homoscedasticity assumption; under heteroscedasticity, the OLS (Ordinary Least Squares) estimator remains unbiased and consistent but is no longer efficient. The violation can be seen in the following regression model:
$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon$$
where $\mathbf{y}$ is the response vector, $\mathbf{X}$ is the matrix of predictor variables, $\boldsymbol\beta$ is the parameter vector, and $\boldsymbol\varepsilon$ is the random error, with $E(\boldsymbol\varepsilon) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol\varepsilon) = \mathbf{V}\sigma^2$, where $\mathbf{V}$ is a known positive-definite matrix, so that $\boldsymbol\varepsilon \sim N_n(\mathbf{0}, \mathbf{V}\sigma^2)$ follows a multivariate normal distribution. The violation in this model is that $\mathbf{V}\sigma^2 \neq \mathbf{I}\sigma^2$. As a result, the Least Squares estimator $\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ is no longer the best linear unbiased estimator, and another method is needed. The Weighted Least Squares (WLS) method, which uses $\mathbf{V}^{-1}$ as the weight matrix, gives
$$\hat{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}.$$
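As a minimal sketch of the two estimators above, the following Python/NumPy code computes the OLS and WLS estimates on a small synthetic dataset. The heteroscedastic variance pattern, sample size, and true coefficients are illustrative assumptions, not values from this study, and V is treated as known, as in the model above.

```python
import numpy as np


def ols(X, y):
    """Least Squares estimate (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def wls(X, y, V):
    """Weighted Least Squares estimate (X^T V^{-1} X)^{-1} X^T V^{-1} y,
    assuming Var(eps) = V * sigma^2 with V known and positive definite."""
    V_inv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)


# Illustrative synthetic data (assumed, not from the paper)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
v = 0.1 + x**2                 # error variance grows with x (sigma = 1 here)
V = np.diag(v)
y = X @ beta_true + rng.normal(0.0, np.sqrt(v))

beta_ols = ols(X, y)
beta_wls = wls(X, y, V)
print("OLS:", beta_ols)
print("WLS:", beta_wls)
```

With V = I the two estimators coincide, and multiplying V by a constant leaves the WLS estimate unchanged, which is why only V, not σ², needs to be known.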

3.2. Semiparametric Regression (SR) Model for Longitudinal Data

Longitudinal data are obtained by observing each subject repeatedly at different times, and their characteristics differ from those of cross-sectional data: cross-sectional observations are independent, whereas longitudinal observations are dependent within the same subject and independent between different subjects [10]. When longitudinal data include multiple responses, both the responses and the repeated observations on the same subject are correlated [38]. The SR model combines a parametric regression model and a nonparametric regression model, so it has at least two predictor variables: the functional relationship between the response and some predictor variables follows a known pattern, while the pattern for the other predictor is unknown. For example, let $y_i$ denote the response variable for the $i$-th subject, and let $x_i$ and $t_i$ be predictor variables. If the form of the function relating $y_i$ and $x_i$ is known (linear) and the form of the curve for $t_i$ is unknown, the data can be modeled with the following SR model:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \eta(t_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where $y_i$ is the response variable at the $i$-th observation; $x_{1i}, x_{2i}, \ldots, x_{pi}$ are the parametric components of the predictor variables; $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters of the parametric components; $\eta(t_i)$ is the nonparametric component; and $\varepsilon_i \sim N(0, \sigma^2)$ is the random error. The function $\eta(t_i)$ has an unknown curve shape and is assumed to be smooth. Model (1) can be written as the following matrix equation:
$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\eta + \boldsymbol\varepsilon$$
where $\mathbf{y}$ is the $(n \times 1)$ response vector, $\mathbf{X}$ is the $(n \times (p+1))$ matrix of the parametric components of the predictor variables, $\boldsymbol\beta$ is the $((p+1) \times 1)$ parameter vector of the parametric component, $\boldsymbol\eta$ is the $(n \times 1)$ vector of the nonparametric component, assumed to be smooth, and $\boldsymbol\varepsilon \sim N_n(\mathbf{0}, \mathbf{I}\sigma^2)$ [38,41].
Next, the SR model for longitudinal data is given as follows [38]:
$$y_{ij} = \mathbf{x}_{ij}^T\boldsymbol\beta + \eta(t_{ij}) + \varepsilon_{ij}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m_i$$
where $y_{ij}$ is the response variable, $\mathbf{x}_{ij}^T\boldsymbol\beta$ is the parametric component of the regression function, $\eta(t_{ij})$ is an unknown function, the nonparametric component of the regression function, which is estimated with the Local Polynomial Kernel (LPK) estimator, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ is the observation error.
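Model (3) can be written as the following matrix equation:
$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\eta + \boldsymbol\varepsilon.$$
Therefore, Equation (4) can be rewritten as
$$\mathbf{y}^{*} = \boldsymbol\eta + \boldsymbol\varepsilon$$
where $\mathbf{y}^{*} = \mathbf{y} - \mathbf{X}\boldsymbol\beta$. Furthermore, if an estimate $\hat{\boldsymbol\beta}$ of $\boldsymbol\beta$ is available, the nonparametric function $\boldsymbol\eta$ can be estimated by WLS optimization.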
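To do so, the nonparametric component is approximated locally around a point $t_0$ by a polynomial of degree $p$:
$$\eta(t_{ij}) \approx \mathbf{z}_{ij}^T\boldsymbol\theta(t_0)$$
where $\mathbf{z}_{ij} = [1, (t_{ij} - t_0), \ldots, (t_{ij} - t_0)^p]^T$ and $\boldsymbol\theta(t_0) = [\theta_0(t_0), \theta_1(t_0), \ldots, \theta_p(t_0)]^T$.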
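Next, let $\hat{\boldsymbol\theta}(t_0)$ be the estimator of $\boldsymbol\theta(t_0)$ obtained from the following WLS optimization, in which each observation is weighted by the kernel function $K_h(t_{ij} - t_0)$:
$$Q = \min_{\boldsymbol\theta(t_0)}\Big\{\sum_{i=1}^{n}\sum_{j=1}^{m_i}\big(y^{*}_{ij} - \mathbf{z}_{ij}^T\boldsymbol\theta(t_0)\big)^2 K_h(t_{ij} - t_0)\Big\}.$$
This WLS optimization can be written in matrix form as
$$Q = \min_{\boldsymbol\theta(t_0)}\Big\{\big(\mathbf{y}^{*} - \mathbf{Z}(t_0)\boldsymbol\theta(t_0)\big)^T\mathbf{K}_h(t_0)\big(\mathbf{y}^{*} - \mathbf{Z}(t_0)\boldsymbol\theta(t_0)\big)\Big\}$$
where $\mathbf{Z}_i = [\mathbf{z}_{i1}, \mathbf{z}_{i2}, \ldots, \mathbf{z}_{im_i}]^T$, $\mathbf{Z}(t_0) = [\mathbf{Z}_1^T, \ldots, \mathbf{Z}_n^T]^T$, and $\mathbf{K}_h(t_0)$ is the diagonal matrix of the kernel weights $K_h(t_{ij} - t_0)$. Differentiating $Q$ with respect to $\boldsymbol\theta(t_0)$ and setting the derivative to zero gives
$$\hat{\boldsymbol\theta}(t_0) = \big(\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\big)^{-1}\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)\mathbf{y}^{*}.$$
The LPK estimator of $\boldsymbol\eta$ is then
$$\hat{\boldsymbol\eta} = \mathbf{Z}(t_0)\big(\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\big)^{-1}\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)\mathbf{y}^{*} = \mathbf{Z}(t_0)\big(\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\big)^{-1}\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}),$$
that is,
$$\hat{\boldsymbol\eta} = \mathbf{A}\mathbf{y}^{*} = \mathbf{A}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$$
where $\mathbf{A} = \mathbf{Z}(t_0)\big(\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\big)^{-1}\mathbf{Z}(t_0)^T\mathbf{K}_h(t_0)$.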
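The local polynomial kernel fit can be sketched in Python/NumPy. This is an illustration under stated assumptions: a Gaussian kernel (the kernel is not specified in this section), and a smoother matrix A assembled row by row by taking t_0 at each observed time and keeping the intercept θ_0(t_0) = η(t_0), which is the usual way local polynomial estimates are collected into a linear smoother. The design of 10 subjects observed on days 1 to 7 is hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Gaussian kernel K(u); the kernel choice is an assumption of this sketch."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def local_design(t, t0, p):
    """Z(t0) with rows z_ij^T = [1, (t_ij - t0), ..., (t_ij - t0)^p]."""
    return np.vander(t - t0, N=p + 1, increasing=True)


def lpk_smoother(t, h, p=1):
    """n x n matrix A whose k-th row is e_1^T (Z^T K Z)^{-1} Z^T K at t0 = t[k],
    so that (A @ y_star)[k] is the local polynomial estimate of eta(t[k])."""
    t = np.asarray(t, dtype=float)
    A = np.empty((t.size, t.size))
    for k, t0 in enumerate(t):
        Z = local_design(t, t0, p)
        K = np.diag(gaussian_kernel((t - t0) / h) / h)   # K_h(t_ij - t0) on the diagonal
        # theta_hat(t0) = (Z^T K Z)^{-1} Z^T K y*; row 0 gives theta_0(t0) = eta_hat(t0)
        A[k] = np.linalg.solve(Z.T @ K @ Z, Z.T @ K)[0]
    return A


# Hypothetical design: 10 subjects, each observed on days 1..7
t = np.tile(np.arange(1.0, 8.0), 10)
A = lpk_smoother(t, h=1.5, p=1)
```

Because the first column of Z(t_0) is a column of ones, each row of A sums to 1, and a local linear fit (p = 1) reproduces any straight line in t exactly.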
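Next, we determine $\hat{\boldsymbol\beta}$ by minimizing the following Least Squares (LS) function:
$$Q(\boldsymbol\beta) = \big((\mathbf{I} - \mathbf{A})\mathbf{y} - (\mathbf{I} - \mathbf{A})\mathbf{X}\boldsymbol\beta\big)^T\big((\mathbf{I} - \mathbf{A})\mathbf{y} - (\mathbf{I} - \mathbf{A})\mathbf{X}\boldsymbol\beta\big).$$
This gives
$$\hat{\boldsymbol\beta} = \big(\mathbf{X}^T(\mathbf{I} - \mathbf{A})^T(\mathbf{I} - \mathbf{A})\mathbf{X}\big)^{-1}\mathbf{X}^T(\mathbf{I} - \mathbf{A})^T(\mathbf{I} - \mathbf{A})\mathbf{y}.$$
Hence the fitted values are $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta} + \hat{\boldsymbol\eta} = \mathbf{A}(h)\mathbf{y}$, where
$$\mathbf{A}_{\mathrm{par}} = \mathbf{X}\big(\mathbf{X}^T(\mathbf{I} - \mathbf{A})^T(\mathbf{I} - \mathbf{A})\mathbf{X}\big)^{-1}\mathbf{X}^T(\mathbf{I} - \mathbf{A})^T(\mathbf{I} - \mathbf{A}), \quad \mathbf{A}_{\mathrm{nonpar}} = \mathbf{A}(\mathbf{I} - \mathbf{A}_{\mathrm{par}}),$$
and
$$\mathbf{A}(h) = \mathbf{A}_{\mathrm{par}} + \mathbf{A}(\mathbf{I} - \mathbf{A}_{\mathrm{par}}).$$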
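A compact sketch of the profile estimator for β and the hat matrix A(h), again in Python/NumPy with an assumed Gaussian kernel and a row-wise local polynomial smoother. One design choice differs from the matrix X above: the intercept column is left out, because a smoother whose rows sum to 1 maps a constant column to zero under (I − A), which would make X^T(I − A)^T(I − A)X singular; in this sketch the intercept is absorbed into η. The simulated data are hypothetical.

```python
import numpy as np


def lpk_smoother(t, h, p=1):
    """Local polynomial smoother matrix (Gaussian kernel assumed), one row per t0 = t[k]."""
    A = np.empty((t.size, t.size))
    for k, t0 in enumerate(t):
        Z = np.vander(t - t0, N=p + 1, increasing=True)
        w = np.exp(-0.5 * ((t - t0) / h) ** 2)   # kernel weights; constants cancel
        ZtK = Z.T * w                             # Z^T K_h without forming diag(w)
        A[k] = np.linalg.solve(ZtK @ Z, ZtK)[0]
    return A


def fit_semiparametric(y, X, t, h, p=1):
    """beta_hat = (X^T (I-A)^T (I-A) X)^{-1} X^T (I-A)^T (I-A) y,
    eta_hat = A (y - X beta_hat), and A(h) = A_par + A (I - A_par)."""
    n = y.size
    A = lpk_smoother(t, h, p)
    R = np.eye(n) - A                             # (I - A)
    RX = R @ X
    G = RX.T @ RX                                 # X^T (I-A)^T (I-A) X
    beta_hat = np.linalg.solve(G, RX.T @ (R @ y))
    eta_hat = A @ (y - X @ beta_hat)
    A_par = X @ np.linalg.solve(G, RX.T @ R)      # X G^{-1} X^T (I-A)^T (I-A)
    A_h = A_par + A @ (np.eye(n) - A_par)
    return beta_hat, eta_hat, A_h


# Hypothetical data: 10 subjects x 7 days, linear in x, smooth in t
rng = np.random.default_rng(1)
t = np.tile(np.arange(1.0, 8.0), 10)
x = rng.normal(12.0, 1.5, t.size)
X = x[:, None]                                    # no intercept column (absorbed into eta)
y = 0.8 * x + np.sin(t) + rng.normal(0.0, 0.2, t.size)

beta_hat, eta_hat, A_h = fit_semiparametric(y, X, t, h=1.0)
print(beta_hat)
```

The identity A(h)y = Xβ̂ + η̂ follows directly from the definitions, so A(h) is the matrix whose trace enters a GCV-type criterion for the semiparametric fit.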

3.3. Local Polynomial Kernel Estimator in Biresponse Nonparametric Regression (BNR) Model

In regression analysis, if there are two correlated response variables, the model is called a biresponse regression (BR) model. The BR model can be written as follows:
$$\mathbf{y}_i = \mathbf{f}(x_i) + \boldsymbol\varepsilon_i, \quad i = 1, 2, \ldots, n$$
where $\mathbf{y}_i = (y_i^{(1)}, y_i^{(2)})^T$ is the response vector, $\mathbf{f}(x_i) = (f^{(1)}(x_i), f^{(2)}(x_i))^T$ is the regression function, and $\boldsymbol\varepsilon_i = (\varepsilon_i^{(1)}, \varepsilon_i^{(2)})^T$ is the observation error, whose variance–covariance matrix for each observation is
$$\boldsymbol\Sigma_i = \mathrm{Cov}(\boldsymbol\varepsilon_i) = \begin{pmatrix}\mathrm{Var}(\varepsilon_i^{(1)}) & \mathrm{Cov}(\varepsilon_i^{(1)}, \varepsilon_i^{(2)})\\ \mathrm{Cov}(\varepsilon_i^{(2)}, \varepsilon_i^{(1)}) & \mathrm{Var}(\varepsilon_i^{(2)})\end{pmatrix} = \begin{pmatrix}\sigma_{1i}^2 & \rho_i\sigma_{1i}\sigma_{2i}\\ \rho_i\sigma_{1i}\sigma_{2i} & \sigma_{2i}^2\end{pmatrix}$$
where $\sigma_{1i}^2$ and $\sigma_{2i}^2$ are the two variance components and $\rho_i$ is the correlation coefficient [41].
The Biresponse Nonparametric Regression (BNR) model is a regression model consisting of two response variables that are correlated with each other. The BNR model can be written as follows:
$$\mathbf{y}_i = \boldsymbol\eta(t_i) + \boldsymbol\varepsilon_i, \quad i = 1, 2, \ldots, n$$
where $\mathbf{y}_i = (y_i^{(1)}, y_i^{(2)})^T$ contains the two correlated response variables, $\boldsymbol\eta(t_i) = (\eta^{(1)}(t_i), \eta^{(2)}(t_i))^T$ is the regression function, and $\boldsymbol\varepsilon_i = (\varepsilon_i^{(1)}, \varepsilon_i^{(2)})^T$ is the observation error with variance–covariance matrix $\boldsymbol\Sigma_i$, where the index $i$ denotes the observation. The function $\boldsymbol\eta(t_i)$ is a nonparametric biresponse curve estimated with the LPK estimator using the kernel function $K_h$ and a weighting matrix $\mathbf{W}$. The matrix $\mathbf{W}$ is the $(2n \times 2n)$ inverse of the variance–covariance matrix:
$$\mathbf{W} = \begin{pmatrix}\boldsymbol\Sigma_{11} & \boldsymbol\Sigma_{12}\\ \boldsymbol\Sigma_{21} & \boldsymbol\Sigma_{22}\end{pmatrix}^{-1}$$
where $\boldsymbol\Sigma_{rr} = \mathrm{diag}(\sigma_{1(r)}^2, \sigma_{2(r)}^2, \ldots, \sigma_{n(r)}^2)$ and $\boldsymbol\Sigma_{sr} = \mathrm{diag}(\sigma_{1(s)(r)}, \sigma_{2(s)(r)}, \ldots, \sigma_{n(s)(r)})$ for $s \neq r$.
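The weight matrix W can be sketched in Python/NumPy from per-observation standard deviations and correlations; the numbers below are hypothetical. Because every block Σ_rs is diagonal, W is simply the collection of the n inverses Σ_i^{-1} placed at rows and columns (i, n + i). The sketch inverts the full 2n × 2n matrix for clarity and checks this property.

```python
import numpy as np


def weight_matrix(s1, s2, rho):
    """Return Sigma = [[S11, S12], [S21, S22]] and W = Sigma^{-1} for the stacked
    error vector (eps^(1); eps^(2)). s1, s2: per-observation standard deviations;
    rho: per-observation correlations between the two responses."""
    S11 = np.diag(s1**2)                 # Var(eps_i^(1))
    S22 = np.diag(s2**2)                 # Var(eps_i^(2))
    S12 = np.diag(rho * s1 * s2)         # Cov(eps_i^(1), eps_i^(2)); S21 = S12
    Sigma = np.block([[S11, S12], [S12, S22]])
    return Sigma, np.linalg.inv(Sigma)


# Hypothetical values for n = 3 observations
s1 = np.array([1.0, 2.0, 0.5])
s2 = np.array([0.5, 1.0, 1.5])
rho = np.array([0.3, -0.2, 0.6])
Sigma, W = weight_matrix(s1, s2, rho)

# The 2 x 2 sub-block of W for observation i equals Sigma_i^{-1}
n, i = s1.size, 1
Sigma_i = np.array([[s1[i]**2, rho[i] * s1[i] * s2[i]],
                    [rho[i] * s1[i] * s2[i], s2[i]**2]])
print(np.allclose(W[np.ix_([i, n + i], [i, n + i])], np.linalg.inv(Sigma_i)))  # prints True
```

For large n, inverting each 2 × 2 matrix Σ_i separately gives the same W at much lower cost than inverting the full matrix.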
The function $\boldsymbol\eta(t_i) = (\eta^{(1)}(t_i), \eta^{(2)}(t_i))^T$ is approximated locally around a point $t_0$ by a Taylor polynomial of degree $p$ [41]:
$$\eta^{(r)}(t_i) \approx \theta_0^{(r)}(t_0) + (t_i - t_0)\theta_1^{(r)}(t_0) + \cdots + (t_i - t_0)^p\theta_p^{(r)}(t_0), \quad r = 1, 2$$
where $\theta_k^{(r)}(t_0) = \eta^{(r)(k)}(t_0)/k!$, $k = 0, 1, 2, \ldots, p$, and $\eta^{(r)(k)}$ denotes the $k$-th derivative of $\eta^{(r)}$.
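To illustrate that the local coefficients estimate scaled derivatives, θ_k(t_0) ≈ η^(k)(t_0)/k!, the following Python/NumPy sketch fits a local cubic polynomial (p = 3) with an assumed Gaussian kernel to the noise-free test function η(t) = exp(t) at t_0 = 0. Every derivative of exp equals 1 at 0, so the fitted θ_k(0) should be close to 1/k!. The test function, grid, and bandwidth are hypothetical choices.

```python
import numpy as np
from math import factorial


def local_poly_coefficients(t, y, t0, h, p):
    """theta_hat(t0) = (Z^T K Z)^{-1} Z^T K y with rows [1, (t - t0), ..., (t - t0)^p]."""
    Z = np.vander(t - t0, N=p + 1, increasing=True)
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)   # Gaussian kernel weights (assumed kernel)
    ZtK = Z.T * w
    return np.linalg.solve(ZtK @ Z, ZtK @ y)


t = np.linspace(-1.0, 1.0, 201)
theta = local_poly_coefficients(t, np.exp(t), t0=0.0, h=0.2, p=3)
target = np.array([1.0 / factorial(k) for k in range(4)])   # eta^(k)(0)/k! for exp
print(np.round(theta, 3), np.round(target, 3))
```

The small remaining differences come from the Taylor terms of order higher than p, and they shrink as the bandwidth h decreases.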
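Let $\mathbf{y} = \begin{pmatrix}\mathbf{y}^{(1)}\\ \mathbf{y}^{(2)}\end{pmatrix}$; $\mathbf{y}^{(r)} = (y_1^{(r)}, y_2^{(r)}, \ldots, y_n^{(r)})^T$; $\boldsymbol\theta(t_0) = \begin{pmatrix}\boldsymbol\theta^{(1)}\\ \boldsymbol\theta^{(2)}\end{pmatrix}$; $\boldsymbol\theta^{(r)} = (\theta_0^{(r)}(t_0), \theta_1^{(r)}(t_0), \ldots, \theta_p^{(r)}(t_0))^T$; $\mathbf{Z}(t_0) = \begin{pmatrix}\mathbf{Z}^{(1)}(t_0) & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}^{(2)}(t_0)\end{pmatrix}$; $\mathbf{Z}^{(r)}(t_0) = [\mathbf{Z}_1^T, \ldots, \mathbf{Z}_n^T]$; $\mathbf{Z}_i = [\mathbf{z}_{i1}, \mathbf{z}_{i2}, \ldots, \mathbf{z}_{ip}]^T$; $\mathbf{z}_{ip} = [1, (t_i - t_0), (t_i - t_0)^2, \ldots, (t_i - t_0)^p]^T$; $\boldsymbol\varepsilon = \begin{pmatrix}\boldsymbol\varepsilon^{(1)}\\ \boldsymbol\varepsilon^{(2)}\end{pmatrix}$; and $\boldsymbol\varepsilon^{(r)} = (\varepsilon_1^{(r)}, \varepsilon_2^{(r)}, \ldots, \varepsilon_n^{(r)})^T$.
If $\eta^{(r)}(t_i) \approx \mathbf{z}_{ip}^T\boldsymbol\theta^{(r)}$ for $i = 1, 2, \ldots, n$, then Equation (13) can be transformed into the following equation:
$$\mathbf{y} = \mathbf{Z}(t_0)\boldsymbol\theta(t_0) + \boldsymbol\varepsilon.$$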
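The estimator of $\boldsymbol\theta(t_0)$, namely $\hat{\boldsymbol\theta}(t_0)$, is obtained with the Weighted Least Squares (WLS) method by minimizing
$$Q = \big(\mathbf{y} - \mathbf{Z}(t_0)\boldsymbol\theta(t_0)\big)^T\mathbf{W}\mathbf{K}_h(t_0)\big(\mathbf{y} - \mathbf{Z}(t_0)\boldsymbol\theta(t_0)\big)$$
where $\mathbf{K}_h(t_0) = \begin{pmatrix}\mathbf{K}_h^{(1)} & \mathbf{0}\\ \mathbf{0} & \mathbf{K}_h^{(2)}\end{pmatrix}$ and $\mathbf{K}_h^{(r)} = \mathrm{diag}(K_{1h}^{(r)}, K_{2h}^{(r)}, \ldots, K_{nh}^{(r)})$ contains the kernel weights of the observations.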
Hence, the estimator of $\boldsymbol\theta(t_0)$ is
$$\hat{\boldsymbol\theta}(t_0) = \big(\mathbf{Z}(t_0)^T\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\big)^{-1}\mathbf{Z}(t_0)^T\mathbf{W}\mathbf{K}_h(t_0)\mathbf{y}.$$
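The biresponse WLS estimator can be sketched in Python/NumPy. Assumptions of this sketch: the same Gaussian kernel and bandwidth for both responses, and W built from hypothetical constant variances and a constant correlation. With equal kernels, W K_h(t_0) is symmetric, because all blocks of W and of K_h(t_0) are diagonal and therefore commute. As a check, responses that are exactly linear in t are reproduced by a local linear fit (p = 1).

```python
import numpy as np


def theta_hat_biresponse(t, y1, y2, W, t0, h, p=1):
    """theta_hat(t0) = (Z^T W K Z)^{-1} Z^T W K y for the stacked y = (y^(1); y^(2)),
    with Z(t0) = blockdiag(Z^(1), Z^(2)) and K_h(t0) = blockdiag(K^(1), K^(2))."""
    Zr = np.vander(t - t0, N=p + 1, increasing=True)     # Z^(r)(t0), same design for r = 1, 2
    O = np.zeros_like(Zr)
    Z = np.block([[Zr, O], [O, Zr]])
    k = np.exp(-0.5 * ((t - t0) / h) ** 2)               # Gaussian kernel weights (assumed)
    K = np.diag(np.concatenate([k, k]))                  # same bandwidth for both responses
    y = np.concatenate([y1, y2])
    M = Z.T @ W @ K                                      # Z^T W K_h(t0)
    theta = np.linalg.solve(M @ Z, M @ y)
    return theta[: p + 1], theta[p + 1:]                 # theta^(1)(t0), theta^(2)(t0)


# Hypothetical setting: 10 subjects x 7 days, constant variances and correlation
t = np.tile(np.arange(1.0, 8.0), 10)
n = t.size
s1, s2, rho = np.ones(n), np.full(n, 2.0), np.full(n, 0.5)
Sigma = np.block([[np.diag(s1**2), np.diag(rho * s1 * s2)],
                  [np.diag(rho * s1 * s2), np.diag(s2**2)]])
W = np.linalg.inv(Sigma)

th1, th2 = theta_hat_biresponse(t, 1.0 + 2.0 * t, 3.0 - t, W, t0=4.0, h=1.5)
print(th1, th2)   # close to [9, 2] and [-1, -1]
```

At t_0 = 4, the line 1 + 2t has local intercept 9 and slope 2, and 3 − t has local intercept −1 and slope −1, which is what the estimator returns.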
The bandwidth, or smoothing parameter, controls the balance between the smoothness of the estimated function and its fit to the data. If the bandwidth is too small, the estimated curve is rough because it follows the data too closely. If the bandwidth is too large, the estimated curve is oversmoothed and tends toward the average of the response variable. Selecting an appropriate bandwidth is therefore crucial in nonparametric regression. One technique for determining the optimum bandwidth $h$ is Generalized Cross-Validation (GCV) [9,23]:
$$\mathrm{GCV}(h) = \frac{\mathrm{MSE}(h)}{\big(n^{-1}\,\mathrm{tr}(\mathbf{I} - \mathbf{A}(h))\big)^2}$$
where $\mathrm{MSE}(h) = n^{-1}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.
The optimum bandwidth is the value of $h$ that gives the smallest GCV value.
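A minimal Python/NumPy sketch of GCV bandwidth selection over a grid, for a single response with an assumed Gaussian-kernel local linear smoother; the data and the bandwidth grid are hypothetical. For the semiparametric fit, the same criterion would be evaluated with the hat matrix A(h) in place of the smoother.

```python
import numpy as np


def lpk_smoother(t, h, p=1):
    """Local polynomial smoother matrix with Gaussian kernel weights (assumed)."""
    A = np.empty((t.size, t.size))
    for k, t0 in enumerate(t):
        Z = np.vander(t - t0, N=p + 1, increasing=True)
        ZtK = Z.T * np.exp(-0.5 * ((t - t0) / h) ** 2)
        A[k] = np.linalg.solve(ZtK @ Z, ZtK)[0]
    return A


def gcv(y, A):
    """GCV(h) = MSE(h) / (n^{-1} tr(I - A(h)))^2."""
    n = y.size
    mse = np.mean((y - A @ y) ** 2)               # MSE(h) = n^{-1} sum (y_i - yhat_i)^2
    return mse / (np.trace(np.eye(n) - A) / n) ** 2


def select_bandwidth(y, t, grid, p=1):
    """Return the grid value with the smallest GCV, together with all scores."""
    scores = np.array([gcv(y, lpk_smoother(t, h, p)) for h in grid])
    return grid[int(np.argmin(scores))], scores


rng = np.random.default_rng(2)
t = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(t) + rng.normal(0.0, 0.3, t.size)
grid = [0.1, 0.2, 0.4, 0.8, 1.6]
h_opt, scores = select_bandwidth(y, t, grid)
print(h_opt, np.round(scores, 4))
```

A grid search is the simplest choice here; a finer grid or a one-dimensional optimizer could be used instead.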

3.4. Goodness of Fit of Estimated Model

The coefficient of determination ($R^2$) measures the extent to which the independent variables jointly explain the dependent variable. The larger the coefficient of determination, the larger the proportion of the variation in the dependent variable explained by the independent variables. It is computed as follows [42]:
$$R^2 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i}(\hat{y}_{ij} - \bar{y})^2}{\sum_{i=1}^{n}\sum_{j=1}^{m_i}(y_{ij} - \bar{y})^2}, \quad 0 \le R^2 \le 1$$
where $y_{ij}$ is the response of the $i$-th subject at the $j$-th observation, $\hat{y}_{ij}$ is its estimated value, and $\bar{y}$ is the mean response.
The MSE is the average squared difference between the estimated and actual values. The best model is the one with the smallest MSE, ideally close to zero, which means that the estimates are close to the actual data and can be used to predict the next period. The formula of the MSE is given by Equation (17).
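The two goodness-of-fit measures can be computed as in this short Python/NumPy sketch, which accepts one array per subject so that the numbers of observations m_i may differ; the toy numbers are hypothetical. One caution when applying it: the bound 0 ≤ R² ≤ 1 for this ratio rests on the decomposition of the total sum of squares, which holds exactly for least-squares fits with an intercept; for fits produced by kernel smoothers the ratio is not guaranteed to stay below 1.

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 = sum_ij (yhat_ij - ybar)^2 / sum_ij (y_ij - ybar)^2 over all subjects i and times j."""
    y, y_hat = np.concatenate(y), np.concatenate(y_hat)
    y_bar = y.mean()
    return np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)


def mse(y, y_hat):
    """Mean of the squared differences between actual and estimated values."""
    y, y_hat = np.concatenate(y), np.concatenate(y_hat)
    return np.mean((y - y_hat) ** 2)


# Two hypothetical subjects with two observations each
y = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
y_hat = [np.array([1.5, 2.0]), np.array([3.0, 3.5])]
print(r_squared(y, y_hat), mse(y, y_hat))   # 0.5 0.125
```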

3.5. Dengue Hemorrhagic Fever (DHF)

Dengue Hemorrhagic Fever (DHF) is a disease that often occurs in tropical areas and has a geographic distribution similar to that of malaria. DHF is transmitted to humans by the Aedes aegypti mosquito. The course of DHF lasts about 6 to 7 days, with a peak of mild fever at the end of the fever period. The diagnosis of dengue fever is generally based on laboratory tests of patient blood samples [43]. The normal platelet count ranges from 150,000 to 440,000 per microliter. Laboratory criteria for DHF include thrombocytopenia, a decrease in the platelet count to below 100,000/mm³, and hemoconcentration, an increase in hematocrit of 20% or more [44].
The data used in this study are secondary data on the platelet, hematocrit, and hemoglobin levels of patients hospitalized with DHF in 2022 at Roemani Hospital Semarang. Ten patients were included, each observed for seven days. The patients were Grade I patients, characterized by fever, a positive tourniquet test, and indications of viral infection with signs of bleeding [45]. This research uses two response variables, the platelet count and the hematocrit of hospitalized DHF patients. The parametric predictor variable is the hemoglobin level, and the nonparametric predictor variable is the examination time (in days).

3.6. Research Steps

The research consists of estimating a biresponse semiparametric regression (BSR) model for longitudinal data with a local polynomial kernel estimator and applying it to platelet and hematocrit data from Grade I DHF patients, using two response variables and two predictor variables. The steps are as follows:
  • Determination of the components of the BSR model for longitudinal data, namely the parametric component regression function and the nonparametric component regression function.
  • Estimation of the BSR model for longitudinal data with the LPK estimator, obtained by optimizing the WLS function.
  • Simulation of the BSR model with the LPK estimator for longitudinal data, taking into account the estimated variance–covariance matrix, in both homoscedastic and heteroscedastic cases and for three types of functions: trigonometric, exponential, and polynomial. The simulation was carried out with the open-source software R. Prediction accuracy is compared with the MSE ratio, the MSE of the first method divided by the MSE of the second method; a ratio greater than 1 means that the second method has the smaller MSE. This comparison is used to show that estimation in the BSR model is better with the matrix W and kernel weighting.
  • Implementation of the BSR model with the LPK estimator for longitudinal data on DHF patients. The following section covers the application of the BSR model to the heteroscedastic case using two indicators for determining DHF, namely platelet and hematocrit levels, which are influenced by time and hemoglobin.
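The MSE-ratio comparison in the simulation step can be illustrated with a deliberately simplified Python/NumPy sketch. It is not the BSR simulation of this study: it uses a single response, a linear mean, and heteroscedastic errors with known variances (all hypothetical), and compares an unweighted fit (first method) with an inverse-variance weighted fit (second method). A ratio greater than 1 favors the weighted fit.

```python
import numpy as np


def mse_ratio(n_rep=300, n=100, seed=42):
    """MSE(first method) / MSE(second method) of the fitted means over n_rep replications.
    First method: unweighted LS. Second method: WLS with weights 1 / Var(eps_i)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])
    mu = X @ np.array([1.0, 2.0])          # true mean (hypothetical)
    v = 0.05 + 4.0 * x**2                  # heteroscedastic error variances (hypothetical)
    V_inv = np.diag(1.0 / v)
    mse_first = mse_second = 0.0
    for _ in range(n_rep):
        y = mu + rng.normal(0.0, np.sqrt(v))
        b_ls = np.linalg.solve(X.T @ X, X.T @ y)
        b_wls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
        mse_first += np.mean((X @ b_ls - mu) ** 2) / n_rep
        mse_second += np.mean((X @ b_wls - mu) ** 2) / n_rep
    return mse_first / mse_second


ratio = mse_ratio()
print(round(ratio, 2))
```

Averaging the MSE over many replications before taking the ratio keeps the comparison from being driven by a single simulated sample.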

4. Results and Discussion

This section presents the analysis carried out according to the research objectives. We discuss the estimation of the Biresponse Semiparametric Regression (BSR) model for longitudinal data using the LPK estimator, obtained by optimizing the Weighted Least Squares (WLS) function. The resulting estimator is then applied to simulated data, taking into account the estimated variance–covariance matrix, and to data on hospitalized DHF patients.

4.1. Determining Components of BSR Model for Longitudinal Data

In the paired longitudinal data x i j , t i j , y i j r ,   j = 1 , 2 , , m ; i = 1 , 2 , , n ; r = 1 , 2 , with m indicating the number of subjects, while n indicates the number of observations and r indicates the number of responses. The predictor variables consist of x i j and t i j . y i j ( 1 ) is the first response, and y i j ( 2 ) is the second response. The relationship curve form between y i j ( r ) and x i j is known, while the relationship curve form between y i j ( r ) and t i j is unknown. Hence, the relationship between ( x i j , t i j ) and y i j r follows a semiparametric regression (SR) model as follows:
y i j ( r ) = β 0 r + β 1 r x 1 i + η r t i j + ε i j ( r )
If f r x i j = β 0 r + β 1 r x 1 i , then Equation (19) can be transformed into the following matrix equation:
y i j = f x i j + η t i j + ε i j
where y i j = ( y i j ( 1 ) y i j ( 2 ) ) T consists of two correlated response variables, f ( x i j ) = ( f ( 1 ) ( x i j ) f ( 2 ) ( x i j ) ) T is the parametric component regression function, η ( t i j ) = ( η ( 1 ) ( t i j ) η ( 2 ) ( t i j ) ) T is the nonparametric component regression function and ε i = ( ε i ( 1 ) ε i ( 2 ) ) T is the measurement error with a mean of 0 and variance of Σ i . The equation for each response variable can be expressed as follows:
$$y_{ij}^{(1)} = \beta_0^{(1)} + \beta_1^{(1)} x_{ij} + \eta_1(t_{ij}) + \varepsilon_{ij}^{(1)}$$
$$y_{ij}^{(2)} = \beta_0^{(2)} + \beta_1^{(2)} x_{ij} + \eta_2(t_{ij}) + \varepsilon_{ij}^{(2)}$$
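In matrix notation, the two equations can be stacked as follows:
$$\begin{pmatrix} y_{ij}^{(1)} \\ y_{ij}^{(2)} \end{pmatrix} = \begin{pmatrix} f^{(1)}(x_{ij}) + \eta_1(t_{ij}) \\ f^{(2)}(x_{ij}) + \eta_2(t_{ij}) \end{pmatrix} + \begin{pmatrix} \varepsilon_{ij}^{(1)} \\ \varepsilon_{ij}^{(2)} \end{pmatrix}$$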
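where
$$\mathbf{y}^{(1)} = \begin{pmatrix} y_{11}^{(1)} \\ y_{12}^{(1)} \\ \vdots \\ y_{1m}^{(1)} \\ y_{21}^{(1)} \\ \vdots \\ y_{2m}^{(1)} \\ \vdots \\ y_{n1}^{(1)} \\ \vdots \\ y_{nm}^{(1)} \end{pmatrix} = \begin{pmatrix} f^{(1)}(x_{11}) + \eta_1(t_{11}) \\ f^{(1)}(x_{12}) + \eta_1(t_{12}) \\ \vdots \\ f^{(1)}(x_{1m}) + \eta_1(t_{1m}) \\ f^{(1)}(x_{21}) + \eta_1(t_{21}) \\ \vdots \\ f^{(1)}(x_{2m}) + \eta_1(t_{2m}) \\ \vdots \\ f^{(1)}(x_{n1}) + \eta_1(t_{n1}) \\ \vdots \\ f^{(1)}(x_{nm}) + \eta_1(t_{nm}) \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}^{(1)} \\ \varepsilon_{12}^{(1)} \\ \vdots \\ \varepsilon_{1m}^{(1)} \\ \varepsilon_{21}^{(1)} \\ \vdots \\ \varepsilon_{2m}^{(1)} \\ \vdots \\ \varepsilon_{n1}^{(1)} \\ \vdots \\ \varepsilon_{nm}^{(1)} \end{pmatrix};$$
$$\mathbf{y}^{(2)} = \begin{pmatrix} y_{11}^{(2)} \\ y_{12}^{(2)} \\ \vdots \\ y_{1m}^{(2)} \\ y_{21}^{(2)} \\ \vdots \\ y_{2m}^{(2)} \\ \vdots \\ y_{n1}^{(2)} \\ \vdots \\ y_{nm}^{(2)} \end{pmatrix} = \begin{pmatrix} f^{(2)}(x_{11}) + \eta_2(t_{11}) \\ f^{(2)}(x_{12}) + \eta_2(t_{12}) \\ \vdots \\ f^{(2)}(x_{1m}) + \eta_2(t_{1m}) \\ f^{(2)}(x_{21}) + \eta_2(t_{21}) \\ \vdots \\ f^{(2)}(x_{2m}) + \eta_2(t_{2m}) \\ \vdots \\ f^{(2)}(x_{n1}) + \eta_2(t_{n1}) \\ \vdots \\ f^{(2)}(x_{nm}) + \eta_2(t_{nm}) \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}^{(2)} \\ \varepsilon_{12}^{(2)} \\ \vdots \\ \varepsilon_{1m}^{(2)} \\ \varepsilon_{21}^{(2)} \\ \vdots \\ \varepsilon_{2m}^{(2)} \\ \vdots \\ \varepsilon_{n1}^{(2)} \\ \vdots \\ \varepsilon_{nm}^{(2)} \end{pmatrix}.$$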
Because the data are longitudinal, the random errors within each subject, $\varepsilon_{i1}^{(r)}, \varepsilon_{i2}^{(r)}, \ldots, \varepsilon_{im}^{(r)}$, are correlated, whereas the errors of different subjects, $\varepsilon_{1j}^{(r)}, \varepsilon_{2j}^{(r)}, \ldots, \varepsilon_{nj}^{(r)}$, are independent, for each response $r = 1, 2$. In addition, the errors of the two responses, $\varepsilon_{ij}^{(1)}$ and $\varepsilon_{ij}^{(2)}$, are correlated. Since $\eta_r(t_{ij})$ is approximated, as in Equation (13), by a local polynomial of degree $p$ around a point $t_0$, we may write:
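$$\eta_r(t_{ij}) \approx \eta_r(t_0) + (t_{ij}-t_0)\,\eta_r^{(1)}(t_0) + \frac{(t_{ij}-t_0)^2}{2!}\,\eta_r^{(2)}(t_0) + \cdots + \frac{(t_{ij}-t_0)^p}{p!}\,\eta_r^{(p)}(t_0)$$
for $r = 1, 2$, where $\eta_r^{(k)}$ denotes the $k$th derivative of $\eta_r$.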
Next, setting $\theta_k^{(r)}(t_0) = \eta_r^{(k)}(t_0)/k!$ for $k = 0, 1, 2, \ldots, p$, we obtain:
$$\eta_r(t_{ij}) \approx \theta_0^{(r)}(t_0) + (t_{ij}-t_0)\,\theta_1^{(r)}(t_0) + \cdots + (t_{ij}-t_0)^p\,\theta_p^{(r)}(t_0).$$
Hence, the Taylor series for $\eta_1(t_{ij})$ is given by the following equation:
$$\eta_1(t_{ij}) = \theta_0^{(1)}(t_0) + (t_{ij}-t_0)\,\theta_1^{(1)}(t_0) + \cdots + (t_{ij}-t_0)^p\,\theta_p^{(1)}(t_0). \tag{23}$$
Equation (23) can be expressed as follows:
$$\eta_1(t_{ij}) = \mathbf{z}_1(t_0)\,\boldsymbol{\theta}^{(1)}(t_0)$$
where $\mathbf{z}_1(t_0) = \left(1, (t_{ij}-t_0), (t_{ij}-t_0)^2, \ldots, (t_{ij}-t_0)^p\right)$ and
$$\boldsymbol{\theta}^{(1)}(t_0) = \left(\theta_0^{(1)}(t_0), \theta_1^{(1)}(t_0), \theta_2^{(1)}(t_0), \ldots, \theta_p^{(1)}(t_0)\right)^T.$$
Similarly, the Taylor series for $\eta_2(t_{ij})$ is given by the following equation:
$$\eta_2(t_{ij}) = \theta_0^{(2)}(t_0) + (t_{ij}-t_0)\,\theta_1^{(2)}(t_0) + \cdots + (t_{ij}-t_0)^p\,\theta_p^{(2)}(t_0). \tag{24}$$
Equation (24) can be expressed as follows:
$$\eta_2(t_{ij}) = \mathbf{z}_2(t_0)\,\boldsymbol{\theta}^{(2)}(t_0)$$
where $\mathbf{z}_2(t_0) = \left(1, (t_{ij}-t_0), (t_{ij}-t_0)^2, \ldots, (t_{ij}-t_0)^p\right)$ and
$$\boldsymbol{\theta}^{(2)}(t_0) = \left(\theta_0^{(2)}(t_0), \theta_1^{(2)}(t_0), \theta_2^{(2)}(t_0), \ldots, \theta_p^{(2)}(t_0)\right)^T.$$
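Equations (23) and (24) can be expressed in matrix notation as follows:
$$\boldsymbol{\eta}(t_{ij}) = \begin{pmatrix} \eta_1(t_{ij}) \\ \eta_2(t_{ij}) \end{pmatrix} = \mathbf{Z}(t_0)\,\boldsymbol{\theta}(t_0), \tag{25}$$
where $\mathbf{Z}(t_0) = \operatorname{diag}\left(\mathbf{z}_1(t_0), \mathbf{z}_2(t_0)\right)$ and $\boldsymbol{\theta}(t_0) = \left(\boldsymbol{\theta}^{(1)}(t_0)^T, \boldsymbol{\theta}^{(2)}(t_0)^T\right)^T$.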
Hence, the model in Equation (20) can be written as follows:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}(t_0)\,\boldsymbol{\theta}(t_0) + \boldsymbol{\varepsilon}. \tag{26}$$
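The block structure of Equations (25) and (26) can be made concrete with a short NumPy sketch. It assumes the stacked ordering used above (all observations of the first response, then all observations of the second, each ordered by subject and then time), builds $\mathbf{Z}(t_0)$ from the rows $\mathbf{z}_r(t_0)$, and builds a block-diagonal $\mathbf{X}$ with an intercept and a slope for each response. The function names `local_design`, `biresponse_Z`, and `biresponse_X` are illustrative, not from the source; the separate degrees `p1` and `p2` are allowed because the application later uses a different degree for each response.

```python
import numpy as np

def local_design(t, t0, p):
    """Rows z(t0) = (1, t - t0, (t - t0)^2, ..., (t - t0)^p), one row per observation."""
    d = np.asarray(t, dtype=float) - t0
    return np.vander(d, p + 1, increasing=True)

def biresponse_Z(t, t0, p1, p2):
    """Block-diagonal Z(t0) = diag(Z_1(t0), Z_2(t0)) for the stacked vector (y^(1)', y^(2)')'."""
    Z1 = local_design(t, t0, p1)
    Z2 = local_design(t, t0, p2)
    return np.block([
        [Z1, np.zeros((Z1.shape[0], Z2.shape[1]))],
        [np.zeros((Z2.shape[0], Z1.shape[1])), Z2],
    ])

def biresponse_X(x):
    """Block-diagonal X with columns (1, x) per response; beta = (b0^(1), b1^(1), b0^(2), b1^(2))'."""
    x = np.asarray(x, dtype=float)
    return np.kron(np.eye(2), np.column_stack([np.ones_like(x), x]))

# Example: three observations at t = 1, 2, 3, local fits centred at t0 = 2
print(biresponse_Z([1.0, 2.0, 3.0], t0=2.0, p1=1, p2=2))
```

`np.kron(np.eye(2), ...)` is simply a compact way to place the same per-response design on the diagonal of a 2 × 2 block matrix.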
The model in Equation (26) is the BSR model for longitudinal data. It contains two components: the parametric component, $\mathbf{X}\boldsymbol{\beta}$, and the nonparametric component, $\mathbf{Z}(t_0)\,\boldsymbol{\theta}(t_0)$.

4.2. Estimation of BSR Model for Longitudinal Data

In the BSR model (26), the nonparametric component parameter $\boldsymbol{\theta}(t_0)$ is estimated by minimizing a Weighted Least Squares (WLS) criterion that uses the following kernel weighting function:
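$$\mathbf{K}_h(t_0) = \begin{pmatrix} \mathbf{K}_h^{(1)}(t_0) & \mathbf{0} \\ \mathbf{0} & \mathbf{K}_h^{(2)}(t_0) \end{pmatrix}$$
where $\mathbf{K}_h^{(r)}(t_0) = \operatorname{diag}\left(\mathbf{K}_{1h}^{(r)}(t_0), \mathbf{K}_{2h}^{(r)}(t_0), \ldots, \mathbf{K}_{nh}^{(r)}(t_0)\right)$ and
$$\mathbf{K}_{ih}^{(r)}(t_0) = \operatorname{diag}\left(K_{i1}^{(r)}(t_{i1}-t_0), K_{i2}^{(r)}(t_{i2}-t_0), \ldots, K_{im}^{(r)}(t_{im}-t_0)\right).$$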
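The kernel weighting matrix $\mathbf{K}_h(t_0)$ can be sketched as a diagonal matrix over the stacked observations. The kernel itself is not fixed in this part of the text, so a Gaussian kernel with $K_h(u) = K(u/h)/h$ is assumed here, with a separate bandwidth for each response; `gaussian_kernel` and `kernel_matrix` are hypothetical names.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as an assumed kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_matrix(t, t0, h1, h2, kernel=gaussian_kernel):
    """K_h(t0) = diag(K_h^(1)(t0), K_h^(2)(t0)) with K_h(u) = K(u / h) / h.

    `t` lists the observation times in stacking order (subject i, then time j),
    so each response block is diag(K_1h^(r)(t0), ..., K_nh^(r)(t0)).
    """
    d = np.asarray(t, dtype=float) - t0
    w1 = kernel(d / h1) / h1   # first response, bandwidth h1
    w2 = kernel(d / h2) / h2   # second response, bandwidth h2
    return np.diag(np.concatenate([w1, w2]))
```

Observations far from $t_0$ relative to $h$ receive weights close to zero, so a smaller bandwidth makes each local fit more local.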
Next, to estimate the nonparametric component parameter $\boldsymbol{\theta}(t_0)$, we first assume that the parametric component parameter $\boldsymbol{\beta}$ is known. Equation (19) then becomes:
$$\tilde{y}_{ij}^{(r)} = \eta_r(t_{ij}) + \varepsilon_{ij}^{(r)}$$
where $\tilde{y}_{ij}^{(r)} = y_{ij}^{(r)} - \left(\beta_0^{(r)} + \beta_1^{(r)} x_{ij}\right)$. In matrix notation, the model is:
$$\tilde{\mathbf{y}} = \mathbf{Z}(t_0)\,\boldsymbol{\theta}(t_0) + \boldsymbol{\varepsilon}$$
where $\tilde{\mathbf{y}} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$.
The form of the function $\mathbf{Z}(t_0)\,\boldsymbol{\theta}(t_0)$ in model (28) is unknown, so it is estimated with the LPK estimator by minimizing the following criterion, which includes the kernel weighting function:
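$$\begin{aligned} Q &= \left(\tilde{\mathbf{y}} - \mathbf{Z}(t_0)\boldsymbol{\theta}(t_0)\right)^T \mathbf{W}\mathbf{K}_h(t_0) \left(\tilde{\mathbf{y}} - \mathbf{Z}(t_0)\boldsymbol{\theta}(t_0)\right) \\ &= \tilde{\mathbf{y}}^T \mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}} - 2\boldsymbol{\theta}^T(t_0)\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}} + \boldsymbol{\theta}^T(t_0)\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\boldsymbol{\theta}(t_0). \end{aligned} \tag{30}$$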
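Next, taking the partial derivative of Equation (30) with respect to $\boldsymbol{\theta}(t_0)$ and setting it to zero, we obtain:
$$\begin{aligned}
\left.\frac{\partial Q}{\partial \boldsymbol{\theta}(t_0)}\right|_{\boldsymbol{\theta}(t_0) = \hat{\boldsymbol{\theta}}(t_0)} = \mathbf{0}
&\;\Rightarrow\; \mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}} - \mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\hat{\boldsymbol{\theta}}(t_0) = \mathbf{0} \\
&\;\Rightarrow\; \mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}} = \mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\hat{\boldsymbol{\theta}}(t_0) \\
&\;\Rightarrow\; \hat{\boldsymbol{\theta}}(t_0) = \left(\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\right)^{-1}\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}}
\end{aligned} \tag{31}$$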
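Equation (31) is a weighted least-squares solve, which can be sketched in NumPy as below. Here $\mathbf{W}$ is taken, as an assumption of the sketch, to be $\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_N$, i.e., the inverse of the between-response error covariance expanded over the stacked vector; this ignores within-subject correlation. The function names are hypothetical.

```python
import numpy as np

def error_weight_matrix(Sigma, N):
    """Hypothetical W = Sigma^{-1} kron I_N for the stacked 2N-vector (ignores within-subject correlation)."""
    return np.kron(np.linalg.inv(Sigma), np.eye(N))

def local_wls_estimate(Z, W, K, y_tilde):
    """theta_hat(t0) = (Z' W K Z)^{-1} Z' W K y_tilde, following Equation (31)."""
    WK = W @ K
    # Solve the normal equations instead of forming the inverse explicitly
    return np.linalg.solve(Z.T @ WK @ Z, Z.T @ WK @ y_tilde)
```

Using `np.linalg.solve` rather than `np.linalg.inv` is the usual choice for numerical stability; the result is the same $\hat{\boldsymbol{\theta}}(t_0)$ in exact arithmetic.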
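Based on Equations (25) and (31), the LPK estimator for $\boldsymbol{\eta}(t_{ij})$ is as follows:
$$\hat{\boldsymbol{\eta}}(t_{ij}) = \mathbf{Z}(t_0)\hat{\boldsymbol{\theta}}(t_0) = \mathbf{Z}(t_0)\left(\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\right)^{-1}\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}}. \tag{32}$$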
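Since $\tilde{\mathbf{y}} = \mathbf{y} - \mathbf{X}\boldsymbol{\beta}$, Equation (32) gives:
$$\hat{\boldsymbol{\eta}}(t_0) = \mathbf{Z}(t_0)\left(\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\right)^{-1}\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\tilde{\mathbf{y}} = \mathbf{H}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \tag{33}$$
where $\mathbf{H} = \mathbf{Z}(t_0)\left(\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)\mathbf{Z}(t_0)\right)^{-1}\mathbf{Z}^T(t_0)\mathbf{W}\mathbf{K}_h(t_0)$.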
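The expression for $\mathbf{H}$ above is written for a single centre point $t_0$. One common way to turn it into a $2N \times 2N$ smoother, assumed here rather than taken from the source, is to repeat the local fit with $t_0$ set to each observed time $t_k$ and keep only the local intercepts $\hat{\theta}_0^{(1)}(t_k)$ and $\hat{\theta}_0^{(2)}(t_k)$, since $\mathbf{z}_r(t_0) = (1, 0, \ldots, 0)$ at $t = t_0$. The Gaussian kernel and the name `smoother_matrix` are likewise hypothetical.

```python
import numpy as np

def smoother_matrix(t, W, h1, h2, p1, p2):
    """Hypothetical row-wise H for the stacked 2N-vector (y^(1)', y^(2)')'.

    For each observed time t_k the local fit is centred at t0 = t_k, and only the
    rows giving theta_0^(1)(t_k) and theta_0^(2)(t_k) are kept.
    """
    t = np.asarray(t, dtype=float)
    N = t.size
    H = np.zeros((2 * N, 2 * N))
    for k, t0 in enumerate(t):
        d = t - t0
        Z1 = np.vander(d, p1 + 1, increasing=True)
        Z2 = np.vander(d, p2 + 1, increasing=True)
        Z = np.block([[Z1, np.zeros((N, p2 + 1))],
                      [np.zeros((N, p1 + 1)), Z2]])
        # Gaussian K_h(u) = K(u/h)/h without the 1/sqrt(2*pi) factor, which cancels
        w = np.concatenate([np.exp(-0.5 * (d / h1) ** 2) / h1,
                            np.exp(-0.5 * (d / h2) ** 2) / h2])
        WK = W * w[None, :]                      # same as W @ diag(w)
        L = np.linalg.solve(Z.T @ WK @ Z, Z.T @ WK)
        H[k] = L[0]                              # row giving eta_1 at t_k
        H[N + k] = L[p1 + 1]                     # row giving eta_2 at t_k
    return H
```

Because every local fit contains all polynomial terms up to degree $p_r$, this $\mathbf{H}$ reproduces any response that is exactly a polynomial of that degree in $t$, which is a convenient sanity check.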
Next, substituting Equation (33) into Equation (26) gives:
$$(\mathbf{I} - \mathbf{H})\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \tag{34}$$
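Hence, based on Equation (34), we have the following Least Squares (LS) criterion:
$$\begin{aligned} LS(\boldsymbol{\beta}) &= \left[(\mathbf{I}-\mathbf{H})\mathbf{y} - (\mathbf{I}-\mathbf{H})\mathbf{X}\boldsymbol{\beta}\right]^T\left[(\mathbf{I}-\mathbf{H})\mathbf{y} - (\mathbf{I}-\mathbf{H})\mathbf{X}\boldsymbol{\beta}\right] \\ &= \mathbf{y}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}\boldsymbol{\beta}. \end{aligned} \tag{35}$$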
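Next, taking the partial derivative of Equation (35) with respect to $\boldsymbol{\beta}$ and setting it to zero, we obtain:
$$\begin{aligned}
\left.\frac{\partial LS(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\right|_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}} = \mathbf{0}
&\;\Rightarrow\; -2\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{y} + 2\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0} \\
&\;\Rightarrow\; \mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{y} \\
&\;\Rightarrow\; \hat{\boldsymbol{\beta}} = \left(\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}\right)^{-1}\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{y}.
\end{aligned} \tag{36}$$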
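Based on Equations (20) and (36), the estimate of the parametric component regression function, $\hat{\mathbf{f}}$, is as follows:
$$\hat{\mathbf{f}}(x) = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}\left(\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}\right)^{-1}\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{y} = \mathbf{A}_{Par}\,\mathbf{y} \tag{37}$$
where $\mathbf{A}_{Par} = \mathbf{X}\left(\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}\right)^{-1}\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})$.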
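Equations (36) and (37) amount to ordinary least squares on the partial residuals $(\mathbf{I}-\mathbf{H})\mathbf{X}$ and $(\mathbf{I}-\mathbf{H})\mathbf{y}$, which the sketch below computes for any given smoother matrix `H` (the function name is hypothetical). One caution, which is an observation about the sketch rather than a statement from the source: a local polynomial smoother reproduces constants, so $(\mathbf{I}-\mathbf{H})$ maps an intercept column to zero and $\mathbf{X}^T(\mathbf{I}-\mathbf{H})^T(\mathbf{I}-\mathbf{H})\mathbf{X}$ becomes singular; in that case the intercepts have to be absorbed into $\eta_r$ and $\mathbf{X}$ should hold only the slope columns.

```python
import numpy as np

def parametric_estimate(X, H, y):
    """beta_hat (Eq. 36) and A_Par (Eq. 37) from the partial residuals (I - H)X and (I - H)y."""
    M = np.eye(H.shape[0]) - H
    Xt, yt = M @ X, M @ y
    G = Xt.T @ Xt                                  # X'(I - H)'(I - H)X
    beta_hat = np.linalg.solve(G, Xt.T @ yt)       # Eq. (36)
    A_par = X @ np.linalg.solve(G, Xt.T @ M)       # X G^{-1} X'(I - H)'(I - H)
    return beta_hat, A_par
```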
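Furthermore, substituting Equation (37) into Equation (33), we obtain the LPK estimator of the nonparametric component regression function:
$$\hat{\boldsymbol{\eta}}(t) = \mathbf{H}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{H}(\mathbf{y} - \mathbf{A}_{Par}\mathbf{y}) = \mathbf{H}(\mathbf{I} - \mathbf{A}_{Par})\mathbf{y} = \mathbf{A}_{nonpar}\,\mathbf{y} \tag{38}$$
where $\mathbf{A}_{nonpar} = \mathbf{H}(\mathbf{I} - \mathbf{A}_{Par})$.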
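Finally, based on Equations (37) and (38), the estimated Biresponse Semiparametric Regression (BSR) model for longitudinal data using the Local Polynomial Kernel (LPK) estimator is:
$$\begin{aligned} \hat{\mathbf{y}} &= \mathbf{X}\hat{\boldsymbol{\beta}} + \mathbf{Z}(t_0)\hat{\boldsymbol{\theta}}(t_0) = \hat{\mathbf{f}}(x) + \hat{\boldsymbol{\eta}}(t) = \mathbf{A}_{Par}\mathbf{y} + \mathbf{A}_{nonpar}\mathbf{y} \\ &= \mathbf{A}_{Par}\mathbf{y} + \mathbf{H}(\mathbf{I} - \mathbf{A}_{Par})\mathbf{y} = \left[\mathbf{A}_{Par} + \mathbf{H}(\mathbf{I} - \mathbf{A}_{Par})\right]\mathbf{y} = \mathbf{A}(h)\,\mathbf{y} \end{aligned} \tag{39}$$
where $\mathbf{A}(h) = \mathbf{A}_{Par} + \mathbf{H}(\mathbf{I} - \mathbf{A}_{Par})$, which depends on the bandwidth $h$ through the kernel weights in $\mathbf{H}$.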
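Given $\mathbf{X}$, $\mathbf{H}$, and $\mathbf{y}$, the pieces of Equation (39) can be assembled directly. The sketch returns the parametric fit $\mathbf{A}_{Par}\mathbf{y}$, the nonparametric fit $\mathbf{H}(\mathbf{I}-\mathbf{A}_{Par})\mathbf{y}$, the total fit, and $\mathbf{A}(h)$ itself, which the GCV and MSE criteria need. The name `bsr_fit` is hypothetical, and, as with any partial-residual fit, $(\mathbf{I}-\mathbf{H})\mathbf{X}$ must have full column rank.

```python
import numpy as np

def bsr_fit(X, H, y):
    """Components of Eq. (39): f_hat = A_Par y, eta_hat = H (I - A_Par) y, y_hat = A(h) y."""
    I = np.eye(H.shape[0])
    M = I - H
    Xt = M @ X
    A_par = X @ np.linalg.solve(Xt.T @ Xt, Xt.T @ M)   # Eq. (37)
    A = A_par + H @ (I - A_par)                         # A(h); depends on h through H
    f_hat = A_par @ y
    eta_hat = H @ (I - A_par) @ y                       # Eq. (38)
    return f_hat, eta_hat, A @ y, A
```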

4.3. Optimal Bandwidth Selection Using GCV Method

After obtaining the parameter estimates of the Biresponse Semiparametric Regression (BSR) model with the Local Polynomial Kernel estimator, the next step is to compute the Generalized Cross-Validation (GCV) and MSE values of the fitted model. GCV is one method for selecting the optimal bandwidth; the GCV function is defined as follows:
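$$GCV(h) = \frac{(2N)^{-1}(\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}})}{\left[(2N)^{-1}\operatorname{trace}\left(\mathbf{I} - \mathbf{A}(h)\right)\right]^2} = \frac{(2N)^{-1}\mathbf{y}^T\left(\mathbf{I} - \mathbf{A}(h)\right)^T\left(\mathbf{I} - \mathbf{A}(h)\right)\mathbf{y}}{\left[(2N)^{-1}\operatorname{trace}\left(\mathbf{I} - \mathbf{A}(h)\right)\right]^2} \tag{40}$$
where $N = n \times m$ is the number of observations per response, so the stacked response vector $\mathbf{y}$ has length $2N$.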
The optimal bandwidth is selected with GCV by substituting candidate bandwidth values $h \in (0, \infty)$ into $\mathbf{A}(h)$ in Equation (40) and choosing the value that gives the minimum GCV(h). In addition to GCV, another measure of model goodness, the MSE (Mean Square Error), is also considered.
The MSE is the average of the squared errors between the actual and estimated values. The best model is the one with the smallest MSE, i.e., the value closest to zero: its estimates are then close to the actual values, and it can be used to calculate predictions for future periods. The MSE of the BSR model is defined as follows:
$$MSE(h) = N^{-1}\mathbf{y}^T\left(\mathbf{I} - \mathbf{A}(h)\right)^T\left(\mathbf{I} - \mathbf{A}(h)\right)\mathbf{y}. \tag{41}$$
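Equations (40) and (41) translate directly into code once $\mathbf{A}(h)$ is available as a matrix. In the sketch below, `build_A` is a hypothetical callable that returns $\mathbf{A}(h)$ for a candidate bandwidth, and the search runs over a finite grid rather than all of $(0, \infty)$.

```python
import numpy as np

def gcv_score(A, y):
    """GCV(h) of Eq. (40) for the stacked 2N-vector y and the matrix A(h)."""
    two_n = y.size
    resid = y - A @ y
    num = resid @ resid / two_n
    den = (np.trace(np.eye(two_n) - A) / two_n) ** 2
    return num / den

def mse_score(A, y):
    """MSE(h) of Eq. (41): N^{-1} y'(I - A)'(I - A) y with N = n x m."""
    N = y.size // 2
    resid = y - A @ y
    return resid @ resid / N

def select_bandwidth(build_A, y, grid):
    """Return the grid value with the smallest GCV, plus all scores."""
    scores = {h: gcv_score(build_A(h), y) for h in grid}
    return min(scores, key=scores.get), scores
```

The denominator penalizes fits whose $\mathbf{A}(h)$ has a large trace, i.e., overly flexible fits, which is why minimizing GCV does not simply pick the smallest bandwidth.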

4.4. Simulation Study

This section discusses simulations of the BSR model under homoscedasticity and heteroscedasticity using three types of functions: trigonometric, exponential, and polynomial. Two responses are used (biresponse), together with two predictors: one predictor for the nonparametric component and one for the parametric component.
The data are generated in the following stages:
  • Generate the predictor values of the parametric component, $x$, from a uniform distribution, $x \sim U(a = 10, b = 15)$, where $a$ is the lowest and $b$ the highest value of $x$.
  • Generate the predictor values of the nonparametric component, $t$, from a uniform distribution, $t \sim U(a = 1, b = 7)$, where $a$ is the lowest and $b$ the highest value of $t$.
  • Generate the errors $\boldsymbol{\varepsilon}$ from a bivariate normal distribution with mean zero ($\mu_1 = \mu_2 = 0$) and a variance–covariance matrix $\boldsymbol{\Sigma}$ set by the specified variances and covariance: the variance of the first response is $\sigma_1^2 = 10$, the variance of the second response is $\sigma_2^2 = 2$, and their covariance is $Cov(\varepsilon_1, \varepsilon_2) = 3$:
$$\boldsymbol{\varepsilon} = \begin{pmatrix}\varepsilon_1 \\ \varepsilon_2\end{pmatrix} \sim N\left(\begin{pmatrix}\mu_1 \\ \mu_2\end{pmatrix}, \begin{pmatrix}\sigma_1^2 & Cov(\varepsilon_1,\varepsilon_2) \\ Cov(\varepsilon_1,\varepsilon_2) & \sigma_2^2\end{pmatrix}\right), \qquad \boldsymbol{\Sigma} = \begin{pmatrix}\sigma_1^2 & Cov(\varepsilon_1,\varepsilon_2) \\ Cov(\varepsilon_1,\varepsilon_2) & \sigma_2^2\end{pmatrix} = \begin{pmatrix}10 & 3 \\ 3 & 2\end{pmatrix}.$$
  • Compute the response values by substituting the values of $x$ and $t$ into the specified functions (Table 1) and adding the error values, according to the simulation scenario.
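The data-generation stages above can be sketched with NumPy's random generator. The functions are the ones listed in Table 1, transcribed as they appear there; the sketch draws $N$ independent observations and covers only the homoscedastic case, because the heteroscedastic error structure is not specified in this section. The name `simulate_biresponse` is hypothetical.

```python
import numpy as np

SIGMA = np.array([[10.0, 3.0], [3.0, 2.0]])   # sigma_1^2 = 10, sigma_2^2 = 2, Cov = 3

def simulate_biresponse(N, rng, kind="polynomial"):
    """Draw x ~ U(10, 15), t ~ U(1, 7), bivariate normal errors, and the two responses."""
    x = rng.uniform(10, 15, N)
    t = rng.uniform(1, 7, N)
    eps = rng.multivariate_normal(mean=[0.0, 0.0], cov=SIGMA, size=N)   # N x 2
    g1, g2 = 6 * x + 3, 2 * x + 0.8                                      # parametric parts
    if kind == "polynomial":
        f1 = t ** 3 + 8 * t ** 2 + 10 * t + 7
        f2 = t ** 3 + 3 * t ** 2 + 8 * t + 2
    elif kind == "exponential":
        f1, f2 = 3 * np.exp(t), 2 * np.exp(t)
    elif kind == "trigonometric":
        f1, f2 = 3 + 5 * np.cos(np.pi * t), 2 + 3 * np.sin(np.pi * t)
    else:
        raise ValueError(f"unknown function type: {kind}")
    y1 = g1 + f1 + eps[:, 0]
    y2 = g2 + f2 + eps[:, 1]
    return x, t, y1, y2

x, t, y1, y2 = simulate_biresponse(70, np.random.default_rng(2025), "trigonometric")
```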
This simulation study aims to show that, for biresponse data, better estimates are obtained by using the weighting matrix $\mathbf{W}$ in addition to kernel weighting. The prediction accuracy, measured by the MSE of the fitted model, is compared between two methods: the first uses only kernel weighting, without $\mathbf{W}$, while the second uses kernel weighting together with $\mathbf{W}$, the inverse of the variance–covariance matrix of the response errors. The comparison is summarized by
$$\text{MSE Ratio} = \frac{\text{MSE of 1st method}}{\text{MSE of 2nd method}}.$$
An MSE ratio greater than one means that the second method has the smaller MSE and is therefore more accurate than the first, and vice versa. Hence, a ratio above one indicates that biresponse regression estimation is better with $\mathbf{W}$ weighting combined with kernel weighting. The simulation study uses two responses with three types of functions (trigonometric, exponential, and polynomial) and varies the sample size, the type of function, and the type of case. The simulation scenarios are presented in Table 1.
Simulations were carried out for different sample sizes (n) of 49, 70, 105, and 140, for two types of cases (homoscedasticity and heteroscedasticity), and for different functions (polynomial, exponential, and trigonometric). Each scenario was replicated 30 times, and each replication produced its own MSE values. The simulation process was carried out with the open-source software R, together with an example of the output of running the R code. The simulation results for each scenario are given in Table 2.
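A compact version of the replication loop and the MSE ratio is sketched below for the trigonometric scenario. Everything specific here is an assumption rather than the authors' R code: a Gaussian kernel; $\mathbf{H}$ built row by row (each local fit centred at an observed time, keeping only the local intercepts); $\mathbf{W} = \boldsymbol{\Sigma}^{-1}\otimes\mathbf{I}_N$, which ignores within-subject correlation; a slope-only $\mathbf{X}$ with the intercepts absorbed into $\eta_r$; a local linear fit; and the bandwidths `h1`, `h2`. The two responses are given different bandwidths because, with equal bandwidths and degrees and this Kronecker form of $\mathbf{W}$, $\mathbf{W}$ cancels from $\hat{\boldsymbol{\theta}}(t_0)$ and both methods would coincide. The ratio returned depends on these choices and is not expected to reproduce Table 2.

```python
import numpy as np

SIGMA = np.array([[10.0, 3.0], [3.0, 2.0]])      # error covariance from the data-generation step

def smoother(t, W, h1, h2, p):
    """Row-wise local polynomial smoother H for the stacked (y^(1)', y^(2)')' (assumed construction)."""
    N = t.size
    H = np.zeros((2 * N, 2 * N))
    for k, t0 in enumerate(t):
        d = t - t0
        Z = np.kron(np.eye(2), np.vander(d, p + 1, increasing=True))
        w = np.concatenate([np.exp(-0.5 * (d / h1) ** 2) / h1,    # Gaussian kernel, bandwidth h1
                            np.exp(-0.5 * (d / h2) ** 2) / h2])   # Gaussian kernel, bandwidth h2
        WK = W * w[None, :]                                       # W @ diag(w)
        L = np.linalg.solve(Z.T @ WK @ Z, Z.T @ WK)
        H[k], H[N + k] = L[0], L[p + 1]                           # local intercepts at t_k
    return H

def residual_mse(x, t, y, W, h1, h2, p):
    """MSE(h) of Eq. (41) with A(h) = A_Par + H (I - A_Par)."""
    N = t.size
    I = np.eye(2 * N)
    X = np.kron(np.eye(2), x[:, None])            # slopes only; intercepts absorbed into eta
    H = smoother(t, W, h1, h2, p)
    Xt = (I - H) @ X
    A_par = X @ np.linalg.solve(Xt.T @ Xt, Xt.T @ (I - H))
    A = A_par + H @ (I - A_par)
    r = y - A @ y
    return r @ r / N

def average_mse_ratio(N, reps=30, h1=0.4, h2=0.8, p=1, seed=0):
    """Mean over replications of MSE(kernel only, W = I) / MSE(kernel and W = Sigma^{-1} kron I_N)."""
    rng = np.random.default_rng(seed)
    W1 = np.eye(2 * N)
    W2 = np.kron(np.linalg.inv(SIGMA), np.eye(N))
    ratios = []
    for _ in range(reps):
        x = rng.uniform(10, 15, N)
        t = rng.uniform(1, 7, N)
        e = rng.multivariate_normal([0.0, 0.0], SIGMA, size=N)
        y1 = 6 * x + 3 + 3 + 5 * np.cos(np.pi * t) + e[:, 0]      # trigonometric scenario, Table 1
        y2 = 2 * x + 0.8 + 2 + 3 * np.sin(np.pi * t) + e[:, 1]
        y = np.concatenate([y1, y2])
        ratios.append(residual_mse(x, t, y, W1, h1, h2, p) / residual_mse(x, t, y, W2, h1, h2, p))
    return float(np.mean(ratios))
```

For example, `average_mse_ratio(49)` runs 30 replications at the smallest sample size in Table 1.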
The results in Table 2 and Figure 1 show that the average MSE ratio is greater than one in every scenario. This indicates that, in biresponse regression estimation, weighting by the inverse variance–covariance matrix $\mathbf{W}$ in addition to kernel weighting is needed, since it gives better estimation accuracy than the method that uses kernel weighting alone. Figure 1 also shows that the MSE ratio increases as the sample size increases, and that it is higher in the heteroscedastic case than in the homoscedastic case. These higher MSE ratios indicate that using $\mathbf{W}$ in biresponse regression estimation is especially important for local polynomial biresponse regression models with longitudinal data, particularly under heteroscedasticity and as the amount of data grows. This is consistent with Rosopa et al. [46] and Midi et al. [47], who use weighted estimation with $\mathbf{W}$ when heteroscedasticity is found in the data.
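The three patterns read off Table 2 and Figure 1 (all ratios above one, growth with sample size, and larger ratios under heteroscedasticity) can be checked mechanically against the tabulated averages. The values below are copied from Table 2; the dictionary layout and the function name are illustrative.

```python
# Average MSE ratios from Table 2, ordered as (polynomial, exponential, trigonometric)
TABLE2 = {
    49:  {"hetero": (1.8020, 1.6995, 1.6300), "homo": (1.4305, 1.3417, 1.3187)},
    70:  {"hetero": (2.5103, 2.5661, 2.3767), "homo": (2.0837, 2.1096, 1.8931)},
    105: {"hetero": (3.4717, 3.3899, 3.2816), "homo": (2.8033, 2.7800, 2.7251)},
    140: {"hetero": (4.1532, 4.0189, 4.1729), "homo": (3.3165, 3.3058, 3.3510)},
}

def check_table2(table):
    """Return (all ratios > 1, ratios increase with n, heteroscedastic > homoscedastic)."""
    sizes = sorted(table)
    all_above_one = all(r > 1 for row in table.values() for case in row.values() for r in case)
    increasing_in_n = all(
        table[a][case][f] < table[b][case][f]
        for a, b in zip(sizes, sizes[1:])
        for case in ("hetero", "homo")
        for f in range(3)
    )
    hetero_above_homo = all(
        table[n]["hetero"][f] > table[n]["homo"][f] for n in sizes for f in range(3)
    )
    return all_above_one, increasing_in_n, hetero_above_homo

print(check_table2(TABLE2))
```

All three checks hold for every function type, so the statements above apply to each column of Table 2, not only to its overall average.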

4.5. Implementation of Biresponse Semiparametric Regression (BSR) Model Using LPK Estimator for Longitudinal Data of DHF Patients

The simulation results for the BSR model of longitudinal data indicate that, in the heteroscedastic case, the best estimate is obtained with both kernel weighting and $\mathbf{W}$ weighting. This section applies the BSR model to the heteroscedastic case using two indicators for determining DHF disease, namely platelet and hematocrit levels, which are influenced by time and hemoglobin. The study uses secondary data from the medical records of DHF inpatients in 2023 at Roemani Hospital Semarang, Central Java Province, Indonesia. There were 10 patients, each observed seven times, for a total of 70 observations. The parametric predictor is the hemoglobin level of the DHF inpatients, and the nonparametric predictor is the examination time (in days). The data come from 10 patients hospitalized for seven days with Grade 1 DHF, characterized by fever followed by a positive tourniquet test indicative of viral infection with signs of bleeding; their descriptive statistics are presented in Table 3.
The first step in applying the Local Polynomial Kernel (LPK) estimator of the BSR model to longitudinal data is to determine the optimal bandwidth and local polynomial degree of the nonparametric component. These are chosen by the GCV method, taking the bandwidth and degree with the smallest GCV value as optimal. The smallest GCV value is 72.89, obtained with an optimal bandwidth of 0.1 and a local polynomial degree of 2 for the first response, and an optimal bandwidth of 0.6 and a local polynomial degree of 3 for the second response. With these bandwidths and degrees, the parameters of the parametric and nonparametric components are then estimated; the resulting BSR model has an MSE of 0.45. Figure 2 compares the actual values with the values estimated by the LPK estimator for the first response, and Figure 3 does so for the second response.
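Choosing a bandwidth and a degree for each response is a grid search over $(h_1, h_2, p_1, p_2)$. In the sketch, `gcv` is a hypothetical callable that returns the GCV value for one combination (for example, Equation (40) evaluated on the corresponding $\mathbf{A}(h)$); the grids are left to the caller, since the candidate values used in the study are not reported here.

```python
import itertools

def select_by_gcv(gcv, bandwidths, degrees):
    """Return (score, h1, h2, p1, p2) minimising gcv(h1, h2, p1, p2) over the grid."""
    best = None
    for h1, h2, p1, p2 in itertools.product(bandwidths, bandwidths, degrees, degrees):
        score = gcv(h1, h2, p1, p2)
        if best is None or score < best[0]:
            best = (score, h1, h2, p1, p2)
    return best
```

The search cost grows with the product of the grid sizes, so coarse grids for the degrees (for example 1 to 3) are usually enough.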
Figures 2 and 3 show that the estimated BSR model using the LPK estimator follows the pattern of the actual data. Applied to the DHF data, the BSR model yields an R-square of 92.8%, which, following Hair et al. [48], indicates a strong model because the R-square is close to 100%. The estimation results in Figure 2 show that the lowest platelet level, 32,477/mm3, was experienced by the third patient on the fifth day of hospitalization, and Figure 3 shows that the lowest hematocrit level, 32.52%, was experienced by the sixth patient on the first day of hospitalization. The platelet (first response) and hematocrit (second response) models for the second patient on the fourth day of hospitalization, obtained with the Local Polynomial Kernel (LPK) estimator, are as follows:
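$$\hat{y}_{24}^{(1)} = \hat{\beta}_0^{(1)} + \hat{\beta}_1^{(1)} x + \hat{\eta}_1(t_{24}) = 290442 - 11778.9\,x + 1.96 \times 10^{-17}(t - 4)^2; \qquad 3.9 < t < 4.1 \tag{44}$$
$$\hat{y}_{24}^{(2)} = \hat{\beta}_0^{(2)} + \hat{\beta}_1^{(2)} x + \hat{\eta}_2(t_{24}) = 4.94 + 2.62\,x - 9.84 \times 10^{-3}(t - 4) + 9.55 \times 10^{-3}(t - 4)^2 - 1.53 \times 10^{-2}(t - 4)^3; \qquad 3.4 < t < 4.6 \tag{45}$$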
Based on Equations (44) and (45) and the regression coefficient of the hemoglobin variable, each 1 g/dL increase in hemoglobin decreases the platelet level by 11,778.9/mm3 and increases the hematocrit level by 2.62%, for the second patient on the fourth day of hospitalization. Meanwhile, the platelet (first response) and hematocrit (second response) models for the seventh patient on the second day of hospitalization, obtained with the LPK estimator, are as follows:
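$$\hat{y}_{72}^{(1)} = \hat{\beta}_0^{(1)} + \hat{\beta}_1^{(1)} x + \hat{\eta}_1(t_{72}) = 300517 - 10637.5\,x - 6.63 \times 10^{-17}(t - 2) + 5.48 \times 10^{-17}(t - 2)^2; \qquad 1.9 < t < 2.1 \tag{46}$$
$$\hat{y}_{72}^{(2)} = \hat{\beta}_0^{(2)} + \hat{\beta}_1^{(2)} x + \hat{\eta}_2(t_{72}) = 5.02 + 2.62\,x + 0.07(t - 2) - 0.02(t - 2)^2 + 0.08(t - 2)^3; \qquad 1.4 < t < 2.6 \tag{47}$$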
Based on Equations (46) and (47), each 1 g/dL increase in hemoglobin decreases platelets by 10,637.5/mm3 and increases hematocrit by 2.62% for the seventh patient on the second day of hospitalization.
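The fitted equations for the second and seventh patients can be evaluated directly to check the hemoglobin effects quoted above: at a fixed time, raising $x$ by 1 g/dL changes each fitted value by exactly its hemoglobin coefficient. In the sketch, the sign of each time term is an interpretation of the printed equations, the hematocrit model of the seventh patient is centred at $t_0 = 2$ to match its stated range $1.4 < t < 2.6$ (an assumption), and the function names are hypothetical. The input 13.75 g/dL is simply the Table 3 mean, used as an example.

```python
def platelets_p2_day4(hb, t):
    """Eq. (44): second patient, fourth day; valid for 3.9 < t < 4.1."""
    return 290442 - 11778.9 * hb + 1.96e-17 * (t - 4) ** 2

def hematocrit_p2_day4(hb, t):
    """Eq. (45): second patient, fourth day; valid for 3.4 < t < 4.6."""
    d = t - 4
    return 4.94 + 2.62 * hb - 9.84e-3 * d + 9.55e-3 * d ** 2 - 1.53e-2 * d ** 3

def platelets_p7_day2(hb, t):
    """Eq. (46): seventh patient, second day; valid for 1.9 < t < 2.1."""
    d = t - 2
    return 300517 - 10637.5 * hb - 6.63e-17 * d + 5.48e-17 * d ** 2

def hematocrit_p7_day2(hb, t):
    """Eq. (47), centred here at t0 = 2 (assumed); valid for 1.4 < t < 2.6."""
    d = t - 2
    return 5.02 + 2.62 * hb + 0.07 * d - 0.02 * d ** 2 + 0.08 * d ** 3

# Effect of +1 g/dL hemoglobin at a fixed time, starting from the Table 3 mean of 13.75 g/dL
print(platelets_p2_day4(14.75, 4.0) - platelets_p2_day4(13.75, 4.0))
print(hematocrit_p2_day4(14.75, 4.0) - hematocrit_p2_day4(13.75, 4.0))
```

The time terms in the platelet models have coefficients of order $10^{-17}$, so within their stated ranges the platelet fits are effectively linear in hemoglobin.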

5. Conclusions

Equation (39) gives the estimator of the Biresponse Semiparametric Regression (BSR) model for longitudinal data obtained with the Local Polynomial Kernel (LPK) estimator. The nonparametric component is estimated by WLS, while the parametric component is estimated by LS. The LPK estimator uses kernel weighting, which depends on the bandwidth, and the optimal polynomial order and bandwidth are selected with the GCV method as those giving the smallest GCV value. In addition, the weighting matrix W, i.e., the inverse of the variance–covariance matrix, plays an important role: simulations with repeated replications showed that weighting by W in biresponse regression estimation gives better estimation accuracy than methods that use kernel weighting alone. The model was applied to DHF data with platelets and hematocrit as response variables, hemoglobin as the parametric component predictor, and examination time as the nonparametric component predictor. The fitted model gives MSE = 0.45 and R-square = 92.8%, meaning that 92.8% of the variability in the response variables is explained by the predictors used in this research, and the remaining 7.2% is due to factors not included in the model.
This study is limited to data with two responses and two predictor variables, one predictor in the nonparametric component and the other in the parametric component, so further work is needed with more response variables (multiresponse) and more predictor variables (multipredictor). In addition, the parametric component predictor is assumed to be linearly related to the response variables; this could be extended to a polynomial relationship.

Author Contributions

This research article was the result of the following contributions from all authors: conceptualization, T.W.U., N.C. and T.S.; methodology, T.W.U., N.C., T.S. and B.L.; software, T.W.U., N.C., T.S. and B.L.; validation, T.W.U., N.C., T.S. and B.L.; formal analysis, T.W.U., N.C., T.S., B.L. and D.A.; investigation and resources, T.W.U., N.C. and T.S.; writing—preparation of the original draft, T.W.U., N.C. and T.S.; review and editing, T.W.U., N.C., T.S., B.L. and D.A.; supervision, T.W.U., N.C. and T.S.; project administration, T.W.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Muhammadiyah University of Semarang, Indonesia, under the number 1927/UNIMUS/KP/2022.

Data Availability Statement

Upon reasonable request, the corresponding author will provide the data used in this article.

Acknowledgments

The authors would like to acknowledge the editors and peer reviewers of the Symmetry journal for their insightful comments, revisions, criticisms, and recommendations that helped to improve the paper.

Conflicts of Interest

No conflicts of interest have been disclosed by the authors.

Nomenclature

NR: Nonparametric Regression
SR: Semiparametric Regression
LPK: Local Polynomial Kernel
LS: Least Squares
WLS: Weighted Least Squares
GCV: Generalized Cross–Validation
OLS: Ordinary Least Squares
BNR: Biresponse Nonparametric Regression
BSR: Biresponse Semiparametric Regression
DHF: Dengue Hemorrhagic Fever
MARS: Multivariate Adaptive Regression Spline
MSE: Mean Square Error

References

  1. Gujarati, D. Basic Econometrics, 4th ed.; McGraw-Hill: New York, NY, USA, 2003.
  2. Eubank, R.L. Nonparametric Regression and Spline Smoothing, 2nd ed.; Marcel Dekker: New York, NY, USA, 1999.
  3. Cheng, M.-Y.; Huang, T.; Liu, P.; Peng, H. Bias reduction for nonparametric and semiparametric regression models. Stat. Sin. 2018, 28, 2749–2770.
  4. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990.
  5. Asrini, L.J. Fourier series semiparametric regression models (Case study: The production of law land rice irrigation in Central Java). ARPN J. Eng. Appl. Sci. 2014, 9, 1501–1506.
  6. Diana, R.; Budiantara, I.N.; Purhadi; Darmesto, S. Smoothing spline in semiparametric additive regression model with bayesian approach. J. Math. Stats. 2013, 9, 161–168.
  7. Mohaisen, A.J.; Abdulhussein, A.M. Spline semiparametric regression models. J. Kufa Math. Comp. 2015, 2, 1–10.
  8. Lestari, B.; Chamidah, N.; Budiantara, I.N.; Aydin, D. Determining confidence interval and asymptotic distribution for parameters of multiresponse semiparametric regression model using smoothing spline estimator. J. King Saud Univ.-Sci. 2023, 35, 102664.
  9. Lestari, B.; Chamidah, N.; Aydin, D.; Yilmaz, E. Reproducing kernel Hilbert space approach to multiresponse smoothing spline regression function. Symmetry 2022, 14, 2227.
  10. Wu, H.; Zhang, J.T. Nonparametric Regression Methods for Longitudinal Data Analysis; John Wiley & Sons, Inc.: Toronto, ON, Canada, 2006.
  11. Welsh, A.H.; Yee, T.W. Local regression for vector responses. J. Stat. Plann. Infer. 2006, 136, 3007–3031.
  12. Wasserman, L. All of Nonparametric Statistics; Springer: New York, NY, USA, 2005.
  13. Mardianto, M.F.F.; Gunardi; Utami, H. An analysis about Fourier series estimator in nonparametric regression for longitudinal data. Math. Stats. 2021, 9, 501–510.
  14. Ketrin, M.W.; Fitri, F.; Putra, A.A.; Zilrahmi. Nonparametric regression modeling with Fourier series approach on poverty cases in West Sumatra province. UNP J. Stats. Data Sci. 2023, 1, 53–58.
  15. Iriany, A.; Fernandes, A.A.R. Hybrid Fourier series and smoothing spline path non-parametrics estimation model. Front. Appl. Math. Stat. 2023, 8, 1045098.
  16. Rahmawati, D.P.; Budiantara, I.N.; Prastyo, D.D.; Octavanny, M.A.D. Modeling of human development index in Papua province using spline smoothing estimator in nonparametric regression. J. Phys. Conf. Ser. 2021, 1752, 012018.
  17. Geller, J.; Neumann, M.H. Improved local polynomial estimation in time series regression. J. Nonparametr. Stats. 2017, 30, 1–27.
  18. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications, 1st ed.; Routledge: New York, NY, USA, 2018.
  19. Araveeporn, A. The estimating parameter and number of knots for nonparametric regression methods in modelling time series data. Modelling 2024, 5, 1413–1434.
  20. Mariati, M.P.A.M.; Budiantara, I.N.; Ratnasari, V. The application of mixed smoothing spline and Fourier series model in nonparametric regression. Symmetry 2021, 13, 2094.
  21. Hermawan, T. Estimasi kurva regresi spline pada data longitudinal dengan metode kuadrat terkecil [Estimation of spline regression curves on longitudinal data using the least squares method]. J. Intersect. 2020, 5, 17–25.
  22. Ramli, M.; Ratnasari, V.; Budiantara, I.N. Estimation of matrix variance covariance on nonparametric regression spline truncated for longitudinal data. J. Phys. Conf. Ser. 2020, 1562, 012014.
  23. Sifriyani; Sari, A.R.M.; Dani, A.T.R.; Jalaluddin, S. Bi-response truncated spline nonparametric regression with optimal knot point selection using generalized cross-validation in diabetes mellitus patient’s blood sugar levels. Commun. Math. Biol. Neurosci. 2023, 2023, 48.
  24. Zhang, X.; Liao, J.; Lu, K. Smoothing spline estimation for nonparametric model of longitudinal data. Chin. J. Appl. Prob. Stats. 2016, 32, 313–326.
  25. Rizaldi, M.; Fitriyani, N.; Baskara, Z.W. Modeling of economic growth rate in West Nusa Tenggara province with longitudinal kernel nonparametric regression. Eig. Math. J. 2024, 7, 50–55.
  26. Sadek, A.; Mohammed, L.A. Evaluation of the performance of kernel non-parametric regression and ordinary least squares regression. Int. J. Inform. Vis. 2024, 8, 1352–1360.
  27. Sriliana, I.; Budiantara, I.N.; Ratnasari, V. The performance of mixed truncated spline-local linear nonparametric regression model for longitudinal data. MethodsX 2024, 12, 102652.
  28. Xiang, D.; Qiu, P.; Pu, X. Nonparametric regression analysis of multivariate longitudinal data. Stat. Sin. 2013, 23, 769–789.
  29. Fernandes, A.A.R.; Budiantara, I.N.; Otok, B.W.; Suhartono. Spline estimators for bi-responses nonparametric regression model for longitudinal data. Appl. Math. Sci. 2014, 8, 5653–5665.
  30. Nisa, K.; Budiantara, I.N. Modeling East Java Indonesia life expectancy using semiparametric regression mixed spline truncated and Fourier series. Media Stat. 2020, 13, 149–160.
  31. Ampa, A.T.; Budiantara, I.N.; Zain, I. Kernel multivariable semiparametric regression model in estimating the level of open unemployment in East Java Province. J. Phys. Conf. Ser. 2021, 1899, 012127.
  32. Sun, X.; You, J. Iterative weighted partial spline least squares estimation in semiparametric modeling of longitudinal data. Sci. China Ser. A Math. 2003, 46, 724–735.
  33. Jazi, O.A.; Pullenayegum, E. Variable selection in semiparametric regression models for longitudinal data with informative observation times. Stat. Med. 2022, 41, 3281–3298.
  34. Utami, T.W.; Chamidah, N.; Saifudin, T. Platelet modeling in DHF patients using local polynomial semiparametric regression on longitudinal data. J. Teor. Apl. Mat. 2024, 8, 231.
  35. Li, Y. Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation. Biometrika 2011, 98, 355–370.
  36. Jia, S.; Zhang, C.; Wu, H. Efficient semiparametric regression for longitudinal data with regularised estimation of error covariance function. J. Nonparametr. Stats. 2019, 31, 867–886.
  37. Jiao, G.; Liang, J.; Wang, F.; Chen, X.; Chen, S.; Li, H.; Jin, J.; Cai, J.; Zhang, F. Longitudinal data analysis based on Bayesian semiparametric method. Axioms 2023, 12, 431.
  38. Chamidah, N.; Lestari, B.; Budiantara, I.N.; Saifudin, T.; Rulaningtyas, R.; Aryati, A.; Wardani, P.; Aydin, D. Consistency and asymptotic normality of estimator for parameters in multiresponse multipredictor semiparametric regression model. Symmetry 2022, 14, 336.
  39. Ampulembang, A.P.; Otok, B.W.; Rumiati, A.T.; Budiasih. Bi-responses nonparametric regression model using MARS and its properties. Appl. Math. Sci. 2015, 9, 1417–1427.
  40. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 4th ed.; John Wiley & Sons, Inc.: New York, NY, USA, 2012.
  41. Kikechi, C.B. On local polynomial regression estimators in finite populations. Int. J. Stat. Appl. Math. 2020, 5, 58–63.
  42. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis; Prentice Hall: New York, NY, USA, 1982.
  43. Jayashree, K.; Manasa, G.C.; Pallavi, P.; Manjunath, G.V. Evaluation of platelets as predictive parameters in dengue fever. Indian J. Hematol. Blood Transfus. 2011, 27, 127–130.
  44. Ojha, A.; Nandi, D.; Batra, H.; Singhal, R.; Annarapu, G.K.; Bhattacharyya, S.; Seth, T.; Dar, L.; Medigeshi, G.R.; Vrati, S.; et al. Platelet activation determines the severity of thrombocytopenia in dengue infection. Sci. Rep. 2017, 7, 41697.
  45. Faridah, I.N.; Dania, H.; Chen, Y.-H.; Supadmi, W.; Purwanto, B.D.; Heriyanto, M.J.; Aufa, M.A.; Chang, W.-C.; Perwitasari, D.A. Dynamic changes of platelet and factors related dengue hemorrhagic fever: A retrospective study in Indonesian. Diagnostics 2022, 12, 950.
  46. Rosopa, P.J.; Schaffer, M.M.; Schroeder, A.N. Managing heteroscedasticity in general linear models. Psychol. Methods 2013, 18, 335–351.
  47. Midi, H.; Rana, M.D.S.; Imon, R. The performance of robust weighted least squares in the presence of outliers and heteroscedastic errors. WSAS Transact. Math. 2019, 7, 351–361.
  48. Hair, J.F., Jr.; Black, W.C.; Babin, B.J. Multivariate Data Analysis, 5th ed.; Prentice Hall, Inc.: Hoboken, NJ, USA, 2011.
Figure 1. Comparison of average of MSE ratio in each scenario in the simulation.
Figure 2. Plot results of the estimated model values along with the actual values for the first response (platelets).
Figure 3. Plot results of the estimated values of the model along with the actual values on the second response (hematocrit).
Table 1. Simulation scenario.
Table 1. Simulation scenario.
| No. | Scenario | Description |
| --- | --- | --- |
| 1 | Longitudinal data sample size | 49, 70, 105, 140 |
| 2 | Type of case | Heteroscedastic / homoscedastic |
| 3 | Parametric component functions | $g_1(x_{ij}) = 6x_{ij} + 3$; $g_2(x_{ij}) = 2x_{ij} + 0.8$ |
| 4 | Nonparametric component functions | Polynomial: $f_1(t_{ij}) = t_{ij}^3 + 8t_{ij}^2 + 10t_{ij} + 7$; $f_2(t_{ij}) = t_{ij}^3 + 3t_{ij}^2 + 8t_{ij} + 2$ |
| | | Exponential: $f_1(t_{ij}) = 3\exp(t_{ij})$; $f_2(t_{ij}) = 2\exp(t_{ij})$ |
| | | Trigonometric: $f_1(t_{ij}) = 3 + 5\cos(\pi t_{ij})$; $f_2(t_{ij}) = 2 + 3\sin(\pi t_{ij})$ |
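To make the design in Table 1 concrete, the component functions can be written down directly and combined into the two responses $y_1 = g_1(x) + f_1(t) + \varepsilon_1$ and $y_2 = g_2(x) + f_2(t) + \varepsilon_2$. The generator below is a minimal sketch, not the authors' simulation code: the uniform covariate $x_{ij}$, the common time grid on $[0, 1]$, the bivariate normal errors with correlation `rho`, the error scale `0.5 + t` in the heteroscedastic case, and the split of n into subjects × time points (e.g. 49 = 7 × 7) are all hypothetical choices.

```python
import numpy as np


# Parametric components (Table 1, row 3)
def g1(x):
    return 6 * x + 3


def g2(x):
    return 2 * x + 0.8


# Nonparametric components (Table 1, row 4): family -> (f1, f2)
NONPARAMETRIC = {
    "polynomial": (lambda t: t**3 + 8 * t**2 + 10 * t + 7,
                   lambda t: t**3 + 3 * t**2 + 8 * t + 2),
    "exponential": (lambda t: 3 * np.exp(t),
                    lambda t: 2 * np.exp(t)),
    "trigonometric": (lambda t: 3 + 5 * np.cos(np.pi * t),
                      lambda t: 2 + 3 * np.sin(np.pi * t)),
}


def simulate(n_subjects, n_times, family, heteroscedastic, rho=0.5, seed=0):
    """Hypothetical biresponse longitudinal generator.

    The x design, time grid, error correlation and error scale are
    illustrative assumptions, not taken from the article.
    """
    rng = np.random.default_rng(seed)
    f1, f2 = NONPARAMETRIC[family]
    subject = np.repeat(np.arange(n_subjects), n_times)
    t = np.tile(np.linspace(0.0, 1.0, n_times), n_subjects)  # assumed grid
    x = rng.uniform(0.0, 1.0, size=subject.size)              # assumed covariate
    # Assumed error scale: grows with t if heteroscedastic, constant otherwise
    sd = 0.5 + t if heteroscedastic else np.full(t.size, 1.0)
    # Correlated errors link the two responses at each observation
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=t.size)
    y1 = g1(x) + f1(t) + sd * z[:, 0]
    y2 = g2(x) + f2(t) + sd * z[:, 1]
    return subject, x, t, y1, y2


if __name__ == "__main__":
    subject, x, t, y1, y2 = simulate(7, 7, "trigonometric", heteroscedastic=True)
    print(y1.shape, y2.shape)  # (49,) (49,)
```

Drawing both error columns from one bivariate normal is what makes the responses "interrelated" in this sketch; setting `rho=0` would reduce it to two independent single-response models.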
Table 2. Simulation results: average MSE ratio for each scenario.
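| Sample size (n) | Heteroscedastic | Average MSE ratio (polynomial) | Average MSE ratio (exponential) | Average MSE ratio (trigonometric) |
| --- | --- | --- | --- | --- |
| 49 | Yes | 1.8020 | 1.6995 | 1.6300 |
| 49 | No | 1.4305 | 1.3417 | 1.3187 |
| 70 | Yes | 2.5103 | 2.5661 | 2.3767 |
| 70 | No | 2.0837 | 2.1096 | 1.8931 |
| 105 | Yes | 3.4717 | 3.3899 | 3.2816 |
| 105 | No | 2.8033 | 2.7800 | 2.7251 |
| 140 | Yes | 4.1532 | 4.0189 | 4.1729 |
| 140 | No | 3.3165 | 3.3058 | 3.3510 |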
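The trends in Table 2 can be checked mechanically. The sketch below transcribes the table and tests three properties: every ratio exceeds 1, each heteroscedastic ratio exceeds its homoscedastic counterpart, and the ratios grow with n for every function family. Reading a ratio above 1 as evidence that the estimator weighted by the inverse variance–covariance matrix has the smaller MSE is an assumption based on the abstract; the exact definition of the ratio is given in the main text.

```python
# Average MSE ratios transcribed from Table 2.
# Key: (sample size n, heteroscedastic?) -> (polynomial, exponential, trigonometric)
TABLE2 = {
    (49, True): (1.8020, 1.6995, 1.6300),
    (49, False): (1.4305, 1.3417, 1.3187),
    (70, True): (2.5103, 2.5661, 2.3767),
    (70, False): (2.0837, 2.1096, 1.8931),
    (105, True): (3.4717, 3.3899, 3.2816),
    (105, False): (2.8033, 2.7800, 2.7251),
    (140, True): (4.1532, 4.0189, 4.1729),
    (140, False): (3.3165, 3.3058, 3.3510),
}
FAMILIES = ("polynomial", "exponential", "trigonometric")


def row_means(table):
    """Average the ratio over the three function families for each (n, case) row."""
    return {key: sum(vals) / len(vals) for key, vals in table.items()}


def check_trends(table):
    """Return (all ratios > 1, heteroscedastic > homoscedastic, increasing in n)."""
    sizes = sorted({n for n, _ in table})
    above_one = all(r > 1 for vals in table.values() for r in vals)
    hetero_larger = all(
        h > o
        for n in sizes
        for h, o in zip(table[(n, True)], table[(n, False)])
    )
    increasing_in_n = all(
        table[(small, case)][k] < table[(large, case)][k]
        for case in (True, False)
        for k in range(len(FAMILIES))
        for small, large in zip(sizes, sizes[1:])
    )
    return above_one, hetero_larger, increasing_in_n


if __name__ == "__main__":
    print(check_trends(TABLE2))
    for (n, hetero), mean in sorted(row_means(TABLE2).items()):
        print(n, "heteroscedastic" if hetero else "homoscedastic", round(mean, 4))
```

Running it prints `(True, True, True)`, followed by the per-row means, which range from 1.3636 (n = 49, homoscedastic) to 4.115 (n = 140, heteroscedastic).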
Table 3. Descriptive statistics for Grade 1 DHF patients.
| Variable | Mean | Min | Max |
| --- | --- | --- | --- |
| Platelets (per mm³) | 75,271.28 | 32,000 | 147,000 |
| Hematocrit (%) | 40.77 | 32.9 | 48.8 |
| Hemoglobin (g/dL) | 13.75 | 10.7 | 16.8 |
| Time (days) | 4 | 1 | 7 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Utami, T.W.; Chamidah, N.; Saifudin, T.; Lestari, B.; Aydin, D. Estimation of Biresponse Semiparametric Regression Model for Longitudinal Data Using Local Polynomial Kernel Estimator. Symmetry 2025, 17, 392. https://doi.org/10.3390/sym17030392
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.