1. Introduction
Regression analysis has long been used to identify the functional relationship between predictor (independent) variables and response (dependent) variables, and to quantify the effect of the former on the latter, by estimating the regression curve. A scatter diagram is usually drawn first to inspect the shape of the curve relating these variables, which may follow a linear, quadratic, or exponential pattern [1]. Regression analysis uses two types of models: parametric and nonparametric. Parametric regression assumes that the shape of the regression curve is known, that is, prior information about its shape is available. Nonparametric regression assumes that the shape of the curve is unknown, that is, no prior information is available. Nonparametric regression is highly flexible because the regression curve is only assumed to be smooth [2].
Nonparametric regression (NR) is a regression approach in which the form of the regression function is built from the information in the data rather than preset, meaning that the curve connecting the predictor with the response variable has no specified shape. The NR model does not impose the assumptions of parametric regression: the regression curve is only assumed to be smooth, meaning that it lies in a certain function space. Because the estimated NR curve is not influenced by the researcher’s subjectivity, the approach is highly flexible. A regression analysis that combines parametric and nonparametric components is called Semiparametric Regression (SR). The SR model involves at least two predictors. It assumes that the functional relationship between the response and some predictor variables follows a known pattern, while the shape of the relationship with the other predictor variables is unknown [3].
In some problems with multiple predictors, the relationship pattern between the response variable and some predictor variables is unknown, while the shape of the curve for the other predictor variables is known. In this situation, the SR model approach is suggested [4]. SR models can use estimators such as kernel, spline, Fourier series, and local polynomial estimators. Several studies of one-response SR have been conducted, including Asrini [5], who used the Fourier series estimator; Diana et al. [6], who used a Bayesian approach; and Mohaisen and Abdulhussein [7], who used spline estimators. Meanwhile, biresponse or multi-response regression in the NR model has been studied by Lestari et al. [8,9], who estimated the regression curve with a smoothing spline estimator.
Regression modeling deals with three types of data, namely cross-sectional, time series, and longitudinal data. Cross-sectional data consist of independent subjects, with only one observation made for each subject. Time series data are a series of observations taken periodically at regular time intervals [1]. Longitudinal data are collected from independent subjects, each observed repeatedly at different times. Longitudinal data have advantages over cross-sectional data. With the same number of subjects, the repeated measurements provide a more precise estimate of the treatment effect than cross-sectional data, because each subject contributes several observations. In addition, longitudinal studies are more powerful than cross-sectional studies because fewer subjects are needed to obtain the same statistical power [10].
The main objective of SR modeling is to obtain an estimate of the regression curve. Several estimators can be used for this purpose, including the Local Polynomial Kernel (LPK) estimator. The LPK estimator fits the regression curve locally with a polynomial whose order is chosen according to the data pattern. It assigns a weight to each observation using a kernel function, and the weights are in turn controlled by the smoothing parameter, namely the bandwidth. The LPK estimator has several advantages, including reduced asymptotic bias and good estimation results [11]. To estimate the regression curve with the LPK estimator, we optimize a Weighted Least Squares (WLS) function, with the smoothing parameter (bandwidth) chosen by the Generalized Cross-Validation (GCV) criterion. The WLS method is an extension of Ordinary Least Squares (OLS) used to overcome heteroscedasticity, which occurs when the errors of the regression model have unequal variances. WLS is a special case of Generalized Least Squares. The WLS method retains the efficiency of its estimator without losing unbiasedness and consistency [12]. The GCV method is often used to determine the optimum bandwidth in nonparametric regression because it has optimal asymptotic properties, is invariant to transformation, and does not require the population variance to be known. The optimum bandwidth is the one that minimizes the GCV [13,14].
The Biresponse Nonparametric Regression (BNR) model for longitudinal data has been widely studied, and estimates of regression model parameters have been produced. The SR approach to longitudinal data, time series, and cross-section one-response regression models has been widely used by researchers. Longitudinal data have advantages compared to other types of data. Longitudinal data are able to capture changes and patterns over time of the observed subjects. In addition, longitudinal data are able to control the heterogeneity of the subject because the subject is observed repeatedly. It provides the advantage of investigating the changes in the subject over time, which the cross-section data are unable to achieve. Previous studies for longitudinal data were limited to one-response SR models, while some real cases involve biresponse or multi-response variables. This research examines the estimation of SR models using biresponse variables on longitudinal data.
The biresponse regression model differs from the one-response regression model in that it consists of several equations and assumes a correlation between responses. Therefore, the novelty of this research is the estimation of the Biresponse Semiparametric Regression (BSR) model using the Local Polynomial Kernel (LPK) estimator while taking the estimated variance–covariance matrix into account. In the estimation process of the BSR model, we use the WLS optimization method, in which a weight matrix, the inverse of the variance–covariance matrix, is included in the Least Squares optimization function, with the bandwidth chosen by the GCV criterion. The estimation of the variance–covariance matrix is therefore essential in this research, and it is what distinguishes the proposed method from existing classical estimation methods. The LPK estimator combined with the variance–covariance matrix estimation is validated on simulated data and then applied to data on hospitalized Dengue Hemorrhagic Fever (DHF) patients. The response variables are the platelet and hematocrit levels of DHF patients during hospitalization, while the predictor variables are the examination time and the hemoglobin level. Examination time is the nonparametric component, while hemoglobin is the parametric component. The original contribution of this research is the development of a semiparametric model involving two responses for longitudinal data with the LPK estimator, and its application to DHF data in which platelets and hematocrit are correlated.
3. Materials and Methods
In order to accomplish the research goals, this section briefly explains the materials and research steps used: the Weighted Least Squares (WLS) estimator, the SR model for longitudinal data, the Local Polynomial Kernel estimator in the BNR model, the goodness of fit of the estimated model, DHF, and the research steps.
3.1. Weighted Least Squares (WLS) Estimator
According to Montgomery et al. [40], violations of the error assumptions commonly occur in regression analysis. One such violation is a nonconstant error variance, $\operatorname{Var}(\boldsymbol{\varepsilon}) \neq \sigma^{2}\mathbf{I}$, and the WLS method is used to overcome it. This method handles violations of the homoscedasticity assumption while retaining the unbiased and consistent properties of the OLS (Ordinary Least Squares) estimator. The deviation can be seen from the following regression model:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\mathbf{y}$ represents a response variable, $\mathbf{X}$ represents a predictor variable, $\boldsymbol{\beta}$ represents a parameter of the predictor variable, and $\boldsymbol{\varepsilon}$ represents a random error. Here, $E(\boldsymbol{\varepsilon}) = \mathbf{0}$, and $\operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^{2}\mathbf{W}^{-1}$, where $\mathbf{W}$ is a weighting matrix, and $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2}\mathbf{W}^{-1})$, where $N$ denotes the multivariate normal distribution. This regression model shows the deviation, namely the matrix $\mathbf{W}^{-1} \neq \mathbf{I}$. This means that the estimator for the parameter of the regression model based on Least Squares, namely $\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$, is no longer appropriate. Hence, another method is needed to obtain the estimator for the parameter of the regression model, namely the Weighted Least Squares (WLS) method, which gives $\hat{\boldsymbol{\beta}}_{WLS} = (\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{W}\mathbf{y}$.
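As an illustration of the WLS idea described in this subsection, the following minimal Python sketch computes the textbook WLS estimator by solving the normal equations $(\mathbf{X}^{\top}\mathbf{W}\mathbf{X})\boldsymbol{\beta} = \mathbf{X}^{\top}\mathbf{W}\mathbf{y}$. The function name `wls_estimate`, the simulated data, and the choice of a diagonal weight matrix are assumptions made only for this example and are not taken from the paper.

```python
import numpy as np

def wls_estimate(X, y, W):
    """Weighted least squares: solve (X' W X) beta = X' W y."""
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Illustrative heteroscedastic data (all values here are assumptions).
rng = np.random.default_rng(0)
n = 50
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])      # intercept and one predictor
sigma = 0.5 * x                           # error standard deviation grows with x
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)

W = np.diag(1.0 / sigma**2)               # inverse of the error covariance matrix
beta_wls = wls_estimate(X, y, W)
beta_ols = wls_estimate(X, y, np.eye(n))  # W = I reduces WLS to OLS
print(beta_wls, beta_ols)
```

Setting `W` to the identity matrix recovers the OLS estimator, which makes it easy to compare the two fits on the same data.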
3.2. Semiparametric Regression (SR) Model for Longitudinal Data
Longitudinal data are obtained by repeatedly observing each subject at different periods, and their characteristics differ from those of cross-sectional data. Observations in cross-sectional data are independent, whereas in longitudinal data observations on the same subject are dependent and observations on different subjects are independent [10]. When longitudinal data include multiple responses, the responses are correlated with each other, as are the observations on the same subject [38]. The SR model combines a parametric regression model and a nonparametric regression model, so it has at least two predictor variables. The SR model assumes that the functional association between the response and one predictor variable follows a known pattern, while the association with the other predictor has an unknown pattern. For example,
denotes the response variable for the
i-th subject while
and
represent the predictor variables. If the form of the function between
and
is known (for example, linear) and the form of the curve for
is unknown, the data can be modeled using the SR model as follows:
where
is the response variable at the
i-th observation;
are parametric components of the predictor variable;
are parameters of the parametric components;
is a nonparametric component of predictor variables; and
is the random error with
. The function
is a function whose curve shape is not known yet, and it is assumed to be smooth. Model (1) can be represented by the following matrix equation:
where
is the response variable with dimensions
,
is the parametric component of the predictor variable with the dimensions
,
is the parameter of the parametric component with the dimensions
and
is the nonparametric component of the predictor variable with the dimensions (
) that are assumed to be smooth and
[
38,
41].
Next, the SR model for longitudinal data is given as follows [
38]:
where
is the response variable,
is the parametric component of the regression function,
is an unknown function called the nonparametric component of the regression function that is estimated by the Local Polynomial Kernel (LPK) estimator, and
is the observation error.
Model (3) can be written as the following matrix equation:
Therefore, Equation (4) is as follows:
where
. Furthermore, if the estimation of
, namely
, is known, then the nonparametric function
can be estimated by using WLS optimization.
Therefore, Equation (2) can be expressed as follows:
Next, let
be an estimator of
obtained by using the following WLS optimization:
The WLS optimization in Equation (6) can be written as follows:
where
and
. Therefore, the value for
can be obtained by differentiating Equation (7) with respect to
, and setting the derivative to zero. Hence, we obtain the following equation:
Then, we obtain the LPK estimator,
, as follows:
Thus, based on Equation (8), the following equation can be obtained:
where
A .
Next, we determine
by minimizing the Least Squares (LS) function as follows:
Hence, we obtain
as follows:
where
,
, and
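The two-step procedure of this subsection (smooth the partial residuals to estimate the nonparametric component, then estimate the parametric coefficients by Least Squares) can be sketched in Python for a single response. This is a simplified illustration under stated assumptions: a Gaussian kernel, identity weights instead of a weighting matrix, one subject, and hypothetical helper names `lpk_hat_matrix` and `fit_partially_linear`. It is not the authors' implementation.

```python
import numpy as np

def lpk_hat_matrix(t, h, p=1):
    """Smoother matrix A of a degree-p local polynomial fit with a Gaussian kernel."""
    n = len(t)
    A = np.empty((n, n))
    for i, t0 in enumerate(t):
        d = t - t0
        T = np.vander(d, N=p + 1, increasing=True)  # columns 1, d, ..., d^p
        k = np.exp(-0.5 * (d / h) ** 2)             # kernel weights around t0
        TtK = T.T * k                               # T' K without forming K
        A[i] = np.linalg.solve(TtK @ T, TtK)[0]     # row that gives the fit at t0
    return A

def fit_partially_linear(X, t, y, h, p=1):
    """Two-step fit of y = X beta + f(t) + error."""
    A = lpk_hat_matrix(t, h, p)
    R = np.eye(len(t)) - A                          # I - A
    # Least squares on the smoothed-out data: (I - A) y ~ (I - A) X beta
    beta = np.linalg.lstsq(R @ X, R @ y, rcond=None)[0]
    f_hat = A @ (y - X @ beta)                      # smooth the partial residuals
    return beta, f_hat

# Illustrative data: linear effect of x plus a smooth effect of t (assumed values).
rng = np.random.default_rng(1)
n = 100
t = np.linspace(0.0, 1.0, n)
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = 2.0 * X[:, 0] + np.sin(2 * np.pi * t) + rng.normal(0.0, 0.1, n)
beta_hat, f_hat = fit_partially_linear(X, t, y, h=0.05)
print(beta_hat)
```

The parametric coefficient is estimated after the smooth part has been removed from both the response and the predictor, which mirrors the substitution of the LPK estimator into the model before the Least Squares step.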
3.3. Local Polynomial Kernel Estimator in Biresponse Nonparametric Regression (BNR) Model
In regression analysis, if the response variable has two correlated variables, then it is called the biresponse regression (BR) model. The BR model can be written as follows:
where
is the response variable,
is the regression function, and
is the observation error with the variance–covariance matrix of each observation,
, given by the following expression:
where
and
are two variance components, and
is the correlation coefficient [
41].
The Biresponse Nonparametric Regression (BNR) model is a regression model consisting of two response variables that are correlated with each other. The BNR model can be written as follows:
where
represents two response variables that are correlated with each other,
is the regression function, and
is the observation error with the variance–covariance matrix
, where index
i indicates the observation. Here,
and
are correlated with each other. The function
is a nonparametric biresponse curve estimated using the LPK estimator by considering the kernel function
and a weighting matrix
. The matrix
is a
weighting matrix obtained from the inverse of the variance–covariance matrix, as shown below:
where
, and
.
The function
is approximated locally with a Taylor Series of
p-degree polynomials at point
as follows [
41]:
where
,
.
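For reference, the generic form of a $p$-th degree Taylor approximation of a smooth function around a point is written out below. The symbols $t_0$ (the expansion point) and $\beta_k(t_0)$ (the local coefficients) are notation assumed for this illustration and may differ from the paper's own symbols.

```latex
f(t) \approx \sum_{k=0}^{p} \frac{f^{(k)}(t_0)}{k!}\,(t - t_0)^{k}
     = \sum_{k=0}^{p} \beta_k(t_0)\,(t - t_0)^{k},
\qquad \beta_k(t_0) = \frac{f^{(k)}(t_0)}{k!}.
```

Under this form, the fitted value of the curve at $t_0$ is the estimate of $\beta_0(t_0)$, and the higher-order coefficients estimate scaled derivatives of the curve at that point.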
Let ; ; ; ; ; ; ; ; and .
If
;
, so Equation (13) can be transformed into the following equation:
The estimation of
, namely
, is obtained by using the Weighted Least Squares (WLS) method, which minimizes the following function:
where
; and
Hence, we obtain the estimation of
, namely
as follows:
The bandwidth, or smoothing parameter, controls the balance between the smoothness of the estimated function and its fit to the data. If the bandwidth is too small, the estimated curve is rough because the estimates follow the data too closely. An excessively wide bandwidth yields an overly smooth estimated curve that tends toward the average of the response variable. Selecting an appropriate bandwidth is therefore crucial in nonparametric regression. One technique for determining the optimum bandwidth
is the following Generalized Cross-Validation (GCV) approach, as shown below [
9,
23]:
The bandwidth
that generates the lowest GCV value is the optimum bandwidth
value.
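The bandwidth search described above can be sketched in Python for a single response. The sketch assumes the standard GCV form, the mean squared residual divided by $(1 - \operatorname{tr}(\mathbf{A})/n)^2$ with $\mathbf{A}$ the smoother matrix, together with a Gaussian kernel and a small candidate grid. The paper's biresponse criterion also involves the weighting matrix, which is omitted here, and the function names are hypothetical.

```python
import numpy as np

def lpk_hat_matrix(t, h, p=1):
    """Smoother matrix A of a degree-p local polynomial fit with a Gaussian kernel."""
    n = len(t)
    A = np.empty((n, n))
    for i, t0 in enumerate(t):
        d = t - t0
        T = np.vander(d, N=p + 1, increasing=True)  # columns 1, d, ..., d^p
        k = np.exp(-0.5 * (d / h) ** 2)             # kernel weights around t0
        TtK = T.T * k
        A[i] = np.linalg.solve(TtK @ T, TtK)[0]     # row that gives the fit at t0
    return A

def gcv_score(y, A):
    """Standard GCV: mean squared residual over (1 - tr(A)/n)^2."""
    n = len(y)
    resid = y - A @ y
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2

def select_bandwidth(t, y, grid, p=1):
    """Return the grid value with the smallest GCV score, plus all scores."""
    scores = np.array([gcv_score(y, lpk_hat_matrix(t, h, p)) for h in grid])
    return grid[int(np.argmin(scores))], scores

# Illustrative data and candidate bandwidths (assumed values).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.2, t.size)
grid = np.array([0.03, 0.05, 0.1, 0.2, 0.5])
h_opt, scores = select_bandwidth(t, y, grid)
print(h_opt, scores)
```

A grid search is the simplest choice here. The same function can be looped over candidate polynomial degrees `p` when the degree is selected jointly with the bandwidth.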
3.4. Goodness of Fit of Estimated Model
The degree to which the independent variables jointly explain the dependent variable is measured by the coefficient of determination $R^{2}$. The greater the value of $R^{2}$, the greater the influence of the independent variables on the dependent variable. The formula for the coefficient of determination is given as follows [42]:
where $y_{ij}$ is the response variable in the $i$-th subject of the $j$-th observation, $\hat{y}_{ij}$ is the estimated response variable in the $i$-th subject of the $j$-th observation, and $\bar{y}$ is the average response.
The average of the squared errors between the estimated and real values is known as the MSE. The best model can be identified by finding the minimum MSE value close to zero. This means that the estimation result is close to the actual data value and can be used to calculate the prediction for the next period. The formula of the MSE is given by Equation (17).
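The two goodness-of-fit measures can be computed as in the short Python sketch below. It assumes the common definition $R^{2} = 1 - \mathrm{SSE}/\mathrm{SST}$, pooled over all subjects and observations, and the MSE as the mean of the squared residuals. The function names and the small data vector are illustrative only.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination, assuming R^2 = 1 - SSE / SST."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat) ** 2)      # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - sse / sst

def mse(y, y_hat):
    """Mean of the squared differences between actual and estimated values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean((y - y_hat) ** 2)

# Small illustrative example (assumed values).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_obs, y_fit), mse(y_obs, y_fit))
```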
3.5. Dengue Hemorrhagic Fever (DHF)
Dengue Hemorrhagic Fever (DHF) is a disease that often occurs in tropical areas, with a geographic distribution similar to that of malaria. DHF is transmitted to humans by the Aedes aegypti mosquito. The course of DHF is about 6 to 7 days, with a peak of mild fever at the end of the fever period. The diagnosis of dengue fever is generally based on the results of laboratory tests on patient blood samples [43]. The normal platelet count ranges from 150,000 to 440,000 per microliter. The laboratory diagnosis of DHF relies on thrombocytopenia, a decrease in platelets to below 100,000/mm³, and on hemoconcentration, an increase in hematocrit levels by 20% or more [44].
The data used in this study are secondary data on platelet, hematocrit, and hemoglobin levels in patients hospitalized with DHF in 2022 at the Roemani Hospital Semarang. Ten patients were included, each observed for seven days. The patients were Grade I cases, marked by symptoms of fever, followed by a positive tourniquet test and indications of viral infection with signs of bleeding [45]. This research used two response variables: platelet count and hematocrit level. The parametric predictor variable is the hemoglobin level, and the nonparametric predictor is the examination time (per day).
3.6. Research Steps
The research steps include estimating a two-response semiparametric regression (SR) model for longitudinal data using a local polynomial estimator and then applying it to platelet and hematocrit data from Grade 1 DHF patients. This research uses two response variables and two predictor variables.
Determination of the components from BSR models of longitudinal data, such as parametric component regression function and nonparametric component regression function.
Estimation of the BSR model for longitudinal data with the LPK estimator. The estimation is carried out by optimizing the WLS criterion.
Simulation of the BSR model using the LPK estimator with longitudinal data, taking the estimated variance–covariance matrix into account. The simulation covers the homoscedastic and heteroscedastic cases and uses three types of functions, namely trigonometric, exponential, and polynomial functions. The simulation was carried out with the open-source software R. Prediction accuracy is assessed with the MSE ratio; if the MSE ratio is greater than 1, the second method has a smaller MSE than the first method. Such a result indicates that estimation in the BSR model is better when the matrix W is used together with kernel weighting.
Implementation of the BSR model using the LPK estimator of longitudinal data in DHF patients. The following section will cover the application of the BSR model for the case of heteroscedasticity using two indicators for determining DHF disease, namely platelet and hematocrit levels, which are influenced by time and hemoglobin.
4. Results and Discussion
This section explains the analysis conducted to meet the research objectives. We discuss the estimation of the Biresponse Semiparametric Regression (BSR) model for longitudinal data using the LPK estimator. The estimation method in this research is to optimize the Weighted Least Squares (WLS) function. The resulting estimator is first evaluated on simulated data, taking the estimated variance–covariance matrix into account, and is then applied to DHF data of hospitalized patients.
4.1. Determining Components of BSR Model for Longitudinal Data
In the paired longitudinal data
;
, with
m indicating the number of subjects, while
n indicates the number of observations and
r indicates the number of responses. The predictor variables consist of
and
.
is the first response, and
is the second response. The relationship curve form between
and
is known, while the relationship curve form between
and
is unknown. Hence, the relationship between
and
follows a semiparametric regression (SR) model as follows:
If
, then Equation (19) can be transformed into the following matrix equation:
where
consists of two correlated response variables,
is the parametric component regression function,
is the nonparametric component regression function and
is the measurement error with a mean of
0 and variance of
. The equation for each response variable can be expressed as follows:
In the matrix notation, we may write the equation as follows:
where
Based on the nature of longitudinal data, random errors in each subject, namely
are correlated, and errors between subjects
are independent, with
stating the response variable for
. On the other hand, the errors in each response,
and
are correlated. Since
is approached by the Local Polynomial Kernel (LPK) estimator of degree
p at point
t around point
as in Equation (13), we may write it as follows:
for
.
Next, if
;
, the following equation can be obtained:
Hence, the Taylor series for
is given by the following equation:
Equation (23) can be expressed as follows:
where
, and
Also, the Taylor series for
is given by the following equation:
Equation (24) can be expressed as follows:
where
, and
Equations (23) and (24) can be expressed in matrix notation as follows:
Hence, the model in Equation (20) can be written as follows:
The model in Equation (26) is the BSR model for longitudinal data that contains two components, namely the parametric function component (
) and the nonparametric function component (i.e.,
).
4.2. Estimation of BSR Model for Longitudinal Data
In the BSR model (26), estimation of the nonparametric component parameter,
, is obtained by minimizing the Weighted Least Squares (WLS) criterion by using the following kernel weighting function.
where
, and
Next, to obtain the estimation of the nonparametric component parameter,
, firstly we assume that the estimation of the parametric component parameter,
, is known. Therefore, the Equation (19) becomes the following:
where
. Hereinafter, we may write the model in the matrix notation as follows:
where
.
The form of function
in the model (28) is unknown. So, it is estimated by using the LPK estimator and including the kernel weighting function as follows:
Next, by taking the partial derivative of Equation (30) with respect to
and setting it to zero, we obtain the following results:
Based on Equations (25) and (30), the LPK estimator for
is as follows:
Since
, and based on Equation (32), we obtain the following equation:
where
.
Next, by substituting Equation (33) into Equation (26), this step gives the following equation:
Hence, based on Equation (34), we have the following Least Squares (LS) expression:
Next, by taking the partial derivative of Equation (35) with respect to
and setting it to zero, we obtain the following results:
Based on Equations (20) and (36), we obtain the estimation of the parametric component regression function, namely
, as follows:
where
.
Furthermore, by substituting Equation (37) into Equation (33), we obtain the LPK estimator of the nonparametric component of regression function as follows:
where
.
Finally, based on Equations (37) and (38), we obtain the estimation result of the Biresponse Semiparametric Regression (BSR) model for longitudinal data using Local Polynomial Kernel (LPK) estimator as follows:
4.3. Optimal Bandwidth Selection Using GCV Method
After obtaining the parameter estimates of the Biresponse Semiparametric Regression (BSR) model using the Local Polynomial Kernel estimator, the next step is determining the values of the Generalized Cross-Validation (GCV) and MSE of the model formed. The GCV criterion is one method for selecting the optimal bandwidth. The GCV function is defined as follows:
where
.
Furthermore, the optimal bandwidth is selected using GCV by substituting candidate values of the bandwidth (h) into the GCV function in Equation (40) until the minimum GCV(h) value is obtained. In addition to GCV, other measures of model goodness, such as the MSE (Mean Square Error), are also considered.
The MSE is the average of the squared errors between the actual value and the estimated value. The best model can be determined by determining the minimum MSE value that is the closest to zero. This means that the estimated results are close to the actual values, and can be used to calculate prediction values for future periods. The formula for the MSE value of the BSR model is defined as follows:
4.4. Simulation Study
This section discusses the simulation of the implementation of the BSR model in the case of homoscedasticity and heteroscedasticity using three types of functions, namely trigonometric functions, exponential functions, and polynomial functions. The number of responses used is two responses (biresponse), and only uses two predictors, i.e., one predictor for the nonparametric component and one predictor for the parametric component.
The stages of generating data are as follows:
Generating predictor variable values as parametric components (x) with uniform distribution. The notation for uniform distribution is U(a, b), where a is the lowest value of x, and b is the highest value of x.
Generating predictor variable values as nonparametric components (t) with uniform distribution. The notation for uniform distribution is U(a, b), where a is the lowest value of t, and b is the highest value of t.
Generating error ($\varepsilon$) values based on the bivariate normal distribution with a mean of 0 (namely, $E(\varepsilon) = 0$) and a variance–covariance matrix Σ according to the specified correlation and variance, where the variance of the first response ($\sigma_{1}^{2}$) is 10, the variance of the second response ($\sigma_{2}^{2}$) is 2, and their covariance ($\sigma_{12}$) is 3.
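The data-generating stages above can be sketched in Python (the study itself used R). The variances 10 and 2 and the covariance 3 come from the text. The sample layout of 10 subjects by 7 observations, the uniform bounds 0 and 1, the seed, and the trigonometric test functions are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)
m, n_obs = 10, 7                      # subjects and repeated observations (assumed layout)
N = m * n_obs

x = rng.uniform(0.0, 1.0, size=N)     # parametric predictor, U(0, 1) assumed
t = rng.uniform(0.0, 1.0, size=N)     # nonparametric predictor, U(0, 1) assumed

# Variance-covariance matrix of the two response errors, as stated in the text.
Sigma = np.array([[10.0, 3.0],
                  [3.0, 2.0]])
eps = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=N)

W = np.linalg.inv(Sigma)              # weight matrix: inverse of Sigma
rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])  # implied correlation

# Hypothetical trigonometric scenario: linear in x, smooth in t.
y1 = 2.0 * x + np.sin(2 * np.pi * t) + eps[:, 0]
y2 = 1.0 * x + np.cos(2 * np.pi * t) + eps[:, 1]
print(round(rho, 4), y1.shape, y2.shape)
```

With these settings the implied correlation between the two error terms is about 0.67, so the two responses are clearly correlated, which is the situation in which the weighting matrix is expected to matter.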
This simulation study aims to show that, in the biresponse case, the best estimate is obtained by using the weighting matrix (W) in addition to kernel weighting. To this end, the prediction accuracy, namely the MSE, of the model estimated with the first method is compared with that of the model estimated with the second method. The first method uses only kernel weighting, without W, while the second method uses kernel weighting together with W, the inverse of the variance–covariance matrix of the response errors. Thus, the MSE ratio is the MSE of the first method divided by the MSE of the second method:
$$\text{MSE ratio} = \frac{\text{MSE}_{\text{first method}}}{\text{MSE}_{\text{second method}}}.$$
If the MSE ratio is greater than one, the second method has the smaller MSE and is therefore better than the first method, and vice versa. An MSE ratio greater than one thus indicates that estimation in the biresponse regression is better when W weighting is used together with kernel weighting. This simulation study uses two responses with three different types of functions: trigonometric, exponential, and polynomial. The scenarios vary the sample size, the type of function, and the type of case, and are presented in Table 1.
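The comparison of the two methods can be expressed compactly in Python. The sketch below assumes that the MSE ratio is the MSE of the kernel-only fit divided by the MSE of the kernel-plus-W fit, and that ratios are averaged over replications. Whether the study averages ratios or takes a ratio of averages is not stated in the text, so `average_mse_ratio` is a hypothetical helper.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error between actual and fitted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))

def mse_ratio(y, fit_kernel_only, fit_kernel_and_w):
    """MSE of the first method divided by MSE of the second method."""
    return mse(y, fit_kernel_only) / mse(y, fit_kernel_and_w)

def average_mse_ratio(replications):
    """Average ratio over an iterable of (y, fit_first, fit_second) tuples."""
    return float(np.mean([mse_ratio(*rep) for rep in replications]))

# Toy check: the first fit is off by 1.0, the second by 0.5 (assumed values).
y = np.array([1.0, 2.0, 3.0])
ratio = mse_ratio(y, y + 1.0, y + 0.5)
print(ratio)  # 4.0, so the second fit has the smaller MSE
```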
Simulations were carried out on scenarios with different sample sizes (n), i.e., 49, 70, 105, and 140 samples; types of cases, namely homoscedasticity and heteroscedasticity; and different functions (polynomial, exponential, and trigonometric). Each scenario was simulated for 30 iterations, each producing a different MSE value. The simulation was carried out with the open-source software R. The simulation results for each scenario are given in Table 2.
The results of the simulation study are shown in Table 2 and Figure 1: the average MSE ratio is greater than one. This indicates that weighting with the inverse variance–covariance matrix W, in addition to kernel weighting, is needed in biresponse regression estimation and provides better estimation accuracy than the method that uses kernel weighting only. Figure 1 also shows that the MSE ratio increases with the amount of data and that it is higher in the heteroscedastic case than in the homoscedastic case. These higher MSE ratios indicate that the inverse variance–covariance matrix W is particularly necessary when estimating local polynomial biresponse regression models for longitudinal data under heteroscedasticity and with larger amounts of data. This is in accordance with Rosopa et al. [46] and Midi et al. [47], who used weighted estimation of regression parameters when heteroscedasticity was found in the data.
4.5. Implementation of Biresponse Semiparametric Regression (BSR) Model Using LPK Estimator for Longitudinal Data of DHF Patients
Based on the simulation results of the BSR model for longitudinal data, it can be concluded that in the case of heteroscedasticity, the best estimate is obtained by using kernel weighting and W weighting. This section applies the BSR model for the heteroscedastic case to two indicators used to determine DHF, namely platelet and hematocrit levels, which are influenced by time and hemoglobin. This study uses secondary data from medical records of inpatients with DHF in 2023 at Roemani Hospital Semarang, Central Java Province, Indonesia. The number of subjects was 10 patients, and each patient was observed seven times, for a total of 70 observations. The parametric predictor is the hemoglobin level, and the nonparametric predictor is the examination time (per day). The data come from 10 patients who were hospitalized for seven days with Grade 1 DHF, which is characterized by fever, followed by a positive tourniquet test indicative of viral infection with signs of bleeding; the descriptive statistics are presented in Table 3.
The first step in implementing the Local Polynomial Kernel (LPK) estimator of the BSR model for longitudinal data is to determine the optimal bandwidth and local polynomial degree of the nonparametric component. The GCV method is used, and the bandwidth and local polynomial degree with the smallest GCV value are taken as optimal. The smallest GCV value is 72.89, obtained with an optimal bandwidth of 0.1 and a local polynomial degree of 2 for the first response, and an optimal bandwidth of 0.6 and a local polynomial degree of 3 for the second response. The next step is to estimate the parameters of the parametric and nonparametric components. With these optimal bandwidths and degrees, the BSR model estimated by the LPK estimator produces an MSE value of 0.45. Figure 2 compares the actual values with the values estimated by the LPK estimator for each response.
Based on Figure 2 and Figure 3, the estimates from the BSR model using the LPK estimator follow the pattern of the actual data. The BSR model estimated on the DHF data has an R-square value of 92.8%, which, according to Hair et al. [48], indicates a strong estimation model because the R-square value is close to 100%. Based on Figure 2, the lowest estimated platelet level, 32,477/mm³, was experienced by the third patient on the fifth day of hospitalization. Figure 3 shows that the lowest hematocrit level, 32.52%, was experienced by the sixth patient on the first day of hospitalization. The platelet (first response) and hematocrit (second response) models for the second patient on the fourth day of hospitalization, estimated with the Local Polynomial Kernel (LPK) estimator, are as follows:
Based on Equations (44) and (45) and on the regression coefficient of the hemoglobin variable, for the second patient on the fourth day of hospitalization, every 1 g/dL increase in hemoglobin corresponds to a decrease in platelet levels of 11,778.9/mm³ and an increase in hematocrit levels of 2.62%. Meanwhile, the platelet (first response) and hematocrit (second response) modeling equations for the seventh patient on the second day of hospitalization using the LPK estimator are as follows:
Based on Equations (46) and (47), for the seventh patient on the second day of hospitalization, every 1 g/dL increase in hemoglobin corresponds to a decrease in platelets of 10,637.5/mm³ and an increase in hematocrit of 2.62%.
5. Conclusions
Equation (39) is the result of applying the LPK (Local Polynomial Kernel) estimator to the Biresponse Semiparametric Regression (BSR) model for longitudinal data. The nonparametric component is estimated with WLS, while the parametric component is estimated with the LS method. The kernel weighting in the LPK estimator depends on the bandwidth, and the optimum polynomial order and bandwidth are those that give the smallest GCV value. In addition, the weighting matrix W, i.e., the inverse of the variance–covariance matrix, plays an important role in this process: the simulation study, run over several iterations, showed that using W in biresponse regression estimation gives better estimation accuracy than methods that use kernel weighting only. The estimator was applied to DHF data, with platelets and hematocrit as the response variables, hemoglobin as the parametric component predictor, and examination time as the nonparametric component predictor. The estimated model has MSE = 0.45 and R-square = 92.8%, which means that 92.8% of the variability in the dependent variables is explained by the independent variables used in this research, and the remaining 7.2% is due to variables not included in this research.
This study is limited to data with two responses and two predictor variables, one serving as the nonparametric component and the other as the parametric component, so the topic needs further study with more response variables (multi-response) and more predictor variables (multi-predictor). In addition, the parametric component is linear in its predictor, so it could be extended to a polynomial form.