A New Partially Linear Regression with an Application to the Price of Coffee Before and After the Pandemic

Ortega, Edwin M. M.; Rodrigues, Gabriela M.; Jang, Kwan Sung; Cordeiro, Gauss M.

doi:10.3390/stats9020040

¹ Department of Exact Sciences, University of São Paulo, Piracicaba 13418-900, Brazil

² Department of Statistics, Federal University of Pernambuco, Recife 50670-901, Brazil

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Stats 2026, 9(2), 40; https://doi.org/10.3390/stats9020040

Submission received: 23 February 2026 / Revised: 26 March 2026 / Accepted: 26 March 2026 / Published: 8 April 2026

Abstract

We propose a partially linear regression model to explain coffee prices before and after the COVID-19 pandemic. The new regression model accommodates both linear and nonlinear relationships between the response and the covariates. We consider the penalized quasi-likelihood method for parameter estimation and present a residual analysis for the new regression model. A simulation study examines the penalized quasi-likelihood estimators and the empirical distribution of the quantile residuals. Furthermore, the article aims to identify variables that influence changes in coffee prices, such as the prices of the Arabica and Robusta varieties, supply (expressed in millions of bags produced), global consumption, exchange rates, inflation, and the pandemic.

Keywords:

coffee price; partial regression model; penalized quasi-likelihood estimation; quantile residuals

1. Introduction

Regression analysis is an essential tool in several fields, including the agricultural sciences, for studying the relationship between dependent and independent variables. Traditionally, regression models rely on standard distributions for the response variable, but these distributions are often insufficient to capture more complex behaviors in the data, such as skewness and bimodality. The data set analyzed in this article is composed of Arabica coffee prices from January 2019 to August 2024. Coffee is a global commodity, crucial to the economies of many developing countries. In addition to being an economic engine, coffee also plays an important social role, serving as a catalyst for social interaction and a symbol of hospitality. In Brazil, coffee has long been a hugely important commodity for the nation’s economy. Other covariates are also observed in this data set, notably the price of Robusta coffee, global consumption, the occurrence of the pandemic, and the Broad National Consumer Price Index (IPCA). Figure 1 reveals the behavior of these variables from a descriptive perspective. Figure 1a presents the frequency histogram of the price, which exhibits asymmetry and bimodality. In turn, Figure 1b shows a nonlinear relationship between the price of Arabica coffee and the IPCA. The smooth curve was obtained using locally weighted regression smoothing (LOESS), which fits local polynomial regressions to subsets of the data to capture potential nonlinear patterns between the variables. This non-parametric approach allows the underlying trend to be visualized without imposing a predefined functional form on the relationship. Figure 1c reports the boxplot of the price of Arabica coffee by level of the pandemic covariate. In particular, the different behavior of the price of coffee before and after the pandemic stands out. In all cases, prices refer to a 60 kg bag of coffee and are measured in Brazilian reais.

It is important to develop new flexible distributions to explain these characteristics more robustly. Several approaches have been proposed in the literature to address departures from linearity and normality in regression models. Classical transformation-based methods, such as the Box–Cox transformation (Box and Cox, 1964 [1]), the ACE algorithm (Breiman and Friedman, 1985 [2]), and the AVAS procedure (Tibshirani, 1988 [3]), aim to improve model fit by transforming the response and explanatory variables. More recently, robust extensions of these methods have been proposed, such as the robust AVAS method (Riani et al., 2024 [4]). In parallel, partially linear and semiparametric regression models have been widely used to accommodate nonlinear covariate effects through smooth functions (Engle et al., 1986; Speckman, 1988; Ruppert et al., 2003 [5,6,7]). Rather than employing transformation-based strategies, this study adopts a flexible parametric distribution embedded in a partially linear model to directly handle skewness and other departures from normality. The generalized odd log-logistic-G (GOLL-G) family (Gleaton and Lynch, 2006 [8]) provides an analytically tractable alternative for building more flexible regression models. Some studies have been published using the GOLL-G family. For example, Vigas et al. (2025) [9] introduced a regression model with informative censoring mechanisms based on the GOLL-G family, and Rodrigues et al. (2025) [10] defined a regression model based on the odd log-logistic beta distribution for bounded (proportional) data and compared its results with machine learning methods. These studies do not consider covariates with nonlinear effects in the systematic component.

Several partially linear regression models have been proposed in the literature. For example, Cardozo et al. (2022) [11] studied additive partially linear models based on the generalized log-gamma distribution. Rodríguez et al. (2022) [12] developed robust estimation methods for partially linear regression models with monotonicity constraints. Vasconcelos et al. (2024) [13] investigated estimation and diagnostic aspects of a partially linear regression model based on an extension of the Rice distribution. More recently, Fidelis et al. (2024), Chou-Chen et al. (2024), Cho et al. (2024), and Liu et al. (2025) [14,15,16,17] proposed extensions of partially linear models under different distributional and dependence assumptions. However, these studies do not jointly capture nonlinear covariate effects and complex distributional features such as asymmetry or bimodality.

Building on these works, we construct a generalized odd log-logistic normal (GOLLN) partially linear regression model to study how the COVID-19 pandemic affected coffee prices. The main contributions of this paper are as follows: (i) identify and quantify the economic, climatic, and market variables that most influenced coffee price variations during and after the pandemic; (ii) compare the performance of the GOLLN partially linear regression model with that of other regression models to assess the benefits of the new approach; and (iii) report simulations on the behavior of the estimators and the empirical distribution of the quantile residuals (qrs), and generate valuable insights for producers, exporters, and policymakers to improve the resilience of the coffee sector.

The paper is structured as follows. Section 2 describes the data and introduces the partially linear regression model, the estimation of its parameters by the penalized quasi-likelihood method, and some simulation studies. Section 3 presents the results, which demonstrate the flexibility and usefulness of the proposed regression model. Section 4 discusses the main contributions and interpretations of the results. Section 5 concludes with some remarks.

2. Materials and Methods

2.1. Materials: Coffee Price Data Before and After the Pandemic

Coffee is one of the most widely consumed beverages in the world and plays a significant role in the global economy, especially in Brazil, which is the world’s largest producer and exporter of this commodity. Since the nineteenth century, Brazilian coffee farming has been a fundamental pillar of the national economy, generating billions of dollars in export revenue and millions of direct and indirect jobs. In addition to its economic importance, coffee has a profound influence on Brazilian culture, the arts, and daily life, as a central element in gastronomy and social relationships, and coffee culture is deeply rooted in the identity of Brazil. The coffee sector is also a fertile field for scientific research that seeks to improve the quality and productivity of output and to address challenges such as climate change and market fluctuations. Coffee is therefore not just an agricultural product but a central element in Brazil’s socioeconomic structure, reflecting its history, culture, and capacity for innovation. Given this importance, all links in the production–consumption chain benefit from knowing price trends in advance, which enables better planning of activities both for producers, who need to buy inputs and sell their production, and for industries, which need to make decisions on purchasing the raw material. However, the coffee market is highly volatile and sensitive to various economic, climatic, and market variables. Global supply and demand, weather conditions, production costs, and currency exchange rates are just some of the variables that affect coffee prices. The COVID-19 pandemic brought significant challenges, such as changing consumption patterns, climate impacts, and logistical issues, which have affected both global coffee production and commerce.

The first stage of the study involved collecting historical data on coffee prices from the Center for Advanced Studies in Applied Economics (CEPEA) of the University of São Paulo. Annual coffee production (in millions of bags) and global coffee consumption were gathered from the United States Department of Agriculture (USDA). The data cover the period from January 2019 to August 2024. Other relevant variables for the analysis, such as exchange rates, were taken from CEPEA, while inflation was obtained from the Brazilian Institute of Geography and Statistics (IBGE). Furthermore, the pandemic period was determined from information from the World Health Organization (WHO), considering the beginning and end of the pandemic according to the organization’s official data. We employ the following variables (for $i = 1, \dots, 68$):

$y_{i}$ : Arabica coffee price in Brazilian currency (Coffee is a key pillar of the Brazilian economy. Therefore, studying price trends is vital, given their profound influence on export revenue, the national trade balance, and consumer-level inflation.);
$x_{i 1}$ : Robusta coffee price in Brazilian currency (Record-breaking prices for Brazilian robusta (conilon) are reshaping the economy, profoundly affecting the cost of living, trade balance, and earnings for producers.);
$x_{i 2}$ : Global consumption (As the world’s top coffee producer and exporter, Brazil holds a dominant position in both global supply and local consumption.);
$x_{i 3}$ : Presence of the pandemic (0 = beforehand, 1 = afterward) (The COVID-19 pandemic has profoundly influenced the coffee market overall, altering production, consumption patterns, and global supply chains.);
$t_{i}$ : Broad National Consumer Price Index (IPCA) (The IPCA directly influences the price of coffee, serving as the primary measure of the increased costs consumers face when purchasing the product in supermarkets).

The IPCA has been produced by IBGE since December 1979. Since November 1985, it has been used as the country’s official index for adjusting wages, rents, and the interest rate on passbook savings accounts, in addition to other monetary assets.

2.2. Methods: New Partial Regression Model

We take the parent model as the normal distribution $N(\mu, \sigma^{2})$, where $\mu \in \mathbb{R}$ and $\sigma > 0$. Let $\phi(\cdot)$ and $\Phi(\cdot)$ be the probability density function (pdf) and the cumulative distribution function (cdf) of the standard normal, respectively.

Following Cordeiro et al. (2017) [18], the GOLLN cdf with two positive shape parameters ($\alpha$ and $\theta$) is

$$
F(y; \alpha, \theta, \tau) = \frac{\Phi^{\alpha\theta}\left(\frac{y - \mu}{\sigma}\right)}{\Phi^{\alpha\theta}\left(\frac{y - \mu}{\sigma}\right) + \left[1 - \Phi^{\theta}\left(\frac{y - \mu}{\sigma}\right)\right]^{\alpha}}, \tag{1}
$$

where $\tau = (\mu, \sigma^{2})$ and $y \in \mathbb{R}$.

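Let $Y \sim \mathrm{GOLLN}(\alpha, \theta, \mu, \sigma)$ have cdf (1). The pdf of $Y$ follows by differentiating (1):

$$
f(y; \alpha, \theta, \tau) = \frac{\alpha\theta\, \phi\left(\frac{y - \mu}{\sigma}\right) \Phi^{\alpha\theta - 1}\left(\frac{y - \mu}{\sigma}\right) \left[1 - \Phi^{\theta}\left(\frac{y - \mu}{\sigma}\right)\right]^{\alpha - 1}}{\sigma \left\{\Phi^{\alpha\theta}\left(\frac{y - \mu}{\sigma}\right) + \left[1 - \Phi^{\theta}\left(\frac{y - \mu}{\sigma}\right)\right]^{\alpha}\right\}^{2}}. \tag{2}
$$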

The plots in Figure 2 reveal the bimodality and asymmetry of the density of $Y$. We emphasize that when the shape parameter $\alpha$ lies in the range $(0, 1)$, the GOLLN and OLLN distributions can model bimodality in the data.

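The quantile function (qf) of $Y$ is

$$
y = Q(u) = Q_{N}\left(\left[\frac{\left(\frac{u}{1 - u}\right)^{1/\alpha}}{1 + \left(\frac{u}{1 - u}\right)^{1/\alpha}}\right]^{1/\theta}\right), \tag{3}
$$

where $Q_{N}(u) = \mu + \sigma\, \Phi^{-1}(u)$ is the qf of the $N(\mu, \sigma^{2})$ parent and $0 < u < 1$. Thus, a random sample from the GOLLN distribution can be generated from (3) by evaluating $Q(u)$ at uniform random numbers $u$ on $(0, 1)$.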
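As a minimal sketch of Equations (1) to (3), the functions below evaluate the GOLLN cdf, pdf and qf and draw a sample by inverse transform. The function names, the use of Python's `statistics.NormalDist` for $\phi$, $\Phi$ and $\Phi^{-1}$, the location-scale form $\mu + \sigma\Phi^{-1}(\cdot)$ of the parent qf, and the numerical clamp are illustrative assumptions; the analysis in this paper was carried out with gamlss in R, not with this code.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf, cdf and inv_cdf


def golln_cdf(y, alpha, theta, mu=0.0, sigma=1.0):
    """GOLLN cdf, Equation (1)."""
    p = _STD.cdf((y - mu) / sigma)
    num = p ** (alpha * theta)
    return num / (num + (1.0 - p ** theta) ** alpha)


def golln_pdf(y, alpha, theta, mu=0.0, sigma=1.0):
    """GOLLN pdf, Equation (2)."""
    z = (y - mu) / sigma
    p = _STD.cdf(z)
    num = (alpha * theta * _STD.pdf(z)
           * p ** (alpha * theta - 1.0)
           * (1.0 - p ** theta) ** (alpha - 1.0))
    den = sigma * (p ** (alpha * theta) + (1.0 - p ** theta) ** alpha) ** 2
    return num / den


def golln_quantile(u, alpha, theta, mu=0.0, sigma=1.0):
    """GOLLN qf, Equation (3), for 0 < u < 1 and moderate parameter values."""
    w = (u / (1.0 - u)) ** (1.0 / alpha)
    p = (w / (1.0 + w)) ** (1.0 / theta)
    # Assumption: clamp p away from 0 and 1 so that inv_cdf stays defined
    # when floating-point rounding pushes p onto the boundary.
    p = min(max(p, 1e-300), 1.0 - 1e-15)
    return mu + sigma * _STD.inv_cdf(p)


def golln_sample(n, alpha, theta, mu=0.0, sigma=1.0, seed=None):
    """Inverse-transform sampling: evaluate the qf at uniform draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # the qf is defined on the open interval (0, 1)
            draws.append(golln_quantile(u, alpha, theta, mu, sigma))
    return draws
```

For example, `golln_sample(68, alpha=0.5, theta=1.2, mu=2.0, sigma=0.7, seed=1)` returns 68 draws; a value of `alpha` in $(0, 1)$ is the setting under which bimodality can arise.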

2.3. The GOLLN Partially Linear Regression

We construct the GOLLN partially linear regression (GOLLNPLR) using a penalized cubic smoothing spline. The parameter $\mu_{i}$ is a function of a vector of known explanatory variables $x_{i} = (x_{i1}, \dots, x_{ip})^{\top}$ (linear effects) and of the covariate $t = (t_{1}, \dots, t_{n})^{\top}$ (nonlinear effects), namely,

$$
\mu_{i} = x_{i}^{\top} \beta + g(t_{i}), \quad i = 1, \dots, n, \tag{4}
$$

where $\beta = (\beta_{1}, \dots, \beta_{p})^{\top}$ is the unknown parameter vector and $g(\cdot)$ is a smooth function of $t_{i}$. A penalty based on the second-order derivative of $g(\cdot)$ (see O’Sullivan, 1986 [19]) is adopted for Equation (4).

Some new special cases of the GOLLNPLR are: OLLNPLR ($\theta = 1$), exponentiated normal PLR ($\alpha = 1$) and NPLR ($\alpha = \theta = 1$).

The parameter vector is $\eta = (\alpha, \theta, \beta^{\top}, \sigma)^{\top}$ and the smoothing parameter is $\lambda > 0$. Let $g(t_{i})$ be a twice-differentiable cubic smoothing spline, where $t_{1}^{0} < \dots < t_{q}^{0}$ are the ordered knots and $q$ is the number of knots, which is controlled non-parametrically.

The penalty can be expressed in matrix notation (Green and Silverman, 1993 [20]). Let $u_{i}$ be the distance between knots $i$ and $i + 1$, say $u_{i} = t_{i+1}^{0} - t_{i}^{0}$ ($i = 1, \dots, q - 1$). We define the $q \times (q - 2)$ tridiagonal matrix $C$ whose elements $c_{ij}$ ($i = 1, \dots, q$ and $j = 2, \dots, q - 1$) are: $c_{ij} = 0$ for $|i - j| \geq 2$, $c_{j-1,j} = u_{j-1}^{-1}$, $c_{jj} = -u_{j-1}^{-1} - u_{j}^{-1}$, and $c_{j+1,j} = u_{j}^{-1}$.

Next, the elements $k_{ij}$ of the symmetric matrix $K$ of order $(q - 2)$ ($i, j = 2, \dots, q - 1$) are: $k_{ij} = 0$ for $|i - j| \geq 2$, $k_{ii} = (u_{i-1} + u_{i}) / 3$, and $k_{i,i+1} = k_{i+1,i} = u_{i} / 6$. Let $D = C K^{-1} C^{\top}$, which is a $q \times q$ positive semidefinite matrix (its rank is at most $q - 2$).
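The construction of $C$, $K$ and the quadratic form $g^{\top} D g$ can be sketched as follows. This is an illustrative pure-Python version under our own naming (`band_matrices`, `solve_tridiagonal`, `roughness_penalty` are hypothetical helpers); it avoids forming $K^{-1}$ explicitly by solving the tridiagonal system $K x = C^{\top} g$, and it assumes at least three strictly increasing knots.

```python
def band_matrices(knots):
    """Return (C, K) as dense lists of lists for ordered knots (q >= 3)."""
    q = len(knots)
    u = [knots[i + 1] - knots[i] for i in range(q - 1)]  # u[i-1] holds u_i
    C = [[0.0] * (q - 2) for _ in range(q)]
    K = [[0.0] * (q - 2) for _ in range(q - 2)]
    for j in range(2, q):            # j = 2, ..., q-1 (1-based, as in the text)
        col = j - 2                  # 0-based column index
        C[j - 2][col] = 1.0 / u[j - 2]                     # c_{j-1,j}
        C[j - 1][col] = -1.0 / u[j - 2] - 1.0 / u[j - 1]   # c_{jj}
        C[j][col] = 1.0 / u[j - 1]                         # c_{j+1,j}
        K[col][col] = (u[j - 2] + u[j - 1]) / 3.0          # k_{jj}
        if j < q - 1:
            K[col][col + 1] = K[col + 1][col] = u[j - 1] / 6.0  # k_{j,j+1}
    return C, K


def solve_tridiagonal(K, v):
    """Solve K x = v for symmetric, diagonally dominant tridiagonal K."""
    m = len(v)
    diag = [K[i][i] for i in range(m)]
    off = [K[i][i + 1] for i in range(m - 1)]
    rhs = list(v)
    for i in range(1, m):            # forward elimination
        f = off[i - 1] / diag[i - 1]
        diag[i] -= f * off[i - 1]
        rhs[i] -= f * rhs[i - 1]
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):   # back substitution
        x[i] = (rhs[i] - off[i] * x[i + 1]) / diag[i]
    return x


def roughness_penalty(knots, g):
    """Compute g' D g with D = C K^{-1} C', where g holds g at the knots."""
    C, K = band_matrices(knots)
    q = len(knots)
    v = [sum(C[i][j] * g[i] for i in range(q)) for j in range(q - 2)]  # C' g
    x = solve_tridiagonal(K, v)                                        # K^{-1} C' g
    return sum(vi * xi for vi, xi in zip(v, x))
```

A linear $g$ gives a zero penalty, since every column of $C$ takes a second divided difference of $g$; this is the sense in which $D$ is only positive semidefinite.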

The $n$ independent observations $y_{i}$ have pdf $f(y_{i}; \eta)$ given in (2). The parameter vector $\eta$ and the nonlinear effects $g = (g(t_{1}^{0}), \dots, g(t_{q}^{0}))^{\top}$ are estimated by maximizing the penalized quasi-likelihood (PQL) function:

$$
\begin{aligned}
l_{p}(\eta, g) = {} & n \log(\alpha\theta) + n \log(\sigma^{-1}) + \sum_{i=1}^{n} \log \phi(z_{i}) + (\alpha\theta - 1) \sum_{i=1}^{n} \log[\Phi(z_{i})] + (\alpha - 1) \sum_{i=1}^{n} \log[1 - \Phi^{\theta}(z_{i})] \\
& - 2 \sum_{i=1}^{n} \log\left\{\Phi^{\alpha\theta}(z_{i}) + [1 - \Phi^{\theta}(z_{i})]^{\alpha}\right\} - \frac{\lambda}{2}\, g^{\top} D g,
\end{aligned} \tag{5}
$$

where $z_{i} = (y_{i} - \mu_{i}) / \sigma$.
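To make the objective concrete, the sketch below evaluates the penalized log-likelihood as the sum of the logarithms of the pdf (2) over the observations minus the roughness penalty $\frac{\lambda}{2} g^{\top} D g$. The function names and argument layout are assumptions for illustration; `D` is passed as a dense list of lists, and no optimization is performed here.

```python
from math import log
from statistics import NormalDist

_STD = NormalDist()  # standard normal pdf and cdf


def golln_loglik(y, mu, sigma, alpha, theta):
    """Sum over i of log f(y_i), with f the GOLLN pdf in Equation (2)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        z = (yi - mi) / sigma
        p = _STD.cdf(z)
        total += (log(alpha * theta) - log(sigma) + log(_STD.pdf(z))
                  + (alpha * theta - 1.0) * log(p)
                  + (alpha - 1.0) * log(1.0 - p ** theta)
                  - 2.0 * log(p ** (alpha * theta) + (1.0 - p ** theta) ** alpha))
    return total


def penalized_loglik(y, mu, sigma, alpha, theta, g, D, lam):
    """Log-likelihood minus (lambda / 2) * g' D g."""
    q = len(g)
    quad = sum(g[i] * D[i][j] * g[j] for i in range(q) for j in range(q))
    return golln_loglik(y, mu, sigma, alpha, theta) - 0.5 * lam * quad
```

With $\alpha = \theta = 1$ and $\lambda = 0$ the value reduces to the ordinary normal log-likelihood, which gives a quick sanity check of the implementation.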

Maximum penalized quasi-likelihood estimates (MPLEs) are determined as the parameter values that maximize (5). The nonlinear component $g(\cdot)$ is approximated by a cubic smoothing spline with a second-order difference penalty, which controls the smoothness of the function through the smoothing parameter $\lambda$. This penalty induces a trade-off between goodness-of-fit and smoothness, leading to stable estimation of nonlinear effects. The resulting model can be estimated within the penalized quasi-likelihood framework, yielding point estimates and standard errors for all model parameters.

Computational Implementation

Model estimation was carried out using the gamlss framework in R ([21]; version 4.3.1). In this context, the smooth function $g(\cdot)$ is implemented using the function pb(), which represents a penalized B-spline (P-spline) with a second-order difference penalty. The smoothing parameter is automatically selected within the penalized quasi-likelihood procedure using the RS algorithm (Lee et al., 2006; Rigby and Stasinopoulos, 2014 [22,23]). This implementation provides a computationally efficient representation of the theoretical spline-based formulation described above.

2.4. Residual Analysis

The qrs (Dunn and Smyth, 1996 [24]) for the new regression model are

$$
qr_{i} = \Phi^{-1}\left\{\frac{\Phi^{\hat{\alpha}\hat{\theta}}\left(\frac{y_{i} - \hat{\mu}_{i}}{\hat{\sigma}}\right)}{\Phi^{\hat{\alpha}\hat{\theta}}\left(\frac{y_{i} - \hat{\mu}_{i}}{\hat{\sigma}}\right) + \left[1 - \Phi^{\hat{\theta}}\left(\frac{y_{i} - \hat{\mu}_{i}}{\hat{\sigma}}\right)\right]^{\hat{\alpha}}}\right\}, \tag{6}
$$

where $\hat{\mu}_{i} = x_{i}^{\top} \hat{\beta} + \hat{g}(t_{i})$, in agreement with Equation (4).
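Equation (6) amounts to passing each observation through the fitted GOLLN cdf and then through the standard normal quantile function. A minimal sketch, assuming the fitted values $\hat{\mu}_{i}$ and the estimates $\hat{\sigma}$, $\hat{\alpha}$, $\hat{\theta}$ are already available (the function name and arguments are hypothetical):

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal cdf and inverse cdf


def quantile_residuals(y, mu_hat, sigma_hat, alpha_hat, theta_hat):
    """Quantile residuals qr_i = Phi^{-1}(F_hat(y_i)), Equation (6)."""
    residuals = []
    for yi, mi in zip(y, mu_hat):
        p = _STD.cdf((yi - mi) / sigma_hat)
        num = p ** (alpha_hat * theta_hat)
        cdf_value = num / (num + (1.0 - p ** theta_hat) ** alpha_hat)
        # inv_cdf requires 0 < cdf_value < 1; extreme observations whose
        # fitted cdf rounds to 0 or 1 would need separate handling.
        residuals.append(_STD.inv_cdf(cdf_value))
    return residuals
```

If the model is adequate, these residuals should behave approximately like a standard normal sample, which is what the normal probability plot with envelope checks. For $\hat{\alpha} = \hat{\theta} = 1$ they reduce to the standardized residuals $(y_{i} - \hat{\mu}_{i}) / \hat{\sigma}$.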

Atkinson (1985) proposed building a simulated envelope to aid the interpretation of the probability plot of the residuals. The construction of envelope bands for residuals of this kind is well established in the literature.

2.5. Simulation Study

A Monte Carlo simulation study was conducted to assess the finite-sample performance of the MPLEs in the proposed regression model, focusing on the biases and mean squared errors (MSEs) across various sample sizes, as well as on the empirical distribution of the corresponding qrs.

We generate $r = 1000$ replicates using the gamlss package with two covariates, $x_{1} \sim \mathrm{Binomial}(1, 0.5)$ and $t \sim \mathrm{Uniform}(0, 1)$, under two scenarios:

Scenario 1 (cubic-like shape for the nonlinear part): $\mu_{i} = \beta_{1} x_{1i} + g(t_{i})$, where $g(t_{i}) = 2 t_{i} + \sin(2 \pi t_{i})$. The true parameters are $\beta_{1} = 0.4$, $\log(\sigma) = -1.6$, $\log(\alpha) = 0.5$ and $\log(\theta) = 0.6$.
Scenario 2 (quadratic-like shape for the nonlinear part): $\mu_{i} = \beta_{1} x_{1i} + g(t_{i})$, where $g(t_{i}) = \sin(\pi t_{i})$. The true parameters are $\beta_{1} = 0.6$, $\log(\sigma) = -1.3$, $\log(\alpha) = 0.4$ and $\log(\theta) = 0.5$.

The algorithm has the following steps: (i) generate $x_{1i} \sim \mathrm{Binomial}(1, 0.5)$ and $t_{i} \sim \mathrm{Uniform}(0, 1)$; (ii) compute $\mu_{i}$ for each scenario; (iii) generate $u_{i} \sim \mathrm{Uniform}(0, 1)$ and obtain the responses $y_{i}$ from the quantile function in Equation (3); (iv) calculate the mean estimates (MEs), biases, MSEs and qrs. The sample sizes ($n = 50$, $150$ and $450$) were chosen to represent small, moderate, and large sample scenarios, respectively. This range allows for a systematic assessment of the finite-sample and asymptotic behavior of the proposed estimators, including bias, mean squared error, and confidence interval coverage, and reflects sample sizes commonly encountered in empirical applications.
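Steps (i) to (iii) can be sketched as follows for Scenario 1. The function name and the returned tuple layout are hypothetical, the responses are obtained from the quantile function (3) with the parent qf taken as $\mu_{i} + \sigma \Phi^{-1}(\cdot)$, and the model fitting required for step (iv), which the study performs with gamlss in R, is not reproduced here.

```python
import math
import random
from statistics import NormalDist


def simulate_scenario1(n, seed=None):
    """Generate n triples (y_i, x_1i, t_i) under Scenario 1."""
    rng = random.Random(seed)
    std = NormalDist()
    beta1 = 0.4
    sigma = math.exp(-1.6)   # log(sigma) = -1.6
    alpha = math.exp(0.5)    # log(alpha) = 0.5
    theta = math.exp(0.6)    # log(theta) = 0.6
    data = []
    for _ in range(n):
        # (i) covariates
        x1 = 1 if rng.random() < 0.5 else 0
        t = rng.random()
        # (ii) systematic component with g(t) = 2 t + sin(2 pi t)
        mu = beta1 * x1 + 2.0 * t + math.sin(2.0 * math.pi * t)
        # (iii) uniform draw on (0, 1), mapped through the GOLLN qf
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        w = (u / (1.0 - u)) ** (1.0 / alpha)
        p = (w / (1.0 + w)) ** (1.0 / theta)
        y = mu + sigma * std.inv_cdf(p)
        data.append((y, x1, t))
    return data
```

Calling `simulate_scenario1(50, seed=2024)` yields one replicate for the smallest sample size; repeating the call with different seeds gives the 1000 replicates to which the model would then be fitted.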

Table 1 shows that the biases, MSEs, mean standard errors (Mean SEs), and average confidence intervals (ACIs) decrease systematically as n increases. This highlights the gain in precision and the consistency of the estimators. Additionally, the empirical coverage probabilities (CPs) of the confidence intervals approach the nominal level as n increases, especially for moderate and large samples, reinforcing the adequacy of the inferential procedure adopted. Figure 3 reveals that the estimated smooth curves approach the true curve as n increases, so the estimator of the non-parametric part also appears to be consistent.
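The summary measures of this kind can be computed from the replicate-level output as in the sketch below. It is an illustration under assumptions that the text does not state: confidence intervals are taken to be Wald intervals (estimate plus or minus 1.96 standard errors), and the ACI is interpreted as the average interval width. The function name and the dictionary keys are hypothetical.

```python
from statistics import fmean


def mc_summary(estimates, std_errors, true_value, z=1.96):
    """Monte Carlo summaries for one parameter across replicates."""
    me = fmean(estimates)                                   # mean estimate
    bias = me - true_value
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    mean_se = fmean(std_errors)
    # Assumed Wald interval: estimate +/- z * SE
    widths = [2.0 * z * s for s in std_errors]
    covered = [abs(e - true_value) <= z * s
               for e, s in zip(estimates, std_errors)]
    return {
        "ME": me,
        "bias": bias,
        "MSE": mse,
        "mean_SE": mean_se,
        "ACI_width": fmean(widths),
        "CP": sum(covered) / len(covered),
    }
```

For a consistent estimator, `bias`, `MSE`, `mean_SE` and `ACI_width` should shrink as n grows, while `CP` should approach the nominal 0.95.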

The regression model can accommodate nonlinear relationships without specifying their functional forms. Figure A1 in Appendix A displays superimposed QQ plots of the residuals from all simulation replicates. The close alignment of the empirical curves with the standard normal reference line shows that these residuals approximately follow this distribution, particularly for large n.

3. Results

We begin this analysis by presenting the scatterplots (see Figure 1b) of the response variable in relation to the covariates considered in the study with their respective approximate nonparametric curves. As illustrated in Figure 1b, the relationship between the price of Arabica coffee and the IPCA is nonlinear. Thus, we utilize the GOLLNPLR to examine which of these nonlinear effects are statistically significant. Figure 1c highlights the disparate Arabica coffee price behavior before and after the pandemic, based on the pandemic covariate boxplot.

We compare the GOLLN distribution with three special models to identify which distribution is most appropriate for modeling the coffee prices. Table 2 reports the MLEs with their standard errors (SEs) in parentheses, together with the Global Deviance (GD), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) for the four distributions. These results indicate that the GOLLN and OLLN distributions can be chosen as the best models for these data. Table 2 also reveals that the estimates of $\alpha$ are in the range $(0, 1)$, indicating the presence of bimodality in the data.
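For reference, the three criteria are linked by simple formulas: GD is minus twice the maximized log-likelihood, AIC adds 2 per degree of freedom, and BIC adds $\log(n)$ per degree of freedom. The sketch below states them in code; the function names are ours and the inputs in the usage note are hypothetical, not values from Table 2.

```python
from math import log


def global_deviance(loglik):
    """GD = -2 * maximized log-likelihood."""
    return -2.0 * loglik


def aic(gd, df):
    """AIC = GD + 2 * df, with df the (effective) number of parameters."""
    return gd + 2.0 * df


def bic(gd, df, n):
    """BIC = GD + log(n) * df, with n the sample size."""
    return gd + log(n) * df
```

For instance, with a hypothetical log-likelihood of -50, four parameters and $n = 68$, `aic(global_deviance(-50.0), 4)` gives 108. Smaller values indicate a better trade-off between fit and complexity; for models with smooth terms, `df` would be the total effective degrees of freedom.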

Figure 4 displays the histogram of the data and the estimated GOLLN, OLLN, ExpN and normal densities. The estimated GOLLN and OLLN densities are bimodal and fit the data well.

3.1. Discussion of the GOLLNPLR

We consider the systematic component

$$
\mu_{i} = \beta_{0} + \beta_{1} x_{i1} + \beta_{2} x_{i2} + \beta_{3} x_{i3} + g(t_{i}), \quad i = 1, \dots, 68.
$$

The three statistics for the PLR models are listed in Table 3. According to them, the OLLNPLR is the most plausible model for explaining Arabica coffee prices.

Likelihood ratio (LR) statistics indicate that the GOLLN model is not significantly different from the OLLN sub-model (LR = 0.28, $p = 0.61$), while it provides a significantly better fit than the ExpN (LR = 7.93, $p < 0.001$) and Normal (LR = 11.07, $p < 0.001$) models, confirming the superiority of GOLLN and OLLN over the remaining alternatives. Table 4 presents the MPLEs along with their SEs, p-values, and 95% confidence intervals (CIs) for the fitted OLLNPLR model. The covariates $x_{1}$, $x_{2}$ and $x_{3}$ are statistically significant at the 5% level.

3.2. Residual Analysis

Figure 5a presents the qrs of the fitted OLLNPLR model, which are randomly scattered within $(-3, 3)$. Hence, the model provides a good fit. The normal probability plot of the qrs with the simulated envelope in Figure 5b, constructed at the 5% significance level, also supports the adequacy of the fitted model.

4. Discussion

4.1. Linear Effects

Table 4 shows that as the price of Robusta coffee increases, the price of Arabica coffee also increases significantly.
There is a small but significant drop in global consumption (as expected) when the price of coffee increases.

4.2. Pandemic Effect

The price of Arabica coffee differs significantly between the levels before and after the pandemic.
Figure 6b displays the reliability curves of the price of Arabica coffee for the two levels of the pandemic covariate. The figure indicates that the model fits well at both levels and that the price distributions differ between them.

4.3. Nonlinear Effects

The IPCA values ($t_{i}$) are shown on the horizontal axis of Figure 6a. The vertical axis shows the penalized smoother for the fitted prices, which reveals the nonlinear effect of the IPCA. Thus, the use of penalized smoothers is relevant. Moreover, the penalized smoother for the covariate $t_{i}$ shows an initial period of increasing Arabica coffee prices; from an IPCA value of approximately 6250, a decrease in the price is evident, and from an IPCA value of about 6700 the price appears to increase slightly again.

4.4. Practical Implications

As the world’s leading producer and exporter, Brazil holds a central position in the global coffee industry, with the commodity serving as a cornerstone of its national economy since the 19th century. The sector generates billions in export revenue and creates millions of jobs, acting as a fundamental economic pillar. Beyond commerce, coffee is deeply ingrained in Brazilian culture and daily life, serving as a vital component of social gatherings and gastronomy. However, the coffee market is highly volatile and sensitive to various economic, climatic, and market variables. Global supply and demand, weather conditions, production costs, and exchange rates are just some of the variables that affect coffee prices. The COVID-19 pandemic brought significant challenges, altering consumption patterns and creating logistical issues that, together with climate impacts, affected both global coffee production and trade. Using a GOLLN-based partially linear regression model, this study analyzes pre- and post-pandemic coffee prices. The objective is to identify and measure the impact of influential variables, including production costs, supply and demand, weather conditions, and exchange rates. An analysis of pre- and post-pandemic data is conducted to evaluate the structural impacts of recent external shocks on the coffee sector. Based on this, the study identifies key insights aimed at enhancing the strategic planning and operational resilience of producers, exporters, and policymakers in navigating future crises.

5. Conclusions

We proposed a new partially linear regression model based on the generalized odd log-logistic normal (GOLLN) distribution to study the behavior of Arabica coffee prices, considering several factors, including the absence and presence of the pandemic. The data set showed a nonlinear relationship between the price of Arabica coffee and the Broad National Consumer Price Index (IPCA). Furthermore, there is a linear relationship between the price of Arabica coffee and the covariates Robusta coffee price, global consumption, and the presence of the pandemic. In general, we presented different comparisons between all the variables involved in the research, with their respective interpretations. A simulation study indicated that the new GOLLN partially linear regression model is adequate to capture different nonlinear and linear forms and yields consistent estimates. This regression model can also be used to model response variables from other research fields in which the variables exhibit a nonlinear relationship. The model can also be useful in cases where the response variable is bimodal and/or asymmetric. As future work, we suggest comparing the new GOLLN partially linear regression model with machine learning methods in terms of prediction, considering data sets with a larger number of observations and more covariates. Lastly, time series regression models can be implemented. One approach is to use partially linear regression models that incorporate the GOLLN distribution with autoregressive errors to capture potential serial correlation in the data.

Author Contributions

Conceptualization, E.M.M.O., G.M.R., K.S.J. and G.M.C.; methodology, E.M.M.O., G.M.R., K.S.J. and G.M.C.; software, E.M.M.O., G.M.R., K.S.J. and G.M.C.; validation, E.M.M.O., G.M.R., K.S.J. and G.M.C.; formal analysis, E.M.M.O., G.M.R., K.S.J. and G.M.C.; investigation, E.M.M.O., G.M.R., K.S.J. and G.M.C.; resources, E.M.M.O., G.M.R., K.S.J. and G.M.C.; data curation, E.M.M.O., G.M.R., K.S.J. and G.M.C.; writing—original draft preparation, E.M.M.O., G.M.R., K.S.J. and G.M.C.; writing—review and editing, E.M.M.O., G.M.R., K.S.J. and G.M.C.; visualization, E.M.M.O., G.M.R., K.S.J. and G.M.C.; supervision, E.M.M.O., G.M.R., K.S.J. and G.M.C.; project administration, E.M.M.O., G.M.R., K.S.J. and G.M.C.; funding acquisition, E.M.M.O., G.M.R., K.S.J. and G.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by CNPq, CAPES and FAPESP (Grant 2024/10798-4).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data is available at: https://github.com/gabrielamrodrigues/Coffee_price (accessed on 18 March 2026).

Acknowledgments

The authors acknowledge financial support from CNPq, CAPES, Brazil. Edwin M. M. Ortega would like to acknowledge the support of the FAPESP—Grant 2024/10798-4.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

IPCA	National consumer price index
GOLL-G	Generalized odd log-logistic-G
GOLLN	Generalized odd log-logistic normal
OLLN	Odd log-logistic normal
ExpN	Exponentiated Normal
qf	Quantile function
GOLLNPLR	GOLLN partially linear regression
OLLNPLR	OLLN partially linear regression
ExpNPLR	Exponentiated normal partially linear regression
NPLR	Normal partially linear regression
MPLEs	Maximum penalized quasi-likelihood estimates
qrs	Quantile residuals
MSEs	Mean squared errors
Mean SEs	Mean standard errors
ACIs	Average confidence intervals
CPs	Coverage probabilities
CEPEA	Center for Advanced Studies in Applied Economics
GD	Global Deviance
AIC	Akaike Information Criterion
BIC	Bayesian Information Criterion

Appendix A

Figure A1. Normal QQ plots of the quantile residuals (qrs) from all simulation replicates. The solid black line represents the theoretical reference line for the standard normal distribution, while the shaded region corresponds to the QQ plots obtained from the 1000 simulation replicates, illustrating variability. Panels (a–c) refer to Scenario 1 with $n = 50$, $n = 150$, and $n = 450$, respectively, and panels (d–f) refer to Scenario 2 with the same sample sizes.


References

Box, G.E.P.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. Ser. B 1964, 26, 211–243.
Breiman, L.; Friedman, J.H. Estimating optimal transformations for multiple regression and correlation. J. Am. Stat. Assoc. 1985, 80, 580–598.
Tibshirani, R. Estimating transformations for regression via additivity and variance stabilization. J. Am. Stat. Assoc. 1988, 83, 394–405.
Riani, M.; Atkinson, A.C.; Corbellini, A. Robust transformations for multiple regression via additivity and variance stabilization. J. Comput. Graph. Stat. 2024, 33, 85–100.
Engle, R.F.; Granger, C.W.J.; Rice, J.; Weiss, A. Semiparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc. 1986, 81, 310–320.
Speckman, P. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B 1988, 50, 413–436.
Ruppert, D.; Wand, M.P.; Carroll, R.J. Semiparametric Regression; Cambridge University Press: Cambridge, UK, 2003.
Gleaton, J.U.; Lynch, J.D. Properties of generalized log-logistic families of lifetime distributions. J. Probab. Stat. Sci. 2006, 4, 51–64.
Vigas, V.P.; Ortega, E.M.M.; Cordeiro, G.M.; Silva, G.L.; Santos, J.P.C. A new regression model under informative censoring mechanism with applications. Commun. Stat. Theory Methods 2025, 54, 2578–2598.
Rodrigues, G.M.; Cordeiro, G.M.; Ortega, E.M.M.; Vila, R. New regression model and machine learning for fitting proportional data with application. Statistics 2025, 59, 498–516.
Cardozo, C.A.; Paula, G.A.; Vanegas, L.H. Generalized log-gamma additive partial linear models with P-spline smoothing. Stat. Pap. 2022, 63, 1953–1978.
Rodríguez, D.; Valdora, M.; Vena, P. Robust estimation in partially linear regression models with monotonicity constraints. Commun. Stat. Simul. Comput. 2022, 51, 2039–2052.
Vasconcelos, J.C.S.; Ortega, E.M.M.; Cordeiro, G.M.; Vasconcelos, J.; Biaggioni, M.A.M. Estimation and diagnostic for a partially linear regression based on an extension of the Rice distribution. Revstat Stat. J. 2024, 22, 433–454.
Fidelis, C.R.; Ortega, E.M.M.; Prataviera, F.; Vila, R.; Cordeiro, G.M. Reparametrized Generalized Gamma Partially Linear Regression with Application to Breast Cancer data. J. Appl. Stat. 2024, 51, 3248–3265.
Chou-Chen, S.W.; Oliveira, R.A.; Raicher, I.; Paula, G.A. Additive partial linear models with autoregressive symmetric errors and its application to the hospitalizations for respiratory diseases. Stat. Pap. 2024, 65, 5145–5166.
Cho, S.; Jeon, J.M.; Kim, D.; Yu, K.; Park, B.U. Partially Linear Additive Regression with a General Hilbertian Response. J. Am. Stat. Assoc. 2024, 119, 942–956.
Liu, Y.; Lu, J.; Paula, G.A.; Liu, S. Bayesian diagnostics in a partially linear model with first-order autoregressive skew-normal errors. Comput. Stat. 2025, 40, 1021–1051.
Cordeiro, G.M.; Alizadeh, M.; Ozel, G.; Hosseini, B.; Ortega, E.M.M.; Altun, E. The generalized odd log-logistic family of distributions: Properties, regression models and applications. J. Stat. Comput. Simul. 2017, 87, 908–932.
O’Sullivan, F. A Statistical Perspective on Ill-Posed Inverse Problems. Stat. Sci. 1986, 1, 502–527.
Green, P.; Silverman, B. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach; Chapman and Hall/CRC: New York, NY, USA, 1993.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 18 March 2026).
Lee, Y.; Nelder, J.A.; Pawitan, Y. Generalized Linear Models with Random Effects: Unified Analysis via H-Likelihood; Chapman & Hall/CRC: New York, NY, USA, 2006.
Rigby, R.A.; Stasinopoulos, D.M. Automatic smoothing parameter selection in GAMLSS with an application to centile estimation. Stat. Methods Med. Res. 2014, 23, 318–332.
Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.

Figure 1. (a) Arabica coffee price frequency histogram. (b) Scatter diagram between the IPCA and price of Arabica coffee. (c) Boxplot of the price of Arabica coffee in relation to the pandemic covariate.

Figure 2. The GOLLN density.

Figure 3. The curves of $g(t_{i})$ for two scenarios and different sample sizes. The first row corresponds to scenario 1 with $n = 50$, $n = 150$, and $n = 450$ (from left to right). The second row corresponds to scenario 2 with $n = 50$, $n = 150$, and $n = 450$ (from left to right).

Figure 4. Four fitted densities to the price of Arabica coffee data.

Figure 5. (a) Index plot of the qrs: the solid line indicates zero, and the dashed lines represent reference bounds for model adequacy. (b) Normal probability plot of the qrs with simulated envelope at the 5% significance level.

Figure 6. (a) Form of the penalized smoother $pb(\cdot)$ for covariate $t$ using the OLLNPLR. The red line represents the estimated smooth effect, and the shaded region corresponds to its confidence band. (b) Empirical and estimated reliability functions by $x_{3}$.

Table 1. Simulation results: biases, MSEs, Mean SEs, ACIs, and coverage probabilities (CPs).

Scenario 1		$n = 50$					$n = 150$					$n = 450$
$η$	Value ¹	Bias	MSE	Mean SE	ACI	CP	Bias	MSE	Mean SE	ACI	CP	Bias	MSE	Mean SE	ACI	CP
$β_{1}$	0.40	0.00	0.00	0.03	0.10	0.883	−0.00	0.00	0.02	0.06	0.927	−0.00	0.00	0.00	0.04	0.933
$log (σ)$	−1.60	−0.09	0.02	0.09	0.36	0.819	−0.04	0.00	0.05	0.21	0.914	−0.01	0.00	0.03	0.12	0.963
$log (α)$	0.50	0.06	0.01	0.09	0.35	0.970	0.02	0.00	0.05	0.20	0.996	0.01	0.00	0.03	0.12	0.999
$log (θ)$	0.60	0.03	0.00	0.12	0.47	0.986	0.01	0.00	0.07	0.27	0.991	0.00	0.00	0.04	0.15	0.993
Scenario 2		$n = 50$					$n = 150$					$n = 450$
$η$	Value ¹	Bias	MSE	Mean SE	ACI	CP	Bias	MSE	Mean SE	ACI	CP	Bias	MSE	Mean SE	ACI	CP
$β_{1}$	0.60	0.00	0.00	0.04	0.17	0.905	−0.00	0.00	0.04	0.10	0.952	−0.00	0.00	0.01	0.06	0.936
$log (σ)$	−1.30	−0.08	0.02	0.10	0.40	0.864	−0.03	0.01	0.06	0.23	0.928	−0.01	0.00	0.03	0.13	0.952
$log (α)$	0.40	0.04	0.00	0.09	0.39	0.994	0.01	0.00	0.06	0.22	0.997	0.00	0.00	0.03	0.13	0.996
$log (θ)$	0.50	0.01	0.00	0.12	0.47	0.988	0.00	0.00	0.07	0.27	0.988	0.00	0.00	0.04	0.15	0.986

¹ True parameter value used in the simulation.
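
The columns of Table 1 are standard Monte Carlo summaries. The following Python sketch shows one way such summaries can be computed from replicated estimates and standard errors. It is only an illustration under stated assumptions: the estimator is a stand-in (the sample mean of normal data, not the penalized fit of the GOLLNPLR model), the names `summarize` and `simulate`, the seed, the 1000 replications, and the standard deviation 0.20 are hypothetical, ACI is read as the average length of the confidence interval, and the intervals are taken to be 95% Wald intervals.

```python
import math
import random

def summarize(estimates, std_errors, true_value, z=1.96):
    """Monte Carlo summaries named after the columns of Table 1."""
    r = len(estimates)
    bias = sum(estimates) / r - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    mean_se = sum(std_errors) / r
    # ACI is read here as the average length of the Wald interval (assumption).
    aci = sum(2 * z * s for s in std_errors) / r
    # CP: share of replications whose interval contains the true value.
    hits = sum(abs(e - true_value) <= z * s for e, s in zip(estimates, std_errors))
    return {"bias": bias, "mse": mse, "mean_se": mean_se, "aci": aci, "cp": hits / r}

def simulate(n, true_mean=0.40, sd=0.20, reps=1000, seed=2026):
    """Stand-in experiment: estimate a normal mean, not the paper's model."""
    rng = random.Random(seed)
    estimates, std_errors = [], []
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        estimates.append(mean)
        std_errors.append(math.sqrt(var / n))
    return summarize(estimates, std_errors, true_mean)

# Same sample sizes as in Table 1.
results = {n: simulate(n) for n in (50, 150, 450)}
for n, row in results.items():
    print(n, {name: round(value, 3) for name, value in row.items()})
```

In this stand-in setting the MSE and the average interval length shrink as $n$ grows, which is the same qualitative pattern reported in Table 1.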

Table 2. Findings from some fitted models.

Model	$α$	$θ$	$μ$	$σ$	GD	AIC	BIC
GOLLN	0.1080	1.9367	787.09	87.7908	970.0	978.0	986.9
	(0.0157)	(0.6688)	(47.6979)	(6.4167)
OLLN	0.1324	1	870.52	84.1513	972.7	978.7	985.4
	(0.0159)		(14.8631)	(3.2403)
ExpN	1	1.0254	866.18	348.51	987.4	993.4	1000.1
		(13.0130)	(405.35)	(1311.95)
Normal	1	1	872.87	344.36	987.5	991.4	995.9
			(41.7476)	(29.5205)
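
The information criteria in Table 2 can be checked against the global deviance (GD). The sketch below assumes the usual definitions AIC = GD + 2k and BIC = GD + k log(n), where k is the number of estimated parameters (those reported with a standard error in Table 2). The sample size n = 68 is an assumption, obtained by counting monthly observations from January 2019 to August 2024.

```python
import math

N_OBS = 68  # assumption: monthly prices, January 2019 to August 2024

# model: (GD, number of estimated parameters, reported AIC, reported BIC)
table2 = {
    "GOLLN": (970.0, 4, 978.0, 986.9),
    "OLLN": (972.7, 3, 978.7, 985.4),
    "ExpN": (987.4, 3, 993.4, 1000.1),
    "Normal": (987.5, 2, 991.4, 995.9),
}

def aic(gd, k):
    """Akaike criterion from the global deviance (assumed definition)."""
    return gd + 2 * k

def bic(gd, k, n):
    """Bayesian criterion from the global deviance (assumed definition)."""
    return gd + k * math.log(n)

# Differences between recomputed and reported values, per model.
checks = {}
for model, (gd, k, aic_reported, bic_reported) in table2.items():
    aic_gap = aic(gd, k) - aic_reported
    bic_gap = bic(gd, k, N_OBS) - bic_reported
    checks[model] = (aic_gap, bic_gap)
    print(f"{model:7s} AIC gap {aic_gap:+.2f}  BIC gap {bic_gap:+.2f}")
```

Under these assumptions the recomputed values agree with Table 2 up to the rounding of GD to one decimal place.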

Table 3. Information criteria.

	Parametric Regression			Partially Linear Regression
Model	GD	AIC	BIC	GD	AIC	BIC
Normal	814.87	826.87	840.19	733.05	757.33	784.28
ExpN	809.07	823.07	838.61	729.90	755.40	783.69
OLLN	801.87	815.87	831.41	721.97	747.56	775.96
GOLLN	792.21	808.21	825.97	729.08	756.94	787.86

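For GD, AIC, and BIC, smaller values represent better model fit. The best (smallest) values are those of the GOLLN model among the parametric regressions and of the OLLN model among the partially linear regressions.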
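
The degrees of freedom used by each fit in Table 3 are not listed, but they can be recovered from the criteria themselves. The sketch below assumes AIC = GD + 2 df and BIC = GD + df log(n); the helper names `implied_df` and `implied_n` are hypothetical. Under these assumptions the parametric fits return integer df, whereas the partially linear fits return non-integer values, as expected when a penalized smoother contributes effective degrees of freedom.

```python
import math

# model: (GD, AIC, BIC), copied from Table 3.
parametric = {
    "Normal": (814.87, 826.87, 840.19),
    "ExpN": (809.07, 823.07, 838.61),
    "OLLN": (801.87, 815.87, 831.41),
    "GOLLN": (792.21, 808.21, 825.97),
}
partially_linear = {
    "Normal": (733.05, 757.33, 784.28),
    "ExpN": (729.90, 755.40, 783.69),
    "OLLN": (721.97, 747.56, 775.96),
    "GOLLN": (729.08, 756.94, 787.86),
}

def implied_df(gd, aic):
    """Degrees of freedom implied by AIC = GD + 2 * df (assumed definition)."""
    return (aic - gd) / 2

def implied_n(gd, aic, bic):
    """Sample size implied by BIC = GD + df * log(n) (assumed definition)."""
    return math.exp((bic - gd) / implied_df(gd, aic))

summary = {}
for label, fits in (("parametric", parametric), ("partially linear", partially_linear)):
    for model, (gd, aic, bic) in fits.items():
        summary[(label, model)] = (implied_df(gd, aic), implied_n(gd, aic, bic))
        df, n = summary[(label, model)]
        print(f"{label:16s} {model:7s} df = {df:6.2f}  implied n = {n:5.1f}")

# Model with the smallest AIC in each block.
best_parametric = min(parametric, key=lambda m: parametric[m][1])
best_partial = min(partially_linear, key=lambda m: partially_linear[m][1])
print(best_parametric, best_partial)
```

Every row implies a sample size close to 68, which is consistent with monthly data from January 2019 to August 2024.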

Table 4. Findings from the fitted OLLNPLR model.

$η, g$	MPLE	SE	p-Value	CI
$β_{0}$	6.817	0.770	<0.001	(5.308, 8.326)
$β_{1}$	0.001	0.000	<0.001	(0.001, 0.001)
$β_{2}$	−0.014	0.005	0.011	(−0.024, −0.004)
$β_{3}$	0.175	0.016	<0.001	(0.144, 0.205)
$log (σ)$	5.675	0.091	<0.001	(5.497, 5.854)
$log (α)$	1.913	0.092	<0.001	(1.733, 2.092)
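
The CI column of Table 4 can be compared with normal-approximation (Wald) intervals of the form estimate ± 1.96 SE. The 95% level and the Wald form are assumptions, since the table does not state how the intervals were built. The row for $β_{1}$ is left out because its SE is reported as 0.000 after rounding.

```python
Z_95 = 1.96  # assumption: 95% normal-approximation (Wald) intervals

# name: (estimate, standard error, reported interval), copied from Table 4.
table4 = {
    "beta0": (6.817, 0.770, (5.308, 8.326)),
    "beta2": (-0.014, 0.005, (-0.024, -0.004)),
    "beta3": (0.175, 0.016, (0.144, 0.205)),
    "log_sigma": (5.675, 0.091, (5.497, 5.854)),
    "log_alpha": (1.913, 0.092, (1.733, 2.092)),
}

def wald_interval(estimate, se, z=Z_95):
    """Two-sided interval estimate -/+ z * se."""
    return (estimate - z * se, estimate + z * se)

# Largest absolute gap between recomputed and reported endpoints.
max_gap = 0.0
for name, (estimate, se, reported) in table4.items():
    lower, upper = wald_interval(estimate, se)
    gap = max(abs(lower - reported[0]), abs(upper - reported[1]))
    max_gap = max(max_gap, gap)
    print(f"{name:10s} ({lower:.3f}, {upper:.3f})  reported {reported}  gap {gap:.4f}")
```

The recomputed endpoints differ from the reported ones by less than 0.002, which is compatible with estimates and standard errors having been rounded to three decimals before publication.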

Share and Cite

MDPI and ACS Style

Ortega, E.M.M.; Rodrigues, G.M.; Jang, K.S.; Cordeiro, G.M. A New Partially Linear Regression with an Application to the Price of Coffee Before and After the Pandemic. Stats 2026, 9, 40. https://doi.org/10.3390/stats9020040
