1. Introduction
Efficiency analysis is one of the main methods for studying the rate of economic growth and identifying its sources. Stochastic frontier analysis (SFA) is a common approach to efficiency analysis and is widely used in economics, management, and other fields. Aigner et al. [1], Meeusen and van den Broeck [2], and Battese and Corra [3] proposed stochastic frontier analysis by starting from the deterministic production frontier model of Aigner and Chu [4] and adding a random error term (denoted $v$), so that the frontier production function itself becomes stochastic. This error term is usually assumed to be ordinary white noise following a normal distribution. The technical inefficiency term is generally denoted $u$ and is assumed to be non-negative and independent of $v$. The stochastic frontier function can then be expressed as $y = f(x;\beta)\exp(v-u)$, where $f(x;\beta)$ denotes the functional form of the production function, typically specified as a Cobb–Douglas function.
The most obvious difference between the stochastic frontier model and a standard econometric model is the composite error term $\varepsilon = v - u$. To estimate the parameters contained in the error term and to calculate technical efficiency, more detailed distributional assumptions on the error components are needed. Battese and Corra [3] assume that $u$ follows a half-normal distribution, which is the most commonly adopted assumption in stochastic frontier models. Meeusen and van den Broeck [2] assume that $u$ follows an exponential distribution, which has good properties and is often used. Greene [5,6] proposes a Gamma distribution for the inefficiency term $u$, although this assumption has been questioned. Stevenson [7] proposes a truncated normal distribution for $u$. In this paper, the most commonly used half-normal assumption is adopted for model estimation. After a long period of development, a series of theoretical innovations in stochastic frontier model specification and estimation, as well as in the estimation and inference of technical efficiency, have emerged.
In the era of big data, data sets are often massive and diverse. In practical applications, however, the amount of data collected or generated is often insufficient to support accurate estimation and prediction. For example, when stochastic frontier analysis is used to analyze the production efficiency of production units, its performance depends on a large amount of labeled data, but obtaining sufficient labeled data to train the model is usually expensive and time-consuming. Moreover, such data often exist in high-dimensional form, so applying stochastic frontier analysis requires us to address both the scarcity of samples and the high dimensionality of the data.
To address the problem of insufficient data, researchers have begun to integrate different but related data sets to improve the accuracy of parameter estimation and prediction. The idea of transfer learning is to find relevant data sets from existing domains and to reduce data set bias by learning in the new target domain. Transfer learning has been widely applied in fields such as computer science [8], health care [9], and power security [10], which demonstrates its effectiveness.
High-dimensional data analysis has long attracted attention, and many effective methods have been proposed. The Elastic Net developed by Zou and Hastie [11] is a particularly efficient and practical method: it is a hybrid regularization technique that has gained significant attention for its ability to handle high-dimensional data, address collinearity, and select relevant features. In addition, Jia and Yu [12] and Yuan and Lin [13] establish theoretical properties of the Elastic Net, such as statistical consistency and convergence rates, which provide a solid foundation for its practical application. As in transfer learning, high-dimensional data analysis often faces limited sample sizes, with the number of samples much smaller than the data dimension. Researchers have therefore begun to apply transfer learning to high-dimensional problems.
To achieve this goal, Bastani [14] adopts transfer learning in a high-dimensional context, focusing mainly on estimation and prediction in high-dimensional linear models. Following the traditional framework of transfer learning, Bastani [14] assumes a certain degree of similarity between the target and source models. If this similarity condition is violated, negative transfer may occur, meaning that an increase in sample size actually worsens the training and prediction performance of the model. Notably, Bastani [14] assumes that the target model is high-dimensional while the source sample size is much larger than the dimension, and considers only a single source model without exploring scenarios involving multiple sources. As a further improvement, Li et al. [15] consider multiple sources, with both the source and target models situated in high-dimensional settings. Building on the work of Li et al. [15], the authors of [16] extend the concept of transfer learning to high-dimensional generalized linear models, introduce an algorithm-free method to identify transferable sources in the source detection context, and always select transferable sources to prevent negative transfer.
Building on the research of Bastani [14] and Li et al. [15], Meng et al. [17] introduce a novel two-step transfer learning algorithm for cases where the transferable domain is known, employing the Elastic Net penalty in both the transfer and debiasing steps; when the transferable source is unknown, they also propose a new algorithm for detecting transferable sources through the Elastic Net. Furthermore, Chen et al. [18] add constraints derived from prior knowledge to the high-dimensional model and study robust transfer learning. Their experiments show that transfer learning for high-dimensional models with linear constraints is feasible and greatly improves model performance, and they propose a constrained transferable source detection algorithm for the case where the transferable source is unknown. The high-dimensional work above is mostly based on linear regression models. For high-dimensional stochastic frontier models, Qingchuan and Shichuan [19] pioneered the use of the adaptive Lasso (Alasso) penalty to select variables in stochastic frontier models and achieved good results.
To the best of our knowledge, the Elastic Net has not yet been applied to the stochastic frontier model. Inspired by the work of Meng et al. [17] and Qingchuan and Shichuan [19], we explore the use of the Elastic Net in high-dimensional stochastic frontier models to perform variable selection and address multicollinearity, and we further apply this approach within a transfer learning framework. To enhance the accuracy of coefficient estimation and improve overall model performance, we are additionally inspired by Chen et al. [18]: based on prior knowledge, we add equality and inequality constraints that restrict the coefficient estimates to a certain range.
This paper proposes transfer learning for the high-dimensional stochastic frontier model via the Elastic Net. To improve the accuracy of coefficient estimation and enhance model performance, inspired by Chen et al. [18], we incorporate equality and inequality constraints based on prior knowledge to restrict the coefficient estimates within a certain range. The innovations of this paper are as follows:
- (1)
When the transferable source domain is known, we propose a high-dimensional stochastic frontier transfer learning algorithm via Elastic Net.
- (2)
When the transferable source domain is unknown, we design a transferable source detection algorithm.
- (3)
When some prior knowledge is available, we add linear constraints to the model so that the coefficient estimates are restricted to a certain range and the estimation accuracy is improved.
The rest of this paper is organized as follows. In Section 2, we first briefly review the stochastic frontier model and transfer learning, and then introduce the Elastic Net stochastic frontier model with transfer learning for both known and unknown transferable sources. Next, we perform extensive simulations in Section 3 and one real data experiment in Section 4. Finally, we summarize our paper in Section 5.
2. Methodology
2.1. High-Dimensional Stochastic Frontier Model via Elastic Net
This paper studies the following stochastic frontier model:

$$y_i = f(x_i;\beta)\exp(v_i - u_i), \quad i = 1,\ldots,n, \qquad (1)$$

where $f(\cdot)$ represents the frontier production function, $v$ is the two-sided random error term, and $u$ represents the non-negative technical inefficiency term. Formula (1) uses the Cobb–Douglas production function. Taking logarithms, the stochastic frontier model based on the Cobb–Douglas production function is

$$\ln y_i = \beta_0 + \sum_{j=1}^{p}\beta_j \ln x_{ij} + v_i - u_i. \qquad (2)$$

For convenience, let $\tilde{y}_i = \ln y_i$ and $\tilde{x}_i = (1, \ln x_{i1}, \ldots, \ln x_{ip})^\top$; then Formula (2) can be expressed as

$$\tilde{y}_i = \tilde{x}_i^\top\beta + \varepsilon_i, \qquad (3)$$

where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^\top$ and $\varepsilon_i = v_i - u_i$.
Although Formula (3) resembles an ordinary linear model, the disturbance term $\varepsilon_i = v_i - u_i$ is asymmetric due to the introduction of the non-negative term $u$. This asymmetry allows the existence and distribution of $u$ to be identified. We assume that $v_i \sim N(0, \sigma_v^2)$, $u_i \sim N^{+}(0, \sigma_u^2)$ (half-normal), and that $v$ and $u$ are mutually independent and also independent of the explanatory variables. Based on these assumptions, the log-likelihood function can be derived as follows [20]:

$$\ln L(\beta, \sigma^2, \lambda) = -\frac{n}{2}\ln\frac{\pi\sigma^2}{2} + \sum_{i=1}^{n}\ln\Phi\!\left(-\frac{\varepsilon_i\lambda}{\sigma}\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2, \qquad (4)$$

where $\varepsilon_i = \tilde{y}_i - \tilde{x}_i^\top\beta$, $\sigma^2 = \sigma_v^2 + \sigma_u^2$, and $\lambda = \sigma_u/\sigma_v$. The function $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution.
Compared with the ordinary linear model, the log-likelihood function in Formula (4) introduces a nonlinear term to reflect the presence of inefficiency. Ordinary linear models typically assume that the error term follows a symmetric normal distribution, and their likelihood functions involve only the exponential of the squared residuals. In stochastic frontier models, however, the residual is decomposed into white noise and an inefficiency term, the latter assumed to follow a half-normal distribution. Consequently, Formula (4) includes the nonlinear term $\ln\Phi(-\varepsilon_i\lambda/\sigma)$, which characterizes the asymmetry of the residual distribution.
This difference has an important influence on the optimization process. First, the likelihood function no longer has a closed-form maximizer, so numerical methods are required for parameter estimation. Second, the nonlinear term may lead to numerical instability: when $\Phi(\varepsilon_i\lambda/\sigma)$ approaches 1, $\Phi(-\varepsilon_i\lambda/\sigma)$ becomes nearly zero, which results in a loss of precision in the logarithmic term. Therefore, stabilization techniques are necessary in practical optimization. Finally, the inclusion of the additional parameters $\sigma^2$ and $\lambda$ increases the dimensionality of the optimization problem, making the estimation more complex.
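To make Formula (4) concrete, the following is a minimal Python sketch of the half-normal negative log-likelihood (Python is the language used for the paper's experiments). The log-parameterisation of the variance components and the use of scipy's `logcdf` for numerical stability are our own implementation choices rather than details taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def neg_loglik_half_normal(params, X, y):
    """Negative of the log-likelihood in Formula (4).

    params = (beta_1, ..., beta_p, log_sigma_v, log_sigma_u); the log
    parameterisation (our choice) keeps both variance components positive.
    """
    p = X.shape[1]
    beta = params[:p]
    sigma_v, sigma_u = np.exp(params[p]), np.exp(params[p + 1])
    sigma = np.sqrt(sigma_v**2 + sigma_u**2)
    lam = sigma_u / sigma_v
    eps = y - X @ beta                          # composite residual eps = v - u
    ll = (np.log(2.0 / sigma)
          + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))    # asymmetry term ln Phi(-eps*lam/sigma)
    return -np.sum(ll)
```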
In general, maximum likelihood estimation is used for parameter estimation, but it typically does not accommodate cases where the number of features exceeds the number of samples. To this end, we add regularization terms to achieve feature selection and to address the collinearity problem. For convenience, we define $\ell_i(\beta,\sigma^2,\lambda)$ as the negative log-likelihood contribution of the $i$-th observation. The resulting optimization problem can be expressed as

$$\min_{\beta,\sigma^2,\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\ell_i(\beta,\sigma^2,\lambda) + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2. \qquad (5)$$

If we have some prior knowledge of the parameters before estimation, we can incorporate linear equality and inequality constraints. This transforms Formula (5) into the following constrained optimization problem:

$$\min_{\beta,\sigma^2,\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\ell_i(\beta,\sigma^2,\lambda) + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2 \quad \text{s.t.} \quad A\beta \geq b,\ E\beta = f. \qquad (6)$$

Here, $\lambda_1$ and $\lambda_2$ are the penalty parameters, and $b$ and $f$ are vectors that are set according to the actual situation. In Formula (5), variable selection is achieved through the $\ell_1$ penalty $\lambda_1\|\beta\|_1$, while the collinearity problem is alleviated by the $\ell_2$ penalty $\lambda_2\|\beta\|_2^2$.
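Building on the previous sketch, the objective of Formula (5) can be written as follows; penalising only the frontier coefficients (and not the variance parameters) is our assumption, and the commented optimiser call is illustrative only, since the non-smooth $\ell_1$ term is better handled by a proximal or coordinate-descent solver.

```python
import numpy as np
from scipy.optimize import minimize

def penalized_neg_loglik(params, X, y, lam1, lam2):
    """Objective of Formula (5): averaged negative log-likelihood plus an
    Elastic Net penalty on the frontier coefficients beta only."""
    p = X.shape[1]
    beta = params[:p]
    nll = neg_loglik_half_normal(params, X, y) / len(y)   # defined in the sketch above
    return nll + lam1 * np.abs(beta).sum() + lam2 * np.sum(beta**2)

# Illustrative call only; a dedicated solver would treat the non-smooth
# l1 term more carefully than a quasi-Newton method does.
# x0 = np.zeros(X.shape[1] + 2)
# fit = minimize(penalized_neg_loglik, x0, args=(X, y, 0.1, 0.05), method="L-BFGS-B")
```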
We now describe the linear constraints in more detail to enhance readability. If we have prior knowledge about the parameters before estimation, for example, that certain coefficients are non-negative or that a particular coefficient equals a known real number, this knowledge can be encoded as linear constraints. Stacking the corresponding rows into matrices yields an inequality constraint of the form $A\beta \geq b$ and an equality constraint of the form $E\beta = f$.
Here, we contrast the linear constraints derived from prior knowledge with the regularization terms $\lambda_1\|\beta\|_1$ and $\lambda_2\|\beta\|_2^2$. The regularization terms act as soft constraints: they influence the objective function through penalty terms, thereby contributing to structure selection and preventing overfitting during model training. In contrast, the linear constraints are hard constraints that typically originate from domain-specific prior knowledge. While regularization controls model complexity and enhances interpretability, linear constraints enforce parameter feasibility based on the actual problem setting. The two types of constraints are complementary in nature.
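As a hedged illustration of how the hard constraints in Formula (6) could be imposed, the sketch below reuses `penalized_neg_loglik` from the previous sketch on toy data; the non-negativity of the first two coefficients is a hypothetical example, and the choice of scipy's `LinearConstraint` with the `trust-constr` method is ours, not necessarily the solver used in the paper.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))                            # toy log-inputs
y = X @ np.array([0.4, 0.3, 0.0, 0.0, 0.1]) \
    + rng.normal(0, 0.2, n) - np.abs(rng.normal(0, 0.3, n))   # v - u

n_par = p + 2                                           # beta plus two log-variances

# Hypothetical prior knowledge: the first two coefficients are non-negative.
A = np.zeros((2, n_par))
A[0, 0] = 1.0                                           # encodes beta_1 >= 0
A[1, 1] = 1.0                                           # encodes beta_2 >= 0
ineq = LinearConstraint(A, lb=np.zeros(2), ub=np.inf)

x0 = np.zeros(n_par)
fit = minimize(penalized_neg_loglik, x0, args=(X, y, 0.1, 0.05),
               method="trust-constr", constraints=[ineq])
beta_hat = fit.x[:p]
```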
In addition, to solve the optimization problem in Formula (6), we follow the framework proposed by Zou and Hastie [11] and introduce several modifications. First, the Elastic Net penalty in the original problem is reformulated equivalently as a standard Lasso problem: following Lemma 1 in Zou and Hastie [11], an augmented data set is constructed to achieve this transformation, converting the original problem into a minimization problem with a single $\ell_1$ penalty. The Lasso solution path is then computed on the augmented data set using a standard Lasso algorithm, and the solution to the original problem is recovered through the corresponding variable transformation.
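For reference, here is a minimal sketch of the Lemma 1 augmentation in the squared-error case; how the paper adapts this device to the SFA likelihood is not reproduced here, so the code only illustrates the transformation itself.

```python
import numpy as np

def elastic_net_to_lasso(X, y, lam2):
    """Lemma 1 of Zou and Hastie (2005): an Elastic Net problem on (X, y)
    with l2 weight lam2 is equivalent to a Lasso problem on augmented data.
    Solve the Lasso on (X_star, y_star) with penalty level lam1 * c, then
    rescale the resulting coefficients by c to recover the original solution."""
    n, p = X.shape
    c = 1.0 / np.sqrt(1.0 + lam2)
    X_star = c * np.vstack([X, np.sqrt(lam2) * np.eye(p)])
    y_star = np.concatenate([y, np.zeros(p)])
    return X_star, y_star, c
```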
The choice of penalty parameters has a significant impact on the estimation. Here, we take Formula (5) as an example to briefly illustrate the selection of the penalty parameters $\lambda_1$ and $\lambda_2$; the same strategy can be applied to determine the penalty parameters in other parts of the model.
By introducing the mixing parameter $\alpha \in [0,1]$, the optimization problem becomes

$$\min_{\beta,\sigma^2,\lambda}\ \frac{1}{n}\sum_{i=1}^{n}\ell_i(\beta,\sigma^2,\lambda) + \eta\Big[\alpha\|\beta\|_1 + \frac{1-\alpha}{2}\|\beta\|_2^2\Big], \qquad (7)$$

where $\lambda_1 = \eta\alpha$ and $\lambda_2 = \eta(1-\alpha)/2$. Given a fixed $\alpha$ and a sequence of candidate values $\{\eta_1, \ldots, \eta_M\}$, we use five-fold cross validation to find the optimal $\eta$. The sample data are randomly divided into five subsets; in each iteration, one subset serves as the validation set and the remaining subsets as the training set. For each candidate value $\eta_m$ ($m = 1, \ldots, M$), the model is trained on the training set and evaluated on the validation set. The value of $\eta$ that minimizes the mean absolute error (MAE) across all validation sets is selected as the final penalty parameter.
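The selection procedure can be sketched as follows; `fit_fn` and `predict_fn` are hypothetical stand-ins for the penalised SFA estimator and its frontier prediction, and the MAE scoring mirrors the description above.

```python
import numpy as np
from sklearn.model_selection import KFold

def select_penalty_by_cv(X, y, etas, alpha, fit_fn, predict_fn, seed=0):
    """Five-fold cross validation over candidate penalty levels `etas`,
    scored by the mean absolute error on the held-out fold."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    mae = np.zeros(len(etas))
    for train_idx, val_idx in kf.split(X):
        for m, eta in enumerate(etas):
            params = fit_fn(X[train_idx], y[train_idx], eta, alpha)
            pred = predict_fn(params, X[val_idx])
            mae[m] += np.mean(np.abs(y[val_idx] - pred)) / 5.0
    return etas[int(np.argmin(mae))]
```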
When the sample size of the target data is large, the constrained variable selection method above performs well; when the target sample size is small, however, the variable selection results are not ideal. In this case, transfer learning is needed to borrow information from source data sets with similar distributions.
2.2. Transfer Learning of High-Dimensional Stochastic Frontier Model via Elastic Net
Transfer learning usually involves two domains: a source domain and a target domain. The target domain, which has limited data or labels, is the focus of learning, while the source domain contains abundant data or rich label information. Although the source and target domains may differ in data distribution, they share certain similarities. The purpose of transfer learning is to transfer knowledge from the source domain to the target domain to complete model learning.
Suppose that the target data $\{(x_i^{(0)}, y_i^{(0)})\}_{i=1}^{n_0}$, where $(x_i^{(0)}, y_i^{(0)})$ denotes the $i$th target observation, come from the following stochastic frontier model (with variables in logarithmic form, as in Formula (3)):

$$y_i^{(0)} = x_i^{(0)\top}\beta^{(0)} + v_i^{(0)} - u_i^{(0)}, \quad i = 1, \ldots, n_0. \qquad (8)$$

The target model is then constructed by applying the (constrained) Elastic Net stochastic frontier estimator of Section 2.1 to the target data.
Assume that there are $K$ source data sets $\{(X^{(k)}, y^{(k)})\}_{k=1}^{K}$, where the $k$th source data set is $\{(x_j^{(k)}, y_j^{(k)})\}_{j=1}^{n_k}$ with $x_j^{(k)} \in \mathbb{R}^p$ and $y_j^{(k)} \in \mathbb{R}$, and $(x_j^{(k)}, y_j^{(k)})$ denotes the $j$th observation of the $k$th source data set. The source data come from the stochastic frontier model

$$y_j^{(k)} = x_j^{(k)\top}\beta^{(k)} + v_j^{(k)} - u_j^{(k)}, \quad j = 1, \ldots, n_k. \qquad (9)$$
In this paper, we refer to source data sets that meet a predefined similarity condition as transferable sources. Only the information from these transferable sources is utilized to enhance the coefficient estimation for the target data. This restriction is reasonable, as it ensures that the incorporated information is sufficiently relevant and consistent with the target domain, thereby improving the accuracy and robustness of the estimation. In contrast, if the similarity between the source and target domains is insufficient, incorporating source data may degrade performance rather than improve it.
We now define a transferable source. Given any positive threshold $h$, let $\beta^{(0)}$ denote the parameter vector of the target model and $\beta^{(k)}$ the parameter vector of the $k$th source model. Define the contrast $\delta^{(k)} = \beta^{(k)} - \beta^{(0)}$. If $\|\delta^{(k)}\|_1 \leq h$, the $k$th source is considered transferable. The index set of all transferable sources is then defined as $\mathcal{A}_h = \{1 \leq k \leq K : \|\delta^{(k)}\|_1 \leq h\}$.
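Assuming the $\ell_1$ contrast used in the reconstruction above (the choice common in the transfer learning literature), the index set $\mathcal{A}_h$ can be computed as a simple filter:

```python
import numpy as np

def transferable_set(beta_target, beta_sources, h):
    """Index set A_h of sources whose coefficient contrast delta^(k) has
    l1-norm at most h; beta_sources is a list of source coefficient vectors."""
    return [k for k, beta_k in enumerate(beta_sources, start=1)
            if np.abs(np.asarray(beta_k) - np.asarray(beta_target)).sum() <= h]
```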
This paper studies how to perform transfer learning on high-dimensional stochastic frontier models with known and unknown transferable sources, and how to perform transfer learning on high-dimensional stochastic frontier models with linear constraints.
2.2.1. Known Transferable Sources and No Constraints
Suppose the set of transferable sources $\mathcal{A}_h$ is known, with total sample size $n_{\mathcal{A}_h} = \sum_{k \in \mathcal{A}_h} n_k$. First, the target data and the transferable source data are plugged into the regularized log-likelihood to obtain the rough estimator $\hat{w}$:

$$\hat{w} = \arg\min_{w}\ \frac{1}{n_0 + n_{\mathcal{A}_h}}\Big[\sum_{i=1}^{n_0}\ell_i^{(0)}(w) + \sum_{k \in \mathcal{A}_h}\sum_{j=1}^{n_k}\ell_j^{(k)}(w)\Big] + \lambda_{w,1}\|w\|_1 + \lambda_{w,2}\|w\|_2^2, \qquad (10)$$

where $\ell_i^{(0)}$ and $\ell_j^{(k)}$ denote the per-observation negative log-likelihoods of the target and the $k$th source data (with the variance parameters suppressed for brevity), and $\lambda_{w,1}$, $\lambda_{w,2}$ are the penalty parameters of the transfer step.
Next, the target data are plugged into the regularized log-likelihood to obtain the correction $\hat{\delta}$:

$$\hat{\delta} = \arg\min_{\delta}\ \frac{1}{n_0}\sum_{i=1}^{n_0}\ell_i^{(0)}(\hat{w} + \delta) + \lambda_{\delta,1}\|\delta\|_1 + \lambda_{\delta,2}\|\delta\|_2^2, \qquad (11)$$

where $\lambda_{\delta,1}$ and $\lambda_{\delta,2}$ are the penalty parameters of the debiasing step, and the final estimator is $\hat{\beta} = \hat{w} + \hat{\delta}$.
The penalty parameters $\lambda_{w,1}$, $\lambda_{w,2}$, $\lambda_{\delta,1}$, and $\lambda_{\delta,2}$ are obtained by cross validation. To help readers easily understand the algorithm, the two-step transfer procedure is summarized in Algorithm 1.
The transfer learning algorithm proposed below is an improvement of the two-step transfer learning framework introduced by Chen et al. [
18]. The main difference lies in the use of a different likelihood function. The practicability and effectiveness of the original algorithm have been demonstrated in their paper.
Algorithm 1: Two-step transfer
Input: Target data $(X^{(0)}, y^{(0)})$, transferable source data $\{(X^{(k)}, y^{(k)})\}_{k \in \mathcal{A}_h}$, penalty parameters $\lambda_{w,1}$, $\lambda_{w,2}$, $\lambda_{\delta,1}$, $\lambda_{\delta,2}$
Output: The estimated coefficient vector $\hat{\beta}$
- 1: Transfer step: obtain $\hat{w}$ from Formula (10) using the target data and all transferable source data
- 2: Debiasing step: obtain $\hat{\delta}$ from Formula (11) using the target data only
- 3: Calculate: $\hat{\beta} = \hat{w} + \hat{\delta}$
- 4: Output: $\hat{\beta}$
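To make the two steps concrete, here is a minimal sketch of Algorithm 1; `fit_fn` is a hypothetical stand-in for the Elastic Net SFA estimator of Section 2.1 (returning the frontier coefficients and accepting an optional linear-predictor offset), so this is an outline of the procedure rather than the paper's implementation.

```python
import numpy as np

def two_step_transfer(X0, y0, sources, lam_w, lam_delta, alpha, fit_fn):
    """Sketch of Algorithm 1.

    Step 1 (transfer): pool the target data with all transferable source data
    and fit the penalised SFA model to obtain the rough estimate w_hat.
    Step 2 (debiasing): refit on the target data alone with the linear
    predictor offset by X0 @ w_hat, so the penalty acts on the correction
    delta, and return beta_hat = w_hat + delta_hat.
    """
    X_pool = np.vstack([X0] + [Xk for Xk, _ in sources])
    y_pool = np.concatenate([y0] + [yk for _, yk in sources])

    w_hat = fit_fn(X_pool, y_pool, lam_w, alpha)          # transfer step
    delta_hat = fit_fn(X0, y0, lam_delta, alpha,
                       offset=X0 @ w_hat)                 # debiasing step
    return w_hat + delta_hat
```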
2.2.2. Known Transferable Sources and Linear Constraints
If we have some prior knowledge about the parameters of the stochastic frontier model, then we can add linear equality and inequality constraints to the transfer learning procedure. Similar to the above, $\hat{\beta}$ is also estimated by a two-step method, but the constraints are imposed in the solution process.
It is still assumed that the transferable source set $\mathcal{A}_h$ is known, with total sample size $n_{\mathcal{A}_h}$. First, the target data and the transferable source data are plugged into the regularized log-likelihood with linear constraints to obtain $\hat{w}$:

$$\hat{w} = \arg\min_{w:\ Aw \geq b,\ Ew = f}\ \frac{1}{n_0 + n_{\mathcal{A}_h}}\Big[\sum_{i=1}^{n_0}\ell_i^{(0)}(w) + \sum_{k \in \mathcal{A}_h}\sum_{j=1}^{n_k}\ell_j^{(k)}(w)\Big] + \lambda_{w,1}\|w\|_1 + \lambda_{w,2}\|w\|_2^2, \qquad (12)$$

where $\lambda_{w,1}$ and $\lambda_{w,2}$ are the penalty parameters of the constrained transfer step.
Second, the target data are plugged into the regularized log-likelihood with constraints to obtain $\hat{\delta}$:

$$\hat{\delta} = \arg\min_{\delta:\ A(\hat{w}+\delta) \geq b,\ E(\hat{w}+\delta) = f}\ \frac{1}{n_0}\sum_{i=1}^{n_0}\ell_i^{(0)}(\hat{w} + \delta) + \lambda_{\delta,1}\|\delta\|_1 + \lambda_{\delta,2}\|\delta\|_2^2, \qquad (13)$$

where $\lambda_{\delta,1}$ and $\lambda_{\delta,2}$ are the penalty parameters of the constrained debiasing step, and the final estimator is $\hat{\beta} = \hat{w} + \hat{\delta}$.
Similarly, the penalty parameters $\lambda_{w,1}$, $\lambda_{w,2}$, $\lambda_{\delta,1}$, and $\lambda_{\delta,2}$ are obtained by cross validation. The constrained transfer algorithm is shown in Algorithm 2.
Algorithm 2: Constrained two-step transfer
Input: Target data $(X^{(0)}, y^{(0)})$, transferable source data $\{(X^{(k)}, y^{(k)})\}_{k \in \mathcal{A}_h}$, penalty parameters $\lambda_{w,1}$, $\lambda_{w,2}$, $\lambda_{\delta,1}$, $\lambda_{\delta,2}$, constraint parameters $(A, b, E, f)$
Output: The estimated coefficient vector $\hat{\beta}$
- 1: Constrained transfer step: obtain $\hat{w}$ from Formula (12)
- 2: Constrained debiasing step: obtain $\hat{\delta}$ from Formula (13)
- 3: Calculate: $\hat{\beta} = \hat{w} + \hat{\delta}$
- 4: Output: $\hat{\beta}$
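The constrained variant differs only in that the linear constraint is imposed in both steps; a minimal sketch under the same hypothetical `fit_fn` interface is given below, with a simplification noted in the comments.

```python
import numpy as np

def constrained_two_step_transfer(X0, y0, sources, lam_w, lam_delta, alpha,
                                  constraint, fit_fn):
    """Sketch of Algorithm 2: the two-step transfer with the linear constraint
    (A, b, E, f) imposed in both steps.  Strictly, the debiasing step
    constrains w_hat + delta; reusing the same constraint object here
    (rather than shifting b and f by A @ w_hat and E @ w_hat) is a
    simplification for readability."""
    X_pool = np.vstack([X0] + [Xk for Xk, _ in sources])
    y_pool = np.concatenate([y0] + [yk for _, yk in sources])

    w_hat = fit_fn(X_pool, y_pool, lam_w, alpha, constraint=constraint)
    delta_hat = fit_fn(X0, y0, lam_delta, alpha, offset=X0 @ w_hat,
                       constraint=constraint)
    return w_hat + delta_hat
```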
2.2.3. Unknown Transferable Sources and Linear Constraints
When the transferable sources are unknown, blindly transferring from all sources can cause severe negative transfer, so transferable source detection is essential. To address this, this paper implements a transferable source detection algorithm based on the constrained Elastic Net stochastic frontier model, which identifies the data sets that can be transferred from among the source data sets; the unconstrained case is handled in the same way. The penalty parameters are obtained by cross validation. For convenience, the detection algorithm first divides the target data set into five folds $\mathcal{D}_1, \ldots, \mathcal{D}_5$. For each source domain, the transfer step is run on that source together with four folds of the target data, and the average error of the objective function on the held-out fold is computed. In parallel, the constrained regularized stochastic frontier model is fitted to every four folds of the target data alone, and the error on the remaining fold is computed; the average of these errors serves as the baseline. Finally, the source-specific errors are compared against a threshold determined by the baseline: source domains with errors above the threshold are discarded, while those with errors below the threshold are added to the collection $\hat{\mathcal{A}}$.
For convenience, we assume that the target data can be divided into five folds of equal size. The average error of a fitted model $\hat{\beta}$ on the $r$th fold of the target data is

$$\hat{L}_r(\hat{\beta}) = \frac{1}{|\mathcal{D}_r|}\sum_{i \in \mathcal{D}_r} \ell_i^{(0)}(\hat{\beta}). \qquad (14)$$

The specific procedure is shown in Algorithm 3. It should be noted that Algorithm 3 does not require the threshold $h$ as an input, and constraints are added to the algorithm to limit the inclusion of non-transferable source domains.
Algorithm 3: Constrained transferable source detection
Require: Target data $(X^{(0)}, y^{(0)})$, all source data $\{(X^{(k)}, y^{(k)})\}_{k=1}^{K}$, penalty parameters, a constant $C_0$, constraint parameters $(A, b, E, f)$
Ensure: The estimated coefficient vector $\hat{\beta}$ and the set of transferable sources $\hat{\mathcal{A}}$
- 1: Transferable source detection: divide $(X^{(0)}, y^{(0)})$ into five folds of equal size $\mathcal{D}_1, \ldots, \mathcal{D}_5$
- 2: for $r = 1$ to 5 do
- 3: fit the Elastic Net stochastic frontier model to the target data excluding $\mathcal{D}_r$ with cross-validated penalty parameters, and compute the baseline error $\hat{L}_r^{(0)}$ on $\mathcal{D}_r$ using Formula (14)
- 4: run Step 1 of Algorithm 2 with the target data excluding $\mathcal{D}_r$ together with the $k$th source data, for $k = 1, \ldots, K$
- 5: calculate the error $\hat{L}_r^{(k)}$ on $\mathcal{D}_r$ using Formula (14), for $k = 1, \ldots, K$
- 6: end for
- 7: compute the average errors $\hat{L}^{(k)} = \frac{1}{5}\sum_{r=1}^{5}\hat{L}_r^{(k)}$, for $k = 0, 1, \ldots, K$
- 8: $\hat{\mathcal{A}} \leftarrow \{1 \leq k \leq K : \hat{L}^{(k)} \leq \hat{L}^{(0)} + C_0\}$
- 9: run Algorithm 2 with the detected set $\hat{\mathcal{A}}$
- 10: Output: $\hat{\beta}$ and $\hat{\mathcal{A}}$
Algorithm 3 relies on the unknown constant $C_0$, which determines the threshold for selecting transferable source data. Without knowing its true value, a larger value may lead to an overestimation of $\hat{\mathcal{A}}$, while a smaller value may result in an underestimation. Following Liu [21], we determine $C_0$ by running an additional round of cross-validation. Although this increases the computational cost, it is an effective approach for solving practical problems.
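A minimal sketch of the detection loop is given below; `fit_fn` and `loss_fn` are hypothetical stand-ins for the constrained Elastic Net SFA estimator and the fold error of Formula (14), and the additive comparison against the baseline mirrors our reconstruction of step 8, which may differ in detail from the paper's exact threshold rule.

```python
import numpy as np
from sklearn.model_selection import KFold

def detect_transferable(X0, y0, sources, lam, alpha, C0, fit_fn, loss_fn):
    """Sketch of the detection step of Algorithm 3: compare each source's
    held-out-fold error with the target-only baseline plus C0."""
    folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X0))

    # Baseline: target-only fit, averaged over the five held-out folds.
    base = np.mean([loss_fn(fit_fn(X0[tr], y0[tr], lam, alpha), X0[va], y0[va])
                    for tr, va in folds])

    keep = []
    for k, (Xk, yk) in enumerate(sources, start=1):
        errs = [loss_fn(fit_fn(np.vstack([X0[tr], Xk]),
                               np.concatenate([y0[tr], yk]), lam, alpha),
                        X0[va], y0[va])
                for tr, va in folds]
        if np.mean(errs) <= base + C0:          # keep sources close to the baseline
            keep.append(k)
    return keep
```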
2.3. Technical Efficiency
When using the stochastic frontier model for analysis, in addition to estimating all model parameters, we also need to estimate technical efficiency. In this paper, we rely on the JLMS method proposed by Jondrow et al. [22], which obtains the technical efficiency $TE_i = \exp\{-E[u_i \mid \varepsilon_i]\}$ by calculating $E[u_i \mid \varepsilon_i]$:

$$E[u_i \mid \varepsilon_i] = \sigma_*\left[\frac{\phi(\varepsilon_i\lambda/\sigma)}{1 - \Phi(\varepsilon_i\lambda/\sigma)} - \frac{\varepsilon_i\lambda}{\sigma}\right], \qquad (15)$$

where $\sigma_* = \sigma_u\sigma_v/\sigma$, $\lambda = \sigma_u/\sigma_v$, and $\sigma^2 = \sigma_u^2 + \sigma_v^2$. Here, $\phi(\cdot)$ is the probability density function of the standard normal distribution and $\Phi(\cdot)$ is its cumulative distribution function. Maximum likelihood estimation yields not only the estimator $\hat{\beta}$ but also $\hat{\sigma}^2$ and $\hat{\lambda}$; substituting these into the formula above gives the estimate of technical efficiency.
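A minimal sketch of Formula (15) in Python; the survival function is used in place of $1 - \Phi$ for numerical stability, which is our implementation choice.

```python
import numpy as np
from scipy.stats import norm

def technical_efficiency(eps, sigma_v, sigma_u):
    """JLMS point estimate TE_i = exp(-E[u_i | eps_i]) under the half-normal
    specification (a sketch of Formula (15))."""
    sigma = np.sqrt(sigma_v**2 + sigma_u**2)
    lam = sigma_u / sigma_v
    sigma_star = sigma_u * sigma_v / sigma
    z = eps * lam / sigma
    # norm.sf(z) = 1 - Phi(z), evaluated stably for large z
    u_hat = sigma_star * (norm.pdf(z) / norm.sf(z) - z)
    return np.exp(-u_hat)
```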
4. Real Data
The real data come from the Farm Accountancy Data Network (FADN), which is maintained and enhanced by the EU member states and is available at http://ec.europa.eu/agriculture/rica/, accessed on 1 May 2025. We use the data from the year 2013 for the analysis. Python 3.8.10 is used to remove outliers and missing values, perform logarithmic transformation and normalization, select appropriate features, and conduct other preprocessing steps. After handling missing values and outliers and performing feature engineering, a data set consisting of 101 features is obtained. Total output is used as the dependent variable, and total input, total utilized agricultural area, and total labor input are among the independent variables. The data exhibit strong collinearity, which makes them well suited to our model. The target and source data sets are divided by economic size: one economic-size class is used as the target domain, totaling 19 samples, and four other classes are used as source domains, containing 84, 131, 132, and 74 samples, respectively. We believe that, among the independent variables, total input and total utilized agricultural area always have a positive influence on total output. This prior knowledge is available before estimation, so the constraint is set to require non-negative coefficients for these two variables. Using Algorithm 3, we find that three of the four source data sets can be transferred.
Since prior knowledge is specified before estimation, it is inevitable that some correct and practical prior knowledge may be omitted, or incorrect and irrelevant prior knowledge may be included. Other prior knowledge about the coefficients may influence the final results, but such knowledge is not as evident as that concerning the two variables mentioned above. To avoid the risk of poor outcomes caused by the incorrect introduction of prior knowledge, we conservatively apply only the non-negativity constraint described above.
To demonstrate the repeatability of our method, we perform a five-fold cross-validation analysis on the data and record the average prediction error across the five folds as the final comparative prediction error. The prediction error of each fold is measured by the mean absolute error (MAE) on the held-out fold.
We use the Elastic Net stochastic frontier model with or without linear constraints, the Lasso stochastic frontier model, and the ordinary stochastic frontier model for testing. The results after transfer are shown in
Figure 12. The various methods are explained below:
trans+ConElastic Net: Elastic Net stochastic frontier model with transferable source data and target data under added constraints.
trans+Elastic Net: Elastic Net stochastic frontier model with transferable source data and target data without adding constraints.
trans+ConLasso: Lasso stochastic frontier model with transferable source data and target data under added constraints.
trans+Lasso: Lasso stochastic frontier model with transferable source data and target data without adding constraints.
target+ConElastic Net: Elastic Net stochastic frontier model with only target data under linear constraints.
mix+ConElastic Net: Elastic Net stochastic frontier model with all data under added constraints.
trans+ConSFA: Ordinary stochastic frontier model with transferable source data and target data under linear constraints.
As shown in Figure 12, trans+ConElastic Net achieves the lowest MAE, followed by trans+Elastic Net, which demonstrates the effectiveness of our proposed algorithm. Across all methods, the MAE under constrained conditions is lower than that under unconstrained conditions, indicating that incorporating prior knowledge during coefficient estimation benefits the estimation results. When all data are used directly for stochastic frontier analysis, a large MAE is observed, further highlighting that the absence of transferable source detection can lead to significant estimation errors.
From Table 1, we can see that trans+ConElastic Net achieves the lowest standard deviation, and the standard deviation of trans+Elastic Net is second only to that of trans+ConLasso among the remaining methods, indicating that our method is both accurate and stable. We use trans+ConElastic Net as the benchmark method for the paired t-test. The results show that, except for trans+Elastic Net and trans+ConLasso, the p-values of all other methods fall below the significance level, which directly indicates that this method is significantly better than most of the comparison methods in terms of error performance. Combining the MAE, the standard deviation, and the paired t-test, we conclude that our proposed method performs best among the seven candidate methods from multiple perspectives, demonstrating strong practicality and robustness.
5. Discussion
From the above experiments, it can be seen that transfer learning for the high-dimensional stochastic frontier model via the Elastic Net is feasible and effective. First, in the presence of high-dimensional and collinear data, our model outperforms the Lasso stochastic frontier model and the ordinary stochastic frontier model. Second, adding constraints to the model, that is, incorporating prior knowledge, restricts the parameter estimates to a certain range and makes the estimation more accurate. Third, transfer learning transfers knowledge from similar source data when the target data are limited, improving the accuracy of parameter estimation and the performance of prediction. The experiments show that transfer learning is indeed feasible for the high-dimensional stochastic frontier model based on the Elastic Net and greatly improves the performance of the model. Moreover, the effectiveness of both algorithms, for the scenarios where the transferable sources are known and where they are unknown, is demonstrated by simulation experiments and a real case. We also frequently observe negative transfer, that is, a negative impact of transferred sources on the final result, which remains common in transfer learning. How to reduce or avoid negative transfer is a problem for future research, as is whether transfer learning can be applied to other statistical learning methods.
Although the proposed algorithm performs well in numerical simulation and empirical research, it lacks theoretical proof to support its effectiveness. A potential research direction is to provide statistical inference and confidence intervals for the parameters estimated by the proposed method. This will enhance the interpretability and applicability of the algorithm. In addition, extending the existing framework and theoretical analysis to other models is a worthwhile avenue of exploration. Evaluating the applicability of the proposed methods to different models may offer insights into their versatility and potential improvements in various transfer learning scenarios.