Review

A Review of End-to-End Decision Optimization Research: An Architectural Perspective

1 School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
2 School of Economics and Management, Changchun University of Science and Technology, Changchun 130022, China
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(1), 86; https://doi.org/10.3390/a19010086
Submission received: 12 November 2025 / Revised: 9 December 2025 / Accepted: 18 December 2025 / Published: 20 January 2026

Abstract

Traditional decision optimization methods primarily focus on model construction and solution, leaving parameter estimation and inter-variable relationships to statistical research. The traditional approach divides problem-solving into two independent stages: predict first, then optimize. This decoupling leads to the propagation of prediction errors: even minor inaccuracies in predictions can be amplified into significant decision biases during the optimization phase. To tackle this issue, scholars have proposed end-to-end decision optimization methods, which integrate the prediction and decision-making stages into a unified framework, thereby mitigating error propagation and enhancing overall decision performance. From an architectural design perspective, this review categorizes end-to-end decision optimization methods by how the prediction and decision modules are integrated, classifying mainstream approaches into three typical paradigms: constructing closed-loop loss functions, building differentiable optimization layers, and parameterizing the representation of optimization problems. It also examines their implementation pathways leveraging deep learning technologies. The strengths and limitations of these paradigms stem essentially from the inherent trade-offs in their architectural designs. Through a systematic analysis of existing research, this paper identifies key challenges in three core areas: data, variable relationships, and gradient propagation. Among these, handling non-convexity and complex constraints is critical for model generalization, while quantifying decision-dependent endogenous uncertainty remains an unavoidable challenge for practical deployment.

1. Introduction

In recent years, the rapid advancement of information technology, coupled with the growing maturity and widespread adoption of machine learning, has significantly accelerated the interaction, sharing, and openness of data and information. Leveraging data analysis and mining to support decision-making in both production and everyday life has become an inevitable trend. This shift is clearly reflected in publication trends; according to the Web of Science Core Collection, studies indexed with the keywords “data-driven” and “decision-making” grew slowly between 2010 and 2016, with fewer than 100 publications per year. By 2024, however, annual output in this field had surpassed 1000 articles.
Traditional uncertain decision-making methods based on limited data follow the framework of “predict first, then optimize”, in which the optimal decision is generated by solving a mathematical optimization model (e.g., stochastic programming) whose uncertain parameters are estimated by machine learning algorithms or statistical methods [1]. That is, traditional uncertain decision-making involves two stages, parameter prediction (estimation) and decision optimization, which are conducted separately [1]. The main problem with this approach is that the loss function used for prediction is the prediction error, which does not take the downstream optimization problem into account, leading to prediction error propagation: even minor inaccuracies in predictions can be amplified into significant decision biases during the optimization phase [2]. Meanwhile, when tackling large-scale problems such as multi-stage inventory management or dynamic pricing, traditional methods incur high manual costs for parameter tuning and prove inefficient. To address these challenges, the concept of end-to-end intelligent decision-making has emerged: it constructs a comprehensive architecture covering the entire process from data to decision, enabling goal-oriented learning to enhance the accuracy and efficiency of decisions.
As data science matures, the traditional decision optimization process faces unprecedented challenges, especially the separation between prediction and optimization; end-to-end decision optimization is therefore a natural direction of technological development. End-to-end decision optimization, also known as decision-focused learning in early research, is a paradigm that maps raw features directly to optimal decisions by jointly training the prediction and optimization modules. Its training goal is to minimize decision loss (rather than prediction error), or equivalently to maximize the utility of the resulting decision.
As early as 1997, the concept of end-to-end decision optimization was applied to investment optimization: Bengio et al. [3] demonstrated through theoretical exposition and data experiments that end-to-end decision results are more aligned with practical needs than those of traditional decision-making methods. Since then, this theory has been widely applied in the economic field, and with the continuous innovation of data analysis methods and techniques in recent years, the idea of end-to-end decision-making has become a research hotspot in engineering, mathematics, and management science. As demonstrated by Qi et al. [4] in their implementation for multi-cycle inventory replenishment at JD.com, end-to-end models generate replenishment decisions directly from dynamic inventory characteristics, significantly reducing inventory costs and shortening decision cycles. Compared to two-stage approaches, they eliminate the propagation of forecasting errors into safety stock calculations. In power dispatch scenarios, Yu et al. [5] employed deep reinforcement learning to construct an integrated controller. This approach achieves an end-to-end mapping between high-dimensional data (e.g., wind speed, electricity prices, energy storage status) and energy storage commands. It enables near-real-time joint optimization of wind power forecasting and energy storage actions, facilitating dynamic adjustments to minute-level market pricing.
From traditional approaches to data-driven methods incorporating machine learning and on to end-to-end intelligent decision-making, all three paradigms leverage data to enhance decision quality. However, only the end-to-end approach effectively addresses the problem of misaligned or decoupled objectives between prediction and optimization. The relationships and distinctions among these paradigms are summarized in Table 1. This review examines the paradigm of end-to-end decision optimization from the perspective of implementation architecture, offering a systematic analysis of its current challenges and future directions. Prior surveys have largely focused on generalized decision-making frameworks, and those specifically addressing end-to-end decision optimization have tended to concentrate on narrow technical aspects such as gradient processing methods; this work distinguishes itself through three key contributions:
  • Architectural Categorization: Based on how prediction and decision modules are integrated, this review categorizes end-to-end decision optimization into three paradigms: (a) constructing closed-loop loss functions, (b) building differentiable optimization layers, and (c) parameterized representation of optimization problems.
  • Comparative Analysis and Practical Guidance: Combined with specific cases, this review analyzes and contrasts the strengths, limitations, and applicability of each paradigm, offering actionable recommendations for their deployment in real-world settings.
  • Critical Synthesis of Challenges and Opportunities: This review identifies and analyzes the challenges facing end-to-end decision optimization in practical scenarios (including issues related to data, complex constraints, non-convexity, and decision-dependent uncertainty) and highlights emerging opportunities for future research.
By providing a structured architectural perspective on this rapidly evolving field, this review aims to help researchers better understand the design principles, implementation trade-offs, and open problems in end-to-end decision optimization.

2. Literature Review Analysis and Research Framework

Intelligent decision-making algorithms that combine prediction and optimization have developed with different emphases across disciplines: reviews in management focus on systematization [6,7,8,9], while those in science and engineering focus on engineering problems [10,11,12]. Table 2 presents a systematic comparison of six recent reviews across six dimensions: research field orientation, core issues, core framework, theoretical contributions, limitations, and dominant application areas.
In 2020, Yu et al. [7] summarized five key characteristics of big data decision-making in their paper: dynamic nature, holistic scope, uncertainty, a shift from causal analysis to correlation analysis, and a transition towards meeting personalized needs. They also analyzed current developments and trends, outlining the challenges facing intelligent decision-making powered by big data. In 2021, Kotary et al. [10] introduced end-to-end learning for constrained optimization problems from two perspectives: machine learning-assisted solvers for optimization problems and the use of machine learning to predict uncertain parameters in optimization problems. Their work focused on the application and prospects of machine learning techniques in combinatorial optimization problems, whilst also introducing deep learning models, including deep neural networks, recurrent neural networks, and graph neural networks. Fajemisin et al. [13] provide a framework for constrained optimization learning, whereby realistic optimization problems where constraints or objectives are not explicitly formulated can be solved in a structured manner. Yu [8] proposed five primary directions in this field in his 2022 publication, which are listed as follows: (1) AI-driven decision-making paradigm shift mechanism research; (2) data feature-driven AI prediction theory and methodology research; (3) data feature-driven AI decision optimization theory and methodology research; (4) domain knowledge-dependent AI prediction and decision-making theory and application research; (5) interpretable AI system prediction and decision optimization theory and application research. Each direction is elaborated and analyzed with examples.
In modern supply chains, industrial chains, and other complex scenarios, optimization problems face parameter uncertainty, unknown distributions, and other diverse uncertain environments. Data-driven optimization methods can notably boost solution robustness. However, traditional data-driven approaches suffer from the aforementioned prediction-decision separation and feedback neglect, which limits their capacity to address uncertainty and ultimately compromises decision accuracy. Wang et al. [9] focus on management decision-making problems in complex uncertain environments and review the integration of statistical predictive modelling and decision optimization methods to construct a data-driven joint management paradigm. Both stochastic optimization and distributionally robust optimization are developed, with applications in transportation and logistics management, inventory management, and revenue management. Sadana et al. [12] summarized different approaches integrating operations research with machine learning, categorizing data-driven decision-making methods into three classes: decision rule optimization, sequential learning and optimization (SLO), and integrated learning and optimization (ILO), thereby providing unified terminology for existing data-driven decision models and methodologies. Mandi et al. [11] categorized optimization approaches from data to decision into decision-focused learning (DFL) and prediction-focused learning (PFL). They provided a comprehensive exposition of decision-centric end-to-end methods, reviewed diverse DFL approaches alongside their practical applications and empirical evaluations, established benchmark datasets and tasks for DFL research, and offered valuable developmental insights. In this paper, we explore the approaches and challenges of achieving end-to-end data-to-decision from the perspective of implementation architecture, and the core research framework is shown in Figure 1.
To facilitate understanding, this section provides an explanation of relevant terminology. The core research object of this paper is “End-to-End Decision Optimization”. “End-to-End Intelligent Decision-Making” is a synonymous expression with “End-to-End Decision Optimization”, but the former is more macroscopic, while the latter better reflects the integration with optimization. In the subsequent content, the viewpoints proposed in this paper will uniformly adopt “End-to-End Decision Optimization”, and the terminology cited from the literature will remain consistent with the sources. “Prediction-Driven Optimization”, “Contextual Optimization”, and “Decision-Focused Learning” are related expressions; although they differ in focus, they are all encompassed within the research scope of End-to-End Decision Optimization. “End-to-End Decision Optimization” overlaps with concepts such as “Decision Rule Optimization” and “Integrated Learning and Optimization,” yet it has its own distinct emphasis. The latter concepts revolve around optimization problems but do not focus on end-to-end implementation, whereas “End-to-End Decision Optimization” represents the deep integration of modules—it not only retains the meaning of decision optimization from previous studies but also improves the end-to-end framework. “Sequential Learning and Optimization”, by its nature, is a two-stage method.

3. Decision Optimization

This section provides a foundational overview of decision-making concepts and decision optimization problems, establishing the conceptual groundwork for the discussions that follow. It is structured as follows: Section 3.1 elaborates on the understanding of decision-making across diverse domains, highlighting its conceptual essence and practical significance; Section 3.2 formulates the general mathematical expression of optimization problems and provides a concise introduction to solution approaches; and Section 3.3 provides a brief overview of two-stage decision-making.

3.1. The Concept of Decision-Making

The value of data lies primarily in providing useful information for decision-making and revealing underlying patterns and relationships through data analysis and mining. Decision-making is a ubiquitous aspect of human activities, evident in various contexts such as consumer shopping decisions, inventory control, pricing, logistics, distribution planning in enterprises, and traffic control in transportation scenarios. Decision-making is understood in different ways in different domains. At the level of business management, the decision-making process is often seen as an activity that relies on experience and intuition. Managers faced with complex problems need to blend personal experience with environmental insights to make choices that are most beneficial to the organization. The process emphasizes a thorough analysis of the situation and an assessment of potential risks. It is centered on the use of subjective judgment to optimize resource allocation and organizational efficiency. For example, when making strategic decisions, senior management will make judgments based on experience, taking into account market trends, competitive dynamics, and internal resources. Although this approach is somewhat subjective, it has the advantage of being quick and flexible in its response.
In the field of optimization modelling and data analytics, the decision-making process is much more dependent on data and logical reasoning. Optimization modelling transforms a decision-making problem into a mathematical problem and seeks an optimal solution through algorithms, with an emphasis on logic and precision. This approach involves extensive data analysis and model validation processes, such as those used in supply chain management, where optimization models are employed to determine inventory levels and transport routes with the aim of reducing costs and improving operational efficiency. In the field of machine learning, algorithms can automate decision-making, such as decision trees that enable data classification or prediction through node judgment. Model-based decision-making approaches focus on quantitative analysis and systematic processing to provide explicit solutions in a data-driven manner. In summary, whether it is experience-based management decision-making or data-based optimization decision-making, the common goal is to select the most ideal course of action through scientific methods and reasonable judgment. As society advances, the degree of uncertainty in decision-making has grown increasingly pronounced, whilst the difficulty and complexity of decisions continue to escalate alongside greater environmental openness and expanding problem scales [7]. Traditional decision-making approaches, which rely on experience or limited data, have become wholly inadequate for meeting the increasingly personalized, diverse, and intricate demands of contemporary decision-making.

3.2. Decision Optimization Problems

The optimization problem can generally be described as:
$$\min_{z \in Z} f(z)$$
where $z = (z_1, z_2, \ldots, z_n)^T \in \mathbb{R}^n$ is a decision variable, $f: \mathbb{R}^n \to \mathbb{R}$ is the objective function, and $Z \subseteq \mathbb{R}^n$ is the set of constraints or the feasible domain. When $Z = \mathbb{R}^n$, the optimization problems are called unconstrained optimization problems [14]. In this problem, the parameters in the objective function and constraints are all deterministic. However, in practical applications, many parameters cannot be known in advance. Examples include the interference of force majeure on data, data bias, discrepancies between existing models and the actual problem, and simplifications made to models that are difficult to solve. Such problems involving uncertain parameters are termed uncertain decision optimization, and their mathematical model is typically expressed as:
$$\min f(x, y) \quad \text{s.t.} \quad h(x, y) \le 0, \; y \in U$$
where $x$ is historical data, $h(\cdot)$ is the constraint, $y$ is the parameter to be estimated, and $U$ is the set of possible values of $y$ (when $y$ is obtained by point estimation, the planning problem becomes a deterministic planning problem). When solving uncertain decision optimization problems, traditional approaches employ methods such as stochastic programming, robust optimization, and fuzzy programming. Uncertain decision-making or stochastic optimization problems hold significant applications in machine learning, deep learning, and related fields. Their objective functions take the form of expectations over unknown parameters; however, owing to the uncertainty surrounding these parameters, methods often employ sample-based approximations of the objective function.
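As a minimal sketch of such sample-based approximation, consider a newsvendor-style problem solved by sample average approximation (SAA). All numbers below (cost coefficients, demand distribution) are illustrative assumptions, not drawn from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative newsvendor: choose an order quantity z under uncertain demand y.
# The true objective is an expectation over y; SAA replaces it by a sample mean.
c_over, c_under = 1.0, 4.0                        # unit overage/underage costs
demand_samples = rng.normal(100, 20, size=5000)   # stand-in for historical data

def saa_cost(z, samples):
    """Sample average approximation of E[cost(z, y)]."""
    over = np.maximum(z - samples, 0.0)    # ordered too much
    under = np.maximum(samples - z, 0.0)   # ordered too little
    return np.mean(c_over * over + c_under * under)

# The decision is a scalar here, so a simple grid search suffices.
candidates = np.linspace(50, 150, 1001)
costs = [saa_cost(z, demand_samples) for z in candidates]
z_star = candidates[int(np.argmin(costs))]

# Sanity check: for this cost structure the SAA minimizer is the empirical
# c_under / (c_under + c_over) quantile of the demand samples.
print(z_star, np.quantile(demand_samples, c_under / (c_under + c_over)))
```

The same pattern (replace the expectation with an empirical average over data, then optimize) underlies the stochastic and distributionally robust formulations discussed above, with the robust variants additionally hedging against sampling error.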
With the advent of the data era, the volume of recordable sample data has been increasing steadily, and the use of data to address uncertain decision optimization has garnered widespread attention. In optimal decision-making problems, data-driven approaches can be viewed as balancing the gap between estimation costs and actual costs, while simultaneously ensuring a certain level of out-of-sample performance [15]. Bennouna et al. [15] developed a data-driven model and gave three different out-of-sample performance models, namely super-exponential, exponential, and sub-exponential models, where the optimal data-driven formulation can be interpreted as the classical robust formulation in the super-exponential state, the entropy-distributed robust formulation in the exponential state, and the variance-penalized formulation in the sub-exponential state. Fan et al. [16] investigated two-stage stochastic optimization problems with random dependencies, wherein adaptive decisions are multiplied by uncertain parameters within the objective function and constraints. To address the computational challenges of this optimization problem, the paper proposes a scalable approximation scheme using piecewise linear and piecewise quadratic decision rules and develops a data-driven distributed robust framework to tackle distributed uncertainty. Data-driven uncertainty decision-making models incorporate uncertainty within a mathematical programming framework to achieve decisions, yet the connection between decision optimization and data characteristics remains tenuous.

3.3. Two-Stage Decision-Making: Prediction Followed by Optimization

Traditionally, decision optimization has frequently been carried out as a two-stage process of prediction followed by optimization. This two-stage decision model forms a cascading pipeline from prediction to optimization, whose mathematical essence can be expressed as follows:
  • Prediction stage:
$$\hat{y} = \arg\min_{\omega} \mathbb{E}_{(x, y) \sim D} \left[ l_{pred}(M_{\omega}(x), y) \right]$$
  • Optimization stage:
$$z^* = \arg\min_{z \in Z} f(z; \hat{y})$$
where $\omega$ denotes the learnable parameters of the predictive model $M_{\omega}(x)$, $l_{pred}(\cdot)$ is the loss function, $\hat{y}$ is the predicted value, and $z^*$ is the optimal solution to the optimization problem. Its structure is shown in Figure 2:
Although such two-stage paradigms remain irreplaceable in scenarios prioritizing interpretability [17], governed by hard constraints [18], or involving organizational decoupling [14], their inherent flaws lie in the following three aspects: (1) Predictive accuracy does not equate to decision quality; thus, when focusing on the utility derived from decisions, the two-stage paradigm leads to fragmented objectives. (2) The upstream–downstream relationship between prediction and decision-making enables error propagation, where even minor prediction deviations may precipitate catastrophic decisions. (3) Point estimation overlooks tail risks, creating blind spots in risk assessment. To circumvent these issues, end-to-end decision-making architectures have emerged as the focal point of this research.
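Flaw (1) above, that predictive accuracy does not equate to decision quality, can be illustrated with a small numeric sketch (all cost values are purely illustrative): a predictor with lower mean-squared error can still induce a worse decision than a less accurate predictor that preserves the cost ranking.

```python
import numpy as np

# True costs of three candidate actions:
c_true = np.array([10.0, 10.2, 14.0])

# Predictor A: low mean-squared error, but it mis-ranks the two close costs.
c_A = np.array([10.3, 10.1, 14.0])   # MSE ~ 0.033

# Predictor B: larger errors everywhere, but the cost ranking survives.
c_B = np.array([11.0, 11.5, 15.0])   # MSE ~ 1.23

def regret(c_pred):
    """True cost of the decision taken under predicted costs, minus the true optimum."""
    return c_true[np.argmin(c_pred)] - c_true.min()

mse = lambda c: np.mean((c - c_true) ** 2)
print(mse(c_A), regret(c_A))   # low MSE, positive regret (picks action 1)
print(mse(c_B), regret(c_B))   # high MSE, zero regret (still picks action 0)
```

Training the predictor against decision loss rather than prediction loss, as the end-to-end architectures below do, directly targets the regret quantity instead of the MSE.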

4. End-to-End Decision-Making Architecture Integrating Prediction and Optimization

Early research on end-to-end approaches was mainly applied in the field of autonomous driving, where two paradigms, reinforcement learning and imitation learning [19,20], form the mainstream directions. Reinforcement learning drives the autonomous evolution of the policy network through reward signals generated from the interaction between sensor-collected environmental data and the environment, while imitation learning builds supervised learning models on expert demonstration data to directly fit human driving behavior for optimal decision-making. Compared to traditional modular systems that rely on multi-level annotations (e.g., frame-by-frame image semantic segmentation), such approaches use only sparse action instructions as supervisory signals, which presents significant challenges of high training complexity and unstable convergence [19]. With the advancement of digital management models, the adoption of scientific and efficient decision-making methodologies has become an inevitable trend. Researchers have expanded and integrated the traditional two-stage prediction-optimization paradigm from diverse perspectives, aiming to explore viable end-to-end approaches. Current methods for achieving end-to-end decision optimization broadly fall into three categories: constructing closed-loop loss functions, building differentiable optimization layers, and parameterizing the representation of optimization problems. The advancement of deep learning has introduced novel approaches to specific decision-making problems through its robust representational capabilities. Consequently, an increasing number of deep learning-based end-to-end methods are being explored, enriching both the methodologies and applications of decision optimization.
To comprehensively investigate end-to-end decision optimization algorithms, this section will elaborate on these three approaches and present case studies of end-to-end frameworks incorporating deep learning.

4.1. Constructing Closed-Loop Losses

Traditionally, loss functions have been employed to train the predictive capabilities of machine learning models. Within the conventional decision-making paradigm of predicting first and then optimizing, loss functions are applied solely to the initial prediction phase. This approach fails to train the subsequent decision-making process and may even introduce errors. Therefore, introducing decision-oriented loss functions shifts the training focus from prediction outcomes to decision-making itself, thereby mitigating the adverse effects of separating prediction and optimization stages. The representative approach, “Smart ‘Predict, then Optimize’ (SPO)”, has been extensively explored and expanded.
The SPO method considers optimizing the model structure during prediction by treating the error of the target value as a loss function, as shown in Figure 3. Elmachtoub and Grigas [2] first proposed this framework in 2022, referring to the SPO loss function as decision error. The SPO loss function is defined as follows:
$$l_{SPO}(w^*(\hat{c}), c) := c^T w^*(\hat{c}) - z^*(c)$$
where $w^*(\cdot)$ is the decision, $\hat{c}$ is the predicted cost, $c$ is the actual cost, $z^*(c) = \min_{w \in S} c^T w$ is the optimal objective value of the optimization problem, and $S$ denotes the feasible region of the decision variables. $S$ is assumed to be nonempty: this non-emptiness is a prerequisite for the optimization problem, since if $S$ were empty there would be no valid decision $w$ satisfying the constraints and the problem would have no solution. In the actual training process, the cost is first predicted, and then the gap between the true cost of the decision made under the predicted cost and the optimal true cost is used as the training loss. From the perspective of decision error, the model integrates the prediction and decision-making processes into a unified framework, but the output of the prediction process is still the parameters of the optimization problem, so it essentially retains a two-stage prediction-then-decision structure. After analyzing the SPO loss function, the authors found that it matches the 0-1 loss function of binary classification, whose minimization is in general NP-hard [21]. Therefore, the authors used duality theory to derive a convex surrogate loss function (the SPO+ loss) [2]:
$$l_{SPO+}(\hat{c}, c) := \max_{w \in S} \left\{ c^T w - 2\hat{c}^T w \right\} + 2\hat{c}^T w^*(c) - z^*(c)$$
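To make the two definitions concrete, the following sketch evaluates both losses for a toy feasible region: the vertices of the unit simplex (i.e., choosing exactly one of $n$ actions), where any linear program over $S$ reduces to an argmin or argmax over coordinates. The cost vectors are illustrative assumptions.

```python
import numpy as np

# Feasible region S: vertices of the unit simplex (choose exactly one action),
# so a linear program over S is solved by an argmin/argmax over coordinates.
def w_star(c):
    w = np.zeros_like(c)
    w[np.argmin(c)] = 1.0
    return w

def z_star(c):
    return c.min()

def spo_loss(c_hat, c):
    # Decision error: true cost of the decision induced by the prediction,
    # minus the best achievable true cost.
    return c @ w_star(c_hat) - z_star(c)

def spo_plus_loss(c_hat, c):
    # Convex surrogate: max_{w in S} {(c - 2*c_hat)^T w} + 2*c_hat^T w*(c) - z*(c).
    return np.max(c - 2 * c_hat) + 2 * c_hat @ w_star(c) - z_star(c)

c_true = np.array([1.0, 2.0, 3.0])
c_hat = np.array([2.5, 2.0, 3.0])     # misprediction: ranks action 1 cheapest

print(spo_loss(c_hat, c_true))        # positive: the prediction flips the decision
print(spo_plus_loss(c_hat, c_true))   # upper-bounds the SPO loss
```

Note how the SPO loss is piecewise constant in $\hat{c}$ (it only changes when the argmin flips), which is exactly why the convex SPO+ surrogate is needed for gradient-based training.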
The SPO loss depends on the structure of linear programming; when the objective function is nonconvex, the loss may suffer from non-differentiability or convergence issues, which prevents its practical application. Moreover, gradient computation requires solving an optimization subproblem for each training sample, and the computational cost becomes excessive when the decision variables are high-dimensional. Liu and Grigas [22] proved risk bounds for the surrogate loss function and substantially extended the consistency results between the SPO+ loss and the SPO loss. Ho-Nguyen and Kilinç-Karzan [23] studied how prediction performance affects optimization performance under a linear objective function, derived the relationship and conditions between the two, and proved that the surrogate loss function proposed by Elmachtoub and Grigas satisfies their strong condition. These studies complementarily analyze the properties of the SPO loss, yet they still do not move beyond linear objectives. Wilder et al. [24] and Mandi et al. [25] investigated combinatorial optimization problems within the smart “predict-then-optimize” framework; their work is not limited to linear objective functions, but it can only guarantee local optimality, and its training efficiency on large-scale data is lower than that of the SPO loss. Loke et al. [26] generalized the SPO+ loss function by integrating empirical and estimated costs to construct a regularizer, which was then incorporated into the machine learning loss function, termed decision-driven regularization (DDR). However, the DDR framework relies on the setting of mixed cost weights and regularization coefficients; compared with the SPO loss, it not only increases computational complexity but also remains confined to linear objectives. Elmachtoub et al. [27] further extended the application of the SPO loss function to the scenario of unknown parameters in decision tree prediction optimization. They proposed SPO Trees (SPOTs), which use the suboptimality of decisions induced by predicted parameters as the prediction loss. This tree-structured approach bypasses the bottleneck of gradient computation, preserves model interpretability, and extends applications to discrete decision-making scenarios. However, the greedy splitting of decision trees tends to overlook the global optimality of decisions, and the model generalizes poorly under high-dimensional features. Within the same framework, Huang et al. [28] proposed the perturbation gradient (PG) loss, a decision-aware surrogate loss that approximates the directional derivative of the decision loss through zeroth-order gradient techniques, overcoming the optimization difficulties caused by the discontinuity and non-differentiability of the traditional decision loss; their theory guarantees that the approximation error vanishes in the large-sample limit and that the model still converges asymptotically to the optimal policy even under model misspecification. However, the PG loss is non-convex, so training is prone to becoming trapped in local optima and thus relies on initialization strategies. Schutte et al. [29] proposed three robust loss functions, the Robust Optimization (RO) Loss, the Best k Decisions (Top-k) Loss, and the k-Nearest Neighbour (k-NN) Loss, to approximate expected regret more robustly; these integrate robust optimization and ensemble learning ideas into end-to-end loss design and provide a new path for addressing model misspecification and decision generalization under uncertainty. Robust losses achieve lower decision error than the SPO loss in noisy scenarios, but their accuracy is inferior to the SPO loss in noise-free scenarios.
The research mentioned above corrects the parameters of predictive models by incorporating feedback from decision-making. Although its numerical experiments using synthetic and real data validate the method’s effectiveness, the analysis primarily focuses on linear programming scenarios, leveraging the structural characteristics of optimization models. Additionally, the real-world cost parameters within the loss function are derived from historical data, which cannot guarantee they represent costs under optimal decisions. This indirectly impacts decision accuracy. This labeling issue remains a significant challenge in the field of end-to-end decision optimization.

4.2. Constructing a Differentiable Optimization Layer

In recent years, learning-based optimization has emerged as a research hotspot in the field of combinatorial optimization, leveraging machine learning to boost the efficiency of traditional algorithms. Among these approaches, neural networks exemplify an end-to-end methodology that transforms data directly into results. Current research advancements have enabled the integration of convex optimization problems as differentiable layers within neural networks, facilitating a “one-stop” solution for prediction-optimization problems and yielding final decisions, as shown in Figure 4. The crux of this end-to-end decision optimization framework lies in resolving gradient propagation issues for differentiable layers in neural networks, thereby allowing optimization layers to engage in feedback processes.
Butler et al. [30] introduced an alternative network architecture based on the Alternating Direction Method of Multipliers (ADMM) to address the challenge of embedding medium- and large-scale quadratic programs into neural networks. Its computational benefits mitigate the variable-count complexity of solving exactly with interior-point methods. However, it relies on the choice of penalty parameters and does not support non-convex constraints. Amos [31] introduced OptNet, which embeds quadratic programs as layers in neural networks. This approach encodes constraints and complex dependencies between hidden states that are hard to capture with traditional convolutional and fully connected layers. The authors also leveraged GPU-based batch processing to develop an efficient solver integrated with backpropagation gradient computation, achieving improved efficiency while controlling computational cost. Experimental validation on Sudoku instances demonstrated that OptNet excels at learning hard constraints. While OptNet's backpropagation can reuse the results of the forward factorization, matrix storage and computational costs surge in high-dimensional settings, limiting its applicability to small-scale problems. Agrawal et al. [32] further extended this framework to convex cone programs, introducing the CVXPY Layer technique. By transforming convex programs into standard cone form, it enables differentiable solutions for any convex program. While efficient for small-scale problems (fewer than 100 variables), its scalability to medium- and large-scale scenarios is hindered by the O(n3) complexity of KKT matrix factorization. However, it exhibits distinct advantages under sparse data scenarios.
It can reduce complexity through matrix sparsity optimization and is suitable for convex optimization problems requiring flexible modeling. McKenzie et al. [33] proposed a differentiable optimization framework rooted in the Davis–Yin splitting algorithm. Specifically, this framework transforms integer linear programming (ILP) into a composite optimization problem and leverages neural networks to dynamically adjust iterative parameters (e.g., step size and momentum), realizing a deep integration of the optimization process with learning mechanisms. Experimental results demonstrate that, on knapsack and scheduling tasks, this framework significantly outperforms traditional solvers and purely data-driven models, providing a paradigm that couples theoretical guarantees with computational practicality for the efficient solution of ILP problems. However, the method applies only to convex optimization problems with linear constraints, and nonlinear objectives require prior linear approximation. Liu et al. [34] constructed an end-to-end framework integrating graph neural networks (GNNs) with combinatorial optimization problems. Within this framework, they designed an unsupervised graph prediction module and an unconstrained optimization solver. Experimental findings indicate that this end-to-end approach outperforms both standalone GNN-based methods and classical combinatorial optimization methods. However, pre-training the GNNs requires substantial graph data, leading to poor generalization in few-shot scenarios. Moreover, some combinatorial problems incur high transformation costs, restricting applicability to graph-structured combinatorial optimization problems and limiting generality.
In the method of embedding optimization problems into neural networks as optimization layers, the KKT method incurs a significantly higher computational cost for a single gradient calculation than forward-solving approaches when dealing with a large number of variables. To address the issues of gradient computation complexity and memory consumption, Sun et al. [35] proposed alternating differentiation (Alt-Diff). This method substantially reduces the dimension of the Jacobian matrix by alternately updating the gradients of primal variables and dual variables. While maintaining the theoretical consistency of KKT condition differentiation, Alt-Diff supports truncation acceleration, providing a reference for efficient learning in large-scale constrained optimization problems. However, this method still has limitations: its forward propagation relies on ADMM iterations, resulting in lower efficiency than OptNet for small-scale problems. It is applicable to convex objectives with polyhedral constraints and does not support non-convex problems or complex nonlinear constraints.
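The gradient-propagation problem these layers solve can be illustrated with implicit differentiation of an equality-constrained quadratic program, a simplified cousin of what OptNet [31] handles (OptNet also treats inequality constraints, which add complementarity terms to the KKT system). The numpy-only sketch below, with hypothetical names, differentiates the solution z* with respect to the linear cost term by re-solving the same KKT system:

```python
import numpy as np

def qp_layer(Q, c, A, b):
    """Solve min_z 0.5 z'Qz + c'z  s.t.  Az = b via the KKT system, and
    return the solution plus the Jacobian dz*/dc by implicit differentiation."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(K, rhs)
    z = sol[:n]
    # Differentiating the KKT conditions w.r.t. c gives K @ d(sol)/dc = [-I; 0],
    # so the backward pass reuses the same KKT matrix as the forward solve.
    dsol_dc = np.linalg.solve(K, np.vstack([-np.eye(n), np.zeros((m, n))]))
    return z, dsol_dc[:n]

# Project c = (0.3, -0.3) case: min 0.5||z||^2 + c'z subject to z1 + z2 = 1.
Q = np.eye(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([0.3, -0.3])
z, J = qp_layer(Q, c, A, b)   # z = [0.2, 0.8]; J = [[-0.5, 0.5], [0.5, -0.5]]
```

In a learned pipeline, `J` is what backpropagation would multiply upstream gradients by; the O(n³) factorization of `K` is exactly the scalability bottleneck discussed above, and methods such as Alt-Diff aim to avoid forming this full system.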
All the cases of constructing differentiable optimization layers mentioned in this section share a core objective: addressing the gradient propagation challenge that arises when an optimization problem is integrated as a layer within a neural network. Current research primarily focuses on convex optimization problems and combinatorial optimization problems, which makes it difficult to guarantee effectiveness in scenarios involving problems whose convexity or concavity cannot be determined. Meanwhile, in the application of supervised learning, nearly all these methods pre-assume the existence of optimal decision labels—an assumption that is rarely feasible in practical optimization problems. This challenge also persists in other supervised end-to-end approaches. The differentiable handling of non-convex optimization problems will be further discussed in Section 5.4.

4.3. Parameterized Representation of Optimization Problems

The core idea of parameterized representation for optimization problems is to parameterize the decision variables, objective functions, or constraint conditions of an optimization problem, and integrate them with a learnable parameter model. This integration enables the model’s training process to guide the solution of the optimization problem directly, and its architecture is shown in Figure 5.
Guler et al. [36] defined the coefficients of the optimization problem as model parameters β (e.g., the coefficients of a linear model). With input features x, a parameterized function V_p = V_p(x, β) was derived. By further parameterizing the optimization problem as Obj(z, V_p(x, β)), a decision solution z_p = s(x, β) depending on these parameters was obtained. However, repeatedly invoking the optimizer during training consumes computational resources, requiring a balance between efficiency and accuracy in practical applications. Donti et al. [37] investigated end-to-end machine learning methods in stochastic optimization. They adopted the objective function of the downstream task as the loss function and verified the method's effectiveness through numerical experiments, though without theoretical analysis. A notable limitation of this method is that when decision variables and intermediate variables are not related by simple addition and subtraction, the conventional solvers used in model construction cannot support an integrated model. Oroojlooyjadid et al. [38] proposed leveraging deep learning to address the Newsvendor Problem in inventory management, deriving optimal order quantities directly from data. Specifically, this study [38] designed two loss functions for deep neural networks: the sum of squares of the cost function and a modified Euclidean loss. Zhang and Gao [39] observed that the cost function of the Newsvendor Problem itself can serve as the loss function for deep neural networks, and they evaluated its performance. However, the construction of these loss functions relies on the particular structural characteristics of the specific problems.
These methods, based on deep neural networks, achieve the mapping from features to optimal order quantities, skipping the estimation of demand distributions and directly minimizing newsvendor costs. However, they face challenges in tuning hyperparameters such as network structures and learning rates, and their ability to scale to complex constraints is limited. Since the network weight parameters lack intuitive meaning, it is impossible to explain the causal relationship between these parameters and the decisions. Therefore, such methods are suitable for scenarios requiring high flexibility but low interpretability—e.g., low-risk scenarios with multi-feature, stochastically parameterized constraints that accept black-box decisions.
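The idea of using the task cost itself as the training loss, as in [38,39], can be sketched in a few lines. The example below is a deliberately minimal linear policy trained by subgradient descent (all names and hyperparameters are illustrative, not taken from the cited papers); with holding cost h and underage penalty p, the learned order quantity should approach the p/(p+h) demand quantile, without ever estimating the demand distribution.

```python
import numpy as np

def newsvendor_cost(q, d, h=1.0, p=3.0):
    """Holding cost h per unit of overage, penalty p per unit of underage."""
    return h * np.maximum(q - d, 0) + p * np.maximum(d - q, 0)

def fit_order_policy(X, d, h=1.0, p=3.0, lr=0.05, epochs=500):
    """Linear policy q = X @ w trained by subgradient descent on the cost itself."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        q = X @ w
        # Subgradient of the piecewise-linear newsvendor cost w.r.t. q
        g = np.where(q > d, h, -p)
        w -= lr * (X.T @ g) / len(d)
    return w

rng = np.random.default_rng(0)
X = np.ones((200, 1))               # intercept-only features for simplicity
d = rng.uniform(0, 10, size=200)    # demand samples
w = fit_order_policy(X, d)
# With h=1, p=3 the optimal order is the p/(p+h) = 0.75 demand quantile (about 7.5 here).
```

Replacing `X` with real feature vectors turns this into the feature-to-decision mapping described above; the hyperparameter-tuning and constraint-scaling difficulties noted in the text apply unchanged.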
Kong et al. [40] innovatively employed an energy-function-based differentiable optimization layer to parameterize the original optimization problem. Building on this foundation, they leveraged a maximum-likelihood loss to develop a self-normalized importance sampler based on a Gaussian mixture model. Termed SO-EBM, this method provides a novel approach for efficiently training energy-based models (EBMs). It overcomes the limitations of convex optimization but relies on the quality of "feature-optimal decision" pairs, incurring high preprocessing costs; complex constraints still require additional handling. Sun et al. [41] adopted a model-agnostic unsupervised method that learns feasible solutions directly from historical data and problem parameters. They appended a modified activation function at the end of the network to enforce constraints implicitly, replacing traditional explicit constraint handling, and designed a corresponding unsupervised loss function: by directly minimizing the degree of constraint violation (e.g., the sum of squares of constraint slack terms), the method eliminates reliance on labeled data. Although this method lacks explicit interpretability, it offers a more flexible solution for black-box problems with unknown or dynamically changing constraints. It is worth noting that the method obtains decisions through interactive exploration of the environment, resulting in poor generalization and low efficiency in few-shot scenarios, as well as high hyperparameter-tuning costs. While such methods adapt more readily to different problem types and offer higher flexibility, they sacrifice model interpretability.
Different from methods that construct loss functions, the Local Optimization Decision Loss (LODL) framework proposed by Shah et al. [42] addresses the limitations of traditional Decision-Focused Learning (DFL), which relies on manually designed surrogate losses and differentiable optimization, by automatically building task-relevant local convex loss functions during the learning phase. It leverages a black-box optimizer to generate decision samples and directly fits the asymptotically optimal decision quality, enabling more general and scalable end-to-end training. However, this method is also highly dependent on data quality and only fits local decision losses, so its global generalization ability relies on sample density. Mandi et al. [43] further framed decision optimization problems as ranking problems, with the goal of learning an objective function that correctly ranks feasible points. This learning-to-rank framework reduces the number of optimizer calls, improving training efficiency. However, the quality of the feasible solution set significantly affects performance, and sparse subsets are prone to causing ranking bias.
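The learning-to-rank view of [43] can be illustrated with a pairwise hinge loss over a cached set of feasible points. The function below is a hypothetical minimal rendering (the name, margin, and data are ours): the loss is zero exactly when the predicted scores order the points consistently with their true objective values, and it penalizes every misordered pair otherwise.

```python
import numpy as np

def pairwise_ranking_loss(scores, true_costs, margin=0.1):
    """Hinge loss encouraging predicted scores to rank feasible points
    in the same order as their true objective values (lower is better)."""
    loss = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if true_costs[i] < true_costs[j]:
                # Point i is truly better, so its score should be lower by `margin`.
                loss += max(0.0, margin + scores[i] - scores[j])
    return loss

costs       = np.array([1.0, 2.0, 3.0])   # true objective values of 3 feasible points
scores_good = np.array([0.1, 0.5, 0.9])   # consistent ordering -> zero loss
scores_bad  = np.array([0.9, 0.5, 0.1])   # reversed ordering -> positive loss
```

In the full framework the scores come from a predictive model, and only the cached feasible points are scored, which is how the approach avoids calling the optimizer at every training step; the dependence on how well that cache covers the feasible region is the ranking-bias issue noted above.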
The aforementioned methods directly learn decisions by establishing different learning models, achieving a direct mapping from features to decisions. They offer high flexibility in optimization problems but generally suffer from the black-box characteristic of insufficient interpretability, strong data dependence, and limited ability to handle complex constraints. Consequently, black-box decision learning struggles to meet the requirements of high-risk scenarios such as finance and healthcare. However, it exhibits high efficiency for low-risk optimization problems with unknown objectives and constraints, such as inventory management and path planning.
Within this framework, reinforcement learning effectively avoids issues related to label quality and structural interpretability. In recent years, solving combinatorial optimization problems using deep reinforcement learning has also attracted widespread attention, as it combines the strong decision-making capabilities of reinforcement learning with the excellent perception capabilities of deep learning [44]. Solozabal et al. [45] defined constrained combinatorial problems as fully observable Constrained Markov Decision Processes (CMDPs) and proposed a deep reinforcement learning-based framework for solving combinatorial optimization problems; this method demonstrates advantages in computing fast solutions. Wang et al. [46] mapped data features to Markov Decision Process (MDP) parameters and obtained policies by solving the predicted MDP using deep reinforcement learning. Ge et al. [47] proposed an end-to-end deep reinforcement learning network framework to address constrained Vehicle Routing Problems. As deep reinforcement learning has achieved significant successes in various fields, uncertainties in network models and interaction processes have garnered attention. Pang et al. [48] explored deep reinforcement learning methods based on uncertainty, sorted out the basic principles, advantages, and disadvantages of various methods, and provided prospects for future development. Therefore, there is substantial room for research on the integration of deep learning and decision optimization.

4.4. Architectural Comparison and Design Implications

This section provides an overview of end-to-end decision optimization methods and summarizes the three solutions through Figure 6 and Table 3. Figure 6 presents a unified architectural diagram of the three paradigms: its connecting lines and annotated components visualize the data flow directions, learning-signal types, and decision-making interfaces of each paradigm. It can be clearly observed that the three paradigms share commonalities in their core processes while retaining distinct characteristics.
The essence of constructing a closed-loop loss is to integrate the prediction process and decision-making process through the loss function. Since the optimization during training still focuses on the predictive model itself, no additional computational overhead is introduced—this makes it particularly well-suited for low-latency, cost-sensitive scenarios such as linear programming and small-scale inventory optimization. Embedding an optimization problem as a layer into the neural network requires the optimization process to participate in gradient feedback directly. The key to this technique lies in implementing the differential iteration of the optimization layer and incorporating interpretability operations into the model architecture. This makes the approach more advantageous for high-interpretability, high-stakes convex optimization scenarios like financial portfolio optimization; however, it also increases model complexity and computational costs, and struggles to accommodate non-convex constraint problems. The paradigm of parameterizing optimization problems involves converting the optimization problem itself into a learnable parameter form, which is then trained alongside sample data—ultimately outputting decisions directly via the network. This method offers strong dynamic adaptability, making it suitable for non-convex/discrete and highly dynamic scenarios, such as large-scale path planning and power dispatch. However, its “black-box” structural design results in poor interpretability and requires substantial data (or environment-interaction samples). In practical applications, the appropriate solution should be selected based on the core needs of the specific scenario.

5. Current Research Directions and Challenges

Although the end-to-end decision optimization methods mentioned earlier have been validated in simulations and numerical experiments, their inherent limitations still severely restrict their wide application in complex real-world scenarios. In current research, most studies assume data availability or use synthetic data [2,33,36,40], and the quality of data and labels is crucial for decision optimization learning. To simplify the problem during research, it is generally assumed that intermediate variables have a one-way impact on decisions—with decisions only exerting a feedback effect on data or variables—while the causal relationship between decisions and variables in real-world scenarios is not considered [37]. In terms of scalability, the construction of some models relies on the specific structure of optimization problems. For instance, methods in [2,33] are effective only for linear objective functions, and the method in [31] targets only quadratic optimization problems; these methods are not necessarily effective for non-convex problems. The key to solving non-convex problems in end-to-end decision optimization lies in gradient propagation, which also constitutes a significant challenge in terms of model generalization and constraint handling. Next, we will elaborate on the representative challenges of end-to-end decision optimization.

5.1. Data Sources

In machine learning, a key assumption is that models learn from and are evaluated on historical data of good quality and appropriate volume. In practice, however, most enterprises and merchants fail to collect comprehensive data or to utilize data effectively, resulting in poor data quality. Data volume matters as well: excessive data drives computational costs beyond what real-time decision-making permits, while insufficient data prevents the model from learning complete decision rules, reducing its generalization ability. Some studies (e.g., Besbes et al. [49]) have explored the impact of data volume on the newsvendor problem, but universal solutions remain lacking. Combined with data quality flaws that undermine the model's decision logic, these issues cause the practical performance of end-to-end decision optimization methods to fall significantly below that observed in simulated scenarios. Developing data preprocessing techniques tailored to decision optimization tasks will be a significant endeavor in the future.

5.2. Label-Related Issues

In supervised end-to-end decision optimization, a key assumption is that high-quality labels (e.g., true cost vectors, optimal decisions, optimal paths) are easily accessible for model training and evaluation. However, in practical scenarios, such labels are inherently difficult to obtain—especially for combinatorial optimization problems, where historical data rarely contains strictly optimal or even suboptimal decision solutions. The computational method for simulating cost labels proposed by Elmachtoub and Grigas [2] has been applied to decision research in other similar optimization problems. In some small-sample studies, manual labeling is adopted; however, these approaches lack scalability. Kossen et al. [50] proposed the Active Surrogate Estimators (ASEs) method, which leverages interpolation and active learning to more efficiently predict the loss of unlabeled samples, thereby reducing reliance on direct labeling. Settles [51] and Liu et al. [52] integrated concepts such as active learning and degradation distance to screen and learn instances, prioritizing the labeling of samples that can maximize the improvement of decision performance, thus minimizing annotation costs while ensuring model effectiveness. These studies can extend to several promising research directions, such as improving model-based label estimation techniques, refining active learning strategies based on decision optimization characteristics, and constructing models that do not require decision labels.

5.3. Intrinsic Uncertainty

In existing decision optimization research, a common assumption is that uncertain parameters exert a unidirectional influence on decisions. While this simplifies end-to-end decision models, it overlooks the complex bidirectional relationship between parameters and decisions in real-world scenarios. In practice, decisions often reshape the distribution of uncertain parameters (i.e., decision dependence), a dependence that is difficult to characterize because models fail to disentangle the bidirectional influences effectively. A typical example is the facility location problem: facility locations are chosen based on surrounding demand, yet the established facilities in turn alter the local demand distribution. Such demand exhibits the key characteristic of decision dependence. To address this phenomenon, Basciftci et al. [53] proposed a modeling approach based on distributionally robust optimization (DRO). This approach interprets stochastic demand as a function of facility location decisions and reformulates the two-stage distributionally robust optimization model into an integrated formulation. Empirical results demonstrate that this method both accelerates computation and enhances profit performance. Bertsimas et al. [1] proposed a strategy that decomposes the decision process influencing uncertain parameters into two separate components, thereby improving the robustness of the solution. In pricing research, the causal uncertainty between demand and price likewise poses significant quantification challenges. To address this, Liu et al. [54] investigated the data-driven newsvendor pricing problem, assigning weights to historical samples based on historical demand and feature data, and constructed an approximate model. However, due to the dependence between demand and decisions, the model exhibits high computational complexity and is difficult to solve directly.
In response, the authors proposed the Approximate Gradient Descent (AGD) algorithm and demonstrated that the model extends to a broad range of decision-dependent problems, verifying its generality. Alley et al. [55] investigated the estimation of heterogeneous price sensitivity in the secondary ticket market and proposed a double machine learning approach. It removes the confounding influences of market characteristics and prices through orthogonalization to isolate the causal effect of price on demand, and incorporates a loss function compatible with non-parametric models (e.g., gradient-boosted trees), allowing the flexible capture of nonlinear interactions between price sensitivity and ticket attributes. The authors then applied these estimates to transaction management to optimize revenue. However, these existing methods are all problem-specific, and precisely quantifying the causal uncertainty between parameters and decisions remains challenging. Developing a universal modeling framework for decision-dependent uncertainty, integrating distributionally robust optimization, causal inference, and machine learning to accommodate diverse scenarios, therefore represents a valuable research direction.

5.4. Gradient Propagation for Non-Differentiable Decision Modules

When addressing combinatorial optimization problems, a common assumption is that the problem exhibits a convex feasible region or a convex objective function, thereby enabling practical gradient propagation. However, in real-world scenarios, non-convexity is ubiquitous, which renders traditional gradient propagation methods ineffective. This issue is particularly evident in the construction of differentiable optimization layer architectures. To address the non-differentiability of combinatorial optimization problems, the solutions include:
(1)
Continuous relaxation and convex approximation: approximating non-convex feasible regions via convex surrogate problems to construct a differentiable mapping path [32,56];
(2)
Encoding heuristic rules for combinatorial optimization as differentiable operators [57];
(3)
Handling discrete variables through implicit differentiation [58] or perturbation smoothing [59].
Berthet et al. [59] proposed the stochastically perturbed optimizer, which adds random noise to the parameters of a discrete optimization problem so that the expected solution is smooth and differentiable. Although this perturbation-based method performs well in classification, sorting, and path prediction tasks, it suffers from significant gradient errors in supply chain problems [24]. Wilder et al. [24] integrated discrete optimization problems into predictive models and used continuous relaxations of the discrete problems to propagate gradients through the optimization process. They further instantiated the framework on linear programming and submodular maximization problems, but the method applies only to specific non-convex structures. Mandi et al. [60] utilized the logarithmic barrier terms widely used in interior-point solvers and considered the homogeneous self-dual formulation of linear programming (LP) rather than differentiating the KKT conditions, verifying the method's effectiveness experimentally. However, these methods are confined to linear programming contexts, and such limitations restrict the generalization ability of end-to-end decision optimization in practical problems. For gradient propagation in non-convex problems, promising directions include designing methods that dynamically adjust gradient propagation strategies based on the specific non-convex structure of the target problem, and integrating meta-heuristic algorithms with differentiable models to escape local optima in non-convex scenarios and achieve global optimization in end-to-end training.
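The perturbation idea of [59] can be sketched for the simplest discrete solver, an argmax over candidate items. The hard argmax has zero gradient almost everywhere, but its Gaussian-perturbed expectation varies smoothly with the scores; the Monte-Carlo estimate below (function names and sample counts are our own) illustrates the smoothing:

```python
import numpy as np

def hard_argmax(theta):
    """Discrete solver: one-hot indicator of the best score."""
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def perturbed_argmax(theta, sigma=1.0, n_samples=2000, rng=None):
    """Monte-Carlo estimate of E[argmax(theta + sigma * eps)], which is a
    smooth (hence differentiable) function of theta, unlike hard_argmax."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal((n_samples, theta.size))
    sols = np.array([hard_argmax(theta + sigma * e) for e in eps])
    return sols.mean(axis=0)

theta = np.array([2.0, 1.0, 0.0])
print(hard_argmax(theta))        # exactly [1, 0, 0]
print(perturbed_argmax(theta))   # soft weights summing to 1, largest on index 0
```

Smaller `sigma` makes the smoothed output closer to the hard solution but the implied gradients noisier; this bias-variance trade-off is one source of the gradient errors reported for supply chain problems in [24].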

5.5. Embedding and Satisfying Complex Constraints

In complex real-world problems, one of the core challenges of end-to-end decision frameworks is how to embed complex constraints (e.g., physical equations, logical rules, or resource limitations) into differentiable computation graphs while ensuring solution feasibility. However, commercial solvers, which are widely used for solving optimization problems, cannot natively handle unstructured constraints output by neural networks. While common solutions exist, including the Lagrange multiplier method [61], dual variable learning [62,63], and feasible solution projection layers [24], they have inherent limitations. Fioretto et al. [61] introduced dual variables to convert the physical constraints of power grids into differentiable penalty terms, which were integrated into the loss function, significantly reducing the constraint violation rate. However, the update step size of dual variables requires manual parameter tuning, and the method may converge to suboptimal solutions in non-convex problems. To avoid explicit updates of dual variables, Kervadec et al. [63] proposed a logarithmic barrier extension to construct a smooth surrogate function for approximating constraint boundaries, where the gradient implicitly contains information on the dual variables. This method does not require an initial feasible solution, and the upper bound of the duality gap provides a suboptimality guarantee. However, it relies on smooth surrogate functions, which may introduce approximation errors. Wilder et al. [24] adopted quadratic regularization projection (QPTL) for resource allocation problems, which smooths the decision space through a quadratic regularization formulation, but lacks generalizability. Recent works have attempted to integrate multiple types of methods, such as combining hierarchical projection with dual learning, and using implicit gradient projection [64]. 
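The penalty-based embedding used in works such as [61] can be illustrated with a toy problem: minimizing a quadratic objective while discouraging violation of a linear constraint through a differentiable penalty term. The sketch below is our own minimal rendering, not the power-grid application in [61]; note how the converged point slightly violates the constraint, reflecting the softness of any finite penalty weight.

```python
import numpy as np

def constrained_step(z, target, lam=50.0, lr=0.004):
    """One gradient step on f(z) = ||z - target||^2 augmented with a
    differentiable penalty lam * max(0, 1'z - 1)^2 for sum(z) <= 1."""
    viol = max(np.sum(z) - 1.0, 0.0)
    grad = 2.0 * (z - target) + 2.0 * lam * viol * np.ones_like(z)
    return z - lr * grad

target = np.array([0.9, 0.9])   # unconstrained optimum violates sum(z) <= 1
z = np.zeros(2)
for _ in range(3000):
    z = constrained_step(z, target)
# z settles near the projection [0.5, 0.5] of target onto {sum(z) <= 1},
# with a small residual violation that shrinks as lam grows.
```

Increasing `lam` tightens feasibility but stiffens the gradients (the step size must shrink accordingly), which is the practical tuning difficulty the dual-variable and barrier methods above are designed to avoid.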
It can be observed that subsequent efforts to satisfy constraints still involve the gradient propagation issue of non-differentiable decision modules mentioned in the previous section. Therefore, constraint satisfaction and differentiability handling constitute key challenges in solving complex decision problems. In future research, promising directions may include constructing adaptive constraint embedding frameworks, enhancing the accuracy of constraint approximation, and developing universal constraint handling architectures.
The challenges discussed above affect the three architectural paradigms of end-to-end decision optimization to differing degrees. Data quality is a universal challenge for all paradigms, while the decision-label issue has the least impact, if any, on frameworks built around closed-loop losses. Differentiable optimization layers are most severely constrained by non-convexity and constraint feasibility; their core reliance on gradient propagation renders them incompatible with non-convex combinatorial structures. Compared with paradigms that parameterize optimization problems, however, they exhibit stronger resilience to label scarcity. Parameterized representations of optimization problems, given their black-box nature, are primarily limited by decision labels and decision-dependent uncertainty, yet they excel at handling non-convexity. These challenges are not isolated occurrences but consequences of the inherent trade-offs within each architectural paradigm.

6. Summary and Prospect

The end-to-end decision optimization paradigm avoids the error propagation inherent in the traditional “predict-then-optimize” paradigm by establishing a direct mapping from input data to decisions, and thus exhibits unique advantages in dynamic problems and real-time decision scenarios. Like any technology, end-to-end decision optimization must be applied with a dialectical perspective. Through a series of theoretical analyses, Elmachtoub et al. [65] demonstrated that end-to-end methods integrating estimation and optimization perform better on practical problems in general settings; however, when the model is nearly well-specified and extreme risks (e.g., tail losses in finance) are a concern, the “predict-then-optimize” method is more suitable. This paper focuses on the implementation architecture of end-to-end decision optimization, categorizing existing solutions into three paradigms for systematic review while exploring the field's key challenges and future directions. Based on the discussions above, this paper puts forward the following actionable takeaways for the integration of machine learning with optimization: (1) The model architecture should be matched to the requirements of the optimization problem; for problems with high interpretability demands, paradigms that construct differentiable optimization layers, or traditional methods, should be preferred. (2) Conquering the challenges of gradient propagation and constraint satisfaction in end-to-end decision optimization is key to the practical application of the field. (3) Cross-paradigm combination schemes should be explored to balance flexibility and interpretability and to improve the generalization ability of models.
As a rapidly evolving interdisciplinary field, end-to-end decision optimization integrates knowledge from multiple disciplines, including mathematics, statistics, operations research, management, and computer science. Continuous exploration and practice are still required in aspects such as model theory, algorithm implementation, and solution deployment.

Author Contributions

W.Z. designed the research, conducted the literature review and analysis, and wrote the main manuscript text. G.L. participated in the literature analysis, provided research direction, overall supervision, and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the General Program of Natural Science Foundation of Jilin Provincial Department of Science and Technology under Grant 20230101177JC.

Data Availability Statement

No datasets were generated or analyzed during the current study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bertsimas, D.; Kallus, N. From Predictive to Prescriptive Analytics. Manag. Sci. 2020, 66, 1025–1044.
  2. Elmachtoub, A.N.; Grigas, P. Smart “Predict, then Optimize”. Manag. Sci. 2022, 68, 9–26.
  3. Bengio, Y. Using a Financial Training Criterion Rather than a Prediction Criterion. Int. J. Neural Syst. 1997, 8, 433–443.
  4. Qi, M.; Shi, Y.; Qi, Y.; Ma, C.; Yuan, R.; Wu, D.; Shen, Z.J. A Practical End-to-End Inventory Management Model with Deep Learning. Manag. Sci. 2022, 69, 759–773.
  5. Yu, Y.; Yang, J.; Yang, M.; Gao, Y. Integrated Scheduling of Wind Farm Energy Storage System Prediction and Decision-Making Based on Deep Reinforcement Learning. Power Syst. Autom. 2021, 45, 132–140.
  6. Wang, H.W.; Qi, C.; Wei, Y.C.; Li, B.; Zhu, S. Review on Data-Based Decision Making Methodologies. Acta Autom. Sin. 2009, 35, 820–833.
  7. Yu, H.; He, D.; Wang, G.; Li, J.; Xie, Y. Big Data Intelligent Decision Making. Acta Autom. Sin. 2020, 46, 878–896.
  8. Yu, L. Research on Theory and Method of Prediction and Decision Optimization Based on Artificial Intelligence. Manag. Sci. 2022, 35, 60–66.
  9. Wang, S.; Mao, Y.; Wang, S. Prediction-Driven Optimization: Uncertainty, Statistical Theory, and Management Applications. Chin. Sci. Found. 2024, 38, 750–761.
  10. Kotary, J.; Fioretto, F.; Van Hentenryck, P.; Wilder, B. End-to-End Constrained Optimization Learning: A Survey. arXiv 2021, arXiv:2103.16378.
  11. Mandi, J.; Kotary, J.; Berden, S.; Mulamba, M.; Bucarey, V.; Guns, T.; Fioretto, F. Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities. J. Artif. Intell. Res. 2024, 80, 1623–1701.
  12. Sadana, U.; Chenreddy, A.; Delage, E.; Forel, A.; Frejinger, E.; Vidal, T. A Survey of Contextual Optimization Methods for Decision-Making Under Uncertainty. Eur. J. Oper. Res. 2025, 320, 271–289.
  13. Fajemisin, A.O.; Maragno, D.; den Hertog, D. Optimization with Constraint Learning: A Framework and Survey. Eur. J. Oper. Res. 2024, 314, 1–14.
  14. Liu, H.; Hu, J.; Li, Y.; Wen, Z. Optimization: Modeling, Algorithms, and Theory; Higher Education Press: Beijing, China, 2020.
  15. Bennouna, A.; Van Parys, B.P.G. Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions. Math. Program. 2025, 1–93, article in advance.
  16. Fan, X.; Hanasusanto, G.A. A Decision Rule Approach for Two-Stage Data-Driven Distributionally Robust Optimization Problems with Random Recourse. INFORMS J. Comput. 2024, 36, 526–542.
  17. Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215.
  18. Zhang, H.; Li, R.; Chen, Y.; Chu, Z.; Sun, M.; Teng, F. Risk-Aware Objective-Based Forecasting in Inertia Management. IEEE Trans. Power Syst. 2024, 39, 4612–4623.
  19. Chen, Y.; Tian, D.; Lin, C.; Yin, H. Survey of End-to-End Autonomous Driving Systems. J. Image Graph. 2024, 29, 3216–3237.
  20. Chen, L.; Wu, P.; Chitta, K.; Jaeger, B.; Geiger, A.; Li, H. End-to-End Autonomous Driving: Challenges and Frontiers. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10164–10183.
  21. Ben-David, S.; Eiron, N.; Long, P.M. On the Difficulty of Approximately Maximizing Agreements. J. Comput. Syst. Sci. 2003, 66, 496–514.
  22. Liu, H.; Grigas, P. Risk Bounds and Calibration for a Smart Predict-then-Optimize Method. Adv. Neural Inf. Process. Syst. 2021, 34, 22083–22094.
  23. Ho-Nguyen, N.; Kılınç-Karzan, F. Risk Guarantees for End-to-End Prediction and Optimization Processes. Manag. Sci. 2022, 68, 8680–8698.
  24. Wilder, B.; Dilkina, B.; Tambe, M. Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 1658–1665.
  25. Mandi, J.; Demirović, E.; Stuckey, P.J.; Guns, T. Smart Predict-and-Optimize for Hard Combinatorial Optimization Problems. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
  26. Loke, G.G.; Tang, Q.; Xiao, Y. Decision-Driven Regularization: A Blended Model for Predict-then-Optimize. SSRN 3623006, 2022. Available online: https://www.semanticscholar.org/paper/Decision-Driven-Regularization-A-Blended-Model-for-Loke-Tang/df9b74cf4b2d7b9642d8bbb585eb086316953df2 (accessed on 11 November 2025).
  27. Elmachtoub, A.N.; Liang, J.C.N.; McNellis, R. Decision Trees for Decision-Making under the Predict-then-Optimize Framework. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 13–18 July 2020.
  28. Huang, M.; Gupta, V. Decision-Focused Learning with Directional Gradients. Adv. Neural Inf. Process. Syst. 2024, 37, 79194–79220.
  29. Schutte, N.; Postek, K.; Yorke-Smith, N. Robust Losses for Decision-Focused Learning. arXiv 2023, arXiv:2310.04328.
  30. Butler, A.; Kwon, R.H. Efficient Differentiable Quadratic Programming Layers: An ADMM Approach. Comput. Optim. Appl. 2023, 84, 449–476.
  31. Amos, B.; Kolter, J.Z. OptNet: Differentiable Optimization as a Layer in Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
  32. Agrawal, A.; Amos, B.; Barratt, S.; Boyd, S.; Diamond, S.; Kolter, J.Z. Differentiable Convex Optimization Layers. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
  33. McKenzie, D.; Fung, S.W.; Heaton, H. Differentiating Through Integer Linear Programs with Quadratic Regularization and Davis–Yin Splitting. arXiv 2023, arXiv:2301.13395.
  34. Liu, Y.; Zhou, C.; Zhang, P.; Zhang, S.; Zhang, X.; Li, Z.; Chen, H. Decision-Focused Graph Neural Networks for Graph Learning and Optimization. In Proceedings of the 2023 IEEE International Conference on Data Mining (ICDM), Shanghai, China, 1–4 December 2023; pp. 1151–1156.
  35. Sun, H.; Shi, Y.; Wang, J.; Tuan, H.D.; Poor, H.V.; Tao, D. Alternating Differentiation for Optimization Layers. arXiv 2022, arXiv:2210.01802.
  36. Guler, A.U.; Demirović, E.; Chan, J.; Bailey, J.; Leckie, C.; Stuckey, P.J. Divide and Learn: A Divide and Conquer Approach for Predict+Optimize. arXiv 2020, arXiv:2012.02342.
  37. Donti, P.L.; Amos, B.; Kolter, J.Z. Task-Based End-to-End Model Learning in Stochastic Optimization. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017.
  38. Oroojlooyjadid, A.; Snyder, L.V.; Takáč, M. Applying Deep Learning to the Newsvendor Problem. IISE Trans. 2020, 52, 444–463.
  39. Zhang, Y.F.; Gao, J.B. Assessing the Performance of Deep Learning Algorithms for Newsvendor Problem. In Proceedings of the 24th International Conference on Neural Information Processing (ICONIP), Guangzhou, China, 14–18 November 2017; pp. 912–921.
  40. Kong, L.; Cui, J.; Zhuang, Y.; Feng, R.; Prakash, B.A.; Zhang, C. End-to-End Stochastic Optimization with Energy-Based Model. Adv. Neural Inf. Process. Syst. 2022, 35, 11341–11354.
  41. Sun, C.; Liu, D.; Yang, C. Model-Free Unsupervised Learning for Optimization Problems with Constraints. In Proceedings of the 2019 25th Asia-Pacific Conference on Communications (APCC), Ho Chi Minh City, Vietnam, 6–8 November 2019; pp. 392–397.
  42. Shah, S.; Wang, K.; Wilder, B.; Perrault, A.; Tambe, M. Decision-Focused Learning Without Decision-Making: Learning Locally Optimized Decision Losses. Adv. Neural Inf. Process. Syst. 2022, 35, 1320–1332.
  43. Mandi, J.; Bucarey, V.; Mulamba, M.; Guns, T. Decision-Focused Learning: Through the Lens of Learning to Rank. In Proceedings of the 39th International Conference on Machine Learning (ICML), Baltimore, MD, USA, 17–23 July 2022.
  44. Dong, S.; Wang, P.; Abbas, K. A Survey on Deep Learning and Its Applications. Comput. Sci. Rev. 2021, 40, 100379.
  45. Solozabal, R.; Ceberio, J.; Takáč, M. Constrained Combinatorial Optimization with Reinforcement Learning. arXiv 2020, arXiv:2006.11984.
  46. Wang, K.; Shah, S.; Chen, H.; Perrault, A.; Doshi-Velez, F.; Tambe, M. Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning. arXiv 2022, arXiv:2106.03279.
  47. Ge, B.; Tian, W.; Xia, C.; Qin, W. Solving Capacitated Vehicle Routing Problems Based on End-to-End Deep Reinforcement Learning. Appl. Res. Comput. 2024, 41, 3245–3250.
  48. Pang, J.; Feng, Z. Exploration Approaches in Deep Reinforcement Learning Based on Uncertainty: A Review. Appl. Res. Comput. 2023, 40, 3201–3210.
  49. Besbes, O.; Mouchtaki, O. How Big Should Your Data Really Be? Data-Driven Newsvendor: Learning One Sample at a Time. Manag. Sci. 2023, 69, 5848–5865.
  50. Kossen, J.; Farquhar, S.; Gal, Y.; Rainforth, T. Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation. Adv. Neural Inf. Process. Syst. 2022, 35, 24557–24570.
  51. Settles, B. Active Learning Literature Survey; University of Wisconsin–Madison, Department of Computer Sciences: Madison, WI, USA, 2009. Available online: http://digital.library.wisc.edu/1793/60660 (accessed on 11 November 2025).
  52. Liu, M.; Grigas, P.; Liu, H.; Shen, Z.-J.M. Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach. arXiv 2023, arXiv:2305.06584.
  53. Basciftci, B.; Ahmed, S.; Shen, S. Distributionally Robust Facility Location Problem Under Decision-Dependent Stochastic Demand. Eur. J. Oper. Res. 2021, 292, 548–561.
  54. Liu, W.; Zhang, Z. Solving Data-Driven Newsvendor Pricing Problems with Decision-Dependent Effect. arXiv 2023, arXiv:2304.13924.
  55. Alley, M.; Biggs, M.; Hariss, R.; Herrmann, C.; Li, M.L.; Perakis, G. Pricing for Heterogeneous Products: Analytics for Ticket Reselling. Manuf. Serv. Oper. Manag. 2023, 25, 409–426.
  56. Lobo, M.S.; Vandenberghe, L.; Boyd, S.; Lebret, H. Applications of Second-Order Cone Programming. Linear Algebra Appl. 1998, 284, 193–228.
  57. Ferber, A.; Wilder, B.; Dilkina, B.; Tambe, M. MIPaaL: Mixed Integer Program as a Layer. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 1504–1511.
  58. Vlastelica, M.; Paulus, A.; Musil, V.; Martius, G.; Rolínek, M. Differentiation of Blackbox Combinatorial Solvers. arXiv 2019, arXiv:1912.02175.
  59. Berthet, Q.; Blondel, M.; Teboul, O.; Cuturi, M.; Vert, J.-P.; Bach, F. Learning with Differentiable Perturbed Optimizers. Adv. Neural Inf. Process. Syst. 2020, 33, 9508–9519.
  60. Mandi, J.; Guns, T. Interior Point Solving for LP-Based Prediction+Optimisation. Adv. Neural Inf. Process. Syst. 2020, 33, 7272–7282.
  61. Fioretto, F.; Van Hentenryck, P.; Mak, T.W.; Tran, C.; Baldo, F.; Lombardi, M. Lagrangian Duality for Constrained Deep Learning. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Ghent, Belgium, 14–18 September 2020; pp. 118–135.
  62. Nandwani, Y.; Pathak, A.; Singla, P. A Primal Dual Formulation for Deep Learning with Constraints. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://papers.nips.cc/paper_files/paper/2019/hash/cf708fc1decf0337aded484f8f4519ae-Abstract.html (accessed on 11 November 2025).
  63. Kervadec, H.; Dolz, J.; Yuan, J.; Desrosiers, C.; Granger, E.; Ayed, I.B. Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions. In Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 29 August–2 September 2022; pp. 962–966.
  64. Donti, P.L.; Rolnick, D.; Kolter, J.Z. DC3: A Learning Method for Optimization with Hard Constraints. arXiv 2021, arXiv:2104.12225.
  65. Elmachtoub, A.N.; Lam, H.; Lan, H.; Zhang, H. Dissecting the Impact of Model Misspecification in Data-Driven Optimization. arXiv 2025, arXiv:2503.00626.
Figure 1. Core elements of this paper.
Figure 2. Schematic diagram of two-stage decision-making.
Figure 3. SPO methodological framework. Note: the input data $(x_i, c_i)$, $i = 1, 2, \ldots, n$, consist of feature vectors $x_i$ and problem parameters $c_i$. Within the framework, the objective function is $\min_{w \in S} c^T w$, the loss function is the SPO loss, and the resulting decision is $w^*(\hat{c})$.
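As a sketch of the notation in the Figure 3 caption (the toy data and the simplex feasible set are our own assumptions, chosen so the linear program has a closed-form solution), the SPO loss of Elmachtoub and Grigas [2] measures the excess true cost incurred by optimizing against predicted parameters $\hat{c}$ instead of the true $c$:

```python
# SPO loss for a linear problem min_{w in S} c^T w, on a toy feasible set.
# With S the unit simplex, the minimizer is the indicator of the smallest
# cost coordinate, so no solver library is needed.
import numpy as np

def solve(c):
    """Return argmin_{w in simplex} c^T w as a one-hot vector."""
    w = np.zeros_like(c, dtype=float)
    w[np.argmin(c)] = 1.0
    return w

def spo_loss(c_hat, c_true):
    """True-cost regret of deciding with the predicted costs c_hat."""
    return float(c_true @ solve(c_hat) - c_true @ solve(c_true))

c_true = np.array([3.0, 1.0, 2.0])
c_hat = np.array([1.0, 2.0, 3.0])  # wrong ranking -> wrong decision
print(spo_loss(c_hat, c_true))     # 3 - 1 = 2.0
print(spo_loss(c_true, c_true))    # 0.0: a perfect prediction has zero regret
```

Note that a prediction can have large squared error yet zero SPO loss, as long as it preserves the argmin; this is the closed-loop property the framework in Figure 3 trains for.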
Figure 4. Framework for constructing differentiable optimization layers.
Figure 5. Mapping mechanism of the parameterized representation method.
Figure 6. Unified architectural diagram of different methods.
Table 1. Comparison of prediction and decision optimization methods.

| Dimension | Traditional Methods | Earlier Data-Driven Methods | End-to-End Methods |
|---|---|---|---|
| Core objective | Separate prediction and optimization | Data-driven prediction + independent optimization | Directly optimize decision outcomes |
| Technical means | Statistical models + mathematical programming | Machine learning prediction + traditional models | Decision feedback propagated through backpropagation |
| Handling uncertainty | Relies on distributional assumptions | Data-driven ambiguity sets | Learned decision policies |
| Decision-dependent effects | Not supported | Partially supported (complex modeling required) | Supported (e.g., pricing affects demand) |
| Typical advantage | Fast solutions to simple problems | High-dimensional covariate scenarios (e.g., e-commerce demand forecasting) | Complex decision chains (inventory, path planning) |
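The "pricing affects demand" entry in Table 1 can be illustrated with a deliberately simple, hypothetical model (the linear demand form and all parameter values below are our assumptions, not a method from the reviewed literature): because the price decision itself shifts demand, demand cannot be forecast independently of the decision, and the profit-maximizing price must account for that feedback.

```python
# Decision-dependent demand: d(p) = a - b*p, so the decision p changes the
# very quantity a separate forecaster would try to predict.
a, b, cost = 100.0, 2.0, 10.0  # illustrative demand intercept, slope, unit cost

def profit(p):
    """Profit at price p with demand that depends on p."""
    return (p - cost) * max(a - b * p, 0.0)

# profit(p) = (p - cost)(a - b*p) is concave in p; setting the derivative
# to zero gives p* = (a + b*cost) / (2b) = (100 + 20) / 4 = 30.
p_star = (a + b * cost) / (2 * b)
print(p_star, profit(p_star))  # 30.0 800.0
```

A pipeline that first forecasts demand at the historical price and then optimizes would miss this coupling, which is why the end-to-end column of Table 1 lists decision-dependent effects as natively supported.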
Table 2. Comparative analysis of review literature.

| Dimension | Yu et al. (2020) [7] | Yu (2022) [8] | Wang et al. (2024) [9] | Kotary et al. (2021) [10] | Mandi et al. (2024) [11] | Sadana et al. (2025) [12] |
|---|---|---|---|---|---|---|
| Field positioning | Automation | Management science | Statistics and management | Optimization algorithms | Machine learning | Operations research (OR) |
| Core issues | Big-data-driven decision making under uncertainty | Mechanisms of the AI-driven paradigm shift in decision-making | Statistical theory of prediction-driven optimization | End-to-end constrained optimization learning | Classification of decision-focused learning techniques | A unified framework for contextual optimization methods |
| Core framework | Five-aspect analysis: intelligent decision support systems; uncertainty handling; correlation analysis; information fusion; incremental learning | Key research directions: transformation of decision-making forms; data-feature-driven forecasting; domain-knowledge-dependent decision making; interpretable AI | Two-tier architecture: stochastic prediction optimization; distributionally robust prediction optimization | Differentiability-based processing approaches: convex optimization layers; combinatorial optimization relaxation; affine policy approximation | Four-fold DFL taxonomy: analytical differentiation; analytical smoothing; random perturbation; surrogate loss | Triple paradigm of contextual optimization: separate learning and optimization; integrated learning and optimization; decision-rule optimization |
| Theoretical contribution | Summarizes the characteristics of big-data decision-making | Proposes an AI-enabled decision-making paradigm for management science | Unified theory of statistical performance guarantees for stochastic and distributionally robust optimization | Theory of differentiable programming for constrained optimization | First systematic classification of DFL techniques with applications and evaluations | Unified terminology for prediction-optimization in the OR/MS field |
| Limitations | No quantitative theory of decision quality | Lacks specific algorithmic implementation details | Combinatorial optimization problems not covered | Depends on specific problem structures | Large-scale scalability unresolved | Limited depth on machine learning theory |
| Dominant application areas | Industrial control systems | Enterprise strategic decision-making | Operations management (OM) | Embedded systems | Combinatorial optimization problems | Service systems |
Table 3. Comparison of different methods.

| Dimension | Constructing Closed-Loop Losses | Constructing Differentiable Optimization Layers | Parameterized Representation of Optimization Problems |
|---|---|---|---|
| Feature | Couples the two stages into a decision-centered whole through the loss function | Embeds the optimization solve as a layer of the network | Parameterizes the optimization problem so it can be learned from data |
| Advantages | Easy to implement; low computational cost | Highly interpretable optimization process; exact gradient backpropagation | Independent of the optimization problem's structure; strong dynamic adaptability |
| Shortcomings | Relies on linear objectives; training requires solving the optimization problem | High computational cost; non-convex constraints can lead to infeasible solutions | Lacks interpretability; requires large amounts of data |
| Data requirements | Low: (input, problem-parameter) pairs | Medium: (input, constraint-parameter) pairs | High: (input, optimal-solution) pairs |
| Convexity assumption | Strong | Strong | Weak |
| Decision labels | Not required | Not required | Heavily required |
| Interpretability | Medium | High | Low |
| Applicable scenarios | Linear programming; low real-time requirements; cost-sensitive scenarios | Convex optimization; high interpretability; accuracy-first scenarios | Non-convex/discrete optimization; highly dynamic scenarios |