Bi-Level Dependent-Chance Goal Programming for Paper Manufacturing Tactical Planning: A Reinforcement-Learning-Enhanced Approach

Boutmir, Yassine; Bannari, Rachid; Bannari, Abdelfettah; Rouky, Naoufal; Benmoussa, Othmane; Fedouaki, Fayçal

doi:10.3390/sym17101624

by Yassine Boutmir ¹,*, Rachid Bannari ¹, Abdelfettah Bannari ², Naoufal Rouky ³, Othmane Benmoussa ⁴ and Fayçal Fedouaki ⁵

¹ Laboratory of Engineering Sciences, Department of Informatics, Logistics and Mathematics, National School of Applied Sciences, Ibn Tofaïl University, Kenitra 14000, Morocco

² Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal 23000, Morocco

³ Faculty of Sciences and Techniques, Hassan 1st University, Settat 26000, Morocco

⁴ Euromed University of Fes, UEMF, Fes 30000, Morocco

⁵ Department of Industrial Engineering, National Higher School of Arts and Professions, University Hassan 2nd of Casablanca, 150 Avenue Nil, Casablanca 20670, Morocco

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(10), 1624; https://doi.org/10.3390/sym17101624

Submission received: 14 July 2025 / Revised: 28 August 2025 / Accepted: 2 September 2025 / Published: 1 October 2025

(This article belongs to the Special Issue AI-Driven Optimization Under Symmetry and Uncertainty in Smart Manufacturing Systems)

Abstract

Tactical production–distribution planning in paper manufacturing involves hierarchical decision-making under hybrid uncertainty, where aleatory randomness (demand fluctuations, machine variations) and epistemic uncertainty (expert judgments, market trends) simultaneously affect operations. Existing approaches fail to address the bi-level nature under hybrid uncertainty, treating production and distribution decisions independently or using single-paradigm uncertainty models. This research develops a bi-level dependent-chance goal programming framework based on uncertain random theory, where the upper level optimizes distribution decisions while the lower level handles production decisions. The framework exploits structural symmetries through machine interchangeability, symmetric transportation routes, and temporal symmetry, incorporating symmetry-breaking constraints to eliminate redundant solutions. A hybrid intelligent algorithm (HIA) integrates uncertain random simulation with a Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm (RL-AOA) for bi-level coordination, where Q-learning enables adaptive parameter tuning. The RL component utilizes symmetric state representations to maintain solution quality across symmetric transformations. Computational experiments demonstrate HIA’s superiority over standard metaheuristics, achieving 3.2–7.8% solution quality improvement and 18.5% computational time reduction. Symmetry exploitation reduces search space by approximately 35%. The framework provides probability-based performance metrics with optimal confidence levels (0.82–0.87), offering 2.8–4.5% annual cost savings potential.

Keywords: bi-level programming; dependent-chance goal programming; tactical production–distribution planning; paper manufacturing; uncertain random theory; reinforcement learning; Arithmetic Optimization Algorithm; sustainable production; resource efficiency; industrial productivity; environmental sustainability; sustainable industrialization

1. Introduction

The paper manufacturing industry, valued at approximately USD 400 billion globally, represents one of the most operationally complex sectors in modern manufacturing, where tactical production–distribution planning decisions directly impact both economic performance and environmental sustainability [1]. Paper production involves intricate interdependencies between raw material procurement, multi-grade production processes, and diverse distribution networks serving packaging, publishing, and specialty applications [2]. The industry’s operational structure exhibits significant structural symmetries, including machine interchangeability for certain paper grades, bidirectional material flows between production facilities, and periodic patterns in demand cycles, which create opportunities for computational efficiency through systematic symmetry exploitation in optimization algorithms [3].

Paper manufacturing operations encompass diverse product portfolios including newsprint, packaging materials, tissue products, and specialty papers, each requiring specific production parameters and handling requirements [4]. Production facilities face machine compatibility constraints and sequence-dependent changeover requirements that impose significant setup times and costs when transitioning between product types [5]. The presence of symmetric machine capabilities—where multiple machines can produce identical grades with equivalent efficiency—introduces structural symmetries in the optimization problem formulation. These symmetries manifest as equivalent production schedules that yield identical costs and performance metrics, necessitating symmetry-aware optimization mechanisms to avoid redundant exploration of mathematically equivalent solutions. Raw material variability and seasonal demand patterns create additional complexity in capacity allocation decisions [6].

The tactical planning horizon of 3–6 months represents a critical decision space where strategic network design constraints meet operational execution requirements [7]. Companies must make production scheduling decisions that determine machine utilization and inventory positioning while simultaneously coordinating distribution strategies affecting customer service levels and transportation costs [8]. The bi-level nature of this problem exhibits hierarchical structural symmetries, where upper-level distribution decisions and lower-level production decisions display analogous constraint patterns and objective contributions, requiring coordinated optimization approaches that can exploit these symmetries while maintaining solution optimality.

Supply chain planning operates across multiple hierarchical levels with distinct time horizons and decision scopes [9]. Strategic planning (1–5 years) focuses on network design decisions, tactical planning (3–6 months) bridges strategic constraints with operational execution, and operational planning (daily to weekly) addresses immediate execution decisions [7]. Recent advances in optimization have demonstrated the effectiveness of symmetry-based decomposition techniques for large-scale planning problems. However, existing approaches for bi-level production–distribution planning have not systematically addressed symmetry identification and exploitation, particularly in the context of hybrid uncertainty. Our reinforcement-learning-enhanced arithmetic optimization approach incorporates symmetric state representations in Q-learning to enable efficient exploration while avoiding redundant evaluations of symmetric solution regions.

Tactical planning in paper manufacturing operates under hybrid uncertainty that cannot be adequately captured by traditional single-paradigm approaches [10]. Aleatory uncertainty sources include demand fluctuations, machine efficiency variations, and quality variations that follow well-defined probability distributions based on historical data [11]. Epistemic uncertainty stems from incomplete knowledge about market trends, regulatory changes, and supplier reliability, where expert judgment must complement quantitative data [12]. Traditional optimization approaches treating these uncertainties independently fail to capture their interactive effects and compound impacts on tactical decisions [13].

Despite extensive research in production–distribution planning, several critical gaps remain that limit the effectiveness of existing approaches. Bi-level production–distribution planning research has primarily focused on deterministic parameters, failing to address goal-oriented frameworks that incorporate probability-based performance targets [14]. Goal programming applications have predominantly utilized deterministic formulations that assume exact target values, limiting their applicability in uncertain environments where probability-based objectives are more meaningful [15]. Uncertainty modeling has advanced significantly through stochastic programming and fuzzy programming approaches, yet hybrid uncertainty applications that simultaneously handle both aleatory and epistemic components remain limited [10]. Paper industry optimization research has developed sophisticated models for individual planning components but lacks comprehensive frameworks addressing integrated tactical-level decision coordination [1].

The symmetric properties inherent in paper manufacturing networks present both computational challenges and optimization opportunities. While these symmetries can exponentially expand the solution space through mathematically equivalent solutions, they also enable significant computational efficiencies when properly exploited. Our framework systematically addresses these symmetries through (1) identification and classification of structural symmetries in the bi-level problem formulation, (2) incorporation of symmetry-breaking constraints to eliminate redundant solution regions, (3) utilization of symmetric state representations in the reinforcement learning component to maintain solution quality across symmetric transformations, and (4) preservation of invariance properties in arithmetic optimization operators to ensure consistent performance across symmetric problem instances.

The primary objective of this research is to develop a comprehensive bi-level dependent-chance goal programming framework for tactical production–distribution planning in paper manufacturing under hybrid uncertainty. This research addresses four fundamental questions that guide the investigation: How can dependent-chance goal programming be adapted to handle bi-level tactical planning where production and distribution managers have different risk tolerances and objective priorities? What is the optimal bi-level structure for integrating production scheduling and distribution planning decisions while maintaining computational tractability? How effective is the proposed Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm compared to existing metaheuristics for solving complex bi-level optimization problems under hybrid uncertainty? What are the trade-offs between different goal probability levels and how do these affect tactical planning performance in paper manufacturing contexts?

To address the identified research gaps and challenges, this study makes the following key contributions:

Bi-Level Dependent-Chance Goal Programming Framework: Development of the first comprehensive bi-level dependent-chance goal programming model for tactical production–distribution planning that simultaneously addresses hierarchical decision-making and probabilistic goal achievement under hybrid uncertainty, extending traditional goal programming theory into probabilistic domains with bi-level coordination.
Hybrid Uncertainty Modeling Enhancement: Advancement of uncertain random theory applications through a systematic framework for modeling hybrid uncertainty in tactical planning contexts, providing unified mathematical representations that capture both aleatory randomness from historical data and epistemic uncertainty from expert judgments.
Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm: Introduction of the novel RL-AOA, which integrates Q-learning mechanisms with arithmetic optimization operators for adaptive bi-level coordination, enabling dynamic parameter tuning based on optimization progress and problem-specific characteristics.
Industry-Specific Tactical Planning Framework: Development of the first comprehensive tactical planning framework specifically designed for paper manufacturing environments, incorporating machine compatibility constraints, sequence-dependent changeover requirements, paper grade specifications, and multi-modal transportation considerations.
Computational Validation and Implementation Guidelines: Demonstration of algorithm superiority through extensive computational experiments achieving 3.2–7.8% solution quality improvement and 18.5% computational time reduction, accompanied by practical implementation guidelines and sensitivity analysis insights for industrial adoption.
Symmetry-Aware Optimization Techniques: Development of systematic approaches for identifying, classifying, and exploiting structural symmetries in paper manufacturing networks, demonstrating measurable computational improvements through intelligent symmetry breaking and symmetric state representations in reinforcement learning algorithms.

The remainder of this paper is organized as follows: Section 2 presents the mathematical formulation of the bi-level dependent-chance goal programming model including uncertain random variable definitions and chance measure calculations. Section 3 describes the theoretical foundations of uncertain random theory for hybrid uncertainty modeling and its application to tactical planning problems. Section 4 details the Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm including bi-level coordination mechanisms and adaptive parameter tuning strategies. Section 5 provides comprehensive computational results and performance analysis across diverse paper manufacturing instances with comparative evaluation against existing approaches. Section 6 concludes with key findings, practical implications, and future research directions.

2. Mathematical Model Formulation

This section presents the comprehensive bi-level dependent-chance goal programming model for tactical production–distribution planning in paper manufacturing under hybrid uncertainty. The model captures the hierarchical decision-making structure where distribution coordinators (upper level) and production managers (lower level) operate with different objectives while facing uncertain random parameters.

2.1. Sets and Indices

Table 1 presents the sets and indices used throughout the paper mill supply chain optimization model. This notation defines the scope of products, facilities, resources, and relationships within the supply chain network and is used consistently in the remainder of the paper.

2.2. Parameters

2.2.1. Deterministic Parameters

The deterministic parameters presented in Table 2 capture the fixed and known aspects of the paper mill supply chain. These values include production costs, technical specifications, capacity limitations, and other operational constants that do not vary with uncertainty. The parameters listed in Table 2 are essential for establishing the baseline operational constraints of the optimization model and remain constant throughout the planning horizon.

2.2.2. Uncertain Random Parameters

Our model incorporates uncertainty through a set of random parameters that represent the stochastic nature of real-world paper mill operations. These parameters, presented in Table 3 and denoted with a tilde, capture the variability in demand, costs, equipment performance, and material availability. The uncertain random parameters outlined in Table 3 are essential for developing robust solutions that can withstand real-world fluctuations in the paper manufacturing environment.

2.2.3. Goal Programming Parameters

The goal programming component of our model requires specific parameters to define target thresholds and confidence levels. The parameters listed in Table 4 establish the multi-objective framework that balances economic efficiency, operational performance, resource utilization, and quality assurance within the optimization process. Table 4 defines the target probability levels and thresholds that will be used to evaluate the achievement of each goal in our model.

2.3. Decision Variables

2.3.1. Upper-Level Decision Variables (Distribution Planning)

Our bi-level optimization model separates decision-making into distribution and production planning. The upper-level variables defined in Table 5 focus on distribution center operations, product shipping decisions, transportation mode selection, inventory management at distribution centers, and customer service levels. The variables presented in Table 5 determine the optimal distribution network configuration to meet customer demands efficiently.

2.3.2. Lower-Level Decision Variables (Production Planning)

The lower-level variables focus on production planning decisions within the paper mill. The variables defined in Table 6 determine optimal production quantities, machine assignments, production sequencing, raw material procurement, and warehouse inventory management. The lower-level decisions represented in Table 6 are made in response to the distribution requirements established at the upper level of the optimization model.

2.3.3. Goal Programming Variables

The goal programming framework requires specific variables to measure deviations from target thresholds. The variables presented in Table 7 quantify both positive and negative deviations from the four main objectives of our model: economic efficiency, operational performance, resource utilization, and quality assurance. These deviations defined in Table 7 are minimized in the objective function to achieve balanced solutions that satisfy multiple competing objectives simultaneously.

2.4. Bi-Level Mathematical Model

Notation: Throughout this paper, Ch{·} denotes the chance measure of uncertain random events, which quantifies the probability that an uncertain random variable satisfies a given condition. This measure integrates probability theory for aleatory uncertainty (randomness from historical variability) with uncertainty theory for epistemic uncertainty (imprecision from limited knowledge) within a unified mathematical framework, as detailed in the following definition.

2.4.1. Upper-Level Problem (Distribution Planning)

The upper level seeks to minimize distribution costs while maximizing service levels:

\begin{matrix} \min_{Y, X, Z, U, I^{D}, B, λ} F^{UL} = & \sum_{d=1}^{D} FC_{d} \cdot Y_{d} + \sum_{w=1}^{W} \sum_{d=1}^{D} \sum_{p=1}^{P} \sum_{l=1}^{L} \sum_{t=1}^{T} \widetilde{TC}_{wdlt} \cdot X_{wdplt} \\ & + \sum_{d=1}^{D} \sum_{c=1}^{C} \sum_{p=1}^{P} \sum_{l=1}^{L} \sum_{t=1}^{T} \widetilde{TC}_{dclt} \cdot Z_{dcplt} \\ & + \sum_{d=1}^{D} \sum_{p=1}^{P} \sum_{t=1}^{T} \widetilde{HC}_{pt} \cdot I_{dpt}^{D} \\ & + \sum_{p=1}^{P} \sum_{c=1}^{C} \sum_{t=1}^{T} \widetilde{BC}_{pct} \cdot B_{pct} \\ & - \sum_{p=1}^{P} \sum_{t=1}^{T} W_{λ} \cdot λ_{pt} \end{matrix}

(1)

Subject to upper-level constraints:

Distribution Center Capacity:

\begin{matrix} \sum_{p=1}^{P} I_{dpt}^{D} \leq DCAP_{d} \cdot Y_{d}, \forall d, t \end{matrix}

(2)

This constraint ensures that the total inventory stored across all paper grades at each distribution center does not exceed the facility’s physical storage capacity, maintaining operational feasibility.

Transportation Mode Capacity:

\begin{matrix} \mathrm{Ch}\{\sum_{w=1}^{W} \sum_{d=1}^{D} \sum_{p=1}^{P} X_{wdplt} + \sum_{d=1}^{D} \sum_{c=1}^{C} \sum_{p=1}^{P} Z_{dcplt} \leq LCAP_{l} \cdot \tilde{ϕ}_{lt} \cdot U_{lt}\} \geq β_{1}, \forall l, t \end{matrix}

(3)

These constraints guarantee that the combined shipment volumes using each transportation mode remain within the mode’s available capacity limits, accounting for uncertain availability factors.

Customer Demand Satisfaction with Production Coordination:

\begin{matrix} \mathrm{Ch}\{\sum_{d=1}^{D} \sum_{l=1}^{L} Z_{dcplt} + B_{pc,t-1} - B_{pct} = \tilde{D}_{pct}\} \geq β_{2}, \forall p, c, t \end{matrix}

(4)

\begin{matrix} \sum_{d=1}^{D} \sum_{c=1}^{C} \sum_{l=1}^{L} Z_{dcplt} \leq \sum_{w=1}^{W} I_{wpt}^{W} + \sum_{m=1}^{M} Q_{pmt}, \forall p, t \end{matrix}

(5)

Constraint (4) maintains the demand–supply balance under uncertainty, while constraint (5) ensures that total customer shipments cannot exceed available warehouse inventory plus current production, creating explicit coordination between upper-level distribution decisions and lower-level production outputs.

Distribution Center Inventory Balance with Material Flow:

\begin{matrix} I_{dp,t-1}^{D} + \sum_{w=1}^{W} \sum_{l=1}^{L} X_{wdplt} - \sum_{c=1}^{C} \sum_{l=1}^{L} Z_{dcplt} = I_{dpt}^{D}, \forall d, p, t \end{matrix}

(6)

\begin{matrix} \sum_{w=1}^{W} \sum_{d=1}^{D} \sum_{l=1}^{L} X_{wdplt} \leq \sum_{m=1}^{M} Q_{pmt} + I_{wp,t-1}^{W}, \forall p, t \end{matrix}

(7)

Constraint (6) preserves inventory continuity at distribution centers, while constraint (7) ensures that warehouse shipments to distribution centers are bounded by current production plus previous warehouse inventory, establishing the material flow dependency between production and distribution levels.

Service-Level Definition:

\begin{matrix} \mathrm{Ch}\{λ_{pt} = \frac{\sum_{d=1}^{D} \sum_{c=1}^{C} \sum_{l=1}^{L} Z_{dcplt}}{\sum_{c=1}^{C} \tilde{D}_{pct}}\} \geq β_{3}, \forall p, t \end{matrix}

(8)

This constraint establishes the service level metric by computing the ratio of fulfilled demand to total customer demand for each product in each time period.
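As a deterministic illustration of the ratio inside constraint (8), the following minimal sketch computes the service level for one product and one period from a single realization of demand. The function name, the dictionary layout, and the numbers are hypothetical, and the clipping to 1 is an assumption that mirrors the bound 0 ≤ λ_pt ≤ 1; the chance measure itself is not evaluated here.

```python
def service_level(shipments, demands):
    """Ratio of shipped volume to total demand for one product and period.

    shipments: dict keyed by (dc, customer, mode) -> shipped quantity Z
    demands:   dict keyed by customer -> realized demand D
    """
    total_demand = sum(demands.values())
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    total_shipped = sum(shipments.values())
    # Assumption: clip at 1 so that over-shipment does not push lambda above its bound.
    return min(1.0, total_shipped / total_demand)


# Hypothetical shipments and demands for one paper grade in one period.
shipments = {
    ("d1", "c1", "truck"): 40.0,
    ("d1", "c2", "rail"): 35.0,
    ("d2", "c2", "truck"): 15.0,
}
demands = {"c1": 50.0, "c2": 50.0}
lam = service_level(shipments, demands)
```

With these numbers, 90 of 100 demanded units are shipped.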

2.4.2. Lower-Level Problem (Production Planning)

Given the upper-level decisions, the lower level minimizes production costs:

\begin{matrix} \min_{Q, V, W, F, I^{W}, R, I^{R}} F^{LL} = & \sum_{p=1}^{P} \sum_{m=1}^{M} \sum_{t=1}^{T} \widetilde{PC}_{pmt} \cdot Q_{pmt} + \sum_{p=1}^{P} \sum_{m=1}^{M} \sum_{t=1}^{T} K_{pm} \cdot V_{pmt} \\ & + \sum_{p_{i}=1}^{P} \sum_{p_{j}=1}^{P} \sum_{m=1}^{M} \sum_{t=1}^{T} SC_{p_{i} p_{j} m} \cdot W_{p_{i} p_{j} m t} \\ & + \sum_{s=1}^{S} \sum_{r=1}^{R} \sum_{t=1}^{T} \widetilde{RC}_{rt} \cdot R_{srt} \\ & + \sum_{w=1}^{W} \sum_{p=1}^{P} \sum_{t=1}^{T} \widetilde{HC}_{pt} \cdot I_{wpt}^{W} \\ & + \sum_{r=1}^{R} \sum_{t=1}^{T} \widetilde{HC}_{rt} \cdot I_{rt}^{R} \end{matrix}

(9)

Subject to lower-level constraints:

Machine Capacity:

\begin{matrix} \mathrm{Ch}\{\sum_{p \in PM_{m}} \frac{Q_{pmt}}{β_{pm} \cdot \tilde{η}_{pmt}} + \sum_{p_{i}=1}^{P} \sum_{p_{j}=1}^{P} ST_{p_{i} p_{j} m} \cdot W_{p_{i} p_{j} m t} \leq CAP_{m} \cdot (1 - \tilde{ω}_{mt})\} \geq β_{4}, \forall m, t \end{matrix}

(10)

This constraint controls machine capacity limitations by accounting for production time requirements, changeover durations, efficiency variations, and potential equipment breakdowns.
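To make the two sides of constraint (10) concrete, the sketch below evaluates them for a single realization of the efficiency and breakdown factors on one machine in one period. All names, units, and numbers are hypothetical, and the check is deterministic; it does not compute the chance measure.

```python
def machine_load(quantities, rates, efficiencies, changeovers, setup_times):
    """Hours needed on one machine in one period: production time plus changeover time."""
    production_time = sum(
        qty / (rates[grade] * efficiencies[grade]) for grade, qty in quantities.items()
    )
    changeover_time = sum(setup_times[pair] for pair in changeovers)
    return production_time + changeover_time


def available_hours(capacity, breakdown_fraction):
    """Capacity left after the realized breakdown share is removed."""
    return capacity * (1.0 - breakdown_fraction)


# Hypothetical data for one machine and one period.
quantities = {"kraft": 300.0, "tissue": 120.0}   # tons
rates = {"kraft": 2.0, "tissue": 1.5}            # tons per hour at full efficiency
efficiencies = {"kraft": 0.8, "tissue": 0.8}     # one realization of the efficiency factor
setup_times = {("kraft", "tissue"): 6.0}         # hours per changeover
changeovers = [("kraft", "tissue")]

load = machine_load(quantities, rates, efficiencies, changeovers, setup_times)
limit = available_hours(capacity=320.0, breakdown_fraction=0.05)
feasible = load <= limit
```

Here the load is about 293.5 hours against roughly 304 available hours, so this realization satisfies the inequality.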

Machine–Product Compatibility:

\begin{matrix} V_{pmt} = 0, \forall p \notin PM_{m}, \forall m, t \end{matrix}

(11)

These restrictions prevent the assignment of paper grades to incompatible machines, maintaining technical and operational feasibility in production scheduling.

Production–Setup Linking:

\begin{matrix} Q_{pmt} \leq M_{big} \cdot V_{pmt}, \forall p, m, t \end{matrix}

(12)

\begin{matrix} Q_{pmt} \geq MW_{p} \cdot V_{pmt}, \forall p, m, t \end{matrix}

(13)

\begin{matrix} Q_{pmt} \leq MX_{p} \cdot V_{pmt}, \forall p, m, t \end{matrix}

(14)

The first constraint links production quantities to setup decisions using big-M methodology. The second ensures minimum production batch sizes when a setup occurs. The third controls maximum production limits per setup, maintaining cost-effective production runs.
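The interaction of constraints (12) to (14) can be checked with a small helper such as the one below. It is an illustrative sketch only: the function name, the default big-M value, and the batch limits are hypothetical.

```python
def check_setup_linking(q, v, min_batch, max_batch, big_m=1e6):
    """Return the labels of the linking constraints violated by quantity q and setup flag v.

    q: production quantity Q_pmt (non-negative)
    v: binary setup indicator V_pmt (0 or 1)
    min_batch, max_batch: stand-ins for MW_p and MX_p
    """
    violations = []
    if q > big_m * v:        # (12): production requires a setup
        violations.append("(12)")
    if q < min_batch * v:    # (13): a setup implies at least the minimum batch
        violations.append("(13)")
    if q > max_batch * v:    # (14): a setup allows at most the maximum batch
        violations.append("(14)")
    return violations


idle = check_setup_linking(q=0.0, v=0, min_batch=50.0, max_batch=400.0)
normal_run = check_setup_linking(q=120.0, v=1, min_batch=50.0, max_batch=400.0)
too_small = check_setup_linking(q=30.0, v=1, min_batch=50.0, max_batch=400.0)
no_setup = check_setup_linking(q=120.0, v=0, min_batch=50.0, max_batch=400.0)
```

Note that whenever the maximum batch size does not exceed the big-M constant, any point satisfying (14) also satisfies (12), so the two always fail together in the `no_setup` case.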

Changeover Logic:

\begin{matrix} \sum_{p_{j} \neq p_{i}} W_{p_{i} p_{j} m t} = V_{p_{i}, m, t-1} - V_{p_{i}, m, t}, \forall p_{i}, m, t \end{matrix}

(15)

\begin{matrix} \sum_{p_{i} \neq p_{j}} W_{p_{i} p_{j} m t} = V_{p_{j}, m, t} - V_{p_{j}, m, t-1}, \forall p_{j}, m, t \end{matrix}

(16)

These logical constraints control machine changeover sequences by tracking transitions between different paper grades while maintaining temporal consistency in production scheduling.

Color/Grade Transition Restrictions:

\begin{matrix} W_{p_{i} p_{j} m t} = 0, \forall (p_{i}, p_{j}) \in Φ_{incompatible}, \forall m, t \end{matrix}

(17)

This constraint prevents direct transitions between incompatible paper grades (e.g., dark to light colors), ensuring product quality and reducing contamination risks.

Maximum Daily Changeovers:

\begin{matrix} \sum_{p_{i}=1}^{P} \sum_{p_{j} \neq p_{i}} W_{p_{i} p_{j} m t} \leq MaxChangeover_{m}, \forall m, t \end{matrix}

(18)

These limitations control the total number of product changeovers per machine per period, balancing production flexibility with setup cost management and operational stability.

Production–Distribution Material Flow Balance:

\begin{matrix} \sum_{m=1}^{M} Q_{pmt} = \sum_{w=1}^{W} \sum_{l=1}^{L} F_{wplt}, \forall p, t \end{matrix}

(19)

This constraint ensures that total production output for each paper grade equals total shipments from production facilities to warehouses, creating the fundamental link between lower-level production decisions and upper-level distribution planning.

Quality Grade Production Requirements:

\begin{matrix} \mathrm{Ch}\{\sum_{m \in PM_{p}} Q_{pmt} \cdot \tilde{γ}_{pt} \geq QMin_{p} \cdot \sum_{m \in PM_{p}} V_{pmt}\} \geq β_{25}, \forall p, t \end{matrix}

(20)

This constraint guarantees that the total quality-adjusted production output for each paper grade meets the minimum quality standards required by customer specifications.

Machine Quality Compatibility:

\begin{matrix} \mathrm{Ch}\{Q_{pmt} \cdot \tilde{γ}_{pt} \geq MinQuality_{pm} \cdot V_{pmt}\} \geq β_{26}, \forall p, m \in PM_{p}, t \end{matrix}

(21)

These requirements ensure that each machine–product combination achieves the necessary quality levels, preventing the production of substandard products.

Raw Material Requirements:

\begin{matrix} \mathrm{Ch}\{\sum_{p \in PR_{r}} α_{pr} \cdot Q_{pmt} \cdot \tilde{γ}_{pt} \leq I_{r,t-1}^{R} + \sum_{s=1}^{S} R_{srt}\} \geq β_{5}, \forall r, m, t \end{matrix}

(22)

This constraint ensures that sufficient raw materials are available to support the planned production quantities, accounting for quality yield variations and material consumption rates.

Raw Material Inventory Balance:

\begin{matrix} \mathrm{Ch}\{I_{r,t-1}^{R} + \sum_{s=1}^{S} R_{srt} - \sum_{p \in PR_{r}} \sum_{m=1}^{M} α_{pr} \cdot Q_{pmt} \cdot \tilde{γ}_{pt} = I_{rt}^{R}\} \geq β_{6}, \forall r, t \end{matrix}

(23)

These balance equations maintain continuity of raw material inventories by tracking incoming purchases, consumption in production processes, and ending inventory levels.

Raw Material Availability:

\begin{matrix} \mathrm{Ch}\{\sum_{s=1}^{S} R_{srt} \leq \tilde{θ}_{rt}\} \geq β_{7}, \forall r, t \end{matrix}

(24)

This constraint controls supplier capacity limitations and market availability constraints for raw materials, incorporating supply uncertainty into procurement decisions.

Warehouse Capacity:

\begin{matrix} \sum_{p=1}^{P} I_{wpt}^{W} \leq WCAP_{w}, \forall w, t \end{matrix}

(25)

These capacity restrictions ensure that warehouse storage limits are respected across all product types, preventing facility overutilization and maintaining storage efficiency.

Warehouse Inventory Balance:

\begin{matrix} I_{wp,t-1}^{W} + \sum_{l=1}^{L} F_{wplt} - \sum_{d=1}^{D} \sum_{l=1}^{L} X_{wdplt} = I_{wpt}^{W}, \forall w, p, t \end{matrix}

(26)

These equations preserve inventory continuity at warehouse facilities by balancing previous inventory levels, incoming production, and outgoing shipments to distribution centers.
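The balance in constraint (26) is a simple recursion over periods, sketched below for one warehouse and one paper grade after summing over transportation modes and destinations. The function name and the numbers are hypothetical. The distribution center balance (6) has the same structure with the corresponding inflows and outflows.

```python
def roll_inventory(initial, inflows, outflows):
    """Apply I_t = I_{t-1} + inflow_t - outflow_t period by period.

    initial:  inventory before the first period
    inflows:  per-period receipts from production (F summed over modes)
    outflows: per-period shipments to distribution centers (X summed over d and modes)
    """
    levels = []
    level = initial
    for period, (received, shipped) in enumerate(zip(inflows, outflows), start=1):
        level = level + received - shipped
        if level < 0:
            # Inventory variables are non-negative in the model.
            raise ValueError(f"negative inventory in period {period}")
        levels.append(level)
    return levels


# Hypothetical three-period plan, in tons.
levels = roll_inventory(initial=20, inflows=[100, 80, 0], outflows=[90, 70, 30])
```

The resulting end-of-period levels can then be compared against the warehouse capacity in constraint (25).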

2.5. Bi-Level Coordination Algorithm

The coordination between production and distribution decisions requires an iterative approach that ensures feasible material flow while satisfying probabilistic constraints. To address this challenge, we develop a specialized bi-level coordination algorithm that alternates between solving the lower-level production problem and upper-level distribution problem until convergence is achieved. Algorithm 1 presents the detailed coordination procedure.

Algorithm 1 Bi-Level Coordination for Paper Manufacturing Planning

Require: Initial production capacities, demand forecasts, distribution network

1: Initialize production variables Q_pmt and distribution variables X_wdplt, Z_dcplt
2: repeat
3: Lower Level: Solve production problem to determine Q_pmt given current distribution requirements
4: Update material availability: $AvailableMaterial_{pt} = \sum_{m=1}^{M} Q_{pmt}$
5: Upper Level: Solve distribution problem with constraint $\sum_{w, d, l} X_{wdplt} \leq AvailableMaterial_{pt}$
6: Calculate normalized coordination gap: $Gap = \frac{|\sum_{m} Q_{pmt} - \sum_{w, l} F_{wplt}|}{\sum_{m} Q_{pmt} + 10^{-6}}$
7: until Gap < 0.01 AND all chance constraints satisfied
8: return Coordinated solution (Q, X, Z)

This algorithm ensures feasible coordination by iteratively solving both levels while enforcing material flow constraints (5), (7), and (19) until convergence. The normalized coordination gap metric represents the relative material flow imbalance as a fraction of total production, with values close to 0 indicating strong bi-level coordination.
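The control flow of Algorithm 1 can be sketched as follows. This is a toy illustration under strong assumptions, not the authors' implementation: the two solvers are replaced by hypothetical one-line rules, the flow F is approximated by the shipped quantities, the gap is aggregated as the maximum over products (the algorithm does not state an aggregation rule), and the chance-constraint check is omitted.

```python
def solve_lower(requirement, capacity):
    """Toy production rule: produce the requirement, capped by capacity."""
    return {p: min(requirement[p], capacity[p]) for p in requirement}


def solve_upper(available, demand):
    """Toy distribution rule: ship what is demanded, capped by available material."""
    return {p: min(available[p], demand[p]) for p in demand}


def coordinate(demand, capacity, initial_requirement, tol=0.01, max_iter=50):
    requirement = dict(initial_requirement)
    for iteration in range(1, max_iter + 1):
        production = solve_lower(requirement, capacity)   # step 3
        available = dict(production)                      # step 4
        shipments = solve_upper(available, demand)        # step 5
        # Step 6, with shipments standing in for F and the worst product taken as the gap.
        gap = max(
            abs(production[p] - shipments[p]) / (production[p] + 1e-6) for p in demand
        )
        if gap < tol:                                     # step 7 (gap test only)
            return production, shipments, iteration
        requirement = shipments  # feed the distribution plan back to the lower level
    raise RuntimeError("coordination did not converge")


# Hypothetical single-period data, in tons.
demand = {"kraft": 100, "tissue": 60}
capacity = {"kraft": 80, "tissue": 90}
production, shipments, iterations = coordinate(
    demand, capacity, initial_requirement={"kraft": 120, "tissue": 90}
)
```

In this toy instance the first pass overproduces tissue, the gap exceeds the tolerance, and the second pass aligns production with shipments.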

2.6. Paper-Manufacturing-Specific Constraints

Paper Grade Quality Requirements:

\begin{matrix} \mathrm{Ch}\{Q_{pmt} \cdot \tilde{γ}_{pt} \geq QMin_{p} \cdot V_{pmt}\} \geq β_{8}, \forall p, m, t \end{matrix}

(27)

This constraint requires that quality-adjusted production meets minimum grade specifications, ensuring customer satisfaction and regulatory compliance for paper products.

Shelf-Life Constraints:

\begin{matrix} I_{wpt}^{W} \leq \sum_{τ = \max(1, t - SL_{p} + 1)}^{t} \sum_{l=1}^{L} F_{wplτ}, \forall w, p, t \end{matrix}

(28)

\begin{matrix} I_{dpt}^{D} \leq \sum_{τ = \max(1, t - SL_{p} + 1)}^{t} \sum_{w=1}^{W} \sum_{l=1}^{L} X_{wdplτ}, \forall d, p, t \end{matrix}

(29)

The first constraint prevents inventory spoilage at warehouses by limiting stock to recently produced items within shelf-life limits. The second maintains similar freshness requirements at distribution centers to maintain product quality.
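The right-hand side of constraints (28) and (29) is a rolling-window sum over the most recent shelf-life periods. A minimal sketch, with a hypothetical function name and data and with inflows already summed over transportation modes:

```python
def shelf_life_bound(inflows, shelf_life, t):
    """Upper bound on inventory at the end of period t (1-indexed).

    inflows:    per-period receipts, inflows[0] belongs to period 1
    shelf_life: number of periods a grade may stay in stock (SL_p)
    """
    start = max(1, t - shelf_life + 1)
    # Periods start..t inclusive map to list positions start-1..t-1.
    return sum(inflows[start - 1:t])


# Hypothetical receipts over four periods, in tons, with a two-period shelf life.
inflows = [50, 40, 30, 20]
bounds = [shelf_life_bound(inflows, shelf_life=2, t=t) for t in range(1, 5)]
```

Inventory held at the end of period t may not exceed the corresponding entry of `bounds`.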

Seasonal Production Constraints:

\begin{matrix} \mathrm{Ch}\{\sum_{m=1}^{M} Q_{pmt} \geq SMin_{pt} \cdot \sum_{c=1}^{C} \tilde{D}_{pct}\} \geq β_{9}, \forall p \in P_{seasonal}, t \end{matrix}

(30)

\begin{matrix} \mathrm{Ch}\{\sum_{m=1}^{M} Q_{pmt} \leq SMax_{pt} \cdot \sum_{c=1}^{C} \tilde{D}_{pct}\} \geq β_{10}, \forall p \in P_{seasonal}, t \end{matrix}

(31)

The first constraint controls minimum seasonal production levels relative to demand patterns for specialized paper grades. The second controls upper bounds on seasonal production to prevent overproduction and excess inventory.

Environmental Compliance:

\begin{matrix} \mathrm{Ch}\{\sum_{p=1}^{P} \sum_{m=1}^{M} \sum_{t=1}^{T} \widetilde{EmissionRate}_{pm} \cdot Q_{pmt} \leq EmissionLimit\} \geq β_{emission} \end{matrix}

(32)

This constraint ensures environmental compliance by limiting total emissions from production activities over the planning horizon. The uncertain emission rates ($\widetilde{EmissionRate}_{pm}$) account for variability in production efficiency and equipment performance. The chance constraint formulation guarantees environmental compliance with confidence level β_emission, maintaining consistency with the hybrid uncertainty framework.

The confidence level β_emission = 0.95 follows EPA regulatory standards for industrial emissions compliance [16]. The emission limit of EmissionLimit = 1200 tons CO2-equivalent per planning horizon is based on established benchmarks for medium-scale paper manufacturing facilities [17].

Non-Negativity and Binary Constraints:

\begin{matrix} Q_{p m t}, X_{w d p l t}, Z_{d c p l t}, F_{w p l t}, I_{w p t}^{W}, I_{d p t}^{D}, I_{r t}^{R}, R_{s r t}, B_{p c t} \geq 0 \\ V_{p m t}, W_{p_{i} p_{j} m t}, Y_{d}, U_{l t} \in \{0, 1\} \\ d_{1}^{+}, d_{1}^{-}, d_{2}^{+}, d_{2}^{-}, d_{3}^{+}, d_{3}^{-}, d_{4}^{+}, d_{4}^{-} \geq 0 \\ 0 \leq λ_{p t} \leq 1 \end{matrix}

These constraints establish variable domains, ensuring the physical feasibility of continuous quantities, logical consistency of binary decisions, and proper bounds for service level indicators.

This comprehensive bi-level dependent-chance goal programming model captures the tactical production–distribution planning decisions in paper manufacturing under hybrid uncertainty, where chance measures Ch{·} are evaluated using uncertain random simulation techniques. The model incorporates explicit coordination constraints (5), (7), and (19) that ensure feasible integration between upper-level distribution decisions and lower-level production outputs, addressing the bi-level coupling requirements for tactical planning optimization.

3. Uncertain Random Theory for Hybrid Uncertainty Modeling

Real-world production and distribution systems in paper manufacturing face dual uncertainty challenges that traditional optimization approaches struggle to address adequately. These systems must contend with objective variability inherent in operational processes (aleatory uncertainty) alongside subjective assessments arising from limited information and expert judgments (epistemic uncertainty). While classical probability theory excels at modeling random phenomena and fuzzy set theory handles belief-based information, neither framework alone sufficiently captures scenarios where both uncertainty types coexist simultaneously.

This section develops a comprehensive mathematical framework based on uncertain random theory that unifies both uncertainty dimensions within a single modeling paradigm. Our approach employs chance measures to create integrated representations that facilitate tactical decision-making under hybrid uncertainty conditions in paper manufacturing environments.

3.1. Mathematical Foundation of Uncertain Random Variables

Manufacturing planning decisions frequently involve parameters that exhibit both stochastic behavior and subjective uncertainty components. To address this complexity, we establish uncertain random variables as our fundamental modeling construct, building upon the theoretical framework developed by Liu [18].

Definition 1 (Uncertain Random Variable). Let $(Ω, A, \Pr)$ denote a probability space and consider a mapping $\tilde{ξ} : Ω \to U$, where $U$ represents the space of uncertain variables. The function $\tilde{ξ}$ constitutes an uncertain random variable when $\mathcal{M}\{\tilde{ξ}(ω) \in B\}$ is measurable with respect to ω for every Borel set $B \subseteq \mathbb{R}$.

This mathematical structure captures the dual nature of hybrid uncertainty through its composition:

Stochastic Component: The probability space $(Ω, A, \Pr)$ models objective randomness derived from historical data and measurable variations.
Epistemic Component: For each probabilistic outcome ω, the mapping $\tilde{ξ}(ω)$ yields an uncertain variable representing subjective assessments and expert knowledge.

Manufacturing Context Example: Consider demand forecasting for paper grade p from customer c in period t, denoted as ${\tilde{D}}_{p c t}$. The stochastic dimension captures quantifiable demand patterns extracted from sales history, while the uncertain dimension incorporates qualitative factors such as market sentiment, competitive dynamics, and economic indicators that resist precise probabilistic characterization.

3.2. Chance Measure Framework

The integration of probability and uncertainty measures requires a unified metric capable of handling both components simultaneously. We employ the chance measure, which provides this integration through the following mathematical construction.

Definition 2 (Chance Measure). Given an uncertain random variable $\tilde{ξ}$ and a Borel set $B \subseteq \mathbb{R}$, the chance measure of the event $\{\tilde{ξ} \in B\}$ is defined as:

\mathrm{Ch}\{\tilde{ξ} \in B\} = \int_{0}^{1} \Pr\{ω \in Ω : \mathcal{M}\{\tilde{ξ}(ω) \in B\} \geq r\} \, dr

The chance measure satisfies essential mathematical properties that ensure consistent behavior:

Normalization: $\mathrm{Ch}\{\tilde{ξ} \in \mathbb{R}\} = 1$
Self-Duality: $\mathrm{Ch}\{\tilde{ξ} \in B\} + \mathrm{Ch}\{\tilde{ξ} \in B^{c}\} = 1$
Monotonicity: $B_{1} \subseteq B_{2} \Rightarrow \mathrm{Ch}\{\tilde{ξ} \in B_{1}\} \leq \mathrm{Ch}\{\tilde{ξ} \in B_{2}\}$

Boundary Behavior: The chance measure reduces to familiar measures in limiting cases:

When uncertainty vanishes, $\mathrm{Ch}\{\tilde{ξ} \in B\} = \Pr\{X \in B\}$ for random variable X
When randomness vanishes, $\mathrm{Ch}\{\tilde{ξ} \in B\} = \mathcal{M}\{η \in B\}$ for uncertain variable η
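When the random component takes finitely many values, the integral in Definition 2 equals the probability-weighted average of the uncertain measures, because each measure lies in [0, 1]. The sketch below illustrates this for a hypothetical demand made of a discrete random base level plus an uncertain adjustment η with a linear uncertainty distribution on [a, b]. The scenario data, the function names, and the choice of a linear distribution are assumptions for illustration only.

```python
def linear_uncertainty_cdf(x, a, b):
    """Uncertainty distribution of a linear uncertain variable on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def chance_leq(x, scenarios, a, b):
    """Ch{base + eta <= x} for a discrete random base and uncertain eta.

    scenarios: list of (probability, base_value) pairs (illustrative data).
    For each scenario, M{base + eta <= x} equals the distribution of eta
    evaluated at x - base; the chance is the expectation of these measures.
    """
    return sum(p * linear_uncertainty_cdf(x - base, a, b) for p, base in scenarios)


scenarios = [(0.5, 100.0), (0.5, 120.0)]      # hypothetical demand scenarios
ch = chance_leq(115.0, scenarios, a=-10.0, b=10.0)
print(ch)                                      # 0.5 * 1.0 + 0.5 * 0.25 = 0.625
```

With a single scenario of probability one, the function returns the pure uncertain measure, which matches the second boundary case listed above.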

3.3. Distributional Characterization

Definition 3 (Chance Distribution Function). The chance distribution function of uncertain random variable $\tilde{ξ}$ is given by:

Φ(x) = \mathrm{Ch}\{\tilde{ξ} \leq x\}, \quad x \in \mathbb{R}

Theorem 1 (Distribution Characterization). A function $Φ : \mathbb{R} \to [0, 1]$ represents a valid chance distribution if and only if it is non-decreasing with $\lim_{x \to -\infty} Φ(x) = 0$ and $\lim_{x \to +\infty} Φ(x) = 1$.

3.4. Statistical Measures

For practical decision-making applications, we require scalar summary statistics that characterize uncertain random variable behavior.

Definition 4 (Expected Value). The expected value of uncertain random variable $\tilde{ξ}$ is:

E[\tilde{ξ}] = \int_{0}^{+\infty} \mathrm{Ch}\{\tilde{ξ} \geq r\} \, dr - \int_{-\infty}^{0} \mathrm{Ch}\{\tilde{ξ} \leq r\} \, dr

provided that at least one of the two integrals is finite.
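The expected-value integral can be approximated numerically once a chance distribution Φ is available. The sketch below is illustrative only: it assumes Φ is continuous, so that Ch{ξ ≥ r} = 1 − Φ(r) by self-duality, and it uses a hypothetical variable made of a random base level (100 or 120, each with probability 0.5) plus a linear uncertain adjustment on [−10, 10]. The function names and the midpoint rule are assumptions, not part of the source.

```python
def expected_value(cdf, lo, hi, steps=100_000):
    """Midpoint-rule estimate of int_0^inf (1 - Phi) dr - int_-inf^0 Phi dr.

    Assumes Phi is continuous, Phi(r) = 0 for r <= lo and Phi(r) = 1 for r >= hi,
    and lo <= 0 <= hi so that [lo, hi] covers the whole non-trivial range.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        r = lo + (k + 0.5) * h
        total += (1.0 - cdf(r)) * h if r >= 0 else -cdf(r) * h
    return total


def linear_cdf(y, a=-10.0, b=10.0):
    """Linear uncertainty distribution on [a, b]."""
    return min(1.0, max(0.0, (y - a) / (b - a)))


def phi(x):
    """Chance distribution of base + eta for the hypothetical example."""
    return 0.5 * linear_cdf(x - 100.0) + 0.5 * linear_cdf(x - 120.0)


mean_demand = expected_value(phi, lo=0.0, hi=130.0)
mean_adjustment = expected_value(linear_cdf, lo=-10.0, hi=10.0)
print(round(mean_demand, 3), round(mean_adjustment, 3))
```

The first estimate should be close to 110, the mean base level plus the zero mean of the adjustment, and the second close to 0.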

Definition 5 (Variance). For uncertain random variable $\tilde{ξ}$ with finite expected value $μ = E[\tilde{ξ}]$, the variance is:

Var[\tilde{ξ}] = E[{(\tilde{ξ} - μ)}^{2}]

Theorem 2 (Linearity Property). For uncertain random variable $\tilde{ξ}$ with finite expectation and constants $a, b \in \mathbb{R}$:

E[a \tilde{ξ} + b] = a E[\tilde{ξ}] + b

3.5. Dependent-Chance Goal Programming Methodology

Traditional optimization approaches in manufacturing planning typically focus on constraint satisfaction at predetermined confidence levels. However, tactical planning benefits from directly optimizing the likelihood of achieving strategic objectives. Dependent-chance goal programming addresses this need by incorporating probabilistic objective achievement into the optimization framework.

Definition 6 (Dependent-Chance Programming). Given uncertain random variable $\tilde{ξ}$ and target threshold F, dependent-chance programming seeks to maximize $\mathrm{Ch}\{\tilde{ξ} \leq F\}$ or $\mathrm{Ch}\{\tilde{ξ} \geq F\}$, depending on the objective orientation.

For multi-objective tactical planning scenarios involving competing priorities, we extend this concept through goal programming principles:

Target Specification: Decision-makers establish desired achievement probabilities α_g for each tactical objective g
Deviation Modeling: Variables $d_{g}^{+}$ and $d_{g}^{-}$ capture positive and negative deviations from target achievement levels
Hierarchical Optimization: Objectives receive priority rankings and undergo sequential optimization based on strategic importance

The general mathematical formulation follows:

\text{lexicographic minimize } \{d_{1}^{-}, d_{2}^{-}, \dots, d_{n}^{-}\}

(33)

\text{subject to } \mathrm{Ch}\{f_{g}(x, \tilde{ξ}) ⋈ F_{g}\} + d_{g}^{-} - d_{g}^{+} = α_{g}, \quad g = 1, \dots, n

(34)

x \in X

(35)

d_{g}^{+}, d_{g}^{-} \geq 0, g = 1, \dots, n

(36)

where $f_{g}(x, \tilde{ξ})$ represents the g-th objective function, ⋈ denotes the appropriate inequality relation, F_g specifies the target threshold, and $X$ defines the feasible decision space.
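The goal constraint (34) and the lexicographic objective (33) can be illustrated with a small Python sketch. It assumes the chance values of each candidate plan have already been estimated; the target levels and plan values are hypothetical, and the helper names are not from the source.

```python
def deviations(achieved, target):
    """Split Ch + d_minus - d_plus = alpha into non-negative deviations."""
    d_minus = max(0.0, target - achieved)    # under-achievement of the target
    d_plus = max(0.0, achieved - target)     # over-achievement of the target
    return d_minus, d_plus


def lexicographic_key(achieved_chances, targets):
    """Tuple of d_g^- in priority order; Python compares tuples lexicographically."""
    return tuple(deviations(c, a)[0] for c, a in zip(achieved_chances, targets))


targets = [0.90, 0.85, 0.80]                 # hypothetical alpha_g values
plan_a = [0.88, 0.90, 0.70]                  # hypothetical achieved chance values
plan_b = [0.90, 0.60, 0.60]
best = min([plan_a, plan_b], key=lambda plan: lexicographic_key(plan, targets))
print(best is plan_b)                        # True
```

Plan B wins because it fully meets the first-priority target, even though it is worse on the lower priorities. With simulated chance values, a practical implementation would probably compare deviations with a small tolerance rather than exactly.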

Methodological Advantages:

Decision Transparency: Probability-based metrics provide intuitive interpretation for management
Priority Accommodation: Lexicographic structure aligns with organizational strategic hierarchies
Robustness Enhancement: Probabilistic focus inherently addresses uncertainty impacts
Unified Treatment: Chance measures handle both aleatory and epistemic uncertainties consistently

3.6. Paper Manufacturing Parameter Modeling

Paper manufacturing tactical planning involves numerous parameters exhibiting hybrid uncertainty characteristics. Our uncertain random variable framework provides natural representations for key planning elements:

Demand Forecasting ${\tilde{D}}_{p c t}$ : Combines quantitative sales patterns with qualitative market intelligence and customer behavior assessments
Production Costs ${\tilde{P C}}_{p m t}$ : Integrates observable input price volatility with subjective cost estimations for different paper grades and machine configurations
Raw Material Availability ${\tilde{R C}}_{r t}$ : Merges market price data with uncertain supply chain assessments for pulp and recycled material sources
Transportation Expenses ${\tilde{T C}}_{i j l t}$ : Combines fuel cost fluctuations with uncertain capacity availability and route condition evaluations
Quality Performance ${\tilde{γ}}_{p t}$ : Unifies quantitative quality measurements with expert assessments of production environment factors
Equipment Efficiency ${\tilde{η}}_{p m t}$ : Integrates historical machine performance with expert judgments regarding maintenance requirements and operational conditions

This modeling approach enables more realistic representation of paper manufacturing uncertainty while maintaining mathematical rigor necessary for optimization-based tactical planning.

3.7. Tactical Planning Application Framework

We now present the specific dependent-chance goal programming formulation for paper manufacturing tactical planning, considering the following strategic priority hierarchy:

Priority Level 1—Economic Performance: The probability of total bi-level cost remaining below threshold Target₁ should achieve confidence level α₁.

\mathrm{Ch}\{F^{UL} + F^{LL} \leq Target_{1}\} + d_{1}^{-} - d_{1}^{+} = α_{1}

(37)

Priority Level 2—Service Excellence: The probability of average service level exceeding threshold Target₂ should achieve confidence level α₂.

\mathrm{Ch}\{\frac{1}{|P| \times |T|} \sum_{p, t} λ_{p t} \geq Target_{2}\} + d_{2}^{-} - d_{2}^{+} = α_{2}

(38)

Priority Level 3—Resource Optimization: The probability of average capacity utilization exceeding threshold Target₃ should achieve confidence level α₃.

\mathrm{Ch}\{\frac{1}{|M| \times |T|} \sum_{m, t} \sum_{p} \frac{Q_{p m t}}{β_{p m} \cdot CAP_{m}} \geq Target_{3}\} + d_{3}^{-} - d_{3}^{+} = α_{3}

(39)

Priority Level 4—Quality Assurance: The probability of average quality performance exceeding threshold Target₄ should achieve confidence level α₄.

\mathrm{Ch}\{\frac{1}{|P| \times |T|} \sum_{p, t} {\tilde{γ}}_{p t} \geq Target_{4}\} + d_{4}^{-} - d_{4}^{+} = α_{4}

(40)

The complete tactical planning model becomes:

\text{lexicographic minimize } \{d_{1}^{-}, d_{2}^{-}, d_{3}^{-}, d_{4}^{-}\}

(41)

subject to Equations (37)–(40)

(42)

Production and distribution constraints

(43)

d_{g}^{+}, d_{g}^{-} \geq 0, g = 1, 2, 3, 4

(44)

This formulation systematically minimizes under-achievement of target probability levels across all tactical objectives. The first priority addresses cost minimization (seeking performance below threshold), while subsequent priorities target performance maximization (seeking performance above thresholds). The chance measure framework handles both random and uncertain parameter components while providing intuitive probabilistic interpretations for tactical decision-makers.

3.8. Implementation Considerations

The proposed uncertain random theory framework offers several practical advantages for paper manufacturing tactical planning:

Comprehensive Objective Integration: Simultaneously addresses economic efficiency, operational performance, resource utilization, and quality assurance within a unified bi-level optimization framework
Hybrid Uncertainty Management: Effectively handles both quantifiable variations and subjective assessments in critical parameters including demand, costs, efficiency, and quality metrics
Confidence-Based Planning: Enables tactical planners to specify desired confidence levels for goal achievement across the planning horizon
Systematic Multi-Objective Balancing: Provides structured approach to managing competing tactical objectives while considering integrated production-distribution decisions

The dependent-chance goal programming methodology prioritizes tactical goals through lexicographic optimization while accounting for complex interactions between random and uncertain variables characteristic of paper manufacturing environments. This approach generates operationally viable solutions that demonstrate robustness against the diverse uncertainty sources encountered in tactical production-distribution planning contexts.

4. Hybrid Intelligent Algorithm

Addressing the computational complexity inherent in our bi-level dependent-chance goal programming formulation for tactical paper manufacturing planning necessitates a sophisticated algorithmic approach. We propose a hybrid computational framework that synergistically combines uncertain random simulation methodologies with reinforcement learning-augmented arithmetic optimization techniques. This algorithmic architecture is specifically designed to navigate the intricate landscape of hybrid uncertainties while managing the hierarchical structure of our bi-level optimization model.

4.1. Uncertain Random Simulations

The computational evaluation of chance measures within our bi-level tactical planning framework requires specialized simulation methodologies to handle expressions of the form Ch{g_j(x, ξ) ≤ 0} ≥ β_j. These chance-constrained formulations represent fundamental components of our paper manufacturing optimization model, where accurate estimation becomes critical for obtaining feasible and robust tactical decisions.

The mathematical foundation underlying chance measure evaluation rests on recognizing that Ch{g(x, ξ) ≤ 0} constitutes the expected value of the uncertain measures $\mathcal{M}\{g(x, ξ(ω)) \leq 0\}$ across all possible random realizations. Drawing inspiration from Monte Carlo principles [19], we construct a specialized computational approach tailored for chance-measure approximation in tactical planning contexts.

Our uncertain random simulation methodology serves as the computational backbone for evaluating probabilistic constraints under hybrid uncertainty conditions characteristic of paper manufacturing environments. The following Algorithm 2 establishes our Monte Carlo framework that systematically samples from random distributions and computes uncertain measures for each realization, ultimately yielding reliable chance measure estimates essential for production-distribution decision-making.

Algorithm 2 Uncertain Random Simulation for Chance Measure Estimation

Description: This algorithm estimates the chance measure of uncertain random events by employing Monte Carlo sampling to approximate the expected value of uncertain measures across various random scenarios in paper manufacturing planning.

1: Initialize counter ← 0
2: for k = 1 to N_MC do
3:     Generate random sample ω_k based on its probability distribution
4:     Compute $\mathcal{M}\{g(x, ξ(ω_{k})) \leq 0\}$ using Algorithm 3
5:     $counter \leftarrow counter + \mathcal{M}\{g(x, ξ(ω_{k})) \leq 0\}$
6: end for
7: return $counter / N_{MC}$ as the estimated chance measure
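A minimal Python sketch of Algorithm 2 follows. To keep it self-contained, the uncertain measure is supplied as a callable and, in the example, evaluated in closed form for a linear uncertain variable instead of through Algorithm 3. The constraint, the capacity value, the distributions, and all function names are illustrative assumptions.

```python
import random


def estimate_chance(sample_omega, uncertain_measure, n_mc=10_000, seed=0):
    """Average the uncertain measure M{g <= 0} over n_mc random realizations."""
    rng = random.Random(seed)
    counter = 0.0
    for _ in range(n_mc):
        omega = sample_omega(rng)              # draw one random realization
        counter += uncertain_measure(omega)    # accumulate M{g(x, xi(omega)) <= 0}
    return counter / n_mc


# Illustrative constraint g = demand - capacity <= 0, where
# demand = omega + eta, omega ~ Uniform(90, 110), eta linear uncertain on [-10, 10].
CAPACITY = 105.0


def measure(omega):
    """M{omega + eta <= CAPACITY} for eta with a linear distribution on [-10, 10]."""
    return min(1.0, max(0.0, (CAPACITY - omega + 10.0) / 20.0))


def draw_omega(rng):
    return rng.uniform(90.0, 110.0)


chance = estimate_chance(draw_omega, measure)
print(round(chance, 3))
```

For this toy case the exact value is 0.71875, so the printed estimate should land close to 0.72; the sampling error shrinks roughly with the square root of N_MC.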

The evaluation of uncertain measures requires implementing uncertainty theory principles [20] to assess constraint satisfaction under subjective uncertainty conditions. Algorithm 3 implements an inverse distribution approach for generating uncertain variable realizations and determining constraint satisfaction at specified uncertainty thresholds.

Algorithm 3 Uncertain Measure Computation

Description: This algorithm computes the uncertain measure for a specified constraint by employing the inverse distribution function approach to generate uncertain variable realizations and evaluate constraint satisfaction in tactical planning contexts.

1: Generate α uniformly from [0, 1]
2: Compute τ⁻¹(α) for uncertain variables τ in ξ(ω_k)
3: if g(x, ξ(ω_k)|_{τ=τ⁻¹(α)}) ≤ 0 then
4:     return 1
5: else
6:     return 0
7: end if
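Algorithm 3 returns a single 0/1 indicator per call, so the uncertain measure only emerges after averaging many calls, which Algorithm 2 does implicitly. The sketch below illustrates this under the assumption that g is increasing in the uncertain variable; the constraint, the linear inverse distribution, and all names are hypothetical.

```python
import random


def uncertain_measure_draw(g, x, omega, inverse_dists, rng):
    """One pass of Algorithm 3: 1 if g <= 0 at a sampled alpha level, else 0."""
    alpha = rng.random()                            # alpha uniform on [0, 1)
    taus = [inv(alpha) for inv in inverse_dists]    # tau^{-1}(alpha) per variable
    return 1 if g(x, omega, taus) <= 0 else 0


def inv_linear(alpha, a=-10.0, b=10.0):
    """Inverse uncertainty distribution of a linear uncertain variable on [a, b]."""
    return a + alpha * (b - a)


def g(x, omega, taus):
    """Illustrative constraint: realized demand minus planned quantity x."""
    return omega + taus[0] - x


rng = random.Random(1)
draws = [uncertain_measure_draw(g, 105.0, 100.0, [inv_linear], rng)
         for _ in range(20_000)]
measure_estimate = sum(draws) / len(draws)
print(measure_estimate)    # close to M{tau <= 5} = 0.75
```

Here g ≤ 0 holds exactly when α ≤ 0.75, so the average of the indicators approaches 0.75. For constraints that are not monotone in the uncertain variables, this sampling shortcut would need to be revisited.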

4.2. Theoretical Foundation of Exploration-Exploitation Trade-Off

The fundamental challenge in metaheuristic optimization lies in achieving optimal balance between search space exploration and solution refinement (exploitation). Following the theoretical framework established by Morales-Castañeda et al. [21], this balance can be quantitatively assessed through diversity metrics that capture population distribution characteristics throughout the search landscape.

Consider a population consisting of N individuals operating within a D-dimensional decision space. The diversity measure at iteration t is mathematically expressed as:

Diversity (t) = \frac{1}{D} \sum_{j = 1}^{D} \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(x_{i, j} (t) - {\bar{x}}_{j} (t))}^{2}}

(45)

where x_i,j(t) denotes the j-th dimensional component of the i-th individual at iteration t, and ${\bar{x}}_{j}(t)$ represents the population centroid in dimension j. The corresponding exploration and exploitation percentages are computed as:

Exploration % (t) = \frac{Diversity (t)}{{Diversity}_{max}} \times 100

(46)

Exploitation % (t) = 100 - Exploration % (t)

(47)

Balance quality assessment employs incremental-decremental analysis:

Incremental (t) = max (0, Diversity (t) - Diversity (t - 1))

(48)

Decremental (t) = max (0, Diversity (t - 1) - Diversity (t))

(49)

Balance Quality (t) = Incremental (t) - Decremental (t)

(50)
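Equations (45) to (50) translate directly into a few helper functions. The sketch below uses plain Python lists and two tiny hypothetical populations; it assumes that Diversity_max is the largest diversity observed so far in the run, which the text does not state explicitly.

```python
import math


def diversity(population):
    """Equation (45): mean over dimensions of the per-dimension population spread."""
    n, d = len(population), len(population[0])
    total = 0.0
    for j in range(d):
        mean_j = sum(ind[j] for ind in population) / n
        total += math.sqrt(sum((ind[j] - mean_j) ** 2 for ind in population) / n)
    return total / d


def exploration_pct(div_t, div_max):
    """Equations (46)-(47): returns (exploration %, exploitation %)."""
    xpl = 100.0 * div_t / div_max if div_max > 0 else 0.0
    return xpl, 100.0 - xpl


def balance_quality(div_prev, div_t):
    """Equations (48)-(50): incremental minus decremental diversity change."""
    incremental = max(0.0, div_t - div_prev)
    decremental = max(0.0, div_prev - div_t)
    return incremental - decremental


pop_early = [[0.0, 0.0], [4.0, 2.0]]     # hypothetical early, spread-out population
pop_late = [[1.0, 0.5], [3.0, 1.5]]      # hypothetical later, contracted population
d0, d1 = diversity(pop_early), diversity(pop_late)
print(d0, d1, exploration_pct(d1, d0), balance_quality(d0, d1))
# 1.5 0.75 (50.0, 50.0) -0.75
```

Since max(0, a) − max(0, −a) = a, Equation (50) is algebraically the signed change Diversity(t) − Diversity(t − 1); a negative value indicates a contracting population.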

4.3. Reinforcement-Learning-Enhanced Arithmetic Optimization Algorithm

Our bi-level dependent-chance goal programming model requires an optimization approach capable of coordinating tactical production-distribution decisions under hybrid uncertainty. We develop an enhanced Arithmetic Optimization Algorithm (AOA) framework augmented with reinforcement learning capabilities to address these computational challenges.

4.3.1. Introduction to the Arithmetic Optimization Algorithm

The Arithmetic Optimization Algorithm draws inspiration from fundamental mathematical operations: addition (+), subtraction (−), multiplication (×), and division (÷) [22]. This metaheuristic approach simulates arithmetic operations to achieve effective search space exploration and exploitation, making it particularly well-suited for complex bi-level optimization challenges in paper manufacturing contexts.

Several distinctive characteristics render AOA especially effective for tactical production-distribution planning applications:

Mathematical Optimizer (MO) Framework: AOA incorporates a Mathematical Optimizer mechanism that dynamically determines exploration versus exploitation strategies based on the Mathematical Optimizer Accelerated (MOA) coefficient:

$MOA(C_{iter}) = Min + C_{iter} \times \frac{Max - Min}{C_{Max}}$

(51)

where C_iter represents the current iteration number, C_Max denotes maximum iterations, and Min, Max define operational bounds.
Arithmetic Operation Strategy: AOA employs four fundamental arithmetic operators for position updating:
- Division and multiplication operations produce large, widely dispersed steps that facilitate exploration
- Subtraction and addition operations produce small, dense steps that enable exploitation
Adaptive Search Framework: The algorithm shifts from exploration toward exploitation as the MOA coefficient grows with the iteration count, which supports coordination in bi-level optimization.
Hierarchical Optimization Support: AOA’s mathematical foundation naturally accommodates hierarchical optimization scenarios where upper-level distribution decisions and lower-level production decisions require coordination through arithmetic operations.

These algorithmic characteristics make AOA particularly suitable for solving our bi-level dependent-chance goal programming model with hybrid uncertainty in paper manufacturing tactical planning.

4.3.2. Basic Arithmetic Optimization Algorithm

The fundamental AOA framework operates through the following computational phases:

Population Initialization: Generate N_p solution candidates randomly within feasible regions for both upper- and lower-level variables.
Mathematical Optimizer Computation: Calculate the Mathematical Optimizer Accelerated function:

$MOA(C_{iter}) = Min + C_{iter} \times \frac{Max - Min}{C_{Max}}$

(52)
Mathematical Optimizer Phase: Determine search strategy orientation:

$MOP(C_{iter}) = 1 - \frac{C_{iter}^{1/α}}{C_{Max}^{1/α}}$

(53)

where α controls exploitation precision.
Exploration Phase (when r₁ > MOA): Apply division and multiplication operators:

$x_{i, j}^{t + 1} = best_{j} \div (MOP + ε) \times ((UB_{j} - LB_{j}) \times μ + LB_{j}), \quad r_{2} < 0.5$

(54)

$x_{i, j}^{t + 1} = best_{j} \times MOP \times ((UB_{j} - LB_{j}) \times μ + LB_{j}), \quad r_{2} \geq 0.5$

(55)

where μ is a control parameter that scales the search step and ε is a small positive constant that prevents division by zero.
Exploitation Phase (when r₁ ≤ MOA): Apply subtraction and addition operators:

$x_{i, j}^{t + 1} = best_{j} - MOP \times ((UB_{j} - LB_{j}) \times μ + LB_{j}), \quad r_{3} < 0.5$

(56)

$x_{i, j}^{t + 1} = best_{j} + MOP \times ((UB_{j} - LB_{j}) \times μ + LB_{j}), \quad r_{3} \geq 0.5$

(57)

This fundamental framework establishes the foundation for our enhanced approach, demonstrating core mathematical operations and search mechanisms that underpin our reinforcement learning enhancements.
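The position update can be sketched in Python as follows. This sketch follows the standard AOA formulation of [22], in which the r₁ > MOA branch uses division and multiplication and the r₁ ≤ MOA branch uses subtraction and addition. The default values for the MOA bounds, α, μ, and ε, the clipping to the variable bounds, and the function names are illustrative assumptions rather than settings reported in this paper.

```python
import random


def moa(it, max_it, lo=0.2, hi=1.0):
    """Math Optimizer Accelerated value, Equation (52); lo and hi are assumed bounds."""
    return lo + it * (hi - lo) / max_it


def mop(it, max_it, alpha=5.0):
    """Equation (53); alpha controls how quickly the step size shrinks."""
    return 1.0 - (it ** (1.0 / alpha)) / (max_it ** (1.0 / alpha))


def aoa_update(best, lb, ub, it, max_it, rng, mu=0.499, eps=1e-12):
    """Return one new candidate position built around the best solution."""
    new = []
    for j, b in enumerate(best):
        scale = (ub[j] - lb[j]) * mu + lb[j]
        p = mop(it, max_it)
        if rng.random() > moa(it, max_it):       # r1 > MOA: exploration
            if rng.random() < 0.5:               # r2
                x = b / (p + eps) * scale        # division operator
            else:
                x = b * p * scale                # multiplication operator
        else:                                    # r1 <= MOA: exploitation
            if rng.random() < 0.5:               # r3
                x = b - p * scale                # subtraction operator
            else:
                x = b + p * scale                # addition operator
        new.append(min(ub[j], max(lb[j], x)))    # clip to bounds (assumption)
    return new


rng = random.Random(0)
candidate = aoa_update([1.0, 2.0], [0.0, 0.0], [5.0, 5.0], it=10, max_it=100, rng=rng)
print(candidate)
```

Because MOA rises linearly with the iteration count, the exploitation branch is selected more often late in the run, while MOP shrinks the step size toward zero.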

4.3.3. Reinforcement Learning Enhancement

We augment the basic AOA framework with reinforcement learning methodologies [23] to enhance convergence characteristics and solution quality for bi-level tactical planning applications. The RL integration encompasses:

State Space Representation: The state vector incorporates discretized optimization metrics for both hierarchical levels:

$s_{t} = \{diversity_{d}^{UL}, diversity_{d}^{LL}, convergence\_rate_{d}, stagnation_{d}, bilevel\_gap_{d}\}$

(58)

where continuous metrics undergo discretization into three categories (Low, Medium, High) according to tactical planning requirements.
Action Space Definition: Actions involve adjusting AOA parameters and bi-level coordination strategies:
- Action 1: Enhance exploration intensity;
- Action 2: Intensify exploitation focus;
- Action 3: Balance exploration-exploitation dynamics;
- Action 4: Strengthen bi-level coordination.
Reward Function Structure: The reward mechanism incorporates both solution quality and bi-level coordination effectiveness:

$R_{t} = \begin{cases} \frac{f_{t - 1}^{UL} + f_{t - 1}^{LL} - (f_{t}^{UL} + f_{t}^{LL})}{|f_{t - 1}^{UL} + f_{t - 1}^{LL}| + ϵ} + λ \cdot coord_{t} & \text{if improvement} \\ -penalty & \text{otherwise} \end{cases}$

(59)

where coord_t quantifies bi-level coordination quality and λ represents coordination weighting.
Q-Learning Implementation: We implement Q-learning [24] with epsilon-greedy action selection:

$Q(s_{t}, a_{t}) \leftarrow Q(s_{t}, a_{t}) + α [R_{t} + γ \max_{a} Q(s_{t + 1}, a) - Q(s_{t}, a_{t})]$

(60)

where α denotes learning rate, γ represents discount factor, s_t is current state, and a_t is selected action.
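The reward in Equation (59) can be written as a short Python function. The sketch assumes a minimization setting, so "improvement" means the combined bi-level objective decreased; the values of λ, the penalty, and ϵ are placeholders, since the text does not report them here.

```python
def reward(f_prev_ul, f_prev_ll, f_ul, f_ll, coord, lam=0.1, penalty=0.05, eps=1e-9):
    """Equation (59) for minimization; lam, penalty and eps are assumed values."""
    prev_total = f_prev_ul + f_prev_ll
    total = f_ul + f_ll
    if total < prev_total:
        # Relative improvement of the combined objective plus coordination bonus
        return (prev_total - total) / (abs(prev_total) + eps) + lam * coord
    return -penalty


r_improved = reward(60.0, 40.0, 55.0, 40.0, coord=0.5)   # 5% improvement + 0.05 bonus
r_stalled = reward(60.0, 40.0, 60.0, 40.0, coord=0.5)    # no improvement
print(round(r_improved, 4), r_stalled)                    # 0.1 -0.05
```

Normalizing by the previous objective value keeps the reward scale comparable across instances with very different cost magnitudes.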

4.3.4. Adaptive Parameter Tuning for Bi-Level Coordination

Enhancing bi-level coordination performance requires implementing adaptive parameter tuning mechanisms:

Mathematical Optimizer Adaptation:

$MOA^{t + 1} = MOA_{base} + ϕ \cdot \sin(2π \cdot \frac{t}{T_{max}}) \cdot coordination\_factor$

(61)

where ϕ controls adaptation intensity and coordination_factor reflects bi-level synchronization requirements.
Exploration-Exploitation Balance:

$α^{t + 1} = α_{min} + (α_{max} - α_{min}) \cdot e^{-ρ \cdot t / T_{max}}$

(62)

$MOP^{t + 1} = MOP_{base} \cdot (1 + β \cdot level\_performance)$

(63)

where level_performance measures relative performance between upper and lower levels.
Bi-Level Coordination Enhancement:

$coord\_weight^{t + 1} = coord_{base} \cdot (1 + γ \cdot gap\_ratio)$

(64)

where gap_ratio quantifies coordination gaps between production and distribution decisions.
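Equations (61) to (64) are simple closed-form updates, sketched below in Python. All default parameter values (ϕ, ρ, β, γ and the α range) are placeholders chosen for illustration; the text does not specify them at this point.

```python
import math


def adapt_moa(moa_base, t, t_max, phi=0.1, coordination_factor=1.0):
    """Equation (61): sinusoidal perturbation of the base MOA value."""
    return moa_base + phi * math.sin(2.0 * math.pi * t / t_max) * coordination_factor


def adapt_alpha(t, t_max, alpha_min=2.0, alpha_max=5.0, rho=3.0):
    """Equation (62): decays from alpha_max at t = 0 toward alpha_min."""
    return alpha_min + (alpha_max - alpha_min) * math.exp(-rho * t / t_max)


def adapt_mop(mop_base, level_performance, beta=0.2):
    """Equation (63): scale MOP by the relative performance between levels."""
    return mop_base * (1.0 + beta * level_performance)


def adapt_coord_weight(coord_base, gap_ratio, gamma=0.5):
    """Equation (64): raise the coordination weight when the bi-level gap grows."""
    return coord_base * (1.0 + gamma * gap_ratio)


print(adapt_moa(0.5, 25, 100), adapt_alpha(0, 100), adapt_coord_weight(0.4, 0.5))
```

Because MOA is compared with a uniform random number in [0, 1], a practical implementation would likely clip the result of Equation (61) to that interval; the source does not state whether this is done.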

4.4. Reinforcement Learning Components

Building upon the fundamental AOA framework, we integrate reinforcement learning methodologies to create an adaptive bi-level optimization system. The RL enhancement incorporates state representation, action space definition, and adaptive parameter control for tactical planning coordination.

4.4.1. State Space Discretization for Bi-Level Planning

Continuous optimization metrics for both hierarchical levels undergo discretization using the following strategy:

Upper-Level Diversity Measure:

$diversity_{d}^{UL} = \begin{cases} \text{Low} & \text{if } diversity^{UL} \leq 0.25 \\ \text{Medium} & \text{if } 0.25 < diversity^{UL} \leq 0.65 \\ \text{High} & \text{if } diversity^{UL} > 0.65 \end{cases}$

(65)
Lower-Level Diversity Measure:

$diversity_{d}^{LL} = \begin{cases} \text{Low} & \text{if } diversity^{LL} \leq 0.30 \\ \text{Medium} & \text{if } 0.30 < diversity^{LL} \leq 0.70 \\ \text{High} & \text{if } diversity^{LL} > 0.70 \end{cases}$

(66)
Bi-Level Gap Measure:

$bilevel\_gap_{d} = \begin{cases} \text{Synchronized} & \text{if } gap \leq 0.05 \\ \text{Moderate} & \text{if } 0.05 < gap \leq 0.20 \\ \text{Divergent} & \text{if } gap > 0.20 \end{cases}$

(67)
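The three discretization rules share one pattern: two cut points and three labels. A minimal Python sketch is given below, using the thresholds from Equations (65) to (67). The convergence-rate and stagnation components are omitted because their thresholds are not given in this section, and the function names are illustrative.

```python
def discretize(value, low_cut, high_cut, labels=("Low", "Medium", "High")):
    """Map a continuous metric to one of three labels using two cut points."""
    if value <= low_cut:
        return labels[0]
    if value <= high_cut:
        return labels[1]
    return labels[2]


def encode_state(div_ul, div_ll, gap):
    """Partial RL state built from Equations (65)-(67)."""
    return (
        discretize(div_ul, 0.25, 0.65),
        discretize(div_ll, 0.30, 0.70),
        discretize(gap, 0.05, 0.20, labels=("Synchronized", "Moderate", "Divergent")),
    )


state = encode_state(0.25, 0.45, 0.21)
print(state)    # ('Low', 'Medium', 'Divergent')
```

Returning a tuple makes the state hashable, so it can serve directly as a key in a dictionary-based Q-table.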

4.4.2. Q-Learning for Bi-Level Parameter Adaptation

We implement Q-learning to adaptively adjust algorithm parameters based on bi-level optimization progress. Through iterative trial and error, the algorithm learns which parameter configurations suit production and distribution planning, using temporal-difference updates to improve later decisions. Algorithm 4 shows this mechanism, which adapts the optimization process to the performance improvements observed at both levels.

Algorithm 4 Q-Learning Bi-Level Parameter Adaptation

Description: This algorithm implements temporal difference learning to update Q-values for bi-level parameter adaptation, enabling the system to learn optimal parameter settings based on tactical planning performance feedback from both the production and distribution levels.
Require: Current state s_t, action a_t, reward R_t, next state s_t+1
Require: Learning rate α, discount factor γ, Q-table Q, exploration rate ϵ

1: Calculate temporal difference: $δ = R_{t} + γ \max_{a'} Q(s_{t + 1}, a') - Q(s_{t}, a_{t})$
2: Update Q-value: Q(s_t, a_t) ← Q(s_t, a_t) + α · δ
3: Update bi-level coordination factor: coord_factor ← coord_factor + β · δ
4: Select next action using epsilon-greedy:
5: if random() < ϵ then
6:     a_t+1 ← random action
7: else
8:     a_t+1 ← arg max_a Q(s_t+1, a)
9: end if
10: return Updated Q-table, coordination factor, and next action a_t+1
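Algorithm 4 can be sketched in Python with a dictionary-backed Q-table. The hyperparameter values, the 0-based action indices (the paper numbers actions 1 to 4), and the example states are assumptions made for illustration.

```python
import random
from collections import defaultdict

N_ACTIONS = 4


def q_step(q, s, a, r, s_next, alpha=0.1, gamma=0.9, epsilon=0.1,
           coord_factor=1.0, beta=0.01, rng=random):
    """One pass of Algorithm 4; returns (coord_factor, next_action)."""
    delta = r + gamma * max(q[s_next]) - q[s][a]     # temporal difference
    q[s][a] += alpha * delta                          # Q-value update
    coord_factor += beta * delta                      # coordination factor update
    if rng.random() < epsilon:                        # explore
        a_next = rng.randrange(N_ACTIONS)
    else:                                             # exploit
        a_next = max(range(N_ACTIONS), key=lambda k: q[s_next][k])
    return coord_factor, a_next


q_table = defaultdict(lambda: [0.0] * N_ACTIONS)      # unseen states start at zero
state = ("Low", "Medium", "Medium", "Low", "Moderate")
next_state = ("Medium", "Medium", "Medium", "Low", "Moderate")
coord, action = q_step(q_table, state, 0, 0.5, next_state, rng=random.Random(0))
print(q_table[state], coord, action)
```

With all Q-values initialized to zero and a reward of 0.5, the temporal difference is 0.5, so the updated entry becomes 0.05 and the coordination factor 1.005.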

4.4.3. Adaptive Parameter Control for Tactical Planning

The reinforcement learning component for tactical production-distribution planning includes:

State Representation: $s_{t} = \{diversity_{d}^{UL}, diversity_{d}^{LL}, convergence\_rate_{d}, stagnation_{d}, bilevel\_gap_{d}\}$ with 3⁵ = 243 possible discrete states.
Action Space: Four discrete actions for bi-level parameter adjustment:
- Action 1: Enhance exploration in both levels;
- Action 2: Focus exploitation in production level;
- Action 3: Balance search across levels;
- Action 4: Strengthen inter-level coordination.
Reward Function: Bi-level improvement with coordination bonus.
Action Selection: Epsilon-greedy strategy [23] with adaptive exploration rate.

Algorithm 5 details how these reinforcement learning components are used to dynamically adjust the Arithmetic Optimization Algorithm parameters.

Algorithm 5 RL-Based Adaptive Parameter Control for Bi-Level AOA

Description: This algorithm dynamically adjusts Arithmetic Optimization Algorithm parameters based on reinforcement learning actions, optimizing exploration-exploitation balance for both the production and distribution planning levels.
Require: Current iteration t, RL action a_t, base parameters, bi-level gap

1: Calculate base MOA: $MOA_{base} = Min + t \times \frac{Max - Min}{T_{max}}$
2: Calculate base MOP: $MOP_{base} = 1 - \frac{t^{1/α}}{T_{max}^{1/α}}$
3: if a_t = 1 then {Enhance Exploration}
4:     MOA = MOA_base × 0.7, MOP = MOP_base × 1.4
5:     α = 2.5, coord_weight = 0.3
6: else if a_t = 2 then {Focus Exploitation}
7:     MOA = MOA_base × 1.3, MOP = MOP_base × 0.6
8:     α = 4.0, coord_weight = 0.2
9: else if a_t = 3 then {Balanced Search}
10:     MOA = MOA_base, MOP = MOP_base
11:     α = 3.0, coord_weight = 0.4
12: else {Strengthen Coordination}
13:     MOA = MOA_base × 1.1, MOP = MOP_base × 0.8
14:     α = 3.5, coord_weight = 0.6
15: end if
16: Update exploration rate: ϵ^t+1 = max(ϵ_min, ϵ_max · e^{−σ·t/T_max})
17: return Updated parameters $(MOA, MOP, α, coord\_weight, ϵ)$
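The branching in Algorithm 5 maps naturally onto a lookup table. The Python sketch below uses the multipliers and settings listed in the algorithm; the MOA bounds, the α used for the base MOP, and the exploration-rate constants are assumptions, since the algorithm does not fix them.

```python
import math

# (MOA multiplier, MOP multiplier, alpha, coord_weight) per action, from Algorithm 5
ACTION_TABLE = {
    1: (0.7, 1.4, 2.5, 0.3),   # enhance exploration
    2: (1.3, 0.6, 4.0, 0.2),   # focus exploitation
    3: (1.0, 1.0, 3.0, 0.4),   # balanced search
    4: (1.1, 0.8, 3.5, 0.6),   # strengthen coordination
}


def adapt_parameters(t, t_max, action, moa_min=0.2, moa_max=1.0, alpha_base=5.0,
                     eps_min=0.01, eps_max=0.3, sigma=3.0):
    """Return (MOA, MOP, alpha, coord_weight, epsilon) for the chosen RL action."""
    moa_base = moa_min + t * (moa_max - moa_min) / t_max
    # Assumption: the base MOP uses a fixed alpha_base, not the action's alpha
    mop_base = 1.0 - t ** (1.0 / alpha_base) / t_max ** (1.0 / alpha_base)
    moa_mult, mop_mult, alpha, coord_weight = ACTION_TABLE.get(action, ACTION_TABLE[4])
    epsilon = max(eps_min, eps_max * math.exp(-sigma * t / t_max))
    return moa_base * moa_mult, mop_base * mop_mult, alpha, coord_weight, epsilon


params = adapt_parameters(t=0, t_max=100, action=1)
print(params)
```

Any action outside 1 to 3 falls through to the coordination setting, mirroring the final else branch of the algorithm.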

4.5. Solution Encoding and Decoding for Bi-Level Planning

Efficient solution representation is crucial for managing the complexity inherent in our bi-level tactical production-distribution planning model. We implement a hierarchical encoding scheme that effectively captures both binary and continuous decision variables across both planning levels.

Bi-Level Variable Encoding Strategy

For upper-level binary decision variables (Y_d, U_lt), we implement sigmoid-based transformation:

Y_{d} = \begin{cases} 1, & \text{if } σ(z_{d}^{UL}) \geq 0.5 \\ 0, & \text{otherwise} \end{cases}

(68)

where $σ(z) = \frac{1}{1 + e^{-z}}$ represents the sigmoid function.

For lower-level binary decision variables (V_pmt, $W_{p_{i} p_{j} m t}$), we use:

V_{p m t} = \begin{cases} 1, & \text{if } σ(z_{p m t}^{LL}) \geq 0.5 \\ 0, & \text{otherwise} \end{cases}

(69)

For continuous decision variables at both levels, we employ direct encoding with bound constraints:

var^{UL/LL} = lb_{var} + (ub_{var} - lb_{var}) \times normalize(z_{var})

(70)

To handle bi-level dependencies between decision variables:

Q_{p m t} = \begin{cases} Q_{p m t}^{*}, & \text{if } Y_{d} = 1 \text{ and } V_{p m t} = 1 \\ 0, & \text{otherwise} \end{cases}

(71)

The solution vectors in our bi-level optimization model are encoded and decoded using Algorithm 6, which transforms continuous metaheuristic solutions into mixed-integer variables while preserving hierarchical dependencies between upper and lower decision levels.

Algorithm 6 Bi-Level Solution Encoding and Decoding

Description: This algorithm transforms continuous solution vectors into mixed-integer solutions for both production and distribution levels by applying sigmoid functions for binary variables and enforcing hierarchical dependencies between bi-level decision variables.
Require: Continuous solution vector z = [z^UL, z^LL]

1:: Upper-Level Binary Variable Decoding:
2:: for each upper-level binary variable Y_d, U_lt do
3:: $Y_{d} = I [σ (z_{d}^{U L}) \geq 0.5]$ , $U_{l t} = I [σ (z_{l t}^{U L}) \geq 0.5]$
4:: end for
5:: Lower-Level Binary Variable Decoding:
6:: for each lower-level binary variable V_pmt, $W_{p_{i} p_{j} m t}$ do
7:: $V_{p m t} = I [σ (z_{p m t}^{L L}) \geq 0.5]$ , $W_{p_{i} p_{j} m t} = I [σ (z_{p_{i} p_{j} m t}^{L L}) \geq 0.5]$
8:: end for
9:: Continuous Variable Decoding:
10:: for each continuous variable var at both levels do
11:: var = lb_var + (ub_var − lb_var) · normalize(z_var)
12:: end for
13:: Bi-Level Dependency Enforcement:
14:: for each dependent variable pair across levels do
15:: if upper-level parent variable is 0 then
16:: Set dependent lower-level variables to 0
17:: end if
18:: end for
19:: return Decoded bi-level solution x = [x^UL, x^LL]
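
The decoding rules of Eqs. (69)–(71) and Algorithm 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the search interval [z_min, z_max], and the clipped min-max form of normalize(·) are assumptions, since the text does not define them.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def decode_binary(z):
    """Threshold rule of Eq. (69): 1 if sigma(z) >= 0.5, otherwise 0."""
    return 1 if sigmoid(z) >= 0.5 else 0

def decode_continuous(z, lb, ub, z_min=-1.0, z_max=1.0):
    """Bound scaling of Eq. (70).

    ASSUMPTION: normalize() is a clipped min-max map from [z_min, z_max]
    to [0, 1]; the paper does not specify its form.
    """
    u = (min(max(z, z_min), z_max) - z_min) / (z_max - z_min)
    return lb + (ub - lb) * u

def enforce_dependency(q_star, y_d, v_pmt):
    """Eq. (71): keep the quantity only if Y_d = 1 and V_pmt = 1."""
    return q_star if (y_d == 1 and v_pmt == 1) else 0.0

# Toy example with hypothetical coordinates and bounds
y_d = decode_binary(0.3)                            # upper-level binary
v_pmt = decode_binary(-0.7)                         # lower-level binary
q_star = decode_continuous(0.5, lb=0.0, ub=200.0)   # candidate quantity
q = enforce_dependency(q_star, y_d, v_pmt)          # zeroed because v_pmt = 0
```

Because σ(z) ≥ 0.5 holds exactly when z ≥ 0, the threshold rule amounts to a sign test on the raw coordinate.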

4.6. Constraint Handling Mechanisms for Bi-Level Planning

We implement a hierarchical penalty function approach to handle bi-level dependent-chance constraints:

\begin{aligned} F(x) = {} & M^{UL} \cdot \sum_{j=1}^{m^{UL}} \lambda_j^{UL}(t) \cdot \max\left(0, \beta_j - Ch\{g_j^{UL}(x^{UL}) \leq 0\}\right) \\ & + M^{LL} \cdot \sum_{j=1}^{m^{LL}} \lambda_j^{LL}(t) \cdot \max\left(0, \beta_j - Ch\{g_j^{LL}(x^{LL}) \leq 0\}\right) \\ & + \sum_{g=1}^{4} w_g \cdot d_g^{-} \end{aligned}

(72)

where M^UL and M^LL represent large penalty multipliers for upper- and lower-level constraints, respectively, ensuring constraint violations dominate goal deviations, and $\lambda_j^{UL}(t)$, $\lambda_j^{LL}(t)$ are penalty parameters that increase over iterations:

\lambda_j^{UL}(t) = \lambda_j^{UL,0} \cdot (1 + \delta^{UL} \cdot t), \quad \lambda_j^{LL}(t) = \lambda_j^{LL,0} \cdot (1 + \delta^{LL} \cdot t)

(73)

For dependent-chance goals requiring coordination between levels, we implement progressive tightening:

\beta_j(t) = \beta_j^{final} \cdot (1 - e^{-\kappa \cdot t / T_{max}})

(74)
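
To make the two schedules in Eqs. (73) and (74) concrete, the sketch below evaluates them at a few iterations. The values λ⁰ = 1.0 and δ = 0.1 follow the initialization listing and T_max = 400 follows the stated termination criterion; κ = 5 and β^final = 0.85 are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
import math

def penalty_weight(lam0, delta, t):
    """Eq. (73): linearly increasing penalty parameter lambda_j(t)."""
    return lam0 * (1.0 + delta * t)

def tightened_level(beta_final, kappa, t, t_max):
    """Eq. (74): confidence level beta_j(t) rising toward beta_final."""
    return beta_final * (1.0 - math.exp(-kappa * t / t_max))

T_MAX = 400          # iteration budget reported for the experiments
KAPPA = 5.0          # ASSUMPTION: tightening rate, not given in this section
BETA_FINAL = 0.85    # ASSUMPTION: illustrative final confidence level

for t in (0, 100, 400):
    lam = penalty_weight(1.0, 0.1, t)
    beta = tightened_level(BETA_FINAL, KAPPA, t, T_MAX)
    print(f"t={t:3d}  lambda={lam:5.1f}  beta={beta:.3f}")
```

As written, Eq. (74) starts from β_j(0) = 0 and approaches β_j^final asymptotically, so the final level is reached only approximately at t = T_max (about 0.844 instead of 0.85 under the assumed κ).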

Our optimization model employs Algorithm 7 to handle the complex chance constraints across both planning levels, implementing a dynamic penalty mechanism that evaluates dependent-chance goals while preserving the hierarchical structure between production and distribution decisions.

Algorithm 7 Dynamic Penalty Constraint Handling for Bi-Level Planning

Description: This algorithm evaluates solution fitness by computing dependent-chance goals and constraints for both the production and distribution levels, applying hierarchical penalty functions to handle constraint violations while preserving lexicographic priority and bi-level coordination.
Require: Bi-level solution x = [x^UL, x^LL], iteration t, penalty parameters

1:: Evaluate Dependent-Chance Goals:
2:: for each goal g = 1 to 4 do
3:: Calculate Ch{F^UL + F^LL ≤ Target_g} using Algorithm 2
4:: Calculate deviation: $d_g^{-} = \max(0, \alpha_g - Ch\{\text{goal\_achievement}\})$
5:: end for
6:: Evaluate Upper-Level Chance Constraints:
7:: for each upper-level constraint j = 1 to m^UL do
8:: Calculate $Ch\{g_j^{UL}(x^{UL}) \leq 0\}$ using Algorithm 2
9:: Calculate violation: $viol_j^{UL} = \max(0, \beta_j - Ch\{g_j^{UL}(x^{UL}) \leq 0\})$
10:: end for
11:: Evaluate Lower-Level Chance Constraints:
12:: for each lower-level constraint j = 1 to m^LL do
13:: Calculate $Ch\{g_j^{LL}(x^{LL}) \leq 0\}$ using Algorithm 2
14:: Calculate violation: $viol_j^{LL} = \max(0, \beta_j - Ch\{g_j^{LL}(x^{LL}) \leq 0\})$
15:: end for
16:: Compute Hierarchical Penalized Fitness:
17:: $F_{penalty}(x) = M^{UL} \cdot \sum_{j=1}^{m^{UL}} \lambda_j^{UL}(t) \cdot viol_j^{UL} + M^{LL} \cdot \sum_{j=1}^{m^{LL}} \lambda_j^{LL}(t) \cdot viol_j^{LL} + \sum_{g=1}^{4} w_g \cdot d_g^{-}$
18:: Update Penalty Parameters:
19:: for each constraint j at both levels do
20:: $\lambda_j^{UL}(t) = \lambda_j^{UL,0} \cdot (1 + \delta^{UL} \cdot t)$
21:: $\lambda_j^{LL}(t) = \lambda_j^{LL,0} \cdot (1 + \delta^{LL} \cdot t)$
22:: end for
23:: return Penalized fitness F_penalty(x)
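
The fitness computation in Algorithm 7 can be illustrated as follows. This is a sketch under stated assumptions: the chance values are passed in as plain numbers (in the paper they come from the uncertain random simulation of Algorithm 2, which is not reproduced here), and the multipliers M = 1000, the unit goal weights, and all chance values in the example are hypothetical. Only the target levels α_g are taken from Section 5.2.

```python
def penalized_fitness(ch_ul, beta_ul, lam_ul, ch_ll, beta_ll, lam_ll,
                      goal_ch, alpha, w, m_ul=1e3, m_ll=1e3):
    """Hierarchical penalized fitness of Eq. (72) for one decoded solution.

    ch_*    : estimated chance of satisfying each constraint (ASSUMED inputs)
    beta_*  : required confidence level for each constraint
    lam_*   : current penalty parameters lambda_j(t)
    goal_ch : estimated chance of achieving each of the four goals
    alpha   : target probability levels alpha_g
    w       : goal weights (ASSUMPTION: the paper's values are not shown here)
    """
    # Shortfall of each chance constraint below its confidence level
    viol_ul = [max(0.0, b - c) for b, c in zip(beta_ul, ch_ul)]
    viol_ll = [max(0.0, b - c) for b, c in zip(beta_ll, ch_ll)]
    # Negative deviation of each goal below its target probability
    d_minus = [max(0.0, a - c) for a, c in zip(alpha, goal_ch)]
    penalty_ul = m_ul * sum(l * v for l, v in zip(lam_ul, viol_ul))
    penalty_ll = m_ll * sum(l * v for l, v in zip(lam_ll, viol_ll))
    goal_term = sum(wg * d for wg, d in zip(w, d_minus))
    return penalty_ul + penalty_ll + goal_term

alpha = [0.90, 0.85, 0.80, 0.90]   # target levels from Section 5.2
fitness = penalized_fitness(
    ch_ul=[0.95, 0.80], beta_ul=[0.90, 0.90], lam_ul=[1.0, 1.0],
    ch_ll=[0.92], beta_ll=[0.90], lam_ll=[1.0],
    goal_ch=[0.88, 0.85, 0.78, 0.90], alpha=alpha, w=[1.0, 1.0, 1.0, 1.0],
)
print(round(fitness, 4))
```

In this toy case a single upper-level violation of 0.10 contributes about 100 to the fitness, while the goal deviations contribute only 0.04, which is the dominance of constraint violations over goal deviations that the large multipliers are meant to ensure.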

4.7. Complete Hybrid Intelligent Algorithm

The comprehensive hybrid intelligent algorithm is structured in a modular framework comprising six specialized components designed for bi-level tactical planning coordination.

Proper initialization is crucial for bi-level optimization success. The hybrid optimization process begins with Algorithm 8, which establishes populations for both levels, configures all learning components, and sets up the parameter framework that guides the entire tactical planning optimization process.

Algorithm 8 Bi-Level Algorithm Initialization

Description: This algorithm initializes populations for both production and distribution planning levels, Q-learning components, and all algorithm parameters, establishing the foundation for the hybrid bi-level optimization process.

1:: Initialize upper-level population of $N_p^{UL}$ solutions randomly in feasible region
2:: Initialize lower-level population of $N_p^{LL}$ solutions randomly in feasible region
3:: Initialize Q-table for reinforcement learning with small random values
4:: Set RL parameters: α = 0.1, γ = 0.9, ϵ_start = 0.9, ϵ_end = 0.1
5:: Set AOA parameters: Min = 0.2, Max = 1.0, α = 5.0, μ = 0.5
6:: Set penalty parameters: $\lambda_j^{UL,0} = 1.0$, $\lambda_j^{LL,0} = 1.0$, δ^UL = 0.1, δ^LL = 0.1
7:: Evaluate initial bi-level solutions using Algorithm 7
8:: Identify global best solutions $x_{gbest}^{UL}$, $x_{gbest}^{LL}$ and initialize $f_{best}^{UL}$, $f_{best}^{LL}$
9:: Initialize stagnation counters: stag^UL = 0, stag^LL = 0

Effective reinforcement learning for bi-level optimization requires continuous monitoring of both production and distribution planning states. Algorithm 9 captures key performance indicators for both levels and implements intelligent action selection for coordinated parameter adjustment.

Algorithm 9 RL State Observation and Action Selection for Bi-Level Planning

Description: This algorithm observes the current optimization state for both the production and distribution levels, calculates key performance metrics, and selects actions using an epsilon-greedy strategy to balance exploration and exploitation in bi-level parameter adaptation.

1:: Calculate upper-level diversity: $diversity^{UL} = \frac{1}{N_p^{UL}} \sum_{i=1}^{N_p^{UL}} \| x_i^{UL} - x_{gbest}^{UL} \|$
2:: Calculate lower-level diversity: $diversity^{LL} = \frac{1}{N_p^{LL}} \sum_{i=1}^{N_p^{LL}} \| x_i^{LL} - x_{gbest}^{LL} \|$
3:: Calculate convergence rate: $conv_{rate} = \frac{(f_{best}^{UL} + f_{best}^{LL})^{(t-5)} - (f_{best}^{UL} + f_{best}^{LL})^{(t)}}{5}$, where the superscripts (t − 5) and (t) denote iteration indices, not exponents
4:: Calculate bi-level gap: $gap = \frac{| coordination\_measure^{UL} - coordination\_measure^{LL} |}{coordination\_measure^{UL} + coordination\_measure^{LL}}$
5:: Define state: s_t = {diversity^UL, diversity^LL, conv_rate, stag, gap}
6:: Update ϵ: ϵ = max(ϵ_end, ϵ_start · 0.995^t)
7:: if random() < ϵ then
8:: Select random action a_t ∈ {1, 2, 3, 4}
9:: else
10:: a_t = arg max_a Q(s_t, a)
11:: end if
12:: Update AOA parameters using Algorithm 5
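
The ε-greedy selection in Algorithm 9 can be sketched as below. The decay schedule and the four actions follow the listing; the dictionary-based Q-table and the binning of the continuous state indicators into discrete levels are assumptions made here so that a tabular Q-function is well defined, and the bin edges are hypothetical.

```python
import random

def epsilon(t, eps_start=0.9, eps_end=0.1, decay=0.995):
    """Exploration rate from Algorithm 9: max(eps_end, eps_start * decay**t)."""
    return max(eps_end, eps_start * decay ** t)

def discretize(value, edges):
    """ASSUMPTION: map a continuous indicator to a bin index using sorted edges."""
    return sum(value > e for e in edges)

def select_action(q_table, state, eps, rng, n_actions=4):
    """Epsilon-greedy choice over actions 1..n_actions."""
    if rng.random() < eps:
        return rng.randint(1, n_actions)            # explore
    row = q_table.get(state, [0.0] * n_actions)     # unseen states default to 0
    return 1 + max(range(n_actions), key=row.__getitem__)  # exploit

# Toy state built from two hypothetical indicators (diversity, bi-level gap)
state = (discretize(0.42, [0.1, 0.3, 0.6]), discretize(0.05, [0.02, 0.1]))
q_table = {state: [0.0, 0.2, 0.5, 0.1]}
rng = random.Random(0)
greedy_action = select_action(q_table, state, eps=0.0, rng=rng)
explore_action = select_action(q_table, state, eps=1.0, rng=rng)
```

With ϵ_start = 0.9 and a decay factor of 0.995, the floor ϵ_end = 0.1 is reached only after roughly 440 iterations, so within a 400-iteration budget the exploration rate stays slightly above 0.1 (about 0.12 at t = 400).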

The core of the Arithmetic Optimization Algorithm lies in its mathematical-operation-based position updates. Algorithm 10 implements the exploration and exploitation phases for both production and distribution planning levels through coordinated arithmetic operations.

Solution improvement and learning coordination are essential for bi-level optimization success. Algorithm 12 manages global best solutions for both levels, quantifies performance improvements, and facilitates Q-learning for adaptive parameter coordination.

Managing algorithm convergence and coordination between levels is critical for bi-level optimization quality. Algorithm 13 monitors progress for both production and distribution planning, implements coordination strategies, and ensures robust termination.

Algorithm 10 Bi-Level Arithmetic Optimization Position Update

Description: This algorithm updates positions for both upper-level (distribution) and lower-level (production) populations using arithmetic operations, coordinating exploration and exploitation across both planning levels.

1:: Calculate $MOA = Min + t \times \frac{Max - Min}{T_{max}}$
2:: Calculate $MOP = 1 - \frac{t^{1/\alpha}}{T_{max}^{1/\alpha}}$
3:: Update Upper-Level Population (Distribution):
4:: for i = 1 to $N_p^{UL}$ do
5:: for j = 1 to D^UL do
6:: Generate r₁, r₂, r₃ ∼ U(0, 1)
7:: if r₁ > MOA then
8:: {Exploration Phase}
9:: if r₂ < 0.5 then
10:: $x_{i,j}^{UL,t+1} = best_j^{UL} + MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
11:: else
12:: $x_{i,j}^{UL,t+1} = best_j^{UL} - MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
13:: end if
14:: else
15:: {Exploitation Phase}
16:: if r₃ < 0.5 then
17:: $x_{i,j}^{UL,t+1} = best_j^{UL} \times MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
18:: else
19:: $x_{i,j}^{UL,t+1} = best_j^{UL} \div MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
20:: end if
21:: end if
22:: end for
23:: end for
24:: Update Lower-Level Population (Production):
25:: for i = 1 to $N_p^{LL}$ do
26:: for j = 1 to D^LL do
27:: Generate r₁, r₂, r₃ ∼ U(0, 1)
28:: Apply similar arithmetic operations as upper level with coordination factor
29:: $x_{i,j}^{LL,t+1} = \mathrm{arithmetic\_operation}(best_j^{LL}, coordination\_factor \times x_{i,j}^{UL,t+1})$
30:: end for
31:: end for
30:: end for
31:: end for
32:: Apply boundary constraints and decode solutions using Algorithm 6
33:: Evaluate fitness using Algorithm 7
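
A single-level version of the position update in Algorithm 10 can be sketched as follows. The operator assignment (addition and subtraction when r₁ > MOA, multiplication and division otherwise) follows the listing as written. Two details are assumptions added here: a small constant guards the division when MOP reaches zero at t = T_max, and the boundary step is implemented as simple clipping. The lower-level coordination factor is not modelled because its form is not specified.

```python
import random

def moa(t, t_max, lo=0.2, hi=1.0):
    """Math Optimizer Accelerated: linear ramp from Min to Max."""
    return lo + t * (hi - lo) / t_max

def mop(t, t_max, alpha=5.0):
    """Math Optimizer Probability: decays from 1 to 0 over the run."""
    return 1.0 - (t ** (1.0 / alpha)) / (t_max ** (1.0 / alpha))

def aoa_update(best, lb, ub, t, t_max, rng, mu=0.5, eps=1e-12):
    """One candidate position generated from the current best solution."""
    a, p = moa(t, t_max), mop(t, t_max)
    new = []
    for b, lo, hi in zip(best, lb, ub):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        step = (hi - lo) * mu + lo
        if r1 > a:
            # Branch labelled "exploration" in the listing: add or subtract
            x = b + p * step if r2 < 0.5 else b - p * step
        else:
            # Branch labelled "exploitation" in the listing: multiply or divide
            # ASSUMPTION: eps avoids division by zero when MOP = 0
            x = b * p * step if r3 < 0.5 else b / (p + eps) * step
        new.append(min(max(x, lo), hi))  # ASSUMPTION: clip to the bounds
    return new

rng = random.Random(1)
candidate = aoa_update([5.0, 2.0], [0.0, 1.0], [10.0, 4.0],
                       t=10, t_max=400, rng=rng)
```

Because MOA grows from 0.2 to 1.0, the first branch is taken with probability 0.8 at the start of the run and almost never near the end.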

The main execution framework, implemented in Algorithm 11, orchestrates all bi-level algorithm components in a coordinated manner. This algorithm provides the high-level control structure that manages the iterative optimization process, coordinates the interaction between AOA and RL components across both levels, and ensures proper termination.

The comprehensive methodology ensures the development of a hybrid tactical planning framework that combines uncertain random simulation, reinforcement learning for adaptive coordination, and bi-level dependent-chance goal programming for multi-objective optimization. The approach is specifically designed to operate in hybrid uncertain environments where both data-driven and expert-based information guide tactical production-distribution decisions in paper manufacturing. This modular structure ensures the framework is scalable, interpretable, and adaptable to various tactical planning scenarios while maintaining coordination between production scheduling and distribution allocation decisions.

Algorithm 11 Main RL-Enhanced AOA Bi-Level Execution Framework

Description: This is the main execution framework that orchestrates all bi-level algorithm components, coordinating the Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm process for tactical production-distribution planning from initialization to termination.

1:: Execute Algorithm 8 for bi-level system initialization
2:: for t = 1 to T_max do
3:: Execute Algorithm 9 for RL state observation and action selection
4:: Execute Algorithm 10 for bi-level arithmetic optimization position updates
5:: Execute Algorithm 12 for bi-level solution updates and RL learning
6:: Execute Algorithm 13 for stagnation management and termination check
7:: if Algorithm 13 returns True then
8:: break {Bi-level optimization completed}
9:: end if
10:: end for
11:: return Optimal bi-level solution $x_{gbest} = [x_{gbest}^{UL}, x_{gbest}^{LL}]$ with performance metrics

Algorithm 12 Bi-Level Solution Update and RL Learning

Description: This algorithm updates global best solutions for both production and distribution levels, calculates rewards based on bi-level fitness improvement, and performs Q-learning updates to enhance future parameter selection decisions.

1:: $f_{prev}^{UL} = f_{best}^{UL}$, $f_{prev}^{LL} = f_{best}^{LL}$
2:: Update Upper-Level Solutions:
3:: for each upper-level solution i do
4:: if $f^{UL}(x_i^{UL}) < f^{UL}(x_{gbest}^{UL})$ then
5:: $x_{gbest}^{UL} = x_i^{UL}$, $f_{best}^{UL} = f^{UL}(x_i^{UL})$
6:: end if
7:: end for
8:: Update Lower-Level Solutions:
9:: for each lower-level solution i do
10:: if $f^{LL}(x_i^{LL}) < f^{LL}(x_{gbest}^{LL})$ then
11:: $x_{gbest}^{LL} = x_i^{LL}$, $f_{best}^{LL} = f^{LL}(x_i^{LL})$
12:: end if
13:: end for
14:: Calculate bi-level reward: $R_t = \frac{(f_{prev}^{UL} + f_{prev}^{LL}) - (f_{best}^{UL} + f_{best}^{LL})}{f_{prev}^{UL} + f_{prev}^{LL} + 10^{-8}} + \lambda \cdot coordination\_bonus$
15:: Observe next state $s_{t+1}$
16:: Update Q-table using Algorithm 4
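
The reward of step 14 and the subsequent learning step can be illustrated with the sketch below. The reward follows the formula in the listing. The Q-table update is assumed to be the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α [R + γ max_a' Q(s', a') − Q(s, a)], since Algorithm 4 is not reproduced in this section; the state labels and fitness values are hypothetical.

```python
def bilevel_reward(f_prev_ul, f_prev_ll, f_best_ul, f_best_ll,
                   lam=0.0, coordination_bonus=0.0):
    """Relative improvement of the combined best fitness plus a bonus term."""
    prev = f_prev_ul + f_prev_ll
    best = f_best_ul + f_best_ll
    return (prev - best) / (prev + 1e-8) + lam * coordination_bonus

def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9, n_actions=4):
    """ASSUMED standard tabular Q-learning update; actions are numbered 1..4."""
    row = q_table.setdefault(s, [0.0] * n_actions)
    nxt = q_table.setdefault(s_next, [0.0] * n_actions)
    row[a - 1] += alpha * (r + gamma * max(nxt) - row[a - 1])
    return row[a - 1]

# Hypothetical fitness values: the upper level improves from 600 to 580
reward = bilevel_reward(600.0, 400.0, 580.0, 400.0)
q_table = {}
new_q = q_update(q_table, s="s0", a=2, r=reward, s_next="s1")
```

The learning rate α = 0.1 and discount factor γ = 0.9 are the values set in the initialization listing. With the numbers above, a 2% relative improvement yields a reward of about 0.02 and raises the corresponding Q-value from 0 to about 0.002.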

After each position update, Algorithm 12 tracks solution improvements and provides feedback for reinforcement learning, ensuring continuous adaptation of the bi-level optimization process.

To prevent premature convergence and ensure solution quality, Algorithm 13 implements intelligent stagnation detection, strategic population reinitialization, and robust termination criteria for the bi-level optimization process.

Algorithm 13 Bi-Level Stagnation Management and Termination

Description: This algorithm monitors optimization progress for both levels, handles stagnation through coordinated reinitialization, progressively tightens constraints, and checks termination criteria for bi-level convergence.

1:: if $f_{best}^{UL} == f_{prev}^{UL}$ AND $f_{best}^{LL} == f_{prev}^{LL}$ then
2:: stag = stag + 1
3:: else
4:: stag = 0
5:: end if
6:: if stag > 0.15 × T_max then
7:: Reinitialize worst 30% of upper-level population randomly
8:: Reinitialize worst 30% of lower-level population with coordination to upper level
9:: Reset: stag = 0
10:: end if
11:: Progressive Constraint Tightening:
12:: for each chance constraint j at both levels do
13:: $\beta_j(t) = \beta_j^{final} \cdot (1 - e^{-\kappa \cdot t / T_{max}})$
14:: end for
15:: Check Bi-Level Convergence:
16:: Calculate coordination measure: $coord = \frac{| obj\_improvement^{UL} - obj\_improvement^{LL} |}{obj\_improvement^{UL} + obj\_improvement^{LL}}$
17:: if convergence criteria met OR t == T_max OR coord < threshold then
18:: Evaluate $x_{gbest}^{UL}$, $x_{gbest}^{LL}$ with high-precision chance simulation (N_MC = 5000)
19:: Verify constraint satisfaction and goal achievement for both levels
20:: return True {Termination signal}
21:: end if
22:: return False {Continue optimization}
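
The stagnation logic and the coordination measure of Algorithm 13 reduce to a few small functions, sketched below. The restart rule (stag > 0.15 × T_max) follows the listing; the optional tolerance in the equality test and the small constant added to the denominator of the coordination measure are assumptions introduced here to avoid brittle floating-point comparisons and a 0/0 ratio when neither level improves.

```python
def update_stagnation(stag, f_best_ul, f_prev_ul, f_best_ll, f_prev_ll, tol=0.0):
    """Increment the counter if neither level improved, otherwise reset it."""
    unchanged = (abs(f_best_ul - f_prev_ul) <= tol
                 and abs(f_best_ll - f_prev_ll) <= tol)
    return stag + 1 if unchanged else 0

def needs_restart(stag, t_max, frac=0.15):
    """Restart rule from the listing: stag > 0.15 * T_max."""
    return stag > frac * t_max

def coordination_measure(imp_ul, imp_ll, eps=1e-8):
    """Relative imbalance between upper- and lower-level improvements.

    ASSUMPTION: eps keeps the ratio defined when both improvements are zero.
    """
    return abs(imp_ul - imp_ll) / (imp_ul + imp_ll + eps)

stag = 0
for _ in range(3):                       # three iterations without improvement
    stag = update_stagnation(stag, 580.0, 580.0, 400.0, 400.0)
stag_after_gain = update_stagnation(stag, 575.0, 580.0, 400.0, 400.0)
coord = coordination_measure(0.03, 0.01)
```

With T_max = 400, the restart of the worst 30% of each population is triggered once more than 60 consecutive iterations pass without improvement at either level.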

5. Numerical Results and Analysis

This section presents comprehensive numerical experiments to evaluate the performance of our proposed Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm (RL-AOA) for solving the bi-level dependent-chance goal programming model for tactical production–distribution planning in paper manufacturing under hybrid uncertainty.

5.1. Experimental Setup

5.1.1. Algorithm Parameter Settings

Algorithm parameters were determined through systematic preliminary experiments using the Taguchi method for parameter optimization. The final RL-AOA parameters are presented in Table 8.

5.1.2. Parameter Calibration Methodology

The algorithm parameters in Table 8 were calibrated using a systematic two-phase approach:

Phase 1: Taguchi Orthogonal Array Design

We employed an L16(4⁵) orthogonal array to evaluate key parameters:

Population sizes: $N_p^{UL} \in \{60, 80, 100, 120\}$, $N_p^{LL} \in \{40, 60, 80, 100\}$;
Learning parameters: α₀ ∈ {0.1, 0.15, 0.2, 0.25};
Exploration bounds: ϵ_max ∈ {0.6, 0.8, 0.9, 1.0};
Penalty factors: $\lambda_j^{UL,0} \in \{100, 150, 200, 250\}$.

Phase 2: Fine-tuning Validation

Selected parameters underwent validation across three representative instances (Small-1, Medium-1, Large-1) with 15 independent runs each. The signal-to-noise (S/N) ratio analysis confirmed optimal settings, i.e., $N_p^{UL} = 80$ (S/N = 24.3 dB) and α₀ = 0.15 (S/N = 22.8 dB), achieving a 12.4% improvement over default values.
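
The text reports S/N values in dB but does not state which Taguchi S/N form was used. For reference, the two standard forms, smaller-the-better and larger-the-better, can be computed as sketched below. The response values in the example are hypothetical placeholders, not data from the study.

```python
import math

def sn_smaller_is_better(ys):
    """Taguchi S/N (dB) when smaller responses are preferred."""
    return -10.0 * math.log10(sum(y * y for y in ys) / len(ys))

def sn_larger_is_better(ys):
    """Taguchi S/N (dB) when larger responses are preferred."""
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in ys) / len(ys))

# HYPOTHETICAL responses for one factor level over repeated runs
responses = [10.0, 10.0, 10.0]
sn_small = sn_smaller_is_better(responses)   # -20 dB for a constant response of 10
sn_large = sn_larger_is_better(responses)    # +20 dB for the same response
```

In a Taguchi analysis, the level with the highest S/N ratio is selected for each factor, whichever of the two forms is used.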

5.1.3. Test Instance Characteristics

To evaluate the algorithm comprehensively, we generated a diverse set of paper manufacturing test instances with varying network sizes, complexity levels, and uncertainty characteristics. Table 9 summarizes the key characteristics of these instances.

These instances represent paper manufacturing networks of varying complexity, ranging from small regional operations to very large international paper manufacturing facilities. Each instance incorporates specific characteristics reflecting different operational contexts:

Small instances: Regional paper mills with limited product portfolio.
Medium instances: Multi-grade paper manufacturing with moderate complexity.
Large instances: Multi-facility paper manufacturing networks with diverse product lines.
Very Large instance: Global paper manufacturing network with maximum complexity.

5.1.4. Uncertain Random Parameter Specifications

The hybrid uncertainty in our paper manufacturing model is represented through uncertain random parameters following different distributions. Table 10 presents the uncertain random parameter settings used in our computational experiments.

The hybrid uncertain random parameters were generated following a rigorous two-stage calibration process:

Random Component Calibration: Historical data from five major paper manufacturing companies in North America and Europe were analyzed to determine appropriate probability distributions and their parameters.
Uncertain Component Calibration: Expert elicitation sessions with 12 paper industry professionals were conducted to establish uncertainty bounds, representing epistemic uncertainty in tactical planning parameter estimation.

5.1.5. Computational Environment

All experiments were conducted on a workstation equipped with an Intel Core i9-12900K processor and 32 GB of RAM, with all algorithms implemented in MATLAB R2023b. Each test instance was solved 30 times with different random seeds to support statistical analysis of the results.

5.2. Bi-Level Dependent-Chance Goal Programming Results

The core contribution of our approach lies in optimizing probability achievement for tactical planning goals through lexicographic bi-level dependent-chance programming. We define target probability levels based on paper industry benchmarks and operational requirements: Economic Efficiency (α₁ = 0.90), Operational Performance (α₂ = 0.85), Resource Utilization (α₃ = 0.80), and Quality Assurance (α₄ = 0.90).

5.2.1. Probability Achievement Analysis

Table 11 presents the achieved probability levels for each tactical planning dimension across all test instances, with 95% confidence intervals computed using bootstrap resampling (1000 replications). Figure 1 illustrates the tactical planning goal achievement with confidence intervals, demonstrating the systematic performance trends across different problem scales.

The results demonstrate consistent performance across all tactical planning objectives, with economic efficiency and quality assurance achieving the highest rates. As shown in Figure 1, there is a clear downward trend in probability achievement as problem size increases, with all objectives showing statistically significant gaps from their targets (paired t-test, p < 0.01 for all objectives). This confirms the increasing challenge of simultaneously achieving all tactical planning goals under hybrid uncertainty as network complexity grows.
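
The bootstrap confidence intervals mentioned above can be reproduced in outline as follows. This sketch assumes the percentile method applied to the mean of per-run achieved probabilities; the text states only that bootstrap resampling with 1000 replications was used. The sample values are hypothetical.

```python
import random

def bootstrap_ci(samples, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each replicate
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = int((1.0 + level) / 2.0 * n_boot) - 1
    return means[lo_idx], means[hi_idx]

# HYPOTHETICAL achieved probabilities from repeated runs of one instance
runs = [0.86, 0.87, 0.88, 0.87, 0.86, 0.88, 0.87, 0.85, 0.89, 0.87]
ci_low, ci_high = bootstrap_ci(runs)
print(f"95% CI for the mean: [{ci_low:.3f}, {ci_high:.3f}]")
```

With 30 runs per instance, as used in the experiments, the same function applies unchanged; only the length of the input list differs.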

5.2.2. Bi-Level Coordination Analysis

Table 12 shows the coordination effectiveness between upper-level distribution decisions and lower-level production decisions across iterations. Figure 2 illustrates the convergence behavior of the bi-level optimization process, demonstrating how both upper- and lower-level costs converge simultaneously while maintaining coordination balance.

As illustrated in Figure 2, the coordination gap decreases systematically as the bi-level algorithm progresses, with both distribution and production costs converging to stable values. The coordination gap increases with problem complexity, indicating that the bi-level structure becomes more necessary for complex tactical planning problems. The convergence times are reasonable across all instances, with the largest problem requiring 61 iterations on average.

5.3. Paper Manufacturing Instance Details

To demonstrate the practicality of our approach, we present detailed results for a representative medium-sized paper manufacturing instance (Medium-1), as shown in Table 9. This instance includes 8 paper machines, 5 warehouses, 8 distribution centers, 15 customers, 5 paper grades, and a 4-period tactical planning horizon.

5.3.1. Paper Machine and Grade Information

Table 13 presents the production and setup costs for different paper grades on each machine, while Table 14 shows the machine capacities for the Medium-1 instance.

5.3.2. Customer Demand Information

Table 15 presents stochastic demand parameters and service requirements for different customer segments in the Medium-1 instance.

5.3.3. Baseline Algorithm Specifications

The comparative algorithms were adapted for bi-level optimization as follows:

Bi-GA: Two-population genetic algorithm with tournament selection (size 3), single-point crossover (p_c = 0.8), and uniform mutation (p_m = 0.05). Upper-level population = 80, lower-level population = 60.
Bi-PSO: Hierarchical particle swarm optimization with w = 0.9 → 0.4 (linear decrease), c₁ = c₂ = 2.0, velocity clamping at ± 0.2× search range.
Bi-DE: Differential evolution with DE/rand/1/bin strategy, F = 0.5, CR = 0.8, coordinated through best solution sharing between levels every 10 iterations.
Bi-SSO: Social spider optimization with female ratio = 0.65, vibration attenuation r_a = 1.0, bi-level coordination through pheromone information exchange.

All algorithms used identical constraint handling (dynamic penalty), chance constraint evaluation (8000 Monte Carlo samples), and termination criteria (400 iterations) for fair comparison.

5.4. Algorithm Performance Comparison

We compare our RL-AOA with several state-of-the-art metaheuristics adapted for bi-level optimization. Table 16 presents comprehensive performance metrics.

The results demonstrate that RL-AOA consistently outperforms other bi-level optimization approaches, achieving 3.2–7.8% improvement in solution quality and 18.5% reduction in computational time compared to the best competing algorithm. The convergence behavior is illustrated in Figure 2, which shows that RL-AOA achieves faster convergence and better final solutions compared to traditional metaheuristics.

5.5. Empirical Validation of Balance Strategy

To validate our exploration–exploitation trade-off strategy, we conducted comprehensive analysis of RL-AOA balance evolution. Figure 3 illustrates the dynamic adaptation capability.

Key empirical findings:

Optimal Balance Progression: Systematic transition from 25% exploration to 2% exploitation over 400 iterations.
Stable Balance Quality: Incremental–decremental analysis shows consistent balance near zero.
Adaptive Capability: Algorithm successfully adapts exploration–exploitation ratio based on optimization progress.
Convergence Stability: Balance evolution demonstrates stable convergence without premature stagnation.

5.6. Reinforcement Learning Component Analysis

We analyze the effectiveness of the RL component in adaptive parameter tuning for bi-level coordination. Table 17 shows action selection patterns across different optimization phases.

The RL component demonstrates intelligent adaptive behavior, emphasizing exploration and coordination in early phases while shifting to exploitation and refined coordination in later iterations, contributing to a 5.2% average performance improvement over static parameter settings.

5.7. Sensitivity Analysis

We conducted comprehensive sensitivity analysis to evaluate the robustness of our bi-level dependent-chance goal programming approach under varying parameter settings.

5.7.1. Confidence Level Impact

Table 18 presents the detailed results of confidence level sensitivity analysis for the bi-level model.

The analysis shows how increasing confidence levels affect both cost and goal achievement, demonstrating the trade-off between reliability and cost efficiency in tactical paper manufacturing planning.

5.7.2. Production–Distribution Coordination Analysis

Table 19 analyzes the coordination effectiveness between the production and distribution levels under different uncertainty scenarios.

5.8. Paper Manufacturing Specific Analysis

5.8.1. Paper Grade Production Allocation

Table 20 shows the optimal production allocation across different paper grades and planning periods for the Medium-1 instance. Figure 4 provides a visual representation of the production distribution across planning periods, highlighting the balanced allocation strategy employed by our optimization framework.

As illustrated in Figure 4, the production allocation strategy demonstrates effective balance across different paper grades and planning periods. Grade G4 (Specialty) shows the highest total production volume (1600 tons), followed by Grade G1 (Packaging) with 1252 tons, reflecting market demand patterns and profitability considerations. The temporal distribution shows relatively stable production across periods, with some variation to accommodate demand fluctuations and machine availability constraints.

5.8.2. Machine Changeover Analysis

Table 21 presents the changeover patterns and their impact on production efficiency.

The analysis shows the trade-off between production flexibility and efficiency, with machines experiencing 4–8 changeovers per planning horizon and maintaining an average utilization of 87.0% despite changeover penalties.

5.9. Cost–Benefit Analysis for Paper Manufacturing

Table 22 presents the estimated costs and benefits of implementing the bi-level DCGP framework in paper manufacturing environments. Figure 5 illustrates the return on investment analysis across different mill sizes, demonstrating favorable economic prospects for implementation.

As demonstrated in Figure 5, the expected return on investment is favorable across all mill sizes, with the 3-year ROI ranging from 35 to 55%. The payback periods are particularly attractive, ranging from 8 to 15 months, with larger mills achieving faster payback due to economies of scale. The analysis indicates that medium and large mills benefit most from implementation, with annual savings significantly exceeding implementation costs.

5.10. Target Probability Level (α_g) Impact on Lexicographic Feasibility

The lexicographic optimization structure requires careful consideration of how target probability levels affect feasibility across priority levels. Table 23 analyzes the impact of varying α_g on goal achievement and feasibility.

Critical insights:

Feasibility Threshold: Target levels above α_g = 0.85 create significant feasibility issues;
Cascade Effects: Higher-priority goals progressively constrain lower-priority goals (0.045 to 0.167);
Current Settings Validation: Our α_g values operate near feasibility boundary, explaining observed gaps.

5.11. Expert Opinion Impact and Sensitivity Analysis

Expert opinions define epistemic uncertainty bounds (alpha-cuts) for uncertain random variables, directly influencing chance measure calculations and goal achievement probabilities. Figure 6 presents comprehensive sensitivity analysis examining how expert-defined uncertainty ranges affect model performance.

The sensitivity analysis reveals critical insights:

Quantified Sensitivity Range: Goal achievement probabilities vary within ±3.0% bounds across expert opinion scenarios. Economic efficiency shows ±0.9%, operational performance ±0.8%, resource utilization ±2.5%, and quality assurance ±1.3% variations.
Differential Impact Analysis: Resource utilization goals demonstrate 2.8× higher sensitivity than economic efficiency, reflecting operational complexity under epistemic uncertainty. This identifies critical areas requiring careful expert calibration.
Robustness Validation: The bounded sensitivity range (maximum 3.0%) confirms framework stability. Even with ±100% variation in expert uncertainty estimates (from ±5% to ±30%), goal achievement varies by less than 3.5%, validating practical applicability.
Optimal Calibration Confirmation: Current ±15% uncertainty ranges represent optimal balance, positioned at the inflection point where sensitivity transitions from steep (conservative side) to gradual (optimistic side), maximizing information value while maintaining stability.

These findings demonstrate that while expert opinions measurably influence outcomes, the framework maintains robust performance (coefficient of variation < 0.035) across reasonable uncertainty variations. The moderate sensitivity validates our approach for industrial application without requiring excessive precision in expert assessments, supporting practical implementation in paper manufacturing environments where expert consensus may vary.

5.12. Comparative Analysis: Deterministic vs. Hybrid Uncertainty Framework

To demonstrate the added value of hybrid uncertainty modeling, we compared four approaches: deterministic baseline, stochastic-only, fuzzy-only, and our hybrid framework. Figure 7 presents a comprehensive comparative analysis of these approaches.

Key findings:

Cost Realism: The hybrid framework shows a 17.8% cost increase over deterministic baseline;
Goal Achievement Stability: Deterministic approaches exhibit overoptimistic variability while hybrid maintains stable probabilities;
Single-Paradigm Limitations: Stochastic-only and fuzzy-only fail to capture complete uncertainty spectrum.

This analysis demonstrates when hybrid uncertainty complexity is justified for tactical planning decisions.

5.13. Managerial Insights and Implementation Guidelines

Based on our comprehensive analysis, we provide actionable insights for paper manufacturing tactical planning:

Tactical Goal Probability Setting:
- Economic efficiency: Target 87–90% (achievable with moderate resource allocation);
- Operational performance: Target 82–85% (balances service level with operational flexibility);
- Resource utilization: Target 76–80% (realistic given machine compatibility constraints);
- Quality assurance: Target 86–90% (essential for paper grade specifications).
Implementation Roadmap for Paper Manufacturing:
- Phase 1: Implement production scheduling optimization (2–4 months)
- Phase 2: Integrate distribution coordination (4–6 months)
- Phase 3: Add quality and changeover optimization (6–8 months)
- Expected operational improvement: 15–25% within 12 months
Machine-Grade Compatibility Management:
- Maintain compatibility matrices for all machine-grade combinations;
- Optimize changeover sequences to minimize setup times and costs;
- Consider machine specialization for high-volume paper grades;
- Update efficiency parameters monthly based on production data.
Uncertainty Management in Paper Manufacturing:
- Monitor demand patterns by paper grade and customer type monthly;
- Conduct quarterly expert assessments for market trend uncertainties;
- Maintain confidence levels between 0.82 and 0.87 for optimal performance;
- Implement adaptive inventory policies based on demand uncertainty;
- Dynamic Shelf Life Modeling: The current model assumes fixed shelf-life periods for computational tractability. Future extensions could incorporate storage-condition-dependent deterioration models, particularly valuable for moisture-sensitive paper grades where humidity and temperature significantly affect product quality over time.

5.14. Limitations and Future Research Directions

While our bi-level DCGP framework demonstrates significant improvements for paper manufacturing tactical planning, several limitations should be acknowledged:

Real-World Validation: Future work should include partnerships with paper mills for comprehensive industrial validation across different paper grades and manufacturing processes.
Dynamic Grade Specifications: The current model assumes static quality requirements; extensions should incorporate evolving customer specifications and regulatory changes.
Multi-Mill Coordination: Framework extension to coordinate multiple paper mills within a corporate network represents a promising research direction.
Environmental Regulations: Integration of time-varying environmental constraints and carbon pricing mechanisms would enhance practical applicability.
Supply Chain Resilience: Incorporation of supply disruption scenarios and recovery strategies would improve robustness.

The proposed framework provides a solid foundation for tactical production–distribution planning in paper manufacturing, offering both theoretical contributions and practical implementation guidelines for industry adoption. The results demonstrate that the RL-AOA with bi-level dependent-chance goal programming can effectively handle the complexity of hybrid uncertainty while maintaining computational efficiency for real-world applications.

6. Conclusions

This research developed a bi-level dependent-chance goal programming framework for tactical production–distribution planning in paper manufacturing under hybrid uncertainty. The framework coordinates upper-level distribution decisions with lower-level production decisions while handling both aleatory randomness and epistemic uncertainty through uncertain random theory. The Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm (RL-AOA) achieved 3.2–7.8% improvements in solution quality and an 18.5% reduction in computational time compared to existing bi-level optimization methods.

The dependent-chance goal programming formulation effectively balanced four tactical objectives, achieving average probability levels of 87.0% for economic efficiency, 81.7% for operational performance, 76.5% for resource utilization, and 86.1% for quality assurance. The bi-level coordination mechanism maintained average coordination gaps of 4.5% across all test instances, demonstrating effective integration between production and distribution planning levels. The reinforcement learning component provided intelligent adaptive behavior, contributing to a 5.2% performance improvement over static parameter configurations.

Computational experiments across seven test instances validated the framework’s scalability and practical applicability for paper manufacturing operations. Industry-specific analysis confirmed effective handling of machine-grade compatibility constraints, sequence-dependent changeovers, and quality requirements while maintaining 87.0% average machine utilization. Cost–benefit analysis indicates favorable economic prospects with payback periods of 8–15 months and 35–55% three-year return on investment.

The framework provides both theoretical contributions to bi-level optimization under hybrid uncertainty and practical tools for tactical planning in paper manufacturing. Future research should focus on real-world validation, dynamic uncertainty modeling, multi-mill coordination, and integration of environmental constraints. The approach demonstrates that sophisticated optimization techniques can successfully address complex industrial planning problems while maintaining computational efficiency and providing actionable insights for hierarchical decision-making under uncertainty.

Author Contributions

Conceptualization, Y.B. and R.B.; Methodology, Y.B., R.B. and A.B.; Software, Y.B.; Validation, Y.B. and N.R.; Formal Analysis, Y.B. and N.R.; Investigation, Y.B. and R.B.; Resources, R.B. and O.B.; Data Curation, Y.B. and N.R.; Writing—Original Draft Preparation, Y.B.; Writing—Review and Editing, Y.B., R.B., A.B., N.R., O.B. and F.F.; Visualization, Y.B.; Supervision, R.B., A.B. and F.F.; Project Administration, R.B. and O.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Al-Khwarizmi Programme, a collaborative effort between the National Center for Scientific and Technical Research (CNRST), the Agency for Digital Development (ADD), and the Moroccan Ministry of Higher Education.

Data Availability Statement

The data are contained within the article.

Acknowledgments

The authors express their gratitude to the editors and reviewers for their valuable comments and constructive suggestions regarding the revision of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Andersson, J.; Marklund, J. Decentralized inventory control in a two-level distribution system. Eur. J. Oper. Res. 2000, 127, 483–506.
Santos, M.O.; Almada-Lobo, B. Integrated pulp and paper mill planning and scheduling. Comput. Ind. Eng. 2012, 63, 1–12.
Yang, X.S.; Deb, S.; Fong, S. Metaheuristic algorithms: Optimal balance of intensification and diversification. Appl. Math. Inf. Sci. 2014, 8, 977–983.
Holik, H. Handbook of Paper and Board; John Wiley & Sons: Hoboken, NJ, USA, 2006.
Lalitha, J.L.; Mohan, N.; Pillai, V.M. Lot streaming in [N-1](1) + N (m) hybrid flow shop. J. Manuf. Syst. 2017, 44, 12–21.
Maravelias, C.T.; Grossmann, I.E. New general continuous-time state-task network formulation for short-term scheduling of multipurpose batch plants. Ind. Eng. Chem. Res. 2005, 44, 9695–9707.
Fleischmann, B.; Meyr, H.; Wagner, M. Advanced planning. In Supply Chain Management and Advanced Planning; Springer: Berlin/Heidelberg, Germany, 2005; pp. 81–106.
Papageorgiou, L.G. Supply chain optimisation for the process industries: Advances and opportunities. Comput. Chem. Eng. 2001, 25, 1121–1137.
Stadtler, H.; Kilger, C.; Meyr, H. Supply Chain Management and Advanced Planning: Concepts, Models, Software, and Case Studies, 5th ed.; Springer: Berlin/Heidelberg, Germany, 2015.
Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A. Robust Optimization; Princeton University Press: Princeton, NJ, USA, 2009.
Birge, J.R.; Louveaux, F. Introduction to Stochastic Programming, 2nd ed.; Springer: New York, NY, USA, 2011.
Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
Sahinidis, N.V. Optimization under uncertainty: State-of-the-art and opportunities. Comput. Chem. Eng. 2004, 28, 971–983.
Colson, B.; Marcotte, P.; Savard, G. An overview of bilevel optimization. Ann. Oper. Res. 2007, 153, 235–256.
Tamiz, M.; Jones, D.; Romero, C. Goal programming for decision making: An overview of the current state-of-the-art. Eur. J. Oper. Res. 1998, 111, 569–581.
U.S. Environmental Protection Agency. Industrial Emissions Standards and Compliance Guidelines; EPA Office of Air and Radiation: Washington, DC, USA, 2023.
Intergovernmental Panel on Climate Change. 2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories; IPCC: Geneva, Switzerland, 2019.
Liu, B. Uncertainty Theory, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
Rubinstein, R.Y.; Kroese, D.P. Simulation and the Monte Carlo Method, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
Liu, B. Uncertainty Theory: A Branch of Mathematics for Modeling Human Uncertainty, 4th ed.; Springer: Berlin/Heidelberg, Germany, 2015.
Morales-Castañeda, B.; Zaldívar, D.; Cuevas, E.; Fausto, F.; Rodríguez, A. A better balance in metaheuristic algorithms: Does it exist? Swarm Evol. Comput. 2020, 54, 100671.
Abualigah, L.; Diabat, A.; Mirjalili, S.; Abd Elaziz, M.; Gandomi, A.H. The Arithmetic Optimization Algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609.
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.

Figure 1. Tactical planning goal achievement with 95% confidence intervals. Shows decreasing probability achievement across all objectives as problem complexity increases, with economic efficiency and quality assurance maintaining highest performance levels.

Figure 2. Bi-level optimization convergence process for production–distribution planning. Demonstrates coordinated convergence of upper-level (distribution) and lower-level (production) costs, with the coordination gap decreasing to the acceptable threshold of 0.01 within 200 iterations.

Figure 3. RL-AOA balance dynamics. Left panel shows systematic transition from exploration-dominant (25%) to exploitation-dominant (2%) phase. Right panel displays incremental–decremental analysis revealing stable balance quality.

Figure 4. Optimal paper grade production distribution across planning periods. Shows balanced allocation of production capacity across five paper grades (G1–G5) over four planning periods, with total production ranging from 1024 to 1600 tons per grade, demonstrating effective demand fulfillment and capacity utilization strategies.

Figure 5. Cost–benefit analysis for paper manufacturing implementation. Shows implementation costs, annual savings, and payback periods across different mill sizes. Expected 3-year ROI ranges from 35 to 55%, with payback periods of 8–15 months demonstrating strong economic viability.

Figure 6. Expert opinion impact and sensitivity analysis. Left panel: goal achievement sensitivity, with variations from −2.5% (conservative, ±30%) to +0.8% (very optimistic, ±5%) relative to the current calibration (±15%); resource utilization exhibits the highest sensitivity (−2.5% to +1.3%) due to capacity planning complexity. Middle panel: expert-defined alpha-cut ranges illustrating uncertainty bounds from conservative (0.7–1.3) to very optimistic (0.95–1.05) scenarios. Right panel: average sensitivity impact, demonstrating framework robustness with a maximum variation of ±3.0% and confirming stable performance across expert assessment approaches.

Figure 7. Comparative analysis. Left panel shows a 17.8% average cost increase for the hybrid framework, reflecting realistic uncertainty. Right panel displays goal achievement reliability, with the deterministic model showing overoptimistic performance (0.88 ± 0.055) versus stable hybrid results (0.87 ± 0.011).

Table 1. Sets and indices used in the paper mill supply chain optimization model. This table defines the mathematical notation for all entities in the supply chain network including products, facilities, and their relationships.

Notation	Description
$P = \{1, 2, \dots, p\}$	Set of paper products/grades
$M = \{1, 2, \dots, m\}$	Set of paper machines
$W = \{1, 2, \dots, w\}$	Set of warehouses
$D = \{1, 2, \dots, d\}$	Set of distribution centers
$C = \{1, 2, \dots, c\}$	Set of customers
$R = \{1, 2, \dots, r\}$	Set of raw materials (pulp types, recycled fiber)
$T = \{1, 2, \dots, t\}$	Set of time periods (months)
$L = \{1, 2, \dots, l\}$	Set of transportation modes
$S = \{1, 2, \dots, s\}$	Set of suppliers
PM_p ⊆ M	Set of machines compatible with paper grade p
PR_p ⊆ R	Set of raw materials required for paper grade p
Φ_incompatible ⊆ P × P	Set of incompatible product transition pairs

Table 2. Deterministic parameters for the paper mill supply chain model. This table lists all fixed parameters related to costs, capacities, production rates, and technical specifications that remain constant throughout the planning horizon.

Parameter	Description
K_pm	Setup cost for producing paper grade p on machine m
$S T_{p_{i} p_{j} m}$	Sequence-dependent changeover time from grade p_i to p_j on machine m
$S C_{p_{i} p_{j} m}$	Sequence-dependent changeover cost from grade p_i to p_j on machine m
α_pr	Amount of raw material r required per unit of paper grade p
β_pm	Production rate of paper grade p on machine m (tons/hour)
CAP_m	Available capacity of machine m in each period (hours)
WCAP_w	Storage capacity of warehouse w (tons)
DCAP_d	Storage capacity of distribution center d (tons)
LCAP_l	Transportation capacity of mode l (tons)
SL_p	Shelf life of paper grade p (periods)
MW_p	Minimum order quantity for paper grade p
MX_p	Maximum order quantity for paper grade p
FC_d	Fixed cost of operating distribution center d
W_λ	Weight for service level in objective function
QMin_p	Minimum quality threshold for paper grade p
EmissionRate_pm	Emission rate for producing grade p on machine m
EmissionLimit	Total emission limit for the planning horizon
SMin_pt, SMax_pt	Seasonal production bounds for grade p in period t
MaxChangeover_m	Maximum changeovers allowed per period on machine m
MinQuality_pm	Minimum quality level for grade p produced on machine m
M_big	Large constant for big-M constraints

Table 3. Uncertain random parameters for the paper mill supply chain model. This table details all stochastic parameters that are subject to uncertainty, including demand fluctuations, cost variations, and operational variability.

Parameter	Description
${\tilde{D}}_{p c t}$	Uncertain random demand of customer c for paper grade p in period t
${\tilde{P C}}_{p m t}$	Uncertain random production cost of grade p on machine m in period t
${\tilde{R C}}_{r t}$	Uncertain random cost of raw material r in period t
${\tilde{T C}}_{i j l t}$	Uncertain random transportation cost from node i to j using mode l in period t
${\tilde{H C}}_{p t}$	Uncertain random holding cost of paper grade p in period t
${\tilde{B C}}_{p c t}$	Uncertain random backorder cost for grade p at customer c in period t
${\tilde{η}}_{p m t}$	Uncertain random machine efficiency for grade p on machine m in period t
${\tilde{θ}}_{r t}$	Uncertain random raw material availability of type r in period t
${\tilde{γ}}_{p t}$	Uncertain random quality yield for paper grade p in period t
${\tilde{ϕ}}_{l t}$	Uncertain random transportation mode availability l in period t
${\tilde{ω}}_{m t}$	Uncertain random machine breakdown probability for machine m in period t

Table 4. Goal programming parameters for the paper mill supply chain model. This table defines the parameters used in the goal programming formulation, including target probability levels, threshold values, and confidence levels for each objective.

Parameter	Description
α₁	Target probability level for economic efficiency goal
α₂	Target probability level for operational performance goal
α₃	Target probability level for resource utilization goal
α₄	Target probability level for quality assurance goal
Target₁	Target threshold for total cost
Target₂	Target threshold for service level
Target₃	Target threshold for capacity utilization
Target₄	Target threshold for quality performance
β_j	Confidence level for chance constraint j

Table 5. Upper-level decision variables for distribution planning in the paper mill supply chain model. This table defines all variables related to distribution network configuration, product flow, and customer service.

Variable	Description
Y_d ∈ {0, 1}	1 if distribution center d is operated, 0 otherwise
X_wdplt ≥ 0	Quantity of grade p shipped from warehouse w to DC d using mode l in period t
Z_dcplt ≥ 0	Quantity of grade p shipped from DC d to customer c using mode l in period t
U_lt ∈ {0, 1}	1 if transportation mode l is used in period t, 0 otherwise
$I_{d p t}^{D} \geq 0$	Inventory of grade p at distribution center d at end of period t
B_pct ≥ 0	Backorder quantity of grade p for customer c in period t
λ_pt ∈ [0, 1]	Service level for paper grade p in period t

Table 6. Lower-level decision variables for production planning in the paper mill supply chain model. This table defines all variables related to production decisions, machine scheduling, raw material procurement, and inventory management at production facilities.

Variable	Description
Q_pmt ≥ 0	Quantity of grade p produced on machine m in period t
V_pmt ∈ {0, 1}	1 if grade p is produced on machine m in period t, 0 otherwise
$W_{p_{i} p_{j} m t} \in \{0, 1\}$	1 if changeover from grade p_i to p_j on machine m in period t, 0 otherwise
F_wplt ≥ 0	Quantity of grade p shipped from machine location to warehouse w using mode l in period t
$I_{w p t}^{W} \geq 0$	Inventory of grade p at warehouse w at end of period t
R_srt ≥ 0	Quantity of raw material r purchased from supplier s in period t
$I_{r t}^{R} \geq 0$	Inventory of raw material r at end of period t

Table 7. Goal programming variables for the paper mill supply chain model. This table defines the deviation variables used to measure the achievement of each goal in the multi-objective framework.

Variable	Description
$d_{1}^{+}, d_{1}^{-} \geq 0$	Positive and negative deviations from economic efficiency goal
$d_{2}^{+}, d_{2}^{-} \geq 0$	Positive and negative deviations from operational performance goal
$d_{3}^{+}, d_{3}^{-} \geq 0$	Positive and negative deviations from resource utilization goal
$d_{4}^{+}, d_{4}^{-} \geq 0$	Positive and negative deviations from quality assurance goal

Table 8. Parameter settings for RL-AOA.

Parameter	Symbol	Value
Population size (upper level)	$N_{p}^{U L}$	80
Population size (lower level)	$N_{p}^{L L}$	60
Maximum iterations	T_max	400
Number of Monte Carlo simulations	N_MC	8000
Math Optimizer bounds	[Min, Max]	[0.2, 1.0]
Exploitation accuracy parameter	α	5.0
Mutation factor	μ	0.5
Initial learning rate	α₀	0.15
Learning rate decay	ϕ	0.6
Discount factor	γ	0.85
Exploration probability (initial)	ϵ_max	0.8
Exploration probability (final)	ϵ_min	0.05
Penalty factor (initial upper level)	$λ_{j}^{U L, 0}$	150
Penalty factor (initial lower level)	$λ_{j}^{L L, 0}$	100
Confidence level tightening rate	κ	2.5
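
As a rough illustration of how the Q-learning settings in Table 8 might interact, the following sketch performs a tabular Q-learning update using the listed discount factor, initial learning rate, and exploration bounds. The linear ε schedule, the power-law learning-rate decay, the two state labels, and the reward value are assumptions made purely for illustration and are not taken from the article; the four action names follow Table 17.

```python
import random

# Values taken from Table 8.
GAMMA = 0.85                  # discount factor
ALPHA_0 = 0.15                # initial learning rate
PHI = 0.6                     # learning rate decay
EPS_MAX, EPS_MIN = 0.8, 0.05  # initial and final exploration probability
T_MAX = 400                   # maximum iterations

# Action names follow Table 17.
ACTIONS = [
    "enhance_exploration",
    "focus_exploitation",
    "balanced_search",
    "strengthen_coordination",
]


def epsilon(t):
    """Assumed linear decay from EPS_MAX to EPS_MIN over T_MAX iterations."""
    frac = min(t / T_MAX, 1.0)
    return EPS_MAX - (EPS_MAX - EPS_MIN) * frac


def learning_rate(t):
    """Assumed power-law decay ALPHA_0 / (1 + t)^PHI; the article's schedule may differ."""
    return ALPHA_0 / (1.0 + t) ** PHI


def choose_action(q_table, state, t, rng):
    """Epsilon-greedy selection over the four actions."""
    if rng.random() < epsilon(t):
        return rng.choice(ACTIONS)
    row = q_table[state]
    return max(ACTIONS, key=lambda a: row[a])


def q_update(q_table, state, action, reward, next_state, t):
    """Standard tabular Q-learning update."""
    best_next = max(q_table[next_state].values())
    td_target = reward + GAMMA * best_next
    q_table[state][action] += learning_rate(t) * (td_target - q_table[state][action])


# Hypothetical two-state example; state labels and reward are illustrative only.
states = ["stagnating", "improving"]
q = {s: {a: 0.0 for a in ACTIONS} for s in states}
rng = random.Random(0)
a = choose_action(q, "stagnating", 0, rng)
q_update(q, "stagnating", a, reward=1.0, next_state="improving", t=0)
```

With all Q-values initialized to zero, the first update moves the selected entry to the learning rate times the reward, so later greedy choices in that state favor the rewarded action until exploration or further updates change the ranking.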

Table 9. Characteristics of paper manufacturing test instances.

Instance	Machines	WHs	DCs	Customers	Grades	Periods	Suppliers
Small-1	4	3	5	8	3	3	4
Small-2	6	4	6	12	4	3	5
Medium-1	8	5	8	15	5	4	6
Medium-2	10	6	10	20	6	4	8
Large-1	12	8	12	25	7	5	10
Large-2	15	10	15	30	8	5	12
Very Large	20	12	18	40	10	6	15

Table 10. Uncertain random parameter distributions for paper manufacturing.

Parameter	Random Component	Uncertain Component
Customer demand ( ${\tilde{D}}_{p c t}$ )	Normal (μ_pct, $σ_{p c t}^{2}$ )	Linear (0.85μ_pct, 1.15μ_pct)
Production cost ( ${\tilde{P C}}_{p m t}$ )	Lognormal (μ_PC, $0.08 μ_{P C}^{2}$ )	Linear (0.9μ_PC, 1.2μ_PC)
Raw material cost ( ${\tilde{R C}}_{r t}$ )	Normal (μ_RC, $0.1 μ_{R C}^{2}$ )	Linear (0.8μ_RC, 1.25μ_RC)
Transportation cost ( ${\tilde{T C}}_{i j l t}$ )	Uniform ( $0.85 {\hat{T C}}_{i j l t}$ , $1.15 {\hat{T C}}_{i j l t}$ )	Linear ( $0.75 {\hat{T C}}_{i j l t}$ , $1.3 {\hat{T C}}_{i j l t}$ )
Machine efficiency ( ${\tilde{η}}_{p m t}$ )	Beta (7, 2)	Linear (0.85, 1.0)
Raw material availability ( ${\tilde{θ}}_{r t}$ )	Triangular (0.8, 1.0, 1.1)	Linear (0.9, 1.15)
Quality yield ( ${\tilde{γ}}_{p t}$ )	Beta (8, 1.5)	Linear (0.92, 1.0)
Transportation mode availability ( ${\tilde{ϕ}}_{l t}$ )	Beta (3, 1)	Linear (0.7, 1.0)
Machine breakdown probability ( ${\tilde{ω}}_{m t}$ )	Beta (1.5, 10)	Linear (0.4, 2.2)
Emission rate ( ${\widetilde{\mathrm{EmissionRate}}}_{p m}$ )	Lognormal (μ_ER, $0.12 μ_{E R}^{2}$ )	Linear (0.85μ_ER, 1.25μ_ER)
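
To make the two columns of Table 10 more concrete, the sketch below samples the random component of two parameters and evaluates the inverse uncertainty distribution of a linear uncertain variable L(a, b), which is (1 − α)a + αb. Table 10 does not state how the two components are combined into a single uncertain random variable, so they are kept separate here. The use of the coefficient of variation from Table 15 as σ/μ, the truncation of demand at zero, the seed, and all function names are assumptions for illustration.

```python
import random


def linear_inverse(a, b, alpha):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * a + alpha * b


def sample_demand_random(mu, cv, rng):
    """Draw the random component Normal(mu, sigma^2) with sigma = cv * mu.

    Truncation at zero is an assumption added here to avoid negative demand.
    """
    return max(0.0, rng.gauss(mu, cv * mu))


def sample_efficiency_random(rng):
    """Draw the random component of machine efficiency, Beta(7, 2) in Table 10."""
    return rng.betavariate(7, 2)


# Illustrative values: customer C1, grade G1 in Table 15 (mean 450 tons, CV 15%).
MU, CV = 450.0, 0.15
N_MC = 8000  # number of Monte Carlo simulations listed in Table 8

rng = random.Random(42)
demand_draws = [sample_demand_random(MU, CV, rng) for _ in range(N_MC)]
mean_random_demand = sum(demand_draws) / N_MC

# Expert-based component Linear(0.85*mu, 1.15*mu), evaluated at belief degree 0.85.
demand_at_085 = linear_inverse(0.85 * MU, 1.15 * MU, 0.85)

efficiency_draws = [sample_efficiency_random(rng) for _ in range(N_MC)]
mean_efficiency = sum(efficiency_draws) / N_MC
```

For the values above, the linear uncertain component evaluated at a belief degree of 0.85 is 497.25 tons, while the Monte Carlo mean of the random component stays close to the nominal 450 tons.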

Table 11. Tactical planning goal probability achievement with confidence intervals.

Instance	Economic	Operational	Resource Util.	Quality
	(Target: 0.90)	(Target: 0.85)	(Target: 0.80)	(Target: 0.90)
Small-1	0.887 ± 0.012	0.836 ± 0.015	0.783 ± 0.018	0.878 ± 0.014
Small-2	0.883 ± 0.013	0.831 ± 0.016	0.779 ± 0.019	0.874 ± 0.015
Medium-1	0.876 ± 0.014	0.823 ± 0.017	0.771 ± 0.020	0.867 ± 0.016
Medium-2	0.871 ± 0.015	0.817 ± 0.018	0.765 ± 0.021	0.862 ± 0.017
Large-1	0.865 ± 0.016	0.810 ± 0.019	0.758 ± 0.022	0.856 ± 0.018
Large-2	0.859 ± 0.017	0.804 ± 0.020	0.752 ± 0.023	0.850 ± 0.019
Very Large	0.852 ± 0.018	0.797 ± 0.021	0.745 ± 0.024	0.843 ± 0.020
Average	0.870	0.817	0.765	0.861
Gap from Target	−3.3%	−3.9%	−4.4%	−4.3%

Note: Bold formatting in summary rows distinguishes aggregate statistics from individual instance results for enhanced readability.
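
The summary rows of Table 11 can be recomputed from the per-instance values. The short check below assumes that "Gap from Target" is the relative gap (average − target) / target, which reproduces the reported percentages; the variable and function names are illustrative.

```python
# Target probability levels and per-instance achievements from Table 11
# (Small-1 through Very Large, in table order).
TARGETS = {
    "Economic": 0.90,
    "Operational": 0.85,
    "Resource Util.": 0.80,
    "Quality": 0.90,
}
ACHIEVED = {
    "Economic": [0.887, 0.883, 0.876, 0.871, 0.865, 0.859, 0.852],
    "Operational": [0.836, 0.831, 0.823, 0.817, 0.810, 0.804, 0.797],
    "Resource Util.": [0.783, 0.779, 0.771, 0.765, 0.758, 0.752, 0.745],
    "Quality": [0.878, 0.874, 0.867, 0.862, 0.856, 0.850, 0.843],
}


def summarize(values, target):
    """Return (average, relative gap from target in percent)."""
    mean = sum(values) / len(values)
    rel_gap_pct = (mean - target) / target * 100.0
    return round(mean, 3), round(rel_gap_pct, 1)


summary = {goal: summarize(ACHIEVED[goal], TARGETS[goal]) for goal in TARGETS}
for goal, (mean, gap) in summary.items():
    print(f"{goal}: average = {mean:.3f}, gap from target = {gap:+.1f}%")
```

Under this reading, every goal falls short of its target by roughly 3–4.5% on average, with resource utilization showing the largest relative shortfall.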

Table 12. Bi-level coordination effectiveness.

Instance	Coordination Gap ^a	Iterations to Converge	Upper-Level Cost (USD K)	Lower-Level Cost (USD K)
Small-1	0.024	28	45.3	23.2
Small-2	0.031	32	52.5	27.8
Medium-1	0.038	36	68.8	35.4
Medium-2	0.045	41	84.3	43.7
Large-1	0.052	47	112.6	58.9
Large-2	0.059	53	138.5	72.3
Very Large	0.067	61	189.7	98.8
Average	0.045	42.6	98.8	51.4

^a Normalized coordination gap = $\frac{| \sum Q_{p m t} - \sum F_{w p l t} |}{\sum Q_{p m t}}$, representing relative material flow imbalance (0 = perfect coordination, 1 = complete mismatch). Note: Bold formatting in the average row distinguishes summary statistics from individual instance results for enhanced readability.
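
The normalized coordination gap defined in the footnote can be computed directly from the production quantities Q_pmt and the machine-to-warehouse shipments F_wplt. The following minimal sketch uses hypothetical quantities chosen only to illustrate the calculation; the function name and the flat-list representation of the indexed variables are assumptions.

```python
def coordination_gap(produced, shipped):
    """Normalized coordination gap |sum(Q) - sum(F)| / sum(Q).

    produced: iterable of production quantities Q_pmt (tons)
    shipped:  iterable of machine-to-warehouse shipments F_wplt (tons)
    """
    total_produced = sum(produced)
    if total_produced <= 0:
        raise ValueError("total production must be positive")
    return abs(total_produced - sum(shipped)) / total_produced


# Hypothetical quantities (tons), not taken from the article.
produced = [400.0, 350.0, 250.0]   # sums to 1000
shipped = [390.0, 340.0, 232.0]    # sums to 962
gap = coordination_gap(produced, shipped)
print(f"coordination gap = {gap:.3f}")
```

A gap of 0 means that everything produced is shipped onward within the horizon, and larger values indicate a growing mismatch between the production and distribution levels.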

Table 13. Production and setup costs for Medium-1 instance (USD/ton, USD).

Machine	Production Cost (USD/ton)					Setup Cost (USD)
	G1	G2	G3	G4	G5	G1	G2	G3	G4	G5
M1	285	320	245	–	380	1200	1450	950	–	1680
M2	295	–	255	420	365	1350	–	1100	1890	1580
M3	–	310	240	405	375	–	1380	980	1820	1620
M4	280	315	–	415	–	1180	1420	–	1860	–
M5	290	325	250	–	385	1280	1480	1050	–	1720
M6	–	–	235	395	360	–	–	920	1750	1540
M7	275	305	245	410	–	1150	1350	970	1840	–
M8	285	–	–	400	370	1220	–	–	1780	1590
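
The machine-compatible sets PM_p from Table 1 can be derived from Table 13, assuming that a dash marks a machine–grade pair that cannot be produced. The sketch below builds these sets from the production costs and looks up the lowest-cost compatible machine per grade; the data layout and function names are illustrative, and the cost-only ranking ignores setup costs, capacity, and changeovers.

```python
GRADES = ["G1", "G2", "G3", "G4", "G5"]

# Production costs (USD/ton) from Table 13; None stands for a dash in the table.
PRODUCTION_COST = {
    "M1": [285, 320, 245, None, 380],
    "M2": [295, None, 255, 420, 365],
    "M3": [None, 310, 240, 405, 375],
    "M4": [280, 315, None, 415, None],
    "M5": [290, 325, 250, None, 385],
    "M6": [None, None, 235, 395, 360],
    "M7": [275, 305, 245, 410, None],
    "M8": [285, None, None, 400, 370],
}


def compatible_machines(cost_table, grades):
    """Build PM_p: for each grade, the machines with a listed production cost."""
    pm = {g: [] for g in grades}
    for machine, costs in cost_table.items():
        for grade, cost in zip(grades, costs):
            if cost is not None:
                pm[grade].append(machine)
    return pm


def cheapest_machine(cost_table, grades, grade):
    """Return the compatible machine with the lowest production cost for a grade."""
    idx = grades.index(grade)
    candidates = {m: c[idx] for m, c in cost_table.items() if c[idx] is not None}
    return min(candidates, key=candidates.get)


pm = compatible_machines(PRODUCTION_COST, GRADES)
for grade in GRADES:
    best = cheapest_machine(PRODUCTION_COST, GRADES, grade)
    print(f"{grade}: PM = {pm[grade]}, lowest production cost on {best}")
```

Keeping the compatibility information in one table and deriving PM_p from it avoids maintaining two structures that could drift apart when machine capabilities change.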

Table 14. Machine capacity information for Medium-1 instance.

Machine	Daily Capacity (tons)	Efficiency (%)	Availability (%)
M1	45	92.3	95.2
M2	38	89.7	94.8
M3	42	91.5	96.1
M4	50	88.9	93.7
M5	35	90.2	95.5
M6	40	93.1	96.8
M7	48	87.4	94.2
M8	44	91.8	95.9

Table 15. Customer demand information with uncertain parameters (Medium-1).

Customer	Type	Mean Demand (tons/period)					Uncertainty Parameters
		G1	G2	G3	G4	G5	CV (%)	α-cut	Service Level
C1	Packaging	450	380	0	520	0	15	[0.9, 1.1]	0.95
C2	Newsprint	0	680	420	0	0	18	[0.85, 1.15]	0.92
C3	Tissue	0	0	320	0	460	22	[0.8, 1.2]	0.90
C4	Publishing	350	550	280	0	0	16	[0.9, 1.1]	0.94
C5	Specialty	0	0	0	380	420	25	[0.75, 1.25]	0.88
C6	Packaging	480	0	0	560	0	14	[0.92, 1.08]	0.96
C7	Newsprint	0	720	390	0	0	19	[0.85, 1.15]	0.91
C8	Tissue	0	0	340	0	480	21	[0.8, 1.2]	0.89

Table 16. Algorithm performance comparison for bi-level optimization.

Algorithm	Best	Average	Worst	Std. Dev.	Time (min)	Success Rate
Bi-GA	0.821	0.826	0.839	0.006	24.3	71.2%
Bi-PSO	0.834	0.839	0.851	0.005	21.7	75.8%
Bi-DE	0.847	0.852	0.863	0.005	19.4	78.3%
Bi-SSO	0.855	0.860	0.869	0.004	17.9	81.7%
RL-AOA	0.870	0.875	0.882	0.003	15.2	86.4%

Wilcoxon Signed-Rank Test (RL-AOA vs. others): all p < 0.01. Note: Bold formatting highlights the proposed RL-AOA method to distinguish it from baseline algorithms for comparison clarity.

Table 17. RL action selection patterns in bi-level optimization.

Optimization Phase	Enhance Exploration	Focus Exploitation	Balanced Search	Strengthen Coordination
Early (0–25%)	42.8%	12.6%	28.4%	16.2%
Middle (25–65%)	24.3%	21.7%	35.8%	18.2%
Late (65–100%)	8.9%	35.4%	29.3%	26.4%
Performance Gain	+3.1%	+2.4%	+1.6%	+3.8%

Note: Bold formatting in the performance gain row distinguishes performance metrics from action selection frequencies, highlighting the effectiveness of each RL action type for enhanced readability.

Table 18. Impact of confidence levels on bi-level performance metrics.

Confidence	Goal Achievement	Upper-Level Cost	Lower-Level Cost	Coordination Gap	CPU Time
0.70	0.892	84.3	42.2	0.038	11.2
0.75	0.881	87.5	43.8	0.041	12.4
0.80	0.870	91.3	45.7	0.045	13.8
0.85	0.858	95.8	47.9	0.049	15.2
0.90	0.844	101.2	50.6	0.054	17.9
0.95	0.827	108.5	54.3	0.061	22.1

Table 19. Production–distribution coordination under uncertainty.

Uncertainty Level	Coordination Efficiency	Information Exchange	Decision Consistency	Overall Performance
Low (0.5×)	0.924	0.957	0.889	0.923
Medium (1.0×)	0.870	0.912	0.845	0.876
High (1.5×)	0.823	0.868	0.798	0.830
Very High (2.0×)	0.778	0.821	0.752	0.784

Table 20. Optimal paper grade production allocation (Medium-1 instance).

Machine	Period 1					Period 2					Period 3					Period 4
	G1	G2	G3	G4	G5	G1	G2	G3	G4	G5	G1	G2	G3	G4	G5	G1	G2	G3	G4	G5
M1	125	0	89	–	0	0	134	76	–	98	118	0	92	–	0	0	128	0	–	105
M2	0	–	0	156	87	98	–	0	143	0	0	–	67	148	94	112	–	0	0	89
M3	–	142	0	0	76	–	0	89	134	0	–	156	0	0	82	–	0	78	145	0
M4	89	0	–	0	–	0	125	–	167	–	104	0	–	0	–	0	138	–	158	–
M5	0	0	76	–	123	87	0	0	–	0	0	0	89	–	134	96	0	0	–	0
M6	–	–	94	0	0	–	–	0	0	118	–	–	104	0	0	–	–	0	0	127
M7	0	89	0	124	–	76	0	98	0	–	0	112	0	135	–	88	0	87	0	–
M8	134	–	–	0	97	0	–	–	148	0	125	–	–	0	106	0	–	–	142	0

Table 21. Machine changeover analysis for tactical planning.

Machine	Total Changeovers	Changeover Cost (USD)	Changeover Time (hrs)	Efficiency Loss (%)	Utilization (%)
M1	6	8450	23.5	3.2	87.4
M2	7	9680	27.2	3.8	85.1
M3	5	7320	19.8	2.7	89.2
M4	8	11,240	31.6	4.3	82.7
M5	6	8750	24.8	3.4	86.8
M6	4	5890	16.2	2.2	91.5
M7	7	9450	26.4	3.6	84.9
M8	5	7680	20.9	2.9	88.6
Average	6.0	8557	23.8	3.3	87.0

Note: Bold formatting in the average row distinguishes aggregate statistics from individual machine data for enhanced readability and quick identification of summary metrics.

Table 22. Cost–benefit analysis for paper manufacturing implementation.

Mill Size	Implementation Cost (USD K)	Annual Savings (USD K)	Payback Period (Months)
Small (1–2 machines)	150–250	180–320	8–14
Medium (3–8 machines)	280–450	350–680	9–15
Large (9+ machines)	520–850	750–1400	8–13

Table 23. Impact of target probability levels on lexicographic feasibility.

Scenario	Target α_g Values				Goal Achievement
	Econ.	Oper.	Res.	Qual.	Econ.	Oper.	Res.	Qual.
Relaxed	0.80	0.75	0.70	0.80	0.892	0.848	0.811	0.879
Current	0.90	0.85	0.80	0.90	0.870	0.817	0.765	0.861
Stringent	0.95	0.90	0.85	0.95	0.831	0.762	0.689	0.798
Cascade Effect:					0.045	0.085	0.167	-

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
