Article

Bi-Level Dependent-Chance Goal Programming for Paper Manufacturing Tactical Planning: A Reinforcement-Learning-Enhanced Approach

1 Laboratory of Engineering Sciences, Department of Informatics, Logistics and Mathematics, National School of Applied Sciences, Ibn Tofaïl University, Kenitra 14000, Morocco
2 Faculty of Science and Technology, Sultan Moulay Slimane University, Beni Mellal 23000, Morocco
3 Faculty of Sciences and Techniques, Hassan 1st University, Settat 26000, Morocco
4 Euromed University of Fes, UEMF, Fes 30000, Morocco
5 Department of Industrial Engineering, National Higher School of Arts and Professions, University Hassan 2nd of Casablanca, 150 Avenue Nil, Casablanca 20670, Morocco
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1624; https://doi.org/10.3390/sym17101624
Submission received: 14 July 2025 / Revised: 28 August 2025 / Accepted: 2 September 2025 / Published: 1 October 2025

Abstract

Tactical production–distribution planning in paper manufacturing involves hierarchical decision-making under hybrid uncertainty, where aleatory randomness (demand fluctuations, machine variations) and epistemic uncertainty (expert judgments, market trends) simultaneously affect operations. Existing approaches do not address the bi-level nature of this problem under hybrid uncertainty: they either treat production and distribution decisions independently or rely on single-paradigm uncertainty models. This research develops a bi-level dependent-chance goal programming framework based on uncertain random theory, in which the upper level optimizes distribution decisions and the lower level handles production decisions. The framework exploits structural symmetries arising from machine interchangeability, symmetric transportation routes, and temporal symmetry, and incorporates symmetry-breaking constraints to eliminate redundant solutions. A hybrid intelligent algorithm (HIA) integrates uncertain random simulation with a Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm (RL-AOA) for bi-level coordination, where Q-learning enables adaptive parameter tuning. The RL component uses symmetric state representations to maintain solution quality across symmetric transformations. Computational experiments show that HIA outperforms standard metaheuristics, achieving 3.2–7.8% better solution quality and an 18.5% reduction in computational time. Symmetry exploitation reduces the search space by approximately 35%. The framework provides probability-based performance metrics with optimal confidence levels of 0.82–0.87 and offers potential annual cost savings of 2.8–4.5%.

1. Introduction

The paper manufacturing industry, valued at approximately USD 400 billion globally, represents one of the most operationally complex sectors in modern manufacturing, where tactical production–distribution planning decisions directly impact both economic performance and environmental sustainability [1]. Paper production involves intricate interdependencies between raw material procurement, multi-grade production processes, and diverse distribution networks serving packaging, publishing, and specialty applications [2]. The industry’s operational structure exhibits significant structural symmetries, including machine interchangeability for certain paper grades, bidirectional material flows between production facilities, and periodic patterns in demand cycles, which create opportunities for computational efficiency through systematic symmetry exploitation in optimization algorithms [3].
Paper manufacturing operations encompass diverse product portfolios, including newsprint, packaging materials, tissue products, and specialty papers, each with its own production parameters and handling requirements [4]. Production facilities face machine compatibility constraints and sequence-dependent changeover requirements that impose significant setup times and costs when transitioning between product types [5]. The presence of symmetric machine capabilities, where multiple machines can produce identical grades with equivalent efficiency, introduces structural symmetries into the optimization problem. These symmetries manifest as equivalent production schedules that yield identical costs and performance metrics, so symmetry-aware optimization mechanisms are needed to avoid redundant exploration of mathematically equivalent solutions. Raw material variability and seasonal demand patterns create additional complexity in capacity allocation decisions [6].
The tactical planning horizon of 3–6 months represents a critical decision space where strategic network design constraints meet operational execution requirements [7]. Companies must make production scheduling decisions that determine machine utilization and inventory positioning while simultaneously coordinating distribution strategies affecting customer service levels and transportation costs [8]. The bi-level nature of this problem exhibits hierarchical structural symmetries, where upper-level distribution decisions and lower-level production decisions display analogous constraint patterns and objective contributions, requiring coordinated optimization approaches that can exploit these symmetries while maintaining solution optimality.
Supply chain planning operates across multiple hierarchical levels with distinct time horizons and decision scopes [9]. Strategic planning (1–5 years) focuses on network design decisions, tactical planning (3–6 months) bridges strategic constraints with operational execution, and operational planning (daily to weekly) addresses immediate execution decisions [7]. Recent advances in optimization have demonstrated the effectiveness of symmetry-based decomposition techniques for large-scale planning problems. However, existing approaches for bi-level production–distribution planning have not systematically addressed symmetry identification and exploitation, particularly in the context of hybrid uncertainty. Our reinforcement-learning-enhanced arithmetic optimization approach incorporates symmetric state representations in Q-learning to enable efficient exploration while avoiding redundant evaluations of symmetric solution regions.
Tactical planning in paper manufacturing operates under hybrid uncertainty that cannot be adequately captured by traditional single-paradigm approaches [10]. Aleatory uncertainty sources include demand fluctuations, machine efficiency variations, and quality variations that follow well-defined probability distributions based on historical data [11]. Epistemic uncertainty stems from incomplete knowledge about market trends, regulatory changes, and supplier reliability, where expert judgment must complement quantitative data [12]. Traditional optimization approaches treating these uncertainties independently fail to capture their interactive effects and compound impacts on tactical decisions [13].
Despite extensive research in production–distribution planning, several critical gaps remain that limit the effectiveness of existing approaches. Bi-level production–distribution planning research has primarily focused on deterministic parameters, failing to address goal-oriented frameworks that incorporate probability-based performance targets [14]. Goal programming applications have predominantly utilized deterministic formulations that assume exact target values, limiting their applicability in uncertain environments where probability-based objectives are more meaningful [15]. Uncertainty modeling has advanced significantly through stochastic programming and fuzzy programming approaches, yet hybrid uncertainty applications that simultaneously handle both aleatory and epistemic components remain limited [10]. Paper industry optimization research has developed sophisticated models for individual planning components but lacks comprehensive frameworks addressing integrated tactical-level decision coordination [1].
The symmetric properties inherent in paper manufacturing networks present both computational challenges and optimization opportunities. While these symmetries can exponentially expand the solution space through mathematically equivalent solutions, they also enable significant computational efficiencies when properly exploited. Our framework systematically addresses these symmetries through (1) identification and classification of structural symmetries in the bi-level problem formulation, (2) incorporation of symmetry-breaking constraints to eliminate redundant solution regions, (3) utilization of symmetric state representations in the reinforcement learning component to maintain solution quality across symmetric transformations, and (4) preservation of invariance properties in arithmetic optimization operators to ensure consistent performance across symmetric problem instances.
The primary objective of this research is to develop a comprehensive bi-level dependent-chance goal programming framework for tactical production–distribution planning in paper manufacturing under hybrid uncertainty. This research addresses four fundamental questions that guide the investigation: How can dependent-chance goal programming be adapted to handle bi-level tactical planning where production and distribution managers have different risk tolerances and objective priorities? What is the optimal bi-level structure for integrating production scheduling and distribution planning decisions while maintaining computational tractability? How effective is the proposed Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm compared to existing metaheuristics for solving complex bi-level optimization problems under hybrid uncertainty? What are the trade-offs between different goal probability levels and how do these affect tactical planning performance in paper manufacturing contexts?
To address the identified research gaps and challenges, this study makes the following key contributions:
  • Bi-Level Dependent-Chance Goal Programming Framework: Development of the first comprehensive bi-level dependent-chance goal programming model for tactical production–distribution planning that simultaneously addresses hierarchical decision-making and probabilistic goal achievement under hybrid uncertainty, extending traditional goal programming theory into probabilistic domains with bi-level coordination.
  • Hybrid Uncertainty Modeling Enhancement: Advancement of uncertain random theory applications through a systematic framework for modeling hybrid uncertainty in tactical planning contexts, providing unified mathematical representations that capture both aleatory randomness from historical data and epistemic uncertainty from expert judgments.
  • Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm: Introduction of the novel RL-AOA, which integrates Q-learning mechanisms with arithmetic optimization operators for adaptive bi-level coordination, enabling dynamic parameter tuning based on optimization progress and problem-specific characteristics.
  • Industry-Specific Tactical Planning Framework: Development of the first comprehensive tactical planning framework specifically designed for paper manufacturing environments, incorporating machine compatibility constraints, sequence-dependent changeover requirements, paper grade specifications, and multi-modal transportation considerations.
  • Computational Validation and Implementation Guidelines: Demonstration of algorithm superiority through extensive computational experiments achieving 3.2–7.8% solution quality improvement and 18.5% computational time reduction, accompanied by practical implementation guidelines and sensitivity analysis insights for industrial adoption.
  • Symmetry-Aware Optimization Techniques: Development of systematic approaches for identifying, classifying, and exploiting structural symmetries in paper manufacturing networks, demonstrating measurable computational improvements through intelligent symmetry breaking and symmetric state representations in reinforcement learning algorithms.
The remainder of this paper is organized as follows: Section 2 presents the mathematical formulation of the bi-level dependent-chance goal programming model including uncertain random variable definitions and chance measure calculations. Section 3 describes the theoretical foundations of uncertain random theory for hybrid uncertainty modeling and its application to tactical planning problems. Section 4 details the Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm including bi-level coordination mechanisms and adaptive parameter tuning strategies. Section 5 provides comprehensive computational results and performance analysis across diverse paper manufacturing instances with comparative evaluation against existing approaches. Section 6 concludes with key findings, practical implications, and future research directions.

2. Mathematical Model Formulation

This section presents the comprehensive bi-level dependent-chance goal programming model for tactical production–distribution planning in paper manufacturing under hybrid uncertainty. The model captures the hierarchical decision-making structure where distribution coordinators (upper level) and production managers (lower level) operate with different objectives while facing uncertain random parameters.

2.1. Sets and Indices

Table 1 presents the sets and indices used throughout our paper mill supply chain optimization model. These notations define the scope of products, facilities, resources, and relationships within the supply chain network and are used consistently in the remainder of this paper.

2.2. Parameters

2.2.1. Deterministic Parameters

The deterministic parameters in Table 2 capture the fixed, known aspects of the paper mill supply chain: production costs, technical specifications, capacity limits, and other operational constants that are unaffected by uncertainty and remain constant throughout the planning horizon. They establish the baseline operational constraints of the optimization model.

2.2.2. Uncertain Random Parameters

Our model represents uncertainty through the uncertain random parameters listed in Table 3 and denoted with a tilde. They capture variability in demand, costs, equipment performance, and material availability that stems from both historical randomness and expert judgment, allowing the model to produce solutions that remain robust to real-world fluctuations in the paper manufacturing environment.

2.2.3. Goal Programming Parameters

The goal programming component requires parameters that define target thresholds and confidence levels. Table 4 lists the target thresholds and target probability levels used to evaluate goal achievement for the four objectives of the model: economic efficiency, operational performance, resource utilization, and quality assurance.

2.3. Decision Variables

2.3.1. Upper-Level Decision Variables (Distribution Planning)

Our bi-level optimization model separates decision-making into distribution planning (upper level) and production planning (lower level). The upper-level variables in Table 5 cover distribution center operations, product shipments, transportation mode selection, inventory at distribution centers, and customer service levels; together they determine how the distribution network meets customer demand.

2.3.2. Lower-Level Decision Variables (Production Planning)

The lower-level variables in Table 6 cover production planning within the paper mill: production quantities, machine assignments, production sequencing, raw material procurement, and warehouse inventory management. These decisions are made in response to the distribution requirements set at the upper level.

2.3.3. Goal Programming Variables

The goal programming framework uses deviation variables to measure departures from target thresholds. The variables in Table 7 quantify positive and negative deviations for the four objectives of the model (economic efficiency, operational performance, resource utilization, and quality assurance); minimizing these deviations in the objective function yields balanced solutions across competing objectives.

2.4. Bi-Level Mathematical Model

Notation: Throughout this paper, Ch{·} denotes the chance measure of uncertain random events, which quantifies the likelihood that an uncertain random variable satisfies a given condition. This measure integrates probability theory for aleatory uncertainty (randomness from historical variability) with uncertainty theory for epistemic uncertainty (imprecision from limited knowledge) within a unified mathematical framework, as formalized in Definition 2 (Section 3.2).

2.4.1. Upper-Level Problem (Distribution Planning)

The upper level seeks to minimize distribution costs while maximizing service levels:
$$\min_{Y, X, Z, U, I^D, B, \lambda} F_{UL} = \sum_{d} FC_d \cdot Y_d + \sum_{w}\sum_{d}\sum_{p}\sum_{l}\sum_{t} \widetilde{TC}_{wdlt} \cdot X_{wdplt} + \sum_{d}\sum_{c}\sum_{p}\sum_{l}\sum_{t} \widetilde{TC}_{dclt} \cdot Z_{dcplt} + \sum_{d}\sum_{p}\sum_{t} \widetilde{HC}_{pt} \cdot I^D_{dpt} + \sum_{p}\sum_{c}\sum_{t} \widetilde{BC}_{pct} \cdot B_{pct} - \sum_{p}\sum_{t} W_{\lambda} \cdot \lambda_{pt} \tag{1}$$
Subject to upper-level constraints:
Distribution Center Capacity:
$$\sum_{p} I^D_{dpt} \le DCAP_d \cdot Y_d, \quad \forall d, t \tag{2}$$
This constraint ensures that the total inventory stored across all paper grades at each distribution center does not exceed the facility’s physical storage capacity, maintaining operational feasibility.
Transportation Mode Capacity:
$$Ch\left\{ \sum_{w}\sum_{d}\sum_{p} X_{wdplt} + \sum_{d}\sum_{c}\sum_{p} Z_{dcplt} \le LCAP_l \cdot \tilde{\phi}_{lt} \cdot U_{lt} \right\} \ge \beta_1, \quad \forall l, t \tag{3}$$
These constraints require that, with a chance of at least $\beta_1$, the combined shipment volume on each transportation mode stays within the mode's available capacity, accounting for the uncertain availability factor $\tilde{\phi}_{lt}$.
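To illustrate how a chance constraint such as the transportation mode capacity constraint can be checked for fixed shipment decisions, the sketch below estimates its left-hand side for a single mode–period pair. It assumes, purely for illustration, that the availability factor is $\tilde{\phi}_{lt} = A + \eta$, where $A$ is a normally distributed random part and $\eta \sim \mathcal{L}(a, b)$ is a linear uncertain variable encoding expert judgment; this parameter model and the function names are hypothetical, not the paper's simulation procedure. For each sampled $A$ the inner uncertain measure has a closed form, and averaging it over samples estimates $Ch$, because $\int_0^1 \Pr\{g(\omega) \ge r\}\,dr = E[g(\omega)]$ for any $g$ taking values in $[0, 1]$.

```python
import random


def linear_ud(x: float, a: float, b: float) -> float:
    """Uncertainty distribution of a linear uncertain variable L(a, b) with a < b."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def transport_capacity_chance(load: float, lcap: float, used: int,
                              mu: float, sigma: float, a: float, b: float,
                              n: int = 20000, seed: int = 0) -> float:
    """Estimate Ch{load <= LCAP_l * phi_lt * U_lt} for the assumed model phi = A + eta.

    A ~ Normal(mu, sigma) is the aleatory part; eta ~ L(a, b) is the epistemic part.
    """
    if lcap <= 0.0:
        raise ValueError("LCAP_l must be positive")
    if used == 0:
        # Mode not selected: the constraint reduces to load <= 0.
        return 1.0 if load <= 0.0 else 0.0
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        A = rng.gauss(mu, sigma)
        # load <= lcap * (A + eta)  <=>  eta >= load / lcap - A
        threshold = load / lcap - A
        # M{eta >= threshold} = 1 - Upsilon(threshold) for a continuous distribution
        total += 1.0 - linear_ud(threshold, a, b)
    # Ch = E_omega[M{...}] because the inner measure takes values in [0, 1]
    return total / n
```

With `sigma = 0.0` the randomness vanishes and the estimate equals the uncertain measure alone: `transport_capacity_chance(90.0, 100.0, 1, 0.9, 0.0, -0.1, 0.1, n=100)` gives 0.5, which would violate any $\beta_1 > 0.5$.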
Customer Demand Satisfaction with Production Coordination:
$$Ch\left\{ \sum_{d}\sum_{l} Z_{dcplt} + B_{pc,t-1} - B_{pct} = \tilde{D}_{pct} \right\} \ge \beta_2, \quad \forall p, c, t \tag{4}$$
$$\sum_{d}\sum_{c}\sum_{l} Z_{dcplt} \le \sum_{w} I^W_{wpt} + \sum_{m} Q_{pmt}, \quad \forall p, t \tag{5}$$
Constraint (4) maintains the demand–supply balance under uncertainty, while constraint (5) ensures that total customer shipments cannot exceed available warehouse inventory plus current production, creating explicit coordination between upper-level distribution decisions and lower-level production outputs.
Distribution Center Inventory Balance with Material Flow:
$$I^D_{dp,t-1} + \sum_{w}\sum_{l} X_{wdplt} - \sum_{c}\sum_{l} Z_{dcplt} = I^D_{dpt}, \quad \forall d, p, t \tag{6}$$
$$\sum_{w}\sum_{d}\sum_{l} X_{wdplt} \le \sum_{m} Q_{pmt} + \sum_{w} I^W_{wp,t-1}, \quad \forall p, t \tag{7}$$
Constraint (6) preserves inventory continuity at distribution centers, while constraint (7) ensures that warehouse shipments to distribution centers are bounded by current production plus previous warehouse inventory, establishing the material flow dependency between production and distribution levels.
Service-Level Definition:
$$Ch\left\{ \lambda_{pt} = \frac{\sum_{d}\sum_{c}\sum_{l} Z_{dcplt}}{\sum_{c} \tilde{D}_{pct}} \right\} \ge \beta_3, \quad \forall p, t \tag{8}$$
This constraint establishes the service level metric by computing the ratio of fulfilled demand to total customer demand for each product in each time period.
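Because the upper-level objective $F_{UL}$ contains uncertain random costs, a simulation-based solver evaluates it for sampled cost realizations. The minimal sketch below computes $F_{UL}$ for one such realization. Storing decisions and sampled costs in dictionaries keyed by index tuples is an assumption made for readability, and `upper_level_cost` is a hypothetical helper rather than the authors' code.

```python
from typing import Dict, Tuple

Key = Tuple[int, ...]   # index tuple, e.g. (w, d, p, l, t)


def upper_level_cost(FC: Dict[int, float], Y: Dict[int, int],
                     TC_wd: Dict[Key, float], X: Dict[Key, float],
                     TC_dc: Dict[Key, float], Z: Dict[Key, float],
                     HC: Dict[Key, float], ID: Dict[Key, float],
                     BC: Dict[Key, float], B: Dict[Key, float],
                     W_lambda: float, lam: Dict[Key, float]) -> float:
    """Evaluate F_UL for one sampled realization of the tilde (uncertain random) costs.

    Assumed key layouts: TC_wd[(w, d, l, t)], X[(w, d, p, l, t)], TC_dc[(d, c, l, t)],
    Z[(d, c, p, l, t)], HC[(p, t)], ID[(d, p, t)], BC and B[(p, c, t)], lam[(p, t)].
    """
    cost = sum(FC[d] * Y[d] for d in Y)                                        # DC fixed costs
    cost += sum(TC_wd[(w, d, l, t)] * x for (w, d, p, l, t), x in X.items())  # warehouse -> DC
    cost += sum(TC_dc[(d, c, l, t)] * z for (d, c, p, l, t), z in Z.items())  # DC -> customer
    cost += sum(HC[(p, t)] * i for (d, p, t), i in ID.items())                # DC holding
    cost += sum(BC[k] * q for k, q in B.items())                               # backorders
    cost -= W_lambda * sum(lam.values())                                       # service reward
    return cost
```

Averaging this value over many sampled cost realizations gives a Monte Carlo estimate of the expected upper-level cost for a fixed plan.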

2.4.2. Lower-Level Problem (Production Planning)

Given the upper-level decisions, the lower level minimizes production costs:
$$\min_{Q, V, W, F, I^W, R, I^R} F_{LL} = \sum_{p}\sum_{m}\sum_{t} \widetilde{PC}_{pmt} \cdot Q_{pmt} + \sum_{p}\sum_{m}\sum_{t} K_{pm} \cdot V_{pmt} + \sum_{p_i}\sum_{p_j}\sum_{m}\sum_{t} SC_{p_i p_j m} \cdot W_{p_i p_j m t} + \sum_{s}\sum_{r}\sum_{t} \widetilde{RC}_{rt} \cdot R_{srt} + \sum_{w}\sum_{p}\sum_{t} \widetilde{HC}_{pt} \cdot I^W_{wpt} + \sum_{r}\sum_{t} \widetilde{HC}_{rt} \cdot I^R_{rt} \tag{9}$$
Subject to lower-level constraints:
Machine Capacity:
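$$Ch\left\{ \sum_{p \in PM_m} \frac{Q_{pmt}}{\beta_{pm} \cdot \tilde{\eta}_{pmt}} + \sum_{p_i}\sum_{p_j} ST_{p_i p_j m} \cdot W_{p_i p_j m t} \le CAP_m \cdot (1 - \tilde{\omega}_{mt}) \right\} \ge \beta_4, \quad \forall m, t \tag{10}$$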
This constraint enforces machine capacity limits with a chance of at least $\beta_4$, accounting for production time requirements, changeover durations, efficiency variations, and potential equipment breakdowns.
Machine–Product Compatibility:
$$V_{pmt} = 0, \quad \forall p \notin PM_m, \ \forall m, t \tag{11}$$
These restrictions prevent the assignment of paper grades to incompatible machines, maintaining technical and operational feasibility in production scheduling.
Production–Setup Linking:
$$Q_{pmt} \le M_{big} \cdot V_{pmt}, \quad \forall p, m, t \tag{12}$$
$$Q_{pmt} \ge MW_p \cdot V_{pmt}, \quad \forall p, m, t \tag{13}$$
$$Q_{pmt} \le MX_p \cdot V_{pmt}, \quad \forall p, m, t \tag{14}$$
The first constraint links production quantities to setup decisions using big-M methodology. The second ensures minimum production batch sizes when a setup occurs. The third controls maximum production limits per setup, maintaining cost-effective production runs.
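The production–setup linking constraints can be verified for a single $(p, m, t)$ triple with a small feasibility check. The inequality directions follow the description above (big-M upper link, minimum batch, maximum batch); the function name and tolerance are hypothetical choices for this sketch.

```python
def setup_linking_ok(Q: float, V: int, M_big: float, MW: float, MX: float,
                     tol: float = 1e-9) -> bool:
    """Check the production-setup linking constraints for one (p, m, t) triple."""
    no_output_without_setup = Q <= M_big * V + tol   # big-M link: Q = 0 unless V = 1
    minimum_batch = Q >= MW * V - tol                 # minimum batch when set up
    maximum_batch = Q <= MX * V + tol                 # maximum batch per setup
    return no_output_without_setup and minimum_batch and maximum_batch
```

Whenever $MX_p \le M_{big}$, the maximum-batch constraint already implies the big-M link (since $V_{pmt} \in \{0, 1\}$), so $M_{big}$ can be tightened to $MX_p$ without changing the feasible set.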
Changeover Logic:
$$\sum_{p_j \ne p_i} W_{p_i p_j m t} = V_{p_i, m, t-1} - V_{p_i, m, t}, \quad \forall p_i, m, t \tag{15}$$
$$\sum_{p_i \ne p_j} W_{p_i p_j m t} = V_{p_j, m, t} - V_{p_j, m, t-1}, \quad \forall p_j, m, t \tag{16}$$
These logical constraints control machine changeover sequences by tracking transitions between different paper grades while maintaining temporal consistency in production scheduling.
Color/Grade Transition Restrictions:
$$W_{p_i p_j m t} = 0, \quad \forall (p_i, p_j) \in \Phi_{incompatible}, \ \forall m, t \tag{17}$$
This constraint prevents direct transitions between incompatible paper grades (e.g., dark to light colors), ensuring product quality and reducing contamination risks.
Maximum Daily Changeovers:
$$\sum_{p_i}\sum_{p_j \ne p_i} W_{p_i p_j m t} \le MaxChangeover_m, \quad \forall m, t \tag{18}$$
These limitations control the total number of product changeovers per machine per period, balancing production flexibility with setup cost management and operational stability.
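The color/grade transition restriction and the maximum changeover limit can be checked for one machine and period once the run order is known. The sketch below assumes, for illustration only, that the schedule is given as an ordered list of grade names from which the transitions with $W_{p_i p_j m t} = 1$ are read off; the model itself links $W$ to the setup variables through the changeover logic constraints, and the helper names and grade labels are hypothetical.

```python
from typing import List, Set, Tuple

Transition = Tuple[str, str]


def transitions(sequence: List[str]) -> List[Transition]:
    """Grade-to-grade changeovers (p_i -> p_j, p_i != p_j) implied by a run sequence."""
    return [(a, b) for a, b in zip(sequence, sequence[1:]) if a != b]


def changeovers_feasible(sequence: List[str], incompatible: Set[Transition],
                         max_changeovers: int) -> bool:
    """Check the forbidden-transition rule and the per-period changeover limit."""
    moves = transitions(sequence)
    no_forbidden = all(move not in incompatible for move in moves)  # no incompatible pair
    within_limit = len(moves) <= max_changeovers                    # changeover cap
    return no_forbidden and within_limit
```

For example, with the dark-to-light transition `("kraft", "white")` forbidden, the run order `["white", "cream", "kraft"]` is feasible under a limit of two changeovers, while `["kraft", "white"]` is not.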
Production–Distribution Material Flow Balance:
$$\sum_{m} Q_{pmt} = \sum_{w}\sum_{l} F_{wplt}, \quad \forall p, t \tag{19}$$
This constraint ensures that total production output for each paper grade equals total shipments from production facilities to warehouses, creating the fundamental link between lower-level production decisions and upper-level distribution planning.
Quality Grade Production Requirements:
$$Ch\left\{ \sum_{m \in PM_p} Q_{pmt} \cdot \tilde{\gamma}_{pt} \ge QMin_p \cdot \sum_{m \in PM_p} V_{pmt} \right\} \ge \beta_{25}, \quad \forall p, t \tag{20}$$
This constraint guarantees that the total quality-adjusted production output for each paper grade meets the minimum quality standards required by customer specifications.
Machine Quality Compatibility:
$$Ch\left\{ Q_{pmt} \cdot \tilde{\gamma}_{pt} \ge MinQuality_{pm} \cdot V_{pmt} \right\} \ge \beta_{26}, \quad \forall p, \ m \in PM_p, \ t \tag{21}$$
These requirements ensure that each machine–product combination achieves the necessary quality levels, preventing the production of substandard products.
Raw Material Requirements:
$$Ch\left\{ \sum_{p \in PR_r} \alpha_{pr} \cdot Q_{pmt} \cdot \tilde{\gamma}_{pt} \le I^R_{r,t-1} + \sum_{s} R_{srt} \right\} \ge \beta_5, \quad \forall r, m, t \tag{22}$$
This constraint ensures that sufficient raw materials are available to support the planned production quantities, accounting for quality yield variations and material consumption rates.
Raw Material Inventory Balance:
$$Ch\left\{ I^R_{r,t-1} + \sum_{s} R_{srt} - \sum_{p \in PR_r}\sum_{m} \alpha_{pr} \cdot Q_{pmt} \cdot \tilde{\gamma}_{pt} = I^R_{rt} \right\} \ge \beta_6, \quad \forall r, t \tag{23}$$
These balance equations maintain continuity of raw material inventories by tracking incoming purchases, consumption in production processes, and ending inventory levels.
Raw Material Availability:
$$Ch\left\{ \sum_{s} R_{srt} \le \tilde{\theta}_{rt} \right\} \ge \beta_7, \quad \forall r, t \tag{24}$$
This constraint enforces supplier capacity and market availability limits on raw material purchases, incorporating supply uncertainty into procurement decisions.
Warehouse Capacity:
$$\sum_{p} I^W_{wpt} \le WCAP_w, \quad \forall w, t \tag{25}$$
These capacity restrictions ensure that warehouse storage limits are respected across all product types, preventing facility overutilization and maintaining storage efficiency.
Warehouse Inventory Balance:
$$I^W_{wp,t-1} + \sum_{l} F_{wplt} - \sum_{d}\sum_{l} X_{wdplt} = I^W_{wpt}, \quad \forall w, p, t \tag{26}$$
These equations preserve inventory continuity at warehouse facilities by balancing previous inventory levels, incoming production, and outgoing shipments to distribution centers.
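The distribution center balance (6) and the warehouse inventory balance share the same recursion $I_t = I_{t-1} + \text{inflow}_t - \text{outflow}_t$. For warehouse $w$ and grade $p$, the inflow is $\sum_l F_{wplt}$ and the outflow is $\sum_d \sum_l X_{wdplt}$. The sketch below rolls the recursion forward for one deterministic realization, which is a quick way to confirm that a candidate plan never drives inventory negative; the helper name is hypothetical, and `itertools.accumulate` with `initial` requires Python 3.8 or later.

```python
from itertools import accumulate
from typing import List


def inventory_trajectory(initial: float, inflow: List[float],
                         outflow: List[float]) -> List[float]:
    """Ending inventory per period from I_t = I_{t-1} + inflow_t - outflow_t."""
    if len(inflow) != len(outflow):
        raise ValueError("inflow and outflow must cover the same periods")
    net = [i - o for i, o in zip(inflow, outflow)]
    # accumulate(..., initial=I_0) yields I_0, I_1, ..., I_T; drop I_0
    return list(accumulate(net, initial=initial))[1:]
```

A negative entry in the returned list flags a plan that violates the non-negativity of $I^W_{wpt}$.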

2.5. Bi-Level Coordination Algorithm

The coordination between production and distribution decisions requires an iterative approach that ensures feasible material flow while satisfying probabilistic constraints. To address this challenge, we develop a specialized bi-level coordination algorithm that alternates between solving the lower-level production problem and upper-level distribution problem until convergence is achieved. Algorithm 1 presents the detailed coordination procedure.
Algorithm 1 Bi-Level Coordination for Paper Manufacturing Planning
Require: Initial production capacities, demand forecasts, distribution network
1: Initialize production variables $Q_{pmt}$ and distribution variables $X_{wdplt}$, $Z_{dcplt}$
2: repeat
3:   Lower Level: Solve the production problem to determine $Q_{pmt}$ given current distribution requirements
4:   Update material availability: $AvailableMaterial_{pt} = \sum_{m} Q_{pmt}$
5:   Upper Level: Solve the distribution problem with the constraint $\sum_{w,d,l} X_{wdplt} \le AvailableMaterial_{pt}$
6:   Calculate the normalized coordination gap: $Gap = \dfrac{\left|\sum_{m} Q_{pmt} - \sum_{w,l} F_{wplt}\right|}{\sum_{m} Q_{pmt} + 10^{-6}}$
7: until $Gap < 0.01$ AND all chance constraints are satisfied
8: return Coordinated solution $(Q, X, Z)$
The algorithm alternates between the two levels, enforcing material flow constraints (5), (7), and (19), until the coordination gap falls below 0.01 and all chance constraints are satisfied. The normalized coordination gap measures the material flow imbalance relative to total production; values close to 0 indicate strong bi-level coordination.
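A skeleton of the coordination loop can be sketched as follows. The level solvers and the chance-constraint check are passed in as callables; in the paper these roles are filled by the hybrid intelligent algorithm, so the callables, names, and data layout here are hypothetical placeholders. Flows are aggregated per $(p, t)$, with `production[(p, t)]` $= \sum_m Q_{pmt}$ and `to_warehouses[(p, t)]` $= \sum_{w,l} F_{wplt}$. Since Algorithm 1 writes the gap for a generic index pair, the sketch assumes the worst case over all $(p, t)$.

```python
from typing import Callable, Dict, Tuple

Index = Tuple[int, int]          # (p, t)
Flows = Dict[Index, float]


def coordination_gap(production: Flows, to_warehouses: Flows, eps: float = 1e-6) -> float:
    """Worst-case normalized gap |sum_m Q_pmt - sum_{w,l} F_wplt| / (sum_m Q_pmt + eps)."""
    keys = set(production) | set(to_warehouses)
    return max((abs(production.get(k, 0.0) - to_warehouses.get(k, 0.0))
                / (production.get(k, 0.0) + eps) for k in keys), default=0.0)


def coordinate(solve_lower: Callable[[Flows], Tuple[Flows, Flows]],
               solve_upper: Callable[[Flows], Flows],
               chance_ok: Callable[[Flows, Flows], bool],
               requirements: Flows, tol: float = 0.01,
               max_iter: int = 50) -> Tuple[Flows, Flows, int]:
    """Alternate lower/upper solves until the gap is below tol and chance checks pass."""
    for iteration in range(1, max_iter + 1):
        production, to_warehouses = solve_lower(requirements)   # step 3
        available = dict(production)                             # step 4
        requirements = solve_upper(available)                    # step 5
        gap = coordination_gap(production, to_warehouses)        # step 6
        if gap < tol and chance_ok(production, requirements):    # step 7
            return production, requirements, iteration           # step 8
    raise RuntimeError("no coordinated solution within max_iter iterations")
```

The explicit `max_iter` bound is a design choice of the sketch: Algorithm 1 states no iteration limit, but a practical implementation needs one because convergence of alternating bi-level solves is not guaranteed in general.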

2.6. Paper-Manufacturing-Specific Constraints

Paper Grade Quality Requirements:
$$Ch\left\{ Q_{pmt} \cdot \tilde{\gamma}_{pt} \ge QMin_p \cdot V_{pmt} \right\} \ge \beta_8, \quad \forall p, m, t \tag{27}$$
This constraint requires that quality-adjusted production meets minimum grade specifications, ensuring customer satisfaction and regulatory compliance for paper products.
Shelf-Life Constraints:
$$I^W_{wpt} \le \sum_{\tau=\max(1,\, t-SL_p+1)}^{t} \sum_{l} F_{wpl\tau}, \quad \forall w, p, t \tag{28}$$
$$I^D_{dpt} \le \sum_{\tau=\max(1,\, t-SL_p+1)}^{t} \sum_{w}\sum_{l} X_{wdpl\tau}, \quad \forall d, p, t \tag{29}$$
The first constraint prevents inventory spoilage at warehouses by limiting stock to items received within the shelf-life window $SL_p$. The second imposes the same freshness requirement at distribution centers to preserve product quality.
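Constraint (28) can be checked for one warehouse–grade pair with a rolling window over recent inflows. In the hypothetical helper below, periods are 0-based, so the model's lower summation limit $\max(1, t - SL_p + 1)$ becomes `max(0, t - shelf_life + 1)`; the same check applies to (29) with distribution center inventories and inbound shipments.

```python
from typing import List


def shelf_life_ok(inventory: List[float], inflow: List[float], shelf_life: int,
                  tol: float = 1e-9) -> bool:
    """Check I_t <= sum of inflows over the last `shelf_life` periods, for every t."""
    if len(inventory) != len(inflow):
        raise ValueError("inventory and inflow must cover the same periods")
    for t in range(len(inventory)):             # t is 0-based here
        start = max(0, t - shelf_life + 1)      # 0-based form of max(1, t - SL_p + 1)
        if inventory[t] > sum(inflow[start:t + 1]) + tol:
            return False
    return True
```

For example, stock of 5 units held in period 3 after a single receipt of 10 units in period 1 violates the constraint when $SL_p = 2$ but satisfies it when $SL_p = 3$.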
Seasonal Production Constraints:
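$$Ch\left\{ \sum_{m} Q_{pmt} \ge SMin_{pt} \cdot \sum_{c} \tilde{D}_{pct} \right\} \ge \beta_9, \quad \forall p \in P_{seasonal}, t \tag{30}$$
$$Ch\left\{ \sum_{m} Q_{pmt} \le SMax_{pt} \cdot \sum_{c} \tilde{D}_{pct} \right\} \ge \beta_{10}, \quad \forall p \in P_{seasonal}, t \tag{31}$$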
The first constraint enforces minimum seasonal production levels relative to demand for seasonal paper grades, while the second caps seasonal production to prevent overproduction and excess inventory.
Environmental Compliance:
$$Ch\left\{ \sum_{p=1}^{P}\sum_{m=1}^{M}\sum_{t=1}^{T} \widetilde{EmissionRate}_{pm} \cdot Q_{pmt} \le EmissionLimit \right\} \ge \beta_{emission} \tag{32}$$
This constraint limits total emissions from production activities over the planning horizon. The uncertain emission rates $\widetilde{EmissionRate}_{pm}$ account for variability in production efficiency and equipment performance, and the chance constraint formulation requires the emission limit to be met with a chance of at least $\beta_{emission}$, consistent with the hybrid uncertainty framework.
The confidence level $\beta_{emission} = 0.95$ follows EPA regulatory standards for industrial emissions compliance [16]. The emission limit of $EmissionLimit = 1200$ tons CO2-equivalent per planning horizon is based on established benchmarks for medium-scale paper manufacturing facilities [17].
Non-Negativity and Binary Constraints:
$$Q_{pmt},\ X_{wdplt},\ Z_{dcplt},\ F_{wplt},\ I^W_{wpt},\ I^D_{dpt},\ I^R_{rt},\ R_{srt},\ B_{pct} \ge 0$$
$$V_{pmt},\ W_{p_i p_j m t},\ Y_d,\ U_{lt} \in \{0, 1\}$$
$$d_1^+, d_1^-, d_2^+, d_2^-, d_3^+, d_3^-, d_4^+, d_4^- \ge 0$$
$$0 \le \lambda_{pt} \le 1$$
These constraints establish variable domains, ensuring the physical feasibility of continuous quantities, logical consistency of binary decisions, and proper bounds for service level indicators.
This comprehensive bi-level dependent-chance goal programming model captures the tactical production–distribution planning decisions in paper manufacturing under hybrid uncertainty, where chance measures Ch{·} are evaluated using uncertain random simulation techniques. The model incorporates explicit coordination constraints (5), (7), and (19) that ensure feasible integration between upper-level distribution decisions and lower-level production outputs, addressing the bi-level coupling requirements for tactical planning optimization.

3. Uncertain Random Theory for Hybrid Uncertainty Modeling

Real-world production and distribution systems in paper manufacturing face dual uncertainty challenges that traditional optimization approaches struggle to address adequately. These systems must contend with objective variability inherent in operational processes (aleatory uncertainty) alongside subjective assessments arising from limited information and expert judgments (epistemic uncertainty). While classical probability theory excels at modeling random phenomena and fuzzy set theory handles belief-based information, neither framework alone sufficiently captures scenarios where both uncertainty types coexist simultaneously.
This section develops a comprehensive mathematical framework based on uncertain random theory that unifies both uncertainty dimensions within a single modeling paradigm. Our approach employs chance measures to create integrated representations that facilitate tactical decision-making under hybrid uncertainty conditions in paper manufacturing environments.

3.1. Mathematical Foundation of Uncertain Random Variables

Manufacturing planning decisions frequently involve parameters that exhibit both stochastic behavior and subjective uncertainty components. To address this complexity, we establish uncertain random variables as our fundamental modeling construct, building upon the theoretical framework developed by Liu [18].
Definition 1
(Uncertain Random Variable). Let $(\Omega, \mathcal{A}, \Pr)$ denote a probability space and consider a mapping $\tilde{\xi}: \Omega \to \mathcal{U}$, where $\mathcal{U}$ represents the space of uncertain variables. The function $\tilde{\xi}$ constitutes an uncertain random variable when $\mathcal{M}\{\tilde{\xi}(\omega) \in B\}$ is a measurable function of $\omega$ for every Borel set $B \subseteq \mathbb{R}$.
This mathematical structure captures the dual nature of hybrid uncertainty through its composition:
  • Stochastic Component: The probability space $(\Omega, \mathcal{A}, \Pr)$ models objective randomness derived from historical data and measurable variations.
  • Epistemic Component: For each probabilistic outcome $\omega$, the mapping $\tilde{\xi}(\omega)$ yields an uncertain variable representing subjective assessments and expert knowledge.
Manufacturing Context Example: Consider demand forecasting for paper grade $p$ from customer $c$ in period $t$, denoted as $\tilde{D}_{pct}$. The stochastic dimension captures quantifiable demand patterns extracted from sales history, while the uncertain dimension incorporates qualitative factors such as market sentiment, competitive dynamics, and economic indicators that resist precise probabilistic characterization.

3.2. Chance Measure Framework

The integration of probability and uncertainty measures requires a unified metric capable of handling both components simultaneously. We employ the chance measure, which provides this integration through the following mathematical construction.
Definition 2
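(Chance Measure). Given an uncertain random variable $\tilde{\xi}$ and a Borel set $B \subseteq \mathbb{R}$, the chance measure of the event $\{\tilde{\xi} \in B\}$ is defined as:
$$\mathrm{Ch}\{\tilde{\xi} \in B\} = \int_0^1 \Pr\left\{\omega \in \Omega : \mathcal{M}\{\tilde{\xi}(\omega) \in B\} \ge r\right\} dr$$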
The chance measure satisfies essential mathematical properties that ensure consistent behavior:
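  • Normalization: $\mathrm{Ch}\{\tilde{\xi} \in \mathbb{R}\} = 1$
  • Self-Duality: $\mathrm{Ch}\{\tilde{\xi} \in B\} + \mathrm{Ch}\{\tilde{\xi} \in B^c\} = 1$
  • Monotonicity: $B_1 \subseteq B_2 \Rightarrow \mathrm{Ch}\{\tilde{\xi} \in B_1\} \le \mathrm{Ch}\{\tilde{\xi} \in B_2\}$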
Boundary Behavior: The chance measure reduces to familiar measures in limiting cases:
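  • When uncertainty vanishes, so that $\tilde{\xi}$ reduces to a random variable $X$: $\mathrm{Ch}\{\tilde{\xi} \in B\} = \Pr\{X \in B\}$
  • When randomness vanishes, so that $\tilde{\xi}$ reduces to an uncertain variable $\eta$: $\mathrm{Ch}\{\tilde{\xi} \in B\} = \mathcal{M}\{\eta \in B\}$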
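As an illustration of Definition 2 and its boundary cases, the following Python sketch assumes a hypothetical uncertain random demand $\tilde{\xi} = X + \eta$, where $X \sim N(\mu, \sigma)$ is random and $\eta$ is a linear uncertain variable $\mathcal{L}(a, b)$. For the event $\{\tilde{\xi} \le x\}$, each realization gives $\mathcal{M}\{X(\omega) + \eta \le x\} = \Phi_\eta(x - X(\omega))$. The code estimates the chance measure both through the $r$-integral of Definition 2 and as the sample mean of these measures; the two agree because the uncertain measure takes values in $[0, 1]$. The distributions and all function names are assumptions chosen for illustration.

```python
import random

def linear_cdf(y, a, b):
    """Uncertainty distribution of a linear uncertain variable L(a, b)."""
    if a == b:                      # degenerate case: eta is the constant a
        return 1.0 if y >= a else 0.0
    if y <= a:
        return 0.0
    if y >= b:
        return 1.0
    return (y - a) / (b - a)

def sample_measures(x, n, mu, sigma, a, b, seed=0):
    """For n draws of X ~ N(mu, sigma), return M{X(omega) + eta <= x} = Phi_eta(x - X(omega))."""
    rng = random.Random(seed)
    return [linear_cdf(x - rng.gauss(mu, sigma), a, b) for _ in range(n)]

def chance_by_integral(measures, steps=1000):
    """Definition 2: integral over r in [0, 1] of Pr{M >= r} (midpoint rule, empirical Pr)."""
    n = len(measures)
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) / steps
        total += sum(1 for m in measures if m >= r) / n
    return total / steps

def chance_by_mean(measures):
    """Equivalent form: Ch is the expectation over omega of M{xi(omega) in B}."""
    return sum(measures) / len(measures)

ms = sample_measures(x=1.0, n=2000, mu=0.5, sigma=1.0, a=-1.0, b=1.0)
print(chance_by_integral(ms), chance_by_mean(ms))
```

Setting $a = b$ removes the epistemic component and the estimate approaches $\Pr\{X + a \le x\}$; setting $\sigma = 0$ removes the randomness and the result equals $\Phi_\eta(x - \mu)$, mirroring the two boundary cases above.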

3.3. Distributional Characterization

Definition 3
(Chance Distribution Function). The chance distribution function of an uncertain random variable $\tilde{\xi}$ is given by:
$$\Phi(x) = \mathrm{Ch}\{\tilde{\xi} \le x\}, \quad x \in \mathbb{R}$$
Theorem 1
(Distribution Characterization). A function $\Phi: \mathbb{R} \to [0, 1]$ represents a valid chance distribution if and only if it is non-decreasing with $\lim_{x \to -\infty} \Phi(x) = 0$ and $\lim_{x \to +\infty} \Phi(x) = 1$.

3.4. Statistical Measures

For practical decision-making applications, we require scalar summary statistics that characterize uncertain random variable behavior.
Definition 4
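(Expected Value). The expected value of an uncertain random variable $\tilde{\xi}$ is:
$$E[\tilde{\xi}] = \int_0^{+\infty} \mathrm{Ch}\{\tilde{\xi} \ge r\}\, dr - \int_{-\infty}^{0} \mathrm{Ch}\{\tilde{\xi} \le r\}\, dr$$
provided that at least one of the two integrals is finite.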
Definition 5
(Variance). For an uncertain random variable $\tilde{\xi}$ with finite expected value $\mu = E[\tilde{\xi}]$, the variance is:
$$Var[\tilde{\xi}] = E\left[(\tilde{\xi} - \mu)^2\right]$$
Theorem 2
(Linearity Property). For an uncertain random variable $\tilde{\xi}$ with finite expectation and constants $a, b \in \mathbb{R}$:
$$E[a\tilde{\xi} + b] = aE[\tilde{\xi}] + b$$
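The sketch below checks Definition 4 and Theorem 2 numerically under assumed inputs. The random part $X$ is a hypothetical discrete set of demand scenarios $(x_k, p_k)$ and $\eta$ is a linear uncertain variable $\mathcal{L}(e_{lo}, e_{hi})$, so that $\mathrm{Ch}\{\tilde{\xi} \le r\} = \sum_k p_k \Phi_\eta(r - x_k)$. For this case the integral formula should return $E[X] + (e_{lo} + e_{hi})/2$, and the affine helper (valid only for $a > 0$, where $a\tilde{\xi} + b = (aX + b) + a\eta$) lets Theorem 2 be checked the same way.

```python
def linear_cdf(y, e_lo, e_hi):
    """Uncertainty distribution of a linear uncertain variable L(e_lo, e_hi), e_lo < e_hi."""
    return min(1.0, max(0.0, (y - e_lo) / (e_hi - e_lo)))

def chance_le(r, scenarios, e_lo, e_hi):
    """Ch{X + eta <= r} = sum_k p_k * Phi_eta(r - x_k) for discrete X = {(x_k, p_k)}."""
    return sum(p * linear_cdf(r - x, e_lo, e_hi) for x, p in scenarios)

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]; an empty interval contributes 0."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def expected_value(scenarios, e_lo, e_hi):
    """Definition 4: int_0^inf Ch{xi >= r} dr - int_-inf^0 Ch{xi <= r} dr."""
    lo = min(x for x, _ in scenarios) + e_lo   # Ch{xi <= r} = 0 below lo
    hi = max(x for x, _ in scenarios) + e_hi   # Ch{xi >= r} = 0 above hi
    # Self-duality: Ch{xi >= r} = 1 - Ch{xi < r}, and Phi_eta is continuous here.
    pos = integrate(lambda r: 1.0 - chance_le(r, scenarios, e_lo, e_hi), 0.0, max(hi, 0.0))
    neg = integrate(lambda r: chance_le(r, scenarios, e_lo, e_hi), min(lo, 0.0), 0.0)
    return pos - neg

def affine(scenarios, e_lo, e_hi, a, b):
    """Parameters of a * xi + b for a > 0: X -> aX + b and eta -> L(a * e_lo, a * e_hi)."""
    return [(a * x + b, p) for x, p in scenarios], a * e_lo, a * e_hi

demand = [(100.0, 0.2), (120.0, 0.5), (150.0, 0.3)]   # hypothetical scenarios (x_k, p_k)
print(expected_value(demand, -10.0, 20.0))            # close to E[X] + E[eta] = 125 + 5
print(expected_value(*affine(demand, -10.0, 20.0, 2.0, -300.0)))   # close to 2 * 130 - 300
```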

3.5. Dependent-Chance Goal Programming Methodology

Traditional optimization approaches in manufacturing planning typically focus on constraint satisfaction at predetermined confidence levels. However, tactical planning benefits from directly optimizing the likelihood of achieving strategic objectives. Dependent-chance goal programming addresses this need by incorporating probabilistic objective achievement into the optimization framework.
Definition 6
(Dependent-Chance Programming). Given an uncertain random variable $\tilde{\xi}$ and target threshold $F$, dependent-chance programming seeks to maximize $\mathrm{Ch}\{\tilde{\xi} \le F\}$ or $\mathrm{Ch}\{\tilde{\xi} \ge F\}$, depending on the objective orientation.
For multi-objective tactical planning scenarios involving competing priorities, we extend this concept through goal programming principles:
  • Target Specification: Decision-makers establish desired achievement probabilities $\alpha_g$ for each tactical objective $g$
  • Deviation Modeling: Variables $d_g^+$ and $d_g^-$ capture positive and negative deviations from target achievement levels
  • Hierarchical Optimization: Objectives receive priority rankings and undergo sequential optimization based on strategic importance
The general mathematical formulation follows:
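$$\begin{aligned}
\text{lexicographic minimize}\quad & \{d_1^-, d_2^-, \ldots, d_n^-\} && (33)\\
\text{subject to}\quad & \mathrm{Ch}\{f_g(x, \tilde{\xi}) \bowtie F_g\} + d_g^- - d_g^+ = \alpha_g, \quad g = 1, \ldots, n && (34)\\
& x \in X && (35)\\
& d_g^+, d_g^- \ge 0, \quad g = 1, \ldots, n && (36)
\end{aligned}$$
where $f_g(x, \tilde{\xi})$ represents the $g$-th objective function, $\bowtie$ denotes the appropriate inequality relation ($\le$ or $\ge$), $F_g$ specifies the target threshold, $d_g^-$ measures the under-achievement of the target probability $\alpha_g$, and $X$ defines the feasible decision space.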
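To illustrate how the deviation variables behave, the sketch below computes the smallest non-negative $d_g^-$ and $d_g^+$ that satisfy the goal constraint $\mathrm{Ch}_g + d_g^- - d_g^+ = \alpha_g$ for given achieved chance values, and then ranks candidate plans lexicographically by their under-achievement vectors. The targets, plans, and chance values are hypothetical; in the full method the chance values would come from uncertain random simulation.

```python
def deviations(achieved, targets):
    """Smallest d_minus, d_plus >= 0 with Ch_g + d_minus_g - d_plus_g = alpha_g."""
    d_minus = [max(0.0, alpha - ch) for ch, alpha in zip(achieved, targets)]
    d_plus = [max(0.0, ch - alpha) for ch, alpha in zip(achieved, targets)]
    return d_minus, d_plus

def lexmin(candidates, targets):
    """Name of the plan whose (d_1^-, ..., d_n^-) tuple is lexicographically smallest."""
    return min(candidates, key=lambda name: tuple(deviations(candidates[name], targets)[0]))

targets = [0.85, 0.90, 0.80, 0.95]          # hypothetical alpha_1 .. alpha_4
candidates = {
    "plan_A": [0.85, 0.70, 0.90, 0.90],    # meets the priority-1 goal, misses priority 2
    "plan_B": [0.80, 0.95, 0.95, 0.99],    # misses the priority-1 goal slightly
}
print(lexmin(candidates, targets))          # plan_A: its d_1^- is zero
```

Python's tuple comparison already implements the lexicographic order: a plan with a larger priority-1 shortfall loses regardless of how well it does on lower priorities.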
Methodological Advantages:
  • Decision Transparency: Probability-based metrics provide intuitive interpretation for management
  • Priority Accommodation: Lexicographic structure aligns with organizational strategic hierarchies
  • Robustness Enhancement: Probabilistic focus inherently addresses uncertainty impacts
  • Unified Treatment: Chance measures handle both aleatory and epistemic uncertainties consistently

3.6. Paper Manufacturing Parameter Modeling

Paper manufacturing tactical planning involves numerous parameters exhibiting hybrid uncertainty characteristics. Our uncertain random variable framework provides natural representations for key planning elements:
  • Demand Forecasting $\tilde{D}_{pct}$: Combines quantitative sales patterns with qualitative market intelligence and customer behavior assessments
  • Production Costs $\widetilde{PC}_{pmt}$: Integrates observable input price volatility with subjective cost estimations for different paper grades and machine configurations
  • Raw Material Availability $\widetilde{RC}_{rt}$: Merges market price data with uncertain supply chain assessments for pulp and recycled material sources
  • Transportation Expenses $\widetilde{TC}_{ijlt}$: Combines fuel cost fluctuations with uncertain capacity availability and route condition evaluations
  • Quality Performance $\tilde{\gamma}_{pt}$: Unifies quantitative quality measurements with expert assessments of production environment factors
  • Equipment Efficiency $\tilde{\eta}_{pmt}$: Integrates historical machine performance with expert judgments regarding maintenance requirements and operational conditions
This modeling approach enables more realistic representation of paper manufacturing uncertainty while maintaining mathematical rigor necessary for optimization-based tactical planning.

3.7. Tactical Planning Application Framework

We now present the specific dependent-chance goal programming formulation for paper manufacturing tactical planning, considering the following strategic priority hierarchy:
Priority Level 1 (Economic Performance): The probability of total bi-level cost remaining below threshold $\text{Target}_1$ should achieve confidence level $\alpha_1$.
$$\mathrm{Ch}\{F_{UL} + F_{LL} \le \text{Target}_1\} + d_1^- - d_1^+ = \alpha_1$$
Priority Level 2 (Service Excellence): The probability of average service level exceeding threshold $\text{Target}_2$ should achieve confidence level $\alpha_2$.
$$\mathrm{Ch}\left\{\frac{1}{|P| \times |T|}\sum_{p,t} \lambda_{pt} \ge \text{Target}_2\right\} + d_2^- - d_2^+ = \alpha_2$$
Priority Level 3 (Resource Optimization): The probability of average capacity utilization exceeding threshold $\text{Target}_3$ should achieve confidence level $\alpha_3$.
$$\mathrm{Ch}\left\{\frac{1}{\sum_{m,t} 1}\sum_{m,t}\sum_{p} \frac{Q_{pmt}}{\beta_{pm} \cdot CAP_m} \ge \text{Target}_3\right\} + d_3^- - d_3^+ = \alpha_3$$
Priority Level 4 (Quality Assurance): The probability of average quality performance exceeding threshold $\text{Target}_4$ should achieve confidence level $\alpha_4$.
$$\mathrm{Ch}\left\{\frac{1}{|P| \times |T|}\sum_{p,t} \tilde{\gamma}_{pt} \ge \text{Target}_4\right\} + d_4^- - d_4^+ = \alpha_4$$
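The complete tactical planning model becomes:
$$\begin{aligned}
\text{lexicographic minimize}\quad & \{d_1^-, d_2^-, d_3^-, d_4^-\} && (41)\\
\text{subject to}\quad & \text{the Priority Level 1 to 4 goal constraints above} && (42)\\
& \text{production and distribution constraints} && (43)\\
& d_g^+, d_g^- \ge 0, \quad g = 1, 2, 3, 4 && (44)
\end{aligned}$$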
This formulation systematically minimizes under-achievement of target probability levels across all tactical objectives. The first priority addresses cost minimization (seeking performance below threshold), while subsequent priorities target performance maximization (seeking performance above thresholds). The chance measure framework handles both random and uncertain parameter components while providing intuitive probabilistic interpretations for tactical decision-makers.

3.8. Implementation Considerations

The proposed uncertain random theory framework offers several practical advantages for paper manufacturing tactical planning:
  • Comprehensive Objective Integration: Simultaneously addresses economic efficiency, operational performance, resource utilization, and quality assurance within a unified bi-level optimization framework
  • Hybrid Uncertainty Management: Effectively handles both quantifiable variations and subjective assessments in critical parameters including demand, costs, efficiency, and quality metrics
  • Confidence-Based Planning: Enables tactical planners to specify desired confidence levels for goal achievement across the planning horizon
  • Systematic Multi-Objective Balancing: Provides structured approach to managing competing tactical objectives while considering integrated production-distribution decisions
The dependent-chance goal programming methodology prioritizes tactical goals through lexicographic optimization while accounting for complex interactions between random and uncertain variables characteristic of paper manufacturing environments. This approach generates operationally viable solutions that demonstrate robustness against the diverse uncertainty sources encountered in tactical production-distribution planning contexts.

4. Hybrid Intelligent Algorithm

Addressing the computational complexity inherent in our bi-level dependent-chance goal programming formulation for tactical paper manufacturing planning necessitates a sophisticated algorithmic approach. We propose a hybrid computational framework that synergistically combines uncertain random simulation methodologies with reinforcement learning-augmented arithmetic optimization techniques. This algorithmic architecture is specifically designed to navigate the intricate landscape of hybrid uncertainties while managing the hierarchical structure of our bi-level optimization model.

4.1. Uncertain Random Simulations

The computational evaluation of chance measures within our bi-level tactical planning framework requires specialized simulation methodologies to handle expressions of the form $\mathrm{Ch}\{g_j(x, \xi) \le 0\} \ge \beta_j$. These chance-constrained formulations are fundamental components of our paper manufacturing optimization model, and their accurate estimation is critical for obtaining feasible and robust tactical decisions.
The mathematical foundation underlying chance measure evaluation is that $\mathrm{Ch}\{g(x, \xi) \le 0\}$ equals the expected value of the uncertain measures $\mathcal{M}\{g(x, \xi(\omega)) \le 0\}$ across all possible random realizations $\omega$. This follows from Definition 2, because $\int_0^1 \Pr\{Y \ge r\}\, dr = E[Y]$ for any random variable $Y$ taking values in $[0, 1]$. Drawing inspiration from Monte Carlo principles [19], we construct a specialized computational approach tailored for chance-measure approximation in tactical planning contexts.
Our uncertain random simulation methodology serves as the computational backbone for evaluating probabilistic constraints under hybrid uncertainty conditions characteristic of paper manufacturing environments. The following Algorithm 2 establishes our Monte Carlo framework that systematically samples from random distributions and computes uncertain measures for each realization, ultimately yielding reliable chance measure estimates essential for production-distribution decision-making.
Algorithm 2 Uncertain Random Simulation for Chance Measure Estimation
  • Description: This algorithm estimates the chance measure of uncertain random events by employing Monte Carlo sampling to approximate the expected value of uncertain measures across various random scenarios in paper manufacturing planning.
   1: Initialize counter ← 0
   2: for $k = 1$ to $N_{MC}$ do
   3:   Generate a random sample $\omega_k$ according to its probability distribution
   4:   Compute $\mathcal{M}\{g(x, \xi(\omega_k)) \le 0\}$ using Algorithm 3
   5:   counter ← counter + $\mathcal{M}\{g(x, \xi(\omega_k)) \le 0\}$
   6: end for
   7: return counter / $N_{MC}$ as the estimated chance measure
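A minimal Python sketch of Algorithm 2 follows. It assumes, for illustration only, a capacity constraint $g(x, \tilde{\xi}) = \tilde{D} - x \le 0$ with hypothetical demand $\tilde{D} = X + \eta$, where $X$ is normally distributed and $\eta$ is a normal uncertain variable $\mathcal{N}(e, \sigma)$ with uncertainty distribution $\Phi(y) = \left(1 + \exp\left(\pi(e - y)/(\sqrt{3}\sigma)\right)\right)^{-1}$. Because $\eta$ enters additively, the uncertain measure in step 4 is available in closed form as $\Phi_\eta(x - X(\omega_k))$, so this sketch does not call Algorithm 3.

```python
import math
import random

def normal_uncertain_cdf(y, e, sigma):
    """Uncertainty distribution of a normal uncertain variable N(e, sigma)."""
    z = math.pi * (e - y) / (math.sqrt(3.0) * sigma)
    if z > 700.0:                   # avoid overflow in exp; the value is ~0 here
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def estimate_chance(sample_omega, uncertain_measure, n_mc, seed=0):
    """Algorithm 2: average M{g(x, xi(omega_k)) <= 0} over n_mc random draws."""
    rng = random.Random(seed)
    counter = 0.0
    for _ in range(n_mc):
        omega = sample_omega(rng)               # step 3: draw the random component
        counter += uncertain_measure(omega)     # steps 4-5: accumulate the uncertain measure
    return counter / n_mc

def capacity_chance(capacity, n_mc=5000):
    """Hypothetical example: Ch{X + eta - capacity <= 0}, X ~ N(1000, 40), eta ~ N(0, 25)."""
    return estimate_chance(
        sample_omega=lambda rng: rng.gauss(1000.0, 40.0),
        uncertain_measure=lambda d: normal_uncertain_cdf(capacity - d, 0.0, 25.0),
        n_mc=n_mc,
    )

print(capacity_chance(1050.0), capacity_chance(1100.0))
```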
The evaluation of uncertain measures requires implementing uncertainty theory principles [20] to assess constraint satisfaction under subjective uncertainty conditions. Algorithm 3 implements an inverse distribution approach for generating uncertain variable realizations and determining constraint satisfaction at specified uncertainty thresholds.
Algorithm 3 Uncertain Measure Computation
  • Description: This algorithm computes the uncertain measure for a specified constraint by employing the inverse distribution function approach to generate uncertain variable realizations and evaluate constraint satisfaction in tactical planning contexts.
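   1: Generate $\alpha$ uniformly from $[0, 1]$
   2: Compute the inverse uncertainty distribution value $\Phi_\tau^{-1}(\alpha)$ for each uncertain variable $\tau$ in $\xi(\omega_k)$
   3: if $g(x, \xi(\omega_k))$ evaluated at $\tau = \Phi_\tau^{-1}(\alpha)$ is $\le 0$ then
   4:   return 1
   5: else
   6:   return 0
   7: end if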
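The sketch below mirrors Algorithm 3 for a single normal uncertain variable $\tau \sim \mathcal{N}(e, \sigma)$, an assumption made for illustration, using the inverse uncertainty distribution $\Phi^{-1}(\alpha) = e + \frac{\sigma\sqrt{3}}{\pi}\ln\frac{\alpha}{1-\alpha}$. One observation is worth making explicit: if $g$ is strictly increasing in $\tau$, the returned indicator equals 1 exactly when $\alpha \le \mathcal{M}\{g \le 0\}$, so it is a Bernoulli draw whose mean is the uncertain measure, and averaging it inside the Algorithm 2 loop still estimates the chance measure. The demand figures are hypothetical.

```python
import math
import random

def normal_uncertain_inverse(alpha, e, sigma):
    """Inverse uncertainty distribution of a normal uncertain variable N(e, sigma)."""
    return e + (sigma * math.sqrt(3.0) / math.pi) * math.log(alpha / (1.0 - alpha))

def uncertain_indicator(g, rng, e, sigma):
    """Algorithm 3: draw alpha ~ U(0, 1), set tau = Phi^{-1}(alpha), return 1 if g(tau) <= 0."""
    alpha = rng.random()            # in [0, 1); redraw the (practically impossible) 0
    while alpha == 0.0:
        alpha = rng.random()
    tau = normal_uncertain_inverse(alpha, e, sigma)
    return 1 if g(tau) <= 0.0 else 0

def chance_with_indicators(capacity, n_mc=20000, seed=1):
    """Algorithm 2 with Algorithm 3 plugged in: Ch{X + tau - capacity <= 0}."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(n_mc):
        demand = rng.gauss(1000.0, 40.0)        # hypothetical random demand draw
        counter += uncertain_indicator(lambda tau: demand + tau - capacity, rng, 0.0, 25.0)
    return counter / n_mc

print(chance_with_indicators(1000.0), chance_with_indicators(1050.0))
```

The monotonicity assumption matters: for a $g$ that is not monotone in $\tau$, a single inverse-distribution draw no longer has the uncertain measure as its mean.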

4.2. Theoretical Foundation of Exploration-Exploitation Trade-Off

The fundamental challenge in metaheuristic optimization lies in achieving optimal balance between search space exploration and solution refinement (exploitation). Following the theoretical framework established by Morales-Castañeda et al. [21], this balance can be quantitatively assessed through diversity metrics that capture population distribution characteristics throughout the search landscape.
Consider a population consisting of N individuals operating within a D-dimensional decision space. The diversity measure at iteration t is mathematically expressed as:
$$\text{Diversity}(t) = \frac{1}{D}\sum_{j=1}^{D} \frac{1}{N}\sum_{i=1}^{N} \left(x_{i,j}(t) - \bar{x}_j(t)\right)^2$$
where $x_{i,j}(t)$ denotes the $j$-th dimensional component of the $i$-th individual at iteration $t$, and $\bar{x}_j(t)$ represents the population centroid in dimension $j$. The corresponding exploration and exploitation percentages are computed as:
$$\text{Exploration\%}(t) = \frac{\text{Diversity}(t)}{\text{Diversity}_{\max}} \times 100 \qquad (46)$$
$$\text{Exploitation\%}(t) = 100 - \text{Exploration\%}(t) \qquad (47)$$
Balance quality assessment employs incremental-decremental analysis:
$$\text{Incremental}(t) = \max\left(0, \text{Diversity}(t) - \text{Diversity}(t-1)\right) \qquad (48)$$
$$\text{Decremental}(t) = \max\left(0, \text{Diversity}(t-1) - \text{Diversity}(t)\right) \qquad (49)$$
$$\text{Balance Quality}(t) = \text{Incremental}(t) - \text{Decremental}(t) \qquad (50)$$
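The diversity and balance measures can be computed directly from a population history. The Python sketch below computes diversity as the mean over dimensions of the squared spread about the centroid, as in the diversity formula above. It rests on assumptions the text does not fix: $\text{Diversity}_{\max}$ is taken as the largest diversity observed over the recorded run, both incremental and decremental terms are zero at the first iteration, and the balance term is computed as the difference Incremental(t) − Decremental(t).

```python
def diversity(pop):
    """Mean over dimensions of the population's squared spread about its centroid."""
    n, dims = len(pop), len(pop[0])
    total = 0.0
    for j in range(dims):
        centroid = sum(ind[j] for ind in pop) / n
        total += sum((ind[j] - centroid) ** 2 for ind in pop) / n
    return total / dims

def balance_profile(div_history):
    """Exploration %, exploitation % and incremental/decremental terms per iteration."""
    div_max = max(div_history)
    rows = []
    for t, div in enumerate(div_history):
        expl = 100.0 * div / div_max if div_max > 0 else 0.0
        prev = div_history[t - 1] if t > 0 else div   # assumption: no change at t = 0
        inc, dec = max(0.0, div - prev), max(0.0, prev - div)
        rows.append({"exploration": expl, "exploitation": 100.0 - expl,
                     "incremental": inc, "decremental": dec, "balance": inc - dec})
    return rows

history = [diversity(p) for p in ([[0, 0], [4, 4]], [[1, 1], [3, 3]], [[0, 2], [4, 2]])]
for row in balance_profile(history):
    print(row)
```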

4.3. Reinforcement-Learning-Enhanced Arithmetic Optimization Algorithm

Our bi-level chance-constrained goal programming model demands a sophisticated optimization approach capable of coordinating tactical production-distribution decisions under hybrid uncertainty. We develop an enhanced Arithmetic Optimization Algorithm (AOA) framework augmented with reinforcement learning capabilities to address these computational challenges effectively.

4.3.1. Introduction to the Arithmetic Optimization Algorithm

The Arithmetic Optimization Algorithm draws inspiration from fundamental mathematical operations: addition (+), subtraction (−), multiplication (×), and division (÷) [22]. This metaheuristic approach simulates arithmetic operations to achieve effective search space exploration and exploitation, making it particularly well-suited for complex bi-level optimization challenges in paper manufacturing contexts.
Several distinctive characteristics render AOA especially effective for tactical production-distribution planning applications:
  • Mathematical Optimizer (MO) Framework: AOA incorporates a Mathematical Optimizer mechanism that dynamically determines exploration versus exploitation strategies based on the Mathematical Optimizer Accelerated (MOA) coefficient:
    $$MOA(C_{iter}) = Min + C_{iter} \times \frac{Max - Min}{C_{Max}}$$
    where $C_{iter}$ represents the current iteration number, $C_{Max}$ denotes the maximum number of iterations, and $Min$, $Max$ define operational bounds.
  • Arithmetic Operation Strategy: AOA employs four fundamental arithmetic operators for position updating:
    • Multiplication and division operations, which produce widely dispersed moves, facilitate exploration
    • Subtraction and addition operations, which produce small, dense moves around the best solution, enable exploitation
  • Adaptive Search Framework: The algorithm dynamically transitions between exploration and exploitation phases based on optimization progress, which supports coordination in bi-level optimization.
  • Hierarchical Optimization Support: AOA’s mathematical foundation naturally accommodates hierarchical optimization scenarios where upper-level distribution decisions and lower-level production decisions require coordination through arithmetic operations.
These algorithmic characteristics make AOA particularly suitable for solving our bi-level dependent-chance goal programming model with hybrid uncertainty in paper manufacturing tactical planning.

4.3.2. Basic Arithmetic Optimization Algorithm

The fundamental AOA framework operates through the following computational phases:
  • Population Initialization: Generate $N_p$ solution candidates randomly within feasible regions for both upper- and lower-level variables.
  • Mathematical Optimizer Computation: Calculate the Mathematical Optimizer Accelerated function:
    $$MOA(C_{iter}) = Min + C_{iter} \times \frac{Max - Min}{C_{Max}}$$
  • Mathematical Optimizer Phase: Determine search strategy orientation:
    $$MOP(C_{iter}) = 1 - \frac{C_{iter}^{1/\alpha}}{C_{Max}^{1/\alpha}}$$
    where $\alpha$ controls exploitation precision.
  • Exploration Phase (when $r_1 > MOA$): Apply division and multiplication operators:
    $$x_{i,j}^{t+1} = best_j \div (MOP + \epsilon) \times \left((UB_j - LB_j) \times \mu + LB_j\right), \quad r_2 < 0.5$$
    $$x_{i,j}^{t+1} = best_j \times MOP \times \left((UB_j - LB_j) \times \mu + LB_j\right), \quad r_2 \ge 0.5$$
  • Exploitation Phase (when $r_1 \le MOA$): Apply subtraction and addition operators:
    $$x_{i,j}^{t+1} = best_j - MOP \times \left((UB_j - LB_j) \times \mu + LB_j\right), \quad r_3 < 0.5$$
    $$x_{i,j}^{t+1} = best_j + MOP \times \left((UB_j - LB_j) \times \mu + LB_j\right), \quad r_3 \ge 0.5$$
    where $\epsilon$ is a small constant that prevents division by zero, $\mu$ is a control parameter, and $LB_j$, $UB_j$ are the bounds of dimension $j$.
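For experimentation, the basic AOA loop can be sketched in Python as follows. It uses the operator assignment of the original AOA (division and multiplication for exploration, subtraction and addition for exploitation) with $Min = 0.2$, $Max = 1.0$, $\alpha = 5$, and $\mu = 0.5$. Clipping to the bounds, the greedy best-solution update, and the toy objective are assumptions of this sketch, which is single-level and omits the bi-level coordination and RL control.

```python
import random

def moa(c_iter, c_max, mn=0.2, mx=1.0):
    """Math Optimizer Accelerated: rises linearly from Min to Max."""
    return mn + c_iter * (mx - mn) / c_max

def mop(c_iter, c_max, alpha=5.0):
    """Math Optimizer Probability: falls from 1 towards 0 as iterations proceed."""
    return 1.0 - c_iter ** (1.0 / alpha) / c_max ** (1.0 / alpha)

def aoa_step(pop, best, lb, ub, c_iter, c_max, rng, mu=0.5, eps=1e-12):
    """One AOA position update; every new position is generated around the current best."""
    a, p = moa(c_iter, c_max), mop(c_iter, c_max)
    new_pop = []
    for _ in pop:
        new = []
        for j in range(len(best)):
            scale = (ub[j] - lb[j]) * mu + lb[j]
            r1, r2, r3 = rng.random(), rng.random(), rng.random()
            if r1 > a:      # exploration: division or multiplication
                val = best[j] / (p + eps) * scale if r2 < 0.5 else best[j] * p * scale
            else:           # exploitation: subtraction or addition
                val = best[j] - p * scale if r3 < 0.5 else best[j] + p * scale
            new.append(min(max(val, lb[j]), ub[j]))   # clip to bounds (assumption)
        new_pop.append(new)
    return new_pop

def aoa_minimize(f, lb, ub, n_pop=20, c_max=200, seed=0):
    """Run basic AOA; return the best solution and the best-value history."""
    rng = random.Random(seed)
    pop = [[rng.uniform(l, u) for l, u in zip(lb, ub)] for _ in range(n_pop)]
    best = min(pop, key=f)
    history = [f(best)]
    for c in range(1, c_max + 1):
        pop = aoa_step(pop, best, lb, ub, c, c_max, rng)
        cand = min(pop, key=f)
        if f(cand) < f(best):
            best = cand
        history.append(f(best))
    return best, history

def toy_objective(x):
    """Hypothetical 2-D test function with its minimum at (3, 7)."""
    return (x[0] - 3.0) ** 2 + (x[1] - 7.0) ** 2

best, history = aoa_minimize(toy_objective, [0.0, 0.0], [10.0, 10.0])
print(best, history[-1])
```

Note that the scale term $(UB_j - LB_j)\mu + LB_j$ vanishes for bounds symmetric about zero when $\mu = 0.5$, which is why the toy example uses a $[0, 10]$ box.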
This fundamental framework establishes the foundation for our enhanced approach, demonstrating core mathematical operations and search mechanisms that underpin our reinforcement learning enhancements.

4.3.3. Reinforcement Learning Enhancement

We augment the basic AOA framework with reinforcement learning methodologies [23] to enhance convergence characteristics and solution quality for bi-level tactical planning applications. The RL integration encompasses:
  • State Space Representation: The state vector incorporates discretized optimization metrics for both hierarchical levels:
    $$s_t = \{\text{diversity}_d^{UL}, \text{diversity}_d^{LL}, \text{convergence\_rate}_d, \text{stagnation}_d, \text{bilevel\_gap}_d\}$$
    where continuous metrics are discretized into three categories (Low, Medium, High) according to tactical planning requirements.
  • Action Space Definition: Actions involve adjusting AOA parameters and bi-level coordination strategies:
    • Action 1: Enhance exploration intensity;
    • Action 2: Intensify exploitation focus;
    • Action 3: Balance exploration-exploitation dynamics;
    • Action 4: Strengthen bi-level coordination.
  • Reward Function Structure: The reward mechanism incorporates both solution quality and bi-level coordination effectiveness:
    $$R_t = \begin{cases} \dfrac{f_{t-1}^{UL} + f_{t-1}^{LL} - (f_t^{UL} + f_t^{LL})}{|f_{t-1}^{UL} + f_{t-1}^{LL}| + \epsilon} + \lambda \cdot coord_t & \text{if improvement} \\ penalty & \text{otherwise} \end{cases}$$
    where $coord_t$ quantifies bi-level coordination quality and $\lambda$ represents coordination weighting.
  • Q-Learning Implementation: We implement Q-learning [24] with epsilon-greedy action selection:
    $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[R_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$
    where $\alpha$ denotes the learning rate, $\gamma$ the discount factor, $s_t$ the current state, and $a_t$ the selected action.
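The reward rule can be written as a small function. In this sketch "improvement" is interpreted as a decrease in the combined objective $f^{UL} + f^{LL}$ (a minimization assumption), and the values of $\lambda$, the fixed penalty, and $\epsilon$ are hypothetical placeholders rather than settings reported for HIA.

```python
def reward(f_prev_ul, f_prev_ll, f_ul, f_ll, coord, lam=0.5, penalty=-0.1, eps=1e-9):
    """Relative drop in f_UL + f_LL plus lam * coord if the combined objective improves."""
    prev, cur = f_prev_ul + f_prev_ll, f_ul + f_ll
    if cur < prev:
        return (prev - cur) / (abs(prev) + eps) + lam * coord
    return penalty

print(reward(120.0, 80.0, 110.0, 78.0, coord=0.6))   # improvement: 12/200 + 0.5 * 0.6
print(reward(120.0, 80.0, 121.0, 80.0, coord=0.6))   # no improvement: fixed penalty
```

Normalizing by $|f_{t-1}^{UL} + f_{t-1}^{LL}|$ keeps the reward scale comparable as objective values shrink over the run.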

4.3.4. Adaptive Parameter Tuning for Bi-Level Coordination

Enhancing bi-level coordination performance requires implementing adaptive parameter tuning mechanisms:
  • Mathematical Optimizer Adaptation:
    $$MOA_{t+1} = MOA_{base} + \phi \cdot \sin\left(2\pi \cdot \frac{t}{T_{max}}\right) \cdot \text{coordination\_factor}$$
    where ϕ controls adaptation intensity and coordination_factor reflects bi-level synchronization requirements.
  • Exploration-Exploitation Balance:
    $$\alpha_{t+1} = \alpha_{min} + (\alpha_{max} - \alpha_{min}) \cdot e^{-\rho \cdot t / T_{max}}$$
    $$MOP_{t+1} = MOP_{base} \cdot (1 + \beta \cdot \text{level\_performance})$$
    where level_performance measures relative performance between upper and lower levels.
  • Bi-Level Coordination Enhancement:
    $$\text{coord\_weight}_{t+1} = \text{coord}_{base} \cdot (1 + \gamma \cdot \text{gap\_ratio})$$
    where gap_ratio quantifies coordination gaps between production and distribution decisions.
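The three adaptation rules translate into one-line functions. This sketch is illustrative only: the default values of $\phi$, $\alpha_{min}$, $\alpha_{max}$, $\rho$, $\beta$, and $\gamma$ are assumptions, since the text does not fix them, and how `level_performance`, `coordination_factor`, and `gap_ratio` are measured is left to the caller.

```python
import math

def moa_adaptive(t, t_max, moa_base, phi=0.1, coordination_factor=1.0):
    """MOA_{t+1} = MOA_base + phi * sin(2 pi t / T_max) * coordination_factor."""
    return moa_base + phi * math.sin(2.0 * math.pi * t / t_max) * coordination_factor

def alpha_schedule(t, t_max, alpha_min=2.5, alpha_max=5.0, rho=3.0):
    """alpha_{t+1} = alpha_min + (alpha_max - alpha_min) * exp(-rho * t / T_max)."""
    return alpha_min + (alpha_max - alpha_min) * math.exp(-rho * t / t_max)

def mop_adaptive(mop_base, level_performance, beta=0.2):
    """MOP_{t+1} = MOP_base * (1 + beta * level_performance)."""
    return mop_base * (1.0 + beta * level_performance)

def coord_weight(coord_base, gap_ratio, gamma=0.5):
    """coord_weight_{t+1} = coord_base * (1 + gamma * gap_ratio)."""
    return coord_base * (1.0 + gamma * gap_ratio)

for t in (0, 25, 50, 100):
    print(t, moa_adaptive(t, 100, 0.6), alpha_schedule(t, 100))
```

Because MOA is compared against a uniform draw $r_1$, values pushed outside $[0, 1]$ by the sinusoidal term simply force one phase; clipping MOA to $[0, 1]$ would be an optional safeguard not stated in the text.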

4.4. Reinforcement Learning Components

Building upon the fundamental AOA framework, we integrate reinforcement learning methodologies to create an adaptive bi-level optimization system. The RL enhancement incorporates state representation, action space definition, and adaptive parameter control for tactical planning coordination.

4.4.1. State Space Discretization for Bi-Level Planning

Continuous optimization metrics for both hierarchical levels undergo discretization using the following strategy:
  • Upper-Level Diversity Measure:
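    $$\text{diversity}_d^{UL} = \begin{cases} \text{Low} & \text{if } \text{diversity}^{UL} \le 0.25 \\ \text{Medium} & \text{if } 0.25 < \text{diversity}^{UL} \le 0.65 \\ \text{High} & \text{if } \text{diversity}^{UL} > 0.65 \end{cases}$$
  • Lower-Level Diversity Measure:
    $$\text{diversity}_d^{LL} = \begin{cases} \text{Low} & \text{if } \text{diversity}^{LL} \le 0.30 \\ \text{Medium} & \text{if } 0.30 < \text{diversity}^{LL} \le 0.70 \\ \text{High} & \text{if } \text{diversity}^{LL} > 0.70 \end{cases}$$
  • Bi-Level Gap Measure:
    $$\text{bilevel\_gap}_d = \begin{cases} \text{Synchronized} & \text{if } gap \le 0.05 \\ \text{Moderate} & \text{if } 0.05 < gap \le 0.20 \\ \text{Divergent} & \text{if } gap > 0.20 \end{cases}$$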
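A minimal sketch of this three-level binning, together with one way to index the resulting table of $3^5 = 243$ states (a base-3 encoding chosen here for illustration). The thresholds are those given above; the convergence-rate and stagnation levels are passed in already discretized because their cut-off values are not specified in this section.

```python
UL_DIVERSITY_EDGES = (0.25, 0.65)
LL_DIVERSITY_EDGES = (0.30, 0.70)
GAP_EDGES = (0.05, 0.20)
LEVELS = ("Low", "Medium", "High")
GAP_LABELS = ("Synchronized", "Moderate", "Divergent")

def level(value, edges):
    """0 if value <= low, 1 if low < value <= high, 2 if value > high."""
    low, high = edges
    if value <= low:
        return 0
    return 1 if value <= high else 2

def encode_state(div_ul, div_ll, conv_level, stag_level, gap):
    """Combine five three-level features into a single index in range(243)."""
    digits = (level(div_ul, UL_DIVERSITY_EDGES), level(div_ll, LL_DIVERSITY_EDGES),
              conv_level, stag_level, level(gap, GAP_EDGES))
    index = 0
    for d in digits:
        if d not in (0, 1, 2):
            raise ValueError("each discretized feature must be 0, 1 or 2")
        index = 3 * index + d
    return index

print(LEVELS[level(0.40, UL_DIVERSITY_EDGES)], GAP_LABELS[level(0.12, GAP_EDGES)])
print(encode_state(0.40, 0.80, 1, 0, 0.12))
```

A flat integer index lets the Q-table be stored as a plain $243 \times 4$ array instead of a dictionary keyed by label tuples.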

4.4.2. Q-Learning for Bi-Level Parameter Adaptation

We implement Q-learning to adjust algorithm parameters adaptively as the bi-level optimization progresses.
Through iterative trial and error, Q-learning lets the algorithm learn effective parameter configurations for both production and distribution planning, using temporal-difference updates to improve subsequent decisions. As shown in Algorithm 4, this mechanism adapts the optimization process to the performance improvements observed at both levels.
Algorithm 4 Q-Learning Bi-Level Parameter Adaptation
  • Description: This algorithm implements temporal difference learning to update Q-values for bi-level parameter adaptation, enabling the system to learn optimal parameter settings based on tactical planning performance feedback from both the production and distribution levels.
  • Require: Current state $s_t$, action $a_t$, reward $R_t$, next state $s_{t+1}$
  • Require: Learning rate $\alpha$, discount factor $\gamma$, Q-table $Q$, exploration rate $\epsilon$
   1: Calculate temporal difference: $\delta = R_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)$
   2: Update Q-value: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \cdot \delta$
   3: Update bi-level coordination factor: $\text{coord}_{factor} \leftarrow \text{coord}_{factor} + \beta \cdot \delta$
   4: Select next action using epsilon-greedy:
   5: if random() < $\epsilon$ then
   6:   $a_{t+1}$ ← random action
   7: else
   8:   $a_{t+1} \leftarrow \arg\max_a Q(s_{t+1}, a)$
   9: end if
  10: return Updated Q-table, coordination factor, and next action $a_{t+1}$
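Algorithm 4 maps onto a few lines of tabular Q-learning. In this sketch the Q-table is a dictionary keyed by (state, action) with unseen entries read as 0, $\alpha = 0.1$ and $\gamma = 0.9$ follow the values set in Algorithm 8, the coordination step size $\beta = 0.05$ is a hypothetical value, and ties in the greedy choice go to the lowest-numbered action.

```python
import random

def q_learning_step(q, s, a, r, s_next, coord_factor, eps, rng,
                    alpha=0.1, gamma=0.9, beta=0.05, actions=(1, 2, 3, 4)):
    """One pass of Algorithm 4 on a dict-based Q-table keyed by (state, action)."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    delta = r + gamma * best_next - q.get((s, a), 0.0)    # step 1: temporal difference
    q[(s, a)] = q.get((s, a), 0.0) + alpha * delta        # step 2: Q-value update
    coord_factor = coord_factor + beta * delta            # step 3: coordination factor
    if rng.random() < eps:                                # steps 5-9: epsilon-greedy
        a_next = rng.choice(actions)
    else:
        a_next = max(actions, key=lambda b: q.get((s_next, b), 0.0))
    return q, coord_factor, a_next

rng = random.Random(42)
q_table, coord, action = {}, 1.0, 3
q_table, coord, action = q_learning_step(q_table, 10, action, 0.25, 11, coord, 0.2, rng)
print(q_table, coord, action)
```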

4.4.3. Adaptive Parameter Control for Tactical Planning

The reinforcement learning component for tactical production-distribution planning includes:
  • State Representation: $s_t = \{\text{diversity}_d^{UL}, \text{diversity}_d^{LL}, \text{convergence\_rate}_d, \text{stagnation}_d, \text{bilevel\_gap}_d\}$, with $3^5 = 243$ possible discrete states.
  • Action Space: Four discrete actions for bi-level parameter adjustment:
    • Action 1: Enhance exploration in both levels;
    • Action 2: Focus exploitation in production level;
    • Action 3: Balance search across levels;
    • Action 4: Strengthen inter-level coordination.
  • Reward Function: Bi-level improvement with coordination bonus.
  • Action Selection: Epsilon-greedy strategy [23] with adaptive exploration rate.
Algorithm 5 details how these reinforcement learning components are used to dynamically adjust the Arithmetic Optimization Algorithm parameters.
Algorithm 5 RL-Based Adaptive Parameter Control for Bi-Level AOA
  • Description: This algorithm dynamically adjusts Arithmetic Optimization Algorithm parameters based on reinforcement learning actions, optimizing exploration-exploitation balance for both the production and distribution planning levels.
  • Require: Current iteration $t$, RL action $a_t$, base parameters, bi-level gap
   1: Calculate base MOA: $MOA_{base} = Min + t \times \frac{Max - Min}{T_{max}}$
   2: Calculate base MOP: $MOP_{base} = 1 - \frac{t^{1/\alpha}}{T_{max}^{1/\alpha}}$
   3: if $a_t = 1$ then
   4:   $MOA = MOA_{base} \times 0.7$, $MOP = MOP_{base} \times 1.4$
   5:   {Enhance Exploration}
   6:   $\alpha = 2.5$, coord_weight = 0.3
   7: else if $a_t = 2$ then
   8:   $MOA = MOA_{base} \times 1.3$, $MOP = MOP_{base} \times 0.6$
   9:   {Focus Exploitation}
  10:   $\alpha = 4.0$, coord_weight = 0.2
  11: else if $a_t = 3$ then
  12:   $MOA = MOA_{base}$, $MOP = MOP_{base}$
  13:   {Balanced Search}
  14:   $\alpha = 3.0$, coord_weight = 0.4
  15: else
  16:   $MOA = MOA_{base} \times 1.1$, $MOP = MOP_{base} \times 0.8$
  17:   {Strengthen Coordination}
  18:   $\alpha = 3.5$, coord_weight = 0.6
  19: end if
  20: Update exploration rate: $\epsilon_{t+1} = \max(\epsilon_{min}, \epsilon_{max} \cdot e^{-\sigma \cdot t / T_{max}})$
  21: return Updated parameters $(MOA, MOP, \alpha, \text{coord\_weight}, \epsilon)$
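The action-to-parameter mapping of Algorithm 5 can be captured in a lookup table. This sketch assumes that $MOP_{base}$ is computed with the $\alpha$ in force before the action is applied, maps $\epsilon_{max}$ and $\epsilon_{min}$ to the $\epsilon_{start} = 0.9$ and $\epsilon_{end} = 0.1$ of Algorithm 8, and uses a hypothetical decay rate $\sigma = 3$.

```python
import math

# (MOA multiplier, MOP multiplier, alpha, coord_weight) per action, from Algorithm 5.
ACTION_SETTINGS = {
    1: (0.7, 1.4, 2.5, 0.3),   # enhance exploration
    2: (1.3, 0.6, 4.0, 0.2),   # focus exploitation
    3: (1.0, 1.0, 3.0, 0.4),   # balanced search
    4: (1.1, 0.8, 3.5, 0.6),   # strengthen coordination (also the "else" branch)
}

def rl_parameters(t, t_max, action, alpha_prev, mn=0.2, mx=1.0,
                  eps_min=0.1, eps_max=0.9, sigma=3.0):
    """Return (MOA, MOP, alpha, coord_weight, epsilon) for iteration t and RL action."""
    moa_base = mn + t * (mx - mn) / t_max
    mop_base = 1.0 - t ** (1.0 / alpha_prev) / t_max ** (1.0 / alpha_prev)
    moa_mult, mop_mult, alpha, coord_weight = ACTION_SETTINGS.get(action, ACTION_SETTINGS[4])
    eps = max(eps_min, eps_max * math.exp(-sigma * t / t_max))
    return moa_base * moa_mult, mop_base * mop_mult, alpha, coord_weight, eps

print(rl_parameters(t=50, t_max=100, action=1, alpha_prev=5.0))
```

Keeping the settings in one table makes it easy to audit that "exploration" actions lower MOA (so $r_1 > MOA$ becomes more likely) while raising MOP (larger steps).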

4.5. Solution Encoding and Decoding for Bi-Level Planning

Efficient solution representation is crucial for managing the complexity inherent in our bi-level tactical production-distribution planning model. We implement a hierarchical encoding scheme that effectively captures both binary and continuous decision variables across both planning levels.

Bi-Level Variable Encoding Strategy

For upper-level binary decision variables ($Y_d$, $U_{lt}$), we implement a sigmoid-based transformation:
$$Y_d = \begin{cases} 1, & \text{if } \sigma(z_d^{UL}) \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$$
where $\sigma(z) = \frac{1}{1 + e^{-z}}$ represents the sigmoid function.
For lower-level binary decision variables ($V_{pmt}$, $W_{p_i p_j m t}$), we use:
$$V_{pmt} = \begin{cases} 1, & \text{if } \sigma(z_{pmt}^{LL}) \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$$
For continuous decision variables at both levels, we employ direct encoding with bound constraints:
$$var^{UL/LL} = lb_{var} + (ub_{var} - lb_{var}) \times \text{normalize}(z_{var})$$
To handle bi-level dependencies between decision variables:
$$Q_{pmt} = \begin{cases} Q_{pmt}, & \text{if } Y_d = 1 \text{ and } V_{pmt} = 1 \\ 0, & \text{otherwise} \end{cases}$$
The solution vectors in our bi-level optimization model are encoded and decoded using Algorithm 6, which transforms continuous metaheuristic solutions into mixed-integer variables while preserving hierarchical dependencies between upper and lower decision levels.
Algorithm 6 Bi-Level Solution Encoding and Decoding
  • Description: This algorithm transforms continuous solution vectors into mixed-integer solutions for both production and distribution levels by applying sigmoid functions for binary variables and enforcing hierarchical dependencies between bi-level decision variables.
  • Require: Continuous solution vector $z = [z^{UL}, z^{LL}]$
   1: Upper-Level Binary Variable Decoding:
   2: for each upper-level binary variable $Y_d$, $U_{lt}$ do
   3:   $Y_d = \mathbb{I}[\sigma(z_d^{UL}) \ge 0.5]$, $U_{lt} = \mathbb{I}[\sigma(z_{lt}^{UL}) \ge 0.5]$
   4: end for
   5: Lower-Level Binary Variable Decoding:
   6: for each lower-level binary variable $V_{pmt}$, $W_{p_i p_j m t}$ do
   7:   $V_{pmt} = \mathbb{I}[\sigma(z_{pmt}^{LL}) \ge 0.5]$, $W_{p_i p_j m t} = \mathbb{I}[\sigma(z_{p_i p_j m t}^{LL}) \ge 0.5]$
   8: end for
   9: Continuous Variable Decoding:
  10: for each continuous variable var at both levels do
  11:   $var = lb_{var} + (ub_{var} - lb_{var}) \cdot \text{normalize}(z_{var})$
  12: end for
  13: Bi-Level Dependency Enforcement:
  14: for each dependent variable pair across levels do
  15:   if upper-level parent variable is 0 then
  16:     Set dependent lower-level variables to 0
  17:   end if
  18: end for
  19: return Decoded bi-level solution $x = [x^{UL}, x^{LL}]$
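A small Python sketch of the decoding steps in Algorithm 6 follows. Two points are assumptions: `normalize` is realised with the same sigmoid (any map onto $[0, 1]$ would do), and the link between each production quantity $Q_{pmt}$ and its upper-level parent $Y_d$ is supplied by a hypothetical `parent_of` mapping, since this section does not state how $d$ relates to $(p, m, t)$. The facility and product labels are invented for the example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def to_binary(z):
    return 1 if sigmoid(z) >= 0.5 else 0

def to_continuous(z, lb, ub):
    return lb + (ub - lb) * sigmoid(z)   # assumption: normalize() = sigmoid

def decode(z_ul, z_v, z_q, q_bounds, parent_of):
    """Decode Y_d, V_pmt and Q_pmt, then zero Q_pmt unless Y_d = 1 and V_pmt = 1."""
    y = {d: to_binary(z) for d, z in z_ul.items()}
    v = {k: to_binary(z) for k, z in z_v.items()}
    q = {k: to_continuous(z, *q_bounds[k]) for k, z in z_q.items()}
    for k in q:
        if y[parent_of[k]] == 0 or v[k] == 0:   # hierarchical dependency enforcement
            q[k] = 0.0
    return y, v, q

keys = [("P1", "M1", 1), ("P1", "M2", 1), ("P2", "M1", 1)]
y, v, q = decode(
    z_ul={"DC1": 2.0, "DC2": -2.0},
    z_v=dict(zip(keys, [1.0, 1.0, -1.0])),
    z_q=dict(zip(keys, [0.0, 0.0, 0.0])),
    q_bounds={k: (0.0, 100.0) for k in keys},
    parent_of={keys[0]: "DC1", keys[1]: "DC2", keys[2]: "DC1"},
)
print(y, v, q)
```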

4.6. Constraint Handling Mechanisms for Bi-Level Planning

We implement a hierarchical penalty function approach to handle bi-level dependent-chance constraints:
$$F(x) = M_{UL} \sum_{j=1}^{m_{UL}} \lambda_j^{UL}(t) \cdot \max\left(0, \beta_j - \mathrm{Ch}\{g_j^{UL}(x^{UL}) \le 0\}\right) + M_{LL} \sum_{j=1}^{m_{LL}} \lambda_j^{LL}(t) \cdot \max\left(0, \beta_j - \mathrm{Ch}\{g_j^{LL}(x^{LL}) \le 0\}\right) + \sum_{g=1}^{4} w_g \cdot d_g^-$$
where $M_{UL}$ and $M_{LL}$ represent large penalty multipliers for upper- and lower-level constraints, respectively, ensuring constraint violations dominate goal deviations, and $\lambda_j^{UL}(t)$, $\lambda_j^{LL}(t)$ are penalty parameters that increase over iterations:
$$\lambda_j^{UL}(t) = \lambda_j^{UL,0} \cdot (1 + \delta_{UL} \cdot t), \qquad \lambda_j^{LL}(t) = \lambda_j^{LL,0} \cdot (1 + \delta_{LL} \cdot t)$$
For dependent-chance goals requiring coordination between levels, we implement progressive tightening:
$$\beta_j(t) = \beta_j^{final} \cdot \left(1 - e^{-\kappa \cdot t / T_{max}}\right)$$
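The penalized fitness, the growing penalty weights, and the progressive tightening of $\beta_j$ can be sketched as below. The multipliers $M_{UL} = 10^6$ and $M_{LL} = 10^4$, the goal weights, and $\kappa = 5$ are hypothetical choices made only so that constraint violations visibly dominate goal deviations; $\lambda_j^{UL,0} = \lambda_j^{LL,0} = 1$ and $\delta_{UL} = \delta_{LL} = 0.1$ follow Algorithm 8, and the goal deviations are taken as the under-achievements $d_g^-$.

```python
import math

def penalty_weight(lam0, delta, t):
    """lambda_j(t) = lambda_j^0 * (1 + delta * t)."""
    return lam0 * (1.0 + delta * t)

def tightened_beta(beta_final, t, t_max, kappa=5.0):
    """beta_j(t) = beta_j^final * (1 - exp(-kappa * t / T_max))."""
    return beta_final * (1.0 - math.exp(-kappa * t / t_max))

def penalized_fitness(ch_ul, beta_ul, ch_ll, beta_ll, d_minus, weights, t,
                      m_ul=1e6, m_ll=1e4, lam0=1.0, delta_ul=0.1, delta_ll=0.1):
    """Hierarchical penalty: weighted chance-constraint shortfalls plus weighted goal deviations."""
    viol_ul = sum(max(0.0, b - c) for c, b in zip(ch_ul, beta_ul))
    viol_ll = sum(max(0.0, b - c) for c, b in zip(ch_ll, beta_ll))
    goals = sum(w * d for w, d in zip(weights, d_minus))
    return (m_ul * penalty_weight(lam0, delta_ul, t) * viol_ul
            + m_ll * penalty_weight(lam0, delta_ll, t) * viol_ll + goals)

w = [1000.0, 100.0, 10.0, 1.0]          # hypothetical goal weights, priority 1 heaviest
d = [0.0, 0.05, 0.0, 0.0]
feasible = penalized_fitness([0.95], [0.90], [0.92], [0.90], d, w, t=10)
infeasible = penalized_fitness([0.85], [0.90], [0.92], [0.90], d, w, t=10)
print(feasible, infeasible)
```

Because all $\lambda_j^0$ are equal here, the per-constraint weights factor out of each sum; with constraint-specific $\lambda_j^0$ the weight would move inside the sums.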
Our optimization model employs Algorithm 7 to handle the complex chance constraints across both planning levels, implementing a dynamic penalty mechanism that evaluates dependent-chance goals while preserving the hierarchical structure between production and distribution decisions.
Algorithm 7 Dynamic Penalty Constraint Handling for Bi-Level Planning
  • Description: This algorithm evaluates solution fitness by computing dependent-chance goals and constraints for both the production and distribution levels, applying hierarchical penalty functions to handle constraint violations while preserving lexicographic priority and bi-level coordination.
  • Require: Bi-level solution $x = [x^{UL}, x^{LL}]$, iteration $t$, penalty parameters
   1: Evaluate Dependent-Chance Goals:
   2: for each goal $g = 1$ to 4 do
   3:   Calculate the goal-achievement chance $\mathrm{Ch}\{\text{goal\_achievement}_g\}$ using Algorithm 2 (for $g = 1$, $\mathrm{Ch}\{F_{UL} + F_{LL} \le \text{Target}_1\}$)
   4:   Calculate deviation: $d_g^- = \max(0, \alpha_g - \mathrm{Ch}\{\text{goal\_achievement}_g\})$
   5: end for
   6: Evaluate Upper-Level Chance Constraints:
   7: for each upper-level constraint $j = 1$ to $m_{UL}$ do
   8:   Calculate $\mathrm{Ch}\{g_j^{UL}(x^{UL}) \le 0\}$ using Algorithm 2
   9:   Calculate violation: $viol_j^{UL} = \max(0, \beta_j - \mathrm{Ch}\{g_j^{UL}(x^{UL}) \le 0\})$
  10: end for
  11: Evaluate Lower-Level Chance Constraints:
  12: for each lower-level constraint $j = 1$ to $m_{LL}$ do
  13:   Calculate $\mathrm{Ch}\{g_j^{LL}(x^{LL}) \le 0\}$ using Algorithm 2
  14:   Calculate violation: $viol_j^{LL} = \max(0, \beta_j - \mathrm{Ch}\{g_j^{LL}(x^{LL}) \le 0\})$
  15: end for
  16: Compute Hierarchical Penalized Fitness:
  17:   $F_{penalty}(x) = M_{UL} \sum_{j=1}^{m_{UL}} \lambda_j^{UL}(t) \cdot viol_j^{UL} + M_{LL} \sum_{j=1}^{m_{LL}} \lambda_j^{LL}(t) \cdot viol_j^{LL} + \sum_{g=1}^{4} w_g \cdot d_g^-$
  18: Update Penalty Parameters:
  19: for each constraint $j$ at both levels do
  20:   $\lambda_j^{UL}(t) = \lambda_j^{UL,0} \cdot (1 + \delta_{UL} \cdot t)$
  21:   $\lambda_j^{LL}(t) = \lambda_j^{LL,0} \cdot (1 + \delta_{LL} \cdot t)$
  22: end for
  23: return Penalized fitness $F_{penalty}(x)$

4.7. Complete Hybrid Intelligent Algorithm

The comprehensive hybrid intelligent algorithm is structured in a modular framework comprising six specialized components designed for bi-level tactical planning coordination.
Proper initialization is crucial for bi-level optimization. The hybrid optimization process begins with Algorithm 8, which establishes populations for both levels, configures all learning components, and sets up the parameter framework that guides the entire tactical planning optimization process.
Algorithm 8 Bi-Level Algorithm Initialization
  • Description: This algorithm initializes populations for both production and distribution planning levels, Q-learning components, and all algorithm parameters, establishing the foundation for the hybrid bi-level optimization process.
   1: Initialize upper-level population of $N_p^{UL}$ solutions randomly in the feasible region
   2: Initialize lower-level population of $N_p^{LL}$ solutions randomly in the feasible region
   3: Initialize Q-table for reinforcement learning with small random values
   4: Set RL parameters: learning rate $\alpha = 0.1$, discount factor $\gamma = 0.9$, $\epsilon_{start} = 0.9$, $\epsilon_{end} = 0.1$
   5: Set AOA parameters: $Min = 0.2$, $Max = 1.0$, sensitivity parameter $\alpha = 5.0$, $\mu = 0.5$
   6: Set penalty parameters: $\lambda_j^{UL,0} = 1.0$, $\lambda_j^{LL,0} = 1.0$, $\delta_{UL} = 0.1$, $\delta_{LL} = 0.1$
   7: Evaluate initial bi-level solutions using Algorithm 7
   8: Identify global best solutions $x_{gbest}^{UL}$, $x_{gbest}^{LL}$ and initialize $f_{best}^{UL}$, $f_{best}^{LL}$
   9: Initialize stagnation counters: $stag_{UL} = 0$, $stag_{LL} = 0$
Effective reinforcement learning for bi-level optimization requires continuous monitoring of both production and distribution planning states. Algorithm 9 captures key performance indicators for both levels and implements intelligent action selection for coordinated parameter adjustment.
Algorithm 9 RL State Observation and Action Selection for Bi-Level Planning
  • Description: This algorithm observes the current optimization state for both the production and distribution levels, calculates key performance metrics, and selects actions using an epsilon-greedy strategy to balance exploration and exploitation in bi-level parameter adaptation.
   1: Calculate upper-level diversity: $diversity^{UL} = \frac{1}{N_p^{UL}} \sum_{i=1}^{N_p^{UL}} \lVert x_i^{UL} - x_{gbest}^{UL} \rVert$
   2: Calculate lower-level diversity: $diversity^{LL} = \frac{1}{N_p^{LL}} \sum_{i=1}^{N_p^{LL}} \lVert x_i^{LL} - x_{gbest}^{LL} \rVert$
   3: Calculate convergence rate: $conv_{rate} = \frac{(f_{best}^{UL} + f_{best}^{LL})_{t-5} - (f_{best}^{UL} + f_{best}^{LL})_{t}}{5}$
   4: Calculate bi-level gap: $gap = \frac{|\text{coordination\_measure}^{UL} - \text{coordination\_measure}^{LL}|}{\text{coordination\_measure}^{UL} + \text{coordination\_measure}^{LL}}$
   5: Define state: $s_t = \{diversity^{UL}, diversity^{LL}, conv_{rate}, stag, gap\}$
   6: Update $\epsilon$: $\epsilon = \max(\epsilon_{end}, \epsilon_{start} \cdot 0.995^{t})$
   7: if random() < $\epsilon$ then
   8:   Select random action $a_t \in \{1, 2, 3, 4\}$
   9: else
  10:   $a_t = \arg\max_a Q(s_t, a)$
  11: end if
  12: Update AOA parameters using Algorithm 5
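The state features and epsilon-greedy rule of Algorithm 9 translate directly into code. The Python sketch below is illustrative only: it assumes Euclidean norms for the diversity terms, keeps a history of $f_{best}^{UL} + f_{best}^{LL}$ to evaluate the 5-iteration convergence rate (returning 0 until that history is long enough), adds a tiny constant to the gap denominator, and takes the Q-table row for the current state directly, since the mapping from the continuous state $s_t$ to a table row is not specified here. All function names are hypothetical.

```python
import math
import random


def diversity(pop, gbest):
    """Steps 1-2: mean Euclidean distance between each solution and the global best."""
    return sum(math.dist(x, gbest) for x in pop) / len(pop)


def conv_rate(total_history):
    """Step 3: average drop per iteration of f_best^UL + f_best^LL over the last 5 iterations.

    total_history[-1] is the value at iteration t, total_history[-6] the value at t - 5.
    """
    if len(total_history) < 6:
        return 0.0
    return (total_history[-6] - total_history[-1]) / 5


def bilevel_gap(c_ul, c_ll, tiny=1e-12):
    """Step 4: normalized gap between the two coordination measures."""
    return abs(c_ul - c_ll) / (c_ul + c_ll + tiny)


def epsilon(t, eps_start=0.9, eps_end=0.1):
    """Step 6: exponentially decaying exploration rate with a floor."""
    return max(eps_end, eps_start * 0.995 ** t)


def select_action(q_row, t, rng=random, n_actions=4):
    """Steps 7-11: epsilon-greedy choice; actions are 0-based here (1..4 in the text)."""
    if rng.random() < epsilon(t):
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_row[a])
```

With the listed schedule, $\epsilon$ starts at 0.9 and reaches its floor of 0.1 after roughly 440 iterations, so late in a 400-iteration run most actions are already greedy.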
The core of the Arithmetic Optimization Algorithm lies in its mathematical-operation-based position updates. Algorithm 10 implements the exploration and exploitation phases for both the production and distribution planning levels through coordinated arithmetic operations.
Algorithm 10 Bi-Level Arithmetic Optimization Position Update
  • Description: This algorithm updates positions for both upper-level (distribution) and lower-level (production) populations using arithmetic operations, coordinating exploration and exploitation across both planning levels.
   1: Calculate $MOA = Min + t \times \frac{Max - Min}{T_{max}}$
   2: Calculate $MOP = 1 - \frac{t^{1/\alpha}}{T_{max}^{1/\alpha}}$
   3: Update Upper-Level Population (Distribution):
   4: for $i = 1$ to $N_p^{UL}$ do
   5:   for $j = 1$ to $D^{UL}$ do
   6:     Generate $r_1, r_2, r_3 \sim U(0, 1)$
   7:     if $r_1 > MOA$ then
   8:       {Exploration Phase}
   9:       if $r_2 < 0.5$ then
  10:         $x_{i,j}^{UL,t+1} = best_j^{UL} + MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
  11:       else
  12:         $x_{i,j}^{UL,t+1} = best_j^{UL} - MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
  13:       end if
  14:     else
  15:       {Exploitation Phase}
  16:       if $r_3 < 0.5$ then
  17:         $x_{i,j}^{UL,t+1} = best_j^{UL} \times MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
  18:       else
  19:         $x_{i,j}^{UL,t+1} = best_j^{UL} \div MOP \times ((UB_j^{UL} - LB_j^{UL}) \times \mu + LB_j^{UL})$
  20:       end if
  21:     end if
  22:   end for
  23: end for
  24: Update Lower-Level Population (Production):
  25: for $i = 1$ to $N_p^{LL}$ do
  26:   for $j = 1$ to $D^{LL}$ do
  27:     Generate $r_1, r_2, r_3 \sim U(0, 1)$
  28:     Apply similar arithmetic operations as upper level with coordination factor
  29:     $x_{i,j}^{LL,t+1} = \text{arithmetic\_operation}(best_j^{LL}, \text{coordination\_factor} \times x_{i,j}^{UL,t+1})$
  30:   end for
  31: end for
  32: Apply boundary constraints and decode solutions using Algorithm 6
  33: Evaluate fitness using Algorithm 7
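For concreteness, the upper-level part of Algorithm 10 (steps 1-23) is sketched below in Python. The operator assignment follows the listing as printed (addition/subtraction in the exploration branch, multiplication/division in the exploitation branch); the original AOA of Abualigah et al. assigns multiplication/division to exploration and addition/subtraction to exploitation. Two details are assumptions not stated in the listing: a small constant `EPS` guards the division, as in the original AOA, because $MOP$ reaches 0 at $t = T_{max}$; and simple clipping to $[LB_j^{UL}, UB_j^{UL}]$ stands in for the boundary handling of Algorithm 6. The lower-level update (steps 25-31) depends on an unspecified `arithmetic_operation` and `coordination_factor` and is not reproduced.

```python
import random

EPS = 1e-12  # guard: MOP -> 0 at t = T_max, so the division branch needs protection


def moa(t, t_max, mn=0.2, mx=1.0):
    """Math Optimizer Accelerated (step 1): grows linearly from Min to Max."""
    return mn + t * (mx - mn) / t_max


def mop(t, t_max, alpha=5.0):
    """Math Optimizer Probability (step 2): decays from 1 to 0."""
    return 1 - t ** (1 / alpha) / t_max ** (1 / alpha)


def update_upper_level(pop, best, lb, ub, t, t_max, mu=0.5, rng=random):
    """Steps 4-23 of Algorithm 10 as listed, followed by clipping to [lb, ub]."""
    a, p = moa(t, t_max), mop(t, t_max)
    new_pop = []
    for _ in pop:  # the listed rules depend on best_j and the bounds, not on x_i itself
        x_new = []
        for j in range(len(best)):
            r1, r2, r3 = rng.random(), rng.random(), rng.random()
            scale = (ub[j] - lb[j]) * mu + lb[j]
            if r1 > a:  # exploration branch as listed: addition / subtraction
                v = best[j] + p * scale if r2 < 0.5 else best[j] - p * scale
            else:       # exploitation branch as listed: multiplication / division
                v = best[j] * p * scale if r3 < 0.5 else best[j] / (p + EPS) * scale
            x_new.append(min(max(v, lb[j]), ub[j]))  # clipping stands in for Algorithm 6
        new_pop.append(x_new)
    return new_pop
```

With $Min = 0.2$ and $Max = 1.0$, the probability of entering the exploration branch, $1 - MOA$, falls from 0.8 at $t = 0$ to 0 at $t = T_{max}$.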
The main execution framework, implemented in Algorithm 11, orchestrates all bi-level algorithm components in a coordinated manner. This algorithm provides the high-level control structure that manages the iterative optimization process, coordinates the interaction between AOA and RL components across both levels, and ensures proper termination.
Taken together, these components form a hybrid tactical planning framework that combines uncertain random simulation, reinforcement learning for adaptive coordination, and bi-level dependent-chance goal programming for multi-objective optimization. The approach is designed for hybrid uncertain environments in which both data-driven and expert-based information guide tactical production–distribution decisions in paper manufacturing. Its modular structure is intended to keep the framework scalable, interpretable, and adaptable to various tactical planning scenarios while maintaining coordination between production scheduling and distribution allocation decisions.
Algorithm 11 Main RL-Enhanced AOA Bi-Level Execution Framework
  • Description: This is the main execution framework that orchestrates all bi-level algorithm components, coordinating the Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm process for tactical production-distribution planning from initialization to termination.
   1: Execute Algorithm 8 for bi-level system initialization
   2: for $t = 1$ to $T_{max}$ do
   3:   Execute Algorithm 9 for RL state observation and action selection
   4:   Execute Algorithm 10 for bi-level arithmetic optimization position updates
   5:   Execute Algorithm 12 for bi-level solution updates and RL learning
   6:   Execute Algorithm 13 for stagnation management and termination check
   7:   if Algorithm 13 returns True then
   8:     break {Bi-level optimization completed}
   9:   end if
  10: end for
  11: return Optimal bi-level solution $x_{gbest} = [x_{gbest}^{UL}, x_{gbest}^{LL}]$ with performance metrics
Algorithm 12 Bi-Level Solution Update and RL Learning
  • Description: This algorithm updates global best solutions for both production and distribution levels, calculates rewards based on bi-level fitness improvement, and performs Q-learning updates to enhance future parameter selection decisions.
   1: $f_{prev}^{UL} = f_{best}^{UL}$, $f_{prev}^{LL} = f_{best}^{LL}$
   2: Update Upper-Level Solutions:
   3: for each upper-level solution $i$ do
   4:   if $f^{UL}(x_i^{UL}) < f^{UL}(x_{gbest}^{UL})$ then
   5:     $x_{gbest}^{UL} = x_i^{UL}$, $f_{best}^{UL} = f^{UL}(x_i^{UL})$
   6:   end if
   7: end for
   8: Update Lower-Level Solutions:
   9: for each lower-level solution $i$ do
  10:   if $f^{LL}(x_i^{LL}) < f^{LL}(x_{gbest}^{LL})$ then
  11:     $x_{gbest}^{LL} = x_i^{LL}$, $f_{best}^{LL} = f^{LL}(x_i^{LL})$
  12:   end if
  13: end for
  14: Calculate bi-level reward: $R_t = \frac{(f_{prev}^{UL} + f_{prev}^{LL}) - (f_{best}^{UL} + f_{best}^{LL})}{f_{prev}^{UL} + f_{prev}^{LL} + 10^{-8}} + \lambda \cdot \text{coordination\_bonus}$
  15: Observe next state $s_{t+1}$
  16: Update Q-table using Algorithm 4
After each position update, Algorithm 12 tracks solution improvements and provides feedback for reinforcement learning, ensuring continuous adaptation of the bi-level optimization process.
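The reward in step 14 of Algorithm 12 and the Q-table update it feeds can be sketched as follows. Algorithm 4 is not reproduced in this section, so the update below is the standard tabular Q-learning rule, $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[R_t + \gamma \max_a Q(s_{t+1},a) - Q(s_t,a_t)]$, with $\alpha = 0.1$ and $\gamma = 0.9$ from Algorithm 8; that Algorithm 4 matches it exactly is an assumption. The coordination bonus and its weight $\lambda$ are passed in as plain numbers because their definitions are not given here, and integer state indices assume a discretized state space.

```python
def bilevel_reward(f_prev_ul, f_prev_ll, f_best_ul, f_best_ll, lam=0.0, coordination_bonus=0.0):
    """Step 14: relative drop of f^UL + f^LL plus a weighted coordination bonus."""
    prev = f_prev_ul + f_prev_ll
    best = f_best_ul + f_best_ll
    return (prev - best) / (prev + 1e-8) + lam * coordination_bonus


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step (assumed to correspond to Algorithm 4)."""
    td_target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]
```

Because both levels minimize, the improvement term is positive only when the combined objective falls; for example, a drop from 100 to 95 yields a term of about 0.05.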
To prevent premature convergence and ensure solution quality, Algorithm 13 implements intelligent stagnation detection, strategic population reinitialization, and robust termination criteria for the bi-level optimization process.
Algorithm 13 Bi-Level Stagnation Management and Termination
  • Description: This algorithm monitors optimization progress for both levels, handles stagnation through coordinated reinitialization, progressively tightens constraints, and checks termination criteria for bi-level convergence.
   1: if $f_{best}^{UL} == f_{prev}^{UL}$ AND $f_{best}^{LL} == f_{prev}^{LL}$ then
   2:   $stag = stag + 1$
   3: else
   4:   $stag = 0$
   5: end if
   6: if $stag > 0.15 \times T_{max}$ then
   7:   Reinitialize worst 30% of upper-level population randomly
   8:   Reinitialize worst 30% of lower-level population with coordination to upper level
   9:   Reset: $stag = 0$
  10: end if
  11: Progressive Constraint Tightening:
  12: for each chance constraint $j$ at both levels do
  13:   $\beta_j(t) = \beta_j^{final} \cdot (1 - e^{-\kappa \cdot t / T_{max}})$
  14: end for
  15: Check Bi-Level Convergence:
  16: Calculate coordination measure: $coord = \frac{|\text{obj\_improvement}^{UL} - \text{obj\_improvement}^{LL}|}{\text{obj\_improvement}^{UL} + \text{obj\_improvement}^{LL}}$
  17: if convergence criteria met OR $t == T_{max}$ OR $coord < threshold$ then
  18:   Evaluate $x_{gbest}^{UL}$, $x_{gbest}^{LL}$ with high-precision chance simulation ($N_{MC} = 5000$)
  19:   Verify constraint satisfaction and goal achievement for both levels
  20:   return True {Termination signal}
  21: end if
  22: return False {Continue optimization}
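The bookkeeping in Algorithm 13 is sketched below in Python. The helper names are hypothetical; $\kappa$ is not specified in this section, so any value used with `beta_schedule` is illustrative, and "worst" is read as the largest objective values because Algorithm 12 treats smaller values as better.

```python
import math


def update_stagnation(stag, f_best_ul, f_prev_ul, f_best_ll, f_prev_ll):
    """Steps 1-5: count consecutive iterations in which neither level's best value changed."""
    return stag + 1 if (f_best_ul == f_prev_ul and f_best_ll == f_prev_ll) else 0


def needs_restart(stag, t_max, frac=0.15):
    """Step 6: trigger partial reinitialization once stag exceeds 15% of T_max."""
    return stag > frac * t_max


def beta_schedule(t, t_max, beta_final, kappa):
    """Step 13: confidence level rising from 0 toward beta_final as t approaches T_max."""
    return beta_final * (1 - math.exp(-kappa * t / t_max))


def worst_indices(fitness, share=0.3):
    """Indices of the worst 30% of a population (largest objective values) for reinitialization."""
    k = int(share * len(fitness))
    return sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)[:k]
```

With $T_{max} = 400$, a restart is triggered after more than 60 stagnant iterations; with a hypothetical $\kappa = 3$ and $\beta_j^{final} = 0.85$, the tightened level at $t = 200$ is about 0.66.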

5. Numerical Results and Analysis

This section presents comprehensive numerical experiments to evaluate the performance of our proposed Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm (RL-AOA) for solving the bi-level dependent-chance goal programming model for tactical production–distribution planning in paper manufacturing under hybrid uncertainty.

5.1. Experimental Setup

5.1.1. Algorithm Parameter Settings

Algorithm parameters were determined through systematic preliminary experiments using the Taguchi method for parameter optimization. The final RL-AOA parameters are presented in Table 8.

5.1.2. Parameter Calibration Methodology

The algorithm parameters in Table 8 were calibrated using a systematic two-phase approach:
Phase 1: Taguchi Orthogonal Array Design
We employed an $L_{16}(4^5)$ orthogonal array (16 runs, five factors at four levels each) to evaluate key parameters:
  • Population sizes: $N_p^{UL} \in \{60, 80, 100, 120\}$, $N_p^{LL} \in \{40, 60, 80, 100\}$;
  • Learning parameters: $\alpha_0 \in \{0.1, 0.15, 0.2, 0.25\}$;
  • Exploration bounds: $\epsilon_{max} \in \{0.6, 0.8, 0.9, 1.0\}$;
  • Penalty factors: $\lambda_j^{UL,0} \in \{100, 150, 200, 250\}$.
Phase 2: Fine-tuning Validation
The selected parameters were validated on three representative instances (Small-1, Medium-1, Large-1) with 15 independent runs each. The signal-to-noise (S/N) ratio analysis identified the best settings, e.g., $N_p^{UL} = 80$ (S/N = 24.3 dB) and $\alpha_0 = 0.15$ (S/N = 22.8 dB), which achieved a 12.4% improvement over the default values.
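The S/N values above come from Taguchi's signal-to-noise analysis. Which S/N variant and which response were used is not stated in this section, so the sketch below gives the two standard forms, smaller-the-better $S/N = -10\log_{10}\big(\frac{1}{n}\sum_i y_i^2\big)$ and larger-the-better $S/N = -10\log_{10}\big(\frac{1}{n}\sum_i 1/y_i^2\big)$, plus a main-effect step that picks, for each factor, the level with the highest mean S/N. Generating the $L_{16}(4^5)$ array itself is not shown, and the function names are hypothetical.

```python
import math
from statistics import mean


def sn_smaller_better(ys):
    """Taguchi S/N ratio (dB) when smaller responses are better, e.g., cost."""
    return -10 * math.log10(mean(y * y for y in ys))


def sn_larger_better(ys):
    """Taguchi S/N ratio (dB) when larger responses are better, e.g., goal achievement."""
    return -10 * math.log10(mean(1 / (y * y) for y in ys))


def best_levels(design, sn):
    """Main-effect analysis: for each factor, the level with the highest mean S/N.

    design[r][f] is the level index of factor f in run r; sn[r] is that run's S/N ratio.
    """
    choice = []
    for f in range(len(design[0])):
        by_level = {}
        for row, s in zip(design, sn):
            by_level.setdefault(row[f], []).append(s)
        choice.append(max(by_level, key=lambda lv: mean(by_level[lv])))
    return choice
```

For example, on the toy 2×2 design `[[0, 0], [0, 1], [1, 0], [1, 1]]` with S/N values `[1.0, 2.0, 5.0, 6.0]`, `best_levels` returns `[1, 1]`; with an $L_{16}(4^5)$ array, each row would hold five level indices in 0..3.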

5.1.3. Test Instance Characteristics

To evaluate the algorithm comprehensively, we generated a diverse set of paper manufacturing test instances with varying network sizes, complexity levels, and uncertainty characteristics. Table 9 summarizes the key characteristics of these instances.
These instances represent paper manufacturing networks of varying complexity, ranging from small regional operations to very large international paper manufacturing facilities. Each instance incorporates specific characteristics reflecting different operational contexts:
  • Small instances: Regional paper mills with limited product portfolio.
  • Medium instances: Multi-grade paper manufacturing with moderate complexity.
  • Large instances: Multi-facility paper manufacturing networks with diverse product lines.
  • Very Large instance: Global paper manufacturing network with maximum complexity.

5.1.4. Uncertain Random Parameter Specifications

The hybrid uncertainty in our paper manufacturing model is represented through uncertain random parameters following different distributions. Table 10 presents the uncertain random parameter settings used in our computational experiments.
The hybrid uncertain random parameters were generated following a rigorous two-stage calibration process:
  • Random Component Calibration: Historical data from five major paper manufacturing companies in North America and Europe were analyzed to determine appropriate probability distributions and their parameters.
  • Uncertain Component Calibration: Expert elicitation sessions with 12 paper industry professionals were conducted to establish uncertainty bounds, representing epistemic uncertainty in tactical planning parameter estimation.
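How such a hybrid parameter enters a chance computation can be illustrated with a small Monte Carlo sketch. It relies on the chance measure of uncertainty theory, $\mathrm{Ch}\{\Theta\} = \int_0^1 \Pr\{\omega : \mathcal{M}\{\gamma : (\omega,\gamma) \in \Theta\} \ge r\}\,dr$, which, because the inner uncertain measure lies in $[0,1]$, equals the expectation of that measure over the random component. The specific forms are hypothetical and not taken from Table 10: demand is modeled as a normal random part $\eta$ plus a linear uncertain adjustment $\tau \sim \mathcal{L}(a,b)$, and the event is that capacity $C$ covers demand, so $\mathcal{M}\{\tau \le C - \eta\} = \Phi(C - \eta)$ with $\Phi$ the linear uncertainty distribution. The default of 8000 samples mirrors the Monte Carlo budget stated in Section 5.3.3.

```python
import random


def linear_uncertainty_cdf(x, a, b):
    """Uncertainty distribution Phi(x) of a linear uncertain variable L(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def chance_demand_covered(capacity, mu, sigma, a, b, n_samples=8000, seed=0):
    """Monte Carlo estimate of Ch{eta + tau <= capacity}.

    eta ~ Normal(mu, sigma) is the random (aleatory) part and tau ~ L(a, b) the
    uncertain (epistemic) part. For each sampled eta, M{tau <= capacity - eta}
    is read from the uncertainty distribution; the chance measure is the mean of
    these values over the random samples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eta = rng.gauss(mu, sigma)
        total += linear_uncertainty_cdf(capacity - eta, a, b)
    return total / n_samples
```

When the random part is nearly deterministic and capacity equals its mean, the estimate reduces to $\Phi(0)$, i.e., 0.5 for a symmetric $\mathcal{L}(-50, 50)$ adjustment.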

5.1.5. Computational Environment

All experiments were conducted on a workstation equipped with an Intel Core i9-12900K processor and 32 GB of RAM, and the algorithms were implemented in MATLAB R2023b. Each test instance was solved 30 times with different random seeds to support statistical analysis of the results.

5.2. Bi-Level Dependent-Chance Goal Programming Results

The core contribution of our approach lies in optimizing probability achievement for tactical planning goals through lexicographic bi-level dependent-chance programming. We define target probability levels based on paper industry benchmarks and operational requirements: Economic Efficiency (α1 = 0.90), Operational Performance (α2 = 0.85), Resource Utilization (α3 = 0.80), and Quality Assurance (α4 = 0.90).
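A minimal sketch of how lexicographic dependent-chance goals can be compared: each plan is scored by its shortfalls $d_g^- = \max(\alpha_g - \mathrm{Ch}_g, 0)$ below the four target probabilities, and the shortfall vectors are compared in priority order. It assumes the priority order is the order in which the goals are listed above; in practice a small tolerance would replace exact floating-point comparison. The function names are hypothetical.

```python
# Target probability levels in the order listed above (assumed to be the priority order):
# economic efficiency, operational performance, resource utilization, quality assurance.
TARGETS = (0.90, 0.85, 0.80, 0.90)


def negative_deviations(achieved, targets=TARGETS):
    """Shortfall d_g^- = max(alpha_g - Ch_g, 0) for each goal, in priority order."""
    return tuple(max(t - a, 0.0) for a, t in zip(achieved, targets))


def lex_better(achieved_a, achieved_b, targets=TARGETS):
    """True if plan A beats plan B lexicographically (tuples compare left to right)."""
    return negative_deviations(achieved_a, targets) < negative_deviations(achieved_b, targets)
```

For instance, a plan achieving (0.90, 0.80, 0.70, 0.80) is preferred to one achieving (0.88, 0.85, 0.80, 0.90), because only the first meets the highest-priority target; lower-priority shortfalls matter only as tie-breakers.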

5.2.1. Probability Achievement Analysis

Table 11 presents the achieved probability levels for each tactical planning dimension across all test instances, with 95% confidence intervals computed using bootstrap resampling (1000 replications). Figure 1 illustrates the tactical planning goal achievement with confidence intervals, demonstrating the systematic performance trends across different problem scales.
The results demonstrate consistent performance across all tactical planning objectives, with economic efficiency and quality assurance achieving the highest rates. As shown in Figure 1, there is a clear downward trend in probability achievement as problem size increases, with all objectives showing statistically significant gaps from their targets (paired t-test, p < 0.01 for all objectives). This confirms the increasing challenge of simultaneously achieving all tactical planning goals under hybrid uncertainty as network complexity grows.
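The bootstrap intervals reported for Table 11 can be obtained by resampling the per-run results (30 runs per instance). The sketch below is a generic percentile bootstrap for the mean with 1000 resamples; whether the authors used the percentile variant or another bootstrap interval is not stated, and the function name is hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(samples, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Mean of each resample drawn with replacement, sorted to read off percentiles.
    means = sorted(mean(rng.choices(samples, k=n)) for _ in range(n_boot))
    lo_idx = round((1 - level) / 2 * n_boot)      # 2.5th percentile for level = 0.95
    hi_idx = round((1 + level) / 2 * n_boot) - 1  # 97.5th percentile for level = 0.95
    return means[lo_idx], means[hi_idx]
```

Calling `bootstrap_ci(achieved_probabilities)` on the 30 per-run values of one goal and instance gives the lower and upper interval bounds for its mean achievement.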

5.2.2. Bi-Level Coordination Analysis

Table 12 shows the coordination effectiveness between upper-level distribution decisions and lower-level production decisions across iterations. Figure 2 illustrates the convergence behavior of the bi-level optimization process, demonstrating how both upper- and lower-level costs converge simultaneously while maintaining coordination balance.
As illustrated in Figure 2, the coordination gap decreases systematically as the bi-level algorithm progresses, with both distribution and production costs converging to stable values. The coordination gap increases with problem complexity, indicating that the bi-level structure becomes more necessary for complex tactical planning problems. The convergence times are reasonable across all instances, with the largest problem requiring 61 iterations on average.

5.3. Paper Manufacturing Instance Details

To demonstrate the practicality of our approach, we present detailed results for a representative medium-sized paper manufacturing instance (Medium-1), as shown in Table 9. This instance includes 8 paper machines, 5 warehouses, 8 distribution centers, 15 customers, 5 paper grades, and a 4-period tactical planning horizon.

5.3.1. Paper Machine and Grade Information

Table 13 presents the production and setup costs for different paper grades on each machine, while Table 14 shows the machine capacities for the Medium-1 instance.

5.3.2. Customer Demand Information

Table 15 presents stochastic demand parameters and service requirements for different customer segments in the Medium-1 instance.

5.3.3. Baseline Algorithm Specifications

The comparative algorithms were adapted for bi-level optimization as follows:
  • Bi-GA: Two-population genetic algorithm with tournament selection (size 3), single-point crossover (pc = 0.8), and uniform mutation (pm = 0.05). Upper-level population = 80, lower-level population = 60.
  • Bi-PSO: Hierarchical particle swarm optimization with inertia weight $w$ decreasing linearly from 0.9 to 0.4, $c_1 = c_2 = 2.0$, and velocity clamping at $\pm 0.2 \times$ the search range.
  • Bi-DE: Differential evolution with DE/rand/1/bin strategy, F = 0.5, CR = 0.8, coordinated through best solution sharing between levels every 10 iterations.
  • Bi-SSO: Social spider optimization with female ratio = 0.65, vibration attenuation ra = 1.0, bi-level coordination through pheromone information exchange.
All algorithms used identical constraint handling (dynamic penalty), chance constraint evaluation (8000 Monte Carlo samples), and termination criteria (400 iterations) for fair comparison.
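For reference, the DE/rand/1/bin operator named for Bi-DE builds each trial vector from three distinct random population members, $v = x_{r_1} + F\,(x_{r_2} - x_{r_3})$, and takes component $j$ from $v$ with probability $CR$ (and always for one randomly chosen index $j_{rand}$), otherwise from the target vector. The Python sketch below uses $F = 0.5$ and $CR = 0.8$ as listed; clipping to bounds is an assumption, and the every-10-iterations best-solution sharing between levels is not shown.

```python
import random


def de_rand_1_bin(pop, i, lb, ub, F=0.5, CR=0.8, rng=random):
    """Build one DE/rand/1/bin trial vector for target pop[i] (needs len(pop) >= 4)."""
    n, dim = len(pop), len(pop[i])
    r1, r2, r3 = rng.sample([k for k in range(n) if k != i], 3)
    j_rand = rng.randrange(dim)  # guarantees at least one mutated component
    trial = []
    for j in range(dim):
        if rng.random() < CR or j == j_rand:
            v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])  # rand/1 mutation
        else:
            v = pop[i][j]  # inherited from the target (binomial crossover)
        trial.append(min(max(v, lb[j]), ub[j]))  # clip to bounds (assumed handling)
    return trial
```

In a full DE loop, the trial vector replaces the target only if its (penalized) objective is no worse, which is the usual greedy selection step.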

5.4. Algorithm Performance Comparison

We compare our RL-AOA with several state-of-the-art metaheuristics adapted for bi-level optimization. Table 16 presents comprehensive performance metrics.
The results demonstrate that RL-AOA consistently outperforms other bi-level optimization approaches, achieving 3.2–7.8% improvement in solution quality and 18.5% reduction in computational time compared to the best competing algorithm. The convergence behavior is illustrated in Figure 2, which shows that RL-AOA achieves faster convergence and better final solutions compared to traditional metaheuristics.

5.5. Empirical Validation of Balance Strategy

To validate our exploration–exploitation trade-off strategy, we conducted comprehensive analysis of RL-AOA balance evolution. Figure 3 illustrates the dynamic adaptation capability.
Key empirical findings:
  • Optimal Balance Progression: Systematic transition from 25% exploration to 2% exploitation over 400 iterations.
  • Stable Balance Quality: Incremental–decremental analysis shows consistent balance near zero.
  • Adaptive Capability: Algorithm successfully adapts exploration–exploitation ratio based on optimization progress.
  • Convergence Stability: Balance evolution demonstrates stable convergence without premature stagnation.
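Balance percentages of this kind are commonly computed with the dimension-wise diversity measure used by Morales-Castañeda et al.: $Div = \frac{1}{D}\sum_{j} \frac{1}{N}\sum_{i} |\mathrm{median}(x^j) - x_i^j|$, with $XPL\% = 100 \cdot Div/Div_{max}$ and $XPT\% = 100 \cdot |Div - Div_{max}|/Div_{max}$, where $Div_{max}$ is the largest diversity observed during the run. The Python sketch below assumes Figure 3 is based on these definitions, which this section does not state explicitly; the function names are hypothetical.

```python
from statistics import median


def dimension_wise_diversity(pop):
    """Mean absolute deviation from the per-dimension median, averaged over dimensions."""
    dim = len(pop[0])
    total = 0.0
    for j in range(dim):
        col = [x[j] for x in pop]
        med = median(col)
        total += sum(abs(med - v) for v in col) / len(col)
    return total / dim


def balance_percentages(div_history):
    """Exploration (XPL%) and exploitation (XPT%) per iteration, relative to the run's max diversity."""
    div_max = max(div_history)
    xpl = [100.0 * d / div_max for d in div_history]
    xpt = [100.0 * abs(d - div_max) / div_max for d in div_history]
    return xpl, xpt
```

Because $Div_{max}$ is the maximum over the run, $XPL\% + XPT\% = 100$ at every iteration, so a single curve suffices to show the shift from exploration to exploitation.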

5.6. Reinforcement Learning Component Analysis

We analyze the effectiveness of the RL component in adaptive parameter tuning for bi-level coordination. Table 17 shows action selection patterns across different optimization phases.
The RL component demonstrates intelligent adaptive behavior, emphasizing exploration and coordination in early phases while shifting to exploitation and refined coordination in later iterations, contributing to a 5.2% average performance improvement over static parameter settings.

5.7. Sensitivity Analysis

We conducted comprehensive sensitivity analysis to evaluate the robustness of our bi-level dependent-chance goal programming approach under varying parameter settings.

5.7.1. Confidence Level Impact

Table 18 presents the detailed results of confidence level sensitivity analysis for the bi-level model.
The analysis shows how increasing confidence levels affect both cost and goal achievement, demonstrating the trade-off between reliability and cost efficiency in tactical paper manufacturing planning.

5.7.2. Production–Distribution Coordination Analysis

Table 19 analyzes the coordination effectiveness between the production and distribution levels under different uncertainty scenarios.

5.8. Paper Manufacturing Specific Analysis

5.8.1. Paper Grade Production Allocation

Table 20 shows the optimal production allocation across different paper grades and planning periods for the Medium-1 instance. Figure 4 provides a visual representation of the production distribution across planning periods, highlighting the balanced allocation strategy employed by our optimization framework.
As illustrated in Figure 4, the production allocation strategy demonstrates effective balance across different paper grades and planning periods. Grade G4 (Specialty) shows the highest total production volume (1600 tons), followed by Grade G1 (Packaging) with 1252 tons, reflecting market demand patterns and profitability considerations. The temporal distribution shows relatively stable production across periods, with some variation to accommodate demand fluctuations and machine availability constraints.

5.8.2. Machine Changeover Analysis

Table 21 presents the changeover patterns and their impact on production efficiency.
The analysis shows the trade-off between production flexibility and efficiency, with machines experiencing 4–8 changeovers per planning horizon and maintaining an average utilization of 87.0% despite changeover penalties.

5.9. Cost–Benefit Analysis for Paper Manufacturing

Table 22 presents the estimated costs and benefits of implementing the bi-level DCGP framework in paper manufacturing environments. Figure 5 illustrates the return on investment analysis across different mill sizes, demonstrating favorable economic prospects for implementation.
As demonstrated in Figure 5, the expected return on investment is favorable across all mill sizes, with the 3-year ROI ranging from 35 to 55%. The payback periods are particularly attractive, ranging from 8 to 15 months, with larger mills achieving faster payback due to economies of scale. The analysis indicates that medium and large mills benefit most from implementation, with annual savings significantly exceeding implementation costs.

5.10. Target Probability Level (αg) Impact on Lexicographic Feasibility

The lexicographic optimization structure requires careful consideration of how target probability levels affect feasibility across priority levels. Table 23 analyzes the impact of varying αg on goal achievement and feasibility.
Critical insights:
  • Feasibility Threshold: Target levels above αg = 0.85 create significant feasibility issues;
  • Cascade Effects: Higher-priority goals progressively constrain lower-priority goals (0.045 to 0.167);
  • Current Settings Validation: Our αg values operate near the feasibility boundary, explaining the observed gaps.

5.11. Expert Opinion Impact and Sensitivity Analysis

Expert opinions define epistemic uncertainty bounds (alpha-cuts) for uncertain random variables, directly influencing chance measure calculations and goal achievement probabilities. Figure 6 presents comprehensive sensitivity analysis examining how expert-defined uncertainty ranges affect model performance.
The sensitivity analysis reveals critical insights:
  • Quantified Sensitivity Range: Goal achievement probabilities vary within ±3.0% bounds across expert opinion scenarios. Economic efficiency shows ±0.9%, operational performance ±0.8%, resource utilization ±2.5%, and quality assurance ±1.3% variations.
  • Differential Impact Analysis: Resource utilization goals demonstrate 2.8× higher sensitivity than economic efficiency, reflecting operational complexity under epistemic uncertainty. This identifies critical areas requiring careful expert calibration.
  • Robustness Validation: The bounded sensitivity range (maximum 3.0%) confirms framework stability. Even when expert uncertainty estimates vary from ±5% to ±30% (up to double the baseline ±15% range), goal achievement varies by less than 3.5%, validating practical applicability.
  • Optimal Calibration Confirmation: The current ±15% uncertainty ranges represent an optimal balance, positioned at the inflection point where sensitivity transitions from steep (conservative side) to gradual (optimistic side), maximizing information value while maintaining stability.
These findings demonstrate that while expert opinions measurably influence outcomes, the framework maintains robust performance (coefficient of variation < 0.035) across reasonable uncertainty variations. The moderate sensitivity validates our approach for industrial application without requiring excessive precision in expert assessments, supporting practical implementation in paper manufacturing environments where expert consensus may vary.

5.12. Comparative Analysis: Deterministic vs. Hybrid Uncertainty Framework

To demonstrate the added value of hybrid uncertainty modeling, we compared four approaches: deterministic baseline, stochastic-only, fuzzy-only, and our hybrid framework. Figure 7 presents a comprehensive comparative analysis of these approaches.
Key findings:
  • Cost Realism: The hybrid framework shows a 17.8% cost increase over the deterministic baseline;
  • Goal Achievement Stability: Deterministic approaches exhibit overoptimistic and variable goal achievement, while the hybrid framework maintains stable probabilities;
  • Single-Paradigm Limitations: The stochastic-only and fuzzy-only approaches fail to capture the complete uncertainty spectrum.
This analysis indicates when the added complexity of hybrid uncertainty modeling is justified for tactical planning decisions.

5.13. Managerial Insights and Implementation Guidelines

Based on our comprehensive analysis, we provide actionable insights for paper manufacturing tactical planning:
  • Tactical Goal Probability Setting:
    • Economic efficiency: Target 87–90% (achievable with moderate resource allocation);
    • Operational performance: Target 82–85% (balances service level with operational flexibility);
    • Resource utilization: Target 76–80% (realistic given machine compatibility constraints);
    • Quality assurance: Target 86–90% (essential for paper grade specifications).
  • Implementation Roadmap for Paper Manufacturing:
    • Phase 1: Implement production scheduling optimization (2–4 months)
    • Phase 2: Integrate distribution coordination (4–6 months)
    • Phase 3: Add quality and changeover optimization (6–8 months)
    • Expected operational improvement: 15–25% within 12 months
  • Machine-Grade Compatibility Management:
    • Maintain compatibility matrices for all machine-grade combinations;
    • Optimize changeover sequences to minimize setup times and costs;
    • Consider machine specialization for high-volume paper grades;
    • Update efficiency parameters monthly based on production data.
  • Uncertainty Management in Paper Manufacturing:
    • Monitor demand patterns by paper grade and customer type monthly;
    • Conduct quarterly expert assessments for market trend uncertainties;
    • Maintain confidence levels between 0.82 and 0.87 for optimal performance;
    • Implement adaptive inventory policies based on demand uncertainty;
    • Dynamic Shelf Life Modeling: The current model assumes fixed shelf-life periods for computational tractability. Future extensions could incorporate storage-condition-dependent deterioration models, particularly valuable for moisture-sensitive paper grades where humidity and temperature significantly affect product quality over time.

5.14. Limitations and Future Research Directions

While our bi-level DCGP framework demonstrates significant improvements for paper manufacturing tactical planning, several limitations should be acknowledged:
  • Real-World Validation: Future work should include partnerships with paper mills for comprehensive industrial validation across different paper grades and manufacturing processes.
  • Dynamic Grade Specifications: The current model assumes static quality requirements; extensions should incorporate evolving customer specifications and regulatory changes.
  • Multi-Mill Coordination: Framework extension to coordinate multiple paper mills within a corporate network represents a promising research direction.
  • Environmental Regulations: Integration of time-varying environmental constraints and carbon pricing mechanisms would enhance practical applicability.
  • Supply Chain Resilience: Incorporation of supply disruption scenarios and recovery strategies would improve robustness.
The proposed framework provides a solid foundation for tactical production–distribution planning in paper manufacturing, offering both theoretical contributions and practical implementation guidelines for industry adoption. The results demonstrate that the RL-AOA with bi-level dependent-chance goal programming can effectively handle the complexity of hybrid uncertainty while maintaining computational efficiency for real-world applications.

6. Conclusions

This research developed a bi-level dependent-chance goal programming framework for tactical production–distribution planning in paper manufacturing under hybrid uncertainty. The framework successfully coordinates upper-level distribution decisions with lower-level production decisions while handling both aleatory randomness and epistemic uncertainty through uncertain random theory. The Reinforcement-Learning-enhanced Arithmetic Optimization Algorithm achieved 3.2–7.8% improvements in solution quality and an 18.5% reduction in computational time compared with the best competing bi-level metaheuristic.
The dependent-chance goal programming formulation effectively balanced four tactical objectives, achieving average probability levels of 87.0% for economic efficiency, 81.7% for operational performance, 76.5% for resource utilization, and 86.1% for quality assurance. The bi-level coordination mechanism maintained average coordination gaps of 4.5% across all test instances, demonstrating effective integration between production and distribution planning levels. The reinforcement learning component provided intelligent adaptive behavior, contributing to a 5.2% performance improvement over static parameter configurations.
Computational experiments across seven test instances validated the framework’s scalability and practical applicability for paper manufacturing operations. Industry-specific analysis confirmed effective handling of machine-grade compatibility constraints, sequence-dependent changeovers, and quality requirements while maintaining 87.0% average machine utilization. Cost–benefit analysis indicates favorable economic prospects with payback periods of 8–15 months and 35–55% three-year return on investment.
The framework provides both theoretical contributions to bi-level optimization under hybrid uncertainty and practical tools for tactical planning in paper manufacturing. Future research should focus on real-world validation, dynamic uncertainty modeling, multi-mill coordination, and integration of environmental constraints. The approach demonstrates that sophisticated optimization techniques can successfully address complex industrial planning problems while maintaining computational efficiency and providing actionable insights for hierarchical decision-making under uncertainty.

Author Contributions

Conceptualization, Y.B. and R.B.; Methodology, Y.B., R.B. and A.B.; Software, Y.B.; Validation, Y.B. and N.R.; Formal Analysis, Y.B. and N.R.; Investigation, Y.B. and R.B.; Resources, R.B. and O.B.; Data Curation, Y.B. and N.R.; Writing—Original Draft Preparation, Y.B.; Writing—Review and Editing, Y.B., R.B., A.B., N.R., O.B. and F.F.; Visualization, Y.B.; Supervision, R.B., A.B. and F.F.; Project Administration, R.B. and O.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Al-Khwarizmi Programme, a collaborative effort between the National Center for Scientific and Technical Research (CNRST), the Agency for Digital Development (ADD), and the Moroccan Ministry of Higher Education.

Data Availability Statement

The data are contained within the article.

Acknowledgments

The authors express their gratitude to the editors and reviewers for their valuable comments and constructive suggestions regarding the revision of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Andersson, J.; Marklund, J. Decentralized inventory control in a two-level distribution system. Eur. J. Oper. Res. 2000, 127, 483–506.
  2. Santos, M.O.; Almada-Lobo, B. Integrated pulp and paper mill planning and scheduling. Comput. Ind. Eng. 2012, 63, 1–12.
  3. Yang, X.S.; Deb, S.; Fong, S. Metaheuristic algorithms: Optimal balance of intensification and diversification. Appl. Math. Inf. Sci. 2014, 8, 977–983.
  4. Holik, H. Handbook of Paper and Board; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  5. Lalitha, J.L.; Mohan, N.; Pillai, V.M. Lot streaming in [N-1](1) + N (m) hybrid flow shop. J. Manuf. Syst. 2017, 44, 12–21.
  6. Maravelias, C.T.; Grossmann, I.E. New general continuous-time state-task network formulation for short-term scheduling of multipurpose batch plants. Ind. Eng. Chem. Res. 2005, 44, 9695–9707.
  7. Fleischmann, B.; Meyr, H.; Wagner, M. Advanced planning. In Supply Chain Management and Advanced Planning; Springer: Berlin/Heidelberg, Germany, 2005; pp. 81–106.
  8. Papageorgiou, L.G. Supply chain optimisation for the process industries: Advances and opportunities. Comput. Chem. Eng. 2001, 25, 1121–1137.
  9. Stadtler, H.; Kilger, C.; Meyr, H. Supply Chain Management and Advanced Planning: Concepts, Models, Software, and Case Studies, 5th ed.; Springer: Berlin/Heidelberg, Germany, 2015.
  10. Ben-Tal, A.; El Ghaoui, L.; Nemirovski, A. Robust Optimization; Princeton University Press: Princeton, NJ, USA, 2009.
  11. Birge, J.R.; Louveaux, F. Introduction to Stochastic Programming, 2nd ed.; Springer: New York, NY, USA, 2011.
  12. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
  13. Sahinidis, N.V. Optimization under uncertainty: State-of-the-art and opportunities. Comput. Chem. Eng. 2004, 28, 971–983.
  14. Colson, B.; Marcotte, P.; Savard, G. An overview of bilevel optimization. Ann. Oper. Res. 2007, 153, 235–256.
  15. Tamiz, M.; Jones, D.; Romero, C. Goal programming for decision making: An overview of the current state-of-the-art. Eur. J. Oper. Res. 1998, 111, 569–581.
  16. U.S. Environmental Protection Agency. Industrial Emissions Standards and Compliance Guidelines; EPA Office of Air and Radiation: Washington, DC, USA, 2023.
  17. Intergovernmental Panel on Climate Change. 2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories; IPCC: Geneva, Switzerland, 2019.
  18. Liu, B. Uncertainty Theory, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
  19. Rubinstein, R.Y.; Kroese, D.P. Simulation and the Monte Carlo Method, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
  20. Liu, B. Uncertainty Theory: A Branch of Mathematics for Modeling Human Uncertainty, 4th ed.; Springer: Berlin/Heidelberg, Germany, 2015.
  21. Morales-Castañeda, B.; Zaldívar, D.; Cuevas, E.; Fausto, F.; Rodríguez, A. A better balance in metaheuristic algorithms: Does it exist? Swarm Evol. Comput. 2020, 54, 100671.
  22. Abualigah, L.; Diabat, A.; Mirjalili, S.; Abd Elaziz, M.; Gandomi, A.H. The Arithmetic Optimization Algorithm. Comput. Methods Appl. Mech. Eng. 2021, 376, 113609.
  23. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
  24. Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
Figure 1. Tactical planning goal achievement with 95% confidence intervals. Goal achievement probability decreases for all objectives as problem complexity increases, with economic efficiency and quality assurance maintaining the highest levels.
Figure 2. Bi-level optimization convergence for production–distribution planning. Upper-level (distribution) and lower-level (production) costs converge jointly, and the coordination gap decreases to an acceptable threshold of 0.01 within 200 iterations.
Figure 3. RL-AOA balance dynamics. Left panel: systematic transition from an exploration-dominant phase (25%) to an exploitation-dominant phase (2%). Right panel: incremental–decremental analysis showing stable balance quality.
Figure 4. Optimal paper grade production distribution across planning periods. Production capacity is allocated across five paper grades (G1–G5) over four planning periods, with total production ranging from 1024 to 1600 tons per grade, indicating effective demand fulfillment and capacity utilization.
Figure 5. Cost–benefit analysis for paper manufacturing implementation: implementation costs, annual savings, and payback periods across different mill sizes. The expected 3-year ROI ranges from 35% to 55%, with payback periods of 8–15 months.
Figure 6. Expert opinion impact and sensitivity analysis. Left panel: goal achievement sensitivity, with variations from −2.5% (conservative, ±30%) to +0.8% (very optimistic, ±5%) relative to the current calibration (±15%); resource utilization is the most sensitive goal (−2.5% to +1.3%) owing to capacity planning complexity. Middle panel: expert-defined α-cut ranges, illustrating uncertainty bounds from conservative (0.7–1.3) to very optimistic (0.95–1.05) scenarios. Right panel: average sensitivity impact, with a maximum variation of ±3.0%, indicating stable performance across expert assessment approaches.
Figure 7. Comparative analysis. Left panel: the hybrid framework yields a 17.8% higher average cost, reflecting realistic uncertainty. Right panel: goal achievement reliability, where the deterministic model shows overoptimistic performance (0.88 ± 0.055) compared with the stable hybrid results (0.87 ± 0.011).
Table 1. Sets and indices used in the paper mill supply chain optimization model. This table defines the mathematical notation for all entities in the supply chain network including products, facilities, and their relationships.
| Notation | Description |
|---|---|
| $P = \{1, 2, \ldots, p\}$ | Set of paper products/grades |
| $M = \{1, 2, \ldots, m\}$ | Set of paper machines |
| $W = \{1, 2, \ldots, w\}$ | Set of warehouses |
| $D = \{1, 2, \ldots, d\}$ | Set of distribution centers |
| $C = \{1, 2, \ldots, c\}$ | Set of customers |
| $R = \{1, 2, \ldots, r\}$ | Set of raw materials (pulp types, recycled fiber) |
| $T = \{1, 2, \ldots, t\}$ | Set of time periods (months) |
| $L = \{1, 2, \ldots, l\}$ | Set of transportation modes |
| $S = \{1, 2, \ldots, s\}$ | Set of suppliers |
| $PM_p \subseteq M$ | Set of machines compatible with paper grade $p$ |
| $PR_p \subseteq R$ | Set of raw materials required for paper grade $p$ |
| $\Phi^{incompatible} \subseteq P \times P$ | Set of incompatible product transition pairs |
Table 2. Deterministic parameters for the paper mill supply chain model. This table lists all fixed parameters related to costs, capacities, production rates, and technical specifications that remain constant throughout the planning horizon.
| Parameter | Description |
|---|---|
| $K_{pm}$ | Setup cost for producing paper grade $p$ on machine $m$ |
| $ST_{p_i p_j m}$ | Sequence-dependent changeover time from grade $p_i$ to $p_j$ on machine $m$ |
| $SC_{p_i p_j m}$ | Sequence-dependent changeover cost from grade $p_i$ to $p_j$ on machine $m$ |
| $\alpha_{pr}$ | Amount of raw material $r$ required per unit of paper grade $p$ |
| $\beta_{pm}$ | Production rate of paper grade $p$ on machine $m$ (tons/hour) |
| $CAP_m$ | Available capacity of machine $m$ in each period (hours) |
| $WCAP_w$ | Storage capacity of warehouse $w$ (tons) |
| $DCAP_d$ | Storage capacity of distribution center $d$ (tons) |
| $LCAP_l$ | Transportation capacity of mode $l$ (tons) |
| $SL_p$ | Shelf life of paper grade $p$ (periods) |
| $MW_p$ | Minimum order quantity for paper grade $p$ |
| $MX_p$ | Maximum order quantity for paper grade $p$ |
| $FC_d$ | Fixed cost of operating distribution center $d$ |
| $W_\lambda$ | Weight for service level in the objective function |
| $QMin_p$ | Minimum quality threshold for paper grade $p$ |
| $EmissionRate_{pm}$ | Emission rate for producing grade $p$ on machine $m$ |
| $EmissionLimit$ | Total emission limit for the planning horizon |
| $SMin_{pt}$, $SMax_{pt}$ | Seasonal production bounds for grade $p$ in period $t$ |
| $MaxChangeover_m$ | Maximum number of changeovers allowed per period on machine $m$ |
| $MinQuality_{pm}$ | Minimum quality level for grade $p$ produced on machine $m$ |
| $M_{big}$ | Large constant for big-M constraints |
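The sequence-dependent parameters $ST_{p_i p_j m}$ and $SC_{p_i p_j m}$, together with $MaxChangeover_m$ and the incompatible-pair set $\Phi^{incompatible}$ from Table 1, determine what a campaign sequence costs on one machine. The Python sketch below totals the changeover time and cost of a given grade sequence for one machine in one period, assuming that consecutive runs of the same grade incur no changeover. The function name and the example values are hypothetical and are not taken from the model.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def changeover_summary(
    sequence: List[str],
    setup_time: Dict[Pair, float],
    setup_cost: Dict[Pair, float],
    incompatible: Set[Pair],
    max_changeovers: int,
) -> Tuple[int, float, float]:
    """Count changeovers and total their time and cost for one machine in one period.

    setup_time / setup_cost play the role of ST_{p_i p_j m} / SC_{p_i p_j m} for a
    fixed machine m; incompatible plays the role of Phi^incompatible.
    """
    # Assumption: consecutive runs of the same grade do not trigger a changeover.
    transitions = [(a, b) for a, b in zip(sequence, sequence[1:]) if a != b]
    for a, b in transitions:
        if (a, b) in incompatible:
            raise ValueError(f"incompatible transition {a} -> {b}")
    if len(transitions) > max_changeovers:
        raise ValueError(f"{len(transitions)} changeovers exceed the limit of {max_changeovers}")
    total_time = sum(setup_time[pair] for pair in transitions)
    total_cost = sum(setup_cost[pair] for pair in transitions)
    return len(transitions), total_time, total_cost


if __name__ == "__main__":
    # Hypothetical data for a single machine.
    st = {("G1", "G3"): 2.5, ("G3", "G4"): 4.0}
    sc = {("G1", "G3"): 950.0, ("G3", "G4"): 1680.0}
    print(changeover_summary(["G1", "G3", "G3", "G4"], st, sc, {("G4", "G1")}, 3))
    # (2, 6.5, 2630.0)
```

Raising an error for an infeasible sequence keeps the sketch simple; inside a metaheuristic the same checks would more naturally feed a penalty term.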
Table 3. Uncertain random parameters for the paper mill supply chain model. This table details all stochastic parameters that are subject to uncertainty, including demand fluctuations, cost variations, and operational variability.
| Parameter | Description |
|---|---|
| $\tilde{D}_{pct}$ | Uncertain random demand of customer $c$ for paper grade $p$ in period $t$ |
| $\widetilde{PC}_{pmt}$ | Uncertain random production cost of grade $p$ on machine $m$ in period $t$ |
| $\widetilde{RC}_{rt}$ | Uncertain random cost of raw material $r$ in period $t$ |
| $\widetilde{TC}_{ijlt}$ | Uncertain random transportation cost from node $i$ to $j$ using mode $l$ in period $t$ |
| $\widetilde{HC}_{pt}$ | Uncertain random holding cost of paper grade $p$ in period $t$ |
| $\widetilde{BC}_{pct}$ | Uncertain random backorder cost for grade $p$ at customer $c$ in period $t$ |
| $\tilde{\eta}_{pmt}$ | Uncertain random machine efficiency for grade $p$ on machine $m$ in period $t$ |
| $\tilde{\theta}_{rt}$ | Uncertain random availability of raw material $r$ in period $t$ |
| $\tilde{\gamma}_{pt}$ | Uncertain random quality yield for paper grade $p$ in period $t$ |
| $\tilde{\phi}_{lt}$ | Uncertain random availability of transportation mode $l$ in period $t$ |
| $\tilde{\omega}_{mt}$ | Uncertain random breakdown probability of machine $m$ in period $t$ |
Table 4. Goal programming parameters for the paper mill supply chain model. This table defines the parameters used in the goal programming formulation, including target probability levels, threshold values, and confidence levels for each objective.
| Parameter | Description |
|---|---|
| $\alpha_1$ | Target probability level for the economic efficiency goal |
| $\alpha_2$ | Target probability level for the operational performance goal |
| $\alpha_3$ | Target probability level for the resource utilization goal |
| $\alpha_4$ | Target probability level for the quality assurance goal |
| $Target_1$ | Target threshold for total cost |
| $Target_2$ | Target threshold for service level |
| $Target_3$ | Target threshold for capacity utilization |
| $Target_4$ | Target threshold for quality performance |
| $\beta_j$ | Confidence level for chance constraint $j$ |
Table 5. Upper-level decision variables for distribution planning in the paper mill supply chain model. This table defines all variables related to distribution network configuration, product flow, and customer service.
| Variable | Description |
|---|---|
| $Y_d \in \{0, 1\}$ | 1 if distribution center $d$ is operated, 0 otherwise |
| $X_{wdplt} \ge 0$ | Quantity of grade $p$ shipped from warehouse $w$ to DC $d$ using mode $l$ in period $t$ |
| $Z_{dcplt} \ge 0$ | Quantity of grade $p$ shipped from DC $d$ to customer $c$ using mode $l$ in period $t$ |
| $U_{lt} \in \{0, 1\}$ | 1 if transportation mode $l$ is used in period $t$, 0 otherwise |
| $I^{D}_{dpt} \ge 0$ | Inventory of grade $p$ at distribution center $d$ at the end of period $t$ |
| $B_{pct} \ge 0$ | Backorder quantity of grade $p$ for customer $c$ in period $t$ |
| $\lambda_{pt} \in [0, 1]$ | Service level for paper grade $p$ in period $t$ |
Table 6. Lower-level decision variables for production planning in the paper mill supply chain model. This table defines all variables related to production decisions, machine scheduling, raw material procurement, and inventory management at production facilities.
| Variable | Description |
|---|---|
| $Q_{pmt} \ge 0$ | Quantity of grade $p$ produced on machine $m$ in period $t$ |
| $V_{pmt} \in \{0, 1\}$ | 1 if grade $p$ is produced on machine $m$ in period $t$, 0 otherwise |
| $W_{p_i p_j m t} \in \{0, 1\}$ | 1 if there is a changeover from grade $p_i$ to $p_j$ on machine $m$ in period $t$, 0 otherwise |
| $F_{wplt} \ge 0$ | Quantity of grade $p$ shipped from the machine location to warehouse $w$ using mode $l$ in period $t$ |
| $I^{W}_{wpt} \ge 0$ | Inventory of grade $p$ at warehouse $w$ at the end of period $t$ |
| $R_{srt} \ge 0$ | Quantity of raw material $r$ purchased from supplier $s$ in period $t$ |
| $I^{R}_{rt} \ge 0$ | Inventory of raw material $r$ at the end of period $t$ |
Table 7. Goal programming variables for the paper mill supply chain model. This table defines the deviation variables used to measure the achievement of each goal in the multi-objective framework.
| Variable | Description |
|---|---|
| $d_1^+, d_1^- \ge 0$ | Positive and negative deviations from the economic efficiency goal |
| $d_2^+, d_2^- \ge 0$ | Positive and negative deviations from the operational performance goal |
| $d_3^+, d_3^- \ge 0$ | Positive and negative deviations from the resource utilization goal |
| $d_4^+, d_4^- \ge 0$ | Positive and negative deviations from the quality assurance goal |
Table 8. Parameter settings for RL-AOA.
| Parameter | Symbol | Value |
|---|---|---|
| Population size (upper level) | $N_p^{UL}$ | 80 |
| Population size (lower level) | $N_p^{LL}$ | 60 |
| Maximum iterations | $T_{max}$ | 400 |
| Number of Monte Carlo simulations | $N_{MC}$ | 8000 |
| Math Optimizer bounds | [Min, Max] | [0.2, 1.0] |
| Exploitation accuracy parameter | $\alpha$ | 5.0 |
| Mutation factor | $\mu$ | 0.5 |
| Initial learning rate | $\alpha_0$ | 0.15 |
| Learning rate decay | $\phi$ | 0.6 |
| Discount factor | $\gamma$ | 0.85 |
| Exploration probability (initial) | $\epsilon_{max}$ | 0.8 |
| Exploration probability (final) | $\epsilon_{min}$ | 0.05 |
| Penalty factor (initial, upper level) | $\lambda_j^{UL,0}$ | 150 |
| Penalty factor (initial, lower level) | $\lambda_j^{LL,0}$ | 100 |
| Confidence level tightening rate | $\kappa$ | 2.5 |
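The AOA entries in Table 8 (Math Optimizer bounds, $\alpha$, $\mu$, $T_{max}$) drive two schedules in the standard Arithmetic Optimization Algorithm [22]: the Math Optimizer Accelerated function, MOA(t) = Min + t(Max − Min)/T_max, which decides between exploration and exploitation, and the Math Optimizer Probability, MOP(t) = 1 − t^(1/α)/T_max^(1/α), which scales the step. The Python sketch below implements one standard AOA position update with these values. It does not reproduce the RL-driven parameter adaptation or the bi-level coupling of RL-AOA, and the two-dimensional search box in the usage example is hypothetical.

```python
import random
from typing import List

# Values from Table 8; the RL-based online adaptation of RL-AOA is not modeled here.
MOA_MIN, MOA_MAX = 0.2, 1.0   # Math Optimizer bounds [Min, Max]
ALPHA = 5.0                   # exploitation accuracy parameter
MU = 0.5                      # control (mutation) factor
T_MAX = 400                   # maximum iterations
EPS = 1e-12                   # guards the division operator against MOP = 0


def moa(t: int) -> float:
    """Math Optimizer Accelerated: grows linearly from MOA_MIN to MOA_MAX."""
    return MOA_MIN + t * (MOA_MAX - MOA_MIN) / T_MAX


def mop(t: int) -> float:
    """Math Optimizer Probability: decays from 1 toward 0 as t approaches T_MAX."""
    return 1.0 - (t ** (1.0 / ALPHA)) / (T_MAX ** (1.0 / ALPHA))


def aoa_step(best: List[float], lb: List[float], ub: List[float],
             t: int, rng: random.Random) -> List[float]:
    """One standard AOA position update around the best solution found so far."""
    p, a = mop(t), moa(t)
    new = []
    for bj, lo, hi in zip(best, lb, ub):
        scale = (hi - lo) * MU + lo
        if rng.random() > a:            # exploration: division / multiplication
            if rng.random() > 0.5:
                x = bj / (p + EPS) * scale
            else:
                x = bj * p * scale
        else:                            # exploitation: subtraction / addition
            if rng.random() > 0.5:
                x = bj - p * scale
            else:
                x = bj + p * scale
        new.append(min(max(x, lo), hi))  # clip the candidate to [lb, ub]
    return new


if __name__ == "__main__":
    rng = random.Random(42)
    best = [0.6, 2.5]                    # hypothetical incumbent
    lb, ub = [0.0, 1.0], [1.0, 3.0]
    for t in (1, 200, 399):
        print(t, round(moa(t), 3), round(mop(t), 3), aoa_step(best, lb, ub, t, rng))
```

As MOA rises, the probability of taking the exploration branch falls, while the shrinking MOP narrows the exploitation steps around the incumbent.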
Table 9. Characteristics of paper manufacturing test instances.
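| Instance | Machines | Warehouses | DCs | Customers | Grades | Periods | Suppliers |
|---|---|---|---|---|---|---|---|
| Small-1 | 4 | 3 | 5 | 8 | 3 | 3 | 4 |
| Small-2 | 6 | 4 | 6 | 12 | 4 | 3 | 5 |
| Medium-1 | 8 | 5 | 8 | 15 | 5 | 4 | 6 |
| Medium-2 | 10 | 6 | 10 | 20 | 6 | 4 | 8 |
| Large-1 | 12 | 8 | 12 | 25 | 7 | 5 | 10 |
| Large-2 | 15 | 10 | 15 | 30 | 8 | 5 | 12 |
| Very Large | 20 | 12 | 18 | 40 | 10 | 6 | 15 |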
Table 10. Uncertain random parameter distributions for paper manufacturing.
| Parameter | Random Component | Uncertain Component |
|---|---|---|
| Customer demand ($\tilde{D}_{pct}$) | Normal($\mu_{pct}$, $\sigma_{pct}^2$) | Linear($0.85\mu_{pct}$, $1.15\mu_{pct}$) |
| Production cost ($\widetilde{PC}_{pmt}$) | Lognormal($\mu_{PC}$, $0.08\mu_{PC}^2$) | Linear($0.9\mu_{PC}$, $1.2\mu_{PC}$) |
| Raw material cost ($\widetilde{RC}_{rt}$) | Normal($\mu_{RC}$, $0.1\mu_{RC}^2$) | Linear($0.8\mu_{RC}$, $1.25\mu_{RC}$) |
| Transportation cost ($\widetilde{TC}_{ijlt}$) | Uniform($0.85\widehat{TC}_{ijlt}$, $1.15\widehat{TC}_{ijlt}$) | Linear($0.75\widehat{TC}_{ijlt}$, $1.3\widehat{TC}_{ijlt}$) |
| Machine efficiency ($\tilde{\eta}_{pmt}$) | Beta(7, 2) | Linear(0.85, 1.0) |
| Raw material availability ($\tilde{\theta}_{rt}$) | Triangular(0.8, 1.0, 1.1) | Linear(0.9, 1.15) |
| Quality yield ($\tilde{\gamma}_{pt}$) | Beta(8, 1.5) | Linear(0.92, 1.0) |
| Transportation mode availability ($\tilde{\phi}_{lt}$) | Beta(3, 1) | Linear(0.7, 1.0) |
| Machine breakdown probability ($\tilde{\omega}_{mt}$) | Beta(1.5, 10) | Linear(0.4, 2.2) |
| Emission rate ($\widetilde{EmissionRate}_{pm}$) | Lognormal($\mu_{ER}$, $0.12\mu_{ER}^2$) | Linear($0.85\mu_{ER}$, $1.25\mu_{ER}$) |
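Each parameter in Table 10 combines a random component with a linear uncertain component. In chance theory, the chance of an event involving an uncertain random variable equals the expected value, over the random part, of the uncertain measure of that event, so it can be estimated by sampling the random part and averaging. The Python sketch below does this for machine efficiency, assuming a multiplicative composition $\tilde{\eta} = \eta \cdot \tau$ with $\eta \sim$ Beta(7, 2) and $\tau \sim \mathcal{L}(0.85, 1.0)$. The composition rule and function names are assumptions for illustration, since the table does not state how the two components are combined; the default of 8000 samples mirrors $N_{MC}$ in Table 8.

```python
import random


def linear_uncertainty_cdf(z: float, a: float, b: float) -> float:
    """Uncertainty distribution of a linear uncertain variable L(a, b)."""
    if z <= a:
        return 0.0
    if z >= b:
        return 1.0
    return (z - a) / (b - a)


def chance_at_most(x: float, beta_a: float, beta_b: float, lin_a: float, lin_b: float,
                   n_samples: int = 8000, seed: int = 0) -> float:
    """Estimate Ch{eta * tau <= x}, eta ~ Beta(beta_a, beta_b), tau ~ L(lin_a, lin_b).

    Assumed composition: eta * tau. For a draw y > 0, y * tau is increasing in tau,
    so M{y * tau <= x} = Upsilon(x / y); averaging over draws estimates the chance.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.betavariate(beta_a, beta_b)   # sample of the random component
        if y > 0.0:
            total += linear_uncertainty_cdf(x / y, lin_a, lin_b)
        else:
            total += 1.0 if x >= 0.0 else 0.0  # y * tau = 0 in this degenerate case
    return total / n_samples


if __name__ == "__main__":
    # Machine efficiency (Table 10): random part Beta(7, 2), uncertain part L(0.85, 1.0).
    p = chance_at_most(0.70, 7, 2, 0.85, 1.0)
    print(f"Ch{{efficiency <= 0.70}} is approximately {p:.3f}")
```

Reusing a fixed seed makes the estimate a monotone function of the threshold x, which keeps comparisons between thresholds stable.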
Table 11. Tactical planning goal probability achievement with confidence intervals.
| Instance | Economic (Target: 0.90) | Operational (Target: 0.85) | Resource Util. (Target: 0.80) | Quality (Target: 0.90) |
|---|---|---|---|---|
| Small-1 | 0.887 ± 0.012 | 0.836 ± 0.015 | 0.783 ± 0.018 | 0.878 ± 0.014 |
| Small-2 | 0.883 ± 0.013 | 0.831 ± 0.016 | 0.779 ± 0.019 | 0.874 ± 0.015 |
| Medium-1 | 0.876 ± 0.014 | 0.823 ± 0.017 | 0.771 ± 0.020 | 0.867 ± 0.016 |
| Medium-2 | 0.871 ± 0.015 | 0.817 ± 0.018 | 0.765 ± 0.021 | 0.862 ± 0.017 |
| Large-1 | 0.865 ± 0.016 | 0.810 ± 0.019 | 0.758 ± 0.022 | 0.856 ± 0.018 |
| Large-2 | 0.859 ± 0.017 | 0.804 ± 0.020 | 0.752 ± 0.023 | 0.850 ± 0.019 |
| Very Large | 0.852 ± 0.018 | 0.797 ± 0.021 | 0.745 ± 0.024 | 0.843 ± 0.020 |
| **Average** | **0.870** | **0.817** | **0.765** | **0.861** |
| **Gap from Target** | **−3.3%** | **−3.9%** | **−4.4%** | **−4.3%** |

Note: Bold formatting in the summary rows distinguishes aggregate statistics from individual instance results.
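The Average and Gap from Target rows are consistent with a relative gap, (average − target)/target, computed from the instance results. A short Python check that recomputes both rows from the values in Table 11:

```python
from statistics import mean
from typing import List, Tuple

# Instance rows of Table 11 (Small-1 ... Very Large), one list per goal.
achieved = {
    "Economic": [0.887, 0.883, 0.876, 0.871, 0.865, 0.859, 0.852],
    "Operational": [0.836, 0.831, 0.823, 0.817, 0.810, 0.804, 0.797],
    "Resource Util.": [0.783, 0.779, 0.771, 0.765, 0.758, 0.752, 0.745],
    "Quality": [0.878, 0.874, 0.867, 0.862, 0.856, 0.850, 0.843],
}
targets = {"Economic": 0.90, "Operational": 0.85, "Resource Util.": 0.80, "Quality": 0.90}


def gap_from_target(values: List[float], target: float) -> Tuple[float, float]:
    """Return (average achievement, relative gap from the target in percent)."""
    avg = mean(values)
    return avg, 100.0 * (avg - target) / target


if __name__ == "__main__":
    for goal, values in achieved.items():
        avg, gap = gap_from_target(values, targets[goal])
        print(f"{goal:15s} average {avg:.3f}  gap {gap:+.1f}%")
```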
Table 12. Bi-level coordination effectiveness.
| Instance | Coordination Gap (a) | Iterations to Converge | Upper-Level Cost (USD K) | Lower-Level Cost (USD K) |
|---|---|---|---|---|
| Small-1 | 0.024 | 28 | 45.3 | 23.2 |
| Small-2 | 0.031 | 32 | 52.5 | 27.8 |
| Medium-1 | 0.038 | 36 | 68.8 | 35.4 |
| Medium-2 | 0.045 | 41 | 84.3 | 43.7 |
| Large-1 | 0.052 | 47 | 112.6 | 58.9 |
| Large-2 | 0.059 | 53 | 138.5 | 72.3 |
| Very Large | 0.067 | 61 | 189.7 | 98.8 |
| **Average** | **0.045** | **42.6** | **98.8** | **51.4** |

(a) Normalized coordination gap $= \frac{\lvert Q_{pmt} - F_{wplt} \rvert}{Q_{pmt}}$, representing the relative material flow imbalance (0 = perfect coordination, 1 = complete mismatch). Note: Bold formatting in the average row distinguishes summary statistics from individual instance results.
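The footnote compares production quantities $Q_{pmt}$ with machine-to-warehouse shipments $F_{wplt}$, which carry different indices, so some aggregation is needed before the two can be subtracted. The Python sketch below assumes both are summed over all their indices (aggregating per grade or per period would be an equally plausible reading); the dictionary layout and function name are hypothetical.

```python
from typing import Dict, Tuple


def coordination_gap(production: Dict[Tuple[str, str, int], float],
                     shipments: Dict[Tuple[str, str, str, int], float]) -> float:
    """Normalized gap |sum Q - sum F| / sum Q (assumed full aggregation).

    production: Q_{pmt} keyed by (grade p, machine m, period t)
    shipments:  F_{wplt} keyed by (warehouse w, grade p, mode l, period t)
    """
    total_q = sum(production.values())
    if total_q == 0:
        raise ValueError("no production: the normalized gap is undefined")
    total_f = sum(shipments.values())
    return abs(total_q - total_f) / total_q


if __name__ == "__main__":
    # Hypothetical single-period data: 100 t produced, 95 t shipped.
    q = {("G1", "M1", 1): 60.0, ("G2", "M2", 1): 40.0}
    f = {("W1", "G1", "L1", 1): 57.0, ("W1", "G2", "L1", 1): 38.0}
    print(coordination_gap(q, f))
```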
Table 13. Production and setup costs for Medium-1 instance (USD/ton, USD).
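| Machine | Production Cost (USD/ton), grades with entries in G1–G5 order | Setup Cost (USD), same grades |
|---|---|---|
| M1 | 285, 320, 245, 380 | 1200, 1450, 950, 1680 |
| M2 | 295, 255, 420, 365 | 1350, 1100, 1890, 1580 |
| M3 | 310, 240, 405, 375 | 1380, 980, 1820, 1620 |
| M4 | 280, 315, 415 | 1180, 1420, 1860 |
| M5 | 290, 325, 250, 385 | 1280, 1480, 1050, 1720 |
| M6 | 235, 395, 360 | 920, 1750, 1540 |
| M7 | 275, 305, 245, 410 | 1150, 1350, 970, 1840 |
| M8 | 285, 400, 370 | 1220, 1780, 1590 |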
Table 14. Machine capacity information for Medium-1 instance.
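| Machine | Daily Capacity (tons) | Efficiency (%) | Availability (%) |
|---|---|---|---|
| M1 | 45 | 92.3 | 95.2 |
| M2 | 38 | 89.7 | 94.8 |
| M3 | 42 | 91.5 | 96.1 |
| M4 | 50 | 88.9 | 93.7 |
| M5 | 35 | 90.2 | 95.5 |
| M6 | 40 | 93.1 | 96.8 |
| M7 | 48 | 87.4 | 94.2 |
| M8 | 44 | 91.8 | 95.9 |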
Table 15. Customer demand information with uncertain parameters (Medium-1).
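| Customer | Type | Mean Demand G1 (tons/period) | G2 | G3 | G4 | G5 | CV (%) | α-cut | Service Level |
|---|---|---|---|---|---|---|---|---|---|
| C1 | Packaging | 450 | 380 | 0 | 520 | 0 | 15 | [0.9, 1.1] | 0.95 |
| C2 | Newsprint | 0 | 680 | 420 | 0 | 0 | 18 | [0.85, 1.15] | 0.92 |
| C3 | Tissue | 0 | 0 | 320 | 0 | 460 | 22 | [0.8, 1.2] | 0.90 |
| C4 | Publishing | 350 | 550 | 280 | 0 | 0 | 16 | [0.9, 1.1] | 0.94 |
| C5 | Specialty | 0 | 0 | 0 | 380 | 420 | 25 | [0.75, 1.25] | 0.88 |
| C6 | Packaging | 480 | 0 | 0 | 560 | 0 | 14 | [0.92, 1.08] | 0.96 |
| C7 | Newsprint | 0 | 720 | 390 | 0 | 0 | 19 | [0.85, 1.15] | 0.91 |
| C8 | Tissue | 0 | 0 | 340 | 0 | 480 | 21 | [0.8, 1.2] | 0.89 |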
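Table 15 pairs a coefficient of variation (the random component) with an expert α-cut (the uncertain component) and a target service level per customer. As a hypothetical illustration only, the Python sketch below derives the standard deviation σ = CV·μ, the α-cut demand interval [lo·μ, hi·μ], and the Normal order-up-to level μ + z·σ for customer C1, grade G1. It ignores the uncertain component when computing the quantile and is not the paper's chance-constrained formulation; the function name is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def demand_summary(mean_demand: float, cv_percent: float,
                   alpha_cut: Tuple[float, float],
                   service_level: float) -> Tuple[float, Tuple[float, float], float]:
    """Illustrative quantities derived from one row/grade of Table 15.

    sigma: standard deviation implied by the coefficient of variation
    interval: expert alpha-cut range applied multiplicatively to the mean
    order_up_to: Normal quantile covering demand with the target service level
    """
    sigma = mean_demand * cv_percent / 100.0
    lo, hi = alpha_cut
    interval = (lo * mean_demand, hi * mean_demand)
    z = NormalDist().inv_cdf(service_level)   # standard Normal quantile
    order_up_to = mean_demand + z * sigma
    return sigma, interval, order_up_to


if __name__ == "__main__":
    # Customer C1, grade G1: mean 450 t/period, CV 15%, alpha-cut [0.9, 1.1], service level 0.95.
    sigma, (low, high), level = demand_summary(450.0, 15.0, (0.9, 1.1), 0.95)
    print(f"sigma={sigma:.1f} t, alpha-cut interval=[{low:.1f}, {high:.1f}] t, "
          f"order-up-to={level:.1f} t")
```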
Table 16. Algorithm performance comparison for bi-level optimization.
| Algorithm | Best | Average | Worst | Std. Dev. | Time (min) | Success Rate |
|---|---|---|---|---|---|---|
| Bi-GA | 0.821 | 0.826 | 0.839 | 0.006 | 24.3 | 71.2% |
| Bi-PSO | 0.834 | 0.839 | 0.851 | 0.005 | 21.7 | 75.8% |
| Bi-DE | 0.847 | 0.852 | 0.863 | 0.005 | 19.4 | 78.3% |
| Bi-SSO | 0.855 | 0.860 | 0.869 | 0.004 | 17.9 | 81.7% |
| **RL-AOA** | **0.870** | **0.875** | **0.882** | **0.003** | **15.2** | **86.4%** |

Wilcoxon signed-rank test (RL-AOA vs. each other algorithm): all p < 0.01. Note: Bold formatting highlights the proposed RL-AOA method to distinguish it from the baseline algorithms.
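The note reports Wilcoxon signed-rank tests on paired results. A pure-Python sketch of the two-sided test with the large-sample normal approximation follows. The run-level data in the usage example are hypothetical, since the table reports only summary statistics; for small samples an exact test (for instance `scipy.stats.wilcoxon`) would be preferable.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test, large-sample normal approximation.

    Returns (T, p) with T = min(W+, W-). Zero differences are dropped; tied
    absolute differences get average ranks (no tie correction to the variance).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0       # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(w_plus, w_minus)
    mean_t = n * (n + 1) / 4.0
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (t - mean_t) / sd_t                   # z <= 0 because t <= mean_t
    p = 1.0 + math.erf(z / math.sqrt(2.0))    # equals 2 * Phi(z)
    return t, min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical paired results from 30 independent runs (not the paper's data).
    rl_aoa = [0.875 + 0.001 * (k % 5) for k in range(30)]
    bi_ga = [0.826 + 0.001 * (k % 7) for k in range(30)]
    print(wilcoxon_signed_rank(rl_aoa, bi_ga))
```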
Table 17. RL action selection patterns in bi-level optimization.
| Optimization Phase | Enhance Exploration | Focus Exploitation | Balanced Search | Strengthen Coordination |
|---|---|---|---|---|
| Early (0–25%) | 42.8% | 12.6% | 28.4% | 16.2% |
| Middle (25–65%) | 24.3% | 21.7% | 35.8% | 18.2% |
| Late (65–100%) | 8.9% | 35.4% | 29.3% | 26.4% |
| **Performance Gain** | **+3.1%** | **+2.4%** | **+1.6%** | **+3.8%** |

Note: Bold formatting in the performance gain row distinguishes performance metrics from action-selection frequencies.
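The four actions in Table 17 can be driven by tabular Q-learning with ε-greedy selection, using the learning rate, discount factor, and exploration bounds listed in Table 8. The Python sketch below shows one such agent under stated assumptions: the state encoding, the toy reward, the linear ε schedule, and the decay rule α_t = α_0/(1 + t)^φ are hypothetical choices, not the paper's specification.

```python
import random
from typing import Dict, List, Tuple

# Action set named after Table 17.
ACTIONS = ["enhance_exploration", "focus_exploitation",
           "balanced_search", "strengthen_coordination"]

ALPHA0, PHI, GAMMA = 0.15, 0.6, 0.85       # Table 8: learning rate, decay, discount
EPS_MAX, EPS_MIN, T_MAX = 0.8, 0.05, 400   # Table 8: exploration bounds, iterations

State = Tuple[int, int]  # hypothetical: (search-phase bucket, coordination-gap bucket)


def epsilon(t: int) -> float:
    """Assumed linear decay of the exploration probability from EPS_MAX to EPS_MIN."""
    return EPS_MAX - (EPS_MAX - EPS_MIN) * min(t, T_MAX) / T_MAX


def learning_rate(t: int) -> float:
    """Assumed polynomial decay alpha_t = ALPHA0 / (1 + t) ** PHI."""
    return ALPHA0 / (1.0 + t) ** PHI


class QAgent:
    def __init__(self, seed: int = 0) -> None:
        self.q: Dict[State, List[float]] = {}
        self.rng = random.Random(seed)

    def values(self, s: State) -> List[float]:
        """Q-values for state s, initialized to zero on first visit."""
        return self.q.setdefault(s, [0.0] * len(ACTIONS))

    def select(self, s: State, t: int) -> int:
        """Epsilon-greedy action selection."""
        if self.rng.random() < epsilon(t):
            return self.rng.randrange(len(ACTIONS))
        qs = self.values(s)
        return max(range(len(ACTIONS)), key=qs.__getitem__)

    def update(self, s: State, a: int, reward: float, s_next: State, t: int) -> None:
        """Q(s, a) += lr * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
        qs = self.values(s)
        target = reward + GAMMA * max(self.values(s_next))
        qs[a] += learning_rate(t) * (target - qs[a])


if __name__ == "__main__":
    agent = QAgent(seed=7)
    state: State = (0, 0)
    for t in range(5):
        a = agent.select(state, t)
        reward = 1.0 if ACTIONS[a] == "strengthen_coordination" else 0.2  # toy reward
        next_state = (min(t + 1, 2), 0)
        agent.update(state, a, reward, next_state, t)
        state = next_state
    print(agent.q)
```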
Table 18. Impact of confidence levels on bi-level performance metrics.
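| Confidence Level | Goal Achievement | Upper-Level Cost | Lower-Level Cost | Coordination Gap | CPU Time |
|---|---|---|---|---|---|
| 0.70 | 0.892 | 84.3 | 42.2 | 0.038 | 11.2 |
| 0.75 | 0.881 | 87.5 | 43.8 | 0.041 | 12.4 |
| 0.80 | 0.870 | 91.3 | 45.7 | 0.045 | 13.8 |
| 0.85 | 0.858 | 95.8 | 47.9 | 0.049 | 15.2 |
| 0.90 | 0.844 | 101.2 | 50.6 | 0.054 | 17.9 |
| 0.95 | 0.827 | 108.5 | 54.3 | 0.061 | 22.1 |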
Table 19. Production–distribution coordination under uncertainty.
| Uncertainty Level | Coordination Efficiency | Information Exchange | Decision Consistency | Overall Performance |
|---|---|---|---|---|
| Low (0.5×) | 0.924 | 0.957 | 0.889 | 0.923 |
| Medium (1.0×) | 0.870 | 0.912 | 0.845 | 0.876 |
| High (1.5×) | 0.823 | 0.868 | 0.798 | 0.830 |
| Very High (2.0×) | 0.778 | 0.821 | 0.752 | 0.784 |
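In every row of Table 19, the Overall Performance value agrees to three decimals with the unweighted mean of the three preceding metrics. A short Python check follows; the aggregation rule is inferred from the numbers and is not stated in the table.

```python
from statistics import mean
from typing import Sequence

# Rows of Table 19: (coordination efficiency, information exchange,
#                    decision consistency, reported overall performance)
rows = {
    "Low (0.5x)": (0.924, 0.957, 0.889, 0.923),
    "Medium (1.0x)": (0.870, 0.912, 0.845, 0.876),
    "High (1.5x)": (0.823, 0.868, 0.798, 0.830),
    "Very High (2.0x)": (0.778, 0.821, 0.752, 0.784),
}


def overall(components: Sequence[float]) -> float:
    """Unweighted mean of the three coordination metrics (inferred aggregation rule)."""
    return mean(components)


if __name__ == "__main__":
    for level, (*components, reported) in rows.items():
        print(f"{level:17s} mean={overall(components):.3f} reported={reported:.3f}")
```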
Table 20. Optimal paper grade production allocation (Medium-1 instance).
MachinePeriod 1Period 2Period 3Period 4
G1G2G3G4G5G1G2G3G4G5G1G2G3G4G5G1G2G3G4G5
M1125089001347698118092001280105
M200156879801430067148941120089
M31420076089134015600820781450
M489000125167104000138158
M5007612387000008913496000
M69400001181040000127
M7089012476098001120135880870
M813409701480125010601420
Table 21. Machine changeover analysis for tactical planning.
| Machine | Total Changeovers | Changeover Cost (USD) | Changeover Time (hrs) | Efficiency Loss (%) | Utilization (%) |
|---|---|---|---|---|---|
| M1 | 6 | 8450 | 23.5 | 3.2 | 87.4 |
| M2 | 7 | 9680 | 27.2 | 3.8 | 85.1 |
| M3 | 5 | 7320 | 19.8 | 2.7 | 89.2 |
| M4 | 8 | 11,240 | 31.6 | 4.3 | 82.7 |
| M5 | 6 | 8750 | 24.8 | 3.4 | 86.8 |
| M6 | 4 | 5890 | 16.2 | 2.2 | 91.5 |
| M7 | 7 | 9450 | 26.4 | 3.6 | 84.9 |
| M8 | 5 | 7680 | 20.9 | 2.9 | 88.6 |
| **Average** | **6.0** | **8557** | **23.8** | **3.3** | **87.0** |

Note: Bold formatting in the average row distinguishes aggregate statistics from individual machine data.
Table 22. Cost–benefit analysis for paper manufacturing implementation.
| Mill Size | Implementation Cost (USD K) | Annual Savings (USD K) | Payback Period (Months) |
|---|---|---|---|
| Small (1–2 machines) | 150–250 | 180–320 | 8–14 |
| Medium (3–8 machines) | 280–450 | 350–680 | 9–15 |
| Large (9+ machines) | 520–850 | 750–1400 | 8–13 |
Table 23. Impact of target probability levels on lexicographic feasibility.
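| Scenario | Target $\alpha_g$: Econ. | Target $\alpha_g$: Oper. | Target $\alpha_g$: Res. | Target $\alpha_g$: Qual. | Goal Achievement: Econ. | Goal Achievement: Oper. | Goal Achievement: Res. | Goal Achievement: Qual. |
|---|---|---|---|---|---|---|---|---|
| Relaxed | 0.80 | 0.75 | 0.70 | 0.80 | 0.892 | 0.848 | 0.811 | 0.879 |
| Current | 0.90 | 0.85 | 0.80 | 0.90 | 0.870 | 0.817 | 0.765 | 0.861 |
| Stringent | 0.95 | 0.90 | 0.85 | 0.95 | 0.831 | 0.762 | 0.689 | 0.798 |

Cascade effect: 0.045, 0.085, 0.167, -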
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Boutmir, Y.; Bannari, R.; Bannari, A.; Rouky, N.; Benmoussa, O.; Fedouaki, F. Bi-Level Dependent-Chance Goal Programming for Paper Manufacturing Tactical Planning: A Reinforcement-Learning-Enhanced Approach. Symmetry 2025, 17, 1624. https://doi.org/10.3390/sym17101624

AMA Style

Boutmir Y, Bannari R, Bannari A, Rouky N, Benmoussa O, Fedouaki F. Bi-Level Dependent-Chance Goal Programming for Paper Manufacturing Tactical Planning: A Reinforcement-Learning-Enhanced Approach. Symmetry. 2025; 17(10):1624. https://doi.org/10.3390/sym17101624

Chicago/Turabian Style

Boutmir, Yassine, Rachid Bannari, Abdelfettah Bannari, Naoufal Rouky, Othmane Benmoussa, and Fayçal Fedouaki. 2025. "Bi-Level Dependent-Chance Goal Programming for Paper Manufacturing Tactical Planning: A Reinforcement-Learning-Enhanced Approach" Symmetry 17, no. 10: 1624. https://doi.org/10.3390/sym17101624

APA Style

Boutmir, Y., Bannari, R., Bannari, A., Rouky, N., Benmoussa, O., & Fedouaki, F. (2025). Bi-Level Dependent-Chance Goal Programming for Paper Manufacturing Tactical Planning: A Reinforcement-Learning-Enhanced Approach. Symmetry, 17(10), 1624. https://doi.org/10.3390/sym17101624
