Exact Methods and Heuristics for Order Acceptance Scheduling Problem under Time-of-Use Costs and Carbon Emissions

: This research focuses on an Order Acceptance Scheduling (OAS) problem on a single machine under time-of-use (TOU) tariffs and taxed carbon emissions periods with the objective to maximize total proﬁt minus tardiness penalties and environmental costs. Due to the NP-hardness of the considered problem especially in presence of sequence-dependent setup-times, two ﬁx-and-relax (FR) heuristics based on different time-indexed (TI) formulations are proposed. A metaheuristic based on the Dynamic Island Model (DIM) framework is also employed to tackle this optimization problem. These approached methods show promising results both in terms of solution quality and solving time compared to state-of-the-art exact solving approaches.


Introduction
In this period of economic recession stemming from the COVID-19 pandemic [1] and coupled with the climate emergency, the implementation of effective policies and tools remains crucial to tackle current challenges. According to the International Energy Agency, the industrial sector accounted for almost 28% of the energy use in 2018, whereas the current crisis is likely to shift the industrial output to more energy-intensive manufacturers [2]. Consequently, the International Panel on Climate Change urges governments and economic actors to engage rational and coordinated responses to the climate change through a sustainable development. The latter sets on economic, social and environmental pillars guarantying prosperity, social justice and nature conservation. In addition to scientific concerns, civil society is increasingly calling for sustainable development, pressuring governments and companies to adhere to ethical standards and green framework.
To curb GHG (greenhouse gas) emissions, for instance, the implementation of European Union Emission Trading System (EU ETS) for energy-intensive industry has been a keystone of EU energy policy. EU ETS forms a 'cap and trade' scheme allowing the companies to emit and exchange GHG allowances, decreasing on yearly basis. Heavy fines are applied if the allowance is not complied. Since its introduction, it has achieved a 8.7% cut on European GHG emissions [3]. To complement EU ETS, members states introduce taxes on carbon whenever emissions exceeds a given threshold. In this context, this paper especially focuses on taxes on CO 2 emissions. Along with these regulations, monitoring energy consumption and carrying out an energy audit is now mandatory for companies that meet specific criteria. In our economy based on supply and demand, the industrial sector has been establishing itself as a major actor. For a manufacturing company, the production of goods to satisfy customers demand generates profit, investment and employment. Altogether, this participates in the real economy as the manufacturing sector contributes to nearly 20% of global gross domestic product [4]. Therefore, the sector must adapt as effectively as possible to the newest regulations while maintaining competitiveness. This past decade, the environmental impact of the supply chain has been widely studied, suggesting that the operational level is more prone to profound changes. Indeed, integrating features less than 4% of their corpus. One of the most important points highlighted in this study is that energy efficiency is conceptualized by two approaches. First and foremost, it is done by introducing it as a criterion. Indeed, targeted criteria such as Total Energy Consumption or Costs (TEC) or total carbon emissions have been successfully incorporated into scheduling problems in numerous work [16][17][18][19]. Second, energy efficiency is modeled by dedicated constraints coupled with a classical scheduling objective [20,21]. For example, in the work of Liao et al. [20], weighted tardiness and completion times are minimized while satisfying a periodic threshold on energy consumption for a single machine.
In the literature, different assumptions relative to the energy aspects may be encountered. These assumptions are related to the quantity considered (carbon emissions, energy consumption, power etc.), the machine characteristics (energy states, speed), the variation of the energy costs during the day or the system involved (single machine or shop systems). Depending on these assumptions, the problem entails particular properties and thus is solved with specific approaches.
For single machines, Mouzon and Yildirim [5] present an adaptive search metaheuristic to minimize TEC and total tardiness. In their study, they examine the idle, setup and processing energy of the machine in order to efficiently adjust the production and avoid tardy jobs. The neighborhood move developed by the authors inserts setup or idle times between jobs, which can reduce energy costs but can lead to tardiness. Che et al. [22] consider a TI MILP to minimize TEC under TOU electricity tariffs and develop a greedy insertion heuristic which moves jobs to the off-peak periods. Aghelinejad et al. [23] propose a dynamic program for the single machine problem under TOU tariffs considering machine states and investigate the complexity of various energy costs strategies that can induce the problem to be polynomial.
As for shop scheduling or parallel machines, other energy aspects are studied. Zhang et al. [24] address a speed scaling job shop problem with the objective to minimize both tardiness and TEC. In their work, they monitor machines speed to efficiently modulate the production process. Dedicated local search procedures are designed to cope with the complexity of the problem. In the same vein, Jiang et al. [25] consider energy consumption per time unit and idle energy consumption minimizing TEC and makespan. They employ an Evolutionary Algorithm (EA) in their solving approach. In [26], the authors optimize the TEC and the makespan of unrelated parallel machines under time-and-machine-dependent electricity costs. Their solving approach involves an hybrid GA that incorporate an idle-time insertion procedure to cut costs on electricity expenses. Considering CO 2 emissions, Foumani and Smith-Miles [27] assess common carbon reduction policies on a flow shop. One of their conclusion confirms that optimizing the schedule plays a key role in the reduction of CO 2 emissions rather than changing equipment. In the meantime, they show that the 'cap and trade' approach is a cost-effective policy. In [7], a flowshop under time-dependent electricity tariffs and CO 2 emissions is tackled. The authors propose a TI MILP to minimize simultaneously carbon footprint and TEC with machines having different consumption levels. Their study suggests that a trade-off between electricity costs and CO 2 emissions appears when the energy providers are coal-based. As in [7,12], the assumption on time-dependent CO 2 emissions and electricity costs holds for this paper.
OAS is a particular scheduling problem where the decision covers the selection of a subset of orders, among n, and their sequencing in a capacity-constrained production system. Typically, this entails a fixed time frame to complete orders and an associated costdriven event where the company fails to produce within the time-window. The solution space of OAS problem extends classical scheduling one, as jobs can be accepted or not. Indeed, at worst the number of possible solutions in OAS problem is ∑ n k=1 k!, where all the k-permutations of n without repetition are considered, whereas only n! solutions form the solution space in classical scheduling problems. In the literature, for both single-or multimachine systems, a variety of configurations of problems are considered such as sequencedependent setup times [9,28,29], preemption [11] or resource constraints [10,30]. As for our research, the immediate related works are those presented in [9,28,29]. A comprehensive survey [31] presents an overview on OAS problems, while in [32], a focus is made on scheduling problem with rejection.
OAS involves mainly economic-related criteria, primarily embodied by the maximization of the total profit generated by the orders. Service level [33], percentage of accepted orders or utilization can also be maximized in OAS problems. In addition, cost penalties can be integrated in the objective function when tardiness or order rejection occur. For instance, Oguz et al. [9] maximize the total profit of accepted orders minus their possible tardiness penalties. As in scheduling problem, OAS solving methods involve exact and heuristic approaches. MILP [9,28,29], dynamic programming [34] and branching methods [35] have been employed for various OAS problems. In the meantime, as these problems are mostly NP-hard, a wide range of metaheuristics from local search to EA have been utilized and have shown very robust performances [36][37][38]. Besides, reports in the literature describe an order-based and a time-based FR heuristics applied to an OAS problem under resource constraints [30] that both achieve a tighter gap for large instances. A recent work of Tarhan et Oguz [39] proposes a two-phase matheuristic that exploit a time-indexed model. First, they assign orders to time segments using the relaxed version of their model and generate a schedule subsequently; the solution is then improved by a VNS. This process is repeated until the termination criterion is met. However, energy aspects are not considered in their work.
Literature on OAS considering resource constraints and/or energy aspects is very sparse. Garcia [10] tackles an resource-constrained OAS problem with the objective to maximize profit with rejection penalties using an EA and a priority rule heuristic. Kong et al. [40] maximize the net revenue of a parallel machines system with order acceptance and a global constraint on the energy consumed by machines and their launch budget. In their work, a comparative analysis between diverse variable neighbor search algorithms and a dynamic programming approach is conducted. Considering electricity tariffs, to the best of our knowledge, three papers have been reported [12,13,41]. These papers follow up the works in [12,13] that both investigate the OAS problem under TOU and CO 2 emissions periods and sequence-dependent setup times with a disjunctive formulation in [12] and an ATI model in [13]. Moreover, this paper contributes to the comprehension of the OAS under energy aspects by introducing approached solving methods that improve the existing results.
A vast majority of the investigated problems on scheduling are NP-hard or pseudopolynomial, justifying the significant use of heuristics that can outperform exact methods. Likewise, matheuristics are developed to tackle the aforementioned problems.
FR heuristic is a model-based heuristic which is applied in planning and scheduling problems. Promoted by Absi et al. [42], this heuristic has been employed in production planning researches considering energy aspects [43,44]. For instance, Masmoudi et al. [43] minimize TEC for a single-item capacitated lot sizing problem in a flow shop with TOU tariffs and power constraints. Their FR strategy relies on the relaxation of the binary decision variables involved in the time-dependent constraints. Besides, the FR heuristic is also used for scheduling problems such as operating rooms scheduling [45,46] and harvest scheduling [47]. In [45], Silva et al. maximize the use of operating rooms with constraints on the staff schedule and skills. Following the advances of Industry 4.0, a recent study of Li et al. [48] features a GA combined with machine learning approaches that minimize makespan for a job shop rescheduling production system. The machine learning techniques aim at evaluating rescheduling patterns. Their framework is able to outperform classical approaches with less configuration changes made at the right time. The survey of Dolgui et al. [49] summarizes the contours of scheduling problems from the point of view of optimal control. In the context of complex systems, this approach appears to answer to the new challenges raised by the Industry 4.0. Finally, Q-learning techniques have also been employed in [50] for an online single machine scheduling with the objective to minimize tardiness in the context of a smart factory [51]. This research compares the performances of classical scheduling methods against reinforcement learning techniques and concludes that the latter can improve the resolution quality and time [25,[52][53][54] . In this vein, the island-based framework introduced in [14] is a good candidate for solving combinatorial optimization problems such as scheduling. This framework proposes a self-adaptive migration policy between islands of individuals to efficiently explore and intensify the search. As in reinforcement learning techniques, the utility of the mutation operators, which control the number of individuals in each island, are re-evaluated at each iteration depending on the past performances. Good results have been presented for the 0/1 knapsack problem and the MAX-SAT problem in [55]. Therefore, this paper proposes an island-based metaheuristic as well as two FR heuristics based on two distinct exact models in order to efficiently solve the OAS problem with released dates and sequence-dependent setup times under TOU and taxed CO 2 emissions.
To finish this section some conclusions can be drawn. First, with the growing interest on environmental issues, both industrial and academics are paying more attention to incorporate them in their production and their models. Second, in the current economic climate, OAS problems find numerous applications; this is due to their capacity to introduce constraints on resources that usually are assumed unlimited. Last, metaheuristics, or more globally, artificial intelligence approaches, are privileged over exact methods. Moreover, the current trend is to use novel machine learning techniques as standalone solving approaches or to boost heuristics.

Problem Description
The OAS with sequence-dependent setup times, release date under TOU costs and taxed carbon periods is investigated in this paper. Parameters and notations are detailed in this section (Table 1). Processing time of order j = 0, . . . , n min r j Release date of order j = 0, . . . , n min d j Due date of order j = 0, . . . , n min d j Deadline of order j = 0, . . . , n min e j Revenue generated by order j = 0, . . . , n $ Ω j Power consumption of order j = 1, . . . , n kWh w j Tardiness penalties of order j = 0, . . . , n min s ij Setup-times between order i = 0, . . . , n and order j = 1, . . . , n min m Number of TOU intervals b k Starting time of TOU interval k = 1, . . . , m min EC k Electricity cost of TOU interval k = 1, . . . , m $/kWh h Number of CO 2 emission intervals g l Starting time of CO 2 interval l = 1, . . . , h min q l Amount of CO 2 emission per kWh in interval l = 1, . . . , h kg/kWh Tax Tax per kg of CO 2 emitted $/kg c jt Energy cost of order j = 1, . . . , n at time t = r j , . . . ,d j $ f jt Profit of order j = 1, . . . , n at time t = r j , . . . ,d j $ Each order j = 1, . . . , n is completely defined by its processing time p j , release date r j , due date d j , deadlined j , revenue e j , power consumption Ω j and tardiness penalties w j . In addition, a sequence-dependent setup time s ij is defined between any pair of orders i and j. A dummy order 0 is introduced in order to start the sequence. Each of its properties are set to zero except its setup time s 0j between any order j.
An order j is accepted when it is sequenced in the span ranging from its release date r j to its deadlined j and rejected otherwise. A tardiness penalty w j is subtracted to an order revenue e j for each time unit beyond its due date d j . In Figure 1, w j represents the slope of the revenue decay between d j andd j . Moreover, in the original work, the planning horizon is divided into intervals with fluctuating TOU tariffs and CO 2 emissions. Each TOU interval k = 1, . . . , m is characterized by a starting time b k and an electricity cost EC k . Each CO 2 emissions interval l = 1, . . . , h is determined by a starting time g l and an amount of CO 2 per kg and a Tax per emitted kg of carbon. As in [7,12], by assumption, CO 2 emissions are time-dependent, i.e., the emitted amount fluctuates over the day as the employed power sources are coal-based during off-peak hours and gas-based during mid-peak and on-peak hours. In this problem, the objective is to maximize the revenue minus tardiness penalties and energy costs. For simplification reasons, the energy costs can be calculated at each time-slots rather than at each intervals, especially as TOU and CO 2 emissions intervals partition differently the horizon. The energy cost c jt for any order j = 1, . . . , n at any period t = r j , . . . ,d j is thus computed with the formula given in Equation (1).
The energy cost of each time period t and for each order j corresponds to the sum of the respective TOU and CO 2 taxed emissions costs of the examined period t multiplied by the order's energy consumption expressed into minutes. In this expression, the indicator function 1 x takes value 1 if condition x holds, and 0 otherwise. In addition, some assumptions are stated in this problem. Preemption is not allowed, idle time energy is negligible . Setup and production use the same amount of energy. The planning horizon ends at the maximum of deadlines, that is, T = max j=1,...,nd j + 1.

Exact Approaches
The initial solving approach for this problem involves a sequence-based or disjunctive MILP proposed by Chen et al. [12]. Their model is based on integer decision variables that represent starting times, completion times and tardiness of each order, whereas the sequence is determined by binary decision variables defined between each pair of orders. Acceptation of orders are handled by binary decision variables. Due to its inherent properties, the disjunctive MILP is not as efficient for some instances with particular features. An ATI model is developed in [13] that can overcome some aspects of the disjunctive model.
In this paper, two new TI formulations deriving from the work in [41] that explored the problem without sequence-dependent setup times are presented and achieve better results than the disjunctive formulation. In Section 4.1, an On/Off formulation is presented. The model in Section 4.2, referred as TI Pulse, is an equivalent model for the same problem. Performances comparison between each MILP is presented in the last subsection.

On/Off Formulation
In this model, each binary decision variable x jt = 1 indicates whether the order j is processed at time t = r j , . . . ,d j , or not x jt = 0. In the same way, the binary decision variables y jt = 1 corresponds to a unit of processed setup of an order j = 1, . . . , n at time t = r j , . . . ,d j − p j . For any order j = 0, . . . , n, the binary decision variable a j takes value 1 if order j is accepted; 0 otherwise. For any pair of orders i, j = 0, . . . , n the binary decision variable u ij equals 1 if order i precedes directly order j. Finally the integer decision variables T j ∈ N represent the tardiness of any order j = 1, . . . , n and C j ∈ N its completion time.
The MILP for the TI On/Off formulation is written as follows.
The objective function (2) is the maximization of the total profit, i.e., the revenue minus the possible tardiness penalties and the environmental costs. Constraints (3) state that at each time the machine is either doing nothing, processing an order or doing a setup operation. Constraints (4) compute the completion times of order j by retrieving the instant t + 1 when the production ends, that is, when x jt = 1 and x jt+1 = 0. Constraints (5) limit the completion time of an accepted order j to its deadline. Constraints (6) refer to the calculation of the tardiness of an accepted order j with its completion time minus its due date. Constraints (7) indicate that an accepted order can have at most a successor. Constraints (8) impose that each accepted order must have a predecessor. Constraints (9) impose to any order j to be processed exactly p j time units. Constraints (10) guarantee nonpreemption by forcing the contiguity of the decision variables x jt . To be more specific, if at time period t order j is produced (x jt = 1), constraints exclude production p j units before and after t in r j , . . . , t − p j and t + p j , . . . ,d j . Implicitly, this means that order j is produced in the interval t − p j + 1, . . . , t + p j − 1. Constraints (11) define the setup operation of at most s ij time units between each order j and its predecessor order i. Constraints (12) determine the continuity of the setup operation while guarantying that it should be done right before the processing of an order. Meaning that if at time t, order j is in setup (y jt = 1), either the setup operation is carried out (y jt+1 = 1) or the production starts (x jt+1 = 1). Constraints (13) establish the precedence relationship between a predecessor order i and the sequence-dependent setup operation of order j. More precisely, if at period t, order j is in setup (y jt = 1) and order i precedes order j (u ij = 1), the order i must be completely processed before. This means that order i is produced at least p i time units from r i + 1 to t − 1, otherwise, the right hand-side is canceled. Constraints (14) establish the precedence relationship between the processing of an order j and its sequence-dependent setup operation s ij . It means that if order j is in production at period t, and order i precedes order j, the order j should have been setup before, during s ij time units, from r j to t − 1. Constraints (15)-(18) ensure that each order cannot be processed or setup before its release date and after its deadline. Finally, constraint (19) forces the dummy order to start the sequence at time 0.

Pulse Formulation
The decision variables z jt in the Pulse model refer to the possible instants t = r j , . . . ,d j − p j + 1 when the order j = 0, . . . , n starts. It means that z jt = 1 if and only if order j starts its production at time period t, and 0 otherwise. Finally, for each pair of orders i, j = 0, . . . , n the binary decision variable u ij equals 1 if order i precedes directly order j. In this formula-tion, the f jt term represents the profit of an order j = 1, . . . , n at period t = r j , . . . ,d j − p j +1 minus the tardiness penalties: The objective (20) is the maximization of the total profit including the tardiness penalties and environmental costs during processing and setup operations. Constraints (21) specify that at each time t, the machine can start at most one job. Constraints (22) indicate that an accepted order has at least a successor. Constraints (23) impose to an accepted order j = 1, . . . , n to have exactly one preceding order. Constraints (24) restrict the starting time of each order to the interval defined from its release date to its deadline minus its processing time. Constraints (25) precise precedence relationship between two orders, guaranteeing that if order i precedes directly order j, its starting time must be defined at least at a period after its release date r j and the setup operation s ij , and after p i the processing of order i. Constraints (26) and (27) prevent each order to be processed before its release date and after its deadline. Finally, constraint (28) forces the dummy order to starts the sequence at time 0.

Performances
This subsection provides performances comparison between each of the presented MILP, the ATI formulation [13] and the disjunctive formulation [12]. Models are first compared in term of spatial complexity, and then a comparative analysis is conducted on two benchmarks B and B of 18 instances each. The benchmark B differs from B on setuptimes values which are all set to 0. The tested instances come from [12] and correspond to instances with n = 10, 15, 20, τ ∈ {0.1, 0.5}, and R ∈ {0.1, 0.5, 0.9}. Table 2 displays the number of variables and constraints of each formulation using Landau notation. Table 2. Spatial complexity of each model.

#Variables #Constraints
As can be seen, ATI formulation is disadvantaged by its number of decision variables. The other formulations have the same number of variables in the worst case. The TI Pulse formulation benefits from fewer constraints than the other formulations with O(n 2 ) constraints, coming from the precedence constraints.
Results of the tests on the benchmarks B and B are reported in Tables 3 and 4 . Each line within the tables corresponds to 6 instances of same n with diverse τ and R values. The number of feasible and optimal solutions found by the models are reported in columns #fea and #opt respectively. Finally, average solving time in seconds (column cpu), average CPLEX gap (column gap), standard deviation for the solving time (column σ cpu ) and standard deviation for the gap (column σ gap ) are presented for each batch of instances. The last line of the tables gives a summary of performances of each model by displaying the total number of feasible and optimal solutions, the average solving time, the average gap and the average standard deviation values across all instances.  In Table 3, models are tested on benchmark B, which is without setup times. The results show that both TI formulations prevail on the other ones in terms of solving time and solution quality. The TI On/Off is even better than the TI Pulse, finding all optimal solutions of B in a minute on average.
According to Table 4, the TI formulations are rapidly overwhelmed in contrast to the ATI and the disjunctive formulations. The solution quality rapidly decreases proportionally to n for the TI models. In addition, both TI models cannot find some feasible solutions of B . Following the spatial complexity analysis, an explanation can be proposed. The TI formulations are limited by the number of constraints and their nature such as big-M constraints to preserve precedence. Even if the ATI model has a polynomial number of variables, it dominates all the MILP, finding almost all optimal solutions.

Heuristic Approaches
As the OAS problem in its basic form is NP-hard [9], heuristic solving approaches have been developed. In Section 5.1, the principle of the FR heuristic is presented and implemented for each of the provided formulations. In Section 5.2, a population-based metaheuristic is described and applied to the considered problem.

Fix-and-Relax Heuristics
According to Absi et al. [42], FR heuristic consists in building iteratively a solution from the consecutive solving of relaxed sub-models (or simplified versions) of the studied problem by fixing the value of decision variables deduced from the previously solved sub-problems.

Principle
FR heuristic procedure involves an Observation Window (OW) of length σ k overlapping δ k+1 periods between two successive steps k and k + 1. In this study, the OW length and the number of overlapping steps remain fixed, thus, for all k σ k = σ and δ k = δ. Two successive steps of the procedure are illustrated in Figure 2.

T
Step k Step Frozen window Observation window Approximation window At each step k, the decision variables are partitioned into 3 sets according to parameters a k and b k : a Frozen Window (FW) comprising variables indexed between 0 and a k − 1, an OW in which variables indexes fall between a k and b k , and an Approximation Window (AW) for variables indexed after b k + 1. At step k > 1, the values of the decision variables within the FW are known and integrated into the submodel Pr k . Moreover, constraints containing decision variables within the OW are completely taken into account, whereas constraints involving decision variables in the AW are either dropped or simplified.
Steps of the FR heuristic are described in Algorithm 1. FR heuristic takes as inputs Pr the problem, σ the OW length and δ the number of periods overlapping.
Initial steps from lines 1 to 3 consist in setting k = 0 and fixing a = 0, b = σ − 1, while the main loop ends when b ≥ T. The loop instructions are described from lines 4 to 11. Line 5, the submodel Pr k is solved. From lines 6 to 8, k, a k and b k are updated as follows : k is incremented by 1, a k is set to b k − δ, and the ending time of the OW b k is incremented by σ − δ steps. Line 9, the if-statement sets b k to T, preventing b k to overflow. Finally, at line 13, the last model is solved.

Adaptation
As the developed models are time-indexed, the partitioning of the decision variables follows the time horizon, distinguishing the FW t = 0, . . . , a k − 1 from the OW t = a k , . . . , b k and the AW t = b k + 1, . . . , T where T is the horizon.
As the considered problem includes sequence-dependent setup times, which contribute to the high complexity of the problem, the heuristic approach aims at estimating a simplified version of the problem with constant setup timess j for orders j that will possibly be produced in the AW. Therefore, the following binary decision variables are introduced in Pr k :  Step k = 0 Step k = 1 Step k = 2 During the procedure at step k, the binary decision variables x jt or z jt such that t = a k , . . . , b k are only considered, which reduce in width the subproblem. This is indicated in Figure 3 when the horizontal axis is graduated. Moreover, the decision variables x jt or z jt associated with already sequenced jobs are evicted from Pr k . This reduces in height the subproblem. The set of non-processed jobs at each step k is denoted as J k . Only the decision variables of the last processed order l in the previous subproblems are kept in Pr k in order to have the correct setup to make the connection with the next job. Thus, the model shrinks in size iteration by iteration. At each step k, the sub-problem model Pr k integrates the following constraint: Constraint (29) evaluates that the production of orders, and their setup operations in the AW cannot last more than T − b k time units. Setup operations of produced orders in the AW have a fixed duration defined bys j . Constraint (29) is, in a way, similar to a time-budget constraint, with an allowance of T − b k units of production and setup.

On/Off Model
The TI On/Off model described in Section 4.1 is adapted to suit to the FR heuristic procedure. The previously introduced decision variables α j and β j are defined by the group of constraints (30), for all j ∈ J k .
Constraints (30a) give value 1 to α j as soon as order j is processed within the OW, i.e., from a k to b k , provided that it can be processed completely before its deadlined j , and 0 if the order is not produced yet. Constraints (30b) ensure that an order j is either starting before the end of the OW (α j = 1) or completely produced after (β j = 1) or is not scheduled at all (α j = 0, β j = 0). Constraints (30c) indicate that the previously processed job in the previous step is fixed in the OW and cannot be processed in the AW.
Constraints (31) state that an order i scheduled in the OW can have at most a successor order j. Constraints (32) ensure that if order i is processed during the OW, it must have exactly a predecessor order j. These reformulated constraints imply that any order i that are scheduled in the AW have u ij values set to zero, as a consequence the right-hand side of constraints (14) and (15) are canceled.
Expressions of constraints (33) and (34) are replaced by Constraints (33) ensure that when an order j starts in the AW (β j = 1), the right-hand side is canceled and thus the y jt are free. This implies that the setup times between order i and j are not forced to last at least s ij time-units.
In the same way, constraints (34) ensure that when an order j starts in the AW (β j = 1), the right-hand side of the constraint is canceled. This means that if an order i (whether β i = 1 or β i = 0) precedes an order j that is known to be processed in the AW, the setup operation order j is not constrained to starts p i units after the processing of order i. The objective function (2) is rewritten as follows, where lb = max{r j , a k } and ub = min{d j , b k }.
In expression (35), the real profit is calculated for any order j ∈ J k that is scheduled in the OW. The profit of orders that are scheduled in the AW is approximated by rβ j e j with r = 0.8 this is the revenue discounted by a rate r. It encourages the model to schedule orders as much as possible in the OW.

Pulse Model
With respect to the Pulse model presented in Section 4.2, the formulation of the heuristic strategy is more direct as this model uses decision variables to represent whether an order starts at a specific time period (z jt = 1) or not (z jt = 0).
In the same manner as the TI On/Off model, the decision variables α j and β j are incorporated into Pr k , taking into account the characteristics of the TI Pulse formulation. Group of constraints (36) gather the definition of the aforementioned decision variables for Constraints (36a) give value 1 to β j if order j starts in the OW. Constraints (36b) link α j and β j , and allow order j to starts in the OW (α j = 1) or starts in the AW (α j = 1) or not starting at all. In the OW, precedence between orders must be considered. Therefore, constraints (25) are rewritten for all i ∈ J k ∪ {l} and j ∈ J k given that i = j, where lb 1 = max{r i , a k }, lb 2 = max{r j + s ij , a k } + 1, ub 1 = min{b k − p i + 1,d i − p i + 1} and ub 2 = min(d j − p j + 1, b k − p j + 1).
Constraints (37a) guarantee that if order i precedes order j, the starting time of order j is at least planned after the setup operation s ij and the processing p i of order i. These constraints examine only orders i and j scheduled before the end of the OW defined by b k , provided that they can be processed. To enforce this, the term E ij defined in Equation (37b) aims at canceling the precedence constraint whenever α i = 0. This helps to eliminate precedence between an order i that does not appear in the OW and any other order, thus improving the solving time.
A truth table is presented in Table 5 in order to enumerate each value that E ij takes depending on the values of β i and β j .
Finally, the constraints (22) and (23) with respect to u ij decision variables are modified accordingly.
Constraints (38) state that an order i starting in the OW should have at least a successor order j. Constraints (39) ensure that if order i is scheduled in the OW, it must have exactly a predecessor order j. This implies that any order that are outside the OW have u ij values set to zero. The objective function is replaced by expression (40), where lb = max{r j , a k }, In expression (40), the real profit is calculated for orders j ∈ J k which are scheduled in the OW. Otherwise, the profit of orders that are planned in the AW is approximated by rβ j f jb k with r = 0.8. That is, the expected profit at the end of the OW with a discounted returns of rate r. It prevents the model to delay orders that can be scheduled in the OW.
An overview of three steps of the FR heuristic procedure for the TI Pulse model on an example of n = 5 jobs is presented in Figure 4. It illustrates how decision variables z jt are managed in the implementation. A two-entry table is utilized to represent these variables for j ∈ J k and t = 0, . . . , T at each step k. The light gray area represents decision variables that are not considered.  4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Step Step Step

Dynamic Island Model
The Dynamic Island Model (DIM) introduced in [14] is applied to the considered problem. DIM is a framework designed as an adaptive operator selection mechanism for an EA in order to solve an optimization problem (S, f ). To resume, this technique allows a better exploration and exploitation of the search space S by reinforcing inertia of the best islands (movements) through a stochastic migration process rewarding the most promising island in terms of best improvement according to f , while maintaining a certain diversity. In this scheme, the population of each island evolves independently with a classical EA. Table 6 presents a summary of the nomenclature used in this paper.

Principle
The general principles of the DIM are presented in Algorithm 2. This algorithm takes as inputs an optimization problem (S, f ), values for parameters α and β, N the number of individual in each island, the number of maximal iterations itmax, the mutation rate p m and the crossover rate p c . The algorithm returns a solution s, which is the best solution among the populations at the end of the iterations.
The first instructions from line 1 to 4 consist in initializing D to zero and M to an equiprobable value for each pair (i, j) of islands. Then, line 3 P i is populated with individuals (greedy, random). In line 4, variable it is set to zero. The main while loop is described from line 5 to 21. This loop breaks when the maximum number of iteration is reached.
A for loop, from line 8 to 14, ranging from 1 to ν is used to apply a generic steady-state EA to each island. Line 5, two individuals p 1 and p 2 from P i are selected in order to provide children from a crossover operator, counting 50% of the population size. Then, children are mutated using the operator o i with a probability of p m . Finally, line 9, the top 50% worst solutions from P i are picked out and replaced by children solutions. analyze(D, f ); This process allows modifying the transitions M ij only for the best islands i. This is possible with intermediates vectors: a reward vector R and a stochastic noise vector N (∑ ν j=1 N j = 1) defined for each island j = 1, . . . , ν. A set B is also defined to store the indexes of the best island according to D.
The transition M ij for each pair of islands (i, j) is then updated as follows.
As stated in [14], only the islands where the individuals obtained significant improvements benefit from a reward, thus reinforcing the best operators. In the calculation of M ij , the parameter α represents the inertia (or exploitation) and β, the amount of noise necessary to explore other search space areas.

Migrate
The migration process involves every individual from each island and the transition matrix M. In this process, a random number r ∼ U (0, 1) is drawn from a uniform distribution for each individual of the population P i . For each destination island j, if r < M ij then the individual is sent to island j and remove from island i.

Analyze
This process retrieves for each pair of islands (i, j) the best fitness among individuals that have migrated from island i to island j in the previous iteration, thus measuring the benefits of the transition i → j.
The 'analyze' process needs the originating island of individuals at previous iteration. Consequently, this information is stored in the solution.
In Figure 5, both the transition matrix M and the population size of each island are represented with a directed graph (DG). Each edge (i, j) from vertex i to vertex j has a weight that corresponds to M ij . As the edges represent stochastic transition between vertices, the sum of the weights of the out-coming edges must be 1. Finally, the vertices in this diagram are proportional to the size of the population.

Solution Representation
A solution s is represented by a sequence {i 1 , . . . , i k , . . . i κ } of size κ ≤ n where i k = 1, . . . , n is the kth order to be scheduled. Moreover, an integer origin = 1, . . . , ν is used to represent the residence island at previous iteration. Associated completion times, C i k , for all k = 1, . . . , κ are computed in the decoding phase ( Figure 6).
Completion times C i k are calculated so that each order i k starts as soon as possible at time period ST i k = max(C i k−1 , r i k ), thus C i k = ST i k + p i k + s i k−1 i k when C i k >d i k the order is rejected.  f (s) Compute completion times Compute objective Figure 6. Decoding a solution by updating completion times.

Mutations Operators
Each island is characterized by a mutation operator. The majority of these operators can lead to the rejection of orders in the sequence. Completion times are thus updated accordingly whenever an operator is applied.

Add
This operator picks randomly a orderā among the rejected order list and inserts it at a random position k in the sequence.

Swap
The operator swap takes randomly two orders i j and i k and swaps their position j and k in the sequence.

Exchange
This operator picks randomly a rejected orderā and an order i k in the sequence and replaces them. That is, the order i k is rejected and the orderā is put at position k in the sequence.

Shift
This operator aims at shifting all the orders at the best starting time in terms of energy cost without causing tardiness or rejection. This operator examines for all order i k the best insertion period by computing exhaustively the energy cost between the starting times ST A and ST B and choosing t best where the energy cost is minimal.
The scramble operator picks two position k and j randomly between 1, . . . , κ and shuffles the sequence between these positions.

Inversion
This operator generates randomly two positions k and j between 1, . . . , κ with k < j. Then, between k and j the sequence is inverted.
For the purpose of efficiency, a version of the operators Add and Exchange that take into consideration the revenue load ratio (RL j = e j p j ) has been developed. For the operator Add, this consists in systematically add the best rejected order according to this ratio. For the Exchange operator, this implies choosing the worst order according to RL and replace it the best rejected order, in accordance with the revenue load ratio. These versions are respectively referred as AddGreedy and ExchangeGreedy.

Crossover Operator
For the considered problem, the crossover operator from in [57] has been utilized. This crossover takes as input two parents solutions p 1 and p 2 . The sequence of p 1 is examined at each position. For each position, orders are successively add up to the children sequence, using a random number to either inserts the order from p 1 or the one from p 2 . To prevent duplicate orders in the child solution, the orders appearing in the children that have already been inserted are not considered (Figure 7).

Initial Population
The initial population is built with 80% of random solution. The other 20% of the population are created with greedy heuristics.

Random Solution
This procedure starts by inserting all orders to the sequence and shuffle it. Then, completion times are calculated.

Earliest Released Job
This greedy heuristic sorts the sequence of orders using the Earliest Released Job rule. Completion times are computed accordingly.

m-ATCS
This greedy heuristic is taken from the work of Cesaret et al. [38]. The procedure starts with L = 1, . . . , n the list of orders, l = 0 the initial scheduled order and t = 0 the starting time. It iteratively inserts order i from L into a sequence s using the Apparent Tardiness Cost with Setups (ATCS) index at time t, knowing the previous scheduled order l. The formula is given in Equation (45). The order i with the largest ATCS(i,l,t) is added in s and erased from L, then t is set to max(t, r i ) + p i + s l,i . This continues until L is empty. Completion times are computed according to this sequence.
In (45),s corresponds to the average setup times andp refers to the average processing times.

Experimental Design
Experimental designs aim at determining levels of influence and interaction of external factors on a process. In this research, both the FR heuristics and the metaheuristic depend on many factors such as the OW length or the setup values for the FR heuristics and such as the mutation, crossover or learning rates for the DIM metaheuristic.
Taguchi method has been chosen to carry out these experimental designs for the proposed approaches. The reasons are that this easy-to-implement method has proven itself to be efficient and robust to tune GA and heuristics in this domain. Details are available in the Appendix A.

Benchmark and Material
The benchmark is from Chen et al. [12] with sequence-dependent setup times. It contains 45 instances with various number of orders n = 10, 15, 25, 50, 100 and two parameters to control the tardiness factor τ = 0.1, 0.5, 0.9 and the due date range R = 0.1, 0.5, 0.9. These parameters aim at having a diverse set of instances. Figures 8 and 9 illustrate the impact of τ and R on instance characteristics. The instances represented share the same processing times p but differ on release dates r, due dates d and deadlinesd depending on the value of τ. In the figures, the possible production span between r j and d j for an order j is indicated as a white rectangle, whereas the penalty interval between d j andd j is indicated as a gray rectangle.  As displayed in Figure 8 , the value of τ determines the dispersion of the release dates (r j ) as well as the production slack, that is, the length of the production interval between release and due dates (d j ). The greater τ is, the more scattered the release dates are and the less flexible the production is. Note that more flexibility increases the combinatorial aspect of the problem.
In Figure 9, the impact of R on the instances is shown. R controls the length of the interval for each order j between due dates (d j ) and deadlines (d j ), that is, the penalty interval. R affects also the slope of the tardiness penalty (w j ). When R is large, the penalty interval is large as well and the tardiness penalty decreases less quickly.
The implementation of the FR heuristics has been made in C++17 using the IBM CPLEX library v.12.10. Moreover, the DIM metaheuristic has been developed in C++17.
The tests have been performed on a desktop computer with Intel i5 2GHz CPU processor and 4GB RAM. Solving time is limited to 3600 s for each instance.
According to the Taguchi analysis performed on the FR TI Pulse, the best setting is the following : σ = 3 × p max , δ = 50%,s j = Max. With regard to the TI On/Off model, the most performant setting is σ = 2 × p max , δ = 25%,s j = Max. For the FR heuristics, the solving time of the subproblem at each step has been limited to nT/3600 s. For the DIM, the best tuning is the following: popsize = 100, α = 0.8, β = 0.1, itmax = 1000, p c = 0.9 and p m = 0.7.

Results
First, results of the FR heuristics TI Pulse and TI On/Off are presented in comparison with the best known solution (BKS) found (either ATI or Chen et al. [12], which is shown in Table A8 in the Appendix A), then with the most efficient FR heuristic and the best and the average performance of the DIM among 10 runs. In Tables 7 and 8, each line represents an instance with its setting n, τ, R and T. Objective value, CPU time (s) and deviation from the BKS (%) are given for each approach. The last line presents the average performances across the benchmark.   Table 7 displays a comparison between the BKS and the two FR heuristics. The FR Pulse heuristic obtains 2% deviation on average from the best solutions found by the models in [12,13], with reasonable solving time. For small to medium instances, the heuristic finds solutions with 13% deviation on average. For large instances from n = 50 to 100, the heuristic finds better solutions than the exact approaches (10/45). This is particularly true on instances with τ = 0.1 and R = 0.1. The FR Pulse heuristic is able to improve up to 91% of the objective. The FR TI On/Off heuristic has 22% deviation from the BKS with an average solving time of 30 s. On average, the solution founds are of lower quality for this heuristic compared to the TI Pulse. However, this heuristic can find solutions with less computational effort (10 times less). The FR TI On/Off finds better solutions than the exact approaches for 7 instances with n = 50, 100 and specific τ and R values.
FR heuristics are better for small values of τ on average. This is probably because the FR heuristic is guided by the accumulation of information in the OW to make selecting and sequencing decisions. When τ = 0.1, orders are available from the beginning, therefore, the local choices (in the OW) better match the global choice and thus the optimal solution. If τ is large, orders are available at different points of the horizon making the decision in the OW much more local. Consequently, the heuristic will possibly be led to make a succession of bad choices, without being able to backtrack. When σ is larger, FR heuristic benefits from more information to make decisions but it increases the computational effort. Table 8 displays a comparison between the BKS, the FR TI Pulse heuristic, and the best and average solution found by DIM. The DIM metaheuristic (best) obtains −14% deviation on average from the BKS for a resolution time of 26 s on average. The tests allowed to find 13 better solutions compared to the MILPs. The robustness of the metaheuristic on small instances (0.2%) is as good as on large instances (0.8%). Finally, the performance in terms of deviation from the BKS is −35% for large instances on average and 0.08% for small to medium instances on average.
In terms of deviation, the DIM metaheuristic dominates all the approaches. However, we notice that the performances depend on the characteristics of the instances. For τ = 0.9, the DIM metaheuristic is less robust and need much more iterations to find the optimal solutions. This can be explained by the non-specificity of the mutation operators to tackle instances with scattered release dates and less flexibility. When n ≥ 50, the FR heuristics become better than MILPs with reasonable solving times. Moreover, a clear improvement is noticeable with instances for which τ = 0.1.

Conclusions and Perspectives
This paper proposes two new mathematical formulations for a rather recent research, that is, the OAS problem with release dates, sequence-dependent setup times, TOU costs and CO 2 emissions periods. The provided MILPs are time-indexed; however, these new exact models are limited to solve medium and large instances in presence of sequencedependent setup times. Without setup, the TI On/Off formulation is the most competitive.
In this context, original FR heuristics that approximate setup are developed, taking advantage of time indexation. Better solutions have been found by these heuristics. According to the results obtained on the state-of-the-art benchmark, the best version of FR heuristic is the one with the TI Pulse formulation. Moreover, in this paper, a population-based metaheuristic is also developed. The latter is based on Dynamic Island Model framework. This procedure can solve small to large instances within half a minute on a personal computer with average performance features.
Future work can be dedicated to the extension of the problem to other systems such as parallel machines or floor shop. Determining mathematical properties of the studied problem can be undertaken. In addition, further analysis can be devoted to the instance settings τ and R and TOU tariffs policy. Moreover, an extensive analysis on other CO 2 emissions reduction policies is an interesting prospect, as this work only focuses on the taxes on carbon emissions. For instance, a limitation on carbon emissions could be incorporated in the formulation.
Extended tests on larger instances n > 100 shall be performed in order to assess the performance of FR heuristics. For instance, Benders decomposition approaches can be developed on the presented time-indexed formulations and compare the performances with the provided FR heuristics. As for the DIM, more specific mutation operators shall be developed in order to tackle instances that are difficult to solve. Other solution representation can also be explored in order to compare it with the sequence-based representation.  Data Availability Statement: The data are available in the following Github repository https:// github.com/worldstar/OpenGA/tree/master/instances/SingleMachineOASWithTOU (accessed on 14 January 2020).

Acknowledgments:
We gratefully acknowledge and express our appreciation to the Editor and the anonymous reviewers who provided feedback.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript:  [58]. It limits the number of parameter settings to be tested to estimate the optimal one, when testing all of them would take a tremendous amount of time. To put things into perspective, when five parameters taking two values, a brute-force approach would consist in testing 2 5 settings on a sample of 15 instances. A high estimate of 3600 s per instance gives a total of 1.7 million seconds to carry out the tests, which is not reasonable. To avoid such thing, Taguchi designs orthogonal arrays, which consider a reduced amount of settings to run. The results of each experiment are converted to a Signal-to-Noise (S/N) ratio in order to estimate the effects of control factors on the data mean and variation. The S/N ratio response table provides the best tuning for the process.
Characterize levels of factors 3.
Select an orthogonal array 4.
Run experiments and collect the response data 5.
Analyze the experimental data 6.
Identify the optimal levels of factors 7.
Validate experiment First, control factors with their range of values are selected. A large catalog of designs are available, as required. For instance, with 4 factors and 2 levels by factor, the proposed design referrers as 2 4 or L 8 , which means a P = 8 runs design. The resulting orthogonal array must be judiciously selected, particularly if factors are dependent. In step 3, the default orthogonal array given by Minitab is chosen. During step 4, experiments are carried out with the proper settings described by the orthogonal array. In each experiment, objective values are collected. As the problem aims at determining factor levels that result in the largest means, the S/N ratio of each experiment e = 1, . . . , P is calculated with the Larger-is-Better formula as follows, with y i the ith objective value and M the sample size: Adjustment have been made to the collected response values in order to properly use the formula described in (A1) and remove statistical biases. First, response data have been smoothed using min-max normalization. Second, as the S/N function takes only strictly positive values, normalized values have been scaled into the interval [1,2].

Appendix A.2. Fr Heuristics
For the tests, two values for the setup times are used, as shown in Table A1. The FR heuristic depends on parameters such as the decision window size σ and the number of overlapping periods δ. The impact of these parameters is studied with σ = 2p max and σ = 3p max (p max = max j=1,...,n p j ) and two values of δ representing 25% and 50% of the length of the decision window σ.
For the FR TI Pulse and the TI On/Off formulations, 3 factors and 2 levels are chosen in the design of experiment. Table A2 presents factors A, B and C and their corresponding description and levels (set of possible values). The orthogonal array for the FR TI formulations is given in Table A3. Each line corresponds to an experiment with its associated settings. For example, in experiment 3, factors A and C have their values fixed at second level and factors B at first level.      Table A6 presents the factors and their levels for the metaheuristic. The response table for S/N ratios for the DIM metaheuristic is displayed in Table A7.