Opportunistic Strategy for Maintenance Interventions Planning: A Case Study in a Wastewater Treatment Plant

Featured Application: The application potential includes any sewage treatment system that makes use of preventive maintenance plans for highly critical assets, and requires the utilization of an optimized grouping strategy to improve the availability of the components involved.
Abstract: Wastewater treatment plants (WWTPs) face two fundamental challenges: on the one hand, they must ensure an efficient application of preventive maintenance plans for their survival under competitive environments; and on the other hand, they must simultaneously comply with the requirements of reliability, maintainability, and safety of their operations, ensuring environmental care and the quality of their effluents for human consumption. In this sense, this article seeks to propose a cost-efficient alternative for the execution of preventive maintenance (PM) plans through the formulation and optimization of the opportunistic grouping strategy with time-window tolerances and non-negligible execution times. The proposed framework is applied to a PM plan for critical high-risk activities, addressing primary treatment and anaerobic sludge treatment process in a wastewater treatment plant. Results show a 26% system inefficiency reduction versus the initial maintenance plan, demonstrating the capacity of the framework to increase the availability of the assets and reduce maintenance interruptions of the WWTP under analysis.


The Relevance of Modern Maintenance Management in WWTPs
As technological development and operational efficiency increase within organizations, the production systems reveal a continuous increase of their complexity levels, while simultaneously consumers, stakeholders, and communities demand higher quality, reliability, and sustainability of the products and services required [1]. The interaction of these phenomena has generated radical changes within organizations, mainly in the way of meeting the demands of the interested parties, orienting themselves toward a holistic and multidisciplinary approach [2]. In this scenario, modern maintenance management has changed from being considered an auxiliary discipline necessary to maintain the operational status of equipment and assets, to becoming an essential strategic business function in many industries, not only for improving the reliability and productivity levels of the production systems involved, but also to increase the profitability of operations [3][4][5].
In the maintenance management context, the utilization of optimization tools during the maintenance planning stage can offer substantial savings regarding Operation and Maintenance (O&M) costs, considering that the potential for cost reduction can reach between 10 and 20% of the maintenance budget, and the latter can even exceed total operational costs in several industries [4,6]. On the other hand, optimized planning of maintenance activities in organizations can offer substantial improvements in the availability and productivity levels of the systems involved [6]; hence the need to devote a considerable effort for the generation of cost-effective, flexible, and fast-deployable maintenance plans, offering wide applicability and adaptability to different industrial settings and environments [3,7].
In the case of the wastewater treatment process, the industry needs to optimize the use of resources to achieve the expected service levels, which necessarily implies the use of efficient management strategies within treatment plants to ensure their survival under competitive industrial environments [8]. At the same time, water quality standards are becoming more and more stringent to prevent the negative impact of wastewater discharges on the environment [9]. Furthermore, this industry faces an additional challenge: service failures and breakdowns do not only establish an operational cost increase, but also represent a potential risk of environmental damage due to its malfunction [8]. Therefore, wastewater treatment plants (WWTPs) must constantly develop efficient managerial strategies, not only in the management of their operational resources and in the application of maintenance plans and policies, but also in the proper management of wastewater, ensuring the quality of service and environmental management of water resources [10,11].

Wastewater Treatment Plants and the Need for Cost-Effective Preventive Maintenance Strategies
Wastewater treatment is essential to protect human health and ensure the environmental sustainability of the planet [8]. Besides the relevance of stabilization and treatment of effluents for human consumption, wastewater treatment-related processes represent a fundamental task in a wide variety of industries such as livestock, chemical, and mining [12]. In response to the industrial and government requirements and protocols, wastewater treatment can be carried out through a wide variety of physical, chemical, and biological processes [13]. In this regard, biological processes, including anaerobic digestion processes, have demonstrated economic advantages over other treatment processes in terms of invested capital and related operational costs [12].
Biosolids that are generated from the primary and secondary wastewater treatment processes are denominated sludge. Generally, municipal sludge contains a wide variety of organic and inorganic materials, including biomass, oils, heavy metals, synthetic materials, and pathogens; for this reason, these solids must be necessarily pretreated before their disposal [13]. Within the wastewater treatment process, the treatment and use of sewage sludge have become an international waste management problem. In this sense, the digestion process in sludge treatment has become the first step in the reuse chain, representing up to 50% of the operational costs in sewage treatment plants [13,14]. Additionally, the sewage treatment process establishes a valuable source of raw materials, especially for obtaining nutrients, energy, and water for human consumption [9].
Within the municipal sludge treatment systems, anaerobic digestion processes have demonstrated wide application and positive externalities as a sustainable technology [15,16]. Anaerobic digestion consists of a series of biological processes in which several microorganisms decompose biodegradable materials in the absence of oxygen, differing in this regard from aerobic treatment systems such as, e.g., activated sludge processes [13,17]. During the anaerobic digestion process, microorganisms convert the organic material into mainly two products: digestate and biogas. The latter is mainly composed of carbon dioxide and methane, which can be used to generate electricity and heat, starting from the decomposition of biodegradable organic materials [18]. By the implementation of digestion processes, the formation of odors is eliminated and pathogens are destroyed, also generating a stabilization of the organic product [19]. For these reasons, anaerobic digestion processes have received increasing attention in several countries and organizations during the past decades [17].
Anaerobic digesters or anaerobic reactors can be used both for sewage pretreatment and sludge stabilization. Upflow anaerobic sludge blanket (UASB) reactors closely resemble those for sewage treatment, except that: (a) recirculation of the sludge and its mechanical agitation are kept to a minimum or omitted; and (b) the reactor is equipped in the upper part with a system for separation of solid-gas phases [20]. This type of digester has been successfully implemented in recent years for the treatment of municipal wastewater, achieving high efficiency in the removal of suspended solids and biodegradable organic matter in countries with warmer climates [21]. In addition, sludge from high-strength wastewaters, for example, is preferably treated by anaerobic processes, thus providing potential for energy generation while producing low-surplus sludge [12].
Although anaerobic digestion processes, in general, have demonstrated a compact design and positive performance, they have evidenced certain difficulties, e.g., in providing complete stabilization in the case of high-strength wastewater treatment. For this reason, several alternatives have been proposed to provide a cost-effective and efficient treatment for the management of sewage sludge, such as hybrid systems which combine anaerobic and aerobic processes simultaneously [12]. Alternatively, wastewater treatment plants must respond to an effective and efficient application of their maintenance plans, especially on preventive maintenance (PM) activities, considering the environmental impact that an operational malfunction may eventually cause. In this sense, several studies confirm that the lack of optimal PM plans in industries tends to increase the costs of repairable assets, deteriorating the productivity levels, and entailing social and environmental risks [22][23][24].
As in many reported industrial cases, sewage treatment plants have the potential to improve cost-efficiency through an optimized strategy for the execution of PM plans, considering that WWTP maintenance costs may represent 25% of total operational costs [8]. Additionally, the maintenance plans designed for complex systems address a wide variety of maintenance activities, each with its specifications, execution periods, and technical requirements. In this context, the application of PM activity grouping strategies can be particularly useful, representing a cost-efficient solution for multi-component production systems with high economic dependence, and where the costs associated with scheduled and non-scheduled system stoppages (i.e., system inefficiency costs) are significantly higher than the individual execution costs of the maintenance plan (i.e., direct execution costs).

Maintenance Grouping Strategies and Literature Review
In the maintenance planning context, the grouping strategy establishes a time span over which a joint execution of a certain number of maintenance activities is planned (See e.g., [25,26]). Through the application of the grouping strategy, it is possible to reduce the system setup costs related to the preparation, assembly, or disassembly of components when performing maintenance tasks. In the case of multi-component series systems, the grouping strategy also allows increasing system availability and consequently improving efficiency and productivity levels, while reducing the number of interruptions and downtime service [27]. In this regard, the scientific literature shows a wide application of this type of strategy in several industries, addressing the technical features of the productive environments, as well as the complexity and interaction between its components. In this sense, Thomas [28] originally proposed three types of multi-component system dependencies, namely: economic, structural, and stochastic. In this way, it is possible to classify the strategies proposed in the scientific literature, depending on the interactions addressed in each particular case.
The economic dependence in a production system establishes that the joint execution of maintenance activities is economically more efficient than their individual executions. In this way, the grouping strategy implies a reduction in total maintenance costs. The vast majority of research related to the application of these strategies seeks to exploit this phenomenon in economic-dependent systems (see Hameed and Vatn [29], Zhou et al. [30], Xia et al. [31], Van et al. [32], Do et al. [26], Pandey et al. [33], Zhu et al. [34], Wu et al. [35], Zhou and Yu [36]). Addressing economic dependency, Mena et al. [46] develop a framework for planning PM activities, considering the use of time-window tolerances for the advance or delay in the execution of activities under a controlled risk zone, facilitating the creation of grouping schemes in a continuous-time planning horizon. Considering the close strategic relationship between operational and maintenance functions, recent research extends the grouping problem to a joint optimization approach for operational and maintenance management (see e.g., Nourelfath and Châtelet [37], Xiao et al. [38], Bertolini et al. [39], Zhou and Shi [40], Chen et al. [41]).
In several production systems, the scientific literature in the field has shown the presence of different types of interactions or dependencies, which cannot be omitted when developing and implementing an efficient grouping strategy. For this reason, recent research in the field addresses several dependency types in complex systems. Structural dependence, on the one hand, includes the requirements for replacement, repair, and assembly of other work units to carry out the maintenance activity on a certain component [28]. In this sense, Dao and Zuo [42] address the existence of structural dependence for the evaluation of a selective grouping and maintenance strategy in multi-state multi-component series systems. On the other hand, stochastic dependence establishes the existence of lifetime correlation and deterioration between the different components of the production system. Recently, the research developed in Vijayan and Chaturbedi [43] simultaneously addresses the economic and stochastic dependence in multi-component systems, while research works such as Lu and Zhou [44], and Fan et al. [45] address the stochastic dependence phenomenon by modeling the deterioration of the components.
In the context of sewage treatment plants, some studies have addressed the impact of maintenance and renewal policies on the performance and efficiency of the plants, revealing a strong relationship between the management and application of said policies, the proportion of O&M costs, and the productivity levels. Hernández-Chover et al. [8] assess the effect of maintenance strategies on the efficiency of the facilities for a sample of WWTPs, establishing a correlation analysis between the non-parametric efficiency indicator data envelopment analysis (DEA) and the maintenance cost items. The results show that higher investment in preventive maintenance offers not only a reduction in total repair costs, but also a positive effect on overall efficiency indicators. Similarly, Ozgun et al. [9] propose the determination and analysis of cost functions in several full-scale sewage treatment plants, disaggregating capital costs (investment and equipment), and O&M costs on a sample of facilities in Istanbul. The study establishes, corroborating the results of previous studies, that higher treatment levels increase O&M costs, especially in tertiary treatment plants, reaching 58% of the total costs. This fact could be partly explained by the number and complexity of the components required for tertiary treatment. Additionally, O&M costs are fundamentally composed of energy supply costs and labor costs, reaching 39% and 31% of the total costs in tertiary treatment plants, respectively. These research works reveal the relevance of both energy cogeneration and the implementation of maintenance strategies to improve the cost-efficiency of wastewater treatment facilities.

Research Contribution
Despite the extensive amount of research in the field, to the best of our knowledge, there are no reported case studies on the application and implementation of preventive maintenance grouping strategies in sewage treatment plants. However, recent studies reveal a positive impact of the application of maintenance policies in the industry, establishing a high correlation between investment in preventive maintenance, savings in energy consumption, and increases in global efficiency indexes [8]. In this way, the research gap reveals two fundamental aspects that this study seeks to exploit: (1) the absence of case studies that address the application of maintenance grouping strategies in the water treatment industry; and (2) the need for adaptive, flexible, and fast implementation strategies of PM plans in wastewater facilities in response to efficiency, productivity, and quality of service requirements. In summary, this research proposes a new strategy to improve cost-efficiency and asset availability in wastewater treatment plants, with a focus on the implementation of the grouping framework during the maintenance planning and scheduling stages.
The framework presented below considers the extension of the formulation originally presented in Mena et al. [46], addressing the application of new criteria to improve the applicability and adherence of the strategy in industrial environments, through the incorporation of time-window tolerances for the creation of opportunistic grouping schemes and non-negligible PM execution times. Following the research line, the framework proposed in this article seeks to improve the availability level of complex production systems, reducing the number of system interruptions, while minimizing maintenance costs through the distribution of stoppage costs for PM activities.
The article is structured as follows: Section 2 presents the established Methodology for the application of the opportunistic grouping strategy; Section 3 presents the results obtained under the application of the strategy; Section 4 discusses the results and their impact on the analyzed case study; Section 5 establishes recommendations, limitations, and future research in the field.
Research gaps found in the literature:
• Absence of case studies addressing formulation and implementation of maintenance grouping strategies in the wastewater treatment industry;
• New and tailored strategies for an adaptive, flexible, and fast implementation of PM plans in the wastewater facilities, seeking to improve cost-efficiency and quality requirements.

Main contributions of the study:
• Presentation of a novel case study for the implementation of preventive maintenance execution strategy in wastewater and sludge treatment facilities;
• Framework formulation and computational optimization of a grouping strategy for preventive maintenance activities, seeking to minimize the number of planned interruptions, fixed maintenance costs, and system downtime;
• Adaptation of the grouping strategy to the case study under analysis, in response to the requirements of maintenance cost reduction, productivity, and quality of service required on wastewater facilities;
• In-depth discussion of results and recommendations with a focus on managerial insights.

Materials and Methods
The research methodology addresses the following essential aspects: the presentation and context of the sludge and wastewater treatment process in the case study; the formulation of the optimization model for the opportunistic strategy; computational implementation and programming settings; and the discussion of the proposed performance indicators.

Wastewater Treatment Process: Case Study
The research addresses the study of the primary and sludge treatment in a WWTP for human consumption. The first stage considers a preliminary treatment, which consists of the removal of coarse and fine solids, and sand. The preliminary treatment is followed by the sedimentation process, removing approximately 40% of biochemical oxygen demand (BOD) and 50% of total suspended solids (TSS). After the primary treatment by flocculation and settling process, the water effluent continues its course to the secondary treatment. At the same time, the WWTP also considers the biological treatment of sludge generated from the lamellar settling stage, going through an anaerobic digestion process, until its final disposal in a storage silo to become a sanitary landfill. The primary wastewater treatment process is set up from the stages described in Figure 1. In the WWTP under study, it is possible to consider three fundamental processing lines: wastewater, biogas, and sludge treatment. The sludge is produced as waste from the primary treatment phase and is transformed to treated sludge or safe biosolids, through three main processes: sludge thickening, anaerobic digestion, and dehydration using centrifugal pumps for proper handling and final disposal.

Effluent Treatment Process
The water flow mainly comprises the primary treatment process. The objective of this stage is to remove the heavy solids from the stream, as well as eliminate possible fats and oils spilled on the wastewater on its way to the plant. During this stage, the inlet flow measurement to the plant is first established. Later, the retention of the bulky waste takes place before the impact grates, to go through the roughing and sieving processes. Once the high granulometry particles have been eliminated, the effluent continues its course toward grinding and degreasing. The residues are extracted and diverted to the fat concentrator and the sand classifier. Finally, the watercourse drifts to the mixing and flocculation chambers, concluding in the primary sedimentation based on lamellar settlers. The flow is measured again at the inlet of the settler, where the final effluent is available for secondary treatment.

Sludge Treatment
The sludge treatment line contemplates the sifting, thickening, stabilizing, and dewatering of the sludge from the sedimentation or primary clarification process. This stage is based on the biological treatment of sludge, through the anaerobic digestion process. After removing the waste through a sieving process, the sludge is pretreated by a thickening process, in order to obtain thickened sludge with an appropriate concentration for entering the sludge digesters. The purpose of the thickening process is to reduce the volume of sludge by partially removing useful water. In the thickener, the sludge is agitated and remains in the chamber for a prolonged period, where the heavier particles settle at the bottom of the thickener, separating the useful water from the sludge, the former being extracted by mechanical suction.
The sludge is subsequently directed to the anaerobic digesters for its stabilization. This stabilization is achieved through a biological procedure that allows degradation of the remaining organic matter, utilizing a biological fermentation carried out in a vacuum chamber [13]. A variety of gases are obtained from the chemical reaction processes involved in fermentation, mainly methane and carbon dioxide. To improve the mesophilic stabilization process, the sludge is preheated at the inlet of the digester to maintain a temperature of 35 °C. For this purpose, the sludge building contains a system of hot water boilers, spiral water-sludge heat exchangers, and horizontal centrifugal pumps to propel the hot water to the exchangers. Finally, the stabilized sludge is stored in a buffer tank and later dried using decanter centrifuges. The dewatered sludge is finally transported for its storage in silos.

Gas Processing and Power Generation
Sewage treatment comprises high energy-consuming processes, whose electricity consumption contributes up to 30% of total operating costs, and the treatment and disposal of sludge constitute about 20% of the total operating costs [47]. From the total electricity consumption of the plant, 35% is used for the treatment of the sludge and its disposal [48]. A considerable part of the thermal energy is required to manage the sludge produced during the digestion process. The biogas generated in the anaerobic digestion process can be used to generate combined heat and power (CHP) for energy cogeneration [48].
During the sludge stabilization process, biogas is obtained as a result of a methanogenesis reaction, which can be used for the production of electrical energy and the consumption of the plant. In the gas processing line, biogas is produced, stored, treated, and the obtained energy is reused in the combustion process for energy use within the treatment plant [49]. In this sense, the yield of the biogas and the digested matter are interdependent: while the former increases, the latter decreases. Additionally, the organic solids generated in the anaerobic digestion process can be used as fertilizers, so their agricultural application can be more environmentally benign [48].
A portion of biogas is used during the digestion of sludge, as fuel for the generation of electrical energy in the plant. The remaining biogas is used in the sludge heaters that maintain the digester at mesophilic temperature, supplying energy to the heat exchanger. As the sludge heats up and enters the digesters, the water cools and can be returned to the engine cooling system. Finally, the odors are extracted through a network of ducts that connects the processes with the deodorization system. Figure 2 summarizes the different assets involved in the anaerobic digestion process within the WWTP.
Based on the primary treatment and sludge treatment described in Section 2.1, and the application of a criticality analysis of the assets, it is possible to identify fifteen (15) maintenance activities that are critical and highly hazardous to the operation of the WWTP. These activities correspond to different essential tasks for the operation of the different areas and processes involved in wastewater treatment. To carry out each of these activities, it is necessary to stop the continuous process in the wastewater primary treatment or in the sludge treatment, which involves a loss in the availability of the assets involved. The information on the preventive maintenance plan is summarized in Table 1.
The grouping strategy proposed below considers an extension of the research originally presented in Mena et al. [46], which incorporates the use of time-window tolerances for the creation of grouping schemes during the planning stage of preventive maintenance activities. The extended formulation, as presented below, considers the incorporation of new applicability criteria to improve adherence and adaptation of grouping strategies under several industrial environments, including wastewater treatment plants.
Consider an arbitrary set $I$ that includes a total of $|I|$ maintenance activities to be planned on a fixed planning horizon $T$, characterized by a continuous-time analysis and set by the planner according to scheduling requirements. Consider $J_i$ as the set of total executions to be scheduled for each $i \in I$. Considering that the most common scenario is to establish a fixed frequency for each PM activity $i \in I$, the framework contemplates a fixed execution periodicity $T_i < T$ and known fixed execution times $p_i$. As in the base formulation presented in Mena et al. [46], the framework establishes a time window over which each activity can delay or advance its execution start time $t^{s}_{i,j}$ by a maximum of $w_i = e \cdot T_i$ time units, where $e$ represents a tolerance factor, expressed as a percentage. In this way, the length of the time window $w_i$ depends exclusively on the periodicity associated with each PM activity $i$, and the start time can be modified with respect to its tentative execution date within that window. The use of tolerances therefore seeks to generate simultaneous executions of maintenance activities, thus establishing grouping schemes throughout the planning horizon (see Figure 3a). The framework defines the existence of a work package when two or more executions $(i, j)$ and $(i', j')$ satisfy the following condition:
$$t^{s}_{i,j} = t^{s}_{i',j'} \qquad (2)$$
This means that a work package is necessarily formed if two or more selected activities are executed in parallel and at the same start time. Therefore, the grouping definition stated in expression (2) establishes that the time windows of the executions $(i, j)$ and $(i', j')$ must overlap to generate new grouping schemes. Formally, two activities can be grouped only if their time windows, as given by the time-window tolerance definition, have a non-empty intersection. The definition of grouping schemes presented offers two main advantages.
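The time-window definition and the overlap requirement can be illustrated with a short sketch. It assumes that the window is centered on a tentative execution date supplied by the planner; the names `Activity`, `time_window`, and `windows_overlap`, as well as the numerical values, are hypothetical and are not part of the original formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    period: float    # T_i, fixed execution periodicity
    duration: float  # p_i, fixed execution time


def time_window(tentative_start, activity, tolerance):
    """Return (earliest, latest) start, using w_i = e * T_i.

    Assumption: the window is symmetric around the tentative date.
    """
    w = tolerance * activity.period
    return (tentative_start - w, tentative_start + w)


def windows_overlap(win_a, win_b):
    """Two executions can share a start time only if their windows intersect."""
    return max(win_a[0], win_b[0]) <= min(win_a[1], win_b[1])


if __name__ == "__main__":
    # Hypothetical data: periods and durations in days, tolerance e = 10%.
    pump = Activity("pump", period=30.0, duration=4.0)
    valve = Activity("valve", period=45.0, duration=6.0)
    win_pump = time_window(60.0, pump, 0.1)
    win_valve = time_window(66.0, valve, 0.1)
    print(win_pump, win_valve, windows_overlap(win_pump, win_valve))
```

In this hypothetical instance, the two windows share a common interval, so both executions could be moved to a common start time and form a work package.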
On the one hand, simultaneous execution of PM activities allows sharing fixed maintenance costs in economic-dependent systems, reducing the number of system interruptions while reducing downtime in multi-component series processing lines. On the other hand, the definition reduces the complexity of the problem arising under other overlapping conditions, e.g., in the case where $t^{s}_{i,j} \neq t^{s}_{i',j'}$ and the two executions still overlap partially in time. As a general rule, it is satisfied that $p_i \neq p_{i'}$. Therefore, the existence of grouping schemes implies that $t^{f}_{i,j} - t^{s}_{i,j} \geq p_i$, given the difference between the repair times of the activities involved (see Figure 3b). It is thus necessary to distinguish the unavailability time $r_{i,j}$ from the individual execution time $p_i$ for each of the executions $(i, j) \in I \times J_i$. This is necessary considering the relationship between the completion time and the start time of the same activity, i.e., $t^{f}_{i,j} - t^{s}_{i,j} = r_{i,j} \geq p_i$. In this way, the unavailability time for each execution $(i, j)$ is determined by the occurrence of the grouping schemes, expressed through $z_{i,j,i',j'}$, a binary grouping variable which takes the value of 1 if the pairs $(i, j)$ and $(i', j')$ form a work package (or equivalently, an opportunistic group). As indicated above, the resumption time of a finished execution can be modified due to the formation of grouping schemes, which affects the future planning of the execution times. Besides, considering that $t^{s}_{i,j}$ and $t^{f}_{i,j}$ depend on $r_{i,j}$, which in turn depends on the grouping decision (i.e., the value of the binary variable $z_{i,j,i',j'}$), it is not possible to determine a priori the set of values $\{t^{s}_{i,j}, t^{f}_{i,j}\}_{i \in I,\, j \in J_i}$. Therefore, considering that $J_i$ must be a known parameter of the model, the number of scheduled executions is determined in advance using the value $r = \max_{i \in I}\{p_i\}$.
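The distinction between $r_{i,j}$ and $p_i$ can be sketched numerically. The snippet below assumes, as one possible reading of the parallel-execution setting, that every execution in a work package remains unavailable until the longest activity of the package has finished; the function name `unavailability_times` and the data are hypothetical.

```python
def unavailability_times(p, executions, packages):
    """Return r[(i, j)] for every execution.

    p          : dict activity -> individual execution time p_i
    executions : list of (i, j) tuples
    packages   : list of sets of (i, j) tuples executed in parallel

    Assumption: within a package, r equals the longest p_i of its members;
    an ungrouped execution keeps r = p_i.
    """
    r = {(i, j): p[i] for (i, j) in executions}
    for package in packages:
        longest = max(p[i] for (i, _) in package)
        for execution in package:
            r[execution] = longest
    return r


if __name__ == "__main__":
    # Hypothetical execution times in hours.
    p = {"A": 4.0, "B": 6.0, "C": 2.0}
    executions = [("A", 1), ("B", 1), ("C", 1)]
    packages = [{("A", 1), ("B", 1)}]
    print(unavailability_times(p, executions, packages))
```

Under this assumption, the relation $r_{i,j} \geq p_i$ holds for every execution, with equality whenever the execution is not grouped or is the longest member of its package.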
Once $J_i$ has been determined, it is then necessary to determine the feasibility of the groupings between the different activities that compose the PM plan. Considering that the time-window tolerances impose grouping feasibility constraints, a search algorithm is proposed in order to explore feasible grouping schemes, thus improving the processing time of the optimization model. The final steps of the search Algorithm 1 are presented below:

    If N_p ≠ ∅:
        For each (i, j, n, o) ∈ N_p:
            Add (i, j, n, o) to M
    For each (i, j, n, o) ∈ M:
        If (i, j, n, o) ∉ N:
            Add grouping pair (i, j, n, o) to N
    Output: N set

The algorithm begins by generating a set of arbitrary values $\{r_{i,j}\}_{i \in I,\, j \in J_i}$, obtained from a sample of individual execution times $\{p_i\}_{i \in I}$. Each set of values establishes an arbitrary instance for the activities and executions $\{(i, j)\}_{i \in I,\, j \in J_i}$, from which it is possible to obtain each feasible grouping pair $\{(i, j, i', j') : i, i' \in I;\ j \in J_i,\ j' \in J_{i'}\}$ for a given instance. The sample for each scheme can be approached through an exhaustive search process, iterating over the Cartesian product $R = P^{C}$, where $P = \{p_i : i \in I\}$ and $C = \sum_{i \in I} |J_i|$, obtaining in this way all possible grouping combinations. Each scheme allows the construction of the set $N_i$. In turn, the set $N_p$ contains all the information on the feasible grouping pairs for each arbitrary iteration instance $\{r_{i,j}\}_{i \in I,\, j \in J_i}$. Once all the possible grouping pairs $(i, j, i', j')$ have been determined for each instance, the tuples are incorporated and stored in the definitive set of grouping schemes $N$. According to Mena et al. [46], the set $N$ defines the solution space (feasible grouping pairs) for the clustering binary decision variables $z_{i,j,i',j'}$.
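The exhaustive search over $R = P^{C}$ can be sketched as follows. This is a minimal illustration and not the authors' Algorithm 1: it assumes that the tentative start of each execution equals the finish time of the previous execution of the same activity plus $T_i$, and that a pair is feasible when the two time windows intersect. The helper names and the data are hypothetical.

```python
from itertools import product


def tentative_starts(periods, n_exec, r):
    """Assumed rule: tentative start = previous finish + T_i."""
    starts = {}
    for i, period in periods.items():
        previous_finish = 0.0
        for j in range(1, n_exec[i] + 1):
            starts[(i, j)] = previous_finish + period
            previous_finish = starts[(i, j)] + r[(i, j)]
    return starts


def feasible_pairs(periods, starts, tol):
    """Pairs of executions of different activities whose windows intersect."""
    keys = sorted(starts)
    pairs = set()
    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            (i, j), (n, o) = keys[a], keys[b]
            if i == n:
                continue
            # Windows [s - w, s + w] intersect iff |s_a - s_b| <= w_a + w_b.
            reach = tol * periods[i] + tol * periods[n]
            if abs(starts[(i, j)] - starts[(n, o)]) <= reach:
                pairs.add((i, j, n, o))
    return pairs


def search_grouping_pairs(periods, durations, n_exec, tol):
    """Union of feasible pairs over every instance in P^C."""
    executions = [(i, j) for i in periods for j in range(1, n_exec[i] + 1)]
    values = sorted(set(durations.values()))  # the set P
    N = set()
    for combo in product(values, repeat=len(executions)):
        r = dict(zip(executions, combo))
        N_p = feasible_pairs(periods, tentative_starts(periods, n_exec, r), tol)
        if N_p:
            N |= N_p  # add only pairs not yet stored in N
    return N


if __name__ == "__main__":
    periods = {"A": 30.0, "B": 58.0}   # hypothetical T_i, in days
    durations = {"A": 2.0, "B": 4.0}   # hypothetical p_i, in days
    n_exec = {"A": 2, "B": 1}          # hypothetical |J_i|
    print(search_grouping_pairs(periods, durations, n_exec, tol=0.05))
```

The number of instances grows as $|P|^{C}$, so a sketch of this kind is only practical for small plans; in the hypothetical data above, the pair formed by the second execution of A and the first execution of B is feasible in only some of the instances, which is why the union over all instances is stored.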
Therefore, the presented framework seeks to address two fundamental planning decisions: the grouping schemes (i.e., the optimal set $\{z_{i,j,i',j'}\}_{(i,j,i',j') \in N}$), and the execution time scheduling of the activities (i.e., the optimal set $\{t^{s}_{i,j}, t^{f}_{i,j}\}_{j \in J_i,\, i \in I}$), considering the use of time-window tolerances. In this regard, a conservative approach is taken for the use of the aforementioned tolerance, to maintain as much as possible the tentative PM execution plan. This approach is applied considering that the advancement or delay of the executions implies incurring suboptimal economic performance (advancing), or a greater occurrence of failures (delaying) [46]. Considering the assumption that the system unavailability corresponds exclusively to the occurrence of PM activities (i.e., system downtime is the exclusive result of repair times associated with scheduled activities), the objective function corresponds to the minimization of system unavailability, or equivalently, the time out of service. The framework will be modeled through the MILP paradigm for optimization, considering the following assumptions:
General assumptions for the integrated framework:
• The system contemplates a multi-component series arrangement, where each preventive maintenance activity involves a system shutdown;
• The impact of stochastic and structural interactions on the system can be neglected;
• The system considers a uniform continuous (24 h) operational regime;
• The system considers a two-state operational condition (i.e., the system is necessarily under repair or under operation);
• The execution of grouped PM activities always involves a simultaneous/parallel execution;
• The execution of the PM activity returns the related equipment to its initial operating conditions (perfect maintenance).

An Optimization Model for the Maintenance Grouping Framework
Based on the proposed framework, the optimization model considers the decision variables and parameters listed in Table 3.
The optimization model is formulated under the mixed-integer linear programming (MILP) paradigm. It includes a set of constraints for the use of time-window tolerances in the opportunistic grouping scheme. Constraints (9) and (10) limit the execution time according to the width of the time window specified for each execution $(i, j)$. Constraints (11)-(16) adapt the base formulation to the grouping strategy with tolerance time windows, according to Mena et al. [46].
To correctly quantify the downtime associated with each execution pair $(i, j)$ when grouping schemes are present, the following additional sets of constraints must be incorporated into the base formulation. Constraints (17)-(19) first establish suitable lower bounds for $r_{i,j}$. If a certain execution $(i, j)$ belongs to a work package, constraints (17) and (18) provide the corresponding lower bound; otherwise, constraint (19) ensures that $r_{i,j} \geq p_i$. Constraints (20)-(22) establish several upper bounds that control the growth of the variable $r_{i,j}$ through the auxiliary binary variables $u_{i,j}$ and $v_{i,j}$, avoiding an overestimation of the unavailability associated with each execution. Constraint set (23) ensures that, among the variables $u_{i,j}$, $v_{i,j,k,l}$, and $v_{i,j,n,o}$, only one is activated for each pair $(i, j)$, which establishes precisely the tightest upper bound, so that expression (27) is translated into expression (28). In this way, the set of constraints (17)-(23) allows a correct assignment of the downtime values through the variable $r_{i,j}$.

Constraints (24) and (25) limit the growth of these variables in the absence of grouping schemes, so that $v_{i,j,n,o} = 0$ if $z_{i,j,n,o} = 0$. Therefore, constraints (16)-(25) generate the following behavior: if the tuple $(i, j)$ is grouped, its associated downtime corresponds to expression (28); otherwise, $r_{i,j} = p_i$. Additionally, constraint (26) is proposed so that the unavailability associated with each work package is counted only once. Finally, the objective function corresponds to the minimization of expression (8). Since the counting of $d_{i,j}$ corresponds exactly to the unavailability of the system, minimizing expression (8) minimizes the system's downtime.
In summary, the model formed by the objective function (8) and constraints (9)-(26) corresponds to the optimization model for the opportunistic grouping framework proposed in Section 2.2.1.
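The downtime accounting that these constraints enforce can be illustrated outside the MILP with a small sketch. It assumes, in line with the parallel-execution assumption, that all executions of a work package start together, so the package downtime equals its longest execution time and is counted once. The function names and the sample durations are hypothetical, and the sketch does not reproduce expressions (27)-(28).

```python
def package_downtime(durations):
    """Downtime of one work package executed in parallel: its longest task."""
    return max(durations)


def total_downtime(packages):
    """Add package downtimes, counting each work package only once."""
    return sum(package_downtime(p) for p in packages)


# Hypothetical execution times p_i, in hours.
ungrouped = [[8.0], [5.0], [3.0]]   # three separate stoppages
grouped = [[8.0, 5.0], [3.0]]       # the first two executions share one stoppage

print(total_downtime(ungrouped))  # 16.0
print(total_downtime(grouped))    # 11.0
```

An ungrouped execution is simply a package of size one, which matches the case $r_{i,j} = p_i$ described above.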

Computational Implementation and Optimization Settings
The framework presented in Section 2.2.3 follows the MILP mathematical programming paradigm, which is widely used in optimization problems, especially in operations and maintenance management, including the planning and scheduling phases, logistics, distribution, and multi-product optimization [50]. In decision-based optimization problems, it is common to incorporate not only continuous variables but also discrete variables and non-linearities that can be replaced by piecewise linearizations [50]. In these cases, MILP models are solved with linear-programming-based Branch and Bound solvers, which, in addition to generating strict upper and lower bounds, provide performance metrics on the optimality of the solutions obtained, while improving computing times and decreasing computational costs by pruning the decision tree of integrality constraints [51]. These algorithms are therefore characterized by deterministic optimal solutions obtained from exact parametrizations only.
Despite the strengths of these algorithms, highly complex problems, including maintenance activity grouping problems [52], can produce a large number of branches in the decision tree, reducing the quality of the upper and lower bounds and therefore yielding a suboptimal solution [50]. However, commercial engines such as CPLEX or Gurobi combine Branch and Bound algorithms with Branch and Cut techniques. The programming stages followed to solve the MILP model with a commercial optimization engine are described in detail below:

1. Definition of instance sets and parameters: In the first stage, the script defines the parametric set that characterizes the instance presented through the PM plan in Section 2.1. This implies incorporating information such as the number and frequency of activities, the execution tolerance factors, and repair times per activity.

2. Embedding MILP solver: The Gurobi solver is called from the Python programming language to optimize the model and obtain the results. To do this, a general optimization instance is defined, onto which the sets of constraints that make up the formulated framework are loaded. The optimization engine is executed under standard search settings, aiming at an intensification strategy that improves the optimality of the solution found at the expense of higher computing times.

3. Determination of performance indicators and rendering process: Once the best solution found by the Gurobi solver has been obtained, the results are compiled and stored in sorted lists, from which each performance indicator is computed according to the definition specified in Section 3. To calculate these indicators, specialized programming packages are used to manipulate the data as real intervals, allowing the union and intersection of sets in the system unavailability estimation. The graphic material is then rendered in Python, using the information stored from the compilation.
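The interval-based estimation of unavailability mentioned in the third stage can be sketched with plain Python. This is only an illustration of the union-of-intervals idea; the specialized interval package used in the study is not named in the text, and the function names and sample schedule below are assumptions.

```python
def merge_intervals(intervals):
    """Return the union of closed intervals as a sorted list of disjoint intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching stoppage: extend the current block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def unavailability(intervals):
    """Total length of the union, i.e., time during which the system is down."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical (start, finish) times of three executions, in hours.
schedule = [(0.0, 8.0), (2.0, 7.0), (20.0, 23.0)]
print(merge_intervals(schedule))       # [(0.0, 8.0), (20.0, 23.0)]
print(unavailability(schedule))        # 11.0
print(len(merge_intervals(schedule)))  # 2 stoppages
```

Overlapping executions contribute to the downtime only once, and the number of merged blocks gives the number of system stoppages.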

Results
The computational results were obtained on a 2.6 GHz processor with 8 GB of built-in RAM and a 64-bit architecture. The implementation was carried out using the Gurobi optimization solver, accessed through the Pyomo modeling language in Python. Different performance indicators were considered to measure and quantify the results of each maintenance plan under several tolerance factors. In addition to the system availability and related indicators, it is also relevant to measure the effective use of the tolerance time windows with respect to the estimated optimal grouping scheme. For this purpose, the percentage indicators of average advancement $A_{AV}$ and average delay $D_{AV}$ of the activities are defined in (29)-(30), where the sets $IJ^{-}$ and $IJ^{+}$ exclusively consider the early or late execution pairs $(i, j)$, respectively. Additionally, the percentage frequencies of advanced, delayed, and conservative executions of PM activities are reported as auxiliary indicators for each instance. These performance indicators are defined through expressions (31)-(33).

Given the complexity of representing the instances explicitly, the results of the computational implementation under several tolerance factors are summarized in Table 4. For the implementation of the grouping strategy, an annual planning horizon (52 weeks) and a maximum tolerance factor of 5% were considered, according to the standard percentages proposed in [46]. According to the reported optimality gap, the optimization model reached optimal solutions for every instance. At the same time, a consistent increase in the tolerance factor yields a progressive decrease in system unavailability. The results show the capacity of the formulated framework to generate new grouping schemes as the tolerance factor increases.
This observation is corroborated by the progressive decrease in the number of activities carried out on their tentative date. In contrast, the performance indicators $A_{AV}$ and $D_{AV}$, as well as $f_A$ and $f_D$, do not show a systematic behavior with respect to the tolerance factor. Additionally, the results show that the computing time of each optimal solution increases with the tolerance factor. This is directly related to the increase in the number of feasible grouping schemes as the time-window tolerances widen for each execution pair, since a greater number of grouping schemes implies a greater number of decision variables. However, the computing times do not exceed 1 s, so solving the grouping schemes to optimality in reasonable computational times is not a limitation for this instance.

Table 5 presents the unavailability by scenario and the reduction of inefficiency, measured as the percentage difference between the system unavailability of the initial solution with null tolerance factor (i.e., the tentative execution plan) and the unavailability of the optimal solution. The reduction of inefficiency increases progressively with the tolerance factor, reaching a 26% reduction in system downtime under a 5% tolerance factor, which lowers unavailability from approximately eight weeks to six weeks. The marginal gain, however, decreases as higher tolerance factors are reached. For example, raising the allowable tolerance from 2% to 3% increases the reduction of inefficiency by almost 6%, while raising it from 4% to 5% adds only 0.2%.
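The inefficiency-reduction measure described above is a simple percentage difference, which can be written as the following sketch. The function name is hypothetical, and the example uses the rounded figures of eight and six weeks quoted in the text, so it yields 25% rather than the 26% obtained from the unrounded values of Table 5.

```python
def inefficiency_reduction(base_unavailability, optimized_unavailability):
    """Percentage reduction of unavailability relative to the tentative plan."""
    if base_unavailability <= 0:
        raise ValueError("base unavailability must be positive")
    return 100.0 * (base_unavailability - optimized_unavailability) / base_unavailability


# Rounded figures from the text: about eight weeks down to about six weeks.
print(inefficiency_reduction(8.0, 6.0))  # 25.0
```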
The use of the tolerance time windows, as well as the assignment of the execution times and the resulting grouping schemes, can be presented through a Gantt chart-based representation, according to the nomenclature presented in Figure 3. Figure 7 represents the base PM plan of the critical activities analyzed for a planning horizon of 52 weeks. System stoppages are represented by grey vertical lines, while system unavailability is represented by green bars. The initial tentative maintenance plan includes a total of 35 stoppages, generating a system unavailability of approximately 8 weeks per year. If a tolerance factor of 5% is applied, the number of stoppages is reduced to 19, which implies an inefficiency reduction of about 26% compared to the base case, equivalent to a gain of two weeks of system availability per year. In this scenario (see Figure 8), the time-window tolerances are used to generate grouping schemes, which considerably reduces the number of interruptions while increasing the availability of the system.
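The deviation indicators reported in this section can be computed from a schedule along the following lines. This is a sketch under stated assumptions: expressions (29)-(33) are not reproduced here, so the normalization of each deviation by the activity period, the key `f_C` for conservative executions, and the sample data are all hypothetical choices.

```python
def mean_percentage(values):
    """Average of relative deviations, expressed as a percentage."""
    return 100.0 * sum(values) / len(values) if values else 0.0


def deviation_indicators(executions, eps=1e-9):
    """Each execution is (tentative_date, scheduled_date, period)."""
    # Relative advancement of early executions and relative delay of late ones.
    early = [(t - s) / period for t, s, period in executions if s < t - eps]
    late = [(s - t) / period for t, s, period in executions if s > t + eps]
    n = len(executions)
    return {
        "A_AV": mean_percentage(early),   # average advancement, %
        "D_AV": mean_percentage(late),    # average delay, %
        "f_A": 100.0 * len(early) / n,    # share of advanced executions, %
        "f_D": 100.0 * len(late) / n,     # share of delayed executions, %
        "f_C": 100.0 * (n - len(early) - len(late)) / n,  # kept on tentative date, %
    }


# Hypothetical plan, in weeks: one early, one late, two on their tentative date.
plan = [(8.0, 8.0, 8.0), (16.0, 15.0, 8.0), (12.0, 12.5, 8.0), (24.0, 24.0, 8.0)]
print(deviation_indicators(plan))
```

For this sample plan, half of the executions keep their tentative date, one is advanced by 12.5% of its period, and one is delayed by 6.25%.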

Discussion
Although the results obtained from the computational implementation are consistent, it is necessary to discuss the limitations and computational complexity issues of the framework. First, the deviation from the tentative maintenance plan should not be underestimated. Although the optimal maintenance plan achieves a considerable reduction in system unavailability as the tolerance factor increases, the number of activities that keep their tentative execution date does not exceed 20% of the total executions.
Deviating from the tentative execution date of the PM activities implies a risk: advancing activities implies poorer economic performance, while delaying them may increase the probability of failure in certain assets [52]. This may have a considerable impact on the degradation of the assets, and therefore on the probability of critical component failures, unless the tolerance factor is estimated beforehand. Hence the need to estimate, based on advanced optimization techniques and expert recommendations, a tolerance that stays within a controlled risk zone, one that involves neither malfunctions affecting the economic performance of the plant nor potential failures that may put the health of the personnel and the environment at risk.
On the other hand, it is necessary to discuss the resolution times reported in the computational implementation. Although the different instances reported reasonable computing times, there is a consistent increase in the number of decision variables, the number of constraints, and the computing times (see Figure 9). This must be taken into account when increasing the number of PM activities to plan and schedule, which may eventually cause longer computing times or a suboptimal grouping scheme. The size of the MILP model, and therefore its complexity, does not depend exclusively on the tolerance factor. Since the proposed MILP model is solved with a commercial solver, the measure of complexity proposed here is a theoretical expression for the maximum number of variables and constraints, which depends directly on the number of executions and indirectly on the number and periodicity of the activities, as well as on the execution tolerance of each one of them. Once the quantities $\hat{j}$, $\hat{n}$, $\hat{k}$, and $\hat{v}$ are defined, it is possible to calculate from them the maximum theoretical number of variables and constraints, considering the limit case $e \to \infty$ (i.e., there are no grouping feasibility constraints imposed by the tolerance time windows). Table 6 shows the experimental values and the maximum theoretical value for the evaluated instance, for which $\hat{j} = 43$. As can be seen, the number of variables and constraints increases progressively and approaches the maximum theoretical values as the tolerance factor increases.
This makes it possible to quantify the level of complexity of the formulated framework, and also evidences the combinatorial explosion phenomenon detected in [46]. The size and complexity of the model depend directly on the total number of executions to be planned, which in turn depends on a variety of parameters, including the execution periodicity, the number of activities to be planned, and the length of the planning horizon. The grouping problem presented here can be conceived as a variant of the Set Covering Problem, which is widely documented as an NP-Hard problem [53,54]. The occurrence of the combinatorial explosion phenomenon must therefore be taken into account for larger instances [46,52].
Additionally, the framework analyzed above assumes an unlimited availability of resources for its execution. In a scenario of scarce resources, the optimal maintenance plan presented here may have to undergo considerable modifications. In this case, it will be necessary to include new decision variables that address the resource allocation problem, thus considerably increasing the complexity of the optimization model previously presented and reducing its computational tractability.
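How the number of executions drives the size of the candidate pair set can be illustrated with a short count. The sketch assumes that an activity with period $T_i$ is executed at every multiple of $T_i$ within the horizon, that grouping pairs are unordered, and that only executions of different activities are paired. It does not reproduce the expressions for $\hat{j}$, $\hat{n}$, $\hat{k}$, and $\hat{v}$, nor the values of Table 6; the names and periods are hypothetical.

```python
def executions_per_activity(periods, horizon):
    """Number of executions of each activity within the horizon (integer weeks)."""
    return {i: horizon // period for i, period in periods.items()}


def max_pair_count(counts):
    """Cross-activity execution pairs when no time-window constraint excludes any."""
    total = sum(counts.values())
    all_pairs = total * (total - 1) // 2
    same_activity = sum(c * (c - 1) // 2 for c in counts.values())
    return all_pairs - same_activity


periods = {"A": 4, "B": 6, "C": 13}   # hypothetical periods, in weeks
counts = executions_per_activity(periods, horizon=52)
print(counts)                   # {'A': 13, 'B': 8, 'C': 4}
print(sum(counts.values()))     # 25 executions
print(max_pair_count(counts))   # 188 candidate pairs
```

The pair count grows roughly quadratically with the number of executions, which is consistent with the growth in variables and constraints discussed above.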

Conclusions and Managerial Recommendations
This research proposes a framework formulation and an optimization model for the opportunistic grouping strategy in the planning of preventive maintenance activities for a wastewater treatment plant. The operating context, the formulation of the framework, and the respective optimization model are addressed in detail. The computational results from the implementation of the model reveal a considerable decrease in the unavailability associated with the primary treatment line and sludge treatment, which implies the potential for greater cost-efficiency and an increase in productivity and performance levels through optimized strategies for maintenance management in the WWTP under analysis.
However, it is necessary to consider the limitations and challenges that the framework formulation poses, among which are the fixed periodicity under the perfect maintenance assumption and the inability to include the failure phenomenon stochastically, along with its impact on maintenance plans. Other challenges posed by this research include: modeling the impact of maintenance execution and opportunistic groups with multiple failure modes per equipment; incorporating different maintenance policies and grouping strategies with a focus on repairable systems; and modeling degradation, along with the stochastic and structural interactions of the components involved in complex production systems. These limitations represent research lines that will be addressed in future work, and they are directly related to improving the flexibility, adherence, and applicability of the grouping strategy in several industrial settings.
Based on the computational implementation and the obtained results, several recommendations and insights focused on the maintenance management decision-making are presented below:

• The maintenance grouping framework must be applied, restructured, and updated periodically to ensure its effectiveness as an operational support tool. This update process must be managed both at an operational level (through the requirements of parts and pieces, personnel qualification, and repair and logistic-delay times) and at a tactical level (e.g., when tasks are incorporated or redesigned through maintenance plan design tools such as Reliability Centered Maintenance).

• The framework assumes that components can necessarily share fixed setup costs through the clustering process. This information must be corroborated at a tactical level, by reviewing the procedures for the execution of PM activities, as well as the effect of these activities on the productivity level or on the treatment plant cost function. Otherwise, the formulated framework will not optimally improve the cost-efficiency of the current maintenance plan.

• It is also assumed that a work package can share a setup cost regardless of the number of grouped components. This necessarily implies that the more components are added to a group, the greater the fixed maintenance cost savings. However, the structure of fixed maintenance costs is generally complex, and the availability of resources is generally scarce and limited. Additionally, some activities may not be groupable due to technical constraints (assembly/disassembly procedures, technical infeasibility due to physical/layout limitations). In this case, new constraints must be considered to ensure grouping feasibility and to apply resource and budgetary restrictions. Future research aims to address these aspects to consistently improve the applicability of the framework in different industrial settings.

• Regarding the relationship between the framework and risk management, the evidence shows the exclusion of uncertainty metrics when defining risk, and the lack of a clear definition of probability in the risk assessment standards for sewage treatment systems. Since the tolerance level implicitly establishes a certain risk-acceptance level, it is important to establish the risk tolerance that the system can allow without affecting the performance of the treatment plant. It is therefore relevant to set the execution tolerance level based on expert judgment, using tools such as single-component multi-attribute optimization with a focus on risk management. It is also important to consider the dependencies or critical interactions between the components of the system, considering at least three main consequence types: fatalities, injuries, and economic losses, which must be estimated for organizations, the environment, and stakeholders.