Article

Dynamic Maintenance Cost Optimization in Data Centers: An Availability-Based Approach for K-out-of-N Systems

by Mostafa Fadaeefath Abadi, Mohammad Javad Bordbari, Fariborz Haghighat and Fuzhan Nasiri *
Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, QC H3G 1M8, Canada
* Author to whom correspondence should be addressed.
Buildings 2025, 15(7), 1057; https://doi.org/10.3390/buildings15071057
Submission received: 9 January 2025 / Revised: 2 March 2025 / Accepted: 14 March 2025 / Published: 25 March 2025
(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

Data Centers (DCs) are critical infrastructures that support the digital world, and they require fast, reliable information transmission to remain sustainable. Ensuring their reliability and efficiency is essential for minimizing risks and maintaining operations. This study presents a novel availability-driven approach to optimizing maintenance costs in DC Uninterruptible Power Supply (UPS) systems configured in a parallel k-out-of-n arrangement. The model integrates reliability and availability metrics into a dynamic optimization framework that determines the optimal number of components needed to achieve the desired availability while minimizing maintenance costs. In simulations and a case study using variable failure rates and monthly maintenance costs, the model achieves a combined system availability of 99.991%, which exceeds the Tier 1 DC requirement of 99.671%. A sensitivity analysis incorporating ±10% variations in Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), and maintenance costs demonstrates the model's robustness and adaptability across diverse operational conditions. The analysis also evaluates how different k-out-of-n UPS system configurations influence overall availability and maintenance costs, and it examines feasible configurations that achieve the required system availability while balancing operational costs. Furthermore, the optimal number of UPS components and the associated minimum costs are compared across DC tiers, highlighting the impact of varying availability requirements on maintenance strategies. These results show the model's effectiveness in supporting critical maintenance planning and provide DC managers with a robust tool for balancing operational expenses and uptime.

1. Introduction

Data Centers (DCs) have become indispensable pillars of contemporary society, serving as the backbone for critical infrastructure and operations. Their intricate systems and components demand meticulous attention, particularly in the realm of Operations and Maintenance (O&M). As technology advances, the DC industry undergoes rapid evolution, with transformative changes reshaping its landscape. Based on recent reports [1], this sector is witnessing exceptional expansion, emerging customer demands, and shifts in the approach to deploying physical infrastructure.
A DC may occupy its own dedicated facility or be housed within a facility that accommodates other functions or organizations. DCs now range from small-scale server rooms to hyperscale facilities supporting global cloud services [2,3]. They comprise various subsystems and components, including heating, ventilation, and air conditioning systems, chillers, cooling towers, computer room air conditioning units, backup power sources, boilers, generators, lighting systems, servers, and other essential equipment.
DCs are being built across the globe to meet a growing demand for their services. Currently, more than 8 million enterprise DCs operate worldwide, providing various online and cloud services. These facilities have undergone significant transformations over time, and recent technological advancements have made DC operations increasingly complex.
Overseeing DC facilities and their components involves numerous difficulties and concerns, the most notable being the substantial power consumption of servers. According to recent research findings [2,3], DCs account for approximately 2% of annual energy consumption in the United States. Researchers have also studied DC operations with the aim of implementing DC operations management frameworks efficiently, reducing staff workload, enhancing work efficiency, and ultimately maximizing the overall effectiveness of the DC operations management system.
As financial strain and economic unpredictability continue to rise, Information Technology (IT) infrastructure and hardware choices are starting to feel the impact. Recent studies [4] show that IT departments are being compelled to adopt cost-saving measures in response to reduced budgets. Many businesses, both large and small, are now choosing to extend the lifespan of their hardware, postponing significant capital expenditures. This practice is not exclusive to smaller enterprises; even large technology corporations such as Amazon Web Services (AWS) are finding it necessary to do the same.
The convergence of communication networks with the rising need for substantial storage and processing capabilities, especially in recent years, has spurred a heightened demand for everything-as-a-service, resulting in a proliferation of new DC construction [5]. Nonetheless, guaranteeing the dependability of these infrastructures, with a predominant emphasis on system availability, remains paramount.
In terms of DC maintenance, the 2024 Data Center and Infrastructure Report [6] reveals that IT professionals prioritize price, quality service, and resolution time as the key factors influencing their maintenance decisions. The occurrence of equipment failures and downtime has a substantial impact on business operations and productivity. Industry professionals emphasize that physical infrastructure maintenance is crucial for ensuring optimal performance and cost-effectiveness throughout the DC’s lifecycle [7]. To optimize spending while enhancing support and customer experience, businesses are increasingly seeking infrastructure management and support automation solutions. Also, because of persistent supply chain challenges, prolonging the lifespan of equipment continues to be a prominent strategy for cost savings and promoting sustainability.
In addition, according to recent reports [8], the COVID-19 pandemic has had a significant impact on the hyperscale DC market, with a predicted Compound Annual Growth Rate (CAGR) of 21% from 2020 to 2024. The global expansion of DCs is expected to have varying effects on the IT industry due to this rapid growth.
Reliability engineering plays a key role in DC maintenance, with redundancy configurations such as k-out-of-n systems providing fault tolerance and improved availability. One strategy is to arrange components in parallel, so that the system operates as long as a sufficient number of them function. If n elements are arranged in parallel and the system requires at least k of them to function properly, then k must be less than or equal to n. When k equals n, the configuration reduces to a series system, because all elements must be operational [9]. The Reliability Block Diagram (RBD) and the Reliability Function of the k-out-of-n system are shown in Figure 1.
Therefore, in line with the principles of reliability engineering and as illustrated in Figure 1, this redundancy structure (also termed the r-out-of-n configuration) remains operational as long as at least k of its n components are present and functioning normally [11]. It is commonly referred to as k-out-of-n redundancy, where k is less than or equal to n, and it is essential for fault tolerance in critical systems, such as DCs, that must ensure high availability.
Because n typically exceeds k in a k-out-of-n system, the configuration inherently provides redundancy. Both parallel and series systems can be regarded as special cases of the broader k-out-of-n framework (k = 1 and k = n, respectively). The k-out-of-n configuration is a widely favored form of redundancy in fault-tolerant designs and is used extensively in both industrial and military domains. Its applications span diverse critical systems, such as the multi-display setup in aircraft cockpits, the multiengine arrangement in aircraft, and the multi-pump configuration in hydraulic control systems. DCs are another class of critical infrastructure that employs k-out-of-n configurations and system redundancy.
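As a minimal sketch of the reliability function for this structure, assume n identical, statistically independent components, each with reliability r. This is the standard textbook binomial form and may differ in notation from Figure 1; the function name is hypothetical. The system works when at least k of the n components work:

```python
from math import comb


def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical, independent components work.

    Assumes each component works with probability r (textbook k-out-of-n:G form).
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # Sum the binomial probabilities of exactly i working components, i = k..n.
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Special cases: k = n is a series system, k = 1 is a parallel system.
print(k_out_of_n_reliability(3, 3, 0.9))  # series: r**3
print(k_out_of_n_reliability(1, 3, 0.9))  # parallel: 1 - (1 - r)**3
print(k_out_of_n_reliability(2, 3, 0.9))  # 2-out-of-3 redundancy
```

Setting k = n recovers the series value r^n, and k = 1 recovers the parallel value 1 − (1 − r)^n, which matches the special cases noted above.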
Dynamic Programming (DP), which was initially introduced in the 1950s [12,13], is a robust and efficient mathematical and multi-stage optimization method that facilitates optimal maintenance scheduling and management for intricate, multi-state deteriorating components and systems within an infrastructure. It is a quantitative analytical technique, particularly effective in addressing large-scale and complex problems characterized by sequential decision-making processes. It achieves this by decomposing such problems into multiple decision stages, enabling systematic and optimized solutions [14].
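For orientation, the stage-wise decomposition described above is usually written as a backward (Bellman) recursion. The block below shows only the generic deterministic form, with hypothetical symbols: s_t is the state at stage t, a_t a decision from the admissible set 𝒜(s_t), c_t the stage cost, f_t the state transition, and T the number of stages. It is not the specific formulation used later in this article.

```latex
\begin{aligned}
V_{T+1}(s) &= 0, \\
V_t(s_t) &= \min_{a_t \in \mathcal{A}(s_t)} \Big\{ c_t(s_t, a_t) + V_{t+1}\big(f_t(s_t, a_t)\big) \Big\},
\qquad t = T, T-1, \dots, 1.
\end{aligned}
```

Solving from the last stage backward yields the optimal cost V_1(s_1). Recording the minimizing a_t at each stage then gives an optimal decision sequence.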
The remainder of this article is organized as follows. The next subsection presents a comprehensive review of the literature on DC maintenance management, maintenance optimization, and k-out-of-n systems, together with applications of DP in maintenance management and optimization. Section 2 details the materials and methods of this research, including the reliability and availability analyses and the DP optimization model. Section 3 (Results) describes the chosen case study and presents the outcomes of the optimization model. Section 4 (Discussions) provides deeper, critical analyses of the results, synthesizes the research findings, and outlines the study's key constraints and limitations. Finally, Section 5 presents the conclusions and potential directions for future research.

1.1. Literature Review

Research on maintenance scheduling and management spans various industries, but studies specific to DCs are sparse. This section reviews the relevant literature, focusing on recent advancements in maintenance across industrial systems and infrastructure. The subsections explore maintenance optimization in k-out-of-n systems, DC-specific maintenance strategies, and applications of DP in maintenance management.
  • Maintenance Management in Industrial Systems and Applications
In diverse industrial systems, maintenance models continue to evolve, supported by emerging technologies that enhance predictive maintenance capabilities. The integration of Artificial Intelligence (AI) and machine learning (ML) has enabled data-driven decision-making in building maintenance [2,3], promoting energy efficiency and offering early detection of equipment issues. For instance, recent studies on heating, ventilation, and air conditioning (HVAC) systems [11], particularly the Vapor Compression Refrigeration System (VCRS), show how optimizing partial-load usage across components can boost system reliability; this shift toward high-quality, load-sharing components has yielded nearly a 10% improvement in system reliability over extended operation periods. Furthermore, availability-centered maintenance models, such as those applied to Domestic Hot Water (DHW) systems [15], reveal that prioritizing maintenance activities based on their impact on overall system availability reduces unnecessary interventions, enhancing uptime and cost-effectiveness. Researchers have also adapted the Keeping System Availability (KSA) method, originally designed for power plants, to incorporate the impact of maintenance activities. Recent work on power distribution systems highlights the critical role of asset management and strategic maintenance planning in preventing costly failures and improving overall performance. Comprehensive reviews [16] emphasize that well-planned maintenance reduces operational costs and prevents shutdowns while also addressing economic, social, and environmental impacts, including those arising from deregulated power markets.
  • Advancements in Predictive Maintenance
Recent advancements in predictive maintenance have been significantly influenced by the integration of emerging technologies such as AI and the Internet of Things (IoT). Studies of AI's transformative role in predictive maintenance [17] highlight its ability to generate and process extensive data from industrial sensors to predict failures accurately, which can enhance operational efficiency and reduce downtime. With accurate failure predictions, researchers have achieved proactive and cost-effective maintenance interventions that optimize system reliability and performance. In the automotive industry, recent studies highlight the transformative impact of AI-driven approaches on Electric Vehicle (EV) maintenance and battery management, leading to improved performance and safety [18]. Scholars have also reviewed the integration of IoT and big data analytics in EV systems to enable predictive maintenance and fleet-level optimization, underscoring AI's crucial contribution to developing next-generation, energy-efficient EVs. AI has also improved urban infrastructure management through City Information Modeling (CIM) [19]. According to a systematic review, Digital Twin (DT) technology has been used to enable the real-time monitoring and predictive maintenance of urban assets, improving resource efficiency. In addition, the integration of AI with Closed-Circuit Television (CCTV) inspections has revolutionized sewer pipe maintenance by enabling accurate deterioration modeling, facilitating timely interventions, and extending the service life of critical infrastructure [20].
That study used an unsupervised multilinear regression technique and Weibull analysis, and its results, such as the R² values obtained for different sewer and stormwater pipes, demonstrated AI's potential for improving predictive maintenance and sustainable urban drainage systems. Moreover, in the renewable energy sector, AI-assisted predictive maintenance has enhanced fault detection in wind turbines, improving specificity and time efficiency in inspections [21]. In the context of Industry 4.0, AI has optimized production processes through effective data analysis and model development, thereby enhancing product quality control and predictive maintenance [22]. Collectively, these studies underscore the critical role of AI and IoT in transforming predictive maintenance across industries and improving the reliability and performance of complex systems.
  • K-out-of-N Configurations and Fault-Tolerant Frameworks
In complex engineering applications, fault-tolerant system design is critical for ensuring reliability. Systems with k-out-of-n configurations have been widely studied and compared with other fault-tolerant strategies, such as N + 1 redundancy, active/standby redundancy, and system-level redundancy.
Component-level versus system-level redundancy for k-out-of-n configurations has been analyzed by researchers [23], demonstrating that component-level redundancy often enhances fault tolerance more efficiently than system-wide redundancy. Similarly, the reliability of k-out-of-n data storage systems has been investigated with deterministic repair times under serial and parallel repair models [24]. The study found that parallel repair strategies enhance system reliability by minimizing downtime compared to sequential repairs. Also, the redundancy allocation in k-out-of-n systems was explored by evaluating active versus standby redundancy, concluding that selecting an optimal redundancy strategy improves both reliability and cost efficiency [25]. The researchers demonstrated how the k-out-of-n configuration is a viable alternative to traditional N + 1 redundancy models.
Recent studies have further compared k-out-of-n systems with other fault-tolerant methods. The Matrix-based System Reliability Method (MSRM) and Reliability Growth Models (RGMs) have been applied to k-out-of-n systems for reliability growth analysis [26]. The numerical examples in that study demonstrated the efficiency and applicability of the proposed method and indicated that k-out-of-n configurations can offer superior fault tolerance compared with traditional redundancy schemes, particularly in large-scale and complex systems.
Furthermore, an optimal condition-based maintenance policy for k-out-of-n systems has been developed, considering the interdependencies between internal deterioration and external shocks using a Markov decision process framework [27]. By modeling stochastic dependencies, this approach improves maintenance decision-making, achieving up to 9.9% cost savings in a case study on offshore wind turbines. The findings highlight the importance of integrating degradation interactions into maintenance strategies, making the approach valuable for reliability-critical industries.
  • Maintenance Optimization in K-out-of-N Systems
A variety of studies have explored maintenance optimization in k-out-of-n systems, focusing on their unique challenges, such as load sharing and common-cause failures. A comprehensive approach to failure modeling and maintenance strategy development has been proposed [28], examining two-stage failure processes, imperfect inspections, and the associated maintenance actions; this work provides theoretical metrics for system availability and cost rates, supported by numerical examples. Researchers have also proposed a two-threshold group maintenance policy for k-out-of-n load-sharing systems [29] that enhances operational safety and cost-effectiveness and allows arbitrary lifetime distributions in maintenance decision-making; numerical experiments demonstrated its superior performance. This strategy aims to achieve cost savings by reducing the likelihood of failures and minimizing disruptions caused by maintenance activities, thereby ensuring uninterrupted operations. An opportunistic maintenance model for load-sharing k-out-of-n systems has also been developed [30] to minimize the total expected cost and increase system reliability by combining corrective and preventive maintenance (CM and PM) based on the number of failures and a specified time interval. This model addresses the complexity of reliability modeling and maintenance policies for load-sharing systems, aiming to minimize total expected costs and downtime. Collectively, these strategies illustrate how k-out-of-n systems benefit from maintenance models that balance reliability with cost efficiency, supporting uninterrupted operation in load-sharing contexts.
  • Maintenance Management in Data Centers
Researchers have conducted an in-depth critical literature review of O&M management [2,31], identifying recent research on O&M models and methods with a specific emphasis on DCs. By examining various methodologies and case studies, the authors underscore the scarcity of research and the primary gaps and limitations in DC maintenance models, and they recommend future research directions. In particular, they advocate further investigation into comprehensive reliability, failure, and availability analyses integrated with maintenance scheduling and tailored to the unique context of DCs. Recent research also highlights the role of AI and ML in enhancing data-driven decision-making for equipment O&M, emphasizing digitized processes that reduce failure rates and extend equipment lifespans [31].
As highlighted by researchers [2], DCs consume approximately 40 times more energy than conventional buildings such as offices, underscoring the urgent need for energy-efficient maintenance management. Energy-centered maintenance strategies, such as minor temperature adjustments (e.g., an increase of 1 °C), have been shown to enhance both reliability and efficiency, benefiting building operators and automation companies alike. Additionally, a study on Tier 1 DC infrastructure using blade systems employed modeling techniques such as RBDs and stochastic Petri nets to evaluate maintenance strategies across different Service Level Agreements (SLAs) [5]. The findings underscore the challenges of achieving high availability, particularly in scenarios lacking redundancy and with extended maintenance intervals, while providing valuable metrics on dependability and network performance. According to DC industry professionals [32], advances in AI and ML algorithms have laid a foundation for transforming maintenance scheduling into a predictive process that minimizes the risk of expensive equipment downtime. Focusing on the O&M management system of a DC Software-Defined Network (SDN), researchers have analyzed the progress of OpenFlow-based SDN technology [33]. Their study proposes a networking scheme tailored to large-scale DCs, introducing a multi-POD structure in which each POD comprises an autonomous SDN controller and a forwarding device integrated into the overarching network architecture. The authors underscore the ongoing development and optimization of SDN technology, which aims to overcome the inherent inflexibility of traditional networks while facilitating the adaptable expansion and utilization of cloud and fog computing platforms.
Their investigation provides insight into the O&M management system of a DC SDN, highlighting SDN's potential to reshape network architecture and address challenges inherent to conventional networks. The proposed networking scheme and strategic concepts contribute to both network technology and DC management.
  • Utilization of Dynamic Programming in Maintenance Management
DCs depend on critical subsystems such as power supply, air conditioning, and network connectivity, all of which require high reliability and minimal downtime. The complexity of maintaining these systems makes multi-stage optimization models highly valuable for allocating maintenance actions and costs effectively across components. DP, a versatile optimization approach introduced by Bellman in 1957 [13,34], has seen increasing use in maintenance management because it can address these complexities comprehensively, and its effectiveness and flexibility in solving optimization problems have made it an important tool in both research and practice [35].
In rail grinding operations, DP-based strategies address defects through stochastic decision-making, enabling optimal resource allocation [36]. By modeling defect transitions and income impacts as matrix elements, these methods expedite the removal of high-risk defects, ensuring safety while managing less critical defects economically. Similarly, for a Network-Level Pavement Asset Management System, DP combined with Ant Colony Optimization (ACO) offers road agencies practical solutions for selecting Maintenance and Rehabilitation (M&R) activities under budget constraints [37]. This approach optimizes pavement performance by evaluating road conditions, available resources, and desired outcomes, showcasing DP’s utility in complex decision-making.
For power distribution networks, DP supports multiyear maintenance planning through risk-based models that decouple failure risk factors. A novel state transition model [16] refines maintenance schedules, enhancing system reliability and minimizing risks. Across these diverse applications, DP emerges as a powerful tool for optimizing maintenance strategies, demonstrating its adaptability and effectiveness in reducing costs, improving safety, and ensuring uninterrupted operations in complex systems like DCs.
Recent advancements in maintenance optimization for DCs emphasize the importance of availability-based approaches. For instance, a new study has explored the use of DP to optimize maintenance prioritization while ensuring system availability [14]. This research highlights the effectiveness of DP in balancing maintenance costs with system availability, reinforcing the applicability of DP in the DC infrastructure. Integrating such methodologies into maintenance management frameworks has proven to be a valuable strategy for improving operational efficiency and minimizing downtime. Other studies have proposed innovative strategies for optimizing DC operations. For example, a DoE-based approach has been presented by researchers [38] to enhance the efficiency of electrical infrastructures in DCs, demonstrating its impact on reducing energy consumption and operational costs. Similarly, an edge computing platform [39] has been proposed for intelligent operational monitoring in internet DCs, highlighting how real-time data analytics can improve maintenance decision-making. These contributions further underscore the growing need for advanced optimization techniques in DC maintenance management.

1.2. Research Gap Analysis

As described in the “Literature Review” section, several maintenance management and optimization strategies across various industrial systems, including power plants, HVAC systems, and power distribution networks, have been explored by researchers. Advanced techniques such as AI, ML, and DP have been applied to enhance decision-making, reliability, and cost-effectiveness in these domains. However, a critical gap exists in the application of these methodologies, specifically within the context of DCs.
Despite the importance of ensuring high reliability and availability in DCs, there is a scarcity of academic research focusing on maintenance strategies tailored to their unique operational demands. Existing studies on DCs primarily address O&M management, energy efficiency, and network reliability, but they lack comprehensive models that integrate reliability, availability, failure, and maintenance scheduling specifically for DC environments.
Additionally, traditional long-term maintenance planning is ineffective for DCs due to their critical components, dynamic operational environments, varying failure modes, and high availability demands. This highlights the need for flexible, dynamic, cost-effective, and real-time maintenance management models that can adapt to the complexities and rapid fluctuations in DC operations. As reviewed in the literature, the limitations of traditional methods, including RBD, further underscore the inadequacy of static models to capture the complex interdependencies and dynamic nature of DCs.
This gap is particularly evident in the application of multi-stage optimization models, such as those based on DP, which have proven effective in other industrial contexts. These models are underexplored in DCs, where they could potentially optimize maintenance actions and costs across critical subsystems like continuous power supply, air conditioning, and network connectivity.
Thus, there is a significant opportunity to develop and apply cost- and availability-based optimization models, leveraging DP and other advanced techniques, that are specifically designed to address the challenges of maintenance management in DCs. To address this gap, this research introduces a novel availability-based maintenance cost optimization model tailored specifically to DCs.

1.3. Research Significance

Drawing on the reviewed literature, the research and model presented in this manuscript extend the existing body of knowledge on maintenance scheduling models through the following features:
  • Integration of a practical maintenance management model tailored specifically to the components of DCs, addressing the unique challenges and requirements of this critical infrastructure.
  • Development of a mathematical optimization model that determines optimal maintenance-cost solutions for specific DC components in a k-out-of-n configuration, obtaining the optimal number k of components while accounting for specific constraints and requirements, thereby potentially reducing operational expenses and enhancing efficiency.
  • Introduction of a DP approach for DC maintenance management, enabling the capture of the recursive and agile nature inherent to maintenance scheduling activities within these complex environments.
  • Conducting reliability and availability analysis for DC components, serving as the foundation for formulating the optimization problem and ensuring the robustness of the proposed maintenance strategy.
  • Incorporation of system availability constraints derived from international DC standards into the optimization (and the DP) model, thereby guaranteeing adherence to industry-specific availability requirements and thresholds, essential for maintaining uninterrupted operations in DCs.
The comprehensive details of this model are elaborated on in the subsequent sections and subsections.

2. Materials and Methods

This section presents and discusses the proposed availability-based maintenance cost optimization model for a k-out-of-n system in a DC by implementing the DP method. Before presenting the quantitative analysis, the structure of the model and its elements and parameters, as well as the assumptions, are defined and described.
This research formulates a dynamic maintenance cost optimization model specifically designed for k-out-of-n-configured components in DCs. The model optimizes the selection of k components to minimize maintenance costs while ensuring that the system meets the required availability threshold. This configuration is particularly relevant in DC environments, where uninterrupted operations are critical and maintenance budgets are constrained. In the proposed model, the optimal (minimum) number of components k in the k-out-of-n configuration is reached exactly when the prescribed minimum total system availability is satisfied. This constraint is defined and explained in the following paragraphs.
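To make such availability percentages concrete, a common conversion translates an availability A into expected annual downtime D, assuming a 365-day year of 8760 h. Applied to the Tier 1 requirement (99.671%) and the combined availability reported in the Abstract (99.991%), it gives:

```latex
\begin{aligned}
D &= (1 - A) \times 8760~\text{h/year} \\
D_{\text{Tier 1}} &= (1 - 0.99671) \times 8760 = 0.00329 \times 8760 \approx 28.8~\text{h/year} \\
D_{\text{combined}} &= (1 - 0.99991) \times 8760 = 0.00009 \times 8760 \approx 0.79~\text{h/year} \;(\approx 47~\text{min})
\end{aligned}
```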

2.1. Problem Formulation and Algorithm

According to the information mentioned above, the primary optimization formulation is structured as follows:
Minimize Z, the total maintenance cost of the assets (components) in the DC:
$$Z = \sum_{i=1}^{n} C_{i,t} \, X_{i,t}$$
where
  • Ci,t represents the maintenance cost of each component at time t.
  • Xi,t is a binary decision variable for component i (i = 1, …, n) that equals 1 if the component is chosen at time t and 0 otherwise.
This objective is subject to the following total system availability constraint (elaborated further below):
$$1 - \prod_{i=1}^{n} \left(1 - A_{i,t}\right) \ge A_{t,\text{Required}}$$
where
  • Ai,t: availability of the ith component at time t, which is the probability that the component is operational at time t.
  • At,Required: the required system availability threshold at time t, i.e., the minimum acceptable level of system availability for the system to meet its operational objectives.
This constraint expresses the availability condition for components operating in parallel within the k-out-of-n system.
Together, the objective and the constraint define the problem: minimize the total maintenance costs by selecting the optimal combination of components, subject to the selected components achieving the required system availability.
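To make the objective and constraint concrete, the minimal sketch below evaluates Z and the availability constraint for one time step t, given a binary selection vector. It reads the product as running over the selected components (those with Xi,t = 1), which is an interpretation assumed here; the function names and the numeric values are hypothetical.

```python
from math import prod

def total_cost(costs: list[float], x: list[int]) -> float:
    """Objective Z: sum of C_{i,t} * X_{i,t} over all n components."""
    return sum(c * xi for c, xi in zip(costs, x))

def parallel_availability(avail: list[float], x: list[int]) -> float:
    """Constraint left-hand side: 1 - prod(1 - A_{i,t}) over selected components (assumed reading)."""
    return 1.0 - prod(1.0 - a for a, xi in zip(avail, x) if xi == 1)

def is_feasible(avail: list[float], x: list[int], required: float) -> bool:
    """True if the selection satisfies the availability constraint."""
    return parallel_availability(avail, x) >= required

# Hypothetical values for three components at a single time step.
costs = [100.0, 250.0, 80.0]
avail = [0.99, 0.995, 0.98]
x = [1, 0, 1]
print(total_cost(costs, x))            # 180.0
print(is_feasible(avail, x, 0.99671))  # True
```

Selecting no component yields an empty product of 1, so the computed availability is 0, which matches the intuition that an empty selection cannot satisfy any positive threshold.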
The DP algorithm applied in this research efficiently determines the optimal number of components that minimizes the total maintenance costs while ensuring that the total system availability meets the specified minimum requirement (based on DC standards). Using DP, the algorithm calculates the minimum maintenance cost over the possible combinations of components, considering two cases for each component: choosing it or not choosing it. The system availability is then computed, and the optimal number of components (k), the corresponding minimum maintenance cost, and the final system availability are returned as the algorithm's outputs.

2.1.1. Model’s Parameters

Before presenting the case study implementation and the solution to the problem, the parameters and criteria included in the maintenance cost optimization model are explained below.
n: total number of components.
MTBF [MTBF1, MTBF2, …, MTBFn]: list of the Mean Time Between Failures (MTBF) of each component (discussed below).
MTTR [MTTR1, MTTR2, …, MTTRn]: list of the Mean Time To Repair (MTTR) of each component (discussed below).
Maintenance costs [cost1, cost2, …, costn]: list of the maintenance costs of each component.
k (integer variable): the number of components to choose from the total of n components.
Min_availability: the minimum required system availability (discussed further below).
The model's other parameters are described in detail below.
  • Reliability and Failure Rate
Component or system reliability (also known as the survival function) is the probability of satisfactorily performing a specified mission during operation under specified conditions. The component's failure rate (λ) expresses the number of failures over a specific time horizon; for the exponential distribution it equals the hazard rate and can be treated as constant in calculations. This parameter is the conditional probability that an asset breaks down (fails) during operation within a specific time interval, given that it was operating at the start of that interval [11].
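Under the constant failure rate (exponential) assumption mentioned above, the standard relations are R(t) = exp(−λt) and MTBF = 1/λ. The following minimal sketch evaluates them; the function names are hypothetical and the failure rate in the example is an illustrative value, not case study data.

```python
import math

def reliability(failure_rate: float, t: float) -> float:
    """Survival probability R(t) = exp(-lambda * t) for a constant failure rate."""
    if failure_rate < 0 or t < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate * t)

def mtbf_from_rate(failure_rate: float) -> float:
    """Under the exponential model, MTBF = 1 / lambda."""
    if failure_rate <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate

lam = 2e-5  # hypothetical: 2 failures per 100,000 operating hours
print(round(reliability(lam, 8760), 4))  # probability of surviving one year (8760 h)
print(round(mtbf_from_rate(lam)))        # 50000 hours
```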
  • System’s Availability
Given the rapid expansion of the digital economy, DC projects must allocate ample resources to ensure optimal availability. Consequently, various metrics are assessed and quantified, tailored to specific tier requirements [5].
Availability refers to the system's capability to perform its intended task within a given timeframe. It is often expressed as a number of “nines”, which indicates the percentage of time a component or system is operational throughout the year. This capability is assessed through steady-state analysis or simulation, using the equation below [5,15,40].
$$\text{Availability of system/component} = \frac{\text{MTBF (or MTTF)}}{\text{MTBF (or MTTF)} + \text{MTTR}}$$
The Mean Time To Repair (MTTR) refers to the average duration required to either replace or fix a malfunctioning part. It does not encompass the logistical aspects of the repair process, like acquiring parts or assembling the team. MTTRs are linked to the maintenance strategy chosen. This parameter can be derived by dividing the total maintenance time by the total number of maintenance actions in a timeframe [5,41].
The Mean Time Between Failures (MTBF) is the average time between failure occurrences, whereas the Mean Time To Failure (MTTF) is the average time a component operates before it fails and must be repaired or replaced. MTTF is generally used to assess the likelihood of failure of components or devices within a system that are replaceable or non-repairable, whereas MTBF is used for repairable components to estimate their failure rate. Therefore, depending on the component type, either MTTF or MTBF should be used in the above equation to calculate availability [5,42,43].
Accordingly, the individual availability of each component is calculated using this formula.
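As a worked illustration of the availability formula, the sketch below converts an MTBF/MTTR pair into steady-state availability and the corresponding expected annual downtime, assuming 8760 operating hours per year. The MTBF and MTTR figures and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760

def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR); pass MTTF for non-repairable items."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf / (mtbf + mttr)

def annual_downtime_hours(a: float) -> float:
    """Expected hours per year during which the component is not operational."""
    return (1.0 - a) * HOURS_PER_YEAR

a = availability(mtbf=100_000, mttr=8)  # hypothetical values, in hours
print(f"A = {a:.5%}, downtime = {annual_downtime_hours(a):.2f} h/year")
```

Expressing the result as annual downtime makes the “nines” tangible: each additional nine of availability cuts the expected yearly outage by a factor of ten.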
The proposed model incorporates the availability requirements of DC systems derived from globally recognized DC standards. These standards define four primary levels of DC availability thresholds, known as DC tiers. Table 1 outlines the corresponding availability requirements based on the Uptime Institute standards [44]. The calculation of system availability is detailed in the subsequent sections.
The availability thresholds in Table 1 are integrated into our methodology as the primary thresholds for the system or component availability (At) used in the problem formulation.
Therefore, depending on the desired tier, the minimum availability levels of DCs are
At ≥ 99.671%, (Tier 1 Data Center)
At ≥ 99.741%, (Tier 2 Data Center)
At ≥ 99.982%, (Tier 3 Data Center)
At ≥ 99.995%, (Tier 4 Data Center)
Hence, to satisfy the availability constraint, the following condition must always hold for the desired tier:
At ≥ At,Required
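The tier thresholds listed above can be encoded as a lookup that returns the highest tier whose minimum availability a computed system availability satisfies. This is an illustrative helper with a hypothetical name; it uses only the threshold values given in the text.

```python
from typing import Optional

# Minimum availability per DC tier, as listed above (fractions, not percentages).
TIER_THRESHOLDS = {1: 0.99671, 2: 0.99741, 3: 0.99982, 4: 0.99995}

def highest_tier_met(system_availability: float) -> Optional[int]:
    """Highest tier for which A_t >= A_t,Required holds, or None if even Tier 1 fails."""
    met = [tier for tier, required in TIER_THRESHOLDS.items()
           if system_availability >= required]
    return max(met) if met else None

print(highest_tier_met(0.99991))  # 3
print(highest_tier_met(0.995))    # None
```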
  • Maintenance Costs
The maintenance costs [cost1, cost2, …, costn] for each component over a specific time horizon in the DC were collected from reliable sources, including manufacturer data, industry reports, and field observations. These costs serve as critical inputs into the proposed optimization model, capturing the economic implications of maintenance strategies across varying operational scenarios.
The costs encompass both preventive maintenance (aimed at minimizing failure risks) and corrective maintenance (which addresses unexpected repairs). They also reflect real-world variations in repair complexity, system requirements, and component-specific needs. To represent these aspects comprehensively, seven distinct maintenance cost categories are specified:
CF: costs of services for each incident/failure;
CPM: preventive maintenance costs;
CCM: corrective maintenance costs;
CPC: costs of power and cooling services;
CB: costs of battery replacement service;
CCR: component renewal costs;
CI: investment costs.
Hence, to calculate the total maintenance costs of each component at time ‘t’, the following formula was used.
$$C_{i,t} = C_F + C_{PM} + C_{CM} + C_{PC} + C_B + C_{CR} + C_I$$
This formula reflects the cumulative nature of maintenance expenses, ensuring that all cost dimensions are accounted for in the optimization process.
To provide further clarity, additional details regarding the calculation and application of these costs, as well as their integration into the optimization framework, are elaborated upon in the “Case Study Implementation” subsection.
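The cumulative cost formula can be represented as a small record of the seven cost categories whose fields are summed to give Ci,t. The class and field names are hypothetical and simply mirror the symbols above; the numeric values in the example are placeholders, not the manufacturer figures used in the case study.

```python
from dataclasses import dataclass, fields

@dataclass
class MaintenanceCosts:
    """Cost categories for one component over one period (currency units)."""
    c_f: float = 0.0   # CF: services for each incident/failure
    c_pm: float = 0.0  # CPM: preventive maintenance
    c_cm: float = 0.0  # CCM: corrective maintenance
    c_pc: float = 0.0  # CPC: power and cooling services
    c_b: float = 0.0   # CB: battery replacement service
    c_cr: float = 0.0  # CCR: component renewal
    c_i: float = 0.0   # CI: investment

    def total(self) -> float:
        """C_{i,t} = CF + CPM + CCM + CPC + CB + CCR + CI."""
        return sum(getattr(self, f.name) for f in fields(self))

# Placeholder values only.
costs = MaintenanceCosts(c_f=500, c_pm=1200, c_cm=800, c_b=3000)
print(costs.total())  # 5500.0
```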
  • Asset Condition State Index
In recent years, the importance of asset management, particularly with regard to asset health, has grown notably and now extends beyond the basic diagnosis of assets. Consequently, asset information retrieval applications have become a fundamental tool for assessing asset health effectively. The health index, which is pivotal for decisions on equipment maintenance or replacement, is formulated from the equipment's status information, and its precision and dependability depend on the specific status information used for an asset [45].
According to [45], for critical assets and components such as power distribution equipment, which may be deployed outdoors or be susceptible to damage or power outages caused by external environmental factors such as harsh weather, establishing semantic connections between asset states and their maintenance history is very important. Such connections facilitate the efficient development of diverse information services for equipment asset management within the IoT framework of electric power energy systems.
Several guidelines, approaches, methods, and standards have been proposed and applied for assessing an asset's current condition [46,47]. In addition, condition assessment reports and related research have used various measurement systems, criteria, ratings, and rankings to assess an asset's health in infrastructure facilities [48,49].
A study of power equipment in power systems [50] evaluated its technical condition using the Index of Technical Condition (ITC), which considers the state of each node within a unit of equipment as well as the system as a whole. The index ranges from 0 (the worst value) to 100 (the best value).
In studies involving Facility Condition Analysis (FCA), the Facility Condition Needs Index (FCNI) has been used to assess the condition of DCs and other facilities [51]. This metric is obtained by dividing the recommended upgrade costs by the facility replacement costs.
In this paper, based on a comprehensive review of relevant resources and information from reputable sources such as the Institute of Public Works Engineering Australasia (IPWEA) and the National Asset Management System (NAMS) Group in New Zealand, a condition assessment and ranking scheme is adopted for DC components.
Table 2 presents the asset condition grading system known as the Simple Approach of the International Infrastructure Management Manual (IIMM) [52], which is applied in this study to assign each DC component a condition state ranking as an input to the proposed optimization model. The condition states enter the optimization model indirectly: the maintenance costs supplied to the algorithm are defined according to each component's condition state.

2.1.2. Dynamic Programming Approach

In this section, the Dynamic Programming (DP) approach is described. The DP model optimizes maintenance costs for k-out-of-n configurations while ensuring system availability thresholds are met.
The DP algorithm identifies the optimal combination of k components from n available units by solving recursive subproblems. Each decision state represents a set of operational components, with transitions reflecting maintenance actions and costs. This algorithm evaluates the following:
  • System Availability: ensures the selected configuration meets or exceeds the availability threshold depending on the selected tier.
  • Cost Minimization: selects the configuration with the lowest total maintenance cost while meeting the minimum availability requirement.
The following steps show how DP is applied in the algorithm to solve the problem.
  • DP Table Initialization
First, the algorithm creates a two-dimensional table with dimensions (n + 1) × (n + 1), where n is the total number of components.
Each cell dp[i][j] in the DP table represents the minimum maintenance cost of selecting j components from the first i components while achieving the required system availability.
  • Base Cases
The base cases handle the scenarios in which no component is selected or available:
dp[i][0] is set to 0, indicating zero cost (and zero availability) when no component is selected, whereas dp[0][j] for j > 0 is set to infinity, indicating an infinite cost for non-zero availability when no components are available.
  • Filling the DP Table
The table is filled iteratively using a bottom-up approach: nested loops iterate through all possible numbers of components (i) and all possible numbers of selected components (j).
Two cases are considered for each cell dp[i][j]:
- Case 1: the current component is not chosen, and dp[i][j] takes the minimum cost from the previous row, dp[i − 1][j].
- Case 2: the current component is chosen (possible only when j ≤ i), and the system availability is calculated. If the calculated system availability meets the minimum requirement, dp[i][j] is updated with the smaller of its current value and the previous-row cost dp[i − 1][j − 1] plus the maintenance cost of the current component.
It should be noted that each state in the DP algorithm represents the operational status and maintenance cost of specific components, with decisions based on recursive evaluations of the subproblems.
  • Obtaining the Optimal k and the Minimum Maintenance Costs
After DP table completion, a loop iterates over possible numbers of components (k) to find the optimal solution, which is the one achieving the minimum maintenance costs while meeting the availability requirement.
  • Final Output
The minimum cost, the corresponding optimal number of components, and the final system availability are then returned as the algorithm's outputs.
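The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: because a single minimum cost per cell cannot by itself guarantee the availability check, the sketch stores for each number of selected components j the set of non-dominated (cost, availability) selections, a Pareto pruning that is an assumption of this example. The row index i of dp[i][j] is rolled into the outer loop, system availability follows the parallel form of the constraint in the problem formulation, and all names and data are hypothetical; k_min = 5 mirrors the case study's minimum of five units.

```python
import math

def prune(states):
    """Keep only states not dominated by a cheaper (or equal-cost) state
    that has lower or equal unavailability."""
    kept, best_q = [], math.inf
    for cost, q, idx in sorted(states, key=lambda s: (s[0], s[1])):
        if q < best_q:
            kept.append((cost, q, idx))
            best_q = q
    return kept

def optimal_k_selection(avail, costs, required, k_min=1):
    """Return (k, min_cost, system_availability, selected_indices) or None.

    dp[j] holds non-dominated (cost, unavailability, indices) states for
    selecting j of the components processed so far. Unavailability is
    prod(1 - A) over the selected units (parallel form), so A = 1 - q.
    """
    n = len(avail)
    # Base cases: no selection costs 0 (q = 1, availability 0);
    # j > 0 selections are unreachable before any component is processed.
    dp = [[(0.0, 1.0, ())]] + [[] for _ in range(n)]
    for i in range(n):
        new_dp = [list(states) for states in dp]  # Case 1: skip component i
        for j in range(1, i + 2):                 # Case 2: choose component i
            for cost, q, idx in dp[j - 1]:
                new_dp[j].append((cost + costs[i], q * (1.0 - avail[i]), idx + (i,)))
        dp = [prune(states) for states in new_dp]
    best = None
    for k in range(max(k_min, 1), n + 1):         # smallest k wins cost ties
        for cost, q, idx in dp[k]:
            if 1.0 - q >= required and (best is None or cost < best[1]):
                best = (k, cost, 1.0 - q, idx)
    return best

# Hypothetical availabilities and monthly costs for six units.
avail = [0.999, 0.998, 0.995, 0.990, 0.985, 0.980]
costs = [900, 850, 700, 600, 500, 450]
print(optimal_k_selection(avail, costs, required=0.99671, k_min=5))
```

Pruning is safe here because both the cost update (adding a constant) and the unavailability update (multiplying by 1 − A) are monotone, so a dominated partial selection can never become the cheapest feasible one.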
Therefore, the proposed DP application efficiently solves the optimization problem by breaking it down into smaller subproblems and reusing solutions to those subproblems, resulting in an optimal solution with improved time complexity compared to brute force or other recursive approaches. A summary of the proposed algorithm is provided below.
This algorithm takes as inputs the total number of components (n), their Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), and maintenance costs, and the minimum acceptable system availability. The outputs include the optimal number of components (k) needed to achieve the required availability, the minimum associated cost, the final system availability, and the specific combination of selected components.
The process begins with input validation to ensure that all lists are consistent, that the MTBF and MTTR values are positive, and that the minimum availability, set according to the DC's tier, lies within the valid range (0–1). Next, the availability of each component is calculated using Formula (3) (Availability = MTBF/(MTBF + MTTR)), presented earlier. A DP table is then initialized to store the best combinations of components for each selection size, tracking both availability and cost. The table is populated by evaluating all possible combinations of components to identify those that meet or exceed the minimum availability requirement at the lowest cost.
The algorithm subsequently determines the optimal number of components (k) by selecting the configuration that satisfies the availability constraint while minimizing costs, ensuring that at least five components are included. Finally, the results—optimal k, minimum cost, final system availability, and the chosen combination of components—are extracted from the DP table and returned as the solution.
The pseudocode in Algorithm 1 provides a high-level overview of the model for readers unfamiliar with the DP algorithm. It outlines the key steps, from input validation to determining the optimal component selection that minimizes the total maintenance costs while meeting the required total system availability.
Pseudocode for the optimal UPS component selection algorithm:
Algorithm 1. Optimal UPS Component Selection.

FUNCTION FindOptimumUPSCombination(components, requiredAvailability):
  // Validate inputs
  IF requiredAvailability < 0 OR requiredAvailability > 1:
    RETURN "Error: Invalid required availability."
  FOR each component IN components:
    IF component.MTBF ≤ 0 OR component.MTTR < 0 OR component.MaintenanceCost < 0:
      RETURN "Error: Invalid input values."

  // Compute availability for each component
  FOR each component IN components:
    component.Availability = component.MTBF / (component.MTBF + component.MTTR)

  // Initialize DP table: DPTable[k] stores the cheapest feasible combination of k components
  NumComponents = LENGTH(components)
  FOR k FROM 0 TO NumComponents:
    DPTable[k] = (Cost: Infinity, Combination: NONE)

  // Populate DP table
  FOR k FROM 1 TO NumComponents:
    FOR each combination OF k components:
      totalAvailability = ComputeTotalAvailability(combination)
      totalCost = ComputeTotalCost(combination)
      IF totalAvailability ≥ requiredAvailability AND totalCost < DPTable[k].Cost:
        DPTable[k] = (Cost: totalCost, Combination: combination)

  // Determine optimal k (at least five UPS units are required)
  OptimalK = −1, MinCost = Infinity
  FOR k FROM 5 TO NumComponents:
    IF DPTable[k].Cost < MinCost:
      MinCost = DPTable[k].Cost
      OptimalK = k

  // Check feasibility and output results
  IF OptimalK = −1:
    RETURN "Error: Required availability cannot be met."
  SelectedComponents = DPTable[OptimalK].Combination
  RETURN OptimalK, MinCost, ComputeTotalAvailability(SelectedComponents), SelectedComponents

FUNCTION ComputeTotalAvailability(components):
  RETURN PRODUCT(component.Availability FOR component IN components)

FUNCTION ComputeTotalCost(components):
  RETURN SUM(component.MaintenanceCost FOR component IN components)
In addition, a flowchart illustrating this algorithm is presented in Figure 2, providing a clear and concise step-by-step visual representation of the steps, decision points, and iterative loops.
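For readers who prefer executable code, the following Python sketch mirrors the structure of Algorithm 1: input validation, per-component availability from MTBF and MTTR, exhaustive evaluation of combinations of each size k, and selection of the cheapest feasible combination with at least five units. It is an illustrative reconstruction under stated assumptions, not the authors' code; the `UPS` record, its field names, and the example data are hypothetical. As in Algorithm 1, `total_availability` multiplies the component availabilities; the parallel expression from the problem formulation could be substituted there. Exhaustive enumeration examines up to 2^n subsets, which is manageable for a ten-unit system but grows quickly with n.

```python
from dataclasses import dataclass
from itertools import combinations
from math import prod

@dataclass(frozen=True)
class UPS:
    """Hypothetical record for one UPS unit."""
    name: str
    mtbf: float  # hours
    mttr: float  # hours
    cost: float  # maintenance cost for the period

    @property
    def availability(self) -> float:
        return self.mtbf / (self.mtbf + self.mttr)

def total_availability(units) -> float:
    # Product of availabilities, as in Algorithm 1's ComputeTotalAvailability.
    return prod(u.availability for u in units)

def find_optimum_ups_combination(units, required, min_k=5):
    """Return (k, min_cost, availability, names) of the cheapest feasible combination."""
    if not 0 <= required <= 1:
        raise ValueError("required availability must lie in [0, 1]")
    for u in units:
        if u.mtbf <= 0 or u.mttr < 0 or u.cost < 0:
            raise ValueError(f"invalid input values for {u.name}")
    best = None
    for k in range(min_k, len(units) + 1):    # at least min_k units
        for combo in combinations(units, k):  # exhaustive search over subsets of size k
            cost = sum(u.cost for u in combo)
            a = total_availability(combo)
            if a >= required and (best is None or cost < best[1]):
                best = (k, cost, a, [u.name for u in combo])
    if best is None:
        raise ValueError("required availability cannot be met")
    return best

units = [
    UPS("U1", 200_000, 8, 900), UPS("U2", 150_000, 8, 700),
    UPS("U3", 120_000, 10, 650), UPS("U4", 100_000, 12, 600),
    UPS("U5", 80_000, 12, 550), UPS("U6", 50_000, 24, 400),
]
print(find_optimum_ups_combination(units, required=0.99671))
```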
  • Dynamic Maintenance Costs
It should also be noted that the implemented algorithm uses variable failure rates to derive dynamic maintenance costs based on the condition of each asset. In this approach, different failure rate functions are considered that reflect the varying reliability and performance characteristics of individual assets within a system, yielding a more dynamic model.
The following points describe how variable failure rates are integrated into the code to achieve dynamic maintenance costs:
- Individual asset characteristics: Each asset in the system is characterized by its own Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR). These parameters define the asset's failure rate and repair rate, respectively, and serve as a basis for calculating its availability and determining the impact of its failure on overall system performance.
- Dynamic availability calculation: The availability of each asset is dynamically calculated based on its MTBF and MTTR. Availability represents the probability that the asset will be operational at any given time, considering its historical failure and repair patterns.
- Dynamic maintenance costs: Maintenance costs are tied to the condition and reliability of each asset. As the availability of an asset changes over time, so does the associated maintenance cost. Lower availability, resulting from higher failure rates or longer repair times, leads to higher maintenance costs to restore the asset to operational status.
- DP optimization: The DP approach is used to optimize maintenance costs while ensuring that system availability meets the specified requirements. By considering different combinations of assets and their associated maintenance costs, the DP algorithm identifies the most cost-effective configuration of assets that satisfies the availability target.
- Cost-effectiveness analysis: The use of variable failure rates and dynamic maintenance costs allows a more accurate cost-effectiveness analysis of maintenance strategies.
Accordingly, failure rates (λ), Mean Time To Repair (MTTR), and Mean Time Between Failures (MTBF) are central to this optimization model and are integrated into its cost and reliability calculations. The method is intended to be scalable and adaptable, providing solutions for various subsystems and operational scales.
To ensure real-world applicability, the model accounts for:
  • Variable component aging and failure trends;
  • Budget constraints and cost variations for different maintenance actions.

2.2. Case Study Implementation

2.2.1. System and Component Description

To implement and test the proposed maintenance optimization model, a group of Uninterruptible Power Supply (UPS) units arranged in a k-out-of-n system was chosen as the focus of this research. These units operate collectively as a k-out-of-n subsystem within the DC power infrastructure.
Maintaining continuous operation and minimizing downtime are paramount for DCs, which underscores the indispensable role of UPS systems. A UPS supplies power to equipment during maintenance or unexpected power outages when the primary source fails. It is crucial for DCs to invest in equipment that aligns with their requirements and to monitor Key Performance Indicators (KPIs) diligently to ensure seamless operations [53].
Various UPS types and configurations are employed across facilities according to their specific operational needs. UPS systems are broadly categorized into static and rotary types and further classified into single-conversion or double-conversion topologies. The static UPS, a fundamental variant, typically uses a battery as its primary emergency power source in the DC. It incorporates electronic switching components to convert the battery's direct-current voltage into alternating-current (AC) voltage for use by the connected IT equipment (ITE). In addition, an electronic or electromechanical switch within the static UPS manages the transition between primary power and battery backup during power outages [42,54,55]. The schematic block diagram of a UPS system is shown in Figure 3.
To simplify the maintenance cost optimization model and keep the results consistent, a single type of UPS system (APC Symmetra PX 500 kW with Right Mounted Maintenance Bypass and Distribution) was chosen for the calculations and analysis [57,58]. This UPS system is manufactured by APC by Schneider Electric, which is based in Rueil-Malmaison, France. It is a three-phase, modular, and scalable unit with high performance, efficiency, and capacity, which makes it well suited to power protection in medium to large DCs and mission-critical environments [57]. Notably, this UPS system is based on a real-world implementation currently in use at the Cologix MTL3 Data Center in Montréal, Québec, Canada [59], which demonstrates its practical applicability in a live DC environment.
For implementing the model and obtaining the results, maintenance actions are assumed to be planned monthly, with decisions governed by the operational thresholds specific to Tier 1 DC systems.
Thus, the minimum total system availability constraint is set to 99.671% for a Tier 1 Data Center, based on the DC tier classification of the Uptime Institute [60]. In addition, the case study assumes that a minimum of five UPS units must be operational to meet the DC's power and operational requirements. These assumptions are incorporated into the algorithm that runs the optimization model.

2.2.2. Failure Modes and Condition States

Building upon the condition state rankings and descriptions outlined earlier, this section delves into the specific details and potential failure modes associated with a UPS system within the DC infrastructure.
Various methods and approaches have been employed by researchers to determine the reliability parameters of components. Failure Modes, Effects, and Criticality Analysis (FMECA) has been utilized to investigate the causes and impacts of component failures. For UPS systems, manufacturers typically use field data measurement methods to estimate reliability parameters. Additionally, some studies have employed RBD and Monte Carlo simulations to calculate UPS failure rates, MTBF, availability, and unavailability [61]. Therefore, the reliability data in this study were derived from these pertinent sources.
Figure 4 shows the top 10 failures in UPS components, which were collected from various manufacturers by Fulcrum Collaborations [53].
To operationalize the proposed optimization model, as described earlier and presented in Table 2, a discrete scale of 1 to 5 was employed to represent the varying physical and functional condition states of the UPS units. This ranking system facilitates the integration of these condition states into the model, enabling the selection of appropriate maintenance actions.

2.2.3. Reliability and Availability Information

As mentioned in the previous section, variable failure rates for the UPS units based on their condition were used to derive dynamic maintenance costs, which were then incorporated into the maintenance cost optimization model.
Building on the asset condition grading system presented in Table 2, a 10-point classification scheme, with 10 representing the best and 1 the worst condition, was established for the UPS systems based on their failure rates and availability percentages. The MTBF and MTTR data were sourced from standards for UPS systems in small computer rooms [42] and served as the baseline for the best asset condition (State 10). To represent the reliability profile of the remaining condition states (1 to 9), additional parameters were derived through informed assumptions. These categories and parameters for a static UPS system are shown in Table 3 [42]. The UPS units in the case study system are named UPS_DC_1 to UPS_DC_10 (for condition states 1 to 10, respectively).

2.2.4. Dynamic Failure Rates and Maintenance Costs

This section provides details on how the failure rates and asset conditions are connected to categorize the UPS units and assign a failure rate function to each group of assets to obtain variable monthly failure rates and maintenance costs for each UPS unit.
These maintenance costs underpin the model’s ability to simulate realistic operational scenarios, particularly for k-out-of-n configurations in DC systems.
The Fulcrum Collaborations report [53] compiles first-hand, real-time data from users of Mission Critical Information Management (MCIM) systems on static UPS brands worldwide, offering valuable insight into their operational dependability. As Figure 5 shows, MCIM documents key benchmarks for more than 3750 static UPS systems, including products from leading manufacturers such as Eaton, Schneider Electric, Vertiv Group Corp., and other top companies in the MCIM database. The data cover the following key metrics for assessing the reliability of static UPS systems and their manufacturers: failures per asset, age at failure by lifecycle stage, MTBF, and failure modes. MCIM's analysis also identifies the most prominent static UPS brands in the global market.
Based on information gathered from these references, three condition categories, defined by the UPS unit's age, were selected to simplify the model's implementation. The first category represents new UPS units (UPS_DC_10 and UPS_DC_9), aged between 0 and 2 years (the beginning of their lifecycle). The middle-aged UPS units (2 to 18 years old, in their useful lifecycle) form the second category, with condition states 3 to 8 (UPS_DC_8 to UPS_DC_3). Finally, the third category comprises UPS units at the end of their lifecycle (18 to 20 years in operation), with condition states 1 and 2.
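The grouping of condition states into the three age-based categories can be written as a simple lookup. The helper name is hypothetical, and the boundaries are exactly those stated above.

```python
def age_group(condition_state: int) -> int:
    """Map a UPS condition state (1 = worst, 10 = best) to its age-based group."""
    if not 1 <= condition_state <= 10:
        raise ValueError("condition state must be between 1 and 10")
    if condition_state >= 9:
        return 1  # Group 1: new units, 0-2 years (UPS_DC_9, UPS_DC_10)
    if condition_state >= 3:
        return 2  # Group 2: middle-aged units, 2-18 years (UPS_DC_3 to UPS_DC_8)
    return 3      # Group 3: end-of-life units, 18-20 years (UPS_DC_1, UPS_DC_2)

print({f"UPS_DC_{s}": age_group(s) for s in range(1, 11)})
```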
Figure 6 presents a heatmap of the yearly failure rates of the UPS systems, derived from the data in Table 3. The heatmap illustrates the relationship between a UPS system's condition state (from 10, the optimal condition, to 1, the worst) and its age group (Group 1, Group 2, or Group 3). It shows that failure rates increase as the condition state declines and the age group becomes older, highlighting the importance of both factors for UPS reliability.
In the next step, the corresponding failure rates were obtained from [53], and the failure rate function (λ(t)), which represents the probability of failure per unit of time in a one-year time horizon, was derived for each of these three categories as follows.
  • Group 1—new UPS components (0–2 years old), including condition states 9 and 10:
The failure rate function is
$$\lambda(t) = 0.2 + 0.8\, e^{-0.5x}$$
where
x represents the time of year (month or hours).
Figure 7 shows the failure rate function plot for this group of UPS components in a one-year time horizon.
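The Group 1 failure rate function can be tabulated for each month of the one-year horizon as shown below. Here x is taken as the month index 1 to 12, which is one reading of "time of year (month or hours)"; this choice and the function name are assumptions of the sketch.

```python
import math

def group1_failure_rate(x: float) -> float:
    """Group 1 (new units, condition states 9-10): 0.2 + 0.8 * exp(-0.5 x)."""
    return 0.2 + 0.8 * math.exp(-0.5 * x)

# x taken as the month index 1..12 (assumption).
for month in range(1, 13):
    print(month, round(group1_failure_rate(month), 4))
```

The function starts at 1.0 for x = 0 and decays toward the floor of 0.2, reflecting early-life failures that subside as new units settle in.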
  • Group 2—middle-aged UPS components (2–18 years old), including condition states 3 to 8:
It is important to highlight that the data pertaining to this asset group span a 16-year duration, corresponding to the typical useful life of UPS systems. Based on the insights gleaned from Figure 5, the failure rate function was recalibrated to align with a one-year timeframe.
The Weibull distribution, a continuous probability distribution, is a versatile tool for modeling various physical phenomena. Its flexibility comes from the adjustable parameters of its reliability functions, which allow it to represent diverse distributions. By characterizing failure modes with a slope (shape) parameter (b) and accounting for a component's age and probability of failure, the Weibull distribution is instrumental in the statistical analysis of experimental data [11].
Therefore, the Weibull distribution was adopted for this group of UPS systems, as its flexibility allows it to model different types of failure behavior, including early-life failures, random failures during the useful part of the lifecycle, and wear-out failures as the system ages. The normalized failure rate function for Group 2 is given below.
$$\lambda(t) = 0.16 + F(t) \times (0.46 - 0.16)$$
where
F(t) represents the Weibull Cumulative Distribution Function (CDF) evaluated at time ‘t’ (each month).
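The Group 2 function maps the Weibull CDF onto the range 0.16 to 0.46. Since the shape (slope, b) and scale parameters of F(t) are not given in this section, the defaults below (b = 1.5, scale = 12 months) are hypothetical placeholders chosen only to illustrate the computation, as are the function names.

```python
import math

def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Weibull CDF F(t) = 1 - exp(-(t / scale) ** shape) for t >= 0."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def group2_failure_rate(t: float, shape: float = 1.5, scale: float = 12.0) -> float:
    """Group 2 (middle-aged units): lambda = 0.16 + F(t) * (0.46 - 0.16).

    The default shape (b) and scale values are hypothetical, not from the study.
    """
    return 0.16 + weibull_cdf(t, shape, scale) * (0.46 - 0.16)

for month in range(1, 13):
    print(month, round(group2_failure_rate(month), 4))
```

Because F(t) rises from 0 to 1, the rate is bounded between 0.16 and 0.46 regardless of the Weibull parameters chosen; the parameters only control how quickly the upper level is approached.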
  • Group 3—end-of-life UPS components (18–20 years old), including condition states 1 and 2:
The failure rate function is
$$\lambda(t) = 0.14\, e^{0.1551x}$$
where
x represents the time of year (month or hours).
Figure 8 and Figure 9 show the failure rate function plots and the failure rate changes over the 12 months for Groups 2 and 3 of UPS components.
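Similarly, the Group 3 function grows exponentially over the year. The sketch below evaluates it monthly (again assuming, as a labeled assumption, that x is the month index 1 to 12) and reports the ratio between the month 12 and month 1 rates, which follows directly from the exponent: e^(0.1551 × 11) ≈ 5.5. The function name is hypothetical.

```python
import math

def group3_failure_rate(x: float) -> float:
    """Group 3 (end-of-life units, condition states 1-2): 0.14 * exp(0.1551 x)."""
    return 0.14 * math.exp(0.1551 * x)

# x taken as the month index 1..12 (assumption).
monthly = [group3_failure_rate(m) for m in range(1, 13)]
for month, rate in enumerate(monthly, start=1):
    print(month, round(rate, 4))
print("Month 12 / month 1 ratio:", round(monthly[-1] / monthly[0], 2))
```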
By obtaining the monthly failure rates for each asset condition group of UPS units in the case study system, the failure rates were integrated into the corresponding monthly maintenance costs. This yields dynamic maintenance costs for each month that depend on the type of UPS unit. These costs are presented later in this paper.
The maintenance costs below were gathered from several sources to calculate the total cost of each maintenance action under each condition state of the UPS system. Table 4 lists the maintenance services available for the UPS model selected for our case study, together with the corresponding costs provided by the manufacturer. These costs were then combined and categorized to obtain the total maintenance cost of each UPS system (Ci,t), introduced earlier in Equation (9) and given in Table 5.
Based on the data in Table 5, the pie chart in Figure 10 shows the distribution of the lifecycle and maintenance costs of the APC Symmetra PX 500 kW UPS system over a given timeframe. The initial investment in a new device accounts for the largest share of the overall cost (approximately 62%). Component renewal (22%) and critical power and cooling services (10%) also form substantial shares, while the remaining services contribute less.
Now that the yearly categorized and combined maintenance costs have been defined, the monthly dynamic maintenance costs, which are based on variable failure rates for each group of UPS systems, are calculated.
For Group 1, which comprises new UPS components (0–2 years old), we assume that only the preventive maintenance (inspection) (CPM) costs and the costs of each incident/failure (CF) were applicable. Therefore, the total monthly maintenance costs of this group of assets (conditions 9 and 10) were calculated as follows:
  • Monthly maintenance costs of Group 1 assets =
$$\frac{C_{PM}}{12} + C_F \times (\text{monthly failures})$$
For Group 2, which comprises the middle-aged UPS components (2–18 years old), the costs of preventive maintenance (inspection) actions (CPM), the costs of corrective maintenance actions (CCM), the costs of each incident/failure (CF), and the costs of a battery replacement service (CBR) were considered and added together. Therefore, the total monthly maintenance costs of this group of assets (conditions 3, 4, 5, 6, 7 and 8) were calculated as follows:
  • Monthly maintenance costs of Group 2 assets =
$$\frac{C_{PM} + C_{CM} + C_{BR}}{12} + C_F \times (\text{monthly failures})$$
Finally, for the third group, which comprises end-of-life UPS components (18–20 years old), the costs of critical electric power and cooling services (CPC), the component renewal costs (CCR), and the costs of each incident/failure (CF) were added together as follows:
  • Monthly maintenance costs of Group 3 assets =
$$\frac{C_{PC} + C_{CR}}{12} + C_F \times (\text{monthly failures})$$
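The three group-specific cost formulas share one structure: the group's annual service costs spread evenly over 12 months, plus the per-incident cost multiplied by the month's failures. A minimal sketch follows. The cost figures are hypothetical (the actual values are in Tables 5 and 6), and using the month's failure rate λ(t) as the expected number of monthly failures is an assumption.

```python
# Cost components (annual, USD) that enter each asset group's monthly formula.
GROUP_COMPONENTS = {
    1: ("PM",),             # new units: preventive maintenance (inspection)
    2: ("PM", "CM", "BR"),  # mid-life: preventive, corrective, battery replacement
    3: ("PC", "CR"),        # end-of-life: critical power and cooling, component renewal
}


def monthly_cost(group: int, annual_costs: dict, cost_per_failure: float,
                 monthly_failures: float) -> float:
    """Monthly maintenance cost = (sum of annual group costs) / 12 + C_F * failures."""
    fixed = sum(annual_costs[name] for name in GROUP_COMPONENTS[group]) / 12.0
    return fixed + cost_per_failure * monthly_failures


# HYPOTHETICAL example inputs, for illustration only.
annual = {"PM": 6000.0, "CM": 9000.0, "BR": 12000.0, "PC": 30000.0, "CR": 60000.0}
c_f = 1500.0

if __name__ == "__main__":
    for group, failures in ((1, 0.1), (2, 0.2), (3, 0.5)):
        print(f"Group {group}: USD {monthly_cost(group, annual, c_f, failures):,.2f}")
```

Only the failure term changes from month to month, which is what makes the costs "dynamic": the fixed part is the same every month for a given group.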
The monthly maintenance costs of all three groups are presented in Table 6.
The Table 6 data are plotted in Figure 11 to illustrate the monthly dynamic maintenance costs for the three UPS groups: new, mid-life, and end-of-life. The costs for new UPS systems (Group 1) show a decreasing trend, stabilizing at around USD 850 per month. For mid-life UPS systems (Group 2), costs increase gradually, reaching approximately USD 2720 per month by the end of the year. In contrast, end-of-life UPS systems (Group 3) show a steep and consistent rise, reaching about USD 13,200 per month by year’s end.
The next section presents the detailed results and analysis of the maintenance optimization model for a k-out-of-n configuration in a DC.

3. Results

As described in the previous sections, this cost- and availability-based DP optimization model aims to find the optimal number, or combination, of UPS units (k) out of all available UPS units (n) in the parallel system that minimizes monthly maintenance costs while meeting the required total system availability.
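For background, the textbook availability of a k-out-of-n:G system of identical, independent units is the probability that at least k of the n units are up. The sketch below implements that binomial formula. It is offered as a reference point only; the exact combination rule used in the authors' model is defined earlier in the paper and is not restated here, and the unit availability in the example is hypothetical.

```python
from math import comb


def k_out_of_n_availability(k: int, n: int, a: float) -> float:
    """P(at least k of n identical, independent units are available).

    A_sys = sum_{i=k}^{n} C(n, i) * a**i * (1 - a)**(n - i)
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= a <= 1.0:
        raise ValueError("unit availability must lie in [0, 1]")
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Illustrative unit availability (HYPOTHETICAL): A_sys falls as k approaches n.
    for k in range(1, 11):
        print(f"{k}-out-of-10: {k_out_of_n_availability(k, 10, 0.95):.6f}")
```

Setting k = 1 recovers the pure parallel case, 1 − (1 − a)^n, and k = n recovers the series case, a^n.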
The model was therefore executed with the DP algorithm using the first-month input data provided in Table 7. The minimum required total availability of the parallel system is taken as 99.671%, the Tier 1 DC requirement [60]. The input data for the remaining 11 months (months 2 to 12) are derived from Table 6 and used to run the optimization model for the whole year.
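The authors' own DP formulation is not restated in this section. As an independent illustration of how such a selection can be solved with DP, the sketch below treats the selected units as a parallel block (an assumption), so that 1 − ∏(1 − A_i) ≥ A_req becomes Σ −ln(1 − A_i) ≥ −ln(1 − A_req). This is a min-cost covering knapsack, solved here by DP over a discretized log-unavailability scale. The unit names, availabilities, costs, and the `resolution` parameter are hypothetical.

```python
import math


def min_cost_selection(units, a_req, resolution=0.01):
    """Cheapest subset of units whose combined availability meets a_req.

    ASSUMPTIONS (illustrative, not the authors' formulation):
      * units are independent and the selected set acts as a parallel block,
        so A_sys = 1 - prod(1 - a_i); every a_i and a_req must be < 1;
      * taking logs gives the covering constraint sum(w_i) >= W with
        w_i = -ln(1 - a_i) and W = -ln(1 - a_req).
    Weights are rounded DOWN and the target UP, so any returned set truly meets a_req.
    Returns (total_cost, names) or None if no subset is feasible.
    """
    target = math.ceil(-math.log(1.0 - a_req) / resolution)
    # best[w]: cheapest (cost, names) whose discretized weight equals w (capped at target)
    best = [None] * (target + 1)
    best[0] = (0.0, ())
    for name, availability, cost in units:
        weight = math.floor(-math.log(1.0 - availability) / resolution)
        updated = list(best)  # copy so each unit is used at most once (0/1 choice)
        for w, entry in enumerate(best):
            if entry is None:
                continue
            new_w = min(target, w + weight)
            candidate = (entry[0] + cost, entry[1] + (name,))
            if updated[new_w] is None or candidate[0] < updated[new_w][0]:
                updated[new_w] = candidate
        best = updated
    return best[target]


if __name__ == "__main__":
    # HYPOTHETICAL fleet: (name, availability, monthly maintenance cost in USD)
    fleet = [("A", 0.9, 100.0), ("B", 0.9, 120.0), ("C", 0.99, 300.0), ("D", 0.5, 10.0)]
    print(min_cost_selection(fleet, a_req=0.98))
```

For this toy fleet the call returns `(220.0, ('A', 'B'))`: two 0.9 units give 0.99 for less than the single 0.99 unit. Rounding weights down and the target up keeps the sketch conservative; a coarse `resolution` can miss a marginally feasible cheaper set but never returns an infeasible one.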
By running the optimization algorithm for the first month of operation, the optimal combination of UPS components in the DC power system configured as a k-out-of-n parallel system was determined. The results, presented in Table 8, detail the selected components and their characteristics. A 3D visualization of this optimal combination is depicted in Figure 12, illustrating the selected components alongside their respective availabilities and monthly maintenance costs. In addition, the selected UPS components and their availabilities are shown in the column chart in Figure 13.
Table 8 shows the outcome of the optimization process applied to the UPS components of the DC power system. A subset of 5 UPS units (UPS_DC_10, UPS_DC_9, UPS_DC_8, UPS_DC_7, and UPS_DC_6) was selected for maintenance actions out of the 10 available units. This selection balances system availability against maintenance costs, illustrating the efficacy of the k-out-of-n redundancy approach. As shown in Table 8, the total monthly maintenance budget required for this configuration is USD 9973; this allocation covers the maintenance procedures performed on the selected UPS components over one month of operation. The combined system availability achieved with this optimal configuration is 99.991%, surpassing the 99.671% minimum requirement for a Tier 1 Data Center.
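The availability check behind this result can be illustrated with the standard steady-state formula A = MTBF / (MTBF + MTTR) and, assuming independent units combined in parallel, A_sys = 1 − ∏(1 − A_i). The MTBF and MTTR values below are hypothetical; the case-study inputs are in Table 7.

```python
import math

TIER1_REQUIREMENT = 0.99671  # minimum total availability for a Tier 1 DC


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Inherent availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(unit_availabilities) -> float:
    """A_sys = 1 - prod(1 - A_i), ASSUMING independent units in parallel."""
    return 1.0 - math.prod(1.0 - a for a in unit_availabilities)


if __name__ == "__main__":
    # HYPOTHETICAL (MTBF, MTTR) pairs in hours for five selected units.
    units = [(900.0, 100.0), (850.0, 150.0), (950.0, 50.0), (800.0, 200.0), (900.0, 100.0)]
    a_units = [steady_state_availability(mtbf, mttr) for mtbf, mttr in units]
    a_sys = parallel_availability(a_units)
    print(f"A_sys = {a_sys:.5%}, meets Tier 1: {a_sys >= TIER1_REQUIREMENT}")
```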
Beyond running the DP optimization model, a comprehensive sensitivity analysis was performed to assess the robustness and applicability of the proposed model. This analysis focused on key parameters such as MTBF, MTTR, failure rates, maintenance costs, and system availability thresholds.
Specifically, the first part of the sensitivity analysis explored the impact of ±10% variations in key parameters, including MTBF, MTTR, and maintenance costs. The analysis examined how these changes influenced the total system availability and total maintenance costs, highlighting the interdependence of these factors and their effect on the optimization outcomes [63].
As observed in Figure 14, decreasing the MTBF by 10% slightly reduces total system availability, while a 10% increase results in marginal improvement. The availability remains high across the range, indicating robustness to MTBF variations.
According to the sensitivity analysis in Figure 15, a 10% increase in MTTR results in a significant reduction in system availability, whereas a 10% decrease in MTTR leads to a marked improvement. These findings underscore the pivotal role of short repair times in sustaining optimal system performance: faster fault resolution translates directly into higher operational reliability.
The analysis presented in Figure 16 demonstrates that variations in maintenance costs lead to proportional changes in the total monthly maintenance expenses, confirming the model’s predictable scaling with cost fluctuations. A baseline total cost of approximately USD 9973 was observed, which increased or decreased by roughly USD 1000 under sensitivity scenarios, reflecting a ±10% change in maintenance costs. This result highlights the robustness of the model in responding linearly to cost adjustments, ensuring its reliability for scenario analysis and financial planning.
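The reported swing of roughly USD 1000 is consistent with simple arithmetic. If every maintenance cost term is scaled by the same factor (1 + δ) and the selected units stay fixed, the total scales by that factor, so ±10% of USD 9973 is about ±USD 997. Uniform positive scaling also preserves the cost ranking of candidate selections, so under the same availability constraint the cost-minimizing selection does not change. A short check:

```python
BASELINE_MONTHLY_COST = 9973.0  # USD, optimal Tier 1 selection for month 1 (Table 8)


def scaled_total(baseline: float, change: float) -> float:
    """Total cost after scaling every maintenance cost term by (1 + change)."""
    return baseline * (1.0 + change)


if __name__ == "__main__":
    for change in (-0.10, 0.0, 0.10):
        total = scaled_total(BASELINE_MONTHLY_COST, change)
        delta = total - BASELINE_MONTHLY_COST
        print(f"{change:+.0%}: USD {total:,.1f} (change {delta:+,.1f})")
```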
In the second part of the sensitivity analysis, we assessed how different k-out-of-n UPS configurations affect total system availability and maintenance costs. The minimum number of required UPS units (k) was varied from 5 to 9, and for each k the lowest-cost configuration that kept total system availability above the 99.671% Tier 1 threshold was identified. The results in Figure 17 help identify the optimal number of components to maintain while meeting the Tier 1 DC availability requirement at a reasonable cost. This approach provides valuable insights for DC managers seeking to balance maintenance costs and availability in their operations.
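The k-sweep can be sketched by exhaustive enumeration, which stays cheap for a 10-unit fleet (at most C(10, 5) = 252 subsets for any single k). This sketch swaps in brute-force search in place of the authors' DP purely for transparency; the parallel combination rule and the fleet data are assumptions.

```python
import math
from itertools import combinations


def cheapest_subset_of_size(units, k, a_req):
    """Lowest-cost selection of exactly k units whose availability meets a_req.

    Exhaustive search over all k-combinations. The combination rule
    A_sys = 1 - prod(1 - a_i) (independent parallel units) is an ASSUMPTION.
    Returns (cost, names, availability) or None if no k-unit selection qualifies.
    """
    best = None
    for subset in combinations(units, k):
        a_sys = 1.0 - math.prod(1.0 - a for _, a, _ in subset)
        if a_sys < a_req:
            continue
        cost = sum(c for _, _, c in subset)
        if best is None or cost < best[0]:
            best = (cost, tuple(name for name, _, _ in subset), a_sys)
    return best


def sweep_k(units, k_values, a_req):
    """Cheapest feasible selection for each k in k_values (e.g. range(5, 10))."""
    units = list(units)
    return {k: cheapest_subset_of_size(units, k, a_req) for k in k_values}


if __name__ == "__main__":
    # HYPOTHETICAL fleet: (name, availability, monthly maintenance cost in USD)
    fleet = [("A", 0.9, 100.0), ("B", 0.9, 120.0), ("C", 0.99, 300.0), ("D", 0.5, 10.0)]
    for k, result in sweep_k(fleet, range(1, 5), a_req=0.98).items():
        print(k, result)
```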
In the third part of the sensitivity analysis, optimal selections of UPS units were determined under monthly maintenance budget constraints ranging from USD 2000 to USD 10,000, subject to a minimum total system availability of 0.99671 (99.671%). The optimization explores various k-out-of-n configurations while accounting for the associated maintenance costs.
Figure 18 illustrates feasible k-out-of-n UPS configurations across different allocated maintenance budget constraints, highlighting the number of selected UPS units (k) that meet the minimum total system availability requirement. The heatmap highlights how lower budgets significantly restrict feasible configurations, while higher budgets allow for more redundancy.
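The feasibility heatmap can be outlined by checking, for each budget and each k, whether any k-unit selection both fits the budget and meets the availability threshold. The sketch assumes independent units combined in parallel, A_sys = 1 − ∏(1 − A_i), and uses hypothetical fleet data; in the case study, the budgets would span the USD 2000 to USD 10,000 range described above.

```python
import math
from itertools import combinations


def feasibility_grid(units, budgets, k_values, a_req):
    """Map (budget, k) -> True if some k-unit selection fits the budget and meets a_req.

    units: iterable of (name, availability, monthly_cost) tuples (HYPOTHETICAL data).
    Availability of a selection is ASSUMED to be 1 - prod(1 - a_i).
    """
    units = list(units)
    cheapest = {}  # cheapest feasible monthly cost for each k (inf if none)
    for k in k_values:
        feasible_costs = [
            sum(cost for _, _, cost in subset)
            for subset in combinations(units, k)
            if 1.0 - math.prod(1.0 - a for _, a, _ in subset) >= a_req
        ]
        cheapest[k] = min(feasible_costs, default=math.inf)
    return {(b, k): cheapest[k] <= b for b in budgets for k in k_values}


if __name__ == "__main__":
    fleet = [("A", 0.9, 100.0), ("B", 0.9, 120.0), ("C", 0.99, 300.0), ("D", 0.5, 10.0)]
    grid = feasibility_grid(fleet, budgets=[200, 250, 350], k_values=[1, 2, 3], a_req=0.98)
    for (budget, k), ok in sorted(grid.items()):
        print(f"budget {budget}, k={k}: {'feasible' if ok else 'infeasible'}")
```

Computing the cheapest feasible cost once per k and comparing it against every budget avoids re-enumerating subsets for each cell of the grid.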
Figure 19 presents a heatmap visualization of total maintenance cost variations for different optimal k-out-of-n configurations under different budget constraints. The total maintenance cost values are color-coded, with darker shades representing higher costs. The results demonstrate that as the number of selected UPS units increases, the maintenance costs also increase due to the additional servicing requirements, and increasing the budget does not necessarily lead to higher availability. A higher maintenance budget allows for a greater number of UPS units to be maintained, but beyond a certain threshold, cost efficiency diminishes.
Taken together, the two heatmaps show that optimizing the number of UPS units is crucial for balancing high availability against cost-effectiveness. A larger maintenance budget does not always yield higher availability, as the diminishing returns at larger values of k indicate. Selecting an appropriate k-out-of-n configuration is therefore essential for maximizing system availability while keeping operational expenses under control.
This analysis provides valuable insights into the trade-offs between budget constraints, system redundancy, and availability, offering a systematic approach for optimizing UPS maintenance strategies in mission-critical environments.
A comparison of the optimal number of UPS components and their associated minimum costs across different DC tiers was performed in the fourth and final part of our sensitivity analysis, which revealed a clear relationship between redundancy requirements and operational expenses.
Figure 20 presents the optimal number of UPS components and their minimum maintenance costs across different DC tiers based on their availability requirements. As depicted, the optimal number of UPS units increases with higher tier levels, reflecting the enhanced redundancy and availability demands. Specifically, Tier I (1), with an availability requirement of 99.671%, necessitates five UPS units at a minimum cost of USD 9973. In contrast, Tier II (2), which requires 99.741% availability, optimally utilizes six UPS units, incurring a cost of USD 11,417. For Tier III (3), with a stringent availability target of 99.982%, eight UPS units are optimal, resulting in a cost of USD 14,856. Finally, Tier IV (4), the highest tier with a 99.995% availability requirement, demands 10 UPS units, with the minimum cost rising to USD 19,946. Table 9 provides a summarized overview for improved clarity.
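As a worked check on these figures, each availability target can be converted into permitted annual downtime, (1 − A) × 8760 h (a 365-day year is assumed), and the cost step between tiers can be computed from the values reported above. The sketch only restates numbers from this paragraph; it does not rerun the optimization.

```python
HOURS_PER_YEAR = 8760  # 365-day year (ASSUMED)

# (availability requirement, optimal UPS units, minimum monthly cost in USD), as reported above
TIERS = {
    "Tier I": (0.99671, 5, 9973),
    "Tier II": (0.99741, 6, 11417),
    "Tier III": (0.99982, 8, 14856),
    "Tier IV": (0.99995, 10, 19946),
}


def allowed_downtime_hours(availability: float) -> float:
    """Annual downtime permitted by an availability target: (1 - A) * 8760 h."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    previous_cost = None
    for tier, (a_req, units, cost) in TIERS.items():
        step = "" if previous_cost is None else f", +USD {cost - previous_cost:,} vs. previous tier"
        print(f"{tier}: {allowed_downtime_hours(a_req):.1f} h/yr, {units} units, USD {cost:,}/month{step}")
        previous_cost = cost
```

This gives roughly 28.8, 22.7, 1.6, and 0.4 hours of permitted downtime per year for Tiers I to IV, and monthly cost steps of USD 1444, 3439, and 5090 between consecutive tiers.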
This progression underscores the significant impact of redundancy on both the number of components and the financial investment required to maintain high availability in DC operations.
The results of the sensitivity analysis demonstrated that the DP optimization model remains effective under varying conditions, thereby reinforcing its reliability and practical relevance.
The optimized selection of components underscores the effectiveness of the proposed DP model in balancing reliability and cost efficiency. Achieving an availability of 99.991% not only meets but surpasses the Tier 1 standard (99.671%), which demonstrates the model’s capability to provide robust maintenance planning even under strict operational constraints.
The results highlight the practical utility of incorporating dynamic failure rates and variable maintenance costs into the optimization framework. By maintaining fewer components while still meeting the availability requirement, the model avoids redundant expenses: maintaining all 10 units costs USD 19,946 per month (the Tier IV case), roughly twice the USD 9973 of the Tier 1 optimum.
The optimization model and algorithm were developed and implemented in Python 3 using the Spyder open-source scientific environment. The scripts were run under recent Python 3 releases (3.10, 3.11, and 3.12) with libraries such as NumPy 2 and Matplotlib 3 [63,64,66].

4. Discussions

While the proposed availability-based maintenance cost optimization model demonstrates effectiveness in optimizing maintenance strategies for UPS systems within DCs, certain limitations must be acknowledged in greater detail, along with their potential impact on the results and strategies for mitigation. This section first examines the challenges associated with implementing the model, followed by an analysis of its key limitations and constraints.

4.1. Challenges in Implementing the Proposed Model

There are several potential challenges to implementing our proposed availability-based maintenance cost optimization model in real-world scenarios. The important ones are discussed below.
One of the primary challenges in validating and refining the proposed model is the restricted access to accurate and comprehensive maintenance and reliability data for DCs. Due to security concerns and confidentiality agreements, organizations are often reluctant to share operational data related to system failures, maintenance schedules, and cost breakdowns. Despite multiple attempts to engage with various DC operators and obtain access to relevant datasets, we were unable to secure real-world records for direct validation. This limitation affects our ability to fine-tune model parameters, particularly failure rates and maintenance cost variations, which are essential for ensuring accurate predictions.
DCs operate within complex infrastructures that incorporate diverse equipment, monitoring tools, and maintenance frameworks. Integrating the proposed model into existing maintenance management systems requires compatibility with different software solutions and data processing architectures. Additionally, the model’s reliance on dynamic failure rate estimations necessitates real-time data collection and analysis, which may not be readily available in all DCs.
To mitigate these challenges, future research should explore collaboration with DC operators under strict confidentiality agreements to obtain anonymized datasets for model validation. Additionally, integrating machine learning techniques to estimate missing parameters dynamically can improve prediction accuracy. Further efforts should also focus on developing user-friendly software interfaces that simplify model implementation and adoption by industry professionals.

4.2. Key Limitations and Constraints

This research is based on a specific DC configuration, with a focus on UPS systems in a k-out-of-n arrangement. Consequently, the optimization model may not be directly applicable to different system configurations or other critical DC subsystems, such as HVAC, network systems, or other components. Thus, future research could explore extending the model to incorporate additional critical DC components, such as cooling systems, by integrating multi-component optimization frameworks. Additionally, developing a modular version of the model that allows for subsystem-specific parameters would enhance its adaptability to diverse DC architectures.
In addition, as noted among the model’s main implementation challenges, the model relies on assumed failure rates and maintenance costs because real-world operational data for each component vary across DCs and are of limited availability. This assumption may introduce inaccuracies, particularly for organizations with distinct operational environments, unique maintenance policies, or varying workload intensities. If actual failure rates deviate significantly from the assumed ones, the model may need to be recalibrated for its optimal maintenance strategies to remain effective. Collaborating with DC operators to collect real-time operational data would improve parameter accuracy and lead to more reliable optimization results.
Certain assumptions were also made regarding the system’s reliability metrics, such as MTBF and MTTR, and the availability threshold was aligned with Tier 1 standards. These choices may not suit DCs of higher or lower criticality (e.g., Tiers 2, 3, and 4), which limits the direct applicability of the results to similar operational environments. The model could therefore be adjusted in the future for other DC tiers.
Lastly, although the model incorporates dynamic maintenance costs, it does not account for external economic factors such as inflation, supply chain disruptions, or regulatory changes that may affect long-term cost predictions. The long-term validity of the cost-based optimization may therefore decrease if these external factors change unpredictably. To mitigate this limitation, future extensions of this research could integrate economic forecasting models to adjust maintenance cost predictions dynamically. Incorporating stochastic optimization techniques would further improve the model’s robustness to uncertain cost fluctuations.
Therefore, to ensure simplicity and feasibility, the model incorporated certain assumptions that were necessary for its initial implementation. Adjusting these parameters and assumptions—such as failure rates, maintenance costs, and availability thresholds—would alter the model’s outcomes and potentially expand its applicability. Thus, the model can be modified to suit other DC tiers, allowing it to adapt to varied operational environments and requirements.
Based on the discussed limitations and restrictions of the proposed model, specific methodologies, technologies, and frameworks could be explored in future studies to improve the model’s precision, scalability, and real-world implementation.
Incorporating machine learning (ML) techniques could improve failure rate predictions by dynamically analyzing historical maintenance logs, leading to more precise maintenance scheduling. Combined with accurate data collection, predictive maintenance models based on supervised and unsupervised learning algorithms could analyze historical failure data, detect patterns, and predict component degradation more effectively.
Furthermore, IoT technologies can be integrated into maintenance frameworks to enable real-time monitoring and data acquisition from DC components. Future research could therefore explore coupling IoT-based condition monitoring systems with the proposed optimization model to improve decision-making accuracy.
The proposed model could also be enhanced by incorporating stochastic elements through Monte Carlo simulations or probabilistic risk assessment techniques. Various failure scenarios and maintenance actions under uncertain conditions could be simulated by researchers to evaluate the system’s reliability and availability for different strategies and to determine optimal responses to unexpected system behaviors.
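As a minimal illustration of the Monte Carlo idea, the sketch below estimates system availability by sampling independent up/down states for each unit and counting how often at least k units are up. Independence, fixed per-unit availabilities, and the sample size are assumptions, and the unit values are hypothetical; with k = 1 the estimate should approach the analytic parallel value 1 − ∏(1 − A_i).

```python
import math
import random


def monte_carlo_availability(availabilities, k_required=1, samples=200_000, seed=42):
    """Estimate P(at least k_required units are up) by random sampling.

    Each unit is ASSUMED to be up independently with probability equal to its availability.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    successes = 0
    for _ in range(samples):
        units_up = sum(rng.random() < a for a in availabilities)
        if units_up >= k_required:
            successes += 1
    return successes / samples


if __name__ == "__main__":
    unit_availabilities = [0.9, 0.85, 0.95]  # HYPOTHETICAL values
    estimate = monte_carlo_availability(unit_availabilities)
    analytic = 1.0 - math.prod(1.0 - a for a in unit_availabilities)
    print(f"Monte Carlo: {estimate:.5f}  analytic (parallel): {analytic:.5f}")
```

The value of simulation lies in cases without a closed form, such as availabilities drawn from uncertain distributions or correlated failures; the simple independent case is shown here only because its analytic answer makes the estimate easy to check.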
Despite these challenges, limitations, and restrictions, the proposed model uses DP to successfully identify the optimal (minimum) number of components, k, that meets the minimum system availability threshold.
As reviewed, the existing literature underscores the numerous advantages of the DP method, notably its reduced computational time and efficiency in handling complex problems. A distinguishing feature of the proposed maintenance cost optimization model, in contrast to prior scholarly work, is its explicit design for DCs. This model uniquely incorporates principal DC availability requirements as established by the Uptime Institute, dynamic monthly failure rates with corresponding maintenance costs, and the application of the DP algorithm to ascertain the optimal “k” out of “n” components within the DC’s parallel k-out-of-n system.
As illustrated in the results, the sensitivity analysis confirms that the DP optimization model consistently identifies optimal maintenance strategies while maintaining high system availability and cost efficiency. This analysis not only enhances the robustness of the research conclusions but also provides valuable guidance for DC operators in making informed, cost-effective maintenance decisions under varying operational constraints.
This tailored approach supports practical, real-world application and helps DC managers optimize maintenance schedules, reduce downtime, and extend equipment lifecycles as hyperscale and cloud-based DCs expand globally. By optimizing resource utilization while maintaining high system availability, the model addresses challenges such as budget constraints and rising energy costs. Because it incorporates dynamic maintenance costs, the framework adapts to real-world variations in asset performance and reliability, offering a robust tool for long-term infrastructure management. Beyond maintenance cost optimization, the model also supports sustainable infrastructure planning by improving resource utilization in DC operations. By determining the optimal number of k components, it avoids unnecessary, energy-intensive maintenance and component replacements, indirectly reducing energy consumption and the environmental footprint. Optimized maintenance strategies also keep critical systems operating efficiently with minimal downtime, reducing reliance on redundant backup power. The theoretical implications of this study extend to reliability engineering and optimization, demonstrating the applicability of DP to complex maintenance systems. This research lays the foundation for future advances in optimization methods tailored to industrial challenges and highlights the model’s versatility for other critical systems.

5. Conclusions

This study introduced an availability-based maintenance cost optimization model for k-out-of-n UPS systems in DCs using a DP approach. By incorporating variable failure rates and dynamic maintenance costs, the model effectively determines the optimal number of components (k) while minimizing maintenance costs and ensuring stringent reliability compliance. This novel framework advances the field of DC maintenance optimization by providing a structured approach to balancing operational expenses with system uptime, a critical factor in mission-critical infrastructure management.
The applied DP optimization model integrates variable failure rates and dynamic maintenance costs into the optimization framework, enabling a thorough analysis of system availability and cost. The model therefore supports better-informed asset management and maintenance planning: by modeling the impact of asset conditions on maintenance costs, stakeholders can make informed decisions about resource allocation and risk mitigation. The contribution of this study to the state of the art in practice and to the existing body of knowledge in the construction industry is a novel optimization model that incorporates dynamic failure rates and maintenance costs, providing valuable practical and theoretical references for O&M decision-makers.
In addition, the sensitivity analysis results demonstrated the model’s robustness to variations in key parameters, such as MTBF, MTTR, and maintenance costs. While availability remains stable under MTBF variations, it is significantly affected by MTTR changes, highlighting the importance of rapid repair times. Maintenance costs scale predictably with ±10% fluctuations, confirming the model’s reliability in cost estimation. Also, the sensitivity analysis findings reinforce the importance of optimizing the k-out-of-n configuration to achieve a balance between high availability and cost-effectiveness, as increasing the maintenance budget does not always yield proportional availability improvements. By systematically evaluating redundancy trade-offs and budget constraints, the proposed model provides a structured approach to UPS maintenance optimization in mission-critical environments. Additionally, a comparison of optimal UPS configurations across different DC tiers reveals a clear relationship between redundancy requirements and operational expenses. Higher availability demands necessitate increased UPS units and higher costs, ranging from USD 9973 for Tier 1 (5 units) to USD 19,946 for Tier 4 (10 units). These findings emphasize the model’s adaptability and practical utility in diverse operational scenarios.
The findings of the proposed optimization model effectively support DC operators by improving maintenance planning, minimizing downtime, and reducing costs, particularly in k-out-of-n UPS systems. Leveraging the DP method, it optimizes maintenance expenses while achieving a system availability of 99.991%, exceeding the Tier 1 requirement of 99.671%. This demonstrates the model’s ability to balance operational costs and uptime, providing a reliable tool for critical maintenance decisions.
While the proposed model significantly enhances maintenance cost efficiency and availability optimization, its reliance on assumed failure rates and uniform maintenance strategies highlights opportunities for further refinement. Future research should focus on validating the model using real-world operational data and integrating real-time condition monitoring and AI-driven predictive maintenance for adaptive decision-making. Additionally, expanding the model’s scope to include other critical DC subsystems, such as HVAC and network infrastructure, would further enhance its applicability.
To ensure successful real-world implementation, collaborations with industry partners are essential. Partnering with DC operators, UPS manufacturers, and data analytics firms would facilitate model validation, refinement, and integration into existing DC maintenance frameworks.
Moreover, further studies might explore integrating ML algorithms for predictive maintenance, allowing for dynamic adjustments aligned with the specific operational demands of diverse DC environments. By applying multi-component and multi-objective optimization models, the model could be expanded to simultaneously optimize maintenance strategies for multiple DC subsystems. Given the importance of energy efficiency and sustainability, future studies could also incorporate energy metrics into the framework for environmentally responsible maintenance planning.
By advancing cost-efficient and availability-based maintenance strategies, this research provides a scalable, adaptable tool for Data Center operators and maintenance decision-makers. With further development, this optimization framework has the potential to shape next-generation predictive maintenance models, ensuring resilient, cost-effective, and sustainable Data Center infrastructure management.

Author Contributions

Conceptualization, M.F.A.; methodology, M.F.A.; software, M.F.A. and M.J.B.; validation, M.F.A.; formal analysis, M.F.A.; investigation, M.F.A.; resources, M.F.A.; data curation, M.F.A. and M.J.B.; writing—original draft preparation, M.F.A.; writing—review and editing, M.F.A., M.J.B., F.N. and F.H.; visualization, M.F.A.; supervision, F.H. and F.N.; project administration, M.F.A., F.N. and F.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

DC: Data Center
O&M: Operations and Maintenance
DP: Dynamic Programming
KPIs: Key Performance Indicators
RBD: Reliability Block Diagrams
FCI: Facility Condition Index
IoT: Internet of Things
AI: Artificial Intelligence
ML: Machine Learning
λ: Failure (hazard) rate, expressed as the number of failures per year
λ(t): Failure (hazard) rate function, representing the probability of failure per unit of time (t)
At: Component availability in timeframe (t)
MTTF: Mean Time To Failure
MTBF: Mean Time Between Failures
MTTR: Mean Time To Repair (Restore or Recover)
FCA: Facilities Condition Analysis
UPS: Uninterruptible Power Supply
FMECA: Failure Modes, Effects, and Criticality Analysis
CF: Costs of services for each incident/failure
CPM: Preventive maintenance costs
CCM: Corrective maintenance costs
CPC: Costs of power and cooling services
CBR: Costs of battery replacement service
CCR: Component renewal costs
CI: Component investment costs

References

  1. Informa PLC—AFCOM. State of the Data Center. 2024. Available online: https://afcom.com/events/EventDetails.aspx?id=1820212&group= (accessed on 1 March 2025).
  2. Abadi, M.F.; Haghighat, F.; Nasiri, F. Data center maintenance: Applications and future research directions. Facilities 2020, 38, 691–714. Available online: https://www.emerald.com/insight/content/doi/10.1108/F-09-2019-0104/full/pdf?title=data-center-maintenance-applications-and-future-research-directions (accessed on 1 March 2025).
  3. Alshakhshir, F.S.; Howell, M. Chapter 11 Energy Centered Maintenance in Data Centers. In Energy Centered Maintenance—A Green Maintenance System; River Publishers: Aalborg, Denmark, 2021; pp. 169–172. Available online: https://ieeexplore.ieee.org/document/9549143 (accessed on 13 December 2024).
  4. Kirkwood, J. What the Reliability Bathtub Curve Means for Your Hardware Refresh Cycles. Service Express. Available online: https://serviceexpress.com/resources/reliability-bathtub-curve/ (accessed on 1 March 2025).
  5. Camboim, K.; Melo, C.; Araujo, J.; Alencar, F. Availability Evaluation and Maintenance Policy of Data Center Infrastructure. In Proceedings of the Anais Estendidos do X Simpósio Brasileiro de Engenharia de Sistemas Computacionais (SBESC Estendido 2020), Sociedade Brasileira de Computação—SBC, Florianopolis, Brazil, 23–27 November 2020; pp. 198–203.
  6. Service Express. Data Center & Infrastructure Report: Priorities and Challenges in 2024. 2023. Available online: https://lp.serviceexpress.com/rs/021-JNM-575/images/2024%20Data%20Center%20Infrastructure%20Report.pdf?_gl=1*eqx2ik*_gcl_au*MjIzNTgxMDA3LjE3NDIyNzc3ODg.*_ga*NDQyODMwMzk3LjE3NDIyNzc3ODk.*_ga_1PJY9JWPTZ*MTc0MjI3Nzc4OS4xLjEuMTc0MjI3Nzg5OS4xMS4wLjA (accessed on 5 March 2025).
  7. Kumar, P. The Future of Data Center Maintenance: Trends and Best Practices for Modern Infrastructure. The Innovations of Data Center. Available online: https://www.linkedin.com/pulse/future-data-center-maintenance-trends-best-practices-modern-kumar/ (accessed on 20 February 2025).
  8. Technavio. COVID-19 Pandemic Impact on Global Hyperscale Data Center Market 2020–2024|Technavio. 2020. Available online: https://www.businesswire.com/news/home/20200825005097/en/COVID-19-Pandemic-Impact-on-Global-Hyperscale-Data-Center-Market-2020-2024-Technavio (accessed on 8 January 2025).
  9. Schenkelberg, F. K Out of N. FMS Reliability. Available online: https://accendoreliability.com/k-out-of-n-2/ (accessed on 28 February 2024).
  10. Birolini, A. Reliability Engineering: Theory and Practice, 8th ed.; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar] [CrossRef]
  11. Abadi, M.F.; Rahdar, M.H.; Nasiri, F.; Haghighat, F. Fault Identification and Fault Impact Analysis of the Vapor Compression Refrigeration Systems in Buildings: A System Reliability Approach. Energies 2022, 15, 5774. [Google Scholar] [CrossRef]
  12. Bellman, R. The theory of dynamic programming. Bull. Am. Math. Soc. 1954, 60, 503–515. [Google Scholar] [CrossRef]
  13. Bellman, R.E. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957. [Google Scholar]
  14. Abadi, M.F.; Haghighat, F.; Nasiri, F. Availability-based maintenance prioritization for data centres: A dynamic programming approach. Saf. Reliab. 2025, 1–36. [Google Scholar] [CrossRef]
  15. Pourhosseini, O.; Nasiri, F. Availability-Based Reliability-Centered Maintenance Scheduling: Case Study of Domestic (Building-Integrated) Hot Water Systems. ASCE ASME J. Risk Uncertain. Eng. Syst. A Civ. Eng. 2018, 4, 1–13. [Google Scholar] [CrossRef]
  16. Mirhosseini, M.; Keynia, F. Asset management and maintenance programming for power distribution systems: A review. IET Gener. Transm. Distrib. 2021, 15, 2287–2297. [Google Scholar] [CrossRef]
  17. Nadaf, S. AI for Predictive Maintenance in Industries. Int. J. Res. Appl. Sci. Eng. Technol. 2024, 12, 2013–2017. [Google Scholar] [CrossRef]
  18. Cavus, M.; Dissanayake, D.; Bell, M. Next Generation of Electric Vehicles: AI-Driven Approaches for Predictive Maintenance and Battery Management. Energies 2025, 18, 1041. [Google Scholar] [CrossRef]
  19. Lawal, O.O.; Nawari, N.O.; Lawal, O. AI-Enabled Cognitive Predictive Maintenance of Urban Assets Using City Information Modeling—Systematic Review. Buildings 2025, 15, 690. [Google Scholar] [CrossRef]
  20. Salihu, C.; Mohandes, S.R.; Kineber, A.F.; Hosseini, M.R.; Elghaish, F.; Zayed, T. A Deterioration Model for Sewer Pipes Using CCTV and Artificial Intelligence. Buildings 2023, 13, 952. [Google Scholar] [CrossRef]
  21. Shin, W.; Han, J.; Rhee, W. AI-assistance for predictive maintenance of renewable energy systems. Energy 2021, 221, 119775. [Google Scholar] [CrossRef]
  22. Johanesa, T.V.A.; Equeter, L.; Mahmoudi, S.A. Survey on AI Applications for Product Quality Control and Predictive Maintenance in Industry 4.0. Electronics 2024, 13, 976. [Google Scholar] [CrossRef]
  23. Kuiti, M.R.; Hazra, N.K.; Finkelstein, M. On Component Redundancy Versus System Redundancy for a k-out-of-n System. arXiv 2017, arXiv:1710.09202. [Google Scholar]
  24. Aggarwal, V. Reliability of k-out-of-n Data Storage System with Deterministic Parallel and Serial Repair. arXiv 2016, arXiv:1611.08514. [Google Scholar]
  25. Aghaei, M.; Hamadani, A.Z.; Ardakan, M.A. Redundancy allocation problem for k-out-of-n systems with a choice of redundancy strategies. J. Ind. Eng. Int. 2017, 13, 81–92. [Google Scholar] [CrossRef]
  26. Byun, J.-E.; Noh, H.-M.; Song, J. Reliability growth analysis of k-out-of-N systems using matrix-based system reliability method. Reliab. Eng. Syst. Saf. 2017, 165, 410–421. [Google Scholar] [CrossRef]
  27. Kasuya, M.; Jin, L. Structural Properties of Optimal Maintenance Policies for k-out-of-n Systems with Interdependence Between Internal Deterioration and External Shocks. Mathematics 2025, 13, 716. [Google Scholar] [CrossRef]
  28. Nan, Z.; Liu, Y.; Cai, K.; Jun, Z. Maintenance Optimization of A K-out-of-N System Considering Common Cause Failure and Load Sharing. Oper. Res. Manag. Sci. 2023, 32, 44. [Google Scholar]
  29. Wu, T.; Wei, F.; Yang, L.; Ma, X.; Hu, L. Maintenance Optimization of k-Out-of-n Load-Sharing Systems Under Continuous Operation. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 6329–6341. [Google Scholar]
  30. Jamali, M.A.; Pham, H. Opportunistic maintenance model for load sharing k-out-of-n systems with perfect PM and minimal repairs. Qual. Eng. 2022, 34, 205–214. [Google Scholar]
  31. Alshakhshir, F.; Howell, M.T. Data Driven Energy Centered Maintenance, 2nd ed.; River Publishers: Aalborg, Denmark, 2021. [Google Scholar] [CrossRef]
  32. O’Keeffe, M. The Future of Data Center Maintenance. Data Centre Dynamics Ltd (DCD). Available online: https://www.datacenterdynamics.com/en/opinions/the-future-of-data-center-maintenance/#:~:text=Maximize%20airflow,of%20operational%20efficiency%20and%20resilience (accessed on 20 February 2025).
  33. Cheng, H.; Li, M.; Cao, W.; Dong, X. Research on Operation and Maintenance Management System of Data Center SDN Network. In Proceedings of the 2023 IEEE International Conference on Sensors, Electronics and Computer Engineering (ICSECE), Jinzhou, China, 18–20 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1254–1258. [Google Scholar]
  34. Bradley, S.; Hax, A.; Magnanti, T. Applied Mathematical Programming; Addison-Wesley: Boston, MA, USA, 1977; Available online: https://web.mit.edu/15.053/www/AMP.htm (accessed on 8 January 2025).
  35. Zhang, Y. A survey of dynamic programming algorithms. Appl. Comput. Eng. 2024, 35, 183–189.
  36. Ilinykh, A.S.; Bondarev, E.S. Planning work on railroad track maintenance based on dynamic programming. Transp. Res. Procedia 2022, 61, 699–707.
  37. Albatayneh, O.; Aleadelat, W.; Ksaibati, K. Dynamic programming of 0/1 knapsack problem for network-level pavement asset management system. Can. J. Civ. Eng. 2021, 48, 356–365.
  38. Melo, F.; Andrade, E.; Callou, G. Optimization of electrical infrastructures at data centers through a DoE-based approach. J. Supercomput. 2022, 78, 406–439.
  39. Jiang, C.; Qiu, Y.; Gao, H.; Fan, T.; Li, K.; Wan, J. An Edge Computing Platform for Intelligent Operational Monitoring in Internet Data Centers. IEEE Access 2019, 7, 133375–133387.
  40. Loeffler, C.; Spears, E. Uninterruptible Power Supply System. In Data Center Handbook; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2014; Chapter 27; pp. 495–521.
  41. Riello Elettronica Group. How Is UPS Resilience Measured (MTBF, MTTR, Availability)? Available online: https://www.riello-ups.com/questions/52-how-is-ups-resilience-measured-mtbf-mttr-availability (accessed on 1 March 2025).
  42. Heising, C. IEEE Recommended Practice for the Design of Reliable Industrial and Commercial Power Systems; IEEE Inc.: New York, NY, USA, 2007.
  43. Kidd, C. MTBF vs. MTTF vs. MTTR: Defining IT Failure; BMC Software Inc.: Houston, TX, USA, 2019; Available online: https://www.bmc.com/blogs/mtbf-vs-mtff-vs-mttr-whats-difference/# (accessed on 1 March 2025).
  44. Gabriel, C. Data Center Disaster Recovery and High Availability. In Data Center Handbook; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2014; Chapter 35; pp. 639–657.
  45. Lee, H.; Lee, B. The development of a state-aware equipment maintenance application using sensor data ranking techniques. Sensors 2020, 20, 3038.
  46. Federal Transit Administration. TAM Facility Performance Measure Reporting Guidebook: Condition Assessment Calculation; Federal Transit Administration: Washington, DC, USA, 2018.
  47. The Regional Municipality of Durham. The 2019 Regional Municipality of Durham Asset Management Plan; The Regional Municipality of Durham: Durham, ON, Canada, 2019.
  48. Ahmed, R.; Zayed, T.; Nasiri, F. A hybrid genetic algorithm-based fuzzy markovian model for the deterioration modeling of healthcare facilities. Algorithms 2020, 13, 210.
  49. Town of Ajax. Corporate Asset Management Plan. 2017. Available online: https://www.ajax.ca/en/inside-townhall/resources/Departments/Ops/2017-Corporate-Asset-Management-Plan.pdf (accessed on 1 September 2023).
  50. Kuryanov, V.N.; Sultanov, M.M.; Kuryanova, E.V.; Skopova, E.M. Mathematical model of the processes of restoration of power equipment in power systems by criterion of the index of technical condition. J. Phys. Conf. Ser. 2020, 1683, 42041.
  51. DTZ. Facilities Condition Analysis. 2013. Available online: https://cdn.ymaws.com/www.sais.org/resource/resmgr/imported/Facility_Condition_Analysis_Capital_Planning_Merrow_MISBO_Oct2013.pdf (accessed on 1 September 2023).
  52. IPWEA; NAMS. Condition Assessment and Asset Performance Guidelines. 2012. Available online: https://higherlogicdownload.s3.amazonaws.com/IPWEA/1605183f-a91c-4680-b953-cde30dd2c09a/UploadedImages/Bookshop/PN%20Preamble_lp_v2.pdf (accessed on 1 September 2023).
  53. MCIM by Fulcrum Collaborations. Benchmarking the Reliability of Static UPS Systems. 2023. Available online: https://info.mcim24x7.com/static-ups-benchmarking (accessed on 1 January 2024).
  54. de Jonge, B.; Klingenberg, W.; Teunter, R.; Tinga, T. Optimum maintenance strategy under uncertainty in the lifetime distribution. Reliab. Eng. Syst. Saf. 2015, 133, 59–67.
  55. Geng, H. Data Center Handbook; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015.
  56. Legrand. Reliability & Availability in Legrand UPS. 87045 Limoges Cedex—France. 2021. Available online: https://ups.legrand.com/media/document/reliability-and-availability-in-legrand-ups.pdf (accessed on 1 January 2025).
  57. APC—Schneider Electric. Symmetra PX 500kW Scalable to 500kW with Maintenance Bypass Left & Distribution. Available online: https://www.apc.com/ca/en/product/SY500K500DL-PD/symmetra-px-500kw-scalable-to-500kw-with-maintenance-bypass-left-distribution/ (accessed on 1 March 2025).
  58. CDW LLC. APC Symmetra PX 500kW Scalable to 500kW with Right Mounted Maintenance Bypass. Available online: https://www.cdw.com/product/apc-symmetra-px-500kw-scalable-to-500kw-with-right-mounted-maintenance-bypa/1673300#WAR (accessed on 1 September 2023).
  59. Cologix. Cologix—The Downtown Montréal Carrier Hotel—MTL 3: 1250 René-Lévesque West. 2019. Available online: https://www.cologix.com/wp-content/uploads/2018/12/Montreal-Data-Center-MTL-3.pdf (accessed on 10 February 2025).
  60. Uptime Institute, LLC. Tier Classification System. Available online: https://uptimeinstitute.com/tiers (accessed on 1 September 2023).
  61. Rahmat, K.; Jovanovic, S.; Lo, K.L. Reliability Comparison of Uninterruptible Power Supply (UPS) System Configurations. In Proceedings of the Intelec 2013; 35th International Telecommunications Energy Conference, Smart Power and Efficiency, VDE, Hamburg, Germany, 13–17 October 2013; pp. 1–6.
  62. Hummingbird Networks. APC On-Site Service On-Site Warranty Extension—Extended Service Agreement—Parts and Labor. Available online: https://www.hummingbirdnetworks.com/apc-on-site-service-on-site-warranty-extension-extended-service-agreement-parts-and-labor-for-ups-300-500-kva-1-year-on-site-business-hours-response-time-nbd-for-eps-7000-woe1yr-e7-50 (accessed on 1 September 2023).
  63. OpenAI. ChatGPT (December 2024 Version) [AI Language Model]. Available online: https://openai.com/chatgpt (accessed on 5 December 2024).
  64. Python Software. Python Software Foundation. Available online: https://www.python.org/ (accessed on 1 September 2024).
  65. Microsoft Copilot AI. Copilot AI (Response Generated by Microsoft Copilot AI). Microsoft. Available online: https://copilot.cloud.microsoft/ (accessed on 5 December 2024).
  66. Spyder Website Contributors. Spyder. Available online: https://www.spyder-ide.org/ (accessed on 1 September 2024).
Figure 1. Reliability Block Diagram of the k-out-of-n configuration [10].
Figure 2. Flowchart of the dynamic cost and availability-based optimization algorithm for DC components in a k-out-of-n parallel configuration.
Figure 3. Schematic block diagram of a UPS system [56].
Figure 4. Top 10 failures in UPS systems [53].
Figure 5. Number of failures of static UPS systems by their lifecycle stage [53].
Figure 6. Heatmap visualization of the UPS yearly failure rates.
Figure 7. Failure rate function for the UPS components of Group 1 in one year.
Figure 8. Failure rate function (Weibull distribution) for the UPS components of Group 2 in one year.
Figure 9. Failure rate function for the UPS components of Group 3 in one year.
Figure 10. Lifecycle and maintenance cost distribution for the APC Symmetra PX 500 kW UPS system.
Figure 11. Monthly dynamic maintenance costs for three groups of UPS systems.
Figure 12. Three-dimensional visualization of the optimal combination of UPS components in one month of operation.
Figure 13. Optimal combination of UPS components in one month of operation.
Figure 14. Sensitivity analysis: impact of ±10% variations in MTBF on availability [64].
Figure 15. Sensitivity analysis: impact of ±10% variations in MTTR on availability [64].
Figure 16. Sensitivity analysis: impact of ±10% variations in maintenance costs on total maintenance costs [64].
Figure 17. Sensitivity analysis: maintenance costs and total system availability comparison for various k-out-of-n UPS configurations from k = 5 to k = 9 [64].
Figure 18. Sensitivity analysis: system availability variations for different optimal k-out-of-n configurations under varying budget constraints [64].
Figure 19. Sensitivity analysis: maintenance costs variations for different optimal k-out-of-n configurations under varying budget constraints [64].
Figure 20. Sensitivity analysis: optimal combination of UPS components across different DC tiers [64,65].
Table 1. DC availability standards based on Uptime Institute tiers [14,44].
| DC Tier | System’s Description | DC Availability Percentage |
|---|---|---|
| 1 | A single, non-redundant distribution path supplies power to the IT equipment, with no redundant capacity components. | 99.671% |
| 2 | Includes all Tier 1 requirements and adds redundant infrastructure components to enhance availability. | 99.741% |
| 3 | Meets or exceeds all Tier 1 and Tier 2 requirements; multiple independent distribution paths serve the IT equipment. All IT equipment must be dual-powered and fully compatible with the topology of the site’s architecture, and the site infrastructure is concurrently maintainable. | 99.982% |
| 4 | Meets or exceeds all Tier 1, Tier 2, and Tier 3 requirements. All cooling equipment, including chillers and heating, ventilating, and air conditioning (HVAC) systems, is independently dual-powered. The site infrastructure is fault-tolerant, with electrical power, storage, and distribution facilities. | 99.995% |
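The tier thresholds in Table 1 are easier to compare when expressed as permitted downtime. The short Python sketch below converts each availability target into hours of downtime per year. It assumes an 8760-hour (non-leap) year and treats availability simply as uptime divided by total time. The names `TIER_AVAILABILITY` and `allowed_downtime_hours` are illustrative and do not come from the paper.

```python
# Illustrative sketch: convert the Table 1 availability targets into
# permitted downtime, assuming an 8760 h (365-day) year.
HOURS_PER_YEAR = 8760

# Availability targets from Table 1 (Uptime Institute tiers)
TIER_AVAILABILITY = {
    "Tier 1": 0.99671,
    "Tier 2": 0.99741,
    "Tier 3": 0.99982,
    "Tier 4": 0.99995,
}


def allowed_downtime_hours(availability: float, hours: float = HOURS_PER_YEAR) -> float:
    """Downtime implied by an availability target over a period of `hours`."""
    return (1.0 - availability) * hours


for tier, target in TIER_AVAILABILITY.items():
    downtime = allowed_downtime_hours(target)
    print(f"{tier}: {target:.3%} -> {downtime:.2f} h/year ({downtime * 60:.0f} min)")
```

The permitted downtime comes to roughly 28.8 h per year for Tier 1, 22.7 h for Tier 2, 1.6 h for Tier 3, and 0.44 h (about 26 min) for Tier 4. The allowance tightens most sharply between Tiers 2 and 3.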
Table 2. Asset condition rating and maintenance actions [14,52].
| Rank (Rating) | Asset Condition | Condition’s Description | Corresponding Maintenance Action(s) |
|---|---|---|---|
| 1 | Excellent or Very Good | Brand new or nearly new, fully operational. | Only routine and planned (recurring) maintenance is required. |
| 2 | Good | Functional with minor issues or slight defects. | Minor corrective maintenance required (5%): repairing failed or degraded components/assets. |
| 3 | Adequate or Fair | Maintenance required to restore an acceptable level of performance. | Significant corrective maintenance required (10–20%): repairing failed or degraded components/assets. |
| 4 | Marginal or Poor | Component renewal is necessary. | Significant renewal/upgrade required (20–40%). |
| 5 | Critical | Approaching end of life; non-functional asset. | Over 50% of the asset requires replacement: substitution or exchange of an existing component/asset. |
Table 3. UPS system’s reliability and availability data [14,42].
| Asset Condition State | Asset Condition Description | MTBF (Hours) | MTTR (Hours) | Failure Rate (per Year) | Asset Availability (Percentage) |
|---|---|---|---|---|---|
| 10 | Brand new or near new condition | 933,708 | 2 | 0.009382 | 99.999% |
| 9 | Brand new or near new condition | 930,000 | 4 | 0.009419 | 99.999% |
| 8 | Fully operational asset | 850,000 | 8 | 0.010306 | 99.999% |
| 7 | Partially operational asset with some failures | 500,000 | 12 | 0.017520 | 99.997% |
| 6 | Partially operational asset with some failures | 300,000 | 14 | 0.029200 | 99.995% |
| 5 | Fair condition | 100,000 | 18 | 0.087600 | 99.982% |
| 4 | Poor condition and needs component renewal (battery or other parts) | 50,000 | 25 | 0.175200 | 99.950% |
| 3 | Poor condition and needs component renewal (battery or other parts) | 30,000 | 27 | 0.292000 | 99.910% |
| 2 | Critical condition and near end of life | 20,000 | 29 | 0.438000 | 99.855% |
| 1 | Critical condition and near end of life | 10,000 | 30 | 0.876000 | 99.701% |
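The failure-rate and availability columns of Table 3 follow from MTBF and MTTR through the usual steady-state relations: failures per year = 8760/MTBF, and A = MTBF/(MTBF + MTTR). The sketch below recomputes both columns. It assumes a constant failure rate within each condition state and an 8760-hour year, and its function names are illustrative. Its results agree with the tabulated values to within 0.001 percentage points.

```python
# Sketch: recompute the Table 3 failure-rate and availability columns,
# assuming a constant failure rate per condition state and an 8760 h year.
HOURS_PER_YEAR = 8760

# (condition state, MTBF in hours, MTTR in hours), from Table 3
TABLE_3 = [
    (10, 933_708, 2), (9, 930_000, 4), (8, 850_000, 8), (7, 500_000, 12),
    (6, 300_000, 14), (5, 100_000, 18), (4, 50_000, 25), (3, 30_000, 27),
    (2, 20_000, 29), (1, 10_000, 30),
]


def failures_per_year(mtbf_h: float) -> float:
    """Expected failures per year when the failure rate is 1 / MTBF."""
    return HOURS_PER_YEAR / mtbf_h


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


for state, mtbf, mttr in TABLE_3:
    print(f"state {state:2d}: {failures_per_year(mtbf):.6f} failures/year, "
          f"A = {100 * availability(mtbf, mttr):.4f}%")
```

State 7, for example, comes out at 99.9976%. Table 3 lists this as 99.997% and Table 7 as 99.998%, so the two tables differ only in how the last digit is rounded.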
Table 4. Available maintenance services for the APC Symmetra PX 500 kW UPS system [14,58,62].
| Manufacturer’s Maintenance Service(s) | Cost per Year (USD) |
|---|---|
| New device (APC Symmetra PX 500 kW) | 261,464 |
| Electric critical power and cooling service advantage ultra service plan, on-site | 44,269 |
| APC modular battery replacement service, installation and configuration, on-site; includes installation, maintenance, replacement, or removal of one UPS battery during business hours | 9049 |
| APC on-site service on-site warranty extension, extended service agreement, parts and labor (for UPS 300–500 kVA), 1 year, on-site, business hours | 6660 |
| On-site service upgrade to factory warranty or existing on-site service contract, 4-hour response | 2704 |
| APC modular battery replacement service scheduling upgrade to 7 × 24 (7 days a week and 24 h a day), installation/configuration (for UPS battery), on-site | 2073 |
| APC 7 × 24 scheduling upgrade from existing preventive maintenance service, 1 incident, on-site | 1027 |
Table 5. Combined and categorized maintenance actions and their corresponding costs for the APC Symmetra PX 500 kW UPS system [14,58,62].
| Type of Maintenance Action | Cost per Year (USD) |
|---|---|
| Service for each incident/failure (CF) | USD 1027 × (yearly failure rate) |
| Preventive maintenance (inspection) (CPM) | USD 6660 |
| Corrective maintenance (CCM) | USD 6660 + USD 2704 ≈ USD 9363.59 |
| Battery replacement service (CB) | USD 9048.99 + USD 2072.99 = USD 11,121.98 |
| Electric critical power and cooling services (CPC) | USD 44,268.99 |
| Component renewal (CCR) (30–40% of the cost of a new device) | 0.35 × USD 261,463.99 ≈ USD 91,500 |
| Investment (new device purchase) (CI) | USD 261,463.99 |
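Several entries in Table 5 are simple arithmetic on the Table 4 prices. The sketch below reproduces three of them: the battery-service sum (CB), the component-renewal cost (CCR) at the 35% share the table applies, and the per-incident cost CF for a yearly failure rate taken from Table 3. The constant and function names are illustrative.

```python
# Sketch of the Table 5 cost arithmetic (USD per year); inputs from Tables 4 and 5.
C_INVEST = 261_463.99              # CI: new APC Symmetra PX 500 kW unit
C_BATTERY = 9_048.99 + 2_072.99    # CB: battery replacement + 7 x 24 scheduling upgrade
C_PER_INCIDENT = 1_027.00          # per-incident service charge used in CF
RENEWAL_SHARE = 0.35               # CCR share of CI (Table 5 quotes a 30-40% range)


def incident_cost_per_year(failures_per_year: float) -> float:
    """CF = USD 1027 x yearly failure rate (Table 5)."""
    return C_PER_INCIDENT * failures_per_year


component_renewal = RENEWAL_SHARE * C_INVEST  # CCR

print(f"CB  = USD {C_BATTERY:,.2f}")
print(f"CCR = USD {component_renewal:,.2f}")
# Yearly failure rates for condition states 10, 5 and 1, from Table 3
for state, rate in [(10, 0.009382), (5, 0.087600), (1, 0.876000)]:
    print(f"CF (state {state:2d}) = USD {incident_cost_per_year(rate):,.2f}")
```

CCR evaluates to about USD 91,512, which Table 5 rounds to USD 91,500. CF ranges from under USD 10 a year for a unit in condition state 10 to roughly USD 900 a year for a unit in state 1.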
Table 6. Monthly dynamic failure rates and maintenance costs for the UPS units based on their condition.
| Month | Group 1 (UPS_DC_10 and UPS_DC_9): Monthly Failures | Group 1: Monthly Maintenance Costs (USD) | Group 2 (UPS_DC_3 to UPS_DC_8): Monthly Failures | Group 2: Monthly Maintenance Costs (USD) | Group 3 (UPS_DC_1 and UPS_DC_2): Monthly Failures | Group 3: Monthly Maintenance Costs (USD) |
|---|---|---|---|---|---|---|
| 1 | 0.68522 | 1344 | 0.16190 | 2428 | 0.16349 | 12,410 |
| 2 | 0.49430 | 1148 | 0.17080 | 2438 | 0.19092 | 12,438 |
| 3 | 0.37850 | 1029 | 0.18890 | 2456 | 0.22295 | 12,471 |
| 4 | 0.30827 | 957 | 0.21630 | 2484 | 0.26035 | 12,509 |
| 5 | 0.26567 | 913 | 0.25130 | 2520 | 0.30403 | 12,554 |
| 6 | 0.23983 | 887 | 0.29080 | 2561 | 0.35504 | 12,607 |
| 7 | 0.22416 | 871 | 0.33070 | 2602 | 0.41461 | 12,668 |
| 8 | 0.21465 | 861 | 0.36740 | 2639 | 0.48417 | 12,739 |
| 9 | 0.20889 | 855 | 0.39800 | 2671 | 0.56541 | 12,823 |
| 10 | 0.20539 | 851 | 0.42150 | 2695 | 0.66027 | 12,920 |
| 11 | 0.20327 | 849 | 0.43780 | 2712 | 0.77104 | 13,034 |
| 12 | 0.20198 | 848 | 0.44820 | 2722 | 0.90040 | 13,167 |
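Table 5 prices each incident at USD 1027, so the monthly costs in Table 6 should rise and fall with the monthly failures at that rate. The check below tests this. It treats the Table 6 costs as per-unit figures, as Table 8 does, and subtracts USD 1027 × monthly failures from each monthly cost. In every group the remainder stays within about USD 1 across all twelve months: roughly USD 640, 2262, and 12,242 per month for Groups 1, 2, and 3. This pattern is what one would expect if the per-incident charge is added to a fixed monthly amount. That decomposition is inferred from the tabulated values and is not stated in the source.

```python
# Consistency check on Table 6 (inferred decomposition, not stated in the source):
# monthly cost - USD 1027 x monthly failures should be roughly constant per group.
CF = 1027.0  # USD per incident (Table 5)

# (monthly failures, monthly maintenance cost in USD) for months 1-12, from Table 6
TABLE_6 = {
    "Group 1": (
        [0.68522, 0.49430, 0.37850, 0.30827, 0.26567, 0.23983,
         0.22416, 0.21465, 0.20889, 0.20539, 0.20327, 0.20198],
        [1344, 1148, 1029, 957, 913, 887, 871, 861, 855, 851, 849, 848],
    ),
    "Group 2": (
        [0.16190, 0.17080, 0.18890, 0.21630, 0.25130, 0.29080,
         0.33070, 0.36740, 0.39800, 0.42150, 0.43780, 0.44820],
        [2428, 2438, 2456, 2484, 2520, 2561, 2602, 2639, 2671, 2695, 2712, 2722],
    ),
    "Group 3": (
        [0.16349, 0.19092, 0.22295, 0.26035, 0.30403, 0.35504,
         0.41461, 0.48417, 0.56541, 0.66027, 0.77104, 0.90040],
        [12410, 12438, 12471, 12509, 12554, 12607,
         12668, 12739, 12823, 12920, 13034, 13167],
    ),
}

spreads = {}
for group, (failures, costs) in TABLE_6.items():
    # Part of each monthly cost not explained by the per-incident charge
    residual = [cost - CF * f for f, cost in zip(failures, costs)]
    spreads[group] = max(residual) - min(residual)
    mean = sum(residual) / len(residual)
    print(f"{group}: ~USD {mean:,.1f}/month remainder, spread USD {spreads[group]:.2f}")
```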
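Table 7. Model input data for the first month of operation.
| Condition State | Asset Group | MTBF (Hours) | MTTR (Hours) | Failures (per Month) | UPS Availability (Percentage) |
|---|---|---|---|---|---|
| 10 | Group 1 | 933,708 | 2 | 0.68522 | 99.999% |
| 9 | Group 1 | 930,000 | 4 | 0.68522 | 99.999% |
| 8 | Group 2 | 850,000 | 8 | 0.16190 | 99.999% |
| 7 | Group 2 | 500,000 | 12 | 0.16190 | 99.998% |
| 6 | Group 2 | 300,000 | 14 | 0.16190 | 99.995% |
| 5 | Group 2 | 100,000 | 18 | 0.16190 | 99.982% |
| 4 | Group 2 | 50,000 | 25 | 0.16190 | 99.950% |
| 3 | Group 2 | 30,000 | 27 | 0.16190 | 99.910% |
| 2 | Group 3 | 20,000 | 29 | 0.16349 | 99.855% |
| 1 | Group 3 | 10,000 | 30 | 0.16349 | 99.701% |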
Table 8. Optimized combination of UPS components in the k-out-of-n system in one month.
| Component | Asset Condition State | Available Components for Maintenance | Selected Components for Maintenance | Monthly Maintenance Costs for Available (n) Components (USD) | Monthly Maintenance Costs for Selected (k) Components (USD) |
|---|---|---|---|---|---|
| UPS_DC_10 | 10 | 1 | 1 | 1344 | 1344 |
| UPS_DC_9 | 9 | 1 | 1 | 1344 | 1344 |
| UPS_DC_8 | 8 | 1 | 1 | 2428 | 2428 |
| UPS_DC_7 | 7 | 1 | 1 | 2428 | 2428 |
| UPS_DC_6 | 6 | 1 | 1 | 2428 | 2428 |
| UPS_DC_5 | 5 | 1 | 0 | 2428 | 0 |
| UPS_DC_4 | 4 | 1 | 0 | 2428 | 0 |
| UPS_DC_3 | 3 | 1 | 0 | 2428 | 0 |
| UPS_DC_2 | 2 | 1 | 0 | 12,410 | 0 |
| UPS_DC_1 | 1 | 1 | 0 | 12,410 | 0 |
| Total | | 10 | 5 | Total monthly maintenance costs: | 9973 |
Total System Availability = 99.991% ≥ 99.671% (Tier 1 DC) ✓
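The selection in Table 8 can be reproduced with a brute-force search over all k-unit subsets. The sketch below assumes that the availability of a selection is the product of the selected units' availabilities, i.e. every selected unit must be up. Each unit's availability is taken as MTBF/(MTBF + MTTR) from Table 7. This assumption reproduces the 99.991% reported above, but it is an illustration, not necessarily the paper's full k-out-of-n formulation. Ties in cost go to the higher-availability subset, and the helper name `best_selection` is hypothetical. Exhaustive search is cheap here because there are only C(10, 5) = 252 subsets. Larger fleets would need a smarter search.

```python
from itertools import combinations
from math import prod  # Python 3.8+

# (component, MTBF h, MTTR h, month-1 maintenance cost in USD), from Tables 7 and 8
UNITS = [
    ("UPS_DC_10", 933_708, 2, 1344), ("UPS_DC_9", 930_000, 4, 1344),
    ("UPS_DC_8", 850_000, 8, 2428), ("UPS_DC_7", 500_000, 12, 2428),
    ("UPS_DC_6", 300_000, 14, 2428), ("UPS_DC_5", 100_000, 18, 2428),
    ("UPS_DC_4", 50_000, 25, 2428), ("UPS_DC_3", 30_000, 27, 2428),
    ("UPS_DC_2", 20_000, 29, 12_410), ("UPS_DC_1", 10_000, 30, 12_410),
]


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability of one unit."""
    return mtbf / (mtbf + mttr)


def best_selection(units, k, target):
    """Cheapest k-unit subset meeting `target` (hypothetical helper).

    Assumption: selection availability = product of unit availabilities.
    Returns (names, total cost, availability), or None if no subset qualifies.
    """
    best_key, best_subset = None, None
    for subset in combinations(units, k):
        a_sys = prod(availability(mtbf, mttr) for _, mtbf, mttr, _ in subset)
        if a_sys < target:
            continue
        cost = sum(unit[3] for unit in subset)
        key = (cost, -a_sys)  # lower cost first, then higher availability
        if best_key is None or key < best_key:
            best_key, best_subset = key, subset
    if best_subset is None:
        return None
    return [unit[0] for unit in best_subset], best_key[0], -best_key[1]


names, cost, a_sys = best_selection(UNITS, k=5, target=0.99671)  # Tier 1 target
print(names, cost, f"{a_sys:.3%}")
```

With the rounded per-unit costs of Table 8, the search returns UPS_DC_10 through UPS_DC_6 at USD 9972 and 99.991%. The one-dollar gap to the reported USD 9973 is plausibly due to rounding in the underlying monthly costs.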
Table 9. Optimal number of UPS components and maintenance costs across DC tiers.
| DC Tier | Availability Requirement | Optimal Number of UPS Units | Minimum Maintenance Cost (USD) |
|---|---|---|---|
| Tier I (1) | 99.671% | 5 | 9973 |
| Tier II (2) | 99.741% | 6 | 11,417 |
| Tier III (3) | 99.982% | 8 | 14,856 |
| Tier IV (4) | 99.995% | 10 | 19,946 |
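In Table 9 the number of units required rises with the tier. When a configuration needs only k of its n units to be up, its availability follows the standard k-out-of-n expression. For n identical units of availability A, A_sys = Σ_{i=k}^{n} C(n, i) A^i (1 − A)^{n−i}. The sketch below generalizes this to units in different condition states with a short dynamic program over the number of working units. It assumes independent unit failures. It is a textbook calculation offered for illustration, not necessarily the model behind Table 9, and the function names are hypothetical.

```python
from math import comb


def k_out_of_n_availability(unit_availabilities, k):
    """P(at least k of the n independent units are up); units may differ."""
    n = len(unit_availabilities)
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # p[j] = probability that exactly j of the units processed so far are up
    p = [1.0] + [0.0] * n
    for a in unit_availabilities:
        # Go downward so p[j - 1] still holds its value from before this unit
        for j in range(n, 0, -1):
            p[j] = p[j] * (1.0 - a) + p[j - 1] * a
        p[0] *= 1.0 - a
    return sum(p[k:])


def k_out_of_n_identical(a, n, k):
    """Closed-form binomial sum for n identical units (cross-check)."""
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(k, n + 1))


# Unit availabilities MTBF / (MTBF + MTTR) for condition states 10 to 5 (Table 7)
states = [(933_708, 2), (930_000, 4), (850_000, 8), (500_000, 12), (300_000, 14), (100_000, 18)]
units = [mtbf / (mtbf + mttr) for mtbf, mttr in states]

no_spare = k_out_of_n_availability(units[:5], k=5)   # 5-out-of-5
one_spare = k_out_of_n_availability(units, k=5)      # 5-out-of-6
print(f"5-out-of-5: {no_spare:.6%}   5-out-of-6: {one_spare:.6%}")
```

With k = n the function reduces to the plain product of unit availabilities, and with k = 1 it reduces to 1 − Π(1 − A_i). Adding UPS_DC_5 as a spare (5-out-of-6) raises availability above the 5-out-of-5 value, because the system then survives any single unit outage.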
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Fadaeefath Abadi, M.; Bordbari, M.J.; Haghighat, F.; Nasiri, F. Dynamic Maintenance Cost Optimization in Data Centers: An Availability-Based Approach for K-out-of-N Systems. Buildings 2025, 15, 1057. https://doi.org/10.3390/buildings15071057


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
