Predictive Maintenance 4.0 for Chilled Water System at Commercial Buildings: A Systematic Literature Review

Abstract: Predictive maintenance plays an important role in managing commercial buildings. This article provides a systematic review of the literature on predictive maintenance applications of chilled water systems that are in line with Industry 4.0/Quality 4.0. The review is based on answering two research questions about understanding the mechanism of identifying the system's faults during its operation and exploring the methods that were used to predict these faults. The research gaps are explained in this article and relate to three areas: fault description and handling, data collection and frequency, and the coverage of the proposed maintenance programs. This article suggests performing a mixed-method study to fill in the aforementioned gaps.


Background
Most employees spend the greater part of their workday inside commercial buildings (CBs) or large facilities, so CBs make up a sizable portion of the built environment. Organizations and owners therefore have a clear incentive to take care of these buildings in order to avoid any negative impact on their surrounding or internal environment.
CBs differ from city to city and range from regular to massive buildings, such as universities, office buildings, shopping malls, hotels, factories, compounds, and hypermarkets, and they cover most of the land area in cities. The University of Michigan reported that CBs' floor space is expected to reach 124.7 billion square feet by 2050, which is a 34 percent increase from 2019 [1]. Moreover, CBs play a significant role in communities, as a well-designed CB can enhance people's social life and generate more jobs. However, they consume up to approximately 40 percent of the total global energy demand [2]. One of the main challenges that CBs face is climate change. Monge-Barrio and Gutierrez indicated that climate change has a significant impact on such buildings [3]. Furthermore, climate change is predicted to have strong effects on the energy requirements of CBs, as their heating and cooling needs are highly related to temperature conditions and weather variations [4]. In addition, activities in buildings contribute a major share of global environmental concerns [5]. These challenges motivate facility managers and engineers to take sound actions toward improving and maintaining building performance, as well as controlling the associated operation and maintenance (O&M) costs. This is all the more necessary because CBs are increasingly equipped with sophisticated engineering facilities, such as Heating, Ventilation, and Air Conditioning (HVAC) equipment [6]. By doing so, facility managers support the sustainability of their CBs [7].
Per ASHRAE [24], the operation of a chilled water system (CWS) starts with chillers producing the chilled water required to operate the air handling units (AHUs) and fan coil units (FCUs) and thereby to achieve the designed room conditions. Chillers and primary chilled water pumps are operated and sequenced to produce chilled water at a set temperature, whereas the water at the temperature required by the condenser of the chillers is produced by the cooling towers and circulated by the condenser water pumps. The chilled water is then pumped by the secondary water pumps to all the terminal units, such as AHUs and FCUs; in a variable flow system, the speed of these pumps is controlled to maintain a set differential pressure in the pipe network. Finally, the terminal units receive the chilled water and control their respective valve actuators to achieve the desired temperatures inside the rooms they are serving. Figure 1 shows a schematic drawing of a CWS.

Predictive Maintenance Paradigm
Predictive maintenance (PdM) was first devised in the late 1940s [25] and is used to determine the status of operating equipment in order to estimate when maintenance actions should be performed [26]. According to Selcuk, it can be defined as the practice of pre-empting failures on the basis of historical data in order to optimize maintenance efforts [27]. Moreover, it is considered a form of condition-based maintenance (CBM) that predicts the likely failure time of a particular piece of equipment and advises which maintenance task should be performed accordingly [28]. Figure 2 illustrates the position of PdM among other maintenance strategies. Since PdM falls under the preventive maintenance (PM) category, it allows for convenient scheduling of reactive maintenance (RM) and protects the equipment at a particular CB from unexpected failure; its principle is to evaluate the actual operating condition of a system and its components in order to optimize the O&M costs [29]. PdM can therefore be considered an enhancement of PM and RM; Figure 3 visualizes this argument. Furthermore, this research considers PdM significant for a CB's maintenance program, because it relies on the current operational condition of the equipment rather than on average or expected life statistics, which lets the concerned party identify an emerging issue promptly and predict when a maintenance activity will be needed. Verbert and others asserted that routine maintenance does not usually identify faults, but that such faults can be addressed by implementing a PdM program [30].

Given the significance of this paradigm, well-known and key industrial manufacturers have invested in PdM to maximize the uptime of machine parts and to make maintenance more cost-effective [31].
Wang and others argued that scheduled and unscheduled shutdowns, very high O&M costs, avoidable inventory, and undue maintenance activities performed on a particular piece of equipment, machine, or system can be reduced with PdM [32]. However, any technique has its pros and cons. The main advantages of PdM are that repairs are made on the basis of the equipment condition, which can sometimes lead to twenty-percent savings, and that the safety of the equipment and its surroundings is improved; meanwhile, its main disadvantage stems from organizational cultures that hesitate to assign a sufficient budget for it [33].
PdM uses data analytics to detect equipment faults and to rectify operational inefficiencies, with the goal of eliminating the root cause of potential system failures [34]. Amruthnath and Gupta mentioned that observing equipment performance and monitoring the critical parameters of a particular system is one of the main PdM techniques [35]. Moreover, Huang and Wang considered monitoring the components of a particular system to be one of the themes of PM, the category from which PdM is derived [36]. Nguyen and Medjaher, as well as Yu and others, indicated that fault detection and diagnosis (FDD) and condition monitoring are critical components of PdM [37]. To perform automatic fault detection, PdM requires big data collection, an analytics platform, and data sufficiency [38]. The analytics platform must incorporate domain expertise, so that the algorithms are applied as intended to the system under study [39]. According to Garg and Deshmukh, data sufficiency is the availability of data from enough sensors, actuators, meters, and control parameters for a meaningful analysis to be performed [40].
Per Ran and others, maintenance in industrial practice is mainly RM and PM, with the PdM strategy being applied only in critical situations [41]. They believe that these maintenance strategies do not take into account the vast amount of data that can be generated or the available approaches that align with Industry 4.0/Quality 4.0 principles, such as machine learning (ML), the internet of things (IoT), artificial intelligence (AI), big data, advanced data analytics, data-driven methods, cloud computing, and augmented reality.
Based on the thoughts of Chukwuekwe and others, PdM 4.0 is aligned with Industry 4.0, which is a paradigm shift in industrial processes driven by intelligent information-processing approaches [42]. This shift in the maintenance paradigm motivates this research to adopt the PdM 4.0 paradigm, which considers the operational status of the CWS, shows the concerned manager, maintenance engineer, or system user the health condition of the system, and supports taking appropriate measures when required. Figure 4 explains the idea behind PdM 4.0.

Systematic Literature Review
Typically, a systematic literature review (SLR) is a kind of review that compiles varied research studies and summarizes them in order to answer a research question by using stringent methods [43]. In this article, the protocol outlined by Kitchenham and others is followed [44]. The SLR went through four stages, as follows:

1. Determining the research questions.
2. Base of the research.
3. Applying the exclusion criteria.
4. Quality assessment.

Stage #1
This stage launches the SLR. It consisted of defining the research questions (RQs) of this study. RQs are the questions that a study or research project intends to answer [45]. Figure 5 shows the principles that should be used to formulate a strong RQ in any field, especially in technology, engineering, and management [46].
As the idea of this research is to examine the studies that proposed a PdM 4.0 for CWS from an engineering management point of view, the following two RQs arose:
• RQ1: How can the faults be identified in order to predict them?
• RQ2: What are the methods that can be used to predict the faults?

Stage #2
This stage covers the search string and source selection. For the search string, Boolean operators allow the researcher to combine specific keywords with operators such as "AND" and "OR" in order to narrow the results to the relevant research papers [47]. Based on the information in the previous sections, Boolean operators were applied in the search engines as follows:

("Industry 4.0" OR "Quality 4.0") AND ("Machine learning" OR "Deep Learning" OR "Data Driven" OR "Artificial Intelligence") AND ("Predictive Maintenance" OR "Faults Detection" OR "Faults Diagnosis" OR "Condition Based Maintenance" OR "Condition Monitor Maintenance") AND ("Architecture" OR "Framework" OR "Management" OR "Program") AND ("Ontology" OR "Reasoning") AND ("Chilled Water System" OR "HVAC" OR "AC" OR "Chiller" OR "Cooling Tower" OR "Primary Pump" OR "Secondary Pump" OR "Condenser Pump" OR "Terminal Unit" OR "Air Handling Units" OR "Fan Coil Unit") AND ("Commercial Buildings" OR "Large Facilities").
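As a rough illustration of how a search string of this shape can be assembled, the following Python sketch joins the synonyms inside each keyword group with OR and joins the groups with AND. The keyword groups are copied from the string above; the names KEYWORD_GROUPS and build_query are hypothetical helpers introduced here and are not part of the reviewed protocol.

```python
# Keyword groups copied from the search string in the text.
# KEYWORD_GROUPS and build_query are hypothetical names used for illustration.
KEYWORD_GROUPS = [
    ["Industry 4.0", "Quality 4.0"],
    ["Machine learning", "Deep Learning", "Data Driven", "Artificial Intelligence"],
    ["Predictive Maintenance", "Faults Detection", "Faults Diagnosis",
     "Condition Based Maintenance", "Condition Monitor Maintenance"],
    ["Architecture", "Framework", "Management", "Program"],
    ["Ontology", "Reasoning"],
    ["Chilled Water System", "HVAC", "AC", "Chiller", "Cooling Tower",
     "Primary Pump", "Secondary Pump", "Condenser Pump", "Terminal Unit",
     "Air Handling Units", "Fan Coil Unit"],
    ["Commercial Buildings", "Large Facilities"],
]


def build_query(groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = []
    for group in groups:
        quoted = ['"{}"'.format(term) for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(KEYWORD_GROUPS))
```

Keeping the groups as data makes it easier to adapt the string to databases that limit query length, for example by dropping one group at a time.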

The search engines and databases used in this article, in addition to MDPI, are Google Scholar, IEEE, Springer, ACM Digital Library, Scopus, ProQuest, Web of Science, and ScienceDirect, as they are authoritative and reliable [48,49].

Stage #3
Following the actions that were performed within the previous two stages, all studies that were not pertinent to the aim of this article were removed. To do so, the following exclusion criteria, which are shown in Table 1, were applied.

Table 1. Exclusion criteria.
Exclusion Criteria | Reference
Papers (journals or conferences) that are not directly related to PdM | [48][49][50]
Papers that are not directly related to Industry 4.0, Quality 4.0, data-driven analysis, or data mining | [48][49][50]
Grey literature | [51]
Non-English publications | [51]
Pre-1999 publications | [49]
Papers that are not peer-reviewed | [52]

Buildings 2022, 12, 1229 7 of 29
After that, a filtering process was implemented: duplicate papers were removed, then titles and abstracts were analyzed, and finally the entire text was analyzed [48].

Stage #4
Following the SLR procedure [44], the remaining articles were subjected to four questions, of which at least two had to be answered with "yes". The four questions are as follows:
• Is the purpose of the research clearly presented?
• Does the research show a framework/an architectural proposal or a research methodology?
• Does/do the author(s) present and discuss the results of the research?
• Does the paper use an ontology or reasoning?
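The acceptance rule of this stage (at least two "yes" answers out of four) can be written down in a few lines. The following Python sketch is only an illustration of that rule; the names QUALITY_QUESTIONS and passes_quality_assessment are hypothetical, and the question wording is paraphrased from the list above.

```python
# Hypothetical helper illustrating the "at least two of four" rule of Stage #4.
QUALITY_QUESTIONS = (
    "Is the purpose of the research clearly presented?",
    "Does the research show a framework/architectural proposal or a methodology?",
    "Do the authors present and discuss the results of the research?",
    "Does the paper use an ontology or reasoning?",
)


def passes_quality_assessment(answers, minimum_yes=2):
    """Return True when at least `minimum_yes` answers are 'yes' (True).

    `answers` holds one boolean per question, in the order of QUALITY_QUESTIONS.
    """
    if len(answers) != len(QUALITY_QUESTIONS):
        raise ValueError("expected one answer per quality question")
    return sum(1 for answer in answers if answer) >= minimum_yes


if __name__ == "__main__":
    # A paper with a clear purpose and discussed results, but no framework
    # and no ontology, still passes because two answers are "yes".
    print(passes_quality_assessment([True, False, True, False]))
```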

Search Results
After the second through fourth stages of the SLR, a total of 168 studies were retained for consideration in this article. Table 2 shows the papers' selection journey and how many papers were left after each stage.

Applications
This section covers the studies selected in the previous section. It has four subsections, one for each of the CWS components.

Chillers
PdM for chillers has been presented in many ways, either through a general maintenance framework or through an FDD protocol, in order to keep pace with rapid industrial development. Rueda and others reported the development of FDD for liquid chillers based on AI techniques at a laboratory test facility [53]. By using an artificial neural network (ANN), they predicted the temperature increment of the water-cooled condenser with almost ninety-nine percent prediction accuracy. A similar study was performed in the United Kingdom by Tassou and Grace to predict the refrigerant leak fault of a liquid chiller at a large CB [54]. This fault was also predicted by using the Kalman filter (KF) algorithm [55]. Han and others integrated k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF) into an ensemble diagnostic model to predict the said fault and achieved around ninety-nine percent accuracy [56]. Liu and others stated that leakage faults seriously affect the reliability of chillers and therefore proposed a timely and accurate method based on a multilayer feedforward neural network trained by error backpropagation with the adaptive moment estimation algorithm (Adam-BPNN) [57]. In Hong Kong and China, seven studies applied principal component analysis (PCA) to predict several faults of sensors that read operational parameters, such as chilled-water flow rate, condenser water flow rate, and evaporating pressure [58][59][60][61][62][63][64]. Furthermore, Hu and others applied self-adaptive PCA to enhance the efficiency of sensor FDD [65]. In contrast, Li and others reported that support vector data description (SVDD) performs better than PCA, because PCA is not very efficient at predicting complex sensor faults owing to the weakness of the Q-statistic on which it relies [66].
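For readers unfamiliar with the Q-statistic mentioned above, its usual textbook form is sketched below. This is the generic definition and is not taken from any of the reviewed studies; the symbols are assumptions introduced here: x is a scaled vector of m sensor readings, P is the m-by-k matrix of retained principal-component loadings, and Q_alpha is a control limit estimated from fault-free data.

```latex
\hat{x} = P P^{\top} x, \qquad
e = x - \hat{x} = \left(I - P P^{\top}\right) x, \qquad
Q = e^{\top} e = \left\lVert \left(I - P P^{\top}\right) x \right\rVert^{2},
\qquad \text{fault flagged if } Q > Q_{\alpha}.
```

Because Q only measures the size of the residual that falls outside the retained principal components, different faults can yield similar values of Q, which is one way to read the weakness noted by Li and others.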
Choi and others utilized data from one of the ASHRAE projects to predict multiple sensor faults in parameters such as the evaporator water entering temperature [67]. They applied three data-driven techniques: multiway dynamic PCA, multiway partial least squares (PLS), and deep-learning SVM. Their results showed that the first two techniques, which employed the generalized likelihood ratio test, were more accurate than the neural network one (SVM). This finding is consistent with another study, performed by Namburu and others, that predicted eight different chiller faults by using the same three techniques [68]. Using data from another ASHRAE project, Schein and Bushby applied a hierarchical rule-based FDD to predict the scheduling fault during three different weather seasons, but they did not discuss the data sample of their study [69].
Sensor faults were not usually considered in the previous studies. For example, at one CB in Hong Kong, performance indices (PIs) were proposed to predict evaporator fouling by using a regression model [70]. PIs were also proposed to predict seven other faults, such as condenser fouling, by using fuzzy modeling and an ANN technique [71]. Both studies concluded that PIs may not be effective in fault diagnosis. In this regard, it would be interesting to use the data of the ASHRAE project mentioned by Comstock and others to propose a new FDD, as the sensitivity of eight common faults has already been tested there [72]. Han and others applied FDD to multiple simultaneous faults of two chillers by using combined SVM and multi-label (MLB) techniques [73]. These combined techniques detected the chillers' performance with high accuracy, although the experimental data were limited. On a separate note, such techniques require sufficient training data for high-quality outputs [74,75]. Per Ma and Wang, chiller performance degradation can be detected effectively by using a hybrid quick search (HQS) method that characterizes the PIs of multiple operational parameters, such as the temperature of the condenser water supply [76].
A high chiller load affects performance and leads to faults such as condenser fouling. Yu and Chan discussed this in two studies: the first addressed how to improve chiller management by using a regression model, and the second proposed an assessment strategy for chiller performance by using clustering analysis [77,78]. Zhao and others indicated that early identification of condenser fouling is essential to maintaining chiller performance and developed a virtual sensor for that fault [79]. Moreover, Magoules and others proposed an FDD strategy that uses a recursive deterministic perceptron neural network (RDPNN) to predict faults related to the chiller's load [80]. Data from one ASHRAE project were utilized in twelve different studies to predict condenser fouling, along with other faults [81][82][83][84][85][86][87][88][89][90][91][92]. The first study applied exponentially weighted moving average (EWMA) control charts; the second applied a Bayesian belief network (BBN); the third applied SVDD; the fourth used SVM; the fifth applied conditional Wasserstein generative adversarial networks (CWGANs); the sixth combined an extended KF (EKF) and recursive one-class SVM (ROSVM); the seventh derived a tree-structured fault dependence kernel (TFDK); the eighth used PCA along with SVDD; and the ninth adopted linear discriminant analysis (LDA). The tenth applied a one-dimensional convolutional neural network (1D-CNN) and a gated recurrent unit (GRU), while the eleventh combined a distance rejection (DR) technique with a Bayesian network (BN) by transforming the chiller FDD problem into a single-class classification problem.
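As an illustration of the first technique in the list above, the following Python sketch implements a generic EWMA control chart. It is a textbook formulation under stated assumptions, not the model of the cited study: the function name ewma_chart, the smoothing factor lam, the limit width, and the idea of feeding it residuals of a fouling-sensitive feature are all assumptions made here, and mean and sigma are assumed to come from fault-free data.

```python
import math


def ewma_chart(samples, mean, sigma, lam=0.2, width=3.0):
    """Generic EWMA control chart (textbook form; parameters are assumptions).

    Returns one (z, lower, upper, alarm) tuple per sample, where z is the
    exponentially weighted moving average and alarm is True when z leaves
    the time-varying control limits.
    """
    z = mean  # the chart starts at the in-control mean
    results = []
    for t, x in enumerate(samples, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact variance factor of the EWMA statistic after t samples.
        factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        spread = width * sigma * math.sqrt(factor)
        lower, upper = mean - spread, mean + spread
        results.append((z, lower, upper, z < lower or z > upper))
    return results


if __name__ == "__main__":
    # Hypothetical residuals: five in-control points, then a sustained shift.
    residuals = [0.0] * 5 + [2.0] * 10
    for step, (z, lo, hi, alarm) in enumerate(ewma_chart(residuals, 0.0, 1.0), 1):
        print(step, round(z, 3), round(lo, 3), round(hi, 3), alarm)
```

A small lam makes the chart slower but more sensitive to small sustained shifts, which is the kind of gradual change that fouling tends to produce.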
The last one in that group predicted seven different faults by using the large margin information fusion (LMIF) method and found that this method is more accurate than others, such as multi-class SVM (MSVM), ANN, decision tree (DT), quadratic discriminant analysis (QDA), AdaBoost (AB), and logistic regression (LR). All of these studies reported high accuracies but did not include the fault-free situation in their training data. Moreover, three more studies used the same ASHRAE project to compare different models for the same purpose as the previous twelve studies [93][94][95]. The first study presented two models, one based on SVM and the other combining nonlinear least squares support vector regression (SVR) based on the differential evolution (DE) algorithm with EWMA control charts, and found that the second model predicts better. The second study applied multiple linear regression (MLReg), the kriging algorithm, and the radial basis function (RBF) and concluded that RBF is the best. The third study showed that ANN is more accurate than the KNN and bagged tree (BT) algorithms. The impact of condenser fouling was also discussed, and a decoupling-based FDD method was accordingly proposed to predict this fault [96]. This method was applied by observing the cooling capacity, and it was suggested that the condenser water tubes be cleaned before data collection. Later, this method was applied again alongside two other methods to compare their efficiency in detecting multiple simultaneous chiller faults [97]. The two other methods were MLReg and simple linear regression (SLReg), and both were found to be not very effective. Bonvini and others argued that observing the energy consumption of the chiller is important for predicting the faults that are related to high load [98].
They introduced an FDD approach based on the unscented KF (UKF), which is an advanced Bayesian nonlinear state estimation technique, to predict three of the aforementioned faults. KF can be considered a well-proven technique that does not require long, focused studies when applied to individual CWS devices in different CBs [99]. This claim came from a study that used KF to detect gradual chiller degradation based on a gray-box model at the Jinmao Tower in China. The said model is based on measuring and analyzing the variations of the chilled water flow rate and the supplied chilled water temperature through statistical process control (SPC). Moreover, Karami and Wang integrated the Gaussian mixture model regression (GMMR) technique with UKF to model a nonlinear system based on the measurement data of four operational parameters and found this to be efficient in detecting chiller degradation, as well as in reducing the number of sensors needed for detection [100].
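To make the role of the KF in tracking gradual degradation more concrete, the following Python sketch shows a generic scalar Kalman filter with a random-walk state model. It is a deliberately simplified stand-in and not the gray-box or UKF formulations of the cited studies; the function name kalman_track, the noise variances q and r, and the tracked quantity (for example, a slowly drifting performance indicator) are assumptions made for illustration.

```python
def kalman_track(measurements, q=1e-4, r=0.04, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model (illustrative only).

    q: assumed process-noise variance (how fast the true state may drift).
    r: assumed measurement-noise variance.
    x0, p0: initial state estimate and its variance.
    Returns the list of filtered state estimates, one per measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: state unchanged, uncertainty grows
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update the estimate with the innovation
        p = (1.0 - k) * p      # update the estimate variance
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Hypothetical indicator drifting upward by 0.01 per sample.
    drift = [0.01 * i for i in range(200)]
    filtered = kalman_track(drift)
    print(round(filtered[-1], 3))
```

Comparing the filtered estimate with a fault-free baseline, rather than comparing raw readings, is what makes slow degradation visible despite measurement noise.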
Chiller faults, whether related to high load or to other issues, can be linked to human interventions and can accordingly influence occupants' satisfaction. For this reason, maintenance characteristics such as the skills, the knowledge, and the number of maintenance laborers were addressed by Au-Yong and others at an office building by using mixed methods [101]. Following a survey that was shared with the occupants, as well as with key responsible staff, they predicted eight of those maintenance characteristics via a regression model and found empirical evidence that such communication with the concerned parties can improve maintenance management and lead to occupant satisfaction. With regard to high performance levels, some CBs use building management system (BMS) software in their maintenance activities. For example, Alonso and others suggested utilizing BMS, in addition to plant management software (PMS), to control the CWS, and they successfully applied this idea by observing the chillers' coefficient of performance at a large hospital [102]. Yan and others proposed a chiller FDD procedure to develop BMS via a hybrid model that integrated SVM with autoregressive exogenous variables (ARX) and obtained a high prediction accuracy and a minimal false-alarm rate [103]. Similar results were presented by McIntosh and Mitchell, who used statistical analysis to model the log-mean temperature difference and the condenser water temperature difference in order to predict six chiller faults [104]. In addition, two studies proposed a control strategy for chiller operation uncertainty by using Monte Carlo simulation (MCS) [105,106]. To curb the deterioration of chillers, Beghi and others proposed a semi-data-driven approach that uses PCA to differentiate anomalies from normal operation variability and a reconstruction-based contribution approach to isolate the variables related to faults [107].
To minimize fault prediction errors, Kocyigit addressed eight faults and reported using a fuzzy inference system (FIS) and a Levenberg-Marquardt-type ANN (LMANN) algorithm by evaluating several operational parameters, such as condenser pressure and evaporator pressure [108]. Additionally, Gao and others presented a novel FDD strategy that combines the maximal information coefficient (MIC) with a long short-term memory (LSTM) network by using a virtual sensor [109].
It has been noted from the market that multiple providers of maintenance solutions for smart CBs are proposing building information modeling (BIM) and the building automation system (BAS) in addition to BMS. Cheng and others utilized BIM with IoT sensors to predict chillers' faults through ANN and SVM [110]. However, their approach could not be applied to other CWS components due to the differences in operational parameters. From BMS data, Escobar and others used a fuzzy logic clustering (FLC) approach for smart buildings that was called the learning algorithm for multivariable data analysis (LAMDA) and succeeded in reaching a zero-error state for chillers' control [111]. Besides BMS, Srinivasan and others used explainable AI (XAI) for chiller FDD and showed how significant it is for acquiring the trust of maintenance officers [112]. Hu and others used BAS for collecting data of chiller operational parameters, such as the condenser water flow, and then used them to detect faults by using SVM [113]. Considering the fault-free situation in their model training, their approach detected only a single fault, which was compressor overcharging. The same fault was efficaciously detected by using a PCA-based EWMA and virtual refrigerant charge (VRC) algorithm [114]. From BMS, Luo and others collected the data of chilled water supply and return temperatures at a one-minute frequency for six days from four different weather seasons to predict six different faults, in addition to the fault-free condition, using k-means clustering [115]. The said faults were not fully described, and as with other studies, the frequencies of their data sampling were not justified even after the development of this approach [116]. Thieblemont and others explored state-of-the-art control strategies [117]. The first strategy, called "Model-Free Control Strategy", does not require building a model or the use of historical data. 
It can be performed by programming the ambient temperature based on the weather forecast of the next day. The second strategy is an intelligent one which uses AI, along with a cold thermal energy storage (CTES) system. This strategy suggests combining a fuzzy logic controller and a feed-forward controller with weather predictions. To do so, the authors listed 27 rules [117]. Advanced control is the third strategy, which includes two techniques: Non-Optimal Advance Predictive Control and Model Predictive Control (MPC). The Unknown-but-Bounded method is an example of the first technique, and its implementation is costly. The concept of MPC is to optimize the variables of CWS as a function of a future horizon to satisfy the relevant constraints. Arteconi and others suggested applying CTES for a Demand-Side Management (DSM) strategy, which can change the chiller load profile to optimize the power system from generation to delivery [118].
Some studies used the ratio between the cooling load and the energy consumed, which is called the coefficient of performance (COP), as a data sample for scheduling PdM activities. In this regard, Wu and others proposed a method to optimize the PdM scheduling for HVAC systems by mixed-integer programming (MIP) [119]. The said method has two stages: the first one is the parameter generation through historical data, and the second one is the optimization by linear programming. They conducted a case study on chillers and addressed COP. The idea of the first stage is to study the operational status and then list the related constraints, while the optimizing model (second stage) has to be solved to present a high-quality PdM schedule in order to detect the chillers' degradation. The model is somewhat general, and it did not consider or discuss any precise faults or issues that lead to chiller degradation. Li and others proposed a novel FDD method using a deep belief network (DBN) [120]. Their data were collected through an IoT agent and processed through four different stages, including optimizing them by the particle swarm optimization (PSO) algorithm. Moreover, they compared DBN with deep neural network (DNN), KNN, and SVM and obtained almost the same prediction accuracy. From COP data, Motomura and others developed two outstanding simulation models to evaluate multiple chiller faults [121,122]. The first model calculated the amount of increase in daily peak power, while the second one tracked the decrease rate of COP. Sulaiman and others observed chillers' COP and developed an FDD approach by using deep learning (DL), multi-layer perceptron (MLP), and SVM, and they mentioned that MLP is more accurate than the others [123]. From a chiller's COP data sample, Ng and others used BN to predict sensor bias of water flow temperature, but the results were not very encouraging [124]. 
To obtain consistently promising results, Harasty and others argued that an ANN should be used in PdM management [125].
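As a worked illustration of the COP definition used in this subsection (cooling load divided by the energy consumed), the sketch below computes a COP value and its fractional decrease against a baseline. The 10% inspection threshold, the function names, and the numbers are hypothetical and do not reproduce any of the reviewed models.

```python
def cop(cooling_load_kw, electric_power_kw):
    """Coefficient of performance: cooling load divided by power consumed."""
    if electric_power_kw <= 0:
        raise ValueError("electric power must be positive")
    return cooling_load_kw / electric_power_kw


def cop_decrease_rate(baseline_cop, current_cop):
    """Fractional drop of the current COP relative to a baseline COP."""
    if baseline_cop <= 0:
        raise ValueError("baseline COP must be positive")
    return (baseline_cop - current_cop) / baseline_cop


# Hypothetical readings: same cooling load, higher power draw after degradation.
baseline_cop = cop(1000.0, 200.0)
current_cop = cop(1000.0, 250.0)
drop = cop_decrease_rate(baseline_cop, current_cop)

# The 10% threshold is an assumption chosen only for illustration.
needs_inspection = drop > 0.10
print(f"baseline COP={baseline_cop:.2f}, current COP={current_cop:.2f}, drop={drop:.0%}")
```

A PdM schedule could use such a decrease rate as one input, although the reviewed studies combine it with further parameters and constraints.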

Cooling Towers
Compared to the studies on chillers, the studies on cooling towers were limited and were either part of the chiller studies or were discussed separately. Ahn and others developed a simulation model to detect three faults of cooling towers [126]. Their model was built based on the deviation of different operational parameters, such as the difference between the temperature of the water leaving the tower and that of the water entering it. The only criticism of this study concerns the data collection, as the authors did not clarify the source of the samples that were used in the associated experiment. Zhou and others used a regression model to detect the air fan degradation fault by formulating the PI of the air-flow-rate reduction [70]. The sample size of their data was small, as it was generated from only five days in the summer season, including the fault-free condition. Hu and others collected data on fan power to detect the same fault by using SVM, and their sample size was also small [113]. From a qualitative method study, Chew and Yan suggested cleaning cooling towers' fans before applying any FDD approach [127]. Khan and Zubair discussed another fault, which is fouling of fills, and predicted it very well by using a regression model [128]. Through this model, the correlation was analyzed between the PIs of different operational parameters. Per Ma and Wang, the said two faults (fouling of fills and air fan degradation) can be detected significantly by using the HQS method through characterizing the PIs of multiple operational parameters, such as the inlet water temperature [76]. The air fan fault was again predicted by Sulaiman and others when they compared MLP, SVM, and DL methods, and they found that MLP is more accurate than the others [123].
Human and organizational factors affect PdM costs and scheduling. In this regard, Jain and others studied the failure conditions of a particular cooling tower by introducing a process resilience analysis framework [129]. This framework utilized a BN model to integrate two factors, which are process parameter variations as a technical factor, and human and organizational factors as a social one. It illustrated the impact of the said model on PdM management from cost and safety points of view. Melani and others insisted that making a significant investment in PdM is essential to maintaining the availability of systems that are operating CBs [130]. Having said that, they developed a generalized stochastic Petri net (GSPN) model to predict multiple faults, such as those related to fans, including the operational errors caused by humans. Furthermore, Aguilar and others proposed an autonomic cycle of data analysis tasks (ACODAT) involving BMS to manage the failures of two cooling towers of an opera palace in Spain [131]. They utilized three techniques, namely MLP, KNN, and gradient boosting (GB), and reached similar prediction accuracies. To diagnose such failures, Poit and Lancon suggested that CBs use the SCANSITES 3D system; they surveyed several cooling towers in France and found the said system to be very useful [132].
As with chillers, the FDD of sensor faults was also studied in regard to cooling towers. At the Oak Ridge National Laboratory (ORNL) in the United States of America, the air fan degradation faults of the high flux isotope reactor (HFIR) were predicted by using wireless sensors [133]. Wang and others predicted the motor degradation by using PCA [63]. Their data samples were collected through a sensor that read one of the operational parameters, which was the inlet water temperature, and per their results, the PCA did not always record the occurrence timings of that fault, and, accordingly, they could not evaluate the PI of the aforementioned parameter. An excellent study collected data from the same parameter to predict the fan degradation fault by using the KF method [99]. Another study used the KF method to observe the cooling towers' performance at one of China's CBs [134]. To reduce the false-alarm rate, the said study analyzed and measured some chosen parameters via SPC. Motomura and others developed two superb simulation models to assess multiple cooling-tower faults [121,122]. The first model checked the water flow and the outside air wet-bulb temperature, whilst the second model focused on the inlet and the outlet condenser water temperatures. Data on air wet-bulb temperature, as well as other parameters, were collected to predict a particular cooling tower's performance and to reduce the severity of the related faults by using the BPNN method [135]. The said method resulted in a very good correlation coefficient between the predicted and the experimental values.
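The correlation coefficient between predicted and experimental values, used above as a measure of model quality, can be sketched as follows. This assumes the Pearson coefficient is meant, which the reviewed study may or may not have used; the function name and the sample values are illustrative only.

```python
from math import sqrt


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_p * var_m)


# Hypothetical outlet water temperatures in deg C (illustrative only).
predicted = [29.8, 30.4, 31.1, 31.9, 32.6]
measured = [30.0, 30.5, 31.0, 32.0, 32.5]
print(f"r = {pearson_r(predicted, measured):.3f}")
```

A value close to 1 indicates that predictions track the measurements linearly; it does not by itself show that the prediction error is small, so an error metric is usually reported alongside it.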

Pumps
The number of studies on pumps is almost the same as the number on cooling towers. Karim and others predicted five faults of pumps (two of which were related to the cooling system) using the ANN method, and their hypothetical data showed that such a method is capable of predicting the aforementioned faults [136]. Using a clustering method, Luo and others studied the sensor bias of primary and secondary pumps, but with no full description of the faults [115]. Through the HFIR project at ORNL, Hashemian predicted three different faults by using wireless sensors [133]. These faults are excessive noise, control switch failure, and faulty starter, and all of them are related to the secondary pump. From BAS, Hu and others collected a good data sample of differential pressure to predict the degradation of the secondary pump by using SVM [113]. In order to keep control of the differential pressure of primary and secondary pumps, Ma and Wang developed a simulation model that takes the water flow rates into consideration [137]. Miyata and others used MCS to detect the operational uncertainty caused by the imponderable pressure [105]. Zhou and others used a regression model to detect the partial clog fault in the secondary pump by formulating the PI of the increase in the pipeline resistance [70]. On the other hand, Wang and others predicted the same fault (partial clog) by using PCA [63]. Furthermore, Liu and others studied the pipeline resistance and then predicted the primary pump's leakage fault by using Adam-BPNN [57]. Motomura and others developed two valuable simulation models to predict the faults of primary, secondary, and condenser pumps [121,122]. From the BMS data, their first model observed the water flow in liters per minute, while the second one focused on the sensor errors and also studied the impact of pump specifications, such as the caliber.
The appearance of faults affects the CWS performance, whether they are caused by human interventions, by an operational issue, or by an unreliable sensor. Au-Yong and others focused on the pumps within their mixed-method study, which was explained in the section on chillers [101]. Per the qualitative method research of Chew and Yan, maintenance officers and researchers are advised to check the condenser pumps for corrosion before applying any FDD approach [127]. Moreover, Yang and others proposed the use of the FDD strategy with the ML method and collected pump-related data samples via BMS, but they specified neither the associated operational parameters nor the ML method [138]. Yuan and Liu used a semi-supervised learning (SSL) technique to predict severe gear damage of a particular pump and took into consideration the fault-free condition while training the model [139]. Bouabdallaoui and others introduced a PdM framework by using LSTM [140]. As part of this framework, they collected data for three pumps via BAS and IoT devices, but they specified neither the associated operational parameters nor the detected faults. With regard to the state-of-the-art control strategies, Thieblemont and others suggested applying adaptive MPC to decrease the pumps' running time [117].

Terminal Units
The subject component has the largest number of studies compared with the other CWS components. Liang and Du proposed an FDD model of the HVAC system that uses mixed methods. The component under study was an AHU of a particular CB in Hong Kong [141]. They combined the simulation-based-model method with the SVM method. Three types of faults were addressed, namely return damper jam, cooling coil blockage, and speed reduction of the supply fan, noting that the false signal fault was not considered in their study. Their method was built by collecting data of multiple parameters, such as the set temperature and the indoor cooling load. The original sample size was small because it was generated from ten operational hours; however, based on the qualitative output of related research that was reviewed by Ding, they assumed that the fault would arrive within one hour [142]. They relied on this assumption when finalizing their required data and obtained a bigger sample size, which was used to build the said model. Through BIM and Modelica software, Andriamamonjy and others presented a simulation model to detect damper faults of a particular AHU [143]. Their model showed the potential of BIM for a significant reduction of the manual configuration needed to disseminate such a model, which was based on calculating the normalized root mean square error (RMSE) of multiple operational parameters, such as supply air temperature, under three conditions: faulty, uncertain, and fault free. In contrast to a case study performed at one of the universities, Alavi and Forcada argued that BIM cannot constitute complete information on maintenance activities when implementing decision-making frameworks [144]. The study, which discussed the impact of human interventions in the occurrence of faults and was explained in the chillers and pumps sections, also included AHUs [101]. 
The PdM framework of Bouabdallaoui and others, which was discussed in the pumps section, also covered two AHUs, but the predicted faults were not defined in their case study, which was performed at one of the sport facilities in France [140].
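The normalized RMSE mentioned above, evaluated under faulty, uncertain, and fault-free conditions, can be sketched as follows. Normalizing by the range of the measured values and the two thresholds (0.10 and 0.30) are assumptions of this sketch; the reviewed study may have used a different normalization and different limits.

```python
from math import sqrt


def nrmse(measured, simulated):
    """RMSE between measured and simulated values, normalized by the
    range of the measured values (an assumption of this sketch)."""
    if len(measured) != len(simulated) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    rmse = sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n)
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured values have zero range")
    return rmse / span


def classify(value, fault_free_max=0.10, faulty_min=0.30):
    """Map a normalized RMSE to one of three conditions (hypothetical limits)."""
    if value <= fault_free_max:
        return "fault free"
    if value >= faulty_min:
        return "faulty"
    return "uncertain"


# Hypothetical supply air temperatures in deg C (illustrative only).
measured = [18.0, 20.0, 22.0]
for simulated in ([18.0, 20.0, 22.0], [19.0, 21.0, 23.0], [20.0, 22.0, 24.0]):
    score = nrmse(measured, simulated)
    print(f"NRMSE={score:.2f} -> {classify(score)}")
```

The "uncertain" band reflects that a moderate mismatch between model and measurement can stem from model error as well as from an actual damper fault.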
Bruton and others discussed previous procedures and proposed a good one on how to choose the appropriate ML technique based on AHU conditions [145]. Thereafter, they developed an automated FDD for AHUs, which consists of a data access layer to be flexible with BMS, a business layer to be flexible with any combination of sensors with operational parameters, and a graphical user interface to evaluate the performance of AHUs [146]. Candanedo and others used the DT technique for evaluating an early-stage PdM model of terminal units [147]. In a set of buildings that are between zero and thirty years old, they obtained historical data of the indoor temperatures in order to compare them with the designed ones and then to identify any abnormal behavior. They indicated that DT showed its accuracy in covering the fault possibilities. To achieve thermal comfort inside CBs, an experiment was performed by collecting occupant skin temperatures to predict and evaluate multiple issues, such as the air velocity of AHUs, using SVM and extreme learning machine (ELM) techniques, and satisfactory results were obtained from both [148]. To obtain a highly accurate FDD model, it is advised to clean the impeller, the fan scroll, and the blower blade of AHUs before applying that model [127]. Arteconi and others suggested a state-of-the-art control strategy using DSM to reduce the required AHU size by up to 40%, which leads to energy saving [118].
The variable air volume (VAV) of AHUs was discussed in many studies. For instance, in a multi-purpose research and test facility called an environmental chamber, Cho and others conducted two studies on a number of rooms that represent CBs' standards [149,150]. In addition to the fault-free condition, the first study used ANN to predict eight faults linked to AHU parts, including VAV, while the second study applied transient pattern analysis (TPA) to isolate the said faults to reach the steady-state condition. The study of Schein and Bushby, which was mentioned above in the chillers section, predicted a VAV sensor fault when reading the discharge air temperature [69]. At a large academic office building in Canada, Gunay and others developed an excellent simulation model to detect five VAV sequencing logic faults in two AHUs [151]. By using an ASHRAE project's data, another excellent simulation model was developed by Norford and others to detect multiple AHU faults that are related to the VAV's damper, fan, and filter coil system [152]. Moreover, Li and others proposed a simulation model to predict eleven VAV faults at a particular CB in China, and they succeeded in detecting nine of the faults, including outdoor air damper stuck and multiple sensor faults [153]. Two more valuable studies predicted the damper stuck fault at two different CBs: the first one applied RF, while the second developed a simulation model [154,155]. At other CBs, thirteen interesting studies used data from one ASHRAE project to predict some faults of AHUs and FCUs, including the ones related to VAV, and obtained an acceptable prediction accuracy for each [156–168]. The first study applied the temporal association rules mining (TARM) algorithm, while both the second and third ones applied BN. The fourth study applied ensemble rapid centroid estimation (ERCE), the fifth one applied regression tree (RT), and the sixth one applied SVM. 
With regard to the seventh one, the generative adversarial network (GAN) was applied, and the eighth one combined RF with SVM. The ninth one utilized simulation software called HVACSIM+, the tenth one derived LMIF, the eleventh one applied PCA, the twelfth one applied SSL, and the last one applied SVM with ARX. In contrast, Zhao and others criticized the same ASHRAE project because its data did not cover a vast range of operating conditions [169].
Combining the FDD approach with a fault-isolation approach is one of the PdM ideas. A study in Canada presented this idea by applying PCA to detect two selected AHU faults and active functional testing (AFT) to isolate the same faults [170]. Two more studies applied PCA, but for both detecting and isolating a number of AHU faults [171,172]. Ranade and others developed a simulation model to predict five selected faults of FCU and VAV, including the fault-free condition [173]. They argued that these faults can be isolated easily by applying DT. Using data of an AHU's outlet water and supply air temperatures, Shahnazari and others applied a recurrent neural network (RNN) to detect and isolate the faults of the associated sensors [174]. Moreover, Wang and Chen conducted a case study at a particular CB, which has thirty-six floors, by applying EWMA for the same purpose [175]. Wang and others applied a genetic algorithm (GA) to predict and isolate the faults of an AHU's supply fan and VAV [176]. Data from BMS were utilized to predict and isolate ten selected faults of AHUs by using BN [177]. At a green CB, an excellent experiment resulted in developing four simulation models to detect and isolate four faults of AHUs (one model for each fault) [178]. Yang and others presented a pragmatic simulation model to detect only four selected faults at forty-four buildings in Canada. Their solution relied on clustering work-order datasets, which were collected from occupants' complaints, and then computing the mean time between failures (MTBF) [179]. MTBF was also computed by Sanchez-Barroso and Sanz-Calcedo [180]. To detect and isolate small sensor bias faults, a novel study advised using a hybrid-model-based FDD that combines the fractal correlation dimension (FCD) algorithm with SVR [181]. Zhang and Hong explained the background of multiple AHU faults, which will help researchers or CBs' officers to take that into consideration while making PdM programs [182].
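To make the MTBF computation from work orders concrete, the following minimal sketch averages the time gaps between consecutive failure reports for one unit. Treating each work-order timestamp as a failure event and ignoring repair time are simplifying assumptions; the timestamps and the function name are hypothetical and do not come from the reviewed studies.

```python
from datetime import datetime


def mtbf_hours(failure_times):
    """Mean time between consecutive failure events, in hours.

    Repair durations are ignored, so this is only an approximation of MTBF.
    """
    if len(failure_times) < 2:
        raise ValueError("need at least two failure events")
    ordered = sorted(failure_times)
    gaps = [
        (later - earlier).total_seconds() / 3600.0
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return sum(gaps) / len(gaps)


# Hypothetical work-order timestamps for one AHU (illustrative only).
work_orders = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 11, 8, 0),
    datetime(2024, 1, 31, 8, 0),
]
print(f"MTBF = {mtbf_hours(work_orders):.0f} h")
```

When work orders are first clustered by fault type, as in the study described above, the same function could be applied per cluster to compare how often each fault recurs.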
Recalling what is written in the chillers section about how CBs are using BMS to control CWS performance, we note that Hosamo and others stated that BMS cannot detect many faults, including those that are related to AHUs [183]. Having said that, they conducted a case study on four AHUs at a particular university, in which they proposed a digital twin technology that utilizes BIM and IoT sensors, noting that this technology is an ANN-based technique. On a related note, Lee and others predicted seven sensor faults by using a general regression neural network (GRNN) model [184]. Gao and others studied the impact of the system's water temperature difference (delta-T) on AHUs' performance [185]. They developed a worthy simulation model that generates the PI of each of multiple operational parameters. Choi and Yeom introduced a thermal satisfaction prediction model that combined human factors and physiological signals [186]. Their data were collected from volunteer students through a LabVIEW-based data acquisition (DAQ) system and were analyzed by multiple statistical analysis and data-mining software called WEKA. Their study showed a significant correlation between the said factors and signals. A similar study discussed IAQ and used hypothesis tests to diagnose AHU sensor faults [187]. Shaw and others studied the correlation between multiple operational parameters to obtain reliable FDD results [188]. In Australia, an auto FDD model, which was developed by Guo and others in one of the large CBs, merged the hidden Markov model (HMM) and SVM [189]. Their data were collected through BMS from fifteen AHU sensors, and their model was trained based on selected faults over two business months. Unfortunately, they did not specify which parameters of the said AHU were studied, nor did they consider the sensors' false signal in their model. Holub and Macek presented a simulation model within a stochastic system by addressing the set temperature of a rooftop AHU [190]. 
The target of their application was to detect a diagnostic fault that is linked to the fan. However, the data used to simulate the aforementioned model, in which they applied a hybrid system, were limited. To obtain an active simulation model, Deshmukh and others suggested (while collecting an AHU's data in the fault-free mode) holding three operational conditions, including closing the cooling valve [191]. Ma and others introduced a PdM framework which integrated BIM, geographic information system, and reliability-centered maintenance technologies by implementing a quantitative decision-making model, along with an MCS model [192]. Their case study was performed on a virtual university campus that includes AHUs, and they found it difficult to acquire a large data sample size. Gourabpasi and Nik-Bakht indicated that the lack of knowledge in locating the sensors is causing difficulty for both data collection and sensor FDD [193].
Terminal units' FDD can be considered as a probabilistic approach. In the USA, Dey and Dong applied BBN in a probabilistic way to predict some AHU faults at one of the universities [194]. Du and others applied the wavelet neural network (WNN) to fix the AHU's sensor bias [195]. A subtractive clustering technique and BPNN were combined to catch the missing alarm when an AHU's fault occurred [196]. For missing alarm issues, a study suggested applying LMANN to eliminate them [197]. To enhance thermal comfort, Dudzik and others used BAS and applied ANN to develop and examine the environmental quality management system [198]. At one of Qatar's sport facilities, Elnour and others applied a neural network that clustered the RMSE of some operational parameters and then compared that with SVR, KNN, and DT techniques [199]. They found their approach to be more efficient than the aforementioned three techniques in controlling AHU operation. Through virtual refrigerant mass flow sensors, Kim and Braun presented an FDD approach to predict five selected faults, such as condenser fouling [200]. Lauro and others used FLC to predict the abnormal behavior of a particular building's FCU [201]. Li and Wen used wavelet transform with the PCA technique (WPCA) to predict some AHU faults [202]. Liu and others applied the Markov chain Monte Carlo (MCMC) algorithm to derive the statistical characteristics of an AHU's fault levels [203]. On a single terminal unit, Lo and others applied fuzzy GA to eliminate sensors' false signals [204]. The study of Luo and others, as was explained in the chillers and pumps sections, also included terminal units [115]. By utilizing ASHRAE's thermal comfort database, a novel study compared the thermal sensation vote (TSV) and the predicted mean vote (PMV) by using an RF model and achieved around sixty-five percent accuracy in TSV prediction [19]. 
The study of Miyata and others, as mentioned in the chillers and pumps sections, did include AHUs, but they defined the related faults [105]. From an ASHRAE project dataset, Montazeri and Kargar applied six algorithms, namely SVM, RBF, kernel PCA (KPCA), DT, DBN, and shallow neural network (SNN), to detect the sensor and actuator faults, and they stated that DT had the highest prediction accuracy [205].
ASHRAE datasets were not the only source for developing FDD within the literature. Novikova and others utilized the VAST Challenge 2016 dataset to develop a simulation model that monitored and assessed terminal units' performance at a three-floor CB [206]. The data of a residential complex building were utilized by Parzinger and others for AHU FDD, using ARX and RF techniques [207]. Both techniques showed similar and acceptable prediction accuracy. Rafati and others performed a good review of the utilization of nonintrusive load monitoring software in terminal units' FDD [208]. In cooperation with a leading building management company, Satta and others proposed a PdM approach for a cohort of seventeen appliances that are similar to terminal units and examined it at one of the Italian hospitals [209]. Using the historical data of different variables, such as the indoor temperature, they used DT to detect the abnormal behavior of these appliances. They argued that the reciprocal dissimilarities between appliances' behavior can expose an upcoming fault with enough anticipation to allow for a proactive intervention and avert a breakage in operation. Tehrani and others addressed one fault related to a particular terminal unit at one of the Canadian universities [210]. The said fault was filter blockage, and they used ANN to predict the behavior of the said unit. Moreover, they determined that the performance of the unit under discussion improved by using DT instead of ANN. Furthermore, Shakerian and others recommended applying the synthetic minority oversampling (SMO) technique to improve the prediction accuracy [211]. Sulaiman and others developed a fuzzy fault detection model for centralized CWS, using simulation [212]. They implemented the said model in the air-supply damper of an AHU, which is linked to two specific rooms. Three cases were studied in their research to simulate the said model. 
Two of them were related to the damper's faults, and the third one was at normal operation, without any faults. They identified these faults by checking the room-temperature variation. They mentioned that the developed model had resulted in detecting the damper faults, but with no technical details. Another FDD approach, which was presented by them and explained in the chillers and cooling towers sections, also covered AHUs [123]. Thumati and others developed a generic simulation model to detect terminal unit faults and to isolate the associated residual errors [213]. Their idea could be presented perfectly by using virtual-sensor approaches such as that of Verbert et al. [30]. At a residential facility, Turner and others developed a simulation model for AHUs' FDD [214]. During a seven-day study period, they focused on the outdoor temperature and the set indoor temperature parameters to detect selected faults, such as compressor failure. They believe that using such data-driven approaches for tracking the said parameters can help to easily detect the associated faults. Van Every and others applied Gaussian regression (GR) and SVM to estimate an AHU's sensor values and to detect the associated faults, respectively [215]. Velibeyoglu and others applied a directed acyclic graph (DAG) to assess the detectability of AHUs' simultaneous faults and obtained promising results [216].
The usage of one or more software or systems such as DAQ, BIM, IoT sensors, BAS, and BMS is important in controlling CWS performance. Villa and others extolled the usage of such software in AHU's FDD purposes, and, accordingly, they introduced an outstanding PdM framework, using an automatic ML platform called H2O [217]. Using FLC, Wijayasekara and others assessed BMS's performance in controlling the thermal comfort inside selected rooms [218]. Alongside BMS, DT was used by Yan and others to develop a diagnostic strategy for AHUs [219]. Nine cases were addressed for their related experiment, which are eight faults, such as duct leakage, and one case for normal operation (fault free). For this experiment, they used data that were recorded from one of the ASHRAE projects. They emphasized that data-driven methods are unique to glean the useful information from large datasets and for modeling the behavior of HVAC systems. Yu and others proposed association rule mining (ARM), which is a data-mining technique, to test the correlation between all AHUs' operational parameters at one of the complex buildings that contains offices and chemical labs [220]. It seems that they faced some difficulties in regard to the data-collection part. The absence of data sources makes any ML model weak in detecting and diagnosing the faults [108].

Discussion
From the previous section, it is observed that chillers and terminal units were mostly researched, while there was not much research focused on cooling towers and pumps. Following SLR, the maximum number of research studies on chillers was carried out in the years 2016 and 2019, whereas on terminal units, it was performed in the year 2020. Regarding cooling towers and pumps, the year 2019 recorded the maximum number of research studies. Figure 6 highlights the research trends from the year 1999 onward.

The considered studies sometimes addressed CWS components independently and other times in combinations. For combinations, some of them addressed either two components or three components in total; no research study addressed the whole system (i.e., four components) at once. Table 3 shows the number of considered studies that addressed either a single component or addressed more than one within the same research. This research is intended to answer the aforementioned two RQs where the idea behind both of them is to explore the ways of implementing a PdM program on a CWS. The aim of RQ1 is to understand how the researchers are preparing or arranging for the PdM program. This includes checking the way of identifying and studying the system faults, which are considered as a base of implementing the PdM program. Moreover, it includes identifying the operational parameters, which allow us to observe the system and lead to the faults, as well as knowing their data sample size and their source. 
RQ2 follows on from RQ1 and aims to explore the tools, methods, frameworks, or control strategies that were applied to build the PdM program. The third section of this article (Application) is therefore presented according to the aims of these RQs. Each considered study was reviewed against these activities unless the relevant information was missing.
The following four subsections highlight the findings of this research. The first one addresses the faults-related information, and the second subsection focuses on the operational parameters and their data samples. Both of these subsections are related to RQ1. The third subsection is related to RQ2, while the last subsection lists and explains the research gaps.

System Faults
The following points are the major findings with regard to system faults:

•
The fault is defined as any failure that may lead to a CWS breakdown over time.

•
Some of the considered studies were focused on only one fault, such as condenser fouling of chillers.

•
Many of the considered studies addressed refrigerant leaks, which can be considered the most commonly addressed fault in chillers.

•
Fouling of fills and air-fan degradation are the most addressed faults of cooling towers.

•
The most addressed pump fault is partial clogging, whereas full clogging, which is more critical, was not addressed at all.

•
For terminal units, the most addressed faults were return damper jam, cooling coil blockage, and supply-fan speed reduction, including the VAV-related faults.

•
There are obvious differences between the studies in the number of faults addressed across all CWS components.

•
Some of the studies addressed the same faults, while others addressed different faults.

•
Some studies did not state the addressed faults, and others that did state them did not describe them fully, for example by giving "abnormal behavior" as a fault.

•
Some studies stated multiple faults but did not address or predict all of them in their case studies.

•
Faults such as sensor bias and controller false alarms were not usually considered in the aforementioned studies.

•
High chiller load affects performance and leads to faults such as condenser fouling and compressor overcharging.

•
Condenser fouling was found to be the fault with the most negative impact on CWS reliability, as well as on occupants' satisfaction.

•
Human factors, such as the skills of the maintenance officer who manages the system, have a significant impact on the appearance of faults.

•
Fault-free mode has to be considered in any research in order to increase the prediction reliability.
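To illustrate the last finding, the sketch below shows one simple way in which readings taken in fault-free mode could serve as a reference for later readings. It is a minimal illustration and not a method taken from the considered studies: the z-score rule, the threshold of three standard deviations, the function names, and the temperature values are all assumptions made for the example.

```python
from statistics import mean, stdev


def fit_fault_free_baseline(readings):
    """Summarise fault-free readings as (mean, sample standard deviation)."""
    if len(readings) < 2:
        raise ValueError("need at least two fault-free readings")
    return mean(readings), stdev(readings)


def flag_deviations(readings, baseline, threshold=3.0):
    """Mark each reading that lies more than `threshold` standard
    deviations away from the fault-free mean (assumed rule)."""
    mu, sigma = baseline
    if sigma == 0:
        # Degenerate baseline: any departure from the constant value is flagged.
        return [r != mu for r in readings]
    return [abs(r - mu) / sigma > threshold for r in readings]


# Hypothetical chiller leaving water temperatures (deg C) in fault-free mode.
fault_free = [6.9, 7.0, 7.1, 7.0, 6.8, 7.2, 7.0, 6.9]
baseline = fit_fault_free_baseline(fault_free)

# Hypothetical later readings; the last two drift upward.
new_readings = [7.0, 7.1, 8.4, 9.0]
flags = flag_deviations(new_readings, baseline)
print(flags)
```

With these illustrative values, only the last two readings are flagged. A real study would have to justify both the baseline period and the threshold.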

Operational Parameters and Data Collection
Operational parameters are the measurable factors that provide numerical data on system performance [26]. The major findings in this regard are as follows:

•
Any operational parameter can give a glimpse of the health condition of the related CWS component.

•
Leaving water temperature is the most commonly used operational parameter for both chillers and cooling towers.

•
The most commonly used operational parameter for pumps is differential pressure, while space temperature is the most common one for terminal units.

•
Some studies did not specify which operational parameters they were built on. In addition, some studies stated the operational parameters but did not provide detailed information about the associated data, including the sample size, source, and frequency.

•
The way of collecting the data differed between the studies. The sample sizes and the frequencies were not the same across the considered studies. With regard to the data source, some studies used ASHRAE projects, some relied on sensors, and others used historical records of the buildings under study. Moreover, some studies utilized IoT sensors, BMS, or BAS to obtain their data, as well as to control the system.

•
Among the considered studies that used sensors as a data source, none suggested management or technical procedures for the case in which that source is unavailable at a particular building.

•
Using the PI of operational parameters may not be effective in fault diagnosis.

•
Building management software, such as BMS, cannot detect all faults. However, fault prediction and system control can be improved by using such software alongside ML methods.
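Several of the findings above concern missing details about the collected data (sample size, source, and frequency). As an illustration only, the following sketch shows how such details could be derived and reported from timestamped readings. The `Reading` record, its field names, and the sample values are hypothetical and are not drawn from any of the considered studies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:
    timestamp: datetime
    component: str   # e.g. "chiller", "pump"
    parameter: str   # e.g. "leaving_water_temp_C"
    value: float
    source: str      # e.g. "BMS", "IoT sensor", "historical record"


def describe_dataset(readings):
    """Report sample size, data sources, and sampling intervals."""
    if not readings:
        raise ValueError("no readings supplied")
    ordered = sorted(readings, key=lambda r: r.timestamp)
    # Time gaps between consecutive readings reveal the actual frequency.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "sample_size": len(ordered),
        "sources": sorted({r.source for r in ordered}),
        "start": ordered[0].timestamp,
        "end": ordered[-1].timestamp,
        "min_interval": min(gaps) if gaps else None,
        "max_interval": max(gaps) if gaps else None,
    }


# Hypothetical readings every 15 minutes, followed by one late reading.
start = datetime(2021, 1, 1, 8, 0)
readings = [
    Reading(start + timedelta(minutes=15 * i), "chiller",
            "leaving_water_temp_C", 7.0 + 0.1 * i, "BMS")
    for i in range(4)
]
readings.append(Reading(start + timedelta(minutes=90), "chiller",
                        "leaving_water_temp_C", 7.2, "BMS"))

summary = describe_dataset(readings)
print(summary["sample_size"], summary["min_interval"], summary["max_interval"])
```

Reporting the minimum and maximum intervals side by side makes irregular sampling visible, which is one of the details the considered studies often left unstated.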

Predictive Tools and Control
This subsection gives a brief overview of the PdM tools that were applied in the considered studies and highlights the major findings as follows:

•
The most frequently used methods are simulation models, SVM, DT, and ANN. They are listed in Table 4.

•
DSM, which is one of the state-of-the-art control strategies, showed a significant impact on energy savings.

•
To obtain a well-performing ML model, it is advisable to clean critical CWS parts, such as the AHU's fan scroll and the chiller's condenser water tubes, before collecting the data.

•
Having an excellent ML model would be challenging if the data sample size is inadequate.

•
The literature showed that the proposed PdM programs ended with testing the ML model. Moreover, no management solutions were provided for the addressed faults.
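As a rough illustration of the train-and-test procedure on which the considered PdM programs typically ended, the sketch below fits a one-level decision tree (a decision stump, the simplest form of DT) to labeled readings and evaluates it on held-out samples. It is a toy example under stated assumptions: the single feature, the rule that higher values indicate a fault, the function names, and all data values are hypothetical, and the considered studies used considerably richer models.

```python
def train_stump(samples):
    """Fit a one-level decision tree on a single feature.

    `samples` is a list of (value, is_faulty) pairs. The stump predicts
    "faulty" when value > threshold (assumed direction) and picks the
    threshold with the fewest training errors.
    """
    values = sorted({v for v, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((v > t) != label for v, label in samples)

    return min(candidates, key=errors)


def accuracy(threshold, samples):
    """Fraction of samples classified correctly by the stump."""
    correct = sum((v > threshold) == label for v, label in samples)
    return correct / len(samples)


# Hypothetical chiller leaving water temperatures (deg C);
# False = fault-free mode, True = faulty operation.
train_set = [(6.8, False), (7.0, False), (7.1, False), (7.3, False),
             (8.1, True), (8.4, True), (8.9, True), (9.2, True)]
test_set = [(6.9, False), (7.2, False), (8.0, True), (8.6, True)]

threshold = train_stump(train_set)
print(round(threshold, 2), accuracy(threshold, test_set))
```

The example stops at testing, just as the reviewed programs did. With only a handful of samples, the learned threshold depends heavily on which readings happen to be included, which echoes the finding above on inadequate sample sizes.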

Research Gaps
Based on the RQs, the research gaps found in this literature review from an engineering management point of view can be summarized in three parts. The first part concerns how the faults are identified and described, the second concerns data collection and the frequency of training the chosen model, and the third concerns the extent of the proposed PdM programs. These gaps are described below:

•
The studies did not address a consistent set of faults and concentrated only on selected ones, as some faults were either not stated or not fully described.

•
The current literature did not specify how the data were collected or justify the period or the frequency of the collected data, and it was limited to testing the model and not controlling it.

•
The suggested programs/frameworks/models offered either no solutions or inconclusive ones for the said faults from the management point of view, as they ended at how to detect/predict the faults. Moreover, the said programs did not cover the whole system comprehensively.

Table 4 (fragment): methods and the CWS component(s) to which they were applied.
Method    Component(s)
[...]     CH and TU
HMM       TU
WNN       TU
H2O       CH and TU
RT        TU
KPCA      TU
ARM       TU
RNN       TU
SMO       TU
TARM      TU
FCD       TU
DAG       TU
ERCE      TU
DSM       CH and TU

Conclusions
This article aimed to answer two RQs that address the literature from an engineering management point of view. The first RQ asked about the arrangements or preparations made to identify the faults, and the second RQ, which builds on the first, asked about the tools that were used in line with Industry 4.0/Quality 4.0. This article implemented an SLR comprising four stages, highlighted the studies performed post-1999 on PdM in CBs, and explored many frameworks, programs, and methods. It also highlighted the gaps found in the literature from an engineering management point of view. Following the SLR, especially its second stage, it was found that no research covers the entire CWS; the considered studies cover only one, two, or three components. Therefore, this article focused on all four components. From a maintenance management point of view, researchers are recommended to study the whole system or at least to give more research attention to cooling towers and pumps.
As per the first research gap, described in the previous section, it is concluded that a CWS may have faults other than those studied in the literature; therefore, it is recommended that researchers perform further studies to explore more faults. With regard to the second research gap, it is recommended that researchers verify the required sample size, as well as the frequency of data readings/records. For data collection, it is recommended that researchers create their own dataset from the building under study in order to obtain more accurate data about the current operational situation and to avoid depending on historical records from a different building or project. In addition, it is highly recommended to make a control plan after testing the ML model in order to keep continuous track of the building's O&M.
The suggested future course of action is to develop a PdM 4.0 program using mixed methods. The first step would be to design a survey and share it with facility management professionals at CBs located in the study area. The survey should list the faults found in the literature and ask the participants to select the ones they encounter at their sites and to add further faults, if any. This will enrich the research community's knowledge of CWS faults. Moreover, it should ask the participants about the frequency of such faults' occurrence (minimum and maximum duration). The survey should also let the participants prescribe solutions to the faults. This article holds that the operational parameters reflecting the health condition of CWS components are leaving temperatures for chillers and cooling towers, pressure for pumps, and space temperature for terminal units. This position is supported by the literature, along with practical experience, so the survey should ask the participants to verify it. Furthermore, it is recommended to conduct a pilot study to ensure that the survey is valid.
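The survey outcomes described above could be tabulated in many ways. As one possible illustration, the sketch below counts how often each fault is reported, combines the minimum and maximum occurrence intervals across participants, and lists reported faults that do not appear in the literature list. The response structure, the interval unit (days), the function name, and all values are hypothetical assumptions made for the example.

```python
from collections import Counter

# Hypothetical subset of faults taken from the literature.
LITERATURE_FAULTS = {"condenser fouling", "refrigerant leak",
                     "partial pump clogging"}

# Each hypothetical response maps a fault to the (minimum, maximum)
# number of days between occurrences reported by one participant.
responses = [
    {"condenser fouling": (30, 90), "refrigerant leak": (60, 180)},
    {"condenser fouling": (45, 120), "full pump clogging": (200, 365)},
    {"refrigerant leak": (90, 150)},
]


def summarise_survey(responses, literature_faults):
    """Return per-fault counts, combined (min, max) intervals, and
    reported faults that are absent from the literature list."""
    counts = Counter()
    intervals = {}
    for response in responses:
        for fault, (low, high) in response.items():
            counts[fault] += 1
            prev_low, prev_high = intervals.get(fault, (low, high))
            intervals[fault] = (min(prev_low, low), max(prev_high, high))
    new_faults = sorted(set(counts) - set(literature_faults))
    return counts, intervals, new_faults


counts, intervals, new_faults = summarise_survey(responses, LITERATURE_FAULTS)
print(counts.most_common())
print(intervals)
print(new_faults)
```

In this illustration, "full pump clogging" is reported by a participant but is absent from the literature list, so it is returned as a new fault.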
The second step would be to make a methodological framework that contains three parts, as follows:

1.
Setup Part: This part should include the following points:

•
Showing how to understand the drawings of the CWS at a particular building.

•
Proposing a format to extract/identify the number of components of CWS from the said drawings (number of chillers, number of cooling towers, number of primary/secondary pumps, and number of terminal units).

•
Introducing a management procedure for checking the availability of reading tools, such as sensors, building management software, etc., and for dealing with their unavailability.

•
Proposing a way of forming a team that will be responsible for data collection.

2.
ML Part: This part should explain how the proposed predictive maintenance program is in line with Industry 4.0/Quality 4.0. The first task in the ML part is data collection. The readings should be taken from the operational parameters at the minimum and maximum frequencies obtained from the said survey.
A check sheet is proposed for recording the data at the building under study. All of these points should be formatted in a methodological manner from a management point of view. Python is suggested for the next step of the ML part, for which a procedure should be proposed on how to insert the data, train the ML model, and test the model for each component of the CWS.

3.
Quality-Control Part: Here, a control plan for the CWS should be presented in a tabulated format. The survey outcomes (frequencies) should again be used here as part of the said control plan. The survey's other outcome (solutions for each fault) should be summarized here as well, in order to create a comprehensive management program for PdM 4.0 for CWS.

Acknowledgments: [...] to O&M Manager at Alfaisal University (Eng. Kaleemoddin Ahmed) for his great assistance during this research.

Conflicts of Interest:
The authors declare no conflict of interest.