Machine Learning Approach to Predict the Performance of a Stratified Thermal Energy Storage Tank at a District Cooling Plant Using Sensor Data

In the energy management of district cooling plants, the thermal energy storage (TES) tank is critical. As a result, it is essential to keep track of TES performance. The performance of the TES has been measured using a variety of methodologies, both numerical and analytical. In this study, the performance of the TES tank in terms of thermocline thickness is predicted using an artificial neural network, a support vector machine, and k-nearest neighbor, an approach that has remained unexplored. One year of data was collected from a district cooling plant. Fourteen sensors were used to measure the temperature at different points. With engineering judgement, 263 rows of data were selected and used to develop the prediction models. A total of 70% of the data were used for training, whereas 30% were used for testing. K-fold cross-validation was used. Sensor temperature data were used as the model input, whereas thermocline thickness was used as the model output. The data were normalized, and, in addition, moving average filter and median filter data smoothing techniques were applied while developing the KNN and SVM prediction models to carry out a comparison. The hyperparameters of the three machine learning models were tuned with the trial-and-error method: based on this, the optimum architecture of the ANN was 14-10-1, which gives the maximum R-squared value, i.e., 0.9, and the minimum mean square error. Finally, the prediction accuracies of the three different techniques were compared: the accuracy of ANN is 92%, SVM is 89%, and KNN is 96.3%, leading to the conclusion that KNN performs better than the others.


Introduction
From the standpoint of heating and air conditioning, stratified thermal energy storage (TES) has become increasingly popular in recent years [1,2]. The advantage of adopting TES for comfort and cooling applications is that it alleviates energy and cost concerns. The actual performance of TES is determined by several elements, including mixing at the tank's inflow, mixing of hot and cold water, heat loss to the environment, and aspect ratio [3].
The above-mentioned factors impact the temperature distribution in the TES tank during the charging period; the transition layer between hot and cold water in the tank is known as the thermocline thickness (W TC). The performance of the TES tank is determined through W TC. Jia, Liu et al. (2022) created a system for TES operating strategy optimization by fusing physics-based modelling with deep learning [35], and the ANN has recently seen significant use in applications related to the energy system. It is clear from the literature that machine learning has recently drawn the attention of various fields, including energy systems, but has not yet been well utilized for stratified thermal energy storage performance in district cooling plants; this is why, in this study, machine learning has been applied to predict the performance of a stratified thermal energy storage tank with sensor data. An algorithm's success is determined by the nature of the problem, such as its variables, boundaries, and additional complications, such as patterns in the data. The ability of an algorithm to solve a particular problem, however, does not guarantee that it will perform better than a random search [36,37]. Three machine learning techniques, artificial neural networks (ANN), support vector machines (SVM), and K-Nearest Neighbor (KNN), were used in this study.
The structure of the paper is as follows: in Section 2, the methods used to achieve the study objective are discussed. In Section 3, the results generated using the machine learning algorithms are presented and discussed. Finally, Section 4 presents the conclusion and future work.

Data Collection
The data in this study was collected from the operating TES tank installed at the UTP GDC plant. Figure 1 shows the layout of TES system at GDC UTP and Figure 2 represents the flow during the charging process of TES tank at the GDC plant.

After collecting the data from the TES sensors, the data was analyzed in an Excel worksheet. Figure 3 shows that during charging, the temperature distribution moves from the bottom to the top direction, and the width of the thermocline increases with the passage of time. This behavior has been verified in many previous studies, such as [6,38-40]. The evolution of the temperature distribution and thermocline thickness during charging is shown in Figure 3.


Description of TES System
The TES capacity is 10,000 RTh, connected with four vapor compression or electric chillers (EC) which charge the TES tank and have a capacity of 325 RTh each. The cooling demand of the UTP campus and mosque is fulfilled by the GDC plant. The 14 temperature sensors are installed vertically at a distance of 1 m in the tank, and a data acquisition system is used to collect the temperature data, as shown in Figure 1. During off-peak hours, the chillers are used to charge the tank, and during on-peak periods the tank is discharged to supply chilled water to the campus, as shown in Figure 2. The tank is cylindrical, having an inside diameter of 22.3 m and a height of 15 m. The total storage capacity of the tank is 54,000 m³. The lower nozzle is made up of a pipe with a diameter of 0.5 m located at a height of 1.82 m, and the upper nozzle has a diameter of 0.3 m and is located at 12.3 m; diffusers are attached to both nozzles. The insulation material is polystyrene, with 300 mm thickness, and epoxy paint is used for the internal coating of the tank. The mass flow rate of the electric chiller is designed at 131 m³/h, and the inlet and outlet temperatures are set at 13 °C and 6 °C, respectively.
The input to the machine learning models is the temperature distribution collected from the sensors installed vertically in the tank, and the output of the machine learning models is W TC. The outputs were obtained using the non-linear regression technique proposed by [41,42]. Equation (1) has been used to predict W TC.


Data Preprocessing
In order to have an accurate prediction model, dealing with the input data before feeding it to the algorithm is always desirable. Hence, in this study, the input data were normalized and smoothed with two different smoothing techniques, such as moving average filter and median filter. The procedure is shown in Figure 4 below.


A. Data Normalization
In this study, the data normalization was carried out using the min-max technique, Equation (3). In this method, the outputs are rescaled by transferring one range of values to another range of values [43,44]. The rescaling is mostly carried out to [0, 1] or [−1, 1]; a linear interpolation formula such as Equation (3) is used for the rescaling. In this study, the range of normalization was between 0 and 1. The normalization procedure can lessen the learning algorithm's ambiguity regarding the significance of a parameter with a smaller amplitude. Figure 5 represents how the data is normalized.

X_norm = (X − X_min)/(X_max − X_min)   (3)

where X is the original value, X_min is the minimum value and X_max is the maximum value.
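The min-max rescaling above can be sketched in a few lines. The study's models were built in MATLAB; the following is a hypothetical Python equivalent of Equation (3) and its inverse, not the authors' code.

```python
def minmax_normalize(values, lo=0.0, hi=1.0):
    """Rescale values into [lo, hi] using min-max normalization (Equation (3))."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        return [lo] * len(values)  # constant signal: map everything to the low end
    return [lo + (hi - lo) * (x - x_min) / span for x in values]

def minmax_denormalize(norm_values, x_min, x_max):
    """Return normalized values to their original range (the inverse rescaling)."""
    return [x_min + y * (x_max - x_min) for y in norm_values]
```

For a sensor reading range of 6–13 °C, for example, a reading of 9.5 °C maps to 0.5.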

B. Data smoothing/denoising techniques
In this study, the moving average filter and the median filter were used to denoise the data. The moving average filter replaces each point with the average of the values within a specified frame length, whereas the moving median filter processes the signal entry by entry, replacing each input with the median of the values within a certain frame length [45].
The moving average filter is expressed as:

y[i] = (1/M) Σ_{j=0}^{M−1} x[i − j]

where the input values are x[i − j], the output value is y[i], and the frame duration is M. The smoothing using the moving average filter is expressed as Y = movmean(A, m), and the smoothing using the moving median filter as Y = movmedian(A, m), where A is an array of input data, Y is the smoothed data, and m is the frame length. Figure 6 shows the behavior of the input and output data. The data clearly show that there is some noise which must be removed. Hence, the data was smoothed using the moving average filter and the moving median filter, as shown in Figure 7. The graph shows that T1, T2, T3, and T4 were more smoothed by the moving average filter, T5, T6, T7, T8, T9, and T10 were more smoothed by the moving median filter, whereas T11, T12, T13, and T14 show the same smoothing magnitude for both techniques at the same frame length. The frame length for both filters was 20. In the figures, the blue graph is the unsmoothed normalized data, the red graph is the moving average filter-smoothed and normalized data, and the yellow one is the moving median filter-smoothed and normalized data.
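The two smoothing filters can be prototyped directly. The study applied them in MATLAB with a frame length of 20; below is a hypothetical Python sketch with shrinking windows at the endpoints, intended only to illustrate the idea.

```python
from statistics import median

def movmean(a, m):
    """Centered moving average with frame length m; windows shrink at the ends."""
    half = m // 2
    out = []
    for i in range(len(a)):
        window = a[max(0, i - half): i + half + (m % 2)]
        out.append(sum(window) / len(window))
    return out

def movmedian(a, m):
    """Centered moving median with frame length m; windows shrink at the ends."""
    half = m // 2
    return [median(a[max(0, i - half): i + half + (m % 2)]) for i in range(len(a))]
```

On spiky data the median filter suppresses outliers that the mean filter only dilutes, which is consistent with the different smoothing magnitudes seen across T1–T14 in Figure 7.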

Data Division
Once the dataset was obtained from the plant, it was stored in a CSV file; the file was then uploaded in the MATLAB toolbox, where it was divided into three parts. A subset of 185 rows (70%) of the whole dataset was used for training. A subset of 39 rows (15%) was used for cross-validation, and a subset of 39 rows (15%) was used for testing. Table 1 shows an example of the division of the dataset for one day of data (columns: Hour, T1-T14, and W TC).
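The 70/15/15 division can be sketched as follows. This is a hypothetical Python illustration (the study performed the division in the MATLAB toolbox); with 263 rows it produces subsets of roughly the 185/39/39 sizes reported above, the exact counts depending on rounding.

```python
import random

def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training/validation/test subsets."""
    shuffled = rows[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```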

ANN Modeling
After the data collection and preprocessing, ANN modeling was performed using the feed-forward backpropagation technique [44], as shown in Figure 9. The mathematical form of the artificial neural network is presented below.


Steps in ANN Modeling
Step 1 Determine the output of the input layer.
Step 2 Determine the output of the hidden layer. Each normalized input value is multiplied by the corresponding initial weight and these products are added together. This summation is then added to the bias. The resulting sum is applied to the activation function. The activation function employed is the log-sigmoid function, due to its best prediction capability as proved in some studies, including but not limited to [46], as shown in Equation (7):

f(x) = 1/(1 + e^(−x))   (7)
where x is the result of adding the inputs and weights.
Step 3 Determine the output of the output layer. The same logic, involving multiplication of the weights with the inputs, followed by summation and application to the activation function (as in Step 2), is applied, but the activation function used here is a linear function, as shown in Equation (8).
Step 4 Determine the error of the model. For each training pattern, the output of the output layer is compared to the desired output, and the difference between the two is the error, which is determined using Equation (9):

E = T − Y   (9)
Y is the forecast value, while T is the target value.
Step 5 Calculate the square of the error. The square of the error can be calculated using Equation (10):

E² = (T − Y)²   (10)
A. Backward Pass The purpose of the backward pass is to calculate errors in the hidden and input layers and modify the weights for better predictions.
Step 1 Determine the node's error in the hidden layer. The error is multiplied by the derivative of the activation function; for the log-sigmoid function, the derivative is given by Y × (1 − Y).
Step 2 Determine the node's error in the input layer. The error is determined by multiplying the derivative of the activation function by the error.
Step 3 Adjustment of the weights. There are several methods for finding the minima of a function in any dimension. One of the best techniques for training ANNs is the Levenberg-Marquardt optimization algorithm, particularly when the number of weights is high. Although it requires more memory than other techniques, this method is strongly advised as the first choice for supervised tasks. Equation (13) is used to update the weights in the Levenberg-Marquardt training function (LMTF):

w_(k+1) = w_k − (JᵀJ + μI)⁻¹ Jᵀe   (13)

where J is the Jacobian of the network errors with respect to the weights, μ is the damping factor, I is the identity matrix, and e is the error vector.
Step 4 Determine whether or not the training should continue.
Repeat the previous steps until the minimum error is reached or the set number of iterations has been exceeded.
Step 5 Denormalization of the output. This step is crucial to return the values to their original, unnormalized form using Equation (14):

X = X_norm (X_max − X_min) + X_min   (14)
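The forward pass of Steps 1-3 can be written compactly. The sketch below is a hypothetical Python rendering of a 14-10-1 network with a log-sigmoid hidden layer and a linear output (the paper's training itself used MATLAB's Levenberg-Marquardt back-propagation, which is omitted here); the random weights stand in for trained ones.

```python
import math
import random

def logsig(x):
    """Log-sigmoid activation of the hidden layer (Equation (7))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: weighted sums + log-sigmoid in the hidden layer,
    then a weighted sum + linear activation in the output layer."""
    hidden = [logsig(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Illustrative 14-10-1 architecture with random (untrained) parameters.
rng = random.Random(1)
n_in, n_hid = 14, 10
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hid)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hid)]
b_out = rng.uniform(-1, 1)
prediction = forward([0.5] * n_in, w_hidden, b_hidden, w_out, b_out)
```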

Support Vector Machine
The support vector machine is a supervised, maximum-separation method. By supplying numerical data, it solves classification and regression problems [47].
The precise location of the hyperplane is determined by the support vector machine process through optimization, until the hyperplane is positioned at the maximum margin [48]. The hyperplane with the highest margin is capable of accurately classifying data or other input parameters [49]. The key concern in the support vector machine (SVM) is how to determine the hyperplane's ideal position; the most acceptable position should be chosen after conducting many tries. The hyperplane's main function is to divide the two classes, so widening the decision boundary between the two classes is desirable. The lines crossing the support vectors are referred to as the separating lines indicating the decision boundary. The concept of SVM is depicted in Figure 10. All the input data is classified into the necessary classes using the support vector machine. The label, y_i, and the set of data, x_1 . . . x_n, are required. Each input is predicted to be either in the class y_i = 1 or in the class y_i = −1 [50]. Accordingly, the dataset is written as {(X_i, y_i) | X_i ∈ R^p, y_i ∈ {−1, +1}}, i = 1 . . . n, where n is the number of data points and p is the feature dimension.
The hyperplane equation is expressed as W·X + b = 0. The equation for 2D space is w_1x_1 + w_2x_2 + b = 0. From these equations, we can generalize the separating condition y_i(W·X_i + b) ≥ 1, where W = (w_1 . . . w_d) is the dimensional row vector corresponding to the normal axis of the classifier, b is the bias, and X_i is a provided row vector associated with the i-th data point. y_i ∈ {−1, +1} is a binary class variable of the i-th data point. The bias b controls the distance of the classifier line from the origin, as the vector W controls the classifier line's position.
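The role of the hyperplane can be illustrated with its decision rule. The fragment below is a hypothetical sketch of the sign test and the margin condition y_i(W·X_i + b) ≥ 1 for a linear classifier; it does not perform the optimization that places the hyperplane.

```python
def svm_predict(W, b, X):
    """Assign +1 or -1 depending on which side of the hyperplane W.X + b = 0 X lies."""
    score = sum(w * x for w, x in zip(W, X)) + b
    return 1 if score >= 0 else -1

def satisfies_margin(W, b, X, y):
    """Check the separating condition y * (W.X + b) >= 1 for one labeled point."""
    return y * (sum(w * x for w, x in zip(W, X)) + b) >= 1
```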

K-Nearest Neighbor
The fundamental supervised algorithm is commonly referred to as k-NN. The concept behind KNN is used in many other machine learning methods; therefore, to understand other techniques, studying KNN could be a great place to start [51]. No explicit model is built during the training phase; some reasonable indexing is required to quickly identify the classes. The k nearest classes of a given instance must be found via KNN [52]. The effectiveness of the classification depends on the measure of similarity, calculated as the distance between two instances. The k-nearest neighbor algorithm is shown in Figure 11. The Euclidean distance method is the most practical of the various ways to determine the distance between two points. The Euclidean distance between two points A = (x_1, y_1) and B = (x_2, y_2) in the XY plane is calculated as follows:

d(A, B) = √((x_1 − x_2)² + (y_1 − y_2)²)
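The distance computation and majority vote at the heart of KNN fit in a few lines. This is a hypothetical Python sketch, not the MATLAB classifier used in the study; `train` holds (point, label) pairs, and the class names are invented for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```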



Classifier Tuning and Validation Techniques
The performance of the classification model prediction is tested using a variety of techniques. It is conventional to use 70% of the data set for training and the remaining 30% for testing the model's efficacy; this approach is known as leave-p-out. Cross-validation is the second type of model training and testing strategy, and works by dividing the given dataset into training and validation subsets and then rotating the evaluations among the subsets; it can assess a classifier's generalized overall accuracy on an unbiased dataset. When the data set is small, cross-validation is used. A generalization called k-fold cross-validation divides the data into k subsets, using k−1 subsets for training and the remaining one for validation. The partitions must be roughly equal in size and are formed by randomly choosing data. The average of each performance metric, such as classification accuracy, is the outcome of cross-validation. K-fold cross-validation means using each partition for validation just once while repeating the validation k times. The MATLAB programming language was used to develop the classification algorithms. During development of the SVM and KNN prediction models, the frame length for both the moving average filter and the moving median filter was 20; the frame length was varied, and the value leading to the best classification accuracy was finally selected.
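The k-fold rotation described above can be sketched as an index generator. This hypothetical Python fragment (the study used MATLAB) shuffles the row indices, cuts them into k roughly equal folds, and lets each fold serve as the validation set exactly once.

```python
import random

def kfold_indices(n, k, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k roughly equal partitions
    for held_out in range(k):
        val = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, val
```

Averaging a metric such as classification accuracy over the k validation folds gives the cross-validated estimate.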

Hyper Parameter Optimization for the Model
The machine learning models are highly dependent on the hyperparameters shown in Table 2.

Evaluation Performance Indices
After the prediction was obtained from the machine learning models, the output of the models was compared with the actual output, i.e., the training dataset. To compare the performance of the ANN and the non-linear regression model results, the MSE, as shown in Equation (20), and R², as shown in Equation (21), were used in this study:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²   (20)

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²   (21)

However, for the support vector machine and KNN, the confusion matrix [53] was generated, which shows the prediction accuracy.
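Both indices are straightforward to compute. The following hypothetical Python helpers implement the usual definitions of the MSE and the coefficient of determination that Equations (20) and (21) refer to.

```python
def mse(actual, predicted):
    """Mean square error: average squared deviation between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```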

Results and Discussion
This section is divided into two parts. The first part consists of the results of the ANN architecture-building, and the second part is related to the validation or comparison of the output values with the real behavior of the thermocline thickness. The algorithms are assessed using the R-square and MSE metrics discussed previously. For the training and test datasets, the predicted and actual values are compared. In the best-case scenario, the predicted versus experimental data should lie on a 45° line, which may be shown as the best-fit curve to visually analyze the deviation.

Prediction with ANN Model Architecture
The optimum ANN setup, which depends on control parameters such as the number of hidden neurons, the number of hidden layers, and the learning rate, was obtained. Back-propagation with the LM algorithm was the learning algorithm used here, as indicated in Table 3. A total of 263 data sets were used to train the network. The impact of the number of neurons on model performance can be seen in Figure 12 in terms of R-square and MSE. The number of neurons affects the R-square of the model, with 10 neurons achieving the best results; beyond that, the model began to overtrain, as experienced in [54]. The impact of the learning rate on the R-square and MSE of the model is also shown in Figure 12; the minimum value of the MSE and the maximum value of the R-square are noted at a learning rate of 0.01.
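The trial-and-error selection can be expressed as a simple sweep. The helper and the score values below are hypothetical (the R-square numbers merely echo the shape of Figure 12, peaking at 10 neurons); in practice the scores would come from retraining the network at each setting.

```python
def pick_best(candidates, score_fn):
    """Evaluate every candidate value and keep the one with the highest score."""
    best_value, best_score = None, float("-inf")
    for value in candidates:
        score = score_fn(value)
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score

# Hypothetical validation R-square per hidden-neuron count.
scores = {2: 0.80, 6: 0.88, 10: 0.92, 14: 0.90, 18: 0.86}
best_neurons, best_r2 = pick_best(scores, scores.get)
```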

Table 3. Final parameters for the model.

Sr No  Parameter            Remarks
01     Learning algorithm   LM
02     Activation function  Log-sigmoid
03     Number of neurons    10
04     Learning rate        0.01

Figure 13 shows the overall regression of the collected and predicted data, and Figure 14 shows the comparison between the ANN-predicted and actual values during the charging period. The data lie very close to the central line, giving an R-square of 0.92 for W TC. According to the theory, W TC increases during charging due to mixing and conduction effects. It can be seen from Figure 14 that during the charging period W TC increased from 1.94 to 3.5, which supports the theoretical behavior of the thermocline during charging. The average error between the ANN-predicted and actual values is 2.4%.
Figure 14 shows the comparison between the calculated and ANN-predicted values of W TC; it can be seen that W TC increased with respect to charging time, as expected [9,41,55,56]. Figure 15 represents the changes in the MSE for training, validation, and testing with respect to the iterations; it can be seen from the graph that the best result was achieved at 23 iterations, which means that the minimum value of the MSE is reached at 23 iterations or epochs.

Prediction of Thermocline Thickness
The outputs of the model and the actual values are compared for training, validation, and testing, and the results are shown in Figure 16. The R value shows the closeness to the targeted values; if the value is greater than 0.9, the predictions are considered satisfactory. It can be seen from Figure 16 that the R values for training, validation, and testing are 0.97, 0.94, and 0.95, respectively. A total of 263 data points were used in the model.
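The k-fold cross-validation mentioned in the abstract can be sketched as follows; this is an illustrative example on synthetic stand-in data (the fold count of 5 is an assumption, as the study does not state k), reporting R-square per fold as scikit-learn's analogue of the per-split R values above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
# Synthetic stand-in: 263 rows, 14 normalized sensor temperatures.
X = rng.standard_normal((263, 14))
y = 2.0 + 0.5 * X.mean(axis=1) + 0.05 * rng.standard_normal(263)

ann = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                   solver="lbfgs", max_iter=2000, random_state=1)

# 5-fold cross-validation; 'r2' scoring reports the R-square on each fold.
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(ann, X, y, cv=cv, scoring="r2")
print("R^2 per fold:", np.round(scores, 3))
print("mean R^2:", round(scores.mean(), 3))
```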

Prediction via KNN and SVM
The input data magnitudes and their class labels are shown in Table 4 below.

Sensors 2022, 22, x FOR PEER REVIEW
Figure 17 shows the confusion matrix for k-nearest neighbor with the mean (moving-average) filter; the maximum false-positive count is 1, which occurred in class 7, and the maximum false-negative count is 1, which occurred in classes 2, 10, and 11. The result shows that KNN is a promising tool for predicting the output; the overall classification accuracy was 96.3%.

Figure 18 shows the confusion matrix for k-nearest neighbor with the median filter; the maximum false-positive count is 1, occurring in class 3, whereas the maximum false-negative count is 1, which occurred in classes 1, 5, 6, 8, 10, and 11. KNN is thus also a promising prediction tool with the median filter; the overall classification accuracy is 93.5%.

Figure 19 shows the confusion matrix for the support vector machine with the mean (moving-average) filter; the maximum false-positive count is 1, occurring in classes 2, 3, 6, 7, and 11, whereas the maximum false-negative count is 2, which occurred in class 6.
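The per-class false-positive and false-negative counts read off the confusion matrices above can be computed directly from the matrix: column sums minus the diagonal give false positives, row sums minus the diagonal give false negatives. A minimal sketch, using synthetic multi-class data in place of the binned thermocline labels (the class count and k = 3 are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.datasets import make_classification

# Synthetic stand-in for the labeled sensor dataset (263 rows, 14 features).
X, y = make_classification(n_samples=263, n_features=14, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=2)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
y_pred = knn.predict(X_te)

cm = confusion_matrix(y_te, y_pred)
fp = cm.sum(axis=0) - np.diag(cm)   # false positives per class (column sums)
fn = cm.sum(axis=1) - np.diag(cm)   # false negatives per class (row sums)
print("accuracy:", round(accuracy_score(y_te, y_pred), 3))
print("FP per class:", fp)
print("FN per class:", fn)
```

Note that in a multi-class confusion matrix the total false-positive and false-negative counts are always equal (both count the off-diagonal entries), so KNN "predicting more false negatives than false positives" refers to the per-class maxima reported above.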
Figure 20 shows the confusion matrix for the support vector machine with the median filter; the maximum false-positive count is 5, which occurred in classes 2 and 3, whereas there are no false negatives. Although the SVM accuracy is lower than that of KNN, it predicts the output at 89.7% using the mean (moving-average) filter and at 82.4% using the median filter. The results show that k-nearest neighbor produces more false negatives than false positives, whereas the support vector machine produces more false positives than false negatives. Overall, the results show that k-nearest neighbor with the mean (moving-average) filter is the best prediction tool.
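The two smoothing pipelines compared above can be sketched as follows: a centered moving-average (mean) filter and a median filter are applied to each sensor channel before SVM classification. Synthetic data again stands in for the plant measurements, and the window size of 3 and RBF kernel are illustrative assumptions not stated in the text.

```python
import numpy as np
from scipy.signal import medfilt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

def moving_average(x, k=3):
    """Centered moving-average filter applied down each column (sensor)."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, x)

# Synthetic stand-in for the labeled sensor dataset.
X, y = make_classification(n_samples=263, n_features=14, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=3)

X_mean = moving_average(X, k=3)                # mean (moving-average) filter
X_med = np.apply_along_axis(medfilt, 0, X, 3)  # median filter, window 3

svm = SVC(kernel="rbf", C=1.0, gamma="scale")
for name, data in [("raw", X), ("mean filter", X_mean),
                   ("median filter", X_med)]:
    acc = cross_val_score(svm, data, y, cv=5).mean()
    print(f"{name}: accuracy = {acc:.3f}")
```

On real time-ordered sensor data, the filters would be applied along the time axis of each sensor's series before the rows are assembled into the feature matrix.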

Conclusions
This study focuses on evaluating the performance of a stratified TES tank installed at the GDC plant of UTP in terms of thermocline thickness, using machine-learning algorithms. The study was conducted using backpropagation with the Levenberg-Marquardt algorithm, a support vector machine, and k-nearest neighbor, trained with the temperature sensor data collected from the GDC plant installed at Universiti Teknologi PETRONAS, Malaysia. Based on the modeling results, the following conclusions can be drawn:

1.
The model was trained using the temperature sensor data and a non-linear regression model based on Musser and Bahnfleth's approach. The model was trained to predict W TC from the temperature distribution data obtained from an operating TES tank. The average error during charging between the ANN-predicted and actual values of W TC was 2.4%. The maximum R-square was 0.92.

2.
The ANN model for the TES tank was built by varying several parameters, i.e., the number of hidden layers, the number of hidden neurons, and the learning rate. All parameters were decided by a trial-and-error approach. After various trials, the optimum parameters were found to be 10 neurons, a 0.01 learning rate, and 1 hidden layer. With these parameters, the model results were validated against the results obtained from the non-linear regression model used for data generation; the R-square achieved between the ANN and the non-linear regression model was 0.94.

3.
The SVM model was also trained with the same data; the prediction accuracy obtained using the SVM model was 89%, and for KNN it was 96.3%.

4.
Finally, based on the results, KNN outperformed the other machine learning models used in this study. The author would like to extend the use of the machine learning models to other parameter predictions, such as the figure of merit, and to the optimization of operations in future work.

Research Limitations and Future Work
Although predicting the performance of stratified thermal energy storage tanks using ANN, SVM, and KNN produced excellent results, there are still some limitations in the work, such as:

•
A limited number of observations in the dataset.

•
The models do not incorporate the underlying physics of the system.
The above-mentioned limitations can be improved further. The author considers that the work can still be extended as follows:

•
Consider employing more sophisticated machine learning models, such as deep learning models, which have remained unexplored in this area.

•
Using physics-informed neural networks, which can solve the system's governing equations and lessen the problem of data scarcity in the research.

•
Planning for maintenance can be added to the work.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data available on request due to restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.