Operational Energy and Carbon Cost Assessment Model for Family Houses in Saudi Arabia

In Saudi Arabia, housing projects account for almost half of the total electricity consumed by the construction industry because housing projects greatly outnumber other types of buildings. This paper proposes a quantitative approach that uses a multiple linear regression assessment model to predict the energy cost and environmental cost of housing buildings in Saudi Arabia. The model was developed to help house owners in Saudi Arabia estimate the monthly energy cost and associated operational carbon cost from several predictor parameters. These parameters were drawn from the related literature and were then reviewed and discussed by experts from the Ministry of Housing. They included building location, wall type, number of occupants, window type, envelope insulation, building age, building area, number and type of air-conditioning units, and lighting system. Model development had five main stages: collecting energy and carbon cost data from completed, operating housing units; categorizing the collected data by parameter; diagnosing the quality of the gathered data and filtering outliers, if any; building and generating the model; and, lastly, testing and validating the model. A total of 77 datasets were collected in Saudi Arabia's eastern region at different times of the year. The findings reveal that the number of users and the building area are significantly related to the energy cost and that the number of users is more strongly correlated with the energy cost than the building age or the number of central air conditioners installed. Moreover, the results show that the developed model can predict energy and carbon costs with high accuracy. The developed model serves as a decision support tool that householders and decision makers in the Ministry of Housing can use to control the predictor parameters. This would help housing unit owners allocate constrained budgets.


Introduction
Buildings in Saudi Arabia place increasingly heavy demands on energy, particularly during the summer, because air-conditioning demand rises with the extremely high outdoor temperatures across the country. Electricity generation in Saudi Arabia consumes more than one-third of the country's total oil production [1], and the building sector uses about 80% of the electricity produced [2].
Moreover, Saudi Arabia's population is continuously increasing and its economy is expanding rapidly, resulting in a surge in suburban construction. In 2016, the Saudi housing minister announced a new construction program that sought to build about 1.5 million new homes over the next seven to eight years. At present, housing projects account for approximately half of the total electricity consumption of the building stock [3]. This is due to shortcomings in building codes, design procedures, urban layout, and building materials [4][5][6][7]. Whereas buildings worldwide account for around 40% of total energy consumption, Saudi Arabia has the highest residential energy consumption, and this consumption is expected to rise by 5–8% annually [8]. By 2035, domestic oil consumption is projected to equal domestic oil production. The Saudi government and major corporations therefore developed the Saudi Energy Efficiency Program in an effort to cut energy costs by 30%, and the government announced the establishment of the Saudi Energy Efficiency Center (SEEC) in November 2010.
Furthermore, in April 2016, the government launched Saudi Vision 2030, which focuses on developing towns and achieving environmental sustainability [9]. However, although the government has been trying to reduce energy consumption, the energy consumption of buildings keeps increasing, which raises their energy costs.
Realizing the impact of buildings' energy consumption on overall energy costs, the government issued the Saudi Building Code in 2018. With such a policy, it hopes to address the perennial problem of high energy consumption and its associated costs.
A recent study identified some of the issues that Saudi residential buildings face [1]. It noted that a large portion of their electricity consumption goes to air-conditioning (AC) units and that around 70% of the constructed houses are not thermally insulated. In Saudi Arabia, 1,811.5 million kWh of electricity is generated daily using four million barrels of oil, which represents half of the daily oil exports and one-third of daily oil production [1]. Furthermore, the building sector consumes 80% (649 million kWh) of the total energy produced daily, which is the highest rate in the region.
A construction project entails several phases, including conceptual design, detailed design, development, construction, and operation and maintenance, as shown in Figure 1. Despite the availability of many types of commercial software, it is difficult to predict a project's energy, construction, and maintenance costs precisely at the conceptual stage because information is lacking and/or inaccurate [1]. Data become available as development progresses, allowing more precise cost assessments; however, as the design moves from the conceptual to the detailed stage, changing it becomes increasingly difficult [10][11][12][13][14]. The energy consumption of Saudi buildings has been studied extensively [14][15][16]. Taleb and Sharples [14] aimed to develop strategies for sustainable housing in Saudi Arabia by examining the country's current energy and water consumption.
Sustainability 2022, 14, 1278
Their study sought to identify inherent design shortcomings in existing Saudi buildings. According to Taleb and Sharples, Saudi buildings can be made more energy-efficient in several ways, including insulating exterior walls and roofs, installing higher-performance glazing systems, and using smart lighting. Ashraf and Almaziad [15] used an energy simulation program to assess the effect of the facades of a multilevel educational building on its energy consumption. For residential buildings, Alrashed and Asif [16] studied the impact of factors such as weather conditions, residence type, building envelope, cooking appliances, and air-conditioning systems. Their study used a 2012 survey of 115 residences, comprising 25 traditional houses, 28 apartments, and 62 villas.
They suggested using double glazing and a mini-split air-conditioning system to reduce energy consumption.
Previous studies indicate a scarcity of accurate tools for evaluating the energy efficiency of architectural designs at the conceptual design phase. Apart from the work of Alshibani and Alshamrani [1], there is also a lack of computer-based applications that could guide owners and architectural engineers in Saudi Arabia in selecting the most energy-efficient design early in the conceptual phase to minimize energy costs. The literature shows that existing building modeling software has several limitations: limited accuracy, a need for substantial input data that may be unavailable at the conceptual design phase, and long modeling times that depend on the dimensions of the buildings and spaces [17]. Advances in information technology have contributed to the development of many computer-based tools. A prominent example is artificial neural networks (ANNs), one of the most commonly used techniques for developing prediction models across various fields [18,19], and a growing number of applications for predicting energy consumption use ANNs [14,20–27].
ANNs are more effective and accurate at predicting the energy consumption of buildings than ordinary simulation models or regression techniques [23]. For instance, the Abductory Induction Mechanism (AIM) developed by Abdel-Aal et al. [20] estimates monthly domestic energy consumption in Saudi Arabia's eastern region. The model takes into account only demographic measures, economic indices, and weather parameters [21]. Nasr et al. created an artificial-neural-network-based model that uses weather-dependent variables and time series to predict electricity consumption in Lebanon. Meng et al. [22] devised a hybrid growth model that predicts China's monthly electrical energy consumption by extracting the trend of cumulative consumption. Although these models attempted to predict monthly electric energy consumption, they did not consider the architectural design of the building or its architectural elements. Using an ANN, Kumar et al. [24] assessed the heat load and total carbon emissions of a multi-story building. Karattasou et al. [25] suggested integrating statistical procedures such as cross-validation, information criteria, and hypothesis testing into ANN models to improve their performance.
Using a three-layer back-propagation ANN, Ekici and Aksoy [26] predicted the thermal energy requirements of different building types with prediction accuracies of 94.8–98.5%. Mena-Yedra et al. [17] introduced a short-term ANN-based model to estimate the electricity demand of a solar energy research center and obtained acceptable results. Alongside ANNs, CAD tools have become advanced enough to estimate building design parameters easily, accurately, and quickly. Alshibani and Alshamrani [1] developed an ANN/BIM model to predict the total electric energy costs of residential buildings in Saudi Arabia. Designed with the Kingdom's climate in mind, the system aimed to help architects design residential buildings with lower energy costs. Many models and systems have been developed for different applications and jurisdictions to address energy consumption in Saudi Arabia's building sector. For example, regression-based [28] and neural-network-based [29] models were developed to predict the energy consumption of school facilities in Saudi Arabia, and the findings indicate that these models can estimate energy consumption. Gao et al. [30] developed a regression-based model for estimating energy exchange efficiency, and their results indicate that the model is suitable for predicting energy exchange efficiency under different conditions. Mastrucci et al. [31] developed a statistical model based on a geographical information system (GIS) to predict the energy consumption of residential buildings. A key feature of their model was its ability to predict energy consumption on a large scale with an accurate, fast, and simplified tool that does not require a huge input dataset or assumptions. Gardezi et al. [32] developed a model based on multivariate regression and life cycle assessment to predict the embodied carbon footprint of residential buildings.
The model was validated by comparing its predictions against the observed values of five additional housing units that differed from the case studies, which also confirmed its efficiency and consistency.
Based on the aforementioned literature and to the best of the authors' knowledge, the previously developed models, individually and collectively, lack accurate prediction of the monthly energy cost and associated carbon emissions of Saudi buildings. For residential structures, predicting the energy cost is a vital consideration that can help the architect select the most energy-efficient alternative that meets practical objectives. During the conceptual design phase, engineers often have difficulty predicting energy costs without adequate data.
This research problem is complex because of the complexity of energy expenditure profiles [33,34] and because data on the design of heating, ventilation, and air-conditioning (HVAC), lighting, and envelope systems are incomplete. Consequently, this paper introduces a quantitative approach that uses a multiple-regression-based model to predict the monthly energy costs and associated carbon costs of Saudi housing buildings. The carbon cost is incorporated into the model to support the Saudi government's decarbonization strategy, which will in turn help reduce environmental impact, climate change, and global warming. In this paper, the factors influencing operational energy and carbon costs are identified, and the correlation between the predictor parameters and the response factor is investigated. Moreover, the average monthly energy and carbon costs of housing projects in Saudi Arabia are predicted. The best regression model is then developed and validated using actual data from operating projects. The main findings show that building area, central air conditioning, building age, and the number of users are the main factors influencing the model's predictions of energy and operational carbon costs.
The model can be used to estimate the energy costs of residential buildings in other countries with climates similar to that of the Middle East and Gulf region, provided that the local electricity tariff is taken into account. Because the factors that influence energy costs are identified and the correlations between these factors are investigated, the developed model can serve as a framework that could be modified to meet residential building requirements elsewhere, making it adaptable to similar applications in other countries. Furthermore, the model could be retrained on new datasets rather than developed from scratch. This study is expected to serve as a milestone, assisting professionals and researchers in making quick, effective, and long-term energy cost assessments, decisions, and solutions.

Methodology
This study used a dataset of actual energy and carbon costs of Saudi family houses to develop an energy and carbon cost assessment model. The average monthly energy cost was determined by identifying and describing various input parameters: wall type, window type, number of occupants, envelope insulation, building age, building location, building area, air-conditioning type, number of air conditioners, and lighting system. In this study, the data gathered to build the model were as follows: To predict the monthly energy and carbon costs, different scenarios with various parameter combinations were run. Each scenario changed the alternatives for a single parameter while the other parameters remained constant. Figure 2a,b show the predicted costs for various window types and wall systems, respectively, with the other parameters held constant. The developed model sought to diagnose any possible correlation between the response variables (monthly energy and carbon cost) and the predictor variables (energy and building factors).
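The one-parameter-at-a-time scenarios described above can be generated mechanically. The Python sketch below is illustrative only and is not the authors' tool: the baseline house, its values, and the helper name `one_at_a_time_scenarios` are hypothetical, and the letter codes (E for wall system, G for window type, and so on) follow the variable coding defined with the regression model. It only builds the scenario inputs; each scenario would then be passed to whichever prediction model is being tested.

```python
from typing import Dict, List

Scenario = Dict[str, int]

def one_at_a_time_scenarios(baseline: Scenario,
                            alternatives: Dict[str, List[int]]) -> List[Scenario]:
    """Vary one parameter at a time while every other parameter keeps its
    baseline value (hypothetical helper for illustration)."""
    scenarios = []
    for name, levels in alternatives.items():
        for level in levels:
            scenario = dict(baseline)  # copy so the baseline is never mutated
            scenario[name] = level     # change only this one parameter
            scenarios.append(scenario)
    return scenarios

# Hypothetical baseline house using the paper's letter codes:
# E = wall system (1-4), G = window type (3 single, 2 double, 1 triple glazing).
baseline = {"A": 1, "B": 350, "E": 1, "F": 6, "G": 3}
alternatives = {"G": [3, 2, 1], "E": [1, 2, 3, 4]}

scenarios = one_at_a_time_scenarios(baseline, alternatives)
for s in scenarios:
    print(s)
```

Keeping the baseline untouched (by copying it for each scenario) is what guarantees that any difference between two predicted costs can be attributed to the single parameter that changed.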
The model used multiple regression techniques to analyze the relationship between the predictor variables and the monthly energy and carbon cost of Saudi family houses and to develop a prediction model. Figure 3 depicts the development of the multiple linear regression model in five stages: gathering the actual average monthly energy and carbon costs of existing houses, segmenting and identifying the input and output parameters, diagnosing the quality of the preliminary data, developing and generating the regression model, and testing and validating the model. A total of 77 house-building datasets were collected throughout the eastern region of Saudi Arabia to gather energy cost data. This study used Grubbs' test to detect outliers, because outliers can distort the model's predictive performance. Under the null hypothesis, all observations belong to the same normally distributed population; under the alternative hypothesis, the most extreme value is an outlier. The results, shown in Figure 4 and Table 1, indicate no outliers at the 5% significance level, so all data were treated as belonging to the same population.
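Grubbs' test can be sketched with the Python standard library alone. This is an assumed reimplementation of the standard two-sided, single-outlier version, not the program the authors ran: it computes G = max|x_i − mean|/s and compares it with the critical value ((n − 1)/√n)·√(t²/(n − 2 + t²)), where t is the upper α/(2n) quantile of Student's t distribution with n − 2 degrees of freedom. Because the standard library has no t distribution, the quantile is found by bisection on a regularized incomplete beta function. All function names and the example data are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:  # last factor ~1 means converged
            break
    return h

def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_upper_tail(t: float, df: int) -> float:
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0."""
    return 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))

def t_upper_quantile(p: float, df: int) -> float:
    """Value t such that P(T > t) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def grubbs_test(values: Sequence[float],
                alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Two-sided Grubbs test for one outlier: returns (G, G_crit, is_outlier)."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation
    g = max(abs(v - mean) for v in values) / s
    t = t_upper_quantile(alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return g, g_crit, g > g_crit

# Hypothetical monthly costs (SAR); the last value is deliberately extreme.
g, g_crit, is_outlier = grubbs_test([10, 11, 10, 12, 11, 10, 11, 50])
print(f"G = {g:.3f}, G_crit = {g_crit:.3f}, outlier = {is_outlier}")
```

In this framing, the study's "no outliers at the 5% level" result would correspond to `is_outlier` being false for its cost data. When a value is flagged, the usual practice is to remove it and re-run the test on the remaining observations.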
An assessment model to predict the energy consumption of Saudi family homes was developed using 85% of the 77 datasets (65 points), and the remaining 15% (12 points) were chosen randomly to validate the developed regression model. Table 2 shows a sample of the data gathered in this study.
No. | A | B | C | D | E | F | G | H | J | K | L | Cost (SAR)
1 | 1 | 335 | 35 | 0 | 1 | 13 | 1 | 6 | 6 | 0 | 2 | 1075
2 | 1 | 500 | 3 | 0 | 1 | 6 | 1 | 0 | 11 | 0 | 2 | 348
3 | 1 | 700 | 3 | 0 | 2 | 7 | 2 | 0 | 9 | 0 | 2 | 942
4 | 2 | 550 | 20 | 0 | 3 | 4 | 1 | 4 | 2 | 0 | 2 | 588
5 | 1 | 237 | 6 | 1 | 4 | 5 | 2 | 0 | 8 | 0 | 1 | 298
6 | 1 | 250 | 3 | 1 | 1 | 5 | 1 | 0 | 7 | 0 | 1 | 225
7 | 5 | 250 | 2 | 0 | 1 | 5 | 2 | 0 | 9 | 0 | 3 | 400
8 | 3 | 550 | 5 | 1 | 3 | 6 | 2 | 0 | 5 | 0 | 1 | 1000
9 | 1 | 370 | 5 | 0 | 1 | 6 | 2 | 0 | 9 | 0 | 1 | 320
10 | 2 | 300 | 3 | 1 | 4 | 7 | 2 | 0 | 10 | 0 | 3 | 395
11 | 2 | 200 | 30 | 1 | 1 | 5 | 2 | 6 | 0 | 0 | 3 | 150
12 | 3 | 500 | 25 | 0 | 1 | 7 | 2 | 0 | 7 | 4 | 1 | 600
13 | 1 | 350 | 5 | 1 | 1 | 8 | 2 | 0 | 7 | 0 | 1 | 112
14 | 1 | 500 | 9 | 1 | 4 | 7 | 2 | 0 | 0 | 13 | 3 | 2000
15 | 3 | 300 | 3 | 0 | 4 | 4 | 2 | 0 | 10 | 0 | 1 | 253
16 | 1 | 340 | 4 | 0 | 1 | 5 | 2 | 0 | 8 | 0 | 1 | 220
17 | 1 | 400 | 6 | 1 | 1 | 8 | 2 | 3 | 15 | 0 | 3 |
Preliminary data analysis consisted of two major tests: calculating the best subset regression and identifying any data interaction and correlation. Model development then involved four steps: generating the regression model, reviewing the elementary factors, conducting the residual analysis, and selecting the validation model.
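The random 85/15 split can be sketched as follows. This is a minimal Python illustration under assumptions: the seed, the helper name `random_split`, and the integer record identifiers are hypothetical. Rounding the training size down reproduces the reported 65/12 partition of the 77 records.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def random_split(records: Sequence[T], train_fraction: float = 0.85,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and split it into training and
    validation subsets; the seed only makes the sketch reproducible."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 77 * 0.85 = 65.45 -> 65
    return shuffled[:n_train], shuffled[n_train:]

house_ids = list(range(1, 78))  # 77 hypothetical record identifiers
train, validation = random_split(house_ids)
print(len(train), len(validation))  # 65 12
```

Shuffling a copy rather than sampling indices keeps the two subsets disjoint by construction, so no house can appear in both the fitting and the validation data.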
A correlation analysis was performed before the regression analysis, with the correlation coefficient R expressing the relationship between variables. Table 3 shows the correlations between the input factors. As the table shows, the R-value between any two input variables is less than 0.7, which indicates that no pair of inputs is strongly collinear; therefore, all variables were retained in the regression analysis. Furthermore, scatter plots were used to visualize the data and to examine the relationship between the input variables and the response (i.e., energy cost). Scatter plots are simple and can show both linear and nonlinear relationships between two factors. For instance, the relationships of the number of users and of the building area with the energy cost were significant, as shown in Figure 5a,b. Moreover, Figure 5 reveals that the number of users was more strongly correlated with the energy cost than the building age or the number of central air conditioners installed.
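The pairwise screening rule (every |R| between inputs below 0.7, so all inputs stay in the regression) can be sketched in Python as below. The 0.7 threshold comes from the text. The helper names and the three-column example data are hypothetical, and Pearson's R is computed directly rather than with any particular statistics package.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(data: Dict[str, List[float]],
                    threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """Return every predictor pair whose |R| reaches the threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical inputs: y is an exact multiple of x, z is only loosely related.
example = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [2, 1, 4, 3]}
print(collinear_pairs(example))  # [('x', 'y', 1.0)]
```

An empty result corresponds to the situation reported for Table 3, where no pair of inputs needed to be dropped or merged before fitting.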

Results and Discussion
The monthly energy and carbon costs of Saudi family houses were assessed using multiple linear regression modeling. The developed model can help building owners select building systems, materials, and architectural designs that reduce the operating energy and carbon costs of their houses. It predicts energy and carbon costs from the following predictor variables: wall type, number of occupants, window type, envelope insulation, building age, building location, building area, air-conditioner type, number of air conditioners, and lighting system.

Best Subset Result
Table 4 lists the regression models generated from the best subset regression output, one model per line. The lowest standard deviation (S) and Cp values are 12.0 and 239.4, respectively. S is the standard deviation of the residuals, and Cp (Mallows' Cp) is given by the following equation:
$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p)$$
where Cp is used to evaluate the fit of a regression model estimated using ordinary least squares, SSE_p is the error sum of squares of the fitted subset regression model with p parameters (p − 1 predictors), MSE(X_1, …, X_{P−1}) is the unbiased estimate of the error variance from the model containing all P − 1 candidate predictors, and n is the number of observations. In the selected regression model, AECC is the average monthly energy and carbon cost in Saudi riyals (SAR), A is the building location (Dammam 1, Khobar 2, Qatif 3, Dhahran 4, Thoqbah 5), B is the gross floor area (150–1000 m²), C is the building age (1–40 years), D is the insulation (applied 1, not applied 2), E is the wall system (concrete block 1, clay 2, Siporex 3, precast 4), F is the number of users (2–15 persons), G is the window type (single glazing 3, double glazing 2, triple glazing 1), H is the number of window air-conditioning units (1–16 units), J is the number of split air conditioners (1–16 units), K is the number of central air conditioners (1–16 units), and L is the lighting system (incandescent 3, fluorescent 2, LED 1).
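A best subset search of the kind summarized in Table 4 can be sketched with the standard library. This Python sketch is not the Minitab procedure used in the study. It fits every non-empty predictor subset by ordinary least squares via the normal equations (adequate only for a handful of well-conditioned predictors), reports S = √(SSE_p/(n − p)), and computes Mallows' Cp = SSE_p/MSE_full − (n − 2p), where MSE_full comes from the model containing all candidate predictors. Function names and data are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_sse(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Error sum of squares of an OLS fit that includes an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def best_subsets(predictors: Dict[str, Sequence[float]], y: Sequence[float]):
    """Fit every non-empty subset; return (subset, S, Cp) sorted by Cp."""
    n = len(y)
    names = list(predictors)
    big_p = len(names) + 1  # parameters in the full model, incl. intercept
    mse_full = ols_sse([predictors[v] for v in names], y) / (n - big_p)
    results = []
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            p = size + 1
            sse = ols_sse([predictors[v] for v in subset], y)
            s = math.sqrt(sse / (n - p))
            cp = sse / mse_full - (n - 2 * p)
            results.append((subset, round(s, 3), round(cp, 3)))
    return sorted(results, key=lambda r: r[2])

# Hypothetical data: two predictors and a noisy response.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.9]
results = best_subsets({"x1": x1, "x2": x2}, y)
for subset, s, cp in results:
    print(subset, "S =", s, "Cp =", cp)
```

For the full model, Cp equals the number of parameters by construction, which gives a convenient sanity check; good subsets combine a small S with a Cp close to their own p.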

Assessment Tests of Model Adequacy
The analysis of the response factor (monthly energy and carbon cost) versus the predictors indicates that some predictor variables have a positive (direct) relationship with the resulting cost, while others have a negative (inverse) relationship. For example, building age, building area, insulation, number of users, and air-conditioning types are positively related to the cost, whereas wall type, building location, and lighting system are inversely related to the corresponding energy and carbon cost.
Based on the preliminary tests, the coefficient of determination (R²) and the adjusted R² are 84.4% and 81.2%, respectively. R² indicates that the predictor variables explain 84.4% of the variance in the response variable (energy and carbon cost), while the adjusted R² corrects R² for the number of terms in the model. The standard deviation of the residuals (S) is 12, and together with the R² value this indicates that the model fits the data well.
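The relationship between R² and the adjusted R² can be checked with a short calculation, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The Python sketch below assumes n = 65 training records and k = 11 predictors (A to L, with no I); under that assumption, the reported 84.4% reproduces the reported 81.2%. The function names are hypothetical.

```python
from typing import Sequence

def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Penalize R^2 for fitting k predictors to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Assumption: n = 65 training records, k = 11 predictors (A-L).
print(round(adjusted_r_squared(0.844, 65, 11), 3))  # 0.812
```

The penalty grows with k, so adding a weak predictor can raise R² while lowering the adjusted R²; this is why the adjusted value, not R² alone, is the better guide when comparing subsets of different sizes.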
Furthermore, t-tests determine whether each predictor variable is significantly correlated with the response variable. Most of the p-values for the estimated coefficients in Table 5 are less than 0.05, so for these predictors the null hypothesis (a zero coefficient) was rejected in favor of the alternative hypothesis. The majority of the predictor variables are significantly correlated with the response variable, monthly energy and carbon costs, at the 0.1 α-level. Other predictors, including location, wall type, and insulation, show a different pattern, with an insignificant correlation with the response factor. Table 5 also shows that the overall model is significant at the 0.05 α-level; therefore, at least one coefficient in the selected regression model is nonzero. Although some predictors have high p-values, the best subset analysis showed that they should still be included in the model to achieve the best predictions of energy and carbon cost.
The Pareto chart in Figure 6 displays the absolute values of the standardized effects (t-statistics) of all terms on the response; only effects that extend beyond the dashed line are statistically significant. In Figure 6, the dashed line marks 2.00 on the horizontal axis, corresponding to a 0.05 significance level. The statistically significant variables in this test were building area, the AC systems, the number of users, and building age. The strongest predictor was building area (B), with t = 5.73, followed by central air conditioning (K), with t = 5.52, and then window air conditioning (H) and the number of users (F), with t = 4.38 and 4.03, respectively. Building age (C) and lighting system (L) had similar t-values of approximately 2.6.
According to the t-test, wall type (E) had the lowest correlation with the response, likely because the different wall systems are expected to have similar U-values. Location (A) had the second-lowest correlation because the locations studied are all within the same region and climate zone.
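The ordering in the Pareto chart can be reproduced from the t-values quoted above. This Python sketch only ranks the reported values by |t| and compares them with the 2.00 reference line; it does not recompute any t-statistics, and C and L are entered as 2.6 because the text gives only an approximate value. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

def pareto_order(t_values: Dict[str, float],
                 critical: float = 2.00) -> List[Tuple[str, float, bool]]:
    """Sort terms by |t| (largest first) and flag those beyond the line."""
    ranked = sorted(t_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(term, t, abs(t) > critical) for term, t in ranked]

# t-values quoted in the text; C and L are given only as "approximately 2.6".
reported = {"B": 5.73, "K": 5.52, "H": 4.38, "F": 4.03, "C": 2.6, "L": 2.6}
for term, t, significant in pareto_order(reported):
    print(f"{term}: |t| = {abs(t):.2f}, significant = {significant}")
```

Sorting on the absolute value mirrors the chart, which plots standardized effects regardless of sign, so an inverse relationship with a large |t| would rank just as high as a positive one.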

Residual Analysis Result
The normal probability plot in Figure 7 indicates that the residuals follow an approximately normal distribution. A few deviations from normality can be observed in the plot, and analysis of these observations shows that they are possible anomalies. Excluding them would improve goodness-of-fit statistics such as R², but the model would then no longer represent the real-world data available. The residual analysis results are therefore accepted, because minor deviations from normality are not a cause for concern [25]. Figure 8 shows how outliers appear in the histogram of the residuals: small bars on either side of the histogram correspond to residuals far from the mean, meaning that the model fits those observations less closely. To identify errors and outliers in the normal probability plot, the Minitab output for outliers and unusual observations was analyzed. Table 6 presents the observations with large standardized residuals, which have a considerable impact on the characteristics of the developed model and influence the normal probability plot of residuals. Some of these observations were excluded from the model to improve the regression without affecting the prediction of the operational energy and carbon cost (response factor).
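Flagging unusual observations can be sketched as below. This Python sketch is a simplification under stated assumptions: it scales raw residuals by S = √(Σe²/(n − p)) and ignores leverage, whereas Minitab's standardized residuals divide by S√(1 − h_ii). The |residual| > 2 cutoff is a common rule of thumb assumed here, and the residuals in the example are hypothetical.

```python
import math
from typing import List, Sequence

def flag_large_residuals(residuals: Sequence[float], n_params: int,
                         threshold: float = 2.0) -> List[int]:
    """Return indices whose scaled residual |e_i| / S exceeds the threshold.
    Simplification: leverage h_ii is ignored (assumed approximately 0)."""
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    return [i for i, e in enumerate(residuals) if abs(e) / s > threshold]

# Hypothetical residuals from a model with 2 parameters.
residuals = [1.0, -1.0, 0.5, -0.5, 6.0, 0.0, 0.2, -0.2, 0.1, -0.1]
print(flag_large_residuals(residuals, n_params=2))  # [4]
```

Ignoring leverage slightly understates residuals at high-leverage points (large houses or unusual AC counts, for example), so a full implementation would compute h_ii from the design matrix before applying the cutoff.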

Economic Analysis
The economic analysis of the predictor parameters in the developed regression model shows that the largest effect on the energy and carbon cost comes from the lighting type, at 310 SAR, followed by the window type, at 256 SAR; triple glazing can reduce the cost by 512 SAR compared with single glazing. The next-largest effect, 220 SAR, is recorded when thermal insulation is not applied to the external walls. Each central air-conditioning unit adds 162 SAR to the energy and carbon cost, while each window air-conditioning unit adds 148 SAR. The developed regression model shows that each additional family member increases the energy and carbon cost by 140 SAR. The location of the building affects the cost by 50 SAR, while each split air-conditioning unit increases it by 41 SAR. Choosing a different wall system can reduce the cost by 34 SAR, while each additional year of building age increases the energy and carbon cost by 22 SAR. Finally, the developed regression model shows that the energy and carbon cost increases by 2.0 SAR per m² of floor area.
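In a linear model, the change in predicted cost between two houses is simply the sum of each coefficient multiplied by the change in its predictor, so the intercept and any unchanged predictors cancel out. The sketch below illustrates this with the per-unit magnitudes quoted above. The signs, the ordinal coding of glazing (single, double, triple as consecutive steps), and the 0/1 coding of missing insulation are assumptions. Lighting, location, and wall system are omitted because their coding is not stated here. The dictionary and function names are hypothetical.

```python
# Per-unit effects on the predicted energy and carbon cost (SAR), using the
# magnitudes quoted in the economic analysis. Signs and the coding of the
# categorical predictors are ASSUMPTIONS for illustration only.
EFFECTS_SAR = {
    "window_glazing_step": -256.0,   # assumed: +1 step = single -> double -> triple
    "no_wall_insulation": 220.0,     # assumed 0/1 indicator
    "central_ac_unit": 162.0,
    "window_ac_unit": 148.0,
    "split_ac_unit": 41.0,
    "occupant": 140.0,
    "building_age_year": 22.0,
    "floor_area_m2": 2.0,
}

def cost_change(changes):
    """Change in predicted cost (SAR) for a dict {predictor: change}.
    The intercept and unchanged predictors cancel in a linear model."""
    unknown = set(changes) - set(EFFECTS_SAR)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return sum(EFFECTS_SAR[name] * delta for name, delta in changes.items())

# Single -> triple glazing is two steps on the assumed ordinal scale.
print(cost_change({"window_glazing_step": 2}))               # -512.0
# Hypothetical household: one more occupant and 50 m2 more floor area.
print(cost_change({"occupant": 1, "floor_area_m2": 50}))     # 240.0
```

The first call reproduces the 512 SAR saving of triple over single glazing, which is consistent with reading the 256 SAR window effect as a per-step change.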

Model Validation
Model validation evaluates the proposed model's performance and accuracy by applying it to data that were not used in its development. In this paper, twelve actual data points of energy cost and associated operational carbon cost from real projects were used to validate the developed model. These points, about 15% of the collected data, were plotted to compare the gathered data with the model's predictions. The validation relied on the mathematical approach proposed by Zayed et al. [24], which calculates the average invalidity percentage (AIP) and the average validity percentage (AVP) as follows:
$$\mathrm{AIP} = \frac{1}{n}\sum_{i=1}^{n}\left|1 - \frac{E_i}{C_i}\right|, \qquad \mathrm{AVP} = 1 - \mathrm{AIP}$$
where AVP, AIP, $C_i$, $E_i$, and $n$ denote the average validity percentage, the average invalidity percentage, the actual value, the predicted value, and the number of observations, respectively. AIP equals 0 when the predictions match the actual values exactly, and values approaching 1 indicate poor validity.
The AVP value shows that the predicted model's accuracy is 97.9%, which is adequate. As a second, graphical validation test, the validation plot in Figure 9 compares the predicted and actual operational energy and carbon costs. The predicted values lie close to the actual response values, so the results of this second validation test are also deemed acceptable.
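The AIP and AVP measures are straightforward to compute. The sketch below implements AIP = (1/n)·Σ|1 − E_i/C_i| and AVP = 1 − AIP, where C_i is the actual value and E_i is the predicted value. The two-point example uses hypothetical numbers, not the twelve validation points from the study. For those numbers the deviations are 2% and 3%, which gives an AIP of 0.025 and an AVP of 0.975. The function names are illustrative.

```python
def average_invalidity(actual, predicted):
    """AIP = (1/n) * sum(|1 - E_i / C_i|), with C_i actual and E_i predicted."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    if any(c == 0 for c in actual):
        raise ValueError("actual values must be non-zero")
    return sum(abs(1 - e / c) for c, e in zip(actual, predicted)) / len(actual)

def average_validity(actual, predicted):
    """AVP = 1 - AIP."""
    return 1 - average_invalidity(actual, predicted)

# Hypothetical monthly costs (SAR), not the study's validation data.
actual = [100.0, 200.0]
predicted = [98.0, 206.0]
print(average_invalidity(actual, predicted))
print(average_validity(actual, predicted))
```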

Model Testing and Training
A hypothetical case study was used to test the developed multiple linear regression model for assessing the energy and associated carbon cost. A new family house in the city of Dammam with a medium floor area of 500 m² was selected. The house has seven occupants and 10 air-conditioning units. The test applied different possible scenarios for window types, air-conditioning systems, lighting systems, and wall systems, with envelope insulation kept in place because insulation is becoming mandatory in the city. Each scenario, with its own parameter combination, was evaluated using the developed regression model to measure the economic impact of the parameter alternatives. The scenarios were grouped by varying the alternatives of one parameter while keeping the other parameters fixed. The comparisons were based on the monthly predicted energy and carbon cost as well as the total annual energy and carbon cost, as shown in Table 7. Among the window scenarios, the lowest annual energy and carbon cost is predicted for the triple-glazing window (10,780 SAR) and the highest for the single-glazing window (16,900 SAR); triple glazing can therefore reduce the cost by 36.2% compared with single glazing. Among the wall-system scenarios, the lowest annual cost is predicted for the precast concrete panel (18,700 SAR) and the highest for the clay block system (19,500 SAR); the precast concrete panel can therefore reduce the cost by 4.1% compared with the clay block.
Among the air-conditioning scenarios, the lowest annual energy and carbon cost is predicted for split AC units (17,100 SAR) and the highest for the central AC system (31,625 SAR); the split system can therefore reduce the cost by 45.9% compared with the central system. Among the lighting scenarios, the lowest annual cost is predicted for the LED system (23,475 SAR) and the highest for the condensing lighting system (33,880 SAR); LED lighting can therefore reduce the cost by 30.7% compared with the condensing lighting system. Across all tested scenarios, the lowest annual predicted energy and carbon cost is 10,780 SAR and the highest is 33,880 SAR, so combining the most economical options can reduce the total annual energy and carbon cost by 68.2%, as presented in Table 7.
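Each percentage saving in the case study is (highest − lowest) / highest for its scenario group. The short check below recomputes these savings from the annual costs reported for the Dammam case study. The split versus central AC comparison evaluates to 45.9%. The dictionary and function names are illustrative only.

```python
# Lowest and highest annual predicted costs (SAR) per scenario group, as
# reported in the text for the Dammam case study (Table 7).
SCENARIOS = {
    "window (triple vs single glazing)": (10_780, 16_900),
    "wall (precast panel vs clay block)": (18_700, 19_500),
    "air conditioning (split vs central)": (17_100, 31_625),
    "lighting (LED vs condensing)": (23_475, 33_880),
    "overall (best vs worst combination)": (10_780, 33_880),
}

def percent_reduction(low, high):
    """Saving of the cheaper option relative to the more expensive one, in %."""
    return 100.0 * (high - low) / high

for name, (low, high) in SCENARIOS.items():
    print(f"{name}: {percent_reduction(low, high):.1f}%")
# window 36.2%, wall 4.1%, air conditioning 45.9%, lighting 30.7%, overall 68.2%
```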
Hence, the developed multiple regression model can serve as a useful tool for helping homeowners predict and reduce their annual energy and carbon cost. As the tested case study shows, it can support their decisions by helping them select the most economical and environmentally friendly system options.
A limitation of this study is its use of multiple linear regression to model the operational energy and carbon cost. The assessment model was developed from a limited number of collected datasets and can be used only in areas of eastern Saudi Arabia within a similar climate zone. The model is limited to family houses with the following features: a gross area of 150–1000 m², a building age of 1–40 years, 2–16 occupants, and specific wall and window types. This study is also limited to investigating the correlation between the parameters (predictors) and the predicted monthly energy and carbon costs.
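Because a regression model should not be extrapolated beyond the range of its data, a user could check a house against the stated limits before applying the model. The sketch below covers only the three numeric limits listed above. Climate zone and wall and window types would need separate checks. The names are hypothetical, and the one-year age assigned to the "new" Dammam house is an assumption.

```python
# Numeric applicability limits stated for the model; the names and the
# structure of this check are illustrative, not part of the published model.
LIMITS = {
    "gross_area_m2": (150, 1000),
    "building_age_years": (1, 40),
    "occupants": (2, 16),
}

def out_of_range(house):
    """Return the predictors whose values fall outside the stated limits."""
    problems = []
    for name, (lo, hi) in LIMITS.items():
        value = house[name]
        if not lo <= value <= hi:
            problems.append(name)
    return problems

# Dammam case study (500 m2, 7 occupants); an age of 1 year is assumed.
print(out_of_range({"gross_area_m2": 500, "building_age_years": 1, "occupants": 7}))   # []
# A hypothetical larger, older house falls outside two limits.
print(out_of_range({"gross_area_m2": 1200, "building_age_years": 45, "occupants": 7}))
```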

Conclusions
This research proposes a quantitative regression-based model for predicting the monthly energy cost and associated carbon cost of Saudi family houses. The proposed model was developed and verified using actual datasets. The results reveal that the factors most affecting the energy consumption and carbon costs of family houses are a large building area, a central air-conditioning system, the number of window air-conditioning units, the number of users, the age of the building, and the lighting system. The model was tested and validated through mathematical and graphical validation, achieving a high accuracy of 97.9%. The developed regression model can serve as a decision support tool that enables building owners and decision makers in the Ministry of Housing to control the energy consumption, energy cost, and emitted carbon cost of Saudi family houses based on the predictor variables. Future work could extend the current research on energy and carbon cost to other types of buildings at different locations and in different climate zones. Other modeling techniques, such as fuzzy logic, genetic algorithms, and hybrid methods, could also be explored. Further research could incorporate more data to examine the correlation with other parameters, such as the wall system and the thermal insulation type and thickness. Moreover, the model's applicability would be further enhanced by integrating other factors, especially if the model were to be used in other countries. Furthermore, energy prices are influenced by a variety of factors, including inflation; these could be incorporated into the model in the future.
The developed model could also be extended to incorporate a Life Cycle Cost Analysis (LCCA) model, supporting economic analysis aimed at selecting the architectural design with the least cost.