Development and Evaluation of Statistical Models Based on Machine Learning Techniques for Estimating Particulate Matter (PM2.5 and PM10) Concentrations

Despite extensive research on air pollution estimation/prediction, inter-country models for estimating air pollutant concentrations in Southeast Asia have not yet been fully developed and validated owing to the lack of air quality (AQ), emission inventory and meteorological data from different countries in the region. The purpose of this study is to develop and evaluate two machine learning (ML)-based models (i.e., analysis of covariance (ANCOVA) and random forest regression (RFR)) for estimating daily PM2.5 and PM10 concentrations in Brunei Darussalam. These models were first derived from past AQ and meteorological measurements in Singapore and then tested with AQ and meteorological data from Brunei Darussalam. The results show that the ANCOVA model (R2 = 0.94 and RMSE = 0.05 µg/m3 for PM2.5, and R2 = 0.72 and RMSE = 0.09 µg/m3 for PM10) could describe daily PM concentrations over 18 µg/m3 in Brunei Darussalam much better than the RFR model (R2 = 0.92 and RMSE = 0.04 µg/m3 for PM2.5, and R2 = 0.86 and RMSE = 0.08 µg/m3 for PM10). In conclusion, the derived models provide a satisfactory estimation of PM concentrations for both countries despite some limitations. This study shows the potential of the models for inter-country PM estimations in Southeast Asia.


Introduction
Atmospheric air pollution has been a concern globally for decades [1] because it is a major environmental risk to health. In 2016, ambient air pollution was estimated to cause 4.2 million deaths worldwide annually due to stroke, heart disease, lung cancer as well as acute and chronic respiratory diseases, including asthma [2]. Air pollution exposes people to particulate matter (PM) and other air pollutants such as ground-level ozone (O3), nitrogen dioxide (NO2) and sulphur dioxide (SO2). There is strong evidence that these air pollutants affect health [3]. Air pollution can also cause various harmful environmental effects such as global warming, climate change, acid rain, eutrophication, haze, ozone depletion as well as crop and forest damage [4]. According to the World Health Organization (WHO), air pollution in the Southeast Asia region is among the highest in the world [5]. The air quality in Southeast Asian countries, including Brunei Darussalam and Singapore, has been seasonally affected by transboundary smoke haze due to land and forest fires in the region [6][7][8], normally from August to October during the Southwest monsoon period [9]. Precautionary measures to minimize the exposure of individuals to air pollutants could be taken if the air quality and the air pollutant concentrations in the region were known to the community. However, not all countries have long-term regular
1. To generate ML-based ANCOVA and random forest regression (RFR) models from Singapore's air quality and meteorological data for estimating daily PM2.5 and PM10 concentrations in Singapore and to assess the models' estimation performance;
2. To determine the most important explanatory variable that influenced the model outcome;
3. To apply and assess the performance of the derived models for estimating daily PM2.5 and PM10 concentrations in Brunei Darussalam.

Study Areas
The study examines the concentrations of PM in two Southeast Asian countries, namely Brunei Darussalam (4.5353° N, 114.7277° E) and Singapore (1.3521° N, 103.8198° E), as examples. These two countries often encounter transboundary smoke haze events. Brunei Darussalam has a population of 453,600 (as of June 2020) [21] with a total land area of 5765 square kilometers [22] and Singapore has a population of 5.69 million (as of June 2020) [23] with a total land area of 728 square kilometers (as of June 2020) [24]. Both countries have a tropical equatorial climate with warm and uniform air temperatures (mean daily air temperature range: 18 °C to 38 °C in Brunei Darussalam and 24 °C to 32 °C in Singapore), low wind speed (mean wind speed: <0.5 m/s in Brunei Darussalam and <2.5 m/s in Singapore), winds mostly blowing from the south in Brunei Darussalam and from the northeast and the south in Singapore, and heavy rainfall (mean annual rainfall total: >2300 mm in Brunei Darussalam and >2100 mm in Singapore) over the year [25,26]. These meteorological parameters do not show large monthly variations, but they show prominent daily variations because of their strong relation to solar heating [26].
The air quality in the four districts (i.e., Belait, Tutong, Brunei-Muara and Temburong) of Brunei Darussalam was assessed through 4 air quality monitoring stations (1 in each district), and the meteorological monitoring station is located at the Brunei International Airport in Brunei-Muara district (Figure 1). In Singapore, there are 22 air quality monitoring stations (18 monitor general ambient air quality and 4 monitor roadside air quality) installed across its five air quality regions (north, south, east, west and central) [27], and the meteorological monitoring station is located on the rooftop of the Faculty of Engineering building of the National University of Singapore (NUS) (Figure 2) [28].

Data Collection and Preparation
To develop PM estimation models, air quality and meteorological data from Singapore between March 2016 and February 2018 (2 years) were collected. The air quality monitoring data were from five regions (north, south, east, west and central) in Singapore and were downloaded from online data provided by the National Environment Agency (NEA) of Singapore (https://www.haze.gov.sg/resources/pollutant-concentrations; accessed on 15 January 2020). The meteorological data were observed at the National University of Singapore (NUS) weather station and were obtained from a weather portal (https://www.nusurbanclimate.com/weather-portal; accessed on 17 February 2020, courtesy of Professor Matthias Roth, Department of Geography, NUS). The collected air quality monitoring data from Singapore comprise 1-h average hourly PM2.5 concentration (µg/m3), 24-h average hourly PM10 concentration (µg/m3) and air quality condition (good, moderate or unhealthy), and the corresponding meteorological data comprise hourly measurements of air temperature (°C), wind speed (m/s), wind direction (°) and rainfall (mm). Air quality is considered to be in good condition when the Pollutant Standards Index (PSI) value is between 0 and 50, in moderate condition when the PSI is between 51 and 100 and in unhealthy condition when the PSI is between 101 and 200 [29].
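The PSI bands described above can be written as a small lookup. The following is a minimal sketch, not the NEA's implementation; the function name `psi_category` and the choice to return `None` above 200 are assumptions made here for illustration.

```python
def psi_category(psi):
    """Map a PSI value to the air quality condition bands quoted in the text.

    Bands: 0-50 good, 51-100 moderate, 101-200 unhealthy.
    Values above 200 are outside the bands described here, so None is returned
    (an assumption of this sketch).
    """
    if psi < 0:
        raise ValueError("PSI cannot be negative")
    if psi <= 50:
        return "good"
    if psi <= 100:
        return "moderate"
    if psi <= 200:
        return "unhealthy"
    return None
```

Because only good and moderate observations were retained for modelling, a filter such as `psi_category(value) in ("good", "moderate")` would select the rows used in this kind of analysis.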
To test the applicability of the derived PM estimation models to another country in the same region, air quality and meteorological data from Brunei Darussalam between January 2009 and December 2019 (11 years) were collected. The daily air quality monitoring data were from four districts (Belait, Tutong, Brunei-Muara and Temburong) in Brunei Darussalam and were provided by the Department of Environment, Parks and Recreation (JASTRe). The meteorological data were observed at a meteorological station in Brunei-Muara district and were provided by the Brunei Darussalam Meteorological Department (BDMD). The collected air quality monitoring data from Brunei Darussalam include daily average PM10 concentration (µg/m3) and air quality condition (good, moderate or unhealthy), and the meteorological data include daily average measurements of air temperature (°C), wind speed (m/s), wind direction (°) and rainfall (mm). Based on the approach described in the WHO Air Quality Guidelines [30], the (unavailable) daily average PM2.5 concentration (referred to as the 'theoretical' PM2.5 concentration in this study) in Brunei Darussalam was estimated by multiplying the daily average PM10 concentration by the average PM2.5/PM10 factor for Brunei Darussalam, which was 0.43. This average factor was considered close to the typical value of 0.5 for developing country urban areas stated in the WHO Air Quality Guidelines [30]. It was determined from the PM2.5 concentration data reported by the Organisation for Economic Co-operation and Development (OECD) [31] and the corresponding PM10 concentration data provided by JASTRe between 2010 and 2019 for Brunei Darussalam.
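The conversion from observed PM10 to 'theoretical' PM2.5 is a single multiplication. A minimal sketch follows, using the rounded factor of 0.43 quoted above; the names `PM25_TO_PM10_FACTOR` and `theoretical_pm25` are illustrative and not taken from the study.

```python
# Average PM2.5/PM10 factor quoted in the text for Brunei Darussalam (rounded).
PM25_TO_PM10_FACTOR = 0.43


def theoretical_pm25(pm10_daily_mean, factor=PM25_TO_PM10_FACTOR):
    """Estimate daily PM2.5 (ug/m3) from daily mean PM10 (ug/m3) by a fixed ratio."""
    if pm10_daily_mean < 0:
        raise ValueError("concentration cannot be negative")
    return factor * pm10_daily_mean
```

With the rounded factor, a PM10 value of 100.90 µg/m3 gives about 43.39 µg/m3, slightly above the highest theoretical PM2.5 value of 42.93 µg/m3 reported in the data summary, which suggests that an unrounded factor may have been used in the study.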
Time parameters such as day, month, year and monsoon season were also considered as variables affecting the PM concentration in the region. The monsoon seasons of both countries are the north-east (NE) monsoon (from December to March), Inter-monsoon 1 (from April to May), the south-west (SW) monsoon (from June to September) and Inter-monsoon 2 (from October to November) [26]. The collected data were grouped into 716 daily average observations of PM concentrations from five regions in Singapore and 4015 daily average observations of PM concentrations from four districts in Brunei Darussalam, with their corresponding air quality condition, meteorological data and monsoon season, for analysis using XLSTAT software. The data used in this study were restricted to observations during good and moderate conditions because of the limited availability of data during unhealthy air quality conditions in both countries. The models' inputs for both countries consist of 10 explanatory variables, of which 8 were quantitative and 2 were qualitative. For PM2.5 concentration estimation, the models' inputs were day, month, year, monsoon season, daily PM10 concentration, air temperature, wind speed, wind direction, rainfall and air quality condition. For PM10 concentration estimation, the models' inputs were day, month, year, monsoon season, daily PM2.5 concentration, air temperature, wind speed, wind direction, rainfall and air quality condition.
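The monsoon season variable can be derived directly from the calendar month. The sketch below encodes the month ranges listed above; the function name and the label strings are illustrative choices, not those used in the study's XLSTAT workbook.

```python
def monsoon_season(month):
    """Return the monsoon season label for a calendar month (1 = January)."""
    if month in (12, 1, 2, 3):
        return "NE monsoon"
    if month in (4, 5):
        return "Inter-monsoon 1"
    if month in (6, 7, 8, 9):
        return "SW monsoon"
    if month in (10, 11):
        return "Inter-monsoon 2"
    raise ValueError("month must be an integer from 1 to 12")
```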

Machine Learning (ML) Techniques
Based on the nature of the dependent variable Y to be estimated and the nature of the explanatory variables X, two regression models were explored, namely: (i) analysis of covariance (ANCOVA) and (ii) random forest regression (RFR). The data were randomly separated into two samples: 80% of the observations were used for model learning/training and the remaining 20% were used for model validation [32]. First, the two models were trained, validated and tested with air quality and meteorological data from Singapore. Then, the derived models for estimating daily PM2.5 and PM10 concentrations were applied to new observations from Brunei Darussalam. The performance of the ANCOVA and RFR models in estimating daily PM2.5 and PM10 concentrations, for overall air quality as well as for good and moderate air quality conditions during different monsoon seasons in both countries, was evaluated using two statistical indicators: the determination coefficient (R2) and the root mean square error (RMSE).
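The random 80/20 split and the two indicators can be sketched in a few lines. This is an illustration only: the study used XLSTAT, and the formula for R2 below (one minus the ratio of residual to total sum of squares) is an assumption, since the exact definition applied by the software is not stated in the text. All function names are hypothetical.

```python
import math
import random


def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Randomly split rows into learning and validation samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, estimated):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Applied to the 716 Singapore observations, this split would give 573 learning rows and 143 validation rows.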

Analysis of Covariance (ANCOVA) Model
ANCOVA analysis was implemented, considering the interactions between the quantitative and qualitative explanatory variables. The interaction between explanatory variables A and B (also known as an interaction variable) was represented by the notation "A × B", which is the product of the explanatory variables A and B [33]. The maximum interaction level of the model was 2. The stepwise variables selection method with an entry probability of 0.05 and a removal probability of 0.10 was chosen for the model. A multiple comparison test was applied to all factors (qualitative variables including the interactions between qualitative variables) to determine if the parameters for the various qualitative variables of a factor differ significantly or not. The comparisons were made between all pairs of qualitative variables with a control variable based on the mean squared error (MSE) that was associated with an interaction term in the model [32].
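To make the "A × B" notation concrete, the sketch below builds the interaction columns between one quantitative variable and one qualitative variable. It assumes dummy coding against a reference level, which is a common convention but is not confirmed by the text; the function name and example values are illustrative.

```python
def interaction_columns(quantitative, qualitative, reference_level):
    """Build one product column per non-reference level of a qualitative variable.

    Each column equals the quantitative value where the observation has that
    level and 0.0 elsewhere (dummy coding against reference_level is an
    assumption of this sketch).
    """
    levels = sorted(set(qualitative) - {reference_level})
    columns = {}
    for level in levels:
        columns[level] = [
            x * (1.0 if q == level else 0.0)
            for x, q in zip(quantitative, qualitative)
        ]
    return columns


# Example: (PM10 concentration x air quality condition) with "good" as reference.
pm10 = [20.0, 35.0, 40.0]
condition = ["good", "moderate", "moderate"]
pm10_by_condition = interaction_columns(pm10, condition, "good")
```

With a maximum interaction level of 2, only such pairwise products enter the candidate set from which the stepwise procedure selects variables.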

Random Forest Regression (RFR) Model
Estimation models for daily PM2.5 and PM10 concentrations can also be developed by the RFR method with bootstrap aggregating (known as bagging). This method aggregates a collection of predictors, here classification and regression trees (CART), built from different bootstrap samples to obtain a more efficient final predictor. The forest sampling method used in this study was random with replacement. The number of trees in the forest was set to 100 and the maximum tree depth was 20. The performance of the RFR model was evaluated by the mean squared error (MSE) of the validation sample. The importance of a given variable was measured by the mean increase error (MIE) of a tree when the observed values of this variable were randomly exchanged in the out-of-bag (OOB) samples (i.e., data that were not included in the bootstrap samples at each iteration of the forest). The higher the MIE value, the greater the importance of the variable for the model [32].
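The two ingredients of the importance measure, sampling with replacement and the error increase after shuffling one variable, can be illustrated as follows. This is a simplified sketch for a single predictor rather than the XLSTAT procedure: in a forest, the increase would be computed for each tree on its own out-of-bag sample and then averaged. All names are hypothetical.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def mean_squared_error(predict, rows, targets):
    """MSE of a prediction function over rows (each row is a list of values)."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_error_increase(predict, rows, targets, column, rng):
    """Increase in MSE after randomly exchanging the values of one column."""
    baseline = mean_squared_error(predict, rows, targets)
    values = [r[column] for r in rows]
    rng.shuffle(values)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, values)]
    return mean_squared_error(predict, permuted, targets) - baseline
```

A variable that the predictor ignores yields an increase of zero, whereas shuffling an influential variable raises the error, which is why larger values indicate greater importance.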

Data Summary
The descriptive statistics of the measured quantitative variables (i.e., PM2.5 and PM10 concentrations, air temperature, wind speed, wind direction and rainfall) from all monitoring areas in Singapore and Brunei Darussalam are summarized in Table 1. The overall mean daily concentrations of PM in these two countries were within the WHO air quality guideline limits for PM, in which the 24-h mean guideline values are 25 µg/m3 for PM2.5 and 50 µg/m3 for PM10 [30]. No large daily mean variation was observed for air temperature, wind speed and rainfall in these two countries. Due to different geographical locations, the prevailing winds were from the south-southeast direction (15% of the observations) in Singapore and from the north-northeast direction (96% of the observations) in Brunei Darussalam. In the selected time periods, the overall air quality in Singapore was frequently in moderate condition (52% of the observations) and good condition (48% of the observations), whereas the overall air quality in Brunei Darussalam was often in good condition (99% of the observations) and occasionally in moderate condition (1% of the observations).
Figure 3 shows the box plots of observed PM2.5 and PM10 concentrations in Singapore from March 2016 to February 2018 during good and moderate air quality conditions in different monsoon seasons. Daily PM2.5 concentration in Singapore ranged from 4.38 µg/m3 to 17.06 µg/m3 during good air quality conditions and from 7.95 µg/m3 to 63.84 µg/m3 during moderate air quality conditions. The highest daily PM2.5 concentration (63.84 µg/m3) was observed during the SW monsoon season. This was mainly attributed to smoke haze from large-scale forest and peatland biomass burning in Sumatra and Kalimantan (islands in Indonesia) that had been blown by the prevailing southwest winds towards Singapore [34].
The daily PM10 concentration in Singapore ranged from 11.38 µg/m3 to 32.25 µg/m3 during good air quality conditions and from 22.18 µg/m3 to 54.80 µg/m3 during moderate air quality conditions. The highest daily PM10 concentration (54.80 µg/m3) was observed during the NE monsoon season, mainly due to forest, shrubland and grassland biomass burning in Mainland Southeast Asia [35][36][37].
The PM10 concentration did not reach its highest value during the SW monsoon season, when it was lower than the PM2.5 concentration, because of the difference in the types of biomass burnt [38] during the SW and NE monsoon seasons in the region. Generally, PM2.5 emissions were higher during forest and peatland biomass burning, while PM10 emissions were higher during shrubland, crop residue and grassland biomass burning [38,39]. Several outliers (values above Q3 + 1.5 × IQR, where Q3 is the third quartile and IQR is the interquartile range) and extreme outliers (values above Q3 + 3 × IQR) can be seen in Figure 3. This indicates that Singapore experienced high and extreme particulate events that led to increased PM2.5 and PM10 concentrations.
The box plots of theoretical PM2.5 and observed PM10 concentrations in Brunei Darussalam from January 2009 to December 2019 during good and moderate air quality conditions in different monsoon seasons are shown in Figure 4. In Brunei Darussalam, the theoretical daily PM2.5 concentration ranged from 2.45 µg/m3 to 23.19 µg/m3 during good air quality conditions and from 22.13 µg/m3 to 42.93 µg/m3 during moderate air quality conditions. The highest theoretical daily PM2.5 concentration (42.93 µg/m3) was expected during the SW monsoon season as a result of transboundary smoke haze events caused by biomass burning in the region. The observed PM10 concentration in Brunei Darussalam ranged from 5.75 µg/m3 to 54.50 µg/m3 during good air quality conditions and from 52 µg/m3 to 100.90 µg/m3 during moderate air quality conditions. The highest daily PM10 concentration (100.90 µg/m3) was observed during the SW monsoon season due to smoke haze from hotspots in the Borneo region that had been blown by the prevailing southwest winds to Brunei Darussalam [40].
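The outlier and extreme-outlier thresholds used when reading the box plots can be computed from the quartiles. The sketch below is illustrative: it uses the "inclusive" quartile method of Python's `statistics.quantiles`, whereas the quartile convention of the plotting software used in the study is not stated, so threshold values may differ slightly.

```python
import statistics


def upper_outlier_fences(values):
    """Return (outlier_threshold, extreme_threshold) = (Q3 + 1.5*IQR, Q3 + 3*IQR)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q3 + 1.5 * iqr, q3 + 3.0 * iqr


def classify_value(value, fences):
    """Label a value as regular, outlier or extreme outlier (upper side only)."""
    outlier_threshold, extreme_threshold = fences
    if value > extreme_threshold:
        return "extreme outlier"
    if value > outlier_threshold:
        return "outlier"
    return "regular"
```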
Numerous outliers and extreme outliers were seen in Figure 4, indicating that Brunei Darussalam also experienced high and extreme particulate events that contributed to the increase in the concentrations of PM.

Estimation Models for PM2.5 Concentration
Figure 5 shows the scatter plots of estimated daily PM2.5 concentration against observed daily PM2.5 concentration by the ANCOVA and RFR models, based on air quality and meteorological data in Singapore from March 2016 to February 2018, for the learning and validation samples. Evaluation of both models' performances in model learning and validation showed that the ANCOVA model produced better fitting and accuracy (R2 = 0.72 and RMSE = 2.73 µg/m3 for model learning; R2 = 0.81 and RMSE = 2.65 µg/m3 for model validation) than the RFR model (R2 = 0.66 and RMSE = 3.16 µg/m3 for model learning; R2 = 0.73 and RMSE = 3.21 µg/m3 for model validation) in estimating daily PM2.5 concentration in Singapore. Eight variables (i.e., (day × air quality condition), (year × air quality condition), (PM10 concentration × air quality condition), (year × wind speed), (air temperature × wind speed), (air temperature × wind direction), (air temperature × PM10 concentration), (wind direction × PM10 concentration)) were retained in the ANCOVA model when the stepwise variables selection method was employed.
Based on the sum of squares of the errors (SSE) analysis on the ANCOVA model (see Table 2), the variables (day × air quality condition), (year × air quality condition) and (PM10 concentration × air quality condition) bring significant information to explain the variability of the dependent variable PM2.5 concentration. The most influential variable among the explanatory variables was the interaction variable (PM10 concentration × air quality condition) because it has the highest MSE value (171.98 µg/m3) with a relatively low probability associated with the F value (2 × 10^−6) when this variable was removed from the ANCOVA model (Table 2). This can be explained by the fact that PM2.5 is a subset of PM10 and that PM10 is one of the determining factors of air quality condition.
The equation of the best ANCOVA model for estimating daily PM2.5 concentration (µg/m3) with significant explanatory variables is provided in Equation (1):
where D is the day, Y is the year, PM10 represents the observed daily PM10 concentration (µg/m3) and AQ Moderate represents moderate air quality condition. Equation (1) [...]. This shows that the most important variable for estimating daily PM2.5 concentration in Singapore by the RFR model was PM10 concentration.
When the ANCOVA and RFR models were tested using all the observational data in Singapore from March 2016 to February 2018, both models yielded higher accuracy (in terms of RMSE) than when they were trained and validated with the corresponding datasets. During model testing, the ANCOVA model generally showed poorer fitting and accuracy (R2 = 0.75 and RMSE = 0.10 µg/m3) than the RFR model (R2 = 0.89 and RMSE = 0.07 µg/m3) in estimating daily PM2.5 concentration in Singapore (Figure 6). Consequently, the difference between the estimated and observed daily PM2.5 concentrations in Singapore was larger for the ANCOVA model (underestimation by 11% on 48% of the observations and overestimation by 15% on 52% of the observations) than for the RFR model (underestimation by 6% on 47% of the observations and overestimation by 9% on 53% of the observations). The best estimation (i.e., the intersection point between the best fit line/trendline and the diagonal (y = x) line) of PM2.5 concentration in Singapore was at 15.05 µg/m3 for the ANCOVA model and 15.75 µg/m3 for the RFR model. Besides the meteorological parameters, concentration data for other air pollutants (such as carbon monoxide (CO) and nitrogen oxides (NOx)) could be added to the model in the future as explanatory variables to further improve the model performance for estimating daily PM2.5 concentration, since they were found to be associated with PM2.5 concentrations in São Paulo, Brazil for CO [41] and in Fresno, California, USA for NOx [42]. The models' performances during good and moderate air quality conditions in different monsoon seasons in Singapore from March 2016 to February 2018 were also evaluated.
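The "best estimation" point defined above follows from simple algebra: if the trendline of estimated against observed concentration is y = a·x + b, it meets the diagonal y = x at x = b / (1 − a). The sketch below fits the trendline by ordinary least squares, which is an assumption since the fitting method behind the published trendlines is not stated; the function names and the example slope and intercept are illustrative, not values from the study.

```python
def fit_trendline(observed, estimated):
    """Ordinary least-squares line: estimated = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def best_estimation_point(slope, intercept):
    """Concentration at which the trendline crosses the diagonal y = x."""
    if slope == 1.0:
        raise ValueError("trendline is parallel to the diagonal")
    return intercept / (1.0 - slope)
```

For instance, a hypothetical trendline with slope 0.8 and intercept 3.0 crosses the diagonal at 15 µg/m3; below that point such a model overestimates and above it the model underestimates.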
Figure 7 shows that the accuracy of the ANCOVA model was reduced (as indicated by an increase in the RMSE value from 1.78 µg/m3 to 3.13 µg/m3, on average) when the air quality condition changed from good to moderate, despite better data fitting (as indicated by an increase in the R2 value from 0.39 to 0.62, on average). This implies that the ANCOVA model may not be able to handle increased PM2.5 concentration well.
The R2 value of the ANCOVA model for estimating daily PM2.5 concentration in different monsoon seasons in Singapore ranged between 0.20 and 0.61 during good air quality and between 0.50 and 0.78 during moderate air quality (Table 3). The corresponding RMSE value ranged between 1.58 µg/m3 and 1.95 µg/m3 during good air quality and between 2.43 µg/m3 and 4.08 µg/m3 during moderate air quality (Table 3). The highest RMSE value (4.08 µg/m3) was attained during the SW monsoon season when the air quality was moderate, because of the large variation in daily PM2.5 concentration (as indicated by the outliers and extreme outliers in Figure 3a) resulting from the smoke haze events that often occur in this season. The ANCOVA model exhibited the best performance for daily PM2.5 concentration estimation during the NE monsoon season when the air quality was good, with an R2 value of 0.61 and an RMSE value of 1.58 µg/m3.
Table 3. Comparison of determination coefficient (R2) and root mean square of the errors (RMSE) of the ANCOVA and RFR models for estimating daily PM2.5 and PM10
Figure 8 shows the scatter plots of observed and estimated daily PM2.5 concentrations by the RFR model for good and moderate air quality conditions during different monsoon seasons in Singapore from March 2016 to February 2018. The accuracy of the RFR model was reduced (as indicated by an increase in the RMSE value from 1.03 µg/m3 to 2.29 µg/m3, on average) and the data fitting was slightly affected (as indicated by a very small decrease in the R2 value from 0.81 to 0.80, on average) when the air quality condition changed from good to moderate. This implies that the RFR model may have a limitation in handling increased PM2.5 concentration.
The R2 value of the RFR model for estimating daily PM2.5 concentration in different monsoon seasons in Singapore ranged between 0.74 and 0.88 during good air quality and between 0.72 and 0.87 during moderate air quality. The corresponding RMSE value ranged between 0.96 µg/m3 and 1.13 µg/m3 during good air quality and between 2.05 µg/m3 and 2.85 µg/m3 during moderate air quality. Comparison of the ANCOVA and RFR models' performance for daily PM2.5 concentration estimation during good and moderate air quality in different monsoon seasons in Singapore showed that the RFR model was more accurate with better data fitting (R2 = 0.81 and RMSE = 1.66 µg/m3, on average) than the ANCOVA model (R2 = 0.50 and RMSE = 2.46 µg/m3, on average) (Table 3).

Estimation Models for PM10 Concentration
Scatter plots of estimated daily PM10 concentration against observed daily PM10 concentration by the ANCOVA and RFR models, based on air quality and meteorological data in Singapore from March 2016 to February 2018, for the learning and validation samples are illustrated in Figure 9. Both models' performances in model learning and validation were evaluated. The results show that the ANCOVA model has comparable data fitting and accuracy (R2 = 0.79 and RMSE = 2.82 µg/m3) to the RFR model (R2 = 0.80 and RMSE = 2.87 µg/m3) in model learning, and that the RFR model has comparable data fitting and accuracy (R2 = 0.83 and RMSE = 3.29 µg/m3) to the ANCOVA model (R2 = 0.82 and RMSE = 3.71 µg/m3) in model validation. Using the stepwise variables selection method, ten variables (i.e., (year × wind speed), (year × PM2.5 concentration), (air temperature × PM2.5 concentration), (wind speed × PM2.5 concentration), (wind speed × air quality condition), (wind direction × PM2.5 concentration), (wind direction × air quality condition), (PM2.5 concentration × monsoon season), (year × air temperature), and (air temperature × wind speed)) were retained in the ANCOVA model.

The results of the sum of squares of the errors (SSE) analysis on the ANCOVA model (Table 4) indicate that the variables (year × wind speed), (year × PM2.5 concentration), (air temperature × PM2.5 concentration), (wind speed × PM2.5 concentration), (wind speed × air quality condition), (wind direction × PM2.5 concentration), (wind direction × air quality condition) and (PM2.5 concentration × monsoon season) bring significant information to explain the variability of the dependent variable PM10 concentration. Among the explanatory variables, the interaction variable (wind direction × air quality condition) was the most influential because it has the lowest probability associated with the F value (3.58 × 10^−6), the highest SSE value (204.49 µg/m3) and a relatively high MSE value (102.25 µg/m3) when this variable was removed from the ANCOVA model (see Table 4).
This could be due to the effect of wind direction on PM 10 concentration in the area [43] and the fact that PM 10 is one of the determining factors of air quality condition. The equation of the best ANCOVA model for estimating daily PM 10 concentration (µg/m 3 ) with significant explanatory variables is provided in Equation (2): where Y is the year, T is the air temperature ( . This shows that air quality condition was the most important variable of RFR model for daily PM 10 concentration estimation in Singapore. The ANCOVA and RFR models for daily PM 10 concentration estimation were tested on all the observational data in Singapore from March 2016 to February 2018 and the results show improvement in accuracy (in terms of RMSE) for both models and data fitting, particularly for the RFR model, compared to model learning and validation. The best model for overall estimation of PM 10 concentration in Singapore was the RFR model as it showed better data fitting and higher accuracy (R 2 = 0.93 and RMSE = 0.07 µg/m 3 ) than the ANCOVA model (R 2 = 0.81 and RMSE = 0.11 µg/m 3 ) (Figure 10). It was found that the ANCOVA model underestimated daily PM 10 concentration in Singapore by 8% (48% of the observations) and overestimated by 9% (52% of the observations) whereas the RFR model underestimated daily PM 10 concentration in Singapore by 4% (46% of the observations) and overestimated by 5% (54% of the observations). The best estimation of PM 10 concentration in Singapore was at 25.32 µg/m 3 for the ANCOVA model and 25.92 µg/m 3 for the RFR model ( Figure 10). Both ANCOVA and RFR models developed in this study for daily PM 10 concentration estimation outperformed the multiple non-linear regression (MNLR) model (R 2 = 0.36 and RMSE = 20.30 µg/m 3 , on average) for estimating daily PM 10 concentration in three cities (Budapest, Miskolc and Pécs) in Hungary [44]. 
This could be due to more explanatory variables being used in the model development of this study compared to their study, which only has three explanatory variables (i.e., temperature, wind speed and boundary layer height).
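The under- and overestimation figures quoted for the two models pair a mean percentage error with the share of observations it applies to. The text does not state the exact formula, so the following is only a minimal sketch under the assumption that each figure is the mean relative error within the group of under- or overestimated days; the function name and the sample values are hypothetical.

```python
def bias_summary(observed, estimated):
    """Summarise under- and overestimation as (mean % error, % of observations).

    Assumes every observed value is non-zero. The definition used here is an
    illustrative assumption, not a formula taken from the study.
    """
    under, over = [], []
    for obs, est in zip(observed, estimated):
        rel = (est - obs) * 100.0 / obs  # signed relative error in percent
        if rel < 0:
            under.append(-rel)
        elif rel > 0:
            over.append(rel)
    n = len(observed)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "under_mean_pct": mean(under),
        "under_share_pct": 100.0 * len(under) / n,
        "over_mean_pct": mean(over),
        "over_share_pct": 100.0 * len(over) / n,
    }


# Made-up daily concentrations (µg/m3), for illustration only.
print(bias_summary([10.0, 20.0, 40.0, 50.0], [9.0, 22.0, 36.0, 55.0]))
```

Days with an exact match fall into neither group, so the two shares need not sum to 100%.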
Figure 11 shows the observed and estimated daily PM10 concentrations by the ANCOVA model for good and moderate air quality conditions during different monsoon seasons in Singapore from March 2016 to February 2018. Although the model has better data fitting (as indicated by an increment in the R2 value from 0.39 to 0.62, on average), its accuracy was reduced (as indicated by an increment in the RMSE value from 2.24 µg/m3 to 3.42 µg/m3, on average) when the air quality condition changed from good to moderate. This shows that the ANCOVA model may not handle increased PM10 concentration well, as was also the case for PM2.5 concentration (see Section 3.2). A possible reason for this could be the low interaction level of the ANCOVA model used in this study, which may not be enough to account for the confounding effects between the input parameters of the model and so to describe the complex reality of atmospheric pollution. Therefore, a higher interaction level is suggested for the ANCOVA model in future studies.
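To make the notion of an interaction variable concrete, the sketch below builds two such terms for a single daily record. It assumes that "interaction level" refers to the order of the product terms included in the model, and it uses hypothetical dictionary keys (`wind_speed`, `pm25`, `condition`) and a simple 0/1 dummy coding against a `good` reference level; the study does not specify how its software encodes these terms.

```python
def add_interactions(record, reference_condition="good"):
    """Return a copy of one daily record with two illustrative interaction terms.

    Keys and dummy coding are assumptions made for this example only.
    """
    out = dict(record)
    # Quantitative x quantitative: the product of the two covariates.
    out["wind_speed_x_pm25"] = record["wind_speed"] * record["pm25"]
    # Quantitative x categorical: the covariate multiplied by a 0/1 dummy for
    # the non-reference level, so the slope may differ between conditions.
    is_moderate = 0.0 if record["condition"] == reference_condition else 1.0
    out["wind_speed_x_moderate"] = record["wind_speed"] * is_moderate
    return out


day = {"wind_speed": 2.0, "pm25": 15.0, "condition": "moderate"}  # made-up values
print(add_interactions(day))
```

A higher interaction level would add products of three or more inputs in the same way, at the cost of many more coefficients to estimate.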
As shown in Table 3, the ANCOVA model for estimating daily PM10 concentration in different monsoon seasons in Singapore has R2 values ranging between 0.18 and 0.63 during good air quality and between 0.53 and 0.76 during moderate air quality, and RMSE values ranging between 1.93 µg/m3 and 2.65 µg/m3 during good air quality and between 2.86 µg/m3 and 3.98 µg/m3 during moderate air quality. The highest RMSE value (3.98 µg/m3) was attained during moderate air quality in the inter-monsoon 1 season, and this could be due to large variation in daily PM10 concentration (as indicated by the outliers in Figure 4b) as a result of the prolonged smoke haze event from the NE monsoon season, which limited the dispersion of PM10 by the meteorological parameters (for example, wind speed, wind direction and rainfall) [45]. The ANCOVA model for daily PM10 concentration estimation in Singapore showed the best performance during good air quality in the SW monsoon because it has the highest estimation accuracy, with an RMSE value of 1.93 µg/m3, although it has a low R2 value of 0.38.
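The combination of a low R2 with a low RMSE, and of a rising R2 with a rising RMSE, follows from how the two metrics are defined: RMSE is an absolute error, whereas R2 is measured relative to the spread of the observations. The sketch below illustrates this with made-up numbers. It assumes R2 is the coefficient of determination, 1 − SSE/SST; the study may instead use a squared correlation, which would give different values.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def r_squared(observed, estimated):
    """Coefficient of determination, 1 - SSE/SST (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical subsets: a narrow concentration range with small errors, and a
# wide concentration range with larger errors.
good_obs, good_est = [10, 11, 12, 13, 14], [11, 10, 13, 12, 15]
mod_obs, mod_est = [20, 30, 40, 50, 60], [23, 27, 43, 47, 63]

print(rmse(good_obs, good_est), r_squared(good_obs, good_est))
print(rmse(mod_obs, mod_est), r_squared(mod_obs, mod_est))
```

In this toy case the wide-range subset has the larger RMSE and also the larger R2, because its errors are small relative to the variability of the observations.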
Scatter plots of the observed and estimated daily PM10 concentrations by the RFR model for good and moderate air quality conditions during different monsoon seasons in Singapore from March 2016 to February 2018 are shown in Figure 12. As the air quality changed from good to moderate, the data fitting of the RFR model improved (as indicated by an increment in the R2 value from 0.84 to 0.86, on average) but the accuracy of the model was reduced (as indicated by an increment in the RMSE value from 1.15 µg/m3 to 2.21 µg/m3, on average). The accuracy of the RFR model can possibly be increased by increasing the number of trees in future studies. The RFR model for daily PM10 concentration estimation in different monsoon seasons in Singapore has an R2 value ranging between 0.76 and 0.90 during good air quality and between 0.84 and 0.87 during moderate air quality, with an RMSE value ranging between 1.00 µg/m3 and 1.45 µg/m3 during good air quality and between 2.03 µg/m3 and 2.59 µg/m3 during moderate air quality. The performances of the ANCOVA and RFR models for daily PM10 concentration estimation during good and moderate air quality in different monsoon seasons in Singapore are compared in Table 3, and the results show that the RFR model was more accurate with better data fitting (R2 = 0.85 and RMSE = 1.68 µg/m3, on average) than the ANCOVA model (R2 = 0.50 and RMSE = 2.83 µg/m3, on average).
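How much extra trees might help can be gauged from a standard result for the average of n identically distributed estimators with common standard deviation σ and pairwise correlation ρ, whose variance is ρσ² + (1 − ρ)σ²/n. The sketch below evaluates that expression; the values of σ and ρ are hypothetical and are not estimated from the study's forests.

```python
import math


def ensemble_std(sigma, rho, n_trees):
    """Standard deviation of the mean of n_trees equally correlated predictions.

    sigma: standard deviation of a single tree's prediction (assumed value)
    rho:   pairwise correlation between trees (assumed value)
    """
    variance = rho * sigma ** 2 + (1.0 - rho) * sigma ** 2 / n_trees
    return math.sqrt(variance)


for n in (1, 10, 100, 1000):
    print(n, round(ensemble_std(2.0, 0.25, n), 3))
```

Under these assumptions the spread falls quickly at first and then levels off near sqrt(ρ)·σ, so adding trees reduces only the variance part of the error and yields diminishing returns once the forest is large.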

Application of the Derived Models for Estimating PM2.5 and PM10 Concentrations in Brunei Darussalam
To test the applicability of the derived models as cross-country models for estimating PM2.5 and PM10 concentrations in Southeast Asia, they were tested with air quality and meteorological data from Brunei Darussalam from January 2009 to December 2019 as an example. The results for PM2.5 and PM10 concentration estimation in Brunei Darussalam are provided and discussed separately in the following sections.

Estimation of PM2.5 Concentration in Brunei Darussalam
(1 year) in Taiwan in 2020 [47]. These could be attributed to more homogeneous topography (mainly basins) in Singapore and Brunei Darussalam compared to complicated topography in Iran and Taiwan (with mountainous layouts), leading to the simpler dispersal of air pollutants.
A stagnant pattern of PM2.5 concentration at theoretical values beyond 18 µg/m3 was seen in Figure 13b, and this shows that the derived RFR model was not able to accurately estimate daily PM2.5 concentration in Brunei Darussalam beyond this concentration. This could be due to insufficient explanatory variables [21,48] to describe the increase in PM2.5 concentration in Brunei Darussalam that could be attributed to the occurrence of smoke haze. A possible explanatory variable that could be added to the proposed models is wildfire information, because intense wildfires can lead to heatwaves and high PM2.5 concentrations that may be transported thousands of kilometers away from their source areas [48,49], affecting the air quality in nearby regions. For example, long-range transported PM pollution episodes caused by wildfires in eastern Europe (Russia, Belarus, Ukraine and the Baltic countries) are common in Finland [50]. The study found that the ANCOVA model underestimated the daily PM2.5 concentration in Brunei Darussalam by 17% (17% of the observations) and overestimated it by 32% (83% of the observations). Meanwhile, the RFR model tended to overestimate the daily PM2.5 concentration in Brunei Darussalam by 30% (99.7% of the observations) and underestimated it by 8% (0.3% of the observations). The best estimation for PM2.5 concentration in Brunei Darussalam was at 4.08 µg/m3 for the ANCOVA model, but that could not be determined for the RFR model (Figure 13).
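A plateau of this kind is also consistent with a general property of tree-based regressors: each leaf predicts an average of training targets, so no prediction can exceed the largest target seen in training. This is offered as an illustration only, not as the authors' explanation. The minimal sketch below fits a single one-dimensional regression tree (not a full random forest) to made-up data in which the target equals the input up to 18, and then queries it beyond that range; all function names and values are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def _build(pairs, min_leaf):
    """Recursively split sorted (x, y) pairs where the squared error drops most."""
    ys = [y for _, y in pairs]
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = _sse(ys[:i]) + _sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None or best[0] >= _sse(ys):
        return {"value": _mean(ys)}  # leaf: average of its training targets
    i = best[1]
    return {
        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2.0,
        "left": _build(pairs[:i], min_leaf),
        "right": _build(pairs[i:], min_leaf),
    }


def fit_tree(xs, ys, min_leaf=2):
    return _build(sorted(zip(xs, ys)), min_leaf)


def predict(tree, x):
    while "value" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]


xs = [float(v) for v in range(1, 19)]  # training inputs 1..18
ys = list(xs)                          # training targets equal the inputs
tree = fit_tree(xs, ys)
for x in (18.0, 25.0, 40.0):
    print(x, predict(tree, x))
```

Every query at or above the largest training input falls into the same right-most leaf, so the estimate stays flat however high the true value is. A forest averages many such trees and inherits the same ceiling.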
There are no scatter plots presented in this section to show the estimation results of the derived ANCOVA and RFR models for moderate air quality in the NE and both inter-monsoon seasons in Brunei Darussalam because of the limited availability of theoretical PM2.5 and observed PM10 concentration data during these air quality conditions and monsoon seasons. The estimation results of the derived ANCOVA model for daily PM2.5 concentration for good and moderate air quality conditions during the other monsoon seasons in Brunei Darussalam from January 2009 to December 2019 are shown in Figure 14. When the air quality changed from good to moderate, the accuracy of the model worsened (as indicated by a significant increment in the RMSE value from 2.88 µg/m3 to 13.24 µg/m3, on average) despite improvement in data fitting (as indicated by an increment in the R2 value from 0.85 to 0.96, on average). This could possibly be due to the inadequate interaction level among the model's input parameters being accounted for by the derived ANCOVA model. Table 5 shows that the R2 value of the ANCOVA model for estimating daily PM2.5 concentration in different monsoon seasons in Brunei Darussalam ranged between 0.78 and 0.94 during good air quality and was 0.96 during moderate air quality, and the RMSE value ranged between 2.49 µg/m3 and 3.76 µg/m3 during good air quality and was 13.24 µg/m3 during moderate air quality. The highest RMSE value (13.24 µg/m3) was obtained during moderate air quality in the SW monsoon season, and this could be due to the large variation in daily PM2.5 concentration (see Figure 4a) that was likely caused by the smoke haze event occurring in this season. The ANCOVA model for daily PM2.5 concentration in Brunei Darussalam showed the best performance during the NE monsoon season at good air quality condition, with the highest estimation accuracy (RMSE = 2.49 µg/m3) despite a trade-off with the fitting of the data (as indicated by the lowest R2 value of 0.78).
Figure 15 shows the estimation results of the derived RFR model for daily PM2.5 concentration for good and moderate air quality conditions during different monsoon seasons in Brunei Darussalam from January 2009 to December 2019. The data fitting of the RFR model worsened (as indicated by a major drop in the R2 value from 0.92 to 0.07, on average) and the accuracy of the model decreased (as indicated by an increment in the RMSE value from 2.34 µg/m3 to 6.55 µg/m3, on average) when the air quality changed from good to moderate. The R2 value of the RFR model for estimating daily PM2.5 concentration in different monsoon seasons in Brunei Darussalam ranged between 0.90 and 0.94 during good air quality and was 0.07 during moderate air quality, and the RMSE value of the RFR model ranged between 1.99 µg/m3 and 3.04 µg/m3 during good air quality and was 6.55 µg/m3 during moderate air quality.
The performances of both models for daily PM2.5 concentration estimation during good and moderate air quality conditions in different monsoon seasons in Brunei Darussalam are compared in Table 5, and the results show that the derived ANCOVA model generally fitted the data better, although it has a slightly lower accuracy (R2 = 0.87 and RMSE = 4.95 µg/m3, on average) than the RFR model (R2 = 0.75 and RMSE = 3.18 µg/m3, on average). However, both derived models show limitations in handling high PM2.5 concentrations when tested on the datasets from both countries; therefore, further studies to improve these models are necessary before they can be used as cross-country models.

Estimation of PM10 Concentration in Brunei Darussalam
The testing results of the derived ANCOVA and RFR models to estimate daily PM10 concentrations in Brunei Darussalam from January 2009 to December 2019 are presented in Figure 16. It was seen that the RFR model has data fitting and accuracy (R2 = 0.86 and RMSE = 0.08 µg/m3) comparable to those of the ANCOVA model (R2 = 0.723 and RMSE = 0.09 µg/m3), although the RFR model could not handle observed PM10 concentration over 18 µg/m3 (as indicated by the stagnant estimated PM10 concentrations in Figure 16b). Additional input parameters such as vehicular traffic and forest fire information need to be considered to improve the estimation performance of the derived models. Due to the low RMSE value of both derived models, the ANCOVA model underestimated and overestimated the daily PM10 concentration in Brunei Darussalam by 1% (50% of the observations), and the RFR model underestimated it by 9% (54% of the observations) and overestimated it by 14% (46% of the observations). The best estimation of PM10 concentration in Brunei Darussalam was at 18.94 µg/m3 for the ANCOVA model and 16.40 µg/m3 for the RFR model (Figure 16).
(7 years) in Sarawak and Peninsular Malaysia in 2020 [51]. This could be due to the spatial and temporal variations of PM emissions from other major air pollution sources such as motor vehicles and industrial activities as a result of different developments among countries in Southeast Asia [52]. For example, in Singapore, motor vehicles account for about 50% of the total PM2.5 emissions [53] and about 60% of the estimated total ground transportation PM emissions [54]. In Malaysia, motor vehicles contributed to about 17% of the total PM emissions in 2010 [55], and the highest mean monthly PM10 concentration (68.79 µg/m3) between 1997 and 2015 was recorded in Port Klang, Malaysia [56] due to high traffic volume and proportion of diesel vehicles [57].
Figure 17 shows the observed and estimated daily PM10 concentrations by the ANCOVA model for good and moderate air quality conditions during different monsoon seasons in Brunei Darussalam from January 2009 to December 2019. It can be seen that the accuracy of the derived ANCOVA model was reduced (as indicated by an increment in the RMSE value from 19.06 µg/m3 to 32.55 µg/m3, on average) despite better data fitting (as indicated by an increment in the R2 value from 0.56 to 0.78, on average) when the air quality changed from good to moderate. This may imply that the confounding effects between the input parameters of the model were not well described by the model. To overcome this, the interaction level of the ANCOVA model should be increased in future studies. Table 5 shows that the derived ANCOVA model for estimating daily PM10 concentration in different monsoon seasons in Brunei Darussalam has an R2 value ranging between 0.49 and 0.68 during good air quality and of 0.78 during moderate air quality, with an RMSE value ranging between 3.70 µg/m3 and 6.16 µg/m3 during good air quality and of 32.55 µg/m3 during moderate air quality. The highest RMSE value (32.55 µg/m3) was obtained during moderate air quality in the SW monsoon season due to the large variation in daily PM10 concentration (see Figure 4b) that could have resulted from the occurrence of transported smoke haze in the region during this season.
The performance of the derived RFR model for estimating daily PM10 concentration for good and moderate air quality conditions during different monsoon seasons in Brunei Darussalam from January 2009 to December 2019 is shown in Figure 18. As the air quality became moderate, the data fitting and accuracy of the RFR model were reduced (as indicated by the drop in the R2 value from 0.83 to 0.69, on average, and the increase in the RMSE value from 3.12 µg/m3 to 30.55 µg/m3, on average). This showed that the derived RFR model could not describe the increase in PM10 concentration well. When the derived RFR model was used to estimate daily PM10 concentration in different monsoon seasons in Brunei Darussalam, it gave an R2 value between 0.80 and 0.87 during good air quality and of 0.69 during moderate air quality, with an RMSE value between 2.13 µg/m3 and 5.17 µg/m3 during good air quality and of 30.55 µg/m3 during moderate air quality.
Although the derived ANCOVA model generally fits the data less well and yields lower accuracy (R2 = 0.60 and RMSE = 10.32 µg/m3, on average) than the RFR model (R2 = 0.80 and RMSE = 8.61 µg/m3, on average) (Table 5), it can handle increased PM10 concentration much better than the RFR model, particularly during moderate air quality in the SW monsoon season.

Conclusions
This study explores the potential to estimate daily PM2.5 and PM10 concentrations in Brunei Darussalam using ML-based statistical models (such as ANCOVA and RFR) derived from air quality and meteorological data in Singapore with statistical assessments. The most influential explanatory variables for estimating PM2.5 and PM10 concentrations in Singapore by the ANCOVA model were the interaction variables (PM10 concentration × air quality condition) and (wind direction × air quality condition), respectively. Meanwhile, the most important variables for estimating daily PM2.5 and PM10 concentrations in Singapore by the RFR model were PM10 concentration and air quality condition, respectively. Both the ANCOVA (R2 = 0.75 and RMSE = 0.10 µg/m3 for PM2.5, and R2 = 0.81 and RMSE = 0.11 µg/m3 for PM10) and RFR models (R2 = 0.89 and RMSE = 0.07 µg/m3 for PM2.5, and R2 = 0.93 and RMSE = 0.07 µg/m3 for PM10) performed well when used to estimate daily PM2.5 and PM10 concentrations in Singapore. When these derived models were tested with air quality and meteorological data from Brunei Darussalam to estimate its daily PM2.5 and PM10 concentrations, the ANCOVA model seemed to perform better (R2 = 0.94 and RMSE = 0.05 µg/m3 for PM2.5, and R2 = 0.72 and RMSE = 0.09 µg/m3 for PM10) than the RFR model (R2 = 0.92 and RMSE = 0.04 µg/m3 for PM2.5, and R2 = 0.86 and RMSE = 0.08 µg/m3 for PM10) because it can describe PM concentrations over 18 µg/m3 better than the RFR model.
The limitations of the models in this work are due to insufficient data for unique incidents (for example, serious smoke haze leading to moderate or unhealthy air quality conditions), the low interaction level among the ANCOVA model's input parameters used in the model development, the low number of trees used when developing the RFR model and insufficient explanatory variables relating to atmospheric PM pollution (for example, vehicular traffic and forest fire information). These estimation models for PM concentrations can be improved further in future studies by including more data recorded during moderate and/or unhealthy air quality conditions, increasing the interaction level of the ANCOVA model, increasing the number of trees for the RFR model, performing cross-validation on the datasets and including major domestic anthropogenic emissions such as vehicular traffic and/or forest fire information as explanatory variables. Overall, the study has demonstrated the potential of applying the models as cross-country models in the Southeast Asia region, although more measured PM2.5 concentration data from Brunei Darussalam are needed to test the accuracy of the models, and more experiments on fine-tuning the models' parameters are needed to improve model performance and the capability to handle higher PM concentrations in the region.
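As a pointer for the cross-validation step suggested above, the following is a minimal sketch of a k-fold split over daily records. It is not a protocol from this study: the function name is hypothetical, and the choice of contiguous folds (so that consecutive days stay together) is an assumption made here for illustration.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs. Fold sizes differ
    by at most one, and every index is used for testing exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Each model would be refitted on the training indices of a fold and scored on its test indices, and the k scores would then be averaged.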