Performance Evaluation of CCAM-CTM Regional Airshed Modelling for the New South Wales Greater Metropolitan Region

A comprehensive evaluation of the performance of the coupled Conformal Cubic Atmospheric Model (CCAM) and Chemical Transport Model (CTM) (CCAM-CTM) for the New South Wales Greater Metropolitan Region (NSW GMR) was conducted based on modelling results for two periods coinciding with measurement campaigns undertaken during the Sydney Particle Study (SPS), namely the summer of 2011 (SPS1) and the autumn of 2012 (SPS2). The model performance was evaluated for fine particulate matter (PM2.5), ozone (O3) and nitrogen dioxide (NO2) against air quality data from the NSW Government’s air quality monitoring network, and PM2.5 components were compared with speciated PM measurements from the Sydney Particle Study’s Westmead sampling site. The model tends to overpredict PM2.5, with a normalised mean bias (NMB) of less than 20%; however, moderate underpredictions of the daily peak are found on high PM2.5 days. The PM2.5 predictions at all sites comply with the performance criteria for mean fractional bias (MFB) of ±60%, but only the PM2.5 predictions at Earlwood further comply with the performance goal for MFB of ±30% during both periods. The model generally captures the diurnal variations in ozone with a slight underestimation. The model also tends to underpredict daily maximum hourly ozone. Ozone predictions across regions in SPS1, as well as in the Sydney East, Sydney Northwest and Illawarra regions in SPS2, comply with the benchmark for MFB of ±15%; however, none of the regions comply with the benchmark for mean fractional error (MFE) of 35%. The model reproduces the diurnal variations and magnitudes of NO2 well, with a slight tendency to underestimate across the regions. The MFE and normalised mean error (NME) for NO2 predictions fall well within the ranges inferred from other studies. Model results are within a factor of two of measured averages for sulphate, nitrate, sodium and organic matter, with elemental carbon, chloride, magnesium and ammonium being underpredicted.
The overall performance of the CCAM-CTM modelling system for the NSW GMR is comparable to similar model predictions by other regional airshed models documented in the literature. The performance of the modelling system is found to vary against the benchmark criteria and to depend on the location of the sites, as well as the time of the year. The benchmarking of the CCAM-CTM modelling system supports the application of this model for air quality impact assessment and policy scenario modelling to inform air quality management in NSW.


Introduction
Air quality in Sydney is comparable with other cities in Australia and good by world standards. However, particle pollution and ozone concentrations occasionally exceed national air quality standards. The New South Wales Office of Environment and Heritage (NSW OEH) provides timely information to local communities on current air quality, based on an air quality monitoring network with 43 stations (https://www.environment.nsw.gov.au/topics/air/monitoring-air-quality). This monitoring network delivers robust, continuous air pollution concentration data for stations sited to characterise regional air quality, whereas regional airshed modelling can be used to project high-resolution spatial variations in air pollution concentrations and to forecast air quality.
Regional airshed modelling is a powerful and widely applied tool for air quality management (e.g., [1]). It plays a key role in the development and implementation of air quality regulations. For example, the Weather Research and Forecasting model coupled with Chemistry (WRF/Chem) has been implemented to evaluate the impact of current and possible future vehicle emissions from China on Asian air quality [2]. WRF/Chem has also been used to assess air quality impacts due to decreases in NOx emissions from eastern U.S. power plants and their effects on regional ozone [3]. The Community Multi-scale Air Quality (CMAQ) model has been utilized to simulate regional PM2.5 concentrations in the UK for 2020, with such modelling able to answer air policy questions [4]. The CMAQ model has also been applied in China to study air quality improvements to be achieved by the introduction of more stringent emission standards for power plants [5]. Over the last decade, regional air quality models have matured, becoming an indispensable component of air quality forecasting (e.g., [6,7]). They are also being used for population air pollution exposure estimation and related health impact assessments (e.g., [8,9]).
Regional airshed modelling efforts in NSW have focused on simulating ozone photochemistry and particle formation and removal processes, because these measured air pollutants exceed national air quality standards from time to time. Photochemical modelling applied within NSW to inform regional air quality management started with the California Institute of Technology (CIT) airshed model (1993-2000). After 2001, modelling work moved to a coupled system comprising the prognostic meteorological model embedded in The Air Pollution Model (TAPM) and a chemical transport model (CTM), the TAPM-CTM modelling system [10]. The TAPM-CTM was originally developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) for use with the Australian Air Quality Forecasting System [7]. More recently, as a part of the NSW Government commissioned Sydney Particle Study (SPS), CSIRO demonstrated the applicability of the coupled Conformal Cubic Atmospheric Model (CCAM) and a CTM (hereafter, the CCAM-CTM). The CCAM-CTM allows for the modelling of continental scale emissions, thus improving projections of background air pollution concentrations, and it was used to perform regional air quality modelling in the NSW Greater Metropolitan Region (GMR) to investigate the characteristics of particles in Sydney, Australia [11]. With the applicability of the CCAM-CTM partially demonstrated in the Sydney Particle Study, this coupled model has been selected for further performance evaluation by NSW OEH, with the aim of supporting robust regional airshed modelling of ozone, primary and secondary particles and other air pollutants in NSW to inform air policies and programs and provide information to communities.
Performance evaluations and benchmarking of regional airshed models have received increasing attention due to the use of these models to support policy development and decision making. Many model evaluation guidance documents have been published by international policy-making jurisdictions, e.g., the United States Environmental Protection Agency (US EPA) published their Guidance on the Use of Models and Other Analyses for Demonstrating Attainment of Air Quality Goals for Ozone, PM2.5, and Regional Haze in 2007 [12]. The US EPA also introduced a framework for rigorously evaluating regional-scale numerical photochemical modelling systems for determining the suitability of a modelling system for a given application, as well as analysing the impacts of regulatory policy options [13]. In the UK, the Department of Environment, Food and Rural Affairs (DEFRA) provides basic advice and presents critical steps for air quality model evaluation [14]. The European Environment Agency (EEA) has outlined the procedures for model quality assurance, evaluation and validation in their technical reference guide [15]. Within the framework of the Forum for Air Quality Modelling in Europe (FAIRMODE), the DELTA tool and benchmarking service for air quality models was developed and reported [16]. More model performance evaluation studies can be found within the international research community and cross-agency organizations. For example, the Air Quality Model Evaluation International Initiative (AQMEII) was set up to provide a forum for the advancement of model evaluation methods for regional-scale air quality models (e.g., [17,18]).
In this study, the CCAM-CTM modelling system was used to simulate air quality within the NSW GMR for two periods coinciding with intensive measurement campaigns undertaken during the Sydney Particle Study [11]. Meteorological simulations and model predictions for the criteria pollutants fine particulate matter (PM2.5), ozone (O3) and nitrogen dioxide (NO2) were assessed based on meteorological and air quality measurements from the NSW OEH air quality monitoring network and the Australian Bureau of Meteorology's (BOM) automatic weather station network. Model predictions for PM2.5 components were compared to speciated PM measurements from the Westmead sampling site (26 km to the west of the Sydney central business district) established for the Sydney Particle Study. The benchmarking of the CCAM-CTM system for the NSW GMR, through reference to the performance of regional airshed modelling documented in the literature, was undertaken to support future applications of this advanced modelling system to provide robust evidence and assist in the design of NSW air policies and programs.

Modelling System
The modelling system implemented in this study consists of a meteorology module (CCAM), an emission module, and a chemical transport module (CTM). The schematic diagram of the modelling system is illustrated in Figure 1. CCAM-CTM modelling was undertaken using four nested domains, comprising the outermost Australian domain (AUS) at 80-km × 80-km resolution (75 × 65 grid cells), the New South Wales domain (NSW) at 27-km × 27-km resolution (60 × 60 grid cells), the Greater Metropolitan Region domain (GMR) at 9-km × 9-km resolution (60 × 60 grid cells) and the innermost Sydney domain (SYD) at 3-km × 3-km resolution (60 × 60 grid cells). The model domain configuration is shown in Figure 2. CCAM-CTM was run for two periods coinciding with the Sydney Particle Study measurement campaigns, which were Stage I (hereafter, SPS1) in the summer (from 5 February to 7 March 2011) and Stage II (hereafter, SPS2) in the autumn (from 16 April to 14 May 2012), plus a three-day spin-up prior to each experiment.
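As an illustration only, the nested-domain configuration and simulation periods described above can be written down as a small Python sketch. The grid spacings, cell counts and campaign dates are taken from the text; the class and function names (`Domain`, `run_start`, `campaign_days`) are hypothetical, and the sketch assumes that the three-day spin-up immediately precedes each campaign and that the campaign end dates are inclusive.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Domain:
    name: str
    resolution_km: float  # horizontal grid spacing
    nx: int               # first cell count as quoted in the text
    ny: int               # second cell count as quoted in the text

    @property
    def extent_km(self):
        """Approximate domain size (km): cell count times grid spacing."""
        return (self.nx * self.resolution_km, self.ny * self.resolution_km)


# Values from the text, ordered from the outermost to the innermost domain.
DOMAINS = [
    Domain("AUS", 80.0, 75, 65),
    Domain("NSW", 27.0, 60, 60),
    Domain("GMR", 9.0, 60, 60),
    Domain("SYD", 3.0, 60, 60),
]

SPIN_UP_DAYS = 3

# Campaign periods from the text (assumed inclusive of both dates).
CAMPAIGNS = {
    "SPS1": (date(2011, 2, 5), date(2011, 3, 7)),
    "SPS2": (date(2012, 4, 16), date(2012, 5, 14)),
}


def run_start(campaign):
    """First simulated day, assuming spin-up directly precedes the campaign."""
    start, _ = CAMPAIGNS[campaign]
    return start - timedelta(days=SPIN_UP_DAYS)


def campaign_days(campaign):
    """Number of campaign days, counting both the start and end dates."""
    start, end = CAMPAIGNS[campaign]
    return (end - start).days + 1


if __name__ == "__main__":
    for d in DOMAINS:
        print(d.name, d.extent_km)
    for name in CAMPAIGNS:
        print(name, run_start(name), campaign_days(name))
```

Under these assumptions, the innermost SYD domain spans roughly 180 km × 180 km, and the runs would start on 2 February 2011 and 13 April 2012, respectively.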
Atmosphere 2018, 9, x FOR PEER REVIEW 3 of 32

Meteorological Module
CCAM is a semi-implicit, semi-Lagrangian atmospheric climate model based on the conformal cubic grid [19]. Further details on the design of CCAM can be obtained from [20]. In this study, the European Reanalysis Interim (ERA-Interim) dataset provided the host global data, which CCAM downscaled to the four nested domains shown in Figure 2. CCAM was configured with 35 vertical levels.

Emission Module
Emissions from natural sources, including wind-blown dust, marine aerosols and volatile organic compounds (VOC) from vegetation, are calculated in-line within the CTM, as documented by Cope et al. [10,11]. The anthropogenic emissions input into the modelling were taken from the NSW GMR Air Emissions Inventory for calendar year 2008 [21]. This inventory comprises detailed source and emissions data for over a hundred industrial, commercial, transport, agricultural and residential activities and over a thousand pollutants. The inventory is updated every five years, and the 2013 calendar year emissions inventory had not been released at the time the modelling study was undertaken.
The 2008 NSW GMR Air Emissions Inventory data were segregated into four categories comprising 16 major source groups, as shown in Table 1. The "On-road motor vehicles" category includes emissions from petrol exhaust, diesel exhaust, other exhaust, petrol evaporation and non-exhaust particulate matter; the "Non-road diesel and marine" category includes emissions from shipping and commercial boats, industrial and commercial vehicles and equipment, aircraft (flight and ground operations), locomotives and commercial non-road equipment; the "Industrial point sources" category comprises emissions from gas-fired and coal-fired power generation and all other industrial stack or vent emissions; and the "Other industrial, commercial and domestic-commercial sources" category includes residential wood heating and industrial area source emissions as separate major source groups, with all other sources in this category combined in a third group. Point sources were modelled at their specific locations, with stack properties (stack height and radius, gas exit temperature and velocity) provided as input. Other emissions were provided as area sources, with emission rates allocated over a 1-km × 1-km grid covering the NSW GMR.
Monthly, time-resolved weekday and weekend emission profiles for each source group in Table 1 were developed from the 2008 NSW GMR Air Emissions Inventory. Each emission profile consists of 17 species: nitric oxide (NO), nitrogen dioxide (NO2), carbon monoxide (CO), sulphur dioxide (SO2), particulates (PART, PM10), higher aldehydes (ALD2), ethene (ETH), ethanol (ETOH), formaldehyde (FORM), isoprene (ISOP), methanol (MEOH), alkenes (olefins, OLE), alkanes (paraffins, PAR), toluene (TOL), unreactive (UNR), xylene (XYL), and ammonia (NH3). Emissions for additional species (e.g., PM2.5, EC, OC, dust, levoglucosan, SO3, and an extended set of VOC species) required by the CB05 mechanism in the CTM were estimated by applying source-dependent fractions of the species detailed in Appendix A.
Petrol exhaust, diesel exhaust, petrol evaporation and residential wood heater emissions were further scaled in-line within CCAM-CTM based on the modelled ambient temperature at each grid point. February 2008 was used as the reference month for petrol exhaust, diesel exhaust and petrol evaporation emissions, with temperature-adjusted emissions for other months based on this reference month [22]. Residential wood heater emissions were varied according to the heating degree days (HDD, http://www.bom.gov.au/jsp/ncc/climate_averages/degree-days/index.jsp) calculated from the modelled ambient temperature, with July 2008 taken as the reference month and temperature-adjusted emissions calculated for other months.
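The exact scaling function applied within CCAM-CTM is not given here, so the following Python sketch only illustrates the general idea of a heating-degree-day adjustment. It assumes that HDD for a day is the amount by which the daily mean temperature falls below a base temperature (18 °C is used as an assumed default), and that wood heater emissions scale linearly with the ratio of the day's HDD to a reference-month mean daily HDD. The function names and the linear form are hypothetical, not the model's implementation.

```python
def heating_degree_days(daily_mean_temp_c, base_temp_c=18.0):
    """HDD for one day: degrees by which the daily mean is below the base.

    The 18 degC base is an assumed default, not a value from the study.
    """
    return max(0.0, base_temp_c - daily_mean_temp_c)


def wood_heater_scale(daily_mean_temp_c, reference_hdd, base_temp_c=18.0):
    """Hypothetical linear scaling factor relative to the reference HDD.

    reference_hdd stands for the mean daily HDD of the reference month
    (July 2008 in the text); its value must be supplied by the caller.
    """
    if reference_hdd <= 0:
        raise ValueError("reference_hdd must be positive")
    return heating_degree_days(daily_mean_temp_c, base_temp_c) / reference_hdd


def scaled_emission(reference_rate, daily_mean_temp_c, reference_hdd):
    """Reference-month emission rate adjusted by the HDD scaling factor."""
    return reference_rate * wood_heater_scale(daily_mean_temp_c, reference_hdd)


if __name__ == "__main__":
    # Illustrative numbers only: a mild day compared with a cold reference.
    print(scaled_emission(2.0, daily_mean_temp_c=14.0, reference_hdd=8.0))
```

With this form, days warmer than the base temperature produce no wood heater emissions, which is one reason a summer campaign such as SPS1 would be expected to show little wood smoke under such a scheme.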
The spatial distributions of total PM2.5, PM10, NOx, VOC, SO2 and NH3 emissions released from anthropogenic sources are illustrated in Figure 3. The spots appearing in the northwest quadrant of Figure 3a,b,e are associated with the emissions from the Mount Piper and Wallerawang coal-fired power stations in the Central Tablelands region of New South Wales, Australia. A comprehensive validation of the emissions inventory based on ambient measurements has not been undertaken. Galbally et al. [23] provided a comparison of observed and inventoried emission ratios for a range of VOC species, with an analysis of emission ratios also reported in [11].

Chemical Transport Modelling
The chemical transport modelling was implemented using the CSIRO Chemical Transport Model (CTM) [7]. The CTM is a three-dimensional Eulerian chemical transport model capable of modelling the emission, transport, chemical transformation, and wet and dry deposition of a coupled gas and aerosol phase atmospheric system.
The photochemical mechanism used in the current CTM is an extended version of Carbon Bond 5 [24] with updated toluene chemistry [25]. There are two options for the aerosol scheme: (1) a two-bin aerosol module; or (2) an extended module that includes gas phase precursors for secondary inorganic aerosols (SIA) and secondary organic aerosols (SOA), where SIA are assumed to exist in thermodynamic equilibrium with gas phase precursors and are modelled using the ISORROPIA-II model [26], and SOA are modelled using the Volatility Basis Set approach [27], as documented in [11].
The CTM can be configured to run in either a single-moment or a two-moment configuration; the two-moment configuration requires the more detailed aerosol size distribution provided by the GLObal Model of Aerosol Processes (GLOMAP) aerosol model [28]. Owing to constraints in the NSW EPA emissions inventory data, the CTM was run in single-moment mode in this study, with particle size and chemical speciation information provided for PM2.5 and PM10.
Concentrations of gas and particulate-phase species at the outermost model boundaries were adapted from a global run of the United Kingdom Chemistry and Aerosol scheme (UKCA) for the UK Met Office Unified Model. The chemical boundary conditions for the nested inner domains were then provided by the parent domain CTM runs.

General Guidance
There are a number of performance metrics that can be used to examine the performance of air quality models (e.g., [29,30]). However, there is no universal agreement among the modelling community on the best practice to evaluate model performance. Dennis et al. [13] provided a comprehensive review of methods and tools that are widely applied to regional-scale numerical photochemical modelling evaluation. The evaluation procedure for the regional meteorological and chemical transport models in this study mainly follows the "operational evaluation" proposed in the evaluation framework of Dennis et al. [13].
Firstly, we undertook the traditional metrics-based evaluation, where CCAM-CTM model predictions were compared to observations and the deviations were quantified through statistics. The magnitudes of the statistics were then compared with reference criteria to characterise the performance of the CCAM-CTM modelling system. The selected metrics are summarized in Appendix B and include the mean bias (MB), mean error (ME), normalised mean bias (NMB), normalised mean error (NME), mean fractional bias (MFB), mean fractional error (MFE), root mean square error (RMSE), correlation coefficient (R) and index of agreement (IOA). As discussed in [30], MB is the average difference over all predicted-observed pairs, while ME is the average of the absolute differences. The NMB and NME normalize MB and ME by the mean of the observations, and they assume the observations are the absolute truth. The NMB ranges from −100% to +∞, while the NME ranges from 0% to +∞, so overpredictions are artificially given more weight than underpredictions. The MFB is defined as the bias normalized by the mean of the paired predictions and observations, and the MFE is defined in a similar way using the absolute differences. Among the six metrics introduced above, the MFB and MFE are the least biased. The IOA compares the squared prediction errors with the summed deviations of the predictions and of the observations from the observed mean. The IOA ranges from 0 to 1, where 1 represents perfect agreement. When comparing seasonal average concentrations of modelled and measured speciated PM2.5 components, model performance is considered reasonable when the modelled averages lie within a factor of two of the measured averages.
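The metrics listed above can be sketched in a few lines of Python. The definitions used in this study are those of Appendix B, which is not reproduced here; the sketch therefore assumes the commonly used forms, with MFB and MFE normalised by the mean of each predicted-observed pair and the IOA in the form usually attributed to Willmott. The function names are hypothetical, and all concentrations are assumed to be positive so that no denominator is zero.

```python
import math


def evaluation_metrics(pred, obs):
    """Common operational evaluation statistics for paired values.

    Assumes equal-length, non-empty sequences of positive concentrations.
    NMB, NME, MFB and MFE are returned in percent.
    """
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and of equal length")
    n = len(obs)
    diffs = [p - o for p, o in zip(pred, obs)]
    obs_sum = sum(obs)
    obs_mean = obs_sum / n
    pred_mean = sum(pred) / n
    # Fractional difference of each pair, normalised by the pair mean.
    frac = [2.0 * d / (p + o) for d, p, o in zip(diffs, pred, obs)]
    cov = sum((p - pred_mean) * (o - obs_mean) for p, o in zip(pred, obs))
    var_p = sum((p - pred_mean) ** 2 for p in pred)
    var_o = sum((o - obs_mean) ** 2 for o in obs)
    sq_err = sum(d * d for d in diffs)
    ioa_den = sum(
        (abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for p, o in zip(pred, obs)
    )
    return {
        "MB": sum(diffs) / n,
        "ME": sum(abs(d) for d in diffs) / n,
        "NMB": 100.0 * sum(diffs) / obs_sum,
        "NME": 100.0 * sum(abs(d) for d in diffs) / obs_sum,
        "MFB": 100.0 * sum(frac) / n,
        "MFE": 100.0 * sum(abs(f) for f in frac) / n,
        "RMSE": math.sqrt(sq_err / n),
        "R": cov / math.sqrt(var_p * var_o),
        "IOA": 1.0 - sq_err / ioa_den,
    }


def within_factor_of_two(model_mean, obs_mean):
    """True when the modelled average is between 0.5x and 2x the measured one."""
    if obs_mean <= 0:
        raise ValueError("obs_mean must be positive")
    return 0.5 <= model_mean / obs_mean <= 2.0


if __name__ == "__main__":
    stats = evaluation_metrics([2.0, 4.0, 6.0], [1.0, 4.0, 7.0])
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

In this made-up example the MB and NMB are zero while the MFB is positive (about 17%), because the fractional form gives more weight to the overprediction at the lowest concentration. This illustrates how MB and MFB can differ in sign or magnitude at the same site.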
In addition to the traditional metrics, we also conducted a graphical evaluation, which includes comparisons of time series and spatial distributions. The graphical evaluation helps to visualize and assess how well the model reproduces temporal and spatial variations for various pollutants. The model evaluation in this study focused on the predictions in the inner domain (SYD domain, 3-km × 3-km). The observational datasets used in the evaluation are introduced in the following section.

Observational Datasets Used for Evaluation
Various observational databases were used in this study to evaluate the model performance. The ambient air quality data were provided by the NSW OEH Air Quality Monitoring Network (AQMN) (http://www.environment.nsw.gov.au/topics/air/monitoring-air-quality/). The 18 air quality monitoring stations chosen for the model evaluation (yellow dots in Figure 4) are located in the Sydney East (Chullora, Earlwood, Lindfield, Randwick and Rozelle), Sydney Northwest (Prospect, Richmond, St Marys and Vineyard), Sydney Southwest (Bargo, Bringelly, Liverpool, Macarthur and Oakdale), Illawarra (Albion Park, Kembla Grange and Wollongong) and Newcastle (Newcastle) regions. Ozone (O3), nitrogen oxides (NOx, NO, and NO2), PM10, sulphur dioxide (SO2) and carbon monoxide (CO) are continuously monitored at all 18 sites, while PM2.5 measurements are only available at the following five sites: Chullora, Earlwood, Richmond, Liverpool and Wollongong. The speciated PM2.5 measurements were carried out at the Westmead sampling site during the Sydney Particle Study. This site is located 26 km west of the Sydney central business district. Reference is made to the measurements from the summer sampling campaign (5 February to 7 March 2011). Samples were collected during two periods each day, a morning sample between 05:00 and 10:00 and an afternoon sample between 11:00 and 19:00 [11].
The meteorological data used for model evaluation were compiled from two measurement networks: weather observations from BOM (http://www.bom.gov.au/nsw/observations/) at Albion Park airport, Badgerys Creek, Bankstown airport, Bellambi, Camden airport, Richmond Royal Australian Air Force (RAAF), Sydney airport and Williamtown RAAF (blue dots in Figure 4), and meteorological parameters such as wind speed, wind direction, air temperature and relative humidity continuously monitored by the NSW OEH air quality monitoring network.

Results and Discussions
To conduct a robust performance evaluation of the CCAM-CTM modelling system, the meteorological predictions from CCAM must be evaluated first. The performance statistics at each site for the predicted hourly surface temperature, surface winds, and precipitation against observations were investigated, and a more detailed evaluation can be found in [31]. The main outcomes of the CCAM evaluation are: (1) CCAM generally overpredicts surface temperature (°C) in both the SPS1 and SPS2 periods across OEH and BOM sites. The averaged MB and R are 1.14 and 0.92 for SPS1, and 0.93 and 0.82 for SPS2, which shows that the warm bias is larger in summer (SPS1) than in autumn (SPS2). (2) CCAM overpredicts surface wind speed (m/s) across OEH and BOM sites in both the SPS1 and SPS2 periods. The averaged MB for surface wind predictions is 1.93 (SPS1) and 1.97 (SPS2) at OEH sites, compared with 0.52 (SPS1) and 0.62 (SPS2) at BOM sites. The results show that CCAM overpredicts surface winds more strongly at OEH sites than at BOM sites in both summer and autumn.
The operational evaluation of CCAM-CTM generates many graphs and tables; we therefore focus on the evaluation of model-predicted PM2.5, O3 and NO2 in the main text and leave the evaluation of NO, CO and SO2 to Appendix C.

PM2.5
The model predictions of hourly PM2.5 mass concentration, calculated as the sum of the various components, were evaluated against available PM2.5 observations at Chullora, Earlwood, Richmond, Liverpool and Wollongong. Table 2 presents the quantitative performance statistics at each site for predicted hourly PM2.5 concentration against observations, along with the mean and standard deviation of predictions and observations for the SPS1 and SPS2 periods. PM2.5 is generally overpredicted across Sydney in SPS1, where the values of MB, NMB and MFB are 0.58, 11% and 23% at Earlwood; 0.92, 19% and 47% at Richmond; 0.98, 20% and 35% at Liverpool; and 0.56, 10% and 24% at Wollongong, respectively. The only exception is Chullora, where PM2.5 is underestimated and the values of MB, NMB and MFB are −0.28, −4% and 58%, respectively. The values of ME, NME and MFE across these sites are in the range of 2.54-2.86, 65-79% and 46-69%, respectively. Relatively high correlation coefficients (0.50-0.51) and a high IOA (0.66) are found at both Earlwood and Liverpool, where relatively low MFB and MFE are also found. The overprediction of PM2.5 concentrations across Sydney is also evident in SPS2, with the only exception at Liverpool. A lower MFB (compared to that in SPS1) is found at Chullora, Richmond and Liverpool, while a slightly higher MFB is found at Earlwood and Wollongong. A generally higher MFE is found in SPS2 across all sites, except at Chullora. The best correlation and agreement between hourly PM2.5 predictions and observations in SPS2 are found at Chullora, with R and IOA values of 0.50 and 0.67. The worst correlation and agreement occur at Richmond, which has the lowest R (0.18) and the lowest IOA (0.48).
The bugle plots for hourly PM2.5 for all predicted/observed pairs for the five air quality monitoring stations during SPS1 and SPS2 are shown in Figure 5. The model performance goal and criteria proposed by Boylan and Russell [30] were used to benchmark the PM2.5 predictions in our study. In Figure 5a, the PM2.5 predictions for SPS1 (black star) at every site meet the performance criteria for MFB (±60%), and two sites (Earlwood and Wollongong) further comply with the performance goal for MFB (±30%). The hourly PM2.5 predictions at the five sites in SPS2 (green triangle) also meet the performance criteria for MFB, and four sites (Chullora, Earlwood, Richmond and Liverpool) further comply with the performance goal for MFB. In Figure 5b, the PM2.5 predictions at every site in SPS1 and SPS2 meet the performance criteria for MFE (75%). However, only two sites (Earlwood and Liverpool) meet the performance goal for MFE (50%) during SPS1 (black star in Figure 5b), and none of the sites in SPS2 comply with the performance goal for MFE.
Comparisons of predicted and observed PM2.5 were further examined using Taylor diagrams, which provide a concise way to summarize several statistical metrics used for model evaluation. The Taylor diagram combining the correlation (i.e., R in Table 2) with the normalized standard deviation and centred RMSE is presented in Figure 6 for all sites in SPS1 and SPS2. Most of the sites are characterized by low correlations, along with high normalized standard deviations and high centred RMSE. This reflects the difficulty of correctly predicting PM2.5 concentrations when highly variable emission sources are not well captured, including anthropogenic sources such as wood smoke, motor vehicles, coal-fired power stations and other industrial point sources, as well as natural sources such as sea salt, wind-blown dust and soil [32,33].
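As a rough illustration of the benchmarking step, the Python sketch below classifies MFB and MFE values against the goal and criteria thresholds quoted above and applies them to the SPS1 MFB values reported in the text. It uses the constant thresholds as quoted (±30%/±60% for MFB, 50%/75% for MFE); the goal and criteria of Boylan and Russell [30] are functions of concentration, which gives the bugle plot its shape, so this is a simplification. The second function evaluates the standard Taylor-diagram relation between the normalised standard deviation, the correlation and the normalised centred RMSE. All function and variable names are hypothetical.

```python
import math

# Constant thresholds (percent) as quoted in the text.
MFB_GOAL, MFB_CRITERIA = 30.0, 60.0
MFE_GOAL, MFE_CRITERIA = 50.0, 75.0


def classify(value, goal, criteria):
    """Return 'goal', 'criteria' or 'fail' for a metric value in percent."""
    magnitude = abs(value)
    if magnitude <= goal:
        return "goal"
    if magnitude <= criteria:
        return "criteria"
    return "fail"


def classify_mfb(mfb_percent):
    return classify(mfb_percent, MFB_GOAL, MFB_CRITERIA)


def classify_mfe(mfe_percent):
    return classify(mfe_percent, MFE_GOAL, MFE_CRITERIA)


def normalised_centred_rmse(norm_std, r):
    """Centred RMSE divided by the observed standard deviation.

    Uses the Taylor-diagram identity E'^2 = s_p^2 + s_o^2 - 2 s_p s_o R,
    written with norm_std = s_p / s_o.
    """
    return math.sqrt(max(0.0, 1.0 + norm_std ** 2 - 2.0 * norm_std * r))


# SPS1 MFB values (percent) for hourly PM2.5, as reported in the text.
SPS1_MFB = {
    "Chullora": 58.0,
    "Earlwood": 23.0,
    "Richmond": 47.0,
    "Liverpool": 35.0,
    "Wollongong": 24.0,
}

if __name__ == "__main__":
    for site, mfb in SPS1_MFB.items():
        print(site, classify_mfb(mfb))
```

Applied to the SPS1 values, this reproduces the statement above that all five sites meet the MFB criteria, with Earlwood and Wollongong also meeting the goal.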
Figure 7 shows a time series of predicted and observed hourly PM2.5 across the five sites during SPS1 and SPS2. CCAM-CTM generally predicted the variations of PM2.5 for most days at most sites. Overall, the model slightly overpredicted PM2.5; however, moderate underpredictions of the hourly PM2.5 peak on high PM2.5 days are found across most sites. For example, the model is not able to capture the PM2.5 peaks on 11 February (SPS1) at Chullora, Richmond and Liverpool; it also has difficulty predicting the PM2.5 peaks on 22 April (SPS2) at Chullora, Earlwood, Richmond, Liverpool and Wollongong.
In a review of various studies [34], a common trend is reported in which PM2.5 is overestimated during the winter and underestimated during the summer. The wintertime overestimate of PM2.5 total mass is driven by overestimates of OC and nitrate, while sulphate and OC contribute most of the summertime PM2.5 underestimate reported in the literature. When some species are overpredicted and others are underpredicted, an evaluation of speciated PM2.5 may provide more insight into model performance than an evaluation of the total predicted PM2.5 mass alone.
The spatial distributions of predicted hourly average PM2.5 concentrations for SPS1 and SPS2 are shown in Figure 8a,b, and the hourly maximum PM2.5 concentrations for SPS1 and SPS2 are shown in Figure 8c,d. The areas of elevated PM2.5 concentrations coincide with populated regions (Sydney, Wollongong and Newcastle), with an average concentration of 6 µg/m³ and a maximum hourly value of 20 µg/m³. Significantly elevated PM2.5 is also found in the Upper Hunter region, about 200 km northwest of Sydney, which has a northwest-southeast oriented valley. Emissions from open-cut coal mines, coal-fired power stations and agricultural industries are major contributors to high PM2.5 in this region [21,35].

O3
The model predictions of hourly O3 concentrations were evaluated against observations. Statistics were calculated separately for each of the 18 NSW OEH air quality monitoring stations shown in Figure 4. However, to simplify the comparison, regional averages of the statistical metrics (as suggested in [12] and done in [36]) were computed across sites in the Sydney East, Sydney Northwest (Sydney NW), Sydney Southwest (Sydney SW), Illawarra and Newcastle regions.
Table 3 presents statistical measures for the predicted hourly O3 concentrations averaged over each region as defined above, along with the mean and standard deviation of predicted values and observations for the SPS1 and SPS2 periods. CCAM-CTM shows a tendency to underpredict hourly O3 concentrations in SPS1 (summer months of 2011) across all regions except Illawarra, where O3 is overpredicted. The values of MB, NMB and MFB for these regions range from −2.31 to 0.52 ppb, from −11% to 4% and from −9% to 11%, respectively. The values of ME, NME and MFE are in the ranges of 5.57-6.99 ppb, 56-73% and 36-48%, respectively. The maximum RMSE across regions is 7.64 ppb. The highest correlation coefficient (0.75) and IOA (0.87) are both found in the Sydney NW region, while the lowest correlation coefficient (0.51) and IOA (0.64) are in Illawarra. In SPS2 (autumn months of 2012), CCAM-CTM demonstrates a greater tendency to underpredict O3 than in SPS1. The hourly O3 concentrations are generally underpredicted in the Sydney East, Sydney NW, Sydney SW and Illawarra regions, while O3 is overpredicted in the Newcastle region. The values of MB, NMB and MFB for these regions range from −3.07 to 3.55 ppb, from −16% to 29% and from −20% to 45%, respectively. The values of ME, NME and MFE are in the ranges of 6.04-6.98 ppb, 61-92% and 59-72%, respectively. The maximum RMSE across all regions is 6.23 ppb. The worst correlation between predictions and observations is found in the Illawarra region, with a correlation coefficient of 0.34 and an IOA of 0.55.

Simon et al. [34] reported RMSEs in the range of 15-20 ppb for hourly O3 concentrations in most model validation studies, and the RMSEs in our study are considerably lower. The US EPA recommended benchmarks for MFB and MFE are ±15% and 35% for ozone predictions [12]. Accordingly, the hourly O3 predictions in SPS1 across all regions comply with the MFB benchmark of ±15%; however, none of the regions comply with the MFE benchmark of 35%. In SPS2, only the O3 predictions in the Sydney East, Sydney NW and Illawarra regions comply with the MFB benchmark of ±15%, and none of the regions comply with the MFE benchmark of 35%. The US EPA [12] also suggests applying cut-off values for background O3, so that data below the cut-off value are discarded from the evaluation. Following this guideline, MFB_15 and MFE_15 were calculated with a cut-off value of 15 ppb for background O3 in Sydney [37] (results shown in Table 3). The values of MFB_15 across all regions range from −17% to −26% in SPS1 (from −9% to −48% in SPS2), and the values of MFE_15 range from 26% to 38% in SPS1 (from 27% to 53% in SPS2). Compared with the original MFB and MFE, the magnitude of MFB_15 increases and MFE_15 decreases. Based on the values of MFB_15, none of the regions would comply with the MFB benchmark (±15%); however, based on the values of MFE_15, the O3 predictions in the Sydney East (SPS1), Illawarra (SPS1 and SPS2) and Newcastle (SPS1 and SPS2) regions comply with the MFE benchmark (35%), and predictions in the Sydney NW (SPS1) and Sydney SW (SPS1) regions are very close to it.
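The effect of the background cut-off on the fractional metrics can be illustrated with a short sketch. It assumes that the cut-off is applied to the observed value of each pair (pairs with observations below 15 ppb are dropped) and that MFB and MFE follow their commonly used definitions; the function name `mfb_mfe` and the sample concentrations are hypothetical.

```python
def mfb_mfe(pred, obs, cutoff=None):
    """Return (MFB, MFE) in percent, optionally discarding low observations.

    Assumption: a pair is kept only when its observed value is at or
    above the cut-off; predictions are not screened.
    """
    pairs = [(p, o) for p, o in zip(pred, obs) if cutoff is None or o >= cutoff]
    if not pairs:
        raise ValueError("no pairs left after applying the cut-off")
    frac = [2.0 * (p - o) / (p + o) for p, o in pairs]
    n = len(frac)
    mfb = 100.0 * sum(frac) / n
    mfe = 100.0 * sum(abs(f) for f in frac) / n
    return mfb, mfe

# Hypothetical hourly O3 values in ppb: low observations are overpredicted,
# high observations are underpredicted.
obs = [5.0, 10.0, 20.0, 40.0]
pred = [10.0, 12.0, 16.0, 30.0]

mfb_all, mfe_all = mfb_mfe(pred, obs)
mfb_15, mfe_15 = mfb_mfe(pred, obs, cutoff=15.0)
```

In this hypothetical sample the overpredicted low-concentration hours offset the underpredicted peaks when all pairs are used. Once they are removed, the bias becomes more negative and the fractional error falls, which is the same direction of change as reported for MFB_15 and MFE_15.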
The predicted and observed O3 presented in the Taylor diagram (Figure 9) generally show a more homogeneous pattern across the sites than the PM2.5 predictions shown in Figure 6. The correlations range from 0.4 to 0.7; however, the model performance for O3 predictions in SPS1 (summer months) is slightly better than in SPS2 (autumn months), with higher correlations, lower normalized standard deviation and lower centred RMSE.
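The three quantities summarized by a Taylor diagram are linked by a geometric identity, which is why they can share one plot. The sketch below assumes population (divide-by-N) statistics and normalization by the observed standard deviation; the function name `taylor_stats` and the sample values are hypothetical.

```python
from math import sqrt, isclose

def taylor_stats(pred, obs):
    """Correlation, normalized standard deviation and normalized centred RMSE.

    Assumes both series have non-zero variance and equal length.
    """
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    dp = [p - mean_p for p in pred]  # anomalies of predictions
    do = [o - mean_o for o in obs]   # anomalies of observations
    std_p = sqrt(sum(x * x for x in dp) / n)
    std_o = sqrt(sum(x * x for x in do) / n)
    r = sum(a * b for a, b in zip(dp, do)) / (n * std_p * std_o)
    # Centred RMSE: RMSE after removing each series' own mean.
    crmse = sqrt(sum((a - b) ** 2 for a, b in zip(dp, do)) / n)
    return {"R": r, "norm_std": std_p / std_o, "norm_crmse": crmse / std_o}

# Hypothetical paired values.
t = taylor_stats([1.0, 3.0, 2.0, 5.0], [2.0, 4.0, 3.0, 7.0])

# Law-of-cosines identity behind the diagram (in normalized form):
# crmse^2 = 1 + norm_std^2 - 2 * norm_std * R
identity_holds = isclose(
    t["norm_crmse"] ** 2,
    1.0 + t["norm_std"] ** 2 - 2.0 * t["norm_std"] * t["R"],
    rel_tol=1e-9,
)
```

Because of this identity, a point with a higher correlation and a normalized standard deviation closer to one necessarily lies closer to the reference point, that is, it has a lower centred RMSE.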
Richmond, Bringelly, Wollongong and Newcastle represent the Sydney NW, Sydney SW, Illawarra and Newcastle regions, respectively. CCAM-CTM generally captures the diurnal variations of O3 for most days at most sites, with a slight underestimation. Unlike predicted PM2.5, predicted O3 has a larger seasonal dependency. Higher O3 concentrations due to increased photochemical production are seen in both observations and predictions in the summer months (SPS1). The model is able to reproduce most of the high ozone events in SPS1 and also captures the overall concentration variation in the autumn months (SPS2). However, in most cases the model tends to underpredict the daily maximum hourly ozone concentrations, which occur at 15:00 (AEST). The inaccurate predictions of peak O3 concentrations are attributed to a combination of several factors, including the uncertainties in the emission estimates for NOx and VOCs, which are the precursors of O3 [38], and the overpredicted surface wind speeds in CCAM [31], which may also contribute to the underpredicted O3 concentrations.
The spatial distributions of predicted average hourly O3 and average daily maximum O3 concentrations for the SPS1 and SPS2 periods are shown in Figure 11. In the summer months during SPS1 (Figure 11a), areas of higher average ozone concentrations (18 ppb) are found over the Blue Mountains National Park (100 km west of Sydney) as well as the Wollemi National Park (200 km northwest of Sydney). In the autumn months during SPS2, average ozone concentrations across Sydney (Figure 11b) are generally lower than in SPS1. It should be noted that an area of elevated ozone is found over the ocean east of Sydney in both SPS1 and SPS2. There are currently no surface ozone observations over the ocean for comparison; however, the possible high ozone may be due to the lack of local sources of NOx over the ocean, which leads to less surface deposition. The spatial distributions of O3 daily maximum concentrations averaged over SPS1 and SPS2 are illustrated in Figure 11c,d. They clearly show that higher peak hourly ozone tends to occur in the Sydney NW region and the Wollemi National Park region during the summer months (Figure 11c). Peak hourly ozone decreases significantly during the autumn months (Figure 11d); however, the regional maximum is still found in the Sydney NW region.

NO2
The model predictions of hourly NO2 concentrations were evaluated against observations. Table 4 presents the regionally averaged quantitative performance statistics for predicted hourly NO2 concentrations against observations, along with the mean and standard deviation of predictions and observations for the SPS1 and SPS2 periods. Generally, the hourly NO2 concentrations are underpredicted across all regions in both SPS1 and SPS2. In SPS1, the biases, in terms of MB, NMB and MFB, are smallest in the Sydney East region, with values of −1.11, −13% and −10%, respectively, and largest in the Newcastle region, with values of −2.26, −46% and −54%, respectively. The values of ME, NME and MFE are in the ranges of 2.76-4.03 ppb, 60-87% and 60-82%, respectively. The maximum RMSE across all regions is 3.87 ppb. Better correlation between NO2 predictions and observations is found in the Sydney East, Sydney SW and Newcastle regions, with higher correlation coefficients (0.49, 0.48 and 0.59, respectively) and higher IOA (0.71, 0.70 and 0.68, respectively). In SPS2, as in SPS1, the smallest biases are seen in the Sydney East region, with MB, NMB and MFB of −0.4, −1% and −6%, respectively, and the largest biases are found in the Newcastle region. The values of ME, NME and MFE are in the ranges of 4.22-6.63 ppb, 68-99% and 54-80%, respectively. The maximum RMSE across regions is 7.77 ppb. The best correlation between predictions and observations is seen in the Illawarra region, with a correlation coefficient of 0.54 and an IOA of 0.72, while the worst correlation is found in the Sydney NW region, with a correlation coefficient of 0.35 and an IOA of 0.62. The statistics for NO2 show larger bias and error than those calculated for O3 (Table 3), owing to the higher sensitivity of NO2 predictions to uncertainties and errors in the emissions and meteorology, similar to [38]. While there are no benchmarks available for NO2 validation, the MFE and NME of NO2 from our study fall well within the ranges inferred from [39].
Comparisons of predicted and observed NO2 were further examined using a Taylor diagram, as presented in Figure 12. For all sites in both periods, the correlations fall within a range of 0.2-0.6. The higher correlations, along with lower normalized standard deviation and lower centred RMSE, indicate that CCAM-CTM tends to predict NO2 better in the summer months (SPS1) than in the autumn months (SPS2).

Figure 13 shows a time series of predicted and observed hourly NO2 at selected OEH sites, which represent different regions. The model generally captures the diurnal variations and magnitudes well at most sites most of the time, with a tendency to underestimate NO2 across all regions. However, the model tends to overpredict NO2 peaks at Bringelly (Sydney SW region), Chullora and Randwick (Sydney East region) during SPS2 (Figure 13g-i) and underpredict NO2 peaks at Richmond (Sydney NW region), Wollongong (Illawarra region) and Newcastle (Newcastle region) (Figure 13j-l). The inaccuracies in the model NO2 predictions are assumed to be strongly associated with the uncertainty in the NO2 emission estimates, the main contributions being from on-road motor vehicles and industrial sources in the NSW GMR [21].
The spatial distributions of predicted hourly average NO2 concentrations in the SPS1 and SPS2 periods are shown in Figure 14. Significantly elevated NO2 concentrations are found over the populated Sydney East region as well as in the Upper Hunter during both periods. The model results also show higher NO2 concentrations in the autumn months in SPS2 (Figure 14b). These spatial distributions are consistent with those seen in the Ozone Monitoring Instrument (OMI) satellite observations of tropospheric NO2 columns during SPS1 and SPS2, as shown in Figure 15.

PM2.5 Components
During the Sydney Particle Study, 60 samples were collected for the measurement of aerosol chemical composition, 30 in the mornings and 30 in the afternoons [11]. This number of samples is too limited to support detailed performance statistics. Instead, model results for these periods were extracted and averaged over the full sampling season for comparison with the coinciding measurements, with a focus on the summer campaign (SPS1). Model performance is considered reasonable when modelled averages are within a factor of two of the measured averages. The summer observation campaign identified sea salt (with sodium, chloride and magnesium as marker species) and primary and secondary organic matter (organic carbon) as the major components of PM2.5, with secondary inorganic aerosol (sulphate, ammonium and nitrate), soil and elemental carbon also present in significant amounts.
The contributions of modelled and observed PM2.5 components are shown in Figure 16, and the concentrations are provided in Table 5. Only measured components of PM2.5 that correspond to modelled species are shown. The model does not explicitly predict concentrations of all measured species, accounting for several as lumped species (e.g., dust). The components shown are sodium, chloride and magnesium (all components of sea salt), ammonium, sulphate, nitrate, elemental carbon and organic matter.
Organic matter, sodium and the secondary inorganic nitrate and sulphate components are reasonably predicted. Elemental carbon, magnesium, chloride and ammonium are underpredicted, with the predicted mass only about a third of the observed mass in most cases and less than this in the case of ammonium. The results indicate that the model underestimates the fractional sea salt contribution, and that sources of elemental carbon and ammonia are not fully accounted for in the model.
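The factor-of-two test used for the speciated comparison amounts to checking that the ratio of modelled to measured averages lies between 0.5 and 2. The sketch below is illustrative only: the function name `within_factor_of_two` is hypothetical, and the concentrations are placeholders rather than values from Table 5.

```python
def within_factor_of_two(modelled, measured):
    """True when the modelled average is within a factor of two of the measured one.

    Assumes both averages are strictly positive.
    """
    if modelled <= 0 or measured <= 0:
        raise ValueError("averages must be positive")
    ratio = modelled / measured
    return 0.5 <= ratio <= 2.0

# Placeholder seasonal averages (not values from Table 5), given as
# (modelled, measured) in the same concentration unit.
examples = {
    "species_a": (0.8, 1.0),  # ratio 0.8, inside the band
    "species_b": (0.3, 1.0),  # about a third of the measured mass
}
results = {name: within_factor_of_two(m, o) for name, (m, o) in examples.items()}
```

A modelled average of about a third of the measured value, as described for several underpredicted species, falls outside the band, while ratios between 0.5 and 2 pass.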

Conclusions
Performance evaluation and benchmarking of the coupled CCAM-CTM modelling system for the NSW GMR was conducted based on modelling results for two periods coinciding with measurement campaigns undertaken during the Sydney Particle Study. Model predictions of PM2.5, O3 and NO2 were evaluated against air quality data from the NSW OEH air quality monitoring network. The main findings of this evaluation study are summarised below.

• The model generally slightly overpredicts PM2.5, with NMB less than 20%; however, moderate underpredictions of the daily peak are found on high PM2.5 days. The PM2.5 predictions at all sites comply with the performance criteria for MFB (±60%) for both periods; predictions at Earlwood and Wollongong in the summer months (SPS1), as well as at Chullora, Earlwood, Richmond and Liverpool in the autumn months (SPS2), further comply with the performance goal for MFB (±30%).

• In terms of O3, the model generally captures the diurnal variations with a slight underestimation. The model reproduces most of the high ozone events in SPS1 and captures the diurnal concentration variation in SPS2. However, the model tends to underpredict daily maximum hourly ozone. O3 predictions across all regions in SPS1, as well as in the Sydney East, Sydney Northwest and Illawarra regions in SPS2, comply with the benchmark for MFB (±15%); however, none of the regions comply with the benchmark for MFE (35%).

• For NO2, the model reproduces the diurnal variations well, with a tendency for underestimation across all regions. A better correlation between NO2 predictions and observations is found across the Sydney East, Sydney Southwest, Newcastle and Illawarra regions, while poorer correlations are found for the Sydney Northwest region. Although no benchmarks were identified from the literature for NO2 validation, the MFE and NME for NO2 predictions fall within the ranges inferred from other studies.

• For PM2.5 species, model results are within a factor of two of measured averages for sulphate, nitrate, sodium and organic matter, with underpredictions of elemental carbon, chloride, magnesium and ammonium. The results indicate that the model underestimates the sea salt contribution, and that sources of elemental carbon and ammonia are not fully accounted for.
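The bias and error statistics quoted in the findings above can be computed from paired hourly predictions and observations. The sketch below uses the commonly cited definitions of MFB, MFE, NMB and NME, and it is an assumption that these match the formulas used in this study. The function names, the optional `cutoff` argument (intended to mimic the 15 ppb ozone cut-off noted under Table 3, applied here to the observed value) and the fixed ±30%/±60% thresholds for PM2.5 are illustrative choices, not the authors' code.

```python
import math


def performance_stats(model, obs, cutoff=None):
    """Return MFB, MFE, NMB and NME (all in %) for paired values.

    Pairs with a missing value (NaN) are dropped. If cutoff is given,
    pairs whose observed value is below it are also dropped (assumption).
    Concentrations are assumed to be positive, so model + obs > 0.
    """
    pairs = [
        (m, o)
        for m, o in zip(model, obs)
        if not (math.isnan(m) or math.isnan(o))
        and (cutoff is None or o >= cutoff)
    ]
    if not pairs:
        raise ValueError("no valid model/observation pairs")
    n = len(pairs)
    sum_obs = sum(o for _, o in pairs)
    return {
        # Mean fractional bias and error: per-pair difference normalised
        # by the pair mean, then averaged over all pairs.
        "MFB": 100.0 * sum(2.0 * (m - o) / (m + o) for m, o in pairs) / n,
        "MFE": 100.0 * sum(2.0 * abs(m - o) / (m + o) for m, o in pairs) / n,
        # Normalised mean bias and error: summed difference normalised
        # by the summed observations.
        "NMB": 100.0 * sum(m - o for m, o in pairs) / sum_obs,
        "NME": 100.0 * sum(abs(m - o) for m, o in pairs) / sum_obs,
    }


def pm25_mfb_category(mfb):
    """Classify a PM2.5 MFB (%) against the goal (±30%) and criteria (±60%)."""
    if abs(mfb) <= 30.0:
        return "goal"
    if abs(mfb) <= 60.0:
        return "criteria"
    return "outside criteria"
```

Because the fractional statistics normalise each pair by its own mean, they are bounded (MFB between -200% and +200%) and are less dominated by a few low observed values than NMB and NME.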
The overall performance of the CCAM-CTM modelling system for the NSW GMR is comparable to that of other regional airshed models documented in the literature, although the performance characteristics vary with the criteria chosen, the location of the sites and the time of year. The reasons for these discrepancies have not yet been clearly identified, owing to the limitations of the operational model performance evaluation approach described in [13]. The detailed statistics reported in this study not only benchmark the overall model performance, but also identify several areas of potential improvement for future CCAM-CTM modelling, including: (1) better capture of highly variable emissions, in terms of magnitude and spatial distribution, for sources such as motor vehicles, residential wood heaters and industrial sources; (2) a "diagnostic" evaluation to characterise the response of regional elevated ozone concentrations to changes in NOx and VOC emissions; and (3) a sensitivity analysis with various surface roughness schemes to understand the CCAM wind predictions. Based on this benchmarking exercise, the application of the CCAM-CTM modelling system for the NSW GMR to assist air policy development and air quality management in NSW is supported.
CCAM-CTM modelling was undertaken using four nested domains, comprising the outermost Australian domain (AUS) at 80-km × 80-km resolution (75 × 65 grid cells), the New South Wales domain (NSW) at 27-km × 27-km resolution (60 × 60 grid cells), the Greater Metropolitan Region domain (GMR) at 9-km × 9-km resolution (60 × 60 grid cells) and the innermost Sydney domain (SYD) at 3-km × 3-km resolution (60 × 60 grid cells). The model domain configuration is shown in Figure 2. CCAM-CTM was run for two periods coinciding with the Sydney Particle Study measurement campaigns, which were Stage I (hereafter, SPS1) in the summer (from 5 February to 7 March 2011) and Stage II (hereafter, SPS2) in the autumn (from 16 April to 14 May 2012), plus a three-day spin-up prior to each experiment.
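As a rough illustration of the configuration described above, the nested domains and run periods can be written down and their implied extents and spin-up start dates derived. This is a sketch under stated assumptions: the dictionary layout and names are hypothetical, the grid-cell counts are assumed to be ordered as (x, y), and the spin-up is assumed to end on the day before each campaign starts.

```python
from datetime import date, timedelta

# Domain name: (horizontal resolution in km, cells in x, cells in y).
# The (x, y) ordering of the cell counts is an assumption.
DOMAINS = {
    "AUS": (80, 75, 65),
    "NSW": (27, 60, 60),
    "GMR": (9, 60, 60),
    "SYD": (3, 60, 60),
}

# Campaign periods as stated in the text (start date, end date).
PERIODS = {
    "SPS1": (date(2011, 2, 5), date(2011, 3, 7)),
    "SPS2": (date(2012, 4, 16), date(2012, 5, 14)),
}

SPIN_UP_DAYS = 3


def extent_km(resolution_km, nx, ny):
    """Return the (x, y) extent of a domain in kilometres."""
    return (resolution_km * nx, resolution_km * ny)


# Approximate spatial coverage of each nested domain.
extents = {name: extent_km(*cfg) for name, cfg in DOMAINS.items()}

# First simulated day of each run once the spin-up is included.
run_starts = {
    name: start - timedelta(days=SPIN_UP_DAYS)
    for name, (start, _end) in PERIODS.items()
}
```

Under these assumptions, each inner domain is roughly one third the width of its parent, consistent with the approximately 3:1 resolution ratio between successive nests.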

Figure 1. Schematic diagram of the CCAM-CTM modelling system.

Figure 4. Location of 18 OEH air quality monitoring stations (yellow dots) and six BOM weather stations (blue dots) used for model performance evaluation. All stations are within the innermost Sydney domain (SYD) used for CCAM-CTM simulation, which has a horizontal resolution of 3 km, with the exception of the OEH Newcastle station and BOM Williamtown RAAF.

Figure 5. Bugle plots of (a) mean fractional bias (MFB) and (b) mean fractional error (MFE) for PM2.5 for all predicted/observed pairs of values at Chullora, Earlwood, Richmond, Liverpool and Wollongong stations for SPS1 (black star) and SPS2 (green triangle).

Figure 6. Taylor diagram of hourly PM2.5 for all predicted/observed pairs of values at Chullora, Earlwood, Richmond, Liverpool and Wollongong stations for SPS1 (red dot) and SPS2 (green dot).

Figure 8. Spatial distribution of PM2.5 concentrations (µg/m3) predicted by CCAM-CTM for the GMR domain at 9-km horizontal resolution. The hourly average PM2.5 concentrations for the period of (a) SPS1 and (b) SPS2; and the hourly maximum PM2.5 concentrations for the period of (c) SPS1 and (d) SPS2. Red crosses indicate the locations of OEH stations where PM2.5 measurements are available.

Figure 9. Taylor diagram of hourly O3 for all predicted/observed pairs of values at 18 OEH stations for SPS1 (red dot) and SPS2 (green dot).

Figure 10 shows a time series of predicted and observed hourly O3 concentrations at selected OEH sites that represent different regions. Chullora and Randwick both represent the Sydney East region; Richmond, Bringelly, Wollongong and Newcastle represent the Sydney NW, Sydney SW,

Figure 10. The time series of predicted (red line) and observed (blue hollow dot) hourly O3 (ppb) at six selected OEH stations during: SPS1 (a-f); and SPS2 (g-l).

Figure 11. Spatial distribution of O3 concentrations (ppb) predicted by CCAM-CTM for the GMR domain at 9-km horizontal resolution. The hourly average O3 concentrations for the period of (a) SPS1 and (b) SPS2; and the average of O3 daily maximums for the period of (c) SPS1 and (d) SPS2. Red crosses indicate the locations of OEH stations.

Figure 12. Taylor diagram of hourly NO2 for all predicted/observed pairs of values at 18 OEH stations for SPS1 (red dot) and SPS2 (green dot).

Figure 14. Spatial distribution of NO2 concentrations (ppb) predicted by CCAM-CTM for the GMR domain at 9-km horizontal resolution. The hourly average NO2 concentrations for the period of: (a) SPS1; and (b) SPS2. Red crosses indicate the locations of OEH stations.

Figure 16. Modelled and observed average concentrations of PM2.5 species at Westmead for SPS1.

Table 1. The 16 major source groups segregated from the 2008 NSW GMR emission inventory.

Table 2. Quantitative performance statistics for predicted hourly PM2.5 concentration against observations at Chullora, Earlwood, Richmond, Liverpool and Wollongong for the periods of SPS1 and SPS2.

Table 3. Quantitative performance statistics for predicted hourly O3 concentration against observations in the Sydney East, Sydney NW, Sydney SW, Illawarra and Newcastle regions for the periods of SPS1 and SPS2.
* MFB_15 and MFE_15 are statistics calculated with a cut-off concentration of 15 ppb applied for background ozone.

Table 4. Quantitative performance statistics for predicted hourly NO2 concentration against observations in the Sydney East, Sydney NW, Sydney SW, Illawarra and Newcastle regions for the periods of SPS1 and SPS2.

Table 5. Comparison of modelled and measured average concentrations of PM2.5 speciated components measured at Westmead during SPS1.