A Novel Flexible Geographically Weighted Neural Network for High-Precision PM2.5 Mapping across the Contiguous United States

Abstract: Air quality degradation has triggered a large-scale public health crisis globally. Machine learning techniques have been applied to the remote sensing estimation of PM2.5. However, many machine learning models ignore the spatial non-stationarity of predictive variables. To address this issue, this study introduces a Flexible Geographically Weighted Neural Network (FGWNN) to estimate PM2.5 from multi-source remote sensing data. FGWNN incorporates the Flexible Geographical Neuron (FGN) and the Geographically Weighted Activation Function (GWAF) within the framework of an Artificial Neural Network (ANN) to capture the intricate spatially non-stationary relationships among predictive variables. An air quality remote sensing estimation model was constructed using Aerosol Optical Depth (AOD), Normalized Difference Vegetation Index (NDVI), Temperature (TMP), Specific Humidity (SPFH), Wind Speed (WIND), and Terrain Elevation (HGT) as inputs, and ground-based PM2.5 as the observation. The results indicate that FGWNN successfully generates PM2.5 data with a 2.5 km spatial resolution for the contiguous United States (CONUS) in 2022. It exhibits higher regression accuracy than traditional ANN and Geographically Weighted Regression (GWR) models. FGWNN holds potential for applications in high-precision and high-resolution remote sensing scenarios.


Introduction
Atmospheric pollution is primarily composed of PM2.5 particles, which can persist in the atmosphere and exert widespread and profound impacts on human health and the environment [1]. Environmental remote sensing provides a means of retrieving PM2.5 concentrations globally, continuously, and in real time [2]. The PM2.5 data from the United States (U.S.) Environmental Protection Agency (EPA), as a product of the U.S. ground-based air quality monitoring network, have undergone rigorous quality validation [3] and are widely utilized in environmental science [4], atmospheric science [5], public health [6], disaster management [7], and other fields. This dataset furnishes researchers and policymakers with crucial data support, helping us to better understand and address atmospheric pollution issues [8]. Nevertheless, existing remote sensing-derived PM2.5 products commonly suffer from low spatial resolution and fail to delineate local details [9]. To overcome this limitation, acquiring high-resolution PM2.5 spatial distribution data is of paramount importance for the dynamic monitoring and control of atmospheric PM2.5 pollution [10][11][12].
In the task of retrieving PM2.5 concentrations, the selection of predictive factors is a critical step. The choice of predictors should be based on atmospheric physical and chemical processes, ecological environment quality, meteorological factors, and geographic information. Common parameters include Aerosol Optical Depth (AOD), Normalized Difference Vegetation Index (NDVI), meteorological conditions, and topographic features [13]. AOD provides global atmospheric optical information and is a significant indicator of the atmospheric physical and chemical evolution of air pollutants [14]. Numerous studies have demonstrated a significant correlation between AOD and surface PM2.5 concentration, making AOD one of the most reliable explanatory factors in PM2.5 prediction [15][16][17]. NDVI reflects vegetation health, land use changes, and ecosystem productivity, serving as an essential measure of ecological environment quality. Studies have shown that NDVI significantly impacts PM2.5 concentrations by reducing dust, adsorbing particles, improving microclimate conditions, reducing pollution sources, and enhancing ecosystem purification functions [18]. Meteorological conditions actively influence the dispersion, dilution, and deposition of pollutants, significantly affecting the spatiotemporal distribution of PM2.5 concentrations [19]. Topographic features alter atmospheric morphology by adjusting air flow, forming temperature inversion layers, influencing meteorological conditions, and creating urban heat island effects, indirectly affecting the spatial distribution of PM2.5 concentrations [20].
ISPRS Int. J. Geo-Inf. 2024, 13, 217
The estimation of PM2.5 through remote sensing involves methods such as AOD inversion [21], atmospheric chemical transport models [22], spatiotemporal interpolation [23], and data assimilation [24], among others. The most widely used parameter is satellite-monitored AOD. To estimate ground-level PM2.5 from AOD, a typical strategy is to establish the statistical relationship between AOD and PM2.5 [25]. The accuracy of these methods is constrained by the number of monitoring stations, remote sensing data resolution, and model quality [26], and a comprehensive approach incorporating multiple data sources and methods is often necessary to enhance inversion accuracy. However, traditional spatial statistical tools tend to focus on detecting spatial relationships in sample data [27], and when the spatial density and uniformity of sampling points are insufficient, the estimation accuracy and confidence decrease significantly [28]. The air quality products derived from ground-based stations can solve these problems.
In recent years, new research has emerged in which geostatistical tools and machine learning methods are used for PM2.5 inversion. Scholars have designed geographically weighted regression (GWR) models [29,30] and mixed-effects models [31] for detecting geographical relationships between PM2.5 and data such as AOD, meteorological parameters, and land use information. A convolutional neural network (CNN) model can utilize the spatial correlation between predictor variables to increase the ground-level PM2.5 estimation accuracy to some extent [32]. Combining AOD and big data, a PM2.5 regression model using the random forest algorithm has been used to assess the risk of air pollution exposure in the Yangtze River Delta urban agglomeration during COVID-19 [33]. These inversion models do not combine nonlinear fitting and spatial relationship detection well. Traditional geostatistical models cannot fit complex nonlinear relationships, while machine learning methods cannot express spatial non-stationarity.
As studies on PM2.5 spatial patterns increase, various machine learning-related methods (e.g., CNN, Artificial Neural Network (ANN), and Generalized Regression Neural Network (GRNN)) have gradually been introduced. To calculate geographically weighted kernels more accurately, the Geographically Neural Network Weighted Regression (GNNWR) combines Ordinary Least Squares (OLS) and neural networks to estimate complex geographical processes [34]. In addition to spatial relationships, time series are also an important research object in the field of GWR. The Geographically and Temporally Weighted Neural Network (GTWNN) accounts for both spatial and temporal non-stationarity and has been applied in high-precision crop yield prediction modeling [35]. To address nonlinearity and spatiotemporal heterogeneity, researchers have proposed another GTWNN using GRNN, which shows superior performance in exploring the spatiotemporal relationship between AOD and PM2.5 [36]. However, these GWR-ANN methods mainly focus on improving the accuracy of regression relationships without considering the impact of training samples on the accuracy of spatial dependence, which reduces predictive performance to some degree.
Many studies have used GWR or neural networks for PM2.5 inversion, with essentially similar data processing methods. When faced with imperfect training samples, the common approach is to first train the optimal regression model [37]. If the prediction samples are also not ideal, one can either enhance the density of prediction samples using interpolation techniques, effectively filling the entire target resolution space, or proceed without any further adjustments [38]. Finally, the prediction samples are input into the regression model to obtain the prediction results. If non-ideal prediction samples are left unprocessed, interpolation methods are used to complete the prediction data [39]. This a posteriori approach results in predictions with high specificity, greatly limiting the model's generalization capability [40]. In contrast, constructing a uniform and dense spatial network would lead to a more comprehensive and accurate understanding of spatial non-stationarity.
This study endeavors to incorporate spatial non-stationarity into a machine learning model for the high-precision estimation of PM2.5 from remote sensing data. The proposed Flexible Geographically Weighted Neural Network (FGWNN) model is designed with the Flexible Geographical Neuron (FGN) and the Geographically Weighted Activation Function (GWAF) to mitigate the negative impacts of uneven and sparse samples on regression accuracy. It enables the simultaneous learning of spatial non-stationarity and global non-linear relationships within the neural network. FGWNN can predict PM2.5 at a 2.5 km spatial resolution over the contiguous U.S. (CONUS) from conventional satellite remote sensing products. The paper is organized as follows. Section 2 describes the study region and data materials. Section 3 provides a detailed description of the FGWNN model design and evaluation. Section 4 demonstrates FGWNN's performance and the spatiotemporal patterns of PM2.5. Recommendations and further discussion are presented in Section 5. Finally, we conclude in Section 6.

Study Area
The CONUS (Figure 1), excluding Alaska and Hawaii, comprises 48 states and the District of Columbia, with a total area of approximately 7.6 million square kilometers, representing over 80% of the nation's land area. The terrain generally exhibits a west-to-east elevation gradient, featuring high mountains and plateaus such as the Rocky Mountains, the Cascade Range, and the Colorado Plateau in the west, and low mountains and plains including the Appalachian Mountains, the Great Plains, and coastal plains in the east. The population of the CONUS is predominantly concentrated in the eastern and western coastal regions, as well as some inland states in the south and west. California, Texas, Florida, and New York are the four most populous states in the CONUS [41].
The factors influencing air quality vary regionally across the CONUS (Figure 2). These variations are attributed to factors such as meteorological conditions, terrain features, and the distribution of emission sources [42]. Extreme events, such as forest fires, dust storms, and volcanic eruptions, can also impact air quality in the CONUS. In recent years, large-scale forest fires in Canada have led to a surge in PM2.5 concentrations, affecting millions of people in the CONUS [43].

Data Sources
In the field of air pollution research, obtaining large-scale and long-term remote sensing data is crucial for understanding the spatiotemporal patterns of air pollutants. Remote sensing datasets from MODIS, Landsat, Sentinel, and others, combined with ground-level air quality monitoring data, have greatly facilitated collaboration and research in the field of air pollution [44]. However, existing air quality data sources exhibit significant differences in spatial resolution, with low-resolution data diminishing the utility and quality of high-resolution data.

EPA PM2.5
Ground-level PM2.5 monitoring data for the CONUS are derived from the EPA's Outdoor Air Quality Data (https://aqs.epa.gov/aqsweb/airdata/download_files.html, accessed on 21 March 2024) [45]. We selected data records from 20 March 2022 to 21 March 2023 as the initial dataset for our study. The annual and seasonal averages of pollutant concentrations were calculated based on 24 h average PM2.5 values (pollutant standard: PM25 24 h 2012). Furthermore, sites included in the average calculation were required to have monitoring data for more than 100 days. A total of 473 valid PM2.5 ground-level monitoring sites were obtained within the study area. The average PM2.5 concentrations for all sites are depicted in Figure 1, reflecting the overall spatial distribution of PM2.5 in the CONUS in 2022.
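The site-selection rule above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (site identifier, day label, 24 h PM2.5 value) and the function name are hypothetical and do not reflect the schema of the EPA download files.

```python
from collections import defaultdict
from statistics import mean

MIN_DAYS = 100  # the text requires monitoring data for "more than 100 days"


def annual_site_means(records, min_days=MIN_DAYS):
    """Average 24 h PM2.5 per site, keeping sites with more than `min_days` daily values.

    `records` is an iterable of (site_id, day, pm25) tuples (hypothetical layout).
    """
    daily = defaultdict(dict)
    for site_id, day, pm25 in records:
        daily[site_id][day] = pm25  # one value per site and day (last one wins)
    return {
        site: mean(values.values())
        for site, values in daily.items()
        if len(values) > min_days
    }


# Toy data: site "A" has 101 daily values, site "B" only 100, so "B" is dropped.
records = [("A", f"day{i}", 8.0) for i in range(101)]
records += [("B", f"day{i}", 5.0) for i in range(100)]
site_means = annual_site_means(records)
print(site_means)
```

Seasonal averages would follow the same pattern after filtering the records by date range.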

MODIS AOD
The MCD19A2 V6.1 data product is a Level-2 gridded product of land-based AOD from the MODIS Terra and Aqua instruments, generated daily at a pixel resolution of 1 km (https://lpdaac.usgs.gov/products/mcd19a2v061/, accessed on 21 March 2024) [46]. The product includes relevant AOD layers, such as 0.47 µm blue band AOD, 0.55 µm green band AOD, and AOD uncertainty. In this study, the 0.47 µm blue band AOD is selected as the research data.

MODIS NDVI
The MODIS Vegetation Index (MYD13Q1) V6.1 data are Level-3 products generated every 16 days with a spatial resolution of 250 m (https://lpdaac.usgs.gov/products/myd13q1v061/, accessed on 21 March 2024) [47]. MODIS NDVI products are calculated from atmospherically corrected bi-directional surface reflectance and are processed to mask water, clouds, heavy aerosols, and cloud shadows.

NOAA RTMA
The Real-Time Mesoscale Analysis (RTMA) is part of the NOAA Analysis of Record (AoR) project (https://www.nco.ncep.noaa.gov/pmb/products/rtma/, accessed on 21 March 2024) [48]. It is a near-surface weather analysis with high spatiotemporal resolution. The product provides hourly analysis data at 2.5 km resolution for CONUS grid cells. The analysis product includes surface-observable weather elements and accounts for terrain effects. It also provides analyses of total cloud cover and visibility. In this study, RTMA provides the meteorological variables (temperature, specific humidity, and wind speed) and the model terrain elevation used for PM2.5 inversion.

Data Preprocessing and Integration
Before modeling, the acquired remote sensing and ground-level datasets must be preprocessed and integrated to ensure data quality and consistency. First, the remote sensing datasets were reprojected to a common spatial reference, the Albers Equal Area Conic projection (ESRI:102003). Second, the meteorological and topographic datasets were processed using the nearest neighbor method, sampling the variables at the coordinates of the PM2.5 monitoring points. Then, a square buffer matching the 2.5 km spatial resolution was created around each PM2.5 monitoring station, and the mean values of AOD and NDVI within the buffer were assigned to the corresponding PM2.5 monitoring point. These preprocessing and integration steps are crucial for generating reliable and robust input data for subsequent modeling work.
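The two sampling steps can be illustrated with a small Python sketch. It assumes that all coordinates are already in projected metres, that the buffer is a 2500 m square centred on the station, and that masked pixels are stored as NaN and skipped; the function names and toy values are hypothetical.

```python
import math


def nearest_value(points, x, y):
    """Return the value of the grid-cell centre closest to (x, y).

    `points` is a list of (cx, cy, value) tuples in projected metres.
    """
    return min(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]


def square_buffer_mean(points, x, y, side=2500.0):
    """Mean of the valid values whose centres fall in a `side`-metre square centred on (x, y)."""
    half = side / 2.0
    inside = [
        v for cx, cy, v in points
        if abs(cx - x) <= half and abs(cy - y) <= half and not math.isnan(v)
    ]
    return sum(inside) / len(inside) if inside else math.nan


# Toy 1 km AOD pixels around a station at the origin (one far pixel, one masked pixel).
aod_pixels = [(-1000.0, 0.0, 0.10), (0.0, 0.0, 0.20), (1000.0, 0.0, 0.30),
              (3000.0, 0.0, 0.90), (0.0, 1000.0, math.nan)]
station_aod = square_buffer_mean(aod_pixels, 0.0, 0.0)

# Toy 2.5 km temperature cells (kelvin); the closer cell centre is selected.
tmp_cells = [(-1250.0, 0.0, 288.0), (900.0, 0.0, 290.0)]
station_tmp = nearest_value(tmp_cells, 0.0, 0.0)
```

In this toy case only the three valid pixels inside the square contribute to the AOD mean, and the temperature comes from the cell centred 900 m away.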

AOD-PM2.5 Model Structure
In this study, PM2.5 is used as the dependent variable, and AOD, NDVI, TMP, SPFH, WIND, and HGT are used as independent variables to construct the PM2.5 inversion model as follows:

We used the average data of 2022 to conduct a multicollinearity diagnosis (Table 1). The results indicate that the variance inflation factors (VIFs) of all independent variables are less than 10, suggesting no significant collinearity among the variables. After standardization of the regression coefficients (Beta), TMP, NDVI, and AOD have the most significant positive impacts on PM2.5.
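As a reading aid, the variable roles described above can be written in a generic functional form, together with the standard definition of the variance inflation factor used in the collinearity check. The function symbol $f$ and the error term $\varepsilon$ are notational assumptions, not the authors' exact formulation.

```latex
\begin{align}
\mathrm{PM}_{2.5} &= f(\mathrm{AOD}, \mathrm{NDVI}, \mathrm{TMP}, \mathrm{SPFH}, \mathrm{WIND}, \mathrm{HGT}) + \varepsilon \\
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2}
\end{align}
```

Here $R_j^2$ is the coefficient of determination obtained by regressing the $j$-th predictor on the remaining five. A VIF below 10 therefore corresponds to $R_j^2 < 0.9$.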

Model Development
To overcome the influence of the spatial distribution and density of training data on modeling, this study designed a specific network structure for FGWNN (Figure 3). In the input layer, n denotes the number of training samples during training and the number of prediction samples during prediction. Note that m in the hidden layer always denotes the number of FGNs and does not change with the working mode. The new network architecture brings two significant advantages: substantial savings in computer storage space and in computation time. The input layer assembles the independent variables (AOD, NDVI, TMP, SPFH, WIND, and HGT) as inputs. The hidden layer stores the FGNs, and a GWAF is set on each corresponding FGN. The output layer contains only one neuron, corresponding to the dependent variable (PM2.5) output. The network weights ($w_j^{[1]}$) and biases ($b_j^{[1]}$) in the input layer are consistent with those of an ANN.
Existing neural network activation functions primarily focus on global function transformations of input signals without considering the influence of spatial location on output signals. The GWAF uses spatial weighting to establish spatial connections between neurons on the same layer, so that different input samples activate the neurons differently. It is a local activation function that controls the activation level of neurons through geographical weighting. The specific formula is as follows:

The GWAF uses spatial weighting to measure spatial correlations, and the activation level of a neuron depends on the spatial distance between the neuron and the target sample. This achieves both feature transformation of input signals and spatial smoothing. The GWAF couples neurons with spatial positions, transforming ordinary neurons into FGNs. The contribution of the output signal to the result depends on the magnitude of the spatial weights. In other words, the GWAF reflects the spatial non-stationarity of the samples.
The regression process of FGWNN can be represented by a matrix equation as follows:

[...] origin + $b^{[2]}$    (6)

In Equation (6), the symbol ⊗ denotes element-wise multiplication of the corresponding elements of the matrices on both sides. This operation yields a new matrix with the same dimensions as the originals.
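One possible reading of this description is sketched below in Python: each hidden neuron carries a position, its ordinary activation is multiplied element-wise by a spatial weight that depends on the distance to the sample, and the gated signals are summed by the single output neuron. The Gaussian gate, the tanh base activation, and every name in the block are assumptions made for illustration, not the authors' formulation.

```python
import math


def gaussian_weight(dist, bw):
    """Fixed Gaussian kernel (assumed form): the weight decays with distance / bandwidth."""
    return math.exp(-0.5 * (dist / bw) ** 2)


def fgwnn_forward(X, sample_xy, neuron_xy, W1, b1, w2, b2, bw):
    """Forward pass of a one-hidden-layer network with spatially gated neurons.

    X: n x p inputs; sample_xy: n sample coordinates; neuron_xy: m neuron coordinates;
    W1: p x m weights; b1: m biases; w2: m output weights; b2: output bias.
    """
    outputs = []
    for x_row, (sx, sy) in zip(X, sample_xy):
        total = b2
        for j, (nx, ny) in enumerate(neuron_xy):
            z = sum(x_k * W1[k][j] for k, x_k in enumerate(x_row)) + b1[j]
            gate = gaussian_weight(math.hypot(sx - nx, sy - ny), bw)
            total += math.tanh(z) * gate * w2[j]  # gated activation, then weighted sum
        outputs.append(total)
    return outputs


# Two samples with identical inputs but different locations; one neuron sits on each sample.
X = [[1.0, 0.0], [1.0, 0.0]]
sample_xy = [(0.0, 0.0), (100.0, 0.0)]
neuron_xy = [(0.0, 0.0), (100.0, 0.0)]
W1 = [[1.0, -1.0], [0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
pred = fgwnn_forward(X, sample_xy, neuron_xy, W1, b1, w2, b2=0.0, bw=10.0)
```

Because the bandwidth is small relative to the spacing, each sample is effectively served by its nearest neuron, so the same inputs yield predictions of opposite sign at the two locations. This is the kind of location-dependent response that the text refers to as spatial non-stationarity.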

Model Hyperparameters Setting
The FGWNN model has various hyperparameters required for neural network learning, and including the spatial bandwidth (bw) among them would increase the computational cost of model training. Considering that the new model inherits the characteristics of GWR, we can use GWR for bandwidth selection, which saves a significant amount of computational resources and time.
In GWR, spatial weighting is a tool used to measure geographical proximity. It defines the strength of the relationship between each geographical location and its neighboring locations, reflecting the spatial correlation of geographical data in the study area. The commonly used Fixed-Gaussian spatial weight calculation formula is as follows:

The choice of the optimal bw directly determines the estimation accuracy of the GWR model, and different diagnostic indicators yield different optimal bandwidths. In the GWR field, the AICc criterion is typically used for bandwidth selection [49], and the mathematical formula is as follows:

Feature recognition results differ across spatial scales: a finer spatial resolution provides more detail but consumes more computational resources. When conducting geographical observations, the observation scale needs to match the spatial scale of the geographical phenomenon. The number of FGNs in FGWNN should therefore be set according to the actual situation. To construct the ideal state of the geographical neural network, the FGN positions need to be homogeneous (uniformly distributed) and compact (of moderate density) in the study area.
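For orientation, a fixed Gaussian kernel and an AICc-based bandwidth search can be sketched as follows. The kernel form exp(-0.5 (d / bw)^2) is one common convention (some authors omit the factor 0.5), and the AICc expression is the one widely used in GWR software, with the error variance estimated as RSS / n; whether the paper uses exactly these forms is an assumption. The candidate bandwidths, RSS values, and traces are invented toy numbers.

```python
import math


def gaussian_weight(dist, bw):
    """Fixed Gaussian kernel: w = exp(-0.5 * (d / bw)^2)."""
    return math.exp(-0.5 * (dist / bw) ** 2)


def aicc(rss, n, trace_s):
    """Corrected AIC for GWR, with sigma^2 estimated as RSS / n.

    trace_s is the trace of the hat matrix (the effective number of parameters).
    """
    return (n * math.log(rss / n) + n * math.log(2.0 * math.pi)
            + n * (n + trace_s) / (n - 2.0 - trace_s))


def select_bandwidth(candidates, n):
    """Return the bandwidth with the lowest AICc.

    `candidates` maps bandwidth -> (rss, trace_s) from a GWR fit at that bandwidth.
    """
    return min(candidates, key=lambda bw: aicc(candidates[bw][0], n, candidates[bw][1]))


# Toy fits: a small bandwidth fits closely but uses many effective parameters.
candidates = {100e3: (900.0, 120.0), 300e3: (1100.0, 40.0), 900e3: (1500.0, 12.0)}
best_bw = select_bandwidth(candidates, n=473)  # 473 sites, as reported in the text
```

With these toy numbers the middle bandwidth wins, because AICc balances goodness of fit against the effective number of parameters.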
FGWNN uses a single-neuron output layer with an L2 norm loss function. Many gradient descent algorithms have been developed; in practice, they are not strictly categorized, and several can be used together. Mini-batch stochastic gradient descent (Mini-Batch SGD) [50] is widely used to split the training set into batches. The NAdam algorithm [51] places stronger constraints on the learning rate and has a more direct impact on gradient updates. The learning rate controls the step size of parameter updates in neural networks. A high value can lead to non-convergence, while a low value increases the learning cost. In the FGWNN model, the learning rate between network layers is used to fine-tune the overall learning rate of the neural network.

Model Evaluation
The error function compares the predicted output with the expected output and calculates the difference between them. Commonly used error functions include the Residual Sum of Squares (RSS), Mean Squared Error (MSE), and Cross-Entropy Error. When an ANN is used for regression problems, the model's loss function typically uses the RSS. Dividing by 2 in the formula makes the derivative easier to calculate. The specific formula is as follows:

$$\mathrm{LOSS} = \frac{1}{2}\sum_{i=1}^{n}\left(\mathrm{target}_i - \mathrm{output}_i\right)^2$$

where LOSS denotes the loss, target denotes the expected output, output denotes the predicted output, and the sum runs over the n samples. Common evaluation metrics for machine learning regression problems include Root Mean Squared Error (RMSE), LOSS, R², and adjusted R² (R²_adj). These metrics suit different scenarios in regression problems, and the choice of the appropriate metric depends on the nature of the problem and the specific requirements for model performance. Given FGWNN's ability to detect spatial non-stationarity, we also introduce local R² and local RMSE to jointly evaluate the model.
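The global metrics named above can be computed as in the following Python sketch. It uses the standard definitions, with LOSS taken as half the residual sum of squares as described in the text; the function name and toy values are illustrative only.

```python
import math


def regression_metrics(target, output, n_predictors):
    """Return LOSS (RSS / 2), RMSE, R2, and adjusted R2 for paired observations."""
    n = len(target)
    rss = sum((t - o) ** 2 for t, o in zip(target, output))
    mean_t = sum(target) / n
    tss = sum((t - mean_t) ** 2 for t in target)  # total sum of squares
    r2 = 1.0 - rss / tss
    return {
        "LOSS": 0.5 * rss,
        "RMSE": math.sqrt(rss / n),
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1),
    }


# Toy observed and estimated PM2.5 values.
target = [6.0, 8.0, 10.0, 12.0]
output = [7.0, 8.0, 10.0, 11.0]
metrics = regression_metrics(target, output, n_predictors=1)
print(metrics)
```

For the model in this study, `n_predictors` would be 6 (AOD, NDVI, TMP, SPFH, WIND, and HGT).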
This study uses K-Fold cross-validation [52] to evaluate the model's performance. In this process, the dataset is divided into K subsets, and the model undergoes K rounds of training and validation. K-Fold cross-validation can reduce the risk of overfitting, improve model robustness, and provide reliable performance estimates. During each round of K-Fold training, FGWNN employs the early-stopping strategy [53]. It stops the model's training when the performance on the validation set no longer improves, helping to avoid local optima and overfitting traps. The combination of K-Fold cross-validation and the early-stopping strategy significantly enhances the model's performance and stability. It ensures that the model not only learns effective features on the training data but also generalizes well to new data.
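The interplay of the two techniques can be sketched in plain Python. The value K = 10, the patience of 3 epochs, and the list of validation losses standing in for a real training loop are all assumptions chosen for illustration; the text does not state these settings.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def train_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement.

    `val_losses` stands in for the per-epoch validation loss of a real training loop.
    """
    best_epoch, best_loss, waited = -1, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss


splits = list(k_fold_indices(473, k=10))  # 473 monitoring sites, as in the text
stop = train_with_early_stopping([5.0, 3.0, 2.5, 2.6, 2.7, 2.8, 1.0])
```

In the toy loss sequence, training stops after three epochs without improvement, so the later value of 1.0 is never reached. This shows the usual trade-off of a short patience window.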

Uniformization Strategy Performance
The FGWNN model can operate in two modes: non-uniform mode and uniform mode. Before the uniformization strategy is activated, the geographic neurons (GNs) in the neural network are aligned with the positions of the ground monitoring stations (473 in total). Their spatial distribution is uneven, with dense coverage along coastal areas and sparse coverage in inland regions (Figure 4a). After uniformization, the GNs are uniformly distributed across the CONUS (Figure 4b). A comparison of the learning curves under the two modes (Figure 4c,d) shows that the non-uniform mode converges later than the uniform mode and fluctuates more during learning. The optimal global R² for the uniform mode is slightly higher than that for the non-uniform mode. To quantify the fitting performance of the FGWNN model under the two strategies, we introduce the local R² metric in Figure 4e,f. The results show that, in the non-uniform mode, the local R² values are higher in densely sampled areas but significantly lower in sparsely sampled areas. After the uniformization strategy is activated, the difference in local R² is notably reduced, with a decrease in local R² in dense areas and an increase in sparse areas. In other words, the uniformization strategy transforms GNs into FGNs, effectively eliminating regression disparities caused by uneven samples.
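A simple way to picture the uniform mode is to replace the station-aligned neuron positions with a regular grid. The sketch below assumes a rectangular bounding box in projected metres and a target of roughly 5000 neurons, the number of FGNs reported elsewhere in the paper; the box coordinates and function name are hypothetical, and a real implementation would also need to clip the grid to the CONUS boundary.

```python
import math


def uniform_neuron_grid(xmin, ymin, xmax, ymax, n_target):
    """Place roughly `n_target` neurons on a regular grid covering a bounding box."""
    width, height = xmax - xmin, ymax - ymin
    step = math.sqrt(width * height / n_target)  # spacing that gives ~n_target cells
    nx = max(1, round(width / step))
    ny = max(1, round(height / step))
    # Neurons sit at cell centres, so spacing is equal along each axis.
    return [
        (xmin + (i + 0.5) * width / nx, ymin + (j + 0.5) * height / ny)
        for j in range(ny) for i in range(nx)
    ]


# Hypothetical bounding box in projected metres (not the real CONUS extent).
grid = uniform_neuron_grid(0.0, 0.0, 4_000_000.0, 2_500_000.0, n_target=5000)
print(len(grid))
```

The grid count is close to, but not exactly, the target because the number of rows and columns must be whole numbers.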

Comparison with Other Models

Overall Comparison of Different Models
Table 2 presents the regression results of five models (MLR, ANN, GWR, GNNWR, and FGWNN) on the PM2.5 data across the CONUS region in 2022. Specifically, the learning rate of all three neural network models (ANN, GNNWR, and FGWNN) is 0.001, with 5000 neurons in the hidden layer of ANN and FGWNN, compared to 473 in GNNWR. A comparison of the computational time across models reveals that FGWNN consumes the most time at 161 s, followed by GNNWR at 109 s, ANN at 93 s, GWR at 86 s, and MLR with the lowest consumption at 28 s. The RMSE and LOSS metrics exhibit a decreasing trend, with values ranking from highest to lowest as MLR, ANN, GWR, GNNWR, and FGWNN. Conversely, R² and R²_adj demonstrate a progressively increasing trend. Overall, the regression performance of FGWNN notably surpasses that of the MLR, ANN, and GWR models, with a small improvement over GNNWR. Figure 6 depicts the scatter plots of PM2.5 estimated values versus observed values for the four models, along with their evaluation metric scores and linear fitting equations. Both GWR and FGWNN exhibit the ability to detect spatial non-stationarity, with significantly improved fitting accuracy compared to MLR and ANN. In comparison to the traditional GWR approach, FGWNN demonstrates an exceptional fitting capability. Its R²_adj increases from less than 0.72 to nearly 0.92, LOSS is reduced by more than 180, and RMSE falls below 0.60 µg/m³. These results indicate that the FGWNN method possesses robust generalization performance, accurately reproducing the original state of the PM2.5-AOD model. In summary, FGWNN outperforms the MLR, ANN, and GWR models in the 2022 assessment.
[...] (Figure 8c,d,g,h), the FGWNN model once again demonstrates significant superiority, with average R² values of 0.89, 0.92, 0.88, and 0.81, respectively. Hence, from a statistical perspective, the FGWNN model is more suitable for satellite-based PM2.5 mapping than the MLR, ANN, and GWR models.

Annual and Seasonal Performance of FGWNN Model
To further understand the annual and seasonal performance of the FGWNN model, we evaluated its spatial performance. Table 3 presents the RMSE results for both the annual and seasonal assessments in 2022, with global RMSE values consistently below 1.0 µg/m³ for all four seasons. The global RMSE values rank as follows: summer < annual < spring < autumn < winter. Local RMSE values between observed and estimated PM2.5 values were calculated for each monitoring station (see Figure 9). Overall, the FGWNN model demonstrates a reliable spatial prediction capability. Specifically, the FGWNN model performs well in the central regions of station clusters but relatively poorly in the peripheral areas of these clusters. This phenomenon may be attributed to the FGWNN model's ability to smooth regression variations between station points. Despite the uneven spatial distribution of the model performance, the FGWNN model demonstrates an excellent predictive capability overall, with over 64% of sites reporting local RMSE values below 1 µg/m³. A comparative analysis suggests that the FGWNN model proposed in this study holds significant potential for satellite-based PM2.5 mapping.

PM2.5 Prediction over CONUS
Once the predictive capability of the FGWNN model has been sufficiently validated, continuous predictions of PM2.5 concentrations across the CONUS region can be made.
Figure 10 displays the annual and seasonal distribution of ground-level PM2.5, based on the inversion of FGWNN (with 5000 FGNs). Overall, the average annual PM2.5 concentration is 7.45 µg/m³, which is 37.9% below the Level 1 standard (12 µg/m³) defined by the US-EPA in 2016. Additionally, it is predicted that approximately 43% of pixel cells in CONUS have an annual PM2.5 concentration exceeding 12 µg/m³. These findings suggest that CONUS still experiences mild PM2.5 pollution, and that incorporating satellite remote sensing can provide more detailed spatial distribution information on atmospheric pollutants than ground-based monitoring alone [54].
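The two summary figures in this paragraph follow from simple arithmetic on the predicted grid. The sketch below is illustrative: the function names are hypothetical, and the small grid is made-up data rather than output of the FGWNN model.

```python
import numpy as np

EPA_ANNUAL_STANDARD = 12.0  # µg/m³, the standard quoted in the text

def percent_below_standard(mean_conc, standard=EPA_ANNUAL_STANDARD):
    """Relative gap between a mean concentration and the standard, in percent."""
    return 100.0 * (standard - mean_conc) / standard

def share_exceeding(grid, standard=EPA_ANNUAL_STANDARD):
    """Fraction of valid (non-NaN) pixel cells whose value exceeds the standard."""
    values = np.asarray(grid, float)
    valid = values[~np.isnan(values)]
    return float((valid > standard).mean())

# The annual mean reported in the text: (12 - 7.45) / 12 is about 37.9%.
print(round(percent_below_standard(7.45), 1))

# Made-up 2 x 2 grid with one missing cell, for illustration only.
toy_grid = [[5.0, 13.0], [float("nan"), 8.0]]
print(share_exceeding(toy_grid))
```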
The results from the five periods reveal a general decrease in PM2.5 from coastal to inland areas, with the western and southeastern regions being more susceptible to air pollution than the central and northern regions (Figure 10). PM2.5 concentrations are generally higher in regions with a Mediterranean climate or a tropical desert climate. The western, northeastern, and southern regions constitute the three major industrial zones in the United States and emit significant amounts of particulate pollutants. Across the first three seasons, air quality in CONUS worsens, with PM2.5 concentrations increasing from 6.16 µg/m³ to 7.80 µg/m³. By winter, air pollution peaks at 7.81 µg/m³ and covers almost the entire CONUS region. Air quality deteriorates progressively in the west as the seasons advance, while pollutants in the east shift from south to north. Additionally, Canadian forest wildfires are significant sources of air pollution in North America, with smoke particles transported into the CONUS airspace by atmospheric movements [55]. The northeastern and western coastal regions are major population and urban centers, yet PM2.5 accumulates continuously there during this period, undoubtedly increasing health risks for local residents [56].

Discussion
The spatial distance between geographical objects determines the strength of their spatial relationships, referred to as spatial dependency [57]. Spatial weighting in GWR [58] describes the varying spatial dependency between individual objects and all objects. In this paper, we define the variation in the strength of this dependency across the region as the spatial dependency field (SDF). Although GWR can effectively detect spatial non-stationarity, the SDF used in the model has two limitations. First, in the sparse state (Figure 11a), training samples are homogeneously distributed in space but with low sample density. The SDF can then only roughly reflect the spatial dependency pattern of the original data, overlooking the finer details. Second, in the biased state (Figure 11b), spatial density is insufficient and training samples are also heterogeneously distributed in space (eccentric and uneven). This situation can significantly deform the SDF, impairing the accurate representation of the original spatial dependency pattern and significantly diminishing the quality of the final model. In conclusion, if an ideal SDF (Figure 11c) adapted to the target spatial scale is constructed, the spatial non-stationarity it learns will be more comprehensive and accurate.
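The idea of a spatial dependency field can be made concrete with a small sketch. It assumes a Gaussian distance-decay kernel, which is a common choice in GWR, and a regular grid of anchor positions standing in for uniformly placed FGNs; the kernel choice, the function names, and the grid layout are assumptions for illustration and are not confirmed as the authors' implementation.

```python
import numpy as np

def gaussian_weights(points, anchors, bandwidth):
    """Weight matrix of shape (n_points, n_anchors): exp(-0.5 * (d / b)^2).

    Coordinates are assumed to be in a projected system so that Euclidean
    distance is meaningful.
    """
    points = np.asarray(points, float)
    anchors = np.asarray(anchors, float)
    dist = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=-1)
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def uniform_anchors(xmin, xmax, ymin, ymax, nx, ny):
    """Regular nx-by-ny grid of anchor positions covering a bounding box."""
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, nx),
                         np.linspace(ymin, ymax, ny))
    return np.column_stack([gx.ravel(), gy.ravel()])

# One sample point, two anchors at distances 0 and 5 (same units as bandwidth).
w = gaussian_weights([[0.0, 0.0]], [[0.0, 0.0], [3.0, 4.0]], bandwidth=5.0)
print(w)  # weight is 1 at distance 0 and exp(-0.5) at distance 5

# Dense, evenly spaced anchors approximate the "ideal" state in Figure 11c.
anchors = uniform_anchors(0.0, 100.0, 0.0, 50.0, nx=5, ny=3)
print(anchors.shape)
```

With sparse or clustered anchors, some locations receive only small weights from every anchor, which is one way to picture the sparse and biased states described above.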
Large-scale, low spatial resolution remote sensing images inevitably suffer from insufficient spatial detail, difficult target identification, limited image quality, and application constraints [59]. The high-definition images inferred and predicted using FGWNN can facilitate the precise identification of areas with air pollution anomalies, providing strong evidence for the analysis of air pollution driving factors [60]. The quality of the SDF constructed by traditional geographical detectors depends on the density and uniformity of the spatial distribution of samples, which in real-world data are often sparse and non-homogeneous [61]. To overcome this limitation, FGWNN automatically allocates a moderate number of homogeneously distributed FGNs to the hidden layer, achieving an ideal SDF state. The FGWNN method proposed in this paper detects spatial relationships effectively by establishing a flexible SDF, which can accurately reconstruct the real features of geographical data. Additionally, it significantly enhances the regression accuracy and spatial resolution of PM2.5 inversion, making it a reliable and efficient remote sensing mapping technique.
In exploring the spatial and temporal patterns of PM2.5 using high-precision remote sensing products from FGWNN inversion, study areas at different levels require products with a compatible spatial resolution, so as to strike a balance between inversion accuracy and computational efficiency. Figure 12 shows the study area at four administrative levels, i.e., national level (CONUS), division level (Pacific Division), state level (California State), and county level (Los Angeles County). When the spatial resolution of the remote sensing image is 20 km or finer, PM2.5 data within the CONUS can be obtained with clear image details and no obvious jaggedness. Refining the spatial resolution of remote sensing products to 10 km can fully demonstrate the spatial distribution characteristics of PM2.5 in the Pacific Division. To clearly detect the air quality distribution pattern in California State, the spatial resolution of remote sensing images needs to be finer than 5 km. Remote sensing products inverted from existing data, with a finest spatial resolution of 2.5 km, can roughly reflect the general situation in Los Angeles County. Theoretically, the FGWNN model can invert remote sensing products at an arbitrary target resolution, provided that the spatial resolutions of the independent variables meet the requirements.
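The trade-off between resolution and computational cost can be sketched by counting grid cells. The example below uses a hypothetical rectangular extent of 1000 km by 500 km, not the actual CONUS or California extents, and the function name is illustrative; it shows only that halving the cell size roughly quadruples the number of cells to be predicted.

```python
import math

def grid_cells(width_km, height_km, resolution_km):
    """Number of cells needed to tile a rectangular extent at a given cell size."""
    return math.ceil(width_km / resolution_km) * math.ceil(height_km / resolution_km)

# Hypothetical extent; the resolutions are those discussed in the text.
for res in (20, 10, 5, 2.5):
    print(res, grid_cells(1000, 500, res))
```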
Although the FGWNN has made some progress, it still has limitations that should be considered in subsequent research or when applying the method more widely. The choice of spatial bandwidth relies on GWR calculations [62], which may limit its practicality in spatiotemporal analysis scenarios. In high-resolution remote sensing data contexts, model training and geographical activation processes incur substantial computational costs [63]. In the future, we plan to optimize the acquisition of spatial bandwidth by embedding this process within FGWNN. Simultaneously, we aim to refine the FGWNN network structure to reduce its dependence on computing resources. It is our hope that the FGWNN model can be extended to the spatiotemporal analysis field, further advancing the development of remote sensing spatiotemporal mapping technology.

Conclusion
In-depth research into the spatiotemporal patterns of air pollution risk in a region can help alleviate concerns about public health crises. Building upon the foundations of GWR and ANN, this study has introduced a novel neural network structure, FGWNN. It can automatically allocate the positions and quantities of FGNs based on the characteristics of the study area, providing simultaneous analysis of the spatial non-stationarity and global nonlinear relationships within the original data. Notably, the ideal-state SDF constructed by the new method can perfectly fit complex spatial non-stationary relationships. We successfully predicted PM2.5 concentrations across the CONUS in 2022, with the regression model's fitting accuracy improved to above 0.90. Despite variations in model performance across different seasons, the PM2.5 products generated at a 2.5 km resolution maintain a high level of fidelity. The remote sensing data produced by FGWNN possess a high spatial resolution, meeting the needs of researchers for air quality assessments at different scales. In the future, we plan to enhance the model's temporal detection capability and expand its application prospects in the spatiotemporal remote sensing domain.

Figure 1. The spatial distribution of PM2.5 concentration monitored by ground-based stations over the CONUS in 2022.

Figure 4. Comparison of the effects before and after implementing the uniformization strategy. (a) Non-uniformization GN; (b) Uniformization GN; (c) Learning curve under non-uniformization mode; (d) Learning curve under uniformization mode; (e) Estimated local R² for non-uniformization mode; (f) Estimated local R² for uniformization mode.

Figure 5 illustrates the diagnostic results of the training effectiveness of the FGWNN model under different numbers of FGNs after activating the uniformization strategy. The upper part of the figure illustrates the change patterns of LOSS and R². As the number of FGNs increases, LOSS initially decreases rapidly, reaches a turning point at 5000 FGNs, and then gradually converges. R² exhibits the opposite pattern. To quantify the memory consumption and runtime associated with different numbers of FGNs, this experiment focuses on the data storage of the spatial weight matrix and the learning time of the FGWNN model. The results indicate that both memory consumption and runtime follow a logarithmic growth pattern. When the number of FGNs is set to 5000, both memory consumption and runtime remain at relatively low levels. This indicates that optimizing the number of FGNs requires a balance between learning effectiveness and training costs, aiming to minimize the dependence on computational resources while ensuring high learning effectiveness.
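The text does not state how the turning point in the LOSS curve was identified. One simple heuristic, sketched here as an assumption rather than as the authors' procedure, is to pick the point of the normalized curve that lies farthest from the straight line joining its first and last points. The function name and the LOSS values below are made up for illustration.

```python
import numpy as np

def elbow_index(x, y):
    """Index of the point farthest from the chord joining the curve's endpoints.

    Both axes are rescaled to [0, 1] first so that the result does not depend
    on the units of x and y.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xn = (x - x.min()) / (x.max() - x.min())
    yn = (y - y.min()) / (y.max() - y.min())
    x0, y0, x1, y1 = xn[0], yn[0], xn[-1], yn[-1]
    # Perpendicular distance from each point to the line through the endpoints.
    dist = np.abs((y1 - y0) * xn - (x1 - x0) * yn + x1 * y0 - y1 * x0)
    dist /= np.hypot(x1 - x0, y1 - y0)
    return int(np.argmax(dist))

# Made-up LOSS values for illustration; not results from the paper.
fgn_counts = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
loss = [100.0, 70.0, 45.0, 25.0, 10.0, 9.0, 8.5, 8.0]
best = elbow_index(fgn_counts, loss)
print(fgn_counts[best])
```

In practice the candidate chosen this way would still need to be checked against memory consumption and runtime, as the paragraph above describes.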

Figure 5. FGN number optimization (marked by red dotted circle) and comparison of computational cost changes.

Figure 7 presents the seasonal performance of the MLR, ANN, GWR, and FGWNN models. For the same sample dataset, the FGWNN model outperforms the other three models in all four seasons. The GWR and ANN models rank next, while the MLR model performs the poorest, with R² values of 0.41, 0.35, 0.46, and 0.20 for the four seasons, respectively. The seasonal performance trend of the MLR model aligns with that of the ANN model, with the highest R² value in autumn and the lowest in winter. Conversely, the performance of the GWR model, like that of the FGWNN model, initially rises, stabilizes, and then declines. Although the RMSE results run contrary to those of the R² metrics, the FGWNN model maintains optimal performance.

Utilizing the local R² and local RMSE results of each model across the four seasons, we constructed boxplots to illustrate the spatial performance of each model. Regarding the local RMSE results (Figure 8a,b,e,f), the average RMSE values of the FGWNN model for the four seasons are 0.64, 0.53, 0.68, and 0.99. Additionally, the boxplots of the FGWNN model show only minor spatial variability. In terms of the local R² values (Figure 8c,d,g,h), the FGWNN model once again demonstrates significant superiority, with average R² values of 0.89, 0.92, 0.88, and 0.81, respectively. Hence, from a statistical perspective, the FGWNN model is more suitable for satellite-based PM2.5 mapping than the MLR, ANN, and GWR models.

Figure 7. Trends in global regression metrics across the four seasons of 2022.

Figure 8. Boxplots of local regression metrics across the four seasons of 2022. (a) Local RMSE in Spring; (b) Local RMSE in Summer; (c) Local R² in Spring; (d) Local R² in Summer; (e) Local RMSE in Autumn; (f) Local RMSE in Winter; (g) Local R² in Autumn; (h) Local R² in Winter.

Figure 9. Local RMSE results for year and seasons in 2022 via FGWNN model.

Figure 11. Three states of SDF. (a) Sparse state; (b) Biased state; (c) Ideal state: uniform and dense placement of FGNs.

Figure 12. Spatial resolutions corresponding to the different study region levels. (a) CONUS; (b) Pacific Division; (c) California State; (d) Los Angeles County.

Table 2. Comparison of PM2.5 regression with different models over the CONUS in 2022.

Table 3. RMSE results for year and seasons in 2022.