Featured Application
A low-cost screening tool for prioritizing inspections of urban fuel stations: satellite imagery and OpenStreetMap are fused to generate station-level risk scores and city-wide risk maps that help authorities rank sites near schools, hospitals, and dense housing for targeted safety planning.
Abstract
This study introduces SquareSwish, a smooth, self-gated activation defined as f(x) = x·σ(x)², where σ denotes the logistic sigmoid, and benchmarks it against ten established activations (ReLU, LeakyReLU, ELU, SELU, GELU, Snake, LearnSnake, Swish, Mish, Hard-Swish) across six CNN architectures (EfficientNet-B1/B4, EfficientNet-V2-M/S, ResNet-50, and Xception) under a uniform transfer-learning protocol. Two geographically grounded datasets are used in this study. FuelRiskMap-TR comprises 7686 satellite images of urban fuel stations in Türkiye, each semantically enriched with OpenStreetMap context and YOLOv8-Small rooftop segmentation (mAP@0.50 = 0.724) to support AI-enabled, ICT-integrated risk screening. FuelRiskMap-UK, comprising 2374 images, is collected in the same fashion. Risk scores are normalized and thresholded to form balanced High/Low-Risk labels for supervised training. Across identical training settings, SquareSwish achieves a top-1 validation accuracy of 0.909 on EfficientNet-B1 for FuelRiskMap-TR and reaches 0.920 when combined with SELU in a simple softmax-probability ensemble, outperforming the other activations under the same protocol. By squaring the sigmoid gate, SquareSwish more strongly attenuates mildly negative activations while preserving smooth, non-vanishing gradients, tightening decision boundaries in noisy, semantically enriched Earth-observation settings. Beyond classification, the resulting city-scale risk layers provide actionable geospatial outputs that can support inspection prioritization and integration with municipal GIS, offering a reproducible, low-cost safety-planning approach built on openly available imagery and volunteered geographic information.
1. Introduction
Activation functions play a pivotal role in deep neural networks by introducing the nonlinearity necessary for modeling complex relationships and by regulating the propagation of gradients during training. For many years, simple rectifiers, such as the Rectified Linear Unit (ReLU) and its variant LeakyReLU, have dominated, in large part because they impose minimal computational cost and largely avoid vanishing gradients for positive inputs. However, the zeroing of negative values and the abrupt change in slope at the origin can limit representational richness and impede optimization stability. In response, recent research has explored smooth, self-gated activations, such as Swish, Mish, and their variants, which multiply the input by a sigmoid gate, thereby producing smooth, continuous nonlinear curves and maintaining small, smooth gradients over a broad range of negative inputs. These advances have yielded consistent improvements on ImageNet and downstream tasks, especially within transfer-learning methodologies where the delicate balance between learning new task-specific features and retaining pre-trained representations is critical.
Motivated by the promise of self-gating and inspired by the straightforward elegance of Swish, SquareSwish, a one-line modification defined as f(x) = x·σ(x)², is proposed. Squaring the gate (the logistic sigmoid function σ(x) = 1/(1 + e^(−x))) sharpens its effect so that negative activations decay more rapidly while small positive inputs are amplified more strongly, tightening the transition around zero without introducing any discontinuity or additional computational overhead. This simple adjustment preserves the smoothness and differentiability that support gradient-based optimization, while further reducing the influence of noisy, mildly negative signals in early network layers.
To assess the efficacy of SquareSwish, the FuelRiskMap-TR dataset is introduced, comprising 7686 high-resolution satellite images of fuel stations across Türkiye. Each image is enriched with semantic context derived from OverpassQL queries for nearby roads, schools, hospitals, and commercial facilities, and is further augmented by rooftop-segmentation outputs from a YOLOv8-Small model. This segmentation network, trained on manually annotated roofs, achieves a mean average precision of 0.724 at a 50 percent intersection-over-union threshold. The resulting building counts and contextual feature counts are combined into a continuous risk score. A risk score defines the potential severity of a fuel station explosion, gas leaks, fires, and other hazards that could impact its surroundings. This raw score is then normalized and thresholded to yield balanced HighRisk and LowRisk labels, ensuring a robust and reproducible target for classification.
Within this framework, SquareSwish is benchmarked against ten established activations (ReLU, LeakyReLU, ELU, SELU, GELU, Snake, LearnSnake, Swish, Mish, and Hard-Swish) across six convolutional architectures (EfficientNet-B1/B4 [1], EfficientNet-V2-M/S [2], ResNet-50 [3], and Xception [4]) spanning lightweight mobile models to high-capacity classifiers. Every experiment is conducted under a rigorously uniform transfer-learning protocol: inputs are resized to 224 × 224 pixels, augmented with horizontal flips and ImageNet-style normalization, and split into an 80 percent training set and 20 percent validation set with fixed random seeds. Training begins with three warm-up epochs in which only the newly added classification head is trainable, followed by up to twenty-seven fine-tuning epochs for the full network, employing a constant dropout rate of 0.30 on the fully connected head, and early stopping after three epochs without validation loss improvement. With all other factors held constant, any observed differences in top-1 accuracy and ensemble performance are attributable directly to the choice of activation function. In addition, cross-country generalization is examined on FuelRiskMap-UK, where the same weak-supervision method and rooftop-mask logic are applied, enabling an explicit assessment of domain shift effects.
On the FuelRiskMap-TR dataset, SquareSwish achieves top-1 validation accuracy of 0.909 with EfficientNet-B1, and, when ensembled with SELU, this increases to 0.920, which surpasses all other activation pairings under identical conditions. Moreover, integrating SquareSwish with the YOLOv8-Small segmentation model and the geospatial risk mapping method yields more coherent and discriminative detection of high-risk fuel stations in densely built urban centers, such as Istanbul. Together, these results demonstrate that SquareSwish not only advances the state of the art in activation-function design but also offers practical benefits in real-world, safety-critical applications.
Highlights of this study are given below. This study:
- Proposes SquareSwish, f(x) = x·σ(x)², a simple self-gated activation for transfer learning in geospatial risk screening.
- Benchmarks SquareSwish against ten established activations across six CNN architectures under a strictly uniform transfer-learning protocol.
- Achieves the strongest FuelRiskMap-TR results on EfficientNet-B1; a lightweight SquareSwish + SELU softmax-probability ensemble provides the best overall discrimination under identical conditions.
- Constructs a reproducible weak-supervision method combining YOLOv8 rooftop segmentation with OpenStreetMap context to derive risk labels and generate province-level choropleth maps.
- Examines cross-country transfer on FuelRiskMap-UK and reports robustness using repeated stratified splits, reducing dependence on a single random seed.
- Uses openly available EO basemaps and volunteered geographic information, supporting reproducibility and low-cost deployment for authorities.
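The softmax-probability ensemble highlighted above simply averages the two models' class-probability vectors before taking the argmax. A minimal sketch is given below; the function names and toy logits are illustrative, not the study's actual code:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(logits_a, logits_b):
    """Average two models' class probabilities and return (argmax, probs)."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    avg = [(x + y) / 2.0 for x, y in zip(pa, pb)]
    return max(range(len(avg)), key=avg.__getitem__), avg

# Toy two-class example: both heads lean toward class 1 (High-Risk).
label, probs = ensemble_predict([0.2, 0.4], [-1.0, 2.0])
```

Averaging probabilities (rather than logits) keeps each model's confidence calibrated on the same [0, 1] scale before combination.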
2. Related Work
Three broad families dominate modern activation design: piecewise-linear rectifiers (ReLU [5], Leaky-ReLU [6]) prized for their negligible cost but vulnerable to dead neurons and gradient sparsity; smooth saturating activations such as ELU [7], SELU [8], GELU [9], Snake and LearnSnake [10] that moderate variance and improve optimization stability at the price of extra arithmetic; and self-gated activations (Swish [11], Mish [12], Hard-Swish [13]) that multiply the input by a data-dependent gate, typically a sigmoid, to keep gradients alive for negative inputs and to obtain the non-monotonic shapes that have proven effective in ImageNet-scale models like EfficientNet and MobileNet [14].
Beyond these canonical families, recent work from 2021 to 2024 has increasingly explored learnable or parameterized activations (e.g., ACON/meta-ACON, which learns when to activate vs. inactivate) and architecture-aware, efficiency-oriented variants such as StarReLU (a squared-ReLU variant introduced in MetaFormer baselines) [15,16]. Recent surveys and broader activation-function taxonomies emphasize that, despite the large number of proposals, only a subset yields consistent and reproducible gains across architectures, with recent trends often favoring smooth and gated forms when optimization stability and transfer performance are priorities [17,18,19].
SquareSwish falls within the self-gated family and offers a smoother, more selective gate than Swish/Mish without introducing extra transcendental evaluations or complex piecewise cases that can complicate deployment on constrained accelerators. In addition, several 2023–2024 studies propose learnable activation mechanisms tailored to transformer blocks or derive new trainable activations via different design principles [20,21]; however, these approaches typically target transformer/LLM settings and/or introduce additional parameters or architectural coupling, and are therefore treated as complementary rather than direct baselines for lightweight CNN transfer learning in geospatial risk mapping. SquareSwish can be used as a drop-in replacement whose key properties (stronger attenuation of negative inputs, near-linear behavior for large positive inputs, and non-vanishing gradients) make it especially suitable for transfer-learning setups and for noisy, weakly labeled problems such as geospatial risk mapping.
Geospatial risk mapping has long relied on GIS-based Multi-Criteria Decision Analysis (MCDA) to combine layers such as slope, lithology, distance to infrastructure, land use, and population density into susceptibility or risk scores through expert-defined weights and linear combination techniques [22,23]. Over the past decade, the availability of open geospatial data, including OpenStreetMap (OSM) and Sentinel imagery, as well as crowdsourced reports and administrative records, has enabled richer, multimodal representations of hazards, exposure, and vulnerability. In particular, semantic enrichment, which augments satellite imagery with contextual features such as roads, schools, hospitals, and commercial facilities extracted from OSM or national datasets, has become a cornerstone of modern workflows, improving the fidelity of exposure proxies across domains like flood susceptibility, wildfire risk, and industrial-infrastructure safety assessment [24,25,26].
To move beyond subjective criteria, classical machine-learning models such as random forests, gradient boosting machines, and support vector machines have been applied to hazard susceptibility and risk mapping. These methods frequently outperform MCDA methods by automatically learning nonlinear feature interactions, while retaining some interpretability through feature-importance measures and partial-dependence plots [27]. The advent of deep learning further transformed the field: convolutional neural networks (CNNs), U-Nets, and vision transformers now dominate image-based tasks such as rooftop segmentation, damage detection, and pixel-level susceptibility mapping [28]. Recent remote-sensing models also increasingly leverage attention and segmentation-specific designs to enhance spectral-spatial representation and dense prediction, including Central Attention Network for hyperspectral imagery classification, Multi-Area Target Attention for multi-scale region emphasis, and SegHSI for end-to-end hyperspectral semantic segmentation under limited labeled pixels [29,30,31]. Because labeled geospatial hazard data are often scarce, transfer learning by fine tuning ImageNet-pretrained architectures such as ResNet, EfficientNet, Xception, and MobileNet has become ubiquitous, yielding substantial gains over training from scratch [32,33]. In parallel, graph neural networks have emerged for modeling risk propagation in structured systems such as power grids, especially in urban gas and fuel station safety applications [34].
A fully reproducible, activation-focused evaluation framework is presented, in which each fuel-station image is enriched with contextual features obtained from semantic queries and rooftop-segmentation outputs generated by YOLOv8-Small. Building footprints, road networks, and other relevant assets are detected and segmented, enabling the density and spatial arrangement of exposures that delineate potential hazard zones to be quantified. Following a two-stage design common in recent hazard-mapping work [35], an instance-level segmentation network (e.g., YOLOv8-Seg or Mask R-CNN) is first applied to quantify asset counts or areas; these derived features are then fed into a simple binary classifier trained to distinguish high-risk from low-risk stations. This method has proven effective across domains, including earthquake damage mapping, forest-fire risk forecasting, and urban gas-supply vulnerability assessment.
SquareSwish bridges the gap between architectural simplicity (requiring only a one-line modification) and real-world efficacy, demonstrating that even minor mathematical refinements to gating mechanics can yield measurable gains in both accuracy and robustness in safety-critical applications. To the best of current knowledge, no geospatial risk-mapping study has systematically benchmarked activation functions under a unified transfer-learning protocol across multiple CNN architectures.
3. Materials and Methods
3.1. Study Area and Data Sources
The study covers the national extent of Türkiye, with samples drawn from urban and peri-urban locations across all regions. Each record represents a 50 m-radius neighborhood centered on a mapped fuel station to capture the immediate built environment most relevant to exposure and consequence assessment. Satellite basemaps are fetched via slippy-tile indices at zoom level 18 (≈0.6 m/pixel) and resampled to 600 × 600 px square patches centered on each station. Fuel-station coordinates and identifiers are obtained from OpenStreetMap (OSM) nodes tagged amenity = fuel (optionally fuel:lpg = yes) using OverpassQL. Fuel-station locations and surrounding contextual features are derived from OpenStreetMap (© OpenStreetMap contributors) using OverpassQL; OpenStreetMap data are available under the Open Database License (ODbL) v1.0, and only derived/aggregated features are reported rather than redistributing OpenStreetMap extracts [36]. For each patch, local contextual features (e.g., roads/streets, schools, hospitals, markets, and other commercial venues) are queried within 50 m via OverpassQL to quantify human presence and vulnerability. Building presence and density are derived from rooftop masks produced by a YOLOv8-Small-Seg model trained on a manually annotated subset of patches. After automated retrieval, all images are manually verified to ensure the station appears within the field of view; misplaced coordinates are corrected and patches re-downloaded. Inputs are standardized to a common resolution, normalization, and file-naming convention to support fully reproducible training and evaluation.
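The OverpassQL retrieval described above can be illustrated as follows. The query strings are representative examples of station and contextual-POI queries, not the exact queries used in the study:

```python
# Representative OverpassQL queries: fuel-station nodes inside an
# administrative area, and contextual POIs within 50 m of a station.

def fuel_station_query(area_name):
    """OverpassQL for nodes tagged amenity=fuel inside a named area."""
    return (
        '[out:json][timeout:120];\n'
        f'area["name"="{area_name}"]->.searchArea;\n'
        'node(area.searchArea)["amenity"="fuel"];\n'
        'out body;'
    )

def context_query(lat, lon, radius_m=50):
    """OverpassQL for nearby schools/hospitals/shops around a station."""
    return (
        '[out:json];\n'
        '(\n'
        f'  node(around:{radius_m},{lat},{lon})["amenity"~"school|hospital"];\n'
        f'  node(around:{radius_m},{lat},{lon})["shop"];\n'
        ');\n'
        'out body;'
    )

station_q = fuel_station_query("Türkiye")
context_q = context_query(41.0082, 28.9784)  # illustrative coordinates
```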
3.2. Dataset Construction
3.2.1. FuelRiskMap-TR Dataset Construction
To prepare a geographically grounded image dataset for evaluating fuel-station risk across Türkiye, all OpenStreetMap (OSM) nodes tagged amenity = fuel (optionally fuel:lpg = yes) within Türkiye’s ISO 3166-1 [37] boundary were queried. For each station, the unique OSM identifier (osm_id), name (when present), and latitude-longitude coordinates were extracted, ensuring coverage of petrol, diesel, liquefied petroleum gas (LPG), compressed natural gas (CNG), and electric-hybrid charging points. High-resolution, true-color satellite imagery was retrieved from ArcGIS World Imagery [38]. Geographic coordinates were converted to XYZ tile indices at zoom level 18 (≈0.6 m/pixel); the corresponding 256 × 256 px tiles were downloaded and mosaicked to provide continuous coverage. From these mosaics, square patches centered on each station (covering a 50 m radius) were extracted and resampled to 600 × 600 px using Lanczos interpolation, preserving spatial detail while limiting aliasing artifacts. Images were named station_<idx>_<city>_<osm_id>.png and stored in a standardized directory structure.
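The coordinate-to-tile conversion mentioned above follows the standard Web-Mercator (slippy-map) tiling scheme; a minimal sketch is:

```python
import math

def deg2tile(lat_deg, lon_deg, zoom=18):
    """WGS84 coordinate -> slippy-map (x, y) tile indices at a given zoom."""
    n = 2 ** zoom                        # number of tiles along each axis
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# (0°, 0°) sits exactly at the midpoint of the zoom-18 tile grid.
x0, y0 = deg2tile(0.0, 0.0)
```

Neighboring tiles around the station's tile are then mosaicked before the 50 m-radius patch is cropped and resampled.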
A manual quality-control pass verified that each patch contained visible station infrastructure; images failing this check had coordinates refined and were re-downloaded under the same file name. After QC, 7686 images out of 11,723 initial candidates were retained and organized into the top-level FuelRiskMap-TR directory. For the activation-function comparison, the dataset was cast as a binary task by placing images into Low-Risk and High-Risk folders, while the Medium-Risk subset was held out for potential semi-supervised analyses to avoid label ambiguity during core benchmarking. This label-balanced collection of 7686 high-risk and low-risk station images forms the foundation for evaluating SquareSwish across multiple convolutional architectures.
3.2.2. FuelRiskMap-UK Dataset Construction
FuelRiskMap-UK was constructed using the same end-to-end method, but restricted to the United Kingdom’s ISO 3166-1 boundary. Fuel-station coordinates were obtained by querying OSM nodes tagged amenity = fuel within the UK boundary; each station record includes osm_id, name (when present), and geographic coordinates. To preserve consistent naming and facilitate regional analyses, each station was additionally associated with an administrative area name derived via a spatial join between station points and UK administrative boundary polygons; this administrative name was used as the <city> token in the filename convention station_<idx>_<city>_<osm_id>.png. Satellite imagery was again retrieved from ArcGIS World Imagery using the same zoom level (18) and patch geometry (50 m radius, resampled to 600 × 600 px with Lanczos interpolation), followed by the same quality-control procedure to ensure that station infrastructure was visible and correctly centered. The initial UK collection contained 3542 images across three risk tiers after images corresponding to stations with zero nearby buildings (which invalidate building-based risk components) were removed. After filtering, FuelRiskMap-UK contains 2374 images for the binary benchmark (High-Risk: 1169, Low-Risk: 1205). The Medium-Risk tier (1168 images) is excluded from the main binary experiments, consistent with FuelRiskMap-TR. For rooftop segmentation in the UK imagery, the same Roboflow annotation schema and the YOLOv8-Small-Seg model trained on FuelRiskMap-TR annotations were used, ensuring consistent mask semantics and building-count extraction across both countries.
3.3. Risk Scoring and Labeling
Each image was further enriched with local semantic context by issuing OverpassQL queries for named features within a 50 m radius, such as streets, schools, and hospitals. These were logged in both a semicolon-separated nearby-names list and a numeric name count. These semantic data were merged with building-count metrics and fuel-station metadata. A composite raw risk score was calculated as a weighted sum of building density and the number of categorized nearby locations:

R_raw = w_b·N_b + w_f·N_f + w_o·N_o

In this formula, w_b, w_f, and w_o refer to the weights of nearby buildings, facilities, and other points of interest, and N_b, N_f, and N_o refer to their quantities. A parameter search identified weights that balanced the Low/Medium/High image folders (2.8 for buildings, 3.0 for facilities, and 0.3 for other nearby points of interest, such as highways). A min-max normalization is then applied to map the raw scores into the [0, 1] interval:

R_norm = (R_raw − R_min) / (R_max − R_min)

where R_min and R_max denote the minimum and maximum raw scores in the dataset.
After min-max normalization of the weighted station-level risk scores, thresholds were chosen empirically to form approximately balanced Low-Risk and High-Risk subsets from the pre-filtered datasets (Türkiye: 11,723 images; UK: 3569 images). Samples falling between the two thresholds were treated as Medium-Risk and excluded from the subsequent binary classification task. For FuelRiskMap-TR, scores < 0.08 were labeled Low-Risk and scores > 0.15 High-Risk, yielding 4038 Low-Risk and 3648 High-Risk images. For FuelRiskMap-UK, the thresholds were <0.13 (Low-Risk) and >0.21 (High-Risk), resulting in 1205 Low-Risk and 1169 High-Risk images (2374 total).
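The scoring-and-labeling pipeline above can be sketched end-to-end. The weights (2.8/3.0/0.3) and FuelRiskMap-TR thresholds (0.08/0.15) are the reported values; the helper names and toy station counts are illustrative:

```python
def raw_risk(n_buildings, n_facilities, n_other,
             w_b=2.8, w_f=3.0, w_o=0.3):
    """Weighted sum of building, facility, and other-POI counts."""
    return w_b * n_buildings + w_f * n_facilities + w_o * n_other

def min_max(scores):
    """Map raw scores onto [0, 1] (assumes at least two distinct values)."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def tier(score, low_thr=0.08, high_thr=0.15):
    """Threshold a normalized score into Low/Medium/High risk tiers."""
    if score < low_thr:
        return "Low-Risk"
    if score > high_thr:
        return "High-Risk"
    return "Medium-Risk"

counts = [(0, 0, 1), (4, 1, 2), (30, 6, 10)]      # toy station contexts
labels = [tier(s) for s in min_max([raw_risk(*c) for c in counts])]
```

Medium-Risk samples (those between the two thresholds) are set aside, leaving the approximately balanced binary task used for benchmarking.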
Figure 1a visualizes the city-level aggregation of the image-based risk scores based on given weights in Table 1, with each Turkish city shaded according to the mean normalized risk of its fuel-station surroundings. The darkest red in the Marmara region, particularly around Istanbul, indicates that images there consistently exhibit the highest composite risk, driven by dense building clusters and abundant nearby facilities, while the palest tones in certain northeastern and central cities reflect sparse development and fewer proximate points of interest. Mid-range oranges across the Aegean and Southeastern Anatolia suggest moderate risk levels that balance urban density with available services. By depicting these spatial variations, the map highlights regions where high structural complexity and proximity to amenities may warrant more stringent safety measures around fuel stations, in contrast to lower-risk areas that may require basic oversight.

Figure 1.
Spatial distribution of average fuel-station risk based on the pre-filtered datasets. Mean normalized station-level risk score (0–1) aggregated by administrative unit; darker shades indicate higher average risk. Risk scores are computed from weighted building-related metrics and nearby-facility categories (Table 1). Administrative boundaries: GADM v4.1. Risk features derived from OpenStreetMap (POIs) and YOLOv8-Small rooftop segmentation. (a) Türkiye: province-level averages (darker = higher). (b) United Kingdom: region-level averages (darker = higher).
Table 1.
Risk items and weights.
Figure 1b provides the same type of region-level aggregation for the United Kingdom, where each administrative region is shaded by the mean normalized image-based risk score of its fuel-station surroundings using the weights in Table 1. The darkest tones concentrate in parts of southern and southeastern England (including the Greater London area and nearby counties), indicating that stations in these regions tend to exhibit the highest composite risk, consistent with dense rooftop/building patterns and high concentrations of nearby facilities/POIs. Lighter shades across large portions of Scotland, Wales, Northern Ireland, and many rural regions of northern England suggest lower average risk, reflecting more dispersed built environments and fewer proximate points of interest. Intermediate orange regions across central England indicate moderate risk levels where urban development and surrounding services are present but less concentrated than in the London-centered area. Overall, as in Türkiye, risk in the UK is spatially heterogeneous and tends to peak in highly urbanized regions, which may warrant stronger safety planning around fuel stations compared with lower-density areas that may require more basic oversight.
3.4. Rooftop Segmentation (YOLOv8-Seg) and Building Metrics
To quantify the number of buildings at each fuel station, a small subset of FuelRiskMap-TR images was annotated in Roboflow [27], and a YOLOv8-Small-Seg model was subsequently trained. From the full pool of 11,723 PNGs, 200 images, which is approximately 1.7% of the total images, were randomly sampled, and every visible rooftop was traced with Roboflow’s Smart-Polygon tool. The resulting Roboflow project was exported in the YOLOv8 segmentation format (one TXT file per image, with class 0 followed by normalized polygon vertices) and split 70:20:10 into 140 training, 40 validation, and 20 test images. The Ultralytics YOLOv8-Small-Seg model (yolov8s-seg.pt) was fine-tuned for 40 epochs at 640 px input resolution with a batch size of 8, using stochastic gradient descent (initial learning rate = 0.005, momentum = 0.937, weight decay = 1 × 10−4; see Table 2).
Table 2.
Configuration of rooftop segmentation model.
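The YOLOv8 segmentation label format described above stores one line per instance: a class id followed by normalized polygon vertices. Building metrics can be derived from such labels roughly as follows; since coordinates are normalized, a shoelace area is directly a fraction of the image area. Helper names are illustrative:

```python
def parse_yolo_seg_line(line):
    """Return (class_id, [(x, y), ...]) from one YOLOv8-seg label line."""
    parts = line.split()
    coords = [float(v) for v in parts[1:]]
    return int(parts[0]), list(zip(coords[0::2], coords[1::2]))

def polygon_area(pts):
    """Shoelace area of a simple polygon given as (x, y) vertices."""
    n = len(pts)
    twice = sum(pts[i][0] * pts[(i + 1) % n][1]
                - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))
    return abs(twice) / 2.0

# A rooftop polygon covering the central quarter of the patch.
cls, roof = parse_yolo_seg_line("0 0.25 0.25 0.75 0.25 0.75 0.75 0.25 0.75")
coverage = polygon_area(roof)   # fraction of the image area
```

Summing such areas per image yields the coverage percentages analyzed later, and counting parsed instances yields the per-image building counts.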
Over 40 epochs, training and validation losses decreased smoothly while precision/recall rose to ~0.70–0.75, yielding mAP@0.50 ≈ 0.74 (masks) and 0.75 (boxes) and mAP@0.50-0.95 ≈ 0.50. Figure 2a,b show nine 600 × 600 px map patches from FuelRiskMap-TR and FuelRiskMap-UK, respectively, overlaid with the YOLOv8-Small-Seg model’s outputs: red, semi-transparent polygons mark each predicted roof and red rectangles outline the associated bounding boxes. Tiles are grouped row-wise by the risk level computed in the hazard-assessment method, High-Risk (top row), Medium-Risk (middle row), and Low-Risk (bottom row), and labelled in the upper-left corner of each panel. High-risk examples exhibit dense roof clusters and frequent overlapping masks, whereas medium-risk scenes contain mixed industrial or roadside sites with moderate object counts, and low-risk stations feature only a few buildings in open surroundings. The accurate alignment of masks with rooftops across all three tiers visually corroborates the quantitative results in this study.
Figure 2.
Rooftop segmentation outputs used for building-related risk features. Example 600 × 600-px patches (≈50 m radius around each station) with YOLOv8-Small-Seg predictions overlaid: semi-transparent red polygons indicate predicted rooftops and red rectangles denote the associated bounding boxes. The text label in each tile indicates the risk tier assigned by the proposed scoring method. The segmentation model was trained on 200 manually annotated images (70/20/10 split). Imagery: Esri World Imagery © Esri, Maxar, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community. (a) FuelRiskMap-TR: sample rooftop predictions across risk tiers from the pre-filtered Türkiye dataset. (b) FuelRiskMap-UK: sample rooftop predictions across risk tiers (same annotation schema and model applied to UK imagery).
Figure 3 illustrates that most points lie close to the identity line, confirming that the model provides reliable roof-count estimates despite being trained on only 200 manually labelled images. Figure 4 shows that in the pre-filtered datasets (11,723 images for Türkiye and 3569 images for the UK, prior to excluding Medium-Risk samples), approximately 3600 of the 11,723 Turkish fuel-station images (≈31%) contain four detected rooftops, whereas the UK distribution peaks at around five detected rooftops per image, accounting for roughly one-third of the 3569 UK images. Figure 5 shows that in both countries, most station-centered patches have 5–25% building coverage; the Türkiye distribution peaks at around 12% (Figure 5a), while the UK distribution exhibits a similar mode in the 10–15% range (Figure 5b). As shown in Figure 6a,b, building footprint coverage increases rapidly as the first few rooftops are detected and then grows more gradually, with most patches in both datasets saturating around ~40–60% coverage; this pattern indicates a sub-linear relationship in which additional buildings tend to add smaller incremental area as counts increase.
Figure 3.
Predicted vs. true building counts for rooftop segmentation. Each dot corresponds to one validation image. The scatter plot compares the predicted building count (derived from YOLOv8-Small-Seg rooftop outputs) against the ground-truth building count on validation data; the red dashed line denotes the ideal agreement.
Figure 4.
Histograms show the number of rooftops/buildings detected per 600 × 600 px station-centered patch using YOLOv8-Small-Seg rooftop counts. (a) Distribution of detected buildings per image in the pre-filtered Türkiye dataset (11,723 station-centered images, including Medium-Risk before exclusion). (b) Distribution of detected buildings per image in the pre-filtered UK dataset (3569 station-centered images, including Medium-Risk before exclusion).
Figure 5.
Percentage image coverage by buildings. Histograms show the distribution of building (rooftop) area coverage (%) per 600 × 600 px station-centered image patch, computed from YOLOv8-Small-Seg rooftop masks. (a) Pre-filtered Türkiye dataset. (b) Pre-filtered UK dataset.
Figure 6.
Building count vs. footprint coverage per image. Scatter plots relate the number of detected buildings (YOLOv8-Small-Seg rooftop instances) to the corresponding building footprint coverage (%) within each 600 × 600 px station-centered image patch, illustrating how count-based and area-based building features vary in the pre-filtered datasets. (a) Pre-filtered Türkiye dataset. (b) Pre-filtered UK dataset.
3.5. Activation Functions Evaluated
Modern convolutional neural networks derive much of their representational power from activation functions that introduce nonlinearity without impeding gradient propagation. Simple rectifiers, such as ReLU and LeakyReLU, remain common because they incur virtually no extra cost: they pass positive inputs unchanged while zeroing or lightly scaling negative values, enabling deep models to train efficiently via backpropagation. To achieve smoother curvature and more stable gradients, researchers proposed saturating smooth activations, including ELU, SELU, GELU, Snake, and LearnSnake, which approach finite asymptotes (or exhibit periodic behavior) outside the near-linear region, improving gradient stability and controlling output variance in normalization-free and recurrent settings. More recently, sigmoid-gated functions such as Swish, Mish, and Hard-Swish multiply the input by a smooth gate σ(x), yielding gently non-monotonic curves that preserve gradient flow even for large negative inputs and have boosted ImageNet accuracy in EfficientNet and MobileNet architectures. These activation families (summarized in Table 3) provide the conceptual foundation for SquareSwish, a new self-gated function that maintains the computational frugality of rectifiers while combining the smooth, adaptive behavior of gated activations with an enhanced gating effect achieved by squaring.
Table 3.
Activation functions evaluated in this study.
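For reference, scalar sketches of several of the gated and rectifier activations discussed above, following their standard published definitions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)                              # piecewise-linear rectifier

def swish(x):
    return x * sigmoid(x)                           # x · σ(x)

def hard_swish(x):
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0    # piecewise approximation of Swish

def mish(x):
    return x * math.tanh(math.log1p(math.exp(x)))   # x · tanh(softplus(x))
```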
3.6. SquareSwish: Definition and Properties
SquareSwish is introduced as a one-line modification of Swish that strengthens the gating effect while retaining smoothness and low computational cost. The function is defined by

f(x) = x · σ(x)²,

where σ(x) = 1/(1 + e^(−x)) is the logistic sigmoid.
Swish multiplies the input by a single sigmoid gate, producing a gently non-monotonic curve that keeps gradients alive for negative inputs, as shown in Figure 7. In practice, this gate remains permissive for mildly negative inputs, allowing noise to propagate through the early layers of CNNs. The squared gate σ(x)² decays twice as fast for negative inputs and rises more steeply for small positive inputs, effectively sharpening the transition around x = 0 without introducing any discontinuity. As x → −∞, f(x) → 0 faster than Swish, suppressing large-magnitude negative activations; for x ≫ 0, σ(x)² approaches 1, so f(x) approaches x, preserving the linear behavior characteristic of ReLU for positive values.
Figure 7.
SquareSwish activation function.
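As a concrete illustration of the definition above, the following minimal scalar sketch contrasts SquareSwish with Swish (function names are ours; a deployed version would operate on framework tensors rather than Python floats):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def swish(x: float) -> float:
    # Swish: x * sigmoid(x), a single self-gate.
    return x * sigmoid(x)

def square_swish(x: float) -> float:
    # SquareSwish: x * sigmoid(x)**2 -- the squared gate decays faster
    # for negative x and stays near-linear for large positive x.
    return x * sigmoid(x) ** 2
```

For example, at x = −2 the squared gate attenuates the output far more strongly than Swish's single gate, while at x = 10 both functions are already nearly the identity.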
When analyzing an activation function, its derivative reveals how learning signals propagate back through the network. If the derivative falls to zero over a finite region of inputs, any neuron receiving those inputs will stop updating its weights altogether and eventually die. By contrast, the gradient of SquareSwish,
d/dx [x · σ(x)²] = σ(x)² (1 + 2x(1 − σ(x))),
never vanishes over any finite region of x.
Therefore, as illustrated by the smooth derivative curve in Figure 8, neurons using SquareSwish continue to receive a usable error signal across the entire input range, regardless of magnitude or sign.
Figure 8.
Derivative of SquareSwish activation function.
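Differentiating SquareSwish(x) = x·σ(x)² by the product rule gives σ(x)²(1 + 2x(1 − σ(x))). A quick numerical sketch (our own check, not code from the study) verifies this closed form against central finite differences:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def square_swish(x):
    return x * sigmoid(x) ** 2

def square_swish_grad(x):
    # d/dx [x * sigmoid(x)^2] = sigmoid(x)^2 * (1 + 2*x*(1 - sigmoid(x)))
    s = sigmoid(x)
    return s * s * (1.0 + 2.0 * x * (1.0 - s))

# Central-difference check of the closed form at several points.
for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
    h = 1e-6
    numeric = (square_swish(x + h) - square_swish(x - h)) / (2 * h)
    assert abs(numeric - square_swish_grad(x)) < 1e-6
```

At x = 0 the gradient is exactly σ(0)² = 0.25, and it remains smooth everywhere, unlike the hard zero of ReLU on the negative half-line.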
3.7. CNN Architectures and Training Protocol
To evaluate SquareSwish on the FuelRiskMap-TR dataset, all CNN-architecture and activation pairs were trained and evaluated under the same conditions, with no special tuning for any model or function, as shown in Table 4. All six CNN architectures and the eleven activation functions shared an identical data pipeline: images were randomly cropped and resized to 224 × 224 pixels, horizontally flipped, and normalized using ImageNet statistics; the train/validation split was 80%/20% with a reproducible random seed. Training followed a consistent schedule that began with three warm-up epochs, during which only the newly added classifier head was trainable. This was followed by up to twenty-seven fine-tuning epochs with the full network unfrozen, using early stopping after three epochs without improvement on the validation set. In every run, the classifier head was trained at a learning rate of 2 × 10−3 during the frozen phase, and the full architecture at 1 × 10−4 once unfrozen. Each model employed the same classification-head architecture (dropout at 0.30, a dense layer of 512 units, the candidate activation function, and a final two-unit output layer) and ran with identical batch size, hardware configuration, and random seeds. By controlling all other variables in this way, any differences observed in top-1 accuracy or ensemble performance can be attributed solely to the activation functions themselves, rather than to differences in preprocessing, learning-rate schedules, or regularization.
Table 4.
Common training hyperparameters for experiments.
A uniform dropout rate of 0.30 is applied to the fully connected (FC) classification head for every model. This level of regularization proved sufficient to prevent over-fitting on the 7686-image FuelRiskMap-TR dataset without under-utilizing any model’s capacity. The head-dropout rate is held constant to prevent differences in regularization from confounding the activation-function comparisons. The entire feature extractor is frozen for the first three epochs, during which only the fully connected (FC) head is trained. After the warm-up phase, the full network is fine-tuned for up to 27 additional epochs, with early stopping triggered after three consecutive epochs without improvement in the validation loss.
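The patience-3 early-stopping rule described above can be sketched as a small helper; this is an illustrative stand-in (the `EarlyStopping` class name and the loss values are ours, not the study's training code):

```python
class EarlyStopping:
    """Stop fine-tuning after `patience` consecutive epochs without
    validation-loss improvement, mirroring the paper's patience-3 rule."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        # Returns True when training should stop.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [0.60, 0.52, 0.55, 0.56, 0.57]  # no improvement after epoch 1
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
```

With this toy loss trace, training halts at epoch index 4, three epochs after the best validation loss of 0.52.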
Each of the eleven activations was plugged into the same data methodology, optimizer, dropout, resolution, and schedule. Table 5 shows the given parameters for activation functions. Maintaining this consistency ensures that differences in accuracy reflect the inherent properties of the activations themselves.
Table 5.
Configuration of activation-function parameters used in experiments.
3.8. Evaluation Metrics and Statistical Tests
All models were evaluated on the held-out 20% split (seed = 42). Each network outputs per-image softmax probabilities p = (p_LowRisk, p_HighRisk); HighRisk is treated as the positive class, and the score s = p_HighRisk is used. Threshold-independent metrics are ROC area under the curve (AUC) and average precision (AP), computed from s across all thresholds. Threshold-dependent summaries (reported at τ = 0.5) include overall accuracy, the confusion matrix, and class-wise precision/recall/F1. The two-model ensemble forms probabilities by summing the two models' softmax vectors; for ROC/PR curves the two models' HighRisk components were averaged to keep scores in [0, 1], and for labels the argmax of the summed vector was taken.
Paired comparisons on the same images were conducted using: McNemar’s test with continuity correction on the 2 × 2 table of “A correct/B incorrect” vs. “A incorrect/B correct” built from τ = 0.5 predictions; and a two-sided Wilcoxon signed-rank test on per-sample true-class probabilities (the probability assigned to the ground-truth label), which is sensitive to ranking/calibration differences without fixing a threshold. Uncertainty on performance gaps is quantified with non-parametric bootstrap confidence intervals (CIs) from 10,000 resamples (sampling images with replacement), reporting the 2.5th-97.5th percentiles and the bootstrap median for ΔACC (change in the classification accuracy) and ΔAUC (change in the area under the curve). Metrics use scikit-learn; McNemar uses statsmodels; Wilcoxon uses SciPy. Exact p-values were reported, with statistical significance defined as p < 0.05.
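McNemar's test with continuity correction, as applied above, reduces to a chi-square statistic on the two discordant counts. A minimal self-contained sketch (the study itself used statsmodels; this implementation and its example counts are ours) shows the computation:

```python
import math

def mcnemar_cc(b: int, c: int):
    """McNemar's test with continuity correction.
    b = images where model A is correct and model B is wrong,
    c = images where model A is wrong and model B is correct."""
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Hypothetical discordant counts: a clearly asymmetric disagreement
# yields a small p-value.
stat, p = mcnemar_cc(40, 15)
```

Only the discordant pairs enter the statistic; images both models classify identically carry no information about which model is better.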
4. Results
4.1. Rooftop Segmentation Performance
Figure 9 depicts the precision-recall (PR) curve for rooftop/building-mask segmentation on the held-out validation split of the annotated rooftop dataset used to train YOLOv8-Small-Seg. Each point along the curve corresponds to a different prediction confidence threshold. As the confidence threshold is lowered, recall (the fraction of ground-truth roofs detected) increases toward 1, while precision (the fraction of predicted masks that are correct) decreases. In the high-confidence regime (recall ≤ 0.2), the curve remains near the upper-left corner, indicating very high precision for the most confident detections. As the threshold is relaxed, precision decreases gradually and then drops more steeply at higher recall, approaching zero at full recall as increasingly many low-confidence predictions (including false positives) are accepted. Overall performance is summarized by the area under the curve, reported as mAP@0.50 = 0.724 in the legend, suggesting that an operating point at a moderate confidence threshold can balance precision and recall around 0.7.
Figure 9.
Precision-recall (PR) curve for rooftop/building-mask segmentation using YOLOv8-Small-Seg on the held-out validation split of the annotated rooftop dataset (mAP@0.50 = 0.724).
4.2. Single-Model Classification Results
Table 6 and Table 7 report the validation accuracies obtained by training each CNN architecture with each candidate activation function (rows 1 to 11), separately for FuelRiskMap-TR and FuelRiskMap-UK. The last row of each table gives the accuracy of a two-model ensemble formed from the same architecture's two highest-scoring single-activation runs. On FuelRiskMap-TR (Table 6), the proposed SquareSwish achieves the best single-model result for EfficientNet-B1 (0.9090), outperforming the standard activations tested for that architecture. Across the remaining architectures, the top single-activation results are obtained by Snake for EfficientNet-B4 (0.8914) and EfficientNetV2-S (0.8596), Hard-Swish for EfficientNetV2-M (0.8583), Mish for ResNet-50 (0.9077), and ELU for Xception (0.8953). These results indicate that activation choice meaningfully affects performance and that SquareSwish is a strong alternative to widely used functions, particularly on EfficientNet-B1 for the Türkiye dataset.
Table 6.
Validation accuracies of single activation functions and two-model ensemble compositions on FuelRiskMap-TR dataset.
Table 7.
Validation accuracies of single activation functions and two-model ensemble compositions on FuelRiskMap-UK dataset.
On FuelRiskMap-UK (Table 7), the highest single-model accuracy is achieved by Xception (0.8737) using either LeakyReLU or Swish, while the proposed SquareSwish produces the best result for EfficientNet-B1 (0.8653). When comparing the best-performing activation per architecture on the UK dataset, the EfficientNet-B1 + SquareSwish result (0.8653) constitutes the second-highest architecture-wise best accuracy after Xception (0.8737). This shows that SquareSwish remains competitive under domain shift and provides one of the strongest architecture-level outcomes on the UK dataset.
The 0.920 accuracy for EfficientNet-B1, obtained by ensembling SquareSwish with SELU, is the best overall result on the FuelRiskMap-TR dataset.
4.3. Two-Model Ensemble Performance
To assess whether complementary activation functions can further improve validation accuracy, a simple two-model ensemble per architecture is formed by summing the class-probability vectors from the two best single-activation runs (ranked in Table 6 and Table 7) and predicting the class with the maximum combined probability:
p_ens = p_A + p_B, ŷ = argmax(p_ens).
The predicted label is the argmax of the summed probability vector, and comparing these predictions with the ground truth yields the two-model ensemble accuracy. The resulting ensemble accuracies are reported in the last row of Table 6 and Table 7. On FuelRiskMap-TR (Table 6), ensembling consistently improves performance, with the best overall ensemble obtained by EfficientNet-B1 (0.92) when combining SquareSwish + SELU. This result not only represents the strongest ensemble accuracy among all tested architectures on the Türkiye dataset, but also demonstrates that the proposed SquareSwish contributes effectively in a complementary pairing rather than only as a standalone activation. Notably, ResNet-50 and Xception ensembles also achieve high validation accuracies (0.91 and 0.90, respectively), confirming that probability-level ensembling can provide an additional performance margin over single runs.
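The ensemble rule described above can be written in a few lines; this is a minimal sketch with hypothetical softmax outputs (index 0 = LowRisk, index 1 = HighRisk), not the study's actual code:

```python
def ensemble_predict(p_a, p_b):
    """Sum two models' softmax vectors and take the argmax, as in the
    paper's two-model ensemble. Also returns the averaged HighRisk
    component, kept in [0, 1] for ROC/PR analysis."""
    combined = [a + b for a, b in zip(p_a, p_b)]
    label = combined.index(max(combined))      # argmax of summed vector
    high_risk_score = combined[1] / 2.0        # average of the two gates
    return label, high_risk_score

# Toy example: the two models disagree, but the summed evidence
# favors HighRisk (class 1).
label, score = ensemble_predict([0.30, 0.70], [0.55, 0.45])
```

Summing (equivalently, averaging) softmax vectors is the simplest probability-level fusion; because both inputs are valid distributions, halving the summed HighRisk component yields a score directly usable as a ranking threshold.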
On FuelRiskMap-UK (Table 7), the best ensemble performance is obtained by Xception (0.88) using Swish + LeakyReLU, while the second-best ensemble accuracy (0.87) is shared by EfficientNet-B1 (SquareSwish + ReLU) and ResNet-50 (ReLU + ELU). Importantly, this places SquareSwish inside a second-best ensemble configuration on the UK dataset, supporting its robustness and usefulness under cross-country domain differences. The ensemble results reinforce the benefit of combining complementary activations and highlight SquareSwish as part of the best ensemble on FuelRiskMap-TR and part of a second-best ensemble on FuelRiskMap-UK.
The softmax-probability ensemble of two EfficientNet-B1 heads (SquareSwish and SELU) gives the best validation results. On the held-out test sets (n = 1537; HighRisk = 729, LowRisk = 808) it achieves ACC = 0.9167, ROC-AUC = 0.973, and PR-AUC (AP) = 0.971 (Figure 10a and Figure 11a). The ROC curve hugs the top-left of the plot, and the precision-recall curve remains high across a wide recall range, indicating strong ranking and detection power for HighRisk stations. The corresponding UK ensemble (Xception with Swish + LeakyReLU) achieves ROC-AUC = 0.929 and PR-AUC = 0.923 (Figure 10b and Figure 11b), confirming strong discriminative ability across both datasets.
Figure 10.
Precision-Recall (PR) curves of the best-performing ensemble models on the FuelRiskMap datasets: (a) PR curve of the SquareSwish + SELU ensemble using EfficientNet-B1 on the FuelRiskMap-TR dataset; (b) PR curve of the Swish + LeakyReLU ensemble using Xception on the FuelRiskMap-UK dataset.
Figure 11.
ROC curves of the best-performing ensemble models on the FuelRiskMap datasets: (a) ROC curve of the SquareSwish + SELU softmax-probability ensemble using EfficientNet-B1 on the FuelRiskMap-TR dataset (AUC = 0.973); (b) ROC curve of the Swish + LeakyReLU softmax-probability ensemble using Xception on the FuelRiskMap-UK dataset (AUC = 0.929).
Figure 12 reports the confusion matrices for both the TR and UK datasets, with HighRisk as the positive class. In FuelRiskMap-TR (Figure 12a), the SquareSwish + SELU softmax-probability ensemble (EfficientNet-B1) yields TP = 646, FN = 83, FP = 45, and TN = 763, giving recall = 88.6%, precision = 93.5%, and specificity = 94.4%. These results indicate a well-balanced detector that remains highly sensitive to hazardous sites while keeping false alarms low. In FuelRiskMap-UK (Figure 12b), the confusion matrix gives TP = 198, FN = 24, FP = 33, and TN = 220, resulting in recall = 198/222 = 89.2%, precision = 198/(198 + 33) = 85.7%, and specificity = 220/253 = 87.0%. Compared with TR, the UK ensemble shows similarly strong sensitivity, with a modest reduction in precision and specificity due to more false-positive HighRisk predictions.
Figure 12.
Confusion matrices of the best-performing ensemble models on the FuelRiskMap datasets: (a) confusion matrix of the SquareSwish + SELU softmax-probability ensemble using EfficientNet-B1 on the FuelRiskMap-TR dataset; (b) confusion matrix of the Swish + LeakyReLU softmax-probability ensemble using Xception on the FuelRiskMap-UK dataset.
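The class-wise rates quoted for Figure 12 follow directly from the confusion-matrix counts; a short sketch reproduces the TR numbers (the helper name is ours):

```python
def rates(tp: int, fn: int, fp: int, tn: int):
    """Recall (sensitivity), precision, and specificity from a binary
    confusion matrix with HighRisk treated as the positive class."""
    recall = tp / (tp + fn)          # detected HighRisk / all HighRisk
    precision = tp / (tp + fp)       # correct HighRisk / predicted HighRisk
    specificity = tn / (tn + fp)     # correct LowRisk / all LowRisk
    return recall, precision, specificity

# Counts reported for the TR ensemble (Figure 12a):
# TP = 646, FN = 83, FP = 45, TN = 763.
r, p, s = rates(tp=646, fn=83, fp=45, tn=763)
```

Evaluating the helper on these counts recovers recall ≈ 0.886, precision ≈ 0.935, and specificity ≈ 0.944, matching the percentages reported in the text.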
Despite the rooftop-segmentation model's moderate validation performance (mAP@0.50 = 0.724), the downstream classifier remains robust: the ensemble outperforms the single models (SquareSwish: ACC = 0.899; SELU: ACC = 0.906). Formal significance tests for the ensemble's gains are summarized in Section 4.4.
4.4. Statistical Significance and Confidence Intervals
To reduce dependence on a single 80/20 partition with a fixed random seed, paired comparisons were repeated over three stratified 80/20 splits (seeds 42, 0, and 1), retraining the compared activation variants on each split. On FuelRiskMap-TR, the SquareSwish + SELU softmax-probability ensemble produced the best average performance across splits (ACC = 0.9129 ± 0.0043, ROC-AUC = 0.9706 ± 0.0069, AP = 0.9700 ± 0.0067), followed by SELU (ACC = 0.9033 ± 0.0058, ROC-AUC = 0.9687 ± 0.0018, AP = 0.9680 ± 0.0030). SquareSwish showed lower mean accuracy (ACC = 0.8912 ± 0.0394) and noticeably higher split-to-split variability (ROC-AUC = 0.9548 ± 0.0264, AP = 0.9529 ± 0.0265). For paired significance, McNemar’s test (τ = 0.5) exhibited split sensitivity (e.g., for TR, Ensemble vs. SELU yielded p = 5.3 × 10−4 at seed 42 but p = 0.787 at seed 0), indicating that significance conclusions can vary across partitions even when mean metrics favor the ensemble. Across splits, Wilcoxon tests on split-level deltas did not reach significance with three paired observations (e.g., TR: ΔACC(Ensemble-SELU) two-sided p = 0.1088). Confidence intervals were used as the primary robustness check. On TR, bootstrap CIs across splits supported a positive ensemble advantage over SELU in accuracy (ΔACC CI95% = [0.00195, 0.02016], median 0.00954), while ROC-AUC differences were smaller and could include zero (ΔAUC CI95% = [−0.00643, 0.00709]).
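The percentile-bootstrap procedure used above (resampling images with replacement and reading off the 2.5th-97.5th percentiles of the accuracy gap) can be sketched in pure Python. The function name and the toy paired outcomes are ours; the study used 10,000 resamples on real per-image correctness indicators:

```python
import random

def bootstrap_delta_acc_ci(correct_a, correct_b, n_boot=2000, seed=42):
    """Percentile bootstrap CI (2.5th-97.5th) for the accuracy gap
    between two models, given paired per-image 0/1 correctness lists."""
    rng = random.Random(seed)
    n = len(correct_a)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample images
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        deltas.append(acc_a - acc_b)
    deltas.sort()
    return deltas[int(0.025 * n_boot)], deltas[int(0.975 * n_boot)]

# Toy paired outcomes where model A is clearly stronger (ΔACC = 0.20):
a = [1] * 90 + [0] * 10
b = [1] * 70 + [0] * 30
lo, hi = bootstrap_delta_acc_ci(a, b)
```

Because images are resampled jointly for both models, the pairing is preserved, and a CI that excludes zero (as in the toy case) supports a genuine accuracy advantage rather than sampling noise.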
On FuelRiskMap-UK, performance gaps were smaller and less consistent. The ensemble achieved ACC = 0.7509 ± 0.0235, ROC-AUC = 0.8439 ± 0.0226, AP = 0.8213 ± 0.0249, while LeakyReLU slightly exceeded the ensemble in mean accuracy (0.7572 ± 0.0301) but not in ROC-AUC/AP. Paired tests on UK did not provide reliable evidence of an ensemble gain over the strongest single model across splits (e.g., bootstrap ΔACC(Ensemble-LeakyReLU) CI95% = [−0.0147, 0.0000], median −0.00632; ΔAUC CI95% = [−0.00649, 0.00684]), suggesting that any activation/ensemble advantage is dataset-dependent and may be attenuated under cross-country domain shift. Repeated-split evaluation confirms that the ensemble yields the strongest point estimates on TR and competitive ranking performance on UK, while also showing that statistical significance can change with the split; achieving consistently significant activation-driven gains across countries may require explicit domain-adaptation and/or region-specific rooftop annotations.
The paired analyses across repeated stratified splits show that activation/ensembling effects are dataset-dependent and domain-dependent. In FuelRiskMap-TR, the SquareSwish + SELU softmax-probability ensemble yields the strongest average performance across splits and its accuracy improvement over SELU is supported by bootstrap confidence intervals that remain positive, whereas gains over SquareSwish are less stable due to higher split-to-split variance for that model. In FuelRiskMap-UK, performance differences are smaller and the ensemble does not consistently surpass the strongest single model in accuracy, with confidence intervals that include zero for key comparisons, indicating that the measurable advantage of activation choice can diminish under cross-country domain shift. This behavior is consistent with differences in imagery characteristics and urban morphology, and may be amplified by using rooftop masks learned from TR imagery when extracting UK building features. These results motivate multi-region rooftop supervision and domain-adaptation strategies to obtain more consistent activation-induced gains across countries.
5. Discussion and Limitations
This study examined how a small change in gating mechanics can matter in a noisy, weakly supervised geospatial setting. Squaring the sigmoid gate in SquareSwish consistently sharpened class separation on the FuelRiskMap-TR task, yielding the best single-model result on EfficientNet-B1 and, when paired with SELU in a simple softmax-probability ensemble, the strongest overall performance. The gains are modest in absolute terms but consistent on the held-out test sets: the ensemble outperformed both single models by ~1-2% accuracy and ~0.6-0.9% ROC-AUC, with bootstrap confidence intervals supporting the accuracy gain, although Section 4.4 shows that paired-test significance (McNemar, Wilcoxon) varies across resampled splits. In a safety context, such increments translate into fewer missed high-risk stations or fewer false alarms at essentially no architectural cost.
Mechanistically, SquareSwish suppresses mildly negative activations more strongly than Swish while preserving linear growth for large positives and non-vanishing gradients everywhere. This appears beneficial in transfer learning, where early features can carry label noise (e.g., from imperfect segmentation or incomplete POI queries). Despite a moderate rooftop-segmentation quality (mAP@0.50 = 0.724), the classifier remained accurate, suggesting that the activation’s selective gating helps downstream robustness to imperfect auxiliary signals. The ensemble’s improvement over either constituent further indicates that SquareSwish and SELU emphasize partially complementary margins, and that combining calibrated probabilities, even with a single architecture, offers a simple, compute-light path to better discrimination.
Results were not uniform across all models, showing that activation-architecture interactions matter. SquareSwish was most effective on EfficientNet-B1, whereas other families (e.g., Hard-Swish, Snake/Mish) were competitive. This underscores the value of treating the activation as a first-class hyperparameter rather than a fixed default, especially in remote-sensing methods that mix imagery with semantic/contextual features.
Geographical domain shift is also expected to affect performance when transferring from FuelRiskMap-TR to FuelRiskMap-UK, because station surroundings, urban morphology, roof styles, and imagery appearance differ across countries, and OpenStreetMap completeness may vary by region. Consistent with this, the strongest gains of the SquareSwish + SELU ensemble are observed in-domain on FuelRiskMap-TR, whereas UK improvements are smaller and not statistically reliable under paired tests, suggesting reduced separability and/or increased feature noise under cross-domain transfer. This cross-country gap is plausibly amplified by applying a rooftop-segmentation model trained on TR annotations to UK scenes, where appearance differences can propagate into building-derived risk features and weaken downstream classification margins.
Several limitations temper external validity. First, risk labels are derived from heuristics (building density and POI counts) with thresholds chosen to balance classes; balancing improves statistical power and comparability but departs from real-world prevalence and may inflate threshold-free metrics under prior shift. Importantly, these heuristic thresholds were not validated by external domain experts in this study, so the labels should be interpreted as proxy risk categories suitable for benchmarking under weak supervision rather than definitive safety ratings. In addition, the risk-score weights were fixed by informal trial and adjustment and were not subjected to a systematic sensitivity analysis; the robustness of the proxy labels to alternative weightings therefore remains an open question. Second, OSM completeness and ArcGIS imagery quality vary by location; missing POIs or outdated tiles can bias supervision. Moreover, calibration was not assessed in the main text; although ROC-AUC and Average Precision (AP) capture ranking quality, deployment decisions often require calibrated probabilities and cost-sensitive thresholds. Third, although repeated stratified resamples were used to reduce sensitivity to a single seed, a more exhaustive evaluation (e.g., many more resamples, full k-fold cross-validation, and geographically disjoint testing) was not feasible within the current computational and time budget, because each split requires retraining the models and rerunning the end-to-end method. Accordingly, the reported results should be interpreted as evidence of trends under practical constraints, and more extensive resampling and cross-region validation are left for future work. Finally, a threshold sweep (τ ∈ [0.1, 0.9]) confirms that accuracy varies with τ but that method ranking remains stable across a mid-range; deployment should additionally consider calibration and cost-sensitive thresholds.
Practically, the findings recommend a simple recipe for urban fuel-station risk mapping (and analogous multimodal remote-sensing tasks): replace the default activation with SquareSwish in the classification head (and, if feasible, key blocks), pair it with a contrasting smooth activation such as SELU and average softmax probabilities at inference, and maintain a lightweight two-stage design to keep training stable under weak labels. Future extensions should probe radius and resolution sensitivity, alternative segmentation models, activation placement beyond the head, domain shifts across countries and imagery providers, and explicit probability calibration for operational thresholding. Planned extensions also include domain-expert review of a stratified subset (e.g., high-confidence vs. borderline cases), sensitivity analysis of the heuristic thresholds, and validation against independent evidence where accessible (e.g., inspection outcomes, incident reports, or other regulatory/open datasets), to quantify how well the proxy labels align with real-world hazard assessments.
6. Conclusions
SquareSwish, a self-gated activation function defined as SquareSwish(x) = x·σ(x)², is introduced and evaluated on two geographically grounded fuel-station risk datasets under a strictly uniform transfer-learning protocol. Under identical image resolution, optimizer and learning-rate schedules, warm-up/fine-tuning budgets, dropout, and hardware settings, SquareSwish is compared against ten established activations (ReLU, LeakyReLU, ELU, SELU, GELU, Snake, LearnSnake, Swish, Mish, and Hard-Swish) across six CNN architectures (EfficientNet-B1/B4, EfficientNet-V2-M/S, ResNet-50, and Xception). On the 7686-image FuelRiskMap-TR dataset, SquareSwish achieved top-1 validation accuracy of 0.909 on EfficientNet-B1, which increases to 0.920 when combined with SELU in a softmax-probability ensemble, indicating that strengthening the sigmoid gate can sharpen decision boundaries without adding architectural complexity.
To enrich supervision and construct FuelRiskMap-TR, a YOLOv8-Small rooftop-segmentation model was trained (mAP@0.50 = 0.724). Building counts derived from rooftop instances, together with OverpassQL features (e.g., proximity to roads, schools, and hospitals), enabled station-level risk scoring and city-level choropleth maps for Türkiye, illustrating how activation-level improvements can be paired with geospatial context to support practical inspection prioritization.
To reduce dependence on a single 80/20 split with a fixed seed, paired comparisons were additionally repeated over three stratified 80/20 splits (seeds 42, 0, and 1) with retraining on each split. On FuelRiskMap-TR, the SquareSwish + SELU ensemble produced the best average results across splits (ACC = 0.9129 ± 0.0043, ROC-AUC = 0.9706 ± 0.0069, AP = 0.9700 ± 0.0067), and bootstrap confidence intervals across splits supported a positive accuracy advantage over SELU (ΔACC CI95% = [0.00195, 0.02016]). On FuelRiskMap-UK, gaps were smaller and less consistent across splits, and bootstrap intervals suggested that any ensemble gain over the strongest single model may include zero, reinforcing that activation/ensemble benefits can be dataset- and domain-dependent under cross-country transfer.
The results support SquareSwish as a competitive, drop-in activation, especially in transfer learning with weak/noisy supervision, while highlighting that consistent cross-country gains may require region-specific rooftop annotations and/or explicit domain-adaptation strategies. Future work can extend evaluation to larger architectures and non-vision modalities, explore activation substitution beyond the classifier head, study robustness under distribution shift and label noise, and assess probability calibration for operational thresholding.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The datasets and code used in this study are available from the corresponding author upon reasonable request. Third-party basemap and vector data are not redistributed; they are attributed to © OpenStreetMap contributors (Open Database License, ODbL v1.0) and Esri World Imagery © Esri, Maxar, GeoEye, Earthstar Geographics, CNES/Airbus DS, USDA, USGS, AeroGRID, IGN, and the GIS User Community. Administrative boundaries are from GADM v4.1 (© GADM; CC BY 4.0).
Acknowledgments
This work contains information from OpenStreetMap (ODbL v1.0) and imagery from Esri World Imagery (attributions as listed above).
Conflicts of Interest
The author declares no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| ACC | Accuracy |
| AI | Artificial Intelligence |
| AP | Average Precision |
| AUC | Area Under the ROC Curve |
| B1/B4 | EfficientNet-B1/EfficientNet-B4 |
| CDF | Cumulative Distribution Function |
| CNG | Compressed Natural Gas |
| CNN | Convolutional Neural Network |
| EO | Earth Observation |
| FC | Fully Connected |
| GADM | Database of Global Administrative Areas |
| GELU | Gaussian Error Linear Unit |
| GIS | Geographic Information System |
| IoU | Intersection over Union |
| ICT | Information and Communication Technologies |
| LPG | Liquefied Petroleum Gas |
| LR | Learning Rate |
| mAP | Mean Average Precision |
| MCDA | Multi-Criteria Decision Analysis |
| ODbL | Open Database License |
| OSM | OpenStreetMap |
| PR | Precision-Recall |
| ROC | Receiver Operating Characteristic |
| R2 | Coefficient of Determination |
| SELU | Scaled Exponential Linear Unit |
| TP/FP/TN/FN | True Positive/False Positive/True Negative/False Negative |
| V2-M/S | EfficientNet-V2-M and EfficientNet-V2-S |
| YOLO | You Only Look Once (Object Detection Family) |
References
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
- Tan, M.; Le, Q. Efficientnetv2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10096–10106. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Chollet, F. Xception: Deep Learning With Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
- Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. Proc. icml 2013, 30, 3. [Google Scholar]
- Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (elus). arXiv 2015, arXiv:1511.07289. [Google Scholar]
- Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-normalizing neural networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 972–981. [Google Scholar]
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
- Ziyin, L.; Hartwig, T.; Ueda, M. Neural networks fail to learn periodic functions and how to fix it. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 1583–1594. [Google Scholar]
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for activation functions. arXiv 2017, arXiv:1710.05941. [Google Scholar] [CrossRef]
- Misra, D. Mish: A self regularized non-monotonic activation function. arXiv 2019, arXiv:190808681. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Wuraola, A.; Patel, N.; Nguang, S.K. Efficient activation functions for embedded inference engines. Neurocomputing 2021, 442, 73–88. [Google Scholar] [CrossRef]
- Ma, N.; Zhang, X.; Liu, M.; Sun, J. Activate or not: Learning customized activation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8032–8042. [Google Scholar]
- Yu, W.; Si, C.; Zhou, P.; Luo, M.; Zhou, Y.; Feng, J.; Yan, S.; Wang, X. Metaformer baselines for vision. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 896–912. [Google Scholar] [CrossRef] [PubMed]
- Apicella, A.; Donnarumma, F.; Isgrò, F.; Prevete, R. A survey on modern trainable activation functions. Neural Netw. 2021, 138, 14–32. [Google Scholar] [CrossRef]
- Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108. [Google Scholar] [CrossRef]
- Kunc, V.; Kléma, J. Three decades of activations: A comprehensive survey of 400 activation functions for neural networks. arXiv 2024, arXiv:2402.09092. [Google Scholar] [CrossRef]
- Fang, H.; Lee, J.U.; Moosavi, N.S.; Gurevych, I. Transformers with learnable activation functions. In Proceedings of the Findings of the Association for Computational Linguistics: EACL, Dubrovnik, Croatia, 2–6 May 2023; pp. 2382–2398. [Google Scholar]
- Huang, A.H.; Schlag, I. Deriving Activation Functions Using Integration. arXiv 2024, arXiv:2411.13010. [Google Scholar]
- Pugliese Viloria, A.d.J.; Folini, A.; Carrion, D.; Brovelli, M.A. Hazard susceptibility mapping with machine and deep learning: A literature review. Remote Sens. 2024, 16, 3374. [Google Scholar] [CrossRef]
- Al-Najjar, H.A.H.; Pradhan, B. Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks. Geosci. Front. 2021, 12, 625–637. [Google Scholar] [CrossRef]
- Jokar Arsanjani, J.; Zipf, A.; Mooney, P.; Helbich, M. An introduction to OpenStreetMap in Geographic Information Science: Experiences, research, and applications. In OpenStreetMap in GIScience: Experiences, Research, and Applications; Springer: Cham, Switzerland, 2015; pp. 1–15. [Google Scholar]
- Li, X.; Song, L.; Liu, L.; Zhou, L. GSS-RiskAsser: A Multi-Modal Deep-Learning Framework for Urban Gas Supply System Risk Assessment on Business Users. Sensors 2021, 21, 7010. [Google Scholar] [CrossRef]
- Karakuş, O.; Corcoran, P. A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing. arXiv 2025, arXiv:2506.19860. [Google Scholar]
- Dodangeh, E.; Choubin, B.; Eigdir, A.N.; Nabipour, N.; Panahi, M.; Shamshirband, S.; Mosavi, A. Integrated machine learning methods with resampling algorithms for flood susceptibility prediction. Sci. Total Environ. 2020, 705, 135983. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Liu, H.; Li, W.; Xia, X.G.; Zhang, M.; Gao, C.Z.; Tao, R. Central attention network for hyperspectral imagery classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 8989–9003. [Google Scholar] [CrossRef]
- Liu, H.; Li, W.; Xia, X.G.; Zhang, M.; Tao, R. Multiarea target attention for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5524916. [Google Scholar] [CrossRef]
- Liu, H.; Li, W.; Xia, X.G.; Zhang, M.; Guo, Z.; Song, L. Seghsi: Semantic segmentation of hyperspectral images with limited labeled pixels. IEEE Trans. Image Process. 2024, 33, 6469–6482. [Google Scholar] [CrossRef]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Yang, Y.; Li, S.; Zhang, P. Data-driven accident consequence assessment on urban gas pipeline network based on machine learning. Reliab. Eng. Syst. Saf. 2022, 219, 108216. [Google Scholar] [CrossRef]
- Ma, H.; Liu, Y.; Ren, Y.; Wang, D.; Yu, L.; Yu, J. Improved CNN classification method for groups of buildings damaged by earthquake, based on high resolution remote sensing images. Remote Sens. 2020, 12, 260. [Google Scholar] [CrossRef]
- OpenStreetMap Contributors. Copyright and License. Available online: https://www.openstreetmap.org/copyright (accessed on 21 December 2025).
- ISO 3166-1; Codes for the Representation of Names of Countries and Their Subdivisions—Part 1: Country Code. ISO: Geneva, Switzerland, 2020.
- Esri. ArcGIS World Imagery (ArcGIS Online Basemap Service). Available online: https://services.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer (accessed on 21 December 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.