MDPI - Publisher of Open Access Journals

25 pages, 5559 KB

Open Access Article

WildfireGO: A Multi-Source Wildfire Detection and Validation System Integrating Crowdsourcing, Satellite Hotspots, and Deep Learning

by Supattra Puttinaovarat, Aekarat Saeliw, Siwipa Pruitikanee, Jinda Kongcharoen, Jariya Seksan, Attaporn Wangpoonsarp, Thidapath Anucharn and Niti Iamchuen

Appl. Syst. Innov. 2026, 9(7), 136; https://doi.org/10.3390/asi9070136 (registering DOI) - 26 Jun 2026

Abstract

Wildfires pose serious risks to ecosystems, air quality, and human health. Effective wildfire monitoring requires accurate detection and timely validation, but current approaches are often constrained by fragmented data sources, false alarms, and delays in field verification. This study presents WildfireGO, a multi-source wildfire detection and validation system that integrates crowdsourced observations, satellite hotspot data, and image-based classification in a geospatial monitoring environment. The system combines user-submitted images, Sentinel-2 imagery, and Moderate Resolution Imaging Spectroradiometer (MODIS) hotspot data processed through Google Earth Engine (GEE) to support wildfire detection and verification. Four classification models were evaluated using 10-fold cross-validation and an independent test dataset of 800 wildfire-related images: a Convolutional Neural Network (CNN), Random Forest (RF), K-Nearest Neighbors (KNN), and Gradient Boosting (GB). The CNN model performed best, reaching an accuracy of 97.5% on the independent test dataset. By combining image-based classification with crowdsourced reporting, the system screens user-submitted wildfire information and helps reduce false detections. Satellite-derived hotspot data provide spatial evidence for cross-checking reported events and improve situational awareness for wildfire monitoring and response planning. WildfireGO supports near real-time data submission, automated processing, and interactive map-based visualization through a web-based interface. The findings indicate that combining crowdsourced reports, satellite observations, and image classification in a single geospatial system could support more reliable wildfire detection. It could also provide practical support for environmental monitoring, disaster response, and spatial decision-making.
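The cross-checking step described above can be sketched as a simple triage rule. It weighs a crowdsourced report against the image classifier's output and against nearby satellite hotspots. The sketch is illustrative only. The `Point` type, the `triage` function, the 0.5 probability threshold, and the 2 km matching radius are all hypothetical choices, not parameters reported for WildfireGO. In practice, MODIS hotspots would be retrieved through GEE rather than listed by hand.

```python
import math
from dataclasses import dataclass

# Mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


@dataclass
class Point:
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def near_hotspot(report: Point, hotspots: list[Point], radius_km: float) -> bool:
    """True if any satellite hotspot lies within radius_km of the report."""
    return any(haversine_km(report, h) <= radius_km for h in hotspots)


def triage(report: Point, fire_prob: float, hotspots: list[Point],
           prob_threshold: float = 0.5, radius_km: float = 2.0) -> str:
    """Hypothetical rule combining classifier output with hotspot proximity."""
    image_says_fire = fire_prob >= prob_threshold
    hotspot_nearby = near_hotspot(report, hotspots, radius_km)
    if image_says_fire and hotspot_nearby:
        return "confirmed"      # both sources agree
    if image_says_fire or hotspot_nearby:
        return "needs_review"   # sources disagree: route to a human
    return "rejected"           # neither source supports the report


hotspots = [Point(18.80, 98.95)]
print(triage(Point(18.81, 98.95), 0.93, hotspots))  # confirmed
```

A real deployment would probably also restrict matches to hotspots detected within a time window around the report. That temporal check is omitted here for brevity.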

(This article belongs to the Section Information Systems)

28 pages, 125254 KB

Open Access Article

Bridging Image-Based Detection and Field Evaluation: A Semi-Automated Pavement Distress Assessment Framework

by Betül Değer Şitilbay and Mehmet Ozan Yılmaz

Sustainability 2026, 18(10), 4935; https://doi.org/10.3390/su18104935 - 14 May 2026

Viewed by 276

Abstract

Accurate, rapid, and consistent evaluation of pavement condition across large-scale road networks is critical for sustainable maintenance and rehabilitation planning. However, conventional approaches largely rely on manual visual inspections, which are time-consuming, subjective, and difficult to implement at the network level. This study proposes a semi-automated pavement distress evaluation framework that integrates field-based assessment with computer vision techniques. The study was conducted on a 3 km roadway network within the Yıldız Technical University Davutpaşa Campus. Field-based distress observations were used as reference data. Street-level images from the Mapillary platform were analyzed with a deep learning-based YOLOv8 model trained on the RDD2022 dataset, which was developed specifically for road distress detection. The analysis focuses on crack and pothole distress, which have a dominant influence on PCR and are highly distinguishable in image-based approaches. Correlation analyses between automated detection results and field-based data show strong agreement, reaching ρ ≈ 0.90 on some routes. These findings indicate that these distress types effectively represent variations in pavement condition. The results demonstrate that multi-source image data and deep learning-based detection methods can be reliably used for section-level pavement condition assessment. The proposed approach addresses a key gap in the literature by transforming image-level detections into engineering-based decision-support information. By leveraging publicly available data sources, the framework offers a low-cost and scalable solution for rapid preliminary assessment over large road networks. It therefore has significant potential for sustainable infrastructure management and for developing data-driven maintenance strategies. Several practical challenges encountered during the detection process are discussed. These include sensitivity to contrast enhancement parameters, false positives from shadows and surface reflections, heterogeneous image resolution across crowdsourced imagery, and training distribution gaps for locally prevalent infrastructure features. Directions for reducing human intervention through adaptive preprocessing and targeted model refinement are also identified.
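The agreement between automated detections and field observations is summarised by a rank correlation. The sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of average ranks over road sections. The per-section values are made up and are not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-section values: automated detections vs. a field distress score.
detections = [12, 5, 30, 8, 21]
field_scores = [40, 22, 85, 18, 70]
print(spearman_rho(detections, field_scores))  # 0.9
```

Ranking first makes the coefficient insensitive to the very different scales of detection counts and field scores. It only asks whether the sections are ordered the same way.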

30 pages, 8293 KB

Open Access Article

Food Origin Authenticity Using Deep Learning and Citizen Science: Bananas Case Study

by Nikolaos Fragkos, Yamine Bouzembrak, Sara Wilhelmina Erasmus and Filipi Miranda Soares

Foods 2026, 15(10), 1628; https://doi.org/10.3390/foods15101628 - 7 May 2026

Viewed by 651

Abstract

This study introduces an Artificial Intelligence (AI)-based proof-of-concept approach to tackle food fraud. It uses convolutional neural networks (CNNs) and citizen science-generated imagery to predict the country of origin of Cavendish banana cultivars (Musa spp.). A total of 6000 images were collected from iNaturalist, and a CNN classifier was trained to distinguish bananas sourced from six countries. Transfer learning was leveraged. Among the nine pre-trained models tested, MobileNetV1 offered the best trade-off between performance and computational efficiency. Following model fine-tuning, data augmentation was applied to mitigate class imbalance and ensure a dense feature space. The model achieved an average accuracy of 0.86 with Monte Carlo cross-validation and 0.77 with 5-fold cross-validation. The final selected model attained a validation accuracy of 0.79. Accordingly, this study should be viewed as a foundational proof of concept demonstrating the potential of AI for origin detection at the cultivation stage. The current evaluation framework reflects an early-stage experimental setting. Even so, the findings introduce a promising new dimension for proactive food fraud detection. Moving forward, this pipeline provides a foundation that can be expanded and independently validated.
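The two reported scores (0.86 and 0.77) come from different resampling schemes. The sketch below shows the structural difference between them. The helper names and the 20% test fraction are hypothetical choices for illustration. In k-fold cross-validation, every sample falls in exactly one test fold. Monte Carlo cross-validation instead draws independent random train/test splits, so some samples may be tested repeatedly and others never.

```python
import random


def k_fold_splits(n: int, k: int, seed: int = 0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def monte_carlo_splits(n: int, n_repeats: int, test_fraction: float = 0.2, seed: int = 0):
    """Yield independent random (train, test) splits; test sets may overlap across repeats."""
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


# Count how often each sample lands in a Monte Carlo test set.
tested = [0] * 10
for _, test in monte_carlo_splits(10, n_repeats=5):
    for i in test:
        tested[i] += 1
print(tested)  # unlike k-fold, these counts need not all equal 1
```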

(This article belongs to the Special Issue Artificial Intelligence and Modeling Science in the Food Industry)

17 pages, 3309 KB

Open Access Article

Semantic Segmentation for Walkability Assessment in Southeast Asian Streetscapes

by Yunkyung Choi, Darren Ho Di Xiang and Samuel Chng

Sustainability 2026, 18(3), 1355; https://doi.org/10.3390/su18031355 - 29 Jan 2026

Cited by 1 | Viewed by 880

Abstract

Walkable urban environments are increasingly recognized as essential for sustainable mobility, public health, and social well-being. Macro-scale indicators of walkability are widely used, but growing evidence highlights the importance of street-level physical conditions experienced at eye level. Advances in computer vision and street view imagery (SVI) offer new opportunities to quantify such streetscape characteristics. Yet the applicability of existing semantic segmentation models in developing urban contexts remains underexplored. This study evaluates the suitability of five state-of-the-art semantic segmentation models for streetscape analysis using crowdsourced SVI from Phnom Penh, Cambodia. A comparative analysis identified OneFormer as the most suitable model. It was uniquely successful in identifying street vendors (through the surrogate semantic class “base”) and street furniture. A quantitative validation against manually annotated images confirmed the model’s reliability, with a mean Intersection over Union (mIoU) of 65.7% within the complex urban fabric of Phnom Penh. This performance is attributed to OneFormer’s unified task-conditioned framework, which integrates semantic, instance, and panoptic information within a single query. This architecture improves boundary stability and semantic coherence by consolidating visual noise into meaningful units. That makes it particularly robust for processing the irregular street elements typical of Southeast Asian cities. Applying the selected model revealed pronounced spatial variation in streetscape composition across three neighborhoods, reflecting distinct development stages and levels of informality. These findings suggest that carefully selected pretrained models can yield analytically useful representations of streetscape conditions in data-constrained settings. Such models can support more context-sensitive and inclusive urban analysis in rapidly developing cities.
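The mIoU figure reported above is the mean, over classes, of the per-class Intersection over Union, IoU_c = TP_c / (TP_c + FP_c + FN_c). The NumPy sketch below assumes integer label maps for both ground truth and prediction, and it skips classes absent from both maps when taking the mean. Pixel accuracy, the other common segmentation metric, falls out of the same confusion matrix. The toy label maps are hypothetical.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes: int) -> np.ndarray:
    """Rows are ground-truth classes, columns are predicted classes."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = (gt >= 0) & (gt < num_classes) & (pred >= 0) & (pred < num_classes)
    idx = num_classes * gt[valid] + pred[valid]
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def segmentation_scores(gt, pred, num_classes: int):
    """Return (mIoU, pixel accuracy, per-class IoU)."""
    cm = confusion_matrix(gt, pred, num_classes)
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union  # NaN for classes absent from both maps
    miou = float(np.nanmean(iou))
    pixel_acc = float(tp.sum() / cm.sum())
    return miou, pixel_acc, iou


gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
miou, acc, iou = segmentation_scores(gt, pred, num_classes=3)
print(round(miou, 3), round(acc, 3))  # 0.5 0.667
```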

31 pages, 6944 KB

Open Access Article

Prompt-Based and Transformer-Based Models Evaluation for Semantic Segmentation of Crowdsourced Urban Imagery Under Projection and Geometric Symmetry Variations

by Sina Rezaei, Aida Yousefi and Hossein Arefi

Symmetry 2026, 18(1), 68; https://doi.org/10.3390/sym18010068 - 31 Dec 2025

Viewed by 1017

Abstract

Semantic segmentation of crowdsourced street-level imagery plays a critical role in urban analytics. It enables pixel-wise understanding of urban scenes for applications such as walkability scoring, environmental comfort evaluation, and urban planning, where robustness to geometric transformations and projection-induced symmetry variations is essential. This study presents a comparative evaluation of two primary families of semantic segmentation models: transformer-based models (SegFormer and Mask2Former) and prompt-based models (CLIPSeg, LangSAM, and SAM+CLIP). The evaluation uses images from Google Street View and Mapillary with varying geometric properties: normal perspective, fisheye distortion, and panoramic format. These formats represent different forms of projection symmetry and symmetry-breaking transformations. Each model is evaluated on a unified benchmark with pixel-level annotations for key urban classes, including road, building, sky, vegetation, and additional elements grouped under the “Other” class. Segmentation performance is assessed through metric-based, statistical, and visual evaluations, with mean Intersection over Union (mIoU) and pixel accuracy serving as the primary metrics. Results show that LangSAM is robust across image formats. Its mIoU scores are 64.48% on fisheye images, 85.78% on normal perspective images, and 96.07% on panoramic images, indicating strong semantic consistency under projection-induced symmetry variations. Among transformer-based models, SegFormer proves the most reliable and attains the highest accuracy of all models on fisheye and normal perspective images. Its mIoU scores are 72.21%, 94.92%, and 75.13% on fisheye, normal, and panoramic imagery, respectively. LangSAM not only demonstrates robustness across different projection geometries but also delivers the lowest segmentation error, consistently identifying the correct class for corresponding objects. In contrast, CLIPSeg remains the weakest prompt-based model, with mIoU scores of 77.60% on normal images, 59.33% on panoramic images, and a substantial drop to 59.33% on fisheye imagery, reflecting sensitivity to projection-related symmetry distortions.
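The three image formats differ in how viewing angle maps to image position. The sketch below contrasts three mappings: the rectilinear (pinhole) model, r = f·tan θ; the equidistant fisheye model, r = f·θ; and the equirectangular mapping commonly used for panoramas. Which fisheye model the Street View and Mapillary imagery actually follows is an assumption here, since equisolid and other models are also common.

```python
import math


def rectilinear_radius(theta: float, f: float) -> float:
    """Pinhole/perspective model: r = f * tan(theta), valid for theta < pi/2."""
    return f * math.tan(theta)


def equidistant_fisheye_radius(theta: float, f: float) -> float:
    """Equidistant fisheye model: r = f * theta (one of several fisheye models)."""
    return f * theta


def equirectangular_pixel(lon: float, lat: float, width: int, height: int) -> tuple[float, float]:
    """Map longitude in [-pi, pi) and latitude in [-pi/2, pi/2] to panorama pixel coordinates."""
    x = (lon + math.pi) / (2 * math.pi) * width
    y = (math.pi / 2 - lat) / math.pi * height
    return x, y


theta = math.radians(60)
print(round(rectilinear_radius(theta, 1.0), 3), round(equidistant_fisheye_radius(theta, 1.0), 3))  # 1.732 1.047
```

At θ = 60°, a pinhole image places a point about 1.73f from the centre, whereas an equidistant fisheye places it about 1.05f away. This is one reason objects near the periphery look stretched in one format and compressed in the other.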

(This article belongs to the Special Issue Symmetry/Asymmetry in Artificial Intelligence for Point Cloud Data Processing)

36 pages, 39262 KB

Open Access Article

Exploration of Differences in Housing Price Determinants Based on Street View Imagery and the Geographical-XGBoost Model: Improving Quality of Life for Residents and Through-Travelers

by Shengbei Zhou, Qian Ji, Longhao Zhang, Jun Wu, Pengbo Li and Yuqiao Zhang

ISPRS Int. J. Geo-Inf. 2025, 14(10), 391; https://doi.org/10.3390/ijgi14100391 - 9 Oct 2025

Cited by 6 | Viewed by 2829

Abstract

Street design quality and socio-economic factors jointly influence housing prices, but their intertwined effects and spatial variations remain under-quantified. Housing prices not only reflect residents’ neighborhood experiences but also stem from the spillover value of public streets perceived and used by different users. Taking Tianjin as a case study, this research treats the street environment as an immediate-experience proxy for through-travelers. It combines street view images and crowdsourced perception data to extract both subjective and objective indicators of the street environment, and it integrates neighborhood and location characteristics. We use Geographical-XGBoost (G-XGBoost) to evaluate the relative contributions of multiple factors to housing prices and their spatial variations. The results show that incorporating both subjective and objective street information into the Hedonic Pricing Model (HPM) improves its explanatory power. Local modeling with G-XGBoost further reveals significant heterogeneity in the strength and direction of effects across locations. Street greening, educational resources, and transportation accessibility are consistently associated with higher housing prices, but the strength of these associations varies by location. Core urban areas exhibit a “counterproductive effect” in terms of complexity and recognizability. Peripheral areas show a “barely acceptable effect,” which may increase cognitive load and uncertainty for through-travelers.
In summary, street environments and socio-economic conditions jointly influence housing prices via a “corridor-side–community-side” dual pathway. The former (enclosure, safety, recognizability) corresponds to immediate improvements for through-travelers, while the latter (education and public services) corresponds to long-term improvements for residents. Therefore, core urban areas should control design complexity and optimize human-scale safety cues. Peripheral areas should focus on enhancing public services and transportation and on meeting basic quality thresholds with green spaces and open areas. Urban renewal within a 15-minute walking radius of residential areas is expected to improve both daily travel experiences and neighborhood quality for residents and through-travelers. Such renewal would support differentiated housing policy development and enhance overall quality of life.
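G-XGBoost fits gradient-boosted trees locally, which is beyond a short example. The underlying idea of local modelling can still be illustrated with a geographically weighted least-squares sketch: nearby observations get more weight when estimating effects at a location. This is a swapped-in linear technique (GWR-style weighting with a Gaussian kernel), not the authors' model. The synthetic data and the bandwidth are hypothetical.

```python
import numpy as np


def gaussian_weights(coords: np.ndarray, target: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel weights that decay with distance from `target`."""
    d = np.linalg.norm(coords - target, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def local_wls(X: np.ndarray, y: np.ndarray, coords: np.ndarray,
              target: np.ndarray, bandwidth: float) -> np.ndarray:
    """Weighted least squares centred on `target`; returns [intercept, slopes...]."""
    w = gaussian_weights(coords, target, bandwidth)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta


# Hypothetical data: the same predictor raises "price" in the west and lowers it in the east.
xs = np.arange(10.0)
X = np.concatenate([xs, xs])
y = np.concatenate([2 * xs + 1, -3 * xs + 5])
coords = np.array([[-10.0, 0.0]] * 10 + [[10.0, 0.0]] * 10)

west = local_wls(X, y, coords, np.array([-10.0, 0.0]), bandwidth=1.0)
east = local_wls(X, y, coords, np.array([10.0, 0.0]), bandwidth=1.0)
print(west.round(3), east.round(3))
```

The same predictor gets a positive coefficient in the west and a negative one in the east. This mirrors, in a toy setting, the location-dependent changes in effect direction that local modelling is meant to reveal.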

25 pages, 4433 KB

Open Access Article

Mathematical Analysis and Performance Evaluation of CBAM-DenseNet121 for Speech Emotion Recognition Using the CREMA-D Dataset

by Zineddine Sarhani Kahhoul, Nadjiba Terki, Ilyes Benaissa, Khaled Aldwoah, E. I. Hassan, Osman Osman and Djamel Eddine Boukhari

Appl. Sci. 2025, 15(17), 9692; https://doi.org/10.3390/app15179692 - 3 Sep 2025

Cited by 1 | Viewed by 1616

Abstract

Emotion recognition from speech is essential for human–computer interaction (HCI) and affective computing, with applications in virtual assistants, healthcare, and education. Deep learning has made significant advances in Automatic Speech Emotion Recognition (ASER). The task nonetheless remains challenging because of speaker variability, subtle emotional expressions, and environmental noise. Practical deployment therefore depends on a robust, fast, and scalable recognition system. This work introduces a new framework that combines DenseNet121, specifically fine-tuned for the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D), with the Convolutional Block Attention Module (CBAM). DenseNet121’s effective feature propagation captures rich, hierarchical patterns in the speech data. CBAM, in turn, focuses the model on emotionally significant elements by applying both channel-wise and spatial attention. Furthermore, an advanced preprocessing pipeline, including log-Mel spectrogram transformation and normalization, enhances the input spectrograms and strengthens robustness against environmental noise. The proposed model shows strong performance on CREMA-D. To ensure a robust evaluation under class imbalance, we report the Unweighted Average Recall (UAR), which reaches 71.01%, and the F1 score, which reaches 71.25%. The model also achieves a test accuracy of 71.26% and a precision of 71.30%. Compared with recent work, these results establish the model as a promising solution for real-world speech emotion detection. They highlight its strong generalization capabilities, computational efficiency, and focus on emotion-specific features. The improvements demonstrate practical flexibility, enabling the integration of established image recognition techniques and allowing for substantial adaptability in various application contexts.
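UAR is the unweighted mean of per-class recall, so every emotion class counts equally regardless of how many clips it has. The minimal sketch below uses hypothetical labels to show why UAR is preferred to plain accuracy under class imbalance.

```python
def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recall; each class present in y_true counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced test set and a classifier that always predicts "neutral".
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10
print(accuracy(y_true, y_pred), unweighted_average_recall(y_true, y_pred))  # 0.8 0.5
```

The degenerate classifier looks respectable on accuracy but scores only chance-level UAR, because it never recognises the minority class.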

(This article belongs to the Section Computing and Artificial Intelligence)

27 pages, 57533 KB

Open Access Article

Assessing the Influence of Feedback Strategies on Errors in Crowdsourced Annotation of Tumor Images

by Jose Alejandro Libreros, Edwin Gamboa, Erik Henke and Matthias Hirth

Big Data Cogn. Comput. 2025, 9(9), 220; https://doi.org/10.3390/bdcc9090220 - 26 Aug 2025

Viewed by 1756

Abstract

Crowdsourcing makes distributed human intelligence available, at scale, for tasks that require human judgment, with use cases across many application areas. However, given the wide variety of tasks available, crowdworkers may have limited or no background knowledge about the tasks they complete. Tasks, even micro-scale ones, therefore need to include appropriate training so that crowdworkers can complete them successfully. Yet training crowdworkers efficiently and quickly for complex tasks is challenging and remains an unresolved issue. This paper addresses this challenge by empirically comparing different training strategies for crowdworkers and evaluating their impact on the crowdworkers’ task results. We compare three conditions: a basic training strategy, a strategy based on errors previously made by other crowdworkers, and the addition of instant feedback during training and task completion. Our results show that adding instant feedback during both the training phase and the task increases workers’ attention on difficult tasks, which reduces errors and improves results. We conclude that more attention is retained when the instant feedback includes information about mistakes previously made by other crowdworkers.

(This article belongs to the Topic Applications of Image and Video Processing in Medical Imaging)

23 pages, 3781 KB

Open Access Article

Evaluating Urban Visual Attractiveness Perception Using Multimodal Large Language Model and Street View Images

by Qianyu Zhou, Jiaxin Zhang and Zehong Zhu

Buildings 2025, 15(16), 2970; https://doi.org/10.3390/buildings15162970 - 21 Aug 2025

Cited by 13 | Viewed by 4558

Abstract

Visual attractiveness perception, an individual’s capacity to recognise and evaluate the visual appeal of urban scene safety, has direct implications for well-being, economic vitality, and social cohesion. However, most empirical studies rely on single-source metrics or algorithm-centric pipelines that under-represent human perception. To address this gap, we introduce a fully reproducible, multimodal framework that measures and models this domain-specific facet of human intelligence. The framework couples Generative Pre-trained Transformer 4o (GPT-4o) with 1000 Street View images. The pipeline first elicits pairwise aesthetic judgements from GPT-4o and converts them into a latent attractiveness scale via Thurstone’s law of comparative judgement. It then validates the scale against 1.17 M crowdsourced ratings from MIT’s Place Pulse 2.0 benchmark (Spearman ρ = 0.76, p < 0.001). Compared with a Siamese CNN baseline (ρ = 0.60), GPT-4o yields both higher criterion validity and an 88% reduction in inference time, suggesting a stronger capacity to approximate human evaluative judgements. We then combine the resulting attractiveness scores with network-based accessibility modelling to generate an “aesthetic–accessibility map” of the central urban districts of Chongqing, China. Cluster analysis reveals four statistically distinct street types (Iconic Core, Liveable Rings, Transit-Rich but Bland, and Peripheral Low-Appeal), providing actionable insights for landscape design, urban governance, and tourism planning.
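Thurstone’s law of comparative judgement (Case V) converts pairwise preference proportions into interval-scale scores. Each item’s score is the mean, over its opponents, of the standard-normal quantile z(P(a preferred over b)). The sketch below is not the authors’ exact pipeline and makes several assumptions:

- The comparison design is complete or near-complete.
- The image IDs and win counts are hypothetical.
- Proportions are clipped away from 0 and 1 so the quantile stays finite.

```python
from statistics import NormalDist


def thurstone_case_v(items, wins, eps=0.01):
    """Case V scale values from pairwise counts.

    wins[(a, b)] is the number of times item a was preferred over item b.
    Each score is the mean standard-normal quantile of a's win proportions.
    """
    z = NormalDist().inv_cdf
    scores = {}
    for a in items:
        quantiles = []
        for b in items:
            if a == b:
                continue
            n_ab, n_ba = wins.get((a, b), 0), wins.get((b, a), 0)
            if n_ab + n_ba == 0:
                continue  # this pair was never compared
            p = n_ab / (n_ab + n_ba)
            p = min(max(p, eps), 1 - eps)  # keep the quantile finite
            quantiles.append(z(p))
        scores[a] = sum(quantiles) / len(quantiles) if quantiles else 0.0
    return scores


# Hypothetical pairwise judgements over three street-view images.
wins = {("img_A", "img_B"): 8, ("img_B", "img_A"): 2,
        ("img_B", "img_C"): 8, ("img_C", "img_B"): 2,
        ("img_A", "img_C"): 9, ("img_C", "img_A"): 1}
scores = thurstone_case_v(["img_A", "img_B", "img_C"], wins)
print(sorted(scores, key=scores.get, reverse=True))  # ['img_A', 'img_B', 'img_C']
```

A scale built this way can then be checked against external ratings with a rank correlation such as Spearman’s ρ.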

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

11 pages, 9959 KB

Open Access Article

Are Human Judgments of Real and Fake Faces Quantum-like Contextual?

by Peter Bruza, Aaron Lee and Pamela Hoyte

Entropy 2025, 27(8), 868; https://doi.org/10.3390/e27080868 - 15 Aug 2025

Cited by 1 | Viewed by 1574

Abstract

This paper describes a crowdsourced experiment in which participants were asked to judge which of two simultaneously presented facial images (one real, one AI-generated) was fake. With the growing presence of synthetic imagery in digital environments, cognitive systems must adapt to novel and often deceptive visual stimuli. Recent developments in cognitive science propose that some mental processes may exhibit quantum-like characteristics, particularly in their context sensitivity. Drawing on Tezzin’s “generalized fair coin” model, this study applied Contextuality-by-Default (CbD) theory to investigate whether human judgments of real and AI-generated faces exhibit quantum-like contextuality. Across 20 trials, each treated as a “generalized coin”, bootstrap resampling (10,000 iterations per coin) revealed that nine trials demonstrated quantum-like contextuality. Notably, Coin 4 exhibited strong context-sensitive causal asymmetry. In that trial, both the real and synthetic faces elicited inverse judgments because they resembled each other unusually strongly. These results add to growing evidence that cognitive judgments are sometimes quantum-like contextual. They suggest that comparative strategies, such as evaluating unfamiliar faces alongside known-real exemplars, may improve accuracy in detecting synthetic images. Such pairwise methods align with the strengths of human perception and may inform future interventions, user interfaces, or educational tools aimed at improving visual judgment under uncertainty.
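The contextuality analysis itself follows the CbD framework and is not reproduced here. The resampling step it relies on can be illustrated with a percentile bootstrap for the proportion of “fake” judgements in a single trial. The function name, the 95% level, and the example responses are assumptions for illustration. Only the 10,000 resamples match the figure given above.

```python
import random


def bootstrap_proportion_ci(responses, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 responses."""
    rng = random.Random(seed)
    n = len(responses)
    # Resample the responses with replacement and record each resample's proportion.
    stats = sorted(sum(rng.choices(responses, k=n)) / n for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(responses) / n, lower, upper


# Hypothetical trial: 70 of 100 participants labelled the synthetic face as fake.
responses = [1] * 70 + [0] * 30
point, lower, upper = bootstrap_proportion_ci(responses)
print(point, round(lower, 2), round(upper, 2))
```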

(This article belongs to the Special Issue Quantum Probability and Randomness V)

16 pages, 2750 KB

Open Access Article

Combining Object Detection, Super-Resolution GANs and Transformers to Facilitate Tick Identification Workflow from Crowdsourced Images on the eTick Platform

by Étienne Clabaut, Jérémie Bouffard and Jade Savage

Insects 2025, 16(8), 813; https://doi.org/10.3390/insects16080813 - 6 Aug 2025

Cited by 1 | Viewed by 1422

Abstract

Ongoing changes in the distribution and abundance of several tick species of medical relevance in Canada have prompted the development of the eTick platform. eTick is an image-based crowdsourcing public surveillance tool for Canada that enables rapid tick species identification by trained personnel. It also provides public health guidance based on the tick species and the submitter’s province of residence. More than 100,000 images from over 73,500 identified records, representing 25 tick species, have been submitted to eTick since its public launch in 2018. Partially automating the image processing workflow could therefore save substantial human resources, especially as submission numbers have been steadily increasing since 2021. In this study, we evaluate an end-to-end artificial intelligence (AI) pipeline to support tick identification from eTick user-submitted images, which are characterized by heterogeneous quality and uncontrolled acquisition conditions. Our framework integrates three components: (i) tick localization using a fine-tuned YOLOv7 object detection model; (ii) resolution enhancement of cropped images via super-resolution Generative Adversarial Networks (RealESRGAN and SwinIR); and (iii) image classification using deep convolutional (ResNet-50) and transformer-based (ViT) architectures. The classifiers were evaluated on three datasets (12, 6, and 3 classes) of decreasing granularity in terms of taxonomic resolution, tick life stage, and specimen viewing angle. ViT consistently outperformed ResNet-50, especially in complex classification settings. The best-performing configuration relied on object detection without super-resolution. It achieved a macro-averaged F1-score exceeding 86% in the 3-class model (Dermacentor sp., other species, bad images), with minimal critical misclassifications (0.7% of “other species” misclassified as Dermacentor).
Dermacentor ticks represent more than 60% of the ticks submitted to the eTick platform. Integrating a low-granularity model into the processing workflow could therefore save significant time while maintaining very high standards of identification accuracy. Our findings highlight the potential of combining modern AI methods to facilitate efficient and accurate tick image processing in community science platforms. They also emphasize the need to adapt model complexity and class resolution to task-specific constraints.
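The two headline figures above can both be computed from true and predicted labels: the macro-averaged F1-score and the share of “other species” misclassified as Dermacentor. The sketch below uses hypothetical labels and predictions. The helper names are illustrative, not taken from the study’s code.

```python
def f1_for_class(y_true, y_pred, c) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def confusion_rate(y_true, y_pred, true_label, pred_label) -> float:
    """Share of items with true_label that were predicted as pred_label."""
    preds = [p for t, p in zip(y_true, y_pred) if t == true_label]
    return sum(p == pred_label for p in preds) / len(preds) if preds else 0.0


labels = ["Dermacentor", "other", "bad"]
y_true = ["Dermacentor", "Dermacentor", "Dermacentor", "other", "other", "bad"]
y_pred = ["Dermacentor", "Dermacentor", "other", "other", "Dermacentor", "bad"]
print(round(macro_f1(y_true, y_pred, labels), 3))             # 0.722
print(confusion_rate(y_true, y_pred, "other", "Dermacentor"))  # 0.5
```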

(This article belongs to the Section Medical and Livestock Entomology)

13 pages, 736 KB

Open Access Article

Birding via Facebook—Methodological Considerations When Crowdsourcing Observations of Bird Behavior via Social Media

by Dirk H. R. Spennemann

Birds 2025, 6(3), 39; https://doi.org/10.3390/birds6030039 - 28 Jul 2025

Cited by 1 | Viewed by 1701

Abstract

This paper outlines a methodology to compile geo-referenced observational data of Australian birds acting as pollinators of Strelitzia sp. (Bird of Paradise) flowers and dispersers of their seeds. Given the absence of systematic published records, a crowdsourcing approach was employed. It combined data from natural history platforms (e.g., iNaturalist, eBird), image hosting websites (e.g., Flickr) and, in particular, social media. Facebook emerged as the most productive channel, with 61.4% of the 301 usable observations sourced from 43 ornithology-related groups. The strategy included direct solicitation of images and metadata via group posts and follow-up communication. The holistic, snowballing search strategy yielded a unique, behavior-focused dataset suitable for analysis. The process exposed limitations arising from users’ self-censorship regarding image quality and completeness. Nevertheless, the approach demonstrates the viability of crowdsourced behavioral ecology data and contributes a replicable methodology for similar studies in under-documented ecological contexts.

26 pages, 7054 KB

Open AccessArticle

An Ensemble of Convolutional Neural Networks for Sound Event Detection

by Abdinabi Mukhamadiyev, Ilyos Khujayarov, Dilorom Nabieva and Jinsoo Cho

Mathematics 2025, 13(9), 1502; https://doi.org/10.3390/math13091502 - 1 May 2025

Cited by 6 | Viewed by 5025

Abstract

Sound event detection is a rapidly advancing area of pattern recognition, and deep learning methods are particularly well suited to it. One important direction in this field is detecting the sounds of emotional events around residential buildings in smart cities so that the situation can be assessed quickly for security purposes. This research presents a comprehensive study of an ensemble convolutional recurrent neural network (CRNN) model designed for sound event detection (SED) in residential and public safety contexts. The work focuses on extracting meaningful features from audio signals using image-based representations, such as Discrete Cosine Transform (DCT) spectrograms, cochleagrams, and Mel spectrograms, to enhance robustness against noise and improve feature extraction. In collaboration with police officers, a two-hour dataset of 112 clips covering four classes of emotional sounds, namely harassment, quarrels, screams, and breaking sounds, was prepared. In addition to this crowdsourced dataset, publicly available datasets were used to broaden the study's applicability. Our dataset contains 5055 audio files of varying lengths, totaling 14.14 h of strongly labeled data across 13 sound categories. The proposed CRNN model integrates spatial and temporal feature extraction by processing these spectrograms through convolutional and bidirectional gated recurrent unit (GRU) layers. An ensemble that combines predictions from three models achieves F1 scores of 71.5% on segment-based metrics and 46% on event-based metrics. The results demonstrate the model's effectiveness in detecting sound events under noisy conditions, even with a small, imbalanced dataset. This research highlights the model's potential for real-time audio surveillance systems running on mini-computers, offering cost-effective and accurate solutions for maintaining public order.

(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)

23 pages, 13812 KB

Open Access Article

Three-Dimensional Outdoor Pedestrian Road Network Map Construction Based on Crowdsourced Trajectory Data

by Jianbo Tang, Tianyu Zhang, Junjie Ding, Ke Tao, Chen Yang, Jianbing Xiang and Xia Ning

ISPRS Int. J. Geo-Inf. 2025, 14(4), 175; https://doi.org/10.3390/ijgi14040175 - 17 Apr 2025

Cited by 1 | Viewed by 2372

Abstract

Owing to the complexity of outdoor environments and the high cost of data collection, up-to-date outdoor road network maps remain difficult to obtain, so navigation road network maps are often lacking in outdoor scenarios. Existing road network extraction methods fall mainly into two groups: trajectory-data-based methods and remote-sensing-image-based methods. Because of factors such as tree occlusion, methods based on remote sensing images struggle to extract complete road information in outdoor environments. Methods based on trajectory data mainly use vehicle trajectories to extract two-dimensional roads and therefore lack three-dimensional (3D) road information, such as elevation and slope, which is important for navigation path planning in outdoor scenarios. To address this gap, this paper proposes a hierarchical map construction method for extracting a 3D outdoor pedestrian road network from crowdsourced trajectory data. The method models the pedestrian road network as a graph composed of pedestrian areas, intersections, and road segments connecting these areas, and generates 3D roads within and between intersection areas hierarchically. Experiments and comparative analyses on real-world outdoor trajectory datasets show that the proposed method extracts 3D road information more accurately than existing methods.

30 pages, 8481 KB

Open Access Article

Sustainable Parking Space Management Using Machine Learning and Swarm Theory—The SPARK System

by Artur Janowski, Mustafa Hüsrevoğlu and Malgorzata Renigier-Bilozor

Appl. Sci. 2024, 14(24), 12076; https://doi.org/10.3390/app142412076 - 23 Dec 2024

Cited by 11 | Viewed by 5284

Abstract

Contemporary technology improves the efficiency of parking resource management, contributing to more liveable and sustainable cities. In response to the growing challenges of urbanization, intelligent parking systems have emerged as a crucial solution for optimizing parking management, reducing traffic congestion, and minimizing pollution. The primary aim of this study is to present the concept behind the SPARK system (Smart Parking Assistance and Resource Knowledge), a web application developed to help users find available parking spaces. Integrating the YOLOv9 (You Only Look Once) segmentation algorithm with Artificial Bee Colony (ABC) optimization, together with crowdsourced data and deep learning for image analysis, significantly enhances the SPARK system's operational efficiency. This combination enables rapid and precise detection of available parking spaces while ensuring robustness and continuous improvement. The system's accuracy in detecting available parking spaces, estimated at 87.33%, is satisfactory in comparison with similar studies worldwide.

(This article belongs to the Special Issue Crowd-Sourced Data and Deep Learning in Remote Sensing: Methods and Applications)

Search Results (121)
