Search Results (827)

Search Parameters:
Keywords = view fusion

34 pages, 4436 KiB  
Article
Structure of the Secretory Compartments in Goblet Cells in the Colon and Small Intestine
by Alexander A. Mironov, Irina S. Sesorova, Pavel S. Vavilov, Roberto Longoni, Paola Briata, Roberto Gherzi and Galina V. Beznoussenko
Cells 2025, 14(15), 1185; https://doi.org/10.3390/cells14151185 - 31 Jul 2025
Abstract
The Golgi of goblet cells is a specialized machine for mucin glycosylation. This process occurs in a specialized form of the secretory pathway that remains poorly examined. Here, using high-resolution three-dimensional electron microscopy (EM), EM tomography, serial block-face scanning EM (SBF-SEM), and immune EM, we analyzed the secretory pathway in goblet cells and found that COPII-coated buds on the endoplasmic reticulum (ER) are extremely rare. Vesicles with dimensions typical of COPII-dependent vesicles were not found at ER exit sites (ERESs). The Golgi is formed by a single cisterna organized in a spiral with the characteristics of a cycloid surface. This ribbon has the shape of a cup with irregular perforations. The Golgi cup is filled with secretory granules (SGs) containing glycosylated mucins; their diameter is close to 1 µm. The cup is connected to the ERESs by transient bead-like connections, observed mostly near craters at the externally located cis surface of the cup. The craters are cone-like cavities formed by aligned holes of gradually decreasing diameter through the first three Golgi cisternae and are localized directly opposite the ERESs. Clusters of 52 nm vesicles are visible between Golgi cisternae and between SGs. The accumulation of mucin, which starts in the fourth cisternal layer, induces distensions of the cisternal lumen that gradually thicken through the subsequent cisternal layers. Spherical distensions are observed at the edges of the Golgi cup, where they fuse with SGs and detach from the cisternae. After SGs located just below the apical plasma membrane (APM) fuse with the APM, mucus is secreted; the content of these SGs becomes less osmiophilic, and excess APM surface area is generated. This membrane is eliminated either through the detachment of bubbles filled with another SG and surrounded by a double membrane, or by collapse of the empty SG and transformation of the double membrane lacking a visible lumen into multilayered organelles, which move to the cell base and are secreted into the intercellular space where the processes of dendritic cells are localized. These data are evaluated against existing models of intracellular transport.
20 pages, 10161 KiB  
Article
HybridFilm: A Mixed-Reality History Tool Enabling Interoperability Between Screen Space and Immersive Environments
by Lisha Zhou, Meng Zhang, Yapeng Liu and Dongliang Guo
Appl. Sci. 2025, 15(15), 8489; https://doi.org/10.3390/app15158489 - 31 Jul 2025
Abstract
History tools facilitate iterative data analysis by allowing users to view, retrieve, and revisit visualization states. However, traditional history tools are constrained by screen space limitations, which restrict the user’s ability to fully understand historical states and make it challenging to provide an intuitive preview of these states. Immersive history tools, in contrast, mostly operate independently of screen space and do not consider integration with it. This paper proposes HybridFilm, an innovative mixed-reality history tool that seamlessly integrates screen space and immersive environments. First, it expands the user’s understanding of historical states through a multi-source spatial fusion approach. Second, it proposes a “focus + context”-based visualization and interaction scheme for multi-source spatial historical data. We assessed the usability and utility of HybridFilm through experimental evaluation: compared with traditional history tools, HybridFilm offers a more intuitive and immersive experience while maintaining a comparable level of interaction comfort and fluency.
(This article belongs to the Special Issue Virtual and Augmented Reality: Theory, Methods, and Applications)

13 pages, 11739 KiB  
Article
DeepVinci: Organ and Tool Segmentation with Edge Supervision and a Densely Multi-Scale Pyramid Module for Robot-Assisted Surgery
by Li-An Tseng, Yuan-Chih Tsai, Meng-Yi Bai, Mei-Fang Li, Yi-Liang Lee, Kai-Jo Chiang, Yu-Chi Wang and Jing-Ming Guo
Diagnostics 2025, 15(15), 1917; https://doi.org/10.3390/diagnostics15151917 - 30 Jul 2025
Abstract
Background: Automated surgical navigation can be separated into three stages: (1) organ identification and localization, (2) identification of the organs requiring further surgery, and (3) automated planning of the operation path and steps. With its ideal visual and operating system, the da Vinci surgical system provides a promising platform for automated surgical navigation. This study focuses on the first stage by identifying organs in gynecological surgery. Methods: Because da Vinci gynecological endoscopy data are difficult to collect, we propose DeepVinci, a novel end-to-end high-performance encoder–decoder network based on convolutional neural networks (CNNs) for pixel-level organ semantic segmentation. Specifically, to overcome the drawback of a limited field of view, we incorporate a densely multi-scale pyramid module and a feature fusion module, which also enhance global context information. In addition, the system integrates an edge supervision network to refine the segmented results on the decoding side. Results: Experimental results show that DeepVinci achieves state-of-the-art accuracy, obtaining Dice similarity coefficient and mean pixel accuracy values of 0.684 and 0.700, respectively. Conclusions: The proposed DeepVinci network presents a practical and competitive semantic segmentation solution for da Vinci gynecological surgery.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
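For readers unfamiliar with the reported metrics, the sketch below shows how the Dice similarity coefficient and mean pixel accuracy are conventionally computed for multi-class masks. A minimal sketch, assuming integer-labeled masks; the three-class setup (background, organ, tool) is an illustrative assumption, not the paper's label set.

```python
import numpy as np

def dice_coefficient(pred, target, num_classes):
    """Dice similarity coefficient, averaged over classes present in the target."""
    scores = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        if t.sum() == 0:
            continue  # skip classes absent from this image
        scores.append(2.0 * np.logical_and(p, t).sum() / (p.sum() + t.sum()))
    return float(np.mean(scores))

def mean_pixel_accuracy(pred, target, num_classes):
    """Per-class pixel accuracy, averaged over classes present in the target."""
    accs = []
    for c in range(num_classes):
        mask = target == c
        if mask.sum() == 0:
            continue
        accs.append(float((pred[mask] == c).mean()))
    return float(np.mean(accs))

# Toy example with 3 classes (e.g., background / organ / tool)
pred = np.random.randint(0, 3, size=(256, 256))
target = np.random.randint(0, 3, size=(256, 256))
print(dice_coefficient(pred, target, 3), mean_pixel_accuracy(pred, target, 3))
```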

18 pages, 4857 KiB  
Article
Fast Detection of FDI Attacks and State Estimation in Unmanned Surface Vessels Based on Dynamic Encryption
by Zheng Liu, Li Liu, Hongyong Yang, Zengfeng Wang, Guanlong Deng and Chunjie Zhou
J. Mar. Sci. Eng. 2025, 13(8), 1457; https://doi.org/10.3390/jmse13081457 - 30 Jul 2025
Abstract
Wireless sensor networks (WSNs) are used for data acquisition and transmission in unmanned surface vessels (USVs). However, the openness of wireless networks makes USVs highly susceptible to false data injection (FDI) attacks during data transmission, which prevent sensors from receiving real data and lead to decision-making errors in the control center. In this paper, a novel dynamic data encryption method is proposed whereby data are encrypted prior to transmission and the key is dynamically updated using historical system data, increasing the difficulty for attackers of cracking the ciphertext. At the same time, a dynamic relationship is established among the ciphertext, the key, and an auxiliary encrypted ciphertext, and an attack detection scheme based on dynamic encryption is designed to realize instant detection and localization of FDI attacks. Further, an H∞ fusion filter is designed to filter external interference noise, and the real information is estimated or restored by a weighted fusion algorithm. Finally, the validity of the proposed scheme is confirmed through simulation experiments.
(This article belongs to the Special Issue Control and Optimization of Ship Propulsion System)
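The scheme's core idea is that keys evolve with historical system data, so injected ciphertext fails verification immediately; this can be pictured with a hash-chained key and a per-message MAC. A minimal sketch under assumed primitives (SHA-256, HMAC); the paper's actual construction, including the auxiliary encrypted ciphertext and the H∞ filtering stage, is not reproduced here.

```python
import hashlib, hmac, json, os

def send(measurement, key):
    """Sensor side: authenticate the sample, then evolve the key with history."""
    payload = json.dumps(measurement).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    next_key = hashlib.sha256(key + payload).digest()  # key depends on past data
    return payload, tag, next_key

def receive(payload, tag, key):
    """Control-center side: any tampered payload fails the tag check instantly."""
    if not hmac.compare_digest(tag, hmac.new(key, payload, hashlib.sha256).digest()):
        raise ValueError("FDI attack detected: authentication tag mismatch")
    return json.loads(payload), hashlib.sha256(key + payload).digest()

key = os.urandom(32)                                    # initial shared secret
payload, tag, key_tx = send({"t": 0, "yaw": 1.57}, key)
data, key_rx = receive(payload, tag, key)               # raises if payload was forged
assert key_tx == key_rx                                 # both ends stay synchronized
```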

23 pages, 7371 KiB  
Article
A Novel Method for Estimating Building Height from Baidu Panoramic Street View Images
by Shibo Ge, Jiping Liu, Xianghong Che, Yong Wang and Haosheng Huang
ISPRS Int. J. Geo-Inf. 2025, 14(8), 297; https://doi.org/10.3390/ijgi14080297 - 30 Jul 2025
Abstract
Building height information plays an important role in many urban applications, such as urban planning, disaster management, and environmental studies. With the rapid development of real-scene maps, street view images are becoming a new data source for building height estimation, given their easy collection and low cost. However, existing studies on building height estimation primarily utilize remote sensing images, with little exploration of height estimation from street view images. In this study, we proposed a deep learning-based method for estimating the height of a single building in Baidu panoramic street view imagery. First, the Segment Anything Model was used to extract the region-of-interest image and location features of individual buildings from the panorama. Subsequently, a cross-view matching algorithm was proposed that combines Baidu panoramas with building footprint data containing height information to generate building height samples. Finally, a Two-Branch Feature Fusion (TBFF) model was constructed to combine building location features and visual features, enabling accurate height estimation for individual buildings. The experimental results showed that the TBFF model had the best performance, with an RMSE of 5.69 m, MAE of 3.97 m, and MAPE of 0.11, and that it was more robust and accurate than two state-of-the-art methods: the Random Forest model had an RMSE of 11.83 m, MAE of 4.76 m, and MAPE of 0.32, and the Pano2Geo model had an RMSE of 10.51 m, MAE of 6.52 m, and MAPE of 0.22. The ablation analysis demonstrated that fusing building location and visual features improves the accuracy of height estimation by 14.98% to 69.99%. Moreover, the accuracy of the proposed method meets the LOD1-level 3D modeling requirements defined by the OGC (height error ≤ 5 m), providing data support for urban research.
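For reference, the reported RMSE, MAE, and MAPE can be computed as follows; the fractional MAPE convention (0.11 = 11%) matches the values quoted above, and the sample heights are invented for the demo.

```python
import numpy as np

def height_metrics(pred, true):
    """RMSE and MAE in meters; MAPE as a fraction (0.11 = 11%)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    err = pred - true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err) / true))  # assumes true heights are > 0
    return rmse, mae, mape

# Toy check: predicted vs. surveyed heights (meters) for five buildings
print(height_metrics([21.0, 15.2, 33.5, 9.8, 27.0],
                     [24.0, 14.0, 30.0, 10.5, 29.0]))
```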

22 pages, 2525 KiB  
Article
mmHSE: A Two-Stage Framework for Human Skeleton Estimation Using mmWave FMCW Radar Signals
by Jiake Tian, Yi Zou and Jiale Lai
Appl. Sci. 2025, 15(15), 8410; https://doi.org/10.3390/app15158410 - 29 Jul 2025
Abstract
We present mmHSE, a two-stage framework for human skeleton estimation using dual millimeter-wave (mmWave) frequency-modulated continuous-wave (FMCW) radar signals. To enable data-driven model design and evaluation, we collect and process over 30,000 range–angle maps from 12 users across three representative indoor environments using a dual-node radar acquisition platform. Leveraging the collected data, we develop a two-stage neural architecture for human skeleton estimation. The first stage employs a dual-branch network with depthwise separable convolutions and self-attention to extract multi-scale spatiotemporal features from dual-view radar inputs; a cross-modal attention fusion module then generates initial estimates of 21 skeletal keypoints. The second stage refines these estimates using a skeletal topology module based on graph convolutional networks, which captures spatial dependencies among joints to enhance localization accuracy. Experiments show that mmHSE achieves a mean absolute error (MAE) of 2.78 cm. In cross-domain evaluations, the MAE remains at 3.14 cm, demonstrating the method’s generalization ability and robustness for non-intrusive human pose estimation from mmWave FMCW radar signals.
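A cross-modal attention fusion module of the kind described, where each radar view attends to the other before the streams are merged, might look like the PyTorch sketch below. The feature dimension, head count, and symmetric two-way design are assumptions for illustration, not the published mmHSE internals.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Each radar view queries the other; the two streams are then projected back."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feat_a, feat_b):                 # (B, T, dim) each
        a2b, _ = self.attn_ab(feat_a, feat_b, feat_b)  # view A attends to view B
        b2a, _ = self.attn_ba(feat_b, feat_a, feat_a)  # view B attends to view A
        return self.proj(torch.cat([a2b, b2a], dim=-1))

fuse = CrossAttentionFusion()
a, b = torch.randn(2, 16, 128), torch.randn(2, 16, 128)
print(fuse(a, b).shape)  # torch.Size([2, 16, 128])
```

A keypoint head on top of the fused features would then regress the 21 skeletal joints.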

17 pages, 1850 KiB  
Article
Cloud–Edge Collaborative Model Adaptation Based on Deep Q-Network and Transfer Feature Extraction
by Jue Chen, Xin Cheng, Yanjie Jia and Shuai Tan
Appl. Sci. 2025, 15(15), 8335; https://doi.org/10.3390/app15158335 - 26 Jul 2025
Abstract
With the rapid development of smart devices and the Internet of Things (IoT), the explosive growth of data has placed increasingly high demands on real-time processing and intelligent decision making, and cloud-edge collaborative computing has emerged as a mainstream architecture to address these challenges. However, in sky-ground integrated systems, the limited computing capacity of edge devices and the inconsistency between cloud-side fusion results and edge-side detection outputs significantly undermine the reliability of edge inference. To overcome these issues, this paper proposes a cloud-edge collaborative model adaptation framework that integrates deep reinforcement learning via Deep Q-Networks (DQNs) with local feature transfer. The framework enables category-level dynamic decision making, allowing selective migration of classification-head parameters to achieve on-demand adaptive optimization of the edge model and enhance consistency between cloud and edge results. Extensive experiments on a large-scale multi-view remote sensing aircraft detection dataset demonstrate that the proposed method significantly improves cloud-edge consistency: the detection consistency rate reaches 90%, with some scenarios approaching 100%. Ablation studies further validate the necessity of the DQN-based decision strategy, which clearly outperforms static heuristics. In the model adaptation comparison, the proposed method improves the detection precision of the A321 category from 70.30% to 71.00% and the average precision (AP) from 53.66% to 53.71%; for the A330 category, precision increases from 32.26% to 39.62%, indicating strong adaptability across target types. This study offers a novel and effective solution for cloud-edge model adaptation under resource-constrained conditions, enhancing both the consistency of cloud-edge fusion and the robustness of edge-side intelligent inference.
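Category-level migration of classification-head parameters can be pictured as copying selected output rows from the cloud model's head into the edge model's head. The sketch below assumes a plain linear classification head and omits the DQN policy that chooses the categories; both simplifications are mine, not the paper's.

```python
import torch
import torch.nn as nn

def migrate_head_rows(cloud_head: nn.Linear, edge_head: nn.Linear, categories):
    """Copy only the output rows (one per category) selected for migration;
    all other edge parameters stay local."""
    with torch.no_grad():
        for c in categories:
            edge_head.weight[c] = cloud_head.weight[c]
            edge_head.bias[c] = cloud_head.bias[c]

# Migrate only the category rows a DQN-style policy flagged as inconsistent
cloud_head, edge_head = nn.Linear(256, 10), nn.Linear(256, 10)
migrate_head_rows(cloud_head, edge_head, categories=[3, 7])
```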

24 pages, 8015 KiB  
Article
Innovative Multi-View Strategies for AI-Assisted Breast Cancer Detection in Mammography
by Beibit Abdikenov, Tomiris Zhaksylyk, Aruzhan Imasheva, Yerzhan Orazayev and Temirlan Karibekov
J. Imaging 2025, 11(8), 247; https://doi.org/10.3390/jimaging11080247 - 22 Jul 2025
Abstract
Mammography is the main method for early detection of breast cancer, which remains a major global health concern. However, inter-reader variability and the inherent difficulty of interpreting subtle radiographic features frequently limit diagnostic accuracy. This work presents a thorough assessment of deep convolutional neural networks (CNNs) for automated mammogram classification and introduces two multi-view integration techniques: Dual-Branch Ensemble (DBE) and Merged Dual-View (MDV). Setting aside two datasets for out-of-sample testing, we evaluate model generalizability on six different mammography datasets representing various populations and imaging systems. We compare a number of state-of-the-art architectures on both individual and combined datasets, including ResNet, DenseNet, EfficientNet, MobileNet, Vision Transformers, and VGG19. Experimental results show that both the MDV and DBE strategies improve classification performance. Under the MDV approach, VGG19 and DenseNet obtained ROC AUC scores of 0.9051 and 0.7960, respectively; in the DBE setting, DenseNet achieved a ROC AUC of 0.8033 and ResNet50 recorded 0.8042. These enhancements demonstrate the benefit of multi-view fusion for model robustness. Generalization tests further highlight the impact of domain shift and the need for diverse training datasets. These results offer practical guidance for improving CNN architectures and integration tactics, aiding the creation of trustworthy, broadly applicable AI-assisted breast cancer screening tools.
(This article belongs to the Section Medical Imaging)
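The two integration strategies can be sketched in a few lines: DBE runs one backbone per view and averages the predictions, while MDV merges the views into a single multi-channel input. The placeholder backbone and the assumption that the two inputs are the standard CC/MLO mammographic views are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class DualBranchEnsemble(nn.Module):
    """DBE-style fusion: one backbone per view, logits averaged."""
    def __init__(self, make_backbone, num_classes=2):
        super().__init__()
        self.branch_cc = make_backbone(num_classes)   # craniocaudal view
        self.branch_mlo = make_backbone(num_classes)  # mediolateral-oblique view

    def forward(self, x_cc, x_mlo):
        return (self.branch_cc(x_cc) + self.branch_mlo(x_mlo)) / 2

# Placeholder backbone; any CNN from the paper's lineup could be dropped in
make_backbone = lambda n: nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, n))
model = DualBranchEnsemble(make_backbone)
x_cc, x_mlo = torch.randn(4, 1, 224, 224), torch.randn(4, 1, 224, 224)
print(model(x_cc, x_mlo).shape)           # torch.Size([4, 2])

# MDV-style fusion instead stacks the two views as channels of one input:
merged = torch.cat([x_cc, x_mlo], dim=1)  # (4, 2, 224, 224), fed to a single CNN
```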

14 pages, 2370 KiB  
Article
DP-AMF: Depth-Prior–Guided Adaptive Multi-Modal and Global–Local Fusion for Single-View 3D Reconstruction
by Luoxi Zhang, Chun Xie and Itaru Kitahara
J. Imaging 2025, 11(7), 246; https://doi.org/10.3390/jimaging11070246 - 21 Jul 2025
Abstract
Single-view 3D reconstruction remains fundamentally ill-posed, as a single RGB image lacks scale and depth cues, often yielding ambiguous results under occlusion or in texture-poor regions. We propose DP-AMF, a novel Depth-Prior–Guided Adaptive Multi-Modal and Global–Local Fusion framework that integrates high-fidelity depth priors—generated offline by the MARIGOLD diffusion-based estimator and cached to avoid extra training cost—with hierarchical local features from ResNet-32/ResNet-18 and semantic global features from DINO-ViT. A learnable fusion module dynamically adjusts per-channel weights to balance these modalities according to local texture and occlusion, and an implicit signed-distance field decoder reconstructs the final mesh. Extensive experiments on 3D-FRONT and Pix3D demonstrate that DP-AMF reduces Chamfer Distance by 7.64%, increases F-Score by 2.81%, and boosts Normal Consistency by 5.88% compared to strong baselines, while qualitative results show sharper edges and more complete geometry in challenging scenes. DP-AMF achieves these gains without substantially increasing model size or inference time, offering a robust and effective solution for complex single-view reconstruction tasks.
(This article belongs to the Section AI in Imaging)
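A learnable fusion module that dynamically adjusts per-channel weights between modalities can be realized as a channel-wise gate. The following is a generic sketch of that idea (global pooling, 1×1 convolution, sigmoid), not the authors' exact module.

```python
import torch
import torch.nn as nn

class ChannelwiseFusion(nn.Module):
    """Per-channel gating between a depth-prior stream and an RGB feature stream."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, depth_feat, rgb_feat):                     # (B, C, H, W) each
        w = self.gate(torch.cat([depth_feat, rgb_feat], dim=1))  # (B, C, 1, 1)
        return w * depth_feat + (1 - w) * rgb_feat               # per-channel blend

fuse = ChannelwiseFusion(64)
d, r = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(fuse(d, r).shape)  # torch.Size([1, 64, 32, 32])
```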

23 pages, 3858 KiB  
Article
MCFA: Multi-Scale Cascade and Feature Adaptive Alignment Network for Cross-View Geo-Localization
by Kaiji Hou, Qiang Tong, Na Yan, Xiulei Liu and Shoulu Hou
Sensors 2025, 25(14), 4519; https://doi.org/10.3390/s25144519 - 21 Jul 2025
Abstract
Cross-view geo-localization (CVGL) presents significant challenges due to the drastic variations in perspective and scene layout between unmanned aerial vehicle (UAV) and satellite images. Existing methods have made progress in extracting local features from images, but they exhibit limitations in modeling the interactions among local features and fall short of aligning cross-view representations accurately. To address these issues, we propose a Multi-Scale Cascade and Feature Adaptive Alignment (MCFA) network, which consists of a Multi-Scale Cascade Module (MSCM) and a Feature Adaptive Alignment Module (FAAM). The MSCM captures the features of the target’s adjacent regions and enhances the model’s robustness by learning key region information through association and fusion. The FAAM, with its dynamically weighted feature alignment module, adaptively adjusts for feature differences across viewpoints, achieving feature alignment between drone and satellite images. Our method achieves state-of-the-art (SOTA) performance on two public datasets, University-1652 and SUES-200. In generalization experiments, our model outperforms existing SOTA methods, with an average improvement of 1.52% in R@1 and 2.09% in AP, demonstrating its effectiveness and strong generalization in cross-view geo-localization tasks.
(This article belongs to the Section Remote Sensors)
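The reported R@1 and AP are retrieval metrics over a drone-to-satellite similarity matrix; below is a minimal evaluation sketch, assuming exactly one true satellite match per drone query (the dataset sizes are arbitrary).

```python
import numpy as np

def recall_at_1(sim, gt):
    """sim: (queries, gallery) similarity matrix; gt[i] = true gallery index."""
    return float(np.mean(sim.argmax(axis=1) == gt))

def mean_average_precision(sim, gt):
    """With one relevant item per query, AP reduces to 1 / rank of the match."""
    true_scores = sim[np.arange(len(gt)), gt][:, None]
    ranks = (sim > true_scores).sum(axis=1) + 1
    return float(np.mean(1.0 / ranks))

sim = np.random.rand(100, 500)            # drone queries vs. satellite gallery
gt = np.random.randint(0, 500, size=100)  # true satellite index per query
print(recall_at_1(sim, gt), mean_average_precision(sim, gt))
```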

24 pages, 53471 KiB  
Article
Integrating Remote Sensing and Street View Imagery with Deep Learning for Urban Slum Mapping: A Case Study from Bandung City
by Krisna Ramita Sijabat, Muhammad Aufaristama, Mochamad Candra Wirawan Arief and Irwan Ary Dharmawan
Appl. Sci. 2025, 15(14), 8044; https://doi.org/10.3390/app15148044 - 19 Jul 2025
Abstract
In pursuit of the Sustainable Development Goals (SDGs) objective of eliminating slums, the government of Indonesia has initiated a survey-based slum mapping program. Unfortunately, recent observations have highlighted considerable inconsistencies in the mapping process, attributable to various factors including variations in surveyor expertise and the intricacy of the indicators used to characterize slum conditions. Consequently, reliable data are lacking, which poses a significant barrier to effective monitoring of slum upgrading programs. Remote sensing (RS)-based approaches, particularly those employing deep learning (DL) techniques, have emerged as a highly effective and accurate means of identifying slum areas. However, reliance on RS alone is likely to encounter challenges in complex urban environments, and a substantial body of research has identified the merits of integrating land surface data with RS. This study therefore combines remote sensing imagery (RSI) with street view imagery (SVI) for slum mapping and compares the resulting accuracy with a field survey conducted in 2024. The city of Bandung is a pertinent case study, as it faces a considerable increase in population density, and its slums collectively encompassed approximately one-tenth of the city’s population as of 2020. The investigation evaluates mapping results from four DL networks: FCN and FCN-DK using RSI exclusively, and FCN and FCN-DK integrating RSI and SVI. The findings indicate that integrating RSI and SVI enhances the precision of slum mapping in Bandung City, particularly when employing the FCN-DK network, which achieves an accuracy of 86.25%. Mapping with the RSI-and-SVI-based FCN-DK network indicates the presence of 2294 light slum points and 29 medium slum points. The outcomes remain contingent on the methodological approach, dataset accessibility, training data mirroring the 2020 distribution of slums, and the degree of SVI integration within the FCN network. Overall, the FCN-DK model integrating RSI and SVI demonstrates enhanced performance compared with the other models examined in this study.
(This article belongs to the Special Issue Geographic Information System (GIS) for Various Applications)

24 pages, 824 KiB  
Article
MMF-Gait: A Multi-Model Fusion-Enhanced Gait Recognition Framework Integrating Convolutional and Attention Networks
by Kamrul Hasan, Khandokar Alisha Tuhin, Md Rasul Islam Bapary, Md Shafi Ud Doula, Md Ashraful Alam, Md Atiqur Rahman Ahad and Md. Zasim Uddin
Symmetry 2025, 17(7), 1155; https://doi.org/10.3390/sym17071155 - 19 Jul 2025
Abstract
Gait recognition is a reliable biometric approach that uniquely identifies individuals based on their natural walking patterns. It is widely used because gait is difficult to camouflage and does not require the subject’s cooperation. Face-based person recognition often fails to determine an offender’s identity when the face is concealed with a helmet or mask to evade identification; in such cases, gait-based recognition is ideal for identifying offenders. Most existing work leverages a deep learning (DL) model, but a single model often fails to capture a comprehensive selection of refined patterns in the input data in the presence of external factors such as variation in viewing angle, clothing, and carrying conditions. In response, this paper introduces a fusion-based multi-model gait recognition framework that leverages the potential of convolutional neural networks (CNNs) and a vision transformer (ViT) in an ensemble manner to enhance gait recognition performance. Here, CNNs capture spatiotemporal features, while the ViT’s multiple attention layers focus on particular regions of the gait image. The first step in this framework is to obtain the Gait Energy Image (GEI) by averaging a height-normalized gait silhouette sequence over a gait cycle, which handles the left–right symmetry of gait. The GEI is then fed through multiple pre-trained models, fine-tuned to extract deep spatiotemporal features. Three separate fusion strategies are evaluated: decision-level fusion (DLF), which takes each model’s decision and employs majority voting for the final decision; feature-level fusion (FLF), which combines the features from individual models through pointwise addition before performing gait recognition; and a hybrid fusion that combines DLF and FLF. The performance of the multi-model fusion-based framework was evaluated on three publicly available gait databases: CASIA-B, OU-ISIR D, and the OU-ISIR Large Population dataset. The experimental results demonstrate that the fusion-enhanced framework achieves superior performance.
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
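Three of the building blocks above translate directly into code: the GEI as a temporal average of binarized silhouettes, DLF as majority voting over per-model decisions, and FLF as pointwise addition of per-model features. The array shapes and toy data are assumptions for illustration.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: average a height-normalized silhouette sequence (T, H, W) in {0, 1}
    over one gait cycle."""
    return np.mean(silhouettes, axis=0)

def decision_level_fusion(predictions):
    """DLF: majority vote across per-model predicted identity labels."""
    values, counts = np.unique(np.asarray(predictions), return_counts=True)
    return values[np.argmax(counts)]

def feature_level_fusion(features):
    """FLF: pointwise addition of per-model embeddings."""
    return np.sum(np.stack(features, axis=0), axis=0)

seq = (np.random.rand(30, 128, 88) > 0.5).astype(float)  # toy 30-frame cycle
print(gait_energy_image(seq).shape)                      # (128, 88)
print(decision_level_fusion([4, 4, 9]))                  # 4: two of three agree
print(feature_level_fusion([np.ones(5), 2 * np.ones(5)]))  # [3. 3. 3. 3. 3.]
```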

24 pages, 9664 KiB  
Article
Frequency-Domain Collaborative Lightweight Super-Resolution for Fine Texture Enhancement in Rice Imagery
by Zexiao Zhang, Jie Zhang, Jinyang Du, Xiangdong Chen, Wenjing Zhang and Changmeng Peng
Agronomy 2025, 15(7), 1729; https://doi.org/10.3390/agronomy15071729 - 18 Jul 2025
Abstract
In rice detection tasks, accurate identification of leaf streaks, pest and disease distribution, and spikelet hierarchies relies on high-quality images that distinguish texture and hierarchy. However, existing images often suffer from texture blurring and contour shifting due to equipment and environmental limitations, which degrades detection performance. Since pest and disease patterns are global while fine details are mostly local, we propose a rice image reconstruction method based on an adaptive two-branch heterogeneous structure: a low-frequency branch (LFB) that recovers global structure using orientation-aware extended receptive fields to capture streak-like global features such as pests and diseases, and a high-frequency branch (HFB) that enhances detail edges through an adaptive enhancement mechanism to boost the clarity of local detail regions. A dynamic weight fusion mechanism (CSDW) and a lightweight gating network (LFFN) address the unbalanced fusion of frequency information for rice images in traditional methods. Experiments on the 4× downsampled rice test set demonstrate that the proposed method achieves a 62% reduction in parameters compared to EDSR, 41% lower computational cost (30 G) than MambaIR-light, and an average PSNR improvement of 0.68% over the other methods in the study, while balancing memory usage (227 M) and inference speed. In downstream task validation, rice panicle maturity detection achieves a 61.5% increase in mAP50 (0.480 → 0.775) compared to interpolation methods, and leaf pest detection shows a 2.7% improvement in average mAP50 (0.949 → 0.975). This research provides an effective solution for lightweight rice image enhancement, and its dual-branch collaborative mechanism and dynamic fusion strategy establish a new paradigm in agricultural rice image processing.
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)
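The low/high-frequency decomposition can be emulated with a Fourier-domain low-pass mask, as sketched below; the circular mask and fixed cutoff are illustrative assumptions, since the paper's two branches learn their frequency behavior rather than applying a hard split.

```python
import torch

def frequency_split(img, radius=0.1):
    """Split (B, C, H, W) images into low-frequency content and the
    high-frequency residual using a circular mask in the Fourier domain."""
    f = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    _, _, H, W = img.shape
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, H),
                            torch.linspace(-0.5, 0.5, W), indexing="ij")
    mask = ((xx ** 2 + yy ** 2).sqrt() <= radius).to(f.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(f * mask, dim=(-2, -1))).real
    return low, img - low  # inputs for LFB- and HFB-like branches

img = torch.rand(1, 3, 64, 64)
low, high = frequency_split(img)
print(low.shape, high.shape)  # torch.Size([1, 3, 64, 64]) twice
```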

22 pages, 5363 KiB  
Article
Accurate Extraction of Rural Residential Buildings in Alpine Mountainous Areas by Combining Shadow Processing with FF-SwinT
by Guize Luan, Jinxuan Luo, Zuyu Gao and Fei Zhao
Remote Sens. 2025, 17(14), 2463; https://doi.org/10.3390/rs17142463 - 16 Jul 2025
Abstract
Precise extraction of rural settlements in alpine regions is critical for geographic data production, rural development, and spatial optimization. However, existing deep learning models are hindered by insufficient datasets and suboptimal algorithm structures, resulting in blurred boundaries and inadequate extraction accuracy. This study therefore uses high-resolution unmanned aerial vehicle (UAV) remote sensing images to construct a specialized dataset for extracting rural settlements in alpine mountainous areas, while introducing a shadow mitigation technique that integrates multiple spectral characteristics. This methodology addresses the intense settlement shadows and environmental occlusions common in mountainous terrain analysis. Based on comparative experiments with existing deep learning models, the Swin Transformer was selected as the baseline, and the Feature Fusion Swin Transformer (FF-SwinT) model was then constructed by optimizing the data processing, loss function, and multi-view feature fusion. Finally, we rigorously evaluated the model through ablation studies, generalization tests, and large-scale image application experiments. The results show that FF-SwinT improves on the standard Swin Transformer across multiple indicators, producing recognition results with clear edges and strong integrity. These results suggest that FF-SwinT establishes a novel framework for rural settlement extraction in alpine mountain regions, which is of great significance for regional spatial optimization and development policy formulation.
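As a rough illustration of spectral shadow masking, the sketch below flags dark but relatively saturated pixels in an RGB image; this rule and its thresholds are a common heuristic assumed here, not the paper's multi-characteristic technique.

```python
import numpy as np

def shadow_mask(rgb):
    """Rough shadow mask for an RGB image scaled to [0, 1]: shadows tend to be
    dark (low HSV value) yet retain relatively high saturation."""
    v = rgb.max(axis=-1)                                             # HSV value
    s = np.where(v > 0, (v - rgb.min(axis=-1)) / np.maximum(v, 1e-6), 0.0)
    return (v < 0.35) & (s > 0.15)                                   # illustrative cutoffs

img = np.random.rand(256, 256, 3)
mask = shadow_mask(img)
print(mask.mean())  # fraction of pixels flagged as shadow
```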

21 pages, 3826 KiB  
Article
UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding
by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang
Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025
Abstract
Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. The majority of existing methods are developed under closed-set assumptions, and although some recent studies have begun to explore open-vocabulary or open-world detection, their application to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploiting the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes, with improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace the conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, at the structural level, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments on two widely used benchmark datasets demonstrate significant improvements in both mAP and recall. For zero-shot detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 recall, 1.1 and 25.6 points higher than YOLO-World. In terms of speed, UAV-OVD reaches 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating strong potential for real-time open-vocabulary detection in UAV imagery.
(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)
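A region–text contrastive loss of the kind mentioned is typically a symmetric InfoNCE over matched region/text embedding pairs. The sketch below is that generic formulation; the temperature, embedding size, and symmetric form are assumptions, not the exact UAV-OVD objective.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each (N, D) matrix is a matched pair;
    all other rows in the batch act as negatives."""
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / temperature  # (N, N) cosine similarities
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

regions, texts = torch.randn(8, 256), torch.randn(8, 256)
print(region_text_contrastive_loss(regions, texts))
```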
