Search Results (2,102)

Search Parameters:
Keywords = 6B-Net

15 pages, 2849 KB  
Article
Empowering Rural Livestock Health: AI-Powered Early Detection of Cattle Diseases
by Dammavalam Srinivasa Rao, P. Chandra Sekhar Reddy, Annam Revathi, Vangipuram Sravan Kiran, Nuvvusetty Rajasekhar, Nadella Sandhya, Pulipati Venkateswara Rao, Adla Sai Karthik and Puvvala Jogeeswara Venkata Naga Sai
AI 2026, 7(4), 137; https://doi.org/10.3390/ai7040137 (registering DOI) - 9 Apr 2026
Abstract
This paper presents a novel approach to the early detection of cattle diseases: an integrated, image classification-based system for real-time cattle disease diagnosis that combines image classification models to identify diseases accurately; a seamless, user-friendly dashboard for real-time monitoring with data visualization and instant predictions; and a mobile application that acts as a data source. The mobile application enables real-time collection of farmer and cattle-related data, including age, number of cattle, vaccination cycles, cattle images, and location metadata. Our AI-based cattle health monitoring system enables the early, efficient, scalable, and timely detection of Lumpy Skin Disease (LSD) and Foot and Mouth Disease (FMD) in cattle with high accuracy. A dataset of approximately 1600 LSD/non-LSD images and 840 FMD images was used to train multiple classification networks such as EfficientNetB0, ResNet50, VGG16, EfficientNetV2B0, and EfficientNetV2S, along with a soft-voting ensemble at inference. The proposed framework achieved a maximum testing accuracy of 98.36% for LSD classification and 99.84% for FMD classification under internal validation. These results indicate strong disease recognition capability, with ensemble-based prediction improving robustness, particularly for FMD classification. The proposed system enables practical, early, efficient, and scalable applications of AI research to improve livestock health monitoring and support the early prevention of widespread disease outbreaks. Full article
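The abstract mentions a soft-voting ensemble at inference. A minimal sketch of that step, under the usual definition (average each model's class-probability vector, then take the highest-mean class); the function name and example probabilities are illustrative, not from the paper:

```python
def soft_vote(prob_lists):
    """Soft voting: average class-probability vectors from several models,
    then pick the class with the highest mean probability."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# three hypothetical binary classifiers scoring [non-LSD, LSD]
label, avg = soft_vote([[0.2, 0.8], [0.4, 0.6], [0.1, 0.9]])
```

Averaging probabilities (rather than hard votes) lets a confident model outweigh two lukewarm ones, which is one reason soft voting often improves robustness.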

17 pages, 5072 KB  
Article
A Dual-Input Dense U-Net-Based Method for Line Spectrum Purification Under Interference Background
by Zixuan Jia, Tingting Teng and Dajun Sun
J. Mar. Sci. Eng. 2026, 14(8), 700; https://doi.org/10.3390/jmse14080700 - 9 Apr 2026
Abstract
Line spectrum purification is a fundamental task in underwater detection and identification tasks. A dual-input architecture based on Dense U-net is introduced to extract clean line spectra from strong interference. The U-net model features a symmetric encoder–decoder structure that accepts two-dimensional data as both input and output. The DenseBlock, a core component of DenseNets, offers greater parameter efficiency compared to conventional convolutional layers. In this paper, standard convolutional layers inside the original U-net are replaced by DenseBlocks. This model possesses two input channels, thus allowing the time–frequency feature of the interference and that of the interference–target mixture to be fed simultaneously. With supervised learning, the model is capable of eliminating the strong interference components and background noise from the superimposed spectrum, thereby producing a purified target line spectrum. Compared to traditional interference suppression methods, this approach offers higher feature accuracy and greater signal-to-interference-and-noise ratio (SINR) gain. Moreover, the model is trainable using simulation datasets and then deployed to real-world measurements, demonstrating strong generalization capabilities—a valuable property given the limited availability of labeled samples in underwater detection tasks. Being data-driven, this method operates without requiring prior assumptions about the array configuration, and consequently exhibits greater resilience to array imperfections relative to conventional model-based interference suppression techniques. Simulation and experimental results demonstrate that the proposed method achieves an output SINR improvement of more than 8 dB under low SINR conditions and exhibits significantly better robustness to array position errors than conventional methods, verifying its excellent line spectrum purification capability. Full article
(This article belongs to the Section Ocean Engineering)

21 pages, 9327 KB  
Article
Film Mulching Drip Irrigation Improves the Soil Hydrothermal Environment to Enhance Photosynthetic Efficiency and Yield of Sorghum in an Agro-Pastoral Ecotone of Northern China
by Siyu Yan, Wei Xiong, Fengpeng Guo, Baichen Zhang, Jiahao Wang, Matthew Tom Harrison, Ke Liu, Xiaorui Li, Shuqi Dong and Xiangyang Yuan
Plants 2026, 15(8), 1157; https://doi.org/10.3390/plants15081157 - 9 Apr 2026
Abstract
Film mulching drip irrigation (FMDI) has shown strong yield-promoting effects in arid regions, but its regulatory effects on sorghum, under the unstable soil hydrothermal conditions of the agro-pastoral ecotone zone, remain poorly understood. Sorghum production in this region is frequently constrained by uneven precipitation, high evaporative demand, and limited thermal resources. This study aimed to clarify the role of film mulching drip irrigation in improving the soil hydrothermal environment and photosynthetic performance of sorghum, thereby enhancing yield in the agro-pastoral ecotone of northern China. Compared with bare land without film mulching or drip irrigation (CK), FMDI increased soil temperature by 0.33–2.25 °C and soil moisture by 13.87–18.10% at 0–20 cm depth, alleviating early growth constraints. The leaf chlorophyll b content and carotenoid content of sorghum increased by 55.61% and 55.27%, respectively, while the net photosynthetic rate (Pn) increased by 32.35% and photosystem II (PSII) photochemical efficiency also improved. Random forest (RF) and partial least squares structural equation modeling (PLS–SEM) analyses indicated that chlorophyll, gas exchange, and soil moisture were key drivers of yield formation. Ultimately, FMDI increased yield by 67.08%, indicating that FMDI is an effective irrigation–mulching strategy for improving sustainable sorghum production in the agro-pastoral ecotone zone. Full article
(This article belongs to the Section Crop Physiology and Crop Production)

19 pages, 4608 KB  
Article
SGH-Net: An Efficient Hierarchical Fusion Network with Spectrally Guided Attention for Multi-Modal Landslide Segmentation
by Jing Wang, Haiyang Li, Shuguang Wu, Yukui Yu, Guigen Nie and Zhaoquan Fan
Remote Sens. 2026, 18(8), 1115; https://doi.org/10.3390/rs18081115 - 9 Apr 2026
Abstract
Accurate landslide segmentation from remote sensing imagery is important for geohazard assessment and emergency response, yet it remains challenging because landslide regions are often spectrally confused with bare soil, riverbeds, shadows, and disturbed surfaces while also suffering from severe foreground–background imbalance. To address these issues, we propose an Efficient Spectrally Guided Hierarchical Fusion Network (SGH-Net) for multi-modal landslide segmentation. Instead of directly concatenating heterogeneous inputs at the image level, SGH-Net adopts an asymmetric encoder–decoder design in which a pretrained EfficientNet-B4 extracts RGB features, while two lightweight guidance encoders capture complementary multispectral band and DEM-derived terrain cues. These guidance features are progressively injected into the RGB backbone through multi-stage Guided Attention Blocks, enabling selective feature recalibration and reducing cross-modal interference. In addition, a hybrid Dice–Focal loss is used to alleviate class imbalance. Experiments on the Landslide4Sense dataset show that SGH-Net achieves the best overall performance among the compared methods under the adopted evaluation protocol, reaching 81.15% IoU and a 77.86% F1-score. Compared with representative multi-modal baselines, the proposed method delivers more accurate boundary delineation and fewer false alarms while maintaining favorable model complexity. These results indicate that modality-guided hierarchical fusion is an effective and efficient strategy for multi-modal landslide segmentation. Full article
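The abstract names a hybrid Dice–Focal loss to counter foreground–background imbalance. A minimal per-pixel sketch under common definitions of the two terms; the blending weight `alpha` and the exact formulation are assumptions for illustration, not the authors' configuration:

```python
import math

def dice_focal_loss(probs, targets, gamma=2.0, alpha=0.5, eps=1e-6):
    """Hybrid Dice-Focal loss over flattened per-pixel foreground
    probabilities (probs) and 0/1 ground-truth labels (targets)."""
    # Soft Dice term: 1 - 2*overlap / (|P| + |T|), overlap = sum(p*t)
    inter = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)
    # Focal term: cross-entropy down-weighted for easy pixels by (1 - p_t)^gamma
    focal = 0.0
    for p, t in zip(probs, targets):
        pt = p if t == 1 else 1.0 - p
        focal += -((1.0 - pt) ** gamma) * math.log(max(pt, eps))
    focal /= len(probs)
    return alpha * dice + (1.0 - alpha) * focal
```

The Dice term directly optimizes region overlap (robust to class imbalance), while the focal term keeps per-pixel gradients informative by suppressing the contribution of already well-classified background pixels.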

26 pages, 17314 KB  
Article
An AESRGAN Remote Sensing Super-Resolution Model for Accurate Water Extraction
by Hongjie Liu, Wenlong Song, Juan Lv, Yizhu Lu, Long Chen, Yutong Zhao, Shaobo Linghu, Yifan Duan, Pengyu Chen, Tianshi Feng and Rongjie Gui
Remote Sens. 2026, 18(8), 1108; https://doi.org/10.3390/rs18081108 - 8 Apr 2026
Abstract
Accurate monitoring of water spatiotemporal dynamics is critical for hydrological process analysis and climate impact assessment. While remote sensing enables effective water monitoring, public satellite imagery is limited by mixed-pixel effects that hinder small river detection, and high-resolution commercial data suffers from low temporal frequency and restricted coverage. To address these limitations, this study proposes a deep learning-based super-resolution (SR) framework for multispectral remote sensing imagery. This paper constructs a matched dataset for GF2 and Sentinel-2 imagery and develops an Attention Enhanced Super Resolution Generative Adversarial Network (AESRGAN). By integrating attention mechanisms and a spectral-structural loss design, the network is optimized to adapt to the characteristics of multispectral remote sensing imagery. Experimental results demonstrate that AESRGAN achieves strong reconstruction performance, with a Peak Signal-to-Noise Ratio (PSNR) of 33.83 dB and a Structural Similarity Index Measure (SSIM) of 0.882. Water extraction based on the reconstructed imagery using the U-Net++ model achieved an overall accuracy of 0.97 and a Kappa coefficient of 0.92. In addition, the reconstructed imagery improved the estimation accuracy of river length, width, and area by 0.34%, 3.28%, and 8.51%, respectively. The proposed framework provides an effective solution for multi-source remote sensing data fusion and high-precision surface water monitoring, offering new potential for long-term hydrological observation using medium-resolution satellite imagery. Full article
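PSNR, the reconstruction metric reported above (33.83 dB), is a simple function of mean squared error. A minimal sketch over flattened 8-bit images (helper name is illustrative):

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio: 10*log10(peak^2 / MSE) between two
    equal-size images given as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```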

21 pages, 4058 KB  
Article
Transient Voltage Stability Assessment Method Based on CWT-ResNet
by Chong Shao, Yongsheng Jin, Bolin Zhang, Xin He, Chen Zhou and Haiying Dong
Energies 2026, 19(7), 1804; https://doi.org/10.3390/en19071804 - 7 Apr 2026
Abstract
Accurate and rapid transient voltage stability assessment is crucial for the safe and stable operation of new energy bases in desert and grassland regions. Existing deep learning methods fail to adequately capture the high-dimensional dynamic coupling features of transient voltage signals in large-scale renewable energy bases with UHVDC transmission, and suffer from poor performance under class-imbalanced sample conditions. This paper proposes a transient voltage stability assessment method utilizing continuous wavelet transform (CWT) time–frequency images and a deep residual network (ResNet-50). CWT with the Morlet wavelet basis converts voltage time-series signals into multi-scale time–frequency images to simultaneously capture temporal and frequency-domain transient features. An improved focal loss (FL) function is introduced to dynamically adjust category weights based on actual sample distribution, enhancing model robustness under extreme class imbalance. The proposed method is validated on a modified IEEE 39-bus system incorporating the Qishao UHVDC line and wind/photovoltaic integration in Northwest China, using 1490 simulation samples under diverse fault scenarios. Results demonstrate that the proposed CWT-ResNet achieves 98.88% accuracy, 94.74% precision, 100% recall, and 97.29% F1-score, outperforming SVM, 1D-CNN, and 1D-ResNet baselines. Under 5 dB noise conditions, the method maintains over 90% accuracy, demonstrating strong noise robustness. Full article
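The CWT step described above maps a 1-D voltage signal to a 2-D time–frequency image that a CNN can classify. A minimal, deliberately naive (O(n²) per scale) Morlet scalogram sketch; the center frequency `w0` and the 1/sqrt(s) normalization are common defaults, not values taken from the paper:

```python
import cmath
import math

def morlet_cwt(signal, scales, w0=6.0):
    """Continuous wavelet transform magnitude with a Morlet mother wavelet.
    Returns a len(scales) x len(signal) 'scalogram' image."""
    n = len(signal)
    image = []
    for s in scales:
        row = []
        for b in range(n):  # translation (time position)
            acc = 0j
            for t in range(n):
                u = (t - b) / s
                # Morlet wavelet: complex sinusoid under a Gaussian envelope
                psi = cmath.exp(1j * w0 * u) * math.exp(-0.5 * u * u)
                acc += signal[t] * psi.conjugate()
            row.append(abs(acc) / math.sqrt(s))
        image.append(row)
    return image
```

A sinusoid responds most strongly at the matched scale s ≈ w0 / (2π f), which is what gives the scalogram its frequency axis; production code would use an FFT-based implementation instead of the triple loop.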
(This article belongs to the Special Issue Challenges and Innovations in Stability and Control of Power Systems)

25 pages, 15195 KB  
Article
An Interpretable Deep Learning Approach for Brain Tumor Classification Using a Bangladeshi Brain MRI Dataset
by Md. Saymon Hosen Polash, Md. Tamim Hasan Saykat, Md. Ehsanul Haque, Md. Maniruzzaman, Mahe Zabin and Jia Uddin
BioMedInformatics 2026, 6(2), 19; https://doi.org/10.3390/biomedinformatics6020019 - 7 Apr 2026
Abstract
Magnetic resonance imaging (MRI) is a critical clinical tool that requires precise and reliable interpretation for effective brain tumor diagnosis and timely treatment planning. Deep learning methods have greatly advanced automated tumor classification in recent years, but many current methods still suffer from a lack of interpretability, limited testing on region-focused data, and insufficient robustness evaluation. These limitations reduce clinical trust and hinder the adoption of automated diagnostic systems. To address these challenges, this study proposes an interpretable deep learning model for classifying brain tumors using the PMRAM dataset, a Bangladeshi brain MRI collection containing four categories: glioma, meningioma, pituitary tumor, and normal brain. The proposed pipeline combines image preprocessing and feature enhancement methods, then trains a series of squeeze-and-excitation (SE)-enhanced convolutional neural networks, including VGG19, DenseNet201, MobileNetV3-Large, InceptionV3, and EfficientNetB3. The SE-enhanced EfficientNetB3 performed best, with 98.70% accuracy, 98.77% precision, 98.70% recall, and a 98.70% F1-score. Cross-validation also demonstrated stable performance, with a mean accuracy of 96.89%. The model exhibited efficient inference with low GPU memory consumption, producing predictions in about 2–4 s per MRI image. Grad-CAM++ and saliency maps were used to improve the transparency of the results, showing that the network concentrated on the clinically significant parts of the tumor that drove its predictions. Further robustness analysis and cross-dataset testing provide additional evidence of the model's ability to generalize. An online application was also implemented to allow real-time prediction and visual explanation of brain tumors. Overall, the proposed framework offers a precise, interpretable, and promising solution for automated brain tumor classification using MRI images. Full article
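The squeeze-and-excitation mechanism added to these backbones recalibrates feature channels with learned gates. A minimal pure-Python sketch under the standard SE formulation (global pool, two fully connected layers, sigmoid gates); the weight shapes and tiny example are illustrative, not the paper's configuration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation: reweight channels by learned gates.
    feature_map: list of C channels, each an H x W list of lists.
    w1: reduction weights (r x C); w2: expansion weights (C x r)."""
    # Squeeze: global average pool per channel -> one scalar per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in each channel by that channel's gate
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
```

The bottleneck (r < C) forces the gates to be computed from a compressed summary of all channels, which is what lets SE model cross-channel dependencies at negligible cost.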

17 pages, 12185 KB  
Article
Adjustable Complexity Transformer Architecture for Image Denoising
by Jan-Ray Liao, Wen Lin and Li-Wen Chang
Signals 2026, 7(2), 33; https://doi.org/10.3390/signals7020033 - 6 Apr 2026
Abstract
In recent years, image denoising has seen a shift from traditional non-local self-similarity methods such as BM3D to deep-learning-based approaches that use learnable convolutions and attention mechanisms. While pixel-level attention is effective at capturing long-range relationships, much like non-local self-similarity methods, it incurs extremely high computational costs that scale quadratically with image resolution. As an alternative, channel-wise attention is resolution-independent and computationally efficient but may miss crucial spatial details. In this paper, an adjustable attention mechanism is introduced that bridges the gap between pixel and channel attention. In the proposed model, average pooling and variable-size convolutions are added before the attention calculation to adjust spatial resolution and thus allow dynamic adjustment of computational complexity. This adjustable attention is applied in a transformer-based U-Net architecture and achieves performance comparable to state-of-the-art methods in both real and Gaussian blind denoising tasks. Concretely, the proposed method achieves a Peak Signal-to-Noise Ratio of 39.65 dB and a Structural Similarity Index Measure of 0.913 on the Smartphone Image Denoising Dataset. The proposed method therefore demonstrates a balance between efficiency and denoising quality. Full article
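The complexity trade-off described above is easy to quantify: self-attention over an h x w token grid compares every pair of tokens, so pooling the grid by a factor k cuts the pair count by k^4 (16x for 2 x 2 pooling). A small sketch with hypothetical helper names:

```python
def avg_pool2d(img, k):
    """k x k average pooling: the knob that trades spatial detail
    for a smaller attention token grid."""
    h, w = len(img), len(img[0])
    return [[sum(img[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, w - w % k, k)]
            for i in range(0, h - h % k, k)]

def attention_cost(h, w):
    """Pixel-level self-attention compares every token pair: O((h*w)^2)."""
    return (h * w) ** 2

# Pooling a 64x64 grid by 2 shrinks the token count 4x and the pair count 16x
full = attention_cost(64, 64)    # 16,777,216 pairs
pooled = attention_cost(32, 32)  # 1,048,576 pairs
```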

14 pages, 2277 KB  
Article
Deep Learning Denoising for Enhanced Acetone Detection in Cavity Ring-Down Spectroscopy
by Wenxuan Li, Dongxin Shi, Feifei Wang, Yuxiao Song, Yong Yang, Jing Sun and Chenyu Jiang
Chemosensors 2026, 14(4), 92; https://doi.org/10.3390/chemosensors14040092 - 5 Apr 2026
Abstract
Cavity ring-down spectroscopy has significant potential for detecting trace volatile organic compounds, owing to its long absorption path and high sensitivity. However, in practical measurements, noise severely degrades the accuracy of decay curves and the reliability of concentration retrieval. To address this, we developed a deep learning-based denoising model called decay-upsampling FC-Net. Experimental results showed that the model improved the signal-to-noise ratio from 13.86 dB to 26.79 dB and processed a single decay curve in only 0.000207 s on average. Moreover, under high-noise conditions, it determined the ring-down time more accurately than conventional methods. This study provides an effective signal processing solution for enhancing the practical reliability of cavity ring-down spectroscopy gas detection systems. Full article
(This article belongs to the Special Issue Spectroscopic Techniques for Chemical Analysis, 2nd Edition)

18 pages, 6642 KB  
Article
Computational Study of Linker Polarity Effects on Optical Electron Transfer in Imine- and Acylhydrazone-Linked Covalent Organic Frameworks Using Fragment Models
by Junjin Chen, Dongdong Qi and Jianzhuang Jiang
Molecules 2026, 31(7), 1179; https://doi.org/10.3390/molecules31071179 - 2 Apr 2026
Abstract
Covalent organic frameworks (COFs) have become a research hotspot in photocatalytic materials in recent years due to their highly ordered structures, tunable topologies, and excellent optoelectronic properties. However, the relationship between linker polarity and the direction of optical electron transfer between adjacent structural units remains poorly understood. This study employs density functional theory (DFT) and time-dependent DFT (TD-DFT) calculations to systematically investigate the effects of polarity reversal in imine and acylhydrazone linkers, as well as different fragment models, on the effective optical net electron transfer. To this end, four representative fragment models (K01–K04) were constructed to simulate linear, multi-connected, and branched environments. The results show that, across all models, the direction of the effective optical net electron transfer from phenyl unit (Ph) to UnitB (QPhUnitB) is highly consistent with the polarity direction of the linker. In imine-linked systems, when the dipole moment of the linker aligns with the intrinsic dipole moment direction between Ph and UnitB, the absolute value of QPhUnitB is significantly enhanced; in acylhydrazone-linked systems, only K02 and K03 exhibit similar behavior, while K01 and K04 show no obvious enhancement. These findings provide important guidance for designing efficient photocatalytic COFs: tuning the linker orientation to match the intrinsic polarity of adjacent structural units can significantly improve the efficiency of optical net electron transfer between them. Full article
(This article belongs to the Section Computational and Theoretical Chemistry)

18 pages, 1850 KB  
Article
AT-HSTNet: An Efficient Hierarchical Action-Transformer Framework for Deepfake Video Detection
by Sameena Javaid, Marwa Chendeb El Rai, Abeer Elkhouly, Obada Al-Khatib, Aicha Beya Far and May El Barachi
Appl. Sci. 2026, 16(7), 3450; https://doi.org/10.3390/app16073450 - 2 Apr 2026
Abstract
The rapid advancement of deepfake generation technologies presents significant challenges to the verification of digital video authenticity. Deepfake videos often contain subtle time-dependent artifacts that are difficult to detect using conventional frame-based approaches. This paper introduces AT-HSTNet, an Action-Transformer-based Hierarchical Spatiotemporal Network designed for robust and computationally efficient deepfake video detection. The proposed framework adopts a multi-stage hierarchical architecture in which frame-level visual features are extracted using an EfficientNet-B0 backbone, short- and medium-range temporal patterns are modeled through Bidirectional Long Short-Term Memory (BiLSTM) networks, and long-range temporal dependencies are captured using an action-aware Transformer operating on temporally aggregated representations. Unlike conventional video transformers that apply self-attention directly to raw frame-level features, the proposed action-aware attention mechanism reduces redundant computation and improves stability in temporal reasoning. Extensive experiments on the balanced FFIW-10K dataset demonstrate that AT-HSTNet achieves an accuracy of 98.7%, with 98.0% precision, 96.0% recall, and a 96.9% F1-score, outperforming representative CNN–BiLSTM and CNN–Transformer baseline architectures. In addition, AT-HSTNet is highly efficient, requiring only 0.45 GFLOPs and achieving an inference speed of approximately 30 FPS on consumer-grade GPU hardware. These results indicate that hierarchical temporal modeling combined with action-aware attention is an effective approach to deepfake video detection. Full article

23 pages, 9705 KB  
Article
Wear Condition Assessment of Gear Transmission System Based on Wear Debris Boundary Energy
by Congrui Xu, Wei Cao, Yang Yan, Letian Ding, Yifan Wang, Rongrong Hao, Rui Su and Niraj Khadka
Lubricants 2026, 14(4), 153; https://doi.org/10.3390/lubricants14040153 - 1 Apr 2026
Abstract
The gear transmission system is a core component of industrial equipment, and its wear state directly affects the reliability and service life of the equipment. Wear debris images contain important information on the mechanical wear state: by processing them and analyzing the characteristics and types of wear debris, the health status of mechanical equipment and components can be evaluated. However, wear debris images collected in real time are often affected by Gaussian noise. In this paper, an improved K-SVD dictionary learning algorithm was used to remove Gaussian noise, with objective metrics demonstrating the effectiveness of the improved K-SVD algorithm on wear debris images. Secondly, an improved marked watershed segmentation algorithm (B-FSL) was studied to segment the wear debris chains. After that, the boundary energy (BE) characteristics of the wear debris were extracted to warn of a severe wear state in advance, an EfficientNetB3 network based on transfer learning was constructed for the recognition and classification of wear debris images, and the severity of the wear of the mechanical equipment was analyzed. Finally, an experiment was conducted to validate the above methods, proving that the BE characteristics of the wear debris can predict the failure of a planetary gearbox in advance, with the accuracy of the wear debris recognition and classification algorithm exceeding 98%. Full article

23 pages, 23579 KB  
Article
Image-Based Waste Classification Using a Hybrid Deep Learning Architecture with Transfer Learning and Edge AI Deployment
by Domen Verber, Teodora Grneva and Jani Dugonik
Mathematics 2026, 14(7), 1176; https://doi.org/10.3390/math14071176 - 1 Apr 2026
Abstract
Growing amounts of municipal waste and the need for efficient recycling demand automated and accurate classification systems. This paper investigates deep learning approaches for multi-class waste sorting based on image data, comparing three widely used convolutional neural network architectures (ResNet-50, EfficientNet-B0, and MobileNet V3) with a custom hybrid model (CustomNet). The dataset comprises 13,933 RGB images across 10 waste categories, combining publicly available samples from the Kaggle Garbage Classification dataset (61.1%) with images collected in house (38.9%). The three glass sub-categories (brown, green, and white glass) were merged into a single glass class to ensure consistent class representation across all dataset splits. Preprocessing steps include normalization, resizing, and extensive data augmentation to improve robustness and mitigate class imbalance. Transfer learning is applied to pretrained models, while CustomNet integrates feature representations from multiple backbones using projection layers and attention mechanisms. Performance is evaluated using accuracy, macro-F1, and ROC–AUC on a held-out test set. Statistical significance was assessed using paired t-tests and Wilcoxon signed-rank tests with Bonferroni correction across five-fold cross-validation runs. The results show that CustomNet achieves 97.79% accuracy, a macro-F1 score of 0.973, and a ROC–AUC of 0.992. CustomNet significantly outperforms EfficientNet-B0 and MobileNet V3 (p<0.001, Bonferroni corrected), and it achieves performance parity with ResNet-50 (p=0.383) at a substantially lower parameter count in the classification head (9.7 M vs. 25.6 M). These findings indicate that combining multiple feature extractors with attention mechanisms improves classification performance, supports qualitative model explainability via saliency visualization (Grad-CAM), and enables practical deployment on heterogeneous Edge AI platforms. 
Inference benchmarking on an NVIDIA Jetson Orin Nano demonstrated real-world deployment feasibility at 86.70 ms per image (11.5 FPS). Full article
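Macro-F1, one of the metrics reported above, averages per-class F1 scores so rare classes weigh as much as common ones. A minimal sketch under the standard definition (helper name is hypothetical, not the authors' evaluation code):

```python
def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: compute F1 per class, then average with
    equal class weight (a class with no true positives scores 0)."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)
```

Unlike plain accuracy, macro-F1 drops sharply when a model ignores a small class, which is why it is the usual companion metric for imbalanced multi-class problems like waste sorting.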
(This article belongs to the Special Issue The Application of Deep Neural Networks in Image Processing)

19 pages, 9566 KB  
Article
Image Colorization with Residual Attention U-Net
by Jun Yang, Donghui Zhang, Fan Wu and Le Yang
Electronics 2026, 15(7), 1462; https://doi.org/10.3390/electronics15071462 - 1 Apr 2026
Abstract
Image colorization aims to add plausible colors to grayscale images. However, existing methods often suffer from detail loss, dull colors, and unrealistic results. To address these issues, we propose a novel image colorization method based on a residual attention U-Net. First, a shallow feature extraction module with a fusion attention mechanism is designed to capture shallow features. Second, a residual attention U-Net is constructed by integrating a residual attention module into an improved U-Net architecture. Finally, we fuse the extracted shallow features with the shallow attention features within the residual attention U-Net to enhance detail preservation and improve colorization quality. Experimental results on the summer2winter dataset show that our method improves the average PSNR by 1.32 dB and SSIM by 0.0139, while reducing LPIPS by 0.01. Furthermore, our method achieves the best average PSNR and LPIPS on the NCData and COCO-Stuff datasets. Visual results demonstrate that our approach preserves fine details, produces more vibrant colors, and achieves a higher degree of realism and naturalness. Full article

18 pages, 2824 KB  
Article
Semantic Segmentation of Coffee Crops with PlanetScope Images: A Comparative Analysis of Spectral Band Combinations for U-Net Architecture
by Daniel Henrique Leite, Domingos Sárvio Magalhães Valente, Pedro Maya Ferreira Arruda, Gabriel Dumbá Monteiro de Castro, Daniel Marçal de Queiroz, Diego Bedin Marin and Fábio Daniel Tancredi
AgriEngineering 2026, 8(4), 125; https://doi.org/10.3390/agriengineering8040125 - 1 Apr 2026
Abstract
Coffee is among the primary agricultural commodities in international trade; however, mapping coffee crops in mountainous regions faces limitations due to high spectral variability and complex canopy structures. This study hypothesized that optimized spectral band combinations focused on the visible spectrum may outperform configurations including near-infrared (NIR) for coffee crop segmentation. This work aimed to evaluate how different spectral band combinations affect the performance of the U-Net for segmenting coffee crops in mountainous regions. Seven PlanetScope images (4 m resolution) from Matas de Minas, Brazil, covering different phenological stages in 2023–2024, were divided into 316 training patches and 25 test patches of 256 × 256 pixels and used to train U-Net models across five spectral band combinations: (B, G, R), (B, G, NIR), (B, R, NIR), (G, R, NIR), and (B, G, R, NIR). The visible spectrum combination (B, G, R) demonstrated superior performance with an overall Accuracy of 0.8669 and, for the Coffee Crops class, an F1-score of 0.8682 and an IoU of 0.7671, outperforming all NIR-inclusive configurations. Visible bands’ sensitivity to pigmentation variations proved more effective in heterogeneous environments, while NIR increased spectral confusion near native vegetation and crop edges. The model overestimated cultivated area by 18.3% due to mixed pixels from 4 m resolution and mountainous terrain. These findings confirm that visible-spectrum bands offer a cost-effective alternative for coffee segmentation, though higher spatial resolution is needed for improved boundary delineation. Full article
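The per-class IoU reported above compares the predicted and reference segmentation masks pixel by pixel. A minimal sketch over flattened label lists (helper name is hypothetical):

```python
def class_iou(y_true, y_pred, cls):
    """Intersection over Union for one class: |pred AND true| / |pred OR true|
    over flattened segmentation masks; defined as 1.0 when the class is
    absent from both masks."""
    inter = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    union = sum(1 for t, p in zip(y_true, y_pred) if t == cls or p == cls)
    return inter / union if union else 1.0
```

IoU penalizes both missed pixels and false detections through the union term, so an IoU of 0.7671 for the Coffee Crops class is a stricter statement than the corresponding F1 of 0.8682.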
