Search Results (80)

Search Parameters:
Keywords = residual full attention

27 pages, 13439 KiB  
Article
Swin-ReshoUnet: A Seismic Profile Signal Reconstruction Method Integrating Hierarchical Convolution, ORCA Attention, and Residual Channel Attention Mechanism
by Jie Rao, Mingju Chen, Xiaofei Song, Chen Xie, Xueyang Duan, Xiao Hu, Senyuan Li and Xingyue Zhang
Appl. Sci. 2025, 15(15), 8332; https://doi.org/10.3390/app15158332 - 26 Jul 2025
Viewed by 148
Abstract
This study proposes a Swin-ReshoUnet architecture with a three-level enhancement mechanism to address inefficiencies in multi-scale feature extraction and gradient degradation in deep networks for high-precision seismic exploration. The encoder uses a hierarchical convolution module to build a multi-scale feature pyramid, enhancing cross-scale geological signal representation. The decoder replaces traditional self-attention with ORCA attention to enable global context modeling with lower computational cost. Skip connections integrate a residual channel attention module, mitigating gradient degradation via dual-pooling feature fusion and activation optimization, forming a full-link optimization from low-level feature enhancement to high-level semantic integration. Simulated and real dataset experiments show that at decimation ratios of 0.1–0.5, the method significantly outperforms SwinUnet, TransUnet, etc., in reconstruction performance. Residual signals and F-K spectra verify high-fidelity reconstruction. Despite increased difficulty with higher sparsity, it maintains optimal performance with notable margins, demonstrating strong robustness. The proposed hierarchical feature enhancement and cross-scale attention strategies offer an efficient seismic profile signal reconstruction solution and show generality for migration to complex visual tasks, advancing geophysics-computer vision interdisciplinary innovation. Full article
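
The residual channel attention module with dual-pooling fusion described above could take roughly the following form; the class name, reduction ratio, and gating details are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a residual channel-attention block with dual-pooling
# fusion, in the spirit of the skip-connection module described above.
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both pooled descriptors (dual-pooling fusion).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse average- and max-pooled channel statistics, then gate the input.
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        # The residual connection keeps a clean gradient path through the skip link.
        return x + x * attn

# Example: gate a batch of 64-channel feature maps.
# y = ResidualChannelAttention(64)(torch.randn(2, 64, 32, 32))
```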

15 pages, 4874 KiB  
Article
A Novel 3D Convolutional Neural Network-Based Deep Learning Model for Spatiotemporal Feature Mapping for Video Analysis: Feasibility Study for Gastrointestinal Endoscopic Video Classification
by Mrinal Kanti Dhar, Mou Deb, Poonguzhali Elangovan, Keerthy Gopalakrishnan, Divyanshi Sood, Avneet Kaur, Charmy Parikh, Swetha Rapolu, Gianeshwaree Alias Rachna Panjwani, Rabiah Aslam Ansari, Naghmeh Asadimanesh, Shiva Sankari Karuppiah, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
J. Imaging 2025, 11(7), 243; https://doi.org/10.3390/jimaging11070243 - 18 Jul 2025
Viewed by 422
Abstract
Accurate analysis of medical videos remains a major challenge in deep learning (DL) due to the need for effective spatiotemporal feature mapping that captures both spatial detail and temporal dynamics. Despite advances in DL, most existing models in medical AI focus on static images, overlooking critical temporal cues present in video data. To bridge this gap, a novel DL-based framework is proposed for spatiotemporal feature extraction from medical video sequences. As a feasibility use case, this study focuses on gastrointestinal (GI) endoscopic video classification. A 3D convolutional neural network (CNN) is developed to classify upper and lower GI endoscopic videos using the hyperKvasir dataset, which contains 314 lower and 60 upper GI videos. To address data imbalance, 60 matched pairs of videos are randomly selected across 20 experimental runs. Videos are resized to 224 × 224, and the 3D CNN captures spatiotemporal information. A 3D version of the parallel spatial and channel squeeze-and-excitation (P-scSE) is implemented, and a new block called the residual with parallel attention (RPA) block is proposed by combining P-scSE3D with a residual block. To reduce computational complexity, a (2 + 1)D convolution is used in place of full 3D convolution. The model achieves an average accuracy of 0.933, precision of 0.932, recall of 0.944, F1-score of 0.935, and AUC of 0.933. It is also observed that the integration of P-scSE3D increased the F1-score by 7%. This preliminary work opens avenues for exploring various GI endoscopic video-based prospective studies. Full article
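
The (2+1)D convolution mentioned above factorizes a full 3D kernel into a 2D spatial step followed by a 1D temporal step, which reduces parameters while still mixing information across frames. A minimal sketch, with illustrative channel sizes, is shown below.

```python
# Hypothetical sketch of a (2+1)D convolution block; channel sizes and kernel
# shapes are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, mid_ch: int = 64):
        super().__init__()
        # Spatial convolution: kernel (1, 3, 3) over (T, H, W) volumes.
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Temporal convolution: kernel (3, 1, 1) mixes information across frames.
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.temporal(self.act(self.spatial(x)))

# Example: a clip of 16 RGB frames at 224x224 -> input shape (batch, channels, T, H, W).
# y = Conv2Plus1D(3, 64)(torch.randn(1, 3, 16, 224, 224))
```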

20 pages, 7167 KiB  
Article
FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection
by Yongxian Liu, Zaiping Lin, Boyang Li, Ting Liu and Wei An
Remote Sens. 2025, 17(13), 2264; https://doi.org/10.3390/rs17132264 - 1 Jul 2025
Viewed by 364
Abstract
Infrared small target detection (IRSTD) aims to locate and separate targets from complex backgrounds. The challenges in IRSTD primarily come from extremely sparse target features and strong background clutter interference. However, existing methods typically perform discrimination directly on the features extracted by deep networks, neglecting the distinct characteristics of weak and small targets in the frequency domain, thereby limiting the improvement of detection capability. In this paper, we propose a frequency-aware masked-attention network (FM-Net) that leverages multi-scale frequency clues to assist in representing global context and suppressing noise interference. Specifically, we design the wavelet residual block (WRB) to extract multi-scale spatial and frequency features, which introduces a wavelet pyramid as the intermediate layer of the residual block. Then, to perceive global information on the long-range skip connections, a frequency-modulation masked-attention module (FMM) is used to interact with multi-layer features from the encoder. FMM contains two crucial elements: (a) a mask attention (MA) mechanism for injecting broad contextual features efficiently to promote full-level semantic correlation and focus on salient regions, and (b) a channel-wise frequency modulation module (CFM) for enhancing the most informative frequency components and suppressing useless ones. Extensive experiments on three benchmark datasets (SIRST, NUDT-SIRST, and IRSTD-1k) demonstrate that FM-Net achieves superior detection performance. Full article
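
The channel-wise frequency modulation (CFM) idea above can be illustrated by re-weighting per-channel frequency content; the sketch below is an assumption-laden reading of that mechanism, not the authors' code.

```python
# Hypothetical sketch of channel-wise frequency modulation: features are moved
# to the frequency domain, re-weighted per channel, and transformed back.
import torch
import torch.nn as nn

class ChannelFrequencyModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One learnable, input-dependent gain per channel for the frequency content.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = torch.fft.rfft2(x, norm="ortho")      # (B, C, H, W//2+1), complex
        weight = self.gate(x)                         # (B, C, 1, 1) channel weights
        freq = freq * weight                          # emphasize / suppress channels
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
```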

14 pages, 1622 KiB  
Article
Neonicotinoid Residues in Tea Products from China: Contamination Patterns and Implications for Human Exposure
by Yulong Fan, Hongwei Jin, Jinru Chen, Kai Lin, Lihua Zhu, Yijia Guo, Jiajia Ji and Xiaming Chen
Toxics 2025, 13(7), 550; https://doi.org/10.3390/toxics13070550 - 29 Jun 2025
Viewed by 464
Abstract
Neonicotinoids (NEOs) are a class of systemic insecticides widely used in agriculture owing to their high efficacy and selectivity. As one of the most globally consumed beverages, tea may represent a potential dietary source of pesticide residues. However, limited research has examined NEO contamination in tea and its implications for human exposure, highlighting the need for further investigation. Therefore, this study comprehensively evaluated the residue characteristics, processing effects, and human exposure risks of six NEOs—dinotefuran (DIN), imidacloprid (IMI), acetamiprid (ACE), thiamethoxam (THM), clothianidin (CLO), and thiacloprid (THI)—in Chinese tea products. According to the findings, the primary pollutants, ACE, DIN, and IMI, accounted for 95.65% of the total NEO residues in 137 tea samples, including green, oolong, white, black, dark, and herbal teas. The highest total target NEO (∑6NEOs) residue level was detected in oolong tea (mean: 57.86 ng/g). Meanwhile, IMI exhibited the highest residue level (78.88 ng/g) in herbal tea due to the absence of high-temperature fixation procedures. Concentrations of DIN in 61 samples (44.5%) exceeded the European Union’s maximum residue limit of 10 ng/g. Health risk assessment indicated that both the chronic hazard quotient (cHQ) and acute hazard quotient (aHQ) for adults and children were below the safety threshold (<1). However, children required special attention, as their exposure risk was 1.28 times higher than that of adults. The distribution of NEO residues was significantly influenced by tea processing techniques, such as full fermentation in black tea. Optimizing processing methods (e.g., using infrared enzyme deactivation) and implementing targeted pesticide application strategies may help mitigate risk. These results provide a scientific foundation for enhancing tea safety regulations and protecting consumer health. Full article
(This article belongs to the Special Issue Human Biomonitoring in Health Risk Assessment of Emerging Chemicals)
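
The chronic and acute hazard quotients reported above follow the standard ratio of estimated daily intake to an acceptable daily intake. Below is a minimal sketch of that calculation; the consumption, body-weight, and ADI values are placeholders rather than figures from the study, and the study's own exposure model (e.g., accounting for transfer into the brewed tea) may differ.

```python
# Illustrative hazard-quotient calculation; all numeric inputs are placeholders.
def hazard_quotient(residue_ng_per_g: float,
                    daily_intake_g: float,
                    body_weight_kg: float,
                    adi_ug_per_kg_bw: float) -> float:
    """HQ = estimated daily intake / acceptable daily intake; HQ < 1 means acceptable risk."""
    edi_ug_per_kg_bw = residue_ng_per_g * daily_intake_g / 1000.0 / body_weight_kg
    return edi_ug_per_kg_bw / adi_ug_per_kg_bw

# Example with placeholder values: 57.86 ng/g residue, 10 g tea per day, 60 kg adult,
# and a placeholder ADI of 25 ug/kg bw/day; the resulting quotient is far below 1.
print(hazard_quotient(57.86, 10.0, 60.0, 25.0))
```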

28 pages, 4199 KiB  
Article
Dose Reduction in Scintigraphic Imaging Through Enhanced Convolutional Autoencoder-Based Denoising
by Nikolaos Bouzianis, Ioannis Stathopoulos, Pipitsa Valsamaki, Efthymia Rapti, Ekaterini Trikopani, Vasiliki Apostolidou, Athanasia Kotini, Athanasios Zissimopoulos, Adam Adamopoulos and Efstratios Karavasilis
J. Imaging 2025, 11(6), 197; https://doi.org/10.3390/jimaging11060197 - 14 Jun 2025
Viewed by 551
Abstract
Objective: This study proposes a novel deep learning approach for enhancing low-dose bone scintigraphy images using an Enhanced Convolutional Autoencoder (ECAE), aiming to reduce patient radiation exposure while preserving diagnostic quality, as assessed by both expert-based quantitative image metrics and qualitative evaluation. Methods: A supervised learning framework was developed using real-world paired low- and full-dose images from 105 patients. Data were acquired using standard clinical gamma cameras at the Nuclear Medicine Department of the University General Hospital of Alexandroupolis. The ECAE architecture integrates multiscale feature extraction, channel attention mechanisms, and efficient residual blocks to reconstruct high-quality images from low-dose inputs. The model was trained and validated using quantitative metrics—Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM)—alongside qualitative assessments by nuclear medicine experts. Results: The model achieved significant improvements in both PSNR and SSIM across all tested dose levels, particularly between 30% and 70% of the full dose. Expert evaluation confirmed enhanced visibility of anatomical structures, noise reduction, and preservation of diagnostic detail in denoised images. In blinded evaluations, denoised images were preferred over the original full-dose scans in 66% of all cases, and in 61% of cases within the 30–70% dose range. Conclusion: The proposed ECAE model effectively reconstructs high-quality bone scintigraphy images from substantially reduced-dose acquisitions. This approach supports dose reduction in nuclear medicine imaging while maintaining—or even enhancing—diagnostic confidence, offering practical benefits in patient safety, workflow efficiency, and environmental impact. Full article
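
PSNR and SSIM, the quantitative metrics used above, can be computed as in the sketch below (assuming scikit-image is available); the arrays are synthetic placeholders, not study data.

```python
# Minimal sketch of comparing a denoised low-dose image against a full-dose reference.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

full_dose = np.random.rand(256, 256).astype(np.float32)                  # reference image (placeholder)
denoised = full_dose + 0.01 * np.random.randn(256, 256).astype(np.float32)  # model output (placeholder)

psnr = peak_signal_noise_ratio(full_dose, denoised, data_range=1.0)
ssim = structural_similarity(full_dose, denoised, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```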

23 pages, 13542 KiB  
Article
A Lightweight Neural Network for Denoising Wrapped-Phase Images Generated with Full-Field Optical Interferometry
by Muhammad Awais, Younggue Kim, Taeil Yoon, Wonshik Choi and Byeongha Lee
Appl. Sci. 2025, 15(10), 5514; https://doi.org/10.3390/app15105514 - 14 May 2025
Viewed by 535
Abstract
Phase wrapping is a common phenomenon in optical full-field imaging or measurement systems. It arises from large phase retardations and results in wrapped-phase maps that contain essential information about surface roughness and topology. However, these maps are often degraded by noise, such as speckle and Gaussian noise, which reduces the measurement accuracy and complicates phase reconstruction. Denoising such data is a fundamental problem in computer vision and plays a critical role in biomedical imaging modalities like Full-Field Optical Interferometry. In this paper, we propose WPD-Net (Wrapped-Phase Denoising Network), a lightweight deep learning-based neural network specifically designed to restore phase images corrupted by high noise levels. The network architecture integrates a shallow feature extraction module, a series of Residual Dense Attention Blocks (RDABs), and a dense feature fusion module. The RDABs incorporate attention mechanisms that help the network focus on critical features and suppress irrelevant noise, especially in high-frequency or complex regions. Additionally, WPD-Net employs a growth-rate-based feature expansion strategy to enhance multi-scale feature representation and improve phase continuity. We evaluate the model's performance on both synthetic and experimentally acquired datasets and compare it with other state-of-the-art deep learning-based denoising methods. The results demonstrate that WPD-Net achieves superior noise suppression while preserving fine structural details, even under mixed speckle and Gaussian noise. The proposed method is expected to enable fast image processing, allowing unwrapped biomedical images to be retrieved in real time. Full article
(This article belongs to the Special Issue Computer-Vision-Based Biomedical Image Processing)
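
The growth-rate-based feature expansion and residual dense attention blocks mentioned above broadly suggest a dense block whose layers each contribute a fixed number of new channels, fused back through an attention-weighted residual. A sketch under those assumptions (layer count, growth rate, and gate are illustrative, not the authors' design):

```python
# Hypothetical residual dense block with a growth-rate-based channel expansion.
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels: int = 64, growth_rate: int = 32, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = channels
        for _ in range(num_layers):
            # Each layer sees all previous feature maps and adds `growth_rate` new ones.
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth_rate, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch += growth_rate
        # 1x1 fusion back to the input width, plus a simple channel gate.
        self.fuse = nn.Conv2d(ch, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = x
        for layer in self.layers:
            feats = torch.cat([feats, layer(feats)], dim=1)
        fused = self.fuse(feats)
        return x + fused * self.gate(fused)   # attention-weighted residual
```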

23 pages, 2325 KiB  
Article
Downhole Coal–Rock Recognition Based on Joint Migration and Enhanced Multidimensional Full-Scale Visual Features
by Bin Jiao, Chuanmeng Sun, Sichao Qin, Wenbo Wang, Yu Wang, Zhibo Wu, Yong Li and Dawei Shen
Appl. Sci. 2025, 15(10), 5411; https://doi.org/10.3390/app15105411 - 12 May 2025
Viewed by 344
Abstract
The accurate identification of coal and rock at the mining face is often hindered by adverse underground imaging conditions, including poor lighting and strong reflectivity. To tackle these issues, this work introduces a recognition framework specifically designed for underground environments, leveraging joint migration and enhancement of multidimensional and full-scale visual representations. A Transformer-based architecture is employed to capture global dependencies within the image and perform reflectance component denoising. Additionally, a multi-scale luminance adjustment module is integrated to merge features across perceptual ranges, mitigating localized brightness anomalies such as overexposure. The model is structured around an encoder–decoder backbone, enhanced by a full-scale connectivity mechanism, a residual attention block with dilated convolution, Res2Block elements, and a composite loss function. These components collectively support precise pixel-level segmentation of coal–rock imagery. Experimental evaluations reveal that the proposed luminance module achieves a PSNR of 21.288 and an SSIM of 0.783, outperforming standard enhancement methods like RetinexNet and RRDNet. The segmentation framework achieves a MIoU of 97.99% and an MPA of 99.28%, surpassing U-Net by 2.21 and 1.53 percentage points, respectively. Full article
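
MIoU and MPA, the segmentation metrics quoted above, reduce to simple ratios over a confusion matrix. The sketch below shows one way to compute them for a two-class (coal vs. rock) mask; the predictions are random placeholders.

```python
# Minimal MIoU / MPA computation from a confusion matrix.
import numpy as np

def miou_mpa(pred: np.ndarray, target: np.ndarray, num_classes: int = 2):
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1                                   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(np.float64)
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)   # per-class intersection over union
    pa = tp / cm.sum(axis=1)                            # per-class pixel accuracy
    return iou.mean(), pa.mean()                        # MIoU, MPA

pred = np.random.randint(0, 2, (64, 64))                # placeholder prediction mask
target = np.random.randint(0, 2, (64, 64))              # placeholder ground-truth mask
print(miou_mpa(pred, target))
```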

22 pages, 1768 KiB  
Article
A Novel Integrated Biorefinery for the Valorization of Residual Cardoon Biomass: Overview of Technologies and Process Simulation
by Vittoria Fatta, Aristide Giuliano, Maria Teresa Petrone, Francesco Nanna, Antonio Villone, Donatella Barisano, Roberto Albergo, Federico Liuzzi, Diego Barletta and Isabella De Bari
Energies 2025, 18(4), 973; https://doi.org/10.3390/en18040973 - 18 Feb 2025
Viewed by 691
Abstract
Lignocellulosic biomass is currently widely used in many biorefining processes. The full exploitation of biomass from uncultivated or even marginal lands for the production of biobased chemicals has attracted considerable attention in recent years. Among the sustainable biomass-based value chains, cardoon crops could be a feedstock for biorefineries, as they can grow on marginal lands and be used as raw material for multipurpose exploitation, including seeds, roots, and the epigeous lignocellulosic solid residue. This work focused on the technical analysis of a novel integrated flowsheet for the exploitation of the lignocellulosic fraction through the assessment of thermochemical, biochemical, and extractive technologies and processes. In particular, high-yield thermochemical processes (gasification), innovative biotechnological processes (syngas fermentation to ethanol), and extractive/catalyzed processes for the valorization of cardoon roots to 2,5-furandicarboxylic acid (FDCA) and residual solid biomass were modeled and simulated. Inulin conversion to FDCA was the main conversion route taken into consideration. Finally, the novel process flowsheet, treating 130,000 t/y of residual biomass and integrating all proposed technologies, was modeled and assessed using process simulation tools to obtain overall mass and energy balances for comparison with alternative options. The results indicated that cardoon biorefining through the proposed flowsheet can produce, per 1000 tons of input dry biomass, 211 kg of FDCA and 140 kg of ethanol through biomass gasification followed by syngas fermentation. Furthermore, a pre-feasibility analysis was conducted, revealing significant and potentially disruptive results in terms of environmental impact (with 40 ktCO2eq saved) and economic feasibility (with an annual gross profit of EUR 30 M/y). Full article
(This article belongs to the Section A4: Bio-Energy)

20 pages, 3955 KiB  
Article
Deep Learning Extraction of Tidal Creeks in the Yellow River Delta Using GF-2 Imagery
by Bojie Chen, Qianran Zhang, Na Yang, Xiukun Wang, Xiaobo Zhang, Yilan Chen and Shengli Wang
Remote Sens. 2025, 17(4), 676; https://doi.org/10.3390/rs17040676 - 16 Feb 2025
Viewed by 947
Abstract
Tidal creeks are vital geomorphological features of tidal flats, and their spatial and temporal variations contribute significantly to the preservation of ecological diversity and the spatial evolution of coastal wetlands. Traditional methods, such as manual annotation and machine learning, remain common for tidal creek extraction, but they are slow and inefficient. With increasing data volumes, accurately analyzing tidal creeks over large spatial and temporal scales has become a significant challenge. This study proposes a residual U-Net model that utilizes full-dimensional dynamic convolution to segment tidal creeks in the Yellow River Delta, employing Gaofen-2 satellite images with a resolution of 4 m. The model replaces the traditional convolutions in the residual blocks of the encoder with Omni-dimensional Dynamic Convolution (ODConv), mitigating the loss of fine details and improving segmentation for small targets. Adding coordinate attention (CA) to the Atrous Spatial Pyramid Pooling (ASPP) module improves target classification and localization in remote sensing images. Furthermore, including the dice coefficient in the focal loss function improves the model's gradients and tackles class imbalance within the dataset. The study results indicate that the model attains an F1 score and kappa coefficient exceeding 80% for both mud and salt marsh regions. Comparisons with several semantic segmentation models on the mud marsh tidal creek dataset show that ODU-Net significantly enhances tidal creek segmentation, resolves class imbalance issues, and delivers superior extraction accuracy and stability. Full article
(This article belongs to the Special Issue Remote Sensing of Coastal, Wetland, and Intertidal Zones)
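
Folding the dice coefficient into the focal loss, as described above, can be sketched as a weighted sum of the two terms; the hyperparameters below are illustrative assumptions, not the paper's settings.

```python
# Hypothetical combined focal + Dice loss for binary tidal-creek segmentation.
import torch
import torch.nn.functional as F

def dice_focal_loss(logits: torch.Tensor, target: torch.Tensor,
                    alpha: float = 0.25, gamma: float = 2.0,
                    dice_weight: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    prob = torch.sigmoid(logits)
    # Focal term: down-weights easy pixels so training focuses on hard ones.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (alpha_t * (1 - p_t) ** gamma * bce).mean()
    # Dice term: directly optimizes region overlap, countering class imbalance.
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return focal + dice_weight * dice

# Example: logits and binary masks of shape (batch, 1, H, W).
# loss = dice_focal_loss(torch.randn(2, 1, 128, 128),
#                        torch.randint(0, 2, (2, 1, 128, 128)).float())
```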

26 pages, 5924 KiB  
Article
Hyperspectral Image Classification Based on Hybrid Depth-Wise Separable Convolution and Dual-Branch Feature Fusion Network
by Hualin Dai, Yingli Yue and Qi Liu
Appl. Sci. 2025, 15(3), 1394; https://doi.org/10.3390/app15031394 - 29 Jan 2025
Viewed by 886
Abstract
Recently, advancements in convolutional neural networks (CNNs) have significantly advanced hyperspectral image (HSI) classification. However, the problem of limited training samples is the primary obstacle to obtaining further improvements in HSI classification. The traditional methods relying solely on 2D-CNN for feature extraction underutilize the inter-band correlations of HSI, while the methods based on 3D-CNN alone for feature extraction lead to an increase in training parameters. To solve the above problems, we propose an HSI classification network based on hybrid depth-wise separable convolution and dual-branch feature fusion (HDCDF). The dual-branch structure is designed in HDCDF to simultaneously extract integrated spectral–spatial features and obtain complementary features via feature fusion. The proposed modules, a 2D depth-wise separable convolution attention (2D-DCAttention) block and hybrid residual blocks, are applied to the two branches, respectively, further extracting more representative and comprehensive features. Instead of full 3D convolutions, HDCDF uses hybrid 2D–3D depth-wise separable convolutions, offering computational efficiency. Experiments are conducted on three benchmark HSI datasets: Indian Pines, University of Pavia, and Salinas Valley. The experimental results show that the proposed method achieves superior performance when training samples are extremely limited, outperforming the state-of-the-art method by an average of 2.03% in overall accuracy across the three datasets, which indicates that HDCDF holds promise for HSI classification. Full article
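
A depth-wise separable convolution, as used above in place of full 3D convolutions, factors a standard convolution into per-channel spatial filtering followed by 1×1 channel mixing. A minimal 2D sketch with illustrative channel sizes:

```python
# Minimal 2D depth-wise separable convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # groups=in_ch filters each channel independently (depth-wise step).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # 1x1 convolution mixes information across channels (point-wise step).
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Uses roughly 1/out_ch + 1/k^2 of the weights of a standard k x k convolution.
# y = DepthwiseSeparableConv2d(32, 64)(torch.randn(1, 32, 28, 28))
```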

15 pages, 4496 KiB  
Article
Identification of Oligopeptides in the Distillates from Various Rounds of Soy Sauce-Flavored Baijiu and Their Effect on the Ester–Acid–Alcohol Profile in Baijiu
by Qiang Wu, Shanlin Tian, Xu Zhang, Yunhao Zhao and Yougui Yu
Foods 2025, 14(2), 287; https://doi.org/10.3390/foods14020287 - 16 Jan 2025
Cited by 1 | Viewed by 1088
Abstract
Research on endogenous peptides in Baijiu has primarily focused on finished liquor, with limited attention given to the peptides in base liquor prior to blending. Liquid chromatography–tandem mass spectrometry (LC-MS) was employed to identify endogenous peptides in the distillates from the first to seventh rounds of soy sauce-flavored Baijiu. Two hundred and five oligopeptides were identified from these distillates, all of which had molecular weights below 1000 Da and were composed of amino acid residues associated with flavor (sweet, sour, and bitter) and biological activity. Furthermore, full-wavelength scanning, content determination of the main compounds, and molecular docking were performed to analyze these oligopeptides' effect on the ester–acid–alcohol profile in Baijiu. This analysis revealed a negative correlation between the peptide content and total ester content (r = −0.691), as well as the total acid content (r = −0.323), and a highly significant negative correlation with ethanol content (r = −0.916). Notably, the screened peptides (TRH, YHY, RQTQ, PLDLTSFVLHEAI, KHVS, LPQRHRMVYSLL, and NEWH) had specific interactions with the major flavor substances via hydrogen bonds, including esters (ethyl acetate, ethyl butanoate, ethyl hexanoate, and ethyl lactate), acids (acetic acid, butanoic acid, hexanoic acid, and lactic acid), and alcohols (ethanol, 1-propanol, 1-butanol, and 1-hexanol). These findings elucidate the distribution and dynamic changes of endogenous peptides in the distillates from various rounds of soy sauce-flavored Baijiu, providing a theoretical foundation for further investigation into their interaction mechanisms with flavor compounds. Full article
(This article belongs to the Section Food Physics and (Bio)Chemistry)

21 pages, 11525 KiB  
Article
Detection of Defective Apples Using Learnable Residual Multi-Head Attention Networks Integrated with CNNs
by Dongshu Bao, Xiangyang Liu, Yong Xu, Qun Fang and Xin He
Electronics 2024, 13(24), 4861; https://doi.org/10.3390/electronics13244861 - 10 Dec 2024
Cited by 1 | Viewed by 1158
Abstract
Many traditional fruit vendors still rely on manual sorting to pick out high-quality apples. This process is not only time-consuming but can also damage the apples. Meanwhile, automated detection technology is still in its early stage and lacks full reliability. To improve this technology, we propose a novel method, which incorporates a learnable scaling factor and residual connection to enhance the Multi-Head Attention mechanism. In our approach, a learnable scaling factor is first applied to adjust the attention weights dynamically, and then a residual connection combines the scaled attention output with the original input to preserve essential features from the initial data. By integrating Multi-Head Attention with Convolutional Neural Networks (CNNs) using this method, we propose a lightweight deep learning model called “Learnable Residual Multi-Head Attention Networks Fusion with CNNs” to detect defective apples. Compared to existing models, our proposed model has lower memory usage, shorter training time, and higher detection precision. On the test set, the model achieves an accuracy of 97.5%, a recall of 98%, and a specificity of 97%, along with the lowest detection time of 46 ms. Experimental results show that the proposed model using our method is highly promising for commercial sorting, as it reduces labor costs, increases the supply of high-quality apples, and boosts consumer satisfaction. Full article
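
The learnable scaling factor and residual connection around Multi-Head Attention described above can be sketched as follows; the dimensions and the initial scale value are assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of multi-head attention with a learnable scale and a residual path.
import torch
import torch.nn as nn

class LearnableResidualMHA(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable scalar that dynamically rescales the attention output.
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.mha(x, x, x)
        # Residual connection preserves the original features alongside the scaled attention.
        return x + self.scale * attn_out

# Example: a sequence of 49 patch tokens taken from a CNN feature map, width 256.
# y = LearnableResidualMHA()(torch.randn(8, 49, 256))
```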

16 pages, 10466 KiB  
Article
Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram
by Rujia Chen, Akbar Ghobakhlou and Ajit Narayanan
Appl. Sci. 2024, 14(23), 10837; https://doi.org/10.3390/app142310837 - 22 Nov 2024
Cited by 2 | Viewed by 1366
Abstract
Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial–temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. Here, we propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances the focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
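
The multiple spectrogram views named above (STFT, Log-Mel, MFCC, and the CST features) can all be extracted with standard audio tooling. The sketch below assumes librosa is available; the file path and frame settings are placeholders, and the paper's own scaling and combination step is not reproduced here.

```python
# Minimal sketch of extracting several spectrogram views for one audio clip.
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=22050)   # placeholder path and sampling rate

stft = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
log_mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr), ref=np.max)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
# CST features: Chroma, Spectral contrast, and Tonnetz.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
tonnetz = librosa.feature.tonnetz(y=y, sr=sr)

print(stft.shape, log_mel.shape, mfcc.shape, chroma.shape, contrast.shape, tonnetz.shape)
```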

17 pages, 1265 KiB  
Article
Message Action Adapter Framework in Multi-Agent Reinforcement Learning
by Bumjin Park and Jaesik Choi
Appl. Sci. 2024, 14(21), 10079; https://doi.org/10.3390/app142110079 - 4 Nov 2024
Viewed by 1792
Abstract
Multi-agent reinforcement learning (MARL) has demonstrated significant potential in enabling cooperative agents. The communication protocol, which is responsible for message exchange between agents, is crucial in cooperation. However, communicative MARL systems still face challenges due to the noisy messages in complex multi-agent decision processes. This issue often stems from the entangled representation of observations and messages in policy networks. To address this, we propose the Message Action Adapter Framework (MAAF), which first trains individual agents without message inputs and then adapts a residual action based on message components. This separation isolates the effect of messages on action inference. We explore how training the MAAF framework with model-agnostic message types and varying optimization strategies influences adaptation performance. The experimental results indicate that MAAF achieves competitive performance across multiple baselines despite utilizing only half of the available communication, and shows an average improvement of 7.6% over the full attention-based communication approach. Additional findings suggest that different message types result in significant performance variations, emphasizing the importance of environment-specific message types. We demonstrate how the proposed architecture separates communication channels, effectively isolating message contributions. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
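
The two-stage idea above, a message-free base policy plus a message-conditioned residual action, can be sketched as follows; network sizes and the message format are illustrative assumptions rather than the paper's configuration.

```python
# Hypothetical sketch of a message action adapter with a residual action.
import torch
import torch.nn as nn

class MessageActionAdapter(nn.Module):
    def __init__(self, obs_dim: int, msg_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # Stage 1: message-free policy, trained first on observations alone.
        self.base_policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
        # Stage 2: adapter mapping observation + message to a residual action.
        self.adapter = nn.Sequential(
            nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        base = self.base_policy(obs)
        residual = self.adapter(torch.cat([obs, msg], dim=-1))
        # Final action = base action + message-conditioned correction, so the
        # contribution of communication stays isolated in `residual`.
        return base + residual

# action = MessageActionAdapter(obs_dim=16, msg_dim=8, act_dim=4)(
#     torch.randn(32, 16), torch.randn(32, 8))
```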

16 pages, 1818 KiB  
Article
FFA-BiGRU: Attention-Based Spatial-Temporal Feature Extraction Model for Music Emotion Classification
by Yuping Su, Jie Chen, Ruiting Chai, Xiaojun Wu and Yumei Zhang
Appl. Sci. 2024, 14(16), 6866; https://doi.org/10.3390/app14166866 - 6 Aug 2024
Cited by 2 | Viewed by 1816
Abstract
Music emotion recognition is becoming an important research direction due to its great significance for music information retrieval, music recommendation, and so on. In the task of music emotion recognition, the key to achieving accurate emotion recognition lies in fully extracting the affect-salient features. In this paper, we propose an end-to-end spatial-temporal feature extraction method named FFA-BiGRU for music emotion classification. Taking the log Mel-spectrogram of music audio as the input, this method employs an attention-based convolutional residual module named FFA, which serves as a spatial feature learning module to obtain multi-scale spatial features. In the FFA module, three group architecture blocks extract multi-level spatial features, each of which consists of a stack of multiple channel-spatial attention-based residual blocks. Then, the output features from FFA are fed into the bidirectional gated recurrent units (BiGRU) module to further capture the temporal features of the music. In order to make full use of the extracted spatial and temporal features, the output feature maps of FFA and those of the BiGRU are concatenated along the channel dimension. Finally, the concatenated features are passed through fully connected layers to predict the emotion classification results. The experimental results on the EMOPIA dataset show that the proposed model achieves better classification accuracy than the existing baselines. Meanwhile, the ablation experiments also demonstrate the effectiveness of each part of the proposed method. Full article
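
The pipeline described above, spatial features from the spectrogram, temporal modeling with a BiGRU, then channel-wise concatenation before the classifier, might look roughly like the sketch below; the small convolutional stem is only a stand-in for the FFA module and all layer sizes are assumptions.

```python
# Hypothetical spatial-then-temporal classifier with channel-wise feature fusion.
import torch
import torch.nn as nn

class SpatialTemporalClassifier(nn.Module):
    def __init__(self, hidden: int = 128, num_classes: int = 4):
        super().__init__()
        self.spatial = nn.Sequential(                 # stand-in for the FFA module
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))          # collapse the mel-frequency axis
        self.bigru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(32 + 2 * hidden, num_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, time) log Mel-spectrogram.
        s = self.spatial(spec).squeeze(2).transpose(1, 2)   # (batch, time, 32)
        t, _ = self.bigru(s)                                # (batch, time, 2*hidden)
        feats = torch.cat([s, t], dim=-1).mean(dim=1)       # fuse channels, pool over time
        return self.head(feats)

# logits = SpatialTemporalClassifier()(torch.randn(2, 1, 128, 400))
```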
