Search Results (80)

Search Parameters:
Keywords = residual full attention

27 pages, 13439 KiB  
Article
Swin-ReshoUnet: A Seismic Profile Signal Reconstruction Method Integrating Hierarchical Convolution, ORCA Attention, and Residual Channel Attention Mechanism
by Jie Rao, Mingju Chen, Xiaofei Song, Chen Xie, Xueyang Duan, Xiao Hu, Senyuan Li and Xingyue Zhang
Appl. Sci. 2025, 15(15), 8332; https://doi.org/10.3390/app15158332 - 26 Jul 2025
Viewed by 148
Abstract
This study proposes a Swin-ReshoUnet architecture with a three-level enhancement mechanism to address inefficiencies in multi-scale feature extraction and gradient degradation in deep networks for high-precision seismic exploration. The encoder uses a hierarchical convolution module to build a multi-scale feature pyramid, enhancing cross-scale geological signal representation. The decoder replaces traditional self-attention with ORCA attention to enable global context modeling with lower computational cost. Skip connections integrate a residual channel attention module, mitigating gradient degradation via dual-pooling feature fusion and activation optimization, forming a full-link optimization from low-level feature enhancement to high-level semantic integration. Simulated and real dataset experiments show that at decimation ratios of 0.1–0.5, the method significantly outperforms SwinUnet, TransUnet, etc., in reconstruction performance. Residual signals and F-K spectra verify high-fidelity reconstruction. Despite increased difficulty with higher sparsity, it maintains optimal performance with notable margins, demonstrating strong robustness. The proposed hierarchical feature enhancement and cross-scale attention strategies offer an efficient seismic profile signal reconstruction solution and show generality for migration to complex visual tasks, advancing geophysics-computer vision interdisciplinary innovation. Full article
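
The residual channel attention module with dual-pooling fusion described above could take roughly the following form; the class name, reduction ratio, and gating details are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a residual channel-attention block with dual-pooling
# fusion, in the spirit of the skip-connection module described above.
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both pooled descriptors (dual-pooling fusion).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fuse average- and max-pooled channel statistics, then gate the input.
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        # The residual connection keeps a clean gradient path through the skip link.
        return x + x * attn

# Example: gate a batch of 64-channel feature maps.
# y = ResidualChannelAttention(64)(torch.randn(2, 64, 32, 32))
```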

15 pages, 4874 KiB  
Article
A Novel 3D Convolutional Neural Network-Based Deep Learning Model for Spatiotemporal Feature Mapping for Video Analysis: Feasibility Study for Gastrointestinal Endoscopic Video Classification
by Mrinal Kanti Dhar, Mou Deb, Poonguzhali Elangovan, Keerthy Gopalakrishnan, Divyanshi Sood, Avneet Kaur, Charmy Parikh, Swetha Rapolu, Gianeshwaree Alias Rachna Panjwani, Rabiah Aslam Ansari, Naghmeh Asadimanesh, Shiva Sankari Karuppiah, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
J. Imaging 2025, 11(7), 243; https://doi.org/10.3390/jimaging11070243 - 18 Jul 2025
Viewed by 422
Abstract
Accurate analysis of medical videos remains a major challenge in deep learning (DL) due to the need for effective spatiotemporal feature mapping that captures both spatial detail and temporal dynamics. Despite advances in DL, most existing models in medical AI focus on static images, overlooking critical temporal cues present in video data. To bridge this gap, a novel DL-based framework is proposed for spatiotemporal feature extraction from medical video sequences. As a feasibility use case, this study focuses on gastrointestinal (GI) endoscopic video classification. A 3D convolutional neural network (CNN) is developed to classify upper and lower GI endoscopic videos using the hyperKvasir dataset, which contains 314 lower and 60 upper GI videos. To address data imbalance, 60 matched pairs of videos are randomly selected across 20 experimental runs. Videos are resized to 224 × 224, and the 3D CNN captures spatiotemporal information. A 3D version of the parallel spatial and channel squeeze-and-excitation (P-scSE) is implemented, and a new block called the residual with parallel attention (RPA) block is proposed by combining P-scSE3D with a residual block. To reduce computational complexity, a (2 + 1)D convolution is used in place of full 3D convolution. The model achieves an average accuracy of 0.933, precision of 0.932, recall of 0.944, F1-score of 0.935, and AUC of 0.933. It is also observed that the integration of P-scSE3D increased the F1-score by 7%. This preliminary work opens avenues for exploring various GI endoscopic video-based prospective studies. Full article
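
The (2+1)D convolution mentioned above factorizes a full 3D kernel into a 2D spatial step followed by a 1D temporal step, which reduces parameters while still mixing information across frames. A minimal sketch, with illustrative channel sizes, is shown below.

```python
# Hypothetical sketch of a (2+1)D convolution block; channel sizes and kernel
# shapes are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, mid_ch: int = 64):
        super().__init__()
        # Spatial convolution: kernel (1, 3, 3) over (T, H, W) volumes.
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Temporal convolution: kernel (3, 1, 1) mixes information across frames.
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.temporal(self.act(self.spatial(x)))

# Example: a clip of 16 RGB frames at 224x224 -> input shape (batch, channels, T, H, W).
# y = Conv2Plus1D(3, 64)(torch.randn(1, 3, 16, 224, 224))
```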

20 pages, 7167 KiB  
Article
FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection
by Yongxian Liu, Zaiping Lin, Boyang Li, Ting Liu and Wei An
Remote Sens. 2025, 17(13), 2264; https://doi.org/10.3390/rs17132264 - 1 Jul 2025
Viewed by 364
Abstract
Infrared small target detection (IRSTD) aims to locate and separate targets from complex backgrounds. The challenges in IRSTD primarily come from extremely sparse target features and strong background clutter interference. However, existing methods typically perform discrimination directly on the features extracted by deep networks, neglecting the distinct characteristics of weak and small targets in the frequency domain, thereby limiting the improvement of detection capability. In this paper, we propose a frequency-aware masked-attention network (FM-Net) that leverages multi-scale frequency clues to assist in representing global context and suppressing noise interference. Specifically, we design the wavelet residual block (WRB) to extract multi-scale spatial and frequency features, which introduces a wavelet pyramid as the intermediate layer of the residual block. Then, to perceive global information on the long-range skip connections, a frequency-modulation masked-attention module (FMM) is used to interact with multi-layer features from the encoder. FMM contains two crucial elements: (a) a mask attention (MA) mechanism for injecting broad contextual features efficiently to promote full-level semantic correlation and focus on salient regions, and (b) a channel-wise frequency modulation module (CFM) for enhancing the most informative frequency components and suppressing useless ones. Extensive experiments on three benchmark datasets (SIRST, NUDT-SIRST, and IRSTD-1k) demonstrate that FM-Net achieves superior detection performance. Full article
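
The channel-wise frequency modulation (CFM) idea above can be illustrated by re-weighting per-channel frequency content; the sketch below is an assumption-laden reading of that mechanism, not the authors' code.

```python
# Hypothetical sketch of channel-wise frequency modulation: features are moved
# to the frequency domain, re-weighted per channel, and transformed back.
import torch
import torch.nn as nn

class ChannelFrequencyModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One learnable, input-dependent gain per channel for the frequency content.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = torch.fft.rfft2(x, norm="ortho")      # (B, C, H, W//2+1), complex
        weight = self.gate(x)                         # (B, C, 1, 1) channel weights
        freq = freq * weight                          # emphasize / suppress channels
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
```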

14 pages, 1622 KiB  
Article
Neonicotinoid Residues in Tea Products from China: Contamination Patterns and Implications for Human Exposure
by Yulong Fan, Hongwei Jin, Jinru Chen, Kai Lin, Lihua Zhu, Yijia Guo, Jiajia Ji and Xiaming Chen
Toxics 2025, 13(7), 550; https://doi.org/10.3390/toxics13070550 - 29 Jun 2025
Viewed by 464
Abstract
Neonicotinoids (NEOs) are a class of systemic insecticides widely used in agriculture owing to their high efficacy and selectivity. As one of the most globally consumed beverages, tea may represent a potential dietary source of pesticide residues. However, limited research has examined NEO contamination in tea and its implications for human exposure, highlighting the need for further investigation. Therefore, this study comprehensively evaluated the residue characteristics, processing effects, and human exposure risks of six NEOs—dinotefuran (DIN), imidacloprid (IMI), acetamiprid (ACE), thiamethoxam (THM), clothianidin (CLO), and thiacloprid (THI)—in Chinese tea products. According to the findings, the primary pollutants, ACE, DIN, and IMI, accounted for 95.65% of the total NEO residues in 137 tea samples, including green, oolong, white, black, dark, and herbal teas. The highest total target NEO (∑6NEOs) residue level was detected in oolong tea (mean: 57.86 ng/g). Meanwhile, IMI exhibited the highest residue level (78.88 ng/g) in herbal tea due to the absence of high-temperature fixation procedures. Concentrations of DIN in 61 samples (44.5%) exceeded the European Union’s maximum residue limit of 10 ng/g. Health risk assessment indicated that both the chronic hazard quotient (cHQ) and acute hazard quotient (aHQ) for adults and children were below the safety threshold (<1). However, children required special attention, as their exposure risk was 1.28 times higher than that of adults. The distribution of NEO residues was significantly influenced by tea processing techniques, such as full fermentation in black tea. Optimizing processing methods (e.g., using infrared enzyme deactivation) and implementing targeted pesticide application strategies may help mitigate risk. These results provide a scientific foundation for enhancing tea safety regulations and protecting consumer health. Full article
(This article belongs to the Special Issue Human Biomonitoring in Health Risk Assessment of Emerging Chemicals)
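
The chronic and acute hazard quotients reported above follow the standard ratio of estimated daily intake to an acceptable daily intake. Below is a minimal sketch of that calculation; the consumption, body-weight, and ADI values are placeholders rather than figures from the study, and the study's own exposure model (e.g., accounting for transfer into the brewed tea) may differ.

```python
# Illustrative hazard-quotient calculation; all numeric inputs are placeholders.
def hazard_quotient(residue_ng_per_g: float,
                    daily_intake_g: float,
                    body_weight_kg: float,
                    adi_ug_per_kg_bw: float) -> float:
    """HQ = estimated daily intake / acceptable daily intake; HQ < 1 means acceptable risk."""
    edi_ug_per_kg_bw = residue_ng_per_g * daily_intake_g / 1000.0 / body_weight_kg
    return edi_ug_per_kg_bw / adi_ug_per_kg_bw

# Example with placeholder values: 57.86 ng/g residue, 10 g tea per day, 60 kg adult,
# and a placeholder ADI of 25 ug/kg bw/day; the resulting quotient is far below 1.
print(hazard_quotient(57.86, 10.0, 60.0, 25.0))
```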

28 pages, 4199 KiB  
Article
Dose Reduction in Scintigraphic Imaging Through Enhanced Convolutional Autoencoder-Based Denoising
by Nikolaos Bouzianis, Ioannis Stathopoulos, Pipitsa Valsamaki, Efthymia Rapti, Ekaterini Trikopani, Vasiliki Apostolidou, Athanasia Kotini, Athanasios Zissimopoulos, Adam Adamopoulos and Efstratios Karavasilis
J. Imaging 2025, 11(6), 197; https://doi.org/10.3390/jimaging11060197 - 14 Jun 2025
Viewed by 551
Abstract
Objective: This study proposes a novel deep learning approach for enhancing low-dose bone scintigraphy images using an Enhanced Convolutional Autoencoder (ECAE), aiming to reduce patient radiation exposure while preserving diagnostic quality, as assessed by both expert-based quantitative image metrics and qualitative evaluation. Methods: A supervised learning framework was developed using real-world paired low- and full-dose images from 105 patients. Data were acquired using standard clinical gamma cameras at the Nuclear Medicine Department of the University General Hospital of Alexandroupolis. The ECAE architecture integrates multiscale feature extraction, channel attention mechanisms, and efficient residual blocks to reconstruct high-quality images from low-dose inputs. The model was trained and validated using quantitative metrics—Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM)—alongside qualitative assessments by nuclear medicine experts. Results: The model achieved significant improvements in both PSNR and SSIM across all tested dose levels, particularly between 30% and 70% of the full dose. Expert evaluation confirmed enhanced visibility of anatomical structures, noise reduction, and preservation of diagnostic detail in denoised images. In blinded evaluations, denoised images were preferred over the original full-dose scans in 66% of all cases, and in 61% of cases within the 30–70% dose range. Conclusion: The proposed ECAE model effectively reconstructs high-quality bone scintigraphy images from substantially reduced-dose acquisitions. This approach supports dose reduction in nuclear medicine imaging while maintaining—or even enhancing—diagnostic confidence, offering practical benefits in patient safety, workflow efficiency, and environmental impact. Full article
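
PSNR and SSIM, the quantitative metrics used above, can be computed as in the sketch below (assuming scikit-image is available); the arrays are synthetic placeholders, not study data.

```python
# Minimal sketch of comparing a denoised low-dose image against a full-dose reference.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

full_dose = np.random.rand(256, 256).astype(np.float32)                  # reference image (placeholder)
denoised = full_dose + 0.01 * np.random.randn(256, 256).astype(np.float32)  # model output (placeholder)

psnr = peak_signal_noise_ratio(full_dose, denoised, data_range=1.0)
ssim = structural_similarity(full_dose, denoised, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```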

23 pages, 13542 KiB  
Article
A Lightweight Neural Network for Denoising Wrapped-Phase Images Generated with Full-Field Optical Interferometry
by Muhammad Awais, Younggue Kim, Taeil Yoon, Wonshik Choi and Byeongha Lee
Appl. Sci. 2025, 15(10), 5514; https://doi.org/10.3390/app15105514 - 14 May 2025
Viewed by 535
Abstract
Phase wrapping is a common phenomenon in optical full-field imaging or measurement systems. It arises from large phase retardations and results in wrapped-phase maps that contain essential information about surface roughness and topology. However, these maps are often degraded by noise, such as speckle and Gaussian noise, which reduces the measurement accuracy and complicates phase reconstruction. Denoising such data is a fundamental problem in computer vision and plays a critical role in biomedical imaging modalities like Full-Field Optical Interferometry. In this paper, we propose WPD-Net (Wrapped-Phase Denoising Network), a lightweight deep learning-based neural network specifically designed to restore phase images corrupted by high noise levels. The network architecture integrates a shallow feature extraction module, a series of Residual Dense Attention Blocks (RDABs), and a dense feature fusion module. The RDABs incorporate attention mechanisms that help the network focus on critical features and suppress irrelevant noise, especially in high-frequency or complex regions. Additionally, WPD-Net employs a growth-rate-based feature expansion strategy to enhance multi-scale feature representation and improve phase continuity. We evaluate the model's performance on both synthetic and experimentally acquired datasets and compare it with other state-of-the-art deep learning-based denoising methods. The results demonstrate that WPD-Net achieves superior noise suppression while preserving fine structural details, even under mixed speckle and Gaussian noise. The proposed method is expected to enable fast image processing, allowing unwrapped biomedical images to be retrieved in real time. Full article
(This article belongs to the Special Issue Computer-Vision-Based Biomedical Image Processing)
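
The growth-rate-based feature expansion and residual dense attention blocks mentioned above broadly suggest a dense block whose layers each contribute a fixed number of new channels, fused back through an attention-weighted residual. A sketch under those assumptions (layer count, growth rate, and gate are illustrative, not the authors' design):

```python
# Hypothetical residual dense block with a growth-rate-based channel expansion.
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels: int = 64, growth_rate: int = 32, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = channels
        for _ in range(num_layers):
            # Each layer sees all previous feature maps and adds `growth_rate` new ones.
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth_rate, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch += growth_rate
        # 1x1 fusion back to the input width, plus a simple channel gate.
        self.fuse = nn.Conv2d(ch, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = x
        for layer in self.layers:
            feats = torch.cat([feats, layer(feats)], dim=1)
        fused = self.fuse(feats)
        return x + fused * self.gate(fused)   # attention-weighted residual
```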

23 pages, 2325 KiB  
Article
Downhole Coal–Rock Recognition Based on Joint Migration and Enhanced Multidimensional Full-Scale Visual Features
by Bin Jiao, Chuanmeng Sun, Sichao Qin, Wenbo Wang, Yu Wang, Zhibo Wu, Yong Li and Dawei Shen
Appl. Sci. 2025, 15(10), 5411; https://doi.org/10.3390/app15105411 - 12 May 2025
Viewed by 344
Abstract
The accurate identification of coal and rock at the mining face is often hindered by adverse underground imaging conditions, including poor lighting and strong reflectivity. To tackle these issues, this work introduces a recognition framework specifically designed for underground environments, leveraging joint migration and enhancement of multidimensional and full-scale visual representations. A Transformer-based architecture is employed to capture global dependencies within the image and perform reflectance component denoising. Additionally, a multi-scale luminance adjustment module is integrated to merge features across perceptual ranges, mitigating localized brightness anomalies such as overexposure. The model is structured around an encoder–decoder backbone, enhanced by a full-scale connectivity mechanism, a residual attention block with dilated convolution, Res2Block elements, and a composite loss function. These components collectively support precise pixel-level segmentation of coal–rock imagery. Experimental evaluations reveal that the proposed luminance module achieves a PSNR of 21.288 and an SSIM of 0.783, outperforming standard enhancement methods like RetinexNet and RRDNet. The segmentation framework achieves a MIoU of 97.99% and an MPA of 99.28%, surpassing U-Net by 2.21 and 1.53 percentage points, respectively. Full article
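
MIoU and MPA, the segmentation metrics quoted above, reduce to simple ratios over a confusion matrix. The sketch below shows one way to compute them for a two-class (coal vs. rock) mask; the predictions are random placeholders.

```python
# Minimal MIoU / MPA computation from a confusion matrix.
import numpy as np

def miou_mpa(pred: np.ndarray, target: np.ndarray, num_classes: int = 2):
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1                                   # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(np.float64)
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)   # per-class intersection over union
    pa = tp / cm.sum(axis=1)                            # per-class pixel accuracy
    return iou.mean(), pa.mean()                        # MIoU, MPA

pred = np.random.randint(0, 2, (64, 64))                # placeholder prediction mask
target = np.random.randint(0, 2, (64, 64))              # placeholder ground-truth mask
print(miou_mpa(pred, target))
```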

22 pages, 1768 KiB  
Article
A Novel Integrated Biorefinery for the Valorization of Residual Cardoon Biomass: Overview of Technologies and Process Simulation
by Vittoria Fatta, Aristide Giuliano, Maria Teresa Petrone, Francesco Nanna, Antonio Villone, Donatella Barisano, Roberto Albergo, Federico Liuzzi, Diego Barletta and Isabella De Bari
Energies 2025, 18(4), 973; https://doi.org/10.3390/en18040973 - 18 Feb 2025
Viewed by 691
Abstract
Lignocellulosic biomass is currently widely used in many biorefining processes. The full exploitation of biomass from uncultivated or even marginal lands for the production of biobased chemicals has attracted considerable attention in recent years. Among the sustainable biomass-based value chains, cardoon crops could be a feedstock for biorefineries, as they can grow on marginal lands and be used as raw material for multipurpose exploitation, including seeds, roots, and the epigeous lignocellulosic solid residue. This work focused on the technical analysis of a novel integrated flowsheet for the exploitation of the lignocellulosic fraction through the assessment of thermochemical, biochemical, and extractive technologies and processes. In particular, high-yield thermochemical processes (gasification), innovative biotechnological processes (syngas fermentation to ethanol), and extractive/catalyzed processes for the valorization of cardoon roots to 2,5-furandicarboxylic acid (FDCA) and residual solid biomass were modeled and simulated. Inulin conversion to FDCA was the main conversion route taken into consideration. Finally, the novel process flowsheet, treating 130,000 t/y of residual biomass and integrating all proposed technologies, was modeled and assessed using process simulation tools to obtain overall mass and energy balances for comparison with alternative options. The results indicated that cardoon biorefining through the proposed flowsheet can produce, per 1000 tons of input dry biomass, 211 kg of FDCA and 140 kg of ethanol through biomass gasification followed by syngas fermentation. Furthermore, a pre-feasibility analysis was conducted, revealing significant and potentially disruptive results in terms of environmental impact (with 40 ktCO2eq saved) and economic feasibility (with an annual gross profit of EUR 30 M/y). Full article
(This article belongs to the Section A4: Bio-Energy)

20 pages, 3955 KiB  
Article
Deep Learning Extraction of Tidal Creeks in the Yellow River Delta Using GF-2 Imagery
by Bojie Chen, Qianran Zhang, Na Yang, Xiukun Wang, Xiaobo Zhang, Yilan Chen and Shengli Wang
Remote Sens. 2025, 17(4), 676; https://doi.org/10.3390/rs17040676 - 16 Feb 2025
Viewed by 947
Abstract
Tidal creeks are vital geomorphological features of tidal flats, and their spatial and temporal variations contribute significantly to the preservation of ecological diversity and the spatial evolution of coastal wetlands. Traditional methods, such as manual annotation and machine learning, remain common for tidal creek extraction, but they are slow and inefficient. With increasing data volumes, accurately analyzing tidal creeks over large spatial and temporal scales has become a significant challenge. This study proposes a residual U-Net model that utilizes full-dimensional dynamic convolution to segment tidal creeks in the Yellow River Delta, employing Gaofen-2 satellite images with a resolution of 4 m. The model replaces the traditional convolutions in the residual blocks of the encoder with Omni-dimensional Dynamic Convolution (ODConv), mitigating the loss of fine details and improving segmentation for small targets. Adding coordinate attention (CA) to the Atrous Spatial Pyramid Pooling (ASPP) module improves target classification and localization in remote sensing images. Furthermore, including the dice coefficient in the focal loss function improves the model's gradients and tackles class imbalance within the dataset. The study results indicate that the model attains an F1 score and kappa coefficient exceeding 80% for both mud and salt marsh regions. Comparisons with several semantic segmentation models on the mud marsh tidal creek dataset show that ODU-Net significantly enhances tidal creek segmentation, resolves class imbalance issues, and delivers superior extraction accuracy and stability. Full article
(This article belongs to the Special Issue Remote Sensing of Coastal, Wetland, and Intertidal Zones)
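
Folding the dice coefficient into the focal loss, as described above, can be sketched as a weighted sum of the two terms; the hyperparameters below are illustrative assumptions, not the paper's settings.

```python
# Hypothetical combined focal + Dice loss for binary tidal-creek segmentation.
import torch
import torch.nn.functional as F

def dice_focal_loss(logits: torch.Tensor, target: torch.Tensor,
                    alpha: float = 0.25, gamma: float = 2.0,
                    dice_weight: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    prob = torch.sigmoid(logits)
    # Focal term: down-weights easy pixels so training focuses on hard ones.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (alpha_t * (1 - p_t) ** gamma * bce).mean()
    # Dice term: directly optimizes region overlap, countering class imbalance.
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)
    return focal + dice_weight * dice

# Example: logits and binary masks of shape (batch, 1, H, W).
# loss = dice_focal_loss(torch.randn(2, 1, 128, 128),
#                        torch.randint(0, 2, (2, 1, 128, 128)).float())
```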

26 pages, 5924 KiB  
Article
Hyperspectral Image Classification Based on Hybrid Depth-Wise Separable Convolution and Dual-Branch Feature Fusion Network
by Hualin Dai, Yingli Yue and Qi Liu
Appl. Sci. 2025, 15(3), 1394; https://doi.org/10.3390/app15031394 - 29 Jan 2025
Viewed by 886
Abstract
Recently, advancements in convolutional neural networks (CNNs) have significantly advanced hyperspectral image (HSI) classification. However, the problem of limited training samples is the primary obstacle to obtaining further improvements in HSI classification. The traditional methods relying solely on 2D-CNN for feature extraction underutilize the inter-band correlations of HSI, while the methods based on 3D-CNN alone for feature extraction lead to an increase in training parameters. To solve the above problems, we propose an HSI classification network based on hybrid depth-wise separable convolution and dual-branch feature fusion (HDCDF). The dual-branch structure is designed in HDCDF to simultaneously extract integrated spectral–spatial features and obtain complementary features via feature fusion. The proposed modules, a 2D depth-wise separable convolution attention (2D-DCAttention) block and hybrid residual blocks, are applied to the two branches, respectively, further extracting more representative and comprehensive features. Instead of full 3D convolutions, HDCDF uses hybrid 2D–3D depth-wise separable convolutions, offering computational efficiency. Experiments are conducted on three benchmark HSI datasets: Indian Pines, University of Pavia, and Salinas Valley. The experimental results show that the proposed method achieves superior performance when training samples are extremely limited, outperforming the state-of-the-art method by an average of 2.03% in overall accuracy across the three datasets, which indicates that HDCDF holds promise for HSI classification. Full article
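
A depth-wise separable convolution, as used above in place of full 3D convolutions, factors a standard convolution into per-channel spatial filtering followed by 1×1 channel mixing. A minimal 2D sketch with illustrative channel sizes:

```python
# Minimal 2D depth-wise separable convolution.
import torch
import torch.nn as nn

class DepthwiseSeparableConv2d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # groups=in_ch filters each channel independently (depth-wise step).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # 1x1 convolution mixes information across channels (point-wise step).
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Uses roughly 1/out_ch + 1/k^2 of the weights of a standard k x k convolution.
# y = DepthwiseSeparableConv2d(32, 64)(torch.randn(1, 32, 28, 28))
```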

15 pages, 4496 KiB  
Article
Identification of Oligopeptides in the Distillates from Various Rounds of Soy Sauce-Flavored Baijiu and Their Effect on the Ester–Acid–Alcohol Profile in Baijiu
by Qiang Wu, Shanlin Tian, Xu Zhang, Yunhao Zhao and Yougui Yu
Foods 2025, 14(2), 287; https://doi.org/10.3390/foods14020287 - 16 Jan 2025
Cited by 1 | Viewed by 1088
Abstract
Research on endogenous peptides in Baijiu has primarily focused on finished liquor, with limited attention given to the peptides in base liquor prior to blending. Liquid chromatography–tandem mass spectrometry (LC-MS) was employed to identify endogenous peptides in the distillates from the first to seventh rounds of soy sauce-flavored Baijiu. Two hundred and five oligopeptides were identified from these distillates, all of which had molecular weights below 1000 Da and were composed of amino acid residues associated with flavor (sweet, sour, and bitter) and biological activity. Furthermore, full-wavelength scanning, content determination of the main compounds, and molecular docking were performed to analyze these oligopeptides' effect on the ester–acid–alcohol profile in Baijiu. This analysis revealed a negative correlation between the peptide content and total ester content (r = −0.691), as well as the total acid content (r = −0.323), and a highly significant negative correlation with ethanol content (r = −0.916). Notably, the screened peptides (TRH, YHY, RQTQ, PLDLTSFVLHEAI, KHVS, LPQRHRMVYSLL, and NEWH) had specific interactions with the major flavor substances via hydrogen bonds, including esters (ethyl acetate, ethyl butanoate, ethyl hexanoate, and ethyl lactate), acids (acetic acid, butanoic acid, hexanoic acid, and lactic acid), and alcohols (ethanol, 1-propanol, 1-butanol, and 1-hexanol). These findings elucidate the distribution and dynamic changes of endogenous peptides in the distillates from various rounds of soy sauce-flavored Baijiu, providing a theoretical foundation for further investigation into their interaction mechanisms with flavor compounds. Full article
(This article belongs to the Section Food Physics and (Bio)Chemistry)

21 pages, 11525 KiB  
Article
Detection of Defective Apples Using Learnable Residual Multi-Head Attention Networks Integrated with CNNs
by Dongshu Bao, Xiangyang Liu, Yong Xu, Qun Fang and Xin He
Electronics 2024, 13(24), 4861; https://doi.org/10.3390/electronics13244861 - 10 Dec 2024
Cited by 1 | Viewed by 1158
Abstract
Many traditional fruit vendors still rely on manual sorting to pick out high-quality apples. This process is not only time-consuming but can also damage the apples. Meanwhile, automated detection technology is still in its early stage and lacks full reliability. To improve this technology, we propose a novel method, which incorporates a learnable scaling factor and residual connection to enhance the Multi-Head Attention mechanism. In our approach, a learnable scaling factor is first applied to adjust the attention weights dynamically, and then a residual connection combines the scaled attention output with the original input to preserve essential features from the initial data. By integrating Multi-Head Attention with Convolutional Neural Networks (CNNs) using this method, we propose a lightweight deep learning model called “Learnable Residual Multi-Head Attention Networks Fusion with CNNs” to detect defective apples. Compared to existing models, our proposed model has lower memory usage, shorter training time, and higher detection precision. On the test set, the model achieves an accuracy of 97.5%, a recall of 98%, and a specificity of 97%, along with the lowest detection time of 46 ms. Experimental results show that the proposed model using our method is highly promising for commercial sorting, as it reduces labor costs, increases the supply of high-quality apples, and boosts consumer satisfaction. Full article
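
The learnable scaling factor and residual connection around Multi-Head Attention described above can be sketched as follows; the dimensions and the initial scale value are assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of multi-head attention with a learnable scale and a residual path.
import torch
import torch.nn as nn

class LearnableResidualMHA(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable scalar that dynamically rescales the attention output.
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.mha(x, x, x)
        # Residual connection preserves the original features alongside the scaled attention.
        return x + self.scale * attn_out

# Example: a sequence of 49 patch tokens taken from a CNN feature map, width 256.
# y = LearnableResidualMHA()(torch.randn(8, 49, 256))
```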

16 pages, 10466 KiB  
Article
Hierarchical Residual Attention Network for Musical Instrument Recognition Using Scaled Multi-Spectrogram
by Rujia Chen, Akbar Ghobakhlou and Ajit Narayanan
Appl. Sci. 2024, 14(23), 10837; https://doi.org/10.3390/app142310837 - 22 Nov 2024
Cited by 2 | Viewed by 1366
Abstract
Musical instrument recognition is a relatively unexplored area of machine learning due to the need to analyze complex spatial–temporal audio features. Traditional methods using individual spectrograms, like STFT, Log-Mel, and MFCC, often miss the full range of features. Here, we propose a hierarchical residual attention network using a scaled combination of multiple spectrograms, including STFT, Log-Mel, MFCC, and CST features (Chroma, Spectral contrast, and Tonnetz), to create a comprehensive sound representation. This model enhances the focus on relevant spectrogram parts through attention mechanisms. Experimental results with the OpenMIC-2018 dataset show significant improvement in classification accuracy, especially with the “Magnified 1/4 Size” configuration. Future work will optimize CST feature scaling, explore advanced attention mechanisms, and apply the model to other audio tasks to assess its generalizability. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
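
The multiple spectrogram views named above (STFT, Log-Mel, MFCC, and the CST features) can all be extracted with standard audio tooling. The sketch below assumes librosa is available; the file path and frame settings are placeholders, and the paper's own scaling and combination step is not reproduced here.

```python
# Minimal sketch of extracting several spectrogram views for one audio clip.
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=22050)   # placeholder path and sampling rate

stft = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
log_mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr), ref=np.max)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
# CST features: Chroma, Spectral contrast, and Tonnetz.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
tonnetz = librosa.feature.tonnetz(y=y, sr=sr)

print(stft.shape, log_mel.shape, mfcc.shape, chroma.shape, contrast.shape, tonnetz.shape)
```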

17 pages, 1265 KiB  
Article
Message Action Adapter Framework in Multi-Agent Reinforcement Learning
by Bumjin Park and Jaesik Choi
Appl. Sci. 2024, 14(21), 10079; https://doi.org/10.3390/app142110079 - 4 Nov 2024
Viewed by 1792
Abstract
Multi-agent reinforcement learning (MARL) has demonstrated significant potential in enabling cooperative agents. The communication protocol, which is responsible for message exchange between agents, is crucial in cooperation. However, communicative MARL systems still face challenges due to the noisy messages in complex multi-agent decision processes. This issue often stems from the entangled representation of observations and messages in policy networks. To address this, we propose the Message Action Adapter Framework (MAAF), which first trains individual agents without message inputs and then adapts a residual action based on message components. This separation isolates the effect of messages on action inference. We explore how training the MAAF framework with model-agnostic message types and varying optimization strategies influences adaptation performance. The experimental results indicate that MAAF achieves competitive performance across multiple baselines despite utilizing only half of the available communication, and shows an average improvement of 7.6% over the full attention-based communication approach. Additional findings suggest that different message types result in significant performance variations, emphasizing the importance of environment-specific message types. We demonstrate how the proposed architecture separates communication channels, effectively isolating message contributions. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
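
The two-stage idea above, a message-free base policy plus a message-conditioned residual action, can be sketched as follows; network sizes and the message format are illustrative assumptions rather than the paper's configuration.

```python
# Hypothetical sketch of a message action adapter with a residual action.
import torch
import torch.nn as nn

class MessageActionAdapter(nn.Module):
    def __init__(self, obs_dim: int, msg_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # Stage 1: message-free policy, trained first on observations alone.
        self.base_policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
        # Stage 2: adapter mapping observation + message to a residual action.
        self.adapter = nn.Sequential(
            nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))

    def forward(self, obs: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        base = self.base_policy(obs)
        residual = self.adapter(torch.cat([obs, msg], dim=-1))
        # Final action = base action + message-conditioned correction, so the
        # contribution of communication stays isolated in `residual`.
        return base + residual

# action = MessageActionAdapter(obs_dim=16, msg_dim=8, act_dim=4)(
#     torch.randn(32, 16), torch.randn(32, 8))
```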

16 pages, 1818 KiB  
Article
FFA-BiGRU: Attention-Based Spatial-Temporal Feature Extraction Model for Music Emotion Classification
by Yuping Su, Jie Chen, Ruiting Chai, Xiaojun Wu and Yumei Zhang
Appl. Sci. 2024, 14(16), 6866; https://doi.org/10.3390/app14166866 - 6 Aug 2024
Cited by 2 | Viewed by 1816
Abstract
Music emotion recognition is becoming an important research direction due to its great significance for music information retrieval, music recommendation, and so on. In the task of music emotion recognition, the key to achieving accurate emotion recognition lies in fully extracting the affect-salient features. In this paper, we propose an end-to-end spatial-temporal feature extraction method named FFA-BiGRU for music emotion classification. Taking the log Mel-spectrogram of music audio as the input, this method employs an attention-based convolutional residual module named FFA, which serves as a spatial feature learning module to obtain multi-scale spatial features. In the FFA module, three group architecture blocks extract multi-level spatial features, each of which consists of a stack of multiple channel-spatial attention-based residual blocks. Then, the output features from FFA are fed into the bidirectional gated recurrent units (BiGRU) module to further capture the temporal features of the music. In order to make full use of the extracted spatial and temporal features, the output feature maps of FFA and those of the BiGRU are concatenated along the channel dimension. Finally, the concatenated features are passed through fully connected layers to predict the emotion classification results. The experimental results on the EMOPIA dataset show that the proposed model achieves better classification accuracy than the existing baselines. Meanwhile, the ablation experiments also demonstrate the effectiveness of each part of the proposed method. Full article
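
The pipeline described above, spatial features from the spectrogram, temporal modeling with a BiGRU, then channel-wise concatenation before the classifier, might look roughly like the sketch below; the small convolutional stem is only a stand-in for the FFA module and all layer sizes are assumptions.

```python
# Hypothetical spatial-then-temporal classifier with channel-wise feature fusion.
import torch
import torch.nn as nn

class SpatialTemporalClassifier(nn.Module):
    def __init__(self, hidden: int = 128, num_classes: int = 4):
        super().__init__()
        self.spatial = nn.Sequential(                 # stand-in for the FFA module
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))          # collapse the mel-frequency axis
        self.bigru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(32 + 2 * hidden, num_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, time) log Mel-spectrogram.
        s = self.spatial(spec).squeeze(2).transpose(1, 2)   # (batch, time, 32)
        t, _ = self.bigru(s)                                # (batch, time, 2*hidden)
        feats = torch.cat([s, t], dim=-1).mean(dim=1)       # fuse channels, pool over time
        return self.head(feats)

# logits = SpatialTemporalClassifier()(torch.randn(2, 1, 128, 400))
```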
