Search Results (74)

Search Parameters:
Keywords = data augmentation (DA)

21 pages, 734 KB  
Article
Hybrid Deep Learning Model for EI-MS Spectra Prediction
by Bartosz Majewski and Marta Łabuda
Int. J. Mol. Sci. 2026, 27(3), 1588; https://doi.org/10.3390/ijms27031588 - 5 Feb 2026
Viewed by 414
Abstract
Electron ionization (EI) mass spectrometry (MS) is a widely used technique for compound identification and the acquisition of reference spectra. However, incomplete coverage of reference spectral libraries limits reliable analysis of newly characterized molecules. This study presents a hybrid deep learning model for predicting EI-MS spectra directly from molecular structure. The approach combines a graph neural network (GNN) encoder with a residual neural network (ResNet) decoder, followed by refinement using cross-attention, bidirectional prediction, and probabilistic, chemistry-informed masks. Trained on the NIST14 EI-MS database (≤500 Da), the model achieves strong library-matching performance (Recall@10 ≈ 80.8%) and high spectral similarity. The proposed hybrid GNN-ResNet model can generate high-quality synthetic EI-MS spectra to supplement existing libraries, potentially reducing the cost and effort of experimental spectrum acquisition. The results demonstrate the potential of data-driven models to augment EI-MS libraries, while highlighting remaining challenges in generalization and spectral uniqueness.
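The library-matching evaluation quoted above (Recall@10) reduces to ranking reference spectra by similarity to a predicted spectrum. A minimal sketch, assuming unit-m/z binned intensity vectors and cosine similarity — the abstract does not specify its similarity measure, and all data below are synthetic:

```python
import numpy as np

def cosine_match(query: np.ndarray, library: np.ndarray) -> np.ndarray:
    """Rank library spectra by cosine similarity to a query spectrum.
    query:   (n_bins,) binned intensity vector (e.g. unit m/z bins)
    library: (n_spectra, n_bins) reference spectra
    Returns library indices sorted from best to worst match."""
    q = query / (np.linalg.norm(query) + 1e-12)
    lib = library / (np.linalg.norm(library, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(lib @ q))               # best match first

def recall_at_k(ranking: np.ndarray, true_idx: int, k: int = 10) -> bool:
    """True if the correct entry appears among the top-k candidates."""
    return true_idx in ranking[:k]

rng = np.random.default_rng(0)
library = rng.random((100, 500))                # 100 synthetic spectra, 500 bins
query = library[42] + 0.05 * rng.random(500)    # noisy copy of entry 42
ranking = cosine_match(query, library)
print(ranking[0], recall_at_k(ranking, 42))     # → 42 True
```

Recall@10 over a test set is then just the fraction of queries for which `recall_at_k` returns `True`.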

34 pages, 3680 KB  
Article
A Semi-Supervised Transformer with a Curriculum Training Pipeline for Remote Sensing Image Semantic Segmentation
by Peizhuo Liu, Hongbo Zhu, Xiaofei Mi, Yuke Meng, Huijie Zhao and Xingfa Gu
Remote Sens. 2026, 18(3), 480; https://doi.org/10.3390/rs18030480 - 2 Feb 2026
Viewed by 325
Abstract
Semantic segmentation of remote sensing images is crucial for geospatial applications but is severely hampered by the prohibitive cost of pixel-level annotations. Although semi-supervised learning (SSL) offers a solution by leveraging unlabeled data, its application to Vision Transformers (ViTs) often encounters overfitting and even training instability under extreme label scarcity. To tackle these challenges, we propose a Curriculum-based Self-supervised and Semi-supervised Pipeline (CSSP). The pipeline adopts a staged, easy-to-hard training strategy, commencing with in-domain pretraining for robust feature representation, followed by a carefully designed finetuning stage to prevent overfitting. The pipeline further integrates a novel Difficulty-Adaptive ClassMix (DA-ClassMix) augmentation that dynamically reinforces underperforming categories and a Progressive Intensity Adaptation (PIA) strategy that systematically escalates augmentation strength to maximize model generalization. Extensive evaluations on the Potsdam, Vaihingen, and Inria datasets demonstrate state-of-the-art performance. Notably, with only 1/32 of the labeled data on the Potsdam dataset, the CSSP reaches 82.16% mIoU, nearly matching the fully supervised result (82.24%). Furthermore, we extend the CSSP to a semi-supervised domain adaptation (SSDA) scenario, termed Cross-Domain CSSP (CDCSSP), which outperforms existing SSDA and unsupervised domain adaptation (UDA) methods. This work establishes a stable and highly effective framework for training ViT-based segmentation models with minimal annotation overhead.
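ClassMix-style augmentation, which DA-ClassMix builds on, pastes the pixels of selected classes from one labeled sample onto another. A minimal sketch, assuming six classes and a hand-set per-class weight vector as a stand-in for the paper's difficulty-adaptive weighting:

```python
import numpy as np

def classmix(img_a, lbl_a, img_b, lbl_b, class_weights, rng, n_classes=6):
    """Paste pixels of randomly chosen classes from sample A onto sample B.
    class_weights: per-class sampling probabilities -- a stand-in for the
    paper's difficulty-adaptive weighting (higher weight = pasted more often)."""
    p = class_weights / class_weights.sum()
    # sample half the classes, biased toward "hard" (high-weight) ones
    chosen = rng.choice(n_classes, size=n_classes // 2, replace=False, p=p)
    mask = np.isin(lbl_a, chosen)                    # pixels of chosen classes in A
    img = np.where(mask[..., None], img_a, img_b)    # paste A's pixels over B
    lbl = np.where(mask, lbl_a, lbl_b)
    return img, lbl

rng = np.random.default_rng(1)
img_a, img_b = rng.random((64, 64, 3)), rng.random((64, 64, 3))
lbl_a = rng.integers(0, 6, (64, 64))
lbl_b = rng.integers(0, 6, (64, 64))
weights = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])   # classes 3-5 treated as "hard"
img, lbl = classmix(img_a, lbl_a, img_b, lbl_b, weights, rng)
```

In a real SSL pipeline the pasted labels on unlabeled images would come from pseudo-labels rather than ground truth.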

24 pages, 4209 KB  
Article
Stability-Oriented Deep Learning for Hyperspectral Soil Organic Matter Estimation
by Yun Deng and Yuxi Shi
Sensors 2026, 26(2), 741; https://doi.org/10.3390/s26020741 - 22 Jan 2026
Viewed by 194
Abstract
Soil organic matter (SOM) is a key indicator for evaluating soil fertility and ecological functions, and hyperspectral technology provides an effective means for its rapid and non-destructive estimation. However, in practical soil systems, the spectral response of SOM is often highly covariant with mineral composition, moisture conditions, and soil structural characteristics. Under small-sample conditions, hyperspectral SOM modeling results are usually highly sensitive to spectral preprocessing methods, sample perturbations, and model architecture and parameter configurations, leading to fluctuations in predictive performance across independent runs and thereby limiting model stability and practical applicability. To address these issues, this study proposes a multi-strategy collaborative deep learning modeling framework for small-sample conditions (SE-EDCNN-DA-LWGPSO). Under unified data partitioning and evaluation settings, the framework integrates spectral preprocessing, data augmentation based on sensor perturbation simulation, multi-scale dilated convolution feature extraction, an SE channel attention mechanism, and a linearly weighted generalized particle swarm optimization algorithm. Subtropical red soil samples from Guangxi served as the study material. Samples were partitioned using the SPXY method, and multiple independent repeated experiments were conducted to evaluate the predictive performance and training consistency of the model under fixed validation conditions. The results indicate that the combination of Savitzky–Golay filtering and first-derivative transformation (SG–1DR) exhibits superior overall stability among various preprocessing schemes. In model structure comparison and ablation analysis, as dilated convolution, data augmentation, and channel attention mechanisms were progressively introduced, the fluctuations of prediction errors on the validation set gradually converged, and the performance dispersion among different independent runs was significantly reduced. Under ten independent repeated experiments, the final model achieved R² = 0.938 ± 0.010, RMSE = 2.256 ± 0.176 g·kg⁻¹, and RPD = 4.050 ± 0.305 on the validation set, demonstrating that the proposed framework has good modeling consistency and numerical stability under small-sample conditions.
(This article belongs to the Section Environmental Sensing)
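The SG–1DR preprocessing named above (Savitzky–Golay smoothing combined with a first-derivative transform) is available directly in SciPy. A minimal sketch on a synthetic spectrum; the window length and polynomial order are illustrative choices, not the paper's settings:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic reflectance spectrum: a smooth absorption feature plus sensor noise
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 2400, 501)    # nm
spectrum = np.exp(-((wavelengths - 1400) / 300) ** 2) \
           + 0.01 * rng.standard_normal(501)

# SG-1DR: Savitzky-Golay smoothing with the first derivative in one pass.
sg_1dr = savgol_filter(spectrum, window_length=11, polyorder=2, deriv=1)

print(sg_1dr.shape)   # → (501,)
```

Taking the derivative inside the Savitzky–Golay fit (rather than differencing the raw spectrum) suppresses the noise amplification that plain numerical differentiation would cause.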

21 pages, 1089 KB  
Article
Data Augmentation and Time–Frequency Joint Attention for Underwater Acoustic Communication Modulation Classification
by Mingyu Cao, Qi Chen, Jinsong Tang and Haoran Wu
J. Mar. Sci. Eng. 2026, 14(2), 172; https://doi.org/10.3390/jmse14020172 - 13 Jan 2026
Viewed by 246
Abstract
This paper presents a modulation signal classification and recognition algorithm based on data augmentation and time–frequency joint attention (DA-TFJA) for underwater acoustic (UWA) communication systems. UWA communication, as an important means of marine information transmission, plays a key role in fields such as marine engineering, military reconnaissance, and marine science research. Accurate recognition of modulated signals is a core technology for ensuring the reliability of UWA communication systems. Traditional classification and recognition methods, mostly based on pure neural network algorithms, suffer from insufficient feature representation and limited generalization performance in complex and changing UWA channel environments. They also struggle to address complex factors such as multipath, Doppler shift, and noise interference, often resulting in scarce effective training samples and inadequate classification accuracy. To overcome these limitations, the proposed DA-TFJA algorithm simulates the characteristics of real UWA channels through two novel data augmentation strategies: the adaptive time–frequency transform enhancement algorithm (ATFT) and the dynamic path superposition enhancement algorithm (DPSE). An end-to-end recognition network is developed that integrates a multiscale time–frequency feature extractor (MTFE), two-layer long short-term memory (LSTM) temporal modeling, and a time–frequency joint attention mechanism (TFAM). This comprehensive architecture achieves high-precision recognition of six modulation types: 2FSK, 4FSK, BPSK, QPSK, DSSS, and OFDM. Experimental results demonstrate that compared with existing advanced methods, DA-TFJA achieves a classification accuracy of 98.36% on the measured reservoir dataset, representing an improvement of 3.09 percentage points, which fully verifies the effectiveness and practical value of the proposed approach.
(This article belongs to the Section Ocean Engineering)
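The channel effect that the DPSE strategy simulates — multipath superposition — can be mimicked generically by adding delayed, attenuated copies of a signal. A minimal sketch with illustrative delays and gains; the paper's dynamic path model is more elaborate:

```python
import numpy as np

def multipath_augment(signal, delays, gains):
    """Superimpose delayed, attenuated copies of a signal to mimic
    multipath arrivals in an underwater acoustic channel.
    (A generic sketch -- the paper's DPSE strategy is more elaborate.)"""
    out = signal.astype(float).copy()
    for d, g in zip(delays, gains):
        out[d:] += g * signal[: len(signal) - d]   # path arriving d samples late
    return out

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
# Toy BPSK burst: 200 random symbols, 40 samples each, on a 1 kHz carrier
bpsk = np.sign(rng.standard_normal(200)).repeat(40) * np.sin(2 * np.pi * 1000 * t)
aug = multipath_augment(bpsk, delays=[120, 400], gains=[0.5, 0.2])
```

Each pass over a training batch can draw fresh delays and gains, so one recorded signal yields many channel-perturbed variants.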

17 pages, 2411 KB  
Article
Geographical Origin Identification of Citrus Fruits Based on Near-Infrared Spectroscopy Combined with Convolutional Neural Network and Data Augmentation
by Zhihong Lu, Kangkang Jia, Haoyang Zhang, Lei Tan, Saritporn Vittayapadung, Lie Deng and Qiang Lyu
Agriculture 2025, 15(22), 2350; https://doi.org/10.3390/agriculture15222350 - 12 Nov 2025
Viewed by 931
Abstract
Accurately determining citrus origin is essential for establishing and maintaining regional brands with distinctive qualities while safeguarding the rights and interests of both farmers and consumers. In this study, 2693 navel orange samples were collected from 13 major producing regions in China to establish a comprehensive near-infrared spectroscopy (NIRS) dataset. To address the challenge of citrus origin authentication, this study proposes a novel six-layer one-dimensional convolutional neural network (1D-CNN). The classification accuracy of this model reaches 96.16%. Compared with the support vector machine (SVM), partial least squares discriminant analysis (PLS-DA), and three-layer 1D-CNNs with kernel sizes of 3 and 16, the accuracy of the proposed six-layer model is improved by 9.65%, 3.21%, 3.84%, and 1.98%, respectively. Furthermore, the dataset is augmented using a Wasserstein Generative Adversarial Network (WGAN) and Noise Addition. The results indicate that data augmentation can effectively improve the accuracy of various algorithm models. Among them, the 1D-CNN proposed in this study achieves the best performance on the Noise Addition-augmented dataset, with its accuracy, precision, recall, and F1-score reaching 98.39%, 0.9843, 0.9839, and 0.9840, respectively. Compared with the other four comparative models, the accuracy of this model is increased by 1.48%, 1.36%, 1.48%, and 2.85%, respectively. Finally, a visual analysis of the 1D-CNN’s feature-extraction process was conducted. The results demonstrate that the 1D-CNN can effectively extract discriminative NIR spectral features to accurately distinguish citrus from different origins and that data augmentation markedly improves model performance by increasing data diversity. This work provides a robust tool for citrus origin tracing and offers a new perspective for the origin authentication of other agricultural products.
(This article belongs to the Section Agricultural Product Quality and Safety)
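Noise Addition augmentation of the kind evaluated above can be sketched as additive Gaussian noise at a target signal-to-noise ratio; the SNR value, spectrum length, and data here are assumptions, not the study's settings:

```python
import numpy as np

def add_noise_snr(spectra, snr_db, rng):
    """Augment spectra by adding Gaussian noise at a target SNR in dB.
    This is the generic additive scheme, not necessarily the paper's
    exact Noise Addition procedure."""
    power = np.mean(spectra ** 2, axis=1, keepdims=True)
    noise_power = power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(spectra.shape) * np.sqrt(noise_power)
    return spectra + noise

rng = np.random.default_rng(0)
X = rng.random((2693, 1024))             # one synthetic spectrum per sample
X_aug = np.vstack([X, add_noise_snr(X, snr_db=30, rng=rng)])
print(X_aug.shape)                        # → (5386, 1024)
```

Doubling the dataset this way keeps the class labels of the originals, since small additive noise does not change a sample's geographical origin.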

28 pages, 6333 KB  
Article
Domain-Adaptive Graph Attention Semi-Supervised Network for Temperature-Resilient SHM of Composite Plates
by Nima Rezazadeh, Alessandro De Luca, Donato Perfetto, Giuseppe Lamanna, Fawaz Annaz and Mario De Oliveira
Sensors 2025, 25(22), 6847; https://doi.org/10.3390/s25226847 - 9 Nov 2025
Cited by 4 | Viewed by 1037
Abstract
This study introduces GAT-CAMDA, a novel framework for the structural health monitoring (SHM) of composite materials under temperature-induced variability, leveraging the powerful feature extraction capabilities of Graph Attention Networks (GATs) and advanced domain adaptation (DA) techniques. By combining Maximum Mean Discrepancy (MMD) and Correlation Alignment (CORAL) losses with a domain-discriminative adversarial model, the framework achieves scalable alignment of feature distributions across temperature domains, ensuring robust damage detection. A simple yet efficient data augmentation process extrapolates damage behaviour across unmeasured temperature conditions, addressing the scarcity of damaged-state observations. Hyperparameter optimisation via Optuna not only identifies the optimal settings to enhance model performance, achieving a classification accuracy of 95.83% on a benchmark dataset, but also illustrates hyperparameter significance for explainability. Additionally, the GAT architecture's attention weights reveal the relative importance of individual sensors, enhancing transparency and reliability in damage detection. The dual use of Optuna serves to refine model accuracy and elucidate parameter impacts, while GAT-CAMDA represents a significant advancement in SHM, enabling precise, interpretable, and scalable diagnostics across complex operational environments.
(This article belongs to the Section Fault Diagnosis & Sensors)
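The CORAL loss used here aligns the second-order statistics (feature covariances) of two domains. A minimal NumPy sketch with synthetic features standing in for two temperature domains:

```python
import numpy as np

def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """CORAL loss: squared Frobenius distance between the feature
    covariances of two domains, normalised by 4*d^2."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)
    ct = np.cov(target, rowvar=False)
    return float(np.sum((cs - ct) ** 2) / (4 * d * d))

rng = np.random.default_rng(0)
cold = rng.standard_normal((200, 16))               # features at one temperature
hot = 2.0 * rng.standard_normal((200, 16)) + 1.0    # scaled/shifted domain
print(coral_loss(cold, cold) == 0.0, coral_loss(cold, hot) > 0.0)  # → True True
```

During training this term is minimised alongside the task loss, pulling the two domains' feature covariances together so a classifier trained on one temperature transfers to the other.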

22 pages, 376 KB  
Article
CSCVAE-NID: A Conditionally Symmetric Two-Stage CVAE Framework with Cost-Sensitive Learning for Imbalanced Network Intrusion Detection
by Zhenyu Wang and Xuejun Yu
Entropy 2025, 27(11), 1086; https://doi.org/10.3390/e27111086 - 22 Oct 2025
Viewed by 924
Abstract
With the increasing complexity and diversity of network threats, developing high-performance Network Intrusion Detection Systems (NIDSs) has become a critical challenge. A primary obstacle in this domain is the pervasive issue of class imbalance, where the scarcity of minority attack samples and the varying costs of misclassification severely limit the effectiveness of traditional models, often leading to a difficult trade-off between high False Positive Rates (FPRs) and low Recall. To address this challenge, this paper proposes a novel, conditionally symmetric two-stage framework, termed CSCVAE-NID (Conditionally Symmetric Two-Stage CVAE for Network Intrusion Detection). The framework operates in two synergistic stages. Firstly, a Data Augmentation Conditional Variational Autoencoder (DA-CVAE) is introduced to tackle the data imbalance problem at the data level. By conditioning on attack categories, the DA-CVAE generates high-quality and diverse synthetic samples for underrepresented classes, providing a more balanced training dataset. Secondly, the core of our framework, a Cost-Sensitive Multi-Class Classification CVAE (CSMC-CVAE), is proposed. This model innovatively reframes the classification task as a probabilistic distribution matching problem and integrates a cost-sensitive learning strategy at the algorithm level. By incorporating a predefined cost matrix into its loss function, the CSMC-CVAE is compelled to prioritize the correct classification of high-cost, minority attack classes. Comprehensive experiments conducted on the public CICIDS-2017 and UNSW-NB15 datasets demonstrate the superiority of the proposed CSCVAE-NID framework. Compared to several state-of-the-art methods, our approach achieves exceptional performance in both binary and multi-class classification tasks. Notably, the DA-CVAE module is designed to be independent and extensible, allowing the synthetic data it generates to support any advanced intrusion detection methodology.
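One common way to fold a predefined cost matrix into a loss, as the CSMC-CVAE does, is to penalise predictions by their expected misclassification cost. A toy sketch; the matrix values and the exact loss form are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def expected_cost_loss(probs, labels, cost):
    """Expected misclassification cost: for each sample, sum the predicted
    class probabilities weighted by the cost of confusing the true class
    with each predicted class. cost[i, i] = 0; cost[i, j] is the penalty
    for predicting j when the truth is i."""
    return float(np.mean(np.sum(probs * cost[labels], axis=1)))

# Toy 3-class problem: class 2 is a rare, high-cost attack class.
cost = np.array([[0.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0],
                 [5.0, 5.0, 0.0]])       # missing class 2 costs 5x more
probs = np.array([[0.75, 0.125, 0.125],
                  [0.125, 0.75, 0.125]])
labels = np.array([2, 1])                # first sample is a missed attack
print(expected_cost_loss(probs, labels, cost))  # → 2.3125
```

The first sample dominates the loss because most of its probability mass sits on the high-cost confusions of class 2, which is exactly the pressure that drives the model toward minority attack classes.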

22 pages, 1270 KB  
Article
Cotton Yield Prediction with Gaussian Distribution Sampling and Variational AutoEncoder
by Yaqi Lan, Xiudong Wang, Lei Gao and Xiaoliang Chen
Appl. Sci. 2025, 15(18), 9947; https://doi.org/10.3390/app15189947 - 11 Sep 2025
Cited by 1 | Viewed by 1052
Abstract
Accurate cotton yield prediction is crucial for agricultural production management, resource optimization, and market supply–demand balance. However, achieving high-precision cotton yield prediction faces significant challenges mainly because cotton growth is influenced by complex, nonlinear environmental factors. Traditional machine learning models struggle to fully capture these complex factors, and deep learning models typically rely on large amounts of high-quality data. The high cost of obtaining field measurement data leads to a scarcity of high-quality datasets, further limiting the performance of prediction models. To overcome these challenges, this study proposes a novel cotton yield prediction architecture—Gaussian distribution data augmentation and variational autoencoder (GD-VAE). This architecture offers the following advantages: (1) it calculates the mean and covariance of the existing data and samples new points conforming to the original data distribution, effectively expanding the training dataset via the Gaussian distribution; (2) it uses an end-to-end variational autoencoder (VAE) that automatically learns low-dimensional, compact, and discriminative feature representations of the input data. Specifically, GD-VAE uses a Gaussian distribution to model the original cotton yield data and generates augmented data through sampling. The VAE then learns deep feature representations from these data, which are fed into a regressor for final yield prediction. To evaluate the performance of GD-VAE, we conducted extensive tests under challenging cross-year and cross-district conditions. In the cross-year test in Bahawalnagar, Pakistan, GD-VAE achieved a root mean square error (RMSE) of 58.4 lbs/acre, a mean absolute error (MAE) of 38.19 lbs/acre, and a coefficient of determination (R²) of 0.65 between the actual and predicted yields. In the more challenging cross-year and cross-district test in Turkey, GD-VAE achieved an RMSE of 46.46 kg/da, an MAE of 37.74 kg/da, and an R² of 0.14. The results indicate that the GD-VAE architecture significantly improves the accuracy of cotton yield prediction under limited data conditions through effective data augmentation and deep feature learning. This research provides an effective technical means for yield prediction in agriculture under limited-sample conditions, which has important practical significance for ensuring global food security and sustainable agricultural development (for analytical tractability, yields are reported in each region's customary units: 1 lb/acre ≈ 1.121 kg/ha, and 1 kg/da = 10 kg/ha).
(This article belongs to the Section Agricultural Science and Technology)
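The GD step described above — fit a Gaussian to the data's sample mean and covariance, then draw synthetic points from it — can be sketched in a few lines. The feature dimensions and values below are synthetic, and the downstream VAE and regressor are omitted:

```python
import numpy as np

def gaussian_augment(X, n_new, rng):
    """Fit a multivariate Gaussian to the feature matrix (sample mean and
    covariance) and draw synthetic samples from it."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_new)

rng = np.random.default_rng(0)
# Synthetic 2-feature stand-in: e.g. (rainfall index, yield in lbs/acre)
X = rng.multivariate_normal([10.0, 500.0], [[4.0, 10.0], [10.0, 900.0]],
                            size=120)
X_aug = np.vstack([X, gaussian_augment(X, n_new=380, rng=rng)])
print(X_aug.shape)  # → (500, 2)
```

Because the synthetic points are drawn from the fitted distribution, the augmented set preserves the means, variances, and feature correlations of the measured data.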

24 pages, 7632 KB  
Article
Air Battlefield Time Series Data Augmentation Model Based on a Lightweight Denoising Diffusion Probabilistic Model
by Bo Cao, Qinghua Xing, Longyue Li, Junjie Shi and Weijie Lin
AI 2025, 6(8), 192; https://doi.org/10.3390/ai6080192 - 18 Aug 2025
Viewed by 1281
Abstract
The uncertainty and confrontational nature of war itself pose significant challenges to the collection and storage of aerial battlefield temporal data. To address the issue of insufficient training of intelligent models caused by the scarcity of air battlefield situation data, this paper designs an air battlefield time series data augmentation model based on a lightweight denoising diffusion probabilistic model (LDMKD-DA). Considering the advantages of a denoising diffusion probabilistic model (DDPM) in processing images, this paper transforms 1D time series data into image data. One-dimensional univariate time series, such as high-resolution range profile (HRRP) data, are transformed via Gramian angular fields and Markov transition fields. Multivariate time series data, such as the air target intention dataset, are transformed by matrix expansion. Then, the data augmentation model is constructed based on the denoising diffusion probabilistic model. Considering the need for miniaturization and intelligence in future combat platforms, depthwise separable convolution is introduced to lighten the DDPM, and an improved knowledge distillation method is introduced to accelerate the sampling process. The experimental results show that LDMKD-DA is capable of generating high-quality synthetic data similar to real data while significantly reducing FLOPs and parameter counts, and it offers clear advantages in both univariate and multivariate time series augmentation.
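The Gramian angular field transform mentioned above maps a 1D series to an image: rescale to [−1, 1], convert values to angles, and form a matrix of summed-angle cosines. A minimal sketch of the summation variant; the sine series is a stand-in for real HRRP data:

```python
import numpy as np

def gramian_angular_field(x: np.ndarray) -> np.ndarray:
    """Gramian angular summation field: rescale a 1D series to [-1, 1],
    map each value to an angle phi = arccos(x), and form the matrix
    G[i, j] = cos(phi_i + phi_j), turning the series into an image."""
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1    # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

series = np.sin(np.linspace(0, 4 * np.pi, 64))          # stand-in for an HRRP
gaf = gramian_angular_field(series)
print(gaf.shape, np.allclose(gaf, gaf.T))                # → (64, 64) True
```

The resulting symmetric image preserves temporal ordering along its diagonal, which is what lets an image-domain DDPM generate plausible time series.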

18 pages, 2791 KB  
Article
Deterministic Data Assimilation in Thermal-Hydraulic Analysis: Application to Natural Circulation Loops
by Lanxin Gong, Changhong Peng and Qingyu Huang
J. Nucl. Eng. 2025, 6(3), 23; https://doi.org/10.3390/jne6030023 - 3 Jul 2025
Cited by 1 | Viewed by 1325
Abstract
Recent advances in high-fidelity modeling, numerical computing, and data science have spurred interest in model-data integration for nuclear reactor applications. While machine learning often prioritizes data-driven predictions, this study focuses on data assimilation (DA) to synergize physical models with measured data, aiming to enhance predictive accuracy and reduce uncertainties. We implemented deterministic DA methods—Kalman filter (KF) and three-dimensional variational (3D-VAR)—in a one-dimensional single-phase natural circulation loop and extended 3D-VAR to RELAP5, a system code for two-phase loop analysis. Unlike surrogate-based or model-reduction strategies, our approach leverages full-model propagation without relying on computationally intensive sampling. The results demonstrate that KF and 3D-VAR exhibit robustness against varied noise types, intensities, and distributions, achieving significant uncertainty reduction in state variables and parameter estimation. The framework’s adaptability is further validated under oceanic conditions, suggesting its potential to augment baseline models beyond conventional extrapolation boundaries. These findings highlight DA’s capacity to improve model calibration, safety margin quantification, and reactor field reconstruction. By integrating high-fidelity simulations with real-world data corrections, the study establishes a scalable pathway to enhance the reliability of nuclear system predictions, emphasizing DA’s role in bridging theoretical models and operational demands without compromising computational efficiency.
(This article belongs to the Special Issue Advances in Thermal Hydraulics of Nuclear Power Plants)
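The Kalman filter's core correction step blends a model prediction with a measurement in proportion to their variances. A scalar sketch with illustrative numbers; the study's loop models and state vectors are far richer than this:

```python
def kalman_update(x_pred, p_pred, z, r):
    """One scalar Kalman filter measurement update: blend the model
    prediction (variance p_pred) with a measurement z (variance r).
    The gain K weights whichever source is more certain."""
    k = p_pred / (p_pred + r)          # Kalman gain
    x = x_pred + k * (z - x_pred)      # corrected state estimate
    p = (1 - k) * p_pred               # reduced posterior variance
    return x, p

# Model predicts a loop temperature of 320 K with variance 4;
# a sensor reads 324 K, also with variance 4 (equal trust; values illustrative).
x, p = kalman_update(320.0, 4.0, 324.0, 4.0)
print(x, p)  # → 322.0 2.0
```

Note the posterior variance (2.0) is below both the prior and the measurement variance, which is the "uncertainty reduction" the abstract reports in state and parameter estimates.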

37 pages, 11208 KB  
Article
Sustainable Self-Training Pig Detection System with Augmented Single Labeled Target Data for Solving Domain Shift Problem
by Junhee Lee, Heechan Chae, Seungwook Son, Jongwoong Seo, Yooil Suh, Jonguk Lee, Yongwha Chung and Daihee Park
Sensors 2025, 25(11), 3406; https://doi.org/10.3390/s25113406 - 28 May 2025
Cited by 2 | Viewed by 1314
Abstract
As global pork consumption rises, livestock farms increasingly adopt deep learning-based automated monitoring systems for efficient pigsty management. Typically, a system applies a pre-trained model on a source domain to a target domain. However, real pigsty environments differ significantly from existing public datasets regarding lighting conditions, camera angles, and animal density. These discrepancies result in a substantial domain shift, leading to severe performance degradation. Additionally, due to variations in the structure of pigsties, pig breeds, and sizes across farms, it is practically challenging to develop a single generalized model that can be applied to all environments. Overcoming this limitation through large-scale labeling presents considerable burdens in terms of time and cost. To address the degradation issue, this study proposes a self-training-based domain adaptation method that utilizes a single label on target (SLOT) sample from the target domain, a genetic algorithm (GA)-based data augmentation search (DAS) designed explicitly for SLOT data to optimize the augmentation parameters, and a super-low-threshold strategy to include low-confidence-scored pseudo-labels during self-training. The proposed system consists of the following three modules: (1) data collection module; (2) preprocessing module that selects key frames and extracts SLOT data; and (3) domain-adaptive pig detection module that applies DAS to SLOT data to generate optimized augmented data, which are used to train the base model. Then, the trained base model is improved through self-training, where a super-low threshold is applied to filter pseudo-labels. The experimental results show that the proposed system significantly improved the average precision (AP) from 36.86 to 90.62 under domain shift conditions, which achieved a performance close to fully supervised learning while relying solely on SLOT data. The proposed system maintained a robust detection performance across various pig-farming environments and demonstrated stable performance under domain shift conditions, validating its feasibility for real-world applications.
(This article belongs to the Special Issue Feature Papers in Smart Agriculture 2025)

16 pages, 3645 KB  
Article
A Global Coseismic InSAR Dataset for Deep Learning: Automated Construction from Sentinel-1 Observations (2015–2024)
by Xu Liu, Zhenjie Wang, Yingfeng Zhang, Xinjian Shan and Ziwei Liu
Remote Sens. 2025, 17(11), 1832; https://doi.org/10.3390/rs17111832 - 23 May 2025
Cited by 3 | Viewed by 3060
Abstract
Interferometric synthetic aperture radar (InSAR) technology has been widely employed in the rapid monitoring of earthquakes and associated geological hazards. With the continued advancement of InSAR technology, the growing volume of satellite-acquired data has opened new avenues for applying deep learning (DL) techniques to the analysis of earthquake-induced surface deformation. Although DL holds great promise for processing InSAR data, its development progress has been significantly constrained by the absence of large-scale, accurately annotated datasets related to earthquake-induced deformation. To address this limitation, we propose an automated method for constructing deep learning training datasets by integrating the Global Centroid Moment Tensor (GCMT) earthquake catalog with Sentinel-1 InSAR observations. This approach reduces the inefficiencies and manual labor typically involved in InSAR data preparation, thereby significantly enhancing the efficiency and automation of constructing deep learning datasets for coseismic deformation. Using this method, we developed and publicly released a large-scale training dataset consisting of coseismic InSAR samples. The dataset contains 353 Sentinel-1 interferograms corresponding to 62 global earthquakes that occurred between 2015 and 2024. Following standardized preprocessing and data augmentation (DA), a large number of image samples were generated for model training. Multidimensional analyses of the dataset confirmed its high quality and strong representativeness, making it a valuable asset for deep learning research on coseismic deformation. The dataset construction process followed a standardized and reproducible workflow, ensuring objectivity and consistency throughout data generation. As additional coseismic InSAR observations become available, the dataset can be continuously expanded, evolving into a comprehensive, high-quality, and diverse training resource. It serves as a solid foundation for advancing deep learning applications in the field of InSAR-based coseismic deformation analysis.
(This article belongs to the Special Issue Artificial Intelligence and Remote Sensing for Geohazards)

26 pages, 982 KB  
Review
Harnessing Data Analytics for Enhanced Public Programming in Archives and Museums: A Scoping Review
by Mthokozisi Masumbika Ncube and Patrick Ngulube
Heritage 2025, 8(5), 163; https://doi.org/10.3390/heritage8050163 - 5 May 2025
Cited by 1 | Viewed by 2144
Abstract
A notable lacuna exists in the extant research regarding the application of data analytics (DA) to augment public programming and cultivate robust connections between archives, museums, and their constituent communities. This scoping review aimed to address this gap by mapping the available literature at the intersection of data analytics, archives, and museums. Adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines, a two-stage selection process was employed, utilising a comprehensive search strategy across four databases and seven specialised journals. This search identified 37 publications that met the pre-defined inclusion criteria. Findings revealed a growing interest in data-driven approaches, with nearly half of the reviewed studies explicitly linking data analytics to public programming. The review identified diverse data analytics techniques employed, ranging from traditional methods to cutting-edge artificial intelligence (AI) applications, and highlighted the various data sources utilised. Furthermore, this study examined the transformative potential of data analytics across several key dimensions of public programming, including access, archival management, user experience, public engagement, and research methodologies. The review noted ethical considerations, data quality issues, preservation challenges, and accessibility concerns associated with leveraging data analytics in archives and museums.
20 pages, 2914 KB  
Article
Cross-Dataset Data Augmentation Using UMAP for Deep Learning-Based Wind Speed Prediction
by Eder Arley Leon-Gomez, Andrés Marino Álvarez-Meza and German Castellanos-Dominguez
Computers 2025, 14(4), 123; https://doi.org/10.3390/computers14040123 - 27 Mar 2025
Cited by 1 | Viewed by 2013
Abstract
Wind energy has emerged as a cornerstone in global efforts to transition to renewable energy, driven by its low environmental impact and significant generation potential. However, the inherent intermittency of wind, influenced by complex and dynamic atmospheric patterns, poses significant challenges for accurate wind speed prediction. Existing approaches, including statistical methods, machine learning, and deep learning, often struggle with limitations such as non-linearity, non-stationarity, computational demands, and the requirement for extensive, high-quality datasets. In response to these challenges, we propose a novel neighborhood-preserving cross-dataset data augmentation framework for high-horizon wind speed prediction. The proposed method addresses data variability and dynamic behaviors through three key components: (i) uniform manifold approximation and projection (UMAP) is employed as a non-linear dimensionality reduction technique to encode local relationships in wind speed time-series data while preserving neighborhood structures, (ii) a localized cross-dataset data augmentation (DA) approach is introduced using UMAP-reduced spaces to enhance data diversity and mitigate variability across datasets, and (iii) recurrent neural networks (RNNs) are trained on the augmented datasets to model temporal dependencies and non-linear patterns effectively. Our framework was evaluated using datasets from diverse geographical locations, including the Argonne Weather Observatory (USA), Chengdu Airport (China), and Beijing Capital International Airport (China). Comparative tests using regression-based measures on RNN, GRU, and LSTM architectures showed that the proposed method improved the accuracy and generalizability of predictions, reducing the average prediction error.
Consequently, our study highlights the potential of integrating advanced dimensionality reduction, data augmentation, and deep learning techniques to address critical challenges in renewable energy forecasting. Full article
(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications 2025)
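The cross-dataset augmentation idea described in components (i) and (ii) can be sketched in a simplified form. This is an illustrative assumption-laden toy, not the paper's implementation: a linear SVD projection stands in for UMAP, the two synthetic "site" datasets are random, and the blending rule is a plain convex interpolation between embedding-space nearest neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical wind-speed window datasets (samples x window length);
# real inputs would be sliding windows of measured wind-speed series
site_a = rng.normal(6.0, 1.5, size=(40, 24))
site_b = rng.normal(7.5, 2.0, size=(30, 24))

def embed(X, dims=2):
    """Linear low-dimensional embedding via SVD -- a PCA stand-in for UMAP."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:dims].T

def cross_dataset_augment(a, b, dims=2, alpha=0.5):
    """Embed both datasets jointly, match each sample in `a` with its
    nearest neighbor in `b`, and blend the raw windows into synthetic samples."""
    z = embed(np.vstack([a, b]), dims)
    za, zb = z[: len(a)], z[len(a):]
    d = np.linalg.norm(za[:, None, :] - zb[None, :, :], axis=-1)
    nn = d.argmin(axis=1)  # nearest b-neighbor of each a-sample
    return alpha * a + (1 - alpha) * b[nn]

aug = cross_dataset_augment(site_a, site_b)
print(aug.shape)  # (40, 24): one synthetic window per site_a sample
```

The synthetic windows would then be appended to the training set of an RNN/GRU/LSTM forecaster, as in component (iii).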
18 pages, 2362 KB  
Article
Hyperspectral Target Detection Based on Masked Autoencoder Data Augmentation
by Zhixuan Zhuang, Jinhui Lan and Yiliang Zeng
Remote Sens. 2025, 17(6), 1097; https://doi.org/10.3390/rs17061097 - 20 Mar 2025
Cited by 2 | Viewed by 2804
Abstract
Deep metric learning combines deep learning with metric learning to explore the deep spectral space and distinguish between the target and background. Current target detection methods typically fail to accurately distinguish local differences between the target and background, leading to insufficient suppression of the pixels surrounding the target and poor detection performance. To solve this issue, a hyperspectral target detection method based on masked autoencoder data augmentation (HTD-DA) was proposed. HTD-DA includes a multi-scale spectral metric network based on a triplet network, which enhances the ability to learn local and global spectral variations using multi-scale feature extraction and feature fusion, thereby improving background suppression. To alleviate the lack of training data, a masked spectral data augmentation network was employed. It utilizes the entire hyperspectral image (HSI) to train the network, learning spectral variability through mask-based reconstruction and generating target samples based on the prior spectrum. Additionally, an Inter-class Difference Amplification Triplet (IDAT) loss was introduced to enhance the separation between target and background in the learned spectral space by making full use of background and prior information. The experimental results demonstrated that the proposed model provides superior detection results. Full article
(This article belongs to the Special Issue Image Processing from Aerial and Satellite Imagery)
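The triplet-network component rests on the standard triplet margin loss, which the paper's IDAT loss modifies to amplify inter-class differences. As a reference point, a plain triplet margin loss on embedded spectra can be sketched as below; the embedding dimensions, margin, and synthetic spectra are illustrative assumptions, and the exact IDAT formulation is specific to the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on spectral embeddings: pull the target
    (positive) toward the anchor, push the background (negative) away."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

rng = np.random.default_rng(1)
anchor = rng.normal(size=(8, 16))                    # embedded prior target spectra
positive = anchor + 0.05 * rng.normal(size=(8, 16))  # augmented target samples
negative = rng.normal(size=(8, 16))                  # embedded background pixels

print(triplet_loss(anchor, positive, negative))
```

When the positives generated by the masked augmentation network sit close to the anchor and background pixels sit far away, the hinge term vanishes and the loss approaches zero, which is the training objective.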