Search Results (141)

Search Parameters:
Keywords = augmented state-space model

28 pages, 877 KB  
Article
SFD-ADNet: Spatial–Frequency Dual-Domain Adaptive Deformation for Point Cloud Data Augmentation
by Jiacheng Bao, Lingjun Kong and Wenju Wang
J. Imaging 2026, 12(2), 58; https://doi.org/10.3390/jimaging12020058 (registering DOI) - 26 Jan 2026
Abstract
Existing 3D point cloud enhancement methods typically rely on artificially designed geometric transformations or local blending strategies, which are prone to introducing illogical deformations, struggle to preserve global structure, and exhibit insufficient adaptability to diverse degradation patterns. To address these limitations, this paper proposes SFD-ADNet—an adaptive deformation framework based on a dual spatial–frequency domain. It achieves 3D point cloud augmentation by explicitly learning deformation parameters rather than applying predefined perturbations. By jointly modeling spatial structural dependencies and spectral features, SFD-ADNet generates augmented samples that are both structurally aware and task-relevant. In the spatial domain, a hierarchical sequence encoder coupled with a bidirectional Mamba-based deformation predictor captures long-range geometric dependencies and local structural variations, enabling adaptive position-aware deformation control. In the frequency domain, a multi-scale dual-channel mechanism based on adaptive Chebyshev polynomials separates low-frequency structural components from high-frequency details, allowing the model to suppress noise-sensitive distortions while preserving the global geometric skeleton. The two deformation predictions dynamically fuse to balance structural fidelity and sample diversity. Extensive experiments conducted on ModelNet40-C and ScanObjectNN-C involved synthetic CAD models and real-world scanned point clouds under diverse perturbation conditions. SFD-ADNet, as a universal augmentation module, reduces the mCE metrics of PointNet++ and different backbone networks by over 20%. Experiments demonstrate that SFD-ADNet achieves state-of-the-art robustness while preserving critical geometric structures. 
Furthermore, models enhanced by SFD-ADNet demonstrate consistently improved robustness against diverse point cloud attacks, validating the efficacy of adaptive space-frequency deformation in robust point cloud learning. Full article
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)
21 pages, 1284 KB  
Article
Probabilistic Indoor 3D Object Detection from RGB-D via Gaussian Distribution Estimation
by Hyeong-Geun Kim
Mathematics 2026, 14(3), 421; https://doi.org/10.3390/math14030421 - 26 Jan 2026
Abstract
Conventional object detectors represent each object by a deterministic bounding box, regressing its center and size from RGB images. However, such discrete parameterization ignores the inherent uncertainty in object appearance and geometric projection, which can be more naturally modeled as a probabilistic density field. Recent works have introduced Gaussian-based formulations that treat objects as distributions rather than boxes, yet they remain limited to 2D images or require late fusion between image and depth modalities. In this paper, we propose a unified Gaussian-based framework for direct 3D object detection from RGB-D inputs. Our method is built upon a vision transformer backbone to effectively capture global context. Instead of separately embedding RGB and depth features or refining depth within region proposals, our method takes a full four-channel RGB-D tensor and predicts the mean and covariance of a 3D Gaussian distribution for each object in a single forward pass. We extend a pretrained vision transformer to accept four-channel inputs by augmenting the patch embedding layer while preserving ImageNet-learned representations. This formulation allows the detector to represent both object location and geometric uncertainty in 3D space. By optimizing divergence metrics such as the Kullback–Leibler or Bhattacharyya distances between predicted and target distributions, the network learns a physically consistent probabilistic representation of objects. Experimental results on the SUN RGB-D benchmark demonstrate that our approach achieves competitive performance compared to state-of-the-art point-cloud-based methods while offering uncertainty-aware and geometrically interpretable 3D detections. Full article
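The divergence objectives named in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes diagonal covariances (one variance per axis) so that no matrix inversion is needed, and the function names are hypothetical.

```python
import math

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariances.

    Assumption: each covariance is given as a list of per-axis variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total

def bhattacharyya_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance for two Gaussians with diagonal covariances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        v = 0.5 * (vp + vq)  # per-axis average variance
        total += 0.125 * (mp - mq) ** 2 / v
        total += 0.5 * math.log(v / math.sqrt(vp * vq))
    return total

# Predicted vs. target object distribution in 3D (illustrative numbers only).
pred_mu, pred_var = [1.0, 0.0, 2.0], [1.0, 1.0, 1.0]
target_mu, target_var = [0.0, 0.0, 2.0], [1.0, 1.0, 1.0]
loss_kl = kl_diag_gaussians(pred_mu, pred_var, target_mu, target_var)
loss_bd = bhattacharyya_diag_gaussians(pred_mu, pred_var, target_mu, target_var)
```

Both quantities are zero when the predicted and target distributions coincide and grow as the means separate or the variances disagree, which is what lets them serve as training losses. A full-covariance version would replace the per-axis terms with matrix inverses and determinants.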
25 pages, 5757 KB  
Article
Heatmap-Assisted Reinforcement Learning Model for Solving Larger-Scale TSPs
by Guanqi Liu and Donghong Xu
Electronics 2026, 15(3), 501; https://doi.org/10.3390/electronics15030501 - 23 Jan 2026
Viewed by 73
Abstract
Deep reinforcement learning (DRL)-based algorithms for solving the Traveling Salesman Problem (TSP) have demonstrated competitive potential compared to traditional heuristic algorithms on small-scale TSP instances. However, as the problem size increases, the NP-hard nature of the TSP leads to exponential growth in the combinatorial search space, state–action space explosion, and sharply increased sample complexity, which together cause significant performance degradation for most existing DRL-based models when directly applied to large-scale instances. This research proposes a two-stage reinforcement learning framework, termed GCRL-TSP (Graph Convolutional Reinforcement Learning for the TSP), which consists of a heatmap generation stage based on a graph convolutional neural network, and a heatmap-assisted Proximal Policy Optimization (PPO) training stage, where the generated heatmaps are used as auxiliary guidance for policy optimization. First, we design a divide-and-conquer heatmap generation strategy: a graph convolutional network infers m-node sub-heatmaps, which are then merged into a global edge-probability heatmap. Second, we integrate the heatmap into PPO by augmenting the state representation and restricting the action space toward high-probability edges, improving training efficiency. On standard instances with 200/500/1000 nodes, GCRL-TSP achieves a Gap% of 4.81/4.36/13.20 (relative to Concorde) with runtimes of 36 s/1.12 min/4.65 min. Experimental results show that GCRL-TSP achieves more than twice the solving speed compared to other TSP solving algorithms, while obtaining solution quality comparable to other algorithms on TSPs ranging from 200 to 1000 nodes. Full article
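The two uses of the heatmap described above (merging sub-heatmaps, then restricting actions to high-probability edges) might look roughly like the sketch below. The abstract does not state the merge rule or the restriction rule, so averaging overlapping entries and a top-k filter are assumptions made here for illustration; all names are hypothetical.

```python
def merge_sub_heatmaps(n_nodes, sub_heatmaps):
    """Merge sub-heatmaps into one global edge-probability heatmap.

    sub_heatmaps: list of (node_ids, matrix), where matrix[a][b] is the
    predicted probability of the edge between node_ids[a] and node_ids[b].
    Assumption: edges covered by several sub-heatmaps are averaged;
    edges covered by none get probability 0.
    """
    total = [[0.0] * n_nodes for _ in range(n_nodes)]
    count = [[0] * n_nodes for _ in range(n_nodes)]
    for node_ids, matrix in sub_heatmaps:
        for a, i in enumerate(node_ids):
            for b, j in enumerate(node_ids):
                if i != j:
                    total[i][j] += matrix[a][b]
                    count[i][j] += 1
    return [
        [total[i][j] / count[i][j] if count[i][j] else 0.0 for j in range(n_nodes)]
        for i in range(n_nodes)
    ]

def candidate_edges(heatmap, node, visited, k):
    """Restrict the action space to the k most probable unvisited neighbours."""
    scored = [
        (p, j) for j, p in enumerate(heatmap[node])
        if j != node and j not in visited
    ]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]

# Two overlapping 3-node sub-problems on a 4-node instance.
sub_a = ([0, 1, 2], [[0.0, 0.8, 0.1], [0.8, 0.0, 0.2], [0.1, 0.2, 0.0]])
sub_b = ([1, 2, 3], [[0.0, 0.6, 0.3], [0.6, 0.0, 0.9], [0.3, 0.9, 0.0]])
merged = merge_sub_heatmaps(4, [sub_a, sub_b])
actions = candidate_edges(merged, node=2, visited={0}, k=2)
```

Shrinking the action set this way is one plausible reading of "restricting the action space toward high-probability edges"; a soft alternative would add the log-probabilities to the policy logits instead of masking.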
(This article belongs to the Section Artificial Intelligence)
25 pages, 1075 KB  
Article
Prompt-Based Few-Shot Text Classification with Multi-Granularity Label Augmentation and Adaptive Verbalizer
by Deling Huang, Zanxiong Li, Jian Yu and Yulong Zhou
Information 2026, 17(1), 58; https://doi.org/10.3390/info17010058 - 8 Jan 2026
Viewed by 243
Abstract
Few-Shot Text Classification (FSTC) aims to classify text accurately into predefined categories using minimal training samples. Recently, prompt-tuning-based methods have achieved promising results by constructing verbalizers that map input data to the label space, thereby maximizing the utilization of pre-trained model features. However, existing verbalizer construction methods often rely on external knowledge bases, which require complex noise filtering and manual refinement, making the process time-consuming and labor-intensive, while approaches based on pre-trained language models (PLMs) frequently overlook inherent prediction biases. Furthermore, conventional data augmentation methods focus on modifying input instances while overlooking the integral role of label semantics in prompt tuning. This disconnection often leads to a trade-off where increased sample diversity comes at the cost of semantic consistency, resulting in marginal improvements. To address these limitations, this paper first proposes a novel Bayesian Mutual Information-based method that optimizes label mapping to retain general PLM features while reducing reliance on irrelevant or unfair attributes to mitigate latent biases. Based on this method, we propose two synergistic generators that synthesize semantically consistent samples by integrating label word information from the verbalizer to effectively enrich data distribution and alleviate sparsity. To guarantee the reliability of the augmented set, we propose a Low-Entropy Selector that serves as a semantic filter, retaining only high-confidence samples to safeguard the model against ambiguous supervision signals. Furthermore, we propose a Difficulty-Aware Adversarial Training framework that fosters generalized feature learning, enabling the model to withstand subtle input perturbations. 
Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on most few-shot and full-data splits, with F1 score improvements of up to +2.8% on the standard AG’s News benchmark and +1.0% on the challenging DBPedia benchmark. Full article
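The idea behind an entropy-based filter for augmented samples can be sketched as follows. This is an illustrative reading of the "Low-Entropy Selector" only: the threshold value, the function names, and the use of Shannon entropy over class probabilities are assumptions, not details taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def low_entropy_select(samples, max_entropy):
    """Keep only samples whose predicted distribution is confident.

    samples: list of (text, class_probs) pairs.
    max_entropy: hypothetical cut-off; lower values keep fewer samples.
    """
    return [(text, probs) for text, probs in samples if entropy(probs) <= max_entropy]

generated = [
    ("stocks rallied after the earnings call", [0.97, 0.01, 0.01, 0.01]),
    ("the event took place yesterday", [0.25, 0.25, 0.25, 0.25]),
]
kept = low_entropy_select(generated, max_entropy=0.5)
```

A uniform prediction over four classes has entropy ln 4 (about 1.39 nats), so the second sample is discarded, while the confident first sample is retained.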
31 pages, 14010 KB  
Article
Deep Reinforcement Learning for Financial Trading: Enhanced by Cluster Embedding and Zero-Shot Prediction
by Haoran Zhang, Xiaofei Li, Tianjiao Wan and Junjie Du
Symmetry 2026, 18(1), 112; https://doi.org/10.3390/sym18010112 - 7 Jan 2026
Viewed by 402
Abstract
Deep reinforcement learning (DRL) plays a pivotal role in decision-making within financial markets. However, DRL models are highly reliant on raw market data and often overlook the impact of future trends on model performance. To address these challenges, we propose a novel framework named Cluster Embedding-Proximal Policy Optimization (CE-PPO) for trading decision-making in financial markets. Specifically, the framework groups feature channels with intrinsic similarities and enhances the original model by leveraging clustering information instead of features from individual channels. Meanwhile, zero-shot prediction for unseen samples is achieved by assigning them to appropriate clusters. Future Open, High, Low, Close, and Volume (OHLCV) data predicted from observed values are integrated with actually observed OHLCV data, forming the state space inherent to reinforcement learning. Experiments conducted on five real-world financial datasets demonstrate that the time series model integrated with Cluster Embedding (CE) achieves significant improvements in predictive performance: in short-term prediction, the Mean Absolute Error (MAE) is reduced by an average of 20.09% and the Mean Squared Error (MSE) by 30.12%; for zero-shot prediction, the MAE and MSE decrease by an average of 21.56% and 31.71%, respectively. Through data augmentation using real and predicted data, the framework substantially enhances trading performance, achieving a cumulative return rate of 137.94% on the S&P 500 Index. Beyond its empirical contributions, this study also highlights the conceptual relevance of symmetry in the domain of algorithmic trading. The constructed deep reinforcement learning framework is capable of capturing the inherent balanced relationships and nonlinear interaction characteristics embedded in financial market behaviors. Full article
(This article belongs to the Special Issue Machine Learning and Data Analysis III)
26 pages, 5848 KB  
Article
HR-Mamba: Building Footprint Segmentation with Geometry-Driven Boundary Regularization
by Buyu Su, Defei Yin, Piyuan Yi, Wenhuan Wu, Junjian Liu, Fan Yang, Haowei Mu and Jingyi Xiong
Sensors 2026, 26(2), 352; https://doi.org/10.3390/s26020352 - 6 Jan 2026
Viewed by 309
Abstract
Building extraction underpins land-use assessment, urban planning, and disaster mitigation, yet dense urban scenes still cause missed small objects, target adhesion, and ragged contours. We present High-Resolution-Mamba (HR-Mamba), a high-resolution semantic segmentation network that augments a High-Resolution Network (HRNet) parallel backbone with edge-aware and sequence-state modeling. A Canny-enhanced, median-filtered stem stabilizes boundaries under noise; Involution-based residual blocks capture position-specific local geometry; and a Mamba-based State Space Model (Mamba-SSM) global branch captures cross-scale long-range dependencies with linear complexity. Training uses a composite loss of binary cross entropy (BCE), Dice loss, and Boundary loss, with weights selected by joint grid search. We further design a feature-driven adaptive post-processing pipeline that includes geometric feature analysis, multi-strategy simplification, multi-directional regularization, and topological consistency verification to produce regular, smooth, engineering-ready building outlines. On dense urban imagery, HR-Mamba improves the F1-score from 80.95% to 83.93%, an absolute gain of 2.98 percentage points over HRNet. We conclude that HR-Mamba jointly enhances detail fidelity and global consistency and offers a generalizable route for high-resolution building extraction in remote sensing. Full article
20 pages, 7543 KB  
Article
Contrastive Learning with Feature Space Interpolation for Retrieval-Based Chest X-Ray Report Generation
by Zahid Ur Rahman, Gwanghyun Yu, Lee Jin and Jin Young Kim
Appl. Sci. 2026, 16(1), 470; https://doi.org/10.3390/app16010470 - 1 Jan 2026
Viewed by 441
Abstract
Automated radiology report generation from chest X-rays presents a critical challenge in medical imaging. Traditional image-captioning models struggle with clinical specificity and rare pathologies. Recently, contrastive vision-language learning has emerged as a robust alternative that learns joint visual–textual representations. However, applying contrastive learning (CL) to radiology remains challenging due to severe data scarcity. Prior work has employed input space augmentation, but these approaches incur computational overhead and risk distorting diagnostic features. This work presents CL with feature space interpolation for retrieval (CLFIR), a novel CL framework operating on learned embeddings. The method generates interpolated pairs in the feature embedding space by mixing original and shuffled embeddings in batches using a mixing coefficient λ ~ U(0.85, 0.99). This approach increases batch diversity via synthetic samples, addressing the limitations of CL on medical data while preserving diagnostic integrity. Extensive experiments demonstrate state-of-the-art performance across critical clinical validation tasks. For report generation, CLFIR achieves BLEU-1/ROUGE/METEOR scores of 0.51/0.40/0.26 (Indiana University [IU] X-ray) and 0.45/0.34/0.22 (MIMIC-CXR). Moreover, CLFIR excels at image-to-text retrieval with R@1 scores of 4.14% (IU X-ray) and 24.3% (MIMIC-CXR) and achieves 0.65 accuracy in zero-shot classification on the CheXpert 5×200 dataset, surpassing established vision-language models. Full article
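The interpolation step in this abstract (mixing each embedding with a shuffled partner, with the coefficient drawn uniformly from 0.85 to 0.99) can be sketched as below. Whether the coefficient is drawn once per batch or once per sample is not stated in the abstract; this sketch assumes one draw per sample, and the function name and return values are hypothetical.

```python
import random

def interpolate_batch(embeddings, lam_low=0.85, lam_high=0.99, rng=None):
    """Mix each embedding with a randomly chosen partner from the same batch.

    embeddings: list of equal-length float vectors.
    Returns the mixed vectors, the partner permutation, and the coefficients
    so the mixing can be inspected or reproduced.
    Assumption: one mixing coefficient is sampled per sample.
    """
    rng = rng or random.Random()
    perm = list(range(len(embeddings)))
    rng.shuffle(perm)  # perm[i] is the partner index for sample i
    mixed, lams = [], []
    for i, j in enumerate(perm):
        lam = rng.uniform(lam_low, lam_high)
        mixed.append([
            lam * a + (1.0 - lam) * b
            for a, b in zip(embeddings[i], embeddings[j])
        ])
        lams.append(lam)
    return mixed, perm, lams

batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed, perm, lams = interpolate_batch(batch, rng=random.Random(0))
```

Because the coefficient stays close to 1, each synthetic embedding remains near its original, which is consistent with the abstract's stated aim of adding batch diversity without distorting diagnostic features.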
22 pages, 4344 KB  
Article
CGAP-HBSA: A Source Camera Identification Framework Under Few-Shot Conditions
by Yifan Hu, Zhiqiang Wen, Aofei Chen and Lini Wu
Symmetry 2026, 18(1), 71; https://doi.org/10.3390/sym18010071 - 31 Dec 2025
Viewed by 191
Abstract
Source camera identification relies on sensor noise features to distinguish between different devices, but large-scale sample labeling is time-consuming and labor-intensive, making it difficult to implement in real-world applications. The noise residuals generated by different camera sensors exhibit statistical asymmetry, and the structured patterns within these residuals also show local symmetric relationships. Together, these features form the theoretical foundation for camera source identification. To address the problem of limited labeled data under few-shot conditions, this paper proposes a Cross-correlation Guided Augmentation and Prediction with Hybrid Bidirectional State-Space Model Attention (CGAP-HBSA) framework, based on the aforementioned symmetry-related theoretical foundation. The method extracts symmetric correlation structures from unlabeled samples and converts them into reliable pseudo-labeled samples. Furthermore, the HBSA network jointly models symmetric structures and asymmetric variations in camera fingerprints using a bidirectional SSM module and a hybrid attention mechanism, thereby enhancing long-range spatial modeling capabilities and recognition robustness. In the Dresden dataset, the proposed method achieves an identification accuracy for the 5-shot camera source identification task that is only 0.02% lower than the current best-performing method under few-shot conditions, MDM-CPS, and outperforms other classical few-shot camera source identification methods. In the 10-shot task, the method improves by at least 0.3% compared to MDM-CPS. In the Vision dataset, the method improves the identification accuracy in the 5-shot camera source identification task by at least 6% compared to MDM-CPS, and in the 10-shot task, it improves by at least 3% over the best-performing MDM-CPS method. Experimental results demonstrate that the proposed method achieves competitive or superior performance in both 5-shot and 10-shot settings. 
Additional robustness experiments further confirm that the HBSA network maintains strong performance even under image compression and noise contamination conditions. Full article
28 pages, 689 KB  
Article
LLM-Augmented Sensor Fusion for Urban Socioeconomic Monitoring: A Cyber–Physical–Social Systems Perspective
by Hui Xie, Hui Cao and Hongkai Zhao
Systems 2026, 14(1), 36; https://doi.org/10.3390/systems14010036 - 29 Dec 2025
Viewed by 266
Abstract
Urban welfare can deteriorate over a few weeks, yet most official indicators are only updated quarterly. This mismatch in time scales leaves city administrations effectively blind to the early stages of emerging crises, especially in areas where vulnerable residents generate few administrative or digital records. We cast urban socioeconomic monitoring as a systems problem: a six-dimensional welfare state on a spatial grid, observed through sparse delayed administrative data and noisy digital traces whose reliability declines with digital exclusion. On top of this latent state, we design a four-layer cyber–physical–social (CPSS) architecture centered on a stochastic state-space model with empirically guided couplings. This is supported by a semantic sensing layer where large language models (LLMs) convert daily geo-referenced public text into noisy welfare indicators. These signals are then fused with quarterly administrative records via an extended Kalman filter (EKF). Finally, a lightweight convex post-processing layer enforces fairness, differential privacy, and minimum representation as hard constraints. A key ingredient is a state-dependent noise model in which the LLM observation variance grows exponentially with digital exclusion. Under this model, we study finite-horizon observability and obtain an exclusion threshold beyond which several welfare dimensions become effectively unobservable over 30–60 day horizons; EKF error bounds scale with the same exponent, clarifying when semantic sensing is informative and when it is not. Finally, a 100,000-agent agent-based model of a synthetic city with daily shocks suggests that, relative to a quarterly-only baseline, the LLM-augmented fusion pipeline substantially reduces detection lags and multi-dimensional cascade failures while keeping estimation error bounded and satisfying the explicit fairness and privacy constraints. Full article
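The effect of an observation variance that grows exponentially with digital exclusion can be illustrated with a one-dimensional filter update. This is a deliberately reduced sketch: it assumes a scalar welfare state observed directly (so the extended Kalman update reduces to the linear one) and treats exclusion as a known input, and the constants `base_var` and `growth` are hypothetical rather than values from the paper.

```python
import math

def llm_obs_variance(exclusion, base_var=0.05, growth=4.0):
    """Observation variance that grows exponentially with digital exclusion.

    exclusion: value in [0, 1]; base_var and growth are hypothetical constants.
    """
    return base_var * math.exp(growth * exclusion)

def scalar_update(mean, var, observation, exclusion):
    """One measurement update for a scalar state observed directly."""
    r = llm_obs_variance(exclusion)
    gain = var / (var + r)  # Kalman gain shrinks as r grows
    new_mean = mean + gain * (observation - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var, gain

# Same prior and same LLM-derived signal, in a connected and an excluded area.
m_low, v_low, g_low = scalar_update(0.0, 1.0, 1.0, exclusion=0.0)
m_high, v_high, g_high = scalar_update(0.0, 1.0, 1.0, exclusion=1.0)
```

With high exclusion the gain is small, so the estimate barely moves and the posterior variance stays close to the prior. That is the intuition behind an exclusion threshold beyond which the semantic signal stops being informative.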
22 pages, 488 KB  
Article
AIDE: An Active Inference-Driven Framework for Dynamic Evaluation via Latent State Modeling and Generative Reasoning
by Xi Chen, Changwang Liu, Chenyang Zhang, Yuxuan Wang, Jiayi Chang, Shuqing He, Wangyu Wu, Wenjun Yu and Jia Guo
Electronics 2026, 15(1), 99; https://doi.org/10.3390/electronics15010099 - 24 Dec 2025
Viewed by 281
Abstract
This paper introduces AIDE, an active inference-driven evaluation framework designed to provide a unified and theoretically grounded approach for analyzing sequential textual data. AIDE formulates the evaluation problem as variational inference in a latent dynamical system, enabling joint treatment of representation, temporal structure, and predictive reasoning. The framework integrates (i) a representation and augmentation module based on variational learning and contrastive semantic encoding, (ii) a parametric state–space model that captures the evolution of latent states and supports probabilistic forecasting, and (iii) a policy-selection mechanism that minimizes the expected free energy, guiding a latent diffusion generator to produce coherent and interpretable evaluation outputs. This formulation yields a principled pipeline linking evidence accumulation, latent-state inference, and policy-driven generative reporting. Experimental studies demonstrate that AIDE provides stable inference, coherent predictions, and consistent evaluation behavior across heterogeneous textual sequences. The proposed framework offers a general probabilistic foundation for dynamic evaluation tasks and contributes a structured methodology for integrating representation learning, dynamical modeling, and generative mechanisms within a single variational paradigm. Full article
(This article belongs to the Section Artificial Intelligence)
19 pages, 5932 KB  
Article
FACMamba: Frequency-Aware Coupled State Space Modeling for Underwater Image Enhancement
by Li Wang, Keyong Shen, Haiyang Sun, Xiaoling Cheng, Jun Zhu and Bixuan Wang
J. Mar. Sci. Eng. 2025, 13(12), 2258; https://doi.org/10.3390/jmse13122258 - 27 Nov 2025
Viewed by 447
Abstract
Recent advances in underwater image enhancement (UIE) have achieved notable progress using deep learning techniques; however, existing methods often struggle with limited receptive fields, inadequate frequency modeling, and poor structural perception, leading to sub-optimal visual quality and weak generalization in complex underwater environments. To tackle these issues, we propose FACMamba, a Mamba-based framework augmented with frequency-aware mechanisms, enabling efficient modeling of long-range spatial relations for underwater image restoration. Specifically, FACMamba incorporates three key components: a Multi-Directional Vision State-Space Module (MVSM) to model directional spatial context via the proposed 8-direction selective scan block (SS8D), a Frequency-Aware Guidance Module (FAGM) for learning informative frequency representations with low overhead, and a Structure-Aware Fusion Module (SAFM) to preserve fine-grained structural cues through adaptive multi-scale integration. Recognizing the importance of spatial-frequency interaction, our model fuses these representations via lightweight architecture to enhance both texture and color fidelity. Experiments on standard UIE benchmarks demonstrate that FACMamba achieves a favorable balance between enhancement quality and computational efficiency, outperforming many existing UIE methods. Full article
(This article belongs to the Section Ocean Engineering)
30 pages, 28451 KB  
Article
Boosting Diffusion Networks with Deep External Context-Aware Encoders for Low-Light Image Enhancement
by Pengliang Tang, Yu Wang and Aidong Men
Sensors 2025, 25(23), 7232; https://doi.org/10.3390/s25237232 - 27 Nov 2025
Viewed by 613
Abstract
Low-light image enhancement (LLIE) requires modeling spatially extensive and interdependent degradations across large pixel regions, while directly equipping diffusion-based LLIE with heavy global modules inside the iterative denoising backbone leads to prohibitive computational overhead. To enhance long-range context modeling without inflating the per-step cost of diffusion, we propose ECA-Diff, a diffusion framework augmented with a deep External Context-Aware Encoder (ECAE). A latent-space context network built with hybrid Transformer–Convolution blocks extracts holistic cues from the input, generates multi-scale context features once, and injects them into the diffusion backbone as lightweight conditional guidance across all sampling steps. In addition, a CIELAB-space Luminance-Adaptive Chromaticity Loss regularizes conditional diffusion training and mitigates the cool color cast frequently observed in low-luminance regions. Experiments on paired and unpaired benchmarks show that ECA-Diff consistently outperforms recent state-of-the-art LLIE methods in both full-reference (PSNR/SSIM/LPIPS) and no-reference (NIQE/BRISQUE) metrics, with the external context path introducing only modest overhead relative to the baseline diffusion backbone. These results indicate that decoupling global context estimation from the iterative denoising process is an effective way to boost diffusion-based LLIE and provides a general compute-once conditioning paradigm for low-level image restoration. Full article
(This article belongs to the Section Sensing and Imaging)
15 pages, 3459 KB  
Article
Multi-Granularity Invariant Structure Learning for Text Classification in Entrepreneurship Policy
by Xinyu Sun and Meifang Yao
Mathematics 2025, 13(22), 3648; https://doi.org/10.3390/math13223648 - 14 Nov 2025
Viewed by 500
Abstract
Data-driven text classification technology is crucial for understanding and managing a large number of entrepreneurial policy-related texts, yet it is hindered by two primary challenges. First, the intricate, multi-faceted nature of policy documents often leads to insufficient information extraction, as existing models struggle to synergistically leverage diverse information types, such as statistical regularities, linguistic structures, and external factual knowledge, resulting in semantic sparsity. Second, the performance of state-of-the-art deep learning models is heavily reliant on large-scale annotated data, a resource that is scarce and costly to acquire in entrepreneurial policy domains, rendering models susceptible to overfitting and poor generalization. To address these challenges, this paper proposes a Multi-granularity Invariant Structure Learning (MISL) model. Specifically, MISL first employs a multi-view feature engineering module that constructs and fuses distinct statistical, linguistic, and knowledge graphs to generate a comprehensive and rich semantic representation, thereby alleviating semantic sparsity. Furthermore, to enhance robustness and generalization from limited data, we introduce a dual invariant structure learning framework. This framework operates at two levels: (1) sample-invariant representation learning uses data augmentation and mutual information maximization to learn the essential semantic core of a text, invariant to superficial perturbations; (2) neighborhood-invariant semantic learning applies a contrastive objective on a nearest-neighbor graph to enforce intra-class compactness and inter-class separability in the feature space. Extensive experiments demonstrate that our proposed MISL model significantly outperforms state-of-the-art baselines, proving its effectiveness and robustness for classifying complex texts in entrepreneurial policy domains. Full article
(This article belongs to the Special Issue Artificial Intelligence and Data Science, 2nd Edition)

21 pages, 6018 KB  
Article
Enhancing Object Detection with Shape-IoU and Scale–Space–Task Collaborative Lightweight Path Aggregation
by Guogang Wang, Xin Zhao, Denghui Dang, Junlong Wang and Yaqiu Chen
Appl. Sci. 2025, 15(22), 11976; https://doi.org/10.3390/app152211976 - 11 Nov 2025
Viewed by 595
Abstract
We propose a novel target detection algorithm that addresses the issues of ignoring shape attributes in regression loss and the inability of the high-parameter PAFPN to jointly perceive scale–space–task information. Specifically, we construct a Lightweight Path Aggregation Feature Pyramid Network (LPAFPN) to reduce model parameters by shuffling and fusing features across channels. To further enhance its perception ability, we augment LPAFPN with a scale–space–task joint-perception enhancement module, terming the resulting network ALPAFPN, which can adaptively process joint information of scale, space, and task. Finally, we introduce a shape-scale bounding box regression loss method that focuses on the target’s intrinsic attributes to optimize the regression measurement, thereby boosting the detection accuracy. Experimental results show that the proposed algorithm outperforms state-of-the-art algorithms in terms of F1 score, Precision, and Mean Average Precision (mAP) on the PASCAL VOC and VisDrone2019-DET datasets.
(This article belongs to the Section Computing and Artificial Intelligence)
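
The abstract above mentions shuffling features across channels but does not say how LPAFPN does it. As a rough illustration only, the sketch below shows the common ShuffleNet-style channel shuffle in plain Python: channels are split into groups and then interleaved, so each output group mixes channels from every input group. The function name `channel_shuffle` and the `groups` parameter are assumptions made for this example, not the authors' implementation.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle).

    `channels` is a list of per-channel items (here just labels);
    `groups` is the number of equally sized channel groups.
    This is an illustrative sketch, not the LPAFPN code.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # View the list as a (groups, per_group) grid, transpose it,
    # and flatten: take the i-th channel of each group in turn.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

With two groups of three channels, the result alternates between the groups, which is what lets a later grouped convolution see features from both.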

26 pages, 10083 KB  
Article
Triple-Stream Contrastive Deep Embedding Clustering via Semantic Structure
by Aiyu Zheng, Jianghui Cai, Haifeng Yang, Yalin Xun and Xujun Zhao
Mathematics 2025, 13(22), 3578; https://doi.org/10.3390/math13223578 - 7 Nov 2025
Viewed by 721
Abstract
Deep neural network-based deep clustering has achieved remarkable success by unifying representation learning and clustering. However, conventional representation modules are typically not tailored for clustering, resulting in conflicting objectives that hinder the model’s ability to capture semantic structures with high intra-cluster cohesion and low inter-cluster separation. To overcome this limitation, we propose a novel framework called Triple-stream Contrastive Deep Embedding Clustering via Semantic Structure (TCSS). TCSS is composed of representation and clustering modules, with its innovation rooted in several key designs that ensure their synergistic interaction for modeling semantic structures. First, TCSS introduces a triple-stream input framework that processes the raw instance along with its limited and aggressive augmented views. This design enables a new triple-stream self-training clustering loss, which uncovers implicit cluster structures by contrasting the three input streams. Second, within this loss, a dynamic clustering structure factor is developed to represent the evolving semantic structure in the representation space, thereby constraining the clustering-prediction distribution. Third, TCSS integrates semantic structure-aware techniques, including a clustering-oriented negative sampling strategy and a triple-stream alignment scheme based on k-nearest neighbors and centroids, to refine semantic structures both locally and globally. Extensive experiments on five benchmark datasets demonstrate that TCSS outperforms state-of-the-art methods.