Search Results (12)

Search Parameters:
Keywords = multi-dimension distortion features

20 pages, 1481 KiB  
Article
Analysis and Research on Spectrogram-Based Emotional Speech Signal Augmentation Algorithm
by Huawei Tao, Sixian Li, Xuemei Wang, Binkun Liu and Shuailong Zheng
Entropy 2025, 27(6), 640; https://doi.org/10.3390/e27060640 - 15 Jun 2025
Viewed by 349
Abstract
Data augmentation techniques are widely applied in speech emotion recognition to increase the diversity of data and enhance the performance of models. However, existing research has not deeply explored the impact of these data augmentation techniques on emotional data. Inappropriate augmentation algorithms may distort emotional labels, thereby reducing the performance of models. To address this issue, in this paper we systematically evaluate the influence of common data augmentation algorithms on emotion recognition from three dimensions: (1) we design subjective auditory experiments to intuitively demonstrate the impact of augmentation algorithms on the emotional expression of speech; (2) we jointly extract multi-dimensional features from spectrograms based on the Librosa library and analyze the impact of data augmentation algorithms on the spectral features of speech signals through heatmap visualization; and (3) we objectively evaluate the recognition performance of the model by means of indicators such as cross-entropy loss and introduce statistical significance analysis to verify the effectiveness of the augmentation algorithms. The experimental results show that “time stretching” may distort speech features, affect the attribution of emotional labels, and significantly reduce the model’s accuracy. In contrast, “reverberation” (RIR) and “resampling” within a limited range have the least impact on emotional data, enhancing the diversity of samples. Moreover, their combination can increase accuracy by up to 7.1%, providing a basis for optimizing data augmentation strategies.
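As a rough illustration of the augmentation operations this abstract compares, here is a minimal Python sketch built on Librosa and SciPy; the synthetic test tone, RIR, and parameter values are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import librosa
import scipy.signal

sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)  # 1 s test tone

def augment_resample(y, sr, factor=0.95):
    # "Resampling within a limited range": render at a slightly different
    # sample rate, then treat the result as audio at the original rate.
    return librosa.resample(y, orig_sr=sr, target_sr=int(sr * factor))

def augment_reverb(y, sr, decay=8.0):
    # "Reverberation" (RIR): convolve with a synthetic, exponentially
    # decaying impulse response and renormalize to avoid clipping.
    t = np.linspace(0.0, 1.0, sr // 8)
    rir = np.random.randn(t.size) * np.exp(-decay * t)
    wet = scipy.signal.fftconvolve(y, rir)[: y.size]
    return wet / (np.abs(wet).max() + 1e-8)

def augment_stretch(y, rate=1.2):
    # "Time stretching": the study finds this can distort emotional cues.
    return librosa.effects.time_stretch(y, rate=rate)

# Spectrogram features of the kind the paper analyzes via heatmaps:
mel = librosa.feature.melspectrogram(y=augment_reverb(y, sr), sr=sr)
```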

18 pages, 4494 KiB  
Article
MDFN: Enhancing Power Grid Image Quality Assessment via Multi-Dimension Distortion Feature
by Zhenyu Chen, Jianguang Du, Jiwei Li and Hongwei Lv
Sensors 2025, 25(11), 3414; https://doi.org/10.3390/s25113414 - 29 May 2025
Viewed by 464
Abstract
Low-quality power grid image data can greatly degrade the performance of deep learning in the power industry. Therefore, adopting accurate image quality assessment techniques is essential for screening high-quality power grid images. Although current blind image quality assessment (BIQA) methods have made some progress, they usually use only one type of feature and ignore other factors that affect image quality, such as noise and brightness, which are highly relevant to low-quality power grid images with noise, underexposure, and overexposure. Therefore, we propose a multi-dimension distortion feature network (MDFN) based on CNN and Transformer, which considers high-frequency (edges and details) and low-frequency (semantic and structural) features of images, along with noise and brightness features, to achieve more accurate quality assessment. Specifically, the network employs a dual-branch feature extractor, where the CNN branch captures local distortion features and the Transformer branch integrates both local and global features. We argue that separating low-frequency and high-frequency components yields richer distortion features. Thus, we propose a frequency selection module (FSM), which extracts high-frequency and low-frequency features and updates them to achieve global spatial information fusion. Additionally, previous methods use only the CLS token for predicting the quality score of an image. Considering the severe noise and exposure issues in power grid images, we design an effective way to extract noise and brightness features and combine them with the CLS token for the prediction. The experimental results indicate that our method surpasses existing approaches across three public datasets and a power grid image dataset, demonstrating its superiority.
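A hypothetical sketch of the frequency-split idea behind a module like the FSM, under the assumption that the low-frequency part of a feature map can be approximated by blurring and the high-frequency part by the residual (PyTorch; module name and layout are ours, not the paper's):

```python
import torch
import torch.nn as nn

class FrequencySplit(nn.Module):
    """Split a feature map into low- and high-frequency components."""
    def __init__(self, kernel_size=5):
        super().__init__()
        # Average pooling acts as a crude low-pass filter over feature maps.
        self.blur = nn.AvgPool2d(kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):
        low = self.blur(x)   # semantic / structural content
        high = x - low       # edges and fine detail
        return low, high

low, high = FrequencySplit()(torch.randn(1, 64, 56, 56))
```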

26 pages, 9328 KiB  
Article
Global Optical and SAR Image Registration Method Based on Local Distortion Division
by Bangjie Li, Dongdong Guan, Yuzhen Xie, Xiaolong Zheng, Zhengsheng Chen, Lefei Pan, Weiheng Zhao and Deliang Xiang
Remote Sens. 2025, 17(9), 1642; https://doi.org/10.3390/rs17091642 - 6 May 2025
Viewed by 560
Abstract
Variations in terrain elevation cause images acquired under different imaging modalities to deviate from a linear mapping relationship. This effect is particularly pronounced between optical and SAR images, where the range-based imaging mechanism of SAR sensors leads to significant local geometric distortions, such as perspective shrinkage and occlusion. As a result, it becomes difficult to represent the spatial correspondence between optical and SAR images using a single geometric model. To address this challenge, we propose a global optical-SAR image registration method that leverages local distortion characteristics. Specifically, we introduce a Superpixel-based Local Distortion Division (SLDD) method, which defines superpixel region features and segments the image into local distortion and normal regions by computing the Mahalanobis distance between superpixel features. We further design a Multi-Feature Fusion Capsule Network (MFFCN) that integrates shallow salient features with deep structural details, reconstructing the dimensions of digital capsules to generate feature descriptors encompassing texture, phase, structure, and amplitude information. This design effectively mitigates the information loss and feature degradation problems caused by pooling operations in conventional convolutional neural networks (CNNs). Additionally, a hard negative mining loss is incorporated to further enhance feature discriminability. Feature descriptors are extracted separately from regions with different distortion levels, and corresponding transformation models are built for local registration. Finally, the local registration results are fused to generate a globally aligned image. Experimental results on public datasets demonstrate that the proposed method achieves superior performance over state-of-the-art (SOTA) approaches in terms of Root Mean Squared Error (RMSE), Correct Match Number (CMN), Distribution of Matched Points (Scat), Edge Fidelity (EF), and overall visual quality.
(This article belongs to the Special Issue Temporal and Spatial Analysis of Multi-Source Remote Sensing Images)
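A minimal sketch of the division step SLDD performs, assuming superpixels whose descriptors lie far from the global distribution (in Mahalanobis distance) are flagged as local distortion regions; the descriptor extraction and the threshold are placeholders, not the paper's values:

```python
import numpy as np

def split_by_mahalanobis(features, threshold=3.0):
    """features: (n_superpixels, d) array of per-superpixel descriptors."""
    mu = features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(features, rowvar=False))
    diff = features - mu
    d = np.sqrt(np.einsum("nd,de,ne->n", diff, cov_inv, diff))
    return d > threshold  # True: distortion region, False: normal region

mask = split_by_mahalanobis(np.random.randn(200, 8))
```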

21 pages, 27582 KiB  
Article
Multi-Level Spectral Attention Network for Hyperspectral BRDF Reconstruction from Multi-Angle Multi-Spectral Images
by Liyao Song and Haiwei Li
Remote Sens. 2025, 17(5), 863; https://doi.org/10.3390/rs17050863 - 28 Feb 2025
Cited by 1 | Viewed by 903
Abstract
With the rapid development of hyperspectral applications using unmanned aerial vehicles (UAVs), the traditional assumption that ground objects exhibit Lambertian reflectance is no longer sufficient to meet the high-precision requirements of quantitative inversion and airborne hyperspectral data applications. It is therefore necessary to establish a hyperspectral bidirectional reflectance distribution function (BRDF) model suited to the imaged area. However, obtaining multi-angle information from UAV push-broom hyperspectral data is difficult: achieving uniform push-broom imaging while flexibly acquiring multi-angle data is hampered by spatial distortions, particularly at high roll or pitch angles, and by the need for multiple flights, which extends acquisition time and exacerbates uneven illumination, introducing errors into BRDF model construction. To address these issues, we propose leveraging the advantages of multi-spectral cameras, such as their compact size, lightweight design, and high signal-to-noise ratio (SNR), to reconstruct hyperspectral multi-angle data. This approach enhances spectral resolution and the number of bands while mitigating spatial distortions, and it effectively captures the multi-angle characteristics of ground objects. In this study, we collected UAV hyperspectral multi-angle data together with corresponding illumination information and atmospheric parameter data, addressing a limitation of existing BRDF modeling, which neglects changes in outdoor ambient illumination and thereby loses accuracy. Based on this dataset, we propose an improved Walthall model that accounts for illumination variation. The radiance consistency of the BRDF multi-angle data is thus effectively optimized, the error caused by illumination variation in BRDF modeling is reduced, and the accuracy of BRDF modeling is improved. In addition, we adopted a Transformer for spectral reconstruction, increased the number of bands on the basis of spectral dimension enhancement, and conducted BRDF modeling on the spectral reconstruction results. For the multi-level Transformer spectral dimension enhancement algorithm, we added spectral response loss constraints to improve BRDF accuracy. To evaluate the BRDF modeling and its quantitative application potential from the reconstruction results, we conducted comparison and ablation experiments. Overall, this work overcomes the difficulty of obtaining multi-angle information imposed by the limitations of hyperspectral imaging equipment and provides a new solution for obtaining multi-angle features of objects at higher spectral resolution using low-cost imaging equipment.
(This article belongs to the Section Remote Sensing Image Processing)
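For reference, fitting the classical Walthall BRDF model by linear least squares looks roughly as follows; the paper's improved, illumination-aware variant is not reproduced here, and the quadratic form below is only the standard baseline.

```python
import numpy as np

def fit_walthall(theta_v, phi, reflectance):
    """Fit R = a*theta_v^2 + b*theta_v*cos(phi) + c (view zenith theta_v,
    relative azimuth phi, both in radians)."""
    A = np.column_stack([theta_v**2, theta_v * np.cos(phi), np.ones_like(theta_v)])
    coeffs, *_ = np.linalg.lstsq(A, reflectance, rcond=None)
    return coeffs  # (a, b, c)

# Illustrative call on synthetic multi-angle samples:
tv = np.deg2rad(np.array([0.0, 10.0, 20.0, 30.0, 40.0]))
ph = np.deg2rad(np.array([0.0, 45.0, 90.0, 135.0, 180.0]))
a, b, c = fit_walthall(tv, ph, 0.2 + 0.05 * tv**2)
```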

16 pages, 5347 KiB  
Article
Achieving High-Accuracy Target Recognition Using Few ISAR Images via Multi-Prototype Network with Attention Mechanism
by Linbo Zhang, Xiuting Zou, Shaofu Xu, Bowen Ma, Wenbin Lu, Zhenbin Lv and Weiwen Zou
Electronics 2024, 13(23), 4703; https://doi.org/10.3390/electronics13234703 - 28 Nov 2024
Viewed by 899
Abstract
Inverse synthetic aperture radar (ISAR) is a significant means of detecting non-cooperative targets in space, which means that the imaging geometry and associated parameters between the ISAR platform and the detected targets are unknown. As a result, the large numbers of ISAR images needed for high-accuracy target recognition are difficult to obtain. Recently, prototypical networks (PNs) have gained considerable attention as an effective method for few-shot learning. However, due to the specificity of the ISAR imaging mechanism, ISAR images often exhibit unknown range and azimuth distortions, resulting in poor imaging quality. This makes it challenging for a PN to represent a class through a single prototype. To address this issue, we use a multi-prototype network (MPN) with an attention mechanism for ISAR image target recognition. The use of multiple prototypes eases the uncertainty associated with the fixed structure of a single prototype, enabling the capture of more comprehensive target information. Furthermore, to maximize the feature extraction capability of the MPN for ISAR images, the method introduces the classical convolutional block attention module (CBAM), which generates attention feature maps along the channel and spatial dimensions to produce multiple robust prototypes. Experimental results demonstrate that this method outperforms state-of-the-art few-shot methods. In a four-class classification task, it achieved a target recognition accuracy of 95.08%, an improvement of 9.94–17.49% over several other few-shot approaches.
(This article belongs to the Section Microwave and Wireless Communications)
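A hedged sketch of the multi-prototype idea: instead of one mean prototype per class, cluster each class's support embeddings into k prototypes and classify queries by the nearest one. The embedding network and CBAM attention are assumed to exist upstream; k-means is our stand-in for the paper's prototype construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(support_emb, labels, k=3):
    protos, proto_labels = [], []
    for c in np.unique(labels):
        km = KMeans(n_clusters=k, n_init=10).fit(support_emb[labels == c])
        protos.append(km.cluster_centers_)
        proto_labels += [c] * k
    return np.vstack(protos), np.array(proto_labels)

def classify(query_emb, protos, proto_labels):
    # Nearest-prototype decision in embedding space.
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return proto_labels[d.argmin(axis=1)]

emb, lab = np.random.randn(40, 16), np.repeat(np.arange(4), 10)
P, PL = build_prototypes(emb, lab)
pred = classify(np.random.randn(5, 16), P, PL)
```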

25 pages, 3089 KiB  
Article
A Hybrid Trio-Deep Feature Fusion Model for Improved Skin Cancer Classification: Merging Dermoscopic and DCT Images
by Omneya Attallah
Technologies 2024, 12(10), 190; https://doi.org/10.3390/technologies12100190 - 3 Oct 2024
Cited by 4 | Viewed by 3239
Abstract
The precise and prompt identification of skin cancer is essential for efficient treatment. Variations in colour within skin lesions are critical signs of malignancy; however, discrepancies in imaging conditions may inhibit the efficacy of deep learning models. Numerous previous investigations have neglected this problem, frequently depending on deep features from a singular layer of an individual deep learning model. This study presents a new hybrid deep learning model that integrates discrete cosine transform (DCT) with multi-convolutional neural network (CNN) structures to improve the classification of skin cancer. Initially, DCT is applied to dermoscopic images to enhance and correct colour distortions in these images. After that, several CNNs are trained separately with the dermoscopic images and the DCT images. Next, deep features are obtained from two deep layers of each CNN. The proposed hybrid model consists of triple deep feature fusion. The initial phase involves employing the discrete wavelet transform (DWT) to merge multidimensional attributes obtained from the first layer of each CNN, which lowers their dimension and provides a time–frequency representation. In addition, for each CNN, the deep features of the second deep layer are concatenated. In the second deep feature fusion stage, for each CNN, the merged first-layer features are combined with the second-layer features to create an effective feature vector. Finally, in the third deep feature fusion stage, these bi-layer features of the various CNNs are integrated. By training multiple CNNs on both the original dermoscopic photos and the DCT-enhanced images, retrieving attributes from two separate layers, and incorporating attributes from the multiple CNNs, a comprehensive representation of attributes is generated. Experimental results showed 96.40% accuracy after trio-deep feature fusion. This shows that merging DCT-enhanced images and dermoscopic photos can improve diagnostic accuracy. The hybrid trio-deep feature fusion model outperforms individual CNN models and most recent studies, thus proving its superiority.
(This article belongs to the Special Issue Medical Imaging & Image Processing III)
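The two transforms this pipeline leans on can be sketched compactly; how the paper enhances colour with the DCT and which layers feed the DWT are assumptions here, while the calls themselves are standard SciPy/PyWavelets.

```python
import numpy as np
from scipy.fft import dct
import pywt

def dct2(channel):
    # 2-D type-II DCT of one image channel (basis of the DCT-enhanced inputs).
    return dct(dct(channel, axis=0, norm="ortho"), axis=1, norm="ortho")

def fuse_features_dwt(f1, f2):
    # Merge two first-layer feature vectors: a single-level Haar DWT keeps
    # the approximation coefficients, halving dimension while giving a
    # time-frequency summary of both.
    approx, _detail = pywt.dwt(np.concatenate([f1, f2]), "haar")
    return approx

fused = fuse_features_dwt(np.random.rand(256), np.random.rand(256))
```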

21 pages, 57724 KiB  
Article
MDSCNN: Remote Sensing Image Spatial–Spectral Fusion Method via Multi-Scale Dual-Stream Convolutional Neural Network
by Wenqing Wang, Fei Jia, Yifei Yang, Kunpeng Mu and Han Liu
Remote Sens. 2024, 16(19), 3583; https://doi.org/10.3390/rs16193583 - 26 Sep 2024
Cited by 2 | Viewed by 1879
Abstract
Pansharpening refers to enhancing the spatial resolution of multispectral images through panchromatic images while preserving their spectral features. However, existing traditional and deep learning methods always introduce certain distortions in the spatial or spectral dimensions. This paper proposes a remote sensing spatial–spectral fusion method based on a multi-scale dual-stream convolutional neural network, which includes feature extraction, feature fusion, and image reconstruction modules at each scale. For feature fusion, we propose a multi-cascade module to better fuse image features. We also design a new loss function aimed at enforcing a high degree of consistency between fused images and reference images in terms of spatial details and spectral information. To validate its effectiveness, we conduct thorough experimental analyses on two widely used remote sensing datasets: GeoEye-1 and Ikonos. Compared with nine leading pansharpening techniques, the proposed method demonstrates superior performance on multiple key evaluation metrics.
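A hypothetical sketch of a combined spatial/spectral consistency loss of the kind this abstract describes (PyTorch); the weighting and the exact terms used in the paper are not reproduced here.

```python
import torch
import torch.nn.functional as F

def spatial_spectral_loss(fused, reference, alpha=0.5):
    # Spatial fidelity: pixel-wise L1 between fused and reference images.
    l1 = F.l1_loss(fused, reference)
    # Spectral fidelity: mean spectral angle across bands (channel dim).
    cos = F.cosine_similarity(fused, reference, dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    sam = torch.acos(cos).mean()
    return alpha * l1 + (1 - alpha) * sam

loss = spatial_spectral_loss(torch.rand(2, 4, 64, 64), torch.rand(2, 4, 64, 64))
```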

21 pages, 61283 KiB  
Article
Two-Branch Underwater Image Enhancement and Original Resolution Information Optimization Strategy in Ocean Observation
by Dehuan Zhang, Wei Cao, Jingchun Zhou, Yan-Tsung Peng, Weishi Zhang and Zifan Lin
J. Mar. Sci. Eng. 2023, 11(7), 1285; https://doi.org/10.3390/jmse11071285 - 25 Jun 2023
Cited by 3 | Viewed by 1778
Abstract
In complex marine environments, underwater images often suffer from color distortion, blur, and poor visibility. Existing underwater image enhancement methods predominantly rely on the U-net structure, which assigns the same weight to different resolution information. However, this approach lacks the ability to extract sufficient detailed information, resulting in problems such as blurred details and color distortion. We propose a two-branch underwater image enhancement method with an optimized original resolution information strategy to address this limitation. Our method comprises a feature enhancement subnetwork (FEnet) and an original resolution subnetwork (ORSnet). FEnet extracts multi-resolution information and utilizes an adaptive feature selection module to enhance global features in different dimensions. The enhanced features are then fed into ORSnet as complementary features, which extract local enhancement features at the original image scale to achieve semantically consistent and visually superior enhancement effects. Experimental results on the UIEB dataset demonstrate that our method achieves the best performance compared to the state-of-the-art methods. Furthermore, through comprehensive application testing, we have validated the superiority of our proposed method in feature extraction and enhancement compared to other end-to-end underwater image enhancement methods.
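Schematically, the two-branch layout reads as below; the FEnet/ORSnet internals, the adaptive feature selection module, and all layer sizes are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoBranchEnhancer(nn.Module):
    def __init__(self, ch=3, width=32):
        super().__init__()
        # Stand-in for FEnet: downsample, enhance, upsample back.
        self.fenet = nn.Sequential(
            nn.Conv2d(ch, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(width, width, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Stand-in for ORSnet: refine at the original resolution, guided by
        # the enhancement branch's features as complementary input.
        self.orsnet = nn.Sequential(
            nn.Conv2d(ch + width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, ch, 3, padding=1),
        )

    def forward(self, x):
        global_feat = self.fenet(x)  # multi-resolution cues
        return self.orsnet(torch.cat([x, global_feat], dim=1))

y = TwoBranchEnhancer()(torch.randn(1, 3, 64, 64))
```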

19 pages, 5262 KiB  
Article
Feature Selection Fuzzy Neural Network Super-Twisting Harmonic Control
by Qi Pan, Yanli Zhou and Juntao Fei
Mathematics 2023, 11(6), 1495; https://doi.org/10.3390/math11061495 - 18 Mar 2023
Cited by 6 | Viewed by 1380
Abstract
This paper presents a multi-feedback feature selection fuzzy neural network (MFFSFNN) based on super-twisting sliding mode control (STSMC), aimed at compensating for current distortion and solving the harmonic current problem in an active power filter (APF) system. A feature selection layer is added to an output feedback neural network to endow the network with signal-filtering characteristics. With its designed feedback loops and hidden layer, MFFSFNN has the advantages of signal judging, filtering, and feedback. Signal filtering selects valuable signals to deal with lumped uncertainties, and signal feedback expands the learning dimension to improve approximation accuracy. The STSMC, acting as a compensator with adaptive gains, helps to stabilize the compensation current. An experimental study is implemented to prove the effectiveness and superiority of the proposed controller.
(This article belongs to the Special Issue Application of Artificial Intelligence and Control Theory)
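For orientation, the textbook super-twisting algorithm at the heart of STSMC can be written in a few lines; the paper's adaptive-gain extension is not shown, and the gains and step size below are illustrative.

```python
import numpy as np

def super_twisting_step(s, v, k1, k2, dt):
    # Continuous term: -k1 * |s|^(1/2) * sign(s), plus the integral state v.
    u = -k1 * np.sqrt(abs(s)) * np.sign(s) + v
    # Integral (twisting) term, advanced with a simple Euler step.
    v = v - k2 * np.sign(s) * dt
    return u, v

# One control step on sliding variable s, with illustrative gains:
u, v = super_twisting_step(s=0.2, v=0.0, k1=1.5, k2=1.1, dt=1e-4)
```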

23 pages, 14791 KiB  
Article
MPFINet: A Multilevel Parallel Feature Injection Network for Panchromatic and Multispectral Image Fusion
by Yuting Feng, Xin Jin, Qian Jiang, Quanli Wang, Lin Liu and Shaowen Yao
Remote Sens. 2022, 14(23), 6118; https://doi.org/10.3390/rs14236118 - 2 Dec 2022
Cited by 2 | Viewed by 2309
Abstract
The fusion of a high-spatial-resolution panchromatic (PAN) image and a corresponding low-resolution multispectral (MS) image can yield a high-resolution multispectral (HRMS) image, a process also known as pansharpening. Most previous methods based on convolutional neural networks (CNNs) have achieved remarkable results. However, information at different scales has not been fully mined and utilized, and the results still exhibit spectral and spatial distortion. In this work, we propose a multilevel parallel feature injection network that contains three scale levels and two parallel branches. In the feature extraction branch, a multi-scale perception dynamic convolution dense block is proposed to adaptively extract the spatial and spectral information. The resulting multilevel features are then injected into the image reconstruction branch, and an attention fusion module based on the spectral dimension is designed to fuse shallow contextual features and deep semantic features. In the image reconstruction branch, cascaded transformer blocks are employed to capture the similarities among the spectral bands of the MS image. Extensive experiments on the QuickBird and WorldView-3 datasets demonstrate that MPFINet achieves significant improvement over several state-of-the-art methods on both spatial and spectral quality assessments.
(This article belongs to the Special Issue Pansharpening and Beyond in the Deep Learning Era)
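A hypothetical sketch of an attention fusion over the spectral (channel) dimension in the spirit of the module described above: weight shallow contextual and deep semantic feature maps per band before mixing. Names and layout are ours, not the paper's.

```python
import torch
import torch.nn as nn

class SpectralAttentionFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # per-channel global statistics
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, shallow, deep):
        # Per-band gate decides how much of each branch to keep.
        w = self.gate(torch.cat([shallow, deep], dim=1))  # (B, C, 1, 1)
        return w * shallow + (1 - w) * deep

out = SpectralAttentionFusion(32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```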

18 pages, 2137 KiB  
Article
Start from Scratch: A Crowdsourcing-Based Data Fusion Approach to Support Location-Aware Applications
by Yonghang Jiang, Bingyi Liu, Ze Wang and Xiaoquan Yi
Sensors 2019, 19(20), 4518; https://doi.org/10.3390/s19204518 - 17 Oct 2019
Cited by 4 | Viewed by 3026
Abstract
As one of the most important breakthroughs for modern transportation, indoor location-based technology has gradually penetrated our daily lives and underpins the Internet of Things (IoT). To improve positioning accuracy and efficiency, crowdsourcing has been widely applied to indoor localization in recent years. However, crowdsourced data can hardly be fused easily into usable applications, because the data are collected by different users, in different locations, at different times, and with different noises and distortions. Although different data-fusing methods have been implemented in different crowdsourcing services, we find that they may not fully leverage data collected from multiple dimensions, which could potentially lead to better fusion results. To address this problem, we propose a more general solution that fuses multi-dimensional crowdsourced data and aligns it with consistent time and location stamps, using the features of the sensory data only, and thus builds high-quality crowdsourcing services from the raw data samplings collected from the environment. Finally, we conduct extensive evaluations and experiments using different commercial devices to validate the effectiveness of the proposed method.
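One alignment step this abstract implies, sketched under the assumption that two users' sensory streams share a correlated feature: estimate their clock offset by cross-correlation and shift one stream onto the other's timeline.

```python
import numpy as np

def estimate_offset(sig_a, sig_b, fs):
    """Lag (in seconds) at which sig_b best aligns with sig_a."""
    a = (sig_a - sig_a.mean()) / (sig_a.std() + 1e-8)
    b = (sig_b - sig_b.mean()) / (sig_b.std() + 1e-8)
    corr = np.correlate(a, b, mode="full")
    lag = corr.argmax() - (len(b) - 1)   # offset in samples
    return lag / fs

# Two recordings of the same phenomenon, one shifted by 50 samples:
x = np.random.randn(1000)
print(estimate_offset(x, np.roll(x, -50), fs=100.0))  # recovers the 50-sample shift
```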

22 pages, 8287 KiB  
Article
Comparison of GNSS-, TLS- and Different Altitude UAV-Generated Datasets on the Basis of Spatial Differences
by Huseyin Yurtseven
ISPRS Int. J. Geo-Inf. 2019, 8(4), 175; https://doi.org/10.3390/ijgi8040175 - 3 Apr 2019
Cited by 25 | Viewed by 8303
Abstract
In this study, different in-situ and close-range sensing surveying techniques were compared based on the spatial differences of the resultant datasets. In this context, the DJI Phantom 3 Advanced and Trimble UX5 Unmanned Aerial Vehicle (UAV) platforms, a Zoller + Fröhlich 5010C phase-comparison continuous-wave Terrestrial Laser Scanning (TLS) system, and a Network Real Time Kinematic (NRTK) Global Navigation Satellite System (GNSS) receiver were used to obtain horizontal and vertical information about the study area. All data were collected in a gently inclined (mean slope angle 4%), flat, vegetation-free, bare-earth valley bottom near Istanbul, Turkey (approximately 0.7 ha in size). UAV data acquisitions were performed at 25-, 50-, and 120-m (DJI Phantom 3 Advanced) and 350-m (Trimble UX5) flight altitudes (above ground level, AGL). The imagery was processed with state-of-the-art SfM (Structure-from-Motion) photogrammetry software. Ortho-mosaics and digital elevation models were generated from the UAV-based photogrammetric and TLS-based data. GNSS- and TLS-based data were used as references to calculate the accuracy of the UAV-based geodata. The UAV results were assessed in 1D (points), 2D (areas), and 3D (volumes) based on the horizontal (X- and Y-direction) and vertical (Z-direction) differences. Various error measures, including the RMSE (Root Mean Square Error), ME (Mean Error), and MAE (Mean Absolute Error), and simple descriptive statistics were used to calculate the residuals. The comparison of the results is simplified by applying a normalization procedure commonly used in multi-criteria decision-making analysis and by visualizing the offsets. According to the results, low-altitude flights (25 and 50 m AGL) feature higher accuracy in the horizontal dimension (e.g., mean errors of 0.085 and 0.064 m, respectively) but lower accuracy in the Z-dimension (e.g., false positive volumes of 2402 and 1160 m3, respectively) compared to the higher-altitude flights (120 and 350 m AGL). The accuracy difference with regard to the observed terrain heights is particularly striking: depending on the compared error measure, up to a factor of 40 (i.e., false positive values for 120 vs. 50 m AGL). This error is attributed to the "doming effect", a broad-scale systematic deformation of the reconstructed terrain surface that is well known in SfM photogrammetry and results from inaccuracies in modeling the radial distortion of the camera lens. Within the scope of the study, the doming effect was modeled as a functional surface using the spatial differences, and the results indicated that the doming effect decreases as flight altitude increases.
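The doming-effect modeling described at the end can be sketched as fitting a low-order functional surface to elevation residuals; the quadratic form below is our assumption of what such a surface model minimally looks like, not the study's exact formulation.

```python
import numpy as np

def fit_doming_surface(x, y, dz):
    """Fit dz = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to elevation residuals."""
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, dz, rcond=None)
    return coeffs

# Illustrative call on synthetic residuals with a dome-shaped bias:
x, y = np.random.rand(2, 500)
coeffs = fit_doming_surface(x, y, 0.05 * (x**2 + y**2) + 0.01 * np.random.randn(500))
```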
