Search Results (32)

Search Parameters:
Keywords = pretext tasks

18 pages, 3620 KiB  
Article
Question and Symbol: Challenges for a Contemporary Bell Tower
by Pablo Ramos Alderete, Ana Isabel Santolaria Castellanos and Felipe Samarán Saló
Religions 2025, 16(4), 405; https://doi.org/10.3390/rel16040405 - 22 Mar 2025
Viewed by 489
Abstract
Historically, bell towers have been religious and architectural symbols in the landscape that summoned the faithful to celebrations and served as territorial landmarks. This function was later assumed by the towers of some universities. The real need of the University Francisco de Vitoria to build a bell tower for its new chapel, one significant both for its campus and for the city, is the pretext for investigating the need for this element in the current context through an academic exercise with architecture students. Traditionally, the religious authority proposed a concrete celebration space. In this case, architecture students were commissioned, through a Design Workshop, to propose a contemporary response for the new bell tower of their university campus. The workshop results raise interesting questions about what the architecture of a bell tower should be like in the 21st century: its relationship with public space, the construction of a landmark on an urban scale, the need to respond to both the city and the immediate environment at its different scales, the obsolescence of elements such as clocks and bells, and, above all, the relevance of symbols and the way architecture raises questions in the contemporary landscape.
(This article belongs to the Special Issue Religion, Public Space and Society)

21 pages, 12011 KiB  
Article
Fine-Grained Air Pollution Inference at Large-Scale Region Level via 3D Spatiotemporal Attention Super-Resolution Model
by Changqun Li, Shan Tang, Jing Liu, Kai Pan, Zhenyi Xu, Yunbo Zhao and Shuchen Yang
Atmosphere 2025, 16(2), 166; https://doi.org/10.3390/atmos16020166 - 31 Jan 2025
Viewed by 818
Abstract
Air pollution presents a serious hazard to human health and the environment owing to the global rise in industrialization and urbanization. While fine-grained monitoring is crucial for understanding the formation and control of air pollution and its effects on human health, existing macro-regional or ground-level methods infer air pollution only at a single spatial scale and fail to capture the spatiotemporal correlations between cross-grained air pollution distributions. In this paper, we propose a 3D spatiotemporal attention super-resolution model (AirSTFM) for fine-grained air pollution inference at the large-scale region level. First, we design a 3D patch-wise self-attention convolutional module to extract the spatiotemporal features of air pollution, which aggregates both spatial and temporal information of coarse-grained air pollution and employs a sliding window to add spatially local features. Then, we propose a bidirectional optical-flow feed-forward layer to extract short-term air pollution diffusion characteristics, which can learn the temporally correlated contaminant diffusion between adjacent time intervals. Finally, we construct a spatiotemporal super-resolution upsampling pretext task to model the higher-level dispersion feature mapping between the coarse-grained and fine-grained air pollution distributions. The proposed method is tested on a PM2.5 pollution dataset for the Yangtze River Delta region. Our model outperforms the second-best model in RMSE, MAE, and MAPE by 2.6%, 3.05%, and 6.36% on the 100% division, and by 3.86%, 3.76%, and 12.18% on the 40% division, which demonstrates its applicability to different data sizes. Furthermore, comprehensive experimental results show that the proposed AirSTFM outperforms state-of-the-art models.
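The cross-grained supervision behind a super-resolution pretext task can be illustrated in a few lines: coarse-grained observations are derived from the fine-grained field, and a model is trained to invert that mapping. The sketch below is not the authors' AirSTFM architecture; the 32×32 grid, average-pooling coarsening, and nearest-neighbour upsampling baseline are illustrative assumptions.

```python
import numpy as np

def coarsen(fine, factor):
    """Average-pool a fine-grained pollution grid into a coarse one."""
    h, w = fine.shape
    return fine.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
fine = rng.uniform(0.0, 150.0, size=(32, 32))   # hypothetical PM2.5 field
coarse = coarsen(fine, 4)                       # 8x8 coarse "observation"

# Trivial baseline the learned model must beat: nearest-neighbour upsampling.
upsampled = np.repeat(np.repeat(coarse, 4, axis=0), 4, axis=1)
mse = float(((upsampled - fine) ** 2).mean())   # reconstruction error to minimize
```

A learned super-resolution network would replace the `np.repeat` step; the pretext loss compares its output against the withheld fine-grained field.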
(This article belongs to the Special Issue Study of Air Pollution Based on Remote Sensing (2nd Edition))

30 pages, 82967 KiB  
Article
Pansharpening Techniques: Optimizing the Loss Function for Convolutional Neural Networks
by Rocco Restaino
Remote Sens. 2025, 17(1), 16; https://doi.org/10.3390/rs17010016 - 25 Dec 2024
Viewed by 1034
Abstract
Pansharpening is a traditional image fusion problem where the reference image (or ground truth) is not accessible. Machine-learning-based algorithms designed for this task require an extensive optimization phase of network parameters, which must be performed using unsupervised learning techniques. The learning phase can either rely on a companion problem where ground truth is available, such as by reproducing the task at a lower scale or using a pretext task, or it can use a reference-free cost function. This study focuses on the latter approach, where performance depends not only on the accuracy of the quality measure but also on the mathematical properties of these measures, which may introduce challenges related to computational complexity and optimization. The evaluation of the most recognized no-reference image quality measures led to the proposal of a novel criterion, the Regression-based QNR (RQNR), which has not been previously used. To mitigate computational challenges, an approximate version of the relevant indices was employed, simplifying the optimization of the cost functions. The effectiveness of the proposed cost functions was validated through the reduced-resolution assessment protocol applied to a public dataset (PairMax) containing images of diverse regions of the Earth’s surface.

32 pages, 994 KiB  
Article
ORASIS-MAE Harnesses the Potential of Self-Learning from Partially Annotated Clinical Eye Movement Records
by Alae Eddine El Hmimdi, Themis Palpanas and Zoï Kapoula
BioMedInformatics 2024, 4(3), 1902-1933; https://doi.org/10.3390/biomedinformatics4030105 - 26 Aug 2024
Cited by 2 | Viewed by 1296
Abstract
Self-supervised learning (SSL) has gained significant attention in the past decade for its capacity to utilize non-annotated datasets to learn meaningful data representations. In the medical domain, the challenge of constructing large annotated datasets presents a significant limitation, rendering SSL an ideal approach to address this constraint. In this study, we introduce a novel pretext task tailored to stimulus-driven eye movement data, along with a denoising task to improve the robustness against simulated eye tracking failures. Our proposed task aims to capture both the characteristics of the pilot (brain) and the motor (eye) by learning to reconstruct the eye movement position signal using up to 12.5% of the unmasked eye movement signal patches, along with the entire REMOBI target signal. Thus, the encoder learns a high-dimensional representation using a multivariate time series of length 8192 points, corresponding to approximately 40 s. We evaluate the learned representation on screening eight distinct groups of pathologies, including dyslexia, reading disorder, and attention deficit disorder, across four datasets of varying complexity and size. Furthermore, we explore various head architecture designs along with different transfer learning methods, demonstrating promising results with improvements of up to approximately 15%, leading to an overall macro F1 score of 61% and 61.5% on the Saccade and the Vergence datasets, respectively. Notably, our method achieves macro F1 scores of 64.7%, 66.1%, and 61.1% for screening dyslexia, reading disorder, and attention deficit disorder, respectively, on clinical data. These findings underscore the potential of self-learning algorithms in pathology screening, particularly in domains involving complex data such as stimulus-driven eye movement analysis.
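The masking scheme described above (reconstructing the signal from up to 12.5% of unmasked patches) can be sketched as follows. The patch length of 64 samples and the uniform-random masking policy are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def mask_patches(signal, patch_len, keep_frac, rng):
    """Split a 1-D signal into patches and keep only a fraction unmasked."""
    n_patches = len(signal) // patch_len
    patches = signal[: n_patches * patch_len].reshape(n_patches, patch_len)
    n_keep = max(1, int(n_patches * keep_frac))
    keep_idx = rng.choice(n_patches, size=n_keep, replace=False)
    visible = np.zeros_like(patches)
    visible[keep_idx] = patches[keep_idx]
    mask = np.ones(n_patches, dtype=bool)
    mask[keep_idx] = False           # True = masked, i.e. to be reconstructed
    return visible, mask

rng = np.random.default_rng(0)
sig = rng.standard_normal(8192)      # one channel of a hypothetical record
visible, mask = mask_patches(sig, patch_len=64, keep_frac=0.125, rng=rng)
```

The encoder would see `visible` (plus, per the paper, the REMOBI target signal) and be trained to regress the patches flagged by `mask`.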

14 pages, 1516 KiB  
Article
Early Recurrence Prediction of Hepatocellular Carcinoma Using Deep Learning Frameworks with Multi-Task Pre-Training
by Jian Song, Haohua Dong, Youwen Chen, Xianru Zhang, Gan Zhan, Rahul Kumar Jain and Yen-Wei Chen
Information 2024, 15(8), 493; https://doi.org/10.3390/info15080493 - 17 Aug 2024
Cited by 2 | Viewed by 1596
Abstract
Post-operative early recurrence (ER) of hepatocellular carcinoma (HCC) is a major cause of mortality. Predicting ER before treatment can guide treatment and follow-up protocols. Deep learning frameworks, known for their superior performance, are widely used in medical imaging. However, they face challenges due to limited annotated data. We propose a multi-task pre-training method using self-supervised learning with medical images for predicting the ER of HCC. This method involves two pretext tasks: phase shuffle, focusing on intra-image feature representation, and case discrimination, focusing on inter-image feature representation. The effectiveness and generalization of the proposed method are validated through two different experiments. In addition to predicting early recurrence, we also apply the proposed method to the classification of focal liver lesions. Both experiments show that the multi-task pre-training model outperforms existing pre-training (transfer learning) methods with natural images, single-task self-supervised pre-training, and DINOv2.
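The abstract describes phase shuffle only at a high level. One plausible reading, sketched below under the assumption of three contrast phases, is to permute the phase axis of a multi-phase volume and train the network to predict which permutation was applied; the phase count and labeling scheme here are assumptions, not the paper's specification.

```python
import itertools
import numpy as np

# Enumerate all orderings of the phases; the pretext label is the index of
# the permutation applied to the input volume's phase axis.
PHASES = 3
PERMS = list(itertools.permutations(range(PHASES)))

def shuffle_phases(volume, rng):
    """volume: (phases, H, W). Returns the shuffled volume and its label."""
    label = int(rng.integers(len(PERMS)))
    order = PERMS[label]
    return volume[list(order)], label

rng = np.random.default_rng(1)
# Toy volume: phase p is a 4x4 slice filled with the value p, so the applied
# order can be read back off the shuffled result.
vol = np.stack([np.full((4, 4), p, dtype=float) for p in range(PHASES)])
shuffled, label = shuffle_phases(vol, rng)
recovered = tuple(int(shuffled[i, 0, 0]) for i in range(PHASES))
```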
(This article belongs to the Special Issue Intelligent Image Processing by Deep Learning)

24 pages, 4272 KiB  
Article
JPSSL: SAR Terrain Classification Based on Jigsaw Puzzles and FC-CRF
by Zhongle Ren, Yiming Lu, Biao Hou, Weibin Li and Feng Sha
Remote Sens. 2024, 16(9), 1635; https://doi.org/10.3390/rs16091635 - 3 May 2024
Viewed by 1798
Abstract
Effective features play an important role in synthetic aperture radar (SAR) image interpretation. However, since SAR images contain a variety of terrain types, it is not easy to extract effective features for the different terrains. Deep learning methods require large amounts of labeled data, but the difficulty of annotating SAR images limits the performance of deep learning models. SAR images also suffer from inevitable geometric distortion and coherent speckle noise, which further complicates feature extraction. If effective semantic context features cannot be learned, the extracted features struggle to distinguish different terrain categories. Moreover, some existing terrain classification methods are very limited and can only be applied to certain specified SAR images. To solve these problems, a jigsaw puzzle self-supervised learning (JPSSL) framework is proposed. The framework comprises a jigsaw puzzle pretext task and a terrain classification downstream task. In the pretext task, the information in the SAR image is learned by solving a SAR image jigsaw puzzle, thereby extracting effective features. The terrain classification downstream task is trained using only a small amount of labeled data. Finally, fully connected conditional random field processing is performed to eliminate noise points and obtain a high-quality terrain classification result. Experimental results on three large-scene high-resolution SAR images confirm the effectiveness and generalization of our method. Compared with supervised methods, the features learned in JPSSL are highly discriminative, and JPSSL achieves good classification accuracy using only a small amount of labeled data.
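A jigsaw-puzzle pretext task of this kind typically cuts each image into a tile grid, applies a known permutation, and trains the network to predict the permutation index. A minimal sketch on a toy 6×6 "image" with a 3×3 grid (tile counts and the permutation set are illustrative, not JPSSL's actual configuration):

```python
import numpy as np

def make_jigsaw(image, grid, perm):
    """Cut image into grid x grid tiles and reorder them according to perm."""
    h = image.shape[0] // grid
    w = image.shape[1] // grid
    tiles = [image[r * h:(r + 1) * h, c * w:(c + 1) * w]
             for r in range(grid) for c in range(grid)]
    shuffled = [tiles[i] for i in perm]
    rows = [np.hstack(shuffled[r * grid:(r + 1) * grid]) for r in range(grid)]
    return np.vstack(rows)

img = np.arange(36).reshape(6, 6)
perm = [8, 7, 6, 5, 4, 3, 2, 1, 0]     # reverse the 9 tiles (an involution)
puzzle = make_jigsaw(img, grid=3, perm=perm)
restored = make_jigsaw(puzzle, grid=3, perm=perm)   # applying it twice restores
```

In practice a small, fixed set of maximally distinct permutations is used as the class vocabulary, and the pretext label is the index of the sampled permutation.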

13 pages, 1805 KiB  
Article
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
by Yusuf Brima, Ulf Krumnack, Simone Pika and Gunther Heidemann
Information 2024, 15(2), 114; https://doi.org/10.3390/info15020114 - 15 Feb 2024
Viewed by 3566
Abstract
Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data. By designing pretext tasks that exploit statistical regularities, SSL models can capture useful representations that are transferable to downstream tasks. Barlow Twins (BTs) is an SSL technique inspired by theories of redundancy reduction in human perception. In downstream tasks, BT representations accelerate learning and transfer across applications. This study applies BTs to speech data and evaluates the learned representations on several downstream tasks, showing the applicability of the approach. However, limitations remain in disentangling key explanatory factors: redundancy reduction and invariance alone are insufficient to factorize the learned latents into modular, compact, and informative codes. Our ablation study isolated the gains from the invariance constraints, but these gains were context-dependent. Overall, this work substantiates the potential of Barlow Twins for sample-efficient speech encoding, although challenges remain in achieving fully hierarchical representations. The analysis methodology and insights presented in this paper pave a path for extensions that incorporate further inductive priors and perceptual principles to enhance the BT self-supervision framework.
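The invariance and redundancy-reduction terms at the heart of Barlow Twins can be written down directly: the cross-correlation matrix of the embeddings of two augmented views is pushed toward the identity. A minimal NumPy sketch of the standard BT loss (the trade-off weight `lam` is illustrative):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Cross-correlate two embedding batches; the identity matrix is the target."""
    n = z_a.shape[0]
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)   # standardize per dimension
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.T @ z_b / n                      # (d, d) cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))
identical = barlow_twins_loss(z, z.copy())                     # same "views"
mismatched = barlow_twins_loss(z, rng.standard_normal((256, 8)))
```

Identical views make the diagonal exactly one, so only the small off-diagonal (redundancy) penalty remains; uncorrelated batches are heavily penalized by the invariance term.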
(This article belongs to the Topic Advances in Artificial Neural Networks)

20 pages, 3580 KiB  
Review
Enhancing Object Detection in Smart Video Surveillance: A Survey of Occlusion-Handling Approaches
by Zainab Ouardirhi, Sidi Ahmed Mahmoudi and Mostapha Zbakh
Electronics 2024, 13(3), 541; https://doi.org/10.3390/electronics13030541 - 29 Jan 2024
Cited by 10 | Viewed by 5922
Abstract
Smart video surveillance systems (SVSs) have garnered significant attention for their autonomous monitoring capabilities, encompassing automated detection, tracking, analysis, and decision making within complex environments, with minimal human intervention. In this context, object detection is a fundamental task in SVS. However, many current approaches often overlook occlusion by nearby objects, posing challenges to real-world SVS applications. To address this crucial issue, this paper presents a comprehensive comparative analysis of occlusion-handling techniques tailored for object detection. The review outlines the pretext tasks common to both domains and explores various architectural solutions to combat occlusion. Unlike prior studies that primarily focus on a single dataset, our analysis spans multiple benchmark datasets, providing a thorough assessment of various object detection methods. By extending the evaluation to datasets beyond the KITTI benchmark, this study offers a more holistic understanding of each approach’s strengths and limitations. Additionally, we delve into persistent challenges in existing occlusion-handling approaches and emphasize the need for innovative strategies and future research directions to drive substantial progress in this field.
(This article belongs to the Special Issue Image/Video Processing and Encoding for Contemporary Applications)

27 pages, 2404 KiB  
Article
A Generic Self-Supervised Learning (SSL) Framework for Representation Learning from Spectral–Spatial Features of Unlabeled Remote Sensing Imagery
by Xin Zhang and Liangxiu Han
Remote Sens. 2023, 15(21), 5238; https://doi.org/10.3390/rs15215238 - 3 Nov 2023
Cited by 4 | Viewed by 3279
Abstract
Remote sensing data have been widely used for various Earth Observation (EO) missions such as land use and cover classification, weather forecasting, agricultural management, and environmental monitoring. Most existing remote-sensing-data-based models rely on supervised learning, which requires large and representative human-labeled datasets for model training and is costly and time-consuming. The recent introduction of self-supervised learning (SSL) enables models to learn representations from orders of magnitude more unlabeled data. The success of SSL is heavily dependent on a pre-designed pretext task, which introduces an inductive bias into the model from a large amount of unlabeled data. Since remote sensing imagery has rich spectral information beyond the standard RGB color space, it may not be straightforward to extend to the multi/hyperspectral domain the pretext tasks established in computer vision on RGB images. To address this challenge, this work proposes a generic self-supervised learning framework based on remote sensing data at both the object and pixel levels. The method contains two novel pretext tasks, one for object-based and one for pixel-based remote sensing data analysis. The first pretext task reconstructs the spectral profile from masked data, which can be used to extract a representation of pixel information and improve the performance of downstream tasks associated with pixel-based analysis. The second pretext task identifies objects from multiple views of the same object in multispectral data, which can be used to extract a representation and improve the performance of downstream tasks associated with object-based analysis. The results of two typical downstream task evaluation exercises (a multilabel land cover classification task on Sentinel-2 multispectral datasets and a ground soil parameter retrieval task on hyperspectral datasets) demonstrate that the proposed SSL method learns a target representation that covers both spatial and spectral information from massive unlabeled data. A comparison with currently available SSL methods shows that the proposed method, which emphasizes both spectral and spatial features, outperforms existing SSL methods on multi- and hyperspectral remote sensing datasets. We believe that this approach has the potential to be effective in a wider range of remote sensing applications, and we will explore its utility in more remote sensing applications in the future.
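The first pretext task above (reconstructing the spectral profile from masked data) reduces to masking random spectral bands of each pixel and regressing the missing values. A sketch of just the masking step, assuming a hypothetical 224-band pixel and a 50% mask ratio (both illustrative, not the paper's settings):

```python
import numpy as np

def mask_bands(spectrum, mask_frac, rng):
    """Zero out a random fraction of spectral bands; return input and mask."""
    n = len(spectrum)
    n_mask = int(n * mask_frac)
    idx = rng.choice(n, size=n_mask, replace=False)
    masked = spectrum.copy()
    masked[idx] = 0.0
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True                 # True = band withheld from the encoder
    return masked, mask

rng = np.random.default_rng(0)
pixel = rng.uniform(0.0, 1.0, size=224)   # hypothetical hyperspectral pixel
masked, mask = mask_bands(pixel, mask_frac=0.5, rng=rng)
```

The reconstruction loss would then compare the model's predictions against `pixel[mask]`, the withheld bands.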

19 pages, 4951 KiB  
Article
S2AC: Self-Supervised Attention Correlation Alignment Based on Mahalanobis Distance for Image Recognition
by Zhi-Yong Wang, Dae-Ki Kang and Cui-Ping Zhang
Electronics 2023, 12(21), 4419; https://doi.org/10.3390/electronics12214419 - 26 Oct 2023
Cited by 3 | Viewed by 1502
Abstract
Susceptibility to domain changes hinders the application and development of deep neural networks for image classification. Domain adaptation (DA) uses domain-invariant characteristics to improve the performance of a model trained on labeled data from one domain (the source domain) on an unlabeled domain (the target) with a different data distribution. However, existing DA methods simply use pretrained models (e.g., AlexNet, ResNet) for feature extraction; these convolutional models are trapped in localized features and fail to capture long-distance dependencies. Furthermore, many approaches depend too heavily on pseudo-labels, which can impair adaptation efficiency and lead to unstable and inconsistent results. In this research, we present S2AC, a novel approach for unsupervised deep domain adaptation that uses a stacked attention architecture as a feature map extractor. Our method measures domain discrepancy by minimizing a linear transformation of the second-order statistics (covariances) extended by the p-norm, while simultaneously designing heuristic pretext tasks to improve the generality of the learned representation. In addition, we develop a new trainable relative position embedding that not only reduces the model parameters but also enhances model accuracy and expedites the training process. To illustrate our method's efficacy and controllability, we designed extensive experiments on the Office31, Office_Caltech_10, and OfficeHome datasets. To the best of our knowledge, the proposed method is the first attempt to incorporate attention-based networks and self-supervised learning for image domain adaptation, and it shows promising results.
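Aligning second-order statistics as described above is commonly formulated as a CORAL-style loss: the squared Frobenius distance between source and target feature covariances. A minimal sketch under that standard formulation (S2AC's p-norm extension and attention extractor are not reproduced here):

```python
import numpy as np

def coral_loss(source, target):
    """Squared Frobenius distance between source and target feature covariances."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)        # (d, d) source covariance
    ct = np.cov(target, rowvar=False)        # (d, d) target covariance
    return ((cs - ct) ** 2).sum() / (4 * d * d)

rng = np.random.default_rng(0)
src = rng.standard_normal((500, 16))
tgt_same = rng.standard_normal((500, 16))          # same distribution
tgt_shift = rng.standard_normal((500, 16)) * 3.0   # inflated covariance scale
```

Minimizing this loss during training pulls the target feature distribution's second-order statistics toward the source's.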
(This article belongs to the Special Issue Artificial Intelligence for Robotics)

14 pages, 545 KiB  
Article
Self-Supervised Spatio-Temporal Graph Learning for Point-of-Interest Recommendation
by Jiawei Liu, Haihan Gao, Chuan Shi, Hongtao Cheng and Qianlong Xie
Appl. Sci. 2023, 13(15), 8885; https://doi.org/10.3390/app13158885 - 1 Aug 2023
Cited by 3 | Viewed by 2179
Abstract
As one of the most crucial topics in the recommender systems field, point-of-interest (POI) recommendation aims to recommend potentially interesting POIs to users. Recently, graph neural networks (GNNs) have been successfully used to model interaction and spatio-temporal information in POI recommendation, but data sparsity affects the training of GNNs. Although some existing GNN-based POI recommendation approaches use social relationships or user attributes to alleviate the data sparsity problem, such auxiliary information is not always available for privacy reasons. Self-supervised learning provides a new way to alleviate data sparsity, but most existing self-supervised recommendation methods are designed for bipartite graphs or social graphs and cannot be directly applied to the spatio-temporal graph of POI recommendation. In this paper, we propose a new method named SSTGL that combines self-supervised learning and GNN-based POI recommendation for the first time. SSTGL is empowered with spatio-temporal-aware strategies in the data augmentation and pretext task stages, respectively, so that it can provide high-quality supervision signals by incorporating spatio-temporal prior knowledge. By combining the self-supervised learning objective with recommendation objectives, SSTGL improves the performance of GNN-based POI recommendation. Extensive experiments on three POI recommendation datasets demonstrate the effectiveness of SSTGL, which performs better than existing mainstream methods.
(This article belongs to the Special Issue Machine Learning and AI in Intelligent Data Mining and Analysis)

19 pages, 19495 KiB  
Article
A Self-Supervised Learning Approach for Extracting China Physical Urban Boundaries Based on Multi-Source Data
by Yuan Tao, Wanzeng Liu, Jun Chen, Jingxiang Gao, Ran Li, Jiaxin Ren and Xiuli Zhu
Remote Sens. 2023, 15(12), 3189; https://doi.org/10.3390/rs15123189 - 19 Jun 2023
Cited by 8 | Viewed by 2430
Abstract
Physical urban boundaries (PUBs) are basic geographic information data for defining the spatial extent of urban landscapes with non-agricultural land and non-agricultural economic activities. Accurately mapping PUBs provides a spatiotemporal database for urban dynamic monitoring, territorial spatial planning, and ecological environment protection. However, traditional extraction methods often have problems, such as subjective parameter settings and inconsistent cartographic scales, making it difficult to identify PUBs objectively and accurately. To address these problems, we proposed a self-supervised learning approach for PUB extraction. First, we used nighttime light and OpenStreetMap road data to map the initial urban boundary for data preparation. Then, we designed a pretext task of self-supervised learning based on an unsupervised mutation detection algorithm to automatically mine supervised information in unlabeled data, which can avoid subjective human interference. Finally, a downstream task was designed as a supervised learning task in Google Earth Engine to classify urban and non-urban areas using impervious surface density and nighttime light data, which can solve the scale inconsistency problem. Based on the proposed method, we produced a 30 m resolution China PUB dataset covering six years (1995, 2000, 2005, 2010, 2015, and 2020). Our PUBs show good agreement with existing products and accurately describe the spatial extent of urban areas, effectively distinguishing urban and non-urban areas. Moreover, we found that the gap between the national per capita GDP and the urban per capita GDP is gradually decreasing, but regional coordinated development and intensive development still need to be strengthened.

14 pages, 977 KiB  
Article
Continuous Latent Spaces Sampling for Graph Autoencoder
by Zhongyu Li, Geng Zhao, Hao Ning, Xin Jin and Haoyang Yu
Appl. Sci. 2023, 13(11), 6491; https://doi.org/10.3390/app13116491 - 26 May 2023
Cited by 1 | Viewed by 2391
Abstract
This paper proposes colaGAE, a self-supervised learning framework for graph-structured data. While graph autoencoders (GAEs) commonly use graph reconstruction as a pretext task, this simple approach often yields poor model performance. To address this issue, colaGAE (a continuous latent space sampling GAE) employs mutual isomorphism as its pretext task. The central idea of mutual isomorphism is to sample multiple views from the latent space and reconstruct the graph structure from each, which significantly improves the difficulty profile of model training. To investigate whether continuous latent space sampling can enhance GAEs' learning of graph representations, we provide both theoretical and empirical evidence for the benefits of this pretext task. Theoretically, we prove that mutual isomorphism eases model training, leading to better performance. Empirically, we conduct extensive experiments on eight benchmark datasets and achieve four state-of-the-art (SOTA) results; the average accuracy rate shows a notable improvement of 0.3%, demonstrating the superiority of colaGAE in node classification tasks.

34 pages, 1678 KiB  
Systematic Review
Self-Supervised Contrastive Learning for Medical Time Series: A Systematic Review
by Ziyu Liu, Azadeh Alavi, Minyi Li and Xiang Zhang
Sensors 2023, 23(9), 4221; https://doi.org/10.3390/s23094221 - 23 Apr 2023
Cited by 41 | Viewed by 15787
Abstract
Medical time series are sequential data collected over time that measure health-related signals, such as electroencephalography (EEG), electrocardiography (ECG), and intensive care unit (ICU) readings. Analyzing medical time series to identify latent patterns and trends can uncover highly valuable insights for enhancing diagnosis, treatment, risk assessment, and the understanding of disease progression. However, data mining in medical time series is heavily limited by sample annotation, which is time-consuming, labor-intensive, and expert-dependent. To mitigate this challenge, the emerging self-supervised contrastive learning, which has shown great success since 2020, is a promising solution. Contrastive learning aims to learn representative embeddings by contrasting positive and negative samples without the requirement for explicit labels. Here, we conducted a systematic review of how contrastive learning alleviates label scarcity in medical time series, following PRISMA standards. We searched five scientific databases (IEEE, ACM, Scopus, Google Scholar, and PubMed) and retrieved 1908 papers based on the inclusion criteria. After applying the exclusion criteria and screening at the title, abstract, and full-text levels, we carefully reviewed 43 papers in this area. Specifically, this paper outlines the pipeline of contrastive learning, including pre-training, fine-tuning, and testing. We provide a comprehensive summary of the various augmentations applied to medical time series data, the architectures of pre-training encoders, the types of fine-tuning classifiers and clusters, and the popular contrastive loss functions. Moreover, we present an overview of the different data types used in medical time series, highlight the medical applications of interest, and provide a comprehensive table of 51 public datasets that have been utilized in this field. In addition, we discuss promising future directions, such as providing guidance for effective augmentation design, developing a unified framework for analyzing hierarchical time series, and investigating methods for processing multimodal data. Despite being in its early stages, self-supervised contrastive learning has shown great potential in overcoming the need for expert-created annotations in medical time series research.
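The contrastive losses such a review surveys are, in many of the covered papers, variants of InfoNCE: an anchor embedding is pulled toward its positive (an augmented view of the same sample) and pushed away from negatives. A minimal NumPy sketch (temperature, embedding size, and the toy data are illustrative):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temp=0.1):
    """InfoNCE: cross-entropy of the positive among positive + negatives."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temp
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                # positive sits at index 0

rng = np.random.default_rng(0)
anchor = rng.standard_normal(32)
positive = anchor + 0.05 * rng.standard_normal(32)   # mildly augmented view
negatives = [rng.standard_normal(32) for _ in range(8)]
loss_easy = info_nce(anchor, positive, negatives)    # aligned positive
loss_hard = info_nce(anchor, rng.standard_normal(32), negatives)
```

The loss is low when the positive is the nearest neighbour of the anchor and high otherwise, which is exactly the supervision signal the augmentations must create.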
(This article belongs to the Special Issue Sensors for Physiological Parameters Measurement)

17 pages, 6802 KiB  
Article
A Novel Study on a Generalized Model Based on Self-Supervised Learning and Sparse Filtering for Intelligent Bearing Fault Diagnosis
by Guocai Nie, Zhongwei Zhang, Mingyu Shao, Zonghao Jiao, Youjia Li and Lei Li
Sensors 2023, 23(4), 1858; https://doi.org/10.3390/s23041858 - 7 Feb 2023
Cited by 13 | Viewed by 2208
Abstract
Recently, deep learning has become increasingly widespread in the field of fault diagnosis. However, most deep learning methods rely on large amounts of labeled data to train the model, which leads to poor generalization across different application scenarios. To overcome this deficiency, this paper proposes a novel generalized model based on self-supervised learning and sparse filtering (GSLSF). The proposed method includes two stages. First, considering the representation of samples with respect to fault and working-condition information, we design self-supervised learning pretext tasks and pseudo-labels and establish a pre-trained model based on sparse filtering. Second, we establish a knowledge transfer mechanism from the pre-trained model to the target task, extract deep-representation fault features with the sparse filtering model, and apply softmax regression to distinguish the failure types. This method can notably enhance the model's diagnostic performance and generalization ability with limited training data. The validity of the method is demonstrated by fault diagnosis results on two bearing datasets.
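Sparse filtering itself is a compact objective: feature activations are l2-normalized per feature across the batch, then per sample across features, and the l1 norm of the result is minimized, favoring sparse activation patterns. A NumPy sketch of the objective only (the network producing the features and GSLSF's pseudo-label design are not reproduced):

```python
import numpy as np

def sparse_filtering_objective(features, eps=1e-8):
    """Sparse filtering: l2-normalize per feature, then per sample; sum l1."""
    f = np.abs(features)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)  # across samples
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)  # across features
    return f.sum()   # l1 of the doubly-normalized activations (to minimize)

rng = np.random.default_rng(0)
dense = rng.uniform(0.5, 1.0, size=(64, 16))          # every feature active
sparse = dense * (rng.uniform(size=(64, 16)) < 0.1)   # ~10% of features active
```

Because the l1/l2 ratio is smaller for sparse vectors, sparse activation patterns score lower, which is what gradient descent on this objective encourages.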
(This article belongs to the Section Industrial Sensors)
