MDPI - Publisher of Open Access Journals

26 pages, 1747 KiB

Open Access Article

Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information

by Fei Chen and Wenchi Zhou

Electronics 2025, 14(15), 3092; https://doi.org/10.3390/electronics14153092 - 1 Aug 2025

Viewed by 313

Data reduction is essential to data-centric Artificial Intelligence (AI): it makes model training more effective by locating the most instructive examples in massive datasets. To increase data quality and training efficiency, the main difficulty is choosing the [...] V-Information (PVI). To enable a static method, we first use PVI to quantify instance difficulty and remove instances with low difficulty. Experiments show that classifier performance is maintained, with only a 0.0001% to 0.76% decline in accuracy, when 10–30% of the data is removed. Second, we train the classifiers using a progressive learning strategy on examples sorted by increasing PVI, accelerating convergence and achieving a 0.8% accuracy gain over conventional training. Our findings suggest that training a classifier on the chosen optimal subset, combined with an efficient data reduction strategy, may improve model performance and increase training efficiency. Furthermore, we adapt the PVI framework, previously limited to English datasets, to a variety of Chinese Natural Language Processing (NLP) tasks and base models, yielding insightful results for faster training and cross-lingual data reduction.

(This article belongs to the Special Issue Data Retrieval and Data Mining)
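The two steps described in the abstract above (dropping low-difficulty instances, then ordering the remainder by increasing PVI) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the usual PVI definition, log2 p(y | x) minus log2 p(y | null input), and that a higher PVI means an easier instance. The function names, the input tuple layout, and the `drop_fraction` parameter are hypothetical.

```python
import math


def pvi(p_with_input, p_null):
    """PVI in bits for one example (assumed definition).

    p_with_input: probability the input-conditioned model gives the gold label.
    p_null: probability the null-input model gives the same gold label.
    """
    return math.log2(p_with_input) - math.log2(p_null)


def reduce_and_order(examples, drop_fraction):
    """Drop the easiest examples, return the rest sorted by increasing PVI.

    examples: list of (example_id, p_with_input, p_null) tuples (hypothetical layout).
    drop_fraction: share of the dataset to remove, e.g. 0.1 to 0.3.
    """
    scored = [(ex_id, pvi(p_x, p_0)) for ex_id, p_x, p_0 in examples]
    # Sort by increasing PVI, the ordering named in the abstract.
    scored.sort(key=lambda item: item[1])
    n_drop = int(len(scored) * drop_fraction)
    # Assumption: high PVI = low difficulty, so the tail of the list is removed.
    return scored[: len(scored) - n_drop]


if __name__ == "__main__":
    data = [("a", 0.9, 0.5), ("b", 0.2, 0.5), ("c", 0.5, 0.5), ("d", 0.99, 0.5)]
    for ex_id, score in reduce_and_order(data, drop_fraction=0.25):
        print(ex_id, round(score, 3))
```

In practice the two probabilities would come from two fine-tuned classifiers, one trained on real inputs and one on empty inputs; how those models are trained is outside the scope of this sketch.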

23 pages, 13802 KiB

Open Access Article

Underwater-Yolo: Underwater Object Detection Network with Dilated Deformable Convolutions and Dual-Branch Occlusion Attention Mechanism

by Zhenming Li, Bing Zheng, Dong Chao, Wenbo Zhu, Haibing Li, Jin Duan, Xinming Zhang, Zhongbo Zhang, Weijie Fu and Yunzhi Zhang

J. Mar. Sci. Eng. 2024, 12(12), 2291; https://doi.org/10.3390/jmse12122291 - 12 Dec 2024

Cited by 4 | Viewed by 2367

Abstract

Underwater object detection is critical for marine ecological monitoring and biodiversity research, yet existing algorithms struggle to detect densely packed objects of varying sizes, particularly in occluded and complex underwater environments. This study introduces Underwater-Yolo, a novel detection network that enhances performance in these challenging scenarios by integrating a dual-branch occlusion-handling attention mechanism (GLOAM) and a Cross-Stage Partial Dilated Deformable Convolution (CSP-DDC) backbone. The dilated deformable convolutions (DDCs) in the backbone and neck expand the receptive field, thereby improving the detection of small objects, while the deformable convolutions enhance the model’s adaptive feature extraction capabilities for unstructured objects. Additionally, the CARAFE up-sampling operator in the neck aggregates contextual information across a broader spatial domain. The GLOAM consists of a global branch, which uses a Vision Transformer to capture global features and object–background relationships, and a local branch, which enhances the detection of occluded objects through depthwise–pointwise convolutions; together they further optimize performance. By incorporating these innovations, the model effectively addresses the challenges of detecting small and occluded objects in dense underwater environments. Evaluation on the CLfish-V1 dataset shows significant improvements over state-of-the-art algorithms, with an AP50 of 93.8%, an AP75 of 88.9%, and an AP-small of 76.4%, marking gains of 4.7%, 16.7%, and 6%, respectively. These results demonstrate the model’s effectiveness in complex underwater scenarios.

(This article belongs to the Section Ocean Engineering)
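The abstract's statement that dilated convolutions expand the receptive field can be illustrated with standard convolution arithmetic. The sketch below is generic and does not reproduce the CSP-DDC backbone: the layer configurations are made-up examples, and deformable sampling offsets are ignored.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis.

    A k-tap kernel with dilation d has (d - 1) gaps between adjacent taps.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field (one axis) of a stack of convolution layers.

    layers: list of (kernel_size, dilation, stride) tuples, input to output.
    """
    rf = 1    # a single output pixel sees one input pixel before any layer
    jump = 1  # distance in input pixels between adjacent feature-map pixels
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    plain = [(3, 1, 1), (3, 1, 1)]    # two ordinary 3x3 convolutions
    dilated = [(3, 2, 1), (3, 2, 1)]  # same kernels with dilation 2
    print(receptive_field(plain), receptive_field(dilated))
```

With the same number of weights per layer, the dilated stack covers a wider input region, which is the property the abstract relies on.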

23 pages, 115282 KiB

Open Access Article

Exploiting Sentinel-5P TROPOMI and Ground Sensor Data for the Detection of Volcanic SO₂ Plumes and Activity in 2018–2021 at Stromboli, Italy

by Alessandra Cofano, Francesca Cigna, Luigi Santamaria Amato, Mario Siciliani de Cumis and Deodato Tapete

Sensors 2021, 21(21), 6991; https://doi.org/10.3390/s21216991 - 21 Oct 2021

Cited by 19 | Viewed by 5548

Abstract

Sulfur dioxide (SO₂) degassing at Strombolian volcanoes is directly associated with magmatic activity, so monitoring it can provide information on the style and intensity of eruptions. The Stromboli volcano in southern Italy is used as a test case to demonstrate that the TROPOspheric Monitoring Instrument (TROPOMI) onboard the Copernicus Sentinel-5 Precursor (Sentinel-5P) satellite has suitable spatial resolution and sensitivity to carry out local-scale SO₂ monitoring of relatively small, nearly point-wise volcanic sources, and to distinguish periods of different activity intensity. The entire dataset, consisting of TROPOMI Level 2 SO₂ geophysical products from UV sensor data collected over Stromboli from 6 May 2018 to 31 May 2021, is processed with purposely adapted Python scripts. A methodological workflow is developed to encompass the extraction of total SO₂ Vertical Column Density (VCD) at given coordinates (including conditional VCD for three different hypothetical peaks at 0–1, 7 and 15 km), as well as filtering by quality in compliance with the Sentinel-5P Validation Team’s recommendations. The comparison of total SO₂ VCD time series for the main crater and across different averaging windows (3 × 3, 5 × 5 and 4 × 2) proves the correctness of the adopted spatial sampling criterion, and practical recommendations are proposed for further implementation in similar volcanic environments. An approach for detecting SO₂ VCD peaks at the volcano is trialed, and the detections are compared with the level of SO₂ flux measured by ground-based instrumentation. SO₂ time series analysis is complemented with information provided by contextual Sentinel-2 multispectral (in the visible, near and short-wave infrared) and Suomi NPP VIIRS observations. The aim is to correctly interpret SO₂ total VCD peaks when they either (i) coincide with medium to very high SO₂ emissions as measured in situ and known from volcanological observatory bulletins, or (ii) occur outside periods of significant emissions despite signs of activity visible in Sentinel-2 data. Finally, SO₂ VCD peaks in the time series are further investigated through daily time lapses during the paroxysms in July–August 2019, major explosions in August 2020 and a more recent period of activity in May 2021. Hourly wind records from ECMWF Reanalysis v5 (ERA5) data are used to identify local wind direction and SO₂ plume drift during the time lapses. The proposed analysis approach is successful in showing the SO₂ degassing associated with these events, and in warning whenever the SO₂ VCD at Stromboli may be overestimated due to clustering with the plume of the Mount Etna volcano.

(This article belongs to the Section Remote Sensors)
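The extraction step in the abstract above (averaging SO₂ VCD over a pixel window around given coordinates after quality filtering) might look roughly like the sketch below. It is not the authors' script. The nested-list grid layout, the function name, and the `qa_min` default of 0.5 are assumptions for illustration; the actual threshold should follow the Validation Team's recommendations. Only square, odd-sized windows such as 3 × 3 and 5 × 5 are handled here, not the 4 × 2 case.

```python
def window_mean_vcd(vcd, qa, row, col, size=3, qa_min=0.5):
    """Mean VCD over a size x size window centred on (row, col).

    vcd: 2D list of column densities, None where no retrieval exists.
    qa: 2D list of quality values with the same shape as vcd.
    qa_min: pixels with qa <= qa_min are discarded (assumed threshold).
    Returns None when no pixel in the window passes the filter.
    """
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            # Skip window cells that fall outside the grid.
            if not (0 <= r < len(vcd) and 0 <= c < len(vcd[0])):
                continue
            value = vcd[r][c]
            if value is not None and qa[r][c] > qa_min:
                values.append(value)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    vcd = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    qa = [[0.1, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(window_mean_vcd(vcd, qa, row=1, col=1, size=3))
```

Running the same function for several window sizes on each overpass would give the kind of per-window time series that the abstract compares.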

Search Results (3)
