Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability.
The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
Quartile Ranking JCR - Q2 (Multidisciplinary Sciences)

All Articles (1,270)

  • Data Descriptor
  • Open Access

Seasonal Trap Captures Data of Stink and Leaf-Footed Bugs in a Northern Italian Ecosystem

  • Vito Antonio Giannuzzi,
  • Valeria Rossi and
  • Rihem Moujahed
  • + 7 authors

An essential first step to implement a control strategy against herbivorous insects is the monitoring of their populations. The efficacy of pheromone-based traps in capturing herbivorous insects can be enhanced by adding adjuvants and using slow-release dispensers to ensure long-lasting attractiveness. Here, we present datasets from a two-year field monitoring campaign of the invasive brown marmorated stink bug, Halyomorpha halys (Stål) (Hemiptera: Pentatomidae), using clear sticky traps baited with its aggregation pheromone and a synergist, tested with different dispensers and adjuvants. Bycatch data for native stink bugs (all Hemiptera: Pentatomidae) and leaf-footed bugs (Hemiptera: Coreidae) are also presented. The R code provided was used to organize data and generate weekly captures or weekly density of both H. halys and non-target species. The information provided in this article may contribute to the optimization of pest control strategies in agriculture.

24 December 2025

Trend of Halyomorpha halys adult trap captures (mean ± SE) during the 2023 field trial. “BLS” stands for blister pack, “WXT” for wax tablet, and “NBP” for non-biodegradable polymer. In addition to control (CNT, no lure, no dispenser), four different combinations of PHER/MDT were evaluated: 10/125 mg (“1”), 15/125 mg (“2”), 20/200 mg (“3”), and 20/300 mg (“4”). The NBP dispenser included a fatty acid methyl ester (FM) as adjuvant, while no adjuvants were added to the others (“00”).

Constructed-response items offer rich evidence of writing proficiency, but the linguistic signals they contain vary with grade level. This study presents a cross-sectional analysis of 5638 English Language Arts essays from Grades 6–12 to identify which linguistic features predict proficiency and to characterize how their importance shifts across grade levels. We extracted a suite of lexical, syntactic, and semantic-cohesion features, and evaluated their predictive power using an interpretive dual-model framework combining LASSO and XGBoost algorithms. Feature importance was assessed through LASSO coefficients, XGBoost Gain scores, and SHAP values, and interpreted by isolating both consensus and divergences among the three metrics. Results show moderate, generalizable predictive signals in Grades 6–8, but no generalizable predictive power was found in the Grades 9–12 cohort. Across the middle grades, three findings achieved strong consensus: essay length, syntactic density, and global semantic organization served as strong predictors of writing proficiency. Lexical diversity emerged as a key divergent feature; it was a top predictor for XGBoost but ignored by LASSO, suggesting its contribution depends on interactions with other features. These findings inform actionable, grade-sensitive feedback, highlighting stable, diagnostic targets for middle school while cautioning that discourse-level features are necessary to model high-school writing.

21 December 2025

The flowchart of the dual-model framework.

  • Data Descriptor
  • Open Access

Advancements in data storage and data processing technologies have compelled higher education institutions to optimise the use of their data. Many universities globally have begun to implement learning analytics at their institutions to better understand and improve teaching and learning. African higher education institutions have been slow to implement learning analytics despite the continued accumulation of digital data. This study presents a dataset of Information Systems and Technology (IS&T) students from the University of KwaZulu-Natal, a South African university. The dataset comprises approximately 14,000 registered student records from 10 IS&T courses, primarily consisting of demographic data, academic performance (including past IS&T courses and school records), and Learning Management System (LMS) interaction data. The dataset exhibits an imbalance, characterised by a higher proportion of students who have successfully completed courses compared to those who have not. The dataset will be of interest to researchers engaged in learning analytics application studies, including early pass/fail prediction and grade classification, as well as those who want to test their techniques on a real-world dataset.

19 December 2025

Dataset hierarchy.

A Real-World Underwater Video Dataset with Labeled Frames and Water-Quality Metadata for Aquaculture Monitoring

  • Osbaldo Aragón-Banderas,
  • Leonardo Trujillo and
  • Yolocuauhtli Salazar
  • + 2 authors

Aquaculture monitoring increasingly relies on computer vision to evaluate fish behavior and welfare under farming conditions. This dataset was collected in a commercial recirculating aquaculture system (RAS) integrated with hydroponics in Queretaro, Mexico, to support the development of robust visual models for Nile tilapia (Oreochromis niloticus). More than ten hours of underwater recordings were curated into 31 clips of 30 s each, a duration selected to balance representativeness of fish activity with a manageable size for annotation and training. Videos were captured using commercial action cameras at multiple resolutions (1920 × 1080 to 5312 × 4648 px), frame rates (24–60 fps), depths, and lighting configurations, reproducing real-world challenges such as turbidity, suspended solids, and variable illumination. For each recording, physicochemical parameters were measured, including temperature, pH, dissolved oxygen and turbidity, and are provided in a structured CSV file. In addition to the raw videos, the dataset includes 3520 extracted frames annotated using a polygon-based JSON format, enabling direct use for training object detection and behavior recognition models. This dual resource of unprocessed clips and annotated images enhances reproducibility, benchmarking, and comparative studies. By combining synchronized environmental data with annotated underwater imagery, the dataset contributes a non-invasive and versatile resource for advancing aquaculture monitoring through computer vision.

18 December 2025

Schematic of the recirculating aquaponic system (RAS) with rearing, solids removal, biofilters, aeration, pumps, and hydroponic grow beds; solid arrows indicate water flow, and dashed arrows indicate air flow.

Reprints of Collections

Data Mining and Computational Intelligence for E-learning and Education
Reprint

Editors: Antonio Sarasa Cabezuelo, Ramón González del Campo Rodríguez Barbero
Recent Advances and Applications in Partial Least Squares Structural Equation Modeling (PLS-SEM)
Reprint

Editors: María del Carmen Valls Martínez, José-María Montero, Pedro Antonio Martín Cervantes

Data - ISSN 2306-5729