Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability.
The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
Quartile Ranking JCR - Q2 (Multidisciplinary Sciences)

All Articles (1,271)

  • Data Descriptor
  • Open Access

The aryl hydrocarbon receptor (AhR) plays a crucial role in mediating xenobiotic responses, as well as regulating broader metabolic, differentiation, and stress response programs. In this study, we present a comprehensive long-read RNA sequencing dataset that examines transcriptional changes in the HepaRG human cell line during differentiation induced by dimethyl sulfoxide (DMSO) and acute activation of the AhR with 3-methylcholanthrene (3-MC). We identified 946 genes that were differentially expressed between the NonDiff and Diff conditions (303 genes upregulated and 643 genes downregulated), and 1786 genes that showed differential expression between Diff and Ind conditions (961 genes upregulated and 825 genes downregulated). Acute treatment with 3-MC produced a robust AhR signature, characterized by strong induction of CYP1A1 and CYP1B1, along with a coordinated downregulation of several constitutive hepatic genes involved in drug metabolism (e.g., CYP3A4 and CYP2C8). To facilitate further analysis and reuse of our data, we have provided processed gene-level count matrices, transcripts per million (TPM) tables, and detailed differential expression results, as well as analysis scripts. This resource supports research into AhR biology, pharmacogene regulation, and the development of methods for long-read transcriptomics in liver models.

29 December 2025

Verification of HepaRG cell differentiation. (A,B) Morphology of HepaRG cells. (A) Proliferative phase. (B) Cells during induction of differentiation. Differentiated (hepatocyte-like) cells are marked with the letter “H”; cells that retain the epithelial phenotype are marked with “E”. Scale bar: 50 μm. (C) Relative mRNA expression of differentiation-related genes ALB, CYP3A4, and CYP2E1. NonDiff—undifferentiated proliferating cells, FC—full confluence (prior to DMSO treatment), and Diff—differentiated cells. Error bars indicate standard error of the mean. * p < 0.05, ** p < 0.01, and **** p < 0.0001 compared to NonDiff.
  • Data Descriptor
  • Open Access

Seasonal Trap Captures Data of Stink and Leaf-Footed Bugs in a Northern Italian Ecosystem

  • Vito Antonio Giannuzzi,
  • Valeria Rossi and
  • Rihem Moujahed
  • + 7 authors

An essential first step in implementing a control strategy against herbivorous insects is monitoring their populations. The efficacy of pheromone-based traps in capturing herbivorous insects can be enhanced by adding adjuvants and using slow-release dispensers to ensure long-lasting attractiveness. Here, we present datasets from a two-year field monitoring campaign of the invasive brown marmorated stink bug, Halyomorpha halys (Stål) (Hemiptera: Pentatomidae), using clear sticky traps baited with its aggregation pheromone and a synergist, tested with different dispensers and adjuvants. Bycatch data for native stink bugs (all Hemiptera: Pentatomidae) and leaf-footed bugs (Hemiptera: Coreidae) are also presented. The R code provided was used to organize data and generate weekly captures or weekly density of both H. halys and non-target species. The information provided in this article may contribute to the optimization of pest control strategies in agriculture.

24 December 2025

Trend of Halyomorpha halys adult trap captures (mean ± SE) during the 2023 field trial. “BLS” stands for blister pack, “WXT” for wax tablet, and “NBP” for non-biodegradable polymer. In addition to control (CNT, no lure, no dispenser), four different combinations of PHER/MDT were evaluated: 10/125 mg (“1”), 15/125 mg (“2”), 20/200 mg (“3”), and 20/300 mg (“4”). The NBP dispenser included a fatty acid methyl ester (FM) as adjuvant, while no adjuvants were added to the others (“00”).
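
The weekly summaries described above (captures per week, reported as mean ± SE) can be illustrated with a minimal Python sketch. This is not the authors' R code: the record layout, the `weekly_mean_se` function, the treatment labels, and all counts are assumptions made up for illustration, and grouping by ISO week is only one plausible way to define a week.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, stdev


def weekly_mean_se(records):
    """Group per-trap counts by (ISO year, ISO week, treatment).

    Returns a dict mapping each group to (mean, standard error).
    The SE is the sample standard deviation divided by sqrt(n);
    a group with a single trap gets an SE of 0.0.
    """
    groups = defaultdict(list)
    for check_date, treatment, count in records:
        iso = check_date.isocalendar()
        groups[(iso[0], iso[1], treatment)].append(count)

    summary = {}
    for key, counts in groups.items():
        se = stdev(counts) / sqrt(len(counts)) if len(counts) > 1 else 0.0
        summary[key] = (mean(counts), se)
    return summary


# Hypothetical records: (trap check date, treatment label, adults captured).
# These values are invented and do not come from the published dataset.
records = [
    (date(2023, 7, 3), "BLS-1", 4),
    (date(2023, 7, 3), "BLS-1", 6),
    (date(2023, 7, 3), "CNT", 0),
    (date(2023, 7, 10), "BLS-1", 8),
]

summary = weekly_mean_se(records)
for (year, week, treatment), (avg, se) in sorted(summary.items()):
    print(f"{year}-W{week:02d} {treatment}: {avg:.2f} ± {se:.2f}")
```

Keying the groups on the ISO year as well as the week keeps a two-year campaign from merging the same calendar week of different seasons.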

Constructed-response items offer rich evidence of writing proficiency, but the linguistic signals they contain vary with grade level. This study presents a cross-sectional analysis of 5638 English Language Arts essays from Grades 6–12 to identify which linguistic features predict proficiency and to characterize how their importance shifts across grade levels. We extracted a suite of lexical, syntactic, and semantic-cohesion features, and evaluated their predictive power using an interpretive dual-model framework combining LASSO and XGBoost algorithms. Feature importance was assessed through LASSO coefficients, XGBoost Gain scores, and SHAP values, and interpreted by isolating points of consensus and divergence among the three metrics. Results show moderate, generalizable predictive signals in Grades 6–8, but no generalizable predictive power was found in the Grades 9–12 cohort. Across the middle grades, three findings achieved strong consensus: essay length, syntactic density, and global semantic organization served as strong predictors of writing proficiency. Lexical diversity emerged as a key divergent feature: it was a top predictor for XGBoost but was ignored by LASSO, suggesting its contribution depends on interactions with other features. These findings inform actionable, grade-sensitive feedback, highlighting stable, diagnostic targets for middle school while cautioning that discourse-level features are necessary to model high-school writing.

21 December 2025

The flowchart of the dual-model framework.
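
The step of comparing three importance metrics for consensus and divergence can be sketched as follows. This is a hedged illustration, not the study's procedure: the top-k rule, the function names, the feature names, and every score are assumptions invented here, and in practice the scores would come from fitted LASSO and XGBoost models.

```python
def top_k(scores, k):
    """Return the set of the k features with the largest absolute score."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return set(ranked[:k])


def consensus_and_divergence(metrics, k):
    """Split top-k features into consensus and divergent groups.

    Consensus features rank in the top k under every metric.
    Divergent features rank in the top k under some metrics only;
    each is mapped to the sorted list of metrics that selected it.
    """
    tops = {metric: top_k(scores, k) for metric, scores in metrics.items()}
    consensus = set.intersection(*tops.values())
    selected_anywhere = set.union(*tops.values())
    divergent = {
        feature: sorted(m for m, chosen in tops.items() if feature in chosen)
        for feature in selected_anywhere - consensus
    }
    return consensus, divergent


# Hypothetical importance scores, invented purely for illustration.
metrics = {
    "lasso_coef": {
        "feature_a": 0.90, "feature_b": 0.50, "feature_c": 0.40, "feature_d": 0.00,
    },
    "xgb_gain": {
        "feature_a": 0.50, "feature_b": 0.10, "feature_c": 0.15, "feature_d": 0.25,
    },
    "mean_abs_shap": {
        "feature_a": 0.60, "feature_b": 0.30, "feature_c": 0.20, "feature_d": 0.10,
    },
}

consensus, divergent = consensus_and_divergence(metrics, k=3)
print("consensus:", sorted(consensus))
print("divergent:", divergent)
```

A feature that one model ranks highly and another zeroes out, like `feature_d` in this toy input, is the kind of divergence the abstract attributes to interactions that a linear model cannot capture.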
  • Data Descriptor
  • Open Access

Advancements in data storage and data processing technologies have compelled higher education institutions to optimise the use of their data. Many universities globally have begun to implement learning analytics at their institutions to better understand and improve teaching and learning. African higher education institutions have been slow to implement learning analytics despite the continued accumulation of digital data. The research related to this study presents a dataset of Information Systems and Technology (IS&T) students from the University of KwaZulu-Natal, a South African university. The dataset comprises approximately 14,000 registered student records from 10 IS&T courses, primarily consisting of demographic data, academic performance (including past IS&T courses and school records), and Learning Management System (LMS) interaction data. The dataset exhibits an imbalance, characterised by a higher proportion of students who have successfully completed courses compared to those who have not. The dataset will be of interest to researchers engaged in learning analytics application studies, including early pass/fail prediction and grade classification, as well as those who want to test their techniques on a real-world dataset.

19 December 2025

Dataset hierarchy.

Reprints of Collections

Data Mining and Computational Intelligence for E-learning and Education
Reprint

Editors: Antonio Sarasa Cabezuelo, Ramón González del Campo Rodríguez Barbero
Recent Advances and Applications in Partial Least Squares Structural Equation Modeling (PLS-SEM)
Reprint

Editors: María del Carmen Valls Martínez, José-María Montero, Pedro Antonio Martín Cervantes

Data - ISSN 2306-5729