
Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability.
The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
Quartile Ranking JCR - Q2 (Multidisciplinary Sciences)

All Articles (1,301)

  • Data Descriptor
  • Open Access

This dataset presents a real-world lipidomics resource for developing and benchmarking quality control methods, batch effect detection algorithms, and data validation workflows. The data were collected at a tertiary university rheumatology center in a cross-sectional clinical study of psoriatic arthritis (PsA) patients (n = 81) and healthy controls (n = 26), matched for age, sex, and body mass index. Subtle laboratory irregularities, which had gone unnoticed by conventional quality control and standard analytical methods, were detected only through advanced unsupervised analysis. Blood samples were processed using standardized protocols and analyzed using high-resolution and tandem mass spectrometry platforms. Both targeted and untargeted lipid assays captured lipids of several classes (including carnitines, ceramides, glycerophospholipids, sphingolipids, glycerolipids, fatty acids, sterols and esters, endocannabinoids). The dataset is organized into four comma-separated value (CSV) files: (1) Box–Cox-transformed and imputed lipidomics values; (2) outlier-cleaned and imputed values on the original scale; (3) metadata including clinical classifications, biological sex, and batch information for all assay types and control sample processing dates; and (4) a variable-level description file (readme.csv). The 292 lipid variables are named according to LIPID MAPS classification and standardized nomenclature. Complete batch documentation and a FAIR-compliant data structure make this dataset valuable for testing the robustness of analytical pipelines and quality control in lipidomics and related omics fields. The dataset is not intended to compete with larger lipidomics quality control datasets for comparisons of results; instead, it provides a real-life lipidomics dataset that displays traces of the laboratory sample processing schedule and can be used to challenge quality control frameworks.
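As a rough illustration of how the Box–Cox-transformed values in file (1) relate to values on the original scale, the standard one-parameter Box–Cox transform and its inverse can be sketched as follows. The descriptor does not state which λ values were used or whether λ was chosen per variable, so `lam` here is a hypothetical parameter, and the function names are illustrative only.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Standard one-parameter Box-Cox transform of a strictly positive value.

    `lam` is a hypothetical parameter; the dataset description does not
    report the lambda values actually used.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y: float, lam: float) -> float:
    """Map a Box-Cox-transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)
```

A round trip such as `inverse_box_cox(box_cox(2.5, 0.3), 0.3)` recovers the input up to floating-point error. This only shows the relationship between the two scales; it does not reproduce the outlier cleaning or imputation steps described for the released files.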

3 February 2026

Heatmap of lipid abundance across samples, ordered by psoriatic arthritis (PsA) and ESOM cluster status. Color intensity represents relative abundance, scaled from 0 (lowest) to 100 (highest), for intuitive visualization. Warm colors (yellow to red) indicate higher abundance, while cool colors (blue) indicate lower abundance. Row annotations show the PsA and ESOM cluster classification for each sample. This visualization highlights differences in lipid profiles between patient groups and clusters. Note that the prior classes challenge standard clustering algorithms, which seem incapable of detecting clusters that reflect the underlying class structure. More sophisticated methods are therefore required, and this dataset can serve as a benchmark for them.
  • Data Descriptor
  • Open Access

This Data Descriptor provides an open, anonymized dataset describing anthropometric and physical fitness outcomes in undergraduate students enrolled in a Physical Activity and Sport Sciences degree program. The dataset includes 156 participants (28 females and 128 males) and reports sex, age, body mass, stature, and body mass index, alongside standardized field-based tests covering flexibility, muscular endurance, strength, and jump performance. Hip flexibility was assessed using the Thomas test on both sides. Trunk extensor endurance was measured using the Biering–Sørensen test, and upper-body strength–endurance was assessed using a dead-hang test. Upper limb strength was recorded as elbow flexion strength. Lower limb power was evaluated using vertical jump tests, including Abalakov, squat jump, and countermovement jump, and a derived indicator (IE) is provided to facilitate comparisons across jump modalities. The data are distributed as a machine-readable CSV file accompanied by a detailed data dictionary describing the variables, units, and missingness. The dataset is intended to support the reproducible reporting of normative fitness profiles in sports science students, facilitate teaching and benchmarking in exercise science contexts, and enable secondary analyses exploring associations between anthropometry and physical performance. For reproducible inferential comparisons, users may apply Welch’s two-sample t-test for sex-based differences.
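The Welch comparison suggested above can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' analysis code: the function name is hypothetical, and it returns only the t statistic and the Welch–Satterthwaite degrees of freedom (a p-value would additionally need a t-distribution CDF, for example from SciPy).

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share a
    variance, which suits unequal group sizes such as 28 vs. 128.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of the two group means (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

In practice, the two lists would hold one test variable (for instance a jump height column) split by the sex column of the CSV file; the column names to use are given in the accompanying data dictionary and are not assumed here.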

7 February 2026

  • Data Descriptor
  • Open Access

Indoor environmental comfort plays a central role in occupants’ well-being, learning outcomes, and productivity, especially in educational buildings characterized by high occupancy variability and diverse activities. This paper presents a real-world dataset collected at the Cesena Campus of the University of Bologna, aimed at supporting occupant-centric comfort analysis and prediction in classrooms and laboratories. The dataset integrates continuous environmental measurements, such as temperature, humidity, noise, air pressure, and CO2 concentration, with subjective comfort feedback gathered from students during regular lectures. Data were collected using permanently installed ceiling sensors and additional control sensors placed near occupants, enabling both longitudinal monitoring and validation analyses. Furthermore, the dataset includes both repeated comfort perception reports and a one-time comfort definition phase capturing individual relevance weights for different comfort dimensions. By combining objective and subjective data in realistic academic settings, the dataset provides a valuable resource for developing, benchmarking, and validating data-driven models for smart campus applications, indoor comfort prediction, and human-centered building analytics.

3 February 2026

  • Data Descriptor
  • Open Access

Dual-Source Synthetic Uzbek Corpora for Sentiment Analysis and NER with Controlled Emoji Signals

  • Bobur Saidov,
  • Vladimir Barakhnin and
  • Zarnigor Fayzullaeva
  • + 6 authors

This data descriptor presents two fully synthetic corpora for sentiment analysis and named entity recognition (NER) in Uzbek. The first corpus contains 12,000 hybrid synthetic sentences generated from templates with lexical randomization, automatic insertion of named entities (PER/ORG/LOC), lexicon-based polarity scoring, and a controlled emoji distribution. The second corpus includes 3000 “manual-style” sentences designed to resemble short, naturally structured messages. Although the manual-style subset was initially intended to be emoji-free, the released version includes a 39.6% emoji presence (sentences containing at least one emoji) to maintain comparability in emotional markers across corpora. Both corpora are released in CSV, XLSX, and JSONL formats and share a unified schema (id, text, sentiment, entities, entity_type, polarity_score, polarity_source, token_count, emojis, emoji_position, emoji_sentiment, conflict_flag, sentiment_from_polarity_score, split). The dataset is publicly available via Mendeley Data (DOI: 10.17632/y2d5pcyrzz.3).
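Because both corpora share one schema, a record from the JSONL release can be checked against the field list quoted above. The following is a minimal sketch: the field names come from the description, while the helper name `missing_fields` is hypothetical and nothing is assumed about value types or file names.

```python
import json

# Field names as listed in the unified schema of the data descriptor.
SCHEMA_FIELDS = (
    "id", "text", "sentiment", "entities", "entity_type",
    "polarity_score", "polarity_source", "token_count", "emojis",
    "emoji_position", "emoji_sentiment", "conflict_flag",
    "sentiment_from_polarity_score", "split",
)


def missing_fields(jsonl_line: str) -> list[str]:
    """Return the schema fields absent from one JSONL record (hypothetical helper)."""
    record = json.loads(jsonl_line)
    return [field for field in SCHEMA_FIELDS if field not in record]


# Placeholder record with every field set to null; not real corpus data.
placeholder = json.dumps(dict.fromkeys(SCHEMA_FIELDS))
print(missing_fields(placeholder))
```

Running the same check over every line of both corpora is one simple way to confirm that they can be concatenated or compared without column mapping.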

1 February 2026

Reprints of Collections

Data Mining and Computational Intelligence for E-learning and Education
Reprint

Editors: Antonio Sarasa Cabezuelo, Ramón González del Campo Rodríguez Barbero
Recent Advances and Applications in Partial Least Squares Structural Equation Modeling (PLS-SEM)
Reprint

Editors: María del Carmen Valls Martínez, José-María Montero, Pedro Antonio Martín Cervantes

Data - ISSN 2306-5729