Benchmarking Datasets in Bioinformatics, 2nd Edition

A special issue of Data (ISSN 2306-5729). This special issue belongs to the section "Computational Biology, Bioinformatics, and Biomedical Data Science".

Deadline for manuscript submissions: 31 July 2025 | Viewed by 2583


Special Issue Information

Dear Colleagues,

Over the last few years, computational prediction and identification methods have become increasingly important in the life sciences and medicine. Many algorithms and computational models have been developed to identify molecular structures, functions, interactions, and evolution, as well as their relationships with complex disorders. To validate these methods, many benchmarking datasets have been constructed, applied, and released into the public domain. Because these datasets form the basis for the fair comparison and validation of computational methods, they deserve thorough discussion and comparison in their own right. In this Special Issue, we aim to provide deep insights into the construction procedures and characteristics of different benchmarking datasets that cover the same or similar biological topics.

We are looking for manuscripts that discuss benchmarking datasets covering a single bioinformatics topic or a specific category of topics. These manuscripts may discuss and compare the construction procedures, data sources, and statistics of different datasets, as well as the computational methods developed and evaluated with them. There is no fixed boundary to these comparisons: all kinds of discussions, comments, and comparisons are welcome. Collections of datasets covering a single topic or closely related topics are particularly welcome, as they will facilitate the further development of computational methods. In general, any contribution related to bioinformatics benchmarking datasets may be included in this Special Issue.

Dr. Pufeng Du
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass the pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as they are accepted) and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Data is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics datasets
  • dataset construction
  • dataset comparisons
  • dataset qualities
  • dataset comments
  • dataset collections
  • comparison of computational methods based on datasets

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.


Published Papers (3 papers)


14 pages, 4526 KiB  
Data Descriptor
A Complementary Dataset of Scalp EEG Recordings Featuring Participants with Alzheimer’s Disease, Frontotemporal Dementia, and Healthy Controls, Obtained from Photostimulation EEG
by Aimilia Ntetska, Andreas Miltiadous, Markos G. Tsipouras, Katerina D. Tzimourta, Theodora Afrantou, Panagiotis Ioannidis, Dimitrios G. Tsalikakis, Konstantinos Sakkas, Emmanouil D. Oikonomou, Nikolaos Grigoriadis, Pantelis Angelidis, Nikolaos Giannakeas and Alexandros T. Tzallas
Data 2025, 10(5), 64; https://doi.org/10.3390/data10050064 (registering DOI) - 29 Apr 2025
Viewed by 68
Abstract
Research interest in electroencephalography (EEG) as a non-invasive diagnostic tool for the automated detection of neurodegenerative diseases is growing, and open-access datasets have become crucial for researchers developing such methods. Our previously published open-access dataset of resting-state (eyes-closed) EEG recordings from patients with Alzheimer’s disease (AD), frontotemporal dementia (FTD), and cognitively normal (CN) controls has attracted significant attention. In this paper, we present a complementary dataset of eyes-open photic stimulation recordings from the same cohort. The dataset includes recordings from 88 participants (36 AD, 23 FTD, and 29 CN) and is provided in Brain Imaging Data Structure (BIDS) format, promoting consistency and ease of use across research groups. A fully preprocessed version is also included; it was produced with EEGLAB-based pipelines involving filtering, artifact removal, and Independent Component Analysis (ICA), preparing the data for machine learning applications. This new dataset enables the study of brain responses to visual stimulation across different cognitive states and supports the development and validation of automated classification algorithms for dementia detection. It offers a valuable benchmark for both methodological comparisons and biological investigations and is expected to contribute significantly to neurodegenerative disease research, biomarker discovery, and EEG-based diagnostics.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)

7 pages, 407 KiB  
Data Descriptor
Draft Genome Sequence Data of the Ensifer sp. P24N7, a Symbiotic Bacteria Isolated from Nodules of Phaseolus vulgaris Grown in Mining Tailings from Huautla, Morelos, Mexico
by José Augusto Ramírez-Trujillo, Maria Guadalupe Castillo-Texta, Mario Ramírez-Yáñez and Ramón Suárez-Rodríguez
Data 2025, 10(3), 34; https://doi.org/10.3390/data10030034 - 27 Feb 2025
Viewed by 679
Abstract
In this work, we report the draft genome sequence of Ensifer sp. P24N7, a symbiotic nitrogen-fixing bacterium isolated from nodules of Phaseolus vulgaris var. Negro Jamapa plants grown in pots containing mining tailings from Huautla, Morelos, México. The genomic DNA was sequenced on an Illumina NovaSeq 6000 using a 250 bp paired-end protocol, yielding 1,188,899 reads. Assembly with SPAdes v. 3.15.4 produced a genome length of 7,165,722 bp in 181 contigs, with an N50 of 323,467 bp, a coverage of 76X, and a GC content of 61.96%. The genome was annotated with the NCBI Prokaryotic Genome Annotation Pipeline and contains 6631 protein-coding sequences, 3 complete rRNAs, 52 tRNAs, and 4 non-coding RNAs. The RAST server predicted 59 genes related to heavy metal tolerance in the Ensifer sp. P24N7 genome. These data may be useful to the scientific community as a reference for other work related to heavy metals, including studies in Huautla, Morelos.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)
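For readers less familiar with the assembly metrics quoted in this abstract, the sketch below shows how N50 and GC content are conventionally computed from a set of contigs. It is an illustrative sketch only, not the SPAdes or NCBI PGAP implementation; it assumes GC content is measured over unambiguous A/C/G/T bases (ignoring N), which is one common convention, and the function names `n50` and `gc_content` are hypothetical.

```python
from collections import Counter


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("assembly contains no contigs")


def gc_content(contigs):
    """Percentage of G and C among A/C/G/T bases.

    Assumption: N and other ambiguity codes are excluded from the denominator.
    """
    counts = Counter()
    for seq in contigs:
        counts.update(seq.upper())  # counts each character of the contig
    acgt = sum(counts[base] for base in "ACGT")
    return 100.0 * (counts["G"] + counts["C"]) / acgt


# Toy example with made-up contigs (not data from the Ensifer sp. P24N7 assembly).
print(n50([100, 80, 50, 30, 20]))                 # 80, since 100 + 80 = 180 >= 280 / 2
print(round(gc_content(["GGCC", "ATNN"]), 2))     # 66.67: 4 G/C out of 6 A/C/G/T bases
```

Sorting contigs from longest to shortest and stopping at the half-way point is what makes N50 a length-weighted statistic: a few long contigs dominate it, which is why it is reported alongside the contig count.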

11 pages, 1926 KiB  
Data Descriptor
Minisatellite Isolation and Minisatellite Molecular Marker Development in Citrus limon (L.) Osbeck
by Oleg S. Alexandrov and Dmitry V. Romanov
Data 2025, 10(1), 2; https://doi.org/10.3390/data10010002 - 28 Dec 2024
Viewed by 835
Abstract
Minisatellites are widespread tandem DNA repeats with a monomer length of 10 to 100 bp. The high variability of minisatellite loci makes them attractive for developing molecular markers. Minisatellites are used as markers according to three strategies: (1) probing digested genomic DNA with minisatellite-based probes; (2) amplification with primers based on the minisatellite sequences themselves; and (3) amplification with primers designed for the flanking regions upstream and downstream of the minisatellite locus. In this study, a minisatellite dataset was obtained by analyzing the Citrus limon (L.) Osbeck genome with Tandem Repeat Finder (TRF) and GMATA software. The minisatellite loci found were used to develop molecular markers, which were tested in GMATA using electronic PCR (e-PCR). The dataset includes the sequences of the extracted minisatellites and their characteristics (start and end nucleotide positions on the chromosome, monomer length, number of repeats, and array length), as well as the sequences of the developed primers, expected amplicon lengths, and e-PCR results. It can be used for marking lemon samples according to any of the three strategies and provides a useful basis for lemon variety certification, sample identification, verification of collections, lemon genome mapping, saturation of existing maps, and the study of lemon genome architecture, among other applications.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)
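This abstract defines minisatellites as tandem repeats with 10 to 100 bp monomers and uses e-PCR to obtain expected amplicon lengths. The sketch below illustrates both ideas in a deliberately simplified form: an exact-match tandem-repeat scan and an exact-match in-silico PCR on one strand. It is not how Tandem Repeat Finder or GMATA work (TRF detects approximate repeats that contain mismatches and indels); all function names, primers, and the toy locus are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def smallest_period(unit):
    """Length of the shortest block that tiles `unit` exactly."""
    for d in range(1, len(unit)):
        if len(unit) % d == 0 and unit == unit[:d] * (len(unit) // d):
            return d
    return len(unit)


def find_tandem_repeats(seq, min_period=10, max_period=100, min_copies=2.0):
    """Exact tandem repeats with monomers of min_period..max_period bp.

    Returns (start, end, monomer_length, copies), 0-based and end-exclusive.
    Unlike TRF, no mismatches or indels are tolerated.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period < n:
            # Extend while each base equals the base one monomer downstream.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            array_len = (j - i) + period
            copies = array_len / period
            # Skip arrays whose monomer is itself a shorter repeat
            # (e.g. (AT)n also looks periodic at 10 bp).
            if copies >= min_copies and smallest_period(seq[i:i + period]) == period:
                hits.append((i, i + array_len, period, round(copies, 2)))
            # Runs starting inside [i, j) are suffixes of this one; skip them.
            i = j + 1
    return hits


def e_pcr(template, fwd, rev):
    """Expected amplicon lengths for exact primer matches on the given strand."""
    template, fwd = template.upper(), fwd.upper()
    site = reverse_complement(rev)  # reverse-primer binding site read on this strand
    sizes = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(site, start + len(fwd))
        if end != -1:
            sizes.append(end + len(site) - start)
        start = template.find(fwd, start + 1)
    return sizes


# Hypothetical locus: flank + forward site + 4 x 10 bp monomer + reverse site + flank.
locus = "TT" + "GGATCC" + "ACGTTGCAAG" * 4 + "CATGCC" + "TT"
print(find_tandem_repeats(locus))
print(e_pcr(locus, "GGATCC", "GGCATG"))  # [52]: 6 bp + 40 bp array + 6 bp
```

Because the scan reports only exact, maximal runs, genomic minisatellites whose copies differ at a few positions would be split or missed; tolerating such differences is what dedicated tools like TRF are designed for.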
