Machine Learning Supervised Algorithms in Bioinformatics

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Bioinformatics".

Deadline for manuscript submissions: closed (15 January 2023) | Viewed by 6354

Special Issue Editor


Dr. Tiziana Castrignanò
Guest Editor
Department of Ecology and Biology, University of Tuscia, San Camillo De Lellis, 01100 Viterbo, Italy
Interests: bioinformatics; computational biology; genomics; transcriptomics

Special Issue Information

Dear Colleagues,

With the advent of massive sequencing technologies and, more generally, increasingly sophisticated high-throughput experimental platforms, the amount of raw biological data produced in recent years has grown dramatically. This data deluge has led to the implementation and testing of powerful computational methods, such as machine learning techniques, to handle the complexity of biological data. Moreover, omics technologies coupled with other heterogeneous data, such as phenotypic, structural and imaging data, help to shed light on the enormous molecular complexity of living beings, although many aspects remain uncharacterized because much data is still missing. Supervised machine learning algorithms, which learn correlations between variables in annotated training data and use this information to infer annotations for new data, have already been used with great success in bioinformatics to predict biological events with greater reliability and accuracy than traditional algorithms.

This Special Issue is open to contributions concerning challenging research in different bioinformatics areas (e.g., disease and health genomics, genomics and transcriptomics of both model and non-model organisms, eco-evolutionary genomics and phylogenomics, epigenomics, metagenomics, structural biology and bioimaging), addressed with supervised machine learning tools and algorithms.

This Special Issue on "Supervised Machine Learning Algorithms in Bioinformatics" aims to cover a wide range of topics, including theoretical, experimental, methodological or data contributions, as well as systematic reviews that contribute substantially to the state of the art. Topics include, but are not limited to, applications in bioinformatics of ML (machine learning) algorithms, such as SVM, KNN, regression or random forest, and DL (deep learning) algorithms, such as RNN or CNN.

Dr. Tiziana Castrignanò
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics
  • machine learning
  • computational biology
  • genomics
  • transcriptomics

Published Papers (3 papers)


Research

13 pages, 474 KiB  
Article
Efficient Selection of Gaussian Kernel SVM Parameters for Imbalanced Data
by Chen-An Tsai and Yu-Jing Chang
Genes 2023, 14(3), 583; https://doi.org/10.3390/genes14030583 - 25 Feb 2023
Cited by 4 | Viewed by 1415
Abstract
For medical data mining, class prediction models are widely used to deal with various kinds of data classification problems. Classification models for high-dimensional gene expression datasets in particular have attracted many researchers seeking to identify marker genes that distinguish cancer cells from their corresponding normal cells. However, skewed class distributions often occur in medical datasets, in which at least one of the classes has a relatively small number of observations. A classifier induced from such an imbalanced dataset typically has high accuracy for the majority class and poor prediction for the minority class. In this study, we focus on an SVM classifier with a Gaussian radial basis kernel for a binary classification problem. To take advantage of an SVM and achieve the best generalization ability, we address two important problems: class imbalance and parameter selection during SVM parameter optimization. First, we propose a novel adjustment method, called b-SVM, for adjusting the cutoff threshold of the SVM. Second, we propose a fast and simple approach, called Min-max gamma selection, to optimize the model parameters of SVMs without carrying out an extensive k-fold cross validation. An extensive comparison with a standard SVM and well-known existing methods is carried out to evaluate the performance of our proposed algorithms using simulated and real datasets. The experimental results show that our proposed algorithms outperform the over-sampling techniques and existing SVM-based solutions. This study also shows that the proposed Min-max gamma selection is at least 10 times faster than cross-validation selection, based on the average running time on six real datasets.
(This article belongs to the Special Issue Machine Learning Supervised Algorithms in Bioinformatics)
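
As a rough illustration of the cutoff-adjustment idea mentioned in the abstract above, the sketch below takes decision values from any binary classifier and picks the cutoff that maximizes balanced accuracy, rather than using the default cutoff of 0. This is not the authors' b-SVM procedure, whose details are not given in this listing; the function names and the toy decision values are hypothetical and chosen only for illustration. The Gaussian kernel formula is the standard one.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Standard Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def balanced_accuracy(labels, scores, cutoff):
    """Mean of sensitivity and specificity, predicting +1 when score > cutoff."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == -1 and s <= cutoff)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def best_cutoff(labels, scores):
    """Scan midpoints between sorted scores (plus the default 0.0) and
    return the cutoff with the highest balanced accuracy."""
    ordered = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(0.0)
    return max(candidates, key=lambda c: balanced_accuracy(labels, scores, c))

# Hypothetical decision values: 8 majority samples (-1), 2 minority samples (+1).
# One minority sample falls below the default cutoff of 0.
labels = [-1] * 8 + [1] * 2
scores = [-2.1, -1.8, -1.5, -1.4, -1.2, -1.1, -0.9, -0.8, -0.3, 0.4]

default_ba = balanced_accuracy(labels, scores, 0.0)
shifted = best_cutoff(labels, scores)
shifted_ba = balanced_accuracy(labels, scores, shifted)
```

On this toy data the default cutoff misclassifies one of the two minority samples, while the shifted cutoff separates the classes. In practice the cutoff would be chosen on training or validation data, not on the data used to report performance.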

11 pages, 2328 KiB  
Article
GReNaDIne: A Data-Driven Python Library to Infer Gene Regulatory Networks from Gene Expression Data
by Pauline Schmitt, Baptiste Sorin, Timothée Frouté, Nicolas Parisot, Federica Calevro and Sergio Peignier
Genes 2023, 14(2), 269; https://doi.org/10.3390/genes14020269 - 20 Jan 2023
Cited by 3 | Viewed by 2477
Abstract
Context: Inferring gene regulatory networks (GRN) from high-throughput gene expression data is a challenging task for which different strategies have been developed. Nevertheless, no method wins in every setting, and each method has its advantages, intrinsic biases, and application domains. Thus, in order to analyze a dataset, users should be able to test different techniques and choose the most appropriate one. This step can be particularly difficult and time consuming, since most methods' implementations are made available independently, possibly in different programming languages. The implementation of an open-source library containing different inference methods within a common framework is expected to be a valuable toolkit for the systems biology community. Results: In this work, we introduce GReNaDIne (Gene Regulatory Network Data-driven Inference), a Python package that implements 18 machine learning data-driven gene regulatory network inference methods. It also includes eight generalist preprocessing techniques, suitable for both RNA-seq and microarray dataset analysis, as well as four normalization techniques dedicated to RNA-seq. In addition, this package makes it possible to combine the results of different inference tools to form robust and efficient ensembles. This package has been successfully assessed on the DREAM5 challenge benchmark dataset. The open-source GReNaDIne Python package is made freely available in a dedicated GitLab repository, as well as in the official third-party software repository PyPI (Python Package Index). The latest documentation on the GReNaDIne library is also available at Read the Docs, an open-source software documentation hosting platform. Contribution: The GReNaDIne tool represents a technological contribution to the field of systems biology. This package can be used to infer gene regulatory networks from high-throughput gene expression data using different algorithms within the same framework. In order to analyze their datasets, users can apply a battery of preprocessing and postprocessing tools, choose the most suitable inference method from the GReNaDIne library, and even combine the output of different methods to obtain more robust results. The results format provided by GReNaDIne is compatible with well-known complementary refinement tools such as PYSCENIC.
(This article belongs to the Special Issue Machine Learning Supervised Algorithms in Bioinformatics)
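
One common way to combine the outputs of several network inference methods, as the abstract above describes in general terms, is to average the rank each method assigns to a candidate regulator-target edge. The sketch below shows that idea only; it does not use or reproduce the GReNaDIne API, and the function names, edge labels and scores are hypothetical.

```python
def rank_edges(scores):
    """Map each edge to its rank within one method's output (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: rank for rank, edge in enumerate(ordered, start=1)}

def average_rank_ensemble(method_scores):
    """Average per-method ranks over the edges scored by every method.
    Returns edges sorted from strongest to weakest consensus
    (lowest mean rank first, ties broken by edge name)."""
    ranks = [rank_edges(s) for s in method_scores]
    edges = set.intersection(*(set(r) for r in ranks))
    mean_rank = {e: sum(r[e] for r in ranks) / len(ranks) for e in edges}
    return sorted(mean_rank, key=lambda e: (mean_rank[e], e))

# Hypothetical edge scores from two methods on incomparable scales.
method_a = {("tf1", "g1"): 0.9, ("tf1", "g2"): 0.2, ("tf2", "g1"): 0.5}
method_b = {("tf1", "g1"): 12.0, ("tf1", "g2"): 7.0, ("tf2", "g1"): 3.0}

consensus = average_rank_ensemble([method_a, method_b])
```

Working on ranks rather than raw scores sidesteps the fact that different methods report scores on different scales.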

19 pages, 8813 KiB  
Article
Integrated Tissue and Blood miRNA Expression Profiles Identify Novel Biomarkers for Accurate Non-Invasive Diagnosis of Breast Cancer: Preliminary Results and Future Clinical Implications
by Fei Su, Ziyu Gao, Yueyang Liu, Guiqin Zhou, Ying Cui, Chao Deng, Yuyu Liu, Yihao Zhang, Xiaoyan Ma, Yongxia Wang, Lili Guan, Yafang Zhang and Baoquan Liu
Genes 2022, 13(11), 1931; https://doi.org/10.3390/genes13111931 - 24 Oct 2022
Cited by 1 | Viewed by 1861
Abstract
We aimed to identify miRNAs that were closely related to breast cancer (BRCA). By integrating several methods including significance analysis of microarrays, fold change, Pearson's correlation analysis, t test, and receiver operating characteristic analysis, we developed a decision-tree-based scoring algorithm, called Optimized Scoring Mechanism for Primary Synergy MicroRNAs (O-PSM). Five synergy miRNAs (hsa-miR-139-5p, hsa-miR-331-3p, hsa-miR-342-5p, hsa-miR-486-5p, and hsa-miR-654-3p) were identified using O-PSM. They were used to distinguish normal samples from pathological ones and showed good results in blood data and in multiple sets of tissue data. These five miRNAs showed accurate categorization efficiency in BRCA typing and staging and had better categorization efficiency than experimentally verified miRNAs. In the Protein-Protein Interaction (PPI) network, the target genes of hsa-miR-342-5p have the most regulatory relationships, which regulate carcinogenesis proliferation and metastasis by regulating Glycosaminoglycan biosynthesis and the Rap1 signaling pathway. Moreover, hsa-miR-342-5p showed potential clinical application in survival analysis. We also used O-PSM to generate an R package available on GitHub (SuFei-lab/OPSM, accessed on 22 October 2021). We believe that miRNAs included in O-PSM could have clinical implications for diagnosis, prognostic stratification and treatment of BRCA, proposing potentially significant biomarkers that could be utilized to design personalized treatment plans for BRCA patients in the future.
(This article belongs to the Special Issue Machine Learning Supervised Algorithms in Bioinformatics)
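
Two of the per-feature statistics named in the abstract above, fold change and receiver operating characteristic analysis, can be sketched in a few lines. This is a generic illustration, not the O-PSM scoring algorithm; the function names and expression values are hypothetical, and the fold change assumes positive expression values on a linear scale.

```python
import math

def log2_fold_change(case, control):
    """log2 of the ratio of mean expression in cases to mean expression in controls."""
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2(mean_case / mean_control)

def auc(case, control):
    """Area under the ROC curve for one feature: the probability that a random
    case value exceeds a random control value, with ties counted as 0.5."""
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in case
        for k in control
    )
    return wins / (len(case) * len(control))

# Hypothetical expression values for one miRNA that is lower in tumor samples.
tumor = [2.0, 4.0, 6.0, 4.0]
normal = [8.0, 6.0, 10.0, 8.0]

lfc = log2_fold_change(tumor, normal)
area = auc(tumor, normal)
# A down-regulated marker has an AUC near 0, so report its
# discriminative power in a direction-agnostic way.
discrimination = max(area, 1.0 - area)
```

A feature with a large absolute log fold change and a discrimination value close to 1 would be a candidate marker under this simple screen; a real analysis would also need significance testing and validation on independent data.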