New Multi-View Feature Learning Method for Accurate Antifungal Peptide Detection

Abstract: Antimicrobial resistance, particularly the emergence of resistant strains among fungal pathogens, has become a pressing global health concern. Antifungal peptides (AFPs) have shown great potential as an alternative therapeutic strategy owing to their inherent antimicrobial properties and potential application in combating fungal infections. However, identifying antifungal peptides experimentally is time-consuming and costly. Hence, there is a demand for fast and accurate computational approaches to identifying AFPs. This paper introduces a novel multi-view feature learning (MVFL) model, called AFP-MVFL, for accurate AFP identification. By integrating the sequential and physicochemical properties of amino acids in a multi-view approach, the AFP-MVFL model significantly enhances prediction accuracy. It achieves 97.9%, 98.4%, 0.98, and 0.96 in terms of accuracy, precision, F1 score, and Matthews correlation coefficient (MCC), respectively, outperforming previous studies in the literature.


Introduction
Fungal infections pose a significant threat to human health, affecting over one billion people worldwide annually [1]. Unlike bacteria, fungi, as eukaryotes, share similar biological characteristics with mammalian cells, making it challenging to develop antifungal drugs [2]. Currently, the clinical treatment options for fungal infections are limited to polyenes, azoles, echinocandins, and a few auxiliary drugs such as flucytosine, which are constrained by fungal resistance and have toxic side effects [3]. Therefore, there is a critical need to expand the repertoire of antifungal drugs [4].
Antifungal peptides (AFPs) are a class of naturally occurring peptides produced by organisms as a defense mechanism against fungal pathogens [5]. Typically consisting of 10-100 amino acids, AFPs are amphipathic, exhibit low toxicity, and act with high efficiency. Owing to these favorable characteristics, they have emerged as promising alternatives to chemical antifungal agents [6]. In contrast to traditional antifungal drugs, AFPs exhibit diverse modes of action, such as disrupting fungal cell membranes or inducing the production of reactive oxygen species (ROS) [7]. Identifying AFPs experimentally is time-consuming and costly, especially for pre-screening large numbers of AFP candidates. Therefore, there is a pressing need for computational models that can rapidly and accurately predict AFPs [8].
In recent years, a wide range of machine-learning-based approaches have been proposed to predict AFPs. For instance, Wei et al. introduced a computational model called AFP-MFL (multi-view feature learning) for accurately identifying AFPs by integrating different feature groups [9]. Agrawal et al. [10] employed a combination of amino acid composition (AAC), dipeptide composition (DPC), split amino acid composition, and binary profiles to characterize peptides, subsequently using a support vector machine (SVM) classifier to construct prediction models [11].
More recently, Ahmad et al. introduced a feature fusion scheme to integrate diverse peptide features, which were then used to train a deep neural network (DNN) for prediction [12]. Ahmad et al. later proposed another computational model for AFP prediction using sequential and evolutionary information extracted from peptides and employing a minimum redundancy maximum relevance (mRMR) method for feature selection [13]. Most recently, Zhang et al. proposed a machine-learning-based approach for accurately identifying and classifying AFPs by developing a comprehensive dataset of known AFPs and applying various feature extraction techniques to represent peptide sequences [14].
Most existing studies rely heavily on expert-knowledge-based handcrafted features to characterize intrinsic peptide properties [15]. These approaches struggle with short peptide sequences. For instance, descriptors such as AAC, DPC, and reduced amino acid alphabet composition (RAAAC) consider only the frequencies of individual amino acid residues, overlooking the sequential order of amino acids in the peptide. Integrating different feature vectors into a high-dimensional feature space has been used to achieve a more expressive representation [16]. Nevertheless, this often leads to the curse of dimensionality, introducing redundant information and heightened computational complexity [17].
To address these issues, we present AFP-MVFL, a new machine learning model based on multi-view feature learning (MVFL) for accurately identifying AFPs. The AFP-MVFL model leverages a diverse range of sequence-based information and physicochemical properties to comprehensively represent peptide characteristics [10]. By incorporating multiple properties of peptides, our model enhances its ability to capture the patterns underlying antifungal activity. AFP-MVFL achieves 97.9%, 98.4%, 0.98, and 0.96 in terms of accuracy, precision, F1 score, and Matthews correlation coefficient (MCC), respectively, outperforming previous studies in the literature. The AFP-MVFL model and its source code are publicly available at https://github.com/MuntahaMim/AFP-MVFL.git (accessed on 1 May 2024).

Materials and Methods
The initial steps include importing the necessary libraries, loading the training and test datasets, and converting labels into binary values (0 for the negative class and 1 for the positive class). The training data are then preprocessed via scaling to ensure uniformity. Next, a random forest classifier with 100 estimators is initialized and fitted to the scaled training data to determine feature importance. At this stage, features with an importance above the mean are selected and extracted. For evaluating classifier performance, stratified 10-fold cross-validation is employed. The general architecture of our model is presented in Figure 1. This section elaborates on the materials and methods used to build AFP-MVFL.
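The pipeline described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the actual model is trained on iFeature descriptors, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in for the iFeature descriptor matrix and binary labels.
X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# 1) Scale the features for uniformity.
X_scaled = StandardScaler().fit_transform(X)

# 2) Fit a 100-tree random forest and keep features whose importance
#    exceeds the mean importance (SelectFromModel's "mean" threshold).
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_scaled, y)
selector = SelectFromModel(rf, threshold="mean", prefit=True)
X_selected = selector.transform(X_scaled)

# 3) Evaluate with stratified 10-fold cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X_selected, y, cv=cv)
print(X_selected.shape[1], round(scores.mean(), 3))
```

The mean threshold typically retains well under half of the input features, which is the dimensionality reduction effect the pipeline relies on.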

Dataset
To ensure a comprehensive evaluation and proper comparison with previous studies, this study leverages three benchmark datasets, namely Antifp_DS1, Antifp_DS2, and Antifp_DS3, which have been widely used in the literature [17][18][19][20][21]. These datasets, outlined in Table 1, have distinct characteristics and compositions. In Antifp_DS1, Antifp_DS2, and Antifp_DS3, the positive samples originate from the data repository of antimicrobial peptides (DRAMP) [20], excluding sequences containing unnatural amino acids (B, J, O, U, X). However, the negative samples in each dataset differ. Antifp_DS1 negatives comprise active antimicrobial peptides other than antifungal peptides, while Antifp_DS2 negatives are randomly generated from Swiss-Prot. Notably, the maximum peptide length in these datasets is 100. Antifp_DS3, on the other hand, encompasses peptides with lengths ranging from 5 to 30. Positive samples in Antifp_DS3 were collected from the CAMP [21], DRAMP [22], and StarPep [23,24] databases, whereas its negatives were randomly generated from the Swiss-Prot database. As shown in Table 1, all three datasets are balanced (equal numbers of positive and negative samples) and are normally distributed.


Classifiers
To test the efficiency of our extracted features and identify the best classifier for our model, we investigated eight different classifiers, most of which have been used effectively in similar studies [25]. These eight classifiers are support vector machine (SVM), logistic regression (LR), decision tree (DT), rotation forest (RT), stochastic gradient descent (SGD), AdaBoost, naive Bayes (NB), and random forest (RF).

Support Vector Machine (SVM)
SVM is one of the most extensively used machine learning techniques in this field and has been shown to outperform other classifiers for similar tasks [26][27][28]. SVM identifies the maximum-margin hyperplane between classes to reduce prediction error and improve generalization. For linearly separable data, it generates a hyperplane with maximum margin separating the two classes. The SVM technique supports many kernel functions, such as linear, polynomial, radial basis function (RBF), and sigmoid kernels [29]. In this study, we used the linear kernel with regularization parameter C = 1, which controls the trade-off between a smooth decision boundary and classifying the training points correctly.
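For illustration, a linear-kernel SVM with C = 1 of the kind described above can be configured in scikit-learn as follows (synthetic data, not the study's actual training script):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the peptide feature matrix.
X, y = make_classification(n_samples=200, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Linear kernel; C trades margin width against training error.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 3))
```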

Logistic Regression (LR)
Logistic regression estimates class probabilities via the log-odds (logit) of a linear combination of the features. It has been employed for a variety of tasks with promising outcomes [30] and is an excellent model for estimating the likelihood of a linear solution to a problem [31]. In this study, we set the random state to 0 to ensure reproducibility.

Naive Bayes (NB)
In the field of machine learning and data mining, naive Bayes is regarded as one of the most prevalent types of classifiers [32]. It is based on the assumption of conditional independence between features. We used a Gaussian naive Bayes classifier instantiated with default parameters, so the class prior probabilities are estimated from the training data.

AdaBoost
AdaBoost is a boosting-based approach that starts from a basic classifier, also known as a weak learner, and improves its performance iteratively. It raises the weight of misclassified samples in each iteration so that they are more likely to be classified correctly in subsequent iterations [33]. AdaBoost's performance strongly depends on its weak learner's performance in each iteration. We implemented the AdaBoost classifier using decision trees as weak learners with n_estimators = 50.

Random Forest (RF)
Proposed by Breiman in 2001 [34], random forest builds a powerful and diverse ensemble by training decision trees on numerous random subsets of the data drawn using the bagging technique [34]. Random forest is a flexible technique for large-scale problems and has yielded promising results for various challenges [35].

Stochastic Gradient Descent (SGD)
Stochastic gradient descent (SGD) is a robust optimization algorithm widely used in machine learning and deep learning. It is a variant of the gradient descent method that is particularly well suited to large datasets and complex models, and it has obtained promising results in similar studies [36,37].

Decision Tree (DT)
A decision tree is a non-parametric supervised learning approach for classification and regression applications. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes [38].

Feature Extraction
In this study, we employed iFeature, a widely used feature extraction tool, to extract informative features from the input data [39]. iFeature provides a comprehensive set of feature descriptors that capture diverse aspects of the data, enabling a more thorough analysis. iFeature can compute an extensive array of 18 primary sequence encoding schemes covering 53 diverse feature descriptors. Within various feature categories, users can also extract distinct physicochemical properties of amino acids from the AAindex database [40]. The following commonly used feature descriptors are calculated and extracted using iFeature.

Amino Acid Composition (AAC)
AAC presents the frequency of each amino acid residue in the protein sequence, providing insights into the overall amino acid distribution, which can be indicative of certain functional properties [41]. The frequencies are computed for all 20 natural amino acids, denoted as "ACDEFGHIKLMNPQRSTVWY". The AAC feature vector therefore has 20 elements, each giving the frequency (or percentage) of one standard amino acid in the protein sequence.
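For illustration, the AAC vector can be computed in a few lines of plain Python (a sketch, not iFeature's implementation):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition vector."""
    seq = sequence.upper()
    n = len(seq)
    # Frequency of each standard amino acid in the sequence.
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

vec = aac("GIGKFLHSAKKFGKAFVGEIMNS")  # toy peptide of standard residues
print(len(vec))
```

Because every residue of the toy peptide is one of the 20 standard amino acids, the 20 frequencies sum to 1.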

Composition, Transition, and Distribution (CTDC, CTDT, CTDD)
The CTD descriptors characterize a sequence in terms of amino acid groups defined by physicochemical properties (e.g., hydrophobicity, polarity, charge). CTDC encodes the composition of each property group, CTDT the frequency of transitions between groups along the sequence, and CTDD the distribution of each group's residues across the sequence (e.g., the positions at which the first residue and the 25%, 50%, 75%, and 100% quantiles of each group occur). Together, they capture the presence and positional distribution of physicochemical patterns in the protein sequence.

Dipeptide Composition (DPC)
DPC quantifies the occurrence frequencies of different combinations of two adjacent amino acids in the protein sequence. It provides information about local structural patterns and short-range interactions.
The dipeptide composition (DPC) is a 400-dimensional feature vector [42], with each dimension representing the frequency of a specific dipeptide in the protein sequence.
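A corresponding sketch for the 400-dimensional DPC vector (again illustrative, not iFeature's code):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered dipeptides.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def dpc(sequence: str) -> list[float]:
    """Return the 400-dimensional dipeptide composition vector."""
    seq = sequence.upper()
    # Overlapping windows of two adjacent residues.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [pairs.count(dp) / total for dp in DIPEPTIDES]

vec = dpc("ACDEFGHIK")
print(len(vec))
```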

Grouped Amino Acid Composition (GAAC)
The grouped amino acid composition (GAAC) feature in iFeature groups amino acids into predefined categories or classes and then computes the composition of these groups. We used a basic grouping scheme that divides amino acids into four categories (e.g., hydrophobic, polar, charged, and aromatic). The GAAC feature vector thus has four elements, each representing the composition of one of these groups in the protein sequence.
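The grouping idea can be sketched as follows. The exact residue-to-group assignments below are an assumption for illustration only and may differ from the scheme used in the study:

```python
# Illustrative four-class grouping (disjoint, covering all 20 residues);
# the actual assignments in the study may differ.
GROUPS = {
    "hydrophobic": "AVLIMPG",
    "aromatic":    "FWY",
    "polar":       "STCNQ",
    "charged":     "DEKRH",
}

def gaac(sequence: str) -> dict[str, float]:
    """Return the fraction of residues falling in each group."""
    seq = sequence.upper()
    n = len(seq)
    return {name: sum(seq.count(aa) for aa in members) / n
            for name, members in GROUPS.items()}

comp = gaac("GIGKFLHSAKKFGKAFVGEIMNS")
print({k: round(v, 3) for k, v in comp.items()})
```

With disjoint groups covering all standard residues, the four fractions sum to 1.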

Grouped Dipeptide Composition (GDPC)
GDPC extends the grouped amino acid composition to dipeptides: each pair of consecutive amino acids is mapped to its pair of groups, and the composition of each ordered group pair is computed. With a five-group scheme, the GDPC is a 25-dimensional feature vector, with each dimension representing the frequency of one group pair. It provides a coarse-grained view of local composition and short-range physicochemical patterns.

Grouped Tripeptide Composition (GTPC)
GTPC extends the tripeptide composition by grouping tripeptides with similar physicochemical properties, allowing higher-level structural and functional patterns in the protein sequence to be captured. The GTPC is a 125-dimensional feature vector, with each dimension representing the frequency of tripeptides whose residues fall into a given triple of amino acid groups.

Tripeptide Composition (TPC)
TPC captures the occurrence frequencies of tripeptides (three consecutive residues) in the protein sequence. It provides insight into the arrangement and distribution of tripeptides, which can be relevant for identifying functional motifs.
The TPC feature vector is 8000-dimensional, with one element per possible tripeptide (20^3 = 8000), regardless of the sequence length L.

Feature Selection
As explained in Section 2.3, using iFeature we extract over 20,000 features. The number of features thus exceeds the number of training samples by roughly 20:1 (fewer than 1200 training samples versus over 20,000 features). Hence, reducing the number of features is necessary to avoid overfitting. In this study, after extracting the features using iFeature, we performed feature selection to identify the most effective features and filter out redundant features or those with limited discriminatory information. In this way, we obtain a shortened input feature vector, which enables us to build a more generalizable model. The significance of each feature in the training data is assessed using a random forest with 100 estimators. We investigated several feature selection techniques; among them, RF demonstrated the best performance, and it is considered an effective model for both feature selection and classification. Feature importances are calculated, and features whose importance surpasses the mean importance are selected for further analysis. These selected features are then extracted from both the training and test datasets to focus on the most informative aspects of the data [34,35].
The Gini index, widely employed in decision-tree-based algorithms such as RF, is a metric for evaluating impurity within a dataset [43]. Specifically, it quantifies the likelihood that a randomly selected element would be misclassified, reflecting the overall impurity of a set of data points. In the RF method, an ensemble of decision trees, the Gini index plays a crucial role in assessing how much each feature contributes to the model's predictive accuracy. Features that lead to nodes with lower impurity are considered more important, as they contribute to more accurate classifications [44].
Here, we focused on the most important features to reduce the model's dimensionality, which improved computational efficiency. Features with higher Gini importance scores contribute more to the overall predictive power of the model. This process yields a subset of features that are not only relevant individually but also collectively provide meaningful information for the prediction task.
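The mean-importance threshold described above can be applied directly to a random forest's Gini importances; a minimal sketch with synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 100 features, only 10 of which are informative.
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gini importance = normalized mean decrease in impurity per feature.
importances = rf.feature_importances_
mask = importances > importances.mean()  # keep above-mean features only
X_reduced = X[:, mask]
print(X_reduced.shape)
```

Since the importances are normalized to sum to 1, the mean threshold equals 1/n_features, so only features that carry more than an "average share" of impurity reduction survive.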

Performance Evaluation
To assess the performance and generalization capability of the different classifiers, stratified 10-fold cross-validation and independent test sets are used. We run each experiment 10 times and report the average.

Evaluation Metrics
For the evaluation of our model's performance, various metrics are used, including accuracy (ACC), precision (PRE), the area under the precision-recall curve (AUPRC), the area under the receiver operating characteristic curve (AUC), the Matthews correlation coefficient (MCC), and the F1 score. These metrics serve as reliable measures of the effectiveness and robustness of the model. The threshold-based metrics are defined as follows:

ACC = (TP + TN) / (TP + TN + FP + FN)
PRE = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 x PRE x Recall / (PRE + Recall)
MCC = (TP x TN - FP x FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP is the count of true positives, TN the count of true negatives, FP the count of false positives, and FN the count of false negatives.
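These metrics follow directly from the four confusion-matrix counts; a minimal sketch (the counts below are arbitrary illustrative values):

```python
import math

def metrics(TP, TN, FP, FN):
    """Compute ACC, PRE, Recall, F1, and MCC from confusion-matrix counts."""
    acc = (TP + TN) / (TP + TN + FP + FN)
    pre = TP / (TP + FP)
    rec = TP / (TP + FN)
    f1 = 2 * pre * rec / (pre + rec)
    mcc = (TP * TN - FP * FN) / math.sqrt(
        (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return acc, pre, rec, f1, mcc

# Arbitrary example counts for a balanced 200-sample test set.
acc, pre, rec, f1, mcc = metrics(TP=90, TN=85, FP=15, FN=10)
print(round(acc, 3), round(f1, 3), round(mcc, 3))
```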

Results and Discussion
This section presents the experimental outcomes of the multiple models employed for AFP prediction, utilizing diverse sequence encoding techniques and machine learning frameworks. A comprehensive comparison between our proposed model and state-of-the-art AFP classifiers is also provided.

Performance of the Model for Different Classifiers
We conducted a comprehensive comparison of different classifiers to identify the best one for building AFP-MVFL. The results for 10-fold cross-validation and the independent test set on the Antifp_DS1 dataset are presented in Tables 2 and 3, respectively. Note that for this comparison, we used all the features extracted with iFeature (no feature selection). As shown in these tables, RF performs better than the other classifiers used in this study. Using RF, we achieve an accuracy of 93.5%, an F1 score of 0.92, a precision of 91.2%, and an MCC of 0.89 under 10-fold cross-validation. RF also stands out with the highest accuracy of 93.8%, an F1 score of 0.93, a precision of 96.6%, and an MCC of 0.80 on the independent test dataset.
As shown in Tables 4 and 5, RF again consistently delivered the top performance among all classifiers on the Antifp_DS2 dataset under 10-fold cross-validation and on the independent test set, respectively. It achieves accuracies of 93.5% and 93.1%, F1 scores of 0.93 and 0.93, MCC scores of 0.87 and 0.86, and precisions of 92.6% and 92.3% for 10-fold cross-validation and the independent test dataset, respectively. We also conducted the comparison on the Antifp_DS3 dataset and summarize the results in Tables 6 and 7 for 10-fold cross-validation and the independent test set, respectively. As shown in these tables, the model built with RF again performs best among all the classifiers, with ACC values of 93.7% and 94.1%, F1 scores of 0.93 and 0.92, MCC scores of 0.82 and 0.87, and precisions of 94.3% and 95.1%, respectively. The ROC curves of all models on the independent test sets of Antifp_DS1, Antifp_DS2, and Antifp_DS3 are displayed in Figures 2-4, respectively. As shown in these figures, RF demonstrates better results than the other classifiers.

Results Achieved on the Selected Feature Set
Based on our comparison study, we use RF as the main classifier to build AFP-MVFL. Next, we apply our feature selection step and compare the results of RF with and without feature selection. The results of this comparison for Antifp_DS1 are presented in Table 8. As shown in this table, the model constructed with feature selection consistently outperforms the alternative model (RF on the whole feature set without feature selection) in terms of accuracy (ACC), precision (PRE), F1 score, and MCC. Specifically, employing the random forest algorithm with 100 estimators in conjunction with feature selection yielded superior predictive capability, resulting in ACC values of 97.9% and 97.6%, F1 scores of 0.98 and 0.75, and MCC scores of 0.95 and 0.95 for the Antifp_DS1 dataset, respectively. These results substantiate the advantage of employing feature selection in improving the overall performance of the predictive models.

Comparison of the Proposed Model with Existing Models
Next, to investigate the effectiveness of our proposed model (AFP-MVFL), we compare its results against other state-of-the-art models found in the literature. The results achieved by AFP-MVFL compared with previous studies on Antifp_DS1 are presented in Table 9. As demonstrated in this table, AFP-MVFL outperforms previous studies, including [9,10,12,14,19,21,23,45], across all evaluation metrics. Compared with AFP-MFL, AFP-MVFL achieves improvements of 2.1%, 2.4%, 1.3%, and 0.04 in terms of ACC, F1 score, precision, and MCC, respectively, on the Antifp_DS1 dataset. We also compare AFP-MVFL's performance with the state-of-the-art methods found in the literature on the Antifp_DS2 and Antifp_DS3 datasets. The experimental results obtained on these datasets are presented in Table 10. AFP-MVFL consistently outperforms the competing approaches across all evaluation metrics in each dataset.
Specifically, when tested on the Antifp_DS2 dataset, AFP-MVFL achieves improved prediction rates with an accuracy of 98.3%, a precision of 99.1%, an F1 score of 0.98, and an MCC of 0.97, a relative increase over the AFP-MFL model. On the Antifp_DS3 dataset, AFP-MVFL achieves an accuracy of 97.4%, a precision of 98.4%, an F1 score of 0.97, and an MCC of 0.95, representing a relative improvement over the previous three models. These results establish that AFP-MVFL consistently outperforms other methods in distinguishing AFPs from non-AFPs across all evaluated datasets.
We also generated t-SNE graphs (Figures 5-7) for the Antifp_DS1, Antifp_DS2, and Antifp_DS3 datasets to explore the importance and contribution of different features. We chose t-SNE to investigate feature importance since it was demonstrated to be a better candidate than principal component analysis (PCA) in similar studies [46]. By plotting the t-SNE embedding, we can visualize the data in a reduced space, which helps identify which features are most relevant for distinguishing between different data points [47]. By visualizing the distribution of instances in the reduced space, we can also assess the quality of the feature selection process [48]. These results highlight the robustness and generalizability of AFP-MVFL. By integrating sequence-based and physicochemical feature views, AFP-MVFL effectively generates more informative features and consequently outperforms alternative methods, positioning it as a reliable tool for AFP prediction.
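A t-SNE projection of the kind used for these figures can be produced with scikit-learn; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

# Synthetic stand-in for the selected feature matrix.
X, y = make_classification(n_samples=150, n_features=40, random_state=0)

# Project to 2-D; perplexity must stay below the number of samples.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)
```

The resulting 2-D embedding can then be scattered with one color per class label, as in Figures 5-7.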


Conclusions
The accurate prediction of antifungal peptides is crucial for the advancement of therapeutic peptide design. In this study, we proposed a new machine learning framework called AFP-MVFL to predict AFPs accurately. Our approach employed a multi-view feature learning strategy to extract informative features from diverse perspectives, encompassing sequence composition and the physicochemical properties of amino acids.
AFP-MVFL first generated comprehensive profiles of peptide features by incorporating a set of sequence-based descriptors, and it achieved accurate AFP prediction based solely on sequence-based input features using the multi-view approach. Through rigorous cross-validation experiments conducted on three benchmark datasets, we demonstrated the superior performance of AFP-MVFL compared with state-of-the-art methods in AFP prediction. Overall, AFP-MVFL presents a robust tool for accurate AFP prediction based solely on sequence information. The AFP-MVFL model and its source code are publicly available at https://github.com/MuntahaMim/AFP-MVFL.git (accessed on 1 May 2024).
One of the main limitations of this study is the limited number of samples available to train our model. As shown in Section 3, the results on the independent test set are similar to or slightly better than those reported using 10-fold cross-validation, which suggests that using all the training data to build the model yields better performance; with more samples, we would likely achieve better results still. Therefore, as a future direction, we aim to build larger benchmarks to train more complex models and possibly enhance prediction performance. We also aim to investigate more complex classification models to further enhance prediction performance in identifying unknown antifungal peptides.

Figure 1 .
Figure 1. The overall architecture of AFP-MVFL. The AFP prediction pipeline consists of three modules: (i) a feature extraction module using iFeature; (ii) feature selection using random forest; (iii) a classification module for the prediction task.


Figure 2 .
Figure 2. ROC curve of the results for various classification models on the independent test of the Antifp_DS1 dataset.

Figure 3 .
Figure 3. ROC curve of the results for various classification models on the independent test of the Antifp_DS2 dataset.


Figure 4 .
Figure 4. ROC curve of the results for various classification models on the independent test of the Antifp_DS3 dataset.


Figure 5 .
Figure 5. Feature visualization of AFP-MVFL on the AntiFP_DS1 dataset. Blue dots correspond to negative instances (label 0) and red dots to positive instances (label 1).

Figure 6 .
Figure 6. Feature visualization of random forest on the AntiFP_DS2 dataset. Blue dots correspond to negative instances (label 0) and red dots to positive instances (label 1).


Figure 7 .
Figure 7. Feature visualization of random forest on the AntiFP_DS3 dataset. Blue dots correspond to negative instances (label 0) and red dots to positive instances (label 1).


Table 2 .
The results of comparing machine learning algorithms based on various performance metrics using 10-fold cross-validation for the Antifp_DS1 dataset.

Table 3 .
The results of comparing machine learning algorithms based on various performance metrics using an independent test set for the Antifp_DS1 dataset.

Table 4 .
The results of comparing machine learning algorithms based on various performance metrics using 10-fold cross-validation for the Antifp_DS2 dataset.

Table 5 .
The results of comparing machine learning algorithms based on various performance metrics using an independent test set for the Antifp_DS2 dataset.

Table 6 .
The results of comparing machine learning algorithms based on various performance metrics using 10-fold cross-validation for the Antifp_DS3 dataset.

Table 7 .
The results of comparing machine learning algorithms based on various performance metrics using an independent test set for the Antifp_DS3 dataset.

Table 8 .
Comparison of the AFP-MVFL model with and without feature selection of Antifp_DS1.

Table 9 .
Comparison of AFP-MVFL with other antifungal peptide predictors on the independent test dataset of Antifp_DS1.

Table 10 .
Comparison of AFP-MVFL with other antifungal peptide predictors on Antifp_DS2 and Antifp_DS3 datasets.