Article

Federated Multi-Label Learning (FMLL): Innovative Method for Classification Tasks in Animal Science

1 Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey
2 Department of Computer Engineering, Dokuz Eylul University, Izmir 35390, Turkey
3 Information Technologies Research and Application Center (DEBTAM), Dokuz Eylul University, Izmir 35390, Turkey
* Author to whom correspondence should be addressed.
Animals 2024, 14(14), 2021; https://doi.org/10.3390/ani14142021
Submission received: 6 May 2024 / Revised: 2 July 2024 / Accepted: 8 July 2024 / Published: 9 July 2024

Simple Summary

This study addresses the classification task in animal science, which helps organize and analyze complex data, essential for making informed decisions. It introduces Federated Multi-Label Learning (FMLL), a novel approach combining federated learning principles with a multi-label learning technique. Using machine learning strategies, FMLL achieved significant improvements in classification accuracy metrics compared to existing methods. The experimental results on different animal datasets demonstrated the effectiveness of FMLL and its superiority in multi-label classification tasks. The findings of our study offer valuable insights into understanding and managing animal populations, which could have important implications for biodiversity conservation and ecological management.

Abstract

Federated learning is a collaborative machine learning paradigm where multiple parties jointly train a predictive model while keeping their data decentralized. On the other hand, multi-label learning deals with classification tasks where instances may simultaneously belong to multiple classes. This study introduces the concept of Federated Multi-Label Learning (FMLL), combining these two important approaches. The proposed approach leverages federated learning principles to address multi-label classification tasks. Specifically, it adopts the Binary Relevance (BR) strategy to handle the multi-label nature of the data and employs the Reduced-Error Pruning Tree (REPTree) as the base classifier. The effectiveness of the FMLL method was demonstrated by experiments carried out on three diverse datasets within the context of animal science: Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy. The accuracy rates achieved across these animal datasets were 73.24%, 94.50%, and 86.12%, respectively. Compared to state-of-the-art methods, FMLL exhibited remarkable improvements (above 10%) in average accuracy, precision, recall, and F-score metrics.

1. Introduction

Animal science is an area where machine learning (ML) has proven effective in analyzing animal datasets and making predictions for future decisions. ML techniques have been utilized for different purposes such as animal health surveillance, outlier animal behavior detection, animal activity recognition, animal detection systems, and animal species classification. Moreover, multi-label learning as a subfield of ML has gained traction in animal science for handling complex scenarios where multiple labels need to be predicted simultaneously [1,2,3]. Furthermore, combining multi-label classification with federated learning (FL) enables distributed and privacy-preserving machine learning applications. Recent studies have demonstrated the effectiveness of FL in animal science initiatives, including federated frameworks for diagnosing and predicting animal diseases, monitoring animal welfare, predicting collaborative disease outbreaks, and implementing decentralized systems for animal tracking and detection [4,5,6]. These advancements highlight the potential of federated multi-label learning to revolutionize animal science by integrating robust predictive modeling with secure data-sharing mechanisms.
Federated learning is a collaborative ML approach that was introduced in 2016 [7]. In the FL framework, multiple clients work together to address machine learning problems, overseen by a central aggregator. This setup ensures that training data remains decentralized, safeguarding the privacy of each client’s data. In this framework, client data remains stored locally, and local models are trained on multiple nodes. Gaining popularity in recent years, this kind of distributed machine-learning technique builds a central model by aggregating local models, thereby reducing the computational complexity of training [8]. Consequently, federated learning proves highly beneficial in resolving privacy issues associated with data islands and holds promise for deployment across diverse edge devices [9,10].
Multi-label learning is a sophisticated machine learning paradigm that extends traditional classification techniques by allowing instances to be associated with multiple labels simultaneously. Unlike conventional single-label classification tasks where each instance is assigned to a single class, multi-label learning builds a model in which instances may exhibit multiple attributes or characteristics. This paradigm finds widespread application in domains where instances are inherently multi-faceted, such as image recognition [11], text classification [12], and biology [13]. For example, in biological classification tasks, multi-label learning can be applied to predict the functions of elements based on their multiple roles within biological pathways. Multi-label learning algorithms aim to capture the complex relationships between instances and their associated labels, finding applications across other fields, e.g., animal science [14], healthcare [15], social media [16], geoscience [17], and transportation [18], where data instances may belong to various classes at the same time.
Multi-label learning entails its own set of challenges. One common challenge is the increased complexity of model training and evaluation since multi-label datasets typically exhibit larger sizes and greater complexity compared to single-label datasets. Another challenge is that the presence of multiple labels can further complicate the learning process and require specialized algorithms. To tackle these obstacles, researchers have developed the binary relevance (BR) approach, which streamlines the learning process and facilitates the utilization of standard binary classifiers, such as support vector machines [19]. Additionally, techniques such as label powersets and classifier chains have been proposed to tackle different aspects of the multi-label learning problem.
The Reduced-Error Pruning Tree (REPTree) algorithm is another method employed in machine learning, particularly in the context of decision tree-based classification tasks. REPTree aims to construct an optimal decision tree by iteratively pruning branches that do not contribute significantly to reducing classification error [20]. REPTree has applications in various domains such as animal science [21], environment [22], healthcare [23], and education [24]. When considering multi-label classification tasks, REPTrees can serve as effective binary classifiers within the binary relevance framework. Each REPTree can be trained independently to predict the absence or presence of a specific label, utilizing its pruning mechanism to optimize classification performance. They are simple yet powerful solutions, leveraging decision tree structures while handling the complexity of multiple labels per instance, to provide interpretable models that can manage both categorical and numerical data, making them suitable for a broad range of real-world problems.
The exploration of federated learning and multi-label learning, particularly in conjunction with methodologies such as the binary relevance approach and REPTree, remains relatively uncharted territory in the literature. Thus, in response to the evolving landscape of distributed data and complex classification tasks, we propose a novel approach, Federated Multi-Label Learning (FMLL) for classification tasks in the current study. Drawing upon established methodologies, namely Binary Relevance and Reduced-Error Pruning Tree (REPTree) approaches, our method aims to combine the strengths of federated learning and multi-label concepts to address the challenges inherent in distributed environments and multi-dimensional classification problems. The primary contributions of this study, setting it apart from other classification methods, are as follows:
(i) The paper presents the first-of-its-kind Federated Multi-Label Learning (FMLL) method that combines federated learning principles with the Binary Relevance approach as a multi-label learning technique and uses the REPTree algorithm to address classification tasks where instances may belong to multiple classes simultaneously.
(ii) FMLL contributes significantly to the field of animal science by offering a novel methodology for classifying diverse animal datasets. This advancement enables more accurate and efficient classification of animals based on various attributes, aiding researchers and practitioners in better understanding and managing animal populations.
(iii) FMLL harnesses federated learning principles, allowing multiple nodes to collaboratively train a model using their own local data. This distributes computational complexity over multiple nodes to improve efficiency and ensures privacy preservation and data security, which are crucial considerations in animal science research where large volumes of sensitive data may be involved.
(iv) The proposed approach adopts the Binary Relevance (BR) strategy to effectively handle the multi-label nature of the data. By accurately classifying instances belonging to multiple classes, FMLL enhances the understanding of complex relationships and characteristics within animal species datasets.
(v) FMLL pioneers the use of the Reduced-Error Pruning Tree (REPTree) classifier within federated learning, marking the first such instance in the literature. The REPTree was chosen for its effectiveness in addressing the complexities of multi-label classification tasks. This approach enhances both the accuracy and interpretability of classification results, representing a significant advancement in machine learning techniques applied to animal science.
(vi) The effectiveness of FMLL is empirically validated through experiments conducted on three diverse datasets within the domain of animal science: Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy. These experiments demonstrated the applicability and efficacy of FMLL in real-world scenarios, showcasing significant improvements in classification accuracy.
(vii) FMLL achieved remarkable improvements in classification accuracy across various animal datasets when compared to existing state-of-the-art methods. For instance, on the Amphibians dataset, FMLL achieved an average accuracy improvement of 10.92%. This improvement highlights the practical relevance and superiority of FMLL in multi-label classification tasks within the domain of animal science.
The structure of this paper unfolds as follows: Section 2 provides a concise review of related works, followed by Section 3, where we detail the materials and methods employed. Section 4 presents the experimental studies conducted, while Section 5 discusses the obtained results. Section 6 elucidates the conclusions drawn from our findings and delineates potential directions for future research on the proposed method.

2. Related Works

In recent years, many researchers have devoted their efforts to developing federated learning (FL) techniques, aiming to bolster the efficacy of machine learning (ML) models. FL has found applications across different domains including health [25,26,27,28], agriculture [29,30,31,32], security [33,34,35,36], environment [37,38], animals [39,40,41], industries [42,43,44], transportation [45,46,47], and education [48,49,50,51]. For example, in the domain of health [28], a federated learning approach was introduced for the client end of health service providers. This method incorporates modified artificial bee colony optimization and support vector machine techniques to enhance the accuracy of cardiovascular disease classification. In agriculture [31], a federated learning-based entropy model was presented to assess food safety by quantifying risk levels associated with pesticide residues in agricultural products. In security [36], homomorphic encryption was integrated into a privacy-preserving federated learning algorithm to empower centralized servers to securely aggregate encrypted local model parameters. In the animal domain [40], a novel federated learning framework for animal activity recognition (FedAAR) was proposed to address the challenges of sensor-based animal monitoring systems through decentralized data from several farms.
Table 1 presents an overview of federated learning frameworks [30,52,53,54,55,56,57,58,59,60], offering insights to better understand the contributions made in this field. Various machine learning methods have been employed in previous studies, including the sparrow search algorithm (SSA) [52], the differential privacy Laplace mechanism (DPLA) [52], the amendable multi-function sensor control method (AMFSC) [54], and the multiscale residual attention network (MSRAN) [55]. While most studies [57,58,59,60] evaluated the results using the accuracy metric, some of them [30,53,55,56] also utilized F-score, precision, recall metrics, and others [52,53,56] used different indicators like the confusion matrix, false positive rate (FPR), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R-squared).
Federated learning has been applied successfully across a broad spectrum of machine learning algorithms, including decision trees (DTs) [61], artificial neural networks (ANNs) [62], support vector machines (SVMs) [63], logistic regression (LR) [64], random forests (RFs) [65], and k-nearest neighbors (KNNs) [66]. These implementations have demonstrated the versatility and flexibility of federated learning techniques in diverse settings. Furthermore, different types of multi-based methodologies have emerged within the realm of federated learning, each aiming to address specific requirements and challenges. These similar methodologies to our FMLL method include multi-dimensional federated learning [67], multi-objective federated learning [68], multi-modal federated learning [69], multi-level federated edge learning [70], multi-model federated learning [71], and multi-participant multi-class vertical federated learning (MMVFL) [72], all specifically designed for multi-class classification tasks. Despite the breadth of research in federated learning, there remains a notable gap in the literature regarding multi-label-based federated learning approaches, indicating an area ripe for further exploration and development, particularly valuable in animal-related scenarios where data instances may simultaneously belong to multiple classes.
Multi-label learning challenges the traditional notion of assigning items to a single class and allows items to belong to multiple classes at the same time. This distinction underscores the complexity of classification tasks in modern data analysis. While single-label classification remains fundamental, multi-label classification has emerged as a crucial technique in various domains [73]. However, achieving high accuracy in multi-label classification presents a formidable hurdle, as accurately predicting multiple labels for each item demands sophisticated algorithms. Researchers have offered diverse solutions to handle the intricacies of multi-label classification tasks, including binary relevance (BR) [74], which treats each label as a separate binary classification task, and label powerset (LP) [75], which considers each unique combination of labels as a single class. Classifier chains (CCs) [76] sequentially train multiple binary classifiers, while random k-labelsets (RAkELs) [77] randomly partition the label space into subsets for classification. The ensemble of classifier chains (ECC) [78] combines multiple classifier chains for improved performance.
The multi-label k-nearest neighbors (ML-kNNs) method [79] adapts the k-nearest neighbor algorithm for multi-label classification. Pairwise coupling (PC) [80] trains a binary classifier for each pair of labels, while the majority label set method [81] predicts the most frequent label subset among training instances. Deep learning architectures, such as convolutional neural networks (CNNs) [82], recurrent neural networks (RNNs) [83], and graph neural networks (GNNs) [84], are powerful tools that can be tailored to multi-label classification tasks. Additionally, hybrid approaches integrate various techniques to leverage the strengths of different methods, providing robustness in dealing with multi-label classification problems across diverse domains and related datasets, such as transfer learning-based multi-label classification [85], rule-based multi-label classification (MLC) [86], meta-learning based multi-instance multi-label learning (MetaMIML) [87], multi-label long short-term memory (LSTM) [88], the multi-label generative adversarial network (ML-CookGAN) [89], and so on. Reviewing these varied methodologies provides valuable insight into the evolving landscape of multi-label learning research.
Recently, research has demonstrated the effectiveness of the REPTree in various machine learning-based tasks, including the rotational forest and reduced-error pruning trees (RTF-REPTree) approach in forest loss analysis [90], the ensemble models of REPTree in geospatial analysis [91], the combination of REPTree, additive regression (AR), regression by discretization (RD), and random committee (RC) models to predict the quality of river waters [92], the utilization of REPTree for air quality monitoring [93], the employment of REPTree in predicting landslide susceptibility (LSM) [94,95], the social engagement analysis of students during the COVID-19 pandemic through REPTree [96], the REPTree-based estimation of evapotranspiration (ETo) from the reference surface in agricultural planning [97], the enhancement of security in industrial internet of things (IIoT) to mitigate cyber-attacks via the REPTree and other ML algorithms [98], and the analysis of fear-inducing factors using the REPTree in reaction to the omicron variant of the coronavirus amidst academic societies [99]. While numerous types of decision trees, including GBDT [100,101,102,103,104,105,106], XGBoost [107,108,109,110,111,112,113,114,115,116,117], RF [118,119,120], and Extra Trees [121], have been utilized within federated learning methods, the literature notably lacks references to the REPTree. Renowned for its proficiency in handling noisy data and its interpretability, the REPTree holds promise for providing distinct advantages in federated learning.
It is noteworthy that decision tree aggregation encompasses two primary groups, namely, aggregating decision trees and selecting decision trees, each with distinct methodologies. In the aggregation category, four types are delineated: structure-based, weight-based, logic-based, and dataset-based approaches. Structure-based aggregation involves organizing decision trees hierarchically and then amalgamating different layers, thereby classifying samples within sub-nodes based on this hierarchical structure. Weight-based aggregation involves treating divisions within the tree as sets and aggregating the weight values associated with samples in each set. Logic-based aggregation constructs decision trees as sets of logical rules, subsequently aggregating the logical expressions derived from these rules. Dataset-based aggregation entails fitting the outcomes of multiple decision trees onto a comprehensive dataset. In contrast, selecting decision trees involves iteratively choosing a single tree that optimally encapsulates the information across all the datasets, thereby serving as the global model. This systematic approach to decision tree aggregation and selection facilitates robust modeling across diverse datasets and problem domains [61].
While the REPTree has shown remarkable effectiveness across various machine learning tasks, including those mentioned earlier, its potential within the realm of federated learning and multi-label learning, particularly when combined with the binary relevance approach, remains relatively unexplored. Federated learning, which enables distributed model training across multiple participants while keeping data decentralized, presents a powerful framework for effectively integrating algorithms like the REPTree. Similarly, multi-label learning, which predicts multiple labels for a single instance, could benefit from the proficiency of the REPTree. However, the intersection of these fields with the REPTree has yet to be deeply investigated, an avenue that the current study pursues.

3. Materials and Methods

3.1. Proposed Approach

This paper proposes a federated-learning-based approach that trains models on data distributed across the nodes and learns a global model by aggregating locally trained models. This strategy aims to revolutionize the traditional model of machine learning by decentralizing the training process. Instead of gathering user data into a centralized repository, it implements a distributed approach where each device independently trains a predictive model using locally stored data. The central server aggregates the local models, refining the predictive capabilities of the global model. This technique not only enhances the performance of machine learning applications but also sets a new standard for privacy-preserving machine learning practices in diverse applications and industries.
Federated learning encompasses three primary steps: global model and constraints initialization, local training, and model aggregation. Notably, only the second step belongs to the local participants, while the remaining two are handled on the aggregation server side. Consider synchronized algorithms for federated learning, where a standard round entails the following sequence of steps: Firstly, a subset of clients is selected. Subsequently, each client builds or updates its local model based on its local private data. Then, the local models from these clients are transmitted to the server. Finally, the server aggregates these models to construct an enhanced global model. Hereby, a model resembling a traditionally centralized machine learning model is jointly constructed in an efficient way. Moreover, federated learning offers several notable advantages. Firstly, it enhances data privacy by retaining data on the client, thereby safeguarding sensitive information. Disclosure control mechanisms, such as differential privacy and homomorphic encryption, can be employed to further protect data during the exchange of model updates. Additionally, it enhances efficiency by distributing model training across multiple clients, allowing for parallelized and accelerated learning processes [122].
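As a rough illustration of this round structure, the following Python sketch wires the four steps together with pluggable training and aggregation functions. It is a simplification for exposition only; the study's own implementation was written in C# with the Weka library, and all function names here are our own.

```python
import random

def federated_round(global_model, clients, fit_local, aggregate, fraction=1.0):
    """One synchronous federated learning round (illustrative sketch).

    clients   : list of (X, y) pairs, each held privately by one client
    fit_local : callable that trains a local model on one client's data
    aggregate : callable that merges local models into a new global model
    """
    # Step 1: select a subset of clients for this round
    k = max(1, int(fraction * len(clients)))
    selected = random.sample(clients, k)

    # Step 2: each selected client updates a model on its own private data;
    # the raw data never leaves the client
    local_models = [fit_local(X, y) for X, y in selected]

    # Steps 3-4: only the local models travel to the server, which
    # aggregates them into an enhanced global model
    return aggregate(global_model, local_models)
```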
The federated learning architecture encompasses various approaches tailored to different data distribution scenarios: horizontal federated learning (HFL), vertical federated learning (VFL), and federated transfer learning (FTL). In HFL, local datasets may have the same feature space and different sample spaces. Each node trains a local model using its respective data, and the local models or outputs are then transmitted to a central server. The server aggregates these results and gives a response to the user, facilitating collaborative model training. Conversely, VFL utilizes vertical data partitioning, where the datasets of each client may have the same sample space and different feature spaces. This setup makes it possible to build an accurate model while participants retain their data and models locally, exchanging intermediate computation results with the server. FTL introduces a hybrid approach to data partitioning, characterized by a common sample space and different feature spaces. This setup is particularly useful for scenarios where there is minimal overlap in both data features and data samples among participants. FTL enables knowledge transfer across heterogeneous datasets by leveraging pre-trained models or representations from one domain to enhance learning in another domain, thereby maximizing the utility of disparate data sources [123]. Each federated learning approach offers distinct advantages and is tailored to specific data distribution characteristics, ensuring flexibility and scalability in addressing diverse real-world scenarios while maintaining data privacy and efficiency. In this study, we specifically employed VFL due to its ability to leverage the same sample space with differing target label features, which enriches the information about samples and facilitates the construction of multiple binary classifiers for multiple labels. In other words, this approach ensures that the number of instances for each client is equal, and therefore balanced as well.
In the binary relevance approach, the multi-label problem is decomposed into several binary classification tasks. Here, each label is handled as an independent binary classification task. This means that a separate binary classifier is trained on each client node to predict its presence or absence for a given instance. In other words, the number of client nodes is equal to the number of labels in the dataset. Therefore, label size impacts the addition or removal of client nodes in the final model. Consequently, the output of the binary classifiers is a set of binary predictions, one for each label. In addition to its simplicity, the binary relevance approach offers several advantages. It allows for the utilization of standard binary classifiers, shortens the learning process, and provides interpretability as the prediction of each label is independent of others. However, one potential drawback of the binary relevance approach is that it does not consider the correlations between labels, which could be important in certain applications. While our datasets do not require correlated labels, making this limitation less impactful in our context, it is worth noting for other potential applications. As a solution, the classifier chains method can be employed, which passes label information between classifiers and incorporates label correlations. This approach effectively captures label dependencies and addresses the limitations of the binary relevance method, potentially enhancing performance in scenarios where label correlations are significant.
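A minimal sketch of the binary relevance strategy is shown below, assuming the labels are given as a binary matrix with one column per label. Since REPTree ships with Weka rather than scikit-learn, a scikit-learn decision tree stands in for it here, and the function names are our own.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def binary_relevance_fit(X, Y):
    """Train one independent binary classifier per label (BR strategy).

    X : (N, K) feature matrix
    Y : (N, q) binary label matrix; column j marks the presence of label y_j
    """
    models = []
    for j in range(Y.shape[1]):
        clf = DecisionTreeClassifier(random_state=1)  # stand-in for REPTree
        clf.fit(X, Y[:, j])  # independent binary task: y_j present or absent
        models.append(clf)
    return models

def binary_relevance_predict(models, X):
    """Each classifier votes independently; stack the q binary predictions."""
    return np.column_stack([m.predict(X) for m in models])
```

In FMLL, each of these q binary sub-tasks is placed on its own client node, so the training loop above runs across the federation rather than on a single machine.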
In the proposed system, as shown in Figure 1, a central node collaborates with several local nodes (or clients) as the standard step of federated learning. In the architecture, the method manages instances with multiple labels, such as label 1 to label q, resulting in a multi-label dataset as the input. Initially, preprocessing operations are conducted to clean, manipulate, and prepare the data. Subsequently, dataset decomposition is performed to transform the multi-label dataset into multiple binary datasets, following the binary relevance approach. This decomposition yields datasets 1 to q, where instances possess binary labels—for example, dataset 1 indicates whether label 1 exists or not. These transformed datasets serve as local data on local nodes, acting as local clients within the federated learning framework. In the training phase, the REPTree algorithm is applied to each dataset, generating local models on local nodes—tree 1 corresponds to dataset 1, and so forth. Following this, in the central node, local models are aggregated to create a global model. After that, model evaluation takes place, where its performance is assessed using metrics such as accuracy, precision, recall, and F-score. This step ensures that the collective knowledge from the local models is effectively integrated. The final model in the central node facilitates predictions based on the input query data. This integrated approach offers a comprehensive solution for handling multi-label datasets within a federated learning context, providing scalability and efficiency while maintaining model performance.

3.2. Formal Description

Traditional supervised learning algorithms operate within the framework of single-label scenarios, where each sample in the training set is related to a sole label defining its characteristics. In contrast, multi-label learning algorithms deal with samples in the training set that are concurrently linked to multiple labels. The objective of multi-label learning is to predict the appropriate label set for unseen samples, which may encompass more than one label per sample. Here, the definition of multi-label learning is formally established. Given $D$ as the training set comprising $N$ samples $S_i = (x_i, Y_i)$, where $i = 1, 2, \ldots, N$, each sample $S_i$ is paired with a feature vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})$ having $K$ elements and a subset of labels $Y_i \subseteq L$, where $L = \{y_j \mid j = 1, \ldots, q\}$ represents the set of $q$ probable labels. This representation is depicted in Table 2. In this context, the objective of a multi-label learning algorithm is to construct a global model $G$ that, given an unlabeled instance $S = (x, ?)$, precisely predicts its subset of labels $Y$, denoted as $G(S) \rightarrow Y$, where $Y$ represents the labels associated with the sample $S$.
Table 2 illustrates a multi-label dataset where each sample $S$ is associated with a subset of labels denoted by $Y$. For instance, $S_1$ is associated with the label set $Y_1$ containing $y_2$ and $y_4$, indicating that this instance possesses both labels $y_2$ and $y_4$. It is noteworthy that the outputs from all classifiers are combined with the concatenate operator. Here, the label set $Y_1$ includes the concatenation of both labels $y_2$ and $y_4$. Similarly, the sample $S_2$ belongs to the $y_1$, $y_3$, and $y_4$ classes simultaneously, given with a concatenate operator. These representations showcase the multi-label nature of the dataset, where instances may have multiple associated labels simultaneously.
The binary relevance method represents a problem transformation approach that breaks down a multi-label classification task into multiple single-label binary classification problems, each corresponding to one of the $q$ labels in the set $L = \{y_1, y_2, \ldots, y_q\}$. Primarily, this method converts the initial multi-label training dataset into $q$ binary datasets $D_{y_j}$, $j = 1, 2, \ldots, q$, where $D_{y_j}$ encompasses all samples from the initial dataset but with a singular positive or negative label attributed to the label $y_j$ based on the true label subset related to each sample. In essence, a label is considered positive if it is included in the label set containing $y_j$; if not, it is considered negative. Following this transformation of the multi-label data, a collection of $q$ binary classification models $M_j$, where $j = 1, 2, \ldots, q$, is then developed using the respective datasets $D_{y_j}$. Finally, the local $q$ models are aggregated to create the global model $G$, as indicated by Equation (1):
$$G = \left\{ M_j(x, y_j),\ y_j \in \{0, 1\} \;\middle|\; y_j \in L,\ j = 1, \ldots, q \right\} \quad (1)$$
To elucidate the fundamental concept of the binary relevance transformation procedure, Table 3 showcases the four binary datasets formed by transforming the multi-label dataset depicted in the preceding Table 2. In this context, the class attribute can take on two potential values: “present”, denoted as $y_j$, or “not present”, represented as $\neg y_j$. Each row in Table 3 corresponds to a sample $(S_1, S_2, \ldots, S_N)$ from the original dataset, while each target column represents a distinct label $(y_1, y_2, y_3, y_4)$. Through this transformation, the binary datasets are constructed by discerning the presence or absence of individual labels for each sample. For instance, the positive indicators ($y_j$) signify the presence of a label, while negative indicators ($\neg y_j$) indicate its absence. By comparing Table 3 with Table 2, it becomes evident how the labels associated with each example are encoded into binary attributes, simplifying the classification task. For instance, $S_2$ in Table 2 is associated with $y_1$, $y_3$, and $y_4$, which is reflected in Table 3 by the presence of $y_1$, $y_3$, and $y_4$, respectively, and the absence marker $\neg y_2$. This transformation facilitates the utilization of conventional binary classification algorithms to handle multi-label classification tasks more effectively.
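The following toy script mirrors this Table 2 to Table 3 transformation on the two samples discussed above ($S_1$ with $\{y_2, y_4\}$ and $S_2$ with $\{y_1, y_3, y_4\}$); the feature values are made-up placeholders and the function name is our own.

```python
import numpy as np

def binary_relevance_datasets(X, Y):
    """Decompose a multi-label dataset into q binary datasets (Table 2 -> Table 3).

    X : (N, K) feature matrix
    Y : list of N label subsets, e.g. Y[0] = {2, 4} means sample 1 has y_2 and y_4
    """
    labels = sorted(set().union(*Y))
    datasets = {}
    for j in labels:
        # 1 if label y_j is "present" for the sample, 0 if "not present"
        t_j = np.array([1 if j in Y_i else 0 for Y_i in Y])
        datasets[j] = (X, t_j)  # D_{y_j}: same features, binary target
    return datasets

X = np.array([[0.1, 0.2], [0.3, 0.4]])  # placeholder features for S_1, S_2
Y = [{2, 4}, {1, 3, 4}]                 # label sets from Table 2
for j, (_, t) in binary_relevance_datasets(X, Y).items():
    print(f"D_y{j}: targets {t}")       # e.g. D_y2: targets [1 0]
```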
The Binary Relevance (BR) method is employed to classify new multi-label samples by aggregating the labels positively identified by the independent binary classifiers. An inherent advantage of the BR approach lies in its low computational complexity relative to other multi-label methods. Specifically, for a fixed number of samples, the scalability of BR is directly proportional to the size $q$ of the label set $L$. Given that the complexity of the base classifiers is constrained to $O(C)$, the overall complexity of BR becomes $q \times O(C)$. As a result, the BR method proves to be particularly suitable for scenarios where the value of $q$ is not excessively large. However, given the prevalence of numerous labels across various domains, alternative methods, such as divide-and-conquer approaches, have emerged to organize labels into a tree-shaped hierarchy, allowing for the management of a substantially smaller set of labels in comparison with $q$.
Algorithm 1 is devised to address the Federated Multi-Label Learning (FMLL) method through a structured approach divided into two main phases. The client learning process begins with data preparation, given the dataset $D$ comprising $N$ instances represented as $(x_i, Y_i)$, where $x_i$ denotes the feature vector and $Y_i$ represents the associated labels, along with $q$ as the number of nodes (or the number of class labels) and $M_j$ as the local models for each label. The dataset $D$ is partitioned into $q$ binary datasets based on the presence of each class label $y_j$. Each node generates local datasets $D_{y_j}$, marking instances as 1 if $y_j$ is present in $Y_i$ and 0 otherwise, and stores them locally. Subsequently, in local model training, each node independently trains local models $M_j$ using the REPTree algorithm on their respective binary datasets $D_{y_j}$. These trained models $M_j$ are then transmitted to the central server for further processing. The server aggregation process integrates the received local models $M_j$ to construct a unified global model $G$ through the model aggregation approach. The central server combines these models to form $G$, representing a comprehensive synthesis of knowledge from all nodes. Using this global model, the algorithm performs classification tasks on the test set $T$. For each instance $x$ in $T$, predictions are made by aggregating outputs from all local models, resulting in the final predicted label set $\hat{Y}$. Thus, the algorithm provides a systematic approach to federated multi-label learning by incorporating distinct client learning and server aggregation processes. This structured methodology ensures robustness and reproducibility in handling distributed datasets and synthesizing global models, essential for effective multi-label prediction across decentralized environments.
Algorithm 1: Federated Multi-Label Learning (FMLL)
1. Client Learning Process
Inputs:
   $D$: dataset $D = \{(x_i, Y_i)\}_{i=1}^{N}$
   $q$: number of nodes (equal to the number of class labels)
Outputs:
   $M_j$: local models, one for each label
1.1. Data Preparation
Begin
  for $j = 1$ to $q$
    foreach $(x_i, Y_i)$ in $D$        // Generate binary datasets
      if ($y_j \in Y_i$)
        $D_{y_j}$.Add($x_i$, 1)
      else
        $D_{y_j}$.Add($x_i$, 0)
      endif
    end foreach
    Store($D_{y_j}$)                   // Store local data at node $j$
  end for
1.2. Local Model Training
  for $j = 1$ to $q$
    $M_j$ = REPTree($D_{y_j}$)         // Train local models at each node in parallel
    Send($M_j$)                        // Send $M_j$ to the central server
  end for
End
2. Server Aggregation Process
Inputs:
   $M_j$: local models from each client, one for each class label
   $q$: number of nodes (equal to the number of class labels)
   $T$: test set to be predicted
Outputs:
   $G$: global model
   $\hat{Y}$: predicted label sets for the test set
2.1. Model Aggregation
Begin
  $G = \emptyset$
  for $j = 1$ to $q$
    Receive($M_j$)                     // Receive local model $M_j$ from each client
    $G = G \cup M_j$                   // Aggregate the models to form the global model $G$
  end for
2.2. Classification
  foreach $x$ in $T$
    for $j = 1$ to $q$
      $y_j = M_j(x)$                   // Predict label $y_j$ with component $M_j$ of $G$
      $\hat{Y} = \hat{Y} \cup \{y_j\}$
    end for
  end foreach
End
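For readers who prefer running code to pseudocode, the sketch below simulates both phases of Algorithm 1 on a single machine. It is an approximation under stated assumptions: the labels arrive as a binary matrix, scikit-learn's decision tree substitutes for Weka's REPTree, and the send/receive steps collapse into ordinary function calls.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# --- 1. Client learning process ---------------------------------------
def client_learning(X, Y):
    """Each node j builds the binary dataset D_{y_j} and trains a local model M_j."""
    q = Y.shape[1]                                    # one node per class label
    local_models = []
    for j in range(q):
        X_j, t_j = X, Y[:, j]                         # 1.1 data preparation
        M_j = DecisionTreeClassifier(random_state=1)  # REPTree stand-in
        M_j.fit(X_j, t_j)                             # 1.2 local model training
        local_models.append(M_j)                      # "Send" M_j to the server
    return local_models

# --- 2. Server aggregation process ------------------------------------
def server_aggregation(local_models):
    """Form the global model G as the union of the received local models."""
    G = []
    for M_j in local_models:                          # "Receive" each M_j
        G.append(M_j)                                 # G = G ∪ M_j
    return G

def classify(G, T):
    """Predict the full label set for every instance in the test set T."""
    return np.column_stack([M_j.predict(T) for M_j in G])
```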

4. Experimental Studies

4.1. Dataset Description

The study of animals in their natural habitats is fundamental to our understanding of ecological dynamics, biodiversity conservation, and species management. Animal behavior, physiology, and interactions with their environment provide invaluable insights into the functioning of ecosystems and the intricate balance of life on our planet. In this paper, we harness the richness of animal-related datasets to evaluate the efficacy of our proposed Federated Multi-Label Learning (FMLL) method within the vibrant field of animal research. Table 4 provides a summarized overview of these datasets utilized in the current study. In this table, the respective number of classes is represented for each label in the datasets.

4.1.1. Amphibians

The Amphibians Habitat Classification dataset, briefly presented in Table 5, was collected from a combination of geographic information systems (GIS), satellite imagery, and field inventories conducted as part of environmental impact assessments (EIAs) for two planned road projects in Poland, Road A and Road B [124]. Amphibians, as crucial indicators of environmental health and ecosystem integrity due to their sensitivity to environmental changes, play a vital role in assessing the impact of infrastructure projects on biodiversity, particularly within their habitat. Integrating GIS and satellite information with data collected from natural inventories, field research for Road A was conducted within a 500-m-wide strip on both sides of the proposed project area, identifying 80 amphibian breeding sites. The Road B inventory focused on the vicinity of two variants of the planned Beskidy Integration Way, covering approximately 60 km, and identified 109 amphibian occurrence sites through map analysis, field observations, a literature review, and archive data analysis. The dataset comprises multiple variables, contributing to a comprehensive understanding of amphibian habitats within the realm of biology. It was primarily generated for classification tasks, capturing diverse environmental characteristics relevant to amphibian habitat suitability.
This multivariate dataset, with 189 samples and 23 features, provides valuable insights into the ecological implications of road infrastructure development on amphibian populations, facilitating biodiversity conservation and informed decision-making in environmental management. The classification task is to predict the presence of seven different amphibians, namely green frogs, brown frogs, common toads, fire-bellied toads, tree frogs, common newts, and great crested newts, corresponding to labels one to seven, respectively. The dataset encompasses three distinct numerical features, as detailed in Table 6, showcasing statistical attributes such as minimum, mean, maximum, mode, and standard deviation. Additionally, Table 7 comprehensively explains all features, providing deeper insight into the instances collected.

4.1.2. Anuran-Calls-(MFCCs)

The Anuran-Calls-(MFCCs) dataset [125] comprises acoustic features extracted from syllables of anuran (frogs) calls, accompanied by multi-label annotations indicating their family, genus, and species, as represented in Table 8. With a total of 7195 instances, this multivariate dataset has been extensively utilized in various classification and clustering tasks, particularly within the realm of biology. Furthermore, the dataset incorporates 22 separate numerical features, elaborated in Table 9, and highlights their statistical characteristics, including maximum, minimum, mean, mode, and standard deviation. Its completeness and reliability are attributed to the absence of missing values, markedly enhancing its suitability for such analytical endeavors.
The Anuran-Calls-(MFCCs) dataset originates from the segmentation of 60 audio recordings spanning four distinct families, eight genera, and ten species of anuran frogs. Each audio recording corresponds to a single specimen, with an additional record ID column included for reference. The distribution of instances for each family, genus, and species class is given in Table 10. The recordings were conducted in situ under real noise conditions, capturing the natural background sounds, thereby offering a diverse representation of anuran habitats, including locations such as the campus of the Federal University of Amazonas in Manaus, the Mata Atlantic region in Brazil, and even one location in Córdoba, Argentina. Recorded in WAV format at a sampling frequency of 44.1 kHz and a 32-bit resolution, the dataset enables signal analysis up to 22 kHz. The feature extraction process involved calculating 22 Mel-Frequency Cepstral Coefficients (MFCCs) for each syllable, employing 44 triangular filters. These coefficients are subsequently normalized within the range of −1 to 1 and are statistically discussed in Table 9.
The Anuran-Calls-(MFCCs) dataset, with its rich acoustic features and multi-label annotations, is a valuable asset for advancing research in anuran species recognition and related fields. Anurans play crucial roles in ecosystems worldwide, serving as indicators of ecosystem health and biodiversity. They regulate populations of insects and other invertebrates, maintaining ecological balance within animal food webs. Additionally, their skin contains bioactive compounds with potential pharmaceutical applications, contributing to medical research. However, anuran species are threatened by habitat destruction, pollution, and climate change, requiring robust analysis and conservation efforts. Furthermore, they are important for education and outreach initiatives, promoting public awareness of ecology, biodiversity, and conservation.

4.1.3. HackerEarth-Adopt-A-Buddy

The HackerEarth-Adopt-A-Buddy dataset [126], introduced in Table 11, served a noble purpose in facilitating the creation of a virtual tour experience for an esteemed pet adoption agency amidst the pandemic. As the pandemic saw a surge in animal adoption and fostering, this initiative aimed to keep potential pet owners engaged indoors by virtually presenting animals accessible for adoption. To support this endeavor, machine learning methods can be developed to determine the type and breed of animals based on their physical attributes and other pertinent factors. The description of all features in the HackerEarth-Adopt-A-Buddy dataset is summarized in Table 12. The dataset provides a comprehensive foundation for predictive model development and evaluation, with 18,834 entries in the training dataset. Moreover, within the dataset, there are four distinct numerical features outlined in Table 13, presenting their statistical attributes such as minimum, maximum, mean, mode, and standard deviation.
This dataset presents an opportunity for multi-label classification as a fundamental aspect of machine learning. By utilizing the provided data and employing machine learning techniques, researchers are tasked with constructing a predictive model capable of accurately discerning both the breed category and pet category based on factors such as animal condition, appearance, and other relevant attributes. This dataset contributes to the important cause of promoting pet adoption and fostering.
Pets serve a crucial role in animal science, offering researchers invaluable insights into various aspects of behavior, physiology, and health. Beyond companionship, they provide real-life settings for studying topics such as animal nutrition, genetics, psychology, and disease management. Moreover, pets serve as models for understanding human–animal interactions, leading to advancements in veterinary medicine and animal welfare. Studying pets yields insights that benefit both human and animal well-being, making them indispensable in the field. Additionally, pet adoption holds significant importance in animal science, extending beyond providing loving homes for animals in need. It serves as a vital avenue for research and education within the discipline. Researchers gain valuable insights into behavior, health, and welfare by studying adopted animals in diverse environments. The diversity among adopted animals allows for the exploration of genetic variations and their impacts on traits and diseases, contributing to veterinary medicine and animal breeding practices. Furthermore, the adoption process fosters public awareness and appreciation for animal welfare issues, promoting responsible pet ownership and ethical treatment. Embracing pet adoption not only enriches individual lives but also advances our understanding and care of the animal kingdom through the analysis of related datasets.

4.2. Results

The primary objective of this study is to introduce an innovative method termed Federated Multi-Label Learning (FMLL) designed specifically for classification tasks. By integrating insights from well-established methodologies such as Binary Relevance and the Reduced-Error Pruning Tree (REPTree) approaches, our framework seeks to synergize the advantages of federated learning and multi-label concepts. This integration is aimed at tackling the complexities associated with multi-label classification issues. The efficacy of the FMLL method was validated using dedicated multi-label datasets, including Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy. Our approach was implemented in the C# programming language utilizing the Weka library [127]. The source codes of both FMLL and REPTree methods are publicly available in the GitHub archive (https://github.com/BitaGhasemkhani/Federated-Multi-Label-Learning-FMLL, accessed on 28 June 2024), ensuring reproducibility.
The REPTree classifier was used in our experiments, with hyperparameters set to their default values, e.g., batchSize (100), debug (False), doNotCheckCapabilities (False), initialCount (0.0), maxDepth (−1), minNum (2.0), minVarianceProp (0.001), noPruning (False), numDecimalPlaces (2), numFolds (3), seed (1), and spreadInitialCount (False). The experiments were conducted on standard machines (i.e., Intel(R) Core(TM) i5, 1.80 GHz, 4.00 GB RAM). Also, we employed the 10-fold cross-validation method during experimentation to train and assess the classification models. This method involves randomly dividing the dataset into ten sets, reserving one set for testing while the remaining nine sets serve as the training set. The evaluation process was iterated ten times, and the average classification accuracy was computed.
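A sketch of the same evaluation protocol, for readers working outside Weka, is given below. scikit-learn has no REPTree, so a default decision tree with a fixed seed serves as a rough stand-in; only the 10-fold protocol and the seed of 1 mirror the settings stated above, and the function name is our own.

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def ten_fold_accuracy(X, y_binary):
    """10-fold cross-validation for one binary-relevance sub-task.

    The dataset is split into ten folds; each fold serves once as the
    test set while the other nine form the training set, and the ten
    accuracies are averaged, as described in the text.
    """
    model = DecisionTreeClassifier(random_state=1)  # REPTree stand-in
    folds = KFold(n_splits=10, shuffle=True, random_state=1)
    scores = cross_val_score(model, X, y_binary, cv=folds, scoring="accuracy")
    return scores.mean()
```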
Furthermore, we employed a range of metrics to evaluate the performance of the proposed FMLL method, including accuracy (ACC), precision (PR), recall (TPR), F-score (FS), and true negative rate (TNR) as delineated in Equations (2) to (6). Moreover, we used the receiver operating characteristic (ROC) curve to assess the trade-off between the true positive rate (TPR) from Equation (4) and the false positive rate (FPR) from Equation (7). Additionally, the precision–recall curve (PRC) was utilized to evaluate the balance between precision and recall.
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$
$$\mathrm{PR} = \frac{TP}{TP + FP} \quad (3)$$
$$\mathrm{TPR} = \frac{TP}{TP + FN} \quad (4)$$
$$\mathrm{FS} = \frac{2 \, TP}{2 \, TP + FP + FN} \quad (5)$$
$$\mathrm{TNR} = \frac{TN}{TN + FP} \quad (6)$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad (7)$$
In this context:
  • True Positive (TP) signifies the count of correctly predicted positive classes by the classifier.
  • True Negative (TN) represents the count of accurately predicted negative classes by the classifier.
  • False Positive (FP) denotes the count of erroneously predicted positive classes by the classifier.
  • False Negative (FN) indicates the count of erroneously predicted negative classes by the classifier.
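In code form, Equations (2)–(7) are direct ratios over these four counts; the helper below (our own naming) computes them for one binary-relevance sub-task:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Evaluation metrics of Equations (2)-(7) from raw confusion counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # Eq. (2) accuracy
        "PR":  tp / (tp + fp),                   # Eq. (3) precision
        "TPR": tp / (tp + fn),                   # Eq. (4) recall / true positive rate
        "FS":  2 * tp / (2 * tp + fp + fn),      # Eq. (5) F-score
        "TNR": tn / (tn + fp),                   # Eq. (6) true negative rate
        "FPR": fp / (fp + tn),                   # Eq. (7) false positive rate
    }
```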
The application of Federated Multi-Label Learning (FMLL) to the Amphibians dataset yielded compelling results, as shown in Table 14, achieving an average accuracy of 73.24%. Precision scores ranged from 0.613 to 0.790, while recall scores varied from 0.656 to 0.884, demonstrating FMLL’s effectiveness in accurately classifying various amphibian species. Moreover, the F-score, ranging from 0.619 to 0.834, underscored the method’s capability to manage dataset complexities while maintaining a balanced performance between precision and recall. The ROC curve results, spanning from 0.503 to 0.715, highlighted variable performance in class differentiation, whereas the PRC values, ranging from 0.603 to 0.818, provided valuable insights into precision–recall trade-offs across different thresholds. Additionally, the TNR scores between 0.655 and 0.884 indicated the method’s reliability in correctly identifying negative instances. Remarkably, the “great crested newt” amphibian emerged as the top performer across all the metrics, except ROC.
Regarding the Anuran-Calls-(MFCCs) dataset, FMLL showcased exceptional performance, as represented in Table 15, boasting an average accuracy of 94.50%. Precision scores consistently surpassed 0.935 for the family, genus, and species categories, demonstrating FMLL’s precision in classifying different levels of anuran calls. Additionally, recall scores ranged from 0.936 to 0.957, underscoring the method’s success in retrieving relevant instances for each category. The F-score, averaging 0.944, further validated FMLL’s effectiveness in handling multi-label classification tasks with high accuracy and reliability. Outstandingly, the “family” category of anuran calls excelled in all metrics, achieving an accuracy of 95.75%, with precision, TNR, ROC, PRC, recall, and F-score all reaching above 0.957. Moreover, TNR scores across all categories were considerably high, ranging from 0.980 to 0.992, indicating FMLL’s ability to accurately identify negative instances. The ROC curve values, ranging from 0.978 to 0.983, illustrated strong performance in distinguishing between classes, while PRC values, ranging from 0.935 to 0.964, offered a detailed analysis of precision–recall dynamics across varying thresholds.
FMLL demonstrated remarkable performance on the HackerEarth-Adopt-A-Buddy dataset, as shown in Table 16, accurately predicting breed and pet categories with an average accuracy of 86.12%. According to the results, the “pet_category” exhibited slightly superior performance compared to the “breed_category” across all the metrics, except ROC and PRC. Also, the precision, TNR, ROC, PRC, recall, and F-score metrics presented high average values of 0.863, 0.928, 0.956, 0.933, 0.861, and 0.858, respectively. Furthermore, the ROC values for both categories demonstrated strong discrimination between classes, with values of 0.965 for the breed and 0.946 for the pet categories. Likewise, the PRC values, at 0.938 for the breed and 0.928 for the pet categories, provided detailed insights into the model’s precision–recall dynamics. FMLL reaffirmed its robustness in handling complex multi-label classification tasks across different datasets.
As evidenced by Table 14, the FMLL method achieved its highest accuracy (88.36%) on the “great crested newt” species among all the considered labels. To elucidate the decision-making process underlying this performance, the FMLL method employed a REPTree classifier, generating a structured tree representation as shown in Figure 2. This REPTree structure prominently featured attributes such as the type of water reservoirs (TR), surroundings 3 (SUR3), presence of fishing (FR), number of water reservoirs (NR), and vegetation presence (VR) as pivotal nodes. The hierarchical arrangement facilitated a detailed comprehension of feature interactions and their impact on species classification. This illustrative tree not only aids in interpreting model decisions but also underscores the importance of feature selection and attribute significance in FMLL-based classification tasks.
To elaborate further on Figure 2, the root node, labeled TR, represents the most significant attribute for splitting the data, with branches indicating different values of TR. Internal nodes such as SUR3, FR, NR, and VR are decision points where data are further split based on specific attribute values. Each leaf node provides the final classification outcome and contains two sets of numbers: (a/b) and [c/d]. Here, a represents the total number of instances reaching the leaf, b indicates the number of misclassified instances, c denotes the number of instances of the majority class, and d shows the number of instances of the minority class. For example, the leaf node 0 (10/4) [5/1] under SUR3 = 1 and TR = 1 indicates that out of 10 instances, 4 were misclassified, with 5 instances in the majority class and 1 in the minority class. Misclassified instances highlight areas where the model’s predictions do not align with the actual data, aiding in assessing model accuracy. Subtree analysis under nodes like FR = 6 shows further splits based on values of NR, leading to various leaves with their respective instance distributions. To achieve optimal accuracy, parameters such as the minimum number of instances per leaf were fine-tuned in Weka, ensuring the model balances complexity and generalization. This interpretation of the REPTree figure enhances our understanding of the model’s performance and the patterns in the data.

5. Discussion

In this section, we compare our proposed method with current state-of-the-art techniques [124,125,128] in the field. Our analysis covers several dimensions: the accuracy metric on the Amphibians dataset, and the precision, recall, and F-score evaluation metrics on the Anuran-Calls-(MFCCs) dataset, juxtaposed with state-of-the-art methods in Table 17 and Table 18, respectively.
As shown in Table 17, our approach achieved a remarkable 10.92% improvement on average regarding the Amphibians dataset, outperforming the state-of-the-art methods [124,128]. This improvement can be attributed to the combination of FMLL with BR and the REPTree. While the gradient-boosted tree (GBT), random forest (RF), AdaBoost (ADA), decision tree (DT), and partially monotonic decision tree (PMDT) approaches attained moderate accuracy rates ranging from 57.54% to 71.50%, the proposed method surpassed all these state-of-the-art techniques with the highest accuracy rate of 73.24%. These outcomes highlight the superior performance of FMLL in accurately classifying instances within the multi-label Amphibians dataset.
Table 18 presents a comprehensive comparison of precision, recall, and F-score metrics for various methods using the Anuran-Calls-(MFCCs) dataset, categorized into different taxonomic levels, including species, family, genus, and their combination. At the species level, the FMLL method outperformed all others [125] with precision, recall, and F-score scores of 0.935, 0.936, and 0.935, respectively. The previous methods, e.g., KNN-Flat, RBF-SVM-Flat, Polynomial-SVM-Flat, and Tree-Flat [125], displayed precision scores ranging from 0.470 to 0.850, recall scores ranging from 0.500 to 0.760, and F-scores ranging from 0.490 to 0.740. At the family level, FMLL again revealed superior performance, boasting precision, recall, and F-scores of 0.957 each, outperforming the baseline method, KNN-LCPL. Similarly, at the genus level, FMLL exhibited substantial enhancements over its counterpart, achieving precision, recall, and F-scores of 0.941, 0.942, and 0.941, respectively. Across all taxonomic levels, our method consistently outperformed KNN-LCPL, showcasing precision, recall, and F-scores of 0.944, 0.945, and 0.944, respectively. It is notable that the FMLL method attained substantial improvements across various taxonomic levels when compared to state-of-the-art peers. Specifically, at the species taxonomic level, FMLL demonstrated improvements of 25.1% in precision, 30.1% in recall, and 28.4% in F-score metrics. Moving to the family taxonomic level, the method presented improvements of 24.4%, 13.7%, and 19.4% in precision, recall, and F-score metrics, respectively. Similarly, at the genus taxonomic level, FMLL achieved improvements of 27.8%, 21.1%, and 24.6% in precision, recall, and F-score metrics. Finally, when considering the combination of species, family, and genus taxonomic levels, FMLL exhibited improvements of 25.5%, 18.8%, and 22.3% in precision, recall, and F-score metrics. These results underscore the effectiveness of the FMLL method across multiple taxonomic levels, demonstrating substantial improvements over baseline methods in terms of precision, recall, and F-score metrics.
The results of the existing KNN-LCPL method and the proposed FMLL method given in Table 18 were further evaluated using the Mann–Whitney U and Quade tests. Both are non-parametric statistical tests that are well suited to comparing the performance of algorithms, making them appropriate for our analysis. The p-values obtained from the Mann–Whitney U and Quade tests are 0.02107 and 0.03047, respectively, both below the significance level of 0.05 (α = 0.05). The likelihood of these results occurring by random chance is therefore minimal, allowing us to reject the null hypothesis of no difference in performance between the methods. These statistical tests thus provide strong evidence that the proposed FMLL method significantly outperformed the KNN-LCPL method, and the small p-values underscore that the performance differences between the two methods are substantial and reliable.
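For readers who wish to run a similar comparison, the SciPy sketch below applies the Mann–Whitney U test to the twelve KNN-LCPL and FMLL scores listed in Table 18. The authors' exact test inputs are not specified in the text, so this grouping is illustrative only; the Quade test is omitted because SciPy's stats module does not provide it.

```python
from scipy.stats import mannwhitneyu

# Precision, recall, and F-score of both methods at the species, family,
# genus, and combined levels, taken from Table 18 (illustrative grouping).
knn_lcpl = [0.691, 0.719, 0.705, 0.713, 0.820, 0.763,
            0.663, 0.731, 0.695, 0.689, 0.757, 0.721]
fmll     = [0.935, 0.936, 0.935, 0.957, 0.957, 0.957,
            0.941, 0.942, 0.941, 0.944, 0.945, 0.944]

stat, p = mannwhitneyu(fmll, knn_lcpl, alternative="two-sided")
print(f"U = {stat}, p = {p:.2e}")  # p < 0.05 -> reject the null hypothesis
```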

6. Conclusions and Future Work

In summary, this study introduces Federated Multi-Label Learning (FMLL) as a groundbreaking approach in animal science classification to address the challenges posed by distributed data. By blending federated learning principles with multi-label learning techniques, FMLL offers a method for handling classification tasks where instances may belong to multiple classes simultaneously. Utilizing the Binary Relevance (BR) strategy and adopting the Reduced-Error Pruning Tree (REPTree) classifier within the federated learning framework, FMLL demonstrated robust performance and showcased significant improvements (above 10%) in classification accuracy across diverse animal species datasets. Empirical validation on three distinct datasets—Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy—underscored the effectiveness of FMLL in real-world scenarios. Notably, the classification accuracy reached 94.50% for the Anuran-Calls-(MFCCs) dataset and 86.12% for the HackerEarth-Adopt-A-Buddy dataset, highlighting the robustness and practical relevance of FMLL across various taxonomic levels and its potential for applications in diverse domains. Having explored the advancements and contributions of the current research, we draw the following conclusions, which highlight the significant impacts of the proposed method on the field of animal studies:
(i) Introduction of FMLL (with BR and REPTree) in animal science classification as a novel approach, applicable to diverse real-world scenarios.
(ii) Providing the distribution of computational cost over several clients and ensuring data security with FMLL to preserve privacy in collaborative learning environments.
(iii) Effective handling of multi-label data within the FMLL framework using the BR strategy.
(iv) Pioneering use of the REPTree classifier in federated learning, enhancing accuracy and interpretability.
(v) Empirical validation of FMLL on various animal-based datasets, demonstrating its reliable applicability and efficacy in the field.
(vi) The superiority of FMLL in multi-label classification tasks, evidenced by higher accuracy, precision, recall, and F-score metrics compared to state-of-the-art methods.
(vii) The practical relevance of FMLL across taxonomic levels, showcasing its reliability in addressing multi-label classification problems within the context of animal research.
Looking ahead, several avenues emerge for further exploration of FMLL. Firstly, developing a web application that provides an interface to access the FMLL-based machine-learning model could be useful for animal scientists in decision-making. Additionally, extending FMLL to accommodate dynamic datasets collected by IoT devices, along with integrating mechanisms for model updating, could bolster its adaptability and long-term performance. Exploring alternative multi-label learning methodologies, such as classifier chains, would address the current limitation of binary relevance by incorporating label correlations. Moreover, ensemble learning techniques could be further integrated with FMLL by combining predictions from multiple models. Further exploration of deep learning architectures within the FMLL framework presents an opportunity to uncover profound insights into complex patterns inherent in animal science data. By focusing on these research directions, we aspire to propel the field of federated multi-label learning forward and advance its applications in animal science classification tasks.

Author Contributions

Conceptualization, B.G., O.V. and Y.D.; methodology, B.G., S.U. and K.U.B.; software, B.G. and D.B.; validation, B.G.; formal analysis, B.G.; investigation, B.G., O.V., Y.D., S.U. and K.U.B.; resources, B.G. and D.B.; data curation, O.V., Y.D., S.U. and K.U.B.; writing—original draft preparation, B.G.; writing—review and editing, O.V., Y.D., S.U., K.U.B. and D.B.; visualization, B.G. and D.B.; supervision, D.B.; project administration, D.B.; funding acquisition, O.V., Y.D., S.U. and K.U.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The “Amphibians” dataset [124] is publicly available in the UCI (University of California Irvine) machine learning repository (https://archive.ics.uci.edu/dataset/528/amphibians, accessed on 22 April 2024). The “Anuran-Calls-(MFCCs)” dataset [125] is publicly available in the UCI machine learning repository (https://archive.ics.uci.edu/dataset/406/anuran+calls+mfccs, accessed on 22 April 2024). The “HackerEarth-Adopt-A-Buddy” dataset [126] is publicly available in the Kaggle machine learning repository (https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption, accessed on 22 April 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this paper.
AI: Artificial intelligence
ANN: Artificial neural network
ADA: AdaBoost
AMFSC: Amendable multi-function sensor control method
AR: Additive regression
BR: Binary relevance
CC: Classifier chains
CNN: Convolutional neural network
CPPS: Cyber-physical production system
DAGs: Dual attention gates
DeepFedWT: Federated deep learning framework
DPLA: Differential privacy Laplace mechanism
DT: Decision tree
ECC: Ensemble of classifier chains
EIAs: Environmental impact assessments
ETo: Estimation of the evapotranspiration
FCLOpt: Federated contrastive learning optimization
FedAAR: Federated learning framework for animal activity recognition
FedAvg: Federated averaging
FELIDS: Federated learning-based intrusion detection system
FL: Federated learning
FMLL: Federated multi-label learning
FPR: False positive rate
FTL: Federated transfer learning
GBT: Gradient-boosted tree
GIS: Geographic information system
GNN: Graph neural network
HFL: Horizontal federated learning
IDS: Intrusion detection system
IIoT: Industrial Internet of things
IoT: Internet of things
KNN: K-nearest neighbors
LCPL: Local classifier per level
LCPN: Local classifier per node
LP: Label powerset
LR: Logistic regression
LSM: Landslide susceptibility map
LSTM: Long short-term memory
MAE: Mean absolute error
MetaMIML: Meta-learning-based multi-instance multi-label learning
ML: Machine learning
MLC: Multi-label classification
ML-CookGAN: Multi-label generative adversarial network
ML-kNN: Multi-label k-nearest neighbors
MMVFL: Multi-participant multi-class vertical federated learning
MSRAN: Multi-scale residual attention network
PC: Pairwise coupling
PMDT: Partially monotonic decision tree
PRC: Precision-recall curve
RAkEL: Random k-labelsets
RBF: Radial basis function
RC: Random committee
RD: Regression by discretization
REPTree: Reduced-error pruning tree
RF: Random forests
RMSE: Root mean square error
RNN: Recurrent neural network
ROC: Receiver operating characteristic
RTF-REPTree: Rotational forest and reduced-error pruning trees
SCADA: Supervisory control and data acquisition
SSA: Sparrow search algorithm
SVM: Support vector machines
TNR: True negative rate
TPR: True positive rate
VFL: Vertical federated learning

References

  1. Li, R.; Gao, L.; Wu, G.; Dong, J. Multiple Marine Algae Identification Based on Three-Dimensional Fluorescence Spectroscopy and Multi-Label Convolutional Neural Network. Spectrochim. Acta Part A 2024, 311, 123938. [Google Scholar] [CrossRef]
  2. Swaminathan, B.; Jagadeesh, M.; Vairavasundaram, S. Multi-Label Classification for Acoustic Bird Species Detection Using Transfer Learning Approach. Ecol. Inf. 2024, 80, 102471. [Google Scholar] [CrossRef]
  3. Celniak, W.; Wodziński, M.; Jurgas, A.; Burti, S.; Zotti, A.; Atzori, M.; Müller, H.; Banzato, T. Improving the Classification of Veterinary Thoracic Radiographs through Inter-Species and Inter-Pathology Self-Supervised Pre-Training of Deep Learning Models. Sci. Rep. 2023, 13, 19518. [Google Scholar] [CrossRef]
  4. Ahsan, M.M.; Alam, T.E.; Haque, M.A.; Ali, M.S.; Rifat, R.H.; Nafi, A.A.N.; Hossain, M.M.; Islam, M.K. Enhancing Monkeypox Diagnosis and Explanation through Modified Transfer Learning, Vision Transformers, and Federated Learning. Inf. Med. Unlocked 2024, 45, 101449. [Google Scholar] [CrossRef]
  5. van Schaik, G.; Hostens, M.; Faverjon, C.; Jensen, D.B.; Kristensen, A.R.; Ezanno, P.; Frössling, J.; Dórea, F.; Jensen, B.-B.; Carmo, L.P.; et al. The DECIDE Project: From Surveillance Data to Decision-Support for Farmers and Veterinarians. Open Res. Eur. 2023, 3, 82. [Google Scholar] [CrossRef]
  6. Shah, K.; Kanani, S.; Patel, S.; Devani, M.; Tanwar, S.; Verma, A.; Sharma, R. Blockchain-Based Object Detection Scheme Using Federated Learning. Secur. Priv. 2022, 6, e276. [Google Scholar] [CrossRef]
  7. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  8. Ogundokun, R.O.; Misra, S.; Maskeliunas, R.; Damasevicius, R. A review on federated learning and machine learning approaches: Categorization, application areas, and blockchain technology. Information 2022, 13, 263. [Google Scholar] [CrossRef]
  9. Abreha, H.G.; Hayajneh, M.; Serhani, M.A. Federated learning in edge computing: A systematic survey. Sensors 2022, 22, 450. [Google Scholar] [CrossRef]
  10. Shaheen, M.; Farooq, M.S.; Umer, T.; Kim, B.-S. Applications of federated learning; taxonomy, challenges, and research trends. Electronics 2022, 11, 670. [Google Scholar] [CrossRef]
  11. Hassanin, M.; Radwan, I.; Khan, S.; Tahtali, M. Learning discriminative representations for multi-label image recognition. J. Vis. Commun. Image Represent. 2022, 83, 103448. [Google Scholar] [CrossRef]
  12. Alfaro, R.; Allende-Cid, H.; Allende, H. Multilabel text classification with label-dependent representation. Appl. Sci. 2023, 13, 3594. [Google Scholar] [CrossRef]
  13. Mei, S. A Multi-label learning framework for predicting chemical classes and biological activities of natural products from biosynthetic gene clusters. J. Chem. Ecol. 2023, 49, 681–695. [Google Scholar] [CrossRef]
  14. Zhu, C.; Liu, Y.; Miao, D.; Dong, Y.; Pedrycz, W. Within-cross-consensus-view representation-based multi-view multi-label learning with incomplete data. Neurocomputing 2023, 557, 126729. [Google Scholar] [CrossRef]
  15. Mo, L.; Zhu, Y.; Zeng, L. A Multi-label based physical activity recognition via cascade classifier. Sensors 2023, 23, 2593. [Google Scholar] [CrossRef]
  16. Suh, J.H. Multi-label prediction-based fuzzy age difference analysis for social profiling of anonymous social media. Appl. Sci. 2024, 14, 790. [Google Scholar] [CrossRef]
  17. Han, R.; Wang, Z.; Guo, Y.; Wang, X.; A, R.; Zhong, G. Multi-label prediction method for lithology, lithofacies and fluid classes based on data augmentation by cascade forest. Adv. Geo Energy Res. 2023, 9, 25–37. [Google Scholar] [CrossRef]
  18. Hou, J.; Zeng, H.; Cai, L.; Zhu, J.; Chen, J.; Ma, K.-K. Multi-label learning with multi-label smoothing regularization for vehicle re-identification. Neurocomputing 2019, 345, 15–22. [Google Scholar] [CrossRef]
  19. Zhang, M.L.; Li, Y.K.; Liu, X.Y.; Geng, X. Binary relevance for multi-label learning: An overview. Front. Comput. Sci. 2018, 12, 191–202. [Google Scholar] [CrossRef]
  20. Akshay, E.; Sugumaran, V.; Elangovan, M. Single point cutting tool fault diagnosis in turning operation using reduced error pruning tree classifier. Struct. Durab. Health Monit. 2022, 16, 255–270. [Google Scholar] [CrossRef]
  21. Clunie, C.; Batista-Mendoza, G.; Cedeño-Moreno, D.; Calderón-Gómez, H.; Mendoza-Pittí, L.; Russell, C.; Vargas-Lombardo, M. Use of data mining strategies in environmental parameters in poultry farms, a case Study. In Proceedings of the 9th International Conference, Guayaquil, Ecuador, 13–16 November 2023; pp. 81–94. [Google Scholar] [CrossRef]
  22. Kumar, A.R.S.; Goyal, M.K.; Ojha, C.S.P.; Singh, R.D.; Swamee, P.K. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India. Water Sci. Technol. 2013, 68, 2521–2526. [Google Scholar] [CrossRef]
  23. Lin, C.-N.; Huang, W.-S.; Huang, T.-H.; Chen, C.-Y.; Huang, C.-Y.; Wang, T.-Y.; Liao, Y.-S.; Lee, L.-W. Adding value of MRI over CT in predicting peritoneal cancer index and completeness of cytoreduction. Diagnostics 2021, 11, 674. [Google Scholar] [CrossRef]
  24. Haron, N.H.; Mahmood, R.; Amin, N.M.; Ahmad, A.; Jantan, S.R. An Artificial Intelligence Approach to Monitor and Predict Student Academic Performance. J. Adv. Res. Appl. Sci. Eng. Technol. 2024, 44, 105–119. [Google Scholar] [CrossRef]
  25. Dhade, P.; Shirke, P. Federated learning for healthcare: A comprehensive review. Eng. Proc. 2023, 59, 230. [Google Scholar] [CrossRef]
  26. Da Silva, F.R.; Camacho, R.; Tavares, J.M.R.S. Federated learning in medical image analysis: A systematic survey. Electronic 2024, 13, 47. [Google Scholar] [CrossRef]
  27. Prasad, V.K.; Bhattacharya, P.; Maru, D.; Tanwar, S.; Verma, A.; Singh, A.; Tiwari, A.K.; Sharma, R.; Alkhayyat, A.; Țurcanu, F.-E.; et al. Federated learning for the internet-of-medical-things: A survey. Mathematics 2023, 11, 151. [Google Scholar] [CrossRef]
  28. Yaqoob, M.M.; Nazir, M.; Khan, M.A.; Qureshi, S.; Al-Rasheed, A. hybrid classifier-based federated learning in health service providers for cardiovascular disease prediction. Appl. Sci. 2023, 13, 1911. [Google Scholar] [CrossRef]
  29. Žalik, K.R.; Žalik, M. A review of federated learning in agriculture. Sensors 2023, 23, 9566. [Google Scholar] [CrossRef]
  30. Friha, O.; Ferrag, M.A.; Shu, L.; Maglaras, L.; Choo, K.K.R.; Nafaa, M. FELIDS: Federated learning-based intrusion detection system for agricultural Internet of Things. J. Parallel Distrib. Comput. 2022, 165, 17–31. [Google Scholar] [CrossRef]
  31. Yu, J.; Chen, Y.; Wang, Z.; Liu, J.; Huang, B. Food risk entropy model based on federated learning. Appl. Sci. 2022, 12, 5174. [Google Scholar] [CrossRef]
  32. Li, A.; Markovic, M.; Edwards, P.; Leontidis, G. Model pruning enables localized and efficient federated learning for yield forecasting and data sharing. Expert Syst. Appl. 2024, 242, 122847. [Google Scholar] [CrossRef]
  33. Fedorchenko, E.; Novikova, E.; Shulepov, A. Comparative review of the intrusion detection systems based on federated learning: Advantages and open challenges. Algorithms 2022, 15, 247. [Google Scholar] [CrossRef]
  34. Lazzarini, R.; Tianfield, H.; Charissis, V. Federated learning for IoT intrusion detection. AI 2023, 4, 509–530. [Google Scholar] [CrossRef]
  35. Ashraf, M.M.; Waqas, M.; Abbas, G.; Baker, T.; Abbas, Z.H.; Alasmary, H. FedDP: A privacy-protecting theft detection scheme in smart grids using federated learning. Energies 2022, 15, 6241. [Google Scholar] [CrossRef]
  36. Park, J.; Lim, H. Privacy-preserving federated learning using homomorphic encryption. Appl. Sci. 2022, 12, 734. [Google Scholar] [CrossRef]
  37. Abimannan, S.; El-Alfy, E.-S.M.; Hussain, S.; Chang, Y.-S.; Shukla, S.; Satheesh, D.; Breslin, J.G. Towards federated learning and multi-access edge computing for air quality monitoring: Literature review and assessment. Sustainability 2023, 15, 13951. [Google Scholar] [CrossRef]
  38. Supriya, Y.; Gadekallu, T.R. Particle swarm-based federated learning approach for early detection of forest fires. Sustainability 2023, 15, 964. [Google Scholar] [CrossRef]
  39. Chen, D.; Yang, P.; Chen, I.-R.; Ha, D.S.; Cho, J.-H. SusFL: Energy-Aware Federated Learning-based Monitoring for Sustainable Smart Farms. arXiv 2024, arXiv:2402.10280. [Google Scholar] [CrossRef]
  40. Mao, A.; Huang, E.; Gan, H.; Liu, K. FedAAR: A novel federated learning framework for animal activity recognition with wearable sensors. Animals 2022, 12, 2142. [Google Scholar] [CrossRef]
  41. Huang, Y.; Yang, X.; Guo, J.; Cheng, J.; Qu, H.; Ma, J.; Li, L. A High-Precision Method for 100-Day-Old Classification of Chickens in Edge Computing Scenarios Based on Federated Computing. Animals 2022, 12, 3450. [Google Scholar] [CrossRef]
  42. Berghout, T.; Benbouzid, M.; Bentrcia, T.; Lim, W.H.; Amirat, Y. Federated Learning for Condition Monitoring of Industrial Processes: A Review on Fault Diagnosis Methods, Challenges, and Prospects. Electronics 2023, 12, 158. [Google Scholar] [CrossRef]
  43. Wu, S.; Xue, H.; Zhang, L. Q-Learning-Aided Offloading Strategy in Edge-Assisted Federated Learning over Industrial IoT. Electronics 2023, 12, 1706. [Google Scholar] [CrossRef]
  44. Bemani, A.; Björsell, N. Low-Latency Collaborative Predictive Maintenance: Over-the-Air Federated Learning in Noisy Industrial Environments. Sensors 2023, 23, 7840. [Google Scholar] [CrossRef]
  45. Kaleem, S.; Sohail, A.; Tariq, M.U.; Asim, M. An Improved Big Data Analytics Architecture Using Federated Learning for IoT-Enabled Urban Intelligent Transportation Systems. Sustainability 2023, 15, 15333. [Google Scholar] [CrossRef]
  46. Alohali, M.A.; Aljebreen, M.; Nemri, N.; Allafi, R.; Duhayyim, M.A.; Alsaid, M.I.; Alneil, A.A.; Osman, A.E. Anomaly Detection in Pedestrian Walkways for Intelligent Transportation System Using Federated Learning and Harris Hawks Optimizer on Remote Sensing Images. Remote Sens. 2023, 15, 3092. [Google Scholar] [CrossRef]
  47. Xu, C.; Mao, Y. An Improved Traffic Congestion Monitoring System Based on Federated Learning. Information 2020, 11, 365. [Google Scholar] [CrossRef]
  48. Fachola, C.; Tornaría, A.; Bermolen, P.; Capdehourat, G.; Etcheverry, L.; Fariello, M.I. Federated Learning for Data Analytics in Education. Data 2023, 8, 43. [Google Scholar] [CrossRef]
  49. Sengupta, D.; Khan, S.S.; Das, S.; De, D. FedEL: Federated Education Learning for generating correlations between course outcomes and program outcomes for Internet of Education Things. IoT 2024, 25, 101056. [Google Scholar] [CrossRef]
  50. Guo, S.; Zeng, D. Pedagogical Data Federation toward Education 4.0. In Proceedings of the 6th International Conference on Frontiers of Educational Technologies; Association for Computing Machinery, New York, NY, USA, 5–8 June 2020; pp. 51–55. [Google Scholar] [CrossRef]
  51. Zhang, T.; Liu, H.; Tao, J.; Wang, Y.; Yu, M.; Chen, H.; Yu, G. Enhancing Dropout Prediction in Distributed Educational Data Using Learning Pattern Awareness: A Federated Learning Approach. Mathematics 2023, 11, 4977. [Google Scholar] [CrossRef]
  52. Huang, G.; Zhao, X.; Lu, Q. A New Cross-Domain Prediction Model of Air Pollutant Concentration Based on Secure Federated Learning and Optimized LSTM Neural Network. Environ. Sci. Pollut. Res. 2022, 30, 5103–5125. [Google Scholar] [CrossRef] [PubMed]
  53. Idoje, G.; Dagiuklas, T.; Muddesar, I. Federated Learning: Crop Classification in a Smart Farm Decentralised Network. Smart Agric. Technol. 2023, 5, 100277. [Google Scholar] [CrossRef]
  54. Abu-Khadrah, A.; Ali, A.M.; Jarrah, M. An Amendable Multi-Function Control Method Using Federated Learning for Smart Sensors in Agricultural Production Improvements. ACM Trans. Sens. Netw. 2023, in press. [CrossRef]
  55. Jiang, G.; Fan, W.; Li, W.; Wang, L.; He, Q.; Xie, P.; Li, X. DeepFedWT: A Federated Deep Learning Framework for Fault Detection of Wind Turbines. Measurement 2022, 199, 111529. [Google Scholar] [CrossRef]
  56. Campos, E.M.; Saura, P.F.; González-Vidal, A.; Hernández-Ramos, J.L.; Bernabé, J.B.; Baldini, G.; Skarmeta, A. Evaluating Federated Learning for Intrusion Detection in Internet of Things: Review and Challenges. Comput. Netw. 2022, 203, 108661. [Google Scholar] [CrossRef]
  57. Wu, Y.; Zeng, D.; Wang, Z.; Shi, Y.; Hu, J. Distributed Contrastive Learning for Medical Image Segmentation. Med. Image Anal. 2022, 81, 102564. [Google Scholar] [CrossRef] [PubMed]
  58. Rey, V.; Sánchez, P.M.S.; Celdrán, A.H.; Bovet, G. Federated Learning for Malware Detection in IoT Devices. Comput. Netw. 2022, 204, 108693. [Google Scholar] [CrossRef]
  59. Novikova, E.; Doynikova, E.; Golubev, S. Federated Learning for Intrusion Detection in the Critical Infrastructures: Vertically Partitioned Data Use Case. Algorithms 2022, 15, 104. [Google Scholar] [CrossRef]
  60. Geng, D.; He, H.; Lan, X.; Liu, C. Bearing Fault Diagnosis Based on Improved Federated Learning Algorithm. Computing 2021, 104, 1–19. [Google Scholar] [CrossRef]
  61. Wang, Z.; Gai, K. Decision Tree-Based Federated Learning: A Survey. Blockchains 2024, 2, 40–60. [Google Scholar] [CrossRef]
  62. Tonellotto, N.; Gotta, A.; Nardini, F.M.; Gadler, D.; Silvestri, F. Neural Network Quantization in Federated Learning at the Edge. Inf. Sci. 2021, 575, 417–436. [Google Scholar] [CrossRef]
  63. Anaissi, A.; Suleiman, B.; Alyassine, W. A personalized federated learning algorithm for one-class support vector machine: An application in anomaly detection. In Proceedings of the International Conference on Computational Science, London, UK, 21–23 June 2022; pp. 373–379. [Google Scholar] [CrossRef]
  64. Deng, Z.; Han, Z.; Ma, C.; Ding, M.; Yuan, L.; Ge, C.; Liu, Z. Vertical Federated Unlearning on the Logistic Regression Model. Electronics 2023, 12, 3182. [Google Scholar] [CrossRef]
  65. Markovic, T.; Leon, M.; Buffoni, D.; Punnekkat, S. Random Forest Based on Federated Learning for Intrusion Detection. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Crete, Greece, 17–20 June 2022; pp. 132–144. [Google Scholar] [CrossRef]
  66. Liu, Z.; Wang, L.; Chen, K. Secure efficient federated knn for recommendation systems. In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery; Springer: Cham, Switzerland, 2021; pp. 1808–1819. [Google Scholar] [CrossRef]
  67. Jiang, C.; Yin, K.; Xia, C.; Huang, W. FedHGCDroid: An Adaptive Multi-Dimensional Federated Learning for Privacy-Preserving Android Malware Classification. Entropy 2022, 24, 919. [Google Scholar] [CrossRef] [PubMed]
  68. Zhong, J.; Wu, Y.; Ma, W.; Deng, S.; Zhou, H. Optimizing Multi-Objective Federated Learning on Non-IID Data with Improved NSGA-III and Hierarchical Clustering. Symmetry 2022, 14, 1070. [Google Scholar] [CrossRef]
  69. Che, L.; Wang, J.; Zhou, Y.; Ma, F. Multimodal Federated Learning: A Survey. Sensors 2023, 23, 6986. [Google Scholar] [CrossRef]
  70. Liu, Z.; Duan, S.; Wang, S.; Liu, Y.; Li, X. MFLCES: Multi-Level Federated Edge Learning Algorithm Based on Client and Edge Server Selection. Electronics 2023, 12, 2689. [Google Scholar] [CrossRef]
  71. Le, D.-D.; Tran, A.-K.; Dao, M.-S.; Nguyen-Ly, K.-C.; Le, H.-S.; Nguyen-Thi, X.-D.; Pham, T.-Q.; Nguyen, V.-L.; Nguyen-Thi, B.-Y. Insights into Multi-Model Federated Learning: An Advanced Approach for Air Quality Index Forecasting. Algorithms 2022, 15, 434. [Google Scholar] [CrossRef]
  72. Feng, S.; Yu, H.; Zhu, Y. MMVFL: A Simple Vertical Federated Learning Framework for Multi-Class Multi-Participant Scenarios. Sensors 2024, 24, 619. [Google Scholar] [CrossRef] [PubMed]
  73. Sajid, N.A.; Rahman, A.; Ahmad, M.; Musleh, D.; Basheer Ahmed, M.I.; Alassaf, R.; Chabani, S.; Ahmed, M.S.; Salam, A.A.; AlKhulaifi, D. Single vs. Multi-Label: The Issues, Challenges and Insights of Contemporary Classification Schemes. Appl. Sci. 2023, 13, 6804. [Google Scholar] [CrossRef]
  74. Suri, J.S.; Bhagawati, M.; Paul, S.; Protogerou, A.D.; Sfikakis, P.P.; Kitas, G.D.; Khanna, N.N.; Ruzsa, Z.; Sharma, A.M.; Saxena, S.; et al. A Powerful Paradigm for Cardiovascular Risk Stratification Using Multiclass, Multi-Label, and Ensemble-Based Machine Learning Paradigms: A Narrative Review. Diagnostics 2022, 12, 722. [Google Scholar] [CrossRef] [PubMed]
  75. Kumar, S.; Kumar, N.; Dev, A.; Naorem, S. Movie Genre Classification Using Binary Relevance, Label Powerset, and Machine Learning Classifiers. Multimed. Tools Appl. 2023, 82, 945–968. [Google Scholar] [CrossRef]
  76. Raza, A.; Rustam, F.; Siddiqui, H.U.R.; Diez, I.d.l.T.; Garcia-Zapirain, B.; Lee, E.; Ashraf, I. Predicting Genetic Disorder and Types of Disorder Using Chain Classifier Approach. Genes 2023, 14, 71. [Google Scholar] [CrossRef]
  77. Yoo, J.; Jin, Y.; Ko, B.; Kim, M.-S. k-Labelsets Method for Multi-Label ECG Signal Classification Based on SE-ResNet. Appl. Sci. 2021, 11, 7758. [Google Scholar] [CrossRef]
  78. Rocha, V.F.; Varejão, F.M.; Segatto, M.E.V. Ensemble of Classifier Chains and Decision Templates for Multi-Label Classification. Knowl. Inf. Syst. 2022, 64, 643–663. [Google Scholar] [CrossRef]
  79. Romero-del-Castillo, J.A.; Mendoza-Hurtado, M.; Ortiz-Boyer, D.; García-Pedrajas, N. Local-Based K Values for Multi-Label K-Nearest Neighbors Rule. Eng. Appl. Artif. Intell. 2022, 116, 105487. [Google Scholar] [CrossRef]
  80. Chada, N.K.; Hoel, H.; Jasra, A.; Zouraris, G.E. Improved Efficiency of Multilevel Monte Carlo for Stochastic PDE through Strong Pairwise Coupling. J. Sci. Comput. 2022, 93, 62. [Google Scholar] [CrossRef]
  81. Read, J.; Bifet, A.; Holmes, G.; Pfahringer, B. Scalable and Efficient Multi-Label Classification for Evolving Data Streams. Mach. Learn. 2012, 88, 243–272. [Google Scholar] [CrossRef]
  82. Nadeem, M.I.; Ahmed, K.; Li, D.; Zheng, Z.; Naheed, H.; Muaad, A.Y.; Alqarafi, A.; Abdel Hameed, H. SHO-CNN: A Metaheuristic Optimization of a Convolutional Neural Network for Multi-Label News Classification. Electronics 2023, 12, 113. [Google Scholar] [CrossRef]
  83. Shakeel, M.; Nishida, K.; Itoyama, K.; Nakadai, K. 3D Convolution Recurrent Neural Networks for Multi-Label Earthquake Magnitude Classification. Appl. Sci. 2022, 12, 2195. [Google Scholar] [CrossRef]
  84. Pang, Y.; Qin, X.; Zhang, Z. Specific Relation Attention-Guided Graph Neural Networks for Joint Entity and Relation Extraction in Chinese EMR. Appl. Sci. 2022, 12, 8493. [Google Scholar] [CrossRef]
  85. Park, M.; Tran, D.Q.; Lee, S.; Park, S. Multilabel Image Classification with Deep Transfer Learning for Decision Support on Wildfire Response. Remote Sens. 2021, 13, 3985. [Google Scholar] [CrossRef]
  86. Hüllermeier, E.; Fürnkranz, J.; Mencia, E.L. Conformal Rule-Based Multi-Label Classification. Lect. Notes Comput. Sci. 2020, 12325, 290–296. [Google Scholar] [CrossRef]
  87. Qiu, S.; Wang, M.; Yang, Y.; Yu, G.; Wang, J.; Yan, Z.; Domeniconi, C.; Guo, M. Meta Multi-Instance Multi-Label Learning by Heterogeneous Network Fusion. Inf. Fusion 2023, 94, 272–283. [Google Scholar] [CrossRef]
  88. Verma, S.; Singh, S.; Majumdar, A. Multi-Label LSTM Autoencoder for Non-Intrusive Appliance Load Monitoring. Electr. Power Syst. Res. 2021, 199, 107414. [Google Scholar] [CrossRef]
  89. Liu, Z.; Niu, K.; He, Z. ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 85. [Google Scholar] [CrossRef]
  90. Saha, S.; Saha, M.; Mukherjee, K.; Arabameri, A.; Ngo, P.T.T.; Paul, G.C. Predicting the Deforestation Probability Using the Binary Logistic Regression, Random Forest, Ensemble Rotational Forest, REPTree: A Case Study at the Gumani River Basin, India. Sci. Total Environ. 2020, 730, 139197. [Google Scholar] [CrossRef] [PubMed]
  91. Ajin, R.S.; Saha, S.; Saha, A.; Biju, A.; Costache, R.; Kuriakose, S.L. Enhancing the Accuracy of the REPTree by Integrating the Hybrid Ensemble Meta-Classifiers for Modelling the Landslide Susceptibility of Idukki District, South-Western India. Photonirvachak 2022, 50, 2245–2265. [Google Scholar] [CrossRef]
  92. Al-Mukhtar, M.; Srivastava, A.; Khadke, L.; Al-Musawi, T.; Elbeltagi, A. Prediction of Irrigation Water Quality Indices Using Random Committee, Discretization Regression, REPTree, and Additive Regression. Water Resour. Manag. 2023, 38, 343–368. [Google Scholar] [CrossRef]
  93. Alsultanny, Y. Machine Learning by Data Mining REPTree and M5P for Predicating Novel Information for PM10. Cloud Comput. Data Sci. 2020, 1, 40–48. [Google Scholar] [CrossRef]
  94. Saha, S.; Sarkar, R.; Roy, J.; Saha, T.K.; Bhardwaj, D.; Acharya, S. Predicting the Landslide Susceptibility Using Ensembles of Bagging with RF and REPTree in Logchina, Bhutan. In Impact of Climate Change, Land Use and Land Cover, and Socio-Economic Dynamics on Landslides; Sarkar, R., Shaw, R., Pradhan, B., Eds.; Springer: Singapore, 2022; pp. 231–247. [Google Scholar] [CrossRef]
  95. Mandal, K.; Saha, S.; Mandal, S. Predicting the Landslide Susceptibility in Eastern Sikkim Himalayan Region, India Using Boosted Regression Tree and REPTree Machine Learning Techniques. In Applied Geomorphology and Contemporary Issues; Mandal, S., Maiti, R., Nones, M., Beckedahl, H.R., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 683–707. [Google Scholar] [CrossRef]
  96. Prajapati, J.B. Analysis of Age Sage Classification for Students’ Social Engagement Using REPTree and Random Forest. In Proceedings of the International Conference on Computational Intelligence in Data Science, Virtual Event, 24–26 March 2022; pp. 44–54. [Google Scholar] [CrossRef]
  97. Elbeltagi, A.; Srivastava, A.; Al-Saeedi, A.H.; Raza, A.; Abd-Elaty, I.; El-Rawy, M. Forecasting Long-Series Daily Reference Evapotranspiration Based on Best Subset Regression and Machine Learning in Egypt. Water 2023, 15, 1149. [Google Scholar] [CrossRef]
  98. Mrabet, H.; Alhomoud, A.; Jemai, A.; Trentesaux, D. A Secured Industrial Internet-of-Things Architecture Based on Blockchain Technology and Machine Learning for Sensor Access Control Systems in Smart Manufacturing. Appl. Sci. 2022, 12, 4641. [Google Scholar] [CrossRef]
  99. Olaleye, T.O. Opinion Mining Analytics for Spotting Omicron Fear-Stimuli Using REPTree Classifier and Natural Language Processing. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 995–1005. [Google Scholar] [CrossRef]
  100. Li, Q.; Wu, Z.; Cai, Y.; Han, Y.; Yung, C.M.; Fu, T.; He, B. Fedtree: A federated learning system for trees. In Proceedings of the 6th Machine Learning and Systems, Miami Beach, FL, USA, 8 June 2023; pp. 1–15. [Google Scholar]
  101. Zheng, Y.; Xu, S.; Wang, S.; Gao, Y.; Hua, Z. Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables. IEEE Trans. Serv. Comput. 2023, 16, 3604–3620. [Google Scholar] [CrossRef]
  102. Maddock, S.; Cormode, G.; Wang, T.; Maple, C.; Jha, S. Federated Boosted Decision Trees with Differential Privacy. In Proceedings of the CCS, Nagasaki, Japan, 30 May–2 June 2022; pp. 2249–2263. [Google Scholar] [CrossRef]
  103. Yamamoto, F.; Ozawa, S.; Wang, L. eFL-Boost: Efficient Federated Learning for Gradient Boosting Decision Trees. IEEE Access 2022, 10, 43954–43963. [Google Scholar] [CrossRef]
  104. Fu, F.; Shao, Y.; Yu, L.; Jiang, J.; Xue, H.; Tao, Y.; Cui, B. Vf2boost: Very fast vertical federated gradient boosting for cross-enterprise learning. In Proceedings of the SIGMOD, Xi’an, China, 20–25 June 2021; pp. 563–576. [Google Scholar] [CrossRef]
  105. Li, Q.; Wu, Z.; Wen, Z.; He, B. Privacy-preserving gradient boosting decision trees. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 784–791. [Google Scholar] [CrossRef]
  106. Zhao, L.; Ni, L.; Hu, S.; Chen, Y.; Zhou, P.; Xiao, F.; Wu, L. InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy. In Proceedings of the IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 2087–2095. [Google Scholar] [CrossRef]
  107. Li, X.; Hu, Y.; Liu, W.; Feng, H.; Peng, L.; Hong, Y.; Ren, K.; Qin, Z. OpBoost: A vertical federated tree boosting framework based on order-preserving desensitization. arXiv 2022, arXiv:2210.01318. [Google Scholar] [CrossRef]
  108. Zhao, J.; Zhu, H.; Xu, W.; Wang, F.; Lu, R.; Li, H. SGBoost: An Efficient and Privacy-Preserving Vertical Federated Tree Boosting Framework. IEEE Trans. Inf. Forensics Secur. 2022, 18, 1022–1036. [Google Scholar] [CrossRef]
  109. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. SecureBoost: A Lossless Federated Learning Framework. IEEE Intell. Syst. 2021, 36, 87–98. [Google Scholar] [CrossRef]
  110. Chen, W.; Ma, G.; Fan, T.; Kang, Y.; Xu, Q.; Yang, Q. Secureboost+: A high performance gradient boosting tree framework for large scale vertical federated learning. arXiv 2021, arXiv:2110.10927. [Google Scholar] [CrossRef]
  111. Le, N.K.; Liu, Y.; Nguyen, Q.M.; Liu, Q.; Liu, F.; Cai, Q.; Hirche, S. Fedxgboost: Privacy-preserving xgboost for federated learning. arXiv 2021, arXiv:2106.10662. [Google Scholar] [CrossRef]
  112. Law, A.; Leung, C.; Poddar, R.; Popa, R.A.; Shi, C.; Sima, O.; Zheng, W. Secure collaborative training and inference for xgboost. In Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, Virtual Event, 9 November 2020; pp. 21–26. [Google Scholar] [CrossRef]
  113. Wang, Z.; Yang, Y.; Liu, Y.; Liu, X.; Gupta, B.B.; Ma, J. Cloud-based federated boosting for mobile crowdsensing. arXiv 2020, arXiv:2005.05304. [Google Scholar] [CrossRef]
  114. Zhang, J.; Zhao, X.; Yuan, P. Federated security tree algorithm for user privacy protection. J. Comput. Appl. 2020, 40, 2980. [Google Scholar]
  115. Li, Q.; Wen, Z.; He, B. Practical Federated Gradient Boosting Decision Trees. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 4642–4649. [Google Scholar] [CrossRef]
  116. Yang, M.W.; Song, L.Q.; Xu, J.; Li, C.; Tan, G. The tradeoff between privacy and accuracy in anomaly detection using federated xgboost. arXiv 2019, arXiv:1907.07157. [Google Scholar] [CrossRef]
  117. Liu, Y.; Ma, Z.; Liu, X.; Ma, S.; Nepal, S.; Deng, R. Boosting Privately: Privacy-Preserving Federated Extreme Boosting for Mobile Crowdsensing. arXiv 2019, arXiv:1907.10218. [Google Scholar] [CrossRef]
  118. Yao, H.; Wang, J.; Dai, P.; Bo, L.; Chen, Y. An efficient and robust system for vertically federated random forest. arXiv 2022, arXiv:2201.10761. [Google Scholar] [CrossRef]
  119. Han, Y.; Du, P.; Yang, K. FedGBF: An efficient vertical federated learning framework via gradient boosting and bagging. arXiv 2022, arXiv:2204.00976. [Google Scholar] [CrossRef]
  120. Wu, Y.; Cai, S.; Xiao, X.; Chen, G.; Ooi, B.C. Privacy preserving vertical federated learning for tree-based models. arXiv 2020, arXiv:2008.06170. [Google Scholar] [CrossRef]
  121. Liu, Y.; Liu, Y.; Liu, Z.; Liang, Y.; Meng, C.; Zhang, J.; Zheng, Y. Federated Forest. IEEE Trans. Big Data 2020, 8, 843–854. [Google Scholar] [CrossRef]
  122. Zhang, K.; Song, X.; Zhang, C.; Yu, S. Challenges and future directions of secure federated learning: A survey. Front. Comput. Sci. 2022, 16, 165817. [Google Scholar] [CrossRef] [PubMed]
  123. Banabilah, S.; Aloqaily, M.; Alsayed, E.; Malik, N.; Jararweh, Y. Federated learning review: Fundamentals, enabling technologies, and future applications. Inf. Process. Manag. 2022, 59, 103061. [Google Scholar] [CrossRef]
  124. Blachnik, M.; Sołtysiak, M.; Dąbrowska, D. Predicting Presence of Amphibian Species Using Features Obtained from GIS and Satellite Images. ISPRS Int. J. Geo Inf. 2019, 8, 123. [Google Scholar] [CrossRef]
  125. Colonna, J.G.; Gama, J.; Nakamura, E.F. A comparison of hierarchical multi-output recognition approaches for anuran classification. Mach. Learn. 2018, 107, 1651–1671. [Google Scholar] [CrossRef]
  126. Kaggle. HackerEarth ML Challenge: Adopt a Buddy. Available online: https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption (accessed on 16 March 2024).
  127. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann: Cambridge, MA, USA, 2016; pp. 1–664. [Google Scholar]
  128. Pan, W. Predicting Presence of Amphibian Species Using Feature Selection. In Proceedings of the 6th IEEE Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; pp. 1823–1826. [Google Scholar] [CrossRef]
Figure 1. The architecture of the proposed FMLL method.
Figure 2. REPTree structure for “great crested newt” classification in FMLL.
Table 1. Overview of federated learning frameworks.

| Year | Ref. | FL Type | Dataset | Aggregation Algorithm | ML Algorithm | Evaluation Metric | Contribution |
|---|---|---|---|---|---|---|---|
| 2023 | [52] | Centralized | Air pollutant and meteorological data | FedAvg | LSTM, SSA, and DPLA | MAE, RMSE, R-squared | Cross-domain prediction of air pollutant concentration |
| 2023 | [53] | Decentralized | Air dataset | FedAvg | CNN | Accuracy, precision, recall, F-score, and confusion matrix | Predicting chickpea crops for smart farming |
| 2023 | [54] | Centralized | Crop and soil dataset | Federated learning | AMFSC | Analysis rate, control rate | Agricultural production improvement |
| 2022 | [30] | Centralized | CSE-CICIDS2018, MQTTset, and InSDN | Cyber-physical production system (CPPS)-based aggregation | CNN, recurrent neural networks, and deep neural networks | Accuracy, precision, recall, F-score | Intrusion detection to enhance the security of agricultural IoT infrastructures |
| 2022 | [55] | Centralized | Wind turbine data | FedAvg | MSRAN and deep network | Precision, recall, and F-score | Fault detection in wind turbines |
| 2022 | [56] | Centralized | ToN_IoT | FedAvg and Fed+ | Multinomial logistic regression | Accuracy, precision, recall, F-score, FPR | Intrusion detection for IoT |
| 2022 | [57] | Centralized | Spinesagt2-wdataset3 | Federated contrastive learning optimization (FCLOpt) | Dual attention gates (DAGs) and U-Net | Accuracy | Federated learning-based vertebral body segmentation framework (FLVBSF) |
| 2022 | [58] | Centralized | N-BaIoT | Mini-batch and multi-epoch aggregation, derived from FedAvg | Multilayer perceptron and autoencoder | Accuracy, F-score | Federated learning for IoT malware detection |
| 2022 | [59] | Centralized | SWAT 2015 | SCADA server-based aggregation | GBDT with Paillier HE | Accuracy | Intrusion detection for IoT prioritizing data confidentiality |
| 2021 | [60] | Decentralized | CWRU | FA-FedAvg | CNN | Accuracy | Bearing fault diagnosis |
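Several of the frameworks in Table 1 rely on FedAvg [7] for aggregation. The minimal NumPy sketch below shows the core idea (averaging client model parameters, weighted by local dataset size) with entirely hypothetical toy values; it is a conceptual illustration, not the aggregation used by FMLL itself.

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """FedAvg-style aggregation [7]: per-layer average of client
    parameters, weighted by each client's local dataset size."""
    total = sum(client_sizes)
    n_layers = len(client_params[0])
    return [
        sum(p[layer] * (n / total) for p, n in zip(client_params, client_sizes))
        for layer in range(n_layers)
    ]

# Two toy clients, each holding a single parameter vector (hypothetical values).
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
sizes = [100, 300]
print(fed_avg(clients, sizes))  # [array([2.5, 3.5])]
```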
Table 2. Example representation of instances in multi-label learning.

| Sample | X | Y |
|---|---|---|
| S_1 | x_{11}, x_{12}, …, x_{1K} | Y_1 = {y_2, y_4} |
| S_2 | x_{21}, x_{22}, …, x_{2K} | Y_2 = {y_1, y_3, y_4} |
| … | … | … |
| S_N | x_{N1}, x_{N2}, …, x_{NK} | Y_N = {y_3} |
Table 3. Binary Relevance transformation of the multi-label dataset displayed in Table 2.

| Sample | D_{y_1}: X → Y | D_{y_2}: X → Y | D_{y_3}: X → Y | D_{y_4}: X → Y |
|---|---|---|---|---|
| S_1 | [x_{11} … x_{1K}] → ¬y_1 | [x_{11} … x_{1K}] → y_2 | [x_{11} … x_{1K}] → ¬y_3 | [x_{11} … x_{1K}] → y_4 |
| S_2 | [x_{21} … x_{2K}] → y_1 | [x_{21} … x_{2K}] → ¬y_2 | [x_{21} … x_{2K}] → y_3 | [x_{21} … x_{2K}] → y_4 |
| S_N | [x_{N1} … x_{NK}] → ¬y_1 | [x_{N1} … x_{NK}] → ¬y_2 | [x_{N1} … x_{NK}] → y_3 | [x_{N1} … x_{NK}] → ¬y_4 |
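The transformation in Table 3 turns one multi-label dataset into one single-label problem per label. The sketch below shows a minimal Binary Relevance implementation with scikit-learn, using DecisionTreeClassifier as a stand-in for Weka's REPTree (which scikit-learn does not implement) and purely synthetic toy data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def binary_relevance_fit(X, Y):
    """Train one binary classifier per label column, as in Table 3.
    DecisionTreeClassifier stands in for Weka's REPTree here."""
    return [DecisionTreeClassifier().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    """Stack the per-label binary predictions back into a label matrix."""
    return np.column_stack([m.predict(X) for m in models])

# Toy data: 6 instances, 3 features, 4 binary labels (hypothetical values).
rng = np.random.default_rng(0)
X = rng.random((6, 3))
Y = rng.integers(0, 2, size=(6, 4))

models = binary_relevance_fit(X, Y)
print(binary_relevance_predict(models, X))  # recovers Y on the training data
```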
Table 4. A brief overview of utilized datasets.

| ID | Ref. | Dataset Name | #Features | #Instances | #Labels | #Classes | Source | Link (accessed on 16 March 2024) |
|---|---|---|---|---|---|---|---|---|
| 1 | [124] | Amphibians | 23 | 189 | 7 | 2, 2, 2, 2, 2, 2, 2 | UCI | https://archive.ics.uci.edu/dataset/528/amphibians |
| 2 | [125] | Anuran-Calls-(MFCCs) | 22 | 7195 | 3 | 4, 8, 10 | UCI | https://archive.ics.uci.edu/dataset/406/anuran+calls+mfccs |
| 3 | [126] | HackerEarth-Adopt-A-Buddy | 11 | 18,834 | 2 | 3, 4 | Kaggle | https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption |
Table 5. The information of Amphibians dataset.

| Dataset Attributes | Task | Study Domain | Feature Types | #Instances | #Features | #Views |
|---|---|---|---|---|---|---|
| Multivariate | Classification | Biology | Integer, real, nominal | 189 | 23 | 6457 |
Table 6. The statistics of numerical features in Amphibians dataset.

| Feature Name | Min | Max | Mean | Mode | Standard Deviation |
|---|---|---|---|---|---|
| SR | 30 | 500,000 | 9633.2275 | 300 | 46,256.0783 |
| NR | 1 | 12 | 1.5661 | 1 | 1.5444 |
| OR | 25 | 100 | 90.8689 | 100 | 19.0996 |
Table 7. The description of all features in Amphibians dataset.

| No | Attribute | Type | Description |
|---|---|---|---|
| 1 | ID | Integer | Identification number (unused in classification) |
| 2 | MV | Categorical | Motorway (unused in classification) |
| 3 | SR | Numerical | Surface of water reservoir (m²) |
| 4 | NR | Numerical | Number of water reservoirs in habitat (the greater the number of reservoirs, the higher the probability that some of them will be proper for amphibian breeding) |
| 5 | TR | Categorical | Type of water reservoirs (including reservoirs with natural features, lately formed reservoirs, settling ponds, reservoirs situated near residential areas, technological water reservoirs, etc.) |
| 6 | VR | Categorical | Vegetation presence within the reservoirs (including absence of vegetation, sparse patches at the edges, densely overgrown areas, abundant vegetation within the reservoir, reservoirs entirely overgrown, etc.) |
| 7 | SUR1 | Categorical | Surroundings 1 (the predominant land cover types surrounding the water reservoir) |
| 8 | SUR2 | Categorical | Surroundings 2 (the second most prevalent types of land cover surrounding the water reservoir) |
| 9 | SUR3 | Categorical | Surroundings 3 (the third most predominant types of land cover surrounding the water reservoir) |
| 10 | UR | Categorical | Use of water reservoirs (unused by humans, recreational and scenic use, economic utilization, technological purposes) |
| 11 | FR | Categorical | The presence of fishing (limited or occasional fishing, intensive fishing, breeding reservoirs) |
| 12 | OR | Numerical | Degree of access from reservoir edges to undeveloped areas: no access, limited access, moderate access, extensive access to open space |
| 13 | RR | Ordinal | Minimum distance from the water reservoir to roads, categorized as: <50 m, 50–100 m, 100–200 m, 200–500 m, 500–1000 m, >1000 m |
| 14 | BR | Ordinal | Building development as minimum distance to buildings: <50 m, 50–100 m, 100–200 m, 200–500 m, 500–1000 m, >1000 m |
| 15 | MR | Categorical | Maintenance status of the reservoir (including clean, slightly littered, reservoirs heavily or very heavily littered) |
| 16 | CR | Categorical | Type of shore (natural or concrete) |
| 17 | Green frogs | Categorical | Presence of green frogs (label 1) |
| 18 | Brown frogs | Categorical | Presence of brown frogs (label 2) |
| 19 | Common toad | Categorical | Presence of common toad (label 3) |
| 20 | Fire-bellied toad | Categorical | Presence of fire-bellied toad (label 4) |
| 21 | Tree frog | Categorical | Presence of tree frog (label 5) |
| 22 | Common newt | Categorical | Presence of common newt (label 6) |
| 23 | Great crested newt | Categorical | Presence of great crested newt (label 7) |
Table 8. The information of Anuran-Calls-(MFCCs) dataset.

| Dataset Attributes | Task | Study Domain | Feature Type | #Instances | #Features | #Views |
|---|---|---|---|---|---|---|
| Multivariate | Classification, clustering | Biology | Real | 7195 | 22 | 5692 |
Table 9. The statistics of MFCC syllables in Anuran-Calls-(MFCCs) dataset.

| Feature Name | Min | Max | Mean | Mode | Standard Deviation |
|---|---|---|---|---|---|
| MFCCs_1 | −0.2512 | 1.0000 | 0.9899 | 1.0000 | 0.0690 |
| MFCCs_2 | −0.6730 | 1.0000 | 0.3236 | 1.0000 | 0.2187 |
| MFCCs_3 | −0.4360 | 1.0000 | 0.3112 | 1.0000 | 0.2635 |
| MFCCs_4 | −0.4727 | 1.0000 | 0.4460 | 1.0000 | 0.1603 |
| MFCCs_5 | −0.6360 | 0.7522 | 0.1270 | No | 0.1627 |
| MFCCs_6 | −0.4104 | 0.9642 | 0.0979 | No | 0.1204 |
| MFCCs_7 | −0.5390 | 1.0000 | −0.0014 | No | 0.1714 |
| MFCCs_8 | −0.5765 | 0.5518 | −0.0004 | No | 0.1163 |
| MFCCs_9 | −0.5873 | 0.7380 | 0.1282 | No | 0.1790 |
| MFCCs_10 | −0.9523 | 0.5228 | 0.0560 | No | 0.1271 |
| MFCCs_11 | −0.9020 | 0.5230 | −0.1157 | No | 0.1868 |
| MFCCs_12 | −0.7994 | 0.6909 | 0.0434 | No | 0.1560 |
| MFCCs_13 | −0.6441 | 0.9457 | 0.1509 | No | 0.2069 |
| MFCCs_14 | −0.5904 | 0.5757 | −0.0392 | No | 0.1525 |
| MFCCs_15 | −0.7172 | 0.6689 | −0.1017 | No | 0.1876 |
| MFCCs_16 | −0.4987 | 0.6707 | 0.0421 | No | 0.1199 |
| MFCCs_17 | −0.4215 | 0.6812 | 0.0887 | No | 0.1381 |
| MFCCs_18 | −0.7593 | 0.6141 | 0.0078 | No | 0.0847 |
| MFCCs_19 | −0.6807 | 0.5742 | −0.0495 | No | 0.0825 |
| MFCCs_20 | −0.3616 | 0.4678 | −0.0532 | No | 0.0942 |
| MFCCs_21 | −0.4308 | 0.3898 | 0.0373 | No | 0.0795 |
| MFCCs_22 | −0.3793 | 0.4322 | 0.0876 | No | 0.1234 |
Table 10. The distribution of instances per class in Anuran-Calls-(MFCCs) dataset.

| Label | Class | #Instances |
|---|---|---|
| Family | Bufonidae | 68 |
| | Dendrobatidae | 542 |
| | Hylidae | 2165 |
| | Leptodactylidae | 4420 |
| Genus | Adenomera | 4150 |
| | Ameerega | 542 |
| | Dendropsophus | 310 |
| | Hypsiboas | 1593 |
| | Leptodactylus | 270 |
| | Osteocephalus | 114 |
| | Rhinella | 68 |
| | Scinax | 148 |
| Species | AdenomeraAndre | 672 |
| | AdenomeraHylaedactylus | 3478 |
| | Ameeregatrivittata | 542 |
| | HylaMinuta | 310 |
| | HypsiboasCordobae | 1121 |
| | HypsiboasCinerascens | 472 |
| | LeptodactylusFuscus | 270 |
| | OsteocephalusOophagus | 114 |
| | Rhinellagranulosa | 68 |
| | ScinaxRuber | 148 |
Table 11. The information of the HackerEarth-Adopt-A-Buddy dataset.

| Dataset Attributes | Task | Study Domain | Feature Type | #Instances | #Features | #Views |
|---|---|---|---|---|---|---|
| Multivariate | Classification | Biology | Integer, real, nominal, temporal | 18,834 | 11 | 5605 |
Table 12. The description of all features in the HackerEarth-Adopt-A-Buddy dataset.

| No. | Attribute | Type | Description |
|---|---|---|---|
| 1 | pet_id | Integer | A unique identifier assigned to each animal up for adoption. |
| 2 | issue_date | Temporal | The date when the pet was officially taken in by the shelter. |
| 3 | listing_date | Temporal | The date and time when the pet became available for adoption at the shelter. |
| 4 | condition | Categorical | The health or physical state of the pet upon arrival at the shelter. |
| 5 | color_type | Categorical | The color pattern or combination exhibited by the pet. |
| 6 | length | Real | The measured length of the pet (typically in meters). |
| 7 | height | Real | The measured height of the pet (typically in centimeters). |
| 8 | X1 | Integer | A value related to the pet. |
| 9 | X2 | Integer | Another value related to the pet. |
| 10 | breed_category | Categorical | The category or classification of the pet’s breed (label 1). |
| 11 | pet_category | Categorical | The category or species classification of the pet (label 2). |
Table 13. The statistics of numerical features in the HackerEarth-Adopt-A-Buddy dataset.

| Feature Name | Min | Max | Mean | Mode | Standard Deviation |
|---|---|---|---|---|---|
| length | 0.0000 | 1.0000 | 0.5026 | 0.0800 | 0.2887 |
| height | 5.0000 | 50.0000 | 27.4488 | 21.4000 | 13.0198 |
| X1 | 0.0000 | 19.0000 | 5.3696 | 0.0000 | 6.5724 |
| X2 | 0.0000 | 9.0000 | 4.5773 | 1.0000 | 3.5178 |
Table 14. Performance metrics for various amphibian species in FMLL.

| Amphibians | Accuracy | Precision | TNR | ROC | PRC | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Green frogs | 68.78 | 0.694 | 0.688 | 0.715 | 0.682 | 0.688 | 0.689 |
| Brown frogs | 78.31 | 0.613 | 0.783 | 0.503 | 0.665 | 0.783 | 0.688 |
| Common toad | 71.43 | 0.712 | 0.714 | 0.621 | 0.653 | 0.714 | 0.674 |
| Fire-bellied toad | 70.37 | 0.669 | 0.704 | 0.576 | 0.612 | 0.704 | 0.650 |
| Tree frog | 65.61 | 0.639 | 0.655 | 0.638 | 0.627 | 0.656 | 0.631 |
| Common newt | 69.84 | 0.658 | 0.698 | 0.528 | 0.603 | 0.698 | 0.619 |
| Great crested newt | 88.36 | 0.790 | 0.884 | 0.539 | 0.818 | 0.884 | 0.834 |
| Average | 73.24 | 0.682 | 0.732 | 0.589 | 0.666 | 0.732 | 0.684 |
Table 15. Performance metrics for Anuran-Calls-(MFCCs) classification in FMLL.

| Anuran-Calls-(MFCCs) | Accuracy | Precision | TNR | ROC | PRC | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Family | 95.75 | 0.957 | 0.980 | 0.978 | 0.964 | 0.957 | 0.957 |
| Genus | 94.19 | 0.941 | 0.991 | 0.979 | 0.943 | 0.942 | 0.941 |
| Species | 93.55 | 0.935 | 0.992 | 0.983 | 0.935 | 0.936 | 0.935 |
| Average | 94.50 | 0.944 | 0.988 | 0.980 | 0.947 | 0.945 | 0.944 |
Table 16. Performance metrics for categories of HackerEarth-Adopt-A-Buddy dataset in FMLL.

| HackerEarth-Adopt-A-Buddy | Accuracy | Precision | TNR | ROC | PRC | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Breed_category | 85.43 | 0.856 | 0.927 | 0.965 | 0.938 | 0.854 | 0.850 |
| Pet_category | 86.80 | 0.869 | 0.928 | 0.946 | 0.928 | 0.868 | 0.865 |
| Average | 86.12 | 0.863 | 0.928 | 0.956 | 0.933 | 0.861 | 0.858 |
Table 17. The comparison of FMLL with state-of-the-art methods using the Amphibians dataset.

| Method | Accuracy |
|---|---|
| Gradient-Boosted Trees (GBT) [124] | 64.18 |
| Random Forest (RF) [124] | 57.54 |
| AdaBoost (ADA) [124] | 60.01 |
| Decision Tree (DT) [124] | 58.37 |
| Partially Monotonic Decision Tree (PMDT) [128] | 71.50 |
| Average | 62.32 |
| Proposed (FMLL with BR and REPTree) | 73.24 |
Table 18. The comparison of FMLL with state-of-the-art methods [125] using the Anuran-Calls-(MFCCs) dataset.

| Method | Precision | Recall | F-Score |
|---|---|---|---|
| Species | | | |
| KNN-Flat | 0.690 | 0.720 | 0.700 |
| RBF-SVM-Flat | 0.850 | 0.540 | 0.660 |
| Polynomial-SVM-Flat | 0.710 | 0.760 | 0.740 |
| Tree-Flat | 0.490 | 0.500 | 0.500 |
| KNN-LCPL | 0.691 | 0.719 | 0.705 |
| KNN-Hierarchical-LCPN | 0.690 | 0.720 | 0.700 |
| RBF-SVM-Hierarchical-LCPN | 0.840 | 0.540 | 0.650 |
| Polynomial-SVM-Hierarchical-LCPN | 0.680 | 0.710 | 0.700 |
| Tree-Hierarchical-LCPN | 0.570 | 0.560 | 0.560 |
| KNN-Hierarchical-LCPL | 0.690 | 0.720 | 0.700 |
| RBF-SVM-Hierarchical-LCPL | 0.830 | 0.520 | 0.640 |
| Polynomial-SVM-Hierarchical-LCPL | 0.690 | 0.740 | 0.720 |
| Tree-Hierarchical-LCPL | 0.470 | 0.500 | 0.490 |
| Proposed (FMLL with BR and REPTree) | 0.935 | 0.936 | 0.935 |
| Family | | | |
| KNN-LCPL | 0.713 | 0.820 | 0.763 |
| Proposed (FMLL with BR and REPTree) | 0.957 | 0.957 | 0.957 |
| Genus | | | |
| KNN-LCPL | 0.663 | 0.731 | 0.695 |
| Proposed (FMLL with BR and REPTree) | 0.941 | 0.942 | 0.941 |
| Species + Family + Genus | | | |
| KNN-LCPL | 0.689 | 0.757 | 0.721 |
| Proposed (FMLL with BR and REPTree) | 0.944 | 0.945 | 0.944 |
