Article

Federated Multi-Label Learning (FMLL): Innovative Method for Classification Tasks in Animal Science

1 Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir 35390, Turkey
2 Department of Computer Engineering, Dokuz Eylul University, Izmir 35390, Turkey
3 Information Technologies Research and Application Center (DEBTAM), Dokuz Eylul University, Izmir 35390, Turkey
* Author to whom correspondence should be addressed.
Animals 2024, 14(14), 2021; https://doi.org/10.3390/ani14142021
Submission received: 6 May 2024 / Revised: 2 July 2024 / Accepted: 8 July 2024 / Published: 9 July 2024

Simple Summary

This study addresses the classification task in animal science, which helps organize and analyze complex data, essential for making informed decisions. It introduces Federated Multi-Label Learning (FMLL), a novel approach combining federated learning principles with a multi-label learning technique. Using machine learning strategies, FMLL achieved significant improvements in classification accuracy metrics compared to existing methods. The experimental results on different animal datasets demonstrated the effectiveness of FMLL and its superiority in multi-label classification tasks. The findings of our study offer valuable insights into understanding and managing animal populations, which could have important implications for biodiversity conservation and ecological management.

Abstract

Federated learning is a collaborative machine learning paradigm where multiple parties jointly train a predictive model while keeping their data decentralized. On the other hand, multi-label learning deals with classification tasks where instances may simultaneously belong to multiple classes. This study introduces the concept of Federated Multi-Label Learning (FMLL), combining these two important approaches. The proposed approach leverages federated learning principles to address multi-label classification tasks. Specifically, it adopts the Binary Relevance (BR) strategy to handle the multi-label nature of the data and employs the Reduced-Error Pruning Tree (REPTree) as the base classifier. The effectiveness of the FMLL method was demonstrated by experiments carried out on three diverse datasets within the context of animal science: Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy. The accuracy rates achieved across these animal datasets were 73.24%, 94.50%, and 86.12%, respectively. Compared to state-of-the-art methods, FMLL exhibited remarkable improvements (above 10%) in average accuracy, precision, recall, and F-score metrics.

1. Introduction

Animal science is an area where machine learning (ML) has proven effective in analyzing animal datasets and making predictions for future decisions. ML techniques have been utilized for different purposes such as animal health surveillance, outlier animal behavior detection, animal activity recognition, animal detection systems, and animal species classification. Moreover, multi-label learning as a subfield of ML has gained traction in animal science for handling complex scenarios where multiple labels need to be predicted simultaneously [1,2,3]. Furthermore, combining multi-label classification with federated learning (FL) enables distributed and privacy-preserving machine learning applications. Recent studies have demonstrated the effectiveness of FL in animal science initiatives, including federated frameworks for diagnosing and predicting animal diseases, monitoring animal welfare, predicting collaborative disease outbreaks, and implementing decentralized systems for animal tracking and detection [4,5,6]. These advancements highlight the potential of federated multi-label learning to revolutionize animal science by integrating robust predictive modeling with secure data-sharing mechanisms.
Federated learning is a collaborative ML approach that was introduced in 2016 [7]. In the FL framework, multiple clients work together to address machine learning problems, overseen by a central aggregator. This setup ensures that training data remains decentralized, safeguarding the privacy of each client’s data. In this framework, client data remains stored locally, and local models are trained on multiple nodes. Gaining popularity in recent years, this kind of distributed machine-learning technique builds a central model by aggregating local models, thereby reducing the computational complexity of training [8]. Consequently, federated learning proves highly beneficial in resolving privacy issues associated with data islands and holds promise for deployment across diverse edge devices [9,10].
Multi-label learning is a sophisticated machine learning paradigm that extends traditional classification techniques by allowing instances to be associated with multiple labels simultaneously. Unlike conventional single-label classification tasks where each instance is assigned to a single class, multi-label learning builds a model in which instances may exhibit multiple attributes or characteristics. This paradigm finds widespread application in domains where instances are inherently multi-faceted, such as image recognition [11], text classification [12], and biology [13]. For example, in biological classification tasks, multi-label learning can be applied to predict the functions of elements based on their multiple roles within biological pathways. Multi-label learning algorithms aim to capture the complex relationships between instances and their associated labels, finding applications across other fields, e.g., animal science [14], healthcare [15], social media [16], geoscience [17], and transportation [18], where data instances may belong to various classes at the same time.
Multi-label learning entails its own set of challenges. One common challenge is the increased complexity of model training and evaluation since multi-label datasets typically exhibit larger sizes and greater complexity compared to single-label datasets. Another challenge is that the presence of multiple labels can further complicate the learning process and require specialized algorithms. To tackle these obstacles, researchers have developed the binary relevance (BR) approach, which streamlines the learning process and facilitates the utilization of standard binary classifiers, such as support vector machines [19]. Additionally, techniques such as label powersets and classifier chains have been proposed to tackle different aspects of the multi-label learning problem.
The Reduced-Error Pruning Tree (REPTree) algorithm is another method employed in machine learning, particularly in the context of decision tree-based classification tasks. REPTree aims to construct an optimal decision tree by iteratively pruning branches that do not contribute significantly to reducing classification error [20]. REPTree has applications in various domains such as animal science [21], environment [22], healthcare [23], and education [24]. When considering multi-label classification tasks, REPTrees can serve as effective binary classifiers within the binary relevance framework. Each REPTree can be trained independently to predict the absence or presence of a specific label, utilizing its pruning mechanism to optimize classification performance. They are simple yet powerful solutions, leveraging decision tree structures while handling the complexity of multiple labels per instance, to provide interpretable models that can manage both categorical and numerical data, making them suitable for a broad range of real-world problems.
The exploration of federated learning and multi-label learning, particularly in conjunction with methodologies such as the binary relevance approach and REPTree, remains relatively uncharted territory in the literature. Thus, in response to the evolving landscape of distributed data and complex classification tasks, we propose a novel approach, Federated Multi-Label Learning (FMLL) for classification tasks in the current study. Drawing upon established methodologies, namely Binary Relevance and Reduced-Error Pruning Tree (REPTree) approaches, our method aims to combine the strengths of federated learning and multi-label concepts to address the challenges inherent in distributed environments and multi-dimensional classification problems. The primary contributions of this study, setting it apart from other classification methods, are as follows:
(i) The paper presents the first-of-its-kind Federated Multi-Label Learning (FMLL) method that combines federated learning principles with the Binary Relevance approach as a multi-label learning technique and uses the REPTree algorithm to address classification tasks where instances may belong to multiple classes simultaneously.
(ii) FMLL contributes significantly to the field of animal science by offering a novel methodology for classifying diverse animal datasets. This advancement enables more accurate and efficient classification of animals based on various attributes, aiding researchers and practitioners in better understanding and managing animal populations.
(iii) FMLL harnesses federated learning principles, allowing multiple nodes to collaboratively train a model using their own local data. This distributes computational complexity over multiple nodes to improve efficiency and ensures privacy preservation and data security, which are crucial considerations in animal science research where large volumes of sensitive data may be involved.
(iv) The proposed approach adopts the Binary Relevance (BR) strategy to effectively handle the multi-label nature of the data. By accurately classifying instances belonging to multiple classes, FMLL enhances the understanding of complex relationships and characteristics within animal species datasets.
(v) FMLL pioneers the use of the Reduced-Error Pruning Tree (REPTree) classifier within federated learning, marking the first such instance in the literature. The REPTree was chosen for its effectiveness in addressing the complexities of multi-label classification tasks. This approach enhances both the accuracy and interpretability of classification results, representing a significant advancement in machine learning techniques applied to animal science.
(vi) The effectiveness of FMLL is empirically validated through experiments conducted on three diverse datasets within the domain of animal science: Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy. These experiments demonstrated the applicability and efficacy of FMLL in real-world scenarios, showcasing significant improvements in classification accuracy.
(vii) FMLL achieved remarkable improvements in classification accuracy across various animal datasets when compared to existing state-of-the-art methods. For instance, on the Amphibians dataset, FMLL achieved an average accuracy improvement of 10.92%. This improvement highlights the practical relevance and superiority of FMLL in multi-label classification tasks within the domain of animal science.
The structure of this paper unfolds as follows: Section 2 provides a concise review of related works, followed by Section 3, where we detail the materials and methods employed. Section 4 presents the experimental studies conducted, while Section 5 discusses the obtained results. Section 6 elucidates the conclusions drawn from our findings and delineates potential directions for future research on the proposed method.

2. Related Works

In recent years, many researchers have devoted their efforts to developing federated learning (FL) techniques, aiming to bolster the efficacy of machine learning (ML) models. FL has found applications across different domains including health [25,26,27,28], agriculture [29,30,31,32], security [33,34,35,36], environment [37,38], animals [39,40,41], industries [42,43,44], transportation [45,46,47], and education [48,49,50,51]. For example, in the domain of health [28], a federated learning approach was introduced for the client end of health service providers. This method incorporates modified artificial bee colony optimization and support vector machine techniques to enhance the accuracy of cardiovascular disease classification. In agriculture [31], a federated learning-based entropy model was presented to assess food safety by quantifying risk levels associated with pesticide residues in agricultural products. In security [36], homomorphic encryption was integrated into a privacy-preserving federated learning algorithm to empower centralized servers to securely aggregate encrypted local model parameters. In the animal domain [40], a novel federated learning framework for animal activity recognition (FedAAR) was proposed to address the challenges of sensor-based animal monitoring systems through decentralized data from several farms.
Table 1 presents an overview of federated learning frameworks [30,52,53,54,55,56,57,58,59,60], offering insights to better understand the contributions made in this field. Various machine learning methods have been employed in previous studies, including the sparrow search algorithm (SSA) [52], the differential privacy Laplace mechanism (DPLA) [52], the amendable multi-function sensor control method (AMFSC) [54], and the multiscale residual attention network (MSRAN) [55]. While most studies [57,58,59,60] evaluated the results using the accuracy metric, some of them [30,53,55,56] also utilized F-score, precision, recall metrics, and others [52,53,56] used different indicators like the confusion matrix, false positive rate (FPR), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R-squared).
Federated learning has been applied successfully across a broad spectrum of machine learning algorithms, including decision trees (DTs) [61], artificial neural networks (ANNs) [62], support vector machines (SVMs) [63], logistic regression (LR) [64], random forests (RFs) [65], and k-nearest neighbors (KNNs) [66]. These implementations have demonstrated the versatility and flexibility of federated learning techniques in diverse settings. Furthermore, different types of multi-based methodologies have emerged within the realm of federated learning, each aiming to address specific requirements and challenges. These similar methodologies to our FMLL method include multi-dimensional federated learning [67], multi-objective federated learning [68], multi-modal federated learning [69], multi-level federated edge learning [70], multi-model federated learning [71], and multi-participant multi-class vertical federated learning (MMVFL) [72], all specifically designed for multi-class classification tasks. Despite the breadth of research in federated learning, there remains a notable gap in the literature regarding multi-label-based federated learning approaches, indicating an area ripe for further exploration and development, particularly valuable in animal-related scenarios where data instances may simultaneously belong to multiple classes.
Multi-label learning challenges the traditional notion of assigning items to a single class and allows items to belong to multiple classes at the same time. This distinction underscores the complexity of classification tasks in modern data analysis. While single-label classification remains fundamental, multi-label classification has emerged as a crucial technique in various domains [73]. However, achieving high accuracy in multi-label classification presents a formidable hurdle, as accurately predicting multiple labels for each item demands sophisticated algorithms. Researchers have offered diverse solutions to handle the intricacies of multi-label classification tasks, including binary relevance (BR) [74], which treats each label as a separate binary classification task, and label powerset (LP) [75], which considers each unique combination of labels as a single class. Classifier chains (CCs) [76] sequentially train multiple binary classifiers, while random k-labelsets (RAkELs) [77] randomly partition the label space into subsets for classification. The ensemble of classifier chains (ECC) [78] combines multiple classifier chains for improved performance.
The multi-label k-nearest neighbors (ML-kNNs) method [79] adapts the k-nearest neighbor algorithm for multi-label classification. Pairwise coupling (PC) [80] trains a binary classifier for each pair of labels, while the majority label set method [81] predicts the most frequent label subset among training instances. Deep learning architectures, such as convolutional neural networks (CNNs) [82], recurrent neural networks (RNNs) [83], and graph neural networks (GNNs) [84], are powerful tools that can be tailored to multi-label classification tasks. Additionally, hybrid approaches integrate various techniques to leverage the strengths of different methods, providing robustness in dealing with multi-label classification problems across diverse domains and related datasets, such as transfer learning-based multi-label classification [85], rule-based multi-label classification (MLC) [86], meta-learning based multi-instance multi-label learning (MetaMIML) [87], multi-label long short-term memory (LSTM) [88], the multi-label generative adversarial network (ML-CookGAN) [89], and so on. Reviewing these varied methodologies provides valuable insight into the evolving landscape of multi-label learning research.
Recently, research has demonstrated the effectiveness of the REPTree in various machine learning-based tasks, including the rotational forest and reduced-error pruning trees (RTF-REPTree) approach in forest loss analysis [90], the ensemble models of REPTree in geospatial analysis [91], the combination of REPTree, additive regression (AR), regression by discretization (RD), and random committee (RC) models to predict the quality of river waters [92], the utilization of REPTree for air quality monitoring [93], the employment of REPTree in predicting landslide susceptibility (LSM) [94,95], the social engagement analysis of students during the COVID-19 pandemic through REPTree [96], the REPTree-based estimation of evapotranspiration (ETo) from the reference surface in agricultural planning [97], the enhancement of security in industrial internet of things (IIoT) to mitigate cyber-attacks via the REPTree and other ML algorithms [98], and the analysis of fear-inducing factors using the REPTree in reaction to the omicron variant of the coronavirus amidst academic societies [99]. While numerous types of decision trees, including GBDT [100,101,102,103,104,105,106], XGBoost [107,108,109,110,111,112,113,114,115,116,117], RF [118,119,120], and Extra Trees [121], have been utilized within federated learning methods, the literature notably lacks references to the REPTree. Renowned for its proficiency in handling noisy data and its interpretability, the REPTree holds promise for providing distinct advantages in federated learning.
It is noteworthy that decision tree aggregation encompasses two primary groups, namely, aggregating decision trees and selecting decision trees, each with distinct methodologies. In the aggregation category, four types are delineated: structure-based, weight-based, logic-based, and dataset-based approaches. Structure-based aggregation involves organizing decision trees hierarchically and then amalgamating different layers, thereby classifying samples within sub-nodes based on this hierarchical structure. Weight-based aggregation involves treating divisions within the tree as sets and aggregating the weight values associated with samples in each set. Logic-based aggregation constructs decision trees as sets of logical rules, subsequently aggregating the logical expressions derived from these rules. Dataset-based aggregation entails fitting the outcomes of multiple decision trees onto a comprehensive dataset. In contrast, selecting decision trees involves iteratively choosing a single tree that optimally encapsulates the information across all the datasets, thereby serving as the global model. This systematic approach to decision tree aggregation and selection facilitates robust modeling across diverse datasets and problem domains [61].
While the REPTree has shown remarkable effectiveness across various machine learning tasks, including those mentioned earlier, its potential within the realm of federated learning and multi-label learning, particularly when combined with the binary relevance approach, remains relatively unexplored. Federated learning, which enables distributed model training across multiple participants while keeping data decentralized, presents a powerful framework for effectively integrating algorithms like the REPTree. Similarly, multi-label learning, which predicts multiple labels for a single instance, could benefit from the proficiency of the REPTree. However, the intersection of these fields with the REPTree has yet to be deeply investigated, an avenue that the current study pursues.

3. Materials and Methods

3.1. Proposed Approach

This paper proposes a federated-learning-based approach that trains models on data distributed across the nodes and learns a global model by aggregating locally trained models. This strategy aims to revolutionize the traditional model of machine learning by decentralizing the training process. Instead of gathering user data into a centralized repository, it implements a distributed approach where each device independently trains a predictive model using locally stored data. The central server aggregates the local models, refining the predictive capabilities of the global model. This technique not only enhances the performance of machine learning applications but also sets a new standard for privacy-preserving machine learning practices in diverse applications and industries.
Federated learning encompasses three primary steps: global model and constraints initialization, local training, and model aggregation. Notably, only the second step belongs to the local participants, while the remaining two are handled on the aggregation server side. Consider synchronized algorithms for federated learning, where a standard round entails the following sequence of steps: Firstly, a subset of clients is selected. Subsequently, each client builds or updates its local model based on its local private data. Then, the local models from these clients are transmitted to the server. Finally, the server aggregates these models to construct an enhanced global model. Hereby, a model resembling a traditionally centralized machine learning model is jointly constructed in an efficient way. Moreover, federated learning offers several notable advantages. Firstly, it enhances data privacy by retaining data on the client, thereby safeguarding sensitive information. Disclosure control mechanisms, such as differential privacy and homomorphic encryption, can be employed to further protect data during the exchange of model updates. Additionally, it enhances efficiency by distributing model training across multiple clients, allowing for parallelized and accelerated learning processes [122].
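As a rough illustration of this round structure, the following Python sketch wires the four steps together with pluggable training and aggregation functions. It is a simplification for exposition only; the study's own implementation was written in C# with the Weka library, and all function names here are our own.

```python
import random

def federated_round(global_model, clients, fit_local, aggregate, fraction=1.0):
    """One synchronous federated learning round (illustrative sketch).

    clients   : list of (X, y) pairs, each held privately by one client
    fit_local : callable that trains a local model on one client's data
    aggregate : callable that merges local models into a new global model
    """
    # Step 1: select a subset of clients for this round
    k = max(1, int(fraction * len(clients)))
    selected = random.sample(clients, k)

    # Step 2: each selected client updates a model on its own private data;
    # the raw data never leaves the client
    local_models = [fit_local(X, y) for X, y in selected]

    # Steps 3-4: only the local models travel to the server, which
    # aggregates them into an enhanced global model
    return aggregate(global_model, local_models)
```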
The federated learning architecture encompasses various approaches tailored to different data distribution scenarios: horizontal federated learning (HFL), vertical federated learning (VFL), and federated transfer learning (FTL). In HFL, local datasets may have the same feature space and different sample spaces. Each node trains a local model using its respective data, and the local models or outputs are then transmitted to a central server. The server aggregates these results and gives a response to the user, facilitating collaborative model training. Conversely, VFL utilizes vertical data partitioning, where the datasets of each client may have the same sample space and different feature spaces. This setup makes it possible to build an accurate model while participants retain their data and models locally, exchanging intermediate computation results with the server. FTL introduces a hybrid approach to data partitioning, characterized by a common sample space and different feature spaces. This setup is particularly useful for scenarios where there is minimal overlap in both data features and data samples among participants. FTL enables knowledge transfer across heterogeneous datasets by leveraging pre-trained models or representations from one domain to enhance learning in another domain, thereby maximizing the utility of disparate data sources [123]. Each federated learning approach offers distinct advantages and is tailored to specific data distribution characteristics, ensuring flexibility and scalability in addressing diverse real-world scenarios while maintaining data privacy and efficiency. In this study, we specifically employed VFL due to its ability to leverage the same sample space with differing target label features, which enriches the information about samples and facilitates the construction of multiple binary classifiers for multiple labels. In other words, this approach ensures that the number of instances for each client is equal, and therefore balanced as well.
In the binary relevance approach, the multi-label problem is decomposed into several binary classification tasks. Here, each label is handled as an independent binary classification task. This means that a separate binary classifier is trained on each client node to predict its presence or absence for a given instance. In other words, the number of client nodes is equal to the number of labels in the dataset. Therefore, label size impacts the addition or removal of client nodes in the final model. Consequently, the output of the binary classifiers is a set of binary predictions, one for each label. In addition to its simplicity, the binary relevance approach offers several advantages. It allows for the utilization of standard binary classifiers, shortens the learning process, and provides interpretability as the prediction of each label is independent of others. However, one potential drawback of the binary relevance approach is that it does not consider the correlations between labels, which could be important in certain applications. While our datasets do not require correlated labels, making this limitation less impactful in our context, it is worth noting for other potential applications. As a solution, the classifier chains method can be employed, which passes label information between classifiers and incorporates label correlations. This approach effectively captures label dependencies and addresses the limitations of the binary relevance method, potentially enhancing performance in scenarios where label correlations are significant.
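A minimal sketch of the binary relevance strategy is shown below, assuming the labels are given as a binary matrix with one column per label. Since REPTree ships with Weka rather than scikit-learn, a scikit-learn decision tree stands in for it here, and the function names are our own.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def binary_relevance_fit(X, Y):
    """Train one independent binary classifier per label (BR strategy).

    X : (N, K) feature matrix
    Y : (N, q) binary label matrix; column j marks the presence of label y_j
    """
    models = []
    for j in range(Y.shape[1]):
        clf = DecisionTreeClassifier(random_state=1)  # stand-in for REPTree
        clf.fit(X, Y[:, j])  # independent binary task: y_j present or absent
        models.append(clf)
    return models

def binary_relevance_predict(models, X):
    """Each classifier votes independently; stack the q binary predictions."""
    return np.column_stack([m.predict(X) for m in models])
```

In FMLL, each of these q binary sub-tasks is placed on its own client node, so the training loop above runs across the federation rather than on a single machine.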
In the proposed system, as shown in Figure 1, a central node collaborates with several local nodes (or clients) as the standard step of federated learning. In the architecture, the method manages instances with multiple labels, such as label 1 to label q, resulting in a multi-label dataset as the input. Initially, preprocessing operations are conducted to clean, manipulate, and prepare the data. Subsequently, dataset decomposition is performed to transform the multi-label dataset into multiple binary datasets, following the binary relevance approach. This decomposition yields datasets 1 to q, where instances possess binary labels—for example, dataset 1 indicates whether label 1 exists or not. These transformed datasets serve as local data on local nodes, acting as local clients within the federated learning framework. In the training phase, the REPTree algorithm is applied to each dataset, generating local models on local nodes—tree 1 corresponds to dataset 1, and so forth. Following this, in the central node, local models are aggregated to create a global model. After that, model evaluation takes place, where its performance is assessed using metrics such as accuracy, precision, recall, and F-score. This step ensures that the collective knowledge from the local models is effectively integrated. The final model in the central node facilitates predictions based on the input query data. This integrated approach offers a comprehensive solution for handling multi-label datasets within a federated learning context, providing scalability and efficiency while maintaining model performance.

3.2. Formal Description

Traditional supervised learning algorithms operate within the framework of single-label scenarios, where each sample in the training set is related to a sole label defining its characteristics. In contrast, multi-label learning algorithms deal with samples in the training set that are concurrently linked to multiple labels. The objective of multi-label learning is to predict the appropriate label set for unseen samples, which may encompass more than one label per sample. Here, the definition of multi-label learning is formally established. Given $D$ as the training set comprising $N$ samples $S_i = (x_i, Y_i)$, where $i = 1, 2, \ldots, N$, each sample $S_i$ is paired with a feature vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})$ having $K$ elements and a subset of labels $Y_i \subseteq L$, where $L = \{y_j \mid j = 1, \ldots, q\}$ represents the set of $q$ probable labels. This representation is depicted in Table 2. In this context, the objective of a multi-label learning algorithm is to construct a global model $G$ that, given an unlabeled instance $S = (x, ?)$, precisely predicts its subset of labels $Y$, denoted as $G(S) \rightarrow Y$, where $Y$ represents the labels associated with the sample $S$.
Table 2 illustrates a multi-label dataset where each sample $S$ is associated with a subset of labels denoted by $Y$. For instance, $S_1$ is associated with the label set $Y_1$ containing $y_2$ and $y_4$, indicating that this instance possesses both labels $y_2$ and $y_4$. It is noteworthy that the outputs from all classifiers are combined with the concatenate operator. Here, the label set $Y_1$ includes the concatenation of both labels $y_2$ and $y_4$. Similarly, the sample $S_2$ belongs to the $y_1$, $y_3$, and $y_4$ classes simultaneously, given with a concatenate operator. These representations showcase the multi-label nature of the dataset, where instances may have multiple associated labels simultaneously.
The binary relevance method represents a problem transformation approach that breaks down a multi-label classification task into multiple single-label binary classification problems, each corresponding to one of the $q$ labels in the set $L = \{y_1, y_2, \ldots, y_q\}$. Primarily, this method converts the initial multi-label training dataset into $q$ binary datasets $D_{y_j}$, $j = 1, 2, \ldots, q$, where $D_{y_j}$ encompasses all samples from the initial dataset but with a singular positive or negative label attributed to the label $y_j$ based on the true label subset related to each sample. In essence, a label is considered positive if it is included in the label set containing $y_j$; if not, it is considered negative. Following this transformation of the multi-label data, a collection of $q$ binary classification models $M_j$, where $j = 1, 2, \ldots, q$, is then developed using the respective datasets $D_{y_j}$. Finally, the local $q$ models are aggregated to create the global model $G$, as indicated by Equation (1):
$$G = \left\{ M_j(x, y_j),\ y_j \in \{0, 1\} \;\middle|\; y_j \in L,\ j = 1, \ldots, q \right\} \quad (1)$$
To elucidate the fundamental concept of the binary relevance transformation procedure, Table 3 showcases the four binary datasets formed by transforming the multi-label dataset depicted in the preceding Table 2. In this context, the class attribute can take on two potential values: “present”, denoted as $y_j$, or “not present”, represented as $\neg y_j$. Each row in Table 3 corresponds to a sample $(S_1, S_2, \ldots, S_N)$ from the original dataset, while each target column represents a distinct label $(y_1, y_2, y_3, y_4)$. Through this transformation, the binary datasets are constructed by discerning the presence or absence of individual labels for each sample. For instance, the positive indicators ($y_j$) signify the presence of a label, while negative indicators ($\neg y_j$) indicate its absence. By comparing Table 3 with Table 2, it becomes evident how the labels associated with each example are encoded into binary attributes, simplifying the classification task. For instance, $S_2$ in Table 2 is associated with $y_1$, $y_3$, and $y_4$, which is reflected in Table 3 by the presence of $y_1$, $y_3$, and $y_4$, respectively, and the absence marker $\neg y_2$. This transformation facilitates the utilization of conventional binary classification algorithms to handle multi-label classification tasks more effectively.
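The following toy script mirrors this Table 2 to Table 3 transformation on the two samples discussed above ($S_1$ with $\{y_2, y_4\}$ and $S_2$ with $\{y_1, y_3, y_4\}$); the feature values are made-up placeholders and the function name is our own.

```python
import numpy as np

def binary_relevance_datasets(X, Y):
    """Decompose a multi-label dataset into q binary datasets (Table 2 -> Table 3).

    X : (N, K) feature matrix
    Y : list of N label subsets, e.g. Y[0] = {2, 4} means sample 1 has y_2 and y_4
    """
    labels = sorted(set().union(*Y))
    datasets = {}
    for j in labels:
        # 1 if label y_j is "present" for the sample, 0 if "not present"
        t_j = np.array([1 if j in Y_i else 0 for Y_i in Y])
        datasets[j] = (X, t_j)  # D_{y_j}: same features, binary target
    return datasets

X = np.array([[0.1, 0.2], [0.3, 0.4]])  # placeholder features for S_1, S_2
Y = [{2, 4}, {1, 3, 4}]                 # label sets from Table 2
for j, (_, t) in binary_relevance_datasets(X, Y).items():
    print(f"D_y{j}: targets {t}")       # e.g. D_y2: targets [1 0]
```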
The Binary Relevance (BR) method is employed to classify new multi-label samples by aggregating the labels positively identified by the independent binary classifiers. An inherent advantage of the BR approach lies in its low computational complexity relative to other multi-label methods. Specifically, for a fixed number of samples, the scalability of BR is directly proportional to the size $q$ of the label set $L$. Given that the complexity of the base classifiers is constrained to $O(C)$, the overall complexity of BR becomes $q \times O(C)$. As a result, the BR method proves to be particularly suitable for scenarios where the value of $q$ is not excessively large. However, given the prevalence of numerous labels across various domains, alternative methods, such as divide-and-conquer approaches, have emerged to organize labels into a tree-shaped hierarchy, allowing for the management of a substantially smaller set of labels in comparison with $q$.
Algorithm 1 is devised to address the Federated Multi-Label Learning (FMLL) method through a structured approach divided into two main phases. The client learning process begins with data preparation, given the dataset $D$ comprising $N$ instances represented as $(x_i, Y_i)$, where $x_i$ denotes the feature vector and $Y_i$ represents the associated labels, along with $q$ as the number of nodes (or the number of class labels) and $M_j$ as the local models for each label. The dataset $D$ is partitioned into $q$ binary datasets based on the presence of each class label $y_j$. Each node generates local datasets $D_{y_j}$, marking instances as 1 if $y_j$ is present in $Y_i$ and 0 otherwise, and stores them locally. Subsequently, in local model training, each node independently trains local models $M_j$ using the REPTree algorithm on their respective binary datasets $D_{y_j}$. These trained models $M_j$ are then transmitted to the central server for further processing. The server aggregation process integrates the received local models $M_j$ to construct a unified global model $G$ through the model aggregation approach. The central server combines these models to form $G$, representing a comprehensive synthesis of knowledge from all nodes. Using this global model, the algorithm performs classification tasks on the test set $T$. For each instance $x$ in $T$, predictions are made by aggregating outputs from all local models, resulting in the final predicted label set $\hat{Y}$. Thus, the algorithm provides a systematic approach to federated multi-label learning by incorporating distinct client learning and server aggregation processes. This structured methodology ensures robustness and reproducibility in handling distributed datasets and synthesizing global models, essential for effective multi-label prediction across decentralized environments.
Algorithm 1: Federated Multi-Label Learning (FMLL)
1. Client Learning Process
Inputs:
   $D$: dataset $D = \{(x_i, Y_i)\}_{i=1}^{N}$
   $q$: number of nodes (equal to the number of class labels)
Outputs:
   $M_j$: local models, one for each label
1.1. Data Preparation
Begin
  for $j = 1$ to $q$
    foreach $(x_i, Y_i)$ in $D$        // Generate binary datasets
      if ($y_j \in Y_i$)
        $D_{y_j}$.Add($x_i$, 1)
      else
        $D_{y_j}$.Add($x_i$, 0)
      endif
    end foreach
    Store($D_{y_j}$)                   // Store local data at node $j$
  end for
1.2. Local Model Training
  for $j = 1$ to $q$
    $M_j$ = REPTree($D_{y_j}$)         // Train local models at each node in parallel
    Send($M_j$)                        // Send $M_j$ to the central server
  end for
End
2. Server Aggregation Process
Inputs:
   $M_j$: local models from each client, one for each class label
   $q$: number of nodes (equal to the number of class labels)
   $T$: test set to be predicted
Outputs:
   $G$: global model
   $\hat{Y}$: predicted label sets for the test set
2.1. Model Aggregation
Begin
  $G = \emptyset$
  for $j = 1$ to $q$
    Receive($M_j$)                     // Receive local model $M_j$ from each client
    $G = G \cup M_j$                   // Aggregate the models to form the global model $G$
  end for
2.2. Classification
  foreach $x$ in $T$
    for $j = 1$ to $q$
      $y_j = M_j(x)$                   // Predict label $y_j$ with component $M_j$ of $G$
      $\hat{Y} = \hat{Y} \cup \{y_j\}$
    end for
  end foreach
End
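For readers who prefer running code to pseudocode, the sketch below simulates both phases of Algorithm 1 on a single machine. It is an approximation under stated assumptions: the labels arrive as a binary matrix, scikit-learn's decision tree substitutes for Weka's REPTree, and the send/receive steps collapse into ordinary function calls.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# --- 1. Client learning process ---------------------------------------
def client_learning(X, Y):
    """Each node j builds the binary dataset D_{y_j} and trains a local model M_j."""
    q = Y.shape[1]                                    # one node per class label
    local_models = []
    for j in range(q):
        X_j, t_j = X, Y[:, j]                         # 1.1 data preparation
        M_j = DecisionTreeClassifier(random_state=1)  # REPTree stand-in
        M_j.fit(X_j, t_j)                             # 1.2 local model training
        local_models.append(M_j)                      # "Send" M_j to the server
    return local_models

# --- 2. Server aggregation process ------------------------------------
def server_aggregation(local_models):
    """Form the global model G as the union of the received local models."""
    G = []
    for M_j in local_models:                          # "Receive" each M_j
        G.append(M_j)                                 # G = G ∪ M_j
    return G

def classify(G, T):
    """Predict the full label set for every instance in the test set T."""
    return np.column_stack([M_j.predict(T) for M_j in G])
```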

4. Experimental Studies

4.1. Dataset Description

The study of animals in their natural habitats is fundamental to our understanding of ecological dynamics, biodiversity conservation, and species management. Animal behavior, physiology, and interactions with their environment provide invaluable insights into the functioning of ecosystems and the intricate balance of life on our planet. In this paper, we harness the richness of animal-related datasets to evaluate the efficacy of our proposed Federated Multi-Label Learning (FMLL) method within the vibrant field of animal research. Table 4 provides a summarized overview of these datasets utilized in the current study. In this table, the respective number of classes is represented for each label in the datasets.

4.1.1. Amphibians

The Amphibians Habitat Classification dataset, briefly presented in Table 5, was collected from a combination of geographic information systems (GIS), satellite imagery, and field inventories conducted as part of environmental impact assessments (EIAs) for two planned road projects in Poland, Road A and Road B [124]. Amphibians, as crucial indicators of environmental health and ecosystem integrity due to their sensitivity to environmental changes, play a vital role in assessing the impact of infrastructure projects on biodiversity, particularly within their habitat. Integrating GIS and satellite information with data collected from natural inventories, field research for Road A was conducted within a 500-m-wide strip on both sides of the proposed project area, identifying 80 amphibian breeding sites. The Road B inventory focused on the vicinity of two variants of the planned Beskidy Integration Way, covering approximately 60 km, and identified 109 amphibian occurrence sites through map analysis, field observations, a literature review, and archive data analysis. The dataset comprises multiple variables, contributing to a comprehensive understanding of amphibian habitats within the realm of biology. It was primarily generated for classification tasks, capturing diverse environmental characteristics relevant to amphibian habitat suitability.
This multivariate dataset, with 189 samples and 23 features, provides valuable insights into the ecological implications of road infrastructure development on amphibian populations, facilitating biodiversity conservation and informed decision-making in environmental management. The classification task is to predict the presence of seven different amphibians, namely green frogs, brown frogs, common toads, fire-bellied toads, tree frogs, common newts, and great crested newts, corresponding to labels one to seven, respectively. The dataset encompasses three distinct numerical features, as detailed in Table 6, showcasing statistical attributes such as minimum, mean, maximum, mode, and standard deviation. Additionally, Table 7 comprehensively explains all features, providing deeper insight into the instances collected.

4.1.2. Anuran-Calls-(MFCCs)

The Anuran-Calls-(MFCCs) dataset [125] comprises acoustic features extracted from syllables of anuran (frogs) calls, accompanied by multi-label annotations indicating their family, genus, and species, as represented in Table 8. With a total of 7195 instances, this multivariate dataset has been extensively utilized in various classification and clustering tasks, particularly within the realm of biology. Furthermore, the dataset incorporates 22 separate numerical features, elaborated in Table 9, and highlights their statistical characteristics, including maximum, minimum, mean, mode, and standard deviation. Its completeness and reliability are attributed to the absence of missing values, markedly enhancing its suitability for such analytical endeavors.
The Anuran-Calls-(MFCCs) dataset originates from the segmentation of 60 audio recordings spanning four distinct families, eight genera, and ten species of anuran frogs. Each audio recording corresponds to a single specimen, with an additional record ID column included for reference. The distribution of instances for each family, genus, and species class is given in Table 10. The recordings were conducted in situ under real noise conditions, capturing the natural background sounds, thereby offering a diverse representation of anuran habitats, including locations such as the campus of the Federal University of Amazonas in Manaus, the Mata Atlantic region in Brazil, and even one location in Córdoba, Argentina. Recorded in WAV format at a sampling frequency of 44.1 kHz and a 32-bit resolution, the dataset enables signal analysis up to 22 kHz. The feature extraction process involved calculating 22 Mel-Frequency Cepstral Coefficients (MFCCs) for each syllable, employing 44 triangular filters. These coefficients are subsequently normalized within the range of −1 to 1 and are statistically discussed in Table 9.
The Anuran-Calls-(MFCCs) dataset, with its rich acoustic features and multi-label annotations, is a valuable asset for advancing research in anuran species recognition and related fields. Anurans play crucial roles in ecosystems worldwide, serving as indicators of ecosystem health and biodiversity. They regulate populations of insects and other invertebrates, maintaining ecological balance within animal food webs. Additionally, their skin contains bioactive compounds with potential pharmaceutical applications, contributing to medical research. However, anuran species are threatened by habitat destruction, pollution, and climate change, requiring robust analysis and conservation efforts. Furthermore, they are important for education and outreach initiatives, promoting public awareness of ecology, biodiversity, and conservation.

4.1.3. HackerEarth-Adopt-A-Buddy

The HackerEarth-Adopt-A-Buddy dataset [126], introduced in Table 11, served a noble purpose in facilitating the creation of a virtual tour experience for an esteemed pet adoption agency amidst the pandemic. As the pandemic saw a surge in animal adoption and fostering, this initiative aimed to keep potential pet owners engaged indoors by virtually presenting animals accessible for adoption. To support this endeavor, machine learning methods can be developed to determine the type and breed of animals based on their physical attributes and other pertinent factors. The description of all features in the HackerEarth-Adopt-A-Buddy dataset is summarized in Table 12. The dataset provides a comprehensive foundation for predictive model development and evaluation, with 18,834 entries in the training dataset. Moreover, within the dataset, there are four distinct numerical features outlined in Table 13, presenting their statistical attributes such as minimum, maximum, mean, mode, and standard deviation.
This dataset presents an opportunity for multi-label classification as a fundamental aspect of machine learning. By utilizing the provided data and employing machine learning techniques, researchers are tasked with constructing a predictive model capable of accurately discerning both the breed category and pet category based on factors such as animal condition, appearance, and other relevant attributes. This dataset contributes to the important cause of promoting pet adoption and fostering.
Pets serve a crucial role in animal science, offering researchers invaluable insights into various aspects of behavior, physiology, and health. Beyond companionship, they provide real-life settings for studying topics such as animal nutrition, genetics, psychology, and disease management. Moreover, pets serve as models for understanding human–animal interactions, leading to advancements in veterinary medicine and animal welfare. Studying pets yields insights that benefit both human and animal well-being, making them indispensable in the field. Additionally, pet adoption holds significant importance in animal science, extending beyond providing loving homes for animals in need. It serves as a vital avenue for research and education within the discipline. Researchers gain valuable insights into behavior, health, and welfare by studying adopted animals in diverse environments. The diversity among adopted animals allows for the exploration of genetic variations and their impacts on traits and diseases, contributing to veterinary medicine and animal breeding practices. Furthermore, the adoption process fosters public awareness and appreciation for animal welfare issues, promoting responsible pet ownership and ethical treatment. Embracing pet adoption not only enriches individual lives but also advances our understanding and care of the animal kingdom through the analysis of related datasets.

4.2. Results

The primary objective of this study is to introduce an innovative method termed Federated Multi-Label Learning (FMLL) designed specifically for classification tasks. By integrating insights from well-established methodologies such as Binary Relevance and the Reduced-Error Pruning Tree (REPTree) approaches, our framework seeks to synergize the advantages of federated learning and multi-label concepts. This integration is aimed at tackling the complexities associated with multi-label classification issues. The efficacy of the FMLL method was validated using dedicated multi-label datasets, including Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy. Our approach was implemented in the C# programming language utilizing the Weka library [127]. The source codes of both FMLL and REPTree methods are publicly available in the GitHub archive (https://github.com/BitaGhasemkhani/Federated-Multi-Label-Learning-FMLL, accessed on 28 June 2024), ensuring reproducibility.
The REPTree classifier was used in our experiments, with hyperparameters set to their default values, e.g., batchSize (100), debug (False), doNotCheckCapabilities (False), initialCount (0.0), maxDepth (−1), minNum (2.0), minVarianceProp (0.001), noPruning (False), numDecimalPlaces (2), numFolds (3), seed (1), and spreadInitialCount (False). The experiments were conducted on standard machines (i.e., Intel(R) Core(TM) i5, 1.80 GHz, 4.00 GB RAM). Also, we employed the 10-fold cross-validation method during experimentation to train and assess the classification models. This method involves randomly dividing the dataset into ten sets, reserving one set for testing while the remaining nine sets serve as the training set. The evaluation process was iterated ten times, and the average classification accuracy was computed.
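A sketch of the same evaluation protocol, for readers working outside Weka, is given below. scikit-learn has no REPTree, so a default decision tree with a fixed seed serves as a rough stand-in; only the 10-fold protocol and the seed of 1 mirror the settings stated above, and the function name is our own.

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def ten_fold_accuracy(X, y_binary):
    """10-fold cross-validation for one binary-relevance sub-task.

    The dataset is split into ten folds; each fold serves once as the
    test set while the other nine form the training set, and the ten
    accuracies are averaged, as described in the text.
    """
    model = DecisionTreeClassifier(random_state=1)  # REPTree stand-in
    folds = KFold(n_splits=10, shuffle=True, random_state=1)
    scores = cross_val_score(model, X, y_binary, cv=folds, scoring="accuracy")
    return scores.mean()
```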
Furthermore, we employed a range of metrics to evaluate the performance of the proposed FMLL method, including accuracy (ACC), precision (PR), recall (TPR), F-score (FS), and true negative rate (TNR) as delineated in Equations (2) to (6). Moreover, we used the receiver operating characteristic (ROC) curve to assess the trade-off between the true positive rate (TPR) from Equation (4) and the false positive rate (FPR) from Equation (7). Additionally, the precision–recall curve (PRC) was utilized to evaluate the balance between precision and recall.
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$
$$\mathrm{PR} = \frac{TP}{TP + FP} \quad (3)$$
$$\mathrm{TPR} = \frac{TP}{TP + FN} \quad (4)$$
$$\mathrm{FS} = \frac{2 \, TP}{2 \, TP + FP + FN} \quad (5)$$
$$\mathrm{TNR} = \frac{TN}{TN + FP} \quad (6)$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad (7)$$
In this context:
  • True Positive (TP) signifies the count of correctly predicted positive classes by the classifier.
  • True Negative (TN) represents the count of accurately predicted negative classes by the classifier.
  • False Positive (FP) denotes the count of erroneously predicted positive classes by the classifier.
  • False Negative (FN) indicates the count of erroneously predicted negative classes by the classifier.
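In code form, Equations (2)–(7) are direct ratios over these four counts; the helper below (our own naming) computes them for one binary-relevance sub-task:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Evaluation metrics of Equations (2)-(7) from raw confusion counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # Eq. (2) accuracy
        "PR":  tp / (tp + fp),                   # Eq. (3) precision
        "TPR": tp / (tp + fn),                   # Eq. (4) recall / true positive rate
        "FS":  2 * tp / (2 * tp + fp + fn),      # Eq. (5) F-score
        "TNR": tn / (tn + fp),                   # Eq. (6) true negative rate
        "FPR": fp / (fp + tn),                   # Eq. (7) false positive rate
    }
```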
The application of Federated Multi-Label Learning (FMLL) to the Amphibians dataset yielded compelling results, as shown in Table 14, achieving an average accuracy of 73.24%. Precision scores ranged from 0.613 to 0.790, while recall scores varied from 0.656 to 0.884, demonstrating FMLL’s effectiveness in accurately classifying various amphibian species. Moreover, the F-score, ranging from 0.619 to 0.834, underscored the method’s capability to manage dataset complexities while maintaining a balanced performance between precision and recall. The ROC curve results, spanning from 0.503 to 0.715, highlighted variable performance in class differentiation, whereas the PRC values, ranging from 0.603 to 0.818, provided valuable insights into precision–recall trade-offs across different thresholds. Additionally, the TNR scores between 0.655 and 0.884 indicated the method’s reliability in correctly identifying negative instances. Remarkably, the “great crested newt” amphibian emerged as the top performer across all the metrics, except ROC.
Regarding the Anuran-Calls-(MFCCs) dataset, FMLL showcased exceptional performance, as represented in Table 15, boasting an average accuracy of 94.50%. Precision scores consistently surpassed 0.935 for the family, genus, and species categories, demonstrating FMLL’s precision in classifying different levels of anuran calls. Additionally, recall scores ranged from 0.936 to 0.957, underscoring the method’s success in retrieving relevant instances for each category. The F-score, averaging 0.944, further validated FMLL’s effectiveness in handling multi-label classification tasks with high accuracy and reliability. Outstandingly, the “family” category of anuran calls excelled in all metrics, achieving an accuracy of 95.75%, with precision, TNR, ROC, PRC, recall, and F-score all reaching above 0.957. Moreover, TNR scores across all categories were considerably high, ranging from 0.980 to 0.992, indicating FMLL’s ability to accurately identify negative instances. The ROC curve values, ranging from 0.978 to 0.983, illustrated strong performance in distinguishing between classes, while PRC values, ranging from 0.935 to 0.964, offered a detailed analysis of precision–recall dynamics across varying thresholds.
FMLL demonstrated remarkable performance on the HackerEarth-Adopt-A-Buddy dataset, as shown in Table 16, accurately predicting breed and pet categories with an average accuracy of 86.12%. According to the results, the “pet_category” exhibited slightly superior performance compared to the “breed_category” across all the metrics, except ROC and PRC. Also, the precision, TNR, ROC, PRC, recall, and F-score metrics presented high average values of 0.863, 0.928, 0.956, 0.933, 0.861, and 0.858, respectively. Furthermore, the ROC values for both categories demonstrated strong discrimination between classes, with values of 0.965 for the breed and 0.946 for the pet categories. Likewise, the PRC values, at 0.938 for the breed and 0.928 for the pet categories, provided detailed insights into the model’s precision–recall dynamics. FMLL reaffirmed its robustness in handling complex multi-label classification tasks across different datasets.
As evidenced by Table 14, the FMLL method achieved its highest accuracy (88.36%) on the “great crested newt” species among all the considered labels. To elucidate the decision-making process underlying this performance, the FMLL method employed a REPTree classifier, generating a structured tree representation as shown in Figure 2. This REPTree structure prominently featured attributes such as the type of water reservoirs (TR), surroundings 3 (SUR3), presence of fishing (FR), number of water reservoirs (NR), and vegetation presence (VR) as pivotal nodes. The hierarchical arrangement facilitated a detailed comprehension of feature interactions and their impact on species classification. This illustrative tree not only aids in interpreting model decisions but also underscores the importance of feature selection and attribute significance in FMLL-based classification tasks.
To elaborate further on Figure 2, the root node, labeled TR, represents the most significant attribute for splitting the data, with branches indicating different values of TR. Internal nodes such as SUR3, FR, NR, and VR are decision points where data are further split based on specific attribute values. Each leaf node provides the final classification outcome and contains two sets of numbers: (a/b) and [c/d]. Here, a represents the total number of instances reaching the leaf, b indicates the number of misclassified instances, c denotes the number of instances of the majority class, and d shows the number of instances of the minority class. For example, the leaf node 0 (10/4) [5/1] under SUR3 = 1 and TR = 1 indicates that out of 10 instances, 4 were misclassified, with 5 instances in the majority class and 1 in the minority class. Misclassified instances highlight areas where the model’s predictions do not align with the actual data, aiding in assessing model accuracy. Subtree analysis under nodes like FR = 6 shows further splits based on values of NR, leading to various leaves with their respective instance distributions. To achieve optimal accuracy, parameters such as the minimum number of instances per leaf were fine-tuned in Weka, ensuring the model balances complexity and generalization. This interpretation of the REPTree figure enhances our understanding of the model’s performance and the patterns in the data.

5. Discussion

In this section, we compare our proposed method with current state-of-the-art techniques [124,125,128] in the field. Our analysis covers several dimensions: the accuracy metric on the Amphibians dataset, and the precision, recall, and F-score evaluation metrics on the Anuran-Calls-(MFCCs) dataset, juxtaposed with state-of-the-art methods in Table 17 and Table 18, respectively.
As shown in Table 17, our approach achieved a remarkable 10.92% improvement on average regarding the Amphibians dataset, outperforming the state-of-the-art methods [124,128]. This improvement can be attributed to the combination of FMLL with BR and the REPTree. While the gradient-boosted tree (GBT), random forest (RF), AdaBoost (ADA), decision tree (DT), and partially monotonic decision tree (PMDT) approaches attained moderate accuracy rates ranging from 57.54% to 71.50%, the proposed method surpassed all these state-of-the-art techniques with the highest accuracy rate of 73.24%. These outcomes highlight the superior performance of FMLL in accurately classifying instances within the multi-label Amphibians dataset.
Table 18 presents a comprehensive comparison of precision, recall, and F-score metrics for various methods using the Anuran-Calls-(MFCCs) dataset, categorized into different taxonomic levels, including species, family, genus, and their combination. At the species level, the FMLL method outperformed all others [125] with precision, recall, and F-score scores of 0.935, 0.936, and 0.935, respectively. The previous methods, e.g., KNN-Flat, RBF-SVM-Flat, Polynomial-SVM-Flat, and Tree-Flat [125], displayed precision scores ranging from 0.470 to 0.850, recall scores ranging from 0.500 to 0.760, and F-scores ranging from 0.490 to 0.740. At the family level, FMLL again revealed superior performance, boasting precision, recall, and F-scores of 0.957 each, outperforming the baseline method, KNN-LCPL. Similarly, at the genus level, FMLL exhibited substantial enhancements over its counterpart, achieving precision, recall, and F-scores of 0.941, 0.942, and 0.941, respectively. Across all taxonomic levels, our method consistently outperformed KNN-LCPL, showcasing precision, recall, and F-scores of 0.944, 0.945, and 0.944, respectively. It is notable that the FMLL method attained substantial improvements across various taxonomic levels when compared to state-of-the-art peers. Specifically, at the species taxonomic level, FMLL demonstrated improvements of 25.1% in precision, 30.1% in recall, and 28.4% in F-score metrics. Moving to the family taxonomic level, the method presented improvements of 24.4%, 13.7%, and 19.4% in precision, recall, and F-score metrics, respectively. Similarly, at the genus taxonomic level, FMLL achieved improvements of 27.8%, 21.1%, and 24.6% in precision, recall, and F-score metrics. Finally, when considering the combination of species, family, and genus taxonomic levels, FMLL exhibited improvements of 25.5%, 18.8%, and 22.3% in precision, recall, and F-score metrics. These results underscore the effectiveness of the FMLL method across multiple taxonomic levels, demonstrating substantial improvements over baseline methods in terms of precision, recall, and F-score metrics.
The results of the existing KNN-LCPL method and the proposed FMLL method given in Table 18 were further evaluated using the Mann–Whitney U and Quade tests. Both are non-parametric statistical tests that are well suited to comparing the performance of algorithms, making them appropriate for our analysis. The p-values obtained from the Mann–Whitney U and Quade tests are 0.02107 and 0.03047, respectively, both below the significance level of 0.05 (α = 0.05). The likelihood of these results occurring by random chance is therefore minimal, allowing us to reject the null hypothesis of no difference in performance between the methods. These statistical tests thus provide strong evidence that the proposed FMLL method significantly outperformed the KNN-LCPL method, and the small p-values underscore that the performance differences between the two methods are substantial and reliable.
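For readers who wish to run a similar comparison, the SciPy sketch below applies the Mann–Whitney U test to the twelve KNN-LCPL and FMLL scores listed in Table 18. The authors' exact test inputs are not specified in the text, so this grouping is illustrative only; the Quade test is omitted because SciPy's stats module does not provide it.

```python
from scipy.stats import mannwhitneyu

# Precision, recall, and F-score of both methods at the species, family,
# genus, and combined levels, taken from Table 18 (illustrative grouping).
knn_lcpl = [0.691, 0.719, 0.705, 0.713, 0.820, 0.763,
            0.663, 0.731, 0.695, 0.689, 0.757, 0.721]
fmll     = [0.935, 0.936, 0.935, 0.957, 0.957, 0.957,
            0.941, 0.942, 0.941, 0.944, 0.945, 0.944]

stat, p = mannwhitneyu(fmll, knn_lcpl, alternative="two-sided")
print(f"U = {stat}, p = {p:.2e}")  # p < 0.05 -> reject the null hypothesis
```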

6. Conclusions and Future Work

In summary, this study introduces Federated Multi-Label Learning (FMLL) as a groundbreaking approach in animal science classification to address the challenges posed by distributed data. By blending federated learning principles with multi-label learning techniques, FMLL offers a method for handling classification tasks where instances may belong to multiple classes simultaneously. Utilizing the Binary Relevance (BR) strategy and adopting the Reduced-Error Pruning Tree (REPTree) classifier within the federated learning framework, FMLL demonstrated robust performance and showcased significant improvements (above 10%) in classification accuracy across diverse animal species datasets. Empirical validation on three distinct datasets—Amphibians, Anuran-Calls-(MFCCs), and HackerEarth-Adopt-A-Buddy—underscored the effectiveness of FMLL in real-world scenarios. Notably, the classification accuracy reached 94.50% for the Anuran-Calls-(MFCCs) dataset and 86.12% for the HackerEarth-Adopt-A-Buddy dataset, highlighting the robustness and practical relevance of FMLL across various taxonomic levels and its potential for applications in diverse domains. Having explored the advancements and contributions of the current research, we draw the following conclusions, which highlight the significant impacts of the proposed method on the field of animal studies:
(i) Introduction of FMLL (with BR and REPTree) in animal science classification as a novel approach, applicable to diverse real-world scenarios.
(ii) Providing the distribution of computational cost over several clients and ensuring data security with FMLL to preserve privacy in collaborative learning environments.
(iii) Effective handling of multi-label data within the FMLL framework using the BR strategy.
(iv) Pioneering use of the REPTree classifier in federated learning, enhancing accuracy and interpretability.
(v) Empirical validation of FMLL on various animal-based datasets, demonstrating its reliable applicability and efficacy in the field.
(vi) The superiority of FMLL in multi-label classification tasks, evidenced by higher accuracy, precision, recall, and F-score metrics compared to state-of-the-art methods.
(vii) The practical relevance of FMLL across taxonomic levels, showcasing its reliability in addressing multi-label classification problems within the context of animal research.
Looking ahead, several avenues emerge for further exploration of FMLL. Firstly, developing a web application that provides an interface to access the FMLL-based machine-learning model could be useful for animal scientists in decision-making. Additionally, extending FMLL to accommodate dynamic datasets collected by IoT devices, along with integrating mechanisms for model updating, could bolster its adaptability and long-term performance. Exploring alternative multi-label learning methodologies, such as classifier chains, would address the current limitation of binary relevance by incorporating label correlations. Moreover, ensemble learning techniques could be further integrated with FMLL by combining predictions from multiple models. Further exploration of deep learning architectures within the FMLL framework presents an opportunity to uncover profound insights into complex patterns inherent in animal science data. By focusing on these research directions, we aspire to propel the field of federated multi-label learning forward and advance its applications in animal science classification tasks.

Author Contributions

Conceptualization, B.G., O.V. and Y.D.; methodology, B.G., S.U. and K.U.B.; software, B.G. and D.B.; validation, B.G.; formal analysis, B.G.; investigation, B.G., O.V., Y.D., S.U. and K.U.B.; resources, B.G. and D.B.; data curation, O.V., Y.D., S.U. and K.U.B.; writing—original draft preparation, B.G.; writing—review and editing, O.V., Y.D., S.U., K.U.B. and D.B.; visualization, B.G. and D.B.; supervision, D.B.; project administration, D.B.; funding acquisition, O.V., Y.D., S.U. and K.U.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The “Amphibians” dataset [124] is publicly available in the UCI (University of California Irvine) machine learning repository (https://archive.ics.uci.edu/dataset/528/amphibians, accessed on 22 April 2024). The “Anuran-Calls-(MFCCs)” dataset [125] is publicly available in the UCI machine learning repository (https://archive.ics.uci.edu/dataset/406/anuran+calls+mfccs, accessed on 22 April 2024). The “HackerEarth-Adopt-A-Buddy” dataset [126] is publicly available in the Kaggle machine learning repository (https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption, accessed on 22 April 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this paper.
AI: Artificial intelligence
ANN: Artificial neural network
ADA: AdaBoost
AMFSC: Amendable multi-function sensor control method
AR: Additive regression
BR: Binary relevance
CC: Classifier chains
CNN: Convolutional neural network
CPPS: Cyber-physical production system
DAGs: Dual attention gates
DeepFedWT: Federated deep learning framework
DPLA: Differential privacy Laplace mechanism
DT: Decision tree
ECC: Ensemble of classifier chains
EIAs: Environmental impact assessments
ETo: Estimation of the evapotranspiration
FCLOpt: Federated contrastive learning optimization
FedAAR: Federated learning framework for animal activity recognition
FedAvg: Federated averaging
FELIDS: Federated learning-based intrusion detection system
FL: Federated learning
FMLL: Federated multi-label learning
FPR: False positive rate
FTL: Federated transfer learning
GBT: Gradient-boosted tree
GIS: Geographic information system
GNN: Graph neural network
HFL: Horizontal federated learning
IDS: Intrusion detection system
IIoT: Industrial Internet of things
IoT: Internet of things
KNN: K-nearest neighbors
LCPL: Local classifier per level
LCPN: Local classifier per node
LP: Label powerset
LR: Logistic regression
LSM: Landslide susceptibility map
LSTM: Long short-term memory
MAE: Mean absolute error
MetaMIML: Meta-learning-based multi-instance multi-label learning
ML: Machine learning
MLC: Multi-label classification
ML-CookGAN: Multi-label generative adversarial network
ML-kNN: Multi-label k-nearest neighbors
MMVFL: Multi-participant multi-class vertical federated learning
MSRAN: Multi-scale residual attention network
PC: Pairwise coupling
PMDT: Partially monotonic decision tree
PRC: Precision-recall curve
RAkEL: Random k-labelsets
RBF: Radial basis function
RC: Random committee
RD: Regression by discretization
REPTree: Reduced-error pruning tree
RF: Random forests
RMSE: Root mean square error
RNN: Recurrent neural network
ROC: Receiver operating characteristic
RTF-REPTree: Rotational forest and reduced-error pruning trees
SCADA: Supervisory control and data acquisition
SSA: Sparrow search algorithm
SVM: Support vector machines
TNR: True negative rate
TPR: True positive rate
VFL: Vertical federated learning

References

  1. Li, R.; Gao, L.; Wu, G.; Dong, J. Multiple Marine Algae Identification Based on Three-Dimensional Fluorescence Spectroscopy and Multi-Label Convolutional Neural Network. Spectrochim. Acta Part A 2024, 311, 123938. [Google Scholar] [CrossRef]
  2. Swaminathan, B.; Jagadeesh, M.; Vairavasundaram, S. Multi-Label Classification for Acoustic Bird Species Detection Using Transfer Learning Approach. Ecol. Inf. 2024, 80, 102471. [Google Scholar] [CrossRef]
  3. Celniak, W.; Wodziński, M.; Jurgas, A.; Burti, S.; Zotti, A.; Atzori, M.; Müller, H.; Banzato, T. Improving the Classification of Veterinary Thoracic Radiographs through Inter-Species and Inter-Pathology Self-Supervised Pre-Training of Deep Learning Models. Sci. Rep. 2023, 13, 19518. [Google Scholar] [CrossRef]
  4. Ahsan, M.M.; Alam, T.E.; Haque, M.A.; Ali, M.S.; Rifat, R.H.; Nafi, A.A.N.; Hossain, M.M.; Islam, M.K. Enhancing Monkeypox Diagnosis and Explanation through Modified Transfer Learning, Vision Transformers, and Federated Learning. Inf. Med. Unlocked 2024, 45, 101449. [Google Scholar] [CrossRef]
  5. van Schaik, G.; Hostens, M.; Faverjon, C.; Jensen, D.B.; Kristensen, A.R.; Ezanno, P.; Frössling, J.; Dórea, F.; Jensen, B.-B.; Carmo, L.P.; et al. The DECIDE Project: From Surveillance Data to Decision-Support for Farmers and Veterinarians. Open Res. Eur. 2023, 3, 82. [Google Scholar] [CrossRef]
  6. Shah, K.; Kanani, S.; Patel, S.; Devani, M.; Tanwar, S.; Verma, A.; Sharma, R. Blockchain-Based Object Detection Scheme Using Federated Learning. Secur. Priv. 2022, 6, e276. [Google Scholar] [CrossRef]
  7. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  8. Ogundokun, R.O.; Misra, S.; Maskeliunas, R.; Damasevicius, R. A review on federated learning and machine learning approaches: Categorization, application areas, and blockchain technology. Information 2022, 13, 263. [Google Scholar] [CrossRef]
  9. Abreha, H.G.; Hayajneh, M.; Serhani, M.A. Federated learning in edge computing: A systematic survey. Sensors 2022, 22, 450. [Google Scholar] [CrossRef]
  10. Shaheen, M.; Farooq, M.S.; Umer, T.; Kim, B.-S. Applications of federated learning; taxonomy, challenges, and research trends. Electronics 2022, 11, 670. [Google Scholar] [CrossRef]
  11. Hassanin, M.; Radwan, I.; Khan, S.; Tahtali, M. Learning discriminative representations for multi-label image recognition. J. Vis. Commun. Image Represent. 2022, 83, 103448. [Google Scholar] [CrossRef]
  12. Alfaro, R.; Allende-Cid, H.; Allende, H. Multilabel text classification with label-dependent representation. Appl. Sci. 2023, 13, 3594. [Google Scholar] [CrossRef]
  13. Mei, S. A Multi-label learning framework for predicting chemical classes and biological activities of natural products from biosynthetic gene clusters. J. Chem. Ecol. 2023, 49, 681–695. [Google Scholar] [CrossRef]
  14. Zhu, C.; Liu, Y.; Miao, D.; Dong, Y.; Pedrycz, W. Within-cross-consensus-view representation-based multi-view multi-label learning with incomplete data. Neurocomputing 2023, 557, 126729. [Google Scholar] [CrossRef]
  15. Mo, L.; Zhu, Y.; Zeng, L. A Multi-label based physical activity recognition via cascade classifier. Sensors 2023, 23, 2593. [Google Scholar] [CrossRef]
  16. Suh, J.H. Multi-label prediction-based fuzzy age difference analysis for social profiling of anonymous social media. Appl. Sci. 2024, 14, 790. [Google Scholar] [CrossRef]
  17. Han, R.; Wang, Z.; Guo, Y.; Wang, X.; A, R.; Zhong, G. Multi-label prediction method for lithology, lithofacies and fluid classes based on data augmentation by cascade forest. Adv. Geo Energy Res. 2023, 9, 25–37. [Google Scholar] [CrossRef]
  18. Hou, J.; Zeng, H.; Cai, L.; Zhu, J.; Chen, J.; Ma, K.-K. Multi-label learning with multi-label smoothing regularization for vehicle re-identification. Neurocomputing 2019, 345, 15–22. [Google Scholar] [CrossRef]
  19. Zhang, M.L.; Li, Y.K.; Liu, X.Y.; Geng, X. Binary relevance for multi-label learning: An overview. Front. Comput. Sci. 2018, 12, 191–202. [Google Scholar] [CrossRef]
  20. Akshay, E.; Sugumaran, V.; Elangovan, M. Single point cutting tool fault diagnosis in turning operation using reduced error pruning tree classifier. Struct. Durab. Health Monit. 2022, 16, 255–270. [Google Scholar] [CrossRef]
  21. Clunie, C.; Batista-Mendoza, G.; Cedeño-Moreno, D.; Calderón-Gómez, H.; Mendoza-Pittí, L.; Russell, C.; Vargas-Lombardo, M. Use of data mining strategies in environmental parameters in poultry farms, a case Study. In Proceedings of the 9th International Conference, Guayaquil, Ecuador, 13–16 November 2023; pp. 81–94. [Google Scholar] [CrossRef]
  22. Kumar, A.R.S.; Goyal, M.K.; Ojha, C.S.P.; Singh, R.D.; Swamee, P.K. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India. Water Sci. Technol. 2013, 68, 2521–2526. [Google Scholar] [CrossRef]
  23. Lin, C.-N.; Huang, W.-S.; Huang, T.-H.; Chen, C.-Y.; Huang, C.-Y.; Wang, T.-Y.; Liao, Y.-S.; Lee, L.-W. Adding value of MRI over CT in predicting peritoneal cancer index and completeness of cytoreduction. Diagnostics 2021, 11, 674. [Google Scholar] [CrossRef]
  24. Haron, N.H.; Mahmood, R.; Amin, N.M.; Ahmad, A.; Jantan, S.R. An Artificial Intelligence Approach to Monitor and Predict Student Academic Performance. J. Adv. Res. Appl. Sci. Eng. Technol. 2024, 44, 105–119. [Google Scholar] [CrossRef]
  25. Dhade, P.; Shirke, P. Federated learning for healthcare: A comprehensive review. Eng. Proc. 2023, 59, 230. [Google Scholar] [CrossRef]
  26. Da Silva, F.R.; Camacho, R.; Tavares, J.M.R.S. Federated learning in medical image analysis: A systematic survey. Electronic 2024, 13, 47. [Google Scholar] [CrossRef]
  27. Prasad, V.K.; Bhattacharya, P.; Maru, D.; Tanwar, S.; Verma, A.; Singh, A.; Tiwari, A.K.; Sharma, R.; Alkhayyat, A.; Țurcanu, F.-E.; et al. Federated learning for the internet-of-medical-things: A survey. Mathematics 2023, 11, 151. [Google Scholar] [CrossRef]
  28. Yaqoob, M.M.; Nazir, M.; Khan, M.A.; Qureshi, S.; Al-Rasheed, A. hybrid classifier-based federated learning in health service providers for cardiovascular disease prediction. Appl. Sci. 2023, 13, 1911. [Google Scholar] [CrossRef]
  29. Žalik, K.R.; Žalik, M. A review of federated learning in agriculture. Sensors 2023, 23, 9566. [Google Scholar] [CrossRef]
  30. Friha, O.; Ferrag, M.A.; Shu, L.; Maglaras, L.; Choo, K.K.R.; Nafaa, M. FELIDS: Federated learning-based intrusion detection system for agricultural Internet of Things. J. Parallel Distrib. Comput. 2022, 165, 17–31. [Google Scholar] [CrossRef]
  31. Yu, J.; Chen, Y.; Wang, Z.; Liu, J.; Huang, B. Food risk entropy model based on federated learning. Appl. Sci. 2022, 12, 5174. [Google Scholar] [CrossRef]
  32. Li, A.; Markovic, M.; Edwards, P.; Leontidis, G. Model pruning enables localized and efficient federated learning for yield forecasting and data sharing. Expert Syst. Appl. 2024, 242, 122847. [Google Scholar] [CrossRef]
  33. Fedorchenko, E.; Novikova, E.; Shulepov, A. Comparative review of the intrusion detection systems based on federated learning: Advantages and open challenges. Algorithms 2022, 15, 247. [Google Scholar] [CrossRef]
  34. Lazzarini, R.; Tianfield, H.; Charissis, V. Federated learning for IoT intrusion detection. AI 2023, 4, 509–530. [Google Scholar] [CrossRef]
  35. Ashraf, M.M.; Waqas, M.; Abbas, G.; Baker, T.; Abbas, Z.H.; Alasmary, H. FedDP: A privacy-protecting theft detection scheme in smart grids using federated learning. Energies 2022, 15, 6241. [Google Scholar] [CrossRef]
  36. Park, J.; Lim, H. Privacy-preserving federated learning using homomorphic encryption. Appl. Sci. 2022, 12, 734. [Google Scholar] [CrossRef]
  37. Abimannan, S.; El-Alfy, E.-S.M.; Hussain, S.; Chang, Y.-S.; Shukla, S.; Satheesh, D.; Breslin, J.G. Towards federated learning and multi-access edge computing for air quality monitoring: Literature review and assessment. Sustainability 2023, 15, 13951. [Google Scholar] [CrossRef]
  38. Supriya, Y.; Gadekallu, T.R. Particle swarm-based federated learning approach for early detection of forest fires. Sustainability 2023, 15, 964. [Google Scholar] [CrossRef]
  39. Chen, D.; Yang, P.; Chen, I.-R.; Ha, D.S.; Cho, J.-H. SusFL: Energy-Aware Federated Learning-based Monitoring for Sustainable Smart Farms. arXiv 2024, arXiv:2402.10280. [Google Scholar] [CrossRef]
  40. Mao, A.; Huang, E.; Gan, H.; Liu, K. FedAAR: A novel federated learning framework for animal activity recognition with wearable sensors. Animals 2022, 12, 2142. [Google Scholar] [CrossRef]
  41. Huang, Y.; Yang, X.; Guo, J.; Cheng, J.; Qu, H.; Ma, J.; Li, L. A High-Precision Method for 100-Day-Old Classification of Chickens in Edge Computing Scenarios Based on Federated Computing. Animals 2022, 12, 3450. [Google Scholar] [CrossRef]
  42. Berghout, T.; Benbouzid, M.; Bentrcia, T.; Lim, W.H.; Amirat, Y. Federated Learning for Condition Monitoring of Industrial Processes: A Review on Fault Diagnosis Methods, Challenges, and Prospects. Electronics 2023, 12, 158. [Google Scholar] [CrossRef]
  43. Wu, S.; Xue, H.; Zhang, L. Q-Learning-Aided Offloading Strategy in Edge-Assisted Federated Learning over Industrial IoT. Electronics 2023, 12, 1706. [Google Scholar] [CrossRef]
  44. Bemani, A.; Björsell, N. Low-Latency Collaborative Predictive Maintenance: Over-the-Air Federated Learning in Noisy Industrial Environments. Sensors 2023, 23, 7840. [Google Scholar] [CrossRef]
  45. Kaleem, S.; Sohail, A.; Tariq, M.U.; Asim, M. An Improved Big Data Analytics Architecture Using Federated Learning for IoT-Enabled Urban Intelligent Transportation Systems. Sustainability 2023, 15, 15333. [Google Scholar] [CrossRef]
  46. Alohali, M.A.; Aljebreen, M.; Nemri, N.; Allafi, R.; Duhayyim, M.A.; Alsaid, M.I.; Alneil, A.A.; Osman, A.E. Anomaly Detection in Pedestrian Walkways for Intelligent Transportation System Using Federated Learning and Harris Hawks Optimizer on Remote Sensing Images. Remote Sens. 2023, 15, 3092. [Google Scholar] [CrossRef]
  47. Xu, C.; Mao, Y. An Improved Traffic Congestion Monitoring System Based on Federated Learning. Information 2020, 11, 365. [Google Scholar] [CrossRef]
  48. Fachola, C.; Tornaría, A.; Bermolen, P.; Capdehourat, G.; Etcheverry, L.; Fariello, M.I. Federated Learning for Data Analytics in Education. Data 2023, 8, 43. [Google Scholar] [CrossRef]
  49. Sengupta, D.; Khan, S.S.; Das, S.; De, D. FedEL: Federated Education Learning for generating correlations between course outcomes and program outcomes for Internet of Education Things. IoT 2024, 25, 101056. [Google Scholar] [CrossRef]
  50. Guo, S.; Zeng, D. Pedagogical Data Federation toward Education 4.0. In Proceedings of the 6th International Conference on Frontiers of Educational Technologies; Association for Computing Machinery, New York, NY, USA, 5–8 June 2020; pp. 51–55. [Google Scholar] [CrossRef]
  51. Zhang, T.; Liu, H.; Tao, J.; Wang, Y.; Yu, M.; Chen, H.; Yu, G. Enhancing Dropout Prediction in Distributed Educational Data Using Learning Pattern Awareness: A Federated Learning Approach. Mathematics 2023, 11, 4977. [Google Scholar] [CrossRef]
  52. Huang, G.; Zhao, X.; Lu, Q. A New Cross-Domain Prediction Model of Air Pollutant Concentration Based on Secure Federated Learning and Optimized LSTM Neural Network. Environ. Sci. Pollut. Res. 2022, 30, 5103–5125. [Google Scholar] [CrossRef] [PubMed]
  53. Idoje, G.; Dagiuklas, T.; Muddesar, I. Federated Learning: Crop Classification in a Smart Farm Decentralised Network. Smart Agric. Technol. 2023, 5, 100277. [Google Scholar] [CrossRef]
  54. Abu-Khadrah, A.; Ali, A.M.; Jarrah, M. An Amendable Multi-Function Control Method Using Federated Learning for Smart Sensors in Agricultural Production Improvements. ACM Trans. Sens. Netw. 2023, in press. [CrossRef]
  55. Jiang, G.; Fan, W.; Li, W.; Wang, L.; He, Q.; Xie, P.; Li, X. DeepFedWT: A Federated Deep Learning Framework for Fault Detection of Wind Turbines. Measurement 2022, 199, 111529. [Google Scholar] [CrossRef]
  56. Campos, E.M.; Saura, P.F.; González-Vidal, A.; Hernández-Ramos, J.L.; Bernabé, J.B.; Baldini, G.; Skarmeta, A. Evaluating Federated Learning for Intrusion Detection in Internet of Things: Review and Challenges. Comput. Netw. 2022, 203, 108661. [Google Scholar] [CrossRef]
  57. Wu, Y.; Zeng, D.; Wang, Z.; Shi, Y.; Hu, J. Distributed Contrastive Learning for Medical Image Segmentation. Med. Image Anal. 2022, 81, 102564. [Google Scholar] [CrossRef] [PubMed]
  58. Rey, V.; Sánchez, P.M.S.; Celdrán, A.H.; Bovet, G. Federated Learning for Malware Detection in IoT Devices. Comput. Netw. 2022, 204, 108693. [Google Scholar] [CrossRef]
  59. Novikova, E.; Doynikova, E.; Golubev, S. Federated Learning for Intrusion Detection in the Critical Infrastructures: Vertically Partitioned Data Use Case. Algorithms 2022, 15, 104. [Google Scholar] [CrossRef]
  60. Geng, D.; He, H.; Lan, X.; Liu, C. Bearing Fault Diagnosis Based on Improved Federated Learning Algorithm. Computing 2021, 104, 1–19. [Google Scholar] [CrossRef]
  61. Wang, Z.; Gai, K. Decision Tree-Based Federated Learning: A Survey. Blockchains 2024, 2, 40–60. [Google Scholar] [CrossRef]
  62. Tonellotto, N.; Gotta, A.; Nardini, F.M.; Gadler, D.; Silvestri, F. Neural Network Quantization in Federated Learning at the Edge. Inf. Sci. 2021, 575, 417–436. [Google Scholar] [CrossRef]
  63. Anaissi, A.; Suleiman, B.; Alyassine, W. A personalized federated learning algorithm for one-class support vector machine: An application in anomaly detection. In Proceedings of the International Conference on Computational Science, London, UK, 21–23 June 2022; pp. 373–379. [Google Scholar] [CrossRef]
  64. Deng, Z.; Han, Z.; Ma, C.; Ding, M.; Yuan, L.; Ge, C.; Liu, Z. Vertical Federated Unlearning on the Logistic Regression Model. Electronics 2023, 12, 3182. [Google Scholar] [CrossRef]
  65. Markovic, T.; Leon, M.; Buffoni, D.; Punnekkat, S. Random Forest Based on Federated Learning for Intrusion Detection. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Crete, Greece, 17–20 June 2022; pp. 132–144. [Google Scholar] [CrossRef]
  66. Liu, Z.; Wang, L.; Chen, K. Secure efficient federated knn for recommendation systems. In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery; Springer: Cham, Switzerland, 2021; pp. 1808–1819. [Google Scholar] [CrossRef]
  67. Jiang, C.; Yin, K.; Xia, C.; Huang, W. FedHGCDroid: An Adaptive Multi-Dimensional Federated Learning for Privacy-Preserving Android Malware Classification. Entropy 2022, 24, 919. [Google Scholar] [CrossRef] [PubMed]
  68. Zhong, J.; Wu, Y.; Ma, W.; Deng, S.; Zhou, H. Optimizing Multi-Objective Federated Learning on Non-IID Data with Improved NSGA-III and Hierarchical Clustering. Symmetry 2022, 14, 1070. [Google Scholar] [CrossRef]
  69. Che, L.; Wang, J.; Zhou, Y.; Ma, F. Multimodal Federated Learning: A Survey. Sensors 2023, 23, 6986. [Google Scholar] [CrossRef]
  70. Liu, Z.; Duan, S.; Wang, S.; Liu, Y.; Li, X. MFLCES: Multi-Level Federated Edge Learning Algorithm Based on Client and Edge Server Selection. Electronics 2023, 12, 2689. [Google Scholar] [CrossRef]
  71. Le, D.-D.; Tran, A.-K.; Dao, M.-S.; Nguyen-Ly, K.-C.; Le, H.-S.; Nguyen-Thi, X.-D.; Pham, T.-Q.; Nguyen, V.-L.; Nguyen-Thi, B.-Y. Insights into Multi-Model Federated Learning: An Advanced Approach for Air Quality Index Forecasting. Algorithms 2022, 15, 434. [Google Scholar] [CrossRef]
  72. Feng, S.; Yu, H.; Zhu, Y. MMVFL: A Simple Vertical Federated Learning Framework for Multi-Class Multi-Participant Scenarios. Sensors 2024, 24, 619. [Google Scholar] [CrossRef] [PubMed]
  73. Sajid, N.A.; Rahman, A.; Ahmad, M.; Musleh, D.; Basheer Ahmed, M.I.; Alassaf, R.; Chabani, S.; Ahmed, M.S.; Salam, A.A.; AlKhulaifi, D. Single vs. Multi-Label: The Issues, Challenges and Insights of Contemporary Classification Schemes. Appl. Sci. 2023, 13, 6804. [Google Scholar] [CrossRef]
  74. Suri, J.S.; Bhagawati, M.; Paul, S.; Protogerou, A.D.; Sfikakis, P.P.; Kitas, G.D.; Khanna, N.N.; Ruzsa, Z.; Sharma, A.M.; Saxena, S.; et al. A Powerful Paradigm for Cardiovascular Risk Stratification Using Multiclass, Multi-Label, and Ensemble-Based Machine Learning Paradigms: A Narrative Review. Diagnostics 2022, 12, 722. [Google Scholar] [CrossRef] [PubMed]
  75. Kumar, S.; Kumar, N.; Dev, A.; Naorem, S. Movie Genre Classification Using Binary Relevance, Label Powerset, and Machine Learning Classifiers. Multimed. Tools Appl. 2023, 82, 945–968. [Google Scholar] [CrossRef]
  76. Raza, A.; Rustam, F.; Siddiqui, H.U.R.; Diez, I.d.l.T.; Garcia-Zapirain, B.; Lee, E.; Ashraf, I. Predicting Genetic Disorder and Types of Disorder Using Chain Classifier Approach. Genes 2023, 14, 71. [Google Scholar] [CrossRef]
  77. Yoo, J.; Jin, Y.; Ko, B.; Kim, M.-S. k-Labelsets Method for Multi-Label ECG Signal Classification Based on SE-ResNet. Appl. Sci. 2021, 11, 7758. [Google Scholar] [CrossRef]
  78. Rocha, V.F.; Varejão, F.M.; Segatto, M.E.V. Ensemble of Classifier Chains and Decision Templates for Multi-Label Classification. Knowl. Inf. Syst. 2022, 64, 643–663. [Google Scholar] [CrossRef]
  79. Romero-del-Castillo, J.A.; Mendoza-Hurtado, M.; Ortiz-Boyer, D.; García-Pedrajas, N. Local-Based K Values for Multi-Label K-Nearest Neighbors Rule. Eng. Appl. Artif. Intell. 2022, 116, 105487. [Google Scholar] [CrossRef]
  80. Chada, N.K.; Hoel, H.; Jasra, A.; Zouraris, G.E. Improved Efficiency of Multilevel Monte Carlo for Stochastic PDE through Strong Pairwise Coupling. J. Sci. Comput. 2022, 93, 62. [Google Scholar] [CrossRef]
  81. Read, J.; Bifet, A.; Holmes, G.; Pfahringer, B. Scalable and Efficient Multi-Label Classification for Evolving Data Streams. Mach. Learn. 2012, 88, 243–272. [Google Scholar] [CrossRef]
  82. Nadeem, M.I.; Ahmed, K.; Li, D.; Zheng, Z.; Naheed, H.; Muaad, A.Y.; Alqarafi, A.; Abdel Hameed, H. SHO-CNN: A Metaheuristic Optimization of a Convolutional Neural Network for Multi-Label News Classification. Electronics 2023, 12, 113. [Google Scholar] [CrossRef]
  83. Shakeel, M.; Nishida, K.; Itoyama, K.; Nakadai, K. 3D Convolution Recurrent Neural Networks for Multi-Label Earthquake Magnitude Classification. Appl. Sci. 2022, 12, 2195. [Google Scholar] [CrossRef]
  84. Pang, Y.; Qin, X.; Zhang, Z. Specific Relation Attention-Guided Graph Neural Networks for Joint Entity and Relation Extraction in Chinese EMR. Appl. Sci. 2022, 12, 8493. [Google Scholar] [CrossRef]
  85. Park, M.; Tran, D.Q.; Lee, S.; Park, S. Multilabel Image Classification with Deep Transfer Learning for Decision Support on Wildfire Response. Remote Sens. 2021, 13, 3985. [Google Scholar] [CrossRef]
  86. Hüllermeier, E.; Fürnkranz, J.; Mencia, E.L. Conformal Rule-Based Multi-Label Classification. Lect. Notes Comput. Sci. 2020, 12325, 290–296. [Google Scholar] [CrossRef]
  87. Qiu, S.; Wang, M.; Yang, Y.; Yu, G.; Wang, J.; Yan, Z.; Domeniconi, C.; Guo, M. Meta Multi-Instance Multi-Label Learning by Heterogeneous Network Fusion. Inf. Fusion 2023, 94, 272–283. [Google Scholar] [CrossRef]
  88. Verma, S.; Singh, S.; Majumdar, A. Multi-Label LSTM Autoencoder for Non-Intrusive Appliance Load Monitoring. Electr. Power Syst. Res. 2021, 199, 107414. [Google Scholar] [CrossRef]
  89. Liu, Z.; Niu, K.; He, Z. ML-CookGAN: Multi-Label Generative Adversarial Network for Food Image Generation. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 85. [Google Scholar] [CrossRef]
  90. Saha, S.; Saha, M.; Mukherjee, K.; Arabameri, A.; Ngo, P.T.T.; Paul, G.C. Predicting the Deforestation Probability Using the Binary Logistic Regression, Random Forest, Ensemble Rotational Forest, REPTree: A Case Study at the Gumani River Basin, India. Sci. Total Environ. 2020, 730, 139197. [Google Scholar] [CrossRef] [PubMed]
  91. Ajin, R.S.; Saha, S.; Saha, A.; Biju, A.; Costache, R.; Kuriakose, S.L. Enhancing the Accuracy of the REPTree by Integrating the Hybrid Ensemble Meta-Classifiers for Modelling the Landslide Susceptibility of Idukki District, South-Western India. Photonirvachak 2022, 50, 2245–2265. [Google Scholar] [CrossRef]
  92. Al-Mukhtar, M.; Srivastava, A.; Khadke, L.; Al-Musawi, T.; Elbeltagi, A. Prediction of Irrigation Water Quality Indices Using Random Committee, Discretization Regression, REPTree, and Additive Regression. Water Resour. Manag. 2023, 38, 343–368. [Google Scholar] [CrossRef]
  93. Alsultanny, Y. Machine Learning by Data Mining REPTree and M5P for Predicating Novel Information for PM10. Cloud Comput. Data Sci. 2020, 1, 40–48. [Google Scholar] [CrossRef]
  94. Saha, S.; Sarkar, R.; Roy, J.; Saha, T.K.; Bhardwaj, D.; Acharya, S. Predicting the Landslide Susceptibility Using Ensembles of Bagging with RF and REPTree in Logchina, Bhutan. In Impact of Climate Change, Land Use and Land Cover, and Socio-Economic Dynamics on Landslides; Sarkar, R., Shaw, R., Pradhan, B., Eds.; Springer: Singapore, 2022; pp. 231–247. [Google Scholar] [CrossRef]
  95. Mandal, K.; Saha, S.; Mandal, S. Predicting the Landslide Susceptibility in Eastern Sikkim Himalayan Region, India Using Boosted Regression Tree and REPTree Machine Learning Techniques. In Applied Geomorphology and Contemporary Issues; Mandal, S., Maiti, R., Nones, M., Beckedahl, H.R., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 683–707. [Google Scholar] [CrossRef]
  96. Prajapati, J.B. Analysis of Age Sage Classification for Students’ Social Engagement Using REPTree and Random Forest. In Proceedings of the International Conference on Computational Intelligence in Data Science, Virtual Event, 24–26 March 2022; pp. 44–54. [Google Scholar] [CrossRef]
  97. Elbeltagi, A.; Srivastava, A.; Al-Saeedi, A.H.; Raza, A.; Abd-Elaty, I.; El-Rawy, M. Forecasting Long-Series Daily Reference Evapotranspiration Based on Best Subset Regression and Machine Learning in Egypt. Water 2023, 15, 1149. [Google Scholar] [CrossRef]
  98. Mrabet, H.; Alhomoud, A.; Jemai, A.; Trentesaux, D. A Secured Industrial Internet-of-Things Architecture Based on Blockchain Technology and Machine Learning for Sensor Access Control Systems in Smart Manufacturing. Appl. Sci. 2022, 12, 4641. [Google Scholar] [CrossRef]
  99. Olaleye, T.O. Opinion Mining Analytics for Spotting Omicron Fear-Stimuli Using REPTree Classifier and Natural Language Processing. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 995–1005. [Google Scholar] [CrossRef]
  100. Li, Q.; Wu, Z.; Cai, Y.; Han, Y.; Yung, C.M.; Fu, T.; He, B. Fedtree: A federated learning system for trees. In Proceedings of the 6th Machine Learning and Systems, Miami Beach, FL, USA, 8 June 2023; pp. 1–15. [Google Scholar]
  101. Zheng, Y.; Xu, S.; Wang, S.; Gao, Y.; Hua, Z. Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables. IEEE Trans. Serv. Comput. 2023, 16, 3604–3620. [Google Scholar] [CrossRef]
  102. Maddock, S.; Cormode, G.; Wang, T.; Maple, C.; Jha, S. Federated Boosted Decision Trees with Differential Privacy. In Proceedings of the CCS, Nagasaki, Japan, 30 May–2 June 2022; pp. 2249–2263. [Google Scholar] [CrossRef]
  103. Yamamoto, F.; Ozawa, S.; Wang, L. eFL-Boost: Efficient Federated Learning for Gradient Boosting Decision Trees. IEEE Access 2022, 10, 43954–43963. [Google Scholar] [CrossRef]
  104. Fu, F.; Shao, Y.; Yu, L.; Jiang, J.; Xue, H.; Tao, Y.; Cui, B. Vf2boost: Very fast vertical federated gradient boosting for cross-enterprise learning. In Proceedings of the SIGMOD, Xi’an, China, 20–25 June 2021; pp. 563–576. [Google Scholar] [CrossRef]
  105. Li, Q.; Wu, Z.; Wen, Z.; He, B. Privacy-preserving gradient boosting decision trees. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 784–791. [Google Scholar] [CrossRef]
  106. Zhao, L.; Ni, L.; Hu, S.; Chen, Y.; Zhou, P.; Xiao, F.; Wu, L. InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy. In Proceedings of the IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 2087–2095. [Google Scholar] [CrossRef]
  107. Li, X.; Hu, Y.; Liu, W.; Feng, H.; Peng, L.; Hong, Y.; Ren, K.; Qin, Z. OpBoost: A vertical federated tree boosting framework based on order-preserving desensitization. arXiv 2022, arXiv:2210.01318. [Google Scholar] [CrossRef]
  108. Zhao, J.; Zhu, H.; Xu, W.; Wang, F.; Lu, R.; Li, H. SGBoost: An Efficient and Privacy-Preserving Vertical Federated Tree Boosting Framework. IEEE Trans. Inf. Forensics Secur. 2022, 18, 1022–1036. [Google Scholar] [CrossRef]
  109. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. SecureBoost: A Lossless Federated Learning Framework. IEEE Intell. Syst. 2021, 36, 87–98. [Google Scholar] [CrossRef]
  110. Chen, W.; Ma, G.; Fan, T.; Kang, Y.; Xu, Q.; Yang, Q. Secureboost+: A high performance gradient boosting tree framework for large scale vertical federated learning. arXiv 2021, arXiv:2110.10927. [Google Scholar] [CrossRef]
  111. Le, N.K.; Liu, Y.; Nguyen, Q.M.; Liu, Q.; Liu, F.; Cai, Q.; Hirche, S. Fedxgboost: Privacy-preserving xgboost for federated learning. arXiv 2021, arXiv:2106.10662. [Google Scholar] [CrossRef]
  112. Law, A.; Leung, C.; Poddar, R.; Popa, R.A.; Shi, C.; Sima, O.; Zheng, W. Secure collaborative training and inference for xgboost. In Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice, Virtual Event, 9 November 2020; pp. 21–26. [Google Scholar] [CrossRef]
  113. Wang, Z.; Yang, Y.; Liu, Y.; Liu, X.; Gupta, B.B.; Ma, J. Cloud-based federated boosting for mobile crowdsensing. arXiv 2020, arXiv:2005.05304. [Google Scholar] [CrossRef]
  114. Zhang, J.; Zhao, X.; Yuan, P. Federated security tree algorithm for user privacy protection. J. Comput. Appl. 2020, 40, 2980. [Google Scholar]
  115. Li, Q.; Wen, Z.; He, B. Practical Federated Gradient Boosting Decision Trees. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 4642–4649. [Google Scholar] [CrossRef]
  116. Yang, M.W.; Song, L.Q.; Xu, J.; Li, C.; Tan, G. The tradeoff between privacy and accuracy in anomaly detection using federated xgboost. arXiv 2019, arXiv:1907.07157. [Google Scholar] [CrossRef]
  117. Liu, Y.; Ma, Z.; Liu, X.; Ma, S.; Nepal, S.; Deng, R. Boosting Privately: Privacy-Preserving Federated Extreme Boosting for Mobile Crowdsensing. arXiv 2019, arXiv:1907.10218. [Google Scholar] [CrossRef]
  118. Yao, H.; Wang, J.; Dai, P.; Bo, L.; Chen, Y. An efficient and robust system for vertically federated random forest. arXiv 2022, arXiv:2201.10761. [Google Scholar] [CrossRef]
  119. Han, Y.; Du, P.; Yang, K. FedGBF: An efficient vertical federated learning framework via gradient boosting and bagging. arXiv 2022, arXiv:2204.00976. [Google Scholar] [CrossRef]
  120. Wu, Y.; Cai, S.; Xiao, X.; Chen, G.; Ooi, B.C. Privacy preserving vertical federated learning for tree-based models. arXiv 2020, arXiv:2008.06170. [Google Scholar] [CrossRef]
  121. Liu, Y.; Liu, Y.; Liu, Z.; Liang, Y.; Meng, C.; Zhang, J.; Zheng, Y. Federated Forest. IEEE Trans. Big Data 2020, 8, 843–854. [Google Scholar] [CrossRef]
  122. Zhang, K.; Song, X.; Zhang, C.; Yu, S. Challenges and future directions of secure federated learning: A survey. Front. Comput. Sci. 2022, 16, 165817. [Google Scholar] [CrossRef] [PubMed]
  123. Banabilah, S.; Aloqaily, M.; Alsayed, E.; Malik, N.; Jararweh, Y. Federated learning review: Fundamentals, enabling technologies, and future applications. Inf. Process. Manag. 2022, 59, 103061. [Google Scholar] [CrossRef]
  124. Blachnik, M.; Sołtysiak, M.; Dąbrowska, D. Predicting Presence of Amphibian Species Using Features Obtained from GIS and Satellite Images. ISPRS Int. J. Geo Inf. 2019, 8, 123. [Google Scholar] [CrossRef]
  125. Colonna, J.G.; Gama, J.; Nakamura, E.F. A comparison of hierarchical multi-output recognition approaches for anuran classification. Mach. Learn. 2018, 107, 1651–1671. [Google Scholar] [CrossRef]
  126. Kaggle. HackerEarth ML Challenge: Adopt a Buddy. Available online: https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption (accessed on 16 March 2024).
  127. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann: Cambridge, MA, USA, 2016; pp. 1–664. [Google Scholar]
  128. Pan, W. Predicting Presence of Amphibian Species Using Feature Selection. In Proceedings of the 6th IEEE Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; pp. 1823–1826. [Google Scholar] [CrossRef]
Figure 1. The architecture of the proposed FMLL method.
Figure 2. REPTree structure for “great crested newt” classification in FMLL.
Table 1. Overview of federated learning frameworks.

| Year | Ref. | FL Type | Dataset | Aggregation Algorithm | ML Algorithm | Evaluation Metric | Contribution |
|---|---|---|---|---|---|---|---|
| 2023 | [52] | Centralized | Air pollutant and meteorological data | FedAvg | LSTM, SSA, and DPLA | MAE, RMSE, R-squared | Cross-domain prediction of air pollutant concentration |
| 2023 | [53] | Decentralized | Air dataset | FedAvg | CNN | Accuracy, precision, recall, F-score, and confusion matrix | Predicting chickpea crops for smart farming |
| 2023 | [54] | Centralized | Crop and soil dataset | Federated learning | AMFSC | Analysis rate, control rate | Agricultural production improvement |
| 2022 | [30] | Centralized | CSE-CICIDS2018, MQTTset, and InSDN | Cyber-physical production system (CPPS)-based aggregation | CNN, recurrent neural networks, and deep neural networks | Accuracy, precision, recall, F-score | Intrusion detection to enhance the security of agricultural IoT infrastructures |
| 2022 | [55] | Centralized | Wind turbine data | FedAvg | MSRAN and deep network | Precision, recall, and F-score | Fault detection in wind turbines |
| 2022 | [56] | Centralized | ToN_IoT | FedAvg and Fed+ | Multinomial logistic regression | Accuracy, precision, recall, F-score, FPR | Intrusion detection for IoT |
| 2022 | [57] | Centralized | Spinesagt2-wdataset3 | Federated contrastive learning optimization (FCLOpt) | Dual attention gates (DAGs) and U-Net | Accuracy | Federated learning-based vertebral body segmentation framework (FLVBSF) |
| 2022 | [58] | Centralized | N-BaIoT | Mini-batch and multi-epoch aggregation, derived from FedAvg | Multilayer perceptron and autoencoder | Accuracy, F-score | Federated learning for IoT malware detection |
| 2022 | [59] | Centralized | SWAT 2015 | SCADA server-based aggregation | GBDT with Paillier HE | Accuracy | Intrusion detection for IoT prioritizing data confidentiality |
| 2021 | [60] | Decentralized | CWRU | FA-FedAvg | CNN | Accuracy | Bearing fault diagnosis |
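Several of the frameworks in Table 1 rely on FedAvg [7] for aggregation. The minimal NumPy sketch below shows the core idea (averaging client model parameters, weighted by local dataset size) with entirely hypothetical toy values; it is a conceptual illustration, not the aggregation used by FMLL itself.

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """FedAvg-style aggregation [7]: per-layer average of client
    parameters, weighted by each client's local dataset size."""
    total = sum(client_sizes)
    n_layers = len(client_params[0])
    return [
        sum(p[layer] * (n / total) for p, n in zip(client_params, client_sizes))
        for layer in range(n_layers)
    ]

# Two toy clients, each holding a single parameter vector (hypothetical values).
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
sizes = [100, 300]
print(fed_avg(clients, sizes))  # [array([2.5, 3.5])]
```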
Table 2. Example representation of instances in multi-label learning.

| Sample | X | Y |
|---|---|---|
| S_1 | x_{11}, x_{12}, …, x_{1K} | Y_1 = {y_2, y_4} |
| S_2 | x_{21}, x_{22}, …, x_{2K} | Y_2 = {y_1, y_3, y_4} |
| … | … | … |
| S_N | x_{N1}, x_{N2}, …, x_{NK} | Y_N = {y_3} |
Table 3. Binary Relevance transformation of the multi-label dataset displayed in Table 2.

| Sample | D_{y_1}: X → Y | D_{y_2}: X → Y | D_{y_3}: X → Y | D_{y_4}: X → Y |
|---|---|---|---|---|
| S_1 | [x_{11} … x_{1K}] → ¬y_1 | [x_{11} … x_{1K}] → y_2 | [x_{11} … x_{1K}] → ¬y_3 | [x_{11} … x_{1K}] → y_4 |
| S_2 | [x_{21} … x_{2K}] → y_1 | [x_{21} … x_{2K}] → ¬y_2 | [x_{21} … x_{2K}] → y_3 | [x_{21} … x_{2K}] → y_4 |
| S_N | [x_{N1} … x_{NK}] → ¬y_1 | [x_{N1} … x_{NK}] → ¬y_2 | [x_{N1} … x_{NK}] → y_3 | [x_{N1} … x_{NK}] → ¬y_4 |
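The transformation in Table 3 turns one multi-label dataset into one single-label problem per label. The sketch below shows a minimal Binary Relevance implementation with scikit-learn, using DecisionTreeClassifier as a stand-in for Weka's REPTree (which scikit-learn does not implement) and purely synthetic toy data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def binary_relevance_fit(X, Y):
    """Train one binary classifier per label column, as in Table 3.
    DecisionTreeClassifier stands in for Weka's REPTree here."""
    return [DecisionTreeClassifier().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    """Stack the per-label binary predictions back into a label matrix."""
    return np.column_stack([m.predict(X) for m in models])

# Toy data: 6 instances, 3 features, 4 binary labels (hypothetical values).
rng = np.random.default_rng(0)
X = rng.random((6, 3))
Y = rng.integers(0, 2, size=(6, 4))

models = binary_relevance_fit(X, Y)
print(binary_relevance_predict(models, X))  # recovers Y on the training data
```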
Table 4. A brief overview of utilized datasets.

| ID | Ref. | Dataset Name | #Features | #Instances | #Labels | #Classes | Source | Link (accessed on 16 March 2024) |
|---|---|---|---|---|---|---|---|---|
| 1 | [124] | Amphibians | 23 | 189 | 7 | 2, 2, 2, 2, 2, 2, 2 | UCI | https://archive.ics.uci.edu/dataset/528/amphibians |
| 2 | [125] | Anuran-Calls-(MFCCs) | 22 | 7195 | 3 | 4, 8, 10 | UCI | https://archive.ics.uci.edu/dataset/406/anuran+calls+mfccs |
| 3 | [126] | HackerEarth-Adopt-A-Buddy | 11 | 18,834 | 2 | 3, 4 | Kaggle | https://www.kaggle.com/datasets/mannsingh/hackerearth-ml-challenge-pet-adoption |
Table 5. The information of Amphibians dataset.

| Dataset Attributes | Task | Study Domain | Feature Types | #Instances | #Features | #Views |
|---|---|---|---|---|---|---|
| Multivariate | Classification | Biology | Integer, real, nominal | 189 | 23 | 6457 |
Table 6. The statistics of numerical features in Amphibians dataset.

| Feature Name | Min | Max | Mean | Mode | Standard Deviation |
|---|---|---|---|---|---|
| SR | 30 | 500,000 | 9633.2275 | 300 | 46,256.0783 |
| NR | 1 | 12 | 1.5661 | 1 | 1.5444 |
| OR | 25 | 100 | 90.8689 | 100 | 19.0996 |
Table 7. The description of all features in Amphibians dataset.

| No | Attribute | Type | Description |
|---|---|---|---|
| 1 | ID | Integer | Identification number (unused in classification) |
| 2 | MV | Categorical | Motorway (unused in classification) |
| 3 | SR | Numerical | Surface of water reservoir (m²) |
| 4 | NR | Numerical | Number of water reservoirs in habitat (the greater the number of reservoirs, the higher the probability that some of them will be proper for amphibian breeding) |
| 5 | TR | Categorical | Type of water reservoirs (including reservoirs with natural features, lately formed reservoirs, settling ponds, reservoirs situated near residential areas, technological water reservoirs, etc.) |
| 6 | VR | Categorical | Vegetation presence within the reservoirs (including absence of vegetation, sparse patches at the edges, densely overgrown areas, abundant vegetation within the reservoir, reservoirs entirely overgrown, etc.) |
| 7 | SUR1 | Categorical | Surroundings 1 (the predominant land cover types surrounding the water reservoir) |
| 8 | SUR2 | Categorical | Surroundings 2 (the second most prevalent types of land cover surrounding the water reservoir) |
| 9 | SUR3 | Categorical | Surroundings 3 (the third most predominant types of land cover surrounding the water reservoir) |
| 10 | UR | Categorical | Use of water reservoirs (unused by humans, recreational and scenic use, economic utilization, technological purposes) |
| 11 | FR | Categorical | The presence of fishing (limited or occasional fishing, intensive fishing, breeding reservoirs) |
| 12 | OR | Numerical | Degree of access from reservoir edges to undeveloped areas: no access, limited access, moderate access, extensive access to open space |
| 13 | RR | Ordinal | Minimum distance from the water reservoir to roads, categorized as: <50 m, 50–100 m, 100–200 m, 200–500 m, 500–1000 m, >1000 m |
| 14 | BR | Ordinal | Building development as minimum distance to buildings: <50 m, 50–100 m, 100–200 m, 200–500 m, 500–1000 m, >1000 m |
| 15 | MR | Categorical | Maintenance status of the reservoir (including clean, slightly littered, reservoirs heavily or very heavily littered) |
| 16 | CR | Categorical | Type of shore (natural or concrete) |
| 17 | Green frogs | Categorical | Presence of green frogs (label 1) |
| 18 | Brown frogs | Categorical | Presence of brown frogs (label 2) |
| 19 | Common toad | Categorical | Presence of common toad (label 3) |
| 20 | Fire-bellied toad | Categorical | Presence of fire-bellied toad (label 4) |
| 21 | Tree frog | Categorical | Presence of tree frog (label 5) |
| 22 | Common newt | Categorical | Presence of common newt (label 6) |
| 23 | Great crested newt | Categorical | Presence of great crested newt (label 7) |
Table 8. The information of Anuran-Calls-(MFCCs) dataset.

| Dataset Attributes | Task | Study Domain | Feature Type | #Instances | #Features | #Views |
|---|---|---|---|---|---|---|
| Multivariate | Classification, clustering | Biology | Real | 7195 | 22 | 5692 |
Table 9. The statistics of MFCC syllables in Anuran-Calls-(MFCCs) dataset.

| Feature Name | Min | Max | Mean | Mode | Standard Deviation |
|---|---|---|---|---|---|
| MFCCs_1 | −0.2512 | 1.0000 | 0.9899 | 1.0000 | 0.0690 |
| MFCCs_2 | −0.6730 | 1.0000 | 0.3236 | 1.0000 | 0.2187 |
| MFCCs_3 | −0.4360 | 1.0000 | 0.3112 | 1.0000 | 0.2635 |
| MFCCs_4 | −0.4727 | 1.0000 | 0.4460 | 1.0000 | 0.1603 |
| MFCCs_5 | −0.6360 | 0.7522 | 0.1270 | No | 0.1627 |
| MFCCs_6 | −0.4104 | 0.9642 | 0.0979 | No | 0.1204 |
| MFCCs_7 | −0.5390 | 1.0000 | −0.0014 | No | 0.1714 |
| MFCCs_8 | −0.5765 | 0.5518 | −0.0004 | No | 0.1163 |
| MFCCs_9 | −0.5873 | 0.7380 | 0.1282 | No | 0.1790 |
| MFCCs_10 | −0.9523 | 0.5228 | 0.0560 | No | 0.1271 |
| MFCCs_11 | −0.9020 | 0.5230 | −0.1157 | No | 0.1868 |
| MFCCs_12 | −0.7994 | 0.6909 | 0.0434 | No | 0.1560 |
| MFCCs_13 | −0.6441 | 0.9457 | 0.1509 | No | 0.2069 |
| MFCCs_14 | −0.5904 | 0.5757 | −0.0392 | No | 0.1525 |
| MFCCs_15 | −0.7172 | 0.6689 | −0.1017 | No | 0.1876 |
| MFCCs_16 | −0.4987 | 0.6707 | 0.0421 | No | 0.1199 |
| MFCCs_17 | −0.4215 | 0.6812 | 0.0887 | No | 0.1381 |
| MFCCs_18 | −0.7593 | 0.6141 | 0.0078 | No | 0.0847 |
| MFCCs_19 | −0.6807 | 0.5742 | −0.0495 | No | 0.0825 |
| MFCCs_20 | −0.3616 | 0.4678 | −0.0532 | No | 0.0942 |
| MFCCs_21 | −0.4308 | 0.3898 | 0.0373 | No | 0.0795 |
| MFCCs_22 | −0.3793 | 0.4322 | 0.0876 | No | 0.1234 |
Table 10. The distribution of instances per class in Anuran-Calls-(MFCCs) dataset.

| Label | Class | #Instances |
|---|---|---|
| Family | Bufonidae | 68 |
| | Dendrobatidae | 542 |
| | Hylidae | 2165 |
| | Leptodactylidae | 4420 |
| Genus | Adenomera | 4150 |
| | Ameerega | 542 |
| | Dendropsophus | 310 |
| | Hypsiboas | 1593 |
| | Leptodactylus | 270 |
| | Osteocephalus | 114 |
| | Rhinella | 68 |
| | Scinax | 148 |
| Species | AdenomeraAndre | 672 |
| | AdenomeraHylaedactylus | 3478 |
| | Ameeregatrivittata | 542 |
| | HylaMinuta | 310 |
| | HypsiboasCordobae | 1121 |
| | HypsiboasCinerascens | 472 |
| | LeptodactylusFuscus | 270 |
| | OsteocephalusOophagus | 114 |
| | Rhinellagranulosa | 68 |
| | ScinaxRuber | 148 |
Table 11. The information of the HackerEarth-Adopt-A-Buddy dataset.

| Dataset Attributes | Task | Study Domain | Feature Type | #Instances | #Features | #Views |
|---|---|---|---|---|---|---|
| Multivariate | Classification | Biology | Integer, real, nominal, temporal | 18,834 | 11 | 5605 |
Table 12. The description of all features in the HackerEarth-Adopt-A-Buddy dataset.

| No. | Attribute | Type | Description |
|---|---|---|---|
| 1 | pet_id | Integer | A unique identifier assigned to each animal up for adoption. |
| 2 | issue_date | Temporal | The date when the pet was officially taken in by the shelter. |
| 3 | listing_date | Temporal | The date and time when the pet became available for adoption at the shelter. |
| 4 | condition | Categorical | The health or physical state of the pet upon arrival at the shelter. |
| 5 | color_type | Categorical | The color pattern or combination exhibited by the pet. |
| 6 | length | Real | The measured length of the pet (typically in meters). |
| 7 | height | Real | The measured height of the pet (typically in centimeters). |
| 8 | X1 | Integer | A value related to the pet. |
| 9 | X2 | Integer | Another value related to the pet. |
| 10 | breed_category | Categorical | The category or classification of the pet’s breed (label 1). |
| 11 | pet_category | Categorical | The category or species classification of the pet (label 2). |
Table 13. The statistics of numerical features in the HackerEarth-Adopt-A-Buddy dataset.

| Feature Name | Min | Max | Mean | Mode | Standard Deviation |
|---|---|---|---|---|---|
| length | 0.0000 | 1.0000 | 0.5026 | 0.0800 | 0.2887 |
| height | 5.0000 | 50.0000 | 27.4488 | 21.4000 | 13.0198 |
| X1 | 0.0000 | 19.0000 | 5.3696 | 0.0000 | 6.5724 |
| X2 | 0.0000 | 9.0000 | 4.5773 | 1.0000 | 3.5178 |
Table 14. Performance metrics for various amphibian species in FMLL.

| Amphibians | Accuracy | Precision | TNR | ROC | PRC | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Green frogs | 68.78 | 0.694 | 0.688 | 0.715 | 0.682 | 0.688 | 0.689 |
| Brown frogs | 78.31 | 0.613 | 0.783 | 0.503 | 0.665 | 0.783 | 0.688 |
| Common toad | 71.43 | 0.712 | 0.714 | 0.621 | 0.653 | 0.714 | 0.674 |
| Fire-bellied toad | 70.37 | 0.669 | 0.704 | 0.576 | 0.612 | 0.704 | 0.650 |
| Tree frog | 65.61 | 0.639 | 0.655 | 0.638 | 0.627 | 0.656 | 0.631 |
| Common newt | 69.84 | 0.658 | 0.698 | 0.528 | 0.603 | 0.698 | 0.619 |
| Great crested newt | 88.36 | 0.790 | 0.884 | 0.539 | 0.818 | 0.884 | 0.834 |
| Average | 73.24 | 0.682 | 0.732 | 0.589 | 0.666 | 0.732 | 0.684 |
Table 15. Performance metrics for Anuran-Calls-(MFCCs) classification in FMLL.

| Anuran-Calls-(MFCCs) | Accuracy | Precision | TNR | ROC | PRC | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Family | 95.75 | 0.957 | 0.980 | 0.978 | 0.964 | 0.957 | 0.957 |
| Genus | 94.19 | 0.941 | 0.991 | 0.979 | 0.943 | 0.942 | 0.941 |
| Species | 93.55 | 0.935 | 0.992 | 0.983 | 0.935 | 0.936 | 0.935 |
| Average | 94.50 | 0.944 | 0.988 | 0.980 | 0.947 | 0.945 | 0.944 |
Table 16. Performance metrics for categories of HackerEarth-Adopt-A-Buddy dataset in FMLL.

| HackerEarth-Adopt-A-Buddy | Accuracy | Precision | TNR | ROC | PRC | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Breed_category | 85.43 | 0.856 | 0.927 | 0.965 | 0.938 | 0.854 | 0.850 |
| Pet_category | 86.80 | 0.869 | 0.928 | 0.946 | 0.928 | 0.868 | 0.865 |
| Average | 86.12 | 0.863 | 0.928 | 0.956 | 0.933 | 0.861 | 0.858 |
Table 17. The comparison of FMLL with state-of-the-art methods using the Amphibians dataset.

| Method | Accuracy |
|---|---|
| Gradient-Boosted Trees (GBT) [124] | 64.18 |
| Random Forest (RF) [124] | 57.54 |
| AdaBoost (ADA) [124] | 60.01 |
| Decision Tree (DT) [124] | 58.37 |
| Partially Monotonic Decision Tree (PMDT) [128] | 71.50 |
| Average | 62.32 |
| Proposed (FMLL with BR and REPTree) | 73.24 |
Table 18. The comparison of FMLL with state-of-the-art methods [125] using the Anuran-Calls-(MFCCs) dataset.

| Method | Precision | Recall | F-Score |
|---|---|---|---|
| Species | | | |
| KNN-Flat | 0.690 | 0.720 | 0.700 |
| RBF-SVM-Flat | 0.850 | 0.540 | 0.660 |
| Polynomial-SVM-Flat | 0.710 | 0.760 | 0.740 |
| Tree-Flat | 0.490 | 0.500 | 0.500 |
| KNN-LCPL | 0.691 | 0.719 | 0.705 |
| KNN-Hierarchical-LCPN | 0.690 | 0.720 | 0.700 |
| RBF-SVM-Hierarchical-LCPN | 0.840 | 0.540 | 0.650 |
| Polynomial-SVM-Hierarchical-LCPN | 0.680 | 0.710 | 0.700 |
| Tree-Hierarchical-LCPN | 0.570 | 0.560 | 0.560 |
| KNN-Hierarchical-LCPL | 0.690 | 0.720 | 0.700 |
| RBF-SVM-Hierarchical-LCPL | 0.830 | 0.520 | 0.640 |
| Polynomial-SVM-Hierarchical-LCPL | 0.690 | 0.740 | 0.720 |
| Tree-Hierarchical-LCPL | 0.470 | 0.500 | 0.490 |
| Proposed (FMLL with BR and REPTree) | 0.935 | 0.936 | 0.935 |
| Family | | | |
| KNN-LCPL | 0.713 | 0.820 | 0.763 |
| Proposed (FMLL with BR and REPTree) | 0.957 | 0.957 | 0.957 |
| Genus | | | |
| KNN-LCPL | 0.663 | 0.731 | 0.695 |
| Proposed (FMLL with BR and REPTree) | 0.941 | 0.942 | 0.941 |
| Species + Family + Genus | | | |
| KNN-LCPL | 0.689 | 0.757 | 0.721 |
| Proposed (FMLL with BR and REPTree) | 0.944 | 0.945 | 0.944 |
