Weighted Domain Adaptation Using the Graph-Structured Dataset Representation for Machinery Fault Diagnosis under Varying Operating Conditions

Data-driven fault diagnosis has received significant attention in the era of big data. Most data-driven methods have been developed under the assumption that both training and test data come from identical data distributions. However, in real-world industrial scenarios, data distribution often changes due to varying operating conditions, leading to a degradation of diagnostic performance. Although several domain adaptation methods have shown their feasibility, existing methods have overlooked metadata from the manufacturing process and treated all domains uniformly. To address these limitations, this article proposes a weighted domain adaptation method using a graph-structured dataset representation. Our framework involves encoding a collection of datasets into the proposed graph structure, which captures relations between datasets based on metadata and raw data simultaneously. Then, transferability scores of candidate source datasets for a target are estimated using the constructed graph and a graph embedding model. Finally, the fault diagnosis model is established with a voting ensemble of the base classifiers trained on candidate source datasets and their estimated transferability scores. For validation, two case studies on rotor machinery, specifically tool wear and bearing fault detection, were conducted. The experimental results demonstrate the effectiveness and superiority of the proposed method over other existing domain adaptation methods.


Introduction
Fault diagnosis of rotor machinery is a significant task for improving process efficiency and reducing machine downtime in the manufacturing industry [1,2]. Particularly, the high cost of machinery necessitates intelligent fault diagnosis to ensure the expected functionality and performance throughout its lifespan [3]. Consequently, research on this topic has rapidly grown in recent years, paralleling the advancements of industrial machinery in terms of scale and complexity.
With the rapid progress of sensor technology and monitoring equipment, various data types can be collected during the manufacturing process. Based on this, recent research on machinery fault diagnosis has mainly focused on data-driven methods, which can more effectively extract the crucial features of machinery faults [4]. Machine-learning methods such as support vector regression (SVR) and random forests (RFs) have shown successful performance in tool wear detection [5,6]. Lately, deep-learning methods such as auto-encoders and long short-term memory (LSTM) networks have outperformed machine-learning methods because they can capture complex features from a huge amount of sensor data [7].
While researchers have achieved considerable success in data-driven fault diagnosis, these methods require that the training data and the test data come from the same probability distribution [8,9]. Unfortunately, this requirement is not usually guaranteed
• A domain adaptation method for machinery fault diagnosis is developed to address varying operating conditions. It focuses on a more realistic scenario where fault data from the target domain (i.e., unseen operating conditions) is unavailable during the training phase, and multiple source datasets are present. Consequently, the method aims to effectively transfer knowledge from multiple sources to enhance the fault diagnosis model's performance for the target domain through the weighting of candidate source datasets.
• We propose the use of a graph structure to capture intricate relations among domains in the context of fault diagnosis. Within this graph structure, we introduce a novel domain feature that simultaneously considers metadata and collected data. This comprehensive representation enhances our understanding of the varying operating conditions and their effects.
• Building upon the graph structure, we present a transferability score estimation method specifically designed for multi-source domain adaptation in fault diagnosis under varying operating conditions. This method accurately quantifies the effectiveness of source domains for the target domain.
• The experimental results on both the tool wear detection and the bearing fault detection demonstrate the effectiveness and superiority of the proposed method over the baseline domain adaptation methods.
The remainder of the article is organized as follows. Section 2 provides a detailed description of the theoretical background related to this research. Section 3 explains the proposed method in detail. In Section 4, experimental results and the effectiveness of the proposed framework are discussed. Finally, Section 5 summarizes the key findings and presents further research topics with limitation analysis.

Theoretical Background
This article is related to several research areas in data-driven fault diagnosis, including domain adaptation, weighted domain adaptation, and transferability estimation. This section introduces previous literature that is relevant to this research.

Domain Adaptation
Domain adaptation (DA) aims to mitigate the data discrepancy problem by aligning the source and target domains (e.g., unseen operating conditions), thus enhancing the diagnosis model's generalization ability. This approach has emerged as a promising solution to address the challenge of varying operating conditions in fault diagnosis.
Before delving into the details of DA, we introduce essential notations and definitions derived from survey works [19,20]. A domain is defined as a distribution D, represented as D = {X, P(X)}, where X denotes the input feature and P(X) denotes a marginal probability distribution. Given a specific domain D, a task T consists of a label space Y and an objective function f(·), which can often be represented as a conditional probability distribution P(Y|X). Domain adaptation is a subset of transfer learning settings, where the task remains constant, but the domains vary.
DA methods can be categorized based on their approach to transferring knowledge from source to target domains. A comprehensive review by Singhal et al. [19] classified DA methods into three categories: feature-based methods and two kinds of data-based methods. Feature-based DA methods aim to learn a domain-invariant feature representation by minimizing data distribution discrepancies between domains. For example, the feature extractor learns a common feature representation for both domains using distance measures like maximum mean discrepancy (MMD) [21]. Ganin et al. [22] proposed the domain-adversarial neural network (DANN), which employs adversarial training to learn features that are agnostic to variations across different domains (i.e., source and target data) in the input data. Building on the DANN algorithm, Tzeng et al. [23] introduced adversarial discriminative domain adaptation (ADDA), which utilizes the DANN algorithm in a two-stage process, first training on labeled source data and then adapting the model to an unlabeled target domain through adversarial alignment of feature distributions.
In contrast to the feature-based approach, data-based methods aim to minimize specific distances between data distributions by assigning weights or selecting a subset of the source dataset for the target. Based on this distance, the model for the target domain learns only from datasets that are similar to the target dataset. These methods determine the importance of source data for the target data based on distance metrics such as MMD or Kullback-Leibler (KL) divergence. Furthermore, Dai et al. [24] proposed TrAdaBoost, which is based on a reverse-boosting strategy where the importance of poorly predicted source data decreases at each boosting iteration.
While feature-based approaches, focusing on the role of trained models and their layers, have dominated the research area, recent attempts have highlighted the crucial role of the source dataset as well [25,26]. Likewise, several studies have shown that identifying source data can be as important as increasing the size of the source dataset. However, these distance metrics require labeled data for the target domain to accurately compute distances, making them challenging to implement in real industrial scenarios.

Transferability Score Estimation
Transferability score estimation is the task of quantifying the extent to which knowledge acquired from one task or domain can be effectively transferred to another, even when sufficient labeled data for the target domain is lacking. This concept plays a pivotal role in data-based domain adaptation, as accurate transferability estimations help establish relationships between domains and aid in the selection of suitable source datasets for a given target task. As a result, transferability estimation stands as an essential tool for weighting training datasets in domain adaptation, ultimately maximizing the accuracy of the target task [27].
The primary objective of transferability score estimation is to develop a score or metric that evaluates the effectiveness of domain adaptation methods in transferring knowledge from the source domain to the target domain, even in the absence of massive, labeled data. This evaluation enables the efficient assessment of domain adaptation algorithm performance across various source datasets prior to their execution.
Several works have shown the benefit of improving the accuracy of fault diagnosis by selecting high-transferability datasets. For instance, Bao et al. [28] introduced the H-score, a metric that measures the discriminative ability of the source model's features for the target task by considering both intra-class and inter-class variances. Tran et al. [29] proposed the negative conditional entropy (NCE) score, derived from the empirical joint distribution of actual labels and predicted labels from the source model. Nguyen et al. [30] presented the log expected empirical prediction (LEEP) score, which replaces the source label with the average log likelihood generated by the pre-trained model. Extensions of the LEEP score, such as those utilizing Gaussian mixture models [31], aim to provide more accurate source labels. While NCE and LEEP scores are simple and practical, the estimation measure relies on the pre-trained model and its output. In an effort to generalize these measures, LogME [32] computes the logarithm of the maximum evidence based on extracted features without requiring the pre-trained model. These measures have demonstrated successful results in fields such as computer vision and speech recognition.

Domain Adaptation for Machinery Fault Diagnosis
The application of domain adaptation methods in machinery fault diagnosis has gained attention in recent years. For instance, Wen et al. [33] employed a distance measure of data distributions called maximum mean discrepancy (MMD) to reduce the discrepancy between features from an auto-encoder model. On the other hand, X. Li et al. [34] utilized multi-kernel MMD, an advanced version of MMD, to align different multi-layer networks for bearing fault diagnosis.
Currently, the predominant approach in this research area has been feature-based methods, such as adversarial learning, which focus on the role of the trained model and its layers. However, recent attempts have emerged that concentrate on the identification and removal of irrelevant source domains, which has been shown to enhance the domain adaptation performance [15]. For example, Mo et al. proposed a novel sparsity measure by improving the optimization process of invariant risk minimization for machinery fault diagnosis tasks [35]. Furthermore, several studies [36,37] have incorporated weighting mechanisms into adversarial learning. For instance, Han et al. [38] integrated a domain weighting mechanism into an adversarial domain adaptive network to assign weights to each sample from multiple source domains.
In these contexts, various novel transferability score estimation measures have also been proposed. Given that most industrial machinery datasets consist of time-series data while existing methods primarily focus on image or text data, adaptations are necessary. To tackle this challenge, Ye and Dai [39] combined dynamic time warping, which quantifies the distance between time series, with a traditional transferability estimation measure, the Jensen-Shannon (JS) distance. Bang [40] proposed a novel measure called expected knowledge gain, which calculates the relatedness between two manufacturing processes using metadata. This approach leverages descriptions of the manufacturing process and its significance to select source datasets for the target.
However, despite the successes achieved by these methods in enhancing domain adaptation under varying operating conditions, most of them have primarily focused on quantifying the distance between data distributions without fully incorporating metadata related to manufacturing operations during the training phase. Furthermore, these methods often consider only one-to-one distances and overlook the intricate relationships between datasets, including the possibility of negative transfer.

Proposed Method
In the proposed framework (refer to Figure 1), our initial step involves converting a collection of datasets into a graph-structured representation. Then, we estimate the transferability scores of candidate source datasets by leveraging the topological structures of the constructed graph and a graph embedding model. Following the transferability estimation, we allocate weights for the target domain based on these estimates. These weights are subsequently employed in the establishment of a voting ensemble model. The details of each stage are described in the following sections.
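The final stage of the framework, a voting ensemble weighted by estimated transferability, can be sketched as follows. This is a minimal illustration only: the function names `normalize` and `weighted_vote` are hypothetical, and normalizing the scores so that they sum to one is an assumption, since the text does not state how scores are mapped to weights.

```python
from collections import defaultdict


def normalize(scores):
    """Turn non-negative transferability scores into weights that sum to 1.

    Assumption: simple sum-normalization; the article does not specify the mapping.
    """
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_vote(predictions, weights):
    """Combine the labels predicted by the base classifiers.

    predictions: one predicted label per base classifier
                 (one classifier per candidate source dataset).
    weights:     one non-negative weight per base classifier.
    Returns the label with the largest accumulated weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per base classifier is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


# Three hypothetical base classifiers and their estimated transferability scores.
weights = normalize([0.9, 0.3, 0.2])
print(weighted_vote(["fault", "normal", "normal"], weights))
```

In this toy case the single high-transferability classifier outweighs the two low-transferability ones, which is the behavior that weighting by transferability is meant to produce.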
Sensors 2024, 24, x FOR PEER REVIEW

Graph-Structured Dataset Representation
This stage aims to establish a unified structure that is capable of representing a collection of source datasets, capturing diverse domain features and their inter-relations. This unified structure can then be utilized to estimate transferability scores of source domains for the target domain. To achieve this unified structure, we employ a graph comprising nodes and edges. The concept of this stage, named the 'Graph-structured Dataset Representation', is illustrated in Figure 2. In this context, each node corresponds to a domain, while an edge between two nodes refers to a relation between two different domains. The primary contribution of this article centers on the method used to define nodes and quantify edges in a manner tailored to the requirements of machinery fault diagnosis under varying operating conditions.

Measuring Transferability Scores between Source Domains
This subsection focuses on constructing the representation for estimating transferability scores between domains. In this context, the transferability from domain $D^S_p$ to domain $D^S_q$ is defined as the effectiveness of domain $D^S_p$ in improving fault diagnostic performance under domain $D^S_q$. It is quantified by assessing the accuracy of the diagnostic model transferred from domain $D^S_p$ to domain $D^S_q$, using the following mathematical expression:
$$\mathrm{Transferability}(D^S_p \rightarrow D^S_q) = \mathrm{Score}\left(\phi_{D^S_p}, D^S_q\right)$$
Here, $\phi_{D^S_p}$ represents a model trained on the dataset collected under domain $D^S_p$, and $\mathrm{Score}(\phi_{D^S_p}, D^S_q)$ denotes the performance score of the trained model $\phi_{D^S_p}$ on dataset $D^S_q$. This formula aims to quantify the effectiveness of domain $D^S_p$ in achieving the objective related to domain $D^S_q$.
Initially, the actual transferability scores between source datasets, which include labeled fault data, are assessed. This process involves training a fault detection model using a commonly employed classification algorithm, the random forest algorithm. Subsequently, the model's performance is evaluated using an accuracy score.
Understanding the relations of transferability scores between source domains will be helpful in allocating the weight of the corresponding source domains for the target domain. Consequently, the measured transferability scores between source domains play a crucial role in constructing the graph structure.
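The pairwise measurement described above (train on one source domain, score on another) can be sketched as below. To keep the example self-contained, a simple nearest-centroid classifier stands in for the random forest used in the article; this substitution, the function names, and the toy data are all assumptions made for illustration.

```python
def fit_centroids(X, y):
    """Stand-in classifier (NOT the article's random forest): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label of the closest class centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of samples in (X, y) that the model labels correctly."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def transferability_matrix(domains):
    """domains: dict name -> (X, y).

    Returns dict (p, q) -> accuracy on domain q of the model trained on domain p.
    """
    models = {name: fit_centroids(X, y) for name, (X, y) in domains.items()}
    return {
        (p, q): accuracy(models[p], *domains[q])
        for p in domains
        for q in domains
        if p != q
    }


# Toy one-feature datasets for three hypothetical operating conditions.
domains = {
    "A": ([[0.0], [1.0]], [0, 1]),
    "B": ([[0.1], [0.9]], [0, 1]),
    "C": ([[5.0], [6.0]], [0, 1]),
}
scores = transferability_matrix(domains)
print(scores[("A", "B")], scores[("A", "C")])
```

Note that the resulting scores are directional: the accuracy of a model trained on one domain and tested on another need not equal the accuracy in the reverse direction.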

Extracting Domain Features for Nodes
First, we introduce the method for characterizing features within each domain that can be effectively used to quantify the transferability between different domains. In the context of the multi-source DA setting, where a combination of datasets can compose a source dataset for the target, candidate source datasets encompass all possible combinations of source datasets. Consequently, a combination of source datasets is encoded as a node in the graph. For example, with seven source datasets originating from seven distinct domains, the resulting graph comprises 127 nodes (computed as $2^7 - 1$). Subsequently, domain features will be extracted from each of these nodes.
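The node count follows from enumerating every non-empty subset of the source datasets. A short sketch, in which the dataset names `S1` to `S7` and the helper name are hypothetical:

```python
from itertools import combinations


def candidate_source_nodes(source_names):
    """Return every non-empty combination of source datasets, one per graph node."""
    nodes = []
    for size in range(1, len(source_names) + 1):
        nodes.extend(combinations(source_names, size))
    return nodes


sources = [f"S{i}" for i in range(1, 8)]  # seven hypothetical source datasets
nodes = candidate_source_nodes(sources)
print(len(nodes))  # 2**7 - 1 = 127
```

Because the number of nodes grows exponentially with the number of source datasets, this exhaustive enumeration is practical only for a modest number of sources.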
The focus here lies in the identification of a new domain feature that exhibits effectiveness in DA for fault diagnosis under varying operating conditions. Variations in machinery operating conditions significantly influence the correlations between process variables. For instance, in rotor machinery, higher motor speeds can lead to increased vibrations, thus altering the correlation between vibration and acceleration. These fluctuations in correlation become crucial factors contributing to the degradation of diagnostic model performance when operating conditions change. Therefore, each domain is characterized by the associations between operating parameters and process variables. The distance between different associations within a domain constitutes the elements of the edge features.
As a result, we propose the correlation structure as a novel domain feature for each node. The correlation structure encompasses associations between operating parameters from metadata and process variables from collected data. The extraction of these associations involves the computation of the Kendall rank correlation coefficient from [41], which is a statistical measure used to assess the strength of association between two sets of data. This non-parametric measure evaluates the similarity in ranking or the concordance of the order of observations between two variables without regard to the magnitude of difference between the variables. The correlation coefficient for a set of n paired observations $(PV_1, PV_2)$ is calculated as follows:
$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{n(n-1)/2}$$
Here, concordant pairs refer to pairs where the order of variables is the same, signifying a consistent ranking. On the other hand, discordant pairs are those where the order of variables differs, indicating an inconsistent ranking. It is worth noting that tied observations, where two pairs have identical values, are considered in the calculation of the Kendall rank correlation coefficient. Each computed correlation coefficient contributes to the construction of a comprehensive correlation structure, which is typically represented as a matrix.
To refine this matrix, it was determined that correlations with an absolute correlation coefficient less than 0.5 held insignificant correlation, and the corresponding values in the matrix were replaced with 0. An illustrative example of the correlation structure is presented in Figure 3.
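The correlation structure with the 0.5 threshold can be sketched as below. This sketch assumes the basic tau-a form of the Kendall coefficient, (concordant − discordant) / (n(n − 1)/2), in which tied pairs count toward neither group; the article may handle ties differently. The function names and the variable names `speed`, `vibration`, and `noise` are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation in its tau-a form."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie: counted in neither group in this tau-a sketch
    return (concordant - discordant) / (n * (n - 1) / 2)


def correlation_structure(variables, threshold=0.5):
    """variables: dict name -> list of observations.

    Returns dict (a, b) -> tau, with |tau| < threshold replaced by 0.
    The diagonal is set to 1 by convention.
    """
    names = list(variables)
    matrix = {}
    for a in names:
        for b in names:
            tau = 1.0 if a == b else kendall_tau(variables[a], variables[b])
            matrix[(a, b)] = tau if abs(tau) >= threshold else 0.0
    return matrix


structure = correlation_structure({
    "speed": [1, 2, 3, 4],
    "vibration": [2, 4, 6, 9],  # rises with speed, so tau = 1
    "noise": [1, 3, 2, 4],      # weaker association, tau = 2/3
})
print(structure[("speed", "vibration")], structure[("speed", "noise")])
```

Zeroing weak coefficients keeps only the associations that are strong enough to characterize a domain, which makes the later comparison between domains less sensitive to noise.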
Moreover, we incorporated the use of raw data collected from each domain as an additional domain feature. By calculating various distances between these raw data, we quantified the relationships between domains.

Constructing an Adjacency Matrix for Edges
This subsection aims to generate a matrix that contains distances between each pair of distinct nodes (i.e., domains) based on the domain features. These distances serve as the basis for connecting two nodes through an edge. Each such edge is represented as a multi-dimensional vector that signifies the relationships between the two nodes (refer to Figure 4). Typically, these relationships rely on distance measures between data distributions obtained from collected data. Examples of such measures include information divergence, MMD, JS divergence, and optimal transport (OT) [42]. In this study, we opted for JS divergence and MMD as our data-distribution similarity metrics due to their ease of computation and wide applicability. However, since JS divergence does not account for temporal dependencies in time series data, we additionally incorporated dynamic time warping (DTW) as a distance measure for pairs of time series datasets. The specifics of these distance measures are described below.
The JS distance $E_{JS}(S, T)$ between two data distributions (S and T) is a commonly used metric for quantifying dissimilarity. This distance measure is based on KL divergence, a fundamental concept in information theory. KL divergence quantifies the additional information required to encode data from distribution S using a code optimized for distribution T, as opposed to encoding it with a code optimized for distribution S itself. The JS distance combines KL divergence with an averaging step to ensure symmetry and measure overall dissimilarity in a balanced manner. Mathematically, the JS distance can be calculated using the following expression:
$$E_{JS}(S, T) = \frac{KL(S \,\|\, M) + KL(T \,\|\, M)}{2}, \qquad M = \frac{S + T}{2}$$
The MMD distance $E_{MMD}(S, T)$ is a statistical measure used to quantify the dissimilarity between probability distributions. It is commonly employed in various machine learning and statistical tasks, including domain adaptation and kernel methods. The mathematical expression for the MMD distance between two distributions, S and T, is as follows:
$$E_{MMD}(S, T) = \left\| \frac{1}{n} \sum_{i=1}^{n} \phi(x_i) - \frac{1}{m} \sum_{j=1}^{m} \phi(y_j) \right\|_{\mathcal{H}}$$
In the equation, n and m are the respective numbers of samples drawn from each distribution, $x_i$ and $y_j$ are the individual data points sampled from the distributions, $\phi(\cdot)$ is a feature map that transforms data into a higher-dimensional space, and $\|\cdot\|_{\mathcal{H}}$ represents the norm in the reproducing kernel Hilbert space.
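Both distribution distances can be sketched in a few lines. The sketch below makes several assumptions that are not stated in the text: distributions are given as discrete probability vectors for the JS divergence (standard mixture-based definition), the MMD uses a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma`, and the biased empirical estimator is used.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel between two feature vectors (gamma is an assumed bandwidth)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def mmd(xs, ys, gamma=1.0):
    """Biased empirical MMD between two samples, computed with the kernel trick.

    ||mean phi(x) - mean phi(y)||^2 = mean k(x, x') + mean k(y, y') - 2 mean k(x, y)
    """
    n, m = len(xs), len(ys)
    kxx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / (n * n)
    kyy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / (m * m)
    kxy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (n * m)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(kxx + kyy - 2 * kxy, 0.0))


print(js_divergence([0.5, 0.5], [0.9, 0.1]))
print(mmd([[0.0], [1.0]], [[5.0], [6.0]]))
```

Using the mixture in the JS divergence keeps the value finite even when the two distributions do not share the same support, which a direct KL divergence would not.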
The DTW distance is a distance measure used to quantify the dissimilarity between pairs of time series data. Time series data frequently exhibit variations in length, posing a unique challenge when measuring dissimilarity between such datasets. To address this challenge, Sakoe and Chiba [43] introduced the DTW distance, which calculates the distance by optimally matching similar data points between time series. To apply the measure to dataset similarity, each signal $\{s_i\}_{i=1}^{N}$ in a dataset is divided into an equal number K of segments, each with varying lengths, denoted as $\mathrm{Window}^S_q = \{s_i\}_{i=q}^{q+\lceil N/K \rceil}$. Then, we calculate DTW distances between time series windows in domain S and domain T. The final DTW distance between the two domains is computed from these window-level distances. The time complexity of calculating the DTW distance between two time series depends on several factors, including the lengths of the time series and the specific algorithmic optimizations used. In its basic form, DTW has a time complexity of $O(N^2)$, where N is the length of the time series. This quadratic time complexity arises because, in the worst case, every point in one time series is compared with every point in the other time series. In situations where time consumption is high due to long time series, several approximate or fast DTW algorithms can be applied to reduce the time complexity to linear or near-linear time, $O(N)$.
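The basic quadratic-time DTW recursion and the windowing step can be sketched as follows. Two points are assumptions rather than details from the text: windows from the two domains are paired by position, and the window-level distances are combined by a plain average. The function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW with absolute difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def split_windows(signal, k):
    """Cut a signal into consecutive windows of ceil(N / k) samples.

    The last window may be shorter, and fewer than k windows can result
    when N is not a multiple of k.
    """
    size = math.ceil(len(signal) / k)
    return [signal[start:start + size] for start in range(0, len(signal), size)]


def domain_dtw(signal_s, signal_t, k):
    """Average DTW over position-matched windows (the averaging is an assumption)."""
    pairs = list(zip(split_windows(signal_s, k), split_windows(signal_t, k)))
    return sum(dtw_distance(ws, wt) for ws, wt in pairs) / len(pairs)


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # warping absorbs the repeated sample
print(domain_dtw([0, 1, 2, 3], [1, 2, 3, 4], k=2))
```

Splitting long signals into windows also bounds the cost of each DTW call, since the quadratic term then depends on the window length rather than on the full signal length.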
To measure the distance $E_{CORR}(S, T)$ between two correlation structures, we employed cosine similarity between the correlation structures of different domains. This similarity metric quantifies the cosine of the angle between two matrices treated as vectors. The calculation for the correlation structure distance can be expressed mathematically as follows:
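One way to realize this measure is sketched below with NumPy. The function name is hypothetical, and converting the similarity into a distance as one minus the cosine is an assumption; the article's exact expression may differ.

```python
import numpy as np


def correlation_distance(x_s, x_t):
    """Distance between the correlation structures of two domains.

    x_s, x_t: arrays of shape (samples, variables) sharing the same variables.
    """
    # Pearson correlation matrices, flattened so each matrix is treated as a vector
    c_s = np.corrcoef(x_s, rowvar=False).ravel()
    c_t = np.corrcoef(x_t, rowvar=False).ravel()
    cos = float(np.dot(c_s, c_t) / (np.linalg.norm(c_s) * np.linalg.norm(c_t)))
    # Assumption: distance = 1 - cosine similarity
    return 1.0 - cos
```

Since correlation is invariant to rescaling of the variables, two domains that differ only in signal amplitude yield a distance of approximately zero under this sketch.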

Constructing a Graph-Structured Dataset Representation
In this subsection, a graph structure is proposed based on the extracted domain features and quantified relations between domains. Since a graph is composed of nodes and edges, its structure varies depending on how these nodes and edges are defined. In this research, two different graph structures are proposed.
In this graph structure, measured transferability scores are converted into node features (refer to Figure 5). Specifically, for the node of $D^{S}_{p}$, there are a total of N measured transferability scores with respect to the other nodes, and this array of N scores is converted into the node feature of the node representing $D^{S}_{p}$. Edge features consist of a multi-dimensional vector signifying the four quantified relations between the two nodes. However, the node feature of the target domain is missing, since there are no labeled fault data during the training phase and performance scores remain unknown. These missing values are estimated in the next subsection.
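The layout described above can be illustrated with plain arrays. This is a sketch with placeholder numbers only: the node names, the score values, and the choice to store the graph as dense NumPy arrays are assumptions made for illustration.

```python
import numpy as np

node_names = ["S1", "S2", "S3", "T"]   # hypothetical: three source nodes, one target node
n_nodes, n_sources, n_relations = 4, 3, 4

# Node features: measured transferability scores (placeholder values).
# The target row stays NaN because its scores are unknown at training time.
node_features = np.full((n_nodes, n_sources), np.nan)
node_features[:n_sources] = [[1.0, 0.8, 0.4],
                             [0.7, 1.0, 0.5],
                             [0.3, 0.6, 1.0]]

# Edge features: one 4-vector [E_JS, E_MMD, E_DTW, E_CORR] per node pair (placeholders).
rng = np.random.default_rng(0)
raw = rng.random((n_nodes, n_nodes, n_relations))
edge_features = (raw + raw.transpose(1, 0, 2)) / 2.0   # symmetric distances
edge_features[np.arange(n_nodes), np.arange(n_nodes)] = 0.0  # zero self-distance
```

Note that the edge features involving the target node are available, because the four distances require no fault labels; only the target's node feature is missing.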

Transferability Scores Estimation from the Constructed Graph
This subsection aims to estimate the transferability between the target domain and candidate source domains. Since labeled data in the target domain are insufficient, the transferability between the target domain and other domains remains unknown in the constructed graph. This estimation leverages the constructed graph with a graph embedding model named GraphSAGE (Graph Sampling and Aggregation) proposed in [44].
The GraphSAGE model is well-suited for tasks involving graphs, especially when dealing with missing or incomplete node features. The key component of the model is the neighbor aggregation function, which collects information or attributes from a node's neighbors and aggregates them into a summary representation for the focal node. This function allows the model to capture the local neighborhood information of each node, which is then used to generate its embedding. This helps the model learn representations that encode the graph's structure and relationships effectively.
The detailed training process for this model can be outlined as follows.
• Input preparation: The input to the model is the constructed graph itself, represented by its edge features (i.e., quantified edge features) and node features (i.e., measured transferability score). The target node with missing node features is not included in the training phase.
• Sampling neighbors: For each node in the graph, a fixed-size neighborhood is sampled. This is performed to efficiently handle graphs with nodes that have varying degrees of connectivity.
• Aggregating features: The key idea of the model is to learn a function that aggregates features from a node's local neighborhood. Several aggregation functions can be used, such as mean, LSTM, pooling, etc. This function is learned during the training process. In this article, fully connected neural networks are leveraged for the aggregation functions (refer to Figure 6).
• Generating embeddings: The aggregated features are then combined with the features of adjacent nodes to generate the target node's embedding. The dimension of this embedding is designed to be identical to the node feature.
• Loss calculation: The loss is calculated based on a generated embedding and a ground-truth label of the node. Specifically, a mean-squared error function is leveraged for the loss calculation.
• Backpropagation and optimization: The computed loss is backpropagated through the network, leading to the adjustment of parameters in the aggregating function using optimization techniques such as the Adam optimizer. Training occurs over multiple iterations (epochs), with each iteration dedicated to enhancing the model's ability to minimize the loss.
After training, the trained model estimates the missing node values of the target node by using existing graph features and edge features between the target node and others. The model will leverage the learned node representations, which have been enriched by the edge features, to make these predictions.
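The sampling and aggregation steps above can be sketched as a single forward pass. This is an illustration under assumptions, not the trained model from this article: the weights are random and untrained, the single fully connected ReLU layer with mean pooling is one possible aggregator, and all names and placeholder values are hypothetical. Training with the mean-squared error loss and the Adam optimizer is omitted.

```python
import numpy as np


def sample_neighbors(rng, candidates, k):
    """Draw a fixed-size neighborhood; sample with replacement if too few candidates."""
    candidates = np.asarray(candidates)
    return rng.choice(candidates, size=k, replace=len(candidates) < k)


def aggregate(node_feats, edge_feats, node, neighbors, w_agg, w_out):
    """One sample-and-aggregate step for `node`.

    Each message concatenates a neighbor's node feature with the edge feature
    between `node` and that neighbor; messages pass through one fully connected
    ReLU layer, are mean-pooled, and are projected to the node-feature dimension.
    """
    messages = np.concatenate(
        [node_feats[neighbors], edge_feats[node, neighbors]], axis=1)
    hidden = np.maximum(messages @ w_agg, 0.0)   # fully connected layer + ReLU
    return hidden.mean(axis=0) @ w_out           # mean-pool, then project


rng = np.random.default_rng(0)
n_src, n_rel, n_hidden = 3, 4, 8
node_feats = rng.random((n_src + 1, n_src))              # placeholder scores
node_feats[-1] = np.nan                                  # target features are unknown
edge_feats = rng.random((n_src + 1, n_src + 1, n_rel))   # placeholder relations
w_agg = rng.normal(size=(n_src + n_rel, n_hidden))       # untrained weights
w_out = rng.normal(size=(n_hidden, n_src))

target = n_src
neighbors = sample_neighbors(rng, [0, 1, 2], k=2)
estimate = aggregate(node_feats, edge_feats, target, neighbors, w_agg, w_out)
```

The estimate for the target node depends only on the features of its sampled neighbors and the connecting edge features, which is why the missing target feature does not block the computation.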
The performance of the graph embedding model is influenced by two critical parameters: the number of neighbors selected for aggregation and the depth and structure of the aggregation process. Consequently, these parameters should be optimized, for example by grid search.
Furthermore, these parameters are closely linked to the overall time complexity of the model, and the size of the graph is the main factor in that complexity. Therefore, the number of neighbors should be chosen according to the specific size and characteristics of the graph. Alternatively, optimization techniques such as graph partitioning can alleviate the computational burden associated with the practical implementation of GraphSAGE.

Two-Stage Weighting Strategy for Domain Adaptation
This subsection focuses on the development of a voting ensemble model using candidate source datasets and their corresponding estimated transferability for the target domain. Instead of constructing a single model, our approach involves establishing a voting ensemble model that combines the results of multiple models based on their assigned weights.
Here is a step-by-step breakdown of the process.
• Developing base classifiers with instance weighting: We start by generating a diverse pool of trained models using the candidate source datasets. This involves developing individual fault diagnostic models, each trained on a specific candidate source dataset. To build these models, we employ a classification algorithm combined with a domain adaptation method to ensure the robustness of each individual model in the ensemble. Specifically, we leverage the random forest algorithm as the classification algorithm and TrAdaBoost [27] as an instance weighting mechanism. The detailed process is explained in Algorithm 1.

Algorithm 1 Procedure of developing base classifiers with instance weighting
For each iteration p (up to a pre-defined P):
1: Initialize weights uniformly.
2: Train a classifier $h_p(\cdot)$ using the current weights.
3: Calculate the weighted error.

• Domain weighting with allocated weights: The next step is to assign weights to each candidate source dataset for the voting ensemble model. These weights are determined based on the estimated transferability scores assigned to each candidate source dataset concerning a particular target dataset. Higher estimated transferability scores result in larger weight values. To ensure that the sum of all weights equals 1, the weights are adjusted accordingly. Additionally, a threshold (estimated transferability of 0) is introduced to refine the selection of source datasets for the ensemble, allowing only the best-performing source datasets to contribute.
• Building an ensemble model: The final predicted value of the ensemble model for a target domain is computed by multiplying each node's final weight by its predicted values from the trained model with the domain adaptation method. These weighted predictions are then summed to obtain the ensemble's prediction for the target domain.
The detailed process is explained in Algorithm 2.

Algorithm 2 Domain weighting with allocated weights
Input: A set of base classifiers $\{WC_i\}_{i=1}^{M}$, a set of estimated transferability scores $[TF(i, T)]_{i=1}^{M}$, and a test dataset $\{x_k\}_{k=1}^{T}$.
1: Normalize the estimated transferability scores so that they sum up to 1.
2: Obtain predictions on the kth sample by the base classifier i.
The visual representation provided in Figure 7 illustrates how the voting ensemble model operates, leveraging the weights from various source datasets to make accurate predictions for the target domain. This ensemble approach enhances the robustness and performance of the domain adaptation process in fault diagnosis under varying operating conditions.
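The two weighting stages described in this subsection can be sketched together as follows. This is an illustration under assumptions rather than the authors' implementation: all function names are hypothetical, the instance reweighting follows the original TrAdaBoost update rule [27] (the steps of Algorithm 1 beyond the weighted error are assumed from that formulation), and the threshold is applied as a strict inequality.

```python
import math


def tradaboost_update(w_src, w_tgt, err_src, err_tgt, n_iters):
    """Stage 1: one TrAdaBoost reweighting step.

    err_src / err_tgt hold the per-sample 0/1 losses |h_p(x) - y| of the
    current classifier on the source and target samples.
    """
    n = len(w_src)
    # Weighted error measured on the target samples only
    eps = sum(w * e for w, e in zip(w_tgt, err_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)        # keep beta_t well defined
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_iters))
    # Misclassified source samples are down-weighted, target samples up-weighted
    new_src = [w * beta ** e for w, e in zip(w_src, err_src)]
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, err_tgt)]
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]


def domain_weights(scores, threshold=0.0):
    """Stage 2: keep sources above the threshold and normalize to sum to 1."""
    kept = {name: s for name, s in scores.items() if s > threshold}
    if not kept:
        raise ValueError("no candidate source exceeds the threshold")
    total = sum(kept.values())
    return {name: s / total for name, s in kept.items()}


def weighted_vote(weights, predictions):
    """Sum the per-class probabilities of each base classifier, scaled by its weight."""
    n_classes = len(next(iter(predictions.values())))
    combined = [0.0] * n_classes
    for name, w in weights.items():
        for c, p in enumerate(predictions[name]):
            combined[c] += w * p
    return combined


# Hypothetical estimated transferability scores and base-classifier outputs
scores = {"A": 0.6, "B": 0.2, "C": -0.1}
weights = domain_weights(scores)             # source "C" is filtered out
preds = {"A": [0.9, 0.1], "B": [0.2, 0.8], "C": [0.0, 1.0]}
ensemble = weighted_vote(weights, preds)     # per-class scores for one test sample
```

In this toy example, the source with a negative estimated transferability contributes nothing to the vote, and the remaining two sources are weighted 3:1 in favor of the more transferable one.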

Experimental Results and Discussion
In this section, we conduct two case studies on rotor machinery fault diagnosis under varying operating conditions to validate the effectiveness and superiority of the proposed framework. Each case study is based on real-world public datasets.

SMART Dataset

Description of Dataset
The SMART dataset was collected from a real CNC milling machine that is part of the system-level manufacturing and automation research testbed (SMART) at the University of Michigan [45]. The testbed provided a milling machine dataset collected under varying feed rates, clamping pressures, and tool conditions (as illustrated in Figure 8). In the CNC milling machine, feed rate refers to the relative velocity of the cutting tool along the workpiece, and clamping pressure refers to the pressure used to hold the workpiece. Depending on the characteristics of the workpiece or expected part quality, the setting values of those operating parameters may vary. In the context of this research, datasets collected under different feed rates and clamping pressures are defined as domains A to G, respectively (as detailed in Table 1).
Table 1. Operational conditions of each domain. Feed rate refers to the relative velocity of the cutting tool along the workpiece. Clamping pressure refers to the pressure used to hold the workpiece in the vise.
During one such manufacturing operation, time series datasets were collected from each of the four motors (X, Y, Z, S) in the machine, where S is the spindle. A total of seven datasets were collected for each motor, resulting in a comprehensive dataset consisting of 44 variables. These datasets were sampled at a rate of 10 Hz. The time series datasets included the motor's actual position, actual velocity, actual acceleration, command position, command velocity, command acceleration, current feedback, DC bus voltage, output current, output voltage, and output power.

Experiment Setup
Firstly, one of the seven available datasets, named A to G, was designated as the target domain. Subsequently, all remaining datasets, excluding the chosen target, were employed as candidate source datasets for training the tool wear detection model with the DA method. Notably, we assumed that the target operating condition represented an unseen operating condition, so only normal (unworn tool) data from the target dataset were used during the training phase.
Leveraging these datasets, we proceeded to develop the tool wear detection model, employing various domain adaptation methods. Subsequently, we evaluated the model's accuracy in predicting the target dataset. Performance assessment of the tool wear detection models was carried out employing widely recognized metrics for evaluating classification models, namely accuracy scores and area under the ROC curve (AUC) scores.
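For concreteness, the two metrics can be computed as in the sketch below. The function names are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen worn-tool sample receives a higher score than a randomly chosen normal sample), which is equivalent to the area under the ROC curve; the article does not state which implementation was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC for binary labels (1 = worn, 0 = normal) from continuous scores.

    Counts the positive/negative pairs ranked correctly; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, the AUC does not depend on a decision threshold, which makes it useful when the class balance differs between domains.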
We conducted a comparative analysis of the proposed method against several baseline methods. We began by selecting a naïve method that does not utilize any DA method and instead learns from the source dataset, assigning uniform weights to all collected data. The difference in accuracy from the "Traditional ML" serves as an indicator of the performance improvement through the application of the DA method.
• Traditional machine learning (ML): To provide a reference point, we implemented the "Traditional ML" baseline method, where equal weight was apportioned to each domain. This method exclusively relies on the source data for training the diagnostic model, devoid of any DA method.
The proposed method was also compared to widely used DA methods from data-based and feature-based approaches. The selected feature-based baseline methods include:
• Domain-adversarial neural network (DANN) [22]: Employing adversarial training to learn features that are agnostic to variations across different domains (i.e., source and target data) in the input data.
• Adversarial discriminative domain adaptation (ADDA) [23]: Utilizing a DANN algorithm in a two-stage process by initially training on labeled source data and subsequently adapting the model to an unlabeled target domain using adversarial alignment of feature distributions.
In addition to the feature-based methods, we also considered data-based methods as baseline methods, which consist of:
• Kernel mean matching (KMM) [46]: Reweighting source data to minimize the MMD distance between domains by solving quadratic optimization problems.
• Kullback-Leibler importance estimation procedure (KLIEP) [47]: Estimating weighted importance by minimizing the KL divergence distance between domains. It assigns higher weights to the data that are more important for learning the target distribution.
• TrAdaBoost: Allocating weights based on a reverse boosting strategy where the weight of poorly predicted source data is decreased at each iteration.
To ensure a fair comparison among the above DA methods, all the methods were implemented with the same loss function (mean squared error), learning rate (0.001), and optimizer (Adam). A one-class classifier, specifically the RandomForestClassifier, served as the backbone model for all DA methods. In testing, the classifier showed a 98.89% accuracy score under the traditional setting of an identical distribution, demonstrating its capability for tool wear detection tasks. Furthermore, for methods necessitating a domain discriminator, a fully connected network structured with layers of 200-100-2 was employed. With these standardized settings and model choices, the DA methods were compared on an equal footing.

Accuracy Comparison
In this subsection, a comprehensive comparison was conducted among six different domain adaptation (DA) methods applied specifically to the task of tool wear detection. The primary objective of this evaluation was to assess the effectiveness of each DA method in comparison to the method without any DA, serving as the baseline represented by the "Traditional ML" method. The performance improvements achieved by each DA method relative to this baseline were measured and depicted.
Performance improvement, in this context, refers to how much the performance of a DA method is improved compared to the method without any DA. Therefore, the Y-axis of each bar graph in Figure 9 represents these performance improvements, where the values for the "Traditional ML" method are all zero, as it serves as the baseline. The X-axis represents the various target domains. For a more detailed breakdown of the results, Table 2 reports the AUC scores, while Table 3 reports the accuracy scores, complementing the insights from Figure 9.
The experimental results, as shown in Figure 9, indicate that the proposed method outperformed the other DA methods, achieving the highest area under the ROC curve (AUC) scores across all target domains. Notably, in the target domains F and G, the proposed method exhibited substantial improvements in AUC scores of 0.635 and 0.454, respectively. These domains operated at a feed rate of 20 mm/s, while other domains used a feed rate of 3 or 6 mm/s. The change in feed rate significantly impacted the performance in these two domains, but the proposed method effectively mitigated this discrepancy by allocating weights to each candidate source dataset.
In contrast to the other domains, the performance improvement in domain A is not statistically significant. This is likely due to the fact that the traditional machine learning method achieved an AUC score of 0.991 when applied to domain A. This suggests that, in the context of domain A, a detection model with sufficient performance can be learned even by utilizing existing data without employing DA methods.
While the TrAdaBoost method also showed performance improvements in all target domains, the observed results were inferior to those of the proposed method. This suggests that the combination of the proposed transferability estimation with the voting ensemble approach significantly enhanced the model's ability for domain adaptation, establishing the superiority of this approach for tool wear detection under varying operating conditions.
Conversely, the feature-based DA methods demonstrated relatively lower performance improvements and, in some cases, even performed worse than the method without DA (the "Traditional ML" method). The feature-based methods tended to underperform due to their indiscriminate training with all source datasets, allowing potentially harmful datasets to negatively influence the fault diagnosis model in most of the target domains.

PU Bearing Dataset
In this subsection, we aim to demonstrate the generalizability of the proposed framework to a wide range of machinery fault diagnoses. To achieve this, we validated the framework for bearing fault diagnosis under the same experimental setup.

Description of Dataset
The PU bearing datasets were collected from the test rig, which is part of the bearing data center at Paderborn University [48]. The test rig is depicted in Figure 10. Each bearing status was tested under four different operating conditions, as shown in Table 4. The operating parameters were the load torque of the drive train, the rotational speed of the drive system, and the radial force on the test bearing. All three operating parameters and their setting values were constant during each measurement. During each measurement, vibration data were collected using accelerometers with a sampling frequency of 64 kHz over a duration of four seconds.
Figure 11 shows the data distribution of two different domains and its negative effect on the fault diagnosis performance. The scatter plot visualizes datasets from two different domains using the t-distributed stochastic neighbor embedding (t-SNE) algorithm, a dimensionality reduction method. Instances from dataset C, collected under domain C, are represented by orange points, while those from dataset A are depicted in blue. The red lines represent the decision boundary of the classifier trained on the whole of dataset C and only the normal data in dataset A.
Although the classifier exhibited high performance on domain C, numerous points in domain A were misclassified. This discrepancy between the domains of the PU bearing dataset significantly degraded the models' performance, thereby impeding accurate fault diagnosis.

Experimental Setup
To implement the domain adaptation methodology, we adopted the experimental setup outlined in [49] for the PU bearing dataset scenario. The setup involved two distinct real fault datasets, namely KA16 and KI16, as well as a normal dataset named K001. All datasets were collected under four different operating conditions, denoted as domains A, B, C, and D (see Table 4). Specifically, the K001 dataset was collected from a normal bearing, the KA16 dataset from a bearing with an outer fault, and the KI18 dataset from a bearing with an inner fault. The datasets and their corresponding operating conditions are detailed in Table 4. As a result, the task in this experiment is a three-class classification task under four different operating conditions. To evaluate the multi-class classification, we used the F1 score instead of the AUC score. The rest of the experimental setup was identical to that described in Section 4.1.2.
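For reference, a multi-class F1 score can be computed per class and then averaged. The sketch below is only illustrative: it assumes macro averaging (the averaging scheme is not specified above), and the class labels and predictions are hypothetical values rather than experimental data.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return {label: F1} using one-vs-rest counts for each class."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (macro averaging, an assumption)."""
    per_class = f1_per_class(y_true, y_pred, labels)
    return sum(per_class.values()) / len(labels)


# Hypothetical labels for the three bearing states (normal, outer fault, inner fault).
LABELS = ["normal", "outer", "inner"]
y_true = ["normal", "normal", "outer", "outer", "inner", "inner"]
y_pred = ["normal", "outer", "outer", "outer", "inner", "normal"]

print(f1_per_class(y_true, y_pred, LABELS))
print(round(macro_f1(y_true, y_pred, LABELS), 4))
```

Macro averaging weights each class equally, which is one reasonable choice when fault classes are as important as the normal class; other averaging schemes would give different values.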
As in the first case study, the experimental results show that the proposed method outperforms existing DA methods in terms of F1 and accuracy scores across most target domains. This highlights the effectiveness of the proposed method not only in tool wear detection but also in bearing fault detection.
However, it is worth noting that in domain A, none of the DA methods enhanced diagnostic capability. In our experiments, the traditional machine learning (ML) method achieved F1 scores of 0.555, 0.934, 0.941, and 0.773 for target domains A, B, C, and D, respectively. Although the performance in domain A degraded markedly to 0.555, the effectiveness of the proposed method appears to have been limited by the distinct characteristics of the failure data in domain A. While no improvement was achieved in domain A, our method improved performance in domains B to D, which demonstrates its effectiveness for machinery fault diagnosis under varying operating conditions.

Conclusions
This article presents a weighted domain adaptation method that leverages a graph-structured dataset representation to achieve robust accuracy under varying operating conditions. Specifically, the collection of datasets is represented as a graph. Subsequently, the transferability of each candidate source dataset to the target domain is estimated from this graph, and the estimated scores are used as weights to construct a voting ensemble model. The empirical results from two case studies in rotor machinery fault diagnosis demonstrate that the proposed method outperforms existing domain adaptation techniques in terms of accuracy. These findings underscore the effectiveness of the proposed approach for fault diagnosis under diverse operating conditions.
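To make the weighting idea concrete, the following minimal sketch combines the class probabilities of base classifiers using normalized transferability scores. It assumes soft voting and normalization of the scores to sum to one; the function name `weighted_vote`, the source names, and all numbers are hypothetical and are not taken from the experiments.

```python
def weighted_vote(class_probs_by_source, transferability):
    """Soft-vote over base classifiers, weighted by normalized transferability scores."""
    total = sum(transferability.values())
    if total <= 0:
        raise ValueError("at least one transferability score must be positive")
    classes = list(next(iter(class_probs_by_source.values())))
    combined = {c: 0.0 for c in classes}
    for source, probs in class_probs_by_source.items():
        weight = transferability[source] / total  # normalize to sum to one
        for c in classes:
            combined[c] += weight * probs[c]
    label = max(combined, key=combined.get)
    return label, combined


# Hypothetical class probabilities from base classifiers trained on sources B, C, D.
probs = {
    "B": {"normal": 0.7, "fault": 0.3},
    "C": {"normal": 0.4, "fault": 0.6},
    "D": {"normal": 0.3, "fault": 0.7},
}
# Hypothetical transferability scores: source B is judged closest to the target.
scores = {"B": 0.8, "C": 0.1, "D": 0.1}

label, combined = weighted_vote(probs, scores)
uniform_label, _ = weighted_vote(probs, {"B": 1.0, "C": 1.0, "D": 1.0})
print(label, uniform_label)
```

In this toy case, the weighted vote follows the most transferable source, whereas treating all sources uniformly yields the opposite decision, which illustrates why per-source weights can matter.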
While our research holds promise, it has several limitations that deserve attention. First, we assume that source datasets are labeled and contain sufficient data to train diagnostic models effectively. In real-world scenarios, however, source datasets may fall short in both quantity and the availability of labeled examples. Second, domain adaptation does not always guarantee better accuracy from the given source datasets. For example, if the only available source dataset is harmful to learning a diagnostic model for the target domain, the result may be poor no matter how well the weights are calculated. In such cases, the influence of domain adaptation itself should be kept small. Lastly, our research focuses exclusively on classification tasks in machinery fault diagnosis.
As for future research, one potential direction is to incorporate a measure of the quality of the source datasets themselves when constructing the dataset graph. Such an evaluation could help identify the suitability of source data for domain adaptation. To account for cases where domain adaptation is unlikely to be effective, we suggest setting all weights to zero when no source dataset surpasses a predefined criterion. Analyzing the pattern of estimated indices could provide valuable insights into the feasibility of domain adaptation in specific scenarios. Alternatively, outlier scores could be computed for nodes in the constructed graph to assess whether domain adaptation is likely to be beneficial. Finally, broadening the approach to regression problems could enable more comprehensive and versatile fault diagnosis solutions, extending its applicability beyond classification tasks. These research directions aim to address the identified limitations and further enhance the practicality and effectiveness of domain adaptation in a range of real-world scenarios and domains.
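The suggested zero-weight rule could be sketched as follows. This is only an illustration under stated assumptions: the scores are non-negative, "surpasses" is read as strictly greater than the threshold, and the function name `gate_weights` and the threshold value are hypothetical.

```python
def gate_weights(transferability, threshold):
    """Return normalized weights, or all zeros if no score exceeds the threshold."""
    if not any(score > threshold for score in transferability.values()):
        # No source is judged transferable enough: switch domain adaptation off.
        return {name: 0.0 for name in transferability}
    total = sum(transferability.values())
    return {name: score / total for name, score in transferability.items()}


# Hypothetical scores: in the first case no source passes the criterion of 0.5.
print(gate_weights({"B": 0.2, "C": 0.1}, threshold=0.5))
print(gate_weights({"B": 0.6, "C": 0.2}, threshold=0.5))
```

When all weights are zero, a caller would need a fallback, for example a model trained without domain adaptation; that fallback is likewise an assumption here.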

Figure 1. Proposed framework for fault diagnosis under varying operating conditions.
manner tailored to the requirements of machinery fault diagnosis under varying operating conditions.
Sensors 2024, 24

Figure 2. Concept of the graph-structured representation.

Figure 3. The concept and example of the correlation structure.

Figure 4. Four attributes of an edge feature.

Figure 5. An example of the constructed graph.

Figure 6. Procedure of estimating transferability scores by learning aggregation functions.

4: Compute the classifier weight: $\beta_p = (1 - e_p)/e_p$.
5: Update the weights. For misclassified source domain instances: $w^T_k = w^T_k \times \beta$. For misclassified target domain instances: $w^S_k = w^S_k \times \beta_p$.
6: Repeat the process for $K$ iterations.
Output: The final classifier $WC_i = \sum_{p=1}^{P} \left( \log \beta_p^{-1} \times h_p(x) \right)$.
After K iterations, the final model is a combination of the K classifiers.

3: Get the predicted values by each classifier $\{TM_i\}$.
Final predictions on the test dataset: $\{\hat{y}_k\}_{k=1}^{T}$.

Figure 7. The procedure of developing an ensemble model with the DA method.

Figure 9. Performance comparison in the AUC score of DA methods across different target domains in tool wear detection.

Table 2. The AUC score results of tool wear detection of the proposed model and baseline models by the target domain (an average of five repetitions). Method abbreviations: TrAB = TrAdaBoost; T-ML = traditional machine learning without domain adaptation. The proposed method performs best overall.

Figure 10. The test rig for the PU bearing datasets.

Figure 11. An example of the data distribution discrepancy between different domains.

Table 3. The accuracy score results of tool wear detection of the proposed model and baseline models by the target domain (an average of five repetitions).

Table 4. Operational conditions of each domain.