Article

Domain Adaptation Using a Three-Way Decision Improves the Identification of Autism Patients from Multisite fMRI Data

1 School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
2 Engineering Research Center of Intelligent Technology and Educational Application, Ministry of Education, Beijing 100875, China
* Author to whom correspondence should be addressed.
Brain Sci. 2021, 11(5), 603; https://doi.org/10.3390/brainsci11050603
Submission received: 23 March 2021 / Revised: 29 April 2021 / Accepted: 4 May 2021 / Published: 8 May 2021
(This article belongs to the Section Neurotechnology and Neuroimaging)

Abstract

Machine learning methods are widely used in autism spectrum disorder (ASD) diagnosis. Due to the lack of labelled ASD data, multisite data are often pooled together to expand the sample size. However, the heterogeneity that exists among different sites degrades the performance of machine learning models. Herein, three-way decision theory was introduced into unsupervised domain adaptation for the first time and applied to optimize the pseudolabels of the target domain/site from functional magnetic resonance imaging (fMRI) features related to ASD patients. The experimental results using multisite fMRI data show that our method not only narrows the gap in sample distribution among domains but is also superior to state-of-the-art domain adaptation methods in ASD recognition. Specifically, the ASD recognition accuracy of the proposed method is improved on all six tasks, reaching 70.80%, 75.41%, 69.91%, 72.13%, 71.01% and 68.85%, respectively, compared with the existing methods.

1. Introduction

Autism spectrum disorder (ASD) is a common neurodevelopmental disease originating in infancy [1,2,3,4,5,6]. According to a recent study, one in 45 children in the world has autism, and the number of affected children has increased by 78% in the last decade [7]. Some symptoms of ASD even appear in young children by the age of two years [8]. Therefore, the early diagnosis of and intervention in ASD have received great attention in recent years [9,10]. Researchers have applied machine learning methods to identify biomarkers from resting-state functional magnetic resonance imaging (rs-fMRI) data to assist in diagnosing ASD [11,12,13].
Machine learning methods have demonstrated their effectiveness under the assumption that sufficient training data and test data are drawn from the same distribution [14,15]. However, this assumption is rarely satisfied in practice, which leads to poor generalization when a model trained on one dataset is applied to a new dataset. First, clinical neuroimaging datasets are often small because acquisition is expensive and labelling is time-consuming. Therefore, multisite rs-fMRI data are often combined to expand the dataset in research such as ASD diagnosis, which leads to the second problem: samples from different scanners or acquisition protocols do not follow the same distribution in most cases [16,17].
The fMRI samples from different sites are also called domains in the machine learning community. In addition to the distribution difference between the training set (source domain) and the test set (target domain), the scarcity of labelled samples is another challenge for ASD recognition. Previous studies have investigated domain adaptation approaches to overcome site-to-site heterogeneity [18]. Many studies have successfully applied domain adaptation to object recognition [19], activity recognition [20], speech recognition [21], text classification [22] and autism recognition [7]. The main goal of domain adaptation is to reduce the difference in data distribution between the source domain and the target domain and then train a robust classifier for the target domain by reusing the labelled data in the source domain.
At present, research on domain adaptation mainly focuses on three approaches, namely, instance adaptation, feature adaptation and classifier adaptation. Specifically, the instance-based domain adaptation method reuses samples from the source domain according to a certain weighting rule. Instance adaptation has achieved good results by eliminating cross-domain differences [23]. However, this method must satisfy two strict assumptions: (1) the source domain and the target domain follow the same conditional distribution, and (2) some data in the source domain can be reused in the target domain by reweighting. The classifier-based domain adaptation method transfers knowledge from the source domain to the target domain by sharing parameters between the two domains [24,25]. Classifier transfer performs well when labelled samples are available. However, for ASD diagnosis from fMRI, the data distributions of different sites differ, and reliable labelled data are difficult to obtain.
Therefore, applying domain adaptation based on instances or classifiers is relatively difficult. In contrast, the feature-based domain adaptation method can learn the subspace geometrical structure [26,27,28,29] or align distributions [30,31,32]. This method aligns the marginal or conditional distributions of different domains in a principled dimensionality reduction procedure. Our work employs feature-based domain adaptation to eliminate the divergence in data distribution.
Currently, most feature-based domain adaptation research is devoted to the adaptation of the marginal distribution, conditional distribution, or both. For example, Long et al. [26] found that the marginal distribution and conditional distribution between domains are different in the real world, and better performance can be achieved if the two distributions are adapted simultaneously. Subsequently, some studies based on joint distribution adaptation have been proposed successively [33,34,35], and these works have greatly contributed to the development of domain adaptation.
It is worth noting that in order to obtain the pseudolabels of the target domain data, traditional methods usually apply the classifier trained on the source domain directly to the target domain data. However, these pseudolabels might contain errors due to the possible domain mismatch. Here, we propose a robust method using a three-way decision model derived from triangular fuzzy similarity. The proposed model roughly classifies the samples in the target domain into three regions, i.e., the positive region, the negative region and the boundary region. Then, the label propagation algorithm is used to optimize the labels and make secondary decisions on the boundary-region samples. The experiments demonstrate that our method can effectively improve the classification performance for automated ASD diagnosis.
The contributions of this paper are as follows:
  • A three-way decision model based on triangular fuzzy similarity is proposed to reduce the cost of target domain data prediction. To the best of the authors’ knowledge, this is the first time that a three-way decision model has been combined with a distribution adaptation method to reduce the distribution differences between domains. The proposed method extends the application of machine learning in the field of decision making.
  • Our method utilizes the label information from the source domain and the structural information from the target domain at the same time, which not only reduces the distribution differences between domains but also further improves the recognition ability of the target domain data.
  • Comprehensive experiments on the Autism Brain Imaging Data Exchange (ABIDE) dataset prove that our method is better than several state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 reviews the related work concisely. In Section 3, we elucidate the foundation of the proposed method. Our proposed method is illustrated in detail in Section 4. Then, the results and discussion are presented in Section 5 and Section 6, respectively. Finally, the paper is concluded in Section 7.

2. Related Work

Building maps between different domains has been a lasting challenge in machine learning, and domain adaptation has become a hot research topic in machine-learning-based disease diagnosis. In this paper, we propose transfer learning based on distribution adaptation and three-way decisions. To elaborate on the proposed method, we introduce the related work from the following three aspects in this section.

2.1. Distribution Adaptation

Distribution adaptation is one of the most commonly used methods in domain adaptation. It seeks a space transformation and eliminates data distribution differences between the source and target domains by explicitly minimizing a predefined distance in the transformed feature space. According to the nature of the data distribution, distribution adaptation can be divided into three categories: marginal distribution adaptation, conditional distribution adaptation and joint distribution adaptation.
Pan et al. [36] first proposed the transfer component analysis (TCA) method based on marginal distribution adaptation, which uses the maximum mean discrepancy (MMD) to measure the distance between domains and achieves feature dimensionality reduction. The method assumes that there is a mapping such that the marginal distributions of the mapped source and target domains are similar in the new space. The disadvantage of TCA is that the algorithm focuses only on reducing the cross-domain marginal distribution difference without considering the conditional distribution difference. Long et al. [37] proposed the transfer joint matching (TJM) method, which combines source domain sample selection and distribution adaptation to further eliminate cross-domain distribution differences.
Recently, in work based on conditional distribution adaptation, Wang et al. [38] proposed the stratified transfer learning (STL) method. Its main idea is to reduce the spatial dimension in a reproducing kernel Hilbert space (RKHS) by using intraclass similarity so as to eliminate distribution differences. However, in the real world, differences may exist in both the marginal and conditional distributions, and adjusting only one of them is insufficient to bridge domain differences. To solve this problem, Long et al. [26] proposed the joint distribution adaptation (JDA) method. The goal of JDA is to jointly adjust the marginal and conditional distributions in a principled dimensionality reduction process, and the representation in this common feature space reduces the domain differences significantly. Other work has extended JDA by adding structural consistency [29], domain-invariant clustering [30] and label propagation [31].
To provide supervised information for the target domain, JDA methods apply the source domain classifier to the target domain and take the classifier outputs as the pseudolabels of the target domain data. However, due to the different data distributions of the domains, directly using these inaccurate pseudolabels degrades the final model’s performance.
Considering the domain gap in both labels and samples, three-way decisions provide a novel way to transmit label information between domains and to reuse the intrinsic structural information of the target domain data, further improving model performance during domain adaptation.

2.2. Three-Way Decisions

As an effective extension of traditional rough sets, three-way decision (3WD) theory [39] has been widely applied to address uncertain, inaccurate and fuzzy problems, such as medical diagnosis [40], image processing [41], emotion analysis [42], etc. In simple terms, 3WD divides the universe of discourse into three disjoint parts, i.e., the positive region (Pos), the negative region (Neg), and the boundary region (Bnd), through a pair of upper and lower approximations. Acceptance and rejection decisions are made for the objects in Pos and Neg, respectively, while the objects in Bnd adopt the delay decision.
Strictly speaking, the current 3WD research can be divided according to whether it is based on decision-theoretic rough sets (DTRSs) [43]. For example, Zhang et al. [44] proposed a 3WD model for interval-valued DTRSs and gave a new decision cost function. Liu et al. [45] introduced intuitionistic fuzzy language DTRSs and 3WD models to obtain fuzzy information in uncertain languages. Agbodah [46] focused on the study of the DTRS loss function aggregation method in group decision making and utilized it to construct a 3WD model.
In addition, scholars have also conducted in-depth explorations on 3WD outside the DTRS framework. For example, Liang et al. [47] integrated the risk preference of decision makers into the decision-making process and proposed a 3WD model based on the TODIM (an acronym in Portuguese for interactive multicriteria decision making) method. Qian et al. [48] investigated three-way formal concept lattices of objects (properties) based on 3WD. Yang et al. [49] presented a 3WD model oriented to multigranularity space to adapt 3WD to intuitionistic fuzzy decisions.
From a broad perspective, 3WD can be classified as static or dynamic. Static 3WD includes related research based on the DTRS framework and fusion of other theories. Dynamic 3WD mainly addresses the problem of constantly changing data in time series and space, and its typical representative is the sequential 3WD model [50]. For example, Yang et al. [51] proposed a three-way calculation method for dynamic mixed data based on time and space. Zhang et al. [52] systematically investigated a new sequential 3WD model to balance autoencoder classification and reduce its misclassification cost. Liu et al. [53] combined 3WD and granular computing to construct a dynamic three-way recommendation model to reduce decision-making costs.
3WD theory has been widely used in many areas, such as emerging three-way formal concept analysis [54], three-way conflict analysis [55], three-way granular computing [56], three-way classification [57], three-way recommendation [58], and three-way clustering [59]. This paper will combine the idea of 3WD to improve the performance of heterogeneous ASD data diagnosis by reducing the difference in the data distributions between the source domain and target domain.

2.3. Application of Machine Learning in Identification of ASD Patients

In recent years, magnetic resonance imaging (MRI) has been widely used in clinical practice [60,61]. MRI can be divided into structural MRI (sMRI) and functional MRI (fMRI). As fMRI can measure the hemodynamic changes caused by the activity of brain neurons, it has been widely used in research on brain dysfunction. For example, Li et al. [62] proposed a 4D deep learning model for ASD recognition that can utilize both the temporal and spatial information of fMRI data. Riaz et al. [63] proposed an end-to-end deep learning method called DeepfMRI for accurately identifying patients with attention deficit hyperactivity disorder (ADHD) and achieved an accuracy of 73.1% on open datasets. To study the relationship between mild cognitive impairment (MCI) and small vessel disease (SVD), Diciotti et al. [64] applied the Stroop test to the rs-fMRI data of 67 MCI subjects and found that the regional homogeneity of rs-fMRI is significantly correlated with measures of cognitive deficit.
As ASD is a neurodevelopmental disorder, early diagnosis is very important for improving patients’ quality of life. In recent years, researchers have attempted to extract biomarkers representing ASD from fMRI data using machine learning methods, so as to provide an auxiliary diagnosis for clinicians. For example, Lu et al. [65] proposed a multi-kernel subspace clustering algorithm for identifying ASD patients, which retains a good clustering effect on high-dimensional network datasets. Leming et al. [66] trained a convolutional neural network for ASD recognition, and their experiments showed that deep learning models that distinguish ASD from normal controls focus broadly on temporal and cerebellar connections. However, the small size of fMRI datasets has limited the generalization of the above works [67].
To solve this problem, the Autism Brain Imaging Data Exchange (ABIDE), an international collaborative project, has collected data from over 1000 subjects and made the whole database publicly available. Based on the ABIDE database, many advanced machine learning models have been proposed for identifying ASD patients. For example, Eslami et al. [68] used an autoencoder and a single-layer perceptron to diagnose ASD and proposed a deep learning framework called ASD-DiagNet, which achieved a classification accuracy of 70.3%. Bi et al. [69] used randomized support vector machine (SVM) clusters to distinguish ASD patients from normal controls and identified a number of abnormal brain regions that contribute to ASD. Mladen et al. [70] selected 368 ASD patients and 449 normal controls from the ABIDE database, used the Fisher score for feature selection to quantitatively analyze the 817 subjects, and obtained a classification accuracy of 85.06%.

3. Preliminaries

We start with the definition of the problem and the terms, and introduce the notation used below. The source domain data, denoted as $X_s \in \mathbb{R}^{d \times n_s}$, are drawn from distribution $P_s(X_s)$, and the target domain data, denoted as $X_t \in \mathbb{R}^{d \times n_t}$, are drawn from distribution $P_t(X_t)$, where $d$ is the dimension of a data instance and $n_s$ and $n_t$ are the numbers of samples in the source and target domains, respectively.
Assume a labelled source domain $D_s = \{(x_i, y_i)\}_{i=1}^{n_s}$, where $x_i \in \mathbb{R}^d$, and an unlabelled target domain $D_t = \{x_j\}_{j=1}^{n_t}$, where $x_j \in \mathbb{R}^d$. We assume that their feature spaces and label spaces are the same, i.e., $X_s = X_t$ and $Y_s = Y_t$, but their marginal and conditional distributions are different, i.e., $P_s(X_s) \neq P_t(X_t)$ and $P_s(Y_s|X_s) \neq P_t(Y_t|X_t)$.
Domain adaptation methods often seek to reduce the distribution differences across domains by explicitly adapting both the marginal and conditional distributions between domains. To be specific, domain adaptation seeks to minimize the distance (Equation (1)):
$$D(D_s, D_t) \approx D(P_s(X_s), P_t(X_t)) + D(P_s(Y_s|X_s), P_t(Y_t|X_t))$$
where $D(P_s(X_s), P_t(X_t))$ and $D(P_s(Y_s|X_s), P_t(Y_t|X_t))$ are the marginal and conditional distribution distances between domains, respectively.
There are many metrics that can be used to estimate the distance between distributions, such as the Kullback–Leibler (KL) divergence. However, most of these metrics are parametric, and the distance is difficult to calculate. Therefore, Borgwardt et al. [71] proposed the MMD, a nonparametric distance metric that uses a kernel learning method to measure the distance between two distributions in an RKHS. The definition of the MMD is as follows:
Definition 1.
Given two random variables $X_s$ and $X_t$, their squared MMD distance is calculated as follows (Equation (2)):
$$\mathrm{Dist}(X_s, X_t) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi(x_i) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi(x_j) \right\|_{\mathcal{H}}^2$$
where $\mathcal{H}$ is a universal RKHS [72], and $\phi: X \to \mathcal{H}$.
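As an illustration, the empirical MMD of Equation (2) can be sketched in Python with the identity feature map $\phi(x) = x$ (a simplifying kernel choice of ours; the function name is likewise illustrative):

```python
import numpy as np

def mmd_squared(Xs, Xt):
    """Empirical squared MMD of Equation (2) under the identity
    feature map phi(x) = x (an illustrative simplification).
    Xs: (d, n_s) and Xt: (d, n_t); columns are samples."""
    diff = Xs.mean(axis=1) - Xt.mean(axis=1)  # difference of sample means
    return float(diff @ diff)                 # squared norm of the difference
```

With a universal kernel, as the definition requires, $\phi$ would map into an infinite-dimensional RKHS and the norm would be computed via the kernel trick; the linear map above only illustrates the "distance between means" structure.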
Next, we introduce the concepts of triangular fuzzy numbers and three-way decisions.
Definition 2.
[73]. Let $\tilde{t} = [t_L, t_M, t_T]$ be a triangular fuzzy number, where $t_L$ and $t_T$ denote the lower and upper bounds of $\tilde{t}$, respectively, and $t_M$ is the median of $\tilde{t}$. If $0 < t_L \le t_M \le t_T$ is satisfied, then $\tilde{t}$ is called a normal triangular fuzzy number. For any two triangular fuzzy numbers $\tilde{t} = [t_L, t_M, t_T]$ and $\tilde{k} = [k_L, k_M, k_T]$, the distance between them is as follows (Equation (3)):
$$d(\tilde{t}, \tilde{k}) = \sqrt{\frac{(t_L - k_L)^2 + (t_M - k_M)^2 + (t_T - k_T)^2}{3}}$$
In addition, the basic operations between $\tilde{t} = [t_L, t_M, t_T]$ and $\tilde{k} = [k_L, k_M, k_T]$ are as follows (Equation (4)):
$$\begin{aligned}
\tilde{t} + \tilde{k} &= [t_L + k_L,\; t_M + k_M,\; t_T + k_T] \\
\tilde{t} - \tilde{k} &= [t_L - k_L,\; t_M - k_M,\; t_T - k_T] \\
\tilde{t} \times \tilde{k} &= [t_L \times k_L,\; t_M \times k_M,\; t_T \times k_T]
\end{aligned}$$
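Equations (3) and (4) are straightforward componentwise operations; a minimal Python sketch (function names are our own) is:

```python
import numpy as np

def tf_distance(t, k):
    """Distance between triangular fuzzy numbers [tL, tM, tT] (Equation (3))."""
    t, k = np.asarray(t, float), np.asarray(k, float)
    return float(np.sqrt(np.sum((t - k) ** 2) / 3.0))

def tf_add(t, k):
    """Componentwise addition of Equation (4)."""
    return [ti + ki for ti, ki in zip(t, k)]

def tf_mul(t, k):
    """Componentwise multiplication of Equation (4)."""
    return [ti * ki for ti, ki in zip(t, k)]
```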
Definition 3.
[74]. Let $U$ be the universe of discourse and $X \subseteq U$. If thresholds $0 \le \beta < \alpha \le 1$ exist, then the positive region, negative region and boundary region are defined with thresholds $(\alpha, \beta)$ as follows (Equation (5)):
$$\begin{aligned}
\mathrm{Pos}_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X|[x]) \ge \alpha\} \\
\mathrm{Bnd}_{(\alpha,\beta)}(X) &= \{x \in U \mid \beta < \Pr(X|[x]) < \alpha\} \\
\mathrm{Neg}_{(\alpha,\beta)}(X) &= \{x \in U \mid \Pr(X|[x]) \le \beta\}
\end{aligned}$$
where $[x]$ is the equivalence class containing $x$, and $\Pr(X|[x])$ is the conditional probability.
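The threshold-based partition of Equation (5) can be sketched directly (the function name and example thresholds are ours):

```python
def three_way_partition(probs, alpha=0.7, beta=0.3):
    """Split objects into Pos/Bnd/Neg by conditional probability (Equation (5)).
    probs: dict mapping object id -> Pr(X | [x]); thresholds 0 <= beta < alpha <= 1."""
    pos = {x for x, p in probs.items() if p >= alpha}
    neg = {x for x, p in probs.items() if p <= beta}
    bnd = {x for x, p in probs.items() if beta < p < alpha}
    return pos, bnd, neg
```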

4. Methods

4.1. Joint Distribution Adaptation

Domain adaptation seeks an invariant feature representation for the source domain and the target domain in a low-dimensional ($k < d$) space. Let $W \in \mathbb{R}^{d \times k}$ be the linear transformation matrix and $Z_s = W^T X_s$ and $Z_t = W^T X_t$ be the projected variables from the source and target data, respectively. We use the nonparametric metric MMD, which computes the distance between the sample means of the source and target data in the $k$-dimensional embeddings, to estimate the difference between distributions. Specifically, according to Equation (2), $D(P_s(X_s), P_t(X_t))$ can be expressed as (Equation (6)):
$$D(P_s(X_s), P_t(X_t)) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} W^T x_i - \frac{1}{n_t} \sum_{j=1}^{n_t} W^T x_j \right\|^2$$
By further using the matrix transformation rule and regularization and then minimizing the marginal distribution distance, Equation (6) can be formalized as follows (Equation (7)):
$$D(P_s(X_s), P_t(X_t)) = \mathrm{tr}(W^T X M_0 X^T W)$$
where X represents the input matrix containing X s and X t . In addition, following [26], M 0 is the MMD matrix and can be constructed as follows (Equation (8)):
$$(M_0)_{ij} = \begin{cases} \dfrac{1}{n_s^2}, & x_i, x_j \in D_s \\[4pt] \dfrac{1}{n_t^2}, & x_i, x_j \in D_t \\[4pt] -\dfrac{1}{n_s n_t}, & \text{otherwise} \end{cases}$$
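The three cases of Equation (8) are exactly the outer product of an indicator-style vector with itself, which gives a compact construction (a sketch; the function name is ours):

```python
import numpy as np

def mmd_matrix_m0(n_s, n_t):
    """Marginal MMD matrix M0 of Equation (8): source block 1/n_s^2,
    target block 1/n_t^2, cross blocks -1/(n_s * n_t)."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)  # (e e^T)_{ij} reproduces the three cases
```

Note that the rows of $M_0$ sum to zero, so the matrix only measures the mean discrepancy, not absolute location.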
However, the label information of the domain data is not yet considered, which leads to insufficient discriminability of the adapted features; therefore, adapting the marginal distribution alone is insufficient. To solve this problem, we next adjust the conditional distribution between domains.
Since no label information is available in the target domain, we cannot directly estimate the conditional distribution $P_t(Y_t|X_t)$ of the target domain. Here, based on the concept of sufficient statistics, we can replace $P_t(Y_t|X_t)$ and $P_s(Y_s|X_s)$ with the class-conditional distributions $P_t(X_t|Y_t)$ and $P_s(X_s|Y_s)$, respectively. However, obtaining target domain label information through source domain data while reducing the distribution difference between domains is a challenging problem in unsupervised domain adaptation. In Section 4.2, we introduce how to obtain the label information of the target domain data so as to obtain the above class-conditional distributions. With these labels, we can match the class-conditional distributions of the two domains. Similar to the calculation of the marginal distribution, we use a modified MMD formula to estimate the conditional distribution distance $D(P_s(Y_s|X_s), P_t(Y_t|X_t))$ between domains, represented as (Equation (9)):
$$D(P_s(Y_s|X_s), P_t(Y_t|X_t)) = \sum_{c=1}^{C} \left\| \frac{1}{n_s^c} \sum_{x_i \in D_s^{(c)}} W^T x_i - \frac{1}{n_t^c} \sum_{x_j \in D_t^{(c)}} W^T x_j \right\|^2$$
where $c \in \{1, 2, \ldots, C\}$ is the class label, $D_s^{(c)}$ and $D_t^{(c)}$ are the samples belonging to class $c$ in the source and target domains, respectively, and $n_s^c$ and $n_t^c$ are the numbers of samples belonging to class $c$ in the source and target domains, respectively.
Similar to the marginal distribution, we formalize Equation (9) as Equation (10) by using matrix transformation rules and regularization:
$$D(P_s(Y_s|X_s), P_t(Y_t|X_t)) = \mathrm{tr}(W^T X M_c X^T W)$$
where the MMD matrix $M_c$ containing class labels is constructed as follows (Equation (11)):
$$(M_c)_{ij} = \begin{cases} \dfrac{1}{(n_s^c)^2}, & x_i, x_j \in D_s^{(c)} \\[4pt] \dfrac{1}{(n_t^c)^2}, & x_i, x_j \in D_t^{(c)} \\[4pt] -\dfrac{1}{n_s^c n_t^c}, & x_i \in D_s^{(c)}, x_j \in D_t^{(c)} \text{ or } x_i \in D_t^{(c)}, x_j \in D_s^{(c)} \\[4pt] 0, & \text{otherwise} \end{cases}$$
In order to reduce both the marginal and conditional distribution differences between domains, we incorporate Equations (7) and (10) into one objective function (Equation (12)):
$$\min_{W} \; \sum_{c=0}^{C} \mathrm{tr}(W^T X M_c X^T W) + \lambda \|W\|_F^2 \quad \text{s.t.} \quad W^T X H X^T W = I$$
where the first term adapts both the marginal and conditional distributions, and the second term is the regularization term. $\|\cdot\|_F^2$ is the squared Frobenius norm, and $\lambda$ is the regularization parameter. As noted in [29], adding the constraint in Function (12) preserves the inner properties of the original data, which introduces additional discriminative ability into the learned model. In addition, in Function (12), $X$ represents the input matrix containing $X_s$ and $X_t$; $I \in \mathbb{R}^{(n_s+n_t) \times (n_s+n_t)}$ denotes the identity matrix; and $H = I - \frac{1}{n_s+n_t}\mathbf{1}$ is the centering matrix, where $\mathbf{1}$ is the $(n_s+n_t) \times (n_s+n_t)$ matrix of ones.
To obtain the transformation matrix $W$, we form the Lagrangian of Function (12), written as (Equation (13)):
$$L = \sum_{c=0}^{C} \mathrm{tr}(W^T X M_c X^T W) + \lambda \|W\|_F^2 + \mathrm{tr}\left((I - W^T X H X^T W)\Phi\right)$$
where $\Phi = \mathrm{diag}(\phi_1, \phi_2, \ldots, \phi_k)$ is the Lagrange multiplier. Setting $\frac{\partial L}{\partial W} = 0$, the original optimization problem is transformed into the following eigendecomposition problem (Equation (14)):
$$\left(\sum_{c=0}^{C} X M_c X^T + \lambda I\right) W = X H X^T W \Phi$$
The transformation matrix W is the solution to Equation (14) and thus builds the bridge between the source and target domains in the new expression Z = ( Z s , Z t ) .
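A minimal numerical sketch of Equation (14): the generalized eigenproblem is reduced to a standard one via a matrix inverse, and the $k$ eigenvectors with the smallest eigenvalues are kept. The function name and the small ridge `eps` (added so the right-hand-side matrix is invertible) are our own choices; a production implementation would use a generalized symmetric eigensolver instead.

```python
import numpy as np

def solve_transform(X, M_sum, lam=1.0, k=2, eps=1e-6):
    """Sketch of Equation (14): (X M X^T + lam I) W = X H X^T W Phi.
    X: (d, n) matrix stacking source and target samples as columns;
    M_sum: the summed MMD matrices sum_c M_c."""
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    A = X @ M_sum @ X.T + lam * np.eye(d)              # adaptation + regularizer
    B = X @ H @ X.T + eps * np.eye(d)                  # ridge keeps B invertible
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))  # standard eigenproblem
    order = np.argsort(vals.real)[:k]                  # smallest eigenvalues first
    return vecs[:, order].real                         # W in R^{d x k}
```

The columns of the returned $W$ span the adapted subspace; projecting gives $Z_s = W^T X_s$ and $Z_t = W^T X_t$.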

4.2. Three-Way Decision Model Based on Triangular Fuzzy Similarity

In practice, the conditional distribution cannot be obtained directly because there is no label information in the target domain. To solve this problem, we first give the concept of the information difference degree and apply it to the construction of triangular fuzzy numbers and the calculation of the corresponding triangular fuzzy similarity. Then, according to the triangular fuzzy similarity between objects in the target domain, the target domain is divided into positive, negative and boundary regions with structural information.
For convenience of description, suppose that both the domain of discourse $U$ and attribute set $A$ are nonempty finite sets, that $x_i$ is an object in $U$, and that $a_j$ is an attribute in $A$, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$.

4.2.1. Information Difference Degree and Triangular Fuzzy Similarity

Definition 4.
Let $U = \{x_1, x_2, \ldots, x_n\}$ be the domain of discourse, $A = \{a_1, a_2, \ldots, a_m\}$ be the set of attributes, and $x_{ij}$ be the value of object $x_i$ under attribute $a_j$. For $a_j, a_k \in A$, the degree of information difference of object $x_i$ is as follows (Equation (15)):
$$ID_i(a_j, a_k) = 1 - \exp\left(\log_2\left(1 - \frac{|x_{ij} - x_{ik}|}{x_{ij} + x_{ik}}\right)\right)$$
Remark 1.
(1) 
The greater the value of $ID_i(a_j, a_k)$, the greater the degree of information difference of object $x_i$ under $a_j$ and $a_k$. When object $x_i$ has the same description $x_{ij} = x_{ik} = 0$ for $a_j$ and $a_k$, the argument of the log function has a denominator of 0, i.e., $x_{ij} + x_{ik} = 0$. In this case, since $|x_{ij} - x_{ik}| = 0$, the final degree of information difference $ID_i(a_j, a_k)$ is independent of the value of $x_{ij} + x_{ik}$. For the reasonableness of the calculation, let $\frac{|x_{ij} - x_{ik}|}{x_{ij} + x_{ik}} = 0$.
(2) 
For convenience of representation, the information difference matrix of object $x_i$ can be expressed as follows (Equation (16)):
$$ID_i = \begin{bmatrix}
ID_i^{11} & ID_i^{21} & \cdots & ID_i^{(m-1)1} & ID_i^{m1} \\
ID_i^{12} & ID_i^{22} & \cdots & ID_i^{(m-1)2} & ID_i^{m2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
ID_i^{1(m-1)} & ID_i^{2(m-1)} & \cdots & ID_i^{(m-1)(m-1)} & ID_i^{m(m-1)} \\
ID_i^{1m} & ID_i^{2m} & \cdots & ID_i^{(m-1)m} & ID_i^{mm}
\end{bmatrix}$$
where the rows and columns are indexed by the attributes $a_1, a_2, \ldots, a_m$, and $ID_i^{jk} = ID_i(a_j, a_k)$ represents the degree of information difference of object $x_i$ under attributes $a_j$ and $a_k$.
Theorem 1.
According to Definition 4, we have the following conclusions:
(1) 
Boundedness: $0 \le ID_i(a_j, a_k) \le 1$.
(2) 
Monotonicity: the degree of information difference of $x_i$ with respect to $a_j$ and $a_k$ increases monotonically as the difference increases.
(3) 
Symmetry: $ID_i(a_j, a_k) = ID_i(a_k, a_j)$.
Proof. 
Properties (2) and (3) are easily proven by Definition 4.
(1)
According to Definition 4, for $a_j, a_k \in A$ and $x_i \in U$, when the description of $x_i$ under $a_j$ and $a_k$ takes the two extreme cases, namely, $x_{ij} = 0$ and $x_{ik} = 1$ or $x_{ij} = 1$ and $x_{ik} = 0$, we obtain $|x_{ij} - x_{ik}| = 1$, and the information difference reaches its maximum at this time, $ID_i(a_j, a_k) = 1$. □
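As an illustration, Equation (15) and the matrix of Equation (16) can be sketched in Python. We assume nonnegative attribute values, the reading $ID = 1 - \exp(\log_2(1 - r))$ with $r = |x_{ij} - x_{ik}|/(x_{ij} + x_{ik})$, and the convention $r = 0$ when $x_{ij} + x_{ik} = 0$ from Remark 1; all function names are ours.

```python
import numpy as np

def info_diff(xij, xik):
    """Information difference degree (Equation (15), as reconstructed here).
    Assumes nonnegative attribute values."""
    if xij + xik == 0:
        r = 0.0                       # convention of Remark 1
    else:
        r = abs(xij - xik) / (xij + xik)
    if r >= 1.0:
        return 1.0                    # log argument 0 -> ID attains its maximum
    return 1.0 - np.exp(np.log2(1.0 - r))

def info_diff_matrix(x):
    """m x m information difference matrix of Equation (16) for one object."""
    m = len(x)
    return np.array([[info_diff(x[j], x[k]) for k in range(m)] for j in range(m)])
```

The assertions of Theorem 1 (boundedness, monotonicity, symmetry) can be checked numerically against this sketch.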
Definition 5.
Let $U$ be the domain of discourse. The triangular fuzzy number of $x_i$ under attribute set $A$ is $\tilde{x}_i = [x_i^L, x_i^M, x_i^T]$, where $x_i^L = \min\{ID_i^{jk}\}$, $x_i^T = \max\{ID_i^{jk}\}$, and $x_i^M$ is the value $ID_i^{jk}$ with the largest count $|ID_i^{jk}|$, where $|ID_i^{jk}|$ denotes the number of occurrences of the information difference value $ID_i^{jk}$. Then, the degree of triangular fuzzy similarity between $\tilde{x}_i$ and $\tilde{x}_k$ is as follows (Equation (17)):
$$\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) = 1 - d(\tilde{x}_i, \tilde{x}_k) = 1 - \sqrt{\frac{(x_i^L - x_k^L)^2 + (x_i^M - x_k^M)^2 + (x_i^T - x_k^T)^2}{3}}$$
Theorem 2.
The degree of triangular fuzzy similarity satisfies the following properties:
(1) 
$\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) = \tilde{S}_{TF}(\tilde{x}_k, \tilde{x}_i)$.
(2) 
$\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) = 1$ if $\tilde{x}_i = \tilde{x}_k$, and $\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) = 0$ if $\tilde{x}_i = [0, 0, 0]$ and $\tilde{x}_k = [1, 1, 1]$, or $\tilde{x}_i = [1, 1, 1]$ and $\tilde{x}_k = [0, 0, 0]$.
Proof. 
According to Definition 5, (1) obviously holds.
(2)
Since $0 \le \tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) \le 1$, when $\tilde{x}_i = \tilde{x}_k$, i.e., $x_i^L = x_k^L$, $x_i^M = x_k^M$, $x_i^T = x_k^T$, we have $d(\tilde{x}_i, \tilde{x}_k) = 0$, so $\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) = 1$. Similarly, since $x_i^L, x_i^M, x_i^T \in [0, 1]$ and $x_k^L, x_k^M, x_k^T \in [0, 1]$, $d(\tilde{x}_i, \tilde{x}_k) \in [0, 1]$. When $\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) = 0$, $d(\tilde{x}_i, \tilde{x}_k) = 1$. In this case, $\tilde{x}_i = [0, 0, 0]$ and $\tilde{x}_k = [1, 1, 1]$, or $\tilde{x}_i = [1, 1, 1]$ and $\tilde{x}_k = [0, 0, 0]$. □
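Definition 5 and Equation (17) can be sketched as follows. We read $x^M$ as the most frequent (modal) information difference value, which is one plausible interpretation of "the value with the largest count"; function names are ours.

```python
import numpy as np
from collections import Counter

def tf_number(id_values):
    """Triangular fuzzy number [xL, xM, xT] of Definition 5: the minimum,
    the modal value (most frequent, as read here) and the maximum of the
    information difference values of one object."""
    vals = list(id_values)
    xL, xT = min(vals), max(vals)
    xM = Counter(vals).most_common(1)[0][0]   # modal value
    return [xL, xM, xT]

def tf_similarity(ti, tk):
    """Triangular fuzzy similarity of Equation (17): 1 - distance."""
    ti, tk = np.asarray(ti, float), np.asarray(tk, float)
    return 1.0 - float(np.sqrt(np.sum((ti - tk) ** 2) / 3.0))
```

Note that the mode always lies between the minimum and maximum, so the resulting triple respects the ordering $x^L \le x^M \le x^T$ required of a triangular fuzzy number.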

4.2.2. Construction of the 3WD Model

Definition 6.
Let $U$ be the universe and $A$ be the set of attributes. The triangular fuzzy similarity between any objects $x_i$ and $x_k$ in $U$ is $\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k)$. If there is a threshold $\delta$, then the $\delta$-level classes of $x \in U$ with respect to $\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k)$ are defined as follows (Equation (18)):
$$[\tilde{S}_{TF}^{\delta}]_s = \{x \in U \mid \tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) > \delta\}, \qquad [\tilde{S}_{TF}^{\delta}]_g = \{x \in U \mid \tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) \le \delta\}$$
where $[\tilde{S}_{TF}^{\delta}]_s$ and $[\tilde{S}_{TF}^{\delta}]_g$ are the triangular fuzzy similarity classes of the positive and negative fields, respectively. Specifically, the objects in $[\tilde{S}_{TF}^{\delta}]_s$ have the smallest degree of information difference and the largest triangular fuzzy similarity at the $\delta$-level, while the objects in $[\tilde{S}_{TF}^{\delta}]_g$ are the opposite.
Suppose $X \subseteq U$ is a given goal concept and $\Omega = \{[\tilde{S}_{TF}^{\delta}]_s, [\tilde{S}_{TF}^{\delta}]_g\}$ is the set of states, representing that object $x$ lies in the $\delta$-level positive similarity domain $[\tilde{S}_{TF}^{\delta}]_s$ or in the $\delta$-level negative similarity domain $[\tilde{S}_{TF}^{\delta}]_g$. $\Gamma = \{a_P, a_B, a_N\}$ is the set of actions, where $a_P$ denotes acceptance, $a_B$ delay, and $a_N$ rejection. Following reference [75], the losses incurred by each action in each state are shown in Table 1.
When $x \in [\tilde{S}_{TF}^{\delta}]_s$, $\lambda_{PP}$, $\lambda_{BP}$ and $\lambda_{NP}$ represent the losses of the acceptance, delay and rejection decisions, respectively. Analogously, $\lambda_{PN}$, $\lambda_{BN}$ and $\lambda_{NN}$ represent the corresponding decision losses when $x \in [\tilde{S}_{TF}^{\delta}]_g$. Without loss of generality, when $x \in [\tilde{S}_{TF}^{\delta}]_s$, we assume that the cost of a correct acceptance is less than that of a delay decision, which in turn is less than that of a wrong rejection, namely $\lambda_{PP} < \lambda_{BP} < \lambda_{NP}$. Similarly, for misclassification we have $\lambda_{NN} < \lambda_{BN} < \lambda_{PN}$. Therefore, the expected losses $R(a_\bullet \mid x)$ ($\bullet \in \{P, B, N\}$) of object $x$ under the three decision actions are as follows (Equation (19)):
$$\begin{aligned} R(a_P \mid x) &= \lambda_{PP}\Pr([\tilde{S}_{TF}^{\delta}]_s \mid x) + \lambda_{PN}\Pr([\tilde{S}_{TF}^{\delta}]_g \mid x),\\ R(a_B \mid x) &= \lambda_{BP}\Pr([\tilde{S}_{TF}^{\delta}]_s \mid x) + \lambda_{BN}\Pr([\tilde{S}_{TF}^{\delta}]_g \mid x),\\ R(a_N \mid x) &= \lambda_{NP}\Pr([\tilde{S}_{TF}^{\delta}]_s \mid x) + \lambda_{NN}\Pr([\tilde{S}_{TF}^{\delta}]_g \mid x) \end{aligned} \tag{19}$$
where $\Pr([\tilde{S}_{TF}^{\delta}]_s \mid x) = P(\tilde{S}_{TF}^{\delta} \mid x)$ and $\Pr([\tilde{S}_{TF}^{\delta}]_g \mid x) = 1 - P(\tilde{S}_{TF}^{\delta} \mid x)$ are the probabilities that object $x$ belongs to the $\delta$-level positive or negative similarity domain, respectively. By introducing Bayesian minimum-risk decision theory, we have (Equation (20)):
where, for brevity, $p$ denotes $P(\tilde{S}_{TF}^{\delta} \mid x)$:
$$\begin{aligned} (P)\quad & \lambda_{PP}\,p + \lambda_{PN}(1-p) \le \lambda_{BP}\,p + \lambda_{BN}(1-p) \ \text{and}\\ & \lambda_{PP}\,p + \lambda_{PN}(1-p) \le \lambda_{NP}\,p + \lambda_{NN}(1-p),\\ (B)\quad & \lambda_{BP}\,p + \lambda_{BN}(1-p) \le \lambda_{PP}\,p + \lambda_{PN}(1-p) \ \text{and}\\ & \lambda_{BP}\,p + \lambda_{BN}(1-p) \le \lambda_{NP}\,p + \lambda_{NN}(1-p),\\ (N)\quad & \lambda_{NP}\,p + \lambda_{NN}(1-p) \le \lambda_{PP}\,p + \lambda_{PN}(1-p) \ \text{and}\\ & \lambda_{NP}\,p + \lambda_{NN}(1-p) \le \lambda_{BP}\,p + \lambda_{BN}(1-p) \end{aligned} \tag{20}$$
Furthermore, from Equations (19) and (20), we obtain (Equation (21)):
$$\begin{aligned} (P)\ &\text{If } P(\tilde{S}_{TF}^{\delta} \mid x) \ge \alpha, \text{ then } x \in Pos(X),\\ (B)\ &\text{If } \beta < P(\tilde{S}_{TF}^{\delta} \mid x) < \alpha, \text{ then } x \in Bnd(X),\\ (N)\ &\text{If } P(\tilde{S}_{TF}^{\delta} \mid x) \le \beta, \text{ then } x \in Neg(X) \end{aligned} \tag{21}$$
where (Equation (22))
$$\alpha = \frac{\lambda_{PN} - \lambda_{BN}}{(\lambda_{PN} - \lambda_{BN}) + (\lambda_{BP} - \lambda_{PP})}, \qquad \beta = \frac{\lambda_{BN} - \lambda_{NN}}{(\lambda_{BN} - \lambda_{NN}) + (\lambda_{NP} - \lambda_{BP})}. \tag{22}$$
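The thresholds of Equation (22) and the resulting three-way rule of Equation (21) can be illustrated with a small sketch; the loss values below are arbitrary examples chosen only to satisfy $\lambda_{PP} < \lambda_{BP} < \lambda_{NP}$ and $\lambda_{NN} < \lambda_{BN} < \lambda_{PN}$:

```python
def decision_thresholds(lam_PP, lam_BP, lam_NP, lam_NN, lam_BN, lam_PN):
    """Compute alpha and beta from the loss function (Equation (22)).

    Losses are assumed to satisfy lam_PP < lam_BP < lam_NP and
    lam_NN < lam_BN < lam_PN, which guarantees alpha > beta.
    """
    alpha = (lam_PN - lam_BN) / ((lam_PN - lam_BN) + (lam_BP - lam_PP))
    beta = (lam_BN - lam_NN) / ((lam_BN - lam_NN) + (lam_NP - lam_BP))
    return alpha, beta

def three_way_decide(p, alpha, beta):
    """Decision rules (P), (B), (N) of Equation (21) for probability p."""
    if p >= alpha:
        return "POS"   # accept
    if p <= beta:
        return "NEG"   # reject
    return "BND"       # delay the decision

alpha, beta = decision_thresholds(0, 2, 6, 0, 2, 8)
# alpha = (8-2)/((8-2)+(2-0)) = 0.75;  beta = (2-0)/((2-0)+(6-2)) ~ 0.333
print(three_way_decide(0.8, alpha, beta))  # POS
```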
In Algorithm 1, we first measure the degree of information difference of each object according to any two attributes in the target domain (lines 1 and 2). On this basis, the triangular fuzzy similarity of each object can be calculated (line 3). It is worth noting that triangular fuzzy similarity classes at different levels can be obtained by adjusting the threshold parameter $\delta$. The triangular fuzzy similarity is then regarded as the cost of the different classification decisions, and the final decision is made by comparison with the decision thresholds $\alpha$ and $\beta$ (line 4).
In addition, the higher the value of $\delta$, the greater the triangular fuzzy similarity required between objects. On the one hand, since $[\tilde{S}_{TF}^{\delta}]_s = \{x \in U \mid \tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) > \delta\}$ and $[\tilde{S}_{TF}^{\delta}]_g = \{x \in U \mid \tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k) \le \delta\}$, changing the parameter $\delta$ yields triangular fuzzy similarity classes of the target-domain objects at different levels. On the other hand, the sets $[\tilde{S}_{TF}^{\delta}]_s$ and $[\tilde{S}_{TF}^{\delta}]_g$ directly affect the values of the thresholds $\alpha$ and $\beta$. The impact of the thresholds on the final result is illustrated in detail in Section 6.1.
Algorithm 1 Three-way decision model based on the triangular fuzzy similarity
Input: target domain data $X_t$; thresholds $\delta$, $\alpha$ and $\beta$.
Output: positive region object set $Pos(X)$, negative region object set $Neg(X)$, boundary region object set $Bnd(X)$.
1: BEGIN
2: Calculate the degree of information difference $ID_i(a_j, a_k)$ of each object in the target domain under any two attributes according to Equation (15).
3: Calculate the triangular fuzzy similarity $\tilde{S}_{TF}(\tilde{x}_i, \tilde{x}_k)$ between any two objects in the target domain using Equation (17).
4: According to Equation (21), divide the target domain $X_t$ into the three regions.
5: END
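A minimal sketch of Algorithm 1's final step, assuming the pairwise triangular fuzzy similarity matrix has already been computed via Equations (15)-(17). The estimator used here for $P(\tilde{S}^{\delta} \mid x)$ (the fraction of other objects whose similarity to $x$ exceeds the $\delta$-level) is an illustrative assumption, not the paper's exact rule:

```python
import numpy as np

def three_way_partition(S, delta, alpha, beta):
    """Divide objects into positive, boundary and negative regions.

    S: symmetric (n x n) matrix of pairwise triangular fuzzy similarities.
    delta: similarity level; alpha, beta: decision thresholds (alpha > beta).
    """
    n = S.shape[0]
    pos, bnd, neg = [], [], []
    for i in range(n):
        others = np.delete(S[i], i)        # similarities to all other objects
        p = np.mean(others > delta)        # assumed estimate of membership
        if p >= alpha:
            pos.append(i)                  # rule (P): accept
        elif p <= beta:
            neg.append(i)                  # rule (N): reject
        else:
            bnd.append(i)                  # rule (B): defer to label propagation
    return pos, bnd, neg
```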

4.3. Adaptation Via Iterative Refinement

In this section, we integrate the methods presented in Section 4.1 and Section 4.2 and realize unsupervised domain adaptation to the conditional distribution of cross-domain data by introducing a label propagation algorithm. Specifically, we first obtain the initial pseudolabels $\hat{y}_T$ of the target domain according to joint distribution adaptation, then obtain the set of boundary objects of the target domain according to the three-way decision model proposed in Section 4.2 and treat these objects as the objects still to be classified. Once $\hat{y}_T$ and $Bnd(X)$ are obtained, we effectively have a semisupervised setting for the target-domain data. Following [29], we use the label propagation algorithm to discriminate the boundary objects in the target domain and update $\hat{y}_T$. Algorithm 2 summarizes the proposed method: at the initial stage, only the marginal distribution is adapted, while the subsequent iterations consider both the marginal and conditional distributions. In addition, the accuracy of the target-domain labels gradually improves as the cross-domain distribution differences decrease. In the following experiments, we show that the proposed method converges to the optimal solution in a finite number of iterations, which further demonstrates its effectiveness.
Algorithm 2 Our Proposed Model
Input: source domain data $X_s$, target domain data $X_t$, source-domain labels $y_S$; thresholds $\delta$, $\alpha$ and $\beta$
Output: $y_T$, the labels of the target domain data
1: BEGIN
2: Initialize $D_t^{(c)}$ as Null
3: while not converged do
4:   (1) $W \leftarrow$ distribution adaptation$(D_t^{(c)}, \hat{y}_T)$ via Equation (14); let $Z_s = W^T X_s$ and $Z_t = W^T X_t$
5:   (2) Assign $\hat{y}_T$ using a classifier trained on $Z_s$
6:   (3) Obtain $Bnd(X)$ via Algorithm 1
7:   (4) $(D_t^{(c)}, \hat{y}_T) \leftarrow$ execute the label propagation algorithm
8: end while
9: $y_T \leftarrow \hat{y}_T$
10: END
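One refinement step of the loop above can be sketched under simplifying assumptions: a linear SVM stands in for the source-trained classifier, scikit-learn's `LabelSpreading` stands in for the label propagation algorithm of [29], and the joint-distribution-adaptation projection of Equation (14) is omitted (identity map). The `bnd_mask` argument is the boolean boundary-region mask that Algorithm 1 would produce:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import LabelSpreading

def refine_pseudolabels(Xs, ys, Xt, bnd_mask):
    """One iteration of Section 4.3's refinement (simplified sketch).

    Xs, ys: labelled source-domain data; Xt: unlabelled target-domain data;
    bnd_mask: boolean mask of target objects in the boundary region Bnd(X).
    """
    # (2) initial pseudolabels for the target domain
    clf = SVC(kernel="linear").fit(Xs, ys)
    yt_hat = clf.predict(Xt)
    # (3)-(4) boundary objects are marked unlabelled (-1) and relabelled
    # by propagating labels jointly over source and target samples
    y_semi = np.concatenate([ys, np.where(bnd_mask, -1, yt_hat)])
    lp = LabelSpreading(kernel="knn", n_neighbors=3).fit(
        np.vstack([Xs, Xt]), y_semi)
    return lp.transduction_[len(ys):]
```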

5. Experiments

5.1. Materials

5.1.1. Data Acquisition

In order to verify the effectiveness of our proposed method and compare it with existing research, our experimental data are obtained from the publicly accessible ABIDE dataset. ABIDE is a multisite platform that has aggregated functional and structural brain imaging data collected from 17 laboratories around the world, comprising 539 ASD patients and 573 neurotypical controls. All subjects had corresponding resting-state fMRI images and phenotypic information such as age and gender. More details on the data collection, exclusion criteria, and scan parameters are available on the ABIDE website, http://fcon_1000.projects.nitrc.org/indi/abide/ (accessed on 8 October 2020). Because the number of samples available at each site is limited and varies, we use data from three sites (NYU, UM and USM), each with more than 50 subjects and each using a different fMRI protocol. Specifically, there were 343 subjects, including 159 ASD patients and 184 neurotypical controls. Detailed demographic information of the subjects is listed in Table 2, where m ± std and M/F are short for mean ± standard deviation and male/female, respectively. In each site, we used a two-sample t-test to evaluate the differences in age between the two groups, and no significant differences were observed between the control group and the ASD group, i.e., p = 0.42 (NYU), p = 0.31 (USM), p = 0.34 (UM). Since the subjects across different sites follow different distributions, it is necessary to perform domain adaptation. In the experiments, we use A→B to denote the knowledge transfer from source domain A to target domain B. We construct a total of six tasks: NYU→USM, NYU→UM, USM→NYU, USM→UM, UM→NYU, and UM→USM.

5.1.2. Data Pre-Processing

To ensure replicability, all rs-fMRI data used in this research were provided by the Preprocessed Connectomes Project initiative and preprocessed using the Data Processing Assistant for Resting-State fMRI (DPARSF) software [76]. The image preprocessing steps were as follows: (1) removal of the first 10 time points, (2) slice timing correction, (3) head motion realignment, (4) standardization by normalizing the functional images to the echo planar imaging (EPI) template, (5) spatial smoothing, (6) linear detrending, (7) temporal filtering, and (8) removal of covariates. Subsequently, the brain was divided into 90 regions of interest (ROIs) based on the Automated Anatomical Labelling (AAL) [77] atlas, and the average time series of each ROI was extracted. Then, for each subject, we obtained a 90 × 90 symmetric functional connectivity matrix, where each element represents the Pearson correlation coefficient between a pair of ROIs. Finally, we converted the upper triangle into a 4005-dimensional (90 × 89/2) feature vector to represent each subject.
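The feature-construction step above can be sketched with NumPy (random time series stand in for the real ROI signals):

```python
import numpy as np

def fc_feature_vector(roi_timeseries):
    """Build the subject-level feature vector described above: Pearson
    correlations between every pair of ROI mean time series, keeping
    only the strictly upper triangle of the symmetric matrix.

    roi_timeseries: array of shape (n_timepoints, n_rois).
    """
    fc = np.corrcoef(roi_timeseries.T)    # (n_rois, n_rois), symmetric
    iu = np.triu_indices_from(fc, k=1)    # strictly upper triangle
    return fc[iu]                         # length n*(n-1)/2

rng = np.random.default_rng(0)
ts = rng.standard_normal((175, 90))       # e.g. 175 volumes, 90 AAL ROIs
vec = fc_feature_vector(ts)
print(vec.shape)  # (4005,)
```

With 90 ROIs this yields exactly the 4005-dimensional vector used in the paper.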

5.2. Competing Methods

We compared the performance of our method with the following state-of-the-art machine learning models: one baseline method and three representation-based methods.
Baseline: In this study, we use a support vector machine (SVM) as the base classifier, which is widely used in the field of neuroimaging [11]. Specifically, we designate one site's data as the source domain, train an SVM model directly on its original features, and then use another site's data as the target domain to test the trained classifier. In the SVM classifier, we applied a linear kernel and searched the margin penalty over the range {2^−5, 2^−4, …, 2^4, 2^5} using a grid-search strategy with cross-validation.
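The baseline's grid search could look like the following sketch with scikit-learn (synthetic data is used here in place of the ABIDE features):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Margin-penalty search over {2^-5, ..., 2^5} with a linear kernel,
# mirroring the baseline setup described above.
param_grid = {"C": [2.0 ** k for k in range(-5, 6)]}

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 10))          # stand-in for source-domain features
y = (X[:, 0] > 0).astype(int)              # stand-in for diagnostic labels

search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_["C"])
```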
Transfer component analysis (TCA) [36]: This is a general feature transformation method that reduces the difference in the marginal distribution between domains by learning transfer components between them in a reproducing kernel Hilbert space (RKHS).
Joint distribution adaptation (JDA) [26]: The JDA approach reduces both the marginal distribution and conditional distribution between different domains.
Domain adaptation with label and structural consistency (DALSC) [29]: DALSC is an unsupervised domain adaptation method that uses the structural information of the target domain to improve the performance of the model while adjusting the marginal distribution and conditional distribution between domains.

5.3. Experimental Setup

In this work, we use 5-fold cross-validation to evaluate the performance of each method. For our method, we set $\delta = 0.3$; $\beta$ is searched in {0.5, 0.55, …, 0.85, 0.9} and $\alpha$ in {0.55, 0.6, …, 0.9, 0.95}, with $\alpha > \beta$. In addition, to evaluate the classification performance, we counted the true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs) by comparing the predicted labels with the gold-standard labels. Six evaluation metrics on the test data are then used: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), balanced accuracy (BAC), positive predictive value (PPV) and negative predictive value (NPV). These metrics are computed as follows (Equation (23)):
$$\begin{aligned} ACC &= (TP + TN)/(TP + FN + TN + FP), & SEN &= TP/(TP + FN),\\ SPE &= TN/(TN + FP), & BAC &= (SEN + SPE)/2,\\ PPV &= TP/(TP + FP), & NPV &= TN/(TN + FN) \end{aligned} \tag{23}$$
For these metrics, higher values indicate better classification performance.
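Equation (23) translates directly into code; the confusion-matrix counts below are arbitrary example values:

```python
def classification_metrics(tp, fn, tn, fp):
    """The six evaluation metrics of Equation (23)."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sen = tp / (tp + fn)            # sensitivity (recall on positives)
    spe = tn / (tn + fp)            # specificity (recall on negatives)
    bac = (sen + spe) / 2           # balanced accuracy
    ppv = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)            # negative predictive value
    return {"ACC": acc, "SEN": sen, "SPE": spe,
            "BAC": bac, "PPV": ppv, "NPV": npv}

m = classification_metrics(tp=40, fn=10, tn=35, fp=15)
print(m["ACC"])  # 0.75
```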

5.4. Results on ABIDE with Multisite fMRI Data

In this section, we present the experimental results of the proposed method and several other comparative methods on six tasks. Note that data from each site can be used as the source domain while the data from other sites can be used as the target domain. For the three domain adaptation methods (i.e., TCA, JDA, and DALSC) and our proposed method, an unsupervised adaptive experimental setup is adopted, which has no label information of the target domain to be utilized in the prediction process. The classification performance results of various methods are shown in Table 3. From Table 3, we can make the following three observations.
First, in terms of accuracy, the domain adaptation methods based on feature representation outperform directly applying an SVM classifier to the target domain.
Second, among the domain adaptation methods, TCA yields the worst classification results because it considers only the marginal distribution.
Finally, the experimental results show that the classification accuracy of the proposed method exceeds that of the existing domain adaptation methods (TCA, JDA and DALSC) on all six tasks, and the proposed method also performs well on SEN, SPE, BAC and the other metrics.

6. Discussion

In this section, we first analyze the influence of the parameters in the proposed method on the algorithm performance and then compare the proposed method with other state-of-the-art methods.

6.1. Parameter Analysis

We first analyze the impact of the number of iterations on the performance of the proposed method. As mentioned in Section 4.3, for domain adaptation, we solve the proposed model iteratively. To evaluate its convergence, Figure 1 shows how the classification accuracy changes with the number of iterations on the six tasks. It can be seen from Figure 1 that the classification accuracy on each task gradually improves as the number of iterations increases, indicating that our model learns a domain/site-invariant data distribution after multiple iterations. The figure shows that the accuracy converges within 10–15 iterations.
In addition, the values of $\alpha$ and $\beta$ involved in the experiment represent different decision risk cost levels, and slight differences between them may induce different decision results. Without loss of generality, in order to obtain more suitable parameters, we analyze the influence of different threshold values on the performance of the proposed method. Specifically, we conducted comparison experiments at different levels on the six tasks, with the final results shown in Figure 2. The figures show that the accuracy of the algorithm changes as the thresholds change; although the degree of fluctuation differs under different $(\alpha, \beta)$, the accuracy eventually converges. It can be seen from Figure 2 that the optimal values of $(\alpha, \beta)$ for the six tasks NYU→UM, NYU→USM, USM→UM, USM→NYU, UM→NYU and UM→USM are (0.8, 0.7), (0.75, 0.65), (0.7, 0.6), (0.8, 0.7), (0.75, 0.55) and (0.9, 0.6), respectively. Furthermore, Figure 2 shows that when $\beta$ is small and $\alpha$ is large, the classification accuracy on the six tasks is relatively low. This indicates that a smaller $\beta$ and a larger $\alpha$ cause more target-domain samples to be assigned to the boundary region; more boundary objects increase the uncertainty when implementing the label propagation algorithm, which degrades classification performance.

6.2. Comparison with State-of-the-Art Methods

To further verify the effectiveness of our proposed method, we also compare it with six other advanced methods (including deep learning methods) using rs-fMRI data from the ABIDE database. Since only a few studies have reported their average classification results across different sites, we only list the classification results on the NYU site in Table 4, along with the details of each method, including the classifier and the type of feature. It is worth noting that in the studies of [14,17], the authors selected a proportion of the samples from each site as the training set and then used the trained deep learning model to predict the NYU site directly.
As Table 4 shows, the proposed method achieves 72.13% and 71.01% classification accuracy, respectively, in the two tasks with NYU as the target domain, outperforming the models proposed in the other studies. In terms of feature type and dimension, this paper uses the AAL atlas to parcellate the brain and obtains the original feature vector with the smallest dimension. In addition, although sGCN, DAE and DANN are deep learning methods, our proposed method still achieves better classification performance. There may be two reasons for this. (1) Training a robust deep learning model usually requires a large number of samples; for multisite ASD recognition, even though the data from each site can be pooled into a larger dataset, the samples are still insufficient to train a reliable deep neural network. (2) Overfitting commonly occurs when a deep neural network processes noisy data; fMRI data usually contain a large amount of noise, which limits the generalization ability of the trained network.

7. Conclusions

In this paper, we propose a novel domain adaptation method for ASD identification with rs-fMRI data. Specifically, we introduce a three-way decision model based on triangular fuzzy similarity and divide the objects in the target domain with coarse granularity. Then, a label propagation algorithm is used to make secondary decisions on boundary region objects so as to improve the performance of ASD diagnosis based on cross-site rs-fMRI data. We conduct extensive experiments on the ABIDE dataset based on multisite data to verify the convergence and robustness of the proposed algorithm. Compared with several state-of-the-art methods, the experimental results show that the proposed method has better classification performance.
Although the classification results of our proposed method for cross-site ASD diagnosis are significantly improved compared with the existing domain adaptation methods based on feature distribution, the following technical problems need to be addressed in the future. First, although the proposed method can alleviate data heterogeneity between source and target domains, the input fMRI features are still unfiltered, original high-dimensional features, which may contain redundant features that reduce model performance. Therefore, in future work we will study how to combine feature selection with our method for ASD diagnosis. Second, in this paper we take only the functional connectivity matrix of rs-fMRI data as the feature representation of each subject, without considering network topology information. In future research, we will consider fusing functional brain network topology data to provide more valuable discriminant information for ASD diagnosis. Finally, in order to obtain more valuable structured information about the target domain, we will consider combining multigranularity rough sets to further improve model performance.

Author Contributions

C.S. initiated the research and wrote the paper. C.S. and X.X. performed the experiments; J.Z. supervised the research work and provided helpful suggestions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the General Program (61977010), Key Program (61731003) of Nature Science Foundation of China and the Beijing Normal University Interdisciplinary Research Foundation for the First-Year Doctoral Candidates (BNUXKJC1925).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank all individuals who participated in the initial experiments from which raw data were collected.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Khan, N.A.; Waheeb, S.A.; Riaz, A.; Shang, X. A Three-Stage Teacher, Student Neural Networks and Sequential Feed Forward Selection-Based Feature Selection Approach for the Classification of Autism Spectrum Disorder. Brain Sci. 2020, 10, 754.
2. Kong, Y.; Gao, J.; Xu, Y.; Pan, Y.; Wang, J.; Liu, J. Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier. Neurocomputing 2019, 324, 63–68.
3. Zecavati, N.; Spence, S.J. Neurometabolic disorders and dysfunction in autism spectrum disorders. Curr. Neurol. Neurosci. 2009, 9, 129–136.
4. Amaral, D.G.; Schumann, C.M.; Nordahl, C.W. Neuroanatomy of autism. Trends Neurosci. 2008, 31, 137–145.
5. Khundrakpam, B.S.; Lewis, J.D.; Kostopoulos, P.; Carbonell, F.; Evans, A.C. Cortical Thickness Abnormalities in Autism Spectrum Disorders Through Late Childhood, Adolescence, and Adulthood: A Large-Scale MRI Study. Cereb. Cortex 2017, 27, 1721–1731.
6. Zablotsky, B.; Black, L.I.; Maenner, M.J.; Schieve, L.A.; Blumberg, S.J. Estimated Prevalence of Autism and Other Developmental Disabilities Following Questionnaire Changes in the 2014 National Health Interview Survey. Natl. Health Stat. Rep. 2015, 87, 1–20.
7. Wang, M.; Zhang, D.; Huang, J.; Yap, P.; Shen, D.; Liu, M. Identifying Autism Spectrum Disorder With Multi-Site fMRI via Low-Rank Domain Adaptation. IEEE Trans. Med. Imaging 2020, 39, 644–655.
8. Fernell, E.; Eriksson, M.A.; Gillberg, C. Early diagnosis of autism and impact on prognosis: A narrative review. Clin. Epidemiol. 2013, 5, 33–43.
9. Mwangi, B.; Ebmeier, K.P.; Matthews, K.; Steele, J.D. Multi-centre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder. Brain J. Neurol. 2012, 135, 1508–1521.
10. Plitt, M.; Barnes, K.A.; Martin, A. Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards. NeuroImage Clin. 2015, 7, 359–366.
11. Shi, C.; Zhang, J.; Wu, X. An fMRI Feature Selection Method Based on a Minimum Spanning Tree for Identifying Patients with Autism. Symmetry 2020, 12, 1995.
12. Van den Heuvel, M.P.; Hulshoff Pol, H.E. Exploring the brain network: A review on resting-state fMRI functional connectivity. Eur. Neuropsychopharmacol. 2010, 20, 519–534.
13. Song, J.; Yoon, N.; Jang, S.; Lee, G.; Kim, B. Neuroimaging-Based Deep Learning in Autism Spectrum Disorder and Attention-Deficit/Hyperactivity Disorder. J. Child Adolesc. Psychiatry 2020, 31, 97–104.
14. Ktena, S.I.; Parisot, S.; Ferrante, E.; Rajchl, M.; Lee, M.; Glocker, B.; Rueckert, D. Metric learning with spectral graph convolutions on brain connectivity networks. Neuroimage 2018, 169, 431–442.
15. Abraham, A.; Milham, M.P.; Di Martino, A.; Craddock, R.C.; Samaras, D.; Thirion, B.; Varoquaux, G. Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example. Neuroimage 2017, 147, 736–745.
16. Heinsfeld, A.S.; Franco, A.R.; Craddock, R.C.; Buchweitz, A.; Meneguzzi, F. Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage Clin. 2017, 17, 16–23.
17. Nielsen, J.A.; Zielinski, B.A.; Fletcher, P.T.; Alexander, A.L.; Lange, N.; Bigler, E.D.; Lainhart, J.E.; Anderson, J.S. Multisite functional connectivity MRI classification of autism: ABIDE results. Front. Hum. Neurosci. 2013, 7.
18. Dai, W.; Yang, Q.; Xue, G.R.; Yu, Y. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, Corvalis, OR, USA, 20–24 June 2007; pp. 193–200.
19. Ren, C.X.; Dai, D.Q.; Huang, K.K.; Lai, Z.R. Transfer learning of structured representation for face recognition. IEEE Trans. Image Process. 2014, 23, 5440–5454.
20. Chen, Y.; Wang, J.; Huang, M.; Yu, H. Cross-position activity recognition with stratified transfer learning. Perv. Mob. Comput. 2019, 57, 1–13.
21. Yi, J.; Tao, J.; Wen, Z.; Bai, Y. Language-adversarial transfer learning for low-resource speech recognition. IEEE-ACM Trans. Audio Speech Lang. 2019, 27, 621–630.
22. Do, C.B.; Ng, A.Y. Transfer learning for text classification. Adv. Neural Inf. Process. Syst. 2005, 18, 299–306.
23. Xu, Y.; Pan, S.J.; Xiong, H.; Wu, Q.; Luo, R.; Min, H.; Song, H. A Unified Framework for Metric Transfer Learning. IEEE Trans. Knowl. Data Eng. 2017, 29, 1158–1171.
24. Duan, L.; Tsang, I.W.; Xu, D.; Chua, T.S. Domain adaptation from multiple sources via auxiliary classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 289–296.
25. Long, M.; Wang, J.; Ding, G.; Pan, S.J.; Yu, P.S. Adaptation Regularization: A General Framework for Transfer Learning. IEEE Trans. Knowl. Data Eng. 2014, 26, 1076–1089.
26. Fernando, B.; Habrard, A.; Sebban, M.; Tuytelaars, T. Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2960–2967.
27. Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 16–21 June 2012; pp. 2066–2073.
28. Sun, B.; Feng, J.; Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AR, USA, 12–17 February 2016.
29. Wang, J.; Chen, Y.; Yu, H.; Huang, M.; Yang, Q. Easy transfer learning by exploiting intra-domain structures. In Proceedings of the IEEE International Conference on Multimedia and Expo, Shanghai, China, 8–12 July 2019; pp. 1210–1215.
30. Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2200–2207.
31. Wang, J.; Chen, Y.; Hao, S.; Feng, W.; Shen, Z. Balanced distribution adaptation for transfer learning. In Proceedings of the 2017 IEEE International Conference on Data Mining, New Orleans, LA, USA, 18–21 November 2017; pp. 1129–1134.
32. Zhang, J.; Li, W.; Ogunbona, P. Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1859–1867.
33. Hou, C.; Tsai, Y.H.; Yeh, Y.; Wang, Y.F. Unsupervised Domain Adaptation with Label and Structural Consistency. IEEE Trans. Image Process. 2016, 25, 5552–5562.
34. Tahmoresnezhad, J.; Hashemi, S. Visual domain adaptation via transfer feature learning. Knowl. Inf. Syst. 2017, 50, 586–605.
35. Zhang, Y.; Deng, B.; Jia, K.; Zhang, L. Label propagation with augmented anchors: A simple semi-supervised learning baseline for unsupervised domain adaptation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 781–797.
36. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain Adaptation via Transfer Component Analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210.
37. Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer joint matching for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1410–1417.
38. Wang, J.; Chen, Y.; Hu, L.; Peng, X.; Philip, S.Y. Stratified transfer learning for cross-domain activity recognition. In Proceedings of the 2018 IEEE International Conference on Pervasive Computing and Communications, Athens, Greece, 19–23 March 2018; pp. 1–10.
39. Yao, Y. Three-way decisions with probabilistic rough sets. Inf. Sci. 2010, 180, 341–353.
40. Chu, X.; Sun, B.; Huang, Q.; Zhang, Y. Preference degree-based multi-granularity sequential three-way group conflict decisions approach to the integration of TCM and Western medicine. Comput. Ind. Eng. 2020, 143.
41. Almasvandi, Z.; Vahidinia, A.; Heshmati, A.; Zangeneh, M.M.; Goicoechea, H.C.; Jalalvand, A.R. Coupling of digital image processing and three-way calibration to assist a paper-based sensor for determination of nitrite in food samples. RSC Adv. 2020, 10, 14422–14430.
42. Ren, F.; Wang, L. Sentiment analysis of text based on three-way decisions. J. Intell. Fuzzy Syst. 2017, 33, 245–254.
43. Yao, Y. Decision-theoretic rough set models. In International Conference on Rough Sets and Knowledge Technology; Springer: Berlin/Heidelberg, Germany, 14–16 May 2007; pp. 1–12.
44. Zhang, H.; Yang, S. Three-way group decisions with interval-valued decision-theoretic rough sets based on aggregating inclusion measures. Int. J. Approx. Reason. 2019, 110, 31–45.
45. Liu, P.; Yang, H. Three-way decisions with intuitionistic uncertain linguistic decision-theoretic rough sets based on generalized Maclaurin symmetric mean operators. Int. J. Fuzzy Syst. 2020, 22, 653–667.
46. Agbodah, K. The determination of three-way decisions with decision-theoretic rough sets considering the loss function evaluated by multiple experts. Granul. Comput. 2019, 4, 285–297.
47. Liang, D.; Wang, M.; Xu, Z.; Liu, D. Risk appetite dual hesitant fuzzy three-way decisions with TODIM. Inf. Sci. 2020, 507, 585–605.
48. Qian, T.; Wei, L.; Qi, J. A theoretical study on the object (property) oriented concept lattices based on three-way decisions. Soft Comput. 2019, 23, 9477–9489.
49. Yang, C.; Zhang, Q.; Zhao, F. Hierarchical Three-Way Decisions with Intuitionistic Fuzzy Numbers in Multi-Granularity Spaces. IEEE Access 2019, 7, 24362–24375.
50. Yao, Y.; Deng, X. Sequential three-way decisions with probabilistic rough sets. In Proceedings of the IEEE 10th International Conference on Cognitive Informatics and Cognitive Computing, Banff, AB, Canada, 18–20 August 2011; pp. 120–125.
51. Yang, X.; Li, T.; Liu, D.; Fujita, H. A temporal-spatial composite sequential approach of three-way granular computing. Inf. Sci. 2019, 486, 171–189.
52. Zhang, L.; Li, H.; Zhou, X.; Huang, B. Sequential three-way decision based on multi-granular autoencoder features. Inf. Sci. 2020, 507, 630–643.
53. Liu, D.; Ye, X. A matrix factorization based dynamic granularity recommendation with three-way decisions. Knowl. Based Syst. 2020, 191.
54. Yao, Y. Three-way granular computing, rough sets, and formal concept analysis. Int. J. Approx. Reason. 2020, 116, 106–125.
55. Lang, G.; Luo, J.; Yao, Y. Three-way conflict analysis: A unification of models based on rough sets and formal concept analysis. Knowl. Based Syst. 2020, 194.
56. Xin, X.; Song, J.; Peng, W. Intuitionistic Fuzzy Three-Way Decision Model Based on the Three-Way Granular Computing Method. Symmetry 2020, 12, 1068.
57. Yue, X.D.; Chen, Y.F.; Miao, D.Q.; Fujita, H. Fuzzy neighborhood covering for three-way classification. Inf. Sci. 2020, 507, 795–808.
58. Ma, Y.Y.; Zhang, H.R.; Xu, Y.Y.; Min, F.; Gao, L. Three-way recommendation integrating global and local information. J. Eng. 2018, 16, 1397–1401.
59. Yu, H.; Wang, X.; Wang, G.; Zeng, X. An active three-way clustering method via low-rank matrices for multi-view data. Inf. Sci. 2020, 507, 823–839.
60. Taneja, S.S. Re: Variability of the Positive Predictive Value of PI-RADS for Prostate MRI across 26 Centers: Experience of the Society of Abdominal Radiology Prostate Cancer Disease-Focused Panel. J. Urol. 2020, 204, 1380–1381.
61. Salama, S.; Khan, M.; Shanechi, A.; Levy, M.; Izbudak, I. MRI differences between MOG antibody disease and AQP4 NMOSD. Mult. Scler. 2020, 26, 1854–1865.
  62. Li, W.; Lin, X.; Chen, X. Detecting Alzheimer’s disease Based on 4D fMRI: An exploration under deep learning framework. Neurocomputing 2020, 388, 280–287. [Google Scholar] [CrossRef]
  63. Riaz, A.; Asad, M.; Alonso, E.; Slabaugh, G. DeepFMRI: End-to-end deep learning for functional connectivity and classification of ADHD using fMRI. J. Neurosci. Meth. 2020, 335. [Google Scholar] [CrossRef]
  64. Diciotti, S.; Orsolini, S.; Salvadori, E.; Giorgio, A.; Toschi, N.; Ciulli, S.; Ginestroni, A.; Poggesi, A.; De Stefano, N.; Pantoni, L.; et al. Resting state fMRI regional homogeneity correlates with cognition measures in subcortical vascular cognitive impairment. J. Neurol. Sci. 2017, 373, 1–6. [Google Scholar] [CrossRef]
65. Lu, H.; Liu, S.; Wei, H.; Tu, J. Multi-kernel fuzzy clustering based on auto-encoder for fMRI functional network. Expert Syst. Appl. 2020, 159. [Google Scholar] [CrossRef]
  66. Leming, M.; Górriz, J.M.; Suckling, J. Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks. Int. J. Neural Syst. 2020, 30. [Google Scholar] [CrossRef] [Green Version]
  67. Benabdallah, F.Z.; Maliani, A.D.E.; Lotfi, D.; Hassouni, M.E. Analysis of the Over-Connectivity in Autistic Brains Using the Maximum Spanning Tree: Application on the Multi-Site and Heterogeneous ABIDE Dataset. In Proceedings of the 2020 8th International Conference on Wireless Networks and Mobile Communications (WINCOM), Reims, France, 27–29 October 2020; pp. 1–7. [Google Scholar]
  68. Eslami, T.; Mirjalili, V.; Fong, A.; Laird, A.R.; Saeed, F. ASD-DiagNet: A Hybrid Learning Approach for Detection of Autism Spectrum Disorder Using fMRI Data. Front. Neuroinf. 2019, 13. [Google Scholar] [CrossRef] [PubMed]
  69. Bi, X.; Wang, Y.; Shu, Q.; Sun, Q.; Xu, Q. Classification of Autism Spectrum Disorder Using Random Support Vector Machine Cluster. Front. Genet. 2018, 9. [Google Scholar] [CrossRef] [PubMed]
  70. Rakić, M.; Cabezas, M.; Kushibar, K.; Oliver, A.; Lladó, X. Improving the detection of autism spectrum disorder by combining structural and functional MRI information. NeuroImage Clin. 2020, 25. [Google Scholar] [CrossRef] [PubMed]
  71. Borgwardt, K.M.; Gretton, A.; Rasch, M.J.; Kriegel, H.; Schölkopf, B.; Smola, A.J. Integrating structured biological data by Kernel Maximum Mean Discrepancy. Bioinformatics 2006, 22, e49–e57. [Google Scholar] [CrossRef] [Green Version]
  72. Steinwart, I. On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res. 2001, 2, 67–93. [Google Scholar]
  73. Xu, Z.S. Study on method for triangular fuzzy number-based multi-attribute decision making with preference information on alternatives. Syst. Eng. Electron. 2002, 24, 9–12. [Google Scholar]
74. Yao, Y. The superiority of three-way decisions in probabilistic rough set models. Inf. Sci. 2011, 181, 1080–1096. [Google Scholar] [CrossRef]
  75. Yao, Y. An outline of a theory of three-way decisions. In Proceedings of the International Conference on Rough Sets and Current Trends in Computing, Chengdu, China, 17–20 August 2012; pp. 1–17. [Google Scholar]
  76. Chao-Gan, Y.; Yu-Feng, Z. DPARSF: A MATLAB Toolbox for “Pipeline” Data Analysis of Resting-State fMRI. Front. Syst. Neurosci. 2010, 4. [Google Scholar] [CrossRef] [Green Version]
  77. Tzourio-Mazoyer, N.; Landeau, B.; Papathanassiou, D.; Crivello, F.; Etard, O.; Delcroix, N.; Mazoyer, B.; Joliot, M. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 2002, 15, 273–289. [Google Scholar] [CrossRef] [PubMed]
  78. Niu, K.; Guo, J.; Pan, Y.; Gao, X.; Peng, X.; Li, N.; Li, H. Multichannel deep attention neural networks for the classification of autism spectrum disorder using neuroimaging and personal characteristic data. Complexity 2020. [Google Scholar] [CrossRef]
Figure 1. Classification accuracy versus the number of iterations on six domain pairs.
Figure 2. Classification accuracies with respect to different parameter values of α and β on six domain pairs: (a) NYU→UM; (b) NYU→USM; (c) USM→UM; (d) USM→NYU; (e) UM→NYU; (f) UM→USM.
Table 1. Cost function matrix.
Action    Cost Function
          State [S̃SF_δ]_s    State [S̃SF_δ]_g
a_P       λ_PP                λ_PN
a_B       λ_BP                λ_BN
a_N       λ_NP                λ_NN
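The cost matrix above determines the two probability thresholds (α, β) that split pseudolabelled samples into the positive (accept), boundary (defer), and negative (reject) regions. A minimal sketch of this standard decision-theoretic rough set computation; the λ values in the usage line are illustrative, not taken from the paper:

```python
def three_way_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Thresholds (alpha, beta) from the six costs in Table 1:
       accept a_P if P(X|x) >= alpha,
       defer  a_B if beta < P(X|x) < alpha,
       reject a_N if P(X|x) <= beta."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta

def decide(p, alpha, beta):
    # Route one sample by its posterior probability p of being positive.
    if p >= alpha:
        return "accept (a_P)"
    if p <= beta:
        return "reject (a_N)"
    return "defer (a_B)"

# Illustrative costs: misclassification is dearer than deferral.
alpha, beta = three_way_thresholds(0, 2, 5, 6, 2, 0)
# alpha = (6-2)/((6-2)+(2-0)) = 2/3;  beta = (2-0)/((2-0)+(5-2)) = 0.4
```

The requirement λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN guarantees 0 ≤ β < α ≤ 1, so the three regions never overlap.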
Table 2. Demographic information of the studied subjects from three imaging sites in the ABIDE database. The age values are denoted as the mean ± standard deviation. M/F: male/female.
Site    ASD                                    Normal Control
        Age (m ± std)    Gender (M/F)          Age (m ± std)    Gender (M/F)
NYU     14.92 ± 7.04     64/9                  15.75 ± 6.23     70/36
USM     24.59 ± 8.46     38/0                  22.33 ± 7.69     23/0
UM      13.85 ± 2.29     39/9                  15.03 ± 3.64     49/16
Table 3. Performance of five different methods in ASD classification on the multisite ABIDE database. The number in bold indicates the best result achieved under a certain metric.
Task       Method     ACC (%)   SEN (%)   SPE (%)   BAC (%)   PPV (%)   NPV (%)
NYU→UM     Baseline   54.87     49.23     62.50     55.87     64.00     47.62
           TCA        62.83     58.46     68.75     63.61     71.69     55.00
           JDA        64.50     66.67     61.64     64.16     69.57     58.44
           DALSC      64.60     56.92     75.00     65.96     75.51     56.25
           Ours       70.80     72.31     68.75     70.53     75.81     64.71
NYU→USM    Baseline   67.21     78.26     60.53     69.39     54.55     82.14
           TCA        68.85     82.61     60.53     71.57     55.88     85.19
           JDA        70.49     86.96     60.53     73.74     57.14     88.46
           DALSC      72.13     73.91     71.05     72.48     60.71     81.81
           Ours       75.41     91.30     65.79     78.55     61.76     92.59
USM→UM     Baseline   57.52     35.38     87.50     61.44     79.31     50.00
           TCA        58.41     38.46     85.42     61.94     78.13     50.62
           JDA        61.06     61.54     60.42     60.98     67.80     53.70
           DALSC      64.60     73.85     52.08     62.96     67.61     59.52
           Ours       69.91     76.92     60.42     68.67     72.46     65.91
USM→NYU    Baseline   53.25     35.42     76.71     56.06     66.67     47.46
           TCA        57.39     40.63     79.45     60.04     72.22     50.43
           JDA        60.36     64.58     54.79     59.69     65.26     54.05
           DALSC      63.91     65.63     61.64     63.63     69.23     57.69
           Ours       72.13     78.26     68.42     73.34     60.00     83.87
UM→NYU     Baseline   58.58     83.33     26.03     54.68     59.70     54.29
           TCA        61.54     82.29     34.25     58.27     62.20     59.50
           JDA        63.31     82.29     38.35     60.32     63.71     62.22
           DALSC      64.49     92.70     27.39     60.05     62.68     74.07
           Ours       71.01     90.63     45.21     67.92     68.50     78.57
UM→USM     Baseline   54.09     78.26     39.47     58.87     43.90     75.00
           TCA        60.66     73.91     52.63     63.27     48.57     76.92
           JDA        60.66     78.26     50.00     64.13     48.65     79.17
           DALSC      57.38     73.91     47.37     60.64     45.95     75.00
           Ours       68.85     82.61     60.53     71.57     55.88     85.19
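The six metrics reported in Table 3 all follow from the binary confusion matrix. A short sketch of the standard definitions; the confusion-matrix counts in the usage line are illustrative, not taken from the paper:

```python
def classification_metrics(tp, fn, tn, fp):
    """Table 3 metrics (in %) from true/false positives and negatives,
    with ASD as the positive class."""
    sen = tp / (tp + fn)   # sensitivity (recall on ASD)
    spe = tn / (tn + fp)   # specificity (recall on controls)
    return {
        "ACC": 100 * (tp + tn) / (tp + fn + tn + fp),
        "SEN": 100 * sen,
        "SPE": 100 * spe,
        "BAC": 100 * (sen + spe) / 2,   # balanced accuracy
        "PPV": 100 * tp / (tp + fp),    # positive predictive value (precision)
        "NPV": 100 * tn / (tn + fn),    # negative predictive value
    }

# Illustrative counts for a 86-subject target site.
m = classification_metrics(tp=42, fn=6, tn=25, fp=13)
```

BAC is the metric least distorted by the class imbalance visible in Table 2 (e.g. USM has 38 ASD subjects versus 23 controls), which is why it is reported alongside plain accuracy.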
Table 4. Comparison with state-of-the-art methods for ASD identification using rs-fMRI ABIDE data on the NYU site. HOA: Harvard Oxford Atlas. GMR: grey matter ROIs, and AAL: anatomical automatic labelling. CC200: Craddock 200. sGCN: siamese graph convolutional neural network. FCA: functional connectivity analysis. DAE: denoising autoencoder. DANN: deep attention neural networks.
Method                                  Feature Type   Feature Dimension   Classifier                 ACC (%)
sGCN + Hinge Loss [14]                  HOA            111 × 111           K-Nearest Neighbor (KNN)   60.50
sGCN + Global Loss [14]                 HOA            111 × 111           KNN                        63.50
sGCN + Constrained Variance Loss [14]   HOA            111 × 111           KNN                        68.00
FCA [17]                                GMR            7266 × 7266         t-test                     63.00
DAE [16]                                CC200 Atlas    19,900              Softmax Regression         66.00
DANN [78]                               AAL            6670                Deep neural network        70.90
Ours                                    AAL            4005                SVM                        72.13/71.01
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Shi, C.; Xin, X.; Zhang, J. Domain Adaptation Using a Three-Way Decision Improves the Identification of Autism Patients from Multisite fMRI Data. Brain Sci. 2021, 11, 603. https://doi.org/10.3390/brainsci11050603


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

