Domain Adaptation with Data Uncertainty Measure Based on Evidence Theory

Domain adaptation aims to learn a classifier for a target domain task by using related labeled data from the source domain. Because the source domain data and the target domain task may be mismatched, the source domain data carry uncertainty with respect to the target domain task. Ignoring this uncertainty may lead to models with unreliable and suboptimal classification results for the target domain task. However, most previous works focus on reducing the gap in data distribution between the source and target domains; they neither consider the uncertainty of source domain data about the target domain task nor apply the uncertainty to learn an adaptive classifier. Aimed at this problem, we revisit domain adaptation from the perspective of source domain data uncertainty based on evidence theory and thereby devise an adaptive classifier with the uncertainty measure. Based on evidence theory, we first design an evidence net to estimate the uncertainty of source domain data about the target domain task. Second, we design a general loss function with the uncertainty measure for the adaptive classifier and extend the loss function to the support vector machine. Finally, numerical experiments on simulation datasets and real-world applications are given to comprehensively demonstrate the effectiveness of the adaptive classifier with the uncertainty measure.


Introduction
In the field of machine learning research, supervised learning methods have achieved outstanding performance in many applications. The key point of supervised learning is to collect sufficient labeled data for model training, which also limits the usage of supervised learning in scenarios with a lack of training data. Furthermore, data annotation is usually a time-consuming, labor-expensive, or even unrealistic task. To address this situation, domain adaptation (DA) is a promising methodology that aims to learn an adaptive classifier for target domain tasks by making use of labeled data from source domains [1][2][3][4]. It has been applied successfully in various fields, such as object recognition [5,6], text classification [7,8], the medical field [9,10], machine translation [11], and so on.
However, due to the mismatch between the source domain data and the target domain task, there is uncertainty in DA when source domain data are transferred to the target domain task. As shown in Figure 1, in the target domain classification task, each source domain datum may no longer fully belong to a class in the label space of the target domain: the possibility of it being in class 1* may be 0.2 with an uncertainty of 0.8, or 0.9 with an uncertainty of 0.1. Unfortunately, the uncertainty of source domain data with respect to the target domain task has received little attention in DA, and ignoring it may yield unreliable models.

Most DA research works adopt metric learning to minimize the data differences between the source and target domains in order to obtain an adaptive classifier. Some works map the source and target data instances into a common feature space by minimizing the gap between the data distributions of the source and target domains, such as transfer component analysis (TCA) [12], correlation alignment (CORAL) [13], and scatter component analysis (SCA) [14]. Some works construct a loss function with the data differences as the constraint to train an adaptive classifier, such as the joint adaptation network (JAN) [15], manifold embedded distribution alignment (MEDA) [16], and the multi-representation adaptation network (MRAN) [17]. However, existing methods (1) cannot measure the uncertainty of source domain data about the target domain task, and (2) cannot accomplish effective training of adaptive classifiers with a data uncertainty measure.
The uncertainty is important for evaluating the adaptation degree of the source domain data with respect to target tasks. The study of uncertainty has been successfully applied in traditional machine learning, such as Bayesian-based uncertainty [18], evidence theory-based uncertainty [19], information entropy-based uncertainty [20], and granular computing-based uncertainty [21]. In particular, evidence theory has been widely combined with machine learning methods to improve their ability to handle uncertain data [22][23][24][25][26].
To solve these problems, in this paper, we revisit domain adaptation from the perspective of source domain data uncertainty based on evidence theory and thereby devise a reliable adaptive classifier with the uncertainty measure. Specifically, we first construct an evidence net based on evidence theory for measuring the uncertainty of source domain data about the target domain classification task; it calculates the proportion of uncertainty for each source domain instance in the target domain classification task. Second, we design a general loss function with the uncertainty measure for the adaptive classifier and extend the loss function to the support vector machine (SVM). The contributions of this paper are summarized as follows.

• Designing an evidence net based on evidence theory to measure the uncertainty of source domain data about a target domain classification task.
• Designing a general loss function with the uncertainty measure for learning the adaptive classifier.
• Extending the SVM with the general loss function with the uncertainty measure to enhance its transfer performance.
The remainder of the paper is organized as follows. We start by reviewing related works in Section 2. Section 3 describes the evidence net that is built based on evidence theory for estimating the uncertainty. Section 4 extends the general loss function to SVM. Section 5 presents the experimental results to validate the effectiveness of the proposed method. The conclusion about our exploratory work is given in the last section.

Related Work
In this section, we discuss previous works on domain adaptation that minimize the data difference between the source and target domains. In addition, we introduce the evidence theory that is most related to our work.

Domain Adaptation with Metric Learning
We will briefly introduce the domain adaptation with metric learning. These methods leverage the metric methods to reduce the data difference between two domains.
Maximum mean discrepancy (MMD) [27] takes advantage of the kernel trick to measure the data difference between the source domain and target domain. MMD is widely used in domain adaptation, and several state-of-the-art methods are built on it. Pan et al. [12] propose the transfer component analysis (TCA) model based on MMD; TCA utilizes the MMD to reduce the gap between the source domain and target domain. Long et al. [28] put forward the joint distribution adaptation (JDA) algorithm, which uses the MMD to adapt both the marginal distribution and the conditional distribution in domain adaptation. Ghifary et al. [29] propose a neural network model that embeds the MMD regularization to reduce the distribution mismatch. Long et al. [30] propose a novel framework called adaptation regularization-based transfer learning (ARTL); ARTL optimizes the structural risk functional and jointly adapts both the marginal and conditional distributions by embedding the MMD regularization. Yan et al. [31] propose a weighted domain adaptation network (WDAN) by both incorporating the weighted MMD into a CNN and taking into account the empirical loss on target samples.
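To make the role of MMD concrete, the following sketch estimates the squared MMD between two samples with an RBF kernel (the biased V-statistic estimate); the parameter `gamma` and the toy Gaussian data are illustrative choices, not from the paper:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    """Squared MMD between source and target samples in the RKHS
    induced by the RBF kernel (biased V-statistic estimate)."""
    return (rbf_kernel(Xs, Xs, gamma).mean()
            - 2 * rbf_kernel(Xs, Xt, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shifted = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
# identical distributions give a near-zero MMD; a mean shift inflates it
```

Methods such as TCA minimize exactly this kind of quantity between the mapped source and target data.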
Kullback-Leibler (KL) divergence [32] can measure data distribution differences between the source domain and target domain. Dai et al. [33,34] use the KL divergence to measure the difference between the source domain and target domain and use the difference in co-clustering to improve transfer performance. Zhuang et al. [35] propose a supervised representation learning method based on a deep auto-encoder for domain adaptation; in the embedding layer, the authors use the KL divergence to keep the distributions of the source and target domains similar.
Jensen-Shannon (JS) divergence is similar to KL divergence and measures the difference between the source domain and target domain; however, the JS divergence solves the asymmetry problem of KL divergence. Giles et al. [36] use JS divergence to compare calibration trials on an electroencephalogram dataset for selecting the target user in domain adaptation. Dey et al. [37] employ JS divergence in Information Bottleneck clustering to find clusters in domain adaptation.
The Wasserstein distance derives from the optimal transport problem. It can be used to measure distances between two probability distributions. Shen et al. [38] reduce the discrepancy between the source domain and target domain by gradient property of the Wasserstein distance for improving transfer performance. Lee et al. [39] use the Wasserstein discrepancy between classifiers to align distributions in domain adaptation.
In summary, the core idea of most methods is to minimize the distribution difference between the source and target domain. However, they ignore the uncertainty between the source domain data and the target domain task.

Learning with Evidence Theory
Evidence theory can be considered a generalized probability theory [19,40]. It can represent and measure data uncertainty using the mass function [41], and it uses Dempster's rule to perform uncertainty reasoning [42]. We recall the mass function and Dempster's rule from evidence theory below.

Mass Function
Let Ω = {z_1, z_2, …, z_n} be a finite domain (set) that includes all possible answers to the decision problem, where the elements of the set are mutually exclusive and exhaustive. Ω is called the frame of discernment. In classification problems, the element z_k can be regarded as the kth category, and Ω can be considered the sample space or label space. We denote the power set as 2^Ω; the cardinality of the power set is 2^|Ω|.
The mass function m(·) is the Basic Belief Assignment (BBA) that represents the support degree of evidence; m(·) is a mapping from 2^Ω to the interval [0, 1]. It satisfies the conditions

$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Omega} m(A) = 1, \quad (1)$$

where m(A) measures the support degree for proposition A itself, and m(∅) = 0 expresses that the empty set receives no support. If m(A) > 0, A is called a focal element. In classification problems, if A = {z_k}, m(A) can be interpreted as the support degree (possibility) that an instance belongs to class z_k. If A = Ω, m(A) can be interpreted as the total ignorance degree about the classification result. In this paper, m(Ω) is used to reflect the instance uncertainty. For example, consider a classification problem that distinguishes colors. The frame of discernment is Ω = {red, green, blue}. The power set is 2^Ω = {∅, {red}, {green}, {blue}, {red, green}, {red, blue}, {green, blue}, Ω}, with |Ω| = 3 and 2^|Ω| = 8. m(green|x; E) represents the possibility that x belongs to green based on evidence E. m(Ω|x; E) represents that we cannot determine which class the sample belongs to; it reflects the instance uncertainty.
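The BBA conditions above are easy to check mechanically. The following sketch represents a mass function as a dictionary from focal elements (frozensets) to masses and validates the two conditions on the color example; the helper names are ours, not from the paper:

```python
from itertools import combinations

OMEGA = frozenset({"red", "green", "blue"})

def powerset(omega):
    """All 2^|Omega| subsets of the frame of discernment."""
    elems = sorted(omega)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

def is_valid_bba(m, omega=OMEGA):
    """Check the BBA conditions: m(emptyset) = 0 and the masses sum to 1."""
    total = sum(m.get(A, 0.0) for A in powerset(omega))
    return m.get(frozenset(), 0.0) == 0.0 and abs(total - 1.0) < 1e-9

# Evidence that x is probably green, with residual ignorance m(Omega) = 0.3
m = {frozenset({"green"}): 0.7, OMEGA: 0.3}
```

Here m(Ω) = 0.3 plays exactly the role of the instance uncertainty described in the text.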

Dempster's Rule
Dempster's rule reflects the combined effect of evidence. Let m_1 and m_2 be two mass functions induced by independent items of evidence. They can be combined using Dempster's rule to form a new mass function defined as

$$(m_1 \oplus m_2)(A) = \frac{1}{1-k} \sum_{B \cap C = A} m_1(B)\, m_2(C) \quad (2)$$

for all A ⊆ Ω, A ≠ ∅, with (m_1 ⊕ m_2)(∅) = 0 (⊕ is the combination operator of Dempster's rule). k is the degree of conflict between m_1 and m_2; it can be defined as

$$k = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C). \quad (3)$$
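Dempster's rule as stated above can be sketched directly; the dictionary representation of mass functions is our illustrative choice:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.
    Masses are dicts mapping frozenset focal elements to weights."""
    raw = {}
    conflict = 0.0  # k: total mass assigned to conflicting (disjoint) pairs
    for B, mB in m1.items():
        for C, mC in m2.items():
            A = B & C
            if A:
                raw[A] = raw.get(A, 0.0) + mB * mC
            else:
                conflict += mB * mC
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule undefined")
    return {A: v / (1.0 - conflict) for A, v in raw.items()}

OMEGA = frozenset({"red", "green", "blue"})
m1 = {frozenset({"green"}): 0.6, OMEGA: 0.4}
m2 = {frozenset({"green"}): 0.5, OMEGA: 0.5}
m = dempster_combine(m1, m2)
# two independent items of evidence for "green" reduce m(Omega) to 0.4*0.5 = 0.2
```

Note how accumulating agreeing evidence shrinks the ignorance mass m(Ω), which is the mechanism the evidence net in Section 3 relies on.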

Uncertainty Measure in Domain Adaptation Based on Evidence Theory
In domain adaptation, the key problem of the uncertainty measure is how to evaluate, for each source domain instance, its uncertainty in the target domain classification task. We consider that a lower uncertainty of an instance represents less information loss in domain adaptation. To achieve this, we construct an evidence net based on evidence theory. It consists of two key steps: (1) obtaining a trusty evidence set, and (2) designing the evidence net based on evidence theory. We describe them separately below.

Obtaining the Trusty Evidence Set
Let us consider a simple scenario with a large number of labeled instances from the source domain D_s and a small number of labeled instances from the target domain D_t^l. Given a source domain instance x^s, its evidence set Φ_t consists of similar instances from the target domain and can be formulated as a neighborhood surrounding x^s:
$$\Phi_t = \{x^t_1, x^t_2, \cdots, x^t_n\}, \quad (4)$$

in which x^t_1, x^t_2, …, x^t_n are n target domain instances similar to the source domain instance x^s and n > 10. To ensure the validity of the evidence set, the discrepancy between a source domain instance and the elements of its evidence set should be small. Motivated by this, we design the objective function of obtaining an evidence set for a source domain instance x^s as

$$\min_{\Phi_t} h(x^s, \Phi_t), \quad (5)$$

in which the function h(·) measures the discrepancy between x^s of the source domain and the evidence set Φ_t in a reproducing kernel Hilbert space:

$$h(x^s, \Phi_t) = \left\| \varphi(x^s) - \frac{1}{|\Phi_t|} \sum_{x^t \in \Phi_t} \varphi(x^t) \right\|^2_{\mathcal{H}}, \quad (6)$$

where φ : X → H is the feature mapping, and |Φ_t| is the number of elements in the evidence set. In this paper, we utilize the radial basis function kernel to construct the kernel Hilbert space,

$$K(x^t, x^s) = \exp\left(-\frac{\|x^t - x^s\|^2}{\gamma}\right), \quad (7)$$

in which ‖x^t − x^s‖² is the squared Euclidean distance between two points and γ is a scaling parameter. Substituting K(x^t, x^s) into Equation (6), the function h(·) can be rewritten as

$$h(x^s, \Phi_t) = K(x^s, x^s) - \frac{2}{|\Phi_t|} \sum_{x^t \in \Phi_t} K(x^t, x^s) + \frac{1}{|\Phi_t|^2} \sum_{x^t_i \in \Phi_t} \sum_{x^t_j \in \Phi_t} K(x^t_i, x^t_j). \quad (8)$$

Based on the above analysis, the objective function of Equation (5) to obtain the evidence set can be specified as

$$\min_{\Phi_t \subseteq D_t^l,\; |\Phi_t| = n} h(x^s, \Phi_t). \quad (9)$$

The optimal evidence set Φ_t in Equation (9) can be solved by a greedy search on the labeled target domain.
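The greedy search over the labeled target domain can be sketched as follows; the kernel discrepancy mirrors Equation (8), while the step-by-step selection strategy and parameter values are our assumptions about an unspecified "greedy search":

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Radial basis function kernel of Equation (7)."""
    return np.exp(-np.sum((x - y) ** 2) / gamma)

def discrepancy(xs, phi, gamma=1.0):
    """h(x^s, Phi_t): RKHS distance between phi(x^s) and the mean
    embedding of a candidate evidence set (Equation (8))."""
    n = len(phi)
    cross = sum(rbf(xt, xs, gamma) for xt in phi) / n
    within = sum(rbf(a, b, gamma) for a in phi for b in phi) / n**2
    return rbf(xs, xs, gamma) - 2 * cross + within

def greedy_evidence_set(xs, target_pool, n=12, gamma=1.0):
    """Greedily add the labeled target instance that keeps the
    discrepancy h smallest until the set reaches size n (> 10)."""
    pool = list(range(len(target_pool)))
    chosen = []
    while len(chosen) < n and pool:
        best = min(pool, key=lambda i: discrepancy(
            xs, [target_pool[j] for j in chosen + [i]], gamma))
        chosen.append(best)
        pool.remove(best)
    return chosen

rng = np.random.default_rng(0)
# 12 target instances near x^s = (0, 0) and 12 far away
pool = np.vstack([rng.normal(0.0, 0.1, (12, 2)),
                  rng.normal(5.0, 0.1, (12, 2))])
chosen = greedy_evidence_set(np.zeros(2), pool, n=12)
```

On this toy pool, the search keeps only the target instances in the neighborhood of x^s, which is exactly the intent of Equation (9).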

Constructing Evidence Net Based on Evidence Theory
In evidence theory, suppose that m(·|x; Φ) is the mass function, Ω is the label space, and Φ is the evidence set; then the mass function m(Ω|x; Φ) can represent the uncertainty of x about the classification task. In domain adaptation, Ω comes from the label space of the target domain. In a built-up evidence set Φ_t from the target domain D_t, for an instance x^s from the source domain D_s, m(Ω|x^s; Φ_t) can represent the uncertainty of the source domain instance x^s about the target domain classification task.
In this section, motivated by evidential k-Nearest Neighbor [22] and neural network, we construct an evidence net based on Dempster's rule to calculate m(Ω|x s ; Φ t ). The details of the evidence net are described as follows.
According to Section 3.1, the evidence set Φ_t has been generated from the labeled target domain D_t^l. Given k classes, we decompose the evidence set Φ_t into class-wise subsets Φ_t = Φ_t^1 ∪ Φ_t^2 ∪ ⋯ ∪ Φ_t^k, where Φ_t^k = {x^t_{k1}, x^t_{k2}, …, x^t_{kl}} is the evidence subset in which all the target domain instances have the class label z_k, and x^t_{kl} is the lth element in the evidence subset. According to the decomposition of the evidence set Φ_t and Dempster's rule, the evidence net can be represented in the connectionist formalism as a network with an input layer, three evidence layers L_1, L_2, and L_3, and an output layer.
As shown in Figure 2, the input layer is an instance x^s of the source domain, and the output layer is m(z_k|x^s; Φ_t) and m(Ω|x^s; Φ_t). Each evidence layer L_i (i = 1, 2, 3) corresponds to one step of the procedure described as follows.

(1) Layer L_1 contains n nodes, and we denote the node of layer L_1 as f^1_i(· | x^s; x^t_i). The input of the node is an instance x^s from the source domain D_s. At the fine-grained evidence level, given an element x^t_i of an evidence subset with class label z_k, the node computes the simple mass function

$$m(\{z_k\} \mid x^s; x^t_i) = \alpha \exp\left(-d(x^s, x^t_i)\right), \qquad m(\Omega \mid x^s; x^t_i) = 1 - \alpha \exp\left(-d(x^s, x^t_i)\right), \quad (10)$$

where 0 < α < 1 is a discounting coefficient and the distance d(·) is defined as

$$d(x^s, x^t_i) = 1 - K(x^t_i, x^s), \quad (11)$$

in which K(·) is the radial basis function kernel.

(2) Layer L_2 contains k nodes, and we denote the node as f^2_j(· | x^s; Φ_t^j). Each node combines, by Dempster's rule, the evidence of all elements in the subset Φ_t^j:

$$f^2_j(\cdot \mid x^s; \Phi_t^j) = \bigoplus_{x^t_i \in \Phi_t^j} f^1_i(\cdot \mid x^s; x^t_i), \quad (12)$$

where the orthogonal sum ⊕ represents the combination operator of Dempster's rule. Since every mass function being combined has only the focal elements {z_j} and Ω, the combination admits the closed form

$$m(\Omega \mid x^s; \Phi_t^j) = \prod_{x^t_i \in \Phi_t^j} \left(1 - \alpha \exp\left(-d(x^s, x^t_i)\right)\right), \qquad m(\{z_j\} \mid x^s; \Phi_t^j) = 1 - m(\Omega \mid x^s; \Phi_t^j). \quad (13)$$

(3) In layer L_3, we denote the node as f(· | x^s; Φ_t). f(· | x^s; Φ_t) can be calculated under the entire evidence set Φ_t by combining the class-level nodes f^2_j(· | x^s; Φ_t^j) obtained under the evidence subsets:

$$m(\{z_j\} \mid x^s; \Phi_t) = \frac{1}{\kappa}\, m(\{z_j\} \mid x^s; \Phi_t^j) \prod_{l \neq j} m(\Omega \mid x^s; \Phi_t^l), \qquad m(\Omega \mid x^s; \Phi_t) = \frac{1}{\kappa} \prod_{l=1}^{k} m(\Omega \mid x^s; \Phi_t^l), \quad (14)$$

where κ is a normalizing factor:

$$\kappa = \sum_{j=1}^{k} m(\{z_j\} \mid x^s; \Phi_t^j) \prod_{l \neq j} m(\Omega \mid x^s; \Phi_t^l) + \prod_{l=1}^{k} m(\Omega \mid x^s; \Phi_t^l). \quad (15)$$
m(Ω|x^s; Φ_t) represents the proportion of uncertainty in the target domain classification task for the source domain instance x^s. m(z_k|x^s; Φ_t) represents the possibility that the source domain instance x^s belongs to class z_k of the target domain. In this paper, we use m(Ω|x^s; Φ_t) to measure the uncertainty of source domain data about the target domain task. Algorithm 1 summarizes the evidence net-based uncertainty measure of source domain data in domain adaptation.

Algorithm 1: Evidence net-based uncertainty measure.
1: for each instance x^s in the source domain D_s do
2:   Generate an evidence set Φ_t for x^s according to Equation (9).
3:   Estimate uncertainty m(Ω|x^s; Φ_t) of x^s based on the evidence net f(·|x^s; Φ_t).
4: end for
5: return D_s with m(Ω|x^s; Φ_t).
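The uncertainty estimation step of Algorithm 1 can be sketched as follows. This is a minimal evidential k-NN-style implementation (after Denoeux [22]); the discounting coefficient α and the distance d = 1 − K are assumptions standing in for the paper's exact parameterization:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Radial basis function kernel."""
    return np.exp(-np.sum((x - y) ** 2) / gamma)

def evidence_net_uncertainty(xs, evidence, labels, alpha=0.9, gamma=1.0):
    """Three-layer evidence net sketch: returns the class masses
    m({z_k}|x^s; Phi_t) and the uncertainty m(Omega|x^s; Phi_t)."""
    classes = sorted(set(labels))
    # Layers L1 + L2: fuse the simple mass functions within each class
    # subset; with focal elements {z_k} and Omega, Dempster's rule
    # reduces to a product over the subset.
    m_omega_class = {}
    for z in classes:
        prod = 1.0
        for xt, y in zip(evidence, labels):
            if y == z:
                d = 1.0 - rbf(xs, xt, gamma)      # assumed distance
                prod *= 1.0 - alpha * np.exp(-d)  # 1 - m({z}|x^s; x^t)
        m_omega_class[z] = prod
    # Layer L3: combine the class-level mass functions across classes,
    # with kappa as the normalizing factor.
    all_omega = float(np.prod(list(m_omega_class.values())))
    raw = {z: (1.0 - m_omega_class[z])
              * float(np.prod([m_omega_class[c] for c in classes if c != z]))
           for z in classes}
    kappa = sum(raw.values()) + all_omega
    return {z: v / kappa for z, v in raw.items()}, all_omega / kappa

# Toy evidence set: three class-0 instances near the origin, three
# class-1 instances near (5, 5).
evidence = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = [0, 0, 0, 1, 1, 1]
m_near, u_near = evidence_net_uncertainty(np.array([0.05, 0.05]), evidence, labels)
m_far, u_far = evidence_net_uncertainty(np.array([10.0, 10.0]), evidence, labels)
```

A source instance close to the class-0 evidence receives a low m(Ω) and high m({z_0}); an instance far from all evidence receives a high m(Ω), matching the intended behavior of the measure.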

Learning Algorithm of Adaptive Classifier with Uncertainty Measure
Section 3 has successfully solved the uncertainty measure of source domain data for target domain tasks. In domain adaptation, another key issue is how to use the uncertainty to learn an adaptive classifier. To solve this problem, we propose a general loss function with an uncertainty measure.
The learning algorithm with the uncertainty measure can be transformed into a problem of regularized risk minimization with uncertainty, R[m(Ω|x^s; Φ_t), L(x^s, z, w)]. Thus, the general loss function of the learning algorithm with uncertainty can be written as

$$\min_{w} \sum_{x^s_i \in D_s} \left(1 - m(\Omega \mid x^s_i; \Phi_t)\right) L(x^s_i, z_i, w), \quad (19)$$

where the instance x^s_i comes from the source domain D_s, L(·) is a loss function, and w is the parameter of the model. In order to verify its effectiveness, we extend the loss function with the uncertainty measure to the support vector machine (SVM).
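The essence of the weighting scheme, consistent with the per-instance penalty factor (1 − m(Ω|x^s_i; Φ_t)) · C used in SVMU below, is that each source instance's loss is scaled by its certainty. A minimal sketch (the function name and data are illustrative):

```python
import numpy as np

def weighted_risk(losses, m_omega):
    """General loss with uncertainty measure: each source instance's
    loss L(x^s, z, w) is down-weighted by its certainty 1 - m(Omega)."""
    w = 1.0 - np.asarray(m_omega, dtype=float)
    return float(np.sum(w * np.asarray(losses, dtype=float)))

# Three instances with equal raw loss but different uncertainties:
# a fully uncertain instance (m(Omega) = 1) contributes nothing.
risk = weighted_risk([1.0, 1.0, 1.0], [0.0, 0.5, 1.0])
```

Instances that are highly uncertain with respect to the target task are thereby prevented from dominating training.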

Support Vector Machine with Uncertainty Measure (SVMU)
Based on the general loss function, we propose an improved support vector machine with an uncertainty measure (SVMU), which integrates the uncertainty of the source domain instance about the target domain task to SVM. The SVM uses only one penalty factor to control the balance between margin maximization and misclassification. However, in domain adaptation, due to domain differences, the classification hyperplane controlled by only one penalty factor cannot effectively distinguish classes of the target domain. The SVMU can change the penalty factor by the uncertainty measure. It makes the instances of the source domain that are beneficial to the target domain classification task become the new support vectors and diminishes the importance of some instances that have negative effects. Thus, SVMU is more flexible and superior in domain adaptation than SVM. The details of SVMU are described as follows.
SVM maps the input points into a high-dimensional feature space and finds a separating hyperplane that maximizes the margin between two classes in that space. According to the general loss function, Equation (19), the optimization problem for SVMU is then regarded as the solution to

$$\min_{w, b, \xi}\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \left(1 - m(\Omega \mid x^s_i; \Phi_t)\right) \xi_i \quad (20)$$

subject to

$$z_i \left( w^{\mathrm{T}} \varphi(x^s_i) + b \right) \geq 1 - \xi_i, \qquad \xi_i \geq 0, \quad i = 1, \ldots, N, \quad (21)$$

where the parameter ξ_i is the slack variable, C > 0 is the penalty factor, which controls the trade-off between the slack variable penalty and the margin, φ(·) denotes a fixed feature-space transformation, and b is the bias parameter.
To solve this optimization problem, we construct the Lagrangian function

$$L(w, b, \xi, \sigma, \lambda) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \left(1 - m(\Omega \mid x^s_i; \Phi_t)\right) \xi_i - \sum_{i=1}^{N} \sigma_i \left[ z_i \left( w^{\mathrm{T}} \varphi(x^s_i) + b \right) - 1 + \xi_i \right] - \sum_{i=1}^{N} \lambda_i \xi_i. \quad (22)$$

To find the saddle point of L(w, b, ξ, σ, λ), the parameters satisfy the following conditions:

$$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N} \sigma_i z_i \varphi(x^s_i), \qquad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \sigma_i z_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = 0 \Rightarrow \left(1 - m(\Omega \mid x^s_i; \Phi_t)\right) C = \sigma_i + \lambda_i. \quad (23)$$

By applying these conditions to the Lagrangian function (22), problem (20) can be transformed into

$$\max_{\sigma}\; \sum_{i=1}^{N} \sigma_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_i \sigma_j z_i z_j K(x^s_i, x^s_j) \quad (24)$$

subject to

$$\sum_{i=1}^{N} \sigma_i z_i = 0, \qquad 0 \leq \sigma_i \leq \left(1 - m(\Omega \mid x^s_i; \Phi_t)\right) C, \quad (25)$$

where K(·) is a kernel function, and the Karush-Kuhn-Tucker (KKT) conditions are defined as

$$\sigma_i \left[ z_i \left( w^{\mathrm{T}} \varphi(x^s_i) + b \right) - 1 + \xi_i \right] = 0, \qquad \lambda_i \xi_i = 0, \quad i = 1, \ldots, N. \quad (26)$$

The optimal solution of (24) can be denoted as σ* = (σ*_1, σ*_2, …, σ*_N), where an x^s_i corresponding to σ*_i > 0 is a support vector. The support vector x^s_i falls exactly on the margin boundary if 0 < σ*_i < (1 − m(Ω|x^s_i; Φ_t)) · C. If σ*_i = (1 − m(Ω|x^s_i; Φ_t)) · C and 0 < ξ_i < 1, then the classification is correct and x^s_i lies between the margin boundary and the hyperplane. If σ*_i = (1 − m(Ω|x^s_i; Φ_t)) · C and ξ_i = 1, then x^s_i is on the classification hyperplane; if σ*_i = (1 − m(Ω|x^s_i; Φ_t)) · C and ξ_i > 1, then x^s_i is on the misclassified side of the classification hyperplane.
In the traditional SVM, the single penalty factor C controls the balance between margin maximization and misclassification. A larger C allows the SVM to have fewer misclassifications and a narrower margin. Conversely, a smaller C makes the SVM ignore more training points and obtain a larger margin. Due to the uncertainty of the source domain data about the target domain task, it is difficult to control the balance between margin maximization and misclassification in the target domain task with only one penalty factor. This may result in negative transfer when using SVM as the classifier.
Based on the above analysis, applying the uncertainty to SVM, the single penalty factor C becomes (1 − m(Ω|x^s_i; Φ_t)) · C, so the number of penalty factors increases from one to the number of source domain instances. Each support vector corresponds to a penalty factor (1 − m(Ω|x^s_i; Φ_t)) · C with an uncertainty measure instead of a single constant value C. Thus, the selection of support vectors does not rely on a single penalty factor but is determined by the uncertainty m(Ω|x^s_i; Φ_t) of each source domain instance with respect to the target domain task. As shown in Figure 3, changing the penalty factor C by the uncertainty m(Ω|x^s_i; Φ_t) makes the instances of the source domain that are beneficial to the target domain classification task become the new support vectors and diminishes the importance of instances that have negative effects. The classification hyperplane generated by these new support vectors is suited to discriminate the target data. Thus, integrating the uncertainty into SVM can adjust the classification hyperplane to suit the target domain task.
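The per-instance penalty (1 − m(Ω|x^s_i; Φ_t)) · C can be realized with off-the-shelf solvers: in scikit-learn, `SVC.fit` accepts a `sample_weight` that rescales C per sample. This is a sketch of that route, not the paper's implementation; the toy data and uncertainty values are our assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def fit_svmu(Xs, zs, m_omega, C=1.0, gamma=1.0):
    """SVMU sketch: sample_weight multiplies C per sample, realizing
    the per-instance penalty (1 - m(Omega|x^s; Phi_t)) * C of the
    SVMU primal (Equation (20))."""
    clf = SVC(C=C, kernel="rbf", gamma=gamma)
    clf.fit(Xs, zs, sample_weight=1.0 - np.asarray(m_omega))
    return clf

# Toy source domain: two clusters, plus one instance whose label
# conflicts with the target task and was assigned uncertainty 0.99.
Xs = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3], [0.1, -0.2],
               [3.0, 0.0], [3.1, 0.2], [2.9, -0.1], [3.2, 0.1],
               [2.95, 0.05]])
zs = np.array([-1, -1, -1, -1, 1, 1, 1, 1, -1])
m_omega = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.99])
clf = fit_svmu(Xs, zs, m_omega)
# the highly uncertain conflicting instance has its dual coefficient
# bounded by 0.01 * C, so it barely influences the hyperplane
```

The bound 0 ≤ σ_i ≤ (1 − m(Ω|x^s_i; Φ_t)) · C from the dual is exactly what `sample_weight` enforces here.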

Experiments
In the experiments, we evaluate the adaptive classifier with an uncertainty measure on various kinds of data, including texts and images. The descriptions of the datasets are listed below.
The Amazon product reviews dataset [43] is a benchmark text corpus widely used for domain adaptation evaluation. The reviews are about four product domains: books (denoted as B), DVDs (denoted as D), electronics (denoted as E), and kitchen appliances (denoted as K). Each review is assigned a sentiment label, −1 (negative review) or +1 (positive review), based on the rating score given by the review author. In each domain, there are 1000 positive reviews and 1000 negative reviews. On this dataset, we construct 12 cross-domain sentiment classification tasks, where the word before an arrow corresponds to the source domain and the word after an arrow corresponds to the target domain. In each cross-domain classification task, we extract the features of the texts by using the word2vec tool.
The Office+Caltech dataset [44] is commonly used for the task of visual object recognition in domain adaptation. It includes four domains: Amazon (denoted as A, images downloaded from online merchants), Webcam (denoted as W, low-resolution images from a web camera), DSLR (denoted as D, high-resolution images from a digital SLR camera), and Caltech-256 (denoted as C). The dataset includes 10 classes: backpack, touring bike, calculator, headphones, computer keyboard, laptop-101, computer monitor, computer mouse, coffee mug, and video projector. There are 8 to 151 samples per category per domain and 2533 images in total. On this dataset, we construct 12 cross-domain multi-classification tasks.

In the experiment, for each domain adaptation classification task, we use the classification accuracy on the target domain as the evaluation criterion:

$$\text{Accuracy} = \frac{\left|\{x \in D_t : v(x) = y\}\right|}{|D_t|},$$

where D_t is the target domain dataset, y is the ground-truth label of x, and v(x) is the label predicted by the classifier.
(1) Testing on the Amazon product reviews dataset. In this testing, we evaluate SVM with an uncertainty measure (SVMU) on the Amazon product reviews dataset. The classification accuracies of the comparative study are listed in Table 1; compared with the baseline methods, SVMU gains performance improvements of 2.70%, 3.33%, and 0.9%. The average classification accuracies of TCA, CORAL, GFK, JDA, and KMM on Amazon product reviews are 76.63%, 70.03%, 73.52%, 76.76%, and 77.61%, respectively. These methods aim to minimize the difference between the source and target domains while ignoring the uncertainty of the instances in the source domain with respect to the task. Although they can find a representation space with the greatest commonality between the source and target domains, they cannot determine whether a source domain instance is suitable for the target domain task, which limits their performance. The performance improvements of our method over them are 6.17%, 12.77%, 9.28%, 6.04%, and 5.19%, respectively. Since these text classification results are obtained over a large number of tasks, they convincingly verify that SVMU is reliable and effective for classifying cross-domain text accurately.
(2) Testing on the Office+Caltech datasets. In this testing, we evaluate SVM with an uncertainty measure (SVMU) on the Office+Caltech datasets. In each cross-domain classification task, we extract the features of images by speeded up robust features (SURF). The classification accuracies of the comparative study are listed in Table 2. It is obvious that SVMU achieves better performance than the comparison methods on the Office+Caltech datasets. Specifically, the average classification accuracy of SVMU on the 12 cross-domain classification tasks is 50.60%, which gains significant performance improvements of 4.16%, 4.1%, 5.22%, 4.04%, 4.55%, 4.63%, 3.23%, and 2.93% over the baseline methods. The experimental results reveal that the improved SVM with the uncertainty measure is reliable and effective in cross-domain image classification tasks.

Effectiveness Verification of Uncertainty Measure
In this experiment, we verify the effectiveness of the uncertainty measure from three views: (1) Testing on synthetic data, visualizing the classification hyperplane of an adaptive classifier with and without an uncertainty measure. (2) Testing on real-world datasets, comparing the performance of SVM with and without an uncertainty measure. (3) Case study, explaining the role of uncertainty measure.

Testing on Synthetic Data
In order to demonstrate the effectiveness of the adaptive classifier with an uncertainty measure in domain adaptation, we visualize the classification hyperplane of an adaptive classifier on a synthetic dataset. The synthetic dataset is generated from a Gaussian distribution x ∼ N (µ, σ), where µ and σ are the mean and standard deviation, respectively. We apply different µ and σ to generate the data from the source domain and target domain.
In the dataset, the source domain and target domain consist of two-dimensional data points under two classes, and each class has 500 data points. The source domain is marked by a pentagram, and the target domain is marked by a triangle. Class 1 is marked in orange, and Class 2 is marked in dark slate-gray.
In Figure 4, (a) and (b) show the classification hyperplanes that are generated based on the source domain by SVM and SVMU, respectively. Due to the difference in data distribution between the source and target domains, the classification hyperplane generated by SVM cannot accurately distinguish the categories of the target domain and cannot satisfy the domain adaptation task. In contrast, the classification hyperplane generated by SVMU can accurately classify the target domain categories, and the classification results are shown in (a) and (b). The experimental results are consistent with the conclusions about SVMU in Section 4.1. Therefore, the uncertainty measure is effective and can improve the transfer performance of the adaptive classifier.

Testing on Real-World Datasets
To further explain the effectiveness of the adaptive classifier with uncertainty, we compare the SVM with and without uncertainty on the Amazon product reviews dataset. Figure 5 shows the result of SVM with and without uncertainty on the Amazon product reviews dataset; it is obvious that in all the cross-domain text tasks, SVMU achieves better performance than SVM. SVMU improves the transfer accuracy over SVM on the 12 subtasks by 11.71%, 6.62%, 8.28%, 8.06%, 9.46%, 12.1%, 5.24%, 9.89%, 14.82%, 6.56%, 11.27%, and 12.48%. Comparing the average classification accuracy, SVMU improves over SVM by 9.71%. The above results show that introducing the uncertainty between the source domain data and the target domain task is effective at enhancing the transfer performance of the adaptive classifier.

Figure 5. Cross-domain sentiment classification accuracies on Amazon product reviews generated by SVM with and without uncertainty.

Case Study
Based on the above sub-experiments, it can be verified that the uncertainty measure is able to enhance adaptive classifier transfer performance. To explain the role of the uncertainty measure on the transfer process for an adaptive classifier, we use the Caltech-256 image data (complex background) as the source domain and the Amazon image data (no background) as the target domain. When the Caltech-256 dataset transfers to the Amazon dataset, the uncertainty values of some instances in the backpack and bicycle categories in the Caltech-256 dataset are shown.
As shown in Figure 6, for images (a1) to (a6) in the Caltech-256 dataset, it can be found that (a1) and (a2) are cartoon images of a backpack, and (a5) and (a6) are bicycles with obscure features. These instances are not significantly helpful for the target domain classification task. On the contrary, in (a3) and (a4), the features of the backpack and bicycle are obvious and beneficial for the target domain classification task. We use the evidence net to calculate the uncertainty between (a1)-(a6) and the target domain task. As shown in Figure 6, the uncertainties of (a1), (a2), (a5), and (a6) calculated by the evidence net are high: 0.75, 0.82, 0.97, and 0.92, respectively. The uncertainties of (a3) and (a4) are low, at 0.09 and 0.03, respectively. When the Caltech-256 dataset transfers to the Amazon dataset, the images (a1)-(a6) no longer fully belong to the categories of backpack and bicycle. (a1), (a2), and (a3) belong to the backpack category with possibilities 0.25, 0.18, and 0.91 and uncertainties 0.75, 0.82, and 0.09, respectively. (a4), (a5), and (a6) belong to the bicycle category with possibilities 0.97, 0.03, and 0.08 and uncertainties 0.03, 0.97, and 0.92, respectively. Based on the above results, it can be found that our proposed uncertainty measure is consistent with human cognition. Therefore, the uncertainty can accurately measure the adaptability of instances with respect to the target domain task.

Conclusions
In this article, based on evidence theory, we revisited domain adaptation from the perspective of source domain data uncertainty and thereby devised a reliable adaptive classifier with the uncertainty measure. Specifically, to solve the uncertainty measure between the source domain data and the target domain task, we designed an evidence net based on evidence theory. To solve the problem of model learning with a data uncertainty measure, we proposed a general loss function with an uncertainty measure for an adaptive classifier and extended the loss function to the support vector machine. Experiments on text and image datasets validate that the proposed uncertainty measure is effective at improving the transfer performance of an adaptive classifier. In the future, we plan to extend the classifier with the uncertainty measure to handle domain adaptation with multiple source domains and domain adaptation on open sets.