Discriminative Sparse Filtering for Multi-Source Image Classification
Abstract
1. Introduction
- We propose a novel unsupervised domain adaptation solution that reduces domain discrepancy and extracts discriminative features simultaneously. In contrast to existing works, the proposed method models feature distinctiveness with an explicit constraint. Comparisons with state-of-the-art methods show that our method performs well in both accuracy and efficiency.
- Alternating discriminant optimization is proposed to obtain discriminative features in the labeled source domain; it utilizes an l2 objective to measure feature distinctiveness. We use a toy example to demonstrate how it works.
- We combine sparse filtering and maximum mean discrepancy into an integrated framework, and propose a unified optimization method with full-batch and mini-batch gradient descent (a minimal sketch of the combined objective is given after this list).
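The sketch below (ours, not the authors' reference code) illustrates, under simplifying assumptions, how these three ingredients could be combined into one objective: a sparse filtering term on the unlabeled target features, an l2 discriminant term on the labeled source features, and a linear (mean-difference) MMD term between the two domains. The weights lam_ado and lam_mmd stand in for the balance factors, and M is an auxiliary class-weight matrix; all names are illustrative.

```python
import numpy as np

def sparse_filtering_term(Z, eps=1e-8):
    """Sparse filtering loss on a feature matrix Z of shape (n_samples, n_features)."""
    F = np.sqrt(Z ** 2 + eps)                               # soft-absolute activations
    F = F / np.linalg.norm(F, axis=0, keepdims=True)        # normalize each feature across samples
    F = F / np.linalg.norm(F, axis=1, keepdims=True)        # normalize each sample across features
    return F.sum()                                          # l1 norm of the normalized features

def dsf_objective(W, M, Xs, Ys_onehot, Xt, lam_ado=1.0, lam_mmd=1.0):
    """Combined objective: target sparsity + source discriminability + domain discrepancy."""
    Zs, Zt = Xs @ W, Xt @ W                                 # shared linear feature mapping
    sparsity = sparse_filtering_term(Zt)                    # sparsity of the target features
    discrim = np.sum((Zs @ M - Ys_onehot) ** 2)             # l2 discriminant term on the source
    mmd = np.sum((Zs.mean(axis=0) - Zt.mean(axis=0)) ** 2)  # linear MMD between the two domains
    return sparsity + lam_ado * discrim + lam_mmd * mmd
```

In this reading, both W and M would be updated by gradient descent on the combined objective, in either full-batch or mini-batch form.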
2. Related Works
2.1. Transfer Learning and Domain Adaptation
2.2. Sparse Filtering
- Population sparsity: Each example should be represented by only a few active features. Since non-zero elements represent activated features, each sample should have only a few non-zero elements.
- Lifetime sparsity: Good features should be discriminative, so a feature should be activated in only a few samples. For example, if we want to classify cats and dogs, a "has a tail" feature is activated for every sample and is therefore not a good feature.
- High dispersal: Each feature should have similar statistical activity across all samples; no single feature should be significantly more active than the others. This avoids extracting features that are activated on only a very few samples and prevents the extraction of near-duplicate features. A minimal sketch of the sparse filtering objective that encodes these three properties follows this list.
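As a concrete illustration (a minimal sketch assuming the standard sparse filtering formulation of Ngiam et al. [27]; variable names are ours), the objective below normalizes each feature across samples and then each sample across features before taking the l1 norm, which jointly encourages the three properties above.

```python
import numpy as np

def sparse_filtering_objective(X, W, eps=1e-8):
    """X: (n_samples, d_in) data, W: (d_in, d_out) filters. Returns the loss to minimize."""
    F = np.sqrt((X @ W) ** 2 + eps)                   # soft-absolute activations (non-negative)
    # Normalize each feature (column) across all samples: no feature can dominate,
    # which enforces high dispersal.
    F = F / np.linalg.norm(F, axis=0, keepdims=True)
    # Normalize each sample (row) and sum the result (l1 norm): minimizing this drives
    # each sample to use only a few features (population sparsity); together with high
    # dispersal this also yields lifetime sparsity.
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    return F.sum()

# Toy usage with random data and a random projection.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
W = rng.normal(size=(20, 10))
print(sparse_filtering_objective(X, W))
```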
2.3. Maximum Mean Discrepancy
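As a reference point, the snippet below is a minimal, generic sketch (not tied to this paper's implementation) of the empirical MMD estimate with a Gaussian kernel; the bandwidth heuristic and all names are our own choices.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(Xs, Xt, gamma=None):
    """Biased empirical estimate of the squared MMD between two samples."""
    if gamma is None:                                  # simple median-distance bandwidth heuristic
        Z = np.vstack([Xs, Xt])
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        gamma = 1.0 / (np.median(d2[d2 > 0]) + 1e-12)
    k_ss = rbf_kernel(Xs, Xs, gamma).mean()
    k_tt = rbf_kernel(Xt, Xt, gamma).mean()
    k_st = rbf_kernel(Xs, Xt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Identical distributions give a small MMD; shifted distributions give a larger one.
rng = np.random.default_rng(0)
print(mmd2(rng.normal(0, 1, (50, 5)), rng.normal(0, 1, (50, 5))))
print(mmd2(rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))))
```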
3. Methodology
3.1. Framework of Discriminative Sparse Filtering
3.2. Target Domain Sparsity: Sparse Filtering
3.3. Source Domain Discriminability: Alternating Discriminant Optimization
Algorithm 1: Alternating Discriminant Optimization.
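Algorithm 1 itself is not reproduced here; the sketch below shows one plausible reading of an alternating scheme for the l2 discriminant objective, in which an auxiliary class-weight matrix M is solved in closed form (ridge-regression style) with W fixed, and W is then updated by a gradient step with M fixed. The objective ||Xs W M - Y||^2 and the update rules shown are our assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def alternating_discriminant_step(W, Xs, Ys_onehot, lr=1e-3, ridge=1e-2):
    """One alternating update for the assumed objective ||Xs W M - Y||^2."""
    Zs = Xs @ W                                        # source features under the current W
    d = Zs.shape[1]
    # Step 1: fix W and solve the auxiliary weight matrix M in closed form
    # (ridge regression of the one-hot labels on the current features).
    M = np.linalg.solve(Zs.T @ Zs + ridge * np.eye(d), Zs.T @ Ys_onehot)
    # Step 2: fix M and take a gradient step on W for the same l2 objective.
    residual = Zs @ M - Ys_onehot                      # (n_source, n_classes)
    grad_W = 2.0 * Xs.T @ residual @ M.T               # d/dW of ||Xs W M - Y||^2
    return W - lr * grad_W, M
```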
3.4. Domain Discrepancy: MMD
3.5. Optimization
3.5.1. Optimization of
3.5.2. Optimization of
3.5.3. Optimization of
4. Experiments
4.1. Data Set
4.1.1. Office-Caltech10
4.1.2. ImageCLEF
4.2. Experimental Setting
- Nearest neighbor (NN): NN serves as a baseline model to check whether the learned representations actually work for DA problems.
- Joint distribution alignment (JDA) [11]: JDA [ICCV 2013] adopts pseudo labels to align the conditional distributions of the two domains.
- Correlation alignment (CORAL) [30]: CORAL [AAAI 2016] obtains transferable representations by aligning the second-order statistics of the distributions.
- Confidence-aware pseudo-label selection (CAPLS) [34]: CAPLS [IJCNN 2019] uses a selective pseudo-labeling procedure to obtain more reliable labels.
- Modified A-distance sparse filtering (MASF) [35]: MASF [Pattern Recognit. 2020] combines an L2 constraint with sparse filtering to learn representations that are both domain-shared and discriminative.
- Selective pseudo-labeling (SPL) [36]: SPL [AAAI 2020] is a selective pseudo-labeling strategy based on structured prediction.
- Generalized softmax (GSMAX) [37]: GSMAX [Inf. Sci. 2020] aims to learn smooth representations from both the labeled source domain and the unlabeled target domain.
4.3. Implementation Details
4.4. Results
- DSF vs. NN. DSF is significantly better than NN. NN cannot handle the domain discrepancy and thus yields unsatisfactory performance; conversely, this indicates that our method is able to learn transferable representations.
- DSF vs. CORAL and JDA. DSF is superior to CORAL and JDA. These two classical distribution matching methods give only limited consideration to the discriminability of the learned representations.
- DSF vs. MASF. MASF is another sparse filtering based framework, which adopts a modified A-distance for domain alignment. Unlike our method, it cannot ensure that the learned representations are easy to classify.
- DSF vs. CAPLS and SPL. DSF achieves performance comparable to these state-of-the-art methods, with average accuracy only about 0.5% lower. This shows that the proposed discriminative features are applicable to domain adaptation problems.
4.5. Empirical Analysis
4.5.1. Ablation Study
4.5.2. Parameter Sensitivity Analysis
4.5.3. Running Time
5. Discussion
5.1. Mini-Batch versus Full-Batch
5.1.1. Implementation of Mini-Batch-Based Optimization
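For concreteness, the loop below sketches the mini-batch mechanics discussed in this section: at each iteration, equally sized batches are drawn from the source and target domains and W is updated with the gradient of the sampled loss. For brevity the loss here is only the linear MMD term (whose gradient in W has a simple closed form), so this illustrates the batching pattern rather than the full DSF objective; the batch size, learning rate, and iteration count are placeholders.

```python
import numpy as np

def minibatch_mmd_sgd(Xs, Xt, dim_out, batch=64, lr=1e-2, iters=500, seed=0):
    """Mini-batch SGD on the linear MMD term only (illustrates the batching pattern)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(Xs.shape[1], dim_out))
    for _ in range(iters):
        bs = rng.choice(len(Xs), size=min(batch, len(Xs)), replace=False)
        bt = rng.choice(len(Xt), size=min(batch, len(Xt)), replace=False)
        diff = Xs[bs].mean(axis=0) - Xt[bt].mean(axis=0)   # mean difference in input space
        grad = 2.0 * np.outer(diff, diff @ W)              # gradient of ||diff @ W||^2 w.r.t. W
        W -= lr * grad
    return W
```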
5.1.2. Influence of Mini-Batch SGD
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
- Wang, M.; Deng, W. Deep visual domain adaptation: A survey. Neurocomputing 2018, 312, 135–153.
- Busto, P.P.; Iqbal, A.; Gall, J. Open Set Domain Adaptation for Image and Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 413–429.
- Cai, R.; Li, J.; Zhang, Z.; Yang, X.; Hao, Z. DACH: Domain Adaptation Without Domain Information. IEEE Trans. Neural Netw. Learn. Syst. 2020, 99, 1–13.
- Zhao, H.; Des Combes, R.T.; Zhang, K.; Gordon, G. On Learning Invariant Representation for Domain Adaptation. arXiv 2019, arXiv:1901.09453.
- Song, L.; Wang, C.; Zhang, L.; Du, B.; Zhang, Q.; Huang, C.; Wang, X. Unsupervised domain adaptive re-identification: Theory and practice. Pattern Recognit. 2020, 102, 107173.
- Dai, W.; Yang, Q.; Xue, G.R.; Yu, Y. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 193–200.
- Ben-David, S.; Blitzer, J.; Crammer, K.; Pereira, F. Analysis of representations for domain adaptation. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 4–5 December 2006; pp. 137–144.
- Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. Learn. Syst. 2011, 22, 199–210.
- Smola, A.J.; Gretton, A.; Song, L.; Schölkopf, B. A Hilbert Space Embedding for Distributions. Int. Conf. Algorithmic Learn. Theory 2007, 4754, 13–31.
- Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer Feature Learning with Joint Distribution Adaptation. In Proceedings of the International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2200–2207.
- Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2066–2073.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3320–3328.
- Ghifary, M.; Kleijn, W.B.; Zhang, M. Domain adaptive neural networks for object recognition. arXiv 2014, arXiv:1409.6041.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Tzeng, E.; Hoffman, J.; Zhang, N.; Saenko, K.; Darrell, T. Deep domain confusion: Maximizing for domain invariance. arXiv 2014, arXiv:1412.3474.
- Long, M.; Cao, Y.; Cao, Z.; Wang, J.; Jordan, M.I. Transferable Representation Learning with Deep Adaptation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 3071–3085.
- Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Unsupervised domain adaptation with residual transfer networks. In Proceedings of the Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 136–144.
- Ganin, Y.; Lempitsky, V.S. Unsupervised Domain Adaptation by Backpropagation. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1180–1189.
- Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2208–2217.
- Pei, Z.; Cao, Z.; Long, M.; Wang, J. Multi-Adversarial Domain Adaptation. In Proceedings of the National Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 3934–3941.
- Zhang, W.; Ouyang, W.; Li, W.; Xu, D. Collaborative and Adversarial Network for Unsupervised Domain Adaptation. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3801–3809.
- Weng, J.; Young, D.S. Some dimension reduction strategies for the analysis of survey data. J. Big Data 2017, 4, 43.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
- Le, Q.V.; Karpenko, A.; Ngiam, J.; Ng, A.Y. ICA with reconstruction cost for efficient overcomplete feature learning. In Proceedings of the Neural Information Processing Systems, Granada, Spain, 12–14 December 2011; pp. 1017–1025.
- d'Aspremont, A.; Ghaoui, L.E.; Jordan, M.I.; Lanckriet, G.R. A direct formulation for sparse PCA using semidefinite programming. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; pp. 41–48.
- Ngiam, J.; Chen, Z.; Bhaskar, S.A.; Koh, P.W.; Ng, A.Y. Sparse filtering. In Proceedings of the Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; pp. 1125–1133.
- Zhang, Z.; Xu, Y.; Yang, J.; Li, X.; Zhang, D. A Survey of Sparse Representation: Algorithms and Applications. IEEE Access 2015, 3, 490–530.
- Long, M.; Wang, J.; Sun, J.; Yu, P.S. Domain Invariant Transfer Kernel Learning. IEEE Trans. Knowl. Data Eng. 2015, 27, 1519–1532.
- Sun, B.; Feng, J.; Saenko, K. Return of frustratingly easy domain adaptation. In Proceedings of the National Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2058–2065.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- Dash, M.; Liu, H. Feature selection for classification. Intell. Data Anal. 1997, 1, 131–156.
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
- Wang, Q.; Bu, P.; Breckon, T.P. Unifying Unsupervised Domain Adaptation and Zero-Shot Visual Recognition. In Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary, 14–19 July 2019; pp. 1–8.
- Han, C.; Lei, Y.; Xie, Y.; Zhou, D.; Gong, M. Visual Domain Adaptation Based on Modified A Distance and Sparse Filtering. Pattern Recognit. 2020, 104, 107254.
- Wang, Q.; Breckon, T.P. Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling. In Proceedings of the National Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 1–10.
- Han, C.; Lei, Y.; Xie, Y.; Zhou, D.; Gong, M. Learning Smooth Representations with Generalized Softmax for Unsupervised Domain Adaptation. Inf. Sci. 2020.
| Notations | Description |
|---|---|
|  | source/target domain |
|  | original source/target domain data |
|  | source/target domain labels |
|  | number of source/target samples |
|  | original/transformed feature dimension |
|  | transformed source/target domain data |
|  | mapping function |
| W | the transformation matrix to be solved |
|  | weight matrix for alternating discriminant optimization |
|  | objective function for sparsity in the target domain |
|  | objective function for alternating discriminant optimization in the source domain |
|  | objective function for domain discrepancy |
|  | balance factors among the three objectives |
Table: Classification accuracy (%) on the Office-Caltech10 (tasks 1–12) and ImageCLEF (tasks 13–18) transfer tasks.

No. | Task | NN | JDA | CORAL | CAPLS | MASF | SPL | GSMAX | DSF |
---|---|---|---|---|---|---|---|---|---|
1 | C→A | 85.69 | 89.77 | 92.00 | 90.90 | 90.81 | 92.80 | 92.48 | 91.12 |
2 | C→W | 66.10 | 83.73 | 80.00 | 88.83 | 87.46 | 85.08 | 81.02 | 91.52 |
3 | C→D | 74.52 | 86.62 | 84.70 | 90.08 | 89.81 | 91.72 | 89.81 | 89.17 |
4 | A→C | 70.35 | 82.28 | 83.20 | 80.66 | 87.36 | 81.39 | 85.31 | 83.88 |
5 | A→W | 57.29 | 78.64 | 74.60 | 80.69 | 81.02 | 84.07 | 81.69 | 82.03 |
6 | A→D | 64.97 | 80.25 | 84.10 | 89.45 | 86.62 | 90.45 | 87.26 | 89.17 |
7 | W→C | 60.37 | 83.53 | 75.50 | 86.62 | 85.04 | 74.00 | 81.39 | 81.92 |
8 | W→A | 62.53 | 90.19 | 81.20 | 91.38 | 91.34 | 91.96 | 77.97 | 89.35 |
9 | W→D | 98.73 | 100.00 | 100.00 | 100.00 | 99.36 | 100.00 | 97.45 | 100.00 |
10 | D→C | 52.09 | 85.13 | 76.80 | 88.05 | 85.75 | 88.51 | 84.95 | 84.23 |
11 | D→A | 62.73 | 91.44 | 85.50 | 92.32 | 90.40 | 93.32 | 90.61 | 91.44 |
12 | D→W | 89.15 | 98.98 | 99.30 | 98.66 | 98.98 | 100.00 | 98.98 | 98.30 |
13 | C→I | 85.16 | 92.00 | 83.00 | 91.00 | 89.83 | 90.83 | 87.33 | 93.16 |
14 | C→P | 69.16 | 75.50 | 71.50 | 77.33 | 72.83 | 78.17 | 70.39 | 75.63 |
15 | I→C | 91.16 | 92.33 | 88.66 | 94.17 | 93.17 | 94.33 | 92.83 | 95.66 |
16 | I→P | 73.16 | 77.00 | 73.66 | 75.80 | 76.83 | 77.50 | 78.68 | 77.49 |
17 | P→C | 81.33 | 83.83 | 72.50 | 90.67 | 85.33 | 91.33 | 91.33 | 85.83 |
18 | P→I | 74.50 | 79.16 | 72.33 | 85.00 | 80.83 | 85.83 | 86.67 | 82.50 |
19 | AVG | 73.28 | 86.13 | 82.14 | 88.42 | 87.38 | 88.40 | 86.45 | 87.91 |
Table: Average classification accuracy (%) and average running time (s) over all 18 tasks.

 | NN | JDA | CORAL | CAPLS | MASF | SPL | GSMAX | DSF |
---|---|---|---|---|---|---|---|---|
Accuracy (%) | 73.28 | 86.13 | 82.14 | 88.42 | 87.38 | 88.40 | 86.45 | 87.91 |
Time (s) | 0.503 | 13.114 | 16.441 | 1078 | 7.228 | 1326 | 2.499 | 6.176 |
Table: Ablation study on the ImageCLEF tasks: classification accuracy (%) with and without the MMD and ADO terms.

MMD | ADO | C→I | C→P | I→C | I→P | P→C | P→I | AVG |
---|---|---|---|---|---|---|---|---|
✗ | ✗ | 91.17 | 76.48 | 95.00 | 75.31 | 86.83 | 82.83 | 84.60 |
✗ | ✓ | 92.67 | 76.31 | 95.00 | 76.65 | 87.00 | 83.83 | 85.24 |
✓ | ✗ | 92.17 | 75.63 | 95.83 | 76.82 | 89.00 | 83.67 | 85.52 |
✓ | ✓ | 93.00 | 78.17 | 96.33 | 77.16 | 89.17 | 83.50 | 86.22 |
Table: Running time (s) of each method on each task.

No. | Task | NN | JDA | CORAL | CAPLS | MASF | SPL | GSMAX | DSF |
---|---|---|---|---|---|---|---|---|---|
1 | C→A | 1.134 | 33.893 | 25.513 | 2516.933 | 13.098 | 3206.378 | 3.936 | 10.515 |
2 | C→W | 0.770 | 16.354 | 23.638 | 897.388 | 9.586 | 1030.175 | 2.484 | 6.755 |
3 | C→D | 0.776 | 13.224 | 23.356 | 842.526 | 8.389 | 921.195 | 2.677 | 5.905 |
4 | A→C | 1.240 | 33.622 | 23.650 | 3278.291 | 12.424 | 4365.437 | 3.700 | 11.097 |
5 | A→W | 0.770 | 12.780 | 23.045 | 2870.206 | 8.399 | 3528.711 | 2.412 | 5.879 |
6 | A→D | 0.628 | 10.275 | 23.005 | 2725.151 | 7.513 | 3272.032 | 3.224 | 4.716 |
7 | W→C | 0.655 | 16.215 | 22.761 | 735.878 | 9.288 | 833.277 | 2.229 | 6.479 |
8 | W→A | 0.554 | 12.702 | 22.771 | 1510.084 | 7.939 | 1840.393 | 1.874 | 5.528 |
9 | W→D | 0.239 | 1.619 | 22.554 | 319.998 | 4.174 | 369.574 | 1.612 | 1.951 |
10 | D→C | 0.572 | 13.368 | 23.084 | 519.365 | 8.080 | 622.318 | 1.764 | 5.375 |
11 | D→A | 0.490 | 10.542 | 23.049 | 1336.746 | 7.332 | 1751.001 | 1.731 | 4.678 |
12 | D→W | 0.201 | 1.631 | 22.370 | 241.107 | 4.257 | 278.883 | 1.481 | 1.982 |
13 | C→I | 0.158 | 10.035 | 2.991 | 290.860 | 4.961 | 341.601 | 0.873 | 6.798 |
14 | C→P | 0.155 | 9.949 | 2.876 | 264.467 | 4.922 | 304.481 | 0.835 | 6.893 |
15 | I→C | 0.162 | 9.989 | 2.850 | 293.450 | 4.953 | 334.425 | 0.895 | 6.666 |
16 | I→P | 0.177 | 9.909 | 2.801 | 253.168 | 4.912 | 290.072 | 4.536 | 6.682 |
17 | P→C | 0.179 | 10.039 | 2.800 | 263.583 | 4.823 | 302.712 | 1.760 | 6.600 |
18 | P→I | 0.196 | 9.897 | 2.826 | 249.105 | 5.051 | 283.321 | 6.962 | 6.674 |
19 | AVG | 0.503 | 13.114 | 16.441 | 1078.239 | 7.228 | 1326.444 | 2.499 | 6.176 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Han, C.; Zhou, D.; Yang, Z.; Xie, Y.; Zhang, K. Discriminative Sparse Filtering for Multi-Source Image Classification. Sensors 2020, 20, 5868. https://doi.org/10.3390/s20205868