AMS-Net: An Attention-Based Multi-Scale Network for Classification of 3D Terracotta Warrior Fragments
Abstract
1. Introduction
- A novel hierarchical network, AMS-Net, is proposed to strengthen feature extraction for 3D Terracotta Warrior fragments. To reduce the computational cost, AMS-Net extracts contextual features in a multi-scale manner rather than directly stacking many layers to enlarge the receptive field. A self-attention model is adopted to integrate the semantic and spatial relationships between features. To the best of our knowledge, this is the first work to apply a multi-scale structure and a self-attention strategy to the classification of 3D cultural relic fragments;
- A local-global module is proposed, which effectively aggregates local region features and captures long-range dependencies. Its two main components are the local features aggregated cell (LFA-Cell) and the global features aggregated cell (GFA-Cell). The LFA-Cell preserves complex local structures, which are explicitly encoded with the spatial locations from the original 3D space, while the GFA-Cell obtains global geometric features through self-attention. As a key component of the LFA-Cell, a self-attention feature aggregation method named the attentive aggregation sub-unit (AAS) is proposed. Compared with traditional max-pooling-based feature aggregation networks, AAS explicitly learns not only the geometric features of local regions but also the spatial relationships among them (see the sketch after this list);
- Since the performance of the feature extractor is strongly affected by the dimension of the max-pooling layer, a feature fusion module named IMLP is designed specifically for our multi-scale structure; it aggregates both low-level and high-level features with rich local information;
- Our AMS-Net explicitly learns not only the geometric features of local regions but also the spatial relationships among them. The proposed method is well suited to the characteristics of the Terracotta Warrior fragments and achieves good classification results.
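To make the attention-weighted aggregation in AAS concrete, the following PyTorch-style sketch shows one common way to replace max pooling over a point's K neighbors with learned attention scores. It is a minimal illustration under assumed tensor shapes and layer sizes; the class name `AttentiveAggregation`, the single shared linear scoring layer, and the dimensions are our assumptions, not the exact AAS implementation from the paper.

```python
import torch
import torch.nn as nn


class AttentiveAggregation(nn.Module):
    """Sketch of an AAS-style aggregation (illustrative, not the authors' code):
    instead of max-pooling the K neighbor features of each sampled point, a shared
    FC layer predicts per-neighbor attention scores, which are softmax-normalized
    and used to form a weighted sum."""

    def __init__(self, channels: int):
        super().__init__()
        # Shared scoring function applied to every neighbor feature vector.
        self.score_fn = nn.Linear(channels, channels, bias=False)
        self.softmax = nn.Softmax(dim=2)  # normalize over the K neighbors

    def forward(self, neighbor_feats: torch.Tensor) -> torch.Tensor:
        # neighbor_feats: (B, N, K, C) -- features of K neighbors for each of
        # N sampled points in a batch of B point clouds.
        scores = self.softmax(self.score_fn(neighbor_feats))    # (B, N, K, C)
        aggregated = torch.sum(scores * neighbor_feats, dim=2)  # (B, N, C)
        return aggregated


# Usage: aggregate K = 16 neighbor features with C = 64 channels.
feats = torch.randn(8, 512, 16, 64)
print(AttentiveAggregation(64)(feats).shape)  # torch.Size([8, 512, 64])
```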
2. Related Work
2.1. Traditional Classification Methods of Terracotta Warrior Fragments
2.2. Deep Learning on Point Clouds
2.3. Multi-Scale Structure
2.4. Attention Mechanism
3. Methods
3.1. Our Proposed AMS-Net
3.2. Local-Global Module
3.2.1. Local Features Aggregated Cell (LFA-Cell)
- 1. Local Geometric Relation and Features Encode (LGRFE)
- 2. Attentive Aggregation Sub-unit (AAS)
3.2.2. Global Features Aggregated Cell (GFA-Cell) Based on Self-Attention
3.3. LGLayer
3.4. IMLP
4. Experiments and Results
4.1. Data Set and Implementation Detail
4.2. ModelNet40/10 Classification
4.2.1. Comparing with Other Methods
4.2.2. Robustness Test
4.2.3. Complexity Analysis
4.3. Results of Real-World Data
4.3.1. Shape Classification
4.3.2. Shape Classification with Noise
4.4. Ablation Study
4.4.1. Experiments of Partial Detail Setting in LFA-Cell
- 1. Ablation Studies on LGRFE
- 2. Ablation Studies on AAS
4.4.2. Experiment of IMLP
4.4.3. Single-Scale vs. Multi-Scale
5. Conclusions and Future Direction
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AAS | Attentive Aggregation Sub-Unit |
AMS-Net | Attention-based Multi-Scale Neural Network |
AS | Adaptive Sampling |
CNNs | Convolutional Neural Networks |
DCL | Dilated Convolution Layer |
DCN | Dilated Convolution Network |
DCNN | Dilated Convolutional Neural Network |
EMD | Earth Mover's Distance |
FC | Fully Connected |
FPS | Farthest Point Sampling |
GFA-Cell | Global Features Aggregated Cell |
IMLP | Improved Multi-Layer Perceptron |
KNN | K Nearest Neighbor |
LFA-Cell | Local Features Aggregated Cell |
LGLayer | Local-Global Layer |
LGRFE | Local Geometric Relation and Features Encode |
L-NL | Local-Nonlocal |
MLP | Multi-Layer Perceptron |
MS-BLOCK | Multi-Scale Set Abstraction Block |
MSCNN | Multi-Scale Convolutional Neural Network |
PFH | Point Feature Histogram |
SAM | Self-Attention Module |
SER | Speech Emotion Recognition |
SIFT | Scale-Invariant Feature Transform |
SOM | Self-Organizing Map |
SVM | Support Vector Machine |
Appendix A
Notation | Definitions
---|---
χ | Raw point cloud
χFPS | Sampled point cloud
 | The number of points in the raw point cloud χ and the sampled point cloud χFPS, respectively
 | The number of input channels besides the xyz-coordinates, e.g., color or normal
 | The output channel number of each bottleneck layer in Figure 4
 | The output channel numbers of LFA-Cell and GFA-Cell, respectively
 | The number of feature channels in different layers
 | The output channel number of the feature encoding
M | The number of scales (in Figure 2)
 | The total local
 | The coordinate feature vector and , respectively
T |
 | The new encoded neighboring feature vector
 | The new feature vector obtained by an FC layer
 | Learned attention score vector
 | The final output point of LFA-Cell
+ | Element-wise sum
c | Concatenation
s | Softmax operation
× | Matrix multiplication
T | Transpose
 | MLP (Equation (1))
 | Aggregation function, e.g., max/avg
 | The activation functions
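For completeness, the sketch below illustrates the generic farthest point sampling (FPS) step that produces the sampled point cloud χFPS from the raw cloud χ. It is a textbook NumPy version added for illustration; the point counts in the usage example are assumptions, and the authors' implementation may differ.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Generic farthest point sampling (illustrative): iteratively pick the point
    farthest from the set already selected, so the sampled subset covers the shape
    as evenly as possible. points: (N, 3) xyz array; returns sample indices."""
    n_points = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.full(n_points, np.inf)
    selected[0] = np.random.randint(n_points)  # arbitrary starting point
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(min_dist))
    return selected


# Usage: downsample an assumed cloud of 2048 points to the 512 centers
# used by the first MS-BLOCK in the architecture table below.
cloud = np.random.rand(2048, 3)
sampled = cloud[farthest_point_sampling(cloud, 512)]  # the χFPS subset
print(sampled.shape)  # (512, 3)
```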
References
- Liu, Y.Z.; Tang, Y.W.; Jing, L.H.; Chen, F.L.; Wang, P. Remote Sensing-Based Dynamic Monitoring of Immovable Cultural Relics, from Environmental Factors to the Protected Cultural Site: A Case Study of the Shunji Bridge. Sustainability 2021, 13, 6042. [Google Scholar] [CrossRef]
- Vinci, G.; Bernardini, F. Reconstructing the protohistoric landscape of Trieste Karst (north-eastern Italy) through airborne LiDAR remote sensing. J. Archaeol. Sci. Rep. 2017, 12, 591–600. [Google Scholar] [CrossRef]
- Liu, Y. The Application Research of Laser Scanning System in Cultural Relic Reconstruction and Virtual Repair Technology. Master’s Thesis, Chang’an University, Xi’an, China, 2012. [Google Scholar]
- Kampel, M.; Sablatnig, R. Color classification of archaeological fragments. In Proceedings of the International Conference on Pattern Recognition (ICPR), Barcelona, Spain, 3–7 September 2000; pp. 771–774. [Google Scholar]
- Qi, L.Y.; Wang, K.G. Kernel fuzzy clustering based classification of Ancient-Ceramic fragments. In Proceedings of the International Conference on Information Management and Engineering, Chengdu, China, 16–18 April 2010; pp. 348–350. [Google Scholar]
- Rasheed, N.A.; Nordin, M.J. Archaeological Fragments Classification Based on RGB Color and Texture Features. J. Theor. Appl. Inf. Technol. 2015, 76, 358–365. [Google Scholar]
- Rasheed, N.A.; Nordin, M.J. Using Both HSV Color and Texture Features to Classify Archaeological Fragments. Res. J. Appl. Sci. Eng. Technol. 2015, 10, 1396–1403. [Google Scholar] [CrossRef]
- Rasheed, N.A.; Nordin, M.J. Classification and reconstruction algorithms for the archaeological fragments. J. King Saud Univ.-Comput. Inf. Sci. 2020, 32, 883–894. [Google Scholar] [CrossRef]
- Wei, Y.; Zhou, M.Q.; Geng, G.H.; Zou, L.B. Classification of Terra-Cotta Warriors fragments based on multi-feature and SVM. J. Northwest Univ. (Nat. Sci. Ed.) 2017, 47, 497–504. [Google Scholar]
- Zhao, F.Q.; Geng, G.H. Fragments Classification Method of Terracotta Warriors Based on Region and Shape Features. J. Geomat. Sci. Technol. 2018, 35, 584–588. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105. [Google Scholar]
- Feichtenhofer, C.; Pinz, A.; Wildes, R.P. Spatiotemporal residual networks for video action recognition. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 9–10 December 2016; pp. 3468–3476. [Google Scholar]
- Bian, Y.L.; Gan, C.; Liu, X.; Li, F.; Long, X.; Li, Y.D. Revisiting the Effectiveness of off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification. arXiv 2017, arXiv:1708.03805. [Google Scholar]
- Mustaqeem; Kwon, S. CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network. Mathematics 2020, 8, 2133. [Google Scholar] [CrossRef]
- Mustaqeem; Sajjad, M.; Kwon, S. Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM. IEEE Access 2020, 8, 79861–79875. [Google Scholar] [CrossRef]
- Mustaqeem; Kwon, S. 1D-CNN: Speech Emotion Recognition System Using a Stacked Network with Dilated CNN Features. Comput. Mater. Contin. 2021, 67, 4039–4059. [Google Scholar] [CrossRef]
- Mustaqeem; Kwon, S. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition. Sensors 2019, 20, 183. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wang, Y. Research on the Classification Algorithm of Terracotta Warrior Fragments Based on the Optimization Model of Convolutional Neural Network. Master's Thesis, Northwest University, Xi'an, China, 2019. [Google Scholar]
- Maturana, D.; Scherer, S. Voxnet: A 3D convolutional neural network for real-time object recognition. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 922–928. [Google Scholar]
- Brock, A.; Lim, T.; Ritchie, J.M.; Weston, N. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks. arXiv 2016, arXiv:1608.04236. [Google Scholar]
- Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 945–953. [Google Scholar]
- Zhang, L.; Sun, J.; Zheng, Q. 3D Point Cloud Recognition Based on a Multi-View Convolutional Neural Network. Sensors 2018, 18, 3681. [Google Scholar] [CrossRef] [Green Version]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Gao, H.J.; Geng, G.H. Classification of 3D Terracotta Warrior Fragments Based on Deep Learning and Template Guidance. IEEE Access 2019, 8, 4086–4098. [Google Scholar] [CrossRef]
- Yang, K.; Cao, X.; Geng, G.H.; Li, K.; Zhou, M.Q. Classification of 3D terracotta warriors fragments based on geospatial and texture information. J. Vis. 2021, 24, 251–259. [Google Scholar] [CrossRef]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5099–5108. [Google Scholar]
- Li, Y.Y.; Bu, R.; Sun, M.C.; Wu, W.; Di, X.H.; Chen, B.Q. PointCNN: Convolution On X-Transformed Points. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montréal, QC, Canada, 3–8 December 2018; pp. 820–830. [Google Scholar]
- Wang, Y.; Sun, Y.B.; Liu, Z.W.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 146. [Google Scholar] [CrossRef] [Green Version]
- Kang, X.Y.; Zhou, M.Q.; Geng, G.H. Classification of Cultural Relic Fragments Based on Salient Geometric Features. J. Graph. 2015, 36, 551–556. [Google Scholar]
- Lu, Z.; Li, C.; Geng, G.; Zhou, P.; Li, Y.; Liu, Y. Classification of Cultural Fragments Based on Adaptive Weights of Multi-Feature Descriptions. Laser Optoelectron. Prog. 2020, 57, 321–329. [Google Scholar]
- Du, G.Q.; Zhou, M.Q.; Yin, C.L.; Wu, Z.K.; Shui, W.Y. Classifying fragments of Terracotta Warriors using template-based partial matching. Multimedia Tools Appl. 2018, 77, 19171–19191. [Google Scholar] [CrossRef]
- Karasik, A.; Smilansky, U. Computerized morphological classification of ceramics. J. Archaeol. Sci. 2011, 38, 2644–2657. [Google Scholar] [CrossRef]
- Geng, G.H.; Liu, J.; Cao, X.; Liu, Y.Y.; Zhou, M.Q. Simplification Method for 3D Terracotta Warrior Fragments Based on Local Structure and Deep Neural Networks. J. Opt. Soc. Am. A 2020, 37, 1711–1720. [Google Scholar] [CrossRef]
- Liu, Y.C.; Fan, B.; Xiang, S.M.; Pan, C.H. Relation-Shape Convolutional Neural Network for Point Cloud Analysis. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 8887–8896. [Google Scholar]
- Li, J.; Chen, B.M.; Lee, G.H. SO-Net: Self-Organizing Network for Point Cloud Analysis. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 9397–9940. [Google Scholar]
- Wang, C.; Samari, B.; Siddiqi, K. Local spectral graph convolution for point set feature learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 52–66. [Google Scholar]
- Gomez-Donoso, F.; Escalona, F.; Cazorla, M. Par3DNet: Using 3DCNNs for Object Recognition on Tridimensional Partial Views. Appl. Sci. 2020, 10, 3409. [Google Scholar] [CrossRef]
- Hou, M.L.; Li, S.K.; Jiang, L.L.; Wu, Y.H.; Hu, Y.G.; Yang, S.; Zhang, X.D. A New Method of Gold Foil Damage Detection in Stone Carving Relics Based on Multi-Temporal 3D LiDAR Point Clouds. Int. J. Geo-Inf. 2016, 5, 60. [Google Scholar] [CrossRef] [Green Version]
- Zhao, B.; Zhang, X.M.; Zhan, Z.H.; Shuiquan, P. Deep multi-scale convolutional transfer learning network: A novel method for intelligent fault diagnosis of rolling bearings under variable working conditions and domains. Neurocomputing 2020, 407, 24–38. [Google Scholar] [CrossRef]
- Huang, R.; Hong, D.F.; Xu, Y.S.; Yao, W.; Stilla, U. Multi-Scale Local Context Embedding for LiDAR Point Cloud Classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 721–725. [Google Scholar] [CrossRef]
- Mustaqeem; Kwon, S. MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Syst. Appl. 2021, 167, 114177. [Google Scholar] [CrossRef]
- Mustaqeem; Kwon, S. Att-Net: Enhanced emotion recognition system using lightweight self-attention module. Appl. Soft Comput. 2021, 102, 107101. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 5998–6008. [Google Scholar]
- Wang, X.L.; Girshick, R.; Gupta, A.; He, K.M. Non-Local Neural Networks. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
- Yan, X.; Zheng, C.D.; Li, Z.; Wang, S.; Cui, S.Q. PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks with Adaptive Sampling. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5588–5597. [Google Scholar]
- Wang, P.S.; Liu, Y.; Guo, Y.X.; Sun, C.Y.; Tong, X. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. arXiv 2017, arXiv:1712.01537. [Google Scholar] [CrossRef]
- Klokov, R.; Lempitsky, V. Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Ravanbakhsh, S.; Schneider, J.; Poczos, B. Deep Learning with Sets and Point Clouds. arXiv 2016, arXiv:1611.04500. [Google Scholar]
- Zhao, H.S.; Jiang, L.; Jia, J.Y.; Torr, P.; Koltun, V. Point Transformer. arXiv 2020, arXiv:2012.09164. [Google Scholar]
- Huang, Z.T.; Yu, Y.K.; Xu, J.W.; Ni, F.; Le, X.Y. PF-Net: Point Fractal Network for 3D Point Cloud Completion. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
Layer | Module | Points (N) | Neighbors (K) | MLP Channels | Output Channels | IMLP Channels
---|---|---|---|---|---|---
1st. | MS-BLOCK1 | 512 | 16 | (32, 32, 64) | 64 | -
 | | | 32 | (64, 64, 128) | 128 | -
 | | | 64 | (64, 96, 128) | 128 | -
2nd. | MS-BLOCK2 | 256 | 32 | (64, 64, 128) | 128 | -
 | | | 64 | (128, 128, 256) | 256 | -
 | | | 128 | (128, 128, 256) | 256 | -
3rd. | IMLP 1 | 1 | - | - | - | (320, 384, 512)
 | IMLP 2 | | - | - | - | (640, 768, 1024)
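The channel tuples listed for IMLP 1 and IMLP 2 above suggest a shared MLP whose intermediate outputs are all retained for the final descriptor. The sketch below gives one plausible reading of such an IMLP-style fusion, in the spirit of the CMLP baseline compared in the ablation study: every MLP layer's output is max-pooled over the points and the pooled low- and high-level vectors are concatenated. The module name, input shapes, and exact fusion rule are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn


class IMLPSketch(nn.Module):
    """Schematic IMLP-style fusion (illustrative assumption): point features pass
    through a stack of shared 1x1 convolutions, the output of every layer is
    max-pooled over the points, and the pooled vectors are concatenated."""

    def __init__(self, in_channels: int, widths=(320, 384, 512)):
        super().__init__()
        self.layers = nn.ModuleList()
        c = in_channels
        for w in widths:
            self.layers.append(nn.Sequential(
                nn.Conv1d(c, w, kernel_size=1),
                nn.BatchNorm1d(w),
                nn.ReLU(inplace=True),
            ))
            c = w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, N) point features; returns a (B, sum(widths)) global vector.
        pooled = []
        for layer in self.layers:
            x = layer(x)
            pooled.append(torch.max(x, dim=2).values)  # max-pool over the N points
        return torch.cat(pooled, dim=1)


# Usage with the channel widths listed for IMLP 1 above (input shapes assumed).
feats = torch.randn(8, 256, 256)  # batch of 8, 256 channels, 256 points
out = IMLPSketch(256, widths=(320, 384, 512))(feats)
print(out.shape)  # torch.Size([8, 1216]) = 320 + 384 + 512
```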
Method | Representation | Input | ModelNet10 CA (%) | ModelNet10 OA (%) | ModelNet40 CA (%) | ModelNet40 OA (%)
---|---|---|---|---|---|---
O-CNN [47] | Octree | - | - | - | - | 90.60
VRN [21] | Voxel | - | - | 93.61 | - | 91.33
Kd-Net [48] depth = 15 | pnt. | 32 k | 93.50 | 94.00 | 88.50 | 91.80
DeepSets [49] | pnt. | - | - | - | - | 90.00
SO-Net [36] | pnt. | - | - | - | - | 90.90
PointNet [24] | pnt. | - | - | - | 86.20 | 89.20
PointNet++ [27] | pnt. | - | - | - | - | 90.70
PointCNN [28] | pnt. | - | - | - | - | 92.20
RS-CNN [35] | pnt. | - | - | - | - | 93.60
PointASNL [46] | pnt. | - | - | 95.70 | - | 92.90
AMS-Net (Ours) | pnt. | - | - | 95.83 | - | 92.94
PointNet++ [27] | pnt., nor. | - | - | - | - | 91.90
SO-Net [36] | pnt., nor. | - | 95.50 | 95.70 | 90.80 | 93.40
Point Transformer [50] | pnt., nor. | - | - | - | 90.60 | 93.70
PointASNL [46] | pnt., nor. | - | - | 95.90 | - | 93.20
AMS-Net (Ours) | pnt., nor. | - | - | 95.91 | - | 93.52
Method | Model Size (MB) | Time (ms)
---|---|---|
PointNet | 40.1 | 17.6 |
PointNet++ (SSG) | 8.3 | 82.4 |
PointNet++ (MSG) | 12.0 | 165.0 |
AMS-Net (Ours) | 17.2 | 112.8 |
Method | Input Data Type | Deep Model | OA (%) |
---|---|---|---|
Method in [9] | image | F | 74.66 |
Method in [31] | image | F | 84.34 |
Method in [10] | image | F | 86.86 |
Method in [19] | image (cnn-based) | T | 89.54 |
Method in [32] | pnt. | F | 87.64 |
PointNet [24] | pnt., | T | 88.93 |
Method in [25] | pnt., | T | 90.94 |
Method in [26] | pnt., image | T | 91.41 |
Ours | pnt. | T | 95.68 |
Ours | pnt., nor. | T | 96.22 |
Model | Local Geometric Feature Definition | Channels | Acc. (%)
---|---|---|---
A | | 3 | 93.82
B | | 6 | 93.95
C | | 7 | 94.87
D | | 9 | 95.05
E | | 10 | 95.68
Aggregation | Max | Avg | AAS |
---|---|---|---|
Acc (%) | 94.27 | 94.06 | 95.68 |
Feature Fusion | MLP | CMLP | IMLP
---|---|---|---|
Acc (%) | 93.64 | 95.15 | 95.68 |
Model | ASS-Net | AMS-Net
---|---|---|
Acc (%) | 93.99 | 95.68 |
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).