Article

Autoencoders Based on 2D Convolution Implemented for Reconstruction Point Clouds from Line Laser Sensors

Jaromír Klarák, Ivana Klačková, Robert Andok, Jaroslav Hricko, Vladimír Bulej and Hung-Yin Tsai

1 Institute of Informatics, Slovak Academy of Sciences, 845 07 Bratislava, Slovakia
2 Department of Automation and Production Systems, Faculty of Mechanical Engineering, University of Zilina, 010 26 Zilina, Slovakia
3 Department of Power Mechanical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
* Author to whom correspondence should be addressed.
Sensors 2023, 23(10), 4772; https://doi.org/10.3390/s23104772
Submission received: 21 April 2023 / Revised: 9 May 2023 / Accepted: 12 May 2023 / Published: 15 May 2023
(This article belongs to the Section Sensing and Imaging)

Abstract

Development is gradually moving from standard visual content in the form of 2D data to 3D data, such as points scanned by laser sensors on various surfaces. Autoencoders aim to reconstruct the input data using a trained neural network. For 3D data, this task is more complicated due to the demand for more accurate point reconstruction than for standard 2D data. The main difference lies in shifting from discrete values in the form of pixels to continuous values obtained by highly accurate laser sensors. This work describes the applicability of autoencoders based on 2D convolutions for 3D data reconstruction, and several autoencoder architectures are demonstrated. The achieved training accuracies range from 0.9447 to 0.9807. The obtained values of the mean square error (MSE) range from 0.015829 to 0.059413 mm, which is close to the Z-axis resolution of the laser sensor of 0.012 mm. The reconstruction is further improved by extracting only the values in the Z axis and using the nominal coordinates of the points for the X and Y axes, which improves the structural similarity metric from 0.907864 to 0.993680 for the validation data.

1. Introduction

Standard systems applied in the industrial field are built primarily on camera devices that create visual data classifiable as 2D data. With gradual development and decreasing prices, laser sensors of the LiDAR type are increasingly applied for area surveys or outdoor sensing. Such applications can be seen primarily in connection with unmanned aerial vehicles (UAVs) [1,2], unmanned ground vehicles (UGVs) [3,4], autonomous vehicles [5], or household devices such as robotic vacuum cleaners [6]. At the same time, sensors for sensing physical quantities are being developed for industrial applications [7,8], which shows an increasingly significant direction of development and the physical possibilities of the future development of various types of sensors, including new body materials [9,10]. In industry, both point and line lasers are used for measurement and inspection systems. The advantage of line lasers is their high accuracy for scanning surfaces, where the resolution in the Z axis ranges from 0.001 to 0.012 mm depending on the type of sensor [11]. The resolution in the X and Y axes depends on the hardware setup and the scanning parameters [12]. The inspected surface can be scanned with a width of 10–270 mm, and the number of scanned points in one line can be up to 2048, depending on the manufacturer and model of the sensor [11,13]. Using such sensors, it is possible to obtain high-quality data of the scanned surface. Another type of laser device is the VR-6000 optical profilometer made by KEYENCE Corporation, which is adapted for very accurate scanning of objects with a scanning accuracy of up to 0.0001 mm [14]. This predestines these sensors for application in inspection systems for industrial purposes [15]. In the field of inspection systems, the quality of a surface is evaluated either quantitatively or qualitatively. Quantitative evaluation of scanned surfaces can be performed using statistical methods. Qualitative evaluation of surfaces is significantly more complicated, and a large part of the issue is focused on the detection of defects on the surface of inspected objects. The problem of defect detection can be solved in two ways. One of them is the direct detection of defects in visual data using the R-CNN [16], fast R-CNN [17], faster R-CNN [18], or YOLO [19] methods, which belong to the area of supervised learning. The second way is the detection of anomalies by methods such as autoencoders, U-networks, visual transformers, and the like, which belong to the area of unsupervised learning. Based on works [5,6,7,8,9], the application of unsupervised learning methods as a first step is suitable for the initial identification of anomalies. The field of visual data is relatively well processed and verified with respect to autoencoders, whereas the application of these methods to point clouds is still under development. This is primarily due to the recent emergence of high-quality sensor devices on the market. The second reason is the difference between the typology of geometric data and classical visual data. This is the primary reason why work with point clouds has not yet been fully developed and requires a relatively large amount of research focused on its processing and evaluation. The basic structure of inspection systems is illustrated in Figure 1, where the two approaches of defect detection and anomaly detection are shown.
The first method is based on detectors, which are trained to find learned patterns of defects that are bounded in regions and labeled. The main disadvantage is high sensitivity to the diversity of defect patterns. The second method is anomaly detection, where anomalies are defined as the differences between the tested samples and the reconstructed samples generated by a trained reconstruction model (in this experiment, an autoencoder). This type of approach highlights the differences, i.e., the anomalies. The disadvantage of this approach is that the detected objects are not labeled. The papers already published describe many types of systems, based either on one type of approach or on a combination of several approaches forming hybrid systems. Each method has its own advantages and disadvantages.
The majority of published works are focused on laser sensors of the LiDAR type. In connection with the implementation of autoencoders, the focus is primarily on indoor applications, where the emphasis is placed on the classification of the point cloud representing a specific object and, subsequently, on creating a sample point cloud of the given object in the reconstruction process. The use of laser sensors in inspection systems is applied to the detection of defects on rails [20]. The principle is based exclusively on a constant sensed rail pattern, thanks to which it is possible to identify abnormalities or defects in the scanned surface; threshold functions were used for this purpose. A similar system for the detection of defects on the surface of rails using laser sensors was published in [21]. This work describes the use of regression and the extraction of differences from the mathematical description of the shape scanned from above. For more complex parts, these methods cannot be used due to the various patterns occurring on the surface of the inspected object; in such cases, it is more appropriate to provide an ideal pattern of the inspected part of the scanned object. The work [22] describes a method of finding anomalies in visual content through the cascaded autoencoder (CASAE) architecture based on two levels of autoencoders. The result is a generated sample image, and when it is compared with the tested image, the difference consists of the identified anomalies. The development of an efficient quality assessment method for 3D point clouds is described in [23], where algorithms to improve and evaluate the assessment of 3D point clouds are presented.
One of the first works dealing with the recognition and classification of point clouds is focused on a tree-based neural network solution [24]. Working with point clouds consists of formatting 3D shapes represented by point clouds, which are arranged in a one-dimensional order, and the processing of such data is based on 1D convolution. The goal is to create a representative pattern of the object based on the input point cloud, or an image that can be characterized by the generated point cloud. The basis of this solution is similar to the PointNet architecture, which classifies a point cloud into the appropriate category [25]. In that work, the data are characterized as a set of points in 3D space, where each point is represented by a vector containing data such as the coordinates of the point in Euclidean space, a color channel, and the like. Another similar work is dedicated to the accuracy of point cloud reconstruction on four types of data, where the noise of the generated point cloud is monitored against the input point cloud [26]. This architecture is built on a 1D arrangement of the point cloud in Euclidean space and on 1D convolution, and the number of points in the point cloud is 2048. Another work is based on processing the point cloud through voxelization and transformation into a 3D object, for which 3D convolution is used [27]. A developing area is autonomous transport, where LiDAR sensors are used mainly for scanning exteriors. Autoencoders are also applied here for the reconstruction of this type of data [28]. The basis of that architecture is built on 1D convolutions, and the aim of the work is to minimize the demands on memory and data storage. Another work in the field of application of autoencoders is described in [29]. The contribution of this work is to point out new perspectives in the processing of point clouds, with emphasis especially on large-scale 3D structures recorded from standard data. A self-supervised model is described in [30], where the emphasis is on predicting the density of points in a point cloud database of the same category. Another work focused on data compression is described in [31]. The authors changed their view of the issue from global reconstruction to local reconstruction and thereby focused on the internal structure of point clouds. Most recently, solutions built on transformers have been applied, as shown in [32], with very good results achieved in this area. The application of autoencoders to increase or decrease the number of points in a dataset, with an emphasis on the connection and combined application with sensing devices and CAD models, is described as very desirable [33]. The application of the SSRecNet architecture with the exponential linear unit (ELU) activation function on the ShapeNet and Pix3D datasets is described in [34]. The work [35] describes a transformer called Point-BERT. The application of this model is compared with standard models on the ModelNet40 dataset, where the Point-BERT model achieved an accuracy of 93.80%. Autoencoders for point clouds or for data from MRI and CT devices are also applied in medical areas, where they are used to detect anomalies such as tumors [36].
The research described in this work is focused on reconstruction models, i.e., autoencoders based on 2D convolution, using high-precision 3D data from a laser sensor. The goal of this work is to design a system adapted for generating a sample point cloud with respect to the tested point cloud. The purpose of our work is to design methods for the reconstruction of point clouds obtained from laser sensors as very precise data and to use them in inspection systems for anomaly detection by unsupervised learning. The novelty of this work lies in the application of autoencoders based on 2D convolution to the reconstruction of 3D data with the ability to achieve very high similarity between tested and reconstructed samples and, in this way, to exploit the high data quality of laser sensors.

2. Point Cloud Reconstruction Based on 2D Convolution

Most of the works focus on the processing of the point cloud with the aim of optimization, classification, or homogenization of the data. On the basis of previous works and the application of inspection systems in common applications, the aim of this work is to design a system capable of reconstructing an image with quality adequate to that achieved by the scanning procedures of laser sensors. These values are highly dependent on the type of device used and the scanning parameters. In general, the resolution in the Z axis is defined by the type of sensor. The resolutions in the X and Y axes are highly dependent on the type of sensor and the way it is applied; they also depend on scanning parameters such as the distance between the sensor and the scanned surface, the shape of the surface, and the relative movement of the sensor with respect to the scanned object. Laser sensors can achieve a resolution of 0.01 to 0.1 mm (scanCONTROL 2600–100 [11]) in the X or Y axes according to the shape of the scanned surfaces [12]. This work describes a system for the reconstruction of data from these sensors using autoencoders in industrial applications with high-precision data from the laser sensors. The reconstruction of the data is based on the use of 2D convolution, comparable to the use of autoencoders on visual data, whereas the standard methods described in articles on processing point clouds are built either on 1D convolution (conv1D) or on 3D convolution (conv3D) using voxelization [24,28,30,33]. The result is the design of a system including data acquisition, data processing, autoencoder training, and sample data generation using the autoencoder for encryption of the data and their comparison with the tested data, including the evaluation, as depicted in Figure 2, which illustrates the data pipeline. The first step is the scanning procedure, which consists of a calibration process based on scanning a rotary shaft and removing the first harmonic component caused by the eccentricity of mounting the rotary shaft on the rotary axis; this procedure is described in [37]. In this way, the processed data are obtained, decrypted, and normalized to continuous values from 0 to 1, which is the most suitable form for the training process. For illustration, the decrypted values are transformed to discrete values from 0 to 255 in three channels, red, green, and blue (RGB), as an image. The decrypted data are used for training the autoencoders and for reconstruction in the trained model. The reconstructed data are obtained in the decrypted form, consisting of continuous values from 0 to 1, and are transformed to discrete values to obtain a visual representation of the reconstructed data in RGB (image). The reconstructed data are then encrypted, and the point cloud is obtained, which can afterwards be compared with the processed data. The comparison of the encrypted reconstructed data and the processed data forms the evaluated data. All algorithms were designed in the Spyder environment and implemented in Anaconda software [38].
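The conversion between the scanned Z-values and the normalized representation used as autoencoder input can be sketched as follows. This is a minimal illustration assuming simple min–max normalization and a grayscale-to-RGB preview; the function names (decrypt, encrypt, to_rgb_preview) are hypothetical and do not correspond to a published implementation.

```python
import numpy as np

def decrypt(z_matrix: np.ndarray, z_min: float, z_max: float) -> np.ndarray:
    """Normalize a matrix of Z-coordinates [mm] to continuous values in [0, 1]."""
    return (z_matrix - z_min) / (z_max - z_min)

def encrypt(z_norm: np.ndarray, z_min: float, z_max: float) -> np.ndarray:
    """Map normalized values back to Z-coordinates in millimetres."""
    return z_norm * (z_max - z_min) + z_min

def to_rgb_preview(z_norm: np.ndarray) -> np.ndarray:
    """For illustration only: discretize [0, 1] values to 0-255 and replicate them
    into three channels (R, G, B) to obtain an image-like preview."""
    gray = np.clip(np.rint(z_norm * 255), 0, 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)
```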

2.1. Preprocessing Data

To generate a sample point cloud using 2D convolution, it was necessary to prepare the data. In the raw state, the data from the laser sensor were obtained in the form of .csv files, where all scanned points were stored. From the point of view of the operation of the laser sensor, it is possible to format this point cloud into a 2D space, i.e., a matrix, where each point is represented by a vector of its coordinates (X, Y, and Z). Based on their X and Y values, the points were assigned to discrete positions in the matrix (Figure 3A). In this way, representative content can be obtained. Subsequently, it was necessary to homogenize the data and remove inaccuracies that would worsen the results in the process of training the autoencoder or in the process of data reconstruction. The first data homogenization consisted of removing the first harmonic component in the data. This inaccuracy arises when the scanned object is placed incorrectly on the rotating axis of the shaft and is reflected as a sinusoidal distortion of the scanned object. The process of compensation and removal of this harmonic component is described in [37] as a two-step approach, where in the first step the first harmonic component with a phase shift is identified based on the Fourier transform. For the sake of accuracy, this method was supplemented with a custom method for correcting the phase shift and the frequency of the first harmonic component in the data. The data with the first harmonic component removed are shown in Figure 3B. The next step was to fill in the empty places caused by points not scanned by the laser sensor. These places were filled with the average value of the surrounding points according to Equation (1), where the average is computed over the non-zero values in a 3 × 3 neighborhood. The result was a point cloud corresponding to the visual content shown in Figure 3C. The monochromatic representation of the Z-coordinates of the scanned points is shown in Figure 3D, where the details of the scanned surface of the gear wheel can be better observed.
V_0 = \frac{1}{n} \sum_{i,j} P_{i,j}, \qquad P_{i,j} \neq 0, \qquad n > 3,        (1)

where P_{i,j} are the non-zero Z-values in the 3 × 3 neighborhood and n is their count.
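A minimal sketch of this gap-filling step, assuming that non-scanned points are stored as zeros in the Z-matrix (the function name fill_missing is hypothetical):

```python
import numpy as np

def fill_missing(z: np.ndarray, min_neighbors: int = 3) -> np.ndarray:
    """Replace zero (non-scanned) cells by the mean of the non-zero neighbors
    in a 3 x 3 window, following Equation (1)."""
    filled = z.copy()
    rows, cols = z.shape
    for i, j in zip(*np.nonzero(z == 0)):
        window = z[max(i - 1, 0):min(i + 2, rows), max(j - 1, 0):min(j + 2, cols)]
        neighbors = window[window != 0]
        if neighbors.size > min_neighbors:  # corresponds to n > 3 in Equation (1)
            filled[i, j] = neighbors.mean()
    return filled
```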

2.2. Architectures of Autoencoders

The key part of this work is focused on the architecture of the autoencoder together with the type of data used. Based on the scanning parameters of the laser scanner used, which scans a maximum of 640 points in one line, these data were divided into fragments while the range of the measuring band was preserved (dimension X, i.e., 640 points), and the dimension in the Y axis was varied experimentally. The sizes of the training data fragments were defined as 96, 120, 240, 360, 480, and 640, and the training process of the different autoencoder architectures was carried out for each of these sizes. The original data consisted of 3 scans of size 640 × 11,000, where 2 scans were used to create the training dataset and the third was used to create the validation dataset. In total, 4 basic types of autoencoders were defined (Table 1), where the simplest was built on 9 convolutional layers with 128 filters in the latent space. More complex architectures were built on 512 filters in the latent space; the overall summary of the architectures is given in Appendix B. The goal was to find a way for the system to achieve high quality of the reconstructed image, from the data preparation through the architecture to the parameters of the training process. The main purpose was to explore and define the impact of specific parts of the architecture on the quality of the reconstructed point clouds, for instance, the impact of a higher number of trainable parameters, the use of more parameters at the end of the architecture, or a balanced architecture. A more exact specification is given in Table 1. The first type of architecture was created as a basic architecture with 128 filters in each convolutional layer except the input and last layers. The other architectures use more parameters, either as more filters in the convolutional layers or as additional convolutional layers in specific parts of the architecture. For each type of architecture, a specific shape of input data was tested, namely the fragment sizes 96, 120, 240, 360, 480, and 640. The architectures were built on the TensorFlow library [39]. The experiments were performed for four types of architectures, each modified for six different sizes of training data inputs. The evaluation of the designed architectures was important in two areas. The first area consists of the achieved training results in terms of training accuracy and loss values. The second monitored area was the error between the reconstructed and the input point clouds. The considered variables are the size of the samples and the related number of samples. The amount of training data and the number of parameters were limited by the 24 GB of graphics memory of the RTX 3090 GPU. For this reason, the largest amount of training data was available for the dimension 640 × 96, and with increasing sample size up to the dimension 640 × 640, the batch size decreases. The optimizer used was Adam [40], and the loss function used was the mean square error (MSE).
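As an illustration of the kind of architecture summarized in Table 1, the following sketch builds an ITE_1-like autoencoder in Keras. The layer topology follows the L_640_ITE_1 listing in Table A4; the kernel sizes, activations, padding, and the accuracy metric are assumptions, since they are not stated explicitly in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_ite1_autoencoder(height: int = 96, width: int = 640,
                           filters: int = 128) -> tf.keras.Model:
    """Sketch of an ITE_1-like 2D convolutional autoencoder (9 convolutional layers)."""
    inputs = layers.Input(shape=(height, width, 3))

    # Encoder: three downsampling stages (height and width must be divisible by 8)
    x = layers.Conv2D(3, 3, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(2)(x)

    # Latent space
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)

    # Decoder: three upsampling stages back to the input resolution
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    # Adam optimizer and MSE loss as stated in Section 2.2
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
    return model
```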

3. Results

The summary of the training results is presented in Table 2, and the overall results are in Table A1 (Appendix A). From the point of view of the type of architecture, the best results were achieved for ITE_1, which has the smallest number of trainable parameters. With an increased number of filters, i.e., more trainable parameters, the achieved results worsened. Additionally, with a higher number of trainable parameters and a higher number of convolutional layers, the training processes were very sensitive to overtraining. The size of the input data is linked to the number of training and validation samples, and there is an inverse relationship between the size of the samples and their number, because the scans are fragmented into samples of a specific size. The best results were therefore achieved with the dimensions of 640 × 96 (L_96_ITE_x), and good results were also achieved for the size of 640 × 120 (L_120_ITE_x), as shown in Figure 4 and Figure 5. For the data dimensions of 640 × 240 (L_240_ITE_x), 640 × 360 (L_360_ITE_x), 640 × 480 (L_480_ITE_x), and 640 × 640 (L_640_ITE_x), only average or below-average results were observed. The summary of the training results is shown in Figure 4, which illustrates the best accuracies and loss values achieved for each type of architecture and input sample size. The second illustration of the results, Figure 5, shows the mean square errors together with their standard deviations. As shown there, the best results from the point of view of architecture were achieved for the first architecture, ITE_1, and from the point of view of input size for 640 × 96. Figure 6 shows the structural similarity index (SSIM) defined in [41,42]. According to this metric, the result of 0.988568 for L_96_ITE_1 is very good, which indicates a reconstructed point cloud almost identical to the original point cloud of the samples. The complete results are shown in Appendix A in Table A1. Illustrations of the point clouds (3D) are in Table A2, which contains three types of images: the first shows the original sample, the second shows the reconstructed sample, and the last shows the differences between the reconstructed and the original point clouds. The visualizations of the encoded and decoded point clouds in 2D are displayed in Table A3, which shows the visual representations of the encoded point clouds and the reconstructed point clouds; the last column shows the differences between them. Among all the results, the best results were reached for architecture ITE_1 (Appendix B, Table A4).
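The per-sample evaluation reported in Table A1 (mean square error, its standard deviation, and the SSIM) can be sketched as follows. This is a minimal illustration assuming that the original and reconstructed samples are available as Z-matrices in millimetres and that the error is computed point-wise; the SSIM implementation of scikit-image [42] is used, and the wrapper function is hypothetical.

```python
import numpy as np
from skimage.metrics import structural_similarity

def evaluate_sample(original_z: np.ndarray, reconstructed_z: np.ndarray):
    """Return the mean and standard deviation of the point-wise squared error
    and the SSIM between an original and a reconstructed sample."""
    squared_error = (original_z - reconstructed_z) ** 2
    data_range = original_z.max() - original_z.min()
    ssim = structural_similarity(original_z, reconstructed_z, data_range=data_range)
    return squared_error.mean(), squared_error.std(), ssim
```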
Based on the results, it can be assumed that due to the small amount of data, it is more appropriate to use a simple, balanced architecture with a smaller number of filters. This statement is supported by the results, especially for the ITE_1 architecture shown in Figure 4. Accuracy decreases as the number of training samples decreases. For L_96_ITE_1, 679 training samples were used, and the achieved results have an accuracy of 0.9807, a loss of 0.0008, and a structural similarity metric of 0.988568, as shown in Figure 6 (in more detail in Table A1). In the case of L_640_ITE_1, 97 training samples were used; the accuracy decreased to 0.9669, the loss is 0.0019, and the structural similarity metric is 0.944652 (in more detail in Table A1). The mean square error metric shown in Figure 5 also supports this statement. Similar tendencies exist for the other architectures, but they are not represented so clearly. For further work, the architecture L_640_ITE_1 is used.
There is a possibility to improve the structural similarity metric by focusing on the values for the individual axes, which can be demonstrated on the validation sample. The X and Y axes can be characterized as the positions of the scanned points and considered constants. The values representing the positions of the points in space are essential for determining the position on the Z axis; this follows from the principle of obtaining data from the laser sensor. For this purpose, it is convenient to divide the coordinates of the points into 3 matrices, one for each axis, and to express the deviation of the reconstructed points from the reference points. These values are given in Table 3, which contains a graphic representation of the deviations for the individual axes and the mean error for each axis. The mean square error for the entire improved frame is 0.136640 mm, whereas the mean square error of the unmodified validation sample is 3.488547 mm. The SSIM value improved from 0.907864 to 0.993680. This illustrates how it is possible to suppress the error of the reconstructed image and thereby increase the accuracy of the reconstruction of the point cloud. A graphical representation of the extraction of only the Z-coordinates of the reconstructed points in conjunction with the nominal coordinates of the points for the X and Y axes is shown in Figure 7.
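A minimal sketch of the described improvement, assuming the nominal X- and Y-coordinates of the scanned points are available as matrices of the same shape as the reconstructed Z-matrix (the variable and function names are hypothetical):

```python
import numpy as np

def merge_with_nominal_xy(x_nominal: np.ndarray, y_nominal: np.ndarray,
                          z_reconstructed: np.ndarray) -> np.ndarray:
    """Keep the nominal X/Y point positions and take only the Z-coordinates
    from the reconstructed sample, returning an (N, 3) point cloud."""
    return np.stack([x_nominal.ravel(),
                     y_nominal.ravel(),
                     z_reconstructed.ravel()], axis=1)
```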

4. Conclusions

The goal of this work is to create a methodology for building inspection systems based on 3D data obtained using laser sensors. Most published works focus on standard point cloud datasets, where the basic part of the autoencoders is usually built on 1D convolution or on 3D convolution, often associated with voxelization. In this paper, we present the possibilities of reconstructing the point cloud based on 2D convolution applied to very accurate data from laser sensors. The goal is to provide a more comprehensive overview of the creation and training of such autoencoders as well as of the impact of the type of training data on the accuracy of the reconstructed data compared to the input data. Another aspect is the focus on smaller dimensions, namely 640 × 96 and 640 × 120, where better results are achieved and where it is also possible to reconstruct the points representing the stamped numbers forming the description of the gear wheel. The architecture of the autoencoder itself does not have such a significant effect; the accuracy is connected to the topology of the architecture and the amount of data. This is based on the results shown in Figure 4, mainly for architecture ITE_1, where accuracy decreases as the sample size increases and thus the number of samples decreases. The other architectures show the same tendencies, but less clearly. An MSE of 0.015829 mm and an SSIM of 0.988568 were reached. The validation was performed on validation data independent of the training data, where an MSE of 3.488547 mm and an SSIM of 0.907864 were reached. With the improved method, we reached an MSE of 0.136640 mm and an SSIM of 0.993680. The results show that the accuracy of the autoencoders is almost comparable to the resolution of the laser sensors, which is a very good result. The performance of training is highly sensitive to the amount of data in relation to the number of layers and the number of filters used in the architecture. Another outcome is the possible applicability and potential for further development of autoencoders based on 2D convolution for point cloud processing.

5. Discussion

The experiments demonstrated the possibilities of sample reconstruction by trained autoencoders. The results have shown that in the case of data from laser sensors, it is possible to reach very good results, for instance, an SSIM of 0.993680. The limiting factor was the amount of data and the GPU memory (NVIDIA GeForce RTX 3090, 24 GB), mainly in the case of training datasets with larger sample sizes, for instance 640 × 480 and larger, and a higher number of filters in the convolutional layers. A necessity for further development is the provision of server GPUs with larger graphics memory, allowing more data in the form of training samples. The second limiting factor was maintaining balanced data. The data preparation was based on the fragmentation of large scans into training samples; for instance, base scans of size 640 × 11,000 were fragmented into smaller samples such as 640 × 96. In this way, it was also possible to use stepping in the fragmentation, where samples of a specific size are extracted from a base scan with a given step. In this case, worse results were obtained due to the unbalancing of the training data, which has a high impact on the training. For this reason, a dataset without stepped fragmentation was used. Another outcome was that the accuracy and success of this method are highly sensitive to the type of data. This success was increased by training and applying point clouds representing a planar surface parallel to the base XY plane. For inspection systems, it is necessary to obtain more data with captured defects and to demonstrate the ability to capture defects.
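A sketch of the fragmentation described above, assuming a base scan stored as a width × length matrix (e.g., 640 × 11,000). With the step equal to the sample height, non-overlapping fragmentation as used in this work is obtained; interpreting the "stepped" variant as extraction with a smaller step (overlapping samples) is an assumption.

```python
import numpy as np
from typing import Optional

def fragment_scan(scan: np.ndarray, sample_height: int = 96,
                  step: Optional[int] = None) -> np.ndarray:
    """Cut a base scan of shape (width, length) into samples of shape (width, sample_height)."""
    if step is None:
        step = sample_height  # non-overlapping fragmentation (used in this work)
    length = scan.shape[1]
    samples = [scan[:, start:start + sample_height]
               for start in range(0, length - sample_height + 1, step)]
    return np.stack(samples)
```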

Author Contributions

Conceptualization, J.K. and R.A.; methodology, J.K.; software, J.K.; validation, J.K., I.K., V.B. and J.H.; formal analysis, J.K., R.A., I.K., V.B. and H.-Y.T.; investigation, J.K.; resources, J.K., V.B. and J.H.; data curation, J.K.; writing—original draft preparation, J.K. and J.H.; writing—review and editing, J.K. and J.H.; visualization, J.K.; supervision, J.H. and R.A.; project administration, R.A.; funding acquisition, R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by project grant M-Era.Net Inline evaluation of Li-ion battery electrode porosity using machine learning algorithms—BattPor. This work was also supported by the national scientific grant agency VEGA under the projects No. 2/0135/23 and No. 2/0099/22 and by the project APVV-20-0042—“Microelectromechanical sensors with radio frequency data transmission”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Summarizing the training autoencoders.

No. | Architecture | Total Parameters | Trainable Parameters | Non-Trainable Parameters | Input Size | Train. Dataset | Val. Dataset | Accuracy | Loss | Val. Accuracy | Val. Loss | Mean of Square Error [mm] | Standard Deviation of Square Error [mm] | Structural Similarity Metric (SSIM)
1 | L_96_ITE_1 | 896,227 | 894,429 | 1798 | 640 × 96 | 679 | 227 | 0.9807 | 0.0008 | 0.9826 | 0.0008 | 0.015829 | 0.047211 | 0.988568
2 | L_120_ITE_1 | 896,227 | 894,429 | 1798 | 640 × 120 | 541 | 181 | 0.9785 | 0.0010 | 0.9801 | 0.0010 | 0.016108 | 0.053263 | 0.984566
3 | L_240_ITE_1 | 896,227 | 894,429 | 1798 | 640 × 240 | 265 | 89 | 0.9722 | 0.0014 | 0.9747 | 0.0015 | 0.025456 | 0.062530 | 0.969051
4 | L_360_ITE_1 | 896,227 | 894,429 | 1798 | 640 × 360 | 175 | 59 | 0.9684 | 0.0017 | 0.9675 | 0.0016 | 0.021630 | 0.066946 | 0.970153
5 | L_480_ITE_1 | 896,227 | 894,429 | 1798 | 640 × 480 | 127 | 43 | 0.9659 | 0.0018 | 0.9690 | 0.0017 | 0.025276 | 0.067987 | 0.968888
6 | L_640_ITE_1 | 896,227 | 894,429 | 1798 | 640 × 640 | 97 | 33 | 0.9669 | 0.0019 | 0.9664 | 0.0021 | 0.033767 | 0.071704 | 0.944652
7 | L_96_ITE_2 | 7,996,899 | 7,991,517 | 5382 | 640 × 96 | 679 | 227 | 0.9576 | 0.0046 | 0.9632 | 0.0037 | 0.038118 | 0.090202 | 0.954267
8 | L_120_ITE_2 | 7,996,899 | 7,991,517 | 5382 | 640 × 120 | 541 | 181 | 0.9722 | 0.0016 | 0.9584 | 0.0019 | 0.033286 | 0.069123 | 0.968925
9 | L_240_ITE_2 | 7,996,899 | 7,991,517 | 5382 | 640 × 240 | 265 | 89 | 0.9621 | 0.0021 | 0.9628 | 0.0020 | 0.030853 | 0.072560 | 0.954858
10 | L_360_ITE_2 | 7,996,899 | 7,991,517 | 5382 | 640 × 360 | 175 | 59 | 0.9520 | 0.0025 | 0.9317 | 0.0025 | 0.040637 | 0.075443 | 0.954289
11 | L_480_ITE_2 | 7,996,899 | 7,991,517 | 5382 | 640 × 480 | 127 | 43 | 0.9583 | 0.0021 | 0.9627 | 0.0020 | 0.028388 | 0.070746 | 0.960838
12 | L_640_ITE_2 | 7,996,899 | 7,991,517 | 5382 | 640 × 640 | 97 | 33 | 0.9638 | 0.0023 | 0.9456 | 0.0031 | 0.049036 | 0.080666 | 0.951701
13 | L_96_ITE_3 | 26,442,243 | 26,429,763 | 12,480 | 640 × 96 | 679 | 227 | 0.9599 | 0.0022 | 0.9614 | 0.0022 | 0.034843 | 0.073919 | 0.951601
14 | L_120_ITE_3 | 26,442,243 | 26,429,763 | 12,480 | 640 × 120 | 541 | 181 | 0.9637 | 0.0021 | 0.9544 | 0.0023 | 0.036443 | 0.075231 | 0.944264
15 | L_240_ITE_3 | 26,442,243 | 26,429,763 | 12,480 | 640 × 240 | 265 | 89 | 0.9447 | 0.0026 | 0.9553 | 0.0024 | 0.037398 | 0.074976 | 0.963985
16 | L_360_ITE_3 | 26,442,243 | 26,429,763 | 12,480 | 640 × 360 | 175 | 59 | 0.9535 | 0.0025 | 0.9467 | 0.0028 | 0.044122 | 0.079939 | 0.953952
17 | L_480_ITE_3 | 26,442,243 | 26,429,763 | 12,480 | 640 × 480 | 127 | 43 | 0.9549 | 0.0034 | 0.9579 | 0.0032 | 0.045042 | 0.086710 | 0.965870
18 | L_640_ITE_3 | 26,442,243 | 26,429,763 | 12,480 | 640 × 640 | 97 | 33 | 0.9583 | 0.0038 | 0.9603 | 0.0026 | 0.035180 | 0.081573 | 0.935048
19 | L_96_ITE_4 | 18,923,011 | 18,914,819 | 8192 | 640 × 96 | 679 | 227 | 0.9589 | 0.0021 | 0.9623 | 0.0022 | 0.027502 | 0.075493 | 0.960815
20 | L_120_ITE_4 | 18,923,011 | 18,914,819 | 8192 | 640 × 120 | 541 | 181 | 0.9489 | 0.0025 | 0.9551 | 0.0026 | 0.043441 | 0.078784 | 0.922372
21 | L_240_ITE_4 | 18,923,011 | 18,914,819 | 8192 | 640 × 240 | 265 | 89 | 0.9472 | 0.0032 | 0.9547 | 0.0027 | 0.039939 | 0.079776 | 0.942701
22 | L_360_ITE_4 | 18,923,011 | 18,914,819 | 8192 | 640 × 360 | 175 | 59 | 0.9516 | 0.0028 | 0.9443 | 0.0031 | 0.033598 | 0.088179 | 0.939626
23 | L_480_ITE_4 | 18,923,011 | 18,914,819 | 8192 | 640 × 480 | 127 | 43 | 0.9540 | 0.0027 | 0.9619 | 0.0022 | 0.028731 | 0.075250 | 0.954734
24 | L_640_ITE_4 | 18,923,011 | 18,914,819 | 8192 | 640 × 640 | 97 | 33 | 0.9489 | 0.0027 | 0.9631 | 0.0036 | 0.059413 | 0.085243 | 0.950013
Table A2. Illustration of reconstruction point cloud and comparison of reconstructed and input point cloud.
(Image table: columns No., Input, Reconstructed 3D, and Difference 3D; one row of figures for each of the 24 models listed in Table A1.)
Table A3. Illustration of reconstruction point cloud and comparison of reconstructed and input point cloud expressed in visual content.
(Image table: columns No., Input 2D, Reconstructed 2D, and Difference between Input 2D and Reconstructed 2D; one row of figures for each of the 24 models listed in Table A1.)

Appendix B

Table A4. Tested four types of the autoencoder’s architectures.

L_640_ITE_1
Layer (type) | Output Shape | Param #
conv2d_36 (Conv2D) | (None, 640, 640, 3) | 84
batch_normalization_32 (BatchNormalization) | (None, 640, 640, 3) | 12
max_pooling2d_12 (MaxPooling2D) | (None, 320, 320, 3) | 0
conv2d_37 (Conv2D) | (None, 320, 320, 128) | 3584
batch_normalization_33 (BatchNormalization) | (None, 320, 320, 128) | 512
max_pooling2d_13 (MaxPooling2D) | (None, 160, 160, 128) | 0
conv2d_38 (Conv2D) | (None, 160, 160, 128) | 147,584
batch_normalization_34 (BatchNormalization) | (None, 160, 160, 128) | 512
max_pooling2d_14 (MaxPooling2D) | (None, 80, 80, 128) | 0
conv2d_39 (Conv2D) | (None, 80, 80, 128) | 147,584
batch_normalization_35 (BatchNormalization) | (None, 80, 80, 128) | 512
conv2d_40 (Conv2D) | (None, 80, 80, 128) | 147,584
batch_normalization_36 (BatchNormalization) | (None, 80, 80, 128) | 512
up_sampling2d_12 (UpSampling2D) | (None, 160, 160, 128) | 0
conv2d_41 (Conv2D) | (None, 160, 160, 128) | 147,584
batch_normalization_37 (BatchNormalization) | (None, 160, 160, 128) | 512
up_sampling2d_13 (UpSampling2D) | (None, 320, 320, 128) | 0
conv2d_42 (Conv2D) | (None, 320, 320, 128) | 147,584
batch_normalization_38 (BatchNormalization) | (None, 320, 320, 128) | 512
up_sampling2d_14 (UpSampling2D) | (None, 640, 640, 128) | 0
conv2d_43 (Conv2D) | (None, 640, 640, 128) | 147,584
batch_normalization_39 (BatchNormalization) | (None, 640, 640, 128) | 512
conv2d_44 (Conv2D) | (None, 640, 640, 3) | 3459

L_640_ITE_2
Layer (type) | Output Shape | Param #
conv2d_195 (Conv2D) | (None, 640, 640, 3) | 84
batch_normalization_173 (BatchNormalization) | (None, 640, 640, 3) | 12
max_pooling2d (MaxPooling2D) | (None, 320, 320, 3) | 0
conv2d_1 (Conv2D) | (None, 320, 320, 384) | 10,752
batch_normalization_1 (BatchNormalization) | (None, 320, 320, 384) | 1536
max_pooling2d_1 (MaxPooling2D) | (None, 160, 160, 384) | 0
conv2d_2 (Conv2D) | (None, 160, 160, 384) | 1,327,488
batch_normalization_2 (BatchNormalization) | (None, 160, 160, 384) | 1536
max_pooling2d_2 (MaxPooling2D) | (None, 80, 80, 384) | 0
conv2d_3 (Conv2D) | (None, 80, 80, 384) | 1,327,488
batch_normalization_3 (BatchNormalization) | (None, 80, 80, 384) | 1536
conv2d_4 (Conv2D) | (None, 80, 80, 384) | 1,327,488
batch_normalization_4 (BatchNormalization) | (None, 80, 80, 384) | 1536
up_sampling2d (UpSampling2D) | (None, 160, 160, 384) | 0
conv2d_5 (Conv2D) | (None, 160, 160, 384) | 1,327,488
batch_normalization_5 (BatchNormalization) | (None, 160, 160, 384) | 1536
up_sampling2d_1 (UpSampling2D) | (None, 320, 320, 384) | 0
conv2d_6 (Conv2D) | (None, 320, 320, 384) | 1,327,488
batch_normalization_6 (BatchNormalization) | (None, 320, 320, 384) | 1536
up_sampling2d_2 (UpSampling2D) | (None, 640, 640, 384) | 0
conv2d_7 (Conv2D) | (None, 640, 640, 384) | 1,327,488
batch_normalization_7 (BatchNormalization) | (None, 640, 640, 384) | 1536
conv2d_8 (Conv2D) | (None, 640, 640, 3) | 10,371

L_640_ITE_3
Layer (type) | Output Shape | Param #
conv2d_81 (Conv2D) | (None, 640, 640, 96) | 2688
batch_normalization_72 (BatchNormalization) | (None, 640, 640, 96) | 384
max_pooling2d_27 (MaxPooling2D) | (None, 320, 320, 96) | 0
conv2d_82 (Conv2D) | (None, 320, 320, 512) | 442,880
batch_normalization_73 (BatchNormalization) | (None, 320, 320, 512) | 2048
conv2d_83 (Conv2D) | (None, 320, 320, 512) | 2,359,808
batch_normalization_74 (BatchNormalization) | (None, 320, 320, 512) | 2048
max_pooling2d_28 (MaxPooling2D) | (None, 160, 160, 512) | 0
conv2d_84 (Conv2D) | (None, 160, 160, 512) | 2,359,808
batch_normalization_75 (BatchNormalization) | (None, 160, 160, 512) | 2048
conv2d_85 (Conv2D) | (None, 160, 160, 512) | 2,359,808
batch_normalization_76 (BatchNormalization) | (None, 160, 160, 512) | 2048
max_pooling2d_29 (MaxPooling2D) | (None, 80, 80, 512) | 0
conv2d_86 (Conv2D) | (None, 80, 80, 512) | 2,359,808
batch_normalization_77 (BatchNormalization) | (None, 80, 80, 512) | 2048
conv2d_87 (Conv2D) | (None, 80, 80, 512) | 2,359,808
batch_normalization_78 (BatchNormalization) | (None, 80, 80, 512) | 2048
up_sampling2d_27 (UpSampling2D) | (None, 160, 160, 512) | 0
conv2d_88 (Conv2D) | (None, 160, 160, 512) | 2,359,808
batch_normalization_79 (BatchNormalization) | (None, 160, 160, 512) | 2048
conv2d_89 (Conv2D) | (None, 160, 160, 512) | 2,359,808
batch_normalization_80 (BatchNormalization) | (None, 160, 160, 512) | 2048
up_sampling2d_28 (UpSampling2D) | (None, 320, 320, 512) | 0
conv2d_90 (Conv2D) | (None, 320, 320, 512) | 2,359,808
batch_normalization_81 (BatchNormalization) | (None, 320, 320, 512) | 2048
conv2d_91 (Conv2D) | (None, 320, 320, 512) | 2,359,808
batch_normalization_82 (BatchNormalization) | (None, 320, 320, 512) | 2048
up_sampling2d_29 (UpSampling2D) | (None, 640, 640, 512) | 0
conv2d_92 (Conv2D) | (None, 640, 640, 512) | 2,359,808
batch_normalization_83 (BatchNormalization) | (None, 640, 640, 512) | 2048
conv2d_93 (Conv2D) | (None, 640, 640, 512) | 2,359,808
batch_normalization_84 (BatchNormalization) | (None, 640, 640, 512) | 2048
conv2d_94 (Conv2D) | (None, 640, 640, 3) | 13,827

L_640_ITE_4
Layer (type) | Output Shape | Param #
conv2d_78 (Conv2D) | (None, 640, 640, 512) | 14,336
conv2d_79 (Conv2D) | (None, 640, 640, 512) | 2,359,808
batch_normalization_64 (BatchNormalization) | (None, 640, 640, 512) | 2048
max_pooling2d_24 (MaxPooling2D) | (None, 320, 320, 512) | 0
conv2d_80 (Conv2D) | (None, 320, 320, 512) | 2,359,808
batch_normalization_65 (BatchNormalization) | (None, 320, 320, 512) | 2048
max_pooling2d_25 (MaxPooling2D) | (None, 160, 160, 512) | 0
conv2d_81 (Conv2D) | (None, 160, 160, 512) | 2,359,808
batch_normalization_66 (BatchNormalization) | (None, 160, 160, 512) | 2048
max_pooling2d_26 (MaxPooling2D) | (None, 80, 80, 512) | 0
conv2d_82 (Conv2D) | (None, 80, 80, 512) | 2,359,808
batch_normalization_67 (BatchNormalization) | (None, 80, 80, 512) | 2048
conv2d_83 (Conv2D) | (None, 80, 80, 512) | 2,359,808
batch_normalization_68 (BatchNormalization) | (None, 80, 80, 512) | 2048
up_sampling2d_24 (UpSampling2D) | (None, 160, 160, 512) | 0
conv2d_84 (Conv2D) | (None, 160, 160, 512) | 2,359,808
batch_normalization_69 (BatchNormalization) | (None, 160, 160, 512) | 2048
up_sampling2d_25 (UpSampling2D) | (None, 320, 320, 512) | 0
conv2d_85 (Conv2D) | (None, 320, 320, 512) | 2,359,808
batch_normalization_70 (BatchNormalization) | (None, 320, 320, 512) | 2048
up_sampling2d_26 (UpSampling2D) | (None, 640, 640, 512) | 0
conv2d_86 (Conv2D) | (None, 640, 640, 512) | 2,359,808
batch_normalization_71 (BatchNormalization) | (None, 640, 640, 512) | 2048
conv2d_87 (Conv2D) | (None, 640, 640, 3) | 13,827

References

  1. Wu, T.; Zheng, W.; Yin, W.; Zhang, H. Development and Performance Evaluation of a Very Low-Cost UAV-Lidar System for Forestry Applications. Remote Sens. 2020, 13, 77. [Google Scholar] [CrossRef]
  2. Bolourian, N.; Hammad, A. LiDAR-equipped UAV path planning considering potential locations of defects for bridge inspection. Autom. Constr. 2020, 117, 103250. [Google Scholar] [CrossRef]
  3. Zhao, Z.; Zhang, Y.; Shi, J.; Long, L.; Lu, Z. Robust Lidar-Inertial Odometry with Ground Condition Perception and Optimization Algorithm for UGV. Sensors 2022, 22, 7424. [Google Scholar] [CrossRef]
  4. Gao, H.; Cheng, S.; Chen, Z.; Song, X.; Xu, Z.; Xu, X. Design and Implementation of Autonomous Mapping System for UGV Based on Lidar. In Proceedings of the 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC), Shanghai, China, 15–18 December 2022; IEEE: New York, NY, USA, 2023; pp. 1–6. [Google Scholar] [CrossRef]
  5. Sun, X.; Wang, M.; Du, J.; Sun, Y.; Cheng, S.S.; Xie, W. A Task-Driven Scene-Aware LiDAR Point Cloud Coding Framework for Autonomous Vehicles. IEEE Trans. Ind. Inform. 2022. early access. [Google Scholar] [CrossRef]
  6. Bouazizi, M.; Lorite Mora, A.; Ohtsuki, T. A 2D-Lidar-Equipped Unmanned Robot-Based Approach for Indoor Human Activity Detection. Sensors 2023, 23, 2534. [Google Scholar] [CrossRef] [PubMed]
  7. Hartansky, R.; Mierka, M.; Jancarik, V.; Bittera, M.; Halgos, J.; Dzuris, M.; Krchnak, J.; Hricko, J.; Andok, R. Towards a MEMS Force Sensor via the Electromagnetic Principle. Sensors 2023, 23, 1241. [Google Scholar] [CrossRef] [PubMed]
  8. Miškiv-Pavlík, M.; Jurko, J. Dynamic Measurement of the Surface After Process of Turning with Application of Laser Displacement Sensors. In EAI/Springer Innovations in Communication and Computing; Springer: Berlin/Heidelberg, Germany, 2022; pp. 197–208. [Google Scholar] [CrossRef]
  9. Bolibruchová, D.; Matejka, M.; Kuriš, M. Analysis of the impact of the change of primary and secondary AlSi9Cu3 alloy ratio in the batch on its performance. Manuf. Technol. 2019, 19, 734–739. [Google Scholar] [CrossRef]
  10. Šutka, J.; Koňar, R.; Moravec, J.; Petričko, L. Arc welding renovation of permanent steel molds. Arch. Foundry Eng. 2021, 21, 35–40. [Google Scholar]
  11. Laser Profile Sensors for Precise 2D/3D Measurements. Available online: https://www.micro-epsilon.co.uk/2D_3D/laser-scanner/ (accessed on 15 January 2021).
  12. Klarák, J.; Kuric, I.; Zajačko, I.; Bulej, V.; Tlach, V.; Józwik, J. Analysis of Laser Sensors and Camera Vision in the Shoe Position Inspection System. Sensors 2021, 21, 7531. [Google Scholar] [CrossRef]
  13. In-Sight 3D-L4000-Specifications|Cognex. Available online: https://www.cognex.com/products/machine-vision/3d-machine-vision-systems/in-sight-3d-l4000/specifications (accessed on 5 September 2022).
  14. Versatile Profilometer Eliminates Blind Spots and Measures Glossy Surfaces|3D Optical Profilometer VR-6000 Series | KEYENCE International Belgium. Available online: https://www.keyence.eu/products/microscope/macroscope/vr-6000/index_pr.jsp (accessed on 5 September 2022).
  15. Penar, M.; Zychla, W. Object-oriented build automation—A case study. Comput. Inform. 2021, 40, 754–771. [Google Scholar] [CrossRef]
  16. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  17. Girshick, R. Fast R-CNN. 2015. pp. 1440–1448. Available online: https://github.com/rbgirshick/ (accessed on 7 December 2022).
  18. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  19. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  20. Xiong, Z.; Li, Q.; Mao, Q.; Zou, Q. A 3D Laser Profiling System for Rail Surface Defect Detection. Sensors 2017, 17, 1791. [Google Scholar] [CrossRef] [PubMed]
  21. Cao, X.; Xie, W.; Ahmed, S.M.; Li, C.R. Defect detection method for rail surface based on line-structured light. Measurement 2020, 159, 107771. [Google Scholar] [CrossRef]
  22. Tao, X.; Zhang, D.; Ma, W.; Liu, X.; Xu, D. Automatic metallic surface defect detection and recognition with convolutional neural networks. Appl. Sci. 2018, 8, 1575. [Google Scholar] [CrossRef]
  23. Zhou, W.; Yang, Q.; Jiang, Q.; Zhai, G.; Member, S.; Lin, W. Blind Quality Assessment of 3D Dense Point Clouds with Structure Guided Resampling. 2022. Available online: https://arxiv.org/abs/2208.14603v1 (accessed on 9 May 2023).
  24. Gadelha, M.; Wang, R.; Maji, S. Multiresolution Tree Networks for 3D Point Cloud Processing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  25. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  26. Yan, W.; Shao, Y.; Liu, S.; Li, T.H.; Li, Z.; Li, G. Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds. arXiv 2019, arXiv:1905.03691. [Google Scholar]
  27. Wang, J.; Ding, D.; Li, Z.; Ma, Z. Multiscale Point Cloud Geometry Compression. In 2021 Data Compression Conference (DCC); IEEE: New York, NY, USA, 2021; pp. 73–82. [Google Scholar] [CrossRef]
  28. Wiesmann, L.; Milioto, A.; Chen, X.; Stachniss, C.; Behley, J. Deep Compression for Dense Point Cloud Maps. IEEE Robot. Autom. Lett. 2021, 6, 2060–2067. [Google Scholar] [CrossRef]
  29. Shen, W.; Ren, Q.; Liu, D.; Zhang, Q.; Jiao, S.; University, T. Interpreting Representation Quality of DNNs for 3D Point Cloud Processing. Adv. Neural Inf. Process. Syst. 2021, 34, 8857–8870. [Google Scholar]
  30. Cheng, A.-C.; Li, X.; Sun, M.; Yang, M.-H.; Liu, S. Learning 3D Dense Correspondence via Canonical Point Autoencoder. Available online: https://anjiecheng.github.io/cpae/ (accessed on 17 January 2023).
  31. You, K.; Gao, P. Patch-Based Deep Autoencoder for Point Cloud Geometry Compression. In Patch-Based Deep Autoencoder for Point Cloud Geometry Compression; ACM: New York, NY, USA, 2021; pp. 1–7. [Google Scholar] [CrossRef]
  32. Pang, Y.; Wang, W.; Tay, F.E.H.; Liu, W.; Tian, Y.; Yuan, L. Masked Autoencoders for Point Cloud Self-supervised Learning. Available online: https://github.com/Pang- (accessed on 17 January 2023).
  33. Zhang, C.; Shi, J.; Deng, X.; Wu, Z. Upsampling Autoencoder for Self-Supervised Point Cloud Learning. arXiv 2022, arXiv:2203.10768. [Google Scholar] [CrossRef]
  34. Yue, G.; Xiong, J.; Tian, S.; Li, B.; Zhu, S.; Lu, Y. A Single Stage and Single View 3D Point Cloud Reconstruction Network Based on DetNet. Sensors 2022, 22, 8235. [Google Scholar] [CrossRef]
  35. Yu, X.; Tang, L.; Rao, Y.; Huang, T.; Zhou, J.; Lu, J. Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022. [Google Scholar]
  36. Ma, S.; Li, X.; Tang, J.; Guo, F. EAA-Net: Rethinking the Autoencoder Architecture with Intra-class Features for Medical Image Segmentation. 2022. Available online: https://arxiv.org/abs/2208.09197v1 (accessed on 9 May 2023).
  37. Klarák, J.; Andok, R.; Hricko, J.; Klačková, I.; Tsai, H.Y. Design of the Automated Calibration Process for an Experimental Laser Inspection Stand. Sensors 2022, 22, 5306. [Google Scholar] [CrossRef]
  38. Spyder: Anaconda.org. Available online: https://anaconda.org/anaconda/spyder (accessed on 24 March 2023).
  39. tf.keras.layers.Layer. TensorFlow v2.10.0. Available online: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer (accessed on 10 March 2023).
  40. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference for Learning Representations. ICLR 2015, San Diego, CA, USA, 7–9 May 2015; Available online: https://arxiv.org/abs/1412.6980v9 (accessed on 23 February 2022).
  41. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  42. Structural Similarity Index—Skimage v0.20.0 Docs. Available online: https://scikit-image.org/docs/stable/auto_examples/transform/plot_ssim.html (accessed on 4 April 2023).
Figure 1. Illustration of inspection systems for defect detection separated into two main types: the defect detection by supervised methods—detectors; and the anomaly detection by unsupervised methods—mainly autoencoders.
Figure 2. Illustration of a system consisting of obtaining data from laser sensors in scanning procedure, processing the data (point cloud), decrypting the data for autoencoder, reconstructing the data, encrypting them to point cloud, comparing the processed data (tested data), and encrypting the reconstructed data (after training of autoencoder).
Figure 3. Illustration of processing of point cloud and transformation to visual content. (A)—basic data including the first harmonic component in the data (eccentricity during the scanning); (B)—removal of the first harmonic component in the data; (C)—filling the missing points in the matrix; (D)—visualization of Z-coordinates of points in grayscale.
Figure 4. Summary of the training of the architectures according to Table A1, where ITE_X represents the type of architecture and L_XXX represents the second dimension of the training samples.
Figure 5. Mean square error values including the standard deviation, where ITE_X represents the type of architecture and L_XXX represents the second dimension of the training samples.
Figure 6. Structural similarity index (SSIM) between original samples and reconstructed samples.
Figure 7. Merge of X and Y coordinates of basic points and Z coordinates from reconstructed point cloud.
Table 1. Summarization of the used types of architecture for the autoencoders (described in detail in Table A4).

Type of Architecture | No. of All Parameters | No. of Filters in One Conv. Layer | No. of Conv. Layers | Comment
ITE_1 | 896,227 | 128 | 9 | Basic architecture with a small number of filters
ITE_2 | 7,996,899 | 384 | 9 | The same architecture as the basic one, but with the average number of filters per convolutional layer used in this work
ITE_3 | 26,442,243 | 512 | 14 | More convolutional layers, with a higher number of filters in the first convolutional layer; two convolutional layers of shape (None, 640, 640, 512) at the end of the architecture
ITE_4 | 18,923,011 | 512 | 10 | Balanced architecture in terms of similar convolutional layers and numbers of parameters at the start and end of the architecture
Table 2. Result summarization of the architecture types.

Type of Architecture | Results
ITE_1 | Basic architecture with fast training and lower GPU memory consumption. The results are sufficient.
ITE_2 | Slightly worse results compared to ITE_1. High sensitivity to overtraining, requiring 30 epochs for training. For L_480 and L_640, 50 epochs were used.
ITE_3 | 30 epochs for L_96, 30–50 epochs for the other types. High sensitivity to overtraining. Compared to the other architecture types, the results are average. Presumably, there is a lack of data to reach better results for architectures with more convolutional layers.
ITE_4 | Training performed with 30 epochs. The results are below average.
Table 3. Error of reconstruction in specific axis.

Axis | Error in Specific Axis (L_96_ITE_1) | Mean Square Error [mm]
X axis | [graphic] | 0.091116
Y axis | [graphic] | 0.000000
Z axis | [graphic] | 0.101825
