3D Snow Sculpture Reconstruction Based on Structured-Light 3D Vision Measurement

Structured light is an effective technique for indoor 3D measurement, but it is hard to obtain ideal results outdoors because of complex illumination interference on sensors. This paper presents a 3D vision measurement method based on digital image processing that improves the noise resistance of measuring systems, ensures normal operation of a structured-light sensor in the wild without changing its components, and is applied to 3D reconstruction of snow sculpture. During image preprocessing, an optimal weight function is designed based on noise classification and minimum entropy, and the color images are transformed into monochromatic value images to eliminate most environmental noise. Then a Decision Tree Model (DTM) in the spatial-temporal context of the video sequence is used to extract and track the stripe. The model is insensitive to stubborn noise and reflection in the images, and its result after coordinate transformation is a 3D point cloud of the corresponding snow sculpture. In experiments, the root mean square (RMS) error and mean error are less than 0.722 mm and 0.574 mm respectively, showing that the method achieves real-time, robust and accurate measurement under a complex illumination environment, and can therefore provide technical support for snow sculpture 3D measurement.


Introduction
Digitalized archiving of cultural relics and artworks has become an important means for their restoration and for virtual museum construction, thanks to the development of 3D measurement techniques. Snow sculpture is an aesthetic form of artwork peculiar to cold regions around the world in winter, as shown in Figure 1, and its cultural value is no less than that of painting, wood carving, architecture and other artistic expressions [1,2]. However, when the weather warms up, these artworks, great displays of the wisdom of their designer artists, disappear. Snow sculptures can be divided into two categories from the perspective of creation [3]. The first category is designed and constructed according to three-dimensional computer modeling; its 3D data are known in advance, so there is no need to measure. The other category consists of works improvised according to the shape and texture of the snow billet and combined with surrounding scenes. Similar to other sculptures, they fuse the ideas and inspirations of artists. This second category is unique, and its 3D data need to be measured and recorded. Therefore, both snow sculpture designers and management companies want to obtain and store 3D data of each snow sculpture in the computer, making it easy to reproduce the original appearance of the snow sculpture through technologies such as 3D movies and 3D printing. Obviously, there is actual demand for 3D measurement of snow sculpture.
At present, the main methods for 3D measurement of cultural relics and artworks include laser scanning, the sequence image method and structured-light vision measurement [4][5][6][7]. Laser scanning is highly accurate but expensive, and its point cloud is not dense enough to adequately describe details. The sequence image method has low requirements for equipment and is easy to operate, but it cannot measure objects without obvious texture (such as pure white porcelain bottles) and has low accuracy. Structured-light vision measurement has the advantages of high accuracy, simple equipment, easy operation and low cost, but the camera is easily affected by illumination. Considering the large size, single surface color, complex texture, and low cost of snow sculpture, structured-light vision measurement is more suitable for its 3D measurement.
Structured-light sensors traditionally need to work in a relatively controlled light environment [8], so they are usually operated indoors, while measuring outdoors has always been a challenge. However, snow sculpture is directly exposed to the outdoor environment, with sunlight, shadows, reflections, and other environmental noise factors everywhere. Night scanning can reduce the noise, but it is adverse to the operator's health and work quality in the temperature range of −20 °C to −30 °C. A 3D laser scanner can perform 3D measurements in the daytime, such as scanning ancient buildings, but its sparse scanning point cloud cannot fully describe the details of snow sculpture. Therefore, we take advantage of a structured-light sensor with a laser as the light source to scan snow sculpture precisely in the field. Moreover, compared to the coding mode, which is mostly used in controlled scenes, the fixed-mode structured-light sensor with better anti-noise performance [9] is more applicable for 3D measurement of snow sculpture.
It is hard to obtain a high signal-to-noise ratio (SNR) outdoors even when using a laser projection, because the intensity of sunlight is usually 2-5 orders of magnitude higher than that of the structured-light stripe [8]. Existing methods have not resolved this problem well. For example, Microsoft's Kinect struggles to operate in high ambient light [10]. Modifying the structured-light sensor, such as installing a filter or increasing the intensity of the light source, is one way to reduce noise, but it comes at the expense of robustness and cost. So many researchers prefer to solve the above problems by digital image processing.
Stripe extraction is the key factor for fixed-mode structured-light equipment to improve measurement accuracy, because the calibration and coding processes have been fixed. In Reference [8], Gupta et al. proposed an adaptive stripe intensity method to improve SNR by adjusting the width of the stripe according to the intensity of sunlight, which does not increase power consumption but reduces scanning efficiency. In Reference [11], O'Toole et al. analyzed the stripe collected by a camera on direct and indirect light projection paths, and the ambient noise was directly ignored, but the optical simulation model is too complicated. In Reference [12], Steger proposed an unbiased detection method based on the Hessian matrix, which can locate curve points with sub-pixel accuracy and has strong anti-interference ability, but it relies heavily on the segmentation results of the stripe area. In Reference [13], Usamentiaga et al. segmented foreground information using the center of gravity method. Then the relative motion of two trapezoidal windows was used to search the laser skeleton, and curve fitting was used to mend the laser stripe gap.
All the above methods focus on eliminating noise and extracting the light stripe from a single image, while few establish frame-to-frame correlation across consecutive video frames. Inter-frame correlation can be regarded as a process of target tracking. In the presence of environmental noise, surface reflection and partial occlusion, a combination of spatial and temporal information can be used to detect the laser stripe stably and rapidly. In Reference [14], Janne used a Kalman filter to track segmented light stripes, whose brightness, length, direction and position are used to establish the correlation of adjacent frames. This method is used to detect obstacles outdoors, but it is only suitable for some outdoor scenes.
To solve the problem that a structured-light sensor is disturbed heavily by environmental noise in snow sculpture measurement, a 3D vision measurement method focused on light stripe extraction and tracking is proposed in this paper, as shown in Figure 2.
The original images collected by the structured-light sensor, mixed with noise, are first classified based on RGB color space distribution and the histogram image. Then an optimal weight combination of R, G and B is constructed, and the color image is transformed into a monochromatic value image with high SNR. At last, a Decision Tree Model is established in a spatial and temporal context (STC-DTM) to extract and track the laser stripe. The result of the model after coordinate system transformation is a 3D measurement point cloud. This method improves the performance of a measurement system through image processing techniques without changing the components of the structured-light sensor, and it is universal for other similar 3D measurements.
The remainder of this paper is organized as follows. Section 2 provides details of the method and its algorithms. Section 3 describes the results of experiments evaluating the accuracy, stability and speed of the method. The final section discusses and concludes the paper.

Noise Classification
The laser stripe in a structured-light image is the foreground, while other signals interfering with stripe extraction are noise. By analyzing snow sculpture measurement in actual scenes, we found the main noise comes from sunlight, shadows, surface color, surface reflections, and occasionally colored lights. As multiple types of noise are superimposed on the images, it is hard to analyze such multi-source noise by constructing an optical path model.
In spatial distribution, light and surface color are global noise, whereas shadow, surface texture, and reflections are local. Snow sculpture is generally located in an open field, surrounded by snow. Since there are generally no colored buildings, flowers or other objects around, the possibility of color pollution can be virtually eliminated. During measurement, the structured-light scanner is calibrated with sunlight. According to the color distribution, sunlight and shadow usually have similar R, G, and B components, whereas the color values of colored light and surface colors are usually higher in one color dimension. Reflective noise is related to illumination and surface characteristics, and shows localized high intensity. The laser stripe is overlaid by a large amount of environmental noise under sunlight, and the intensity distributions in R, G and B space are very close, thus presenting a single peak in the histogram, as shown in Figure 3. Conversely, the similarity between the three histograms is greatly reduced when there is interference from colored light or surface color. In the case of surface reflection, the similarity of intensity in spatial distribution decreases.
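The histogram-similarity cue described above can be sketched in code: if the R, G and B intensity histograms of a frame are nearly identical, the ambient noise is treated as achromatic (sunlight or shadow); otherwise chromatic interference (colored light or surface color) is suspected. This is a minimal illustration rather than the paper's implementation; the function name, the correlation measure and the threshold are assumptions.

```python
import numpy as np

def classify_ambient_noise(img, sim_threshold=0.9):
    """Classify ambient noise by comparing R, G, B intensity histograms.

    img: H x W x 3 uint8 array (RGB). The similarity threshold is an
    illustrative assumption, not a value from the paper.
    """
    hists = []
    for c in range(3):
        h, _ = np.histogram(img[:, :, c], bins=256, range=(0, 256))
        hists.append(h.astype(float) / h.sum())  # normalize to a distribution

    # Pairwise correlation between the three channel histograms.
    sims = [np.corrcoef(hists[a], hists[b])[0, 1]
            for a, b in ((0, 1), (0, 2), (1, 2))]

    # Similar histograms in all channels -> sunlight/shadow (achromatic);
    # otherwise colored light or surface color dominates one channel.
    return "achromatic" if min(sims) > sim_threshold else "chromatic"
```

In practice such a classifier would only steer the later weight search; the decision rule here is deliberately coarse.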

Based on the above analysis of the color histograms, a qualitative description of the ambient noise in R, G and B space is made: it represents an approximately Gaussian distribution with different mean values and variances, and this will guide the color value space transformation.

Monochromatic Value Space Transformation
The RGB color space sometimes does not reveal the information we need, and some important information cannot be distinguished in that space. Therefore, transforming the color value space into a monochromatic value space is more conducive to the extraction of the laser stripe. In this paper, a linear transformation is used because it does not cause large jumps, singular value points, or discontinuities, and it is faster than a nonlinear transformation [15]. A gray image is the most typical monochromatic value image generated by linear transformation.
In a discrete color snow sculpture image f_ij (with a size of M × N), the tristimulus values of the pixels are R_ij, G_ij and B_ij. The linear transformation is defined as Equation (1):

F_ij = w_r R_ij + w_g G_ij + w_b B_ij (1)
where F_ij is the monochromatic value image after transformation, and i and j represent the row and column indexes of pixels respectively, with i = 1, 2, . . . , M, j = 1, 2, . . . , N, and w_r, w_g, w_b ∈ R. From Equation (1), the monochromatic value image is determined by the transformation coefficients. So, we need to define an objective function to search for the optimal w_r, w_g and w_b that highlight the laser characteristics. The projected stripe of the laser is relatively concentrated and presents a concise structure, and its profiles showing all the luminance row vectors are shown in Figure 4. There is a high contrast between the laser and the background, implying a high signal-to-noise ratio (SNR). The objective function should make full use of this characteristic so that the laser stripe can be extracted easily after transformation. Therefore, the objective function is taken as a measure of the contrast. The pixel values in the laser stripe are close to their averages in Figure 4. In these areas, the contrast increases with energy concentration, and the waveform is steep, suggesting that the kurtosis is high. This can be seen as a contrast between the laser stripe and the background. Thus, it is reasonable to choose the kurtosis to define the contrast. The kurtosis K is defined as Equation (2):

K = κ_4 / κ_2^2 = μ_4 / σ^4 − 3 (2)
where κ_4 is the fourth cumulant, κ_2 is the second cumulant, and μ_4 and σ^4 are the fourth moment and the square of the variance of the probability distribution respectively. In particular, the kurtosis is zero in the case of a normal distribution. We expect the laser stripe to have the maximum kurtosis after transformation, while the noise signals from the background have very low kurtosis, suggesting they are very disordered. In signal systems, disorder is synonymous with entropy. Entropy is affected by the amount of information and its random properties: the more information there is, the more disordered the signals are, and the greater the entropy is. Wiggins [16] first used the minimum entropy deconvolution technique and called it the minimum entropy method. Therefore, maximizing kurtosis is equivalent to minimizing entropy, and the transformation model can be called the minimum entropy model. When K is less than 0, or if the signals come from a pulse source, minimizing is better than maximizing for the purpose of this study [17,18]. Therefore, the objective function can be further defined as Equation (3).
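As a quick numerical check of the contrast measure, excess kurtosis can be computed directly from its moment definition in Equation (2). The synthetic profiles below (a narrow bright peak versus a flat ramp) are illustrative assumptions, not measured data.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis K = mu4 / sigma^4 - 3; zero for a normal distribution."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    sigma2 = (d ** 2).mean()   # variance
    mu4 = (d ** 4).mean()      # fourth central moment
    return mu4 / sigma2 ** 2 - 3.0

# A laser-like column profile: energy concentrated in a few pixels.
laser = np.zeros(200)
laser[95:105] = 255.0
# A disordered background: intensities spread broadly (flat ramp).
background = np.linspace(0.0, 255.0, 200)

print(excess_kurtosis(laser) > 0)        # steep peak -> high kurtosis
print(excess_kurtosis(background) < 0)   # spread-out signal -> low kurtosis
```

This matches the intuition in the text: the concentrated stripe profile scores far above zero, while a broadly distributed background scores below zero.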
Equation (3) is fully differentiable, so the maximizing function can be defined as Equation (4). The monochromatic value image F_ij obtained after transformation is regarded as a multi-channel signal (with N segments and M elements per segment), and the kurtosis can be written as Equation (5), where μ_j is the mean of column j in the image F_ij. The solution maximizing (5) is given in Equation (6). Obviously, it is difficult to solve this equation directly. We can draw on the work of Robert T. Collins et al. [19]. The continuous coefficients w_r, w_g and w_b in Equation (1) determine an infinite color feature set. For expedient calculation, w_r, w_g and w_b are discretized as integers confined to [−2, 2]; some common color spaces and models, such as R + G + B and R − B, are also covered by this range, implying that the discretization is feasible.
The laser used in this paper is red, and the R component of the original image is higher than the other components, so we limit w_r ≥ 0. The value ranges of w_r, w_g and w_b are w_r ∈ {0, 1, 2}, w_g, w_b ∈ {−2, −1, 0, 1, 2}, where w_r, w_g and w_b are not allowed to be zero at the same time. The coefficient vector can then be solved by the traversal method. In Figure 5, a single red line laser is used. The optimal coefficient vector of the captured image is (w_r, w_g, w_b) = (1, −1, 0), and the monochromatic value space is R − G, whose transformation result for the laser stripe is shown in Figure 5d. For comparison, Figure 5c shows the result in the typical gray scale space (w_r, w_g, w_b) = (0.30, 0.59, 0.11). It is obvious that the R − G space has more advantages in noise suppression, and a better SNR is obtained from it. The laser stripe is clearly distinguished from the background in R − G space, and the amount of data is only one third of that of the original image in Figure 5a.
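The traversal search above can be sketched as follows: enumerate every admissible integer weight triple, build the candidate monochromatic image, and keep the triple with the largest kurtosis score. This is a minimal sketch under assumptions; in particular, summing per-column excess kurtosis as the objective is our reading of the multi-channel formulation, not a verbatim reproduction of Equations (5)-(6).

```python
import numpy as np
from itertools import product

def excess_kurtosis_columns(F):
    """Sum of per-column excess kurtosis of a monochromatic image F (M x N)."""
    d = F - F.mean(axis=0)
    sigma2 = (d ** 2).mean(axis=0)
    mu4 = (d ** 4).mean(axis=0)
    valid = sigma2 > 1e-12                 # skip constant columns
    return float((mu4[valid] / sigma2[valid] ** 2 - 3.0).sum())

def best_weights(img):
    """Traverse w_r in {0,1,2} and w_g, w_b in {-2..2} (not all zero) and keep
    the triple whose image F = w_r*R + w_g*G + w_b*B maximizes the
    column-wise kurtosis objective."""
    R, G, B = (img[:, :, c].astype(float) for c in range(3))
    best, best_score = None, -np.inf
    for wr, wg, wb in product((0, 1, 2), (-2, -1, 0, 1, 2), (-2, -1, 0, 1, 2)):
        if wr == wg == wb == 0:
            continue
        score = excess_kurtosis_columns(wr * R + wg * G + wb * B)
        if score > best_score:
            best, best_score = (wr, wg, wb), score
    return best
```

On a synthetic frame where the R channel carries a narrow stripe on top of the same ambient gradient as G, this search selects (1, −1, 0), i.e. the R − G space reported for Figure 5.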

Stripe Extraction and Tracking
After image preprocessing, global noise such as illumination is eliminated to the maximum extent, and the geometric edge of the laser stripe is kept continuous. However, some highlight regions caused by strong reflection from the snow sculpture surface are similar to the laser projection, and the laser stripe presents different deformations along the textured target surface, which can cause a large deviation during laser stripe extraction and tracking. With traditional methods it is difficult to balance robustness, accuracy and real-time performance. In this paper, a Decision Tree Model based on spatial and temporal context (STC-DTM) is proposed to solve these problems.

Establishment of a Spatial Decision Tree Model (S-DTM)
In a single frame, there are two basic spatial constraints between the laser stripe and the nearby background: continuity and uniqueness [20]. Although the laser stripe is a smooth and continuous line, jump or break points indicate that it is shaded. In addition, in the same frame, the laser stripe cannot appear in more than one place (a parallel or grid light stripe is taken as a whole). Based on these constraints, the image pixels can be classified into two states: on or off the laser stripe. Therefore, stripe extraction can be considered a dichotomy of decision trees, taking the pixels in each column of the image as a sample set and then classifying them by selected features column by column.

To reduce the harm of the cold environment to people, we usually measure snow sculpture in the daytime. Also, snowy days are not chosen, to avoid the uneven distribution of new snow and its impact on measurement results. The R, G, B weights in the algorithm of this paper change dynamically: during each measurement, we readjust the parameters according to the calculation results to adapt to the measurement environment at that time.


The brightness distribution of a laser stripe is shown in Figure 6. We can learn from the figure that the central point is much brighter than neighboring points, and the distribution is continuous. So, the brightness of central points is very close between adjacent columns, and their distance is usually very small. Therefore, the brightness difference and the distance are candidate features for our study.

The laser stripe represents foreground information in the image, and it can be segmented effectively by the center of mass [14,21], which can be used to thin the light stripe. Canny edge detection is performed on the obtained monochromatic image to find edge pixel points, and adjacent edge pixel points in each column are matched to form point-pairs. In this way, the laser stripe presents a series of point-pairs. The distance between point-pairs is relatively small and their positions are concentrated. Moreover, the brightness within the point-pairs is also very similar in the corresponding monochromatic value image. So, we can calculate the center-of-mass position of the R component for each point-pair by Equation (7).
where w is the window size (in pixels) of the calculated area, I_ij is the brightness value of the pixel at row i and column j of the image, and M_i(w) is the column coordinate. After obtaining a series of brightness peaks, the laser stripe is refined into a chain forming the laser skeleton, as shown in Figure 7a. However, because of noise, there may be more than one center-of-mass point in each column, and the light stripe skeleton is inevitably accompanied by some noise points, so we must classify it further. All the center-of-mass positions obtained in each column are taken as a position set of potential light stripe center points, and each position is represented as ω_r(t), t = 1, 2, . . . , T, where T is the step (column) number of the image, r is the index at step t, and r ∈ [1, R(t)], with R(t) the number of center-of-mass points at step t, as shown in Figure 7b. The coordinates of the edge pixel point on step t are [t, e_r(t)]. The mean R component of all the pixels in the point-pair ([t, e_r(t)], [t, e_r+1(t)]) is m_r(t), and then the brightness difference of the center-of-mass points between adjacent columns can be described as Equation (8):

f_ij = |m_i(t − 1) − m_j(t)| (8)
where i is the i-th point at step (t − 1), and j is the j-th point at step t. In addition, the distance between the centers of mass is also an important feature for judging whether they are center points of the laser stripe, because the center points on the laser stripe do not change position suddenly. The Euclidean distance between the two pixel points can be expressed in terms of c_i(t − 1) and c_j(t), the column coordinates of the two center-of-mass points. The selected features are substantiated with data to derive H_f and H_d, the upper limit values of f_ij and d_ij respectively, with Gain(f_ij) > Gain(d_ij). So, the position set is first classified by the brightness difference f_ij, and then evaluated by the distance d_ij. The process of laser stripe extraction by S-DTM is shown in Figure 7b. The result in Figure 7c shows that the laser stripe is smoother and clearer, and noise has been well eliminated.
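The two-feature selection can be sketched as follows. This is not the trained decision tree itself but a greedy stand-in that applies the features in the same order (brightness difference first, then positional jump); the threshold values and candidate format are illustrative assumptions:

```python
def extract_stripe(candidates, H_f=30.0, H_d=5.0):
    """Walk the columns left to right; at each step keep the candidate whose
    brightness difference from the previous center is within H_f and whose
    column-coordinate jump is within H_d, preferring the smallest jump.

    candidates[t]: list of (position, brightness) pairs for column t.
    H_f, H_d: illustrative upper limits, not the paper's derived values.
    """
    track, prev = [], None
    for cols in candidates:
        if not cols:
            track.append(None)                       # dropout in this column
            continue
        if prev is None:
            best = max(cols, key=lambda c: c[1])     # seed with brightest point
        else:
            # brightness difference is tested first (higher information gain),
            # then the positional jump between adjacent columns
            ok = [c for c in cols if abs(c[1] - prev[1]) <= H_f
                                  and abs(c[0] - prev[0]) <= H_d]
            best = min(ok, key=lambda c: abs(c[0] - prev[0])) if ok else None
        track.append(best)
        if best is not None:
            prev = best
    return track
```

With suitable thresholds, an isolated bright reflection far from the stripe fails the distance test and a dim noise centroid fails the brightness test, so only the smooth centerline survives.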

Establishment of a Temporal Decision Tree Model (T-DTM)
Similar to the spatial context correlation, there is a strong temporal relationship between a light stripe and its background in a video frame sequence. Previous research on target tracking has shown that the local context of the current frame helps predict the location of the light stripe in the next frame, because the shape, color and position of the target do not change much between adjacent frames, and the rate of change is relatively stable [22,23]. Video sequences captured by a structured-light sensor have the same characteristics. Ideally, laser lines in adjacent frames appear as many line segments and arcs of similar shape, area and color, unless they are changed by the texture and reflection of the snow sculpture surface, sensor movement and so on. Therefore, the line segments and arcs in each frame may or may not lie on the laser stripe, and they too can be classified by a decision tree.
Several possible laser stripe regions, which include the optimal solution and several suboptimal solutions of the S-DTM, are defined as a series of sub-windows W_n(k) = (x, y, I, ∆, s), where x and y are the size of the window, I is the mean brightness of the laser stripe in the window, ∆ is the change rate of the tracking result relative to the previous frame, s is the scale factor, s ∈ [0.1, 10], n is the index of the candidate window in the frame, n ∈ [1, N(k)], N(k) is the number of sub-windows at step k, k = 1, 2, . . . , F, and F is the number of frames. Each sub-window W_n(k) is defined as a position set of the current frame, as shown in Figure 8, and the measurable features between two adjacent frames are expressed as Equations (10) and (11), where P_mn and I_mn represent the position and brightness change between W_m(k) and W_n(k − 1) respectively. In sub-window W_n(k), the average coordinates of all pixels on the laser stripe are (x_n(k), y_n(k)) and the mean brightness of the R component is I_n(k).
where du and dv represent the velocity of the scanner in the u and v directions of the image space respectively, and τ is the interval between adjacent frames. The change rate of the tracking position, P_∆, is then described as Equation (13).
The selected features are substantiated with data to derive H_I and H_∆, which are the upper limit values of I_mn and P_∆ respectively, with Gain(P_∆) > Gain(I_mn). The sub-window position set is then classified according to the gain values in order. The process of laser stripe tracking by the T-DTM is shown in Figure 8, where the filled circles represent the laser stripe positions.
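The temporal selection can be sketched in the same greedy style as before. Here the scanner velocity predicts where the stripe window should land in the next frame; candidates deviating too far from that prediction, or changing too much in brightness, are discarded. All numeric values and the candidate format are illustrative assumptions:

```python
import math

def track_stripe(frames, H_I=20.0, H_P=3.0, du=0.0, dv=2.0, tau=0.04):
    """Greedy stand-in for the T-DTM: predict the stripe window position
    from the scanner velocity (du, dv) over the frame interval tau, reject
    candidates whose deviation exceeds H_P or whose brightness change
    exceeds H_I, and keep the closest survivor.

    frames[k]: list of (x, y, I) candidate sub-windows for frame k.
    """
    path, prev = [], None
    for cands in frames:
        if prev is None:
            best = max(cands, key=lambda c: c[2])           # seed: brightest window
        else:
            ex = prev[0] + du * tau                          # predicted position
            ey = prev[1] + dv * tau
            def dev(c):                                      # deviation from prediction
                return math.hypot(c[0] - ex, c[1] - ey)
            ok = [c for c in cands
                  if dev(c) <= H_P and abs(c[2] - prev[2]) <= H_I]
            # the position change rate has the higher information gain, so it
            # both filters and ranks the surviving candidates
            best = min(ok, key=dev) if ok else prev          # hold track on dropout
        path.append(best)
        prev = best
    return path
```

A bright reflection that appears far from the predicted position fails the position test even when its brightness matches, which is how the temporal context suppresses secondary reflections that the spatial model alone cannot reject.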
At this point, the STC-DTM has been established. The S-DTM and T-DTM are not cascaded but unified as a whole. The video collected by the structured-light sensor is the input of the STC-DTM, and the output is a series of smooth centerline tracks, which constitute a globally optimal solution of the model. Although the input of the temporal model depends on the output of the spatial model, the optimal solution of the whole model depends on the original data set and the feature set: the optimal solution of the spatial model may be suboptimal in the temporal model, and vice versa.
According to the calibration equations of the structured-light sensor, the tracks from the STC-DTM are transformed from the image coordinate system to the sensor coordinate system, and then to a global coordinate system [24], finally yielding the point cloud data of the snow sculpture surface.
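The two coordinate transformations can be sketched as a laser-plane triangulation followed by a rigid transform. The intrinsics below echo the setup described in the Results section (8 mm lens, 5.6 µm pixels, 640 × 480 image); the principal point at the image center, the plane parameters and the pose are placeholders that a real calibration would supply:

```python
import numpy as np

def triangulate(u, v, plane_n, plane_d, f=8.0, pixel=0.0056, center=(320, 240)):
    """Back-project a stripe pixel (u, v) onto the calibrated laser plane
    n . P = d, returning a 3-D point (mm) in the sensor coordinate frame.
    """
    ray = np.array([(u - center[0]) * pixel,   # viewing ray through the pixel,
                    (v - center[1]) * pixel,   # in image-plane millimetres
                    f])
    s = plane_d / np.dot(plane_n, ray)         # scale where the ray meets the plane
    return s * ray                             # point in the sensor frame

def sensor_to_global(p, R, t):
    """Rigid transform (rotation R, translation t) from sensor to global frame."""
    return R @ p + t
```

Applying `triangulate` to every tracked stripe pixel of every frame, then `sensor_to_global` with the pose of the sensor at that frame, accumulates the surface point cloud.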

Results
To evaluate the validity and applicability of the method proposed in this paper, several experiments were carried out. The algorithms are implemented in C++ and tested on an Intel Core i7-4790 CPU at 3.6 GHz. The structured-light device uses a red single-line laser projector of 650 nm and a color camera with a SONY 1/4-in charge-coupled device (CCD), with the angle between them fixed at 45°. The video capture speed is 25 frames per second; the image size is 640 × 480 pixels at 5.6 µm × 5.6 µm per pixel, with a focal length of 8 mm; and the laser line width is about 2 mm.

Accuracy Test
In this section, a standard cube, a snow sculpture and a conventional object are measured, and the mean error and root mean square (RMS) error are calculated for each. Since the snow sculpture is large and has no standard size, the value measured by a high-precision instrument is taken as the standard value, and the value measured by the method proposed in this paper is taken as the measurement value. Feature distances are used to evaluate measurement accuracy, avoiding the error caused by the transformation of point coordinates. The standard value is d, and the measurement value is d′. The mean error can then be expressed as Equation (14).
where n is the number of the selected feature line segments, with i = 1, 2, . . . , n.
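The two error metrics can be computed as follows. This small helper assumes the mean error averages the absolute deviations |d_i − d′_i| and the RMS error is the usual root mean square of the deviations; the exact form of Equation (14) is given in the paper:

```python
import math

def mean_and_rms_error(standard, measured):
    """Mean error and RMS error over n feature distances.

    standard: the values d_i from the high-precision instrument.
    measured: the values d'_i from the structured-light method.
    """
    diffs = [m - s for s, m in zip(standard, measured)]
    n = len(diffs)
    mean_err = sum(abs(e) for e in diffs) / n           # mean absolute deviation
    rms_err = math.sqrt(sum(e * e for e in diffs) / n)  # root mean square
    return mean_err, rms_err
```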

Standard Cube Measurement
In this experiment, a standard cube with a black and white grid is designed. The size of the cube is 200 mm × 200 mm × 200 mm, its mesh is 20 mm × 20 mm, and the uncertainty is 0.01 mm. Six positions of the cube are chosen randomly, and 13 feature distances on the laser line are selected, as shown in Figure 9. The standard values are measured with a vernier caliper (accuracy 0.02 mm), and the results are shown in Table 1. The RMS error of the feature distance is 0.125 mm, with (0.049, 0.147, 0.113) mm in the X, Y and Z directions. The mean error of the feature distance is 0.046 mm, and the maximum errors in the X, Y and Z directions are 0.099 mm, 0.409 mm and 0.313 mm, respectively.

Snow Sculpture Measurement
A snow sculpture with an approximate size of 3000 mm × 3000 mm × 3000 mm is selected. Twenty feature points are chosen on the surface of the snow sculpture, and 20 blue round marking patches with a diameter of about 2 mm and a thickness of 0.1 mm are attached at these points, as shown in Figure 10. First, a laser tracker measures each pair of feature points three times, the average is taken, and the distance between the two points is calculated as the standard value. Next, the centers of the corresponding blue marking patches are located in the point cloud generated by the structured-light device, and the distance between them is obtained. The average of three measurements is taken as the final measurement. The results are shown in Table 2.
The RMS error of the feature distance is 0.722 mm, with (0.755, 0.588, 0.862) mm in the X, Y and Z directions. The mean error of the feature distance is 0.574 mm, and the maximum errors in the X, Y and Z directions are 1.522 mm, 1.033 mm and 1.409 mm, respectively.

Conventional Object Measurement
The measurement accuracy in the above two experiments differs considerably, so a third object is tested. The third object, a conventional object, is irregular, opaque and less reflective, with a size of 4000 mm × 1000 mm × 2000 mm. Ten feature distances are selected, as shown in Figure 11. The measurements are processed in the same way as in Section 3.1.2, and the results are shown in Table 3.
The RMS error of the feature distance is 0.508 mm, with (0.539, 0.297, 0.435) mm in the X, Y and Z directions. The mean error of the feature distance is 0.159 mm, and the maximum errors in the X, Y and Z directions are 0.782 mm, 0.402 mm and 0.625 mm, respectively.

Speed and Robustness Evaluation
In the present study, two snow sculpture scenarios are selected, corresponding to the lighting conditions at 7:00 a.m. (with little sunlight) and 10:00 a.m. (with much sunlight). 3D reconstruction is performed with the structure-from-motion method with patch-based multi-view stereopsis (PMVS-SFM) proposed by Furukawa et al. [25], with laser scanning using a Leica RTC360, and with the method proposed in this paper, and the results are given in Figures 12 and 13. For the snow sculpture's complicated surface texture, the dense 3D point cloud generated by the PMVS-SFM method contains many invalid areas and cannot reproduce the original appearance well, and the measurement grows even poorer as exposure to sunlight increases. Laser scanning is less sensitive to sunlight, but provides a relatively sparse point cloud with vague details, and its accuracy on the snow sculpture surface is affected by the secondary reflection characteristics of the surface. By comparison, the method proposed in this paper successfully addresses the impact of sunlight and the secondary reflection of snow, leading to more satisfactory imaging with clearer texture and higher stability and accuracy. Meanwhile, the average execution time over 1000 frames is 21.3 ms, which is sufficient to ensure real-time operation at the normal video capture speed of 25 frames per second.



Discussion
The measurement error statistics of the three tests in Section 3.1 are shown in Figure 14. The errors are all distributed within a certain range, and the overall measurement accuracy for the snow sculpture is lower than that for the other two objects, with large local errors. The reasons are analyzed as follows.

Light Environment and Optical Characteristics of the Measured Object Surfaces Affect the Measurement Accuracy
The surface characteristics of the measured objects include color, texture, brightness, roughness and so on, which produce different noise signals in the collected images. In the outdoor environment, the snow sculpture exhibits strong reflection, complex texture, shadow and other factors that affect the measurement accuracy; these differ from the surface characteristics of the standard cube and the conventional object, leading to reduced accuracy. Table 4 shows that the multiple additional noise sources decrease accuracy and speed, but the measurement error is acceptable for a large snow sculpture. The tests also prove that the method is applicable in a complex outdoor light environment, meaning that it can provide technical support for the digital archiving of snow sculptures. Some large local errors appear at feature distances where the texture is complex, such as No. 9 on the snow sculpture and No. 10 on the conventional object, because concave-convex surfaces cause serious distortion and occlusion of the laser stripe, leading to inaccurate stripe extraction. In addition, some long feature distances have large errors, because accumulated errors grow during measurement without mark points. A high-accuracy splicing method without mark points is therefore a main task of the next-stage research to improve accuracy.

Conclusions
This paper proposes an accurate, fast and robust 3D measurement method based on structured light for complex illumination environments. First, an optimal monochromatic value space based on a minimum entropy model is selected for segmentation to eliminate as much global noise as possible. Then, a Spatial and Temporal Context Decision Tree Model (STC-DTM) is constructed to extract and track the laser stripe accurately and obtain an accurate and dense 3D point cloud. Finally, the experiments show that this method is effective and applicable to CCD cameras without an optical filtering function. Moreover, the method generalizes to 3D field measurement and reconstruction in industrial detection, cultural archaeology and criminal investigation, and therefore has good application prospects.