The Tracking and Frequency Measurement of the Sway of Leafless Deciduous Trees by Adaptive Tracking Window Based on MOSSE

Abstract: The tree sway frequency is an important part of the dynamic properties of trees. To obtain the sway frequency of trees in wind, a method for tracking and measuring the sway frequency of leafless deciduous trees using an adaptive tracking window based on MOSSE is proposed. First, an adaptive tracking window is constructed for the observed target. Second, a tracking method based on the Minimum Output Sum of Squared Error (MOSSE) filter is used to track tree sway. Third, the fast Fourier transform (FFT) is used to analyze the horizontal sway velocity of the target area on the tree, and the sway frequency is determined. Finally, the power spectral densities (PSDs) of the x axis acceleration measured by the accelerometer are compared with the PSDs of the x axis velocity measured from the video; the fundamental sway frequency measured by the accelerometer is equal to that measured from the video. The results show that the video-based method can be used successfully for measuring the sway frequency of leafless deciduous trees.


Introduction
Wind has an important influence on single trees and on whole forest stands, and it is one of the main causes of forest disasters. Trunks and branches bend and rotate to a certain degree under the action of wind. Under strong winds, the wind-induced load on a tree can become too high, and the tree breaks or is uprooted [1,2], which reduces timber yield and causes economic loss [3-5]. The tree sway frequency is an important part of the dynamic properties of trees. The farther the tree sway frequency is from the wind frequency, the safer the tree. The fundamental sway frequency, at which a tree tends to sway, is an important research focus because sway magnitude can grow rapidly at this frequency and lead to possible failure [6,7]. Researchers have conducted a number of outdoor experiments to measure tree sway frequencies, which can be divided into two types depending on the source of the vibration excitation. The first is carried out when there is little natural wind. A rope and winch are used to pull back the tree. When the tree is released, its free motion is recorded. This type of experiment is commonly called a "pull test" [8]. The second is the tracking of sway motion under natural wind conditions, which can truly reflect the dynamic properties of trees [6,7,9-11]. Therefore, a method that can measure tree sway in the wind is of great significance for understanding the dynamic properties of trees.
To monitor the sway and frequency response of trees, a variety of motion capture methods have been used. Researchers have used accelerometers to measure tree sway [12-14]. Roodbaraky [15] measured tree sway using potentiometers. In addition, if video-based tracking is carried out for a long time, the target is very likely to be lost because of changes in illumination. Dense optical flow tracking requires a lot of computation and takes a long time to process, so it is not suitable for tracking tree sway over long periods of video data. To address these problems, a method is needed for measuring tree sway in nature that is insensitive to light conditions, can track the selected feature consistently, and has a processing speed suitable for long periods of data collection.
The target tracking method based on the Minimum Output Sum of Squared Error (MOSSE) filter is robust to changes in illumination, size, pose, and non-rigid deformation of a target [29]. Trees rotate and deform under the force of wind, and this algorithm can still track a target effectively as it undergoes such movement. In addition, it is reported to be the fastest among current tracking algorithms [30]. The method does not require preprocessing of video data, and tracking areas can be selected as required. However, its tracking performance depends on the size of the tracking window. When the tracking window is small, the initial training samples cannot capture enough features under affine changes, so tracking may be lost while following tree sway. A large tracking window reduces drift but increases the amount of computation. Moreover, if the window covers multiple branches, the measured frequency is affected by all of them and becomes inaccurate.
For these reasons, a method for tree sway tracking and frequency measurement using an adaptive tracking window based on MOSSE is proposed. The method keeps the tracking window size reasonable, which ensures tracking performance while reducing the amount of computation. In this study, the sway of two features of one tree was tracked with the MOSSE method, and sway frequencies were obtained by fast Fourier transform (FFT) analysis of the horizontal sway velocities of these features. The results were comparable to frequencies derived from sway data obtained with an accelerometer. The purpose of this study is to obtain the sway frequency of trees in wind by a video-based method.

Data Acquisition and Test System
A naturally growing perennial Betula platyphylla Sukaczev tree with a height of 23.2 m and a diameter at breast height (DBH) of 27.4 cm was selected as the experimental object. The height of the tree was measured with a total station using the trigonometric elevation method. DBH is the tree diameter measured 1.3 m above the ground. The tree is located in the Maoershan Forest Ecosystem National Field Scientific Observation and Research Station (127°39′50″ E, 45°24′32″ N) in Shangzhi City, Heilongjiang Province. The camera (HSAB-IPC3219H, Huabang, ShenZhen, China) used for video recording was fixed 4.5 m high on a steel pillar about 15.9 m from the bottom of the tree. The camera angle with respect to the horizontal was about 25°. The resolution of the camera was 1920 × 1080, and the focal length was set at 35 mm. A three-axis accelerometer (AIT2500, Jingming, Yangzhou, China) was mounted 7.9 m above the soil, below the lower knot of the tree, to measure the trunk sway directly; the sampling frequency of the accelerometer was 50 Hz. This section is designated S1,a. The wind speed was measured by a three-axis ultrasonic anemometer (WindMaster, GILL, Lymington, Britain) installed at a height of 20 m on the anemometer tower. The distance between the anemometer tower and the experimental object was 10 m. The measurement range of the ultrasonic anemometer is 0-45 m·s⁻¹, the resolution is 0.01 m·s⁻¹, and the output frequency was set to 10 Hz. The schematic diagram of the test system is shown in Figure 1.
To avoid the added complexity of leaves when targeting branches and trunks for measurement, video monitoring was carried out in autumn, during periods of strong wind (7-8 November 2019). Ten sample videos, each 30 min long, were recorded. The average wind speed during the monitoring period was 3.33 m·s⁻¹, and the maximum wind speed reached 12.28 m·s⁻¹.
Two positions on the tree in the video were selected for tracking and measurement. The area of position 1 covering a segment of the major primary branch is designated as S1,v. The area of position 2 covering a segment of a branch on the major primary branch is designated as S2,v. The positions of the monitoring areas are shown in Figure 2.
The accelerometer measures the sway frequency at the tree trunk; in this study, it is used to validate the frequency measured from the video. The sampling frequency of the video is 16 Hz. Based on the formula of Bunce [6], the fundamental sway frequency of the tree is estimated from its height and DBH to be 0.26 Hz, when the height to the base of the live crown (GRTOCR) is ignored in the leaf-off, above-freezing condition. Because of the limited resolution of the video, small-amplitude movement may not be detected. To improve tracking and computational efficiency, a sampling frequency of 4 Hz was selected for measuring the sway frequency of the tree. According to the Nyquist-Shannon sampling theorem, the sampling frequency must be greater than twice the highest signal frequency, so 4 Hz is sufficient for measuring the low-order sway frequencies of trees.
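As a quick check of the sampling argument, using only the values stated above (a 4 Hz sampling frequency and an estimated fundamental frequency of 0.26 Hz), the highest resolvable frequency and the number of samples per sway cycle work out as follows:

$$f_{\max} = \frac{f_s}{2} = \frac{4\ \mathrm{Hz}}{2} = 2\ \mathrm{Hz} > 0.26\ \mathrm{Hz}, \qquad \frac{f_s}{f_0} = \frac{4\ \mathrm{Hz}}{0.26\ \mathrm{Hz}} \approx 15.4\ \text{samples per cycle}$$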
(a) Positions of S1,a and S1,v.

Minimum Output Sum of Squared Error Filter (MOSSE)
MOSSE is a correlation-filter tracking algorithm that can be trained from a small number of training images [29]. Here it is used to track tree sway.
Given a training sample $f_i$ and its expected output $g_i$, the corresponding filter $h_i$ is calculated. The desired output $g_i$ is defined by a two-dimensional Gaussian distribution: its peak lies at the centre of the training image $f_i$, and the output value decreases with distance from the centre. The image is adjusted so that the target feature is at the centre position $(x_i, y_i)$, and the expected output of the training image is defined as in (1):
$$g_i(x, y) = \exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{\sigma^2}\right) \tag{1}$$
where $\sigma$ sets the spread of the two-dimensional Gaussian distribution. To reduce the computational cost of filter training, the FFT is used to transform the input image $f_i$ and the expected output $g_i$ into the frequency domain, where the filter can be solved with simple element-wise operations. After the FFT, the input image $f_i$, the expected output $g_i$, and the filter $h$ are denoted $F_i$, $G_i$, and $H$, respectively, and $H^*$ and $F_i^*$ denote the complex conjugates of $H$ and $F_i$. The filter for a single sample is then obtained by (2):
$$H_i^* = \frac{G_i}{F_i} \tag{2}$$
where the division is element-wise. To minimize the error between the filter output for each sample image and the expected output $G_i$, the sum of squared errors is minimized, as in (3):
$$\min_{H^*} \sum_i \left| F_i \odot H^* - G_i \right|^2 \tag{3}$$
where $\odot$ denotes element-wise multiplication. The objective is a real-valued function of complex variables. Solving it gives $H^*$, as shown in (4):
$$H^* = \frac{\sum_i G_i \odot F_i^*}{\sum_i F_i \odot F_i^*} \tag{4}$$
Equation (4) is the MOSSE filter. With the filter $H$ calculated in this way, the MOSSE algorithm can analyse video and produce a data set of the sway velocities of targeted tree features over a period of time.
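The filter construction in Equations (1)-(4) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the default `sigma=2.0`, and the small regularizer `eps` are assumptions made for this example, and the preprocessing described in the tracking procedure is omitted.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired output g_i of Eq. (1): a 2-D Gaussian peaked at center = (row, col)."""
    rows, cols = np.indices(shape)
    cy, cx = center
    return np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2) / sigma ** 2)

def train_mosse(patches, responses, eps=1e-5):
    """Eq. (4): H* = sum_i(G_i . conj(F_i)) / sum_i(F_i . conj(F_i)), element-wise."""
    num = np.zeros(patches[0].shape, dtype=complex)
    den = np.zeros(patches[0].shape, dtype=complex)
    for f, g in zip(patches, responses):
        F, G = np.fft.fft2(f), np.fft.fft2(g)
        num += G * np.conj(F)
        den += F * np.conj(F)
    # eps is an assumed regularizer that avoids division by zero
    return num / (den + eps)

def respond(H_conj, patch):
    """Spatial response IFFT(F . H*); the location of its peak marks the target."""
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
```

Training on a single patch and then applying the filter to a circularly shifted copy of that patch moves the response peak by the same shift, which is the property the tracker relies on to locate the target in a new frame.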
The process is as follows:
(1) Initialization: During filter training, multiple images are obtained by random affine transformation of the initial target window in the first frame of the video. In this process, the image rotates and the target displacement is inaccurate. To solve this problem, the gray values of the image are normalized to zero mean, and the image is then multiplied by a cosine window so that values near the image edge approach zero, which gives more weight to the target in the middle. After normalization and cosine windowing, the training samples of the target are obtained, and the corresponding two-dimensional Gaussian distribution is defined as $g_i$.
(2) Establish the target appearance model: $f_i$ and $g_i$ are converted into the frequency domain variables $F_i$ and $G_i$ by FFT, and $H$ is obtained according to (4).
(3) Motion estimation: $H$ is applied to the search region of the new frame $T$ to obtain the frequency-domain response $G_T = F_T \odot H^*$, and the inverse Fourier transform (IFFT) is applied to it to obtain the spatial domain response $g_T$, which contains the target position information.
(4) Target positioning: the position of the peak value of the spatial domain response $g_T$ is the central position of the target.
(5) Occlusion and failure detection: The Peak-to-Sidelobe Ratio (PSR) is used to detect target occlusion and to measure the quality of the tracking results. PSR is defined as in (5):
$$\mathrm{PSR} = \frac{g_{\max} - \mu_{s1}}{\sigma_{s1}} \tag{5}$$
where $g_{\max}$ is the peak value, and $\mu_{s1}$ and $\sigma_{s1}$ are the mean and standard deviation of the sidelobe.
(6) Target appearance update: the tracking results of the previously processed frames and of the current frame are combined, and a learning rate $\eta$ is introduced to adapt to changes in target appearance and to prevent drift. $A_0$ and $B_0$ are initially 0, and $T$ is the frame number. The update is as shown in (6)-(8):
$$H_T^* = \frac{A_T}{B_T} \tag{6}$$
$$A_T = \eta\, G_T \odot F_T^* + (1 - \eta)\, A_{T-1} \tag{7}$$
$$B_T = \eta\, F_T \odot F_T^* + (1 - \eta)\, B_{T-1} \tag{8}$$
When $\eta$ equals zero, the filter is never updated. When $\eta$ is large, the updated filter is dominated by the current frame, and the influence of earlier frames on the filter decreases exponentially with time.
Equations (1)-(4) are formulated from a linear viewpoint. Equations (6)-(8) split the filter into two parts, a numerator and a denominator. Each part is updated separately, which makes the filter more robust to external influences such as deformation and illumination. In most cases, the local motion of the trunk and branches in wind is a rotation or slight distortion that is approximately linear, so nonlinearity need not be considered.
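The failure check in (5) and the running update in (6)-(8) can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the 11 × 11 exclusion window around the peak (`exclude=5`), the learning rate `eta=0.125`, and the small constants that guard against division by zero are example defaults, not values reported in this study.

```python
import numpy as np

def psr(response, exclude=5):
    """Eq. (5): (peak - sidelobe mean) / sidelobe std.

    The sidelobe is taken as everything outside a (2*exclude+1)-pixel square
    around the peak; the square size is an assumed default.
    """
    r, c = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    mask[max(r - exclude, 0):r + exclude + 1, max(c - exclude, 0):c + exclude + 1] = False
    sidelobe = response[mask]
    return (response[r, c] - sidelobe.mean()) / (sidelobe.std() + 1e-12)

def update_filter(A_prev, B_prev, F, G, eta=0.125):
    """Eqs. (6)-(8): blend the current frame into numerator A and denominator B."""
    A = eta * G * np.conj(F) + (1.0 - eta) * A_prev
    B = eta * F * np.conj(F) + (1.0 - eta) * B_prev
    return A / (B + 1e-5), A, B  # returns (H*, A_T, B_T)
```

A sharp, isolated response peak gives a high PSR, whereas a noisy response without a clear peak gives a low one, so a threshold on PSR can be used to skip the update when the target is occluded.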

Adaptive Construction of Tracking Window
To track tree sway with the MOSSE algorithm, a tracking window must first be constructed adaptively. Tree sway is a recurrent movement, so an effective tracking window can be defined using only enough frames to depict a single complete sway motion of the targeted feature.
When the tracking position is selected, a tracking window that just covers the branch to be tracked is defined. A video of duration T is divided into n segments (n ≥ 5), and the first segment, of duration T/n, is tracked every m frames. If every tracking attempt succeeds, the tracking window size is considered appropriate and is returned. Otherwise, the tracking window is enlarged by the step size s (s ≥ 1) and tracking is repeated. The algorithm flow chart is shown in Figure 3.
Figure 3. Flowchart of the adaptive construction method of the tracking window.
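The window-growing loop described above might look like the following sketch. The callback `track_segment` is hypothetical: it stands for running the tracker on the first T/n of the video, sampling every m frames, and reporting whether every sampled frame was tracked. Growing width and height by the same step, and the optional `max_size` guard, are assumptions of this example.

```python
def adapt_window(initial_size, track_segment, step=1, max_size=None):
    """Enlarge a (width, height) window until track_segment(size) succeeds.

    track_segment is a hypothetical callback returning True only if tracking
    succeeded on every sampled frame of the first video segment.
    """
    width, height = initial_size
    while True:
        if track_segment((width, height)):
            return width, height
        # Tracking was lost: enlarge the window by the step size s and retry.
        width, height = width + step, height + step
        if max_size is not None and (width > max_size[0] or height > max_size[1]):
            raise RuntimeError("no window size up to max_size tracked successfully")
```

Starting from the tightest window and stopping at the first size that tracks reliably keeps the window small, which limits both the computation and the risk of covering neighbouring branches.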

Method for Measuring Tree Sway Frequency
Trees sway continuously in the wind, which makes it difficult to determine the initial position of trunks and branches in the windless state. In strong winds, the movement of a branch includes both its own movement and the movement of the trunk, so it is difficult to measure the absolute displacement of the trunk and branch, and therefore difficult to measure the sway frequency of a targeted feature from its absolute movement. It is easier to measure the relative displacement of the trunk and branch. The instantaneous velocity of that relative displacement, measured with the MOSSE tracking algorithm, can be used to calculate the tree sway frequency.
Tree sway consists of motion in three dimensions. The z axis is the line of sight of the camera, and x and y are the horizontal and vertical axes perpendicular to the line of sight. Frequency is a directionless characteristic, so the direction chosen for measuring movement is irrelevant. A video display is two-dimensional, and movement along the x axis and y axis of that plane is most easily discriminated. In the process described in this study, movement along the x axis is measured, and the sway frequency response is obtained by FFT of the average velocity along the x axis. The upper left corner of the rectangular tracking window is selected as the feature point, and its coordinates are recorded while the movement of the tree is tracked. The change in the coordinate of the feature point is used to calculate the average velocity along the x axis between two frames of video, as shown in (9):
$$v_x = \frac{X_i - X_{i-1}}{\Delta t} \tag{9}$$
where $v_x$ is the average velocity, which can be considered the instantaneous velocity between two frames of video, $X_i$ is the x coordinate of the feature point in frame $i$, $X_{i-1}$ is the x coordinate of the feature point in frame $i - 1$, and $\Delta t$ is the time interval between two adjacent frames. The velocities calculated over the whole video form the data set on which the FFT is performed. The FFT produces spectral information that can be displayed as a spectrum. The frequency at which the greatest energy is found is the spectral peak, which is identified as the sway frequency of the targeted feature.
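Equation (9) and the peak-picking step can be sketched with NumPy as follows. The function names are illustrative, and removing the mean and skipping the zero-frequency bin are assumptions of this sketch rather than steps stated in the text.

```python
import numpy as np

def x_velocity(x_coords, fs):
    """Eq. (9): v_x = (X_i - X_{i-1}) / dt, with dt = 1 / fs."""
    return np.diff(np.asarray(x_coords, dtype=float)) * fs

def peak_frequency(signal, fs):
    """Frequency (Hz) of the strongest non-zero-frequency spectral component."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return freqs[1:][np.argmax(power[1:])]
```

For example, a synthetic x-coordinate trace sampled at 4 Hz that oscillates at 0.25 Hz (an invented test signal, not measured data) yields a velocity series whose spectral peak lies at about 0.25 Hz.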

Method Flow
The method of tree sway tracking and frequency measurement by adaptive tracking window based on MOSSE includes the following steps. First, an adaptive tracking window is constructed for the observed target. Second, the MOSSE algorithm is used to track selected features as they move within the frame. Third, the frequency of the tree is obtained by FFT analysis of the sway velocity of trees along the x-axis and the peak frequency is identified.

Wind Speed
Using the ultrasonic anemometer, the wind velocity components in three directions, $(U_x, U_y, U_z)$ in the Cartesian coordinate system, were obtained, and the magnitude of the wind speed was calculated as $U = \sqrt{U_x^2 + U_y^2 + U_z^2}$. The time history curves of the wind speed magnitude in the first four monitoring periods are shown in Figure 4.
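The magnitude calculation is a one-liner in NumPy; the sketch below assumes the three components are available as arrays of equal length (the function name is illustrative).

```python
import numpy as np

def wind_speed_magnitude(ux, uy, uz):
    """U = sqrt(Ux^2 + Uy^2 + Uz^2), evaluated sample by sample."""
    return np.sqrt(np.square(ux) + np.square(uy) + np.square(uz))
```

Applying `mean()` and `max()` to the returned array gives the average and maximum wind speed over a monitoring period.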

Coordinate Change of Target Feature Points along the x Axis in Videos
The coordinate changes of the target features S1,v and S2,v along the x axis in the videos were obtained with the proposed MOSSE method and are displayed in Figure 5. The abscissa is time, and the ordinate is the x coordinate of the feature point in the pixel coordinate system. In the pixel coordinate system, the vertex at the upper left corner of the image plane is taken as the origin, and the x and y axes are parallel to the x and y axes of the image coordinate system, with units of pixels. The number of pixels represents the displacement of the target feature point along the x axis; provided the focal length does not change, it is proportional to the distance the target moves in the same direction in the physical world.
It can be seen from Figure 5 that the x coordinates of the feature points in the first frame differ between monitoring periods, which indicates that the initial positions of S1,v and S2,v in the video differ. The x coordinates of S1,v and S2,v fluctuate over time, indicating that the tree sways with the wind. Each extreme point in the graph marks the moment when the tree reaches its maximum displacement in one direction and starts moving in the opposite direction. Steep segments of the graph indicate fast sway, and gentle segments indicate slow sway. S1,v and S2,v have trajectories of similar shape, but S2,v shows larger fluctuations in its coordinates than S1,v.

The Velocity of the Feature Point along the x Axis in the Videos
Equation (9) was used to calculate the velocity time history curves of S1,v and S2,v. The velocity time history curves of the feature point of S1,v are shown in Figure 6. The abscissa is time, and the ordinate is the velocity of the feature point along the x axis. It can be seen from Figure 6 that the initial velocities differ between monitoring periods and that the velocity fluctuates over time. A nonzero velocity indicates that the tree is swaying. A zero velocity indicates either that the tree is at rest or that it has reached an extreme point in one direction and is about to move in the opposite direction.

Comparison of Frequency Measured by a Video and an Accelerometer
Spectral analysis using the FFT in OriginPro 9.1 was used to calculate the sway frequencies of the tree. The power spectral density (PSD) of the x axis acceleration measured by the accelerometer and of the x axis velocity measured from the video provides a measure of the tree's response as a function of frequency. With a video sampling frequency of 4 Hz, the sampling theorem limits the maximum signal frequency obtainable from the video to 2 Hz. Therefore, for comparison, the power spectra from both the accelerometer and the video are limited to the range up to 2 Hz. The power of each spectrum was averaged over a bandwidth of 0.01 Hz to enhance the statistical stability of the estimates [18].
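The study used OriginPro for this step; the NumPy sketch below only illustrates the idea of a periodogram averaged in 0.01 Hz bands and limited to 2 Hz. The one-sided scaling, the band edges, and the function name are assumptions of this example, not a reproduction of the OriginPro workflow.

```python
import numpy as np

def band_averaged_psd(signal, fs, bandwidth=0.01, fmax=2.0):
    """One-sided periodogram averaged over fixed-width frequency bands up to fmax."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    n = signal.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (fs * n)
    # Double all bins except DC (and Nyquist, when n is even) for a one-sided PSD.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    n_bands = int(round(fmax / bandwidth))
    edges = np.linspace(0.0, fmax, n_bands + 1)
    band = np.digitize(freqs, edges) - 1
    centers, means = [], []
    for k in range(n_bands):
        in_band = band == k
        if in_band.any():  # skip bands that contain no FFT bin
            centers.append(0.5 * (edges[k] + edges[k + 1]))
            means.append(psd[in_band].mean())
    return np.array(centers), np.array(means)
```

Calling the function with `fs=50` for the accelerometer signal and `fs=4` for the video velocity returns spectra on the same 0-2 Hz band grid, so their peak frequencies can be compared directly.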

As can be seen from Figure 7, the spectra resemble the Type I or Type III spectra reported by Baker [17], with a recognizable peak between 0.2 and 0.4 Hz. The peak frequencies of the average acceleration PSD spectrum of S1,a and of the average velocity PSD spectrum of S1,v are both 0.27 Hz. The peak frequency of the average velocity PSD spectrum of S2,v is 0.26 Hz. The peak frequency is the fundamental sway frequency of the tree. The proposed method can therefore be used for motion tracking during leafless periods and for detecting the fundamental vibration mode. In addition, the frequencies of S1,a and S1,v are equal, and the frequency of S2,v is close to them. This is consistent with the results of Baker [17], who measured multiple points on a tree and found that their frequencies were close. The reason is that the frequency is the same at every point along the same tree; what changes is the vibration amplitude and acceleration, owing to variable damping along the tree structure.

Discussion
A comparison with the accelerometer measurements shows that the frequency of the target can be identified by FFT analysis of its video-based motion signal, and that the result is comparable with the accelerometer measurements. The video-based method makes data collection convenient: a video can be recorded at any time with a digital camera or even a mobile phone. Although only the leafless period of deciduous species was monitored here, the method is not limited to the leafless period if the monitoring target is the trunk. Monitoring of fully leaved trees is a future research direction.
Although video has the advantages of easy data collection and visualization, it still has limitations. Video collection is easily affected by the recording equipment and the natural environment, for example camera shake, lighting, rain, snow, and fog, and it is not possible to monitor trees in dense forests. These factors introduce a variety of noise and disturbances into the image, which degrade the performance of the video method in monitoring tree sway. The method does not work at night because of insufficient light. In rainy, snowy, and foggy weather, tracking will fail because the video is blurred. Target tracking in hazy video is a typical problem. Chaudhary [31] studied this problem and proposed ResINet, a novel end-to-end convolutional neural network for image de-hazing. We have not studied this problem, which is left for more in-depth future work. However, it should be noted that our method is mainly used to monitor trees in the leafless period, during which there is less rain. In addition, a single camera cannot capture the three-dimensional movement of trees; we monitor only the movement along the x axis in the video.
In strong winds, parts of other trees may collide with parts of the sample tree. When only a small part of the target is occluded, or the background changes little, the tracking result is hardly affected. The principle of the MOSSE algorithm is to generate a high response (correlation peak) for each target of interest in the scene and a low response for the background. The position of the maximum value in the correlation output is the new position of the tracked target. Therefore, when the target is partially occluded or the background changes little, the tracked target still corresponds to the correlation peak. In addition, the Peak to Sidelobe Ratio (PSR) is used to assess the quality of the tracking results during tracking. If a large area of the target is occluded, the PSR value indicates a tracking failure, and tracking resumes when the target reappears.
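As a rough illustration of the PSR check, the sketch below follows the definition commonly given for MOSSE: the peak value minus the sidelobe mean, divided by the sidelobe standard deviation, where the sidelobe is the correlation output outside a small window around the peak. The 11 × 11 exclusion window is the size used in the original MOSSE description; the function name and the failure threshold in the usage example are assumptions for illustration, since the values used in this study are not stated here.

```python
import numpy as np


def peak_to_sidelobe_ratio(response, exclude=11):
    """Compute the PSR of a 2-D correlation response map.

    response : 2-D array of correlation outputs for one frame
    exclude  : side length of the square window around the peak that is
               left out of the sidelobe statistics (assumed 11 pixels)
    """
    response = np.asarray(response, dtype=float)

    # Locate the correlation peak; its position is the new target location.
    row, col = np.unravel_index(np.argmax(response), response.shape)
    peak = response[row, col]

    # Mask out the window around the peak; the rest is the sidelobe.
    half = exclude // 2
    sidelobe_mask = np.ones(response.shape, dtype=bool)
    sidelobe_mask[max(row - half, 0):row + half + 1,
                  max(col - half, 0):col + half + 1] = False
    sidelobe = response[sidelobe_mask]

    # Small constant guards against division by zero on a flat sidelobe.
    return float((peak - sidelobe.mean()) / (sidelobe.std() + 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 0.01, size=(64, 64))

    tracked = noise.copy()
    tracked[32, 32] += 1.0  # sharp correlation peak: target visible

    psr_threshold = 7.0  # hypothetical failure threshold
    for name, resp in [("tracked", tracked), ("occluded", noise)]:
        psr = peak_to_sidelobe_ratio(resp)
        print(name, "lost" if psr < psr_threshold else "ok")
```

A tracker built this way would skip the filter update while the PSR stays below the threshold and resume once a sharp peak reappears, which matches the behaviour described above for large occlusions.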
In high winds, trunks and branches may deform and sway more strongly. The MOSSE algorithm is robust to the sway amplitude and to small deformations of the tracked target; the sway amplitude in particular hardly affects the tracking result. However, if the stiffness of the tree is low and large bending occurs, tracking may fail. If the stiffness is large enough, the tracked target undergoes only small deformation, which does not lead to tracking failure.

Conclusions
An adaptive tracking window is constructed based on the MOSSE algorithm, a method for tracking and measuring the sway frequency of leafless deciduous trees is established, and the fundamental mode of the tree sway is analyzed. A Betula platyphylla Sukaczev tree located in the Maoershan Forest Ecosystem National Field Scientific Observation and Research Station was taken as the research object. The results measured by the video tracking method were compared with those measured by the accelerometer to verify the feasibility and effectiveness of the method. The main conclusions are as follows:
1. The video-based method can be used successfully for measuring tree sway frequency under field conditions. The fundamental sway frequency measured by the accelerometer is equal to the fundamental sway frequency measured by the video.
2. The key to this method is the construction of an adaptive tracking window. The adaptive tracking window addresses two problems: tracking failure when the tracking window is too small, and reduced tracking speed and measurement accuracy when the tracking window is too large. The instantaneous velocity of the tree is calculated, and the frequency response of the tree is obtained by FFT spectrum analysis of the instantaneous velocity.
3. The frequency identification method is built on MOSSE-based tracking, which makes the method robust and fast and allows it to track tree sway for a long time. In addition, the equipment is simple to install; thus, the method is cost-efficient for frequency measurement.
However, this method does not work at night, might not work in strong winds, and has been applied only to leafless trees. In addition, a single camera cannot obtain depth information about the tree. In the future, we will study the tracking and measurement of the three-dimensional movement of trees and extend our research to multiple species and fully leaved trees.
Author Contributions: D.X., A.W. and X.Y. designed the experiment. A.W. and X.Y. carried out the experiments, analysed the data, and wrote the original manuscript with the help and critique of D.X. A.W. and X.Y. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.