A Feature Extraction and Classification Method to Forecast the PM2.5 Variation Trend Using Candlestick and Visual Geometry Group Model

Abstract: Predicting the continuous change of PM2.5 concentration is currently a research hotspot in air pollution studies. Reliable prediction of PM2.5 values requires combining physical methods with deep learning models to divide the PM2.5 pollution process into a set of meaningful types. To this end, a candlestick chart sample generator was designed to produce candlestick charts from the online continuous PM2.5 monitoring data of the Guilin monitoring station. Analysis of the generated candlestick charts with the Gaussian diffusion model showed that they reflect the characteristics of the physical transmission process of PM2.5 pollutants. Using a fixed three-day period and the time linear convolution method, 2188 sets of candlestick chart data were obtained from the 2013–2018 PM2.5 concentration data. Unsupervised classification produced 16 categories that met the established classification criteria. Statistical analysis showed that the accuracy of the change trend of these categories in the following period reached 99.68%. With the candlestick chart data as the training dataset, the Visual Geometry Group (VGG) model, an improved convolutional neural network, was used for classification. The experimental results showed that the overall accuracy (OA) of the candlestick chart combination classification was 96.19% and the Kappa coefficient was 0.960. The VGG model improved the overall accuracy by 1.93% on average compared with the support vector machine (SVM), LeNet, and AlexNet models. According to the experimental results, classifying continuous pollution data in the form of candlestick charts with the VGG method retains the characteristics of the physical pollution process more comprehensively and provides a classification basis for accurately predicting PM2.5 values. The statistical feasibility of the method was also demonstrated.


Introduction
An increase in PM2.5 has a serious impact on human health and may induce lung cancer, leukemia, breast cancer, and other malignant tumors [1][2][3][4]. To protect public health, many monitoring stations have been built to measure PM2.5 concentrations in real time. These data provide a basis for predicting PM2.5 values. Research on the classification of PM2.5 data is the basis for studying the principle of PM2.5 physical diffusion. At present, most researchers feed the original PM2.5 data directly into black-box models to predict PM2.5 values numerically. However, because black-box models only reflect the general causal relationship between related factors, they cannot express the

The Candlestick Chart
The Japanese candlestick chart was developed by Munehisa Homma during the 18th century and introduced to the Western world by Steve Nison in his book published in 1991 [46]. The candlestick chart is composed of an opening price, highest price, lowest price, and closing price. The color of the candlestick chart is determined by the opening price and the closing price. A green candlestick chart means that the closing price is higher than the opening price, and a red candlestick chart means that the closing price is lower than the opening price. For one day of PM2.5 data, the initial value corresponds to the opening price, the end value corresponds to the closing price, the minimum value corresponds to the lowest price, and the maximum value corresponds to the highest price, as shown in Figure 1.
A red candlestick means that the end value is smaller than the initial value, indicating that the PM2.5 concentration is decreasing. A green candlestick means that the end value is greater than the initial value, indicating that the PM2.5 concentration is rising. In Figure 1, Open and Close in A and B are opposite, and Low and High are the same. The initial, end, maximum, and minimum values of PM2.5 differ from period to period and vary regularly. The candlestick chart is built from these four characteristic values and can therefore reflect the regular pattern of the time-series variation [47]. In the financial field, the candlestick chart is used on the basis of perceptual cognition because it lacks mechanistic support. Therefore, a method for PM2.5 data feature extraction and classification that combines the Gaussian diffusion model with the candlestick chart was designed. It provides basic theoretical support for extracting the features of the pollution process over a period of PM2.5 data.
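The mapping from one day of readings to the four candlestick values can be sketched as follows. This is a minimal illustration, not the authors' sample generator: the function name `daily_candle`, the `Candle` container, and the sample readings are hypothetical, and the "flat" label for equal initial and end values is an assumption, since the text does not say how that case is colored.

```python
from typing import NamedTuple, Sequence


class Candle(NamedTuple):
    open: float   # initial value of the period
    close: float  # end value of the period
    high: float   # maximum value of the period
    low: float    # minimum value of the period
    color: str    # "green" (rising), "red" (falling), or "flat" (assumed label)


def daily_candle(values: Sequence[float]) -> Candle:
    """Build one candlestick from the time-ordered readings of one day."""
    if not values:
        raise ValueError("need at least one reading")
    first, last = values[0], values[-1]
    if last > first:
        color = "green"   # end value above initial value: concentration rising
    elif last < first:
        color = "red"     # end value below initial value: concentration falling
    else:
        color = "flat"    # assumption: the source does not define this case
    return Candle(first, last, max(values), min(values), color)


# Hypothetical readings in ug/m3, ordered in time.
readings = [35.0, 42.0, 58.0, 51.0, 30.0, 28.0]
candle = daily_candle(readings)
print(candle)
```

Keeping the color as a derived field makes the falling/rising convention of the text explicit rather than leaving it to a plotting library, whose default colors may be the reverse.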

The Gaussian Diffusion Model and the Candlestick Chart
In the practical work of an atmospheric environmental impact assessment, Gaussian diffusion is typically used for the atmospheric diffusion calculation [48][49][50]. The Gaussian diffusion model is a point source diffusion model that is suitable for uniform atmospheric conditions and an area of wide and flat ground [51]. It can be used to discuss the diffusion of PM2.5. The specific equation is as follows:

$$X(x, y, z, t, H) = \frac{Q}{2\pi v \sigma_y \sigma_z} \exp\left(-\frac{1}{2}\frac{y^2}{\sigma_y^2}\right) \times \left[\exp\left(-\frac{1}{2}\frac{(z-H)^2}{\sigma_z^2}\right) + \exp\left(-\frac{1}{2}\frac{(z+H)^2}{\sigma_z^2}\right)\right] \tag{1}$$

where X(x, y, z, t, H) is the gas concentration (kg/m^3) diffused x meters downwind, y meters laterally, and z meters above the ground; Q is the source intensity; σ_x, σ_y, and σ_z (m) are the diffusion parameters on the x, y, and z axes, respectively, calculated according to the atmospheric stability selection parameter; H (m) is the height of the monitoring point; and v (m/s) is the wind speed.
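For readers who want to evaluate the Gaussian diffusion equation numerically, a minimal sketch is given below. It assumes the standard Gaussian plume form, in which the concentration is Q / (2π v σ_y σ_z) multiplied by a lateral exponential term in y and by the sum of two vertical exponential terms in (z − H) and (z + H). The function name `plume_concentration` and the argument names are hypothetical, and the diffusion parameters are passed in directly rather than derived from an atmospheric stability class.

```python
import math


def plume_concentration(q: float, v: float, y: float, z: float, h: float,
                        sigma_y: float, sigma_z: float) -> float:
    """Evaluate the Gaussian plume expression at lateral offset y and height z.

    q: source intensity, v: wind speed (m/s), h: the height H of the equation (m),
    sigma_y, sigma_z: diffusion parameters (m), supplied by the caller.
    """
    if v <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("v, sigma_y and sigma_z must be positive")
    lateral = math.exp(-0.5 * (y / sigma_y) ** 2)
    # The two vertical terms use (z - H) and (z + H), as in the equation.
    vertical = (math.exp(-0.5 * ((z - h) / sigma_z) ** 2)
                + math.exp(-0.5 * ((z + h) / sigma_z) ** 2))
    return q / (2.0 * math.pi * v * sigma_y * sigma_z) * lateral * vertical


# Illustrative values only; they are not taken from the study.
c_axis = plume_concentration(q=1.0, v=1.0, y=0.0, z=0.0, h=0.0,
                             sigma_y=1.0, sigma_z=1.0)
print(c_axis)
```

With y = 0, z = 0, and H = 0 the lateral term is 1 and the vertical term is 2, so the expression reduces to Q / (π v σ_y σ_z), which is a convenient sanity check.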
Without considering the spatial model, a one-dimensional analysis of the PM2.5 diffusion process between the stations was conducted in combination with the source intensity, wind direction, and wind speed. The site located upwind of the target site was regarded as the place where the PM2.5 originated and, for simplicity, is referred to as the strong source. The Gaussian diffusion model was established to simulate the diffusion process of PM2.5 and to analyze the PM2.5 at the target stations.
As shown in Figure 2, A, B, C, and D represent the four sites in the research area, with site C as the target site and site B as the PM2.5 source site. Assuming that the wind direction was from site B to site C, the diffusion process from site B to site C is indicated by the arrow. Suppose that only one-dimensional diffusion is considered in the simulation, that Q is the concentration of PM2.5 at the strong source, v is the wind speed, and d is the wind direction, and that the concentration of PM2.5 at the target point C is equivalent to the gas concentration X in the Gaussian diffusion model. When the PM2.5 data of site C are affected only by site B, a parameter sensitivity analysis based on the Gaussian diffusion equation yields the three situations shown in Figure 3. When Q is unchanged, the relationship between the change in v and C and the relationship between the change in d and C are shown in Figure 3a,b, respectively. When v and d remain unchanged, the relationship between Q and C is shown in Figure 3c.
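The kind of one-at-a-time sensitivity sweep described above can be sketched with a simplified on-axis form of the Gaussian plume expression. This is an illustration under stated assumptions rather than a reproduction of Figure 3: the function `centerline_concentration`, the fixed diffusion parameters, and the sample values of Q and v are hypothetical, and the wind direction d is not modeled here because the text does not give its functional form.

```python
import math


def centerline_concentration(q: float, v: float,
                             sigma_y: float = 50.0, sigma_z: float = 20.0) -> float:
    """Plume value at y = 0, z = 0, H = 0, where the expression reduces to
    Q / (pi * v * sigma_y * sigma_z). The sigma defaults are arbitrary."""
    if v <= 0:
        raise ValueError("wind speed must be positive")
    return q / (math.pi * v * sigma_y * sigma_z)


# Sweep the wind speed with the source term held fixed.
speeds = [1.0, 2.0, 4.0]
by_speed = [centerline_concentration(q=100.0, v=v) for v in speeds]

# Sweep the source term with the wind speed held fixed.
sources = [50.0, 100.0, 200.0]
by_source = [centerline_concentration(q=q, v=2.0) for q in sources]

for v, c in zip(speeds, by_speed):
    print(f"v = {v:.1f} m/s -> C = {c:.6f}")
for q, c in zip(sources, by_source):
    print(f"Q = {q:.1f} -> C = {c:.6f}")
```

In this simplified form the concentration is inversely proportional to v and directly proportional to Q, so doubling the wind speed halves the value and doubling the source term doubles it.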
Figure 3. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c).

Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. In the table, the wind direction (d) is counted as increased when the wind blows from the pollution source to the target site and as decreased when no wind blows from the pollution source to the target site. The wind speed (v) is counted as increased when the wind speed from the pollution source increases and as decreased when the wind speed from the pollution source decreases. Y indicates that the item has changed, and N indicates that the item has not changed.

The source intensity (Q), wind speed (v), and wind direction (d) extracted from the Gaussian equation directly affect the change of PM2.5 concentration. These three variables have counterparts among the variables that affect stocks in the financial market. The source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. A red candlestick represents an overall downward trend of PM2.5 concentration on that day, and a green candlestick represents an overall increase in PM2.5 concentration on that day. The four values determine the lengths of the upper shadow line, the lower shadow line, and the entity. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entity. Table 2 shows the calculation methods of the nine basic forms of the PM2.5 candlestick chart.
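The three lengths that distinguish the basic forms can be computed directly from the four characteristic values, as in the sketch below. It follows the usual candlestick geometry, with the entity spanning the initial and end values and the shadows extending to the maximum and minimum. The function name `candle_geometry` and the sample numbers are hypothetical, and the assignment of a candle to one of the nine forms is deliberately left out because the rules of Table 2 are not reproduced in the text.

```python
def candle_geometry(open_: float, close: float, high: float, low: float) -> dict:
    """Return the shadow and entity lengths and the color of one candlestick."""
    body_top = max(open_, close)
    body_bottom = min(open_, close)
    if not (low <= body_bottom and body_top <= high):
        raise ValueError("low/high must bracket the initial and end values")
    if close > open_:
        color = "green"   # concentration rose over the day
    elif close < open_:
        color = "red"     # concentration fell over the day
    else:
        color = "flat"    # assumption: this case is not defined in the text
    return {
        "upper_shadow": high - body_top,      # maximum above the entity
        "lower_shadow": body_bottom - low,    # minimum below the entity
        "entity": body_top - body_bottom,     # |end value - initial value|
        "color": color,
    }


# Hypothetical daily values in ug/m3.
geometry = candle_geometry(open_=35.0, close=28.0, high=58.0, low=28.0)
print(geometry)
```

A form classifier would then compare these three lengths against thresholds; those thresholds would have to come from Table 2 and are not guessed here.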

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. Among them, a red candlestick represents the overall downward trend of PM2.5 concentration on this day. A green candlestick represents the overall increase in PM2.5 concentration on this day. According to the four values of the initial value, end value, maximum value, and minimum value, the length of the upper shadow line, the lower shadow line, and the entity are confirmed. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entities. Table 2 shows the calculation methods of the nine basic forms of PM2.5 candlestick chart. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c). Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. Among them, the wind direction (d) is increased when the wind blows from the pollution source to the target site, and the wind direction (d) is decreased when there is no wind blowing from the pollution source to the target site. The wind speed (v) is increased when the wind speed from the pollution source increases, and the wind speed (v) is decreased when the wind speed from the pollution source decreases. Y represents that the item has changed, and N represents that the item has not changed.

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. Among them, a red candlestick represents the overall downward trend of PM2.5 concentration on this day. A green candlestick represents the overall increase in PM2.5 concentration on this day. According to the four values of the initial value, end value, maximum value, and minimum value, the length of the upper shadow line, the lower shadow line, and the entity are confirmed. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entities. Table 2 shows the calculation methods of the nine basic forms of PM2.5 candlestick chart. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c). Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. Among them, the wind direction (d) is increased when the wind blows from the pollution source to the target site, and the wind direction (d) is decreased when there is no wind blowing from the pollution source to the target site. The wind speed (v) is increased when the wind speed from the pollution source increases, and the wind speed (v) is decreased when the wind speed from the pollution source decreases. Y represents that the item has changed, and N represents that the item has not changed.

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. Among them, a red candlestick represents the overall downward trend of PM2.5 concentration on this day. A green candlestick represents the overall increase in PM2.5 concentration on this day. According to the four values of the initial value, end value, maximum value, and minimum value, the length of the upper shadow line, the lower shadow line, and the entity are confirmed. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entities. Table 2 shows the calculation methods of the nine basic forms of PM2.5 candlestick chart. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c). Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. Among them, the wind direction (d) is increased when the wind blows from the pollution source to the target site, and the wind direction (d) is decreased when there is no wind blowing from the pollution source to the target site. The wind speed (v) is increased when the wind speed from the pollution source increases, and the wind speed (v) is decreased when the wind speed from the pollution source decreases. Y represents that the item has changed, and N represents that the item has not changed.

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. Among them, a red candlestick represents the overall downward trend of PM2.5 concentration on this day. A green candlestick represents the overall increase in PM2.5 concentration on this day. According to the four values of the initial value, end value, maximum value, and minimum value, the length of the upper shadow line, the lower shadow line, and the entity are confirmed. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entities. Table 2 shows the calculation methods of the nine basic forms of PM2.5 candlestick chart. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c). Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. Among them, the wind direction (d) is increased when the wind blows from the pollution source to the target site, and the wind direction (d) is decreased when there is no wind blowing from the pollution source to the target site. The wind speed (v) is increased when the wind speed from the pollution source increases, and the wind speed (v) is decreased when the wind speed from the pollution source decreases. Y represents that the item has changed, and N represents that the item has not changed.

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. Among them, a red candlestick represents the overall downward trend of PM2.5 concentration on this day. A green candlestick represents the overall increase in PM2.5 concentration on this day. According to the four values of the initial value, end value, maximum value, and minimum value, the length of the upper shadow line, the lower shadow line, and the entity are confirmed. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entities. Table 2 shows the calculation methods of the nine basic forms of PM2.5 candlestick chart. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c). Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. Among them, the wind direction (d) is increased when the wind blows from the pollution source to the target site, and the wind direction (d) is decreased when there is no wind blowing from the pollution source to the target site. The wind speed (v) is increased when the wind speed from the pollution source increases, and the wind speed (v) is decreased when the wind speed from the pollution source decreases. Y represents that the item has changed, and N represents that the item has not changed.

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM2.5 candlestick chart is composed of an initial value, an end value, a maximum value, and a minimum value. Among them, a red candlestick represents the overall downward trend of PM2.5 concentration on this day. A green candlestick represents the overall increase in PM2.5 concentration on this day. According to the four values of the initial value, end value, maximum value, and minimum value, the length of the upper shadow line, the lower shadow line, and the entity are confirmed. The nine basic forms of PM2.5 candlestick charts are obtained from differences in the length and color of the upper and lower shadows and the entities. Table 2 shows the calculation methods of the nine basic forms of PM2.5 candlestick chart. Relationship between the PM2.5 concentration change at the target point. Assuming that Q is unchanged, the relationship between the change in v and C is shown in (a). Assuming that Q is unchanged, the relationship between the change in d and C is shown in (b). Assuming that v and d remain unchanged, the relationship between Q and C is shown in (c). Table 1 shows the Gaussian process corresponding to the nine forms of the candlestick chart. Among them, the wind direction (d) is increased when the wind blows from the pollution source to the target site, and the wind direction (d) is decreased when there is no wind blowing from the pollution source to the target site. The wind speed (v) is increased when the wind speed from the pollution source increases, and the wind speed (v) is decreased when the wind speed from the pollution source decreases. Y represents that the item has changed, and N represents that the item has not changed.

Candlestick
Chart The source intensity (Q), wind speed (v), and wind direction (d) extracted by the Gaussian equation directly affect the change of PM2.5 concentration. These three variables are also related to the variables affecting stocks in the financial market. Among them, the source intensity (Q) corresponds to the trading volume of the stock, which has a direct and obvious impact on the stock. The wind speed (v) corresponds to the trading speed of the stock, which affects the stock index to some extent. The wind direction (d) corresponds to an increase or decrease in stock holdings, which directly determines the direction of the stock.
The PM 2.5 candlestick chart is composed of four values: an initial value, an end value, a maximum value, and a minimum value. A red candlestick indicates that the PM 2.5 concentration fell overall on that day, and a green candlestick indicates that it rose overall. The four values determine the lengths of the upper shadow line, the lower shadow line, and the entity (the real body). The nine basic forms of the PM 2.5 candlestick chart are distinguished by differences in the length and color of the upper and lower shadows and the entity. Table 2 shows the calculation methods of the nine basic forms of the PM 2.5 candlestick chart.


Table 2. Calculation methods of the basic forms of the PM 2.5 candlestick chart.
The PM 2.5 Candlestick Chart | The Computed Mode
[candlestick image] | MAX = END > INIT = MIN
[candlestick image] | MAX = INIT > END = MIN
[candlestick image] | MAX > INIT > END = MIN
[candlestick image] | MAX = INIT > END > MIN
[candlestick image] | MAX > INIT > END > MIN
* MAX is the maximum value, END is the end value, INIT is the initial value, and MIN is the minimum value.
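The relations listed in the table can be derived mechanically from the four daily values. The following Python sketch is illustrative only: the function name `computed_mode` and the treatment of a flat body (INIT equal to END) are assumptions made for this example and are not taken from the original method.

```python
def computed_mode(init, end, high, low):
    """Return a Table 2 style relation string for one daily candlestick.

    init/end are the first and last values of the day; high/low are the
    daily maximum and minimum. All names here are hypothetical.
    """
    top_value, bottom_value = max(init, end), min(init, end)
    if not (low <= bottom_value and top_value <= high):
        raise ValueError("MAX and MIN must bound INIT and END")

    # Name the higher end of the body first, as in the table rows.
    if end > init:
        top, bottom = "END", "INIT"
    else:
        # Covers a falling day and, by assumption, a flat body as well.
        top, bottom = "INIT", "END"

    upper = ">" if high > top_value else "="    # ">" means an upper shadow exists
    body = ">" if init != end else "="          # ">" means a non-empty body
    lower = ">" if bottom_value > low else "="  # ">" means a lower shadow exists
    return f"MAX {upper} {top} {body} {bottom} {lower} MIN"


if __name__ == "__main__":
    print(computed_mode(init=70, end=50, high=90, low=30))
```

A falling day with both shadows, for example, yields the relation in the last row of the table.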

Data Sources
The data for the study came from the online air quality monitoring stations in Guilin. Because the primary pollution in Guilin comes from external sources, Guilin is an ideal source of data: its PM 2.5 record reflects the relationship between the Gaussian diffusion model and the candlestick chart while being less affected by sudden changes. In addition, the atmospheric conditions for PM 2.5 transmission between the stations in Guilin City were regarded as uniform. The Guilin Monitoring Station was selected as the target site, and its hourly PM 2.5 data for the six years from 2013 to 2018 were used as the basic dataset. Figure 4 shows the location of the target site.


Technical Route
A method was designed to extract the transmission characteristics of sequential PM 2.5 data using the candlestick chart. The six-year hourly PM 2.5 data of the Guilin Monitoring Station were used as the research data. First, the candlestick chart sample generator was designed to convert the PM 2.5 data into three-day candlestick chart combinations. These candlestick charts were then classified with unsupervised classification methods to find the possible combination types. The accuracy of the unsupervised classification was assessed by judging the change trend of the PM 2.5 concentration of each type during the next period. Finally, the candlestick charts marked with the classification labels were used to train the VGG model, which then classified them. The classification accuracy of the VGG model was calculated and compared with that of other classification models. The PM 2.5 data classification framework is shown in Figure 5.

Candlestick Chart Sample Generator
The real body of the candlestick is formed from the initial and end values of the PM 2.5 data for a 24-h day, as shown in Figure 6. The upper and lower shadows are thin lines that connect the maximum and minimum of the PM 2.5 data for that day to the real body. In this way, the PM 2.5 data are displayed in the form of a candlestick chart.
Because hourly PM 2.5 data are used as the basic research data, one day of PM 2.5 data contains 24 values. The entity of the candlestick is bounded by the initial value and the end value of the day. The maximum value of the candlestick is the highest PM 2.5 concentration in the day, and the minimum value is the lowest. In this way, one day of PM 2.5 data is transformed into a single candlestick.
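The conversion of one day of hourly values into a candlestick can be sketched as follows. This is a minimal illustration under stated assumptions: the `Candle` class, the `daily_candle` function, and the "flat" label for a day whose initial and end values are equal are hypothetical names introduced here, and missing-data handling is not considered.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candle:
    init: float  # first hourly value of the day
    end: float   # last hourly value of the day
    high: float  # daily maximum
    low: float   # daily minimum

    @property
    def body(self) -> float:
        """Length of the entity (real body)."""
        return abs(self.end - self.init)

    @property
    def upper_shadow(self) -> float:
        """Distance from the top of the body to the daily maximum."""
        return self.high - max(self.init, self.end)

    @property
    def lower_shadow(self) -> float:
        """Distance from the bottom of the body to the daily minimum."""
        return min(self.init, self.end) - self.low

    @property
    def color(self) -> str:
        """Red for an overall fall, green for an overall rise."""
        if self.end < self.init:
            return "red"
        if self.end > self.init:
            return "green"
        return "flat"  # assumption: the text does not name this case


def daily_candle(hourly: Sequence[float]) -> Candle:
    """Build one candlestick from the 24 hourly values of a day."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values for one day")
    return Candle(init=hourly[0], end=hourly[-1],
                  high=max(hourly), low=min(hourly))
```

The shadow and body lengths are kept as derived properties so that the four stored values remain the only source of truth.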
The candlestick chart sample generator adopts the convolution principle. With the sliding window set to three days and the sliding step set to one day, each window of three consecutive days of PM 2.5 data forms one candlestick chart combination, as shown in Figure 7. In Figure 7, the time-series data are continuous PM 2.5 data in daily units, with 24 values per day. The sample generator convolves only along the time sequence.
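The sliding-window step of the sample generator can be sketched in a few lines. The function name `sliding_windows` and its parameters are assumptions for illustration; the items in `days` may be daily candlesticks or any other per-day record.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(days: Sequence[T], window: int = 3,
                    step: int = 1) -> List[Tuple[T, ...]]:
    """Group consecutive daily items into overlapping combinations.

    With window=3 and step=1, days 1-3 form the first combination,
    days 2-4 the second, and so on along the time axis only.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    last_start = len(days) - window
    return [tuple(days[i:i + window]) for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    for combo in sliding_windows(["day1", "day2", "day3", "day4", "day5"]):
        print(combo)
```

A series of n complete days gives n − 2 three-day combinations under these settings; a series shorter than the window gives none.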


Candlestick Chart Unsupervised Classification and Evaluation
The candlestick chart image data classification here refers to the extraction and differentiation of the characteristics of the PM 2.5 transmission process. The candlestick charts underwent image processing and were analyzed using unsupervised classification, and the results were then evaluated.
To improve the accuracy of the candlestick chart classification, it is necessary to determine the duration of PM 2.5 pollution and to use it to set the duration of a candlestick chart combination. PM 2.5 data from January 2013, a period of severe pollution, were selected as the study object. Figure 8 shows a line chart of the PM 2.5 data at the monitoring station in January 2013.
Figure 8 shows that the PM 2.5 pollution episode in the selected period lasted three days; that is, the PM 2.5 value reached its peak three days after the pollution began. Verification against a large amount of data showed that the PM 2.5 pollution duration at the site was three days most of the time. Analysis indicated that this is because the source intensity (Q), wind speed (v), and wind direction (d) change quickly, so that the PM 2.5 pollution situation is renewed within three days. Therefore, it was most appropriate to judge the average change in the PM 2.5 concentration over the following three days, using Formula (2). For the same reason, the duration of the next candlestick chart combination was also set to three days.

$$Y = \frac{1}{3}\sum_{i=1}^{3} x_i - \frac{1}{3}\sum_{j=1}^{3} x_j \tag{2}$$

where Y represents the difference between the current three-day average PM 2.5 concentration and the next three-day average PM 2.5 concentration. When Y > 0, the pollution will be reduced in the future; when Y < 0, the pollution will increase in the future. $x_i$ is the average PM 2.5 concentration on day i of the current period, and $x_j$ is the average PM 2.5 concentration on day j of the following period.
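The evaluation quantity Y described above can be computed directly from six daily averages. The sketch below assumes that each period is exactly three days long and that daily averages are already available; the names `trend_indicator` and `trend_label`, and the "unchanged" label for Y = 0, are hypothetical.

```python
from typing import Sequence


def trend_indicator(current: Sequence[float], following: Sequence[float]) -> float:
    """Y = mean of the current three days minus mean of the next three days."""
    if len(current) != 3 or len(following) != 3:
        raise ValueError("both periods must contain three daily averages")
    return sum(current) / 3 - sum(following) / 3


def trend_label(y: float) -> str:
    """Interpret the sign of Y as described in the text."""
    if y > 0:
        return "decrease"   # pollution is reduced in the next period
    if y < 0:
        return "increase"   # pollution increases in the next period
    return "unchanged"      # assumption: the text does not cover Y == 0


if __name__ == "__main__":
    y = trend_indicator([60.0, 70.0, 80.0], [40.0, 50.0, 60.0])
    print(y, trend_label(y))
```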

VGG Model
VGG is a network model proposed by the Oxford Visual Geometry Group that was adapted from the CNN model [52]. Its improvement over the earlier CNN model is that it uses several consecutive 3 × 3 convolution kernels in place of a larger convolution kernel. Stacking multiple small convolution kernels reduces the number of training parameters while preserving the same receptive field. The calculation of the receptive field in a convolutional layer involves three quantities: $r_n$, the size of the receptive field of the layer; $k_n$, the size of the convolution kernel of the layer; and $S_n$, the size of the convolution stride.
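The claim that stacked 3 × 3 kernels keep the receptive field of a larger kernel while using fewer weights can be checked numerically. The sketch below uses a commonly cited recurrence, in which each layer adds (k − 1) times the product of the strides of the preceding layers; this recurrence is an assumption and may differ in form from the equation used in the original paper. The function names and the choice of 64 channels are also illustrative.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of a stack of (kernel_size, stride) conv layers.

    Assumed recurrence: r_n = r_(n-1) + (k_n - 1) * (product of earlier strides),
    starting from r_0 = 1 for a single input pixel.
    """
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump
        jump *= stride
    return r


def conv_weights(kernel: int, channels: int, layers: int = 1) -> int:
    """Weight count (biases ignored) for square kernels with equal in/out channels."""
    return layers * kernel * kernel * channels * channels


if __name__ == "__main__":
    print(receptive_field([(3, 1), (3, 1)]), receptive_field([(5, 1)]))
    print(conv_weights(3, 64, layers=2), conv_weights(5, 64))
```

Under these assumptions, two stacked 3 × 3 layers and one 5 × 5 layer see the same input region, but the stacked version needs 18 weights per channel pair instead of 25.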
For the classification experiment on the PM 2.5 data in the form of candlestick charts, a VGG model was designed that contained six fundamental hidden layers; namely, a convolutional layer, a pooling layer, a flattened layer, a fully connected layer, and two other functional layers (i.e., the flattened layer and the dropout layer), as shown in Figure 9.
The rectified linear unit (ReLU) was used as the activation function for all of the hidden layers in the VGG model, which effectively avoids the vanishing gradient problem. The ReLU function is described with the max() function, as shown in Equation (4):

$$f(x) = \max(0, x) \tag{4}$$

The ReLU function is equivalent to a nonlinear mapping, which increases the expressive capacity of the network. Each weight, $a_j^i$, of the feature map can be calculated according to Equation (5), where $w_j^i$ represents the kernel weight of the jth feature map at layer i, which connects all the feature maps at layer i − 1, and $M_j$ represents all the feature maps connected to the jth feature map in layer i. Cross entropy is used as the cost function, where n is the number of training instances, $\hat{y}_k^{(i)}$ is the kth forecast result of the ith training instance, and $y_k^{(i)}$ is the kth true result of the ith training instance.
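The ReLU activation and a cross-entropy cost of the kind described in this section can be illustrated in plain Python. The cost below is the standard categorical form averaged over n training instances; this specific form is an assumption, since the exact expression used in the paper is not reproduced here. The function names and the `eps` clipping constant are likewise illustrative.

```python
import math
from typing import Sequence


def relu(x: float) -> float:
    """ReLU written with max(): negative inputs map to zero."""
    return max(0.0, x)


def cross_entropy(y_true: Sequence[Sequence[float]],
                  y_pred: Sequence[Sequence[float]],
                  eps: float = 1e-12) -> float:
    """Assumed categorical cross entropy averaged over the instances.

    y_true[i][k] is the kth true result of instance i (one-hot),
    y_pred[i][k] is the kth forecast result of instance i (a probability).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            # Clip p away from zero so that log() stays finite.
            total -= t * math.log(max(p, eps))
    return total / n


if __name__ == "__main__":
    print(relu(-2.0), relu(3.5))
    print(cross_entropy([[1, 0]], [[0.5, 0.5]]))
```

A forecast that splits its probability evenly between two classes gives a cost of ln 2 for that instance, and a perfect forecast gives zero.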


Evaluation Index
To achieve the optimal hyper-parameter values, the performance of the VGG model was evaluated using two metrics: overall accuracy (OA) and the Kappa index [53][54][55].
OA refers to the proportion of correctly classified samples to all samples, and its calculation equation is: where TP is a positive sample that is correctly classified by the model; FN is a positive sample that is incorrectly classified by the model; FP is a negative sample that is incorrectly classified by the model; and TN is a negative sample that is correctly classified by the model. The Kappa coefficient is a type of ratio that represents the ratio of the error reduction between classifications and a completely random classification. Its calculation equation is: where p0 is the sum of the number of samples correctly classified in each category divided by the total number of samples, which is OA. Supposing that the number of real samples in each category is a1, a2, …, ac, the predicted number of samples in each category is b1, b2, …, bc, and the total number of samples is N, then: = × + × + ⋯ + × (10) Figure 9. Architecture of the constructed VGG model.
The ReLU function is equivalent to nonlinear mapping, which can increase the expression capacity of the network. Each weight, a i j, of the feature map can be calculated according to Equation (5): where w i j represents the kernel weight of the jth feature graph at layer i, which connects all the feature graphs at layer i − 1. M j represents all the feature graphs connected by the jth feature graph in layer i. Cross entropy is used as the cost function, which is defined as: where n is the number of training instances;ŷ (i) k is the ith training and an instance of the kth forecast results; and y (i) k represents the kth true result of the ith training instance:
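The ReLU activation and the cross-entropy cost can be written out in a few lines of plain Python. This is a minimal sketch, assuming one-hot true labels, predicted class probabilities, and an average over the n training instances; the clipping constant `eps` and the example values are illustrative additions, not part of the described model.

```python
import math


def relu(x):
    """ReLU expressed with max(), as described in the text."""
    return max(0.0, x)


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross entropy over n training instances.

    y_true: list of one-hot label vectors (true results).
    y_pred: list of predicted probability vectors (forecast results).
    eps:    lower bound on probabilities to avoid log(0) (assumption).
    """
    n = len(y_true)
    total = 0.0
    for truth, forecast in zip(y_true, y_pred):
        for t_k, p_k in zip(truth, forecast):
            total += t_k * math.log(max(p_k, eps))
    return -total / n


# Two hypothetical training instances with three classes each.
labels = [[1, 0, 0], [0, 1, 0]]
forecasts = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = cross_entropy(labels, forecasts)
print(relu(-3.0), relu(2.5), loss)
```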

Evaluation Index
To achieve the optimal hyper-parameter values, the performance of the VGG model was evaluated using two metrics: overall accuracy (OA) and the Kappa index [53][54][55].
OA refers to the proportion of correctly classified samples among all samples, and it is calculated as:

$$OA = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of positive samples correctly classified by the model; FN is the number of positive samples incorrectly classified by the model; FP is the number of negative samples incorrectly classified by the model; and TN is the number of negative samples correctly classified by the model. The Kappa coefficient represents the proportional reduction in error achieved by the classification relative to a completely random classification. It is calculated as:

$$Kappa = \frac{p_0 - p_e}{1 - p_e}$$

where p_0 is the sum of the number of samples correctly classified in each category divided by the total number of samples, which is OA. Supposing that the number of real samples in each category is a_1, a_2, ..., a_c, the predicted number of samples in each category is b_1, b_2, ..., b_c, and the total number of samples is N, then:

$$p_e = \frac{a_1 \times b_1 + a_2 \times b_2 + \cdots + a_c \times b_c}{N \times N} \tag{10}$$
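Both metrics follow directly from a confusion matrix, as the sketch below shows. It assumes rows index the real category and columns the predicted category; the function name and the two-class example counts are hypothetical and unrelated to the experimental results.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (OA, Kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of real category i predicted as j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    # p0: correctly classified samples over all samples (this is OA).
    p0 = sum(confusion[i][i] for i in range(n_classes)) / total

    # a: real samples per category; b: predicted samples per category.
    a = [sum(confusion[i]) for i in range(n_classes)]
    b = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]

    # pe: agreement expected from a completely random classification.
    pe = sum(a_k * b_k for a_k, b_k in zip(a, b)) / (total * total)

    # Note: undefined (division by zero) when pe == 1.
    kappa = (p0 - pe) / (1 - pe)
    return p0, kappa


example = [[45, 5],
           [10, 40]]
oa, kappa = overall_accuracy_and_kappa(example)
print(oa, kappa)
```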

Hyper Parameter Settings
To evaluate the performance of the VGG model, some hyperparameters need to be set. The primary parameters that need to be set are the default dimension of the VGG model (m), the input size (s_i), the number of convolution kernels (n_c), the size of the convolution kernel (s_c), the size of the pooling window (s_p), and the number of dense units (n_d). The hyperparameter setting here adopts the hyperparameter setting of the CNN model in Suoyan Pan's research [56]. Table 3 shows the hyper-parameters in the VGG model.

Table 3. Hyper-parameters involved in the VGG model.

| Hyper-Parameters | Initial Values |
|---|---|
| Default dimension of the VGG model (m) | 2 |
| Input size (s_i) | 9 |
| Number of convolution kernels (n_c) | 256 |
| Size of the convolution kernel (s_c) | 3 |
| Pooling window size (s_p) | 2 |
| Number of dense units (n_d) | 1024 |

Candlestick Chart Combination
After the unsupervised classification of 2188 groups of PM2.5 data from the Guilin Monitoring Station, in the form of candlestick charts, 16 candlestick chart combinations were obtained. Using Equation (2) as the evaluation index, the accuracy of the future change trend prediction reached 99.68%, which was verified using the PM2.5 data of the site from 2013 to 2018. This shows that the future change trend of PM2.5 can be accurately obtained using these 16 candlestick chart combinations, as shown in Tables 4 and 5.

Tables 4 and 5 list the 16 candlestick chart combinations. Among them, eight combinations predict that the future PM2.5 concentration will increase, and eight combinations predict that it will decrease. The tables also list the correspondence between the 16 combinations that cause changes in the PM2.5 concentration in the following days, the parameter changes in the Gaussian equation, and the proportion of each category in the total number of samples. In Tables 4 and 5, Y indicates that the item has changed, and N indicates that the item has not changed. As Guilin is a low-industry city, its pollution comes primarily from external pollution sources. Hence, the 16 changes listed in the tables will not appear when the source intensity, Q, wind speed, v, and wind direction, d, do not change.

Table 5. Eight categories of the candlestick chart combinations for PM2.5 declines (combinations 9 to 16).

Tables 4 and 5 reflect the 16 kinds of PM2.5 change characteristics. Three main variables affect the change in PM2.5 concentration, namely source intensity (Q), wind speed (v), and wind direction (d). The source intensity (Q) represents the total PM2.5 pollution, which has a direct impact on the change in PM2.5 concentration. When Q increases, the PM2.5 concentration will increase significantly in the future, which leads to combinations 1, 4, 7, and 8. When Q decreases, the PM2.5 concentration will decrease significantly in the future, which leads to combinations 9, 12, 13, and 16. The wind speed (v) represents the pollution rate of PM2.5, which determines the rate of change of the PM2.5 concentration. When v increases, there will be a significant increase in PM2.5 concentration in the future, which leads to combinations 2 and 3. When v decreases, the PM2.5 concentration will decrease significantly in the future, which leads to combinations 10 and 11. The wind direction (d) determines the change state of PM2.5. When d changes, it directly leads to a color change in the PM2.5 candlestick chart, thus affecting the future PM2.5 concentration change. In combinations 5, 6, 14, and 15, the change in d determines the future trend of the PM2.5 concentration.
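The correspondence between combinations and their driving variables can be summarized as a small lookup table. This is only an illustrative sketch: the combination numbers and drivers are taken from the paragraph above, the assignment of combinations 1 to 8 to increases and 9 to 16 to declines follows the numbering of Tables 4 and 5, and all identifiers are hypothetical.

```python
# Driving variable and its change, grouped as stated in the text.
_GROUPS = {
    ("Q", "increases"): (1, 4, 7, 8),
    ("Q", "decreases"): (9, 12, 13, 16),
    ("v", "increases"): (2, 3),
    ("v", "decreases"): (10, 11),
    ("d", "changes"): (5, 6, 14, 15),
}

# Invert the grouping: combination number -> (variable, change).
DRIVER_BY_COMBINATION = {
    combo: driver for driver, combos in _GROUPS.items() for combo in combos
}


def expected_trend(combo):
    """Combinations 1-8 indicate an increase, 9-16 a decline."""
    if combo not in DRIVER_BY_COMBINATION:
        raise ValueError(f"unknown combination: {combo}")
    return "increase" if combo <= 8 else "decline"


def explain(combo):
    variable, change = DRIVER_BY_COMBINATION[combo]
    return (f"combination {combo}: {variable} {change}, "
            f"PM2.5 expected to {expected_trend(combo)}")


print(explain(14))
```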

Hu et al. proposed 103 candlestick chart combinations in 2019, of which 29 were three-day candlestick chart combinations [57]. A comparison showed that all 16 candlestick chart combinations obtained by the unsupervised classification matched combinations among these 29 three-day candlestick chart combinations. Only 16 candlestick chart combinations appeared at the Guilin Monitoring Station because PM2.5 pollution in Guilin comes primarily from external pollution sources, while the unmatched types primarily occur under self-pollution.

Analysis of the VGG Model Classification Results
All the deep learning models in this research were trained with TensorFlow, and the traditional machine learning models were implemented with the scikit-learn library. RMSprop was used as the optimizer.
The Guilin Monitoring Station was selected as the target site, and the PM2.5 data in the form of candlestick charts for the six years from 2013 to 2018 were used as the basic dataset. The four years of candlestick chart PM2.5 data from 2013 to 2016 were used as the training set, and the two years of data from 2017 and 2018 were used as the test set. After training and convergence, the optimal values of the six hyperparameters of the VGG classification model were obtained, namely m = 2, s_i = 9, n_c = 256, s_c = 3, s_p = 2, and n_d = 1024. The number of samples in each category and the OA value after classification are shown in Table 5.
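The chronological split described above can be sketched as follows. The sample layout (a pair of window start date and category label) and the rule of assigning a three-day window to the year of its start date are assumptions made for illustration; the example samples are made up.

```python
from datetime import date

TRAIN_YEARS = {2013, 2014, 2015, 2016}
TEST_YEARS = {2017, 2018}


def split_by_year(samples):
    """Split (window_start_date, label) pairs into training and test sets.

    A window is assigned to the year of its start date (assumption);
    samples outside 2013-2018 are ignored.
    """
    train, test = [], []
    for start, label in samples:
        if start.year in TRAIN_YEARS:
            train.append((start, label))
        elif start.year in TEST_YEARS:
            test.append((start, label))
    return train, test


# Hypothetical samples: start date of the window and its combination number.
samples = [
    (date(2013, 1, 1), 4),
    (date(2016, 12, 29), 11),
    (date(2017, 1, 1), 2),
    (date(2018, 12, 29), 16),
    (date(2019, 1, 1), 7),
]
train_set, test_set = split_by_year(samples)
print(len(train_set), len(test_set))
```

Splitting by calendar year rather than at random keeps all test windows strictly later than the training windows, which matches the forecasting setting.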

Analysis of the VGG Model Classification Results
All the deep learning models in this research were trained on TensorFlow, and the traditional machine learning models were implemented through the scikit-learn library, and RMSprop was used as the optimizer.
The Guilin Monitoring Station was selected as the target site, and the PM2.5 data in the form of a candlestick chart for the six years from 2013 to 2018 was used as the basic dataset. The four-year candlestick chart PM2.5 data from 2013 to 2016 was used as the training set, and the two-year data from 2017 and 2018 was used as the test set. After training and convergence, the optimal model weights of the six hyperparameters of the VGG classification model were obtained, namely m = 2, si = 9, nc = 256, sc = 3, sp = 2, and nd = 1024. The number of each category and OA value after classification are shown in Table  5.
It can be seen from  Candlestick chart  Tables 3 and 4 reflect the 16 kinds of PM2.5 characteristics of change. There are three main variables affecting the change of PM2.5 concentration, namely source intensity (Q), wind speed (v), and wind direction (d). Among them, the source intensity (Q) represents the total pollution of PM2.5, which will have a direct impact on the change of PM2.5 concentration. When the Q increases, the PM2.5 concentration will increase significantly in the future, which will lead to the occurrence of combinations 1, 4, 7, and 8. When the Q decreases, the PM2.5 concentration will decrease significantly in the future, which will lead to the occurrence of combinations 9, 12, 13, and 16. The wind speed (v) represents the pollution rate of PM2.5, which determines the change rate of PM2.5 concentration. When the v increases, there will be a significant increase in PM2.5 concentration in the future, which will lead to combinations 2 and 3. When the v decreases, the PM2.5 concentration will decrease significantly in the future, which will lead to the occurrence of combinations 10 and 11. The wind direction (d) determines the change state of PM2.5. When the d changes, it will directly lead to the color change of PM2.5 candlestick diagram, thus affecting the future PM2.5 concentration change. Combinations 5, 6, 14, and 15 are due to the change of d to determine the future trend of PM2.5 concentration.
Hu et al. proposed 103 candlestick chart combinations in 2019, of which there were 29 candlestick chart combinations for three days [57]. By examining the comparison, it was found that in the 16 candlestick chart combinations obtained by the unsupervised classification, all of them matched these 29 three-day candlestick chart combinations. There were only 16 candlestick chart combinations at the Guilin Monitoring Station because the PM2.5 pollution types in Guilin are primarily from external pollution sources, while the unmatched types primarily occur under self-pollution.

Analysis of the VGG Model Classification Results
All the deep learning models in this research were trained on TensorFlow, and the traditional machine learning models were implemented through the scikit-learn library, and RMSprop was used as the optimizer.
The Guilin Monitoring Station was selected as the target site, and the PM2.5 data in the form of a candlestick chart for the six years from 2013 to 2018 was used as the basic dataset. The four-year candlestick chart PM2.5 data from 2013 to 2016 was used as the training set, and the two-year data from 2017 and 2018 was used as the test set. After training and convergence, the optimal model weights of the six hyperparameters of the VGG classification model were obtained, namely m = 2, si = 9, nc = 256, sc = 3, sp = 2, and nd = 1024. The number of each category and OA value after classification are shown in Table  5.
It can be seen from  Candlestick chart  Tables 3 and 4 reflect the 16 kinds of PM2.5 characteristics of change. There are three main variables affecting the change of PM2.5 concentration, namely source intensity (Q), wind speed (v), and wind direction (d). Among them, the source intensity (Q) represents the total pollution of PM2.5, which will have a direct impact on the change of PM2.5 concentration. When the Q increases, the PM2.5 concentration will increase significantly in the future, which will lead to the occurrence of combinations 1, 4, 7, and 8. When the Q decreases, the PM2.5 concentration will decrease significantly in the future, which will lead to the occurrence of combinations 9, 12, 13, and 16. The wind speed (v) represents the pollution rate of PM2.5, which determines the change rate of PM2.5 concentration. When the v increases, there will be a significant increase in PM2.5 concentration in the future, which will lead to combinations 2 and 3. When the v decreases, the PM2.5 concentration will decrease significantly in the future, which will lead to the occurrence of combinations 10 and 11. The wind direction (d) determines the change state of PM2.5. When the d changes, it will directly lead to the color change of PM2.5 candlestick diagram, thus affecting the future PM2.5 concentration change. Combinations 5, 6, 14, and 15 are due to the change of d to determine the future trend of PM2.5 concentration.
Hu et al. proposed 103 candlestick chart combinations in 2019, of which there were 29 candlestick chart combinations for three days [57]. By examining the comparison, it was found that in the 16 candlestick chart combinations obtained by the unsupervised classification, all of them matched these 29 three-day candlestick chart combinations. There were only 16 candlestick chart combinations at the Guilin Monitoring Station because the PM2.5 pollution types in Guilin are primarily from external pollution sources, while the unmatched types primarily occur under self-pollution.

Analysis of the VGG Model Classification Results
All the deep learning models in this research were trained on TensorFlow, and the traditional machine learning models were implemented through the scikit-learn library, and RMSprop was used as the optimizer.
The Guilin Monitoring Station was selected as the target site, and the PM2.5 data in the form of a candlestick chart for the six years from 2013 to 2018 was used as the basic dataset. The four-year candlestick chart PM2.5 data from 2013 to 2016 was used as the training set, and the two-year data from 2017 and 2018 was used as the test set. After training and convergence, the optimal model weights of the six hyperparameters of the VGG classification model were obtained, namely m = 2, si = 9, nc = 256, sc = 3, sp = 2, and nd = 1024. The number of each category and OA value after classification are shown in Table  5.
It can be seen from  Candlestick chart  Tables 3 and 4 reflect the 16 kinds of PM2.5 characteristics of change. There are three main variables affecting the change of PM2.5 concentration, namely source intensity (Q), wind speed (v), and wind direction (d). Among them, the source intensity (Q) represents the total pollution of PM2.5, which will have a direct impact on the change of PM2.5 concentration. When the Q increases, the PM2.5 concentration will increase significantly in the future, which will lead to the occurrence of combinations 1, 4, 7, and 8. When the Q decreases, the PM2.5 concentration will decrease significantly in the future, which will lead to the occurrence of combinations 9, 12, 13, and 16. The wind speed (v) represents the pollution rate of PM2.5, which determines the change rate of PM2.5 concentration. When the v increases, there will be a significant increase in PM2.5 concentration in the future, which will lead to combinations 2 and 3. When the v decreases, the PM2.5 concentration will decrease significantly in the future, which will lead to the occurrence of combinations 10 and 11. The wind direction (d) determines the change state of PM2.5. When the d changes, it will directly lead to the color change of PM2.5 candlestick diagram, thus affecting the future PM2.5 concentration change. Combinations 5, 6, 14, and 15 are due to the change of d to determine the future trend of PM2.5 concentration.
Hu et al. proposed 103 candlestick chart combinations in 2019, of which there were 29 candlestick chart combinations for three days [57]. By examining the comparison, it was found that in the 16 candlestick chart combinations obtained by the unsupervised classification, all of them matched these 29 three-day candlestick chart combinations. There were only 16 candlestick chart combinations at the Guilin Monitoring Station because the PM2.5 pollution types in Guilin are primarily from external pollution sources, while the unmatched types primarily occur under self-pollution.

Analysis of the VGG Model Classification Results
All the deep learning models in this research were trained on TensorFlow, and the traditional machine learning models were implemented through the scikit-learn library, and RMSprop was used as the optimizer.
The Guilin Monitoring Station was selected as the target site, and the PM2.5 data in the form of a candlestick chart for the six years from 2013 to 2018 was used as the basic dataset. The four-year candlestick chart PM2.5 data from 2013 to 2016 was used as the training set, and the two-year data from 2017 and 2018 was used as the test set. After training and convergence, the optimal model weights of the six hyperparameters of the VGG classification model were obtained, namely m = 2, si = 9, nc = 256, sc = 3, sp = 2, and nd = 1024. The number of each category and OA value after classification are shown in Table  5.
Tables 4 and 5 list the 16 candlestick chart combinations. Eight combinations predict that the future PM2.5 concentration will increase, and eight predict that it will decrease. The tables also list, for each combination, the corresponding parameter changes in the Gaussian equation that cause the PM2.5 concentration to change in the following days, and the proportion of each category in the total number of samples. In Tables 4 and 5, Y indicates that the item has changed, and N indicates that it has not. As Guilin is a low-industry city, its pollution comes primarily from external sources. Hence, none of these 16 changes appears when the source strength (Q), wind speed (v), and wind direction (d) all remain unchanged.
The 16 candlestick chart combinations shown in Tables 4 and 5 reflect 16 kinds of PM2.5 change characteristics. Three main variables affect the change in PM2.5 concentration: source intensity (Q), wind speed (v), and wind direction (d). The source intensity (Q) represents the total amount of PM2.5 pollution and directly affects the change in PM2.5 concentration. When Q increases, the PM2.5 concentration will subsequently increase significantly, which leads to combinations 1, 4, 7, and 8. When Q decreases, the PM2.5 concentration will subsequently decrease significantly, which leads to combinations 9, 12, 13, and 16. The wind speed (v) represents the pollution rate of PM2.5 and determines the rate of change of the PM2.5 concentration. When v increases, the PM2.5 concentration will subsequently increase significantly, which leads to combinations 2 and 3. When v decreases, the PM2.5 concentration will subsequently decrease significantly, which leads to combinations 10 and 11. The wind direction (d) determines the change state of PM2.5. A change in d directly changes the color of the PM2.5 candlestick chart and thus affects the future change in PM2.5 concentration. In combinations 5, 6, 14, and 15, the change in d determines the future trend of the PM2.5 concentration.
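The correspondence between the three variables and the 16 combinations can be written as a small lookup table. The following is a minimal sketch, not the authors' implementation: the names `DRIVER_TO_COMBINATIONS`, `build_combination_index`, and `describe_combination` are hypothetical, and the trend rule (combinations 1-8 increase, 9-16 decrease) is assumed from the mark convention used in the model comparison.

```python
# Variable change -> candlestick chart combinations, transcribed from the text.
DRIVER_TO_COMBINATIONS = {
    ("Q", "increase"): [1, 4, 7, 8],
    ("Q", "decrease"): [9, 12, 13, 16],
    ("v", "increase"): [2, 3],
    ("v", "decrease"): [10, 11],
    ("d", "change"): [5, 6, 14, 15],
}


def build_combination_index(driver_to_combinations):
    """Invert the table so each combination maps to its driving variable."""
    index = {}
    for driver, combinations in driver_to_combinations.items():
        for combination in combinations:
            if combination in index:
                raise ValueError(f"combination {combination} listed twice")
            index[combination] = driver
    return index


COMBINATION_TO_DRIVER = build_combination_index(DRIVER_TO_COMBINATIONS)


def describe_combination(combination):
    """Return (variable, change, trend) for a combination number 1-16."""
    variable, change = COMBINATION_TO_DRIVER[combination]
    # Assumed convention: 1-8 predict an increase, 9-16 a decrease.
    trend = "increase" if combination <= 8 else "decrease"
    return variable, change, trend


for number in (1, 5, 10, 14):
    print(number, describe_combination(number))
```

Inverting the table also serves as a consistency check: every combination from 1 to 16 is assigned to exactly one variable change.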
Hu et al. proposed 103 candlestick chart combinations in 2019, 29 of which were three-day combinations [57]. A comparison showed that all 16 candlestick chart combinations obtained by the unsupervised classification matched combinations among these 29. Only 16 combinations occurred at the Guilin Monitoring Station because PM2.5 pollution in Guilin comes primarily from external sources, whereas the unmatched types occur primarily under self-pollution.

Analysis of the VGG Model Classification Results
All the deep learning models in this research were trained with TensorFlow, and the traditional machine learning models were implemented with the scikit-learn library. RMSprop was used as the optimizer.
The Guilin Monitoring Station was selected as the target site, and its PM2.5 data in candlestick chart form for the six years from 2013 to 2018 were used as the basic dataset. The four years of data from 2013 to 2016 were used as the training set, and the two years of data from 2017 and 2018 were used as the test set. After training and convergence, the optimal values of the six hyperparameters of the VGG classification model were obtained, namely m = 2, s_i = 9, n_c = 256, s_c = 3, s_p = 2, and n_d = 1024. The number of samples in each category and the OA value after classification are shown in Table 6.
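The year-based split described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `split_by_year` is hypothetical, and each sample is represented by a placeholder pair of a calendar day and a dummy item rather than an actual candlestick chart image.

```python
from datetime import date, timedelta


def split_by_year(samples, train_years, test_years):
    """Split (day, item) pairs into training and test lists by calendar year."""
    train, test = [], []
    for day, item in samples:
        if day.year in train_years:
            train.append((day, item))
        elif day.year in test_years:
            test.append((day, item))
    return train, test


# Placeholder samples: one dummy item per calendar day from 2013 to 2018.
start, end = date(2013, 1, 1), date(2018, 12, 31)
samples = [(start + timedelta(days=i), None) for i in range((end - start).days + 1)]

# 2013-2016 for training, 2017-2018 for testing, as in the text.
train, test = split_by_year(samples, range(2013, 2017), range(2017, 2019))
print(len(train), len(test))
```

Splitting by year keeps the test period strictly later than the training period, so the model is evaluated on data it could not have seen during training.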
It can be seen from Table 6 that the total number of samples was 2188, and the average accuracy of each category reached 96.19%. The number of samples classified into each of the 16 categories was close to the number of samples in each category in Tables 4 and 5. This supports the definition of the 16 candlestick chart combinations and further indicates the feasibility of using the candlestick chart to reflect the physical diffusion characteristics of PM2.5. The confusion matrix, also known as the error matrix, is a standard format for accuracy evaluation that reflects the accuracy of image classification. The VGG model classification results are displayed as a confusion matrix in Figure 10, where the magnitude of each value is represented by the size and color depth of its square. The Kappa coefficient of the VGG classification experiment, calculated from the confusion matrix, was 0.960. This Kappa value indicates that the VGG model classifies with very high accuracy when the candlestick chart features are used to reflect the physical diffusion characteristics of PM2.5.
Figure 10. VGG classification results displayed in the confusion matrix.
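Both the overall accuracy and the Kappa coefficient can be computed directly from a confusion matrix. The sketch below uses the standard definitions (observed agreement on the diagonal, and chance agreement from the row and column totals); the function names are hypothetical, and the 3-class matrix is a made-up toy example, not the 16-class matrix of the VGG experiment.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's Kappa: agreement beyond chance, from a square confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    column_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Chance agreement: product of matching row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Toy example (rows: actual class, columns: predicted class).
toy_matrix = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
print(round(overall_accuracy(toy_matrix), 4), round(kappa_coefficient(toy_matrix), 4))
```

Because Kappa subtracts the agreement expected by chance, it is lower than the overall accuracy for the same matrix, which is why both values are reported side by side.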

Model Comparison Analysis
To verify the classification performance of the VGG model, it was compared with three models, SVM, LeNet, and AlexNet, using the OA, Kappa value, and training time as quantitative results, as shown in Table 7. It can be seen from Table 7 that the OA, Kappa value, and training time of the VGG model were the best of all the experimental models. The VGG model had the least computational burden because it contained only six fundamental function layers, rather than deeper and repetitive functional layers. The VGG model with the best hyperparameters had the highest classification accuracy, with the OA and Kappa values improved by approximately 0.56-3.41% and 0.01-0.044, respectively.
Figure 11 shows the candlestick charts converted from the PM2.5 data for the first and fourth quarters of 2018. Taking the PM2.5 data from the first two weeks of January 2018 as an example, the classification results of the four models are displayed graphically in Figure 12. Marks 1-8 in Figure 12 refer to categories in which the future PM2.5 concentration increases, and marks 9-16 refer to categories in which it decreases. With reference to Tables 4 and 5, the classification results of the SVM, LeNet, AlexNet, and VGG models on this excerpt of the data were judged against the actual data. The accuracy of the VGG classification results reached 100%, as shown in Figure 12a; that is, the VGG model correctly classified all of the PM2.5 data in this excerpt. However, the other three models had classification errors, and the errors all appeared between similar categories. The 8th category was incorrectly classified into the 12th category by the SVM model, as shown in Figure 12b. The 10th category was incorrectly classified into the 4th category by the LeNet model, as shown in Figure 12c. The 13th category was incorrectly classified into the 3rd category by the AlexNet model, as shown in Figure 12d. This was mainly because (1) the VGG model primarily contained fundamental function layers, which guaranteed the classification accuracy when using the PM2.5 data in the form of a candlestick chart; and (2) the VGG model was not encumbered by a large number of implementation layers, which greatly shortened the model training time and improved the PM2.5 data classification efficiency.
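The mark convention above (1-8 for an increase, 9-16 for a decrease) can be used to check what each reported misclassification means for the predicted trend. The following is a minimal sketch: the names `trend_of` and `reported_errors` are hypothetical, and the three pairs are simply the errors quoted in the text for the SVM, LeNet, and AlexNet models.

```python
def trend_of(category):
    """Map a category mark (1-16) to the predicted PM2.5 trend."""
    if not 1 <= category <= 16:
        raise ValueError(f"category must be in 1-16, got {category}")
    return "increase" if category <= 8 else "decrease"


# (model, actual category, predicted category) as quoted in the text.
reported_errors = [
    ("SVM", 8, 12),
    ("LeNet", 10, 4),
    ("AlexNet", 13, 3),
]

for model, actual, predicted in reported_errors:
    actual_trend, predicted_trend = trend_of(actual), trend_of(predicted)
    status = "trend flipped" if actual_trend != predicted_trend else "trend kept"
    print(f"{model}: {actual} ({actual_trend}) -> {predicted} ({predicted_trend}), {status}")
```

Under this convention, each of the three reported errors crosses the boundary between marks 8 and 9, so a category error of this kind also changes the predicted direction of the PM2.5 trend.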

Conclusions and Prospects
Current studies of PM2.5 transmission simulation do not reflect the physical principle of PM2.5 transmission, because the machine learning models and hybrid models they use are black-box models. Such models are established based on the relationship between input and output. Although this reflects a general direct causal relationship between related factors, it cannot describe the