China’s Public Firms’ Attitudes towards Environmental Protection Based on Sentiment Analysis and Random Forest Models

Abstract: In this article, we investigated changes in public firms' attitudes towards environmental protection in China from 2018 to 2021. We crawled the firm–investor Q&A records on the website of East Money, extracted the carbon- and environment-related corpus, and then applied the sentiment analysis method of NLP (natural language processing) to calculate the sentiment weight of each firm-level record and thereby estimate attitudes towards carbon reduction before and after key events. We found that there were significant changes in firms' attitudes towards carbon reduction and environmental protection after the COVID-19 pandemic and the implementation of environment-related policies. We also found that attitudes differed across industries. In addition, we built several models to examine the relationship between a firm's carbon reduction attitude and its financial performance. We found that: (1) a goal with consequent specific policies can raise the positive attitudes of firms toward carbon reduction topics; (2) firms' attitudes toward ecological topics differ from industry to industry, which means that needs and situations in the trend of carbon reduction differ by industry; (3) COVID-19 influenced firms' attitudes toward carbon reduction and environmental protection, recalling the classic dilemma or trilemma of economic growth, carbon reduction, and energy consumption or, perhaps, epidemic control today; (4) stock performance also influenced the attitude toward environmental protection.


Introduction
From 2018 to 2021, China experienced major events, such as the COVID-19 pandemic, economic transformation, a trade war, and environmental issues. Alongside these events, China has proposed the carbon reduction targets of "carbon neutrality" and a "carbon peak". Under these circumstances, we aim to explore how the attitudes of public firms change dynamically. The attitudes of firms toward energy conservation and emission reduction are affected by many factors. According to past field research, some Chinese firms believe that emission reduction has restricted the development of enterprises, while others believe that it is beneficial in the long term; the attitude is influenced by the industry, the technology for and cost of emission reduction, the size of the firm, and other firm attributes [1]. Especially under the COVID-19 pandemic, economic policy uncertainty (EPU) has risen, and emission reduction behavior is also affected by EPU [2]. When policy uncertainty increases, manufacturing companies tend to use cheap and highly polluting fossil energy [3]. At present, research on China's emission reduction issues mostly focuses on regions [4], several typical industries [3], the relationship among energy consumption, emissions, and the economy [5], and the trade-off between emissions and economic development [6]. Recent research showed that, with these policies, China can achieve its carbon intensity target by 2030, but with a negative impact on economic growth [6]; in addition, energy consumption and economic growth are mutually important influencing factors [4], leading to a trilemma among energy consumption, carbon emissions, and economic growth [5]. However, although short-term effects exist, a positive correlation between economic growth and carbon reduction was observed in the long term in BRICS and OECD economies [5].
There is not much research work on the firm level, and the existing research focuses more on emission behavior rather than attitude. As for the literature on the attitudes of firms toward emissions, Xing Lu's work [1] is important, reflecting 120 firms' attitudes directly through surveys. Firm-level research also showed that in the long term, firms prefer optimizing energy consumption and investing in green technologies, especially non-state-owned firms and firms with high external financing dependence [7]. In response to policies with uncertainty on carbon emission intensity, manufacturing firms prefer to use cheap and dirty fossil fuels [6].
In this article, we studied the dynamic changes in Chinese public firms' attitudes toward environmental protection from 2018 to 2021 and explored the factors influencing these attitudes, in order to verify how environmental protection policies, the COVID-19 pandemic, the industries of companies, and the stock performance of companies affect them.
Our contribution mainly lies in the following: (1) Starting from the Q&As of listed firms with investor organizations, we constructed a collection of text data of Chinese firms' comments about carbon reduction; (2) we applied sentiment analysis (an NLP method) to estimate the firms' attitudes towards carbon reduction, and with the estimated results, we segmented the time span into three periods with two key time points (when the government's goal was set and when consequent policies were released), leading to the conclusion that, on their own, goals cannot raise the positive attitudes of firms, but goals with consequent policies can; (3) as applying NLP methods to estimate firms' attitudes towards carbon reduction is a complicated and laborious process that involves collecting text data, text mining and cleanup, and running the NLP methods, we provided a simpler route to firms' attitudes toward carbon reduction through financial and industry data, which were modeled by random forests; (4) we explored the industry factor and found that the attitude score differed from industry to industry; (5) we investigated how COVID-19 influenced firms' attitudes toward carbon reduction, finding that the attitudes did not change significantly before and after COVID-19, but that a more positive attitude could be observed once we controlled for the financial data of firms.

Workflow
To estimate firms' attitudes towards environmental protection topics, we collected over 304,000 records of investor Q&A texts with their timestamps from the website of East Money [8], and then extracted the texts relevant to environmental protection, including those about carbon reduction, to calculate the attitude weight score by using sentiment analysis. Then, we analyzed the attitude weight score by industry, period, and other variables. The period is the time category of the Q&A record: p1 refers to records from before the "Double Carbon" goal was set, p3 refers to records from after the goal was incorporated into the government's work report, and p2 is the time between p1 and p3. We then split p1 into p0 and p1 according to the time of the COVID-19 outbreak in China; when p0 is included, p1 refers to the period between the Wuhan shutdown and when the "Double Carbon" goal was proposed.
The main steps are shown in Figure 1. We first cleaned up the text data to preserve only the text that was relevant to carbon reduction. Then, we used sentiment analysis to score the sentiment in each text record, which reflected the attitudes of firms in each investor Q&A session. With the sentiment score, we analyzed how firms' attitudes varied by period (segmented by COVID-19 and carbon policies) and by industry, using the Wilcoxon test to verify the significance of group differences. In the next step, we combined the estimated sentiment score with other stock data collected from the Choice dataset from East Money and built several models to explore the relationship between the sentiment weight and these indicators. Finally, we obtained predictive results and the RMSE indicator from the random forest models to estimate the performance of the models.
Figure 1. Main steps of the workflow. Recoverable box text: raw data (Table 4, Table 5, Figure 2, Figure 3); extract relevant corpus (keep only environment-related text); sentiment analysis to get the sentiment score (Table 6, Table 7, Figure 4, Figure 5); analytics on the sentiment score (how the score changes in different periods; group analytics and significance verification; Table 8, Table 9, Figure 6, Figure 7); combine the sentiment score with financial data of the listed firms; linear models (further investigate factors of the attitude score; Appendix A, Tables A1 and A2); random forest models (another way to predict firms' ecology sentiment score, using linear models as the baseline; Table 10).
In the processing of the descriptive statistics, we found that, after the goals were proposed by President Xi and incorporated into government work reports, the frequency of words related to the environment mentioned in investor Q&As increased. Through the sentiment analysis, we obtained the sentiment score of the firms in each Q&A session. Then, according to the results of the scores, we verified that there was a significant increase in positive attitudes toward the environment after the "Double Carbon" goal was incorporated into the government report, but not after the goal was set, and there were significant differences between different industries. According to the linear models, we found significant influences from COVID-19, stock values (and floats of stock values), and the industry. Finally, as the NLP method involves a heavy workload in data collection and cleaning, we built models to predict the attitude scores from numerical financial data, which were much easier to collect. The RMSE (of the predicted result and the real data) of each model was calculated to compare the performance of the models and return the best random forest model. This part is summarized in Table 2.

Table 2. Methods and findings.

| Methods | Findings |
| --- | --- |
| Descriptive statistics | After the goals were proposed by President Xi and incorporated into government work reports, the frequency of words related to the environment mentioned in investor Q&As increased. |
| Sentiment analysis (one of the NLP methods) | We obtained the sentiment score for carbon reduction. |
| Analytics on the sentiment score: group analytics (Wilcoxon test) | (1) There was a significant increase in positive attitudes toward the environment after the "Double Carbon" goal was incorporated into government reports, but not after the goal was set. (2) There were significant differences between different industries. |
| model1: lm1 | (1) COVID-19 showed a significant influence on the sentiment score. (2) The stock value, float of the stock value, and industry also influenced the sentiment score. |
| model2: lm2 | The sentiment score was significantly influenced by whether a firm was in the technology industry. |
| model3: rf1 | A non-NLP way to predict firms' attitudes was provided. |
| model4: rf2 | The four models were applied for prediction and assessed by using the RMSE (a standard machine learning procedure). Model3 (rf1) had the best RMSE, which means the lowest error in prediction. |

NLP
For further verification and inspection, we applied the sentiment analysis method of NLP (natural language processing) to calculate the sentiment weight of each record in the Q&A text data to estimate the changes before and after the "Double Carbon" goal was set. In recent research, NLP methods have been extensively used to explore the non-numerical aspects of organizations, such as corporate culture, attitude, CSR, the personality traits of CEOs, etc. In Kai Li's work [9], they used Word2Vec to build dictionaries for corporate culture. In Shavin Malhotra's work [10], linguistic techniques were also applied to attain a CEO's traits from a spoken text. In our work, we used sentiment analysis to estimate a corporation's attitude towards the carbon emission goals and calculated a numerical result to represent the extent of firms' negative and positive attitudes.

Sentiment Analysis
Sentiment analysis is an NLP method that was first contributed by Turney [11] and Pang [12], who estimated binary attitudes in comments toward movies and commodities. For Chinese text, the analysis first requires word segmentation. Word segmentation methods can mainly be categorized into four groups: dictionary-based (keyword) word segmentation, word association, statistics-based word segmentation, and understanding-based word segmentation methods [13]. Dictionary-based word segmentation methods match text data with the words in a constructed dictionary to obtain a word segmentation result [14]. Statistical methods, such as the support vector machine (SVM), the N-gram model, the hidden Markov model (HMM), and so on, usually use training data to build models [13]. The most common methods of word segmentation are usually combinations of dictionary-based and statistical models. In addition, with the development of deep learning, more complex methods that are closer to the human brain's understanding have become available, such as BERT (a bidirectional neural network model). Such models are usually more accurate, but they have more complex algorithms and are slower.
As for our research, we aimed to estimate the emotional polarity of text data with more of a focus on sentimental words, rather than other words. In addition, the terms in the investor Q&As were mostly commonly used, standard, modern words; thus, we chose an agile way to detect the words in sentiment dictionaries. A sentiment dictionary maps words and the human emotions that they stand for, and it stores the emotions as computational values, such as numerical or True/False values. For example, we can use a positive number to represent a positive emotion and a negative one in a similar way. The absolute value can reflect the extent of an emotion. An example of the simplest dictionary is given in Table 3. The most prevailing Chinese sentiment dictionaries include Tsinghua Li Jun's positive and negative sentiment dictionary [15], the Chinese Academy of Sciences' Chinese sentiment degree dictionary, Dalian University of Technology's sentiment dictionary, and Tan Songbo's positive and negative sentiment dictionary based on a hotel evaluation corpus. We compiled the emotional dictionary of the Information Retrieval Laboratory of the Dalian University of Technology [16] and Li Jun's positive and negative sentiment dictionary from Tsinghua University and used the combination in the word segmentation method after deduplication in the dictionaries.
For the word segmentation results, we removed the stop words before calculating the sentiment score. Commonly used stop vocabularies include the stop vocabulary of Harbin Institute of Technology, the stop vocabulary of Baidu, and the stop vocabulary of the Machine Intelligence Laboratory of Sichuan University. We integrated the Baidu stop word list and the stop word database of the Machine Intelligence Laboratory of Sichuan University [17] and removed the stop words from the segmentation results based on the integrated stop word list.
We applied the combined dictionaries to our text records. The process can be briefly interpreted as estimating the positive level of the corpus according to the words in the text. For example, if a text record was "Thanks, the company actively pays attention to carbon-emission-related policies and actively participates in it. The current financial report does not have this business", we obtained "thanks | company | actively | pays attention to | carbon emission | related policies | actively | participate", which are 8 phrases after the word segmentation and removal of stop words (which we did in the former steps). The sentiment dictionary provides a mapping between words and scores.
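The dictionary-based scoring described above can be sketched as follows. This is a minimal Python illustration rather than the authors' R implementation: the toy dictionary, its scores, the stop-word list, and the choice to sum the matched word scores are all assumptions made for the example, and the input is assumed to be already segmented into phrases.

```python
# Hypothetical toy sentiment dictionary: positive numbers mark a positive
# emotion, negative numbers a negative one (all values are illustrative).
SENTIMENT_DICT = {
    "thanks": 1.0,
    "actively": 1.0,
    "pays attention to": 0.5,
    "restricted": -1.0,
}

# Hypothetical stop-word list standing in for the integrated lists.
STOP_WORDS = {"the", "and", "in", "it", "to"}


def sentiment_weight(tokens):
    """Sum the dictionary scores of the tokens left after stop-word removal.

    Summation is an assumed aggregation rule; tokens that are missing from
    the dictionary contribute 0.
    """
    kept = [t for t in tokens if t not in STOP_WORDS]
    return sum(SENTIMENT_DICT.get(t, 0.0) for t in kept)


# Segmented phrases from the example record, with two stop words left in
# to show that they are filtered out before scoring.
tokens = ["thanks", "the", "company", "actively", "pays attention to",
          "carbon emission", "related policies", "and", "actively", "participate"]
weight = sentiment_weight(tokens)
```

With this toy dictionary, only "thanks", "actively" (twice), and "pays attention to" contribute to the weight; a real dictionary would cover far more words and may also handle degree adverbs and negation, which this sketch ignores.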

Random Forests
The random forests technique is a machine learning method that is advantageous in terms of the usage efficiency of data because of its ability to use out-of-bag (OOB) samples and to rank variables according to their importance [18]. The recent research work by Heinrich [19] used random forest regression for carbon emission estimation to find the importance ranking of variables. In our work, with random forests, we attained the best-predicted results with the lowest RMSE and derived a ranking of importance.
All statistical analysis work was implemented with R version 4.1.2 (and packages for it).

Data
Our data sources were records of Q&A sessions between investor organizations and Chinese public firms, which were taken from the datasets of East Money. We crawled company names, stock codes, and investor survey questions and answers on different dates. The crawl yielded more than 304,000 records from 2609 public firms between 13 November 2018 and 12 November 2021, and each record contained multiple questions and answers. This amount of data is meaningful in machine learning [20]. We published the crawled data and some subsequent collated data [21].
The data contained a total of 304,322 records of questions and answers between public firms and investors. Questions and answers for the same company on the same day were counted as one record; thus, each record contained multiple questions and answers.
The period of the data was from 13 November 2018 to 12 November 2021, including the time frame before and after the "Double Carbon" goal was proposed. A total of 2609 public firms were covered by the data. There are currently more than 4000 companies in China's stock market, and there were 3584 in 2018 at the beginning of the data collection [22].
To improve the accuracy of the analysis and facilitate the comparison of the situation before and after the relevant policy, we set two key time points (summarized in Table 4). One is 22 September 2020, when President Xi Jinping first mentioned the terms "carbon neutrality" and "carbon peaking" at the United Nations. The other time point is when the "Double Carbon" policy was written into the State Council's government work report on 5 March 2021. From the perspective of the time distribution of the data, in the raw data, there were 134,731 units of survey data before 22 September 2020 and 53,309 units of survey data between the two dates. From 5 March 2021 to the present, there were a total of 116,282 units of survey data.
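As an illustration of how records can be assigned to periods using these time points, here is a minimal Python sketch. The function name and the treatment of boundary dates are assumptions; the Wuhan shutdown date (23 January 2020) is not stated in the text and is supplied here as an assumption for the optional p0 split.

```python
from datetime import date

# Key dates from the text. The Wuhan shutdown date is an assumption of this
# sketch; the text only names the event.
WUHAN_SHUTDOWN = date(2020, 1, 23)
GOAL_PROPOSED = date(2020, 9, 22)
GOAL_IN_WORK_REPORT = date(2021, 3, 5)


def assign_period(record_date, split_covid=False):
    """Return the period label for the date of a Q&A record.

    p1: before the goal was proposed; p2: between the two key time points;
    p3: from the date the goal entered the government's work report.
    With split_covid=True, records before the Wuhan shutdown become p0.
    """
    if record_date >= GOAL_IN_WORK_REPORT:
        return "p3"
    if record_date >= GOAL_PROPOSED:
        return "p2"
    if split_covid and record_date < WUHAN_SHUTDOWN:
        return "p0"
    return "p1"
```

For example, `assign_period(date(2019, 6, 1))` gives `"p1"`, and the same date with `split_covid=True` gives `"p0"`.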

Descriptive Statistical Results of the Raw Data
Since our original data included all of the Q&A records of public firms, not all records were related to environmental protection, which meant that we needed to extract the records that were related. However, we could still find how much the importance of environmental protection changed over time from 2018 to the end of 2021 according to the proportion of records mentioning keywords of environmental protection in the whole collection of records. Thus, before selecting the text records related to environmental topics, we performed a descriptive statistical analysis on the text records related to energy conservation and emission reduction.
Table 5 presents the frequency of relevant text records in different periods. From the results of the statistical description, after the "Double Carbon" goal was proposed, and especially after it was incorporated into the government work report, the data in the investor Q&As showed an increasing interest in reducing carbon dioxide emissions (Figures 2 and 3). For each keyword, we can see dramatic increases in the proportion of the term from period 1 to period 2 and from period 2 to period 3. Although this trend reflects the increasing attention paid to the keywords of the "Double Carbon" goal, it does not show whether firms' attitudes were positive or negative; we still need an accurate method to estimate the firms' attitudes towards the keywords. This is what we do in the next section on sentiment analysis.

Data Pre-Processing
We grouped the data into two categories. The first category included data that contained keywords about double carbon, energy saving, or emission reduction. The keyword set included the seven keywords of "carbon", "energy-saving", "emission reduction", "environmental protection", "low carbon", "carbon neutral", and "carbon peak".
The other category comprised the rest of the data. Finally, after classification, there were 75,786 units of data in the first category. Among them, the units of data before 22 September 2020 numbered 30,025, the units of data after 5 March 2021 numbered 34,639, and the number between the two points was 11,122 (Table 4).
However, the text records included all of the questions and answers in one session, which meant that not all of the text was about our topic. Therefore, before the next step, we thoroughly cleaned the text to preserve only the Q&As that contained the seven keywords. For example, the record of Hailiang Shares on 16 September 2021 included the basic information and five Q&As, but only Q&A2 and Q&A4 were related to the carbon reduction topic. Thus, after our pre-processing, only Q&A2 and Q&A4 were preserved in the record, and we applied this function to all of the text records of the Q&As. Compared to the original data, the pre-processed data stuck more tightly to the main topic, which was beneficial in improving the performance of the consequent models. We have also published the cleaned data [23].
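The cleaning step above can be sketched in Python as follows. The function name and the sample Q&A texts are hypothetical, and the English keywords stand in for the original Chinese terms; plain substring matching is an assumed matching rule.

```python
# The seven keywords from the text (English stand-ins for the Chinese terms).
KEYWORDS = ("carbon", "energy-saving", "emission reduction",
            "environmental protection", "low carbon", "carbon neutral",
            "carbon peak")


def keep_relevant_qas(qa_pairs, keywords=KEYWORDS):
    """Keep only the Q&A pairs whose question or answer mentions a keyword."""
    return [
        (q, a) for q, a in qa_pairs
        if any(k in (q + " " + a).lower() for k in keywords)
    ]


# Hypothetical record with four Q&As, two of which are on topic.
record = [
    ("What is the dividend plan?", "It will be announced in the annual report."),
    ("Does the company follow carbon neutral policies?", "Yes, we track them closely."),
    ("How is the overseas business?", "Orders are stable."),
    ("Any investment in emission reduction?", "We are upgrading our furnaces."),
]
relevant = keep_relevant_qas(record)
```

Here `relevant` keeps only the second and fourth Q&As, mirroring the way only the on-topic Q&As of a session are preserved.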

The Result of the Sentiment Analysis and its Distribution
We calculated the sentiment weight of each record and show the results in Table 6. For each unit of text data, we obtained a sentiment weight representing the firm's attitude in a specific Q&A session. A negative value represented a negative attitude, and a positive one represented the opposite. The larger the absolute value was, the greater the extent of the attitude was. In Table 6, we can observe an increasing trend in the median, mean, and third-quartile values. In addition, we have included an interactive picture (Figure 4) to show how the entire sentiment weight changed over time. If there were multiple records on the same date, we calculated the mean as the sentiment score on that day.

Figure 4. Sentiment weight over time; the marked time points include (3) when the goal was incorporated into the government's work report. We used a 30-day rolling average on the data.
We can observe that: (1) There was a low-level weight around the period of the Wuhan shutdown. A possible reason is the negative influence of COVID-19 on emotions and expectations, which will be one of our focal points later; (2) the sentiment weight rose dramatically in the last period, after the goal was incorporated into the government report; (3) according to the figure, we cannot tell whether the change after the goal was proposed on 22 September 2020 is significant, which will be discussed in the next section.
We then calculated the average sentiment weight on the same date in each period group (there were multiple records made on the same day by different companies). As shown in Figure 5, we found that the p2 records showed a slightly higher average sentiment weight in the distribution than that of p1, and p3 had a higher average sentiment weight than those of both p1 and p2.
Figure 5. Distribution of the daily average sentiment weight in each period.
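The daily averaging and the 30-day rolling average mentioned for the time series can be sketched as below. This is a Python illustration under stated assumptions: the function names are hypothetical, the sample values are made up, and a trailing calendar-day window is assumed because the text does not say whether the window is trailing or centered.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_mean(records):
    """Average the sentiment weights of all records that share a date.

    `records` is an iterable of (date, weight) pairs; the result maps each
    date to its mean weight, in chronological order.
    """
    by_day = defaultdict(list)
    for day, weight in records:
        by_day[day].append(weight)
    return {day: sum(ws) / len(ws) for day, ws in sorted(by_day.items())}


def rolling_mean(daily, window_days=30):
    """Trailing average of the daily means over the last `window_days` days."""
    out = {}
    for day in daily:
        start = day - timedelta(days=window_days - 1)
        window = [w for d, w in daily.items() if start <= d <= day]
        out[day] = sum(window) / len(window)
    return out


# Made-up weights: two records on 1 March, one on 2 March, one on 1 May.
records = [(date(2021, 3, 1), 2.0), (date(2021, 3, 1), 4.0),
           (date(2021, 3, 2), 6.0), (date(2021, 5, 1), 1.0)]
daily = daily_mean(records)
smoothed = rolling_mean(daily)
```

In this example the two records on 1 March collapse to one daily value, and the value on 1 May is averaged only with itself because no other day falls inside its 30-day window.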

The Group Differences in Sentiment Weight
To verify the significance of the differences in each period, we conducted a Wilcoxon test (results in Table 7). In addition, we have visualized the test results in Figure 6.

The test verified that there was a significant increase in the average sentiment weight after the "Double Carbon" goal was incorporated into the government's work report (p2 vs. p3), but not after the goal was set (p1 vs. p2), which implied that firms would not change their attitudes only because of the government's goal, but further policies would push them to be significantly more positive in their attitudes toward environmental protection (at least in their public attitudes).
As shown in Figure 4, we also observed a low level around the Wuhan shutdown, which led us to examine the influence of COVID-19 on the attitude towards carbon reduction; thus, we split period 0 (before the Wuhan shutdown) from period 1. However, there was no significant difference between p0 and p1 or p2, but only between p0 and p3 (Figure 7).
Figure 7. There are significant differences in firms' attitudes between p0 and p3, p1 and p3, and p2 and p3. ns: p > 0.05; ***: p ≤ 0.001.
We also explored whether there were significant differences in sentiment weight among all 96 industries. We grouped the firms' sentiment results by the industries to which they belonged and conducted a Wilcoxon test on each pair of industries. Thus, we obtained 4560 pairs for comparison, and 3122 of them were significant (Table 8), which meant that industries had a significant influence on the sentiment weights of the firms.
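To make the group comparison concrete, the following Python sketch implements a two-sided Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation and applies it to every pair of groups. It is an assumption that the unpaired rank-sum variant is the one used; the function names and sample values are hypothetical, and the p-values can differ slightly from R's `wilcox.test`, which applies exact or continuity-corrected computations by default.

```python
import math
from itertools import combinations


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (U statistic of x, p-value). Ties receive average ranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


def pairwise_tests(groups):
    """Run the test on every unordered pair of groups (e.g., industries)."""
    return {(a, b): rank_sum_test(groups[a], groups[b])[1]
            for a, b in combinations(sorted(groups), 2)}


# With 96 industries, the number of unordered pairs is C(96, 2) = 4560.
n_pairs = math.comb(96, 2)

# Made-up sentiment weights for three hypothetical groups.
p_values = pairwise_tests({"A": [1, 2, 3], "B": [4, 5, 6], "C": [1, 2, 3]})
```

Because thousands of pairwise tests are run in the industry comparison, a multiple-comparison adjustment may be worth considering when reading the share of significant pairs; the text does not state whether one was applied.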

Predictive Models of Firms' Sentiment Weights and Stock Data Based on Advanced Tree Models
We further explored the stock data of all of the firms observed in our attitude dataset to find the relationship of the sentiment weight with other values, such as the stock value, industry, and so on. The stock data came from the Choice dataset from East Money.
We used a linear regression model as the baseline model and random forest models for further improvement. We split the dataset randomly with a proportion of 7:3, with 7 as the training data and 3 as the test data, in order to assess the performance of each model, as is done with supervised machine learning [24]. The RMSE indicator was used to assess the models, and our best model was the one with the lowest RMSE on the test data. We then improved the random forest models by optimizing the number of trees [25] and other hyper-parameters [26], and we used our best model to rank the importance of the variables.
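The evaluation procedure can be sketched as follows. This Python illustration uses only the standard library and hypothetical function names; the fixed random seed and the RMSE values in the usage example are assumptions, not results from the study.

```python
import math
import random


def train_test_split(rows, train_share=0.7, seed=0):
    """Randomly split rows into training and test sets (7:3 by default)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(rmse_by_model):
    """Return the name of the model with the lowest RMSE."""
    return min(rmse_by_model, key=rmse_by_model.get)


train, test = train_test_split(list(range(10)))
# Hypothetical RMSE values, used only to show the selection rule.
winner = best_model({"lm1": 2.0, "lm2": 2.1, "rf1": 1.5, "rf2": 1.6})
```

The RMSE is computed on the held-out 30% only, so a model is not rewarded for fitting the training data closely.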

Model 1
Result 1: As we can see in the table in Appendix A, and unlike in the verification of the significance of group differences, COVID-19 had a significant influence in this model (industries were compared with "White household appliances", and periods were compared with "periodp0").
We then added another variable to check whether a company's belonging to a technology field had a significant influence on the sentiment weight. The companies in technology industries were marked as 1, the traditional industries as -1, and industries that did not directly produce carbon as 0 (mostly service industries, such as banking and finance; see Table A2 in Appendix A). Since the variable was derived from the industries, we removed the industry variable to avoid multicollinearity problems.
model 2: weight = a1 × date + a2 × current_value + a3 × percentage of increase + a4 × amount of float + a5 × volume + a6 × recent trading volume + a7 × open + a8 × price-earnings ratio.TTM. + a9 × total value + a10 × whether_tech + a11 × percentage of float in 60 days + a12 × percentage of float in this year + a13 × period (3)
Result 2 (Table 9): In this model, we observed whether technology industries significantly influenced the sentiment weight result. However, R-squared was reduced because we removed the more specific variable, industry.

Stepwise Feature Selection
Before we used the random forest models, we used forward stepwise selection to assist with the choice of the variables in model 1 (Table 10). There were several variables related to stock, leading to potential interrelationships within the set of variables. We started from the intercept term and, in each step, added the variable that most improved the AIC, considering all 13 variables in model 1. The result shows that, after the other 12 variables had been added, the variable "volume" did not contribute to the model and should be deleted from the set of variables.
Step 1: Adding the industry variable (field) to the starting model (weight ~ 1) led to the best AIC. Thus, the stepwise selection continued from weight ~ 1 + field in the next step.
Step 2: Adding the period led to the best AIC. Thus, the stepwise selection continued from weight ~ 1 + field + period in the next step.
In the last step, adding the variable "volume" was not better than adding nothing (<none>) according to the AIC. Thus, the stepwise variable selection suggested that we delete the volume variable.

Model 3
model3: random forest 1 (weight) = rf(period, %increase_this_year, %increase, %increase_60days, date, amount_of_float, current_value, open, total_value, recent_trading_volume, price-earnings ratio.TTM.) (4)
In the importance ranking (Figure 8) in model 3, we found that the date and the percentage of the float of stock were the two most important variables, while the period ranked last. The reason was that the period variable was related to the date variable: if the date explained most of the changes in attitudes, little was left for the period to explain. This was verified when we removed the date from model 3 and ranked the importance of the variables again (Figure 9).

Model 4
We added the whether_tech variable and ranked the importance of the variables again (Figure 10). However, the stock variables still ranked high in the figure. A possible reason is that the industry variable is related to stock performance.

Prediction and Model Assessment
We used the RMSE to compare the performance of the four models (Table 11). A lower RMSE indicates a better prediction result and thus a better-performing model. According to the RMSE, model 3 had the best performance.
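The RMSE comparison can be sketched as follows. The numbers and the model names "model_a" and "model_b" are hypothetical and are not taken from Table 11; the sketch only shows how a lower RMSE singles out the better-predicting model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Toy held-out sentiment weights and predictions from two hypothetical models.
actual = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.0, 2.0, 3.0, 6.0],  # one large miss
    "model_b": [1.5, 2.5, 3.5, 4.5],  # small, uniform misses
}

scores = {name: rmse(actual, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)  # lowest RMSE wins
print(scores, best_model)
```

Because the errors are squared before averaging, a single large miss is penalized more heavily than several small ones, which is why "model_a" scores worse here.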

Conclusions
Based on the question-and-answer records of Chinese public firms' investor surveys, this article examined the changes in companies' attitudes towards carbon reduction before and after the "Double Carbon" policy. First, our descriptive statistical result shows the following.
1. There was an increasing trend in the frequency of carbon reduction and environmental protection topics after the "Double Carbon" goal was proposed and incorporated into the government's work report, indicating growing interest in these topics.

Through sentiment analysis, we estimated the sentiment weight of each survey record. Based on these weights, the tests of group differences showed that:

2. Firms' attitudes towards carbon reduction and environmental protection became significantly more positive after the "Double Carbon" goal was incorporated into the government's work report and the consequent relevant policies were added, but no comparable significant increase was found after the goal was first proposed.
3. Attitudes differed significantly among industries: 3122 of the 4560 possible pairs for comparison showed a strongly significant difference in industries' attitudes towards carbon reduction and environmental protection.
4. No influence of COVID-19 on attitudes was observed.

Then, in the linear regression models, we observed that:

5. Whether a firm is in a technology industry significantly influences the firm's attitude.
6. Other significantly related variables were stock value, the increase in stock value since the start of the year, and stock data.
7. COVID-19 significantly influenced firms' attitudes towards carbon reduction and environmental protection, which differed from the result of the group-difference tests.
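As a check on the pair count reported for the industry comparison above, and assuming that every pair of industry groups was compared (an assumption, since the number of groups n is not stated in this section), the count of 4560 pairs corresponds to 96 groups, and about 68.5% of the pairs differed significantly:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2} = 4560 \;\Longrightarrow\; n = 96,
\qquad
\frac{3122}{4560} \approx 0.685 .
\]
```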
Finally, we applied random forests to obtain the most accurate predictive model. Because human sentiments and emotions are difficult to estimate directly, predictive models built on non-linguistic variables offer additional ways to predict, verify, and assess firms' attitudes towards ecological topics.
Based on these conclusions, our policy advice is as follows:

1. A goal accompanied by consequent specific policies can raise the positive attitudes of firms toward carbon reduction topics; the goal alone does not.
2. Firms' attitudes toward ecological topics differ from industry to industry, which means that needs and situations in the trend of carbon reduction also differ from industry to industry. Detailed, differentiated policies will be more suitable.
3. COVID-19 influences firms' attitudes toward carbon reduction and environmental protection, recalling the classic dilemma, or trilemma, among economic growth, carbon reduction, and a third factor such as energy consumption or, today, epidemic control.
Our database is large in scale and rich in content, and it can support further research. In this article, we calculated the changes in the attitudes of Chinese public firms after key time points, but more research can be conducted. For example, attitudes are potentially influenced by additional firm-level factors, such as CSR, the ownership of the firm (state-owned or private), the corporate culture, and the personalities of CEOs. On the topic of carbon reduction, an LDA topic model can be used to extract a company's views and to measure them along different directions of emission reduction. In the sentiment analysis, this article distinguished between positive and negative sentiment words; with a larger set of imported words, the results of the sentiment analysis can be more accurate, and the content of the dictionary can also be further improved. Word2Vec, BERT, and similar methods can be used to build a dictionary specific to the topics of carbon reduction and environmental protection. These are our next directions.