Article

China’s Public Firms’ Attitudes towards Environmental Protection Based on Sentiment Analysis and Random Forest Models

1 School of Business Administration, East China Normal University, Shanghai 200241, China
2 School of Professional Studies, Columbia University, New York, NY 10019, USA
3 International College, China Agricultural University, Beijing 100091, China
4 School of Economics, Huazhong University of Science and Technology, Wuhan 430074, China
5 School of Environment and Energy, Peking University, Beijing 100871, China
6 School of Economics, Peking University, Beijing 100871, China
7 Kogod School of Business, American University, Washington, DC 20016, USA
8 Research Institute of Economics and Management, South Western University of Finance and Economics, Chengdu 611130, China
9 School of Competitive Sports, Beijing Sport University, Beijing 100084, China
10 Guanghua School of Management, Peking University, Beijing 100871, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(9), 5046; https://doi.org/10.3390/su14095046
Submission received: 20 March 2022 / Revised: 17 April 2022 / Accepted: 18 April 2022 / Published: 22 April 2022

Abstract

In this article, we investigated changes in Chinese public firms’ attitudes towards environmental protection from 2018 to 2021. We crawled the firm–investor Q&A records on the East Money website, extracted the carbon- and environment-related corpus, and applied sentiment analysis, a natural language processing (NLP) method, to calculate the sentiment weight of each firm-level record and to compare attitudes towards carbon reduction before and after key time points. We found significant changes in firms’ attitudes towards carbon reduction and environmental protection after the COVID-19 pandemic and the implementation of environment-related policies. We also found that attitudes were heterogeneous across industries. In addition, we built several models to examine the relationship between a firm’s carbon reduction attitude and its financial performance. Our main findings are as follows. A goal with consequent specific policies can raise the positive attitudes of firms toward carbon reduction topics. Firms’ attitudes toward ecological topics differ from industry to industry, which means that needs and situations in carbon reduction also differ from industry to industry. COVID-19 influenced firms’ attitudes toward carbon reduction and environmental protection, recalling the classic dilemma or trilemma of economic growth, carbon reduction, and energy consumption or, perhaps, epidemic control today. Stock performance also influenced attitudes toward environmental protection.

1. Introduction

From 2018 to 2021, China experienced major events, including the COVID-19 pandemic, economic transformation, a trade war, and growing attention to environmental issues. Over the same period, China proposed the carbon reduction targets of “carbon neutrality” and a “carbon peak”. Under these circumstances, we aim to explore how the attitudes of public firms change over time. The attitudes of firms toward energy conservation and emission reduction are affected by many factors. According to past field research, some Chinese firms believe that emission reduction has restricted the development of enterprises, while others believe that it is beneficial in the long term; the attitude is influenced by the industry, the technology for and cost of emission reduction, the size of the firm, and other firm attributes [1]. Economic policy uncertainty (EPU) has risen, especially during the COVID-19 pandemic, and emission reduction behavior is also affected by EPU [2]. When policy uncertainty increases, manufacturing companies tend to use cheap and highly polluting fossil energy [3]. At present, research on China’s emission reduction issues mostly focuses on regions [4], several typical industries [3], the relationship between energy consumption, emissions, and the economy [5], and the trade-off between emissions and economic development [6]. Recent research showed that with these policies, China can achieve the carbon intensity target by 2030, but with a negative impact on economic growth [6]; in addition, energy consumption and economic growth are mutually important influencing factors [4], leading to a trilemma among energy consumption, carbon emissions, and economic growth [5]. However, although short-term effects exist, a positive long-term correlation between economic growth and carbon reduction was observed in BRICS and OECD economies [5].
Little research has been conducted at the firm level, and the existing research focuses more on emission behavior than on attitude. In the literature on firms’ attitudes toward emissions, Xing Lu’s work [1] is important because it captures the attitudes of 120 firms directly through surveys. Firm-level research also showed that in the long term, firms prefer optimizing energy consumption and investing in green technologies, especially non-state-owned firms and firms with high external financing dependence [7]. In response to uncertain policies on carbon emission intensity, manufacturing firms prefer to use cheap and dirty fossil fuels [6].
In this article, we studied the dynamic changes in Chinese public firms’ attitudes toward environmental protection from 2018 to 2021 and explored the factors influencing those attitudes, in order to verify how environmental protection policies, the COVID-19 pandemic, the industry of a company, and its stock performance affect them.
Our contributions are as follows: (1) Starting from the Q&A records between listed firms and investor organizations, we constructed a collection of text data of Chinese firms’ comments about carbon reduction; (2) we applied sentiment analysis (an NLP method) to estimate firms’ attitudes towards carbon reduction and, with the estimated results, segmented the time span into three periods at two key time points (when the government’s goal was set and when consequent policies were released), leading to the conclusion that goals on their own cannot raise the positive attitudes of firms, but goals with consequent policies can; (3) because estimating firms’ attitudes with NLP methods is laborious, involving text collection, text mining and cleanup, and the NLP analysis itself, we provided a simpler route to firms’ attitudes toward carbon reduction through financial and industry data, modeled with random forests; (4) we explored the industry factor and found that the attitude score differed from industry to industry; (5) we investigated how COVID-19 influenced firms’ attitudes toward carbon reduction, finding that the attitudes did not change significantly before and after COVID-19 in the group comparisons, but that a more positive attitude could be observed once firms’ financial data were controlled for.

2. Materials and Methods

2.1. Workflow

To estimate firms’ attitudes towards environmental protection topics, we collected over 304,000 records of investor Q&A texts with their timestamps from the website of East Money [8], and then extracted the texts relevant to environmental protection, including those about carbon reduction, to calculate the attitude weight score by using sentiment analysis. Then, we analyzed the attitude weight score by industry, period, and other financial variables from the Choice dataset from East Money. A detailed explanation is shown in Table 1.
The main steps are shown in Figure 1. We first cleaned up the text data to preserve only the text that was relevant to carbon reduction. Then, we used sentiment analysis to score the attitude of the sentiment in each text datum, which reflected the attitudes of firms in each investor Q&A session. With the sentiment score, we analyzed how firms’ attitudes varied by different periods (segmented by COVID-19 and carbon policies) and by industry by using the Wilcoxon test to verify the significance of group differences. In the next step, with the estimated sentiment score, combined with other stock data collected from the Choice dataset from East Money, we built several models to explore the relationship between the sentiment weight and these indicators. Then, we obtained predictive results and the RMSE indicator from random forest models, to estimate the performance of the models.
In the descriptive statistics, we found that, after the goals were proposed by President Xi and incorporated into government work reports, the frequency of environment-related words in investor Q&As increased. Through the sentiment analysis, we obtained the sentiment score of the firms in each Q&A session. According to these scores, we verified that positive attitudes toward the environment increased significantly after the “Double Carbon” goal was incorporated into the government report, but not after the goal was first set, and that there were significant differences between industries. According to the linear models, COVID-19, stock values (and changes in stock values), and the industry all had significant influences. Finally, as the NLP method involves a heavy workload in data collection and cleaning, we built models to predict the attitude scores from numerical financial data, which were much easier to collect. The RMSE between the predicted and actual values was calculated for each model to compare performance and identify the best random forest model. This part is summarized in Table 2.

2.2. NLP

For further verification and inspection, we applied the sentiment analysis method of NLP (natural language processing) to calculate the sentiment weight of each record in the Q&A text data and to estimate the changes before and after the “Double Carbon” goal was set. In recent research, NLP methods have been used extensively to explore the non-numerical aspects of organizations, such as corporate culture, attitude, CSR, and the personality traits of CEOs. Kai Li’s work [9] used Word2Vec to build dictionaries for corporate culture. Shavin Malhotra’s work [10] applied linguistic techniques to infer a CEO’s traits from spoken text. In our work, we used sentiment analysis to estimate a corporation’s attitude towards the carbon emission goals and calculated a numerical result to represent the extent of firms’ negative and positive attitudes.

2.3. Sentiment Analysis

Sentiment analysis is an NLP method that was introduced by Turney [11] and Pang [12], who estimated binary attitudes in comments about movies and commodities. Because Chinese text is not delimited by spaces, sentiment scoring first requires word segmentation. Word segmentation methods can mainly be categorized into four groups: dictionary-based (keyword) word segmentation, word association, statistics-based word segmentation, and understanding-based word segmentation methods [13]. Dictionary-based word segmentation methods match text data with the words in a constructed dictionary to obtain a word segmentation result [14]. Statistical methods, such as the support vector machine (SVM), the N-gram model, and the hidden Markov model (HMM), usually use training data to build models [13]. The most common methods of word segmentation are usually combinations of dictionary-based and statistical models. In addition, with the development of deep learning, more complex word segmentation methods that are closer to human understanding have become available, such as BERT (a bidirectional transformer-based language model). Such models are usually more accurate, but they have more complex algorithms and are slower to run.
In our research, we aimed to estimate the emotional polarity of the text data, with a focus on sentiment words rather than other words. In addition, the terms in the investor Q&As were mostly commonly used, standard, modern words; thus, we chose a lightweight approach: detecting the words that appear in sentiment dictionaries. A sentiment dictionary maps words to the human emotions that they stand for, and it stores the emotions as computable values, such as numbers or True/False values. For example, a positive number can represent a positive emotion and a negative number a negative emotion, while the absolute value reflects the extent of the emotion. An example of the simplest dictionary is given in Table 3.
The most widely used Chinese sentiment dictionaries include Li Jun’s positive and negative sentiment dictionary from Tsinghua University [15], the Chinese Academy of Sciences’ Chinese sentiment degree dictionary, Dalian University of Technology’s sentiment dictionary, and Tan Songbo’s positive and negative sentiment dictionary based on a hotel evaluation corpus. We combined the emotional dictionary of the Information Retrieval Laboratory of the Dalian University of Technology [16] with Li Jun’s positive and negative sentiment dictionary from Tsinghua University and, after deduplication, applied the combined dictionary to the word segmentation results.
For the word segmentation results, we removed the stop words before calculating the sentiment score. Commonly used stop vocabularies include the stop vocabulary of Harbin Institute of Technology, the stop vocabulary of Baidu, and the stop vocabulary of the Machine Intelligence Laboratory of Sichuan University. We integrated the Baidu stop word list and the stop word database of the Machine Intelligence Laboratory of Sichuan University [17] and removed the stop words from the segmentation results based on the integrated stop word list.
We applied the combined dictionaries to our text records. The process can be briefly interpreted as estimating the positive level of the corpus according to the words in the text. For example, if a text record was “Thanks, the company actively pays attention to carbon-emission-related policies and actively participates in it. The current financial report does not have this business”, we obtained “thanks | company | actively | pays attention to | carbon emission | related policies | actively | participate”, which are 8 phrases after the word segmentation and removal of stop words (which we did in the former steps). The sentiment dictionary provides a mapping between words and scores.
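The scoring step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the toy lexicon, the stop word list, and the choice to sum the word scores are all assumptions, since the text does not state how word scores are aggregated, and the function name `sentiment_weight` is hypothetical.

```python
# Hypothetical toy lexicon; the study merged two published Chinese dictionaries.
SENTIMENT = {"thanks": 1.0, "actively": 1.0, "restricted": -1.0}
# Hypothetical stop word list; the study merged two published lists.
STOP_WORDS = {"the", "in", "it", "to", "and"}


def sentiment_weight(tokens, lexicon=SENTIMENT, stop_words=STOP_WORDS):
    """Sum the lexicon scores of the tokens left after stop word removal.

    Summation is an assumption; tokens missing from the lexicon score 0.
    """
    kept = [t for t in tokens if t not in stop_words]
    return sum(lexicon.get(t, 0.0) for t in kept)


# The eight phrases from the worked example in the text.
tokens = [
    "thanks", "company", "actively", "pays attention to",
    "carbon emission", "related policies", "actively", "participate",
]
score = sentiment_weight(tokens)
```

With this toy lexicon, only "thanks" and the two occurrences of "actively" carry a score, so the record receives a positive weight.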

2.4. Random Forests

The random forests technique is a machine learning method that is advantageous in terms of the usage efficiency of data because of its ability to use out-of-bag (OOB) samples and to rank variables according to their importance [18]. The recent research work by Heinrich [19] used random forest regression for carbon emission estimation to find the importance ranking of variables. In our work, with random forests, we attained the best-predicted results with the lowest RMSE and derived a ranking of importance.
All statistical analysis work was implemented with R version 4.1.2 (and packages for it).

2.5. Data

Our data sources were records of Q&A sessions between investor organizations and Chinese public firms, which were taken from the datasets of East Money. We crawled company names, stock codes, and investor survey questions and answers on different dates. The crawl yielded 304,322 records from 2609 public firms, dated from 13 November 2018 to 12 November 2021. A dataset of this size is meaningful for machine learning [20]. We published the crawled data and some subsequent collated data [21].
Questions and answers for the same company on the same day were counted as one record; thus, each record contained multiple questions and answers.
The period of the data included the time frames before and after the “Double Carbon” goal was proposed. For comparison, there are currently more than 4000 companies in China’s stock market, and there were 3584 in 2018, at the beginning of the data collection [22].
To improve the accuracy of the analysis and facilitate the comparison of the situation before and after the relevant policy, we set two key time points (summarized in Table 4). One is 22 September 2020, when President Xi Jinping first mentioned the terms “carbon neutrality” and “carbon peaking” at the United Nations. The other time point is when the “Double Carbon” policy was written into the State Council’s government work report on 5 March 2021.
In terms of time distribution, the raw data contained 134,731 records before 22 September 2020, 53,309 records between the two dates, and 116,282 records from 5 March 2021 to the end of the collection period.
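The segmentation into three periods can be sketched as below. The two dates and the record counts come from the text; the function name and the boundary handling (each key date opens the next period) are assumptions, chosen to match the wording "before 22 September 2020" and "from 5 March 2021".

```python
from datetime import date

GOAL_PROPOSED = date(2020, 9, 22)   # goal first mentioned at the United Nations
GOAL_IN_REPORT = date(2021, 3, 5)   # goal written into the government work report


def assign_period(d):
    """Map a record date to p1, p2, or p3 (boundary handling is an assumption)."""
    if d < GOAL_PROPOSED:
        return "p1"
    if d < GOAL_IN_REPORT:
        return "p2"
    return "p3"


# Record counts per period as reported in the text.
raw_counts = {"p1": 134_731, "p2": 53_309, "p3": 116_282}
total_records = sum(raw_counts.values())
```

The three reported counts add up to the 304,322 records stated for the whole dataset.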

3. Results and Discussion

3.1. Descriptive Statistical Results of the Raw Data

Our original data included all of the Q&A records of public firms, and not all of them were related to environmental protection, so we needed to extract the related records. However, the proportion of records mentioning environmental protection keywords in the whole collection still shows how the importance of the topic changed from 2018 to the end of 2021. Thus, before selecting the text records related to environmental topics, we performed a descriptive statistical analysis of the records related to energy conservation and emission reduction.
The following (Table 5) presents the frequency of relevant text records in different periods. From the results of the statistical description, after the “Double Carbon” goal was proposed, especially after being incorporated into the government work report, the data in the investor Q&As showed an increasing interest in reducing carbon dioxide emissions (Figure 2 and Figure 3).
For each keyword, the proportion of records containing the term increased dramatically from period1 to period2 and from period2 to period3. To analyze what this trend stood for, we turned to sentiment analysis. Although the trend reflects increasing attention to the keywords of the “Double Carbon” goal, we still needed an accurate method to estimate firms’ attitudes towards them. This is the subject of the sentiment analysis that follows.

3.2. Data Pre-Processing

We grouped the data into two categories. The first category included data that contained keywords about double carbon, energy saving, or emission reduction. The keyword set included the seven keywords of “carbon”, “energy-saving”, “emission reduction”, “environmental protection”, “low carbon”, “carbon neutral”, and “carbon peak”.
The other category comprised the rest of the data. Finally, after classification, there were 75,786 units of data in the first category. Among them, the units of data before 22 September 2020 numbered 30,025, the units of data after 5 March 2021 numbered 34,639, and the number between the two points was 11,122 (Table 4).
However, each text record included all of the questions and answers in one session, which meant that not all of the text was about our topic. Therefore, before the next step, we cleaned the text to preserve only the Q&As that contained the seven keywords. For example, the record of Hailiang Shares on 16 September 2021 included the basic information and five Q&As, but only Q&A2 and Q&A4 were related to the carbon reduction topic. Thus, after our pre-processing, only Q&A2 and Q&A4 were preserved in the record, and we applied this procedure to all of the text records. Compared to the original data, the pre-processed data were more closely focused on the main topic, which helped to improve the performance of the subsequent models. We have also published the cleaned data [23].
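The per-record cleaning step can be sketched as a keyword filter over the Q&A pairs of one session. This is an illustration under assumptions: the keywords are the English renderings given in the text (the original corpus is Chinese), the sample record is invented, and `keep_relevant` is a hypothetical name.

```python
# English renderings of the seven keywords listed in the text.
KEYWORDS = (
    "carbon", "energy-saving", "emission reduction", "environmental protection",
    "low carbon", "carbon neutral", "carbon peak",
)


def keep_relevant(qa_pairs, keywords=KEYWORDS):
    """Keep only the (question, answer) pairs that mention any keyword."""
    return [
        (q, a) for q, a in qa_pairs
        if any(k in f"{q} {a}".lower() for k in keywords)
    ]


# Invented record with five Q&As; only the second and fourth are on topic.
record = [
    ("What is this year's capacity plan?", "Capacity will expand steadily."),
    ("Does the company have a carbon neutral roadmap?", "We follow the related policies."),
    ("How is the overseas business?", "Orders are stable."),
    ("Any investment in energy-saving equipment?", "Yes, upgrades are under way."),
    ("When is the next report?", "In the next quarter."),
]
kept = keep_relevant(record)
```

As in the Hailiang Shares example, two of the five Q&As survive the filter.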

3.3. Sentiment Weight and Characteristics

3.3.1. The Result of the Sentiment Analysis and its Distribution

We calculated the sentiment weight of each record and show the results in Table 6.
For each unit of text data, we obtained a sentiment weight representing the firm’s attitude in a specific Q&A session. A negative value represented a negative attitude, and a positive value represented a positive one. The larger the absolute value was, the stronger the attitude was. In Table 6, we can observe an increasing trend in the median, mean, and third-quartile values. In addition, we have included an interactive figure (Figure 4) to show how the sentiment weight changed over time. If there were multiple records on the same date, we used their mean as the sentiment score for that day.
This figure (Figure 4) presents the sentiment scores of firms’ attitudes over the whole timeline. We added three vertical lines to the plot: (1) The time point of the Wuhan shutdown because of COVID-19; (2) the time point of the “Double Carbon” goal proposed by President Xi; (3) the goal was incorporated into the government’s work report. We used a 30-day rolling average on the data.
We can observe the following: (1) The sentiment weight was low around the period of the Wuhan shutdown. A possible reason is the negative influence of COVID-19 on emotions and expectations, which is one of our focal points later; (2) the sentiment weight rose sharply in the last period, after the goal was incorporated into the government report; (3) the figure alone does not show whether the change after the goal was proposed on 22 September 2020 was significant, which is discussed in the next section.
We then calculated the average sentiment weight on the same date in each period group (there were multiple records made on the same day by different companies). As shown in Figure 5, we found that the p2 records showed a slightly higher average sentiment weight in the distribution than that of p1, and p3 had a higher average sentiment weight than those of both p1 and p2.
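The daily averaging and the 30-day rolling average used for Figure 4 can be sketched as below. The text does not say whether the window is trailing or centred, or whether it counts calendar days or observations, so the trailing calendar-day window here is an assumption, as are the function names and the sample values.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_mean(records):
    """records: iterable of (date, weight). Returns {date: mean weight that day}."""
    buckets = defaultdict(list)
    for d, w in records:
        buckets[d].append(w)
    return {d: sum(ws) / len(ws) for d, ws in buckets.items()}


def rolling_mean(daily, window_days=30):
    """Average the daily means in a trailing window of calendar days (assumed)."""
    smoothed = {}
    for d in sorted(daily):
        start = d - timedelta(days=window_days - 1)
        values = [v for k, v in daily.items() if start <= k <= d]
        smoothed[d] = sum(values) / len(values)
    return smoothed


# Invented sample: two records share a date and are averaged first.
records = [
    (date(2021, 1, 1), 2.0), (date(2021, 1, 1), 4.0),
    (date(2021, 1, 2), 6.0), (date(2021, 3, 1), 10.0),
]
daily = daily_mean(records)
smoothed = rolling_mean(daily)
```

The January values fall outside the 30-day window ending on 1 March, so the last smoothed value depends on that day's record alone.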

3.3.2. The Group Differences in Sentiment Weight

To verify the significance of the differences in each period, we conducted a Wilcoxon test (results in Table 7). In addition, we have visualized the test results in Figure 6.
The test verified that there was a significant increase in the average sentiment weight after the “Double Carbon” goal was incorporated into the government’s work report (p2 vs. p3), but not after the goal was set (p1 vs. p2), which implied that firms would not change their attitudes only because of the government’s goal, but further policies would push them to be significantly more positive in their attitudes toward environmental protection (at least with their attitudes in public).
As shown in Figure 4, we also observed a low level around the Wuhan shutdown, which pointed to a possible influence of COVID-19 on attitudes towards carbon reduction; thus, we split off period0 (before the Wuhan shutdown) from period1. However, p0 did not differ significantly from p1 or p2; it differed significantly only from p3 (Figure 7).
We also explored whether there were significant differences in sentiment weight among all 96 industries. We grouped the firms’ sentiment results by the industries to which they belonged and conducted a Wilcoxon test to verify each combination of industries. Thus, we obtained 4560 pairs for comparison, and 3122 of them were significant (Table 8), which meant that industries had a significant influence on the sentiment weights of the firms.
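The pairwise comparison can be sketched as follows. The count of 4560 pairs follows from choosing 2 of 96 industries. The test function is a simplified stand-in, assuming the two-sample (rank-sum) form of the Wilcoxon test with a normal approximation, tie-averaged ranks, and no continuity correction; statistical packages may use exact or corrected variants, so p-values can differ slightly. All names and sample values are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks with ties averaged, plus the size of each tie group."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value via the normal approximation (a simplification)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in ties) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance <= 0:  # every value tied: no evidence of a difference
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


n_pairs = math.comb(96, 2)  # number of industry pairs

# Invented sentiment weights for three hypothetical industries.
groups = {"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 10], "C": [1, 2, 3, 4, 6]}
significant = [
    (a, b) for a, b in combinations(groups, 2)
    if rank_sum_p(groups[a], groups[b]) < 0.05
]
```

In the study, 3122 of the 4560 pairs were significant, which is roughly 68% of all pairs.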

3.4. Predictive Models of Firms’ Sentiment Weights and Stock Data Based on Advanced Tree Models

We further explored the stock data of all of the firms observed in our attitude dataset to find the relationship of the sentiment weight with other values, such as the stock value, industry, and so on. The stock data came from the Choice dataset from East Money.
We used a linear regression model as the baseline and random forest models for further improvement. We split the dataset randomly in a 7:3 ratio, with 70% as training data and 30% as test data, in order to assess the performance of each model, as is done in supervised machine learning [24]. Then, we used our best model to rank the importance of the variables. The RMSE was used to assess the models, and our best model was the one that, after being fitted on the training data, minimized the RMSE between its predictions and the observed values of the test data:
$$m^{*} = \arg\min_{m} \mathrm{RMSE}\left(\hat{y}_{m}(\text{test\_data}),\ y_{\text{test}}\right), \quad \hat{y}_{m}\ \text{fitted on train\_data}$$
Then, we improved the random forest models by tuning the number of trees [25] and other hyper-parameters [26].
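The selection procedure can be sketched as below: split 7:3, fit each candidate on the training part, and keep the one with the lowest test RMSE. The two candidates here (a mean predictor and a one-variable least-squares line) are deliberately simple stand-ins for the linear regression and random forest models of the study, and the data are synthetic; every name is hypothetical.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def split_70_30(rows, seed=0):
    """Shuffle with a fixed seed and split 7:3 into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(0.7 * len(rows))
    return rows[:cut], rows[cut:]


def fit_mean(train):
    """Stand-in baseline: always predict the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Stand-in model: ordinary least squares with one predictor."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: my + slope * (x - mx)


rows = [(float(x), 2.0 * x + 1.0) for x in range(20)]  # synthetic (x, weight) pairs
train, test = split_70_30(rows)

scores = {}
for name, fit in {"mean": fit_mean, "line": fit_line}.items():
    model = fit(train)
    scores[name] = rmse([model(x) for x, _ in test], [y for _, y in test])

best = min(scores, key=scores.get)  # model with the lowest test RMSE
```

Evaluating on held-out data rather than on the training data is what makes the RMSE comparison a fair estimate of predictive performance.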

3.4.1. Model 1

Result 1:
As Table A1 in Appendix A shows, and unlike in the tests of group differences, COVID-19 had a significant influence in this model (industries were compared with “White household appliances” as the baseline, and periods were compared with “periodp0”).
model 1: weight = a1 × date + a2 × current_value + a3 × percentage of increase + a4 × amount of float + a5 × volume + a6 × recent trading volume + a7 × open + a8 × price–earnings ratio (TTM) + a9 × total value + a10 × industry + a11 × percentage of float in 60 days + a12 × percentage of float in this year + a13 × period

3.4.2. Model 2

We added another variable to check whether a company’s belonging to a technology field had a significant influence on the sentiment weight. The companies in technology industries were marked as 1, the traditional industries as –1, and industries that did not directly produce carbon as 0 (mostly service industries, such as banking and finance; see Table A2 in the Appendix A). Since the variable came from the industries, we removed the industry variable to avoid multicollinearity problems.
model 2: weight = a1 × date + a2 × current_value + a3 × percentage of increase + a4 × amount of float + a5 × volume + a6 × recent trading volume + a7 × open + a8 × price–earnings ratio (TTM) + a9 × total value + a10 × whether_tech + a11 × percentage of float in 60 days + a12 × percentage of float in this year + a13 × period
Result 2 (Table 9):
In this model, we observed that belonging to a technology industry significantly influenced the sentiment weight. However, R-squared decreased because we removed the more specific industry variable.
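The three-level coding of the whether_tech variable can be sketched as a lookup. The coding values (1, −1, 0) come from the text; the assignment of individual industries to groups is given in Table A2 of the source, so the few entries below are hypothetical placeholders for illustration.

```python
# Hypothetical excerpt of the industry grouping; Table A2 holds the real mapping.
WHETHER_TECH = {
    "semiconductors": 1,       # technology industry
    "glass": -1,               # traditional industry
    "animal husbandry": -1,    # traditional industry
    "banking": 0,              # does not directly produce carbon
}


def whether_tech(industry):
    """Return 1, -1, or 0 for an industry; unknown industries raise KeyError."""
    return WHETHER_TECH[industry]


codes = [whether_tech(name) for name in ("semiconductors", "glass", "banking")]
```

Raising an error for unlisted industries, rather than defaulting to 0, keeps a missing mapping from being mistaken for a service industry.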

3.4.3. Stepwise Feature Selection

Before we used the random forest models, we used a forward stepwise selection to assist with the choice of the variables in model 1 (Table 10). There were several variables related to stock, leading to a potential interrelationship within the set of variables. We started from the intercept term, adding a variable in each step according to the contribution to the difference in the AIC after adding it, and we ended with all 13 variables in model 1. The result shows that after adding the other 12 variables, the variable “volume” did not contribute to the model, and should be deleted from the set of variables.

3.4.4. Model 3

In the importance ranking (Figure 8) in model 3, we found that the date and the percentage of the float of stock were the two most important variables, while the period ranked last. The reason was that the period variable was related to the date variable, and if the date mostly explained the changes in attitudes, the part left to the period would be less, which was verified when we removed the date from model 3 and ranked the importance of the variables again (Figure 9).
model 3: random forest 1(weight) = rf (period, %increase_this_year, %increase, %increase_60days, date, amount_of_float, current_value, open, total_value, recent_trading_volume, price–earnings ratio (TTM))

3.4.5. Model 4

We added the whether_tech variable and ranked the importance of the variables again (Figure 10). However, the stock variables still ranked high in the figure. A possible reason is that the industry variable is related to stock performance.
model 4: random forest 2(weight) = rf (period, %increase_this_year, %increase, %increase_60days, date, amount_of_float, current_value, open, total_value, recent_trading_volume, price–earnings ratio (TTM), whether_tech)

3.4.6. Prediction and Model Assessment

We used the RMSE to compare the performance of the four models (Table 11). A lower RMSE indicates a better prediction result and a better-performing model.

4. Conclusions

Based on the question-and-answer records of Chinese public firms’ investor surveys, this article examined the changes in companies’ attitudes towards carbon reduction before and after the “Double Carbon” policy. First, our descriptive statistics showed the following.
There was an increasing trend in the frequency with which carbon reduction and environmental protection were mentioned after the “Double Carbon” goal was proposed and incorporated into the government’s work report, indicating growing interest in the topic.
Through sentiment analysis methods, we estimated the sentiment weight of each survey record. According to the weight, through the verification of group differences, we observed that:
  • There was a significant increase in firms’ attitudes towards carbon reduction and environmental protection after the “Double Carbon” goal was incorporated into the government’s work report and consequent relevant policies were added, but the same significant increase was not found after the goal was proposed.
  • A strong significance could be observed in the differences in attitude among the industries. A total of 3122 of the 4560 possible pairs for comparison showed a strong significance in the differences in industries’ attitudes towards carbon reduction and environmental protection.
  • No significant influence of COVID-19 on attitudes was observed in the group comparisons.
Then, in the linear regression models, we observed that:
  • Whether a firm is in a technology industry significantly influences the firm’s attitude.
  • Other significantly related variables were stock value, the increase in stock value since the start of the year, and stock data.
  • COVID-19 significantly influenced firms’ attitudes towards carbon reduction and environmental protection, which was different from the findings in the verification of the significance of group differences.
Finally, we applied random forests to obtain the most accurate predictive model. Because human sentiments and emotions are difficult to estimate directly, predictive models built on non-linguistic variables offer additional ways to predict, verify, and assess firms’ attitudes towards ecological topics.
Based on these conclusions, our policy advice is as follows:
  • A goal with consequent specific policies can raise the positive attitudes of firms toward carbon reduction topics, but not the goal alone.
  • Firms’ attitudes toward ecological topics are different from industry to industry, which means that there are different needs and situations in the trend of carbon reduction from industry to industry. Detailed policies with differentiation will be more suitable.
  • COVID-19 influences firms’ attitudes toward carbon reduction and environmental protection, recalling the classic dilemma or trilemma of economic growth, carbon reduction, and a third factor, such as energy consumption or, today, epidemic control.
Our database is large in scale and rich in content, and it can support more research and exploration tasks. In this article, we calculated the changes in the attitudes of Chinese public firms after key time points. However, more research can be conducted. For example, attitudes are also potentially influenced by other firm-level factors, such as CSR, the ownership of the firm (state-owned or private), the corporate culture, and the personalities of CEOs. On the topic of carbon reduction, an LDA topic model could be used to extract a company’s views on different directions of emission reduction. In the sentiment analysis, this article distinguished between positive and negative sentiment words; if the extent (intensity) of each word were also incorporated, the results of the sentiment analysis could be more accurate. The content of the dictionary can also be further improved: Word2Vec, BERT, and similar models can be used to build a dictionary based on the topics of carbon reduction and environmental protection. These are our next directions.

Author Contributions

Conceptualization, C.L., Y.Y. and L.L.; methodology, C.L., Y.W. and L.L.; software, L.L. and Y.W.; validation, J.W. and Y.Y.; formal analysis, C.L., W.L., Z.L., J.Z. and L.L.; investigation, J.Z., J.W. and Y.Y.; resources, C.L. and L.L.; data curation, Q.H., J.W. and J.G.; writing—original draft preparation, C.L., L.L., J.Z., J.W., Y.Y., Z.L., Y.W., Q.H., W.L. and J.G.; writing—review and editing, L.L.; visualization, Y.Y. and W.L.; supervision, L.L.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this published article. For more details, see https://github.com/luyuyuyu/gov_mkt_carbon_nlp (accessed on 3 March 2022).

Acknowledgments

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The results of model 1.
| No. | Term | Estimate | Std. Error | t Value | p Value | Signif. Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (intercept) | 122.8467 | 14.49061 | 8.477671 | 2.36 × 10^−17 | *** |
| 2 | date | −0.00641 | 0.000802 | −7.99338 | 1.34 × 10^−15 | *** |
| 3 | current_value | −1.08527 | 0.101149 | −10.7294 | 7.91 × 10^−27 | *** |
| 4 | percentage of increase | 0.620066 | 0.039568 | 15.6709 | 3.22 × 10^−55 | *** |
| 5 | amount of float | 0.803083 | 0.116003 | 6.922919 | 4.48 × 10^−12 | *** |
| 6 | volume | 2.55 × 10^−9 | 6.16 × 10^−9 | 0.414112 | 0.678794 | |
| 7 | recent trading volume | −0.0001 | 4.80 × 10^−5 | −2.18349 | 0.029005 | * |
| 8 | open | 1.084942 | 0.10173 | 10.66495 | 1.58 × 10^−26 | *** |
| 9 | price–earnings ratio.TTM. | −0.00116 | 0.000548 | −2.1149 | 0.034443 | * |
| 10 | total value | 8.12 × 10^−12 | 1.67 × 10^−12 | 4.871945 | 1.11 × 10^−6 | *** |
| 11–105 | industry, compared to white household appliances | | | | | |
| | semiconductors | −1.32583 | 0.932814 | −1.42132 | 0.15523 | |
| | glass | 17.30774 | 4.51258 | 3.835443 | 0.000125 | *** |
| | animal husbandry | −2.84401 | 0.881095 | −3.22782 | 0.001248 | ** |
| | ship and marine equipment | 7.157058 | 5.274547 | 1.356905 | 0.174817 | |
| | motor | 0.878144 | 1.1123 | 0.789484 | 0.429833 | |
| | electricity | 0.273886 | 0.990088 | 0.276628 | 0.782067 | |
| | power supply | 3.588364 | 0.908556 | 3.949524 | 7.84 × 10^−5 | *** |
| | electronic devices | −4.19587 | 1.165698 | −3.59945 | 0.000319 | *** |
| | electronic equipment manufacturing | 2.439399 | 0.810315 | 3.010434 | 0.00261 | ** |
| | electronic components | −0.6249 | 0.955577 | −0.65395 | 0.513147 | |
| | real estate development | 4.541732 | 2.479098 | 1.83201 | 0.066956 | . |
| | textiles | −1.77311 | 4.728651 | −0.37497 | 0.707684 | |
| | non-bank finance | 0.707548 | 1.879389 | 0.376478 | 0.706563 | |
| | clothing and home textiles | −3.80677 | 1.030464 | −3.69423 | 0.000221 | *** |
| | steel structures | −0.10154 | 0.836296 | −0.12141 | 0.903363 | |
| | steel | 3.011207 | 0.846367 | 3.557804 | 0.000374 | *** |
| | port shipping | 10.49783 | 2.23662 | 4.693615 | 2.69 × 10^−6 | *** |
| | road rail | 2.25131 | 2.450983 | 0.918533 | 0.358344 | |
| | optoelectronic device | −4.2997 | 1.129387 | −3.80711 | 0.000141 | *** |
| | broadcasting | −2.6028 | 8.556085 | −0.3042 | 0.760973 | |
| | rail transit equipment | 50.76793 | 5.629349 | 9.018438 | 1.97 × 10^−19 | *** |
| | precious metals | 9.219086 | 8.562859 | 1.076636 | 0.281648 | |
| | aerospace equipment | −6.87912 | 1.93049 | −3.56341 | 0.000366 | *** |
| | aviation airport | 4.404922 | 3.31229 | 1.329872 | 0.183566 | |
| | synthetic fiber and resin | 10.09359 | 0.884497 | 11.41167 | 3.98 × 10^−30 | *** |
| | internet service | 5.966019 | 1.417503 | 4.208824 | 2.57 × 10^−5 | *** |
| | internet technology | 0.699786 | 1.940025 | 0.36071 | 0.718318 | |
| | internet business | −2.54472 | 7.420536 | −0.34293 | 0.731653 | |
| | fertilizers and pesticides | −0.88422 | 0.850499 | −1.03965 | 0.298507 | |
| | new chemical materials | 17.6047 | 0.928078 | 18.969 | 5.81 × 10^−80 | *** |
| | chemical materials | 2.414427 | 0.819692 | 2.945528 | 0.003226 | ** |
| | chemicals | 5.166898 | 0.796451 | 6.487399 | 8.81 × 10^−11 | *** |
| | chemical and pharmaceutical | −1.78932 | 0.919407 | −1.94617 | 0.051639 | . |
| | environmental protection | 21.78406 | 0.951113 | 22.90376 | 1.64 × 10^−115 | *** |
| | robots | −1.46364 | 0.923671 | −1.58459 | 0.113066 | |
| | basic metal | 0.200088 | 0.838384 | 0.23866 | 0.811371 | |
| | infrastructure | 0.556408 | 1.685168 | 0.330179 | 0.741266 | |
| | computer software | 15.19635 | 0.954484 | 15.92101 | 6.22 × 10^−57 | *** |
| | computer hardware | 3.747518 | 1.074415 | 3.487961 | 0.000487 | *** |
| | furniture | 0.243515 | 1.168628 | 0.208377 | 0.834935 | |
| | building construction | 17.80526 | 0.9709 | 18.33893 | 7.05 × 10^−75 | *** |
| | education | −1.79009 | 6.644787 | −0.2694 | 0.787624 | |
| | new metal and non-metal materials | 6.083196 | 0.850658 | 7.151168 | 8.72 × 10^−13 | *** |
| | metal products | −4.60701 | 0.828141 | −5.56307 | 2.66 × 10^−8 | *** |
| | forestry | 3.114934 | 4.517578 | 0.689514 | 0.490503 | |
| | retail | 1.407416 | 4.511864 | 0.311937 | 0.75509 | |
| | trading | −3.13583 | 1.604586 | −1.95429 | 0.050672 | . |
| | coal | 1.563853 | 1.48919 | 1.050136 | 0.293661 | |
| | refractory | 14.41566 | 1.477046 | 9.759788 | 1.75 × 10^−22 | *** |
| | agriculture | −4.29808 | 2.232262 | −1.92543 | 0.054181 | . |
| | print media | 2.806752 | 14.7815 | 0.189883 | 0.849402 | |
| | other electrical equipment | 3.493211 | 1.24306 | 2.810171 | 0.004953 | ** |
| | other home appliances | 1.133201 | 1.823094 | 0.621581 | 0.53422 | |
| | other building materials | −0.18492 | 0.978066 | −0.18907 | 0.85004 | |
| | other delivery equipment | 0.859954 | 1.894778 | 0.453855 | 0.649935 | |
| | other light industry | −0.46941 | 5.272863 | −0.08902 | 0.929063 | |
| | car | 1.765503 | 0.784958 | 2.249169 | 0.024506 | * |
| | gas | 0.629026 | 1.605851 | 0.391709 | 0.695275 | |
| | commercial property management | −0.98464 | 4.160513 | −0.23666 | 0.81292 | |
| | biomedicine | −4.41707 | 2.675452 | −1.65096 | 0.098753 | . |
| | petroleum gas | 6.115942 | 0.941699 | 6.49458 | 8.40 × 10^−11 | *** |
| | food | −3.0498 | 0.971268 | −3.14002 | 0.00169 | ** |
| | audiovisual equipment | 6.564379 | 1.479641 | 4.436468 | 9.16 × 10^−6 | *** |
| | transmission and transformation equipment | 2.571072 | 0.960262 | 2.677469 | 0.00742 | ** |
| | cement | 2.756316 | 1.347438 | 2.045597 | 0.040801 | * |
| | water affairs | 2.879128 | 1.696806 | 1.696793 | 0.089742 | . |
| | ceramics | 6.746263 | 1.543292 | 4.371346 | 1.24 × 10^−5 | *** |
| | iron ore | 6.121013 | 5.631601 | 1.086904 | 0.277084 | |
| | railway equipment | 5.180382 | 2.677895 | 1.934498 | 0.053058 | . |
| | communication devices | 3.312028 | 1.418248 | 2.335295 | 0.019532 | * |
| | general equipment | 1.097738 | 0.770044 | 1.425552 | 0.154004 | |
| | satellite applications | 2.709181 | 1.886708 | 1.43593 | 0.151028 | |
| | entertainment supplies | −2.35357 | 2.753838 | −0.85465 | 0.392749 | |
| | logistics | −1.19681 | 1.069861 | −1.11866 | 0.263291 | |
| | rare metals | −1.58956 | 1.16583 | −1.36346 | 0.172743 | |
| | rubber products | 2.218447 | 1.117255 | 1.985623 | 0.047081 | * |
| | consumer electronics | −3.93873 | 1.951649 | −2.01816 | 0.04358 | * |
| | home appliances | 2.522429 | 1.246603 | 2.023442 | 0.043033 | * |
| | leisure service | 0.435217 | 6.645008 | 0.065495 | 0.94778 | |
| | medical service | −4.71585 | 2.986223 | −1.5792 | 0.114296 | |
| | medical instruments | −4.50512 | 0.998439 | −4.51216 | 6.43 × 10^−6 | *** |
| | pharmaceutical business | −6.10541 | 1.505683 | −4.05491 | 5.02 × 10^−5 | *** |
| | banking | −3.27578 | 1.567455 | −2.08988 | 0.036634 | * |
| | drinks | −2.36368 | 4.164547 | −0.56757 | 0.570329 | |
| | marketing service | 1.954179 | 2.424267 | 0.806091 | 0.420194 | |
| | movies and animation | −4.28238 | 4.328788 | −0.98928 | 0.322531 | |
| | fishery | −1.3077 | 8.590432 | −0.15223 | 0.879008 | |
| | paper printing | 1.610445 | 0.896266 | 1.796838 | 0.072367 | . |
| | lighting devices | 2.220598 | 1.615952 | 1.374173 | 0.169394 | |
| | traditional Chinese medicine production | −4.44179 | 1.487589 | −2.9859 | 0.002829 | ** |
| | jewelry | −7.88081 | 3.236608 | −2.4349 | 0.014899 | * |
| | professional service | 8.09088 | 0.875021 | 9.246493 | 2.41 × 10^−20 | *** |
| | professional setting | −1.25212 | 0.794771 | −1.57545 | 0.11516 | |
| | decoration | 2.726903 | 1.62272 | 1.680452 | 0.092876 | . |
| | comprehensive | 1.854743 | 1.557577 | 1.190787 | 0.233743 | |
| 106 | percentage of float in 60 days | −0.03486 | 0.003465 | −10.0612 | 8.63 × 10^−24 | *** |
| 107 | percentage of float in this year | 0.015725 | 0.000851 | 18.47594 | 5.72 × 10^−76 | *** |
| 108–110 | period, compared to p0 | | | | | |
| | periodp1 | 2.419141 | 0.353387 | 6.845588 | 7.70 × 10^−12 | *** |
| | periodp2 | 4.221345 | 0.487625 | 8.656947 | 4.98 × 10^−18 | *** |
| | periodp3 | 7.633121 | 0.617796 | 12.35541 | 5.11 × 10^−35 | *** |

Significance codes: ·: p > 0.1; .: p ≤ 0.1; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001
Table A2. The whether_tech variable and corresponding industries.
| No. | Name | Whether_Tech |
| --- | --- | --- |
| 1 | banking | 0 |
| 2 | glass | −1 |
| 3 | audiovisual equipment | 1 |
| 4 | other building materials | −1 |
| 5 | electricity | −1 |
| 6 | trading | 0 |
| 7 | environmental protection | 1 |
| 8 | real estate development | −1 |
| 9 | metal products | −1 |
| 10 | animal husbandry | −1 |
| 11 | electronic devices | 1 |
| 12 | building construction | −1 |
| 13 | basic metal | −1 |
| 14 | commercial property management | 0 |
| 15 | electronic components | 1 |
| 16 | chemical and pharmaceutical | 1 |
| 17 | professional setting | 1 |
| 18 | synthetic fiber and resin | −1 |
| 19 | white goods | −1 |
| 20 | car | 1 |
| 21 | transmission and transformation equipment | −1 |
| 22 | cement | −1 |
| 23 | gas | −1 |
| 24 | chemical materials | −1 |
| 25 | internet service | 1 |
| 26 | logistics | −1 |
| 27 | road rail | −1 |
| 28 | paper printing | −1 |
| 29 | infrastructure | −1 |
| 30 | port shipping | −1 |
| 31 | new metal and non-metal materials | 1 |
| 32 | food | −1 |
| 33 | general equipment | −1 |
| 34 | traditional Chinese medicine production | 1 |
| 35 | water affairs | −1 |
| 36 | coal | −1 |
| 37 | fertilizers and pesticides | −1 |
| 38 | petroleum gas | −1 |
| 39 | drinks | −1 |
| 40 | rubber products | −1 |
| 41 | power supply | −1 |
| 42 | forestry | −1 |
| 43 | medical service | 0 |
| 44 | non-bank finance | 0 |
| 45 | steel | −1 |
| 46 | rare metals | 1 |
| 47 | aerospace equipment | 1 |
| 48 | professional service | 0 |
| 49 | retail | 0 |
| 50 | biomedicine | 1 |
| 51 | new chemical materials | 1 |
| 52 | comprehensive | −1 |
| 53 | textile | −1 |
| 54 | chemicals | −1 |
| 55 | agriculture | −1 |
| 56 | broadcasting | 0 |
| 57 | motors | −1 |
| 58 | railway equipment | −1 |
| 59 | computer hardware | 1 |
| 60 | computer software | 1 |
| 61 | pharmaceutical business | 0 |
| 62 | electronic equipment manufacturing | 1 |
| 63 | iron ore | −1 |
| 64 | clothing and home textiles | −1 |
| 65 | decoration | −1 |
| 66 | refractory | −1 |
| 67 | semiconductors | 1 |
| 68 | communication devices | 1 |
| 69 | other delivery equipment | −1 |
| 70 | marketing service | 0 |
| 71 | steel structures | −1 |
| 72 | precious metals | −1 |
| 73 | leisure service | 0 |
| 74 | ceramics | −1 |
| 75 | education | 0 |
| 76 | movies and animation | 0 |
| 77 | entertainment supplies | −1 |
| 78 | other electrical equipment | −1 |
| 79 | medical instruments | 1 |
| 80 | optoelectronic devices | 1 |
| 81 | rail transit equipment | −1 |
| 82 | furniture | −1 |
| 83 | home appliances | −1 |
| 84 | robots | 1 |
| 85 | other light industry | −1 |
| 86 | lighting devices | −1 |
| 87 | jewelry | −1 |
| 88 | consumer electronics | −1 |
| 89 | aviation airport | 1 |
| 90 | ship and marine equipment | 1 |
| 91 | satellite application | 1 |
| 92 | fishery | −1 |
| 93 | other home appliances | −1 |
| 94 | internet business | 0 |
| 95 | internet technology | 1 |
| 96 | print media | 0 |

References

  1. Xing, L.; Shi, L.; Hussain, A. Corporations response to the energy saving and pollution abatement policy. Int. J. Environ. Res. 2010, 4, 637–646.
  2. Liu, Y.; Zhang, Z. How does economic policy uncertainty affect CO2 emissions? A regional analysis in China. Environ. Sci. Pollut. Res. 2022, 29, 4276–4290.
  3. Yu, J.; Shi, X.; Guo, D.; Yang, L. Economic policy uncertainty (EPU) and firm carbon emissions: Evidence using a China provincial EPU index. Energy Econ. 2021, 94, 105071.
  4. Zhao, M.; Lü, L.; Zhang, B.; Luo, H. Dynamic Relationship among Energy Consumption, Economic Growth and Carbon Emissions in China. Res. Environ. Sci. 2021, 34, 1509–1522.
  5. Nawaz, M.A.; Hussain, M.S.; Kamran, H.W.; Ehsanullah, S.; Maheen, R.; Shair, F. Trilemma association of energy consumption, carbon emission, and economic growth of BRICS and OECD regions: Quantile regression estimation. Environ. Sci. Pollut. Res. 2021, 28, 16014–16028.
  6. Li, P.; Ouyang, Y. Quantifying the role of technical progress towards China’s 2030 carbon intensity target. J. Environ. Plan. Manag. 2021, 64, 379–398.
  7. Liu, X.; Ji, Q.; Yu, J. Sustainable development goals and firm carbon emissions: Evidence from a quasi-natural experiment in China. Energy Econ. 2021, 103, 105627.
  8. East Money Website. Available online: https://data.eastmoney.com/jgdy/tj.html (accessed on 29 December 2021).
  9. Li, K.; Mai, F.; Shen, R.; Yan, X. Measuring corporate culture using machine learning. Rev. Financ. Stud. 2021, 34, 3265–3315.
  10. Malhotra, S.; Reus, T.H.; Zhu, P.; Roelofsen, E.M. The acquisitive nature of extraverted CEOs. Adm. Sci. Q. 2018, 63, 370–408.
  11. Turney, P.D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv 2002, arXiv:0212032.
  12. Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. arXiv 2002, arXiv:0205070.
  13. Cambria, E.; Schuller, B.; Xia, Y.; Havasi, C. New avenues in opinion mining and sentiment analysis. IEEE Intell. Syst. 2013, 28, 15–21.
  14. Ortony, A.; Clore, G.L.; Collins, A. The Cognitive Structure of Emotions; Cambridge University Press: Cambridge, UK, 1990.
  15. The Natural Language Processing Group at the Department of Computer Science and Technology, Tsinghua University (THUNLP). Available online: http://nlp.csai.tsinghua.edu.cn/site2/index.php/13-sms (accessed on 29 December 2021).
  16. Xu, L.; Lin, H.; Pan, Y.; Ren, H.; Chen, J. Constructing the affective lexicon ontology. J. China Soc. Sci. Tech. Inf. 2008, 27, 180–185. (In Chinese)
  17. Yu, J.; Yin, J.; Fei, S. Identifying Synonyms Based on Sentence Structure Analysis. Data Anal. Knowl. Discov. 2013, 29, 35–40.
  18. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  19. Heinrich, V.H.; Dalagnol, R.; Cassol, H.L.; Rosan, T.M.; de Almeida, C.T.; Silva Junior, C.H.; Campanharo, W.A.; House, J.I.; Sitch, S.; Hales, T.C.; et al. Large carbon sink potential of secondary forests in the Brazilian Amazon to mitigate climate change. Nat. Commun. 2021, 12, 1785.
  20. Machine Learning Mastery. Available online: https://machinelearningmastery.com/much-training-data-required-machine-learning/ (accessed on 7 March 2022).
  21. GitHub. Available online: https://github.com/luyuyuyu/gov_mkt_carbon_nlp/blob/main/raw_data (accessed on 7 March 2022).
  22. The World Bank. 2021. World Development Indicators. Available online: https://databank.worldbank.org/source/world-development-indicators (accessed on 16 December 2021).
  23. GitHub. Available online: https://github.com/luyuyuyu/gov_mkt_carbon_nlp/blob/main/clean.zip (accessed on 3 March 2022).
  24. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 2nd ed.; Prentice Hall: Hoboken, NJ, USA, 2003.
  25. Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in a Random Forest? Springer: Heidelberg/Berlin, Germany, 2012; pp. 154–168.
  26. Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip Rev. Data Min. Knowl Discov. 2019, 9, e1301.
Figure 1. Flowchart of our work.
Figure 2. The frequency of keywords in the dataset.
Figure 3. A dramatic increase in the proportions of relevant words, especially in period3 (after the goal was incorporated into the government report).
Figure 4. Firms’ sentiment weights with respect to carbon reduction and environmental protection (this is an interactive plot; see the complete figure at https://github.com/luyuyuyu/gov_mkt_carbon_nlp/blob/main/p1.html accessed on 5 Mar 2022).
Figure 5. Distribution of the daily average sentiment weight in each period.
Figure 6. The group difference test of the daily average sentiment weight in each period. ns: p > 0.05; ***: p ≤ 0.001.
Figure 7. There are significant differences of firms’ attitudes between p0 and p3, p1 and p3, p2 and p3. ns: p > 0.05; ***: p ≤ 0.001.
Figure 8. The variable importance ranking in model 3.
Figure 9. The importance ranking after removing the date variable.
Figure 10. The variable importance ranking in model 4.
Table 1. Variables and explanations.
| Variables | Explanation | Source |
| --- | --- | --- |
| company | Name of the invested public firm | East Money’s website |
| com_code | The stock code of the company | |
| date | The date when the Q&A was conducted by investors and the firm | |
| text | The text record of the Q&A | East Money’s website, cleaned by the author; only texts about the environment were preserved |
| weight | The sentiment score calculated from the variable text using sentiment analysis | Calculated by the author |
| period (p1, p2, p3) | The time category of the Q&A record. P1 refers to those from before the “Double Carbon” goal was set. P3 refers to those from after the goal was incorporated into the government’s work report. P2 is the time between p1 and p3. Then, we split p1 into p0 and p1 according to the time of the COVID-19 outbreak in China. When p0 is included, p1 refers to the period between the Wuhan shutdown and when the “Double Carbon” goal was proposed. | The writer set this according to the variable date |
| current_value | of the stock value | Choice dataset from East Money |
| percentage of increase | of the stock value | |
| amount of float | of the stock value | |
| volume | of the stock value | |
| recent trading volume | of the stock value | |
| speed of increment | of the stock value | |
| turnover | of the stock value | |
| volume of transaction | of the stock value | |
| highest value | of the stock value | |
| lowest value | of the stock value | |
| open | of the stock value | |
| close | of the stock value | |
| stock amplitude | of the stock value | |
| quantity relative ratio | of the stock value | |
| price-earnings ratio.TTM. | of the stock value | |
| price-earnings ratio.LYR. | of the stock value | |
| price/book value ratio | of the stock value | |
| market_value | of the stock value | |
| total value | of the stock value | |
| industry | 96 different industries; the industry to which the company belongs | |
| the percentage of float in 60 days | of the stock value | |
| the percentage of float in this year | of the stock value | |
Table 2. Methods and findings.
| Methods | Findings |
| --- | --- |
| Descriptive statistics | After the goals were proposed by President Xi and incorporated into government work reports, the frequency of words related to the environment mentioned in investor Q&As increased. |
| Sentiment analysis (one of the NLP methods) | We obtained the sentiment score for carbon reduction. |
| Analytics on the sentiment score: group analytics (Wilcoxon test) | (1) There was a significant increase in positive attitudes toward the environment after the “Double Carbon” goal was incorporated into government reports, but not after the goal was set. (2) There were significant differences between different industries. |
| Analytics on the sentiment score: model1 (lm1) | (1) COVID-19 showed a significant influence on the sentiment score. (2) The stock value, float of the stock value, and industry also influenced the sentiment score. |
| Analytics on the sentiment score: model2 (lm2) | The sentiment score was significantly influenced by whether a firm was in the technology industry. |
| Analytics on the sentiment score: model3 (rf1) and model4 (rf2) | A non-NLP way to predict firms’ attitudes was provided. |
| Applied the four models for prediction and estimated the models by using the RMSE (a standard machine learning procedure). | Model3 (rf1) had the best RMSE, which means the lowest error in prediction. |
Table 3. The simplest sentiment dictionary.
| Word | Sentiment Weight |
| --- | --- |
| sad | −1 |
| very sad | −2 |
| happy | 1 |
| very happy | 2 |
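A dictionary of this kind can be applied to a tokenized text by summing the weights of the entries found in it. The following is a minimal sketch under stated assumptions: the function name `sentiment_weight`, the greedy two-word matching, and the English tokens are hypothetical and serve only as an illustration; they are not the pipeline used in this article, which works on Chinese text with a much larger lexicon.

```python
# Toy dictionary copied from Table 3; a real lexicon would be far larger.
SENTIMENT_DICT = {
    "sad": -1,
    "very sad": -2,
    "happy": 1,
    "very happy": 2,
}


def sentiment_weight(tokens, lexicon=SENTIMENT_DICT):
    """Sum the weights of dictionary entries found in a list of tokens.

    Two-word entries (e.g. "very happy") are tried before single words so
    that an intensified entry is not also counted as a plain "happy".
    """
    total = 0
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if len(tokens) - i >= 2 and pair in lexicon:
            total += lexicon[pair]
            i += 2
        elif tokens[i] in lexicon:
            total += lexicon[tokens[i]]
            i += 1
        else:
            i += 1  # words outside the dictionary carry no weight
    return total


# Hypothetical example sentence: "very happy" (+2) and "sad" (-1).
example = "we are very happy but investors are sad".split()
print(sentiment_weight(example))
```

Matching the longer entry first is a design choice of this sketch; without it, "very happy" would be scored as a plain "happy".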
Table 4. The time distribution of the data.
| Date | 13 November 2018 to 22 September 2020 | 22 September 2020 | 22 September 2020 to 5 March 2021 | 5 March 2021 | 5 March 2021 to 12 November 2021 |
| --- | --- | --- | --- | --- | --- |
| Period | period1 | President Xi proposed China’s “Double Carbon” goal at the United Nations | period2 | The “Double Carbon” goal was written into the State Council’s government work report | period3 |
| Number of data records | 134,731 | | 53,309 | | 116,282 |
| The proportion of the total volume | 44.27% | | 17.52% | | 38.21% |
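The period variable can be derived from the date of each record with the boundary dates in Table 4. The sketch below is illustrative only: the function name `assign_period`, the treatment of records dated exactly on a boundary, and the date used for the Wuhan shutdown (23 January 2020, which the article does not state) are assumptions.

```python
from datetime import date

# Boundary dates taken from Table 4.
GOAL_PROPOSED = date(2020, 9, 22)   # "Double Carbon" goal proposed
GOAL_IN_REPORT = date(2021, 3, 5)   # goal written into the government work report
# Assumption: the Wuhan shutdown is dated 23 January 2020 (not stated in the article).
WUHAN_SHUTDOWN = date(2020, 1, 23)


def assign_period(d, split_covid=False):
    """Return the period label for the date of a Q&A record.

    Assumption: a record dated exactly on a boundary belongs to the later period.
    With split_covid=True, p1 is split into p0 (before the shutdown) and p1.
    """
    if d >= GOAL_IN_REPORT:
        return "p3"
    if d >= GOAL_PROPOSED:
        return "p2"
    if split_covid and d < WUHAN_SHUTDOWN:
        return "p0"
    return "p1"


print(assign_period(date(2019, 6, 1)))
print(assign_period(date(2019, 6, 1), split_covid=True))
print(assign_period(date(2021, 6, 1)))
```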
Table 5. Frequency of the appearance of relevant words in the data.
| Key words | Period | Frequency | The Proportion of Surveys in Each Period |
| --- | --- | --- | --- |
| carbon | period1 | 5379 | 3.9924% |
| | period2 | 3673 | 6.89% |
| | period3 | 20,459 | 17.59% |
| | total | 29,511 | |
| low carbon | period1 | 335 | 0.248% |
| | period2 | 638 | 1.197% |
| | period3 | 3678 | 3.16% |
| | total | 4651 | |
| carbon neutralization | period1 | 8 | 0.005937757% |
| | period2 | 699 | 1.3% |
| | period3 | 8702 | 7.48% |
| | total | 9409 | |
| carbon peak | period1 | 0 | 0 |
| | period2 | 182 | 0.34% |
| | period3 | 5239 | 4.5% |
| | total | 5421 | |
| emission reduction | period1 | 2316 | 1.7% |
| | period2 | 1287 | 2.4% |
| | period3 | 5478 | 4.7% |
| | total | 9081 | |
| energy saving | period1 | 8191 | 6.0795% |
| | period2 | 2768 | 5.19% |
| | period3 | 11,430 | 9.829% |
| | total | 22,389 | |
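The proportions in Table 5 are consistent with dividing each keyword frequency by the number of data records in the same period (Table 4). A minimal sketch of that calculation for the keyword "carbon" follows; the function name `proportion_percent` is hypothetical, and the figures are copied from the two tables.

```python
# Number of Q&A records per period (Table 4).
RECORDS = {"period1": 134_731, "period2": 53_309, "period3": 116_282}
# Frequency of the keyword "carbon" per period (Table 5).
CARBON_FREQ = {"period1": 5_379, "period2": 3_673, "period3": 20_459}


def proportion_percent(freq, records):
    """Share of records in each period that mention the keyword, in percent."""
    return {period: 100 * freq[period] / records[period] for period in freq}


carbon_share = proportion_percent(CARBON_FREQ, RECORDS)
for period, pct in carbon_share.items():
    print(f"{period}: {pct:.2f}%")
```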
Table 6. The sentiment weight distribution in each period.
| Sentiment weight distribution | Period1 | Period2 | Period3 |
| --- | --- | --- | --- |
| Min | −5.000 | −3.00 | −5.00 |
| 1st Qu | 3.000 | 3.00 | 3.00 |
| Median | 6.000 | 7.00 | 8.00 |
| Mean | 9.155 | 10.28 | 12.41 |
| 3rd Qu | 12.000 | 15.00 | 16.00 |
| Max | 210.000 | 113.00 | 809.00 |
| Count | 28,977 | 10,818 | 33,290 |
Table 7. The group analysis: A Wilcoxon test was used to verify whether the differences in groups were significant.
| No. | Variable | Group1 | Group2 | p | p.Adj | p.Format | p.Signif | Method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | avg_senti | p1 | p2 | 0.872064 | 0.87 | 0.87 | ns | Wilcoxon |
| 2 | avg_senti | p1 | p3 | 3.02 × 10^−13 | 9.10 × 10^−13 | 3.00 × 10^−13 | *** | Wilcoxon |
| 3 | avg_senti | p2 | p3 | 6.71 × 10^−9 | 1.30 × 10^−8 | 6.70 × 10^−9 | *** | Wilcoxon |

ns: p > 0.05; ***: p ≤ 0.001.
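For readers unfamiliar with the group test, the sketch below shows one way to compare two periods, assuming the unpaired (rank-sum) form of the Wilcoxon test with a normal approximation. The function name `rank_sum_test` and the sample values are hypothetical; no tie or continuity correction is applied, so the p-values are approximate and need not match those of a statistical package.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Tied values receive average ranks. No tie or continuity correction is
    applied, so the returned p-value is approximate.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    values = [v for v, _ in pooled]
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical daily average sentiment weights for two periods.
p1_days = [8.1, 9.4, 7.9, 10.2, 8.8, 9.0, 8.5, 9.7]
p3_days = [12.3, 11.8, 13.5, 12.9, 11.1, 14.0, 12.2, 13.1]
z, p = rank_sum_test(p1_days, p3_days)
print(f"z = {z:.2f}, p = {p:.4f}")
```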
Table 8. There were significant differences in the sentiment weights among industries.
| Variable | Group1 | Group2 | p | p.Adj | p.Format | p.Signif | Method |
| --- | --- | --- | --- | --- | --- | --- | --- |
| weight | Internet technology | Internet business | 0.005164 | 1 | 0.00516 | ** | Wilcoxon |
| weight | Internet technology | Chemical fertilizer and pesticide | 1.74 × 10^−10 | 5.50 × 10^−7 | 1.70 × 10^−10 | *** | |
| weight | Internet technology | New materials | 1.13 × 10^−28 | 4.40 × 10^−25 | <2 × 10^−16 | *** | |
| weight | Internet technology | Chemical materials | 0.001042 | 1 | 0.00104 | ** | |
| weight | Internet technology | Chemical products | 0.029401 | 1 | 0.0294 | * | |
| weight | Internet technology | Chemical/pharmaceutical | 8.32 × 10^−7 | 0.0023 | 8.30 × 10^−7 | *** | |

A total of 4554 rows were omitted, and 3122 of the 4560 comparison groups had significant differences.
The complete results are available at: https://github.com/luyuyuyu/gov_mkt_carbon_nlp/blob/main/by_field.csv (accessed on 3 March 2022).
Significance codes: *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001.
Table 9. Results of model 2.
| No. | Term | Estimate | Std. Error | t Value | p Value | Signif. Code |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | (Intercept) | 101.4944 | 13.512 | 7.511426 | 5.94 × 10^−14 | *** |
| 2 | date | −0.00515 | 0.000748 | −6.89541 | 5.43 × 10^−12 | *** |
| 3 | current_value | −0.29746 | 0.078357 | −3.79628 | 0.000147 | *** |
| 4 | percentage of increase | 0.732461 | 0.034771 | 21.0652 | 4.35 × 10^−98 | *** |
| 5 | amount of float | 0.212833 | 0.102069 | 2.085193 | 0.037057 | * |
| 6 | volume | −1.53 × 10^−8 | 5.04 × 10^−9 | −3.04076 | 0.002361 | ** |
| 7 | recent trading volume | 0.000225 | 4.14 × 10^−5 | 5.447573 | 5.13 × 10^−8 | *** |
| 8 | open | 0.308644 | 0.078796 | 3.91703 | 8.98 × 10^−5 | *** |
| 9 | price–earnings ratio.TTM. | 0.004332 | 0.000467 | 9.268343 | 1.96 × 10^−20 | *** |
| 10 | total value | −2.44 × 10^−12 | 1.36 × 10^−12 | −1.7983 | 0.072136 | . |
| 11 | percentage of float in 60 days | −0.03591 | 0.002974 | −12.0773 | 1.55 × 10^−33 | *** |
| 12 | percentage of float in this year | 0.005856 | 0.000733 | 7.986153 | 1.42 × 10^−15 | *** |
| 13–15 | period, compared to p0 | | | | | |
| | periodp1 | 1.643733 | 0.329841 | 4.983406 | 6.27 × 10^−7 | *** |
| | periodp2 | 3.485751 | 0.456308 | 7.639027 | 2.23 × 10^−14 | *** |
| | periodp3 | 6.488219 | 0.576821 | 11.24823 | 2.56 × 10^−29 | *** |
| 16 | whether_tech0 | 1.149543 | 0.362972 | 3.16703 | 0.001541 | ** |
| 17 | whether_tech1 | 0.365539 | 0.139087 | 2.628136 | 0.008588 | ** |

Significance codes: .: p ≤ 0.1; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001.
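The terms whether_tech0 and whether_tech1 in Table 9 are consistent with treatment (dummy) coding of the three-level whether_tech variable, with the level −1 as the baseline. The sketch below illustrates that coding; the function name `dummy_code` is hypothetical, the baseline is an assumption inferred from the term names, and the four industries are copied from Table A2.

```python
# A few industry codes copied from Table A2.
WHETHER_TECH = {"banking": 0, "glass": -1, "semiconductors": 1, "steel": -1}


def dummy_code(level, baseline=-1, levels=(-1, 0, 1)):
    """Treatment-code a three-level factor: one 0/1 indicator per non-baseline level."""
    return {f"whether_tech{lv}": int(level == lv) for lv in levels if lv != baseline}


for industry, level in WHETHER_TECH.items():
    print(industry, dummy_code(level))
```

Under this coding, each coefficient is read as the difference in the sentiment weight between that level and the baseline level, holding the other variables fixed.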
Table 10. Steps of the forward selection.
The first step to add a variable
Start: AIC = 281,527.3
weight ~ 1

| Term | Df | Sum of Sq | RSS | AIC |
| --- | --- | --- | --- | --- |
| +industry | 95 | 1,291,719 | 11,364,592 | 276,220 |
| +period | 3 | 130,996 | 12,525,314 | 281,002 |
| +%increase | 1 | 126,483 | 12,529,827 | 281,016 |
| +date | 1 | 84,431 | 12,571,879 | 281,187 |
| +%increase_this_year | 1 | 82,653 | 12,573,657 | 281,195 |
| +amount of float | 1 | 45,136 | 12,611,175 | 281,347 |
| +price-earnings ratio.TTM. | 1 | 34,496 | 12,621,814 | 281,390 |
| +current_value | 1 | 28,360 | 12,627,950 | 281,415 |
| +today | 1 | 26,435 | 12,629,876 | 281,423 |
| +%increase_60days | 1 | 2361 | 12,653,950 | 281,520 |
| +recent_trading_volume | 1 | 2357 | 12,653,954 | 281,520 |
| +volume | 1 | 1855 | 12,654,455 | 281,522 |
| <none> | | | 12,656,310 | 281,527 |
| +total_value | 1 | 190 | 12,656,120 | 281,529 |

From the lines above, we found that adding the industry variable to the starting model (weight ~ 1) would lead to the best AIC. Thus, the stepwise selection started with weight ~ 1 + industry in the next step.

Step 2:
AIC = 276,219.6
weight ~ 1 + industry

| Term | Df | Sum of Sq | RSS | AIC |
| --- | --- | --- | --- | --- |
| +period | 3 | 108,204 | 11,256,388 | 275,737 |
| +%increase_this_year | 1 | 64,285 | 11,300,307 | 275,932 |
| +date | 1 | 59,843 | 11,304,748 | 275,952 |
| +amount of float | 1 | 36,591 | 11,328,000 | 276,057 |
| +open | 1 | 5802 | 11,358,790 | 276,196 |
| +current_value | 1 | 5776 | 11,358,816 | 276,196 |
| +amount_of_float | 1 | 2508 | 11,362,083 | 276,210 |
| +total_value | 1 | 1660 | 11,362,931 | 276,214 |
| +volume | 1 | 940 | 11,363,652 | 276,217 |
| <none> | | | 11,364,592 | 276,220 |
| +%increase_60days | 1 | 298 | 11,364,294 | 276,220 |
| +recent_trading_volume | 1 | 170 | 11,364,422 | 276,221 |
| +price-earnings ratio.TTM. | 1 | 114 | 11,364,477 | 276,221 |

From the lines above, we found that adding the period variable would lead to the best AIC. Thus, the stepwise selection started with weight ~ 1 + industry + period in the next step.
Several steps were omitted, and the AIC continued improving until the model became weight ~ industry + period + %increase_this_year + %increase + %increase_60days + date + amount_of_float + current_value + open + total_value + recent_trading_volume + price-earnings ratio.TTM.
In this final step, adding the variable “volume” was not better than adding nothing (<none>) according to the AIC. Thus, the stepwise variable selection suggested that we delete the volume variable.

| Term | Df | Sum of Sq | RSS | AIC |
| --- | --- | --- | --- | --- |
| <none> | | | 11,104,597 | 275,064 |
| +volume | 1 | 37.37 | 11,104,560 | 275,066 |
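The procedure in Table 10 can be sketched as a greedy loop: at each step, every remaining variable is tried, the one with the lowest AIC is added, and the loop stops when no addition beats the current model (the <none> row). The code below is a minimal illustration on synthetic data, not the authors' implementation. It assumes numeric predictors only (a factor such as industry would first need dummy coding), and it assumes the AIC is computed, up to an additive constant, as n·ln(RSS/n) + 2k, where k is the number of fitted coefficients. All names (`ols_rss`, `aic`, `forward_select`, `x1`, `x2`, `noise`) are hypothetical.

```python
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination with partial pivoting.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))


def aic(rss, n, n_params):
    """AIC of a least-squares model, up to an additive constant (assumed form)."""
    return n * math.log(rss / n) + 2 * n_params


def forward_select(candidates, y):
    """Add the variable that lowers the AIC most; stop when no addition improves it."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 1)  # intercept-only model, i.e. weight ~ 1
    while True:
        trials = []
        for name in candidates:
            if name not in selected:
                cols = [candidates[v] for v in selected + [name]]
                trials.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not trials or min(trials)[0] >= best:
            return selected, best
        best, chosen = min(trials)
        selected.append(chosen)


# Synthetic data: y depends on x1 and x2 but not on the "noise" variable.
random.seed(0)
n_obs = 200
x1 = [random.gauss(0, 1) for _ in range(n_obs)]
x2 = [random.gauss(0, 1) for _ in range(n_obs)]
noise = [random.gauss(0, 1) for _ in range(n_obs)]
y = [3 * a - 2 * b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]
selected, final_aic = forward_select({"x1": x1, "x2": x2, "noise": noise}, y)
print(selected, round(final_aic, 1))
```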
Table 11. Prediction results of the four models.
| | Model1 (lm1) | Model2 (lm2) | Model3 (rf1) | Model4 (rf2) |
| --- | --- | --- | --- | --- |
| RMSE | 13.91493 | 17.63796 | 10.98379 | 12.84664 |
| Note | Baseline | Use the whether_tech variable instead of industry in comparison with model 1. | Cannot use the industry variable, since rf models reject factors with too many levels (96 levels in the industry variable). Remove the volume variable in model 1 according to the stepwise selection result. | Add the whether_tech variable in comparison with model 3. |

According to the RMSE indicators, we found that model 3 had the best performance.
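The RMSE used for this comparison is the square root of the mean squared difference between the observed and predicted sentiment weights, so a lower value means a smaller prediction error. The sketch below shows the calculation and the selection of the best model from the reported values; the function name `rmse` and the three observed and predicted values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# RMSE values reported in Table 11.
reported = {
    "model1 (lm1)": 13.91493,
    "model2 (lm2)": 17.63796,
    "model3 (rf1)": 10.98379,
    "model4 (rf2)": 12.84664,
}
best = min(reported, key=reported.get)
print(best)

# Hypothetical held-out sentiment weights and model predictions.
print(rmse([3.0, 6.0, 12.0], [4.0, 6.0, 10.0]))
```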
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, C.; Li, L.; Zheng, J.; Wang, J.; Yuan, Y.; Lv, Z.; Wei, Y.; Han, Q.; Gao, J.; Liu, W. China’s Public Firms’ Attitudes towards Environmental Protection Based on Sentiment Analysis and Random Forest Models. Sustainability 2022, 14, 5046. https://doi.org/10.3390/su14095046
