Is Investing in Companies Manufacturing Solar Components a Lucrative Business? A Decision Tree Based Analysis

In an era of increasing energy production from renewable sources, the demand for components of renewable energy systems has increased dramatically. Consequently, managers and investors are interested in knowing whether a company in the semiconductor and related device manufacturing sector, especially a photovoltaic (PV) systems manufacturer, is a money-making business. We apply a new approach that extends prior research by using decision trees (DTs) to identify ratios (i.e., indicators) that discriminate between companies within the sector that do (designated as “green”) and do not (“red”) produce elements of PV systems. Our results indicate that, on the basis of selected ratios, green companies can be distinguished from red companies without an in-depth analysis of the product portfolio. We also find that green companies, especially those operating in China, are characterized by lower financial performance, thus providing a negative (and unexpected) answer to the question posed in the title.


Introduction
The majority of the global energy supply is generated by the burning of fossil fuels. Progress and global economic growth have been accompanied by an increase in the consumption of fossil fuels, resulting in climate change and global warming. Importantly, fossil fuels are distributed unequally across the globe, creating challenges related to access in unstable and conflict zones. This has resulted in a shift towards renewable energy sources (RES), with the intention of reducing dependency on fossil fuels and the challenges associated with their use. Renewable energy is locally available, clean, sustainable and eco-friendly and therefore offers an attractive alternative to fossil fuels. Consequently, researchers and policy makers have shown an increased interest in greener sources of energy. According to data [1] from the International Energy Agency (IEA), the share of renewable sources in total electricity generation exceeded 25% in 2018 and continues to grow. This growth is evident in renewable energy production over the preceding three decades (see Figure 1), during which the share of wind and solar photovoltaic (PV) energy in total renewable energy production has increased significantly. Many countries have decided to promote renewable energy and reduce their dependence on fossil fuels and, to do so, have established mandatory renewable energy targets. Such measures form part of governments' legislated plans that require electricity retailers to source specific proportions of total electricity sales from RES within a fixed time frame. For example, the People's Republic of China and India have target levels of 50% and 40%, respectively, of total electric power generation from non-fossil energy sources. Renewable energy has become the leading source of non-fossil energy.
Specifically, the People's Republic of China has established a target of 723 GW from RES by 2020, including increases in wind power capacity to 210 GW and in solar capacity to 110 GW [2]. In turn, the government of India set a target of 175 GW from RES by 2022, consisting mainly of solar (100 GW) and wind (60 GW) [3]. Turkey, Thailand and Chinese Taipei have also set RES targets as shares of their total electric power generation: Turkey aims for 30% by 2030, Thailand for 20% by 2036 and Chinese Taipei for 8% by 2025 (see Figure 2). According to IEA data, none of the countries with set targets has achieved its target, apart from Turkey, which did so in 2015. Achieving these targets necessitates an increase in installed electric power generation capacity. Thus, the demand for manufactured solar modules, solar cells, solar silicon rods, solar wafers, solar power, solar photovoltaic products and equipment is increasing.
This increase in demand is further compounded by the incentives offered by governments to promote investment in the PV industry and in renewable energy in general. For example, the Chinese government has set up national science and technology plans to support research and development (R&D) in PV technology, established key laboratories that promote R&D in the industry and promoted demonstration projects in rural areas. A feed-in tariff policy was also implemented, and imported equipment for domestic and foreign PV investment projects was exempted from import tariffs. In addition, the Chinese government offered tax incentives to PV-related enterprises in the form of R&D cost rebates, rebates on purchases of special equipment and the amortization of costs for intangible assets, as well as tax policies to promote PV power generation, high-tech enterprises in the PV sector and investment in R&D and PV production processes. An overview of the measures applied to promote investment in PV capacity within the EU-27 countries is provided in [7,8]. The most popular measure is the feed-in tariff, which guarantees the price at which electricity is purchased. Feed-in tariffs are followed by subsidies under specific national programmes, although these subsidies are decreasing. Tax incentives, broadly comprising tax credits, exemptions and reduced tax rates, are offered in almost half of the EU-27 countries. For example, in Bulgaria, value added tax (VAT) on PV revenues is 10% as opposed to the normal rate of 15%. Tax deductions and tax credits, which reduce taxable income and the amount of tax due, respectively, are applied in Belgium, Denmark, France, Germany, Ireland and the Netherlands. A number of countries have implemented tradable green certificates. For example, Belgium implemented a green certificate trading system for electricity, with a grid operator providing remuneration per certificate.
Finally, a small number of EU countries have offered soft loans. Solangi et al. [9] outline measures used to promote solar energy deployment in the US and Canada. In the US, incentives were formulated as early as 1978, initially as investment tax credits for renewable energy technologies: residential tax credits of 30% and business tax credits of 15% for investment in renewable energy, including solar energy. Later on, residential and business investment tax credits were raised to 30% and extended, resulting in a doubling of installed PV capacity. In Canada, feed-in tariffs were implemented in 18 provinces, with the feed-in tariff in Ontario credited with stimulating manufacturing and the growth of the Ontario PV industry. The Canadian government also provided loan guarantees, tax holidays and subsidies to support PV manufacturing. Sachu [10] outlines policies applied in the top 10 RES-producing countries, including Germany. To promote the use of renewable energy, the German government mandated the purchase of renewable energy and offered large subsidies to renewable energy producers. A target was set for solar PV capacity, and feed-in tariffs were established. Sachu [10] notes that the policies applied in Germany have had a significant impact on reducing the soft costs associated with solar installation, such as permitting, inspections, interconnection, financing and customer acquisition.
All these policies, which include targets and incentives, require that investment in RES companies increases. This can happen readily only if RES-related companies, specifically those that produce photovoltaic components, are lucrative overall and offer an attractive alternative to other companies in the semiconductor and related device manufacturing sector. We contribute to the literature on RES by focusing on a class of companies that operate within the renewable energy value chain, namely companies that manufacture semiconductors and related solid state devices. We divide these companies into two groups. The green group comprises companies that manufacture solar modules, solar cells, solar silicon rods, solar wafers, solar power, solar photovoltaic products and related equipment for RES companies; this group is the focus of this paper. The red group comprises the remaining companies, which are not associated with RES companies.
We seek to identify ratios (i.e., indicators) that discriminate between companies within the manufacturing sector that produce PV components and those that do not. Once we have identified such ratios, we can analyze them to determine whether these companies show strong financial performance, that is, whether they offer an attractive investment alternative. In the literature, there is great interest in evaluating returns for the energy sector, especially for companies that produce renewable energy. Studies examine relationships between returns on renewable energy stocks, changes in the oil price, equity indices, carbon prices and carbon pass-through rates. For example, Trück and Weron [11] examine convenience yields and risk premiums in the EU-wide CO2 emissions trading scheme during the first Kyoto commitment period. They report that during their sample period, the EU allowance (EUA) market shifted from backwardation to contango with negative convenience yields. Bohl et al. [12] investigate the common factors driving the performance of German renewable energy stocks and find that these stocks mirror ambiguity about the future economic outlook faced by the industry. Henriques and Sadorsky [13] investigate the relationship between alternative energy stock prices, technology stock prices, oil prices and interest rates. They find that stock prices of alternative energy companies are affected by shocks to technology stock prices but not by shocks to oil prices. Inchauspe et al. [14] investigate the impact of energy prices and stock market indices on private investments in renewable energy, finding that these investments have shown substantial growth attributable to government policies, oil prices and evolving market liquidity. Kumar et al. [15] analyze the relationship between oil prices and alternative energy prices, finding that oil prices, stock prices of high technology firms and interest rates affect clean energy stocks. Managi and Okimoto [16] analyze the relationships between oil prices, clean energy stock prices and technology stock prices. The impact of legislation on electricity futures contracts is examined in [17].
Studies also investigate the financial performance of firms operating in the energy sector [18][19][20][21][22]. For example, Capece et al. [18] focus on changes in the performance of natural gas retail markets and analyze the financial statements of 105 companies using cluster analysis. They report that most companies in the sector perform well, with the best performers belonging to existing business groups. Capece et al. [19] analyze the combined effect of regulatory measures and the economic crisis on the performance of Italian gas companies. They observe that average measures of profitability rose until 2009 and then declined, which they attribute to the financial crisis and regulatory policies.
A number of studies, like ours, evaluate the financial performance of firms operating within the renewable energy sector using financial ratios. For example, Halkos and Tzeremes [20] apply data envelopment analysis and construct financial ratios to assess the performance of renewable energy firms. Performance is found to be associated with high returns on assets and equity and with low levels of debt. In a within-sector analysis, they find that firms producing wind energy outperform firms generating hydropower. However, the dataset is limited to the Greek renewable energy sector, and only a small number of ratios is considered. Paun [21] analyzes the sustainability of the renewable energy sector in Romania using a sample of 91 major energy producers for the years 2012 to 2015. To analyze performance, the study considers profitability ratios, return on equity and return on assets but, given data limitations, not key measures such as return on investment and the current ratio. Companies that produce fossil fuels are found to perform better than companies that produce green energy. The study also reports deteriorating performance after 2013 and relatively low returns on equity for RES firms after 2014, suggesting that RES firms underperform other sectors. Returns on assets are also found to decrease after 2013, which is attributed to changes in government and delays in issuing green certificates that led to decreases in investment. Based upon ratio analysis, Paun [21] concludes that RES companies are close to financial distress.
Tomczak [22] investigates whether power plants that produce electricity from renewable energy sources are in a better financial position than those that rely upon traditional (fossil fuel) energy sources, using a sample of 37 companies in Baltic and Central European countries. To assess financial performance, he considers 16 ratios used in bankruptcy prediction, covering four aspects: liquidity, profitability, turnover and debt. The analysis shows that RES companies have lower return on assets and return on equity than fossil fuel energy producers, which translates into lower interest from potential investors. Fossil fuel companies are also found to be characterized by higher profitability but lower turnover ratios. Tomczak [22] concludes that investing in RES companies is not a profitable business. Our study is similar to these studies in that we also consider financial ratios: through the use of DTs, we identify and then analyze financial ratios to gain insight into the performance of green companies.
Another strand of literature considers the relationship between corporate environmental performance and financial performance, again measuring performance with ratios and thereby demonstrating the value of ratio analysis [23][24][25][26]. Clarkson et al. [23] investigate the factors that affect firms' decisions to engage in a proactive environmental strategy, using a sample of firms from the four most polluting industries in the US, namely the pulp and paper, chemical, oil and gas, and metals and mining industries. They consider how changes in environmental strategy affect profitability, liquidity and leverage. Their analysis shows that becoming green leads to improved firm performance and that improved firm performance subsequently improves relative environmental performance. This is reflected in increasing return on assets, rising cash flows (profitability) and a decrease in leverage. Ruggiero and Lehkonen [24] show that there is a negative relationship between renewable energy production by electricity producers and short- and long-term financial performance. To measure financial performance, they use the return on equity, the return on assets and a firm's market value relative to total assets, and regress these onto the volume of renewable energy produced as the main variable of interest; Granger causality is also tested. They attribute the negative relationship to higher capital costs. Sueyoshi and Goto [25] examine the impact of environmental regulations in the US. Like Ruggiero and Lehkonen [24], they use the return on assets as a measure of financial performance, regressing the return on assets of 167 US electric utilities onto measures of environmental protection. Results indicate that environmental protection expenditure by US electric utilities has resulted in decreased financial performance.
Sueyoshi and Goto [25] conclude that both the positive and the negative aspects of environmental policies should be considered, given that emphasis is usually placed upon the positive aspects. Telle [26] challenges the methods applied in previous studies that conclude that firms that go green benefit financially. Telle notes that one issue in studies that seek to relate financial performance to environmental performance is the presence of omitted variables, which may produce an (erroneous) positive relationship between the two. Telle [26] shows that when the return on sales of Norwegian plants is regressed onto observable firm characteristics and unobserved variables are controlled for, the positive effect of good environmental performance disappears. The study's significance lies in challenging findings of a positive impact of environmental performance on financial performance. The question of whether going green (and, analogously, producing components for the RES industry) pays off financially therefore remains unanswered. Within the context of our study, such findings suggest that green firms, including those related to RES, may not be characterized by better financial performance than firms that are not green.
Finally, a number of studies consider further aspects of the financial performance of firms operating within the energy sector. For example, Pätäri et al. [27] investigate whether corporate social responsibility investments affect corporate performance within the energy industry, finding that different aspects of corporate social responsibility affect profitability, market value or both. Arslan-Ayaydin and Thewissen [28] compare the financial performance of energy sector firms with different environmental scores, finding that firms with good scores financially outperform those with poor scores. Sueyoshi and Goto [29] investigate the efficiency of oil companies to establish whether companies under national (public) ownership outperform those under international private ownership, showing that national oil companies outperform international oil companies in terms of efficiency.
None of these studies investigates whether investing in companies manufacturing solar components is a lucrative business by classifying companies on the basis of ratios and then analyzing these ratios. To answer this question, we apply a decision tree based analysis. We frame our analysis within the theoretical basis provided by the literature on corporate failure and financial distress, which relies upon the classification of firms, as we do in this study [30,31]. The genesis of bankruptcy prediction models is attributed to Beaver [33], who uses 30 ratios, grouped by type, and information on failed and non-failed industrial firms to identify five ratios that are predictive of bankruptcy. His noteworthy conclusion is that the ratios of distressed firms differ from those of healthy firms, making it possible to predict financial distress by discriminating between healthy and distressed firms. Altman [32] builds upon Beaver's [33] work and applies multiple discriminant analysis (MDA) to estimate a five-factor model predictive of bankruptcy for a sample of manufacturing firms. The joint consideration of ratios in Altman's model removes the ambiguity and confusion associated with interpreting individual ratios. Ohlson [34] proposes the use of logit analysis (logistic regression) to estimate an O-score model that combines financial ratios and indicators, yielding a probability of bankruptcy bounded between 0 and 1. This model provides a more precise and readily understood indication of the likelihood of bankruptcy and permits a greater range of outcomes than Altman's model [32]. Frydman, Altman and Kao [35] propose a non-parametric solution in the form of decision trees (DTs) to classify distressed and non-distressed firms. DTs require no distributional assumptions or transformed variables; they can handle missing values and qualitative data, are readily understood and can incorporate misclassification costs. Frydman et al. apply DTs to manufacturing and retailing companies, finding that DTs are more accurate and minimize misclassification costs. Koh and Low [36] show that DTs outperform logit analysis and artificial neural networks in predicting bankruptcies. Similarly, Li, Sun and Wu [37] assess the performance of algorithmic implementations of decision trees for companies listed on the Shenzhen and Shanghai Stock Exchanges, finding that these outperform other classification methods, notably methods used in bankruptcy prediction. Gepp, Kumar and Bhattacharya [38] take a similar approach, showing that DT implementations are superior classifiers and predictors of bankruptcy for retailing and manufacturing companies. Fedorova, Gilenko and Dovzhenko [39] extend the application of bankruptcy prediction models to the Russian manufacturing sector by combining statistical methods with artificial intelligence techniques.
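For readers less familiar with the two classical approaches, the Python sketch below contrasts them. It computes the commonly cited form of Altman's Z-score (weights 1.2, 1.4, 3.3, 0.6 and 1.0 on working capital, retained earnings, EBIT and sales scaled by total assets, and market value of equity over total liabilities) with the commonly quoted 1.81 and 2.99 zone cut-offs. It then shows the logistic mapping that logit models such as Ohlson's use to turn an unbounded linear score into a probability between 0 and 1. The class `FirmFigures` and the firm figures are hypothetical, and Ohlson's estimated coefficients are deliberately not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class FirmFigures:
    """Hypothetical balance-sheet and income-statement inputs for one firm."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    sales: float
    total_assets: float
    total_liabilities: float


def altman_z(f: FirmFigures) -> float:
    """Commonly cited form of Altman's five-ratio discriminant score."""
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.market_value_equity / f.total_liabilities
    x5 = f.sales / f.total_assets
    # 1.0 on x5 is the usual rounding of the reported 0.999.
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z: float) -> str:
    """Classify a Z-score using the commonly quoted 1.81 / 2.99 cut-offs."""
    if z < 1.81:
        return "distress"
    if z > 2.99:
        return "safe"
    return "grey"


def logistic(score: float) -> float:
    """Logit-style mapping of an unbounded linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


firm = FirmFigures(working_capital=20, retained_earnings=30, ebit=10,
                   market_value_equity=60, sales=120,
                   total_assets=100, total_liabilities=50)
z = altman_z(firm)
print(round(z, 2), altman_zone(z))  # 2.91 grey
print(logistic(0.0))                # 0.5
```

The discriminant score is unbounded and is read against fixed cut-offs, whereas the logistic output can be read directly as a probability; this is the difference between the MDA and logit approaches described above.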
We build upon the bankruptcy prediction literature that applies DTs to classify bankrupt firms and to predict bankruptcy, but we apply DTs in a novel manner. Rather than predicting bankruptcy on the basis of financial ratios, we set out to determine whether companies that manufacture solar modules, solar cells, solar silicon rods, solar wafers, solar power, solar photovoltaic products and related equipment (green companies) can be differentiated, on the basis of financial ratios, from other enterprises in the sector that are not associated with RES companies (red companies), and whether green companies are in a better financial state. Based upon our analysis, the most critical ratios can be identified from a total of 62 ratios. We expect that using these ratios will give managers and investors the ability to undertake a broader and more detailed assessment of companies in the sector, rather than focusing only on profitability ratios.

Data and Methodology
Energies 2020, 13, 499 7 of 27
Our methodology comprises a number of steps. First, we collect data from the financial reports of companies in the semiconductor and related device manufacturing sector. Financial reports are downloaded from the Emerging Markets Information Service (EMIS) database, a Euromoney Institutional Investor Company (www.emis.com). The initial sample consists of 2345 companies operating in China (1742), Chinese Taipei (272), South Korea (114), Thailand (48), Singapore (37), India (33), Vietnam (24), Hong Kong (21), Malaysia (20), Russia (10), Turkey (6), Ukraine (4) and Ecuador (3), as well as two companies each from the Czech Republic, Indonesia, Iran and the Philippines and one each from Bulgaria, El Salvador and Romania. To determine whether investing in companies that manufacture RES components pays off, we divide the companies in the sector into two groups. The first group comprises enterprises that manufacture solar modules, solar cells, solar silicon rods, solar wafers, solar power, solar photovoltaic products and related equipment; we define these as green companies. The second group comprises companies that are not associated with RES companies; we define these as red companies. The sample is unbalanced: there are 528 companies in the green group and 1817 in the red group. All data are for 2017.
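The sample counts reported above can be checked with a few lines of Python, which also make the consequence of the imbalance explicit. The sketch uses only the counts given in the text; the point it illustrates (that a trivial rule labelling every company red would already be right about 77.5% of the time, so any classification accuracy should be judged against that baseline) is our added reasoning, not a result reported in the study.

```python
# Company counts per country, as listed in the text.
country_counts = {
    "China": 1742, "Chinese Taipei": 272, "South Korea": 114, "Thailand": 48,
    "Singapore": 37, "India": 33, "Vietnam": 24, "Hong Kong": 21, "Malaysia": 20,
    "Russia": 10, "Turkey": 6, "Ukraine": 4, "Ecuador": 3,
    "Czech Republic": 2, "Indonesia": 2, "Iran": 2, "Philippines": 2,
    "Bulgaria": 1, "El Salvador": 1, "Romania": 1,
}
GREEN, RED = 528, 1817

total = sum(country_counts.values())
assert total == GREEN + RED == 2345  # the country and group breakdowns agree

green_share = GREEN / total
baseline_accuracy = RED / total  # accuracy of always predicting "red"
print(f"green share: {green_share:.1%}")                          # 22.5%
print(f"majority-class baseline accuracy: {baseline_accuracy:.1%}")  # 77.5%
```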
The second step is to construct financial ratios for the companies in our sample using their financial reports. In total, we consider 62 indicators, most of which have also been considered in [40]. The ratios characterize different aspects of financial performance, namely liquidity, profitability, efficiency, solvency and other aspects (see Table 1). Such variables have been widely used to analyze the financial status of companies for the purposes of bankruptcy prediction.
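As an illustration of this step, the sketch below computes three indicators from a single report, including two ratios named later in the methodology (retained earnings to total assets, and gross profit plus extraordinary items plus financial expenses to total assets). The field names (`total_assets`, `gross_profit` and so on) are hypothetical and do not correspond to EMIS labels; a ratio is left missing (`None`) when an input is absent or the denominator is zero, which is how incomplete reports propagate into the data-availability problem discussed below.

```python
from typing import Optional

# Hypothetical field names; EMIS reports use their own labels.
Report = dict[str, Optional[float]]


def safe_ratio(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """num / den, or None if an input is missing or the denominator is zero."""
    if num is None or den is None or den == 0:
        return None
    return num / den


def compute_ratios(r: Report) -> dict[str, Optional[float]]:
    """A few illustrative indicators; the study itself uses 62 (Table 1)."""
    total_assets = r.get("total_assets")
    parts = [r.get("gross_profit"), r.get("extraordinary_items"), r.get("financial_expenses")]
    adjusted_profit = None if any(p is None for p in parts) else sum(parts)
    return {
        # liquidity
        "current_ratio": safe_ratio(r.get("current_assets"), r.get("current_liabilities")),
        # cumulative profitability
        "re_to_ta": safe_ratio(r.get("retained_earnings"), total_assets),
        # profitability before financial expenses and extraordinary items
        "gp_ei_fe_to_ta": safe_ratio(adjusted_profit, total_assets),
    }


report = {"total_assets": 200.0, "retained_earnings": 50.0, "gross_profit": 30.0,
          "extraordinary_items": 0.0, "financial_expenses": 10.0,
          "current_assets": 80.0, "current_liabilities": None}
print(compute_ratios(report))
# {'current_ratio': None, 're_to_ta': 0.25, 'gp_ei_fe_to_ta': 0.2}
```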
The final step is to apply a decision tree (DT) analysis. Our approach is similar to that of bankruptcy prediction models and the principle remains the same: we identify ratios that discriminate between green and red companies. However, we do not apply the techniques most commonly used in bankruptcy prediction. We do not use multiple discriminant analysis (MDA), as this technique requires a multivariate normal distribution and equal dispersion matrices. Nor do we use logit analysis, as this method relies upon the assumption of homogeneous variation in the data and is highly sensitive to multicollinearity, outliers, missing values and extreme non-normality [37,41,42]. Instead, we apply DTs to differentiate between companies associated with the RES manufacturing sector and companies belonging to the general semiconductor and related device manufacturing sector. DT methods have numerous advantages. They require no distributional assumptions or transformed variables. They can handle missing values and qualitative data, are readily understood and can incorporate misclassification costs. Splitting rules are univariate, permitting easy identification of significant variables. DTs do, however, require prior probabilities of class membership as inputs (in bankruptcy prediction, the probabilities of successful and failed businesses). Within the context of bankruptcy prediction models, DT algorithms generate a set of tree-based classification rules and assign observations to either a successful or a failing group. A tree begins with a root node, followed by non-leaf nodes, which represent splitting rules based on financial ratios. These are connected to leaf nodes, which represent success or failure. The construction of a DT begins with a search for an independent variable that divides the observations in a sample such that the difference in the dependent variable between the resulting subgroups is greatest.
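The remark that DTs can incorporate misclassification costs and require prior class probabilities can be made concrete with the rule that CART-type trees commonly use to label a node: posterior class probabilities are formed from the priors and the node's class counts, and the label with the lowest expected misclassification cost is chosen. The sketch below illustrates that convention under stated assumptions; the node counts, priors and unit costs are hypothetical, and only the class totals (528 green, 1817 red) come from the sample described above. With priors equal to the sample shares, a node holding 40 green and 100 red companies is labelled red; with equal priors the same node is labelled green, because 40 of 528 green companies is a larger share than 100 of 1817 red ones.

```python
def node_posteriors(node_counts, class_totals, priors):
    """CART-style p(j | node), proportional to prior_j * n_j(node) / N_j."""
    weights = {j: priors[j] * node_counts.get(j, 0) / class_totals[j] for j in priors}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def assign_class(node_counts, class_totals, priors, cost):
    """Label the node with the class that has the lowest expected misclassification cost.

    cost[(i, j)] is the cost of labelling a class-j company as class i.
    """
    post = node_posteriors(node_counts, class_totals, priors)
    expected = {i: sum(cost[(i, j)] * post[j] for j in post) for i in priors}
    return min(expected, key=expected.get)


class_totals = {"green": 528, "red": 1817}   # sample sizes from the text
node = {"green": 40, "red": 100}             # hypothetical node
unit_cost = {("green", "green"): 0, ("green", "red"): 1,
             ("red", "green"): 1, ("red", "red"): 0}

empirical = {c: n / sum(class_totals.values()) for c, n in class_totals.items()}
equal = {"green": 0.5, "red": 0.5}
print(assign_class(node, class_totals, empirical, unit_cost))  # red
print(assign_class(node, class_totals, equal, unit_cost))      # green
```

Raising the cost of misclassifying a green company has the same qualitative effect as raising its prior, which is why both must be set explicitly when the classes are as unbalanced as in this sample.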
In the next stage, each subgroup is subdivided further by again searching for an independent variable that divides the subgroup so that the difference in the dependent variable between the resulting groups is greatest. DT algorithms thus determine the best splitting rule at each non-leaf node. The process continues until splitting no longer produces statistically significant differences between subgroups or the subgroups are too small for further division. To simplify the tree, some nodes may be removed (pruned) while maintaining a small error rate. DTs differ from logit analysis and MDA in that they identify the relative significance of variables, whereas the latter two approaches only identify significant variables. DTs also have predictive power, because they set out a sequence of nodes that leads to a classification [36,38,42]. To check the robustness of the results, the database is divided into six samples that consist of different numbers of companies and ratios; detailed information can be found in Tables S1-S6 (Supplementary Materials). We start by considering all 62 indicators and then reduce this number to 38 and finally to 8 variables. Indicators are removed when insufficient data are available for the companies in the database. Consequently, samples with more ratios contain fewer companies, and reducing the number of indicators increases the number of enterprises in a sample. The number of enterprises is also lower in samples in which the numbers of green and red companies are balanced. All samples considered in the study, together with the number of ratios and companies in each, are listed in Table 2.
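The trade-off described here (more ratios, fewer complete companies) and the construction of balanced samples can be illustrated with a short sketch. The paper states only that indicators are dropped for lack of data; the coverage threshold (`min_coverage`), the complete-case rule and the random undersampling of the larger class used here are assumptions for illustration, and the ratio names (`roa`, `cr`, `dte`) and values are hypothetical.

```python
import random


def select_ratios(companies, min_coverage):
    """Keep ratios that are non-missing for at least `min_coverage` of companies."""
    names = sorted({name for c in companies for name in c["ratios"]})
    kept = []
    for name in names:
        available = sum(c["ratios"].get(name) is not None for c in companies)
        if available / len(companies) >= min_coverage:
            kept.append(name)
    return kept


def complete_cases(companies, ratio_names):
    """Companies for which every selected ratio is available."""
    return [c for c in companies
            if all(c["ratios"].get(n) is not None for n in ratio_names)]


def balance(companies, seed=0):
    """Randomly undersample the larger class so green and red counts are equal."""
    rng = random.Random(seed)
    green = [c for c in companies if c["label"] == "green"]
    red = [c for c in companies if c["label"] == "red"]
    n = min(len(green), len(red))
    return rng.sample(green, n) + rng.sample(red, n)


companies = [
    {"label": "green", "ratios": {"roa": 0.01, "cr": 1.2, "dte": None}},
    {"label": "red", "ratios": {"roa": 0.05, "cr": None, "dte": 0.4}},
    {"label": "red", "ratios": {"roa": 0.07, "cr": 1.8, "dte": None}},
    {"label": "red", "ratios": {"roa": 0.03, "cr": 1.5, "dte": 0.9}},
]
for ratios in (["roa"], ["cr", "roa"], ["cr", "dte", "roa"]):
    print(len(ratios), "ratios ->", len(complete_cases(companies, ratios)), "companies")
print(select_ratios(companies, min_coverage=0.75))  # ['cr', 'roa']
```

Balancing by undersampling discards red companies, which is consistent with the observation above that balanced samples contain fewer enterprises.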
We view DTs as well suited to the task at hand. DTs are appropriate for large datasets with a relatively short history. Furthermore, our aim is to build a tree with a minimum number of nodes, so that the resulting DT rules are simpler and easier to interpret. The general algorithm comprises the following steps:
1. For a set K of records, determine whether all records belong to the same class. If so, end the algorithm.
2. Otherwise, consider all possible divisions of the set K into subsets K1, K2, ..., Kn that are as homogeneous as possible.
3. Assess each division according to the adopted criteria and select the best one.
4. Divide the set K according to the selected division and apply the algorithm to each subset.
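The four steps above can be sketched as a short recursive procedure. This is a generic illustration, not the CHAID, CRT or QUEST implementation used in the study: it makes binary splits on quantitative ratios, restricts step 2 to "ratio ≤ threshold" divisions at midpoints between observed values, and uses Gini impurity (the criterion associated with CRT) as the adopted criterion in step 3. Pruning is omitted, `min_leaf` and `max_depth` are illustrative stopping parameters, and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(rows, labels, min_leaf):
    """Steps 2-3: try every ratio and midpoint threshold, keep the lowest weighted Gini."""
    best = None  # (impurity, ratio index, threshold)
    for m in range(len(rows[0])):
        values = sorted({r[m] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[m] <= t]
            right = [y for r, y in zip(rows, labels) if r[m] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or imp < best[0]:
                best = (imp, m, t)
    return best


def grow(rows, labels, min_leaf=1, max_depth=3, depth=0):
    """Recursive partitioning; leaves hold the majority class of their subset."""
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: stop when the subset is pure (or the depth limit is reached).
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels, min_leaf)
    if split is None or split[0] >= gini(labels):
        return majority  # no admissible split improves homogeneity
    _, m, t = split
    # Step 4: divide the set and apply the same procedure to each subset.
    left_idx = [i for i, r in enumerate(rows) if r[m] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[m] > t]
    children = [grow([rows[i] for i in idx], [labels[i] for i in idx],
                     min_leaf, max_depth, depth + 1)
                for idx in (left_idx, right_idx)]
    return {"ratio": m, "threshold": t, "left": children[0], "right": children[1]}


def predict(tree, row):
    """Follow the splitting rules from the root to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["ratio"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical companies described by two ratios: [return on assets, current ratio].
rows = [[0.01, 1.1], [0.02, 0.9], [0.03, 1.3], [0.08, 1.5], [0.09, 1.2], [0.10, 2.0]]
labels = ["green", "green", "green", "red", "red", "red"]
tree = grow(rows, labels)
print(tree["ratio"], predict(tree, [0.02, 1.0]), predict(tree, [0.12, 1.0]))  # 0 green red
```

Because every split tests a single ratio against a threshold, the fitted tree can be read directly as a set of univariate rules, which is the interpretability advantage of DTs noted earlier.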
The subject of division is an N-element set of objects, here companies, each described by M + 1 variables: the class label y and M financial ratios (e.g., the retained earnings to total assets ratio and the (gross profit + extraordinary items + financial expenses) to total assets ratio) that indicate its financial standing. The vector [y, x] can therefore be defined as follows:

$$[y, \mathbf{x}] = [y, x_1, x_2, \ldots, x_M], \tag{1}$$

where $x_1, x_2, \ldots, x_M$ are financial variables and $y$ is the dependent variable, defined as C for red companies and Z for green companies.
Once the indicators are defined (the data in Equation (1)), the relationship between $y$ and the variables $x_1, \ldots, x_M$ can be expressed through the predictors that determine the value of $y$:
$$y = f(x_1, x_2, \ldots, x_M) \qquad (2)$$
For this purpose, a recursive partitioning methodology is used to obtain an approximation of the following specification:
$$\hat{f}(\mathbf{x}) = \sum_{k=1}^{K} a_k \, I(\mathbf{x} \in R_k) \qquad (3)$$
where $R_k$ ($k = 1, \ldots, K$) are disjoint groups in the multidimensional space, each assigned to one of the classes (red or green), and $a_k$ are model parameters, determined as:
$$a_k = \arg\max_{l} \; p(l \mid k) \qquad (4)$$
where $p(l \mid k)$ is the probability that an element of group $R_k$ belongs to class $l$. The multidimensional space of the independent variables $X_m$ is thus divided into groups, and the overall model is obtained by summing the models built in each of the $K$ disjoint groups. For quantitative variables, as used in this study, the model can be stated as follows:
$$\hat{f}(\mathbf{x}) = \sum_{k=1}^{K} a_k \prod_{m=1}^{M} I\left(v_{km}^{(g)} \le x_m \le v_{km}^{(d)}\right) \qquad (5)$$
where $v_{km}^{(d)}$ is the upper limit of the segment in the $m$-th dimension of the space, $v_{km}^{(g)}$ is the lower limit of that segment, and $I$ is the indicator function:
$$I(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$
The DT is a diagrammatic representation of specification (3). The DT algorithm first identifies, on a subset of companies that we term the training set, the ratios and ratio values according to which companies are classified. It then applies these values to the test data set. The model is constructed recursively [43].
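Specification (3) means that a fitted tree is a piecewise-constant function: each region R_k is a box bounded by a lower and an upper limit in every ratio, and the model returns the parameter a_k of the box that contains x, with a_k chosen as the most probable class in that box. The sketch below is an illustration under assumptions: classes are coded 0 (red, C) and 1 (green, Z), segments are treated as half-open intervals (lower, upper] so that adjacent boxes stay disjoint at shared boundaries, and the region limits in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical region description: ratio name -> (lower limit, upper limit) of the segment.
Region = Dict[str, Tuple[float, float]]


def indicator(condition: bool) -> int:
    """Indicator function I: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


def leaf_parameter(labels_in_region: List[int]) -> int:
    """a_k = argmax_l p(l | k): the most frequent class among training records in R_k."""
    counts = Counter(labels_in_region)
    return max(counts, key=counts.get)


def f_hat(x: Dict[str, float], regions: List[Region], a: List[int]) -> int:
    """Sum over k of a_k * prod_m I(lower_km < x_m <= upper_km).

    Because the regions are disjoint, at most one product equals 1, so the sum
    is simply the parameter a_k of the region that contains x.
    """
    total = 0
    for region, a_k in zip(regions, a):
        inside = 1
        for m, (lower, upper) in region.items():
            inside *= indicator(lower < x[m] <= upper)
        total += a_k * inside
    return total


inf = math.inf
regions = [
    {"X51": (-inf, 0.5)},
    {"X51": (0.5, inf), "X46": (-inf, 40.0)},
    {"X51": (0.5, inf), "X46": (40.0, inf)},
]
# Parameters estimated from hypothetical training labels falling in each region.
a = [leaf_parameter([0, 0, 1]), leaf_parameter([1, 1, 0]), leaf_parameter([0])]
```

Here `a` evaluates to `[0, 1, 0]`, so a company with X51 = 0.9 and X46 = 10 falls in the second box and is assigned class 1 (green).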
We apply three algorithms to construct DTs, namely the Chi-squared Automatic Interaction Detector (CHAID), Classification and Regression Trees (CRT) and Quick, Unbiased, Efficient Statistical Tree (QUEST) algorithms. CHAID is an effective algorithm for building DTs, developed in [44]. It is mainly used for segmentation and tree growing. The algorithm handles both quantitative and qualitative variables. It is not a binary method, as it can produce more than two categories at any given node; therefore, it can produce a larger DT than binary methods. At each step, the CHAID algorithm selects the independent (predictor) variable that has the strongest interaction with the dependent variable. Categories of each predictor are merged if they do not differ significantly with respect to the dependent variable.
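The CHAID step described above (merge categories that do not differ significantly, then pick the predictor most strongly associated with the class) can be sketched with the standard library only. This is a simplified illustration under stated assumptions, not the SPSS implementation: continuous ratios are first cut into four quantile-based ordinal categories, only adjacent categories are merged, Pearson's chi-square is used, and the Bonferroni adjustment applied by full CHAID is omitted. The ratio names and data in the example are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square p-value: 1 - P(df/2, stat/2), using the series
    for the regularized lower incomplete gamma function P."""
    if df <= 0 or stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_chi2(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson statistic and degrees of freedom (rows = categories, columns = classes)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    if n == 0:
        return 0.0, 0
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_tot) - 1)


def merge_categories(table: List[List[int]], alpha: float = 0.05) -> List[List[int]]:
    """Repeatedly merge the most similar pair of adjacent categories (largest
    p-value) until every adjacent pair differs significantly at level alpha."""
    table = [list(row) for row in table]
    while len(table) > 1:
        pvals = [chi2_sf(*pearson_chi2(table[i:i + 2])) for i in range(len(table) - 1)]
        i = max(range(len(pvals)), key=lambda j: pvals[j])
        if pvals[i] <= alpha:
            break
        table[i] = [u + v for u, v in zip(table[i], table[i + 1])]
        del table[i + 1]
    return table


def to_ordinal(values: List[float], n_bins: int = 4) -> List[int]:
    """Discretize a continuous ratio into ordinal categories at sample quantiles."""
    ranked = sorted(values)
    cuts = [ranked[len(ranked) * q // n_bins] for q in range(1, n_bins)]
    return [sum(v >= c for c in cuts) for v in values]


def best_predictor(X: Dict[str, List[float]], y: List[int], n_bins: int = 4,
                   alpha: float = 0.05) -> Optional[Tuple[str, float, int]]:
    """(ratio, p-value, merged categories) of the ratio most associated with y (0 red, 1 green)."""
    best = None
    for name, values in X.items():
        table = [[0, 0] for _ in range(n_bins)]
        for category, label in zip(to_ordinal(values, n_bins), y):
            table[category][label] += 1
        table = [row for row in table if sum(row) > 0]  # drop empty bins
        merged = merge_categories(table, alpha)
        p = chi2_sf(*pearson_chi2(merged))
        if best is None or p < best[1]:
            best = (name, p, len(merged))
    return best
```

For a hypothetical ratio that separates 20 red from 20 green companies, its four categories collapse into two significantly different ones, while a ratio unrelated to the class collapses into a single category with p = 1.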
The CRT algorithm was developed by Breiman, Friedman, Olshen and Stone in 1984 [45]. Unlike CHAID, it is a binary decision algorithm. It is robust to outliers, which distinguishes it from other classical methods. It works recursively: the data are divided into two subsets so that the records in each subset are more homogeneous than in the parent set. Both subsets are then divided again until the homogeneity criterion and the other stopping criteria are met. The ultimate objective is to maximize homogeneity within sample subgroups. We apply the Gini index to determine the optimal split:
$$G(t) = 1 - \sum_{l} p^{2}(l \mid t) \qquad (7)$$
where $p(l \mid t)$ is the proportion of records of class $l$ (red or green) in node $t$.
The QUEST algorithm is a relatively recent DT algorithm for binary classification. It is most often used to classify and explore data [46] and is similar to the CRT algorithm. The difference is that QUEST is computationally efficient and unbiased in its selection of splitting variables. It retains its predictive quality while being efficient, and it reduces complexity, thereby minimizing DT size [47].
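A minimal sketch of the Gini index and of a CRT-style search for the best binary threshold on a single ratio follows. The default minimum parent and child node sizes mirror the 20/10 settings used in the study, but the function names and the exhaustive threshold scan are illustrative assumptions rather than the exact CRT implementation.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity of a node t: 1 - sum_l p(l|t)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(values: List[float], labels: List[str], min_parent: int = 20,
                    min_child: int = 10) -> Optional[Tuple[float, float]]:
    """Return (threshold, impurity decrease) of the best split x <= threshold,
    respecting the minimum node sizes; None if no admissible split exists."""
    n = len(labels)
    if n < min_parent:  # node too small to be split at all
        return None
    parent = gini(labels)
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = [c for v, c in zip(values, labels) if v <= threshold]
        right = [c for v, c in zip(values, labels) if v > threshold]
        if len(left) < min_child or len(right) < min_child:
            continue  # a child node would be smaller than allowed
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - weighted
        if best is None or decrease > best[1]:
            best = (threshold, decrease)
    return best
```

A balanced node has impurity 0.5; a split that separates the two classes perfectly removes all of it, which is the largest decrease possible.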
To confirm the accuracy of the resulting classifications, we examine accuracy under various settings of the training parameters and apply 25-fold cross-validation. The cross-validated risk reported in the output is the risk averaged over the 25 test samples. The DTs built on the individual folds are not shown; only the DT constructed on the full sample and the corresponding full-sample classification table are reported. Finally, based on the results obtained and the analysis of profitability ratios, we also investigate whether green companies exhibit superior performance relative to the rest of the sector. We point out the most critical indicators occurring in the trees and test the statistical significance of differences in the selected indicators between the groups of companies using Student's t-test.
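The sketch below illustrates k-fold cross-validation that produces per-class correct classification rates (as in the classification matrices reported in the Results) and the misclassification risk averaged over the k test folds. It assumes round-robin fold assignment without shuffling and uses a hypothetical one-split classifier on a single ratio as a stand-in for a full DT; any fitting function with the same signature could be plugged in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Classifier = Callable[[float], str]


def fit_midpoint_stump(x: Sequence[float], y: Sequence[str]) -> Classifier:
    """Hypothetical one-split classifier: threshold halfway between the class means."""
    red = [v for v, c in zip(x, y) if c == "red"]
    green = [v for v, c in zip(x, y) if c == "green"]
    mean_red, mean_green = sum(red) / len(red), sum(green) / len(green)
    cut = (mean_red + mean_green) / 2

    def predict(v: float) -> str:
        # Values on the same side of the cut as the green mean are classed green.
        return "green" if (v > cut) == (mean_green > mean_red) else "red"

    return predict


def cross_validate(x: Sequence[float], y: Sequence[str],
                   fit: Callable[[Sequence[float], Sequence[str]], Classifier],
                   k: int = 25) -> Tuple[Dict[str, float], float]:
    """Per-class % correctly classified (pooled over test folds) and mean fold risk."""
    n = len(y)
    if n < k:
        raise ValueError("need at least k records")
    folds: List[List[int]] = [list(range(i, n, k)) for i in range(k)]  # round-robin
    correct = {c: 0 for c in set(y)}
    total = {c: 0 for c in set(y)}
    fold_risks = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        errors = 0
        for i in test:
            hit = model(x[i]) == y[i]
            total[y[i]] += 1
            correct[y[i]] += hit
            errors += not hit
        fold_risks.append(errors / len(test))
    rates = {c: 100.0 * correct[c] / total[c] for c in total}
    return rates, sum(fold_risks) / k
```

With two well-separated hypothetical groups of 25 companies each, every fold holds out one red and one green company and both are classified correctly, so the per-class rates are 100% and the averaged risk is 0.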

Results
This section is organized as follows. First, the data in the defined samples are analysed. Then, the DT results obtained with 25-fold cross-validation are shown. Next, the selected critical indicators are analysed. Finally, the analysis of profitability ratios is presented. The results are reported in Tables 3-26.

Data Analysis in Defined Samples
We investigate whether the data are suitable for further analysis and whether the variables are adequately differentiated in the defined samples. Several methods are applied for this purpose. First, basic statistics are calculated for each sample and each variable: the minimum, the maximum, the arithmetic mean, the standard deviation and the coefficient of variation. The obtained values confirm that the data in all analysed samples are correct and suitable in this respect (example values for the RG_38V database, for all companies and separately for green and red ones, are presented in Tables 3-5). For each variable, the coefficient of variation exceeds the recommended critical value (0.1 or 0.15), so we can proceed with our analysis. We also analyse the reliability of the variables to determine whether the differentiation in our defined samples is sufficient for further research. For this purpose, we estimate Cronbach's alpha for each database and report the results in Table 6. The objective is to eliminate indicators that do not show sufficient differentiation.
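Both screening statistics are easy to compute. The following is a minimal sketch with Python's `statistics` module, assuming the sample standard deviation and, because financial ratios can be negative, the absolute value of the mean in the denominator of the coefficient of variation; Cronbach's alpha uses the usual formula with companies as rows and indicators as columns.

```python
import statistics
from typing import List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """V = s / |mean|, with s the sample standard deviation."""
    return statistics.stdev(values) / abs(statistics.mean(values))


def cronbach_alpha(rows: List[List[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of row totals),
    where rows are companies and the k columns are indicators."""
    k = len(rows[0])
    item_var = sum(statistics.variance(col) for col in zip(*rows))
    total_var = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var / total_var)
```

An indicator would then pass the differentiation screen if, for example, `coefficient_of_variation(column) > 0.1`; perfectly consistent indicators give an alpha of 1.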
The samples with the lowest Cronbach's alpha are the databases constructed using eight indicators: the sample with a large disproportion between red and green records (RG_8V) and the balanced sample (RG_BD_8V). However, both values still indicate that these samples are suitable for further analysis. The remaining four samples exhibit sufficiently high variable differentiation. The largest differentiation occurs in the sample comprising a balanced number of records of red and green companies for all 62 indicators (RG_BD_62V). A slightly lower value is obtained for the database comprising all 62 indicators (RG_62V) without a balanced number of records for the two groups of companies. The remaining two samples (RG_38V, RG_BD_38V) show slightly lower but nevertheless sufficient differentiation. Our Cronbach's alpha estimates thus confirm that the groups of variables in all samples are suitable for further analysis.

DT Results for All Companies
The first sample includes 38 indicators and comprises 486 companies, of which 402 are red and the remaining 84 are green. To limit the size of our DTs at the outset, the minimum number of observations is set to 100 in the parent node and 50 in the child node. The algorithm identifies four independent variables to construct the DT, namely the short-term liabilities to operating expenses ratio (X51), the days sales of inventory II ratio (X46), the days sales of inventory I ratio (X20) and the debt to assets ratio (X2). The resulting DT comprises nine nodes, including five terminal nodes. Unfortunately, the classification matrix obtained with 25-fold cross-validation indicates that companies cannot be correctly classified into the relevant groups. The effect of cross-validation for various numbers of companies in the parent and child nodes is presented in Table 5. The algorithms identify financial ratios which unambiguously show whether or not a company manufactures RES components. Next, we reduced the number of companies in the parent and child nodes to 50 and 25, and then to 20 and 10. Reducing the number of companies in the nodes improves the classification ability of the algorithms. The effect of cross-validation for different numbers of records in the parent and child nodes is reported in Table 7. The short-term liabilities to operating expenses ratio (X51) is the most important ratio, according to which the first split of the DT is made; the second is the days sales of inventory II ratio (X46). Under cross-validation, the CRT algorithm correctly classifies 99.0% of red and 20.2% of green companies, and the QUEST algorithm 97.5% of red and 19.0% of green companies. According to the QUEST algorithm, the first classifying ratio is the equity to total assets ratio (X10).
The QUEST algorithm underperforms the CHAID algorithm, as reported in Table 8. A further question is whether reducing the number of indicators, while simultaneously increasing the number of companies in the sample, improves the classification capability of the DT algorithms. A reduced number of indicators permits the definition of a sample with more companies: the sample with only eight ratios comprises 2120 companies, of which 479 are green.
The classification capability for the sample constructed in this way is checked using all three DT algorithms. The significance level for splitting nodes is set at 0.05, with a maximum of 100 iterations, and Pearson's chi-square statistic is applied. DT growth is limited to a maximum of three levels. When DTs are applied to the RG_8V sample, cross-validation shows very poor classification accuracy. All three DT algorithms are applied with 20 companies in the parent node and 10 in the child node (Table 9). Next, we construct a sample with all indicators, which is accompanied by a decrease in the number of companies. This sample comprises 62 variables and 305 companies, including 37 green companies (12.13% of all companies in the sample). The number of records in the parent and child nodes is reduced to 20 and 10, respectively. We keep the maximum DT depth at three levels, as no material changes are observed at other depths. Following the reduction in the number of records in the parent and child nodes, the CHAID algorithm produces a correct classification rate of 96.6% for red companies and 29.7% for green companies. Of all DT algorithms, CHAID produces the best results, whereas CRT underperforms, correctly classifying 96.6% of red companies and 27.0% of green companies. The QUEST algorithm shows no classification capability for green companies (see Table 10). These results raise the question of whether reducing the disproportion between the numbers of green and red company records in the databases would materially affect the classification capacity of the algorithms. Consequently, we balance the number of companies in all databases under consideration. The balanced database with the main eight indicators (RG_BD_8V) comprises 963 companies, of which 479 are green enterprises.
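One simple way to balance a database is random undersampling of the majority (red) class. The sketch below assumes this rule and a fixed random seed; it is not necessarily the procedure used in the study, whose balanced samples are only approximately equal (e.g., RG_BD_38V has 84 green companies out of 188).

```python
import random
from typing import Dict, List, Sequence, Tuple


def undersample(records: Sequence, labels: Sequence[str],
                seed: int = 0) -> Tuple[List, List[str]]:
    """Randomly drop records of larger classes until every class has as many
    records as the smallest one (hypothetical balancing rule)."""
    rng = random.Random(seed)
    by_class: Dict[str, List] = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    n_min = min(len(group) for group in by_class.values())
    out_records: List = []
    out_labels: List[str] = []
    for label, group in by_class.items():
        for record in rng.sample(group, n_min):  # sampling without replacement
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels
```

Applied to 402 red and 84 green records, the rule keeps all 84 green records and 84 randomly chosen red ones.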
The balanced database with 38 indicators (RG_BD_38V) comprises 188 companies, of which 84 are green enterprises. The database with the highest number of indicators (RG_BD_62V) comprises only 77 companies, of which 37 are green. The results of this classification are presented in Table 13.
For both the unbalanced and the balanced database with 38 indicators (RG_38V and RG_BD_38V), the ratio used for the first split of the DT is X51. Balancing the records in this database has the expected effect: a material improvement in the DT classification of companies. The share of correctly classified companies now exceeds 50% in both groups. The CHAID algorithm produces the most favourable results, with 20 and 10 records in the parent and child nodes, respectively, and correct classification rates of 84.6% for red companies and 67.9% for green enterprises (Table 11).
Table 11. Cross-validation classification matrix for the RG_BD_38V sample using the CHAID algorithm.
The remaining two algorithms also show material improvements in classification ability. The CRT algorithm, in contrast to CHAID, classifies green companies more accurately, with 71.4% classified correctly, while it correctly classifies 81.7% of red companies, somewhat fewer than the CHAID algorithm. The QUEST algorithm underperforms, classifying 74.0% of red companies and 58.3% of green companies correctly. In all three cases, the variable used for the first split is the short-term liabilities to operating expenses ratio (X51). The results are summarized in Table 12. In the balanced database comprising eight indicators, reducing the minimum number of companies in the parent and child DT nodes from approximately 100 to 50 and then from 50 to 25 improves the classification accuracy for green companies, as evident in Table 13: the percentage of correctly classified green companies increases from 57.6% to 63.3%, while the percentage of correctly classified red companies decreases from 66.1% to 61.6%. However, the resulting DT is now larger, with three levels of depth and as many as 12 nodes, compared with 10 nodes in the unbalanced database.
Table 13. Cross-validation classification matrix for the RG_BD_8V sample, tree-building algorithm: CHAID.
The three-year gross profit to assets ratio (X24) is the first classifying ratio in the DT. For the unbalanced database with eight indicators (RG_8V), the revenue growth ratio (X21) is the classifying ratio at the root. The other DT algorithms also show higher correct classification rates. The best classification accuracy for green companies is attained by the CRT algorithm, which correctly classifies 61.2% of red companies and 70.1% of green companies. The classification matrices for all DT algorithms for the balanced database with eight indicators are presented in Table 14. The balanced database comprising all 62 indicators (RG_BD_62V) has the lowest number of records: 37 green enterprises and 40 red enterprises, 77 records in total. The number of records in the parent node is set to 50 and then 20, and in the child node to 25 and 10, respectively. The CHAID algorithm produces the best results (Table 15), correctly classifying 70.3% of green enterprises and 85.0% of red enterprises. The QUEST algorithm underperforms all other algorithms, failing to correctly classify any green enterprises (Table 16).
Table 15. Cross-validation classification matrix for the RG_BD_62V database, tree-building algorithm: CHAID.

Using a balanced number of records for red and green enterprises in all three databases increases the classification accuracy for green enterprises, while the classification accuracy for red enterprises decreases somewhat (see Table 17). For the eight-variable database, balancing (RG_BD_8V) raises the percentage of correctly classified green enterprises from 4.6% to 63.3%; for the 38-variable database, from 47.6% to 67.9%; and for the 62-variable database, from 29.7% to 70.3%.
An analogous DT classification study was also carried out for groups of indicators. Because the preceding analyses show that only balanced databases give good results for classifying enterprises as green or red (see Table 17), only these databases were used, divided by groups of indicators (see Table 18). Separate databases were built for the indicators in each group. The indicators included in each group and those used to analyse the RG_BD_62V and RG_BD_38V databases are presented in Table 19. The two balanced databases, RG_BD_38V and RG_BD_62V, were used for the calculations, and only the most interesting results are presented. Since our earlier analyses showed that the CHAID method exhibits the best classification performance, this method was used here. For the group P indicators (profitability indicators), cross-validation for the RG_BD_62V database showed correct classification rates of 85.0% for red enterprises and 70.3% for green enterprises (Table 20). For the database with fewer indicators (RG_BD_38V), cross-validation yielded 88.5% correctly classified red enterprises and 51.2% correctly classified green enterprises; for red enterprises, the result is thus slightly worse for the database with 62 indicators. However, for the group E indicators (efficiency indicators), the database with 38 indicators gives better results (70.2% red, 73.8% green) than the database with 62 indicators (95.0% red but only 35.1% green).
Data on green enterprises from the RG_38V database were used to build a linear regression model with the stepwise method. Initially, all 38 variables of the examined database were introduced into the model (see Table 3), and the least squares method was used to fit the regression. The significance level was set at 0.05 for entering a variable and at 0.1 for removing it. The dependent variable is X56 (net profit/equity). This variable selection procedure yielded a model with 10 predictors. Following the removal of a single outlier, the model fit improves significantly. This model has the highest coefficient of determination (R² = 0.990, adjusted R² = 0.988). The estimated equation is as follows: All coefficients are statistically significant (Table 21), and the model as a whole is statistically significant (Table 22). Endogeneity in the form of correlated-omitted-variables bias was tested for the estimated regression model using the impact threshold for a confounding variable (ITCV) [48,49]. The ITCV statistic indicates the minimum impact of a confounding variable that would be needed to render the coefficient statistically insignificant. According to the obtained ITCV value (alpha = 0.05), an omitted variable would have to be correlated at 0.983 with the outcome and at 0.983 with the predictor of interest (conditioning on the observed covariates) to invalidate an inference based on a threshold of 0.23 for statistical significance. Correspondingly, the impact of an omitted variable would have to be 0.966 (0.983 × 0.983) to invalidate the inference.
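The ITCV logic can be illustrated with its simple (unconditional) form: if a confounder correlates sqrt(k) with both the predictor and the outcome, the partial correlation between them falls exactly to the significance threshold r# when k = (r_xy - r#) / (1 - r#). The sketch below assumes this unconditional form and positive correlations; the values reported above condition on the observed covariates. The input r_xy = 0.974 and the t value with 73 degrees of freedom in the example are hypothetical, chosen only to show how an impact of about 0.966 arises from r# = 0.23.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Smallest statistically significant correlation: r# = t / sqrt(t^2 + df)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def itcv(r_xy: float, r_sharp: float) -> float:
    """Impact k = r_x.cv * r_y.cv at which the partial correlation drops to r#,
    assuming r_x.cv = r_y.cv = sqrt(k) and r_xy > r# > 0."""
    return (r_xy - r_sharp) / (1 - r_sharp)


def partial_r(r_xy: float, r_xcv: float, r_ycv: float) -> float:
    """Correlation of x and y after partialling out a confounding variable cv."""
    return (r_xy - r_xcv * r_ycv) / math.sqrt((1 - r_xcv ** 2) * (1 - r_ycv ** 2))


k = itcv(0.974, 0.23)  # hypothetical observed correlation and threshold
component = math.sqrt(k)  # correlation needed with both predictor and outcome
```

Here `k` is about 0.966 and `component` about 0.983, and plugging `component` into `partial_r` for both correlations returns exactly the threshold 0.23, which is the defining property of the ITCV.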

Case Study for Data only for Chinese Companies
The main study covered 2345 enterprises from various countries, predominantly Asian (over 93%), with all data referring to 2017. After the estimations for all companies, only the Chinese companies, which constitute almost 75% of the entire database, were selected for further calculations. These were used to build two databases for enterprises in China, one with 8 indicators and the other with 38 indicators. Initially, the database of financial results of Chinese enterprises comprised 1742 observations, of which 1267 were red enterprises and 475 green enterprises. Because the previous analyses have shown that the DT algorithms work best on databases with equal numbers of green and red enterprises, the database was balanced. Finally, owing to missing data, two balanced databases for Chinese enterprises were built, with different numbers of indicators (8 and 38) and different numbers of observations (Table 23).
Table 23 (excerpt). China_RG_BD_38V (see Table 3): 38 indicators, 115 companies (83 and 32 in the two groups).
Calculations were performed for these two balanced databases. The classification results are similar to those for the database in which enterprises from several countries were mixed (Table 24), because Chinese companies constitute the majority (almost 75%) of the analysed database. The variables used for the first split were X21 and X8, respectively.

Analysis of Critical Indicators
The use of DTs makes it possible to identify important indicators that distinguish companies within the sector. Our classifying ratios are listed in Table 25. The five ratios belong to different groups: liquidity, turnover and profitability ratios, plus one ratio from our initial analysis. We identify the short-term liabilities to operating expenses ratio (X51) as the most important ratio for classification, and the revenue growth ratio (X21) as the second most important. The last one is the size of working capital ratio, which is also used in bankruptcy prediction models for manufacturing companies; its inclusion is motivated by Altman's (1968) model [32]. Table 26 tests the differences between the values of the financial ratios for green and red companies. Except for the revenue growth ratio (for two databases, RG_62V and RG_BD_62V), the differences are statistically significant. This confirms that these ratios are important for classifying companies in the semiconductor and solid state device manufacturing sector. Moreover, green companies that produce components for renewable energy sources (RES) are characterized by lower values of these financial ratios, except for the payables to operating expenses ratio, whose higher values indicate generally higher levels of debt.
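For reference, the pooled-variance Student's t statistic used for such group comparisons can be computed as below. The samples in the example are hypothetical; `scipy.stats.ttest_ind(a, b)` (with its default `equal_var=True`) returns the same statistic together with a p-value.

```python
import math
import statistics
from typing import List, Tuple


def student_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical values of one ratio for green and red companies.
t, df = student_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
```

Here t = -2.0 with 8 degrees of freedom; since |t| is below the two-sided 5% critical value of about 2.306, the difference would not be significant.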

Analysis of Profitability Ratios
The analysis of the selected indicators shows that only one profitability indicator has been identified as critical. Therefore, this subsection analyzes the profitability indicators that are key from the point of view of investors and managers, namely the return on assets (ROA), the return on sales (ROS) and the return on equity (ROE) ratios. The selected profitability ratios were calculated for Chinese companies, given their representativeness: three-quarters of the sample consists of companies operating in China.
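The three profitability ratios can be computed as below. The sketch assumes the common textbook definitions with net profit in the numerator (ROE then matches X56, net profit/equity); the field names and figures are hypothetical, and the study's exact definitions of ROA and ROS may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Financials:
    """Hypothetical fields, all in the same currency unit."""
    net_profit: float
    revenue: float
    total_assets: float
    equity: float


def profitability(f: Financials) -> Dict[str, float]:
    """ROA, ROS and ROE under the assumed net-profit-based definitions."""
    return {
        "ROA": f.net_profit / f.total_assets,  # return on assets
        "ROS": f.net_profit / f.revenue,       # return on sales
        "ROE": f.net_profit / f.equity,        # return on equity (cf. X56)
    }


ratios = profitability(Financials(net_profit=10.0, revenue=200.0,
                                  total_assets=100.0, equity=50.0))
```

For this hypothetical company, ROA = 0.1, ROS = 0.05 and ROE = 0.2; comparing such ratios between green and red groups is what the subsection does.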

Conclusions
Our study investigates whether investing in manufacturers of solar technology components is a lucrative business by analysing the semiconductor and related device manufacturing sector. To do so, we apply an approach that is novel for the sector, namely DTs. We consider 62 financial ratios, and our initial database comprises 2345 companies, mostly operating in China. The companies in the sector are classified into two groups: green companies, whose production is related to renewable energy, and red companies, whose production is not. We define six samples, both balanced and unbalanced, comprising larger and smaller numbers of companies, and apply 25-fold cross-validation, which does not significantly improve the results. Most of the existing literature on RES companies relates to power plants [21,22], electric utilities [24] or energy companies [30]. Our results are similar to those of other studies of RES companies: on the basis of selected financial ratios, we find that green companies underperform red companies [21,22]. Our contribution lies in addressing the lack of studies that assess the relative financial performance of companies that provide the equipment needed for renewable energy production. Moreover, we apply DTs to identify classifying ratios.
The decision tree based analysis identifies the most important indicators for evaluating enterprises, especially those operating in China. This gives managers and investors a broader overview of the financial standing of groups of companies. Our results indicate that investing in companies that manufacture RES components may not be a lucrative business. For managers, this is important information indicating that RES companies are neither as profitable nor as financially sound as companies in the general semiconductor and solid state device manufacturing sector. For investors potentially interested in investing in RES-associated companies, our findings indicate a need for caution. This should be viewed in the context of achieving RES targets in total energy production: given government targets for renewable energy, the demand for components for renewable energy production will increase. However, it remains to be seen whether this increase in demand will be accompanied by higher profits.
The topic chosen for analysis is complex and multifaceted, and our research provides a solid basis for further exploration of these data. With this foundation in place, the research can be continued in further directions. We are aware that many other factors, both fundamental and economic, may affect the profitability of doing business.
