Performance Dissimilarities in European Union Manufacturing: The Effect of Ownership and Technological Intensity

Abstract: Our paper addresses the relevance of a set of continuous and categorical variables that describe industry characteristics to differences in performance between foreign and locally owned companies in industries with dissimilar levels of technological intensity. Using data on manufacturing sector performance from 20 European Union member countries over the 2009–2016 period, we applied the random forests methodology to identify the best predictors of EU manufacturing industries' a priori classification based on two main attributes: ownership (foreign versus local) and technological intensity. We found that EU foreign-owned businesses dominate locally owned ones in terms of size, which gives them an edge in obtaining higher profits, cash flow and investments and in coping with higher personnel costs. Furthermore, ownership is a more important differentiator of performance at the industry level than the industry's technological level. The performance of foreign-owned high-tech manufacturing industry units across the EU is the most heterogeneous compared to the other three categories, indicating particularities linked to technological level, ownership, and even location. Our findings suggest that multinational enterprises in high-tech industries transfer to eastern EU countries activities and processes with lower technological intensity and higher labour intensity, but also that locally owned businesses, even within high-tech industries, have lower technological levels.

The risk estimates for our model, computed according to [48], are 0.272 (SE: 0.026) for the training set and 0.283 (SE: 0.038) for the test set. They show the proportion of observations incorrectly classified by the model. The misclassification rate was the lowest (0.254) for three trees (33 to 35). Of all variables, Pcostturn was used in all three trees, while Empl_en and GI_en were used in two trees.
Tree 34, as shown in the figure, uses the lowest number of variables (5) for classification, with five non-terminal nodes and six terminal nodes.
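As an illustration of how a risk estimate of this kind can be read, the sketch below computes the proportion of misclassified observations together with a standard error. It assumes the usual binomial form, SE = sqrt(R(1 - R)/N); the source does not state the exact formula used. The function name and the labels are hypothetical and are not the paper's data.

```python
import math


def misclassification_risk(actual, predicted):
    """Return (risk, standard_error) for two equally long label lists.

    risk is the share of observations whose predicted class differs from
    the actual class; the standard error uses the binomial approximation
    sqrt(risk * (1 - risk) / n), which is an assumption of this sketch.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(actual)
    errors = sum(a != p for a, p in zip(actual, predicted))
    risk = errors / n
    standard_error = math.sqrt(risk * (1.0 - risk) / n)
    return risk, standard_error


# Hypothetical labels for eight industry units (illustration only).
actual = ["Foreign-HT", "Foreign-LT", "Local-HT", "Local-LT",
          "Foreign-HT", "Foreign-LT", "Local-HT", "Local-LT"]
predicted = ["Foreign-HT", "Foreign-LT", "Local-LT", "Local-LT",
             "Foreign-HT", "Foreign-HT", "Local-HT", "Local-LT"]

risk, se = misclassification_risk(actual, predicted)
print(f"risk = {risk:.3f}, SE = {se:.3f}")
```

Two of the eight hypothetical units are misclassified here, so the risk is one quarter; with so few observations the standard error is correspondingly large.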


Introduction
The March 2000 Lisbon European Council established the objective of making the European Union "the most competitive and dynamic knowledge-based economy in the world, capable of sustainable economic growth with more and better jobs and greater social cohesion" [1]. This ambitious target cannot be reached without the consolidation of an efficient and competitive manufacturing sector, since an economy based only on service industries cannot survive for a long period of time. The ongoing COVID-19 crisis has significantly shaken the manufacturing sector around the world, and it will take years for turnover to return to its precrisis levels. Moreover, the accelerated trend towards the use of Industry 4.0 technologies that include automation, advanced analytics use, and connectivity will transform the manufacturing sector in terms of efficiency and effectiveness, product and service customization to clients' needs, and will lead to the creation of new business models [2].
The manufacturing sector includes diverse activities that occur in a wide range of enterprises, from traditional smaller businesses to large-scale and technology-intensive corporations. The manufacturing sector's contribution to the value added in the EU declined sharply in recent decades, from 19.8% in 1991 to 13.9% in 2009, but reached 14.4% in 2019 [3]. In 2018, more than 32 million people were employed in the manufacturing sector in the EU in 2.1 million enterprises (21.7% of the people in employment and 8.7% of the number of enterprises in the non-financial business economy). In addition, manufacturing was responsible for 64% of private sector research and development (R&D) and 49% of innovation expenditure in Europe in 2018 (European Commission, 2020). Looking back, the competitiveness of the EU economy proved to be remarkably dependent on the ability of the manufacturing sector to deliver high-quality innovative products employing the latest advances in information and communications technology (ICT), and this trend will certainly accelerate in the years after the COVID-19 pandemic.
The last ten years have seen a notable increase in the foreign ownership of EU companies [4], fueled by the globalization process and the expansion of multinational enterprises (MNEs). Researchers have investigated the effects of inward investments, technological opportunities, and spillover effects brought by foreign companies, building on the ownership, location, and internalization (OLI) paradigm of international production articulated by John Dunning [5,6]. The paradigm offers a framework for the decision of multinational enterprises to expand abroad through foreign direct investments, indicating a logical approach from assessing ownership advantages (O) to location-related advantages (L) and internalization advantages (I). From this perspective, a still unsolved problem concerns whether and to what extent foreign-owned firms have superior performance (productivity, profitability, etc.) compared to locally owned firms. In many cases, foreign owners select the best local firms to invest in or local firms in high-productivity industries. Thus, some foreign-owned businesses may benefit from a productivity advantage that is disconnected from ownership.
The general view is that ownership provides an edge for companies, as foreign companies perform significantly better than local ones, although several contributions in the literature have questioned this opinion. Our paper builds on this strand of the literature and provides evidence on the differences in performance between foreign-owned and locally owned companies in the EU-28 manufacturing industry between 2009 and 2016, after the global financial crisis of 2008-2009, with a focus on how the technological level of industries within the manufacturing sector might mediate these differences. Our main research hypothesis is that significant differences exist between the performance of companies with different types of ownership and levels of technological intensity in the manufacturing sector in the EU-28. We address the relevance of a set of continuous and categorical variables that describe industry characteristics for the observed differences in performance and contribute to the literature by identifying the variables that best predict the a priori classification of manufacturing industries in a class defined by two attributes: ownership and technological intensity. To our knowledge, this is the first investigation of the manufacturing sector in the EU in which both performance differentiators are considered. Gaining insight into manufacturing industries' attributes from both perspectives might represent a starting point for the manufacturing sector's transformation with the support of industrial policies in the post-COVID-19 world.
In this framework, it is worth considering the heightened interest in the increasing role of technological advancements in economic growth and development brought about by the COVID-19 pandemic, when the use of technology became a critical and essential component of our lives. For the EU, recent reports forecast an impressive growth of the digital economy, which is estimated to add 1.1% to the EU's annual economic growth and to increase the region's GDP by more than 14% by 2030 [7], building on the EU's desired position as a global leader through the implementation of the Digital Single Market strategy [8]. Starting with the "A Digital Agenda for Europe" series of documents issued in 2010 [9], the EU has constantly reinforced its willingness to use digitalization and technologies such as the Internet of Things, big data and artificial intelligence as major drivers of innovation-led economic growth. The EU's commitment to the digital economy will also be backed by a planned budget of 7.6 billion Euros within the Digital Europe Programme, which aims at accelerating the economic recovery after the pandemic and supporting the digital upscaling of the EU's economies and societies, with a particular focus on small and medium-sized companies [10]. Therefore, it is expected that the development of industries with a strong high-tech orientation will represent a fundamental component of economic advancement in the EU. Moreover, the EU's economies, and particularly the eastern ones, may use the technological spillover effects induced by foreign direct investments to boost the technological level of local enterprises, which may further foster economic growth [11][12][13].
From this perspective, our paper's genuine contribution is to shed light on the current situation in the manufacturing sector of the EU when the technological level and ownership are considered, which may lay the foundations for a better formulation of industrial policies and of measures towards attracting FDI that are able to support technological advancements.
The rest of this paper is organized as follows. Section 1 offers insight into the research directions and results in the literature. Section 2 presents the data and the research methodology used. The main findings are shown and discussed in Section 3. The last section concludes, discusses the limits of our analysis, and outlines a few directions for future research.

Research Background
Over time, various theoretical frameworks have attempted to assess the main reasons behind superior business performance, and numerous studies have empirically examined the factors behind companies' results. Researchers are constantly revisiting this field, driven by the evolving transformation of economies and business environment.
The starting point in understanding firm performance was the work of Bain and Mason written between 1940 and 1950 [14][15][16] who proposed the industry environment as a performance driver in the structure-conduct-performance framework, as part of the paradigm of industrial organization. Later, Refs. [16,17] supported this view, with the amendment that [16] suggested the firm's capability to use and sustain its competitive advantages as an addition to the performance framework.
Company ownership as a criterion for performance is a factor that is often discussed in the literature. Empirical research attempted to explain whether foreign ownership resulting from foreign direct investments, for example, provides companies with better performance compared to local or domestic ownership. Interestingly, the literature focused on two main directions in responding to this question: the specific advantages of MNEs which accompany foreign ownership, and the liability of foreignness. The well-known OLI paradigm proposed by Dunning [5,6] introduced the "ownership advantage", built on tangible and/or intangible assets of the firm that are transferred abroad, as one of the factors behind the higher competitiveness of MNEs. From here, the literature developed, and various authors advanced improvements by adding industry, size, country of origin, host country level of development, multinationality per se or multinationality level as superior performance drivers associated with foreign ownership [17][18][19][20][21]. On the other hand, the "liability of foreignness", understood as "the cost of doing business abroad that results in a competitive disadvantage for an MNE subunit" [22], in the form of spatial distance, unfamiliarity with the host country, or obstacles and restrictions in conducting business [23], counterbalances foreign ownership benefits. The consequence is a lower profitability of foreign-owned companies against domestic-owned ones, although the sector or industry of operation of the firm, the country of origin, or the specific mode of entry may alter this result [24,25]. In this context, several contributions point out that a differentiation needs to be made between purely foreign companies and affiliates of MNEs when comparing their performance with local companies [26,27].
In a recent study, two main reasons behind the superior performance of foreign-owned firms over locally owned ones were highlighted [28]. The first reason is that only highly productive companies are the origin of foreign investments, and the second is that foreign investors select only well-performing companies to invest in, which links the superior performance to a selection bias. In this context, Ref. [29] discovered that MNEs were more profitable than local corporations in Greece, but foreign ownership did not have a significant impact on business performance in Portugal. At the same time, Ref. [30] found that profitability drivers differ between foreign and local Greek companies. While the profitability of foreign companies mainly depends on their market share, knowledge and experience acquired in the local market, training intensity, and product differentiation using more technologically intensive inputs, the profitability of local companies depends only on market share and product differentiation through local advertising and R&D. Our research complements this line of research by providing insight into the differences between foreign versus locally owned companies within the EU, thus portraying dissimilarities not only at the country level, but also at a wider level, within a region where economic integration for decades now is expected to have significantly changed the performance patterns across countries and industries [31][32][33].
The empirical literature on manufacturing firms' performance has grown over the last two decades. Previously, empirical research addressed firm-level, industry-level, and macroeconomic determinants of firm performance [34]. The evidence based on firm-level and industry-level determinants outlined a positive and significant link between R&D intensity and the productivity of manufacturing firms. Other authors examined the link between innovation and productivity in manufacturing firms from Finland, Norway, and Sweden, and found that R&D and innovation performance were the main determinants of the differences in productivity growth between firms [35]. Later, the positive effects of R&D expenditures, output innovation, investments in physical capital, market share, and exports on labour productivity in Spanish manufacturing companies were shown [36]. It has often been demonstrated that R&D activities play an essential role in firms' product or process innovations [37], which further allows us to link a higher degree of technological intensity of an industry to superior performance.
Other factors were also proposed as drivers of performance for manufacturing firms. For instance, Ref. [38] found that the previous year's growth, minimum efficient scale, total factor productivity, exports, capital-labour ratio, technology usage, sunk costs, and age had an impact on firms' growth in both SMEs and large manufacturing firms in Iran, but company size and the economic particularities of the country also played a role in explaining performance. In the same vein, Ref. [34] used firm, industry, and macroeconomic determinants as performance drivers and showed that age, labour costs, industry concentration, GDP growth, and inflation significantly influenced the profitability of Croatian companies. Ref. [39] reported that e-business use contributes positively to the performance of manufacturing SMEs in Spain through organizational innovation.
Regarding the differences between firms from high- and low-tech industries, most researchers investigated them considering the number and type of innovations implemented or how firms handle the process of commercialization. In this framework, Ref. [40] found that low-tech product innovators differed from their high-tech counterparts regarding structure, market orientation, and need for external financing. Furthermore, Ref. [41] suggested that high-tech firms' higher investment propensity in product R&D and low-tech firms' higher investments in process R&D may not be an appropriate approach to innovation for SMEs.
The empirical literature regarding the driving factors of performance of firms in high- versus low-tech industries is, to our knowledge, rather scarce. Ref. [42] found that foreign ownership cannot be considered a driving factor for the growth of Canadian high-tech and low-tech firms. Later, Ref. [43] found remarkable differences between innovative and non-innovative Italian firms in terms of profitability and growth rates, particularly when size is considered. In this framework, our paper is one of the pioneering endeavours that aims at a better understanding of the drivers of performance dissimilarities in industries with different technological levels, which may be further used by businesses and governments alike to address the existing profitability and productivity gaps as an avenue towards fostering economic growth and development.

Data
Our investigation covers the 2009-2016 period and includes data on manufacturing sector performance from 20 EU member countries (Austria, Bulgaria, Czech Republic, Finland, France, Germany, Greece, Hungary, Italy, Latvia, Lithuania, the Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, and the United Kingdom). Annual frequency data for 11 manufacturing industries were collected from the Foreign Affiliates Statistics Database (FATS) of Eurostat. The number of countries and industries included in the analysis is strictly based on data availability.
The core of our analysis is the "industry unit" (IU), which represents the "average" enterprise in each manufacturing industry. An IU has three attributes: ownership, industry of operation, and country of origin. In terms of ownership, IUs were classified as foreign (F) or locally owned, and, from the country perspective, we included them in the eastern (E) or western-located groups, depending on their geographical location within the EU. The 11 industries were further classified depending on their technological intensity, into high-technology (high-tech) or low-technology (low-tech) industries (see Table 1), using the EU High-tech classification of manufacturing industries based on NACE Rev.2 2-digit codes. To the best of our knowledge, no previous research has used the "average enterprise" as defined by the three attributes employed in our endeavour, although other authors have dealt with this type of artificial construct [44][45][46]. However, we found this approach more useful than building other constructs such as the "average employee", given our interest in observing firm performance depending on ownership and technological levels in specific industries. Overall, 440 IUs were included in our investigation, equally divided into foreign and locally owned (40 IUs per industry). Of them, 280 IUs were in low-tech industries (7 industries), and the remaining 160 IUs were in high-tech industries (4 industries). In regional terms, 242 IUs were western-located, and 198 were eastern-located. In our sample, locally owned IUs held higher shares than foreign-owned IUs in turnover and persons employed, but in some industries, such as C20 or C29, both in the high-tech category, the share of foreign-owned IUs was higher, in the 40-50% range. In addition, foreign-owned businesses had higher shares in turnover and persons employed in high-tech versus low-tech industries, in the range of 30-50% for turnover and 25-50% for employment.
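The sample counts above follow from crossing countries, industries, and ownership types. The short sketch below reproduces that arithmetic. The split into 11 western and 9 eastern countries is an assumption inferred from the reported 242 and 198 IUs (22 IUs per country); the identifiers are placeholders rather than actual country or NACE codes.

```python
from collections import Counter
from itertools import product

# Assumption: 11 western and 9 eastern countries, inferred from 242 and 198 IUs.
N_WEST, N_EAST = 11, 9
N_LOW_TECH, N_HIGH_TECH = 7, 4

countries = [("west", i) for i in range(N_WEST)] + [("east", i) for i in range(N_EAST)]
industries = ([("low-tech", i) for i in range(N_LOW_TECH)]
              + [("high-tech", i) for i in range(N_HIGH_TECH)])
ownerships = ("foreign", "local")

# One industry unit (IU) per country x industry x ownership combination.
ius = list(product(countries, industries, ownerships))

by_tech = Counter(industry[0] for _, industry, _ in ius)
by_region = Counter(country[0] for country, _, _ in ius)
by_owner = Counter(owner for _, _, owner in ius)
per_industry = Counter(industry for _, industry, _ in ius)

print(len(ius), dict(by_tech), dict(by_region), dict(by_owner))
```

Each industry contributes 20 countries times 2 ownership types, which gives the 40 IUs per industry mentioned in the text.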
Figure 1 shows the relative importance of the 11 industries in EU manufacturing in 2016. Their shares in turnover vary between 1.1% (C13 and C18) and 14.6% (C29), and in the number of employees between 2.2% (C18) and 14% (C10). In absolute terms, industry C29 generated the highest turnover in 2016 (1,082,642.5 million Euros) and industry C13 the lowest (78,000 million Euros). In the same year, 638,794 persons were employed in industry C18 (the lowest number in our 11 industries), and 4,019,413 persons worked in industry C10. The countries included in our sample cover to a significant extent the industries' turnover and number of employees, which ensures its representativeness; our IUs hold a share in turnover between 89.7% (C10) and 98.6% (C31), and a share of the number of employees between 77.4% (C18) and 98.3% (C29).

Sample Characteristics
The IUs included in our research depict a diverse landscape of performance across the EU along the ownership and technological intensity dimensions. Table 1 shows the descriptive statistics of the 17 continuous variables based on their 2009-2016 mean values for each IU. Between 2009 and 2016, the mean manufacturing IU generated a turnover of 20.09 million Euros, value added of 4.50 million Euros, and gross operating profit of 1.77 million Euros per enterprise. IUs also employed a mean of 77.70 persons per enterprise and paid mean personnel costs of 2.73 million Euros. Their mean gross investments reached 0.74 million Euros at the enterprise level. Per employee, turnover reached, on average, 0.19 million Euros, gross operating profit only 0.02 million Euros, and personnel costs 0.03 million Euros (higher than operating profit).
Manufacturing IUs had a labour productivity of 47.41 thousand Euros per person employed when measured by ALP and of 166.98% when measured by SWALP. Mean profitability (GOR) was 9.57%, and IUs' mean shares in turnover were 26.75% for value added, 17.15% for personnel costs, and 4.59% for gross investments. The medians were, with a few exceptions (Perscost_emp, GI_emp, VAturn, and Pcostturn), lower than the corresponding means, which indicates right-skewed distributions (positive skewness). This applies to each of the four categories of IUs. Kurtosis values showed leptokurtic distributions and the presence of "fat tails" for all variables at the enterprise level, for most of the variables at the employee level, as well as for ALP, SWALP and GIturn. This points towards the existence of some larger IUs that also have significantly higher labour productivity and propensity towards investment.
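To make the reading of these shape statistics concrete, the sketch below computes the mean, median, moment-based skewness, and excess kurtosis of a small sample. The turnover values are hypothetical and chosen only to mimic a right-skewed, fat-tailed distribution; the source does not state which estimator of skewness or kurtosis was used, so the population-moment formulas here are an assumption.

```python
import statistics


def shape_statistics(values):
    """Return mean, median, skewness and excess kurtosis (population moments)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical turnover per enterprise (million Euros), illustration only:
# most units are small and one is very large.
turnover = [2, 3, 3, 4, 4, 5, 6, 8, 15, 60]

stats = shape_statistics(turnover)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A mean above the median together with positive skewness signals a right-skewed distribution, and a positive excess kurtosis corresponds to the leptokurtic, fat-tailed pattern described above.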
The high variation in performance across IUs is also signalled by the minimum and maximum values of variables, as well as by standard deviation. These differences between IUs are better understood when we consider the ownership and the technological intensity of the origin industry. Foreign-owned companies had higher mean turnover, value added, personnel costs, gross investments, employees, and gross operating profits at both the enterprise and employee level compared to locally owned companies. The differences in means are particularly high at the enterprise level, where values for foreign-owned companies are between 7.87 (Perscost_en) and 10.17 (GOS_en) times larger, but they are also present at the employee level, where values for foreign-owned companies are at most twice as high. This suggests a higher homogeneity in terms of employment between foreign and locally owned IUs.
Foreign-owned companies' mean SWALP was only 1.11 times higher than locally owned companies', while for the other relative performance variables, locally owned companies recorded slightly better values than foreign-owned companies. Moreover, variables' distributions for locally owned IUs show higher kurtosis than for foreign-owned IUs, indicating the likely presence of larger and better-performing IUs for the former. In relative terms, however, locally owned companies seem to be more profitable, generate a higher value-added share of turnover, invest more (as share of turnover), but allocate a higher proportion of costs and turnover to personnel costs. This implies that locally owned IUs have a higher labour use intensity and/or employ more personnel than needed.
When we use technological intensity as a performance discriminator, the results mirror those for ownership; this time, IUs in high-tech industries record better performance than IUs in low-tech industries at the enterprise and employee level. However, the difference at the enterprise level is significantly smaller for high-tech versus low-tech IUs than for foreign versus local ownership: mean values of performance variables at the enterprise level were only 2 to 3 times higher for high-tech versus low-tech IUs, compared to 7 to 10 times higher for foreign versus local IUs. For low-tech industries, C10 and C22 included the largest IUs at the enterprise and employee level, respectively, considering the means and medians of variables (29.45 million Euros in C10 and 13.96 million Euros in C22 as mean turnover per enterprise; 7.72 million Euros in C10 and 7.09 million Euros in C22 as median turnover per enterprise). At the other end, the smallest IUs in turnover per enterprise were in industry C18 (mean turnover per enterprise of 5.66 million Euros and median of 1.99 million Euros). In relative performance terms, C10 and C22 recorded the highest means and medians for labour productivity, while C18, C22, and C16 showed better profitability and the highest investment intensity, but also the highest personnel costs. In the case of high-tech industries, C29 and C20 displayed the best means and medians for variables at the enterprise and employee level, respectively. Moreover, C29 had the highest labour productivity, profitability, and share of gross investments in turnover, but the lowest value-added and personnel costs shares of turnover. Positive skewness characterizes the distribution of performance variables for both high-tech and low-tech IUs, indicating that more IUs had values below the group mean than above it.
Kurtosis values suggest, for most variables, leptokurtic distributions for both high-tech and low-tech IUs; the exceptions are mainly for relative performance variables for low-tech IUs.

The Random Forest Methodology
Our research addresses the relevance of a reduced set of continuous and categorical variables that describe business and industry characteristics for differences in performance. We were interested in identifying the best industry-related variables that can predict the a priori classification of an IU in a class defined by two attributes: ownership and technological intensity. We defined four categories based on these attributes as follows: foreign-owned high-tech IUs (Foreign-HT), foreign-owned low-tech IUs (Foreign-LT), locally owned high-tech IUs (Local-HT), and locally owned low-tech IUs (Local-LT).
The variables included in the model are presented in Figure 2. Each IU is described by 17 continuous performance variables (predictors) and one categorical variable (Eastern versus Western EU). There were six aggregate variables at industry level (turnover, value-added, gross operating surplus, personnel costs, gross investments, and number of employees) that we divided by the number of enterprises to obtain variables at the enterprise level (6) and by the number of employees to obtain variables at the employee level (5), in each industry. Additionally, we used six relative performance variables already calculated by Eurostat (ALP, SWALP, and GOR) or calculated by us (VAturn, Pcostturn, and GIturn).
Considering the nature of our analysis, we are faced with a classification problem, and one of the best methods to use is a random forest model. Random forests belong to the classification and regression tree category of models, which are widely used alternatives to the more traditional linear and logistic regression or discriminant analysis [47]. Ref. [48] is the reference for classification and regression trees (CART). CART algorithms sequentially create binary decision trees using the power of predictors (variables) to partition (or split) the data, with the goal of reducing the conditional variation in the dependent variable.
Afterwards, cross-validation is used to select the best tree out of the "grown" trees in the process. CART models have the advantage of simplicity, as explaining observations' classification or prediction is made considering the split variables and observations included in categories [49]. Another important advantage of CART-based models, including random forests, is the departure from the implicit linear relationship assumption between the predicted variable and its predictors. Hence, they are better able to identify links between variables that might have otherwise been challenging to spot [50].
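The binary partitioning step that CART relies on can be sketched as follows. The example searches one predictor for the threshold that minimizes the weighted impurity of the two resulting groups. Gini impurity is used here as one common impurity measure for classification trees; the source does not state which criterion was applied, so this choice, the function names, and the data are assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit node, threshold is None.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (low + high) / 2, score
    return best_threshold, best_score


# Hypothetical employees per enterprise and ownership labels (illustration only).
employees = [12, 18, 25, 30, 140, 160, 210, 300]
ownership = ["Local"] * 4 + ["Foreign"] * 4

threshold, impurity = best_split(employees, ownership)
print(threshold, impurity)
```

A full tree repeats this search over all predictors at every node and then continues recursively within each child node.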
Random forests have been employed in several studies to predict the probability that a customer will honour his debt [51], to model travel choice behaviour [52], to predict default patterns [53], to forecast insolvency of insurance companies [54], to predict a household's energy consumption [55], or to predict short-term congested traffic flow [56].
Random forests consist of many decision trees (or models) that operate as an ensemble or committee. Their advantage rests on the fact that, as long as the trees are largely uncorrelated, the ensemble will outperform its individual constituent models. Each tree starts at a root node and splits the data into child nodes, producing a structure that looks like an upside-down tree. Two techniques increase the variation among the trees: bagging, or bootstrap aggregation (each individual tree randomly samples from the dataset with replacement and is thus trained on a different set of data), and feature randomness (each individual tree picks from a random subset of features and thus uses different variables to decide). Together, they lower the correlation across trees and increase diversification.
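The two sources of randomness described above can be illustrated with a minimal sketch that uses only the Python standard library. The helper names, the seed, and the feature names `Turn_en` and `VA_en` are hypothetical and chosen for illustration; the remaining feature names are borrowed from the variables mentioned in this paper. For brevity, the sketch draws one feature subset per tree, whereas many random forest implementations draw a new subset at every split.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (bagging)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct candidate features (feature randomness)."""
    return rng.sample(feature_names, k)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Turn_en and VA_en are hypothetical names used only in this illustration
features = ["Turn_en", "VA_en", "GI_en", "Empl_en",
            "Pcostturn", "ALP", "GOR", "SWALP"]
n_rows = 10

for tree_id in range(3):
    rows = bootstrap_sample(n_rows, rng)
    cols = random_feature_subset(features, 3, rng)
    # Rows never drawn for this tree are its "out-of-bag" observations
    out_of_bag = sorted(set(range(n_rows)) - set(rows))
    print(tree_id, rows, cols, out_of_bag)
```

Because each tree sees a different resample of the rows and a different set of candidate variables, the trees make partly independent errors, which is what allows the ensemble to improve on any single tree.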
As shown in Figure 3, each tree in the random forest predicts a class for the same test observation, and each tree looks at a random subset of features. Suppose that, out of 1000 trees generated initially, 100 trees predict one target while the remaining trees spread their predictions across other targets. Each tree casts one vote for the target it predicts. If the first target collects the highest number of votes, the random forest returns it as the predicted target (a process known as majority voting). The same applies to the other targets: whichever target collects the most votes becomes the prediction of the forest.
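Majority voting can be shown with a short sketch. The vote counts below are hypothetical and only the four class labels come from this paper; the function name is an assumption made for illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class with the most votes and its vote count."""
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes


# Hypothetical votes cast by ten trees for a single industry unit
votes = (["Foreign-LT"] * 4 + ["Foreign-HT"] * 3
         + ["Local-LT"] * 2 + ["Local-HT"])
label, n_votes = majority_vote(votes)
print(label, n_votes)  # Foreign-LT 4
```

Note that the winning class needs only the largest number of votes, not an absolute majority: here 4 votes out of 10 are enough.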
The prerequisites for having a model that makes accurate class predictions are features (variables in our case) that "have at least some predictive power" so that the prediction offers a better solution than just random guessing but also ensuring that individual trees have low correlations with one another [57]. An important feature of the random forest methodology is the ability to rank predictors according to their internal measure of variable importance, which makes it an easy and friendly tool to use. In addition, predictors that do not add value can be easily spotted and eliminated, which prevents overfitting.
In traditional regression models, multicollinearity (or near-linear dependence) between independent variables or regressors is a well-known critical issue for model estimation and reliability, which needs to be handled before the model is implemented [58]. Several techniques have been proposed to manage this problem; among the best known are lasso regression [59,60], ridge regression [61], and the elastic net methodology proposed by [62]. In random forest modelling, multicollinearity between predictors is handled through bootstrap and feature sampling, which pick different combinations of variables for each tree that is built. Thus, the classification result is not influenced by the multicollinearity between predictors. However, the determination of variable importance for the classification of cases is affected by multicollinearity. We chose to run random forests on the whole set of variables, regardless of their correlation, and to consider potential correlation issues when interpreting the results.
To determine the number of variables (predictors) included in the sampling, we used the formula provided by [48] and set the number of predictors at 5. The other model settings were as follows: 70% of observations were used to generate the prediction and 30% of observations for testing; we used a sufficient number of trees (360) to generate the prediction and two different seeds for the random number generator to verify the results, selecting the prediction with the lowest risk estimate (or misclassification error) for the test sample; the minimum number of children in a node was 5, and the maximum number of nodes was 100; the maximum number of levels was 10 and the maximum tree size was 100. Training was set to stop automatically when the training error failed to improve by at least 3% over the given number of cycles. The algorithm stopped after generating 270 trees. Misclassification costs were 1 for all categories of the categorical dependent variable. We standardized the variables (scale adjustment) before implementing the model.
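As a rough illustration of these settings, the sketch below computes two widely used rules of thumb for the number of predictors sampled, and a seeded 70/30 split. Which formula from [48] was actually applied is not stated here, so both rules are shown as assumptions; the function name, the dictionary keys, and the seeds are likewise hypothetical.

```python
import math
import random

n_predictors = 18  # 17 continuous predictors plus region

# Two common rules of thumb (assumptions, not necessarily the formula in [48])
mtry_sqrt = round(math.sqrt(n_predictors))    # square-root rule
mtry_log2 = int(math.log2(n_predictors) + 1)  # log2(p) + 1 rule
print(mtry_sqrt, mtry_log2)  # 4 5


def split_indices(n_rows, train_share, seed):
    """Shuffle row indices with a fixed seed and cut them into train/test."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(n_rows * train_share)
    return idx[:cut], idx[cut:]


# Settings reported in the text, stored under hypothetical key names
settings = {"n_trees": 360, "n_sampled_predictors": 5,
            "min_child_size": 5, "max_levels": 10, "max_nodes": 100}

# Two different seeds, mirroring the robustness check described above
for seed in (1, 2):
    train, test = split_indices(100, 0.70, seed)
    print(seed, len(train), len(test))
```

With 18 predictors, the log2-based rule gives 5, which coincides with the value reported above; the square-root rule would give 4.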

Results
When ownership and industry's technological intensity were merged, the results indicate that ownership is a more important differentiator of performance than technological level. Figure 4 shows the boxplots for all variables that help to spot differences between the four categories of IUs. On average, Foreign-HT IUs perform better than Local-HT IUs (in terms of mean, median, minimum, and maximum values). The same result is confirmed for Foreign versus Local-LT IUs, with a significantly higher difference in statistical indicators' values for high-tech versus low-tech industries. Overall, Foreign-HT IUs record the best mean and median performance at enterprise and employee levels, and for SWALP, while Local-LT IUs perform better in relative terms (GOR, GIturn, VAturn, and Pcostturn). This result may be partially explained by the smaller size of Local-LT IUs, which distorts the absolute performance. This is in line with the unreliability of using financial indicators that measure relative performance, such as Internal rate of return (IRR) or Profitability index (PI) for ranking projects of different scales (Brealey et al., 2020). On the other hand, standard deviations show that the performance of Foreign-HT IUs is more diverse compared to the other categories of IUs.
The random forest (RF) model implemented considered the a priori classification of IUs into the four classes depending on their ownership and industry's technological intensity as the categorical dependent variable, the 17 business performance variables as continuous predictors, and region (east-west) as the categorical predictor. Figure 5 presents the model summary for the training and test data samples.
Any prediction has an inherent amount of uncertainty. The risk estimates for our model, computed according to [48], are 0.272 (SE: 0.026) for the training set and 0.283 (SE: 0.038) for the test set. They show the proportion of incorrectly classified observations by the model. The misclassification rate was the lowest (0.254) for three trees (33 to 35). Of all variables, Pcostturn was used in all three trees, while Empl_en and GI_en were used in two trees. Tree 34, as shown in Figure 6, uses the lowest number of variables (5) for classification, with five non-terminal nodes and six terminal nodes.
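The risk estimate is simply the share of misclassified observations. A minimal sketch follows, assuming the standard error is obtained from the usual binomial approximation sqrt(p(1 - p)/n); whether the software used here applies exactly this formula is an assumption. The function name and the toy labels are hypothetical.

```python
import math


def risk_estimate(observed, predicted):
    """Return the misclassification rate and its binomial standard error."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = sum(o != p for o, p in zip(observed, predicted))
    risk = errors / n
    se = math.sqrt(risk * (1 - risk) / n)
    return risk, se


# Hypothetical toy labels: 2 of 8 observations are misclassified
observed = ["F-HT", "F-HT", "F-LT", "F-LT", "L-HT", "L-HT", "L-LT", "L-LT"]
predicted = ["F-LT", "F-HT", "F-LT", "F-LT", "L-HT", "L-LT", "L-LT", "L-LT"]
risk, se = risk_estimate(observed, predicted)
print(round(risk, 3), round(se, 3))
```

A test-set risk close to the training-set risk, as reported above, suggests that the model is not badly overfitted.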
The predictor importance graph shows the importance ranking on a scale from 0 to 1 for each predictor variable included in the model. Predictor importance is calculated by totalling the decline in node impurity and dividing it by the largest sum identified over all predictors. In our model, based on [63], the sum is calculated for all predictors and over all nodes, and not only for split variables. This avoids the risk that a variable never used by the model as a split variable is overlooked even though it is important, as can happen with the method of [48].
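The scaling step described above (sum the impurity decreases per predictor, then divide by the largest sum) can be sketched as follows. The raw impurity decreases are hypothetical numbers chosen for illustration, and the function name is an assumption; only the variable names come from this paper.

```python
def scale_importance(raw_decreases):
    """Sum impurity decreases per predictor and divide by the largest sum."""
    totals = {name: sum(values) for name, values in raw_decreases.items()}
    largest = max(totals.values())
    if largest <= 0:
        raise ValueError("at least one predictor must reduce impurity")
    return {name: total / largest for name, total in totals.items()}


# Hypothetical impurity decreases collected over the nodes of a model
raw = {
    "Empl_en": [0.30, 0.20],
    "Pcostturn": [0.10, 0.15],
    "Region": [0.01],
}
scaled = scale_importance(raw)
for name, value in sorted(scaled.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

By construction, the top-ranked predictor always receives an importance of 1, so the reported values are relative to the best predictor rather than absolute measures.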
The most important variable that predicts IU classification into the four categories based on ownership and industry technological intensity (see Figure 7) is the number of persons employed per enterprise, followed by the five remaining variables at enterprise level; of these, turnover per enterprise has the highest importance (0.888) and personnel costs per enterprise the lowest (0.800). They were followed by personnel costs per employee (0.793), ALP (0.714), and the remaining variables at the employee level (the lowest importance, 0.654, is for GI per employee). Remarkably, the relative performance predictors included in the analysis performed worse than predictors at the enterprise and employee level for IU classification (except for ALP), but their importance is still above 0.50. By contrast, region seems to play an insignificant predictive role when distinguishing between manufacturing IUs on the ownership-technology axis.
The classification results are presented in Figure 8 and Table 2, which show the percentage of observed versus predicted IUs in each of the four categories for the test set and for the whole sample. In the test sample, the model classifies Foreign-LT and Local-LT IUs best, with 84.85% and 83.33% correctly classified, respectively. High-tech IUs are less well classified, with percentages of correct classification of only 65.00% for Local-HT and 40.00% for Foreign-HT IUs. Of all categories, Foreign-HT IUs had the highest misclassification percentage, as 60.00% of them were classified by the model as Foreign-LT IUs. At the other end, the lowest non-zero misclassification rate is encountered for Foreign-LT IUs, of which 3.03% were predicted as Local-LT IUs.
Besides misclassification percentages, which reflect the power of the variables included in the model to predict the observed category of each IU, identifying the incorrectly classified IUs might offer further insight into the idiosyncrasies of manufacturing industries' performance in the EU. We further present and discuss the results based on the test sample. The 15 Foreign-HT IUs that were classified by the model as Foreign-LT IUs (40% of the total) are mostly eastern-located (9 from eastern EU countries versus 6 from western EU countries) and operate in three industries: C27, C28 and C29. One IU from the Local-HT category is seen by the model as sharing similar characteristics with Foreign-HT IUs (from C20 and Germany, thus a western-located IU). Six IUs from the same category were classified as low-tech, either foreign (2 IUs in C27 and C29, one from the western EU and one from the eastern EU) or local (in C27, C28 and C29, two from the western EU and one from the eastern EU). Moving to the Foreign-LT category, 5 IUs were misclassified by the model (15.15%), of which 3 were misclassified as Foreign-HT (in C10 and C18, all from western EU countries), one as Local-LT (in C31, from an eastern EU country), and one as Local-HT (in industry C16, from a western EU country). Lastly, 10 IUs included in the Local-LT category were seen by the model as being more like Foreign-LT IUs (2 IUs from C10 and C25, one from the western EU and one from the eastern EU) or Local-HT IUs (8 IUs from C10, C22 and C25, two from the eastern EU and 6 from the western EU). Overall, the model correctly assigned all IUs in industries C13 (low-tech) and C20 (high-tech), and the highest misclassification rates were found for C27 (high-tech; 75% of the IUs), C22 (low-tech; 45.45%), C28 (high-tech; 41.6%), and C29 (high-tech; 40%).
When observing the misclassified IUs by their ownership, more foreign-owned IUs were misclassified compared to locally owned ones (20 versus 17), but the highest proportion of misclassified foreign-owned IUs were from high-tech industries (15 out of 20), while the highest proportion of misclassified locally owned IUs are from low-tech industries (10 versus 7). Moreover, when the region was considered, 23 misclassified IUs were western-located and 14 were eastern-located. Most eastern-located IUs (9 out of 14) were misclassified by the model as Foreign-LT, against their a priori classification of Foreign-HT. For western-located IUs, 7 out of 23 are incorrectly classified as Local-HT, despite their a priori classification as Local-LT, and 6 out of 23 were misclassified as Foreign-LT, although they were Foreign-HT IUs in the a priori classification.
The gains chart for each category is presented in Figure 9. The chart shows the percentage of observations that are correctly classified (red curves), linked to the top percentage of cases in each category (represented on the horizontal axis). The gains chart associated with a good model is further away from the baseline random classification of the cases (the straight blue line in Figure 9), which may be interpreted as the classification produced by tossing a coin. Our model performs significantly better than random classification for all categories of IUs, indicated by the overall ascending cumulative curves.
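A cumulative gains curve for one category can be computed as in the sketch below, under the common definition: cases are ranked by the model's score for that category, and the curve tracks the share of true members captured within the top share of cases. The scores, the membership flags, and the function name are hypothetical.

```python
def cumulative_gains(scores, is_target):
    """Return (share of cases, share of target cases captured) pairs."""
    total = sum(is_target)
    if total == 0:
        raise ValueError("the category has no observed members")
    # Rank cases from the highest to the lowest score for the category
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    gains, hits = [], 0
    for rank, i in enumerate(order, start=1):
        hits += is_target[i]
        gains.append((rank / len(scores), hits / total))
    return gains


# Hypothetical scores for one category and true membership flags
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_target = [1, 1, 0, 1, 0, 0]
for share_cases, share_captured in cumulative_gains(scores, is_target):
    print(round(share_cases, 2), round(share_captured, 2))
```

In this toy example, the top third of cases already captures two thirds of the category members, so the curve lies above the diagonal baseline, where the share captured would equal the share of cases examined.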

Conclusions
Our research proposed a new approach to the study of industry performance, supported by the random forests methodology, aimed at identifying the best predictors of EU-28 manufacturing industries' a priori classification based on two main attributes: ownership (foreign versus local) and technological intensity. An understanding of performance from both perspectives offers insight into industries' idiosyncrasies and opens the door for further research in the field.
EU foreign-owned businesses dominate locally owned ones in terms of size, which gives them an edge for higher profits, cash flows, and investments, and also allows them to pay higher personnel costs. Locally owned businesses fare better in relative terms, as indicated by their higher value-added and investment share of turnover. However, the latter have higher shares of personnel costs, which implies a more intensive or less efficient use of labour. High-tech industries record better performance than their low-tech counterparts, but the differences between them are significantly smaller than those linked to ownership, which makes ownership a more important differentiator of performance than the industry's technological level. Accordingly, the industry units that are foreign-owned and in the high-tech category have shown the best performance at the enterprise and employee level, while the locally owned low-tech industry units had the best performance in relative terms. On the other hand, the performance of foreign-owned high-tech manufacturing IUs across the EU is very heterogeneous, indicating particularities linked to technological level, ownership and even location that deserve to be further investigated. The best predictors of ownership-technological level classification are performance variables at the enterprise level, and their importance is reinforced by the positive correlations between them. They are followed by variables at employee level and labour productivity. This result confirms that differences in size between industry units matter highly for their classification.

Besides size, our results show that business reality is more diverse than suggested by the ownership-technological intensity framework. On the one hand, they point towards the presence of significant idiosyncrasies at industry level, which need to be addressed from other perspectives and using additional variables. On the other hand, location plays a role in performance differentiation, particularly through the activities of MNEs in high-tech industries, which transfer activities and processes with lower technological intensity and higher labour intensity to eastern EU countries, while keeping activities with higher levels of incorporated technology in their origin countries or in more developed countries. At the same time, our findings imply that across the EU, but mostly in eastern countries, locally owned businesses, even within high-tech industries, are companies with lower technological levels.
Our results are relevant for a wider understanding of the importance of technological advancements and foreign direct investments for economic growth and development and for the formulation, in this framework, of economic policies. As the current pandemic demonstrates, technology plays a ubiquitous role in our lives and its role in economic progress will continue to rise. Moreover, the European Union's strategic competitive orientation towards digitalization, accompanied by considerable funding in the Digital Europe Programme within the 2021-2027 Multiannual Financial Framework, will certainly propel a sustained effort of businesses towards enhancing their technological level, hopefully mostly in the low-tech manufacturing industries. Given the important role played by foreign enterprises in the transfer of technology to local enterprises through horizontal and vertical business relationships, EU countries, and particularly eastern ones where the need for transformation of the manufacturing sector is pressing, should adopt and promote measures to attract foreign direct investments aimed at supporting this changeover. Moreover, solid economic policies should be directed towards improving local firms' absorption capacities as a prerequisite for positive technological spillovers. Furthermore, the increased technological level of the manufacturing sector may mitigate one of the most challenging problems that we face, i.e., environmental degradation. Coupled with the positive impact of foreign direct investments on carbon emissions [64], there is plenty of room for central and local authorities to encourage foreign investments that support both an upgrading of the technological level in manufacturing industries and a decline in pollution.
Our research has a few limitations, which we intend to address in future work on the topic. One limitation refers to the FATS database, which includes under "foreign ownership" only companies where foreign investors hold a minimum of 50% of the affiliate capital. This might lead to distortions in the results, as enterprises that have a lower foreign investment share are classified as "locally owned", despite the significant presence of foreign ownership. Correcting for this limitation requires data at company level, which would also open the possibility of including more performance variables in the analysis. The number and nature of industries included in the analysis represent another source of limitation. This was due only to data availability, and we hope that this will be remedied in the future. Furthermore, business performance at employee level, considered for different positions in the industry (i.e., the best or the worst performers), may also form the subject of further research, as may the impact of economic integration with the EU on business performance.
Data Availability Statement: All data used in this research is publicly available from Eurostat, the European Commission database: https://ec.europa.eu/eurostat/data/database (accessed on 10 August 2020).