Do We Consider Sustainability When We Measure Small and Medium Enterprises’ (SMEs’) Performance Passing through Digital Transformation?

Small-medium enterprises (SMEs) represent 90% of businesses globally. Digital Transformation (DT) affects SMEs differently from larger companies because although SMEs have more flexibility and agility for adapting to new circumstances, they also have more limited resources and specialization capabilities. Thus, it is fundamental to measure SMEs’ performance considering different perspectives. Here, we describe and analyze the state of the art of DT in SMEs, focusing on performance measurement. We center on whether the tools used by SMEs encompass the triple bottom line of sustainability (i.e., environmental, social, and economic aspects). To do so, in December 2021, we performed a comprehensive systematic literature review (SLR) on the Web of Science and Scopus. We also explored a novel approach for SLR: topic modeling with a machine learning technique (Latent Dirichlet Allocation). The differences and interchangeability of both methods are discussed. The findings show that sustainability is treated as a separate topic in the literature. The social and environmental aspects are the most neglected. This paper contributes to sustainable development goals (SDGs) 1, 5, 8, 9, 10, and 12. A conceptual framework and future research directions are proposed. Thus, this paper is also valuable for policymakers and SMEs switching their production paradigm toward sustainability and DT.


Introduction
Organizations confront a considerable number of challenges to their business operations. One of their initiatives to be competitive is adopting new technologies, which implies the emergence of the digital economy. Digital transformation (DT) has globally changed business practices and organizational culture [1]; it breaks boundaries, challenging the enterprises' competitiveness [2,3].
In this context, small- and medium-sized enterprises (SMEs) deserve specific attention as they represent a significant share of global business. SMEs account for 90% of all firms and 50% of employment globally [4]. Additionally, SMEs have inherent characteristics that differentiate them from larger companies [5,6]. For example, they tend to be less productive and pollute more [7]. Moreover, SMEs tend to have more flexibility and agility for adapting to new circumstances, but also more limited resources and specialization capabilities [8]. These characteristics are reflected in SMEs' performance while facing the DT process [5,6]. Furthermore, SMEs require specific dimensions, variables, and mathematical tools for measuring their performance.

Small and Medium Enterprises (SMEs)
There is no globally standardized definition of small and medium enterprises (SMEs). The most common classifications are based on a financial measure and/or the number of employees. Even within the same country, definitions may differ depending on the industry. For example, in the USA, a "small enterprise" in the "Agriculture, forestry, fishing, and hunting" industry is defined by annual income for all subindustries, except for the logging subindustry. In the logging subindustry, a "small enterprise" is an enterprise with fewer than 500 employees [17].
Other examples are Brazil and Chile, two countries on the same continent that adopt standardized definitions regardless of the SME's industry. In both cases, they use financial and employee criteria. In Brazil, SMEs have from 20 to 249 employees. However, the more usual definition in Brazil is based on the annual income criterion according to the Statute of Micro and Small Enterprises. In this case, SMEs have a yearly income from BRL 360,000 (Brazilian currency) to BRL 3,600,000, except for SMEs in the banking sector, which follow a different definition [18]. In Chile, an SME is defined as "an enterprise with 10 to 199 workers" or "an enterprise whose annual income from sales and services and other business activities is greater than 2400 UF (Chilean currency, automatically inflation corrected), but less than 100,000 UF in the last calendar year" [19]. Similar conflicts among definitions also occur among smallholder farmers and agricultural SMEs [20].
In summary, definitions based on the number of employees usually differ in their thresholds. Furthermore, some definitions may consider temporary employees, as in the case of Japanese SMEs [21]. Moreover, definitions based on financial terms are usually determined by local law and established as a local currency value at the date of the law's approval, without any inflation adjustment. Hence, to make definitions from different countries comparable, it may be necessary to correct for inflation, convert currencies, and ensure that the definitions represent similar economic importance in each analyzed economy. To the best of our knowledge, this kind of procedure is not yet established in the literature.
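As an illustration only, the three adjustments mentioned above (inflation correction, currency conversion, and scaling by economic importance) could be chained as in the following minimal sketch. The class and function names, the use of a price index ratio, and the choice of GDP share as a proxy for "economic importance" are all assumptions made for this example; they are not an established procedure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IncomeThreshold:
    """A financial SME threshold as written in a local law (hypothetical structure)."""
    amount_local: float     # value in local currency at the law's approval date
    cpi_at_approval: float  # local price index when the law was approved
    cpi_reference: float    # local price index at the comparison date
    fx_to_common: float     # units of the common currency per unit of local currency
    gdp_common: float       # size of the economy, expressed in the common currency


def comparable_threshold(t: IncomeThreshold) -> Tuple[float, float]:
    """Return (threshold in the common currency, threshold as a share of GDP)."""
    # Step 1: bring the legal value to the price level of the comparison date.
    inflated = t.amount_local * (t.cpi_reference / t.cpi_at_approval)
    # Step 2: express it in a currency shared by all compared countries.
    in_common = inflated * t.fx_to_common
    # Step 3 (assumed proxy): relate it to the size of the local economy.
    return in_common, in_common / t.gdp_common
```

Any real application would need official price indices, exchange rates, and a justified measure of economic importance; the values passed to `comparable_threshold` are placeholders to be supplied by the analyst.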
The World Trade Organization [22] highlights that, in general, the lower productivity of small businesses is often attributed to their inability to achieve economies of scale, the difficulties they face in accessing credit or investment, a lack of appropriate skills, and their informality. Additionally, there are some characteristics that may affect SMEs' performance, such as industry, management, technology, technical competence in marketing and innovation [23], the level of internationalization [24], and ownership [1]. In other words, even assuming the same definition, SMEs are usually heterogeneous, and heterogeneity affects performance [25]. Specifically, in the case of SMEs in Europe, findings suggest that heterogeneity hinders the transition to the TBL of sustainability, since this transition demands capabilities and capacities that are asymmetric among SMEs [11]. Heterogeneity may also influence the adoption of innovation by SMEs in Brazil [26].
Here, we assumed that all papers that used the acronym "SME" with the explanation "small and medium enterprises", "small- and medium-sized enterprises", or "small-medium enterprises" were referring to a comparable term. Furthermore, we considered that the term "SME" could encompass micro and self-employed enterprises.

Digital Transformation (DT)
After reviewing and analyzing 134 well-received definitions of DT, Gong and Ribiere (2021) [27] posed the following definition: "Digital transformation is a fundamental change process, enabled by the innovative use of digital technologies accompanied by the strategic leverage of key resources and capabilities, aiming to radically improve an entity and redefine its value proposition for its stakeholders." In this context, an entity may be an organization, a business network, an industry, or a society. To the best of our knowledge, there is no definition of DT specific to SMEs as a separate entity. Thus, we adopted the definition of Gong and Ribiere (2021) [27].
Therefore, we considered DT as a synonym of "digitalization", "digital transition", and "digital innovation". However, we did not consider "digitization" and "Industry 4.0" (I4.0) as synonyms of DT. Digitization is the conversion of analog information to digital. Activities are not made more valuable by digitization. Usually, digitization is used to describe the process of digitizing internal and external procedures [28]. Although DT is sometimes mentioned as a synonym of I4.0, the concept of DT stresses the implications for strategy and business model innovation and underlines the emerging technologies in the business model, and, in turn, the rise of cross-industry ecosystems [29]. Furthermore, the term I4.0 is mainly related to the DT process in the manufacturing sector [29]. For example, Sassanelli et al. (2020) [30] proposed a holistic methodology to evaluate a manufacturing company passing through the digitization process in terms of the level of digital and lean maturity. Readers interested in a more conceptual understanding may refer to [31].
However, when we consider a multi-industry perspective, the average size of enterprises in the service industry is typically smaller than in manufacturing [32]. Additionally, the birth rates of employer enterprises are higher in the services industry than in manufacturing [32]. Between 2008 and 2014, the employment rate in manufacturing decreased in most countries of the Organization for Economic Co-operation and Development (OECD), except in Germany and Luxembourg [33]. Therefore, we adopted the term "digital transformation" due to its industrial range and the emphasis on business model innovation and business strategy. However, we accepted papers that used the terms "Industry 4.0" or "digitization" only if the authors also jointly used the terms "digital transformation", "digital transition", "digital innovation", or "digitalization".
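The terminology rule stated above can be expressed as a simple predicate. The sketch below is only an illustration of that rule: the function name, the return labels, and the use of case-insensitive substring matching are assumptions, and the actual search strings are those reported in [31].

```python
# Terms treated as synonyms of DT in this review.
DT_TERMS = (
    "digital transformation",
    "digital transition",
    "digital innovation",
    "digitalization",
)
# Terms accepted only when a DT term is also present.
CONDITIONAL_TERMS = ("industry 4.0", "digitization")


def screen_terms(text: str) -> str:
    """Classify a title/abstract according to the terminology rule (illustrative)."""
    lowered = text.lower()
    has_dt = any(term in lowered for term in DT_TERMS)
    has_conditional = any(term in lowered for term in CONDITIONAL_TERMS)
    if has_dt:
        # A DT term alone, or jointly with I4.0/digitization, is accepted.
        return "accept"
    if has_conditional:
        return "reject: conditional term without a DT term"
    return "reject: no DT-related term"
```

Substring matching is a deliberate simplification; spelling variants and plurals would need additional handling in a real screening script.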

Triple Bottom Line (TBL)
The concept of sustainability is commonly based on three aspects: economic, social, and environmental, also named the "triple bottom line" (TBL), as proposed by Elkington (1994, 1998) [34,35]. The author encouraged organizations to measure their performance using a multidimensional perspective that integrates not only the traditional indicators but also includes environmental and social aspects. However, sustainability performance measurement is a challenge because there is no universal standard for the calculation of sustainable TBL performance [36,37]. In this perspective, as stated by Santos et al. (2019) [38], the way to measure, obtain, and analyze the appropriate environmental information can be a huge challenge for organizations.
The Circular Economy (CE) is a field related to the TBL. Although also very relevant, the concept of the CE differs from the TBL because the CE is a system-level solution framework focused on addressing resource issues (e.g., pollution, waste, biodiversity loss, and climate change) [39]. Many advancements have been achieved through the CE. For example, a conceptual data model to standardize and structure data in circular manufacturers has been proposed [40]; an SLR was conducted to identify the relevant information and data required to support manufacturers transitioning to the CE [41]; and another SLR integrated the circular supply chain, the CE, and I4.0 [42]. The latter [42] considered TBL among the dimensions of analysis. The authors identified 19 articles encompassing TBL (from a sample of 198 papers). They concluded that 60% of the papers that considered TBL neglected the social aspect. The authors also emphasized that this result agrees with a previous paper [43]. They emphasized that the social aspect is the least investigated in supply chains and that, even when it is investigated, the analysis is typically superficial. In summary, the concept of the CE is focused on resources (more emphasis on economic and environmental aspects), while TBL encompasses and equally emphasizes the social aspect.
After the COVID-19 pandemic, the TBL performance became even more relevant [44]. Preliminary investigations showed that SMEs were the most affected by the pandemic and faced more difficulties from interrupting their operations. This may have caused long-term liquidity problems and affected the maintenance of jobs [45]. SMEs account for 50% of employment globally [4]. Specifically, SMEs are responsible for most female jobs [46]. During the SLR filtering process, we classified the papers based on how they approached sustainability's three aspects (TBL).

Manual SLR
This study adopts the systematic literature review (SLR) method to describe and analyze SMEs' current tools for performance measurements, specifically the mathematical tools (as well as their respective dimensions and variables) for measuring the process of DT simultaneously with the three aspects of sustainability (environmental, social, and economic). The SLR method has been considered a replicable, scientific, and transparent literature review approach that minimizes bias. It is an iterative process for identifying the extant literature about some research topics [47].
The five main steps were based on the recommendations by Tranfield et al. (2003) [47] and Moher et al. (2009) [48]. These steps were (i) Research question formulation; (ii) Search strategy; (iii) Selection and Evaluation of relevant studies; (iv) Analysis and synthesis of results; (v) Reporting the review. The full SLR process is summarized in Figure 1, which illustrates the review protocol to provide transparency to the process. In the first step, we defined the main research question (RQ) and the two secondary research questions (SRQs). They were presented in the Introduction section. Further details about the review protocol can be found in [31].
Scopus and Web of Science were chosen because they have international and wide coverage, and they are regularly updated [49]. Further justification for the dataset choice can be found in [31]. We developed strings to cover a few keywords related to the constructs from each RQ. Some keywords were identified among the three themes after a preliminary review of DT, SMEs, performance measurement, and sustainability. The final search strings were defined after running tests to ensure reliable searches. The strings can be found in [31].
Once the search strings were defined, we established the criteria for the inclusion and exclusion of papers. The search was conducted in September 2021 and repeated in December 2021. After the search, the results were exported and converted into files for the StArt software. StArt was used during Selection (Filter 1) and Extraction (Filter 2) (Figure 1).
The selection and evaluation steps consisted of three filters. The first filter consisted of reading the title and abstract of each paper found in the search, eliminating duplicated papers and papers that did not meet the inclusion criteria (Figure 1). Two independent reviewers executed all filters, double-checking the filter results until they agreed.
In the extraction step (Filter 2), the reviewers read the introduction and conclusion of the selected papers and started qualifying them. This filter was performed because, in Filter 1, the retained papers could have doubtful relevance to the interest area. If, after reading the introduction and conclusion, a paper proved not to be pertinent, it was excluded with the justification registered in the StArt software. In this step, the papers were also classified by answering 11 questions, as stated in the SLR Protocol [31]. In this filter, the questions were used to classify the selected studies and evaluate their importance for the research. Further information can be found in [31].
Data were coded in the content analysis step following the basic requirements proposed by Barnes et al. (2022) [50,51]. Specifically, a codebook was written, establishing a code for each aspect of TBL and a code for DT, with subcodes for variables and dimensions. Both reviewers independently read the papers and coded the text content. A paper was considered to address a certain aspect of TBL only when both reviewers had read it and reached a consensus on coding it with that aspect. The software ATLAS.ti was used to execute the coding procedures. This software supports the organization of ideas and concepts, and it helps to cluster the tools, dimensions, and variables for measuring the impacts of the TBL and DT on the performance of SMEs.
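The consensus requirement described above can be pictured as a set intersection over the two reviewers' codes. The following sketch is purely illustrative: the function names, the data layout (a dictionary from paper identifier to a set of codes), and the idea of listing disagreements for later discussion are assumptions, not a description of the ATLAS.ti workflow.

```python
from typing import Dict, Set

Codes = Dict[str, Set[str]]  # paper identifier -> codes assigned by one reviewer


def consensus_codes(reviewer_a: Codes, reviewer_b: Codes) -> Codes:
    """Keep, for each paper coded by both reviewers, only the shared codes."""
    shared_papers = reviewer_a.keys() & reviewer_b.keys()
    return {paper: reviewer_a[paper] & reviewer_b[paper] for paper in shared_papers}


def disagreements(reviewer_a: Codes, reviewer_b: Codes) -> Codes:
    """List codes assigned by exactly one reviewer, to be discussed (assumed step)."""
    result: Codes = {}
    for paper in reviewer_a.keys() | reviewer_b.keys():
        differing = reviewer_a.get(paper, set()) ^ reviewer_b.get(paper, set())
        if differing:
            result[paper] = differing
    return result
```

Under this reading, a paper counts toward an aspect (economic, social, environmental, or DT) only if that code survives in `consensus_codes`.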
After the papers were coded, we answered the research questions and identified the tools, dimensions, and variables for measuring the impacts of DT on the performance of SMEs.

LDA-Based SLR
Conducting a manual SLR is a time-consuming, laborious, and costly effort. Consequently, automated techniques in SLRs have increased [52]. Dinter et al. (2021) [52] analyzed 41 papers that propose an automated or semi-automated approach for SLRs. The authors concluded that selecting primary papers is the most automated stage in SLRs. However, topic modeling is still rarely applied to select primary papers in an exploratory literature review [13,15].
LDA is a state-of-the-art [13] and the most used topic modeling technique [53,54,55]. It is a probabilistic method that extracts topics from a collection of papers. A topic is a distribution of terms (words) over a fixed vocabulary. The semantics and meaning of the sentences are not evaluated. Instead, LDA analyzes the terms in each paper and calculates the joint probability distribution between the observed variables (the terms in the paper) and the unobserved ones (the hidden structure of topics) [13].
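For readers who prefer a formula, the joint distribution mentioned above is usually written as follows in the standard LDA formulation. The notation is an assumption introduced here and does not appear in the reviewed framework: $\beta_k$ is the term distribution of topic $k$, $\theta_d$ is the topic proportion vector of paper $d$, $z_{d,n}$ is the topic assigned to the $n$-th term of paper $d$, and $w_{d,n}$ is that observed term.

```latex
p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})
  = \prod_{k=1}^{K} p(\beta_k)
    \prod_{d=1}^{D} p(\theta_d)
    \left( \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\,
           p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right)
```

Only $w_{d,n}$ is observed; the topics, the per-paper proportions, and the per-term assignments form the hidden structure that the model infers.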
It is essential to highlight that, in general, topic modeling works best with large volumes of text data. However, the minimum number of papers required for applying topic modeling in an SLR can vary depending on the research question, the literature's nature, and the review's goals [13]. Some authors have successfully used topic modeling with relatively small datasets, depending on the research question and the scope of the review. For example, Saha (2021) [15] applied Latent Dirichlet Allocation (LDA) to 948 papers on game theory in the management literature, Asmussen and Møller (2019) [13] applied LDA to 650 papers on lean manufacturing in the management literature, Nguyen et al. (2023) [56] applied LDA to 108 papers on blockchains applications in supply chain management, and Queiroz et al. (2022) [14] applied LDA to 92 papers on DT and lean philosophy applied to SMEs.
The number of topics (k) is a crucial parameter for guaranteeing the quality of the LDA results. Since this LDA approach is unsupervised, we do not know the relationship between the papers before the model is executed. Perplexity is normally calculated as a form of cross-validation to estimate an adequate number of topics [13]. Additionally, it can be used as an indicator that the number of papers in the dataset has reached the minimum threshold. Perplexity is a metric used to evaluate language models, where a lower score indicates better generalization. Lowering the perplexity is equivalent to maximizing the overall probability of papers being on a topic. Choosing the number of topics is the art of balancing that number against keeping the perplexity at the lowest possible level [13].
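To make the metric concrete, perplexity is commonly defined as the exponential of the negative average log-likelihood per term on held-out papers. The sketch below assumes that the per-paper log-likelihoods have already been obtained from a fitted model; the function name and argument layout are illustrative assumptions.

```python
import math
from typing import Sequence


def perplexity(doc_log_likelihoods: Sequence[float],
               doc_term_counts: Sequence[int]) -> float:
    """exp(-(sum of per-paper log-likelihoods) / (total number of terms))."""
    total_terms = sum(doc_term_counts)
    if total_terms == 0:
        raise ValueError("perplexity is undefined for an empty corpus")
    return math.exp(-sum(doc_log_likelihoods) / total_terms)
```

A useful sanity check is that a model assigning uniform probability over a vocabulary of V terms has a perplexity of V, so lower values indicate that the model concentrates probability on the terms actually observed.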
For example, Figure 2 shows the perplexity of LDA models applied to all searched papers. The line graph shows how average perplexity decreases with the increasing number of topics. In other words, the model fits better as the number of topics increases. As an illustration, a fit with k = 15 topics may be interesting because it is in a region where perplexity is decreasing, and it is the configuration with maximum discriminatory power, concentrating 48 of the 91 (53%) manually accepted papers on 7 topics (1, 3, 7, 8, 10, 12, and 14).
Like Queiroz et al. (2022) [14], we adopted the framework proposed by Asmussen and Møller (2019) [13], followed by an onion approach. Due to the limited number of papers, it was observed that in some cases, one topic had many more papers than others. Furthermore, a topic with a high concentration of papers presents a list of words that encompass many different themes, i.e., an obstacle to adequate paper segregation. In such cases, the onion approach consists of running the LDA only with the papers on the concentrated topic and repeating this loop until the result no longer presents a concentration of papers on any topic. The first onion layer had 91 papers, and the second had 73 papers. In both cases, the best fit was k = 7. The third onion layer had 64 papers, and the fourth had 37 papers. In both cases, the best fit was k = 5.
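The onion loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: `assign_topics` is a hypothetical callable standing in for fitting an LDA model and returning one dominant topic per paper, and the `max_share` threshold used to decide that a topic is "concentrated" is an assumed parameter, since no numeric criterion is given in the text.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def onion_layers(
    papers: Sequence[str],
    assign_topics: Callable[[List[str]], List[int]],
    max_share: float = 0.5,   # assumed concentration threshold
    max_layers: int = 10,     # safety bound on the number of layers
) -> List[List[Tuple[str, int]]]:
    """Re-run topic assignment on the dominant topic until no topic dominates."""
    layers: List[List[Tuple[str, int]]] = []
    current = list(papers)
    for _ in range(max_layers):
        if not current:
            break
        labels = assign_topics(current)  # one topic label per paper
        layers.append(list(zip(current, labels)))
        top_topic, top_count = Counter(labels).most_common(1)[0]
        concentrated = top_count / len(current) > max_share
        # Stop when no topic dominates, or when peeling would make no progress.
        if not concentrated or top_count == len(current):
            break
        # Next layer: only the papers that fell on the concentrated topic.
        current = [p for p, t in zip(current, labels) if t == top_topic]
    return layers
```

Each element of the returned list corresponds to one onion layer, so the number of papers per layer can be read from the length of each element.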
We investigated the possibility of applying the framework of Asmussen and Møller (2019) [13] as a tool for validating the manual SLR and providing new insights. The papers that LDA used to build the topics in each onion layer were compared to the papers manually selected. The LDA-based selection rejected 30 papers in Filter 1 (instead of the 47 rejected in the manual selection, i.e., 64% of the sample rejected by the manual SLR). Additionally, due to technical issues already pointed out in the literature [13,57], which remain to be overcome in future methodological developments, the proposed LDA algorithm did not read all the papers accepted by Filters 2 and 3. The algorithm read 91 of the 113 papers (81%). The algorithm used 58 of the 74 accepted by Filter 2 (78%) and 33 of the 35 accepted by Filter 3 (94%). We assumed that this level was already enough for a valid comparison and interpretation of results from both methods (manual SLR and LDA-based SLR).

Research Profiling
After the selection and extraction steps, we manually profiled 35 papers. As shown in Figure 3, although there were no time restrictions, the oldest paper was published in 2016 [5]. Almost 80% (27 papers) were published in the last two years (2020-2021), and one paper was accepted to be published in 2022. This highlights the constantly growing scholarly interest in this field of study. The 35 papers were published in 27 journals, indicating that there is still no major consolidated source about the theme. Most of these are leading journals, since they are listed in the Journal Citation Reports (JCR) and in the Chartered Association of Business Schools (CABS) journals' ranking list. The following journals published more than one paper: Journal of Business Research (3), Applied Sciences (2), Technological Forecasting and Social Change (2), Competitiveness Review: An International Business Journal (2), and Sustainability (2).
Table 1 summarizes the 35 accepted papers. As can be seen, 23 articles developed and applied surveys for collecting (primary) data. However, these surveys differ in sample size, the number of respondents per SME, and the definition of SME adopted by the authors. This indicates that the lack of consensus and standardization hinders the comparison among results.

Data Collection Approaches
As shown in Table 2, the following variables were the most used for characterizing SMEs in surveys: the size (number of employees), the business industry, the age, and the ownership. As discussed in Section 2.1, sizes defined by the number of employees are usually different in terms of numbers. Furthermore, in the DT process, remote and hybrid work replaces activity-based positions [10]. This implies that enterprises with fewer employees can be large enterprises (and not necessarily SMEs). In this regard, a standardized and measurable definition of a "digital SME" represents a contribution to further investigations.
In the case of ownership, the authors normally discriminate between family-owned SMEs and SMEs controlled by company groups. However, in China, it is also possible to have state-owned SMEs, private-public SMEs, and/or SMEs listed in stock markets [71]. These observations and the lack of a commonly accepted definition of SMEs make us question, for example, whether a non-governmental organization could be considered an SME. If so, under which conditions? Additionally, in investigations of family SMEs, the gender of the owner proved to be a significant variable for determining the DT performance of the SMEs. Men tend to be more engaged in DT than women [65]. There is a lack of investigation into whether the gender of the managers of non-family-owned SMEs also affects DT performance. Besides gender, the less frequently used characterization variables are sales (or income from sales), geographical location, and the marital status of the owners [81].
Furthermore, six articles worked with qualitative approaches (interviews and case studies). Seven articles worked with secondary data; these were mostly literature reviews (6), except for [1], which quantitatively investigated a dataset from the 2015 ICT survey of the Centre for European Economic Research (ZEW). Regular surveys with standardized questionnaires enable cross-temporal studies, which are pointed to as a gap in the theme [72,82]. Additionally, one article collected data by scraping SMEs' websites [80]. Scraping may become a promising data collection approach for future investigations as having a website becomes a mandatory requirement for SMEs in most industries. Regarding survey sampling, five approaches were identified in the literature: random sample choice [71,82], balanced industrial representation of the SMEs in the sample [9], focus on the SMEs from a single industry [72], focus on SMEs considered as innovative [8,58,60,61,62,72,75], and focus on the SMEs considered as entrepreneurial [63]. Regarding the survey respondents, as shown in Table 1, most of the research surveyed only one person per SME (usually the owner or a senior manager) [32,59,71,74]. However, one paper surveyed multiple employees from the same SME [59], and other papers did not mention how many respondents were surveyed [61,62].

Methodological Approaches
As seen in Table 1, nine papers used mixed methodological approaches. These approaches were mostly (6) literature reviews, followed by one application of the reviewed concepts/performance indicators. They strengthened and legitimated the contribution of qualitative research for management knowledge [88]. However, the additional barriers to integrating qualitative and quantitative research have already been pointed out [89].
Two papers applied exclusively qualitative approaches: a paper with case studies analyzing evidence of the challenges SMEs face while redesigning their business model due to DT [60] and a critical review aiming to design a framework for SMEs building digital trust [73]. The remaining papers (24, 69%) applied exclusively quantitative approaches. Hence, quantitative tools and approaches are predominant in the field.
Among the quantitative tools, structural equation modeling (SEM) (10) and econometrics (9) were the most used. Without considering DT as one of the SMEs' environmental performance factors, the literature had already discussed the application of SEM as a performance tool for SMEs [90]. In this regard, the results indicated that SEM and regressions are consolidated tools in the field. On the other hand, cluster analysis, analysis of variance (ANOVA), Decision-Making Trial and Evaluation Laboratory (DEMATEL), and integrations with fuzzy techniques were used more than once. This indicates these tools are emerging in the field. No paper used an approach directly associated with performance measurements, such as Stochastic Frontier Analysis (SFA) and Data Envelopment Analysis (DEA). This suggests that these tools are still unexplored in the field and remain an open avenue for future research.

Variables and Dimensions
Regarding the aspects, as seen in Table 1, 33 papers directly considered the DT aspect, 28 considered the economic aspect, 22 considered the social aspect, and 5 papers considered the environmental aspect. Only four papers [60,61,73,87] considered the four aspects simultaneously. Hence, there is a gap in the studies considering the TBL in this context. Table 3 shows the identified variables and dimensions regarding DT. The dimensions were attributed by the authors of each paper. Only nine authors classified the used variables into dimensions. We identified 192 variables and 34 dimensions.
• Platform Orientation [63], for understanding the competitiveness of SMEs in platforms. This dimension is associated with external operations.
• Portal Usefulness, Portal Interface, and Service-Orientation Portal Function [5], for understanding the effects of DT on SMEs' performance. This is the oldest identified paper. The word "portal" can be understood as what was later named "platform". These dimensions are associated with external activities.
• Organizational Resilience, Infrastructure System, Manufacturing System, Data Transformation, and Digital Technology [64], for defining readiness indicators for SMEs' DT. The authors focused on the manufacturing industry; the adopted dimensions are associated with internal activities. Instead of "platform", the authors used the words "system", "technology", and "data" for the use of internal platforms.
• System Integration [66], for understanding the DT priorities of Indian SMEs in the manufacturing sector. Again, the dimension is associated with internal activities, and the word used is "system".
• Measurement System, Technology Management, Data Management, and Customer Experience [69], for understanding the DT priorities of Canadian SMEs in the manufacturing industry. Three dimensions are associated with internal activities and use the words "system", "technology", and "data". The "Customer Experience" dimension is associated with external activities, and it is linked to the measure "customization". There are no variables related to platforms for communicating with customers. The use of platforms is a general practice for SMEs in retailing but is not taken into consideration in the literature in the manufacturing industry.
• Overall Digitalization Degree, Digitalization Method, Digital Technology Adoption, Business Mode, and Long-term Crisis Responses [71], for investigating the response of the SMEs to the pandemic. These dimensions are associated with internal and external operations in a multi-industrial context.
• Fixed-line Broadband-Connectivity, Mobile Broadband-Connectivity, Online Presence, E-commerce-Online Presence, Online Activity, ICT Infrastructure, Advanced Technologies-ICT Infrastructure, Production Technologies-ICT Infrastructure, ICT Policy, and ICT Usage [76], for understanding the DT of fiber-based SME manufacturers in Europe. The dimensions are associated with internal and external activities as well as a strategy (ICT Policy). However, the dimensions associated with external activities are not associated with the use of a shared selling platform (common for retailing SMEs) but with the "Proprietary Website" and "B2B E-business Activity (Online Activity)". Although common for SMEs in the manufacturing industry, these investments are unusual and unaffordable for the DT of SMEs in other industries (such as service and retailing).
• IT Perspectives [84], for measuring the effects of factors influencing the readiness of SMEs towards DT. This dimension is linked to the variable "Technical Infrastructure", which may be associated with internal activities. As it is a multi-industrial perspective, the term "infrastructure" can represent investments in generally used technologies or advanced ones (usually prohibitive for SMEs in some industries).
In summary, we conclude that the literature explores dimensions and variables for the DT in this context. However, there are some considerations that should be highlighted. First, most papers focused on measuring performance at a micro-level (the SMEs' perspective only). There is a gap in measurements considering a macro perspective (e.g., market, legislation, etc.). Second, the variables and dimensions can be divided into those focused on the SME's internal activities and those focused on the SME's external activities (e.g., communication with customers and suppliers). In general terms, papers that investigate manufacturing tend to consider the DT internally in the SMEs, while papers that investigate other industries tend to consider the DT as an enabler for the relationships and communications external to the SMEs. It is critical to highlight here that the DT measures (e.g., digital platform metrics) have an integrative potential, connecting internal and external activities.
We observed that, although manufacturing is a decreasing sector with a smaller representation in the employment rate worldwide [40], it tends to be more investigated in the literature. Additionally, SMEs from the manufacturing industry may be more interested in and have more resources for investing in the DT process than SMEs from other industries. The manufacturing industry tends to associate DT with the words "system", "data", "technology", and "ICT", while other industries tend to use the words "platform", "portal", "website", and "marketplace". This may be a consequence of the fact that (i) there is heterogeneity among SMEs, and (ii) authors work with different definitions of SME.
In addition to the lack of a standardized definition of SME, there is a lack of a clear definition of "platform" and of its differentiation from "portal", "website", "marketplace", and "social media". Finally, there is a lack of variables and dimensions that consider cybersecurity and data protection.

Economic
Table 4 shows the identified economic variables and dimensions. We identified 122 variables and 16 dimensions. In other words, there are 36% fewer economic variables than DT ones. This indicates that economic aspects are somewhat less explored than DT in the context of SMEs' performance. Economic dimensions are linked with variables of more than one aspect of interest in our paper. For example, professional competence, leadership, culture, and digital responsibility also belong to the social aspect of the TBL. The dimensions of R&D Infrastructure, Managing Resources for DT, and IT Perspectives and Economic Perspectives are linked to economic and DT variables. However, no economic variable that clearly connects micro- and macro-levels (or internal and external activities in the SME) was observed.
In summary, the economic dimensions tended to be more strategic than operational. The economic dimensions brought to light some environmental factors (macro-level) that serve as enablers and driving forces, such as regulatory systems, market orientation, and internationalization. They also showed that the pandemic's short- and long-term impacts are a focus of interest. In this regard, future studies should explore more operational variables of the economic aspect. Furthermore, they should examine how the economic aspect depends on the other aspects and how it is affected by the environmental aspect.

Social
Table 5 shows the identified social variables and dimensions. We identified 74 variables and 11 dimensions. In other words, there are 62% fewer social variables than DT ones. This indicates that social aspects are more neglected. The same phenomenon was registered in the supply chain literature, considering the circular economy [43].
[83], to understand SMEs' maturity level regarding DT.
• IT Perspectives [84], for measuring the effects of factors influencing the readiness of SMEs towards DT.
In summary, the observed social variables and dimensions were mostly strategic and not operational. They relate to (and sometimes depend on) DT and economic variables and dimensions. It is worth noting that no paper considered work satisfaction. Moreover, no paper considered the salary of employees. Finally, no paper considered that SMEs might be hiring employees from other countries to work remotely. The only paper that considered job security during the DT was focused on the response to the pandemic crisis and not on a long-term response to job impacts caused by the DT as a historical process. Hence, these topics stand out as future directions for research.

Environmental
Table 6 shows the identified environmental variables and dimensions. We identified six variables and two dimensions. There are 97% fewer environmental variables than DT ones. Hence, the environmental aspect is the most neglected. The identified environmental variables were:
• Environmental Orientation and Sustainability Market [61], for understanding the relationship between digital and environmental orientations to enhance innovation outcomes. Innovations are assumed to be mandatorily related to Digital Transformation.
• Environmental Sustainability Readiness [60], for investigating the impact of AI on the international performance of SMEs and investigating how the relationship between internationalization and DT affects sustainability.
• Environmental linked to the dimension of Corporate Digital Responsibility and Sustainability linked to the dimension of Digital Trust [73], for building digital trust while implementing high-performance computing (HPC) in SMEs.
• Sustainability Strategy [87], for understanding the role of sustainability in the relation between digital business strategy and financial performance.
In summary, all papers adopted strategic environmental variables and dimensions. There is a lack of operational and tactical perspectives. Papers that considered the environmental aspect were focused on sustainability. There is a gap in considering the environment as one of the aspects of any SME's performance without the need for a specific focus on sustainability.
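The relative gaps quoted in this section (economic, social, and environmental variables compared with DT variables) can be recomputed from the reported counts. The following is a minimal sketch; the function name `percent_fewer` and the dictionary layout are illustrative assumptions, and only the four counts come from the text. With these counts the social shortfall evaluates to roughly 61.5%, which the text reports as 62%.

```python
# Variable counts per aspect as reported in the text.
variable_counts = {"DT": 192, "economic": 122, "social": 74, "environmental": 6}


def percent_fewer(count: int, baseline: int) -> float:
    """Relative shortfall of `count` against `baseline`, in percent."""
    return 100.0 * (baseline - count) / baseline


baseline = variable_counts["DT"]
# Shortfall of each TBL aspect relative to the DT aspect.
shortfalls = {
    aspect: percent_fewer(count, baseline)
    for aspect, count in variable_counts.items()
    if aspect != "DT"
}

for aspect, pct in shortfalls.items():
    print(f"{aspect}: {pct:.1f}% fewer variables than DT")
```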

Topic Modeling for Papers' Initial Selection
Our first purpose was to investigate whether the LDA could discriminate the 30 papers initially rejected by the SLR (Filter 1), serving as support for the SLR validation. We applied the LDA to all papers of the sample (121). We investigated different numbers of topics (k = 5, 7, 10, 15, 20). Then, we compared the number of papers approved by the SLR that were allocated to each topic for each k. As can be seen in Table 7, k = 15 represents the configuration with maximum discriminatory power and with a decreasing perplexity level (Figure 2), concentrating 48 of the 91 (52%) initially accepted papers on 7 topics. We consider this evidence that the human subjectivity of selecting papers was constrained enough by the SLR methodology. Thus, the results are comparable with the non-human method. The two methods are not interchangeable but serve as mutual support.

k    Topics with 100% of accepted papers              Number of papers
5    T4                                               24
7    -                                                -
10   T5                                               5
15   T1, T3, T7, T8, T10, T12, T14                    48
20   T1, T5, T7, T8, T11, T12, T16, T17, T18, T19     34

Our second purpose was to investigate the possibilities of LDA for providing new insights into the SLR interpretation. Table 8 provides the topics that are 100% composed of accepted papers. The terms that are exclusive to these topics are in green. Thirty terms only appear in these topics. Among them, we highlight the terms "covid", "pandem", "crisis", "respons", "measur", "impact", all in T10. This suggests that interest in the theme increased due to the pandemic, and the papers that treat the pandemic may correspond to a separate focus of interest inside the field. The topic with * is focused on the environmental and social aspects and the topic with ** is focused on the pandemic.
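The comparison across values of k reported in Table 7 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each paper is assumed to be assigned to a single dominant topic, the helper name `pure_accepted_topics` is hypothetical, and the toy assignments are invented. The perplexity criterion that the text also uses is not covered here.

```python
from collections import defaultdict


def pure_accepted_topics(topic_of_paper, accepted):
    """Return (topics containing only accepted papers, number of papers in them)."""
    papers_by_topic = defaultdict(list)
    for paper, topic in topic_of_paper.items():
        papers_by_topic[topic].append(paper)
    pure = sorted(
        topic
        for topic, papers in papers_by_topic.items()
        if all(paper in accepted for paper in papers)
    )
    n_papers = sum(len(papers_by_topic[topic]) for topic in pure)
    return pure, n_papers


# Hypothetical toy data: 8 papers, 6 of them accepted by the manual filter.
accepted = {"p1", "p2", "p3", "p4", "p5", "p6"}
# Hypothetical dominant-topic assignments for two candidate values of k.
assignments_by_k = {
    2: {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 1, "p7": 0, "p8": 1},
    3: {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 2, "p7": 2, "p8": 2},
}

results = {k: pure_accepted_topics(a, accepted) for k, a in assignments_by_k.items()}
# The k that concentrates the most accepted papers in "pure" topics.
best_k = max(results, key=lambda k: results[k][1])

for k, (topics, n) in results.items():
    print(f"k={k}: pure topics {topics}, accepted papers concentrated: {n}")
print("best k:", best_k)
```

In this toy case k = 3 is preferred because two topics contain accepted papers only, whereas for k = 2 every topic mixes accepted and rejected papers.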
Additionally, the terms "platform" (T3), "portal", and "cloud" (T7) indicate that these topics are relevant for the DT variables. The terms "sustain" and "environment" (T14) are not exclusive to the topics produced with accepted papers, but they are in the same topic (T14) as the exclusive terms "work", "safeti", "health", "cultur", and "employe", which are all related to social sustainability. There is no other topic with terms related to any social or environmental aspects of sustainability. This corroborates the conclusion that social and environmental aspects are under-investigated in the literature and treated as a separate theme in the field. It is also worth noting that the pandemic and social sustainability are treated in different topics (T10 and T14). This suggests that research about the pandemic may be neglecting the social impacts (such as unemployment).

Topic Modeling for Papers' Filtering: The Onion Approach
Filters 2 and 3 required more human interpretation. We compared LDA results with the human filters. To do this, we adopted what we call the onion approach, as explained in Section 3.2 "LDA-based SLR". Tables 9-12 show the results of each onion layer.

T1             T2             T3             T4             T5 *           T6 **          T7 *
industri       digit          innov          capabl         orient         covid          port
product        manag          knowledg       perform        firm           pandem         work
manufactur     busi           famili         innov          market         crisi          data
system         technolog      enterpris      busi           perform        smes           sustain
chain          smes           perform        model          environment    busi           servic
data           research       industri       agil           sustain        respons        health
evalu          studi          model          organiz        innov          distribut      safeti
suppli         develop        extern         strategi       green          enterpris      forest
oper           compani        search         firm           capabl         impact         environment
indic          model          dih            relationship   suppli         market         iot

The topic with * is focused on the environmental and social aspects and the topic with ** is focused on the pandemic. The orange column was used to show which topic was used to perform the LDA of the next onion layer.
The terms "digit", "technolog", "smes", "busi", and "firm" appear in all layers, indicating that the search strings and filters achieved the research focus well. Additionally, the terms "market", "orient", "manag", "capabl", "knowledg", and "innov" appear in all layers. This corroborates that market orientation (external activities) may be a driving force of the DT. Moreover, managerial aspects are the most frequently investigated. Market orientation is usually investigated considering the capabilities' perspective, and this theme is deeply correlated with innovation and knowledge.

T1             T2             T3 *           T4             T5             T6             T7
capabl         digit          sustain        perform        market         capabl         industri
innov          busi           environment    variabl        onlin          firm           product
matur          manag          port           studi          media          knowledg       manufactur
market         smes           green          chain          distribut      agil           chain
orient         technolog      capit          suppli         communic       entrepreneuri  compani
knowledg       research       cultur         effect         custom         famili         design
social         innov          variabl        factor         social         orient         technolog
crisi          model          smsps          portal         compani        effect         suppli
organiz        develop        intellectu     adopt          competit       intern         process
servic         studi          tool           signific       enterpris      innov          system

The topic with * is focused on the environmental and social aspects. The orange column was used to show which topic was used to perform the LDA of the next onion layer.
T1             T2             T3             T4             T5
digit          valu           innov          industri       firm
busi           market         capabl         knowledg       work
manag          chain          market         innov          crisi
technolog      onlin          competit       technolog      respons
smes           aspect         matur          smes           chang
compani        resourc        servic         product        covid
research       smes           technolog      manufactur     dynam
process        enterpris      orient         extern         entrepreneur
transform      develop        iot            firm           small
develop        indic          knowledg       adopt          smes

Table 9 has the terms related to sustainability concentrated in T5 and T7. Similarly, Table 10 has them in T1, T3, and T4. Table 11 has them concentrated in T3, and Table 12 has no more terms related to environmental sustainability but has "work" in the same topic with "covid", correlating the pandemic with social sustainability. Only Tables 7 and 9 have terms related to the pandemic. In both cases, they are isolated in a unique topic. This indicates that these themes, "pandemic" and "sustainability", are treated as separate themes in the field and separate from each other.
The terms "environment", "sustain", "model", "perform", and "studi" appear in Tables 9-11, evidencing the adequacy of the SLR. The terms "transform" and "chang" only appear in Table 12. The term "entrepreneur" appears in Tables 10 and 12, suggesting that, as well as innovation, entrepreneurship is frequently correlated to DT in SMEs. Hence, the variables for DT should also consider measuring innovation, entrepreneurship, and market orientation capability.
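The cross-layer comparisons made in this section (which stemmed terms recur in every onion layer, and which appear only in the last one) amount to simple set operations on the topic-term tables. The sketch below is illustrative only: the helper `terms_in_layers` is a hypothetical name, and the three small layers are invented, heavily truncated stand-ins for Tables 9-12 rather than the actual tables.

```python
def terms_in_layers(layers):
    """Map each term to the sorted list of layer names whose topics contain it."""
    presence = {}
    for layer_name, topics in layers.items():
        for terms in topics.values():
            for term in terms:
                presence.setdefault(term, set()).add(layer_name)
    return {term: sorted(names) for term, names in presence.items()}


# Hypothetical, truncated onion layers: layer -> topic -> top stemmed terms.
layers = {
    "layer1": {"T1": ["digit", "smes", "manufactur"], "T2": ["covid", "pandem"]},
    "layer2": {"T1": ["digit", "smes", "matur"], "T2": ["manufactur", "chain"]},
    "layer3": {"T1": ["digit", "smes", "transform"]},
}

presence = terms_in_layers(layers)
# Terms present in every layer suggest that the search focus is stable.
in_all_layers = sorted(t for t, names in presence.items() if len(names) == len(layers))
# Terms that only emerge in the innermost layer.
only_last_layer = sorted(t for t, names in presence.items() if names == ["layer3"])

print("in all layers:", in_all_layers)
print("only in last layer:", only_last_layer)
```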
Additionally, "famili" appears in Tables 9 and 10, indicating that SMEs are usually correlated with the family business, and the term "manufactur" appears in Tables 9-11, corroborating the result that the manufacturing industry is more studied than other industries. The terms "suppli" and "chain" appear in Tables 9 and 10, suggesting that "supply chain" is also more studied than other themes. The term "agil" appears in Tables 9 and 10, suggesting that agile management is also associated with SMEs. Finally, the term "matur" appears in Tables 10 and 12, indicating that DT is frequently understood based on maturity levels. In this way, it is demonstrated that the LDA served as a useful support tool while executing an SLR.

Framework for Measuring SMEs' Performance
Figure 4 provides the framework proposed here for those (researchers, policymakers, and SMEs) interested in measuring SMEs' performance, considering the TBL and DT. SMEs' performance is subject to internal (micro-level) and external factors (macro-level). As discussed in Section 4.3, SMEs are subject to different definitions and heterogeneity. Thus, heterogeneity should be considered in any performance investigation because of its possible moderating effect. Table 2 provides the variables identified in the literature for quantifying heterogeneity.
As discussed in Section 4.4.1 (DT), papers divide performance metrics into those related to internal and external activities. This is also supported by the LDA evidence discussed in Section 5.2. However, the DT aspect has an integrative potential, using a shared digital platform among different stakeholders. Depending on the level of the SMEs' digital maturity, it is possible to measure DT performance by considering internal and external activities jointly. This was also evidenced by the LDA results in Section 5.2. Similarly, depending on the business environment characteristics, it is possible to measure the DT performance considering micro- and macro-levels jointly. Given this, although not found in the literature (Table 3), cybersecurity and data protection procedures are important variables for enabling this integration.
All aspects of TBL and DT affect each other and affect final performance. We recommend that researchers investigate these relationships in depth. However, our results indicate that, unlike the DT aspect, the economic, social, and environmental aspects do not have a strong potential to enable the integration between internal and external activities and between micro- and macro-levels. Regarding the TBL aspects, performance can be measured considering different variables for internal/external activities and micro-/macro-levels.
The variables found for measuring the economic, social, and environmental aspects are registered in Tables 5-7, respectively. For example, policymakers interested in fostering innovation among SMEs may be interested in tracking digital and economic measures. In this case, at the macro-level, the digital variable can be the use (or not) of a certain platform connecting regulatory agencies, SMEs, suppliers, and other stakeholders. Furthermore, policymakers can track a variable representing the financing approved for each SME to innovate (economic measure). At the micro-level, to comply with this policy, SMEs can, for example, track the economic dimension "R&D Infrastructure" through the variables related to internal activities ("R&D Department", "ICT Investment in R&D", "Patents or Trademarks", "In-house Innovation Capacity") and external activities ("Innovative Collaboration"), as registered in [77] (Table 5).
Continuing the same illustration, policymakers promoting an innovation policy may also want this policy to have a positive impact on jobs (in quantity or quality). From the macro-level perspective, this can be measured based on labor protection requests (as [71] in Table 6) or regional unemployment rates. At the micro-level, the SMEs can measure this through variables for internal activities (such as "Acquisition and Development of Skills", as in [69], Table 6) and variables for external activities (such as "External Openness and Collaboration", as in [68], Table 6).
Finally, policymakers may also be interested in fostering an innovation policy while ensuring that negative impacts on jobs and the environment are limited. From the macro-level perspective, environmental restrictions can be measured through a certification system for SMEs, while SMEs can measure whether their suppliers (external activities) and they themselves (internal activities) are meeting the agreed environmental targets. This framework can be used for guiding the creation of performance metrics at operational, tactical, and strategic levels.
Finally, the relationships among variables can be investigated through SEM and econometrics. Furthermore, performance indicators can be proposed based on DEA, SFA, or other methods (Table 1). In this way, it will be possible to highlight benchmarks and best practices, as well as determine whether targets are achieved. Although we recommend measuring performance with at least ten variables (i.e., six at the micro-level, three at the macro-level, and a DT variable integrating levels, as in Figure 4), it is essential to highlight that the choice of variables depends on the goal and on the context.
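As a hedged illustration of how a DEA-based indicator could highlight benchmarks, consider the simplest possible case: one input and one output per SME under constant returns to scale. In that special case the CCR efficiency score reduces to each unit's output/input ratio divided by the best observed ratio. The SME names, the choice of "ICT investment" as input and "revenue growth" as output, and all figures below are hypothetical; a realistic application with several inputs and outputs would require solving a linear program per unit.

```python
def ccr_efficiency_single(inputs, outputs):
    """CRS (CCR) efficiency when each unit has exactly one input and one output.

    Each score is the unit's productivity (output / input) relative to the
    best observed productivity, so benchmark units score 1.0.
    """
    ratios = {}
    for unit, x in inputs.items():
        if x <= 0:
            raise ValueError(f"input of {unit} must be positive")
        ratios[unit] = outputs[unit] / x
    best = max(ratios.values())
    return {unit: ratio / best for unit, ratio in ratios.items()}


# Hypothetical data: ICT investment (input) and revenue growth (output).
ict_investment = {"SME_A": 10.0, "SME_B": 20.0, "SME_C": 5.0}
revenue_growth = {"SME_A": 4.0, "SME_B": 6.0, "SME_C": 2.5}

efficiency = ccr_efficiency_single(ict_investment, revenue_growth)
# Units on the efficient frontier serve as benchmarks for the others.
benchmarks = sorted(u for u, score in efficiency.items() if abs(score - 1.0) < 1e-9)

for unit, score in sorted(efficiency.items()):
    print(f"{unit}: efficiency {score:.2f}")
print("benchmarks:", benchmarks)
```

Here the benchmark is the unit with the highest output per unit of input, and the other scores indicate how far each SME is from that best practice.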

Conclusions
The tools, dimensions, and variables for measuring and investigating the impacts of DT on the performance of SMEs are an increasing topic of interest, but the body of knowledge is still developing. The number of papers on the theme is still small. Moreover, the lack of a commonly accepted definition of SME and the heterogeneity among SMEs are obstacles to the comparison of results among different papers. Consequently, this also represents an obstacle to the building of the body of knowledge. What is more, there is a lack of a definition of what DT is for an SME and how it could differ from DT in larger companies.
Among the analyzed papers, we identified that quantitative tools are predominant, mainly structural equation modeling (SEM) and econometrics. Decision-Making Trial and Evaluation Laboratory (DEMATEL) and integrations with fuzzy techniques may represent the current methodological frontier on the theme. However, it is worth noting that Stochastic Frontier Analysis (SFA) and Data Envelopment Analysis (DEA), methods focused on performance measurement, still need to be explored, representing complete fields of new future research possibilities. Furthermore, Artificial Intelligence (AI) is not explored, probably because of data unavailability. We strongly recommend the standardization of definitions and data collection procedures for enabling the application of AI methods.
We collected and classified the used variables and dimensions. From this, we conclude that the TBL is still neglected. The joint analysis of the manual and the LDA-based SLRs indicated that environmental and social sustainability are treated as separate themes in the field, and they are not integrated with investigations about DT and economic performance. The variables and dimensions of DT are the most explored in the literature. They vary depending on digital maturity level. Potentially, the heterogeneity among SMEs (such as their sizes and industries) is affecting the maturity of the DT process. The manufacturing industry is more investigated and may have specific characteristics. The economic variables and dimensions are the second-most investigated. In both cases, we identified operational and strategic variables and dimensions, but the relationship between economic performance and DT remains unexplained. Finally, social and environmental variables and dimensions are significantly less investigated. When they are treated in the literature, they tend to represent only a strategic level.
Therefore, many future research directions were pointed out in this text. Among them, we pinpoint the standardization of the definition of "SME"; the standardization of SMEs' data collection procedures; investigations into cross-national and cross-temporal scenarios; the development of a systematic taxonomy of the DT and TBL variables and dimensions considering operational, tactical, and strategic levels; investigations about how TBL and DT aspects interact and influence each other; and the development of a quantitative approach for measuring SMEs' sustainable and digital performance based on tools such as Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA).
Regarding the SLR methodology, the manual and LDA-based SLRs proved to be useful and practical approaches for the comparison of results and for novel insights. However, in our investigation, the LDA-based SLR could not completely substitute for the manual SLR. As with all research, our study is not without limitations. One of them is that the LDA algorithm used was not able to read all papers. Future applications should improve the algorithm proposed by Asmussen and Møller (2019) [13] until it is able to read all papers. The improved algorithm should also be able to deal with pre-defined expressions composed of two or more terms, such as "supply chain", instead of "suppl" and "chain" as two different terms in the topic construction. Beyond improving the algorithm for paper selection, it is important to emphasize that further research is needed to automate other steps of the SLR, such as the planning and reporting steps [66].