Tools for Correlation and Regression Analyses in Estimating a Functional Relationship of Digitalization Factors

Abstract: Digitalization processes affect all levels and spheres of human activities, from personal communications to public events. The widespread implementation of digital technologies has an ambiguous effect on the personal, social, and economic paths of modern society's development. At the moment, there is no single approach to the estimation of digitalization's impact, particularly on the financial and economic properties of a company. The objective of this study was to create multiple models for the assessment of digitalization's impact on company performance. To accomplish this objective, the following steps were performed: conducting a literature survey on the experience with digital technologies' implementation; selecting the most appropriate mathematical tools for correlation and regression analyses; determining a functional relationship between the factors and consequences of digitalization in terms of companies' performance; identifying digitalization factors, risks, and their impact on the financial sustainability of companies; and creating a multiple regression model of the functional relationship between digitalization factors and the financial sustainability of companies. In the course of the study, a correlation analysis of the dependence of companies' financial sustainability on a number of digitalization factors was conducted, and different ways of using company performance data as effective features and predictive factors are offered. The article includes sampling data on the parameterization and quality evaluation of multiple regression models. The validity of the suggested multiple models was tested with actual statistical data obtained from 16 Russian companies. The application of the multiple regression model devised to estimate digitalization's impact on companies' performance and financial sustainability can be seen as the most important practical implication of this study.


Introduction
In the modern context of digital transformations and digitalization in all spheres of the economy and management, studying the impact of information technology (IT) on company performance has become a popular research avenue. Modern digital technologies make it possible for companies to attain a new level, providing information about services and products virtually to any international/national/regional market. The rapid development of digital financial tools has drastically changed the work of financial organizations, transforming every aspect of their business models. Literature reviews [1][2][3][4][5][6][7][8] have shown that regardless of social and economic development level, most countries consider digitalization as a tool to strengthen their economic position in the world, which contributes to improving their stability and the welfare of their populations.
The implementation of information and communication technologies (ICTs) has a crucial effect on the financial sustainability of companies. The high financial sustainability and performance of a company, first and foremost, reflect the priorities of its internal financial policy and a high resiliency toward environmental concerns.
Modern ICTs facilitate the processing of financial and other documents, which leads to increased labor efficiency in all departments.
The effectiveness of digitalization has been described in a large number of scientific articles. The range of issues discussed is constantly expanding, which testifies to the complexity of the phenomenon. However, a vast majority of investigators have pointed to the positive effect of business digitalization. Highlighting the advantages of digitalization, reference [9] called it the Holy Grail that helps to create "fundamentally new, disruptive business models". Reference [10] examined the pros and cons of digitalization in leading European companies and came to the conclusion that digital transformation has a favorable effect on company performance, proven by the high take-up of digital technologies, higher sales, better access to customers, the greater flexibility of price setting, the greater ease of sharing knowledge, and more efficient production processes. The study conducted by [11] in the automotive industry proved the benefits of digital transformations, forecasting higher profits, better productivity, greater competitiveness, better services, and greater satisfaction. Small-to-medium-sized European enterprises were examined by [12], and it was demonstrated that business model innovation driven by social media and big data has a positive impact on business performance. The favorable opportunities of the digital world in the context of the present-day pandemic were emphasized by [13].
Many studies have been devoted to information and the way in which we structure and use it. Investigating the role of the Internet of Things (IoT) and big data in businesses' digital transformation, reference [14] pointed out how disorganized knowledge in this sphere leads to inconsistent paths toward developing both research and practice. Having analyzed different aspects of understanding and applying data, reference [15] concluded that digitalization results in "information literacy" challenges and leads to new possibilities for providers of accounting information.
Some individual aspects of digitalization have been addressed in numerous works. The impact of digital transformation in marketing was examined in [16]. Reference [17] considered the crucial importance of choosing appropriate digital tactics and developing business strategies aimed at supporting managers. Reference [18], admitting the key role of digital technologies in maximizing profits, investigated the impact of social media as a marketing tool. Reference [19] investigated the way in which three social partners (governments, business associations, and trade unions) adapt to a new digitized context and their level of digitization. Digital twins (DTs) technology and its benefits in industrial applications were described in [20].
Although some investigators have concentrated on individual digitalization factors, other scientists have come up with different ideas for creating models to estimate and analyze digitalization's impact. Reference [21] developed "a model for digital innovations-driven business model regeneration" that can be used to analyze digital innovations' effects and update existing business models. Reference [22] looked at innovations in digital energy service business models and revealed two significant value gaps related to the ability of these models to increase social and environmental benefits.
Some works have provided a detailed overview of the literature devoted to digitalization in business. The study conducted in [23] revealed three thematic clusters: technological innovation, strategic management, and digital transformation. Reference [24] paid attention to an under-researched sphere of digitalization: self-employed workers and freelancers running their business online. Reference [25] examined a poorly investigated question about the impact of digital servitization on sustainability based on the experience of small and medium manufacturing companies in Italy.
Thus, the digital transformation of the economic environment carried out by a large-scale implementation of ICTs implies the transformation of all activities of modern companies to digital platforms. Such a transformation entails changing business-running standards, expanding horizons and creating new types of business (digital business), changing company structure, seizing opportunities that are provided by AI for business transformation and audience accumulation, broadening spheres, and conducting communication on the web [26][27][28][29][30][31]. However, there is no single approach that could be applied to estimate the impact of digitalization on company performance and the financial and economic features of companies. The development of such approaches to estimating company performance in terms of digitalization's impact is therefore an urgent issue. The literature review has identified the digitalization factors that impact company performance; the most common of them are labor productivity and digitalization costs [32][33][34][35][36][37][38][39]. Hence, these two factors must be used in creating models for estimating digitalization's impact on company performance.

Materials and Methods
The objective of the study is to create multiple models for the assessment of digitalization's impact on company performance.
The study material is a set of financial and economic indicators of 16 Russian companies obtained as of 2019.
The digital transformation of society should be considered using an interdisciplinary scientific approach that integrates methods of economic theory, philosophy, sociology, ICTs, and applied mathematics, and in particular, correlation and regression analyses.
To estimate digitalization's impact on a company's financial sustainability, the following multiple regression model can be used [40]:

$$ y(t) = f(x_1, x_2, x_3, \ldots, x_m) + \varepsilon(t), $$

where $y(t)$ is an effective feature that describes company performance and a company's financial sustainability; $f$ is a functional relationship between the effective feature and externalities; $x_1, x_2, x_3, \ldots, x_m$ is a set of externalities; and $\varepsilon(t)$ is a stochastic component that shows the impact of factors that are not explicitly included in the regression equation.

When determining the parameters of a multiple linear regression (MLR), an equation can be derived using the least squares (LS) method. To do that, a system of normal equations must be solved:

$$
\begin{cases}
na + b_1\sum x_1 + b_2\sum x_2 + \ldots + b_m\sum x_m = \sum y,\\
a\sum x_1 + b_1\sum x_1^2 + b_2\sum x_1 x_2 + \ldots + b_m\sum x_1 x_m = \sum y x_1,\\
\ldots\\
a\sum x_m + b_1\sum x_1 x_m + b_2\sum x_2 x_m + \ldots + b_m\sum x_m^2 = \sum y x_m,
\end{cases}
$$

where $y$ is a dependent variable that denotes any indicator of a company's financial and economic performance; $x_i$ are the externalities that indicate digitalization's impact on a company's performance and sustainability; and $a$, $b_i$ are MLR equation coefficients, where each $b_i$ describes the change in the average value of the effective feature per unit change in the corresponding factorial feature.

When specifying a model, it is very important to compose a system of indicators for choosing appropriate predictive factors. Moreover, the coefficients of the pair correlation between the endogenous variable and externalities are to be calculated. The following correlation matrix is to be used:

$$
R = \begin{pmatrix}
1 & r_{yx_1} & r_{yx_2} & \ldots & r_{yx_m}\\
r_{yx_1} & 1 & r_{x_1 x_2} & \ldots & r_{x_1 x_m}\\
r_{yx_2} & r_{x_1 x_2} & 1 & \ldots & r_{x_2 x_m}\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
r_{yx_m} & r_{x_1 x_m} & r_{x_2 x_m} & \ldots & 1
\end{pmatrix},
$$

where $r_{yx_i}$ is a pair correlation coefficient that shows the impact of a certain predictor on the endogenous variable, and $r_{x_i x_j}$ is a coefficient of the predictors' pair correlation that describes the interdependence of factors.
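The least squares step described above can be illustrated with a minimal sketch that builds the normal equations and solves them by Gaussian elimination. The helper names `solve_linear_system` and `fit_mlr` are hypothetical, and the data are synthetic values generated from a known linear relationship, not the companies' statistics.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs stay untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(xs, y):
    """Return [a, b_1, ..., b_m] from the normal equations (X'X) beta = X'y."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 carries the term a
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coefficients = fit_mlr(xs, y)
print([round(c, 6) for c in coefficients])
```

Because the synthetic data contain no noise, the recovered coefficients should coincide with the generating values up to rounding. With real data, the residual component would be nonzero.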
To identify the predictors' interdependence, matrix $R^*$ must be introduced into the analysis to exclude multicollinearity:

$$
R^* = \begin{pmatrix}
1 & r_{x_1 x_2} & \ldots & r_{x_1 x_m}\\
r_{x_2 x_1} & 1 & \ldots & r_{x_2 x_m}\\
\ldots & \ldots & \ldots & \ldots\\
r_{x_m x_1} & r_{x_m x_2} & \ldots & 1
\end{pmatrix}.
$$

The value of a pair correlation coefficient can be calculated by a formula using the covariance of features:

$$ r_{yx_i} = \frac{\operatorname{cov}(y, x_i)}{\sigma_{x_i}\,\sigma_y}, $$

where $\sigma_{x_i}$ is the mean square deviation of a certain factorial feature, and $\sigma_y$ is the mean square deviation of the effective feature.

To determine the value of the coefficient of the pair correlation between the effective feature and the predictor, the following formula can also be used:

$$ r_{yx} = \sqrt{\frac{\sum (\hat{y}_x - \bar{y})^2}{\sum (y - \bar{y})^2}}, $$

where $\sum (\hat{y}_x - \bar{y})^2$ is the sum of the squared deviations of the predictor-affected modelled values of the effective feature from the average value $\bar{y}$, calculated using empirical data; and $\sum (y - \bar{y})^2$ is the aggregate sum of the squared deviations of the empirical values of $y$ from its average value.

Traditionally, there are two directions in analyzing a matrix of pair correlations. The first is estimating and analyzing the strength of the predictors' impact on the effective feature, and the second is determining and analyzing the interdependence (closeness) of the factorial features.
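The covariance-based pair correlation coefficient and the correlation matrix can be sketched in a few lines of Python. The function names are hypothetical, population (1/n) moments are assumed, and the three series are made-up illustrative values rather than the companies' data.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pair_correlation(x, y):
    """r = cov(x, y) / (sigma_x * sigma_y), using population (1/n) moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sigma_x = sqrt(mean([(a - mx) ** 2 for a in x]))
    sigma_y = sqrt(mean([(b - my) ** 2 for b in y]))
    return cov / (sigma_x * sigma_y)


def correlation_matrix(columns):
    """Symmetric matrix of pair correlations; pass y first, then x_1..x_m."""
    return [[pair_correlation(a, b) for b in columns] for a in columns]


# Made-up illustrative series (assumption), one value per company.
y = [0.42, 0.55, 0.48, 0.61, 0.70]
x1 = [1.02, 1.10, 1.05, 1.15, 1.21]
x2 = [300.0, 280.0, 350.0, 330.0, 400.0]
r = correlation_matrix([y, x1, x2])
for row in r:
    print([round(v, 3) for v in row])
```

The first row of the result corresponds to the coefficients between the effective feature and each predictor; dropping the first row and column leaves the predictor-only matrix used for the multicollinearity check.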
To identify multicollinearity of predictors, which can misrepresent the functional relationship of the variables under study and lead to invalid models, the Farrar-Glauber method can be applied [41].
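To check for multicollinearity in the system of the chosen factorial features, the $\chi^2$ criterion can be used. It is calculated by the following formula:

$$ \chi^2_p = -\left[n - 1 - \frac{1}{6}(2m + 5)\right]\ln\left|R^*\right|, $$

where $n$ is the number of observations, $m$ is the number of factorial features, and $\left|R^*\right|$ is the determinant of matrix $R^*$. The $\chi^2$ criterion is calculated with regard to the degrees of freedom $k = \frac{m(m-1)}{2}$.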
At a given reliability (probability) level $p$ and the number of degrees of freedom $k = \frac{m(m-1)}{2}$, the table (critical) value $\chi^2_{kp}$ of the criterion is looked up. If $\chi^2_p \le \chi^2_{kp}$, then multicollinearity between the predictive factors is considered absent, and all of the factorial features can be included in the investigation, according to the researcher's judgment.
If $\chi^2_p > \chi^2_{kp}$, then there is multicollinearity between the predictors at the given probability level $p$, and the further selection of factorial features to derive the regression equation is subject to additional limitations.
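The multicollinearity check can be sketched as follows, assuming the standard Farrar-Glauber statistic, chi2 = -[n - 1 - (2m + 5)/6] * ln(det R*), with k = m(m - 1)/2 degrees of freedom. The function names and the predictor correlation matrix are illustrative assumptions; the critical value is passed in from a chi-square table rather than computed.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def farrar_glauber_chi2(r_star, n_obs):
    """Return (chi2, degrees_of_freedom) for the predictor matrix R*."""
    m = len(r_star)
    chi2 = -(n_obs - 1 - (2 * m + 5) / 6) * log(determinant(r_star))
    dof = m * (m - 1) // 2
    return chi2, dof


# Hypothetical correlations between three cost-type predictors, n = 16.
r_star = [
    [1.00, 0.90, 0.80],
    [0.90, 1.00, 0.85],
    [0.80, 0.85, 1.00],
]
chi2, dof = farrar_glauber_chi2(r_star, n_obs=16)
chi2_table = 7.815  # tabulated critical value for k = 3, p = 0.95
print(round(chi2, 2), dof, chi2 > chi2_table)
```

For strongly correlated predictors such as these, the determinant of R* is close to zero, so the statistic is large and exceeds the table value, which signals multicollinearity.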
The least squares method allows for the determination of the regression model's parameters. In similar cases [42], the identification of multiple regression parameters comes from calculating the minimum of the G function, which describes the total deviation of the empirical data from the modeled values of the effective feature. The G function is calculated as follows:

$$ G = \sum_{i=1}^{n} f\left(y_i - \hat{y}_i\right) \rightarrow \min, $$

where $f(\cdot)$ is a functional relation that represents a mathematical form of measuring the scatter of empirical data about the modeled values, $y_i$ are the empirical values, and $\hat{y}_i$ are the modeled values of the effective feature. In most cases, the square of the difference between the empirical and modeled values of the effective feature is used as this mathematical form.

With the regression model's parameters identified, the quality and reliability analyses for the devised regression model are to be performed. To do so, several quality characteristics can be considered.
To estimate the general quality of the regression equation, a coefficient of determination is commonly used. The value of the coefficient can be calculated by the following formula:

$$ R^2 = 1 - \frac{\sigma^2_{res}}{\sigma^2_y}, $$

where $\sigma^2_{res} = \frac{1}{n}\sum (y - \hat{y})^2$ is the value of the residual dispersion that indicates the deviation of the empirical values of the effective feature from the modeled ones; and $\sigma^2_y$ is the value of the aggregate dispersion that denotes the scatter of the empirical values about the average empirical value of the effective feature, which is measured as follows:

$$ \sigma^2_y = \frac{1}{n}\sum (y - \bar{y})^2. $$

The value of the determination coefficient varies from 0 to 1. The closer its value is to 1 (which means a smaller ratio of the residual dispersion to the aggregate dispersion), the higher the general quality of the created regression equation will be.
Additionally, the quality of the created regression model can be examined using a coefficient of multiple correlation, which describes the joint impact of the factorial features included in the regression equation on the effective feature. To measure the value of the multiple correlation coefficient, the following formula can be used:

$$ R = \sqrt{1 - \frac{\sigma^2_{res}}{\sigma^2_y}}. $$

The closer the empirical values of the effective feature are to the regression line, the higher the value of the multiple correlation coefficient will be. Consequently, the reliability of the devised regression equation and its outcome will increase.
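The two quality measures above can be computed directly from empirical and modeled values. The sketch below assumes that both dispersions use the same 1/n normalization, so that the factor cancels in the ratio; the function name and the small data set are illustrative only.

```python
from math import sqrt


def determination_coefficient(y, y_hat):
    """R^2 = 1 - residual dispersion / aggregate dispersion."""
    n = len(y)
    y_bar = sum(y) / n
    residual_dispersion = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    aggregate_dispersion = sum((a - y_bar) ** 2 for a in y) / n
    return 1.0 - residual_dispersion / aggregate_dispersion


# Illustrative values: y_hat comes from the fitted line 0.8 + 2.3*x, x = 0..3.
y = [1.0, 3.0, 5.0, 8.0]
y_hat = [0.8, 3.1, 5.4, 7.7]
r2 = determination_coefficient(y, y_hat)
multiple_r = sqrt(r2)  # multiple correlation coefficient
print(round(r2, 4), round(multiple_r, 4))
```

Here the residual sum of squares is 0.30 against an aggregate sum of squares of 26.75, so the determination coefficient is close to 1, as expected for points lying near the regression line.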
The second direction in the quality assessment of the devised regression equation is the estimation of the statistical significance of the regression equation, which can be carried out using the F-test.
There is an assumption that if a sustainable relationship exists between predictors and the effective feature, which is characterized by the devised regression equation, the factor dispersion of the effective feature affected by regression equation predictors must be much greater than the residual dispersion of the effective feature affected by other factors, which are not included in the regression equation.
To determine Fisher's criterion, the ratio of the factor dispersion to the residual dispersion of the effective feature is calculated:

$$ F = \frac{\sigma^2_{fact}}{\sigma^2_{res}}, $$

where $\sigma^2_{fact}$ is the factor dispersion. Fisher's criterion can also be calculated by the following formula:

$$ F = \frac{R^2}{1 - R^2}\cdot\frac{n - m}{m - 1}, $$

where $n$ is the length of the effective feature's time series and $m$ is the number of factorial features included in the regression equation.
Comparing the calculated value of Fisher's criterion to the critical value that is determined based on the given significance level $\alpha$ (0.05 or 0.01) and the degrees of freedom $v_1 = m - 1$, $v_2 = n - m$, the following conclusion is made: if $F_{fact} > F_{table}$, then the devised regression equation is considered statistically significant; otherwise, it is statistically insignificant.
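The decision rule can be sketched with the degrees of freedom passed in explicitly, so that the same helper works whichever convention is used for counting m. The function names, the input values, and the table value supplied by the caller are assumptions for illustration.

```python
def fisher_f(r2, v1, v2):
    """F = (R^2 / v1) / ((1 - R^2) / v2) for degrees of freedom v1, v2."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return (r2 / v1) / ((1.0 - r2) / v2)


def is_significant(f_fact, f_table):
    """The equation is considered significant when F_fact exceeds F_table."""
    return f_fact > f_table


# Illustrative numbers only: R^2 = 0.5 with v1 = 2 and v2 = 10.
f_fact = fisher_f(0.5, v1=2, v2=10)
print(f_fact)
```

Following the text, v1 = m - 1 and v2 = n - m would be supplied here, and `f_table` would be read from a table of Fisher's distribution for the chosen significance level.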
The assessment of the quality of the devised regression equation can also be performed by estimating the statistical significance of the regression coefficients, which, in turn, confirms the statistical significance of the factorial features included in the equation. This estimation can be performed with the help of Student's criterion:

$$ t_{b_i} = \frac{b_i}{\mu_{b_i}}, $$

where $b_i$ are the regression coefficients of the factorial features and $\mu_{b_i}$ is the standard deviation of the regression coefficient. Since the residual dispersion can be determined by formula (13), the standard deviation of the regression coefficient is calculated through the residual dispersion.

For the regression parameter $a$, Student's criterion is determined in the same way:

$$ t_a = \frac{a}{\mu_a}, $$

where $a$ is the absolute term of the regression equation that represents the value of the effective feature when the factorial features included in the regression equation do not affect the effective feature or are equal to zero, and $\mu_a$ is the standard deviation of this parameter.

The computed values of Student's criterion are compared to the critical values of the criterion, which are determined with regard to the degrees of freedom $n - m$ and a given significance level $\alpha$ (0.01; 0.05). If $t_{calculated} > t_{critical}$, then the regression coefficient of the corresponding predictor is considered statistically significant; otherwise, the regression coefficient is not statistically significant.
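A minimal sketch of the coefficient significance check is given below. It assumes the textbook least squares standard errors, i.e., the square roots of the diagonal of s^2 (X'X)^-1, with the residual variance s^2 divided by the number of observations minus the number of estimated parameters; this is an assumption and may differ in notation from the formulas of the original article. The function names and data are hypothetical.

```python
from math import sqrt


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_statistics(xs, y):
    """Return (coefficients, standard_errors, t_values) for [a, b_1..b_m]."""
    design = [[1.0] + list(row) for row in xs]
    n, k = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    # Residual variance with n - k degrees of freedom (k estimated parameters).
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k)
    errors = [sqrt(s2 * inv[i][i]) for i in range(k)]
    t_values = [b / e for b, e in zip(beta, errors)]
    return beta, errors, t_values


# Illustrative one-predictor data (assumption): x = 0..3.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [1.0, 3.0, 5.0, 8.0]
beta, errors, t_values = t_statistics(xs, y)
print([round(v, 3) for v in beta], [round(v, 3) for v in t_values])
```

Each computed t value would then be compared in absolute value with the tabulated critical value of Student's distribution for the chosen significance level.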

Results
After using correlation and regression analysis tools to devise the multiple models for the relationship of digitalization factors, the analysis of the digitalization factors and their impact on the effectiveness and sustainability of a business was carried out (Table 1). Table 1 contains the effects of digital technologies on a company's financial sustainability. As a result of the study conducted, the multiple regression model for the estimation of digitalization's impact on the financial sustainability of companies has been created:

$$ Y(FU)_t = f(IRPT_t, ISZ_t, IASR_t, IABP_t, HBI_t, RKB_t, ZPK_t, HDI_t), $$

where $Y(FU)_t$ is the effective feature that characterizes the effectiveness and sustainability of business activity, and $f$ is a functional relationship between the effective feature and a set of factorial features.
The following parameters can be used as the effective and factorial features based on the investigator's judgment (see Table 2).

Derived values that describe the predictors' dynamic patterns over time, i.e., their indexes, can be used as predictors rather than the absolute values of exogenous features. An autoregressive model of the m-dimensional vector of exogenous variables can be used to create a system of predictor values and to calculate the indexes of factorial features. In this model, $\alpha_i$ and $\beta_i$ are the parameters of the multiple regression model used to compute dynamics indexes for a number of exogenous variables; $t$ is the index that shows the factorial features changing over time; and $p$ is the value of the lag between the interrelated observations of the predictors' time series, which characterizes the structure of the autoregressive model and the extent of the relationships between the observations of the exogenous variables' time series. The term $\varepsilon_{it}$ denotes white-noise errors. This variable describes the errors in measuring the numerical values of the regression model parameters, as well as the violation of some prerequisites for the LS method. Moreover, it shows the impact of slack (qualitative) variables on the predictor value.

To validate the created multiple models aimed at estimating digitalization factors' impact on a company's performance and sustainability, data from 16 companies of the Smolensk region were obtained.
Only companies from the Smolensk region were studied, since an expert survey in other regions involves significant expense.
The choice of the study objects (companies) was justified by the availability of the benchmark statistics (original statistical data) on the financial activity of the companies, as well as by the desire of these companies to take part in the expert survey. Some companies, due to the worsening of their performance indicators, refused to provide their statistics, referring to a commercial confidentiality policy to explain this refusal. Moreover, the pandemic has had a negative impact on company performance in most cases, and a number of firms have ceased to exist.
The following parameters of financial sustainability were taken into consideration: the coefficient of financial independence (KFN); the coefficient of current liquidity (KTL); the coefficient of absolute liquidity (KAL); and the coefficient of short-term debt (KKZ). Table 3 contains the sampling data for a number of companies under study. For validating the multiple models created for estimating the impact of digitalization factors on a company's financial sustainability, the following factorial features were chosen: X1, the index of labor efficiency increase (IRPT); X2, expenses on ICTs (HBI1); X3, expenses on implementing, installing, and maintaining new software and ICTs (HBI2); and X4, expenses on the advanced professional training of employees working with ICTs (ZPK).
As previously mentioned, the choice of these factorial features is justified by the availability of the benchmark statistics (original statistical data) on the financial activity of the companies, as well as by the desire of these companies to take part in the expert survey.
For some factorial features included in the regression model for estimating digitalization's impact on a company's financial sustainability, it was impossible to obtain statistical data, since these features are of a qualitative nature. They include the index of cost cutting related to implementing digital technologies into business activities (ISZ), the index of the automation of strategic planning and day-to-day management (IASR), the index of business process automation (IABP), cybercrime-related risks (RKB), and a number of qualitative parameters (HDI). Statistical data on these factorial features are difficult to obtain within this study, as special (experimental) observation and day-to-day monitoring of these factors are required for every company involved in the study. Table 4 presents the sampling data on the factorial features that have numeric expression and that were obtained from the expert interviews and the accounting documents of the companies. The numbers are shown in thousands of rubles. To identify digitalization's impact on the financial sustainability of the above-mentioned companies, the tools of correlation and regression analyses, as well as statistical methods of modelling functional relationships for company performance indicators, were used.
The correlation dependence of the financial sustainability indicators of some companies and digitalization factors is shown in the matrix of the pair correlation of the effective and factorial features. Table 5 furnishes the sampling data on the correlation analysis.

Discussion
The results of the study, conducted with the use of the tools of correlation and regression analyses, reveal the impact of digitalization factors on the financial sustainability of companies; however, the intensity of the intercorrelation of the examined features varies from moderate to mild. This can be explained by the fact that digitalization factors do not play a pivotal role in improving a company's performance and financial sustainability. There are other factors, which are not related to digitalization, that have a greater effect on company performance. Nevertheless, in the context of digital globalization, companies cannot develop effectively unless they correspond to modern digital requirements in their activities [43][44][45][46][47][48].
The study results demonstrate that the coefficient of financial independence (KFN) is directly correlated with factorial features that depict the labor efficiency increase (X1/IRPT), expenses on software and ICTs (X2/HBI1, X3/HBI2), and expenses on the advanced professional training of employees (X4/ZPK). Thus, digitalization has a positive effect on the coefficient of the company's financial independence.
The data obtained point to the fact that digitalization factors affect financial sustainability, but their influence is ambiguous. For example, the index of labor efficiency increase (X1/IRPT) has a direct correlation with indicators of a company's financial sustainability, such as the coefficient of financial independence (KFN). This is due to the fact that an increase in labor efficiency leads to increased output, which positively affects the specified indicators of a company's financial sustainability.
However, the index of labor efficiency increase (X1/IRPT) has an inverse correlation with the indicators of a company's financial sustainability, such as the coefficients of the current and absolute liquidity (KTL and KAL), and the coefficient of a short-term debt (KKZ). These findings tell us that increases in labor efficiency lead to increased output and increased sales, which reduce receivables and payables, and decrease the coefficient of a short-term debt.
Digitalization costs (X2/HBI1, X3/HBI2, X4/ZPK) have a mild inverse correlation with some financial sustainability indicators, such as the coefficient of a short-term debt (KKZ), since expenses on the advanced professional training of employees decrease company profits and have a negative impact on the values of the specified parameters.
In most cases, the impact of digitalization factors (X2/HBI1, X3/HBI2, X4/ZPK) has a direct correlation with the indicators of a company's financial sustainability.
It is necessary to point out the strong correlation between predictors X2/HBI1, X3/HBI2, and X4/ZPK, which is quite obvious, since a significant part of the expenses on ICTs (X2/HBI1) causes considerable expenses in the implementation of digital technologies (X3/HBI2) and expenses on the advanced professional training of employees working with ICTs (X4/ZPK). According to the correlation and regression analysis, to ensure the high validity of the created models, it is possible to reject one of the highly correlated factors (X2/HBI1, X3/HBI2, X4/ZPK), or to unite these factors into one (X2/HBI) that embraces the influence of the other related factors and is equal to the sum of these factors.
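The second option, uniting the correlated cost factors into a single predictor equal to their sum, can be sketched as follows. The helper name `merge_predictors` and the cost figures (in thousands of rubles) are hypothetical and are not taken from the study's tables.

```python
def merge_predictors(*columns):
    """Element-wise sum of cost columns measured in the same units."""
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        raise ValueError("all columns must cover the same companies")
    return [sum(values) for values in zip(*columns)]


# Hypothetical costs for three companies, thousands of rubles.
hbi1 = [120.0, 300.0, 80.0]   # X2: expenses on ICTs
hbi2 = [40.0, 90.0, 25.0]     # X3: implementation and maintenance
zpk = [10.0, 30.0, 5.0]       # X4: staff training
hbi = merge_predictors(hbi1, hbi2, zpk)
print(hbi)  # [170.0, 420.0, 110.0]
```

Summing is meaningful here only because all three factors are expenses expressed in the same monetary units; factors on different scales would need a different aggregation.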
The strength of correlation, i.e., the degree of the impact of digitalization factors on a company's financial sustainability, can be considered mild or moderate based on the study's findings. It can be concluded that digitalization factors are not determinants of the level of financial sustainability.
In addition to the correlation between digitalization factors and indicators of a company's financial sustainability, multiple regression models for the dependence of each indicator on digitalization factors have been built. Table 6 contains sampling data on the calculation of the multiple regression models for the dependence of a company's financial sustainability indicators on the above-mentioned digitalization factors. When determining the values of the indicators used in the models, the companies' financial statistics have been taken for a certain period, which is stipulated in the assignment (the study was conducted under government orders).
An insignificant value of the regression coefficients of the multiple models is justified by the fact that the majority of the financial sustainability indicators range from 0 to 1. The factorial features that characterize digitalization's impact on the company's financial sustainability are shown in relatively greater numbers. Moreover, as stated above, digitalization's impact on a company's financial sustainability is not pivotal in the numerical estimation of a company's financial sustainability indicators.
In the study, the quality of the devised regression equation was estimated to determine the reliability of further study results. Table 7 furnishes the results of the regression analysis of the functional dependence of the financial independence coefficient (27) on the examined factorial features. The values of the following parameters of the regression statistics are depicted in Table 7: multiple correlation coefficient ($R$); determination coefficient ($R^2$); determination coefficient adjusted for degrees of freedom ($R^2_A$); standard error ($\mu$); and Durbin-Watson statistics ($d$). All of these parameters describe the quality of the multiple models for the estimation of digitalization factors' impact on the coefficient of a company's financial independence.
The value of the multiple correlation coefficient ($R$), equal to 0.707, shows that there is a positive correlation between the performance indicator and the predictors under study. The low value of the determination coefficient ($R^2$), equal to 0.499, points to the insufficient quality of the created regression equation. The low value of the adjusted determination coefficient implies that substantial factors, which are more influential in terms of the effective feature, are not included in the regression equation.
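The adjustment for degrees of freedom can be illustrated with the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), where k is the number of predictors. In the sketch below, R^2 = 0.499 is the value quoted in the text, whereas n = 16 observations and k = 4 predictors are assumptions. The adjusted value actually reported in Table 7 is not available here, so the result shows only what the standard formula yields under these assumptions.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# R^2 is quoted in the text; n_obs and n_predictors are assumptions.
value = adjusted_r2(0.499, n_obs=16, n_predictors=4)
print(round(value, 3))  # 0.317
```

The penalty grows as the number of predictors approaches the number of observations, which is why a small sample with several predictors can show a noticeably lower adjusted coefficient.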
Taking into consideration the high correlation between the factors X2/HBI1, X3/HBI2, and X4/ZPK, regression models with two factors have been created as well. These factors are X1/IRPT (index of the labor efficiency increase) and X2/HBI (expenses on ICT purchasing, implementing, and staff training). Factor X2/HBI is equal to the sum of the values of X2/HBI1, X3/HBI2, and X4/ZPK. Sampling data on the results of creating a two-factor regression model of the dependency of a company's financial sustainability indicators on digitalization factors can be seen in Table 8. The analysis of the quality of the two-factor regression models was carried out, and the sampling data on the absolute liquidity ratio can be seen in Table 9.

Table 9. The basic parameters of the regression statistics for the two-factor model of the functional relationship between digitalization factors and the absolute liquidity ratio.

The calculations show that the regression statistics' parameters of the two-factor regression do not differ much from those of the multiple models. On the whole, it can be claimed that there is a correlation between the chosen predictors that reflect the digitalization processes, on the one hand, and a company's financial sustainability and performance, on the other; however, these factors are not determinants in the estimation of a company's financial sustainability.
As a result of this study, the following contributions to research into financial sustainability are made:
− The real-world experience of digitalization, in terms of its impact on company performance, has been examined;
− Mathematical tools of correlation and regression analyses have been investigated to identify the functional relationship of digitalization indicators;
− Digitalization factors that have an effect on a company's performance and financial sustainability have been systematized;
− The nature of the manifestations and consequences of digitalization's impact on a company's financial sustainability has been described;
− The multiple regression model for the estimation of digitalization's impact on financial sustainability has been created;
− The multiple models that were created were validated with statistics on some Russian companies;
− The matrix of the pair correlations of the system of the effective and factorial features that reflect digitalization's impact on financial sustainability has been built and described;
− Multiple regression models for the dependence of financial sustainability indicators on digitalization factors have been created;
− The results of the correlation and regression analyses of the multiple models created have been described.

Conclusions
The results of this research contribute to the study of digitalization's impact on company performance and financial sustainability. The selected digitalization factors can be referred to at both the macro and micro levels when analyzing the experience of different countries. The tools of correlation and regression analysis have allowed us to identify a certain correlation between a company's financial sustainability indicators and digitalization factors. With the multiple models created, it has become possible to determine the invariable regression models of the dependence of different exogenous and endogenous factors.
This research has a number of restrictions and conditions. Firstly, the sample included only regional medium-sized companies. With an expanded sample that takes into account large companies, the results could be different.
Secondly, factors such as the index of labor efficiency increase and the expenses on ICT purchasing, implementation, and staff training have been taken as predictors that characterize digitalization's impact on a company's performance and financial sustainability. With expanded fields of research and additional statistical data, other indicators of digitalization's impact can be included.
Thirdly, an extended study period, a larger sample, and additional information on digitalization's effects at the macro and micro levels could provide a background for data validation and further research.
Despite the restrictions and conditions, the practical implications of the research assume the application of the devised approach to create multiple models for assessing digitalization's impact on a company's financial sustainability.