1. Introduction
Time series models are used to analyze millions of transactions, spot patterns, determine relationships, and detect abnormalities and irregularities in dependent data. The emergence of business sustainability creates an opportunity to further examine the application of time series models using financial economic performance information and non-financial environmental, social, and governance (ESG) sustainability performance information. We address the application of time series models in business research by discussing the differences between correlation, association, and Granger causality, and we provide practical examples of their application in analyzing financial and non-financial sustainability data and the relationships between them. Time series analyses have traditionally been used in many disciplines, such as finance, marketing, engineering, and the medical sciences. In economics, finance, and marketing, the main use of time series analyses is forecasting. In marketing, time series analyses focus primarily on detecting patterns in consumers' buying habits to predict their future purchases. Time series models such as the random walk, the random walk with drift, and white noise are the most commonly used time series models in economics and finance. However, time series analyses are uncommon in the accounting and auditing professions, particularly in the sustainability literature. Thus, this study explores the use of time series analyses in examining the financial and non-financial dimensions of sustainability performance and its consequences for decision-making by all stakeholders.
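To make the three models named above concrete, the following minimal Python sketch simulates white noise, a random walk, and a random walk with drift from the same shocks. It assumes NumPy is available; the function name `simulate_series` and all parameter values are illustrative choices, not taken from this study.

```python
import numpy as np

def simulate_series(n=200, drift=0.5, sigma=1.0, seed=0):
    """Return (white_noise, random_walk, random_walk_with_drift) built from the same shocks."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=n)       # white noise: e_t ~ N(0, sigma^2)
    random_walk = np.cumsum(shocks)               # y_t = y_{t-1} + e_t, starting from 0
    walk_with_drift = np.cumsum(drift + shocks)   # y_t = drift + y_{t-1} + e_t
    return shocks, random_walk, walk_with_drift

noise, rw, rwd = simulate_series()

# Differencing a random walk recovers the white-noise shocks;
# differencing a random walk with drift recovers the shocks plus the drift.
print(np.allclose(np.diff(rw), noise[1:]))          # True
print(np.allclose(np.diff(rwd), noise[1:] + 0.5))   # True
```

The differencing check shows why the random walk is non-stationary in levels but its first difference is plain white noise.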
Prior studies on sustainability-related research published in premier business and accounting journals conclude that, despite several decades of research and more than 100 sustainability-related empirical studies, the results regarding the link between sustainability performance and financial performance are mixed because of differences in empirical model specifications [1,2]. Business sustainability is defined in the literature as the process of achieving financial economic sustainability, which generates the desired long-term return on investment for shareholders, while protecting the interests of other stakeholders by achieving environmental, social, and governance (ESG) sustainability performance [2]. We argue that the mixed results of prior sustainability studies stem from the use of different periods, estimation methods, and variable definitions and constructions and, more importantly, from the interpretation of results in terms of correlation, association, and causation. Motivated by these shortcomings and mixed results, we attempt to add to the sustainability literature by investigating the differences between, and the interpretations of, correlation, association, causation, and Granger causality. We present examples of correlation, association, causation, and Granger causality, examine their main differences, and illustrate why a linear regression is inappropriate when the true relationship is non-linear. Finally, we discuss the policy, practical, and educational implications by showing how time series models can be applied efficiently and effectively in business sustainability: to develop predictive models of managerial strategies, decisions, and actions; to evaluate the feasibility, cost efficiency, and effectiveness of new rules and regulations; and to incorporate time series into data science algorithms that capture all relevant financial and non-financial information for decision-making.
The remainder of this paper is organized as follows: Section 2 presents the institutional foundation, whereas Section 3 provides the literature review. Section 4 presents practical examples, and the implications of these examples are offered in Section 5. The last section concludes the paper.
2. Institutional Foundation
In econometrics textbooks, the most commonly used representation is the structural equation model (SEM). This form of econometric representation is so important that almost all econometrics textbooks begin with a discussion of SEM. As an example, prior studies [3] examine the effect of excise cigarette taxes on the extent of smoking using the following simple linear regression model:

$$Y = \beta X + \varepsilon$$
In this equation, the dependent variable, Y, is the extent of smoking; the independent variable, X, is the excise cigarette tax; and ε is the error term, which captures all errors in the model (e.g., measurement errors and model misspecification). To estimate β (called the effect coefficient) in this model, it is critical that X and ε be independent of each other. The independence of X from ε is known as the exogeneity of X, and X is then called an exogenous variable. In fact, the error term represents the effects of all other variables that are not included in the model, whereas X represents the variables that are included in the model. If X and the error term are not independent, then X is endogenous and the ordinary least squares estimate of β is biased. Researchers argue that if all underlying assumptions of the SEM are maintained, then the model can answer all questions related to causal relationships [3].
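A small simulation can illustrate why exogeneity matters. In the Python sketch below, which assumes NumPy and uses purely synthetic data (the helper `ols_slope` and all parameter values are hypothetical), the true β is 2. When X is independent of ε, the OLS slope should be close to 2. When X shares an omitted factor with ε, the slope converges to β + Cov(X, ε)/Var(X), which is 2.5 under these settings.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(42)
n, beta = 100_000, 2.0
u = rng.normal(size=n)                 # omitted factor that ends up in the error term
eps = u + rng.normal(size=n)           # error term of Y = beta * X + eps

x_exog = rng.normal(size=n)            # X independent of eps (exogenous)
x_endog = rng.normal(size=n) + u       # X shares the omitted factor u with eps (endogenous)

slope_exog = ols_slope(x_exog, beta * x_exog + eps)
slope_endog = ols_slope(x_endog, beta * x_endog + eps)
print(round(slope_exog, 2), round(slope_endog, 2))
```

The bias does not shrink with more data: increasing `n` only makes the endogenous estimate settle more tightly around the wrong value.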
Haavelmo [4] concludes that in the linear equation Y = βX + ε, βX is the expected value of Y given that we set the value of X at x, or simply βx = E[Y│do(x)], which is different from the conditional expectation E[Y│x] [5]. Some studies argue that this interpretation has been misunderstood or questioned by many econometricians [6]. For example, Goldberger [7] agrees with the interpretation that considers βX to be the expected value of Y given that x is fixed, while Wermuth [8] disagrees and instead argues that βX is not E[Y│x].
The main difference between the interpretations of Goldberger [7] and Wermuth [8], between which econometrics textbooks fall, is whether structural equations carry a causal meaning. Some econometrics textbooks posit that SEM equations represent causal relationships, while other textbooks posit that SEM equations represent the joint probability distribution. These two points of view are the extremes, and most econometrics textbooks fall somewhere between them.
Chen and Pearl [6] argue that the main source of confusion is the lack of a precise mathematical definition of a causal relationship. They state that SEM equations are used for two different purposes: predictive problems and causal problems (or policy decisions). In predictive problems, one seeks to answer the question of what the value of Y will be given that we observe the value of X to be x. In predictive problems, we can define β by the expression βx = E[Y│x], but it is incorrect to define β in the same way for causal problems, which ask what the value of Y will be if we set X to x, that is, βx = E[Y│do(x)].
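The distinction between E[Y│x] and E[Y│do(x)] can be illustrated with a simulated confounder. The sketch below is an illustration under stated assumptions (NumPy, synthetic data, and a hypothetical function `simulate`): a variable Z drives both X and Y, so the observational regression slope answers the predictive question, whereas setting X by intervention answers the causal one.

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta = 200_000, 1.0

def simulate(intervene=None):
    """Simulate Z -> X, Z -> Y and X -> Y with structural coefficient beta.

    If `intervene` is given, X is set to that value for every unit (do(X = x)),
    which cuts the arrow from Z to X.
    """
    z = rng.normal(size=n)                          # confounder
    if intervene is None:
        x = z + rng.normal(size=n)                  # X is observed and partly driven by Z
    else:
        x = np.full(n, float(intervene))            # X is set by intervention
    y = beta * x + 3.0 * z + rng.normal(size=n)
    return x, y

# Predictive question: slope of E[Y | x], estimated from observational data.
x_obs, y_obs = simulate()
predictive_slope = np.polyfit(x_obs, y_obs, 1)[0]

# Causal question: E[Y | do(x = 1)] - E[Y | do(x = 0)].
_, y_do1 = simulate(intervene=1.0)
_, y_do0 = simulate(intervene=0.0)
causal_effect = y_do1.mean() - y_do0.mean()

print(round(predictive_slope, 2), round(causal_effect, 2))
```

Under these settings, the predictive slope should be near 2.5, while the interventional effect is near the structural coefficient of 1. Both numbers are "correct" answers, but to different questions.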
Another relevant concept is ceteris paribus, which is widely used in economics and is directly linked to causal relationships. In economics, the law of demand states that when the price of a good rises, the quantity demanded of that good decreases, ceteris paribus, that is, holding other factors fixed. By the same reasoning, when all other variables are held fixed (ceteris paribus), any relationship between Y and X in Y = βX + ε must be a causal relationship.
Another concept tied to causal relationships is the exogeneity of X. In a linear relationship between Y and X, X is exogenous when it is independent of all other factors (variables) included in ε. For example, in a completely randomized design in which participants are randomly assigned to either the control or the treatment group, independent of their characteristics, we can argue that X is exogenous. This interpretation of exogeneity differs from the alternative interpretation in which βX is defined as E[Y│X]. In other words, if the researcher is interested only in a conditional expectation or prediction, then the causal relationship is of no importance. This argument is consistent with the textbook by Hill, Griffiths and Lim [9].
As discussed earlier, X must be exogenous and uncorrelated with ε in order to estimate β in the relationship Y = βX + ε. In this equation, ε is the effect of all other variables that cause change in Y and are not included in X. The coefficient β represents the change in Y when X changes by one unit, holding all other variables fixed (ceteris paribus). In addition, Chen and Pearl [6] argue that if we incorrectly consider βX to be the expected value of Y given X, or E[Y│X], then the statement that X is independent of ε becomes meaningless. In this context, E[Y│X] is called the conditional expectation of Y. If we are interested only in the conditional expectation, then any bias in the causal relationship can be ignored, and we can reliably use the regression equation to estimate β, the slope of the equation.
Furthermore, Chen and Pearl [6] argue that if we force X to be exogenous through randomization, then we estimate not the conditional expectation but the interventional expectation. They add that the conditional expectation and the interventional expectation are not the same and posit that, “by requiring that exogeneity be a default assumption of the model, we limit its application to trivial and uninteresting problems, providing no motivation to tackle more realistic problems” [6].
In short, we argue that business researchers need to differentiate between correlation, association, causation, and Granger causality. Correlation is a statistical measure of the relationship between two variables that ignores the effects of other variables. The correlation coefficient ranges between −1 and +1, with −1 indicating a perfect negative correlation and +1 indicating a perfect positive correlation. A coefficient close to zero indicates little or no linear relationship, and it equals zero when the two variables are linearly uncorrelated. In calculating the correlation coefficient, no effort is made to control for the effects of other related variables. In contrast, in calculating a measure of association, the researcher examines the relationship between two variables while holding the effects of all other related variables fixed. In other words, association is represented by β in the relationship between Y and X, which indicates the extent of change in Y when X changes, holding fixed the effects of all other variables captured in ε (ceteris paribus).
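The difference between correlation and association can be sketched numerically. In the hypothetical example below (NumPy, synthetic data), W is a related variable that moves with X and also affects Y. The Pearson correlation between X and Y ignores W, while the coefficient on X in a regression that also includes W holds the effect of W fixed. With these settings, the correlation is about 0.77 and the association coefficient is about 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
w = rng.normal(size=n)                  # another related variable (e.g., firm size)
x = w + rng.normal(size=n)              # X moves together with W
y = 0.5 * x + 2.0 * w + rng.normal(size=n)

# Correlation: X and Y only, with no control for W.
correlation = np.corrcoef(x, y)[0, 1]

# Association: coefficient on X in a regression that also includes W,
# so the effect of W is held fixed.
design = np.column_stack([np.ones(n), x, w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
association = coef[1]

print(round(correlation, 2), round(association, 2))
```

A researcher who reads the strong correlation as the effect of X would overstate it, because much of the co-movement runs through W.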
In the study of causation, or the cause–effect relationship between two variables, researchers are concerned with the effect of X on Y. In other words, in the presence of a causal relationship, we posit that X causes changes in Y. For causation from X to Y (for X to cause Y) to hold, three conditions must be present: (1) X and Y must vary together, (2) X must occur before Y, and (3) no other variable may cause the change in Y (when the effects of these other variables are controlled). That is, the researcher should show that when X does not change, Y does not change either. We believe that condition (3) is the most difficult one to achieve, and that this difficulty is the main reason causation is rarely used, or is used incorrectly, in the business literature.
The difficulty of establishing a causal relationship between two variables moved researchers toward a special form of causation called “Granger causality.” Granger [10] introduced a specific form of causation that later became known as “Granger causality.” He posits that if one variable Granger causes another, then the past values of the first variable help predict the value of the second variable beyond what the past values of the second variable already predict.
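Granger's definition translates directly into a comparison of two regressions: Y on its own lags (restricted) and Y on its own lags plus lags of X (unrestricted). The Python sketch below computes the usual F statistic for that comparison with NumPy; the helper names, the lag order p = 2, and the simulated data are illustrative assumptions. In applied work, a library routine such as `grangercausalitytests` in statsmodels performs this kind of test and reports p-values.

```python
import numpy as np

def lagged_columns(series, p):
    """Columns s_{t-1}, ..., s_{t-p}, aligned with the targets s_p, ..., s_{T-1}."""
    T = len(series)
    return [series[p - lag:T - lag] for lag in range(1, p + 1)]

def residual_ss(X, target):
    """Residual sum of squares from an OLS fit of target on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid)

def granger_f(x, y, p=2):
    """F statistic for H0: lags of x add nothing to predicting y beyond y's own lags."""
    target = y[p:]
    const = np.ones(len(target))
    restricted = np.column_stack([const] + lagged_columns(y, p))
    full = np.column_stack([const] + lagged_columns(y, p) + lagged_columns(x, p))
    rss_r, rss_f = residual_ss(restricted, target), residual_ss(full, target)
    df_num, df_den = p, len(target) - full.shape[1]
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)

# Synthetic example: x feeds into y with a one-period lag, but not the reverse.
rng = np.random.default_rng(3)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)
print(round(f_x_to_y, 1), round(f_y_to_x, 1))
```

Because the simulated x enters y with a one-period lag, the F statistic for x → y should be far larger than the one for y → x. With p = 2 and several hundred observations, the 5% critical value of the F distribution is roughly 3.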
The above discussion reveals that the strongest relationship between two variables is a causal relationship; however, when it is not possible to show a cause–effect relationship, the next best relationship is Granger causality. Furthermore, most business researchers are interested in using a linear model to fit their data. Even though a linear model may be a good approximation, it is not appropriate in many cases, as we show later in this paper.
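As an illustration of how a linear fit can mislead, the sketch below generates a quadratic relationship. It assumes NumPy, uses synthetic data, and relies on a hypothetical helper `r_squared`. Because X is symmetric around zero, the linear fit explains almost none of the variation, even though Y is almost entirely determined by X.

```python
import numpy as np

def r_squared(observed, fitted):
    """Share of the variance in `observed` explained by `fitted`."""
    return 1.0 - np.sum((observed - fitted) ** 2) / np.sum((observed - observed.mean()) ** 2)

rng = np.random.default_rng(5)
x = np.linspace(-3, 3, 400)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)   # true relationship is quadratic

linear_fit = np.polyval(np.polyfit(x, y, 1), x)
quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)

r2_linear = r_squared(y, linear_fit)
r2_quadratic = r_squared(y, quadratic_fit)
print(round(r2_linear, 3), round(r2_quadratic, 3))
```

A researcher relying only on the linear model would conclude that X and Y are unrelated, which is the opposite of the truth in this simulated case.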
Taken together, the extant business literature examines relationships between pairs of variables, but in most cases researchers do not properly differentiate between correlation, association, and causation, and many use these terms interchangeably despite their major differences. Given the above discussion, this study attempts to show how the use of a linear relationship can be misleading in some cases and how sustainability research can extend beyond reporting only the correlation and association between ESG sustainability performance and financial performance. Using practical examples, we show how the Granger causality test, which is based on time series analysis, can be incorporated into sustainability research.
3. Literature Review
A large body of literature discusses the applications of econometric tools such as correlation, association, and Granger causality. Correlation analysis is used in almost all studies that use regression equations to check that the independent variables are not highly correlated with each other, because high correlation between independent variables can result in a multicollinearity problem. Multicollinearity inflates the variance of the estimated regression coefficients, making them imprecise and unstable. Examples of correlation analyses can be found in prior studies [11,12,13,14].
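The variance inflation factor (VIF) is a common diagnostic for multicollinearity. A minimal NumPy sketch, with a hypothetical function name and synthetic regressors, regresses each independent variable on the others and reports 1/(1 − R²). Values above about 10 are often treated as a warning sign, although that cutoff is a rule of thumb rather than a formal test.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(11)
n = 1_000
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)     # nearly a copy of a: strong multicollinearity
c = rng.normal(size=n)               # unrelated regressor

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 1))
```

Here the two near-duplicate regressors receive VIFs around 100, while the unrelated regressor stays close to 1.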
Regression analysis is extensively used by researchers in different fields. For example, Francis and Mialon [15] examine the association between the duration of marriages and wedding expenses by conducting a survey of 3000 married individuals in the United States and relating wedding expenses (ceremonies and engagement rings) to marriage duration in a regression. Francis and Mialon [15] find a negative association between these two variables, suggesting that higher wedding expenses are associated with shorter marriages. Other studies examine the association between determinants of marriage quality and marriage duration [16,17,18,19,20].
In the area of Granger causality, Cevher [21] posits that X Granger causes Y if Y can be predicted better by considering past observations of both X and Y than by considering past observations of Y alone. As an example, Cevher [21] argues that parents' expenditures on their children's education contribute to the children's future success, finding a positive association between spending on education and educational outcomes. This type of relationship provides an example of classical Granger causality, whose validity can be tested when both variables (in a two-variable model) are stationary. When the two variables are not stationary but are associated in the long run (that is, they are cointegrated), a standard vector autoregression (VAR) model estimated in first differences is misspecified, and the classical Granger causality test is not appropriate. When at least one variable is not stationary, the Toda–Yamamoto test can and should be used to test for Granger causality among the variables [22].
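A rough single-equation sketch of the Toda–Yamamoto idea follows. It assumes NumPy and synthetic data; it is not a full VAR implementation, and the function name and lag choices are hypothetical. The regression is estimated in levels with p + d_max lags, where d_max is the maximum order of integration, but the Wald test restricts only the first p lags of X. The extra d_max lags are what keep the statistic's usual chi-square limit when the series are non-stationary. With p = 2, the chi-square(2) p-value has the closed form exp(−W/2).

```python
import numpy as np

def toda_yamamoto_wald(x, y, p, d_max):
    """Wald statistic for H0: x does not Granger-cause y, in the spirit of Toda-Yamamoto.

    y_t is regressed, in levels, on a constant and on lags 1..(p + d_max) of y and x.
    Only the first p lags of x are tested; the extra d_max lags stay unrestricted.
    """
    k = p + d_max
    T = len(y)
    y_lags = [y[k - lag:T - lag] for lag in range(1, k + 1)]
    x_lags = [x[k - lag:T - lag] for lag in range(1, k + 1)]
    X = np.column_stack([np.ones(T - k)] + y_lags + x_lags)
    target = y[k:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    idx = np.arange(1 + k, 1 + k + p)   # positions of x lags 1..p in the design matrix
    b = coef[idx]
    return float(b @ np.linalg.solve(cov[np.ix_(idx, idx)], b))

rng = np.random.default_rng(8)
T = 400
x = np.cumsum(rng.normal(size=T))   # random walk: non-stationary, integrated of order 1
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + 0.5 * x[t - 1] + rng.normal()

w_xy = toda_yamamoto_wald(x, y, p=2, d_max=1)
w_yx = toda_yamamoto_wald(y, x, p=2, d_max=1)
# With p = 2 restrictions, the chi-square(2) p-value is exp(-W / 2).
print(np.exp(-w_xy / 2), np.exp(-w_yx / 2))
```

In practice, d_max would come from unit-root tests and p from a lag-selection criterion; both are fixed by assumption here.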
There are two categories of Granger causality analysis: classical and modern (advanced). For classical Granger causality, Cevher [21] posits that pairwise Granger causality should be used when there is only one dependent variable and one independent variable. In this type of analysis, the researcher therefore analyzes variables two at a time. In contrast, modern (advanced) Granger causality involves more than two variables, and the dependent and independent variables are not determined in advance; instead, they are determined using an R package. For a discussion of this R package and its application in advanced (modern) Granger causality, the reader can refer to Boelstraete and Rosseel [23].
Granger causality can be either conditional or partial. Conditional Granger causality is used when the Granger causality from X to Y is assessed in the presence of other variables, such as Z, whose effects must be accounted for. When conditional Granger causality fails (that is, when exogenous variables are present), prior research concludes that partial Granger causality should be applied [24]. Partial Granger causality takes into account the underlying relationship among all variables in a network. Finally, some authors question the validity of a special type of Granger causality used in neuroscience, called Granger–Geweke causality, and conclude that it may not be applicable without considering the critical components of the system [25]. These authors argue that a lack of attention to the critical components of the system model can lead to spurious results [25]. However, Barnett, Barrett and Seth [26] reject this criticism, arguing that it results from a misconception as well as an incomplete review of the related literature. The next section provides several practical examples of using correlation, association, causation, and Granger causality.
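Conditional Granger causality can be illustrated with a common driver. In the hypothetical NumPy sketch below, Z affects X after one period and Y after two periods, while X has no effect on Y. A pairwise test finds that past X predicts Y, but once past Z is included in both the restricted and the unrestricted model, the lags of X add essentially nothing. This is a simple F-test sketch under stated assumptions, not the network-based partial Granger causality procedure cited in this section.

```python
import numpy as np

def lags(series, p):
    """Columns s_{t-1}, ..., s_{t-p}, aligned with the targets s_p, ..., s_{T-1}."""
    T = len(series)
    return [series[p - lag:T - lag] for lag in range(1, p + 1)]

def f_stat(y, base_series, extra_series, p=2):
    """F statistic for adding lags of `extra_series` to a model of y on lags of `base_series`."""
    target = y[p:]
    restricted = np.column_stack([np.ones(len(target))] + [c for s in base_series for c in lags(s, p)])
    full = np.column_stack([restricted] + lags(extra_series, p))

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        r = target - X @ coef
        return r @ r

    df_den = len(target) - full.shape[1]
    return ((rss(restricted) - rss(full)) / p) / (rss(full) / df_den)

rng = np.random.default_rng(21)
T = 1_000
z = rng.normal(size=T)
x = np.zeros(T)
y = np.zeros(T)
x[1:] = z[:-1] + 0.5 * rng.normal(size=T - 1)   # Z drives X after one period
y[2:] = z[:-2] + 0.5 * rng.normal(size=T - 2)   # Z drives Y after two periods

pairwise = f_stat(y, [y], x)         # X -> Y, ignoring Z
conditional = f_stat(y, [y, z], x)   # X -> Y, conditioning on past Z
print(round(pairwise, 1), round(conditional, 1))
```

The pairwise statistic is large only because past X is a noisy proxy for past Z; conditioning on Z removes this spurious Granger causality.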
5. Policy, Practical, Education and Research Implications
5.1. Policy Implications
The time series models presented in this study can be used by regulators, standard-setting bodies, and policy makers to evaluate the efficiency and effectiveness of new standards and rules, so that the economic consequences of these rules and standards can be assessed through cost-benefit analyses and through their cause-and-effect links to the intended objectives. The intended consequences of rules, regulations, and standards can be examined over time using time series models. We posit that causation analysis can be used by regulators to evaluate the effects of their proposed regulations and standards.
We posit that time series models can be used by regulators in evaluating the financial statements of public companies. The Sarbanes-Oxley Act of 2002 requires the Securities and Exchange Commission (SEC) to review the financial statements of public companies at least once every three years to prevent and detect fraud and irregularities. Time series models and analyses can be submitted as supplementary information through the SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. This supplementary information can help investors better evaluate the pattern of data and detect unusual changes in bottom-line information. Time series analyses are consistent with Section 408 of the Sarbanes-Oxley Act, which requires the SEC to review the annual financial statements of public companies for compliance with SEC rules and regulations. Lastly, time series analyses can be used by the Public Company Accounting Oversight Board (PCAOB) in inspecting audits of annual financial statements.
Several organizations worldwide, including the Global Reporting Initiative (GRI), the International Integrated Reporting Council (IIRC), the Sustainability Accounting Standards Board (SASB), and the United Nations Global Compact, have issued guidelines on the disclosure of non-financial environmental, social, and governance (ESG) sustainability performance information [33]. These organizations can use the time series analyses presented in this paper to determine the causes and effects of voluntary ESG sustainability disclosure and the materiality benchmark that should be used in determining the type and extent of such disclosure.
5.2. Practical Implications
Time series models and analyses can be used in practice for different purposes, such as detecting symptoms of financial misstatements caused by errors, fraud, and irregularities. Past data can be used to build time series models for forecasting future events and earnings. These forecasts can then be compared with actual data to detect unusual fluctuations and to investigate the variances (the differences between forecasted and actual data). Correlation, association, and special cases of causation can also be used in practice, for example, to determine the link between non-financial ESG sustainability performance and financial performance. Such comparisons are of interest both to public companies and to their investors, who can develop hypotheses about the link between ESG sustainability and financial performance and then collect data to support or reject these hypotheses. Hypotheses are developed by evaluating the pattern of past data together with correlation, association, or Granger causality. In addition, time series information can be used to test the causes and effects of sustainability disclosures.
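The forecast-and-compare idea can be sketched with the random walk with drift. In the illustration below (NumPy; the earnings series, the function `flag_unusual`, and the three-standard-deviation threshold are all hypothetical choices), the drift and shock size are estimated from historical differences, each new value is forecast from the value before it, and variances larger than k standard deviations are flagged for investigation.

```python
import numpy as np

def flag_unusual(history, actual, k=3.0):
    """Compare actual values with one-step-ahead random-walk-with-drift forecasts.

    history: past observations used to estimate the drift and the shock size.
    actual: new observations to check; each is forecast from the value before it.
    Returns the forecasts and a boolean flag for |actual - forecast| > k * sigma.
    """
    diffs = np.diff(history)
    drift, sigma = diffs.mean(), diffs.std(ddof=1)
    previous = np.concatenate([[history[-1]], actual[:-1]])
    forecasts = previous + drift
    flags = np.abs(actual - forecasts) > k * sigma
    return forecasts, flags

rng = np.random.default_rng(4)
earnings = 100 + np.cumsum(1.0 + rng.normal(scale=0.5, size=40))   # hypothetical quarterly earnings
history, new = earnings[:32], earnings[32:].copy()
new[3] += 5.0                                                      # planted anomaly
forecasts, flags = flag_unusual(history, new)
print(np.flatnonzero(flags))
```

A one-off jump is flagged twice: in the period it occurs and in the following period, because the inflated value becomes the base of the next forecast.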
The relationship between financial performance and ESG performance has been debated extensively yet inconclusively in the literature over the past decade, which has led investors to pay insufficient attention to the sustainability factors of risk, performance, and disclosure [2,34]. However, a growing number of investors now consider the impact of their investments, with a keen focus on both financial return and ESG sustainability factors; regulators mandate ESG sustainability performance disclosure; and public companies prepare and disseminate sustainability reports. In this era of sustainability-oriented investors, directors, and executives, a major challenge is to show that ESG sustainability factors contribute to bottom-line earnings and long-term returns. Time series analyses can be used to demonstrate the causes and effects of the global move toward sustainability initiatives.
5.3. Education Implications
As discussed earlier in this paper, time series models and analyses are not adequately utilized in the business literature. Business schools and accounting programs can focus on the differences between correlation, association, and causation to ensure that these concepts are not used inappropriately or interchangeably. Educating students about these topics is of great importance for courses that deal with budgeting, sustainability, and forecasting. We posit that the inadequate understanding of correlation, association, and causation results from accounting students' unfamiliarity with these topics. Therefore, we recommend that business schools and accounting programs incorporate these topics into related courses and better educate students in this regard.
We posit that the underutilization of time series models in business results from the unfamiliarity of academics, students, and practitioners with time series concepts. Even though time series courses are offered in the economics departments of many universities, usually as part of the college of arts and sciences, only a small number of business students take these courses, because they are not classified as core courses and it is not common practice for business students to take them. Therefore, we recommend that business schools and accounting programs worldwide incorporate time series courses into their business and accounting curricula.
5.4. Research Implications
As discussed in the literature review, business researchers have not yet adequately used time series models and analyses in their studies. The common practice among business scholars is to use traditional statistics, in which they develop hypotheses and then collect data to test them. In time series statistics, by contrast, researchers collect data and investigate its patterns to generate hypotheses. The complexity of real-life activities, the high cost of a cross-sectional approach, and the large amount of unobservable noise in real-world data make time series research more efficient and effective. We hope our study will open a window of opportunity for using time series concepts in business and, particularly, in the emerging and opaque sustainability area, where the link between financial economic performance and non-financial ESG sustainability performance is not well established. To structure and effectively lead business and sustainability research, we recommend that a framework and taxonomy for sustainability be developed for use by researchers. We emphasize that the availability and expansion of public data have made a time series approach increasingly feasible in sustainability research, compared with studies that use a cross-sectional approach.
6. Conclusions
Time series models have been used in business research to transform unstructured and semi-structured data into structured information and thereby improve the quality of financial and non-financial information. Researchers apply time series analyses to examine the value relevance of non-financial ESG information and its link to financial and market performance. These studies often find mixed results regarding the relationship between financial economic sustainability performance information and non-financial ESG sustainability performance information because of differences in time periods, variable definitions and construction, hypothesis development and justification, and estimation methods. The quality and usefulness of sustainability studies could also improve through proper interpretation of results in terms of correlation, association, and causation. In this paper, we discuss the differences between correlation, association, and Granger causality and their implications for business research. Business studies often focus on determining the association between variables of interest; thus, researchers examine the relationship between two variables while holding the effects of other related variables fixed (ceteris paribus). In scientific studies, researchers are able to examine causation, or the cause–effect relationship between two variables (e.g., smoking and cancer). For causation from X to Y (for X to cause Y) to hold, three conditions must be present: (1) X and Y must vary together, (2) X must occur before Y, and (3) no other variable may cause the change in Y (when the effects of these other variables are controlled). These conditions are often difficult to satisfy in business research.
The difficulty of establishing a causal relationship between two variables encourages business researchers to consider a special case of causation called “Granger causality,” which uses the past values of one variable to predict the value of a second variable beyond what the past values of the second variable predict. We offer practical examples of correlation, association, and Granger causality and discuss their main differences. We present analyses of the progression from correlation (examining the movement of two variables without controlling for the effects of other variables) to association (examining the movement of two variables while holding the effects of other variables fixed) and to causation (identifying which variables cause change in the dependent variable). We conclude that because causation is extremely difficult to establish, the alternative is to use Granger causality, the second-best option after causality. We show, using an empirical example, how a linear regression may not be appropriate when the true relationship is not linear. Finally, we discuss the policy, practical, and educational implications of our paper. Academics conducting business sustainability research can use our suggested examples to examine the possible link between financial economic sustainability performance and non-financial ESG sustainability performance, use time series analyses to detect patterns in unstructured data, develop testable research hypotheses, and estimate association models that produce economically and statistically significant, robust results.