The Determinants of the U.S. Consumer Sentiment: Linear and Nonlinear Models

Abstract: We examined the determinants of the U.S. consumer sentiment by applying linear and nonlinear models. The data are monthly from 2009 to 2019, covering a large set of financial and nonfinancial variables related to the stock market, personal income, confidence, education, environment, sustainability, and innovation freedom. We show that more than 8.3% of the total of eigenvalues deviate from the Random Matrix Theory (RMT) and might contain pertinent information. Results from linear models show that variables related to the stock market, confidence, personal income, and unemployment explain the U.S. consumer sentiment. To capture nonlinearity, we applied the switching regime model and showed a switch towards a more positive sentiment regarding energy efficiency, unemployment rate, student loan, sustainability, and business confidence. We additionally applied the Gradient Descent Algorithm to compare the errors obtained in linear and nonlinear models, and the results imply a better model with a high predictive power.


Introduction
Previous empirical studies provide evidence on the association between consumer sentiment and economic and financial variables (Gupta et al. 2014; Fisher and Huh 2016; Baghestani and Palmer 2017; Shahzad et al. 2019). For example, the academic literature is rich on how sentiment can explain returns on stocks (Schmeling 2009; Akhtar et al. 2011; Chung et al. 2012; Balcilar et al. 2018; Zhou 2018). Given the importance of the consumer sentiment to business-cycle analysis (Lahiri and Zhao 2016), it is very informative to extend the related literature and understand the determinants of the consumer sentiment, especially in the largest economy, the U.S. This is important as the empirical evidence indicates that economic and financial crises are reflected in a decrease in economic activities, personal incomes, spending, and a depressed labor market. Such economic implications on consumers will ultimately affect consumer sentiment and thus consumers' perceptions of the overall economy and of their personal financial conditions (van Giesen and Pieters 2019). In addition to its association with income, wealth, and stock market performance, consumer sentiment can be affected by the establishment of an environmental governance system that is found to be beneficial to economic conditions. Furthermore, there are many advantages of being more energy-efficient, and more environmentally friendly, which leads to more sustainability and satisfaction (Issa et al. 2011). However, previous studies tend to focus on consumer sentiment in a relatively generic setting and often ignore the determinants of consumer sentiment in linear and nonlinear models. Especially, there is lack of

Literature Review
Previous studies have examined investor sentiment from several perspectives, but not from a consumer sentiment perspective. Fisher and Statman (2000) examined the relationship between Wall Street strategists and the sentiment of individual investors and found evidence of a negative relationship. Baker and Wurgler (2006) studied how investor sentiment can affect the cross-section of stock returns. They defined investor sentiment as the degree to which market participants are overly optimistic or pessimistic about financial markets. They showed that investor sentiment, broadly defined, has significant cross-sectional effects, which undermines classical finance theory, in which investor sentiment does not play any role in the cross-section of stock prices, realized returns, or expected returns. Baker and Wurgler (2007) indicated that it is quite possible to measure investor sentiment, and that waves of sentiment have clearly discernible, important, and regular effects on individual firms and on overall stock market returns. Kurov (2008) analyzed the sentiment of traders through feedback trading and found that positive feedback trading appears to be more active in periods of high investor sentiment, which is consistent with the notion that feedback trading is driven by expectations of noise traders. Cofnas (2015) indicated that business and consumer confidence data are a powerful source of information that can move financial markets and showed that, when related survey results are released, they provide important information on expectations regarding the local economy. Focusing on the drivers of consumer sentiment over business cycles, Lahiri and Zhao (2016) showed that macroeconomic variables can explain consumer sentiment and highlighted the role of household perceptions of their own financial and employment prospects. Paraboni et al. (2018) showed the existence of a significant relationship between measures of market sentiment and risk. The developed U.S. and German markets demonstrate a stronger relationship between optimism and risk, while the emerging Chinese market demonstrates a stronger relationship between pessimism and risk. Wadud et al. (2019) showed the impact of consumer confidence on U.S. household credit delinquency rates. Zhang and Pei (2019) explored the impact of investor sentiment on stock returns of petroleum companies by using a binomial probability distribution model to build a daily investor sentiment endurance index. According to their results, the index can effectively predict the stock returns of petroleum companies, and the sentiment effect becomes stronger in periods of economic expansion. Bouteska (2019) examined whether investor sentiment has a moderating effect on the impact of earnings restatements on security prices by studying the cumulative abnormal returns and investor sentiment. The results show that investor conservatism represents a dominant factor in explaining the positive relationship between cumulative abnormal return and investor sentiment.
In light of the above studies, we contribute to the academic literature by studying consumer sentiment from a different perspective, focusing on the linear and nonlinear relationships between the U.S. consumer sentiment and several financial and nonfinancial explanatory variables related to the stock market, confidence, education, environment, sustainability, and innovation freedom. Notably, the Gradient Descent Algorithm is new to this literature, and its application refines the prediction models involving the determinants of the U.S. consumer sentiment. In fact, we show that the sum of errors computed by using the Gradient Descent Algorithm is smaller than that found in the ordinary linear regressions and the switching regime model, which indicates that the algorithm yields smaller errors and could be used to produce better predictions than the other models.

Data
Our data are at the monthly frequency. Given that data on several variables under study are not all at the daily frequency, we opted for the monthly frequency for all the variables under study. Accordingly, where needed, we computed the monthly growth/returns for daily series, by calculating the average growth/return observed during each month. Our data cover the following series that are often used in previous studies:

• University of Michigan Consumer Sentiment Index is a monthly survey of U.S. consumer confidence levels conducted by the University of Michigan. It is based on telephone surveys that gather information on consumer expectations regarding the overall economy.
• Bloomberg Barometer Startups Global Index measures both the occurrence and level of historical and recent venture activity for U.S.-based startups excluding biotechnology. The index is a gauge of startup activity that equally considers capital raised, deal count, first financings, and exit count.
• Business Confidence Index provides information on future developments, based upon opinion surveys on developments in production, orders, and stocks of finished goods in the industry sector.
• Dow Jones Sustainability United States 40 Index is composed of U.S. sustainability leaders as identified by Sustainable Asset Management (SAM) through a corporate sustainability assessment. The index represents the top 20% of the largest 600 U.S. companies in the Dow Jones Sustainability U.S. Index based on long-term economic, environmental, and social criteria.
• Morgan Stanley Capital International (MSCI) Global Energy Efficiency Index includes developed and emerging market large-, mid-, and small-cap companies that derive 50% or more of their revenues from products and services in energy efficiency.
• MSCI USA ESG Leaders Index is a capitalization-weighted index that provides exposure to companies with high Environmental, Social, and Governance (ESG) performance relative to their sector peers.
• Personal Income in Billions is the income that persons receive in return for their provision of labor, land, and capital used in current production and the net current transfer payments that they receive from business and from government.
• S&P Carbon Efficiency Index is designed to measure the performance of companies in the S&P 500, while overweighting or underweighting those companies that have lower or higher levels of carbon emissions per unit of revenue.
• S&P Consumer Finance Index provides liquid exposure to mortgage real estate investment trusts (REITs), thrifts and mortgage finance companies, diversified and regional banks, consumer finance or data processing services companies trading on U.S. stock exchanges.
• S&P Municipal Bond Education Index consists of bonds in the S&P Municipal Bond Index from the Higher Education and Student Loan Sectors.
• U.S. unemployment rate is defined as the percentage of unemployed people who are currently in the labor force. In order to be in the labor force, a person either must have a job or have looked for work in the last four weeks.
Data are taken from Bloomberg and S&P databases and cover the period October 2009-July 2019. We denote by $p_t$ the monthly average level of a series in month $t$. We compute the natural logarithmic growth/returns of each series as $r_t = \ln\left(\frac{p_{t+1}}{p_t}\right)$, which yields 118 monthly observations.
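The log growth/return transformation can be illustrated with a minimal sketch. The function name `log_returns` and the sample levels are hypothetical and are not taken from the data used in the paper.

```python
import math

def log_returns(levels):
    """Return r_t = ln(p_{t+1} / p_t) for consecutive monthly average levels."""
    if any(p <= 0 for p in levels):
        raise ValueError("levels must be strictly positive")
    return [math.log(nxt / cur) for cur, nxt in zip(levels, levels[1:])]

# Hypothetical monthly average index levels (illustrative only).
monthly_levels = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(monthly_levels)
```

A convenient property of log returns is that they add up over time: the sum of the monthly values equals the log growth over the whole window.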

Empirical Models and Results
In this section, we examine multicollinearity, analyze the correlation matrix based on the Random Matrix Theory (RMT), and conduct regression analyses. First, we analyze the logarithmic growth/returns of each variable to understand some of their features as well as the structure of cross-correlation, which helps in refining our models. Then, we run multiple regressions to uncover how each of the variables contributes to the explanation of the U.S. consumer sentiment.

Multicollinearity Analysis
The presence of multicollinearity among the independent variables is assessed via the variance inflation factor (VIF): The variables are said to be not correlated if the VIF is close to one, moderately correlated if the VIF is between one and five, and highly correlated if the VIF exceeds five. Table 1 shows that only three variables have a VIF above five (DOW JONES SUSTAINABILITY U.S. INDEX; MSCI USA LEADERS INDEX; SP500 CARBON EFFICIENT).
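As a rough illustration of how a VIF can be obtained, the sketch below uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the remaining regressors. The helper name and the synthetic data are assumptions made for illustration; the values in Table 1 are not reproduced here.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic example: column c is nearly collinear with column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
vif = variance_inflation_factors(np.column_stack([a, b, c]))
```

In this synthetic example, the two nearly collinear columns receive VIFs well above five, while the independent column stays close to one.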

Random Matrix Theory Analysis
Using RMT, Pafka and Kondor (2004) found that the effect of noise in the correlation matrices of financial series can be large and that the filtering based on RMT is particularly powerful in this respect. Laloux et al. (1999, 2000) indicated that the empirical correlation matrix leads to a dramatic underestimation of the real risk, by overinvesting in artificially low-risk eigenvectors. They showed that less than 6% of the eigenvectors, which are responsible for 26% of the total volatility, appear to carry some information. In order to quantify correlations, we first calculate the growth/return of series $i = 1, \ldots, N$ over a time scale $\Delta t$,

$$G_i(t) = \ln S_i(t + \Delta t) - \ln S_i(t),$$

where $S_i(t)$ denotes the level of series $i$. Since different series (variables) have varying levels of volatility (standard deviation), we define a normalized return

$$g_i(t) = \frac{G_i(t) - \langle G_i \rangle}{\sigma_i},$$

where $\sigma_i = \sqrt{\langle G_i^2 \rangle - \langle G_i \rangle^2}$ is the standard deviation of $G_i$ and $\langle \cdots \rangle$ denotes a time average over the period studied. We then compute the equal-time cross-correlation matrix $C$ with elements

$$C_{ij} = \langle g_i(t)\, g_j(t) \rangle.$$

By construction, the elements $C_{ij}$ are restricted to the domain $-1 \le C_{ij} \le 1$, where $C_{ij} = 1$ corresponds to perfect correlations, $C_{ij} = -1$ corresponds to perfect anti-correlations, and $C_{ij} = 0$ corresponds to uncorrelated pairs of series.
The difficulties in analyzing the significance and meaning of the empirical cross-correlation coefficients C ij are due to the fact that market conditions change with time, and the cross correlations that exist between any pair of variables may not be stationary.
Furthermore, the finite length of time series available to estimate cross-correlations introduces "measurement noise".
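If we have $N$ return series of the same length $L$, the empirical cross-correlation matrix $C$ can be computed as $C_{ij} = \frac{1}{L} \sum_{m=0}^{L-1} g_i(m\Delta t)\, g_j(m\Delta t)$. In our case, we have $N = 12$ and $L = 118$. By diagonalizing matrix $C$, we obtain

$$C u_k = \lambda_k u_k,$$

where $\lambda_k$ are the eigenvalues and $u_k$ the corresponding eigenvectors. In matrix notation, the correlation matrix can be expressed as

$$C = \frac{1}{L} G G^T,$$

where $G$ is an $N \times L$ matrix with elements $g_{im} \equiv g_i(m\Delta t)$, $i = 1, \ldots, N$; $m = 0, \ldots, L - 1$, and $G^T$ denotes the transpose of $G$. Therefore, we consider a random correlation matrix

$$R = \frac{1}{L} A A^T,$$

where $A$ is an $N \times L$ matrix containing $N$ time series of $L$ random elements $a_{im}$ with zero mean and unit variance, which are mutually uncorrelated.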
Statistical properties of random matrices such as $R$ are known (e.g., Dyson 1971; Sengupta and Mitra 1999). Particularly, in the limit $N \to \infty$, $L \to \infty$, such that $Q \equiv L/N\ (> 1)$ is fixed, the probability density function $P_{rm}(\lambda)$ of eigenvalues $\lambda$ of the random correlation matrix $R$ is given by

$$P_{rm}(\lambda) = \frac{Q}{2\pi\sigma^2}\, \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda}$$

for $\lambda$ within the bounds $\lambda_- \le \lambda_i \le \lambda_+$, where $\lambda_-$ and $\lambda_+$ are, respectively, the minimum and maximum eigenvalues of $R$ given by

$$\lambda_\pm = \sigma^2 \left(1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}\right),$$

where $\sigma^2$ is equal to the mean of eigenvalues of the correlation matrix (Bouchaud and Potters 2003). The distribution of the components $\{u_k(l);\ l = 1, 2, \ldots, N\}$ of an eigenvector $u_k$ of a random correlation matrix $R$ should obey the standard normal distribution with zero mean and unit variance (Plerou et al. 2002),

$$\rho_{rm}(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right).$$

We observe that there are deviations from the interval of eigenvalues $[\lambda_-, \lambda_+]$ predicted by RMT. These deviating values might contain pertinent information, and therefore they are not noisy elements.
The theoretical eigenvalue bounds (maximum and minimum) are $\lambda_{max} = 1.7395$ and $\lambda_{min} = 0.4639$. We have 12 ($N$) series and 118 ($L$) monthly returns for each series, so that $Q = L/N = 9.8333$. By analyzing the results, we observed in Figure 1 that eigenvalues deviate from the interval predicted by RMT. Laloux et al. (2000) found that less than 6% of eigenvalues might contain pertinent information. In our case, these deviations represent 8.33% of the total number of eigenvalues, which is a substantial percentage; the remaining 91.67% of eigenvalues fall within the interval predicted by RMT. Moreover, the largest empirical eigenvalue ($\lambda_1 = 5.0374$) exceeds the maximum predicted by random matrix theory ($\lambda_{max} = 1.7395$).
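The reported bounds can be checked numerically. The sketch below assumes the standard RMT expression for the bounds, $\lambda_\pm = \sigma^2 (1 + 1/Q \pm 2\sqrt{1/Q})$ with $\sigma^2 = 1$ for a correlation matrix; the function name `rmt_bounds` is hypothetical.

```python
import math

def rmt_bounds(n_series, n_obs, sigma2=1.0):
    """Return Q = L/N and the RMT bounds (lambda_min, lambda_max)."""
    q = n_obs / n_series
    spread = 2.0 * math.sqrt(1.0 / q)
    lam_min = sigma2 * (1.0 + 1.0 / q - spread)
    lam_max = sigma2 * (1.0 + 1.0 / q + spread)
    return q, lam_min, lam_max

# N = 12 series and L = 118 monthly observations, as stated in the text.
q, lam_min, lam_max = rmt_bounds(n_series=12, n_obs=118)

# Largest empirical eigenvalue reported in the text.
largest_empirical = 5.0374
deviates = largest_empirical > lam_max
```

With $N = 12$ and $L = 118$, this gives $Q \approx 9.8333$, $\lambda_{min} \approx 0.4639$, and $\lambda_{max} \approx 1.7395$, in agreement with the values quoted above. Note also that, with 12 eigenvalues, a single eigenvalue accounts for $1/12 \approx 8.33\%$ of the total.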

Regression Analysis
In the following, we present the results from regressing the University of Michigan Consumer Sentiment Index on the various independent variables. Table 2 presents the results from the ordinary least squares (OLS) linear regression for the undifferentiated variables (Model 1) and the differentiated variables (Model 2). In both models, the F-statistic is significant, suggesting that the independent variables jointly influence the University of Michigan Consumer Sentiment Index. In Model 1, only two variables are significant at the 5% level, and the adjusted R-squared is about 7%. The MSCI USA Leaders Index is significant at the 5% level, suggesting that companies with high Environmental, Social, and Governance (ESG) performance contribute positively and importantly to the improvement of the U.S. consumer sentiment. In Model 2, six variables are significantly related to the University of Michigan Consumer Sentiment Index. These are Business Confidence Index (-1), Dow Jones Sustainability Index (-1), MSCI Global Energy Efficiency (-1), Personal income (-4), SP Consumer Finance Index (-3), and Unemployment rate (-3). Besides, the adjusted R-squared improved from 7.18% in Model 1 to 23.43% in Model 2. Model 2 could only be used for predictions given that the differentiations were done iteratively until we got the best results. Overall, some of our results are in line with Lahiri and Zhao (2016), who showed that macroeconomic conditions can explain the sentiment of U.S. consumers.
Note: *, **, *** denote significance at 10%, 5%, and 1% levels, respectively. ‡ denotes series that follow a normal distribution according to the Jarque-Bera test (Jarque and Bera 1980).
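A minimal sketch of an OLS fit with a lagged regressor and the corresponding adjusted R-squared is given below. It assumes that a suffix such as (-1) denotes a one-period lag; the helper names and the synthetic data are hypothetical and do not reproduce Table 2.

```python
import numpy as np

def lagged(series, k):
    """Return the series shifted by k >= 1 periods; the first k entries are NaN."""
    series = np.asarray(series, dtype=float)
    out = np.full(series.shape, np.nan)
    out[k:] = series[:-k]
    return out

def ols_fit(X, y):
    """OLS with intercept; returns coefficients, R-squared, adjusted R-squared."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2

# Synthetic example: y depends on the one-period lag of x.
rng = np.random.default_rng(42)
x = rng.normal(size=118)
x_lag1 = lagged(x, 1)
y = 0.5 * x_lag1 + 0.1 * rng.normal(size=118)

keep = ~np.isnan(x_lag1)            # drop the observation lost to lagging
beta, r2, adj_r2 = ols_fit(x_lag1[keep].reshape(-1, 1), y[keep])
```

The adjusted R-squared penalizes the number of regressors, which is why it is the statistic used to compare Model 1 and Model 2 above.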
Next, we split the full sample period into two equal sub-periods to assess whether the estimated model maintains the same predictive power. The related results are given in Table 3. Notably, they show the significance of seven variables in the first sub-period compared to only one variable in the second sub-period. The adjusted R-squared is 37% in the first sub-period and 1.5% in the second sub-period. Accordingly, we can say that the model is not stable over time since it shows very different results in each sub-period. This suggests the need to move beyond the linear regression in order to capture nonlinearity in the model.
Note: *, **, *** denote significance at 10%, 5%, and 1% levels, respectively.

Regime Switching Model
We applied the regime-switching model (Hamilton 2005), which is used in previous studies (e.g., Geng et al. 2016).
For one variable, the typical behavior could be described with a first-order autoregression as follows,

$$y_t = c_1 + \phi y_{t-1} + \varepsilon_t,$$

with $\varepsilon_t \sim N(0, \sigma^2)$, which seemed to adequately describe the observed data for $t = 1, 2, \ldots, t_0$. Suppose that $t_0$ is a date at which there is a significant change in the average level of the series, so that instead the data would be described as follows,

$$y_t = c_2 + \phi y_{t-1} + \varepsilon_t$$

for $t = t_0 + 1, t_0 + 2, \ldots$ This fix of changing the value of the intercept from $c_1$ to $c_2$ might help the model to get back on track with better forecasts, but it is rather unsatisfactory as a probability law that could have generated the data. Rather than claim that the first equation above governed the data up to date $t_0$ and the second one after that date, it is possible to write both in one equation,

$$y_t = c_{s_t} + \phi y_{t-1} + \varepsilon_t,$$

where $s_t$ is a random variable that, as a result of institutional changes that happened in the sample, takes the value $s_t = 1$ for $t = 1, 2, \ldots, t_0$ and $s_t = 2$ for $t = t_0 + 1, t_0 + 2, \ldots$ A complete description then requires a probabilistic model of what caused the change from $s_t = 1$ to $s_t = 2$. Here, $s_t$ is the realization of a two-state Markov chain with

$$\Pr(s_t = j \mid s_{t-1} = i, s_{t-2} = k, \ldots, y_{t-1}, y_{t-2}, \ldots) = \Pr(s_t = j \mid s_{t-1} = i) = p_{ij}.$$

The regime $s_t$ is not observed directly; its operation is inferred only through the observed behavior of $y_t$. The probability of a change in regime depends on the past only through the value of the most recent regime (Hamilton 2005). Furthermore, if the regime change reflects a fundamental change in monetary or fiscal policy, the prudent assumption would seem to be to allow the possibility for it to change back again, suggesting that $p_{22} < 1$ is often a more natural formulation for thinking about changes in regime than $p_{22} = 1$ (Hamilton 2005). We present in Table 4 the results of the regime-switching model.
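To make the data-generating process concrete, the sketch below simulates a two-state Markov-switching AR(1) in which only the intercept depends on the regime. The function name, the parameter values, and the transition matrix are hypothetical and chosen for illustration; they are not the estimates in Table 4. States are coded 0 and 1 for Regimes 1 and 2.

```python
import numpy as np

def simulate_ms_ar1(c, phi, sigma, P, n, seed=0):
    """Simulate y_t = c[s_t] + phi * y_{t-1} + eps_t with Markov regime s_t.

    P[i][j] = Pr(s_t = j | s_{t-1} = i); states 0 and 1 stand for Regimes 1 and 2.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    s = np.zeros(n, dtype=int)      # start in Regime 1 (state 0)
    y = np.zeros(n)
    y[0] = c[0]
    for t in range(1, n):
        # The next regime depends only on the most recent regime.
        s[t] = rng.choice(2, p=P[s[t - 1]])
        y[t] = c[s[t]] + phi * y[t - 1] + sigma * rng.normal()
    return y, s

# Illustrative parameters (assumptions, not estimates from the paper).
P = [[0.9, 0.1],
     [0.2, 0.8]]
y, s = simulate_ms_ar1(c=(0.0, 1.0), phi=0.5, sigma=0.3, P=P, n=200)

# With p_11 = 1 the chain never leaves Regime 1 (the permanent-regime case).
_, s_absorbing = simulate_ms_ar1(c=(0.0, 1.0), phi=0.5, sigma=0.3,
                                 P=[[1.0, 0.0], [0.0, 1.0]], n=50)
```

In estimation the regime path is not observed, so it must be inferred from $y_t$; the simulation only illustrates the probability law being assumed.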
Results from Regime 1 show that the SP Municipal Bond Education is statistically significant at the 10% level, reflecting the positive impact of the student loan on the University of Michigan Consumer Sentiment Index. The SP Consumer Finance Index and the unemployment rate are also significant at the 5% level. The SP Consumer Finance Index contributes significantly and positively to the consumer sentiment index, while the unemployment rate contributes negatively to it. In Regime 1, the U.S. consumer seems to be frustrated about the other variables and has less confidence in sustainability variables, innovation freedom, energy efficiency policies, and personal income expectations. However, MSCI Global Energy Efficiency, SP Municipal Bond Education, and Personal Income contribute negatively to the level of the University of Michigan Consumer Sentiment Index after switching from Regime 1 to Regime 2. This result can be explained by an important change in the social and economic state in the U.S. Furthermore, other variables become significant; Business Confidence Index and MSCI USA Leaders Index have a significant and positive impact. This can be explained by the fact that the improvement of businesses directly and positively affects consumer sentiment. As for the positive impact of the MSCI USA Leaders Index, it can be explained by the fact that U.S. consumers are highly satisfied by the Environmental, Social, and Governance (ESG) performance of companies belonging to the MSCI USA Leaders Index. Finally, personal income shows a negative relationship with the University of Michigan Consumer Sentiment Index. Based on the above results, the regime-switching model presents a switch from Regime 1, where U.S. consumers were not confident about the variables studied in relation to sustainability, personal income, environment, and business confidence, to Regime 2, where there was a switch towards more confidence and more positive sentiment regarding energy efficiency, unemployment rate, student loan, sustainability, and business confidence.
Furthermore, results from Table 5 show that the probability of being in Regime 1 is more than 68%, while the probability of being in Regime 2 is 31.56%. We also observe that both probabilities do not depend on the origin state. The constant expected duration is also higher in Regime 1, at 3.17 months, compared to only 1.47 months in Regime 2. Accordingly, U.S. consumers seem to stay most of the time in Regime 1 and are usually less confident in sustainability variables, innovation freedom, energy efficiency policies, and personal income expectations. The switch to Regime 2 is seasonal and depends on the social, economic, political, and institutional mutations in the U.S.
Table 5. Constant simple transition probabilities and constant expected durations of the regime-switching regression.
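For a Markov chain with constant transition probabilities, the expected duration of regime $j$ is $1 / (1 - p_{jj})$. The sketch below applies this standard relation to the probabilities quoted in the text, under the assumption that the probability of Regime 1 is one minus the reported 31.56% (consistent with "more than 68%").

```python
def expected_duration(p_stay):
    """Expected number of periods spent in a regime with staying probability p_stay."""
    return 1.0 / (1.0 - p_stay)

p22 = 0.3156        # probability of Regime 2 reported in the text
p11 = 1.0 - p22     # assumption: probabilities sum to one, independent of origin state

d1 = expected_duration(p11)   # Regime 1, in months
d2 = expected_duration(p22)   # Regime 2, in months
```

This gives roughly 3.17 months for Regime 1 and 1.46 months for Regime 2, close to the 3.17 and 1.47 months reported; a small gap of this size would be expected from rounding of the reported probabilities.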

Gradient Descent Algorithm
We computed a gradient descent for the linear regression in order to compare the results obtained. Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient (Cauchy 1847). Algorithms play an important role in the optimization process. They are defined as a finite sequence of well-defined, computer-implementable instructions, in order to solve a class of problems or to perform a computation. Gradient descent allows us to update the linear regression coefficients iteratively until convergence. We then minimize the mean squared error (the cost function), that is, the average squared difference between the predicted and the observed values.
Rescaling this cost function simplifies the calculation in the Gradient Descent Algorithm and gives the following equation,

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x_i) - y_i\right)^2,$$

where $h_\theta(x_i) = \theta_0 + \theta_1 x_i$ is the predicted value and $m$ is the number of observations. Gradient descent changes the theta values iteratively in a way that minimizes the cost function. We start the algorithm by initializing $\theta_0$ and $\theta_1$ and then repeat the update

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \quad j = 0, 1,$$

where $\alpha$ (alpha) is the learning rate, or how quickly we want to move towards the minimum. If $\alpha$ is too large, however, we can overshoot. The algorithm is repeated until convergence. After running the algorithm, we obtained the results presented in Table 6. We can see that the sum of errors computed by using the Gradient Descent Algorithm is smaller than that found in the ordinary linear regressions and the switching regime model. The coefficients cannot be interpreted, since this machine learning algorithm aims to minimize the cost function regardless of the meaning of the coefficients. Thus, this model could be used to make more accurate predictions, since it has smaller errors than the other models computed above, which gives it a high predictive power.
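A minimal sketch of batch gradient descent for a one-regressor linear model is given below. It assumes the common half-mean-squared-error cost $J = \frac{1}{2m}\sum_i (\theta_0 + \theta_1 x_i - y_i)^2$; the learning rate, iteration count, and synthetic data are illustrative assumptions and do not reproduce Table 6.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iter=5000):
    """Minimize J = (1/2m) * sum((theta0 + theta1*x - y)^2) by batch gradient descent."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0                  # initialization
    for _ in range(n_iter):
        error = theta0 + theta1 * x - y        # prediction minus observation
        grad0 = error.sum() / m                # dJ/dtheta0
        grad1 = (error * x).sum() / m          # dJ/dtheta1
        # Simultaneous update of both coefficients.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=118)
y = 0.3 + 0.8 * x + 0.2 * rng.normal(size=118)

theta0, theta1 = gradient_descent(x, y)

# Reference solution from closed-form least squares (slope first, then intercept).
ols_slope, ols_intercept = np.polyfit(x, y, 1)
```

As a sanity check on the implementation, the converged coefficients in this sketch can be compared with the closed-form least-squares fit of the same linear specification on the same data.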

Conclusions
In this paper, we examined the relationship between the U.S. consumer sentiment and other relevant financial indexes in relation to education, environment, sustainability, and innovation freedom. We started by analyzing the structure of all the variables via cross-correlation and RMT analysis. Results show that more than 8.33% of the eigenvalues deviate from the RMT predictions and thus contain pertinent information, which means that those variables are useful for our analysis. Then, we used the linear regression, which fails to capture the nonlinear interaction among the variables, as is especially apparent after estimating the linear regression in two equal sub-periods. Accordingly, we employed the regime-switching regression, and the results show that the model presents a switch from Regime 1, where U.S. consumers were not confident about the variables studied in relation to sustainability, personal income, environment, and business confidence, to Regime 2, where there was a switch towards more confidence and more positive sentiment regarding energy efficiency, unemployment rate, student loan, sustainability, and business confidence. However, U.S. consumers stay most of the time in Regime 1 and are usually less confident in sustainability variables, innovation freedom, energy efficiency policies, and personal income expectations. The switch to Regime 2 is seasonal and depends on the social, economic, political, and institutional mutations in the U.S. Finally, we computed the Gradient Descent Algorithm to compare the errors obtained in each model. We found that the algorithm yields smaller errors and could be used to produce better predictions than the other models.
Our analyses and results extend our limited understanding regarding the exogenous factors that determine the U.S. consumer sentiment and the suitability of prediction models. In fact, we have shown that noneconomic and nonfinancial variables matter to the level of the U.S. consumer sentiment and that nonlinear models combined with Gradient Descent Algorithm have a more significant prediction power over standard regression models.
Our results have policy implications given that the findings presented above improve our understanding of the factors driving the sentiment of U.S. consumers. The results can be useful to investors in a way that would help them better understand the drivers of the U.S. consumers' sentiment and the overall level of confidence in the U.S. economy. Furthermore, the results have implications regarding consumption, saving, investment, and other related variables. Consumers can benefit from the findings to enhance their understanding of the most important problems that are impacting their sentiments, which might induce economic, social, and political consequences through voting decisions or economic and social adjustments. For policymakers, there seems to be a possibility to design policies capable of exploiting the association between current economic and stock markets conditions and consumers' confidence. Given the significant role played by specific factors, a practical policy formulation is merited to enhance U.S. consumer confidence with appropriate initiatives that involve education, environment, sustainability, and innovation freedom. If employed, such policies can enhance the well-being of the U.S. consumers.
Future studies can consider conducting an analysis that involves various developed and emerging countries. Another extension could be the application of a mixed-data sampling to exploit the high frequency of data on stock indices in explaining consumer confidence.