Survey expectations elicited from ETS have advantages over experimental expectations: they are based on the knowledge of agents who operate in the market, they provide detailed information about many economic variables, and, above all, they are available ahead of the publication of official quantitative data. These characteristics make them particularly useful for prediction. Moreover, since BCS questionnaires are harmonized, survey results allow comparisons among different countries’ business cycles and have become an indispensable tool for monitoring the evolution of the economy.
2.1. Quantification of Qualitative Survey Expectations
The balance statistic was proposed by [28,29], who defined it as a measure of the average changes expected in the quantitative variable of reference. Let $y_t$ be the actual average percentage change of variable $y$, and $y_{it}$ the specific change for agent $i$ at time $t$. By discriminating between agents according to whether they reported an increase or a decrease, and assuming that the expected changes ($\alpha$ for increases, $-\beta$ for decreases) remain constant both over time and across respondents, the author formalized the relationship between actual changes in the variable and respondents’ expectations as $y_t = \alpha P_{t-1}^{t} - \beta M_{t-1}^{t} + \varepsilon_t$, where $P_{t-1}^{t}$ and $M_{t-1}^{t}$ denote the shares of respondents expecting an increase and a decrease, and $\varepsilon_t$ is a mean-zero disturbance. The sub index denotes the period in which the survey was responded to, i.e., the period in which the expectation was formed, while the supra index denotes the period to which the expectation refers. Thus, by means of ordinary least squares, estimates of $\alpha$ and $\beta$ can be obtained and then used to generate one-period-ahead forecasts of $y_{t+1}$:
$$\hat{y}_{t+1} = \hat{\alpha}\,P_{t}^{t+1} - \hat{\beta}\,M_{t}^{t+1}.$$
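A minimal numerical sketch of this two-step procedure is given below; the synthetic shares, the realized series, and the variable names are illustrative assumptions rather than data or code from the studies cited above.

```python
import numpy as np

# Illustrative (synthetic) data: shares of respondents reporting an
# increase (P) and a decrease (M), and the realized percentage change y.
rng = np.random.default_rng(0)
P_perc = rng.uniform(0.2, 0.6, 40)          # perceptions reported at t about period t
M_perc = rng.uniform(0.1, 0.4, 40)
y = 1.5 * P_perc - 1.0 * M_perc + rng.normal(0, 0.1, 40)

# Step 1: estimate alpha and beta by OLS in y_t = alpha*P_t - beta*M_t + e_t
X = np.column_stack([P_perc, -M_perc])
alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: apply the estimates to the expectations reported at t about t+1
P_exp, M_exp = 0.55, 0.20                   # shares expecting an increase/decrease
y_forecast = alpha * P_exp - beta * M_exp
print(f"alpha={alpha:.2f}, beta={beta:.2f}, one-period-ahead forecast={y_forecast:.2f}")
```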
This framework was augmented by [30,31], allowing for an asymmetrical relationship between individual changes and the evolution of the quantitative variable of reference. By regressing actual values of the variable on respondents’ perceptions of the past ($P_{t}^{t}$, $M_{t}^{t}$), non-linear estimates of the parameters can be obtained and then used to generate forecasts of $y_{t+1}$ from the reported expectations.
This approach to the quantification of expectations came to be known as the regression method. In the following years, this framework was expanded by [32], who made positive and negative individual changes dependent on past values of the quantitative variable of reference, and proposed a non-linear dynamic regression model to quantify survey responses.
A drawback of the regression approach to quantifying survey responses is that there is no empirical evidence that agents judge past values in the same way as when they formulate expectations about the future [26]. As a result, the regression approach is restricted to expectations of variables over which agents have direct control, be it prices or production. Additionally, the implementation of this method requires the availability of individual data. For an appraisal of individual firm data on expectations, see [33].
Alternatively, in [34], the author developed a theoretical framework to generate quantitative estimates from the balance statistic. Based on the assumption that respondents report a variable to go up (or down) if the mean of their subjective probability distribution lies above (or below) a certain level, the author defined the indifference threshold, also known as the difference limen. Let $x_{it}^{e}$ denote the unobservable expectation that agent $i$ holds at time $t$ about the change of variable $x$; the indifference interval can then be defined as $(a_{it}, b_{it})$, where $a_{it}$ and $b_{it}$ are the lower and upper limits of the indifference threshold for agent $i$ at time $t$. Assuming that the response bounds are symmetric and fixed both across respondents and over time ($a_{it} = -\delta$, $b_{it} = \delta$), and that agents base their responses on independent subjective probability distributions that have the same form across respondents, an aggregate density function can be derived. Thus, the shares of respondents reporting an increase ($P_t$) and a decrease ($M_t$) can be regarded as consistent estimates of the corresponding population proportions. This framework is summarized in Figure 1, where the individual density functions are assumed to be normally distributed.
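To make the mapping sketched in Figure 1 explicit, the block below spells out how the aggregate shares of responses relate to the aggregate distribution of expectations under normality; the notation ($x^{e}_t$ and $\sigma_t$ for the mean and standard deviation of the aggregate distribution, $\Phi$ for the standard normal distribution function) is chosen here for illustration.

```latex
% Illustrative notation: x_t^e and sigma_t are the mean and standard deviation
% of the aggregate (normal) distribution of expected changes, and delta is the
% symmetric indifference threshold.
\begin{align*}
P_t &= \Pr\!\left(x^{e}_{it} > \delta\right)  = 1 - \Phi\!\left(\frac{\delta - x^{e}_t}{\sigma_t}\right),\\
M_t &= \Pr\!\left(x^{e}_{it} < -\delta\right) = \Phi\!\left(\frac{-\delta - x^{e}_t}{\sigma_t}\right),\\
E_t &= 1 - P_t - M_t .
\end{align*}
```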
This theoretical framework was further developed by [35], configuring what came to be known as the probability approach or the Carlson–Parkin method. The main notion behind this quantification procedure is that estimates of the expected change, $x_{t}^{e}$, are conditional on a particular value of the imperceptibility parameter ($\delta$) and on a specific form of the aggregate density function. The authors assumed that the individual density functions were normally distributed and estimated $\delta$ by assuming that, over the in-sample period, $x_{t}^{e}$ is an unbiased estimate of the actual change $x_t$. Consequently, the role of $\delta$ becomes that of scaling aggregate expectations so that the average value of $x_{t}^{e}$ equals the average value of $x_t$. Therefore, qualitative responses can be transformed into quantitative estimates as follows:
$$x_{t}^{e} = \delta\,\frac{f_t + g_t}{g_t - f_t},$$
where $f_t = \Phi^{-1}(1 - P_t)$ and $g_t = \Phi^{-1}(M_t)$ respectively correspond to the abscissae of the standard normal distribution associated with the cumulative shares $1 - P_t$ and $M_t$, and the imperceptibility parameter is computed as:
$$\delta = \frac{\sum_{t} x_t}{\sum_{t} \dfrac{f_t + g_t}{g_t - f_t}}.$$
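The following sketch implements this quantification under the assumptions stated above (normal aggregate distribution, symmetric and constant threshold, unbiasedness over the in-sample period); the function name, the series names, and the synthetic inputs are illustrative.

```python
import numpy as np
from scipy.stats import norm

def carlson_parkin(P, M, y_insample):
    """Quantify qualitative expectations with the probability approach.

    P, M       : shares of 'increase' and 'decrease' responses.
    y_insample : realized changes used to calibrate the threshold (unbiasedness).
    """
    P, M = np.asarray(P, float), np.asarray(M, float)
    f = norm.ppf(1.0 - P)                        # abscissa associated with 1 - P_t
    g = norm.ppf(M)                              # abscissa associated with M_t
    ratio = (f + g) / (g - f)
    delta = np.sum(y_insample) / np.sum(ratio)   # imperceptibility parameter
    return delta * ratio                         # quantified expectations x_t^e

# Illustrative use with synthetic shares and realizations
P = [0.45, 0.50, 0.40, 0.55]
M = [0.20, 0.15, 0.25, 0.10]
y = [1.2, 1.5, 0.9, 1.8]
print(carlson_parkin(P, M, y))
```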
There is no consensus on the type of probability distribution from which aggregate expectations are drawn, and researchers have used alternative distributions [36,37,38,39]. While the normality hypothesis was rejected at first [40,41], later evidence showed that normal distributions provided expectations that were as accurate as those produced by non-normal distributions [42,43,44,45]. Recently, using data on consumers’ price expectations in the EA, it was shown in [46] that the choice of distribution provided only minor improvements in forecast accuracy, while other assumptions, such as unbiased expectations and the number of survey response categories, played a pivotal role.
Another strand of the literature has focused on refining the probability approach by relaxing the assumptions of symmetry and constancy of the indifference bounds, that is, by allowing the response thresholds to vary over time and across respondents. Different alternatives have been proposed in the literature to introduce dynamic imperceptibility parameters into the probability framework for quantification. While some authors chose to make the threshold parameters depend on time-varying quantitative variables [47,48,49], others imposed the unbiasedness condition over predefined subperiods [50]. By introducing the assumption that the imperceptibility parameters were subject to both permanent and temporary shocks, and by using the Cooley–Prescott model, the probabilistic approach was extended to include asymmetric and time-varying indifference thresholds [51].
More recently, this framework was further developed by means of a state-space representation that allows for asymmetric and dynamic response thresholds generated by a first-order Markov process [52,53]. In this setting, the lower and upper thresholds follow first-order autoregressive processes driven by two independent, normally distributed disturbances with mean zero and constant variances, and the thresholds are linked to the observed shares of responses through the Carlson–Parkin framework depicted in Figure 1. Assuming null initial conditions, the authors used the Kalman filter for parameter estimation.
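As an illustration of this type of specification, the stylized sketch below writes the time-varying lower and upper thresholds, $\delta^{l}_t$ and $\delta^{u}_t$, in state-space form; the notation and the measurement equations (implied by the Carlson–Parkin mapping of Figure 1) are chosen here for exposition and need not match the exact formulation in [52,53].

```latex
% Stylized sketch (not the authors' exact model): AR(1) transition equations
% for the two thresholds, and measurement equations implied by the
% Carlson-Parkin framework of Figure 1.
\begin{align*}
\delta^{l}_t &= \lambda_1\,\delta^{l}_{t-1} + \varepsilon_{1t}, &
\delta^{u}_t &= \lambda_2\,\delta^{u}_{t-1} + \varepsilon_{2t},
\qquad \varepsilon_{1t}\sim N(0,\sigma^{2}_{1}),\ \varepsilon_{2t}\sim N(0,\sigma^{2}_{2}),\\
\delta^{l}_t &= x^{e}_t + \sigma_t\,\Phi^{-1}\!\left(M_t\right), &
\delta^{u}_t &= x^{e}_t + \sigma_t\,\Phi^{-1}\!\left(M_t + E_t\right).
\end{align*}
```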
Building on the notion that the indifference parameters may depend on past values of an economic variable, a smooth transition model was used in [54] to allow for time variation in the scaling parameter. Similarly, based on the results obtained in [55], where it is shown that inflation expectations depend on agents’ previous experience, the author of [56] expanded the Carlson–Parkin framework by determining an experience horizon and assuming that agents’ expectations are distributed in the same way as the actual variable was distributed over the period defined by that horizon.
There is inconclusive evidence regarding the variation of the indifference thresholds across agents. Since this hypothesis can only be tested through the analysis of individual expectations or by generating experimental expectations via Monte Carlo simulations, further improvements of quantification procedures have mostly been developed at the micro level, comparing individual responses with firm-by-firm realizations [57,58,59,60]. A procedure to quantify individual categorical expectations was developed based on the assumption that responses were triggered by a latent continuous random variable [57]; the authors found evidence against thresholds that are constant over time. Another variant of the Carlson–Parkin method developed at the micro level, with asymmetric and time-invariant thresholds, pivoted around the “conditional absolute null” property, an assumption based on the empirical finding that the median of realized quantitative values corresponding to “no change” responses is zero [58]. This approach made it possible to solve the zero-response problem, which arises when the share of respondents reporting an increase or a decrease equals zero.
Using a matched sample of qualitative and quantitative individual stock market forecasts, the authors of [59] corroborated the importance of introducing asymmetric and dynamic indifference parameters, but found that individual heterogeneity across respondents did not play a major role in forecast accuracy. Based on a matched sample of households, and using a hierarchical ordered probit model, the authors of [60] found strong evidence against the threshold constancy, symmetry, homogeneity, and overall unbiasedness assumptions of the probability method, showing that when the unbiasedness assumption is replaced by a time-varying calibration, the resulting quantified series better tracks the quantitative benchmark.
In parallel with the analysis at the micro level, the methodology has also been developed by means of experimental expectations. As with quantified survey expectations, simulated expectations have usually been used to test economic hypotheses, such as rational expectations [61], and to assess the performance of the different quantification methods. In this regard, some authors have focused on estimating the measurement error introduced by the probabilistic method [62,63]. In [62], the author proposed a refinement of the Carlson–Parkin method. In [64], computer-generated expectations were used to assess the forecasting performance of different quantification methods; the author also presented a variation of the balance statistic that takes into account the proportion of respondents reporting that the variable remains unchanged.
By means of a simulation experiment, the author of [65] additionally showed that the omission of neutral responses resulted in an overestimation of the level of individual heterogeneity across respondents. Dispersion-based metrics of disagreement among respondents have been used in recent years to proxy economic uncertainty. In this sense, in [66], the authors generalized the Carlson–Parkin procedure to generate cross-sectional and time-varying proxies of the variance. Using data from the CESifo Business Climate Survey for Germany and from the Philadelphia Fed’s Business Outlook Survey for the US, the authors of [11] proposed the following measure of disagreement, based on the dispersion of respondents’ expectations, to proxy economic uncertainty:
$$DISP_t = \sqrt{P_t + M_t - \left(P_t - M_t\right)^2}. \qquad (9)$$
This metric corresponds to the cross-sectional standard deviation of the individual responses underlying the balance statistic. Since then, several measures of disagreement among survey expectations have been increasingly used to proxy economic uncertainty [11,12,13,14]. The omission of $E_t$ in the calculation of the balance statistic, and consequently in (9), implies a loss of the information concerning the degree of uncertainty of the respondents. In order to overcome this limitation, in [65], the authors presented a methodological framework to derive a geometric measure of disagreement that explicitly incorporates the share of neutral responses. This metric can be interpreted as the percentage of discrepancy among responses. The original framework, proposed in [67], uses a positional approach to determine the likelihood of disagreement among election outcomes. Using agents’ expectations from the CESifo World Economic Survey (WES) about their country’s situation regarding the overall economy, the authors found that the proposed measure (10) coevolved with the standard deviation of the balance in (9).
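A direct implementation of (9) is straightforward; the sketch below uses made-up shares and also illustrates its interpretation as the standard deviation of individual responses coded $+1$, $0$, and $-1$ (an illustration under that coding assumption, not taken from the cited studies).

```python
import numpy as np

def disagreement(P, M):
    """Dispersion-based disagreement measure in (9), computed from the
    shares of 'increase' (P) and 'decrease' (M) responses."""
    P, M = np.asarray(P, float), np.asarray(M, float)
    return np.sqrt(P + M - (P - M) ** 2)

# Illustrative shares of positive and negative responses
print(disagreement([0.45, 0.30], [0.20, 0.35]))

# Equivalently, (9) is the standard deviation of responses coded +1 / 0 / -1
resp = np.array([1] * 45 + [0] * 35 + [-1] * 20) / 1.0
print(resp.std(), disagreement(0.45, 0.20))
```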
This metric of disagreement for BCS can be derived as follows. Assuming that there are three answering categories, namely rise ($P$), fall ($M$), and no change ($E$), and given that the shares of responses sum to one, the vector of aggregate shares can be represented as a point on a simplex that encompasses all possible combinations of responses (Figure 2). In the equilateral triangle, the vector of responses, denoted by a blue point, corresponds to the unique convex combination of the three reply options at each period in time. Each vertex in Figure 2 corresponds to a point of maximum consensus; conversely, the center of the simplex corresponds to the point of maximum disagreement, indicating that the answers are distributed equally among all response categories for a given time period. In this framework, in which all vertices are at the same distance from the center, the proportion of consensus is given by the distance of the point to the barycenter relative to its maximum value, which can be formalized as:
$$C_t = \sqrt{\tfrac{3}{2}\left[\left(P_t - \tfrac{1}{3}\right)^2 + \left(E_t - \tfrac{1}{3}\right)^2 + \left(M_t - \tfrac{1}{3}\right)^2\right]}.$$
This metric is bounded between zero and one and conveys a geometric interpretation. Therefore, the proportion of (geometric) disagreement can be computed as:
$$D_t = 1 - C_t. \qquad (10)$$
This framework has been extended to a larger number of reply options [68,69]. If the number of answering categories is denoted by $N$, and $p_{it}$ denotes the aggregate share of responses in category $i$ at time $t$, where $\sum_{i=1}^{N} p_{it} = 1$, the level of disagreement can be computed as:
$$D_t = 1 - \sqrt{\frac{N}{N-1}\sum_{i=1}^{N}\left(p_{it} - \frac{1}{N}\right)^2}.$$
As in (10), this metric is bounded between zero and one and can be regarded as a measure of qualitative variation that gives the proportion of disagreement among respondents. In [68], the author designed a simulation experiment and sampled the distribution of $D_t$ for different numbers of answering categories, finding that for three answering categories the statistic spanned a wider range and its distribution of scores was more uniform. In references [46,56], the authors showed that the number of response categories is crucial to the forecast accuracy of quantified expectations.
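The sketch below implements the geometric disagreement measure for an arbitrary number of response categories, following the barycenter-based derivation above; the function name and the input shares are illustrative.

```python
import numpy as np

def geometric_disagreement(shares):
    """Proportion of (geometric) disagreement among N response shares that
    sum to one: one minus the distance of the share vector to the barycenter
    of the simplex, relative to its maximum (vertex) value."""
    p = np.asarray(shares, float)
    n = p.size
    dist = np.sqrt(np.sum((p - 1.0 / n) ** 2))   # distance to the barycenter
    max_dist = np.sqrt((n - 1.0) / n)            # barycenter-to-vertex distance
    return 1.0 - dist / max_dist

# Three reply options (rise, no change, fall): extreme cases
print(geometric_disagreement([1.0, 0.0, 0.0]))    # 0.0 -> full consensus
print(geometric_disagreement([1/3, 1/3, 1/3]))    # 1.0 -> maximum disagreement
```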
There seems to be no consensus in the literature regarding the usefulness of the information content of disagreement among agents for refining predictions. On the one hand, the authors of [70] did not find evidence that uncertainty helped to refine forecasts of GDP and inflation in the EA. On the other hand, the authors of [71,72] found that including uncertainty indicators as predictors improved the accuracy of forecasts of economic activity in Croatia, the UK, the US, and the EA. In [73], the authors found that macroeconomic uncertainty contained useful information for predicting employment, especially in the construction and manufacturing industries. In [74], the author applied expression (10) to compute an indicator of employment, which was included as a predictor in time-series models, obtaining better forecasts of unemployment rates than with ARIMA models. Similarly, with the aim of assessing the predictive power of disagreement, in [12], a vector autoregressive (VAR) framework was used to generate out-of-sample recursive forecasts of output growth, inflation, and unemployment rates at different forecast horizons. The author obtained more accurate predictions of GDP with disagreement in business expectations (about manufacturing production), and of unemployment with disagreement among consumers’ expectations (about unemployment). It was also found that disagreement in business surveys Granger-caused macroeconomic aggregates in most countries, while the opposite held for disagreement in consumer surveys.
2.2. Machine Learning Techniques for the Conversion of Survey Data on the Expected Direction of Change
In this subsection, we describe a new approach to obtain quantitative measures of agents’ expectations from qualitative survey data based on the application of GAs. Given the data-driven nature of this methodology, no assumptions are made regarding agents’ expectations.
This empirical modeling approach is based on the GP estimation of a symbolic regression (SR) [75]. GP is a soft-computing search procedure based on evolutionary computation. As such, it is founded on the implementation of algorithms that apply Darwinian principles of the theory of natural selection to automated problem solving. See [76] for a review of the literature on the application of evolutionary computation to economic modeling and forecasting.
This optimization algorithm represents programs as tree structures that learn and adapt by changing their size, shape, and composition. Unlike conventional regression analysis, which relies on an ex-ante model specification, SR can be regarded as a free-form regression approach in which the GP algorithm searches for relationships between variables and evolves candidate functions until it reaches a solution described by the algebraic expression that best fits the data.
This procedure offers, on the one hand, an overview of the most relevant interactions between the variables analyzed and of the type of relationship between them; on the other hand, it helps to identify a priori unknown interactions. The implementation of the process starts with the generation of a random population of functions, expressed as programs. From this initial population, the algorithm makes a first selection of the fittest. From this point on, successive iterations generate new, more suitable generations. In order to guarantee diversity in the population, genetic operators are applied in each iteration: reproduction, crossover, and mutation. Reproduction copies a function unchanged, while crossover and mutation exchange or replace parts of functions.
By applying these operations recursively, in a way analogous to the evolution of species, the fitness of the members of the population increases with each generation. The degree of fitness is assessed by means of a loss function. The process is programmed to stop either when an individual program reaches a predefined fitness level or when a predetermined number of generations is reached. The end result is the best individual function found throughout the process. See [24] for a detailed description of the implementation of GP, and [77,78] for a detailed review of the main issues in GP.
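As an illustration of how such a symbolic regression can be set up in practice, the sketch below uses the open-source gplearn library; the choice of package, the synthetic survey balances, and the configuration (population size, operator probabilities, stopping rule) are assumptions made here for illustration and are not necessarily those used in the studies reviewed in this section.

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor

# Synthetic example: two survey balances as inputs and a target growth rate
rng = np.random.default_rng(1)
X = rng.uniform(-40, 40, size=(80, 2))                 # e.g., production and order-book balances
y = 0.05 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(0, 0.3, 80)

# GP-based symbolic regression: a population of candidate programs evolves
# through reproduction, crossover, and mutation until a fitness (loss) target
# or the maximum number of generations is reached.
# Illustrative configuration (not the settings used in the reviewed studies).
est = SymbolicRegressor(population_size=500,
                        generations=15,
                        function_set=('add', 'sub', 'mul', 'div'),
                        metric='rmse',
                        p_crossover=0.7,
                        p_subtree_mutation=0.1,
                        p_point_mutation=0.1,
                        stopping_criteria=0.3,
                        random_state=0)
est.fit(X, y)
print(est._program)    # best evolved algebraic expression
```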
GP was first proposed in [75] to evaluate the non-linear interactions between the price level, GNP, the money supply, and the velocity of money. Given the versatility of the procedure and its suitability for finding unknown patterns in large databases, GP has attracted a growing number of researchers from different areas aiming to carry out complex modeling tasks. Only recently has it begun to be applied to the quantification of the qualitative information contained in ETS, with the aim of estimating economic growth [79,80,81,82] and the evolution of unemployment [83].
In [80,81,82], the authors used GP to derive mathematical functional forms that combine survey indicators from the CESifo WES to approximate year-on-year growth rates of quarterly GDP. In the WES, a panel of experts is asked to assess their country’s present general situation and their expectations regarding the overall economy, foreign trade, etc. The authors used the generated proxies of economic growth as building blocks in a regularized regression to estimate the evolution of GDP. In [79], the authors used survey expectations from the WES to generate two evolved indicators: a perceptions index, based on agents’ assessments of the present, and an expectations index, based on their expectations about the future. Recently, in [24], the authors used the balances of the survey variables contained in Table 1 for thirteen European countries and the EA to generate country-specific business confidence indicators via GP in order to nowcast and forecast quarter-on-quarter growth rates of GDP. They also replicated the analysis with the information contained in Table 2 to generate empirical consumer confidence indicators. When assessing the out-of-sample forecasting performance of the recursively evolved sentiment indicators, the authors obtained better results than with time-series models. In Section 4, we assess the performance of quarterly unemployment expectations obtained by applying expressions evolved through GP, analyzing the ability of the generated series of expectations to nowcast unemployment rates in the midst of the pandemic.