Researchers have focused on the development and use of composite indicators to represent complex phenomena [1]. These indicators are almost always constructed by methods that aggregate a reasonable number of manifest variables, weighted or not, into a new synthesis variable [2]. The problem is that this aggregation and weighting do not capture the effects that the multiple underlying dimensions of the phenomenon have on each other. Thus, it is disregarded, for example, that the socioeconomic condition of families influences their housing conditions [3]. This limitation means that intra-urban inequality composite indicators constructed from methods based on aggregation and weighting (e.g., see [4]) do not capture the effects among the underlying dimensions of the inequality. Such methods do not allow for consideration of the influence that indicators, such as socioeconomic [6], neighborhood [7], and household [8] indicators, have on each other.
In this context, this research aims to explore the direct and indirect effects that the different underlying dimensions of intra-urban inequality have on the indicator that represents this phenomenon. In order to build a composite indicator that captures the effects of the underlying dimensions of intra-urban inequality, a model that combines confirmatory factor analysis with a model of simultaneous equations [9], known as structural equation modeling [10], was developed. The model was developed taking as an example the intra-urban inequality of the Maringá–Sarandi–Paiçandu conurbation in Brazil. This model comprises first- and second-order structures. The first-order structure is composed of non-observed variables that represent three underlying dimensions of the intra-urban inequality. The second-order structure is the variable that synthesizes the variables of the first-order structure. This synthesis variable, the Structured Intra-urban Inequality Indicator (S-III), is expected to contribute to a theorized measurement of intra-urban inequality that considers the interrelationships among the underlying dimensions of this phenomenon.
In addition to this introduction, this article presents urban inequality with its research fronts and some examples of indicators in Section 2. The main foundations of the composite indicators are presented in Section 3. Section 4 presents structural equation modeling, as well as its different variables, models, and relationships. Section 5 presents the research materials and methods, showing the variables and dimensions of the indicator, the characteristics of the model, and the description of the tests necessary to validate the results. Section 6 presents the results, discussions, and contributions to the research, followed by the conclusions, limitations, and suggestions for future work.
2. Urban Inequality
Urban inequality has been studied from different perspectives, such as economic [13], geographic [15], and, mainly, sociological perspectives (Bourdieu, 2016; [18]). These perspectives guide researchers and create areas of specialization and different research fronts on urban inequality. However, what are these research fronts, and how can we identify them?
One of the most accurate techniques [22] to identify research fronts [23] is the frequency analysis of co-citation by influential researchers. VOSviewer is software used to analyze how the most influential authors are cited and organized within the specialized literature [24]. Figure 1 shows the co-citation analysis of 313 publications indexed in the Scopus database that explicitly mention urban inequality.
The analysis of co-citations shows four research fronts on urban inequality. From left to right, the first research front, in red dots, represents works that most frequently cite Harvey [15] and Bourdieu (2016). Works in this research front indirectly focus on the study of the dynamics of power in society, as well as the geographic study of urban poverty and its consequences. The second research front, in blue dots, represents researchers that most frequently cite the works of Holzer [13] and Wilson [21]. Studies that often mention these authors are indirectly concerned with understanding how geographic characteristics affect the work/employment of low-income people and how the lack of employment opportunities in the neighborhoods exacerbates poverty. Research associated with the third research front, in yellow dots, cites the works of Massey [18] and Massey and Denton [25], which address the problem of immigration and the effects of urban segregation of Black populations, or cites the works of Sampson and Laub [20] and Sampson et al. [19], which are concerned with collective engagement, understanding of crime, the effects of neighborhoods, and the social organization of cities. The fourth and last research front on urban inequality, represented in green dots, is formed by works that most frequently cite the works of Farley and Frey [26] and Clark and Dieleman [17]. Such research focuses on the study of population trends and on the analysis of patterns such as racial and ethnic differences and changes, as well as their effects on the urban housing market.
Differentiated research fronts indicate the recognition that urban inequality is made of different dimensions. In this context, researchers have used composite indicators not only to represent urban inequality [4], but also to represent its different dimensions. For example, it is possible to indicate socioeconomic [6], neighborhood [7], and household inequalities [8], among others [31]. The present research focuses both on the general representation of urban inequality and on its dimensions. It also deals with the relationships and the influences of these dimensions on urban inequalities, which are represented through composite indicators.
3. Composite Indicators
The literature shows that there is no single, consolidated definition of what composite indicators (CIs) are. However, it is possible to state that a CI is a mathematical aggregation of variables, normalized or standardized, weighted or not, into a single indicator capable of representing different dimensions of a complex concept or phenomenon [33]. There is strong criticism of the aggregation and weighting process for the construction of a CI [36], as well as of its ability to measure a complex concept or phenomenon [38]. Even so, CIs have attracted the attention of researchers in an increasing number of publications in varied areas of knowledge [38], including the analysis of intra-urban inequality [1].
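The aggregation described above can be sketched in a few lines. This is a minimal illustration, assuming min-max normalization and a weighted linear sum; the variable names, the data, and the choice of equal weights are hypothetical and do not come from the study.

```python
def min_max(values):
    """Rescale a list of numbers to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_indicator(columns, weights=None):
    """Weighted linear aggregation of normalized variables.

    columns: dict mapping variable name -> list of values (one per spatial unit).
    weights: dict mapping variable name -> weight; equal weights if omitted.
    """
    names = list(columns)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    normalized = {name: min_max(columns[name]) for name in names}
    n_units = len(columns[names[0]])
    return [
        sum(weights[name] * normalized[name][i] for name in names)
        for i in range(n_units)
    ]


# Hypothetical data for three census tracts (illustrative only).
data = {"unpaved_streets": [0, 5, 10], "illiterate_heads": [2, 4, 6]}
scores = composite_indicator(data)
```

Note that each variable enters the score independently of the others, which is the limitation the article raises: nothing in this aggregation lets one dimension influence another.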
A wide variety of methods can be used in the construction of CIs [2]. Regardless of the method, the construction begins with the definition of the structure of individual indicators, which should be sufficient to describe the phenomenon [34]. This decision can be based on expert opinion (for instance, applying an analytic hierarchy process [39]) or on the statistical structure of the data set (e.g., using multivariate analyses [40]). In particular, multivariate analyses are useful for assessing the general structure of individual indicators in order to verify the adequacy of these indicators and to justify methodological choices for weighting and aggregating the variables [34].
Among the many multivariate analysis methods used in the construction of CIs, factorial methods are common choices. Cronbach’s coefficient alpha (CA) measures the internal consistency of individual indicators based on their pairwise correlations [35]. The use of CA allows one to evaluate how well the individual indicators describe multidimensional constructs [34]. The application of principal component analysis (PCA) results in principal components that account for a maximum amount of variance in the observed variables [42]. PCA is commonly used for dimensionality reduction [43]. Factor analysis (FA) allows for the estimation of latent variables that influence the responses of the observed variables [42]. FA is commonly used to describe the variability between the correlated observed variables in terms of a potentially smaller number of unobserved variables [43]. Correspondence analysis is a non-parametric descriptive/exploratory technique of dimensionality reduction similar to PCA and FA, but applied to categorical data instead of continuous data [35]. Multiple correspondence analysis (MCA) is the extension of simple correspondence analysis applied to data sets with more than two categorical variables [44].
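As an illustration of the first of these measures, the sketch below computes Cronbach's alpha with the standard variance-based formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). The function name and the toy data are assumptions made for this example; they are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a set of indicators.

    rows: list of observations, each a list with one score per indicator.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two indicators")
    # Sample variance of each indicator (column).
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed score of each observation.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: three perfectly consistent indicators over four observations.
rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(rows)
```

With perfectly consistent indicators, as in the toy data, alpha reaches its upper bound of 1; less consistent indicators give lower values.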
These factorial methods measure the degree of similarity between the individual variables, indicating whether the structure of the CI is sufficiently reliable to describe the phenomenon [35]. However, when limiting themselves to aggregating the variables in a CI, these methods disregard the effects that the variables and dimensions of the phenomenon have on each other. For example, they disregard that the income variable influences the infant mortality rate [46] or that the dimension of the housing conditions of families is influenced by the socioeconomic dimension [3].
How can this limitation be overcome? In order for the CI to capture the influence of its multiple dimensions, Cataldo et al. [1] suggest structuring blocks of latent variables, or dimensions, each aggregating its own observed variables and related to the others according to the theoretical framework. This model is operationalized through structural equation modeling (SEM) and allows us to determine the strength and significance of the effects between the dimensions of urban inequality in the indicator.
4. Structural Equation Modeling (SEM)
The theory of structural equation modeling began with the seminal work of Jöreskog [47], with the design of a model that combines confirmatory factor analysis and a system of simultaneous equations [9]. In summary, this model is formed by two types of variables: the latent variables, which are not observed and represent theoretical concepts or constructs [48], and the observed variables, which are the measurable variables that are associated with a concept or construct [50].
SEM can be operationalized through two approaches: covariance-based structural equation modeling (CB-SEM) and partial least squares structural equation modeling (PLS-SEM) [11]. While CB-SEM aims to test, confirm, or compare alternative theories, PLS-SEM aims to explore a theoretical framework [48]. In both cases, the construction of the model is carried out dynamically through the inclusion/exclusion of variables or construct relationships based on three elements. First, the relevance of the variables in the construct is assessed using their factor loadings. Second, the strength of the associations between variables and constructs and between constructs is measured using their correlation coefficients. Third, the statistical significance of the relationships is evaluated using a t-test between variables and constructs or between constructs [50].
The model should also consider how the latent and observed variables are related. The relationship between the latent variables and the observed variables will be formative in four situations: first, when the direction of causality is towards the constructs to be built; second, when the observed variables define some characteristics of the construct; third, when changes in the observed variables cause changes in the construct; fourth, when changes in the construct do not cause changes in the observed variables [51]. Conversely, when changes in the latent variable influence the measurements of the observed variables, the relationship between the construct and the observed variables will be reflective [52].
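The two kinds of measurement relationship can be written in the notation commonly used in SEM textbooks. The symbols below are assumptions introduced for illustration and do not appear in the article: \xi is the latent variable, x_i the observed variables, \lambda_i the loadings, w_i the weights, and \varepsilon_i and \zeta the error terms.

```latex
% Reflective measurement: the construct drives each observed variable.
x_i = \lambda_i \, \xi + \varepsilon_i , \qquad i = 1, \dots, k

% Formative measurement: the observed variables jointly define the construct.
\xi = \sum_{i=1}^{k} w_i \, x_i + \zeta
```

In the reflective case, a change in \xi moves every x_i; in the formative case, a change in any x_i moves \xi, while a change in \xi does not necessarily imply a change in the x_i.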
In summary, SEM allows us to assess the strength and significance of the effects between the variables and the dimensions of the construct of interest, and to build an indicator to capture these effects. For example, Park et al. [53] show that the urbanization indicator of the Inner Mongolia region is strongly and significantly influenced by the dimension of economic development, while that of the Mongolia region is strongly and significantly influenced by the dimensions of social goods and economic development. The literature also brings other examples of works that use PLS-SEM to explore how the dimensions of a phenomenon are related and influence indicators of quality of work [54], fair and sustainable well-being [55], disorder perceived in the neighborhood [56], and social cohesion [31].
6. Results and Discussions
To define the first-order structure, which determines the creation of the latent variables representing the underlying dimensions of intra-urban inequality, 11 variables that exceeded the loading threshold of 0.70 [12] were selected. This threshold means that the latent variable explains about 50% of the variance of the observed variable [49]. Neighborhood inequality comprised the following observed variables: streets without paving (ENT_5); wireless locations (ENT_8); public places without manholes (ENT_9); and streets without sidewalks (ENT_10). Socioeconomic inequality was made up of the following observed variables: number of people of brown color/race (POP_3); number of heads of household with income below or equal to 2 minimum wages (POP_9); number of heads of household with income above 20 minimum wages (POP_10); and number of illiterate heads of household (POP_13). Finally, household inequality comprised the following observed variables: average household income (DOM_1); households with four bathrooms or more (DOM_3); and inadequate housing (DOM_7). These three constructs make up the second-order urban inequality structure, the S-III. The results of the reliability and validity tests of the first- and second-order structures are shown in Table 2.
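The link between the 0.70 loading threshold and the share of explained variance follows from squaring the standardized loading. The sketch below shows this arithmetic, together with the average variance extracted (AVE), which is the mean of the squared loadings of a construct. The loadings in the example are hypothetical, since the values from the study's tables and figures are not reproduced here.

```python
def explained_variance(loading):
    """Share of an observed variable's variance explained by its construct."""
    return loading ** 2


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# The threshold itself: 0.70 squared is 0.49, i.e. roughly half the variance.
threshold_share = explained_variance(0.70)

# Hypothetical loadings for one construct (illustrative, not from the study).
hypothetical_loadings = [0.72, 0.80, 0.85, 0.91]
ave = average_variance_extracted(hypothetical_loadings)
```

Because every retained loading exceeds 0.70, the AVE of each construct cannot fall below 0.49, which is why the loading filter and the convergent validity check tend to agree.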
As noted, all constructs showed internal consistency, composite reliability, and convergent validity above the critical reference values. The discriminant validity of the three constructs that make up the S-III presents conflicting results. According to the Fornell and Larcker [76] criterion, as seen in Table 3, discriminant validity holds because the AVE of each latent variable is greater than all the squared correlations of this latent variable with the others. The HTMT criterion, in turn, states that discriminant validity occurs when the test result is less than 0.90. Considering that the HTMT value was 0.466 for the neighborhood inequality and socioeconomic inequality constructs and 0.551 for the neighborhood inequality and household inequality constructs, it can be said that these constructs are different. The HTMT of 0.904 does not allow the affirmation that the socioeconomic inequality and household inequality constructs are different. The explanation for the failure in this test is obtained by the criterion of cross-loading [10]. By this criterion, it is observed in Table 4 that the outer loading of the average household income variable (DOM_1), in red, is greater in the socioeconomic inequality construct than in its origin construct, household inequality.
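The HTMT decision rule used above can be sketched as follows. The `htmt` function implements the usual heterotrait-monotrait ratio (mean between-construct indicator correlation divided by the geometric mean of the mean within-construct correlations); the correlation matrix is a toy example and the function name is an assumption. Only the three HTMT values and the 0.90 cutoff come from the text.

```python
from itertools import combinations, product
from math import sqrt


def htmt(corr, block_a, block_b):
    """Heterotrait-monotrait ratio for two constructs.

    corr: square matrix (list of lists) of indicator correlations.
    block_a, block_b: indicator indices of each construct (two or more each).
    """
    def mean(xs):
        return sum(xs) / len(xs)

    hetero = [abs(corr[i][j]) for i, j in product(block_a, block_b)]
    mono_a = [abs(corr[i][j]) for i, j in combinations(block_a, 2)]
    mono_b = [abs(corr[i][j]) for i, j in combinations(block_b, 2)]
    return mean(hetero) / sqrt(mean(mono_a) * mean(mono_b))


# Toy matrix: within-construct correlations 0.8, between-construct 0.4.
toy_corr = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.8],
    [0.4, 0.4, 0.8, 1.0],
]
toy_htmt = htmt(toy_corr, [0, 1], [2, 3])

# HTMT values reported in the text, checked against the 0.90 cutoff.
reported = {
    ("neighborhood", "socioeconomic"): 0.466,
    ("neighborhood", "household"): 0.551,
    ("socioeconomic", "household"): 0.904,
}
distinct = {pair: value < 0.90 for pair, value in reported.items()}
```

Applied to the reported values, the rule flags only the socioeconomic and household pair, in line with the discussion above.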
Although the results of the discriminant validity are divergent, we highlight in the words of Henseler et al. [77] (p. 131) that “a failure to establish the discriminating validity between two constructs does not imply that the concepts are identical”, especially when the research provides support for this differentiation. This differentiation exists since the IBGE [60] classifies the average household income as household data, even if such income is associated with the families’ income. Finally, the PLS-SEM internal validity test assesses the relationships between the constructs that form the S-III reflecting urban inequality. The results of this test are illustrated in Figure 3, which also shows the factor loadings of the observed variables of each latent variable and the AVE of the latent variables.
It can be seen in Figure 3 that all relationships exceeded the critical significance value: t-test > 1.96 [12]. However, the strength of these relationships is not homogeneous. In particular: (i) socioeconomic inequality is moderately related to household inequality (0.67 > R² > 0.33); (ii) household inequality is weakly related to neighborhood inequality (R² < 0.33); (iii) neighborhood inequality is moderately related to urban inequality (0.67 > R² > 0.33), which, in turn, is substantially related to socioeconomic inequality and household inequality (R² > 0.67). From the presence of these statistically significant relationships, it is possible to know what the direct and indirect effects between the latent variables that form the S-III are. Table 5 shows these direct and indirect effects.
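The strength labels used above follow cutoffs of 0.33 and 0.67 on the coefficient of determination. A minimal sketch of that classification is given below; the function name is an assumption, and how values exactly on a cutoff are treated is a choice made here, since the text does not specify it. The two example values are the lowest and highest R² reported in this section.

```python
def classify_r2(r2, moderate=0.33, substantial=0.67):
    """Label the strength of a relationship from its coefficient of determination."""
    if r2 > substantial:
        return "substantial"
    if r2 > moderate:
        return "moderate"
    return "weak"


# Lowest and highest coefficients of determination reported in this section.
lowest = classify_r2(0.23)
highest = classify_r2(0.83)
```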
The results presented in Table 5 show that socioeconomic inequality influences household inequality, and that household inequality influences neighborhood inequality. First, changes of one standard deviation in socioeconomic inequality influence household inequality by 0.785 standard deviations. Second, changes of one standard deviation in household inequality influence neighborhood inequality by 0.482 standard deviations. Table 5 also shows that changes of one standard deviation in socioeconomic inequality influence neighborhood inequality by 0.378 standard deviations. It is noteworthy that this last relationship is the influence of an indirect effect. In the model presented in Figure 3, it can be seen that there is no significant relationship between socioeconomic inequality and neighborhood inequality. The explanation for the occurrence of this indirect effect is in the following sequence of significant relationships: socioeconomic inequality -> household inequality -> neighborhood inequality.
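The indirect effect reported above is consistent with the usual rule in path models: the effect along a chain of relationships is the product of the direct coefficients on that chain. The sketch below reproduces the arithmetic with the two direct effects given in the text; the function and dictionary names are assumptions made for illustration.

```python
# Direct effects (standardized path coefficients) reported in the text.
direct_effects = {
    ("socioeconomic", "household"): 0.785,
    ("household", "neighborhood"): 0.482,
}


def path_effect(path, coefficients):
    """Multiply the direct effects along a chain of constructs."""
    effect = 1.0
    for source, target in zip(path, path[1:]):
        effect *= coefficients[(source, target)]
    return effect


indirect = path_effect(
    ["socioeconomic", "household", "neighborhood"], direct_effects
)
# 0.785 * 0.482 is approximately 0.378, matching the reported indirect effect.
```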
The internal validity test shows that the relationships between the underlying dimensions of intra-urban inequality are statistically valid. In turn, the coefficients of determination indicate that the proportion of variance explained by these relationships varies between R² = 0.23 and R² = 0.83. It is, therefore, necessary to add the relative importance of each underlying dimension of intra-urban inequality in the S-III. This relative importance is indicated by the outer weights [12] of 0.27, 0.30, and 0.62 of the socioeconomic, household, and neighborhood dimensions, respectively, in the S-III construct of intra-urban inequality. The relative importance of each underlying dimension of intra-urban inequality in S-III can be seen in the circles of Figure 4. The indirect effects between the underlying dimensions can also be observed beside the black and blue arrows.
As seen in Figure 4, neighborhood inequality is influenced directly by household inequality and indirectly by socioeconomic inequality. These influences are reflected both in the relative importance measured by the outer weights (OW) and in the absolute contribution measured by the outer loadings (OL) of the neighborhood inequality latent variable in the intra-urban inequality construct. Table 6 shows the OW of the underlying dimensions of intra-urban inequality in the S-III construct.
The results in Table 6 indicate that socioeconomic and household inequalities have less relative and absolute importance in the intra-urban inequality S-III construct than neighborhood inequality. The OW of 0.27 and 0.30 and the OL of 0.77 and 0.81 of socioeconomic and household inequalities are less than the OW of 0.62 and the OL of 0.88 of neighborhood inequality. These results suggest that neighborhood inequality is the most important dimension in the formation of the intra-urban inequality S-III construct. Proportionally, neighborhood inequality, household inequality, and socioeconomic inequality contribute 52%, 25%, and 23% to S-III, respectively.
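The proportional contributions quoted above are consistent with rescaling the three outer weights so that they sum to one. That this is how the percentages were obtained is an assumption; the sketch below only shows that the arithmetic agrees with the reported figures.

```python
# Outer weights of the three dimensions in the S-III, as reported in the text.
outer_weights = {"neighborhood": 0.62, "household": 0.30, "socioeconomic": 0.27}

# Rescale the weights to percentages of their sum (assumed derivation).
total = sum(outer_weights.values())
shares = {name: round(100 * w / total) for name, w in outer_weights.items()}
```

Under this assumption, the shares come out as 52, 25, and 23 for the neighborhood, household, and socioeconomic dimensions.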
Determining which dimensions and variables to use in the study of inequalities is always a hard task because of the availability and quality of the data concerning the breakdown, coverage, detailing, periodicity, and reliability. Given the availability, it is necessary, based on the conceptualization of the phenomenon, to define which data to select to develop indicators that are both pertinent and economical. The definition of which data to use is not a simple task, because in each socio-spatial context, there are specific combinations that more accurately reflect the analyzed phenomenon, which is, in this case, inequality.
In this sense, the present work contributes to: (i) defining, from a large amount of data, which variables most consistently represent each dimension of the phenomenon; (ii) defining which of these dimensions contributes most to the final indicator, in this case the urban inequality indicator; and (iii) showing the levels of precariousness that together help to identify urban inequalities.
For the urban conurbation of Maringá–Sarandi–Paiçandu, the practical contribution is the conclusion that the variables related to basic urban infrastructure, such as paving of roads, storm drains, or sidewalks, strongly distinguish the most vulnerable areas of the city, expressing striking intra-urban inequalities. This shows that the inequality to which citizens are subject is linked not only to income level, but also to the conditions in the place where they live. In summary, the place of residence and its characteristics weigh heavily in the creation, conservation, and deepening of social inequalities that are also territorial. Often, they weigh heavily on the perpetuation of social disadvantages across generations, as the studies by Stiglitz [78] and Deaton [79] have pointed out.
These analyses allow us to conclude that, in the urban conurbation of Maringá–Sarandi–Paiçandu, the urban spaces with precarious infrastructure are representative icons of intra-urban inequality and denote precariousness in the living standards of families, which is closely related to their levels of education and income, as well as their color/race.
This work used PLS-SEM to capture the direct and indirect effects of the underlying dimensions of urban inequality (neighborhood inequality, socioeconomic inequality, and household inequality) in a summary indicator. Beyond quantifying a multidimensional phenomenon, this research shows how to perform a theorized measurement of urban inequality. From this measurement, it is possible to identify which dimensions most influence the others, and which dimensions have greater weight in the measurement of urban inequality. This identification makes it possible, for example, to establish tax zones for the taxation of urban property, to develop investment plans in urban infrastructure, or to prioritize areas for implementing public policies aimed at reducing inequality.
In the urban conurbation of Maringá–Sarandi–Paiçandu, socioeconomic inequality is the dimension that most influences the others, while neighborhood inequality has the greatest weight in urban inequality. In this sense, improvements in the socioeconomic conditions of less favored regions, for example, through exemption from property tax, can encourage families to invest in the conditions of their homes. Likewise, improvements in the urban infrastructure of the neighborhood have significant weight in reducing the urban inequality of the conurbation.
Although PLS-SEM offers different spectra of analysis to support inequality reduction policies, it is necessary to remember that statistical methods of this nature are not immune to errors and distortions, because the model’s responses do not perfectly represent the phenomenon or its interrelationships. For a perfect representation, the AVE and correlation coefficients would have to be equal to 1. In addition, because the answers found in the model did not occur in all areas, atypical observations show that the model does not apply to the entire city (see Appendix A). A final limitation can be attributed to the frequency at which census data are updated. In Brazil, as in many countries, these census data are updated every ten years [60]. As a result, the indicators built tend not to represent a present reality. Therefore, it is a challenge for future research to develop methodologies that make it possible to update census data and, based on these data, develop longitudinal analyses of multidimensional phenomena.