Article

Telework in the Brazilian Context: Social and Economic Factors Under a Machine Learning Approach

by Laryssa de Andrade Mairinque 1,*, Robson Bruno Dutra Pereira 2 and Josiane Palma Lima 1
1 Industrial Engineering and Management Institute, Federal University of Itajubá, Itajubá 37500-903, Brazil
2 Department of Mechanical and Industrial Engineering, Federal University of São João del-Rei, São João del-Rei 36307-352, Brazil
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(6), 3043; https://doi.org/10.3390/su18063043
Submission received: 24 January 2026 / Revised: 1 March 2026 / Accepted: 3 March 2026 / Published: 20 March 2026
(This article belongs to the Section Economic and Business Aspects of Sustainability)

Abstract

Telework has expanded rapidly, yet its determinants and temporal dynamics remain insufficiently documented in developing countries. This study examines the evolution of telework in Brazil from 2022 to 2025 using machine learning models applied to nationally representative microdata from the Continuous National Household Sample Survey, based on approximately 210,000 households per reference period. A standardized pipeline was implemented across four time windows, including preprocessing, missing-data handling, class balancing via random under-sampling, feature encoding and normalization, and stratified data splitting with 5-fold cross-validation. Nine classification algorithms were evaluated and hyperparameter-tuned using ANOVA racing, with model performance assessed primarily through the ROC AUC metric. The results indicate consistently high discriminative performance across all analyzed periods (ROC AUC > 0.80). The temporal evaluation further reveals overlapping confidence intervals among the predictive models, indicating statistically comparable performance over time and no evidence of a universally dominant algorithm. Variable-importance analyses show that the set of the eight most relevant predictors remained stable, although their relative rankings changed, with gender increasing in importance in the most recent periods. Overall, telework in Brazil is jointly shaped by sociodemographic and occupational factors, highlighting its selective nature and the relevance of temporal monitoring to inform research and policy.

1. Introduction

In the past, the nature of work tools determined the physical location where tasks had to be performed. However, in today’s digital era, many of these tools have become portable, allowing certain categories of work to be carried out virtually anywhere [1]. Telework refers to the performance of professional activities outside traditional office environments or fixed workplaces. In theory, this arrangement reduces the number of daily commutes. In addition, it may influence the choice of transportation modes, raising important questions about how societies adapt when commuting is no longer a routine necessity [2]. Telework has become increasingly common, driven not only by advances in communication technologies but also by the COVID-19 pandemic, which accelerated its adoption across multiple sectors.
The Internet is often described as an “electronic highway,” a metaphor that underscores its functional parallel with traditional road networks. This analogy suggests that digital connectivity can serve as an alternative to physical travel, reinforcing the notion that virtual and physical modes of transport are becoming increasingly interconnected. From an environmental perspective, the potential to replace motorized trips with “virtual travel” is particularly appealing [3]. Within this context, telework emerges as a viable substitute for certain routine trips, enabling individuals to perform professional and personal activities remotely. As a result, it has increasingly been considered a public policy instrument to alleviate road congestion, reduce transportation-related costs, and mitigate greenhouse gas emissions [4,5,6].
The adoption of telework may be influenced by a wide range of factors. Overall, the literature indicates that the typical teleworker profile is predominantly composed of men [7,8,9], with higher income levels [10,11,12,13,14], higher educational attainment [7,8,9,10,13,14,15], and generally belonging to an intermediate age group [8,10,14,16]. These patterns suggest that access to telework tends to be more frequent among individuals with greater social and economic capital, which may reinforce existing inequalities within this context.
Recent studies have applied machine learning algorithms, such as Decision Trees, Random Forest, SVM, and Naïve Bayes, to predict telework adoption, mainly in the context of COVID-19 [17,18,19,20,21]. Although these studies report high predictive accuracy, they generally rely on small samples and focus primarily on comparing model performance rather than systematically evaluating the relative importance of sociodemographic determinants. More advanced models, such as extreme gradient boosting and neural networks, have also been applied using smartphone usage and GPS data to identify work patterns, including telework and breaks [19]; however, sociodemographic factors were not central to the analysis. Overall, the machine learning-based literature tends to prioritize predictive accuracy without comparatively assessing the role of structural sociodemographic inequalities.
As commuting patterns and daily activities become less dependent on a fixed physical workplace, travel is likely to diversify and become increasingly shaped by individual needs rather than following a logic rigidly determined by the urban spatial structure [22,23,24]. In this scenario, understanding the human and social factors that influence telework adoption becomes essential not only for the effective management of this work arrangement but also to support urban and economic planning from a more systemic perspective. Against this backdrop, the objective of this study is to evaluate the determinants of remote work in Brazil using machine learning techniques. Machine learning techniques were chosen due to their ability to explore and compare multiple classification algorithms, enhancing model robustness, mitigating overfitting, and improving the ability to generalize to new and unseen data [25]. The analysis draws on large-scale microdata from a nationally representative household survey.
A gap persists in the literature, particularly in developing countries. Although studies on telework have increased, most focus on advanced economies and prioritize predictive accuracy without systematically assessing the role of sociodemographic determinants in machine learning applications. Even in emerging economies such as Vietnam and India, analyses primarily rely on traditional econometric models and focus on travel behavior or specific contexts, without using nationally representative data or comparative machine learning approaches [10,15].
These limitations are particularly relevant in countries such as Brazil, where inequalities in income, education, and occupational segmentation may decisively influence access to remote work. In such contexts, sociodemographic factors may be central determinants of telework. By using nationally representative Brazilian data and multiple machine learning techniques, this study seeks to fill this gap and provide a more methodologically robust and context-sensitive analysis of telework in structurally unequal labor markets.
Based on this context, this study advances the following exploratory hypotheses regarding remote work in Brazil:
H1. Higher levels of income and educational attainment are positively associated with engaging in telework, reflecting inequalities in access to digital resources and knowledge-intensive occupations.
H2. Occupational characteristics play a significant role in shaping the adoption of telework.
H3. The relative importance of socioeconomic and occupational factors varies over time, in response to changes in labor market dynamics.
H4. The machine learning models exhibit adequate predictive performance in identifying the determinants of telework.
From this perspective, examining telework through the lens of socioeconomic inequalities in developing countries is essential to advance the understanding of this work arrangement. Such efforts can contribute to the design of more inclusive public policies and to urban planning guided by principles of equity, efficiency, and sustainability.

2. Materials and Methods

This study relies on a robust database (big data) collected through a survey, based on a sample of approximately 210 thousand households. The data used in this research were obtained from the Continuous National Household Sample Survey (PNAD Contínua), conducted by the Brazilian Institute of Geography and Statistics (IBGE). This survey is one of Brazil’s main official sources of information on the socioeconomic conditions of the population, providing continuous and comparable statistics on the labor market, education, income, and demographic characteristics. PNAD Contínua is characterized by scientific rigor, broad geographic coverage, availability of detailed microdata, and temporal continuity, which enable consistent, comparable, and generalizable analyses for the Brazilian population as a whole.
In the first quarter of 2025, the survey estimated a population of 177.17 million individuals aged 14 years and older, of whom approximately 110.64 million (62.2%) were in the labor force and 66.98 million (37.8%) were outside it. During this period, there were about 103.8 million employed individuals (58.6% of the working-age population) and 6.82 million unemployed individuals (3.8%), resulting in an unemployment rate of 6.2% and a labor force participation rate of 62.2% [26]. These indicators highlight the relevance of PNAD Contínua as an empirical basis for analyses related to labor market dynamics in Brazil.
PNAD Contínua follows a probabilistic sampling design and is conducted continuously across the entire national territory, covering both urban and rural areas. The sample comprises approximately 210 thousand households distributed across about 3500 municipalities, ensuring high statistical representativeness of the Brazilian population.
Data collection is carried out through computer-assisted personal interviews (CAPI), using standardized questionnaires. The instrument is structured around a core module containing demographic, educational, occupational, and income information, complemented by supplementary modules periodically administered to address specific topics related to labor market conditions and living standards. This modular structure enables detailed analyses and consistent temporal comparisons. The coding of the variables used in this study follows the official definitions provided by the PNAD Contínua survey [27].
The database includes sociodemographic, educational, occupational, and income-related information. For this study, the following variables were selected:
  • Demographic characteristics (sex, age, federative unit);
  • Educational attainment;
  • Employment status and type of employment relationship;
  • Industry sector;
  • Income;
  • Place of work.
The sample was restricted to employed individuals in the reference period. The response variable corresponded to the place of work, which was originally composed of multiple categories. These categories were recoded into two classes: Telework and On-site work. The recoding was based on semantic criteria aligned with the telework literature. Categories considered ambiguous were excluded in order to reduce classification noise. Although this decision led to a reduction in sample size, it contributed to greater conceptual consistency. The explanatory variables included both numerical and categorical attributes, resulting in a mixed data structure, which is common in socioeconomic problems.

3. Survey Data Analysis

A temporal analysis of telework in Brazil was conducted through the application of machine learning techniques. The time windows considered were October/November/December 2022, July/August/September 2023, April/May/June 2024, and January/February/March 2025, chosen to maintain regular intervals across the periods analyzed. This approach made it possible to assess changes in the best-performing models over time, as well as in the most influential variables associated with telework. Accordingly, the study contributes both to the identification of consistent patterns and to the detection of structural shifts in the phenomenon, providing more reliable evidence to support the understanding of telework dynamics in Brazil. These periods were chosen because they share the same set of variables in the PNAD Contínua survey, which allows for a consistent comparison, and because they fall exclusively within the post-pandemic context.
Figure 1 illustrates the workflow adopted in each period of analysis, including the implementation of the machine learning methods and the assessment of feature importance.
Data preprocessing was carried out in a systematic and standardized manner across all periods analyzed. Initially, observations with missing values in the response variable were removed, and the labels of categorical variables were standardized. Subsequently, highly correlated variables were examined, and one of them was excluded to mitigate potential multicollinearity issues.
Rare categories were identified using a frequency-based threshold, whereby nominal predictors with relative frequency below 1% of total observations were consolidated into a single category labeled “Other”, aiming to reduce sparsity and improve model stability. This grouping procedure was applied consistently across all analyzed periods. No substantive redefinition of categories was performed; only low-frequency groups were aggregated according to the predefined threshold to ensure statistical robustness.
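The frequency-based grouping described above can be sketched in a few lines. The paper reports using R, so the Python below is only an illustration: the function name `lump_rare`, the toy `sector` column, and its category names are hypothetical, and only the 1% threshold and the "Other" label come from the text.

```python
from collections import Counter

def lump_rare(values, threshold=0.01, other_label="Other"):
    """Replace categories whose relative frequency is below `threshold`."""
    counts = Counter(values)
    n = len(values)
    # Categories that fall under the relative-frequency threshold.
    rare = {cat for cat, c in counts.items() if c / n < threshold}
    return [other_label if v in rare else v for v in values]

# Hypothetical sector column: 2 of 300 rows (about 0.67%) are "mining".
sector = ["services"] * 200 + ["commerce"] * 98 + ["mining"] * 2
lumped = lump_rare(sector)
print(Counter(lumped))
```

In a leakage-free workflow the set of rare levels would be determined on the training data and then reused unchanged for validation and test data.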
Missing values were handled according to variable type: for numerical variables, median imputation was applied; for categorical variables, a specific category was created to represent unknown values. Due to class imbalance, random under-sampling of the majority class was performed, resulting in a balanced dataset. Before data balancing, a loss of sensitivity in predicting individuals working remotely was observed. Table 1 presents the dataset sizes before, N#, and after, N*, class balancing. For all periods, a perfect balance between the two classes was achieved. Notably, in the first period analyzed, a higher proportion of individuals were working from home. From 2023 onward, an increasing number of people have returned to the workplace.
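A minimal sketch of these three steps is given below, assuming missing values are represented as `None`. The helper names, the "Unknown" label, the seed, and the toy data are assumptions made for illustration and are not taken from the study.

```python
import random
from statistics import median

def impute_median(values):
    """Fill missing numerical values with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    return [med if v is None else v for v in values]

def impute_unknown(values, label="Unknown"):
    """Give missing categorical values their own explicit category."""
    return [label if v is None else v for v in values]

def undersample(rows, labels, seed=42):
    """Randomly drop rows from larger classes until all classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels

hours = impute_median([40, None, 20, 44, None, 36])
status = impute_unknown(["formal", None, "informal"])
rows, labels = undersample(list(range(8)), ["on_site"] * 6 + ["telework"] * 2)
print(hours, status, labels)
```

Under-sampling discards majority-class rows rather than synthesizing minority-class ones, which is why the balanced sample N* is smaller than the original N#.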
Numerical variables were normalized, while categorical variables were encoded using one-hot encoding, with rare categories grouped to reduce sparsity. All preprocessing steps were organized within a pipeline, ensuring that parameters were estimated exclusively on the training set and subsequently applied to the validation and test sets, thereby preventing information leakage.
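The fit-on-training-only principle can be illustrated as follows. This is a sketch under assumptions: the text does not say which normalization was used, so z-score scaling is assumed here, and the two small classes, the toy values, and the all-zeros encoding for unseen levels are hypothetical choices.

```python
from statistics import mean, pstdev

class StandardScaler:
    """Z-score scaling; parameters are learned from the data passed to fit()."""
    def fit(self, xs):
        self.mean_ = mean(xs)
        self.std_ = pstdev(xs) or 1.0  # guard against zero variance
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]

class OneHotEncoder:
    """One indicator column per category level seen during fit()."""
    def fit(self, cats):
        self.levels_ = sorted(set(cats))
        return self

    def transform(self, cats):
        # Levels not seen in training are encoded as all zeros (assumption).
        return [[1 if c == lvl else 0 for lvl in self.levels_] for c in cats]

train_age, test_age = [20, 30, 40], [30, 50]
train_job, test_job = ["private", "public", "private"], ["public", "self_employed"]

scaler = StandardScaler().fit(train_age)    # statistics from training data only
encoder = OneHotEncoder().fit(train_job)    # levels from training data only
scaled_test = scaler.transform(test_age)    # test data reuse training parameters
encoded_test = encoder.transform(test_job)
print(scaled_test, encoded_test)
```

Because the test rows never influence the stored mean, standard deviation, or level set, no information from them leaks into model fitting.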
After balancing, the data were partitioned using stratified sampling into:
  • 80% for training;
  • 20% for testing.
Within the training set, k-fold cross-validation (k = 5) was applied. This strategy allowed the estimation of average model performance and reduced the variance of the estimates. The test set remained completely isolated until the final evaluation stage. Nine classification algorithms were evaluated, as presented in the workflow shown in Figure 1. The selection of these algorithms aimed to represent different methodological families, enabling a comprehensive comparison across linear models, tree-based methods, kernel-based approaches, and neural networks.
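The partitioning scheme can be sketched as below. The 80/20 proportions and k = 5 come from the text; whether the folds themselves were stratified is not stated, so stratified folds are an assumption here, as are the function names, seeds, and the toy label vector.

```python
import random

def stratified_split(labels, test_frac=0.2, seed=1):
    """Split indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def stratified_kfold(indices, labels, k=5, seed=1):
    """Yield (train, validation) index lists for k stratified folds."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

labels = [1] * 50 + [0] * 50                     # balanced toy response
train_idx, test_idx = stratified_split(labels)   # test set is set aside here
cv_folds = list(stratified_kfold(train_idx, labels))
print(len(train_idx), len(test_idx), [len(val) for _, val in cv_folds])
```

The folds are built from `train_idx` only, so the held-out test indices never appear in any training or validation fold.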
Subsequently, hyperparameter tuning was performed using the ANOVA racing method, which statistically evaluates the performance of candidate configurations and progressively eliminates inferior combinations. Up to 25 hyperparameter configurations were evaluated per model. The area under the ROC curve (ROC AUC) was adopted as the primary optimization criterion due to its reliability in binary classification problems with potential class imbalance. The hyperparameters tuned for each algorithm are described in Table 2.
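The elimination logic behind racing can be illustrated with a simplified stand-in. The finetune implementation fits an ANOVA-type model across resamples; the sketch below instead uses a one-sided paired t bound against the current best configuration, so it shows the idea rather than the authors' exact procedure. The burn-in of three folds, the configuration names, and the per-fold ROC AUC values are hypothetical.

```python
from statistics import mean, stdev

# One-sided 95% Student t critical values for 2, 3, and 4 degrees of freedom.
T_CRIT = {2: 2.920, 3: 2.353, 4: 2.132}

def race(configs, score_fold, n_folds=5, burn_in=3):
    """Evaluate configs fold by fold, dropping clearly inferior ones early."""
    scores = {c: [] for c in configs}
    alive = list(configs)
    for fold in range(n_folds):
        for c in alive:
            scores[c].append(score_fold(c, fold))
        n = fold + 1
        if n < burn_in or len(alive) == 1:
            continue
        best = max(alive, key=lambda c: mean(scores[c]))
        keep = []
        for c in alive:
            if c == best:
                keep.append(c)
                continue
            # Paired per-fold differences relative to the current best.
            diffs = [a - b for a, b in zip(scores[c], scores[best])]
            upper = mean(diffs) + T_CRIT[n - 1] * stdev(diffs) / n ** 0.5
            if upper >= 0:  # cannot yet be ruled out as equal to the best
                keep.append(c)
        alive = keep
    return alive, scores

# Hypothetical per-fold ROC AUC values for three candidate configurations.
TABLE = {
    "a": [0.85, 0.86, 0.84, 0.85, 0.86],
    "b": [0.84, 0.86, 0.85, 0.84, 0.86],
    "c": [0.70, 0.72, 0.69, 0.71, 0.70],
}
alive, scores = race(list(TABLE), lambda c, fold: TABLE[c][fold])
print(alive, {c: len(s) for c, s in scores.items()})
```

In this toy run configuration "c" is dropped after the burn-in folds and is never evaluated on the remaining two, which is the source of the computational savings.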
Finally, the last stage of the workflow presented in Figure 1 corresponds to the assessment of variable importance. At this stage, the best-performing model is interpreted using variable importance plots and partial dependence plots. Variable importance is obtained through permutation of the predictor columns prior to generating predictions and computing performance metrics on the test set. The greater the increase in error after permutation, the higher the importance of the independent variable. Boxplots are used to compare the error obtained after variable permutation with the error of the full model.
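Permutation-based importance can be sketched as follows. The loss used in the study is not specified in this passage, so the misclassification rate is assumed; the toy model, the data, and the number of repeats are likewise hypothetical.

```python
import random

def error_rate(model, X, y):
    """Share of rows for which the model's prediction differs from the label."""
    return sum(model(row) != target for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in error after shuffling each predictor column."""
    rng = random.Random(seed)
    base = error_rate(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # breaks the link between column j and the response
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(error_rate(model, X_perm, y) - base)
        importances.append(sum(increases) / n_repeats)
    return base, importances

# Toy data: the response depends on feature 0 only; feature 1 is noise.
rng = random.Random(123)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(row):
    return 1 if row[0] > 0.5 else 0

base, importances = permutation_importance(model, X, y)
print(base, importances)
```

The per-repeat increases, rather than only their mean, are what a boxplot of permuted versus full-model error would display.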
Partial dependence plots were employed to visualize how each variable affects the predicted response, using SHapley Additive exPlanations (SHAP). The SHAP approach computes the average contribution of each variable across different feature orderings [28].
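The averaging over feature orderings can be made concrete with an exact computation on a toy model. This is a sketch under simplifying assumptions: "absent" features are set to a single baseline value, whereas practical SHAP estimators average over background data and sample orderings. The function `toy_model` and its coefficients are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]        # switch feature j from baseline to observed
            new = predict(current)
            phi[j] += new - prev     # marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]

def toy_model(z):
    # Hypothetical predicted probability with an interaction between z[0] and z[1];
    # z[2] is ignored by the model.
    return 0.1 + 0.2 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[1]

x, baseline = [1, 1, 1], [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 3) for p in phi])  # [0.3, 0.4, 0.0]
```

The contributions sum to the difference between the prediction for `x` and the baseline prediction (0.8 - 0.1 = 0.7), and the interaction term is shared equally between the two features involved.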
All analyses were conducted using the R programming language [29], version 4.5.1, and the modeling process followed the tidymodeling approach, with support from the tidymodels [30,31], finetune [32], and DALEXtra [33] packages.

Results and Discussion

After completing the data preparation and preprocessing stages, the analyses were conducted. To evaluate the performance of the classification methods across the analyzed periods, confidence intervals for accuracy and ROC AUC were estimated using five-fold cross-validation. Figure 2, Figure 3, Figure 4 and Figure 5 present these results for each period, while Figure 6, Figure 7, Figure 8 and Figure 9 display the confidence intervals for the precision and recall metrics of the evaluated methods. Overall, the overlap of confidence intervals across all evaluated models and metrics indicates the absence of statistical evidence of consistently superior performance by any algorithm throughout the analyzed periods. In practical terms, the models exhibit statistically comparable performance.
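One common way to obtain such intervals from five fold-level scores is a Student t interval around the fold mean. The text does not state how the intervals were computed, so this construction is an assumption, and the fold scores below are hypothetical.

```python
from statistics import mean, stdev

T_975_DF4 = 2.776  # two-sided 95% Student t critical value, 4 degrees of freedom

def cv_confidence_interval(fold_scores, t_crit=T_975_DF4):
    """t-based confidence interval for the mean of per-fold scores."""
    m = mean(fold_scores)
    half_width = t_crit * stdev(fold_scores) / len(fold_scores) ** 0.5
    return m - half_width, m + half_width

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical ROC AUC values from five folds for two models.
model_a = [0.84, 0.86, 0.85, 0.83, 0.87]
model_b = [0.83, 0.85, 0.86, 0.84, 0.85]

ci_a = cv_confidence_interval(model_a)
ci_b = cv_confidence_interval(model_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Overlapping intervals are read here as a lack of evidence for a difference between models, not as a formal test of equality.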
Given this scenario, the ROC AUC metric was adopted as the primary criterion for selecting the machine learning models, as it provides a more comprehensive assessment of classifier discriminative ability compared to scalar measures such as accuracy, error rate, or error cost. Furthermore, by decoupling classifier performance from class imbalance and misclassification costs, ROC AUC offers advantages over other evaluation metrics, such as precision–recall and lift curves [34].
The ROC AUC metric assesses the discriminatory ability of classification models, representing the probability that a positive observation is correctly ranked above a negative observation, regardless of the decision threshold adopted [35]. Values close to 1 indicate excellent predictive performance, whereas values near 0.5 reflect performance equivalent to random chance. In the present study, as shown in Table 3, all ROC AUC values were above 0.80, indicating a high capacity of the models to distinguish between remote and on-site workers. These results confirm the adequacy of the machine learning approaches employed and the relevance of the variables used to model the telework phenomenon.
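The ranking interpretation can be checked directly by counting, over all positive-negative pairs, how often the positive observation receives the higher score (ties count as one half). The scores and labels below are hypothetical, with 1 standing for telework and 0 for on-site work.

```python
def roc_auc(scores, labels):
    """ROC AUC as the share of positive-negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correct ranking
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]   # predicted probability of telework
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))  # 0.778 (7 of 9 pairs ranked correctly)
```

No decision threshold appears anywhere in the computation, which is why the metric is threshold-independent; a model that assigns the same score to everyone obtains 0.5.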
Subsequently, an analysis of predictor variable importance was conducted for each study period, considering the selected model previously identified, as shown in Figure 10, Figure 11, Figure 12 and Figure 13. The results indicate that the set of the eight most relevant variables remained consistent over time, although their relative importance rankings varied across periods. This pattern suggests consistency of the main determinants of telework, accompanied by relative adjustments in the contribution of each variable as a function of contextual and temporal changes.
Table 4 presents the variables with the greatest contribution to the predictive performance of the algorithms across the four analysis periods, together with their respective descriptions. These variables encompass sociodemographic, occupational, educational, and economic dimensions, highlighting the multifactorial nature of telework.
The analysis of Figure 10, Figure 11, Figure 12 and Figure 13 reveals relevant changes in the hierarchy of variable importance over time. In 2022, occupational and sectoral variables (VD4011, VD4009, and VD4010) accounted for the largest contributions to the model, indicating the predominance of professional profile characteristics in explaining telework. In 2023, the variable related to the respondent’s gender (V2007) emerged among the most influential predictors, alongside occupational characteristics (VD4011 and VD4010). From 2024 onward, gender became the top-ranked variable in terms of importance and remained the most influential factor in 2025, followed by variables related to employment status and sector of activity (VD4009, VD4010, and VD4011).
This pattern suggests a structural shift in the dynamics of telework, whereby sociodemographic factors have come to play a more decisive role than occupational characteristics alone. This shift points to potential inequalities in access to and continuity of telework arrangements between men and women in more recent periods. Nevertheless, the characteristics associated with the type of occupation continue to play a central role in determining the feasibility of telework.
Contour plots were constructed to visualize how the probability of an event varies as a function of two or more variables simultaneously. The contour plots show the predicted probability of engaging in telework as estimated by the multivariate model. The figures present two-dimensional projections of the probability surface as a function of two continuous variables, allowing visualization of how the probability varies across different combinations of these factors. The contour lines connect points with equal predicted probability, while variations in intensity highlight regions of higher and lower propensity for telework. Stratification by categories reveals structural differences between groups, indicating that the relationship between the continuous variables and the likelihood of telework is not homogeneous. Distinct patterns in the slope and concentration of high-probability areas are observed, suggesting relevant heterogeneity across the segments analyzed [25].
In this case, age (V2009), weekly working hours (V4039), gender, and occupational category were related to the estimated probability of engaging in telework for each analyzed period, as shown in Figure 14, Figure 15, Figure 16 and Figure 17. Overall, in the first two periods of analysis, both genders exhibited a higher concentration of individuals with a high probability of telework compared with the more recent periods. However, the patterns associated with occupational category and age remained consistent over time, with a lower likelihood of telework among individuals in private-sector employment and among younger workers.
Focusing on the most recent analysis, referring to the first quarter of 2025 (Figure 17), the results reveal a consistent pattern of a stronger tendency toward telework among women across virtually all occupational categories, with particular emphasis on the self-employed and employer groups, for which the estimated probability reaches the highest values. These findings are consistent with the literature, which indicates that, due to the need to reconcile professional and personal life, women tend to opt for telework when this alternative is available. The presence of children constitutes a relevant factor in this decision, as it enables greater time spent at home and facilitates the balance between family and work responsibilities [36,37]. However, evidence also suggests that for some women, telework has resulted in an increased domestic workload [38], highlighting the gendered implications of adopting this work arrangement.
These findings are reinforced by national statistics: although women accounted for 51.8% of the working-age population in the third quarter of 2025, men predominated among the employed, with an employment rate of 69.0% compared to 49.2% for women, who, in turn, represented 54.1% of the unemployed population [39]. These data suggest that, while telework provides greater flexibility, it also reflects structural gender inequalities: many women can only participate in the labor market through telework due to domestic responsibilities, which helps explain why, despite constituting the majority of the working-age population, their employment rate remains lower.
Age exhibits a nonlinear effect for both genders: the probability of telework tends to increase in intermediate and older age groups, particularly when combined with shorter weekly working hours. This pattern suggests that telework is more prevalent among occupational profiles characterized by greater autonomy and seniority. This result is consistent with the literature, which characterizes the typical teleworker as an individual with greater educational attainment, higher income, superior occupational status, and generally belonging to intermediate age groups [36,40]. More recent evidence, based on explainable artificial intelligence approaches, reinforces this pattern by indicating that telework adoption is strongly associated with more skilled occupations, greater autonomy, and more flexible work arrangements, underscoring the role of occupational structure and labor market segmentation in shaping access to this modality [41]. In this context, the authors of [42] argue that such patterns tend to intensify sociospatial inequalities, as workers in lower-skilled occupations have fewer opportunities to engage in telework, thereby reinforcing occupational and social divisions within urban space.
The probability of working remotely was also analyzed using contour plots as a function of habitual monthly earnings from the main job (VD4016) and years of schooling (VD3005), while additionally accounting for differences across groupings of the main economic activity of the enterprise (VD4010) and occupational groupings of the main job (VD4011), as shown in Figure 18, Figure 19, Figure 20 and Figure 21.
The analysis of the contour plots indicates a positive association between habitual monthly earnings from the main job (VD4016) and years of schooling (VD3005) with the estimated probability of working remotely. Across all specifications, a systematic gradient is observed, whereby individuals located in higher income and education strata exhibit substantially greater probabilities of engaging in telework. A positive interaction between these two variables is also evident, as increases in schooling amplify the marginal effect of income and vice versa, indicating complementarity between human capital and economic position in determining the feasibility of telework.
However, the positive effects of income (VD4016) and schooling (VD3005) on the probability of working remotely are heterogeneous across the labor market, varying according to groupings of the main economic activity of the enterprise (VD4010) and occupational groupings of the main job (VD4011). The contour plots show that sectors characterized by greater levels of administrative organization and intensive use of information and communication technologies, such as administrative services, display higher probability levels and steeper gradients with respect to individual characteristics. In contrast, activities such as construction, maintenance, and commerce remain concentrated in low-probability regions, even among individuals with elevated income and schooling levels, highlighting structural constraints on the adoption of telework.
These patterns are further reinforced by regional disparities in educational attainment across Brazil which, given the positive relationship between schooling and the likelihood of telework, contribute to shaping the spatial distribution of these opportunities. In the third quarter of 2025, approximately 68.8% of employed individuals nationwide had completed at least secondary education, while 24.7% held a tertiary degree. Nevertheless, these aggregate figures conceal substantial regional heterogeneity. In the North and Northeast regions, the share of workers who had not completed primary education reached 23.1% and 24.5%, respectively, exceeding the levels observed in other regions. By contrast, the Southeast and South regions concentrated the largest proportions of individuals with at least secondary education, accounting for 73.0% and 68.8%, respectively. Moreover, the Southeast stood out as the region with the highest share of workers holding a completed tertiary degree, at 27.8%. Given that telework-intensive sectors require higher levels of schooling and digital skills, these regional educational asymmetries interact with sectoral specialization to amplify the unequal diffusion of telework across the Brazilian territory, particularly to the detriment of less educated and structurally peripheral regions [39].
This pattern is consistent with the literature, which emphasizes the central role of the nature of productive activities and the content of occupational tasks in shaping the feasibility of telework. In sectors and occupations that require physical presence and direct interaction, such as transportation and construction, the gradients associated with income and schooling remain relatively flat, reflecting reduced marginal returns to these attributes. Conversely, in occupational groupings linked to office-based and cognitively intensive activities, such as those performed by scientists, engineers, and administrative professionals, more pronounced positive gradients are observed, whereby increases in schooling and earnings translate into substantial rises in the probability of telework. Overall, these findings reinforce the interpretation that, while individual attributes enhance access to telework, their effectiveness is fundamentally conditioned by sectoral and occupational structural factors, as documented in the literature [43,44,45].
In this sense, when considering structural factors, the results also reinforce a critical reflection on who is effectively included in the expansion of telework and who remains excluded due to digital inequality. Thus, access to telework is determined not only by sectoral and occupational characteristics but also by broader dimensions of digital inequality. Individuals in vulnerable situations, characterized by limited financial resources, low digital literacy, fragile support networks, or persistent psychological barriers, face cumulative disadvantages that restrict their ability to benefit from telework opportunities, even when their occupations are, in principle, compatible with remote modalities [46]. In this context, telework should be understood not only as a labor market arrangement, but as a phenomenon embedded in the broader governance of digital environments, in which institutional commitment to accessibility standards, inclusive communication, and citizen-centered innovation becomes central to preventing the reproduction of existing social and spatial inequalities.
These results reinforce the interpretation that telework in Brazil does not depend solely on individual characteristics but is strongly conditioned by occupational structure and gender inequalities. The higher probability observed among women in positions of greater autonomy suggests that telework may simultaneously expand opportunities for flexibility and reproduce inequalities, insofar as it remains restricted to groups with greater control over work organization. Thus, the observed pattern underscores the selective and stratified nature of telework, aligning with evidence that sociodemographic and occupational factors jointly shape access to telework.
Furthermore, the replicability of this research primarily concerns the methodological framework employed rather than the direct reproduction of empirical results. The analytical approach, combining large-scale survey data with machine learning classification techniques, can be applied to other national contexts, provided that comparable datasets are available. However, the outcomes themselves may differ substantially across countries due to structural and contextual factors, such as income distribution patterns, geographic scale, labor market organization, and levels of economic development. Therefore, while the methodology is transferable, the interpretation of results must remain sensitive to country-specific socioeconomic and institutional conditions, reinforcing the importance of contextualized analyses when examining telework dynamics internationally.

4. Conclusions

This study investigated the temporal dynamics of telework in Brazil by applying machine learning techniques to nationally representative data covering the period from 2022 to 2025. Nine classification algorithms were evaluated and hyperparameter-tuned using ANOVA racing, with model performance assessed primarily through the ROC AUC metric. The results indicate consistently high discriminative capacity across all periods (ROC AUC > 0.80). Moreover, the overlap in confidence intervals across models and evaluation metrics throughout the analyzed years suggests statistically comparable performance among the algorithms, with no consistent evidence that any single method is superior over time.
Because no single model dominates across the analyzed time horizon, the selection of predictive algorithms should rest on dynamic and continuous evaluation.
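The confidence-interval comparison described above can be illustrated with a minimal Python sketch. It is not the authors' R/tidymodels code: the per-fold ROC AUC values are hypothetical numbers invented for illustration, the function names `fold_ci` and `intervals_overlap` are assumptions, and the interval is a plain Student-t interval over five folds. Overlap of intervals is only a conservative heuristic for comparable performance, not a formal hypothesis test.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 folds).
T_CRIT_DF4 = 2.776


def fold_ci(scores, t_crit=T_CRIT_DF4):
    """Return the (lower, upper) 95% interval for the mean of per-fold scores."""
    centre = mean(scores)
    half_width = t_crit * stdev(scores) / sqrt(len(scores))
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True when two closed intervals (lower, upper) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical per-fold ROC AUC values, NOT taken from the study.
model_a = [0.812, 0.806, 0.819, 0.801, 0.815]
model_b = [0.809, 0.814, 0.803, 0.811, 0.807]

ci_a = fold_ci(model_a)
ci_b = fold_ci(model_b)
print("model A:", ci_a)
print("model B:", ci_b)
print("overlap:", intervals_overlap(ci_a, ci_b))
```

With these made-up scores the two intervals overlap, which is the situation the study reports for its candidate models in each period.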
Regarding the explanatory variables, a substantial level of consistency was observed among the most important predictors over time. The eight leading variables remained present in all periods, although their relative ranking changed. Notably, the increasing relevance of gender in more recent periods points to a reconfiguration of the determinants of telework, in which sociodemographic factors assume a more central role. These findings reinforce the evidence that telework is not merely a technological or occupational phenomenon but is also deeply influenced by social dimensions.
Additionally, the consistent presence of variables such as education, income, and age confirms the selective nature of telework, which is predominantly associated with occupations requiring higher levels of qualification and autonomy. This pattern contributes to the persistence, and potentially the intensification, of socioeconomic and sociospatial inequalities.
From a public policy perspective, these results highlight the need for actions that explicitly consider the unequal conditions under which telework is accessed and performed. The growing importance of gender-related factors indicates that telework policies should include measures to reduce work–family conflicts and address persistent inequalities, such as the uneven distribution of care responsibilities and occupational segregation. The significant role of income and educational attainment underscores the importance of complementary policies aimed at digital inclusion and skills development.
In the Brazilian context, it is particularly crucial to strengthen access to education in specific regions, given that schooling is unevenly distributed across the territory, contributing to the widening of inequalities. Thus, telework should be understood not only as a flexible work arrangement but also as a public policy instrument whose benefits depend on strategies aimed at reducing inequalities, especially in developing-country contexts.
The study offers scientific contributions by advancing the state of the art on telework in Brazil and its determinants, demonstrating that this work arrangement exhibits persistent structural patterns while its explanatory factors undergo relevant transformations. The study also has technological implications, as it highlights the importance of employing machine learning techniques and developing high-performing models to support analytical processes and decision-making. From a practical perspective, the results provide consistent evidence to inform the formulation and improvement of public policies. As a future research agenda, continuous monitoring of the variables influencing telework in the country is recommended, given that this dynamic is susceptible to change over time. The study is limited in its ability to compare model performance gains across different feature transformation techniques. Future research may therefore assess more systematically the impact of changes in variable distributions over time, as well as the effects of techniques such as median imputation and artificial category creation on model performance. Additionally, unsupervised learning methods may be explored to address potential correlations among variables.

Author Contributions

Conceptualization, methodology, data curation, software, formal analysis, visualization, and original draft preparation were performed by L.d.A.M. R.B.D.P. contributed to model execution, computational support, validation, and writing support. J.P.L. supervised the research and contributed to the interpretation of results, writing, and critical revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Federal University of Itajubá (UNIFEI).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the Brazilian Institute of Geography and Statistics (IBGE) through the Continuous National Household Sample Survey (PNAD Contínua), at https://www.ibge.gov.br (accessed on 1 October 2025). These data were derived from public domain resources.

Acknowledgments

The authors would like to acknowledge the Brazilian National Council for Scientific and Technological Development (CNPq), the Coordination of Superior Level Staff Improvement (CAPES), the Research Support Foundation of the State of Minas Gerais (FAPEMIG), and the Federal University of Itajubá (UNIFEI).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hopkins, J.L.; McKay, J. Investigating ‘anywhere working’ as a mechanism for alleviating traffic congestion in smart cities. Technol. Forecast. Soc. Change 2019, 142, 258–272.
  2. Rasche, B.; Dreber, N.; Zehl, F.; Knie, A. Forschung für die Mobilitätswende: COVID-19-Pandemie als Treiber? GAIA–Ecol. Perspect. Sci. Soc. 2021, 30, 276–277.
  3. Martens, M.; Korver, W. Forecasting and assessing the mobility effects of teleservices: Scenario approach. Transp. Res. Rec. 2000, 1706, 118–125.
  4. Van Lier, T.; De Witte, A.; Macharis, C. How worthwhile is teleworking from a sustainable mobility perspective? The case of Brussels Capital region. Eur. J. Transp. Infrastruct. Res. 2014, 14, 244–267.
  5. de Vos, D.; Meijers, E.; van Ham, M. Working from home and the willingness to accept a longer commute. Ann. Reg. Sci. 2018, 61, 375–398.
  6. Mouratidis, K.; Peters, S. COVID-19 impact on teleactivities: Role of built environment and implications for mobility. Transp. Res. Part A Policy Pract. 2022, 158, 251–270.
  7. De Abreu e Silva, J.; Melo, P.C. Home telework, travel behavior, and land-use patterns. J. Transp. Land Use 2018, 11, 419–441.
  8. Ravalet, E.; Rérat, P. Teleworking: Decreasing mobility or increasing tolerance of commuting distances? Built Environ. 2019, 45, 582–602.
  9. López Soler, J.R.; Christidis, P.; Vassallo, J.M. Teleworking and online shopping: Socio-economic factors affecting their impact on transport demand. Sustainability 2021, 13, 7211.
  10. Nguyen, M.H. Factors influencing home-based telework in Hanoi (Vietnam) during and after the COVID-19 era. Transportation 2021, 48, 3207–3238.
  11. Fatmi, M.R.; Orvin, M.M.; Thirkell, C.E. The future of telecommuting post COVID-19 pandemic. Transp. Res. Interdiscip. Perspect. 2022, 16, 100685.
  12. Hensher, D.A.; Balbontin, C.; Beck, M.J.; Wei, E. The impact of working from home on modal commuting choice response during COVID-19: Implications for two metropolitan areas in Australia. Transp. Res. Part A Policy Pract. 2022, 155, 179–201.
  13. Salon, D.; Mirtich, L.; Bhagat-Conway, M.W.; Costello, A.; Rahimi, E.; Mohammadian, A.K.; Chauhan, R.S.; Derrible, S.; da Silva Baker, D.; Pendyala, R.M. The COVID-19 pandemic and the future of telecommuting in the United States. Transp. Res. Part D Transp. Environ. 2022, 112, 103473.
  14. Sweet, M.; Scott, D.M. Insights into the future of telework in Canada: Modeling the trajectory of telework across a pandemic. Sustain. Cities Soc. 2022, 87, 104175.
  15. Nayak, S.; Pandit, D. Potential of telecommuting for different employees in the Indian context beyond COVID-19 lockdown. Transp. Policy 2021, 111, 98–110.
  16. Zhang, S.; Moeckel, R.; Moreno, A.T.; Shuai, B.; Gao, J. A work-life conflict perspective on telework. Transp. Res. Part A Policy Pract. 2020, 141, 51–68.
  17. Kumara, B.; Herath, G.; Wijeratne, P.; Banujan, K. Work From Home After Covid-19: Machine Learning-Based Approach to Predict Employee’s Choice. In Proceedings of the 2022 International Conference on Decision Aid Sciences and Applications (DASA), Chiangrai, Thailand, 23–25 March 2022; pp. 147–150.
  18. Abesiri, S.; Rupasingha, R. Predicting Employee Preference of Teleworking Using Machine Learning Techniques in the Post COVID-19 Period in Sri Lanka. In Proceedings of the 2022 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 1 September 2022; Volume 5, pp. 22–27.
  19. Chen, H.H.; Lu, H.H.S.; Weng, W.H.; Lin, Y.H. Developing a Machine Learning Algorithm to Predict the Probability of Medical Staff Work Mode Using Human-Smartphone Interaction Patterns: Algorithm Development and Validation Study. J. Med. Internet Res. 2023, 25, e48834.
  20. Rehan, F.A.; Bukhari, F.; Iqbal, W. Impact of COVID-19 on Productivity of Software Engineers: A Comparative Analysis of Work from Home (WFH) and Work from Office (WFO) Environment using Machine Learning. In Proceedings of the 2023 2nd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE), Lahore, Pakistan, 27–29 November 2023; pp. 1–8.
  21. Setiawan, J.; Alamsari, R.G. Prediction of Work From Home Post COVID-19 using Classification Model. In Proceedings of the 2022 Seventh International Conference on Informatics and Computing (ICIC), Bali, Indonesia, 8–9 December 2022; pp. 1–6.
  22. Elldér, E. Does telework weaken urban structure–travel relationships? J. Transp. Land Use 2017, 10, 187–210.
  23. Oliveira, M.L.D.; Mairinque, L.D.A.; Santos, J.B.D.; Lima, J.P. Multivariate analysis of public transport quality: A case study in a medium-sized Brazilian city. Production 2022, 32, e20210117.
  24. Barros dos Santos, J.; Lima, J.P. Health Determinants, Applications, and Methods: A Systematic Literature Review on the Relationships Between the Urban Transport of People and Health. Transp. Res. Rec. 2024, 2678, 245–271.
  25. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2013.
  26. Instituto Brasileiro de Geografia e Estatística (IBGE). Pesquisa Nacional por Amostra de Domicílios Contínua: Quadro Sintético 1º Trimestre de 2025 (Jan–Fev–Mar); Coordenação de Trabalho e Rendimento: Rio de Janeiro, Brazil, 2025.
  27. Instituto Brasileiro de Geografia e Estatística (IBGE). Dicionário de Variáveis da PNAD Contínua–Microdados 2022, Visita 1; IBGE: Rio de Janeiro, Brazil, 2023. Available online: https://ftp.ibge.gov.br/Trabalho_e_Rendimento/Pesquisa_Nacional_por_Amostra_de_Domicilios_continua/Anual/Microdados/Visita/Visita_1/Documentacao/dicionario_PNADC_microdados_2022_visita1_20231129.xls (accessed on 23 January 2026).
  28. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30.
  29. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2025; Available online: https://www.R-project.org/ (accessed on 23 January 2026).
  30. Kuhn, M.; Silge, J. Tidy Modeling with R: A Framework for Modeling in the Tidyverse; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022.
  31. Kuhn, M.; Wickham, H. Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles. Available online: https://www.tidymodels.org (accessed on 23 January 2026).
  32. Kuhn, M. finetune: Additional Functions for Model Tuning, R Package Version 1.2.1; Available online: https://CRAN.R-project.org/package=finetune (accessed on 23 January 2026).
  33. Maksymiuk, S.; Gosiewska, A.; Biecek, P. Landscape of R packages for eXplainable Artificial Intelligence. arXiv 2020, arXiv:2009.13248.
  34. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
  35. Hosmer, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression, 3rd ed.; Wiley: Hoboken, NJ, USA, 2013.
  36. Mairinque, L.A.; Pereira, R.B.D.; Lima, J.P. Telework and Sustainable Urban Mobility: Conceptual Modeling From the Systematic Literature Review. Bus. Soc. Rev. 2026, 131, e70043.
  37. Asgari, H.; Jin, X.; Mohseni, A. Choice, frequency, and engagement: Framework for telecommuting behavior analysis and modeling. Transp. Res. Rec. 2014, 2413, 101–109.
  38. Xiang, B. Remote work, social inequality and the redistribution of mobility. Int. Migr. 2022, 60, 280–282.
  39. Instituto Brasileiro de Geografia e Estatística (IBGE). Pesquisa Nacional por Amostra de Domicílios Contínua–Indicadores Trimestrais: 3º Trimestre de 2025 (Jul–Set); IBGE: Rio de Janeiro, Brazil, 2025. Available online: https://biblioteca.ibge.gov.br/visualizacao/periodicos/2421/pnact_2025_3tri.pdf (accessed on 23 January 2026).
  40. Vilhelmson, B.; Thulin, E. Who and where are the flexible workers? Exploring the current diffusion of telework in Sweden. New Technol. Work. Employ. 2016, 31, 77–96.
  41. Ogungbire, A.; Mitra, S.K. Unlocking telecommuting patterns before, during, and after the COVID-19 pandemic: An explainable AI-driven study. Transp. Res. Interdiscip. Perspect. 2024, 28, 101244.
  42. Reuschke, D.; Ekinsmyth, C. New spatialities of work in the city. Urban Stud. 2021, 58, 2177–2187.
  43. Adobati, F.; Debernardi, A. The breath of the Metropolis: Smart working and new urban geographies. Sustainability 2022, 14, 1028.
  44. Krasilnikova, N.; Levin-Keitel, M. Telework as a Game-Changer for Sustainability? Transitions in Work, Workplace and Socio-Spatial Arrangements. Sustainability 2022, 14, 6765.
  45. Magnus, M.; Glackin, S.; Hopkins, J.L. The Working-from-Home Natural Experiment in Sydney, Australia: A Theory of Planned Behaviour Perspective. Sustainability 2022, 14, 13997.
  46. Kolotouchkina, O.; Ripoll González, L.; Belabas, W. Smart Cities, Digital Inequalities, and the Challenge of Inclusion. Smart Cities 2024, 7, 3355–3370.
Figure 1. Workflow.
Figure 2. Accuracy and ROC AUC: fourth quarter of 2022.
Figure 3. Accuracy and ROC AUC: third quarter of 2023.
Figure 4. Accuracy and ROC AUC: second quarter of 2024.
Figure 5. Accuracy and ROC AUC: first quarter of 2025.
Figure 6. Precision and recall: fourth quarter of 2022.
Figure 7. Precision and recall: third quarter of 2023.
Figure 8. Precision and recall: second quarter of 2024.
Figure 9. Precision and recall: first quarter of 2025.
Figure 10. Variable Importance Plot (VIP) of the Random Forest model: fourth quarter of 2022. Variable codes correspond to the descriptions presented in Table 4.
Figure 11. Variable Importance Plot (VIP) of the Random Forest model: third quarter of 2023. Variable codes correspond to the descriptions presented in Table 4.
Figure 12. Variable Importance Plot (VIP) of the XGB model: second quarter of 2024. Variable codes correspond to the descriptions presented in Table 4.
Figure 13. Variable Importance Plot (VIP) of the XGB model: first quarter of 2025. Variable codes correspond to the descriptions presented in Table 4.
Figure 14. Contour plots of predicted telework probability by V2009 and V4039, stratified by V2007 and VD4009: fourth quarter of 2022.
Figure 15. Contour plots of predicted telework probability by V2009 and V4039, stratified by V2007 and VD4009: third quarter of 2023.
Figure 16. Contour plots of predicted telework probability by V2009 and V4039, stratified by V2007 and VD4009: second quarter of 2024.
Figure 17. Contour plots of predicted telework probability by V2009 and V4039, stratified by V2007 and VD4009: first quarter of 2025.
Figure 18. Contour plots of predicted telework probability by VD4016 and VD3005, stratified by VD4010 and VD4011: fourth quarter of 2022.
Figure 19. Contour plots of predicted telework probability by VD4016 and VD3005, stratified by VD4010 and VD4011: third quarter of 2023.
Figure 20. Contour plots of predicted telework probability by VD4016 and VD3005, stratified by VD4010 and VD4011: second quarter of 2024.
Figure 21. Contour plots of predicted telework probability by VD4016 and VD3005, stratified by VD4010 and VD4011: first quarter of 2025.
Table 1. Data set size before and after class balancing.
Year | N# | N*
2025 | 45,798 | 15,194
2024 | 45,210 | 12,754
2023 | 46,255 | 12,666
2022 | 45,307 | 12,574
Table 2. Machine Learning Models and Tuned Hyperparameters.
Model | Tuned Hyperparameters
Multinomial Logistic Regression | penalty, mixture
Decision Tree | tree_depth, min_n, cost_complexity
Bagging with Decision Tree (CART) | tree_depth, min_n, cost_complexity
Random Forest | mtry, min_n, trees
Extreme Gradient Boosting (XGBoost) | tree_depth, learn_rate, loss_reduction, min_n, sample_size, trees
Support Vector Machine (RBF Kernel) | cost, rbf_sigma
Support Vector Machine (Polynomial Kernel) | cost, degree
Multivariate Adaptive Regression Splines (MARS) | prod_degree
Multilayer Perceptron (MLP) | hidden_units, penalty, epochs
Table 3. Performance of the best models by year according to the ROC AUC metric.
Year | Model | ROC AUC
2025 | XGBoost | 0.8142
2024 | XGBoost | 0.8257
2023 | Random Forest | 0.8235
2022 | Random Forest | 0.8056
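For readers less familiar with the metric reported in Table 3, ROC AUC can be read as the probability that a randomly chosen positive case (a teleworker) receives a higher predicted score than a randomly chosen negative case. The Python sketch below computes it directly from that definition; the labels and scores are made-up illustrative values, the function name `roc_auc` is an assumption, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = telework) and predicted probabilities.
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(roc_auc(y, s))  # 8 of the 9 positive-negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values above 0.80 are described in the text as high discriminative performance.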
Table 4. Most important variables for predicting telework.
Code | Variable Description
UF | Federative Unit (state)
V1023 | Type of area of residence
V2007 | Gender
V2009 | Age of the resident
V2010 | Race or color
V3009A | Highest level of education previously attended
V3014 | Completion status of the highest level of education previously attended
V4001 | Participation in paid work during the reference week
V4013 | Main economic activity of the enterprise
V4033 | Receipt of monetary earnings or withdrawals from the main job
V4039 | Usual weekly hours worked in the main job
V4040 | Job tenure in the current job
VD3005 | Years of schooling standardized to the 9-year elementary education system
VD4009 | Employment status and employment category of the main job
VD4010 | Main activity grouping of the enterprise in the main job
VD4011 | Occupational grouping of the main job
VD4016 | Usual monthly income from the main job
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
