Analyzing Communication and Migration Perceptions Using Machine Learning: A Feature-Based Approach

Tirado-Espín, Andrés; Marcillo-Vera, Ana; Cáceres-Benítez, Karen; Almeida-Galárraga, Diego; Orozco Garzón, Nathaly; Moreno Guaicha, Jefferson Alexander; Carvajal Mora, Henry

doi:10.3390/journalmedia6030112


by Andrés Tirado-Espín ¹, Ana Marcillo-Vera ¹, Karen Cáceres-Benítez ², Diego Almeida-Galárraga ², Nathaly Orozco Garzón ³,*, Jefferson Alexander Moreno Guaicha ⁴ and Henry Carvajal Mora ³

¹ School of Mathematical and Computational Sciences, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador

² School of Science Biological and Engineering, Yachay Tech University, Hacienda San José s/n, San Miguel de Urcuquí 100119, Ecuador

³ ETEL Research Group, Faculty of Engineering and Applied Sciences, Networking and Telecommunications Engineering, Universidad de Las Américas (UDLA), Quito 170503, Ecuador

⁴ Departamento de Posgrados, Campus Morelos, Universidad de Investigación e Innovación de México, Insurgentes 115, Centro, Jiutepec 62550, Morelos, Mexico

* Author to whom correspondence should be addressed.

Journal. Media 2025, 6(3), 112; https://doi.org/10.3390/journalmedia6030112

Submission received: 26 May 2025 / Revised: 7 July 2025 / Accepted: 15 July 2025 / Published: 18 July 2025


Abstract

Public attitudes toward immigration in Spain are influenced by media narratives, individual traits, and emotional responses. This study examines how portrayals of Arab and African immigrants may be associated with emotional and attitudinal variation. We address three questions: (1) How are different types of media coverage and social environments linked to emotional reactions? (2) What emotions are most frequently associated with these portrayals? and (3) How do political orientation and media exposure relate to changes in perception? A pre/post media exposure survey was conducted with 130 Spanish university students. Machine learning models (decision tree, random forest, and support vector machine) were used to classify attitudes and identify predictive features. Emotional variables such as fear and happiness, as well as perceptions of media clarity and bias, emerged as key features in classification models. Political orientation and prior media experience were also linked to variation in responses. These findings suggest that emotional and contextual factors may be relevant in understanding public perceptions of immigration. The use of interpretable models contributes to a nuanced analysis of media influence and highlights the value of transparent computational approaches in migration research.

Keywords:

media framing; Arab and African migrants; decision tree; public perception; random forest; support vector machine

1. Introduction

By 1 January 2014, the EU had 33.5 million residents born outside the EU, or 6.6% of its population, as reported by Eurostat. Third-country migration brought an additional 5.1 million migrants to the EU by 2022, continuing a trend of increasing migration. Since 2015, Europe has faced a growing migration crisis, with southern countries, Spain in particular, becoming key host countries for migrants from all over the world (Galvañ & Giménez, 2020; King, 2019). In 2015, Germany had the highest number of immigrants at 2.1 million, but Spain received 1.3 million new immigrants, which corresponded to a ratio of 19 immigrants for every 1000 residents. This sustained trend has substantially increased the immigrant population in Spain (Manthei, 2020). Data from the National Statistics Institute (INE) (Instituto Nacional de Estadística, 2020a, 2020b) indicate that, as of January 2020, the country’s migratory balance stood at 5,235,375 people, with Venezuelans, Colombians, and Moroccans making up the largest number of arrivals. As of January 2023, Spain’s immigrant population stood at 6.1 million (European Commission, 2024). These groups currently make up 11% of Spain’s population, primarily as a result of Spain’s economic recovery, which has revived the country as an attractive destination for immigrants (Enríquez, 2019; Khai, 2025; Martínez-Martínez et al., 2018).

Although migrants are an asset to the economy and contribute to the demographic diversity of host countries, public opinion is generally ambivalent. Some studies even highlight that migration can play a key role in mitigating the effects of population aging, particularly by alleviating labor shortages and supporting the sustainability of welfare systems (Pérez Fructuoso et al., 2025). Nonetheless, immigrant groups are commonly stereotyped, for instance with respect to their perceived failure (Sbaa et al., 2025; Tirado-Espín et al., 2022). In the United States and Spain, for example, Latin American immigrants are stereotyped as “lazy” (Yemane & Fernández-Reino, 2021). Social media and news outlets increasingly portray immigration negatively, presenting migrants as a threat to national security and emphasizing cultural instability (Tirado-Espín et al., 2021). The reframing and manipulation of mainstream news stories on immigration are among the favored tactics of anti-immigrant rhetoric, which emphasizes crime, social disorder, cultural conflict, and the perceived economic burden of migrants (Ekman, 2019; Kalfeli et al., 2024; Tirado-Espín et al., 2020). This is especially worrying given recent research showing that some migrant subgroups turn to migration to escape domestic stigma and marginalization, only to encounter new forms of exclusion and vulnerability in host societies (Chasciar et al., 2024).

In recent years, the growing number of immigrants in Spain has been accompanied by increasingly complex and, at times, contradictory public debates. Public opinion is influenced by media reporting, which often depicts immigrants through negative stereotypes, particularly in terms of crime, economic costs, and cultural differences (Ardèvol-Abreu, 2015; Boateng et al., 2021; Indelicato & Martín, 2024; Robutti, 2024). Such portrayals have been associated with increased discriminatory attitudes and xenophobia, potentially reinforcing social cleavages (Bekteshi & Bellamy, 2024; Komendantova et al., 2023). For instance, between 1996 and 1997, 65% of immigration news was negative and only 18% was positive; in 2006–2007, negative reporting increased to 69%, while positive reporting rose only to 21% (Arjona & Checa, 2011). Framing of this kind tends to downplay immigrants’ constructive contributions to the host country’s economy and social development. In addition, recent research suggests that, beyond framing, the term used to name immigrants (for instance, “asylum seeker” or “refugee”) can shape public views, reinforcing negative stereotypes or fostering more favorable ones depending on the term chosen (Verleyen & Beckers, 2023). Similarly, the media have been observed to frame migration within a tension between humanitarianism and security, in which the two discourses coexist in a complementary way and reflect real tensions in society as it confronts migratory flows (Kollias et al., 2025; Magazzini, 2021).
These tensions also manifest at the local level, where stakeholders and NGOs attempt to mediate between national migration policies and the demands of arriving populations, often encountering institutional obstacles and fragmented governance systems (Aydar, 2022).

1.1. Motivation and Contributions

In light of these concerns, our study aims to achieve the following goals: (1) analyze the relationship between different types of media coverage and emotional responses towards immigrants in Spain, (2) investigate how exposure to positive or negative news is associated with changes in individuals’ attitudes towards immigrants, and (3) develop a predictive model to estimate the emotional impact of media coverage about immigrants based on individual characteristics, perceptions, and media consumption patterns. To guide our research, we pose the following questions:

RQ1: How are different types of media coverage and social environments associated with emotional responses towards immigrants in Spain?
RQ2: What emotional reactions are most commonly associated with media portrayals of Arab and African immigrants?
RQ3: In what ways do individual characteristics, such as political affiliation and previous media exposure, influence shifts in attitudes towards immigrants following media consumption?

By exploring these questions, this study seeks to provide important insight into the intricate relationship between media coverage and public opinion, ultimately informing the development of strategies that promote a more inclusive discussion about immigration.

To classify perceptions as positive or negative, three supervised learning algorithms were used: decision tree (DT), random forest (RF), and support vector machine (SVM). These models were chosen for their interpretability, stability, and effectiveness in classification tasks. Ensemble learning was also employed, combining the strengths of the individual classifiers to improve model generalization and accuracy. In addition, these models can accommodate complex, high-dimensional data without sacrificing interpretability, providing insight into the relative importance of individual features.
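To make the decision tree component concrete, the following is a minimal standard-library sketch of CART-style tree growing with Gini impurity. It is not the authors’ implementation, which this section does not specify; the maximum depth, the use of observed values as split thresholds, the two emotion features, and the toy data are all hypothetical. A random forest would grow many such trees on bootstrap samples with random feature subsets, and an SVM is a different, margin-based method not shown here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every feature/threshold pair; return (feature, threshold, weighted Gini) or None."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Leaves hold the majority class; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority  # no feature can separate these rows
    j, t, _ = split
    go_left = [row[j] <= t for row in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return (j, t,
            build_tree(left_rows, left_labels, depth + 1, max_depth),
            build_tree(right_rows, right_labels, depth + 1, max_depth))


def predict(tree, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical toy data: [fear, happiness] on 1-5 scales; label 1 = negative perception.
X = [[5, 1], [4, 2], [5, 2], [1, 5], [2, 4], [1, 4]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y)
```

On this toy data the first split on the fear score already separates the classes, so the tree is a single node; the path from root to leaf is what makes such a model directly readable.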

Given the complexity and sensitivity of topics like migration and public attitudes, interpretability becomes essential. Interpretability is understood as the degree to which a human can understand the cause of a decision (Christoph, 2020). As Doshi-Velez and Kim highlight, model performance metrics alone, such as accuracy, often fail to capture the practical relevance or trustworthiness of predictions (Christoph, 2020). This study therefore considers interpretability both as a means of transparency and as a tool for identifying potential biases in feature influence. Furthermore, the use of decision trees and ensemble models aligns with recommended strategies for high-dimensional data in small samples, as discussed by Hastie, Tibshirani, and Friedman (Hastie, 2009), allowing for structured feature selection and variance reduction without compromising generalization.

Moreover, as AI is increasingly incorporated into news production and public opinion analysis, it is important to ensure that algorithms are well designed and trained so that they do not entrench existing biases, particularly on sensitive topics like migration (De-Lima-Santos & Ceron, 2022). Our approach aims to identify key factors associated with attitudes towards Arab and African immigrants in Spain.

Figure 1 presents a schematic illustration of the research process. It begins with survey data collection before and after exposure to media content, followed by data preprocessing and exploratory data analysis. The process continues with the binarization of migration impressions and model training. Importance scores from the classification models informed feature selection, enabling us to identify the most pertinent predictors. This approach aligns with previous research emphasizing the success of ensemble and linear models in classification tasks related to migration and perception (Anuar et al., 2023; K. Best et al., 2022; K. B. Best et al., 2021).
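The two preprocessing steps named above, binarization of impressions and importance-based feature selection, can be sketched as follows. This is a minimal illustration assuming a 5-point impression scale split at its midpoint (with the midpoint counted as negative) and importance scores already produced by a fitted model; the threshold, the midpoint rule, the feature names x1 to x4, and their values are hypothetical placeholders, not the study’s settings or results.

```python
def binarize(scores, threshold=3):
    """Map ordinal impression scores to 1 (positive) if above threshold, else 0 (negative).

    The default threshold of 3 is an assumed midpoint of a 1-5 scale.
    """
    return [1 if s > threshold else 0 for s in scores]


def select_top_k(importances, k):
    """Return the names of the k most important features, most important first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


labels = binarize([1, 2, 3, 4, 5])
# Hypothetical importance scores, e.g. as exported from a fitted tree-based model.
top = select_top_k({"x1": 0.05, "x2": 0.40, "x3": 0.25, "x4": 0.30}, k=2)
```

How the midpoint is handled is a substantive choice: assigning it to either class, or dropping neutral responses, changes the class balance the classifiers see.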

The findings suggest possible mechanisms through which media depiction and sociopolitical context may be associated with affective reactions and orientations, and indicate that political identity is the factor most strongly associated with emotions such as fear, interest, and surprise following exposure to information. They also highlight the potential of evidence-based methods to contribute to public opinion research and to guide effective media campaigns.

Conceptual Framework: Media Framing, Context, and Individual Traits

To conceptually frame this study, we draw on media framing theory, which views framing as the process by which certain aspects of perceived reality are emphasized in communication in order to shape how audiences understand problems, identify causes, evaluate moral implications, and consider possible solutions (Entman, 1993).

This process of selection and emphasis plays a key role in eliciting emotional responses. According to Entman’s framework, framing enhances the prominence of certain elements within a message, increasing the likelihood that audiences will notice, interpret, and retain the information presented (Entman, 1993). In the context of migration-related news, highlighting emotionally charged themes—such as conflict, victimization, or solidarity—can trigger specific emotional reactions like fear or happiness, ultimately shaping public interpretation and memory of the content.

This aligns with our research objective to examine how individuals emotionally and cognitively respond to media portrayals of immigrants. Within Entman’s framework, framing influences not only what people notice and understand, but also how they evaluate issues and make decisions based on that information (Entman, 1993). Emotional reactions such as fear or happiness are therefore not incidental; rather, they are central to how framing shapes interpretation, attitude formation, and potential behavioral responses.

Framing influences attitudes and emotions more than behavior, with documented moderate effects on perceptions (d = 0.41–0.47) but a weaker influence on action (d = 0.11) (Amsalem & Zoizner, 2022). These effects operate through mechanisms such as availability, accessibility, and applicability (Chong & Druckman, 2007). In digital contexts, framing is shaped not only by elites but also by ordinary users whose ideological orientation and region influence frame adoption: liberals favor moral/humanitarian frames, whereas conservatives favor threat frames (Mendelsohn et al., 2021). Framing effects are also shaped by hybrid media dynamics and participatory cultures, making them volatile and context dependent. Moreover, individual psychological traits such as fear or locus of control moderate message reception, with prosocial framing proving more persuasive among low-fear individuals (Ceylan & Hayran, 2021). Rather than assuming causality, this framework supports the interpretive modeling of associations among media content, context, and individual traits.
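The effect sizes cited above are Cohen’s d values, that is, standardized mean differences. For a simple two-group comparison (for example, a framed condition versus a control condition), d divides the difference in group means by the pooled standard deviation. The sketch below computes it with the standard library; meta-analyses such as the one cited aggregate study-level estimates and may apply small-sample corrections (Hedges’ g), so this illustrates the metric rather than how the cited values were derived. The example data are made up.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up scores for a hypothetical framed group and control group.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

Here the means differ by 1 and the pooled standard deviation is 2, so d = 0.5. Read the same way, 0.41–0.47 for perceptions is a little under half a standard deviation, while 0.11 for behavior is about a tenth of one.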

Finally, Entman highlights that framing operates not only through what is emphasized, but also through what is excluded. Omissions can be just as influential as inclusions in directing audience interpretation (Entman, 1993). From this perspective, the prominence of certain emotional responses—such as fear or happiness—can be seen as a result not only of explicit emphasis in the media, but also of the systematic exclusion of alternative frames. Our study therefore interprets the presence of these emotions as a reflection of how dominant narratives shape public perception by privileging specific evaluative perspectives while silencing others.

1.2. Related Works

Given the scale of migration and the media’s contribution to shaping public attitudes, it is important to understand how social support, discrimination, and integration affect the psychosocial adjustment of immigrants in host countries, considering how heterogeneous backgrounds, language proficiency, and settlement experience might condition these variables (McCann et al., 2023; Michalovich, 2021). Many studies have analyzed the predictors of immigrants’ psychosocial health, particularly in Spain and in other major migration destinations such as China. These studies investigate the impact of variables like social support networks, perceived discrimination, and sense of community (SOC) on the mental health, life satisfaction, and social exclusion experiences of migrant populations.

Beyond the Spanish and Chinese contexts, it is also important to consider cases that reveal different mechanisms of anti-migrant sentiment shaped by political controversy and media dynamics. A notable example is Turkey, where a distinct form of public hostility has emerged not only around economic or criminal concerns, but particularly in response to the extension of voting rights to naturalized Syrian migrants. As documented by Yurtcicek Ozaydin (Ozaydin, 2018), public sentiment was heavily influenced by widespread rumors on social media alleging that the Turkish government had granted citizenship en masse to Syrian migrants ahead of the 2018 general elections to secure electoral support for President Erdogan. Although the official numbers were low, this narrative fueled a dominant wave of anti-migrant discourse. In a Twitter-based analysis, 77.6% of anti-Syrian tweets prior to the elections were linked to the voting issue, which remained the most prevalent category even after the elections. The study also showed a notable fragmentation: economic and voting concerns clustered together among critics of the government, whereas crime-related discourse was more diffusely shared. This case underscores how politically charged policy decisions—when amplified by media rumors—can intensify anti-migrant sentiment even among individuals who might otherwise not oppose immigration on cultural or economic grounds.

Table 1 presents an in-depth overview of the most relevant research on social support and well-being among immigrants in different settings, including the quantitative methods used and the main conclusions of each study. The Article column identifies each study, and the Nationalities of Immigrants column lists the immigrant groups investigated. The Sample Size column gives the number of participants, indicating the scope of each study, and the Age Range column identifies the population groups each study aimed to include. The Quantitative Methodology column outlines the statistical analyses employed, such as regression and cluster analysis, while the Scales Used column lists the measurement instruments, such as social support and life satisfaction scales, used to measure the primary variables. Finally, the Quantitative Results column summarizes the main findings, emphasizing relationships among variables such as social support and discrimination. Together, this body of research provides a foundation for understanding the varied experiences of immigrants and the importance of social integration in mitigating the negative effects of discrimination and cultural adjustment.

Turning to the individual studies, Hombrados-Mendieta et al. (2019) offer useful evidence on how social support networks contribute to the well-being of immigrants in Spain. The authors explored the effect of these networks on indicators such as sense of community (SOC), life satisfaction (SWL), and physical and mental health. Analyzing a random sample of 1131 Latin American, African, and Eastern European immigrants with a structural equation model (SEM), they found strong associations. Notably, support from locals was positively associated with SOC (coefficient = 0.131), which in turn was strongly associated with higher life satisfaction (coefficient = 0.648). These coefficients are SEM standardized regression weights, which indicate the direction and magnitude of the relationship between latent or observed variables. For example, a coefficient of 0.648 between SOC and SWL represents a strong positive relationship in which greater SOC is associated with greater SWL. Negative values, e.g., −0.171, reflect inverse associations. The coefficients are dimensionless and generally fall within the range −1 to 1 (Kline, 2023). In addition, higher SWL was associated with fewer mental health symptoms (−0.012) and fewer physical illness symptoms (−0.171), suggesting a protective role of these relationships (Hombrados-Mendieta et al., 2019).
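To illustrate why standardized weights are dimensionless, the sketch below rescales a single-predictor least-squares slope by the ratio of standard deviations, sd(x)/sd(y). In this one-predictor case the result equals Pearson’s correlation and is bounded by ±1; in an SEM with several correlated predictors, weights are estimated jointly and can occasionally exceed 1 in absolute value, which is why the bound holds only generally. The variable values are hypothetical and unrelated to the cited study.

```python
from statistics import mean, stdev


def standardized_slope(x, y):
    """Least-squares slope of y on x, rescaled by sd(x)/sd(y) so it has no units."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return b * stdev(x) / stdev(y)


# Hypothetical scores for a predictor (x) and an outcome (y).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
beta = standardized_slope(x, y)
# Expressing the same predictor in different units leaves the standardized weight unchanged.
beta_rescaled = standardized_slope([100 * v for v in x], y)
```

The unstandardized slope would change by a factor of 100 under the rescaling, whereas the standardized weight stays at 0.8, which is what makes such coefficients comparable across differently scaled instruments.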

Extending the findings of Hombrados-Mendieta et al. (2019), García-Cid et al. (2020) studied the impact of perceived discrimination on immigrants’ psychosocial well-being in Spain and examined how a sense of community (SOC) can alleviate its harmful effects. The study drew on a sample of 1714 immigrants from Eastern Europe, Africa, and Latin America and used multiple regression analysis to examine the interrelations among discrimination, psychological distress, life satisfaction, and social exclusion. Perceived discrimination was a strong predictor of both psychological distress (+0.33) and social exclusion (+0.43) and was associated with lower life satisfaction (−0.36). Importantly, immigrants with a higher level of SOC reported fewer negative outcomes, as SOC was strongly correlated with life satisfaction (+0.44) and negatively correlated with psychological distress (−0.27) and social exclusion (−0.28). SOC thus appears to buffer the effects of discrimination. García-Cid et al. (2020) also used a range of instruments, such as the Perceived Discrimination Questionnaire, the Brief Sense of Community Scale, and the General Health Questionnaire (GHQ-12), to build a comprehensive picture of these interrelations (García-Cid et al., 2020).

Building on previous studies, the work of Caro-Carretero et al. (2024) is particularly relevant, as it examines the evolution of Spanish public opinion towards immigration over the period 2015–2017 using machine learning techniques such as a hybrid wrapper algorithm and clustering. An analysis of a dataset of over 7000 participants showed that attitudes were structured around symbolic racism, aversive racism, and compound prejudice. In 2015, symbolic racism accounted for 38% of the existing attitudes, whereas in 2017 competition for scarce resources accounted for 20%. Notably, aversive racism remained a consistent explanatory factor throughout the three-year period. Respondents were grouped into non-multicultural and multicultural profiles, which differed markedly in their tolerance of immigration. Specifically, in 2015, 36.9% were non-multicultural, compared with 29.2% who identified as multicultural. In 2016, 55.4% were multicultural and just 10.3% non-multicultural; in 2017, however, the distribution shifted again to 40.1% non-multicultural and 19.5% multicultural. The research also found that latent racism accounted for 38% of the attitudes in 2015, and subtle racism was linked to as much as 80% in some categories. Moreover, the belief that immigrants were abusing the health system rose from 36% in 2015 to 41% in 2016. Furthermore, in 2015, 10.1% of the population perceived a discrepancy in the number of scholarships offered to immigrants (Caro-Carretero et al., 2024).

To better understand immigrant well-being, Gutiérrez-Rodríguez et al. (2024) examined heterogeneous patterns of social inclusion among Latin American migrant families settled in Spain, focusing on the interplay between individual, family, and societal factors. Using a cluster analysis of data from 263 Venezuelan, Cuban, Colombian, and Argentine families, the research identified three social inclusion profiles: high inclusion (32%), partial inclusion (35%), and low inclusion (33%). Profile membership was associated with household composition, use of social services, and length of residence in the area. Employing instruments such as the Economic Hardship Questionnaire, the Medical Outcomes Study Social Support Survey (MO-SS), and the Neighborhood Cohesion Instrument, the research demonstrated that economic, social, and community conditions interact to shape the level of social inclusion reported by these families. In addition, multinomial logistic regression analyses showed that participants with greater degrees of exclusion were more likely to live in single-parent households, have lower social support, and experience lower neighborhood cohesion. The study underscores the need to counter the risk of social exclusion in migrant settings through multi-component interventions and calls for tailor-made interventions to promote the integration of migrant families into Spanish society. Furthermore, 56.7% of the participants were unemployed and 64.6% had an income of less than 500 euros/month, while 0.4% earned more than 2000 euros. Self-reported financial difficulty averaged 2.9 on a 5-point scale. Social support levels were moderate to high: instrumental support (3.71), emotional support (3.87), and affective support (4.14).
Neighborhood cohesion scores were moderate: attraction (3.48), neighbor relations (2.91), and sense of belonging (3.19), indicating economic hardship alongside good affective support networks (Gutiérrez-Rodríguez et al., 2024).

Building on previous work, Yang et al. (2021) analyzed the impact of personally perceived physical and social neighborhood change on the mental well-being of internal migrants in Shenzhen, China. Drawing on a sample of 591 migrants, the authors used random forests to estimate the nonlinear effects of changes in neighborhood aesthetics, safety, green space, and social cohesion between the pre- and post-migration neighborhoods. The results indicate that a reduction in perceived safety and social cohesion following migration is strongly linked with poorer mental health outcomes, but that an improvement in these domains does not necessarily translate into improved psychological well-being. Furthermore, the authors note that personal factors, including physical health and income, matter more for mental health than changes in the surrounding neighborhood environment. The migrant sample had a mean General Health Questionnaire-12 (GHQ-12) score of 6.610, indicating a moderate level of mental health problems. Regarding neighborhood change, aesthetic appeal (−0.132) and safety (−0.029) became slightly less favorable, while accessibility (0.174) and greenness (0.240) improved. Social cohesion fell modestly (−0.052). The participants’ mean age was 31.374 years, and 35% reported poor or fair physical health. Additionally, 69% lacked a Shenzhen Hukou, and 68% were interprovincial migrants. In terms of income, 23% earned 4000 CNY or less, 43% earned 4001 to 8000 CNY, and 34% earned above 8000 CNY. Finally, 77% of the participants were working and 23% were not (Yang et al., 2021).

In contrast to the central studies discussed above, the following studies offer valuable insights but are somewhat less central to the broader understanding of immigrant integration and attitudes toward immigration. Indelicato (2022) used the International Social Survey Programme (ISSP) 2013 dataset, which comprised responses from 9066 participants across six European nations: Belgium, Germany, Spain, France, the UK, and Portugal. The research compared attitudes toward immigrants across European areas based on regional groupings and the presence of separatist regions. The results showed significant regional variation in receptiveness to immigration, with northern and eastern European nations, as well as the Iberian Peninsula, being more welcoming. The research also identified the groups most receptive to immigration, including the young, the high-income class, non-Catholics, and foreigners (Indelicato et al., 2022).

Sánchez-Holgado et al. (2022) focused on the attitudes toward immigrants and refugees in Spain, analyzing 97,710 geolocated tweets from 2015 to 2020. Using a deep learning tool (recurrent neural networks) to detect racist and xenophobic hate speech, the study found no significant correlation between the proportion of immigrants and either positive attitudes or hate speech. The study concluded that regional factors, rather than national trends, may play a more significant role in shaping attitudes toward immigration, suggesting that regional policies could be more effective in addressing hate speech and fostering positive attitudes toward migrants (Sánchez-Holgado et al., 2022).

Formoso-Suárez et al. (2022) investigated the happiness of Latin American immigrants in Spain, focusing on religious coping and social support. The investigation, which involved immigrants from Venezuela, Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, and other nations, found that greater religiosity and social support were the most robust predictors of happiness, with women showing greater religiosity and more positive religious coping styles than men. The research emphasized the central importance of religion and social support for immigrants’ happiness, noting that positive religious coping styles played a pivotal role in achieving greater happiness and social support (Formoso-Suárez et al., 2022).

Roman Etxebarria et al. (2024) examined the life satisfaction, social networks, and inclusion of 373 migrants from various regions of the world, including Latin America, Eastern Europe, Africa, Asia, and Central Europe. Central Europeans had the highest life satisfaction, while African and Asian migrants had the lowest inclusion, life satisfaction, and social networks. The study revealed strong correlations between social network size, inclusion, and overall life satisfaction. Women reported higher life satisfaction than men and also had larger social networks, with family relationships playing a central role in migrants’ inclusion and well-being (Roman Etxebarria et al., 2024).

Finally, Indelicato (2022) conducted a study utilizing multiple data sources, including the European Social Survey (ESS), the ISSP, and additional information from Eurostat, the Economist Intelligence Unit, and electoral data. Using data envelopment analysis (DEA) and fuzzy set theory (FST), the study evaluated citizens’ openness to immigration across various countries. The results indicated that northern and eastern European countries, along with the Iberian Peninsula, were the most tolerant toward immigration. Key factors associated with openness to immigration included youth, high-income individuals, non-Catholics, and foreign citizens. The study also found that capital regions and tourism-driven islands displayed higher levels of openness, while countries like the US and Russia had stricter identity requirements, with left-wing individuals more likely to view national identity in civic rather than ethnic terms (Indelicato, 2022).

From a methodological perspective, while several studies have applied statistical models or machine learning to analyze perceptions and attitudes, few works explicitly address the challenges posed by small sample sizes and complex, high-dimensional feature spaces. In the present study, we draw upon the established literature in statistical learning theory (Hastie, 2009) and model interpretability (Christoph, 2020) to inform our approach. In particular, we employ algorithms that allow structured variable selection and support model-agnostic interpretation techniques, thereby facilitating a more transparent and socially contextualized analysis of classification outputs. This is especially relevant when working with sensitive constructs such as political orientation, emotional responses, and intergroup attitudes.
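One widely used model-agnostic interpretation technique is permutation feature importance: shuffle one feature column, re-score the fitted model, and record the drop in accuracy. The sketch below implements it with the standard library for any prediction function. The stand-in model, toy data, number of repeats, and seed are hypothetical, and this section does not state that the authors used this particular technique.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled, breaking its link to the labels."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    """Hypothetical stand-in classifier: predicts 1 (negative perception) when feature 0 exceeds 2."""
    return 1 if row[0] > 2 else 0


X = [[5, 1], [4, 2], [5, 2], [1, 5], [2, 4], [1, 4]]
y = [1, 1, 1, 0, 0, 0]
scores = permutation_importance(model, X, y)
```

Because the method only calls the prediction function, the same code applies to a decision tree, a random forest, or an SVM. A feature the model ignores scores exactly zero here; in practice the scores are usually computed on held-out data rather than on the training set.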

2. Methodology for Survey-Based Classification and Feature Analysis

This section describes the full methodological pipeline designed to analyze perceptions of Arab and African migration in Spain and Europe. It gives a procedural description of each stage of the study, from data collection to classification, and outlines how perceptions were quantified with questionnaires and machine learning methods before and after exposure to stereotypical reports.

As shown in Figure 2, the methodology is linear and sequential: data collection, preprocessing, exploratory data analysis (EDA), feature selection, and classification with decision tree, random forest, and support vector machine algorithms. Each stage contributes to identifying variables associated with participants’ attitudes before and after exposure to stereotypical information.

2.1. Dataset

The participants were 130 first-semester university students from the Faculty of Information Sciences at the Complutense University of Madrid (UCM), aged between 18 and 20 years, and with both parents of Spanish nationality. Inclusion criteria were: active enrollment in the UCM, first-semester status, age within the specified range, Spanish parentage, and signed informed consent. Exclusion criteria included students from other faculties or semesters, those participating in international exchange programs, those outside the target age range, individuals with direct migratory backgrounds (first generation), and respondents who failed to complete the survey properly.

Participants were recruited directly through institutional access, owing to the researchers’ academic ties with UCM. No financial or other incentives were offered. The 130 participants were randomly sampled from a well-defined, finite population of 194 eligible students, a response rate of 67% (130/194). This high coverage of the eligible population, together with the demographic and contextual homogeneity of the group, strengthens the methodological rigor and credibility of the data collected (Lakens, 2022; Ziegler & Fiedler, 2024).

Although no a priori power analysis was conducted before data collection, a post hoc sensitivity power analysis was performed with Python’s statsmodels package to assess the statistical power afforded by the fixed sample size. As shown in Figure 3, with N = 130 (65 per group), the design achieves 80% power to detect a medium effect size (Cohen’s $d \approx 0.45$) and 90% power for $d \approx 0.55$. These thresholds are consistent with theoretically meaningful effects in the psychological and social sciences.

This approach follows the guidance of Lakens (2022), who suggests that when access to a population is restricted and coverage is evenly distributed, a sensitivity analysis offers a reasonable basis for justifying the sample size. On this basis, the sample size appears adequate to provide sufficient power and precision for examining medium-sized effects, which supports confidence in the interpretation of the findings presented here.
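The sensitivity calculation can be sketched without statsmodels by inverting the normal approximation to the power of a two-sample t-test. This is an approximation, not the paper’s calculation: it assumes α = 0.05, a two-sided test, and 65 participants per group, and because the α level and sidedness behind Figure 3 are not stated, its thresholds need not match the reported values exactly. In statsmodels, the exact t-based counterpart is `TTestIndPower().solve_power(nobs1=65, ratio=1.0, alpha=0.05, power=0.8)`, which solves for the missing `effect_size`.

```python
from statistics import NormalDist


def minimum_detectable_d(n1, n2, power, alpha=0.05, two_sided=True):
    """Smallest Cohen's d detectable with the requested power, using the
    normal approximation to a two-sample t-test with equal variances."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    # Ignoring the negligible opposite tail, power ~ Phi(d / se - z_alpha)
    # with se = sqrt(1/n1 + 1/n2); solving for d gives:
    return (z_alpha + z_power) * (1 / n1 + 1 / n2) ** 0.5


# Assumed design: 65 participants per group, alpha = 0.05, two-sided.
for target_power in (0.80, 0.90):
    d = minimum_detectable_d(65, 65, target_power)
    print(f"power = {target_power:.0%}: minimum detectable d = {d:.2f}")
```

A one-sided test or a different α shifts the thresholds, which is why both are exposed as parameters.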

The data were gathered through a survey conducted in Spain to quantify attitudes toward Arab and African immigrants in Spain and Europe. The survey used a seven-point Likert scale, a psychometric response format that has proven effective for measuring opinions, attitudes, and emotional nuances. Respondents answered on a spectrum from strong disagreement to strong agreement, which allowed their feelings regarding migration, stereotypes, and social contact to be captured with some nuance. Compared with four- or six-point scales, a seven-point scale offers finer gradations of response, which is particularly desirable in sensitive research on migration and cultural beliefs (Lu et al., 2024).

Participants first completed a pre-exposure survey and were then shown stereotypical facts. Two conditions were formed: one group was presented with positive stereotypical facts (61 participants) and the other with negative stereotypical facts (69 participants). After exposure, participants completed a post-exposure questionnaire on their emotions, interest, and impressions.

The survey had two parts. The first assessed general perceptions of and attitudes toward Arab and African migration, and the second assessed spontaneous responses to a news headline on migration. The first part covered opinions about migration, emotional responses, hypothetical social situations, perceptions of violence, sources influencing immigration attitudes, and political ideology, as shown in Table 2; these variables capture participants’ initial impressions of migration. The second part measured affective responses to, interest in, and perceptions of the news story, allowing exploration of how stereotypical portrayals may be associated with variation in participants’ views. Table 2 summarizes the dataset by variable category, including the measurement and coding applied to each variable.

2.2. Data Preprocessing

Data preprocessing is essential for maintaining the quality, consistency, and integrity of the dataset before any modeling or statistical analysis. In this study, the preprocessing procedure was planned to address the complexity and sensitivity of data on perceptions of Arab and African migration. It consisted of several steps, including categorization, cleaning, and transformation, whose logical sequence is outlined in Figure 4.

The first preprocessing step divided the data into two categories, pre-exposure and post-exposure, corresponding to the two surveys in Table 2. Pre-exposure data came from the first survey, which examined participants’ baseline attitudes toward and perceptions of Arab and African migration before exposure to any stereotypical information, providing a baseline account of their first impressions. Post-exposure data came from the second survey, which recorded participants’ responses after exposure to positive or negative stereotypical narratives. This separation was needed to analyze how stereotypical content relates to participants’ attitudes and emotions and to explore possible attitude changes associated with narratives of different tones.

Data cleaning was then performed to improve the quality and credibility of the dataset. Columns irrelevant to the research questions, such as (P12), Otro (“Other”), and Nacionalidad (“Nationality”), were deleted. Besides being irrelevant, these variables had zero or near-zero variance, meaning that participants’ responses were identical or nearly so across the dataset. Zero-variance variables carry no discriminative information for modeling and can cause numerical problems in later steps (for example, a standardization that divides by a standard deviation of zero), so their removal helped ensure the consistency and accuracy of the analysis. In addition, rows with missing values were removed to ensure data consistency and to avoid possible biases.
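A minimal sketch of the cleaning step in plain Python, assuming each survey response is stored as a dictionary keyed by column name, with `None` marking a missing answer. The toy rows and the variance threshold `min_variance` are hypothetical; in practice pandas (`DataFrame.drop`, `DataFrame.dropna`) would typically be used. Irrelevant columns are dropped first, then incomplete rows; as an illustrative generalization of the zero-variance rationale above, any remaining column with (near-)zero variance is dropped as well.

```python
from statistics import pvariance


def clean(rows, drop_cols, min_variance=1e-8):
    """Drop named columns, rows with missing values, then (near-)constant columns."""
    # 1. Remove columns judged irrelevant to the research questions.
    kept = [{k: v for k, v in row.items() if k not in drop_cols} for row in rows]
    # 2. Listwise deletion: keep only rows with no missing (None) values.
    complete = [row for row in kept if all(v is not None for v in row.values())]
    if not complete:
        return []
    # 3. Remove columns whose variance is (near) zero: they cannot separate classes.
    informative = [c for c in complete[0]
                   if pvariance([row[c] for row in complete]) > min_variance]
    return [{c: row[c] for c in informative} for row in complete]


# Hypothetical toy responses (not study data).
rows = [
    {"P1": 5, "P12": 1, "Otro": None, "Nacionalidad": "ES", "A3": 2, "E1": 4},
    {"P1": 3, "P12": 1, "Otro": None, "Nacionalidad": "ES", "A3": 2, "E1": 6},
    {"P1": 6, "P12": 1, "Otro": None, "Nacionalidad": "ES", "A3": 2, "E1": None},
]
cleaned = clean(rows, drop_cols={"P12", "Otro", "Nacionalidad"})
print(cleaned)  # [{'P1': 5, 'E1': 4}, {'P1': 3, 'E1': 6}]
```

Dropping the named columns before checking for missing values matters here: the always-empty `Otro` column would otherwise remove every row.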

To support interpretability and model simplicity, some variables were recoded. Political orientation was recoded into three categories (Left, Center, and Right) based on participants’ survey responses, grouping political ideologies into clear, interpretable categories.
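The three-way recoding can be sketched as a simple mapping. The original political-orientation item is not described here, so this sketch assumes a hypothetical 1–10 left–right self-placement scale with illustrative cut-offs (1–4 Left, 5–6 Center, 7–10 Right); the survey’s actual coding may differ.

```python
def recode_politics(score):
    """Map a hypothetical 1-10 left-right self-placement to Left/Center/Right.

    The scale and cut-offs are illustrative assumptions, not the survey's coding.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"self-placement out of range: {score}")
    if score <= 4:
        return "Left"
    if score <= 6:
        return "Center"
    return "Right"


print([recode_politics(s) for s in (2, 5, 9)])  # ['Left', 'Center', 'Right']
```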

All variables were nominal or ordinal. The seven-point Likert measures, such as attitudes toward migration, emotional responses, and ratings of hypothetical scenarios, were ordinal because they rank the degree of agreement or frequency. Categorical variables such as Political Orientation were treated as nominal, representing distinct groups without an implied order. To enable numerical analysis, all nominal and ordinal variables were coded numerically for use with the modeling techniques that follow. The encoding preserved the ordering of the Likert scale, enabling efficient processing and analysis.
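A sketch of order-preserving encoding, assuming hypothetical English labels for the seven Likert points and integer labels for the three political groups. The Likert codes 1–7 keep the response order; the political codes are arbitrary identifiers, since that variable is treated as nominal.

```python
# Hypothetical labels for the seven Likert points, listed from lowest to highest.
LIKERT_LEVELS = [
    "Strongly disagree", "Disagree", "Somewhat disagree", "Neutral",
    "Somewhat agree", "Agree", "Strongly agree",
]
# Ordinal coding: 1..7 in scale order, so a higher code means more agreement.
LIKERT_CODE = {label: rank for rank, label in enumerate(LIKERT_LEVELS, start=1)}
# Nominal coding: the integers are labels only and carry no magnitude.
POLITICS_CODE = {"Left": 0, "Center": 1, "Right": 2}


def encode_response(row):
    """Encode one response with a Likert item 'P1' and a 'Political' category."""
    return {
        "P1": LIKERT_CODE[row["P1"]],
        "Political": POLITICS_CODE[row["Political"]],
    }


print(encode_response({"P1": "Agree", "Political": "Center"}))  # {'P1': 6, 'Political': 1}
```

A linear model such as the SVM will still treat 0/1/2 as ordered quantities; one-hot encoding avoids that, though the text does not say which option was used.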

These preprocessing steps (removing missing values, encoding non-numerical attributes, and eliminating irrelevant columns) are crucial for making survey data valid and reliable for machine learning. Because real data tend to be irregular, noisy, or incomplete, structured preprocessing improves model robustness by reducing inconsistencies and supporting feature selection. Previous studies have highlighted the central role of data cleaning and transformation in data mining (Canchen, 2019), which underlines the relevance of these operations for this research.

2.3. Exploratory Data Analysis

After preprocessing, exploratory data analysis (EDA) was conducted to understand the dataset and to guide the choice of target variable for subsequent modeling. This step examined relationships between variables, assessed their distributions, and identified patterns useful for building predictive models. In particular, EDA sought a variable that would appropriately measure perceptions of Arab and African migration, in line with the research objectives.

EDA began with a Spearman correlation matrix. Spearman’s rank correlation coefficient is appropriate for ordinal and non-normally distributed data because it measures the strength and direction of a monotonic relationship without assuming linearity (Ali Abd Al-Hameed, 2022). This made it well suited to the present data, most of which were collected on a seven-point Likert scale. The matrix covered all pairs of variables, both to identify candidate target variables strongly correlated with other predictors and to reveal patterns of positive and negative correlations that may reflect underlying attitudinal or perceptual dimensions relevant to migration.
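Spearman’s coefficient is Pearson’s correlation computed on ranks, with tied values (very common on a seven-point scale) given the average of the ranks they span. A plain-Python sketch follows; in practice, `pandas.DataFrame.corr(method="spearman")` or `scipy.stats.spearmanr` computes the whole matrix. The toy vectors are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


x = [1, 2, 2, 3, 5]  # hypothetical Likert answers
y = [2, 3, 3, 4, 7]
print(spearman_rho(x, y))  # approximately 1.0: perfectly monotonic despite ties
```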

For the pre-exposure cohort, associations between variables were examined with a correlation matrix based on Spearman’s rank correlation coefficient, shown as a heatmap in Figure 5. The heatmap plots correlations among the variables of the initial survey, making patterns and clusters easier to identify. This analysis led to the selection of (P1): Migration Perception as a candidate target variable, because its weak, negative correlations with emotional and attitudinal variables suggested that it could capture baseline attitudes prior to exposure to stereotypical storylines.

To confirm this choice, we checked that responses to (P1) were spread across enough points of the Likert scale to make it suitable for modeling migration attitudes. The choice also rested on the conceptual relevance of (P1) to the research questions, since it directly measures general attitudes toward Arab and African migration.

To explore the relationships between (P1) and other variables, stacked bar plots were generated showing how (P1) varied across categories such as comfort with living alongside immigrants and political orientation. The aim was to observe interaction patterns that might hint at moderating or mediating effects. No final conclusions were drawn at this stage, since the EDA was designed to explore trends rather than to test hypotheses.

Cross-tabulations were also performed to examine associations between categorical variables, for example, political orientation and gender. Given the established link between political orientation and attitudes toward migration, these tables were used to check whether political opinions differed substantially by gender. This analysis helped place (P1) in its broader sociopolitical context and assess whether it was an appropriate target variable for capturing complex social attitudes.
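A cross-tabulation is a count of each combination of two categorical variables. The sketch below builds one with `collections.Counter` on hypothetical responses; `pandas.crosstab` produces the same table from two columns.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Contingency table {row_level: {col_level: count}} for two categorical lists."""
    counts = Counter(zip(row_values, col_values))
    row_levels, col_levels = sorted(set(row_values)), sorted(set(col_values))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Hypothetical responses (not study data).
politics = ["Left", "Center", "Left", "Right", "Center", "Left"]
gender = ["F", "F", "M", "F", "M", "F"]
for level, counts_by_gender in crosstab(politics, gender).items():
    print(level, counts_by_gender)
```

Empty cells are reported as 0 rather than omitted, which keeps every row of the table the same width.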

Taken together, this exploratory analysis supported (P1) as a valid target variable, given its correlations with several predictors and its conceptual relevance to the research agenda. The insights from the EDA phase informed the subsequent feature selection and modeling and gave the choice of (P1) an empirical and conceptual basis.

2.4. Feature Selection and Binarization

The feature selection process targeted (P1), the perception of Arab and African immigration in Spain, measured on a seven-point Likert scale. (P1) was selected as the target variable because it is conceptually central to the study’s goal of understanding migration perceptions and because of the correlation patterns with other variables revealed by the exploratory data analysis (EDA). Notably, (P1) showed weak, negative correlations with emotion and attitude measures, underlining its distinctiveness as a proxy for baseline attitudes prior to exposure to stereotypical reports.

To improve interpretability and facilitate modeling, (P1) was binarized to separate positive from negative attitudes: responses of 5 or more on the Likert scale were labeled positive, and responses below 5 were labeled negative. This binarization matched the aim of examining polarized attitudes toward migration and enabled the use of binary classification algorithms by reducing the seven response levels to two classes.
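The binarization rule is a single threshold. The sketch below applies it to hypothetical answers, using the cut-off stated in the text (5 or more is positive), which matches the 1–4 / 5–7 grouping described in Section 3.1; note that the midpoint 4 falls in the negative class.

```python
from collections import Counter


def binarize_p1(score):
    """Collapse a 1-7 (P1) response into two classes: 5-7 Positive, 1-4 Negative."""
    if not 1 <= score <= 7:
        raise ValueError(f"not a 7-point Likert response: {score}")
    return "Positive" if score >= 5 else "Negative"


responses = [4, 5, 3, 6, 4, 7, 2, 5]  # hypothetical answers
labels = [binarize_p1(s) for s in responses]
print(Counter(labels))  # class balance after binarization
```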

The data were then partitioned into training and test sets, with 80% of the data used for training and 20% for testing. The 80/20 split is a common convention in machine learning, balancing sufficient training data against a test set large enough to evaluate model performance (Muraina, 2022). The held-out test set does not by itself prevent overfitting, but it provides an estimate of how well performance generalizes to unseen data and helps to detect overfitting.
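A minimal sketch of a seeded 80/20 hold-out split, assuming a plain (unstratified) random split and a hypothetical seed; the text does not state whether the split was stratified. With N = 130 the test set holds 26 participants, so test accuracy can only move in steps of 1/26 (about 3.85 percentage points). The accuracies reported in Section 3.3 (80.77%, 84.62%, 88.46%, and 92.31%) are consistent with 21, 22, 23, and 24 correct predictions out of 26.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle row indices and hold out round(n * test_fraction) of them."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(130)
print(len(train_idx), len(test_idx))  # 104 26

# With 26 test cases, accuracy moves in steps of 1/26.
for correct in (21, 22, 23, 24):
    print(f"{correct}/26 = {correct / 26:.2%}")
```

This granularity is one reason the cross-validated estimates in Section 3.4 are more informative than a single split.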

All predictor variables were standardized with StandardScaler (zero mean, unit variance). Standardization puts the variables on a comparable scale, so that variables with larger numerical ranges do not dominate the learning process; this matters chiefly for the SVM, whose margin and coefficients depend on feature scale, whereas tree-based models are insensitive to such monotonic rescaling. Standardization was especially useful here because the predictors combine numerically coded nominal and ordinal variables. Finally, feature selection for the final model was informed by the importance values of the random forest (RF) and decision tree (DT) classifiers, both of which estimate feature importance from impurity reduction measured by the Gini index. This provided a systematic feature selection process intended to improve the reliability and interpretability of the predictive analysis of attitudes toward Arab and African migration.

This approach is aligned with recommended practices in statistical learning and interpretable modeling, particularly in scenarios involving high-dimensional data with limited sample sizes. As outlined by Hastie, Tibshirani, and Friedman (Hastie, 2009), ensemble methods like random forests are especially effective in stabilizing predictions and reducing variance without inflating bias. Moreover, decision trees offer straightforward interpretability, as they allow us to trace how each feature contributes to classification splits. These properties are further supported by Molnar (Christoph, 2020), who emphasizes the advantages of model-specific interpretability in structured tree-based methods, including their intuitive outputs and usefulness for identifying variables that may influence model predictions in sensitive domains.
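The standardization step described in this section can be sketched in plain Python. Like scikit-learn’s `StandardScaler`, the sketch uses the population standard deviation (ddof = 0) and leaves constant features unscaled (scale 1.0). The column-major toy data are hypothetical. The key design point is that the mean and standard deviation are learned on the training split only and then reused for the test split, so no test information leaks into training.

```python
from statistics import fmean, pstdev


def fit_scaler(columns):
    """Per-feature mean and population standard deviation (ddof=0).

    Constant features get a scale of 1.0 so the transform never divides by zero.
    """
    params = []
    for col in columns:
        sd = pstdev(col)
        params.append((fmean(col), sd if sd > 0 else 1.0))
    return params


def transform(columns, params):
    """Apply z = (x - mean) / sd using parameters learned on the training data."""
    return [[(v - mu) / sd for v in col] for col, (mu, sd) in zip(columns, params)]


# Hypothetical feature columns (column-major: one list per feature).
train_cols = [[1, 2, 3, 4, 5], [7, 7, 6, 5, 4]]
test_cols = [[3, 6], [7, 2]]
params = fit_scaler(train_cols)        # fit on the training split only
train_z = transform(train_cols, params)
test_z = transform(test_cols, params)  # reuse training statistics: no leakage
```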

2.5. Classification Models

A decision tree, which is a non-parametric supervised classification method (Becker et al., 2023; Casarin et al., 2021), was developed using the Gini impurity criterion for splitting. Decision trees recursively split the data into subsets based on the value of input features, creating decision nodes and leaf nodes. Each decision node represents a feature test, while each leaf node represents a class label. The Gini impurity for a node j is calculated as:

I_{G} (j) = 1 - \sum_{i = 1}^{C} p_{i}^{2}

where $p_i$ is the proportion of samples belonging to class $i$ at node $j$, and $C$ is the total number of classes. This impurity measure is commonly known as the Gini index (Breiman et al., 2017).
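The Gini impurity formula translates directly into code. The sketch below counts class labels at a node and applies $1 - \sum_i p_i^2$; the labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """I_G(j) = 1 - sum_i p_i^2, where p_i is the share of class i at the node."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for an empty node
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_impurity(["Pos", "Pos", "Pos", "Pos"]))  # 0.0 (pure node)
print(gini_impurity(["Pos", "Neg", "Pos", "Neg"]))  # 0.5 (maximum for two classes)
print(gini_impurity(["Pos", "Pos", "Pos", "Neg"]))  # 0.375
```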

The decision tree was implemented with the following settings: a maximum depth of 10, which limits tree growth to prevent overfitting; a minimum sample size of 4 for added stability; and a minimum sample split of 2, which allows any node with at least two samples to be split in search of the largest impurity reduction. Features that induce greater reductions in impurity are considered more influential in predicting the target variable (Almutiri & Saeed, 2022; Bouke et al., 2023). The Gini criterion was chosen for its effectiveness in measuring the “purity” of nodes, which supports balanced splits and model interpretability.
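The split search that drives a Gini decision tree can be sketched for a single feature: try each candidate threshold, compute the weighted impurity decrease, and keep the best split whose children both satisfy a minimum leaf size. This is an illustrative sketch on hypothetical data, not the study’s code. In scikit-learn, settings like those above would be passed as `DecisionTreeClassifier(criterion="gini", max_depth=10, min_samples_split=2, min_samples_leaf=4)`, under the assumption that the “minimum sample size of 4” refers to the minimum leaf size. scikit-learn places thresholds at midpoints between observed values, which yields the same partitions as the `value <= t` tests used here.

```python
from collections import Counter


def gini_impurity(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_threshold(values, labels, min_samples_leaf=1):
    """Search splits 'value <= t' on one feature for the largest weighted Gini decrease.

    Returns (threshold, decrease); threshold is None if no split satisfies
    min_samples_leaf.
    """
    n = len(labels)
    parent = gini_impurity(labels)
    best_t, best_decrease = None, 0.0
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue  # split would create a leaf that is too small (or empty)
        decrease = (parent - len(left) / n * gini_impurity(left)
                    - len(right) / n * gini_impurity(right))
        if decrease > best_decrease:
            best_t, best_decrease = t, decrease
    return best_t, best_decrease


values = [1, 2, 3, 5, 6, 7]  # hypothetical Likert answers for one feature
labels = ["Neg", "Neg", "Neg", "Pos", "Pos", "Pos"]
print(best_threshold(values, labels))                      # (3, 0.5)
print(best_threshold(values, labels, min_samples_leaf=4))  # (None, 0.0)
```

The second call shows how a minimum leaf size can block a split entirely on a small node, which is the stabilizing effect the setting is meant to provide.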

Interpretability in decision trees is not only derived from their structure but also from how they quantify the influence of each variable. According to Molnar (Christoph, 2020), the ability to follow a feature through a split path provides contrastive and selective explanations, which are more aligned with how humans understand causal-like reasoning in decision-making processes, even without establishing true causality. This makes decision trees particularly suitable for domains like social perception, where understanding the basis of a classification is as relevant as the classification outcome itself.

In addition to the decision tree, a random forest classifier, an ensemble of classification trees (K. B. Best et al., 2021), was employed. A random forest builds multiple decision trees at training time, each from a bootstrap sample of the training data, and aggregates their predictions to improve accuracy and generalization. Combining the predictions of many trees reduces overfitting, making the model more robust and stable. In this research, the random forest was configured with 200 estimators for stability and used the Gini splitting criterion, in line with the decision tree. It was also set to a maximum depth of 10 and a minimum sample size of 2, for consistency across models and to limit overfitting. Random forest feature importance is based on the reduction in Gini impurity across all trees: it measures how much each feature contributes to reducing impurity at the nodes where it is used for splitting. It is computed as:

\mathrm{Imp}(f) = \frac{1}{T} \sum_{m = 1}^{T} \sum_{j \in S_{m}(f)} \frac{n_{j}}{N} [I_{G} (j) - p_{L} I_{G} (L) - p_{R} I_{G} (R)]

where $\mathrm{Imp}(f)$ is the importance of feature $f$, $T$ is the total number of trees, $S_m(f)$ is the set of nodes in tree $m$ that split on $f$, $n_j$ is the number of samples reaching node $j$, $N$ is the total number of samples, $I_G(j)$ is the Gini impurity of the parent node $j$, $I_G(L)$ and $I_G(R)$ are the Gini impurities of its left and right children, and $p_L$ and $p_R$ are the proportions of samples sent to the left and right children. This formula gives the weighted Gini decrease attributable to each feature, averaged across all trees in the random forest model (Xie et al., 2023).
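The mean-decrease-in-impurity formula can be sketched by summing the weighted Gini decrease of every split on each feature and averaging over trees. Here each tree is reduced to a hypothetical list of split records (feature, $n_j$, $I_G(j)$, $p_L$, $I_G(L)$, $p_R$, $I_G(R)$); the feature names are borrowed from the study for readability, but the numbers are illustrative only. A `top_k` helper mirrors the selection of the 10 most important variables. Note that scikit-learn’s `feature_importances_` additionally normalizes the importances to sum to one.

```python
from collections import defaultdict


def mean_decrease_impurity(forest, n_total):
    """Average over trees of the weighted Gini decrease at each feature's splits.

    Each tree is a list of split records:
    (feature, n_j, gini_parent, p_left, gini_left, p_right, gini_right).
    """
    totals = defaultdict(float)
    for tree in forest:
        for feature, n_j, g_parent, p_l, g_l, p_r, g_r in tree:
            totals[feature] += n_j / n_total * (g_parent - p_l * g_l - p_r * g_r)
    return {feature: total / len(forest) for feature, total in totals.items()}


def top_k(importances, k=10):
    """Names of the k features with the largest importance."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [feature for feature, _ in ranked[:k]]


# Two hypothetical trees fitted on N = 100 samples (illustrative numbers only).
forest = [
    [("A6", 100, 0.50, 0.5, 0.50, 0.5, 0.0),   # root split
     ("A3", 50, 0.50, 0.5, 0.0, 0.5, 0.0)],    # split of the impure left child
    [("A3", 100, 0.50, 0.5, 0.20, 0.5, 0.20)],
]
imp = mean_decrease_impurity(forest, n_total=100)
print(imp)            # A6 ~ 0.125, A3 ~ 0.275
print(top_k(imp, 1))  # ['A3']
```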

The random forest model identified the variables that contributed most to the classification. Based on this model, the 10 most important variables were selected and then used as inputs for the support vector machine (SVM) model.

To mitigate potential bias introduced by categorical variables with many levels, we relied on permutation-based feature importance as an additional reference. This method, discussed extensively by Molnar (Christoph, 2020), assesses a feature’s relevance by measuring the increase in model error when its values are randomly permuted, breaking any association with the target. This approach provides a model-agnostic view of variable contribution and complements impurity-based importance measures. It is particularly helpful in high-dimensional social data where multiple features may have overlapping or subtle influences.
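Permutation importance can be sketched in a model-agnostic way: record baseline accuracy, shuffle one feature column, and measure the drop. The `predict` function below is a hypothetical stand-in for any fitted model; scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled.

    predict: prediction function of any fitted model, applied to one row (a list).
    """
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(row):
    """Hypothetical model that relies only on feature 0 and ignores feature 1."""
    return "Pos" if row[0] >= 5 else "Neg"


X = [[1, 3], [2, 1], [3, 2], [5, 1], [6, 3], [7, 2]]
y = ["Neg", "Neg", "Neg", "Pos", "Pos", "Pos"]
importance = permutation_importance(predict, X, y)
print(importance)  # feature 1 scores exactly 0.0: shuffling it never changes a prediction
```

Because importance is measured on predictions rather than on tree internals, it does not favor features with many split points the way impurity-based scores can.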

The support vector machine (SVM) is a machine learning model based on the structural risk minimization principle, which seeks the optimal hyperplane with the maximum margin between classes (Roy & Chakraborty, 2023). In this study, a linear kernel was used to preserve interpretability and to examine the linear separability of the binarized perceptions. The SVM was trained with regularization parameter $C = 1.0$, which balances margin maximization against classification error and keeps the classification stable. SVMs perform well on high-dimensional and sparse data, and their decision boundary is defined by the support vectors, which makes them well suited to this research.
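A minimal sketch of training a linear soft-margin SVM with C = 1.0, using the Pegasos stochastic subgradient method rather than the dual solvers behind library implementations such as `sklearn.svm.SVC(kernel="linear", C=1.0)`; the text does not name the exact implementation used. The toy data are hypothetical and already on a standardized scale, and the bias is folded into the weight vector, so it is regularized slightly differently from a standard SVM.

```python
import random


def train_linear_svm(X, y, C=1.0, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear soft-margin SVM.

    Minimizes (lam / 2) * ||w||^2 + (1 / n) * sum_i max(0, 1 - y_i * <w, x_i>)
    with lam = 1 / (n * C), i.e. the usual C-form objective divided by n * C.
    The bias is handled by appending a constant 1 to every x, so it is
    regularized along with w (a common simplification). Labels must be +1 / -1.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]  # augment each row with a bias feature
    n, d = len(Xa), len(Xa[0])
    lam = 1.0 / (n * C)
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer gradient
            if margin < 1.0:  # hinge loss active: step along its subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w[:-1], w[-1]  # (weights, bias)


def decision_function(w, b, x):
    """f(x) = w^T x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical standardized features; -1 = Negative, +1 = Positive.
X = [[-2.0, 0.5], [-1.5, -0.3], [-1.0, 0.1], [1.0, -0.2], [1.5, 0.4], [2.0, 0.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y, C=1.0)
print(w, b)
```

A smaller C gives a larger λ, stronger shrinkage of w, and therefore a wider margin that tolerates more training errors.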

Figure 6 illustrates the decision boundary produced by a linear-kernel SVM on a synthetic dataset. The decision boundary (solid line) separates two linearly separable classes, and the margins are shown as dashed lines. The support vectors, the critical points that define the margin, are marked with red circles. They determine the position of the hyperplane: removing one of them would generally change the decision boundary. The figure thus shows how the SVM finds the maximum margin between the two classes to improve generalization.

The efficacy of SVM is not restricted to two-dimensional data. In higher dimensions, the SVM model looks for a hyperplane that maximally separates classes with a margin. For linearly separable data, a linear hyperplane will do, as illustrated in Figure 6. However, for non-linearly separable data, the SVM can be generalized using the kernel trick such that the input features are implicitly mapped to a higher-dimensional feature space (Negi et al., 2024). The SVM is then able to identify a separating hyperplane in the new feature space such that the model is capable of handling complex decision boundaries that are non-linearly separable in the original feature space.
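The kernel trick can be checked numerically for a small case: for the degree-2 polynomial kernel in two dimensions, evaluating the kernel directly gives the same value as an inner product after an explicit three-dimensional feature map. This is a standard textbook illustration, not part of the study’s pipeline, which used a linear kernel.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2 in two dimensions."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map with phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))   # computed in the original 2-D space
print(dot(phi(x), phi(z)))  # same value (up to rounding), computed in 3-D feature space
```

The SVM only ever needs inner products, so the kernel lets it work in the mapped space without constructing `phi(x)` explicitly.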

To facilitate a more detailed analysis of the relative contribution of each feature to the classification result, feature importance in SVM was approximated from the absolute values of the coefficients of the linear decision hyperplane. In linear SVM, the decision function is:

f (x) = w^{T} x + b

where w is the weight vector and b is the bias term. This formulation represents a linear decision boundary commonly used for classification tasks (Piccialli & Sciandrone, 2022).
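Ranking features by the absolute value of the linear-SVM coefficients takes only a few lines. The coefficients below are hypothetical placeholders, not the study’s estimates; with scikit-learn’s `SVC(kernel="linear")`, they would come from the fitted model’s `coef_` and `intercept_` attributes. Comparing magnitudes is meaningful only because the features were standardized first; the sign indicates which class a feature pushes toward.

```python
def rank_by_abs_weight(feature_names, w):
    """Order features by |w_j|, the magnitude of their linear-SVM coefficient."""
    return sorted(zip(feature_names, w), key=lambda item: abs(item[1]), reverse=True)


def decision_function(w, b, x):
    """f(x) = w^T x + b; the predicted class is the sign of f(x)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical coefficients for standardized features (not the study's estimates).
names = ["A3", "A6", "C", "A7"]
w, b = [-0.4, 0.7, 0.1, -0.9], 0.05
for name, weight in rank_by_abs_weight(names, w):
    print(f"{name}: {weight:+.2f}")

x = [0.5, 1.0, -0.2, 0.3]  # one hypothetical standardized respondent
print("Positive" if decision_function(w, b, x) > 0 else "Negative")
```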

As emphasized by Molnar (Christoph, 2020), linear models offer a direct path to interpretability through their coefficients, making them a preferred option when transparent reasoning is needed. The interpretability of linear SVM models, in particular, allows for the identification of features most strongly associated with the classification outcome, supporting a socially responsible analysis framework where model behavior can be scrutinized and discussed without relying on opaque decision logic. However, this approach can only be applied to linear kernels because, in the case of nonlinear kernels (like RBF or polynomial), the decision boundary lies in a mapped feature space, and thus it becomes difficult to explain the importance of the original features directly (Valkenborg et al., 2023).

Combining the decision tree, random forest, and SVM models enabled a thorough classification analysis that draws on the interpretability of decision trees, the accuracy and robustness of random forests, and the generalization ability of SVMs. This combined use of models follows the logic of ensemble learning, in which multiple models are brought together to enhance performance: ensemble methods train several base learners and aggregate their predictions, typically achieving better accuracy and generalization than individual models (Mienye & Sun, 2022).

This combined approach was intended to reduce bias and variance and to make the classification of perceptions more consistent. The decision tree and random forest models selected the most influential variables for classification, and the SVM model then validated the same variables, offering a clearer picture of the variables most strongly associated with perceptions of Arab and African immigration in Spain within the studied sample. This procedure provided a systematic approach to feature selection intended to support accurate and reliable predictive analysis.

3. Results

To carry out a thorough analysis, each step of the methodology was applied in turn, including data collection, feature extraction, and model evaluation. We compared the outcomes of the first and second phases of the research (before and after exposure to immigration news) and analyzed the performance of the trained models. The assessment considered the models used and the features selected, along with attitudinal and emotional changes across groups, and examined how exposure to different migration news coverage was associated with variation in participants’ feelings and attitudes. The emotional and perceptual features most strongly associated with the classifications were identified with decision trees, random forests, and support vector machines (SVM). These models were tuned to recognize patterns in participants’ responses, and their results are reported as precision, AUC-ROC, and feature importance coefficients, showing how particular emotions, economic considerations, and perceptions contribute to classifying the responses.

3.1. Descriptive Statistical Analysis and Graphics

Before applying classification and preprocessing, we examined the full distribution of responses to item P1, measured on a seven-point Likert scale. As shown in Figure 7a, responses concentrated around the midpoint, with most participants selecting values 4 or 5, and fewer choosing the extremes. This pattern suggests generally neutral to slightly positive attitudes in the sample. For classification purposes, the responses were binarized by grouping values from 1 to 4 as Negative and values from 5 to 7 as Positive, thereby retaining all responses, including the midpoint. The resulting distribution is displayed in Figure 7b, showing a nearly balanced class distribution with 66 positive and 64 negative responses.

Figure 8 shows the frequency of each political orientation within each gender. Although there are numerical differences, particularly in the Left and Center categories for the female group (the largest group), these are descriptive rather than conclusive. Figure 9a shows responses to item (P1) by political orientation. Response frequencies vary, but positive and negative responses are almost evenly balanced across political orientations, indicating a relatively even distribution of opinions with a minor tendency toward left-leaning responses in this sample. In Figure 9b, regarding interest in the news, participants with positive responses appear to show slightly less interest in migration-related news; however, given the sample size, this does not appear substantially different from the interest shown by those with negative perceptions of migration. Importantly, we cannot infer that these differences directly affect the emotions or perceptions analyzed unless gender and political orientation emerge as significant features in the predictive models.

3.2. Feature Analysis in Random Forest and Decision Tree Models Based on Gini Index, Before and After Exposure

Before exposure to migration-related news, the decision tree (DT) and random forest (RF) models identified several features with high importance scores based on the Gini index, which measures node homogeneity and thus how effectively a feature separates observations. In the RF model, variables related to crime and the economy (A3 and A6) showed high importance, as did emotional responses such as admiration (E2) and fear (E1), as displayed in Figure 10. Similarly, in the DT model, A6 and A3 were highly relevant, along with CA (friends’ environment) and A7 (perceived social problems).

After exposure to the news, the features highlighted by the Gini index in both models shifted toward emotional, interpretative, and situational aspects, as shown in Figure 11. In the RF model, intense emotions (Em3) and interest in the news ((P2) NEWS INTEREST) were among the most relevant features, together with contextual variables such as D and S.1, which showed higher Gini importance values. In the DT model, Em3, (P2) NEWS INTEREST, and C (public environment) also showed high Gini importance values, indicating their relative importance within the model rankings.

From a sociopolitical perspective, the shift in feature importance before and after exposure suggests a change in the types of considerations participants prioritized when evaluating migration-related content. In the first phase, variables linked to economic consequences (A6: potential economic benefit; A3: association with crime) and perceived social disruption (A7) were among the most influential. These elements are frequently present in public discourse on migration, particularly in narratives focused on material impact. After exposure, the most prominent features included emotional responses such as Em3 (distrust) and Em2 (admiration), along with indicators of information engagement like P2 (interest in the news). This reordering of importance may reflect increased sensitivity to affective and contextual dimensions. The inclusion of variables related to public settings (C, D, S1) also points to greater relevance of the social environment in how participants organized their perceptions. These findings are consistent with prior research indicating that media portrayals emphasizing uncertainty and threat can elevate emotional and interpretative responses to migration, shaping how individuals engage with such topics (Esses et al., 2013).

3.3. Model Performance and Metrics Obtained

Table 3 compares model performance in the two surveys conducted before and after participants were exposed to migration-related news with different thematic focuses. In the first survey, the features selected by the decision tree (DT) model achieved a Gini index of 0.7478. When these features were used to train a support vector machine (SVM), the model reached a precision of 1.00 for negative responses and an overall accuracy of 88.46%.

The random forest (RF) model showed slightly lower accuracy (84.62%) but remained competitive, with a Gini index of 0.2073 for its most important features. These key features included economic perceptions, residential integration (CA), and context of exposure (C), suggesting that these dimensions contributed to participants’ classification in both phases.

After exposure to the news, the results of the second survey show marked changes in performance. For the DT model, the Gini index of the best-performing features decreased to 0.4147, and although its accuracy decreased slightly to 80.77%, it retained high precision and F1-score values. In contrast, the RF model improved markedly, reaching an accuracy of 92.31% and a Gini index of 0.5433 for the best-performing features. In this case, the selected features included specific emotional variables (Em3, Em2) and the degree of interest aroused by the news (P2, news interest), suggesting that emotions and type of exposure are linked to how participants perceived the migration phenomenon.
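Because the results are expressed in terms of the Gini index, it may help to recall how CART-style trees (Breiman et al., 2017) compute it: a node with class proportions p_k has Gini impurity 1 − Σ p_k², and a split is scored by the sample-weighted drop in impurity from parent to children; accumulating those drops per feature (and normalizing) yields impurity-based importances. The sketch below implements both quantities directly; the helper names are hypothetical and the toy labels are invented. If the outcome is binary, node impurity cannot exceed 0.5, so reported values above that (e.g., 0.7478) are presumably aggregates over the selected features rather than single-node impurities.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k**2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Drop in Gini impurity when `parent` is split into `left` and `right`,
    with each child weighted by its share of the parent's samples."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


# Toy example with invented labels: 6 positive and 4 negative responses.
parent = ["pos"] * 6 + ["neg"] * 4
left = ["pos"] * 5 + ["neg"] * 1
right = ["pos"] * 1 + ["neg"] * 3
print(round(gini_impurity(parent), 4))                   # 0.48
print(round(impurity_decrease(parent, left, right), 4))  # 0.1633
```

Here the parent impurity is 1 − (0.6² + 0.4²) = 0.48, the children have impurities 10/36 and 6/16, and the weighted drop is about 0.163.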

For the initial survey, coefficients derived from the DT model indicated that features A6 and C had the strongest positive associations with the predicted class (0.6663 and 0.6657), while A3 and A7 were negatively weighted (−0.3332 and −0.3333). These variables corresponded to economic perceptions and public exposure context. In the RF model, A6 and A8 also showed high importance (0.3704 and 0.3977), suggesting a similar underlying pattern. The combination of indicators varied slightly between phases but consistently influenced the classification outcomes.

To ensure a more robust evaluation of model behavior, particularly given the moderately imbalanced class distribution, we included balanced accuracy, macro-averaged F1, and Cohen’s κ alongside overall accuracy. These metrics provide greater insight into the fairness and stability of classifiers, especially when performance differences are subtle.
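The complementary metrics named above can be computed with scikit-learn. The toy example below is invented to show why they matter when classes are imbalanced: a classifier that always predicts the majority class on a 26-response test set with 18 positive and 8 negative labels reaches about 69% accuracy, yet its balanced accuracy is 0.5, its macro F1 is about 0.41, and Cohen’s κ is 0. The helper name `summary_metrics` and the 0/1 label coding are assumptions.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    cohen_kappa_score,
    f1_score,
)


def summary_metrics(y_true, y_pred):
    """Hypothetical helper returning accuracy plus imbalance-aware metrics."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        # zero_division=0 scores a never-predicted class as F1 = 0 instead of warning
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
    }


# Invented labels: 8 negative (0) and 18 positive (1) responses,
# scored against a degenerate model that always predicts the majority class.
y_true = [0] * 8 + [1] * 18
y_pred = [1] * 26
for name, value in summary_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```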

3.4. Cross-Validation and Metric Stability

To assess the consistency of model performance beyond a single train–test split, we re-evaluated all classifiers using stratified 5-fold cross-validation. This approach provides a better understanding of variability across resampling iterations and avoids overreliance on one data partition.

Table 4 summarizes the mean and standard deviation of five metrics across folds: accuracy, balanced accuracy, macro F1, Cohen’s κ, and ROC AUC.

Overall, random forest and SVM classifiers outperformed decision trees in both datasets. In the pre-exposure phase, SVM with decision-tree-selected features achieved the highest AUC (0.8295 ± 0.0970), whereas in the post-exposure phase, SVM trained on random forest features performed best (AUC = 0.7655 ± 0.0861). Across conditions, most models attained balanced accuracy and macro F1 above 0.60 with acceptable variability, suggesting reliable generalization across folds.
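A stratified 5-fold evaluation reporting the mean and standard deviation of the five metrics in Table 4 can be sketched with `StratifiedKFold` and `cross_validate`. This is an illustrative setup rather than the authors' pipeline: the model hyperparameters, the feature-scaling step for the SVM, the shuffling seed, and the synthetic data are assumptions. Cohen’s κ is wrapped with `make_scorer` here, and the built-in `roc_auc` scorer uses the SVM's decision function, so no probability calibration is needed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SCORING = {
    "accuracy": "accuracy",
    "balanced_accuracy": "balanced_accuracy",
    "macro_f1": "f1_macro",
    "cohen_kappa": make_scorer(cohen_kappa_score),
    "roc_auc": "roc_auc",
}


def cv_summary(model, X, y, n_splits=5, seed=0):
    """Mean and standard deviation of each metric over stratified folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
    return {
        name: (float(np.mean(scores[f"test_{name}"])), float(np.std(scores[f"test_{name}"])))
        for name in SCORING
    }


# Hypothetical model settings; scaling sits inside the pipeline so it is
# refitted on each training fold.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}
# Synthetic stand-in data, imbalanced roughly 65/35 (not the survey).
X, y = make_classification(
    n_samples=130, n_features=10, n_informative=4, weights=[0.65], random_state=0
)
results = {name: cv_summary(model, X, y) for name, model in models.items()}
for name, metrics in results.items():
    print(name, {k: f"{m:.3f} ± {s:.3f}" for k, (m, s) in metrics.items()})
```

When an SVM is trained on features selected by a decision tree or random forest, placing the selection step inside the same pipeline (for example with scikit-learn's `SelectFromModel`) keeps it within each training fold, so the reported fold scores are not inflated by selection on held-out data.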

4. Discussion

To address the research questions, this study examines how different types of media coverage and social contexts may be associated with emotional attitudes toward immigrants in Spain (RQ1), identifies the most prevalent emotional responses to Arab and African immigration based on media coverage (RQ2), and explores the role of individual characteristics (such as political orientation and prior media exposure) in shaping attitudes after reading news about immigration (RQ3). The results suggest that the narrative and emotional tone of news coverage are associated with variations in public opinion regarding immigration, consistent with previous literature. These findings may also point to the potential for more constructive narratives in public discourse. Each research question is discussed in turn below, and the comparison with prior studies is summarized in Table 5.

4.1. RQ1: Impact of Media Coverage and Social Environments on Emotional Responses

The findings of this study suggest that exposure to certain types of information content and social contexts may be associated with variations in affective responses and attitudes toward migrants of Arab and African origin in Spain. Decision tree classification models applied to pre-exposure data (84% accuracy) identified “Circle of friends” and “On the street” as features with high predictive power and elevated Gini indices. This may reflect the role that perceived social environments play in shaping initial attitudes toward immigration.

In response to news articles largely composed of statistics and data on immigration, the predictors identified by the classification models shifted. For instance, in decision tree models with 80% accuracy, the question “What did you think about the news?” emerged as informative, with responses such as “Boring” and “Confusing” appearing among the most relevant features. In contrast, a more accurate random forest model (92%) ranked “Biased” and “Superficial” as the top predictive responses. These results suggest that perceived qualities of information—such as clarity, neutrality, or framing—may be associated with distinct patterns of cognitive and emotional responses to migration narratives.

Furthermore, in the SVM binary classification model (positive vs. negative perceptions), the feature “Superficial” (S.1) had a positive coefficient (+0.3316), indicating an association between perceiving information as superficial and the likelihood of expressing negative attitudes toward migration. Conversely, the attribute “Biased” (S) showed a negative coefficient (−0.2082), suggesting that when participants perceived information as biased, they were more likely to be classified as having positive attitudes. This pattern may reflect a tendency among some individuals to discount or disbelieve negative information when it is perceived as biased, though such interpretations should be treated cautiously given the correlational nature of the study.
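In a linear SVM, the sign of each feature's coefficient indicates which side of the decision boundary larger values of that feature push a response toward; in scikit-learn's binary `SVC`, a positive coefficient favors `classes_[1]`. Reading the signs reported above therefore depends on which attitude was coded as the second class. The sketch below assumes, hypothetically, that negative attitudes are coded 1, and builds synthetic data in which a column labelled S.1 raises and a column labelled S lowers the chance of that class; the helper name and data are invented, and coefficients are only comparable across features measured on similar scales.

```python
import numpy as np
from sklearn.svm import SVC


def signed_coefficients(X, y, feature_names):
    """Hypothetical helper: fit a linear SVM and return (name, coefficient)
    pairs ordered by absolute weight. Positive values favor classes_[1]."""
    svm = SVC(kernel="linear")
    svm.fit(X, y)
    coefs = svm.coef_.ravel()  # shape (n_features,) for a binary problem
    order = np.argsort(-np.abs(coefs))
    return svm.classes_, [(feature_names[i], float(coefs[i])) for i in order]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Assumption for illustration: 1 = negative attitude; "S.1" pushes toward it,
# "S" pushes away from it, and the third column is pure noise.
y = (X[:, 0] - X[:, 1] > 0).astype(int)
classes, ranked = signed_coefficients(X, y, ["S.1", "S", "noise"])
print(classes)
for name, coef in ranked:
    print(f"{name}: {coef:+.3f}")
```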

The results underscore how the framing and interpretation of informational content—particularly when based on statistical evidence and empirical data—may be associated with variations in public opinion toward migration. These findings suggest that participants’ perceptions of transparency and credibility in such content are linked to their evaluative attitudes, highlighting the importance of how information is presented when discussing complex social topics such as immigration.

Building on previous research, such as Hombrados-Mendieta et al. (2019) and García-Cid et al. (2020), the present findings are consistent with the idea that both social environments and mediated information are linked to variations in attitudes toward migration. While earlier work emphasized the protective role of social networks and sense of community against discrimination, the current results further suggest that perceptions of objective, data-driven content may also be associated with shifts in evaluative responses. These observations are in line with Caro-Carretero et al. (2024), who described migration attitudes as dynamic and context-dependent, shaped by an interplay between informational and structural factors.

4.2. RQ2: Common Emotional Reactions Related to Arab and African Immigration

Affective reactions prior to exposure to immigration news may be associated with classification outcomes, as suggested by the predictive models. In particular, the random forest (RF) model trained on the first survey identified emotional features such as “mistrust” and “fear” (E3, E1) as among the most influential predictors. These features showed increased Gini importance and corresponding coefficient values, indicating their relevance within the model’s decision-making process.

After exposure, the random forest (RF) model identified “happiness” and “surprise” (Em2, Em3) as particularly salient variables, with a notable increase in relative importance as reflected by the Gini index and associated coefficients. With a classification accuracy of 92%, the model suggests that these emotions may be linked to variation in responses to migration-related news. These results underscore the relevance of emotional involvement in classifying public opinion toward immigration in the studied sample.

This is consistent with the findings of Hombrados-Mendieta et al. (2019), who reported that emotive elements added to narratives tend to increase public awareness regarding immigration issues. Similarly, García-Cid et al. (2020) suggest that such content can shift perceptions toward more emotionally charged or polarized responses. A comparable pattern was observed in our model, where emotive features emerged among those with higher predictive importance, suggesting a potential link between emotional framing and attitudinal classification.

4.3. RQ3: Influence of Individual Characteristics on Attitude Shifts

The models suggest that individual characteristics, such as political orientation and prior media exposure, may be associated with variations in emotional and perceptual responses following exposure. For example, the SVM model achieved 100% precision for negative responses in the pre-exposure phase and an overall accuracy of 88.46%, with variables such as positive economic impact (A6 and A8) appearing more prominently among respondents identifying as centrist or left-leaning. These findings are consistent with Caro-Carretero et al. (2024), who noted the effectiveness of SVM in classifying responses with lower emotional content. Regarding media exposure, both the random forest (RF) and decision tree (DT) models indicated that variables such as “interest in the news” (P2) and exposure context (C) emerged as salient features in post-exposure predictive modeling.

An analysis of feature coefficients provides insight into how media exposure may be associated with shifts in the relative importance of classification variables. Before exposure, economic and security-related factors (A6 and A3) displayed significant positive coefficients, suggesting that positive perceptions of immigration were more likely to co-occur with favorable evaluations of economic and security aspects. In contrast, after exposure, the random forest model assigned a negative coefficient to the emotion labeled as “fear” (Em3) and a positive coefficient to the variable labeled as “interest in news” (P2), indicating a potential shift toward affective and interpretative factors. These patterns may reflect how media coverage marked by emotional content is linked to more affectively charged interpretations. This observation is consistent with prior findings by Gutiérrez-Rodríguez et al. (2024) and Yang et al. (2021), who reported that exposure to media narratives about threats, risks, or social tensions is often associated with more emotive public responses.

4.4. General Discussion and Contributions to the Field

The present work offers new insights into the relationship between media reporting, social environment, and emotional responses toward immigration, by taking into account subtle affective processes as well as contextual influences. In contrast to much prior research, this study incorporates the use of machine learning techniques—decision trees, random forests, and support vector machines—to explore patterns of perceptual and emotional responses, thereby contributing to a more nuanced understanding of how these patterns may manifest within a sample of Spanish university students when evaluating Arab and African immigration. The results are consistent with and expand upon the previous literature by illustrating the context-sensitive and dynamic nature of emotional reactions to migration narratives.

The results for RQ1, which examines the impact of media coverage and social environments, align with but also expand upon the existing literature. In this study, decision tree and random forest models identified the predictive importance of social network influences, such as “Circle of friends” and “On the street,” prior to exposure to news content. These findings are supported by Hombrados-Mendieta et al. (2019), who highlighted the buffering role of social support on mental health and life satisfaction (correlation between sense of community (SOC) and satisfaction with life (SWL): 0.648). The present study adds to this line of work by suggesting that media narratives may modulate the perceived influence of social predictors. Following exposure, attitudes related to news clarity and credibility (such as perceptions of being “Biased” or “Superficial”) emerged as significant predictors, pointing to the possible role of informational framing in shaping attitudinal responses. While these findings do not imply causation, they could inform strategies aimed at promoting more inclusive and evidence-based narratives in public discourse.

In the examination of RQ2, the findings support the presence of strong emotional responses to immigration narratives and offer novel insights into the dynamic nature of such responses. Explanatory models in the pre-exposure phase identified “fear” and “mistrust” as predominant emotional predictors, which aligns with the findings of Caro-Carretero et al. (2024), who reported elevated levels of latent and symbolic racism in Spain. In contrast, positive affective reactions (such as “happiness” and “surprise”) were more prominent in stories designed to elicit constructive emotional responses. These patterns resemble those described by Gutiérrez-Rodríguez et al. (2024), who observed greater levels of inclusion when emotional support was present (emotional support: 3.87). The current study is consistent with this perspective, suggesting that emotional transitions may be fluid and that positive emotional responses may be linked to more inclusive attitudes toward migration.

With regard to RQ3, the analysis of individual characteristics suggested shifts in the determinants of attitude following exposure, with political orientation and prior media use emerging as salient features. Before exposure, concerns related to security and the economy (i.e., A6 and A3) were associated with more favorable perceptions. In contrast, the models developed after exposure emphasized affective features such as “fear” and interest in the news content (P2), which may reflect a shift toward more emotionally driven interpretations, possibly influenced by narrative framing. This pattern is consistent with the findings of Yang et al. (2021), who emphasized the role of social and environmental factors in shaping mental well-being and perception. Notably, an unexpected association emerged in which participants who perceived information as biased tended to express more positive attitudes toward migration. This finding may suggest a complex interplay between media trust and attitudinal responses, and highlights the potential relevance of perceived credibility in attitudinal variation.

The results reported herein are consistent with and complement prior research on public opinion toward immigrants and the role of emotional responses. Specifically, the associations observed between narrative framing, emotive involvement, and perceptions of Arab and African immigrants align with the findings of Indelicato et al. (2022), who identified economic and cultural dimensions as relevant predictors of anti-immigrant attitudes across Europe. The present work adds to this discussion by suggesting that emotional responses such as “happiness” and “fear” may be linked to variations in these attitudes—an aspect that has received less attention in prior research. In contrast to the lack of correlation between immigration and hate speech reported by Sánchez-Holgado et al. (2022), the patterns observed here may reflect processes of emotive polarization, potentially shaped by the agenda-setting dynamics of mainstream media. In this regard, the emotional dimensions identified in our data are aligned with the findings of Formoso-Suárez et al. (2022) on happiness and coping strategies, as well as with those of Roman Etxebarria et al. (2024) on the role of public discourse in shaping integration narratives. By incorporating emotional dynamics into the analysis, this research contributes to the literature that emphasizes not only structural and economic factors (Indelicato, 2022), but also affective components in understanding public attitudes toward migration.

4.5. Limitations

While the present study provides relevant insights into the emotional and cognitive processes underlying attitudes toward immigration, several limitations should be acknowledged. First, the relatively small sample size (N = 130) limits the generalizability of the findings. Compared to larger-scale studies such as Indelicato et al. (2022) (9066 surveys) or Sánchez-Holgado et al. (2022) (97,710 tweets), the current sample may not capture the full variability of political orientations or social contexts in the broader Spanish population. Furthermore, the voluntary nature of participation introduces a potential self-selection bias, as individuals with particular views may have been more likely to participate. Additionally, the reliance on self-report instruments may be subject to social desirability effects or inaccuracies in emotional self-assessment.

It is also important to note that the findings are specific to a Spanish university context, and caution should be exercised when extrapolating to other populations or cultural settings. Although machine learning models demonstrated high predictive performance, these outcomes reflect associations within the present dataset and should not be interpreted as evidence of causal relationships. Future research with larger, more diverse, and representative samples will be essential to enhance external validity and further explore the role of emotional framing in public attitudes toward immigration.

Additionally, unlike large-scale analyses such as Sánchez-Holgado et al. (2022) and Indelicato et al. (2022), the relatively small sample size of the present study may constrain its ability to detect more subtle differences between subgroups, such as political or cultural orientations. This limitation underscores the need for caution when attempting to generalize findings to broader populations. To enhance representativeness and robustness, future studies may benefit from employing stratified sampling methods, which could help better capture subgroup-level variation in emotional and perceptual responses across different contexts.

5. Conclusions

The results suggest that emotionally charged material may be associated with more polarized reactions, which is consistent with previous research on the influence of media framing on public opinion. Additionally, the study found that individual characteristics, such as political orientation and prior media exposure, were associated with differences in responses to immigration narratives. These findings suggest a possible divide between pragmatically and emotionally motivated opinions, shaped in part by prior exposure and personal context.

Interestingly, the support vector machine and random forest models both achieved high classification performance, before and after media exposure, when using features associated with sentiment-related responses. This may reflect a shift from reasoning centered on pragmatic concerns (such as security and economic impact) to a greater emphasis on affective interpretation following exposure. Taken together, these findings suggest that media content may play a role in shaping public opinion, highlighting the dual potential of media to either foster understanding or contribute to polarization on immigration-related issues.

While the findings offer relevant insights, several limitations should be acknowledged. The sample size, although reasonable for exploratory analysis, may not capture the full diversity of the Spanish population, limiting the generalizability of results to other demographic groups. Moreover, the study focuses specifically on Arab and African migration to Spain, and therefore the findings may not extend to other geographic regions or migration patterns. Although the support vector machine and random forest models performed well within the scope of this dataset, their performance is dependent on the quality and diversity of input data, and further refinement is needed to enhance their predictive utility. Additionally, the study relies on self-reported survey data, which may be subject to response biases such as social desirability.

Future studies could build on these observations by incorporating surveys with more demographically and culturally diverse populations, as well as by exploring geographically distinct contexts. Longitudinal designs may help track changes in attitudes over time, offering a better understanding of how emotional and perceptual responses evolve. Additionally, case studies of initiatives in media environments that aim to promote equitable coverage and reduce polarization could provide valuable contextual insights. Further research should also assess the role of alternative media sources, including social media platforms, in framing public perceptions of immigration and shaping emotional responses.

Author Contributions

Conceptualization, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; methodology, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; software, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; validation, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; formal analysis, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; investigation, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; resources, N.O.G., H.C.M.; data curation, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; writing—original draft, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; writing—review and editing, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; visualization, A.T.-E., A.M.-V., K.C.-B., D.A.-G., N.O.G., J.A.M.G., and H.C.M.; supervision, A.T.-E., N.O.G., and H.C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Universidad de Las Américas under Research Project 485.XIV.24.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it involved anonymous and voluntary survey responses by adult participants, and no personal, medical, or sensitive information was collected. The study followed the Declaration of Helsinki (2013 revision), the Belmont Report, and the European General Data Protection Regulation (EU GDPR 2016/679). As a minimal-risk study, it was exempt from formal IRB review according to Article 8 of the Code of Ethics for Research of Yachay Tech University (RCG-SO-09-No.-042-2020). To protect the confidentiality of the data, all responses were anonymous, no identifiers were retained, and raw data were stored securely in encrypted form, accessible only to the research team.

Informed Consent Statement

All participants in the study provided informed consent. Participants were informed of the aim of the research, what participation involved (it was voluntary and anonymous), and how their data would be used and protected, and they consented to the use of their anonymized responses for academic purposes.

Data Availability Statement

The cleaned survey datasets, analysis code, and figure scripts are openly available on Zenodo at https://doi.org/10.5281/zenodo.15807903.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ali Abd Al-Hameed, K. (2022). Spearman’s correlation coefficient in statistical analysis. International Journal of Nonlinear Analysis and Applications, 13(1), 3249–3255.
Almutiri, T., & Saeed, F. (2022). A hybrid feature selection method combining Gini index and support vector machine with recursive feature elimination for gene expression classification. International Journal of Data Mining, Modelling and Management, 14(1), 41–62.
Amsalem, E., & Zoizner, A. (2022). Real, but limited: A meta-analytic assessment of framing effects in the political domain. British Journal of Political Science, 52(1), 221–237.
Anuar, A., Mohd Hussain, N. H., & Byrd, H. (2023). Tree-based machine learning in classifying reverse migration. Mathematical Sciences and Informatics Journal (MIJ), 4(1), 49–56.
Ardèvol-Abreu, A. (2015). Framing o teoría del encuadre en comunicación. Orígenes, desarrollo y panorama actual en España. Revista Latina de Comunicación Social, 70, 423–450.
Arjona, Á., & Checa, J. C. (2011). Españoles ante la inmigración: El papel de los medios de comunicación = Spaniards’ perspective of immigration: The role of the Media. Comunicar: Revista Científica Iberoamericana de Comunicación y Educación = Scientific Journal of Media Education, 37(2), 1–17.
Aydar, Z. (2022). The life opportunities of young refugees: Understanding the role, function and perceptions of local Stakeholders. Social Sciences, 11(11), 527.
Becker, T., Rousseau, A.-J., Geubbelmans, M., Burzykowski, T., & Valkenborg, D. (2023). Decision trees and random forests. American Journal of Orthodontics and Dentofacial Orthopedics, 164(6), 894–897.
Bekteshi, V., & Bellamy, J. L. (2024). Adapting for Well-Being: Examining acculturation strategies and Mental Health among Latina immigrants. Social Sciences, 13(3), 138.
Best, K., Gilligan, J., Baroud, H., Carrico, A., Donato, K., & Mallick, B. (2022). Applying machine learning to social datasets: A study of migration in southwestern Bangladesh using random forests. Regional Environmental Change, 22(2), 52.
Best, K. B., Gilligan, J. M., Baroud, H., Carrico, A. R., Donato, K. M., Ackerly, B. A., & Mallick, B. (2021). Random forest analysis of two household surveys can identify important predictors of migration in Bangladesh. Journal of Computational Social Science, 4, 77–100.
Boateng, F. D., Pryce, D. K., & Chenane, J. L. (2021). I may be an immigrant, but I am not a criminal: Examining the association between the presence of immigrants and crime rates in Europe. Journal of International Migration and Integration, 22, 1105–1124.
Bouke, M. A., Abdullah, A., ALshatebi, S. H., Abdullah, M. T., & El Atigh, H. (2023). An intelligent DDoS attack detection tree-based model using Gini index feature selection method. Microprocessors and Microsystems, 98, 104823.
Breiman, L., Friedman, J., Olshen, R. A., & Stone, C. J. (2017). Classification and regression trees. Routledge.
Canchen, L. (2019). Preprocessing methods and pipelines of data mining: An overview. arXiv.
Caro-Carretero, R., Fernández, M., & Valbuena, C. (2024). Advancing the knowledge of Spaniards’ attitudes towards immigration. SAGE Open, 14(3), 21582440241271912.
Casarin, R., Facchinetti, A., Sorice, D., & Tonellato, S. (2021). Decision trees and random forests. In The essentials of machine learning in finance and accounting (pp. 7–36). Routledge.
Ceylan, M., & Hayran, C. (2021). Message framing effects on individuals’ social distancing and helping behavior during the COVID-19 pandemic. Frontiers in Psychology, 12, 579164.
Chasciar, V., Chasciar, D. R., Coman, C., Toderici, O. F., Toader, L., & Bularca, M. C. (2024). Post-detention migration in Romania: Reasons, challenges and solutions for preventing recidivism and ensuring reintegration into society. Societies, 14(11), 213.
Chong, D., & Druckman, J. N. (2007). Framing theory. Annual Review of Political Science, 10(1), 103–126.
Christoph, M. (2020). Interpretable machine learning: A guide for making black box models explainable. Independently Published.
De-Lima-Santos, M., & Ceron, W. (2022). Artificial intelligence in news media: Current perceptions and future outlook. Journalism and Media, 3(1), 13–26.
Ekman, M. (2019). Anti-immigrant sentiments and mobilization on the Internet. In SAGE handbook of media and migration (pp. 551–562). SAGE.
Enríquez, C. G. (2019). Inmigración en España: Una nueva fase de llegadas. Análisis del Real Instituto Elcano (ARI), 28, 1.
Entman, R. M. (1993). Framing: Toward clarification of a fractured paradigm. Journal of Communication, 43(4), 51–58.
Esses, V. M., Medianu, S., & Lawson, A. S. (2013). Uncertainty, threat, and the role of the media in promoting the dehumanization of immigrants and refugees. Journal of Social Issues, 69(3), 518–536.
European Commission. (2024). Migration and migrant population statistics. Available online: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Migration_and_migrant_population_statistics (accessed on 1 October 2024).
Formoso-Suárez, A. M., Saiz, J., Chopra, D., & Mills, P. J. (2022). The impact of religion and social support on self-reported happiness in Latin American immigrants in Spain. Religions, 13(2), 122.
Galvañ, A. N., & Giménez, C. O. (2020). Discurso del odio en radio: Análisis de los editoriales de las cadenas COPE y SER tras la llegada del Aquarius a España. Miguel Hernández Communication Journal, 11(1), 117–138.
García-Cid, A., Gómez-Jacinto, L., Hombrados-Mendieta, I., Millán-Franco, M., & Moscato, G. (2020). Discrimination and psychosocial well-being of migrants in Spain: The moderating role of sense of community. Frontiers in Psychology, 11, 2235.
Gutiérrez-Rodríguez, N., Álvarez Lorenzo, M., & Rodrigo López, M. J. (2024). Variability of social inclusion patterns involving personal, family and social characteristics in Latino Migrant families in Spain. Child & Family Social Work, 30(3), 410–422.
Hastie, T. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Hombrados-Mendieta, I., Millán-Franco, M., Gómez-Jacinto, L., Gonzalez-Castro, F., Martos-Méndez, M. J., & García-Cid, A. (2019). Positive influences of social support on sense of community, life satisfaction and the health of immigrants in Spain. Frontiers in Psychology, 10, 2555.
Indelicato, A. (2022). Quantitative methods to measure attitudes toward immigrants and national identity [Unpublished doctoral dissertation]. Universidad de Las Palmas de Gran Canaria, Las Palmas, Spain.
Indelicato, A., & Martín, J. C. (2024). The effects of three facets of national identity and other socioeconomic traits on attitudes towards immigrants. Journal of International Migration and Integration, 25(2), 645–672.
Indelicato, A., Martín, J. C., & Scuderi, R. (2022). Comparing regional attitudes toward immigrants in six European countries. Axioms, 11(7), 345.
Instituto Nacional de Estadística. (2020a). Cifras de Población (CP) a 1 de enero de 2020. Estadística de Migraciones (EM). Año 2019. Datos provisionales (Tech. Rep. No. 981). Available online: https://www.ine.es/prensa/cp_e2020_p.pdf (accessed on 2 September 2024).
Instituto Nacional de Estadística. (2020b). Cifras de Población (CP) a 1 de julio de 2019. Estadística de Migraciones (EM). Primer semestre de 2019. Datos provisionales (Vol. 80; Tech. Rep.). Available online: https://www.ine.es/prensa/cp_j2019_p.pdf (accessed on 2 September 2024).
Kalfeli, P. N., Angeli, C., & Frangonikolopoulos, C. (2024). Victims of a human tragedy or “Objects” of migrant smuggling? Media framing of Greece’s deadliest migrant shipwreck in Pylos’ dark waters. Journalism and Media, 5(2), 537–551.
Khai, T. S. (2025). Unsafe at home and vulnerable abroad: The struggle of forgotten myanmar asylum seekers and migrants in thailand post-coup D’état. Social Sciences, 14(4), 245. [Google Scholar] [CrossRef]
King, R. (2019). Diverse, fragile and fragmented: The new map of European migration. Central and Eastern European Migration Review, 8(1), 9–32. [Google Scholar]
Kline, R. B. (2023). Principles and practice of structural equation modeling. Guilford Publications. [Google Scholar]
Kollias, A., Kountouri, F., & Kalamanti, S. (2025). Framing migration through the Crisis Era 2015–2022: A content and semantic network analysis of the Greek press. Journalism and Media, 6(1), 4. [Google Scholar] [CrossRef]
Komendantova, N., Erokhin, D., & Albano, T. (2023). Misinformation and its impact on contested policy issues: The example of migration discourses. Societies, 13(7), 168. [Google Scholar] [CrossRef]
Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1), 33267. [Google Scholar] [CrossRef]
Lu, Q., Lun, D., Dawkins-Moultin, L., Li, Y., Chen, M., Giordano, S. H., Pennebaker, J. W., Young, L., & Wang, C. (2024). Study protocol for writing to heal: A culturally based brief expressive writing intervention for Chinese immigrant breast cancer survivors. PLoS ONE, 19(9), e0309138. [Google Scholar] [CrossRef] [PubMed]
Magazzini, T. (2021). Antidiscrimination meets integration policies: Exploring new diversity-related challenges in Europe. Social Sciences, 10(6), 221. [Google Scholar] [CrossRef]
Manthei, G. (2020). The effects of refugee immigration on income inequality in germany: A case study. Technical Report. Diskussionsbeiträge.
Martínez-Martínez, L., Cambra, U. C., & Espín, C. A. T. (2018). Las noticias de inmigración en redes sociales y sus efectos sobre los jóvenes. Análisis descriptivo de las investigaciones en revistas científicas desde 2012 a 2017. In Comunicación, paz y conflictos (pp. 173–182). Dykinson, S.L. [Google Scholar] [CrossRef]
McCann, K., Sienkiewicz, M., & Zard, M. (2023). The role of media narratives in shaping public opinion toward refugees: A comparative analysis. International Organization for Migration, Geneva.
Mendelsohn, J., Budak, C., & Jurgens, D. (2021). Modeling framing in immigration discourse on social media. arXiv.
Michalovich, A. (2021). Digital media production of refugee-background youth: A scoping review. Journalism and Media, 2(1), 30–50.
Mienye, I. D., & Sun, Y. (2022). A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access, 10, 99129–99149.
Muraina, I. (2022, December 10–12). Ideal dataset splitting ratios in machine learning algorithms: General concerns for data scientists and data analysts. 7th International Mardin Artuklu Scientific Research Conference (pp. 496–504), Online.
Negi, H. S., Dimri, S. C., Kumar, B., & Ram, M. (2024). Support vector machine and classification, kernel trick for separating of data points. Mathematics in Engineering, Science & Aerospace (MESA), 15(2), 401–409.
Ozaydin, Y. S. (2018, October 9–11). The right of vote to Syrian migrants: The rise and fragmentation of anti-migrant sentiments in Turkey. Asian Conference on Media, Communication & Film, Tokyo, Japan.
Pérez Fructuoso, M. J., García Revilla, R., Martinez Moure, O., & Cea Moure, R. (2025). Analysis of aging in Spain: Contemporary sociological and demographic implications. Societies, 15(2), 46.
Piccialli, V., & Sciandrone, M. (2022). Nonlinear optimization and support vector machines. Annals of Operations Research, 314(1), 15–47.
Robutti, R. (2024). Comparison of the living conditions of the immigrant population in major European countries. Societies, 14(9), 179.
Roman Etxebarria, G., Berasategi Sancho, N., Idoiaga-Mondragon, N., & Legorburu Fernandez, I. (2024). Migrant perceptions of their social inclusion, social networks, and satisfaction with life in northern Spain. Societies, 14(1), 3.
Roy, A., & Chakraborty, S. (2023). Support vector machine in structural reliability analysis: A review. Reliability Engineering & System Safety, 233, 109126.
Sánchez-Holgado, P., Amores, J. J., & Blanco-Herrero, D. (2022). Online hate speech and immigration acceptance: A study of Spanish provinces. Social Sciences, 11(11), 515.
Sbaa, M., Donati, S., & Zappalà, S. (2025). Not all migrants are the same: Decent work and pre-and post-migration experiences of economic migrants. Social Sciences, 14(3), 1–23.
Tirado-Espín, A., Cuesta, U., Martínez-Martínez, L., & Almeida-Galárraga, D. (2020). Agenda-setting and immigration: Critical analysis of discourse and frequency in the media. Descriptive analysis of research in scientific journals from 2015 to 2020. RISTI-Revista Iberica de Sistemas e Tecnologias de Informacao, 2020(E35), 289–301.
Tirado-Espín, A., Cuesta, U., Martínez-Martínez, L., & Almeida-Galárraga, D. (2021, September 1–3). Framing and immigration: New frames in media and social networks. International Conference on Communication and Applied Technologies (pp. 140–152), Bogotá, Colombia.
Tirado-Espín, A., Cuesta, U., Martínez-Martínez, L., Ramos-Gil, Y., & Almeida-Galárraga, D. (2022). News frames in the media and social networks: Prejudices and stereotypes towards immigrants in Spain. In Marketing and smart technologies: Proceedings of ICMarkTech 2021 (Volume 2, pp. 363–373). Springer.
Valkenborg, D., Rousseau, A.-J., Geubbelmans, M., & Burzykowski, T. (2023). Support vector machines. American Journal of Orthodontics and Dentofacial Orthopedics, 164(5), 754–757.
Verleyen, E., & Beckers, K. (2023). European refugee crisis or European migration crisis? How words matter in the news framing (2015–2020) of asylum seekers, refugees, and migrants. Journalism and Media, 4(3), 727–742.
Xie, X., Yuan, M.-J., Bai, X., Gao, W., & Zhou, Z.-H. (2023). On the Gini-impurity preservation for privacy random forests. Advances in Neural Information Processing Systems, 36, 45055–45082.
Yang, M., Hagenauer, J., Dijst, M., & Helbich, M. (2021). Assessing the perceived changes in neighborhood physical and social environments and how they are associated with Chinese internal migrants’ mental health. BMC Public Health, 21, 1240.
Yemane, R., & Fernández-Reino, M. (2021). Latinos in the United States and in Spain: The impact of ethnic group stereotypes on labour market outcomes. Journal of Ethnic and Migration Studies, 47(6), 1240–1260.
Ziegler, J., & Fiedler, K. (2024). Small sample size and group homogeneity: A crucial ingredient to inter-group bias. Personality and Social Psychology Bulletin, 1461672231223335.

Figure 1. Graphical abstract: methodological framework for analyzing the impact of media coverage on emotional responses toward immigrants.

Figure 2. Study methodology flowchart.

Figure 3. Sensitivity power analysis showing statistical power as a function of effect size (Cohen’s d) for N = 130.
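Figure 3 plots statistical power against Cohen’s d for N = 130. The sketch below reproduces the shape of such a curve under assumptions the caption does not confirm: a two-sided test at α = 0.05 on 130 paired (pre/post) observations, with power computed from the normal approximation rather than the exact noncentral t distribution. The function names are illustrative only.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def approx_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test for a standardized
    mean difference d. Assumption: paired/one-sample design, so the
    noncentrality (expected z statistic) is d * sqrt(n)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5
    # probability of landing in either rejection tail
    return _Z.cdf(shift - z_crit) + _Z.cdf(-shift - z_crit)


def min_detectable_d(n: int, target: float = 0.80, alpha: float = 0.05) -> float:
    """Smallest d that reaches the target power, found by bisection
    (power increases monotonically with d for d >= 0)."""
    lo, hi = 0.0, 5.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if approx_power(mid, n, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for d in (0.1, 0.2, 0.3, 0.5):
        print(f"d = {d:.1f}: power ~ {approx_power(d, 130):.3f}")
    print(f"d for 80% power at N = 130: ~{min_detectable_d(130):.3f}")
```

Under these assumptions the smallest effect detectable with 80% power is roughly d ≈ 0.25; a different design (for example, two independent groups) would give a larger value.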

Figure 4. Overview of data preprocessing stages.

Figure 5. Correlation matrix (Spearman) for pre-exposure group.
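Spearman’s rho suits ordinal Likert items because it only uses the order of responses: it is the Pearson correlation of the ranks, with tied responses receiving the average of the positions they occupy. A minimal pure-Python sketch follows; in practice a library routine such as pandas `DataFrame.corr(method="spearman")` would typically be used, and the variable names in the example are hypothetical inputs, not study data. A constant column would make the denominator zero, so such items must be dropped first.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions
    (Likert items contain many ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(columns):
    """columns: dict mapping variable name -> list of responses."""
    names = list(columns)
    return {(a, b): spearman(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    demo = {"A3": [1, 2, 2, 3], "A6": [1, 3, 2, 4]}  # hypothetical responses
    print(round(spearman_matrix(demo)[("A3", "A6")], 4))
```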

Figure 6. Decision boundary in SVM (linear kernel) with support vectors. The plot illustrates the decision boundary (solid line) and margins (dashed lines) separating two linearly separable classes. Support vectors, which define the margin, are circled in red. Blue points represent one class and red points the other. A linear kernel is used for simplicity, and the regularization parameter $C = 1.0$ balances margin maximization against classification error minimization.
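The elements in Figure 6 follow from the linear decision function f(x) = w·x + b: the boundary is f(x) = 0, the dashed margins are f(x) = ±1 (so they lie 2/‖w‖ apart), and the support vectors are the training points with y·f(x) = 1. The sketch below uses a hypothetical six-point toy set, not the study data, together with a (w, b) derived by hand for it. The dual multipliers reproduce w = Σ α_i y_i x_i, satisfy Σ α_i y_i = 0, and stay below C = 1.0, so for this toy set the soft-margin solution with C = 1.0 coincides with the hard-margin one.

```python
from math import sqrt

# Hypothetical toy data (not from the study): (x1, x2) and label in {-1, +1}
POINTS = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
          ((4, 4), +1), ((5, 4), +1), ((4, 5), +1)]

# Max-margin solution for this toy set, derived by hand
W = (0.4, 0.4)
B = -2.2
# Dual multipliers of the three margin points; all satisfy 0 <= alpha <= C = 1.0
ALPHAS = {(2, 1): 0.08, (1, 2): 0.08, (4, 4): 0.16}


def decision(x, w=W, b=B):
    """f(x) = w . x + b: boundary at f = 0, margins at f = +1 and f = -1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w=W):
    """Distance between the two dashed margin lines, 2 / ||w||."""
    return 2 / sqrt(sum(wi * wi for wi in w))


def support_vectors(points, w=W, b=B, tol=1e-9):
    """Points lying on a margin, i.e. y * f(x) == 1 (within tolerance)."""
    return [x for x, y in points if abs(y * decision(x, w, b) - 1) < tol]


def dual_weights(points, alphas):
    """Rebuild w = sum_i alpha_i * y_i * x_i from the dual multipliers."""
    dim = len(points[0][0])
    return tuple(sum(alphas.get(x, 0.0) * y * x[k] for x, y in points)
                 for k in range(dim))


if __name__ == "__main__":
    print("margin width:", round(margin_width(), 4))
    print("support vectors:", support_vectors(POINTS))
    print("w from duals:", dual_weights(POINTS, ALPHAS))
```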


Figure 7. Comparison between original (left) and binarized (right) distributions of responses to P1. The binary transformation groups values 1–4 as negative and 5–7 as positive.
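The binarization in Figure 7 can be written as a single threshold. The sketch below assumes, as the caption states, that the scale midpoint 4 is grouped with the negative class; that choice shifts borderline respondents into "negative" and therefore affects class balance. The function names are illustrative.

```python
from collections import Counter

NEGATIVE_MAX = 4  # per Figure 7: responses 1-4 -> negative, 5-7 -> positive


def binarize_p1(score: int) -> str:
    """Map a 1-7 Likert response to the binary class used for classification."""
    if not 1 <= score <= 7:
        raise ValueError(f"expected a 1-7 Likert score, got {score}")
    return "negative" if score <= NEGATIVE_MAX else "positive"


def class_distribution(scores):
    """Count how many responses fall into each binary class."""
    return Counter(binarize_p1(s) for s in scores)


if __name__ == "__main__":
    print(class_distribution([1, 4, 5, 7, 7]))  # hypothetical responses
```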

Figure 8. Heatmap depicting gender versus political orientation, with color intensity representing response frequency.
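A heatmap like Figure 8 is a frequency table of two categorical variables. Table 2 records gender as Sexo (1: Male, 2: Female) and political orientation as three one-hot columns (I.1, C.1, D), so the orientation must first be collapsed into a single label. The sketch below assumes rows are dictionaries keyed by those column names; how the study handled respondents who selected none or several options is not stated, so the "none" and "multiple" labels are hypothetical.

```python
from collections import Counter

ORIENTATION_COLUMNS = {"I.1": "left", "C.1": "center", "D": "right"}
SEX_LABELS = {1: "male", 2: "female"}


def orientation(row):
    """Collapse the one-hot P11 columns into a single label."""
    chosen = [label for col, label in ORIENTATION_COLUMNS.items() if row.get(col) == 1]
    if not chosen:
        return "none"  # hypothetical handling of unanswered P11
    return chosen[0] if len(chosen) == 1 else "multiple"


def gender_by_orientation(rows):
    """Frequency table behind the heatmap: (gender, orientation) -> count."""
    return Counter((SEX_LABELS.get(row["Sexo"], "unknown"), orientation(row)) for row in rows)


if __name__ == "__main__":
    demo = [  # hypothetical respondents
        {"Sexo": 1, "I.1": 1, "C.1": 0, "D": 0},
        {"Sexo": 2, "I.1": 0, "C.1": 1, "D": 0},
    ]
    print(gender_by_orientation(demo))
```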

Figure 9. Stacked bar charts of responses to P1. (a) Responses to P1 by political orientation. (b) Responses to P1 by level of news interest.

Figure 10. Feature importance comparison between decision tree and random forest models based on the first survey. The three most relevant features, according to Gini index scores, were “Immigration can benefit the economy” (A6), “Immigration may increase crime” (A3), and “Immigrants cause problems” (A7).
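The Gini index scores in Figures 10 and 11 correspond to mean decrease in impurity (MDI): each split reduces the Gini impurity 1 − Σ_k p_k², the reduction is weighted by the fraction of training samples reaching the node, the reductions are summed per feature (and averaged over trees in a random forest), and the totals are normalized to sum to 1. This is the definition scikit-learn exposes as `feature_importances_` on its tree and forest classifiers; whether the authors used that library is not stated here. The two-split tree in the example is hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a node's class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """(N_t / N) * (G_t - N_L/N_t * G_L - N_R/N_t * G_R) for one split."""
    n_t = len(parent)
    child = (len(left) / n_t) * gini(left) + (len(right) / n_t) * gini(right)
    return (n_t / n_total) * (gini(parent) - child)


def mdi_importance(splits, n_total):
    """splits: (feature, parent, left, right) for every internal node.
    Returns per-feature importances normalized to sum to 1."""
    raw = defaultdict(float)
    for feature, parent, left, right in splits:
        raw[feature] += impurity_decrease(parent, left, right, n_total)
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()} if total > 0 else dict(raw)


if __name__ == "__main__":
    # Hypothetical tree: root split on A6, then a split on A3 in the left child
    root = ["neg"] * 4 + ["pos"] * 4
    left, right = ["neg"] * 4 + ["pos"], ["pos"] * 3
    splits = [("A6", root, left, right), ("A3", left, ["neg"] * 4, ["pos"])]
    print(mdi_importance(splits, n_total=len(root)))
```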

Figure 11. Feature importance comparison between decision tree and random forest models from the second survey. The top three features, based on Gini index scores, were “Distrust” (Em3), “Overall interest in the news” (INTERÉS NOTICIA), and “Admiration” (Em2).

Table 1. State of the art on immigrant well-being and social support in Spain and China.

Article	Nationalities of Immigrants	Sample Size	Age Range	Quantitative Methodology	Scales Used	Quantitative Results
Hombrados-Mendieta et al. (2019)	Ukraine, Romania, Bulgaria, Russia, Maghreb (Africa), Paraguay, Argentina, Colombia, Venezuela	1131 immigrants (49% men, 51% women)	18 to 70 years	Structural equation modeling (SEM)	GHQ-12, SWLS, social support questionnaire, SCI-2, illness questionnaire (INE)	Native friends’ support and SOC: 0.131; SOC and SWL: 0.648. SWL and mental health symptoms: −0.012; SWL and diseases: −0.171.
García-Cid et al. (2020)	Eastern Europe (31.6%), Africa (33.2%), Latin America (35.2%)	1714 immigrants (48.7% M, 51.3% F)	16 to 74 years	Multiple regression analysis (PROCESS tool in SPSS 20)	Perceived discrimination questionnaire, Brief Sense of Community Scale, GHQ-12, SWLS, social exclusion feelings scale	Perceived discrimination: SOC (−0.29), life satisfaction (−0.36); psychological distress (+0.33), social exclusion (+0.43). SOC: satisfaction (+0.44); distress (−0.27), exclusion (−0.28). Discrimination: distress (+0.275), satisfaction (−0.244), exclusion (+0.375).
Caro-Carretero et al. (2024)	Various origins (no specific nationalities mentioned)	2470 in 2015, 2460 in 2016, 2455 in 2017 (Spanish participants, gender not specified)	Over 18 years	Hybrid wrapper algorithm and clustering techniques	Attitude questionnaire towards immigration, Likert scales (0–10) covering symbolic racism, aversive racism, and subtle prejudice	Symbolic racism: 38% (2015); resource competition: 20% (2017). Aversive racism persisted. Multicultural attitudes: 2015: 36.9% non, 29.2% multicultural; 2016: 10.3% non, 55.4% multicultural; 2017: 40.1% non, 19.5% multicultural. Latent racism: 38%, subtle racism: 80%. Health system abuse perception: 36% (2015), 41% (2016). Unequal scholarship access: 10.1% (2015).
Gutiérrez-Rodríguez et al. (2024)	Latin Americans (Venezuela, Cuba, Colombia, Argentina)	263 migrant families (79.8% F)	Mean age 40.5 years	Cluster analysis, multinomial logistic regression	Economic Hardship Questionnaire, Medical Outcomes Study Social Support Survey (MO-SS), Neighborhood Cohesion Instrument	Inclusion profiles: High (32%), Partial (35%), Low (33%); predictors: family, social services, residence length. Unemployment: 56.7%. Income: 64.6% <500 €, 0.4% >2000 €. Difficulty: 2.9/5. Support: instrumental 3.71, emotional 3.87, affectionate 4.14. Cohesion: attraction 3.48, relationships 2.91, belonging 3.19.
Yang et al. (2021)	Chinese internal migrants from various provinces	591 migrants (56% M, 44% F)	17 to 68 years	Random forest model for nonlinear associations between neighborhood changes and mental health	GHQ-12, Neighborhood Environment Walkability Scale (NEWS-A)	GHQ-12: 6.610. Aesthetics: −0.132; safety: −0.029; accessibility: 0.174; green space: 0.240; cohesion: −0.052. Age: 31.374 years. Fair/poor health: 35%. Non-Shenzhen Hukou: 69%; inter-provincial migrants: 68%. Income: 23% ≤4000 CNY, 43% 4001–8000 CNY, 34% >8000 CNY. Employment: 77% employed, 23% unemployed.
Indelicato et al. (2022)	Belgium, Germany, Spain, France, the UK, and Portugal	9066	24 years old or younger (7.70%) up to 75 years old or older (10.38%)	TOPSIS	5-point Likert scale	Iberian Peninsula most open (20% immigrants in Balearic Islands); UK and Belgium most anti-immigrant (<10% in Corsica); far-right regions oppose due to economy, crime, and culture; Muslims most pro-immigrant, Catholics more negative.
Sánchez-Holgado et al. (2022)	No specific nationalities mentioned	97,710 geolocated tweets	-	Twitter API and deep learning for hate speech detection; INE and CIS data on migration attitudes and foreign population	Hate speech scale (0–1); proportion of foreign citizens (0–1); migration attitude scale (CIS 2017, recoded 0–1 from Likert 1–4)	Analyzed 97,710 geolocated tweets (2015–2020); no significant correlation between foreign population, hate speech, or immigration attitudes.
Formoso-Suárez et al. (2022)	Latin American immigrants, mainly Venezuelans, plus others from 13 countries.	206	18 to 74 years	Correlational design with convenience sampling; data collected via Google Forms and analyzed in SPSS V.25	EAS for social support, R-COPE for religious coping, DUREL for religiosity, and an acculturation questionnaire	Happiness correlated positively with religiosity, social support, and positive coping; negatively with negative coping. Regression showed these factors, plus gender, predicted happiness, with men explaining 1% more variance.
Roman Etxebarria et al. (2024)	Latin America (48.3%), Eastern Europe (24.1%), Africa (20.9%), Asia (3.8%), and Central Europe (2.9%).	373	18 to 65 years	Quantitative study using ANOVAs, correlations, and SPSS 24 to analyze inclusion, life satisfaction, and social networks by demographics	Used Woosnam’s Inclusion Scale, SWLS (5 items) for life satisfaction, and LSNS for social networks. Reliability: $α$ = 0.693–0.914	Women had higher life satisfaction and social network scores. Younger migrants (18–35) scored higher in friendship networks. Central Europeans had the highest scores, while African and Asian migrants had the lowest. Broad social networks correlated with life satisfaction (r = 0.334) and inclusion (r = 0.564).
Indelicato (2022)	Data from the ESS, ISSP, Eurostat, Economist Intelligence Unit, and electoral records; immigrant nationalities were unspecified, with comparisons based on country-specific factors.	-	-	DEA applied to indicators such as national identity and migration, with fuzzy set theory (FST) used to model Likert-scale responses	Likert scales, converted to fuzzy numbers	Northern and eastern Europe and the Iberian Peninsula were the most accepting. Young, wealthy, non-Catholic, and foreign residents were more tolerant, and capital regions and holiday islands had higher ATI scores. Stricter conceptions of national identity were found in the US and Russia, and left-leaning respondents favored civic over ethnic national identity.

Table 2. Guide to variables used in surveys, grouped by question.

Question	Variable	Description	Scale/Observation
Survey 1: Evaluation of Arab and African Migration
–	ID	Unique identifier for each participant.	Categorical
–	Edad	Age group (1: 18–22, 2: 23–27, 3: 28–31, 4: 32+).	Ordinal
–	Sexo	Gender (1: Male, 2: Female).	Binary
–	Nacionalidad	Nationality (1: Spanish, 2: Spanish and other, 3: Other).	Categorical
P1. General evaluation	(P1)	Overall perception of Arab and African migration.	Likert 1–7
P2. Respond to the following statements:	A1	Immigrants take jobs Spaniards don’t want.	Likert 1–7
	A2	Immigrants are still needed in Spain.	Likert 1–7
	A3	Immigration may increase crime.	Likert 1–7
	A4	Society cannot function without immigrants.	Likert 1–7
	A5	Immigration is linked to insecurity.	Likert 1–7
	A6	Immigration can benefit the economy.	Likert 1–7
	A7	Immigrants cause problems.	Likert 1–7
	A8	Immigrants contribute to national development.	Likert 1–7
P3. What emotions do immigrants provoke in you?	E1	Interest.	Likert 1–7
	E2	Joy.	Likert 1–7
	E3	Surprise.	Likert 1–7
	E4	Sadness.	Likert 1–7
	E5	Anger.	Likert 1–7
	E6	Disgust.	Likert 1–7
	E7	Contempt.	Likert 1–7
P4. Imagine the following scenarios:	(P4.1)	Living near many Arab or African immigrants.	Likert 1–7
	(P4.2)	Working/studying with Arab or African immigrants.	Likert 1–7
P5. Attacks by immigrants	(P5)	Concern about immigrant attacks on Spaniards.	Likert 1–7
P6. Attacks by Spaniards	(P6)	Concern about Spaniard attacks on immigrants.	Likert 1–7
P7. Sources of opinion	CA	Friends.	Binary (1 = selected)
	CF	Family.	Binary (1 = selected)
	TV	Television.	Binary (1 = selected)
	PR	Radio.	Binary (1 = selected)
	R	Press or magazines.	Binary (1 = selected)
	I	Internet.	Binary (1 = selected)
	CT	School or workplace.	Binary (1 = selected)
	Otro	Other source.	Binary (1 = selected)
	NR	No response.	Binary (1 = selected)
	NC	Don’t know.	Binary (1 = selected)
P8. Media attention	TV.1	TV attention to migration.	Likert 1–7
	PD	Digital press attention.	Likert 1–7
	RS	Social media attention.	Likert 1–7
P9. Media portrayal	TV.2	TV image of immigrants.	Likert 1–7
	PD.1	Digital press image.	Likert 1–7
	RS.1	Social media image.	Likert 1–7
P10. Time spent on media	(P10)	Time spent on media.	Likert 1–7
P11. Political orientation	I.1	Left-wing.	Binary (1 = selected)
	C.1	Center.	Binary (1 = selected)
	D	Right-wing.	Binary (1 = selected)
P12. Voting	(P12)	Political party voted for in the last election.	Open-ended
Survey 2: Reaction to Migration News
–	ID	Unique identifier for each participant.	Categorical
P1. What emotions did the news provoke in you?	Em1	Fear.	Likert 1–7
	Em2	Admiration.	Likert 1–7
	Em3	Distrust.	Likert 1–7
	Em4	Insecurity.	Likert 1–7
	Em5	Sympathy.	Likert 1–7
	Em6	Discomfort.	Likert 1–7
	Em7	Indifference.	Likert 1–7
	Em8	Shame.	Likert 1–7
	Em9	Contempt.	Likert 1–7
	Em10	Guilt.	Likert 1–7
P2. Rate your interest in the news	INTERÉS NOTICIA	Overall interest.	Likert 1–7
P3. How did you perceive the news?	C	Confusing.	Likert 1–7
	LD	Difficult to read.	Likert 1–7
	S	Superficial.	Likert 1–7
	S.1	Biased.	Likert 1–7
	MS	Too simple.	Likert 1–7
	D	Decontextualized.	Likert 1–7
	I	Imprecise.	Likert 1–7
	A	Boring.	Likert 1–7

Table 3. SVM results for first and second survey using top features from decision tree and random forest. Bold values indicate the highest performance metrics observed among all evaluated models.

Model	Features	Class	Acc.	Prec.	Recall	F1	Bal. Acc.	Macro F1	Cohen’s $κ$
First Survey
SVM (DT)	A6, A3, A7, CA, C	Negative	0.8846	1.00	0.81	0.90	0.9062	0.8831	0.7692
SVM (DT)	A6, A3, A7, CA, C	Positive	0.8846	0.77	1.00	0.87	0.9062	0.8831	0.7692
SVM (RF)	A6, A3, A7, E2, P5, E1, A5, E3, E4, A4, A8, E7, P6, E5	Negative	0.8462	1.00	0.75	0.86	0.8750	0.8452	0.6977
SVM (RF)	A6, A3, A7, E2, P5, E1, A5, E3, E4, A4, A8, E7, P6, E5	Positive	0.8462	0.71	1.00	0.83	0.8750	0.8452	0.6977
Second Survey
SVM (DT)	Em3, P2, A, C	Negative	0.8077	0.86	0.80	0.83	0.8091	0.8051	0.6108
SVM (DT)	Em3, P2, A, C	Positive	0.8077	0.75	0.82	0.78	0.8091	0.8051	0.6108
SVM (RF)	Em3, Em2, D, C, LD, S.1, S	Negative	0.9231	0.93	0.93	0.93	0.9212	0.9212	0.8424
SVM (RF)	Em3, Em2, D, C, LD, S.1, S	Positive	0.9231	0.91	0.91	0.91	0.9212	0.9212	0.8424
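The aggregate metrics in Table 3 all derive from a 2 × 2 confusion matrix. The counts below are reconstructed from the reported per-class precision and recall under the assumption of a 26-response test set (20% of 130, consistent with 0.8846 = 23/26); the split is not restated in the table itself. Balanced accuracy averages the two recalls, macro F1 averages the two per-class F1 scores, and Cohen’s κ compares observed agreement with the agreement expected from the marginal totals.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Binary metrics; rows = true class, columns = predicted class,
    with 'positive' as the second class."""
    n = tn + fp + fn + tp
    acc = (tn + tp) / n
    rec_pos, rec_neg = tp / (tp + fn), tn / (tn + fp)
    prec_pos, prec_neg = tp / (tp + fp), tn / (tn + fn)
    f1_pos = 2 * prec_pos * rec_pos / (prec_pos + rec_pos)
    f1_neg = 2 * prec_neg * rec_neg / (prec_neg + rec_neg)
    # chance agreement: (true neg * predicted neg + true pos * predicted pos) / n^2
    p_e = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2
    return {
        "accuracy": acc,
        "balanced_accuracy": (rec_pos + rec_neg) / 2,
        "macro_f1": (f1_pos + f1_neg) / 2,
        "kappa": (acc - p_e) / (1 - p_e),
        "precision": {"negative": prec_neg, "positive": prec_pos},
        "recall": {"negative": rec_neg, "positive": rec_pos},
    }


# Counts reconstructed from Table 3 (assumes 26 test responses per survey)
FIRST_SVM_DT = dict(tn=13, fp=3, fn=0, tp=10)
SECOND_SVM_RF = dict(tn=14, fp=1, fn=1, tp=10)

if __name__ == "__main__":
    for name, cm in (("First survey, SVM (DT)", FIRST_SVM_DT),
                     ("Second survey, SVM (RF)", SECOND_SVM_RF)):
        m = metrics_from_confusion(**cm)
        print(name, {k: round(v, 4) for k, v in m.items() if isinstance(v, float)})
```

With these counts the computed accuracy, balanced accuracy, macro F1, and κ match the tabulated values to four decimals.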

Table 4. Mean performance and standard deviation of classifiers across pre- and post-exposure datasets using stratified 5-fold cross-validation. Bold values indicate the best performance metrics within each exposure condition.

Dataset	Model	Accuracy	Balanced Acc.	Macro F1	Kappa	ROC AUC
Pre-exposure	Decision Tree	0.6231 ± 0.0510	0.6250 ± 0.0509	0.6129 ± 0.0678	0.2488 ± 0.1018	0.7049 ± 0.0425
	Random Forest	0.7154 ± 0.0713	0.7179 ± 0.0693	0.7140 ± 0.0719	0.4340 ± 0.1400	0.7902 ± 0.0554
	SVM (DT Features)	0.7385 ± 0.1341	0.7383 ± 0.1340	0.7362 ± 0.1349	0.4766 ± 0.2679	0.8295 ± 0.0970
	SVM (RF Features)	0.6462 ± 0.0662	0.6474 ± 0.0649	0.6452 ± 0.0660	0.2943 ± 0.1303	0.7345 ± 0.0576
Post-exposure	Decision Tree	0.6126 ± 0.0399	0.6141 ± 0.0410	0.6092 ± 0.0396	0.2273 ± 0.0813	0.6863 ± 0.0477
	Random Forest	0.6831 ± 0.0869	0.6833 ± 0.0873	0.6820 ± 0.0874	0.3663 ± 0.1741	0.7340 ± 0.0625
	SVM (DT Features)	0.6742 ± 0.0478	0.6737 ± 0.0482	0.6670 ± 0.0494	0.3476 ± 0.0962	0.7016 ± 0.0489
	SVM (RF Features)	0.6889 ± 0.0821	0.6891 ± 0.0818	0.6867 ± 0.0814	0.3781 ± 0.1638	0.7655 ± 0.0861
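Stratified k-fold cross-validation, as used for Table 4, partitions the responses into k folds that preserve the class proportions of the full set, trains on k − 1 folds, evaluates on the held-out fold, and reports the mean ± standard deviation of the k scores. The pure-Python sketch below is illustrative: `fit_and_score` is a hypothetical callback standing in for model training and evaluation, and the use of the population standard deviation (as NumPy’s default does) rather than the sample one is an assumption, since the table does not specify it.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds whose class proportions mirror the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for i in idx:  # deal indices round-robin so fold sizes stay balanced
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validate(labels, fit_and_score, k=5, seed=0):
    """fit_and_score(train_idx, test_idx) -> metric; returns (mean, std)."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for j, test_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != j for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return mean(scores), pstdev(scores)
```

With 130 responses and k = 5, each fold holds 26 responses, the same size as the hold-out set implied by Table 3.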

Table 5. Comparison of quantitative studies and key findings on migration attitudes.

Article	Sample size	Quantitative Methodology	Quantitative Results
Our study	130 surveys	Tree-based models (DT, RF) for feature selection, combined with SVM classification	Key DT/RF features before exposure: A6, A3, A7; after exposure: Em3, Em2, P2, D, C. SVM accuracy with DT-selected features: 88.46% → 80.77%; with RF-selected features: 84.62% → 92.31%. Key post-exposure SVM coefficients: DT features (Em3 −0.4168, A −0.4260, P2 0.3056, C 0.1760); RF features (Em3 −0.4896, LD −0.2781, Em2 0.5531, S.1 0.3316).
Hombrados-Mendieta et al. (2019)	1131 immigrants	Structural equation modeling (SEM)	SOC and SWL: 0.648; SWL and diseases: −0.171; SWL and mental health symptoms: −0.012.
García-Cid et al. (2020)	1714 immigrants	Multiple regression analysis (PROCESS tool in SPSS 20)	SOC-Sat: 0.44; SOC-Dist: −0.27; SOC-Excl: −0.28; Sat-Dist: −0.36; Disc-Dist: +0.28; Disc-Sat: −0.24; Disc-Excl: +0.38.
Caro-Carretero et al. (2024)	2470 in 2015, 2460 in 2016, 2455 in 2017	Hybrid wrapper algorithm and clustering techniques	Symbolic racism: 38% (2015); resource competition: 20% (2017); multicultural attitudes: 2015: 36.9% non, 29.2% multicultural; 2017: 40.1% non, 19.5% multicultural; latent racism: 38%; subtle racism: 80%; health abuse: 36% (2015), 41% (2016); unequal scholarship: 10.1%.
Gutiérrez-Rodríguez et al. (2024)	263 migrant families	Cluster analysis, multinomial logistic regression	Inclusion: high 32%, partial 35%, low 33%; predictors: family, services, residence. Unemployment 56.7%, income: 64.6% <500 €. Difficulty: 2.9/5. Support: instrumental 3.71, emotional 3.87, affectionate 4.14. Cohesion: attraction 3.48, relationships 2.91.
Indelicato et al. (2022)	9066 surveys	TOPSIS	Iberian Peninsula most open (20% immigrants in Balearic Islands); UK and Belgium most anti-immigrant (<10% in Corsica); far-right oppose due to economy, crime, culture; Muslims pro-immigrant, Catholics more negative.
Sánchez-Holgado et al. (2022)	97,710 geolocated tweets	Twitter API and deep learning	No significant correlation between foreign population, hate speech, and immigration attitudes (97,710 tweets, 2015–2020).
Formoso-Suárez et al. (2022)	206 surveys	Correlational design with convenience sampling	Happiness linked to religiosity, support, and coping; gender explained 1% more variance.
Roman Etxebarria et al. (2024)	373 surveys	ANOVAs, correlations, and SPSS 24	Women had higher life satisfaction and social networks; younger migrants had larger friendship networks; Central Europeans scored highest, others lowest; social networks linked to life satisfaction and inclusion.
Indelicato (2022)	-	DEA and fuzzy set theory (FST)	Northern, eastern Europe, Iberian Peninsula most tolerant; youth, high-income, non-Catholics, foreigners more open; capital regions, tourist islands higher ATI; US, Russia stricter on identity; left-wing favored civic identity.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tirado-Espín, A.; Marcillo-Vera, A.; Cáceres-Benítez, K.; Almeida-Galárraga, D.; Orozco Garzón, N.; Moreno Guaicha, J.A.; Carvajal Mora, H. Analyzing Communication and Migration Perceptions Using Machine Learning: A Feature-Based Approach. Journal. Media 2025, 6, 112. https://doi.org/10.3390/journalmedia6030112

AMA Style

Tirado-Espín A, Marcillo-Vera A, Cáceres-Benítez K, Almeida-Galárraga D, Orozco Garzón N, Moreno Guaicha JA, Carvajal Mora H. Analyzing Communication and Migration Perceptions Using Machine Learning: A Feature-Based Approach. Journalism and Media. 2025; 6(3):112. https://doi.org/10.3390/journalmedia6030112

Chicago/Turabian Style

Tirado-Espín, Andrés, Ana Marcillo-Vera, Karen Cáceres-Benítez, Diego Almeida-Galárraga, Nathaly Orozco Garzón, Jefferson Alexander Moreno Guaicha, and Henry Carvajal Mora. 2025. "Analyzing Communication and Migration Perceptions Using Machine Learning: A Feature-Based Approach" Journalism and Media 6, no. 3: 112. https://doi.org/10.3390/journalmedia6030112

APA Style

Tirado-Espín, A., Marcillo-Vera, A., Cáceres-Benítez, K., Almeida-Galárraga, D., Orozco Garzón, N., Moreno Guaicha, J. A., & Carvajal Mora, H. (2025). Analyzing Communication and Migration Perceptions Using Machine Learning: A Feature-Based Approach. Journalism and Media, 6(3), 112. https://doi.org/10.3390/journalmedia6030112
