Is the Definition of Roma an Important Matter? The Parallel Application of Self and External Classification of Ethnicity in a Population-Based Health Interview Survey

The Roma population is typified by a poor and, due to difficulties in ethnicity assessment, poorly documented health status. We aimed to compare the usefulness of self-reporting and observer-reporting in Roma classification for surveys investigating differences between Roma and non-Roma populations. Both self-reporting and observer-reporting of Roma ethnicity were applied in a population-based health interview survey. A questionnaire was completed by 1849 people aged 18–64 years; this questionnaire provided information on 52 indicators (morbidity, functionality, lifestyle, social capital, accidents, healthcare use). Multivariate logistic regression models controlling for age, sex, education and employment were used to quantify differences between the self-reported Roma (N = 124) and non-Roma (N = 1725) populations, as well as between the observer-reported Roma (N = 179) and non-Roma (N = 1670) populations. Differences in statistical inferences between the interviewer-reported and self-reported classifications of Roma ethnicity were observed for only seven indicators. The self-reporting approach was more sensitive for two indicators, and the observer-reported assessment for five indicators. Based on our results, self-reported identity can be considered a useful approach, and the application of observer-reporting cannot considerably increase the usefulness of a survey, because the differences between Roma and non-Roma individuals are much bigger than the differences between indicators produced by self-reported or observer-reported data on individuals of Roma ethnicity.


Introduction
The Roma population is among the largest minorities in Europe. Common experience, supported by data that are nevertheless not detailed enough to design effective interventions, indicates that their socio-economic and health status is far from acceptable. Despite substantial uncertainties, the EU considers this problem a high priority [1]. This problem also necessitates more systematic research on the role of Roma ethnicity as a health determinant, indicating that the scientific base must be strengthened to establish an adequate Roma policy [2].
Small- and large-scale health surveys are used extensively throughout Europe to assess the population health status. The use of regular surveys to evaluate the health status of the Roma population by inserting the Roma ethnicity into the variables examined during data collection seems to be technically simple and promising. Roma-specific survey results could be very informative. However, this approach is hindered by legal constraints (right to personal data protection of survey participants) to the age range 18–64 years (There were nine self-reported and a further nine interviewer-reported Roma among 572 subjects older than 65.) Ultimately, the investigation focused on 1849 subjects.
There were 124 self-reported Roma subjects, whereas 179 people were categorised as being of Roma ethnicity by the interviewers, of whom 61 individuals were identified only by the observers (Figure 1).
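The counts above imply a full cross-classification of the two methods. The following minimal sketch reconstructs it by simple arithmetic. It assumes that "identified only by the observers" means observer-reported but not self-reported; the function and cell names are illustrative and do not come from the study.

```python
def cross_classify(total, self_reported, observer_reported, observer_only):
    """Rebuild the 2x2 self-report by observer-report table from marginal counts.

    Assumption: `observer_only` counts subjects classified as Roma by the
    interviewer who did not report Roma ethnicity themselves.
    """
    both = observer_reported - observer_only      # Roma by both methods
    self_only = self_reported - both              # Roma by self-report only
    neither = total - (both + self_only + observer_only)
    return {
        "both": both,
        "self_only": self_only,
        "observer_only": observer_only,
        "neither": neither,
    }


# Counts reported in the text for the 18-64 age range
table = cross_classify(total=1849, self_reported=124,
                       observer_reported=179, observer_only=61)

for cell, n in table.items():
    print(f"{cell}: {n}")
```

Under this assumption, almost all self-reported Roma were also classified as Roma by the interviewers, and the four cells add up to the 1849 subjects analysed.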

Socio-Economic Status
There was no difference between Roma and non-Roma samples with respect to sex and marital status composition. The Roma age distribution was shifted towards the younger age groups. The economic activity and the level of education were significantly higher among non-Roma individuals. The Roma households were bigger than the non-Roma households. The differences between the Roma and non-Roma were similar, independently of whether ethnicity was assessed via self-report or interviewer-report (Table 1).
According to the multivariate logistic regression analysis, employed Roma individuals were less willing to declare their Roma ethnicity than economically inactive Roma individuals. Similar underreporting of Roma ethnicity was observed in younger age groups with borderline significance (Table 2).

Descriptive Health Status Indicators for Roma
According to the crude descriptive measures, the general health status of the Roma is inferior to that of the non-Roma. There is no difference between Roma and non-Roma individuals with respect to accident frequency and adherence in drug consumption.
Apart from the equal crude prevalence of cardiometabolic disorders, chronic disorders show a higher occurrence among the Roma. Since a higher prevalence is observed for cardiometabolic diseases, the general chronic disease occurrence of the Roma does not deviate significantly from that of the non-Roma. The geographical access to health care is similar among Roma and non-Roma individuals, while the access in terms of time is worse among Roma than among non-Roma individuals. The lifestyle indicators are disadvantageous among the Roma, but two indicators (prevalence of obesity and hearing loss) show no association with the Roma ethnicity. Almost all of the functional status and oral health-related indicators are worse among the Roma. Indicators related to social capital are similar among the Roma and non-Roma. The only exception is that the Roma probably face more difficulties when they need help from neighbours. The ethnic differences in the use of preventive services vary depending on the service.
There are five indicators (difficult to see clearly; not easy to receive help from the neighbours if he/she would need it; cholesterol level was measured in the last year; blood glucose level was measured in the last year; and pulled out teeth because of dental caries or loose teeth) for which the conclusions regarding differences between Roma and non-Roma individuals are not the same when assessing ethnicity by self-reporting vs. interviewer-reporting. Each of the observed differences suggests that the Roma status is worse if the interviewer-reporting approach is applied and is equal to the non-Roma status if the method of self-reported ethnicity is applied (Table 3).

Roma Ethnicity as a Health Determinant Independent of Socio-Economic Status
Logistic regression was used to compare each of the two Roma definitions with the non-Roma population. For 33 indicators, there were no remarkable differences, whereas there were significant differences for 14 variables based on both Roma definitions (results for each indicator are presented in detail in Table 4). Differences between interviewer-reported and self-reported Roma ethnicity-based ORs were observed for seven indicators. However, the odds ratios from the self-reporting and interviewer-reporting analyses deviated in the same direction for these seven indicators, and the corresponding confidence intervals showed a wide overlap. In the self-reporting-based Roma analysis, a body mass index (BMI) above the normal value was less likely (OR: 0.64; 95% CI: 0.41-0.99), whereas respiratory system disorders were more likely (OR: 1.88; 95% CI: 1.09-3.26) among the Roma. In the interviewer-reporting analysis, the use of glasses or contact lenses (OR: 0.47; 95% CI: 0.28-0.80) and blood glucose measurement in the last year (OR: 0.65; 95% CI: 0.44-0.95) were less likely among the Roma. Furthermore, pain hindering physical activity in the last 4 weeks (OR: 2.23; 95% CI: 1.04-4.79), bleeding gums (OR: 1.87; 95% CI: 1.20-2.90) and lost teeth (OR: 1.85; 95% CI: 1.11-3.08) were more frequent among the Roma in the interviewer-reporting analysis.
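The seven odds ratios quoted above can be examined on the log scale, where a Wald-type confidence interval is symmetric around the point estimate. The sketch below recovers an approximate standard error and z statistic from each reported interval. It assumes that the intervals were computed as exp(log OR ± 1.96 × SE), which is the usual convention for logistic regression but is not stated in the text; the function and label names are illustrative.

```python
import math


def log_scale_summary(or_point, ci_low, ci_high, z_crit=1.96):
    """Approximate SE and z statistic of log(OR) from a reported 95% CI.

    Assumes a Wald interval: exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_point) / se
    # Under the Wald assumption the geometric mean of the limits equals the OR
    midpoint = math.sqrt(ci_low * ci_high)
    return se, z, midpoint


# (OR, lower limit, upper limit) exactly as reported in the text
reported = {
    "BMI above normal (self-report)": (0.64, 0.41, 0.99),
    "respiratory disorders (self-report)": (1.88, 1.09, 3.26),
    "glasses or contact lenses (interviewer)": (0.47, 0.28, 0.80),
    "blood glucose measured (interviewer)": (0.65, 0.44, 0.95),
    "pain hindering activity (interviewer)": (2.23, 1.04, 4.79),
    "bleeding gums (interviewer)": (1.87, 1.20, 2.90),
    "lost teeth (interviewer)": (1.85, 1.11, 3.08),
}

summaries = {name: log_scale_summary(*values) for name, values in reported.items()}

for name, (se, z, midpoint) in summaries.items():
    print(f"{name}: SE={se:.3f}, z={z:.2f}, geometric midpoint={midpoint:.2f}")
```

Because the reported values are rounded to two decimals, the recovered quantities are approximate; intervals whose limit lies close to 1 correspond to z statistics only slightly beyond the critical value.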
The positive correlation between the point estimates for ORs using the two approaches was strong (r = 0.840, p < 0.001), with three outliers (risk of road traffic accidents, not taking medicine for respiratory diseases, and tooth cavities without dental filling). Statistical interpretations of the differences between Roma and non-Roma individuals from the two analyses were the same for each outlier (Figure 2).

Discussion
The self- and external designations of the Roma ethnicity in health surveys were investigated by parallel application to obtain information about the quality of results based on these methodological approaches.
Our observation confirmed the common belief that the observer reports are more effective in identifying Roma adults than the self-reporting approach. In the case of Roma adults, the intention not to admit one's Roma ethnicity is stronger than the misclassification by an observer who assesses the Roma ethnicity by obtaining information during the interview. In fact, the application of the observer-reported Roma classification resulted in 1.44 times more identified Roma individuals (N = 179) than the application of only the self-identification (N = 124) approach.
According to the evaluation of the socio-demographic differences between only-observer-identified and self-identified Roma adults, working Roma are less willing to declare their Roma ethnicity. It is likely that this characteristic is more common among the younger Roma population. Since one of the most important social characteristics of the Roma is their exclusion from the labour market, this profile suggests that the Roma who can break out of this marginalized social position through employment may have a secretive attitude regarding their ethnicity. It seems that this subgroup can be reached by the application of observer reports classifying Roma individuals in health data collection.
The crude descriptive analysis showed significant differences between Roma and non-Roma groups for 35 indicators out of the 52 investigated. Only one indicator reflected better conditions among the Roma (BMI above normal value; ≥25 kg/m²). According to the majority of the studied indicators (30 in self-reporting and 35 in observer-reporting analyses), the health status of the Roma was disadvantageous compared to that of the non-Roma. Each difference between self-reporting and observer-reporting results showed the Roma health status as more disadvantageous in the case of observer-reporting. The added value of observer-reporting in Roma health studies is represented by these 5 out of 52 investigated indicators. This higher effectiveness of the observer-reporting approach in demonstrating health status differences between the Roma and non-Roma is also consistent with a lower reliability of the self-reporting of Roma ethnicity.
Since the Roma ethnicity covaried positively with deprivation, the indicators for Roma-to-non-Roma differences, without adjustment for socio-demographic status, are obviously not informative about the role of Roma ethnicity in influencing risk. The indicators adjusted for socio-demographic factors confirmed the results from univariate analyses, such that the Roma health status was shown as inferior to that of the non-Roma (with the exception of BMI above 25 kg/m²). However, the number of adjusted indicators with statistically significant Roma-to-non-Roma differences was remarkably reduced in comparison with unadjusted indicators (self-reporting: 15 out of 30; observer-reporting: 15 out of 35). The disadvantageous risk pattern among the Roma is in good concordance with the published results from Hungary [21,65,71,72].
The indicators with statistically significant differences between Roma and non-Roma individuals that could be interpreted differently by self-reporting and observer-reporting do not unequivocally support the higher sensitivity of observer-reporting. Observer-reporting showed higher effectiveness for five of seven indicators, while self-reporting proved to be more effective for two of seven indicators. Our results suggest that the higher effectiveness of observer-reporting in Roma identification, and in crude descriptive evaluation, is not accompanied by higher effectiveness in the evaluation of socio-demographically adjusted indicators.
Our results suggest that there is little justification for elaborating a methodology that can handle all the sensitive (historical, legal, and ethical) issues related to external Roma classification. There is a low probability that survey results based on external Roma classification could improve the effectiveness of data-driven health policy formulation.

Strengths and Limitations
The present study was a population-based investigation with the sample selected at random. The non-Roma sample was large, ensuring relatively precise reference values for Roma-specific risk evaluation. The quality of collected data was ensured by the application of questions from the European Health Interview Survey, which was tested in a Hungarian national survey as well. The health-determining role of ethnicity could be studied with control for deprivation because Roma-specific risks were adjusted for a number of socio-demographic factors. The main strength of this study was the parallel use of self-reporting and interviewer-reporting identification, allowing a direct comparison of the two methods.
The most important limitations of our study were the low response rate and the weak statistical power resulting from the relatively small number of Roma subjects in the studied sample. This small number of Roma subjects likely resulted in type II errors, which may be responsible for the lack of any observable differences between the Roma risks computed by the two approaches, whereas many Roma-to-non-Roma differences were detected by both methods.
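The effect of group size on statistical power can be illustrated with a normal-approximation power calculation for comparing two proportions. This is only a sketch: the prevalences of 30% and 40% are hypothetical values chosen for illustration, the calculation is unadjusted (unlike the logistic regression models used in the study), and the function name is not taken from the study. The group sizes are those reported in the text (124 self-reported Roma, 61 only-observer-identified Roma, 1725 non-Roma by self-report).

```python
from statistics import NormalDist


def power_two_proportions(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled standard error under the alternative hypothesis
    (normal approximation); p1 and p2 are assumed true prevalences.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    shift = abs(p1 - p2) / se
    # Probability that the test statistic falls in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical prevalences: 30% in one group, 40% in the other
power_roma_vs_roma = power_two_proportions(0.30, 124, 0.40, 61)
power_roma_vs_non_roma = power_two_proportions(0.40, 124, 0.30, 1725)

print(f"self-reported vs only-observer-identified Roma: {power_roma_vs_roma:.2f}")
print(f"self-reported Roma vs non-Roma: {power_roma_vs_non_roma:.2f}")
```

For the same hypothetical ten-percentage-point difference, the comparison between the two small Roma groups has markedly lower power than the comparison against the large non-Roma group, which is in line with the limitation described above.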
We could not investigate the added value of interviewer-reported ethnicity assessment as an extra survey question added to the questions on self-reported ethnicity. For many indicators, odds ratios for interaction could not be computed by logistic regression models with a term for the interaction between self-reported and interviewer-reported Roma ethnicity, because of the small number of Roma participants in our survey (data not shown). Therefore, a direct measure of the added value of interviewer-reported ethnicity assessment as an additional question could not be computed using our database. On the other hand, according to the logistic regression models which distinguished (a) the Roma by self-reporting irrespective of the result of interviewer-reporting; (b) the interviewer-reported Roma without admitted Roma ethnicity; and (c) non-Roma classified by both self-reporting and interviewer-reporting, there was no indicator with a significant difference between the two Roma groups. Due to the small number of Roma participants, the lack of significant difference was accompanied by wide 95% confidence intervals (Appendix B).
Our study investigated one important source of uncertainty of Roma health studies. We could not investigate the role of the interaction between the observer's and interviewee's personality and the interview conditions. Furthermore, we could not investigate how uncertainties of the social construct for the Roma ethnicity can influence the ethnicity classification.

Conclusions
Although the young and employed Roma seem to be less willing to declare their Roma ethnicity than the older and unemployed Roma, the survey conclusions on the difference between Roma and non-Roma adults' health status show no remarkable discrepancy whether ethnicity data are based on self-reporting or interviewer-reporting. Based on our observations adjusted for socio-demographic status, both approaches for ethnicity identification are equally applicable in surveys, and it seems that the hesitation to insert self-reported Roma ethnicity into the set of surveyed indicators due to the assumed uncertain nature of self-identification is not justified.
The health status differences between the Roma and non-Roma are much larger than those between self-reported and interviewer-reported Roma. Therefore, concerns about the value of self-reported Roma ethnicity data are not a reasonable ground for refraining from extending these surveys with Roma-specific data collection, even though Roma identification based on the combination of self-reporting and interviewer-reporting approaches yields remarkably larger Roma subgroups in surveys.

Author Contributions:
The analysis and interpretation of data were performed by Eszter Anna Janka and Ferenc Vincze. János Sándor designed the study and contributed to the interpretation of data and the writing of the manuscript. Eszter Anna Janka was involved in writing the manuscript. Róza Ádány authorized the study and revised it critically for important intellectual content.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
The list of indicators with the dichotomized categories based on the aggregation of original answers collected according to the European Health Interview Survey questions.