Item Response Theory Applied to the Spiritual Needs Questionnaire (SpNQ) in Portuguese

Item response theory (IRT), or latent trait theory, is based on a set of mathematical models that complement the qualitative analysis of the items in a given questionnaire. This study analyzes the items of the Portuguese version of the Spiritual Needs Questionnaire (SpNQ), applied to HIV+ patients, using R Studio 3.4.1 and the mirt statistical package. The aim is to find out whether the SpNQ items possess appropriate psychometric qualities to discriminate between respondents as to the probability of marking one answer rather than another in the same item, and thus whether or not the questionnaire is biased towards a pattern of response desired by the researcher. The discrimination, difficulty, and information parameters and the item characteristic curves are evaluated. The reliable items for measuring the constructs of each of the five dimensions of the SpNQ in this HIV+ sample (Religious Needs; Inner Peace and Family Support Needs; Existential Needs; Social Recognition Needs; and Time Domain Needs) are presented, as well as the most likely response categories, depending on the latent trait level of the individuals. The questionnaire items showed satisfactory discrimination and variability of difficulty, confirming the good psychometric quality of the SpNQ.


Introduction
Item response theory (IRT) consists of a set of psychometric models used to develop and refine questionnaire measures (Embretson and Reise 2000). It is useful for evaluating questionnaires in the health field and for psychometric evaluation in general, providing more detailed information to improve these instruments (Valentini and Laros 2011). No other study has applied these analyses to the Spiritual Needs Questionnaire.
The Spiritual Needs Questionnaire (SpNQ) measures psychosocial, existential, and spiritual needs in clinical contexts. As a coping strategy, religiosity/spirituality plays a vital role in facing the consequences and daily life of various chronic diseases, improving quality of life and even confirming the purpose of living of those patients. The SpNQ (Büssing et al. 2010) evaluates the intensity of individuals' spiritual, existential, and psychosocial needs; the respondents indicate whether

The Item Response Theory
Item response theory (IRT), or latent trait theory, refers to a set of mathematical models that take the item as the basic unit of analysis and represent the probability that an individual gives a particular response to an item as a function of the item's parameters and the respondent's ability (or trait) (Primi et al. 2014). In general, IRT allows the measurement of specific characteristics of the individuals being evaluated (Andrade et al. 2000) and verifies whether a given questionnaire presents desirable psychometric qualities of discrimination between respondents.
IRT evaluates the performance of the individual on each item (whether the behavior or the effect is observed directly). The theory aims to address some limitations of classical test theory (CTT). The main one is that, in CTT, the model used to build a scale is based only on the results obtained with the questionnaire as a whole, whereas IRT uses the item as the fundamental unit of analysis. Secondly, in CTT, the measures depend on the sample of individuals who answered the questionnaire and are therefore valid only for that specific sample or a similar one (Embretson and Reise 2000). In addition, under CTT, different tests with different indexes of difficulty and discrimination generate different results for the same individuals.
Under CTT, if two different tests measure the same construct, the results are not expressed on the same scale, which prevents a direct comparison between them. The latent trait estimated under IRT, in contrast, makes it possible to compare the latent trait levels of respondents from distinct samples, whether they took the same test or different ones (Embretson and Reise 2000).
Another point concerns the reliability of the evaluation: CTT supposes that two tests applied to the same sample must produce valid and identical scores, with identical variances (Pasquali and Primi 2003; Pasquali 2009). Finally, CTT also assumes that the error variance of the measure is the same for all respondents (Sartes and de Souza-Formigoni 2013). IRT, in contrast, allows distinct levels of precision for each item, according to the specific latent trait level.
The way the response is produced (i.e., its causes) is also considered, taking into account the set of variables and the intensity of the latent trait present in the individual. In this way, conclusions are tied not to the test or questionnaire as a whole but to each item that constitutes it. The analysis yields a group of valid items, allowing numerous tests to be assembled from the items that compose the questionnaire (Andrade et al. 2000; Lima et al. 2019).
Latent trait theory assumes that responses to the items of a test are behavioral representations of one or more latent traits of the individual. In other words, it considers that a psychological process causes the behavior observed in the questionnaire response. Regarding latent traits, two primary axioms are used: (1) the performance of the subject on the item (task) is predicted by a set of latent traits (aptitudes or abilities), identified by the Greek letter theta (θ); the performance is the effect, and the latent traits are its cause. (2) The item characteristic curve (ICC) is a mathematical function that relates performance on the item to the latent trait of the individual who answered it. Using the ICC, it is observed that individuals with a higher latent trait level are more likely to endorse or correctly answer an item (in the dichotomous case), or a specific category of an item (in the polytomous case), than individuals with a lower level of that attribute (Cappelleri et al. 2014).
For polytomous items, Samejima's Graded Response Model (SGRM) (Samejima 1969) estimates two kinds of parameters, discrimination (a) and difficulty (b), which may vary from item to item. Discrimination refers to the power of the item to differentiate subjects with different trait levels; values above 0.60 are considered satisfactory (Nakano et al. 2015). Difficulty is the level of theta (latent trait) required for an individual to mark a specific response category. The SGRM is appropriate for instruments with ordered categorical response items, and not all items need to have the same number of response categories (Lima et al. 2019).
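For reference, the graded response model is commonly written in the form below. The item index i, the category index k (running from 0 to K_i − 1), and the symbol P* for the cumulative curve are notation assumed here for illustration, not taken from the study.

```latex
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp\left[-a_i\,(\theta - b_{ik})\right]},
\qquad k = 1, \dots, K_i - 1,

P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),
\qquad P^{*}_{i0}(\theta) = 1, \quad P^{*}_{i,K_i}(\theta) = 0 .
```

Here P*_ik(θ) is the probability of responding in category k or above, so each threshold b_ik is the theta level at which that cumulative probability equals 0.5, and a_i controls how steeply the curve rises.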
In addition to the discrimination and difficulty of the items, the model assesses the amount of information provided by the items (precision), the information curves, and the item characteristic curves (ICCs). The ICCs are constructed from the number of response categories (k) of each item and the discrimination and difficulty parameters. In the SGRM, there are (k − 1) response thresholds for each item, i.e., the boundaries between adjacent categories. Each threshold corresponds to a measure of the difficulty of the item (parameter b). For example, on a five-category Likert scale with four thresholds (b1, b2, b3, and b4), the first (b1) represents the theta level at which an individual has a 50% probability of choosing category 2 or above rather than category 1. The slope of each curve depends on the parameter a (discrimination).
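The relationship between thresholds and category curves can be sketched in a few lines of Python. This is a minimal illustration under the usual logistic form of the graded response model; the function name `grm_category_probs` and the parameter values are hypothetical and are not estimates from this study.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item.

    theta      -- latent trait level of the respondent
    a          -- discrimination parameter of the item
    thresholds -- ordered difficulty thresholds b1 < b2 < ... (k - 1 values)
    """
    # Cumulative curves: probability of responding in category j or above.
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # Pad with 1 (lowest category or above) and 0 (above the highest category).
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

# Illustrative (made-up) parameters for a four-category item scored 0..3.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Evaluating the function over a grid of theta values traces the category curves of the item; the thresholds must be in increasing order for every probability to be non-negative.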
IRT approaches the precision of the measure from a different perspective than CTT. It considers that precision varies with the latent trait level and with the quality of the items located in the region near that theta. Thus, a test composed of items with good psychometric properties in the higher regions of theta will better evaluate people whose latent trait levels lie in that region (Nakano et al. 2015).
IRT also offers the item information function (IIF), displayed through the item information curve (IIC), which shows the amount of information (ability to differentiate between respondents) that the item provides at different latent trait levels. The IIF evaluates the measurement precision of the item, considering the differences between respondents with different levels of the underlying trait (Cappelleri et al. 2014), and shows how well the items represent theta (Pasquali 2007).
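As a rough sketch of how an item information function and an information area over a theta interval can be computed for a graded-response item, the Python below uses the standard expression (the sum over categories of the squared derivative of the category probability divided by that probability) and a trapezoidal rule. The function names, the default interval of −3 to +3, and the step count are assumptions made for illustration; this is not the routine used by the mirt package.

```python
import math

def grm_item_information(theta, a, thresholds):
    """Item information at theta for a graded-response item."""
    # Cumulative curves padded with 1 and 0, as in the graded response model.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]  # probability of category k
        # Derivative of the category probability with respect to theta.
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info

def information_area(a, thresholds, lo=-3.0, hi=3.0, steps=600):
    """Approximate the area under the information curve on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * grm_item_information(lo + i * h, a, thresholds)
    return total * h

# Made-up parameters: the more discriminating item carries more information.
print(information_area(2.0, [-1.0, 0.0, 1.0]) > information_area(0.5, [-1.0, 0.0, 1.0]))
```

With a single threshold the expression reduces to the familiar two-parameter form a²P(1 − P), which peaks at a²/4 when theta equals the threshold.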
IRT has been widely used to evaluate questionnaires in the health area, in psychometrics and education, in marketing, in surveys, and in cognitive diagnosis. In each of these fields, the items in the questionnaires can be calibrated to fit the mathematical model, considering the individual scores of the respondents (Van Der Linden 2016).
In IRT, the standard error is inversely related to the information function (Nakano et al. 2015). The item and test information functions are relevant because they are a viable alternative to the classical concepts of reliability and standard error of measurement. The advantage is that the standard error can be represented at any chosen latent trait level, thus determining the precision at any theta level (Pasquali 2007).
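The inverse relation between standard error and information is usually expressed as follows, where n is the number of items in the test or component and I_i(θ) is the information of item i (notation assumed here for illustration):

```latex
SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}},
\qquad I(\theta) = \sum_{i=1}^{n} I_i(\theta) .
```

For example, at a theta level where the test information is 4, the standard error of the trait estimate is 1/√4 = 0.5; where the information drops to 1, the standard error rises to 1.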

Participants
The SpNQ validation study in Portuguese (Valente et al. 2018) was performed among 200 HIV+ patients treated in a public hospital in Rio de Janeiro (Brazil). The present psychometric study applying IRT relied on the replication study in a new sample of 157 HIV+ patients treated as outpatients at a public hospital in João Pessoa, Paraíba (Brazil) (Silva et al.).

Instruments
The instrument was the SpNQ validated for Brazilian Portuguese (Valente et al. 2018), previously used in a similar sample (Silva et al.).

Procedures
The replication of the validation of the SpNQ in Portuguese was approved in advance by the Research Ethics Committee (CEP) of the Center of Health Sciences of the Federal University of Paraíba, code number 2.564.096. The research complied with Resolution 466 of 12 December 2012 of the National Health Council (CNS). The general manager of the public hospital where the data were collected also approved the project.

Statistical Analysis
Samejima's Graded Response Model (SGRM) for polytomous items was fitted in R version 3.2.4 using the mirt package. Two item parameters (difficulty and discrimination) and the probability of endorsement of each category were evaluated in order to define the levels of theta necessary to respond at each response threshold. In addition, the information area of each item, the total information and error charts of each component, and the item characteristic curves (ICCs) were analyzed. The ICCs were interpreted according to three standards, as recommended by Sales et al. (2018): Standard 1 represents items in which the two intermediate response categories (1 and 2) were overlapped by the end categories (values 0 and 3); Standard 2 consists of ICCs in which only one of the two intermediate categories (1 and 2) was more likely to be endorsed than the others at some point of the theta scale; and Standard 3 occurs when every response category was, at some point of the theta scale, more likely to be endorsed than the others. The latter is the most desirable pattern.
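The three standards can be approximated programmatically by checking which response categories are the most likely one somewhere along a grid of theta values. The sketch below is one possible reading of the standards described above, not the procedure of Sales et al. (2018) or of this study; the function names, the theta grid from −4 to +4, and the example parameters are all hypothetical.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item."""
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

def icc_standard(a, thresholds, lo=-4.0, hi=4.0, steps=160):
    """Classify an item's category curves as Standard 1, 2, or 3."""
    n_categories = len(thresholds) + 1
    modal = set()  # categories that are the most likely one at some grid point
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        probs = grm_category_probs(theta, a, thresholds)
        modal.add(probs.index(max(probs)))
    intermediate = modal - {0, n_categories - 1}
    if len(modal) == n_categories:
        return 3  # every category is the most likely somewhere on the grid
    if not intermediate:
        return 1  # only the end categories are ever the most likely
    return 2      # some, but not all, intermediate categories stand out

# Made-up items: well-spread thresholds versus nearly coincident thresholds.
print(icc_standard(2.0, [-1.5, 0.0, 1.5]), icc_standard(1.0, [-0.1, 0.0, 0.1]))
```

Because the check is limited to the chosen grid, an item whose thresholds fall far outside the grid range may be assigned a lower standard than its full curves would suggest, so the grid should cover the theta range of interest.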

Results
The SpNQ items were analyzed under item response theory using Samejima's Graded Response Model. The item parameters are shown in Table 1.
The items in the Religious Needs component showed discrimination between 0.66 (item 26. Transmit your own life experience to other people) and 2.57 (item 19. Have someone pray for you). All items presented satisfactory discrimination (a > 0.60). The items also covered a large portion of theta, given the b values (thresholds) between −3.0 (item 20, b1) and 1.46 (item 26, b3). Items 14 (Hand over, donate something of yours) and 26 presented a higher level of difficulty, since higher theta values are required to endorse category 3 (very strong). The item with the largest area of information was item 19, and the total information of this component was 18.06.
Discrimination in the Inner Peace and Family Support Needs component ranged from 0.30 (item 25. Feel connected with your family) to 3.03 (item 7. Remain in a place of stillness and peace). Item 25 discriminated between individuals unsatisfactorily (a < 0.60). The b values ranged from −6.5 (item 25, b1) to 1.90 (item 25, b3). Item 25 proved to be more difficult for individuals because of the high theta value required to choose category 3. The item with the largest information area was item 7 (6.29), and the total information of the component was 14.7.
The items in the Existential Needs component showed discrimination between 0.53 (item 10. Finding meaning in disease and/or suffering) and 2.89 (item 12. Talking to someone about the possibility of life after death). Only item 10 showed low discrimination. The items also covered a large portion of theta, given the b values (thresholds) ranging from −2.01 (item 2, b1) to 3.0 (item 10, b3). All items presented great difficulty, given the high theta values required. In item 2, individuals did not endorse category 4 (3 = very strong). The item with the largest area of information was item 12 (5.42), and the total information of the component was 7.89.
Note: SpNQ = Spiritual Needs Questionnaire; a = discrimination; b1, b2, b3 = thresholds; I(θ; −3; +3) = information area between thetas −3 and +3.
The Social Recognition Needs component showed discrimination between 0.88 (item 22. Reading religious/spiritual books) and 1.69 (item 3. That someone from your religious community (e.g., pastor, priest) would take care of you). All items adequately discriminated between individuals. The b values ranged from −0.84 (b1 of item 21) to 1.68 (b3 of item 3). Items 3 and 22 revealed greater difficulty. Item 3 presented the largest area of information, and the total information of the component was 6.20.
The fifth and final component, Time Domain Needs, presented high discrimination, between 1.08 (item 5. Solving "open" aspects, outstanding problems in your life) and 3.75 (item 4. Reflect on your past). The b values ranged from −0.62 (b1 of item 4) to 2.14 (b3 of item 5), with item 5 being the most difficult. The most informative item was item 4 (9.45), and the total information was 11.07.
Next, we analyzed the total information curves of items (TICs) and the item characteristic curves (ICCs) separately by component.
The total information curves (TICs) of the Religious Needs component (Figure 1) showed that items 14, 18, 19, and 20 carried the most information and were the most precise for estimating intermediate theta levels (around zero). Items 13, 15, 23, and 26 were less precise for estimating theta levels.
The TICs of the Inner Peace and Family Support Needs component (Figure 3) showed that items 6, 7, and 8 carried the most information and were the most precise for estimating theta levels between −3 and 0.
The ICCs of the Existential Needs component (Figure 6) showed that the categories of items 2 and 11 were distributed along the theta levels according to Standard 3 (the most desirable), in which every category has the highest probability at some point of theta. In item 2 (Talk to others about your fears and concerns), in particular, the response 3 (very strong) was not marked by individuals. Items 10 and 12 presented a higher probability of the extreme responses, category 1 (0 = no) or category 4 (3 = very strong), and of only one of the intermediate categories, representing Standard 2.

Discussion
Applying IRT through Samejima's Graded Response Model (SGRM) allowed a more detailed investigation of the strengths and weaknesses of the items that make up the SpNQ, providing indices of instrument precision, the discrimination and difficulty of the items, and an evaluation of the patterns of the characteristic curves of each item.
This theory considers that the items offering the largest area of information also present higher discrimination parameters. The higher the discrimination, the greater the ability to differentiate individuals as theta changes, providing greater precision (Embretson and Reise 2000). The results of this work showed satisfactory parameters for the SpNQ: most of the items were discriminating (a > 0.60). This provides evidence of an adequate ability to distinguish individuals located in different regions of the latent traits investigated.
In the Religious Needs component, items 14 (Hand over, donate something of yours), 18 (Pray with someone), and 19 (Have someone pray for you) were the most discriminating and precise for predicting this latent trait. Item 26 (Transmit your own life experience to other people) was more difficult for these HIV+ patients to endorse, probably because of the stigma and prejudice regarding the personal experiences of HIV-positive people (Bennett et al. 2016).
In the Inner Peace and Family Support Needs component, items 6 (Have more contact with the beauty of nature), 7 (Remain in a place of stillness and peace), and 8 (Find inner peace) had the most satisfactory parameters. Item 25 (Feel connected with your family) was more difficult to endorse, probably because of the distance between the family and the patient and the subsequent isolation (Maposse and Seidl 2019). Taken in isolation, this item was neither discriminating nor precise for predicting levels of the evaluated construct, and it therefore proved unsatisfactory.
Regarding the Existential Needs component, the most reliable item was item 12 (Talking to someone about the possibility of life after death). Item 10 (Finding meaning in disease and/or suffering) did not prove reliable for discriminating between different levels of Existential Needs. All items of this component showed great difficulty because they refer to questions such as the meaning of life and death, strongly associated with HIV and other diseases.
In the Social Recognition Needs component, items 3 (That someone from your religious community (e.g., pastor, priest) would take care of you) and 21 (Participate in a religious ceremony (e.g., mass, worship)) were the best evaluated. Item 22 (Reading religious/spiritual books) was more difficult for these participants to endorse. This is understandable, considering that only 16 out of 157 respondents reported participating actively in a religion.
Finally, in the Time Domain Needs component, item 4 (Reflect on your past) was the most reliable for predicting the level of this latent trait and for adequately discriminating between individuals. Item 5 (Solving "open" aspects, outstanding problems in your life) was more difficult to endorse in this sample.
In general, the items showed satisfactory discrimination and breadth across the theta range. The questionnaire offers items of varying difficulty and is most precise at intermediate levels of need. The ICCs showed a satisfactory pattern for most of the items. Still, the most common pattern was the higher probability of extreme responses (no or very strong), which was expected, as the SpNQ presents only one alternative of total disagreement with the item statement. Otherwise, one can agree with the item, but with only three degrees of agreement intensity to choose from.
These results are not conclusive and suggest that other studies applying the SpNQ should be conducted for comparison. It is essential to detect possible group biases through the analysis of differential item functioning within the theory. Other IRT resources can be applied in studies with the SpNQ, especially the construction of a person-item map for the development of interpretative standards based on the items (Cappelleri et al. 2014; Primi et al. 2014).
It was possible to find weaknesses in the questionnaire when applied specifically to the HIV+ population: some items showed low discrimination and precision indices and are therefore the least suitable for predicting responses from the difference between the person's theta and the item's intensity. In such cases, people with high theta values could endorse response categories that do not correspond to their latent trait level in spiritual needs (Linacre 2015).
This study advanced the evidence on the validity of the SpNQ. Although the results were satisfactory, the study was not free of limitations: the small sample size; the specific disease of the sample, which could have biased the results; the lack of similar information about this population in Brazil; and the insufficient national representativeness of the sampling in the only two studies that have applied the questionnaire so far. Future studies in other Brazilian regions, with larger numbers of respondents, are necessary to allow comparison of results.