Analysis of the Structural Characteristics and Psychometric Properties of the Pelvic Floor Bother Questionnaire (PFBQ): A Systematic Review

Background: The Pelvic Floor Bother Questionnaire (PFBQ) is a validated and reliable questionnaire that assesses the presence and degree of pelvic floor discomfort, providing a global view of pelvic floor dysfunction. It covers urinary stress incontinence, urinary urgency, urinary frequency, urge urinary incontinence, pelvic organ prolapse, dysuria, dyspareunia, defecatory dysfunction, fecal incontinence, and the disability these symptoms cause the respondent. Aim: The aim of the present study was to analyze the structural characteristics and psychometric properties of the different versions of the PFBQ, as well as the methodological quality, the quality of evidence, and the criteria used for good measurement properties. Methods: A systematic review was carried out in the PubMed, SCOPUS, Web of Science, Dialnet, ScienceDirect, and CINAHL databases, covering studies that adapted and validated the PFBQ in other languages. The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the COSMIN guidelines, considered articles published up to 28 February 2022, and was registered in the PROSPERO database. Results: Initially, a total of 443 studies were found, of which four were analyzed with regard to structural characteristics and psychometric properties, such as reliability, internal consistency, construct validity, and criterion validity. Conclusions: The different versions of the questionnaire show basic structural characteristics and psychometric properties for the evaluation of patients with pelvic floor dysfunctions. Most of the analyzed versions present criteria for good measurement properties qualified as sufficient, inadequate–adequate methodological quality, and low–moderate quality of evidence.


Introduction
The pelvic floor (PF) is composed of muscles, ligaments, and fascia that support the bladder, reproductive organs, and rectum [1]. This musculature is enclosed within the scaffolding formed by the bones of the pelvis (ilium, ischium, and pubis), which articulate with the sacrum through the two posterior sacroiliac joints and with each other through the anterior pubic symphysis [2]. The correct function of the muscles and structures that make up the PF is essential, since pelvic floor dysfunction (PFD) can cause symptoms such as urinary incontinence (UI), whether urgency (UUI), stress (USI), or mixed [3]; fecal incontinence (FI); overactive bladder (OB); bladder emptying dysfunction; obstructive defecation syndrome; pelvic organ prolapse (POP) [4]; or sexual dysfunctions (dyspareunia, anorgasmia, vaginismus, or vulvodynia), among others [3,5,6].
Several risk factors increase the probability of suffering one of these PFDs. The best known are pregnancy and childbirth [7], although perineal surgeries, obesity, constipation, smoking, lack of knowledge and awareness of the perineal area, and hormonal causes are also behind this symptomatology [8,9]. PFDs are very common and affect millions of women around the world. Approximately 40% are affected by POP, one in three will experience UI, and one in ten will experience FI [10], and some may have pain [11]. The quality of life of many women is affected to a greater or lesser degree, with consequences for the social, sexual, and psychological life of women of all ages [12].
In recent years, the use of patient-reported outcome measures (PROMs) [13] has increased considerably in both research and clinical practice, since they allow the patient to be assessed and the results obtained to be evaluated in a simple way, for better planning and monitoring of the patient's state of health. PROMs make it possible to evaluate different subjective aspects of the pathology directly [14]. Some of these questionnaires assess aspects such as PF symptoms [15][16][17], UI [18][19][20], FI [21], sexual activity [22,23], and quality of life [24].
The Pelvic Floor Bother Questionnaire (PFBQ) is a validated and reliable questionnaire that assesses the presence and degree of PF discomfort, providing a global view of PFD. It covers USI, urinary urgency, urinary frequency, UUI, POP, dysuria, dyspareunia, defecatory dysfunction, FI, and the disability these symptoms cause the respondent. It was developed in English in 2010 by Peterson et al. [25] and subsequently translated and validated in four languages: Chinese, Turkish, Portuguese-Brazilian, and Arabic [26][27][28][29]. This pelvic floor assessment tool still needs to be adapted and validated in other languages, and the structural characteristics and psychometric properties reported for the published versions should be taken into account in order to improve future versions. Therefore, the aim of the present study was to analyze the structural characteristics and psychometric properties of the different language versions of the PFBQ, as well as the methodological quality, the quality of evidence, and the criteria used for good measurement properties.

Protocol
A systematic review was carried out considering articles published up to 28 February 2022 and was registered in the PROSPERO database (PROSPERO ID: CRD42022307970) following the recommendations of the PRISMA statement [30] and COSMIN guidelines [31].

Resources and Search
The search was carried out in the PubMed, SCOPUS, Web of Science, Dialnet, ScienceDirect, and CINAHL databases. The following MeSH terms were combined with Boolean operators: "pelvic floor bother questionnaire" AND "pelvic floor disorders".

Selection Criteria
The inclusion criterion for this search was: studies that performed a cross-cultural adaptation and validation of the PFBQ in languages other than that of the original publication. Papers were excluded if they did not present the results conclusively or did not include a validation phase.

Selection of Documents
Documents from the different databases were extracted and imported into the Rayyan platform [32]. First, duplicate documents were eliminated. Then, two researchers (LAL and MMCP), working in a blinded manner, selected the documents by title and abstract. In the case of disagreement between the two researchers, the decision was made by a third researcher (GMT). The documents that were finally selected were obtained in full text to analyze their content and evaluate their inclusion in this review.

Instrument
The PFBQ is a validated questionnaire that was developed by Cleveland Clinic pelvic floor staff based on clinical interviews and a review of commonly used surveys, such as the Urogenital Distress Inventory, the Pelvic Floor Distress Inventory (PFDI), and the Pelvic Floor Impact Questionnaire (PFIQ) [33,34]. The PFBQ evaluates USI, urinary urgency, urinary frequency, UUI, POP, dysuria, dyspareunia, defecatory dysfunction, fecal incontinence, and the disability these symptoms cause the respondent. It consists of 9 items, each scored from 0 to 5, so the raw total ranges from 0 to 45 points, where 0 indicates no discomfort and 45 indicates the greatest disability. The mean item score (the raw total divided by the number of items) is multiplied by 20 to obtain a result from 0 to 100 [25].
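The scoring just described can be illustrated with a minimal sketch. It assumes that the 0 to 100 result corresponds to the mean item score multiplied by 20 (a raw total of 45 multiplied by 20 would exceed 100), and the function name `pfbq_scores` is hypothetical and not taken from the original publication.

```python
from typing import Sequence

N_ITEMS = 9          # number of PFBQ items, as stated in the text
MAX_ITEM_SCORE = 5   # each item is scored from 0 to 5


def pfbq_scores(item_scores: Sequence[int]) -> tuple[int, float]:
    """Return (raw total, 0-45; scaled score, 0-100) for one respondent."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("each item score must lie between 0 and 5")
    raw_total = sum(item_scores)
    # Assumption: scaled score = mean item score x 20, so that 5 x 20 = 100.
    scaled = raw_total / N_ITEMS * 20
    return raw_total, scaled
```

Under this assumption, a respondent who scores 5 on every item has a raw total of 45 and a scaled score of 100, and a respondent who scores 0 on every item has 0 on both scales.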

Synthesis of Results and Data Extraction
To gather information on the structural characteristics and psychometric properties of each of the versions of the PFBQ, an analysis of the different versions of this questionnaire was carried out.
The methodological quality of each version was analyzed, for each measurement property, using the Risk of Bias checklist of the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) [35,36], whose objective is to facilitate the selection of high-quality PROMs for research and clinical practice [36].
The structural characteristics extracted from each version were: title, self-report, year of publication, version, population, sample size, age, sex, characteristics, environment, geographic location, target population, number of subjects in the pilot phase, and number of subjects per item. The psychometric properties extracted were: test-retest reliability, internal consistency, construct validity, and criterion validity. Subsequently, according to the updated criteria for good measurement properties, the result of each version was evaluated individually for each measurement property and rated as sufficient (+), insufficient (-), or indeterminate (?) [36,37]. Finally, the evidence was summarized, and the quality of the evidence was graded according to the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach [36].

Results
After the initial search performed in the PubMed, SCOPUS, Web of Science, Dialnet, ScienceDirect, and CINAHL databases, as shown in the flow chart of the selected studies (Figure 1), a total of 445 results were found. Excluding duplicates and after selection of papers by title and abstract, 199 were selected, from which 187 were excluded and 12 full-text papers were assessed for eligibility. Eight of these were excluded because they did not meet the inclusion criteria, did not present the results conclusively, did not include a validation phase, or were not a cross-cultural adaptation of the PFBQ. Finally, a total of four versions adapted and validated in languages other than the original were selected: Arabic [28], Chinese [29], Turkish [27], and Portuguese-Brazilian [26]. Then, the structural characteristics of each one of them were analyzed (see Table 1). In the Arabic version [28], some wording changes were introduced in items 6, 7, and 8 to make them culturally acceptable. As for item 9, the question "is sexually active" had to be rephrased to "has sexual relations with her husband or male partner". Table 2 shows the data corresponding to the psychometric properties of each questionnaire: test-retest reliability, internal consistency, construct validity, and criterion validity.

Structural Validity
Only the Turkish version [27] assessed the structural validity of the PFBQ, by means of confirmatory factor analysis. Factor analysis tests whether the items of a questionnaire can be grouped into different dimensions. When the number of dimensions was determined, only those with a value greater than 1 were retained, which subdivided the PFBQ into four dimensions [27].

Internal Consistency
Of the four adaptations of the PFBQ, only Liu et al. [29] and Peterson et al. [26] assessed internal consistency, using Cronbach's alpha, for which values ≥0.70 indicate good internal consistency and values <0.70 indicate low consistency [36]. The Chinese version [29] scored 0.677 and the Portuguese-Brazilian version [26] scored 0.625, both of which indicate low consistency. In addition, only the Turkish version [27] assessed structural validity, which the COSMIN guidelines [31] require in order to determine the level of internal consistency. Therefore, the rest of the versions were rated as "indeterminate".
On the other hand, neither the Turkish version [27] nor the Arabic version [28] evaluated this property, on the grounds that each PFBQ question focuses on a different characteristic, so that internal consistency could not be calculated by comparing the scores of individual items with the total score and would not contribute to the validity of the questionnaire.
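For reference, Cronbach's alpha can be computed from a respondents-by-items score matrix with the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch under that formula. The function names are hypothetical, and none of the included studies is known to have used this code.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def consistency_label(alpha: float, threshold: float = 0.70) -> str:
    """Apply the >=0.70 cut-off quoted in the text."""
    return "good" if alpha >= threshold else "low"
```

Applied to the reported coefficients, `consistency_label(0.677)` and `consistency_label(0.625)` both return "low".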

Test-Retest Reliability
The four versions calculated test-retest reliability using the intraclass correlation coefficient (ICC), both for the total score of the questionnaire and by domain. ICC values of at least 0.7 are considered to indicate acceptable reliability [38]. All the coefficients, both global and by domain, reached this threshold. In the Arabic version, the global score was 0.7 [28], and Dogan et al. [27] obtained the highest score, with an ICC = 0.998. The interval between the first and second test was homogeneous across the Chinese, Turkish, and Portuguese-Brazilian versions, at approximately one week, whereas the Arabic version used an interval of 1-6 weeks. According to the criteria for good measurement properties, the reliability of all versions was rated as "sufficient".
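The included studies do not state which form of the ICC was used. As an illustration only, the sketch below computes one common choice for test-retest data, the two-way random-effects, absolute-agreement, single-measurement coefficient, often written ICC(2,1). The choice of this form and the function name `icc_2_1` are assumptions.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for a subjects x administrations matrix of scores."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of administrations (2 for test-retest)
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two administrations")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    # Corresponding mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def is_acceptable(icc: float, threshold: float = 0.7) -> bool:
    # Assumption: the 0.7 cut-off is treated as inclusive.
    return icc >= threshold
```

Because this form measures absolute agreement, a systematic shift between the first and second administration lowers the coefficient even when the ranking of the subjects is unchanged.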

Responsiveness
Only the Arabic version [28] and the Portuguese-Brazilian version [26] included responsiveness, that is, the ability of a PROM to detect changes over time in the construct to be measured. Both versions were considered not evaluable because they did not have a hypothesis defined by the review team. Therefore, the methodological quality was inadequate in both versions and the quality of the evidence was low.

Methodological Quality
Methodological quality was assessed according to the COSMIN guidelines criteria [39], which establish, among others, that: "very good" methodological quality requires 7 subjects per item in samples ≥100 people; "adequate" quality requires 5 participants per item in samples ≥100 or 6 subjects per item in samples <100; versions with 5 subjects per item in samples <100 will be rated as "doubtful"; "inadequate" methodological quality is reserved for studies with fewer than 5 subjects per item (see Table 3).
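The sample-size rule quoted above can be written as a small decision function. This is a sketch under the assumption that each figure is a minimum ("at least 7 subjects per item", and so on), and the function name `sample_size_rating` is hypothetical.

```python
def sample_size_rating(n_subjects: int, n_items: int) -> str:
    """Rate methodological quality from the subjects-per-item ratio."""
    if n_subjects <= 0 or n_items <= 0:
        raise ValueError("n_subjects and n_items must be positive")
    ratio = n_subjects / n_items
    if ratio < 5:
        # Fewer than 5 subjects per item, whatever the sample size.
        return "inadequate"
    if n_subjects >= 100:
        # At least 7 per item is "very good"; 5 up to 7 is "adequate".
        return "very good" if ratio >= 7 else "adequate"
    # Samples below 100: at least 6 per item is "adequate"; 5 up to 6 is "doubtful".
    return "adequate" if ratio >= 6 else "doubtful"
```

For a nine-item instrument such as the PFBQ, any sample of 100 or more subjects gives more than 11 subjects per item, so this particular criterion would return "very good".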

Quality of Evidence
In order to evaluate the quality of evidence, data should be classified according to the GRADE approach [40] (high, moderate, low, or very low evidence), which takes into account the risk of bias, inconsistency, imprecision, and indirectness (see Table 3).

Discussion
The objective of this review was to analyze the structural characteristics and psychometric properties, as well as to evaluate the methodological quality, quality of evidence, and good measurement properties of the included questionnaires and compare them with the original version [25]. A total of four versions of the PFBQ were included: Arabic [28], Chinese [29], Turkish [27], Portuguese-Brazilian [26].
Regarding the piloting phase, 30 patients were included in the Turkish version [27], 10 patients in each of the Portuguese-Brazilian [26] and Chinese [29] versions, and 18 patients in the Arabic version [28]; these samples are comparable to that of the original version [25], which was administered to a total of 20 patients. In the validation phase, the version that included the most patients was the Portuguese-Brazilian version [26], with a total of 147 patients, and the one with the smallest sample was the Chinese version [29], with 102 patients. The Turkish [27] and Arabic [28] versions each included 130 patients. These numbers are comparable to that of the original version [25], in which 141 patients were included. According to the COSMIN guidelines [31], the rating of a questionnaire validation study depends on the number of subjects per item. Since the PFBQ consists of nine items, all the validations would be rated as excellent, as all of them included more than 90 patients, that is, more than 10 subjects per item.
Only the Turkish version [27] considered structural validity, using confirmatory factor analysis; the rest of the versions were rated as indeterminate. Internal consistency was only reported for the Chinese [29] and Portuguese-Brazilian [26] versions, both of which showed low consistency, while the original version [25] obtained a high internal consistency.
Reliability was calculated for all versions [26][27][28][29] using the test-retest method, and all of them were rated as sufficient, although the time elapsed between the two administrations of the questionnaire differed between versions. In the original version [25], test-retest reliability was very high and the time elapsed between questionnaire administrations was one week.
The psychometric properties measurement error and hypothesis testing were not analyzed in any of the versions of the PFBQ nor in the original version [25]. The internal consistency was only analyzed in the versions of Liu et al. [29] and Peterson et al. [26], resulting in a low internal consistency, contrary to the original version [25] that had good internal consistency.

Strengths and Limitations of the Study
This is the first review to analyze the structural characteristics and psychometric properties of the different language versions of this questionnaire, a tool used for the assessment of patients with pelvic floor dysfunction. However, the results of this review reveal limitations that should be taken into account for future versions of the PFBQ: only the Turkish adaptation considered structural validity (taken as a requirement in the COSMIN guidelines) to determine the level of internal consistency, and none of the versions considered measurement error.

Conclusions
The PFBQ, focused on the assessment of patients with pelvic floor dysfunction, has been adapted into four languages and each of these versions has criteria for good measurement properties rated as mostly sufficient, inadequate methodological quality, and low-doubtful quality of evidence, taking into consideration the COSMIN guidelines. Different validated instruments, with properties similar to each other and to those of the original questionnaire, are available internationally to health professionals, whether clinicians or researchers. The existence of psychometric properties assures us that the results of research or treatment using any version of the PFBQ are reliable and comparable with each other. We can conclude that the different versions of the PFBQ are valid for use among the Portuguese-Brazilian, Turkish, Arabic, and Chinese-speaking populations and that adaptation of the questionnaire to Spanish and other languages will be necessary for its use in other countries. The psychometric properties and structural characteristics collected in this review should be taken into account to improve future versions.