Translation, Cross-Cultural Adaptation, and Validation of the Japanese Version of the Patient Education Materials Assessment Tool (PEMAT)

Background: The Patient Education Materials Assessment Tool (PEMAT) systematically evaluates the understandability and actionability of patient education materials. This study aimed to develop a Japanese version of PEMAT and verify its reliability and validity. Methods: After content validity was assessed, experts scored healthcare-related leaflets and videos using PEMAT to verify inter-rater reliability. In validation testing with laypeople, the high-scoring material group (n = 800) was presented with materials that received high PEMAT ratings, and the low-scoring material group (n = 799) with materials that received low ratings. Both groups rated the understandability and actionability of the materials and their perceived self-efficacy for the recommended actions. Results: The Japanese version of PEMAT showed strong inter-rater reliability (PEMAT-P: % agreement = 87.3, Gwet's AC1 = 0.83; PEMAT-A/V: % agreement = 85.7, Gwet's AC1 = 0.80). The high-scoring material group gave significantly higher understandability and actionability scores than the low-scoring material group (PEMAT-P: understandability 6.53 vs. 5.96, p < 0.001; actionability 6.04 vs. 5.49, p < 0.001; PEMAT-A/V: understandability 7.65 vs. 6.76, p < 0.001; actionability 7.40 vs. 6.36, p < 0.001). Perceived self-efficacy increased more in the high-scoring material group than in the low-scoring material group. Conclusions: Our study showed that materials rated highly on the Japanese version of PEMAT were also easy for laypeople to understand and act on.


Introduction
A variety of patient education materials, including pamphlets, web pages, videos, and smartphone apps, support patients by helping them understand their conditions, make decisions, and communicate with their health care providers. However, studies show that patient education materials are often poorly understood by patients, especially those with limited health literacy [1,2]. Inadequate health literacy is associated with poorer disease control, lower medication adherence, and worse patient outcomes [1,3]. Therefore, materials that are patient-friendly regardless of the reader's health literacy are essential for improving patients' health outcomes.
To address this situation, we focused on the Patient Education Materials Assessment Tool (PEMAT), a reliable and valid instrument developed by the Agency for Healthcare Research and Quality (AHRQ) to evaluate the understandability and actionability of patient education materials [4,5]. Understandability refers to the extent to which consumers of diverse backgrounds can process and explain key messages [4]. Actionability refers to the degree to which consumers of diverse backgrounds and varying levels of health literacy can identify what actions they should take to improve their health, based on the presented information [4]. PEMAT is divided into two parts: PEMAT-P, for printable materials (e.g., brochures and PDFs), and PEMAT-A/V, for audiovisual materials (e.g., videos and multimedia materials, including smartphone apps). Scores are calculated by dividing the sum of the points earned by the total possible points and multiplying by 100 to obtain a percentage. The developers set the cutoff for understandability and actionability at 70%.
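The percentage scoring rule above can be sketched in Python as follows. This is an illustrative sketch, not code distributed with PEMAT: the function names are hypothetical, items are coded 1 (Agree), 0 (Disagree), or `None` (N/A), and we assume, as we understand the original PEMAT user's guide to specify, that N/A items are left out of both the points earned and the total possible points. Treating a score of exactly 70% as meeting the cutoff is also our assumption about how the boundary is handled.

```python
from typing import Optional, Sequence

CUTOFF = 70.0  # developers' cutoff (%) for understandability and actionability


def pemat_score(ratings: Sequence[Optional[int]]) -> float:
    """Return a PEMAT percentage score (hypothetical helper).

    ratings: one entry per item; 1 = Agree, 0 = Disagree, None = N/A.
    Assumption: N/A items are excluded from numerator and denominator.
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    if any(r not in (0, 1) for r in applicable):
        raise ValueError("ratings must be 0, 1, or None")
    # Sum of points earned / total possible points, times 100.
    return 100 * sum(applicable) / len(applicable)


def meets_cutoff(score: float, cutoff: float = CUTOFF) -> bool:
    # Assumption: a score equal to the cutoff counts as meeting it.
    return score >= cutoff
```

For example, a material with four applicable items, three of them rated Agree, plus one N/A item scores 75%, which clears the 70% cutoff.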
Studies have used PEMAT to identify issues with patient materials. For example, Yiu et al. evaluated web-based education materials for patients taking non-vitamin K oral anticoagulants. The study revealed the need to include more summaries of information, visual aids, and tangible tools such as checklists [6]. PEMAT is also reported to be useful in developing or improving patient education materials. Jamil et al. developed an integrated diabetes-periodontitis nutrition and health education module using PEMAT [7].
PEMAT has been translated into Malay [8,9] and Korean [10], but a Japanese version has not yet been developed. Lee et al. created cardiovascular disease-prevention material for Korean immigrants based on the Korean version of PEMAT and found that they could improve the material's understandability and actionability [10]. Such examples suggest that it is valuable to deploy PEMAT in multiple languages and to accumulate findings on understandability and actionability. Furthermore, to our knowledge, there are no validated tools for assessing whether Japanese-language materials are understandable and actionable. Therefore, our study aimed to translate and cross-culturally adapt the PEMAT into Japanese and verify its reliability and validity.

Materials and Methods
The Japanese version of the PEMAT was developed in five stages. We translated the PEMAT in Stage 1, examined content validity in Stage 2, examined inter-rater reliability in Stage 3, examined convergent validity in Stage 4, and examined predictive validity in Stage 5.

Stage 1: Translation of PEMAT into Japanese
We translated the PEMAT questionnaire and user's guide with the permission of AHRQ. The translation and cross-cultural adaptation process was carried out according to the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force [11] and the Guidelines for the Process of Cross-Cultural Adaptation of Self-Report Measures [12]. Two translators whose native language is Japanese independently translated the original PEMAT into Japanese (T1 and T2). A health communication researcher (TO) and a physician (EF) reviewed T1 and T2 and integrated them into a single Japanese forward-translated version (T-12). Two translators whose native language is English independently back-translated T-12 into English (BT1 and BT2). The back-translators were neither familiar with the concept of PEMAT nor involved in the forward translation. After BT1 and BT2 were compared and integrated, an expert committee comprising a health communication researcher (TO) and three medical professionals (a physician, EF, and two nurses, HU and HO) created the final Japanese version of PEMAT.

Stage 2: Assessment of Content Validity by the Expert Panel
At a meeting of the expert panel, we assessed whether the items were applicable to the Japanese cultural context and whether their content was appropriate. The original PEMAT is intended for use by health professionals, health librarians, and other professionals who provide health and medical information to patients and the general public [4,13]. Accordingly, we recruited experts with the attributes of potential PEMAT users for the content validity evaluation. The expert panel consisted of twelve members: medical specialists with clinical experience (two physicians, one nurse, one pharmacist, and two dietitians) and nonmedical specialists (two patient advocates, one editor with experience in developing health and medical materials, two media professionals, and one health communication researcher with a background in education). At the meeting, content validity was assessed based on: (1) relevance (whether the items were relevant to the constructs of interest in the particular population and context of use), (2) comprehensiveness (whether essential aspects of the construct were missing), and (3) comprehensibility (whether the items were understood by the raters as intended). When the experts disagreed, we consulted a third person (TK) for reconciliation. If issues or questions were not resolved at the meeting, we asked the developer of the original PEMAT (CB) for feedback.

Stage 3: Determining the Reliability of the Instrument
We tested the inter-rater reliability of the PEMAT using patient education materials written in Japanese. We selected materials on the primary prevention of common diseases intended for non-healthcare professionals, to limit variability between materials that lies outside the scope of what the PEMAT aims to measure. Materials intended for patients with a particular disease may convey very different messages depending on the medical condition and its associated complications. The eligibility criteria for the materials were as follows: (1) developed by academic societies, government offices, or non-profit organizations; (2) covering any of the nine topics presented in Health Japan 21 (2nd edition) [13], namely nutrition and dietary habits, physical activity and exercise, rest and mental health, smoking, alcohol, dental health, diabetes, cardiovascular disease, and cancer; and (3) available for free download from the Internet. We searched using the two most popular search engines in Japan, Google Japan [14] and Yahoo! Japan [15]. The search terms, in Japanese, were 'topic' AND ('pamphlet' OR 'leaflet' OR 'video' OR 'patients' OR 'explanation'), where 'topic' was any of the nine topics in Health Japan 21. We then selected the first 100 written materials and the first 50 audiovisual materials from the search results.
The evaluators for this study comprised four experts: two physicians with more than 5 years of clinical experience (EF and RS), one dietitian without clinical experience (RI), and one health communication specialist with an educational background (RY). Two rounds of reliability testing were performed because reliability was low in the first round. In each round, before evaluating the materials, the evaluators reviewed the guidance on the question items and evaluation methods in the Japanese version of the PEMAT User's Guide. In the first round, the first 50 of the 100 PEMAT-P materials were evaluated by EF and RS and the second 50 by EF and RI, and EF and RY evaluated the 50 videos for PEMAT-A/V reliability verification. In the second round, the evaluators switched materials so that they did not evaluate the same materials as in the first round; the first 50 PEMAT-P materials were evaluated by EF and RS and the second 50 by EF and RI, and EF and RY evaluated the 50 videos for PEMAT-A/V. Each material was assessed only once in the second round, and the reliability results reported in this study are from that round. The two evaluators independently rated each material and calculated its overall PEMAT score as a percentage. For each material, the two evaluators' scores were averaged, and the materials with the highest and lowest average scores were selected as the high- and low-scoring materials, respectively, to be presented to the general public.
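The selection step described above (averaging the two evaluators' scores for each material and taking the extremes) can be sketched as follows. The function name, material names, and scores are hypothetical, and ties are resolved in favor of the first material encountered, which is our assumption because the text does not describe tie handling.

```python
from statistics import mean
from typing import Dict, Tuple


def select_extremes(scores: Dict[str, Tuple[float, float]]) -> Tuple[str, str]:
    """Return (highest, lowest) material by the mean of its two evaluator scores."""
    averages = {name: mean(pair) for name, pair in scores.items()}
    # max/min return the first material encountered when averages tie.
    highest = max(averages, key=averages.get)
    lowest = min(averages, key=averages.get)
    return highest, lowest
```

With hypothetical averages of 100%, 65%, and 85%, the first material would be shown to the high-scoring group and the second to the low-scoring group.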

Stage 4: Testing Convergent Validity with Readability Scores
We evaluated the materials included in Stage 3 with a readability scale to test convergent validity. Because the developers of the original PEMAT recommended using readability tools in addition to the PEMAT to evaluate printed materials [18], EF proposed using the text-readability measurement system "jReadability" [16,17], a web system that automatically evaluates the readability of Japanese text. The jReadability formula has been shown to predict the difficulty of a text with a high degree of accuracy, and to detect differences in readability even in data other than those used to create the formula (data from the Japanese Language Proficiency Test) [19]. To assess readability, EF manually extracted the text from the printable materials and transcribed the audio from the audiovisual materials. The text was then pasted into Microsoft Word, and any formatting elements that might interfere with the readability assessment (e.g., headings, symbols, author information, and references) were removed. The plain text from each material was assessed using the jReadability online readability calculator. This validated measure calculates readability based on the average sentence length, the difficulty level of words, and the proportions of grammatical parts of speech and character types per sentence. Scores range from 0.5 to 6.4, with higher scores indicating easier text: 5.5-6.4, very easy; 4.5-5.4, easy; 3.5-4.4, neutral; 2.5-3.4, a little difficult; 1.5-2.4, difficult; and 0.5-1.4, very difficult.
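The score bands listed above can be encoded as a simple lookup. This Python sketch is ours and is not part of jReadability; the function name is hypothetical, and it assumes the published one-decimal bands are contiguous, so a score falling between two bands (e.g., 5.45) is assigned to the lower band.

```python
# Lower bound of each jReadability band, as listed in the text, from easiest down.
BANDS = [
    (5.5, "very easy"),
    (4.5, "easy"),
    (3.5, "neutral"),
    (2.5, "a little difficult"),
    (1.5, "difficult"),
    (0.5, "very difficult"),
]


def readability_band(score: float) -> str:
    """Map a jReadability score (0.5-6.4) to its descriptive band."""
    if not 0.5 <= score <= 6.4:
        raise ValueError(f"score {score} is outside the 0.5-6.4 range")
    for lower, label in BANDS:
        # Assumption: scores between published bands fall into the lower band.
        if score >= lower:
            return label
    raise AssertionError("unreachable: every in-range score is >= 0.5")
```

For instance, the average score of 2.7 for the printable materials falls in the "a little difficult" band.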

Stage 5: Assessment of Predictive Validity by Testing with the General Public
In this stage, we conducted an online survey to determine whether non-experts found the materials with high and low PEMAT scores (from the expert evaluation in Stage 3) easy and difficult, respectively, to understand and act on. The online survey consisted of two studies, one testing the validity of the PEMAT-P with a leaflet and the other testing the validity of the PEMAT-A/V with a video.

Participants
Study participants were recruited from registered monitors of an online survey company (Rakuten Insight). The survey company reported that approximately 2.2 million active monitors (those who had logged into their registered accounts within the previous 12 months) were registered in the panel as of September 2022 [20]. Men and women whose native language was Japanese were eligible to participate in the study. Through the survey company, we invited monitors who met the age criteria described below by email or push notification, and conducted a screening survey of all who agreed to participate. Participants aged 18 to 69 years were included in the PEMAT-P study, and those aged 60 to 79 years in the PEMAT-A/V study, because the materials used in the two studies targeted different age groups, as described below. At screening, we excluded individuals who had experience in health care or who were unable to perform the actions recommended in the materials because of illness or injury.
Participants were randomized into two groups using the survey company's central computerized random allocation system. One group (high-scoring material group) viewed the material that experts had scored highly in Stage 3, while the other (low-scoring material group) viewed the low-scoring material. Participants were not aware of the group to which they had been assigned. To confirm that participants had viewed the material properly, we asked them about its content. We also included a trap question that could be answered correctly only by reading the question carefully. Participants who answered any of these questions incorrectly were excluded. We stopped recruiting participants when the number of valid responses reached the required sample size.

Materials
For testing PEMAT-P, participants viewed leaflets that promote healthy eating habits. The PEMAT-P scores of the leaflets were 100% for the high-scoring material group and 69.7% for the low-scoring material group. For testing PEMAT-A/V, we used videos on the prevention of locomotive syndrome in older adults. Locomotive syndrome refers to a condition with a high risk of motor function decline due to impairment of the locomotive organs [21]. The overall PEMAT-A/V scores of the videos were 85.4% for the high-scoring material group and 25.0% for the low-scoring material group.

Measures
The survey company provided participants' gender and age, and participants answered questions about their educational background, annual household income, occupation, marital status, and self-perceived health. Participants also answered questions about their baseline health literacy and perceived self-efficacy. To verify whether materials rated highly on the PEMAT are more likely to support participants in taking action, it would be ideal to measure the change in behavior before and after viewing the materials as an outcome. However, participants may find it difficult to act (e.g., cook healthy meals or go out to exercise) immediately after viewing the materials, and behavioral implementation cannot feasibly be measured in an online survey. Bandura stated that increasing self-efficacy, that is, confidence in carrying out a behavior and overcoming temptations that prevent change, is vital for behavior change [22,23]. Self-efficacy is likely to change over a shorter period than behavior itself and to improve as the stage of behavior change progresses. We therefore measured participants' self-efficacy to examine predictive validity, hypothesizing that the understandability and actionability scores assessed with the PEMAT would predict self-efficacy. Health literacy was measured using the 14-item health literacy scale for Japanese adults (HLS-14) [19]. Self-efficacy was measured using the Self-Efficacy Scale for Positive Eating Behavior [20] for PEMAT-P and the Home-Exercise Barrier Self-Efficacy Scale [24] for PEMAT-A/V.
After answering these questions, participants viewed the relevant material. They then rated, on a scale from 1 to 10, how easy the material was to understand and to act on. They also responded to eight selected PEMAT items (items 1, 4, 8, 9, 11, 17, 19, and 21; see Tables 1 and 2), which were common to the PEMAT-P and PEMAT-A/V studies and relevant to all the presented materials. At the end of the survey, participants again rated their self-efficacy, immediately after the intervention, on a scale from 1 to 10. For these ratings, participants scored an item as 1 if they completely disagreed with its content and 10 if they completely agreed with it.

Statistical Analysis
Inter-rater reliability was used to assess the external consistency of the PEMAT, using percentage agreement and Fleiss' kappa for two evaluators. Fleiss' kappa is an extension of the more commonly reported Cohen's kappa. Cohen's kappa requires all materials to be evaluated by the same pair of evaluators, whereas Fleiss' kappa allows each material to be rated by two evaluators drawn from a pool of potential evaluators [25]. We also calculated Gwet's AC1 [26] for cases in which kappa values were low despite a high percentage of agreement [27]. In addition, we calculated inter-rater reliability for the understandability and actionability summary scales. Because these scores are quantitative variables, we used Shrout and Fleiss' intraclass correlation coefficient (ICC) to determine their reliability [28]. Inter-rater agreement was classified as poor (≤ 0), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), or almost perfect (0.81-1.00) [29]. Pearson's correlation coefficient was used to assess the correlation between the PEMAT understandability scores and jReadability scores.
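The agreement statistics described above can be sketched for the two-evaluator case in Python. This illustrates the standard formulas and is not the software used in the study (analyses were run in R): percentage agreement; Fleiss' kappa with two ratings per material, where chance agreement uses category proportions pooled across all ratings; and Gwet's AC1, where chance agreement is the sum of p_k(1 - p_k) over the K categories divided by K - 1. Function names are hypothetical, and ratings may be any hashable labels (e.g., 1, 0, or "N/A").

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple

# One tuple per material: (rating by evaluator 1, rating by evaluator 2).
Pair = Tuple[Hashable, Hashable]


def _categories(pairs: Sequence[Pair], categories: Optional[Iterable[Hashable]]) -> List[Hashable]:
    if categories is not None:
        return list(categories)
    # First-seen order; avoids sorting mixed labels such as 0, 1, and "N/A".
    return list(dict.fromkeys(r for pair in pairs for r in pair))


def _pooled_proportions(pairs: Sequence[Pair], cats: List[Hashable]) -> List[float]:
    # Share of all 2 * N ratings falling in each category (both evaluators pooled).
    counts = Counter(r for pair in pairs for r in pair)
    total = 2 * len(pairs)
    return [counts[c] / total for c in cats]


def percent_agreement(pairs: Sequence[Pair]) -> float:
    return sum(a == b for a, b in pairs) / len(pairs)


def fleiss_kappa_two_raters(pairs: Sequence[Pair], categories=None) -> float:
    cats = _categories(pairs, categories)
    # With two ratings per material, Fleiss' per-material agreement P_i is 1 or 0,
    # so its mean equals the simple percentage agreement.
    pa = percent_agreement(pairs)
    pe = sum(p * p for p in _pooled_proportions(pairs, cats))
    # Undefined (division by zero) if every rating falls in a single category.
    return (pa - pe) / (1 - pe)


def gwet_ac1_two_raters(pairs: Sequence[Pair], categories=None) -> float:
    cats = _categories(pairs, categories)
    if len(cats) < 2:
        raise ValueError("Gwet's AC1 needs at least two categories")
    pa = percent_agreement(pairs)
    pe = sum(p * (1 - p) for p in _pooled_proportions(pairs, cats)) / (len(cats) - 1)
    return (pa - pe) / (1 - pe)
```

In a toy example with ten materials, eight rated Agree by both evaluators, one split, and one rated Disagree by both, agreement is 90% but kappa is only about 0.61, while AC1 is about 0.87. When one category dominates, kappa's chance-agreement term grows large, which is the situation in which AC1 was reported.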
Because the questionnaire survey of the general public was designed to measure the accuracy of the scale, we considered it crucial to assess the validity of the tool by including only those who correctly answered the content-check and trap questions. We therefore designed Stage 5 as a per-protocol analysis. The sample size was calculated based on an effect size of 0.2 (Cohen's d) [22], a significance level of 0.05, and a power of 0.8, which yielded a requirement of 394 participants per group. Differences between the high- and low-scoring material groups were evaluated using the two-sample t-test for age and the chi-square test or Fisher's exact test for sex, educational background, occupation, annual household income, marital status, and self-perceived health. Welch's t-test was used to compare understandability, actionability, and perceived self-efficacy between the two groups.
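The sample size reported above can be checked with a short Python sketch. It uses the normal-approximation formula for a two-sided, two-sample comparison of means, n = 2(z_{1-alpha/2} + z_{1-beta})^2 / d^2, plus Guenther's correction term z_{1-alpha/2}^2 / 4, which approximates a t-test-based calculation; this choice of approximation is ours, since the text does not state which routine was used. The hypothetical `welch_t` helper computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for the group comparisons; it stops short of the p-value, which requires the t distribution.

```python
import math
from statistics import NormalDist, mean, variance


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8,
                t_correction: bool = True) -> int:
    """Per-group n for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    if t_correction:
        # Guenther's correction, approximating the t-test-based sample size.
        n += z_alpha ** 2 / 4
    return math.ceil(n)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With d = 0.2, alpha = 0.05, and power = 0.8, `n_per_group` returns 394 (393 without the correction term), in line with the figure reported above. For p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test, as does R's `t.test` with its default `var.equal = FALSE`.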
All p-values were two-sided, and p < 0.05 was considered statistically significant. All analyses were conducted with R version 4.0.3 (10 October 2020).

Content Validation
At the expert panel meeting, content validity was assessed based on relevance, comprehensiveness, and comprehensibility. The comments and revisions made by the expert panel are shown in Table S1. The expert panel judged all items to be necessary except item 5. Consequently, the final Japanese PEMAT consists of 25 items in two scales: understandability (18 items) and actionability (7 items). The text of each item is presented in Tables 1 and 2. We used kappa coefficients to evaluate each item rather than the total score. For the understandability and actionability scores, we calculated Fleiss' kappa and the ICC.
The kappa values for the understandability items ranged from 0.30 to 0.84 in PEMAT-P and from 0.35 to 0.84 in PEMAT-A/V. For the actionability items, kappa values ranged from 0.47 to 1.00 in PEMAT-P and from 0.67 to 0.81 in PEMAT-A/V. Gwet's AC1 indicated strong agreement for both scales and both material types. In both PEMAT-P and PEMAT-A/V, the understandability and actionability scores showed substantial to almost perfect reliability (ICC > 0.7) (Tables 1 and 2).
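The text does not state which of Shrout and Fleiss' ICC forms was used. As an illustration only, the sketch below computes ICC(1,1) (one-way random effects, single rater), one form that suits a design in which different materials may be rated by different evaluator pairs; ICC(2,1) or ICC(3,1) would give different values. The function name and input layout (one row per material, one rating per column) are our assumptions.

```python
from statistics import mean
from typing import Sequence


def icc_1_1(scores: Sequence[Sequence[float]]) -> float:
    """Shrout and Fleiss ICC(1,1): one-way random effects, single rater.

    scores: one row per material, each row holding k ratings (here k = 2).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 materials, each with the same k >= 2 ratings")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # One-way ANOVA mean squares: between materials and within materials.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

When the two evaluators give identical scores to every material, the within-material mean square is zero and the ICC equals 1.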
To examine how strongly missing items (i.e., items that the evaluator scored as N/A) influenced the overall PEMAT evaluation, we prepared two scenarios for each missing item: a best-case scenario ("agree = 1" for the missing item) and a worst-case scenario ("disagree = 0" for the missing item). The scores for the best- and worst-case scenarios were compared with the scores obtained by the original PEMAT scoring method. In addition, we calculated correlation coefficients between the scores obtained by the original PEMAT scoring method and the best-case scores, and between the original scores and the worst-case scores. The distribution of scores was nearly identical across scenarios, indicating that the original scoring method is robust to the handling of missing items (Table S2).
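The three ways of handling N/A items described above can be expressed as a small Python function. This is a sketch under our own assumptions: items are coded 1, 0, or `None` (N/A); the original score drops N/A items from the denominator; and the best- and worst-case scores count every N/A item as 1 or 0, respectively, over all items. The function name is hypothetical.

```python
from typing import Dict, Optional, Sequence


def scenario_scores(ratings: Sequence[Optional[int]]) -> Dict[str, float]:
    """PEMAT percentage under three ways of handling N/A (None) items."""
    answered = [r for r in ratings if r is not None]
    n_na = len(ratings) - len(answered)
    if not answered:
        raise ValueError("all items are N/A")
    return {
        # Original method: N/A items excluded from the denominator.
        "original": 100 * sum(answered) / len(answered),
        # Best case: every N/A item counted as Agree (1).
        "best_case": 100 * (sum(answered) + n_na) / len(ratings),
        # Worst case: every N/A item counted as Disagree (0).
        "worst_case": 100 * sum(answered) / len(ratings),
    }
```

Applied to each material, the resulting best- and worst-case scores can then be correlated with the original scores, as in the comparison above.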

Comparison of the PEMAT Understandability Scores and jReadability
We calculated average readability scores to examine their correlation with the PEMAT understandability scores. The average readability score was 2.7 (range 1.1-4.4) for PEMAT-P and 2.8 (range 0.8-4.0) for PEMAT-A/V. Both averages fall in the 2.5-3.4 ("a little difficult") band, indicating that the materials were at the 'upper intermediate' level, which can be understood by people who comprehend the language of daily life and some technical terms. There was a moderate positive correlation between the understandability scores and the readability scores for printable materials (Pearson's r = 0.46; 95% CI, 0.27-0.62) and a weak positive correlation for audiovisual materials (Pearson's r = 0.33; 95% CI, 0.03-0.57).
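The text does not say how the 95% confidence intervals for Pearson's r were obtained. A common choice, and the one reported by R's `cor.test`, is Fisher's z transformation, z = atanh(r) with standard error 1/sqrt(n - 3). The following Python sketch implements that approach under this assumption; the function name and any toy data used with it are illustrative, not study data.

```python
import math
from statistics import NormalDist, mean


def pearson_with_ci(x, y, level: float = 0.95):
    """Pearson's r with a Fisher-z confidence interval: (r, lower, upper)."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need paired samples with n >= 4")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        # atanh is undefined at |r| = 1; the interval collapses to r itself.
        return r, r, r
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return r, math.tanh(z - crit * se), math.tanh(z + crit * se)
```

Because the interval is built on the z scale and transformed back, it is asymmetric around r, which matches the shape of the intervals reported above.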

Baseline Participant Characteristics
Participant recruitment and surveys were conducted from 18 to 22 June 2021. For PEMAT-P, out of 1526 randomized participants, we analyzed 400 in the high-scoring material group and 399 in the low-scoring material group. For PEMAT-A/V, of 1211 participants randomized, 400 in the high-scoring material group and 400 in the low-scoring material group were analyzed (Figure 1). Those who incorrectly answered the screening questions or did not complete the questionnaire were removed from the analysis. Study arm characteristics are described in Tables 3 and 4.

Self-Efficacy
In PEMAT-A/V, perceived self-efficacy increased significantly more in the high-scoring material group than in the low-scoring material group (increase in self-efficacy scores, 2.18 vs. 1.46; p < 0.01). In PEMAT-P, the scores also increased more in the high-scoring material group than in the low-scoring material group, but the difference did not reach significance (increase in self-efficacy scores, 2.22 vs. 1.53; p = 0.14) (Table 6).

Main Findings
We developed the Japanese version of PEMAT, a tool for assessing the understandability and actionability of patient education materials, and tested its reliability and validity. Inter-rater reliability was moderate when measured by the kappa coefficient but showed strong agreement when calculated with Gwet's AC1. In the development of the original PEMAT, overall agreement was 69-90%, and Gwet's AC1 ranged from 0.56 to 0.86 (mean 0.74) [4]. In the reliability testing of the Malay version of PEMAT, Wong et al. evaluated 13 leaflets and 13 videos. For PEMAT-P, they found an agreement of 61.5-91.6% and a Gwet's AC1 of 0.26-0.97 for understandability, and an agreement of 69.7-98.3% and a Gwet's AC1 of 0.394-0.980 for actionability. For PEMAT-A/V, the agreement was 64.1-98.3% and Gwet's AC1 was 0.40-0.98 for understandability, and the agreement was 79.5-91.5% and Gwet's AC1 was 0.40-0.93 for actionability [8]. Our study thus indicates that the reliability of the Japanese version of PEMAT is comparable to that of other language versions. In addition, the materials used to test inter-rater reliability covered diverse topics, from recommendations for healthy living (having medical checkups and improving dietary and exercise habits) to secondary and tertiary prevention in patients with diabetes and cardiovascular disease. This suggests that the Japanese version of PEMAT can evaluate a wide range of real-world health materials.
In the validation testing with non-experts, the high-scoring material group gave higher ratings than the low-scoring material group on all eight selected items for both PEMAT-P and PEMAT-A/V. This result suggests that materials rated by experts as easy to understand and act on using the Japanese version of PEMAT were also perceived as such by laypeople. In the development of the original PEMAT, PEMAT-A/V actionability scores were significantly positively correlated with consumer actionability scores [4]. However, there was no clear relationship between understandability as rated by experts and non-experts' comprehension scores [4], possibly because of the small sample size (n = 47) in consumer testing. In our study, we secured a sufficiently large sample of nearly 800 participants for each of PEMAT-P and PEMAT-A/V, addressing this limitation of the original validation. In addition, in our study, the increase in self-efficacy tended to be greater in the high-scoring material group than in the low-scoring material group. As self-efficacy is a predictor of behavior [23], materials that receive high PEMAT ratings may encourage behavior change. Studies have shown that health materials that are easy to understand and act on may encourage their audiences to adopt healthier behaviors. Arterburn et al. found that understandable decision aids improved the quality of decision-making and reduced uncertainty about bariatric surgery among patients with obesity [30]. Nagle et al. reported that pregnant women who viewed a decision aid for prenatal testing for fetal abnormalities were more likely to make an informed decision than those who viewed less informative material [31].
Nakayama et al. reported that 85.4% of Japanese people have inadequate health literacy [32]. However, according to Yamamoto et al., patient drug guides written in Japanese require at least a high school education level to be understood [33]. Thus, it is essential to create and improve materials so that they are easy to understand and act on, regardless of individuals' health literacy. Improving the understandability and actionability of patient materials is also important for communicating the findings of epidemiological and clinical studies to patients and improving patient outcomes. Evaluating and improving materials using the Japanese version of PEMAT may help support behavior change among people with limited health literacy.

Limitations of This Study
Our study has some limitations. First, although we did not use quantitative measures such as the content validity ratio (CVR) and content validity index (CVI) to evaluate content validity, the expert panel meeting followed the COSMIN methodology for assessing the content validity of patient-reported outcome measures (PROMs) [34]. Second, the validation survey indicated whether the material was perceived as easy to understand but did not measure whether individuals actually understood the information. Because the materials on eating behavior and exercise contained little novel information, it was not possible to create a comprehension test that specifically tapped knowledge of their content. It is therefore desirable to have non-healthcare professionals quantitatively evaluate materials on a specific disease, so that both comprehension and PEMAT understandability can be measured. Third, we performed two rounds of inter-rater reliability evaluation; the modifications made to the User's Guide between the first and second rounds were important for increasing the reliability and usability of the scale. In addition, we could not assess concurrent validity because no validated tools similar to PEMAT were available in Japanese. However, in the understandability domain, we observed a moderate positive correlation with jReadability, which has criterion-related validity with the Japanese Language Proficiency Test, lending support to the understandability scale of the Japanese version of PEMAT. Lastly, although we assessed perceived self-efficacy, we could not measure actual behavior. Future research should measure actual behavioral outcomes by following up with participants for a period after the intervention.

Conclusions
The Japanese version of PEMAT developed in this study is the first reliable and validated tool for assessing the patient-friendliness of patient education materials created in Japanese. As with the original version of PEMAT, the materials that experts rated as easy to understand and act on using the Japanese version of PEMAT were also easy for laypeople to understand and act on. The Japanese version of PEMAT enables medical professionals to select more understandable and actionable patient education materials. It also allows them to develop and improve patient-friendly materials, ultimately encouraging patients to practice self-management and healthy behaviors. (The Japanese version of the PEMAT is available free of charge for use in noncommercial projects.)
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph192315763/s1; Table S1: Comments from the expert panel; Table S2: Influence of missing items.
Informed Consent Statement: Informed consent was obtained from all participants involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.