Research Quality of Clinical Trials Reported for Foods with Function Claims in Japan, 2023–2024: Evaluation Based on a Revised Tool to Assess Risk of Bias in Randomized Trials

Background: The Foods with Function Claims (FFC) system was introduced in Japan in April 2015 to make more products available that are labeled with health functions. The claimed functionality of a product must be supported by scientific evidence from clinical trials (CTs) or systematic reviews, but the quality of recent CTs is unclear. The purpose of this study was to evaluate the risk of bias (RoB) using “a revised tool to assess risk of bias in randomized trials (RoB 2)” published in 2018 for notifications based on all recent CTs published on the Consumer Affairs Agency website. Methods: A total of 38 submitted papers based on CTs that were published on the Consumer Affairs Agency website during the period from 1 January 2023 to 30 June 2024 were eligible. The RoB 2 tool provides a framework for considering the risk of bias in the findings of any type of randomized trial. This tool with five domains was used to evaluate the quality of research methods. Results: Eligible CTs were assessed as “low risk” (11%, n = 4), “medium risk” (13%, n = 5), and “high risk” (76%, n = 29). A number of highly biased papers were published. Bias occurred in all five domains, especially “bias in selection of the reported result (Domain 5)”, which was the most serious (“high risk”; 75%). For elements correlated with RoB, there was no significant difference (p = 0.785) in the RoB 2 score between for-profit and academic research in the author’s affiliated organization. There was no significant difference (p = 0.498) in the RoB score between the published year categories of 2000–2019 and 2020–2024, and no significant difference (p = 0.643) in the RoB score between English and Japanese language publications. Conclusion: Overall, the quality of the latest CTs submitted from 2023 onward was very low; bias occurred in all five domains and was most serious for “bias in selection of the reported result (Domain 5)”.


Introduction
In April 2015, Japan introduced Food with Function Claims (FFCs), a new type of food health claim, to make it easier for consumers to obtain safe products with specific health functions clearly labeled [1]. Under the FFC system, manufacturers are permitted to sell their products after submitting to the Secretary-General of the Consumer Affairs Agency (CAA) the specific health functions that the products are expected to provide and the scientific basis for these claims. Although the CAA does not evaluate the safety or functionality of notified products, applicants are required to complete all procedures necessary for notification [2]. All food products, including fresh produce, are subject to the FFC system. At least 60 days before market entry, food business operators are required to submit to the Secretary-General of the CAA information on food safety and functionality and on the system in place to collect information on adverse health effects. The system is highly transparent, and all information submitted, including withdrawals and amendments, is made publicly available on the CAA website. In order to prove the functionality of a product, its scientific evidence needs to be described with positive results from either clinical trials (CTs), such as randomized controlled trials (RCTs), or systematic reviews (SRs). In fact, since the system's launch, 8461 cases have been reported as of 30 June 2024, of which 326 cases (4%) used CTs and 8115 cases (96%) used SRs [3].
However, one of the weaknesses of the FFC system is the adequacy and rigor of the scientific evidence. Given this concern, the CAA undertook national verification projects [4,5], and several research groups evaluated research methodologies [6][7][8][9][10] and reporting methods [11].
Regarding SRs, in 2016, the CAA established a working group of experts (SR methodologists) to verify whether submitted SRs are clearly described in accordance with the first edition of the PRISMA checklist [12]. The working group found that many submitted SRs had multiple items omitted or inadequate explanations due to an insufficient understanding of the required content [4]. Academic researchers [6] conducted a quality evaluation of SRs using the first edition of 'assessment of multiple systematic reviews' (AMSTAR) [13] reports [10,12]. They found very poor descriptions and/or implementation of study selection, data extraction, search strategies, evaluation methodology for risk of bias (RoB), assessment of publication bias, and the formulation of conclusions based on the methodological rigor and scientific quality of the included CTs. Furthermore, a recent study [7] reported on the methodological quality of SRs evaluated by the AMSTAR 2 checklist [14]. In that study, 40 SRs were randomly extracted based on eligibility criteria and recruitment procedures. Overall confidence for the SRs was rated as "High" (N = 0, 0%), "Moderate" (N = 0, 0%), "Low" (N = 2, 5%), or "Critically low" (N = 38, 95%). Failure to register the review protocol and to use comprehensive search strategies were particularly common deficiencies. Additionally, RoB was insufficiently considered.
The CAA investigated 50 CTs reported in 2017 and found that many had inadequate protocols, unclear methods of RoB assessment, and conflict-of-interest issues [5]. One study [8] examined how accurately 33 RCTs described the 29 required items on the CONSORT 2010 checklist [15] and concluded that an average of 13.8 out of 29 items (47.6%) had sufficient descriptions. A study in 2021 identified many submitted CTs in which it was unclear whether the protocols had been followed and also found issues with selective reporting and the intentional concealment of the intervention content (test foods) [9]. A subsequent 2022 study by the same researchers reported that problems with randomization, deviations from intended interventions, the measurement of outcomes, and selective reporting (in particular, RoB arising from the lack of intention-to-treat [ITT] analysis, unknown compliance, and multiple outcome tests) seriously damaged study quality [10]. A recent meta-epidemiological study evaluated the quality of RCTs facilitated by prominent contract research organizations in Japan and examined the quality of the representations used to convey their results to consumers [11]. It was reported that approximately 72% of the RCT publications exhibited a high RoB due to selective outcome reporting, and "spin" appeared in 73% of press releases/advertisements due to selective outcome reporting. Thus, the reported CTs had many methodological problems despite being published in academic journals. Research that has a high RoB is problematic because it is scientifically unsound and can mislead readers.
There have been previous studies that evaluated the quality of notified CTs using the "Revised Cochrane risk-of-bias tool for randomized trials (RoB 2)" [16]. However, these were not exhaustive surveys, included many CTs published in older years, and did not investigate the relationship with other factors.
Therefore, the present study aimed to evaluate the RoB using RoB 2 for notifications based on all recent CTs published on the CAA website from 1 January 2023 to 30 June 2024 and to clarify the characteristics of the CTs themselves as well as other factors.

Eligibility and Exclusion Criteria (Target Article)
All submitted papers based on CTs published on the CAA website during the period from 1 January 2023 to 30 June 2024* were eligible. We set this period because we believe that the more recent CTs are of higher quality than earlier ones. Papers based on SRs and observational studies were excluded. Single-arm intervention studies without a comparison group were also excluded on the basis of research design. If there were overlaps, such as multiple notifications using the same paper as proof of functionality, only the first was accepted. In SRs, authors must verify items that are not mentioned in a paper. However, in this study, we focused only on understanding whether there was bias in a target paper, so if essential information was not stated in the paper, we did not confirm the contents with the authors.
*The health hazards detected in March 2024 that were caused by supplements manufactured and sold by KOBAYASHI Pharmaceutical Co., Ltd. have become a major social issue [17]. Because the resulting delays in notifications to the CAA could have left an extremely small number of samples, extending the study period was considered. Therefore, midway through the study, we revised the protocol to set the start date to 1 January 2023 instead of 1 January 2024.
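The selection rules in this section can be sketched as a simple filter. This is a minimal illustration only: the record fields (`paper_id`, `evidence`, `has_comparison_group`, `caa_date`) and the function name are hypothetical, and the example records are invented rather than taken from the CAA website.

```python
from datetime import date

# Eligibility window stated in the text.
START, END = date(2023, 1, 1), date(2024, 6, 30)


def select_eligible(notifications):
    """Apply the stated criteria; field names are hypothetical."""
    seen_papers = set()
    eligible = []
    # Process in date order so that the first notification of a paper wins.
    for n in sorted(notifications, key=lambda n: n["caa_date"]):
        if not (START <= n["caa_date"] <= END):
            continue  # outside the study period
        if n["evidence"] != "CT":
            continue  # SRs and observational studies are excluded
        if not n["has_comparison_group"]:
            continue  # single-arm studies are excluded
        if n["paper_id"] in seen_papers:
            continue  # same paper reused by a later notification
        seen_papers.add(n["paper_id"])
        eligible.append(n)
    return eligible


# Invented records for illustration.
records = [
    {"paper_id": "P1", "evidence": "CT", "has_comparison_group": True, "caa_date": date(2023, 4, 10)},
    {"paper_id": "P1", "evidence": "CT", "has_comparison_group": True, "caa_date": date(2023, 9, 2)},
    {"paper_id": "P2", "evidence": "SR", "has_comparison_group": True, "caa_date": date(2024, 1, 15)},
]
print([r["paper_id"] for r in select_eligible(records)])
```

Sorting by date before de-duplicating is one way to express "only the first was accepted"; other orderings would need a different rule.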

Data Extraction Source
We downloaded target articles from the CAA website.

Data Item and Evaluation of Methodological Quality (RoB Score)
The RoB 2 used to evaluate the quality of research methods included (i) bias resulting from the randomization process, (ii) bias due to deviations from the intended intervention, (iii) bias due to missing outcomes, (iv) bias in measurement/evaluation, and (v) bias in the selection of reported results (for details of the method, refer to the original literature [16]). The RoB 2 tool provides a framework for considering RoB in the findings of any type of randomized trial. The evaluation procedure was carried out in accordance with the RoB 2 manual. Crossover trials were evaluated using the RoB 2 preliminary tool version [18].
Each domain was evaluated in three stages: "low risk", "medium risk (somewhat suspicious)", and "high risk". The criteria were implemented in accordance with the original published guidance [16,18]. In accordance with the items covered in previous studies [9,10], for each targeted study, we also recorded the affiliation characteristics of the first author (for-profit researcher or academic researcher), the year in which the paper was published, the language of the paper (English or Japanese), and the impact factor (IF) in 2022. The IF was obtained from the Clarivate Journal Citation Reports site (https://jcr.clarivate.com/) (Accessed on 1 June 2024). For journals that did not have an IF, we quantified the IF as 0.

Summary Scale
To compute the overall RoB 2 score and each domain score, low risk was quantified as 1, medium risk as 2, and high risk as 3.
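The numeric coding above can be illustrated with a short sketch. The domain keys are hypothetical labels, and the overall rule shown here (the worst domain determines the overall score) is a simplifying assumption: RoB 2 guidance also lets a reviewer rate a trial as high risk when several domains raise concerns, which this sketch does not model.

```python
# Numeric coding described in the text: low = 1, medium = 2, high = 3.
SCORE = {"low": 1, "medium": 2, "high": 3}

# Hypothetical short keys for the five RoB 2 domains.
DOMAINS = ("randomization", "deviations", "missing_data", "measurement", "reporting")


def domain_scores(judgements):
    """Convert per-domain judgements ('low'/'medium'/'high') to 1-3 scores."""
    return {d: SCORE[judgements[d]] for d in DOMAINS}


def overall_score(judgements):
    """Simplified overall rule (an assumption): the worst domain wins."""
    return max(domain_scores(judgements).values())


# Invented trial for illustration, not taken from the study data.
example = {
    "randomization": "medium",
    "deviations": "low",
    "missing_data": "low",
    "measurement": "low",
    "reporting": "high",
}
print(domain_scores(example))
print(overall_score(example))  # 3, i.e. "high risk"
```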

Statistical Analysis
The RoB 2 score (1-3) and each domain score (1-3) were used as dependent variables. The author's affiliation (for-profit researcher or academic researcher), year of publication (2000-2019 or 2020-2024), language characteristics (English or Japanese), and IF were used as explanatory variables. The three categorical items mentioned above were tested using Fisher's exact test, and the IF was tested using the Kruskal-Wallis test. All statistical analyses were performed with SPSS Statistics 25.0 (IBM Corporation, Armonk, NY, USA). p-values less than 0.05 were considered statistically significant.
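As an illustration of the kind of exact test named above, the sketch below computes a two-sided exact p-value for a 2 × k table (for example, affiliation by risk category) using only the Python standard library. It is not the SPSS procedure used in the study. The function name and the counts are hypothetical, and the enumeration approach (commonly called the Fisher-Freeman-Halton test) is assumed here as the extension of Fisher's exact test beyond 2 × 2 tables.

```python
from itertools import product
from math import comb


def exact_test_2xk(row1, row2):
    """Two-sided exact p-value for a 2 x k table with fixed margins."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    n1 = sum(row1)
    denom = comb(sum(col_totals), n1)

    def prob(first_row):
        # Multivariate hypergeometric probability of a table, given its margins.
        num = 1
        for x, c in zip(first_row, col_totals):
            num *= comb(c, x)
        return num / denom

    p_obs = prob(row1)
    p_value = 0.0
    # Enumerate every first row compatible with the column totals.
    for cand in product(*(range(c + 1) for c in col_totals)):
        if sum(cand) != n1:
            continue
        p = prob(cand)
        # Sum tables that are no more probable than the observed one.
        if p <= p_obs * (1 + 1e-9):
            p_value += p
    return min(p_value, 1.0)


# Hypothetical counts (low, medium, high) for two groups; not the study data.
group_a = [2, 1, 3]
group_b = [1, 2, 9]
print(exact_test_2xk(group_a, group_b))
```

Full enumeration is practical only for small samples such as the one in this study; larger tables would call for a statistics package.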

Protocol Registration
The present study's methodology (protocol) was established on 3 April 2024. The study was registered as UMIN000054051 by the University Hospital Medical Information Network Clinical Trials Registry (UMIN-CTR)* in Japan (refer: https://center6.umin.ac.jp/cgi-open-bin/ctr/ctr_view.cgi?recptno=R000061712) (Accessed on 1 June 2024). However, UMIN-CTR could not register the contents of all protocols in the input settings, so the complete protocol was stored in an online cloud, which can be viewed from this link: https://1drv.ms/b/s!AoQmpnIHE3YUhNMaSGl3ydWdcwiwuA?e=gdFDVb (Accessed on 1 June 2024).
*UMIN-CTR is the largest CTR in Japan and joined the WHO registry network in October 2008.

Study Selection and Characteristics
Preliminary research identified 48 applicable publications, of which 38 met the eligibility criteria before final confirmation (Figure 1 and Supplementary Table S1). Eligible articles were published in 14 journals, and most (68%) were published in 2020-2024 (Table 1). The languages of eligible publications were English (55%) and Japanese (45%). According to the affiliation classification of the first author, for-profit research comprised 84%, and academic research comprised 16%. Seventy-three percent of journals had no IF.

RoB 2 Score
There was no significant difference (p = 0.785) in the RoB 2 score between for-profit and academic research in the author's affiliated organization (Table 2). There was no significant difference (p = 0.498) in the RoB score between the published year categories of 2000-2019 and 2020-2024, and no significant difference (p = 0.643) in the RoB score between English and Japanese language publications. Regarding the RoB score and IF, the Kruskal-Wallis test showed no significant difference (p = 0.312).
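For readers unfamiliar with the Kruskal-Wallis test, the following sketch shows how the H statistic could be computed when IF values are compared across the three RoB categories. This arrangement of the data is an assumption, as are the function name and the invented IF values. With three groups the reference chi-square distribution has 2 degrees of freedom, for which the upper-tail probability is exp(-H/2), so no statistics library is needed.

```python
from math import exp


def kruskal_wallis_3groups(g1, g2, g3):
    """Tie-corrected Kruskal-Wallis H and chi-square (df = 2) p-value."""
    groups = [g1, g2, g3]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign average ranks to tied values (e.g. many journals with IF = 0).
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    # Tie correction; undefined (division by zero) if every value is identical.
    h /= 1 - tie_term / (n**3 - n)
    p = exp(-h / 2)  # chi-square upper tail, valid only for df = 2
    return h, p


# Invented IF values by RoB category; not the study data.
low_risk = [0.0, 1.2, 3.1]
medium_risk = [0.0, 0.0, 2.4]
high_risk = [0.0, 0.0, 0.0, 0.9, 1.5]
print(kruskal_wallis_3groups(low_risk, medium_risk, high_risk))
```

The chi-square approximation is rough for groups this small, so an exact or permutation version may be preferable in practice.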

Each Domain Score
Concerning bias resulting from the randomization process, there was a significant difference (p = 0.018) in the RoB score between for-profit and academic research in the author's organization (Table 3). Also, there was a significant difference (p = 0.006) in the RoB score between the published year categories of 2000-2019 and 2020-2024 and a significant difference (p = 0.031) in the RoB score between English and Japanese language publications. There was no significant difference in the RoB score for IF (p = 0.989). Regarding bias due to deviation from the intended intervention, there was a significant difference (p = 0.002) in the RoB score between for-profit and academic research in the author's organization. For the other items, there were no significant differences.
There was no significant difference in any item for bias due to missing outcomes, bias in measurement/evaluation, and bias in the selection of reported results.
All heat maps show "low risk" in green, "medium risk" in yellow, and "high risk" in red.

Discussion
This was the first study to evaluate CTs reported as scientific evidence of efficacy in the FFC system, using the RoB 2 tool, and identify associated factors. Unfortunately, the CTs tended to have high RoB. As in other healthcare fields, nutritional SRs are best conducted to synthesize data from CTs. Therefore, SRs that collect low-quality CTs cannot draw valid conclusions about food functionality. Such conclusions lead consumers to make poor decisions when purchasing products that claim to have certain functionalities.

Features of RoB on CTs
Bias is defined as a systematic error in study results and is caused by incorrect research methodology [19]. In observational studies, such as cross-sectional, cohort, and case-control studies, it is well-known that confounding factors are biases that have a greater impact on outcomes. In this study, bias occurred in all five domains and was most serious for "bias in selection of the reported result (Domain 5)". Due to unclear descriptions of outcomes in the protocols, there were many inconsistencies in the outcomes reported in the articles. This finding is consistent with previous studies evaluating protocol compliance [9]. Also, multiple outcome measures (e.g., scales, definitions, and time points) were utilized; thus, the multiplicity of outcome items was the most serious problem. If this bias is high, the appropriateness of the clinical trial itself can be seriously questioned. In particular, in addition to selective reporting, a new problem identified was that the content of the intervention (test food) was intentionally concealed. This problem was also previously reported in a review of funding for pharmaceutical industry studies [20]. The guidelines may need to require that papers lacking a detailed description of the protocol not be accepted.
"Bias arising from the randomization process (Domain 1)" is a very common form of misreporting. In many papers, the specific randomization method was unclear, including how the allocation sequence was concealed. A study evaluating the RoB of 10,103 trials reported frequent random sequence generation that did not follow the instructions in the Cochrane Handbook [21]. A related study suggested that the blinding of participants and personnel (performance bias) was also frequently not in line with the Handbook recommendations [22]. These findings highlight this RoB domain as a potential pitfall in various kinds of CTs, and researchers should diligently avoid it, beginning at the planning stage.
In addition, "bias due to missing outcome data (Domain 3)" was high. Because CTs of health foods generally have a relatively short intervention period (most often 8-12 weeks [8]), it is necessary to analyze the ITT population or the full analysis set.

Elements Correlated with RoB
The sub-research question assessed in this study was as follows: "Is RoB correlated with author characteristics (for-profit corporate authors or academic authors), the year in which the paper was published, the language of the paper (English or Japanese), and the IF?" Overall, RoB 2 scores were not associated with the author affiliation, year of publication, language, or IF. However, academic researchers tended to be significantly more biased in two domains than for-profit researchers. The reason for this may be that papers by academic researchers tended to be published earlier than those by for-profit researchers. In fact, for bias resulting from the randomization process, 75% of papers published in 2000-2019 were high risk compared with 23% of papers published in 2020-2024. This suggests that older papers have many defects in some domains.

Impact on SRs
In the FFC system, approximately 95% of notifications are submitted using SRs [6,7]. A recent study demonstrated that the quality of SRs is extremely poor [7]. In fact, CTs on functional ingredients have been included as target papers in SRs. SRs must have little RoB in the reviewed CTs, so our findings bring into question the reliability and quality of SRs in the FFC system. When determining the credibility of study results by meta-analysis, it is very important to know whether only low RoB CTs are included or high RoB CTs are excluded. For example, a previous study that evaluated 59 SRs reported that only 50% of the SRs performed sensitivity analyses for low RoB CTs [23].
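A sensitivity analysis restricted to low RoB CTs, as mentioned above, can be sketched as follows. This is a minimal illustration assuming a fixed-effect inverse-variance model; the function name, the record fields (`effect`, `se`, `rob`), and the trial values are all hypothetical and are not drawn from any SR in the FFC system.

```python
def pooled_effect(trials):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1 / t["se"] ** 2 for t in trials]
    total = sum(weights)
    estimate = sum(w * t["effect"] for w, t in zip(weights, trials)) / total
    return estimate, (1 / total) ** 0.5


# Invented trial results for illustration only.
trials = [
    {"effect": -0.40, "se": 0.20, "rob": "high"},
    {"effect": -0.35, "se": 0.25, "rob": "high"},
    {"effect": -0.05, "se": 0.15, "rob": "low"},
    {"effect": -0.10, "se": 0.30, "rob": "low"},
]

# Primary analysis: all trials. Sensitivity analysis: low RoB trials only.
all_estimate, all_se = pooled_effect(trials)
low_estimate, low_se = pooled_effect([t for t in trials if t["rob"] == "low"])
print(round(all_estimate, 3), round(low_estimate, 3))
```

If the pooled estimate shifts noticeably once high RoB trials are removed, the conclusion of the SR depends on the weaker trials, which is the concern raised in this section.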
The CAA has issued updated guidelines [2] requiring all newly filed SRs (including updated SRs) to comply with PRISMA 2020 [24]. When reporting an SR based on PRISMA 2020, the assessment of the certainty of the evidence is paramount for the final conclusion. Consequently, the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach is often followed [25]. The GRADE method has already been adopted in the nutrition- and food-related fields [26][27][28]. A methodological review of nutrition SRs [29] reported that among 800 SRs, 55 used the GRADE method; certainty of evidence was downgraded mostly for RoB (37.8%) in the SRs of RCTs. In the FFC-SR system, proper evaluation of the RoB of each CT is an absolute requirement for the implementation of an SR. Conducting RoB assessments to validate SR findings imposes high demands on reviewers' expertise as well as on resources such as time and cost. However, it is doubtful whether the appropriate evaluation method is rigorously implemented for SRs [6,7]. The results of the present study reveal that, in addition to the low quality of the individual CTs, there are methodological problems with the SR research conducted by extracting the CTs, which is a fundamental problem for the FFC system.

Future Research Challenges to Improve the Quality of CT on the FFC
In the planning stage for conducting a CT, researchers need to carefully review some type of RoB 2 checklist [16,18], as well as the current reporting guidelines (i.e., SPIRIT 2013 [30], CONSORT 2010 statement [15,31], and CONSORT 2010 statement: crossover extension [32]) and take steps to avoid bias. The SPIRIT 2024 and the CONSORT 2024 guidelines are expected to be published soon for CT protocols and reporting methods [33], so any new CTs will need to comply.
We assume that conducting high-quality CTs on food in the future will require not only that researchers themselves improve their processes but also a tripartite effort by the government, academic organizations/researchers, and the food industry (Figure 3). In order of frequency, the domains that were particularly inaccurate or inappropriate were "selection of the reported results", "missing outcome data", and "randomization process", and a common problem was a lack of rigor in the items that needed to be reported. The CAA may need to clearly state in its guidelines that studies with a RoB above a certain level will not be accepted for the FFC system. It may be an effective measure for nutrition- and food-related journals to incorporate RoB into their editorial and review policies. Of course, journal editors should reject low-quality research to avoid future misunderstandings in the field. At this time, UMIN-CTR is not reviewing the contents of protocols or requesting corrections if there are any deficiencies. Identifying flaws at the protocol stage may help to avoid some of the problems of low-quality papers being published. We consider that academic researchers need to be more critical in peer-reviewing research papers, and they should promote an evaluation of clinical trial methodologies for food. Food industry associations should place importance on evidence that contradicts the sale of large numbers of functional foods and comply with high standards of research ethics. One way they could undertake this would be to institute self-regulation in order to reduce RoB.

Challenges in Building a Bridge with End Users (Consumers)
In the FFC system, if a positive result is obtained for a CT's functionality, it will be accepted as a notification of functionality. It is also true that the contents of CTs and their interpretations are difficult for the average consumer to understand. Since notified CTs are often of low quality, it is necessary to correctly communicate this information to consumers so that they can make appropriate purchasing decisions. Consumers probably have no idea what RoB means, so they will accept the results presented in an academic paper. One previous study pointed out that academic researchers, responsible authorities, and relevant government agencies need to work together to properly convey this information to consumers so that they can make appropriate purchasing decisions [34].
To achieve this aim, it may be necessary to (i) present the content of the CT in plain language; (ii) provide a method for academic researchers or responsible authorities to include easy-to-understand comments on the paper's RoB 2 evaluation; and (iii) develop human resources and create a system to further educate consumers on how to scientifically interpret these evaluations.

Limitations
There were several limitations to the present study. First, we only focused on CTs based on the notification to the FFC in Japan (a single country), so our findings may not necessarily be generalized to all CTs of healthy foods. In fact, about half of the articles in our study were written in Japanese. Second, because the study targeted only the latest CTs, the sample of 38 articles was relatively modest. Third, although there could be many other potential elements related to RoB, we only assessed the following four aspects: first author characteristics, published year, language, and the IF. In other words, we assumed that RoB would be high when the authors were affiliated with companies, the paper was old, the paper was written in Japanese rather than English, or the IF was low or absent. Fourth, we could not describe the results of a quality assessment based on the food business operator's real name (i.e., identified for a product) because of the potential risk of civil suits and other serious issues. However, target articles were listed as supplementary data and can be found online at: https://1drv.ms/b/s!AoQmpnIHE3YUhNsckPJfab7b9FJyeA?e=2WB4ff (Accessed on 1 July 2024).
Finally, a meta-epidemiological study suggested that many SRs did not adhere to the RoB 2 guidance because they applied the tool at the study level rather than at the outcome measure level [35]. However, our study adapted the RoB 2 tool to evaluate the quality of CTs. Our RoB evaluation was performed by only one author (i.e., HK) who was fully experienced in the assessment of CT quality, which may have introduced some errors. A recent study using RoB 2 reported that the tool and its guidance are useful but resource-intensive and challenging to implement [36]. In addition, that study noted that despite the extensive guidance, it was difficult to implement aspects of most domains. In other words, this suggests that it is difficult to make a judgment without more detailed explanations for each domain. A previous study estimated the time taken to apply the RoB 2 tool, finding that it was demanding with problematic reliability, and therefore recommended the development of operational criteria specific to the review to improve implementation [37]. A recent study summarized findings from other studies that evaluated the design and usability of RoB tools such as the Prediction model Risk Of Bias ASsessment Tool (PROBAST), RoB 2, Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I), the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2), and others [38]. It reported that these evaluation tools have methodological limitations restricting the generalizability of their findings. As such, there are common challenges and limitations in any tool when evaluating RoB.

Conclusions
The quality of the most recent CTs submitted after 2023 under the Japanese FFC system is very low. In particular, there are three common biases in most CTs that have been

Figure 1. Flowchart of the trial process and study implementation. * Number of notifications for Food with Function Claims (n = 42). ** Additional information can be found at the following URL: https://1drv.ms/b/s!AoQmpnIHE3YUhNsckPJfab7b9FJyeA?e=2WB4ff (Accessed on 1 July 2024).

Figure 2. Feature of RoB 2 score and each domain score. Note: All heat maps show "low risk" in green, "medium risk" in yellow, and "high risk" in red.

Figure 3. Challenges to strengthen the research quality of food-related clinical trials.

Table 2. Elements correlated with RoB 2 score.

Table 3. Elements correlated with each bias domain.