Review

Methodological Standards for Conducting High-Quality Systematic Reviews

Alessandro De Cassai, Burhan Dost, Serkan Tulgar and Annalisa Boscolo
1 Department of Medicine (DIMED), University of Padua, 35127 Padua, Italy
2 Institute of Anesthesia and Intensive Care, University Hospital of Padua, 35127 Padua, Italy
3 Department of Anesthesiology and Reanimation, Ondokuz Mayis University Faculty of Medicine, Samsun 55270, Türkiye
4 Department of Anesthesiology and Reanimation, Samsun University Faculty of Medicine, Samsun 55080, Türkiye
5 Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padua, 35127 Padua, Italy
* Author to whom correspondence should be addressed.
Biology 2025, 14(8), 973; https://doi.org/10.3390/biology14080973
Submission received: 27 June 2025 / Revised: 27 July 2025 / Accepted: 29 July 2025 / Published: 1 August 2025
(This article belongs to the Section Theoretical Biology and Biomathematics)

Simple Summary

Systematic reviews are an essential way for researchers to collect and evaluate all the scientific studies on a specific topic, helping to make informed decisions in biology policy-making and in other settings. This article explains step-by-step how to plan, conduct and report a high-quality systematic review. It begins with developing a clear research plan, which prevents bias and improves transparency. It then describes how to search for studies in scientific databases, select relevant studies, and extract important information from them. The article also highlights how to assess the quality and reliability of each study to avoid misleading conclusions. Depending on the data, researchers may combine results using statistical methods or summarize them in a clear, descriptive way. Finally, the article emphasizes the importance of assessing how certain we can be about the overall findings. This process ensures that knowledge is based on all available evidence, not just on a few selected studies, and supports better outcomes for society.

Abstract

Systematic reviews are a cornerstone of evidence-based research, providing comprehensive summaries of existing studies to answer specific research questions. This article offers a detailed guide to conducting high-quality systematic reviews in biology, health and social sciences. It outlines key steps, including developing and registering a protocol, designing comprehensive search strategies, and selecting studies through a screening process. The article emphasizes the importance of accurate data extraction and the use of validated tools to assess the risk of bias across different study designs. Both meta-analysis (quantitative approach) and narrative synthesis (qualitative approach) are discussed in detail. The guide also highlights the use of frameworks, such as GRADE, to assess the certainty of evidence and provides recommendations for clear and transparent reporting in line with the PRISMA 2020 guidelines. This paper aims to adapt and translate evidence-based review principles, commonly applied in clinical research, into the context of biological sciences. By highlighting domain-specific methodologies, challenges, and resources, we provide tailored guidance for researchers in ecology, molecular biology, evolutionary biology, and related fields in order to conduct transparent and reproducible evidence syntheses.

1. Introduction

Systematic reviews (SRs) are widely regarded as the highest form of evidence within the hierarchy of research designs. However, this position has been increasingly challenged, with some arguing that SRs should be conceptualized not as the apex of evidence-based practice, but rather as a methodological lens through which primary studies are critically appraised, synthesized, and interpreted [1]. From this perspective, SRs serve as structured tools that enable stakeholders to consume, evaluate, and apply the underlying evidence more effectively [1]. This holds true not only in biology but across all areas of scientific research: well-conducted SRs minimize bias and offer robust conclusions that inform clinical practice, public health policy, and further research.
However, not all systematic reviews are of high quality: despite the availability of reporting guidelines and quality assessment tools, the reporting and methodological quality of SRs remains inconsistent [2].
In a survey of 102 SRs and meta-analyses in ecology and evolutionary biology (2010–2019), only ~16% referenced any reporting guideline, and those that did scored significantly higher on reporting quality metrics than the average [3]. Moreover, reviews dating back to the early 2000s highlighted that ecological meta-analyses often lacked essential methodological rigor—such as protocol registration, risk-of-bias assessments, or clear inclusion criteria—revealing a gap relative to established practices in clinical research [4]. Poor methodological rigor can result in misleading conclusions, which in turn may negatively impact clinical decision-making and policy formulation. Therefore, understanding the correct methodology is essential not only for researchers but also for reviewers, editors, and readers undertaking or interpreting systematic reviews.
Over recent decades, methodological standards for conducting SRs have become increasingly formalized, with the Cochrane Handbook for Systematic Reviews of Interventions widely regarded as the gold standard, particularly for intervention reviews in health care [5]. Although this manuscript addresses SRs across diverse fields and review types—including those beyond clinical interventions—the foundational methods outlined in the Cochrane Handbook remain highly influential and form the methodological bedrock for many review approaches.
This paper seeks to translate these principles into the biological context by addressing unique methodological challenges and incorporating domain-specific tools and resources. The aim of this paper is to offer a comprehensive overview of how to conduct a high-quality systematic review, with detailed discussions of each step based on internationally recognized guidelines and best practices.

2. Formulating the Research Question

The first and most fundamental step in a systematic review is to develop a clearly articulated research question [5]. A well-structured question serves as the foundation for the entire review process, guiding search strategies, study selection, and data synthesis. Yet despite the importance of this step, one assessment found that 3 out of 10 studies would have needed a major rewording of the research question, because the stated questions were too vague, inconsistent with the inclusion criteria, or misaligned with the actual objectives and findings [6].
In biology, as in the health sciences more generally, the most commonly employed tools for structuring the research question are the PICO framework and, more frequently, its extended version PICOS. PICOS stands for the following: P: Population or patient group; I: Intervention or exposure; C: Comparator or control; O: Outcomes of interest; S: Study design. Researchers using the PICO framework have been shown to retrieve relevant studies with better precision when querying databases than with a free-keyword strategy [7].
A more comprehensive, though rarely used, variation is PICOTS, which adds a Timeframe (T) to explicitly define the duration over which outcomes are assessed. This is particularly important for ecological or physiological outcomes that vary significantly over time. For instance, when assessing the impact of habitat restoration on bird population density, combining studies that measure outcomes after one breeding season with those measured after five years would likely yield misleading conclusions due to ecological lag effects.
Three examples of PICOTS questions are presented below.
(1)
In degraded tropical forest ecosystems, does reforestation with native tree species, compared to natural recovery without intervention, lead to greater increases in species richness and abundance of native fauna, based on long-term field observational studies?
Population (P): Degraded tropical forest ecosystems, defined as forests with >50% canopy loss, located within the Amazon Basin (including Brazil, Peru, Colombia), based on WWF ecoregion classifications.
Intervention (I): Reforestation efforts involving the planting of native tree species, defined as species naturally occurring in the respective ecoregion, excluding exotics and non-native cultivars.
Comparator (C): Natural recovery without active planting or management interventions, defined as passive regrowth following disturbance.
Outcome (O): Change in species richness (number of native plant and animal species per hectare, measured via standardized biodiversity surveys) and abundance of native fauna (total counts of individuals per species), assessed separately.
Timeframe (T): Minimum of 5 years post-intervention to capture long-term ecological recovery trajectories.
Study design (S): Prospective or retrospective observational studies with longitudinal monitoring, excluding short-term experiments (<1 year).
(2)
In clinical isolates of Pseudomonas aeruginosa, does treatment with antibiotic A (e.g., ciprofloxacin), compared to antibiotic B (e.g., ceftazidime) or no antibiotic treatment, promote the development of antibiotic resistance, based on randomized controlled trials or observational cohort studies?
Population (P): Clinical isolates of Pseudomonas aeruginosa collected from hospitalized adult patients (>18 years) in tertiary care hospitals.
Intervention (I): Use of antibiotic A (e.g., ciprofloxacin) administered according to standard dosing guidelines.
Comparator (C): Use of antibiotic B (e.g., ceftazidime) or no antibiotic treatment (supportive care only).
Outcome (O): Antibiotic resistance development, defined as ≥4-fold increase in minimum inhibitory concentration (MIC) measured by broth microdilution assays.
Timeframe (T): Resistance assessed at baseline and after a minimum treatment period of 7 days.
Study design (S): Randomized controlled trials or observational cohort studies with appropriate resistance-testing protocols.
(3)
In Arabidopsis thaliana, how does exposure to salicylic acid, compared to untreated controls, influence the expression of pathogen-resistance genes, based on transcriptomic data from time-course experiments?
Population (P): Laboratory-grown Arabidopsis thaliana ecotype Col-0 plants at the rosette stage (4 weeks post-germination), cultivated under standardized photoperiod and temperature conditions.
Intervention (I): Treatment with 1 mM salicylic acid (SA), applied via foliar spray, simulating immune signaling induction.
Comparator (C): Untreated control plants sprayed with distilled water under identical conditions.
Outcome (O): Differential expression levels of key defense-related genes (e.g., PR1, NPR1, WRKY70) quantified using RNA-seq and normalized counts (TPM/RPKM), with validation by qRT-PCR.
Timeframe (T): Gene expression measured at baseline (0 h), 1 h, 6 h, and 24 h post-treatment to capture temporal transcriptional dynamics.
Study Design (S): Controlled in vitro experiments with randomized assignment and technical triplicates, using at least two independent biological replicates per time point.
While PICO is a strong starting point for many clinical and intervention-based studies, exploring alternative frameworks can lead to a more focused and effective research question in other contexts. For example, SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) is designed specifically to identify relevant qualitative and mixed-method studies [8]; however, its use is more limited than that of the PICOS framework [9]. It is exemplified as follows:
(4)
How do adult members of Indigenous communities living adjacent to protected wildlife reserves in the Amazon Basin perceive the social and ecological impacts of community-based conservation programs, as explored through in-depth qualitative interviews? S (Sample): Adult members (≥18 years old) of Indigenous communities residing within 10 km of protected wildlife reserves in Brazil’s Amazon Basin (e.g., Kayapó, Yanomami territories); PI (Phenomenon of Interest): Perceptions and lived experiences related to community-based conservation programs, including views on biodiversity, land use, and cultural autonomy; D (Design): Qualitative studies using semi-structured or in-depth interviews, ethnographic methods, or participatory focus groups; E (Evaluation): Thematic data describing perceived benefits (e.g., increased wildlife), concerns (e.g., reduced hunting access), and trust in conservation agencies; R (Research type): Qualitative studies only (excluding quantitative surveys or mixed-methods designs unless qualitative data are reported separately and in detail).
(5)
How do early-career molecular biology researchers perceive the use of CRISPR-Cas9 gene editing technologies in basic versus translational research contexts, as explored through qualitative interviews? S (Sample): Early-career researchers (PhD students and postdoctoral fellows with ≤5 years of experience) working in molecular biology laboratories across research institutions in the United States and Europe; PI (Phenomenon of Interest): Perceptions, ethical concerns, and decision-making processes related to the use of CRISPR-Cas9 gene editing in both basic research (e.g., model organism studies) and translational/clinical applications (e.g., gene therapy); D (Design): Qualitative studies using semi-structured interviews, online focus groups, or ethnographic observation in lab settings; E (Evaluation): Thematic data describing perceived benefits (e.g., research acceleration, therapeutic potential), risks (e.g., off-target effects, ethical misuse), and institutional pressures (e.g., funding priorities, publication expectations); R (Research type): Qualitative studies only, excluding mixed-method studies unless the qualitative data are clearly reported and analyzed separately.
Several alternatives to the PICOS and SPIDER frameworks exist for structuring research questions, especially for qualitative and mixed-method studies [10]. SPICE (Setting, Perspective, Intervention, Comparison, Evaluation) focuses on context and is useful in social research—for example, assessing how farmers in arid regions perceive agroforestry compared to monoculture. PEO (Population, Exposure, Outcome) simplifies qualitative inquiries, e.g., exploring the experiences of fishers affected by marine protected areas. For policy and service evaluation, ECLIPSE (Expectation, Client group, Location, Impact, Professionals, Service) frames questions around service improvement, such as evaluating community expectations of local conservation programs. CIMO (Context, Intervention, Mechanism, Outcome) suits organizational or environmental management research, e.g., examining how community-led wetland restoration (intervention) in urban areas fosters stewardship (mechanism) and improves biodiversity (outcome).

3. Developing a Protocol

Developing and pre-registering a protocol serves two key purposes [11]. First, it provides structured guidance for each stage of the review, ensuring that all team members understand the workflow and process. Second, it acts as a safeguard against methodological bias by promoting transparency and accountability. Several platforms are available for protocol pre-registration. The most widely recognized is PROSPERO [12], a public registry dedicated specifically to systematic reviews. Another widely used option is the Open Science Framework (OSF), which supports the pre-registration of a broader range of research projects beyond systematic reviews. Additionally, researchers may publish their protocol in a peer-reviewed journal that accepts protocol submissions, which offers the added benefit of formal peer review and greater visibility.
Some registries guide researchers by requiring them to complete structured fields that cover the key aspects of the systematic review. Others allow researchers to simply upload a study protocol without predefined fields. The first approach is better suited to inexperienced systematic review authors, as it reduces the risk of producing an incomplete protocol that may omit essential methodological details. Regardless of the format, certain critical information should always be provided: the objectives and research questions, eligibility criteria, information sources (such as databases and registries), detailed search strategies, the study selection process, the data extraction plan, strategies for assessing risk of bias, and methods for data synthesis—including plans for meta-analysis or narrative synthesis. Additionally, any intended subgroup or sensitivity analyses should be specified if applicable.
In relation to minimizing the risk of bias, it is important to emphasize that a poorly detailed or vague protocol carries similar risks to conducting a systematic review without a protocol at all [13]. Pre-registering the protocol helps reduce the risk of selective inclusion of studies or outcome data, which could otherwise lead to over- or underestimation of the systematic review's results [14]. Inadequate protocol transparency may also lead readers to underestimate the potential for bias, thereby limiting critical appraisal of the review's methodological rigor.

4. Formulating the Search Strategy

A comprehensive search strategy aims to capture all relevant studies [15]. While the PICOS model is fundamental in formulating the research question, it should not be applied in full when designing the search strategy. Evidence suggests that incorporating all elements of PICOS—particularly the comparator and outcomes—reduces the recall of search results, as these elements are often inconsistently reported in titles and abstracts [16]; current guidelines therefore recommend that search strategies focus on the population, intervention(s), and study design [17].
The Cochrane Handbook for Systematic Reviews of Interventions [5] advises searching multiple databases to minimize the risk of missing relevant studies. Commonly used databases include MEDLINE/PubMed, EMBASE, Scopus, the Cochrane Central Register of Controlled Trials (CENTRAL), CINAHL, PsycINFO, CABI, and BIOSIS. This list is not exhaustive, and researchers are encouraged to search as many relevant databases as resources allow. In addition to these general databases, biology-specific resources deserve particular attention, such as GenBank for genetic sequences, BOLD Systems for DNA barcoding data, and GBIF for biodiversity records. Furthermore, repositories such as Dryad, BioStudies, and Zenodo facilitate open data sharing and promote transparency and reproducibility in biological research. Using these resources helps researchers access the diverse datasets needed to address complex biological questions across molecular biology, ecology, and evolutionary studies.
It is crucial for researchers to recognize that each database has its own controlled vocabulary and search syntax. While a detailed explanation of database-specific search languages falls outside the scope of this article, search strings typically combine database-specific controlled vocabulary (e.g., MeSH terms) with free-text keywords. Boolean operators (AND, OR, NOT) and truncation symbols are employed to refine searches. To ensure transparency and reproducibility, the exact search strategy used for each database—including the date the search was conducted—should be reported, typically as supplementary material.
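As a minimal, hypothetical illustration (loosely based on the Pseudomonas aeruginosa question above and not taken from any specific review), a PubMed-style search string combining controlled vocabulary, free-text keywords, Boolean operators, and truncation might look like the following; the syntax would need to be adapted to each database and reported in full.

("Pseudomonas aeruginosa"[MeSH Terms] OR "Pseudomonas aeruginosa"[Title/Abstract])
AND (ciprofloxacin[MeSH Terms] OR ciprofloxacin[Title/Abstract] OR fluoroquinolone*[Title/Abstract])
AND (biofilm*[Title/Abstract] OR "Biofilms"[MeSH Terms])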
Additional strategies should be employed to maximize the retrieval of relevant articles for a systematic review. These include searching the grey literature, such as theses, dissertations, conference proceedings, and clinical trial registries (e.g., ClinicalTrials.gov, WHO ICTRP). Citation tracking—both backward (screening the reference lists of included studies) and forward (identifying articles that have cited the included studies, often referred to as snowballing)—is another effective approach. Furthermore, contacting the authors of included studies can help identify unpublished or ongoing studies that may otherwise be missed.
Other recommended strategies include searching preprint servers, consulting subject-specific repositories, and exploring regulatory agency reports or industry submissions, where applicable. All these supplementary search methods should be clearly outlined in the pre-registered protocol.
Another interesting aspect that warrants discussion is the use of AI-driven platforms to enhance the efficiency and accuracy of the systematic review process [18]. These technologies can assist with tasks such as formulating the search strategy, de-duplication, screening, data extraction, and even risk-of-bias assessment. When used appropriately, these tools can streamline workflows, reduce reviewer burden, and improve consistency—though human oversight remains essential to ensure methodological rigor. However, despite ongoing research and promising developments, there remains uncertainty regarding the full scope of their future applications and limitations. Moreover, the use of AI in systematic reviews raises important ethical concerns, including transparency, accountability, and the potential for bias embedded in algorithmic decision-making.

5. Study Screening and Selection

Study screening is typically conducted in two stages: (1) title and abstract screening, and (2) full-text screening. At each stage, screening decisions should be based on predefined inclusion and exclusion criteria, which are developed from the review’s PICOTS (or equivalent) framework, but are not identical to it.
While PICOTS helps structure the research question, inclusion and exclusion criteria must translate that question into concrete, testable conditions. For example, if the “population” in PICOTS is “tropical forest ecosystems with >50% canopy loss,” the inclusion criteria must further specify what qualifies as “tropical,” and which geographic regions are eligible. Similarly, if the study design is limited to “observational studies with ≥5 years of follow-up,” the exclusion criteria must rule out short-term experiments, cross-sectional studies, or studies without clear follow-up durations.
During title and abstract screening, reviewers should generally err on the side of inclusion—retaining any study that potentially meets the criteria. This stage is intended to be sensitive, rather than specific. Full-text screening, in contrast, is more stringent and is used to exclude studies that do not fully meet the eligibility criteria. This step requires careful judgment and often necessitates pilot screening to calibrate reviewer decisions.
To minimize the risk of bias, this process should be performed independently by at least two reviewers, with disagreements resolved by consensus or by a third reviewer. Emphasizing a double-screening approach is crucial, as evidence indicates that single-reviewer screening is associated with the omission of a significant number of eligible studies [19].
However, manually screening hundreds—or even thousands—of records is highly resource-intensive and, in some cases, may become impractical. To address this challenge, various web-based screening tools have been developed to improve efficiency and manageability. Popular examples include Abstrackr [20], Rayyan [21], Covidence [22], and EPPI-Reviewer [23], which facilitate collaborative screening, streamline decision tracking, and reduce reviewer workload without compromising methodological rigor.
The overall results of the screening and selection process are typically summarized using the PRISMA flow diagram (Figure 1), which visually documents each step of the process.
To improve the overall quality, researchers are encouraged to provide, as supplementary material, a dataset listing all studies retrieved, screened, included, or excluded, along with detailed reasons for exclusion at each stage of the selection process.

6. Data Extraction

Data extraction is the systematic collection of information from included studies [24]. This step should be guided by a predefined data extraction form, often piloted on a subset of studies to ensure clarity and consistency. Essential data typically extracted include study identifiers (authors, year, journal), study design (e.g., RCT, cohort, case-control), population characteristics (e.g., sample size, demographics), intervention and comparator details, outcomes measured and timepoints, results (e.g., effect sizes, confidence intervals), funding sources, and potential conflicts of interest. As with study screening and selection, data extraction should be performed independently by at least two researchers, as evidence indicates that relying on a single extractor can lead to 21% more errors than a double-extraction approach [25]. Although double extraction nearly doubles the time required, the substantial improvement in accuracy and reduction in errors justifies the additional effort [25]. Readers should be aware that various tools, such as EPPI-Reviewer, Covidence, and Excel-based templates, can support data extraction.
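To make the structure of such a form concrete, the short Python sketch below defines a minimal, hypothetical extraction record; the field names simply mirror the items listed above and are not a prescribed standard, and the example values are invented.

from dataclasses import dataclass
from typing import Optional

# Minimal, hypothetical data extraction record; fields mirror the items listed above
# and would normally be piloted and refined by the review team before full extraction.
@dataclass
class ExtractionRecord:
    study_id: str                            # e.g., first author and year
    design: str                              # e.g., "RCT", "cohort", "case-control"
    population: str                          # sample size, species/demographics, setting
    intervention: str
    comparator: str
    outcomes: str                            # outcome definitions and timepoints
    effect_estimate: Optional[float] = None  # e.g., log risk ratio or mean difference
    ci_lower: Optional[float] = None
    ci_upper: Optional[float] = None
    funding: str = ""
    conflicts_of_interest: str = ""
    extractor: str = ""                      # reviewer initials, supports double extraction

# Illustrative entry for one fictitious included study
record = ExtractionRecord(
    study_id="Smith 2021", design="RCT",
    population="120 adult patients, tertiary care",
    intervention="ciprofloxacin", comparator="no antibiotic",
    outcomes="MIC change at 7 days",
    effect_estimate=0.22, ci_lower=0.05, ci_upper=0.39, extractor="reviewer 1",
)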

7. Assessing Risk of Bias

Evaluating the methodological quality of included studies is a cornerstone of a rigorous systematic review, as it directly influences the reliability and validity of the review’s conclusions. Several standardized tools have been developed to assess the risk of bias in different types of studies (Table 1) [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41].
For randomized controlled trials (RCTs), the most widely recommended tools are the Cochrane Risk-of-Bias 2 (RoB 2) tool [27] and the Jadad scale [28], with RoB 2 offering a more comprehensive, domain-based assessment. For non-randomized studies of interventions, tools such as ROBINS-I (Risk Of Bias In Non-randomized Studies of Interventions) [29] and ROBINS-E (for exposures) [30] are commonly used, alongside the older but still widely applied Newcastle–Ottawa Scale (NOS) [31].
For diagnostic accuracy studies, the QUADAS-2 tool is the standard [32]. When dealing with qualitative studies, researchers frequently employ the CASP (Critical Appraisal Skills Programme) checklists [33]. For preclinical animal studies, the SYRCLE Risk-of-Bias tool is specifically designed to capture methodological concerns relevant to laboratory in vivo-based research [34].
Each tool is structured around a set of domains that reflect potential sources of bias relevant to the study design. While the specific domains vary between tools, they commonly assess areas such as the following:
(1)
Selection bias (e.g., randomization, allocation concealment)
(2)
Performance bias (e.g., blinding of participants and personnel)
(3)
Detection bias (e.g., blinding of outcome assessors)
(4)
Attrition bias (e.g., incomplete outcome data)
(5)
Reporting bias (e.g., selective outcome reporting)
The assessment of each domain typically involves assigning a judgment categorized as low risk, high risk, or unclear risk of bias, often visually represented by colors (e.g., green for low risk, yellow for unclear, and red for high risk). Some tools, such as RoB 2 and ROBINS-I, generate an overall risk-of-bias judgment for each study based on domain-level assessments, while others, like the Newcastle–Ottawa Scale, provide a score without a formal overall risk category. The results of risk-of-bias assessments are commonly presented in summary tables or graphical displays, such as traffic light plots or weighted bar charts. However, to enhance transparency and allow readers to fully understand how judgments were made, researchers should also provide detailed justifications for each risk-of-bias judgment assigned to each domain for every included study. Ideally, these justifications should be made available in the main text, appendices, or as supplementary material. The process should be conducted independently by at least two reviewers to minimize subjective judgment and ensure reliability.
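As an illustration of how such a traffic light display can be produced, the short Python sketch below plots hypothetical domain-level judgments for three fictitious studies; the study names, domains, and judgments are invented for demonstration only and do not correspond to any real assessment.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Hypothetical RoB 2-style domain judgments for three fictitious studies:
# 0 = low risk (green), 1 = some concerns/unclear (yellow), 2 = high risk (red).
studies = ["Study A", "Study B", "Study C"]
domains = ["Randomization", "Deviations", "Missing data", "Measurement", "Selective reporting"]
judgments = np.array([
    [0, 0, 1, 0, 0],
    [1, 0, 0, 2, 1],
    [0, 1, 0, 0, 0],
])

fig, ax = plt.subplots(figsize=(7, 2.5))
ax.imshow(judgments, cmap=ListedColormap(["green", "gold", "red"]), vmin=0, vmax=2, aspect="auto")
ax.set_xticks(range(len(domains)))
ax.set_xticklabels(domains, rotation=30, ha="right")
ax.set_yticks(range(len(studies)))
ax.set_yticklabels(studies)
ax.set_title("Risk-of-bias traffic light plot (illustrative)")
fig.tight_layout()
plt.show()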

8. Data Synthesis: Quantitative Synthesis (Meta-Analysis)

When the studies addressing the same research question are sufficiently similar in terms of participants, interventions, comparators, and outcomes, it is often appropriate to conduct a meta-analysis. Meta-analysis is a statistical technique that combines the results of individual studies into a single pooled estimate of the effect size, thereby increasing statistical power and providing a more precise estimate than any single study alone. This approach also allows researchers to explore variability among study results and can reveal patterns that are not apparent from individual studies.
A fundamental aspect of meta-analysis is the choice of an appropriate effect measure. Commonly used effect measures include the risk ratio (RR), odds ratio (OR), and hazard ratio (HR) for dichotomous or time-to-event outcomes, while continuous outcomes are often summarized using the mean difference (MD) or, when different scales are used, the standardized mean difference (SMD). Selecting the correct effect measure depends on the nature of the outcome data and the clinical question. Another critical consideration is the degree of heterogeneity among the included studies. Heterogeneity refers to variation in study outcomes beyond what would be expected by chance alone. It can arise from differences in study populations, interventions, outcome definitions, or study quality. The most commonly used statistic to quantify heterogeneity is I2, which represents the percentage of total variation across studies that is due to heterogeneity rather than chance. According to the guidance proposed by Higgins and colleagues (2003) [41], I2 values of approximately 0–40% might not be important, 30–60% may represent moderate heterogeneity, 50–90% substantial heterogeneity, and 75–100% considerable heterogeneity. The importance of the observed I2 value depends on the magnitude and direction of effects and on the strength of evidence for heterogeneity (for example, the p-value from the chi-squared test). The interpretation of heterogeneity is complex but important and should always be explored whenever possible (including through sensitivity analysis). It should also be remembered that the uncertainty in I2 is substantial when the number of studies is small.
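For reference, I2 is derived from Cochran's Q statistic; a standard formulation, consistent with Higgins and colleagues [41], is

Q = \sum_{i=1}^{k} w_i (y_i - \hat{\theta})^2, \qquad w_i = \frac{1}{SE_i^2}, \qquad I^2 = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%

where y_i and SE_i are the effect estimate and standard error of study i, \hat{\theta} is the inverse-variance pooled (fixed-effect) estimate, and k is the number of studies.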
When potential sources of heterogeneity are known or suspected—such as differences in populations, interventions, or study quality—sensitivity analyses should be conducted. These involve re-running the meta-analysis after excluding specific subsets of studies (e.g., low-quality studies, studies with extreme effect sizes) to assess how robust the overall findings are, and they are helpful in potentially identifying the source of heterogeneity.
Regarding model choice, two primary approaches exist: fixed-effect and random-effects models. Fixed-effect models operate under the assumption that all included studies estimate the same underlying effect size and that observed differences are due solely to sampling error. In contrast, random-effects models acknowledge that true effect sizes may vary across studies due to genuine clinical or methodological differences and therefore incorporate this variability into the pooled estimate. The choice between a fixed-effect and a random-effects model should be made a priori, and a fixed-effect model should be preferred only if researchers are confident that the included studies share a common effect without confounding or heterogeneity, a situation that rarely occurs.
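To make the two models concrete, the short Python sketch below pools a set of invented effect sizes with inverse-variance (fixed-effect) weights and with DerSimonian-Laird random-effects weights; the numbers are illustrative only, and dedicated software (for example, the metafor package in R) would normally be used in practice.

import numpy as np

# Hypothetical study effects (e.g., log risk ratios) and their standard errors;
# values are invented solely to illustrate the calculations.
yi = np.array([0.10, 0.35, 0.22, -0.05, 0.41])
sei = np.array([0.12, 0.20, 0.15, 0.18, 0.25])

# Fixed-effect (inverse-variance) pooling
wi = 1 / sei**2
theta_fe = np.sum(wi * yi) / np.sum(wi)
se_fe = np.sqrt(1 / np.sum(wi))

# Heterogeneity: Cochran's Q and I^2
k = len(yi)
Q = np.sum(wi * (yi - theta_fe)**2)
I2 = max(0.0, (Q - (k - 1)) / Q) * 100 if Q > 0 else 0.0

# Random-effects pooling with the DerSimonian-Laird estimate of tau^2
C = np.sum(wi) - np.sum(wi**2) / np.sum(wi)
tau2 = max(0.0, (Q - (k - 1)) / C)
wi_re = 1 / (sei**2 + tau2)
theta_re = np.sum(wi_re * yi) / np.sum(wi_re)
se_re = np.sqrt(1 / np.sum(wi_re))

print(f"Fixed effect:   {theta_fe:.3f} (SE {se_fe:.3f})")
print(f"Random effects: {theta_re:.3f} (SE {se_re:.3f}); I^2 = {I2:.1f}%, tau^2 = {tau2:.4f}")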
Finally, assessing publication bias is a crucial step in meta-analysis to determine whether the included studies disproportionately represent positive or statistically significant findings, which can lead to an overestimation of the true effect. One common method for detecting publication bias is the funnel plot [42], a scatterplot that displays individual study effect sizes against a measure of their precision, typically the standard error or sample size. The underlying principle of funnel plots is that smaller studies, which have less precise estimates, should show a wider spread of effect sizes, while larger studies with greater precision tend to cluster closer to the true effect. In the absence of bias, the plot should resemble a symmetrical inverted funnel.
When there is a sufficient number of studies (usually ten or more), publication bias can be evaluated by visually inspecting funnel plots for asymmetry, which may suggest publication bias or other small-study effects. However, visual interpretation can be subjective and inconsistent. Therefore, statistical tests such as Egger's regression test can be employed to provide a more objective and reproducible assessment of funnel plot asymmetry [43].
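As a sketch of how Egger's test is typically computed (assuming the classic formulation of regressing the standardized effect on precision, with invented data), one might write the following in Python.

import numpy as np
import statsmodels.api as sm

# Hypothetical study effects and standard errors for ten invented studies.
yi = np.array([0.10, 0.35, 0.22, -0.05, 0.41, 0.30, 0.15, 0.28, 0.05, 0.38])
sei = np.array([0.12, 0.20, 0.15, 0.18, 0.25, 0.22, 0.10, 0.16, 0.09, 0.24])

# Classic Egger's test: regress the standardized effect (y / SE) on precision (1 / SE);
# an intercept significantly different from zero suggests funnel plot asymmetry.
z = yi / sei
precision = 1 / sei
X = sm.add_constant(precision)   # adds the intercept term
fit = sm.OLS(z, X).fit()

print(f"Egger intercept = {fit.params[0]:.2f}, p-value = {fit.pvalues[0]:.3f}")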
In addition to traditional pairwise meta-analyses, which compare two interventions at a time, network meta-analyses (NMAs) allow for simultaneous comparisons of multiple interventions, even if some have not been directly compared in any individual study. NMA integrates both direct and indirect evidence using a common comparator (e.g., placebo), enabling the ranking of interventions in terms of effectiveness. This approach is particularly valuable in fields where multiple treatment or management strategies exist, and head-to-head trials are sparse.

9. Data Synthesis: Qualitative Synthesis (Narrative Synthesis)

When conducting a systematic review, a meta-analysis may not be feasible or appropriate when there is substantial heterogeneity across studies—whether in terms of interventions, populations, outcome measures, or study designs—or when the available data are insufficient for meaningful quantitative synthesis, particularly when key outcomes are reported in too few studies or with inconsistent measures. In such cases, researchers should perform a narrative synthesis, which involves systematically summarizing and explaining the findings without statistically combining them. This approach typically organizes results thematically, categorically, or according to key characteristics, such as intervention type, population group, or outcome domain.
To improve the rigor and transparency of narrative syntheses, the SWiM (Synthesis Without Meta-analysis) guideline was developed [44]. SWiM provides structured recommendations on how to transparently report the methods used for grouping studies, how findings are synthesized, and how certainty in the evidence is assessed. In particular, key components of SWiM include detailed descriptions of how studies were grouped or clustered for synthesis, the criteria and rationale for combining or comparing results, and the approach used to assess the certainty of evidence. SWiM also emphasizes the importance of reporting how data were extracted and how any quantitative transformations were conducted when applicable, such as when effect sizes are calculated but not pooled.
Another important aspect of SWiM is promoting clarity in presenting results, encouraging authors to use tables and figures effectively to summarize study characteristics and findings. This helps readers understand the evidence without relying solely on textual descriptions, which can sometimes be ambiguous or incomplete. The guideline also advocates for explicit discussion of the limitations of the narrative synthesis approach, including potential biases and challenges arising from lack of statistical pooling.

10. Assessing Certainty of Evidence

One of the final and most critical steps in conducting a systematic review is the assessment of the certainty of evidence, which reflects the degree of confidence that the estimated effect is close to the true effect. The GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach is the internationally recognized standard for evaluating the certainty of evidence [45] and should be applied in systematic reviews regardless of whether a meta-analysis is performed or not. GRADE provides a structured and transparent framework to assess the strength of evidence based on key domains, ensuring that the conclusions of the review are well-founded, even when quantitative synthesis is not feasible [44].
GRADE assesses certainty based on five key domains. The first is risk of bias, which considers whether limitations in study design or execution may have introduced systematic errors. The second is inconsistency, which refers to unexplained heterogeneity or variability in results across studies. The third domain, indirectness, evaluates the applicability of the evidence to the research question, particularly when there are differences in populations, interventions, or outcomes. The fourth domain is imprecision, which addresses whether the evidence is weakened by wide confidence intervals or small sample sizes that reduce the reliability of the effect estimate. Finally, publication bias assesses whether the available evidence might be skewed due to the selective publication of studies with positive or significant findings.
Based on an assessment of these domains, the certainty of evidence is classified into one of four levels: high, moderate, low, or very low. High-certainty evidence indicates strong confidence that the true effect lies close to the estimate, while very low certainty reflects considerable uncertainty about the effect estimate. The results of the GRADE assessment are commonly presented in a Summary of Findings (SoF) table, which succinctly displays the main outcomes, effect sizes, confidence intervals, and corresponding GRADE ratings.

11. Alternative Approaches to Evidence Synthesis

While this manuscript focuses on broadly applicable, quantitative synthesis approaches (e.g., GRADE, SWiM, Cochrane Handbook), it is important to acknowledge alternative methodologies developed for different types of evidence and research questions. For instance, the RAMESES (Realist And MEta-narrative Evidence Syntheses: Evolving Standards) [46] framework supports realist reviews, which aim to understand how and why interventions work (or fail) in particular contexts—especially in complex, system-level settings. Similarly, in qualitative research, meta-aggregation (as used in the Joanna Briggs Institute model) [47] offers a structured approach for synthesizing findings while preserving participants’ original meaning. Unlike thematic synthesis, meta-aggregation emphasizes transparency and alignment with primary study conclusions. These alternative methodologies serve distinct purposes and are complementary to the general frameworks described here. Researchers should carefully consider their review goals and the nature of the available evidence when selecting an appropriate synthesis method.

12. Unique Methodological Challenges in Biological Research

Biological research, particularly in ecology, evolutionary biology, and related subfields, faces several methodological challenges that are distinct from clinical or health-related studies. These challenges stem from the complexity of living systems and their interactions with highly variable environments.
 (a)
Non-standardized outcome measures in ecological and evolutionary studies
Unlike clinical trials with well-established and standardized endpoints (e.g., blood pressure or mortality), ecological and evolutionary studies often rely on diverse outcome measures, such as species richness, biomass, gene expression levels, or behavioral traits. This diversity can make it difficult to directly compare results across studies or synthesize findings through meta-analyses [48]. Moreover, these outcomes may be measured using different methodologies, scales, or temporal resolutions, increasing methodological heterogeneity and complicating systematic reviews [49].
 (b)
Species-specific or strain-specific biological responses
Variation between species, strains, or populations is a hallmark of biological research, reflecting differences in genetics, physiology, life history, and ecological niches. Such biological heterogeneity means that responses to experimental treatments or environmental factors can differ markedly, even within seemingly similar contexts. For example, a pesticide might have a strong toxic effect on one insect species but be benign to another. Failing to account for these differences risks overgeneralizing results and undermines the validity of pooled analyses [50]. Researchers must carefully consider taxonomic and genetic diversity when designing studies and synthesizing data.
 (c)
High heterogeneity due to environmental contexts in biodiversity or phylogenetic research
Environmental variability introduces another significant source of heterogeneity in biological research. Spatial factors (e.g., latitude, habitat type), temporal dynamics (e.g., seasonal variation), and anthropogenic impacts (e.g., pollution, land use changes) all affect biological outcomes and can lead to conflicting or context-dependent findings. Biodiversity studies, in particular, must grapple with this complexity when synthesizing data from multiple ecosystems or geographic regions. Phylogenetic relationships further complicate analyses, as closely related species may share traits that influence responses, requiring sophisticated statistical models to control for evolutionary relatedness [51,52]. These challenges highlight the need for explicit modeling of heterogeneity and sensitivity analyses in biological meta-research.

13. Reporting the Systematic Review

Once the SR has been completed, the next crucial step is to report it in a clear, transparent, and comprehensive manner. Fortunately, researchers can rely on the PRISMA 2020 reporting guideline (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [53], which provides a structured framework to ensure that all essential components of the review are adequately reported. Adhering strictly to PRISMA enhances both the transparency and reproducibility of the review process. The PRISMA 2020 checklist consists of 27 items covering all sections of a systematic review, from title and abstract to methods, results, and discussion. Researchers are strongly encouraged to carefully read the checklist before drafting the manuscript and to complete it thoroughly prior to submission to a peer-reviewed journal. It is highly recommended that researchers annotate the checklist not only with the corresponding page and line numbers but also by including the exact sentences or sections that address each item. This practice greatly facilitates the work of editors, peer reviewers, and other research teams in verifying compliance with reporting standards, further strengthening the credibility, clarity, and transparency of the systematic review.

14. Conclusions

The responsibility for conducting a high-quality systematic review lies firmly with researchers. Beyond simply aggregating studies [54], researchers must engage in several steps that have to be planned at the very beginning of the review, as we have tried to illustrate here. Accurate planning, rigorous methodology, and transparent reporting are the key components of producing evidence that is both credible and useful. High-quality systematic reviews are foundational to evidence-based medicine, informing clinical decision-making, shaping community policies, and guiding future research priorities. Without strict adherence to internationally recognized best practices, systematic reviews risk bias, irreproducibility and, ultimately, misleading conclusions that could affect patient care and resource allocation. Several tools have been developed to support researchers in this demanding work: the PRISMA 2020 guideline, the Cochrane Handbook for Systematic Reviews of Interventions, and the GRADE approach all enhance the methodological rigor and transparency of reviews. PRISMA ensures comprehensive and standardized reporting, making the review accessible and interpretable to a wide audience, including clinicians, policymakers, and fellow researchers. The Cochrane Handbook offers detailed guidance on every stage of the review process, from framing the research question and designing search strategies to assessing risk of bias and synthesizing evidence. Meanwhile, GRADE provides a systematic method to evaluate the certainty of evidence, helping stakeholders understand how confident they can be in the findings.
Moreover, researchers have an ethical obligation to minimize bias and maximize transparency throughout the review. This includes pre-registering protocols, conducting thorough and unbiased literature searches, independently screening and extracting data, and openly sharing data and analytic methods. Such transparency facilitates reproducibility and critical appraisal, strengthening trust in the findings. Additionally, acknowledging and addressing limitations—whether in study quality, heterogeneity, or publication bias—demonstrates intellectual honesty and allows readers to interpret results within the appropriate context.
Ultimately, high-quality systematic reviews serve as powerful tools that synthesize vast amounts of information into actionable knowledge. By committing to excellence in their conduct, researchers ensure that these reviews remain invaluable resources that advance science and inform policies that affect lives worldwide.
In summary, this guide aims to consolidate widely accepted standards for conducting systematic reviews and to adapt them to the specific needs of researchers in the biological sciences. From diverse study designs and outcome measures to field-specific challenges such as ecological variability and non-standardized interventions, systematic reviews in biology require careful methodological attention. By contextualizing established frameworks—such as PRISMA, GRADE, and SWiM—and highlighting additional tools relevant to biology (e.g., SPIDER), this manuscript supports researchers in producing high-quality, transparent, and reproducible evidence syntheses. Strengthening the methodological foundation of systematic reviews in biology is essential for advancing reliable, evidence-based decision-making in conservation, ecology, agriculture, and biomedical science.

Author Contributions

Conceptualization, A.D.C. and B.D.; methodology, A.D.C.; resources, A.D.C. and B.D.; writing—original draft preparation, A.D.C., B.D., S.T. and A.B.; writing—review and editing, A.D.C., B.D., S.T. and A.B.; visualization, A.B.; supervision, A.B.; project administration, A.D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data referenced in the manuscript are fully available within the manuscript itself.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Murad, M.H.; Asi, N.; Alsawas, M.; Alahdab, F. New evidence pyramid. BMJ Evid.-Based Med. 2016, 21, 125–127. [Google Scholar] [CrossRef]
  2. Pussegoda, K.; Turner, L.; Garritty, C.; Mayhew, A.; Skidmore, B.; Stevens, A.; Boutron, I.; Sarkis-Onofre, R.; Bjerre, L.M.; Hróbjartsson, A.; et al. Systematic review adherence to methodological or reporting quality. Syst. Rev. 2017, 6, 131. [Google Scholar] [CrossRef]
  3. O’Dea, R.E.; Lagisz, M.; Jennions, M.D.; Koricheva, J.; Noble, D.W.A.; Parker, T.H.; Gurevitch, J.; Page, M.J.; Stewart, G.; Moher, D.; et al. Preferred Reporting Items for Systematic Reviews and Meta-Analyses in Ecology and Evolutionary Biology: A PRISMA Extension. Biol. Rev. 2021, 96, 1695–1722. [Google Scholar] [CrossRef]
  4. Gates, S. Review of Methodology of Quantitative Reviews Using Meta-Analysis in Ecology. J. Anim. Ecol. 2002, 71, 547–557. [Google Scholar] [CrossRef]
  5. Higgins, J.P.T.; Thomas, J.; Chandler, J.; Cumpston, M.; Li, T.; Page, M.J.; Welch, V.A. (Eds.) Cochrane Handbook for Systematic Reviews of Interventions; Version 6.3; Cochrane: London, UK, 2022; Available online: https://training.cochrane.org/handbook (accessed on 1 June 2025).
  6. Mayo, N.E.; Asano, M.; Barbic, S.P. When is a research question not a research question? J. Rehabil. Med. 2013, 45, 513–518. [Google Scholar] [CrossRef] [PubMed]
  7. Schardt, C.; Adams, M.B.; Owens, T.; Keitz, S.; Fontelo, P. Utilization of the PICO framework to improve searching PubMed for clinical questions. BMC Med. Inform. Decis. Mak. 2007, 7, 16. [Google Scholar] [CrossRef]
  8. Cooke, A.; Smith, D.; Booth, A. Beyond PICO: The SPIDER tool for qualitative evidence synthesis. Qual. Health Res. 2012, 22, 1435–1443. [Google Scholar] [CrossRef]
  9. Methley, A.M.; Campbell, S.; Chew-Graham, C.; McNally, R.; Cheraghi-Sohi, S. PICO, PICOS and SPIDER: A comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv. Res. 2014, 14, 579. [Google Scholar] [CrossRef] [PubMed]
  10. Davies, K.S. Formulating the Evidence Based Practice Question: A Review of the Frameworks. Evid. Based Libr. Inf. Pract. 2011, 6, 75–80. [Google Scholar] [CrossRef]
  11. Moher, D.; Shamseer, L.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A.; PRISMA-P Group. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015 Statement. Syst. Rev. 2015, 4, 1. [Google Scholar] [CrossRef]
  12. Booth, A.; Clarke, M.; Dooley, G.; Ghersi, D.; Moher, D.; Petticrew, M.; Stewart, L. The nuts and bolts of PROSPERO: An international prospective register of systematic reviews. Syst. Rev. 2012, 1, 2. [Google Scholar] [CrossRef]
  13. Frost, A.D.; Hróbjartsson, A.; Nejstgaard, C.H. Adherence to the PRISMA-P 2015 reporting guideline was inadequate in systematic review protocols. J. Clin. Epidemiol. 2022, 150, 179–187. [Google Scholar] [CrossRef]
  14. Page, M.J.; McKenzie, J.E.; Forbes, A. Many scenarios exist for selective inclusion and reporting of results in randomized trials and systematic reviews. J. Clin. Epidemiol. 2013, 66, 524–537. [Google Scholar] [CrossRef]
  15. McGowan, J.; Sampson, M.; Salzwedel, D.M.; Cogo, E.; Foerster, V.; Lefebvre, C. PRESS Peer Review of Electronic Search Strategies: 2015 Guideline Statement. J. Clin. Epidemiol. 2016, 75, 40–46. [Google Scholar] [CrossRef]
  16. Frandsen, T.F.; Bruun Nielsen, M.F.; Lindhardt, C.L.; Eriksen, M.B. Using the full PICO model as a search tool for systematic reviews resulted in lower recall for some PICO elements. J. Clin. Epidemiol. 2020, 127, 69–75. [Google Scholar] [CrossRef]
  17. Duyx, B.; Swaen, G.M.H.; Urlings, M.J.E.; Bouter, L.M.; Zeegers, M.P. The strong focus on positive results in abstracts may cause bias in systematic reviews: A case study on abstract reporting bias. Syst. Rev. 2019, 8, 174. [Google Scholar] [CrossRef]
  18. De Cassai, A.; Dost, B.; Karapinar, Y.E.; Beldagli, M.; Yalin, M.S.O.; Turunc, E.; Turan, E.I.; Sella, N. Evaluating the utility of large language models in generating search strings for systematic reviews in anesthesiology: A comparative analysis of top-ranked journals. Reg. Anesth. Pain Med. 2025, in press. [CrossRef]
  19. Gartlehner, G.; Affengruber, L.; Titscher, V.; Noel-Storr, A.; Dooley, G.; Ballarini, N.; König, F. Single-reviewer abstract screening missed 13 percent of relevant studies: A crowd-based, randomized controlled trial. J. Clin. Epidemiol. 2020, 121, 20–28. [Google Scholar] [CrossRef] [PubMed]
  20. Rathbone, J.; Hoffmann, T.; Glasziou, P. Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Syst. Rev. 2015, 4, 80. [Google Scholar] [CrossRef] [PubMed]
  21. Ouzzani, M.; Hammady, H.; Fedorowicz, Z.; Elmagarmid, A. Rayyan: A web and mobile app for systematic reviews. Syst. Rev. 2016, 5, 210. [Google Scholar] [CrossRef] [PubMed]
  22. Covidence Systematic Review Software. Veritas Health Innovation, Melbourne, Australia. Available online: www.covidence.org (accessed on 27 June 2025).
  23. Thomas, J.; Brunton, J.; Graziosi, S. EPPI-Reviewer 4.0: Software for Research Synthesis; EPPI Centre Software: London, UK, 2010. [Google Scholar]
  24. Aromataris, E.; Lockwood, C.; Porritt, K.; Pilla, B.; Jordan, Z. (Eds.) JBI Manual for Evidence Synthesis; JBI: Adelaide, Australia, 2024. [Google Scholar] [CrossRef]
  25. Buscemi, N.; Hartling, L.; Vandermeer, B.; Tjosvold, L.; Klassen, T.P. Single data extraction generated more errors than double data extraction in systematic reviews. J. Clin. Epidemiol. 2006, 59, 697–703. [Google Scholar] [CrossRef]
  26. De Cassai, A.; Boscolo, A.; Zarantonello, F.; Pettenuzzo, T.; Sella, N.; Geraldini, F.; Munari, M.; Navalesi, P. Enhancing study quality assessment: An in-depth review of risk of bias tools for meta-analysis—A comprehensive guide for anesthesiologists. J. Anesth. Analg. Crit. Care 2023, 3, 44. [Google Scholar] [CrossRef]
  27. Sterne, J.A.C.; Savović, J.; Page, M.J.; Elbers, R.G.; Blencowe, N.S.; Boutron, I.; Cates, C.J.; Cheng, H.Y.; Corbett, M.S.; Eldridge, S.M.; et al. RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ 2019, 366, l4898. [Google Scholar] [CrossRef]
  28. Jadad, A.R.; Moore, R.A.; Carroll, D.; Jenkinson, C.; Reynolds, D.J.; Gavaghan, D.J.; McQuay, H.J. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin. Trials 1996, 17, 1–12. [Google Scholar] [CrossRef]
  29. Sterne, J.A.; Hernán, M.A.; Reeves, B.C.; Savović, J.; Berkman, N.D.; Viswanathan, M.; Henry, D.; Altman, D.G.; Ansari, M.T.; Boutron, I.; et al. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016, 355, i4919. [Google Scholar] [CrossRef] [PubMed]
  30. Higgins, J.P.T.; Morgan, R.L.; Rooney, A.A.; Taylor, K.W.; Thayer, K.A.; Silva, R.A.; Lemeris, C.; Akl, E.A.; Bateson, T.F.; Berkman, N.D.; et al. A tool to assess risk of bias in non-randomized follow-up studies of exposure effects (ROBINS-E). Environ. Int. 2024, 186, 108602. [Google Scholar] [CrossRef] [PubMed]
  31. Stang, A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur. J. Epidemiol. 2010, 25, 603–605. [Google Scholar] [CrossRef] [PubMed]
  32. Whiting, P.F.; Rutjes, A.W.; Westwood, M.E.; Mallett, S.; Deeks, J.J.; Reitsma, J.B.; Leeflang, M.M.; Sterne, J.A.; Bossuyt, P.M.; QUADAS-2 Group. QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Ann. Intern. Med. 2011, 155, 529–536. [Google Scholar] [CrossRef]
  33. Critical Appraisal Skills Programme UK. CASP Qualitative Studies Checklist. 2024. Available online: https://casp-uk.net/casp-tools-checklists/qualitative-studies-checklist/ (accessed on 25 July 2025).
  34. Hooijmans, C.R.; Rovers, M.M.; de Vries, R.B.; Leenaars, M.; Ritskes-Hoitinga, M.; Langendam, M.W. SYRCLE’s risk of bias tool for animal studies. BMC Med. Res. Methodol. 2014, 14, 43. [Google Scholar] [CrossRef]
  35. Shea, B.J.; Reeves, B.C.; Wells, G.; Thuku, M.; Hamel, C.; Moran, J.; Moher, D.; Tugwell, P.; Welch, V.; Kristjansson, E.; et al. AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 2017, 358, j4008. [Google Scholar] [CrossRef]
  36. Slim, K.; Nini, E.; Forestier, D.; Kwiatkowski, F.; Panis, Y.; Chipponi, J. Methodological index for non-randomized studies (MINORS): Development and validation of a new instrument. ANZ J. Surg. 2003, 73, 712–716. [Google Scholar] [CrossRef]
  37. Armijo-Olivo, S.; Stiles, C.R.; Hagen, N.A.; Biondo, P.D.; Cummings, G.G. Assessment of study quality for systematic reviews: A comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: Methodological research. J. Eval. Clin. Pract. 2012, 18, 12–18. [Google Scholar] [CrossRef]
  38. Porritt, K.; Gomersall, J.; Lockwood, C. JBI’s Systematic Reviews: Study selection and critical appraisal. Am. J. Nurs. 2014, 114, 47–52. [Google Scholar] [CrossRef]
  39. Whiting, P.; Savović, J.; Higgins, J.P.; Caldwell, D.M.; Reeves, B.C.; Shea, B.; Davies, P.; Kleijnen, J.; Churchill, R.; ROBIS Group. ROBIS: A new tool to assess risk of bias in systematic reviews was developed. J. Clin. Epidemiol. 2016, 69, 225–234. [Google Scholar] [CrossRef]
  40. Downes, M.J.; Brennan, M.L.; Williams, H.C.; Dean, R.S. Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open 2016, 6, e011458. [Google Scholar] [CrossRef] [PubMed]
  41. Higgins, J.P.T.; Thompson, S.G.; Deeks, J.J.; Altman, D.G. Measuring inconsistency in meta-analyses. BMJ 2003, 327, 557–560. [Google Scholar] [CrossRef]
  42. Sterne, J.A.; Sutton, A.J.; Ioannidis, J.P.; Terrin, N.; Jones, D.R.; Lau, J.; Carpenter, J.; Rücker, G.; Harbord, R.M.; Schmid, C.H.; et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 2011, 343, d4002. [Google Scholar] [CrossRef] [PubMed]
  43. Lin, L.; Chu, H. Quantifying publication bias in meta-analysis. Biometrics 2018, 74, 785–794. [Google Scholar] [CrossRef]
  44. Campbell, M.; McKenzie, J.E.; Sowden, A.; Katikireddi, S.V.; Brennan, S.E.; Ellis, S.; Hartmann-Boyce, J.; Ryan, R.; Shepperd, S.; Thomas, J.; et al. Synthesis without meta-analysis (SWiM) in systematic reviews: Reporting guideline. BMJ 2020, 368, l6890. [Google Scholar] [CrossRef] [PubMed]
  45. Guyatt, G.H.; Oxman, A.D.; Vist, G.E.; Kunz, R.; Falck-Ytter, Y.; Alonso-Coello, P.; Schünemann, H.J. GRADE: An emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008, 336, 924–926. [Google Scholar] [CrossRef]
  46. Greenhalgh, T.; Wong, G.; Westhorp, G.; Pawson, R. Protocol—Realist and meta-narrative evidence synthesis: Evolving standards (RAMESES). BMC Med. Res. Methodol. 2011, 11, 115. [Google Scholar] [CrossRef] [PubMed]
  47. Lockwood, C.; Munn, Z.; Porritt, K. Qualitative research synthesis: Methodological guidance for systematic reviewers utilizing meta-aggregation. Int. J. Evid. Based Healthc. 2015, 13, 179–187. [Google Scholar] [CrossRef] [PubMed]
  48. Lortie, C.J.; Stewart, G.; Rothstein, H.; Lau, J. How to critically read ecological meta-analyses. Res. Synth. Methods 2015, 6, 124–133. [Google Scholar] [CrossRef] [PubMed]
  49. Haddaway, N.R.; Macura, B.; Whaley, P.; Pullin, A.S. ROSES RepOrting standards for Systematic Evidence Syntheses: Pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps. Environ. Evid. 2018, 7, 7. [Google Scholar] [CrossRef]
  50. Nakagawa, S.; Lagisz, M.; O’Dea, R.E.; Rutkowska, J.; Yang, Y.; Noble, D.W.A.; Senior, A.M. The orchard plot: Cultivating a forest plot for use in ecology, evolution, and beyond. Res. Synth. Methods 2021, 12, 4–12. [Google Scholar] [CrossRef] [PubMed]
  51. Senior, A.M.; Grueber, C.E.; Kamiya, T.; Lagisz, M.; O’Dwyer, K.; Santos, E.S.; Nakagawa, S. Heterogeneity in ecological and evolutionary meta-analyses: Its magnitude and implications. Ecology 2016, 97, 3293–3299. [Google Scholar] [CrossRef]
  52. Stewart, G. Meta-analysis in applied ecology. Biol. Lett. 2010, 6, 78–81. [Google Scholar] [CrossRef]
  53. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS Med. 2009, 6, e1000097. [Google Scholar] [CrossRef]
  54. De Cassai, A.; Tassone, M.; Geraldini, F.; Sergi, M.; Sella, N.; Boscolo, A.; Munari, M. Explanation of trial sequential analysis: Using a post-hoc analysis of meta-analyses published in Korean Journal of Anesthesiology. Korean J. Anesthesiol. 2021, 74, 383–393. [Google Scholar] [CrossRef]
Figure 1. Illustrative example of a PRISMA flowchart.
Table 1. Risk-of-bias tools.
Risk-of-Bias Tool | Study Type
Cochrane Risk of Bias 2 (RoB 2) [27] | Randomized Controlled Trials
Jadad Scale [28] | Randomized Controlled Trials
ROBINS-I (Risk Of Bias In Non-randomized Studies—of Interventions) [29] | Non-Randomized Intervention Studies
ROBINS-E (Risk Of Bias In Non-randomized Studies—of Exposures) [30] | Observational Exposure Studies (Cohort, Case-Control)
Newcastle-Ottawa Scale (NOS) [31] | Cohort and Case-Control Studies
QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) [32] | Diagnostic Accuracy Studies
CASP (Critical Appraisal Skills Programme) [33] | Qualitative Studies
SYRCLE Risk-of-Bias Tool [34] | Animal Intervention Studies
AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews) [35] | Systematic Reviews and Meta-Analyses
MINORS (Methodological Index for Non-Randomized Studies) [36] | Non-Randomized Surgical Studies
Effective Public Health Practice Project (EPHPP) Quality Assessment Tool [37] | Public Health Intervention Studies
Joanna Briggs Institute (JBI) Critical Appraisal Tools [38] | Various Study Designs
ROBIS (Risk of Bias in Systematic Reviews) [39] | Systematic Reviews
AXIS Tool [40] | Cross-Sectional Studies
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
