The included studies emphasize the pivotal role of bioinformatics tools in advancing CRISPR-Cas9 applications, including gRNA design, gene essentiality screening, and minimizing off-target effects. Tools such as MAGeCK, CERES, CRISPRFinder, and DeepCRISPR demonstrate utility in areas like off-target prediction, CRISPR array detection, and enhancing the accuracy of CRISPR screens. Databases like CRISPRdb and CRISPRCasdb enable comprehensive storage and comparison of annotated CRISPR data, while tools like CRISPRDetect and CRISPRmap classify CRISPR systems and targets. Despite their value, the studies highlight ongoing challenges, such as predicting sgRNA efficiency and addressing genomic variations, which need further research. Risk of bias was generally low-to-moderate, though concerns about standardization and data transparency were noted. The certainty of the evidence is moderate-to-high, with consistent findings across studies, despite variability in tool design, experimental setups, and limited sample sizes. No formal statistical synthesis or sensitivity analyses were conducted, but tools like MAGeCK and CRISPRDetect showed robustness under diverse conditions.
3.1. Study Selection
Bias was minimized by focusing on relevance, quality, and CRISPR-Cas9 alignment. Pre-2012 studies were excluded, and those with significant conflicts of interest were omitted. Priority was given to newer, highly cited studies in reputable journals, with conflicts addressed by including both perspectives. Author credibility was evaluated based on expertise in CRISPR-Cas9 or bioinformatics.
3.1.1. Initial Selection
The initial search in PubMed yielded over 2700 articles using specific phrases like “CRISPR-Cas9 bioinformatics”, “CRISPR computer tools”, “Artificial Intelligence CRISPR”, and other similar terms, with the goal of identifying studies directly relevant to bioinformatics resources for CRISPR-Cas9. Filters were applied to restrict the results to studies published after 2012 in English or Polish, as CRISPR-Cas9 was first described as a programmable gene-editing technology in 2012, marking a clear starting point for this area of research. This choice was important because studies before this date might reference CRISPR more broadly (CRISPR elements themselves were known before 2012) but would not address the specific CRISPR-Cas9 gene-modifying technology that is the main focus of this systematic review. Excluding pre-2012 studies therefore ensured that only articles relevant to this more recent and specific gene-editing technology were included.
The volume of results required a practical selection approach, as the review was conducted independently and without the help of automation tools. Consequently, the first 50 articles from PubMed were screened (since they were probably the most relevant and matched the most used phrases), of which 30 were deemed potentially relevant based on their titles and abstract review. During this phase, some studies were excluded due to limited access to full texts or the presence of significant conflicts of interest, but most of them were excluded because of a lack of relevant information for this review, resulting in a subset of 15 articles chosen for a more detailed examination.
From these 15 studies, 7 were ultimately chosen for inclusion as they presented detailed, relevant information about CRISPR-Cas9-specific bioinformatics tools, focusing on their applications, functionalities, or limitations. The remaining eight articles were excluded primarily for not providing specific information about bioinformatics tools; instead, they largely discussed CRISPR-Cas9 technology’s overall applications and predictions or detailed physical tools rather than the computational tools that are the focus of this review.
Additional records were identified by checking the references of the articles initially found (30 in total, with 21 finally included). After screening these references, those already included in the article were selected for direct citation, as they were relevant and frequently referenced in the chosen articles. This approach ensured the inclusion of pertinent sources directly linked to the specific article information.
To enhance transparency and clarity in reporting, a PRISMA-style flow diagram (Figure 1) is included to visually represent each stage of the study selection process, offering a concise overview of the steps from the initial article search through the final selection of studies included in this review. This diagram illustrates how the initial thousands of articles were systematically narrowed to a small, manageable number.
3.1.2. Minimizing Risk of Bias
To minimize the risk of bias, this study’s selection process was strict, focusing on article relevance, quality, and alignment with CRISPR-Cas9 bioinformatics tools. Studies published before 2012 were excluded to ensure that the findings aligned with CRISPR-Cas9 technology, which was first described as a gene-editing tool in that year. This distinction helped avoid research that might reference early CRISPR knowledge but lacked a clear focus on CRISPR-Cas9’s gene-editing applications. In addition, journals rooted in fields entirely unrelated to bioinformatics, genetics, molecular biology, computational biology, or computer science were not selected, as they would not align with the scientific rigor required for this review.
Studies were further assessed for conflicts of interest on a case-by-case basis, especially if the relevance of the article or study to the review’s objectives was high. In cases where conflicts of interest existed but were minor or involved a small proportion of researchers and/or authors, the study was generally included. However, if conflicts involved a large percentage of the authors or if the conflicts were considered substantial, the study was excluded. This selective approach allowed relevant research to be included while minimizing the potential bias from undue conflicts of interest, ensuring a high standard of scientific focus and quality.
To reconcile conflicting findings across studies, several criteria were employed. Studies that were more recent, published in high-impact journals, or had higher citation counts were often prioritized. Given CRISPR-Cas9 technology’s relatively recent development, newer studies were likely to provide more accurate information and reflect current technology standards. In cases where two studies met these requirements similarly, both were included to illustrate the conflict, with an indication that additional research may be required for clarification. For further context, popular science sources (like YouTube videos or scientific blogs and websites) were sometimes consulted when disagreements were found, though these sources were not given the same weight as primary scientific research.
Finally, authors’ backgrounds were occasionally reviewed to confirm expertise in CRISPR-Cas9 and bioinformatics fields. Authors with a consistent publication history in this area were assumed to have higher credibility than those new to the field, contributing an additional layer of reliability. To summarize, this risk-of-bias assessment applied rigorous standards, balancing the inclusion of highly relevant studies with quality indicators like citation counts, recency, and author credibility.
3.2. Study Characteristics
The included studies highlight the critical role of bioinformatics tools in optimizing CRISPR-Cas9 technology for applications like gRNA design, essential gene identification, and reducing off-target effects. Tools such as MAGeCK, CERES, and SSC improve the efficiency of CRISPR systems. While these studies provide valuable insights, they also note limitations, including challenges in predicting sgRNA efficiency, off-target effects, and genomic variations. These ongoing issues emphasize the need for continued research and development of more accurate bioinformatics tools.
Alkhnbashi et al. [4] present a comprehensive overview of CRISPR-Cas systems and detail the structural components of these systems, including a repeat-spacer array, a leader sequence, and various cas genes that encode proteins crucial for processing genetic information. CRISPR-Cas systems perform three essential functions: adaptation, involving the incorporation of foreign genetic material; the biogenesis of CRISPR RNAs (crRNAs) for guidance; and interference, focusing on degrading invading DNA or RNA. The article discusses various bioinformatics tools designed to predict the presence of these systems by identifying cas genes and CRISPR arrays, utilizing tailored approaches because of the unique features of these components. Several tools, such as CRISPRFinder, PILER-CR, and CRISPRDetect, are highlighted for their efficiency in identifying CRISPR arrays based on direct repeat (DR) sequences, local self-alignments, and regex searches, respectively. Moreover, the article explores databases like CRISPRdb and CRISPRCasdb, which provide extensive information on CRISPR arrays and Cas annotations, crucial for understanding the genomic context and evolutionary history of CRISPR systems.
The primary objective of Naeem and Alkhnbashi [6] is to review and evaluate bioinformatics tools designed to optimize CRISPR/Cas9 experiments, focusing on reducing off-target effects that can lead to unintended mutations. This review systematically analyzes literature on gRNA design, delivery methods, off-target detection techniques, and post-experimental analysis tools. Key bioinformatics tools discussed include CHOPCHOP, Cas-OFFinder, and CRISTA for optimizing gRNA sequences, as well as off-target detection methods such as MOFF, TIDE, and CRISPResso2. While the review synthesizes findings from multiple studies, it acknowledges limitations in detecting INDELs and in accounting for genetic variability among organisms, underscoring the need for tools that integrate this variability for better assessments. The study is highly relevant to the systematic review, as it encapsulates the current landscape of bioinformatics tools for CRISPR/Cas9 applications and contributes to understanding the effectiveness and challenges associated with CRISPR technologies.
Li et al. [7] develop and validate the MAGeCK algorithm for identifying essential genes from genome-scale CRISPR/Cas9 knockout screens. MAGeCK improves sensitivity and controls the false discovery rate (FDR) through a methodology that includes median-normalizing read counts and applying a negative binomial model. The study analyzes data from three distinct CRISPR/Cas9 knockout experiments and highlights MAGeCK’s superior performance compared with existing tools. The research significantly contributes to understanding bioinformatics tools in CRISPR-Cas9 applications, demonstrating MAGeCK’s effectiveness in enhancing the identification of essential genes and pathways.
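The median normalization step mentioned above can be illustrated with a minimal sketch. It assumes a median-of-ratios form (each sample's size factor is the median ratio of its counts to the per-sgRNA geometric mean across samples); the function names and the toy data are hypothetical, and this is not MAGeCK's own code.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Return one size factor per sample.

    counts: dict mapping sample name -> list of read counts,
    one entry per sgRNA, in the same sgRNA order for every sample.
    """
    samples = list(counts)
    n_guides = len(counts[samples[0]])

    # Geometric mean of each sgRNA across samples; sgRNAs with a zero
    # count in any sample are skipped (an assumption of this sketch).
    usable = []
    for i in range(n_guides):
        row = [counts[s][i] for s in samples]
        if all(c > 0 for c in row):
            gmean = math.exp(sum(math.log(c) for c in row) / len(row))
            usable.append((i, gmean))

    # Size factor = median ratio of a sample's counts to the geometric means.
    return {s: median(counts[s][i] / g for i, g in usable) for s in samples}


def median_normalize(counts):
    """Divide every count by its sample's size factor."""
    factors = median_ratio_size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy example: sample "b" was sequenced twice as deeply as sample "a".
toy = {"a": [10, 20, 30], "b": [20, 40, 60]}
normalized = median_normalize(toy)
```

After normalization the two toy samples have the same values per sgRNA, which is the point of the step: differences in sequencing depth are removed before any statistical model is applied.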
Xu et al. [8] identify sequence features that enhance the efficiency of single-guide RNA (sgRNA) in CRISPR applications. The researchers developed the Spacer Scoring for CRISPR (SSC) software package (version SSC0.1), which analyzes genomic sequences to predict sgRNA efficiency based on specific nucleotide compositions. Key findings reveal that the Protospacer Adjacent Motif (PAM) and nucleotide compositions significantly influence sgRNA performance. While the study provides valuable insights into optimizing sgRNA design, it also notes limitations, as approximately 40% of inefficient sgRNAs remain unpredictable.
Meyers et al. [9] investigate the identification of essential genes for cancer cell proliferation using CRISPR-Cas9 technology. The primary bioinformatics tool introduced is CERES, designed to correct for the copy number effect in CRISPR-Cas9 screens. CERES enables unbiased interpretation of gene dependency and enhances the recall of essential genes necessary for cancer cell survival. The study employs robust data collection processes and validation methods, highlighting the critical role of bioinformatics tools in refining CRISPR-Cas9 essentiality screens. The findings contribute to understanding how computational corrections can mitigate the impact of genomic confounding factors, improving the accuracy of essential gene identification in cancer research.
The importance of effective sgRNA design for successful genetic manipulation is emphasized by Doench et al. [10]. Researchers delivered sgRNAs via lentiviral vectors into mouse and human cells and evaluated sgRNA efficacy across diverse gene targets. The study developed predictive models based on specific nucleotide preferences observed in active sgRNAs, optimizing sgRNA library design for improved success rates in gene editing. However, limitations include variability in sgRNA activity across different contexts, affecting the reliability of predictions regarding sgRNA efficacy.
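Position-specific nucleotide preferences of the kind described above are commonly presented to a classifier as one-hot features. The sketch below shows only that encoding step; it is an illustrative assumption, not the feature set used by Doench et al., and the function name is hypothetical.

```python
NUCLEOTIDES = "ACGT"


def one_hot_features(sgrna):
    """Encode an sgRNA as position-specific one-hot features.

    Each position contributes four 0/1 values (A, C, G, T), so a
    20-nt guide yields 80 features.
    """
    seq = sgrna.upper()
    if any(base not in NUCLEOTIDES for base in seq):
        raise ValueError("sgRNA must contain only A, C, G, or T")
    features = []
    for base in seq:
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


# Hypothetical 20-nt guide used only for illustration.
example = one_hot_features("GACGTTACCGGATTCAGTCA")
```

A feature vector like this could then be passed to any standard classifier; which positions carry predictive weight is an empirical result of the cited study, not something this sketch reproduces.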
Joung et al. [11] explore high-throughput genetic perturbation technologies for understanding gene function and epigenetic regulation. The primary objective is to show how the CRISPR-Cas9 system, particularly the use of dCas9 for transcriptional activation, can enhance our ability to manipulate gene expression effectively. By employing pooled sgRNA libraries for simultaneous gene perturbation, the study allows for loss-of-function (LOF) and gain-of-function analyses across various conditions and cell types. The research provides valuable insights into the application of CRISPR-Cas9 technology for genome-scale screening, contributing to the understanding of gene function and regulation. It aligns with the objectives of the systematic review by showcasing the advancements and effectiveness of bioinformatics tools in genetic research.
The following is a summary of the included articles:
The articles collectively underscore the important role of bioinformatics tools in optimizing CRISPR-Cas technology for various applications, including gRNA design, essential gene identification, and off-target effect reduction.
Each study presents unique methodologies and tools, such as MAGeCK, CERES, and SSC, enhancing the understanding of CRISPR systems and improving the efficiency of genetic manipulations.
Limitations across the studies highlight ongoing challenges in accurately predicting sgRNA efficiency, off-target effects, and genomic variations, emphasizing the need for continued research and tool development in this evolving field.
3.3. Risk of Bias in Studies
Overall, the articles demonstrate a low-to-moderate risk of bias, primarily benefiting from reputable funding sources. While many use established methodologies, some show minor subjectivity due to a lack of standardized criteria or data transparency. Concerns about generalization arise from limited data sources and potential conflicts of interest. Despite these issues, most provide a solid foundation for their conclusions.
The work of Alkhnbashi et al. [4] shows a low-to-moderate risk of bias. It received funding from the DFG (SPP 2141 grants VO 1450/6-1 and BA 2168/23-1), an independent, non-commercial research organization, which minimizes bias from funding sources. Since the article focuses on summarizing available bioinformatics tools without direct testing, there is no bias in sample or data selection. However, tool comparisons lack standardized criteria, which could introduce minor subjectivity. Although some limitations are mentioned, such as availability and processing speed, other tools are not critically assessed, likely due to a genuine lack of documented limitations. This omission slightly affects transparency, though overall, the article maintains a low risk of bias due to its descriptive approach and neutral funding.
Naeem and Alkhnbashi [6] draw on both internal and external data sources in reviewing a range of bioinformatics tools designed for sgRNA design and off-target detection. This integration of data sources likely minimizes bias by encompassing a broader perspective on tool efficacy; however, because the study does not provide a detailed breakdown of individual data origins, there is a potential for minor bias due to limited transparency. As the article is primarily a review and not a primary research study, it does not include original experimental data, which reduces risks associated with data collection bias but also limits the depth of validation evidence for each bioinformatics tool. Funded by King Fahd University of Petroleum and Minerals, the study appears to carry a low risk of bias related to financial or institutional influence, as no direct affiliation with the tools assessed is evident. Overall, the absence of detailed primary data and sample information, combined with the broad review format, suggests a low risk of bias, though transparency in data origins could be improved.
The study by Li et al. [7] shows a moderate risk concerning source reliability, as it used a mix of open-access and restricted datasets, making most of its data reasonably accessible to researchers and students. For algorithm validation, the risk is moderate-to-high because the study developed a new tool but did not apply external validation or test on independent datasets, which may limit the generalizability of its results. There is a possible risk associated with data limitations, as the completeness of the data and assumptions during analysis were not clearly specified. Finally, the funding and conflicts of interest are assessed to be at a low risk, with no competing interests declared by the authors and funding from reputable sources such as the NIH and the Dana–Farber Cancer Institute, suggesting minimal commercial influence.
A low risk of bias is present in [8], as it incorporates both well-known experimental techniques, such as Western blotting, and reliable databases. No significant data limitations were found, showing a comprehensive analysis. The study also includes validation through multiple datasets and experimental verification, further supporting its reliability. There is minimal risk of bias due to funding, as the project received support from reputable organizations like the NIH and NSF, with no direct corporate affiliations to the developed software, SSC. The software is openly available via SourceForge (https://sourceforge.net/, accessed on 24 April 2025), enhancing transparency.
The work of Meyers et al. [9] shows a low risk of bias because of its use of reputable data sources, including the Cancer Cell Line Encyclopedia (CCLE) and well-established algorithms such as PoolQ for data deconvolution and Bowtie for genome mapping. The incorporation of open-source software (Eigen 3.3 and the ‘RcppEigen’ package) and the availability of code for replication further enhance the reliability of the findings. No significant data limitations were identified; the study’s thoroughness in data management and processing suggests it properly addressed potential issues that could affect results. However, the financial support from several grants and the Slim Initiative for Genomic Medicine, along with the involvement of author W. C. Hahn, who reported receiving a commercial research grant from Novartis and serving as a consultant for this company, introduces a moderate risk of bias, as such commercial interests may influence research focus or interpretation. Other authors disclosed no potential conflicts of interest.
A moderate risk of bias is found in [10]. A predictive model for sgRNA activity is developed using a logistic regression classifier trained on data from multiple mouse and human genes, employing reputable sequence features for activity predictions and cross-validating its model across different genes. However, the reliance on data from a specific human melanoma cell line may limit the generalizability of the findings. Although extensive datasets were used, potential biases may arise from the specific characteristics of the genes selected for analysis and the limitations of the datasets used for validation, particularly in off-target predictions. The authors declared no competing financial interests, reducing the risk related to financial incentives, although funding from multiple NIH institutes may influence research directions. Notably, the model was validated using a substantial independent dataset of sgRNAs, enhancing the robustness of the findings, and the provision of a web tool for sgRNA scoring facilitates accessibility and potential reproducibility.
The study by Joung et al. [11] presents a moderate risk of bias for several reasons. It uses well-established genomic screening methods and the GeCKO v2 and SAM libraries, which are designed to minimize off-target activity and prioritize sgRNAs with fewer potential off-target sites. The scripts and libraries used are publicly available from sources including Addgene and the Zhang laboratory. However, while the study outlines a good design for sgRNA libraries and screening analysis methods, it does not mention potential limitations in the datasets or biases introduced by the experimental design, such as the conditions under which the screening was conducted. Also, the authors acknowledge financial support from various institutions and disclose competing financial interests, particularly noting that F. Z. is a founder and advisor for Editas Medicine and Horizon Discovery. These relationships could influence the interpretation of the results and the emphasis placed on the findings related to the tools and methods discussed.
3.4. Results of Individual Studies
The CRISPR-Cas9 bioinformatics tools presented in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 cover a range of functions essential for gene-editing experiments, from CRISPR array detection to off-target effect prediction and gRNA design. Tools like CRISPRmap and CRISPRtarget aid in CRISPR system classification and target identification, while prediction tools such as CRISPRFinder and PILER-CR detect CRISPR arrays in genomes. Databases like CRISPRdb and CRISPRCasdb store annotated CRISPR data for various organisms, facilitating comparison and analysis. Off-target detection tools, including MOFF and DeepCRISPR, enhance editing specificity, and gRNA design tools like Cas-OFFinder optimize on-target and off-target outcomes. Finally, tools like MAGeCK and CERES improve the accuracy of CRISPR screens, enabling better experimental design and gene analysis.
Repair Outcome Prediction Tools are bioinformatics resources that help anticipate DNA repair outcomes through non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ), and homology-directed repair (HDR) pathways. These tools are primarily used to forecast the success of gene knock-out and knock-in strategies, aiding researchers in experimental design. Leveraging machine learning, they predict repair patterns based on the genetic context and repair pathway, providing insights tailored to various cell lines and organisms. This adaptability makes them versatile for applications across diverse genetic research settings.
CERES is a sophisticated tool designed to enhance the accuracy of gene dependency analysis from CRISPR-Cas9 essentiality screens, particularly by accounting for the antiproliferative effects unrelated to gene knockout [9]. By decoupling gene-dependent effects from those induced by DNA cleavage, CERES minimizes false positives, improving the reliability of cancer vulnerability assessments. Its large-scale application across 342 cancer cell lines from diverse lineages broadens the dataset, providing robust insights into gene essentiality. Although CERES offers high-quality predictions validated through ROC analysis and independent datasets, its complexity may demand considerable computational resources and high-quality input data for optimal results.
The bioinformatics tools in CRISPR-Cas9 research support crucial steps in designing, executing, and analyzing gene-editing experiments [10]. The sgRNA Target Site Identification Tool facilitates targeted gene editing by pinpointing sgRNA sites within the NGG PAM context in human and mouse genomes, incorporating exonic and intronic sequences to allow customized, species-specific sgRNA design. Meanwhile, the Data Processing Pipeline for Illumina Sequencing offers a streamlined workflow for measuring sgRNA abundance across various experimental conditions, employing normalization (e.g., “Reads per Million”) and stringent filtering to ensure dataset quality, although this can sometimes result in valuable sgRNAs being excluded from analysis. Enhancing precision further, the sgRNA Activity Predictive Model applies machine learning to evaluate sgRNA efficacy based on nucleotide features, using SVM and logistic regression techniques to generate robust predictions; however, this complexity demands substantial computational power and expertise.
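Identifying candidate sgRNA sites in an NGG PAM context amounts to scanning both strands for a 20-nt protospacer immediately followed by NGG. The following is a minimal sketch of that idea; the function names, the 20-nt default, and the toy sequence are assumptions for illustration and do not reproduce the tool described in [10].

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq, guide_len=20):
    """List (strand, start, protospacer, pam) for every NGG site.

    `start` is the 0-based offset on the strand that was scanned, so
    minus-strand offsets refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Toy sequence: 20 A's followed by a TGG PAM.
sites = find_ngg_sites("A" * 20 + "TGG")
```

Real design tools additionally filter these candidates by predicted activity and off-target risk; this sketch covers only the enumeration step.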
For off-target assessment, the Off-Target Scoring Tool is essential, drawing on external databases to assess potential unintended effects of sgRNAs and enhance the safety of gene-editing experiments. While these tools are invaluable for gauging specificity, their predictions still often require experimental validation to confirm off-target risks. Finally, Data Normalization and Analysis Tools are instrumental in ensuring that pooled screening data are both standardized and insightful, using log-fold change calculations to evaluate sgRNA effectiveness while normalizing for variability in gene expression. This enhances the reliability of cross-comparisons but requires careful calibration of normalization parameters to prevent bias. Together, these tools provide a robust suite for managing the complexities inherent in CRISPR-Cas9 gene-editing projects, from initial target site identification to final data analysis.
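The “Reads per Million” normalization and the log-fold change calculation mentioned in this section can be sketched as follows. The pseudocount of 1 and all names are assumptions made for the example, not parameters reported by the cited study.

```python
import math


def reads_per_million(counts):
    """Scale raw sgRNA read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {guide: c * 1_000_000 / total for guide, c in counts.items()}


def log2_fold_change(before, after, pseudocount=1.0):
    """Per-sgRNA log2 fold change between two conditions.

    Counts are RPM-normalized first; the pseudocount (an assumption of
    this sketch) avoids division by zero for sgRNAs with no reads.
    """
    rpm_before = reads_per_million(before)
    rpm_after = reads_per_million(after)
    return {
        guide: math.log2((rpm_after[guide] + pseudocount)
                         / (rpm_before[guide] + pseudocount))
        for guide in before
    }


# Toy counts for two hypothetical sgRNAs before and after selection.
lfc = log2_fold_change({"sg1": 500, "sg2": 500}, {"sg1": 250, "sg2": 750})
```

In this toy case sg1 is depleted (log2 fold change close to -1) and sg2 is enriched, which is the kind of signal a pooled screen analysis ranks sgRNAs by.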
Makarova’s method [12] classifies CRISPR-Cas systems into two classes, six types, and nineteen subtypes, based on the architecture of Cas loci and protein families. It achieves 99.8% accuracy, misclassifying only 4 out of 1942 loci, making it highly reliable for large-scale genomic classification. However, it requires extensive computational resources and expertise, limiting its accessibility. In contrast, CRISPRmap [13] classifies CRISPR arrays based on direct repeats (DRs), identifying 33 structural motifs and 40 sequence families from over 3500 repeats. It is a user-friendly web-based tool, making it accessible for researchers without deep bioinformatics knowledge. While it focuses on repeat sequences rather than entire CRISPR-Cas systems, it is ideal for evolutionary studies and metagenomic data analysis. CASPERpam, used to predict PAM sequences, processes 720,391 spacers from prokaryotic genomes and identifies 26,364 hits (3.7%). It is efficient for large-scale data but shows variable accuracy depending on the CRISPR class, with 12 predicted PAMs showing 7 matches or strong correlations to experimental data. Its reliance on computational predictions requires experimental validation for more reliable results.
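The two percentages quoted above follow directly from the reported counts. A short check, using only the numbers given in the text (variable names are illustrative):

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# 4 of 1942 loci misclassified -> share classified correctly.
makarova_accuracy = percent(1942 - 4, 1942)

# 26,364 hits among 720,391 spacers.
casperpam_hit_rate = percent(26_364, 720_391)

print(f"Classification accuracy: {makarova_accuracy:.1f}%")  # 99.8%
print(f"CASPERpam hit rate: {casperpam_hit_rate:.1f}%")      # 3.7%
```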
CRISPRdb [14], CRISPRCasdb, CRISPRone [15], and Anti-CRISPRdb [16] are key databases supporting CRISPR-Cas research, each with distinct focus areas and dataset sizes. CRISPRdb catalogs 870 CRISPR arrays from 232 archaeal and 8069 arrays from 6782 bacterial genomes, primarily offering CRISPR array information with monthly automatic updates and user-friendly features like BLAST-based private database comparison. CRISPRCasdb builds on this by integrating Cas protein annotations, covering a slightly broader dataset (240 archaeal and 9242 bacterial genomes), making it more suitable for functional studies involving both CRISPR arrays and Cas genes.
In contrast, CRISPRone prioritizes accuracy and large-scale analysis, identifying CRISPR-Cas systems in 11,102 complete and 21,186 draft genomes. It offers advanced tools like MetaCRT and HMMER for precise array and Cas protein prediction, and uniquely addresses false CRISPR arrays (mock CRISPRs), making it ideal for metagenomic and incomplete genome data. Anti-CRISPRdb is more specialized, focusing solely on anti-CRISPR (Acr) proteins. Though smaller (106 non-redundant sequences in 6 main and 23 sub-families), its manual curation supports high accuracy, and its search, screen, and download functions make it valuable for research on CRISPR inhibition.
CRISPRFinder [17] and CRISPRCasFinder [18] are widely used, detecting repeats 23–55 nt long with >80% similarity and discarding arrays if spacer similarity exceeds 60%. CRISPRCasFinder adds an evidence rating (levels 1–4) to assess prediction quality. Both tools are user-friendly, available via web and command line. PILER-CR uses self-alignment and graph-based repeat clustering, filtering repeats with <90% conservation. It is efficient but less intuitive for users. CRT, focused on low memory use, scans with a sliding window and uses exact string matching. It is Java-based, platform-independent, and offers both GUI and CLI, making it practical for resource-limited systems. CRISPRDetect uses regex and local alignment, refining arrays by extending repeats. It filters tandem repeats and is moderately efficient, but offers only command-line and web access. CRISPRdisco [19] combines Cas gene and CRISPR detection using MinCED and curated Cas profiles. It supports complete system classification but lacks a web interface.
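The CRISPRFinder-style thresholds quoted above (repeat length 23–55 nt, repeat similarity above 80%, spacer similarity no higher than 60%) can be expressed as a simple candidate filter. This is only a sketch: the similarity measure used here, `difflib.SequenceMatcher.ratio`, is a stand-in chosen for the example and is not the metric those tools implement, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity score in [0, 1] (an assumption of this sketch)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def passes_array_filters(repeats, spacers,
                         min_len=23, max_len=55,
                         min_repeat_sim=0.80, max_spacer_sim=0.60):
    """Apply the length and similarity thresholds to one candidate array."""
    # Every repeat must fall inside the allowed length window.
    if not all(min_len <= len(r) <= max_len for r in repeats):
        return False
    # Repeats must be more than 80% similar to one another.
    if any(similarity(a, b) <= min_repeat_sim
           for a, b in combinations(repeats, 2)):
        return False
    # Spacers that are too similar suggest a tandem repeat, not a CRISPR array.
    if any(similarity(a, b) > max_spacer_sim
           for a, b in combinations(spacers, 2)):
        return False
    return True


# Toy candidate: three identical 28-nt repeats and three dissimilar spacers.
toy_repeat = "ACGT" * 7
ok = passes_array_filters([toy_repeat] * 3, ["A" * 30, "C" * 30, "G" * 30])
```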
For orientation prediction, CRISPRstrand [20] applies a graph-kernel machine learning model, reaching 0.95 AUC ROC (vs. 0.88 from older tools), improving repeat classification and downstream analysis. CRASS [21] and metaCRISPR [22] target unassembled metagenomic reads. CRASS achieves 0.89 spacer order sensitivity and 0.99 specificity, scales linearly with read count, and handles 76–2000 bp reads. metaCRISPR uses 13.6 GB RAM for 271 million reads (150 bp), running in 108 min on four threads, offering clustering and contig assembly, suited for complex metagenomes. CRISPRleader [23] finds leader sequences using clustering and string kernels; it supports visualization and outputs in HTML and BED. CRISPRtionary [24], a web tool, compares CRISPR loci across strains and outputs binary spacer matrices for phylogenetics.
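A binary spacer matrix of the kind mentioned above records, for each strain, whether each spacer is present (1) or absent (0). The sketch below builds such a matrix from hypothetical input; it illustrates the data structure only and is not CRISPRtionary's implementation.

```python
def binary_spacer_matrix(strain_spacers):
    """Build a presence/absence matrix of spacers across strains.

    strain_spacers: dict mapping strain name -> list of spacer sequences.
    Returns (columns, matrix), where columns is the sorted list of all
    distinct spacers and matrix maps each strain to a list of 0/1 values.
    """
    columns = sorted({s for spacers in strain_spacers.values() for s in spacers})
    matrix = {}
    for strain, spacers in strain_spacers.items():
        present = set(spacers)
        matrix[strain] = [1 if s in present else 0 for s in columns]
    return columns, matrix


# Two hypothetical strains sharing one spacer.
columns, matrix = binary_spacer_matrix({
    "strain_1": ["AAA", "CCC"],
    "strain_2": ["CCC", "GGG"],
})
```

Rows of such a matrix can be compared pairwise (for example, by counting differing positions) to feed a distance-based phylogenetic method.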
DeepCRISPR uses deep learning to predict on- and off-target effects with improved accuracy, while MOFF incorporates mismatch tolerance and epigenetics, outperforming previous models, though exact accuracy figures are not provided. In unbiased detection, GUIDE-seq is favored for its low cost, low false-positive rate, and broad applicability. GUIDE-Tag improves this with Tn5 tagging to reduce PCR bias, achieving sensitivity for off-target effects at
editing efficiency. PEM-Seq, combined with LAM-HTGTS, is highly sensitive for detecting genomic translocations. For prime editing, PEAC-Seq and TAPE-Seq [25] offer genome-wide detection, with TAPE-Seq providing live-cell analysis and both on- and off-target activity detection.
Guide RNA (gRNA) design tools optimize CRISPR/Cas9 efficiency and specificity. CHOPCHOP [26] is an early tool supporting over 100 organisms but lacks advanced features. Cas-OFFinder, integrated into CRISPR RGEN, uses machine learning for better off-target detection. CRISTA supports 100+ organisms but is limited to SpCas9, while GuideScan focuses on mouse and human genomes, enhancing precision. CRISPRDo is versatile, supporting both Cas9 and Cpf1, and is effective for zebrafish, mice, and humans, incorporating epigenetic factors for better accuracy. sgRNACas9 targets the mouse genome and quantifies off-target effects. For plants, CRISPR-P is popular, while PhytoCRISPR and CRISPRz focus on plant genome editing. CRISPOR is the most versatile, supporting 30+ Cas variants and providing detailed off-target analysis. PnB Designer specializes in base and prime editing. Tools like CRISPOR and CRISPRDo excel at off-target prediction. Numerical data show that truncating gRNA from 20 to 17 nucleotides can reduce off-target effects by 500-fold, and the optimal GC content for gRNAs is 40–60%. Overall, CRISPOR and CRISPRDo are highly versatile and accurate, while others like CRISTA and GuideScan are specialized but effective within specific domains.
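The two numerical design rules quoted above (a GC content of 40–60% and truncation from 20 to 17 nucleotides) are easy to check programmatically. The sketch below is illustrative only: the function names are hypothetical, and trimming from the 5′ end is an assumption about how the truncation is applied.

```python
def gc_fraction(guide):
    """Fraction of G and C bases in a guide sequence."""
    seq = guide.upper()
    if not seq:
        raise ValueError("guide must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def in_gc_window(guide, low=0.40, high=0.60):
    """True if the guide's GC content lies within the 40-60% window."""
    return low <= gc_fraction(guide) <= high


def truncate_guide(guide, length=17):
    """Keep the last `length` bases, dropping bases from the 5' end.

    Trimming the 5' end is an assumption of this sketch.
    """
    return guide[-length:]


# Hypothetical 20-nt guide with 50% GC content.
guide = "GACGTTACCGGATTCAGTCA"
usable = in_gc_window(guide)
short = truncate_guide(guide)
```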
Cas9-based GeCKO v2 libraries are more accurate and efficient than shRNA [
27]. GeCKO screens target 19,050 human or 20,611 mouse genes with six sgRNAs per gene and also include non-targeting controls. These libraries identify more lethal genes, with lower false negative rates compared with shRNA. GeCKO’s ability to pinpoint essential genes is particularly beneficial for loss-of-function (LOF) screening, where complete gene inactivation is crucial. In contrast, shRNA often fails to achieve full gene knockdown, leading to less reliable results. Despite its advantages, Cas9 has limitations. In cancer genomes, Cas9-induced double-strand breaks (DSBs) can cause false positives due to gene-independent DNA damage [
28]. In addition, targeting 5’ exons may produce in-frame variants that retain partial gene function, masking true genetic dependencies. For gene activation, the SAM (Synergistic Activation Mediator) library targets the 200 bp region upstream of the transcriptional start site of 23,430 human or 23,439 mouse genes, with three sgRNAs per isoform. SAM requires additional effectors in a two- or three-vector system for activation, allowing precise transcriptional control. Both the GeCKO and SAM libraries prioritize sgRNAs with minimal off-target effects, optimizing the specificity and efficiency of the screening process. Custom sgRNA libraries can be designed using a Python 2.7 script [
11] to minimize off-target activity, which is crucial for reliable results.
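The library sizes implied by these figures, and the idea of keeping the guides with the least predicted off-target activity per gene, can be sketched as follows. This is an illustration in modern Python, not the Python 2.7 script cited above. The `Candidate` record, its `off_target_score` field (lower meaning fewer predicted off-targets), and the example data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    gene: str
    sequence: str
    off_target_score: float  # hypothetical score: lower = fewer predicted off-targets


def targeting_guides(n_genes: int, per_gene: int) -> int:
    """Number of gene-targeting sgRNAs, excluding non-targeting controls."""
    return n_genes * per_gene


def pick_guides(candidates: Iterable[Candidate], per_gene: int = 6) -> Dict[str, List[Candidate]]:
    """Keep the `per_gene` candidates with the lowest off-target score for each gene."""
    by_gene: Dict[str, List[Candidate]] = defaultdict(list)
    for cand in candidates:
        by_gene[cand.gene].append(cand)
    return {
        gene: sorted(group, key=lambda c: c.off_target_score)[:per_gene]
        for gene, group in by_gene.items()
    }


# Six sgRNAs per gene, as stated for GeCKO v2.
human_guides = targeting_guides(19_050, 6)  # 114,300
mouse_guides = targeting_guides(20_611, 6)  # 123,666

# Hypothetical candidates for one gene, keeping the best two.
pool = [
    Candidate("GENE1", "GACGTTAGCCTAGGCTAACG", 0.30),
    Candidate("GENE1", "TTAGGCATCGATCGGATCCA", 0.05),
    Candidate("GENE1", "CCGATAGGCTTAACGGATCG", 0.12),
]
selected = pick_guides(pool, per_gene=2)
print(human_guides, mouse_guides, [c.sequence for c in selected["GENE1"]])
```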
3.5. Results of Syntheses
This synthesis analyzes various bioinformatics tools for CRISPR-Cas9 applications, focusing on tools for CRISPR array detection, Cas gene identification, PAM prediction, off-target prediction, and gene essentiality screening. The studies highlight tools like CRISPRFinder, MAGeCK, and CRISPRdb, which employ diverse methods such as regex searches, machine learning, and graph-based analysis. No formal statistical synthesis was performed, but variability in tool performance across experimental conditions was observed, with tools like MAGeCK and CRISPRDetect showing robustness in different contexts. Sources of heterogeneity included differences in tool design and experimental setups. While no sensitivity analyses were conducted, variations in tool design were noted as affecting result robustness. Reporting biases were not evident, and the certainty of evidence is considered moderate-to-high, with consistent findings across studies despite some limitations like narrow sample sizes and potential biases in self-reported tool comparisons.
3.5.1. Summary of Study Characteristics and Risk of Bias
The included studies cover tools for CRISPR array detection, Cas gene identification, PAM prediction, off-target prediction, and gene essentiality screening. Key tools and databases include CRISPRFinder, CRISPRCasFinder, PILER-CR, CRISPRDetect, and CRISPRdisco, among others. These tools rely on methods such as regex searches, graph-based analysis, and machine learning to detect and analyze CRISPR systems across genomes, and the studies collectively reflect a range of approaches to improving CRISPR/Cas9 specificity and efficiency. For instance, CRISPRdb and CRISPRCasdb focus on CRISPR loci and anti-CRISPR protein data, while MAGeCK identifies essential genes in CRISPR screens. No formal statistical synthesis was conducted because the tools were evaluated descriptively, but each study emphasizes the strengths and intended applications of its tools in diverse genomic contexts. The risk of bias across the studies is generally low-to-moderate, although variability exists due to the use of different data sources and experimental conditions.
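To make the regex-based approach concrete, the toy Python sketch below finds a fixed-length direct repeat that recurs at least three times with short spacers in between. It is a deliberately simplified illustration under assumed parameters (repeat length, spacer length range, minimum repeat count) and is not the algorithm of CRISPRFinder or any other tool named here. The test sequence is synthetic and far shorter than a real CRISPR array.

```python
import re
from typing import List, Optional, Tuple


def find_array(seq: str, repeat_len: int = 8, spacer_min: int = 5,
               spacer_max: int = 7, min_repeats: int = 3
               ) -> Optional[Tuple[int, str, List[str]]]:
    """Return (start, repeat, spacers) for the first repeat-spacer array found."""
    # Group 1 captures a candidate repeat; the backreference \1 requires the
    # same repeat to recur after each spacer of the allowed length.
    pattern = re.compile(
        r"([ACGT]{%d})(?:[ACGT]{%d,%d}\1){%d,}"
        % (repeat_len, spacer_min, spacer_max, min_repeats - 1)
    )
    match = pattern.search(seq.upper())
    if match is None:
        return None
    repeat = match.group(1)
    # Splitting the matched array on the repeat leaves the spacers.
    spacers = [part for part in match.group(0).split(repeat) if part]
    return match.start(), repeat, spacers


# Synthetic sequence: flank + (repeat, spacer, repeat, spacer, repeat) + flank.
toy = "TTTT" + "GGATCCAA" + "ACGTA" + "GGATCCAA" + "TGCATG" + "GGATCCAA" + "CCCC"
result = find_array(toy)
print(result)
```

Real repeats are imperfect and vary in length, which is one reason dedicated tools combine pattern matching with scoring and filtering steps.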
3.5.2. Statistical Syntheses Results
No formal meta-analysis or statistical synthesis was performed across the studies, as most tools were evaluated descriptively rather than compared quantitatively. However, the studies implicitly address variability in tool performance across different experimental conditions. For instance, tools like MAGeCK were tested in different cell types, showing their robustness in identifying essential genes. Similarly, tools like CRISPRDetect and CRISPRdisco were highlighted for their refinement in CRISPR array detection, though they vary in their precision based on the tools’ designs and the complexity of the genomic data used. The direction of effects varies depending on the specific context, but tools designed for specific tasks (e.g., off-target prediction or gene essentiality screens) show improvements in CRISPR/Cas9 specificity and effectiveness. The studies suggest that combining different tools is often necessary for comprehensive CRISPR analysis.
3.5.3. Investigations of Causes of Heterogeneity Among Study Results
The studies identified several sources of heterogeneity in the results. For example, the variation in CRISPR array detection methods, such as those used by CRISPRFinder and CRISPRCasFinder, shows how differences in approach (e.g., regex-based vs. graph-based analysis) can affect detection sensitivity. Similarly, the use of machine learning for off-target prediction in tools like E-CRISP and CHOPCHOP contrasts with sequence alignment-based methods, highlighting the complexity of achieving standardized results across different organisms and experimental setups. Factors like the delivery methods for CRISPR/Cas9 (e.g., ribonucleoprotein versus plasmid-based delivery) and differences in genome complexity between species also contribute to the observed heterogeneity in tool performance.
3.5.4. Sensitivity Analyses
No explicit sensitivity analyses were conducted in the studies, but variations in tool design were discussed as factors affecting the robustness of results. For instance, CASPERpam requires large input datasets for accurate PAM prediction, while SSC, which predicts sgRNA efficiency, emphasizes sequence features that contribute to off-target effects. The studies also discussed optimization strategies to improve sensitivity, such as varying Cas-to-gRNA ratios and truncating gRNAs, which suggest efforts to make CRISPR tool results more robust across different contexts. MAGeCK remained robust even with reduced sequencing depth or fewer sgRNAs, indicating reliability under variable experimental conditions.
3.5.5. Reporting Biases
There were no apparent issues with missing results or unpublished data that could have affected the conclusions of this review. Although there is no direct evidence of reporting bias (e.g., selective outcome reporting), some missing data may have been overlooked because of the nature of the review process. The studies reviewed came primarily from reputable sources, and the risk of bias was generally low-to-moderate, as discussed in the overall risk of bias section. No specific tools for detecting reporting bias, such as funnel plots, were used, as the assessment was conducted manually. While the articles often cited established methodologies and were funded by reputable sources, a few studies exhibited minor subjectivity arising from a lack of standardized criteria or transparency in some of the data presented, which could have contributed to selective reporting in those cases. Despite these concerns, the studies provided a solid foundation for the conclusions.
3.5.6. Certainty of Evidence
The certainty of evidence in this review is considered moderate-to-high. The included studies were drawn from respected databases such as NCBI and demonstrated low-to-moderate risk of bias. Most of the studies were professionally written, well cited, and showed reliable study design and results. Additionally, conflicts of interest were generally disclosed and did not appear to significantly influence the conclusions of the studies. Although no specific frameworks for assessing the certainty of evidence were used, the review found the evidence to be trustworthy due to the quality of the articles. However, there are a few limitations that may affect the certainty of the evidence. These include the narrow scope of some studies (e.g., limited sample sizes, testing on specific cell types or organisms) and the possibility of bias in articles authored by creators of the tools being discussed (e.g., MAGeCK and CERES), which may have led to favorable comparisons of their own tools against others. Despite these limitations, the evidence was generally consistent, with similar findings reported across multiple studies regarding the tools discussed. While some tools were only mentioned in one article, those mentioned in more than one showed consistent data. Given the specific nature of the tools reviewed, smaller sample sizes were expected and did not seem to significantly affect the results. Overall, the certainty of the evidence is supported by the credibility of the sources and the consistency of findings across the studies.