1. Introduction
As COVID-19 spreads worldwide with 25 million cases and 800,000 deaths occurring between January and August 2020 [1], there is an urgent need for novel treatment options. There are currently no known pharmaceutical treatments for SARS-CoV-2 infection. One strategy to accelerate the identification of possible leads is to reposition drugs with known targets and mechanisms that may have been through parts of the FDA approval process [2]. Avoiding much of the development pipeline, which is known for its low success rate [3], saves valuable time and money. While repositioning is ultimately the fastest way to get treatments to patients in need, the most efficient way to discover drugs with the potential for repurposing is unclear.
In the short time since the beginning of the pandemic, many attempts to predict candidate drugs for repositioning have been made. Given the novel nature of the virus, methods of target prediction have been forced to utilize the limited data that is available or creatively repurpose data from related coronaviruses. In vitro screenings of chemical libraries have been used to identify inhibitors of SARS-CoV-2 replication [4,5] and cellular toxicity [6]. Screenings of experimentally verified SARS-CoV-2 interacting host proteins [7] have elucidated key infection mechanisms which, when compared to drug databases, have predicted a range of possible targets for repurposing. Network analyses using protein interaction data from up to 13 related human coronaviruses [8,9] combined with in vitro screenings have identified additional sets of cellular pathways to consider for drug repurposing. Topology of protein interactions and drug–gene interactions combined with differential expression and pathway analysis has been used to identify possible mechanisms of action for SARS-CoV-2 infection [10]. Because each method integrates a different level of biological detail, targets predicted by more than one study are the most likely to be infection-specific and effective.
Here, the existing methods for identifying influenza A virus drug targets [11] are applied to SARS-CoV-2–human host protein interaction data to predict and prioritize candidate targets for drug repurposing. Two methods of network controllability identify proteins that act as regulators of the infected cell, marked by changes to the network’s behavior after the addition of virus–host protein interactions. Both methods use a maximum matching algorithm (e.g., Hopcroft-Karp) to identify the “driver nodes” of the network, which must be manipulated for the system to be fully controlled (analogous to the non-zero elements of the state space B matrix of classic control systems engineering [12]). These nodes, specified by the paths that span the maximum amount of the network with no two edges sharing a start node or an end node, dictate the “easiest” way in which control can propagate through the network. Directing total system behavior is impossible without manipulating all driver nodes of the system at once. Of note, driver sets are typically not unique, with the number of driver node sets scaling exponentially with the size of the network [13]. As a result, each driver node set (size $N_D$) can also be referred to as a minimum input set (MIS).
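The matching-based identification of driver nodes described above can be sketched on a toy directed network. This is a minimal illustration, not the implementation used in this study: it swaps in a simple augmenting-path matching (Kuhn's algorithm) for Hopcroft-Karp, and the function names and the five-node network are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def maximum_matching(nodes: Iterable[Node], edges: Iterable[Edge]) -> Dict[Node, Node]:
    """Return one maximum matching as {matched_node: node_controlling_it}.

    Assumption: simple augmenting paths (Kuhn's algorithm) stand in for
    Hopcroft-Karp. The recursion suits small illustrative networks only.
    """
    succ: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    matched_by: Dict[Node, Node] = {}

    def augment(u: Node, seen: Set[Node]) -> bool:
        # Try to give u an unmatched target, re-routing earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in list(succ):
        augment(u, set())
    return matched_by


def driver_nodes(nodes: Iterable[Node], edges: Iterable[Edge]) -> Set[Node]:
    """Nodes left unmatched by one maximum matching (one minimum input set)."""
    nodes = list(nodes)
    matched = maximum_matching(nodes, edges)
    drivers = {n for n in nodes if n not in matched}
    # With a perfect matching, a single input is still required.
    return drivers if drivers else set(nodes[:1])


# Hypothetical toy network: a -> b -> c -> d, plus a -> e.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
drivers = driver_nodes(toy_nodes, toy_edges)
print(sorted(drivers))  # ['a', 'e']
```

In this toy network the set {a, b} is an equally valid minimum input set (match a to e instead of b), which illustrates why driver sets are not unique.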
The analysis contains two methods of controllability. In robust controllability [14], each node of the network is removed, the driver set is re-calculated (size $N_D'$, compared with the original size $N_D$), and the removed node is classified by its effect on the size of the driver set. Increasing the number of driver nodes ($N_D' > N_D$) makes it more difficult to control the network (these nodes are classified as indispensable nodes) and decreasing the number of driver nodes ($N_D' < N_D$) makes it easier to control the network (these nodes are classified as dispensable nodes). A removed node with no effect on the number of driver nodes ($N_D' = N_D$) is classified as a neutral node. This method provides information concerning the structural robustness of the network and the effect of losing single network components. A second method, global controllability [15], classifies each node by its membership in the possible MISs of the network. Critical nodes are included in all of the network’s possible MISs, intermittent nodes are only included in some of the possible MISs, and redundant nodes are not included in any of the possible MISs. Therefore, this method presents information about alternative methods of network control.
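Both classification schemes can be sketched on a small directed network. The code below is illustrative only and rests on stated assumptions: the driver count is taken as max(N minus the maximum matching size, 1); the global classification avoids enumerating every MIS by using an exchange argument (a node with no incoming edge is never matched, so it is critical, and a node is intermittent if deleting its incoming edges leaves the maximum matching size unchanged); and all function names and the toy network are hypothetical. Networks with a perfect matching are an edge case this shortcut does not handle.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def matching_size(nodes: Iterable[Node], edges: Iterable[Edge]) -> int:
    """Size of a maximum matching, found with simple augmenting paths."""
    node_set = set(nodes)
    succ: Dict[Node, List[Node]] = {n: [] for n in node_set}
    for u, v in edges:
        if u in node_set and v in node_set:  # drop edges touching removed nodes
            succ[u].append(v)
    matched_by: Dict[Node, Node] = {}

    def augment(u: Node, seen: Set[Node]) -> bool:
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if v not in matched_by or augment(matched_by[v], seen):
                    matched_by[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in node_set)


def n_drivers(nodes: List[Node], edges: List[Edge]) -> int:
    """Driver count, assumed here to be max(N - |maximum matching|, 1)."""
    return max(len(nodes) - matching_size(nodes, edges), 1)


def robust_classes(nodes: List[Node], edges: List[Edge]) -> Dict[Node, str]:
    """Remove each node in turn and compare the driver count before and after."""
    base = n_drivers(nodes, edges)
    labels = {}
    for n in nodes:
        after = n_drivers([m for m in nodes if m != n], edges)
        labels[n] = ("indispensable" if after > base
                     else "dispensable" if after < base else "neutral")
    return labels


def global_classes(nodes: List[Node], edges: List[Edge]) -> Dict[Node, str]:
    """Classify nodes by membership in minimum input sets without enumeration."""
    full = matching_size(nodes, edges)
    labels = {}
    for n in nodes:
        if not any(v == n for _, v in edges):
            labels[n] = "critical"      # never matched, so in every MIS
            continue
        without_incoming = [e for e in edges if e[1] != n]
        can_be_unmatched = matching_size(nodes, without_incoming) == full
        labels[n] = "intermittent" if can_be_unmatched else "redundant"
    return labels


# Hypothetical toy network: a -> b -> c -> d, plus a -> e.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
robust = robust_classes(toy_nodes, toy_edges)
global_ = global_classes(toy_nodes, toy_edges)
print(robust)
print(global_)
```

For this toy network, removing c breaks the longest path and raises the driver count (indispensable), removing e lowers it (dispensable), and the two minimum input sets {a, e} and {a, b} make a critical, b and e intermittent, and c and d redundant.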
A comparison of the controllability of the human protein–protein interaction network (Host Interaction Network, HIN) and the human network with the addition of SARS-CoV-2–host protein interactions (Virus Integrated Network, VIN) can be used to identify proteins with unique post-infection roles in driving total cell behavior. Assuming the identified differences are representative of biological changes within the cell (such as changes to gene regulation), the protein predictions have potential as virus-specific drug targets. Here, 16 proteins are identified by topological, controllability, and biological relevance to viral infection. Of these proteins, eight are prioritized for drug repurposing efforts to treat SARS-CoV-2 infection based on previous druggability and relevance to functions such as translation, cellular transport, and the immune response.
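The HIN-to-VIN comparison amounts to finding host proteins whose controllability label differs between the two networks. A minimal sketch follows, assuming the labels have already been computed for each network; the function name, the placeholder protein identifiers (P1, P2, P3, V1), and their labels are hypothetical and are not results from this study.

```python
from typing import Dict, Tuple


def classification_changes(hin: Dict[str, str],
                           vin: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Host proteins whose label differs between the two networks.

    Only proteins present in both networks are compared, so viral proteins
    (which exist only in the VIN) are ignored.
    """
    shared = hin.keys() & vin.keys()
    return {p: (hin[p], vin[p]) for p in sorted(shared) if hin[p] != vin[p]}


# Placeholder labels for illustration only.
hin_labels = {"P1": "critical", "P2": "redundant", "P3": "intermittent"}
vin_labels = {"P1": "intermittent", "P2": "redundant",
              "P3": "intermittent", "V1": "critical"}

changed = classification_changes(hin_labels, vin_labels)
print(changed)  # {'P1': ('critical', 'intermittent')}
```

The same comparison applies to either classification scheme, since each assigns exactly one label per protein in each network.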
4. Discussion
Here, a set of drug targets is prioritized for drug repurposing efforts in the global fight against COVID-19. Network controllability methods, with only disease-specific virus–host and host–host protein interaction data, create a large-scale representation of regulatory changes occurring during infection. With no additional biological information, the connectivity of the network is sufficient to predict the most biologically relevant components of the disease system, as evidenced by the high level of overlap between the presented results and the extensive biological analysis performed in Gordon et al.’s study for SARS-CoV-2 [7]. In total, this study demonstrates a simple computational approach to prioritizing drug target predictions with minimal biological context, an advantage in present times where viral understanding and data is even more sparse than usual.
As seen in the previous study of influenza A virus [11], the magnitude of control needed to manipulate the total cell system (number of driver proteins) is comparable in the healthy and infected cellular networks. The small changes in driver proteins between the networks are seen in immunoregulatory proteins that are typically upregulated during viral infection (such as TRIM51 and MICA) and many of the proteins identified in the controllability analyses. This is reflective of the activation of the immune response pathways and their effect on the cell as a whole.
With respect to the ratio of resultant classifications in both controllability methods, outcomes are again similar to those achieved with the influenza A virus–host network [11]. This is unsurprising given that the same host network is used in both analyses. However, the low overlap between the controllability-predicted proteins for the two diseases (3/16 proteins: PVR, RAB14, and SAAL1) demonstrates that while the method is easily applied to other viruses, the result is specific to each virus.
One limitation of this method is the requirement of high-confidence virus–host protein interaction data where the host proteins exist in the HIN. As experimentally validated, directed networks are typically smaller than the available undirected networks, the method was unable to use over half of the 332 known SARS-CoV-2–host protein interactions (in comparison, the influenza A virus network contains 752 virus–host interactions). Even so, the controllability analysis was able to predict biologically relevant proteins involved in functions like the cellular stress response, host translation, and cellular transport, which supports the robustness of the method.
Of the eight prioritized targets, all but NUP210 exhibit large increases in betweenness after the addition of virus–host protein interactions, placing particular importance on their role in infected cell behavior based on topology. Further, most of the global protein set (including the novel predictions, PVR, and SCARB1) have a betweenness of zero in the HIN, implying that their individual significance to cellular network flow is truly unique to the infected cell. With the majority of the identified proteins being regulated by the interferon response (with a fold change in expression greater than two), this network result translates to immunological significance. The alterations in classification for all global proteins indicate a step down in network control where critical proteins have the most control and redundant the least. Biologically, this could represent viral interruption of normal host function or activation of a new pathway, both being interesting prospects for drug development.
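The betweenness comparison can be illustrated with a toy example in which a host protein with zero betweenness gains betweenness once a virus–host edge is added. This is a sketch under stated assumptions: it uses Brandes' algorithm for unnormalized betweenness on a directed, unweighted graph, it assumes virus–host edges point from the viral protein to the host protein, and the node names (h1, h2, h3, v) are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def betweenness(nodes: Iterable[Node], edges: Iterable[Edge]) -> Dict[Node, float]:
    """Unnormalized betweenness centrality (Brandes) for a directed graph."""
    succ: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    score = dict.fromkeys(succ, 0.0)
    for s in succ:
        # Breadth-first search from s, counting shortest paths (sigma).
        order: List[Node] = []
        preds: Dict[Node, List[Node]] = {n: [] for n in succ}
        sigma = dict.fromkeys(succ, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(succ, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical host network: h1 regulates h2 and h3; nothing regulates h1.
host_edges = [("h1", "h2"), ("h1", "h3")]
# Hypothetical infected network: a viral protein v now targets h1.
infected_edges = host_edges + [("v", "h1")]

before = betweenness(["h1", "h2", "h3"], host_edges)
after = betweenness(["h1", "h2", "h3", "v"], infected_edges)
print(before["h1"], after["h1"])  # 0.0 2.0
```

Here h1 lies on no shortest path in the host-only network, but after the viral edge is added it sits on both paths from v to h2 and h3, which mirrors the pattern of proteins whose betweenness is zero in the HIN and positive in the VIN.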
Given the biological relevance of the target proteins predicted by topology and controllability, there is good reason to pursue these recommendations, either in drug repurposing or in novel drug development. The extended list of untested compounds found in Supplementary Table S1 will be considered for further viral inhibition studies, particularly tocopherol/vitamin E, which is already approved and has documented positive effects on viral clearance for influenza A [36]. Predictions indicate opportunity either to interfere with the viral replication cycle or to modulate the immune response to infection. Therefore, to most efficiently translate these findings to the bedside, knockdown studies or siRNA screens should be used to validate drug predictions for each target. Cell culture studies that track interferon and cytokine activity may further establish a possible mechanism between the proposed targets and immune regulation. By narrowing the pool of drug target candidates with controllability methods, experimental validation will be efficient and timely.