The fast replication rate and lack of repair mechanisms of human immunodeficiency virus (HIV) contribute to its high mutation frequency, with some mutations resulting in the evolution of resistance to antiretroviral therapies (ART). As such, studying HIV drug resistance allows for real-time evaluation of evolutionary mechanisms. Characterizing the biological process of drug resistance is also critically important for the sustained effectiveness of ART. The link between “black box” deep learning methods applied to this problem and the evolutionary principles governing drug resistance has been overlooked to date. Here, we used publicly available HIV-1 sequence data and drug resistance assay results for 18 ART drugs to evaluate the performance of three architectures (multilayer perceptron, bidirectional recurrent neural network, and convolutional neural network) for drug resistance prediction, in conjunction with biological analysis. We identified convolutional neural networks as the best-performing architecture and demonstrated a correspondence between the importance of biologically relevant features in the classifier and overall performance. Our results suggest that the high classification performance of deep learning models is indeed dependent on drug resistance mutations (DRMs). These models also heavily weighted several features that are not known DRM locations, indicating the utility of model interpretability for addressing causal relationships in viral genotype-phenotype data.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.