This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
Article

Identifying Conserved Regions in HIV-1 Proteins by Entropy Analysis of Sequence Variability

by Alexandr N. Shchemelev *, Elena N. Serikova, Yulia V. Ostankova, Vladimir S. Davydenko, Edward S. Ramsay and Areg A. Totolian
Saint Petersburg Pasteur Institute, 197101 St. Petersburg, Russia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2026, 27(11), 5139; https://doi.org/10.3390/ijms27115139 (registering DOI)
Submission received: 27 March 2026 / Revised: 2 June 2026 / Accepted: 3 June 2026 / Published: 5 June 2026
(This article belongs to the Special Issue Viral Infections and Viral Pathogenesis)

Abstract

The extraordinary genetic diversity of human immunodeficiency virus type 1 (HIV-1), driven by high mutation and recombination rates, poses significant challenges for diagnostics, therapy, and vaccine development. While variable regions enable immune escape, hyperconserved regions are critical for viral function and represent promising targets for novel therapeutic interventions. This study aimed to develop and validate a bioinformatic algorithm for quantitative assessment of sequence conservation and automated identification of functionally significant conserved regions across all major HIV-1 proteins. A total of 1119 full-length HIV-1 genome sequences representing major subtypes (A1, A2, A6, B, C, D, F1, F2, G, H, J, K) were analyzed. Normalized Shannon entropy (S-index) was calculated for each alignment column. Statistical thresholds for conserved regions were established using 95% confidence intervals derived from bootstrap resampling. Two complementary algorithms, clustering and local maxima detection, were applied to identify conserved regions, which were subsequently mapped to known functional domains based on literature data. Protein conservation varied markedly, with Sm values ranging from 0.784 (Vpu) to 0.920 (Pol). Gag, Pol, and Vpr demonstrated the highest overall conservation, while Env, Rev, Tat, and Vpu exhibited pronounced variability interspersed with conserved domains. In total, 25 conserved regions in Gag, 49 in Pol, 28 in Env, and 4 to 6 regions in each accessory protein (Vif, Vpr, Rev, Tat, Nef, Vpu) were identified. These regions corresponded to critical functional elements including enzyme catalytic centers, zinc fingers, receptor-binding sites, protein interaction interfaces, and membrane-anchoring domains. The developed computational framework enables statistically grounded identification of evolutionarily constrained regions across the analyzed HIV-1 subtypes. The identified conserved regions represent candidate sites for further investigation and may inform downstream studies focused on antiviral target prioritization, immunogen design, and diagnostic assay development. However, their translational applicability requires additional analytical, structural, and experimental validation.
Keywords: HIV-1; genetic diversity; sequence conservation; Shannon entropy; gag; pol; env; accessory proteins; conserved regions; bioinformatics
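
The per-column conservation scoring described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the S-index formula, so the code assumes S = 1 - H/ln(20), with H the Shannon entropy of the residues in a column, so that higher values mean stronger conservation. The function names (`s_index`, `column_scores`, `bootstrap_mean_ci`, `conserved_runs`), the gap handling, the use of the bootstrap upper bound as a threshold, and the minimum run length are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = 20  # assumption: entropy is normalized by ln(20)


def s_index(column):
    """Assumed S-index: 1 - H/ln(20) for one alignment column; gaps ('-') are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    h = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return 1.0 - h / math.log(AMINO_ACIDS)


def column_scores(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [s_index(col) for col in zip(*alignment)]


def bootstrap_mean_ci(scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean score (resampling columns)."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def conserved_runs(scores, threshold, min_len=3):
    """Return (start, end) index pairs of runs with score > threshold."""
    runs, start = [], None
    # A sentinel below any threshold closes a run that reaches the last column.
    for i, s in enumerate(scores + [float("-inf")]):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


if __name__ == "__main__":
    toy_alignment = ["MKVLA", "MKVIA", "MRVLA", "MKV-A"]  # toy data, not HIV-1
    scores = column_scores(toy_alignment)
    _, upper = bootstrap_mean_ci(scores)
    print(scores)
    print(conserved_runs(scores, threshold=upper, min_len=1))
```

Here a run of consecutive columns scoring above the threshold stands in for a conserved region. The clustering and local maxima detection steps named in the abstract are not reproduced.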
