Models of Low-Dimensional Vector-Fuzzy Representations of Genetic Sequences and Amino Acids
Abstract
1. Introduction
1.1. Description of the Contributions of the Paper
- We introduce two new representations of genetic sequences in ℝ⁴, using vectors and fuzzy sets. These representations assign a unit vector of the 4-dimensional Euclidean space ℝ⁴ to each nucleotide, triplet and general genetic sequence. Thus, we are no longer restricted to the 12-dimensional space ℝ¹² and can study the geometrical image in ℝ⁴, significantly decreasing the number of coordinates needed for the corresponding biological data: amino acids and larger genetic sequences.
- With these new representations, we investigate the similarity, difference and Euclidean distance between genetic sequences and study the influence of the new methodologies. To that end, we compare the new models with the known model that represents genetic sequences in the fuzzy polynucleotide space. Our study shows that the new models differentiate the sequences better and that the criteria of similarity, difference and Euclidean distance provide better qualitative and quantitative data.
1.2. Outline of the Paper
2. Preliminaries
2.1. Basic Notions for Fuzzy Sets
- (1) $\mu_A\colon X \to [0, 1]$ and
- (2) $A = \{(x, \mu_A(x)) : x \in X\}$, that is, A is the set of all pairs $(x, \mu_A(x))$ such that x ∈ X and $\mu_A(x)$ is the degree of membership of x in A.
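- (1) by A ∧ B we define the fuzzy set for which the membership function $\mu_{A \wedge B}\colon X \to [0, 1]$ is given as $\mu_{A \wedge B}(x) = \min\{\mu_A(x), \mu_B(x)\}$;
- (2) by A ∨ B we define the fuzzy set for which the membership function $\mu_{A \vee B}\colon X \to [0, 1]$ is given as $\mu_{A \vee B}(x) = \max\{\mu_A(x), \mu_B(x)\}$.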
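- (1) The degree of similarity between A and B, denoted by sim(A,B), is defined to be the number $\mathrm{sim}(A,B) = \dfrac{\sum_{x \in X} \min\{\mu_A(x), \mu_B(x)\}}{\sum_{x \in X} \mu_C(x)}$, where C is the fuzzy set with $\mu_C(x) = \tfrac{1}{2}\bigl(\mu_A(x) + \mu_B(x)\bigr)$, that is, C is the canonical midpoint between A and B.
- (2) The degree of difference between A and B, denoted by dif(A,B), is defined to be the number $\mathrm{dif}(A,B) = 1 - \mathrm{sim}(A,B)$.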
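The two measures can be sketched in a few lines of Python. The formula used here is an assumption inferred from the reported numbers rather than a quotation of the definition: sim(A,B) is taken as the total of the pointwise minima divided by the total membership of the midpoint C = (A + B)/2, and dif(A,B) as 1 − sim(A,B). Under this reading, the FPS vectors of UAC-UGU and CAC-UGU give the tabulated values 0.8333333 and 0.1666667. The function names and the handling of two empty fuzzy sets are hypothetical choices.

```python
def sim(a, b):
    """Similarity of two fuzzy sets given as equal-length membership lists.

    Assumed formula: sum(min(a_i, b_i)) / sum((a_i + b_i) / 2).
    """
    if len(a) != len(b):
        raise ValueError("membership vectors must have the same length")
    # Total membership of the midpoint C = (A + B) / 2.
    midpoint_total = sum((x + y) / 2 for x, y in zip(a, b))
    if midpoint_total == 0:
        return 1.0  # assumption: two empty fuzzy sets count as identical
    # Total membership of the pointwise minimum A ∧ B.
    overlap = sum(min(x, y) for x, y in zip(a, b))
    return overlap / midpoint_total


def dif(a, b):
    """Difference of two fuzzy sets, assumed to be the complement of sim."""
    return 1.0 - sim(a, b)


# FPS representations of UAC-UGU and CAC-UGU, copied from the FPS table.
uac_ugu = [1, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 0]
cac_ugu = [0.5, 0.5, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 0]

print(round(sim(uac_ugu, cac_ugu), 7))  # 0.8333333
print(round(dif(uac_ugu, cac_ugu), 7))  # 0.1666667
```

The same functions applied to the two-decimal VF-I vectors of these sequences return approximately 0.9473684, which matches the corresponding table entry.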
2.2. Fuzzy Polynucleotide Space (FPS)
2.3. Basic Notions for Vectors
3. Vector-Fuzzy-I Representation of Genetic Sequences
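- New Representation [called Vector-Fuzzy-I (in short VF-I) representation of s]
- Step 1. We assign a unit vector of ℝ⁴ to each nucleotide as follows: $U \mapsto (1,0,0,0)$, $C \mapsto (0,1,0,0)$, $A \mapsto (0,0,1,0)$, $G \mapsto (0,0,0,1)$. As each nucleotide $N_i$, i = 1, 2, …, k, of the above sequence s is one of the elements U, C, A, G, we find that each corresponding vector $v_i$, i = 1, 2, …, k, is one of these four vectors. Go to Step 2.
- Step 2. Find the vector: $w_1 = \dfrac{v_1 + v_2}{\lVert v_1 + v_2 \rVert}$ and go to Step 3.
- Step 3. Find the vector: $w_2 = \dfrac{w_1 + v_3}{\lVert w_1 + v_3 \rVert}$ and go to Step 4.
- Step 4. Find the vector: $w_3 = \dfrac{w_2 + v_4}{\lVert w_2 + v_4 \rVert}$ and go to Step 5.
- Step k − 1. Find the vector: $w_{k-2} = \dfrac{w_{k-3} + v_{k-1}}{\lVert w_{k-3} + v_{k-1} \rVert}$ and go to Step k.
- Step k. Find the vector: $w_{k-1} = \dfrac{w_{k-2} + v_k}{\lVert w_{k-2} + v_k \rVert}$ and go to Step k + 1.
- Step k + 1. Assign the genetic sequence s to the vector $w_{k-1}$.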
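The steps above amount to repeatedly adding the next nucleotide vector to the running result and rescaling to unit length. The following Python sketch illustrates this under one assumption: the nucleotides U, C, A, G are mapped to the standard basis vectors of ℝ⁴ in that order, a choice that reproduces the tabulated VF-I vectors (for example CAC-UGU). The names `UNIT_VECTORS` and `vf1` are hypothetical.

```python
import math

# Assumed assignment of the standard basis of R^4 to the nucleotides.
UNIT_VECTORS = {
    "U": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "A": (0.0, 0.0, 1.0, 0.0),
    "G": (0.0, 0.0, 0.0, 1.0),
}


def vf1(sequence):
    """Return the VF-I unit vector of a sequence such as 'CAC-UGU'."""
    nucleotides = sequence.upper().replace("-", "")
    if not nucleotides or any(n not in UNIT_VECTORS for n in nucleotides):
        raise ValueError("sequence must be a non-empty string over U, C, A, G")
    # Start from the vector of the first nucleotide.
    w = UNIT_VECTORS[nucleotides[0]]
    for n in nucleotides[1:]:
        # Add the next nucleotide vector, then rescale to unit length.
        summed = [wi + vi for wi, vi in zip(w, UNIT_VECTORS[n])]
        norm = math.sqrt(sum(c * c for c in summed))
        w = tuple(c / norm for c in summed)
    return w


print(tuple(round(c, 2) for c in vf1("CAC-UGU")))  # (0.87, 0.27, 0.11, 0.41)
```

Because every intermediate vector has non-negative coordinates and unit length, the sum in each step is never the zero vector, so the division is always defined.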
- (1)
- The final vector w of the VF-I representation of a genetic sequence s is unique.
- (2)
- If and are two different genetic sequences and are their VF-I representations, respectively, then .
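- (1) For CAU (histidine) we have C = (0, 1, 0, 0), A = (0, 0, 1, 0) and U = (1, 0, 0, 0). Thus, the corresponding vectors of VF-I representation are as follows:
- Step 1. We add the first two vectors corresponding to C and A and construct the corresponding unit vector, as follows: $w_1 = \dfrac{(0,1,0,0) + (0,0,1,0)}{\lVert (0,1,1,0) \rVert} = \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0\right)$.
- Step 2. We add to the previous unit vector $w_1$ the next vector corresponding to U, as follows: $w_2 = \dfrac{w_1 + (1,0,0,0)}{\lVert w_1 + (1,0,0,0) \rVert} = \tfrac{1}{\sqrt{2}}\left(1, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0\right)$. Thus, CAU corresponds to the following unit vector: $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{2}, \tfrac{1}{2}, 0\right) \approx (0.71, 0.50, 0.50, 0.00)$. A schematic geometrical image of CAU is given in Figure 4.
- (2) For CCG (proline) we have C = (0, 1, 0, 0), C = (0, 1, 0, 0) and G = (0, 0, 0, 1). Thus, the corresponding vectors of VF-I representation are as follows:
- Step 1. We add the first two vectors corresponding to C and C and construct the corresponding unit vector, as follows: $w_1 = \dfrac{(0,1,0,0) + (0,1,0,0)}{\lVert (0,2,0,0) \rVert} = (0, 1, 0, 0)$.
- Step 2. We add to the resulting unit vector $w_1$ the unit vector corresponding to G, as follows: $w_2 = \dfrac{w_1 + (0,0,0,1)}{\lVert w_1 + (0,0,0,1) \rVert} = \tfrac{1}{\sqrt{2}}(0, 1, 0, 1)$. Thus, CCG corresponds to the following unit vector: $\left(0, \tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right) \approx (0.00, 0.71, 0.00, 0.71)$. A schematic geometrical image of CCG is given in Figure 5.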
- (3)
- (4) In Table 4 we can see the VF-I representations of the genetic sequences with three triplets (rounded to two decimal digits).
4. Vector-Fuzzy-II Representation of Genetic Sequences
- (1)
- The partial positional weight of N, denoted by (N) is defined to be a number of the set {1, 2, 3} which refers to the position i of the nucleotide N in the triple j of s. That is,
- (2)
- The total positional weight of N, denoted by Σ(N), is defined to be the sum of all (N) in the sequence s. That is,
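- (1) For UCG (serine) we have the following: Σ(U) = 1, Σ(C) = 2, Σ(A) = 0 and Σ(G) = 3.
- (2) For CGU (arginine) we have the following: Σ(U) = 3, Σ(C) = 1, Σ(A) = 0 and Σ(G) = 2.
- (3) For UCG-CGU we have the following: Σ(U) = 1 + 3 = 4, Σ(C) = 2 + 1 = 3, Σ(A) = 0 and Σ(G) = 3 + 2 = 5.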
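The total positional weights of these three examples can be checked with a short Python sketch. It assumes that the position of a nucleotide is counted inside its own triplet (1, 2 or 3) and that a nucleotide absent from the sequence has total weight 0; the function name `total_positional_weights` is hypothetical.

```python
def total_positional_weights(sequence):
    """Return the total positional weight of U, C, A and G in a sequence.

    Each occurrence contributes its position (1, 2 or 3) inside its triplet.
    """
    nucleotides = sequence.upper().replace("-", "")
    if len(nucleotides) % 3 != 0 or any(n not in "UCAG" for n in nucleotides):
        raise ValueError("sequence must consist of whole triplets over U, C, A, G")
    totals = {"U": 0, "C": 0, "A": 0, "G": 0}
    for index, n in enumerate(nucleotides):
        # index % 3 + 1 is the position of the nucleotide within its triplet.
        totals[n] += index % 3 + 1
    return totals


print(total_positional_weights("UCG"))      # {'U': 1, 'C': 2, 'A': 0, 'G': 3}
print(total_positional_weights("CGU"))      # {'U': 3, 'C': 1, 'A': 0, 'G': 2}
print(total_positional_weights("UCG-CGU"))  # {'U': 4, 'C': 3, 'A': 0, 'G': 5}
```

The weights of a concatenated sequence are the sums of the weights of its triplets, which is why the third result is the coordinate-wise sum of the first two.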
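- New Representation [called Vector-Fuzzy-II (in short VF-II) representation of s]
- Step 1. Find the vector: $w = \dfrac{(PF(U), PF(C), PF(A), PF(G))}{\lVert (PF(U), PF(C), PF(A), PF(G)) \rVert}$, where PF(N) is the number of occurrences of N in s divided by Σ(N), and PF(N) = 0 if N does not occur in s, and go to Step 2.
- Step 2. Assign the genetic sequence s of (24) to the vector w.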
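A minimal Python sketch of the VF-II construction follows. It rests on an assumption that should be read as such: PF(N) is taken to be the number of occurrences of N divided by its total positional weight (and 0 when N is absent), with coordinates ordered U, C, A, G. This reading was chosen because it reproduces the tabulated VF-II vectors, for example (0.50, 0.50, 0.50, 0.50) for CAC-UGU, and the angle of about 60° between UGU and AUG. The names `vf2` and `angle_degrees` are hypothetical.

```python
import math

NUCLEOTIDES = "UCAG"  # assumed coordinate order


def vf2(sequence):
    """Return the VF-II unit vector of a sequence such as 'CAC-UGU'."""
    nucleotides = sequence.upper().replace("-", "")
    if (not nucleotides or len(nucleotides) % 3 != 0
            or any(n not in NUCLEOTIDES for n in nucleotides)):
        raise ValueError("sequence must consist of whole triplets over U, C, A, G")
    counts = dict.fromkeys(NUCLEOTIDES, 0)
    weights = dict.fromkeys(NUCLEOTIDES, 0)
    for index, n in enumerate(nucleotides):
        counts[n] += 1
        weights[n] += index % 3 + 1  # position inside the triplet
    # Assumed PF(N): occurrences divided by total positional weight.
    pf = [counts[n] / weights[n] if counts[n] else 0.0 for n in NUCLEOTIDES]
    norm = math.sqrt(sum(c * c for c in pf))
    return tuple(c / norm for c in pf)


def angle_degrees(u, v):
    """Angle between two unit vectors, in degrees."""
    cosine = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


print(tuple(round(c, 2) for c in vf2("CAC-UGU")))  # (0.5, 0.5, 0.5, 0.5)
print(angle_degrees(vf2("UGU"), vf2("AUG")))       # close to 60 degrees
```

The cosine is clamped to [−1, 1] before calling `math.acos` so that rounding errors on nearly parallel vectors cannot raise a domain error.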
- (1)
- The vector w of the VF-II representation of a genetic sequence s is unique.
- (2)
- If and are two different genetic sequences and are their VF-II representations, respectively, then
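- (1) For UGU (cysteine) we have PF(U) = 2/(1 + 3) = 1/2, as U is present 2 times at positions 1 and 3, and PF(G) = 1/2, as G is present only once at position 2. Thus, the corresponding vector of VF-II is as follows: $w = \dfrac{\left(\tfrac{1}{2}, 0, 0, \tfrac{1}{2}\right)}{\lVert \left(\tfrac{1}{2}, 0, 0, \tfrac{1}{2}\right) \rVert}$. Thus, UGU corresponds to the following unit vector: $\left(\tfrac{1}{\sqrt{2}}, 0, 0, \tfrac{1}{\sqrt{2}}\right) \approx (0.71, 0.00, 0.00, 0.71)$.
- (2) For AUG (methionine) we have PF(U) = 1/2, as U is present only once at position 2, PF(C) = 0, PF(A) = 1, as A is located once at position 1, and PF(G) = 1/3, as G is present only once at position 3. Thus, the corresponding vector of VF-II is as follows: $w = \dfrac{\left(\tfrac{1}{2}, 0, 1, \tfrac{1}{3}\right)}{\lVert \left(\tfrac{1}{2}, 0, 1, \tfrac{1}{3}\right) \rVert}$. Thus, AUG corresponds to the following unit vector: $\left(\tfrac{3}{7}, 0, \tfrac{6}{7}, \tfrac{2}{7}\right) \approx (0.43, 0.00, 0.86, 0.29)$. Additionally, regarding the cosine of the angle θ between the above vector representations of UGU and AUG and applying Definition 3, we have $\cos\theta = \tfrac{1}{\sqrt{2}}\left(\tfrac{3}{7} + \tfrac{2}{7}\right) = \tfrac{5}{7\sqrt{2}} \approx 0.505$. Thus, θ ≈ 60°.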
- (3)
5. Characteristics of Vector-Fuzzy Representation
- Application 1: Regarding UAC-UGU in combination with CAC-UGU and CUC-UGU
- Application 2: Regarding CAC-UGU in combination with UAC-UGU, CUC-UGU, CAU-UGU, CAG-UGU and CAA-UGU
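The Euclidean distances reported for these applications can be recomputed directly from the two-decimal vectors listed in the VF-I and VF-II tables. The Python sketch below does so for UAC-UGU against CAC-UGU and CUC-UGU; the dictionary and function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

# Two-decimal vectors copied from the VF-I and VF-II tables.
vf1_rows = {
    "UAC-UGU": (0.90, 0.16, 0.12, 0.39),
    "CAC-UGU": (0.87, 0.27, 0.11, 0.41),
    "CUC-UGU": (0.89, 0.22, 0.00, 0.40),
}
vf2_rows = {
    "UAC-UGU": (0.61, 0.34, 0.51, 0.51),
    "CAC-UGU": (0.50, 0.50, 0.50, 0.50),
    "CUC-UGU": (0.57, 0.57, 0.00, 0.57),
}


def distance(rows, first, second):
    """Euclidean distance between two tabulated representations."""
    return math.dist(rows[first], rows[second])


for other in ("CAC-UGU", "CUC-UGU"):
    print(other,
          round(distance(vf1_rows, "UAC-UGU", other), 7),
          round(distance(vf2_rows, "UAC-UGU", other), 7))
```

The results agree with the tabulated distances (0.1161895 and 0.1946792 for CAC-UGU, 0.1349074 and 0.5640922 for CUC-UGU), which indicates that those entries were computed from the rounded vectors rather than from exact coordinates.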
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest





| Amino Acid | 3-Letter Code | Name | Reverse Codon |
|---|---|---|---|
| 1. | Ala | Alanine | GCU, GCC, GCA, GCG |
| 2. | Cys | Cysteine | UGU, UGC |
| 3. | Asp | Aspartic acid | GAU, GAC |
| 4. | Glu | Glutamic acid | GAA, GAG |
| 5. | Phe | Phenylalanine | UUU, UUC |
| 6. | Gly | Glycine | GGU, GGC, GGA, GGG |
| 7. | His | Histidine | CAU, CAC |
| 8. | Ile | Isoleucine | AUU, AUC, AUA |
| 9. | Lys | Lysine | AAA, AAG |
| 10. | Leu | Leucine | UUA, UUG, CUU, CUC, CUA, CUG |
| 11. | Met | Methionine | AUG |
| 12. | Asn | Asparagine | AAU, AAC |
| 13. | Pro | Proline | CCU, CCC, CCA, CCG |
| 14. | Gln | Glutamine | CAA, CAG |
| 15. | Arg | Arginine | CGU, CGC, CGA, CGG, AGA, AGG |
| 16. | Ser | Serine | UCU, UCC, UCA, UCG, AGU, AGC |
| 17. | Thr | Threonine | ACU, ACC, ACA, ACG |
| 18. | Val | Valine | GUU, GUC, GUA, GUG |
| 19. | Trp | Tryptophan | UGG |
| 20. | Tyr | Tyrosine | UAU, UAC |

| Genetic Sequence | Genetic Code | FPS Representation |
|---|---|---|
| tyrosine/cysteine | UAC-UGU | (1, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 0) |
| histidine/cysteine | CAC-UGU | (0.5, 0.5, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 0) |
| leucine/cysteine | CUC-UGU | (0.5, 0.5, 0, 0, 0.5, 0, 0, 0.5, 0.5, 0.5, 0, 0) |
| histidine/cysteine | CAU-UGU | (0.5, 0.5, 0, 0, 0, 0, 0.5, 0.5, 1, 0, 0, 0) |
| glutamine/cysteine | CAG-UGU | (0.5, 0.5, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0.5) |
| glutamine/cysteine | CAA-UGU | (0.5, 0.5, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0.5, 0) |
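
The FPS rows above can be reproduced with a short Python sketch. It assumes the usual construction of the fuzzy polynucleotide space: each codon is written as a 12-dimensional 0/1 vector (four coordinates per codon position, in the order U, C, A, G), and a sequence of several codons is represented by the coordinate-wise average of its codon vectors. The function name `fps` is hypothetical.

```python
ORDER = "UCAG"  # assumed coordinate order inside each codon position


def fps(sequence):
    """Return the 12-dimensional FPS vector of a sequence such as 'UAC-UGU'."""
    codons = sequence.upper().split("-")
    if any(len(c) != 3 or any(n not in ORDER for n in c) for c in codons):
        raise ValueError("sequence must be hyphen-separated triplets over U, C, A, G")
    counts = [0] * 12
    for codon in codons:
        for position, n in enumerate(codon):
            # Four coordinates per codon position, one per nucleotide.
            counts[4 * position + ORDER.index(n)] += 1
    # Average over the codons of the sequence.
    return [c / len(codons) for c in counts]


print(fps("UAC-UGU"))
# [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 0.0, 0.0]
```

Counting first and dividing once at the end keeps the coordinates exact for short sequences, since no partial sums of fractions are accumulated.
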

| Genetic Sequence | Genetic Code | VF-I Representation |
|---|---|---|
| tyrosine/cysteine | UAC-UGU | (0.90, 0.16, 0.12, 0.39) |
| histidine/cysteine | CAC-UGU | (0.87, 0.27, 0.11, 0.41) |
| leucine/cysteine | CUC-UGU | (0.89, 0.22, 0.00, 0.40) |
| histidine/cysteine | CAU-UGU | (0.91, 0.10, 0.10, 0.39) |
| glutamine/cysteine | CAG-UGU | (0.84, 0.12, 0.12, 0.52) |
| glutamine/cysteine | CAA-UGU | (0.87, 0.11, 0.27, 0.41) |

| Genetic Sequence | Genetic Code | VF-I Representation |
|---|---|---|
| leucine/asparagine/serine | UUA-AAU-UCU | (0.91, 0.39, 0.13, 0.00) |
| alanine/glutamic acid/phenylalanine | GCU-GAG-UUU | (0.98, 0.02, 0.07, 0.16) |
| valine/tyrosine/arginine | GUG-UAU-AGG | (0.19, 0.00, 0.30, 0.94) |

| Genetic Sequence | Genetic Code | VF-II Representation |
|---|---|---|
| tyrosine/cysteine | UAC-UGU | (0.61, 0.34, 0.51, 0.51) |
| histidine/cysteine | CAC-UGU | (0.50, 0.50, 0.50, 0.50) |
| leucine/cysteine | CUC-UGU | (0.57, 0.57, 0.00, 0.57) |
| histidine/cysteine | CAU-UGU | (0.33, 0.77, 0.38, 0.38) |
| glutamine/cysteine | CAG-UGU | (0.39, 0.78, 0.39, 0.31) |
| glutamine/cysteine | CAA-UGU | (0.39, 0.78, 0.31, 0.39) |

| Genetic Sequence | Genetic Code | VF-II Representation |
|---|---|---|
| leucine/asparagine/serine | UUA-AAU-UCU | (0.58, 0.58, 0.58, 0.00) |
| alanine/glutamic acid/phenylalanine | GCU-GAG-UUU | (0.43, 0.49, 0.49, 0.58) |
| valine/tyrosine/arginine | GUG-UAU-AGG | (0.53, 0.00, 0.71, 0.46) |

| Euclidean Distance d | FPS Representation (Table 1) | VF-I Representation (Table 2) | VF-II Representation (Table 4) |
|---|---|---|---|
| d(UAC-UGU, CAC-UGU) | 0.7071068 | 0.1161895 | 0.1946792 |
| d(UAC-UGU, CUC-UGU) | 1 | 0.1349074 | 0.5640922 |

| Similarity/Difference | FPS Representation (Table 1) | VF-I Representation (Table 2) | VF-II Representation (Table 4) |
|---|---|---|---|
| sim(UAC-UGU, CAC-UGU) | 0.8333333 | 0.9473684 | 0.9269521 |
| dif(UAC-UGU, CAC-UGU) | 0.1666667 | 0.0526316 | 0.0730479 |
| sim(UAC-UGU, CUC-UGU) | 0.6666667 | 0.9350649 | 0.7717391 |
| dif(UAC-UGU, CUC-UGU) | 0.3333333 | 0.0649351 | 0.2282609 |

| Euclidean Distance d | FPS Representation (Table 1) | VF-I Representation (Table 2) | VF-II Representation (Table 4) |
|---|---|---|---|
| d(CAC-UGU, UAC-UGU) | 0.7071068 | 0.1161895 | 0.1946792 |
| d(CAC-UGU, CUC-UGU) | 0.7071068 | 0.1228821 | 0.5144900 |
| d(CAC-UGU, CAU-UGU) | 0.7071068 | 0.1760682 | 0.3613862 |
| d(CAC-UGU, CAG-UGU) | 0.7071068 | 0.1886796 | 0.3724245 |
| d(CAC-UGU, CAA-UGU) | 0.7071068 | 0.2262741 | 0.3724245 |

| Similarity/Difference | FPS Representation (Table 1) | VF-I Representation (Table 2) | VF-II Representation (Table 4) |
|---|---|---|---|
| sim(CAC-UGU, CUC-UGU) | 0.8333333 | 0.9400631 | 0.8086253 |
| dif(CAC-UGU, CUC-UGU) | 0.1666667 | 0.0599369 | 0.1913747 |
| sim(CAC-UGU, CAU-UGU) | 0.8333333 | 0.9240506 | 0.8238342 |
| dif(CAC-UGU, CAU-UGU) | 0.1666667 | 0.0759494 | 0.1761658 |
| sim(CAC-UGU, CAG-UGU) | 0.8333333 | 0.9079755 | 0.8217054 |
| dif(CAC-UGU, CAG-UGU) | 0.1666667 | 0.0920245 | 0.1782946 |
| sim(CAC-UGU, CAA-UGU) | 0.8333333 | 0.9036145 | 0.8217054 |
| dif(CAC-UGU, CAA-UGU) | 0.1666667 | 0.0963855 | 0.1782946 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sereti, F.; Georgiou, D.; Karakasidis, T. Models of Low-Dimensional Vector-Fuzzy Representations of Genetic Sequences and Amino Acids. AppliedMath 2026, 6, 39. https://doi.org/10.3390/appliedmath6030039

