Search Results (32)

Search Parameters:
Keywords = k-mer lengths

15 pages, 2394 KiB  
Article
First Genomic Survey of Pleurocryptella shinkai Provides Preliminary Insights into Genome Characteristics and Evolution of a Deep-Sea Parasitic Isopod
by Aiyang Wang, Min Hui and Zhongli Sha
Diversity 2025, 17(4), 297; https://doi.org/10.3390/d17040297 - 20 Apr 2025
Cited by 1 | Viewed by 427 | Correction
Abstract
Genomic adaptations of parasitic crustaceans in deep-sea extreme environments are poorly understood. This study presents the first genome survey of Pleurocryptella shinkai, a bopyrid isopod parasitizing deep-sea squat lobsters, using Illumina sequencing. The genome size was estimated to be 1.06 Gb via a K-mer analysis, smaller than its free-living relatives. The repeat content and heterozygosity were 66.31% and 1.14%, respectively, indicating a complex genome. The draft genome assembly yielded 0.93 Gb of scaffolds with an N50 length of 989 bp, and a complete mitochondrial genome of 14,711 bp was obtained. Phylogenetic analyses of 13 mitochondrial protein-coding genes confirmed the monophyly of Bopyridae, supporting Pleurocryptella as the most primitive genus within the group and the key role of deep sea in the origin and diversification of bopyrids. A mitochondrial gene variation analysis identified NAD2 and NAD4 as promising DNA markers for a population genetic study of P. shinkai. Twenty-four positively selected sites across COX1, NAD2, and NAD4 genes in P. shinkai explained the genetic basis of its adaptive evolution at the mitochondrial level. These findings provide valuable genomic resources for deep-sea parasitic crustaceans and establish a foundation for further high-quality genome assembly and adaptive mechanism studies of P. shinkai. Full article

14 pages, 1096 KiB  
Article
Whole-Genome Sequencing of Hexagrammos otakii Provides Insights into Its Genomic Characteristics and Population Dynamics
by Dong Liu, Xiaolong Wang, Jifa Lü, Yijing Zhu, Yuxia Jian, Xue Wang, Fengxiang Gao, Li Li and Fawen Hu
Animals 2025, 15(6), 782; https://doi.org/10.3390/ani15060782 - 10 Mar 2025
Viewed by 641
Abstract
Hexagrammos otakii, also commonly called “Fat Greenling”, is highly valued as an important commercial fish due to its extremely delicious flesh. However, the absence of a genomic resource has limited our understanding of its genetic characteristics and hindered artificial breeding efforts. In this study, we performed Illumina paired-end sequencing of H. otakii, generating a total of 73.19 Gb of clean data. Based on K-mer analysis, the genome size was estimated to be 679.23 Mb, with a heterozygosity rate of 0.68% and a repeat sequence proportion of 43.60%. De novo genome assembly using SOAPdenovo2 resulted in a draft genome size of 723.31 Mb, with the longest sequence length being 86.24 Kb. Additionally, the mitochondrial genome was also assembled, which was 16,513 bp in size, with a GC content of 47.20%. Minisatellites were the most abundant tandem repeats in the H. otakii genome, followed by microsatellites. In the phylogenetic tree, H. otakii was placed within a well-supported clade (bootstrap support = 100%) that included S. sinica, N. coibor, L. crocea, and C. lucidus. PSMC analysis revealed that H. otakii underwent a population bottleneck during the Pleistocene, peaking around 500 thousand years ago (Kya) and declining to a minimum during the Last Glacial Period (~70–15 Kya), with no significant recovery observed by ~10 Kya. This study was a comprehensive genome survey analysis of H. otakii, providing insights into its genomic characteristics and population dynamics. Full article
(This article belongs to the Special Issue Omics in Economic Aquatic Animals)
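The genome survey abstracts in this listing report genome sizes "estimated via K-mer analysis". A minimal sketch of the usual idea follows: count k-mers, find the main peak of the depth histogram, and divide the total number of k-mers by the peak depth. This is an illustration under simplifying assumptions (no heterozygosity or repeat modelling), not the pipeline used by any of the listed studies; the function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts):
    """Estimate genome size as (total k-mers) / (depth of the main peak)."""
    # Histogram: depth -> number of distinct k-mers observed at that depth.
    histogram = Counter(kmer_counts.values())
    # Assumption: depth-1 k-mers are mostly sequencing errors, so drop them.
    solid = {depth: n for depth, n in histogram.items() if depth > 1}
    if not solid:
        raise ValueError("no k-mers with depth > 1; coverage is too low")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth
```

Real surveys work on billions of k-mers with dedicated counters; the arithmetic at the end is the part this sketch is meant to show.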

15 pages, 4945 KiB  
Article
Genome Survey of Male Rana dybowskii to Further Understand the Sex Determination Mechanism
by Yuan Xu, Hanyu Liu, Xinshuai Jiang, Xinning Zhang, Jiayu Liu, Yaguang Tian, Xiujuan Bai, Shiquan Cui and Shengwei Di
Animals 2024, 14(20), 2968; https://doi.org/10.3390/ani14202968 - 14 Oct 2024
Viewed by 1107
Abstract
Rana dybowskii is one of the important aquaculture species in Northeast China. The fallopian tubes of female R. dybowskii are used to prepare oviductus ranae (an important traditional Chinese medicine). Therefore, R. dybowskii females have higher economical value than males. An increasing female R. dybowskii population can increase the benefits from R. dybowskii culture. However, the genome of amphibians is complex, making it difficult to investigate their sex determination mechanism. In this study, we analyzed the genome of male R. dybowskii using next-generation sequencing technology. A total of 200,046,452,400 bp of clean data were obtained, and the K-mer analysis indicated that the depth was 50×. The genome size of R. dybowskii was approximately 3585.05 M, with a heterozygosity rate, repeat sequence ratio, and genome GC content of 1.15%, 68.96%, and approximately 43.0%, respectively. In total, 270,785 contigs and 498 scaffolds were generated. The size of the contigs and scaffolds was 3,748,543,415 and 3,765,862,278 bp, respectively, with the N50 length of 31,988 and 336,385,783. The longest contig and scaffold were of the size 137,967,485 and 1,808,367,828 bp, respectively. The number of contigs and scaffolds > 10K nt was 99,620 and 451, respectively. Through annotation, 40,913 genes were obtained, including 156,609 CDS (i.e., 3.83 CDS per gene). Sequence alignment was performed with the assembled scaffolding genome in this study. Two and one fragment had high homology with two male-specific DNA molecular markers of R. dybowskii discovered previously (namely, MSM-222 and MSM-261, respectively). In addition, the Dmrt1 gene of R. dybowskii was obtained with a length of 18,893 bp by comparison and splicing. The forward primers amplifying MSM-222 and MSM-261 were located at 322–343 and 14,501–14,526 bp of Dmrt1, respectively. However, sequence alignment revealed that MSM-222 and MSM-261 were not located on Dmrt1, and only some homologous parts were observed. 
This indicated that in addition to Dmrt1, other important genes may play a crucial role in the sex determination mechanism of R. dybowskii. Our study provided a foundation for the subsequent high-quality genome construction and provided important genomic resources for future studies on R. dybowskii. Full article
(This article belongs to the Section Animal Genetics and Genomics)

14 pages, 2640 KiB  
Article
SNP-Based and Kmer-Based eQTL Analysis Using Transcriptome Data
by Mei Ge, Chenyu Li and Zhiyan Zhang
Animals 2024, 14(20), 2941; https://doi.org/10.3390/ani14202941 - 11 Oct 2024
Cited by 2 | Viewed by 2225
Abstract
Traditional expression quantitative trait locus (eQTL) mapping associates single nucleotide polymorphisms (SNPs) with gene expression, where the SNPs are derived from large-scale whole-genome sequencing (WGS) data or transcriptome data. While WGS provides a high SNP density, it also incurs substantial sequencing costs. In contrast, RNA-seq data, which are more accessible and less expensive, can simultaneously yield gene expressions and SNPs. Thus, eQTL analysis based on RNA-seq offers significant potential applications. Two primary strategies were employed for eQTL in this study. The first involved analyzing expression levels in relation to variant sites detected between populations from RNA-seq data. The second approach utilized kmers, which are sequences of length k derived from RNA-seq reads, to represent variant sites and associated these kmer genotypes with gene expression. We discovered 87 significant association signals involving eGene on the basis of the SNP-based eQTL analysis. These genes include DYNLT1, NMNAT1, and MRLC2, which are closely related to neurological functions such as motor coordination and homeostasis, play a role in cellular energy metabolism, and function in regulating calcium-dependent signaling in muscle contraction, respectively. This study compared the results obtained from eQTL mapping using RNA-seq identified SNPs and gene expression with those derived from kmers. We found that the vast majority (23/30) of the association signals overlapping the two methods could be verified by haplotype block analysis. This comparison elucidates the strengths and limitations of each method, providing insights into their relative efficacy for eQTL identification. Full article
(This article belongs to the Section Animal Genetics and Genomics)
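The abstract above describes kmers as "sequences of length k derived from RNA-seq reads" that stand in for variant sites. One plausible way to turn that into genotypes is a presence/absence matrix per sample, sketched below. The data layout and function names are assumptions made for illustration; the abstract does not specify how the kmer genotypes were encoded.

```python
def kmers_in_reads(reads, k):
    """Return the set of distinct k-mers found in a list of reads."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def presence_absence(samples, k):
    """Build a k-mer presence/absence table.

    samples: dict mapping sample name -> list of reads (hypothetical layout).
    Returns (sorted k-mer list, dict sample -> list of 0/1 flags).
    """
    per_sample = {name: kmers_in_reads(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values())) if per_sample else []
    table = {
        name: [1 if kmer in present else 0 for kmer in all_kmers]
        for name, present in per_sample.items()
    }
    return all_kmers, table
```

Each column of the table can then be tested for association with expression in the same way a SNP genotype would be.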

17 pages, 7028 KiB  
Article
Patterns of Diversity and Humoral Immunogenicity for HIV-1 Antisense Protein (ASP)
by Diogo Gama Caetano, Paloma Napoleão-Pêgo, Larissa Melo Villela, Fernanda Heloise Côrtes, Sandra Wagner Cardoso, Brenda Hoagland, Beatriz Grinsztejn, Valdilea Gonçalves Veloso, Salvatore Giovanni De-Simone and Monick Lindenmeyer Guimarães
Vaccines 2024, 12(7), 771; https://doi.org/10.3390/vaccines12070771 - 13 Jul 2024
Cited by 3 | Viewed by 1509
Abstract
HIV-1 has an antisense gene overlapping env that encodes the ASP protein. ASP functions are still unknown, but it has been associated with gp120 in the viral envelope and membrane of infected cells, making it a potential target for immune response. Despite this, immune response patterns against ASP are poorly described and can be influenced by the high genetic variability of the env gene. To explore this, we analyzed 100k HIV-1 ASP sequences from the Los Alamos HIV sequence database using phylogenetic, Shannon entropy (Hs), and logo tools to study ASP variability in worldwide and Brazilian sequences from the most prevalent HIV-1 subtypes in Brazil (B, C, and F1). Data obtained in silico guided the design and synthesis of 15-mer overlapping peptides through spot synthesis on cellulose membranes. Peptide arrays were screened to assess IgG and IgM responses in pooled plasma samples from HIV controllers and individuals with acute or recent HIV infection. Excluding regions with low alignment accuracy, several sites with higher variability (Hs > 1.5) were identified among the datasets (25 for worldwide sequences, 20 for Brazilian sequences). Among sites with Hs < 1.5, sequence logos allowed the identification of 23 other sites with subtype-specific signatures. Altogether, amino acid variations with frequencies > 20% in the 48 variable sites identified were included in 92 peptides, divided into 15 sets, representing near full-length ASP. During the immune screening, the strongest responses were observed in three sets, one in the middle and one at the C-terminus of the protein. While some sets presented variations potentially associated with epitope displacement between IgG and IgM targets and subtype-specific signatures appeared to impact the level of response for some peptides, signals of cross-reactivity were observed for some sets despite the presence of B/C/F1 signatures. Our data provides a map of ASP regions preferentially targeted by IgG and IgM responses. 
Despite B/C/F1 subtype signatures in ASP, the amino acid variation in some areas preferentially targeted by IgM and IgG did not negatively impact the response against regions with higher immunogenicity. Full article
(This article belongs to the Special Issue Research on HIV/AIDS Vaccine)
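Two steps in the abstract above lend themselves to a short sketch: scoring alignment columns by Shannon entropy (Hs) and tiling a protein into overlapping peptides. The code below assumes the natural logarithm for Hs and a hypothetical `step` between peptide starts; neither the log base nor the overlap is stated in the abstract.

```python
import math
from collections import Counter


def shannon_entropy(column, gap_chars="-"):
    """Shannon entropy of one alignment column (natural log, gaps ignored)."""
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def variable_sites(alignment, threshold=1.5):
    """Indices of columns whose entropy exceeds the threshold."""
    columns = zip(*alignment)  # assumes all aligned sequences have equal length
    return [i for i, col in enumerate(columns) if shannon_entropy(col) > threshold]


def overlapping_peptides(sequence, length=15, step=5):
    """Tile a sequence into peptides of fixed length; `step` is hypothetical."""
    return [sequence[i:i + length] for i in range(0, len(sequence) - length + 1, step)]
```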

13 pages, 3102 KiB  
Article
Engineering a Dual Specificity γδ T-Cell Receptor for Cancer Immunotherapy
by David M. Davies, Giuseppe Pugliese, Ana C. Parente Pereira, Lynsey M. Whilding, Daniel Larcombe-Young and John Maher
Biology 2024, 13(3), 196; https://doi.org/10.3390/biology13030196 - 20 Mar 2024
Cited by 4 | Viewed by 2789
Abstract
γδ T-cells provide immune surveillance against cancer, straddling both innate and adaptive immunity. G115 is a clonal γδ T-cell receptor (TCR) of the Vγ9Vδ2 subtype which can confer responsiveness to phosphoantigens (PAgs) when genetically introduced into conventional αβ T-cells. Cancer immunotherapy using γδ TCR-engineered T-cells is currently under clinical evaluation. In this study, we sought to broaden the cancer specificity of the G115 γδ TCR by insertion of a tumour-binding peptide into the complementarity-determining region (CDR) three regions of the TCR δ2 chain. Peptides were selected from the foot and mouth disease virus A20 peptide which binds with high affinity and selectivity to αvβ6, an epithelial-selective integrin that is expressed by a range of solid tumours. Insertion of an A20-derived 12mer peptide achieved the best results, enabling the resulting G115 + A12 T-cells to kill both PAg and αvβ6-expressing tumour cells. Cytolytic activity of G115 + A12 T-cells against PAg-presenting K562 target cells was enhanced compared to G115 control cells, in keeping with the critical role of CDR3 δ2 length for optimal PAg recognition. Activation was accompanied by interferon (IFN)-γ release in the presence of either target antigen, providing a novel dual-specificity approach for cancer immunotherapy. Full article

14 pages, 1945 KiB  
Article
Specific Patterns in Correlations of Super-Short Tandem Repeats (SSTRs) with G+C Content, Genic and Intergenic Regions, and Retrotransposons on All Human Chromosomes
by Lukas Henn, Aaron Sievers, Michael Hausmann and Georg Hildenbrand
Genes 2024, 15(1), 33; https://doi.org/10.3390/genes15010033 - 25 Dec 2023
Viewed by 1713
Abstract
The specific characteristics of k-mer words (2 ≤ k ≤ 11) regarding genomic distribution and evolutionary conservation were recently found. Among them are, in high abundance, words with a tandem repeat structure (repeat unit length of 1 bp to 3 bp). Furthermore, there seems to be a class of extremely short tandem repeats (≤12 bp), so far overlooked, that are non-random-distributed and, therefore, may play a crucial role in the functioning of the genome. In the following article, the positional distributions of these motifs we call super-short tandem repeats (SSTRs) were compared to other functional elements, like genes and retrotransposons. We found length- and sequence-dependent correlations between the local SSTR density and G+C content, and also between the density of SSTRs and genes, as well as correlations with retrotransposon density. In addition to many general interesting relations, we found that SINE Alu has a strong influence on the local SSTR density. Moreover, the observed connection of SSTR patterns to pseudogenes and -exons might imply a special role of SSTRs in gene expression. In summary, our findings support the idea of a special role and the functional relevance of SSTRs in the genome. Full article
(This article belongs to the Section Bioinformatics)
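The abstract above refers to k-mer words "with a tandem repeat structure (repeat unit length of 1 bp to 3 bp)". A minimal check for that property is sketched here. It assumes a word qualifies when it holds at least two full copies of the unit and allows a trailing partial copy; the paper's exact SSTR definition is not given in the abstract, so treat this as one possible reading.

```python
def repeat_unit(word, max_unit=3):
    """Return the shortest repeat unit (length <= max_unit) of a word, or None.

    A word counts as a tandem repeat here if it contains at least two full
    copies of the unit; a trailing partial copy is allowed (an assumption).
    """
    for unit_len in range(1, max_unit + 1):
        if len(word) < 2 * unit_len:
            break
        # Every position must match the unit position it wraps around to.
        if all(word[i] == word[i % unit_len] for i in range(len(word))):
            return word[:unit_len]
    return None
```

For example, `repeat_unit("ACACAC")` gives `"AC"`, while `repeat_unit("ACGT")` gives `None`.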

18 pages, 5406 KiB  
Article
Crohn’s Disease Prediction Using Sequence Based Machine Learning Analysis of Human Microbiome
by Metehan Unal, Erkan Bostanci, Ceren Ozkul, Koray Acici, Tunc Asuroglu and Mehmet Serdar Guzel
Diagnostics 2023, 13(17), 2835; https://doi.org/10.3390/diagnostics13172835 - 1 Sep 2023
Cited by 5 | Viewed by 2946
Abstract
Human microbiota refers to the trillions of microorganisms that inhabit our bodies and have been discovered to have a substantial impact on human health and disease. By sampling the microbiota, it is possible to generate massive quantities of data for analysis using Machine Learning algorithms. In this study, we employed several modern Machine Learning techniques to predict Inflammatory Bowel Disease using raw sequence data. The dataset was obtained from NCBI preprocessed graph representations and converted into a structured form. Seven well-known Machine Learning frameworks, including Random Forest, Support Vector Machines, Extreme Gradient Boosting, Light Gradient Boosting Machine, Gaussian Naïve Bayes, Logistic Regression, and k-Nearest Neighbor, were used. Grid Search was employed for hyperparameter optimization. The performance of the Machine Learning models was evaluated using various metrics such as accuracy, precision, F-score, kappa, and area under the receiver operating characteristic curve. Additionally, McNemar’s test was conducted to assess the statistical significance of the experiment. The data was constructed using k-mer lengths of 3, 4 and 5. The Light Gradient Boosting Machine model outperformed the other models with 67.24%, 74.63% and 76.47% accuracy for k-mer lengths of 3, 4 and 5, respectively. The LightGBM model also demonstrated the best performance in each metric. The study showed promising results predicting disease from raw sequence data. Finally, McNemar’s test results found statistically significant differences between different Machine Learning approaches. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
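The abstract above builds its features from k-mer lengths of 3, 4 and 5. A common way to do this is a fixed-length frequency vector with one entry per possible k-mer (4^k entries for DNA), sketched below. The function name and the choice to skip k-mers containing ambiguous bases are assumptions, not details from the paper.

```python
from itertools import product


def kmer_frequency_vector(sequence, k, alphabet="ACGT"):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in index:  # skip k-mers containing ambiguous bases such as N
            counts[index[word]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(index)
    return [c / total for c in counts]
```

The vector length grows as 64, 256 and 1024 for k = 3, 4 and 5, so larger k gives finer resolution at the cost of sparser features.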

21 pages, 2742 KiB  
Article
Assessing the Resilience of Machine Learning Classification Algorithms on SARS-CoV-2 Genome Sequences Generated with Long-Read Specific Errors
by Bikram Sahoo, Sarwan Ali, Pin-Yu Chen, Murray Patterson and Alexander Zelikovsky
Biomolecules 2023, 13(6), 934; https://doi.org/10.3390/biom13060934 - 2 Jun 2023
Viewed by 2481
Abstract
The emergence of third-generation single-molecule sequencing (TGS) technology has revolutionized the generation of long reads, which are essential for genome assembly and have been widely employed in sequencing the SARS-CoV-2 virus during the COVID-19 pandemic. Although long-read sequencing has been crucial in understanding the evolution and transmission of the virus, the high error rate associated with these reads can lead to inadequate genome assembly and downstream biological interpretation. In this study, we evaluate the accuracy and robustness of machine learning (ML) models using six different embedding techniques on SARS-CoV-2 error-incorporated genome sequences. Our analysis includes two types of error-incorporated genome sequences: those generated using simulation tools to emulate error profiles of long-read sequencing platforms and those generated by introducing random errors. We show that the spaced k-mers embedding method achieves high accuracy in classifying error-free SARS-CoV-2 genome sequences, and the spaced k-mers and weighted k-mers embedding methods are highly accurate in predicting error-incorporated sequences. The fixed-length vectors generated by these methods contribute to the high accuracy achieved. Our study provides valuable insights for researchers to effectively evaluate ML models and gain a better understanding of the approach for accurate identification of critical SARS-CoV-2 genome sequences. Full article
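The abstract above names a "spaced k-mers" embedding without describing its construction. One widely used reading is sketched here: slide a window over the sequence and keep only the positions marked `1` in a binary mask. The mask-based definition and the function name are assumptions and may differ from the construction used in the paper.

```python
from collections import Counter


def spaced_kmers(sequence, mask):
    """Extract spaced k-mers: keep window positions where the mask is '1'."""
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    return [
        "".join(sequence[start + i] for i in keep)
        for start in range(len(sequence) - span + 1)
    ]


def spaced_kmer_counts(sequence, mask):
    """Count spaced k-mers; the counts can feed a fixed-length feature vector."""
    return Counter(spaced_kmers(sequence, mask))
```

Because masked positions are ignored, a single substitution at such a position leaves the spaced k-mer unchanged, which is one intuition for robustness to read errors.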

18 pages, 6871 KiB  
Article
Ultrasonication-Tailored Graphene Oxide of Varying Sizes in Multiple-Equilibrium-Route-Enhanced Adsorption for Aqueous Removal of Acridine Orange
by Zhaoyang Han, Ling Sun, Yingying Chu, Jing Wang, Chenyu Wei, Yifang Liu, Qianlei Jiang, Changbao Han, Hui Yan and Xuemei Song
Molecules 2023, 28(10), 4179; https://doi.org/10.3390/molecules28104179 - 18 May 2023
Cited by 4 | Viewed by 2328
Abstract
Graphene oxide (GO) has shown remarkable performance in the multiple-equilibrium-route adsorption (MER) process, which is characterized by further activation of GO through an in-situ reduction process based on single-equilibrium-route adsorption (SER), generating new adsorption sites and achieving an adsorption capacity increase. However, the effect of GO on MER adsorption in lateral size and thickness is still unclear. Here, GO sheets were sonicated for different lengths of time, and the adsorption of MER and SER was investigated at three temperatures to remove the typical cationic dye, acridine orange (AO). After sonication, we found that freshly prepared GO was greatly reduced in lateral size and thickness. In about 30 min, the thickness of GO decreased dramatically from several atomic layers to fewer atomic layers to a single atomic layer, which was completely stripped off; after that, the monolayer lateral size reduction dominated until it remained constant. Surface functional sites, such as hydroxyl groups, showed little change in the experiments. However, GO mainly reduces the C=O and C-O bonds in MER, except for the conjugated carbon backbone (C-C). The SER adsorption kinetics of all temperatures fitted the pseudo-first-order and pseudo-second-order models, yet room temperature preferred the latter. An overall adsorption enhancement appeared as sonication time, but the equilibrium capacity of SER GO generally increased with thickness and decreased with the single-layer lateral size, while MER GO conversed concerning the thickness. The escalated temperature facilitated the exfoliation of GO regarding the adsorption mechanism. Thus, the isotherm behaviors of the SER GO changed from the Freundlich model to Langmuir as size and temperature changed, while the MER GO were all of the Freundlich. A record capacity of ~4.3 g of AO per gram of GO was obtained from the MER adsorption with a sixty-minute ultrasonicated GO at 313.15 K. 
This work promises a cornerstone for MER adsorption with GO as an adsorbent. Full article
(This article belongs to the Special Issue Wastewater Treatment: Functional Materials and Advanced Technology)
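For reference, the kinetic and isotherm models named in the abstract above are usually written in the following textbook forms. The symbols are assumptions here, not the paper's notation: q_t and q_e are the adsorbed amounts at time t and at equilibrium, C_e is the equilibrium concentration, and k_1, k_2, q_m, K_L, K_F and n are fitted constants.

```latex
\begin{align}
q_t &= q_e\left(1 - e^{-k_1 t}\right) && \text{(pseudo-first-order)} \\
q_t &= \frac{k_2 q_e^2 t}{1 + k_2 q_e t} && \text{(pseudo-second-order)} \\
q_e &= \frac{q_m K_L C_e}{1 + K_L C_e} && \text{(Langmuir)} \\
q_e &= K_F C_e^{1/n} && \text{(Freundlich)}
\end{align}
```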

15 pages, 2123 KiB  
Article
Embedded-AMP: A Multi-Thread Computational Method for the Systematic Identification of Antimicrobial Peptides Embedded in Proteome Sequences
by Germán Meléndrez Carballo, Karen Guerrero Vázquez, Luis A. García-González, Gabriel Del Rio and Carlos A. Brizuela
Antibiotics 2023, 12(1), 139; https://doi.org/10.3390/antibiotics12010139 - 10 Jan 2023
Cited by 7 | Viewed by 3227
Abstract
Antimicrobial peptides (AMPs) have gained the attention of the research community for being an alternative to conventional antimicrobials to fight antibiotic resistance and for displaying other pharmacologically relevant activities, such as cell penetration, autophagy induction, and immunomodulation, among others. The identification of AMPs has been accomplished by combining computational and experimental approaches and has been mostly restricted to self-contained peptides, despite accumulated evidence indicating that AMPs may be found embedded within proteins, the functions of which are not necessarily associated with antimicrobials. To address this limitation, we propose a machine-learning (ML)-based pipeline to identify AMPs that are embedded in proteomes. Our method performs an in-silico digestion of every protein in the proteome to generate unique k-mers of different lengths, computes a set of molecular descriptors for each k-mer, and performs an antimicrobial activity prediction. To show the efficiency of the method, we used the shrimp proteome, and the pipeline analyzed all k-mers between 10 and 60 amino acids in length to predict all AMPs in less than 20 min. As an application example, we predicted AMPs in different rodents (common cuy, common rat, and naked mole rat) with different reported longevities and found a relation between species longevity and the number of predicted AMPs. The analysis shows that the higher the longevity of the species, the higher the number of predicted AMPs. The pipeline is available as a web service. Full article
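The in-silico digestion step described above, generating unique k-mers of several lengths from every protein, can be sketched as follows. The function names are hypothetical, and the sketch covers only the enumeration; descriptor calculation, prediction and the multi-threading of the actual pipeline are out of scope.

```python
def digest_protein(protein, min_len, max_len):
    """Return every distinct peptide of length min_len..max_len in a protein."""
    peptides = set()
    for k in range(min_len, max_len + 1):
        for i in range(len(protein) - k + 1):
            peptides.add(protein[i:i + k])
    return peptides


def digest_proteome(proteins, min_len=10, max_len=60):
    """Union of digested peptides over all proteins (duplicates collapse)."""
    unique = set()
    for protein in proteins:
        unique |= digest_protein(protein, min_len, max_len)
    return unique
```

A protein of length L yields up to (max_len - min_len + 1) × L candidates, so deduplicating through a set keeps the later prediction step from repeating work on shared peptides.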

17 pages, 3474 KiB  
Article
K-mer-Based Human Gesture Recognition (KHGR) Using Curved Piezoelectric Sensor
by Sathishkumar Subburaj, Chih-Ho Yeh, Brijesh Patel, Tsung-Han Huang, Wei-Song Hung, Ching-Yuan Chang, Yu-Wei Wu and Po Ting Lin
Electronics 2023, 12(1), 210; https://doi.org/10.3390/electronics12010210 - 1 Jan 2023
Cited by 5 | Viewed by 2592
Abstract
Recently, human activity recognition (HAR) techniques have made remarkable developments in the field of machine learning. In this paper, we classify human gestures using data collected from a curved piezoelectric sensor, including elbow movement, wrist turning, wrist bending, coughing, and neck bending. The classification process relies on data collected from a sensor. Machine learning algorithms enabled with K-mer are developed and optimized to perform human gesture recognition (HGR) from the acquired data to achieve the best results. Three machine learning algorithms, namely support vector machine (SVM), random forest (RF), and k-nearest neighbor (k-NN), are performed and analyzed with K-mer. The input parameters such as subsequence length (K), number of cuts, penalty parameter (C), number of trees (n_estimators), maximum depth of the tree (max_depth), and nearest neighbors (k) for the three machine learning algorithms are modified and analyzed for classification accuracy. The proposed model was evaluated using its accuracy percentage, recall score, precision score, and F-score value. We achieve promising results with accuracy of 94.11 ± 0.3%, 97.18 ± 0.4%, and 96.90 ± 0.5% for SVM, RF, and k-NN, respectively. The execution time to run the program with optimal parameters is 19.395 ± 1 s, 5.941 ± 1 s, and 3.832 ± 1 s for SVM, RF, and k-NN, respectively. Full article
(This article belongs to the Special Issue Selected Papers from Advanced Robotics and Intelligent Systems 2021)
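The abstract above lists "subsequence length (K)" and "number of cuts" among the tuned parameters, which suggests the sensor signal is discretized into symbols before K-mers are counted. The sketch below follows that reading with equal-width bins; treating the number of cuts as a bin count, and every name used here, is an assumption rather than the paper's method.

```python
from collections import Counter


def discretize(signal, n_bins):
    """Map each sample to a letter by equal-width binning (n_bins <= 26)."""
    low, high = min(signal), max(signal)
    width = (high - low) / n_bins
    symbols = []
    for value in signal:
        # A constant signal has zero width; place every sample in the first bin.
        idx = 0 if width == 0 else min(int((value - low) / width), n_bins - 1)
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)


def signal_kmer_counts(signal, k, n_bins):
    """Count length-k symbol subsequences of the discretized signal."""
    text = discretize(signal, n_bins)
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))
```

The resulting counts can be laid out as a feature vector and passed to a classifier such as SVM, RF or k-NN.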

9 pages, 705 KiB  
Article
Genomic Survey and Microsatellite Marker Investigation of Patagonian Moray Cod (Muraenolepis orangiensis)
by Eunkyung Choi, Seung Jae Lee, Euna Jo, Jinmu Kim, Steven J. Parker, Jeong-Hoon Kim and Hyun Park
Animals 2022, 12(13), 1608; https://doi.org/10.3390/ani12131608 - 22 Jun 2022
Cited by 3 | Viewed by 1972
Abstract
The Muraenolepididae family of fishes, known as eel cods, inhabits continental slopes and shelves in the Southern Hemisphere. This family belongs to the Gadiformes order, which constitutes one of the most important commercial fish resources worldwide, but the classification of the fish species in this order is ambiguous because it is only based on the morphological and habitat characteristics of the fishes. Here, the genome of Patagonian moray cod was sequenced using the Illumina HiSeq platform, and screened for microsatellite motifs. The genome was predicted to be 748.97 Mb, with a heterozygosity rate of 0.768%, via K-mer analysis (K = 25). The genome assembly showed that the total size of scaffolds was 711.92 Mb and the N50 scaffold length was 1522 bp. Additionally, 4,447,517 microsatellite motifs were identified from the genome survey assembly, and the most abundant motif type was found to be AC/GT. In summary, these data may facilitate the identification of molecular markers in Patagonian moray cod, which would be a good basis for further whole-genome sequencing with long read sequencing technology and chromosome conformation capture technology, as well as population genetics. Full article
(This article belongs to the Section Animal Genetics and Genomics)

22 pages, 1715 KiB  
Article
Systems-Based Approach for Optimization of Assembly-Free Bacterial MLST Mapping
by Natasha Pavlovikj, Joao Carlos Gomes-Neto, Jitender S. Deogun and Andrew K. Benson
Life 2022, 12(5), 670; https://doi.org/10.3390/life12050670 - 30 Apr 2022
Viewed by 3149
Abstract
Epidemiological surveillance of bacterial pathogens requires real-time data analysis with a fast turnaround, while aiming at generating two main outcomes: (1) species-level identification and (2) variant mapping at different levels of genotypic resolution for population-based tracking and surveillance, in addition to predicting traits such as antimicrobial resistance (AMR). Multi-locus sequence typing (MLST) aids this process by identifying sequence types (ST) based on seven ubiquitous genome-scattered loci. In this paper, we selected one assembly-dependent and one assembly-free method for ST mapping and applied them with the default settings and ST schemes they are distributed with, and systematically assessed their accuracy and scalability across a wide array of phylogenetically divergent Public Health-relevant bacterial pathogens with available MLST databases. Our data show that the optimal k-mer length for stringMLST is species-specific and that genome-intrinsic and -extrinsic features can affect the performance and accuracy of the program. Although suitable parameters could be identified for most organisms, there were instances where this program may not be directly deployable in its current format. Next, we integrated stringMLST into our freely available and scalable hierarchical-based population genomics platform, ProkEvo, and further demonstrated how the implementation facilitates automated, reproducible bacterial population analysis. Full article
(This article belongs to the Special Issue Computational Analysis of Biomedical Data)

15 pages, 12811 KiB  
Article
Fractal Analysis of DNA Sequences Using Frequency Chaos Game Representation and Small-Angle Scattering
by Eugen Mircea Anitas
Int. J. Mol. Sci. 2022, 23(3), 1847; https://doi.org/10.3390/ijms23031847 - 6 Feb 2022
Cited by 17 | Viewed by 4421
Abstract
The fractal characteristics of DNA sequences are studied using the frequency chaos game representation (FCGR) and small-angle scattering (SAS) technique. The FCGR allows representation of the frequencies of occurrence of k-mers (oligonucleotides of length k) in the form of images. The numerically encoded data are then used in a SAS analysis to enhance hidden features in DNA sequences. It is shown that the simulated SAS intensity allows us to obtain the fractal dimensions and scaling factors at various scales. These structural parameters can be used to distinguish unambiguously between the scaling properties of complex hierarchical DNA sequences. The validity of this approach is illustrated on several sequences from: Escherichia coli, Mouse mitochondrion, Homo sapiens mitochondrion and Human cosmid. Full article
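The FCGR described above places the count of each k-mer in a cell of a 2^k × 2^k image. A minimal sketch follows, assuming the corner layout A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0) and that the last base of a k-mer selects the coarsest quadrant; corner conventions vary between implementations, and the function name is hypothetical.

```python
def fcgr(sequence, k):
    """Frequency chaos game representation as a 2^k x 2^k grid of counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # Corner convention (x_bit, y_bit) is an assumption; others are in use.
    corners = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    for start in range(len(sequence) - k + 1):
        word = sequence[start:start + k]
        if any(base not in corners for base in word):
            continue  # skip k-mers with ambiguous bases
        x = y = 0
        # Reverse order so the last base becomes the most significant bit,
        # i.e. it decides which quadrant of the image the k-mer falls in.
        for base in reversed(word):
            bit_x, bit_y = corners[base]
            x = (x << 1) | bit_x
            y = (y << 1) | bit_y
        grid[y][x] += 1
    return grid
```

The grid can be rendered as a grayscale image or normalized by the number of k-mers to compare sequences of different lengths.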