Article

1,465 Views

22 Pages

Improved Script Identification Algorithm Using Unicode-Based Regular Expression Matching Strategy

Mamtimin Qasim and Wushour Silamu

Data 2025, 10(4), 43; https://doi.org/10.3390/data10040043

25 March 2025

While script identification is the first step in many natural language processing and text mining tasks, at present, there is no open-source script identification algorithm for text. For this reason, we analyze the Unicode encoding of each type of sc...

Article

4 Citations

3,769 Views

17 Pages

Robotic Writing of Arbitrary Unicode Characters Using Paintbrushes

David Silvan Zingrebe, Jörg Marvin Gülzow and Oliver Deussen

Robotics 2023, 12(3), 72; https://doi.org/10.3390/robotics12030072

11 May 2023

Human handwriting is an everyday task performed regularly by most people. In the domain of robotic painting, multiple calligraphy machines exist which were built to replicate some aspects of human artistic writing; however, most projects are limited...

Article

33 Pages

Improving the Time Efficiency of a Script Identification Algorithm Using a Unicode-Based Regular Expression Matching Strategy

Mamtimin Qasim, Wushour Silamu and Yasen Aizezi

Appl. Sci. 2026, 16(4), 1714; https://doi.org/10.3390/app16041714

9 February 2026

Script identification is the first step in most multilingual text-processing systems. To improve the time efficiency of script identification algorithms, whether there is content written in a certain script in the text is first determined; if so, the...

Article

1 Citation

4,170 Views

24 Pages

Convolutional Neural Network Based Ensemble Approach for Homoglyph Recognition

Md. Taksir Hasan Majumder, Md. Mahabur Rahman, Anindya Iqbal and M. Sohel Rahman

Math. Comput. Appl. 2020, 25(4), 71; https://doi.org/10.3390/mca25040071

21 October 2020

Homoglyphs are pairs of visual representations of Unicode characters that look similar to the human eye. Identifying homoglyphs is extremely useful for building a strong defence mechanism against many phishing and spoofing attacks, ID imitation, prof...

Article

4 Citations

5,231 Views

20 Pages

Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity

Sanjana Gunna, Rohit Saluja and Cheerakkuzhi Veluthemana Jawahar

J. Imaging 2022, 8(4), 86; https://doi.org/10.3390/jimaging8040086

23 March 2022

Reading Indian scene texts is complex due to the use of regional vocabulary, multiple fonts/scripts, and text size. This work investigates the significant differences in Indian and Latin Scene Text Recognition (STR) systems. Recent STR works rely on...

Article

1,248 Views

36 Pages

A Survey of Printable Encodings

Marco Botta, Davide Cavagnino, Alessandro Druetto, Maurizio Lucenteforte and Annunziata Marra

Algorithms 2025, 18(8), 504; https://doi.org/10.3390/a18080504

12 August 2025

The representation of binary data in a compact, printable, efficient, and often human-readable format is essential in numerous computing applications, mainly driven by the limitations of systems and communication protocols not designed to handle arbi...

Article

2,324 Views

13 Pages

CRank: Reusable Word Importance Ranking for Text Adversarial Attack

Xinyi Chen and Bo Liu

Appl. Sci. 2021, 11(20), 9570; https://doi.org/10.3390/app11209570

14 October 2021

Deep learning models have been widely used in natural language processing tasks, yet researchers have recently proposed several methods to fool the state-of-the-art neural network models. Among these methods, word importance ranking is an essential p...

Article

5 Citations

4,442 Views

18 Pages

A Syllable-Based Technique for Uyghur Text Compression

Wayit Abliz, Hao Wu, Maihemuti Maimaiti, Jiamila Wushouer, Kahaerjiang Abiderexiti, Tuergen Yibulayin and Aishan Wumaier

Information 2020, 11(3), 172; https://doi.org/10.3390/info11030172

23 March 2020

To improve utilization of text storage resources and efficiency of data transmission, we proposed two syllable-based Uyghur text compression coding schemes. First, according to the statistics of syllable coverage of the corpus text, we constructed a...

Article

3 Citations

3,131 Views

16 Pages

Hiding the Source Code of Stored Database Programs

Vitalii Yesin, Mikolaj Karpinski, Maryna Yesina, Vladyslav Vilihura and Kornel Warwas

Information 2020, 11(12), 576; https://doi.org/10.3390/info11120576

9 December 2020

The objective of the article is to reveal an approach to hiding the code of stored programs stored in the database. The essence of this approach is the complex use of the method of random permutation of code symbols related to a specific stored progr...

Article

1 Citation

1,794 Views

11 Pages

The Design of a Script Identification Algorithm and Its Application in Constructing a Text Language Identification Dataset

Mamtimin Qasim, Wushour Silamu and Minghui Qiu

Data 2024, 9(11), 134; https://doi.org/10.3390/data9110134

11 November 2024

Script identification is easier to implement than language identification, and its identification rate is very high. The fewer languages are identified when using a language identification algorithm, the higher the identification rate is. However, no...

Article

3 Citations

4,043 Views

18 Pages

Transcription Alignment of Historical Vietnamese Manuscripts without Human-Annotated Learning Samples

Anna Scius-Bertrand, Michael Jungo, Beat Wolf, Andreas Fischer and Marc Bui

Appl. Sci. 2021, 11(11), 4894; https://doi.org/10.3390/app11114894

26 May 2021

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages a...

Article

15 Citations

10,577 Views

18 Pages

Homoglyph Attack Detection Model Using Machine Learning and Hash Function

Abdullah M. Almuhaideb, Nida Aslam, Almaha Alabdullatif, Sarah Altamimi, Shooq Alothman, Amnah Alhussain, Waad Aldosari, Shikah J. Alsunaidi and Khalid A. Alissa

J. Sens. Actuator Netw. 2022, 11(3), 54; https://doi.org/10.3390/jsan11030054

16 September 2022

Phishing is still a major security threat in cyberspace. In phishing, attackers steal critical information from victims by presenting a spoofing/fake site that appears to be a visual clone of a legitimate site. Several Unicode characters are visually...

Article

1 Citation

2,279 Views

27 Pages

Extracting Geoscientific Dataset Names from the Literature Based on the Hierarchical Temporal Memory Model

Kai Wu, Zugang Chen, Xinqian Wu, Guoqing Li, Jing Li, Shaohua Wang, Haodong Wang and Hang Feng

ISPRS Int. J. Geo-Inf. 2024, 13(7), 260; https://doi.org/10.3390/ijgi13070260

21 July 2024

Extracting geoscientific dataset names from the literature is crucial for building a literature–data association network, which can help readers access the data quickly through the Internet. However, the existing named-entity extraction methods...

13 Results Found
